# Understanding encodings

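The table below summarizes how each kind of character behaves when it is written to and read from a file using the utf-8, ascii, and latin-1 encodings. In the latin-1 column, "Reads a different character" refers to reading a file that was written as utf-8.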



| Type of Character | Compatible with utf8 | Compatible with ascii | Compatible with latin-1 |
|------|------|------|------|
| ASCII (0-127) | Yes | Yes | Yes |
| Extended ASCII (128-255) | Yes | No | Reads a different character |
| Unicode | Yes | No | Reads a different character |
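
The table can be reproduced in memory, without touching the file system. The following is a minimal sketch in Python 3 syntax (the cells in this notebook are Python 2); the helper name `read_back` and the three sample characters are illustrative assumptions, not part of the original notebook. It encodes each character as UTF-8 and then tries to decode those bytes with each codec.

```python
def read_back(text, codec):
    """Encode text as UTF-8, then decode the bytes with `codec`.

    Returns the decoded string, or None if the bytes are not valid
    in that codec.
    """
    data = text.encode('utf-8')
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


samples = ['a', '\u00e2', '\u0402']  # 'a', 'â', 'Ђ'
for ch in samples:
    for codec in ('utf-8', 'ascii', 'latin-1'):
        result = read_back(ch, codec)
        if result is None:
            status = 'fails'
        elif result == ch:
            status = 'same character'
        else:
            status = 'different text'
        print('U+%04X read as %-7s: %s' % (ord(ch), codec, status))
```

Each printed line corresponds to one cell of the table: utf-8 always returns the same character, ascii fails for anything above 127, and latin-1 never fails but returns different text for anything above 127.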


### <b>ASCII Characters 0-127</b>

Open a file with ASCII encoding and write the string <b>'ab'</b> to it.

In [5]:
import codecs

c = 'ab'
f = codecs.open('test_ascii_1.txt', encoding='ascii', mode='w+')
f.write(c)
f.close()

Reading the file written in <b>ASCII</b> back with <b>ASCII</b> encoding gives the right text.

In [16]:
f_openWithASCII = codecs.open('test_ascii_1.txt', encoding='ascii', mode='r+')
s = f_openWithASCII.read()
print s
f_openWithASCII.close()

ab


Reading the file written in <b>ASCII</b> with <b>utf-8</b> encoding gives the right text.

In [23]:
f_openWithUTF8 = codecs.open('test_ascii_1.txt', encoding='utf-8', mode='r+')
s = f_openWithUTF8.read()
print s
f_openWithUTF8.close()

ab


Reading the file written in <b>ASCII</b> with <b>latin-1</b> encoding gives the right text.

In [27]:
out_file = codecs.open('test_ascii_1.txt', encoding='latin-1', mode='r+')
s = out_file.read()
print s
out_file.close()

ab


-----

Open a file with <b>utf-8</b> encoding and write the string <b>'ab'</b> to it.

In [30]:
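import sys

# Python 2 only: site.py removes sys.setdefaultencoding at startup, so
# reload(sys) is needed to restore it. The call changes the codec used for
# implicit str/unicode conversions from ascii to utf8 for the whole process.
reload(sys)
sys.setdefaultencoding('utf8')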

c='ab'
f = codecs.open('test_utf8_1.txt', encoding='utf-8', mode='w+')
f.write(c)
f.close()

Reading the file written in <b>utf-8</b> back with <b>utf-8</b> encoding gives the right text, 'ab'.

In [20]:
f_openWithUTF8 = codecs.open('test_utf8_1.txt', encoding='utf-8', mode='r+')
s = f_openWithUTF8.read()
print s
f_openWithUTF8.close()

ab


Reading the file written in <b>utf-8</b> back with <b>ASCII</b> encoding produces the right text.

In [19]:
f_openWithASCII = codecs.open('test_utf8_1.txt', encoding='ascii', mode='r+')
s = f_openWithASCII.read()
print s
f_openWithASCII.close()

ab


Reading the file written in <b>utf-8</b> with <b>latin-1</b> encoding produces the right text.

In [25]:
out_file = codecs.open('test_utf8_1.txt', encoding='latin-1', mode='r+')
s = out_file.read()
print s
out_file.close()

ab
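
All of the reads above succeed because the characters 0-127 map to the same single byte in ASCII, UTF-8, and latin-1. A short check of that claim, written as a Python 3 sketch rather than in the Python 2 used by the cells here (the variable names are illustrative):

```python
text = 'ab'

# Encode the same text with each of the three codecs.
encoded = {codec: text.encode(codec) for codec in ('ascii', 'utf-8', 'latin-1')}

# Print the bytes produced by each codec as hexadecimal.
for codec, data in sorted(encoded.items()):
    print(codec, data.hex())

# True when every codec produced exactly the same byte sequence.
all_equal = len(set(encoded.values())) == 1
print(all_equal)
```

Since the bytes on disk are identical, it does not matter which of the three encodings is used to read a file that contains only these characters.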


*******

### <b>Extended ASCII Characters (128-255)</b>

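Open a file with <b>ASCII</b> encoding and try to write the extended ASCII character <b>'â'</b> to it.
This fails because an extended ASCII character cannot be stored in an ASCII-encoded file.<br>The error says that 'ascii' can't decode the byte: in Python 2 the literal 'â' is a byte string, and it is first decoded with the default codec (ascii when this cell was run) before it can be written.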

In [13]:
c = 'â'
try:
    f = codecs.open('test_extendedASCII.txt', encoding='ascii', mode='w+')
    f.write(c)
    f.close()
except ValueError as error:
    print error

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
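
The error reports a decode failure at byte 0xc3, which is the first byte of the UTF-8 form of 'â'. The sketch below, in Python 3 syntax and with illustrative variable names, shows the two ways the ASCII codec can reject this character: while encoding Unicode text, and while decoding the raw UTF-8 bytes as the Python 2 cell above implicitly does.

```python
text = '\u00e2'  # 'â'

# Python 3: text is already Unicode, so the ASCII codec fails while encoding.
try:
    text.encode('ascii')
    encode_error = None
except UnicodeEncodeError as error:
    encode_error = error
print(type(encode_error).__name__)

# Mimic the Python 2 cell: start from the two UTF-8 bytes 0xC3 0xA2 and
# decode them as ASCII, which is the step that fails there.
raw = text.encode('utf-8')
try:
    raw.decode('ascii')
    decode_error = None
except UnicodeDecodeError as error:
    decode_error = error
print(decode_error)
```

Both exception types are subclasses of `ValueError`, which is why the `except ValueError` clause in the cell above catches the failure.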


Open a file with <b>utf-8</b> encoding and write the character <b>'â'</b> to it.

In [42]:
c = 'â'
f_ext_char = codecs.open('test_extendedUTF-8.txt', encoding='utf-8', mode='w+')
f_ext_char.write(c)
f_ext_char.close()

Read the <i>extended character</i> from the <b>utf-8</b> encoded file with <b>ASCII</b> encoding.
This fails because a file containing extended ASCII characters cannot be read with ASCII encoding.

In [14]:
try:
    f_read_ext_char = codecs.open('test_extendedUTF-8.txt', encoding='ascii', mode='r+')
    char_read = f_read_ext_char.read()
    print char_read
    f_read_ext_char.close()
except ValueError as error:
    print error

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)


Read the <i>extended character</i> from the <b>utf-8</b> encoded file with <b>utf-8</b> encoding.
This produces the right text.

In [18]:
f_read_ext_char = codecs.open('test_extendedUTF-8.txt', encoding='utf-8', mode='r+')
char_read = f_read_ext_char.read()
print char_read
f_read_ext_char.close()

â


Read the <i>extended character</i> from the <b>utf-8</b> encoded file with <b>latin-1</b> encoding. This returns the two characters 'Ã¢' instead of 'â', because latin-1 decodes each of the two UTF-8 bytes (0xC3 and 0xA2) as a separate character.

In [22]:
out_file = codecs.open('test_extendedUTF-8.txt', encoding='latin-1', mode='r+')
char_read = out_file.read()
print char_read
out_file.close()

Ã¢
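
The substitution of 'Ã¢' for 'â' follows directly from the bytes involved. Here is a minimal sketch in Python 3 syntax (not the Python 2 of the cells above; the variable names are illustrative) that reproduces the misread in memory and then reverses it:

```python
original = '\u00e2'                      # 'â'
utf8_bytes = original.encode('utf-8')    # the two bytes 0xC3 0xA2

# latin-1 maps every byte to the code point with the same number,
# so the two bytes come back as two characters.
misread = utf8_bytes.decode('latin-1')
print(len(misread), [hex(ord(ch)) for ch in misread])

# Because latin-1 accepts every byte value, the mistake can be undone by
# encoding back to latin-1 and decoding with the correct codec.
repaired = misread.encode('latin-1').decode('utf-8')
print(repaired == original)
```

The round trip only works because nothing was lost: latin-1 never raises on decoding, which is also why a wrong latin-1 read goes unnoticed instead of producing an error.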


-------

### <b>Unicode characters</b> 

Open a file with <b>ASCII</b> encoding and try to write a <i>Unicode character</i> such as <b>'Ђ'</b>, which is 'CYRILLIC CAPITAL LETTER DJE' (U+0402).
This fails because a Unicode character outside the ASCII range cannot be written to a file in ASCII encoding.

In [15]:
chr1 = 'Ђ'
try:
    f = codecs.open('test_unicode_to_ascii.txt', encoding='ascii', mode='w+')
    f.write(chr1)
    f.close()
except ValueError as error:
    print error
    

'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)


Open a file with <b>utf-8</b> encoding and write the character <b>'Ђ'</b> to it.

In [57]:
f = codecs.open('test_unicode2.txt', encoding='utf-8', mode='w+')
f.write(chr1)
f.close()

In [58]:
sys.getdefaultencoding()

'utf8'

Read the <i>Unicode</i> character from the <b>utf-8</b> encoded file with <b>utf-8</b> encoding. This produces the right result.

In [17]:
f_read_unicode_char = codecs.open('test_unicode2.txt', encoding='utf-8', mode='r+')
char_read = f_read_unicode_char.read()
print char_read
f_read_unicode_char.close()

Ђ


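Read the <i>Unicode</i> character from the <b>utf-8</b> encoded file with <b>latin-1</b> encoding. 'Ð' is displayed instead of 'Ђ': the UTF-8 form of 'Ђ' is the two bytes 0xD0 0x82, which latin-1 decodes as 'Ð' (U+00D0) followed by the non-printing control character U+0082.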

In [21]:
out_file = codecs.open('test_unicode2.txt', encoding='latin-1', mode='r+')
char_read = out_file.read()
print char_read
out_file.close()

Ð
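
Only 'Ð' is visible in the output, but the string that was read holds two characters. The following Python 3 sketch (the cells above are Python 2, and the variable names here are illustrative) lists every character that latin-1 produces from the UTF-8 bytes of 'Ђ':

```python
import unicodedata

original = '\u0402'                     # 'Ђ', CYRILLIC CAPITAL LETTER DJE
utf8_bytes = original.encode('utf-8')   # the two bytes 0xD0 0x82
misread = utf8_bytes.decode('latin-1')

for ch in misread:
    # Control characters have no Unicode name, so supply a default label.
    print('U+%04X' % ord(ch), unicodedata.name(ch, '<control>'))
```

The second character is a C1 control code, which most terminals and notebooks do not render, so the misread looks like a single character even though it is two.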
