# How to handle Unicode in Python 2?
Quoted from Philip Guo's [blog](http://www.pgbovine.net/unicode-python.htm):

>In Python 2, if you see a `str` object, convert it to a unicode object right away by calling `.decode('utf-8')`. Process all strings as `unicode` objects, not `str` objects. If you need to write a `unicode` object out to a file or database, first call `.encode('utf-8')` on it.

* This is `str` text. It already holds UTF-8 encoded bytes, so it can be written to a file as is

In [1]:
str_text = 'Friedrichstra\xc3\x9fe'  
print type(str_text)
with open("file1.txt", 'w') as file:
    file.write(str_text)
    # Expect a file with Friedrichstraße

<type 'str'>


* This is `unicode` text. It needs to be __encoded__ before it is written to a file

In [2]:
unicode_text = u'Friedrichstra\xdfe' 
print type(unicode_text)
with open("file2.txt", 'w') as file:
    file.write(unicode_text.encode("utf-8"))
    # Expect a file with Friedrichstraße

<type 'unicode'>


* Read a file that contains non-ASCII characters

In [3]:
with open("file2.txt", 'r') as file:
    for line in file:
        print line

Friedrichstraße
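
The loop above yields `str` lines, that is, the raw bytes from disk. To follow the "decode right away" advice, one option is `io.open` (available since Python 2.6), which decodes on read and encodes on write. The following is a minimal sketch, not the notebook's own method; the scratch file path is a made-up example, and the literals use escapes and the `b`/`u` prefixes so the snippet runs on Python 2.7 and Python 3 alike.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "street.txt")

text = u'Friedrichstra\xdfe'  # unicode in Python 2

# In text mode, io.open encodes unicode to UTF-8 when writing ...
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# ... and decodes UTF-8 back to unicode when reading.
with io.open(path, 'r', encoding='utf-8') as f:
    lines = [line for line in f]

# Binary mode shows the raw UTF-8 bytes that are actually on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

print(len(lines[0]))  # 15 characters
print(len(raw))       # 16 bytes, because the sharp s takes two bytes in UTF-8
```

With this approach every line is already `unicode` when it reaches the rest of the program, so no explicit `.decode('utf-8')` call is needed after reading.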


# How to convert from str to unicode? 
### Apply the conversions only in these directions:
* str     ====(decode)====> unicode
* unicode ====(encode)====> str

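The opposite calls raise an error when the text contains non-ASCII characters, because Python 2 first attempts an implicit ASCII conversion:
* `str_text.encode("utf-8")` raises `UnicodeDecodeError`
* `unicode_text.decode("utf-8")` raises `UnicodeEncodeError`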
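
The two calls listed above fail because Python 2 inserts a hidden conversion with its default `ascii` codec before doing what was asked. The sketch below spells that hidden step out explicitly, as an approximation of what the interpreter does rather than its actual implementation. Because the step is written out by hand, the snippet also runs on Python 3; the variable names holding the error names are illustrative only.

```python
str_text = b'Friedrichstra\xc3\x9fe'   # UTF-8 bytes (a plain str in Python 2)
unicode_text = u'Friedrichstra\xdfe'   # text (unicode in Python 2)

# Roughly what Python 2 does for str_text.encode('utf-8'):
# decode with ASCII first, then encode. The decode step fails.
try:
    str_text.decode('ascii').encode('utf-8')
    error_from_str_encode = None
except UnicodeDecodeError as exc:
    error_from_str_encode = type(exc).__name__

# Roughly what Python 2 does for unicode_text.decode('utf-8'):
# encode with ASCII first, then decode. The encode step fails.
try:
    unicode_text.encode('ascii').decode('utf-8')
    error_from_unicode_decode = None
except UnicodeEncodeError as exc:
    error_from_unicode_decode = type(exc).__name__

print(error_from_str_encode)      # UnicodeDecodeError
print(error_from_unicode_decode)  # UnicodeEncodeError
```

This is why calling `.encode()` on a `str` can surprisingly report a *decode* error, and the other way around.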

In [4]:
str_text.decode("utf-8") == unicode_text

True

In [5]:
str_text == unicode_text.encode("utf-8")

True

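* A membership check is also a form of equality comparison. When a `unicode` object is compared with a non-ASCII `str`, Python 2 tries to decode the `str` as ASCII, fails, emits a `UnicodeWarning`, and treats the two as unequal, so the result is False.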

In [6]:
unicode_text in [str_text]

False

* Good practice: convert both sides to `unicode` before comparing

In [7]:
unicode_text in [str_text.decode('utf-8')]

True
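
When a list may hold a mix of `str` and `unicode` values, the conversion can be wrapped in a small helper and applied to every item before comparing. The sketch below assumes all byte strings are UTF-8 encoded; `to_unicode` is a hypothetical helper name, not a built-in. It relies on `bytes` being an alias of `str` in Python 2, so it also runs on Python 3.

```python
def to_unicode(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    # bytes is an alias of str in Python 2, so this catches plain str there.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


str_text = b'Friedrichstra\xc3\x9fe'   # UTF-8 bytes
unicode_text = u'Friedrichstra\xdfe'   # text

# Normalize every candidate once, then compare text with text only.
candidates = [to_unicode(item) for item in [str_text, b'abc']]
found = unicode_text in candidates
print(found)  # True
```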

### The reverse direction works only if the text is pure ASCII (no characters such as ß)

In [8]:
"abc".encode('utf-8')  # Works for ASCII, but str_text.encode('utf-8') would raise an error because of the non-ASCII ß

'abc'

In [9]:
"abc".decode('utf-8')  # Works for ASCII, but unicode_text.decode('utf-8') would raise an error because of the non-ASCII ß

u'abc'

### With a mixture of both types

In [10]:
unicode_text in [unicode_text]

True

# Write unicode to a CSV file
[StackOverflow](https://stackoverflow.com/questions/17245415/read-and-write-csv-files-including-unicode-with-python-2-7)

In [11]:
import csv

tests={'German': [u'Straße',u'auslösen',u'zerstören'], 
       'French': [u'français',u'américaine',u'épais'], 
       'Chinese': [u'中國的',u'英語',u'美國人']}

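# Python 2's csv module works on bytes, so open the file in binary mode
with open('utf.csv','wb') as fout:
    writer=csv.writer(fout)
    writer.writerow(tests.keys())  # header row
    for row in zip(*tests.values()):
        # Encode each unicode cell to UTF-8 bytes before writing
        row=[s.encode('utf-8') for s in row]
        writer.writerow(row)

with open('utf.csv','rb') as fin:
    reader=csv.reader(fin)
    for row in reader:
        temp=list(row)
        fmt=u'{:<15}'*len(temp)
        # Decode each cell back to unicode before formatting
        print fmt.format(*[s.decode('utf-8') for s in temp])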

German         Chinese        French         
Straße         中國的            français       
auslösen       英語             américaine     
zerstören      美國人            épais          
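
The per-row encode and decode steps in the cell above could be factored into two small helpers so that the rest of the program only ever handles `unicode`. This is a sketch under the assumption that every cell is text and the file is UTF-8; `encode_row` and `decode_row` are hypothetical names, not part of the `csv` module.

```python
def encode_row(row, encoding='utf-8'):
    """Encode every text cell to bytes, ready for Python 2's csv.writer."""
    return [cell.encode(encoding) for cell in row]


def decode_row(row, encoding='utf-8'):
    """Decode every byte-string cell back to text after csv.reader."""
    return [cell.decode(encoding) for cell in row]


# Escapes stand for: Strasse with sharp s, francais with cedilla, and the
# first Chinese entry of the table above.
row = [u'Stra\xdfe', u'fran\xe7ais', u'\u4e2d\u570b\u7684']

# Encoding and then decoding gives back the original unicode cells.
round_tripped = decode_row(encode_row(row))
print(round_tripped == row)  # True
```

In the Python 2 cell above, the writer loop would then call `writer.writerow(encode_row(row))` and the reader loop would call `decode_row(row)`.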


# Using unicodecsv

In [12]:
import unicodecsv as csv

filename = "unicode.csv"

# Write to file (unicodecsv reads and writes bytes, so use binary mode)
with open(filename, 'wb') as f:
    w = csv.writer(f, encoding='utf-8')
    w.writerow([u'é', u'ñ', 'a'])
    
# Read from file
with open(filename, 'rb') as f:
    r = csv.reader(f, encoding='utf-8')
    print next(r)


[u'\xe9', u'\xf1', u'a']


# More in depth about Unicode
Unicode HOWTO: [Python Official Guide](https://docs.python.org/2/howto/unicode.html)
* Treatment of special conversion functions such as `str`, `unicode`, `chr`