# Dataframe and string encodings
There are a few options for handling decoding errors:

- `strict` (raise a UnicodeDecodeError exception)
- `replace` (add U+FFFD, ‘REPLACEMENT CHARACTER’), or
- `ignore` (just leave the character out of the Unicode result)

Examples (Python 2, where `unicode()` decodes a byte string with the ASCII codec by default):

    >>> unicode('\x80abc', errors='strict')     
    Traceback (most recent call last):
        ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
    ordinal not in range(128)
    >>> unicode('\x80abc', errors='replace')
    u'\ufffdabc'
    >>> unicode('\x80abc', errors='ignore')
    u'abc'
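
The session above is Python 2. A minimal Python 3 sketch of the same three handlers, assuming the input is the byte string `b'\x80abc'` and using `bytes.decode` in place of `unicode()`, might look like this (the helper name `try_strict` is only illustrative):

```python
raw = b'\x80abc'  # same bytes as in the Python 2 session


def try_strict(data):
    """Decode with errors='strict'; return the error reason if decoding fails."""
    try:
        return data.decode('ascii', errors='strict')
    except UnicodeDecodeError as exc:
        return 'UnicodeDecodeError: {}'.format(exc.reason)


strict_result = try_strict(raw)                    # strict raises, so we get the reason
replaced = raw.decode('ascii', errors='replace')   # bad byte becomes U+FFFD
ignored = raw.decode('ascii', errors='ignore')     # bad byte is dropped

print(strict_result)
print(repr(replaced))
print(repr(ignored))
```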


In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.DataFrame(
    {'a': ['foo1', 'bar1', 'baz1'],
     'b': ['foo2', 'bar2', 'baz2'],
     'c': ['foo3', 'bar3', 'baz33️⃣'],
     'd': ['foo4', 'bar5', 'baz5'],
     'e': ['foo4', 'bar5', 'baz5'],
     'f': [6, 6, 6]}
)

In [3]:
df

      a     b        c     d     e  f
0  foo1  foo2     foo3  foo4  foo4  6
1  bar1  bar2     bar3  bar5  bar5  6
2  baz1  baz2  baz33️⃣  baz5  baz5  6


In [4]:
df.dtypes

a    object
b    object
c    object
d    object
e    object
f     int64
dtype: object

In [5]:
def convert(s):
    # Encode to Latin-1 bytes, silently dropping characters Latin-1 cannot represent
    return s.encode('latin1', 'ignore')

In [6]:
df['c'].apply(convert)

0     b'foo3'
1     b'bar3'
2    b'baz33'
Name: c, dtype: object

In [7]:
# Overwrite column c with its Latin-1 bytes; 'replace' turns each unencodable character into '?'
df['c'] = df['c'].str.encode('latin1', 'replace')

In [8]:
df

      a     b           c     d     e  f
0  foo1  foo2     b'foo3'  foo4  foo4  6
1  bar1  bar2     b'bar3'  bar5  bar5  6
2  baz1  baz2  b'baz33??'  baz5  baz5  6
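
The two `?` in the last row come from the keycap emoji, which is made of several code points. As a rough sketch, assuming the emoji is the sequence `3`, U+FE0F, U+20E3 (which matches the two replaced characters), the standard encoding error handlers can be compared on a plain string without pandas:

```python
# Assumption: the keycap emoji is '3' + U+FE0F + U+20E3, written with escapes for clarity.
text = 'baz3' + '3\ufe0f\u20e3'

# Error handlers accepted by str.encode for characters the codec cannot represent.
handlers = ['ignore', 'replace', 'backslashreplace', 'xmlcharrefreplace']
results = {name: text.encode('latin1', name) for name in handlers}

for name, encoded in results.items():
    print('{:<18} {!r}'.format(name, encoded))
```

`ignore` and `replace` lose information, while `backslashreplace` and `xmlcharrefreplace` keep the code point values in an escaped form.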


In [9]:
print(df.to_string())

      a     b           c     d     e  f
0  foo1  foo2     b'foo3'  foo4  foo4  6
1  bar1  bar2     b'bar3'  bar5  bar5  6
2  baz1  baz2  b'baz33??'  baz5  baz5  6
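
Column `c` now holds `bytes` objects rather than strings. One way to get text back, sketched here on a standalone Series rather than the `df` above, is `Series.str.decode`; the variable names are illustrative, and the emoji is again assumed to be `3` + U+FE0F + U+20E3. The round trip is lossy wherever `replace` had to substitute a character:

```python
import pandas as pd

# Standalone copy of column c, with the keycap emoji written as escapes.
original = pd.Series(['foo3', 'bar3', 'baz33\ufe0f\u20e3'], name='c')

encoded = original.str.encode('latin1', 'replace')  # str -> bytes
decoded = encoded.str.decode('latin1')              # bytes -> str

decoded_values = decoded.tolist()
lossless = (decoded == original).tolist()  # which rows survived the round trip

print(decoded_values)
print(lossless)
```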
