DictReader doesn't work with io.StringIO (Python 2.7) #42

arielpontes · 2015-01-26T10:10:49Z

As described in this SO question, I am getting the following error with unicodecsv.DictReader:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 2: ordinal not in range(128)

Here's a simplified version of my code:

from io import StringIO
from unicodecsv import DictReader, Dialect, QUOTE_MINIMAL

data = (
    'first_name,last_name,email\r'
    'Elmer,Fudd,elmer@looneytunes.com\r'
    'Jo\xc3\xa3o Ant\xc3\xb4nio,Ara\xc3\xbajo,joaoantonio@araujo.com\r'
)

unicode_data = StringIO(unicode(data, 'utf-8-sig'), newline=None)

class CustomDialect(Dialect):
    delimiter = ','
    doublequote = True
    escapechar = '\\'
    lineterminator = '\r\n'
    quotechar = '"'
    quoting = QUOTE_MINIMAL
    skipinitialspace = True

rows = DictReader(unicode_data, dialect=CustomDialect)

for row in rows:
    print row

If I replace StringIO with BytesIO, the encoding works, but I can no longer pass the newline argument, and then I get:

Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?


jdunck · 2015-01-26T20:20:06Z

It may not be clear from the docs, but the input to a unicodecsv reader is expected to be bytes (str in Python 2), not unicode, so you should be using BytesIO rather than StringIO.
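As a minimal sketch of that point: if the data is already a unicode string, it can be encoded to UTF-8 bytes before being wrapped in BytesIO. The variable names and sample text here are only illustrative and are not part of unicodecsv.

```python
from io import BytesIO

# Illustrative unicode text; in Python 2 this is a `unicode` object.
text = u'first_name,last_name\nJo\xe3o Ant\xf4nio,Ara\xfajo\n'

# Encode to UTF-8 so the stream yields bytes (`str` in Python 2),
# which is the kind of input a unicodecsv reader expects.
encoded = text.encode('utf-8')
byte_stream = BytesIO(encoded)
```

The resulting `byte_stream` can then be passed to the reader in place of a StringIO object.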

Testing with BytesIO rather than StringIO, I do see the "new-line character seen in unquoted field" error. I think this is a bug in the underlying csv module. From https://docs.python.org/2/library/csv.html#csv.Dialect.lineterminator:

"The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future."

Using \r (as in your original data), I can also reproduce this with a normal file instead of io.BytesIO or io.StringIO, so the problem is not specific to the io classes. And of course unicodecsv does generally work with files.

When I change your data to use \n rather than \r, then the code works:

from io import StringIO, BytesIO
from unicodecsv import DictReader, Dialect, QUOTE_MINIMAL

data = (
    'first_name,last_name,email\n'
    'Elmer,Fudd,elmer@looneytunes.com\n'
    'Jo\xc3\xa3o Ant\xc3\xb4nio,Ara\xc3\xbajo,joaoantonio@araujo.com\n'
)

unicode_data = StringIO(unicode(data, 'utf-8-sig'), newline=None)
str_data = BytesIO(data)

class CustomDialect(Dialect):
    delimiter = ','
    doublequote = True
    escapechar = '\\'
    lineterminator = '\r'
    quotechar = '"'
    quoting = QUOTE_MINIMAL
    skipinitialspace = True

rows = DictReader(str_data, dialect=CustomDialect)

for row in rows:
    print row

Unfortunately I don't see a way to fix this from within unicodecsv.
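One possible workaround on the caller's side, sketched here as an assumption rather than anything unicodecsv provides: normalize the line endings in the raw bytes before wrapping them in BytesIO. The helper name `normalize_newlines` is hypothetical. Note that this also rewrites any '\r' that appears inside a quoted field, so it only suits data where that is acceptable.

```python
from io import BytesIO


def normalize_newlines(raw):
    """Convert '\\r\\n' and bare '\\r' line endings in a byte string to '\\n'."""
    # Replace '\r\n' first so it does not turn into two newlines.
    return raw.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


data = (
    b'first_name,last_name,email\r'
    b'Elmer,Fudd,elmer@looneytunes.com\r'
    b'Jo\xc3\xa3o Ant\xc3\xb4nio,Ara\xc3\xbajo,joaoantonio@araujo.com\r'
)

# The result is still bytes, so it can be handed to a unicodecsv reader.
str_data = BytesIO(normalize_newlines(data))
```

With the '\r' terminators converted to '\n', `str_data` has the same shape as the data that works in the example above.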

jdunck closed this as completed Jan 26, 2015

tsroten mentioned this issue Feb 2, 2015

Unicode input raises UnicodeEncodeError exception #34

Closed
