read_csv incompatible with newstr and future #14477

Closed
larssono opened this Issue Oct 23, 2016 · 7 comments

larssono commented Oct 23, 2016

After upgrading to pandas 0.19, several tests are failing in a package I maintain. The package uses several imports from future to work with both py2 and py3. The issue seems to be with using from __future__ import unicode_literals.

A small, complete example of the issue

import pandas as pd
pd.read_csv('simple.txt', quotechar='"')
from __future__ import unicode_literals
pd.read_csv('simple.txt', quotechar='"')

The first read works; the second does not and throws the stack trace attached below ("TypeError: "quotechar" must be string, not unicode").
The example file
simple.txt
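A possible workaround on affected versions, sketched under assumptions: coerce the argument to the interpreter's native str type before passing it to read_csv. The helper name to_native_str is hypothetical and is not part of pandas or future; the ASCII default encoding is also an assumption that only suits single-character arguments like the quote character.

```python
import sys


def to_native_str(value, encoding="ascii"):
    """Return `value` as the interpreter's native `str` type.

    Hypothetical helper: on Python 2 a unicode value is encoded to a byte
    string, on Python 3 a bytes value is decoded to text. A value that is
    already a native str is returned unchanged.
    """
    if isinstance(value, str):
        return value
    if sys.version_info[0] == 2:
        return value.encode(encoding)  # unicode -> str (bytes) on Python 2
    return value.decode(encoding)      # bytes -> str (text) on Python 3


# With unicode_literals active on Python 2, '"' is a unicode object;
# u'"' stands in for that here.
quotechar = to_native_str(u'"')
# The result would then be passed as: pd.read_csv('simple.txt', quotechar=quotechar)
```

This only sidesteps the type check at the call site; it does not change how pandas validates the argument.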

Expected Output

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 26.0.0
Cython: None
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

TypeError                                 Traceback (most recent call last)
<ipython-input-2-6e275a5a7598> in <module>()
      1 from __future__ import unicode_literals
----> 2 pd.read_csv('/Users/lom/simple.csv', quotechar='"')

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    386 
    387     # Create the parser.
--> 388     parser = TextFileReader(filepath_or_buffer, **kwds)
    389 
    390     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    727             self.options['has_index_names'] = kwds['has_index_names']
    728 
--> 729         self._make_engine(self.engine)
    730 
    731     def close(self):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    920     def _make_engine(self, engine='c'):
    921         if engine == 'c':
--> 922             self._engine = CParserWrapper(self.f, **self.options)
    923         else:
    924             if engine == 'python':

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1387         kwds['allow_leading_cols'] = self.index_col is not False
   1388 
-> 1389         self._reader = _parser.TextReader(src, **kwds)
   1390 
   1391         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4411)()

pandas/parser.pyx in pandas.parser.TextReader._set_quoting (pandas/parser.c:6535)()

TypeError: "quotechar" must be string, not unicode

Member

jorisvandenbossche commented Oct 24, 2016

@larssono Thanks for the report!

cc @gfyoung

Member

gfyoung commented Oct 24, 2016

@jorisvandenbossche : Might it be best to just add a unicode class to pandas.compat? I think that should patch this issue IINM i.e.

try:
    unicode
except NameError:
    unicode = str

Member

gfyoung commented Oct 24, 2016

FYI, for future reference, here's a slightly easier way to reproduce (Note: Python 2.x required):

>>> from pandas import read_csv
>>> from pandas.compat import StringIO, u
>>>
>>> data = 'a\n1'
>>> read_csv(StringIO(data), quotechar=u('"'))
...
TypeError: "quotechar" must be string, not unicode

Contributor

jreback commented Oct 24, 2016

@gfyoung unicode needs to be very explicit

Member

gfyoung commented Oct 24, 2016

@jreback : Right...but what do you think of the patch I proposed above, and we can then add the class to the allowed string types in parser.pyx?

Contributor

jreback commented Oct 24, 2016

well it's not explicit
so -1

Member

gfyoung commented Oct 24, 2016

In pandas.compat:

try:
    unicode
except NameError:
    unicode = str
...

In parser.pyx:

if not isinstance(quote_char, (str, bytes, compat.unicode)) and quote_char is not None:
...
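The two pieces above can be sketched together in plain Python. This is an illustration only: the real check lives in Cython in parser.pyx, the function name validate_quotechar is hypothetical, and the error message format is assumed from the traceback in the original report.

```python
# Compat shim: `unicode` exists on Python 2 only.
try:
    unicode
except NameError:
    unicode = str  # Python 3: str is already the text type


def validate_quotechar(quote_char):
    """Hypothetical stand-in for the quotechar type check in parser.pyx."""
    if quote_char is not None and not isinstance(quote_char, (str, bytes, unicode)):
        raise TypeError('"quotechar" must be string, not {0}'.format(
            type(quote_char).__name__))
    return quote_char


# Native, byte, and unicode strings are all accepted, as is None.
for candidate in ('"', b'"', u'"', None):
    validate_quotechar(candidate)
```

On Python 3 the shim makes unicode an alias of str, so the extra entry in the isinstance tuple is redundant there and only matters on Python 2.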

gfyoung added a commit to gfyoung/pandas that referenced this issue Oct 25, 2016

gfyoung added a commit to gfyoung/pandas that referenced this issue Oct 26, 2016

@jreback jreback closed this in 6130e77 Oct 26, 2016

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Nov 2, 2016

[Backport #14492] BUG: Accept unicode quotechars again in pd.read_csv
Title is self-explanatory.  Affects Python 2.x only.  Closes #14477.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14492 from gfyoung/quotechar-unicode-2.x and squashes the following commits:

ec9f59a [gfyoung] BUG: Accept unicode quotechars again in pd.read_csv

(cherry picked from commit 6130e77)

amolkahat added a commit to amolkahat/pandas that referenced this issue Nov 26, 2016

BUG: Accept unicode quotechars again in pd.read_csv
Title is self-explanatory.  Affects Python 2.x only.  Closes #14477.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14492 from gfyoung/quotechar-unicode-2.x and squashes the following commits:

ec9f59a [gfyoung] BUG: Accept unicode quotechars again in pd.read_csv