"TypeError: 'set' object does not support indexing" using na_values in read_csv() #11374

Closed
goyodiaz opened this Issue Oct 19, 2015 · 11 comments

goyodiaz commented Oct 19, 2015

Test case:

user@host:~$ python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import StringIO
>>> import pandas as pd
>>> src = """first, second
... 0,0.1
... 1,1.1
... """
>>> df = pd.read_csv(StringIO(src), na_values='XX')
>>> print(df)
   first   second
0      0      0.1
1      1      1.1
>>> df = pd.read_csv(StringIO(src), na_values='-999.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 278, in _read
    return parser.read()
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/goyo/.local/lib/python3.4/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
  File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
  File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
  File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing
>>> pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: es_ES.UTF-8

pandas: 0.17.0
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
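The last frame of the traceback is in `kset_float64_from_list`, and the message is what Python raises when code indexes a `set` by position. Below is a minimal sketch of that failure class. It assumes the helper was written for a list and received a set, which is what the error message implies (the function's source is not shown in this thread). `float_set_from_list_buggy` and `float_set_from_list_fixed` are hypothetical names.

One plausible reading, and it is only an assumption, is that only NA strings which parse as floats reach this float-specific path. That would explain why `na_values='XX'` works while `'-999.99'` fails. Newer Python versions word the same error as "'set' object is not subscriptable".

```python
def float_set_from_list_buggy(values):
    # Positional indexing: works for lists and tuples, raises TypeError for sets.
    result = set()
    for i in range(len(values)):
        result.add(float(values[i]))
    return result


def float_set_from_list_fixed(values):
    # Iterating directly works for any iterable, including sets.
    return {float(v) for v in values}


na = {"-999.99"}  # assumption: na_values after being normalized to a set

try:
    float_set_from_list_buggy(na)
except TypeError as exc:
    print("buggy:", exc)

print("fixed:", float_set_from_list_fixed(na))
```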
jreback Oct 19, 2015

Contributor

You must be picking up another version of pandas somehow. The error you are seeing is, IIRC, from a somewhat older version of pandas.

This works just fine on linux with 3.4 (mac is below).
I know this is also tested.

Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: pd.__version__ 
Out[1]: '0.17.0'

In [2]: src = 'first, second\n0,0.1\n1,1.1'

In [3]: from io import StringIO

In [4]: pd.read_csv(StringIO(src), na_values='-999.99')
Out[4]: 
   first   second
0      0      0.1
1      1      1.1
jreback Oct 19, 2015

Contributor

show pd.__version__.

it looks like you are running print_versions directly, which is another indication that you are actually using an older version (BUT `print_versions` actually looks at your environment, NOT at where it is called from)
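Because `print_versions` reports on the environment rather than on the copy of pandas that was actually imported, a more direct check is to ask the import system which file it would load. Here is a minimal sketch using the standard library's `importlib.util.find_spec`. `module_origin` is a hypothetical helper name. After `import pandas`, its result should match `pandas.__file__`.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file Python would import top-level module `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a package this is the path of its __init__.py.
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("pandas from:", module_origin("pandas"))
```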

jreback closed this Oct 19, 2015

vlasisva Oct 27, 2015

I see exactly the same error:

python b.py

0.17.0
Traceback (most recent call last):
  File "b.py", line 8, in <module>
    df = pd.read_csv(StringIO(src), na_values='-999.99')
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 278, in _read
    return parser.read()
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/vlasisva/Software/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8338)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1035, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:11744)
  File "pandas/parser.pyx", line 1085, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:12634)
  File "pandas/parser.pyx", line 1499, in pandas.parser._try_double (pandas/parser.c:19996)
  File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing


cat b.py
from StringIO import StringIO
import pandas as pd
src = """first, second
0,0.1
1,1.1
"""
print pd.__version__
df = pd.read_csv(StringIO(src), na_values='-999.99')


lsb_release --all
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

python
Python 2.7.10 |Anaconda 2.1.0 (64-bit)| (default, May 28 2015, 17:02:03)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org


goyodiaz Oct 27, 2015

Contributor

pd.__version__ is also 0.17.0 here.

More facts:

  • It happened in both python2.7 and python3.4 after pip-upgrading pandas to 0.17.0 and ubuntu to 15.10 (but it looks like @vlasisva is using ubuntu 14.04 and anaconda)
  • Using engine='python' makes the test pass.
  • Everything else seems to be working as expected in pandas and python.
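Here is the workaround from the second bullet applied to the original test case. It is a minimal sketch that only selects the pure-Python parser. Per this report, that avoids the compiled code path that raised the TypeError in 0.17.0. The Python engine is slower than the default C engine on large files, so this is a stopgap rather than a fix.

```python
from io import StringIO

import pandas as pd

src = """first, second
0,0.1
1,1.1
"""

# engine='python' skips the C parser (pandas/parser.pyx) used by the default engine.
df = pd.read_csv(StringIO(src), na_values="-999.99", engine="python")
print(df)
```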

In order to clean my python environment as much as possible, I uninstalled every non-distro package/version and every distro package not installed by default, except dependencies of other software I use: python2.7 numpy, python2.7 gdal bindings, gnome stuff... I even uninstalled pip (packaged python3 pip is almost useless in wily anyway).

I also did my best to ensure there was nothing python-related in ~/.local/bin, ~/.local/lib, /usr/local/bin and /usr/local/lib. I also made sure there was nothing called pandas on any mounted file system. I then used get-pip.py to install pip2 and pip3 and installed pandas for python2 and python3. The issue is still present.

While this is not critical to me (it just broke one test for a function I never use in that way), I would really like to understand what's going on, but I do not know where to look.

jreback Oct 28, 2015

Contributor

so the error line:

File "pandas/parser.pyx", line 1818, in pandas.parser.kset_float64_from_list (pandas/parser.c:22852)
TypeError: 'set' object does not support indexing

tells me that you are using some kind of development version of pandas (somewhere). This function does NOT exist in master or 0.17.0.

pls make sure that you are not in a development directory when trying to import pandas.

It's not clear what you actually have installed, so pls create a new virtual env or use conda.
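The fresh virtual environment suggested above can also be created from Python with the standard library's `venv` module. Below is a minimal sketch. `make_clean_env` and the directory name in the usage note are placeholders. By default a venv excludes system and user (~/.local) site-packages, which is the isolation wanted here.

```python
import sys
import venv
from pathlib import Path


def make_clean_env(env_dir, with_pip=True):
    """Create (or recreate) an isolated venv and return the path to its python."""
    env_dir = Path(env_dir)
    # system_site_packages defaults to False, so distro and ~/.local packages are not visible.
    venv.create(env_dir, with_pip=with_pip, clear=True)
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bindir / exe
```

Usage: call `make_clean_env("clean-env")` from outside any pandas source checkout. Then run `clean-env/bin/python -m pip install pandas` and repeat the test case with that interpreter.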

vlasisva Oct 28, 2015

I installed pandas via pip.
Either our environment is contaminated somehow, or what pip brings is not what you/we expect?

Will check and get back to you.
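One way to check for a contaminated environment is to list every directory on the import path that provides a given name. Python uses the first hit. An entry for the current directory (an empty string at the front of `sys.path` in an interactive session) would point at a development checkout. Below is a minimal sketch. `find_all_copies` is a hypothetical helper, and it only recognizes regular packages, plain modules, and compiled extensions, not zip or namespace imports.

```python
import os
import sys
from importlib.machinery import all_suffixes


def find_all_copies(name, search_path=None):
    """Return every location on `search_path` (default: sys.path) that provides `name`."""
    paths = sys.path if search_path is None else search_path
    hits = []
    for entry in paths:
        base = entry or os.getcwd()  # '' on sys.path means the current directory
        if os.path.isfile(os.path.join(base, name, "__init__.py")):
            hits.append(os.path.join(base, name))
            continue
        # .py, .pyc and extension-module suffixes such as .so / .pyd
        for suffix in all_suffixes():
            candidate = os.path.join(base, name + suffix)
            if os.path.isfile(candidate):
                hits.append(candidate)
                break
    return hits


if __name__ == "__main__":
    for path in find_all_copies("pandas"):
        print(path)
```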

vlasisva Oct 28, 2015

My "pip install pandas==0.17.0" downloads
https://pypi.python.org/packages/source/p/pandas/pandas-0.17.0.tar.gz#md5=55d34c4d5655c94ca30a59dea6b36316

which contains file pandas/parser.c,
which contains the following in line 1554:

static kh_float64_t *__pyx_f_6pandas_6parser_kset_float64_from_list(PyObject *); /*proto*/
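The check above can be automated against a downloaded sdist by opening the tarball and searching the generated C file for a symbol that should not be there. Below is a minimal sketch using the standard library's `tarfile`. `tarball_contains` is a hypothetical helper. The archive name and member path in the usage comment mirror the ones quoted above, and the comment assumes the file has already been downloaded.

```python
import tarfile


def tarball_contains(archive_path, member_suffix, needle):
    """Return True if any regular file whose name ends with `member_suffix` contains `needle` (bytes)."""
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" auto-detects gzip/bz2/xz
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                data = tar.extractfile(member).read()
                if needle in data:
                    return True
    return False


# Example, assuming the sdist is in the current directory:
# tarball_contains("pandas-0.17.0.tar.gz", "/pandas/parser.c", b"kset_float64_from_list")
```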

jreback Oct 28, 2015

Contributor

ok, it appears that when I distributed this it didn't rebuild the .c files (and had a newer version I was testing out). very odd.

so will fix for 0.17.1 (i.e. will make a clean version). you can simply regenerate the .c files (you need cython installed).

e.g.

make clean
python setup.py install
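Regenerating means running Cython on the current .pyx sources instead of reusing the shipped .c files, which is why Cython is needed. Below is a rough, hypothetical stand-in for the clean step. It is not what the pandas Makefile actually runs. `remove_generated_c` deletes every .c file that sits next to a .pyx of the same name, so a subsequent build has nothing stale to fall back on. It assumes such .c files are always Cython output, so check before running it on a tree with hand-written C of the same name.

```python
from pathlib import Path


def remove_generated_c(root):
    """Delete each X.c that has a sibling X.pyx under `root`; return the removed paths."""
    removed = []
    for pyx in Path(root).rglob("*.pyx"):
        c_file = pyx.with_suffix(".c")
        if c_file.is_file():
            c_file.unlink()
            removed.append(c_file)
    return sorted(removed)
```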
goyodiaz Oct 28, 2015

Contributor

Thanks, Jeff. That worked.

vlasisva Oct 29, 2015

Other than this bug, would you consider pip-obtained pandas 0.17.0 as safe to use?

jreback Oct 29, 2015

Contributor

yep, as I said, the .c file for the parser came from a PR which is now merged
