OverflowError in read_csv when specifying certain na_values #17128

Closed
YS-L opened this issue Jul 31, 2017 · 2 comments · Fixed by #22169
Labels
Bug IO CSV read_csv, to_csv

Comments

@YS-L

YS-L commented Jul 31, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
from io import StringIO  # originally: from pandas.compat import StringIO (Python 2.7)
data = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")
na_values = ['-inf']
index_col = 0
df = pd.read_csv(data, na_values=na_values, index_col=index_col)

Problem description

read_csv() fails with the following traceback when specifying certain na_values with index_col:

Traceback (most recent call last):
  File "run.py", line 9, in <module>
    df = pd.read_csv(data, na_values=na_values, index_col=index_col)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 660, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 416, in _read
    data = parser.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1010, in read
    ret = self._engine.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1837, in read
    index, names = self._make_index(data, alldata, names)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1347, in _make_index
    index = self._agg_index(index)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1440, in _agg_index
    arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1524, in _infer_types
    mask = algorithms.isin(values, list(na_values))
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 408, in isin
    values, _, _ = _ensure_data(values, dtype=dtype)
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 74, in _ensure_data
    return _ensure_int64(values), 'int64', 'int64'
  File "pandas/_libs/algos_common_helper.pxi", line 3227, in pandas._libs.algos.ensure_int64
  File "pandas/_libs/algos_common_helper.pxi", line 3232, in pandas._libs.algos.ensure_int64
OverflowError: cannot convert float infinity to integer
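The last frames show _infer_types passing the NA values to algorithms.isin for the integer index column, which routes them through ensure_int64. The final message is the same one Python raises for int() on an infinite float. The sketch below isolates only that step. It assumes, from the col_na_fvalues name in the traceback, that the string '-inf' is also kept as a float, and it uses plain int() as a stand-in for the int64 coercion; it does not reproduce the pandas internals.

```python
# Assumption: col_na_fvalues holds float versions of the na_values strings,
# so the string '-inf' also shows up as float('-inf').
na_values = ['-inf']
na_fvalues = {float(v) for v in na_values}

# The index column (1, 4, 7) is int64, so the NA sentinels are coerced to
# integers before the membership test. int() stands in for that coercion.
error = None
try:
    for v in na_fvalues:
        int(v)
except OverflowError as exc:
    error = str(exc)

print(error)  # cannot convert float infinity to integer
```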

Any of the following makes the error go away:

  • The index column actually contains the specified NA value
  • Using na_values of ['inf'] instead of ['-inf']
  • Not specifying index_col
  • Downgrading to pandas 0.19 or older
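Building on the third item above (not specifying index_col), one possible stopgap on affected versions is to parse column a as an ordinary column and move it into the index afterwards. This is a sketch of a workaround, not a fix; it assumes index_col=0 in the report refers to column a.

```python
from io import StringIO

import pandas as pd

data = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")

# Parse without index_col so the NA values are not checked against an
# integer index, then promote column "a" to the index afterwards.
df = pd.read_csv(data, na_values=['-inf'])
df = df.set_index('a')

print(df)
```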

Expected Output

read_csv should parse the data without raising an error.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0.dev+316.gf2b0bdc9b
pytest: None
pip: 9.0.1
setuptools: 36.2.5
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

YS-L changed the title from "OverflowError in read_csv when certain na_values" to "OverflowError in read_csv when specifying certain na_values" Jul 31, 2017
@gfyoung
Member

gfyoung commented Jul 31, 2017

@YS-L : Thanks for the report!

I'm not sure I follow you here: if upgrading makes the error go away, why are you filing this issue? Closing given your explanation.

gfyoung added the IO CSV read_csv, to_csv label Jul 31, 2017
gfyoung added this to the No action milestone Jul 31, 2017
gfyoung closed this as completed Jul 31, 2017
gfyoung removed this from the No action milestone Jul 31, 2017
@gfyoung
Member

gfyoung commented Jul 31, 2017

It seems I was misled by your comment. This issue is in fact reproducible on master, which I now see is what you were using. Sorry about that! Reopening.
