read_csv python engine errors #10476

Closed
michaelaye opened this issue Jun 30, 2015 · 4 comments
Labels
Bug IO CSV read_csv, to_csv
@michaelaye
Contributor

The only thing I changed from my usually working reduction pipeline is to try engine="python". (I wanted to use nrows for a smaller test read, but that fails as well, so I thought maybe the Python engine is currently buggy.)

$ python reduction.py ~/data/planet4/2015-06-21_planet_four_classifications.csv
INFO:Starting reduction.
Traceback (most recent call last):
  File "reduction.py", line 258, in <module>
    args.test_n_rows, args.remove_duplicates)
  File "reduction.py", line 182, in main
    data = [chunk for chunk in reader]
  File "reduction.py", line 182, in <listcomp>
    data = [chunk for chunk in reader]
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 697, in __iter__
    yield self.read(self.chunksize)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 721, in read
    ret = self._engine.read(nrows)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 1556, in read
    content = self._get_lines(rows)
  File "/Users/klay6683/miniconda3/lib/python3.4/site-packages/pandas-0.16.2_58_g01995b2-py3.4-macosx-10.5-x86_64.egg/pandas/io/parsers.py", line 2007, in _get_lines
    for _ in range(rows):
TypeError: 'float' object cannot be interpreted as an integer

My function call is this:

# As chunksize and nrows cannot be used together yet, I switch chunksize
# to None if I want test_n_rows for a small test database:
if test_n_rows:
    chunks = None
else:
    chunks = 1e6
# Create a reader object with the pandas interface for CSV parsing.
# Doing this in chunks as it's faster. Also, later will do a split
# into multiple processes to do this.
reader = pd.read_csv(fname, chunksize=chunks, na_values=['null'],
                     usecols=analysis_cols, nrows=test_n_rows,
                     engine='c')

Using pandas-0.16.2_58_g01995b2-py3.4

@kawochen
Contributor

1e6 is float.
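
To spell this out: `1e6` is a float literal, and `range()` (which the traceback shows being called in `_get_lines`) only accepts integers. A minimal sketch in plain Python, no pandas required; passing `10**6` or `int(1e6)` as `chunksize` should sidestep the error:

```python
# 1e6 is a float literal, even though its value is a whole number.
chunks = 1e6
print(type(chunks))

# range() rejects floats, which is what the traceback shows.
try:
    range(chunks)
except TypeError as exc:
    print("TypeError:", exc)

# Workaround: use an integer instead.
chunks = 10**6  # or int(1e6)
rows = range(chunks)  # no error
```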

@michaelaye
Contributor Author

Changing the title, because apparently the same error happens in Py2.7:

$ python reduction.py ~/data/planet4/2015-06-21_planet_four_classifications.csv
INFO:Starting reduction.
Traceback (most recent call last):
  File "reduction.py", line 258, in <module>
    args.test_n_rows, args.remove_duplicates)
  File "reduction.py", line 182, in main
    data = [chunk for chunk in reader]
  File "/Users/klay6683/miniconda3/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py", line 686, in __iter__
    yield self.read(self.chunksize)
  File "/Users/klay6683/miniconda3/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py", line 710, in read
    ret = self._engine.read(nrows)
  File "/Users/klay6683/miniconda3/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py", line 1534, in read
    content = self._get_lines(rows)
  File "/Users/klay6683/miniconda3/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py", line 1985, in _get_lines
    for _ in range(rows):
TypeError: integer argument expected, got float
(py27)

@michaelaye michaelaye changed the title from "read_csv python engine errors with Python 3" to "read_csv python engine errors" Jun 30, 2015
@michaelaye
Contributor Author

> 1e6 is float.

Yes, but it works with the C engine. Shouldn't the same function call be transparent in terms of syntax validity towards both engines?

@jreback
Contributor

jreback commented Jul 1, 2015

ok, will mark as a bug. should be easy test/fix. want to do a PR?
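
One possible shape for such a fix is to normalize the value before it reaches either engine. This is only a sketch: `validate_integer` is a hypothetical helper name used for illustration, and nothing here is taken from the pandas source.

```python
def validate_integer(name, val):
    """Return val as an int, accepting integral floats such as 1e6.

    Hypothetical helper for illustration only; not pandas code.
    None is passed through so that "no chunking" still works.
    """
    if val is None:
        return val
    if isinstance(val, bool):
        # bool is a subclass of int, but is almost certainly a mistake here.
        raise ValueError("{} must be an integer, got {!r}".format(name, val))
    if isinstance(val, float):
        # is_integer() is False for nan and inf, so those are rejected too.
        if not val.is_integer():
            raise ValueError("{} must be an integer, got {!r}".format(name, val))
        return int(val)
    if not isinstance(val, int):
        raise ValueError("{} must be an integer, got {!r}".format(name, val))
    return val


chunksize = validate_integer("chunksize", 1e6)
```

With a check like this applied to `chunksize` and `nrows` up front, a call with `chunksize=1e6` would behave the same way under both engines, and a non-integral value such as `1.5` would fail early with a clear message.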

@jreback jreback added Bug IO CSV read_csv, to_csv labels Jul 1, 2015
@jreback jreback added this to the Next Major Release milestone Jul 1, 2015
@jreback jreback changed the title read_csv python engine errors read_csv python engine errors Jul 1, 2015
gfyoung added a commit to forking-repos/pandas that referenced this issue May 25, 2016
@jreback jreback modified the milestones: 0.18.2, Next Major Release May 25, 2016