Regression: read_csv call with skiprows fails on pandas versions 0.15.2, 0.16.0, works on 0.15.0 and 0.15.1 #9832

Closed
lexual opened this issue Apr 8, 2015 · 5 comments
Labels
Bug IO CSV read_csv, to_csv

@lexual
Contributor

lexual commented Apr 8, 2015

# passes for 0.15.0
# passes for 0.15.1
# fails for 0.15.2
# fails for 0.16.0
# This is for a "csv" file where there are a number of initial rows to be skipped at file start.
import pandas as pd
import urllib

test_data_url = 'http://www.bom.gov.au/fwo/IDV60901/IDV60901.95936.axf'
test_file = 'test.csv'

ROWS_TO_SKIP_AT_THE_START = 19


def main():
    urllib.urlretrieve(test_data_url, test_file)
    print('pandas version: {}'.format(pd.__version__))
    data = pd.read_csv(test_file, skiprows=ROWS_TO_SKIP_AT_THE_START)
    assert len(data) == 145
    assert 'sort_order' in data
    print('file successfully read by read_csv()')



if __name__ == '__main__':
    main()
@lexual
Contributor Author

lexual commented Apr 8, 2015

Version 0.16.0 gives this error:

Traceback (most recent call last):
  File "test.py", line 25, in <module>
    main()
  File "test.py", line 17, in main
    data = pd.read_csv(test_file, skiprows=ROWS_TO_SKIP_AT_THE_START)
  File "/Users/user_x/miniconda3/envs/pandas_bug/lib/python2.7/site-packages/pandas/io/parsers.py", line 470, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/user_x/miniconda3/envs/pandas_bug/lib/python2.7/site-packages/pandas/io/parsers.py", line 246, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/user_x/miniconda3/envs/pandas_bug/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in __init__
    self._make_engine(self.engine)
  File "/Users/user_x/miniconda3/envs/pandas_bug/lib/python2.7/site-packages/pandas/io/parsers.py", line 699, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/user_x/miniconda3/envs/pandas_bug/lib/python2.7/site-packages/pandas/io/parsers.py", line 1066, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 512, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4804)
ValueError: No columns to parse from file
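
The error above says the parser found no header row once the leading lines had been skipped. One way to check that a header really is present at that position, without involving pandas at all, is to skip the same number of physical lines by hand and parse the rest with the standard library. This is only a sketch: `rows_after_skipping` is a hypothetical helper, and `sample` is a made-up stand-in for the downloaded file, not its actual contents.

```python
import csv
import io
from itertools import islice


def rows_after_skipping(text, rows_to_skip):
    """Drop the first rows_to_skip physical lines, then parse the rest as CSV."""
    lines = io.StringIO(text)
    # Consume the leading lines; blank lines count as lines here.
    for _ in islice(lines, rows_to_skip):
        pass
    reader = csv.DictReader(lines)
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical stand-in for the real file: three leading lines
# (one of them blank), then a header row and two data rows.
sample = "[notice]\n\n[data]\nsort_order,air_temp\n0,12.5\n1,12.1\n"

columns, rows = rows_after_skipping(sample, 3)
print(columns)    # column names found after the skipped lines
print(len(rows))  # number of data rows
```

If the column names and row count from this check match what `read_csv` returns on 0.15.1, the file is fine and the difference lies in how the newer parser counts the skipped lines.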

@lexual
Contributor Author

lexual commented Apr 8, 2015

$ git bisect bad
6bf83c5dc575f52c84783d6bd6c4b9713b6201ab is the first bad commit
commit 6bf83c5dc575f52c84783d6bd6c4b9713b6201ab
Author: Scott E Lasley <slasley@umd.edu>
Date:   Fri Nov 7 15:31:53 2014 -0500

    BUG CSV: fix problem with trailing whitespace in skipped rows, issues 8661, 8679
    ENH CSV: Reduce memory usage when skiprows is an integer in read_csv, issue 8681

:040000 040000 ee53a53ff0780a7f8924d6abedbd846c2d2ac373 53e9f1268236309da5050bf5fc899178e10297d2 M      doc
:040000 040000 37079ae86ee722bd051638de6be4aeccb8d58e8e 6a7c5399ab7b94c0eed1d80117b4fe3595691660 M      pandas
:040000 040000 851e4c5b19db9027a7eddbad9b43f1092b46eae3 058f482ce83106fe3193d07cfeabbbb730c22918 M      vb_suite

@lexual
Contributor Author

lexual commented Apr 8, 2015

6bf83c5

lexual referenced this issue Apr 8, 2015
… 8661, 8679

ENH CSV: Reduce memory usage when skiprows is an integer in read_csv, issue 8681
@jreback added Bug IO CSV read_csv, to_csv labels Apr 8, 2015
@jreback added this to the 0.16.1 milestone Apr 8, 2015
@lexual
Contributor Author

lexual commented Apr 8, 2015

Yep, my test case now passes with the latest HEAD.

@jreback
Contributor

jreback commented Apr 9, 2015

closed by #9834
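
For code that has to run against several pandas releases, the version information in this thread could be turned into a small guard. A minimal sketch, assuming the affected range runs from 0.15.2 up to but not including 0.16.1: the lower bound comes from the report, the upper bound from the milestone, and the thread does not itself confirm which release shipped the fix. `parse_version` and `is_affected` are hypothetical helpers, not part of pandas.

```python
import re

# Assumed affected range, inferred from this thread: first failing
# release 0.15.2, fix targeted at the 0.16.1 milestone.
FIRST_BAD = (0, 15, 2)
FIRST_FIXED = (0, 16, 1)


def parse_version(version):
    """Return the leading major.minor[.patch] of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # A missing patch component is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version):
    """True if the version falls inside the assumed affected range."""
    return FIRST_BAD <= parse_version(version) < FIRST_FIXED


for v in ("0.15.0", "0.15.1", "0.15.2", "0.16.0", "0.16.1"):
    print(v, is_affected(v))
```

In practice this would be called as `is_affected(pd.__version__)`. Development builds between 0.16.0 and 0.16.1 carry a 0.16.0-based version string, so this check would still flag them even if they already include the fix.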

@jreback closed this as completed Apr 9, 2015