read_csv: chunksize clashes with nrows #6774

Closed
michaelaye opened this Issue Apr 3, 2014 · 3 comments

Contributor

michaelaye commented Apr 3, 2014

If nrows is specified, using the chunksize option does not return a TextFileReader object; a DataFrame comes back instead.

reader = pd.read_table(fname, sep=',', chunksize=4, na_values=['null'],
                       nrows=20)
type(reader)
pandas.core.frame.DataFrame

My suggestion:

  • Either document that the two options cannot be used together,
  • or treat this as a feature request to support using them together.

I would find it useful to get chunks of size x, but only for the first n rows of a huge file.
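A minimal sketch of one way to get that behavior without passing nrows at all: read in chunks and stop once n rows have been seen. The helper name first_rows_in_chunks and the sample data are hypothetical and not part of pandas; only pd.read_csv with chunksize is the real API.

```python
import io

import pandas as pd


def first_rows_in_chunks(source, nrows, chunksize, **kwargs):
    """Yield DataFrame chunks covering only the first `nrows` data rows.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if nrows <= 0:
        return
    remaining = nrows
    # With chunksize set (and nrows left unset), read_csv returns an iterator.
    reader = pd.read_csv(source, chunksize=chunksize, **kwargs)
    try:
        for chunk in reader:
            # Trim the chunk that would run past the requested row count.
            chunk = chunk.iloc[:remaining]
            yield chunk
            remaining -= len(chunk)
            if remaining <= 0:
                break
    finally:
        reader.close()


# Made-up data: 50 rows, of which only the first 20 are wanted, 8 at a time.
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(50))
chunks = list(first_rows_in_chunks(io.StringIO(csv_text), nrows=20, chunksize=8))
chunk_lengths = [len(c) for c in chunks]
```

The last chunk is shorter than chunksize whenever nrows is not a multiple of it, and reading stops as soon as the row budget is used up.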

jreback added this to the 0.14.0 milestone Apr 3, 2014

Contributor

jreback commented Apr 3, 2014

I think the easiest thing ATM is to simply raise NotImplementedError if both nrows and chunksize are specified.

Implementing it is a bit non-trivial, but it would be useful, I agree.

You want to do a PR for the NotImplementedError? Then we'll create an issue to implement this at some point.

jreback modified the milestone: 0.15.0, 0.14.0 Apr 21, 2014

Contributor

michaelaye commented May 8, 2014

I would like to but find io/parsers.py quite confusing.

read_csv is 'declared' here:

449 read_csv = _make_parser_function('read_csv', sep=',')
450 read_csv = Appender(_read_csv_doc)(read_csv)

_make_parser_function defines a parser_f on line 311, and that definition has both the chunksize and nrows options. Is it true that they never work together, for any of the generated parsers? In that case, should I implement the check right there, in the parser_f definition?
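To illustrate the question, here is a simplified sketch of the factory pattern described above, showing why a single check inside parser_f would cover every generated parser. This is not the actual io/parsers.py code: make_parser_function, read_csv_sketch, read_table_sketch, the error message, and the placeholder return value are all assumptions made for illustration.

```python
def make_parser_function(name, sep=","):
    """Build a reader function, loosely mirroring a parser factory.

    Hypothetical stand-in; the real pandas factory differs.
    """

    def parser_f(path, sep=sep, nrows=None, chunksize=None):
        # Guard placed once here, so every generated parser inherits it.
        if nrows is not None and chunksize is not None:
            raise NotImplementedError(
                "'nrows' and 'chunksize' cannot be used together yet"
            )
        # Placeholder: a real parser would read the file at this point.
        return {
            "parser": name,
            "path": path,
            "sep": sep,
            "nrows": nrows,
            "chunksize": chunksize,
        }

    parser_f.__name__ = name
    return parser_f


# Two parsers generated from the same factory share the same guard.
read_csv_sketch = make_parser_function("read_csv", sep=",")
read_table_sketch = make_parser_function("read_table", sep="\t")
```

Calling either generated function with both nrows and chunksize raises NotImplementedError, while passing only one of them goes through unchanged.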

jreback modified the milestone: 0.14.1, 0.15.0 May 30, 2014

jreback modified the milestone: 0.15.0, 0.14.1 Jun 17, 2014

michaelaye added a commit to michaelaye/pandas that referenced this issue Jun 24, 2014

michaelaye + K.-Michael Aye ENH: Adding a NotImplementedError for simultaneous use of nrows and chunksize.

For read_csv(), the user's intention when using the chunksize option is most likely to get a TextFileReader, but simultaneous use of nrows is not implemented yet. This now raises a NotImplementedError. Added a test and an entry in the current whatsnew source (v0.14.1.txt).
Fixes #6774
19a1083

jreback modified the milestone: 0.14.1, 0.15.0 Jun 24, 2014

jreback closed this in #7085 Jun 24, 2014
