TextFileReader.get_chunk returns full-file DataFrame despite chunksize specified in read_csv #3406

vshkolyar · 2013-04-20T16:26:42Z

TextFileReader.get_chunk returns the full file as a DataFrame despite the chunksize specified in read_csv.

jreback · 2013-04-21T17:02:18Z

Can you show exactly what you are doing? You need to specify the number of rows that you want in get_chunk; otherwise it returns the full frame. Normally you would just iterate over the returned object anyhow.

http://pandas.pydata.org/pandas-docs/dev/io.html#iterating-through-files-chunk-by-chunk
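
The two usage patterns mentioned in this comment (iterating over the returned object, or passing an explicit row count to get_chunk) can be sketched roughly as follows. This is a minimal illustration, assuming a recent pandas; the in-memory CSV text and its column names are made up for the example and do not come from the original report.

```python
import io

import pandas as pd

# Hypothetical in-memory data; column names are illustrative only.
CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# Pattern 1: iterate over the reader returned by read_csv(chunksize=...).
reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)
chunk_lengths = [len(chunk) for chunk in reader]

# Pattern 2: ask get_chunk for an explicit number of rows,
# which overrides the chunksize given to read_csv for that call.
reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)
first_three = reader.get_chunk(3)
```

With five data rows and chunksize=2, iteration yields chunks of two, two, and one row, while get_chunk(3) returns three rows regardless of the configured chunk size.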

vshkolyar · 2013-04-22T09:55:26Z

I call read_csv with a specific chunk size.
When I then call get_chunk, there are two cases:

If a size is passed to get_chunk, I expect to receive a chunk of the size given as the argument, which overrides the chunk size the TextFileReader instance was initialized with (the one passed to read_csv). This works as expected.
But if no size is passed to get_chunk, I would expect to receive a DataFrame with the chunk size specified in read_csv, which the TextFileReader instance was initialized with and stores as an instance attribute. That is because get_chunk is an instance method (not a static method or a global function), and the instance holds the chunksize member.

P.S. I can call my_text_file_reader.get_chunk(my_text_file_reader.chunksize), but this is ugly and not object-oriented.
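
For reference, the workaround from the P.S. looks roughly like this. It is only a sketch: the CSV text is hypothetical, and it assumes the reader exposes the value passed to read_csv as a chunksize attribute, as described in the comment above.

```python
import io

import pandas as pd

# Hypothetical in-memory data; column names are illustrative only.
CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)

# Workaround: hand the stored chunk size back to get_chunk explicitly.
chunk = reader.get_chunk(reader.chunksize)
```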

jreback · 2013-04-22T11:31:30Z

@wes, this is expected behavior, yes?
In other words, get_chunk lets you get variable-sized chunks by calling it with a parameter, or everything if no parameter is given? Or should it default to the passed chunk size?

wesm · 2013-04-22T23:53:49Z

Calling it a bug. Fixed, thanks
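
A small check of the behavior after the fix might look like the sketch below, assuming a pandas release that includes it (0.11 or later, per the commit referenced in this thread). The CSV text is hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory data; column names are illustrative only.
CSV_TEXT = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)

# With no argument, get_chunk falls back to the chunksize given to read_csv
# instead of returning the whole file.
first = reader.get_chunk()
second = reader.get_chunk()
```

Each call advances through the file, so the second chunk continues where the first one ended.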

Version 0.11

* tag 'v0.11.0': (75 commits)
  RLS: Version 0.11
  BUG: respect passed chunksize in read_csv when using get_chunk function. close pandas-dev#3406
  DOC: Adding parameters to frequencies, offsets (issue pandas-dev#2916)
  BUG: fix broken validators again
  Revert "BUG: config.is_one_of_factory is broken"
  DOC: minor indexing.rst doc updates
  BUG: config.is_one_of_factory is broken
  TST: add test for config.is_one_of_factory
  DOC: RELEASE.rst mention new options disp.height/width and deprecated line_width
  BUG: fix validators for config options
  ENH: default dimensions for ip zmq frontends derive from disp.height/width defaults
  ENH: let display.notebook_repr_html control HTML repr in qtconsole again.
  ENH: fix df.repr() for scripts (keep GH1611 away), accept None for max_cols/rows
  BUG: rework get_console_size to handle cases properly
  ENH: optimize _repr_fits_horizontal_, to_string only on slice of interest
  BUG: for numerical option, sentry should be another Type, not 0
  TST: fix test_repr_non_interactive passing by accident
  TST: adjust tests to accommodate new defaults for display options
  ENH: Adjust height/width/max_rows defaults to match informal survey results
  CLN: convert test utilizing disp.line_width to disp.width
  ...

wesm closed this as completed in f24b923 Apr 22, 2013

jreback mentioned this issue Jun 20, 2013

read_csv with iterator=True does not seem to work as expected without chunksize #3967

Closed
