TextFileReader.get_chunk returns the full-file DataFrame despite the chunksize specified in read_csv #3406

Closed
vshkolyar opened this issue Apr 20, 2013 · 4 comments

@vshkolyar

TextFileReader.get_chunk returns a DataFrame containing the full file, despite the chunksize specified in read_csv.

@jreback
Contributor

jreback commented Apr 21, 2013

Can you show exactly what you are doing? You need to specify the number of rows that you want in get_chunk; otherwise it returns the full frame. Normally you would just iterate over the returned object anyhow.

http://pandas.pydata.org/pandas-docs/dev/io.html#iterating-through-files-chunk-by-chunk
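For illustration, iterating over the returned object might look like the sketch below. The in-memory CSV and the names `csv_text` and `sizes` are made up for this example and do not come from the issue.

```python
import io

import pandas as pd

# Hypothetical 10-row CSV held in memory; any file path would work the same way.
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# Passing chunksize makes read_csv return a TextFileReader instead of a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Each iteration yields a DataFrame with at most `chunksize` rows.
sizes = [len(chunk) for chunk in reader]
print(sizes)
```

With 10 rows and a chunk size of 4, the last chunk holds only the 2 remaining rows.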

@vshkolyar
Author

The read_csv method is called with a specific chunk size.
When I call the get_chunk method afterwards, there are two cases:

  1. A chunk size is passed to get_chunk - I expect to receive a chunk of the size given as the method argument, which overrides the chunk size the TextFileReader instance was initialized with (the one passed to read_csv). This works as expected.
  2. No such parameter is passed to get_chunk - I would expect to receive a DataFrame with the chunk size specified in read_csv, which the TextFileReader instance was initialized with and stores as an instance attribute. Again, that is because get_chunk is an instance method (not a static method or a global function), and the instance holds the chunksize member itself.

P.S. I can call my_text_file_reader.get_chunk(my_text_file_reader.chunksize), but this is ugly and not object-oriented.
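The workaround from the P.S. can be sketched roughly as follows. The sample data and the names `csv_text`, `my_text_file_reader`, `first`, and `custom` are illustrative assumptions, not taken from the reporter's code.

```python
import io

import pandas as pd

# Hypothetical 10-row CSV used only for this illustration.
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

my_text_file_reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Workaround: pass the reader's own chunksize back in explicitly.
first = my_text_file_reader.get_chunk(my_text_file_reader.chunksize)

# Case 1 from the list above: an explicit size overrides the stored chunksize.
custom = my_text_file_reader.get_chunk(2)

print(len(first), len(custom))
```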

@jreback
Contributor

jreback commented Apr 22, 2013

@wes this is expected behavior, yes?
IOW get_chunk lets you get variable-sized chunks by calling it with a parameter, or everything if there is no parameter? Or should it default to the passed chunk size?

@wesm wesm closed this as completed in f24b923 Apr 22, 2013
@wesm
Member

wesm commented Apr 22, 2013

Calling it a bug. Fixed, thanks
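A minimal sketch of the behavior after the fix, assuming a pandas release that includes it. The sample data and variable names are hypothetical, chosen only to make the row counts easy to check.

```python
import io

import pandas as pd

# Hypothetical 10-row CSV; the contents are made up for this example.
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# With no argument, get_chunk falls back to the chunksize given to read_csv.
default_chunk = reader.get_chunk()

# An explicit size still overrides the stored chunksize for that call.
small_chunk = reader.get_chunk(2)

print(len(default_chunk), len(small_chunk))
```

On a release without the fix, the first call would instead return the whole file, which is the behavior reported in this issue.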

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jun 3, 2013
Version 0.11

* tag 'v0.11.0': (75 commits)
  RLS: Version 0.11
  BUG: respect passed chunksize in read_csv when using get_chunk function. close pandas-dev#3406
  DOC: Adding parameters to frequencies, offsets (issue pandas-dev#2916)
  BUG: fix broken validators again
  Revert "BUG: config.is_one_of_factory is broken"
  DOC: minor indexing.rst doc updates
  BUG: config.is_one_of_factory is broken
  TST: add test for config.is_one_of_factory
  DOC: RELEASE.rst mention new options disp.height/width and deprecated line_width
  BUG: fix validators for config options
  ENH: default dimensions for ip zmq frontends derive from disp.height/width defaults
  ENH: let display.notebook_repr_html control HTML repr in qtconsole again.
  ENH: fix df.repr() for scripts (keep GH1611 away), accept None for max_cols/rows
  BUG: rework get_console_size to handle cases properly
  ENH: optimize _repr_fits_horizontal_, to_string only on slice of interest
  BUG: for numerical option, sentry should be another Type, not 0
  TST: fix test_repr_non_interactive passing by accident
  TST: adjust tests to accommodate new defaults for display options
  ENH: Adjust height/width/max_rows defaults to match informal survey results
  CLN: convert test utilizing disp.line_width to disp.width
  ...