ENH: context-manager for chunksize/iterator-reader #38225

Merged
merged 2 commits into from
Dec 4, 2020

twoertwein
Member

@twoertwein twoertwein commented Dec 2, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Allows (and encourages) the following use:

import pandas as pd

filename = "pandas/tests/io/data/csv/iris.csv"
chunksize = 2
with pd.read_csv(filename, chunksize=chunksize) as reader:
    for chunk in reader:
        # risky code that might raise
        pass
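
A minimal sketch of why the with-form helps, assuming pandas 1.2 or later. The temporary file, the `rows_seen` counter, and the simulated `ValueError` are illustrative only and not part of this PR. If the loop body raises, leaving the with-block closes the reader, so the file can be deleted right away.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n5,6\n")
    path = tmp.name

rows_seen = 0
try:
    with pd.read_csv(path, chunksize=2) as reader:
        for chunk in reader:
            rows_seen += len(chunk)
            # Simulate "risky code that might raise" after the first chunk.
            raise ValueError("simulated failure while processing a chunk")
except ValueError:
    # The reader was already closed when the with-block exited.
    pass
finally:
    # Removing the file works because no handle to it is left open.
    os.remove(path)
```

Without the with-block, the caller would have to call `reader.close()` in a `finally` clause to get the same guarantee.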

The same can be done for read_json/read_sas (I think these are all the methods that support chunksize). If this PR should make it into 1.2, I can quickly add the changes for json/sas as well.
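
For the JSON case, a sketch under the assumption that the reader returned by `read_json` with `lines=True` and `chunksize` supports the same with-form (pandas 1.2 or later). The in-memory records and the `chunk_lengths` name are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical line-delimited JSON: three records, one per line.
json_lines = io.StringIO(
    '{"a": 1, "b": 2}\n'
    '{"a": 3, "b": 4}\n'
    '{"a": 5, "b": 6}\n'
)

# chunksize requires lines=True for read_json; each chunk is a DataFrame.
with pd.read_json(json_lines, lines=True, chunksize=2) as reader:
    chunk_lengths = [len(chunk) for chunk in reader]
```

Here the three records are read in chunks of at most two rows, and the reader is closed when the with-block exits.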

Are there more places to promote this new context manager?

@jreback jreback added the IO CSV read_csv, to_csv label Dec 2, 2020
@jreback
Contributor

jreback commented Dec 2, 2020

this looks pretty neat. can you add a small note section in io.rst which shows this off? i think this is ok for 1.2, can add json/sas as follow-ups.

@twoertwein twoertwein force-pushed the contextmanagers branch 4 times, most recently from b820965 to 4a59498 Compare December 2, 2020 06:22
@twoertwein twoertwein changed the title ENH: context-manager for TextFileReader ENH: context-manager for TextFile/JSON/SASReader Dec 2, 2020
@twoertwein twoertwein marked this pull request as ready for review December 2, 2020 07:37
@twoertwein
Member Author

twoertwein commented Dec 2, 2020

I don't understand why the documentation example is failing:

Exception in /home/runner/work/pandas/pandas/doc/source/user_guide/io.rst at block ending on line 1586
ParserError: Error tokenizing data. C error: out of memory

edit: the blank line was causing the issue; it was interpreted as the end of the with-block

@twoertwein twoertwein force-pushed the contextmanagers branch 3 times, most recently from 0871fb8 to a0fb4f7 Compare December 3, 2020 18:18
@twoertwein twoertwein changed the title ENH: context-manager for TextFile/JSON/SASReader ENH: context-manager for chunksize/iterator-reader Dec 3, 2020
@twoertwein twoertwein force-pushed the contextmanagers branch 4 times, most recently from 7af38d3 to 0b10951 Compare December 4, 2020 14:52
Contributor

@jreback jreback left a comment

great. just some minor doc comments. ping on green.

doc/source/user_guide/io.rst
doc/source/user_guide/io.rst
doc/source/whatsnew/v1.2.0.rst (outdated)
pandas/io/parsers.py (outdated)
pandas/io/sas/sasreader.py
@jreback jreback added this to the 1.2 milestone Dec 4, 2020
@twoertwein
Member Author

@jreback green. I hope the whatsnew entry is good now

@jreback jreback merged commit 5011a37 into pandas-dev:master Dec 4, 2020
@jreback
Contributor

jreback commented Dec 4, 2020

thanks @twoertwein very nice

Labels
IO CSV read_csv, to_csv
2 participants