New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Respect usecols even with empty data #12506

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@gfyoung
Member

gfyoung commented Mar 2, 2016

closes #12493

in which the usecols argument was not being respected for empty data. This is because no filtering was applied when the first (and only) chunk was being read.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 2, 2016

Member

Tests are passing. Should be good to merge.

Member

gfyoung commented Mar 2, 2016

Tests are passing. Should be good to merge.

@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated

@jreback jreback added Bug IO CSV labels Mar 2, 2016

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 3, 2016

Member

If some could cancel this build #18023 (it's an old build), that would be great. Thanks!

Member

gfyoung commented Mar 3, 2016

If some could cancel this build #18023 (it's an old build), that would be great. Thanks!

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 6, 2016

Member

@jreback : Tests are passing once more. Should be good to merge AFAICT.

Member

gfyoung commented Mar 6, 2016

@jreback : Tests are passing once more. Should be good to merge AFAICT.

@jreback

View changes

Show outdated Hide outdated pandas/io/parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/parser.pyx Outdated
@@ -741,7 +741,7 @@ def _parse(flavor, io, match, header, index_col, skiprows,
parse_dates=parse_dates,
tupleize_cols=tupleize_cols,
thousands=thousands))
except StopIteration: # empty table

This comment has been minimized.

@jreback

jreback Mar 6, 2016

Contributor

this is not a good idea at all (same as above)

@jreback

jreback Mar 6, 2016

Contributor

this is not a good idea at all (same as above)

This comment has been minimized.

@gfyoung

gfyoung Mar 6, 2016

Member

Telling me it's not a very good idea is not very helpful feedback given that that was what was written before in the first place. First, why is it not a good idea? And second, what might you suggest as alternatives to doing a try-except block? Direct checks of whether the table is empty come to mind, but if there are others, it would be good to know

@gfyoung

gfyoung Mar 6, 2016

Member

Telling me it's not a very good idea is not very helpful feedback given that that was what was written before in the first place. First, why is it not a good idea? And second, what might you suggest as alternatives to doing a try-except block? Direct checks of whether the table is empty come to mind, but if there are others, it would be good to know

@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.0.txt Outdated
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 6, 2016

Contributor

@gfyoung you had to change a whole bunch of things to get your code to work (e.g. changing long established behavior in excel/html parsing), the StopIteration can now also be a ValueError. That is just crazy. My point is to make minimal changes that get the job done, use general methods where possible, and don't duplicate code.

Contributor

jreback commented Mar 6, 2016

@gfyoung you had to change a whole bunch of things to get your code to work (e.g. changing long established behavior in excel/html parsing), the StopIteration can now also be a ValueError. That is just crazy. My point is to make minimal changes that get the job done, use general methods where possible, and don't duplicate code.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 8, 2016

Member

@jreback : Simplified things a little by changing the error from ValueError to StopIteration. I still kept some of the error messages from the C engine because I find them to be quite useful.

Member

gfyoung commented Mar 8, 2016

@jreback : Simplified things a little by changing the error from ValueError to StopIteration. I still kept some of the error messages from the C engine because I find them to be quite useful.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 8, 2016

Contributor

at a glance looks ok. still working on 0.18.0, so will get to next week.

Contributor

jreback commented Mar 8, 2016

at a glance looks ok. still working on 0.18.0, so will get to next week.

@jreback jreback added this to the 0.18.1 milestone Mar 8, 2016

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 8, 2016

Member

👍 will ping on all of them next week then.

Member

gfyoung commented Mar 8, 2016

👍 will ping on all of them next week then.

@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 18, 2016

Contributor

don't continue to propogate the differnces between the c and python parser. Let's sync them up. The c-parser is almost certainly right (the cases are of course when a ValueError/CParserError is throw vs a StopIteration) which is for a completely different reason.

For each thing that is changing, you need to be VERY explict in the whatsnew note, showing an example of where a change is made.

Contributor

jreback commented Mar 18, 2016

don't continue to propogate the differnces between the c and python parser. Let's sync them up. The c-parser is almost certainly right (the cases are of course when a ValueError/CParserError is throw vs a StopIteration) which is for a completely different reason.

For each thing that is changing, you need to be VERY explict in the whatsnew note, showing an example of where a change is made.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 18, 2016

Member

What do you mean by "syncing" them up? I'm a little stuck here because you said above that we should keep the changes as simple as possible.

EDIT: You answered my question above, so ignore this.

Member

gfyoung commented Mar 18, 2016

What do you mean by "syncing" them up? I'm a little stuck here because you said above that we should keep the changes as simple as possible.

EDIT: You answered my question above, so ignore this.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 18, 2016

Member

Actually, I just realized that once #12551 is landed, aligning CParserError errors in the C engine with ValueError errors in the Python engine will make a lot more sense since CParserError will then inherit from ValueError.

Member

gfyoung commented Mar 18, 2016

Actually, I just realized that once #12551 is landed, aligning CParserError errors in the C engine with ValueError errors in the Python engine will make a lot more sense since CParserError will then inherit from ValueError.

@gfyoung gfyoung closed this Mar 20, 2016

@gfyoung gfyoung deleted the gfyoung:empty_usecols branch Mar 20, 2016

@gfyoung gfyoung restored the gfyoung:empty_usecols branch Mar 20, 2016

@gfyoung gfyoung reopened this Mar 20, 2016

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 22, 2016

Member

@jreback : I've attempted to sync up CParserError with ValueError on the Python side, but the problem is that it seems StopIteration is too ingrained into the code at the moment. Converting those errors to ValueError suddenly causes breakages all over the place in test_parsers.py and other files like test_html.py or test_excel.py.

While I haven't made the commit here, I was able to locally change all the CParserError's to plain ValueError's with no issues. However, that is breaking backwards compatibility to a degree. What do you think is the best way to proceed?

Member

gfyoung commented Mar 22, 2016

@jreback : I've attempted to sync up CParserError with ValueError on the Python side, but the problem is that it seems StopIteration is too ingrained into the code at the moment. Converting those errors to ValueError suddenly causes breakages all over the place in test_parsers.py and other files like test_html.py or test_excel.py.

While I haven't made the commit here, I was able to locally change all the CParserError's to plain ValueError's with no issues. However, that is breaking backwards compatibility to a degree. What do you think is the best way to proceed?

@jreback

View changes

Show outdated Hide outdated pandas/io/parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 22, 2016

Member

@jreback : It's not as simple as you make it or sound, or as I originally thought. I wouldn't have made that comment if it was easy to do. Changing that StopIteration alone breaks a good number of tests not just for html.py and excel.py but for parsers.py as well. I will give it another shot today, but I think this change should be refrained for a separate PR if I can't manage.

Member

gfyoung commented Mar 22, 2016

@jreback : It's not as simple as you make it or sound, or as I originally thought. I wouldn't have made that comment if it was easy to do. Changing that StopIteration alone breaks a good number of tests not just for html.py and excel.py but for parsers.py as well. I will give it another shot today, but I think this change should be refrained for a separate PR if I can't manage.

@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.1.txt Outdated
self.read_csv(StringIO(''))
with tm.assertRaisesRegexp(EmptyDataError, errmsg):
self.read_csv(StringIO(''), usecols=usecols)

This comment has been minimized.

@jreback

jreback Mar 26, 2016

Contributor

I suspect there are tests that are testing for Exception but now need to look for the specific exception

@jreback

jreback Mar 26, 2016

Contributor

I suspect there are tests that are testing for Exception but now need to look for the specific exception

This comment has been minimized.

@gfyoung

gfyoung Mar 26, 2016

Member

You are indeed right. I changed some of those to ValueError now.

@gfyoung

gfyoung Mar 26, 2016

Member

You are indeed right. I changed some of those to ValueError now.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 26, 2016

Member

@jreback : Modified some of the tests that initially tested for Exception, and Travis is still happy. Ready to merge if there is nothing else.

Member

gfyoung commented Mar 26, 2016

@jreback : Modified some of the tests that initially tested for Exception, and Travis is still happy. Ready to merge if there is nothing else.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback
Contributor

jreback commented Mar 26, 2016

@TomAugspurger

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Mar 30, 2016

Member

@jreback : Is this good to merge, or are we still waiting to see if there are anymore complaints/concerns?

Member

gfyoung commented Mar 30, 2016

@jreback : Is this good to merge, or are we still waiting to see if there are anymore complaints/concerns?

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback
Contributor

jreback commented Mar 30, 2016

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 5, 2016

Member

@jreback : I haven't seen any complaints from @jorisvandenbossche in the past week...so...good to go?

Member

gfyoung commented Apr 5, 2016

@jreback : I haven't seen any complaints from @jorisvandenbossche in the past week...so...good to go?

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 5, 2016

Contributor

@gfyoung no I don't think he's had a chance to look.

Contributor

jreback commented Apr 5, 2016

@gfyoung no I don't think he's had a chance to look.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 5, 2016

Member

@jreback : Okay, I'll ping again in about a week then.

Member

gfyoung commented Apr 5, 2016

@jreback : Okay, I'll ping again in about a week then.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 12, 2016

Member

@jorisvandenbossche : Any update on this PR? I think @jreback is waiting for your feedback before he merges this in.

Member

gfyoung commented Apr 12, 2016

@jorisvandenbossche : Any update on this PR? I think @jreback is waiting for your feedback before he merges this in.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

Sorry, slowly getting back on track :-) I will take a look now.

Just a quick question, my previous comment was about HTML and Excel parser. The whatsnew note now does not mention those anymore. There are no longer changes to to read_html and read_excel ?

Member

jorisvandenbossche commented Apr 12, 2016

Sorry, slowly getting back on track :-) I will take a look now.

Just a quick question, my previous comment was about HTML and Excel parser. The whatsnew note now does not mention those anymore. There are no longer changes to to read_html and read_excel ?

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 12, 2016

Member

@jorisvandenbossche : No worries. 😄 There are in fact changes to read_html and read_excel. If you look at the changed files, both changed the exception they are catching from StopIteration to EmptyDataError when there are empty tables. However, those changes are a result of the changes made to the parsers and are not direct changes to read_html and read_excel.

Member

gfyoung commented Apr 12, 2016

@jorisvandenbossche : No worries. 😄 There are in fact changes to read_html and read_excel. If you look at the changed files, both changed the exception they are catching from StopIteration to EmptyDataError when there are empty tables. However, those changes are a result of the changes made to the parsers and are not direct changes to read_html and read_excel.

@jorisvandenbossche

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

@gfyoung But they only catch it under the hood? (I mean, it doesn't bubble up to the user?)

Member

jorisvandenbossche commented Apr 12, 2016

@gfyoung But they only catch it under the hood? (I mean, it doesn't bubble up to the user?)

In [1]: df = pd.read_csv(StringIO(''), engine='c')
...
pandas.io.common.EmptyDataError: No columns to parse from file

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

Is it actually shown with the full name? (just a question, didn't test it, didn't fetch the PR, but I would just show the same as in an actual console)

@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

Is it actually shown with the full name? (just a question, didn't test it, didn't fetch the PR, but I would just show the same as in an actual console)

This comment has been minimized.

@gfyoung

gfyoung Apr 12, 2016

Member

Which is what I did. 😄 - FYI you can observe this full out name thing if you trigger any current CParserError.

@gfyoung

gfyoung Apr 12, 2016

Member

Which is what I did. 😄 - FYI you can observe this full out name thing if you trigger any current CParserError.

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

OK, perfect! (I just wondered if it was the case)

@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

OK, perfect! (I just wondered if it was the case)

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 12, 2016

Member

@jorisvandenbossche : Not AFAICT (it's an explicit try-except block for read_html and read_excel)

Member

gfyoung commented Apr 12, 2016

@jorisvandenbossche : Not AFAICT (it's an explicit try-except block for read_html and read_excel)

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Apr 12, 2016

Member

@jreback @gfyoung I didn't look at the code changes in detail, but the whatsnew is in any case very clear and sounds logical! So good to go for me

Member

jorisvandenbossche commented Apr 12, 2016

@jreback @gfyoung I didn't look at the code changes in detail, but the whatsnew is in any case very clear and sounds logical! So good to go for me

@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.1.txt Outdated
@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.1.txt Outdated
@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.1.txt Outdated
@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.1.txt Outdated
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 13, 2016

Contributor

@gfyoung ok just a couple of minor doc comments. Pls also have a look thru io.rst and see if any exceptions are mentioned (and if so fix them), as well as doc-strings.

Contributor

jreback commented Apr 13, 2016

@gfyoung ok just a couple of minor doc comments. Pls also have a look thru io.rst and see if any exceptions are mentioned (and if so fix them), as well as doc-strings.

BUG: Better handling of empty data reads with Python engine
In Python, when reading an empty file, it used to throw a StopIteration
error with no error message. This PR helps to differentiate the case
when no columns are inferable, which now leads to an EmptyDataError for
both the C and Python engines.

[ci skip]
In addition to this error change, several others have been made as well:
- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)

This comment has been minimized.

@jreback

jreback Apr 13, 2016

Contributor

was this whatsnew just not put in before? (the PR was already merged)

@jreback

jreback Apr 13, 2016

Contributor

was this whatsnew just not put in before? (the PR was already merged)

This comment has been minimized.

@gfyoung

gfyoung Apr 13, 2016

Member

Yes, but I moved it into this section because it's related

@gfyoung

gfyoung Apr 13, 2016

Member

Yes, but I moved it into this section because it's related

@jreback jreback closed this in 827745d Apr 13, 2016

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Apr 13, 2016

Contributor

@gfyoung thanks!

Contributor

jreback commented Apr 13, 2016

@gfyoung thanks!

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Apr 13, 2016

Member

@jreback : Sure thing! FYI, you can also cancel my build here.

Member

gfyoung commented Apr 13, 2016

@jreback : Sure thing! FYI, you can also cancel my build here.

@gfyoung gfyoung deleted the gfyoung:empty_usecols branch Apr 13, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment