Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Respect 'usecols' parameter even when CSV rows are uneven #12551

Closed
wants to merge 1 commit into from

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Mar 7, 2016

Closes #12203 by overriding the row alignment checks for both engines when the usecols parameter is passed into read_csv.

@gfyoung gfyoung force-pushed the usecol_long_lines branch 3 times, most recently from 77c150a to 9c2cb89 Compare March 7, 2016 11:47
@jreback jreback added the IO CSV read_csv, to_csv label Mar 7, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 7, 2016
@gfyoung gfyoung force-pushed the usecol_long_lines branch 3 times, most recently from 3e8be6a to f7d9698 Compare March 9, 2016 02:41
@jreback
Copy link
Contributor

jreback commented Mar 9, 2016

@gfyoung its not necessary for you to keep rebasing every time master is slightly updated.

@gfyoung
Copy link
Member Author

gfyoung commented Mar 9, 2016

@jreback : Oh whoops, sorry! I have a small script that updates my branches (including master in my fork) whenever I see there are commits, and I forgot to specify to just update the master branch.

@jreback
Copy link
Contributor

jreback commented Mar 9, 2016

oh ok maybe just do it manually for a while

trying to get out another rc

@jreback
Copy link
Contributor

jreback commented Mar 9, 2016

the issue is that Travis runs and it can get very busy

@gfyoung
Copy link
Member Author

gfyoung commented Mar 9, 2016

No, I perfectly understand. As a precaution, I'll add [ci skip] to all of my open PR's, since I imagine there will be changes to be made once reviewed after which I'll remove it so that Travis can run.

@gfyoung gfyoung force-pushed the usecol_long_lines branch 13 times, most recently from cb2fb1a to a5f0562 Compare March 14, 2016 21:06
@gfyoung gfyoung force-pushed the usecol_long_lines branch 2 times, most recently from a5d7434 to 5fadd81 Compare March 16, 2016 15:25
# make sure that an error is still thrown
# when the 'usecols' parameter is not provided
name = self.__class__.__name__
msg = "Expected \d+ fields in line \d+, saw \d+"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too much gymnastics here. Why is this needed? e.g. diffs in the errors from python/c parser? if that IS the case. then I'd like to change the raise for this kind of error to a ValueError in the cparser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that was why I had initially created a PythonParserError to keep it consistent with what you would see with the C engine. ValueError though makes sense for consistency and semantic purposes. Will change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at where the CParserError is raised, it's buried in the tokenizer.c code, which is supposed to be for parsing errors in general. Suggestions on how to externalize this particular CParserError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its in parser.pyx (but it inherits from ValueError). I would actually change that, but that might cause some issues (you can check that soln though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, sorry! Do you mean just getting rid of CParserError altogether and replacing with just ValueError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you wont' be able to remove it (well shouldn't as its in the public API). But it inherites from Exception. if you can make it inherit from ValueError then I think we'd take that. See what happens. lmk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. That seems like a good compromise because ValueError inherits from Exception anyways. Will give it a shot.

@gfyoung
Copy link
Member Author

gfyoung commented Mar 18, 2016

@jreback : Travis has no problems making CParserError inherit from ValueError, so I think this is also good to merge if there is nothing else.

@jreback
Copy link
Contributor

jreback commented Mar 18, 2016

need s note in the API changes section wrt error class

@gfyoung gfyoung force-pushed the usecol_long_lines branch 2 times, most recently from 2476657 to 0d397e5 Compare March 18, 2016 00:37
@gfyoung
Copy link
Member Author

gfyoung commented Mar 18, 2016

Added note in the API changes. Should be good to merge if there is nothing else.

@@ -50,6 +50,12 @@ API changes



- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number for the PR here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung
Copy link
Member Author

gfyoung commented Mar 19, 2016

@jreback : Made all of the requested changes. Should be ready to merge if there is nothing else.

@jreback jreback closed this in e55875e Mar 20, 2016
@jreback
Copy link
Contributor

jreback commented Mar 20, 2016

thanks @gfyoung

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants