Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Python parser now accepts delim_whitespace=True #12958

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Apr 22, 2016

Title is self-explanatory.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

can you confirm with #12686 that all options are covered?

@jreback jreback added the IO CSV read_csv, to_csv label Apr 22, 2016
@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

I think doc-string and io.rst need small edits for this (to remove that this is only for c engine)

@jreback jreback added this to the 0.18.1 milestone Apr 22, 2016
@gfyoung
Copy link
Member Author

gfyoung commented Apr 22, 2016

I don't quite understand your question. Could you clarify?

Ah, yes, good point. Need to change what I just added since it's supported for Python now!

@@ -2867,6 +2841,103 @@ def test_read_only_header_no_rows(self):
df = self.read_csv(StringIO('a,b,c'), index_col=False)
tm.assert_frame_equal(df, expected)

def test_line_comment(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the net affect of moving all of these tests around?

Copy link
Member Author

@gfyoung gfyoung Apr 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better coverage. Now all of the engines are getting tested under all of the conditions for these delim_whitespace-related ones. The tests in general are arranged somewhat strangely, so at some point a refactor of the entire test suite will be needed, but not here of course 😄

@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

I meant we are now listing all of the options in that issue so that we can check them off. And you have them listed in the variables to check for the options, just want a cross check.

bonus points if you can autogenerate a section in the doc-string which explicity lists the unsupported options (by using those variables). Or maybe don't need to auto-generate, maybe just list a section in doc-string & io.rst to at least list the un-described / only certain engine supported options.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 22, 2016

Well, they used to be a copy of _c_parser_defaults, which I would consider to be the "correct" version until demonstrated otherwise (Python will raise if any of these are passed with a python engine specified). So in fact the list for C-only features in #12686 should be the one I created in this PR, which it is not.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

@gfyoung if you can post a correct list I will update

@gfyoung
Copy link
Member Author

gfyoung commented Apr 22, 2016

FYI, Travis has just given the green light for my changes, so if you don't have any other code-related complaints, I'll update the documentation with [ci skip].

@gfyoung
Copy link
Member Author

gfyoung commented Apr 22, 2016

as_recarray
na_filter
compact_ints
use_unsigned
low_memory
memory_map
buffer_lines
error_bad_lines
warn_bad_lines
dtype
decimal
float_precision

If any of these are non-default with a python engine, the parser will raise.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

ok I edited the other issue, pls have a look

@gfyoung
Copy link
Member Author

gfyoung commented Apr 22, 2016

Are those uncapitalised headers (e.g. "undocumented in C engine only") sub sections for "Features supported in the C engine only"? If not, all of the ones that I listed just now should also go under "Features supported in the C engine only"

@gfyoung gfyoung force-pushed the delim-whitespace-python branch 2 times, most recently from c835a6a to 18c02f3 Compare April 22, 2016 15:26
@@ -101,8 +101,7 @@ delim_whitespace : boolean, default False
Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``)
will be used as the delimiter. Equivalent to setting ``sep='\+s'``.
If this option is set to True, nothing should be passed in for the
``delimiter`` parameter. This parameter is currently supported for
the C parser only.
``delimiter`` parameter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add that this was added in 0.18.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung if you can update this, and lmlk when you push.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2016

ty

@gfyoung gfyoung deleted the delim-whitespace-python branch April 22, 2016 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants