-
-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for read_fwf() as in R. #952
Conversation
self.f = f | ||
self.colspecs = colspecs | ||
|
||
assert isinstance(colspecs, (tuple, list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be less strict with this first check - there might be situations where you want to pass a generator rather than a list. I'd do self.colspecs = list(colspecs)
, and let it handle anything that can be turned into a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. WIll do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
A broader point about the behaviour: how should it handle non-ascii data, especially with variable width encodings like UTF-8? At present, I think this will do different things according to the version of Python:
I guess counting bytes is more consistent with what other tools mean by fixed width, but we should decode strings in line with the encoding parameter. |
…ents from Thomas Kluyver).
I've made changes recommend by Thomas. |
verbose=False, | ||
delimiter=None, | ||
encoding=None): | ||
kwds = locals() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not wild on the use of locals()
for these, it seems like unnecessary magic. But maybe I'm being overly picky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the alternative here is to either write out kw=kw
for each keyword argument or to have **kwds
which makes the signature in IPython less attractive. Not sure what's the best solution-- using locals doesn't strike me as so bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to enumerate all the paramters is both error-prone, makes it difficult to extend the other functions, and it hides the differences between the calls to _read(). I wish there was a method to get just the args, but there isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I think PEP 362 is aimed at this sort of thing - you'd use **kwargs
, and construct a more meaningful function signature for introspection - but that's still a work in progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally it would be like LISP and a :as variable could be assigned to the set of kwargs. But it's Python. Whatever. We'll eventually end up with LISP again.
OK, I've made a couple more minor comments. I'll like someone with a fresh pair of eyes look over this - pinging @wesm and @adamklein. |
This all looks pretty good to me. I'll take it for a spin in the next couple of days and merge it in. Thanks for the PR! |
Sweet thanks all for this, very nice work, merged it in. @blais you'll want to fetch pydata/master now and |
I added a simple function like R's read_fwf().
It also made sense to refactor read_csv() and read_table() to redirect to a new function _read() which takes the parser's class as input. I needed this for the FixedWidthFieldParser which needs to override the TextParser in order to provide a custom reader for the fields. Please review; the tests pass.
Also, I made the column spec ("widths")
I'm not sure that's the best; I think in practice a lot of column format specs start counting at one, but I haven't done a survey or anything, feel free to change it. Maybe we should support both.