Minor change to csv reading #146

MLnick · 2011-09-16T09:32:28Z

Hi Wes

Firstly, congrats on such an amazing project! I love prototyping in python / numpy / ipython, but I always envy some of R's features. I tried pandas out about 9 months ago and although it was interesting, it seemed very rough around the edges. Now, however, it is looking really polished and I've been using it for prototyping and testing some trading models, and everything works extremely well. I hope it keeps growing, together with the integration with scikits statsmodels/timeseries and maybe even scikits.learn in future ...

Anyway, as I was starting to dive into the code, I came across read_csv and noticed that read_table fully duplicates it. The csv module in Python actually has full support for arbitrary delimiters, so there is no need for the duplication. There is also csv.Sniffer().sniff(sample), which attempts to sniff out the delimiter automatically. This commit tries to "magically" handle any arbitrary CSV file without needing to specify a separator, whether the fields are separated by spaces, tabs, commas, semicolons or other weird separators (I have a file at work with "^" separators :). If that doesn't work, one can fall back on specifying the separator (so read_csv looks more like read_table). In future it could make sense to simply have one read_data or read_table function.
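
For readers following along, here is a minimal sketch of the idea using only the standard library. It is not the code in this commit: `read_rows` and its `sep` and `sample_size` parameters are hypothetical names chosen for illustration.

```python
import csv
import io


def read_rows(text, sep=None, sample_size=1024):
    """Hypothetical helper: parse delimited text into a list of rows.

    If no separator is given, let csv.Sniffer guess it from a sample.
    """
    if sep is None:
        # sniff() raises csv.Error when it cannot determine a delimiter,
        # in which case the caller can retry with an explicit sep.
        sep = csv.Sniffer().sniff(text[:sample_size]).delimiter
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# A file with "^" separators, no separator specified
rows = read_rows("a^b^c\n1^2^3\n4^5^6\n")

# Falling back on an explicit separator
rows_explicit = read_rows("a b\n1 2\n", sep=" ")
```

The sniffer only sees the sample, so a short or irregular sample may give a wrong guess; passing `sep` explicitly sidesteps that.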

Incidentally, csv.Sniffer() also tries to sniff out other things like quote escaping and double quoting, but this commit effectively only uses it for the delimiter. If users run into problems with quote or string escaping, one could always let the sniffer try to figure out the full dialect.
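
As a rough illustration of the full-dialect option (again a sketch with made-up sample data, not what this commit does), the dialect returned by `sniff` can be passed straight to `csv.reader`, so the guessed quote character and double-quoting are applied as well:

```python
import csv
import io

# Sample data (invented for illustration): semicolon-separated,
# with quoted fields that contain the delimiter and doubled quotes.
text = 'name;note\n"Smith; J";"said ""hi"""\n"Doe; A";"ok"\n'

# sniff() returns a Dialect subclass carrying delimiter, quotechar,
# doublequote and skipinitialspace as guessed from the sample.
dialect = csv.Sniffer().sniff(text)

# Passing the whole dialect applies all of the guessed settings.
rows = list(csv.reader(io.StringIO(text), dialect))
```

The trade-off is that more is left to the sniffer's heuristics, so a wrong guess about quoting can be harder to spot than a wrong delimiter.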

wesm · 2011-09-17T01:44:09Z

Rebased into wesm/master-- thanks for these changes, a big help!

Nick Pentreath added 3 commits September 15, 2011 17:13

read_csv automatically sniffs out separator, using csv.Sniffer()

11ccfab

Adding additional comment to docstring

79f841d

Remove sniff_sep from params; cleaner and simpler

ef5515c

wesm closed this Sep 17, 2011

dan-nadler pushed a commit to dan-nadler/pandas that referenced this pull request Sep 23, 2019

Merge pull request pandas-dev#146 from manahl/tickstore-querying

cb0bee6

Make sure we don't fetch chunks that don't span the start point

MrToustous mentioned this pull request Apr 20, 2021

BUG: Inconsistent handling of empty windows for rolling mean with min_periods=0 #41053

Closed

1 task

SimonKitSangChu mentioned this pull request Apr 27, 2021

BUG: df.hist by column alongside with specified columns #41188

Open

johnands mentioned this pull request Jun 3, 2021

BUG: combine_first fails when one series is timezone-aware and the other is emtpy. #41800

Closed

3 tasks
