Feature Request: skipcols in .read_csv #10882

pylang · 2015-08-22T02:24:27Z

I'd like to read a set of CSV files but exclude specific columns. read_csv currently has a usecols keyword, but it requires listing every column you want to keep. This is tedious and, more importantly, not all files have the same columns, so usecols does not work in general, whereas a complementary keyword would. Can a skipcols keyword be added to 0.17 that accepts a list of column names and reads all but those columns into a DataFrame? Thanks.

xref #4749
xref #8985
xref #6710


terrytangyuan · 2015-09-01T18:42:39Z

So read_csv() is defined by calling _make_parser_function() which calls _read(). Any instructions would be appreciated. It's a bit confusing to me. @jreback

jreback · 2015-09-01T19:37:41Z

The parser is a bit complicated; see how usecols is used.

terrytangyuan · 2015-09-02T01:05:58Z

It looks like the code related to usecols needs to be refactored before I add anything on top of it. I won't be able to refactor it myself because I might break a lot of internal things. @jreback

TomAugspurger · 2015-09-02T12:38:19Z

Is there a spot in the code where we know all the columns before starting to parse the rows? If so, you can set usecols=set(all_cols) - set(skipcols) (the ordering would need to be fixed up afterwards) and go from there.
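A minimal sketch of this suggestion done from user code rather than inside the parser, assuming a pandas version where `nrows=0` returns just the header: read the column names first, take the set difference, then rebuild the list in file order. `read_csv_skipcols` is a hypothetical helper name, and an inline string stands in for a real CSV file.

```python
import io

import pandas as pd


def read_csv_skipcols(source_text, skipcols):
    """Hypothetical helper: read every column except those in `skipcols`."""
    # Read only the header row (nrows=0) to learn all column names.
    all_cols = pd.read_csv(io.StringIO(source_text), nrows=0).columns
    # Set difference decides what to keep; names absent from the file are ignored.
    keep = set(all_cols) - set(skipcols)
    # Rebuild the list in the file's original column order.
    usecols = [c for c in all_cols if c in keep]
    return pd.read_csv(io.StringIO(source_text), usecols=usecols)


data = "a,b,c,d\n1,2,3,4\n5,6,7,8"
df = read_csv_skipcols(data, skipcols=["b", "x"])  # "x" is not in this file
print(df.columns.tolist())  # ['a', 'c', 'd']
```

The header is parsed twice, which is cheap compared with the full read, and no parser internals are touched.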

terrytangyuan · 2015-09-02T13:09:11Z

Yeah, I did something similar but stopped due to other code related to usecols. I'll look into it again. Thanks.

pylang · 2015-12-20T03:33:28Z

Any progress on this addition? Thanks.

pylang · 2016-07-10T17:56:50Z

ping @jreback

jreback · 2016-07-10T18:04:58Z

If you submit a PR there will be progress; we have 1700 open issues.

pylang · 2016-07-10T22:16:43Z

many thanks

pylang · 2016-09-25T10:10:43Z

In a similar vein, is there a way to read in a subset of rows? In other words, is there a counterpart to the skiprows keyword? For example, this feature is desired:

import pandas as pd

df = pd.read_csv("bigdata.csv")
df
# Output: millions of rows

selection = [i for i in range(0, 1000000) if i % 2 == 0]
subset = pd.read_csv("bigdata.csv", use_rows=selection)  # proposed keyword (does not exist yet): keep only the listed rows
subset
# Output: only the even rows among the first million
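With a callable `skiprows` (the enhancement proposed later in this thread in #15059), a `use_rows`-style selection can be emulated from user code. This is a sketch assuming a pandas version that accepts a callable for `skiprows`; the inline data and the `wanted` set are illustrative stand-ins for `bigdata.csv` and `selection`.

```python
import io

import pandas as pd

# Small stand-in for bigdata.csv: a header plus seven data rows.
data = "x\n0\n1\n2\n3\n4\n5\n6"

# Data rows to keep, numbered from 0 after the header (even rows of the first million).
wanted = set(range(0, 1_000_000, 2))


def skip(line_no):
    # The callable receives the 0-based line index in the file.
    # Line 0 is the header, so data row k sits on line k + 1.
    return line_no != 0 and (line_no - 1) not in wanted


subset = pd.read_csv(io.StringIO(data), skiprows=skip)
print(subset["x"].tolist())  # [0, 2, 4, 6]
```

Using a set for `wanted` keeps each membership check O(1), which matters when the predicate runs once per line of a large file.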

gfyoung · 2017-01-03T23:04:14Z

@pylang : We now accept a callable for usecols. Does that help to resolve this issue?

jreback · 2017-01-03T23:45:48Z

sure, maybe an example of doing that in io.rst would be helpful?

pylang · 2017-01-04T00:07:07Z

@gfyoung I'm not sure what you have in mind. I am interested in selecting rows. An example would be helpful, thank you.

gfyoung · 2017-01-04T02:15:48Z

@pylang :

Your original issue was for skipcols though?
skiprows is currently not supported by the C engine. However, we could by all means allow skiprows to be a callable, like usecols. How does that sound? Something like:

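>>> import pandas as pd
>>> from io import StringIO
>>> data = 'a,b,c\n1,2,3\n2,3,4'
>>> pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0, engine='python')
   a  b  c
0  2  3  4

where x is the 0-based row index in the file (the header is row 0), so this predicate skips the odd rows while keeping the header.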

gfyoung · 2017-01-04T02:17:51Z

@jreback : There are examples in the docs to illustrate usecols, but we can also mention that we can use the callable to exclude columns as well. How does that sound?
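A sketch of the kind of documentation example discussed here, assuming a pandas version that accepts a callable for `usecols`: the callable is called with each column name and the column is kept when it returns True, so excluding names works even when files have different columns. The file names and contents below are made-up inline strings.

```python
import io

import pandas as pd

# Columns to drop; names absent from a given file are simply never matched.
skipcols = {"b", "d"}

# Made-up "files" whose column sets differ.
files = {
    "first.csv": "a,b,c\n1,2,3",
    "second.csv": "a,c,d,e\n4,5,6,7",
}

frames = {}
for name, text in files.items():
    # Keep every column whose name is not in skipcols.
    frames[name] = pd.read_csv(io.StringIO(text), usecols=lambda col: col not in skipcols)

for name, df in frames.items():
    print(name, df.columns.tolist())
# first.csv ['a', 'c']
# second.csv ['a', 'c', 'e']
```

Unlike a list passed to usecols, the callable never needs to know which columns a file contains, which is the situation the original request describes.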

jreback · 2017-01-04T02:18:50Z

Yes, that's what I mean: show how to use a callable to skip columns.

pylang · 2017-01-04T02:31:43Z

@gfyoung I think your example for skiprows would suffice. And yes you are correct re: skipcols. A similar callable option to filter usecols with an example in the docs would be sufficient imo.


gfyoung · 2017-01-04T18:38:46Z

@pylang : #15059 is up to address skiprows. I've hit a roadblock at this point implementing it for the C engine, so any input on that would be appreciated!



Title is self-explanatory. xref #10882. Author: gfyoung <gfyoung17@gmail.com> Closes #15059 from gfyoung/skiprows-callable and squashes the following commits: d15e3a3 [gfyoung] ENH: Accept callable for skiprows


jreback added the Enhancement, Prio-medium, and IO CSV (read_csv, to_csv) labels Aug 22, 2015

jreback added this to the Next Major Release milestone Aug 22, 2015

jreback added Prio-low and removed Prio-medium labels Aug 22, 2015

terrytangyuan mentioned this issue Sep 4, 2015

ENH: Added skipcols option for CParser #10985

Closed

jorisvandenbossche modified the milestones: Someday, Next Major Release Jul 11, 2016

jreback mentioned this issue Sep 5, 2016

ENH: allow usecols to be case insensitive #14154

Closed

jreback modified the milestones: Next Major Release, Someday Sep 5, 2016

jreback added the Docs label Jan 3, 2017

gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 4, 2017

DOC: Add example of skipcols in read_csv

bea6137

Illustrate how we can use the "usecols" argument to skip particular columns. Closes pandas-devgh-10882.

This was referenced Jan 4, 2017

DOC: Add example of skipcols in read_csv #15052

Merged

ENH: Accept callable for skiprows in read_csv #15059

Closed

gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 4, 2017

DOC: Add example of skipcols in read_csv

ea3279f

Illustrate how we can use the "usecols" argument to skip particular columns. Closes pandas-devgh-10882.

jorisvandenbossche closed this as completed in #15052 Jan 4, 2017

jorisvandenbossche pushed a commit that referenced this issue Jan 4, 2017

DOC: Add example of skipcols in read_csv (#15052)

4de5cdc

Illustrate how we can use the "usecols" argument to skip particular columns. Closes gh-10882.

jorisvandenbossche modified the milestones: 0.20.0, Next Major Release Jan 4, 2017

jreback mentioned this issue Mar 24, 2017

read_csv arguments: can we have skipcols and userows? #15799

Closed
