Feature Request: skipcols in .read_csv #10882
Comments
jreback
added this to the
Next Major Release
milestone
Aug 22, 2015
jreback
added Prio-low and removed Prio-medium
labels
Aug 22, 2015
|
So read_csv() is defined by calling _make_parser_function() which calls _read(). Any instructions would be appreciated. It's a bit confusing to me. @jreback |
|
parser is a bit complicated. see how |
|
It looks like the code related to |
|
Is there a spot in the code where we know all the columns before starting to parse the rows? If so you can assign |
|
Yeah I did something similar but stopped due to other code related to
|
terrytangyuan
referenced
this issue
Sep 4, 2015
Closed
ENH: Added skipcols option for CParser #10985
pylang
commented
Dec 20, 2015
|
Any progress on this addition? Thanks. |
pylang
commented
Jul 10, 2016
|
ping @jreback |
|
if you submit a PR there will be progress |
pylang
commented
Jul 10, 2016
|
many thanks |
jorisvandenbossche
modified the milestone: Someday, Next Major Release
Jul 11, 2016
jreback
modified the milestone: Next Major Release, Someday
Sep 5, 2016
pylang
commented
Sep 25, 2016
|
In a similar vein, is there a way to read in a subset of rows? In other words, is there a counterpart to the
df = pd.read_csv("bigdata.csv")
df
# Output: Millions of rows
selection = [i for i in range(0, 1000000) if i % 2 == 0]
subset = pd.read_csv("bigdata.csv", use_rows=selection) # skip all rows except those listed
subset
# Output: only even rows for the first million
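Note that use_rows above is a proposed keyword, not an existing read_csv argument. A minimal sketch of the same selection, assuming pandas 0.20.0 or later (where skiprows accepts a callable) and using a small in-memory stand-in for bigdata.csv; the names keep and skip_row are illustrative only:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a large file: a header plus ten data rows.
data = "x\n" + "\n".join(str(i) for i in range(10)) + "\n"

# Zero-based positions of the data rows to keep (the even ones here).
keep = {i for i in range(10) if i % 2 == 0}


def skip_row(line_no):
    # line_no counts lines in the file, so line 0 is the header and
    # data row k sits on line k + 1. Returning True skips the line.
    return line_no != 0 and (line_no - 1) not in keep


subset = pd.read_csv(StringIO(data), skiprows=skip_row)
print(subset["x"].tolist())  # [0, 2, 4, 6, 8]
```

Because the callable is evaluated once per line, keeping the wanted positions in a set avoids a linear scan for every row of a large file.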
|
|
@pylang : We now accept |
|
sure, maybe an example of doing that in io.rst would be helpful? |
jreback
added the
Docs
label
Jan 3, 2017
pylang
commented
Jan 4, 2017
|
@gfyoung I'm not sure what you have in mind. I am interested in selecting rows. An example would be helpful, thank you. |
|
@pylang :
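>>> from io import StringIO
>>> from pandas import read_csv
>>> data = 'a,b,c\n1,2,3\n2,3,4'
>>> # the callable receives each row index (the header is row 0); True means skip
>>> read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0, engine='python')
   a  b  c
0  2  3  4
where |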
|
@jreback : There are examples in the docs to illustrate |
|
yes that's what i mean, to show using a callable to skipcols |
pylang
commented
Jan 4, 2017
|
@gfyoung I think your example for |
gfyoung
added a commit
to gfyoung/pandas
that referenced
this issue
Jan 4, 2017
|
|
gfyoung |
bea6137
|
This was referenced Jan 4, 2017
gfyoung
added a commit
to gfyoung/pandas
that referenced
this issue
Jan 4, 2017
|
|
gfyoung |
ea3279f
|
jorisvandenbossche
closed this
in #15052
Jan 4, 2017
jorisvandenbossche
added a commit
that referenced
this issue
Jan 4, 2017
|
|
gfyoung + jorisvandenbossche |
4de5cdc
|
jorisvandenbossche
modified the milestone: 0.20.0, Next Major Release
Jan 4, 2017
jreback
added a commit
that referenced
this issue
Jan 14, 2017
|
|
gfyoung + jreback |
7ad6c65
|
AnkurDedania
added a commit
to AnkurDedania/pandas
that referenced
this issue
Mar 21, 2017
|
|
gfyoung + AnkurDedania |
ecbafbb
|
pylang commented Aug 22, 2015
I'd like to read a set of csv files but exclude specific columns.
read_csv currently has a usecols keyword, but it requires writing a list of all the columns present. This is a bit tedious and, more importantly, not all files have the same columns, so usecols would not work in general cases, whereas a complementary option would. Can a skipcols keyword be added to 0.17 that accepts a list of column names and reads all but those columns into a DataFrame? Thanks.
xref #4749
xref #8985
xref #6710
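
For readers landing on this request: a skipcols keyword does not appear in this thread as a released option, but the callable form of usecols mentioned in the comments can express the same thing. A minimal sketch, assuming pandas 0.20.0 or later; the sample data and the name skip are illustrative only:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# Columns to exclude. Names that a given file lacks are simply never
# matched, so the same set can be reused across files whose columns differ.
skip = {"b", "d", "not_in_this_file"}

# usecols accepts a callable that is evaluated against each column name
# and keeps the column when it returns True.
df = pd.read_csv(StringIO(data), usecols=lambda name: name not in skip)

print(list(df.columns))  # ['a', 'c']
```

This avoids listing every column to keep, which was the original objection to a list-valued usecols.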