
pd.io.parsers.read_csv ignores skiprows when parse_dates is set to a dict #4382

Closed
cancan101 opened this issue Jul 27, 2013 · 3 comments · Fixed by #4969

Comments

@cancan101
Contributor

commented Jul 27, 2013

For example:

pd.io.parsers.read_csv("http://www.datazoa.com/publish/export.asp?hash=yjPceG6fHL&uid=dzadmin&a=exportcsv",skiprows=range(1,13+1),skipfooter=4,parse_dates={"date":[0]})

has 907 rows.

As does:

pd.io.parsers.read_csv("http://www.datazoa.com/publish/export.asp?hash=yjPceG6fHL&uid=dzadmin&a=exportcsv",skipfooter=4,parse_dates={"date":[0]})

whereas:

pd.io.parsers.read_csv("http://www.datazoa.com/publish/export.asp?hash=yjPceG6fHL&uid=dzadmin&a=exportcsv",skiprows=range(1,13+1),skipfooter=4,parse_dates=[0])

has 894 rows.

I am on Pandas v0.11.0
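The gap between the two counts, 907 - 894 = 13, is exactly the number of file lines that `skiprows=range(1, 13+1)` should drop. That suggests skiprows is being ignored outright in the dict case rather than misapplied. Below is a minimal pure-Python sketch of that arithmetic. The helper `count_data_rows` is hypothetical, and the synthetic file (one header, 13 metadata lines, 100 data rows, 4 footer lines) stands in for the datazoa export, whose real contents are not reproduced here.

```python
def count_data_rows(lines, skiprows=(), skipfooter=0):
    """Count the data rows a CSV reader should return.

    Drops the 0-indexed file lines listed in `skiprows`, then `skipfooter`
    lines from the end, and treats the first remaining line as the header.
    """
    skip = set(skiprows)
    kept = [line for i, line in enumerate(lines) if i not in skip]
    if skipfooter:
        kept = kept[:-skipfooter]
    return max(len(kept) - 1, 0)  # subtract the header line


# Synthetic stand-in for the export: header, 13 metadata lines, data, footer.
lines = (
    ["date,value"]
    + ["metadata line %d" % i for i in range(1, 14)]  # file lines 1..13
    + ["%d,%d" % (i, i) for i in range(100)]  # 100 data rows
    + ["footer %d" % i for i in range(4)]  # 4 trailing lines
)

with_skip = count_data_rows(lines, skiprows=range(1, 13 + 1), skipfooter=4)
without_skip = count_data_rows(lines, skipfooter=4)
print(with_skip, without_skip, without_skip - with_skip)  # 100 113 13
```

Whatever the real row count, ignoring `skiprows` always inflates the result by `len(skiprows)`, which matches the 13-row discrepancy in the report.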

@guyrt

Contributor

commented Sep 24, 2013

This is a symptom of a bigger problem:

import pandas as pd
from StringIO import StringIO

s = "a,b,c\n" + "\n".join([",".join([str(i), str(i+1), str(i+2)]) for i in xrange(500)])
print pd.read_csv(StringIO(s), skiprows=[200, 202], engine='python')
  <class 'pandas.core.frame.DataFrame'>
  Int64Index: 500 entries, 0 to 499
  Data columns (total 3 columns):
  a    500  non-null values
  b    500  non-null values
  c    500  non-null values
  dtypes: int64(3)

Somehow, skiprows got dropped from the python engine everywhere except the code that sniffs for the header. That's why the header isn't being removed properly. Fix coming.
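To make the failure mode concrete, here is a sketch of two hypothetical line readers built on the standard `csv` module; this is not pandas' actual python-engine code. `parse_buggy` honours `skiprows` only while locating the header, as described above, while `parse_fixed` filters every file line before parsing. On the example string it reproduces the 500 rows reported above, versus the 498 that dropping two lines should leave.

```python
import csv


def parse_buggy(text, skiprows=()):
    """Hypothetical reader reproducing the bug: skiprows is honoured only
    while locating the header, then ignored for the data rows."""
    skip = set(skiprows)
    lines = text.splitlines()
    # Header sniffing respects skiprows: first non-skipped line is the header.
    header_idx = next(i for i in range(len(lines)) if i not in skip)
    header = next(csv.reader([lines[header_idx]]))
    # Bug: every later line is parsed, including those listed in skiprows.
    rows = list(csv.reader(lines[header_idx + 1:]))
    return header, rows


def parse_fixed(text, skiprows=()):
    """Hypothetical reader with the fix: skiprows filters every file line."""
    skip = set(skiprows)
    kept = [line for i, line in enumerate(text.splitlines()) if i not in skip]
    reader = csv.reader(kept)
    header = next(reader)
    return header, list(reader)


# The example from the comment above, rewritten for Python 3.
s = "a,b,c\n" + "\n".join(",".join(str(i + k) for k in range(3)) for i in range(500))

_, buggy_rows = parse_buggy(s, skiprows=[200, 202])
_, fixed_rows = parse_fixed(s, skiprows=[200, 202])
print(len(buggy_rows), len(fixed_rows))  # 500 498
```

Because `skiprows` indexes file lines and line 0 is the header, skipping lines 200 and 202 removes the data rows that start with 199 and 201.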

@cancan101

Contributor Author

commented Sep 24, 2013

I assume the issue exists in both the python and non-python engines?

@guyrt

Contributor

commented Sep 24, 2013

Just python. However, since skiprows is defined in the example, read_csv silently fails over to the python engine.

This is a prime example of why I don't like to have silent failover to an unanticipated code path.
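One way to avoid the silent failover criticised above is to make engine selection explicit: check the requested options against what the engine supports and raise unless the caller opts in to a fallback. The sketch below is hypothetical. `select_engine`, `EngineFallbackError`, and the `UNSUPPORTED` table are illustrative names, and the table's entries are placeholders rather than pandas' real capability list.

```python
class EngineFallbackError(ValueError):
    """Raised instead of silently switching parser engines."""


# Placeholder capability table: options each engine cannot handle.
UNSUPPORTED = {"c": {"skipfooter"}, "python": set()}


def select_engine(requested, options, allow_fallback=False):
    """Return the engine to run; never swap engines behind the caller's back.

    `options` holds only the keyword arguments the caller actually set.
    """
    blocked = sorted(name for name in options if name in UNSUPPORTED[requested])
    if not blocked:
        return requested
    if not allow_fallback:
        raise EngineFallbackError(
            "engine=%r does not support %s; pass engine='python' explicitly"
            % (requested, ", ".join(blocked))
        )
    return "python"  # explicit, caller-approved fallback


print(select_engine("c", {"parse_dates": [0]}))  # c
print(select_engine("c", {"skipfooter": 4}, allow_fallback=True))  # python
try:
    select_engine("c", {"skipfooter": 4})
except EngineFallbackError as exc:
    print(exc)  # engine='c' does not support skipfooter; pass engine='python' explicitly
```

Raising by default keeps the unanticipated code path visible, while `allow_fallback=True` preserves the convenient behaviour for callers who explicitly accept it.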
