read_csv not correctly parsing dates when parse_dates is string and index_col not set #5636

Closed
cancan101 opened this issue Dec 3, 2013 · 3 comments
Labels
API Design Bug Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Milestone

Comments

@cancan101
Contributor

With:

from io import StringIO
import pandas as pd

csv="""A,B,C
1,2,2003-11-1
"""

These all work as expected:

In [40]: pd.read_csv(StringIO(csv), parse_dates="C",index_col="C").index[0]

Out[40]: Timestamp('2003-11-01 00:00:00', tz=None)

In [41]: pd.read_csv(StringIO(csv), parse_dates=["C"],index_col="C").index[0]

Out[41]: Timestamp('2003-11-01 00:00:00', tz=None)

In [42]: pd.read_csv(StringIO(csv), parse_dates=["C"]).C[0]

Out[42]: Timestamp('2003-11-01 00:00:00', tz=None)

but this does not parse the string:

In [39]: pd.read_csv(StringIO(csv), parse_dates="C",).C[0]

Out[39]: '2003-11-1'
@jreback
Contributor

jreback commented Dec 3, 2013

I don't think parse_dates can be a string, so I'm not sure this should work at all (I think your first example works because 'C' is truthy and is therefore treated like True). Will mark as a bug though.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@gfyoung
Member

gfyoung commented Apr 17, 2016

@jreback : I think this issue can be closed. The "buggy" example fails as expected because parse_dates="C" is treated the same as parse_dates=True, which tries to convert the index of the resulting DataFrame to datetime objects, per the 0.18.0 documentation. Because index_col is not set to the C column, C is correctly left unparsed.
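
The behaviour described above can be checked with a short sketch. It assumes a pandas version in which parse_dates=True targets only the index, and it uses an illustrative `data` string with a zero-padded date rather than the exact CSV from the report.

```python
from io import StringIO

import pandas as pd

# Illustrative data (an assumption, not the exact CSV from the report).
data = "A,B,C\n1,2,2003-11-01\n"

# With index_col="C", parse_dates=True parses the index as datetimes.
indexed = pd.read_csv(StringIO(data), index_col="C", parse_dates=True)
index_is_datetime = isinstance(indexed.index[0], pd.Timestamp)

# Without index_col, C is an ordinary column and is left as a string.
plain = pd.read_csv(StringIO(data), parse_dates=True)
column_is_string = isinstance(plain["C"].iloc[0], str)

print(index_is_datetime, column_is_string)
```

To parse C as a regular column, pass it in a list, as in the parse_dates=["C"] examples from the original report.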

@jreback
Contributor

jreback commented Apr 17, 2016

parse_dates : boolean or list of ints or names or list of lists or dict, default False

    * boolean. If True -> try parsing the index.
    * list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
      each as a separate date column.
    * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as
      a single date column.
    * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result
      'foo'

So I think this should then have a nice message if a non-boolean scalar is passed, as it's not valid.

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 17, 2016
gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 18, 2016
Closes pandas-devgh-5636.

In addition, this commit also adds validation
to ensure that parse_dates is one of bool, list,
or dict.
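
A minimal sketch of the kind of check the commit message describes. The helper name `validate_parse_dates` and the error wording are hypothetical, chosen for illustration; they are not taken from the pandas source.

```python
def validate_parse_dates(parse_dates):
    """Hypothetical helper: accept only bool, list, or dict values."""
    if isinstance(parse_dates, (bool, list, dict)):
        return parse_dates
    # Non-boolean scalars such as "C" are rejected instead of being
    # silently treated like True.
    raise TypeError(
        "parse_dates must be a bool, list, or dict; "
        f"got {type(parse_dates).__name__}"
    )


# Valid forms pass through unchanged.
for value in (True, ["C"], [[1, 3]], {"foo": [1, 3]}):
    assert validate_parse_dates(value) is value

# A bare string is rejected with an explicit error.
try:
    validate_parse_dates("C")
except TypeError as exc:
    print(exc)
```

Raising early on a bare string makes the mistake visible to the caller, rather than returning an unparsed column as in the original report.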

3 participants