read_csv not correctly parsing dates when parse_dates is string and index_col not set #5636

Closed
cancan101 opened this Issue Dec 3, 2013 · 3 comments

Contributor

cancan101 commented Dec 3, 2013

With:

csv="""A,B,C
1,2,2003-11-1
"""

These all work as expected:

In [40]: pd.read_csv(StringIO(csv), parse_dates="C",index_col="C").index[0]

Out[40]: Timestamp('2003-11-01 00:00:00', tz=None)

In [41]: pd.read_csv(StringIO(csv), parse_dates=["C"],index_col="C").index[0]

Out[41]: Timestamp('2003-11-01 00:00:00', tz=None)

In [42]: pd.read_csv(StringIO(csv), parse_dates=["C"]).C[0]

Out[42]: Timestamp('2003-11-01 00:00:00', tz=None)

but this does not parse the string:

In [39]: pd.read_csv(StringIO(csv), parse_dates="C",).C[0]

Out[39]: '2003-11-1'
Contributor

jreback commented Dec 3, 2013

I don't think parse_dates can be a string, so I'm not sure this should work at all (I think your first example works because 'C' is truthy). Will mark as a bug though.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

Member

gfyoung commented Apr 17, 2016

@jreback : I think this issue can be closed. The "buggy" example fails as expected because parse_dates="C" is treated the same as parse_dates=True, which tries to convert the index of the resulting DataFrame to datetime objects, per the 0.18.0 documentation. Since the index is not set to the C column, C is correctly left unparsed.
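
For anyone landing here, a minimal sketch of the two spellings from the original report that do parse column C, using the same CSV. It assumes Python 3 (`io.StringIO`) and a reasonably recent pandas; the variable names `as_column` and `as_index` are only illustrative.

```python
from io import StringIO

import pandas as pd

csv = """A,B,C
1,2,2003-11-1
"""

# List form: parse column C and leave it as a regular column.
as_column = pd.read_csv(StringIO(csv), parse_dates=["C"])

# Boolean form: True means "try parsing the index",
# so C has to be selected as the index for it to be parsed.
as_index = pd.read_csv(StringIO(csv), parse_dates=True, index_col="C")

print(as_column["C"].iloc[0])
print(as_index.index[0])
```

The list form is the one to use when the date column should not become the index.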

Contributor

jreback commented Apr 17, 2016

parse_dates : boolean or list of ints or names or list of lists or dict, default False

    * boolean. If True -> try parsing the index.
    * list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
      each as a separate date column.
    * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as
      a single date column.
    * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result
      'foo'

So I think this should then raise a nice message if a non-boolean scalar is passed, as it's not valid.

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 17, 2016

gfyoung added a commit to gfyoung/pandas that referenced this issue Apr 18, 2016

BUG: Enforce parse_dates as bool when scalar
Closes gh-5636.

In addition, this commit also adds validation
to ensure that parse_dates is one of bool, list,
or dict.
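
As a rough illustration of the check this commit message describes, here is a sketch in plain Python. It is not the code from the commit: `validate_parse_dates` is a hypothetical helper name, and the error wording is made up for the example.

```python
def validate_parse_dates(parse_dates):
    """Hypothetical helper: accept only bool, list, or dict.

    Any other value, including a string such as "C", is rejected
    instead of being silently treated as truthy.
    """
    if isinstance(parse_dates, (bool, list, dict)):
        return parse_dates
    raise TypeError(
        "parse_dates must be a bool, list, or dict; got "
        f"{type(parse_dates).__name__}"
    )


# Valid forms pass through unchanged.
validate_parse_dates(True)
validate_parse_dates(["C"])
validate_parse_dates({"foo": [1, 3]})

# A bare string is rejected up front.
try:
    validate_parse_dates("C")
except TypeError as exc:
    print(exc)
```

With a check like this, the call from the original report, parse_dates="C", would fail loudly rather than return an unparsed string.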

@jreback jreback closed this in fe8f8f4 Apr 19, 2016
