API: read_csv inconsistent with from_csv -- parses ints as dates #3418

Closed
darindillon opened this issue Apr 22, 2013 · 4 comments
Labels: Bug, IO CSV (read_csv, to_csv), Output-Formatting (__repr__ of pandas objects, to_string)

@darindillon

Using pandas 0.10.1.
I read the docs but didn't see any explanation for this. pandas.read_csv() works exactly as you'd expect, but pandas.DataFrame.from_csv() behaves differently. It looks like the latter method assumes you're probably dealing with time series data, so its default parameters automatically convert integers to dates. I disagree that this is desirable, but even if it is, why would it hold for the latter method but not the former? Why shouldn't both methods use the same defaults?

Create a CSV like this:
a,b
1,4
2,3

Now this does exactly what you'd expect:
p = pandas.read_csv(your_csv_file)

But this converts the first column into a date, which is almost certainly not what you'd expect:
p = pandas.DataFrame.from_csv(your_csv_file)

The second method has an optional parameter, parse_dates, which defaults to True. If you pass parse_dates=False, the second method stops converting that column to dates, like the first. But why the inconsistency? I'd expect this method to behave just like the other one by default.
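
For readers on a current pandas release, where DataFrame.from_csv no longer exists, the report can be approximated with read_csv alone. The sketch below assumes that from_csv in that era defaulted to index_col=0 and parse_dates=True; it only exercises the two calls whose results are unambiguous (plain read_csv, and read_csv with the first column as index and date parsing switched off). The CSV_TEXT name and the in-memory buffer are illustrative stand-ins for the file described above.

```python
import io

import pandas as pd

# The two-row CSV from the report, held in memory instead of a file.
CSV_TEXT = "a,b\n1,4\n2,3\n"

# Plain read_csv: a default integer index, and both columns stay integers.
plain = pd.read_csv(io.StringIO(CSV_TEXT))

# Approximation of from_csv with date parsing turned off (assumed defaults:
# index_col=0, parse_dates=True): the first column becomes the index and
# its values are left as integers.
indexed = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, parse_dates=False)

print(plain)
print(indexed)
```

Note that even with date parsing disabled the two results differ in shape, because the first column is used as the index in the second call.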

@jtratner
Contributor

jtratner commented Sep 5, 2013

Any resolution on this? Should we just change the default kwarg to parse_dates=False? (Granted, it's weird that even with parse_dates=True it interprets that column as dates.)

@jreback
Contributor

jreback commented Sep 5, 2013

I think we discussed this before

but I would just make DataFrame.from_csv a direct pass-through call to read_csv

I think the different defaults are confusing

only issue is there is no easy way to warn the user (aside from release notes) that the API has changed (to be read_csv)

not sure what is actually different though

(also, if you fix this, I think there is an issue about deprecating DataFrame.from_csv which could be closed)
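
As an illustration of the pass-through idea, here is a minimal sketch of a wrapper that forwards to read_csv and emits a runtime warning about the changed defaults. The function name from_csv_passthrough, the warning text, and the choice of FutureWarning are all hypothetical; nothing in this thread says pandas implemented it this way.

```python
import io
import warnings

import pandas as pd


def from_csv_passthrough(path, **kwargs):
    """Hypothetical stand-in for DataFrame.from_csv that defers to read_csv."""
    # Tell callers at runtime that read_csv defaults now apply.
    warnings.warn(
        "from_csv now uses read_csv defaults; call pandas.read_csv directly",
        FutureWarning,
        stacklevel=2,
    )
    # Forward every argument unchanged so behaviour matches read_csv exactly.
    return pd.read_csv(path, **kwargs)


# Capture the warning so the example runs quietly and can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame = from_csv_passthrough(io.StringIO("a,b\n1,4\n2,3\n"))

print(frame)
print([str(w.message) for w in caught])
```

A runtime warning like this reaches users who never read release notes, at the cost of noise for callers who already pass explicit arguments.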

@jtratner
Contributor

jtratner commented Sep 5, 2013

I have zero idea if it matters to change the from_csv API. The differences
seem strange enough that I'm not sure we have to be concerned that many
people are using it now.

@jreback
Contributor

jreback commented Sep 30, 2013

closing in favor of #4916

@jreback closed this as completed Sep 30, 2013