index=False by default when exporting dataframes #34576

Closed
JavierLTPromofarma opened this issue Jun 4, 2020 · 7 comments

Comments

@JavierLTPromofarma

Is your feature request related to a problem?

Every single time I have exported a dataframe (usually with .to_csv), I have not needed the index.

Describe the solution you'd like

Change the default from True to False.

API breaking implications

All code that relies on the current default without explicitly setting index=True would need to set it.

Describe alternatives you've considered

Leave things as they are now.

Is not using the index common, or am I biased by my own experience?

@JavierLTPromofarma added the Enhancement and Needs Triage labels on Jun 4, 2020
@TomAugspurger
Contributor

We can leave this open for a bit, but I don't think changing this is feasible. In my experience, index=True is the appropriate default, and not using a meaningful index is a bit of an anti-pattern.

This overlaps with an ongoing discussion about an "optional index". I wasn't able to turn up the issue in a quick search.

@TomAugspurger added the API Design label and removed the Enhancement and Needs Triage labels on Jun 4, 2020
@jreback
Contributor

jreback commented Jun 4, 2020

Agree with @TomAugspurger here.
Even if I agreed with the proposal (I don't), this is impossible to change because of backward compatibility.

Note this is why we had a DataFrame.from_csv method with different reading defaults, but that was just confusing and has since been removed.

@jorisvandenbossche
Member

I personally agree with @JavierLTPromofarma that the default of index=True is not always ideal, but it's also something that is not really easy / realistic to change.

I think a non-meaningful index is not an anti-pattern. Basically whenever you don't explicitly set one of your columns as the index, you often have a meaningless index. And unless you want to take advantage of easier indexing by setting your column as the index (or from timeseries functionality), setting a column as the index is not always that important.

What we did in GeoPandas' to_file method is change the default to index=None and then "infer" whether to write it: only write the index if it is a non-integer index or if it has a name. I personally think this is a better default. It will of course guess wrong in some cases, and then you can explicitly set index to True or False, but I think it will be the better choice in the majority of cases.
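The inference rule described above can be sketched in a few lines. This is only an illustration of the stated rule (write the index if it is non-integer or has a name), not GeoPandas' actual implementation; the helper name `should_write_index` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def should_write_index(df: pd.DataFrame) -> bool:
    """Hypothetical helper: guess whether the index carries information.

    Returns True if the index has a name or is not integer-typed,
    mirroring the rule described in the comment above.
    """
    has_name = any(name is not None for name in df.index.names)
    is_integer = is_integer_dtype(df.index.dtype)
    return has_name or not is_integer


# A default RangeIndex: unnamed and integer, so it would be skipped.
default_df = pd.DataFrame({"A": [1, 2]})

# The same data with a named index: the name suggests it is meaningful.
named_df = default_df.rename_axis("id")

# A string index: non-integer, so it would be written.
labelled_df = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])

for frame in (default_df, named_df, labelled_df):
    # Pass the inferred value explicitly instead of relying on the default.
    csv_text = frame.to_csv(index=should_write_index(frame))
```

A caller who disagrees with the guess can still pass index=True or index=False directly, which is the escape hatch mentioned above.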

Now, the way we can change this is probably rather through the "optional" index idea, as Tom mentioned. We have an old issue about it here: wesm/pandas2#17.

@roschly

roschly commented Sep 7, 2020

From a UX standpoint the default is less than ideal. The fact that a simple to_csv/read_csv doesn't produce the same dataframe by default is not intuitive.

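import pandas as pd

df = pd.DataFrame(columns=["A", "B"], data=[[1, 2], [3, 4]])
df.to_csv("tmp.csv")  # index=True by default, so the index is written as an unnamed first column
df2 = pd.read_csv("tmp.csv")  # the unnamed column comes back as a regular data column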

df:

   A  B
0  1  2
1  3  4

df2:

   Unnamed: 0  A  B
0           0  1  2
1           1  3  4

This has caused me headaches before when swapping dataframes created at runtime with dataframes read from CSV. Passing index=False is an easy fix, but it is just as easy to forget.
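For reference, here is a minimal sketch of two ways to make the round trip above symmetric: skip the index on write, or read the first column back as the index. It uses an in-memory buffer instead of tmp.csv so it leaves no file behind; the variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame(columns=["A", "B"], data=[[1, 2], [3, 4]])

# Option 1: do not write the index at all.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)

# Option 2: keep the default (index written) and tell read_csv
# that the first column is the index.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf, index_col=0)

print(list(no_index.columns))
print(list(with_index.columns))
```

Either way the "Unnamed: 0" column no longer appears; which option fits depends on whether the index carries information worth keeping.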

@mroeschke
Member

Thanks for the suggestion, but it appears unlikely that index=False would be an easy change for pandas to make due to compatibility. Since having an optional index is covered in another issue, closing.

@DanTaranis

I have been bothered by this for years, just not enough to do anything about it. But I think the community as a whole would much, much rather have False be the default. Only on rare occasions is the index a meaningful column.

Is there a way to ask the community - like a vote or something?

@pieterseeder

@DanTaranis for president.

8 participants