pivot can create dataframe with duplicate columns #13994
Comments
Haven't checked if it is the exact same cause, but there has been some previous discussion on this topic which is probably relevant:
Yup, I'd say that this is a duplicate of #11663. FWIW, tomorrow I have all day free to do Polars work, so I'll do a full immersion into this. Hope I can come up with a way forward 🤞
@MarcoGorelli Great. I think we should first agree on a path forward, though.
I think the best way forward is to use the first format for everything by default (it avoids duplicate column names, but always produces messy-looking names), and also to provide an output naming strategy. I'm not entirely sure what the best way to implement the naming strategy would be, so that is something to think about.
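One way to read the "output naming strategy" idea is as a callable that maps each (values column, `columns` column, unique value) triple to an output name, with the always-qualified format as the default and a check that rejects collisions instead of producing duplicate columns. This is a plain-Python sketch for discussion only: `NameFn`, `default_name`, and `build_names` are hypothetical names, not part of the Polars API, and the `_` separator and name order are assumptions.

```python
from typing import Callable, Sequence

# Hypothetical type for a naming strategy (not a Polars API):
# (values column, `columns` column, unique value) -> output column name
NameFn = Callable[[str, str, object], str]


def default_name(value_col: str, column_col: str, unique_value: object) -> str:
    # Always-qualified default: keeps both the values and columns names, so the
    # same value appearing in two different `columns` columns gets two distinct
    # names (assumed "_" separator).
    return f"{value_col}_{column_col}_{unique_value}"


def build_names(
    combos: Sequence[tuple[str, str, object]],
    name_fn: NameFn = default_name,
) -> list[str]:
    # Apply the strategy to every output column, then refuse duplicates
    # rather than silently returning repeated names.
    names = [name_fn(v, c, u) for v, c, u in combos]
    seen: set[str] = set()
    for name in names:
        if name in seen:
            raise ValueError(f"naming strategy produced duplicate column name {name!r}")
        seen.add(name)
    return names


combos = [("val", "x", "b"), ("val", "y", "b")]
print(build_names(combos))  # ['val_x_b', 'val_y_b']
```

Passing a strategy that keeps only the unique value, e.g. `build_names(combos, lambda v, c, u: str(u))`, reproduces the kind of collision described in this issue and raises `ValueError` instead of returning `['b', 'b']`.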
Duplicate of #11663
Issue description
When multiple columns are passed to `columns` and they contain overlapping values, and there is only one `values` column, duplicate column names will arise.

Note that when there are multiple `values` columns, the `values` and `columns` names are included as part of the output column names.

However, if there is only a single `values` column, the new column names use only the unique values found in the `columns` columns. Thus, if any value is common to more than one of the `columns` columns, we end up with a duplicate column name.

Expected behavior
We should probably always include the `values` and `columns` names, and perhaps, in a separate issue, allow for a name generator.
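To make the difference concrete, here is a plain-Python model of the two naming rules described above: the current single-`values` behavior (bare unique values) versus the proposal to always include the `values` and `columns` names. It does not call Polars or `pivot`; `pivot_output_names` and `duplicated` are hypothetical helpers, and the `_` separator and name order are assumptions chosen only to show how the duplicate arises.

```python
from collections import Counter


def pivot_output_names(values, columns, unique_values, always_qualify, sep="_"):
    """Hypothetical model of pivot output naming (not the Polars implementation).

    values:         names of the `values` columns
    columns:        names of the `columns` columns
    unique_values:  maps each `columns` column to its unique values
    always_qualify: True models the proposal; False models the
                    single-`values` behavior described in the issue
    """
    # Multiple `values` columns are always qualified, per the issue text.
    qualify = always_qualify or len(values) > 1
    names = []
    for v in values:
        for c in columns:
            for u in unique_values[c]:
                names.append(sep.join([v, c, str(u)]) if qualify else str(u))
    return names


def duplicated(names):
    # Names that occur more than once in the output, sorted.
    return sorted(n for n, k in Counter(names).items() if k > 1)


# "b" occurs in both `columns` columns, x and y
uniques = {"x": ["a", "b"], "y": ["b", "c"]}

current = pivot_output_names(["val"], ["x", "y"], uniques, always_qualify=False)
proposed = pivot_output_names(["val"], ["x", "y"], uniques, always_qualify=True)
print(current, duplicated(current))    # ['a', 'b', 'b', 'c'] ['b']
print(proposed, duplicated(proposed))  # ['val_x_a', 'val_x_b', 'val_y_b', 'val_y_c'] []
```

With a single `values` column the bare names collide on `b`; qualifying every name keeps them distinct, at the cost of the longer, messier names mentioned in the thread.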