read_excel() modifies provided types dict when accessing file with duplicate column #42462

Closed
2 of 3 tasks
cdol opened this issue Jul 9, 2021 · 1 comment · Fixed by #42508
Labels: Bug, IO Excel, Regression

Comments


cdol commented Jul 9, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

test.xlsx:

| a | a | b  | c  |
|---|---|----|----|
| 1 | 1 | b1 | c1 |
| 2 | 2 | b2 | c2 |
| 3 | 3 | b3 | c3 |
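```python
import pandas as pd

# Every column is requested as str; test.xlsx has two columns named "a".
types_dict = {'a': str,
              'b': str,
              'c': str,
              }

if __name__ == "__main__":
    df = pd.read_excel('./test.xlsx', dtype=types_dict)
    # After the call, the caller's dict contains an extra entry for the duplicate column.
    print(list(types_dict.keys()))
    # >> ['a', 'b', 'c', 'a.1']
```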

Bug/Issue description:
When an .xlsx file that contains a duplicate column is loaded into a DataFrame with the dtype argument, read_excel() modifies the provided types_dict in place, adding entries for the duplicate columns (here 'a.1').

It seems to me that this modification of types_dict is an unwanted side effect.
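The printed keys show that the caller's dtype mapping was extended in place with the renamed duplicate column. A minimal sketch of a side-effect-free alternative is below: build the extended mapping on a copy. It assumes the simple `name`, `name.1`, `name.2` renaming visible in the output above (pandas' real mangling also handles collisions such as an existing `a.1` column, which this ignores). `mangle_duplicate_columns` and `expand_dtypes` are hypothetical helpers for illustration, not pandas internals.

```python
from typing import Dict, List, Mapping


def mangle_duplicate_columns(header: List[str]) -> List[str]:
    """Rename repeated column names as name, name.1, name.2, ...

    Simplified assumption: collisions with existing names like 'a.1' are ignored.
    """
    seen: Dict[str, int] = {}
    result = []
    for name in header:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result


def expand_dtypes(dtype: Mapping[str, type], header: List[str]) -> Dict[str, type]:
    """Return a NEW dtype mapping that also covers renamed duplicate columns.

    The caller's mapping is only read, never written, so nothing leaks out.
    """
    expanded = dict(dtype)  # work on a copy, not the caller's dict
    for original, mangled in zip(header, mangle_duplicate_columns(header)):
        if mangled != original and original in dtype:
            expanded.setdefault(mangled, dtype[original])
    return expanded


if __name__ == "__main__":
    types_dict = {"a": str, "b": str, "c": str}
    header = ["a", "a", "b", "c"]  # header row of test.xlsx
    print(expand_dtypes(types_dict, header))
    print(list(types_dict.keys()))  # ['a', 'b', 'c']: the caller's dict is unchanged
```

The reader still gets a dtype for `a.1`, but it lives in a private dict, so the caller's `types_dict` keeps exactly the keys it was given.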

lithomas1 added the Bug, IO Excel, and Needs Triage labels on Jul 9, 2021
mzeitlin11 (Member) commented:

Thanks for reporting this @cdol! Confirmed on master (it seems limited to read_excel; a similar example with read_csv does not show the problem), and I agree it is an unwelcome side effect. Fixes very welcome!
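A regression check for this kind of bug only needs to snapshot the dtype dict before the call and compare afterwards. The sketch below shows that pattern; `assert_dtype_not_mutated`, `buggy_reader`, and `fixed_reader` are hypothetical names, and the two readers are toy stand-ins that mimic the reported behaviour rather than pandas code.

```python
import copy
from typing import Any, Callable, Dict, List


def assert_dtype_not_mutated(reader: Callable[..., Any], dtype: Dict[str, Any], **kwargs: Any) -> Any:
    """Call reader(dtype=dtype, **kwargs) and fail if the dtype dict was modified."""
    snapshot = copy.deepcopy(dtype)  # deepcopy leaves type objects such as str as-is
    result = reader(dtype=dtype, **kwargs)
    if dtype != snapshot:
        raise AssertionError(f"dtype was modified: {snapshot!r} -> {dtype!r}")
    return result


# Hypothetical stand-ins for a reader, used only to exercise the check.
def buggy_reader(dtype: Dict[str, Any], header: List[str]) -> List[str]:
    dtype["a.1"] = dtype["a"]  # extends the caller's dict: the reported side effect
    return header


def fixed_reader(dtype: Dict[str, Any], header: List[str]) -> List[str]:
    local = dict(dtype)  # extends a private copy instead
    local["a.1"] = local["a"]
    return header


if __name__ == "__main__":
    header = ["a", "a", "b", "c"]
    assert_dtype_not_mutated(fixed_reader, {"a": str, "b": str, "c": str}, header=header)
    try:
        assert_dtype_not_mutated(buggy_reader, {"a": str, "b": str, "c": str}, header=header)
    except AssertionError as err:
        print(err)
```

Against pandas itself, the same helper could wrap the real call, e.g. `assert_dtype_not_mutated(pd.read_excel, types_dict, io="./test.xlsx")`, since the first parameter of `read_excel` is named `io`. On affected versions, passing a copy such as `dtype=dict(types_dict)` keeps the caller's dict intact.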

mzeitlin11 removed the Needs Triage label on Jul 10, 2021
mzeitlin11 added this to the Contributions Welcome milestone on Jul 10, 2021
jreback changed the milestone from Contributions Welcome to 1.4 on Aug 4, 2021
phofl added the Regression label on Aug 4, 2021
phofl changed the milestone from 1.4 to 1.3.2 on Aug 4, 2021
5 participants