Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: read_excel() doesn't handle Multiindex correctly #47487

Closed
2 of 3 tasks
0nelight opened this issue Jun 23, 2022 · 0 comments · Fixed by #47496
Closed
2 of 3 tasks

BUG: read_excel() doesn't handle Multiindex correctly #47487

0nelight opened this issue Jun 23, 2022 · 0 comments · Fixed by #47496
Labels
Bug IO Excel read_excel, to_excel
Milestone

Comments

@0nelight
Copy link

0nelight commented Jun 23, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# Could not provide a copy-pastable Reproducible Example because of empty line bug in ".to_excel" https://github.com/pandas-dev/pandas/issues/27772
# Please use either Screenshot or attached Excel-File.

import pandas as pd

df = pd.read_excel('sample.xlsx', index_col=[0, 1, 2], header=[0, 1, 2])

print(df.index)

MultiIndex([('A', 'AA', 'AAA'),
            (nan, 'BB', 'BBB'),
            (nan, 'CC', 'CCC'),
            (nan, 'DD', 'DDD'),
            (nan, 'EE', 'EEE')],
           )

Issue Description

When importing a Excel-XSLX File (sample.xlsx) into pandas with .read_excel() the Multiindex is not handled properly.

Instead of copying the value of the first Index-Column-Cell into the Cells beneath it, it is pasting "NAN"s there.
This also doesn't change when merging the area to a merged cell.

Example Data in Excel/Calc:
pandas_bug

Expected Behavior

It should copy the first Value in the Index-Column into the cells beneath them:

    print(df.index)

    MultiIndex([('A', 'AA', 'AAA'),
                ('A', 'BB', 'BBB'),
                ('A', 'CC', 'CCC'),
                ('A', 'DD', 'DDD'),
                ('A', 'EE', 'EEE')],
               )

When commenting out the changes made with Issue:34673 it is working correctly:

# GH34673: if MultiIndex names present and not defined in the header,
# offset needs to be incremented so that forward filling starts
# from the first MI value instead of the name
if has_index_names:
offset += 1

Installed Versions

'1.4.2'
@0nelight 0nelight added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 23, 2022
@phofl phofl added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 24, 2022
@jreback jreback added this to the 1.5 milestone Jul 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants