BUG: pandas accepts non-increasing MultiIndex header arguments #47011

Open
3 tasks done
ahawryluk opened this issue May 13, 2022 · 2 comments
@ahawryluk
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from io import StringIO
import pandas as pd

data = """a,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""
pd.read_csv(StringIO(data), header=[1, 0])

Issue Description

pandas accepts non-increasing MultiIndex header arguments, but does not parse them correctly. For instance, the snippet above produces

  NaN                    
    a a.1 a.2   b   c c.1
0   a   a   a   b   c   c
1   q   r   s   t   u   v
2   1   2   3   4   5   6
3   7   8   9  10  11  12

i.e., a DataFrame whose columns are

MultiIndex([(nan,   'a'),
            (nan, 'a.1'),
            (nan, 'a.2'),
            (nan,   'b'),
            (nan,   'c'),
            (nan, 'c.1')],
           )

Parsing the data with header=[0, 0] is also accepted, and behaves sensibly:

   a           b   c    
   a a.1 a.2   b   c c.1
0  q   r   s   t   u   v
1  1   2   3   4   5   6
2  7   8   9  10  11  12

but I can't see a reason to support redundant header levels.

Expected Behavior

I propose that non-increasing header arguments raise a ValueError('header elements must be increasing') or something to that effect.
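A minimal sketch of the kind of check proposed above. The function name check_header_increasing is hypothetical and this is not the pandas implementation. It assumes "increasing" means strictly increasing, so it also rejects the redundant header=[0, 0] case; whether that case should raise is left open in this issue.

```python
from typing import Sequence


def check_header_increasing(header: Sequence[int]) -> None:
    """Raise ValueError unless the header row numbers are strictly increasing.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Compare each row number with the one before it.
    for previous, current in zip(header, header[1:]):
        if current <= previous:
            raise ValueError("header elements must be increasing")


check_header_increasing([0, 1])  # accepted, returns None

for bad in ([1, 0], [0, 0]):
    try:
        check_header_increasing(bad)
    except ValueError as err:
        print(f"header={bad}: {err}")
```

A non-strict comparison (current < previous) would keep header=[0, 0] working for anyone who relies on it today.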

Installed Versions

INSTALLED VERSIONS

commit : 7c913d6
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-27-generic
Version : #28-Ubuntu SMP Thu Apr 14 04:55:28 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.5.0.dev0+786.g7c913d6b75
numpy : 1.21.5
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 62.0.0
Cython : 0.29.28
pytest : 7.1.1
hypothesis : 6.41.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.0
bottleneck : 1.3.4
brotli :
fastparquet : 0.8.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : 0.55.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : 1.1.4
pyxlsb : 1.0.9
s3fs : 2021.11.0
scipy : 1.8.0
snappy :
sqlalchemy : 1.4.35
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

@ahawryluk ahawryluk added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 13, 2022
@simonjayhawkins simonjayhawkins added the IO CSV read_csv, to_csv label May 15, 2022
@simonjayhawkins
Member

Thanks @ahawryluk for the report.

I propose that non-increasing header arguments raise a ValueError('header elements must be increasing') or something to that effect.

That makes sense; let's keep this issue labelled as a bug for now. The documentation should probably be updated too.

The alternative would be an enhancement that allows this, but the added code complexity is probably unnecessary when a simple swaplevel gives the expected result in this case.

pd.read_csv(StringIO(data), header=[0, 1]).swaplevel(axis=1)
   q  r  s   t   u   v
   a  a  a   b   c   c
0  1  2  3   4   5   6
1  7  8  9  10  11  12
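The swaplevel workaround can be generalised to any ordering of distinct header rows. The following is a rough user-side sketch, not a proposed pandas change: the helper name read_csv_with_header_order is hypothetical, and it assumes the header rows are unique. It reads the file with the header rows sorted, then reorders the column levels to match the requested order.

```python
from io import StringIO

import pandas as pd


def read_csv_with_header_order(source, header, **kwargs):
    """Read with ascending header rows, then reorder the column levels.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if len(set(header)) != len(header):
        raise ValueError("header elements must be unique")
    ascending = sorted(header)
    df = pd.read_csv(source, header=ascending, **kwargs)
    if len(header) < 2:
        return df  # a single header row needs no reordering
    # Position of each requested row within the ascending list.
    order = [ascending.index(row) for row in header]
    return df.reorder_levels(order, axis=1)


data = """a,a,a,b,c,c
q,r,s,t,u,v
1,2,3,4,5,6
7,8,9,10,11,12"""

df = read_csv_with_header_order(StringIO(data), header=[1, 0])
print(df.columns.tolist())
```

For header=[1, 0] this is equivalent to the swaplevel call above.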

but I can't see a reason to support redundant header levels.

Agreed. This should probably raise a ValueError too, although it would be a breaking change if any users currently depend on this behavior.

Contributions, PRs, and further investigation or suggestions are welcome.

@simonjayhawkins simonjayhawkins added MultiIndex and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 15, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone May 15, 2022
@ahawryluk
Contributor Author

take

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@ahawryluk ahawryluk removed their assignment Feb 3, 2023