Column Offset Bug with to_html(index=False) with MultiIndex Columns and Index #22579

Closed
simonjayhawkins opened this issue Sep 3, 2018 · 3 comments · Fixed by #22655
Labels
Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap MultiIndex

@simonjayhawkins
Member

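import pandas as pd
from IPython.display import HTML

# Four integer columns, ten rows.
df = pd.DataFrame({'a': range(10), 'b': range(10, 20), 'c': range(10, 20), 'd': range(10, 20)})
# MultiIndex on both axes: 2 x 2 column labels, 2 x 5 row labels.
df.columns = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])
df.index = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd', 'e', 'f', 'g']])
# Render without the index; the column headers come out offset.
HTML(df.to_html(index=False))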

Output:

a b
c d c d
0 10 10 10
1 11 11 11
2 12 12 12
3 13 13 13
4 14 14 14
5 15 15 15
6 16 16 16
7 17 17 17
8 18 18 18
9 19 19 19

Problem description

The code sample above is taken from #8452 but with a MultiIndex applied to the index as well as the columns. This seems to reproduce the column offset bug as seen in the original issue. The desired behaviour is observed if either one of the axes is a MultiIndex but not both.

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Sep 3, 2018

Thanks for the report. Just to be clear, this doesn't have anything to do with the IPython rendering. AFAICT, if both the index and columns are MultiIndex and you use index=False in to_html, the column headers of the output seem misaligned.

Investigation and PRs welcome
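
Since the problem is in the HTML string itself rather than in the notebook rendering, one way to check for it without IPython is to compare the width of the header rows with the width of the body rows. The following is only a minimal sketch using the standard library: `RowWidthCounter` and `row_widths` are hypothetical helper names, the sample table is hand-written for illustration and is not actual pandas output, and `rowspan` is deliberately ignored.

```python
from html.parser import HTMLParser


class RowWidthCounter(HTMLParser):
    """Collect the effective width (sum of colspans) of every table row."""

    def __init__(self):
        super().__init__()
        self.widths = []      # one entry per <tr>, in document order
        self._current = None  # running width of the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = 0
        elif tag in ("th", "td") and self._current is not None:
            # A cell with colspan="n" occupies n columns; default is 1.
            self._current += int(dict(attrs).get("colspan") or 1)

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.widths.append(self._current)
            self._current = None


def row_widths(html):
    """Return the column width of each row in an HTML table string."""
    counter = RowWidthCounter()
    counter.feed(html)
    counter.close()
    return counter.widths


# Hand-written illustration (NOT actual pandas output): the header rows carry
# one extra leading cell, so they are wider than the body row.
sample = """
<table>
  <thead>
    <tr><th></th><th colspan="2">a</th><th colspan="2">b</th></tr>
    <tr><th></th><th>c</th><th>d</th><th>c</th><th>d</th></tr>
  </thead>
  <tbody>
    <tr><td>0</td><td>10</td><td>10</td><td>10</td></tr>
  </tbody>
</table>
"""

print(row_widths(sample))  # [5, 5, 4]
```

Passing the string returned by `df.to_html(index=False)` to `row_widths` should give the same number for every row when the headers line up; differing numbers would point to an offset of the kind reported here.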

@WillAyd WillAyd added Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap MultiIndex labels Sep 3, 2018
@simonjayhawkins
Member Author

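On initial investigation of how index=False is handled, it appears that a MultiIndex index is also not handled quite right by the to_string method when max_cols is set. In the truncated output, the `...` marker ends up at the end of each row with index=False rather than in the middle: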

>>> import pandas as pd
>>> print(pd.__version__)
0.24.0.dev0+562.g996f361f8
>>> df = pd.DataFrame({'a': range(10), 'b': range(10,20), 'c': range(10,20), 'd': range(10,20)})
>>> df.columns = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])
>>> df.index = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd', 'e', 'f', 'g']])
>>> df = df.head(2)
>>> print(df.to_string(index=True))
     a       b
     c   d   c   d
a c  0  10  10  10
  d  1  11  11  11
>>>
>>> print(df.to_string(index=False))
a       b
c   d   c   d
0  10  10  10
1  11  11  11
>>>
>>> print(df.to_string(index=True, max_cols=3))
     a ...   b
     c ...   d
a c  0 ...  10
  d  1 ...  11
>>>
>>> print(df.to_string(index=False, max_cols=3))
a   b ...
c   d ...
0  10 ...
1  11 ...
>>>
>>> print(df.to_string(index=True, max_cols=2))
     a ...   b
     c ...   d
a c  0 ...  10
  d  1 ...  11
>>>
>>> print(df.to_string(index=False, max_cols=2))
a   b ...
c   d ...
0  10 ...
1  11 ...
>>>
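
To make the difference in the truncated output explicit, the position of the `...` marker can be read off the header lines directly. This is a small illustrative sketch, not part of pandas: `ellipsis_slot` is a hypothetical helper, and the two strings are the first header lines of the `max_cols=3` output copied from the session above.

```python
def ellipsis_slot(line, marker="..."):
    """Return (position, token_count) of the truncation marker in a text row."""
    tokens = line.split()
    return tokens.index(marker), len(tokens)


# First header line of each max_cols=3 output in the session above.
with_index = "     a ...   b"
without_index = "a   b ..."

print(ellipsis_slot(with_index))     # (1, 3): marker sits between the columns
print(ellipsis_slot(without_index))  # (2, 3): marker is pushed to the end
```

With index=True the marker sits between the first and last displayed columns, while with index=False it trails the row, which is consistent with the header being shifted by the hidden index levels.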

@simonjayhawkins
Member Author

Column misalignment also occurs with a standard index when the columns MultiIndex is named:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 2), dtype=int))
# Name both levels of the column MultiIndex; the row index stays the default.
df.columns = pd.MultiIndex.from_product(
    [['a'], ['b', 'c']], names=['columns.name.0', 'columns.name.1'])
df.to_html(index=False)

output:

columns.name.0 a
columns.name.1 b c
0 0
0 0

expected:

columns.name.0 a
columns.name.1 b c
0 0
0 0

@jreback jreback added this to the 0.24.0 milestone Nov 19, 2018
@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Dec 2, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 28, 2018
thoo added a commit to thoo/pandas that referenced this issue Jan 1, 2019
* upstream/master:
  BUG: output formatting with to_html(), index=False and/or index_names=False (pandas-dev#22579, pandas-dev#22747) (pandas-dev#22655)
  MAINT: Port _timelex in codebase (pandas-dev#24520)
  Implement unique+array parts of 24024 (pandas-dev#24527)
  Integer NA docs (pandas-dev#23617)