BUG: to_latex outputs string with missing second index level values #14484

Closed
the-alleged-car opened this Issue Oct 24, 2016 · 3 comments

the-alleged-car commented Oct 24, 2016

I am generating a LaTeX table by calling to_latex() on a DataFrame that is indexed by a MultiIndex. Running the code snippet below produces incorrect output: two second-level index values are missing from the LaTeX table.

Code Snippet

import pandas as pd

outliers_lst = [(23240, 0),
                 (23240, 15),
                 (23240, 23),
                 (23240, 31),
                 (23240, 85),
                 (38661, 85),
                 (41231, 85),
                 (41231, 92),
                 (46371, 0)]

headers = (['max', 'EC 1', 'S'],
           ['max', 'EC 1', 'A'],
           ['max', 'EC 2', 'S'])

table = pd.DataFrame("",
                     index=pd.MultiIndex.from_tuples(sorted(outliers_lst)),
                     columns=pd.MultiIndex.from_tuples(headers))
table.to_latex(index=True, longtable=True, column_format='c' * 5).split('\n')

Incorrect Output

[u'\\begin{longtable}{cccccccccccccccccccccccccc}',
 u'\\toprule',
 u'      &    &  max &   &      \\\\',
 u'      &    & EC 1 &   & EC 2 \\\\',
 u'      &    &    S & A &    S \\\\',
 u'\\midrule',
 u'\\endhead',
 u'\\midrule',
 u'\\multicolumn{3}{r}{{Continued on next page}} \\\\',
 u'\\midrule',
 u'\\endfoot',
 u'',
 u'\\bottomrule',
 u'\\endlastfoot',
 u'23240 & 0  &      &   &      \\\\',
 u'      & 15 &      &   &      \\\\',
 u'      & 23 &      &   &      \\\\',
 u'      & 31 &      &   &      \\\\',
 u'      & 85 &      &   &      \\\\',
 u'38661 &    &      &   &      \\\\',
 u'41231 &    &      &   &      \\\\',
 u'      & 92 &      &   &      \\\\',
 u'46371 & 0  &      &   &      \\\\',
 u'\\end{longtable}',
 u'']

Correct Output

[u'\\begin{longtable}{cccccccccccccccccccccccccc}',
 u'\\toprule',
 u'      &    &  max &   &      \\\\',
 u'      &    & EC 1 &   & EC 2 \\\\',
 u'      &    &    S & A &    S \\\\',
 u'\\midrule',
 u'\\endhead',
 u'\\midrule',
 u'\\multicolumn{3}{r}{{Continued on next page}} \\\\',
 u'\\midrule',
 u'\\endfoot',
 u'',
 u'\\bottomrule',
 u'\\endlastfoot',
 u'23240 & 0  &      &   &      \\\\',
 u'      & 15 &      &   &      \\\\',
 u'      & 23 &      &   &      \\\\',
 u'      & 31 &      &   &      \\\\',
 u'      & 85 &      &   &      \\\\',
 u'38661 & 85 &      &   &      \\\\',
 u'41231 & 85 &      &   &      \\\\',
 u'      & 92 &      &   &      \\\\',
 u'46371 & 0  &      &   &      \\\\',
 u'\\end{longtable}',
 u'']

Note that in the correct output LaTeX strings, the rows with indices (38661, 85) and (41231, 85) correctly include the second index (the number 85), but in the incorrect LaTeX strings the rows do not include the number 85.

Could this be because the row (23240, 85) above (38661, 85) includes 85 in its second index?
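To make that hypothesis concrete, here is a minimal pure-Python sketch of a formatter that blanks consecutive repeats in each index level independently, without looking at the other levels. The helper name `blank_repeats_per_level` is made up for illustration, and this is an assumption about the kind of logic that could cause the symptom, not a copy of the pandas code.

```python
from itertools import groupby


def blank_repeats_per_level(index_tuples):
    """Blank consecutive repeats in every level, ignoring the other levels.

    Hypothetical illustration of the suspected faulty rule; not pandas code.
    """
    n_levels = len(index_tuples[0])
    columns = []
    for level in range(n_levels):
        values = [str(t[level]) for t in index_tuples]
        column = []
        # Each run of equal values keeps its first entry and blanks the rest.
        for value, run in groupby(values):
            run_length = len(list(run))
            column.extend([value] + [""] * (run_length - 1))
        columns.append(column)
    # Transpose the per-level columns back into rows.
    return [tuple(cells) for cells in zip(*columns)]


outliers_lst = [(23240, 0), (23240, 15), (23240, 23), (23240, 31),
                (23240, 85), (38661, 85), (41231, 85), (41231, 92),
                (46371, 0)]

for row in blank_repeats_per_level(outliers_lst):
    print(" & ".join(row))
```

With the index from the snippet above, the second-level run 85, 85, 85 spans three different first-level values, so this rule blanks the 85 in the (38661, 85) and (41231, 85) rows, which matches the incorrect output.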

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.6.None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

Member

jorisvandenbossche commented Oct 24, 2016

@the-alleged-car That indeed looks like a bug in the MultiIndex handling: a repeated value in one level should only be left blank when the values of the previous (higher) levels are also the same as in the row above. Thanks for the report!

Smaller reproducible example:

In [18]: df = pd.DataFrame(index=pd.MultiIndex.from_tuples([('A', 'c'), ('B', 'c')]), columns=['col'])

In [19]: print(df.to_latex())
\begin{tabular}{lll}
\toprule
  &   &  col \\
\midrule
A & c &  NaN \\
B &   &  NaN \\
\bottomrule
\end{tabular}
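The intended rule (blank a repeated value only when every higher level also matches the previous row) can be sketched in plain Python. This is only an illustration under that assumption: `sparsify_index` is a hypothetical helper, not the pandas implementation.

```python
def sparsify_index(index_tuples):
    """Blank a value only if it and all higher levels equal the previous row.

    Hypothetical illustration of the intended rule; not pandas code.
    """
    rows = []
    previous = None
    for current in index_tuples:
        cells = []
        # True while every level seen so far matches the previous row.
        prefix_matches = previous is not None
        for level, value in enumerate(current):
            prefix_matches = prefix_matches and previous[level] == value
            cells.append("" if prefix_matches else str(value))
        rows.append(tuple(cells))
        previous = current
    return rows


small_index = [("A", "c"), ("B", "c")]
for row in sparsify_index(small_index):
    print(" & ".join(row))
```

For the two-row example, the second row keeps its `c` because the first level changed from `A` to `B`. Note that in this sketch a fully duplicated index row would come out entirely blank; a real formatter may well treat that case differently.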

@jorisvandenbossche jorisvandenbossche changed the title from DataFrame.to_latex() outputs string with missing second index values to BUG: to_latex outputs string with missing second index level values Oct 24, 2016

Member

jorisvandenbossche commented Oct 24, 2016

@the-alleged-car If you want to take a look at how to fix it, a contribution is always welcome!

enriquefernandez commented Oct 28, 2017

This just bit me as well.
Any known workarounds for the moment?

gfyoung added a commit to gfyoung/pandas that referenced this issue Dec 8, 2017

@jreback jreback added this to the 0.21.1 milestone Dec 8, 2017

jreback added a commit that referenced this issue Dec 8, 2017

BUG: LatexFormatter.write_result multi-index (#18685)
* BUG: LatexFormatter.write_result multi-index

Fixed GH issue 14484:
`LatexFormatter.write_result` now does not print blanks if a
higher-order index differs from the previous row.
Also added testcase for this.

* MAINT: Address reviewer comments

Closes gh-14484
Closes gh-17499

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017


TomAugspurger added a commit that referenced this issue Dec 11, 2017
