BUG: to_latex() output broken when the index has a name #10660

Closed
jakbaum opened this issue Jul 23, 2015 · 11 comments · Fixed by #11233

@jakbaum

jakbaum commented Jul 23, 2015

Hey folks,

I posted this on SO and was asked to file a report here as well.

I'm trying to export pandas.DataFrame.describe() to LaTeX using the to_latex() method. This works fine as long as I don't apply the groupby() method beforehand. With a grouped DataFrame, the first row has no values, even though its label is count. Note that in the IPython notebook, the first row of a grouped DataFrame is used to show the variable used for grouping.

I'm using pandas 0.16.2, python 3.
Is this a bug or am I doing something wrong?

Cheers,
Jakob

Here are some examples:

Without groupby:

\begin{tabular}{lr}
\toprule
{} &    IS\_FEMALE \\
\midrule
count &  2267.000000 \\
mean  &     0.384649 \\
...
...
75\%   &     1.000000 \\
max   &     1.000000 \\
\bottomrule
\end{tabular}


With groupby:

\begin{tabular}{llr}
\toprule
  &       &    IS\_FEMALE \\
\midrule
0 & count &              \\     % <-- note missing value here
  & mean &  1134.000000 \\
  & std &     0.554674 \\
...
...
  & 75\% &     0.000000 \\
  & max &     0.000000 \\
\bottomrule
\end{tabular}


Output in the notebook:

enter image description here

jorisvandenbossche added the Output-Formatting (__repr__ of pandas objects, to_string) label on Jul 23, 2015
@jorisvandenbossche
Member

Thanks for the report! Can you:

  • try to provide a small reproducible example? (so some code we can run that makes up a dummy dataframe and that reproduces the error)
  • check if it is an issue with groupby, or just with to_latex. For example, if you create a similar dataframe comparable to the output of the groupby by hand, and then export it to latex, do you experience the same error?

@jakbaum
Author

jakbaum commented Jul 23, 2015

Sure. This snippet re-creates the issue. Sorry for the messy DataFrame construction; it's the first time I've created one with numpy.

import pandas as pd
import numpy as np

cols = ['Group','Value']
group = np.random.randint(2, size=10)
values = np.random.random_sample(10)
df = pd.DataFrame([group, values]).T
df.columns = cols

print(df.groupby('Group').describe().to_latex())

I don't really know how to test your second point, to be honest. The first 'blank' row of a groupby is just visualization, I reckon?

@jorisvandenbossche
Member

Thanks for the reproducible example! That indeed triggers the error for me as well.

Here is an example with just a small dataframe that also shows the error. As such, it has nothing to do with groupby; groupby just creates a multi-index, which to_latex handles incorrectly:


In [22]: df = pd.DataFrame({'a':[0,0,1,1], 'b':list('abab'), 'c':[1,2,3,4]})

In [23]: df = df.set_index(['a', 'b'])

In [24]: df
Out[24]:
     c
a b
0 a  1
  b  2
1 a  3
  b  4

In [25]: print(df.to_latex())
\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &    \\
  & b &  1 \\
1 & a &  2 \\
  & b &  3 \\
\bottomrule
\end{tabular}

It seems that all values are shifted down by one row.
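
One way to check for this shift without eyeballing the table is to pull the data rows back out of the generated LaTeX and compare the last column with the values in the frame. The following is a minimal sketch using only the standard library. `tabular_body_rows` and `last_column` are hypothetical helper names (they are not part of pandas), and the parser assumes the simple booktabs layout shown above, with no escaped `&` inside cells and no trailing comments after `\\`.

```python
from typing import List


def tabular_body_rows(latex: str) -> List[List[str]]:
    """Return the cells of each row between the midrule and bottomrule lines."""
    rows = []
    in_body = False
    for line in latex.splitlines():
        stripped = line.strip()
        if stripped == r"\midrule":
            in_body = True
            continue
        if stripped == r"\bottomrule":
            break
        if in_body and stripped.endswith("\\\\"):
            # Drop the trailing row terminator, then split on column separators.
            cells = stripped[:-2].split("&")
            rows.append([cell.strip() for cell in cells])
    return rows


def last_column(latex: str) -> List[str]:
    """Return the last cell of every body row (the 'c' column in this example)."""
    return [row[-1] for row in tabular_body_rows(latex)]


# Output copied from the transcript above (named index levels).
broken = r"""\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &    \\
  & b &  1 \\
1 & a &  2 \\
  & b &  3 \\
\bottomrule
\end{tabular}"""

# The same table as it should look, with every value on its own row.
fixed = r"""\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &  1 \\
  & b &  2 \\
1 & a &  3 \\
  & b &  4 \\
\bottomrule
\end{tabular}"""

expected = ["1", "2", "3", "4"]  # the values of column c in the frame
print(last_column(broken) == expected)  # False: first cell is empty, 4 is missing
print(last_column(fixed) == expected)   # True
```

In practice the `expected` list would be built from the frame itself, for example from the string form of each value in the column, and the LaTeX text would come from `df.to_latex()`.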

@jorisvandenbossche
Member

It seems this has something to do with the index level names:

In [35]: df.index.names = [None, None]

In [36]: df
Out[36]:
     c
0 a  1
  b  2
1 a  3
  b  4

In [37]: print(df.to_latex())
\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &  1 \\
  & b &  2 \\
1 & a &  3 \\
  & b &  4 \\
\bottomrule
\end{tabular}

And possibly related: #9908

@jreback
Copy link
Contributor

jreback commented Jul 23, 2015

dupe of #2942?

jorisvandenbossche changed the title from ".groupby().to_latex() output broken" to "BUG: to_latex() output broken when the index has a name" on Jul 23, 2015
@jorisvandenbossche
Member

No, I don't think so, as this one does not only apply to a multi-index:

In [45]: df = pd.DataFrame({'a':list('abc'), 'b':[1,2,3]})

In [46]: df = df.set_index(['a'])

In [47]: df
Out[47]:
   b
a
a  1
b  2
c  3

In [49]: print(df.to_latex())
\begin{tabular}{lr}
\toprule
{} &  b \\
\midrule
a &    \\
a &  1 \\
b &  2 \\
c &  3 \\
\bottomrule
\end{tabular}

So it is something with the index name.
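
Since the transcripts above show correct output once the index levels are unnamed, one possible stopgap is to clear the index names on a copy of the frame before exporting. This is only a sketch of that idea, not a fix for the underlying bug: `without_index_names` is a hypothetical helper name, and the index header is lost in the exported table.

```python
import pandas as pd


def without_index_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index levels are all unnamed."""
    out = df.copy()
    # set_names returns a new index, so the original frame keeps its names.
    out.index = out.index.set_names([None] * out.index.nlevels)
    return out


# Same frame as in the transcript above: a single index named 'a'.
df = pd.DataFrame({'a': list('abc'), 'b': [1, 2, 3]}).set_index(['a'])
clean = without_index_names(df)
```

Calling `clean.to_latex()` instead of `df.to_latex()` should then take the same code path as the unnamed-index example earlier in the thread. The helper builds the list of names from `nlevels`, so it is meant to cover both a plain index and a multi-index.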

@jakbaum
Author

jakbaum commented Jul 23, 2015

Is the proposed fix of #9908 implemented in 0.16.2?

@jorisvandenbossche
Member

@jakbaum yes, it is already in 0.16.1. But it does not fix this one. It possibly fixed a related issue, but I should look into that in more detail.

And you are very welcome to look into the problem if you want! It shouldn't be that hard, I think.

@jreback
Contributor

jreback commented Jul 23, 2015

also #8336

@jakbaum
Author

jakbaum commented Jul 23, 2015

@jorisvandenbossche Your belief in my coding qualities honors me, but quite honestly: I don't think I'm capable of fixing this. I wouldn't even know how to start and I don't want to mess things up. Actually, I'm more of a copy-paste coder than anything else. :)

@jorisvandenbossche
Member

@jakbaum no problem, thanks for reporting it anyway!
