BUG in MultiIndex truncated repr with integer level names #15262

KevinBaudin · 2017-01-30T15:29:29Z

Reproducible example:

In [10]: df = pd.DataFrame({'col': range(9)}, index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1', 'B2']], names=[1,2]))

In [11]: df
Out[11]: 
       col
1  2      
A0 B0    0
   B1    1
   B2    2
A1 B0    3
   B1    4
   B2    5
A2 B0    6
   B1    7
   B2    8

In [12]: pd.options.display.max_rows = 4

In [13]: df
Out[13]: 
       col
1  2      
A0 A0    0
   A0    1
...    ...
A2 A2    7
   A2    8

[9 rows x 1 columns]

So the truncated repr incorrectly shows the first index level (with integer level name 1) again in place of the second level.

Original post:

Code Sample, a copy-pastable example if possible

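import pickle

import pandas  # must be importable so the pickled DataFrame can be loaded
import wget

url = 'https://www.dropbox.com/s/aldllo0bi3m3wkl/stock?dl=1'
filename = wget.download(url)

# Pickles are binary data: open in 'rb' mode and close the file afterwards
with open(filename, 'rb') as f:
    df = pickle.load(f)

df         # bad display: the second index level duplicates the first
df.head()  # expected display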

Problem description

		merged
1	2
a.	a.	2
abel	abel	1
agnes	agnes	2
alain	alain	8
alain	alain	2

I created a MultiIndex from two columns.
The two index levels are not displayed properly: level "2" shows the values of level "1".
When displaying up to the first 60 rows of the DataFrame the output is fine; beyond that, the values of level 1 are duplicated into level 2.

Expected Output

		merged
1	2
a.	masson-dubois	2
abel	pinchard	1
agnes	paquet	2
alain	corcia	8
alain	hudelot-noellat	2

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.4-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.11.2
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

jreback · 2017-01-30T15:34:29Z

pls show a copy-pastable example that doesn't rely on opening your file

KevinBaudin · 2017-01-30T15:40:34Z

Sorry, I didn't manage to find a simple reproducible example.
So far the problem has only appeared on this particular DataFrame.

PS: It's only a 22 KB file.

jreback · 2017-01-30T15:46:54Z

this is not reproducible. you can try .sort_index()

adrtod · 2017-01-30T16:04:18Z

I could reproduce it, but .sort_index() did not help.
The data do not seem to be corrupted; only the display is wrong.

jorisvandenbossche · 2017-01-31T00:00:50Z

@KevinBaudin The file is not available anymore?

KevinBaudin · 2017-01-31T10:36:12Z

@jorisvandenbossche edited with new link, sorry.

jorisvandenbossche · 2017-01-31T12:54:21Z

@KevinBaudin The cause of the issue is the index level names ([1, 2]). If you set them to something else, you will see that the issue is resolved:

In [6]: df
Out[6]: 
                                 merged
1               2                      
a.              a.                    2
abel            abel                  1
agnes           agnes                 2
...
[100 rows x 1 columns]

In [7]: df.index.names = ['a', 'b']

In [8]: df
Out[8]: 
                                   merged
a               b                        
a.              masson-dubois           2
abel            pinchard                1
agnes           paquet                  2
...
[100 rows x 1 columns]

The reason is the integer level names: an integer can be read either as a level number (0 for the first level, 1 for the second) or as a level name (1 and 2), and the two get confused.
So it seems that the repr used in .head() handles this distinction correctly, but the general repr does not.
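A minimal sketch of the ambiguity described above, assuming a pandas version that exposes `MultiIndex.codes` (0.24 or later; older versions call it `labels`). The helper `level_values_by_position` is hypothetical and not part of pandas; it only illustrates one way to address a level strictly by position so that integer names cannot interfere.

```python
import pandas as pd

# Integer level names, as in the report: position 0 is named 1, position 1 is named 2.
idx = pd.MultiIndex.from_product([['A0', 'A1'], ['B0', 'B1']], names=[1, 2])


def level_values_by_position(index, pos):
    """Return the values of the level at position `pos`, ignoring level names.

    Hypothetical helper for illustration only; relies on `levels` and `codes`.
    """
    return index.levels[pos].take(index.codes[pos])


# Ambiguous: 1 is both a valid level name and a valid level position,
# so the result depends on which interpretation the lookup prefers.
print(list(idx.get_level_values(1)))

# Unambiguous: always the second level, whatever the names are.
second_level = level_values_by_position(idx, 1)
print(list(second_level))
```

Renaming the levels to strings, as in `df.index.names = ['a', 'b']` above, sidesteps the ambiguity entirely.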

jorisvandenbossche · 2017-01-31T12:58:15Z

Smaller reproducible example:

In [10]: df = pd.DataFrame({'col': range(9)}, index=pd.MultiIndex.from_product([['A0', 'A1', 'A2'], ['B0', 'B1', 'B2']], names=[1,2]))

In [11]: df
Out[11]: 
       col
1  2      
A0 B0    0
   B1    1
   B2    2
A1 B0    3
   B1    4
   B2    5
A2 B0    6
   B1    7
   B2    8

In [12]: pd.options.display.max_rows = 4

In [13]: df
Out[13]: 
       col
1  2      
A0 A0    0
   A0    1
...    ...
A2 A2    7
   A2    8

[9 rows x 1 columns]

So it is the truncated repr that has this issue.
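The same check can be scripted without changing the global display options. This is only a sketch: it uses `pd.option_context` to limit `display.max_rows` temporarily and then looks for a second-level label in the truncated output. On an affected version the label `B0` would be missing; on a version with the fix it should be present.

```python
import pandas as pd

df = pd.DataFrame(
    {'col': range(9)},
    index=pd.MultiIndex.from_product(
        [['A0', 'A1', 'A2'], ['B0', 'B1', 'B2']], names=[1, 2]
    ),
)

max_rows_before = pd.get_option('display.max_rows')

# Force the truncated repr only inside this block; the option is restored on exit.
with pd.option_context('display.max_rows', 4):
    truncated = repr(df)

print(truncated)

# True when the second level is rendered with its own labels.
second_level_shown = 'B0' in truncated
print(second_level_shown)
```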

KevinBaudin · 2017-01-31T13:28:13Z

@jorisvandenbossche ❤️ 👍

Dr-Irv · 2017-02-10T22:45:59Z

I've started looking at this. Seems to be an issue in pd.concat():

In [2]: df = pd.DataFrame({'col': range(9)}, index=pd.MultiIndex.from_product([ ['A0', 'A1', 'A2'], ['B0', 'B1', 'B2']], names=[1,2]))

In [3]: df.iloc[:2,:]
Out[3]:
       col
1  2
A0 B0    0
   B1    1

In [4]: df.iloc[-2:,:]
Out[4]:
       col
1  2
A2 B1    7
   B2    8

In [5]: pd.concat((df.iloc[:2,:],df.iloc[-2:,:]))
Out[5]:
       col
1  2
A0 A0    0
   A0    1
A2 A2    7
   A2    8

That last result is incorrect. Should the name of this issue be changed? (@jorisvandenbossche)
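The concat reproduction above can be turned into a small regression check. The sketch below compares the index as plain tuples rather than through level lookups, so the comparison itself cannot be affected by the integer names. It assumes a pandas version that contains the fix; on an affected version the assertions would fail.

```python
import pandas as pd

df = pd.DataFrame(
    {'col': range(9)},
    index=pd.MultiIndex.from_product(
        [['A0', 'A1', 'A2'], ['B0', 'B1', 'B2']], names=[1, 2]
    ),
)

# Same operation as In [5]: first two rows followed by last two rows.
result = pd.concat((df.iloc[:2, :], df.iloc[-2:, :]))

# Iterating a MultiIndex yields one tuple per row, independent of level names.
index_tuples = list(result.index)
expected_tuples = [('A0', 'B0'), ('A0', 'B1'), ('A2', 'B1'), ('A2', 'B2')]

assert index_tuples == expected_tuples, index_tuples
assert list(result.index.names) == [1, 2]
assert list(result['col']) == [0, 1, 7, 8]
```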

jreback · 2017-02-17T19:55:10Z

I think this is a dupe of #12223.

If that is the case, use an example from there in the tests as well.

…n MultiIndex

closes pandas-dev#12223
closes pandas-dev#15262

Author: Dr-Irv <irv@princeton.com>

Closes pandas-dev#15478 from Dr-Irv/Issue15262 and squashes the following commits:

15d8433 [Dr-Irv] Address jreback comments
10667a3 [Dr-Irv] Fix types for test
8935068 [Dr-Irv] resolve conflicts
385ca3e [Dr-Irv] BUG: GH pandas-dev#12223, GH pandas-dev#15262. Allow ints for names in MultiIndex

jreback closed this as completed Jan 30, 2017

jreback added the Can't Repro label Jan 30, 2017

jorisvandenbossche reopened this Jan 31, 2017

jorisvandenbossche added Bug Output-Formatting __repr__ of pandas objects, to_string and removed Can't Repro labels Jan 31, 2017

jorisvandenbossche changed the title ~~Multi-Index doesn't display as expected using IPython~~ BUG in MultiIndex truncated repr with integer level names Jan 31, 2017

Dr-Irv mentioned this issue Feb 18, 2017

API: MultiIndex.get_level_values should have label only / level only mode? #10461

Open

Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Feb 22, 2017

BUG: GH pandas-dev#12223, GH pandas-dev#15262. Allow ints for names i…

0da6cc3

…n MultiIndex

Dr-Irv mentioned this issue Feb 22, 2017

BUG: GH #12223, GH #15262. Allow ints for names in MultiIndex #15478

Closed

4 tasks

Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Feb 23, 2017

BUG: GH pandas-dev#12223, GH pandas-dev#15262. Allow ints for names i…

385ca3e

…n MultiIndex

jreback added this to the 0.20.0 milestone Feb 23, 2017

jreback closed this as completed in 5955804 Feb 24, 2017
