ENH: multiindex formatting #12929

jreback · 2016-04-19T20:43:30Z

In [1]: pd.options.display.width=80

In [2]: pd.options.display.max_seq_items=10

In [3]: pd.options.display.max_seq_items=100

In [4]: mi = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
   ...:                                 ('B', 3), ('B', 4)])

In [5]: mi
Out[5]: 
MultiIndex(levels=[[u'A', u'B'],
                   [1, 2, 3, 4]],
           labels=[[0, 0, 1, 1],
                   [0, 1, 2, 3]])

In [6]: mi = MultiIndex.from_product([list('abcdefg'),
   ...:                               range(10),
   ...:                               pd.date_range('20130101', periods=10)],
   ...:                              names=['first', 'second', 'third'])

In [7]: mi
Out[7]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
                    ...
                    7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

jreback · 2016-04-19T20:44:12Z

@TomAugspurger @jorisvandenbossche @sinhrks @shoyer @wesm

hmm, I see a double comma already........

jreback · 2016-04-19T20:45:57Z

could probably squeeze name,dtype in for each level I think.

jreback · 2016-04-22T19:25:10Z

comments?

jorisvandenbossche · 2016-04-22T19:30:54Z

Something like

           labels=[[0, 0, 0, 0, 0, ... 6, 6, 6, 6, 6],

instead of

           labels=[[0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6],

would maybe be clearer?

But, in any case, following max_seq_items is a good improvement
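
For illustration, the single-line form suggested above could be produced along these lines. This is only a sketch: `format_truncated_one_line` and its `n_edge` parameter are hypothetical names, not part of the pandas formatter.

```python
def format_truncated_one_line(values, n_edge=5):
    """Render a sequence on one line, eliding the middle with '...'.

    Hypothetical helper for illustration only; assumes n_edge >= 1.
    """
    values = list(values)
    # Short sequences are shown in full, with no ellipsis.
    if len(values) <= 2 * n_edge:
        return "[" + ", ".join(repr(v) for v in values) + "]"
    head = ", ".join(repr(v) for v in values[:n_edge])
    tail = ", ".join(repr(v) for v in values[-n_edge:])
    # Keep head, ellipsis, and tail on the same line.
    return "[" + head + ", ... " + tail + "]"


labels = [0] * 10 + [6] * 10
print(format_truncated_one_line(labels))
```

With the sample `labels` above this prints `[0, 0, 0, 0, 0, ... 6, 6, 6, 6, 6]`.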

jorisvandenbossche · 2016-04-22T19:33:46Z

In the example you pasted above (copied from there):

In [11]: mi
Out[11]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd'],
                   [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2,
                    2, 3, 3, 3],
                   [0, 1, 2, 0, 1, 2, 0, 1,
                    2, 0, 1, 2]],
           names=[u'first', u'second'])

Is there a reason the individual labels lists break off and do not fit on one line? It seems they should fit on one line.

jreback · 2016-04-22T19:35:12Z

I think that last might be a bug. I have to twiddle the justification settings.

jreback · 2016-04-22T20:02:49Z

So I think there is a bug in the current formatter: it basically hard-codes 10 as the truncated width. Here is a modification that uses half of max_seq_items, bounded between 10 and 50 so it has some flex (this is of course limited by display.width as well).

In [9]: pd.options.display.max_seq_items=50

In [10]: Index(list(range(100)))
Out[10]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
            ...
            75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
           dtype='int64', length=100)

In [3]: pd.options.display.max_seq_items=30

In [4]: Index(list(range(100)))
Out[4]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
            ...
            85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)

In [5]: pd.options.display.max_seq_items=20

In [6]: Index(list(range(100)))
Out[6]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
            ...
            90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)

In [7]: pd.options.display.max_seq_items=10

In [8]: Index(list(range(100)))
Out[8]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
            ...
            90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)
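
The rule described above (half of max_seq_items, clamped to between 10 and 50) can be sketched as follows. The function names are hypothetical and this is not the actual patch; it ignores the additional limit imposed by display.width and assumes the sequence is already known to need truncation.

```python
def edge_item_count(max_seq_items, lower=10, upper=50):
    """Number of items to show at each end of a truncated sequence.

    Hypothetical sketch: half of max_seq_items, clamped to [lower, upper].
    """
    return max(lower, min(upper, max_seq_items // 2))


def head_and_tail(values, max_seq_items):
    """Split values into the leading and trailing items to display."""
    values = list(values)
    n = edge_item_count(max_seq_items)
    # Assumption: if both ends would overlap, show everything as the head.
    if len(values) <= 2 * n:
        return values, []
    return values[:n], values[-n:]


for setting in (50, 30, 20, 10):
    head, tail = head_and_tail(range(100), setting)
    print(setting, len(head), head[-1], tail[0])
```

For the four settings shown in the session this gives 25, 15, 10 and 10 items per end, which agrees with the number of values printed on each side of the `...` in the outputs above.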

jreback · 2016-04-22T20:22:25Z

So with the above change (and allowing labels and levels to have different max_seq_items, though still limited to the 10 to 50 range):

In [1]: pd.options.display.width=200

In [2]: pd.options.display.max_seq_items=50

In [3]: mi = MultiIndex.from_product([list('abcdefg'), range(10), pd.date_range('20130101',periods=10)],
                                     names=['first', 'second', 'third'])

In [4]: mi
Out[4]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
                    ...
                    7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

In [5]: pd.options.display.max_seq_items=30

In [6]: mi
Out[6]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09',
                    '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
                    ...
                    8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

max-sixty · 2016-04-22T21:09:56Z

One option where there are a lot of levels could be to display them like a DataFrame. Although that's a fair distance from where we are now

jreback · 2016-04-26T17:57:37Z

ok updated a bit. The above is with the default options (though width is 80)

closes pandas-dev#12423

jreback · 2016-04-26T19:16:05Z

@sinhrks

if you have a chance, can you see if these changes look reasonable for the truncation / formatting in categoricals?

[Tue Apr 26 15:11:58 ~/pandas]$ nosetests  pandas/tests/indexes/test_multi.py pandas/tests/indexes/test_category.py pandas/tests/indexes/test_base.py 
..............................................................................................................................................................................................F................................................................................................................F............
======================================================================
FAIL: test_string_categorical_index_repr (pandas.tests.indexes.test_category.TestCategoricalIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/indexes/test_category.py", line 638, in test_string_categorical_index_repr
    self.assertEqual(unicode(idx), expected)
AssertionError: u"CategoricalIndex([u'\u3042', u'\u3044\u3044', u'\u3046\u3046\u3046', u'\u3042' [truncated]... != u"CategoricalIndex([u'\u3042', u'\u3044\u3044', u'\u3046\u3046\u3046', u'\u3042' [truncated]...
Diff is 711 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_string_index_repr (pandas.tests.indexes.test_base.TestIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/indexes/test_base.py", line 1465, in test_string_index_repr
    self.assertEqual(coerce(idx), expected)
AssertionError: u"Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a',\n  [truncated]... != u"                Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb',  [truncated]...
Diff is 670 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 316 tests in 5.779s

FAILED (failures=2)

A couple of failing tests. I am not sure which parameters were originally used to generate the expected output. I have hard-coded the options in a context manager, but I am doing something wrong.

jreback · 2016-08-26T20:24:31Z

closing, but maybe someone can pick this up.

jorisvandenbossche · 2016-08-27T12:48:52Z

It would also depend a bit on what we do with #13480 (issue about changing the default multi-index repr)

jreback added the Output-Formatting (__repr__ of pandas objects, to_string) and Compat (pandas objects compatibility with Numpy or Python functions) labels Apr 19, 2016

jreback added this to the 0.18.1 milestone Apr 19, 2016

jreback force-pushed the trunc_mi branch 6 times, most recently from d8a98be to 11403dd Compare April 20, 2016 00:35

jreback force-pushed the trunc_mi branch 3 times, most recently from 7631ad5 to 81c8ca9 Compare April 26, 2016 17:56

ENH: multiindex formatting

5c4f95c

closes pandas-dev#12423

jreback force-pushed the trunc_mi branch from 81c8ca9 to 5c4f95c Compare April 26, 2016 19:14

jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016

jorisvandenbossche modified the milestones: 0.19.0, 0.18.2 Jun 14, 2016

jorisvandenbossche removed this from the 0.20.0 milestone Jul 8, 2016

jorisvandenbossche modified the milestones: 0.19.0, 0.20.0 Jul 8, 2016

jreback modified the milestones: 0.20.0, 0.19.0 Jul 13, 2016

jreback closed this Aug 26, 2016

jorisvandenbossche added the Closed PR label Aug 27, 2016
