ENH: multiindex formatting #12929

Closed
wants to merge 1 commit into from


jreback
Contributor

@jreback jreback commented Apr 19, 2016

closes #12423

In [1]: pd.options.display.width=80

In [2]: pd.options.display.max_seq_items=10

In [3]: pd.options.display.max_seq_items=100

In [4]: mi = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
   ...:                                 ('B', 3), ('B', 4)])

In [5]: mi
Out[5]: 
MultiIndex(levels=[[u'A', u'B'],
                   [1, 2, 3, 4]],
           labels=[[0, 0, 1, 1],
                   [0, 1, 2, 3]])

In [6]: mi = MultiIndex.from_product([list('abcdefg'),
   ...:                               range(10),
   ...:                               pd.date_range('20130101', periods=10)],
   ...:                              names=['first', 'second', 'third'])

In [7]: mi
Out[7]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
                    ...
                    7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

@jreback jreback added Output-Formatting __repr__ of pandas objects, to_string Compat pandas objects compatability with Numpy or Python functions labels Apr 19, 2016
@jreback jreback added this to the 0.18.1 milestone Apr 19, 2016
@jreback
Contributor Author

jreback commented Apr 19, 2016

@TomAugspurger @jorisvandenbossche @sinhrks @shoyer @wesm

hmm, I see a double comma already........

@jreback
Contributor Author

jreback commented Apr 19, 2016

could probably squeeze name,dtype in for each level I think.

@jreback jreback force-pushed the trunc_mi branch 6 times, most recently from d8a98be to 11403dd Compare April 20, 2016 00:35
@jreback
Contributor Author

jreback commented Apr 22, 2016

comments?

@jorisvandenbossche
Member

jorisvandenbossche commented Apr 22, 2016

Something like

           labels=[[0, 0, 0, 0, 0, ... 6, 6, 6, 6, 6],

instead of

           labels=[[0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6],

would maybe be clearer?

But, in any case, following max_seq_items is a good improvement
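The single-line form suggested in this comment can be sketched as follows. This is only an illustration of the proposed layout, not code from the PR; the function name `one_line_truncated` and its parameter `n` are hypothetical.

```python
def one_line_truncated(seq, n=5):
    """Format seq on one line, keeping n items at each end.

    Hypothetical helper: sequences with at most 2 * n items are
    shown in full, longer ones get a '...' marker in the middle.
    """
    items = [str(x) for x in seq]
    if len(items) <= 2 * n:
        return "[" + ", ".join(items) + "]"
    head = ", ".join(items[:n])
    tail = ", ".join(items[-n:])
    return "[{}, ... {}]".format(head, tail)


# A labels-like sequence: 35 zeros followed by 35 sixes.
labels = [0] * 35 + [6] * 35
line = one_line_truncated(labels)
```

Keeping the head and tail on the same line trades some width for fewer lines per level, which matters most when a MultiIndex has many levels.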

@jorisvandenbossche
Member

In the example you pasted above (copied from there):

In [11]: mi
Out[11]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd'],
                   [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2,
                    2, 3, 3, 3],
                   [0, 1, 2, 0, 1, 2, 0, 1,
                    2, 0, 1, 2]],
           names=[u'first', u'second'])

Is there a reason the individual labels lists break off instead of staying on one line? It seems they should fit on one line.

@jreback
Contributor Author

jreback commented Apr 22, 2016

I think that last might be a bug. I have to twiddle the justification settings.

@jreback
Contributor Author

jreback commented Apr 22, 2016

So I think there is a bug in the current formatter: it basically hard-codes 10 as the truncated width. Here is a modification that uses half of max_seq_items, bounded between 10 and 50 so it has some flex (this is of course also limited by display.width).

In [9]: pd.options.display.max_seq_items=50

In [10]: Index(list(range(100)))
Out[10]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
            ...
            75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
           dtype='int64', length=100)

In [3]: pd.options.display.max_seq_items=30

In [4]: Index(list(range(100)))
Out[4]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
            ...
            85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)

In [5]: pd.options.display.max_seq_items=20

In [6]: Index(list(range(100)))
Out[6]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
            ...
            90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)

In [7]: pd.options.display.max_seq_items=10

In [8]: Index(list(range(100)))
Out[8]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
            ...
            90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype='int64', length=100)
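The rule described in this comment (half of max_seq_items, bounded to the range 10 to 50) can be sketched as below. This is a minimal sketch, not the code in the PR; the function name `truncation_width` and its `lower`/`upper` parameters are hypothetical, and the additional limit imposed by the display width is not modelled.

```python
def truncation_width(max_seq_items, lower=10, upper=50):
    """Number of items to show at each end of a truncated sequence.

    Hypothetical helper: half of max_seq_items, clamped to
    the closed range [lower, upper].
    """
    return max(lower, min(max_seq_items // 2, upper))


# Evaluate the rule for the settings used in the session above.
widths = {n: truncation_width(n) for n in (50, 30, 20, 10)}
```

For these inputs the sketch gives 25, 15, 10 and 10 items per side, which matches the number of values shown before and after the `...` in each output above.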

@jreback
Contributor Author

jreback commented Apr 22, 2016

So, with the above change (and allowing labels and levels to have different max_seq_items, though still limited to the (10, 50) range):

In [1]: pd.options.display.width=200

In [2]: pd.options.display.max_seq_items=50

In [3]: mi = MultiIndex.from_product([list('abcdefg'), range(10), pd.date_range('20130101',periods=10)],
                                     names=['first', 'second', 'third'])

In [4]: mi
Out[4]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09', '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
                    ...
                    7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

In [5]: pd.options.display.max_seq_items=30

In [6]: mi
Out[6]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08', '2013-01-09',
                    '2013-01-10']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    ...
                    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
                    ...
                    8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,
                    ...
                    5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second', u'third'])

@max-sixty
Contributor

One option where there are a lot of levels could be to display them like a DataFrame. Although that's a fair distance from where we are now
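As a rough illustration of what a DataFrame-style display could look like (this is not something implemented in this PR), the sketch below uses `MultiIndex.to_frame`, which exists in later pandas releases; whether it is available depends on the installed version. The names `mi` and `frame` are only for illustration.

```python
import pandas as pd

mi = pd.MultiIndex.from_product(
    [list("ab"), range(2)], names=["first", "second"]
)

# One column per level, one row per entry of the MultiIndex.
frame = mi.to_frame(index=False)
print(frame)
```

A tabular view shows the actual values per row rather than the levels and integer labels separately.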

@jreback jreback force-pushed the trunc_mi branch 3 times, most recently from 7631ad5 to 81c8ca9 Compare April 26, 2016 17:56
@jreback
Contributor Author

jreback commented Apr 26, 2016

ok updated a bit. The above is with the default options (though width is 80)

@jreback
Contributor Author

jreback commented Apr 26, 2016

@sinhrks

If you have a chance, can you see if these changes look reasonable for the truncation / formatting in categoricals?

[Tue Apr 26 15:11:58 ~/pandas]$ nosetests  pandas/tests/indexes/test_multi.py pandas/tests/indexes/test_category.py pandas/tests/indexes/test_base.py 
..............................................................................................................................................................................................F................................................................................................................F............
======================================================================
FAIL: test_string_categorical_index_repr (pandas.tests.indexes.test_category.TestCategoricalIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/indexes/test_category.py", line 638, in test_string_categorical_index_repr
    self.assertEqual(unicode(idx), expected)
AssertionError: u"CategoricalIndex([u'\u3042', u'\u3044\u3044', u'\u3046\u3046\u3046', u'\u3042' [truncated]... != u"CategoricalIndex([u'\u3042', u'\u3044\u3044', u'\u3046\u3046\u3046', u'\u3042' [truncated]...
Diff is 711 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_string_index_repr (pandas.tests.indexes.test_base.TestIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jreback/pandas/pandas/tests/indexes/test_base.py", line 1465, in test_string_index_repr
    self.assertEqual(coerce(idx), expected)
AssertionError: u"Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a',\n  [truncated]... != u"                Index([u'a', u'bb', u'ccc', u'a', u'bb', u'ccc', u'a', u'bb',  [truncated]...
Diff is 670 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 316 tests in 5.779s

FAILED (failures=2)

A couple of failing tests. I am not sure which parameters were used to generate the expected output initially. I have hard-coded them in a context manager, but I am doing something wrong.
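For reference, pinning display options around a repr comparison is usually done with `pd.option_context`. The sketch below is an assumption about how such a test helper might look, not the code in this branch; the helper name `repr_with_options` and the chosen option values are hypothetical.

```python
import pandas as pd


def repr_with_options(obj, width=80, max_seq_items=100):
    """Return repr(obj) with display options pinned (hypothetical helper)."""
    with pd.option_context("display.width", width,
                           "display.max_seq_items", max_seq_items):
        return repr(obj)


idx = pd.Index(list(range(100)))

# The global option is untouched once the context manager exits.
before = pd.get_option("display.max_seq_items")
text = repr_with_options(idx, width=80, max_seq_items=10)
after = pd.get_option("display.max_seq_items")
```

Because the options are restored on exit, expected strings generated this way do not depend on whatever settings an earlier test left behind.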

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.0, 0.18.2 Jun 14, 2016
@jorisvandenbossche jorisvandenbossche removed this from the 0.20.0 milestone Jul 8, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.0, 0.20.0 Jul 8, 2016
@jreback jreback modified the milestones: 0.20.0, 0.19.0 Jul 13, 2016
@jreback
Contributor Author

jreback commented Aug 26, 2016

closing, but maybe someone can pick this up.

@jorisvandenbossche
Member

It would also depend a bit on what we do with #13480 (issue about changing the default multi-index repr)

Labels
Compat pandas objects compatability with Numpy or Python functions Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

Abbreviate MultiIndex representation
3 participants