MultiIndex `to_string` edge case Error after 0.23.0 upgrade #21180

atlasstrategic · 2018-05-23T10:04:11Z

Code example

import pandas as pd
import numpy as np

index = pd.date_range('1970', '2018', freq='A')
data = np.random.randn(len(index))
columns1 = [
    ['This is a long title with > 37 chars.'],
    ['cat'],
]
columns2 = [
    ['This is a loooooonger title with > 43 chars.'],
    ['dog'],
]
df1 = pd.DataFrame(data=data, index=index, columns=columns1)
df2 = pd.DataFrame(data=data, index=index, columns=columns2)
df = pd.concat([df1, df2], axis=1)
df.head()

Output (using pandas 0.23.0)

>>> df.head()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 82, in __repr__
    return str(self)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 61, in __str__
    return self.__unicode__()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 663, in __unicode__
    line_width=width, show_dimensions=show_dimensions)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 1968, in to_string
    formatter.to_string()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 648, in to_string
    strcols = self._to_str_columns()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 539, in _to_str_columns
    str_columns = self._get_formatted_column_labels(frame)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 782, in _get_formatted_column_labels
    str_columns = _sparsify(str_columns)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 2962, in _sparsify
    prev = pivoted[start]
IndexError: list index out of range

Problem description

After upgrading pandas from 0.22.0 to 0.23.0, I get the above error. The length of the column labels, `This is a long title with > 37 chars.` and `This is a loooooonger title with > 43 chars.`, is what makes the difference: if I shorten them so that their combined length is <= 80 characters, there is no error and the output is as expected.

Expected Output (using pandas 0.22.0)

>>> df.head()
           This is a long title with > 37 chars.  \
                                             cat   
1970-12-31                             -1.448415   
1971-12-31                              0.081324   
1972-12-31                             -0.018105   
1973-12-31                              0.902790   
1974-12-31                              0.668474   

           This is a loooooonger title with > 43 chars.  
                                                    dog  
1970-12-31                                    -1.448415  
1971-12-31                                     0.081324  
1972-12-31                                    -0.018105  
1973-12-31                                     0.902790  
1974-12-31                                     0.668474

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: en_ZA.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.3
xlrd: None
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-05-23T11:09:57Z

This doesn't raise for me (py36, and pandas master).

What are pd.options.display.max_colwidth, pd.options.display.width, and pd.options.display.max_columns?

atlasstrategic · 2018-05-23T12:27:35Z

@TomAugspurger Here is the output on my system with pandas 0.23.0:

>>> import pandas as pd
>>> pd.options.display.max_colwidth
50
>>> pd.options.display.width
80
>>> pd.options.display.max_columns
0

0.22.0 output:

>>> import pandas as pd
>>> pd.options.display.max_colwidth
50
>>> pd.options.display.width
80
>>> pd.options.display.max_columns
20

If I do the following it works in 0.23.0!

pd.set_option("max_columns", 20)

Did the default setting change in 0.23.0?
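The workaround above can also be scoped to a single block instead of changing the setting globally. The following is a minimal sketch, assuming the full option name `display.max_columns` and pandas' `option_context` context manager; the frame mirrors the one in the report but uses a default integer index for brevity.

```python
import numpy as np
import pandas as pd

# Two-level column labels, mirroring the frames in the report.
columns1 = pd.MultiIndex.from_arrays(
    [["This is a long title with > 37 chars."], ["cat"]]
)
columns2 = pd.MultiIndex.from_arrays(
    [["This is a loooooonger title with > 43 chars."], ["dog"]]
)
data = np.random.randn(5, 1)
df = pd.concat(
    [pd.DataFrame(data, columns=columns1), pd.DataFrame(data, columns=columns2)],
    axis=1,
)

before = pd.get_option("display.max_columns")

# Use the 0.22.0 default of 20 only while building the repr.
with pd.option_context("display.max_columns", 20):
    text = repr(df)

# The previous setting is restored once the block exits.
after = pd.get_option("display.max_columns")
print(text)
```

This leaves the session-wide default untouched, which may be preferable when only one frame triggers the problem.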

atlasstrategic · 2018-05-25T13:16:57Z

The docs show that the 0.22 wording:

In case python/IPython is running in a terminal this can be set to 0

has been updated in 0.23 to:

In case Python/IPython is running in a terminal this is set to 0 by default.

However, switching back to 0.22.0 and manually setting the max_columns option to 0 doesn't raise the exception.

🤔 So it still doesn't explain why there would be an error raised when max_columns is set to 0?

TomAugspurger · 2018-05-26T18:49:27Z

cc @cbrnr if you have any ideas.

cbrnr · 2018-05-28T06:48:18Z

I get an AttributeError: module 'pandas._libs.tslibs.timezones' has no attribute 'tz_standardize' when I test this with the latest master branch revision. Any ideas how to fix this? Using 0.23, I can reproduce the issue.

TomAugspurger · 2018-06-02T21:29:00Z

You need to recompile the extension modules. Commands for your platform should be in the contributing docs.


cbrnr · 2018-06-03T08:27:58Z

Thanks, I forgot about that. Thankfully, it's not the add one business (I get the same error when I revert this change). This will take a bit of work, since everything works in PyCharm but not in IPython (so debugging will be much slower for me since I'm not used to pdb at all)...

cbrnr · 2018-06-04T10:12:00Z

Apparently, setting pd.options.display.max_columns = 0 in 0.22 also results in this error. So the issue was not introduced by my change, which merely changed the default to 0.

atlasstrategic · 2018-06-04T15:13:52Z

Hi @cbrnr

also results in this error.

You probably meant to say "does not"? I agree that merely changing the default to 0 should not result in this unexpected error.

cbrnr · 2018-06-04T16:38:39Z

No, I get the same error with pandas 0.22 if I first set pd.options.display.max_columns = 0. This means that this bug has been there for a while (I haven't tried older versions, but I suspect that they will behave similarly).

TomAugspurger · 2018-06-04T17:37:05Z

I do not get the exception on pandas 0.22.0 with

import pandas as pd
pd.options.display.max_columns = 0
import numpy as np

index = pd.date_range('1970', '2018', freq='A')
data = np.random.randn(len(index))
columns1 = [
    ['This is a long title with > 37 chars.'],
    ['cat'],
]
columns2 = [
    ['This is a loooooonger title with > 43 chars.'],
    ['dog'],
]
df1 = pd.DataFrame(data=data, index=index, columns=columns1)
df2 = pd.DataFrame(data=data, index=index, columns=columns2)
df = pd.concat([df1, df2], axis=1)
df.head()

TomAugspurger · 2018-06-04T17:41:42Z

Though it occurs to me that this probably depends on the width of the terminal.
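To compare environments, it may help to report the terminal width next to the label lengths. This is a rough sketch using only the standard library; `terminal_columns` is a hypothetical helper name, and the comparison is only an indicator, not the exact rule pandas applies when laying out the repr.

```python
import shutil


def terminal_columns(fallback=(80, 24)):
    """Return the terminal width in characters, or the fallback width if undetectable."""
    return shutil.get_terminal_size(fallback=fallback).columns


labels = [
    "This is a long title with > 37 chars.",
    "This is a loooooonger title with > 43 chars.",
]

# Combined length of the two top-level column labels from the report.
combined = sum(len(label) for label in labels)

print("terminal width:", terminal_columns())
print("combined label length:", combined)
```

Posting both numbers alongside a report makes it easier to tell whether two people are testing under comparable conditions.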

jreback · 2018-06-04T22:01:56Z

not a regression, but still should fix.

cbrnr · 2018-06-05T06:15:34Z

@TomAugspurger I just tried again, I do get the error with 0.22. How are you running this code? If you are not in interactive mode (e.g. IPython), you need to change the last line to print(df.head()) in order to produce the output. I'm running this in IPython on macOS in a normal terminal (not Jupyter QtConsole) with 100x35 window size.
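The reproduction steps discussed here can be bundled into a script that also works outside an interactive session. This is a sketch under assumptions: `try_repr` is a hypothetical helper, the frame uses a default integer index for brevity, and whether the `IndexError` appears depends on the pandas version and the terminal size.

```python
import numpy as np
import pandas as pd


def try_repr(frame):
    """Return (True, repr text), or (False, message) if repr raises IndexError."""
    try:
        return True, repr(frame)
    except IndexError as exc:
        return False, "IndexError: {}".format(exc)


columns1 = pd.MultiIndex.from_arrays(
    [["This is a long title with > 37 chars."], ["cat"]]
)
columns2 = pd.MultiIndex.from_arrays(
    [["This is a loooooonger title with > 43 chars."], ["dog"]]
)
data = np.random.randn(10, 1)
df = pd.concat(
    [pd.DataFrame(data, columns=columns1), pd.DataFrame(data, columns=columns2)],
    axis=1,
)

# max_columns = 0 is the setting under which the error was reported.
with pd.option_context("display.max_columns", 0):
    ok, text = try_repr(df.head())

# print() is required because a script does not echo the last expression.
print("pandas", pd.__version__)
print(text)
```

Running the same script in different terminals and pandas versions gives directly comparable output.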

TomAugspurger · 2018-06-05T15:57:27Z

@jreback could you please make a note when you're moving the milestone? This should be fixed for 0.23.1.

jreback · 2018-06-05T16:05:56Z

i made a note
and this does not need to block 0.23.1
it’s not a regression

pls don’t mark milestones unless ready to go

jorisvandenbossche · 2018-06-05T16:37:52Z

@jreback This is a regression in user experience. It may be an existing bug, but code that was working before, is failing now, because we changed the default. So we should still fix that existing bug for 0.23.1.

I cannot reproduce the error with the example in this issue, but I do see it with the example from #21327

jreback · 2018-06-07T11:11:57Z

@jorisvandenbossche sure regressions happen, and we should fix them all. but unless this is fixed today, it will go in the next release.

jorisvandenbossche · 2018-06-07T14:29:41Z

We can also change the default of max_columns back to 20 for now if we can't find the time to fix the bugs.

TomAugspurger · 2018-06-07T20:02:51Z

Here's a failing unit test

diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index f221df93d..52f83f093 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -305,6 +305,36 @@ class TestDataFrameFormatting(object):
             assert not has_truncated_repr(df)
             assert not has_expanded_repr(df)
 
+    def test_repr_multiindex(self):
+        # https://github.com/pandas-dev/pandas/issues/21180
+        from unittest import mock
+
+        def f():
+            return os.terminal_size((118, 96))
+
+        terminal_size = os.terminal_size((118, 96))
+
+        p1 = mock.patch('pandas.io.formats.console.get_terminal_size',
+                        return_value=terminal_size)
+        p2 = mock.patch('pandas.io.formats.format.get_terminal_size',
+                        return_value=terminal_size)
+        index = pd.date_range('1970', '2018', freq='A')
+        data = np.random.randn(len(index))
+        columns1 = [
+            ['This is a long title with > 37 chars.'],
+            ['cat'],
+        ]
+        columns2 = [
+            ['This is a loooooonger title with > 43 chars.'],
+            ['dog'],
+        ]
+        df1 = pd.DataFrame(data=data, index=index, columns=columns1)
+        df2 = pd.DataFrame(data=data, index=index, columns=columns2)
+        df = pd.concat([df1, df2], axis=1)
+
+        with p1, p2:
+            repr(df.head())
+
     def test_repr_max_columns_max_rows(self):
         term_width, term_height = get_terminal_size()
         if term_width < 10 or term_height < 10:

jorisvandenbossche · 2018-06-27T12:03:27Z

If we don't have a fix for this, I would consider reverting pandas.options.display.max_columns to 20, then working on a fix and possibly switching back to 0 for 0.24.0.

Errors in the repr are really annoying, as you cannot even inspect the data properly to see what might be the reason something is not working.

TomAugspurger · 2018-06-27T12:58:43Z

I'm going to try to fix it now.

TomAugspurger · 2018-06-27T13:23:22Z

What's the expected behavior here? I can easily match the behavior of the non-MI case,

In [3]: s = pd.DataFrame({"A" * 41: [1, 2], 'B' * 41: [1, 2]})

In [4]: with p1, p2:
   ...:     print(repr(s))
   ...:
  ...
0 ...
1 ...

[2 rows x 2 columns]

but that's not too useful...

jorisvandenbossche · 2018-06-27T13:37:01Z

That's a good question. For the truncated repr, we always need two columns, right? (first and last)
So previously it put the two columns below each other, but now they would need to be next to each other, which is exactly the problem, as they do not fit.

jorisvandenbossche · 2018-06-27T13:39:35Z

In my console, making the window smaller causes the line to overflow instead of raising the error. That might be an option in general: it does not make the repr very readable in this case, but at least it would not lead to an error.

atlasstrategic changed the title ~~Multiindex to_string edge case Error after 0.23.0 upgrade~~ MultiIndex to_string edge case Error after 0.23.0 upgrade May 23, 2018

TomAugspurger added the Output-Formatting __repr__ of pandas objects, to_string label Jun 4, 2018

TomAugspurger added this to the 0.23.1 milestone Jun 4, 2018

jreback modified the milestones: 0.23.1, Next Major Release Jun 4, 2018

TomAugspurger mentioned this issue Jun 5, 2018

IndexError is raised when printing wide dataframes on too narrow terminal #21327

Closed

TomAugspurger modified the milestones: Next Major Release, 0.23.1 Jun 5, 2018

jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Jun 5, 2018

jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018

TomAugspurger mentioned this issue Jun 27, 2018

BUG: Fix MI repr with long names #21655

Merged

jorisvandenbossche closed this as completed in #21655 Jul 2, 2018
