BUG: Wrong Custom Formatters applied when displaying trancated frames #35410

ipcoder · 2020-07-25T13:48:04Z

Problem description

I am providing custom formatters for specific columns as a dict.
If the frame is large enough that some columns are truncated, the wrong formatters are applied to the columns.
(In my case that leads to crashes, because the formatter receives the wrong data type.)

Please note that the behavior changes with the width of the console window, since different columns are displayed.

Problem investigation

I have examined the code of my version of pandas (1.0.5) and compared it with the latest version on GitHub; the bug still seems to be there.

The problem starts in this method (DataFrameFormatter._to_str_columns), where frame is set to the truncated frame (frame = self.tr_frame)
and self._format_col(i) is then called with the index of the column in the TRUNCATED frame:

    def _to_str_columns(self) -> List[List[str]]:
        """
        Render a DataFrame to a list of columns (as lists of strings).
        """
        # this method is not used by to_html where self.col_space
        # could be a string so safe to cast
        self.col_space = cast(int, self.col_space)

        frame = self.tr_frame
        # may include levels names also

        str_index = self._get_formatted_index(frame)

        if not is_list_like(self.header) and not self.header:
            stringified = []
            for i, c in enumerate(frame):
                fmt_values = self._format_col(i)

Then this "truncated" column index is passed to self._get_formatter:

    def _format_col(self, i: int) -> List[str]:
        frame = self.tr_frame
        formatter = self._get_formatter(i)   # the problem is HERE? _get_formatter(frame.columns[i]) ?

which looks the formatter up in the full frame's columns, using an index i that refers to the columns of the truncated frame:

           # ...
        else:
            if is_integer(i) and i not in self.columns:
                i = self.columns[i]
            return self.formatters.get(i, None)
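
The mismatch described above can be illustrated without pandas. The following is a minimal sketch, not the pandas implementation: `make_formatter`, `lookup_by_position_in_full` and `lookup_by_truncated_label` are hypothetical helpers, and the truncation (keep the first two and last two of six columns) is an assumed example.

```python
from typing import Callable, Dict, List

def make_formatter(name: str) -> Callable[[object], str]:
    """Return a formatter that prefixes each value with its column name."""
    return lambda x: f"{name}: {x}"

# Full frame columns and an assumed truncation that drops the middle two.
full_columns: List[str] = [f"Col{i}" for i in range(6)]
truncated_columns: List[str] = full_columns[:2] + full_columns[-2:]

formatters: Dict[str, Callable[[object], str]] = {
    c: make_formatter(c) for c in full_columns
}

def lookup_by_position_in_full(i: int):
    # Position i comes from the truncated frame, but is resolved against
    # the FULL column list, which is the mismatch reported in this issue.
    return formatters.get(full_columns[i])

def lookup_by_truncated_label(i: int):
    # Resolve position i to a label using the truncated columns first.
    return formatters.get(truncated_columns[i])

for i, label in enumerate(truncated_columns):
    print(label, "|", lookup_by_position_in_full(i)(0), "|", lookup_by_truncated_label(i)(0))
```

For positions before the dropped columns both lookups agree; after them, the positional lookup picks the formatter of a column that is not displayed at all.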

INSTALLED VERSIONS

commit : None
python : 3.6.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-37-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.19.3
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.4.4
tabulate : 0.8.3
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.50.1

rhshadrach · 2020-07-26T17:39:02Z

Thanks for reporting this - could you provide a minimal reproducible example of the data/code that demonstrates the issue?

ipcoder · 2020-08-15T17:19:57Z

import pandas as pd

def form(name):
    return lambda x: f"{name}: {x}"

df = pd.DataFrame({f"Col{x}":range(5) for x in range(6)})
print(df.to_string(formatters=formatters, max_cols=6))
print(df.to_string(formatters={c: form(c) for c in df}, max_cols=4))

produces:

     Col0    Col1    Col2    Col3    Col4    Col5
0 Col0: 0 Col1: 0 Col2: 0 Col3: 0 Col4: 0 Col5: 0
1 Col0: 1 Col1: 1 Col2: 1 Col3: 1 Col4: 1 Col5: 1
2 Col0: 2 Col1: 2 Col2: 2 Col3: 2 Col4: 2 Col5: 2
3 Col0: 3 Col1: 3 Col2: 3 Col3: 3 Col4: 3 Col5: 3
4 Col0: 4 Col1: 4 Col2: 4 Col3: 4 Col4: 4 Col5: 4
     Col0    Col1   ...      Col4    Col5
0 Col0: 0 Col1: 0   ...   Col2: 0 Col3: 0
1 Col0: 1 Col1: 1   ...   Col2: 1 Col3: 1
2 Col0: 2 Col1: 2   ...   Col2: 2 Col3: 2
3 Col0: 3 Col1: 3   ...   Col2: 3 Col3: 3
4 Col0: 4 Col1: 4   ...   Col2: 4 Col3: 4

As you can see, the second print applies the wrong formatters to the columns after the truncation marker:
the formatter is looked up by position in the full sequence of columns instead of the truncated one.

I have patched my version as suggested above:

def _format_col(self, i: int) -> List[str]:
    frame = self.tr_frame
    formatter = self._get_formatter(frame.columns[i])  # instead of _get_formatter(i)

rhshadrach · 2020-08-16T18:40:05Z

Thanks - I can reproduce on master once I replace formatters=formatters in your example with the dictionary from the line below. This is indeed a bug, and your fix works well for the case where formatters are a dictionary, but I don't think it will work in the case of a list or tuple. Here, _get_formatter is really expecting an integer representing the position.

I think the root cause of the issue is that the columns attribute is not updated after the call to _chk_truncate in __init__.

Would you be interested in submitting a PR to fix?
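
To make the list/tuple concern concrete, here is a small pandas-free sketch. It is only an illustration under stated assumptions, not the pandas code and not a proposed patch: `full_position` and `get_formatter` are hypothetical helpers, column labels are assumed to be unique, and the truncation is the same assumed example of six columns with the middle two dropped. A label cannot index a list of formatters, so the sketch first maps the truncated position back to the position in the full column list.

```python
from typing import Callable, List, Optional, Sequence, Union, Dict

Formatter = Callable[[object], str]

def make_formatter(name: str) -> Formatter:
    """Return a formatter that prefixes each value with its column name."""
    return lambda x: f"{name}: {x}"

full_columns: List[str] = [f"Col{i}" for i in range(6)]
truncated_columns: List[str] = full_columns[:2] + full_columns[-2:]

# Formatters given positionally (list) and by label (dict).
list_formatters: List[Formatter] = [make_formatter(c) for c in full_columns]
dict_formatters: Dict[str, Formatter] = {c: make_formatter(c) for c in full_columns}

def full_position(i: int) -> int:
    """Map a position in the truncated columns to a position in the full
    columns. Assumes labels are unique (hypothetical helper)."""
    return full_columns.index(truncated_columns[i])

def get_formatter(
    formatters: Union[Sequence[Formatter], Dict[str, Formatter]], i: int
) -> Optional[Formatter]:
    pos = full_position(i)
    if isinstance(formatters, (list, tuple)):
        # Positional formatters need the position in the FULL frame.
        return formatters[pos]
    # Label-keyed formatters need the label of the displayed column.
    return formatters.get(full_columns[pos])

for i, label in enumerate(truncated_columns):
    print(label, "|", get_formatter(list_formatters, i)(0), "|", get_formatter(dict_formatters, i)(0))
```

Whether the translation belongs in the lookup, as sketched here, or in keeping the columns attribute in sync after truncation is exactly the design question raised in the comment above.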

ipcoder · 2022-02-17T22:07:18Z

I have never contributed to pandas development and don't know the procedure.
I assume some tests have to pass before I commit, and maybe there are other requirements.

In addition to that, I have tried to follow your lead to see if columns attribute indeed should be updated, but it is used in different places, and the impact is difficult to estimate, especially without any comments describing the general logic and design intentions.
It seems I need to understand all the implicit assumptions behind the formatting flow, to be able to make changes.

ipcoder added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 25, 2020

ipcoder changed the title ~~BUG: Wrong Custom Formatters applied in when displaying trancated frames~~ BUG: Wrong Custom Formatters applied when displaying trancated frames Jul 25, 2020

rhshadrach added the Output-Formatting __repr__ of pandas objects, to_string label Aug 16, 2020

rhshadrach added this to the Contributions Welcome milestone Aug 16, 2020

rhshadrach removed the Needs Triage Issue that has not been reviewed by a pandas team member label Aug 16, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
