BUG: Unexpected Styler.format behavior #57980

quangngd · 2024-03-24T04:35:27Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame(["_"])
# The formatter replaces "_" with "A"; escape="latex" is requested as well.
styler = df.style.format(lambda x: x.replace("_", "A"), escape="latex")
print(styler.to_string())

Issue Description

The code returns \A instead of A.

0
0 \A

This is because pandas does not compute escape(format(val)): _ -> A -> A

Instead it computes format(escape(val)): _ -> \_ -> \A
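The two orders can be compared in a small sketch that does not depend on pandas. Note that `toy_latex_escape` is a hypothetical stand-in that only escapes underscores; it is not the escaping routine pandas uses, and the variable names are illustrative.

```python
def toy_latex_escape(s: str) -> str:
    # Hypothetical, simplified escape: only handles "_" (assumption for illustration).
    return s.replace("_", "\\_")


def formatter(s: str) -> str:
    # Same transformation as the lambda in the reproducible example.
    return s.replace("_", "A")


value = "_"

# Order reported in this issue: escape first, then format.
escape_then_format = formatter(toy_latex_escape(value))

# Order the reporter expected: format first, then escape.
format_then_escape = toy_latex_escape(formatter(value))

print(escape_then_format)  # \A
print(format_then_escape)  # A
```

The leftover backslash in the first result comes from the escape step running before the formatter has removed the underscore.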

Expected Behavior

0
0 A

Installed Versions

INSTALLED VERSIONS

commit : 05f75c6
python : 3.10.13.final.0
python-bits : 64
OS : Darwin
OS-release : 23.3.0
Version : Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:59 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 3.0.0.dev0+267.g05f75c65d4
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.2.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : 6.99.6
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.2.0
gcsfs : 2024.2.0
matplotlib : 3.8.3
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 15.0.1
pyreadstat : 1.2.7
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.2.0
scipy : 1.12.0
sqlalchemy : 2.0.28
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.2.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None


attack68 · 2024-03-24T13:37:02Z

This is not a bug. It behaves as documented: escaping is done before the formatter is applied.

This is actually a feature by design and was implemented this way after careful consideration.

Achieving the result you want is possible. It is simply to design your own formatter that performs escaping after it has formatted your text. Then you do not pass anything to the escape argument.
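One way to picture such a formatter is a small wrapper that runs any formatter first and an escape function second. This is only a sketch under assumptions: `escape_after` and `toy_latex_escape` are hypothetical names, and the toy escape only handles underscores rather than the full LaTeX character set.

```python
from typing import Callable


def escape_after(
    formatter: Callable[[str], str],
    escape_fn: Callable[[str], str],
) -> Callable[[str], str]:
    """Return a formatter that formats first and escapes the result last."""

    def wrapped(value: str) -> str:
        return escape_fn(formatter(value))

    return wrapped


def toy_latex_escape(s: str) -> str:
    # Hypothetical stand-in for a real LaTeX escape (underscores only).
    return s.replace("_", "\\_")


# The lambda from the reproducible example, now escaped after formatting.
issue_formatter = escape_after(lambda x: x.replace("_", "A"), toy_latex_escape)
issue_result = issue_formatter("_")

# A formatter that itself introduces a character needing escaping.
suffix_formatter = escape_after(lambda x: x + "_total", toy_latex_escape)
suffix_result = suffix_formatter("cost")

print(issue_result)   # A
print(suffix_result)  # cost\_total
```

With pandas, a callable built this way would be handed to `Styler.format` on its own, leaving the escape argument unset, as described above.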

rhshadrach · 2024-03-24T14:49:28Z

> This is actually a feature by design and was implemented this way after careful consideration.

Just curious - what is the benefit of escaping prior?

attack68 · 2024-04-19T19:46:33Z

To be honest I can't remember exactly, we are going back a few years.
It may be that with the current design you can achieve both orders, whilst that might be trickier with the reverse.
It seems more natural that:

If you turn on escape and add a formatter, your formatter is applied to strings that have already been escaped.
If you turn off escape and add a formatter, that formatter operates directly on the unchanged strings (and can then include your own escaping function if you so choose).

rhshadrach · 2024-04-19T21:55:09Z

> To be honest I can't remember exactly, we are going back a few years.

Makes sense, no problem. The trouble I'm having is I can't envision a use case where you wouldn't want escaping to be the very last step.

> It is simply to design your own formatter that performs escaping after it has formatted your text.

What would the code for this look like? At a glance, it sounds intricate in the general case.

attack68 · 2024-04-19T22:35:18Z

OK, I think it came back to me.
Set up a DataFrame and display it (you have to escape this to see the <div>):

from pandas import DataFrame

df = DataFrame(["ab&%<div>"])
df.style.format(escape="html")

Since Styler is a display tool one can use the formatter to hack elements into HTML or LaTeX, for specific rows, columns or even cells, with native HTML or LaTeX commands. Something like this:

df.style.format('<span style="color:red;">{}</span>', escape="html")

This will not work if you escape after the fact. You can simulate escaping after as follows:

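# _str_escape is a private pandas helper and may change between versions.
from pandas.io.formats.style_render import _str_escape

def custom_formatter(s):
    # Format first, then escape the whole result, markup included.
    s = '<span style="color:red;">{}</span>'.format(s)
    return _str_escape(s, "html")

df.style.format(custom_formatter)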

Thus with the current system you can do everything if you know the various mechanisms.

You could of course argue that one could write a custom formatter to escape first and then format, but I think there were possibly other reasons as well, to do with the other things the format function does, e.g. hyperlinks, which shouldn't be escaped.
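The difference between the two orders in the span example can be checked without pandas. The sketch below uses the standard library's `html.escape` as a stand-in for the HTML escaping pandas performs (an assumption; the exact output of pandas may differ in detail), with illustrative variable names.

```python
import html

template = '<span style="color:red;">{}</span>'
cell = "ab&%<div>"

# Escape the cell value first, then wrap it: the span stays live markup.
escape_first = template.format(html.escape(cell))

# Wrap first, then escape everything: the span itself is neutralised.
escape_last = html.escape(template.format(cell))

print(escape_first)
# <span style="color:red;">ab&amp;%&lt;div&gt;</span>
print(escape_last)
# &lt;span style=&quot;color:red;&quot;&gt;ab&amp;%&lt;div&gt;&lt;/span&gt;
```

In the first result only the data is escaped, so a browser would render red text; in the second the injected span would be shown as literal text.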

quangngd added the labels Bug and Needs Triage (issue that has not been reviewed by a pandas team member) on Mar 24, 2024.

attack68 added the labels Styler (conditional formatting using DataFrame.style) and Closing Candidate (may be closeable, needs more eyeballs) and removed the labels Bug and Needs Triage on Mar 24, 2024.
