BUG: Unexpected Styler.format behavior #57980

Open
3 tasks done
quangngd opened this issue Mar 24, 2024 · 5 comments
Labels: Closing Candidate (May be closeable, needs more eyeballs), Styler (conditional formatting using DataFrame.style)

Comments

@quangngd
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

```python
import pandas as pd

df = pd.DataFrame(["_"])
styler = df.style.format(lambda x: x.replace("_", "A"), escape="latex")
print(styler.to_string())
```

Issue Description

The code returns \A instead of A.

```
0
0 \A
```

This is because pandas applies format(escape(val)), i.e. _ -> \_ -> \A, rather than escape(format(val)), i.e. _ -> A -> A.
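The two orderings can be compared without pandas. This is a minimal illustration, assuming a hypothetical, deliberately partial latex_escape helper that only handles a few special characters; it is not pandas' own escaping function.

```python
# Hypothetical, deliberately partial LaTeX escaper (NOT pandas' implementation):
# it only handles a handful of special characters.
_LATEX_SPECIALS = str.maketrans(
    {"_": r"\_", "%": r"\%", "&": r"\&", "#": r"\#", "$": r"\$"}
)


def latex_escape(s: str) -> str:
    return s.translate(_LATEX_SPECIALS)


def formatter(s: str) -> str:
    # The formatter from the issue: replace underscores with "A".
    return s.replace("_", "A")


value = "_"
escape_then_format = formatter(latex_escape(value))  # order pandas uses
format_then_escape = latex_escape(formatter(value))  # order the reporter expected

print(escape_then_format)  # \A
print(format_then_escape)  # A
```

Because the escape step inserts a backslash before the underscore, a formatter that runs afterwards only replaces the underscore and leaves the backslash behind.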

Expected Behavior

```
0
0 A
```

Installed Versions

INSTALLED VERSIONS

commit : 05f75c6
python : 3.10.13.final.0
python-bits : 64
OS : Darwin
OS-release : 23.3.0
Version : Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:59 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 3.0.0.dev0+267.g05f75c65d4
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.2.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : 6.99.6
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.2.0
gcsfs : 2024.2.0
matplotlib : 3.8.3
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 15.0.1
pyreadstat : 1.2.7
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.2.0
scipy : 1.12.0
sqlalchemy : 2.0.28
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.2.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

quangngd added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Mar 24, 2024
@attack68
Contributor

This is not a bug; it is the documented behavior: escaping is done before the formatter is applied.

This is actually a feature by design and was implemented this way after careful consideration.

Achieving the result you want is possible. It is simply to design your own formatter that performs escaping after it has formatted your text. Then you do not pass anything to the escape argument directly.
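A minimal sketch of such a formatter, assuming a hypothetical helper named escape_after that wraps any formatter and escapes its output afterwards; html.escape from the standard library stands in for whichever escaping function suits the target format.

```python
import html
from typing import Callable

Formatter = Callable[[object], str]


def escape_after(formatter: Formatter, escape: Callable[[str], str]) -> Formatter:
    """Hypothetical helper: run `formatter` first, then escape its output."""

    def wrapped(value: object) -> str:
        return escape(formatter(value))

    return wrapped


# Example: the formatter adds an ampersand, which is escaped afterwards.
fmt = escape_after(lambda x: f"{x} & co", html.escape)
print(fmt("Smith"))  # Smith &amp; co
```

With pandas, a callable like fmt would be passed as df.style.format(fmt) with escape left at its default of None, so the only escaping is the one inside the wrapper. For LaTeX output you would need to supply your own escaping function in place of html.escape.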

attack68 added the Styler (conditional formatting using DataFrame.style) and Closing Candidate (May be closeable, needs more eyeballs) labels and removed the Bug and Needs Triage labels on Mar 24, 2024
@rhshadrach
Member

> This is actually a feature by design and was implemented this way after careful consideration.

Just curious: what is the benefit of escaping first?

@attack68
Contributor

To be honest I can't remember exactly, we are going back a few years.
It may be that, the way it is designed, you can achieve both behaviors, whereas that might be trickier the other way around.
It seems more natural that if you:

  • turned on escape and added a formatter, your formatter is applied to strings that have already been escaped;
  • turned off escape and added a formatter, that formatter operates directly on the unchanged strings (and can include your own escaping function if you so choose).
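The two cases can be modelled in a few lines. This is a toy sketch of the order described above (escape first, then format), assuming a hypothetical render_cell function; it is not Styler's actual rendering code, and html.escape stands in for the escape option.

```python
import html
from typing import Callable, Optional


def render_cell(
    value: object,
    formatter: Callable[[str], str],
    escape: Optional[Callable[[str], str]] = None,
) -> str:
    """Toy model: if escaping is enabled it runs first, then the formatter."""
    text = str(value)
    if escape is not None:
        text = escape(text)
    return formatter(text)


def bold(s: str) -> str:
    # A formatter that injects markup around the cell text.
    return f"<b>{s}</b>"


# Escape on: the formatter receives the pre-escaped string, so its tags stay live.
on = render_cell("a<b", bold, escape=html.escape)
# Escape off: the formatter sees the raw string and escapes only the cell text.
off_inner = render_cell("a<b", lambda s: bold(html.escape(s)))
# Escape off, escaping the whole output: the <b> tags are escaped as well.
off_outer = render_cell("a<b", lambda s: html.escape(bold(s)))

print(on)         # <b>a&lt;b</b>
print(off_inner)  # <b>a&lt;b</b>
print(off_outer)  # &lt;b&gt;a&lt;b&lt;/b&gt;
```

The first two results are identical, which is one way to read "you can achieve both"; the third is the escape-last behavior, which also neutralises any markup the formatter adds.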

@rhshadrach
Member

> To be honest I can't remember exactly, we are going back a few years.

Makes sense, no problem. The trouble I'm having is I can't envision a use case where you wouldn't want escaping to be the very last step.

> It is simply to design your own formatter that performs escaping after it has formatted your text.

What would the code for this look like? At a glance, it sounds intricate in the general case.

@attack68
Contributor

attack68 commented Apr 19, 2024

OK, I think it came to me.
Set up a DataFrame and show it (you have to escape it to see the <div>):

```python
from pandas import DataFrame

df = DataFrame(["ab&%<div>"])
df.style.format(escape="html")  # the <div> is displayed as literal text
```
[Screenshot: rendered output]

Since Styler is a display tool, one can use the formatter to inject native HTML or LaTeX commands for specific rows, columns, or even individual cells. Something like this:

```python
# Wrap each (already escaped) cell value in a native HTML <span>
df.style.format('<span style="color:red;">{}</span>', escape="html")
```
[Screenshot: rendered output]

This would not work if escaping happened after formatting, because the <span> tags would themselves be escaped. You can simulate escaping after formatting as follows:

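```python
from pandas import DataFrame
from pandas.io.formats.style_render import _str_escape  # private helper, may change

df = DataFrame(["ab&%<div>"])

def custom_formatter(s):
    s = '<span style="color:red;">{}</span>'.format(s)
    return _str_escape(s, "html")  # escape *after* formatting

df.style.format(custom_formatter)
```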
[Screenshot: rendered output]

Thus with the current system you can do everything if you know the various mechanisms.

You could of course argue that one could write a custom formatter that escapes first and then formats, but I also think there were possibly other reasons relating to the other things the format function does, e.g. hyperlinks, which shouldn't be escaped.

Projects: None yet · Development: No branches or pull requests · 3 participants