bug: bleach truncates Katex style attributes #676

Closed
nguiard opened this issue Jul 18, 2022 · 7 comments · Fixed by #691
@nguiard

nguiard commented Jul 18, 2022

Bleach truncates a lot of Katex style attributes

Basic example: markdown_katex output may contain a span like this: <span class="vlist" style="height:1.0697em;">. It has a style attribute, and when I pass it through bleach (with the style attribute allowed), I get this:

<span class="vlist" style=""></span>

While the desired output would be:

<span class="vlist" style="height:1.0697em;"></span>

As a result, actual Katex math doesn't render properly.

Python and bleach versions:

  • Python Version: 3.9.2
  • Bleach Version: 5.0.1
  • Markdown-katex Version: 202112.1034

To Reproduce

Steps to reproduce the behavior:

from bleach import Cleaner

# Allow <span> with class and style attributes; no css_sanitizer is passed
cleaner = Cleaner(tags=['span'],
                  attributes={'span': ['class', 'style']})

minimal_katex_span = '<span class="vlist" style="height:1.0697em;">'
res = cleaner.clean(minimal_katex_span)
print(res)

Additional context

I am unsure if this is actually a bug or intended behavior in some way. The more general problem I face is: how to correctly use bleach after user input is transformed through markdown with the markdown_katex extension?

@nguiard nguiard added the untriaged Bug reports that haven't been triaged label Jul 18, 2022
@willkg
Member

willkg commented Jul 18, 2022

Did you install the css extras?

https://bleach.readthedocs.io/en/latest/clean.html#sanitizing-css

@nguiard
Author

nguiard commented Jul 18, 2022

Oh sorry I didn't. It's probably just that. I'll do that and reopen if needed. Thanks!

@nguiard nguiard closed this as completed Jul 18, 2022
@willkg
Member

willkg commented Jul 18, 2022

Can I get some help with this? The thing you're hitting is this:

bleach/bleach/sanitizer.py

Lines 555 to 561 in 6cd4d52

# FIXME(willkg): if style is allowed, but no
# css_sanitizer was set up, then this is probably a
# mistake and we should raise an error here
#
# For now, we're going to set the value to "" because
# there was no sanitizer set
val = ""

Would it have helped if Bleach had emitted a Python warning because you've got "style" as an allowed attribute, but hadn't specified a css_sanitizer? If not that, should it throw an exception? I'm pretty sure the situation is an indication of a mistake and a developer would want to know and not have the problem you just had. I can't think of a case where you'd want to be in that situation (specifying style as allowed, but don't want to have the css sanitized), but I didn't know if I was lacking imagination or not. What do you think?

@nguiard
Author

nguiard commented Jul 19, 2022

Sure! So, first of all, installing and using the css extras fixed my issue.

But as you suggested, I do think it would have been very nice to have a Python warning or error about that. Being a bit new to bleach and just wanting to adjust my previous basic bleaching to allow for Katex markup, I looked at the docs and the issues here, but did not realize at first that the css extras would be relevant. I saw the css_sanitizer option in Cleaner, but I assumed that a value of None meant the css would not be parsed or sanitized at all.

I think it's not crazy to think that at first (after all, it feels natural that "None" sanitizer would sanitize nothing), even though I understand that not sanitizing the css would rarely be the correct call.

@willkg
Member

willkg commented Jul 19, 2022

I'm going to re-open this to cover two changes:

  1. Add a note to the clean docs: if you're allowing the style attribute, you should also set a css_sanitizer, otherwise the style value will be truncated.
  2. Change the code to emit a Python warning when style is allowed, but the css_sanitizer is not set.
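The second change could be sketched roughly as below. This is not the code that was merged into Bleach; warn_if_style_unsanitized is a hypothetical helper, and the warning text and category are assumptions chosen for illustration. It only shows the kind of check being proposed: look for "style" in the allowed attributes and warn when no css_sanitizer was given.

```python
import warnings


def warn_if_style_unsanitized(attributes, css_sanitizer=None):
    """Hypothetical helper: warn when "style" is allowed without a CSS sanitizer.

    Returns True if a warning was emitted, False otherwise.
    """
    if css_sanitizer is not None:
        return False

    # Allowed attributes may be a flat collection or a dict of tag -> names.
    if isinstance(attributes, dict):
        allowed = [
            name
            for names in attributes.values()
            if isinstance(names, (list, tuple, set))
            for name in names
        ]
    elif isinstance(attributes, (list, tuple, set)):
        allowed = list(attributes)
    else:
        allowed = []  # callables cannot be inspected ahead of time

    if "style" in allowed:
        warnings.warn(
            "'style' is allowed but no css_sanitizer is set; "
            "style values will be emptied",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# The configuration from the original report would trigger the warning.
warn_if_style_unsanitized({"span": ["class", "style"]})
```

A warning rather than an exception keeps existing setups running while still pointing at the likely misconfiguration.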

@willkg willkg reopened this Jul 19, 2022
@nguiard
Author

nguiard commented Jul 19, 2022

Related to this is the question of what tags and styles we should allow for Katex, as it is not necessarily trivial to get the complete list.

And more generally, say in theory you trust a plugin's output (not saying I trust Katex output specifically), but if that plugin uses a lot of tags, then you end up allowing a lot of tags you wouldn't have allowed normally. The allowed tags approach seems kind of flawed in that case. I don't know if there is a better way in these kinds of cases, like maybe treating parts separately...

@willkg
Member

willkg commented Jul 19, 2022

Having a context aware allow list could help here. Bleach definitely doesn't support that currently. It feels like it'd be hard to implement because the stripping/escaping for tags is spread across a few classes, but maybe that's not true. You could try looking into that.

@willkg willkg added this to the 5.0.2 (tentative) milestone Oct 27, 2022
willkg added a commit that referenced this issue Dec 23, 2022
Add warning when css_sanitizer is not set, but style is allowed (#676)