New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strings returned from strip_tags
are correctly tagged html_safe?
#45218
Strings returned from strip_tags
are correctly tagged html_safe?
#45218
Conversation
def test_strip_tags_is_marked_safe | ||
frag = "<div><<span>script</span>>xss();<<span>/script</span>></div>" | ||
assert_equal("<script>xss();</script>", strip_tags(frag)) # this string is safe for use as pcdata in html | ||
assert_predicate(strip_tags(frag), :html_safe?) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps for further confidence, we could add a few more cases to test_strip_tags
above. e.g.:
assert_equal "firstname", strip_tags("firstname<script")
assert_equal "middlename>xss()", strip_tags("middlename>xss()")
assert_equal "lastname", strip_tags("lastname</script>")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is well-tested in rails-html-sanitizer and the assert_equal
here is more for the reader of the test than anything else. I don't think it's necessary to add more coverage here, but will do so if you feel strongly about it.
Looking at the git history of strip_tags, it used to return a html_safe string. |
@p8 seems like the behavior changes because now it returns: > helper.strip_tags("something <img onerror=alert(1337)")
=> "something " |
@dorianmariefr Ah, so this got fixed in the sanitizer compared to Rails 3 where the sanitizer wasn't properly escaping? |
@p8 seems like a change in the underlying library, I don't know the history: > html = "something <img onerror=alert(1337)"
=> "something <img onerror=alert(1337)"
[6] pry(main)> Nokogiri::HTML::DocumentFragment.parse(html).content
=> "something "
[7] pry(main)> Loofah.fragment(html).text
=> "something "
[8] pry(main)> helper.strip_tags(html)
=> "something " I think it was previously using nokogiri and nokogiri changed its parser to nokogumbo https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md#1120--2021-08-02 |
@dorianmariefr Loofah does not yet use the gumbo parser, that work is in motion at flavorjones/loofah#239 @p8 the behavior patched in d8cf713 in 2012 pre-dates the use of Loofah and Nokogiri, which weren't introduced into Rails until 2013 in 209c007 and subsequent commits. The behavior of Rails::Html::Sanitizer, Loofah, and Nokogiri is well-established and well-tested at this point (any slight behavior changes in HTML parsing or sanitization require updating tests in both rails-html-sanitizer and loofah (and at times nokogiri)). I'm confident that this behavior is indeed "html safe", and as the person on the hook for any security problems in the sanitization stack at this point, my hope is that statement carries some weight. |
Because these strings contain no HTML elements and the basic entities are escaped, they are safe to be included as-is as PCDATA in HTML content. Tagging them as html-safe avoids double-escaping entities when being concatenated to a SafeBuffer during rendering. Fixes rails/rails-html-sanitizer#124
6cfdf0e
to
5f8f676
Compare
Strings returned from `strip_tags` are correctly tagged `html_safe?`
Backported to 7-0-stable in d15a694. Let me know if this needs to be in more versions, technically only 7.0 is supported for bug fixes, but if this is security adjacent then it might be necessary in older versions. |
Thanks for the explanation and your work @flavorjones ! 🙇 |
I'm happy if this is backported to 7-0, not sure it needs to be backported further than that as this is more of a nuisance (double-escaping) than a security problem. |
Summary
Fixes rails/rails-html-sanitizer#124
Because the strings returned from
strip_tags
contain no HTML elements and the basic entities are escaped, they are safe to be included as-is as PCDATA in HTML content. Tagging them as html-safe avoids double-escaping entities when being concatenated to a SafeBuffer during rendering.Other Information
The added tests contain some explanatory context, but I'll explain here as well. The bug at rails/rails-html-sanitizer#124 describes this behavior and asks if it is incorrect:
The things to note in this example:
strip_tags
escapes the&
character, transforming it into the HTML entity&
SafeBuffer#<<
sees that the string is nothtml_safe?
and escapes it again, resulting in the double-escaped&amp;
This appears to be unexpected and incorrect behavior. In fact, any string returned from
strip_tags
is "HTML safe" in the sense that it can be safely parsed by a browser as PCDATA. We can demonstrate this with an example of an attempted XSS attack:By marking the returned string as
html_safe?
, double-escaping is avoided and the string correctly represents itself as "HTML safe" to any other code in Rails (besidesSafeBuffer
) that might attempt escaping or manipulating HTML content.