Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strings returned from strip_tags are correctly tagged html_safe? #45218

Merged
merged 1 commit into from May 31, 2022

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented May 30, 2022

Summary

Fixes rails/rails-html-sanitizer#124

Because the strings returned from strip_tags contain no HTML elements and the basic entities are escaped, they are safe to be included as-is as PCDATA in HTML content. Tagging them as html-safe avoids double-escaping entities when being concatenated to a SafeBuffer during rendering.

Other Information

The added tests contain some explanatory context, but I'll explain here as well. The bug at rails/rails-html-sanitizer#124 describes this behavior and asks if it is incorrect:

    buffer = ActiveSupport::SafeBuffer.new
    buffer << helper.strip_tags("<div>hello & goodbye</div>")
    buffer # => "hello &amp;amp; goodbye"

The things to note in this example:

  • strip_tags escapes the & character, transforming it into the HTML entity &amp;
  • SafeBuffer#<< sees that the string is not html_safe? and escapes it again, resulting in the double-escaped &amp;amp;

This appears to be unexpected and incorrect behavior. In fact, any string returned from strip_tags is "HTML safe" in the sense that it can be safely parsed by a browser as PCDATA. We can demonstrate this with an example of an attempted XSS attack:

    frag = "<div>&lt;<span>script</span>&gt;xss();&lt;<span>/script</span>&gt;</div>"
    strip_tags(frag) # => "&lt;script&gt;xss();&lt;/script&gt;"

By marking the returned string as html_safe?, double-escaping is avoided and the string correctly represents itself as "HTML safe" to any other code in Rails (besides SafeBuffer) that might attempt escaping or manipulating HTML content.

actionview/test/template/render_test.rb Outdated Show resolved Hide resolved
Comment on lines +40 to +44
def test_strip_tags_is_marked_safe
frag = "<div>&lt;<span>script</span>&gt;xss();&lt;<span>/script</span>&gt;</div>"
assert_equal("&lt;script&gt;xss();&lt;/script&gt;", strip_tags(frag)) # this string is safe for use as pcdata in html
assert_predicate(strip_tags(frag), :html_safe?)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for further confidence, we could add a few more cases to test_strip_tags above. e.g.:

assert_equal "firstname", strip_tags("firstname<script")
assert_equal "middlename&gt;xss()", strip_tags("middlename>xss()")
assert_equal "lastname", strip_tags("lastname</script>")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is well-tested in rails-html-sanitizer and the assert_equal here is more for the reader of the test than anything else. I don't think it's necessary to add more coverage here, but will do so if you feel strongly about it.

@p8
Copy link
Member

p8 commented May 31, 2022

Looking at the git history of strip_tags, it used to return a html_safe string.
That got removed because of a CVE: d8cf713

@dorianmariecom
Copy link
Contributor

@p8 seems like the behavior changes because now it returns:

> helper.strip_tags("something <img onerror=alert(1337)")
=> "something "

@p8
Copy link
Member

p8 commented May 31, 2022

@dorianmariefr Ah, so this got fixed in the sanitizer compared to Rails 3 where the sanitizer wasn't properly escaping?

@dorianmariecom
Copy link
Contributor

@p8 seems like a change in the underlying library, I don't know the history:

> html = "something <img onerror=alert(1337)"
=> "something <img onerror=alert(1337)"
[6] pry(main)> Nokogiri::HTML::DocumentFragment.parse(html).content
=> "something "
[7] pry(main)> Loofah.fragment(html).text
=> "something "
[8] pry(main)> helper.strip_tags(html)
=> "something "

I think it was previously using nokogiri and nokogiri changed its parser to nokogumbo https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md#1120--2021-08-02

@flavorjones
Copy link
Member Author

flavorjones commented May 31, 2022

@dorianmariefr Loofah does not yet use the gumbo parser, that work is in motion at flavorjones/loofah#239

@p8 the behavior patched in d8cf713 in 2012 pre-dates the use of Loofah and Nokogiri, which weren't introduced into Rails until 2013 in 209c007 and subsequent commits.

The behavior of Rails::Html::Sanitizer, Loofah, and Nokogiri is well-established and well-tested at this point (any slight behavior changes in HTML parsing or sanitization require updating tests in both rails-html-sanitizer and loofah (and at times nokogiri)). I'm confident that this behavior is indeed "html safe", and as the person on the hook for any security problems in the sanitization stack at this point, my hope is that statement carries some weight.

Because these strings contain no HTML elements and the basic entities
are escaped, they are safe to be included as-is as PCDATA in HTML
content. Tagging them as html-safe avoids double-escaping entities
when being concatenated to a SafeBuffer during rendering.

Fixes rails/rails-html-sanitizer#124
@flavorjones flavorjones force-pushed the flavorjones-rhs124-html-safe branch from 6cfdf0e to 5f8f676 Compare May 31, 2022 14:02
@eileencodes eileencodes merged commit 55e143a into rails:main May 31, 2022
eileencodes added a commit that referenced this pull request May 31, 2022
Strings returned from `strip_tags` are correctly tagged `html_safe?`
@eileencodes
Copy link
Member

Backported to 7-0-stable in d15a694. Let me know if this needs to be in more versions, technically only 7.0 is supported for bug fixes, but if this is security adjacent then it might be necessary in older versions.

@flavorjones flavorjones deleted the flavorjones-rhs124-html-safe branch May 31, 2022 14:44
@p8
Copy link
Member

p8 commented May 31, 2022

Thanks for the explanation and your work @flavorjones ! 🙇
It's much appreciated.

@flavorjones
Copy link
Member Author

I'm happy if this is backported to 7-0, not sure it needs to be backported further than that as this is more of a nuisance (double-escaping) than a security problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strip_tags(input).html_safe? # => false to true ?
5 participants