Strings returned from `strip_tags` are correctly tagged `html_safe?` #45218

flavorjones · 2022-05-30T15:23:17Z

Summary

Because the strings returned from strip_tags contain no HTML elements and the basic entities are escaped, they are safe to be included as-is as PCDATA in HTML content. Tagging them as html-safe avoids double-escaping entities when being concatenated to a SafeBuffer during rendering.

Other Information

The added tests contain some explanatory context, but I'll explain here as well. The bug at rails/rails-html-sanitizer#124 describes this behavior and asks if it is incorrect:

    buffer = ActiveSupport::SafeBuffer.new
    buffer << helper.strip_tags("<div>hello & goodbye</div>")
    buffer # => "hello &amp;amp; goodbye"

The things to note in this example:

strip_tags escapes the & character, transforming it into the HTML entity &
SafeBuffer#<< sees that the string is not html_safe? and escapes it again, resulting in the double-escaped &amp;

This appears to be unexpected and incorrect behavior. In fact, any string returned from strip_tags is "HTML safe" in the sense that it can be safely parsed by a browser as PCDATA. We can demonstrate this with an example of an attempted XSS attack:

    frag = "<div>&lt;<span>script</span>&gt;xss();&lt;<span>/script</span>&gt;</div>"
    strip_tags(frag) # => "&lt;script&gt;xss();&lt;/script&gt;"

By marking the returned string as html_safe?, double-escaping is avoided and the string correctly represents itself as "HTML safe" to any other code in Rails (besides SafeBuffer) that might attempt escaping or manipulating HTML content.

actionview/test/template/render_test.rb

jonathanhefner · 2022-05-30T16:50:54Z

actionview/test/template/sanitize_helper_test.rb

+  def test_strip_tags_is_marked_safe
+    frag = "<div>&lt;<span>script</span>&gt;xss();&lt;<span>/script</span>&gt;</div>"
+    assert_equal("&lt;script&gt;xss();&lt;/script&gt;", strip_tags(frag)) # this string is safe for use as pcdata in html
+    assert_predicate(strip_tags(frag), :html_safe?)
+  end


Perhaps for further confidence, we could add a few more cases to test_strip_tags above. e.g.:

assert_equal "firstname", strip_tags("firstname<script") assert_equal "middlename>xss()", strip_tags("middlename>xss()") assert_equal "lastname", strip_tags("lastname</script>")

This behavior is well-tested in rails-html-sanitizer and the assert_equal here is more for the reader of the test than anything else. I don't think it's necessary to add more coverage here, but will do so if you feel strongly about it.

p8 · 2022-05-31T05:16:47Z

Looking at the git history of strip_tags, it used to return a html_safe string.
That got removed because of a CVE: d8cf713

dorianmariecom · 2022-05-31T05:50:36Z

@p8 seems like the behavior changes because now it returns:

> helper.strip_tags("something <img onerror=alert(1337)")
=> "something "

p8 · 2022-05-31T07:06:45Z

@dorianmariefr Ah, so this got fixed in the sanitizer compared to Rails 3 where the sanitizer wasn't properly escaping?

dorianmariecom · 2022-05-31T12:08:29Z

@p8 seems like a change in the underlying library, I don't know the history:

> html = "something <img onerror=alert(1337)"
=> "something <img onerror=alert(1337)"
[6] pry(main)> Nokogiri::HTML::DocumentFragment.parse(html).content
=> "something "
[7] pry(main)> Loofah.fragment(html).text
=> "something "
[8] pry(main)> helper.strip_tags(html)
=> "something "

I think it was previously using nokogiri and nokogiri changed its parser to nokogumbo https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md#1120--2021-08-02

flavorjones · 2022-05-31T13:53:23Z

@dorianmariefr Loofah does not yet use the gumbo parser, that work is in motion at flavorjones/loofah#239

@p8 the behavior patched in d8cf713 in 2012 pre-dates the use of Loofah and Nokogiri, which weren't introduced into Rails until 2013 in 209c007 and subsequent commits.

The behavior of Rails::Html::Sanitizer, Loofah, and Nokogiri is well-established and well-tested at this point (any slight behavior changes in HTML parsing or sanitization require updating tests in both rails-html-sanitizer and loofah (and at times nokogiri)). I'm confident that this behavior is indeed "html safe", and as the person on the hook for any security problems in the sanitization stack at this point, my hope is that statement carries some weight.

Because these strings contain no HTML elements and the basic entities are escaped, they are safe to be included as-is as PCDATA in HTML content. Tagging them as html-safe avoids double-escaping entities when being concatenated to a SafeBuffer during rendering. Fixes rails/rails-html-sanitizer#124

Strings returned from `strip_tags` are correctly tagged `html_safe?`

eileencodes · 2022-05-31T14:30:20Z

Backported to 7-0-stable in d15a694. Let me know if this needs to be in more versions, technically only 7.0 is supported for bug fixes, but if this is security adjacent then it might be necessary in older versions.

p8 · 2022-05-31T14:45:29Z

Thanks for the explanation and your work @flavorjones ! 🙇
It's much appreciated.

flavorjones · 2022-05-31T14:45:34Z

I'm happy if this is backported to 7-0, not sure it needs to be backported further than that as this is more of a nuisance (double-escaping) than a security problem.

rails-bot bot added the actionview label May 30, 2022

flavorjones mentioned this pull request May 30, 2022

strip_tags(input).html_safe? # => false to true ? rails/rails-html-sanitizer#124

Closed

jonathanhefner reviewed May 30, 2022

View reviewed changes

flavorjones force-pushed the flavorjones-rhs124-html-safe branch from 6cfdf0e to 5f8f676 Compare May 31, 2022 14:02

rails-bot bot added the activesupport label May 31, 2022

eileencodes merged commit 55e143a into rails:main May 31, 2022

eileencodes added a commit that referenced this pull request May 31, 2022

Merge pull request #45218 from flavorjones/flavorjones-rhs124-html-safe

d15a694

Strings returned from `strip_tags` are correctly tagged `html_safe?`

flavorjones deleted the flavorjones-rhs124-html-safe branch May 31, 2022 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strings returned from `strip_tags` are correctly tagged `html_safe?` #45218

Strings returned from `strip_tags` are correctly tagged `html_safe?` #45218

flavorjones commented May 30, 2022 •

edited

jonathanhefner May 30, 2022

flavorjones May 31, 2022

p8 commented May 31, 2022

dorianmariecom commented May 31, 2022

p8 commented May 31, 2022

dorianmariecom commented May 31, 2022

flavorjones commented May 31, 2022 •

edited

eileencodes commented May 31, 2022

p8 commented May 31, 2022

flavorjones commented May 31, 2022

Strings returned from strip_tags are correctly tagged html_safe? #45218

Strings returned from strip_tags are correctly tagged html_safe? #45218

Conversation

flavorjones commented May 30, 2022 • edited

Summary

Other Information

jonathanhefner May 30, 2022

Choose a reason for hiding this comment

flavorjones May 31, 2022

Choose a reason for hiding this comment

p8 commented May 31, 2022

dorianmariecom commented May 31, 2022

p8 commented May 31, 2022

dorianmariecom commented May 31, 2022

flavorjones commented May 31, 2022 • edited

eileencodes commented May 31, 2022

p8 commented May 31, 2022

flavorjones commented May 31, 2022

Strings returned from `strip_tags` are correctly tagged `html_safe?` #45218

Strings returned from `strip_tags` are correctly tagged `html_safe?` #45218

flavorjones commented May 30, 2022 •

edited

flavorjones commented May 31, 2022 •

edited