schneems
Member

There are cases where either a customer calls html_safe or it is called internally on an already safe string. When that happens we should return self instead of a new SafeBuffer.

require 'benchmark/ips'
require 'erb'
require 'active_support/core_ext/kernel/singleton_class'
require 'active_support/core_ext/string/output_safety'


string = "foo"
safe = ActiveSupport::SafeBuffer.new(string)
safe_too = safe.dup

# Baseline: allocate a new SafeBuffer on every call.
safe.define_singleton_method(:html_safe) do
  ActiveSupport::SafeBuffer.new(self)
end

# Proposed: flag the receiver as safe and return it, with no allocation.
safe_too.define_singleton_method(:html_safe) do
  @html_safe = true
  self
end

Benchmark.ips do |x|
  x.report("safe") { safe.html_safe }
  x.report("safe_too") { safe_too.html_safe }
end

Results

Calculating -------------------------------------
                safe     45950 i/100ms
            safe_too    121646 i/100ms
-------------------------------------------------
                safe   664721.6 (±7.1%) i/s -    3308400 in   5.005413s
            safe_too  4751968.1 (±7.0%) i/s -   23720970 in   5.021281s
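
The gap between the two code paths can also be seen without loading Active Support. The following is a minimal sketch, not the real implementation: `ToyBuffer`, `html_safe_new`, `html_safe_self`, and the `allocations` helper are hypothetical names made up for illustration, and the sketch counts object allocations with `GC.stat` rather than measuring iterations per second.

```ruby
# Hypothetical stand-in for a safe buffer; not the real ActiveSupport::SafeBuffer.
class ToyBuffer < String
  # Models the baseline: wrap the receiver in a fresh buffer on every call.
  def html_safe_new
    ToyBuffer.new(self)
  end

  # Models the proposal: hand back the receiver itself.
  def html_safe_self
    self
  end
end

# Returns the number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

buffer = ToyBuffer.new("foo")

new_allocs  = allocations { 1_000.times { buffer.html_safe_new } }
self_allocs = allocations { 1_000.times { buffer.html_safe_self } }

puts "new buffer per call: #{new_allocs} allocations"
puts "return self:         #{self_allocs} allocations"
```

The exact counts depend on the Ruby version, but the first path has to allocate at least one object per call, which is the cost the patch is trying to avoid.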

@rafaelfranca
Member

We had a similar pull request last week. I don't remember what we did with
it, but I think we rejected it. Could you check?
On Oct 13, 2014 7:35 PM, "Richard Schneeman" notifications@github.com
wrote:

There are cases where either a customer calls html_safe or it is called
internally on an already safe string. When that happens we should return
self instead of a new SafeBuffer

@schneems
Member Author

It was this one https://github.com/rails/rails/pull/17199/files

They tried moving up the definition of html_safe to String in addition to Object, which has no net benefit. This optimization works because it removes a code path that doesn't have to be executed (we already have a SafeBuffer allocated, so we simply use that one).

@rafaelfranca
Member

I wonder if this method should mutate the object it is called on. When we call "x".html_safe we get a new object. I think it is weird to have this being true:

x = "x".html_safe
y = x.html_safe
x.object_id == y.object_id

@sgrif
Contributor

sgrif commented Oct 14, 2014

That equality seems perfectly reasonable to me.

@schneems
Member Author

Minimum viable patch defense: All the tests pass 😁

I'm with @sgrif, I took the method to be mutating an internal flag. My original thinking was that calling to_s on a string doesn't return a new string if the target is already a string:

x = "x".to_s
y = x.to_s
x.object_id == y.object_id
# => true

However, I also agree that someone may have written code somewhere that depends on this behavior. These subtle changes can be the worst 😦 On the other hand html_safe wasn't documented and still isn't. We should probably fix that.

@matthewd
Member

@rafaelfranca was talking about #17206 -- which sounds like a much safer option than this.

Having html_safe suddenly become a mutator is not a subtle change, any more than swapping the behaviour of gsub and gsub! would be. I don't believe this approach is viable.

The actually-subtle issue -- sometimes returning the same object -- seems more open to interpretation. Like @sgrif, I don't find it objectionable per se... but the compatibility issue certainly looms.

@egilburg
Contributor

@matthewd there is precedent for conditional mutation: to_s returns the same object if it is called on something that is already a string.

@matthewd
Member

@egilburg to_s doesn't modify the string you call it on, ever.

@egilburg
Contributor

@matthewd sorry, good point, I mixed up mutation with conditional dup.
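
The distinction between "returns the same object" and "modifies the object" can be probed with a frozen string, since a frozen object rejects any state change. This is only a sketch: `ToyString` and `mark_safe` are hypothetical names, with `mark_safe` modelled on the singleton method used in the benchmark above.

```ruby
# String#to_s hands back the receiver without touching it, so it works on a
# frozen string.
frozen = "x".freeze
same = frozen.to_s
puts same.equal?(frozen)   # identity check, not just equal content

# Hypothetical mutating method: it sets an instance variable on the receiver
# before returning self.
class ToyString < String
  def mark_safe
    @safe = true
    self
  end
end

begin
  ToyString.new("x").freeze.mark_safe
  raised = false
rescue RuntimeError   # FrozenError is a RuntimeError subclass on Ruby 2.5+
  raised = true
end

puts raised   # the frozen receiver refused the state change
```

Both methods return the receiver, but only the second one changes it, which is the part under dispute in this thread.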

@rafaelfranca
Member

@matthewd Yes. That one.

html_safe is not public API. AFAIK people should not use it in applications, but I have seen it being used a lot, everywhere. Like @NZKoz always says, you should not call html_safe ever. This is why I believe it should remain undocumented and that #17206 (comment) is a better option.

@schneems
Member Author

The #17206 alternative is as fast for the case where your string is already safe, but it still requires an initialization when the string is not safe. In that case this method is much faster (even when not considering the additional html_safe? method call).

require 'benchmark/ips'
require 'erb'
require 'active_support/core_ext/kernel/singleton_class'
require 'active_support/core_ext/string/output_safety'


STRING = "foo"
SAFE     = STRING.html_safe
SAFE_TOO = STRING.html_safe

# Proposed behaviour: flag the receiver and return it instead of calling super.
SAFE_TOO.define_singleton_method(:html_safe) do
  @html_safe = true
  self
end

Benchmark.ips do |x|
  x.report("super") { SAFE.html_safe }
  x.report("return-self") { SAFE_TOO.html_safe }
end

result

Calculating -------------------------------------
               super     49631 i/100ms
         return-self    123500 i/100ms
-------------------------------------------------
               super   763078.3 (±8.4%) i/s -    3821587 in   5.046523s
         return-self  5009277.4 (±6.8%) i/s -   24947000 in   5.014187s

@rafaelfranca
Member

It is faster, but is it safer to include in a minor release? People may expect that calling html_safe will always return a new object, and changing this may break some applications.
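
One way such an application could break is through aliasing: code that appends to the result of html_safe, assuming it holds a private copy. A minimal sketch of that scenario follows. `ToyBuffer`, `html_safe_copy`, and `html_safe_self` are hypothetical names for illustration, not Rails code.

```ruby
# Hypothetical buffer with both behaviours side by side.
class ToyBuffer < String
  # Always returns a new object.
  def html_safe_copy
    ToyBuffer.new(self)
  end

  # Returns the receiver itself.
  def html_safe_self
    self
  end
end

# With a copy, appending to the result leaves the original alone.
original_a = ToyBuffer.new("Hello")
copy = original_a.html_safe_copy
copy << ", world"
puts original_a   # Hello

# With self, the "result" and the original are one object, so the append
# shows up in both.
original_b = ToyBuffer.new("Hello")
same = original_b.html_safe_self
same << ", world"
puts original_b   # Hello, world
```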

@schneems
Member Author

I don't see how the sometimes new object behavior would be less confusing. I also don't see either PR as being a very backwards compatible change. Even if it's not a public method, it's used internally and contributors should have some reference instead of guessing behavior. Let's document it and call it out as explicitly not for public consumption. Maybe we can even document the correct way people should be using the API.

@matthewd
Member

@foo = '<b>text</b>'
..
The markup "<%= @foo %>" will produce: <%= raw @foo %><br>
That markup again is: "<%= @foo %>"

@matthewd
Member

(okay, not literally that, because @foo would need to become an unsafe SafeBuffer, but you get my point)
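
The point can be sketched outside of a template. Everything here is a hypothetical model: `ToyBuffer`, its two html_safe variants, and `render` (a toy stand-in for `<%= %>` that escapes anything not marked safe) are assumptions for illustration. Only `ERB::Util.html_escape` is a real stdlib call.

```ruby
require 'erb'

# Hypothetical buffer that starts out unsafe and carries a safety flag.
class ToyBuffer < String
  def html_safe?
    @safe == true
  end

  # Mutating variant: flips the flag on the receiver.
  def html_safe_mutating
    @safe = true
    self
  end

  # Copying variant: marks a duplicate and leaves the receiver untouched.
  def html_safe_copying
    dup.html_safe_mutating
  end
end

# Toy stand-in for <%= value %>: escape unless the value is marked safe.
def render(value)
  value.html_safe? ? value.to_s : ERB::Util.html_escape(value)
end

foo = ToyBuffer.new("<b>text</b>")
first      = render(foo)                     # escaped
raw_copy   = render(foo.html_safe_copying)   # raw, via a copy
after_copy = render(foo)                     # still escaped

bar = ToyBuffer.new("<b>text</b>")
render(bar.html_safe_mutating)               # raw, but flips the flag on bar
after_mutation = render(bar)                 # no longer escaped

puts first, raw_copy, after_copy, after_mutation
```

With the mutating variant, a single raw output changes how every later use of the same object is rendered.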

@egilburg
Contributor

@rafaelfranca

A lot of guides on the web (e.g. one from 2010 by Yehuda Katz) recommend calling #html_safe too ( http://yehudakatz.com/2010/02/01/safebuffers-and-rails-3-0/ )

If we shouldn't call #html_safe, what is the supposed way to mark a hand-crafted string as "do not escape this, it contains HTML but I know it's safe"? Is the raw above the correct usage example? What if I actually do want to mark the string as HTML-safe once and not have to call raw every place it's used?

@NZKoz
Member

NZKoz commented Oct 14, 2014

@egilburg <%=raw is clear and states your intention, it also means it's called pretty much exactly where it's going to shoot you in the foot.

However, if you want to call html_safe, feel free, it's just that about 99% of the time I review apps, the html_safe call is an XSS bug. Use raw, and the tag/content_tag helpers and you'll find your code is a bit more verbose, but way less likely to bone you 😄
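
A rough illustration of the kind of bug meant here, using only the stdlib. `toy_content_tag` is a hypothetical, heavily simplified stand-in for a tag helper and is not the Rails `content_tag` implementation. The "risky" string shows what is left once interpolated input is declared safe wholesale.

```ruby
require 'erb'

# Hypothetical helper: escape the content first, then wrap it in markup.
def toy_content_tag(name, content)
  "<#{name}>#{ERB::Util.html_escape(content)}</#{name}>"
end

user_input = "<script>alert(1)</script>"

# Pattern that tends to be an XSS bug: interpolate untrusted input, then treat
# the whole string as safe. Nothing ever escapes user_input.
risky = "<b>#{user_input}</b>"

# Helper-based pattern: the untrusted part is escaped before it meets markup.
safer = toy_content_tag("b", user_input)

puts risky
puts safer
```

The helper version is slightly longer to write, but the escaping happens at the one place where untrusted text is combined with markup.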

@NZKoz
Member

NZKoz commented Oct 14, 2014

@schneems prior to Yehuda's refactoring to use a string subclass, we used essentially this approach. There were two methods: #html_safe returned a new safe string, and #html_safe! mutated the existing string to mark it safe.

Changing #html_safe to change internal state seems both 'icky' and risky. Do you have a benchmark showing real-world gains from this change?

@egilburg
Contributor

@NZKoz thanks for the explanation.

BTW, the I18n gem supports magic keys named something_html, which are not escaped by default even if you don't use raw(). Does it use .html_safe behind the scenes to implement that?

@NZKoz
Member

NZKoz commented Oct 14, 2014

@egilburg yep, but check out how html_safe is implemented

The issue is not the function html_safe itself. Marking something as safe yourself, rather than relying on the escaping code to do it for you, often leads to mistakes.

@egilburg
Contributor

@NZKoz thanks.

If the goal is to support a mutating html_safe while not breaking compatibility for those not explicitly using it, perhaps revert to having an html_safe! method, with html_safe calling .dup.html_safe!?

@NZKoz
Member

NZKoz commented Oct 14, 2014

Not sure that buys you very much though, as you can't implement html_safe! on the other classes, so callers would basically have to check respond_to? first.
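
For reference, the split being discussed and the respond_to? check it would push onto callers might look roughly like this. It is a sketch under assumptions: `ToyBuffer` and `mark_safe_in_place` are hypothetical, and this is not how Rails defines either method.

```ruby
# Hypothetical sketch of an html_safe / html_safe! pair.
class ToyBuffer < String
  def html_safe?
    @safe == true
  end

  # Mutates the receiver and returns it.
  def html_safe!
    @safe = true
    self
  end

  # Non-mutating counterpart: mark a copy, leave the receiver alone.
  def html_safe
    dup.html_safe!
  end
end

# A caller that wants the mutating version has to probe for it, because other
# classes (Integer in this example) do not define html_safe!.
def mark_safe_in_place(value)
  value.respond_to?(:html_safe!) ? value.html_safe! : value
end

buffer = ToyBuffer.new("<i>hi</i>")
copy = buffer.html_safe
puts buffer.html_safe?   # false: only the copy was marked
puts copy.html_safe?     # true

mark_safe_in_place(buffer)
puts buffer.html_safe?   # true: the receiver itself was marked

number = mark_safe_in_place(42)
puts number              # 42, returned unchanged
```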

@schneems
Member Author

So... my benchmarks are all over the place here (on codetriage.com). It seems like this patch helps, but the gain isn't reproducible at a level I'm comfortable with. Based on this info, I'm going to close this issue.
