Don't re-initialize safe string #17250

schneems · 2014-10-13T22:35:44Z

There are cases where html_safe is called on an already safe string, either by a customer or internally. When that happens we should return self instead of allocating a new SafeBuffer.

require 'benchmark/ips'
require 'erb'
require 'active_support/core_ext/kernel/singleton_class'
require 'active_support/core_ext/string/output_safety'


string = "foo"
safe = ActiveSupport::SafeBuffer.new(string)
safe_too = safe.dup


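# Current behavior: allocate a new SafeBuffer on every call.
safe.define_singleton_method(:html_safe) do
  ActiveSupport::SafeBuffer.new(self)
end

# Proposed behavior: flag the receiver as safe and return it.
safe_too.define_singleton_method(:html_safe) do
  @html_safe = true
  self
end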

Benchmark.ips do |x|
  x.report("safe") { safe.html_safe }
  x.report("safe_too") { safe_too.html_safe }
end

Results

Calculating -------------------------------------
                safe     45950 i/100ms
            safe_too    121646 i/100ms
-------------------------------------------------
                safe   664721.6 (±7.1%) i/s -    3308400 in   5.005413s
            safe_too  4751968.1 (±7.0%) i/s -   23720970 in   5.021281s


rafaelfranca · 2014-10-13T22:38:32Z

We had a similar pull request last week. I don't remember what we did with
it, but I think we rejected it. Could you check?

schneems · 2014-10-13T22:52:03Z

It was this one https://github.com/rails/rails/pull/17199/files

That PR tried moving the definition of html_safe up to String in addition to Object, which has no net benefit. This optimization works because it removes a code path that doesn't have to be executed: we already have a SafeBuffer allocated, so we simply use that one.
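A minimal sketch of the difference being described, in plain Ruby. SketchBuffer and both method names are hypothetical stand-ins, not ActiveSupport::SafeBuffer or the actual patch; the point is only that one variant allocates a new object and the other does not.

```ruby
# Hypothetical stand-in for a safe-buffer class; not the real SafeBuffer.
class SketchBuffer < String
  # Behavior described as current: always allocate a new buffer.
  def html_safe_copy
    SketchBuffer.new(self)
  end

  # Behavior proposed in this PR: the receiver is already a buffer,
  # so hand it back without allocating.
  def html_safe_self
    self
  end
end

buf  = SketchBuffer.new("foo")
copy = buf.html_safe_copy
same = buf.html_safe_self

puts copy.equal?(buf)  # false: a distinct object with equal contents
puts same.equal?(buf)  # true: the very same object
puts copy == buf       # true
```

The saved allocation is where the benchmark difference would come from; the changed object identity is what the compatibility discussion in the rest of the thread is about.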

rafaelfranca · 2014-10-13T23:59:30Z

I wonder if this method should mutate the object it is called on. When we call "x".html_safe we get a new object. I think it is weird to have this being true:

x = "x".html_safe
y = x.html_safe
x.object_id == y.object_id

sgrif · 2014-10-14T03:14:05Z

That equality seems perfectly reasonable to me.

schneems · 2014-10-14T04:33:06Z

Minimum viable patch defense: All the tests pass 😁

I'm with @sgrif, I took the method to be mutating an internal flag. My original thinking was that calling to_s on a string doesn't return a new string if the target is already a string:

x = "x".to_s
y = x.to_s
x.object_id == y.object_id
# => true

However, I also agree that someone may have written code somewhere that depends on this behavior. These subtle changes can be the worst 😦 On the other hand html_safe wasn't documented and still isn't. We should probably fix that.

matthewd · 2014-10-14T05:07:58Z

@rafaelfranca was talking about #17206 -- which sounds like a much safer option than this.

Having html_safe suddenly become a mutator is not a subtle change, any more than swapping the behaviour of gsub and gsub! would be. I don't believe this approach is viable.

The actually-subtle issue -- sometimes returning the same object -- seems more open to interpretation. Like @sgrif, I don't find it objectionable per se... but the compatibility issue certainly looms.

egilburg · 2014-10-14T08:27:31Z

@matthewd there is precedent for conditional mutation: to_s returns the same object if it is called on something that is already a string.

matthewd · 2014-10-14T08:32:13Z

@egilburg to_s doesn't modify the string you call it on, ever.
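The distinction between "returns the same object" and "mutates the receiver" can be checked in plain Ruby with a frozen string. This is only an illustration of the to_s and gsub/gsub! comparison made above; it does not involve html_safe at all.

```ruby
original = "x".freeze

# to_s on a String returns the receiver itself: no copy, no modification,
# so it works fine on a frozen string.
same = original.to_s
puts same.equal?(original)  # true

# gsub builds a new string and leaves the receiver alone.
replaced = original.gsub("x", "y")
puts replaced               # y
puts original               # x

# gsub! is a real mutator, so it fails on a frozen receiver.
# (FrozenError is a subclass of RuntimeError on Ruby 2.5 and later.)
mutator_raised =
  begin
    original.gsub!("x", "y")
    false
  rescue RuntimeError
    true
  end
puts mutator_raised         # true
```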

egilburg · 2014-10-14T08:59:12Z

@matthewd sorry, good point. I mixed up mutation with conditional dup.

rafaelfranca · 2014-10-14T13:35:05Z

@matthewd Yes. That one.

html_safe is not public API. AFAIK people should not use it in applications, but I have seen it being used a lot, everywhere. Like @NZKoz always says, you should not call html_safe ever. This is why I believe it should remain undocumented and that #17206 (comment) is a better option.

schneems · 2014-10-14T14:18:19Z

The #17206 alternative is as fast for the case where your string is already safe, but it still requires an initialization when the string is not safe. In that case this method is much faster (even when not considering the additional html_safe? method call).

require 'benchmark/ips'
require 'erb'
require 'active_support/core_ext/kernel/singleton_class'
require 'active_support/core_ext/string/output_safety'


STRING = "foo"
SAFE     = STRING.html_safe
SAFE_TOO = STRING.html_safe

SAFE_TOO.define_singleton_method(:html_safe) do
  @html_safe = true
  self 
end

Benchmark.ips do |x|
  x.report("super") { SAFE.html_safe }
  x.report("return-self") { SAFE_TOO.html_safe }
end

result

Calculating -------------------------------------
               super     49631 i/100ms
         return-self    123500 i/100ms
-------------------------------------------------
               super   763078.3 (±8.4%) i/s -    3821587 in   5.046523s
         return-self  5009277.4 (±6.8%) i/s -   24947000 in   5.014187s

rafaelfranca · 2014-10-14T14:34:39Z

It is faster, but is it safer to include in a minor release? People may expect that calling html_safe will always return a new object, and changing this may break some applications.

schneems · 2014-10-14T15:24:26Z

I don't see how the sometimes new object behavior would be less confusing. I also don't see either PR as being a very backwards compatible change. Even if it's not a public method, it's used internally and contributors should have some reference instead of guessing behavior. Let's document it and call it out as explicitly not for public consumption. Maybe we can even document the correct way people should be using the API.

matthewd · 2014-10-14T15:40:02Z

@foo = '<b>text</b>'
..
The markup "<%= @foo %>" will produce: <%= raw @foo %><br>
That markup again is: "<%= @foo %>"

matthewd · 2014-10-14T15:41:12Z

(okay, not literally that, because @foo would need to become an unsafe SafeBuffer, but you get my point)
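A rough sketch of the hazard in plain Ruby, under stated assumptions: FlaggedBuffer, mark_unsafe!, the two html_safe variants and the render helper are all hypothetical and only imitate the shape of the problem. They are not how ActionView or SafeBuffer are implemented.

```ruby
require 'erb'

# Hypothetical miniature of a safe-buffer class.
class FlaggedBuffer < String
  def initialize(str = "")
    super
    @html_safe = true
  end

  def html_safe?
    @html_safe
  end

  # Stand-in for any operation that leaves the buffer flagged unsafe.
  def mark_unsafe!
    @html_safe = false
    self
  end

  # Mutating variant: flips the flag on the receiver itself.
  def html_safe_mutating
    @html_safe = true
    self
  end

  # Copying variant: the receiver's flag is left untouched.
  def html_safe_copying
    FlaggedBuffer.new(self)
  end
end

# Hypothetical output helper: escape unless the value claims to be safe.
def render(value)
  value.html_safe? ? value.to_s : ERB::Util.html_escape(value.to_s)
end

foo = FlaggedBuffer.new("<b>text</b>").mark_unsafe!

puts render(foo)                    # &lt;b&gt;text&lt;/b&gt;
puts render(foo.html_safe_copying)  # <b>text</b>
puts render(foo)                    # &lt;b&gt;text&lt;/b&gt; (still escaped)

render(foo.html_safe_mutating)
puts render(foo)                    # <b>text</b> (escaping silently lost)
```

With the copying variant, asking for raw output once has no effect on later output of the same variable. With the mutating variant, every later use of foo is emitted unescaped.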

egilburg · 2014-10-14T15:58:02Z

@rafaelfranca

A lot of guides on the web (e.g. one from 2010 by Yehuda Katz) recommend calling #html_safe too ( http://yehudakatz.com/2010/02/01/safebuffers-and-rails-3-0/ )

If we shouldn't call #html_safe, what is the supposed way to mark a hand-crafted string as "do not escape this, it contains HTML but I know it's safe"? Is the raw above the correct usage example? What if I actually do want to mark the string as HTML-safe once and not have to call raw every place it's used?

NZKoz · 2014-10-14T20:03:12Z

@egilburg <%=raw is clear and states your intention, it also means it's called pretty much exactly where it's going to shoot you in the foot.

However, if you want to call html_safe, feel free, it's just that about 99% of the time I review apps, the html_safe call is an XSS bug. Use raw, and the tag/content_tag helpers and you'll find your code is a bit more verbose, but way less likely to bone you 😄

NZKoz · 2014-10-14T20:07:02Z

@schneems prior to Yehuda's refactoring to use a string subclass, we used essentially this approach. There were two methods: #html_safe returned a new safe string, and #html_safe! mutated the existing string to mark it safe.

Changing #html_safe to change internal state seems both 'icky' and risky. Do you have a benchmark showing real-world gains from this change?

egilburg · 2014-10-14T20:11:40Z

@NZKoz thanks for explanation.

BTW, the I18n gem supports magic keys named something_html, which are not escaped by default even if you don't use raw(). Does it use .html_safe behind the scenes to implement that?

NZKoz · 2014-10-14T20:20:35Z

@egilburg yep, but check out how html_safe is implemented

The issue is not the html_safe function itself. It's that marking something as safe yourself, rather than relying on the escaping code to do it for you, often leads to mistakes.

egilburg · 2014-10-14T20:23:49Z

@NZKoz thanks.

If the goal is to support a mutating html_safe while not breaking compatibility for those not explicitly using it, perhaps revert to having an html_safe! method, with html_safe calling .dup.html_safe! ?

NZKoz · 2014-10-14T20:31:38Z

Not sure that buys you very much though, as you can't implement html_safe! on the other classes, so callers would basically have to check respond_to? first.
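A sketch of the html_safe / html_safe! split and of the respond_to? burden it would put on callers. BangBuffer and mark_safe_in_place are hypothetical names used only for illustration; this is not the pre-refactoring Rails code.

```ruby
# Hypothetical buffer with a mutating and a non-mutating way to mark safety.
class BangBuffer < String
  def initialize(str = "")
    super
    @html_safe = false
  end

  def html_safe?
    @html_safe
  end

  # Mutator: marks this very object as safe.
  def html_safe!
    @html_safe = true
    self
  end

  # Non-mutator: copy first, then mark only the copy.
  def html_safe
    dup.html_safe!
  end
end

# Hypothetical caller-side helper: objects that are not buffers have no
# html_safe!, so every caller would need a guard like this one.
def mark_safe_in_place(value)
  value.respond_to?(:html_safe!) ? value.html_safe! : value
end

buf  = BangBuffer.new("<i>hi</i>")
copy = buf.html_safe

puts copy.html_safe?         # true
puts buf.html_safe?          # false: the receiver was not touched
puts mark_safe_in_place(42)  # 42: no html_safe! to call
```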

schneems · 2014-10-14T22:27:00Z

So... my benchmarks are all over the place here (on codetriage.com). It seems like this patch helps, but the gain isn't reproducible at a level I'm comfortable with. Based on this info, I'm going to close this issue.

rafaelfranca added the activesupport label Oct 14, 2014

schneems closed this Oct 14, 2014

schneems mentioned this pull request Oct 27, 2014

Missed optimizations rack/rack#742

Merged
