Beyond Ludicrous Speed #21057

Merged
merged 22 commits into from Jul 30, 2015

@schneems
Member

I've been working on quite a few performance improvements to Rails and the Rails ecosystem, but this is by far the single biggest set of improvements I've been able to make. Those other changes have maybe shaved off a few objects here and there, and maybe 1 or 2% faster request time if I'm extremely lucky.

This change shaves off 34,299 objects and 3,457,318 bytes (3.29 MiB) allocated on every request. This is a 29% decrease in objects and a 23% decrease in memory allocated per request. Taking us past ludicrous speed...

So just how much does this improve overall request time? To measure I used http://www.codetriage.com as an example app and https://github.com/schneems/derailed_benchmarks to generate load. I hit the app with 1,000 requests with my patch and then 1,000 requests against Rails master. I repeated several times until I saw a fairly stable standard deviation.

The average time for 1,000 requests against master was 221.88 seconds and against this patch was 195.401 seconds. This gives us a...

11.9 % speed improvement
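For anyone who wants to check the arithmetic, here is a minimal sketch that derives the figure from the two timings quoted above. The variable names are mine, and it assumes "speed improvement" means the fraction of wall-clock time saved relative to master.

```ruby
# Timings quoted above, in seconds per 1,000 requests.
master_seconds = 221.88
patch_seconds  = 195.401

# Fraction of the original wall-clock time that the patch saves.
time_saved = (master_seconds - patch_seconds) / master_seconds

puts format("%.1f%% less time per 1,000 requests", time_saved * 100)
```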

Where does that speed improvement come from? The big theme is not allocating objects when we don't need to. I used $ derailed exec perf:objects which uses memory_profiler to look for large pockets of allocated memory against http://www.codetriage.com. You can get an in-depth explanation of each change in the commit along with a benchmark of objects and memory saved. The largest area I was able to make gains on was link generation, by removing extraneous arrays, duplicated hashes, and getting rid of intermediate objects whenever possible. I measured this speedup directly:

# $ bin/rails console

require 'benchmark/ips'
repo = Repo.first

Benchmark.ips do |x|
  x.report("url_for speed") {
    app.url_for(repo)
  }
end

This showed 2,865 iterations per second on master and 4,143 iterations per second on my patch, which is a 44% speed improvement in URL generation.

These performance improvements preserve existing interfaces and behaviors; all tests pass.

@rafaelfranca
Member

If you want just to run tests you can always push to rails/rails since you have commit access.

schneems added some commits Jul 24, 2015
@schneems schneems Decrease string allocations on AR#respond_to?
When a symbol is passed in, we call `to_s` on it which allocates a string. The two hardcoded symbols that are used internally are `:to_partial_path` and `:to_model`.

This change buys us 71,136 bytes of memory and 1,777 fewer objects per request.
f80aa59
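A rough sketch of the idea in that commit, not the actual Active Record code: `name_for_lookup` is a hypothetical helper that returns a frozen literal for the two known symbols and falls back to `to_s` for everything else.

```ruby
# Hypothetical helper; the real change lives inside
# ActiveRecord's respond_to? and may differ in detail.
def name_for_lookup(name)
  case name
  when :to_partial_path then "to_partial_path".freeze # no per-call allocation expected
  when :to_model        then "to_model".freeze
  else name.to_s                                      # other names still go through to_s
  end
end

puts name_for_lookup(:to_model)
puts name_for_lookup(:title)
```

The tradeoff is that the method now hardcodes knowledge of its two most frequent callers.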
@schneems schneems Decrease string allocations in apply_inflections
In `apply_inflections` a string is down cased and some whitespace stripped in the front (which allocate strings). This would normally be fine, however `uncountables` is a fairly small array (10 elements out of the box) and this method gets called a TON. Instead we can keep an array of valid regexes for each uncountable so we don't have to allocate new strings.

This change buys us 325,106 bytes of memory and 3,251 fewer objects per request.
1bf50ba
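To illustrate the precomputed-regex idea from that commit, here is a simplified sketch. The word list, the regex shape, and both method names are assumptions for the example, not the real inflector internals. `Regexp#match?` needs Ruby 2.4 or later and avoids building a MatchData object.

```ruby
# Illustrative list only; the real uncountables come from the inflector.
UNCOUNTABLES = %w[equipment information rice money].freeze

# One case-insensitive regex per word, built once at load time.
UNCOUNTABLE_REGEXES = UNCOUNTABLES.map { |w| /\b#{Regexp.escape(w)}\z/i }.freeze

# Allocating version: a downcased copy plus a fresh regex on every check.
def uncountable_slow?(word)
  lowered = word.downcase
  UNCOUNTABLES.any? { |w| lowered =~ /\b#{w}\z/ }
end

# Reuses the prebuilt regexes and never copies the input string.
def uncountable_fast?(word)
  UNCOUNTABLE_REGEXES.any? { |regex| regex.match?(word) }
end

puts uncountable_fast?("Rice")
puts uncountable_fast?("post")
```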
@schneems schneems Decrease string allocations in url_options
The request.script_name is dup-d which allocates an extra string. It is most commonly an empty string "". We can save a ton of string allocations by checking first if the string is empty, if so we can use a frozen empty string instead of duplicating an empty string.

This change buys us 35,714 bytes of memory and 893 fewer objects per request.
83ee043
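The empty-string check described there can be sketched like this. `script_name_for_options` and `EMPTY_STRING` are hypothetical names for illustration; the real logic sits in the `url_options` code path.

```ruby
EMPTY_STRING = "".freeze

# Hypothetical stand-in for the script_name handling in url_options.
def script_name_for_options(script_name)
  # Only pay for a copy when there is something worth copying.
  script_name.empty? ? EMPTY_STRING : script_name.dup
end

puts script_name_for_options("").frozen?
puts script_name_for_options("/admin")
```

Callers that mutate the result would now hit a frozen string in the empty case, which is the kind of behavior change worth keeping in mind with this pattern.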
@schneems schneems Speed up journey extract_parameterized_parts
Micro optimization: `reverse.drop_while` is slower than `reverse_each.drop_while`. This doesn't save any object allocations.

Second, `keys_to_keep` is typically a very small array. The operation `parameterized_parts.keys - keys_to_keep` actually allocates two arrays. It is quicker (I benchmarked) to iterate over each and check inclusion in array manually.

This change buys us 1,774 fewer objects per request.
9b82588
@schneems schneems Speed up journey missing_keys
Most routes have a `route.path.requirements[key]` of `/[-_.a-zA-Z0-9]+\/[-_.a-zA-Z0-9]+/` yet every time this method is called a new regex is generated on the fly with `/\A#{DEFAULT_INPUT}\Z/`. OBJECT ALLOCATIONS BLERG!

This change uses a special module that implements `===` so it can be used in a case statement to pull out the default input. When this happens, we use a pre-generated regex.

This change buys us 1,643,465 bytes of memory and 7,990 fewer objects per request.
097ec6f
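Here is a minimal sketch of the `===` trick from that commit message. The module and method names (`DefaultRequirement`, `matches_requirement?`) are made up for the example; only the default pattern is taken from the text above.

```ruby
module DefaultRequirement
  INPUT    = /[-_.a-zA-Z0-9]+\/[-_.a-zA-Z0-9]+/
  ANCHORED = /\A#{INPUT}\Z/ # built once, reused on every call

  # `case x when DefaultRequirement` calls DefaultRequirement === x.
  def self.===(requirement)
    INPUT == requirement
  end
end

# Hypothetical helper: does `value` satisfy the route requirement?
def matches_requirement?(requirement, value)
  case requirement
  when nil
    true                                   # no requirement to check
  when DefaultRequirement
    DefaultRequirement::ANCHORED === value # pre-generated regex
  else
    /\A#{requirement}\Z/ === value         # uncommon case still builds a regex
  end
end

puts matches_requirement?(DefaultRequirement::INPUT, "rails/rails")
puts matches_requirement?(/\d+/, "abc")
```

`Regexp#==` compares source and options, so any requirement written with the same pattern takes the fast branch.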
@schneems schneems Decrease route_set allocations
In handle_positional_args `Array#-=` is used which allocates a new array. Instead we can iterate through and delete elements, modifying the array in place.

Also `Array#take` allocates a new array. We can build the same by iterating over the other element.

This change buys us 106,470 bytes of memory and 2,663 fewer objects per request.
0cbec58
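The in-place deletion that commit describes looks roughly like this. The data is invented for the example and the variable names only loosely follow `handle_positional_args`.

```ruby
path_params   = [:controller, :action, :id, :format]
inner_options = { id: 1 }
before_id     = path_params.object_id

# `path_params -= inner_options.keys` would allocate a keys array and a
# brand-new result array. Deleting in place reuses the existing array.
inner_options.each_key { |key| path_params.delete(key) }

p path_params
puts path_params.object_id == before_id
```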
@schneems schneems Reduce hash allocations in route_set
When generating a url with `url_for` the hash of arguments passed in, is dup-d and merged a TON. I wish I could clean this up better, and might be able to do it in the future. This change removes one dup, since it's literally right after we just dup-d the hash to pass into this constructor.

This may be a breaking change, but the tests pass...so :shipit: we can revert if it causes problems.

This change buys us 205,933 bytes of memory and 887 fewer objects per request.
1a14074
@schneems schneems Decrease string allocation in content_tag_string
When an unknown key is passed to the hash in `PRE_CONTENT_STRINGS` it returns nil, and interpolating that with "#{nil}" allocates a new empty string. We can get around this allocation by using a default value `Hash.new { "".freeze }`. We can avoid the `to_sym` call by pre-populating the hash with a symbol key in addition to a string key.

We can freeze some strings when using Array#* to reduce allocations.

Array#join can take frozen strings.

This change buys us 86,600 bytes of memory and 1,857 fewer objects per request.
2a4d430
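A small sketch of the default-value trick from that commit. `PRE_CONTENT` is a stand-in name and its entries are assumed for the example; the real table is `PRE_CONTENT_STRINGS` in the tag helper.

```ruby
# Unknown keys yield a frozen empty string instead of nil, so
# interpolating the result no longer has to turn nil into "".
PRE_CONTENT = Hash.new { "".freeze }

# Both key types are stored so lookups need no to_sym or to_s call.
PRE_CONTENT[:textarea]  = "\n"
PRE_CONTENT["textarea"] = "\n"

p PRE_CONTENT[:textarea]
p PRE_CONTENT[:div]
```

Because the default block does not assign into the hash, missing keys are never stored and the table stays the same size.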
@schneems schneems Optimize hash key
No idea why on earth this hash key isn't already optimized by MRI, but it isn't. 💩

This change buys us 74,077 bytes of memory and 1,852 fewer objects per request.
2e95d2e
@schneems schneems Cut string ActionView template allocations
The instrument method creates new strings. The most common action to instrument is `"!render_template"`, so we can detect when that action is occurring and use a frozen string instead.

This change buys us 113,714 bytes of memory and 1,790 fewer objects per request.
9b189a3
@schneems schneems Cut string allocations in content_tag_string
content_tag's first argument will generate a string with an HTML tag, so `:a` will generate `<a></a>`. When this happens, the symbol is implicitly `to_s`-d so a new string is allocated. We can get around that by using a frozen string instead.

This change buys us 74,236 bytes of memory and 1,855 fewer objects per request.
e76a843
@schneems schneems changed the title from Running tests to Beyond Ludicrous Speed Jul 30, 2015
@schneems
Member

Build is green, for some reason it's not showing up on the PR https://travis-ci.org/rails/rails/builds/73305474

@jeremy
Member
jeremy commented Jul 30, 2015

Nice work @schneems !

@vipulnsward
Member

🏇

@arthurnn
Member

Squash! =)

@rafaelfranca
Member

:shipit:

@rafaelfranca
Member

Really nice work!

@arthurnn
Member

❤️ ❤️ ❤️ ❤️ ❤️

@jeremy jeremy and 2 others commented on an outdated diff Jul 30, 2015
actionpack/lib/action_dispatch/journey/formatter.rb
@@ -33,6 +33,7 @@ def generate(name, options, path_parameters, parameterize = nil)
defaults = route.defaults
required_parts = route.required_parts
parameterized_parts.delete_if do |key, value|
+ next if defaults[key].nil?
@jeremy
jeremy Jul 30, 2015 Member

Combining control flow (next as early-return) and a boolean expression feels a little awkward. How about folding the not-nil check into the expression?

@schneems
schneems Jul 30, 2015 Member

It becomes a bit long for my tastes, also requires a bang expression, which I try to avoid:

!defaults[key].nil? && value.to_s == defaults[key].to_s && !required_parts.include?(key)

You make the call. Which do you prefer?

@matthewd
matthewd Jul 30, 2015 Member

I haven't really considered its merits (or checked it's right 😉), but here's a less negative spelling of that:

parameterized_parts.keep_if do |key, value|
  defaults[key].nil? ||
    value.to_s != defaults[key].to_s ||
    required_parts.include?(key)
end
@jeremy
jeremy Jul 30, 2015 Member

Could flip to keep_if to simplify the expression a bit.

@schneems
schneems Jul 30, 2015 Member

Thanks! actionpack tests pass with that for me. Seems good, somehow removing the bangs makes it okay to be on one line (for me). I pushed this change. Will wait on the whole suite to run again to be sure.

@pixeltrix
Member

@schneems well done 👏

@jeremy jeremy and 1 other commented on an outdated diff Jul 30, 2015
actionpack/lib/action_dispatch/journey/formatter.rb
@@ -33,6 +33,7 @@ def generate(name, options, path_parameters, parameterize = nil)
defaults = route.defaults
required_parts = route.required_parts
parameterized_parts.delete_if do |key, value|
+ next if defaults[key].nil?
value.to_s == defaults[key].to_s && !required_parts.include?(key)
@jeremy
jeremy Jul 30, 2015 Member

Is value ever nil?

@schneems
schneems Jul 30, 2015 Member

checked via actionpack tests and with my own app, value is never nil.

@jeremy jeremy and 1 other commented on an outdated diff Jul 30, 2015
actionpack/lib/action_dispatch/journey/formatter.rb
!options.key?(part) || (options[part] || recall[part]).nil?
} | route.required_parts
- (parameterized_parts.keys - keys_to_keep).each do |bad_key|
+ parameterized_parts.each do |bad_key, _|
+ next if keys_to_keep.include?(bad_key)
parameterized_parts.delete(bad_key)
@jeremy
jeremy Jul 30, 2015 Member

Could switch to a parameterized_parts.delete_if { … ?

@schneems
schneems Jul 30, 2015 Member

Pretty sure the logic would be the same. I think you could do that; delete_if is slightly faster...

Calculating -------------------------------------
        each; delete    35.166k i/100ms
           delete_if    36.416k i/100ms
-------------------------------------------------
        each; delete    478.026k (± 8.5%) i/s -      2.391M
           delete_if    485.123k (± 7.9%) i/s -      2.440M
@schneems
schneems Jul 30, 2015 Member

I'm already re-running tests, so I added this in. Thanks!

@jeremy jeremy commented on the diff Jul 30, 2015
actionpack/lib/action_dispatch/journey/formatter.rb
tests = route.path.requirements
route.required_parts.each { |key|
- if tests.key?(key)
- missing_keys << key unless /\A#{tests[key]}\Z/ === parts[key]
+ case tests[key]
+ when nil
@jeremy
jeremy Jul 30, 2015 Member

Changes a presence check to a nil check—can this value legitimately be nil?

@schneems
schneems Jul 30, 2015 Member

I don't think you can have a required key be nil. I ran this against the test suite and came up with nothing:

          puts tests.inspect if tests.values.include?(nil)
@jeremy jeremy commented on the diff Jul 30, 2015
actionpack/lib/action_dispatch/routing/route_set.rb
@@ -617,7 +621,7 @@ def current_controller
def use_recall_for(key)
if @recall[key] && (!@options.key?(key) || @options[key] == @recall[key])
if !named_route_exists? || segment_keys.include?(key)
- @options[key] = @recall.delete(key)
+ @options[key] = @recall[key]
@jeremy
jeremy Jul 30, 2015 Member

Red flag, kind of thing that may cause subtle/untested regressions.

@schneems
schneems Jul 30, 2015 Member

I agree, it raised some red flags with me when I first did it. Here's the specific commit:

schneems@9060b92

It's a fairly large savings in memory. Here's why I think it's okay

This method moves a key/value pair from recall to options

def use_recall_for(key)
  if @recall[key] && (!@options.key?(key) || @options[key] == @recall[key])
    if !named_route_exists? || segment_keys.include?(key)
      @options[key] = @recall[key]
    end
  end
end

I changed it so that recall is preserved i.e. the value is copied not "moved." This method gets called 3 times for the keys :controller, :action, and :id:

# This pulls :controller, :action, and :id out of the recall.
# The recall key is only used if there is no key in the options
# or if the key in the options is identical. If any of
# :controller, :action or :id is not found, don't pull any
# more keys from the recall.
def normalize_controller_action_id!
  use_recall_for(:controller) or return
  use_recall_for(:action) or return
  use_recall_for(:id)
end

Based on the comment every time options would have a key, it should be favored over recall, so it doesn't matter that recall also has the key. But let's not trust comments, let's look at the code. After we are done in this class the formatter is called:

@set.formatter.generate(named_route, options, recall, PARAMETERIZE)

Here the first thing that is done is to merge the two hashes:

def generate(name, options, path_parameters, parameterize = nil)
  constraints = path_parameters.merge(options)

So whether we delete the key in recall (which becomes path_parameters) the constraints hash will be the same.

You might be thinking "well I bet path_parameters is used somewhere else" and you would be right.

It's used here:

parameterized_parts = extract_parameterized_parts(route, options, path_parameters, parameterize)

Yet once again, it's immediately merged:

def extract_parameterized_parts(route, options, recall, parameterize = nil)
  parameterized_parts = recall.merge(options)

In both cases we will use the key in options if it exists. I'm pretty comfortable with this change, but I would like some more eyes. Maybe I missed a really subtle and untested use-case, for which we should certainly add some tests.
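The argument above can be checked with a toy model. This is a sketch only: it drops the `named_route_exists?` / `segment_keys` condition, the `move:` keyword is a hypothetical switch between the old (`delete`) and new (copy) behavior, and the sample data is invented.

```ruby
# Simplified use_recall_for; `move: true` mimics the old @recall.delete(key).
def use_recall_for(options, recall, key, move:)
  if recall[key] && (!options.key?(key) || options[key] == recall[key])
    options[key] = move ? recall.delete(key) : recall[key]
  end
end

# Runs the normalize step, then merges the way the formatter does.
def constraints_after(move:)
  options = { action: "show" }
  recall  = { controller: "repos", action: "index", id: "1" }
  [:controller, :action, :id].each do |key|
    # Stop at the first key that is not pulled from recall.
    break unless use_recall_for(options, recall, key, move: move)
  end
  recall.merge(options) # options win on any shared key
end

p constraints_after(move: true)
p constraints_after(move: false)
```

For this input the merged hashes are equal, which matches the reasoning that `options` always wins in the merge. It does not rule out other code reading `recall` directly.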

@jeremy
jeremy Jul 30, 2015 Member

@tenderlove stubbed some toes on the same thing—ring a bell?

@schneems
schneems Jul 30, 2015 Member

Via campfire AP says he's OK with it. If any weirdness comes up on master about URL generation for the next while I'll be happy to take a look.

schneems added some commits Jul 25, 2015
@schneems schneems Avoid calling to_s on nil in journey/formatter
When `defaults[key]` in `generate` in the journey formatter is called, it often returns `nil`. When we call `to_s` on a nil, it allocates an empty string. We can skip this check when the default value is nil.

This change buys us 35,431 bytes of memory and 887 fewer objects per request.

Thanks to @matthewd for help with the readability
bff61ba
@schneems schneems Freeze a string in comparator
Saves 888 string objects per request.
045cdd3
@schneems schneems Only allocate new string when needed
Instead of calling `sub` on every link_to call for controller, we can detect when the string __needs__ to be allocated and only then create a new string (without the leading slash), otherwise, use the string that is given to us.

Saves 888 string objects per request, 35,524 bytes.
3fb9e80
@schneems schneems Avoid hash duplication by skipping mutation
If we don't mutate the `recall` hash, then there's no reason to duplicate it. While this change doesn't get rid of that many objects, each hash object it gets rid of was massive.

Saves 888 string objects per request, 206,013 bytes (that's 0.2 MB which is kinda a lot).
1993e2c
@schneems schneems Decrease allocations in transliterate
We can save a few objects by freezing the `replacement` string. We save a few more by down-casing the string in memory instead of allocating a new one. We save far more objects by checking for the default separator `"-"`, and using pre-generated regular expressions.

We will save 209,231 bytes and 1,322 objects.
57ba9cb
@schneems schneems String#freeze optimizations 0d7a714
@schneems schneems Don't allocate array when not necessary
In the `tag_options` method an array is used to build up elements, then `Array#*` (which is an alias for `Array#join`) is called to turn the array into a string. Instead of allocating an array to build a string, we can build the string we want from the beginning.

Saved: 121,743 bytes 893 objects
1f831fe
@schneems schneems zOMG 37 objects saved 005541b
@schneems schneems Remove array allocation
The only reason we were allocating an array is to get the "missing_keys" variable in scope of the error message generator. Guess what? Arrays kinda take up a lot of memory, so by replacing that with a nil, we save:

35,303 bytes and 886 objects per request
4d2ccc1
@schneems schneems Remove (another) array allocation
We don't always need an array when generating a url with the formatter. We can be lazy about allocating the `missing_keys` array. This saves us:

35,606 bytes and 889 objects per request
61dae88
@schneems schneems Use delete_if instead of each; delete(key)
It is slightly faster:

```
Calculating -------------------------------------
        each; delete    35.166k i/100ms
           delete_if    36.416k i/100ms
-------------------------------------------------
        each; delete    478.026k (± 8.5%) i/s -      2.391M
           delete_if    485.123k (± 7.9%) i/s -      2.440M
```
22f5924
@schneems schneems merged commit 5373bf2 into rails:master Jul 30, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@byroot byroot commented on the diff Jul 30, 2015
actionpack/lib/action_dispatch/journey/formatter.rb
@@ -25,22 +25,22 @@ def generate(name, options, path_parameters, parameterize = nil)
next unless name || route.dispatcher?
missing_keys = missing_keys(route, parameterized_parts)
- next unless missing_keys.empty?
+ next if missing_keys && missing_keys.any?
@byroot
byroot Jul 30, 2015 Member

Isn't any? doing quite a lot more than !empty?.

empty_array = []
small_array = [1] * 30
bigger_array = [1] * 300

Benchmark.ips do |x|
  x.report('empty !empty?') { !empty_array.empty? }
  x.report('small !empty?') { !small_array.empty? }
  x.report('bigger !empty?') { !bigger_array.empty? }

  x.report('empty any?') { empty_array.any? }
  x.report('small any?') { small_array.any? }
  x.report('bigger any?') { bigger_array.any? }
end
Calculating -------------------------------------
       empty !empty?   132.059k i/100ms
       small !empty?   133.974k i/100ms
      bigger !empty?   133.848k i/100ms
          empty any?   106.924k i/100ms
          small any?    85.525k i/100ms
         bigger any?    86.663k i/100ms
-------------------------------------------------
       empty !empty?      8.522M (± 7.9%) i/s -     42.391M
       small !empty?      8.501M (± 8.5%) i/s -     42.202M
      bigger !empty?      8.434M (± 8.6%) i/s -     41.894M
          empty any?      4.161M (± 8.3%) i/s -     20.743M
          small any?      2.654M (± 5.2%) i/s -     13.256M
         bigger any?      2.642M (± 6.4%) i/s -     13.173M
@schneems
schneems Jul 30, 2015 Member

You're correct, I never knew. It looks like the big difference is this line: https://github.com/ruby/ruby/blob/trunk/array.c#L5524

I'm actually looking to see if we can optimize the case of any? a little for when the array is empty. It will still be slightly slower though. I think it would make sense for this to be:

if missing_keys && !missing_keys.empty?

Want to give me a PR and @ mention me?
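Besides speed, the two calls are not logical inverses. Without a block, `any?` asks whether some element is truthy, while `empty?` only looks at the length. A short illustration with made-up data:

```ruby
missing_keys = [nil, false]

puts(missing_keys.any?)    # prints false: no element is truthy
puts(!missing_keys.empty?) # prints true: the array has two elements

# For an empty array both forms agree.
puts([].any? == ![].empty?)
```

So `!missing_keys.empty?` is the safer spelling whenever the array could hold `nil` or `false`.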

@byroot byroot added a commit to byroot/rails that referenced this pull request Jul 30, 2015
@byroot byroot Array#any? is slower and not the inverse of Array#empty?
```
empty_array = []
small_array = [1] * 30
bigger_array = [1] * 300

Benchmark.ips do |x|
  x.report('empty !empty?') { !empty_array.empty? }
  x.report('small !empty?') { !small_array.empty? }
  x.report('bigger !empty?') { !bigger_array.empty? }

  x.report('empty any?') { empty_array.any? }
  x.report('small any?') { small_array.any? }
  x.report('bigger any?') { bigger_array.any? }
end
```

```
Calculating -------------------------------------
       empty !empty?   132.059k i/100ms
       small !empty?   133.974k i/100ms
      bigger !empty?   133.848k i/100ms
          empty any?   106.924k i/100ms
          small any?    85.525k i/100ms
         bigger any?    86.663k i/100ms
-------------------------------------------------
       empty !empty?      8.522M (± 7.9%) i/s -     42.391M
       small !empty?      8.501M (± 8.5%) i/s -     42.202M
      bigger !empty?      8.434M (± 8.6%) i/s -     41.894M
          empty any?      4.161M (± 8.3%) i/s -     20.743M
          small any?      2.654M (± 5.2%) i/s -     13.256M
         bigger any?      2.642M (± 6.4%) i/s -     13.173M
```

Ref: rails#21057 (comment)
32133db
@JMD1986
JMD1986 commented Jul 31, 2015

nice!

@pcreux
Contributor
pcreux commented Jul 31, 2015

👍 ❤️ 💚 💛

@NikoRoberts

👍 noice

@shakycode

This is sick! Great work!

@pacoguzman pacoguzman commented on the diff Aug 1, 2015
activerecord/lib/active_record/attribute_methods.rb
@@ -230,7 +230,15 @@ def column_for_attribute(name)
# person.respond_to(:nothing) # => false
def respond_to?(name, include_private = false)
return false unless super
- name = name.to_s
+
+ case name
@pacoguzman
pacoguzman Aug 1, 2015 Contributor

This case statement knows a lot about its ecosystem. In the whole file there is no mention of to_partial_path or to_model. That could make it difficult to read, and it is there just for performance. For me this goes too far as an optimization.

@schneems
schneems Aug 1, 2015 Member

The only time we explicitly call respond_to? in the code uses those two symbols. Yes it knows a lot about the ecosystem, it's an optimization. This code gets called many times with those two symbols, and the knowledge of this saves us a good amount of memory and time. With this change, you're looking at 71,136 bytes of memory and 1,777 objects saved per request:

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("Symbol#to_s") {
    name = :to_partial_path.to_s
  }
  x.report("case") {
    case :to_partial_path
    when :to_partial_path
      name = "to_partial_path".freeze
    end
  }
end

Also a pretty nice speed boost

Calculating -------------------------------------
         Symbol#to_s   122.411k i/100ms
                case   132.355k i/100ms
-------------------------------------------------
         Symbol#to_s      5.456M (±11.1%) i/s -     26.930M
                case      7.838M (±12.5%) i/s -     38.515M

Or roughly 43% faster.

I encourage you to doubt people's "performance optimizations"; to do so you must argue that the benefit is not worth the cost (slightly more code). Please ping me on a PR if you re-write this to be faster without sacrificing purity or readability.

@pacoguzman
pacoguzman Aug 1, 2015 Contributor

You're doing awesome work, but I'm not the one who has to decide whether the benefit is worth the cost in terms of purity or readability. Anyway, why don't you call respond_to? in those places passing a "string".freeze as the argument instead of a symbol?

@schneems
schneems Aug 1, 2015 Member

We're calling respond_to? against an argument passed in. This is to see if the object is a model. A normal ruby Object needs a symbol key for that method. Changing to a frozen string would be a breaking change.

@pacoguzman
pacoguzman Aug 1, 2015 Contributor

Never mind, the string passed is converted to a symbol, so we're probably in the same case. So in the end there is probably no "solution" that satisfies both performance and readability; all of them have some tradeoffs.

@ZempTime
ZempTime commented Aug 1, 2015

bravo @schneems ! 🍕

@bquorning bquorning commented on the diff Aug 1, 2015
actionpack/lib/action_dispatch/routing/route_set.rb
@@ -267,9 +267,13 @@ def handle_positional_args(controller_options, inner_options, args, result, path
path_params -= controller_options.keys
path_params -= result.keys
@bquorning
bquorning Aug 1, 2015 Contributor

I don’t know how often this code path is reached (inside an if args.size < path_params_size block), but the argument for saving an array allocation in 0cbec58 might apply for lines 267 and 268 as well.

Just a thought: Doesn’t the refactoring from foo -= bar.keys to bar.each { |key, _| foo.delete(key) } smell like there’s a method missing on the Array class?

@bquorning bquorning commented on the diff Aug 1, 2015
actionpack/lib/action_dispatch/routing/route_set.rb
@@ -671,12 +675,18 @@ def use_relative_controller!
# Remove leading slashes from controllers
def normalize_controller!
- @options[:controller] = controller.sub(%r{^/}, ''.freeze) if controller
+ if controller
+ if m = controller.match(/\A\/(?<controller_without_leading_slash>.*)/)
+ @options[:controller] = m[:controller_without_leading_slash]
+ else
+ @options[:controller] = controller
+ end
+ end
@bquorning
bquorning Aug 1, 2015 Contributor

While this may save on String allocations, I suspect the regex matching will actually make it slower. While the readability may be questionable, using start_with? and [] is much faster:

require 'benchmark/ips'

def sub(controller)
  controller.sub(%r{^/}, ''.freeze) if controller
end

def match(controller)
  if controller
    if m = controller.match(/\A\/(?<controller_without_leading_slash>.*)/)
      m[:controller_without_leading_slash]
    else
      controller
    end
  end
end

def start_with(controller)
  if controller
    if controller.start_with?('/'.freeze)
      controller[1..-1]
    else
      controller
    end
  end
end

Benchmark.ips do |x|
  x.report("sub") { sub("no_leading_slash") }
  x.report("match") { match("no_leading_slash") }
  x.report("start_with") { start_with("no_leading_slash") }

  x.compare!
end

Benchmark.ips do |x|
  x.report("sub") { sub("/a_leading_slash") }
  x.report("match") { match("/a_leading_slash") }
  x.report("start_with") { start_with("/a_leading_slash") }

  x.compare!
end
Calculating -------------------------------------
                 sub    59.229k i/100ms
               match    66.314k i/100ms
          start_with    91.586k i/100ms
-------------------------------------------------
                 sub      1.140M (± 9.0%) i/s -      5.686M
               match      1.423M (± 7.1%) i/s -      7.096M
          start_with      3.590M (±10.1%) i/s -     17.768M

Comparison:
          start_with:  3589743.1 i/s
               match:  1423380.7 i/s - 2.52x slower
                 sub:  1140204.8 i/s - 3.15x slower

Calculating -------------------------------------
                 sub    41.670k i/100ms
               match    37.536k i/100ms
          start_with    90.949k i/100ms
-------------------------------------------------
                 sub    653.775k (± 7.6%) i/s -      3.250M
               match    523.276k (±10.9%) i/s -      2.590M
          start_with      2.329M (±12.3%) i/s -     11.460M

Comparison:
          start_with:  2328795.8 i/s
                 sub:   653774.6 i/s - 3.56x slower
               match:   523276.2 i/s - 4.45x slower
@bquorning bquorning commented on the diff Aug 1, 2015
actionview/lib/action_view/helpers/tag_helper.rb
options.each_pair do |key, value|
if TAG_PREFIXES.include?(key) && value.is_a?(Hash)
value.each_pair do |k, v|
- attrs << prefix_tag_option(key, k, v, escape)
+ output << sep + prefix_tag_option(key, k, v, escape)
@bquorning
bquorning Aug 1, 2015 Contributor

I was of the understanding that sep + prefix_tag_option(key, k, v, escape) would allocate a new string object. This could be avoided by shoveling twice instead of once:

output << sep
output << prefix_tag_option(key, k, v, escape)
@schneems
schneems Aug 2, 2015 Member

Looks like it's faster

require 'benchmark/ips'

sep    = " ".freeze

Benchmark.ips do |x|
  x.report("string +") {
    output = ""
    sep    = " ".freeze
    output << sep + "foo"
  }

  x.report("string <<") {
    output = ""
    sep    = " ".freeze
    output << sep
    output << "foo"
  }
  x.report("array") {
    array = []
    array << "foo"
    array.join(" ")
  }
end

gives us

Calculating -------------------------------------
            string +    82.243k i/100ms
           string <<    86.053k i/100ms
               array    60.794k i/100ms
-------------------------------------------------
            string +      2.220M (±10.9%) i/s -     11.021M
           string <<      2.430M (±10.6%) i/s -     12.047M
               array      1.158M (± 8.7%) i/s -      5.775M

Looks like 10% faster. Care to give me a PR and @-mention me?

@bquorning
Contributor

This is an amazing PR. Thank you @schneems

@aditya-kapoor
Contributor

Amazing Work.. @schneems 👏 👏 👏

@jonatack
Contributor

Nice! 😃

@korny
Contributor
korny commented Aug 3, 2015

This is awesome. 👍

@b264
b264 commented Aug 3, 2015

Most appreciated 🚦 🏁

@thibaudgg
Contributor

Awesome! 👏 👏 👏

@endlos
endlos commented Aug 5, 2015

Nice job @schneems !

Thanks for your hard work and dedication.

@007lva
007lva commented Aug 5, 2015

👏 👏 👏

@AaronLasseigne

( I hope you don't mind me commenting here.)

You can avoid creating the MatchData object by using [] or =~ on str and passing it regex.

@regex_array.detect { |regex| str[regex] } or @regex_array.detect { |regex| str =~ regex }

Also any? seems more appropriate than detect. Is there a reason to use detect?

Owner

( I hope you don't mind me commenting here.)

Not at all, thanks for chiming in! You're right about the match:

require 'benchmark/ips'

string = "foo".freeze

Benchmark.ips do |x|
  x.report("match") {
    /boo/.match(string)
  }
  x.report("=~") {
    /boo/ =~ string
  }
  x.report("[]") {
    string[/boo/]
  }
end

results in

Calculating -------------------------------------
               match    66.837k i/100ms
                  =~    74.183k i/100ms
                  []    71.748k i/100ms
-------------------------------------------------
               match      1.783M (±12.8%) i/s -      8.822M
                  =~      2.077M (±13.0%) i/s -     10.237M
                  []      1.910M (±13.1%) i/s -      9.399M

Looks like switching to any? would also give a speed bump:

require 'benchmark/ips'

string = "foo".freeze
array = [/zoo/, /boo/, /moo/]

Benchmark.ips do |x|
  x.report("detect") {
    array.detect {|regex| string =~ regex }
  }
  x.report("any?") {
    array.any? {|regex| string =~ regex }
  }
end
Calculating -------------------------------------
              detect    31.241k i/100ms
                any?    37.831k i/100ms
-------------------------------------------------
              detect    489.087k (±10.2%) i/s -      2.437M
                any?    616.260k (± 9.5%) i/s -      3.064M

Could you put that in the change as well? Didn't realize there was a speed difference. Here's the source:

This is used for any?

static VALUE
rb_ary_any_p(VALUE ary)
{
    long i, len = RARRAY_LEN(ary);

    if (!len) return Qfalse;
    if (!rb_block_given_p()) {
        const VALUE *ptr = RARRAY_CONST_PTR(ary);
        for (i = 0; i < len; ++i) if (RTEST(ptr[i])) return Qtrue;
    }
    else {
        for (i = 0; i < RARRAY_LEN(ary); ++i) {
            if (RTEST(rb_yield(RARRAY_AREF(ary, i)))) return Qtrue;
        }
    }
    return Qfalse;
}

This is used for detect:

static VALUE
enum_find(int argc, VALUE *argv, VALUE obj)
{
    struct MEMO *memo;
    VALUE if_none;

    rb_scan_args(argc, argv, "01", &if_none);
    RETURN_ENUMERATOR(obj, argc, argv);
    memo = MEMO_NEW(Qundef, 0, 0);
    rb_block_call(obj, id_each, 0, 0, find_i, (VALUE)memo);
    if (memo->u3.cnt) {
        return memo->v1;
    }
    if (!NIL_P(if_none)) {
        return rb_funcallv(if_none, id_call, 0, 0);
    }
    return Qnil;
}

I guess the extra speed comes from not having to allocate an enumerator and the extra layer of indirection? Could you give me a PR and @-mention me to change this to a str =~ regex and change the method to any? since you found the change?

Sure thing.

Owner

I added the String#=~ to fast-ruby btw JuanitoFatas/fast-ruby#59. It would be good to add the any? versus detect benchmark too if you get a bit more time after your rails patch.

Thanks again for the help 👍 ❤️

@connorshea
Contributor

Another reason to look forward to Rails 5.0! :shipit:

@toreriklinnerud

Thanks for this - great work that will benefit everyone using Rails! ❤️

@davekapp
davekapp commented Aug 6, 2015

Amazing work. Congrats to you and everyone else and I owe you a 🍺 sometime. :)

@Johnius Johnius commented on the diff Aug 6, 2015
actionpack/lib/action_dispatch/routing/route_set.rb
end
# Move 'index' action from options to recall
def normalize_action!
- if @options[:action] == 'index'
+ if @options[:action] == 'index'.freeze
@Johnius
Johnius Aug 6, 2015

Maybe it's a stupid question. But why do you freeze strings?

@schneems
schneems Aug 6, 2015 Member

This explains the freeze method for strings 
http://tmm1.net/ruby21-fstrings/

This article explains why you might want to use frozen strings for performance, with some benchmarks: http://www.sitepoint.com/unraveling-string-key-performance-ruby-2-2/

@dmitry
dmitry Aug 7, 2015 Contributor

Isn't it better to have a constant defined in a class/module?

@radar
Contributor
radar commented Aug 7, 2015

Excellent work @schneems! 🎉 🍻

@jjgh
jjgh commented Aug 7, 2015

@schneems thanks for making Rails so much better, one more time! 👏 👏 👏

@kgrz kgrz commented on the diff Aug 7, 2015
...ort/lib/active_support/core_ext/string/inflections.rb
@@ -164,7 +164,7 @@ def deconstantize
#
# <%= link_to(@person.name, person_path) %>
# # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
- def parameterize(sep = '-')
+ def parameterize(sep = '-'.freeze)
@kgrz
kgrz Aug 7, 2015

Do you think it's better to have a single place that has all the frozen strings? Sort of like boot.rb, but one that has the calls to freeze for these tokens (- here, :: in the next file, etc.).

@bquorning
bquorning Aug 7, 2015 Contributor

You’d still have to call freeze when you want to use the frozen string, so there is no point in freezing them up front.

>> '-'.freeze.object_id
=> 70222534974420
>> '-'.object_id
=> 70222535341760
>> '-'.freeze.object_id
=> 70222534974420
>> '-'.object_id
=> 70222535533680
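
The same point can be checked with equal?, which compares object identity. This is a small sketch assuming Ruby 2.1 or later and no frozen_string_literal magic comment; the method names frozen_dash and plain_dash are made up for illustration:

```ruby
def frozen_dash
  '-'.freeze   # literal followed by freeze: Ruby reuses one deduplicated string
end

def plain_dash
  '-'          # plain literal: normally a fresh String on every call
end

same_frozen = frozen_dash.equal?(frozen_dash)
puts same_frozen                        # identity of the two frozen results
puts frozen_dash.frozen?
puts plain_dash.equal?(plain_dash)      # depends on frozen_string_literal settings
```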
@kgrz
kgrz Aug 7, 2015

Ah, TIL, thank you!

@kgrz
kgrz Aug 7, 2015

@bquorning That said, I remember a way I did that in a different project. That was by using constants in cases like these. For example, def parameterize(sep = DASH_TOKEN). That way, the object_ids will be the same no matter where this token gets used, IIRC.

@bquorning
bquorning Aug 7, 2015 Contributor

True, if you assign the frozen strings to constants up front, you can re-use the same instance all over the place. Plus, they won't be GC'ed.

@bquorning
bquorning Aug 7, 2015 Contributor

To quote @schneems from rack/rack#737:

“While we could certainly go overboard and pre-define ALL strings as constants, that would be pretty gnarly to work with. This patch goes after the largest of the low hanging fruit.”

@kgrz
kgrz Aug 7, 2015

Yes. Cleaner, IMO. That was what I wanted to convey when I first posted the comment. Evidently, I wasn't clear. Sorry about that.

@kgrz
kgrz Aug 7, 2015

Let me clarify my first comment again:

I meant that it might be nice to have tokens like -, ::, '' etc. assigned to, say, DASH, DOUBLE_COLON, EMPTY_STRING etc., defined in a separate file that's loaded up front, with those constants used everywhere instead of multiple "-".freeze calls all over the code.
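
For readers following along, the constant-based layout being proposed might look roughly like the sketch below. The Tokens module and the join_words method are hypothetical names used only for illustration; nothing like this is part of the patch.

```ruby
# Hypothetical tokens file, loaded once at boot.
module Tokens
  DASH         = '-'.freeze
  DOUBLE_COLON = '::'.freeze
  EMPTY_STRING = ''.freeze
end

# Hypothetical caller that uses a constant as its default separator.
def join_words(words, sep = Tokens::DASH)
  words.join(sep)
end

puts join_words(%w[donald e knuth])
puts join_words(%w[Active Support], Tokens::DOUBLE_COLON)
```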

@schneems
schneems Aug 7, 2015 Member

Specifying frozen strings as constants is slower. Constants work internally as a global hash; every time you reference a constant, Ruby has to do a hash lookup. If you use String#freeze inline, you don't have to do the lookup and your code is faster. Also, ''.freeze is shorter than EMPTY_STRING.

require 'benchmark/ips'

HELLO = "hello".freeze
Benchmark.ips do |x|
  x.report("freeze")   { "hello".freeze + "world" }
  x.report("constant") { HELLO + "world" }
end
Calculating -------------------------------------
              freeze   100.611k i/100ms
            constant    99.036k i/100ms
-------------------------------------------------
              freeze      3.630M (± 7.9%) i/s -     18.110M
            constant      3.470M (±11.6%) i/s -     17.034M
@nitinstp23

Nice work @schneems 🍻

@arteezy
arteezy commented Aug 7, 2015

@schneems bravo 👏

@marcgg
Contributor
marcgg commented Aug 7, 2015

👍 👍 👍 Thanks a lot for this

@arun057
arun057 commented Aug 7, 2015

Nice work @schneems

@robinw777

Sorry for a newbie question - I don't understand why "#{action}.action_view".freeze is not applicable to the other branch condition, and why this can save object allocation. Could you explain? Thanks!

Is it because to evaluate "#{action}.action_view", a new string will always be created?

Owner

Try out some benchmarks

require 'benchmark/ips'

action = "!render_template"
Benchmark.ips do |x|
  x.report("frozen dynamic") { "#{action}.action_view".freeze }
  x.report("frozen static")  { "!render_template.action_view".freeze }
  x.report("not frozen")     { "#{action}.action_view" }
end
Calculating -------------------------------------
      frozen dynamic    88.653k i/100ms
       frozen static   138.259k i/100ms
          not frozen    92.168k i/100ms
-------------------------------------------------
      frozen dynamic      2.085M (±10.0%) i/s -     10.372M
       frozen static     10.060M (±12.5%) i/s -     49.358M
          not frozen      2.242M (± 9.3%) i/s -     11.152M

The freeze method does two things. If called on a string literal, as in "hello".freeze, it will only ever allocate one string. It also sets a flag on the string marking it as unmodifiable. If we create a dynamic string, "#{action}.action_view".freeze, it must allocate a new string: we are joining two strings that may never have been joined before, which creates a totally new string. Once that string is created, we flip the flag letting everyone know it cannot be modified. This actually takes longer than not calling freeze on the dynamic string, since we have to perform the extra operation. So yes, your statement is correct: "#{action}.action_view" will always create a new string.
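
The allocation difference can also be observed directly through object identity rather than timing. A minimal sketch, assuming Ruby 2.1 or later; dynamic_name and static_name are made-up helper names:

```ruby
def dynamic_name(action)
  "#{action}.action_view".freeze   # built at runtime, then flagged as frozen
end

def static_name
  "!render_template.action_view".freeze   # plain literal: deduplicated
end

a = dynamic_name("!render_template")
b = dynamic_name("!render_template")

puts a == b                           # same contents
puts a.equal?(b)                      # identity of the two interpolated results
puts static_name.equal?(static_name)  # identity of the two literal results
```

Both interpolated results are frozen and equal by content, but each call produces its own object, whereas the frozen literal is reused.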

@coderxin

@schneems nice work! 👏

@kennym
Contributor
kennym commented Aug 10, 2015

@schneems this is awesome :-) 👍

@dmitry
Contributor
dmitry commented Aug 10, 2015

Still a question: why not use constants instead of frozen strings? Because of the readability?

@matthewd
Member

@dmitry there is no reason to use constants. This is how you spell an immutable string literal in ruby; we want immutable string literals, so that's what we're doing.

(Also, @schneems has already pointed out that constants may be slower.)

@dmitry
Contributor
dmitry commented Aug 10, 2015

@matthewd thanks for pointing out.

Interestingly, on my computer (with Ruby 2.1.5 and 2.2.2) this benchmark produces almost the same results for constant and frozen strings. But the readability of freeze is better, except when it's repeated many times in the code and you would like to inspect the occurrences with an IDE or grep.

@garysweaver
Contributor

👍 👍 Nice work!

@heliocola heliocola commented on the diff Aug 24, 2015
...support/lib/active_support/inflector/transliterate.rb
@@ -75,13 +75,21 @@ def parameterize(string, sep = '-')
# Turn unwanted chars into the separator
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, sep)
unless sep.nil? || sep.empty?
- re_sep = Regexp.escape(sep)
+ if sep == "-".freeze
@heliocola
heliocola Aug 24, 2015

@schneems the "sep" variable in the definition of the parameterize function is not using .freeze when sep is not present in the call. Will this if catch or miss the case when somebody calls parameterize("string")?
If the comparison takes the string contents into consideration (and not the object id), this probably won't be a problem... Does it make sense?
I noticed you used the sep = "-".freeze in the parameterize function on file 'activesupport/lib/active_support/core_ext/string/inflections.rb'

@schneems
schneems Aug 24, 2015 Member

Not sure what is wrong. Are you concerned that "-".freeze does not == "-" ? It does

puts "-".freeze == "-"
true
puts "-" == "-".freeze
true
@heliocola
heliocola Aug 24, 2015

The comparison was the main thing I thought about.
If you call it without the second parameter it will have a different string "-" created but the comparison will work.
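
The distinction being discussed is between == (compares contents) and equal? (compares object identity). A small sketch, where String.new("-") stands in for the unfrozen default argument; the variable names are illustrative only:

```ruby
default_sep = String.new("-")   # an ordinary, unfrozen string
frozen_sep  = "-".freeze        # the frozen literal used in the comparison

puts default_sep == frozen_sep       # content comparison
puts default_sep.equal?(frozen_sep)  # identity comparison
```

Because the if in the diff uses ==, it matches on contents regardless of which object the caller passed in.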

@alejandrodevs

@schneems Nice work!

@lucascaton
Contributor

@schneems Nice one! ツ

@RKushnir

Why don't you use Hash#each_key if you only need keys?
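
For context, Hash#each yields key/value pairs while Hash#each_key yields only the keys, so a block that ignores the value can drop the unused parameter. A minimal sketch with a made-up options hash (not taken from the diff):

```ruby
options = { controller: "repos", action: "index" }

# each yields both key and value, even when the value is unused.
keys_via_each = []
options.each { |key, _value| keys_via_each << key }

# each_key yields only the key.
keys_via_each_key = []
options.each_key { |key| keys_via_each_key << key }

p keys_via_each
p keys_via_each_key
```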

@swapab
swapab commented Feb 16, 2016

Is this PR available in rails 3.2.22 or 3.2.22.1 ?

@dmitry
Contributor
dmitry commented Feb 16, 2016

@swapnilabnave no, it's only available in v5.0.0.beta2, v5.0.0.beta1.1 and v5.0.0.beta1. You can see that by checking the merge commit: 5373bf2

@swapab
swapab commented Feb 16, 2016

@dmitry Thanks!

@Chizuru-Maxienne

Keep up the good work! @schneems 👏 👏 👍 🍰

@tiagoovieira

🔝
