-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beyond Ludicrous Speed #21057
Beyond Ludicrous Speed #21057
Conversation
If you want just to run tests you can always push to rails/rails since you have commit access. |
fb782aa
to
6776ce7
Compare
When a symbol is passed in, we call `to_s` on it which allocates a string. The two hardcoded symbols that are used internally are `:to_partial_path` and `:to_model`. This change buys us 71,136 bytes of memory and 1,777 fewer objects per request.
In `apply_inflections` a string is down cased and some whitespace stripped in the front (which allocate strings). This would normally be fine, however `uncountables` is a fairly small array (10 elements out of the box) and this method gets called a TON. Instead we can keep an array of valid regexes for each uncountable so we don't have to allocate new strings. This change buys us 325,106 bytes of memory and 3,251 fewer objects per request.
The request.script_name is dup-d which allocates an extra string. It is most commonly an empty string "". We can save a ton of string allocations by checking first if the string is empty, if so we can use a frozen empty string instead of duplicating an empty string. This change buys us 35,714 bytes of memory and 893 fewer objects per request.
Micro optimization: `reverse.drop_while` is slower than `reverse_each.drop_while`. This doesn't save any object allocations. Second, `keys_to_keep` is typically a very small array. The operation `parameterized_parts.keys - keys_to_keep` actually allocates two arrays. It is quicker (I benchmarked) to iterate over each and check inclusion in array manually. This change buys us 1774 fewer objects per request
Most routes have a `route.path.requirements[key]` of `/[-_.a-zA-Z0-9]+\/[-_.a-zA-Z0-9]+/` yet every time this method is called a new regex is generated on the fly with `/\A#{DEFAULT_INPUT}\Z/`. OBJECT ALLOCATIONS BLERG! This change uses a special module that implements `===` so it can be used in a case statement to pull out the default input. When this happens, we use a pre-generated regex. This change buys us 1,643,465 bytes of memory and 7,990 fewer objects per request.
In handle_positional_args `Array#-=` is used which allocates a new array. Instead we can iterate through and delete elements, modifying the array in place. Also `Array#take` allocates a new array. We can build the same by iterating over the other element. This change buys us 106,470 bytes of memory and 2,663 fewer objects per request.
When generating a url with `url_for` the hash of arguments passed in, is dup-d and merged a TON. I wish I could clean this up better, and might be able to do it in the future. This change removes one dup, since it's literally right after we just dup-d the hash to pass into this constructor. This may be a breaking, change but the tests pass...so we can revert if it causes problems This change buys us 205,933 bytes of memory and 887 fewer objects per request.
When an unknonwn key is passed to the hash in `PRE_CONTENT_STRINGS` it returns nil, when you call "#{nil}" it allocates a new empty string. We can get around this allocation by using a default value `Hash.new { "".freeze }`. We can avoid the `to_sym` call by pre-populating the hash with a symbol key in addition to a string key. We can freeze some strings when using Array#* to reduce allocations. Array#join can take frozen strings. This change buys us 86,600 bytes of memory and 1,857 fewer objects per request.
No idea why on earth this hash key isn't already optimized by MRI, but it isn't. 💩 This change buys us 74,077 bytes of memory and 1,852 fewer objects per request.
The instrument method creates new strings, the most common action to instrument is "!render_template` so we can detect when that action is occurring and use a frozen string instead. This change buys us 113,714 bytes of memory and 1,790 fewer objects per request.
content_tag's first argument is will generate a string with an html tag so `:a` will generate: `<a></a>`. When this happens, the symbol is implicitly `to_s`-d so a new string is allocated. We can get around that by using a frozen string instead which This change buys us 74,236 bytes of memory and 1,855 fewer objects per request.
6776ce7
to
7fc36cb
Compare
Build is green, for some reason it's not showing up on the PR https://travis-ci.org/rails/rails/builds/73305474 |
Nice work @schneems ! |
🏇 |
Squash! =) |
Really nice work! |
❤️ ❤️ ❤️ ❤️ ❤️ |
@@ -33,6 +33,7 @@ def generate(name, options, path_parameters, parameterize = nil) | |||
defaults = route.defaults | |||
required_parts = route.required_parts | |||
parameterized_parts.delete_if do |key, value| | |||
next if defaults[key].nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining control flow (next
as early-return) and a boolean expression feels a little awkward. How about folding the not-nil check into the expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It becomes a bit long for my tastes, also requires a bang expression, which I try to avoid:
!defaults[key].nil? && value.to_s == defaults[key].to_s && !required_parts.include?(key)
You make the call. Which do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really considered its merits (or checked it's right 😉), but here's a less negative spelling of that:
parameterized_parts.keep_if do |key, value|
defaults[key].nil? ||
value.to_s != defaults[key].to_s ||
required_parts.include?(key)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could flip to keep_if
to simplify the expression a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! action patch tests pass with that for me. seems good, somehow removing the bangs makes it okay to be on one line (for me). I pushed this change. Will wait on the whole suite to run again to be sure.
@schneems well done 👏 |
@@ -33,6 +33,7 @@ def generate(name, options, path_parameters, parameterize = nil) | |||
defaults = route.defaults | |||
required_parts = route.required_parts | |||
parameterized_parts.delete_if do |key, value| | |||
next if defaults[key].nil? | |||
value.to_s == defaults[key].to_s && !required_parts.include?(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is value
ever nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked via actionpack
tests and with my own app, value is never nil.
7fc36cb
to
61dae88
Compare
When `defaults[key]` in `generate` in the journey formatter is called, it often returns a `nil` when we call `to_s` on a nil, it allocates an empty string. We can skip this check when the default value is nil. This change buys us 35,431 bytes of memory and 887 fewer objects per request. Thanks to @matthewd for help with the readability
👏 👏 👏 |
Another reason to look forward to Rails 5.0! |
Thanks for this - great work that will benefit everyone using Rails! ❤️ |
Amazing work. Congrats to you and everyone else and I owe you a 🍺 sometime. :) |
end | ||
|
||
# Move 'index' action from options to recall | ||
def normalize_action! | ||
if @options[:action] == 'index' | ||
if @options[:action] == 'index'.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's a stupid question. But why do you freeze strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explains the freeze method for strings
http://tmm1.net/ruby21-fstrings/
This article explains why you might want I to use frozen strings for performance with some benchmarks http://www.sitepoint.com/unraveling-string-key-performance-ruby-2-2/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to have a constant defined in a class/module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out https://github.com/rails/rails/pull/21057/files#r36532115 for why we're not doing that
Excellent work @schneems! 🎉 🍻 |
@schneems thanks for making Rails so much better, one more time! 👏 👏 👏 |
@@ -164,7 +164,7 @@ def deconstantize | |||
# | |||
# <%= link_to(@person.name, person_path) %> | |||
# # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a> | |||
def parameterize(sep = '-') | |||
def parameterize(sep = '-'.freeze) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's better to have a single place that has all the frozen strings? Sort of like boot.rb
, but that will have the calls to freeze
for these tokens (-
here, ::
in the next file etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’d still have to call freeze
when you want to use the frozen string, so there is no point in freezing them up front.
>> '-'.freeze.object_id
=> 70222534974420
>> '-'.object_id
=> 70222535341760
>> '-'.freeze.object_id
=> 70222534974420
>> '-'.object_id
=> 70222535533680
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, TIL, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bquorning That said, I remember a way I did that in a different project. That was by using constants in cases like these. For example, def parameterize(sep = DASH_TOKEN)
. That way, the object_id
s will be same no matter where this token gets used, IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, if you assign the frozen strings to constants up front, you can re-use the same instance all over the place. Plus, they won't be GC'ed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To quote @schneems from rack/rack#737:
“While we could certainly go overboard and pre-define ALL strings as constants, that would be pretty gnarly to work with. This patch goes after the largest of the low hanging fruit.”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. More cleaner, IMO. That was what I wanted to convey when I first posted the comment. Evidently, I wasn't clear. Sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me clarify my first comment again:
I meant that it might be nice to have these tokens like -
, ::
, ''
etc to, say, DASH
, DOUBLE_COLON
, EMPTY_STRING
etc defined in a separate file that's loaded upfront and those constants used everywhere instead of multiple "-".freeze
calls all over the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying frozen strings as constants is slower. Constants work internally as a global hash, every time you call a constant Ruby has to do a hash lookup. If you use String#freeze
inline, you don't have to do the lookup and your code is faster. Also ''.freeze
is shorter than EMPTY_STRING
.
require 'benchmark/ips'
HELLO = "hello".freeze
Benchmark.ips do |x|
x.report("freeze") { "hello".freeze + "world" }
x.report("constant") { HELLO + "world" }
end
Calculating -------------------------------------
freeze 100.611k i/100ms
constant 99.036k i/100ms
-------------------------------------------------
freeze 3.630M (± 7.9%) i/s - 18.110M
constant 3.470M (±11.6%) i/s - 17.034M
Nice work @schneems 🍻 |
@schneems bravo 👏 |
👍 👍 👍 Thanks a lot for this |
Nice work @schneems |
@schneems nice work! 👏 |
@schneems this is awesome :-) 👍 |
Still a question, why not to use constants, instead of |
@dmitry there is no reason to use constants. This is how you spell an immutable string literal in ruby; we want immutable string literals, so that's what we're doing. (Also, @schneems has already pointed out that constants may be slower.) |
@matthewd thanks for pointing out. Interestingly on my computer (with 2.1.5 and 2.2.2 rubies) this benchmark produces almost the same results for constant/freeze strings. But readability of |
👍 👍 Nice work! |
@@ -75,13 +75,21 @@ def parameterize(string, sep = '-') | |||
# Turn unwanted chars into the separator | |||
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, sep) | |||
unless sep.nil? || sep.empty? | |||
re_sep = Regexp.escape(sep) | |||
if sep == "-".freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schneems the "sep" variable in the definition of the parameterize function is not using the .freeze when sep is not present in the call. Will this if catch or miss the case when somebody call parameterize("string")?
If the comparison takes in consideration the string (and not the object id), this probably won't be a problem... Does it make sense?
I noticed you used the sep = "-".freeze in the parameterize function on file 'activesupport/lib/active_support/core_ext/string/inflections.rb'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is wrong. Are you concerned that "-".freeze
does not == "-"
? It does
puts "-".freeze == "-"
true
puts "-" == "-".freeze
true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison was the main thing I thought about.
If you call it without the second parameter it will have a different string "-" created but the comparison will work.
@schneems Nice work! |
@schneems Nice one! ツ |
Is this PR available in rails 3.2.22 or 3.2.22.1 ? |
@swapnilabnave no, it's only available for v5.0.0.beta2 v5.0.0.beta1.1 v5.0.0.beta1. You can see that by checking merge commit: 5373bf2 |
@dmitry Thanks! |
Keep up the good work! @schneems 👏 👏 👍 🍰 |
🔝 |
I've been working on quite a few performance improvements to Rails and the Rails ecosystem, but this is by far the single biggest set of improvements I've been able to make. Those other changes have maybe shaved off a few objects here and there, and maybe 1 or 2% faster request time if I'm extremely lucky.
This change shaves off 34,299 objects and 3,457,318 bytes (3.29 MiB) allocated on every request. This is a 29% decrease in objects and a 23% decrease in memory allocated per request. Taking us past ludicrous speed...
So just how much does this improve overall request time? To measure I used http://www.codetriage.com as an example app and https://github.com/schneems/derailed_benchmarks to generate load. I hit the app with 1,000 requests with my patch and then 1,000 requests against Rails master. I repeated several times until I saw a fairly stable standard deviation.
The average time for 1,000 request against master was 221.88 seconds and against this patch was 195.401 seconds. This gives us a...
11.9 % speed improvement
Where does that speed improvement come from? The big theme is not allocating objects when we don't need to. I used
$ derailed exec perf:objects
which uses memory_profiler to look for large pockets of allocated memory against http://www.codetriage.com. You can get an in-depth explanation of each change in the commit along with a benchmark of objects and memory saved. The largest area I was able to make gains on was link generation, by removing extraneous arrays, duplicated hashes, and getting rid of intermediate objects whenever possible. I measured this speedup directly:This showed 2,865 iterations per second on master and 4,143 iterations per second on my patch. Which is a 44% speed improvement in url generation.
These performance improvements preserve existing interfaces and behaviors, all tests pass.