Faster permitted_scalar_filter #33758
Conversation
Force-pushed from 7056b9f to 1d62cd9
Force-pushed from 1d62cd9 to c423f0b
Updated along with new numbers. It's pretty close to the original implementation. A few bytes more, but not much.
@schneems Thanks for taking the time to investigate all these optimizations. I'd like to see us contextualize the microbenchmarks with something higher level. Even if it's literally just the dreaded Hello World that runs through an entire Rails request with template rendering and such. That would give us some sense of scale. Because these optimizations aren't free; code that's more verbose/less idiomatic/uses multiple exits/whatever should pay for its cost with real, measurable benefits for actual applications. Thanks again for looking into this! I'm sure there are optimizations that sit in hot paths where the trade-off of worse code is worth the gain 👍
In fact, I'd go as far as to say it'd be really awesome if we could get a general test app running that we can run any kind of change against. Not just these optimizations, but things we fear might be performance regressions. And a way to measure major versions against each other for improvements.
The memory savings are 0.013919 MB, and on CodeTriage it takes about 0.95 MB of memory for a request. So, in this case, it's a 1.4% reduction in memory per request. I couldn't get that to show up on a benchmark by itself in any kind of statistically significant way via benchmark.ips or benchmark.bmbm. This location was bad enough to show up on derailed's top offenders for memory allocation before the change.

Before

After

Measured with

On CodeTriage.
@dhh maybe we could re-use some of @noahgibbs' work with rails ruby bench.
I'd be happy to run some before/after tests. That would be full end-to-end, though, so anything that doesn't save at least around 2% on the total Rails runtime for Discourse (everything: database, Redis, etc) is going to be hard to measure.
```ruby
        params[k] = self[k]
      end
      permitted_string_key = permitted_key.to_s
      keys.each do |key|
```
You could use `each_key` here to avoid allocating an intermediate array (#17099).
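As a plain-Ruby sketch of the suggestion (not the actual Rails code): `keys.each` first materializes an Array holding every key, while `each_key` yields each key directly without that intermediate allocation.

```ruby
h = { "a" => 1, "b" => 2, "c" => 3 }

# keys.each allocates an intermediate Array of all the keys...
via_keys = []
h.keys.each { |k| via_keys << k }

# ...while each_key yields keys straight from the hash, no Array needed.
via_each_key = []
h.each_key { |k| via_each_key << k }

via_keys == via_each_key  # same keys, same order, one fewer allocation
```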
Nice.
```ruby
        if permitted_scalar?(self[k])
          params[k] = self[k]
        end
        permitted_string_key = permitted_key.to_s
```
We might not need this value if the block returns on the first line, so perhaps we should inline the `to_s` call.
Oh, but then we might end up calling it multiple times. Never mind 🙄
You actually can amortize it, though, by moving this line up higher: `has_key?` will also need to stringify its parameter.
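A minimal sketch of the amortization being discussed, using hypothetical names (`filter_scalar` is not the real Rails method): hoist the `to_s` call above the `has_key?` check so the stringified key is computed once and reused by both the lookup and the read.

```ruby
# Hypothetical illustration: stringify the permitted key once, up front,
# then reuse that string for both the existence check and the read.
def filter_scalar(params, permitted_key)
  permitted_string_key = permitted_key.to_s  # computed exactly once
  return nil unless params.has_key?(permitted_string_key)
  params[permitted_string_key]
end

request_params = { "name" => "Ruby" }
filter_scalar(request_params, :name)   # => "Ruby"
filter_scalar(request_params, :other)  # => nil
```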
@noahgibbs Would be nice to be able to quickly call in the Performance Budget Accessor on any PR and get a verdict. That would be a dream 😄
In general I'm happy to. Feel free to call me in where you're curious. My specific benchmark is a giant, concurrent end-to-end test based on Discourse - so it's only good for finding larger deltas in runtime and it can be finicky to set up. But when it works, it's a pretty realistic assessment. I'm running it against this change right now, but it takes at least a couple of hours.
@noahgibbs Awesome. Thanks! I wonder if there's a way we can think about having multiple layers of this. There's the microbenchmark, then there's the whole shebang that takes hours to run. Probably something good in the middle too.
Pretty certainly. I'm working on an intermediate benchmark now - still a Rails app, but one with a lot fewer dependencies and less non-Ruby time. With luck, that will be a good "something in the middle." For microbenchmarks, you'd probably want an approach kind of like what the Ruby repo does internally - a lot of tiny benchmarks that just hit a few methods each. ActiveSupport would be really easy to do this with and I expect it would work really well. It's also pretty similar to what @schneems did when writing this, just checked into the tree to be rerunnable.
It doesn't look like `try` is much of the end-to-end performance. Specifically, with half a million HTTP requests for each version, I see the version with @schneems' change running 0.08% faster for the median request (like, less than a tenth of one percent). But my variance on that measurement is around 3.2 reqs/second (std dev around 1.79), on a difference of about 0.14 requests/second. Basically it's far too small a measurement to be meaningful at this test size and variance.
Force-pushed from c423f0b to 832614f
I adopted the new suggestions. Thanks! I had to add a delegation.

In the interest of seeing any difference, I decided to benchmark this on a "hello world" app. First I made a dev app that points to my local rails codebase:
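The Gemfile snippet was elided above; pointing a dev app at a local Rails checkout is typically done with a `path:` source, roughly like this (the path is a placeholder, not the author's actual path):

```ruby
# Gemfile of the benchmark app: use the local Rails checkout instead of
# the released gem (the path below is a placeholder).
gem "rails", path: "../rails"
```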
Then I added a controller action that hits this codepath:
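The controller code was also elided; a hypothetical action that exercises `permitted_scalar_filter` just needs to call `permit` with several scalar keys, along these lines (the controller and key names here are made up):

```ruby
# Hypothetical benchmark controller: each name passed to permit is
# filtered through permitted_scalar_filter in strong parameters.
class BenchmarksController < ApplicationController
  def show
    permitted = params.permit(:a, :b, :c, :d, :e, :f, :g, :h)
    render plain: permitted.to_h.inspect
  end
end
```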
I then hit the app using derailed benchmarks with and without this patch.

memory profiling "hello world" app

For memory I ran:
Without patch:
With patch:
Difference:
Raw speed "hello world" app

For raw speed I hit this endpoint 10,000 times:
With patch
Without patch
Difference:
Conclusions

From everything I've seen in profiling, the % of memory allocation decrease has roughly a 1:1 correlation with execution speed. Profiling memory allocation is also way easier and much more consistent, with less variability. How much faster this makes an app entirely depends on the app. For CodeTriage it's about a 1% bump, for Discourse 0.8% (ish), and for an app that practically does nothing, a 20% bump, provided that you're `permit`-ing lots of keys. Mostly it's a question of how much memory the app in question is allocating. I can pretty confidently say that this code is faster. I cannot with any certainty say exactly what the impact is.
Force-pushed from 832614f to 11a2eb9
When running with code triage and derailed benchmarks and focusing on this file:

Before

16199 /Users/rschneeman/Documents/projects/rails/actionpack/lib/action_controller/metal/strong_parameters.rb

After

2280 /Users/rschneeman/Documents/projects/rails/actionpack/lib/action_controller/metal/strong_parameters.rb
Force-pushed from 11a2eb9 to daa3565
When running with code triage and derailed benchmarks and focusing on this file:
Before
After
In a "hello world" app I'm seeing a roughly 20% speed increase.