-
Notifications
You must be signed in to change notification settings - Fork 21.9k
Slow performance when rendering collections of nested partials #41452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@willcosgrove this is super interesting - I thought it was well-known that this behavior existed and that's why we render "with collections" instead of "in loops" but I can't seem to find an official citation anywhere! Here's a scout article though: https://scoutapm.com/blog/performance-impact-of-using-ruby-on-rails-view-partials |
@mdworken I agree. The surprising part was not that loops are slow, it's that you can apparently negate the optimization of rendering with collections if the partial you're rendering renders another partial. It may be that it's technically intractable to solve this problem. I was compelled to write up the issue though because tenderlove indicated over Twitter that it seemed like a bug. Again, it could have just been because I was explaining the issue to him poorly. But if it is truly a bug, I wanted to document it. |
I see that the app is using slim. Does this also happen if the app was only using ERBs? |
@rafaelfranca Thanks for chiming in! I added a PR swapping slim for erb - the same issue persists: |
I believe this may be the case (intractable to optimize). Consider the first example: <%= render partial: "x", collection: 100.times.to_a %> In this case, we know we'll be rendering Now, consider another example: <% 100.times do %>
<%= render partial: rand() > 0.5 ? "x" : "y" %>
<% end %> The above can't easily be optimized, because the argument passed to Even if we remove the randomness, the same challenge remains - even though we (humans) can easily see the first and second cases will produce the same output, a computer won't automatically assume the same argument will be passed to The same holds true for a nested partial - even when the outer partial is rendered using collection, the inner render call will still need to be performed once per iteration, same as if we used a for loop. This isn't to say there's no room for improvement (in terms of caching, etc.), but rather an effort to understand why we might expect nested |
@alipman88 That makes sense. Thanks for walking through that! It sounds like there's probably nothing that can be done to improve this specific problem. Our "fix" to this problem in our app has been to move away from partials towards helper methods that build markup through the use of the _fuzzy_option.html.slim- html_class ||= ""
a.fuzzy-select__item(
class=html_class
data-action="touchdown->fuzzy-select#selectItem mousedown->fuzzy-select#selectItem"
data-fuzzy-select-target="item"
data-fuzzy-search-target="item"
data-value=value
role="button")
- if block
= capture(&block) theme_helper.rbdef fuzzy_option(**opts, &block)
render "components/fuzzy_select/fuzzy_option", **opts.merge(block: block)
end We now put all of that markup directly in the helper method theme_helper.rbdef fuzzy_option(**opts, &block)
tag.a(class: "fuzzy-select__item #{opts[:html_class]}",
data: {
action: "touchdown->fuzzy-select#selectItem mousedown->fuzzy-select#selectItem",
fuzzy_select_target: "item",
fuzzy_search_target: "item",
value: opts[:value],
},
role: "button",
&block
)
end We would prefer to keep our markup out of our ruby files, but this seems to be the only way to sidestep the performance penalty of calling I don't suppose anyone knows if there is a way to bypass a majority of the overhead cost of And thank you to everyone who has spent time looking at this and thinking about it 🙏 |
@alipman88 totally agree! The same is true for calling any method with differing inputs. I think where we're wondering about optimization is: if we were to call a method directly, instead of I've updated my PR to include a scenario with calling a method directly instead of calling render. Here's the render speed difference: Flat Rendering:Nested Partial Rendering:Method Rendering:And yes, thank you to everyone who's spent time on this! 🙏 |
As a side - in looking at the screenshots - calling |
This issue has been automatically marked as stale because it has not been commented on for at least three months. |
🦄 |
Not exactly related, but one interesting point here is that instrumentation increases the overhead of the Benchmark the slow method with
Doing just a simple render, the difference between both cases is similar. Benchmark script here: https://github.com/lfalcao/benchmarks/blob/main/views/notifications.rb |
I added an example using View Component, just to see how it compares in general. Here's my PR with the additions: willcosgrove/partial-perf-repro#2 Something to note: My PR increases the nesting, to where there are three layers of partials/components being used. A Flat:60ms | ~57k allocationsNested Partials:3015ms | ~1.74M allocationsMethods, using Tag Helpers:1201ms | ~2.2M allocationsView Components:558ms | ~487k allocationsHonestly, I'm quite surprised at the performance of Again, this example is extreme to emphasize what's going on (10k items, 3 layers of nesting), but I think it really highlights some inherent performance constraints of using partials (or even tag helpers). |
Previously, we see many lines repeated of ``` Rendered canonical_transactions/_canonical_transaction.html.erb (Duration: 4.9ms | Allocations: 1405) Rendered canonical_transactions/_canonical_transaction.html.erb (Duration: 4.3ms | Allocations: 1416) Rendered canonical_transactions/_canonical_transaction.html.erb (Duration: 4.4ms | Allocations: 1416) ``` This is because the template gets reread from disk every loop iteration. Fortunately the partials API has a way to read it once and reuse it for the whole loop. After: ``` Rendered collection of canonical_pending_transactions/_canonical_pending_transaction.html.erb [0 times] (Duration: 0.0ms | Allocations: 34) Rendered collection of canonical_transactions/_canonical_transaction.html.erb [100 times] (Duration: 398.0ms | Allocations: 130029) ``` Local timing (absolute values aren't informative but relative are): Before: 827.9 ms in views `Completed 200 OK in 1409ms (Views: 827.9ms | ActiveRecord: 135.4ms | SQL Queries: 25 (1 cached) | Allocations: 605016)` After: 714.2 ms in views `Completed 200 OK in 1325ms (Views: 714.2ms | ActiveRecord: 138.8ms | SQL Queries: 25 (1 cached) | Allocations: 603067)` More reading: https://scoutapm.com/blog/performance-impact-of-using-ruby-on-rails-view-partials rails/rails#41452 https://www.justinweiss.com/articles/how-much-time-does-rendering-a-partial-really-take/
cc @jhawthorn. You might be interested into this. |
I don't have a ton of time to dig in right now, but a couple quick notes (I see a bunch of this is also discussed above). I tested against @thewatts branch. First is that the benchmarks from above are no longer from the latest Rails (they probably were when this issue was opened 😅) but a quick bundle update improves performance of the "nested" case about 40%, so progress has been made here. The next reason the "nested" case is slower is logging/instrumentation. In my benchmark script I added I think the next significant cost is the template lookup, which has had significant improvements in the past year (which is why the bundle update helped so much 😁). A significant difference from the "flat" and "nested" case is that the nested case requires each of the lookups to be separate (and makes 20k of them). These are cached, but what we're seeing is the cost to lookup the cached result. Benchmark script
(I think we shouldn't consider the "tag method" part of this benchmark, I believe the performance there is completely unrelated to partial rendering) |
Aha! Another issue (mentioned briefly above) is that the benchmark repo uses "implicit" prefixes in the templates: |
Based on @jhawthorn's comments, I wanted to pair down the benchmark. Observations:
Here's my reproduction repo: https://github.com/bensheldon/benchmark_render Benchmark Output (with logging)❯ RAILS_VERSION=6.0.5 ./benchmark_render.rb \
&& RAILS_VERSION=7.0.3 ./benchmark_render.rb \
&& ./benchmark_render.rb
Rails version: 6.0.5
Warming up --------------------------------------
/inline 313.000 i/100ms
/nested_loop 172.000 i/100ms
/nested_loop_path 174.000 i/100ms
/nested_collection 272.000 i/100ms
/vc_loop 298.000 i/100ms
Calculating -------------------------------------
/inline 3.254k (± 1.3%) i/s - 16.276k in 5.003258s
/nested_loop 1.737k (± 1.4%) i/s - 8.772k in 5.051732s
/nested_loop_path 1.727k (± 1.2%) i/s - 8.700k in 5.039853s
/nested_collection 2.754k (± 3.6%) i/s - 13.872k in 5.045295s
/vc_loop 3.029k (± 3.4%) i/s - 15.198k in 5.023531s
Comparison:
/inline: 3253.7 i/s
/vc_loop: 3029.4 i/s - 1.07x (± 0.00) slower
/nested_collection: 2753.9 i/s - 1.18x (± 0.00) slower
/nested_loop: 1736.8 i/s - 1.87x (± 0.00) slower
/nested_loop_path: 1726.5 i/s - 1.88x (± 0.00) slower
Rails version: 7.0.3
Warming up --------------------------------------
/inline 308.000 i/100ms
/nested_loop 173.000 i/100ms
/nested_loop_path 174.000 i/100ms
/nested_collection 268.000 i/100ms
/vc_loop 281.000 i/100ms
Calculating -------------------------------------
/inline 3.153k (± 2.5%) i/s - 16.016k in 5.083613s
/nested_loop 1.732k (± 1.4%) i/s - 8.823k in 5.094727s
/nested_loop_path 1.748k (± 2.5%) i/s - 8.874k in 5.081193s
/nested_collection 2.686k (± 2.7%) i/s - 13.668k in 5.091552s
/vc_loop 2.858k (± 1.7%) i/s - 14.331k in 5.016606s
Comparison:
/inline: 3152.5 i/s
/vc_loop: 2857.5 i/s - 1.10x (± 0.00) slower
/nested_collection: 2686.5 i/s - 1.17x (± 0.00) slower
/nested_loop_path: 1747.5 i/s - 1.80x (± 0.00) slower
/nested_loop: 1732.2 i/s - 1.82x (± 0.00) slower
Rails version: 7.1.0.alpha
Warming up --------------------------------------
/inline 307.000 i/100ms
/nested_loop 181.000 i/100ms
/nested_loop_path 180.000 i/100ms
/nested_collection 258.000 i/100ms
/vc_loop 280.000 i/100ms
Calculating -------------------------------------
/inline 3.062k (± 3.4%) i/s - 15.350k in 5.020859s
/nested_loop 1.784k (± 4.1%) i/s - 9.050k in 5.082636s
/nested_loop_path 1.736k (± 2.0%) i/s - 8.820k in 5.083707s
/nested_collection 2.647k (± 1.9%) i/s - 13.416k in 5.069750s
/vc_loop 2.829k (± 2.5%) i/s - 14.280k in 5.051345s
Comparison:
/inline: 3061.6 i/s
/vc_loop: 2828.8 i/s - 1.08x (± 0.00) slower
/nested_collection: 2647.2 i/s - 1.16x (± 0.00) slower
/nested_loop: 1784.0 i/s - 1.72x (± 0.00) slower
/nested_loop_path: 1735.7 i/s - 1.76x (± 0.00) slower Benchmark Output (without logging)❯ WITHOUT_LOGGER=1 RAILS_VERSION=6.0.5 ./benchmark_render.rb \
&& WITHOUT_LOGGER=1 RAILS_VERSION=7.0.3 ./benchmark_render.rb \
&& WITHOUT_LOGGER=1 ./benchmark_render.rb
Rails version: 6.0.5
Warming up --------------------------------------
/inline 353.000 i/100ms
/nested_loop 182.000 i/100ms
/nested_loop_path 275.000 i/100ms
/nested_collection 325.000 i/100ms
/vc_loop 326.000 i/100ms
Calculating -------------------------------------
/inline 3.518k (± 6.0%) i/s - 17.650k in 5.038094s
/nested_loop 2.770k (± 1.8%) i/s - 14.014k in 5.060657s
/nested_loop_path 2.715k (± 4.3%) i/s - 13.750k in 5.074845s
/nested_collection 3.218k (± 4.3%) i/s - 16.250k in 5.061147s
/vc_loop 3.310k (± 2.0%) i/s - 16.626k in 5.024399s
Comparison:
/inline: 3517.7 i/s
/vc_loop: 3310.4 i/s - same-ish: difference falls within error
/nested_collection: 3217.8 i/s - same-ish: difference falls within error
/nested_loop: 2770.1 i/s - 1.27x (± 0.00) slower
/nested_loop_path: 2715.4 i/s - 1.30x (± 0.00) slower
Rails version: 7.0.3
Warming up --------------------------------------
/inline 345.000 i/100ms
/nested_loop 279.000 i/100ms
/nested_loop_path 272.000 i/100ms
/nested_collection 304.000 i/100ms
/vc_loop 310.000 i/100ms
Calculating -------------------------------------
/inline 3.452k (± 1.5%) i/s - 17.595k in 5.098560s
/nested_loop 2.708k (± 3.5%) i/s - 13.671k in 5.054214s
/nested_loop_path 2.652k (± 5.2%) i/s - 13.328k in 5.042542s
/nested_collection 2.992k (± 3.9%) i/s - 15.200k in 5.089442s
/vc_loop 3.082k (± 7.6%) i/s - 15.500k in 5.076108s
Comparison:
/inline: 3451.8 i/s
/vc_loop: 3082.2 i/s - 1.12x (± 0.00) slower
/nested_collection: 2991.5 i/s - 1.15x (± 0.00) slower
/nested_loop: 2708.5 i/s - 1.27x (± 0.00) slower
/nested_loop_path: 2651.9 i/s - 1.30x (± 0.00) slower
Rails version: 7.1.0.alpha
Warming up --------------------------------------
/inline 296.000 i/100ms
/nested_loop 247.000 i/100ms
/nested_loop_path 248.000 i/100ms
/nested_collection 296.000 i/100ms
/vc_loop 304.000 i/100ms
Calculating -------------------------------------
/inline 3.281k (± 3.9%) i/s - 16.576k in 5.060233s
/nested_loop 2.476k (± 5.1%) i/s - 12.350k in 5.005304s
/nested_loop_path 2.500k (± 1.9%) i/s - 12.648k in 5.061520s
/nested_collection 2.970k (± 2.1%) i/s - 15.096k in 5.085463s
/vc_loop 3.051k (± 2.1%) i/s - 15.504k in 5.083644s
Comparison:
/inline: 3281.4 i/s
/vc_loop: 3051.1 i/s - 1.08x (± 0.00) slower
/nested_collection: 2969.8 i/s - 1.10x (± 0.00) slower
/nested_loop_path: 2499.8 i/s - 1.31x (± 0.00) slower
/nested_loop: 2475.6 i/s - 1.33x (± 0.00) slower |
…ssue. this is because the whole render context is given to the partial. this view is using this extensively and thousands of times. just inlining the object should improve the performance. At the same time a lot of code can be de-duplicated. I kept a fallback in case of new objects. rails/rails#41452
Steps to reproduce
_a.html.erb
and_b.html.erb
)_a.html.erb
should contain a render call for partial_b.html.erb
_a.html.erb
in a collectionExpected behavior
Ideally, the time to render should be on par with the aggregate time to render both
_a
and_b
if_b
was not nested inside of_a
.Actual behavior
I see about a 10x difference in render time between a nested example and a flat example. I've done some preliminary digging with stackprof. Keep in mind that I am not well versed in the internals of ActionView rendering so my understanding of what is not working may not line up with reality. It seems like whatever optimization is done that affords such a significant speed difference between these two cases:
Fast
Slow
...is not able to be applied to the partial that is nested inside of the outer partial. Based on my stackprof digging it looks like some of that optimization is happening in the conversion of the partial string (
"a"
) into an actual path to a view file. In the fast case above, that conversion of"a"
into"#{Rails.root}/app/views/application/_a.html.erb"
happens one time, and in the slow case it happens 100 times. The same is true of the nested partials situation: the outer partial gets its view path calculated/looked-up one time, but the inner render call for the nested partial_b
gets looked-up on each iteration.I had some back and forth on Twitter with @tenderlove about this, and his intuition from the best description I could muster in 270 characters was that this was a bug and shouldn't require a performance hit. But it is entirely possible I wasn't explaining it well enough to him over the limited bandwidth communication channel of Twitter 😄
Tweet
https://twitter.com/tenderlove/status/1351223079805083648
Reproduction Repo
https://github.com/willcosgrove/partial-perf-repro
Server log images
Flat case (fast):

Nested case (slow):


I should also clarify about the logs: The default view logging in development definitely adds some additional overhead to the slow case, because it logs a line of output for each iteration. But even with view logging turned off completely, there is still a significant performance difference between the two cases, so I don't believe the difference falls entirely at the feet of ActionView logging instrumentation.
Flamegraph:
Fast:

Slow:

System configuration
Rails version:
Seems to be any version, but I tested specifically on 6.1.2.1 and
main
.Ruby version:
Also seems to be any version but I tested on both Ruby 2.7 and Ruby 3.0.
CC: @tenderlove
The text was updated successfully, but these errors were encountered: