-
Notifications
You must be signed in to change notification settings - Fork 21.9k
Introduce Journey::Ast to avoid extra ast walks #39935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Journey::Ast to avoid extra ast walks #39935
Conversation
aa21dea
to
ccabbc0
Compare
My initial review:
I'll take a closer look at the implementation soon. |
I pointed my app to my local Rails and added this to my routes file: Rails.application.routes.draw do
+ before_object_count = GC.stat[:total_allocated_objects]
+ time = Benchmark.realtime do
#... all the route definitions
+ end
+ puts GC.stat[:total_allocated_objects] - before_object_count
+ puts time
end Then ran For the timing I ran this not terribly elegant script, passing the refs I wanted to test out as arguments. You might want to change 500 to something more reasonable - I ran this and then went to bed. def report(filename, ljust)
times = File.read("journey_benchmark/#{filename}").lines.map(&:chomp).map(&:to_f)
median = times.inject(&:+) / times.length
dev = Math.sqrt((times.map { |time| (time - median)**2 }).inject(&:+) / (times.length - 1))
puts "#{filename.ljust(ljust)} - #{median.round(3)} ± #{dev.round(3)} (#{times.length} runs)"
end
def run_benchmark(ref)
puts "Running benchmark for '#{ref}'"
Dir.chdir("path/to/local/rails") do
`git checkout #{ref}`
end
500.times do
`bin/rails runner true | tail -1 >> "journey_benchmark/#{ref}"`
end
end
Dir.chdir("path/to/application") do
ARGV.each do |ref|
run_benchmark(ref)
end
ljust = ARGV.map(&:length).max
ARGV.each do |ref|
report(ref, ljust)
end
end I'm sure there is a better way to do this... |
Great question. We should try to measure the retained memory. We might not need the ast objects anymore after building the simulator in lib/action_dispatch/journey/routes.rb, but we may have to do something to make clearing them out work with |
ccabbc0
to
5c8c836
Compare
I put this in def eager_load!
router.eager_load!
routes.each(&:eager_load!)
require 'objspace'
GC.start
puts ObjectSpace.each_object(Object).count
nil
end master - 1141852
this commit - 1159326 Is this a reasonable way to measure what was retained? Update: if I wipe out the references from within |
Once possibility is to clear out the AST objects at the end of We found two tests that involve creating a simulator and then adding a new route after, but I don't know how realistic these tests are. These two test also don't have anything to do with eager loading.
Update: After some discussion with Rafael França and Matthew Draper I think we should preserve the ability to add more routes after eager loading. So instead we can clear out all the cached values inside the ast objects after eager loading, then recalculate those values again if necessary. |
Somewhat yes. But I'd suggest using http://github.com/SamSaffron/memory_profiler for a breakdown of what was retained etc. |
Profiling this branch against our application full boot sequence. Baseline is Most stackprof results will be CPU profiles unless specified otherwise. First I do see the nice
If I look at
On the memory front, I see a slight increase. The total retention post boot is up by 2.7MB for us (to take with a grain of salt as there are some variance. We're a bit above what I consider to be the margin of error, but not by that much). On the retentions attributed to journey itself (NB: again these are not perfectly precise, a retention can very easily attributed to the wrong source): The total allocations during boot is up by about
So the speedup does indeed seem visible on our app, and the memory increase if any isn't a show stopper.
Let me know if you'd like me to memory profile that. |
cc13db6
to
45f657e
Compare
@casperisfine thanks so much for running those profiles. @emilford and I added a commit that will hopefully cut down on the retained memory, and would be most grateful if you could run the profile again with the new commit. |
This is what I am seeing: Before clearing out ast objects in 45f657e
After clearing out ast objects in 45f657e
Baseline
We could probably do a little better in path/pattern, except that we have to hold onto the ast root node to make the new |
Total memory retention is now down by about
So that looks very good from my end. |
Tests added and all the numbers seem to be looking pretty good. This is ready for another review when somebody has a chance. |
if node.terminal? | ||
terminals << node | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a lot for initialize
- what do you think about pulling the tree
iteration out into a private
method and then calling that from initialize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense to me. Thanks!
7843c50
to
f984804
Compare
f984804
to
56e3e85
Compare
@tenderlove I would be interested in getting your thoughts on this PR at some point as well (no rush). The tl;dr for the benchmarks is that this seems to be a positive change for shopify and github/github. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
@eileencodes @jhawthorn this is something @emilford and I had worked on to improve boot time for GitHub. If this is something you are still interested in I'd be happy to revive this PR, otherwise I'll go ahead and close it. |
@composerinteralia I'll be happy to merge it. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
@composerinteralia do you want to pick this up again? There were some changes to journey so it'll need a rebase and possible a new benchmark/profile. |
Oh right, I forgot about this. I'd planned on working on it a few months ago, but got distracted by an unexpected job search. Yeah, I should be able to get to it in the next couple weeks. |
2242d7c
to
bf86dac
Compare
Rebased and reworked a little to take #41024 into consideration. I also wrote a cleaner benchmark to show that drawing routes is a bit faster with this PR: Before:
After:
Benchmark# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "rails", path: "/Users/composerinteralia/code/rails/rails"
gem "benchmark-ips"
gem "stackprof"
gem "memory_profiler"
end
require "action_controller"
class FooController < ActionController::Base; end
result = Benchmark.ips do |x|
x.report("drawing a simple route") {
rs_simple = ActionDispatch::Routing::RouteSet.new
rs_simple.draw do
get "/path", to: "controller#action"
end
rs_simple.eager_load!
}
x.report("drawing a complex route") {
rs_simple = ActionDispatch::Routing::RouteSet.new
rs_simple.draw do
get "/path/(optional)/:symbol/with-:constraint/*glob", constraints: { constraint: /[abc]+/}, to: "controller#action"
end
rs_simple.eager_load!
}
end |
This CI failures don't look obviously related. Are there any known problems at the moment, or do I need to do some digging? |
Most of the failures are ruby-head. We just passed midnight UTC so we got a new version, some upstream change must have broken the suite, I'll likely have a look. And yeah the one Ruby 3.0 postgres failure is clearly unrelated. |
Ah most of these I already fixed with bbaf40b, a few remaining ones in Active Record will need some extra work. |
bf86dac
to
2369b1e
Compare
So I tested this against our app again. Not sure if intended, and we can probably update our code, but it does break one test:
As for boot performance, As for memory, So 👍 from my end, but maybe have a look at that Other than that, I'm still happy to merge, you'll just have to squash your commits into one. |
Thank you for doing that! Are you able to look into that NoMethodError a bit more, or provide more of the stack trace? Seems like there is a Journey::Path::Pattern getting initialized with a CAT node wrapping the AST, instead of the AST directly. I'm not sure how that would happen. |
We have some monkey patch in the routing code that is creating the def concat_paths(prefix, suffix)
joined_ast = ActionDispatch::Journey::Nodes::Cat.new(prefix.ast, suffix.ast)
prefix_sep = prefix.instance_variable_get(:@separators).split
suffix_sep = suffix.instance_variable_get(:@separators).split
build_pattern(
joined_ast,
prefix.requirements.merge(suffix.requirements),
(prefix_sep + suffix_sep).uniq,
prefix.anchored,
)
end We should just fix that. |
This commit introduces a new `Journey::Ast` class that wraps the root node of an ast. The purpose of this class is to reduce the number of walks through the ast by taking a single pass through each node and caching values for later use. To avoid retaining additional memory, we clear out these ast objects after eager loading. Benefits --- * Performance improvements (see benchmarks below) * Keeps various ast manipulations together in a single class, rather than scattered throughout * Adding some names to things will hopefully make this code a little easier to follow for future readers Benchmarks --- We benchmarked loading a real routes file with > 3500 routes. master was at a9336a6 when we ran these. Note that these benchmarks also include a few small changes that we didn't include in this commit, but that we will follow up with after this gets merged - these additional changes do not change the benchmarks significantly. Time: ``` master - 0.798 ± 0.024 (500 runs) this branch - 0.695 ± 0.029 (500 runs) ``` Allocations: ``` master - 980149 total allocated objects this branch - 931357 total allocated objects ``` Stackprof: Seeing `ActionDispatch::Journey::Visitors::Each#visit` more frequently on the stack is what led us down this path in the first place. These changes seem to have done the trick. ``` master: TOTAL (pct) SAMPLES (pct) FRAME 52 (0.5%) 52 (0.5%) ActionDispatch::Journey::Nodes::Node#symbol? 58 (0.5%) 45 (0.4%) ActionDispatch::Journey::Scanner#scan 45 (0.4%) 45 (0.4%) ActionDispatch::Journey::Nodes::Cat#type 43 (0.4%) 43 (0.4%) ActionDispatch::Journey::Visitors::FunctionalVisitor#terminal 303 (2.7%) 43 (0.4%) ActionDispatch::Journey::Visitors::Each#visit 69 (0.6%) 40 (0.4%) ActionDispatch::Routing::Mapper::Scope#each this commit: TOTAL (pct) SAMPLES (pct) FRAME 82 (0.6%) 42 (0.3%) ActionDispatch::Journey::Scanner#next_token 31 (0.2%) 31 (0.2%) ActionDispatch::Journey::Nodes::Node#symbol? 30 (0.2%) 30 (0.2%) ActionDispatch::Journey::Nodes::Node#initialize ``` See also the benchmark script in rails#39935 (comment) Co-authored-by: Eric Milford <ericmilford@gmail.com>
2369b1e
to
f353057
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite the Journey
on this PR! Glad we're doing less walking around these trees, an Ast
ute find!
end | ||
|
||
@spec | ||
@ast = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set the @ast
to nil
? We have code that get two patterns and join then and I can't find a way to made that code work if we don't have the original ast anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to say that this breaks other methods in this object, so the object will be in an invalid state after eager load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the reason to not keep the @ast
is to avoid to keep the arrays that are stored in the Ast
object, we could keep the ast.tree
like it was being done before.
diff --git a/actionpack/lib/action_dispatch/journey/path/pattern.rb b/actionpack/lib/action_dispatch/journey/path/pattern.rb
index 4cd862f791..41a04fdcc0 100644
--- a/actionpack/lib/action_dispatch/journey/path/pattern.rb
+++ b/actionpack/lib/action_dispatch/journey/path/pattern.rb
@@ -4,11 +4,11 @@ module ActionDispatch
module Journey # :nodoc:
module Path # :nodoc:
class Pattern # :nodoc:
- attr_reader :ast, :names, :requirements, :anchored
- alias :spec :ast
+ attr_reader :ast, :names, :requirements, :anchored, :spec
def initialize(ast, requirements, separators, anchored)
@ast = ast
+ @spec = ast.tree
@requirements = requirements
@separators = separators
@anchored = anchored
@composerinteralia what do you think?
requirements_anchored
would still be broken after eager loading though, but we could probably cache that method before removing the @ast
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the line to clear out the Ast
objects after eager load in response to concerns about increasing retained memory. I was aware this would break some methods, but those methods are not public api and not called after eager load so I thought it would be OK.
Bringing back the original behavior of spec
to return the root Node
rather than an Ast
makes sense to me. That way we can clear out the Ast
and still be able to return the root node. I've opened #43006 to do that, as well as clean up a few other small details.
I'm not sure we need to cache requirements_anchored
, unless there's a use case for calling that after eager_load. We use it to partition the route, which only happens when first adding the route.
#39935 changed the behavior of `Path::Pattern#spec` and `Route#ast` to return an `Ast` rather than the root `Node`. After eager loading, however, we clear out the `Ast` to limit retained memory and these methods return `nil`. While these methods are not public and they aren't used internally after eager loading, having them return `nil` makes it difficult for applications that had been using these methods to get access to the root `Node`. This commit restores the behavior of these two methods to return the root `Node`. The `Ast` is still available via `Path::Pattern#ast`, and we still clear it out after eager loading. Now that spec is a `Node` and not an `Ast` masquerading as one, we can get rid of the delegate methods on `Ast. Since `Route#ast` now returns the root `Node`, the newly added `Route#ast_root` is no longer necessary so I've removed it. I also removed the unused `@decorated_ast` method, which should have been removed in #39935.
This commit introduces a new
Journey::Ast
class that wraps the rootnode of an ast. The purpose of this class is to reduce the number of
walks through the ast by consolidating necessary processing of nodes in
a single pass and holding onto values that can be used later.
Benefits
than scattered throughout
easier to follow for future readers
Benchmarks
We benchmarked loading a real routes file with > 3500 routes.
master was at a9336a6 when we ran these. Note that these benchmarks
also include a few small changes that we didn't include in this commit,
but that we will follow up with after this gets merged - these
additional changes do not change the benchmarks significantly.
Time:
Allocations:
Stackprof:
Seeing
ActionDispatch::Journey::Visitors::Each#visit
more frequentlyon the stack is what led us down this path in the first place. These
changes seem to have done the trick.
Co-authored-by: Eric Milford ericmilford@gmail.com
@eugeneius this could use an initial review, if you are interested