-
Notifications
You must be signed in to change notification settings - Fork 22k
Micro-optimize Active Record connection checkin #55736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5850403
to
2db00da
Compare
4caa087
to
2463dcb
Compare
activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb
Outdated
Show resolved
Hide resolved
b8f61b8
to
279dd75
Compare
279dd75
to
165251d
Compare
3948f4a
to
ceaf104
Compare
Ok, so this branch currently reclaims a large part of the lost performance:
There might be some more to squeeze but it's getting pretty big so might be time to cleanup and merge what can be. My problem is that this is a lot of small optimizations that are within the error margin when benchmarked individually, so hard to justify them... Hence not to sure how to ship that. |
I guess that adding a |
Yes, if I keep a permanent lease:
|
Ran the bench on my side too:
Much closer indeed, and the micro optimizations does indeed add up, even if by isolation they are certainly hard to quantify. EDIT: Ruby 3.4.6:
|
Benchmark result is not super stable on my laptop...
Looks like the conclusion still holds. |
I'd not worry with that. If the aggregation of patches make the code faster, I'd just apply them all together (in individual commits) |
ceaf104
to
3d22ddd
Compare
When initially defined, the `_run_<name>_callbacks` method is a noop. Only once a callback has been registers, we swap the implementation for the real one. This saves a little bit of performance for hooks points that are rarely used or only used in development.
Realistically, it's only ever going to be invoked in development, hence it is wasteful to have a notification subscriber that is a noop. Instead we can delay the subscription to when the explain collection is enabled for the first time, meaning except for a few rare cases this code won't even be loaded in production environments.
This saves having to match the same regexp twice in a row. Ideally we wouldn't even use a regexp in most cases, and instead rely on the higher level hints, e.g. if `select_all` was used we should assume it's a read.
Recent Rubies are able to perform `...` delegation with 0 allocations we should use it when possible.
Callbacks have a fairly significant overhead, and don't really make the code easier to follow. They are useful to allow third party code to hook into the connections lifecycle, but there isn't a big reason for internal code to use them rather than a regular method call.
Thanks to the GVL, we don't need that mutex when running on MRI. This remove a little bit of overhead on every connection checkout and checkin.
3d22ddd
to
8f1c140
Compare
Alright, I've cleaned up the git history, the benchmark is now:
I think there is a bit more perf to squeeze in AS:N, but I'd rather not sleep on these gains and merge soon. I also removed the interlock patch given Matthew merged #55753 |
8f1c140
to
aec433d
Compare
- Adds a fastpath in `iterate_guarding_exceptions` for when there's only a single subscriber. In such case we don't need any fancy exception handling. - Merge the `groups_for` and `silenceable_group_for` caches, as to fetch both records with a single lookup. - Return a null object in `build_handle` if there are no subscribers
Accessing `IsolatedExecutionState` is a bit costly, hence it's prefereable to minimize accesses. By moving all 4 RuntimeRegistry metrics in a consolidated object, we now only access the `IsolatedExecutionState` once per query while before it was up to 8 times.
aec433d
to
7d12071
Compare
I feel like I'm taking crazy pills here. I added a monkey patch in the benchmark to entirely bypass the database, because 7.1 and this branch use very different version of SQLite 3, and I wanted to reduce variance. There's still a 9 to 15% difference (varies quite a bit)
But I'm struggling to spot a significant difference on the flame graphs: 7.1: https://share.firefox.dev/3W92sGQ Benchmark with monkey patches: # frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "rails", path: "."
# If you want to test against edge Rails replace the previous line with this:
# gem "rails", github: "rails/rails", branch: "main"
gem "debug"
gem "sqlite3"
gem "benchmark-ips"
gem "vernier"
end
require "active_record/railtie"
require "benchmark/ips"
# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"
class TestApp < Rails::Application
config.load_defaults Rails::VERSION::STRING.to_f
config.eager_load = false
config.logger = Logger.new(nil)
config.log_level = :info
config.secret_key_base = "secret_key_base"
config.cache_classes = true
config.eager_load = true
config.active_record.encryption.primary_key = "primary_key"
config.active_record.encryption.deterministic_key = "deterministic_key"
config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
end
Rails.application.initialize!
ActiveRecord::Schema.define do
create_table :posts, force: true do |t|
end
create_table :comments, force: true do |t|
t.integer :post_id
end
end
ActiveRecord::Base.release_connection unless Rails::VERSION::STRING < "7.2"
class Post < ActiveRecord::Base
end
if ENV["SKIP_DB"]
if Rails::VERSION::STRING < "7.2"
module Sqlite3SkipDb
def internal_exec_query(sql, name = nil, binds = [], prepare: false, async: false) # :nodoc:
if sql == "SELECT COUNT(*) FROM \"posts\""
sql = transform_query(sql)
check_if_write_query(sql)
mark_transaction_written_if_write(sql)
type_casted_binds = type_casted_binds(binds)
log(sql, name, binds, type_casted_binds, async: async) do
with_raw_connection do |conn|
# Don't cache statements if they are not prepared
verified!
build_result(columns: ["count(*)"], rows: [[1]])
end
end
else
super
end
end
end
ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
else
module Sqlite3SkipDb
def perform_query(raw_connection, sql, binds, type_casted_binds, prepare:, notification_payload:, batch: false)
if sql == "SELECT COUNT(*) FROM \"posts\""
verified!
notification_payload[:affected_rows] = 0
notification_payload[:row_count] = 1
ActiveRecord::Result.new(["count(*)"], [[1]], affected_rows: 0)
else
super
end
end
end
ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
end
end
BRANCH = `git rev-parse --abbrev-ref HEAD`.strip + (ENV["SKIP_DB"] ? "*" : "")
Vernier.profile(out: "time_profile-#{BRANCH}.json") do
(55060 * 10).times do
Post.count
end
end
Benchmark.ips do |x|
x.report(BRANCH) { Post.count }
x.save!("/tmp/bench-ar-data")
x.compare!(order: :baseline)
end |
# Thanks to the GVL, the LeaseRegistry doesn't need to be synchronized on MRI | ||
class LeaseRegistry < WeakThreadKeyMap # :nodoc: | ||
def [](context) | ||
super || (self[context] = Lease.new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe just because we know that we are context
, and therefore even if we stall during Lease.new
, we can't possibly be racing someone else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. If we were to check the lease of another thread, we could race on the self[context] = Lease.new
, but even that wouldn't necessarily be a problem.
Perhaps this may bite us in the future if we're not careful, but I like having minimal locking.
mark_transaction_written_if_write(sql) | ||
if write_query?(sql) | ||
ensure_writes_are_allowed(sql) | ||
mark_transaction_written |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just inline both of these? They're both basically x if y
, which was probably more method-worthy when they were called separately from each adapter etc? 🤷🏻♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could yes. More importantly we should pass down whether we expect a read or write query, e.g. if we're called from select_all
, we should assume it's a read query.
That regexp is about 1.5% on the profile AFAICT.
_run_checkin_callbacks do | ||
@idle_since = Process.clock_gettime(Process::CLOCK_MONOTONIC) if update_idle | ||
@owner = nil | ||
enable_lazy_transactions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something to change here, but consolidating like this draws my eye to the fairly arbitrary distribution of which "return to neutral state" things happen during checkin, and which during checkout.
end | ||
|
||
def start | ||
Subscriber.ensure_subscribed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
s.map(&:delegate) | ||
end | ||
end | ||
def group_listeners(listeners) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def group_listeners(listeners) | |
def group_listeners(listeners) # :nodoc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, should have waited for your review. I'll fix that on main
.
Ref: #55728
Starting in 7.2, since we no longer pin Active Record connection for the entirely of the request or job cycle,
Pool#checkout
andPool#checkin
overhead has to be paid for every query. It's not a whole lot once you consider a query will need to do a network hop, but we should still try to keep that overhead low.This PR contains many different small optimizations, many of them too small to show a clear gain on a benchmark, but together they add up, see individual commits. (FYI: @cmaion).
Which brings up back to about half-way between 7.1 and
main
:There might be some more optimizations to perform, but I think this is now large enough to be worth merging.