Micro-optimize Active Record connection checkin #55736

byroot · 2025-09-23T12:34:01Z

Starting in 7.2, since we no longer pin an Active Record connection for the entirety of the request or job cycle, the Pool#checkout and Pool#checkin overhead has to be paid for every query. It's not a whole lot once you consider that a query needs a network hop, but we should still try to keep that overhead low.

This PR contains many small optimizations, many of them too small to show a clear gain on a benchmark, but together they add up; see the individual commits. (FYI: @cmaion).

Which brings us back to about halfway between 7.1 and main:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
          7-1-stable     3.219k i/100ms
Calculating -------------------------------------
          7-1-stable     88.512k (± 4.1%) i/s   (11.30 μs/i) -    444.222k in   5.027300s

Comparison:
     ar-fast-checkin:    79704.1 i/s
          7-1-stable:    88512.0 i/s - 1.11x  faster
                main:    70279.4 i/s - 1.13x  slower

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", path: "."
  gem "sqlite3"
  gem "benchmark-ips"
end

require "active_record/railtie"
require "benchmark/ips"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new(nil)
  config.log_level = :info
  config.secret_key_base = "secret_key_base"

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
end
Rails.application.initialize!


ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end
end

ActiveRecord::Base.release_connection unless Rails::VERSION::STRING < "7.2"

class Post < ActiveRecord::Base
end

BRANCH = `git rev-parse --abbrev-ref HEAD`.strip

Benchmark.ips do |x|
  x.report(BRANCH) { Post.count }
  x.save!("/tmp/bench-ar-data")
  x.compare!(order: :baseline)
end

There might be some more optimizations to perform, but I think this is now large enough to be worth merging.

byroot · 2025-09-24T14:37:07Z

Ok, so this branch currently reclaims a large part of the lost performance:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
     ar-fast-checkin     7.781k i/100ms
Calculating -------------------------------------
     ar-fast-checkin     77.041k (± 2.8%) i/s   (12.98 μs/i) -    389.050k in   5.053953s

Comparison:
          7-1-stable:    87392.0 i/s
     ar-fast-checkin:    77040.6 i/s - 1.13x  slower
                main:    67712.3 i/s - 1.29x  slower

There might be some more to squeeze, but it's getting pretty big, so it might be time to clean up and merge what can be. My problem is that this is a lot of small optimizations that are within the error margin when benchmarked individually, so they're hard to justify... Hence I'm not too sure how to ship that.

cmaion · 2025-09-24T14:49:07Z

I guess that adding an ActiveRecord::Base.lease_connection before running the test still reduces the gap a bit?

byroot · 2025-09-24T14:52:55Z

Yes, if I keep a permanent lease:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
     ar-fast-checkin     8.581k i/100ms
Calculating -------------------------------------
     ar-fast-checkin     86.840k (± 1.2%) i/s   (11.52 μs/i) -    437.631k in   5.040239s

Comparison:
          7-1-stable:    87392.0 i/s
     ar-fast-checkin:    86839.6 i/s - same-ish: difference falls within error
                main:    67712.3 i/s - 1.29x  slower

cmaion · 2025-09-24T15:01:42Z

Ran the bench on my side too:

ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [x86_64-linux]
Warming up --------------------------------------
         7-2-stable*     3.263k i/100ms
Calculating -------------------------------------
         7-2-stable*     37.049k (± 2.4%) i/s   (26.99 μs/i) -    185.991k in   5.022907s

Comparison:
          7-1-stable:    42964.7 i/s
    ar-fast-checkin*:    38859.4 i/s - 1.11x  slower
         7-2-stable*:    37049.1 i/s - 1.16x  slower
     ar-fast-checkin:    36060.7 i/s - 1.19x  slower
                main:    31789.4 i/s - 1.35x  slower
          7-2-stable:    31203.9 i/s - 1.38x  slower

7-2-stable* is 7.2 + permanent lease
ar-fast-checkin* is your branch + permanent lease

Much closer indeed, and the micro-optimizations do indeed add up, even if in isolation they are certainly hard to quantify.

EDIT: Ruby 3.4.6:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
    ar-fast-checkin*     3.529k i/100ms
Calculating -------------------------------------
    ar-fast-checkin*     37.999k (± 4.0%) i/s   (26.32 μs/i) -    190.566k in   5.022988s

Comparison:
          7-1-stable:    44028.9 i/s
    ar-fast-checkin*:    37999.3 i/s - 1.16x  slower
               main*:    36109.0 i/s - 1.22x  slower
         7-2-stable*:    35829.3 i/s - 1.23x  slower
     ar-fast-checkin:    35292.5 i/s - 1.25x  slower
                main:    32538.4 i/s - 1.35x  slower
          7-2-stable:    30355.4 i/s - 1.45x  slower

cmaion · 2025-09-24T15:27:11Z

Benchmark result is not super stable on my laptop...
Re-ran the tests without YJIT to eliminate one potential source of noise.

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +PRISM [x86_64-linux]
Warming up --------------------------------------
    ar-fast-checkin*     1.563k i/100ms
Calculating -------------------------------------
    ar-fast-checkin*     15.658k (± 2.6%) i/s   (63.86 μs/i) -     79.713k in   5.094245s

Comparison:
          7-1-stable:    15920.1 i/s
    ar-fast-checkin*:    15658.2 i/s - same-ish: difference falls within error
         7-2-stable*:    14475.3 i/s - 1.10x  slower
               main*:    14319.4 i/s - 1.11x  slower
     ar-fast-checkin:    13249.6 i/s - 1.20x  slower
                main:    12245.5 i/s - 1.30x  slower
          7-2-stable:    11532.5 i/s - 1.38x  slower

Looks like the conclusion still holds.
I also note that holding a permanent connection is still worthwhile when it's reasonable to do so (and obviously when the latency with the database is minimal too).

rafaelfranca · 2025-09-24T17:55:37Z

> My problem is that this is a lot of small optimizations that are within the error margin when benchmarked individually, so hard to justify them... Hence not too sure how to ship that.

I wouldn't worry about that. If the patches taken together make the code faster, I'd just apply them all together (in individual commits).

When initially defined, the `_run_<name>_callbacks` method is a noop. Only once a callback has been registered do we swap the implementation for the real one. This saves a little bit of performance for hook points that are rarely used or only used in development.
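A minimal sketch of that "noop until something registers" idea, in plain Ruby. This is not the ActiveSupport::Callbacks implementation; `TinyConnection`, `after_checkin` and `run_checkin_callbacks` are hypothetical names used only for illustration.

```ruby
# Hypothetical, simplified model; not the ActiveSupport::Callbacks implementation.
class TinyConnection
  @checkin_callbacks = []
  @real_runner_installed = false

  class << self
    attr_reader :checkin_callbacks

    # Registering the first callback swaps the noop for the real runner.
    def after_checkin(&callback)
      @checkin_callbacks << callback
      install_real_runner unless @real_runner_installed
    end

    private

    def install_real_runner
      @real_runner_installed = true
      # Redefine the instance method on the class, replacing the noop below.
      define_method(:run_checkin_callbacks) do |&block|
        result = block&.call
        self.class.checkin_callbacks.each { |callback| callback.call(self) }
        result
      end
    end
  end

  # Initial implementation: nothing is registered, so only run the block.
  def run_checkin_callbacks
    yield if block_given?
  end
end
```

Until `TinyConnection.after_checkin { ... }` is called, `run_checkin_callbacks` never touches the callback list at all.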

Realistically, the explain subscriber is only ever going to be invoked in development, hence it is wasteful to have a notification subscriber that is a noop. Instead we can delay the subscription until explain collection is enabled for the first time, meaning that except for a few rare cases this code won't even be loaded in production environments.
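As a rough illustration of delaying the subscription, here is a self-contained sketch. `TinyBus` is a hypothetical stand-in for the notification bus and `TinyExplainRegistry` is a hypothetical collector; only the `start` / `ensure_subscribed` shape is taken from this PR, and the sketch ignores thread safety.

```ruby
# Hypothetical stand-in for a notification bus.
module TinyBus
  @listeners = Hash.new { |hash, event| hash[event] = [] }

  class << self
    def subscribe(event, &listener)
      @listeners[event] << listener
    end

    def publish(event, payload)
      # fetch does not trigger the default proc, so unknown events stay absent.
      @listeners.fetch(event, []).each { |listener| listener.call(payload) }
    end

    def listener_count(event)
      @listeners.fetch(event, []).size
    end
  end
end

# Hypothetical collector: it only subscribes the first time collection starts.
class TinyExplainRegistry
  attr_reader :queries

  def initialize
    @queries = []
    @collecting = false
    @subscribed = false
  end

  def start
    ensure_subscribed
    @collecting = true
  end

  private

  def ensure_subscribed
    return if @subscribed

    TinyBus.subscribe("sql") { |payload| @queries << payload[:sql] if @collecting }
    @subscribed = true
  end
end
```

Events published before `start` is called reach no listener at all, which is the saving described above.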

This saves having to match the same regexp twice in a row. Ideally we wouldn't even use a regexp in most cases, and instead rely on the higher level hints, e.g. if `select_all` was used we should assume it's a read.
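A sketch of the single-match shape, assuming a deliberately simplified write-detection regexp. `TinyAdapter`, `WRITE_QUERY` and the counter are hypothetical; the method names `write_query?`, `ensure_writes_are_allowed` and `mark_transaction_written` follow the diff discussed in this PR, but the bodies here are illustrative only.

```ruby
# Hypothetical adapter; the regexp is much simpler than a real adapter's.
class TinyAdapter
  class ReadOnlyError < StandardError; end

  WRITE_QUERY = /\A\s*(?:insert|update|delete|create|drop|alter)\b/i

  attr_reader :regexp_matches, :transaction_written

  def initialize(read_only: false)
    @read_only = read_only
    @regexp_matches = 0
    @transaction_written = false
  end

  # The regexp is matched once, and both follow-up steps reuse the answer.
  def preprocess_query(sql)
    if write_query?(sql)
      ensure_writes_are_allowed(sql)
      mark_transaction_written
    end
    sql
  end

  private

  def write_query?(sql)
    @regexp_matches += 1 # counter only exists to make the saving observable
    WRITE_QUERY.match?(sql)
  end

  def ensure_writes_are_allowed(sql)
    raise ReadOnlyError, "Write query attempted while read-only: #{sql}" if @read_only
  end

  def mark_transaction_written
    @transaction_written = true
  end
end
```

With two independent helpers that each ask `write_query?`, the counter would reach 2 per query; here it stays at 1.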

Recent Rubies are able to perform `...` delegation with 0 allocations, so we should use it when possible.
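For reference, this is what `...` forwarding looks like. `TinyBackend` and `TinyFrontend` are hypothetical classes; the sketch only shows the syntax and that positional arguments, keyword arguments and the block all pass through, not the allocation behaviour.

```ruby
# Hypothetical receiver that echoes back everything it was given.
class TinyBackend
  def publish(name, *args, **payload, &block)
    [name, args, payload, block&.call]
  end
end

class TinyFrontend
  def initialize(backend)
    @backend = backend
  end

  # `...` forwards positional arguments, keyword arguments and the block
  # without naming any of them.
  def publish(...)
    @backend.publish(...)
  end
end
```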

Callbacks have a fairly significant overhead, and don't really make the code easier to follow. They are useful to allow third party code to hook into the connections lifecycle, but there isn't a big reason for internal code to use them rather than a regular method call.

Thanks to the GVL, we don't need that mutex when running on MRI. This removes a little bit of overhead on every connection checkout and checkin.
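A rough sketch of choosing the locking strategy per Ruby engine. Everything here is hypothetical: the real registry in this PR subclasses `WeakThreadKeyMap`, whereas this sketch uses a plain identity Hash (strong references) purely to stay self-contained.

```ruby
TinyLease = Struct.new(:connection, :sticky)

# Lock-free variant: assumed to be acceptable on MRI, where each context
# only ever looks up its own lease.
class TinyLeaseRegistry
  def initialize
    @map = {}.compare_by_identity
  end

  def [](context)
    @map[context] || (@map[context] = TinyLease.new)
  end
end

# Variant for engines without a GVL: same behaviour, guarded by a mutex.
class SynchronizedLeaseRegistry < TinyLeaseRegistry
  def initialize
    super
    @mutex = Mutex.new
  end

  def [](context)
    @mutex.synchronize { super }
  end
end

# RUBY_ENGINE is "ruby" on MRI (CRuby); other engines get the locked version.
LEASE_REGISTRY_CLASS = RUBY_ENGINE == "ruby" ? TinyLeaseRegistry : SynchronizedLeaseRegistry
```

The lookup path on MRI is then a Hash read plus, on first use, one write, with no mutex acquisition.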

byroot · 2025-09-25T09:16:08Z

Alright, I've cleaned up the git history, the benchmark is now:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
          7-1-stable     3.219k i/100ms
Calculating -------------------------------------
          7-1-stable     88.512k (± 4.1%) i/s   (11.30 μs/i) -    444.222k in   5.027300s

Comparison:
     ar-fast-checkin:    79704.1 i/s
          7-1-stable:    88512.0 i/s - 1.11x  faster
                main:    70279.4 i/s - 1.13x  slower

I think there is a bit more perf to squeeze in ActiveSupport::Notifications (AS::N), but I'd rather not sit on these gains, and merge soon.

I also removed the interlock patch given Matthew merged #55753.

- Add a fast path in `iterate_guarding_exceptions` for when there is only a single subscriber. In that case we don't need any fancy exception handling.
- Merge the `groups_for` and `silenceable_group_for` caches, so that both records are fetched with a single lookup.
- Return a null object in `build_handle` if there are no subscribers.
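The first and last points can be sketched roughly as follows. `TinyFanout` and `SubscriberErrors` are hypothetical; only the names `iterate_guarding_exceptions` and `build_handle` come from the commit message, and the bodies are an illustration rather than the Rails code.

```ruby
module TinyFanout
  # Hypothetical wrapper for the case where several subscribers raised.
  class SubscriberErrors < StandardError
    attr_reader :exceptions

    def initialize(exceptions)
      @exceptions = exceptions
      super("#{exceptions.size} subscribers raised")
    end
  end

  # Shared do-nothing handle, returned when nobody is listening.
  class NullHandle
    def start; end
    def finish; end
  end
  NULL_HANDLE = NullHandle.new.freeze

  class Handle
    def initialize(listeners)
      @listeners = listeners
    end

    def start
      TinyFanout.iterate_guarding_exceptions(@listeners) { |listener| listener.start }
    end

    def finish
      TinyFanout.iterate_guarding_exceptions(@listeners) { |listener| listener.finish }
    end
  end

  def self.build_handle(listeners)
    listeners.empty? ? NULL_HANDLE : Handle.new(listeners)
  end

  def self.iterate_guarding_exceptions(listeners)
    if listeners.size == 1
      # Fast path: with a single listener there is nothing to protect,
      # so any error simply propagates.
      yield listeners.first
      return listeners
    end

    exceptions = nil
    listeners.each do |listener|
      yield listener
    rescue StandardError => error
      # Keep going so one failing listener does not starve the others.
      (exceptions ||= []) << error
    end

    return listeners unless exceptions
    raise exceptions.first if exceptions.size == 1

    raise SubscriberErrors.new(exceptions)
  end
end
```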

Accessing `IsolatedExecutionState` is a bit costly, hence it's preferable to minimize accesses. By moving all 4 RuntimeRegistry metrics into a consolidated object, we now only access the `IsolatedExecutionState` once per query, whereas before it was up to 8 times.
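A sketch of the consolidation, with `Thread.current[]` standing in for `IsolatedExecutionState`. `TinyRuntimeRegistry`, its `record` method and the four field names are assumptions made for illustration, not the actual RuntimeRegistry API.

```ruby
module TinyRuntimeRegistry
  # One object holds every per-query metric, so a single lookup reaches all of them.
  class Stats
    attr_accessor :sql_runtime, :async_sql_runtime, :queries_count, :cached_queries_count

    def initialize
      @sql_runtime = 0.0
      @async_sql_runtime = 0.0
      @queries_count = 0
      @cached_queries_count = 0
    end
  end

  # Thread.current[] is only a stand-in for the isolated execution state.
  def self.stats
    Thread.current[:tiny_runtime_registry_stats] ||= Stats.new
  end

  def self.record(duration_ms, cached: false, async: false)
    stats = self.stats # the only state lookup for this query
    stats.queries_count += 1
    stats.cached_queries_count += 1 if cached
    stats.sql_runtime += duration_ms
    stats.async_sql_runtime += duration_ms if async
    stats
  end
end
```

With four separate state entries, each `+=` above would cost one read and one write against the execution state, which is where a figure like "up to 8 times" can come from.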

byroot · 2025-09-25T09:49:45Z

I feel like I'm taking crazy pills here.

I added a monkey patch in the benchmark to entirely bypass the database, because 7.1 and this branch use very different versions of SQLite 3, and I wanted to reduce variance.

There's still a 9 to 15% difference (it varies quite a bit):

Comparison:
         7-1-stable*:   114190.0 i/s
    ar-fast-checkin*:   101621.7 i/s - 1.12x  slower

But I'm struggling to spot a significant difference on the flame graphs:

7.1: https://share.firefox.dev/3W92sGQ
this branch: https://share.firefox.dev/4pGMEZo

Benchmark with monkey patches:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", path: "."
  # If you want to test against edge Rails replace the previous line with this:
  # gem "rails", github: "rails/rails", branch: "main"

  gem "debug"
  gem "sqlite3"
  gem "benchmark-ips"
  gem "vernier"
end

require "active_record/railtie"
require "benchmark/ips"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new(nil)
  config.log_level = :info
  config.secret_key_base = "secret_key_base"
  config.cache_classes = true
  config.eager_load = true

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
end
Rails.application.initialize!


ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end
end

ActiveRecord::Base.release_connection unless Rails::VERSION::STRING < "7.2"

class Post < ActiveRecord::Base
end

if ENV["SKIP_DB"]
  if Rails::VERSION::STRING < "7.2"
    module Sqlite3SkipDb
      def internal_exec_query(sql, name = nil, binds = [], prepare: false, async: false) # :nodoc:
        if sql == "SELECT COUNT(*) FROM \"posts\""
          sql = transform_query(sql)
          check_if_write_query(sql)

          mark_transaction_written_if_write(sql)

          type_casted_binds = type_casted_binds(binds)

          log(sql, name, binds, type_casted_binds, async: async) do
            with_raw_connection do |conn|
              # Don't cache statements if they are not prepared
              verified!

              build_result(columns: ["count(*)"], rows: [[1]])
            end
          end
        else
          super
        end
      end
    end
    ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
  else
    module Sqlite3SkipDb
      def perform_query(raw_connection, sql, binds, type_casted_binds, prepare:, notification_payload:, batch: false)
        if sql == "SELECT COUNT(*) FROM \"posts\""
          verified!

          notification_payload[:affected_rows] = 0
          notification_payload[:row_count] = 1
          ActiveRecord::Result.new(["count(*)"], [[1]], affected_rows: 0)
        else
          super
        end
      end
    end
    ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
  end
end


BRANCH = `git rev-parse --abbrev-ref HEAD`.strip + (ENV["SKIP_DB"] ? "*" : "")

Vernier.profile(out: "time_profile-#{BRANCH}.json") do
  (55060 * 10).times do
    Post.count
  end
end

Benchmark.ips do |x|
  x.report(BRANCH) { Post.count }
  x.save!("/tmp/bench-ar-data")
  x.compare!(order: :baseline)
end

matthewd · 2025-09-25T09:41:52Z

activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb

+        # Thanks to the GVL, the LeaseRegistry doesn't need to be synchronized on MRI
+        class LeaseRegistry < WeakThreadKeyMap # :nodoc:
+          def [](context)
+            super || (self[context] = Lease.new)


Is this safe just because we know that we are context, and therefore even if we stall during Lease.new, we can't possibly be racing someone else?

Indeed. If we were to check the lease of another thread, we could race on the self[context] = Lease.new, but even that wouldn't necessarily be a problem.

Perhaps this may bite us in the future if we're not careful, but I like having minimal locking.

matthewd · 2025-09-25T09:45:49Z

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb

-          mark_transaction_written_if_write(sql)
+          if write_query?(sql)
+            ensure_writes_are_allowed(sql)
+            mark_transaction_written


Should we just inline both of these? They're both basically x if y, which was probably more method-worthy when they were called separately from each adapter etc? 🤷🏻‍♂️

I suppose we could yes. More importantly we should pass down whether we expect a read or write query, e.g. if we're called from select_all, we should assume it's a read query.

That regexp is about 1.5% on the profile AFAICT.

matthewd · 2025-09-25T09:48:31Z

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

+          _run_checkin_callbacks do
+            @idle_since = Process.clock_gettime(Process::CLOCK_MONOTONIC) if update_idle
+            @owner = nil
+            enable_lazy_transactions!


Not something to change here, but consolidating like this draws my eye to the fairly arbitrary distribution of which "return to neutral state" things happen during checkin, and which during checkout.

matthewd · 2025-09-25T09:52:38Z

activerecord/lib/active_record/explain_registry.rb

    end

+    def start
+      Subscriber.ensure_subscribed


matthewd · 2025-09-25T09:57:32Z

activesupport/lib/active_support/notifications/fanout.rb

-            s.map(&:delegate)
-          end
-        end
+      def group_listeners(listeners)


Suggested change

-      def group_listeners(listeners)
+      def group_listeners(listeners) # :nodoc:

Thanks, should have waited for your review. I'll fix that on main.

Ref: rails/rails#55736

rails-bot bot added activerecord activesupport labels Sep 23, 2025

byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 5850403 to 2db00da Compare September 23, 2025 14:07

rails-bot bot added the railties label Sep 23, 2025

byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 4caa087 to 2463dcb Compare September 23, 2025 14:43

rafaelfranca reviewed Sep 24, 2025

byroot commented Sep 24, 2025

byroot force-pushed the ar-fast-checkin branch 2 times, most recently from b8f61b8 to 279dd75 Compare September 24, 2025 13:42

rails-bot bot added the actionview label Sep 24, 2025

byroot force-pushed the ar-fast-checkin branch from 279dd75 to 165251d Compare September 24, 2025 14:04

rails-bot bot added actioncable actionpack labels Sep 24, 2025

byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 3948f4a to ceaf104 Compare September 24, 2025 14:33

byroot force-pushed the ar-fast-checkin branch from ceaf104 to 3d22ddd Compare September 25, 2025 09:09

byroot added 6 commits September 25, 2025 11:10

Refactor preprocess_query to call write_query? only once

07854a5

This saves having to match the same regexp twice in a row. Ideally we wouldn't even use a regexp in most cases, and instead rely on the higher level hints, e.g. if `select_all` was used we should assume it's a read.

ActiveSupport::Notifications leverage ... delegation

d918d4e

Recent Rubies are able to perform `...` delegation with 0 allocations, so we should use it when possible.

ActiveRecord LeaseRegistry: skip locking on MRI

3f669e6

Thanks to the GVL, we don't need that mutex when running on MRI. This removes a little bit of overhead on every connection checkout and checkin.

byroot force-pushed the ar-fast-checkin branch from 3d22ddd to 8f1c140 Compare September 25, 2025 09:10

byroot marked this pull request as ready for review September 25, 2025 09:14

byroot force-pushed the ar-fast-checkin branch from 8f1c140 to aec433d Compare September 25, 2025 09:17

byroot added 2 commits September 25, 2025 11:25

byroot force-pushed the ar-fast-checkin branch from aec433d to 7d12071 Compare September 25, 2025 09:29

byroot merged commit 15e1b2f into rails:main Sep 25, 2025
2 of 3 checks passed

byroot mentioned this pull request Sep 25, 2025

7.1 -> 7.2 perf regression due to non permanent ActiveRecord connections #55728

Open

matthewd reviewed Sep 25, 2025

aidanharan added a commit to rails-sqlserver/activerecord-sqlserver-adapter that referenced this pull request Sep 28, 2025

Modify namespace of ActiveRecord explain class

8217d47

Ref: rails/rails#55736

tombruijn mentioned this pull request Oct 17, 2025

Fix Rails 8.1 (RC1) ActiveSupport::Notifications support appsignal/appsignal-ruby#1471

Open
