
Conversation

@byroot
Member

@byroot byroot commented Sep 23, 2025

Ref: #55728

Starting in 7.2, since we no longer pin the Active Record connection for the entirety of the request or job cycle, the `Pool#checkout` and `Pool#checkin` overhead has to be paid for every query. It's not a whole lot once you consider that a query needs a network hop anyway, but we should still try to keep that overhead low.
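
To make the trade-off concrete, here is a rough sketch of the two access patterns (not part of the PR; it assumes the `ActiveRecord::Base.with_connection` API available since 7.2):

```ruby
# Since 7.2, each query implicitly checks a connection out of the pool and back in:
Post.count # ~ pool.checkout -> execute -> pool.checkin

# Holding an explicit lease for a block pays that cost only once:
ActiveRecord::Base.with_connection do
  Post.count # reuses the leased connection
  Post.count
end
```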

This PR contains many small optimizations, most of them too small to show a clear gain on a benchmark on their own, but together they add up; see the individual commits. (FYI: @cmaion).

Which brings us back to about halfway between 7.1 and main:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
          7-1-stable     3.219k i/100ms
Calculating -------------------------------------
          7-1-stable     88.512k (± 4.1%) i/s   (11.30 μs/i) -    444.222k in   5.027300s

Comparison:
     ar-fast-checkin:    79704.1 i/s
          7-1-stable:    88512.0 i/s - 1.11x  faster
                main:    70279.4 i/s - 1.13x  slower
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", path: "."
  gem "sqlite3"
  gem "benchmark-ips"
end

require "active_record/railtie"
require "benchmark/ips"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new(nil)
  config.log_level = :info
  config.secret_key_base = "secret_key_base"

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
end
Rails.application.initialize!


ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end
end

ActiveRecord::Base.release_connection unless Rails::VERSION::STRING < "7.2"

class Post < ActiveRecord::Base
end

BRANCH = `git rev-parse --abbrev-ref HEAD`.strip

Benchmark.ips do |x|
  x.report(BRANCH) { Post.count }
  x.save!("/tmp/bench-ar-data")
  x.compare!(order: :baseline)
end

There might be some more optimizations to perform, but I think this is now large enough to be worth merging.

@byroot byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 5850403 to 2db00da on September 23, 2025 14:07
@rails-bot rails-bot bot added the railties label Sep 23, 2025
@byroot byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 4caa087 to 2463dcb on September 23, 2025 14:43
@byroot byroot force-pushed the ar-fast-checkin branch 2 times, most recently from b8f61b8 to 279dd75 on September 24, 2025 13:42
@rails-bot rails-bot bot added the actionview label Sep 24, 2025
@byroot byroot force-pushed the ar-fast-checkin branch 2 times, most recently from 3948f4a to ceaf104 on September 24, 2025 14:33
@byroot
Member Author

byroot commented Sep 24, 2025

Ok, so this branch currently reclaims a large part of the lost performance:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
     ar-fast-checkin     7.781k i/100ms
Calculating -------------------------------------
     ar-fast-checkin     77.041k (± 2.8%) i/s   (12.98 μs/i) -    389.050k in   5.053953s

Comparison:
          7-1-stable:    87392.0 i/s
     ar-fast-checkin:    77040.6 i/s - 1.13x  slower
                main:    67712.3 i/s - 1.29x  slower

There might be some more to squeeze, but it's getting pretty big, so it might be time to clean up and merge what can be. My problem is that this is a lot of small optimizations that are within the error margin when benchmarked individually, so they're hard to justify on their own... Hence I'm not too sure how to ship that.

@cmaion

cmaion commented Sep 24, 2025

I guess that adding an `ActiveRecord::Base.lease_connection` before running the test still reduces the gap a bit?
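
For reference, the permanent-lease variant amounts to something like this tweak to the benchmark script above (a sketch, assuming Rails >= 7.2 where `lease_connection` is available):

```ruby
ActiveRecord::Base.lease_connection # hold one connection for the whole process

Benchmark.ips do |x|
  x.report("#{BRANCH}*") { Post.count } # "*" marks the permanent-lease runs in later comments
  x.save!("/tmp/bench-ar-data")
  x.compare!(order: :baseline)
end
```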

@byroot
Member Author

byroot commented Sep 24, 2025

Yes, if I keep a permanent lease:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
     ar-fast-checkin     8.581k i/100ms
Calculating -------------------------------------
     ar-fast-checkin     86.840k (± 1.2%) i/s   (11.52 μs/i) -    437.631k in   5.040239s

Comparison:
          7-1-stable:    87392.0 i/s
     ar-fast-checkin:    86839.6 i/s - same-ish: difference falls within error
                main:    67712.3 i/s - 1.29x  slower

@cmaion

cmaion commented Sep 24, 2025

Ran the bench on my side too:

ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [x86_64-linux]
Warming up --------------------------------------
         7-2-stable*     3.263k i/100ms
Calculating -------------------------------------
         7-2-stable*     37.049k (± 2.4%) i/s   (26.99 μs/i) -    185.991k in   5.022907s

Comparison:
          7-1-stable:    42964.7 i/s
    ar-fast-checkin*:    38859.4 i/s - 1.11x  slower
         7-2-stable*:    37049.1 i/s - 1.16x  slower
     ar-fast-checkin:    36060.7 i/s - 1.19x  slower
                main:    31789.4 i/s - 1.35x  slower
          7-2-stable:    31203.9 i/s - 1.38x  slower
  • 7-2-stable* is 7.2 + permanent lease
  • ar-fast-checkin* is your branch + permanent lease

Much closer indeed, and the micro-optimizations do indeed add up, even if in isolation they are certainly hard to quantify.

EDIT: Ruby 3.4.6:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
    ar-fast-checkin*     3.529k i/100ms
Calculating -------------------------------------
    ar-fast-checkin*     37.999k (± 4.0%) i/s   (26.32 μs/i) -    190.566k in   5.022988s

Comparison:
          7-1-stable:    44028.9 i/s
    ar-fast-checkin*:    37999.3 i/s - 1.16x  slower
               main*:    36109.0 i/s - 1.22x  slower
         7-2-stable*:    35829.3 i/s - 1.23x  slower
     ar-fast-checkin:    35292.5 i/s - 1.25x  slower
                main:    32538.4 i/s - 1.35x  slower
          7-2-stable:    30355.4 i/s - 1.45x  slower

@cmaion

cmaion commented Sep 24, 2025

Benchmark results are not super stable on my laptop...
I re-ran the tests without YJIT to eliminate one potential source of noise.

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +PRISM [x86_64-linux]
Warming up --------------------------------------
    ar-fast-checkin*     1.563k i/100ms
Calculating -------------------------------------
    ar-fast-checkin*     15.658k (± 2.6%) i/s   (63.86 μs/i) -     79.713k in   5.094245s

Comparison:
          7-1-stable:    15920.1 i/s
    ar-fast-checkin*:    15658.2 i/s - same-ish: difference falls within error
         7-2-stable*:    14475.3 i/s - 1.10x  slower
               main*:    14319.4 i/s - 1.11x  slower
     ar-fast-checkin:    13249.6 i/s - 1.20x  slower
                main:    12245.5 i/s - 1.30x  slower
          7-2-stable:    11532.5 i/s - 1.38x  slower

Looks like the conclusion still holds.
I also note that holding a permanent connection is still worthwhile when it's reasonable to do so (and obviously when the latency to the database is minimal too).

@rafaelfranca
Member

My problem is that this is a lot of small optimizations that are within the error margin when benchmarked individually, so they're hard to justify on their own... Hence I'm not too sure how to ship that.

I wouldn't worry about that. If the aggregation of patches makes the code faster, I'd just apply them all together (in individual commits).

When initially defined, the `_run_<name>_callbacks` method is a noop.

Only once a callback has been registered do we swap the implementation for the real one.

This saves a little bit of performance for hook points that are rarely used or only used in development.
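
A self-contained toy version of that lazy-definition trick (hypothetical class, not the actual ActiveSupport::Callbacks code) could look like:

```ruby
class Hooks
  def self.callbacks
    @callbacks ||= []
  end

  # Initially a plain yield: nothing to look up, nothing to run.
  def _run_save_callbacks
    yield
  end

  def self.register(&block)
    callbacks << block
    # Swap in the real implementation only once a callback actually exists.
    define_method(:_run_save_callbacks) do |&blk|
      self.class.callbacks.each(&:call)
      blk.call
    end
  end
end

Hooks.new._run_save_callbacks { }              # cheap noop path
Hooks.register { puts "before save" }
Hooks.new._run_save_callbacks { puts "saved" } # real runner from now on
```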

Realistically, it's only ever going to be invoked in development, hence it is wasteful to have a notification subscriber that is a noop.

Instead, we can delay the subscription until the explain collection is enabled for the first time, meaning that, except for a few rare cases, this code won't even be loaded in production environments.
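
In spirit, the deferred subscription looks something like this (hypothetical class name, simplified from what the commit actually does):

```ruby
require "active_support/notifications"

class ExplainSubscriber
  @subscribed = false
  @mutex = Mutex.new

  def self.ensure_subscribed
    return if @subscribed

    @mutex.synchronize do
      next if @subscribed

      ActiveSupport::Notifications.subscribe("sql.active_record") do |event|
        # collect event.payload[:sql] for a later EXPLAIN
      end
      @subscribed = true
    end
  end
end

# Nothing is subscribed (or even loaded) until explain collection is turned on:
ExplainSubscriber.ensure_subscribed
```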

This saves having to match the same regexp twice in a row.

Ideally we wouldn't even use a regexp in most cases, and instead rely on the higher level hints, e.g. if `select_all` was used we should assume it's a read.

Recent Rubies are able to perform `...` delegation with 0 allocations; we should use it when possible.
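
For illustration, this is the `def m(...)` argument-forwarding syntax, which (per the commit) recent Ruby versions can compile without allocating intermediate argument arrays or hashes:

```ruby
class LoggedCall
  def initialize(target)
    @target = target
  end

  # Forwards positional arguments, keyword arguments and the block as-is.
  def call(...)
    @target.call(...)
  end
end

LoggedCall.new(->(a, b:) { a + b }).call(1, b: 2) # => 3
```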

Callbacks have a fairly significant overhead and don't really make the code easier to follow. They are useful to allow third-party code to hook into the connection lifecycle, but there isn't a big reason for internal code to use them rather than a regular method call.

Thanks to the GVL, we don't need that mutex when running on MRI. This removes a little bit of overhead on every connection checkout and checkin.
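
As a rough illustration of why the lock can be skipped on MRI (a hypothetical registry, not the PR's LeaseRegistry):

```ruby
class Registry
  def initialize
    @map = {}
  end

  def [](key)
    @map[key]
  end

  def []=(key, value)
    @map[key] = value
  end
end

# Only runtimes without a GVL (JRuby, TruffleRuby, ...) need the locking
# variant; on MRI an individual Hash read or write cannot be interleaved.
class SynchronizedRegistry < Registry
  def initialize
    super
    @mutex = Mutex.new
  end

  def [](key)
    @mutex.synchronize { super }
  end

  def []=(key, value)
    @mutex.synchronize { super }
  end
end

RegistryImpl = RUBY_ENGINE == "ruby" ? Registry : SynchronizedRegistry
```
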
@byroot byroot marked this pull request as ready for review September 25, 2025 09:14
@byroot
Member Author

byroot commented Sep 25, 2025

Alright, I've cleaned up the git history, the benchmark is now:

ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
          7-1-stable     3.219k i/100ms
Calculating -------------------------------------
          7-1-stable     88.512k (± 4.1%) i/s   (11.30 μs/i) -    444.222k in   5.027300s

Comparison:
     ar-fast-checkin:    79704.1 i/s
          7-1-stable:    88512.0 i/s - 1.11x  faster
                main:    70279.4 i/s - 1.13x  slower

I think there is a bit more perf to squeeze in AS::N, but I'd rather not sit on these gains and merge soon.

I also removed the interlock patch, given that Matthew merged #55753.

- Adds a fastpath in `iterate_guarding_exceptions` for when there's only a single subscriber. In that case we don't need any fancy exception handling.

- Merge the `groups_for` and `silenceable_group_for` caches, so as to fetch both records with a single lookup.

- Return a null object in `build_handle` if there are no subscribers.
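
The null-object part of the last bullet, sketched with made-up names (the real ActiveSupport::Notifications fanout code is more involved):

```ruby
class NullHandle
  def start; end
  def finish; end
end
NULL_HANDLE = NullHandle.new

Handle = Struct.new(:subscribers, :name, :payload) do
  def start
    subscribers.each { |s| s.start(name, payload) }
  end

  def finish
    subscribers.each { |s| s.finish(name, payload) }
  end
end

def build_handle(subscribers, name, payload)
  return NULL_HANDLE if subscribers.empty? # no per-event allocation or iteration

  Handle.new(subscribers, name, payload)
end

# Callers can call start/finish unconditionally, without nil checks:
build_handle([], "sql.active_record", {}).start
```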

Accessing `IsolatedExecutionState` is a bit costly, hence it's preferable to minimize accesses.

By moving all 4 RuntimeRegistry metrics into a consolidated object, we now only access the `IsolatedExecutionState` once per query, whereas before it was up to 8 times.
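
A sketch of that consolidation (the attribute names are approximations of the RuntimeRegistry metrics, not necessarily the exact ones):

```ruby
require "active_support/isolated_execution_state"

# One object holds all the per-request query metrics...
class RuntimeStats
  attr_accessor :sql_runtime, :async_sql_runtime, :queries_count, :cached_queries_count

  def initialize
    @sql_runtime = 0.0
    @async_sql_runtime = 0.0
    @queries_count = 0
    @cached_queries_count = 0
  end
end

module ExampleRuntimeRegistry
  # ...so recording a query touches IsolatedExecutionState once,
  # instead of once per metric read or written.
  def self.stats
    ActiveSupport::IsolatedExecutionState[:example_runtime_stats] ||= RuntimeStats.new
  end
end

stats = ExampleRuntimeRegistry.stats
stats.queries_count += 1
stats.sql_runtime += 0.4
```
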
@byroot
Member Author

byroot commented Sep 25, 2025

I feel like I'm taking crazy pills here.

I added a monkey patch in the benchmark to entirely bypass the database, because 7.1 and this branch use very different versions of SQLite3, and I wanted to reduce variance.

There's still a 9 to 15% difference (it varies quite a bit):

Comparison:
         7-1-stable*:   114190.0 i/s
    ar-fast-checkin*:   101621.7 i/s - 1.12x  slower

But I'm struggling to spot a significant difference on the flame graphs:

7.1: https://share.firefox.dev/3W92sGQ
this branch: https://share.firefox.dev/4pGMEZo

Benchmark with monkey patches:

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", path: "."
  # If you want to test against edge Rails replace the previous line with this:
  # gem "rails", github: "rails/rails", branch: "main"

  gem "debug"
  gem "sqlite3"
  gem "benchmark-ips"
  gem "vernier"
end

require "active_record/railtie"
require "benchmark/ips"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults Rails::VERSION::STRING.to_f
  config.eager_load = false
  config.logger = Logger.new(nil)
  config.log_level = :info
  config.secret_key_base = "secret_key_base"
  config.cache_classes = true
  config.eager_load = true

  config.active_record.encryption.primary_key = "primary_key"
  config.active_record.encryption.deterministic_key = "deterministic_key"
  config.active_record.encryption.key_derivation_salt = "key_derivation_salt"
end
Rails.application.initialize!


ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
  end
end

ActiveRecord::Base.release_connection unless Rails::VERSION::STRING < "7.2"

class Post < ActiveRecord::Base
end

if ENV["SKIP_DB"]
  if Rails::VERSION::STRING < "7.2"
    module Sqlite3SkipDb
      def internal_exec_query(sql, name = nil, binds = [], prepare: false, async: false) # :nodoc:
        if sql == "SELECT COUNT(*) FROM \"posts\""
          sql = transform_query(sql)
          check_if_write_query(sql)

          mark_transaction_written_if_write(sql)

          type_casted_binds = type_casted_binds(binds)

          log(sql, name, binds, type_casted_binds, async: async) do
            with_raw_connection do |conn|
              # Don't cache statements if they are not prepared
              verified!

              build_result(columns: ["count(*)"], rows: [[1]])
            end
          end
        else
          super
        end
      end
    end
    ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
  else
    module Sqlite3SkipDb
      def perform_query(raw_connection, sql, binds, type_casted_binds, prepare:, notification_payload:, batch: false)
        if sql == "SELECT COUNT(*) FROM \"posts\""
          verified!

          notification_payload[:affected_rows] = 0
          notification_payload[:row_count] = 1
          ActiveRecord::Result.new(["count(*)"], [[1]], affected_rows: 0)
        else
          super
        end
      end
    end
    ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(Sqlite3SkipDb)
  end
end


BRANCH = `git rev-parse --abbrev-ref HEAD`.strip + (ENV["SKIP_DB"] ? "*" : "")

Vernier.profile(out: "time_profile-#{BRANCH}.json") do
  (55060 * 10).times do
    Post.count
  end
end

Benchmark.ips do |x|
  x.report(BRANCH) { Post.count }
  x.save!("/tmp/bench-ar-data")
  x.compare!(order: :baseline)
end

@byroot byroot merged commit 15e1b2f into rails:main Sep 25, 2025
2 of 3 checks passed
# Thanks to the GVL, the LeaseRegistry doesn't need to be synchronized on MRI
class LeaseRegistry < WeakThreadKeyMap # :nodoc:
  def [](context)
    super || (self[context] = Lease.new)
Member

Is this safe just because we know that we are `context`, and therefore even if we stall during `Lease.new`, we can't possibly be racing someone else?

Member Author

Indeed. If we were to check the lease of another thread, we could race on the `self[context] = Lease.new`, but even that wouldn't necessarily be a problem.

Perhaps this may bite us in the future if we're not careful, but I like having minimal locking.

mark_transaction_written_if_write(sql)
if write_query?(sql)
  ensure_writes_are_allowed(sql)
  mark_transaction_written
Member

Should we just inline both of these? They're both basically x if y, which was probably more method-worthy when they were called separately from each adapter etc? 🤷🏻‍♂️

Member Author

I suppose we could, yes. More importantly, we should pass down whether we expect a read or a write query, e.g. if we're called from `select_all`, we should assume it's a read query.

That regexp is about 1.5% on the profile AFAICT.
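
Something in that direction, with entirely made-up names, just to show the shape of the idea:

```ruby
class FakeAdapter
  WRITE_RE = /\A\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)\b/i

  # The regexp we'd like to skip on hot read paths.
  def write_query?(sql)
    WRITE_RE.match?(sql)
  end

  def execute_query(sql, intent: :unknown)
    write = intent == :unknown ? write_query?(sql) : intent == :write
    write ? "write path" : "read path"
  end
end

adapter = FakeAdapter.new
adapter.execute_query("SELECT COUNT(*) FROM posts")                # falls back to the regexp
adapter.execute_query("SELECT COUNT(*) FROM posts", intent: :read) # caller-provided hint, no regexp
```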

_run_checkin_callbacks do
  @idle_since = Process.clock_gettime(Process::CLOCK_MONOTONIC) if update_idle
  @owner = nil
  enable_lazy_transactions!
Member

Not something to change here, but consolidating like this draws my eye to the fairly arbitrary distribution of which "return to neutral state" things happen during checkin, and which during checkout.

end

def start
  Subscriber.ensure_subscribed
Member

Nice!

s.map(&:delegate)
end
end
def group_listeners(listeners)
Member

Suggested change
def group_listeners(listeners)
def group_listeners(listeners) # :nodoc:

Member Author

Thanks, I should have waited for your review. I'll fix that on main.

aidanharan added a commit to rails-sqlserver/activerecord-sqlserver-adapter that referenced this pull request Sep 28, 2025