
Make Active Record's query cache an LRU #48110

Merged

merged 1 commit into rails:main from query-cache-lru on May 7, 2023
Conversation

@casperisfine (Contributor) commented May 2, 2023

I don't know how prevalent this really is, but I have heard several times about users having memory exhaustion issues caused by the query cache when dealing with long-running jobs.

Overall it seems sensible for this cache not to be entirely unbounded.

Opening this for feedback.
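
To make the failure mode concrete, here is a minimal sketch of the pathological case (illustrative only; Post and Comment are hypothetical models): requests and jobs normally run inside a query cache block, and every distinct query/binds pair adds a new entry, so a long loop over many records grows the cache without bound.

# Illustrative sketch, not part of the patch.
ActiveRecord::Base.cache do
  Post.where(published: true).find_each do |post|
    # Same SQL, different bind value each time, so every iteration
    # stores a new, never-reused cache entry.
    Comment.where(post_id: post.id).count
  end
end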

TODO:

  • Changelog
  • Document the flag
  • Decide if 50 is a good default

Ping @tenderlove in the (unlikely) case you remember the context on 9ce0211, as I'm essentially reverting it.

Public complaints:

@nateberkopec (Contributor) commented:

Lovely! I would be happy to no longer have to cover this in my workshops and conf talks 😆

50 feels low... but I'm unsure how to calibrate for the right number here.

IIRC there is a branch in the Batches code too that avoids QueryCache. If QueryCache is LRU, do we still want that branch in there? Maybe not?

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

@casperisfine (Contributor, Author) commented:

there is a branch in the Batches code too that avoids QueryCache. If QueryCache is LRU, do we still want that branch in there? Maybe not?

in_batches is very unlikely to get cache hits, and is likely to generate many queries. If you know a query likely won't hit, I think it still makes sense to bypass the cache.
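
For illustration (this is not the actual Batches implementation), the bypass looks roughly like this, using the public uncached API and a hypothetical Post model:

# Hedged sketch: run a block with the query cache disabled so its many
# one-off queries are never stored.
Post.uncached do
  Post.find_each do |post|
    # queries issued here are executed fresh and not cached
  end
end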

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

No, I haven't benchmarked yet, but yes, I should do that. That said, I don't see a lot of opportunities for performance improvement here.

@matthewd (Member) commented May 3, 2023

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

Do you mean cache lookup, or the full relation-through-cache-hit-to-result flow? I expect the latter to be pretty slow [compared to caching an already-loaded relation, say], because this is such a low level cache.

context on 9ce0211 as I'm essentially reverting it

Looks like it was the array allocation per cache lookup. Avoiding it in the no-binds case is an improvement, though it still doesn't seem ideal. I don't have any better suggestions, though, short of separately LRUing each layer of the existing nested-hash structure.


It's a pretty trivial implementation, but still maybe worth pulling out an LRU class? I recall thinking that an LRU would be useful somewhere at some point in the past... but it might've been for this very thing 😅
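
For the record, here's a minimal sketch of what a pulled-out class could look like (an illustration, not the implementation in this PR; names are made up): Ruby hashes preserve insertion order, so re-inserting a key on every read keeps the most recently used entries at the end, and Hash#shift evicts the oldest one.

# Hedged sketch of a standalone LRU; class and method names are illustrative.
class LruCache
  def initialize(max_size)
    @max_size = max_size
    @map = {}
  end

  def compute_if_absent(key)
    if @map.key?(key)
      # Re-insert on hit so the key moves to the "most recently used" end.
      return @map[key] = @map.delete(key)
    end

    value = yield
    @map[key] = value
    # Hash#shift removes the oldest (least recently used) entry.
    @map.shift if @map.size > @max_size
    value
  end
end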

@casperisfine (Contributor, Author) commented:

Looks like it was the array allocation per cache lookup.

That's what Aaron remembers. But that was a decade ago; small allocations like this aren't as costly anymore.

Avoiding it in the no-binds case is an improvement

The "no-binds" case is rather rare though; I'm not even sure it's worth it.

short of separately LRUing each layer of the existing nested-hash structure.

I guess if we used a dedicated class it would be easy to keep track of the global size, but then it becomes hard to figure out which values are the oldest. Either way, I'm not convinced that this two-level hash is actually faster; I'll benchmark.

@casperisfine (Contributor, Author) commented:

Ok, so here's a benchmark:

# frozen_string_literal: true

begin
  require "bundler/inline"
rescue LoadError => e
  $stderr.puts "Bundler version 1.10 or later is required. Please update your Bundler"
  raise e
end

gemfile(true) do
  source "https://rubygems.org"
  case ENV["RAILS"]
  when "local"
    gem "activerecord", path: File.expand_path(".")
  when "edge"
    gem "activerecord", github: "rails/rails"
  else
    gem "activerecord", "~> 7.0.0"
  end
  gem "sqlite3"
  gem "benchmark-ips"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(nil)
ActiveRecord::Base.logger.level = Logger::INFO

conn = ActiveRecord::Base.connection
conn.class.class_eval do
  # cache_sql is private; make it public so the cache can be exercised directly.
  public(:cache_sql)
end

binds = [ActiveRecord::Relation::QueryAttribute.new("id", "10", ActiveRecord::Type::Integer.new)]

Benchmark.ips do |x|
  x.report("hit") do
    conn.cache_sql("SELECT 1", "SQL", binds) { 1 }
  end
end

Benchmark.ips do |x|
  x.report("miss") do
    conn.cache_sql("SELECT 0", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 1", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 2", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 3", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 4", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 5", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 6", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 7", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 8", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 9", "SQL", binds) { 1 }
    conn.clear_query_cache
  end
end

7.0:

Warming up --------------------------------------
                 hit    13.794k i/100ms
Calculating -------------------------------------
                 hit    138.448k (± 2.3%) i/s -    703.494k in   5.083924s
Warming up --------------------------------------
                miss     6.286k i/100ms
Calculating -------------------------------------
                miss     62.456k (± 4.3%) i/s -    314.300k in   5.043301s

main:

Warming up --------------------------------------
                 hit    20.380k i/100ms
Calculating -------------------------------------
                 hit    205.475k (± 1.7%) i/s -      1.039M in   5.059895s
Warming up --------------------------------------
                miss    11.048k i/100ms
Calculating -------------------------------------
                miss    109.093k (± 1.1%) i/s -    552.400k in   5.064167s

This branch:

Warming up --------------------------------------
                 hit    18.114k i/100ms
Calculating -------------------------------------
                 hit    180.707k (± 1.4%) i/s -    905.700k in   5.013033s
Warming up --------------------------------------
                miss     4.866k i/100ms
Calculating -------------------------------------
                miss     48.961k (± 1.4%) i/s -    248.166k in   5.069772s

Observations:

  • I think most of the difference between main and 7.0 is the earlier bail-out in the Active Record LogSubscriber. The logging of the sql.active_record event really dwarfs the lookup.
  • We do indeed take a moderate ~10% slowdown on cache hits, but a much bigger one on misses.

I'll profile a bit to see if I can close the gap.

@casperisfine (Contributor, Author) commented:

Ok, so unsurprisingly the vast majority of the time is spent hashing the binds:

==================================
  Mode: cpu(1000)
  Samples: 175 (0.00% miss rate)
  GC: 11 (6.29%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       157  (89.7%)          77  (44.0%)     ActiveModel::Attribute#hash
        78  (44.6%)          76  (43.4%)     ActiveModel::Type::Value#hash
        11   (6.3%)          11   (6.3%)     (marking)
       160  (91.4%)           6   (3.4%)     Array#hash
       164  (93.7%)           3   (1.7%)     ActiveRecord::ConnectionAdapters::QueryCache#cache_sql
        89  (50.9%)           1   (0.6%)     Hash#delete
         1   (0.6%)           1   (0.6%)     Kernel#hash

https://bugs.ruby-lang.org/issues/18897 may make this significantly faster. I'll try with Ruby 3.3.
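
For anyone who wants to reproduce just that part, here's a small sketch that isolates the bind-hashing cost (illustrative only; it reuses the QueryAttribute setup from the benchmark above):

# Hedged micro-benchmark: compares hashing the Attribute objects
# (which goes through ActiveModel::Attribute#hash) with hashing
# only the raw values.
require "active_record"
require "benchmark/ips"

binds = [ActiveRecord::Relation::QueryAttribute.new("id", "10", ActiveRecord::Type::Integer.new)]

Benchmark.ips do |x|
  x.report("attribute hash") { binds.hash }
  x.report("raw value hash") { binds.map(&:value_before_type_cast).hash }
  x.compare!
end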

@casperisfine (Contributor, Author) commented:

It helps a bit, but it's not stellar:

Warming up --------------------------------------
                 hit    19.248k i/100ms
Calculating -------------------------------------
                 hit    194.125k (± 1.7%) i/s -    981.648k in   5.058307s
Warming up --------------------------------------
                miss     5.617k i/100ms
Calculating -------------------------------------
                miss     55.951k (± 1.3%) i/s -    280.850k in   5.020406s

I can't really think of any way to improve the performance though.

That said, the miss number is for 10 misses + 1 clear. So while it is indeed much slower, it still works out to the ~500k misses/s range, i.e. roughly 2us per miss instead of roughly 1us on main, whereas a hit on even the simplest query will likely save close to 1ms. So I think it's an acceptable cost, but I'm happy to reconsider if some disagree.

@casperisfine (Contributor, Author) commented:

Also, I'm thinking query_cache_size should probably be defined in the database config, not globally.

@rails-bot (bot) added the docs label on May 4, 2023
@casperisfine force-pushed the query-cache-lru branch 2 times, most recently from 12a9aaa to ea939a8 on May 4, 2023 08:36
@casperisfine (Contributor, Author) commented:

Ok, it's now controlled via database.yml, and it can also be entirely disabled by setting query_cache: false.
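
For example, something along these lines in database.yml (illustrative values; the integer form is assumed to set the maximum number of cached queries):

# config/database.yml
production:
  adapter: postgresql
  database: my_app_production
  query_cache: 200     # keep at most 200 entries
  # query_cache: false # or disable the query cache for this database entirely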

I think the last thing here is to decide on the best default, but I feel like it's always going to be a bit arbitrary.

@eileencodes (Member) left a review:

Looks good, left a few documentation comments but the rest seems fine to me.

Review threads on activerecord/CHANGELOG.md and guides/source/configuring.md (outdated, resolved).

@casperisfine (Contributor, Author) commented:

  • Decide if 50 is a good default

So I raised the default to 100. It's still very much arbitrary, but the idea is mostly to avoid pathological cases (the classic long-running job).

If someone has actual data or a strong opinion, I'm totally open to changing the default. But ultimately a query result can be a handful of bytes or hundreds of megabytes, so no default will ever prevent all problems.

@byroot merged commit 5fd77fe into rails:main on May 7, 2023 (9 checks passed)
@casperisfine deleted the query-cache-lru branch on May 7, 2023 02:38
sampatbadhe added a commit to sampatbadhe/rails that referenced this pull request May 7, 2023
@@ -1,3 +1,25 @@
* Active Record's query cache now evicts least recently used entries

By default it only keeps the `50` most recently used queries.
A reviewer commented:

Should this be 100 instead?

@casperisfine (Contributor, Author) replied:

That was fixed in #48192.
