
Make Active Record's query cache an LRU #48110

Merged

merged 1 commit into rails:main from query-cache-lru on May 7, 2023
Conversation

@casperisfine (Contributor) commented May 2, 2023

I don't know how prevalent this really is, but I have heard several times about users having memory exhaustion issues caused by the query cache when dealing with long-running jobs.

Overall it seems sensible for this cache not to be entirely unbounded.

Opening this for feedback.
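
To make the failure mode concrete, here is a minimal sketch of the pathological case (illustrative only; Post and Comment are hypothetical models): requests and jobs normally run inside a query cache block, and every distinct query/binds pair adds a new entry, so a long loop over many records grows the cache without bound.

# Illustrative sketch, not part of the patch.
ActiveRecord::Base.cache do
  Post.where(published: true).find_each do |post|
    # Same SQL, different bind value each time, so every iteration
    # stores a new, never-reused cache entry.
    Comment.where(post_id: post.id).count
  end
end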

TODO:

  • Changelog
  • Document the flag
  • Decide if 50 is a good default

Ping @tenderlove in the (unlikely) case you remember the context on 9ce0211, as I'm essentially reverting it.

Public complaints:

@nateberkopec (Contributor) commented:

Lovely! I would be happy to no longer have to cover this in my workshops and conf talks 😆

50 feels low... but I'm unsure how to calibrate for the right number here.

IIRC there is a branch in the Batches code too that avoids QueryCache. If QueryCache is LRU, do we still want that branch in there? Maybe not?

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

@casperisfine (Contributor, Author) commented:

there is a branch in the Batches code too that avoids QueryCache. If QueryCache is LRU, do we still want that branch in there? Maybe not?

in_batches is very unlikely to get cache hits, and is likely to generate many queries. If you know a query likely won't hit, I think it still makes sense to bypass the cache.
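
For illustration (this is not the actual Batches implementation), the bypass looks roughly like this, using the public uncached API and a hypothetical Post model:

# Hedged sketch: run a block with the query cache disabled so its many
# one-off queries are never stored.
Post.uncached do
  Post.find_each do |post|
    # queries issued here are executed fresh and not cached
  end
end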

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

No, I haven't benchmarked yet, but yes, I should do that. That said, I don't see a lot of opportunities for performance improvement here.

@matthewd (Member) commented May 3, 2023

QueryCache is already quite a hot path and not as fast as I would like... do we have a benchmark for it?

Do you mean cache lookup, or the full relation-through-cache-hit-to-result flow? I expect the latter to be pretty slow [compared to caching an already-loaded relation, say], because this is such a low level cache.

context on 9ce0211 as I'm essentially reverting it

Looks like it was the array allocation per cache lookup. Avoiding it in the no-binds case is an improvement, though it still doesn't seem ideal. I don't have any better suggestions, though, short of separately LRUing each layer of the existing nested-hash structure.


It's a pretty trivial implementation, but still maybe worth pulling out an LRU class? I recall thinking that an LRU would be useful somewhere at some point in the past... but it might've been for this very thing 😅
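
For the record, here's a minimal sketch of what a pulled-out class could look like (an illustration, not the implementation in this PR; names are made up): Ruby hashes preserve insertion order, so re-inserting a key on every read keeps the most recently used entries at the end, and Hash#shift evicts the oldest one.

# Hedged sketch of a standalone LRU; class and method names are illustrative.
class LruCache
  def initialize(max_size)
    @max_size = max_size
    @map = {}
  end

  def compute_if_absent(key)
    if @map.key?(key)
      # Re-insert on hit so the key moves to the "most recently used" end.
      return @map[key] = @map.delete(key)
    end

    value = yield
    @map[key] = value
    # Hash#shift removes the oldest (least recently used) entry.
    @map.shift if @map.size > @max_size
    value
  end
end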

@casperisfine (Contributor, Author) commented:

Looks like it was the array allocation per cache lookup.

That's what Aaron remembers. But that was a decade ago; small allocations like this aren't as costly anymore.

Avoiding it in the no-binds case is an improvement

The "no-binds" case is rather rare though; I'm not even sure it's worth it.

short of separately LRUing each layer of the existing nested-hash structure.

I guess if we used a dedicated class it would be easy to keep track of the global size, but then it becomes hard to figure out which values are the oldest. Either way, I'm not convinced that this two-level hash is actually faster; I'll benchmark.

@casperisfine (Contributor, Author) commented:

Ok, so here's a benchmark:

# frozen_string_literal: true

begin
  require "bundler/inline"
rescue LoadError => e
  $stderr.puts "Bundler version 1.10 or later is required. Please update your Bundler"
  raise e
end

gemfile(true) do
  source "https://rubygems.org"
  case ENV["RAILS"]
  when "local"
    gem "activerecord", path: File.expand_path(".")
  when "edge"
    gem "activerecord", github: "rails/rails"
  else
    gem "activerecord", "~> 7.0.0"
  end
  gem "sqlite3"
  gem "benchmark-ips"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(nil)
ActiveRecord::Base.logger.level = Logger::INFO

conn = ActiveRecord::Base.connection
conn.class.class_eval do
  # cache_sql is private; make it public so the cache can be exercised directly.
  public(:cache_sql)
end

binds = [ActiveRecord::Relation::QueryAttribute.new("id", "10", ActiveRecord::Type::Integer.new)]

Benchmark.ips do |x|
  x.report("hit") do
    conn.cache_sql("SELECT 1", "SQL", binds) { 1 }
  end
end

Benchmark.ips do |x|
  x.report("miss") do
    conn.cache_sql("SELECT 0", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 1", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 2", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 3", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 4", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 5", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 6", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 7", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 8", "SQL", binds) { 1 }
    conn.cache_sql("SELECT 9", "SQL", binds) { 1 }
    conn.clear_query_cache
  end
end

7.0:

Warming up --------------------------------------
                 hit    13.794k i/100ms
Calculating -------------------------------------
                 hit    138.448k (± 2.3%) i/s -    703.494k in   5.083924s
Warming up --------------------------------------
                miss     6.286k i/100ms
Calculating -------------------------------------
                miss     62.456k (± 4.3%) i/s -    314.300k in   5.043301s

main:

Warming up --------------------------------------
                 hit    20.380k i/100ms
Calculating -------------------------------------
                 hit    205.475k (± 1.7%) i/s -      1.039M in   5.059895s
Warming up --------------------------------------
                miss    11.048k i/100ms
Calculating -------------------------------------
                miss    109.093k (± 1.1%) i/s -    552.400k in   5.064167s

This branch:

Warming up --------------------------------------
                 hit    18.114k i/100ms
Calculating -------------------------------------
                 hit    180.707k (± 1.4%) i/s -    905.700k in   5.013033s
Warming up --------------------------------------
                miss     4.866k i/100ms
Calculating -------------------------------------
                miss     48.961k (± 1.4%) i/s -    248.166k in   5.069772s

Observations:

  • I think most of the difference between main and 7.0 is the earlier bail-out in the Active Record LogSubscriber. The logging of the sql.active_record event really dwarfs the lookup.
  • We do indeed take a moderate ~10% slowdown on cache hits, but a much bigger one on misses.

I'll profile a bit to see if I can close the gap.

@casperisfine (Contributor, Author) commented:

Ok, so unsurprisingly the vast majority of the time is spent hashing the binds:

==================================
  Mode: cpu(1000)
  Samples: 175 (0.00% miss rate)
  GC: 11 (6.29%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       157  (89.7%)          77  (44.0%)     ActiveModel::Attribute#hash
        78  (44.6%)          76  (43.4%)     ActiveModel::Type::Value#hash
        11   (6.3%)          11   (6.3%)     (marking)
       160  (91.4%)           6   (3.4%)     Array#hash
       164  (93.7%)           3   (1.7%)     ActiveRecord::ConnectionAdapters::QueryCache#cache_sql
        89  (50.9%)           1   (0.6%)     Hash#delete
         1   (0.6%)           1   (0.6%)     Kernel#hash

https://bugs.ruby-lang.org/issues/18897 may make this significantly faster. I'll try with Ruby 3.3.
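
For anyone who wants to reproduce just that part, here's a small sketch that isolates the bind-hashing cost (illustrative only; it reuses the QueryAttribute setup from the benchmark above):

# Hedged micro-benchmark: compares hashing the Attribute objects
# (which goes through ActiveModel::Attribute#hash) with hashing
# only the raw values.
require "active_record"
require "benchmark/ips"

binds = [ActiveRecord::Relation::QueryAttribute.new("id", "10", ActiveRecord::Type::Integer.new)]

Benchmark.ips do |x|
  x.report("attribute hash") { binds.hash }
  x.report("raw value hash") { binds.map(&:value_before_type_cast).hash }
  x.compare!
end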

@casperisfine (Contributor, Author) commented:

It helps a bit, but it's not stellar:

Warming up --------------------------------------
                 hit    19.248k i/100ms
Calculating -------------------------------------
                 hit    194.125k (± 1.7%) i/s -    981.648k in   5.058307s
Warming up --------------------------------------
                miss     5.617k i/100ms
Calculating -------------------------------------
                miss     55.951k (± 1.3%) i/s -    280.850k in   5.020406s

I can't really think of any way to improve the performance though.

That said, the miss number is for 10 misses + 1 clear. So while it is indeed much slower, it still works out to the ~500k misses/s range, i.e. roughly 2us per miss instead of roughly 1us on main, whereas a hit on even the simplest query will likely save close to 1ms. So I think it's an acceptable cost, but I'm happy to reconsider if some disagree.

@casperisfine (Contributor, Author) commented:

Also, I'm thinking query_cache_size should probably be defined in the database config, not globally.

@rails-bot (bot) added the docs label on May 4, 2023
@casperisfine force-pushed the query-cache-lru branch 2 times, most recently from 12a9aaa to ea939a8 on May 4, 2023 08:36
@casperisfine (Contributor, Author) commented:

Ok, it's now controlled via database.yml, and it can also be entirely disabled by setting query_cache: false.
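
For example, something along these lines in database.yml (illustrative values; the integer form is assumed to set the maximum number of cached queries):

# config/database.yml
production:
  adapter: postgresql
  database: my_app_production
  query_cache: 200     # keep at most 200 entries
  # query_cache: false # or disable the query cache for this database entirely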

I think the last thing here is to decide on the best default, but I feel like it's always going to be a bit arbitrary.

@eileencodes (Member) left a review:

Looks good, left a few documentation comments but the rest seems fine to me.

Review threads on activerecord/CHANGELOG.md and guides/source/configuring.md (outdated, resolved).

@casperisfine (Contributor, Author) commented:

  • Decide if 50 is a good default

So I raised the default to 100. It's still very much arbitrary, but the idea is mostly to avoid pathological cases (the classic long-running job).

If someone has actual data or a strong opinion, I'm totally open to changing the default. But ultimately a query result can be a handful of bytes or hundreds of megabytes, so no default will ever prevent all problems.

@byroot merged commit 5fd77fe into rails:main on May 7, 2023 (9 checks passed)
@casperisfine deleted the query-cache-lru branch on May 7, 2023 02:38
sampatbadhe added a commit to sampatbadhe/rails that referenced this pull request May 7, 2023
@@ -1,3 +1,25 @@
* Active Record's query cache now evicts least recently used entries

By default it only keeps the `50` most recently used queries.
A reviewer commented:

Should this be 100 instead?

@casperisfine (Contributor, Author) replied:

That was fixed in #48192.
