Recyclable cache keys #29092

Merged
merged 42 commits into from
May 18, 2017

dhh
Member

@dhh dhh commented May 15, 2017

Key-based cache expiration is an incredibly powerful, simple way to do away with the error-ridden ways of manual cache expiration, but it can also be highly wasteful and generate lots of cache trash.

This happens when you have keys that churn at high velocity, leaving the abandoned keys to be garbage collected with no hint that they will never be used again. If your cache write volume is high, and you turn over your entire cache allowance frequently enough, this cache trash crowds out less-frequently-accessed-but-still-valid keys, which in turn leads to high cache miss rates.

We can solve this problem by making the keys stable and separating out the explicit version. So you can keep a stable key, like "products/1", and an associated version, like "20170202145500", instead of the combined "products/1-20170202145500" key we've been using so far. This means that no matter how frequently Product/1 is touched, it'll still only write to the same cache key. That's the recycling part here.

This approach is similar to how HTTP caching works. There's a cache key in the form of the URL and then there's a version component in the form of the ETag.

This will form the foundation of recyclable cache keys.
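
As a rough illustration of the churn described above, the sketch below uses plain Hashes as stand-in cache stores (an assumption for illustration, not the Rails implementation) and touches the same product three times. The store names and timestamps are made up.

```ruby
# Plain Hashes stand in for cache stores (assumption: illustration only).
combined_store   = {}
recyclable_store = {}

versions = %w[20170202145500 20170202145501 20170202145502]

versions.each do |version|
  # Old scheme: the version is baked into the key, so every touch adds a key.
  combined_store["products/1-#{version}"] = "rendered at #{version}"

  # Recyclable scheme: stable key, version stored alongside the value.
  recyclable_store["products/1"] = { version: version, value: "rendered at #{version}" }
end

puts combined_store.size   # one key per touch, all but the last are dead weight
puts recyclable_store.size # a single key, overwritten in place
```

The combined scheme leaves the abandoned keys for the store to evict on its own, while the recyclable scheme keeps rewriting one slot.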
@dhh dhh added this to the 5.2.0 milestone May 15, 2017
@bogdan
Contributor

bogdan commented May 15, 2017

I don't like the resulting API I would be expected to use here:

Rails.cache.fetch(["post_preview", post], version: post.updated_at)
# or slightly shorter version:
Rails.cache.fetch(["post_preview", post], version: post)

The need to explicitly pass version is sad.

Rails should still maintain a nicer API:

Rails.cache.fetch(["post_preview", post])
# with multiple objects too:
Rails.cache.fetch(["post_preview", post, post.author]) 

This assumes that any object passed as a cache key has a cache_version method besides cache_key:

class Post
  def cache_key
    "post/#{id}"
  end

  def cache_version
    updated_at
  end
end

@dhh
Member Author

dhh commented May 15, 2017 via email

@bogdan
Contributor

bogdan commented May 15, 2017

This API is documented and well maintained. There is no reason for anyone to avoid using it directly.
What is the intended way to cache heavy calculations in a model if not through this API?

I use this API directly a lot because we have a project with heavy calculations that happen outside of the HTTP layer. Also, this functionality has nothing to do with the HTTP stack, so there is no benefit to it being available only at the Action View level.

@dhh
Member Author

dhh commented May 15, 2017 via email

@dhh
Member Author

dhh commented May 15, 2017 via email

@bogdan
Contributor

bogdan commented May 15, 2017

Agreed, this is not 100% backward compatible. But if this is how you would design the system when starting from scratch, that is a good sign it's worth considering as the direction to go. We can search for solutions if backward compatibility is the only concern.

@dhh
Member Author

dhh commented May 15, 2017 via email

@kaspth
Contributor

kaspth commented May 15, 2017

> you turn over your entire cache allowance frequently enough

Just ask ol' daddy-o for some bigger smackeroos then, sonny! 😄

@kaspth
Contributor

kaspth commented May 15, 2017

> Unfinished: Dealing with multi_get/fetch

Are you tackling this as part of this PR or intending that for a later one?

Contributor

@kaspth kaspth left a comment

Mostly just found the docs a bit hard to read, 👍

Add a changelog entry too 😉

@@ -232,6 +232,11 @@ def mute
# new value. After that all the processes will start getting the new value.
# The key is to keep <tt>:race_condition_ttl</tt> small.
#
# Setting <tt>:version</tt> will verify that the cache stored in the <tt>name</tt>
Contributor

This took me some tries to understand, so I took a stab at slimming it:

# Passing a <tt>:version</tt> verifies the cache stored under <tt>name</tt>
# is of the same version. nil is returned on mismatches despite contents.
# This feature is used to support recyclable cache keys.

Member Author

Like that.
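
A minimal sketch of the version check those docs describe, assuming a hypothetical `TinyStore` and `Entry` rather than ActiveSupport's real classes. Only `mismatched?` and the `:version` option come from the diff under review; everything else is invented for illustration.

```ruby
# Hypothetical stand-ins for a cache entry and store; not ActiveSupport's classes.
Entry = Struct.new(:value, :version) do
  # An entry mismatches only when both sides carry a version and they differ.
  def mismatched?(requested)
    !version.nil? && !requested.nil? && version != requested
  end
end

class TinyStore
  def initialize
    @data = {}
  end

  def write(name, value, version: nil)
    @data[name] = Entry.new(value, version)
  end

  # Returns nil on a missing entry or a version mismatch, whatever the contents.
  def read(name, version: nil)
    entry = @data[name]
    return nil if entry.nil? || entry.mismatched?(version)

    entry.value
  end
end

store = TinyStore.new
store.write("products/1", "<li>Widget</li>", version: "20170202145500")

fresh = store.read("products/1", version: "20170202145500") # versions agree
stale = store.read("products/1", version: "20170303000000") # versions differ
```

Because the stale read comes back as nil, the caller regenerates the value and writes it under the same name, which is what lets the key be recycled.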

@@ -307,17 +313,30 @@ def fetch(name, options = nil)
# the cache with the given key, then that data is returned. Otherwise,
# +nil+ is returned.
#
# As with fetch, the data is only returned if it has not expired per the
Contributor

Why mention fetch? It is also confusing when put together with "Fetches data from the cache", which is how the read doc starts.

@@ -307,17 +313,30 @@ def fetch(name, options = nil)
# the cache with the given key, then that data is returned. Otherwise,
# +nil+ is returned.
#
# As with fetch, the data is only returned if it has not expired per the
# <tt>:expires_in</tt> option, and, if a <tt>:version</tt> parameter is passed
# to <tt>read</tt>, if it matches the <tt>:version</tt> it was written with.
Contributor

Also took me some tries. How about:

# Any found entry is only returned if it has not expired per the
# <tt>:expires_in</tt> option and when passed a
# <tt>:version</tt> parameter if the entry's version matches that.

Member Author

Make it so.

elsif entry.mismatched?(options[:version])
  if payload
    payload[:hit] = false
    payload[:mismatch] = "#{entry.version} != #{options[:version]}"
Contributor

I assume we'll be logging this in the Action View log subscriber, right?

Member Author

Yes, I have a subscriber there to output this. It's a bit annoying that this is disconnected from the general instrumentation of read/write, because it means it has to be logged as a separate line. Would be much nicer if the read log line could say whether there was a version hit or not. Thoughts on how?

Contributor

> Yes, I have a subscriber there to output this. It's a bit annoying that this is disconnected from the general instrumentation of read/write, because it means it has to be logged as a separate line. Would be much nicer if the read log line could say whether there was a version hit or not. Thoughts on how?

@dhh why does it have to be logged as a separate line?

I guess we could make this type of logging have tags as well, except they're trailing, unlike the standard tagged logging.

Something like:

payload.tags[:hit]      = false
payload.tags[:mismatch] = "#{entry.version} != #{options[:version]}"
# Later: …cache/key… [hit: false][mismatch: 123 != 456]

Would require payload being more than a Hash though.
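
One way the trailing-tags idea could look, as a sketch only: `TaggedPayload` and `log_line` are hypothetical names, not part of ActiveSupport's instrumentation API.

```ruby
# Hypothetical payload wrapper: reads and writes like a Hash, but also
# carries tags that are rendered after the log message.
class TaggedPayload
  attr_reader :tags

  def initialize(data = {})
    @data = data
    @tags = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  # Tags trail the message, unlike standard tagged logging, which prefixes them.
  def log_line(message)
    suffix = tags.map { |name, value| "[#{name}: #{value}]" }.join
    suffix.empty? ? message : "#{message} #{suffix}"
  end
end

payload = TaggedPayload.new(key: "products/1")
payload.tags[:hit]      = false
payload.tags[:mismatch] = "123 != 456"

line = payload.log_line("Read fragment #{payload[:key]}")
puts line
```

Keeping the tags on the payload means the read log line itself can report the version hit or miss, rather than needing a separate line.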

Member Author

Yeah, right now we're not using this information. Don't think we need it at the moment given the fact that all fragment keys are now using versions. I'll nix it.

Could you have a look at the cache hit/miss issue?

Contributor

Yeah, I'd rather hit that issue out of the park than miss it 🤓

@bogdan
Contributor

bogdan commented May 16, 2017

@dhh do you plan to change the ActiveRecord::Base#cache_key method? I think you cannot move forward without doing so, and it will be backward incompatible anyway (or at least generate the same type of problem as my suggestion). Anyway, I implemented the cache_version method the way I imagine it: #29107

@dhh
Member Author

dhh commented May 16, 2017

I don't intend to change that method. What the CacheHelper#vcache method I will propose shortly does is check arity on that method and call it with #cache_key(:without_version). That relies on the fact that the first parameter to AR::Base#cache_key specifies the timestamp field to be used and if there's no match, it won't include a timestamp.
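
The arity check could look roughly like this. `Person` is a toy stand-in that only mimics the timestamp-argument behaviour described above, and `stable_cache_key` is a hypothetical helper name, not the eventual implementation.

```ruby
# Toy model mimicking a cache_key that takes optional timestamp column names.
# (Assumption: simplified; the real Active Record method does more.)
class Person
  attr_reader :id, :updated_at

  def initialize(id, updated_at)
    @id = id
    @updated_at = updated_at
  end

  def cache_key(*timestamp_names)
    timestamp =
      if timestamp_names.any?
        # Only names the record actually responds to contribute a timestamp.
        timestamp_names.select { |name| respond_to?(name) }
                       .map { |name| public_send(name) }
                       .compact.max
      else
        updated_at
      end

    timestamp ? "people/#{id}-#{timestamp}" : "people/#{id}"
  end
end

# Hypothetical helper: when cache_key accepts arguments, pass a name that
# matches no column so the timestamp is left out of the key.
def stable_cache_key(record)
  if record.method(:cache_key).arity.zero?
    record.cache_key
  else
    record.cache_key(:without_version)
  end
end

person = Person.new(1, "20170202145500")
puts person.cache_key
puts stable_cache_key(person)
```

Objects whose `cache_key` takes no arguments are called as before, so custom keys keep working unchanged.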

@dhh
Member Author

dhh commented May 16, 2017

@kaspth Any thoughts on how we could make this work with multiget? I'm thinking that this low-level API could just be Rails.cache.fetch("a", "b", versions: [ 1, 2 ]). Then we'd need to do the right thing for cache: true in the render collection call.
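
A sketch of what a versioned multi-read might do under that proposal. `read_multi_versioned` and the Hash-backed store are assumptions for illustration; the `versions:` option was only being floated at this point.

```ruby
# Hash-backed stand-in for a cache store (illustration only).
store = {
  "a" => { value: "A body", version: 1 },
  "b" => { value: "B body", version: 1 } # stale: the caller asks for version 2
}

# Hypothetical helper: returns only the entries whose stored version matches
# the version requested for the name at the same position.
def read_multi_versioned(store, names, versions:)
  names.zip(versions).each_with_object({}) do |(name, version), hits|
    entry = store[name]
    hits[name] = entry[:value] if entry && entry[:version] == version
  end
end

names  = %w[a b]
hits   = read_multi_versioned(store, names, versions: [1, 2])
misses = names - hits.keys # these would be re-rendered and written back
```

Pairing names and versions by position keeps the call shape close to the one suggested above, at the cost of the two lists having to stay in the same order.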

@bogdan
Contributor

bogdan commented May 16, 2017

That sounds pretty sad, because cache_key is only used by the ActiveSupport::Cache implementation (correct me if I am wrong). It means that the cache_key argument and the timestamp will be artifacts we would need to support.

In the meantime, the behavior of CacheHelper#cache will change anyway. It may affect apps in the same way the change in CacheStore#fetch would.

I would aim for a more idealistic solution, at least long term, and change the cache_key method to remove the timestamp.

Supporting apps that rely on cache_key including the timestamp would not be necessary, as cache_key is designed to be defined per app, per model.

Here is the way one can maintain the old behavior basically forever:

module DeprecatedCacheKey
  def cache_key
    # old cache key implementation
  end

  def cache_version
    nil
  end
end
# To maintain the old behavior temporarily:
ActiveRecord::Base.send(:include, DeprecatedCacheKey)
# To migrate models relying on the old behavior one by one:
Post.send(:include, DeprecatedCacheKey)

@dhh
Member Author

dhh commented May 16, 2017

I think it's quite reasonable to leave CacheHelper#cache in place and use a second helper, like CacheHelper#vcache that uses the new version-based strategy. This would mean that people can adopt as they please.

cache_key is used in a variety of ways, not just by ActiveSupport::Cache. I don't think blowing backwards compatibility is necessary or would buy us very much.

@dhh
Member Author

dhh commented May 16, 2017

Although, we could also consider that if cache_version is present on the model, then cache_key does not include the version. That would be backwards compatible.
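
Sketched with toy classes (hypothetical `Record` and `VersionedRecord`, not Active Record), that toggle might behave like this: the key drops the timestamp only when the object also exposes `cache_version`.

```ruby
class Record
  attr_reader :id, :updated_at

  def initialize(id, updated_at)
    @id = id
    @updated_at = updated_at
  end

  # Without cache_version, keep the old combined key for backwards compatibility.
  def cache_key
    respond_to?(:cache_version) ? "records/#{id}" : "records/#{id}-#{updated_at}"
  end
end

# Opting in is just a matter of defining cache_version.
class VersionedRecord < Record
  def cache_version
    updated_at
  end
end

legacy    = Record.new(1, "20170202145500")
versioned = VersionedRecord.new(1, "20170202145500")

puts legacy.cache_key
puts versioned.cache_key
puts versioned.cache_version
```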

@bogdan
Contributor

bogdan commented May 16, 2017

> I think it's quite reasonable to leave CacheHelper#cache in place and use a second helper, like CacheHelper#vcache that uses the new version-based strategy. This would mean that people can adopt as they please.

When I see vcache beside cache, I imagine how I would pass that knowledge on to a team of 20 people, not all of whom are senior developers. Ideally I want vcache to be the only thing used. I can't imagine vcache and cache working together in apps generated on 5.2+. cache versioning: true looks like a better idea anyway.

> Although, we could also consider that if cache_version is present on the model, then cache_key does not include the version. That would be backwards compatible.

That is a good step forward. We should definitely do that if we go for cache_version method support.
Having to define a canonical cache_version method in each app would be a problem, though.
It can be simplified to the following:

class ApplicationRecord < AR::Base
  def cache_version
    ActiveRecord::Base.cache_version(self)
  end
end
# or maybe even
ApplicationRecord.cache_version = true

@dhh
Member Author

dhh commented May 16, 2017 via email

@@ -547,6 +579,10 @@ def normalize_key(key, options)
key
end

def normalize_version(key, options)
options[:version] || key.try(:cache_version)
Contributor

You need to support more advanced array keys like:

fetch(["preview", post])
fetch(["preview", post, post.author])

There is an implementation in expand_cache_version here: https://github.com/rails/rails/pull/29107/files#diff-438394335b9c1ce6ec4f67a407f50a42R89
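
For reference, a standalone sketch of how versions could be collected from array keys. It is modelled loosely on the idea of an `expand_cache_version` helper but is not that code: `expand_version`, the `Post` and `Author` structs, and the `/` separator are assumptions.

```ruby
Post = Struct.new(:id, :updated_at) do
  def cache_key
    "posts/#{id}"
  end

  def cache_version
    updated_at
  end
end

Author = Struct.new(:id, :updated_at) do
  def cache_key
    "authors/#{id}"
  end

  def cache_version
    updated_at
  end
end

# Collect versions from a single key or a (possibly nested) array of keys.
# Always returns a String or nil, so what reaches the store is consistent.
def expand_version(key)
  if key.respond_to?(:cache_version)
    version = key.cache_version
    version.nil? ? nil : version.to_s
  elsif key.is_a?(Array)
    parts = key.map { |element| expand_version(element) }.compact
    parts.empty? ? nil : parts.join("/")
  end
end

post   = Post.new(1, "20170202145500")
author = Author.new(7, "20170101000000")

single   = expand_version(["preview", post])
combined = expand_version(["preview", post, author])
```

Elements without a version, such as the `"preview"` string, simply contribute nothing, so a key made only of plain strings has no version at all.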

@dhh
Copy link
Member Author

dhh commented May 16, 2017

I think the #cache_version as the toggle has legs. Just extended it to Active Record in a backwards compatible form 👍.


##
# :singleton-method:
# Indicates whether to use a stable #cache_key method that is accompaigned
Contributor

s/accompaigned/accompanied/

@@ -85,6 +85,14 @@ def expand_cache_key(key, namespace = nil)
expanded_cache_key
end

def expand_cache_version(key)
case
when key.respond_to?(:cache_version) then key.cache_version
Contributor

We need to have a consistent behavior for the cache version. cache_key can only be a string at the end (meaning that String is always put into cache store). Maybe we need the same for version. In this case we always need to call to_param here.

def cache_key(*timestamp_names)
if new_record?
"#{model_name.cache_key}/new"
else
timestamp = if timestamp_names.any?
max_updated_column_timestamp(timestamp_names)
if cache_version
Contributor

When cache_key is called with an argument, the cache_version setting should be ignored.

I am not sure why cache_key takes an argument: I have checked the code and there are no calls to cache_key with arguments, either from AS::Cache or from anywhere else. I would deprecate the argument as part of this PR.

Member Author

The argument is for calling like <% cache [ person.cache_key(:bio_updated_at) ] %>, but yes, not a common usage.

Contributor

Got it. Given this use case: when cache_key is called with an argument, the cache_version setting should be ignored.

@@ -85,6 +85,14 @@ def expand_cache_key(key, namespace = nil)
expanded_cache_key
end

def expand_cache_version(key)
Contributor

This method is currently public. I am not sure why anyone would call it explicitly. I would make it private. Otherwise it needs to be documented.

@dhh
Member Author

dhh commented May 17, 2017

Current problem is that to use versioning via views, it goes CacheHelper#cache -> Fragments#write_fragment -> ActiveSupport::Cache.expand_cache_key -> ActiveSupport::Cache::Store#write. Well, in the expand_cache_key step we currently convert the key to a string, which means that ActiveSupport::Cache::Store#write can't introspect it for #cache_version. So need to retain the array-based key all the way from the top to the store, which I'm having some trouble trying to do in a backwards compatible manner.
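
The loss of information can be shown in a few lines. This is a simplified sketch: `expand_key` is a hypothetical stand-in for `ActiveSupport::Cache.expand_cache_key`, and `Post` is a toy struct.

```ruby
Post = Struct.new(:id, :updated_at) do
  def cache_key
    "posts/#{id}"
  end

  def cache_version
    updated_at
  end
end

# Simplified key expansion: flattens any key into a single string.
def expand_key(key)
  if key.is_a?(Array)
    key.map { |element| expand_key(element) }.join("/")
  elsif key.respond_to?(:cache_key)
    key.cache_key
  else
    key.to_s
  end
end

key      = ["views", Post.new(1, "20170202145500")]
expanded = expand_key(key)

# The array still exposes the record, so its version can be read here...
version_before = key.last.cache_version
# ...but the expanded string has nothing left to introspect.
introspectable_after = expanded.respond_to?(:cache_version)
```

Either the version has to be extracted before the key is flattened, or the array has to travel intact down to the store, which is the route described above.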

tubbo pushed a commit to redis-store/redis-activesupport that referenced this pull request Mar 13, 2018
ActiveSupport 5.2.0 introduces the concept of
[recyclable cache keys](rails/rails#29092),
which prevents a bunch of unnecessary keys being created and then having
to be evicted. This should reduce the memory overhead of keeping a Redis
or Memcache server up in order to support a cache. This caused issues
for us because the `options` are only being written into the cache key,
not the actual entry, which is now required by Rails' cache store
semantics. This should fix issues that people are having with rails 5.2.0.rc1
and redis-activesupport.
@gingerlime
Contributor

I noticed some unexpected behaviour related to this change, when used with Active Record relations (e.g. cache Product.all).

The cache_key implementation works great, but when normalize_version is executed, the relation does not respond to cache_version, and therefore ends up being expanded with to_a. So essentially, calling Product.all.to_a and then for each object calling cache_version, which returns nil. This obviously has significant performance repercussions, which defeats the benefits of caching in lots of cases.

I can work around it by using something like cache ActiveSupport::Cache.expand_cache_key(Product.all), but this feels a bit dirty. See SO question

This is, by the way, without versioning switched on.

It's probably an anti-pattern to use AR relations as cache keys in views(?), but this still caught me by surprise. It works and produces the right cache key, but with a pretty hefty and unexpected side-effect.

Is there a plan to implement cache_version on Active Record relations? Or what's the recommended way to deal with this? (sorry if I'm posting in the wrong place, but it's directly connected to this change as far as I can tell).

gingerlime pushed a commit to gingerlime/rails that referenced this pull request Nov 21, 2018
* After introducing cache versioning, even with cache versioning off
  there's a performance regression when passing an Active Record
  relation to cache
* This happens in ActiveSupport::Cache inside `normalize_version`
* This method would check if the relation responds to cache_version
  and if not, would recursively normalize it with `to_a`
* This would lead to the relation being retrieved from database and
  enumerated, causing the performance regression
* This fix simply adds `cache_version` returning `nil` to Active Record
  relations
* This is a temporary stopgap, until relation cache versioning is
  implemented. See rails#34378
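
The regression and the stopgap can be reproduced with a toy relation. `FakeRelation`, `PatchedRelation` and this simplified `normalize_version` are illustrative assumptions, not the Rails code; the `loads` counter stands in for a database query.

```ruby
# Hypothetical relation that counts how often its records are loaded.
class FakeRelation
  attr_reader :loads

  def initialize(records)
    @records = records
    @loads = 0
  end

  def to_a
    @loads += 1
    @records
  end
end

class PatchedRelation < FakeRelation
  # Stopgap from the commit message: answer cache_version directly with nil.
  def cache_version
    nil
  end
end

# Simplified version lookup that falls back to enumerating the key.
def normalize_version(key)
  if key.respond_to?(:cache_version)
    key.cache_version
  elsif key.is_a?(Array)
    versions = key.map { |element| normalize_version(element) }.compact
    versions.empty? ? nil : versions.join("/")
  elsif key.respond_to?(:to_a)
    normalize_version(key.to_a)
  end
end

unpatched = FakeRelation.new(%w[widget gadget])
patched   = PatchedRelation.new(%w[widget gadget])

normalize_version(unpatched) # falls through to to_a and loads the records
normalize_version(patched)   # answers immediately, nothing is loaded
```

Both calls return nil, but only the unpatched relation pays for a load to find that out.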
@ghost ghost deleted a comment Jul 16, 2019
rsanheim added a commit to simpledotorg/simple-server that referenced this pull request Jul 10, 2020
This keeps the version separate from Rails standard cache_key, which
allows for better recycling of cached entries as things get updated
frequently.  This shouldn't impact our app right now, as we are only
using our own manual cache keys for analytics.

For more details on how recyclable cache keys work, see:

* rails/rails#29092
* https://dzone.com/articles/cache-invalidation-complexity-rails-52-and-dalli-c
rsanheim added a commit to simpledotorg/simple-server that referenced this pull request Jul 20, 2020
This keeps the version separate from Rails standard cache_key, which
allows for better recycling of cached entries as things get updated
frequently.  This shouldn't impact our app right now, as we are only
using our own manual cache keys for analytics.

For more details on how recyclable cache keys work, see:

* rails/rails#29092
* https://dzone.com/articles/cache-invalidation-complexity-rails-52-and-dalli-c