
Rails.cache.fetch unable to retrieve previous cached value if expired #45311

Closed
poloka opened this issue Jun 9, 2022 · 18 comments · May be fixed by #45313

Comments

@poloka

poloka commented Jun 9, 2022

Steps to reproduce

key = 'foo'
args = { expires_in: 10 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
entry = Rails.cache.send(:read_entry, key, args)
begin
  Rails.cache.fetch(key, args) { raise 'error' }
rescue StandardError
  # nice to have: reset the original cache value
  Rails.cache.write(key, entry.value, args) if entry&.expired? # write the original value back into the cache
  entry&.value
end

Problem statement

If an unexpected error occurs within the yield block (say, because a downstream service call is briefly unavailable), it would be nice to have the option to retrieve the existing cached value and re-inject it into the cache, so that the refresh is retried when the new cache period expires. This would allow servers to continue functioning with a stale value rather than crashing because a consumed service went down, making my server more resilient to problems in the services it depends on.

Actual behavior

If the yield block fails, the previous cached value is deleted and no longer accessible after the fact.

Enhancement request

My suggestion is to move read_entry to be publicly accessible allowing consumers to first hold the previous result prior to calling the fetch. If the fetch yields an error, the consumer can then catch their acceptable errors, determine their acceptable retries for exponential backoff, and then write back into the cache their wait period for attempting the 'refresh' of their cached content.

System configuration

Rails version:
5.2 or greater
Ruby version:
2.6 or greater

@ghiculescu
Member

Hmm, I'm not sure I follow your example. I assume the sleep(10) indicates the boundary between two requests. The first request is above the sleep and that's when the cache value is written. The second request is after the sleep and that's when the cache read fails. But if that's the case how will read_entry work across requests?

@poloka
Author

poloka commented Jun 9, 2022

Hmm, I'm not sure I follow your example. I assume the sleep(10) indicates the boundary between two requests. The first request is above the sleep and that's when the cache value is written. The second request is after the sleep and that's when the cache read fails. But if that's the case how will read_entry work across requests?

The intent would be to get the entry just prior to the 'begin' block and just use its value inside the 'rescue'. I just didn't add it a second time.

I just updated it to make more sense in the flow.

@ghiculescu
Member

But if you can get its entry just before the begin block doesn’t that mean it hasn’t expired? In which case fetch would not call its block, as it can also get the entry?

@poloka
Author

poloka commented Jun 9, 2022

Nope. read_entry retrieves the value stored in the cache regardless of whether it's expired; you can check expiration by calling .expired? on the returned object. The implementation of fetch deletes the entry after its use of read_entry and before yielding the block if it's expired, so you have no way to get the value after the fact. Internally it calls a method named handle_expired_entry, which deletes the entry from the cache when it's expired. So the only way to get the previous value is to retrieve it beforehand to account for the failure.

@byroot
Member

byroot commented Jun 13, 2022

Not sure what you are trying to do, but it looks a lot like the race_condition_ttl option: https://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch

@poloka
Author

poloka commented Jun 13, 2022

Not sure what you are trying to do, but it looks a lot like the race_condition_ttl options https://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch

As discussions are also occurring on the PR, this does not work. See my following reply:
#45313 (comment)

If an error occurs inside the yield block, the opportunity to retrieve the previous value is lost. The previous entry is deleted before the yield block executes, so I cannot retrieve it after the fact.

@byroot
Member

byroot commented Jun 13, 2022

Ok, I didn't see the PR. I skimmed it and still don't understand your use case.

But in any case, making Entry and the related methods public API is not desirable at all. I think it would help if you gave a more concrete example of what you are trying to do.

@poloka
Author

poloka commented Jun 13, 2022

@byroot I have a very concrete example provided in my steps to reproduce at the opening of this discussion. My use case is simple: when I try to retrieve a new value to cache from an external service (the yield block), that retrieval raises an error and I am unable to get a new value. When that error occurs, I want my consumers to be unaware that a downstream service call failed and to continue working with the previous cached value. So what I want to do is return that previous cached value and retry the downstream service at a later point, when it may or may not be back up. The way to accomplish this is to simply place the previous entry's value back into the cache for a designated period of time and allow the normal flow to retry the next time the cache expires.

@byroot
Member

byroot commented Jun 13, 2022

Ok, that makes more sense.

What I propose is to either modify fetch or add an option to it so that if the block fails, the value is put back. I think that's what race_condition_ttl should do; if it doesn't, we can change it.

But Entry shouldn't be exposed.

@poloka
Author

poloka commented Jun 13, 2022

Cool, thanks @byroot. I added race_condition_ttl to my example just to test the current behavior, and the results do not allow serving the previous value. Here is my example code:

key = 'foo'
args = { expires_in: 10, race_condition_ttl: 60 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
# entry = Rails.cache.send(:read_entry, key, args)
begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  puts "I landed in the exception case"
end

race_condition_ttl does as you suggest in that it retains the cached value internally and does not delete it immediately. I can verify that by performing Rails.cache.send(:read_entry, key, args) inside my rescue block. However, there is no way to retrieve that previous value from the cache once my yield block has errored out and I fall into the rescue block. I never see the "I made it here" output.

If Entry is not to be exposed, then there would have to be another way to automatically inject the existing value back into the cache for another cache period, or to provide the value via a callback, perhaps in the form of an exception the consumer can catch so they can handle the retry period themselves. Even in the latter case, the raised exception would have to include the previous value so the consumer could Rails.cache.write it back into the cache for whatever period they deem acceptable. Maybe it's two new cache options for fetch: auto_inject_on_error and allow_handle_block_retry, both booleans. The former would have to log that an error occurred in the yield block and that the cache entry was re-injected, because it would be hard for a consumer to understand why their value is not refreshing if the cache were swallowing exceptions behind the scenes. The latter would at least let consumers define the re-injection parameters and log whatever information they wanted.
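To make the auto_inject_on_error idea concrete, here is a runnable sketch of that behavior. A Hash-backed mini cache stands in for Rails.cache so the example runs on its own; StaleCache and fetch_with_stale_fallback are invented names, not Rails API.

```ruby
require 'time'

# Hypothetical sketch: a fetch that, when the block raises and a stale entry
# exists, writes the stale value back for another cache period and returns it.
class StaleCache
  Entry = Struct.new(:value, :expires_at) do
    def expired?
      Time.now >= expires_at
    end
  end

  def initialize
    @store = {}
  end

  def write(key, value, expires_in:)
    @store[key] = Entry.new(value, Time.now + expires_in)
  end

  def fetch_with_stale_fallback(key, expires_in:)
    entry = @store[key]
    return entry.value if entry && !entry.expired?

    begin
      value = yield
      write(key, value, expires_in: expires_in)
      value
    rescue StandardError
      raise unless entry                              # nothing stale to serve
      write(key, entry.value, expires_in: expires_in) # re-inject stale value
      entry.value
    end
  end
end
```

A caller would use it as `cache.fetch_with_stale_fallback('foo', expires_in: 60) { fetch_from_service }`: if the block raises while a stale entry exists, the stale value is served and the refresh is retried after the next expiry.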

@byroot
Member

byroot commented Jun 13, 2022

Ok, I think I see where the confusion is. race_condition_ttl is for other processes to read the stale value while you regenerate it. You wish to return the stale value from that same process when you fail to regenerate it as a fallback. Is that correct?

I think it's also somewhat related to #41344.

What about Rails.cache.read("key", stale: true) ?

begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  Rails.cache.read(key, stale: true)
end

It means two reads instead of one, but as a fallback mechanism it's probably ok.

@poloka
Author

poloka commented Jun 13, 2022

You are correct: I have a single process that wants to read and then regenerate, falling back to the stale value if regeneration fails. I've wondered how race_condition_ttl really works, because even though the value is expired, every process that attempts to fetch it could fall into the yield block and attempt the logic inside it. Say three processes all access the cache via fetch at the exact same time: all three will attempt the yield, and the last successful execution wins the cache update. I do see from the documentation that if one process attempts the fetch before the others, it extends the expires_in TTL for a short period, allowing the other threads to keep going until the original thread updates the cache. But this only postpones the immediate issue: if the yield block fails, then after that short extended TTL the cached value is lost and all processes will be attempting to retrieve the new entry to cache. I do question whether this race-condition handling should just be baked into the cache's implementation, because who wouldn't want this type of functionality in the first place? Or at least a default value of, say, 10 seconds, since I would hope most cache refreshes don't take that long.

Let me give you the real-world scenario I ran into. I had a cache set to four hours for retrieving some data. The external service I was calling went down for over 6 hours. After all 500+ of my Rails applications, each with multiple processes, expired their cache entries, they absolutely bombarded the dead service. Once that service tried to come back online, so many services were attempting to retrieve the new value to cache that it was effectively DDoSed and taken back down.

What I wanted to solve is that if a service goes down, my service can run indefinitely with cached results and attempt to refresh its data at each cache TTL interval. What I have implemented as a workaround for now is a dual-cache system with a short-term and a long-term cache: if I lose my short-term cached value, I pull from the long-term cache, stuff it back into the short-term cache, and reset the long-term TTL. That way, if that service goes down for 6 hours, all my Rails applications continue to work without interruption. I understand that if any of my services restarted and lost their cache, they would fail, and that is expected. Maybe at that point I should look into file_store to survive restarts; however, in a container world a new container would not share files, so I'm back to square one.

I am only looking to mitigate the situation for the population that knows of the existing value and keep them continuing as if nothing is wrong.
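The dual-cache workaround described above can be sketched as follows. Plain Hashes stand in for the real short-term and long-term stores so the example is self-contained; DualCache and its methods are hypothetical names, not Rails API.

```ruby
require 'time'

# Hypothetical sketch of a dual-TTL cache: a fresh short-term entry is served
# directly; if a refresh fails, the long-term copy is re-injected so callers
# never see the outage.
class DualCache
  Entry = Struct.new(:value, :expires_at) do
    def expired?
      Time.now >= expires_at
    end
  end

  def initialize(short_ttl:, long_ttl:)
    @short_ttl = short_ttl
    @long_ttl  = long_ttl
    @short = {}
    @long  = {}
  end

  def fetch(key)
    entry = @short[key]
    return entry.value if entry && !entry.expired?

    begin
      value = yield            # refresh from the downstream service
      store(key, value)
      value
    rescue StandardError
      fallback = @long[key]    # downstream is down: fall back to long-term copy
      raise unless fallback
      store(key, fallback.value) # stuff it back into short-term, reset long TTL
      fallback.value
    end
  end

  private

  def store(key, value)
    @short[key] = Entry.new(value, Time.now + @short_ttl)
    @long[key]  = Entry.new(value, Time.now + @long_ttl)
  end
end
```

The design point is that a failed refresh resets both TTLs, so the next retry happens a full short-term period later rather than on every call.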

@byroot
Member

byroot commented Jun 13, 2022

I do question if this race condition should just be baked into the cache's implementation

I don't think so. It's hard for a framework to assume that responses X seconds stale are acceptable for its users; that's extremely use-case dependent.

That said, unless I'm mistaken, all options can be passed to the constructor, so you can configure your cache with a default value for race_condition_ttl, which is pretty much what you suggest, but opt-in.
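For example (a sketch only; the store name, host, and TTL here are placeholders for your own setup), a store-wide default could be set in the application config:

```ruby
# config/environments/production.rb
# Options passed at construction become defaults for every fetch against this
# store; a fetch-level option still overrides them per call.
config.cache_store = :mem_cache_store, "cache.example.com",
  { race_condition_ttl: 10.seconds }
```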

@fatkodima
Member

Maybe it would be easier in this case not to use caching, but to retrieve the values from the 3rd party, store them in Redis indefinitely, use them where you need them, and just refresh them periodically.

@poloka
Author

poloka commented Jun 17, 2022

@fatkodima but then I would have to call another external service to retrieve that data and then cache it locally so I don't waste processing time on the external call. I've just moved the potential point of failure: the issue could still arise if my Redis service goes down. If you're suggesting running a Redis instance locally on each server, that seems like a lot of overhead (and expensive $$) for something that could simply be baked into the existing cache design.

@natematykiewicz
Contributor

I believe @fatkodima is saying just use Redis on your own, but don't set any expirations on the keys.

Redis.current.hmset('cachedthing', 'value', 'This is my cached value', 'at', Time.now.to_f)

result = Redis.current.hgetall('cachedthing')
#=> {"value"=>"This is my cached value", "at"=>"1655534182.348695"}

if !result['at'] || Time.at(result['at'].to_f) < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and hmset it
end

return result['value']

@natematykiewicz
Contributor

natematykiewicz commented Jun 18, 2022

Wait, what am I saying? You'd just need to use Rails Cache without an expiration:

value, cached_at = Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  ['This is my cached value', Time.now]
end

if !cached_at || cached_at < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and write it into the cache
end

return value

Alternatively, you could do:

Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  MyCacheJob.set(wait: 10.minutes).perform_later # enqueue a job to rebuild the cache in 10 minutes
  'This is my cached value'
end

@rails-bot

rails-bot bot commented Sep 16, 2022

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 7-0-stable branch or on main, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@rails-bot rails-bot bot added the stale label Sep 16, 2022
@rails-bot rails-bot bot closed this as completed Sep 23, 2022