Conversation

taylorotwell
Member

@taylorotwell taylorotwell commented Oct 17, 2025

Similar to queue failover, this implements cache failover. If a cache operation fails, the next cache store in the failover list will be used. The CacheFailedOver event is dispatched on failover.

In the config/cache.php configuration file:

'failover' => [
    'driver' => 'failover',

    // Stores are attempted in the order listed; on failure, the next store is used.
    'stores' => [
        'redis',
        'database',
        'array',
    ],
],

Then, use the cache normally:

return Cache::get('name');
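
If you want visibility into failovers, you can listen for the event. A minimal sketch, assuming the event class lives under Illuminate\Cache\Events and (per the review discussion below) exposes the exception that triggered the failover:

use Illuminate\Cache\Events\CacheFailedOver;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

Event::listen(function (CacheFailedOver $event) {
    // The exception property is an assumption; check the event class for the actual payload.
    Log::warning('Cache store failed over.', [
        'exception' => $event->exception ?? null,
    ]);
});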

@deleugpn
Contributor

deleugpn commented Oct 18, 2025

This one seems like an incomplete feature. Here is one example:

1: attempt to write key “foo” to Redis.
2: Redis fails
3: attempt to write key “foo” to Database.
4: success.

Later on

1: attempt to read key “foo” from Redis
2: Redis does NOT fail, but key doesn’t exist so it just returns null
3: The fact the database has the cached value is moot

The conclusion here is that failing over during a write puts a burden on future reads: we would either need to brute-force read every store in the chain or keep some sort of record of where a key was successfully written. Considering the unlikelihood of a cache driver failing, it seems more burden than benefit.

The only solid solution I can think of for this problem-space is a more extensive package that provides not only this driver but also a “side-worker” responsible for replicating every cache write from store 1 into the subsequent stores in the list. That way, when there is a failure on the first store, the second one is likely able to serve the same content. This does eliminate array as a valid failover store.
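
As a rough illustration of that idea (hypothetical, not part of this PR), a helper that fans each write out to every store in the chain so a later read can be served by any of them:

use Illuminate\Support\Facades\Cache;

// Hypothetical sketch: replicate each write to every store in the chain.
function replicatedPut(string $key, mixed $value, int $seconds): void
{
    foreach (['redis', 'database'] as $store) {
        try {
            Cache::store($store)->put($key, $value, $seconds);
        } catch (\Throwable $e) {
            // A real side-worker would queue the failed write for retry.
            report($e);
        }
    }
}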

@taylorotwell
Member Author

taylorotwell commented Oct 18, 2025

@deleugpn I don't actually see that as a problem.

On any cache read there is typically the built-in expectation that the cached value may not be available. It could have expired. The cache could have been flushed for some reason. To me, cached data is usually inherently ephemeral and "nice to have" but not required for the application to function.

The point of this PR is to keep your application online during brief cache interruptions.

@NickSdot
Contributor

> The point of this PR is to keep your application online during brief cache interruptions.

What would the downtime be caused by? I assume you would catch connection exceptions and serve fresh data if the service isn't up. So I guess you mean bringing the app down via very heavy queries during a cache service outage?

But that also means that on a main-store failure you would still have heavy load until the secondary cache catches up. How long are such downtimes usually? Short. So would you really avoid a downtime with the proposed solution?

I feel like @deleugpn has a point here. If you have that kind of load, you probably want a synced failover cache, no?

@ziming
Contributor

ziming commented Oct 19, 2025

maybe we can have a basic and an advanced version. the basic version (this one) is the default, and in the future an advanced option, defaulting to false, could enable the behaviour deleu suggested

this would allow the current version to be merged first

@NickSdot
Contributor

> maybe we can have a basic and an advanced version. the basic version (this one) is the default, and in the future an advanced option, defaulting to false, could enable the synced cache behaviour
>
> this would allow the current version to be merged first

Mind sharing an example of how this would benefit your project?

Member

@timacdonald timacdonald left a comment


Would be nice to pass the exception to the event so you can log why the failover occurred.

@Propaganistas
Contributor

Propaganistas commented Oct 21, 2025

> To me cached data is usually inherently ephemeral and "nice to have" but not required to have for the application to function.

While I certainly agree with this principle, I'm wondering whether various framework features that rely on the cache aren't actually violating it...? At least some of them are quite application-critical in my opinion (e.g. atomic locks).

It'll be hard to guarantee they keep functioning correctly as soon as a failover kicks in.
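
To make the concern concrete, a hypothetical sketch: a lock acquired on the primary store is invisible to the failover store, so mutual exclusion silently breaks.

use Illuminate\Support\Facades\Cache;

$lock = Cache::lock('deploy', 10); // acquired on the primary store (e.g. redis)

if ($lock->get()) {
    // If the primary store blips here, a second process failing over to the
    // database store can acquire its own 'deploy' lock, and both run at once.

    $lock->release();
}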

@deleugpn
Contributor

deleugpn commented Oct 21, 2025

While I certainly understand and agree that cached data is a nice-to-have and a miss is fine, my point is not about whether I expect the cache to respond with data or come up empty. I view this problem more in the context of minor blips where one request fails and ends up writing the cache to the 2nd driver, which then never gets read, while subsequent requests will not find the value that was written there. Blips like this happen constantly at large scale.

If the goal here is to talk exclusively about a large outage that lasts for an extended period, I can see how this PR may cause a cache stampede that degrades the system while it slowly recovers, but without ever hitting a full failure, which is a great thing.

Overall I see the point: minor blips become a minor annoyance in exchange for stronger resilience for the system overall. Yesterday's outage certainly put things into perspective and makes a strong argument in favor of this PR.

@NickSdot
Contributor

> brief cache interruptions

vs.

> talk exclusively about large outage

taylorotwell and others added 2 commits October 21, 2025 10:39
Co-authored-by: Tim MacDonald <hello@timacdonald.me>
Co-authored-by: Tim MacDonald <hello@timacdonald.me>
@taylorotwell taylorotwell merged commit d439ff5 into 12.x Oct 21, 2025
67 checks passed
@taylorotwell taylorotwell deleted the failover-cache branch October 21, 2025 14:40