Scope: Fix TOCTOU race in prefix initialization #43

Closed
djmetzle wants to merge 4 commits into master from scope-race-condition-fix

Conversation


@djmetzle djmetzle commented Apr 7, 2026

Backend::getAndSet() has a Time-of-Check-Time-of-Use (TOCTOU) race condition: it performs a non-atomic get() followed by set(), so concurrent callers that both observe a miss will each compute and write different values. The last writer wins, silently orphaning any data written by earlier callers under the first value.
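The orphaning can be simulated with a plain array standing in for the backend (illustrative only; the `scope`/`prefixA` key names are invented, not the real Backend API):

```php
<?php
// Simulation of the race on master: two callers both miss, both
// compute a prefix, and the last set() wins.
$cache = [];

// A and B both get() the scope key concurrently and both miss:
$missA = !array_key_exists('scope', $cache);
$missB = !array_key_exists('scope', $cache);

// Each computes its own prefix and set()s it; last writer wins:
$cache['scope'] = 'prefixA';          // A's set()
$cache['prefixA:user:1'] = 'dataA';   // A caches data under its prefix
$cache['scope'] = 'prefixB';          // B's set() overwrites A's prefix

// All later readers resolve the scope to 'prefixB', so the entry under
// 'prefixA:...' is orphaned: never read again, only evicted.
echo $cache['scope'], "\n"; // prefixB
```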

Add a regression test documenting the race and the new expected behavior under concurrent writes.

We can use add() instead of set() when setting backend values so the first writer wins. If add() fails, re-get() to pick up the winner's value. The $reset path (deleteScope) still uses set() for intentional overwrites.

Note that this addresses the race for both the Scope prefix and getAndSet.
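As a concrete illustration of the described flow (a minimal sketch, not the actual diff; `ArrayBackend` is a hypothetical in-memory stand-in for the real Backend):

```php
<?php
// ArrayBackend: get() returns false on a miss, and add() succeeds
// only when the key is absent, so the first writer wins.
class ArrayBackend {
   private $data = [];
   public function get($key) {
      return array_key_exists($key, $this->data) ? $this->data[$key] : false;
   }
   public function set($key, $value, $expiration = 0) {
      $this->data[$key] = $value;
      return true;
   }
   public function add($key, $value, $expiration = 0) {
      if (array_key_exists($key, $this->data)) {
         return false; // a first writer already won
      }
      $this->data[$key] = $value;
      return true;
   }
}

// getAndSet() per the PR description: add() so the first writer wins;
// on a lost race, re-get() to adopt the winner's value, falling back
// to the computed value if that get() misses.
function getAndSet(ArrayBackend $backend, $key, callable $callback,
                   $expiration = 0, $reset = false) {
   $value = $backend->get($key);
   if ($value !== false && !$reset) {
      return $value;
   }
   $value = $callback();
   if ($reset) {
      $backend->set($key, $value, $expiration); // intentional overwrite
   } else if (!$backend->add($key, $value, $expiration)) {
      $winner = $backend->get($key);
      if ($winner !== false) {
         $value = $winner;
      }
   }
   return $value;
}

// Simulated race: both callers missed and computed before either wrote.
$backend = new ArrayBackend();
$vA = 'V1';
$vB = 'V2';
$backend->add('k', $vA);        // A wins
if (!$backend->add('k', $vB)) { // B loses...
   $vB = $backend->get('k');    // ...and adopts A's value
}
// Both callers now hold 'V1'; nothing was silently orphaned.
```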

Ref:

CC @sctice-ifixit @danielbeardsley

djmetzle added 2 commits April 7, 2026 13:22
Use add() instead of set() when initializing the scope prefix so the
first writer wins. If add() fails, re-get() to pick up the winner's
value. The $reset path (deleteScope) still uses set() for intentional
overwrites.
A get() then set() allows a slow callback's result to be overwritten
by a concurrent caller. Use add() so the first writer wins. Try to fetch
the first writer's value, if possible, but fall back to the computed value
if the get() returns a miss.
@djmetzle djmetzle added the bug label Apr 7, 2026
If we cannot `add`, we try to fetch the first writer's value. If that
fails, make sure to always return a scope prefix. This is the same
failure mode that was fixed for `getAndSet`, which also always needs to
return a value.

Note: we can clean up the getAndSet test to be a bit more intent-revealing, using the same pattern found for the prefix test.
-      $this->set($key, $value, $expiration);
+      if ($reset) {
+         $this->set($key, $value, $expiration);
+      } else if (!$this->add($key, $value, $expiration)) {
Member

@danielbeardsley danielbeardsley Apr 7, 2026


This does as claimed, but I feel like this could cause problems with some usage patterns:

  • DCG (where we short-circuit all GETs and return MISS)
    • This change would fail to update the cache
  • McRouter: how does it handle add() when one instance has a value and the other doesn't?

Contributor Author


That seems like a usage error. DCG doesn't use the reset option then?

Contributor Author


I see, DCG is a backend.

Seems like we'd want to sub away set and add there, as we do in the new tests here.

Contributor Author


Wow, that's confusing. No, I'm incorrect. DCG is intended to repopulate the cache.

Contributor Author


Help me understand the failure mode you're describing? I'm reviewing DCG, and it seems like it would continue to work as expected.

Contributor Author


I don't see any problems with DCG or Mcrouter. This should help fix the race condition in both, and behavior is otherwise preserved.

Member


> Help me understand the failure mode you're describing? I'm reviewing DCG, and it seems like it would continue to work as expected.

I think the scenario is:

  • Prior to DCG request, getAndSet(K, () => V1) sets K ⇒ V1
  • On DCG request, we try to getAndSet(K, () => V2):
    • get(K) => MISS because of the DCG backend wrap on get
    • Because of MISS, we run $value = $callback() and get V2
    • Not $reset, so we try to add(K, V2, TTL)
    • But K is in the cache, so add fails
    • So we get(K) => V1 and return V1

We expected to write V2 to the cache and return it in the DCG request (simulating the cache actually starting empty), but instead we wrote nothing and got back V1.
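The scenario above can be reproduced with invented stand-ins (a plain in-memory `Store`, plus a `MissOnGet` wrapper modeling the DCG wrap that short-circuits every get() to a MISS; these classes are hypothetical, not the real DCG backend):

```php
<?php
// Store: minimal in-memory backend with get/set/add semantics.
class Store {
   private $data = [];
   public function get($key) {
      return array_key_exists($key, $this->data) ? $this->data[$key] : false;
   }
   public function set($key, $value) { $this->data[$key] = $value; return true; }
   public function add($key, $value) {
      if (array_key_exists($key, $this->data)) { return false; }
      $this->data[$key] = $value;
      return true;
   }
}

// MissOnGet: forces every get() to miss, as described for DCG requests.
class MissOnGet {
   private $inner;
   public function __construct(Store $inner) { $this->inner = $inner; }
   public function get($key) { return false; }  // forced MISS
   public function add($key, $value) { return $this->inner->add($key, $value); }
}

$store = new Store();
$store->set('K', 'V1');        // populated prior to the DCG request

$dcg = new MissOnGet($store);
$value = $dcg->get('K');       // MISS because of the wrap
if ($value === false) {
   $value = 'V2';              // $callback() recomputes
   if (!$dcg->add('K', $value)) {  // fails: K already holds V1
      $winner = $store->get('K');  // the re-get() in the trace sees V1
      if ($winner !== false) { $value = $winner; }
   }
}
// $value is 'V1' and the store still holds 'V1': V2 was never written,
// defeating the repopulation the DCG request was meant to perform.
```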

Contributor Author


Right! I see the concern. I missed that add() will see the current value.

Doesn't that mean we need DCG to explicitly reset?


Contributor Author


Found this too:

/**
 * Override the `set` method. Use `add` which is synchronous to detect
 * `set` over-top of existing keys. Delete and reset them to
 * enforce consistency.
 */
public function set($key, $value, $expiration = 0) {
   $addReturn = $this->memcached->add($key, $value, $expiration);
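The quoted override is truncated; the general shape of the "set via add" pattern it describes might look like this sketch (an invented in-memory `FakeMemcached` and a free function, not the real continuation of the method):

```php
<?php
// Sketch of "set via add": try add() first; if the key already exists,
// delete and re-add so the new value lands and the over-top set is
// detectable. FakeMemcached is an invented in-memory stand-in.
class FakeMemcached {
   private $data = [];
   public function get($key) {
      return array_key_exists($key, $this->data) ? $this->data[$key] : false;
   }
   public function add($key, $value, $expiration = 0) {
      if (array_key_exists($key, $this->data)) { return false; }
      $this->data[$key] = $value;
      return true;
   }
   public function delete($key) { unset($this->data[$key]); return true; }
}

function setViaAdd(FakeMemcached $mc, $key, $value, $expiration = 0) {
   if ($mc->add($key, $value, $expiration)) {
      return true; // key was absent: a plain add suffices
   }
   // Key already present: this is a set over-top of an existing key.
   // Delete and re-add to enforce consistency.
   $mc->delete($key);
   return $mc->add($key, $value, $expiration);
}

$mc = new FakeMemcached();
setViaAdd($mc, 'k', 'old');
setViaAdd($mc, 'k', 'new');
// $mc->get('k') is now 'new'
```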


djmetzle commented Apr 7, 2026

Some things Claude flagged reviewing this:

  1. Stats profile shifts. The Stats wrapper tracks set_count and add_count separately. Before this change, every getAndSet miss showed up as a set. Now it shows up as an add (and a get on race-loss). If anyone is monitoring set_count or add_count specifically, the numbers will change. Your Cache::getMemcacheStats() aggregates these so it probably doesn't matter, but worth noting.

  2. getAndSetMultiple still uses setMultiple(). Same TOCTOU problem, but there's no addMultiple() primitive in the Backend interface, so we can't fix it the same way. Not a regression — it was already racy — but it's now inconsistent with getAndSet.
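Absent an addMultiple() primitive, one conceivable (untested) workaround would be a per-key add() loop, sketched here against a hypothetical backend. It trades the multi-set's single round trip for one call per key, which is why it isn't obviously worth doing:

```php
<?php
// Hypothetical per-key fallback for the missing addMultiple(): loop
// add() per key and collect the keys that lost the race so the caller
// can re-get() them. MiniBackend is an invented in-memory stand-in.
class MiniBackend {
   private $data = [];
   public function add($key, $value) {
      if (array_key_exists($key, $this->data)) { return false; }
      $this->data[$key] = $value;
      return true;
   }
   public function get($key) {
      return array_key_exists($key, $this->data) ? $this->data[$key] : false;
   }
}

function addMultiple(MiniBackend $backend, array $values) {
   $lost = [];
   foreach ($values as $key => $value) {
      if (!$backend->add($key, $value)) {
         $lost[] = $key; // a concurrent writer got here first
      }
   }
   return $lost;
}

$backend = new MiniBackend();
$backend->add('a', 1);                               // simulate a prior writer
$lost = addMultiple($backend, ['a' => 99, 'b' => 2]);
// $lost contains 'a'; 'b' was written fresh.
```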

@danielbeardsley
Member

> The last writer wins, silently orphaning data written by earlier callers under the first value.

This seems to be the behavior this pull is trying to alter. I'm tempted to say we should narrow the focus here and just make this change for Scopes. I feel like that would reduce the chance of breaking current behavior and address the problem that the issue talks about.

Outside of scopes, this doesn't seem like a big deal to me: the later SET wins. But I see how it could play poorly with Scopes, where we are storing a random prefix, not a cached version of some external source of truth.


djmetzle commented Apr 7, 2026

Is the trouble though that Scope relies on getAndSet's behavior? We need to fix both to fully address the race:

public function getScopePrefix(bool $reset = false) {
   if ($this->scopePrefix === null || $reset) {
      $scopeValue = $this->backend->getAndSet($this->getScopeKey(),
         function() {
            return substr(md5(microtime() . $this->scopeName), 0, 16);


djmetzle commented Apr 7, 2026

Actually the getAndSet fix also addresses the scope problem?

With the race condition also addressed in `getAndSet`, we can now safely
rely on it for concurrent scope initialization. Revert back to the
original version.
@sterlinghirsh
Member

Responding to the original issue here:

> The scope prefix is also cached in $this->scopePrefix (an instance variable), so within a single request the stale prefix persists even after it's been overwritten in the backend, causing all subsequent reads to miss for the rest of that request.

In the case you're describing, two simultaneous requests both hit a cold cache. Workers A and B both generate scope keys. But any time you generate a scope key, all subsequent requests would be misses for the rest of that request anyway since that request is responsible for populating that scope.

> First writer wins. After add(), re-get() to learn the winning value.

In what situation do you want first writer to win? I'd think the later writer would have the more up to date information if it came down to that.

> silently orphaning data written by earlier callers under the first value.

So the bug here is wasted writes? What is the consequence of that? Are we getting untimely cache evictions?

> Is getAndSet intended to be atomic? The docstring doesn't say, but the name and usage pattern (lazy cache population) strongly implies it.

It's impossible for this to be atomic. The whole point is that you get first, and then if you need to revalidate then you compute the value, and then you set. Computing the value is the expensive part, and this does nothing to mitigate overlapping revalidations, but that wasn't the intent of getAndSet.

Let's say you start 2 processes computing a few keys of scoped data under your proposal.

A: GET scope key - MISS - start generating scope prefix A
B: GET scope key - MISS - start generating scope prefix B
A: ADD scope key - SUCCESS - GET value key A - MISS - start generating value A
B: ADD scope key - FAIL - GET scope key again - HIT - GET value key A - MISS - start generating value B
A: ADD value key A - SUCCESS - RETURN value A
B: ADD value key A - FAIL - GET value key A again - HIT - RETURN value A

vs on master:

A: GET scope key - MISS - start generating scope prefix A
B: GET scope key - MISS - start generating scope prefix B
A: SET scope key - GET value key A - MISS - start generating value A
B: SET scope key - GET value key B - MISS - start generating value B
A: SET value key A - RETURN value A
B: SET value key B - RETURN value B

This seems like it will increase our cache traffic, and I'm not sure what the advantage is of having the first writer win over the last writer if both are doing all the computation anyway. It seems like this is intended to solve a bug, but I'm curious what the actual behavior leading to this was.

In a perfect world, maybe there would be a way to do a getOrLock: either getting the value, or telling the cache you intend to place a value in that key. Then subsequent requests for that key are held (e.g. a blocking network request) until the value is set by the original process, so that the result can be instantly distributed to everyone waiting for it. You can do something like this in MySQL. I think there is a way to do this with Redis too, but I'm not sure. Doubt it for apcu / memcache.

One problem is what happens if the first process never comes back. With MySQL, it can end the transaction eventually and let the next client awaiting a lock have one. In memcache / apcu you probably just have to have a timeout, at which point the process starts revalidating anyway.
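The getOrLock idea can be approximated on an add()-capable backend by claiming a sentinel lock key, with losers polling until the winner publishes; a rough sketch with invented names (real blocking hand-off isn't available in memcache/apcu, so this polls with a timeout as described above):

```php
<?php
// Rough sketch: the lock is a sentinel key claimed with add(); losers
// poll until the winner publishes the value or the poll budget runs
// out, at which point they recompute anyway. LockDemoBackend and
// getOrCompute are invented for illustration.
class LockDemoBackend {
   private $data = [];
   public function get($key) {
      return array_key_exists($key, $this->data) ? $this->data[$key] : false;
   }
   public function set($key, $value) { $this->data[$key] = $value; return true; }
   public function add($key, $value) {
      if (array_key_exists($key, $this->data)) { return false; }
      $this->data[$key] = $value;
      return true;
   }
   public function delete($key) { unset($this->data[$key]); return true; }
}

function getOrCompute(LockDemoBackend $backend, $key, callable $callback,
                      $maxPolls = 50) {
   $value = $backend->get($key);
   if ($value !== false) { return $value; }

   if ($backend->add("lock:$key", 1)) {
      // We hold the lock: compute, publish, release.
      $value = $callback();
      $backend->set($key, $value);
      $backend->delete("lock:$key");
      return $value;
   }
   // Someone else is computing: poll for the published value.
   for ($i = 0; $i < $maxPolls; $i++) {
      usleep(1000);
      $value = $backend->get($key);
      if ($value !== false) { return $value; }
   }
   return $callback(); // lock holder never came back: recompute anyway
}
```

Only the first caller runs the callback; later callers either read the published value or, after the timeout, fall back to recomputing, which mirrors the "first process never comes back" failure mode.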


djmetzle commented Apr 9, 2026

These seem like questions for the issue/spec @sterlinghirsh. Did you see the other PR? This version should probably be closed.
