Update `WorkerLock` tests to better stress the `WORKER_LOCK_MAX_RETRY_INTERVAL` #19772
Conversation
```diff
 # How long before an acquired lock times out.
-_LOCK_TIMEOUT_MS = 2 * 60 * 1000
+_LOCK_TIMEOUT = Duration(minutes=2)
```
Just a refactor to use Duration for _LOCK_TIMEOUT (no behavioral change)
```
This matters most when locks go stale as normally, when the lock holder releases, we
signal to other locks (with the same name/key) that they should try reacquiring the lock
immediately. But stale locks are never released and instead forcefully reaped behind the
scenes.
```
The original reasoning here came from #19755.
It was based on my flawed understanding of how the lock release notifications worked. It turns out we also `notify_lock_released(...)` over replication when other workers tell us about a release.
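As a minimal, self-contained illustration of that notification path (a toy model, not the real Synapse code; only the method names `notify_lock_released(...)` and `add_lock_released_callback(...)` are borrowed from the code discussed here):

```python
from typing import Callable, List, Tuple

# Toy stand-in for the Notifier's lock-released callback registry. In Synapse,
# WorkerLocksHandler registers a callback via add_lock_released_callback(...),
# and notify_lock_released(...) is invoked both for local releases and when a
# release arrives over replication from another worker.
LockReleasedCallback = Callable[[str, str, str], None]


class ToyNotifier:
    def __init__(self) -> None:
        self._callbacks: List[LockReleasedCallback] = []

    def add_lock_released_callback(self, cb: LockReleasedCallback) -> None:
        self._callbacks.append(cb)

    def notify_lock_released(
        self, instance_name: str, lock_name: str, lock_key: str
    ) -> None:
        # Called on a local release *and* when replication tells us another
        # worker released the lock.
        for cb in self._callbacks:
            cb(instance_name, lock_name, lock_key)


released: List[Tuple[str, str, str]] = []
notifier = ToyNotifier()
notifier.add_lock_released_callback(lambda i, n, k: released.append((i, n, k)))

# Local release on this worker:
notifier.notify_lock_released("worker1", "name", "key")
# Same entry point when worker2's release arrives over replication:
notifier.notify_lock_released("worker2", "name", "key")
assert released == [("worker1", "name", "key"), ("worker2", "name", "key")]
```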
```python
# Release the first lock (`lock1`). The second lock (`lock2`) should be
# automatically acquired by the `pump()` inside `get_success()`
self.get_success(lock1.__aexit__(None, None, None))
```
> `# [...] The second lock (lock2) should be automatically acquired by the pump() inside get_success()`
Basically, the behavior described by this comment circumvents the retry interval logic we're trying to stress, and the previous tests actually pass without any of the fixes from #19394 because of this happy-path flow.

To explain further: when a lock is released, we immediately try to re-acquire it. `Notifier.notify_lock_released(...)` calls any callbacks registered via `Notifier.add_lock_released_callback(...)` (which we do in `WorkerLocksHandler`), and that calls `release_lock()`, which resolves the deferred, wakes up the `timeout_deferred(...)`, and loops around the while-loop again to try to re-acquire the lock.

Instead, we want to avoid the lock-released notification path entirely and stress the retry interval, which helps in situations where the lock holder goes stale, is reaped, and the other locks want to try to acquire the lock. A rough sketch of that shape is below.
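For reference, here is an illustrative sketch of the new test shape. This is not the code from this PR: the accessor and handler names (`get_worker_locks_handler`, `acquire_lock`) and the async-context-manager usage are copied from the existing test style, and the clock-advance amount is a placeholder standing in for `_LOCK_TIMEOUT` plus `WORKER_LOCK_MAX_RETRY_INTERVAL`.

```python
from twisted.internet import defer

from tests import unittest


class WorkerLockTimeoutSketch(unittest.HomeserverTestCase):
    """Illustrative sketch; the real tests live in tests/handlers/test_worker_lock.py."""

    def test_stale_lock_is_reacquired_via_retry_interval(self) -> None:
        handler = self.hs.get_worker_locks_handler()

        # Hold the lock with `lock1` and never release it, so no
        # `notify_lock_released(...)` ever fires for `lock2` to piggyback on.
        lock1 = handler.acquire_lock("name", "key")
        self.get_success(lock1.__aenter__())

        # `lock2` queues up behind `lock1`.
        lock2 = handler.acquire_lock("name", "key")
        d2 = defer.ensureDeferred(lock2.__aenter__())
        self.assertFalse(d2.called)

        # Advance the clock past the lock timeout (2 minutes in this PR) so the
        # stale `lock1` is reaped behind the scenes, plus some margin so
        # `lock2`'s retry loop (bounded by WORKER_LOCK_MAX_RETRY_INTERVAL) runs
        # again and re-acquires the lock.
        self.reactor.advance(2 * 60 + 30)

        self.get_success(d2)
        self.get_success(lock2.__aexit__(None, None, None))
```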
I've tested to make sure these new tests fail with a version of Synapse before #19394:

1. `git checkout v1.152.0`
2. Paste the latest `tests/handlers/test_worker_lock.py` into the codebase
3. Shim a couple of values that don't exist in that Synapse version:
   ```python
   WORKER_LOCK_MAX_RETRY_INTERVAL = Duration(seconds=5)
   _LOCK_TIMEOUT = Duration(minutes=2)
   ```
4. `poetry install --extras all`
5. `SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_worker_lock`
6. Notice the tests fail as expected:
   ```
   tests.handlers.test_worker_lock
     WorkerLockTestCase
       test_lock_contention ...            [OK]
       test_timeouts_for_lock_locally ...  [FAIL]
       test_wait_for_lock_locally ...      [OK]
     WorkerLockWorkersTestCase
       test_timeouts_for_lock_worker ...   [FAIL]
       test_wait_for_lock_worker ...       [OK]
   ```
Thanks for the review @erikjohnston 🐎
Update `WorkerLock` tests to better stress the `WORKER_LOCK_MAX_RETRY_INTERVAL`. There is no behavioral change, only a change to the tests. See #19772 (comment) for an explanation of why the tests needed changing (and the diff comments).

Follow-up to #19394. The test discussion originally happened in #19394 (comment).
This spawned from thinking about the problem again.
Dev notes