Skip to content

Conversation

anoadragon453
Copy link
Member

@anoadragon453 anoadragon453 commented Sep 8, 2025

This PR adds a cache to _get_e2e_cross_signing_signatures_for_devices to put less pressure on the database.

Recommended to review commit-by-commit.

Requires matrix-org/sytest#1409 for Sytest CI to pass.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@anoadragon453 anoadragon453 marked this pull request as ready for review September 8, 2025 16:20
@anoadragon453 anoadragon453 requested a review from a team as a code owner September 8, 2025 16:20
@anoadragon453 anoadragon453 marked this pull request as draft September 8, 2025 16:52
@anoadragon453
Copy link
Member Author

anoadragon453 commented Sep 8, 2025

This needs a small change. I had missed that functions decorated with @cachedList must return a dict with keys matching the original collection items.

@reivilibre reivilibre removed the request for review from a team September 9, 2025 10:32
@anoadragon453 anoadragon453 force-pushed the anoa/cache_device_key_signatures branch 4 times, most recently from 7ba742b to 8811a86 Compare September 11, 2025 12:55
anoadragon453 added a commit to matrix-org/sytest that referenced this pull request Sep 12, 2025
This test started to fail after implementing
element-hq/synapse#18899. The failure
appeared to be due to the test not waiting for signatures to
propagate over federation. The  loop
would exit as soon as it saw a 'signatures' key. But the
value was an empty dict.

A few moments later, the dict would be populated with the key
we were waiting for. But while that was being sent over federation,
the test would fail and exit abruptly.

This commit changes the  loop to actually
check for the signature we're waiting for, instead of exiting
upon seeing the 'signatures' key. This seems like it would be
less flaky in general, and prevents the test from failing in
this case.
anoadragon453 added a commit to matrix-org/sytest that referenced this pull request Sep 12, 2025
This test started to fail after implementing
element-hq/synapse#18899. The failure
appeared to be due to the test not waiting for signatures to
propagate over federation. The  loop
would exit as soon as it saw a 'signatures' key. But the
value was an empty dict.

A few moments later, the dict would be populated with the key
we were waiting for. But while that was being sent over federation,
the test would fail and exit abruptly.

This commit changes the  loop to actually
check for the signature we're waiting for, instead of exiting
upon seeing the 'signatures' key. This seems like it would be
less flaky in general, and prevents the test from failing in
this case.
anoadragon453 added a commit to matrix-org/sytest that referenced this pull request Sep 12, 2025
This test started to fail after implementing
element-hq/synapse#18899. The failure
appeared to be due to the test not waiting for signatures to
propagate over federation. The  loop
would exit as soon as it saw a 'signatures' key. But the
value was an empty dict.

A few moments later, the dict would be populated with the key
we were waiting for. But while that was being sent over federation,
the test would fail and exit abruptly.

This commit changes the  loop to actually
check for the signature we're waiting for, instead of exiting
upon seeing the 'signatures' key. This seems like it would be
less flaky in general, and prevents the test from failing in
this case.
anoadragon453 added a commit to matrix-org/sytest that referenced this pull request Sep 16, 2025
This test started to fail after implementing
element-hq/synapse#18899. The failure
appeared to be due to the test not waiting for signatures to
propagate over federation. The  loop
would exit as soon as it saw a 'signatures' key. But the
value was an empty dict.

A few moments later, the dict would be populated with the key
we were waiting for. But while that was being sent over federation,
the test would fail and exit abruptly.

This commit changes the  loop to actually
check for the signature we're waiting for, instead of exiting
upon seeing the 'signatures' key. This seems like it would be
less flaky in general, and prevents the test from failing in
this case.
@anoadragon453 anoadragon453 force-pushed the anoa/cache_device_key_signatures branch from 2df4e22 to 834662d Compare September 16, 2025 17:13
The use of `@cachedList` requires having a cached, equivalent "single
item" method that represents the key and cached value. Calls to methods
with @cachedList are then processed and items that are already cached
are removed before the @cachedList-decorated method is called. The
single item method should not be directly used.

The return type needed to be changed to a "Mapping" containing a "Sequence" to satisfy the
requirement that cached results must be immutable (lest you risk
mutating the cache).

`@cachedList` also requires returning an dict where the keys
matches that of the items in the cached input list. Our keys are tuples (and you'll later see why this is a pain).
The return type has changed, and we now need to handle a map of user_id/device_id tuples to lists of key signatures.
This is needed to then call `_invalidate_cache_and_stream_bulk
with the `txn` object in the next commit.
Invalidate and stream whenever we update `e2e_cross_signing_signatures`.
As the "keys" in an incoming cache invalidation stream object appear to always be a string,
we have to do some special case handling to convert the string into a tuple of (user_id, device_id).

This works, but is awful.
@anoadragon453 anoadragon453 force-pushed the anoa/cache_device_key_signatures branch from 834662d to 983f850 Compare September 17, 2025 10:49
@anoadragon453 anoadragon453 marked this pull request as ready for review September 17, 2025 10:49
@anoadragon453 anoadragon453 force-pushed the anoa/cache_device_key_signatures branch from 983f850 to a096967 Compare September 17, 2025 10:52
@anoadragon453 anoadragon453 requested a review from a team September 17, 2025 11:08
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems overall OK though; are there any tests we should add (I guess cache invalidation would be on my mind)?

Instead of encoding them using a brittle, custom scheme.

Avoids errors where the device ID or user ID contains a comma.
Required defining a three-column overload for the method to pass mypy.

Additionally the type-ignore was no longer required according to mypy.
Since `_get_e2e_cross_signing_signatures_for_devices` always return a key for every entry in `device_query`,
there will never be a case where the function can return `None`, even when decorated with `@cacheList`.
@anoadragon453 anoadragon453 force-pushed the anoa/cache_device_key_signatures branch from b037d41 to e7ca69d Compare September 17, 2025 17:50
Comment on lines +1881 to +1882
(json.dumps([user_id, item.target_device_id]),)
for item in signatures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not possible to json.dumps the entries of to_invalidate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can; it looks like this:

to_send = [
    (json.dumps(item[0]),)
    for item in to_invalidate
]

which seemed less clear as to what we're actually sending than what's there now.

@anoadragon453 anoadragon453 merged commit b596faa into develop Sep 18, 2025
44 checks passed
@anoadragon453 anoadragon453 deleted the anoa/cache_device_key_signatures branch September 18, 2025 11:06
@anoadragon453
Copy link
Member Author

are there any tests we should add (I guess cache invalidation would be on my mind)?

Ah, I missed this. Sytest does fail (specifically uploading signed devices gets propagated over federation) if the cache invalidation does not work. This is what allowed me to catch the lack of being able to send nested data structures over replication in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants