Full stack E2EE Testing MEGAISSUE #2165

kegsay · 2023-10-27T16:18:24Z

Full stack E2EE Tests

Historically, we have had a lot of difficult bugs around encryption. There is a lot of demand for fixing "unable to decrypt" errors and ensuring that our new Rust stack is working well. Part of this work involves testing. However, we lack a central set of end-to-end tests for our cryptography in Matrix, beyond basic happy-path test cases.

"End-to-End" in this case means:

a real server implementation
a real client implementation, with or without a UI.

This issue aims to be the nexus for:

defining what our test requirements are
defining how we want to test it
a definition of done

This issue currently lives in the element-meta repository because it touches the entire stack. If there is a better home for this, please let me know where to move it to.

Requirements

These requirements have been formulated purely from my brain. There has been no consensus around this yet.

Any solution MUST:

be able to run under normal CI pipelines (GitHub Actions and GitLab CI/CD).
be able to test all clients using the Rust SDK (EX-{Android|iOS} and Element-Web R).
be able to test Synapse.
run in a "reasonable" amount of time, where reasonable is no slower than the slowest thing running alongside it in the CI pipeline.

These "MUST" conditions are formed around the assumption that we only care about Rust SDK crypto and no other client matters. Similarly, we only care about Synapse and no other server matters. We also want these tests to run on a per-commit basis, so we can spot regressions quickly.

Any solution SHOULD:

be able to test over federation.
be able to manipulate network conditions.
be able to manipulate program state (e.g. restart on demand, clear storage).
be able to manipulate server-side state (e.g. the amount of one-time keys available).
be able to test the full spectrum of E2EE (e.g. key backups, cross-signing, key gossiping).
be able to test UI flows (e.g. assert padlocks are shown, red warnings are shown appropriately).
use best practices to reduce flakiness (e.g. sending sentinel events rather than using timeouts).

These "SHOULD" conditions are formed around the assumption that just testing the happy path isn't enough, and we need the ability to test more edge cases. E2EE in general mostly works in Matrix, so it's the edge cases where we will see the most value.
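The sentinel-event practice from the SHOULD list can be sketched as follows. This is a minimal illustration in plain Go and not Complement's API: `Event`, `waitForSentinel` and the event IDs are hypothetical names. Instead of sleeping for a fixed time and hoping earlier events have arrived, the test sends a known final event and reads the stream until it shows up. The deadline is only a safety net against hangs.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Event is a hypothetical, simplified timeline event.
type Event struct {
	ID   string
	Body string
}

// waitForSentinel reads events until the one with sentinelID arrives and
// returns everything seen before it. The deadline guards against hangs; it is
// not the synchronisation mechanism.
func waitForSentinel(stream <-chan Event, sentinelID string, deadline time.Duration) ([]Event, error) {
	timer := time.NewTimer(deadline)
	defer timer.Stop()
	var seen []Event
	for {
		select {
		case ev, ok := <-stream:
			if !ok {
				return seen, errors.New("stream closed before sentinel arrived")
			}
			if ev.ID == sentinelID {
				return seen, nil
			}
			seen = append(seen, ev)
		case <-timer.C:
			return seen, errors.New("timed out waiting for sentinel")
		}
	}
}

func main() {
	stream := make(chan Event, 3)
	stream <- Event{ID: "$1", Body: "hello"}
	stream <- Event{ID: "$2", Body: "world"}
	stream <- Event{ID: "$sentinel", Body: "done"}

	seen, err := waitForSentinel(stream, "$sentinel", 5*time.Second)
	fmt.Println(len(seen), err)
}
```

Once the sentinel has been received, the test knows that every earlier event in the stream has been delivered, so assertions on `seen` do not race with the network.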

Any solution COULD:

test homeservers other than Synapse.
test clients other than Rust SDK-backed ones.
provide benchmarking/performance testing.

These "COULD" conditions are generally nice-to-haves and aren't make or break goals. In the wild, there will be different servers and clients, so ensuring we play nicely (or at least know if we don't play nicely) would be useful for the public federation.

Anti-goals:

We do not want to test esoteric edge cases (e.g. worker race conditions, FFI oddities, etc). For these, a unit test would be more appropriate.

Prior Work

To my knowledge, the prior work around end-to-end tests which use at least the Rust crypto crate includes:

Trafficlight: https://github.com/matrix-org/trafficlight/tree/main/trafficlight/tests/chat
Element-Web Cypress tests: https://github.com/matrix-org/matrix-react-sdk/tree/develop/cypress/e2e/crypto
Element-X-Android Maestro E2E tests: https://github.com/vector-im/element-x-android/tree/develop/.maestro/tests
Element-X-iOS tests: https://github.com/vector-im/element-x-ios/tree/develop/IntegrationTests/Sources

There is also more work which is not end-to-end:

Element-Android: https://github.com/vector-im/element-android/blob/8bfd5f7c543c9d4ffbdb1e5117b1e6014b474f85/matrix-sdk-android/src/androidTest/java/org/matrix/android/sdk/internal/crypto/E2eeSanityTests.kt#L4
TODO fill in more

Proposal

Make use of Complement to write E2EE crypto tests. This reuses all the machinery for running HSes, assertion libraries, etc. This is also then familiar to backend folks.
Use uniffi bindings to drive the Rust SDK from Go, using matrix_sdk rather than matrix_sdk_crypto as this is what EX uses. This has been confirmed as possible, and I have some crude tests which send/receive an encrypted message.
~~Use https://github.com/rogchap/v8go to drive matrix JS SDK (ER edition) to test Element Web R edition.~~ Confirmed this stack cannot easily work. It doesn't support WASM and has no browser defaults (fetch, console, timers, indexeddb, etc) which JS SDK relies on.
Use https://pkg.go.dev/github.com/chromedp/chromedp to drive matrix JS SDK (ER edition) to test Element Web R edition. Confirmed this stack can work and an Alice-Bob hello world was sent successfully. It's more faffy because each client needs to be on a different domain to avoid crypto store backend clashes. If you try aliceClient.initRustCrypto() and then bobClient.initRustCrypto() on the same domain, you get: "Error: the account in the store doesn't match the account in the constructor: expected @user-1-alice:hs1:ZLTRJKJTQY, got @user-2-bob:hs1:UYXRUBXKOU".

Rationale: Existing E2E test frameworks are heavily UI based. This makes them slower, harder to run on CI boxes and less portable, as you need to chuck in an emulator or run them on real devices. As we are only targeting the Rust SDK, we can just test it "directly" and bypass the UI layer entirely. This means the tests should run reasonably quickly on CI boxes (particularly if they make use of Complement's new dirty run mode). In an effort to keep the tests honest and truly "end-to-end", the proposal uses the high-level crate that Element X uses and drives the Matrix JS SDK which Element R uses. This keeps the tests "high level": creating rooms, syncing and sending messages, rather than uploading OTKs, querying keys, etc. This should provide more coverage than testing the matrix_sdk_crypto crate alone, which is important as the layers above have a lot of complexity which would otherwise be untested. Using Complement means we can set up mock federation servers which can serve up weird edge cases like reusing OTKs, exhausting OTKs, delaying updates, using unicode device list updates, etc, all of which have caused E2EE problems in the past. Complement now also supports running out-of-repo, so the tests needn't sit in the Complement repo (which wouldn't really make much sense as they mostly test the Rust SDK).
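One way to picture the "high level" test surface is a small Go interface that each client backend (Rust SDK via uniffi, JS SDK via chromedp) would implement, with the test logic written once against that interface. The following is only a sketch under assumptions: the `Client` interface, its method names, `checkRoundTrip` and the in-memory `fakeClient`/`fakeServer` are hypothetical and are not taken from Complement or any existing test suite. The fake does no encryption at all; it exists only so the sketch runs.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical high-level API that every backend under test would
// implement. Tests talk only to this interface, never to OTK uploads or key
// queries directly.
type Client interface {
	UserID() string
	SendMessage(roomID, body string) error
	// Messages returns the message bodies this client can read in the room.
	Messages(roomID string) []string
}

// fakeServer is an in-memory stand-in for a homeserver (no crypto involved).
type fakeServer struct {
	members  map[string]map[string]bool // roomID -> userID -> joined
	timeline map[string][]string        // roomID -> message bodies
}

func newFakeServer() *fakeServer {
	return &fakeServer{
		members:  map[string]map[string]bool{},
		timeline: map[string][]string{},
	}
}

func (s *fakeServer) join(roomID, userID string) {
	if s.members[roomID] == nil {
		s.members[roomID] = map[string]bool{}
	}
	s.members[roomID][userID] = true
}

// fakeClient is a trivial Client backed by fakeServer.
type fakeClient struct {
	userID string
	server *fakeServer
}

func (c *fakeClient) UserID() string { return c.userID }

func (c *fakeClient) SendMessage(roomID, body string) error {
	if !c.server.members[roomID][c.userID] {
		return errors.New(c.userID + " is not joined to " + roomID)
	}
	c.server.timeline[roomID] = append(c.server.timeline[roomID], body)
	return nil
}

func (c *fakeClient) Messages(roomID string) []string {
	if !c.server.members[roomID][c.userID] {
		return nil
	}
	return c.server.timeline[roomID]
}

// checkRoundTrip is backend-agnostic test logic: the sender sends a message
// and the receiver must be able to read it.
func checkRoundTrip(sender, receiver Client, roomID, body string) error {
	if err := sender.SendMessage(roomID, body); err != nil {
		return err
	}
	for _, got := range receiver.Messages(roomID) {
		if got == body {
			return nil
		}
	}
	return fmt.Errorf("%s did not receive %q", receiver.UserID(), body)
}

func main() {
	srv := newFakeServer()
	srv.join("!room:hs1", "@alice:hs1")
	srv.join("!room:hs1", "@bob:hs1")
	alice := &fakeClient{userID: "@alice:hs1", server: srv}
	bob := &fakeClient{userID: "@bob:hs1", server: srv}
	fmt.Println(checkRoundTrip(alice, bob, "!room:hs1", "hello"))
}
```

With this shape, the same `checkRoundTrip` could in principle be run against every pairing of backends (Rust to JS, JS to Rust, and so on) without the test body changing.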

Why not:

Cypress, Maestro, etc: we ultimately don't want to test UIs as they are slower and less portable.
Trafficlight: we don't want to drive real clients (which is where it excels) for the same reasons as Cypress/Maestro/etc.

Definition of Done

There exists a CI step in Rust SDK (and Synapse?) which runs tests which include at a minimum:

[ ] Membership ACLs:

Happy case Alice <-> Bob encrypted room can send/recv messages.
New user Charlie does not see previous messages when he joins the room. TODO: can he see messages sent after he was invited? Assert this either way.
Subsequent messages are decryptable by all 3 users.
Bob leaves the room. Some messages are sent. Bob rejoins and cannot decrypt the messages sent whilst he was gone (ensuring we cycle keys). Repeat this with a device instead of a user (two devices: one always remains in the room, the other logs out -> messages are sent -> it logs in again).
A invites B, B then changes device, then B joins: check whether B can see A's message.
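The expectations in the membership list above can be written down as a toy model. This is a sketch only: `room`, `rotate` and `canDecrypt` are hypothetical names, and the model deliberately simplifies real behaviour by starting a fresh session on every membership change and sharing it with whoever is joined at that moment. It shows the property the tests would assert, not how the Rust SDK implements it.

```go
package main

import "fmt"

// message records which session it was encrypted with.
type message struct {
	body    string
	session int
}

// room is a toy model of an encrypted room.
type room struct {
	session int                     // current outbound session number
	keys    map[string]map[int]bool // userID -> session numbers that user holds
	joined  map[string]bool
	msgs    []message
}

func newRoom(members ...string) *room {
	r := &room{keys: map[string]map[int]bool{}, joined: map[string]bool{}}
	for _, m := range members {
		r.joined[m] = true
	}
	r.rotate()
	return r
}

// rotate starts a new session and shares it with currently joined users only.
func (r *room) rotate() {
	r.session++
	for u := range r.joined {
		if r.keys[u] == nil {
			r.keys[u] = map[int]bool{}
		}
		r.keys[u][r.session] = true
	}
}

func (r *room) join(u string)  { r.joined[u] = true; r.rotate() }
func (r *room) leave(u string) { delete(r.joined, u); r.rotate() }

// send appends a message under the current session and returns its index.
func (r *room) send(body string) int {
	r.msgs = append(r.msgs, message{body: body, session: r.session})
	return len(r.msgs) - 1
}

// canDecrypt reports whether the user holds the session used for the message.
func (r *room) canDecrypt(u string, idx int) bool {
	return r.keys[u][r.msgs[idx].session]
}

func main() {
	r := newRoom("alice", "bob")
	before := r.send("sent before Charlie joined")
	r.join("charlie")
	r.leave("bob")
	whileGone := r.send("sent while Bob was gone")
	r.join("bob")
	after := r.send("sent after Bob rejoined")

	fmt.Println("charlie reads old message:", r.canDecrypt("charlie", before))
	fmt.Println("bob reads message sent while gone:", r.canDecrypt("bob", whileGone))
	fmt.Println("bob reads message after rejoin:", r.canDecrypt("bob", after))
}
```

The device variant of the test would follow the same pattern with device IDs in place of user IDs.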

[ ] Key backups:

New device for Alice cannot decrypt previous messages.
Backups can be made on Alice's first device.
Alice's new device can download the backup and decrypt the messages.

[ ] One-time Keys:

When Alice runs out of OTKs, other users use the fallback key.
Ensure things don't explode if OTKs are reused (TODO: what should happen here?)
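The fallback-key expectation can be sketched as a toy claim function. `deviceKeys`, `claimedKey`, `claim` and the key IDs are hypothetical names for illustration, not Synapse's or the SDK's implementation. A one-time key is handed out at most once, and the fallback key is returned only when no one-time keys remain.

```go
package main

import "fmt"

// claimedKey is what a claimer receives for a device.
type claimedKey struct {
	KeyID    string
	Fallback bool
}

// deviceKeys is a toy model of the keys a server holds for one device.
type deviceKeys struct {
	oneTime  []string // unused one-time key IDs, handed out in order
	fallback string   // fallback key ID; empty if none was uploaded
}

// claim hands out a one-time key if any remain, otherwise the fallback key.
// ok is false when the device has neither.
func (d *deviceKeys) claim() (key claimedKey, ok bool) {
	if len(d.oneTime) > 0 {
		id := d.oneTime[0]
		d.oneTime = d.oneTime[1:] // a one-time key is never handed out twice
		return claimedKey{KeyID: id}, true
	}
	if d.fallback != "" {
		// The fallback key is not consumed, so repeated claims return it again.
		return claimedKey{KeyID: d.fallback, Fallback: true}, true
	}
	return claimedKey{}, false
}

func main() {
	alice := &deviceKeys{oneTime: []string{"otk1", "otk2"}, fallback: "fallback1"}
	for i := 0; i < 4; i++ {
		key, ok := alice.claim()
		fmt.Println(key.KeyID, key.Fallback, ok)
	}
}
```

A test for OTK reuse would then check that no non-fallback key ID is ever returned twice, which is the situation the mock federation server in the proposal is meant to provoke deliberately.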

[ ] Network connectivity:

If a client cannot upload OTKs, it retries.
If a client cannot claim OTKs, it retries.
If a server cannot send device list updates over federation, it retries.
If a client cannot query device keys for a user, it retries.
If a server cannot query device keys on another server, it retries.
If a client cannot send a to-device msg, it retries.
If a server cannot send a to-device msg to another server, it retries.
Repeat all of the above, but restart the client|server after the initial connection failure. This checks that retries aren't just stored in memory but persisted to disk.
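The "persisted to disk" check in the last item can be illustrated with a small outbox that writes pending messages to a JSON file before any send attempt. This is a sketch under assumptions: `outbox`, `openOutbox`, `flush` and the file layout are hypothetical and do not describe how Synapse or the Rust SDK store retries. A "restart" is modelled by reopening the outbox from the same path.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errNetworkDown = errors.New("network down")

// outbox is a hypothetical queue of pending messages persisted as JSON.
type outbox struct {
	path    string
	Pending []string `json:"pending"`
}

// openOutbox loads the queue from disk; a missing file means an empty queue.
func openOutbox(path string) (*outbox, error) {
	o := &outbox{path: path}
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return o, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, o); err != nil {
		return nil, err
	}
	return o, nil
}

func (o *outbox) save() error {
	data, err := json.Marshal(o)
	if err != nil {
		return err
	}
	return os.WriteFile(o.path, data, 0o600)
}

// enqueue persists the message before any send attempt is made.
func (o *outbox) enqueue(msg string) error {
	o.Pending = append(o.Pending, msg)
	return o.save()
}

// flush sends pending messages in order and stops at the first failure,
// leaving unsent messages on disk. It returns how many were delivered.
func (o *outbox) flush(send func(string) error) (int, error) {
	sent := 0
	for len(o.Pending) > 0 {
		if err := send(o.Pending[0]); err != nil {
			return sent, err
		}
		o.Pending = o.Pending[1:]
		if err := o.save(); err != nil {
			return sent, err
		}
		sent++
	}
	return sent, nil
}

// tempOutboxPath returns a file path inside a fresh temporary directory.
func tempOutboxPath() (string, error) {
	dir, err := os.MkdirTemp("", "outbox")
	if err != nil {
		return "", err
	}
	return filepath.Join(dir, "outbox.json"), nil
}

func main() {
	path, err := tempOutboxPath()
	if err != nil {
		panic(err)
	}
	first, _ := openOutbox(path)
	_ = first.enqueue("to-device message")
	_, err = first.flush(func(string) error { return errNetworkDown })
	fmt.Println("first attempt:", err)

	// Simulated restart: all in-memory state is dropped, only the file remains.
	restarted, _ := openOutbox(path)
	n, err := restarted.flush(func(string) error { return nil })
	fmt.Println("after restart:", n, err)
}
```

If the queue lived only in memory, the reopened outbox would be empty and the second flush would deliver nothing, which is exactly the failure this test is meant to catch.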

All of these tests again, but with Alice on a different homeserver (testing federation).


kegsay · 2023-10-30T13:06:51Z

xref #245

kegsay · 2023-11-21T18:01:07Z

Tests are now tracked at https://github.com/matrix-org/complement-crypto/blob/main/TEST_HITLIST.md

kegsay · 2023-11-27T15:21:41Z

Collection of issues found as a result of testing:

Synapse does not attempt to send events in the device outbox at startup matrix-org/synapse#16680
Running under sqlite, Synapse incorrectly populates the to-device messages current stream ID matrix-org/synapse#16681
Rediscovered an 8-year-old bug: e2e upload API allows you to POST keys claiming to belong to other user's UIDs (SYN-496) matrix-org/synapse#1396
Fallback keys are cycled too quickly matrix-org/matrix-rust-sdk#3127

Collection of MSCs as a result of this work:

MSC4081: Eagerly sharing fallback keys with federated servers matrix-org/matrix-spec-proposals#4081

Collection of regressions which could have been caught if Complement-Crypto ran in CI:

FFI: BackupDownloadStrategy cannot be set in ClientBuilder matrix-org/matrix-rust-sdk#3130

kegsay added the A-E2EE label Oct 27, 2023

kegsay mentioned this issue Oct 27, 2023

Add end-to-end tests for End-to-end encryption element-hq/element-web#7312

Closed

richvdh mentioned this issue Nov 8, 2023

Verification | Run Sanity test for verification against all clients currently in prod #253

Closed
