Full stack E2EE Testing MEGAISSUE #2165

Open
kegsay opened this issue Oct 27, 2023 · 3 comments

kegsay commented Oct 27, 2023

Full stack E2EE Tests

Historically, we have had a lot of difficult bugs around encryption. There is a lot of demand for fixing "unable to decrypt" errors and ensuring that our new Rust stack is working well. Part of this work involves testing. However, we lack a central set of end-to-end tests for our cryptography in Matrix, beyond basic happy-path test cases.

"End-to-End" in this case means:

  • a real server implementation
  • a real client implementation, with or without a UI.

This issue aims to be the nexus for:

  • defining what our test requirements are
  • defining how we want to test it
  • a definition of done

This issue currently lives in the element-meta repository because it touches the entire stack. If there is a better home for this, please let me know where to move it to.

Requirements

These requirements have been formulated purely from my brain. There has been no consensus around this yet.

Any solution MUST:

  • be able to be run under normal CI pipelines (GitHub Actions and GitLab CI/CD).
  • be able to test all clients using the Rust SDK (EX-{Android|iOS} and Element-Web R).
  • be able to test Synapse.
  • run in a "reasonable" amount of time, where reasonable is no slower than the slowest thing running alongside it in the CI pipeline.

These "MUST" conditions are based on the assumptions that we only care about Rust SDK crypto (no other client matters) and that we only care about Synapse (no other server matters). We also want these tests to be run on a per-commit basis, so we can spot regressions quickly.

Any solution SHOULD:

  • be able to test over federation.
  • be able to manipulate network conditions.
  • be able to manipulate program state (e.g. restart on demand, clear storage).
  • be able to manipulate server-side state (e.g. the number of one-time keys available).
  • be able to test the full spectrum of E2EE (e.g. key backups, cross-signing, key gossiping).
  • be able to test UI flows (e.g. assert that padlocks and red warnings are shown appropriately).
  • use best practices to reduce flakiness (e.g. sending sentinel events rather than using timeouts).

These "SHOULD" conditions are formed around the assumption that just testing the happy path isn't enough, and we need the ability to test more edge cases. E2EE in general mostly works in Matrix, so it's the edge cases where we will see the most value.
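
As an illustration of the "sentinel events rather than timeouts" practice, here is a minimal sketch in Go. The `Event` type and the `waitForSentinel` helper are hypothetical and are not part of Complement or any SDK. The idea is that the sender emits a known marker event after the events under test; because the stream is ordered, once the receiver sees the marker it knows everything sent before it has been delivered, so the test does not need an arbitrary sleep.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical, simplified timeline event.
type Event struct {
	ID   string
	Body string
}

// waitForSentinel drains an ordered event stream until the sentinel event
// arrives and returns every event seen before it. It returns an error if the
// stream is closed before the sentinel shows up.
func waitForSentinel(stream <-chan Event, sentinelID string) ([]Event, error) {
	var seen []Event
	for ev := range stream {
		if ev.ID == sentinelID {
			return seen, nil
		}
		seen = append(seen, ev)
	}
	return seen, errors.New("stream closed before sentinel " + sentinelID + " arrived")
}

func main() {
	// Simulate a sync stream: two messages under test, then the sentinel.
	stream := make(chan Event, 3)
	stream <- Event{ID: "$msg1", Body: "hello"}
	stream <- Event{ID: "$msg2", Body: "world"}
	stream <- Event{ID: "$sentinel", Body: "sentinel"}
	close(stream)

	seen, err := waitForSentinel(stream, "$sentinel")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(seen), "events arrived before the sentinel")
}
```

A real test would probably still keep an overall deadline as a safety net, but it would only fire when something is genuinely broken rather than acting as the synchronisation mechanism.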

Any solution COULD:

  • test homeservers other than Synapse.
  • test clients other than Rust SDK backed ones.
  • provide benchmarking/performance testing.

These "COULD" conditions are generally nice-to-haves and aren't make or break goals. In the wild, there will be different servers and clients, so ensuring we play nicely (or at least know if we don't play nicely) would be useful for the public federation.

Anti-goals:

  • We do not want to test esoteric edge cases (e.g. worker race conditions, FFI oddities, etc.). For these, a unit test would be more appropriate.

Prior Work

To my knowledge, the prior work around end-to-end tests which use at least the Rust crypto crate includes:

There is also more work which is not end-to-end:

Proposal

  • Make use of Complement to write E2EE crypto tests. This reuses all the machinery for running HSes, assertion libraries, etc. This is also then familiar to backend folks.
  • Use uniffi bindings to Go to drive the Rust SDK from Go, using matrix_sdk rather than matrix_sdk_crypto because that is what EX uses. This has been confirmed as possible, and I have some crude tests which send/receive an encrypted message.
  • Use https://github.com/rogchap/v8go to drive the Matrix JS SDK (ER edition) to test Element Web R edition. Confirmed that this stack cannot easily work: it doesn't support WASM and has none of the browser defaults (fetch, console, timers, IndexedDB, etc.) which the JS SDK relies on.
  • Use https://pkg.go.dev/github.com/chromedp/chromedp to drive the Matrix JS SDK (ER edition) to test Element Web R edition. Confirmed that this stack can work: an Alice-Bob hello world was sent successfully. It's more faffy because each client needs to be on a different domain to avoid crypto store backend clashes. If you call aliceClient.initRustCrypto() and then bobClient.initRustCrypto() on the same domain, you get "Error: the account in the store doesn't match the account in the constructor: expected @user-1-alice:hs1:ZLTRJKJTQY, got @user-2-bob:hs1:UYXRUBXKOU".

Rationale: Existing E2E test frameworks are heavily UI based. This makes them slower, harder to run on CI boxes and less portable, as you now need to chuck in an emulator or run them on real devices. As we are only targeting the Rust SDK, we can just test it "directly" and bypass the UI layer entirely. This means the tests should run reasonably quickly on CI boxes (particularly if they make use of Complement's new dirty run mode).

In an effort to keep the tests honest and truly "end-to-end", the proposal uses the high-level crate that Element X uses and drives the Matrix JS SDK which Element R uses. This keeps the tests "high level": creating rooms, syncing and sending messages, rather than uploading OTKs, querying keys, etc. This should provide more coverage than testing the matrix_sdk_crypto crate alone, which is important as the layers above have a lot of complexity which would otherwise be untested.

Using Complement means we can set up mock federation servers which can serve up weird edge cases like reusing OTKs, exhausting OTKs, delaying updates, using unicode device list updates, etc., all of which have caused E2EE problems in the past. Complement now also supports running out-of-repo, so the tests needn't sit in the Complement repo (which wouldn't really make much sense as they mostly test the Rust SDK).
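
To make the "mock federation server" idea a bit more concrete, here is a rough sketch using only the Go standard library. It is not Complement's API: `exhaustedOTKHandler` and `claimKeys` are hypothetical names, and the handler is driven in-process with `httptest` rather than over a real socket. It stands in for a remote homeserver that answers every federation key claim with an empty `one_time_keys` object, i.e. the "exhausting OTKs" edge case. Real federation also involves TLS, request signing and server discovery, all of which this sketch ignores.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// claimResponse mirrors the shape of a key-claim response body:
// user ID -> device ID -> "algorithm:key_id" -> key object.
type claimResponse struct {
	OneTimeKeys map[string]map[string]map[string]json.RawMessage `json:"one_time_keys"`
}

// exhaustedOTKHandler returns a hypothetical mock remote homeserver that
// answers every key claim with no keys at all.
func exhaustedOTKHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/_matrix/federation/v1/user/keys/claim", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		empty := claimResponse{OneTimeKeys: map[string]map[string]map[string]json.RawMessage{}}
		if err := json.NewEncoder(w).Encode(empty); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})
	return mux
}

// claimKeys sends a claim request to the handler and returns the number of
// users for which at least an entry was returned.
func claimKeys(h http.Handler) (int, error) {
	body := `{"one_time_keys":{"@alice:hs2":{"DEVICE":"signed_curve25519"}}}`
	req := httptest.NewRequest(http.MethodPost, "/_matrix/federation/v1/user/keys/claim", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	if rec.Code != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d", rec.Code)
	}
	var parsed claimResponse
	if err := json.NewDecoder(rec.Body).Decode(&parsed); err != nil {
		return 0, err
	}
	return len(parsed.OneTimeKeys), nil
}

func main() {
	n, err := claimKeys(exhaustedOTKHandler())
	if err != nil {
		panic(err)
	}
	fmt.Println("users with claimable keys:", n)
}
```

The same handler could be passed to `httptest.NewServer` to listen on a real port; the interesting part of an actual test is then asserting how the client under test behaves when it gets nothing back.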

Why not:

  • Cypress, Maestro, etc: we ultimately don't want to test UIs as they are slower and less portable.
  • Trafficlight: we don't want to drive real clients (which is where it excels) for the same reasons as Cypress/Maestro/etc.

Definition of Done

There exists a CI step in Rust SDK (and Synapse?) which runs tests which include at a minimum:

[ ] Membership ACLs:

  • Happy case Alice <-> Bob encrypted room can send/recv messages.
  • New user Charlie does not see previous messages when he joins the room. TODO: can he see messages sent after he was invited? Assert this either way.
  • Subsequent messages are decryptable by all 3 users.
  • Bob leaves the room. Some messages are sent. Bob rejoins and cannot decrypt the messages sent whilst he was gone (ensuring we cycle keys). Repeat this again with a device instead of a user (so 2x device, 1 remains always in the room, 1 then logs out -> messages sent -> logs in again).
  • Alice invites Bob, Bob then changes device, and then Bob joins. Check whether Bob can see Alice's message.

[ ] Key backups:

  • New device for Alice cannot decrypt previous messages.
  • Backups can be made on Alice's first device.
  • Alice's new device can download the backup and decrypt the messages.

[ ] One-time Keys:

  • When Alice runs out of OTKs, other users use the fallback key.
  • Ensure things don't explode if OTKs are reused (TODO: what should happen here?)
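
For reference, the behaviour the first test relies on can be modelled roughly as below. This is a hypothetical in-memory model in Go (`deviceKeys` and `claim` are made-up names, not Synapse or SDK code): a one-time key is handed out at most once, and once the supply is exhausted the fallback key is returned instead and is not consumed by the claim.

```go
package main

import "fmt"

// deviceKeys is a hypothetical in-memory model of one device's published keys.
type deviceKeys struct {
	oneTimeKeys []string // each key may be handed out at most once
	fallbackKey string   // reusable; empty if none was uploaded
}

// claim returns a key for establishing a session with the device, and reports
// whether the fallback key had to be used. ok is false if no key is available.
func (d *deviceKeys) claim() (key string, usedFallback bool, ok bool) {
	if len(d.oneTimeKeys) > 0 {
		key = d.oneTimeKeys[0]
		d.oneTimeKeys = d.oneTimeKeys[1:] // consumed: never handed out again
		return key, false, true
	}
	if d.fallbackKey != "" {
		return d.fallbackKey, true, true // not consumed: may be returned again
	}
	return "", false, false
}

func main() {
	alice := &deviceKeys{oneTimeKeys: []string{"otk1", "otk2"}, fallbackKey: "fallback1"}
	for i := 0; i < 4; i++ {
		key, usedFallback, ok := alice.claim()
		fmt.Printf("claim %d: key=%q fallback=%v ok=%v\n", i+1, key, usedFallback, ok)
	}
}
```

A test for the OTK-reuse case would deliberately break the "handed out at most once" rule in the mock and then assert whatever behaviour is eventually decided on for the TODO above.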

[ ] Network connectivity:

  • If a client cannot upload OTKs, it retries.
  • If a client cannot claim OTKs, it retries.
  • If a server cannot send device list updates over federation, it retries.
  • If a client cannot query device keys for a user, it retries.
  • If a server cannot query device keys on another server, it retries.
  • If a client cannot send a to-device msg, it retries.
  • If a server cannot send a to-device msg to another server, it retries.
  • Repeat all of the above, but restart the client|server after the initial connection failure. This checks that retries aren't just stored in memory but persisted to disk.

Repeat all of these tests with Alice on a different homeserver (testing federation).


kegsay commented Oct 30, 2023

xref #245


kegsay commented Nov 21, 2023
