Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTD: Some clients are encrypting Olm messages with the same ratchet key and chain index #2302

Closed
kegsay opened this issue Feb 22, 2024 · 3 comments
Labels
Team: Crypto Z-UISI Unable to decrypt errors

Comments

@kegsay
Copy link

kegsay commented Feb 22, 2024

Describe the bug
@Hywan completely nerd sniped me. Blame him. He pointed me at this UTD rageshake on EIX. It turns out that the cause was with the Olm session between the two devices, not the delivery of to-device events or room keys. For some reason, the Olm session had "wedged". These are the reasons why sessions may wedge, so I set about ruling out the sender being at fault.

I collected all undelivered to-device events on all sliding sync shards and analysed the message content. The ciphertext has a particular format and there should be only 1 event with the same (sender key, chain index, ratchet key). If there were multiple, this indicates using the same key for encrypting multiple events. The receiver won't know this has happened, and so the Olm session will wedge. The results:

  • Out of 234,837 to-device events with type: 1 (already established Olm sessions),
  • there were 118 duplicate events (with differing ciphertexts) spanning 8 users,
  • where possible, mapping the sender_key to a device ID and hence display name shows the affected clients are:
    • Element iOS (4 devices)
    • Cinny Web (2 devices)
    • Element Desktop: Linux (1 device)

When this happens, the receiver will see a UTD. Wedged session recovery should fix this for subsequent messages.

To Reproduce

Unsure. EI and ED don't really share any code, so this feels like a systemic problem with how the SDKs are being used.

Expected behavior
Olm messages should always be decryptable.


My working theory on this:

  • Both platforms have independent processes which need to talk to each other:
    • Element iOS: NSE process and App process
    • Element Web: multiple tabs, web workers
  • Faults in how these processes talk to each other could cause this by:
    • Lack of safe read-modify-write on the Session pickle. If this occurs, both processes will retrieve the Session at chain index N and encrypt messages based on N. The fix here would be to ensure there is proper locking such that this cannot happen.
  • Alternatively, lack of flushing coupled with hard crashes could cause a chain index to be reused.
  • Alternatively, both processes receiving messages at the same time (?) this would be more a receiver chain fault though not sender chain.

I've looked at the JS SDK and cannot cause the read-modify-write to trip up. I cannot get indexeddb locking to fail cross tab / web worker on Firefox or Chrome either (with a test rig that increments an integer in a loop). Worryingly though, both browsers do not provide durable writes:

The complete event may thus be delivered quicker than before, however, there exists a small chance that the entire transaction will be lost if the OS crashes or there is a loss of system power before the data is flushed to disk. Since such catastrophic events are rare, most consumers should not need to concern themselves further.

https://developer.mozilla.org/en-US/docs/Web/API/IDBTransaction#firefox_durability_guarantees

It's important to note that strict does not ensure that changes are actually written immediately to disk. After a site calls put(), there's still some finite amount of time during which a power failure could cause the change to not make it to disk and therefore be missing the next time the app runs.

https://developer.chrome.com/blog/indexeddb-durability-mode-now-defaults-to-relaxed

I'm unsure how much this matters in practice, as if there is a power failure, it's reasonable to assume that the newly encrypted event did not have enough time to be sent.

@richvdh
Copy link
Member

richvdh commented Mar 7, 2024

Possible EW cause: element-hq/element-web#25157

I don't really understand how that issue, which was closed back in August, could cause this?

@richvdh
Copy link
Member

richvdh commented Mar 7, 2024

AFAICT The concern over writes not being durable is not relevant for cross-tab locking. (If a claim to exclusive ownership of the indexeddb is lost due to power-failure / process crash, then we can also be sure that the now-dead process is not mutating the indexeddb, so it doesnt matter that the claim is lost.)

However, it is relevant for updates to the Olm Session in indexeddb. It seems like we should be setting durability:strict on indexeddb transactions that bump the Olm sender ratchet.

@richvdh
Copy link
Member

richvdh commented Apr 25, 2024

We've now tracked down a big cause of this in Element X iOS in matrix-org/matrix-rust-sdk#3310. I also have suspicions that matrix-org/matrix-rust-sdk#3313 could be a source of problems.

I have also opened matrix-org/matrix-rust-sdk#3354 regarding the indexeddb durability mode.

I don't really think this issue is giving much value: it's somewhat handwavey and unactionable. I'm going to close it in favour of the above issues.

@richvdh richvdh closed this as completed Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team: Crypto Z-UISI Unable to decrypt errors
Projects
None yet
Development

No branches or pull requests

2 participants