[WIP] MSC3814: Dehydrated devices with SSSS #3814

uhoreg · 2022-05-12T20:11:44Z

proposals/3814-dehydrated-devices-with-ssss.md

nico-famedly · 2022-08-17T22:01:55Z

proposals/3814-dehydrated-devices-with-ssss.md

+/dehydrated_device/{device_id}/events` to obtain the next batch.
+
+```
+POST /dehydrated_device/{device_id}/events


Why is this a POST and not a GET like /sync and /messages?

IIRC, the rationale was because the call has side-effects (deleting the device).

It's a bit weird that it doesn't follow the pattern of /messages, /events or /sync imo. I'll try implementing it as a GET without the device deletion first and see how that works out, I think.

A GET endpoint with side-effects seems like a big no-no to me. Everyone expects a GET request to have approximately zero side-effects.

oh, but we're also proposing removing the side-effects? SGTM in that case

Yup, the current implementation no longer automatically deletes the device on the server side, but relies on the client to delete/create a new device. So we're going to try to make this a GET.

nico-famedly · 2022-08-17T22:04:40Z

proposals/3814-dehydrated-devices-with-ssss.md

+```
+
+Once a client calls `POST /dehydrated_device/{device_id}/events`, the server
+can delete the device (though not necessarily its to-device messages).  Once a


Why should the server delete the device? Shouldn't this rather be done by the client explicitly in a delete call?

Imo it is not quite obvious that fetching the events should "break" the device. A client might fail to restore properly, and then you have lost all the intermediate sessions. Instead, the client should replace the device once it is reasonably sure it restored successfully and has uploaded the megolm keys to online backup.

The idea is that when the client starts getting events, it means that the client is signalling its intention to use the dehydrated device, and it has been "claimed", so it shouldn't be used by anyone else. At this point, there isn't much that can be done if the client, e.g. fails to decrypt some messages. If it fails to decrypt messages with the dehydrated device, it's unlikely that leaving the device around will fix anything in the future -- any future attempts would likely fail as well. So the best thing to do is to replace the dehydrated device with a new one anyways.

I'm not insistent on this endpoint deleting the dehydrated device, but I think that once you start using a dehydrated device, you'll want to create a new device no matter what.

It's more that if the device fetches the first few events but then the user closes the browser and it never gets to upload the devices, then you have effectively thrown the dried fish out the window without properly getting to use it. So in that case the server should either delete the device only when it deletes the first few messages (i.e. when the client paginates with a next token), or just wait for the user to send a new device, since I think you CAN still use the same dried device from another device. All of the messages will be PRE KEY messages, so you can decrypt them as long as you haven't deleted the one-time keys from the pickled device. So even if a client downloads the first batch of messages and then starts with the next batch and the first batch gets deleted, a different client should still be able to pick up from there.

I agree that you want to create a new device no matter what, but that can just be done by uploading a new one instead of implicitly doing it when receiving messages.

Yeah, the issue of a client starting to load events and then dying somehow is possible, but seems like it would be extremely rare.

I think another consideration is that a client could forget to replace the dehydrated device. If the device gets deleted automatically, then it makes it obvious that the client didn't do that.

In any event, I think it's fine to try it out with GET and without automatically deleting the device, and see how it goes.

There's also an issue if there's a connection problem during the POST request; the client presumably won't be able to re-try the request because the device will have been deleted.
We generally try to design our APIs so that they work / can be retried even if the request fails half-way through for some reason.

Actually, the idea is that the dehydrated device is "deleted" in the sense that no other client can claim it, and if a client queries for the dehydrated device, it won't be returned. But the events associated with it are still there and can be retrieved (until the events get deleted as described elsewhere in the MSC), so if the client re-tries the POST request, it will still get the events.

Yes, the problem is just that if the client fails to replace the device, then a failed login attempt will break device dehydration until the next successful login, since there is no way to receive messages in the meantime (whereas that would work fine if the device were just kept).

I actually ran into another race condition here in production. We currently have it implemented that the PUT of a new device removes the old device. However, since uploading a new device takes several requests (claim new device, upload keys, sign it, upload encrypted device), we run into a race condition, where the user closes the browser window during one of the steps and maybe only signs back in later. That means we have an unhydrateable device and we again lose messages over the gap. Ideally there would be some way to make this atomic to prevent this race condition.

I agree that having this be a PUT is the wrong pattern here. If the whole idea here is for a Client to fetch to-device events and room keys then we should design the flows in a way that minimizes the risk of loss of such room keys.

As it stands the flows can't be resumed once you start fetching the to-device events, and the fetching of the to-device events will be by far the longest operation here.

What would allow perfect resumption is:

1. The dehydrated device only gets deleted if the client requests so.
2. The to-device events only get deleted¹ when the dehydrated device gets deleted, as opposed to the current mechanism, where to-device events get deleted once the server sees a next_batch from a previous request.

Point 2 would ensure that, even if a device that attempts a rehydration gets stopped and deleted mid-rehydration, another, new device can restart the rehydration process.
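
The two rules above can be modelled with a small in-memory sketch. The class and method names are hypothetical and are not Synapse's API; the toy only illustrates that fetching is read-only, so any client can restart from the beginning, and that events disappear only on an explicit delete.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DehydratedDeviceStore:
    """Toy in-memory model; all names are hypothetical, not a real server API."""

    device_id: Optional[str] = None
    events: List[dict] = field(default_factory=list)

    def fetch_events(
        self, device_id: str, since: int = 0, limit: int = 2
    ) -> Tuple[List[dict], int]:
        # Fetching never mutates state, so a second client can start over at 0.
        if device_id != self.device_id:
            raise KeyError("unknown dehydrated device")
        batch = self.events[since : since + limit]
        return batch, since + len(batch)

    def delete_device(self, device_id: str) -> None:
        # Rule 1: only an explicit client request removes the device.
        # Rule 2: its to-device events are removed at that same moment.
        if device_id != self.device_id:
            raise KeyError("unknown dehydrated device")
        self.device_id = None
        self.events.clear()
```

A client that dies after the first `fetch_events` call leaves the store untouched, so another client can repeat the same call and get the same batch.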

> I actually ran into another race condition here in production. We currently have it implemented that the PUT of a new device removes the old device. However, since uploading a new device takes several requests (claim new device, upload keys, sign it, upload encrypted device),

Agree here as well, PUT of a new device should happen in a single request which should upload the dehydrated device, its device keys and any one-time keys. I implemented a draft version of this behavior in this patch: matrix-org/synapse@777b305
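
As a rough illustration of the single-request idea, the sketch below assembles one body carrying the pickled device, its device keys, and its one-time keys. Only `device_keys` and the algorithm name `org.matrix.msc3814.v1.olm` appear in the quoted proposal text; the function name and the other field names (`device_id`, `device_data`, `device_pickle`, `one_time_keys`) are assumptions made for this example and may not match the linked patch.

```python
import json


def build_dehydrated_device_request(device_id, device_pickle, device_keys, one_time_keys):
    """Assemble one PUT body so the upload is all-or-nothing.

    Field names other than "device_keys" are assumptions for illustration.
    """
    return {
        "device_id": device_id,
        "device_data": {
            "algorithm": "org.matrix.msc3814.v1.olm",
            "device_pickle": device_pickle,
        },
        "device_keys": device_keys,
        "one_time_keys": one_time_keys,
    }


body = build_dehydrated_device_request(
    "DEVICEID",
    "<encrypted pickle>",
    {"user_id": "@alice:example.org", "device_id": "DEVICEID"},
    {"signed_curve25519:AAAAAA": {"key": "<otk>"}},
)
payload = json.dumps(body)  # a single request body, so no half-uploaded device
```

Because everything travels in one request, closing the browser mid-upload either leaves the old dehydrated device in place or installs the complete new one.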


nico-famedly · 2022-08-22T16:07:50Z

proposals/3814-dehydrated-devices-with-ssss.md

+dehydration algorithm `m.dehydration.v1.olm` will be called
+`org.matrix.msc3814.v1.olm`.  The SSSS name for the dehydration key will be
+`org.matrix.msc3814` instead of `m.dehydrated_device`.
+


Client implementation: https://gitlab.com/famedly/company/frontend/famedlysdk/-/merge_requests/1111

Server implementation: matrix-org/synapse#13581

Both not merged yet and notably missing is the dehydrated device format.


BillCarsonFr · 2023-07-03T08:10:20Z

proposals/3814-dehydrated-devices-with-ssss.md

+losing any megolm keys that were sent to the dehydrated device, but the client
+would likely have received those megolm keys itself.
+
+Alternatively, the client could perform a `/sync` for the dehydrated device,


Does this still work with v2? Can we still sync on the dehydrated device?

You can't sync as a different device in this proposal. You can fetch the events for that device, but this proposal implicitly deletes the device in that case, which means you can't keep the device alive after that. So imo your only option is to replace it (which is somewhat easy to do, but you might need to authenticate the new signature upload/device?).

clokep · 2023-07-14T17:55:21Z

proposals/3814-dehydrated-devices-with-ssss.md

+
+If the client is able to decrypt the data and wants to use the dehydrated
+device, the client retrieves the to-device messages sent to the dehydrated
+device by calling `POST /dehydrated_device/{device_id}/events`, where


Why include a device_id if you can only have a single dehydrated device? It has implications that you can provide more than one (and causes additional error checking of whether the provided device ID matches the dehydrated device ID).

I agree that the device ID is redundant. Though since this MSC has been written a new use-case for this endpoint has been found.

In the sliding sync world we have split out the fetching of to-device events into a separate sync loop. Namely one of the biggest problems of the existing /sync mechanism is that you get too much data all at once and the downloading and processing of that data prevents the UI from being updated.

To-device events are one of those things that are not directly related to the things that a client will want to display in a room or room list, so putting it into a separate sync loop allows the main loop to quickly send updates while to-device moves along in the background. More info here: matrix-org/matrix-rust-sdk#1928

I think that old sync could handle such a split as well, so I would suggest here to rename the endpoint to become /sync/to_device/{device_id} where device_id might be optional and used only in the case of a dehydrated device.

The reason for the device_id parameter is that, while one client is fetching events, another client could create a new dehydrated device. Without the device_id parameter, the server could think that the client wants to fetch the events for the new device which, if there are any, it won't be able to decrypt since it's for a device that it doesn't know about. With the device_id parameter, the server will at least be able to say that there are no more events (since the device has been replaced by a new one).
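
A minimal sketch of that reasoning, with a hypothetical handler name and response shape. How a server should actually report a replaced device (an empty batch or an error) is not settled in this comment; the sketch follows the "no more events" wording above.

```python
def events_for(requested_device_id, current_device_id, pending_events):
    """Hypothetical handler: only serve events for the device the client asked for."""
    if requested_device_id != current_device_id:
        # The device was replaced while this client was paginating. Events
        # queued for the new device would be undecryptable for this client,
        # so report that nothing more is available instead.
        return {"events": []}
    return {"events": list(pending_events)}


# Client A is still fetching for "OLD" after client B uploaded "NEW".
response = events_for("OLD", "NEW", [{"type": "m.room.encrypted"}])
```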

This sounds quite racy to me -- how does the server know that one dehydrated device is claimed? How would the client know to make a new one instead of claiming the old one?

> how does the server know that one dehydrated device is claimed?

It's OK for multiple clients to rehydrate the same device (unlike in the previous proposal), because it never becomes a real device. So the server can just wait until some client fetches all the events before dropping the device.

> How would the client know to make a new one instead of claim the old one?

Making a new device and rehydrating an old one are two different use cases. Rehydration happens after you log in, and you're setting up encryption and trying to get keys. It only happens once in the device's lifetime. Creating a new dehydrated device would happen after you've already set up your encryption and already attempted to rehydrate a device.

clokep · 2023-07-14T18:00:17Z

proposals/3814-dehydrated-devices-with-ssss.md

+batches.  For the last batch of messages, the server will still send a
+`next_batch` token, and return an empty `events` array when called with that
+token, so that it knows that the client has successfully received all the
+messages and can clean up all the to-device messages for that device.


I'm failing to track what's going on here, I think it is:

1. Client requests with no next_batch; server returns some messages and a next_batch of A
2. Client requests with next_batch=A; server returns some messages and a next_batch of B
3. Client requests with next_batch=B; server returns an empty array of messages and a next_batch of C
4. Client requests with next_batch=C and discards any response

I'm unsure about steps 3 & 4. Why would the server provide a next_batch if it is already out of messages?

I think that in your example, the "last batch" of messages is 2. where the client requests with next_batch=A. So the server still returns a next_batch of B even though it knows that the next call will be empty. Then at 3. when the client requests with next_batch=B, the server discards all messages, and returns an empty array of messages and no next_batch. I agree that the text could be clearer.
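
The flow as clarified in this reply can be simulated with a toy server and client. Everything here is a hypothetical sketch: the class, the integer-like tokens, and the batch size are invented for illustration, and only the `events` and `next_batch` field names come from the proposal text.

```python
class ToyServer:
    """Simulates the behaviour described above; not a real homeserver API."""

    def __init__(self, messages, batch_size=2):
        self.messages = list(messages)
        self.batch_size = batch_size

    def events(self, next_batch=None):
        start = int(next_batch) if next_batch is not None else 0
        batch = self.messages[start : start + self.batch_size]
        if not batch:
            # The client presented the token from the last non-empty batch,
            # which tells the server everything was received: clean up.
            self.messages.clear()
            return {"events": []}
        # Even the last non-empty batch still carries a next_batch token.
        return {"events": batch, "next_batch": str(start + len(batch))}


def fetch_all(server):
    received, token = [], None
    while True:
        response = server.events(token)
        received.extend(response["events"])
        token = response.get("next_batch")
        if token is None:  # empty batch without a token marks the end
            return received


server = ToyServer(["m1", "m2", "m3"])
received = fetch_all(server)
```

With three messages and a batch size of two, the client makes three requests: two that return messages and a final one that returns an empty array and triggers the cleanup.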


@wrjlewis

No significant changes since 1.89.0rc1. - Add Unix Socket support for HTTP Replication Listeners. [Document and provide usage instructions](https://matrix-org.github.io/synapse/v1.89/usage/configuration/config_documentation.html#listeners) for utilizing Unix sockets in Synapse. Contributed by Jason Little. ([\matrix-org#15708](matrix-org#15708), [\matrix-org#15924](matrix-org#15924)) - Allow `+` in Matrix IDs, per [MSC4009](matrix-org/matrix-spec-proposals#4009). ([\matrix-org#15911](matrix-org#15911)) - Support room version 11 from [MSC3820](matrix-org/matrix-spec-proposals#3820). ([\matrix-org#15912](matrix-org#15912)) - Allow configuring the set of workers to proxy outbound federation traffic through via `outbound_federation_restricted_to`. ([\matrix-org#15913](matrix-org#15913), [\matrix-org#15969](matrix-org#15969)) - Implement [MSC3814](matrix-org/matrix-spec-proposals#3814), dehydrated devices v2/shrivelled sessions and move [MSC2697](matrix-org/matrix-spec-proposals#2697) behind a config flag. Contributed by Nico from Famedly, H-Shay and poljar. ([\matrix-org#15929](matrix-org#15929)) - Fix a long-standing bug where remote invites weren't correctly pushed. ([\matrix-org#15820](matrix-org#15820)) - Fix background schema updates failing over a large upgrade gap. ([\matrix-org#15887](matrix-org#15887)) - Fix a bug introduced in 1.86.0 where Synapse starting with an empty `experimental_features` configuration setting. ([\matrix-org#15925](matrix-org#15925)) - Fixed deploy annotations in the provided Grafana dashboard config, so that it shows for any homeserver and not just matrix.org. Contributed by @wrjlewis. ([\matrix-org#15957](matrix-org#15957)) - Ensure a long state res does not starve CPU by occasionally yielding to the reactor. ([\matrix-org#15960](matrix-org#15960)) - Properly handle redactions of creation events. 
([\matrix-org#15973](matrix-org#15973)) - Fix a bug where resyncing stale device lists could block responding to federation transactions, and thus delay receiving new data from the remote server. ([\matrix-org#15975](matrix-org#15975)) - Better clarify how to run a worker instance (pass both configs). ([\matrix-org#15921](matrix-org#15921)) - Improve [the documentation](https://matrix-org.github.io/synapse/v1.89/admin_api/user_admin_api.html#login-as-a-user) for the login as a user admin API. ([\matrix-org#15938](matrix-org#15938)) - Fix broken Arch Linux package link. Contributed by @SnipeXandrej. ([\matrix-org#15981](matrix-org#15981)) - Remove support for calling the `/register` endpoint with an unspecced `user` property for application services. ([\matrix-org#15928](matrix-org#15928)) - Mark `get_user_in_directory` private since it is only used in tests. Also remove the cache from it. ([\matrix-org#15884](matrix-org#15884)) - Document which Python version runs on a given Linux distribution so we can more easily clean up later. ([\matrix-org#15909](matrix-org#15909)) - Add details to warning in log when we fail to fetch an alias. ([\matrix-org#15922](matrix-org#15922)) - Remove unneeded `__init__`. ([\matrix-org#15926](matrix-org#15926)) - Fix bug with read/write lock implementation. This is currently unused so has no observable effects. ([\matrix-org#15933](matrix-org#15933), [\matrix-org#15958](matrix-org#15958)) - Unbreak the nix development environment by pinning the Rust version to 1.70.0. ([\matrix-org#15940](matrix-org#15940)) - Update presence metrics to differentiate remote vs local users. ([\matrix-org#15952](matrix-org#15952)) - Stop reading from column `user_id` of table `profiles`. ([\matrix-org#15955](matrix-org#15955)) - Build packages for Debian Trixie. ([\matrix-org#15961](matrix-org#15961)) - Reduce the amount of state we pull out. ([\matrix-org#15968](matrix-org#15968)) - Speed up updating state in large rooms. 
([\matrix-org#15971](matrix-org#15971)) * Bump anyhow from 1.0.71 to 1.0.72. ([\matrix-org#15949](matrix-org#15949)) * Bump click from 8.1.3 to 8.1.6. ([\matrix-org#15984](matrix-org#15984)) * Bump cryptography from 41.0.1 to 41.0.2. ([\matrix-org#15943](matrix-org#15943)) * Bump jsonschema from 4.17.3 to 4.18.3. ([\matrix-org#15948](matrix-org#15948)) * Bump pillow from 9.4.0 to 10.0.0. ([\matrix-org#15986](matrix-org#15986)) * Bump prometheus-client from 0.17.0 to 0.17.1. ([\matrix-org#15945](matrix-org#15945)) * Bump pydantic from 1.10.10 to 1.10.11. ([\matrix-org#15946](matrix-org#15946)) * Bump pygithub from 1.58.2 to 1.59.0. ([\matrix-org#15834](matrix-org#15834)) * Bump pyo3-log from 0.8.2 to 0.8.3. ([\matrix-org#15951](matrix-org#15951)) * Bump sentry-sdk from 1.26.0 to 1.28.1. ([\matrix-org#15985](matrix-org#15985)) * Bump serde_json from 1.0.100 to 1.0.103. ([\matrix-org#15950](matrix-org#15950)) * Bump types-pillow from 9.5.0.4 to 10.0.0.1. ([\matrix-org#15932](matrix-org#15932)) * Bump types-requests from 2.31.0.1 to 2.31.0.2. ([\matrix-org#15983](matrix-org#15983)) * Bump typing-extensions from 4.5.0 to 4.7.1. ([\matrix-org#15947](matrix-org#15947))

This patch adds support for the endpoints used in [MSC3814]. One notable change to the MSC here is that the PUT endpoint uploads the device and one-time keys as well. [MSC3814]: matrix-org/matrix-spec-proposals#3814 Co-authored-by: Kévin Commaille <76261501+zecakeh@users.noreply.github.com>


uhoreg · 2024-02-01T19:28:51Z

proposals/3814-dehydrated-devices-with-ssss.md

+   │Curve25519 key pair     │ KeyPair       │ 64               │
+   │Number of one-time keys │ u32           │ 4                │
+   │One-time keys           │ [OneTimeKey]  │ N * 69           │
+   │Fallback keys           │ FallbackKeys  │ 2 * 69           │


I think we only need one fallback key. Normal clients store two fallback keys because they upload a new fallback key after the current fallback key is used. But with dehydrated devices, we will never upload a new fallback.

uhoreg · 2024-02-01T19:31:00Z

proposals/3814-dehydrated-devices-with-ssss.md

+   │Number of one-time keys │ u32           │ 4                │
+   │One-time keys           │ [OneTimeKey]  │ N * 69           │
+   │Fallback keys           │ FallbackKeys  │ 2 * 69           │
+   │Next key ID             │ u32           │ 4                │


Likewise, I don't think we need this because we won't upload new OTKs after we upload the dehydrated device. It also makes assumptions about the structure of the OTK IDs, which may not be true for all olm implementations.

uhoreg · 2024-02-01T19:34:15Z

proposals/3814-dehydrated-devices-with-ssss.md

+   │Key ID                  │ u32           │ 4                │
+   │Is published            │ u8            │ 1                │


I don't think we need these, because the key ID doesn't get used when decrypting events, and any OTKs that are included in the dehydrated device will surely have been published -- any unpublished OTKs should just be omitted.

uhoreg · 2024-02-01T19:34:29Z

proposals/3814-dehydrated-devices-with-ssss.md

+   ├────────────────────────┼───────────────┼──────────────────┤
+   │Key ID                  │ u32           │ 4                │
+   │Is published            │ u8            │ 1                │
+   │Curve 25519 key pair    │ KeyPair       │ 69               │


Suggested change:

-│Curve 25519 key pair │ KeyPair │ 69 │
+│Curve 25519 key pair │ KeyPair │ 64 │
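
To make the effect of these suggestions concrete, here is a small worked calculation. It covers only the fields visible in the quoted hunks (the full pickle layout is not shown here), and it assumes a slimmed fallback key entry would be the same 64 bytes as a slimmed one-time key entry. The function name is hypothetical.

```python
KEY_PAIR = 64  # Curve25519 key pair, in bytes (from the quoted table)
U32 = 4
U8 = 1


def quoted_fields_size(n_otks: int, *, slim: bool) -> int:
    """Byte size of the quoted fields, before or after the suggested removals."""
    # Per-key entry: key ID (u32) + is-published (u8) + key pair = 69 bytes,
    # or just the 64-byte key pair if the first two fields are dropped.
    entry = KEY_PAIR if slim else U32 + U8 + KEY_PAIR
    fallback_keys = 1 if slim else 2
    next_key_id = 0 if slim else U32
    return (
        KEY_PAIR                  # account Curve25519 key pair
        + U32                     # number of one-time keys
        + n_otks * entry          # one-time keys
        + fallback_keys * entry   # fallback key(s)
        + next_key_id             # next key ID
    )


before = quoted_fields_size(50, slim=False)
after = quoted_fields_size(50, slim=True)
```

For 50 one-time keys this gives 3660 bytes before and 3332 bytes after, so the saving is modest; the stronger argument in the comments is that the dropped fields are never used.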

uhoreg · 2024-02-02T22:50:50Z

proposals/3814-dehydrated-devices-with-ssss.md

+If no dehydrated device is available, the server responds with an error code of
+`M_NOT_FOUND`, HTTP code 404.
+
+If the client is able to decrypt the data and wants to use the dehydrated


I wonder if we can say something like: the server is allowed to discard any non-m.room.encrypted to-device message that it receives for the dehydrated device. There's no point in keeping key requests sent to the dehydrated device because it won't send anything back.
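
A minimal sketch of such a server-side filter, assuming to-device messages are represented as dicts with a `type` field; the function name is hypothetical.

```python
def should_store_for_dehydrated_device(event: dict) -> bool:
    """Keep only encrypted to-device messages for a dehydrated device."""
    # Key requests and other plaintext to-device messages will never be
    # answered by a dehydrated device, so storing them only wastes space.
    return event.get("type") == "m.room.encrypted"


incoming = [
    {"type": "m.room.encrypted", "content": {}},
    {"type": "m.room_key_request", "content": {}},
]
stored = [e for e in incoming if should_store_for_dehydrated_device(e)]
```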

uhoreg · 2024-02-07T02:53:29Z

proposals/3814-dehydrated-devices-with-ssss.md

+will remove any previously-set dehydrated device.
+
+The client *must* use the public [Curve25519] [identity key] of the device,
+encoded as unpadded Base64, as the device ID.


Since the device ID ends up in a path parameter, we should probably make this URL-safe Base64 to avoid issues with clients failing to URL-encode the ID.

Suggested change:

-encoded as unpadded Base64, as the device ID.
+encoded as unpadded URL-safe Base64, as the device ID.

This was previously considered, but was rejected as we would end up with two different representations of the same data (as device ID and as the key), which would be confusing.
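
The concern can be reproduced with the Python standard library. Standard Base64 can emit `/` and `+`, so a device ID derived this way has to be percent-encoded before it is placed in a path. The helper names and the all-`0xff` key are made up for illustration; the path shape follows the endpoint quoted in this thread.

```python
import base64
from urllib.parse import quote


def device_id_from_key(curve25519_public_key: bytes) -> str:
    """Unpadded standard Base64, as the proposal text currently requires."""
    return base64.b64encode(curve25519_public_key).decode("ascii").rstrip("=")


def events_path(device_id: str) -> str:
    # safe="" forces "/" to be percent-encoded as well.
    return f"/dehydrated_device/{quote(device_id, safe='')}/events"


key = bytes([0xFF]) * 32            # contrived key that encodes to many "/"
device_id = device_id_from_key(key)
path = events_path(device_id)
url_safe_id = base64.urlsafe_b64encode(key).decode("ascii").rstrip("=")
```

A client that skips the `quote` step would produce a path with extra segments, which is the failure mode the suggestion tries to avoid; the URL-safe alphabet replaces `/` with `_` and `+` with `-`, at the cost of the two representations mentioned in the reply.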


uhoreg · 2024-02-08T01:47:41Z

proposals/3814-dehydrated-devices-with-ssss.md

+  "device_keys": {
+    "user_id": "<user_id>",
+    "device_id": "<device_id>",
+    "valid_until_ts": <millisecond_timestamp>,


This property isn't defined anywhere

uhoreg · 2024-02-09T04:20:59Z

proposals/3814-dehydrated-devices-with-ssss.md

+If the given `device_id` is not the dehydrated device ID, the server responds
+with an error code of `M_FORBIDDEN`, HTTP code 403.
+
+### Deleting a dehydrated device


We should probably specify what happens when we use POST /delete_devices and DELETE /devices/{deviceId} on the dehydrated device. Also POST /logout/all. Presumably those would delete the dehydrated device. So maybe we don't need our own DELETE endpoint here? Though this endpoint allows you to delete the dehydrated device without knowing the device ID, so might still be useful.

uhoreg · 2024-02-09T23:52:47Z

proposals/3814-dehydrated-devices-with-ssss.md

+TODO: Explain why the double derivation is necessary.
+
+The encryption key used for the dehydrated device will be randomly generated
+and stored/shared via SSSS using the name `m.dehydrated_device`.


If I'm reading the iOS implementation correctly, the key is encoded with unpadded base64 (as is done with the other keys in secret storage).


richvdh · 2024-03-27T11:05:57Z

proposals/3814-dehydrated-devices-with-ssss.md

+For the last batch of messages, the server will still send a
+`next_batch` token, and return an empty `events` array when called with that


This is not what https://spec.matrix.org/v1.9/appendices/#pagination says should happen. I'd like to see this changed before this stabilises.

The original reason why this endpoint used an empty events array to signal the end was that the client using the final next_batch token would signal to the server that it could delete the to-device messages. Since we aren't doing that any more, we can make it work like the appendix says it should.
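
For comparison, a client loop under the convention where the end of pagination is signalled by the absence of a token might look like the sketch below. The helper names and the fake paging function are hypothetical; only `events` and `next_batch` come from the proposal text.

```python
def fetch_until_token_absent(fetch_page):
    """Collect events until a response carries no next_batch token."""
    events, token, requests = [], None, 0
    while True:
        page = fetch_page(token)
        requests += 1
        events.extend(page["events"])
        token = page.get("next_batch")
        if token is None:
            return events, requests


def make_pages(messages, batch_size):
    """Fake endpoint that omits next_batch on the final non-empty page."""

    def fetch_page(token):
        start = int(token) if token else 0
        end = start + batch_size
        page = {"events": messages[start:end]}
        if end < len(messages):
            page["next_batch"] = str(end)
        return page

    return fetch_page


events, requests = fetch_until_token_absent(make_pages(["m1", "m2", "m3"], 2))
```

Here three messages take two requests, because the client no longer has to make a trailing request whose only purpose was to tell the server it could clean up.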

richvdh · 2024-05-13T20:09:07Z

proposals/3814-dehydrated-devices-with-ssss.md

+To reduce the chances of one-time key exhaustion, if the user has an active
+client, it can periodically replace the dehydrated device with a new dehydrated
+device with new one-time keys.  If a client does this, then it runs the risk of
+losing any megolm keys that were sent to the dehydrated device, but the client
+would likely have received those megolm keys itself.


Are we doing this [replacing the dehydrated device periodically] or not?

It seems like both have serious downsides. If we do replace it, we have a very racy operation that is certain to cause UTDs in practice. If we don't replace it, then we'll end up with no remaining OTKs at all, and an incredibly long list of to-device messages all of which have to be downloaded and decrypted by any new clients.

cvwright · 2024-05-21T20:09:47Z

proposals/3814-dehydrated-devices-with-ssss.md

+```math
+\begin{aligned}
+    DEVICE\_KEY
+    &= \text{HKDF} \left(\text{``Device ID``}, RANDOM\_KEY, \text{``dehydrated-device-pickle-key"}, 32\right)
+\end{aligned}
+```
+
+The `device_key` is then further expanded into a AES256 key, HMAC key and
+initialization vector.
+
+
+```math
+\begin{aligned}
+    AES\_KEY \parallel HMAC\_KEY \parallel AES\_IV
+    &= \text{HKDF}\left(0,DEVICE\_KEY,\text{``Pickle"},80\right)
+\end{aligned}
+```
+
+The plain-text is encrypted with [AES-256] in [CBC] mode with [PKCS#7] padding,
+using the key $`AES\_KEY`$ and the IV $`AES\_IV`$ to give the cipher-text.
+
+Then the cipher-text are passed through [HMAC-SHA-256]. The first 8 bytes of the
+MAC are appended to the cipher-text.
+
+The cipher-text, including the appended MAC tag, are encoded using unpadded
+Base64 to give the device pickle.


Why is this key derivation so different from the existing symmetric scheme for encrypting secrets in secret storage? I'm looking at https://spec.matrix.org/v1.10/client-server-api/#msecret_storagev1aes-hmac-sha2-1

If the existing m.secret_storage.v1.aes-hmac-sha2 is not good enough for this key, then why are we still using it for the other secrets?

If it is good enough, then why do something randomly different here?

IMO if there's going to be a new scheme for encrypting secrets in secret storage, then that should be a new MSC all on its own. Otherwise it feels like meaningless duplication of effort, and more chance for implementations to make a stupid mistake.

As you have noticed, this is a scheme already used in the stack, it's used by Olm/Megolm. The libolm pickle format is using it as well.

Not that I'm disagreeing with you on your other points, but the scheme isn't randomly different.
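
For readers following this thread, the derivation quoted in the diff above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reference implementation: it assumes HKDF is instantiated with SHA-256, that the `0` salt means 32 zero bytes, and that the HKDF arguments are ordered (salt, input key, info, length). The function names `hkdf_sha256`, `derive_pickle_keys` and `finish_pickle` are hypothetical. The AES-256-CBC step is left out because the Python standard library has no AES, so `finish_pickle` takes an already-encrypted cipher-text.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_pickle_keys(random_key: bytes, device_id: str):
    """Derive (AES_KEY, HMAC_KEY, AES_IV) as in the quoted formulas."""
    # DEVICE_KEY = HKDF("Device ID", RANDOM_KEY, "dehydrated-device-pickle-key", 32)
    device_key = hkdf_sha256(
        device_id.encode(), random_key, b"dehydrated-device-pickle-key", 32
    )
    # AES_KEY || HMAC_KEY || AES_IV = HKDF(0, DEVICE_KEY, "Pickle", 80)
    # Assumption: the salt "0" is 32 zero bytes.
    okm = hkdf_sha256(bytes(32), device_key, b"Pickle", 80)
    return okm[:32], okm[32:64], okm[64:80]


def finish_pickle(ciphertext: bytes, hmac_key: bytes) -> str:
    """Append the first 8 bytes of the HMAC-SHA-256 tag, then encode as unpadded Base64."""
    tag = hmac.new(hmac_key, ciphertext, hashlib.sha256).digest()[:8]
    return base64.b64encode(ciphertext + tag).decode().rstrip("=")
```

Note that everything here, including the IV, is a pure function of `RANDOM_KEY` and the device ID, which is what the next review comment asks about.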

cvwright · 2024-05-21T20:26:29Z

proposals/3814-dehydrated-devices-with-ssss.md

+    AES\_KEY \parallel HMAC\_KEY \parallel AES\_IV
+    &= \text{HKDF}\left(0,DEVICE\_KEY,\text{``Pickle"},80\right)


Why is the IV not random here?

Is there a reason why this encryption needs to be deterministic? I don't think there is. But even if so, why not use a well known deterministic encryption mode like SIV?

Otherwise this looks pretty sketchy. In practice it's probably not going to lead to an exploit, but it should probably get flagged in a security audit.

Ok I think I see what's going on. Based on this comment https://github.com/matrix-org/matrix-rust-sdk/blob/794b11a0cecd2fab8e43931639eba5212fb89921/crates/matrix-sdk-crypto/src/dehydrated_devices.rs#L347 you're using the libolm pickle encryption scheme?

But given the warning in that Rust comment, it sounds like the potential for IV reuse here is a real problem. If an application fails to generate a new random key when it creates a new dehydrated device, then you're going to reuse the IV.

So I guess this just brings me back to my question below: Why not just use the existing m.secret_storage.v1.aes-hmac-sha2 construction here?
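
To make the determinism concern concrete, here is a small sketch showing that the derived IV depends only on the random key and the device ID. It assumes HKDF-SHA-256 and a salt of 32 zero bytes for the second derivation, neither of which is spelled out in the quoted hunk, and `derive_iv` is a hypothetical helper name. It says nothing about whether both inputs can actually repeat in a real client; it only shows what happens if they do.

```python
import hashlib
import hmac
import os


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_iv(random_key: bytes, device_id: str) -> bytes:
    """Return AES_IV, the last 16 of the 80 bytes expanded from DEVICE_KEY."""
    device_key = hkdf_sha256(
        device_id.encode(), random_key, b"dehydrated-device-pickle-key", 32
    )
    return hkdf_sha256(bytes(32), device_key, b"Pickle", 80)[64:80]


random_key = os.urandom(32)

# Same key and same device ID: the IV (and the AES key) repeat exactly.
iv_repeats = derive_iv(random_key, "DEVICE_A") == derive_iv(random_key, "DEVICE_A")

# Changing either input yields an unrelated IV.
iv_changes_with_id = derive_iv(random_key, "DEVICE_A") != derive_iv(random_key, "DEVICE_B")
iv_changes_with_key = derive_iv(random_key, "DEVICE_A") != derive_iv(os.urandom(32), "DEVICE_A")
```

A scheme with a randomly generated IV would not have this property, at the cost of having to store the IV alongside the cipher-text.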

Initial proposal for dehydrated devices with SSSS

80243a4

uhoreg changed the title ~~MSCxxxx: Dehydrated devices with SSSS~~ MSC3814: Dehydrated devices with SSSS May 12, 2022

use MSC number

ed2c5eb

uhoreg changed the title ~~MSC3814: Dehydrated devices with SSSS~~ [WIP] MSC3814: Dehydrated devices with SSSS May 12, 2022

uhoreg added the e2e, proposal, kind:feature, and needs-implementation labels May 12, 2022

dkasak reviewed May 13, 2022

proposals/3814-dehydrated-devices-with-ssss.md (outdated, resolved)

BillCarsonFr mentioned this pull request May 20, 2022

Dehydrated device Settings element-hq/element-meta#278

Open

This was referenced Jul 1, 2022

Dehydrated devices with SSSS element-hq/element-ios#6366

Open

Dehydrated devices with SSSS element-hq/element-android#6435

Open

Dehydrated devices with SSSS element-hq/element-web#22711

Closed

nico-famedly reviewed Aug 17, 2022

nico-famedly reviewed Aug 18, 2022

proposals/3814-dehydrated-devices-with-ssss.md (resolved)

nico-famedly reviewed Aug 22, 2022

proposals/3814-dehydrated-devices-with-ssss.md (outdated, resolved)

nico-famedly mentioned this pull request Aug 22, 2022

Support MSC3814: Dehydrated devices v2 aka shrivelled sessions matrix-org/synapse#13581

Closed

4 tasks

nico-famedly reviewed Aug 22, 2022

proposals/3814-dehydrated-devices-with-ssss.md (resolved)

wording improvements and clarifications

703281e

opusforlife2 mentioned this pull request Oct 21, 2022

[WIP] MSC2697: Device dehydration #2697

Closed

uhoreg mentioned this pull request Feb 8, 2023

Certain messages are not available after re-login element-hq/element-web#23381

Closed

BillCarsonFr reviewed Jul 3, 2023

H-Shay mentioned this pull request Jul 12, 2023

Support MSC3814: Dehydrated Devices matrix-org/synapse#15929

Merged

clokep reviewed Jul 14, 2023

proposals/3814-dehydrated-devices-with-ssss.md (outdated, resolved)

poljar mentioned this pull request Aug 4, 2023

Support MSC3814: Dehydrated Devices Part 2 matrix-org/synapse#16010

Merged

poljar added 7 commits August 9, 2023 14:49

Uploading a dehydrated device now uploads the public keys as well

0a149c5

Make the next_batch token non-optional in the response

a4e87a6

Let's not delete to-device events when a client receives them

3827bc0

Introduce the DELETE endpoint

12acd43

Attempt to define the dehydration format

f756db3

Don't use operatorname, try to unwedge the Latex

6223db4

More Latex tweaks

e3c9ac8

poljar added 3 commits September 5, 2023 10:46

Remove the bytes unit from every single row, put it in the header

7f24f0d

Attempt to fix the math rendering

f85c18d

Align the table headers for the pickle format

4954c27

uhoreg commented Feb 1, 2024

uhoreg commented Feb 2, 2024

uhoreg commented Feb 7, 2024

uhoreg commented Feb 8, 2024

proposals/3814-dehydrated-devices-with-ssss.md (outdated, resolved)

Fix JSON example

087154a

uhoreg commented Feb 8, 2024

uhoreg commented Feb 9, 2024

link to fallback key spec

e7c8266

uhoreg commented Feb 15, 2024

proposals/3814-dehydrated-devices-with-ssss.md (resolved)

add dehydrated flag

cf5ae99

richvdh reviewed Mar 27, 2024

andybalaam mentioned this pull request Mar 27, 2024

Add support for device dehydration v2 (Element R) matrix-org/matrix-js-sdk#4062

Merged

richvdh reviewed May 13, 2024

cvwright reviewed May 21, 2024

BillCarsonFr mentioned this pull request Jun 5, 2024

Fix | Share room keys with dehydrated devices with rust stack matrix-org/matrix-ios-sdk#1858

Merged

3 tasks



uhoreg commented May 12, 2022 (edited)