MSC3270: Symmetric megolm backup #3270
# MSC3270: Symmetric megolm backup
The current megolm backup uses asymmetric encryption. This was chosen so that
clients without the private key can still add their own megolm sessions to the
backup. This, however, allows a homeserver admin to inject their own malicious
megolm session into someone’s backup and then send an encrypted message as a user
that they wish to impersonate. Due to this, some clients such as Element [warn the
user](https://github.com/vector-im/element-web/issues/14323#issuecomment-740855963)
that a message cannot be authenticated when the megolm session for that
message was obtained from backup.

Using symmetric encryption for megolm backup would fix this attack vector,
since keys added by untrusted devices would be undecryptable, thus allowing keys
obtained from backup to be trusted. Additionally, many clients cache the
megolm private key anyway, making the original reason for choosing asymmetric
encryption obsolete.

**Credits:** This proposal was originally written by @sorunome.

## Proposal
This proposal introduces a new megolm backup algorithm, `m.megolm_backup.v1.aes-hmac-sha2`.

> This stays as v1, because it uses the same API, only the algorithm is different? (I would have expected a v2, but it doesn't really seem necessary, just surprising!)

> My rationale is that it's v1 of the … I'm open to being persuaded that it should be different.

> I don't really see a reason for v2, apart from being able to distinguish the backup format the key name. (See other thread)

> I see a couple of reasons:

> Soru commented:
> (It's too bad GitHub doesn't automatically put email replies in the right thread.)

> I'm not sure that it's that important that people can easily tell how many versions there are. Especially since the old algorithm will eventually be deprecated and (hopefully eventually) will be able to be ignored. So in practice we'll end up with just one version in use. Also, I don't think that we would refer to backup algorithms by number -- I think that it would be more common to refer to them by name, or as "symmetric" versus "asymmetric". And I think it would be confusing to have two different ways (number and name) to distinguish the two algorithms embedded in the same string. If people don't like the …

> I agree with this and I think it does have merit to name things in a way that makes it easy to refer to them, both from code (in the form of method names) and in casual conversation. So I'm wondering if it's even important to include the crypto construct name in the version at all? It's clunky to spell out so it's unlikely to ever get used in conversation, and then you have to mentally translate between … Barring that, I would prefer moving the version number to the end.

> We generally try to include the algorithm names, as explained in https://spec.matrix.org/unstable/client-server-api/#messaging-algorithm-names

The existing backup algorithm `m.megolm_backup.v1.curve25519-aes-sha2`
is deprecated.

### `session_data`
The session data of the megolm backup is an object containing the `iv` property
and the `ciphertext` property. The process for generating these from the megolm
backup key is described below. The data to be encrypted is the same JSON-encoded
data as in `m.megolm_backup.v1.curve25519-aes-sha2`.

As such, a complete `KeyBackupData` object could look as follows:

```json
{
  "first_message_index": 0,
  "forwarded_count": 0,
  "is_verified": true,
  "session_data": {
    "iv": "cL/0MJZaiEd3fNU+I9oJrw",
    "ciphertext": "WL73Pzdk5wZdaaSpaeRH0uZYKcxkuV8IS6Qa2FEfA1+vMeRLuHcWlXbMX0w"
  }
}
```
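
Binary fields such as `iv` and `ciphertext` are encoded as unpadded base64, while Python's `base64` module expects padding. The following minimal sketch, with hypothetical helper names, shows one way to decode the `session_data` fields of a `KeyBackupData` object like the one above and to check that the IV has the 16 bytes called for in step 3 of the encryption process.

```python
import base64
import json


def unpadded_b64encode(data: bytes) -> str:
    """Encode bytes as standard base64 with the trailing '=' padding removed."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def unpadded_b64decode(text: str) -> bytes:
    """Decode unpadded base64 by restoring the '=' padding Python requires."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


def parse_session_data(key_backup_data: dict) -> tuple[bytes, bytes]:
    """Return the raw (iv, ciphertext) bytes from a KeyBackupData object.

    Raises ValueError if the IV is not 16 bytes long.
    """
    session_data = key_backup_data["session_data"]
    iv = unpadded_b64decode(session_data["iv"])
    ciphertext = unpadded_b64decode(session_data["ciphertext"])
    if len(iv) != 16:
        raise ValueError(f"expected a 16-byte IV, got {len(iv)} bytes")
    return iv, ciphertext


example = json.loads("""
{
  "first_message_index": 0,
  "forwarded_count": 0,
  "is_verified": true,
  "session_data": {
    "iv": "cL/0MJZaiEd3fNU+I9oJrw",
    "ciphertext": "WL73Pzdk5wZdaaSpaeRH0uZYKcxkuV8IS6Qa2FEfA1+vMeRLuHcWlXbMX0w"
  }
}
""")
iv, ciphertext = parse_session_data(example)
```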

### Encryption
Room keys are encrypted with AES-GCM using the following process:

1. Encode the session key to be backed up as a JSON object with the same
   properties as with `m.megolm_backup.v1.curve25519-aes-sha2`, with the
   addition of an optional property `untrusted`, which is a boolean indicating
   whether the Megolm session should be considered as untrusted, for example
   because it came from an untrusted source. If the `untrusted` property is
   absent, the key should be considered as trusted.
2. Given the megolm backup key, generate 32 bytes by performing an HKDF with
   SHA-256 as the hash, a salt of 32 bytes of 0, and with
   `<session ID>|<backup version>` as the info. This is the AES encryption key.
3. Generate 16 random bytes, and use this as the AES initialization
   vector. This becomes the `iv` property, encoded using unpadded base64.
4. Stringify the JSON object and encrypt it with AES-GCM-256, using the AES key
   and initialization vector generated above. This encrypted data, encoded
   using unpadded base64, becomes the `ciphertext` property.

> As I discovered when writing matrix-org/matrix-spec#1294, the spec uses the term "trusted" to mean different things in different contexts, so we may want to consider using a different term. Suggestions welcome.

> Could we maybe store what the session was signed by?
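
Step 2 uses HKDF as defined in RFC 5869. A minimal, standard-library-only sketch of that derivation might look like the following. The function names are hypothetical, and the sketch assumes the info string is the UTF-8 encoding of `<session ID>|<backup version>`. RFC 5869 treats a missing salt as a string of zero bytes as long as the hash output (32 bytes for SHA-256), which is exactly the salt used here, so the RFC's no-salt test vector also exercises this configuration.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output length too large")
    # Extract: PRK = HMAC-SHA256(key=salt, msg=IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i), concatenated
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_backup_aes_key(backup_key: bytes, session_id: str, backup_version: str) -> bytes:
    """Derive the 32-byte AES key for one session (step 2).

    Assumption: info is the UTF-8 encoding of "<session ID>|<backup version>".
    """
    info = f"{session_id}|{backup_version}".encode("utf-8")
    return hkdf_sha256(backup_key, b"\x00" * HASH_LEN, info, 32)
```

Including the session ID and backup version in the info string means each session in each backup version gets its own AES key.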

### `auth_data`
Similar to symmetric SSSS, the `auth_data` object in the `versions` reply
contains information to verify that you indeed have the correct backup
key. To produce it, generate a random initialization vector, encrypt the
empty string using the method above, and put the resulting `iv` and `ciphertext`
into the `auth_data` object. When deriving the AES key for this, an empty string
is used for both the session ID and the backup version. As such, a reply to the
`versions` endpoint could look as follows:

```json
{
  "algorithm": "m.megolm_backup.v1.aes-hmac-sha2",
  "auth_data": {
    "iv": "cL/0MJZaiEd3fNU+I9oJrw",
    "ciphertext": "+xozp909S6oDX8KRV8D8ZFVRyh7eEYQpPP76f+DOsnw"
  },
  "count": 42,
  "etag": "meow",
  "version": "foxies"
}
```

### Secret key storage
The secret key for the symmetric megolm backup is stored in SSSS, base64-encoded,
under the name `m.megolm_backup.v1.<backup version>`.
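
Because the secret name embeds the backup version, each backup version has its own SSSS entry. The minimal sketch below, with hypothetical function names, computes the name and encoded value for a given backup key. It assumes that "base64-encoded" means unpadded base64, the usual Matrix encoding. Encrypting the value with the SSSS key and uploading it are out of scope.

```python
import base64


def backup_key_secret(backup_key: bytes, backup_version: str) -> tuple[str, str]:
    """Return the (SSSS secret name, encoded value) pair for a symmetric backup key.

    Assumption: the value is unpadded base64. SSSS encryption is not shown.
    """
    name = f"m.megolm_backup.v1.{backup_version}"
    value = base64.b64encode(backup_key).decode("ascii").rstrip("=")
    return name, value


def backup_key_from_secret(value: str) -> bytes:
    """Decode a stored secret value back into the raw backup key."""
    return base64.b64decode(value + "=" * (-len(value) % 4))
```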

### Transitioning to symmetric backup

Clients that implement this backup method should consider older clients,
which will not be able to use backups created using this method. For example,
clients can initially be updated so that they will be able to use backups
created using this method, but not yet create new backups using this method.
Some time later, once the client author deems that a sufficient number of
clients have been updated to use the new backup method, the client can be
modified such that new backups are created using this method.

## Potential issues
Many users already have an asymmetric megolm backup and rely on it. To avoid
losing any megolm keys, clients would have to implement a migration to
symmetric megolm backup. When doing so, the existing keys should be marked as
untrusted, because the asymmetric backup cannot show who uploaded them.
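
The marking step of such a migration could look like the minimal sketch below. It assumes, hypothetically, that the client has already downloaded and decrypted the old backup into a mapping of session ID to JSON object; each migrated session would then be re-encrypted using the process in the Encryption section. The function name is hypothetical.

```python
import copy


def mark_migrated_sessions_untrusted(old_sessions: dict[str, dict]) -> dict[str, dict]:
    """Return copies of sessions from an asymmetric backup with `untrusted` set.

    `old_sessions` is a hypothetical mapping of session ID to the decrypted JSON
    object recovered from an `m.megolm_backup.v1.curve25519-aes-sha2` backup.
    The input mapping is left unmodified.
    """
    migrated = {}
    for session_id, session in old_sessions.items():
        new_session = copy.deepcopy(session)
        # Anyone holding the old backup's public key could have uploaded this
        # session, so its origin cannot be vouched for.
        new_session["untrusted"] = True
        migrated[session_id] = new_session
    return migrated
```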

## Alternatives
An earlier version of this MSC used the same encryption as is currently used in
[symmetric SSSS](https://github.com/matrix-org/matrix-spec-proposals/pull/2472).
However, it was
[later](https://matrix.org/blog/2022/09/28/upgrade-now-to-address-encryption-vulns-in-matrix-sdks-and-clients)
noted that this method is not "IND-CCA2 secure". As a result, it now uses
AES-GCM-256 instead.

The authenticity of keys could be established in backups using the
`m.megolm_backup.v1.curve25519-aes-sha2` algorithm by adding a signature or a
MAC. However, this would require managing another key for the key backup. It
is simpler for clients to manage only one key.

## Security considerations
The proposed method requires clients to cache the encryption key for the key
backup. This means that an attacker who compromises a client that uses the key
backup will have access to all the keys stored in the backup. Since most
clients already cache the decryption key for current backups, this is not a
change from current practice.

> While I don't disagree with this, I don't think that current implementation practices should legitimize increasing the impact of a client being compromised at the spec level.

> The current (unstable) spec with asymmetric megolm backup introduces a security issue (not because of the specific asymmetric implementation, but because of using asymmetric cryptography in a place where symmetric cryptography is more appropriate), which led to this MSC, as outlined in the first paragraph. So, we are closing one bigger security issue with this. Clients who don't want to cache the key could always have the user manually input it for a one-shot sync or similar.

> I think the risks here depend on your threat model. If you trust your server admin not to inject malicious history (but you want the legit history to be as secure as possible), then using a client with asymmetric crypto which doesn’t cache the megolm key may be most appropriate. The use case here might be a government server or something where you assume the server is generally trusted but you still don’t want your messages to be exposed if the server was compromised, and are happy for messages whose keys came from message backup to say “untrusted”. Conversely, if you are a random user on a relatively untrusted server (e.g. using a public hosting solution), you might be very worried about the server sending you faked history when you log in on a new device (e.g. some malware attachment which appears to come from a verified user but was actually inserted by the server admin).
>
> In which case, the risk of an endpoint attack stealing your symmetric backup key might be less than the risk of a malicious server admin attack, making this MSC worthwhile. I feel pretty uncomfortable that after all the effort the double ratchet takes to mitigate the risk of key exfiltration by having a self-healing ratchet, we end up a) storing all our megolm keys locally anyway, b) storing our megolm backup key locally too. That said, given the megolm keys are stored locally anyway… perhaps storing the backup key is okay. (At which point, why are we even using the double ratchet in the first place?)

> If you are storing the backup key and all sessions you know about locally, I think symmetric backup is fine. If you don't want past messages to get compromised, wouldn't you rather just not use any backup at all and as soon as you log out, all your history is burned and lost? That gives you the best forward secrecy. Asymmetric backup is somewhere in between, where only the device with the decryption key can decrypt the past messages, while other devices can only upload keys into the backup. But that backup can still get compromised; you could just theoretically require the client to always prompt for a password to decrypt past messages. That is however weakened by other clients readily sharing the backup key if a verified device asks for it. So really, I think the best approach here is to disable the online key backup and rely on key gossiping instead to access past messages. Because then at least you always have the control of burning old decryption keys. Once you upload a backup, that is always there and once someone gets access to the key, those messages are all forever compromised.

To migrate an existing backup to the new method, clients will need the SSSS key
to read the existing backup and to store the key for the new backup. When the
user enters the SSSS key, the client will have access to all of the other
secrets stored in SSSS. In general, most users already trust their clients
with their secrets, or could select a trusted client to perform the migration.

## Unstable prefix
While this feature is in development, implementations should use
`org.matrix.msc3270.v2.aes-hmac-sha2` as the backup algorithm.

`org.matrix.msc3270.v1.aes-hmac-sha2` was used in a previous version that used
AES-CTR-256 for encryption.
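
A client that recognises both the unstable and the eventual stable identifiers might classify the `algorithm` field of a `versions` reply as in the minimal sketch below; the function name and return labels are hypothetical. The point it illustrates is that `org.matrix.msc3270.v1.aes-hmac-sha2` must not be handled as the AES-GCM format, since that earlier version used AES-CTR-256.

```python
STABLE_SYMMETRIC = "m.megolm_backup.v1.aes-hmac-sha2"
UNSTABLE_SYMMETRIC = "org.matrix.msc3270.v2.aes-hmac-sha2"
OLD_UNSTABLE_AES_CTR = "org.matrix.msc3270.v1.aes-hmac-sha2"
ASYMMETRIC = "m.megolm_backup.v1.curve25519-aes-sha2"


def backup_format(algorithm: str) -> str:
    """Classify a backup `algorithm` string (labels are hypothetical)."""
    if algorithm in (STABLE_SYMMETRIC, UNSTABLE_SYMMETRIC):
        return "symmetric-aes-gcm"
    if algorithm == OLD_UNSTABLE_AES_CTR:
        # Earlier unstable format: AES-CTR-256, not AES-GCM.
        return "symmetric-aes-ctr-legacy"
    if algorithm == ASYMMETRIC:
        return "asymmetric-deprecated"
    return "unsupported"
```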

> The implementation details of some clients don't make "the original factor for choosing asymmetric encryption obsolete" at the spec level. As long as a user doesn't use such a client, this doesn't affect them at all.

> The thought behind this paragraph was probably that some x-signing keys typically need to be cached anyway, so it is trivial to toss the megolm key into said cache. Online megolm backup predates x-signing.