Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3270: Symmetric megolm backup #3270

Closed
wants to merge 9 commits into from
Closed
129 changes: 129 additions & 0 deletions proposals/3270-symmetric-megolm-backup.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
# MSC3270: Symmetric megolm backup
The current megolm backup use asymmetric encryption. This was chosen so that
clients without the private key can still add their own megolm sessions to the
backup. This, however, allows a homeserver admin to inject their own malicious
megolm session into someones backup and then send an encrypted message as that
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
user. Due to this, some clients such as Element [warn the
user](https://github.com/vector-im/element-web/issues/14323#issuecomment-740855963)
that a message cannot be authenticated when the megolm session for that
message was obtained from backup.

Using symmetric encryption for megolm backup would fix this attack vector,
since keys added by untrusted devices be undecryptable, thus allowing keys
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
obtained from backup to be trusted. Additionally, many clients cache the
megolm private key anyways, making the original factor for choosing asymmetric
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
encryption obsolete.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation details of some clients don't make "the original factor for choosing asymmetric
encryption obsolete" at the spec level. As long as a user doesn't use such a client, this doesn't affect them at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought behind this paragraph was probably that some x-signing keys typically need to be cached anyways so it is trivial to toss the megolm key into said cache. online megolm backup outdates x-signing.


**Credits:** This proposal was originally written by @sorunome.

## Proposal
This proposal introduces a new megolm backup algorithm, `m.megolm_backup.v1.aes-hmac-sha2`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stays as v1, because it uses the same API, only the algorithm is different? (I would have expected a v2, but it doesn't really seem necessary, just surprising!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale is that it's v1 of the aes-hmac-sha2 algorithm. We probably should have put the version number after the name originally (so that we would end up with m.megolm_backup.aes-hmac-sha2.v1), but we did it this way to begin with.

I'm open to being persuaded that it should be different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a reason for v2, apart from being able to distinguish the backup format the key name. (See other thread)

Copy link
Contributor

@poljar poljar Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a couple of reasons:

  1. Newcomers can quickly know how many versions there were in total.
  2. It's not ambiguous which backup algorithm I mean if I just say the version number N.
  3. Since it's unlikely that the algorithm won't change when the version number changes we're going to have only v1 algorithms, which renders the version number pointless.
  4. Since the key itself didn't change but only the way we use it has, we can have a RecoveryKey struct that now has a decrypt_v1() and decrypt_v2() method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soru commented:

Fair point, calling it v2 will be trivial

(It's too bad GitHub doesn't automatically put email replies in the right thread.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's that important that people can easily tell how many versions there are. Especially since the old algorithm will eventually be deprecated and (hopefully eventually) will be able to be ignored. So in practice we'll end up with just one version in use.

Also, I don't think that we would refer to backup algorithms by number -- I think that it would be more common to refer to them by name, or as "symmetric" versus "asymmetric". And I think it would be confusing to have two different ways (number and name) to distinguish the two algorithms embedded in the same string.

If people don't like the m.megolm_backup.v1.aes-hmac-sha2, my current first preference would be switching to m.megolm_backup.aes-hmac-sha2.v1 (so that it's clearer that we're talking about version 1 of the aes-hmac-sha2 method, rather than version 1 of megolm backup) and my second preference would be to drop the number altogether (m.megolm_backup.aes-hmac-sha2) so that we avoid the confusion altogether.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be more common to refer to them by name, or as "symmetric" versus "asymmetric".

I agree with this and I think it does have merit to name things in a way that makes it easy to refer to them, both from code (in the form of method names) and in casual conversation.

So I'm wondering if it's even important to include the crypto construct name in the version at all? It's clunky to spell out so it's unlikely to ever get used in conversation, and then you have to mentally translate between symmetric and aes-hmac-sha2. Would it perhaps make more sense to name it m.megolm_backup.symmetric.v1?

Barring that, I would prefer moving the version number to the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally try to include the algorithm names, as explained in https://spec.matrix.org/unstable/client-server-api/#messaging-algorithm-names

It works basically the same as the encryption in symmetric SSSS. The backup method `m.megolm_backup.v1.curve25519-aes-sha2`
is deprecated.

### `session_data`
The session data of the megolm backup is an object containing the `iv` property, the `ciphertext`
property and the `mac` property. Below is described how to generate these from the megolm backup key.
The data to be encoded is the same JSON-encoded data as in `m.megolm_backup.v1.curve25519-aes-sha2`.

As such, a complete `KeyBackupData` object could look as follows:

```json
{
"first_message_index": 0,
"forwarded_count": 0,
"is_verified": true,
"session_data": {
"iv": "cL/0MJZaiEd3fNU+I9oJrw==",
"ciphertext": "WL73Pzdk5wZdaaSpaeRH0uZYKcxkuV8IS6Qa2FEfA1+vMeRLuHcWlXbMX0w=",
"mac": "+xozp909S6oDX8KRV8D8ZFVRyh7eEYQpPP76f+DOsnw="
}
}
```

### Encryption
Encryption works the same as asymmetric SSSS (as such, this section is in parts taken off of the unstable
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
spec). The secret name in symmetric SSSS is replaced with the session ID. They are encrypted using
AES-CTR-256 and authenticated using HMAC-SHA-256. The process works as follows:

1. Encode the session key to be backed up as a JSON object with the same
properties as with `m.megolm_backup.v1.curve25519-aes-sha2`, with the
addition of an optional property `untrusted`, which is a boolean indicating
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I discovered when writing matrix-org/matrix-spec#1294, the spec uses the term "trusted" to mean different things is different contexts, so we may want to consider using a different term. Suggestions welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe store what the session was signed by?

whether the megolm session should be considered as untrusted, for example
because it came from an untrusted source. If the `untrusted` property is
absent, the key should be considered as trusted.
2. Given the megolm backup key, generate 64 bytes by performing an HKDF with SHA-256 as the hash, a
salt of 32 bytes of 0, and with the session ID as the info. The first 32 bytes are used as the AES
key, the next 32 bytes are used as the MAC key.
3. Generate 16 random bytes, set bit 63 to 0 (in order to work around differences in AES-CTR implementations),
and use this as the AES initialization vector. This becomes the `iv` property, encoded using base64.
4. Stringify the JSON object and encrypt it using AES-CTR-256 using the AES key
generated above. This encrypted data, encoded using base64, becomes the
`ciphertext` property.
5. Pass the raw encrypted data (prior to base64 encoding) through HMAC-SHA-256 using the MAC generated
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
above. The resulting MAC is base64-encoded and becomes the `mac` property.

### `auth_data`
Similar to symmetric SSSS, the `auth_data` object in the `versions` reply contains a MAC to verify
that you indeed have the correct private key. For that, you encrypt 32 bytes of 0's using the above
mentioned algorithm and put the `iv` and the `mac` into the `auth_data` object. As name / session ID
an empty string is used. As such, a reply to the `versions` endpoint could look as follows:

```json
{
"algorithm": "m.megolm_backup.v1.aes-hmac-sha2",
"auth_data": {
"iv": "cL/0MJZaiEd3fNU+I9oJrw==",
"mac": "+xozp909S6oDX8KRV8D8ZFVRyh7eEYQpPP76f+DOsnw="
},
"count": 42,
"etag": "meow",
"version": "foxies"
}
```

### Secret key storage
The secret key for symmetric megolm is stored in SSSS base64-encoded with the name `m.megolm_backup.v1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the old encryption key be reused, that is already stored under this key? Or should a new one be generated? How does a client tell in that case, that this key is for the new or old backup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the same issue that we have when you create a new backup version -- you can't tell if the thing stored in SSSS is for the new backup or the old backup. The best we can say right now is that the key stored in SSSS should be for the current backup, which means that there is a race when you're migrating from one backup to the other.

We could fix this by saying that we should store it using the name m.megolm_backup.v1.<backup version>, but it's unclear if we should fix this as part of this MSC, or as a separate MSC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this special instance, we could also name the key v2, which would make it obvious, but we may want to reserve v2 for proper naming of the keys? Although I would prefer to only migrate all key backups once, in which case a new naming scheme may make sense being introduced in this as well.


### Transitioning to symmetric backup

Clients that implement this backup method should consider older clients,
which will not be able to use backups created using this method. For example,
clients can initially be updated so that they will be able to use backups
created using this method, but not yet create new backups using this method.
Some time later, once the client author deems that a sufficient number of
clients have been updated to use the new backup method, the client can be
modified such that new backups are created using this method.

## Potential issues
Many users already have an asymmetric megolm backup and rely on it. In order to
not lose any megolm keys, clients would have to implement a migration to
symmetric megolm backup. When doing so, the existing keys should be marked as
untrusted.

## Alternatives
Many different alternatives on how specifically to do the symmetric encryption can be thought of.
Keeping this the same as symmetric SSSS, however, gives the advantage that clients likely already
have the needed algorithms implemented and can just re-use them.

The authenticity of keys could be established in backups using the
`m.megolm_backup.v1.curve25519-aes-sha2` algorithm by adding a signature or a
MAC. However, this would require managing another key for the key backup. It
would be easier for clients to only need to manage one key.

## Security considerations
The proposed method requires clients to cache the encryption key for the key
backup. This means that an attacker who compromises a client that uses the key
backup will have access to all the key stored in the backup. Since most
clients already cache the decryption key for current backups, this is not a
change from current practice.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't disagree with this, I don't think that current implementation practices should legitimize increasing the impact of a client being compromised at the spec level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current (unstable) spec with asymmetric megolm backup introduces a security issue (not because of the specific asymmetric implementation, but because of using asymmetric cryptography in a place where symmetric one is more appropriate), which lead to this MSC, as outlined in the first paragraph.

So, we are closing one bigger security issue with this. Clients who don't want to cache the key could always have the user manually input it for a one-shot sync or similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the risks here depend on your threat model. if you trust your server admin not to inject malicious history (but you want the legit history to be as secure as possible), then using a client with assymetric crypto which doesn’t cache the megolm key may be most appropriate. The use case here might be a govt server or something where you assume the server is generally trusted but you still don’t want your messages to be exposed if the server was compromised - and are happy for msgs whose keys came from msg backup to say “untrusted”.

Conversely, if you are a random user on a relatively untrusted server (eg using a public hosting solution), you might be very worried about the server sending you faked history when you log in on a new device (eg some malware attachment which appears to come from a verified user but was actually inserted by the server admin). In which case risks of an endpoint attack stealing your symmetric backup key might be less than the risk of a malicious server admin attack, making this MSC worthwhile.

I feel pretty uncomfortable that after all the effort the double ratchet takes to mitigate the risk of key exfiltration by having a self healing ratchet, we end up a) storing all our megolm keys locally anyway, b) storing our megolm backup key locally too. That said, given the megolm keys are stored locally anyway… perhaps storing the backup key is okay. (At which point, why are we even using the double ratchet in the first place?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are storing the backup key and all sessions you know about locally, I think symmetric backup is fine. If you don't want past messages to get compromised, wouldn't you rather just not use any backup at all and as soon as you log out, all your history is burned and lost? That gives you the best forward secrecy.

Asymmetric backup is somewhere in between, where only the device with the decryption key can decrypt the past messages, while other devices can only upload keys into the backup. But that backup can still get compromised, you could just theoretically require the client to always prompt for a password to decrypt past messages. That is however weakened by other clients readily sharing the backup key, if a verified device asks for it. So really, I think the best approach here is to disable the online key backup and rely on key gossiping instead to access past messages. Because then at least you always have the control of burning old decryption keys. Once you upload a backup, that is always there and once someone gets access to the key, those messages are all forever compromised.


To migrate an existing backup to the new method, clients will need the SSSS key
to read the existing backup and to store the key for the new backup. When the
user enters the SSSS key, the client will have access to all of the other
secrets stored in SSSS. In general, most users already trust their clients
with their secrets, or could select a trusted client to perform the migration.

## Unstable prefix
While this feature is in development, implementations should use
`org.matrix.msc3270.v1.aes-hmac-sha2` as the backup version.