Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2732: Olm fallback keys #2732

Merged
merged 7 commits into from Jun 13, 2021
Merged

Conversation

uhoreg
Copy link
Member

@uhoreg uhoreg commented Aug 14, 2020

@uhoreg uhoreg changed the title MSCxxxx: Olm fallback keys MSC2732: Olm fallback keys Aug 14, 2020
@uhoreg uhoreg marked this pull request as ready for review August 14, 2020 20:46
@uhoreg uhoreg added the proposal A matrix spec change proposal label Aug 14, 2020
@turt2live turt2live added kind:feature MSC for not-core and not-maintenance stuff proposal-in-review labels Aug 14, 2020
@uhoreg
Copy link
Member Author

uhoreg commented Sep 15, 2020

are not deleted when they are returned by `/keys/claim`. However, the server
marks that they have been used.

A new response parameter, `device_unused_fallback_keys`, is added to `/sync`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean a supporting server will list your fallback key in every sync response indefinitely? It would be nice if there were some way for it to only appear when changed to avoid adding overhead to every sync response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean a supporting server will list your fallback key in every sync response indefinitely?

Yes (until it's used). This makes it easier for client authors to keep the fallback keys up to date, as they just need to check if the array contains the key type expected, and if not, the client uploads a new key. This works for both initial setup (so the client doesn't need to remember if it already uploaded a key), and ensuring that the server has a new key.

This also mirrors the existing device_one_time_keys_count property.

@uhoreg
Copy link
Member Author

uhoreg commented Oct 7, 2020

@mscbot fcp merge

@mscbot
Copy link
Collaborator

mscbot commented Oct 7, 2020

Team member @mscbot has proposed to merge this. The next step is review by the rest of the tagged people:

Concerns:

Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for information about what commands tagged team members can give me.

@mscbot mscbot added disposition-merge proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. labels Oct 7, 2020
@turt2live turt2live self-requested a review October 7, 2020 16:33
@uhoreg uhoreg added this to Ready for FCP ticks in Spec Core Team Backlog Oct 13, 2020
erikjohnston added a commit to matrix-org/synapse that referenced this pull request Oct 22, 2020
Synapse 1.22.0rc1 (2020-10-22)
==============================

Features
--------

- Add a configuration option for always using the "userinfo endpoint" for OpenID Connect. This fixes support for some identity providers, e.g. GitLab. Contributed by Benjamin Koch. ([\#7658](#7658))
- Add ability for `ThirdPartyEventRules` modules to query and manipulate whether a room is in the public rooms directory. ([\#8292](#8292), [\#8467](#8467))
- Add support for olm fallback keys ([MSC2732](matrix-org/matrix-spec-proposals#2732)). ([\#8312](#8312), [\#8501](#8501))
- Add support for running background tasks in a separate worker process. ([\#8369](#8369), [\#8458](#8458), [\#8489](#8489), [\#8513](#8513), [\#8544](#8544), [\#8599](#8599))
- Add support for device dehydration ([MSC2697](matrix-org/matrix-spec-proposals#2697)). ([\#8380](#8380))
- Add support for [MSC2409](matrix-org/matrix-spec-proposals#2409), which allows sending typing, read receipts, and presence events to appservices. ([\#8437](#8437), [\#8590](#8590))
- Change default room version to "6", per [MSC2788](matrix-org/matrix-spec-proposals#2788). ([\#8461](#8461))
- Add the ability to send non-membership events into a room via the `ModuleApi`. ([\#8479](#8479))
- Increase default upload size limit from 10M to 50M. Contributed by @Akkowicz. ([\#8502](#8502))
- Add support for modifying event content in `ThirdPartyRules` modules. ([\#8535](#8535), [\#8564](#8564))

Bugfixes
--------

- Fix a longstanding bug where invalid ignored users in account data could break clients. ([\#8454](#8454))
- Fix a bug where backfilling a room with an event that was missing the `redacts` field would break. ([\#8457](#8457))
- Don't attempt to respond to some requests if the client has already disconnected. ([\#8465](#8465))
- Fix message duplication if something goes wrong after persisting the event. ([\#8476](#8476))
- Fix incremental sync returning an incorrect `prev_batch` token in timeline section, which when used to paginate returned events that were included in the incremental sync. Broken since v0.16.0. ([\#8486](#8486))
- Expose the `uk.half-shot.msc2778.login.application_service` to clients from the login API. This feature was added in v1.21.0, but was not exposed as a potential login flow. ([\#8504](#8504))
- Fix error code for `/profile/{userId}/displayname` to be `M_BAD_JSON`. ([\#8517](#8517))
- Fix a bug introduced in v1.7.0 that could cause Synapse to insert values from non-state `m.room.retention` events into the `room_retention` database table. ([\#8527](#8527))
- Fix not sending events over federation when using sharded event writers. ([\#8536](#8536))
- Fix a long standing bug where email notifications for encrypted messages were blank. ([\#8545](#8545))
- Fix increase in the number of `There was no active span...` errors logged when using OpenTracing. ([\#8567](#8567))
- Fix a bug that prevented errors encountered during execution of the `synapse_port_db` from being correctly printed. ([\#8585](#8585))
- Fix appservice transactions to only include a maximum of 100 persistent and 100 ephemeral events. ([\#8606](#8606))

Updates to the Docker image
---------------------------

- Added multi-arch support (arm64,arm/v7) for the docker images. Contributed by @maquis196. ([\#7921](#7921))
- Add support for passing commandline args to the synapse process. Contributed by @samuel-p. ([\#8390](#8390))

Improved Documentation
----------------------

- Update the directions for using the manhole with coroutines. ([\#8462](#8462))
- Improve readme by adding new shield.io badges. ([\#8493](#8493))
- Added note about docker in manhole.md regarding which ip address to bind to. Contributed by @maquis196. ([\#8526](#8526))
- Document the new behaviour of the `allowed_lifetime_min` and `allowed_lifetime_max` settings in the room retention configuration. ([\#8529](#8529))

Deprecations and Removals
-------------------------

- Drop unused `device_max_stream_id` table. ([\#8589](#8589))

Internal Changes
----------------

- Check for unreachable code with mypy. ([\#8432](#8432))
- Add unit test for event persister sharding. ([\#8433](#8433))
- Allow events to be sent to clients sooner when using sharded event persisters. ([\#8439](#8439), [\#8488](#8488), [\#8496](#8496), [\#8499](#8499))
- Configure `public_baseurl` when using demo scripts. ([\#8443](#8443))
- Add SQL logging on queries that happen during startup. ([\#8448](#8448))
- Speed up unit tests when using PostgreSQL. ([\#8450](#8450))
- Remove redundant database loads of stream_ordering for events we already have. ([\#8452](#8452))
- Reduce inconsistencies between codepaths for membership and non-membership events. ([\#8463](#8463))
- Combine `SpamCheckerApi` with the more generic `ModuleApi`. ([\#8464](#8464))
- Additional testing for `ThirdPartyEventRules`. ([\#8468](#8468))
- Add `-d` option to `./scripts-dev/lint.sh` to lint files that have changed since the last git commit. ([\#8472](#8472))
- Unblacklist some sytests. ([\#8474](#8474))
- Include the log level in the phone home stats. ([\#8477](#8477))
- Remove outdated sphinx documentation, scripts and configuration. ([\#8480](#8480))
- Clarify error message when plugin config parsers raise an error. ([\#8492](#8492))
- Remove the deprecated `Handlers` object. ([\#8494](#8494))
- Fix a threadsafety bug in unit tests. ([\#8497](#8497))
- Add user agent to user_daily_visits table. ([\#8503](#8503))
- Add type hints to various parts of the code base. ([\#8407](#8407), [\#8505](#8505), [\#8507](#8507), [\#8547](#8547), [\#8562](#8562), [\#8609](#8609))
- Remove unused code from the test framework. ([\#8514](#8514))
- Apply some internal fixes to the `HomeServer` class to make its code more idiomatic and statically-verifiable. ([\#8515](#8515))
- Factor out common code between `RoomMemberHandler._locally_reject_invite` and `EventCreationHandler.create_event`. ([\#8537](#8537))
- Improve database performance by executing more queries without starting transactions. ([\#8542](#8542))
- Rename `Cache` to `DeferredCache`, to better reflect its purpose. ([\#8548](#8548))
- Move metric registration code down into `LruCache`. ([\#8561](#8561), [\#8591](#8591))
- Replace `DeferredCache` with the lighter-weight `LruCache` where possible. ([\#8563](#8563))
- Add virtualenv-generated folders to `.gitignore`. ([\#8566](#8566))
- Add `get_immediate` method to `DeferredCache`. ([\#8568](#8568))
- Fix mypy not properly checking across the codebase, additionally, fix a typing assertion error in `handlers/auth.py`. ([\#8569](#8569))
- Fix `synmark` benchmark runner. ([\#8571](#8571))
- Modify `DeferredCache.get()` to return `Deferred`s instead of `ObservableDeferred`s. ([\#8572](#8572))
- Adjust a protocol-type definition to fit `sqlite3` assertions. ([\#8577](#8577))
- Support macOS on the `synmark` benchmark runner. ([\#8578](#8578))
- Update `mypy` static type checker to 0.790. ([\#8583](#8583), [\#8600](#8600))
- Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. ([\#8587](#8587))
- Remove extraneous unittest logging decorators from unit tests. ([\#8592](#8592))
- Minor optimisations in caching code. ([\#8593](#8593), [\#8594](#8594))
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Oct 30, 2020
Synapse 1.22.0 (2020-10-27)
===========================

No significant changes.


Synapse 1.22.0rc2 (2020-10-26)
==============================

Bugfixes
--------

- Fix bugs where ephemeral events were not sent to appservices. Broke in v1.22.0rc1. ([\#8648](matrix-org/synapse#8648), [\#8656](matrix-org/synapse#8656))
- Fix `user_daily_visits` table to not have duplicate rows per user/device due to multiple user agents. Broke in v1.22.0rc1. ([\#8654](matrix-org/synapse#8654))

Synapse 1.22.0rc1 (2020-10-22)
==============================

Features
--------

- Add a configuration option for always using the "userinfo endpoint" for OpenID Connect. This fixes support for some identity providers, e.g. GitLab. Contributed by Benjamin Koch. ([\#7658](matrix-org/synapse#7658))
- Add ability for `ThirdPartyEventRules` modules to query and manipulate whether a room is in the public rooms directory. ([\#8292](matrix-org/synapse#8292), [\#8467](matrix-org/synapse#8467))
- Add support for olm fallback keys ([MSC2732](matrix-org/matrix-spec-proposals#2732)). ([\#8312](matrix-org/synapse#8312), [\#8501](matrix-org/synapse#8501))
- Add support for running background tasks in a separate worker process. ([\#8369](matrix-org/synapse#8369), [\#8458](matrix-org/synapse#8458), [\#8489](matrix-org/synapse#8489), [\#8513](matrix-org/synapse#8513), [\#8544](matrix-org/synapse#8544), [\#8599](matrix-org/synapse#8599))
- Add support for device dehydration ([MSC2697](matrix-org/matrix-spec-proposals#2697)). ([\#8380](matrix-org/synapse#8380))
- Add support for [MSC2409](matrix-org/matrix-spec-proposals#2409), which allows sending typing, read receipts, and presence events to appservices. ([\#8437](matrix-org/synapse#8437), [\#8590](matrix-org/synapse#8590))
- Change default room version to "6", per [MSC2788](matrix-org/matrix-spec-proposals#2788). ([\#8461](matrix-org/synapse#8461))
- Add the ability to send non-membership events into a room via the `ModuleApi`. ([\#8479](matrix-org/synapse#8479))
- Increase default upload size limit from 10M to 50M. Contributed by @Akkowicz. ([\#8502](matrix-org/synapse#8502))
- Add support for modifying event content in `ThirdPartyRules` modules. ([\#8535](matrix-org/synapse#8535), [\#8564](matrix-org/synapse#8564))


Bugfixes
--------

- Fix a longstanding bug where invalid ignored users in account data could break clients. ([\#8454](matrix-org/synapse#8454))
- Fix a bug where backfilling a room with an event that was missing the `redacts` field would break. ([\#8457](matrix-org/synapse#8457))
- Don't attempt to respond to some requests if the client has already disconnected. ([\#8465](matrix-org/synapse#8465))
- Fix message duplication if something goes wrong after persisting the event. ([\#8476](matrix-org/synapse#8476))
- Fix incremental sync returning an incorrect `prev_batch` token in timeline section, which when used to paginate returned events that were included in the incremental sync. Broken since v0.16.0. ([\#8486](matrix-org/synapse#8486))
- Expose the `uk.half-shot.msc2778.login.application_service` to clients from the login API. This feature was added in v1.21.0, but was not exposed as a potential login flow. ([\#8504](matrix-org/synapse#8504))
- Fix error code for `/profile/{userId}/displayname` to be `M_BAD_JSON`. ([\#8517](matrix-org/synapse#8517))
- Fix a bug introduced in v1.7.0 that could cause Synapse to insert values from non-state `m.room.retention` events into the `room_retention` database table. ([\#8527](matrix-org/synapse#8527))
- Fix not sending events over federation when using sharded event writers. ([\#8536](matrix-org/synapse#8536))
- Fix a long standing bug where email notifications for encrypted messages were blank. ([\#8545](matrix-org/synapse#8545))
- Fix increase in the number of `There was no active span...` errors logged when using OpenTracing. ([\#8567](matrix-org/synapse#8567))
- Fix a bug that prevented errors encountered during execution of the `synapse_port_db` from being correctly printed. ([\#8585](matrix-org/synapse#8585))
- Fix appservice transactions to only include a maximum of 100 persistent and 100 ephemeral events. ([\#8606](matrix-org/synapse#8606))


Updates to the Docker image
---------------------------

- Added multi-arch support (arm64,arm/v7) for the docker images. Contributed by @maquis196. ([\#7921](matrix-org/synapse#7921))
- Add support for passing commandline args to the synapse process. Contributed by @samuel-p. ([\#8390](matrix-org/synapse#8390))


Improved Documentation
----------------------

- Update the directions for using the manhole with coroutines. ([\#8462](matrix-org/synapse#8462))
- Improve readme by adding new shield.io badges. ([\#8493](matrix-org/synapse#8493))
- Added note about docker in manhole.md regarding which ip address to bind to. Contributed by @maquis196. ([\#8526](matrix-org/synapse#8526))
- Document the new behaviour of the `allowed_lifetime_min` and `allowed_lifetime_max` settings in the room retention configuration. ([\#8529](matrix-org/synapse#8529))


Deprecations and Removals
-------------------------

- Drop unused `device_max_stream_id` table. ([\#8589](matrix-org/synapse#8589))


Internal Changes
----------------

- Check for unreachable code with mypy. ([\#8432](matrix-org/synapse#8432))
- Add unit test for event persister sharding. ([\#8433](matrix-org/synapse#8433))
- Allow events to be sent to clients sooner when using sharded event persisters. ([\#8439](matrix-org/synapse#8439), [\#8488](matrix-org/synapse#8488), [\#8496](matrix-org/synapse#8496), [\#8499](matrix-org/synapse#8499))
- Configure `public_baseurl` when using demo scripts. ([\#8443](matrix-org/synapse#8443))
- Add SQL logging on queries that happen during startup. ([\#8448](matrix-org/synapse#8448))
- Speed up unit tests when using PostgreSQL. ([\#8450](matrix-org/synapse#8450))
- Remove redundant database loads of stream_ordering for events we already have. ([\#8452](matrix-org/synapse#8452))
- Reduce inconsistencies between codepaths for membership and non-membership events. ([\#8463](matrix-org/synapse#8463))
- Combine `SpamCheckerApi` with the more generic `ModuleApi`. ([\#8464](matrix-org/synapse#8464))
- Additional testing for `ThirdPartyEventRules`. ([\#8468](matrix-org/synapse#8468))
- Add `-d` option to `./scripts-dev/lint.sh` to lint files that have changed since the last git commit. ([\#8472](matrix-org/synapse#8472))
- Unblacklist some sytests. ([\#8474](matrix-org/synapse#8474))
- Include the log level in the phone home stats. ([\#8477](matrix-org/synapse#8477))
- Remove outdated sphinx documentation, scripts and configuration. ([\#8480](matrix-org/synapse#8480))
- Clarify error message when plugin config parsers raise an error. ([\#8492](matrix-org/synapse#8492))
- Remove the deprecated `Handlers` object. ([\#8494](matrix-org/synapse#8494))
- Fix a threadsafety bug in unit tests. ([\#8497](matrix-org/synapse#8497))
- Add user agent to user_daily_visits table. ([\#8503](matrix-org/synapse#8503))
- Add type hints to various parts of the code base. ([\#8407](matrix-org/synapse#8407), [\#8505](matrix-org/synapse#8505), [\#8507](matrix-org/synapse#8507), [\#8547](matrix-org/synapse#8547), [\#8562](matrix-org/synapse#8562), [\#8609](matrix-org/synapse#8609))
- Remove unused code from the test framework. ([\#8514](matrix-org/synapse#8514))
- Apply some internal fixes to the `HomeServer` class to make its code more idiomatic and statically-verifiable. ([\#8515](matrix-org/synapse#8515))
- Factor out common code between `RoomMemberHandler._locally_reject_invite` and `EventCreationHandler.create_event`. ([\#8537](matrix-org/synapse#8537))
- Improve database performance by executing more queries without starting transactions. ([\#8542](matrix-org/synapse#8542))
- Rename `Cache` to `DeferredCache`, to better reflect its purpose. ([\#8548](matrix-org/synapse#8548))
- Move metric registration code down into `LruCache`. ([\#8561](matrix-org/synapse#8561), [\#8591](matrix-org/synapse#8591))
- Replace `DeferredCache` with the lighter-weight `LruCache` where possible. ([\#8563](matrix-org/synapse#8563))
- Add virtualenv-generated folders to `.gitignore`. ([\#8566](matrix-org/synapse#8566))
- Add `get_immediate` method to `DeferredCache`. ([\#8568](matrix-org/synapse#8568))
- Fix mypy not properly checking across the codebase, additionally, fix a typing assertion error in `handlers/auth.py`. ([\#8569](matrix-org/synapse#8569))
- Fix `synmark` benchmark runner. ([\#8571](matrix-org/synapse#8571))
- Modify `DeferredCache.get()` to return `Deferred`s instead of `ObservableDeferred`s. ([\#8572](matrix-org/synapse#8572))
- Adjust a protocol-type definition to fit `sqlite3` assertions. ([\#8577](matrix-org/synapse#8577))
- Support macOS on the `synmark` benchmark runner. ([\#8578](matrix-org/synapse#8578))
- Update `mypy` static type checker to 0.790. ([\#8583](matrix-org/synapse#8583), [\#8600](matrix-org/synapse#8600))
- Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. ([\#8587](matrix-org/synapse#8587))
- Remove extraneous unittest logging decorators from unit tests. ([\#8592](matrix-org/synapse#8592))
- Minor optimisations in caching code. ([\#8593](matrix-org/synapse#8593), [\#8594](matrix-org/synapse#8594))
@@ -0,0 +1,97 @@
# MSC2732: Olm fallback keys
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite clear to me what exactly the purpose and tradeoffs of this entire design are (hence attaching this thread to the first line, as good a place as any).

What does this improve over just having a number of one-time keys? As I understand it, the benefit is that fallback keys can be used indefinitely and so there is no limit on established sessions like with pre-generated keys.

But then why not drop the pre-generated key mechanism in favour of this mechanism entirely? As I understand it, because the security guarantees of this model are weaker, and so it is preferable to use pre-generated keys where possible.

But then doesn't this weaken the overall security model, by making it possible for an attacker to intentionally exhaust all of someone's pre-generated keys, essentially carrying out a downgrade attack and forcing them (or rather, the people communicating with them) into fallback keys being used instead, which would weaken security properties?

I have difficulty squaring this circle and understanding how this can be both a) useful, b) secure, and c) additive to the current model. Can you shed some light on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I have vague memories from staring at the signal protocol (which does the same trick) that this wasn't as much of a disaster as you might think, but I can't remember why. @uhoreg I think this is worth spelling out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I keep forgetting to respond to this.

But then why not drop the pre-generated key mechanism in favour of this mechanism entirely? As I understand it, because the security guarantees of this model are weaker, and so it is preferable to use pre-generated keys where possible.

My understanding is that it is slightly weaker security, but it's not a huge difference. So it is "better" to use one-time keys, but when one is unavailable, a fallback key is "good enough" in many cases. But this proposal allows clients to make up their own mind about the tradeoffs. If they don't think that a fallback key is secure enough, then they don't need to use it. If they don't care at all about the extra security from one-time keys vs. fallback keys, they can drop one-time keys completely and just rely on fallback keys. If they want to the extra security from one-time keys when available, but don't want to inconvenience the user when one-time keys aren't available, it can choose to do that too.

As an explanation of the security difference: under an assumption that an attacker cannot break Curve25519, and so must attack the client directly to get the private keys, there is not much difference between using a one-time key and a fallback key. If an attacker is able to extract the keys from the client, then they will have the ciient's private identity key, all the one-time keys that have already been used, and the current fallback key. And the only difference between having a private key for a one-time key and for a fallback key is that the fallback key may have been used already in a session that was already processed. But if a client promptly replaces a fallback key after it has been used and forgets the private key quickly (after it's reasonably sure that it has received all the sessions that use it), then the difference is small.

So I think this comes down to a case of cryptographic ideal vs. pragmatism. From a cryptographic standpoint, one-time keys are the way to go, because we're paranoid and worry about Curve25519 being broken. But practically speaking, the tradeoff between the possibility of someone cracking Curve25519 or attacking the client at just the right moment that they can decrypt some extra sessions, versus the inconvenience of having undecryptable messages because we ran out of one-time keys, for most people, I think, leans towards convenience in this case. But again, the client can make they choice about what to do, and the user can choose a client that matches their paranoia level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an explanation of the security difference: under an assumption that an attacker cannot break Curve25519, and so must attack the client directly to get the private keys, there is not much difference between using a one-time key and a fallback key.

How do OTKs help if an attacker can break Curve25519? If they have the encrypted payloads it seems reasonable that they'll also see the OTK key go past as its claimed? Or am I missing something here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with OTKs you have to break every single curve25519 keypair. With fallback keys if you break the curve25519 fallback key you immidiately have all sessions with devices who used that fallback key to initiate communication.

Copy link
Contributor

@poljar poljar Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to correct a misconception here: Signal added fallback keys to their protocol after we forked it into Olm. The same might apply to the MAC length recommendation; I don't remember offhand.

Is this correction of the misconception correct (🥁)? I have not been around back then but as far as I can tell libolm was forked from Axolotl (v2?) which did have a concept of a fallback key, Axolotl (v3?) which introduced x3dh appeared a couple of months after the fork. The removal of the fallback key can be found in this commit.

I don't know if the truncation has been changed or not, but I think that one case of such changes (the removal of the fallback key) and one potential such change (the removal of one-time keys) is enough to get my point across.

So, the relevant question here isn't "why were fallback keys stripped out", but rather "why didn't we immediately follow suit when Signal added them", which can be answered as (a) lack of tuits, but more fundamentally (b) I don't accept that we should automatically do everything Signal does just because Signal does it. They have some excellent ideas cryptographically, but at the end of the day their product is a bit different from ours, and certainly some of their decisions have been at least open to debate in the past.

I don't think this reversal is true given the above so I won't address everything, but my complaint isn't about following Signal step by step, it's about introducing changes to complex protocols without proper justification.

Fallback keys are used to defend against denial of service attacks. We should know why we don't want this property in the protocol.

One-time keys are used to offer better forward secrecy. Again, we should know why this isn't desirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forward secrecy protects past sessions against future compromises of keys or passwords.

So given that we have established shared secrets, a compromise of secret keys in the future wouldn't compromise the shared secret. In this case the one-time keys that established the session are gone, if the fallback key is used it might not be gone.

To be almost unhelpfully pedantic here, OTKs don't strictly allow this as they're not deleted immediately on key use, they're deleted once the client has become aware that they've been used. Sure, this is likely a short window of time, but then if the fallback keys are only kept for a short amount of time the situations become effectively equivalent. I don't believe any of our forward secrecy actually immediately protects all old messages, rather it protects sufficiently old messages, i.e. there's always a period of time in the past where an attacker can still get messages for.

This is the crux of what confuses me: if OTKs only slightly reduce this window of time, and we're saying we're happy with the window that having fallback keys gives us (since we're using them), then are OTKs worth the additional complexity?

However:

Given that we don't know how many people have used our fallback key it becomes hard to know when we can throw the key away.

I think is what I've been missing here, I've been assuming that we can delete the fallback private keys quickly, and so the window of attack for OTKs vs fallback keys are effectively the same. If we have to keep round the fallback keys for hours, then that's probably(?) a sufficiently large window that using OTKs to reduce that windows makes sense.

The other piece here is that while an attacker can drain a device's OTKs, and so force new sessions to use the fallback keys, that is an active attack that can be observed by the servers and device. That will at least give some breadcrumbs that something fishy is going on, rather than allowing completely passive attacks. Though since draining of OTKs happens sufficiently often that we're making this MSC, I don't know if anybody would actually notice the draining of OTKs.

Basically: adding fallback keys weakens security, and while removing OTKs would weaken security some more, do they provide enough meaningful protection to warrant their complexity? It sounds like the answer is "yes, just", so I'm OK keeping them long term if people agree with the above analysis.


I'll stop being off topic in this MSC now and won't answer here anymore.

I'd rather these thoughts were recorded on the MSC for future reference, so that they can be linked back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the fallback key can be found in this commit.

oh. I'll get back in my box. Sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is what I've been missing here, I've been assuming that we can delete the fallback private keys quickly, and so the window of attack for OTKs vs fallback keys are effectively the same. If we have to keep round the fallback keys for hours, then that's probably(?) a sufficiently large window that using OTKs to reduce that windows makes sense.

Worse than that, since there is no way to determine the number of times the fallback key was retrieved (as far as I'm aware), we have no deterministic way of concluding that all sessions involving that fallback key have been created. So the choices are between keeping a given fallback key for an indefinite amount of time or else risk having undecryptable messages.

This doesn't happen with OTKs due to their property that they will be used at most once, so when a session is created for a given OTK, the client can conclude with certainty that it can drop that OTK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worse than that, since there is no way to determine the number of times the fallback key was retrieved (as far as I'm aware), we have no deterministic way of concluding that all sessions involving that fallback key have been created. So the choices are between keeping a given fallback key for an indefinite amount of time or else risk having undecryptable messages.

Generally, it should be fairly safe to assume that when a user claims a one-time key (whether it's actually an OTK or ends up being a fallback key), they're going to use it right away. So clients should only need to keep the fallback key for a little time after it's used (maybe a couple minutes after the sync in which they notice that it's been used, to allow for network delays). The risk of having undecryptable messages is somewhat mitigated by having olm unwedging and key resharing, though that requires the sender to come back online.

This doesn't happen with OTKs due to their property that they will be used at most once, so when a session is created for a given OTK, the client can conclude with certainty that it can drop that OTK.

We actually do have a similar problem with OTKs. libolm tries to use a constant amount of memory, so it only has limited space for OTKs. That means that libolm will sometimes evict OTKs when it generates new ones, so if someone claims a OTK and waits too long to use it, it may have been evicted by the time they finally get around to using it.

@turt2live turt2live removed their request for review November 10, 2020 05:28
@turt2live turt2live added the spec-pr-missing Proposal has been implemented and is being used in the wild but hasn't yet been added to the spec label Jun 13, 2021
@turt2live turt2live removed the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Aug 16, 2021
turt2live added a commit to turt2live/matrix-bot-sdk that referenced this pull request Aug 17, 2021
richvdh pushed a commit that referenced this pull request Aug 23, 2021
richvdh pushed a commit that referenced this pull request Aug 27, 2021
poljar added a commit to matrix-org/ruma that referenced this pull request Dec 3, 2021
This implements support for MSC2732[1], fallback keys. Only support to
upload and get notifications about fallback keys via `/sync` is
implemented.

[1]: matrix-org/matrix-spec-proposals#2732
poljar added a commit to matrix-org/ruma that referenced this pull request Dec 3, 2021
This implements support for MSC2732[1], fallback keys. Only support to
upload and get notifications about fallback keys via `/sync` is
implemented.

[1]: matrix-org/matrix-spec-proposals#2732
poljar added a commit to matrix-org/ruma that referenced this pull request Dec 3, 2021
This implements support for MSC2732[1], fallback keys. Only support to
upload and get notifications about fallback keys via `/sync` is
implemented.

[1]: matrix-org/matrix-spec-proposals#2732
jplatte pushed a commit to ruma/ruma that referenced this pull request Dec 6, 2021
This implements support for MSC2732[1], fallback keys. Only support to
upload and get notifications about fallback keys via `/sync` is
implemented.

[1]: matrix-org/matrix-spec-proposals#2732
turt2live added a commit that referenced this pull request Jan 1, 2022
@turt2live turt2live mentioned this pull request Jan 1, 2022
@turt2live
Copy link
Member

Spec PR: #3615

@turt2live turt2live added spec-pr-in-review A proposal which has been PR'd against the spec and is in review and removed spec-pr-missing Proposal has been implemented and is being used in the wild but hasn't yet been added to the spec labels Jan 1, 2022
turt2live added a commit that referenced this pull request Jan 5, 2022
* Specify fallback keys

MSC: #2732

* changelog

* Appease spell check

* Fine, let's appease the spellcheck this way

* Apply suggestions from code review

Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>

* Fix intro

* word wrap

Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>
@turt2live
Copy link
Member

Merged 🎉

@turt2live turt2live added merged A proposal whose PR has merged into the spec! and removed spec-pr-in-review A proposal which has been PR'd against the spec and is in review labels Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
e2e kind:feature MSC for not-core and not-maintenance stuff merged A proposal whose PR has merged into the spec! proposal A matrix spec change proposal
Projects
Spec Core Team Backlog
  
Done to some definition
Development

Successfully merging this pull request may close these issues.

None yet