MSC3983: Sending One-Time Key (OTK) claims to appservices #3983

Open. Wants to merge 13 commits into base: main
4 changes: 4 additions & 0 deletions .github/_typos.toml
@@ -1,2 +1,6 @@
[default]
check-filename = true

[default.extend-identifiers]
OTK = "OTK"
OTKs = "OTKs"
178 changes: 178 additions & 0 deletions proposals/3983-sending-otk-claims-to-appservices.md
@@ -0,0 +1,178 @@
# MSC3983: Sending One-Time Key (OTK) claims to appservices

Presently in Matrix, the public portions of OTKs are [uploaded](https://spec.matrix.org/v1.6/client-server-api/#uploading-keys)
to the homeserver to ensure other devices can encrypt new messages without requiring the device to
be online and responsive. This works for devices operating exclusively over the Client-Server API,
however [appservices](https://spec.matrix.org/v1.6/application-service-api/) looking to support
encryption (through [MSC3202](https://github.com/matrix-org/matrix-spec-proposals/pull/3202) or
similar) could have millions or billions of users on them, which can easily translate to quite a
few public keys needing to be uploaded to the homeserver.

Given appservices *generally* have an uptime which is equivalent to the homeserver itself, and will
have already stored the public portion of their OTKs somewhere, we can save a bit of duplication by
having the homeserver delegate [`/keys/claim`](https://spec.matrix.org/v1.6/client-server-api/#post_matrixclientv3keysclaim)
requests to the appservice.

In numbers, a conservative estimate for an interoperable messaging bridge (appservice) would be
500 million users. Each user generates between 50 and 100 OTKs, so we'll pick the low end at 50.
That's 25 **billion** public keys. Currently in Matrix, that means the appservice stores 25 billion
keys and the homeserver stores a copy of those 25 billion keys.
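
The estimate above can be reproduced as a quick calculation. This is only a sketch: the figures come from the text, and the variable names are illustrative.

```python
# Figures taken from the proposal text; the variable names are illustrative.
bridge_users = 500_000_000  # conservative user count for a large bridge
otks_per_user = 50          # low end of the 50 to 100 range

keys_on_appservice = bridge_users * otks_per_user
# Today every uploaded OTK is also stored by the homeserver.
keys_duplicated_on_homeserver = keys_on_appservice

print(f"{keys_on_appservice:,}")  # 25,000,000,000
```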

This proposal introduces a mechanism for saving the homeserver from duplicating 25 billion keys.

## Background

Appservices can register a [namespace](https://spec.matrix.org/v1.6/application-service-api/#registration)
of users either exclusively (no one else can register users matching the regex) or implicitly (the
appservice receives events about those users, but can't prevent registration). Implicit namespaces
can be shared across multiple appservices.

## Proposal

For users under an appservice's exclusive namespace, if that user has no unused OTKs (excluding fallback
keys) on the homeserver, the homeserver proxies the following APIs to the appservice using the new
API described below:
* [`/_matrix/client/v3/keys/claim`](https://spec.matrix.org/v1.6/client-server-api/#post_matrixclientv3keysclaim)
* [`/_matrix/federation/v1/user/keys/claim`](https://spec.matrix.org/v1.6/server-server-api/#post_matrixfederationv1userkeysclaim)

**`POST /_matrix/app/v1/keys/claim`**
```jsonc
// Request
{
  "@alice:example.org": {
    "DEVICEID": ["signed_curve25519", "signed_curve25519"] // device ID to algorithm names
  },
  // ...
}
```
```jsonc
// Response
{
  "@alice:example.org": {
    "DEVICEID": {
      "signed_curve25519:AAAAHg": {
        "key": "...",
        "signatures": {
          "@alice:example.org": {
            "ed25519:DEVICEID": "..."
          }
        }
      },
      "signed_curve25519:BBBBHg": {
        "key": "...",
        "signatures": {
          "@alice:example.org": {
            "ed25519:DEVICEID": "..."
          }
        }
      }
    }
  },
  // ...
}
```

*Note*: Like other appservice endpoints, this endpoint should *not* be ratelimited and *does* require
normal [authentication](https://spec.matrix.org/v1.6/application-service-api/#authorization).

Multiple users, devices, and keys for those devices can be claimed in a single request. This is to
allow homeservers to batch multiple client/federation requests into a single request on the appservice,
if desirable. This is an optional optimization for homeserver implementations. In the example above, 2
keys are claimed for one device.
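
As a rough illustration of the optional batching described above, a homeserver might merge pending claims into a single request body along these lines. The `Claim` tuple and the function name are hypothetical, not part of the proposal.

```python
from collections import defaultdict

# Hypothetical shape for one pending claim: (user ID, device ID, algorithm).
Claim = tuple[str, str, str]


def build_batched_claim_body(claims: list[Claim]) -> dict[str, dict[str, list[str]]]:
    """Merge pending claims into one body: user ID -> device ID -> algorithms."""
    body: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for user_id, device_id, algorithm in claims:
        # One list entry per key wanted, so repeated algorithms are intentional.
        body[user_id][device_id].append(algorithm)
    # Convert to plain dicts so the result serialises cleanly as JSON.
    return {user: dict(devices) for user, devices in body.items()}
```

Two claims for the same device and algorithm produce the request shown in the example above, with `signed_curve25519` listed twice.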

If the appservice responds with an error of any kind (including timeout), the homeserver uses the
fallback key, if known. The homeserver additionally uses the fallback key (if known) to fill in
missing keys from the appservice. For example, if the homeserver requested 2 keys for Alice but
the appservice only provided 1, the homeserver would use the fallback key to fulfill the second.
Comment on lines +83 to +86
Contributor

I'm left wondering why the appservice would bother uploading fallback keys when (to my reading) it could return them directly as part of the response.

I suppose uploading them would still be useful for error conditions, however.
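
For reference, the fill-in behaviour from the proposal text (use the fallback key on error, or to cover keys the appservice did not provide) could be sketched as below. The function name and the `(key ID, key object)` pair are assumptions for illustration. The sketch also assumes the fallback key appears at most once per device, since the response maps key IDs to key objects.

```python
from typing import Optional


def fill_with_fallback(
    requested: int,
    appservice_keys: Optional[dict[str, dict]],
    fallback_key: Optional[tuple[str, dict]],
) -> dict[str, dict]:
    """Return the keys for one device, topping up from the stored fallback key.

    `appservice_keys` is None when the appservice errored or timed out.
    `fallback_key` is a hypothetical (key ID, key object) pair, or None if unknown.
    """
    keys = dict(appservice_keys or {})
    if len(keys) < requested and fallback_key is not None:
        key_id, key_object = fallback_key
        # The fallback key can only appear once under its own key ID.
        keys.setdefault(key_id, key_object)
    return keys
```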


When the appservice serves claims itself, it is responsible for ensuring it doesn't use a key twice.
Because no OTKs are uploaded to the homeserver, the `device_one_time_keys_count` field for the
appservice (over MSC3202, for example) would be zero. Many existing crypto implementations upload new
keys whenever this field falls below a threshold: appservices intending to use the new API should not
perform those uploads, because a user with unused OTKs on the homeserver no longer has claims proxied.
Comment on lines +88 to +92
Member

I'm having a hard time understanding what this paragraph is attempting to say.

Member

Is it simply trying to say that implementations implementing MSC3983 should ignore device_one_time_keys_count?

Member Author

It's more of a warning to existing implementations of crypto: nearly all of them do an `if (count < 50) { generateAndUploadKeys() }` call, which would be a problem here.

Member

@dkasak, Mar 27, 2023

Hmm sure, but what are implementations supposed to do instead? That's the part I'm unclear about from the above.

Member Author

"not that" :p

Normally implementation details like this wouldn't be called out, however given the chance for everyone to have the exact same bug (intending to use new API, uploads keys by accident, new API isn't used), this is called out here.
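
One possible reading of "not that", as a sketch only: gate the usual threshold check behind an opt-in flag. `uses_claim_proxy` is a hypothetical flag invented for this example, not something the MSC defines.

```python
def should_upload_otks(unused_count: int, threshold: int, uses_claim_proxy: bool) -> bool:
    """Decide whether a crypto implementation should generate and upload OTKs.

    `uses_claim_proxy` is a hypothetical opt-in flag for appservices that answer
    `/keys/claim` themselves; the threshold check is the common existing pattern.
    """
    if uses_claim_proxy:
        # Uploading would give the homeserver unused OTKs, so it would stop proxying.
        return False
    return unused_count < threshold
```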


Normally the homeserver would be [ensuring](https://spec.matrix.org/v1.6/client-server-api/#one-time-and-fallback-keys)
OTKs are only used once, however with the appservice serving the endpoint it becomes the responsibility
of the appservice to perform this check.
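
A minimal sketch of how an appservice might enforce single use, assuming an in-memory pool keyed by user, device, and algorithm. The class and method names are hypothetical, and a real appservice would presumably need durable storage so that claimed keys stay claimed across restarts.

```python
import threading


class OneTimeKeyStore:
    """In-memory store that hands each OTK out at most once (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # (user ID, device ID, algorithm) -> list of (key ID, key object)
        self._unused: dict[tuple[str, str, str], list[tuple[str, dict]]] = {}

    def add(self, user_id: str, device_id: str, algorithm: str, key_id: str, key_object: dict) -> None:
        with self._lock:
            self._unused.setdefault((user_id, device_id, algorithm), []).append((key_id, key_object))

    def claim(self, user_id: str, device_id: str, algorithms: list[str]) -> dict[str, dict]:
        """Pop one key per requested algorithm entry, omitting any that cannot be served."""
        claimed: dict[str, dict] = {}
        with self._lock:
            for algorithm in algorithms:
                pool = self._unused.get((user_id, device_id, algorithm), [])
                if pool:
                    # Removing the key under the lock is what prevents reuse.
                    key_id, key_object = pool.pop()
                    claimed[key_id] = key_object
        return claimed
```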

If the homeserver uses the fallback key, that will be communicated in the traditional ways to the
appservice (namely through `device_unused_fallback_key_types` in the case of MSC3202).

We don't apply this API to implicit (non-exclusive) users as it's possible for multiple appservices
to have a namespace covering the user: instead of guessing or going around to each, we require the
user to be in an exclusive namespace. This guarantees that there's only one appservice responsible
for the user.
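
The two conditions for proxying (an exclusive namespace match and no unused OTKs on the homeserver) might be checked roughly as follows. The registration dictionaries follow the `id` and `namespaces.users` layout of the appservice registration file; matching the regex against the whole user ID is an assumption of this sketch, as are the function names.

```python
import re
from typing import Optional


def responsible_appservice(user_id: str, registrations: list[dict]) -> Optional[str]:
    """Return the ID of the appservice exclusively claiming `user_id`, if any."""
    for registration in registrations:
        for namespace in registration.get("namespaces", {}).get("users", []):
            # Assumption: the namespace regex must match the full user ID.
            if namespace.get("exclusive") and re.fullmatch(namespace["regex"], user_id):
                return registration["id"]
    return None


def should_proxy_claim(user_id: str, unused_otk_count: int, registrations: list[dict]) -> bool:
    """Proxy only when the homeserver holds no unused OTKs and one appservice owns the user."""
    return unused_otk_count == 0 and responsible_appservice(user_id, registrations) is not None
```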

## Returning extra keys

**TODO**: This is probably best as its own MSC.

Independent of the appservice having `/keys/claim` proxied to it, it may be desirable for both the
fallback and one-time key to be returned. Servers should *always* include the fallback key alongside
the requested OTKs. When using this proposal's new endpoint, the server should use the fallback key
from the appservice's response rather than a previously stored fallback key, if present (if the
appservice doesn't respond with a fallback key then the server uses the stored fallback key instead,
if known).
Contributor

Is this enough of a change that it needs to be a v4 of the endpoint, perhaps? (Or a query parameter which defaults to the current behavior?)

Member Author

I think it'd be fine, though open to thoughts. Currently clients appear to pick the "first" key returned, not validating if the extra data is useful to them. In theory this means we can include extra keys with no issues, though if there's a significant need to have it be a dedicated endpoint then let's do that.

We should probably only do this fallback stuff on an /unstable/org.matrix.msc3983 endpoint though, just in case.

Contributor

matrix-org/synapse#15462 implements this as a separate endpoint.

Contributor

Another piece of this I realized during implementation is if we need to also make a change on the federation side.

Comment on lines +112 to +115
Contributor

This doesn't define if the appservice should be queried for fallback keys if an OTK is in the database.

Member Author

In lieu of MSC text: the appservice is supposed to be queried, similar to MSC3984 where the server uses what it knows if the appservice didn't provide it.

Contributor

Actually I suppose the text in the MSC is enough as is:

For users under an appservice's explicit namespace, if that user has no unused OTKs (excluding fallback keys) on the homeserver, the homeserver proxies the following APIs to the appservice using the new API described below:

This would not match what you just wrote though and would mean it is not possible for an appservice to only provide fallback keys. I'm not sure if that's a good thing or a bad thing though. 🤷

If we really do want that then I think we want to use different endpoints or something. Otherwise the appservice wouldn't know if we truly want to query for OTKs or only for fallback keys (it could end up returning OTKs which go unused?)


The server SHOULD NOT replace any uploaded fallback keys with ones returned by the appservice via
this proposal. The appservice MUST re-upload the fallback key if it wants to replace it, as it would
do upon first (known) use.

Clients can determine which of the keys returned is the fallback key by `fallback: true` on the returned
keys.
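
On the client side, separating the fallback key from the one-time keys for a single device could look something like this. It is a sketch assuming the `fallback: true` marker described above; the function name is hypothetical.

```python
def split_claimed_keys(device_keys: dict[str, dict]) -> tuple[dict[str, dict], dict[str, dict]]:
    """Split one device's claimed keys into (one-time keys, fallback keys)."""
    one_time: dict[str, dict] = {}
    fallback: dict[str, dict] = {}
    for key_id, key_object in device_keys.items():
        # Only an explicit `fallback: true` marks a key as the fallback key.
        target = fallback if key_object.get("fallback") is True else one_time
        target[key_id] = key_object
    return one_time, fallback
```

A client would then prefer an entry from the first dictionary and only use the second when no one-time key was returned.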

## Potential issues

As described, the appservice could be offline or experience worse uptime than the homeserver.
This new API is optional for appservices: an appservice which expects poor uptime can simply upload
OTKs in advance, just like before this proposal. Similarly, an appservice which opts into the API
(by not uploading OTKs) *should* still upload a fallback key, so the homeserver has something to
return while the appservice is unreachable.
Comment on lines +130 to +132
Member

Who is offline in this sentence? And which API are we talking about? Also, who is they?

Member Author

This whole MSC is in context of an appservice.

Member

I gathered as much 😄 But I'm still having trouble parsing that sentence in exact terms. e.g. If I interpret "but is offline" as "but the appservice is offline", how is the appservice able to do anything given it's offline?

I'd suggest rewriting the sentence with less implicit words / pronouns.

Member Author

The appservice is opting into using the API by not uploading keys, effectively. This is it "using" the API, which might be the confusing part?


For appservices which never intend to upload keys there is a bit of a wasted lookup to see if there are
any keys for the user(s). This could be mitigated with an implementation-specific flag to skip the lookup
and just do proxying, though for the general case in this MSC the fallback key consideration is kept for
reliability concerns.

Similarly, if an appservice doesn't intend on uploading keys (because it doesn't support encryption) and
indicates the route is [unknown](https://spec.matrix.org/v1.6/application-service-api/#unknown-routes),
the homeserver could back off from calling the appservice to prevent excessive calls.
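
The MSC leaves the backoff policy to implementations. One way to do it, sketched here with an exponential delay and made-up defaults, is to track when the appservice may next be asked:

```python
class ClaimBackoff:
    """Track when an appservice may next be asked for keys (illustrative policy)."""

    def __init__(self, base_seconds: float = 60.0, max_seconds: float = 3600.0) -> None:
        # The default delays are arbitrary example values.
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self._failures = 0
        self._retry_at = 0.0

    def record_unknown_route(self, now: float) -> None:
        """Double the delay after each consecutive unknown-route response, up to a cap."""
        self._failures += 1
        delay = min(self.base_seconds * 2 ** (self._failures - 1), self.max_seconds)
        self._retry_at = now + delay

    def record_success(self) -> None:
        """Reset once the appservice answers the endpoint."""
        self._failures = 0
        self._retry_at = 0.0

    def may_call(self, now: float) -> bool:
        return now >= self._retry_at
```

Callers would pass a monotonic timestamp such as `time.monotonic()` for `now`, and fall back to the stored fallback key while `may_call` returns false.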

## Alternatives

Many encryption-capable bridges today can avoid uploading OTKs (and sometimes even device keys) because
they have a bot user in the room. The bot user uploads its keys, but the remaining bridge users do not.
This works if the bridge users don't need to be involved in rooms without the bot user present, though
being able to (securely) DM bridge users is a valuable consideration for this MSC. In future, scalable
encryption for appservices might take the shape of an appservice-wide device of some sort.

It could be argued that supporting a fallback key for appservices is too much considering their uptime,
however in practice appservices are not quite able to achieve 100% uptime. This proposal doesn't propose
proxying device/signing key queries to the appservice for the same reliability concerns, though appservices
which wish to opt to do so anyways could use [MSC3984](https://github.com/matrix-org/matrix-spec-proposals/pull/3984).

## Additional uses

An appservice aiming to bridge two different encryption systems might use this endpoint to save on data,
though currently the encryption used on both sides of the bridge would need to be compatible (ie: signatures
from device IDs and user IDs need to exist). In future, other MSCs might make encryption bridges easier to
Member

Suggested change
from device IDs and user IDs need to exist). In future, other MSCs might make encryption bridges easier to
from device and user identities need to exist). In the future, other MSCs might make encryption bridges easier to

Device IDs and user IDs can't produce signatures, so the proposed change would make it a bit clearer to me.

build.

## Security considerations

No major considerations.

## Unstable prefix

While this MSC is not considered stable, implementations should use
`/_matrix/app/unstable/org.matrix.msc3983/keys/claim` as the endpoint instead. There is no version
compatibility check: homeservers implementing this functionality would receive an error from appservices
which don't support the endpoint and thus engage in the behaviour described by the MSC.

## Dependencies

This MSC has no direct dependencies, however is of little use without being partnered with something
like [MSC3202](https://github.com/matrix-org/matrix-spec-proposals/pull/3202).

This MSC is additionally useful when paired with [MSC3984](https://github.com/matrix-org/matrix-spec-proposals/pull/3984),
though has no direct dependency.