Scope transaction IDs to the (user, device_id) instead of the access token #1133

sandhose · 2022-06-16T14:40:27Z

Transaction IDs are currently scoped by access token. They are used to make some request idempotent, and to help client map sent event when they get back in /sync.

Problem is, since MSC2918 (aka. refresh tokens), a client may refresh its access token, which may lead to scenarios like this:

the client starts a /sync with an access token T1
since T1 is about to expire, it refreshes it and gets an access token T2
it sends an event via /rooms/{room}/send/{type}/{txnId} with the access token T2
/sync gets back with the newly created event, but without the transaction_id field, since that /sync was done with another access token

which directly contradicts the spec, where says on the transaction_id field:

The client-supplied transaction ID, for example, provided via PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}, if the client being given the event is the same one which sent it.

There are other places in the spec where we mention those transaction IDs:

The client-server API typically uses HTTP PUT to submit requests with a client-generated transaction identifier. This means that these requests are idempotent. The scope of a transaction identifier is a particular access token. It only serves to identify new requests from retransmits. After the request has finished, the {txnId} value should be changed (how is not specified; a monotonically increasing integer is recommended).

And on the txnId on /send:

The transaction ID for this event. Clients should generate an ID unique across requests with the same access token; it will be used by the server to ensure idempotency of requests.

What I suggest is to use the device_id (+ user_id) instead of the access token to scope those transaction IDs.

Related Synapse issue: matrix-org/synapse#13064 and PR: matrix-org/synapse#13083

The text was updated successfully, but these errors were encountered:

turt2live · 2022-06-16T15:25:18Z

this is arguably an MSC-needing thing or a spec bug. The refresh tokens MSC should have handled it at the time, but that ship has well sailed.

turt2live · 2022-06-30T21:53:49Z

@matrix-org/spec-core-team thoughts on JFDI and fixing the spec versus requiring an MSC?

richvdh · 2022-07-01T12:00:58Z

My thoughts:

To me, it's an obvious corollary of MSC2918 that the scope of a txnid would change from "access token" to "sequence of access tokens/refresh tokens". I don't know if there is a better way to express that concept, but I don't think we would need an MSC to adopt it.
It's less clear to me that (user_id, device_id) is exactly equivalent - particularly given that, as the spec currently stands, you can have multiple, active, access tokens for the same device_id at once (via the device_id param to /login - and that works just fine as long as you're not doing E2E. Accordingly, I think that gets into MSC territory.

uhoreg · 2022-07-04T10:11:50Z

It's less clear to me that (user_id, device_id) is exactly equivalent - particularly given that, as the spec currently stands, you can have multiple, active, access tokens for the same device_id at once (via the device_id param to /login - and that works just fine as long as you're not doing E2E. Accordingly, I think that gets into MSC territory.

Currently, the spec says that a device_id can only be bound to one active access token at a time:

If the client sets the device_id, the server will invalidate any access and refresh tokens previously assigned to that device.

though IIRC Synapse currently doesn't follow that.

But even with that, it's not clear that (user_id, device_id) is the right thing to do, since you don't need to have multiple active access tokens for the same device ID to create a conflict: client A sends an event with txnId: foo; client B logs in using the same device ID, evicting client A; client B sends a different event with txnId: foo, which should be accepted since it's sent by a different client, but will be dropped if txnIds are scoped to (user_id, device_id).

I agree that changing it from "access token" to "sequence of access tokens/refresh tokens" should be fine, if someone can think of a better way to word it. Maybe we need to define a "session" (or some term that we haven't already overloaded)? "client session"?

sandhose · 2022-07-04T13:28:34Z

But even with that, it's not clear that (user_id, device_id) is the right thing to do, since you don't need to have multiple active access tokens for the same device ID to create a conflict: client A sends an event with txnId: foo; client B logs in using the same device ID, evicting client A; client B sends a different event with txnId: foo, which should be accepted since it's sent by a different client, but will be dropped if txnIds are scoped to (user_id, device_id).

Am I wrong to think this seems like quite an improbable edge case for me? I guess it depends how clients generate their txnIDs, but for example, Element Web uses a mix of a timestamp + monotonically increasing counter, I don't see how it could ever happen.

I agree that changing it from "access token" to "sequence of access tokens/refresh tokens" should be fine, if someone can think of a better way to word it. Maybe we need to define a "session" (or some term that we haven't already overloaded)? "client session"?

My problem with that is it makes it much more difficult to implement, especially in the context of the OIDC work. I recently touched on that part of the code in Synapse, and introducing this kind of 'session' mechanism would require quite a big database migration, adding a unique session ID to existing access/refresh tokens. With the OIDC work, it would add additional requirements on the OP (auth server) which would make it hard for us to use other off-the-shelf OPs like Keycloak.

erikjohnston · 2022-07-04T13:35:38Z

.. since you don't need to have multiple active access tokens for the same device ID to create a conflict...

It's worth noting that to-device message delivery in /sync will break if you have multiple clients syncing with the same device ID, since we delete to-device messages was they've been delivered.

uhoreg · 2022-08-26T14:44:31Z

Could we say that transaction IDs are scoped to a "device" and then define a device as a sequence of access tokens?

sandhose · 2022-08-26T14:51:13Z

Could we say that transaction IDs are scoped to a "device" and then define a device as a sequence of access tokens?

That would complexify by a lot the eventual transition to OIDC, by adding a new concept to keep track of between the HS and the OP. I understand how it would 'fix' the current spec without breaking anything, but I'm also pretty confident that the breakage would be pretty minor in real world cases if we choose to scope the TXN IDs to device_ids.

turt2live · 2022-09-12T21:56:54Z

Spec-wise, this seems to be fixed by a "session" concept given the MSC's behaviour, though implementation complexity is a whole other note to worry about. To fix the implementation I think we'd need another MSC - it's just not as easy to make the leap towards that with the current understanding.

This adds two tests, which check the current spec behaviour of transaction IDs, which are that they are scoped to a series of access tokens, and not the device ID. The first test highlight this behaviour, by logging in with refresh token enabled, sending an event, using the refresh token and syncing with the new access token. On the sync, the transaction ID should be there, but currently in Synapse it is not. The second test highlight that the transaction ID is not scoped to the device ID, by logging in twice with the same device ID, sending an event with the first access token, and syncing with the second access token. In that case, the sync should not contain the transaction ID, but I think it's the case in HS implementations which use the device ID to scope the transaction IDs, like Conduit. Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083

sandhose added the clarification An area where the expected behaviour is understood, but the spec could do with being more explicit label Jun 16, 2022

turt2live added spec-bug Something which is in the spec, but is wrong and removed clarification An area where the expected behaviour is understood, but the spec could do with being more explicit labels Jun 16, 2022

turt2live added the release-blocker Blocks the next release from happening label Jun 30, 2022

turt2live mentioned this issue Sep 12, 2022

Clarify that refreshed access tokens don't invalidate the scope of txnid #1236

Merged

turt2live closed this as completed in #1236 Sep 26, 2022

sandhose mentioned this issue Feb 15, 2023

Test the scope of a transaction IDs matrix-org/complement#613

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scope transaction IDs to the (user, device_id) instead of the access token #1133

Scope transaction IDs to the (user, device_id) instead of the access token #1133

sandhose commented Jun 16, 2022

turt2live commented Jun 16, 2022

turt2live commented Jun 30, 2022

richvdh commented Jul 1, 2022

uhoreg commented Jul 4, 2022

sandhose commented Jul 4, 2022 •

edited

Loading

erikjohnston commented Jul 4, 2022

uhoreg commented Aug 26, 2022

sandhose commented Aug 26, 2022

turt2live commented Sep 12, 2022

Scope transaction IDs to the (user, device_id) instead of the access token #1133

Scope transaction IDs to the (user, device_id) instead of the access token #1133

Comments

sandhose commented Jun 16, 2022

turt2live commented Jun 16, 2022

turt2live commented Jun 30, 2022

richvdh commented Jul 1, 2022

uhoreg commented Jul 4, 2022

sandhose commented Jul 4, 2022 • edited Loading

erikjohnston commented Jul 4, 2022

uhoreg commented Aug 26, 2022

sandhose commented Aug 26, 2022

turt2live commented Sep 12, 2022

sandhose commented Jul 4, 2022 •

edited

Loading