Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope transaction IDs to the (user, device_id) instead of the access token #1133

Closed
sandhose opened this issue Jun 16, 2022 · 9 comments · Fixed by #1236
Closed

Scope transaction IDs to the (user, device_id) instead of the access token #1133

sandhose opened this issue Jun 16, 2022 · 9 comments · Fixed by #1236
Labels
release-blocker Blocks the next release from happening spec-bug Something which is in the spec, but is wrong

Comments

@sandhose
Copy link
Member

Transaction IDs are currently scoped by access token. They are used to make some request idempotent, and to help client map sent event when they get back in /sync.

Problem is, since MSC2918 (aka. refresh tokens), a client may refresh its access token, which may lead to scenarios like this:

  • the client starts a /sync with an access token T1
  • since T1 is about to expire, it refreshes it and gets an access token T2
  • it sends an event via /rooms/{room}/send/{type}/{txnId} with the access token T2
  • /sync gets back with the newly created event, but without the transaction_id field, since that /sync was done with another access token

which directly contradicts the spec, where says on the transaction_id field:

The client-supplied transaction ID, for example, provided via PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}, if the client being given the event is the same one which sent it.

There are other places in the spec where we mention those transaction IDs:

The client-server API typically uses HTTP PUT to submit requests with a client-generated transaction identifier. This means that these requests are idempotent. The scope of a transaction identifier is a particular access token. It only serves to identify new requests from retransmits. After the request has finished, the {txnId} value should be changed (how is not specified; a monotonically increasing integer is recommended).

And on the txnId on /send:

The transaction ID for this event. Clients should generate an ID unique across requests with the same access token; it will be used by the server to ensure idempotency of requests.


What I suggest is to use the device_id (+ user_id) instead of the access token to scope those transaction IDs.

Related Synapse issue: matrix-org/synapse#13064 and PR: matrix-org/synapse#13083

@sandhose sandhose added the clarification An area where the expected behaviour is understood, but the spec could do with being more explicit label Jun 16, 2022
@turt2live turt2live added spec-bug Something which is in the spec, but is wrong and removed clarification An area where the expected behaviour is understood, but the spec could do with being more explicit labels Jun 16, 2022
@turt2live
Copy link
Member

this is arguably an MSC-needing thing or a spec bug. The refresh tokens MSC should have handled it at the time, but that ship has well sailed.

@turt2live turt2live added the release-blocker Blocks the next release from happening label Jun 30, 2022
@turt2live
Copy link
Member

@matrix-org/spec-core-team thoughts on JFDI and fixing the spec versus requiring an MSC?

@richvdh
Copy link
Member

richvdh commented Jul 1, 2022

My thoughts:

  • To me, it's an obvious corollary of MSC2918 that the scope of a txnid would change from "access token" to "sequence of access tokens/refresh tokens". I don't know if there is a better way to express that concept, but I don't think we would need an MSC to adopt it.
  • It's less clear to me that (user_id, device_id) is exactly equivalent - particularly given that, as the spec currently stands, you can have multiple, active, access tokens for the same device_id at once (via the device_id param to /login - and that works just fine as long as you're not doing E2E. Accordingly, I think that gets into MSC territory.

@uhoreg
Copy link
Member

uhoreg commented Jul 4, 2022

  • It's less clear to me that (user_id, device_id) is exactly equivalent - particularly given that, as the spec currently stands, you can have multiple, active, access tokens for the same device_id at once (via the device_id param to /login - and that works just fine as long as you're not doing E2E. Accordingly, I think that gets into MSC territory.

Currently, the spec says that a device_id can only be bound to one active access token at a time:

If the client sets the device_id, the server will invalidate any access and refresh tokens previously assigned to that device.

though IIRC Synapse currently doesn't follow that.

But even with that, it's not clear that (user_id, device_id) is the right thing to do, since you don't need to have multiple active access tokens for the same device ID to create a conflict: client A sends an event with txnId: foo; client B logs in using the same device ID, evicting client A; client B sends a different event with txnId: foo, which should be accepted since it's sent by a different client, but will be dropped if txnIds are scoped to (user_id, device_id).

I agree that changing it from "access token" to "sequence of access tokens/refresh tokens" should be fine, if someone can think of a better way to word it. Maybe we need to define a "session" (or some term that we haven't already overloaded)? "client session"?

@sandhose
Copy link
Member Author

sandhose commented Jul 4, 2022

But even with that, it's not clear that (user_id, device_id) is the right thing to do, since you don't need to have multiple active access tokens for the same device ID to create a conflict: client A sends an event with txnId: foo; client B logs in using the same device ID, evicting client A; client B sends a different event with txnId: foo, which should be accepted since it's sent by a different client, but will be dropped if txnIds are scoped to (user_id, device_id).

Am I wrong to think this seems like quite an improbable edge case for me? I guess it depends how clients generate their txnIDs, but for example, Element Web uses a mix of a timestamp + monotonically increasing counter, I don't see how it could ever happen.

I agree that changing it from "access token" to "sequence of access tokens/refresh tokens" should be fine, if someone can think of a better way to word it. Maybe we need to define a "session" (or some term that we haven't already overloaded)? "client session"?

My problem with that is it makes it much more difficult to implement, especially in the context of the OIDC work. I recently touched on that part of the code in Synapse, and introducing this kind of 'session' mechanism would require quite a big database migration, adding a unique session ID to existing access/refresh tokens. With the OIDC work, it would add additional requirements on the OP (auth server) which would make it hard for us to use other off-the-shelf OPs like Keycloak.

@erikjohnston
Copy link
Member

.. since you don't need to have multiple active access tokens for the same device ID to create a conflict...

It's worth noting that to-device message delivery in /sync will break if you have multiple clients syncing with the same device ID, since we delete to-device messages was they've been delivered.

@uhoreg
Copy link
Member

uhoreg commented Aug 26, 2022

Could we say that transaction IDs are scoped to a "device" and then define a device as a sequence of access tokens?

@sandhose
Copy link
Member Author

Could we say that transaction IDs are scoped to a "device" and then define a device as a sequence of access tokens?

That would complexify by a lot the eventual transition to OIDC, by adding a new concept to keep track of between the HS and the OP. I understand how it would 'fix' the current spec without breaking anything, but I'm also pretty confident that the breakage would be pretty minor in real world cases if we choose to scope the TXN IDs to device_ids.

@turt2live
Copy link
Member

Spec-wise, this seems to be fixed by a "session" concept given the MSC's behaviour, though implementation complexity is a whole other note to worry about. To fix the implementation I think we'd need another MSC - it's just not as easy to make the leap towards that with the current understanding.

sandhose added a commit to sandhose/complement that referenced this issue Feb 15, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
sandhose added a commit to sandhose/complement that referenced this issue Feb 15, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
sandhose added a commit to sandhose/complement that referenced this issue Feb 16, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
sandhose added a commit to sandhose/complement that referenced this issue Feb 16, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
hughns pushed a commit to sandhose/complement that referenced this issue Apr 18, 2023
This adds two tests, which check the current spec behaviour of
transaction IDs, which are that they are scoped to a series of access
tokens, and not the device ID.

The first test highlight this behaviour, by logging in with refresh
token enabled, sending an event, using the refresh token and syncing
with the new access token. On the sync, the transaction ID should be
there, but currently in Synapse it is not.

The second test highlight that the transaction ID is not scoped to the
device ID, by logging in twice with the same device ID, sending an event
with the first access token, and syncing with the second access token.
In that case, the sync should not contain the transaction ID, but
I think it's the case in HS implementations which use the device ID to
scope the transaction IDs, like Conduit.

Related: matrix-org/matrix-spec#1133, matrix-org/matrix-spec#1236, matrix-org/synapse#13064 and matrix-org/synapse#13083
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker Blocks the next release from happening spec-bug Something which is in the spec, but is wrong
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants