Long-living Access Token needed for internal batch processes/offline tasks? #62

Open
obfuscoder opened this issue Jan 15, 2024 · 6 comments


@obfuscoder

The current spec states:

"Even for long-running "batch" jobs, a longer lived access token should be used to initiate the request to the batch endpoint. It then obtains short-lived Txn-Tokens that may be used to authorize the call to downstream services in the call-chain."

I assume the reasoning behind this is that there would be no need to define a Txn-Token Request without an original Access Token. However, I think that creating long-living Access Tokens just for the purpose of exchanging them for short-lived Txn-Tokens and then throwing them away can be considered wasteful (energy costs). Every internally originating job would need at least two calls instead of just one to obtain the Txn-Token.

Keeping those long-living Access Tokens around for reuse over a longer period is usually not practical for batch processes, because those processes operate on many user accounts in quick succession. Holding all those Access Tokens in memory is often not feasible, and neither is persisting them in a storage system. I also think that using long-living Access Tokens in this scenario adds security risk: the tokens could leak, and another party could use them to create Txn-Tokens or to access the resources the Access Token was meant for.

In our own implementation of a Txn-Token Service, the service (Token Exchange) also provides the Token endpoint and allows specific internal clients to use the Client Credentials grant type (authenticated with mTLS client certificates) to obtain Txn-Tokens directly. The necessary input parameters are essentially a combination of the Token Request based on the Client Credentials grant and the Token Exchange request with a Txn-Token profile.

Is it possible to extend the Txn-Token Service interface to allow Txn-Token Requests without an Access Token as input? This would essentially mean defining a Txn-Token profile for the Token endpoint in addition to the Token Exchange protocol.
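To make the two request shapes concrete, here is a minimal sketch in Python of the form parameters involved. The `grant_type`, `subject_token`, `subject_token_type` and `requested_token_type` names follow OAuth 2.0 Token Exchange (RFC 8693), and the Txn-Token type URN is the one used in the Transaction Tokens draft as far as I recall. The `subject` parameter in the second shape is purely hypothetical: it stands in for whatever a Token endpoint profile would define, and is not taken from any spec. Client authentication (mTLS in the deployment described above) happens at the transport layer and is not shown.

```python
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
# Assumed from the Transaction Tokens draft; check against the current text.
TXN_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:txn_token"


def build_txn_token_params(scope, access_token=None, subject=None):
    """Return the form parameters for a Txn-Token request.

    With an access token: a Token Exchange request, as the spec describes.
    Without one: a Client Credentials style request. The "subject" field is
    a HYPOTHETICAL parameter used only to illustrate the proposal.
    """
    if access_token is not None:
        return {
            "grant_type": TOKEN_EXCHANGE_GRANT,
            "requested_token_type": TXN_TOKEN_TYPE,
            "subject_token": access_token,
            "subject_token_type": ACCESS_TOKEN_TYPE,
            "scope": scope,
        }
    if subject is None:
        raise ValueError("either access_token or subject is required")
    return {
        "grant_type": "client_credentials",
        "requested_token_type": TXN_TOKEN_TYPE,
        "scope": scope,
        "subject": subject,  # hypothetical, not defined by any spec
    }


# One call instead of two: no access token has to be minted first.
body = urlencode(build_txn_token_params("account-delete", subject="user-4711"))
```

The point of the sketch is only that the second shape needs no `subject_token` at all, so an internally originating job would make one call to the TTS rather than two.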

@gffletch
Collaborator

So a couple of thoughts here:

  1. I believe the Txn-Token Service (TTS) must be able to issue Txn-Tokens without a previously issued access_token. Think about a use case where email is being delivered to a mailbox. SMTP does not carry the access token that authorized the sending of the email. In that context, the inbound email server needs to be able to request a Txn-Token for the purpose of delivering the email to the specified inbox.
  2. The general intent for keeping Txn-Tokens very short lived is to remove the need for revocation. Revocation is difficult for any token system (see the issuer-holder-verifier efforts in this space). Given that the vast majority of transactions complete in a very short timeframe, keeping the life-time of the Txn-Token short makes a lot of sense.
  3. There are some edge cases where a longer lived Txn-Token may be beneficial or required. Say, for instance, a user asks for their account to be deleted (GDPR) and the Txn-Token needs to live for multiple days, even though the underlying identity system may already have cleaned up the user identifier data. That said, these edge cases should be vetted very carefully, as there is a security risk if a Txn-Token lives for a long time and hence can potentially be replayed in compromising ways.

@obfuscoder
Author

obfuscoder commented Jan 30, 2024

The use cases you mention are pretty much what we had to implement.

  1. For incoming mail we have a mail delivery agent that uses the mail storage system service to store the mail in the user's mailbox. In order to write to a user's mailbox, the delivery agent needs a Txn-Token with the sub of that user. The delivery agent asks the TTS for a Txn-Token and presents its client credentials and the mail address of the recipient/user. The TTS allows issuing Txn-Tokens to the delivery agent. There is no long-living access token involved. The issued Txn-Token is short lived (1 minute in our deployments).

  2. Whenever a user cancels their contract, we have to keep the data for some time (the number of days depends on the type of data, legal restrictions, etc.) before the delete operations are performed. The delayed delete operation is done by batch processes. Instead of creating and storing long-living access tokens, which could be leaked and abused elsewhere, we decided to retain the short-lived Txn-Tokens. Only those batch processes are allowed to present an already expired Txn-Token to the TTS and ask for a new one. The new Txn-Token is then used for the actual delete operation, because the storage system services require a valid, unexpired Txn-Token.

So my argument is against using (and potentially abusing) long-living access tokens, and instead for allowing expired Txn-Tokens to be presented to the TTS to re-issue Txn-Tokens for the purpose of finalizing a delayed operation.

@tulshi
Collaborator

tulshi commented Feb 1, 2024

I do not think we should allow the use of expired TraTs to authorize the issuance of new TraTs. This can be abused for token replay attacks, and negates the security we get through short-lived TraTs.

@obfuscoder
Author

@tulshi Good point. To get a new TraT, the caller needs proper client authentication (e.g. SPIFFE, mTLS, or Basic Auth credentials). My point was to allow only specific authenticated clients to exchange expired TraTs. It is true that if those clients are compromised themselves (so that an attacker can perform the client authentication), new TraTs can be created and abused.

However, longer-lived TraTs also carry the risk of replay attacks. I think that risk is even higher with longer-lived TraTs, as an attacker would only need to get hold of the long-lived TraT and would not need to renew it.

@gffletch
Collaborator

Using an expired Txn-Token has some interesting properties. Assuming the purp claim is narrowly scoped, the TTS should NOT issue a new Txn-Token with a different purp value.

I would probably recommend creating very specific purp values for this use case and then only allowing specific clients to request a Txn-Token where the subject_token value is the expired Txn-Token. This is currently allowed within the spec... the question might be whether we want to explicitly encourage or discourage such behavior.

In my mind, using an expired Txn-Token is a bit like requesting a replacement Txn-Token except the processing rules are even more strict.
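As a rough illustration of those stricter processing rules, the following Python sketch shows the kind of policy check a TTS could apply before replacing an expired Txn-Token. It is only a sketch under stated assumptions: the client identifier, the `purp` values and the allow-list structure are all made up for the example, and it assumes the caller has already been authenticated and the expired token's signature has already been verified.

```python
# HYPOTHETICAL allow-list: which authenticated clients may exchange an
# expired Txn-Token, and for which narrowly scoped purp values.
EXPIRED_EXCHANGE_POLICY = {
    "batch-account-deleter": {"account-delete"},
}


def may_replace_expired(client_id, expired_claims, requested_purp,
                        policy=EXPIRED_EXCHANGE_POLICY):
    """Decide whether an expired Txn-Token may be replaced.

    Assumes client authentication and signature verification of the
    expired token were done before this function is called.
    """
    allowed_purps = policy.get(client_id, set())
    original_purp = expired_claims.get("purp")
    # Rule 1: never issue a replacement with a different purp value.
    if original_purp != requested_purp:
        return False
    # Rule 2: only specific clients, and only for specific purp values.
    return requested_purp in allowed_purps


claims = {"sub": "user-4711", "purp": "account-delete", "exp": 1706600000}
print(may_replace_expired("batch-account-deleter", claims, "account-delete"))
print(may_replace_expired("mail-delivery-agent", claims, "account-delete"))
```

Keeping the decision in one TTS-side function reflects the idea that the policy belongs to the TTS, not to the requesting workload.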

@obfuscoder
Author

Indeed, the spec does not say anything about whether and how the TTS validates the txn-token before issuing a replacement txn-token. So an already expired txn-token can be accepted based on the authenticated client, purpose and scope. After all, this can be a specifically defined policy.

I recommend that we at least acknowledge this case in the spec and give guidance on when to use it. As we also have the case of not presenting any token when requesting a txn-token (see #53), I think that as long as a workload is in possession of a txn-token, even an expired one, we should recommend presenting it when requesting replacement txn-tokens. Whether to accept the expired txn-token should be the decision of the TTS and not of the workload (otherwise the workload could just take the subject from the expired token and present it as the subject_token to the TTS).

One additional aspect came to our attention. There are actually two cases for expired txn-tokens:

  1. The exp claim says the token is expired, but the signature can still be verified
  2. The signature can no longer be verified because the key used to sign the token is no longer available

Depending on how frequently a TTS rotates its key material, the second case might occur for long-running processes such as deleting account data several weeks after the customer has cancelled their contract. For instance, our TTS implementation rotates the keys daily. Only the next, current and previous signing keys are provided to workloads (via the JWKS URI). Even if a workload is not able to verify the signature of a txn-token it received several weeks ago, the TTS might still be able to keep the old key material for a longer period so that it can at least validate the signature of those txn-tokens.
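The difference between the two cases, and between the workload's view and the TTS's view of the key material, can be sketched in Python as follows. This is a simplified model, not our implementation: it does no cryptographic verification and uses "is the signing key ID still known" as a stand-in for "can the signature be verified". The key IDs and the retained archive are hypothetical.

```python
from enum import Enum


class TokenState(Enum):
    FRESH = "not expired, signing key known"
    EXPIRED_VERIFIABLE = "expired, signing key still known"   # case 1
    UNVERIFIABLE = "signing key no longer known"              # case 2


def classify(kid, exp, known_kids, now):
    """Classify a txn-token by its key ID and exp claim.

    known_kids models the keys available to whoever is checking the token.
    Real code would verify the signature with the key; this only checks
    that the key is still available.
    """
    if kid not in known_kids:
        return TokenState.UNVERIFIABLE
    if exp <= now:
        return TokenState.EXPIRED_VERIFIABLE
    return TokenState.FRESH


# Workloads only see the previous, current and next key (daily rotation).
workload_keys = {"2024-01-29", "2024-01-30", "2024-01-31"}
# The TTS additionally retains older keys (hypothetical archive).
tts_keys = workload_keys | {"2024-01-02"}

old_token = {"kid": "2024-01-02", "exp": 1704200000}
now = 1706600000
print(classify(old_token["kid"], old_token["exp"], workload_keys, now))
print(classify(old_token["kid"], old_token["exp"], tts_keys, now))
```

For the same weeks-old token, the workload lands in the second case while the TTS, with its retained keys, lands in the first, which is why the accept-or-reject decision fits better at the TTS.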
