Refresh tokens need a grace period to deal with network errors and similar issues #1831

NavindrenBaskaran · 2020-04-28T06:41:21Z

So basically, on Hydra if the refresh token is already used to request for a new set of tokens, that refresh token is invalidated and can't be used to request for a new set of tokens. It's a one time use only.

At the event, where the tokens were actually issued by Hydra but the response was not sent to the users because of some network issues, the users are generally stuck. Is there a way for us to not invalidate the refresh token immediately and have a refresh token grace period to perform a duplicate refresh requests with the same refresh token.

aeneasr · 2020-04-28T07:21:15Z

At the event, where the tokens were actually issued by Hydra but the response was not sent to the users because of some network issues, the users are generally stuck. Is there a way for us to not invalidate the refresh token immediately and have a refresh token grace period to perform a duplicate refresh requests with the same refresh token.

This is currently not possible, the client would need to talk to /oauth2/auth again. A response getting lost in the network can of course happen but is very unlikely and does not justify issuing multiple tokens as this would not only potentially exponentially explode the amount of tokens issued but also enable replay attacks.

imranismail · 2020-04-28T08:08:12Z

@aeneasr does it help to mitigate the replay attack if we set a low default before the old token is completely unusable.

And to avoid exploding the amount of token issued, we could return the same token that was lost during the network error.

Please, just correct me if I'm wrong.

aeneasr · 2020-04-28T08:13:43Z

@aeneasr does it help to mitigate the replay attack if we set a low default before the old token is completely unusable.

Lowering TTL of things is typically a good guard against replayability, but doesn't resolve it completely of course :)

And to avoid exploding the amount of token issued, we could return the same token that was lost during the network error.

While this is a valid strategy and also supported by some servers, it does not work in ORY Hydra. We do not store the full access tokens in the database but instead generate it from a strong /dev/urandom reader. What we store in the database is its HMAC-SHA256 hash which we can use to look up a token and see if it's still valid.

This feature prevents anyone with database access from issuing access and refresh tokens without also having access to the ORY Hydra secrets.system.

aeneasr · 2020-08-21T08:38:04Z

#1928 (comment)

github-actions · 2021-09-21T00:02:15Z

I am marking this issue as stale as it has not received any engagement from the community or maintainers in over half a year. That does not imply that the issue has no merit! If you feel strongly about this issue

open a PR referencing and resolving the issue;
leave a comment on it and discuss ideas how you could contribute towards resolving it;
open a new issue with updated details and a plan on resolving the issue.

We are cleaning up issues every now and then, primarily to keep the 4000+ issues in our backlog in check and to prevent maintainer burnout. Burnout in open source maintainership is a widespread and serious issue. It can lead to severe personal and health issues as well as enabling catastrophic attack vectors.

Thank you for your understanding and to anyone who participated in the issue! 🙏✌️

If you feel strongly about this issues and have ideas on resolving it, please comment. Otherwise it will be closed in 30 days!

aeneasr · 2021-09-21T06:01:09Z

Closed in error

seanhoughton · 2021-10-14T21:53:22Z

This issue also prevents Ory Hydra from being used in a clustered environment without added complexity, as mentioned in this issue: ory/fosite#255

also see: https://datatracker.ietf.org/doc/html/rfc6819#section-5.2.2.3

Note: This measure may cause problems in clustered environments, since usage of the currently valid refresh token must be ensured. In such an environment, other measures might be more appropriate.

If two concurrent processes both attempt to refresh the token simultaneously one will always fail. This means either adding a global mutex to prevent concurrent refresh requests or adding retries. The mutex solution means more ops and hurts scalability while the retry solution adds complexity to the client code and creates a bunch of false-positive auth errors in logs.

A grace period would allow some hysteresis that would just need to last enough time to cover the "read token, check if expired, refresh token" code.

Auth0 and Okta solve this with a per-client configurable option to disable refresh token rotation entirely. An option to add a grace period would be even better.

mitar · 2021-11-17T11:44:31Z

Just to add another perspective here, maybe instead of using a grace period, idempotency keys is what should be implemented instead. That is not covered by any OpenID Connect standard though (there is a standardization draft for the header itself though), but it could be an opt-in extension, if you use idempotency key header, then you enable this behavior. An interesting thing about idempotency keys is that it is done at an API layer and would not require underlying changes to Fosite/Hydra storage.

aeneasr · 2021-11-19T08:01:44Z

Thank you for the idea - the concept looks really interesting. I'm not sure if it addresses the issue we have here though. The proposed standard prevents resending the same payload twice, as the idempotency key will only allow one request to pass.

We do have an idempotency key already! It's the refresh token! If you re-use the refresh token, the request fails.

The problem described here is that we want to be able to reuse the refresh token in certain scenarios. This might be the case in distributed applications, where it is difficult to get a shared state due to eventual consistency. Another example is that we have a race condition and send the same refresh token (idempotency key) twice, which ends up in rejecting the second request, and - following OAuth2 best practices - invalidates the refresh keys.

Hope this makes sense!

yangm97 · 2021-12-09T16:17:04Z

It seems like Google expects the old refresh token to keep working until "they have sent a request using the new token".

From Google Smart Home Test Suite Refresh Token Validation Test (emphasis mine):

Error: Refresh token has been rotated. This is not forbidden, however we do not see much benefit that rotating the refresh token can provide but the potential problem it has. We also tried to refresh token with the old refresh token after it has been rotated. Refresh didn't work, this means partner invalidated the old refresh token right after the rotation. Partners shall only invalidate the old refresh token after seeing we use the new one to ensure we got it successfully.

After this test fails the account link becomes broken. I can only assume Google threw away the new refresh token because of the test failure and it would use the new token under normal circumstances. More on account linking.

Google's OAuth2 implementation allows up to 50 refresh tokens to be valid concurrently. That's a lot of tokens and I'm not sure how they implement token reuse detection but I'm afraid having more than one valid refresh token at any given time may be required to avoid account link breakage.

aeneasr · 2021-12-09T20:16:50Z

Google expects the old refresh token to keep working until they have sent a request using the new token.

That only works if your infrastructure uses the introspection endpoint at Ory Hydra. That will often not be the case, especially when people use the JWT strategy. I think the grace period should sufficiently deal with this problem :)

mitar · 2021-12-10T10:41:06Z

But I think they have a point. I think the best combination would be:

You set grace to one hour.
If client uses the access token and sends it to you and you use introspection endpoint before one hour, you can safely invalidate the refresh token early.

So I think grace period works for the worst case, but for the one common case (when you do use introspection endpoint) we can increase security by invalidating early. So I do not think we loose anything by adding logic to introspection endpoints to invalidate old refresh tokens when they get a call associated with the new one.

mitar · 2022-01-14T01:11:00Z

The proposed standard prevents resending the same payload twice, as the idempotency key will only allow one request to pass.

I think it is the opposite: it allows multiple requests to have the same successful response, but it is only once acted on the request: other times it just returns recorded response for the first successful request.

We do have an idempotency key already! It's the refresh token! If you re-use the refresh token, the request fails.

This is exactly the opposite of what the idempotency keys provide. The behavior should be something like:

I make a request with idempotency key AAA and refresh token BBB.
If for whatever reason I do not get a response (connection drops, etc.) I can just retry the same request, with same idempotency key and refresh token.
No matter if it was the first or second request which in fact got to the server and server acted upon it, the refresh token is revoked when any request got to the server, even if the response did not get back.
If it was the first request which succeeded, the second request simply gets the same (successful) response back. The only reason why client would be sending the same request is because it did not get the response. So in this scheme, we just return recorded response again.

yangm97 · 2022-01-20T00:34:15Z

If client uses the access token and sends it to you and you use introspection endpoint before one hour, you can safely invalidate the refresh token early.

I think that could be a problem for clients running on "eventually consistent" databases. For instance, the Google Identity documentation, specifically states:

Immediate and fully consistent shared state both within, and between, your and Google's token handling systems cannot be guaranteed. [...]

Support multiple, concurrently valid access and refresh tokens. [...]

So the Google Smart Home Test Suite Refresh Token Validation Test recommendation of invalidating the old token as soon as you notice a request using the new token is actually in contradiction with this 🤷‍♂️

Speaking of that test, I can see the refresh token grace period feature from #2827 makes it pass, which is awesome, but still, if for whatever reason token reuse detection does kick in, without an alternative refresh token, I suppose the issue with the account link breaking would still occur.

Anyway I believe I'm drifting further away from the issue subject so I've opened a discussion if anyone is interest in continuing this conversation: #2946

tommyasai · 2022-02-22T21:56:51Z

@yangm97
May I ask how you passed the Account Linking test in Google Actions Test Suite?
I saw your comment here and I am in the same situation.
#1831 (comment)

In my understanding, we need to wait for #2827 to be released, but do you have any workaround in your mind?

yangm97 · 2022-02-24T00:45:24Z

@tommyasai I built a docker image from the source code of #2827 (HEAD was e992907 at that time).

If you just need to test something quick and don't mind running a stale version of that PR (nor running binaries from strangers on the internet 😅) then sure I could push that docker image somewhere public.

If you intend to run this on production and are not comfortable enough with the codebase to compile a version yourself I would advise against running off tree. As a general rule, you're almost always guaranteed to get undocumented breaking changes which may be as easy to fix as running the down migration(s) from previous iterations and then applying the newer ones or in the worst case you might need to patch your way out a situation nobody else has been before. Scary.

PS: you can also find me on telegram, same username as here. While it is good to know people on the same boat as us I think we should avoid stealing the thread heh

tommyasai · 2022-02-24T01:54:06Z

@yangm97
I appreciate your comment.
Now that I need to pass the test quickly to get the release approval, so If possible I'd like you to push the image. 🙏

We still have some time till the actual release for users, so as a workaround for production, we are also thinking about implementing the proxy to cover the functionality.

StarAurryon · 2022-09-24T12:06:42Z

Just a quick question :
Is So the Google Smart Home Test Suite Refresh Token Validation Test recommendation of invalidating the old token as soon as you notice a request using the new token is actually in contradiction with this the current planned behaviour ?

I'm using Hydra in production on a 3 site cluster with JWT and access_token lifespan of 5m to easily revoke accesses in case of compromise. And in 1 week of usage on our Drive Sync Client on 30 beta testers, 2 clients already lost the new tokens due to crashes at the wrong moment or the os killing the app at shutdown.

aeneasr · 2022-10-05T07:55:02Z

Yes but it’s very difficult to implement securely

github-actions · 2023-10-06T00:03:44Z

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue

open a PR referencing and resolving the issue;
leave a comment on it and discuss ideas on how you could contribute towards resolving it;
leave a comment and describe in detail why this issue is critical for your use case;
open a new issue with updated details and a plan for resolving the issue.

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️

RazerM · 2023-10-06T08:40:51Z

This issue is not stale

StarAurryon · 2023-10-06T09:05:03Z

I was waiting on an input on #2827 regarding code reuse, to know if I can adapt the code to hydra v2.

JohannesHuster · 2024-08-20T14:59:02Z

@aeneasr Would more/different work from the community be helpful here at the moment? And do you think this issue will be a priority in the near future? - Not meant to put any pressure on you. We're just discussing whether to wait on this or change quite a few things on our end to solve our problems; and would be grateful for a quick update 🙏

We don't have any Go programmers on the team, but are happy to help where we can (documentation, test cases?, ...)

This patch adds a configuration flag which enables graceful refresh token rotation. Previously, refresh tokens could only be used once. On reuse, all tokens of that chain would be revoked. This is particularly challenging in environments, where it's difficult to make guarantees on synchronization. This could lead to refresh tokens being sent twice due to some parallel execution. To resolve this, refresh tokens can now be graceful by changing `oauth2.grant.refresh_token.grace_period=10s` (example value). During this time, a refresh token can be used multiple times to generate new refresh, ID, and access tokens. All tokens will correctly be invalidated, when the refresh token is re-used after the grace period expires, or when the delete consent endpoint is used. Closes #1831 #3770

struhtanov mentioned this issue Jul 13, 2020

Refresh token flow breaks up during db server downtime #1928

Closed

aeneasr added rfc A request for comments to discuss and share ideas. breaking change Changes behavior in a breaking manner. package/oauth2 labels Aug 20, 2020

aeneasr added this to the unplanned milestone Aug 20, 2020

aeneasr changed the title ~~Hydra Refresh Token Grace Period~~ Refresh tokens need a grace period to deal with network errors and similar issues Aug 21, 2020

IonFoXx mentioned this issue Aug 27, 2020

Reuse Detection in Refresh Token Rotation #2022

Closed

aeneasr mentioned this issue Apr 9, 2021

Support for reusing used RT in a short time window ory/fosite#580

Closed

github-actions bot added the stale Feedback from one or more authors is required to proceed. label Sep 21, 2021

github-actions bot removed the stale Feedback from one or more authors is required to proceed. label Sep 22, 2021

This was referenced Oct 21, 2021

feat: allow for permanent refresh_token(s) ory/fosite#632

Closed

feat: allow for permanent refresh_token(s) with config #2815

Closed

This was referenced Nov 1, 2021

feat: add new function to TokenRevocationStorage to support refresh token grace-period ory/fosite#635

Merged

feat: Refresh token expiration window #2827

Closed

aeneasr modified the milestones: unplanned, v2.0 May 18, 2022

aeneasr modified the milestones: v2.0, v2.1 Jul 17, 2022

aeneasr mentioned this issue Aug 11, 2023

Competing refresh token requests can cause concurrent update issues on PostgreSQL #3604

Open

5 tasks

github-actions bot added the stale Feedback from one or more authors is required to proceed. label Oct 6, 2023

aeneasr removed the stale Feedback from one or more authors is required to proceed. label Oct 6, 2023

StarAurryon mentioned this issue May 25, 2024

feat: Refresh token expiration window #3770

Closed

1 task

hhow09 mentioned this issue Sep 21, 2024

feat: refresh token notes after surveying ory/fosite hhow09/hhow09.github.io#28

Merged

aeneasr closed this as completed in e278b40 Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refresh tokens need a grace period to deal with network errors and similar issues #1831

Refresh tokens need a grace period to deal with network errors and similar issues #1831

NavindrenBaskaran commented Apr 28, 2020 •

edited

Loading

aeneasr commented Apr 28, 2020

imranismail commented Apr 28, 2020

aeneasr commented Apr 28, 2020 •

edited

Loading

aeneasr commented Aug 21, 2020

github-actions bot commented Sep 21, 2021

aeneasr commented Sep 21, 2021

seanhoughton commented Oct 14, 2021

mitar commented Nov 17, 2021 •

edited

Loading

aeneasr commented Nov 19, 2021

yangm97 commented Dec 9, 2021 •

edited

Loading

aeneasr commented Dec 9, 2021

mitar commented Dec 10, 2021 •

edited

Loading

mitar commented Jan 14, 2022

yangm97 commented Jan 20, 2022

tommyasai commented Feb 22, 2022

yangm97 commented Feb 24, 2022

tommyasai commented Feb 24, 2022

StarAurryon commented Sep 24, 2022

aeneasr commented Oct 5, 2022

github-actions bot commented Oct 6, 2023

RazerM commented Oct 6, 2023

StarAurryon commented Oct 6, 2023

JohannesHuster commented Aug 20, 2024

Refresh tokens need a grace period to deal with network errors and similar issues #1831

Refresh tokens need a grace period to deal with network errors and similar issues #1831

Comments

NavindrenBaskaran commented Apr 28, 2020 • edited Loading

aeneasr commented Apr 28, 2020

imranismail commented Apr 28, 2020

aeneasr commented Apr 28, 2020 • edited Loading

aeneasr commented Aug 21, 2020

github-actions bot commented Sep 21, 2021

aeneasr commented Sep 21, 2021

seanhoughton commented Oct 14, 2021

mitar commented Nov 17, 2021 • edited Loading

aeneasr commented Nov 19, 2021

yangm97 commented Dec 9, 2021 • edited Loading

aeneasr commented Dec 9, 2021

mitar commented Dec 10, 2021 • edited Loading

mitar commented Jan 14, 2022

yangm97 commented Jan 20, 2022

tommyasai commented Feb 22, 2022

yangm97 commented Feb 24, 2022

tommyasai commented Feb 24, 2022

StarAurryon commented Sep 24, 2022

aeneasr commented Oct 5, 2022

github-actions bot commented Oct 6, 2023

RazerM commented Oct 6, 2023

StarAurryon commented Oct 6, 2023

JohannesHuster commented Aug 20, 2024

NavindrenBaskaran commented Apr 28, 2020 •

edited

Loading

aeneasr commented Apr 28, 2020 •

edited

Loading

mitar commented Nov 17, 2021 •

edited

Loading

yangm97 commented Dec 9, 2021 •

edited

Loading

mitar commented Dec 10, 2021 •

edited

Loading