Need way to housekeep records in DB #1574
Yes, so I checked, and flush currently only clears up access tokens. Since this has never become an issue even in large deployments so far, I'm pushing the urgency back a bit.
PKCE entries are removed when they are used, but it's still possible that old, dead entries stick around if the authorize code is never used.
Thanks, that's helpful. I just verified that the normal path won't create a record there, so it should be safe to clean up these PKCE records (I still don't understand where they came from, as the auth code should always be consumed immediately by our mobile app/SPA; maybe there's an error case I'm not aware of). Looking forward to this enhancement so other DB tables can be handled too. Looking at the other tables, it doesn't seem trivial to identify which records are safe to delete, so I dare not do it myself. Thanks again!
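To make the kind of cleanup being discussed concrete, here is a minimal sketch in Python with SQLite. The schema is a simplified stand-in, not Hydra's real `hydra_oauth2_pkce` definition; the column names (`signature`, `requested_at`) and the 10-minute TTL are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified stand-in schema; the real hydra_oauth2_pkce table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hydra_oauth2_pkce (signature TEXT PRIMARY KEY, requested_at TEXT)"
)

now = datetime.now(timezone.utc)
stale = (now - timedelta(days=30)).isoformat()    # dead entry: code never exchanged
fresh = (now - timedelta(minutes=5)).isoformat()  # still within its lifespan
conn.execute("INSERT INTO hydra_oauth2_pkce VALUES (?, ?)", ("sig-old", stale))
conn.execute("INSERT INTO hydra_oauth2_pkce VALUES (?, ?)", ("sig-new", fresh))

# Hypothetical TTL: anything older than the auth-code lifespan is assumed dead,
# since used PKCE entries are removed when the code is exchanged.
ttl_cutoff = (now - timedelta(minutes=10)).isoformat()
cur = conn.execute(
    "DELETE FROM hydra_oauth2_pkce WHERE requested_at < ?", (ttl_cutoff,)
)
remaining = [r[0] for r in conn.execute("SELECT signature FROM hydra_oauth2_pkce")]
```

ISO-8601 timestamps in the same timezone compare correctly as strings, so the cutoff can be applied with a plain `<` comparison.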
Hello. Any progress with this? As described here, it is safe to clean up the tables manually, but the PR was not merged.
Contributions are welcome; I think the threads, comments, and existing PRs shed light on what needs to be done!
@aeneasr
That says to flush all unhandled and obsolete requests. I'm confused here: the first statement would delete all records, including those with was_used=false.
Understood, so we delete all unused and obsolete requests. Also, the logout request currently has no TTL. Should such a parameter be added, or can ttl.login_consent_request be used? On the other hand, what is the reason to keep handled, used, and obsolete requests? Are they removed in some flow?
Keep in mind, this comment is from 2019. The code has changed since then! Maybe let's start with this: which tables accumulate a lot of data in your system? If we know those, we will know how to clean them.
The tables I most care about:
The question is: can we flush hydra_oauth2_(authentication|consent)_request_handled as well?
How much data has accumulated for those? Are they growing fast? And if so, how fast?
@aeneasr
Hi @aeneasr, we are facing a similar issue in our production database: the hydra_oauth2_(authentication|consent)_request tables grow quite quickly, at a rate of 9000 new records per day.
Hi @aeneasr, we are going to update to the latest version of Hydra, which should solve our long-query problem. We still want to tidy up some tables to keep their size within a cost margin. As you mentioned above, the latest version has had some code changes. Are the clean-up instructions in #1574 still relevant, or could you provide updated instructions on how to safely clean up those tables? We will only do a simple clean-up that removes records older than a certain time.
Clean-up should now be more or less automatic for most flows. We have also introduced better foreign key checks since then. There might still be stale data, but it definitely shouldn't be too much. If you do find leaky tables in the latest version, please do report here :)
@aeneasr Let me follow up on that. The newer version of Hydra has a built-in cleanup function, so we shouldn't have to worry too much going forward. But what about the existing aged data in the current tables? Will the new version of Hydra clean up the old data as well?
We are interested in this feature as well. The tables I'm seeing growing the most at the moment are
When using the PKCE flow from a SPA, we store the
I don't get this: once the refresh token has expired, it can't be used for refreshing anymore.
But aren't all the tokens with requested_at earlier than now - TTL already expired?
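The question above reduces "requested_at earlier than now - TTL" to a single cutoff timestamp. A minimal sketch, assuming a simplified stand-in table and a hypothetical 30-day refresh-token TTL (Hydra's real schema and configured ttl.refresh_token will differ):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified stand-in for hydra_oauth2_refresh; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hydra_oauth2_refresh (signature TEXT, active INTEGER, requested_at TEXT)"
)

now = datetime.now(timezone.utc)
rows = [
    ("r1", 1, (now - timedelta(days=90)).isoformat()),  # past the assumed 30-day TTL
    ("r2", 1, (now - timedelta(days=1)).isoformat()),   # still valid
]
conn.executemany("INSERT INTO hydra_oauth2_refresh VALUES (?, ?, ?)", rows)

# "requested_at earlier than now - TTL" becomes one cutoff timestamp.
refresh_ttl = timedelta(days=30)  # assumed ttl.refresh_token value
cutoff = (now - refresh_ttl).isoformat()
deleted = conn.execute(
    "DELETE FROM hydra_oauth2_refresh WHERE requested_at < ?", (cutoff,)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM hydra_oauth2_refresh").fetchone()[0]
```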
With a few million users per day, the number of records can be a problem for me.
Testing with Locust, after 1 day we have 21 GB of outdated data. Any plan to review your cleanup process?
Please provide useful information and context as mandated by our contribution guidelines. Also make sure to read the comments in this thread, as a lot of time was spent on this already.
As I wrote here, using the authorization code flow with refresh tokens could avoid fast DB growth. But for a few days now we have been using Ory Hydra in production for further use cases and are getting thousands of new users every day, which leads to the same problem. Our tables like hydra_oauth2_authentication_request, hydra_oauth2_code, hydra_oauth2_consent_request, and hydra_oauth2_refresh grow by a few gigabytes every day. As I understood it, hydra_oauth2_refresh is cleaned automatically after TTL_REFRESH_TOKEN, but for the other ones we would need some sort of workaround...
Are you running the flush command? |
Yes, we are flushing with a request to "/oauth2/flush" automatically every x hours. As I can see in our logs, this only deletes rows from "hydra_oauth2_access". Or are there any dependencies on other tables?
@aeneasr :-) Yes, I'm the author of it :-)
Ah, I see, so the issue still persists? Could you maybe show some of the records you think should be purged from the database? Based on that, we can craft improved queries for the clean-up routine.
If those records are "technically valid" and therefore cannot be deleted automatically, there should be a configuration parameter to limit the number of records or the total space occupied in the database, sacrificing the least recently used / oldest ones, OR (better) a parameter to the flush command to do the same cleaning.
Yes, I agree, but what would be really helpful is some real-world examples of rows that you deem obsolete. If we get those, we can easily add this code. Without examples, we are walking in the dark!
You know why records are not deleted, so let's take any DB in real-world use. (I'm still not in production; @VladislavKravitski4888, do you have some example data to show?)
To the best of my knowledge, all records might be required for auditing or for a functioning system. That's why I am asking. Most people asking about this here are running load tests with synthetic data, which does not reflect real-world use. This has been explained and discussed in the earlier comments. What we need to help you is concrete data, concrete examples, and ideas about which records are being kept that could be purged. Without this information, commenting +1 on this issue will not help anyone, and we keep going in circles.
I'm facing the same issue already described by @LAMASE and @VladislavKravitski4888. The real issue arises now that we are replacing separate logins with centralized single sign-on powered by Hydra, increasing our user base to a few million per day (from statistics we expect at least a 30x increase in Hydra requests). Of course we tried
As I understood, the table … Now the hint: if a login request is never completed before its expiration, it should be deleted, because it is useless for the user. The only use for such a row is to trigger a "The login request has expired" error message, but a new login flow must be started anyway, and saying "expired" or "not valid" has the same meaning for the user; therefore this operation does not penalize the user experience. Treating login requests older than 1 month as expired in our system would allow us to delete 937405 rows from each of the previously mentioned tables (…). Take into account that in our application we can consider a session expired even after 1 hour. Running this query:
gives me our very first uncompleted login sessions, dated more than a year ago:
and this is the number of uncompleted request sessions we can consider expired (1 month old) and drop safely from the database without breaking the user experience:
Please note: our application starts a new OAuth login request every time the user presses the login button; we don't try to cache or reuse the same login challenge in subsequent calls. Let me know if I can share other specific data to help you.
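The rule described above — drop login requests that were never completed and are past a grace period — can be expressed as an anti-join against the handled table. A minimal sketch under assumed, heavily simplified schemas; the real hydra_oauth2_authentication_request(_handled) tables have more columns and foreign keys:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified stand-ins; in Hydra the handled row shares the request's challenge key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hydra_oauth2_authentication_request (
    challenge TEXT PRIMARY KEY, requested_at TEXT);
CREATE TABLE hydra_oauth2_authentication_request_handled (
    challenge TEXT PRIMARY KEY);
""")

now = datetime.now(timezone.utc)
old = (now - timedelta(days=400)).isoformat()
conn.execute("INSERT INTO hydra_oauth2_authentication_request VALUES ('c-abandoned', ?)", (old,))
conn.execute("INSERT INTO hydra_oauth2_authentication_request VALUES ('c-completed', ?)", (old,))
conn.execute("INSERT INTO hydra_oauth2_authentication_request VALUES ('c-recent', ?)",
             ((now - timedelta(hours=1)).isoformat(),))
conn.execute("INSERT INTO hydra_oauth2_authentication_request_handled VALUES ('c-completed')")

# Delete requests older than a 1-month grace period that were never handled.
cutoff = (now - timedelta(days=30)).isoformat()
deleted = conn.execute("""
    DELETE FROM hydra_oauth2_authentication_request
    WHERE requested_at < ?
      AND NOT EXISTS (
        SELECT 1 FROM hydra_oauth2_authentication_request_handled h
        WHERE h.challenge = hydra_oauth2_authentication_request.challenge)
""", (cutoff,)).rowcount

survivors = sorted(r[0] for r in conn.execute(
    "SELECT challenge FROM hydra_oauth2_authentication_request"))
```

The `NOT EXISTS` clause is what keeps completed-but-old requests safe: only abandoned rows past the cutoff are removed.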
I can add some more production experience. We have 16M users and handle between 0.5-1M logins per day and around 250-500k token refreshes per day. We have the token flush command running nightly. Here are the table sizes in descending order. I'll keep this up to date to see how it's changing. We have had Hydra in production for about 6 months, but we went from ~5M to ~16M users only a few weeks ago, so we don't have a clear picture yet. Throughout this year we'll ramp up to about 30M.
Update! After another week of running:
Update 2/3/21
Hi all, please review this draft PR: #2381
See #1574 (comment) Co-authored-by: hackerman <3372410+aeneasr@users.noreply.github.com>
We are happy to announce that
Hey, I noticed that … If not, I can create a PR for that.
A PR would be welcome for this! You can also first create an issue and lay out the plan for the delete query :) Also, there's an ongoing refactoring of the system; see #2540.
Is it fine to remove the hydra_oauth2_(authentication|consent)_request_handled records with was_used=true as well? What are the _request_handled tables used for?
There are lots of records in Hydra DB that get accumulated over time and are never cleaned up.
The flush API (as I tested with 1.0) can only clean up access tokens, as stated in the documentation, but not records in at least these tables:
We need, at minimum, an API to flush all obsolete records, if not an automatic housekeeping mechanism, which as I understand it isn't in place today for access tokens either.
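Such a housekeeping mechanism could, in principle, be a scheduled job that issues one cutoff-based delete per table. A minimal sketch; the TTL values and the two-column schema are hypothetical stand-ins, and a real deployment would derive TTLs from Hydra's ttl.* configuration rather than hard-coding them:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical TTL per table; real deployments would read these from config.
TTLS = {
    "hydra_oauth2_access": timedelta(hours=1),
    "hydra_oauth2_code": timedelta(minutes=10),
}

def housekeep(conn: sqlite3.Connection, now: datetime) -> dict:
    """Delete rows older than each table's TTL; returns rows removed per table."""
    removed = {}
    for table, ttl in TTLS.items():
        cutoff = (now - ttl).isoformat()
        # Table names cannot be bound as SQL parameters; they come from our own dict,
        # never from user input.
        cur = conn.execute(f"DELETE FROM {table} WHERE requested_at < ?", (cutoff,))
        removed[table] = cur.rowcount
    return removed

conn = sqlite3.connect(":memory:")
for table in TTLS:
    conn.execute(f"CREATE TABLE {table} (signature TEXT, requested_at TEXT)")
now = datetime.now(timezone.utc)
conn.execute("INSERT INTO hydra_oauth2_access VALUES ('a1', ?)",
             ((now - timedelta(days=2)).isoformat(),))   # past the 1-hour TTL
conn.execute("INSERT INTO hydra_oauth2_code VALUES ('c1', ?)",
             ((now - timedelta(minutes=1)).isoformat(),))  # still fresh

removed = housekeep(conn, now)
```

Running such a job on a timer (cron, Kubernetes CronJob, etc.) would approximate the automatic housekeeping requested here.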