Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add index on requested_at for refresh tokens and use it in janitor #3516

Merged
merged 2 commits into from May 24, 2023
Merged

fix: add index on requested_at for refresh tokens and use it in janitor #3516

merged 2 commits into from May 24, 2023

Conversation

sgal
Copy link
Contributor

@sgal sgal commented May 12, 2023

Related issue(s)

Inspired by #3115

This change addresses the performance issue with Hydra Janitor, which makes the cleanups extremely slow due to inefficient query that leads to a full table scan.

The approach here is taken from the issue above. Adding an index on the requested_at field and ordering by it in Janitor avoids full table scan and improves the performance of the cleanups.

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got the approval (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further Comments

When we run the following query:

EXPLAIN ANALYSE DELETE FROM hydra_oauth2_refresh WHERE signature in (
SELECT signature FROM (SELECT signature FROM hydra_oauth2_refresh hoa WHERE requested_at < now() - interval '720 hours' and nid = '<UUID>' ORDER BY signature LIMIT 100 ) as s
)

We get the following plan, with an execution time of 2.6 seconds. So 2,6 seconds to delete 100 refresh tokens. We can note here that there is a very high cost of the index scan with a lot of filtered rows.

"Delete on hydra_oauth2_refresh  (cost=97.31..955.25 rows=100 width=74) (actual time=2635.862..2635.864 rows=0 loops=1)"
"  ->  Nested Loop  (cost=97.31..955.25 rows=100 width=74) (actual time=2584.547..2586.199 rows=100 loops=1)"
"        ->  HashAggregate  (cost=96.75..97.75 rows=100 width=112) (actual time=2584.517..2584.619 rows=100 loops=1)"
"              Group Key: (s.signature)::text"
"              Batches: 1  Memory Usage: 48kB"
"              ->  Subquery Scan on s  (cost=0.56..96.50 rows=100 width=112) (actual time=277.987..2584.186 rows=100 loops=1)"
"                    ->  Limit  (cost=0.56..95.50 rows=100 width=44) (actual time=277.981..2583.939 rows=100 loops=1)"
"                          ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh hoa  (cost=0.56..2184604.46 rows=2300987 width=44) (actual time=277.979..2583.852 rows=100 loops=1)"
"                                Filter: ((nid = '<UUID>'::uuid) AND (requested_at < (now() - '720:00:00'::interval)))"
"                                Rows Removed by Filter: 1778803"
"        ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh  (cost=0.56..8.57 rows=1 width=50) (actual time=0.014..0.014 rows=1 loops=100)"
"              Index Cond: ((signature)::text = (s.signature)::text)"
"Planning Time: 1.830 ms"
"Execution Time: 2635.921 ms"

Just to confirm, we also checked the distribution of refresh tokens over the dates, but looks good:

"2023-04-15"	114043
"2023-04-14"	162217
"2023-04-13"	176243
"2023-04-12"	151112
"2023-04-11"	104758
"2023-04-10"	66714

To resolve the issue, we applied an index to the refresh token table:

CREATE INDEX hydra_oauth2_refresh_requested_at_idx ON hydra_oauth2_refresh (requested_at);

When we now try to run the cleanup query, and instead change the order by from signature to requested_at

EXPLAIN ANALYSE DELETE FROM hydra_oauth2_refresh WHERE signature in (
SELECT signature FROM (SELECT signature FROM hydra_oauth2_refresh hoa WHERE requested_at < now() - interval '720 hours' and nid = '<UUID>' ORDER BY requested_at LIMIT 100 ) as s
)

We get a much more healthy execution with 720 hours interval:

"Delete on hydra_oauth2_refresh  (cost=44.65..902.59 rows=100 width=74) (actual time=201.503..201.506 rows=0 loops=1)"
"  ->  Nested Loop  (cost=44.65..902.59 rows=100 width=74) (actual time=1.548..109.779 rows=100 loops=1)"
"        ->  HashAggregate  (cost=44.09..45.09 rows=100 width=112) (actual time=0.389..0.540 rows=100 loops=1)"
"              Group Key: (s.signature)::text"
"              Batches: 1  Memory Usage: 48kB"
"              ->  Subquery Scan on s  (cost=0.44..43.84 rows=100 width=112) (actual time=0.146..0.354 rows=100 loops=1)"
"                    ->  Limit  (cost=0.44..42.84 rows=100 width=52) (actual time=0.143..0.326 rows=100 loops=1)"
"                          ->  Index Scan using hydra_oauth2_refresh_requested_at_idx on hydra_oauth2_refresh hoa  (cost=0.44..976522.26 rows=2302988 width=52) (actual time=0.141..0.316 rows=100 loops=1)"
"                                Index Cond: (requested_at < (now() - '720:00:00'::interval))"
"                                Filter: (nid = '<UUID>'::uuid)"
"        ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh  (cost=0.56..8.57 rows=1 width=50) (actual time=1.089..1.089 rows=1 loops=100)"
"              Index Cond: ((signature)::text = (s.signature)::text)"
"Planning Time: 1.876 ms"
"Execution Time: 201.566 ms"

@sgal sgal requested a review from aeneasr as a code owner May 12, 2023 19:09
@codecov
Copy link

codecov bot commented May 12, 2023

Codecov Report

Merging #3516 (cfb9b01) into master (31b9e66) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head cfb9b01 differs from pull request most recent head eda1501. Consider uploading reports for the commit eda1501 to get more accurate results

@@            Coverage Diff             @@
##           master    #3516      +/-   ##
==========================================
- Coverage   76.89%   76.85%   -0.05%     
==========================================
  Files         124      124              
  Lines        9102     9175      +73     
==========================================
+ Hits         6999     7051      +52     
- Misses       1660     1673      +13     
- Partials      443      451       +8     
Impacted Files Coverage Δ
persistence/sql/persister_oauth2.go 82.19% <100.00%> (-0.76%) ⬇️

... and 1 file with indirect coverage changes

@aeneasr
Copy link
Member

aeneasr commented May 12, 2023

May I suggest adding nid as the index as well, given that it's part of the query in question?

@sgal
Copy link
Contributor Author

sgal commented May 12, 2023

@aeneasr Do you mean as a separate index or part of the requested_at one like below

CREATE INDEX hydra_oauth2_refresh_requested_at_idx ON hydra_oauth2_refresh (nid, requested_at);

The composite one was suggested by @arnolf here #3115 (comment)

@aeneasr
Copy link
Member

aeneasr commented May 12, 2023

Like below :)

@sgal
Copy link
Contributor Author

sgal commented May 16, 2023

@aeneasr Please have a look, I fixed the index.

@kmherrmann
Copy link
Contributor

@hperl @alnr can either of you review this please?

Copy link
Contributor

@hperl hperl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉

Thanks for the contribution!

@aeneasr aeneasr merged commit 5b8e712 into ory:master May 24, 2023
28 checks passed
@sgal sgal deleted the fix-refresh-janitororder-by-requested-at branch September 1, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants