fix: improve delete queries for janitor command #2540

Merged
aeneasr merged 17 commits into ory:master from janitor-slow-queries on Aug 4, 2021

Contributor

@flavioleggio flavioleggio commented May 20, 2021

Related issue

#2513
@aeneasr @Benehiko

Proposed changes

Improve delete queries by separating the data extraction from the actual delete.
Extraction uses a configurable limit, set with the --limit flag; the deletes then run in batch mode with a configurable batch size, set with the --batch-size flag.
I chose default values of 100000 records for the limit and 100 records for the batch size.
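
A minimal Go sketch of the select-then-batch-delete flow described above. The names `chunk`, `cleanup`, `selectExpired` and `deleteBatch` are hypothetical and not taken from the Hydra code base; an in-memory slice stands in for the database table.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most batchSize elements.
// It returns nil when batchSize is not positive.
func chunk(ids []string, batchSize int) [][]string {
	if batchSize <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// cleanup first extracts up to limit IDs (hypothetical stand-in for a
// SELECT ... LIMIT query), then deletes them batch by batch (stand-in for
// DELETE ... WHERE id IN (...)). It returns the number of deleted records.
func cleanup(limit, batchSize int, selectExpired func(limit int) []string, deleteBatch func(ids []string) error) (int, error) {
	deleted := 0
	for _, batch := range chunk(selectExpired(limit), batchSize) {
		if err := deleteBatch(batch); err != nil {
			return deleted, err
		}
		deleted += len(batch)
	}
	return deleted, nil
}

func main() {
	// In-memory stand-in for a table holding seven expired records.
	expired := []string{"r1", "r2", "r3", "r4", "r5", "r6", "r7"}
	selectExpired := func(limit int) []string {
		if limit > len(expired) {
			limit = len(expired)
		}
		return expired[:limit]
	}
	batches := 0
	deleteBatch := func(ids []string) error {
		batches++
		fmt.Printf("batch %d: deleting %v\n", batches, ids)
		return nil
	}
	// With limit 5 and batch size 2, five records are deleted in three batches.
	deleted, err := cleanup(5, 2, selectExpired, deleteBatch)
	fmt.Println(deleted, err)
}
```

Keeping the extraction and the deletes as separate steps means each DELETE touches only a small, known set of rows, which is the motivation given in this PR.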

Checklist

  • I have read the contributing guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got green light (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further comments

I chose to remove the transaction in cleanup for login and consent requests, which I think could be part of the problem with table locking.

@CLAassistant

CLAassistant commented May 20, 2021

CLA assistant check
All committers have signed the CLA.

Contributor

@Benehiko Benehiko left a comment

I think this is looking good! :)
Great job!

cmd/janitor.go Outdated
@@ -52,6 +52,8 @@ Janitor can be used in several ways.
RunE: cli.NewHandler().Janitor.RunE,
Args: cli.NewHandler().Janitor.Args,
}
cmd.Flags().Int(cli.Limit, 10000, "Limits the number of records retrieved from database for deletion.")
Contributor

So this limit here is 10K instead of 100k as suggested in the comment?

Contributor Author

Yes, my mistake. Should I fix the comment or change the default value in the config? What do you think about this?

Contributor

I think you can adjust the defaults to what has worked for you

Contributor Author

Actually, my limit was 1 million records, but the select alone takes an hour and the delete is a long process (though it affects neither database resources nor the service). I thought that a lower default would be more reasonable for common use cases.

Contributor

Then we stick with the lower limits :)

Contributor Author

Well then, we just have to decide between 10k and 100k.
In my database setup, a single 100-record batch delete for consent (the slowest table, I guess because of many cascade deletes) takes 1 second or so. With a batch size of 100, a default limit of 10k would take a total of 100 seconds, while a limit of 100k would take 1000 seconds (i.e. almost 17 minutes). We could stick with one of these (I would take the 100k limit, if I have to choose) or pick a value in the middle.

Contributor Author

@aeneasr what is your opinion on this?

@flavioleggio flavioleggio changed the title perf(janitor): improve delete queries perf: improve delete queries May 20, 2021
@flavioleggio flavioleggio changed the title perf: improve delete queries perf: improve delete queries for janitor command May 20, 2021
Improve delete queries by separating the data extraction from the actual delete.
Extraction is made with a configurable limit, using the --limit flag, then deletes are made in batch mode with a configurable batch size, using the --batch-size flag.
Default value for limit is 100000 records and default value for batch size is 100 records.

This uses LEFT JOIN to also select login and consent requests which
did not result in a complete authentication, i.e. user requested login
but timed out or user logged in and timed out at consent.

This splits the extraction of login and consent requests eligible for
deletion into two independent SELECTs. This solves a bug in the single SELECT
causing deletion of consent requests where matching login requests were
eligible for deletion and vice versa. With independent SELECTs we keep
consent requests even if the matching login request gets deleted and vice
versa.
@flavioleggio flavioleggio changed the title perf: improve delete queries for janitor command fix: improve delete queries for janitor command May 21, 2021
This adds a check in the janitor command handler to ensure that the user is not
passing wrong values for the limit and batch flags. This also adds tests
for these command line arguments.
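
A minimal Go sketch of such a flag check. The function name `validateJanitorFlags` and the error text are hypothetical, and the sketch assumes that a "wrong" value means a limit or batch size that is not strictly positive; the actual handler may apply different rules.

```go
package main

import (
	"errors"
	"fmt"
)

// validateJanitorFlags is a hypothetical check. It assumes a "wrong" value is
// any limit or batch size that is not strictly positive.
func validateJanitorFlags(limit, batchSize int) error {
	if limit <= 0 || batchSize <= 0 {
		return errors.New("values for --limit and --batch-size must both be greater than 0")
	}
	return nil
}

func main() {
	cases := []struct{ limit, batchSize int }{
		{10000, 100}, // accepted
		{0, 100},     // rejected: limit is not positive
		{10000, -1},  // rejected: batch size is not positive
	}
	for _, c := range cases {
		fmt.Printf("limit=%d batch-size=%d -> %v\n", c.limit, c.batchSize, validateJanitorFlags(c.limit, c.batchSize))
	}
}
```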
@flavioleggio flavioleggio force-pushed the janitor-slow-queries branch 2 times, most recently from 2ab5910 to 86bda51 Compare May 21, 2021 22:22
Member

@aeneasr aeneasr left a comment

Great job @flavioleggio ! I have some ideas here :)

FROM %[1]s
LEFT JOIN %[2]s ON %[1]s.challenge = %[2]s.challenge
WHERE (
(%[2]s.challenge IS NULL)
Member

Can challenge be null here?

Contributor Author

Yes, LEFT JOIN selects records that match in both tables as well as records present only in the left table. In the latter case, the columns from the right table will be NULL, so that check should mean "give me unhandled requests".
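
To illustrate the point, here is a small in-memory Go analogue of `LEFT JOIN ... WHERE right.challenge IS NULL`. The function name `unhandled` and the sample challenges are hypothetical; the real query runs in SQL against the request tables.

```go
package main

import "fmt"

// unhandled returns the challenges present in requests but absent from
// handled, mirroring LEFT JOIN ... WHERE right.challenge IS NULL.
func unhandled(requests []string, handled map[string]bool) []string {
	var out []string
	for _, challenge := range requests {
		if !handled[challenge] {
			// No matching row on the right side: the joined columns
			// would be NULL in SQL, so this request was never handled.
			out = append(out, challenge)
		}
	}
	return out
}

func main() {
	requests := []string{"c1", "c2", "c3"}
	handled := map[string]bool{"c2": true}
	fmt.Println(unhandled(requests, handled)) // prints [c1 c3]
}
```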

Member

I see, thank you for the explanation. I'll keep this comment open so that I can re-check the SQL query when I'm back in the office to ensure nothing gets deleted that shouldn't be deleted.

Contributor Author

Did you have a chance to check the query?

@aeneasr aeneasr mentioned this pull request Jun 4, 2021
8 tasks
@flavioleggio flavioleggio marked this pull request as ready for review June 25, 2021 13:28
Contributor

@Benehiko Benehiko left a comment

I think this is looking quite good! Nice job @flavioleggio :)

I just have one request on the tests: maybe there should be a test that sets the limit and the batch size to confirm they actually work. For example, if we have 5 rows in the database and you limit to 2, then 3 rows should remain in the database, assuming the rows were all invalid/expired. Testing the batch size in increments isn't really possible (e.g. batch size 1 gives X, batch size 2 gives XY), so I think we should just have one test setting it to 0 and another setting it to an arbitrary number, to verify that 0 does not delete anything and that a number above 0 deletes the rows according to the limit size.
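
A rough Go sketch of the behaviour such a test would assert. The helper `purge` is hypothetical and works on an in-memory slice rather than a database; it treats a limit or batch size of 0 as "delete nothing", following the suggestion above, which is an assumption about the intended semantics.

```go
package main

import "fmt"

// purge removes up to limit rows from the front of rows, batchSize rows at a
// time, and returns the rows that remain. A non-positive limit or batch size
// deletes nothing.
func purge(rows []string, limit, batchSize int) []string {
	if limit <= 0 || batchSize <= 0 {
		return rows
	}
	if limit > len(rows) {
		limit = len(rows)
	}
	for deleted := 0; deleted < limit; {
		n := batchSize
		if n > limit-deleted {
			n = limit - deleted // last batch may be smaller
		}
		rows = rows[n:]
		deleted += n
	}
	return rows
}

func main() {
	rows := []string{"r1", "r2", "r3", "r4", "r5"} // all assumed expired
	for _, limit := range []int{0, 2, 10} {
		remaining := purge(rows, limit, 1)
		fmt.Printf("limit=%d: %d rows remain\n", limit, len(remaining))
	}
	// Prints:
	// limit=0: 5 rows remain
	// limit=2: 3 rows remain
	// limit=10: 0 rows remain
}
```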

@aeneasr
Member

aeneasr commented Jun 28, 2021

Let me know when I should take a final look :)

@aeneasr
Member

aeneasr commented Jul 14, 2021

While the PR is being worked on I will mark it as a draft. That declutters our review backlog :)

Once you're done with your changes and would like someone to review them, mark the PR as ready and request a review from one of the maintainers.

Thank you!

@aeneasr aeneasr marked this pull request as draft July 14, 2021 08:48
@flavioleggio flavioleggio force-pushed the janitor-slow-queries branch 2 times, most recently from c7f79c5 to 9103f42 Compare July 17, 2021 13:20
@flavioleggio
Contributor Author

flavioleggio commented Jul 17, 2021

Looks like there are problems with the dependency check with some transitive dependencies. The command states the following:

pkg:golang/github.com/gobuffalo/packr@1.22.0
1 known vulnerabilities affecting installed version

I tried a simple go mod tidy and also tried upgrading transitive dependencies, but that upgraded them all and caused even more errors. Can someone give me a pointer on this?

UPDATE
I upgraded only the indirect packr dependency, resulting in this line in go.mod:
github.com/gobuffalo/packr v1.30.1 // indirect
This made the dependency check job pass.

@codecov

codecov bot commented Jul 17, 2021

Codecov Report

Merging #2540 (12661f2) into master (a8675dd) will increase coverage by 0.26%.
The diff coverage is 91.46%.

@@            Coverage Diff             @@
##           master    #2540      +/-   ##
==========================================
+ Coverage   52.47%   52.73%   +0.26%     
==========================================
  Files         234      234              
  Lines       13905    13995      +90     
==========================================
+ Hits         7296     7380      +84     
- Misses       5986     5989       +3     
- Partials      623      626       +3     
Impacted Files                                 Coverage Δ
oauth2/handler.go                              67.01% <0.00%> (ø)
persistence/sql/persister_consent.go           78.40% <82.60%> (-2.06%) ⬇️
cmd/cli/handler_janitor.go                     77.90% <86.66%> (+2.90%) ⬆️
persistence/sql/persister_oauth2.go            82.86% <95.00%> (+2.15%) ⬆️
cmd/janitor.go                                 100.00% <100.00%> (ø)
internal/testhelpers/janitor_test_helper.go    100.00% <100.00%> (ø)
oauth2/fosite_store_helpers.go                 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@flavioleggio flavioleggio marked this pull request as ready for review July 17, 2021 13:58
Contributor

@Benehiko Benehiko left a comment

nice job! @flavioleggio
The changes make sense to me and from my side I don't have any major suggestions or change requests :)

As for the dependency issues, I'm not too sure. Maybe @aeneasr knows something about it?

@aeneasr
Member

aeneasr commented Jul 23, 2021

Nice, thank you! I will review it next week together with @Benehiko :)

Contributor

@Benehiko Benehiko left a comment

LGTM! :)

Member

@aeneasr aeneasr left a comment

Just have to verify one thing, otherwise this looks good! :)

@aeneasr aeneasr merged commit 6ea0bf8 into ory:master Aug 4, 2021
@aeneasr
Member

aeneasr commented Aug 4, 2021

Congratulations, thank you for the hard work @flavioleggio !!!

@flavioleggio flavioleggio deleted the janitor-slow-queries branch August 6, 2021 13:43
@aeneasr
Member

aeneasr commented Aug 17, 2021

@aeneasr
Member

aeneasr commented Feb 3, 2022

@flavioleggio would you be open to investigating the test flake? Unfortunately, it is causing almost 50% of runs to fail.

@flavioleggio
Contributor Author

I'll take a look this weekend, hopefully. If I can't find an answer on my own, I may ask for some help on the Slack chat, but I will be in touch.

@aeneasr
Member

aeneasr commented Feb 3, 2022

Thank you!!

@vinckr
Member

vinckr commented Feb 9, 2022

Hello @flavioleggio
Congrats on merging your first PR in Ory 🎉 !
Your contribution will soon be helping secure millions of identities around the globe 🌏.
Don't stress about the test flake, I am sure it will get sorted out.
As a small token of appreciation we send all our first time contributors a gift package to welcome them to the community.
Please drop me an email and I will forward you the form to claim your Ory swag!
