Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: faster regional queries #1171

Merged
merged 18 commits into from
Jan 17, 2023
Merged

feat: faster regional queries #1171

merged 18 commits into from
Jan 17, 2023

Conversation

hperl
Copy link
Collaborator

@hperl hperl commented Jan 3, 2023

This PR accelerates the database queries when running on a globally-distributed Cockroach cluster.

Related issue(s)

Fixes #1167
Relates to https://github.com/ory-corp/cloud/issues/3668
Fixes https://github.com/ory-corp/cloud/issues/3680

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security vulnerability, I
    confirm that I got the approval (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further Comments

Further improvements

  • Optimize subject-set expand queries: multiple hops in one query
  • Optimize computed userset queries: multiple hops in one query

Query fusing in the check eninge

During one check, we currently execute a lot of queries in parallel that could be fused into one. For example, take query for a namespace with $n$ computed userset rewrites. These currently result in $2\cdot n$ queries (one for the direct access, and one for the subject set expansion, respectively).

Ideally, we want to combine the queries to have one of

  • Either one single query for each of the traversals:
    • subject set expansion
    • computed userset rewrites
    • tuple to userset rewrites
  • or one big query combining the above.

In both cases, the queries should work on sets of starting tuples, so that multiple paths can be explored in parallel by the database. Of course, you always start with a single node (the query), but that might lead to multiple intermediate nodes that should be checked in parallel, so that each successive query expands the frontier of the traversed graph until either a tuple is found in the DB or the maximum depth is reached.

Traversal results

In order to be compatible with each other, all graph traversal queries are written such that they return the exact same columns and can be scanned into a common TraversalResult.

type (
	TraversalResult struct {
		From  RelationTuple
		To    RelationTuple
		Via   Traversal
		Found bool
	}

	Traversal string
)

Before/after the computed userset optimization:

name                                 old time/op     new time/op     delta
ComputedUsersets/Computed_userset-8      583µs ± 0%       81µs ± 0%   ~     (p=1.000 n=1+1)

name                                 old queries/op  new queries/op  delta
ComputedUsersets/Computed_userset-8       9.00 ± 0%       1.00 ± 0%   ~     (p=1.000 n=1+1)

name                                 old alloc/op    new alloc/op    delta
ComputedUsersets/Computed_userset-8      109kB ± 0%       18kB ± 0%   ~     (p=1.000 n=1+1)

name                                 old allocs/op   new allocs/op   delta
ComputedUsersets/Computed_userset-8      1.56k ± 0%      0.36k ± 0%   ~     (p=1.000 n=1+1)

Keto SQL optimizations

@hperl hperl requested a review from zepatrik as a code owner January 3, 2023 08:42
@hperl hperl changed the title hperl/faster-regional-queries feat: faster regional queries Jan 3, 2023
@hperl hperl self-assigned this Jan 3, 2023
@hperl hperl marked this pull request as draft January 3, 2023 08:43
@aeneasr
Copy link
Member

aeneasr commented Jan 3, 2023

Can you run the formatter fo rthe new year in another pr and then rebase? makes it easier to review

@hperl hperl force-pushed the hperl/faster-regional-queries branch 4 times, most recently from e8f9fc7 to eaa4cf9 Compare January 6, 2023 14:20
Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looking pretty good already 👍

internal/persistence/sql/traverser.go Outdated Show resolved Hide resolved
@hperl hperl force-pushed the hperl/faster-regional-queries branch from d2e490a to b5b7c97 Compare January 9, 2023 17:02
@hperl
Copy link
Collaborator Author

hperl commented Jan 9, 2023

benchstat ./internal/check/2023-01-09-benchtest.txt ./benchtest.new.txt              
name                                 old time/op     new time/op     delta
ComputedUsersets/Computed_userset-8      583µs ± 0%      112µs ± 0%   ~     (p=1.000 n=1+1)

name                                 old queries/op  new queries/op  delta
ComputedUsersets/Computed_userset-8       9.00 ± 0%       2.00 ± 0%   ~     (p=1.000 n=1+1)

name                                 old alloc/op    new alloc/op    delta
ComputedUsersets/Computed_userset-8      109kB ± 0%       24kB ± 0%   ~     (p=1.000 n=1+1)

name                                 old allocs/op   new allocs/op   delta
ComputedUsersets/Computed_userset-8      1.56k ± 0%      0.47k ± 0%   ~     (p=1.000 n=1+1)

Queries went down from 9 to 2 already (and I have an idea about the last redundant one) for the common case of computed userset rewrite checks.

@hperl hperl force-pushed the hperl/faster-regional-queries branch 3 times, most recently from 4c39cf1 to 055f6ec Compare January 12, 2023 11:15
@hperl hperl marked this pull request as ready for review January 12, 2023 11:20
go.mod Outdated Show resolved Hide resolved
internal/check/bench_test.go Outdated Show resolved Hide resolved
@aeneasr
Copy link
Member

aeneasr commented Jan 12, 2023

Can you create a PR in cloud that uses this patch and demonstrates it for our queries?

@hperl
Copy link
Collaborator Author

hperl commented Jan 12, 2023

Can you create a PR in cloud that uses this patch and demonstrates it for our queries?

done :)

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell from our review, this looks good to me. Only a few minor things that should be addressed IMO.

I think it would be good to add some test:

  • self-referencing OPL definitions (this is the -1 bugfix)
  • think about potential edge cases for this new set up and write tests
  • create a regression check list in the cloud PR and test it locally

internal/persistence/sql/traverser.go Outdated Show resolved Hide resolved
internal/check/engine.go Show resolved Hide resolved
internal/check/rewrites.go Show resolved Hide resolved
@zepatrik zepatrik force-pushed the hperl/faster-regional-queries branch from 27757f4 to 32484d4 Compare January 17, 2023 14:59
@hperl hperl force-pushed the hperl/faster-regional-queries branch from 32484d4 to 736c9bd Compare January 17, 2023 15:25
@hperl hperl force-pushed the hperl/faster-regional-queries branch from 736c9bd to 93c8ed6 Compare January 17, 2023 15:26
@zepatrik zepatrik force-pushed the hperl/faster-regional-queries branch from 93c8ed6 to 5dbe21e Compare January 17, 2023 16:27
@zepatrik zepatrik force-pushed the hperl/faster-regional-queries branch from 5dbe21e to 3d1cbbc Compare January 17, 2023 16:28
@zepatrik zepatrik merged commit 8e07890 into master Jan 17, 2023
@zepatrik zepatrik deleted the hperl/faster-regional-queries branch January 17, 2023 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LIMIT argument is incorrectly set to 2 instead of 1
3 participants