Improve performance of ListResources #23534

Merged: 6 commits, Mar 24, 2023

@rosstimothy rosstimothy commented Mar 23, 2023

Reduces latency of proto.AuthService/ListResources in a variety of ways:

  • 6f9ac02: Moves RBAC logging from DEBUG to TRACE
  • c71ef7b: Stores compiled *regexp.Regexp values in an LRU cache so that they can be reused during RBAC checks
  • ec0860c: Adds a GetLabel(key string) (value string, ok bool) method to types.ResourceWithLabels so that looking up a single key does not copy the entire label set
  • 78d0bef: Prevents loading an extra page from the cache to determine the next key in auth.ServerWithRoles.ListResources
  • 81b08ed: Modifies services.UnmarshalServer to unmarshal directly into a types.ServerV2 instead of first into a types.ResourceHeader to check that the version is types.V2

Comparison of BenchmarkListNodes from b1715a5 to ec0860c:

benchstat b1715a5.txt ec0860c.txt
goos: darwin
goarch: amd64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
             │ b1715a5.txt │             ec0860c.txt             │
             │   sec/op    │   sec/op     vs base                │
ListNodes-16   19.307 ± 5%   1.033 ± 14%  -94.65% (p=0.000 n=10)

             │  b1715a5.txt   │             ec0860c.txt              │
             │      B/op      │     B/op      vs base                │
ListNodes-16   11154.5Mi ± 0%   493.0Mi ± 0%  -95.58% (p=0.000 n=10)

             │  b1715a5.txt  │             ec0860c.txt             │
             │   allocs/op   │  allocs/op   vs base                │
ListNodes-16   112.067M ± 0%   8.341M ± 0%  -92.56% (p=0.000 n=10)


@fspmarshall fspmarshall left a comment


nit: the old CombineLabels strategy would perform the combination s.t. command labels took precedence over static labels (i.e. if a command label and a static label exist for the same key, only the command label would be observed). As implemented, these GetLabel methods are giving precedence to static labels. I actually prefer this strategy, but it would technically be a breaking change in RBAC behavior, so probably best to change it.
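
The precedence concern above can be illustrated with a minimal sketch. The `server` type and its fields are hypothetical stand-ins, not Teleport's actual types; the only point is that a single-key lookup has to consult command labels first to agree with a merge in which command labels overwrite static ones.

```go
package main

import "fmt"

// server is a hypothetical resource with two label sources.
type server struct {
	staticLabels  map[string]string
	commandLabels map[string]string // values produced by label commands
}

// combineLabels sketches the merge-based strategy: static labels are copied
// first, then command labels overwrite any duplicate keys.
func combineLabels(s server) map[string]string {
	out := make(map[string]string, len(s.staticLabels)+len(s.commandLabels))
	for k, v := range s.staticLabels {
		out[k] = v
	}
	for k, v := range s.commandLabels {
		out[k] = v
	}
	return out
}

// GetLabel checks command labels before static labels, so it returns the same
// value the merged map would, without allocating a new map.
func (s server) GetLabel(key string) (string, bool) {
	if v, ok := s.commandLabels[key]; ok {
		return v, true
	}
	v, ok := s.staticLabels[key]
	return v, ok
}

func main() {
	s := server{
		staticLabels:  map[string]string{"env": "static", "team": "infra"},
		commandLabels: map[string]string{"env": "dynamic"},
	}
	v, _ := s.GetLabel("env")
	fmt.Println(v, combineLabels(s)["env"]) // dynamic dynamic
}
```

If the lookup order in `GetLabel` were reversed, the two calls would disagree for the `env` key, which is the behavior change described in the comment.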

@rosstimothy rosstimothy force-pushed the tross/ls_bench branch 5 times, most recently from 64cf613 to ec0860c Compare March 24, 2023 13:00
@rosstimothy rosstimothy marked this pull request as ready for review March 24, 2023 13:52

@zmb3 zmb3 left a comment


nice work!


@strideynet strideynet left a comment


Great work, awesome to see benchmarks being used to ensure improvements are worthwhile.

@rosstimothy rosstimothy force-pushed the tross/ls_bench branch 2 times, most recently from fc4dc2b to 623322a Compare March 24, 2023 17:43
BenchmarkListNodes is twice as slow when RBAC logging is enabled.
By switching RBAC logging from debug to trace we can eliminate
the performance hit while still providing a way for users to opt
in to the behavior if they need to debug RBAC.

Profiles of the benchmark test revealed that the `regexp.Compile`
call within `utils.matchString` was the most CPU- and memory-intensive
portion of the tests. By leveraging an `lru.Cache` to intern the
compiled regular expressions, we avoid recompiling the same expression
on every match, which improves performance considerably.
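
The interning idea can be sketched with the standard library alone. This is not the `lru.Cache` used in the change; `regexCache` below is a hypothetical, minimal LRU built on `container/list`, shown only to illustrate compiling each expression once and reusing it.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
	"sync"
)

// entry pairs an expression with its compiled form.
type entry struct {
	expr string
	re   *regexp.Regexp
}

// regexCache is a small, mutex-guarded LRU of compiled expressions.
type regexCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
	compiles int // number of regexp.Compile calls, for illustration
}

func newRegexCache(capacity int) *regexCache {
	return &regexCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the cached *regexp.Regexp for expr, compiling it on a miss
// and evicting the least recently used entry when over capacity.
func (c *regexCache) get(expr string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[expr]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	c.compiles++
	c.items[expr] = c.order.PushFront(&entry{expr: expr, re: re})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).expr)
	}
	return re, nil
}

// matchString matches s against expr using the cached compiled expression.
func (c *regexCache) matchString(expr, s string) (bool, error) {
	re, err := c.get(expr)
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	cache := newRegexCache(128)
	matched := 0
	for i := 0; i < 1000; i++ {
		ok, err := cache.matchString(`^node-\d+$`, fmt.Sprintf("node-%d", i))
		if err != nil {
			panic(err)
		}
		if ok {
			matched++
		}
	}
	fmt.Println(matched, cache.compiles) // 1000 1
}
```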

Increases the request limit prior to loading the resources from
the cache so that we load enough items in a single page to determine
the start key of the next page.

Unmarshal directly into a `types.ServerV2` instead of first creating
a `types.ResourceHeader` to inspect the version. There is only a
single version of `types.ServerV2`, making the check unnecessary.
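
The saving comes from parsing the payload once instead of twice. The following sketch uses `encoding/json` with hypothetical `resourceHeader` and `serverV2` structs; the field layout is an assumption made for illustration and does not reproduce the real types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceHeader is a hypothetical header holding only kind and version.
type resourceHeader struct {
	Kind    string `json:"kind"`
	Version string `json:"version"`
}

// serverV2 is a hypothetical, simplified server resource.
type serverV2 struct {
	Kind     string `json:"kind"`
	Version  string `json:"version"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Addr     string `json:"addr"`
		Hostname string `json:"hostname"`
	} `json:"spec"`
}

// unmarshalTwoPass parses the payload twice: once to read the version,
// then again into the full type.
func unmarshalTwoPass(data []byte) (*serverV2, error) {
	var h resourceHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	if h.Version != "v2" {
		return nil, fmt.Errorf("unsupported version %q", h.Version)
	}
	var s serverV2
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// unmarshalDirect parses the payload once, which is sufficient when only a
// single version of the type exists.
func unmarshalDirect(data []byte) (*serverV2, error) {
	var s serverV2
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	data := []byte(`{"kind":"node","version":"v2","metadata":{"name":"n1"},"spec":{"addr":"10.0.0.1:3022","hostname":"web"}}`)
	a, errA := unmarshalTwoPass(data)
	b, errB := unmarshalDirect(data)
	if errA != nil || errB != nil {
		panic("unexpected unmarshal error")
	}
	fmt.Println(*a == *b) // true
}
```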

`GetAllLabels` can be overkill if one simply needs to look up the
value for a particular label. It creates a new `map[string]string`
and copies all of a resource's existing labels. RBAC decisions
driven by labels incurred the penalty of the copy each time access
was checked. The impact of the copy is much more noticeable when
a resource has several labels or really long strings in the key
or value.

By leveraging `GetLabel`, RBAC can avoid copying the labels altogether
and simply look up each label key when required.
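
As a rough sketch of the calling side, a label check can be written against a small lookup interface so that it never asks for the full map. The `labelGetter` interface, `node` type, and `matchesAll` function are hypothetical, and the exact-match comparison is a simplification of real role label matching.

```go
package main

import "fmt"

// labelGetter is a hypothetical interface exposing single-key lookup only.
type labelGetter interface {
	GetLabel(key string) (value string, ok bool)
}

// node is a hypothetical resource backed by a single label map.
type node struct {
	labels map[string]string
}

// GetAllLabels copies every label into a new map on each call.
func (n node) GetAllLabels() map[string]string {
	out := make(map[string]string, len(n.labels))
	for k, v := range n.labels {
		out[k] = v
	}
	return out
}

// GetLabel reads one key directly, with no allocation or copy.
func (n node) GetLabel(key string) (string, bool) {
	v, ok := n.labels[key]
	return v, ok
}

// matchesAll reports whether r carries every required label with the exact
// required value, touching only the keys it needs.
func matchesAll(required map[string]string, r labelGetter) bool {
	for k, want := range required {
		got, ok := r.GetLabel(k)
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	n := node{labels: map[string]string{"env": "prod", "region": "us-east-1", "team": "infra"}}
	fmt.Println(matchesAll(map[string]string{"env": "prod"}, n))    // true
	fmt.Println(matchesAll(map[string]string{"env": "staging"}, n)) // false
}
```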
@rosstimothy rosstimothy added this pull request to the merge queue Mar 24, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2023
@rosstimothy rosstimothy added this pull request to the merge queue Mar 24, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 24, 2023
@rosstimothy rosstimothy added this pull request to the merge queue Mar 24, 2023
Merged via the queue into master with commit 62fbd3f Mar 24, 2023
@rosstimothy rosstimothy deleted the tross/ls_bench branch March 24, 2023 19:11
rosstimothy added a commit that referenced this pull request Mar 25, 2023
* Add benchmark for ListNodes

* Move RBAC logging to trace level

BenchmarkListNodes is twice as slow when RBAC logging is enabled.
By switching RBAC logging from debug to trace we can eliminate
the performance hit while still providing a way for users to opt
in to the behavior if they need to debug RBAC.

* Intern compiled regular expressions

Profiles of the benchmark test revealed that the `regexp.Compile`
call within `utils.matchString` was the most CPU- and memory-intensive
portion of the tests. By leveraging an `lru.Cache` to intern the
compiled regular expressions, we avoid recompiling the same expression
on every match, which improves performance considerably.

* Only fetch a single page of resources

Increases the request limit prior to loading the resources from
the cache so that we load enough items in a single page to determine
the start key of the next page.

* Remove version checking from `services.UnmarshalServer`

Unmarshal directly into a `types.ServerV2` instead of first creating
a `types.ResourceHeader` to inspect the version. There is only a
single version of `types.ServerV2`, making the check unnecessary.

* Add `GetLabel` to `types.ResourceWithLabels`

`GetAllLabels` can be overkill if one simply needs to look up the
value for a particular label. It creates a new `map[string]string`
and copies all of a resource's existing labels. RBAC decisions
driven by labels incurred the penalty of the copy each time access
was checked. The impact of the copy is much more noticeable when
a resource has several labels or really long strings in the key
or value.

By leveraging `GetLabel`, RBAC can avoid copying the labels altogether
and simply look up each label key when required.
rosstimothy added a commit that referenced this pull request Mar 25, 2023
rosstimothy added a commit that referenced this pull request Mar 25, 2023
r0mant pushed a commit that referenced this pull request Mar 28, 2023
nklaassen pushed a commit that referenced this pull request Mar 28, 2023
espadolini pushed a commit that referenced this pull request Mar 28, 2023

4 participants