Skip to content

Reduce resource consumption of ListUnifiedResources#53213

Merged
rosstimothy merged 1 commit intomasterfrom
tross/improve_unified_resources_perf
Mar 21, 2025
Merged

Reduce resource consumption of ListUnifiedResources#53213
rosstimothy merged 1 commit intomasterfrom
tross/improve_unified_resources_perf

Conversation

@rosstimothy
Copy link
Contributor

@rosstimothy rosstimothy commented Mar 20, 2025

The changes here were a result of examining CPU and memory profiles of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed from the authorizer and (types.Server) MatchSearch.

When the authorizer computes lock targets it will call (types.LockTarget) Equals, which was relying on proto.Equals to determine if nine strings were equivalent. The fields are now individually checked and a test was added to ensure that if any new fields are added they must be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the user has read/list access to a particular resource kind once. However, it was always checking if users had access to all supported resources instead of just the resources that the user was requesting. While most of the checks are not very expensive, checking all resources is still worse than only checking the required resources.

Lastly, the (types.Server) MatchSearch implementation is no longer allocating a slice of search fields and using types.MatchSearch. Instead a variant of the loop from types.MatchSearch is inlined. While this was only applied to types.Server it should likely be extended to other resource types in the future - or types.MatchSearch should be refactored in a way that does not require callers to allocate.

$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

Changelog: Improve resource consumption when retrieving resources via the Web UI or tsh ls.

t.Run("equal", func(t *testing.T) {
require.True(t, (LockTarget{}).Equals(LockTarget{}), "empty targets equal")

for i, field := range reflect.VisibleFields(reflect.TypeOf(LockTarget{})) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a testhelper for this scramble behaviour would be super useful at somepoint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shamelessly stole this from TestLockTargetIsEmpty. If we are to continue hand rolling these functions I agree this could be made generic to cover other resources. Though perhaps we should also consider using goderive, or option (gogoproto.equal) = true; instead of hand rolling.

@rosstimothy
Copy link
Contributor Author

rosstimothy commented Mar 20, 2025

Flame graphs that inspired this change:
image
image

@rosstimothy rosstimothy marked this pull request as ready for review March 20, 2025 13:52
@github-actions github-actions bot requested review from Joerger and hugoShaka March 20, 2025 13:53
Comment on lines +6294 to +6296
types.ServerSpecV2{
Hostname: "node." + strconv.Itoa(i),
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is inconsequential, but as I was running the benchmarks I noticed that our invalid hostname detection was flagging these nodes.

Comment on lines +6411 to +6413
types.ServerSpecV2{
Hostname: "node." + strconv.Itoa(i),
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is inconsequential, but as I was running the benchmarks I noticed that our invalid hostname detection was flagging these nodes.

Outer:
for _, searchV := range values {
for key, value := range s.Metadata.Labels {
if containsFold(key, searchV) || containsFold(value, searchV) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're doing perf improvements, could we replace containsFold with github.com/charlievieth/strcase.Contains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison of benchmarks with containsFold vs github.com/charlievieth/strcase.Contains.

$ benchstat new.txt strcase.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   new.txt   │              strcase.txt            │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            173.3m ± 2%   178.2m ± 5%   +2.78% (p=0.011 n=10)
ListUnifiedResourcesFilter/predicate_path-12    210.8m ± 4%   214.9m ± 1%   +1.94% (p=0.023 n=10)
ListUnifiedResourcesFilter/predicate_index-12   209.0m ± 7%   211.1m ± 1%        ~ (p=0.529 n=10)
ListUnifiedResourcesFilter/search_lower-12      364.3m ± 1%   240.3m ± 3%  -34.03% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      423.4m ± 1%   278.1m ± 5%  -34.31% (p=0.000 n=10)
geomean                                         259.5m        222.1m       -14.43%

                                              │   new.txt    │               strcase.txt            │
                                              │     B/op     │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12            414.1Ki ± 0%   414.6Ki ± 0%        ~ (p=0.436 n=10)
ListUnifiedResourcesFilter/predicate_path-12    7.287Mi ± 0%   7.289Mi ± 0%        ~ (p=0.089 n=10)
ListUnifiedResourcesFilter/predicate_index-12   7.288Mi ± 0%   7.288Mi ± 0%        ~ (p=0.529 n=10)
ListUnifiedResourcesFilter/search_lower-12      486.8Ki ± 0%   427.8Ki ± 0%  -12.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      486.3Ki ± 0%   449.6Ki ± 0%   -7.55% (p=0.000 n=10)
geomean                                         1.371Mi        1.316Mi        -4.04%

                                              │   new.txt   │              strcase.txt           │
                                              │  allocs/op  │  allocs/op   vs base               │
ListUnifiedResourcesFilter/labels-12            4.865k ± 0%   4.869k ± 0%       ~ (p=0.670 n=10)
ListUnifiedResourcesFilter/predicate_path-12    155.0k ± 0%   155.0k ± 0%       ~ (p=0.643 n=10)
ListUnifiedResourcesFilter/predicate_index-12   155.0k ± 0%   155.0k ± 0%       ~ (p=0.468 n=10)
ListUnifiedResourcesFilter/search_lower-12      5.277k ± 0%   4.949k ± 0%  -6.22% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      5.275k ± 0%   5.076k ± 0%  -3.77% (p=0.000 n=10)
geomean                                         20.07k        19.66k       -2.02%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find E!

@rosstimothy rosstimothy requested a review from espadolini March 20, 2025 20:07
@rosstimothy rosstimothy force-pushed the tross/improve_unified_resources_perf branch from 479534f to df17553 Compare March 20, 2025 20:13
@rosstimothy rosstimothy force-pushed the tross/improve_unified_resources_perf branch 2 times, most recently from c2da40c to b91364e Compare March 21, 2025 18:20
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
@rosstimothy rosstimothy force-pushed the tross/improve_unified_resources_perf branch from b91364e to 8834392 Compare March 21, 2025 19:00
@rosstimothy rosstimothy enabled auto-merge March 21, 2025 19:15
@rosstimothy rosstimothy added this pull request to the merge queue Mar 21, 2025
Merged via the queue into master with commit 6d13e96 Mar 21, 2025
43 checks passed
@rosstimothy rosstimothy deleted the tross/improve_unified_resources_perf branch March 21, 2025 19:42
@backport-bot-workflows
Copy link
Contributor

@rosstimothy See the table below for backport results.

Branch Result
branch/v15 Failed
branch/v16 Failed
branch/v17 Failed

rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
rosstimothy added a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
github-merge-queue bot pushed a commit that referenced this pull request Mar 23, 2025
The changes here were a result of examining CPU and memory profiles
of BenchmarkListUnifiedResources. The biggest inefficiencies stemmed
from the authorizer and `(types.Server) MatchSearch`.

When the authorizer computes lock targets it will call `(types.LockTarget) Equals`,
which was relying on proto.Equals to determine if nine strings were
equivalent. The fields are now individually checked and a test
was added to ensure that if any new fields are added they must
be added to the check for the test to pass.

ListUnifiedResources attempts to be smart and evaluate whether the
user has read/list access to a particular resource kind once. However,
it was always checking if users had access to _all_ supported resources
instead of just the resources that the user was requesting. While most
of the checks are not very expensive, checking all resources is still
worse than only checking the required resources.

Lastly, the `(types.Server) MatchSearch` implementation is no longer
allocating a slice of search fields and using `types.MatchSearch`.
Instead a variant of the loop from `types.MatchSearch` is inlined.
While this was only applied to `types.Server` it should likely
be extended to other resource types in the future - or `types.MatchSearch`
should be refactored in a way that does not require callers to allocate.

```bash
$ benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Apple M2 Pro
                                              │   old.txt   │               new.txt               │
                                              │   sec/op    │   sec/op     vs base                │
ListUnifiedResourcesFilter/labels-12            195.0m ± 6%   173.3m ± 2%  -11.12% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12    230.6m ± 3%   210.8m ± 4%   -8.57% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12   229.8m ± 5%   209.0m ± 7%   -9.08% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      600.6m ± 3%   364.3m ± 1%  -39.35% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      687.0m ± 2%   423.4m ± 1%  -38.37% (p=0.000 n=10)
geomean                                         335.7m        259.5m       -22.69%

                                              │    old.txt     │               new.txt                │
                                              │      B/op      │     B/op      vs base                │
ListUnifiedResourcesFilter/labels-12              868.8Ki ± 0%   414.1Ki ± 0%  -52.34% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      7.731Mi ± 0%   7.287Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     7.731Mi ± 0%   7.288Mi ± 0%   -5.74% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      22106.9Ki ± 0%   486.8Ki ± 0%  -97.80% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      22107.0Ki ± 0%   486.3Ki ± 0%  -97.80% (p=0.000 n=10)
geomean                                           7.494Mi        1.371Mi       -81.70%

                                              │    old.txt    │               new.txt               │
                                              │   allocs/op   │  allocs/op   vs base                │
ListUnifiedResourcesFilter/labels-12             11.401k ± 0%   4.865k ± 0%  -57.32% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_path-12      161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/predicate_index-12     161.5k ± 0%   155.0k ± 0%   -4.04% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_lower-12      162.212k ± 0%   5.277k ± 0%  -96.75% (p=0.000 n=10)
ListUnifiedResourcesFilter/search_upper-12      162.224k ± 0%   5.275k ± 0%  -96.75% (p=0.000 n=10)
geomean                                           95.22k        20.07k       -78.93%

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants