Reduced resource consumption after Auth restart #25235

Closed
rosstimothy opened this issue Apr 26, 2023 · 5 comments · Fixed by #25622
Labels: bug, robustness (Resistance to crashes and reliability), scale (Changes required to achieve 100K nodes per cluster), test-plan-problem (Issues which have been surfaced by running the manual release test plan)

Comments

@rosstimothy
Contributor

  1. Start Auth/Proxy
  2. Join several nodes at once (Screenshot below was with 10k nodes)
  3. Wait for all nodes to successfully join
  4. Wait a bit for the cluster to reach steady state
  5. Stop Auth
  6. Wait for all nodes to reach degraded state
  7. Start Auth again
  8. Observe that the Proxy consumes less memory, has fewer goroutines, and is handling fewer tunnel connections than after all the nodes joined and the cluster reached steady state

[Screenshot, 2023-04-25: Proxy metrics from the 10k-node run described above]

Profiles taken from steps 4, 5, and 7:
profiles.zip
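
As a rough illustration (not necessarily how profiles.zip was captured), profiles like these can be pulled over HTTP if the Proxy exposes the standard net/http/pprof handlers on a diagnostics address; the address and output file names below are placeholders:

```go
// Sketch: download goroutine and heap profiles at one of the test steps so the
// steady-state and post-restart snapshots can be compared with `go tool pprof`.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func fetchProfile(diagAddr, profile, outFile string) error {
	// Standard net/http/pprof path; the default response is the binary
	// format that `go tool pprof` understands.
	url := fmt.Sprintf("http://%s/debug/pprof/%s", diagAddr, profile)
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	f, err := os.Create(outFile)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// "127.0.0.1:3000" is a placeholder diagnostics address.
	for _, p := range []string{"goroutine", "heap"} {
		if err := fetchProfile("127.0.0.1:3000", p, "step4-"+p+".pprof"); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
}
```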

rosstimothy added the bug, scale (Changes required to achieve 100K nodes per cluster), and robustness (Resistance to crashes and reliability) labels on Apr 26, 2023
@rosstimothy
Contributor Author

I tested the same scenarios on v12.0.1 and v12.3.0

v12.0.1

[Screenshot: v12.0.1 auth scale-down test]

v12.3.0

[Screenshot: v12.3.0 auth scale-down test]

zmb3 added the test-plan-problem (Issues which have been surfaced by running the manual release test plan) label on May 2, 2023
@rosstimothy
Contributor Author

v12.1.0

[Screenshot: v12.1.0 auth scale-down test]

v12.1.1

[Screenshot: v12.1.1 auth scale-down test]

v12.1.5

[Screenshot: v12.1.5 auth scale-down test]

v12.2.1

[Screenshot: v12.2.1 auth scale-down test]

@zmb3
Collaborator

zmb3 commented May 3, 2023

Based on the above, looks like the issue was introduced after 12.1.1 and before 12.1.5.

Nothing super obvious in the diff, but here are a few things worth looking at:

@rosstimothy
Contributor Author

A git bisect between 12.1.1 and 12.1.5 revealed that the increased number of connections is a result of #23377

rosstimothy added a commit that referenced this issue May 4, 2023
The Okta client was creating a separate connection to Auth in
`auth.NewClient` instead of reusing the gRPC connection of the
api client. Since this method is used by every Teleport instance
when establishing its initial connection to Auth, it poses two
problems:
 1) Auth now has one additional connection per instance
 2) Proxies require an additional transport channel per tunnel

Fixes #25235
github-actions bot and rosstimothy referenced this issue in four additional commits with the same message on May 4, 2023.
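
The shape of the fix, sketched below in plain gRPC terms rather than Teleport's actual API (the Okta client name and the Auth address are placeholders): dial Auth once and build any additional service clients on top of the api client's existing `*grpc.ClientConn`, instead of dialing a second connection per instance.

```go
// Minimal sketch of the connection-reuse pattern described in the commit
// message; identifiers and the address are illustrative, not Teleport's API.
package main

import (
	"fmt"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Problematic shape: a second grpc.Dial per instance means one extra
	// connection held open by Auth and one extra transport channel per
	// Proxy tunnel:
	//
	//   apiConn, _ := grpc.Dial(authAddr, opts...)
	//   oktaConn, _ := grpc.Dial(authAddr, opts...) // extra connection
	//
	// Fixed shape: dial once and hand the same conn to every service client.
	conn, err := grpc.Dial("auth.example.com:3025", // placeholder address
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// okta := oktapb.NewOktaServiceClient(conn) // hypothetical: shares conn
	fmt.Println("all service clients share:", conn.Target())
}
```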
@zmb3
Collaborator

zmb3 commented Jul 25, 2023

For posterity: the fix for this was released in 12.3.3.
