Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor hostname resolution for SSH connections via the WebUI #35773

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

rosstimothy
Copy link
Contributor

@rosstimothy rosstimothy commented Dec 15, 2023

Previously all hostname resolution was performed by sending either the hostname or server uuid to Auth in a GetSSHTargets request. The returned information was then sent to the UI so that the console tab can be updated to include user@host. While this worked, it required a round trip to Auth that had to iterate through the entire set of nodes to do the resolution. During a concurrent session test that tried to spawn several thousand web sessions it was discovered that this causes a massive spike in CPU on Auth.

There are two reasons that Auth was used to resolve the hostname.

  1. The proxy doesn't perform RBAC on behalf of the user. Instead the proxy has an Auth client with the users identity.
  2. The UI used to provide an input box that users could manually enter the target host.

Since we no longer allow open dialing and the input box to enter a connection string manually has been removed we no longer need to worry about the second case. To avoid the round trip to Auth we can use the local Proxy cache to look up the hostname if we wait until AFTER the ssh connection to the target is established. At this point we can be sure that the user does have access to the target since the node allowed the connection.

@rosstimothy rosstimothy added the no-changelog Indicates that a PR does not require a changelog entry label Dec 15, 2023
@rosstimothy rosstimothy force-pushed the tross/resolve_hostname_locally branch 6 times, most recently from 821af64 to 650a804 Compare December 22, 2023 13:30
@rosstimothy rosstimothy force-pushed the tross/resolve_hostname_locally branch from 650a804 to 8cede18 Compare January 2, 2024 21:20
@rosstimothy rosstimothy marked this pull request as ready for review January 3, 2024 20:26
@github-actions github-actions bot requested review from Tener and zmb3 January 3, 2024 20:27
lib/web/terminal.go Outdated Show resolved Hide resolved
lib/web/terminal.go Outdated Show resolved Hide resolved
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from Tener January 3, 2024 20:58
Previously all hostname resolution was performed by sending either
the hostname or server uuid to Auth in a GetSSHTargets request.
The returned information was then sent to the UI so that the console
tab can be updated to include `user@host`. While this worked, it
required a round trip to Auth that had to iterate through the entire
set of nodes to do the resolution. During a concurrent session test
that tried to spawn several thousand web sessions it was discovered
that this cause a massive spike in CPU on Auth.

There are two reasons that Auth was used to resolve the hostname.
 1) The proxy doesn't perform RBAC on behalf of the user. Instead
    the proxy has an Auth client with the users identity.
 2) The UI used to provide an input box that users could manually
    enter the target host.

Since we no longer allow open dialing and the input box to enter
a connection string manually has been removed we no longer need
to worry about the second case. To avoid the round trip to auth
we can use the local Proxy cache to look up the hostname if we
wait until AFTER the ssh connection to the target is established.
At this point we can be sure that the user does have access to
the target since the node allowed the connection.
The testing utilities used to mimic the web ui half of the websocket
made assumptions that the session metadata would be the first message
provided. However, now that hostname resolution occurs after the
initial SSH connection has been established it is possible for other
messages to be sent first. This caused some tests to fail and others
to hang forever.

The duplicated terminal logic from (WebSuite) makeTerminal and
(testProxy makeTerminal has now consolidated into a single test
terminal utility. The new `terminal` now wraps a TerminalStream
which allows test code to make use of the message processing logic
already in place there instead of having to write custom logic
per test. All a test needs to do is provide handlers for any messages
that it desires to introspect.
@rosstimothy rosstimothy force-pushed the tross/resolve_hostname_locally branch from 8cede18 to c4df6d1 Compare January 3, 2024 21:08
@zmb3
Copy link
Collaborator

zmb3 commented Jan 3, 2024

/excludeflake *

@rosstimothy rosstimothy added this pull request to the merge queue Jan 3, 2024
Merged via the queue into master with commit ce4f415 Jan 3, 2024
33 checks passed
@rosstimothy rosstimothy deleted the tross/resolve_hostname_locally branch January 3, 2024 21:56
Envek added a commit to Envek/teleport that referenced this pull request Jan 4, 2024
…se-anon-key

* origin/master: (344 commits)
  Undelete CreateHostUserMode_HOST_USER_MODE_DROP (gravitational#36273)
  allow cwd to be changed in difftest (gravitational#35946)
  Auth device list component (gravitational#36235)
  make unified resources responsive (gravitational#35961)
  Support running Teleport in a "hot reload" mode (gravitational#35040)
  Prevent deleting enum values, allow deleting enum reservations in types.proto (gravitational#36248)
  Remove support for legacy (Amazon Linux 2) AMIs (gravitational#36153)
  Bump version(s) used for teleport-lab and teleport-quickstart (gravitational#36167)
  Allow Reconciler update handler to examine old value during update (gravitational#36171)
  Validate the user still exists during account reset (gravitational#35676)
  ButtonTextWithAddIcon shared component (gravitational#36103)
  Refactor hostname resolution for SSH connections via the WebUI (gravitational#35773)
  add structuredClone to jest JSDOMEnvironment (gravitational#36213)
  fix flaky `lib/auth` cache-enabled tests (gravitational#36216)
  Report resource usage counts by handling heartbeat events (gravitational#35968)
  Reviewer bot should use the stable version of Go (gravitational#36242)
  RFD 0153 Resource Guidelines (gravitational#34103)
  Use cmp and cmpots properly in operator tests (gravitational#36215)
  Relax Kubernetes CRD discovery when building cache (gravitational#36214)
  Add Access List messages to TAG protobuf (gravitational#36176)
  ...
rosstimothy added a commit that referenced this pull request Feb 16, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
rosstimothy added a commit that referenced this pull request Feb 16, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
rosstimothy added a commit that referenced this pull request Feb 16, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
github-merge-queue bot pushed a commit that referenced this pull request Feb 20, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
github-actions bot pushed a commit that referenced this pull request Feb 20, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2024
The changes to hostname resolution introduced in #35773 were always
using the root cache to lookup a matching node, which is never
going to find a match if the target is in the leaf cluster. This
is now resolved by using the `services.NodeWatcher` of the cluster
that the target node is a member of.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/sm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants