Teleport 16 Test Plan #42118

r0mant · 2024-05-29T00:59:39Z

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh installation of the version to be released
as well as an upgrade of the previous version of Teleport.

User accounting @atburke

Verify that active interactive sessions are tracked in /var/run/utmp on Linux.
Verify that interactive sessions are logged in /var/log/wtmp on Linux.
Verify that failed logins are logged in /var/log/btmp on Linux.
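
One way to spot-check these three files is with the standard `who`, `last`, and `lastb` utilities, which read utmp, wtmp, and btmp respectively. The sketch below is only an illustration: `has_login` is a hypothetical helper name, `alice` is an assumed test login, and the `-w` flag assumes the util-linux versions of `last` and `lastb`.

```shell
# has_login USER: succeed if USER appears in the first column of the
# who/last/lastb listing read from stdin. (Hypothetical helper name.)
has_login() {
  awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}

# Example usage on a Linux node (reading btmp usually needs root):
#   who                      | has_login alice && echo "utmp: active session tracked"
#   last -w -f /var/log/wtmp | has_login alice && echo "wtmp: session logged"
#   sudo lastb -w            | has_login alice && echo "btmp: failed login logged"
```

The helper only matches the login name, so it is worth checking the timestamps by eye as well.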

Combinations @Joerger

For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.

Add an agentless Node in a local cluster.
- Connect using OpenSSH.
- Connect using Teleport.
- Connect using the Web UI.
- Remove the Node (but keep its custom CA in sshd config).
  - Verify that it fails to connect when using OpenSSH.
  - Verify that it fails to connect when using Teleport.
  - Verify that it fails to connect when using the Web UI.
Add a Teleport Node in a local cluster.
- Connect using OpenSSH.
- Connect using Teleport.
- Connect using the Web UI.
Add an agentless Node in a remote (leaf) cluster.
- Connect using OpenSSH from root cluster.
- Connect using Teleport from root cluster.
- Connect using the Web UI from root cluster.
- Remove the Node (but keep its custom CA in sshd config).
  - Verify that it fails to connect when using OpenSSH from root cluster.
  - Verify that it fails to connect when using Teleport from root cluster.
  - Verify that it fails to connect when using the Web UI from root cluster.
Add a Teleport Node in a remote (leaf) cluster.
- Connect using OpenSSH from root cluster.
- Connect using Teleport from root cluster.
- Connect using the Web UI from root cluster.
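
The 12 connection combinations above are the product of two cluster placements, two node types, and three clients. A minimal sketch that enumerates them as a checklist follows; the function name and the label strings are illustrative only, and the "remove the Node" checks are not covered.

```shell
# Print one line per combination: 2 clusters x 2 node types x 3 clients = 12.
list_session_combinations() {
  for cluster in local leaf; do
    for node in agentless teleport; do
      for client in openssh tsh web-ui; do
        printf '%s cluster, %s node, %s client\n' "$cluster" "$node" "$client"
      done
    done
  done
}
```

Running `list_session_combinations | wc -l` is a quick way to confirm that nothing was missed.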

Teleport with EKS/GKE @AntonAM

Deploy Teleport on a single EKS cluster
Deploy Teleport on two EKS clusters and connect them via the trusted cluster feature
Deploy Teleport Proxy outside GKE cluster fronting connections to it (use this script to generate a kubeconfig)
Deploy Teleport Proxy outside EKS cluster fronting connections to it (use this script to generate a kubeconfig)

Teleport with multiple Kubernetes clusters @tigrato

Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.

Kubernetes exec via WebSockets/SPDY @AntonAM

To control the use of WebSockets on the kubectl side, set the environment variable KUBECTL_REMOTE_COMMAND_WEBSOCKETS:
KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true kubectl -v 8 exec -n namespace podName -- /bin/bash --version
With the -v 8 logging level you should see X-Stream-Protocol-Version: v5.channel.k8s.io when kubectl is connected to Teleport over WebSockets.
To run these tests you need kubectl v1.29 or later, a Kubernetes cluster at v1.29 or earlier (which doesn't support WebSocket stream protocol v5),
and a cluster at v1.30 (which supports it by default). Access each cluster both through a kube agent and through kubeconfig.
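
The header check can be scripted rather than read off the log by eye. The sketch below assumes kubectl writes its `-v` debug output to stderr; `uses_websocket_v5` is a hypothetical helper name.

```shell
# uses_websocket_v5: succeed if the kubectl debug log on stdin contains the
# v5 stream protocol header named in this test plan.
uses_websocket_v5() {
  grep -q 'X-Stream-Protocol-Version: v5\.channel\.k8s\.io'
}

# Example usage (2>&1 because the debug log is assumed to go to stderr):
#   KUBECTL_REMOTE_COMMAND_WEBSOCKETS=true \
#     kubectl -v 8 exec -n namespace podName -- /bin/bash --version 2>&1 \
#     | uses_websocket_v5 && echo "connected over WebSockets"
```

Against the v1.29-or-earlier cluster the helper is expected to fail, since that cluster does not support stream protocol v5.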

Kubernetes auto-discovery @AntonAM

Kubernetes Secret Storage @AntonAM

Kubernetes Secret storage for Agent's Identity
- Install Teleport agent with a short-lived token
  - Validate that Teleport is installed as a Kubernetes StatefulSet
  - Restart the agent after token TTL expires to see if it reuses the same identity.
- Force cluster CA rotation

Kubernetes Pod RBAC @AntonAM

Teleport with FIPS mode @bl-nero

Perform trusted cluster, Web, and SSH sanity checks with all Teleport components deployed in FIPS mode.

ACME @bl-nero

Teleport can fetch a TLS certificate automatically using the ACME protocol.

Migrations @tigrato

Migrate trusted clusters
- Migrate auth server on main cluster, then rest of the servers on main cluster
  SSH should work for both main and old clusters
- Migrate auth server on remote cluster, then rest of the remote cluster
  SSH should work

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy, to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com

Teleport

# when connecting to an OpenSSH node, remember `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com

Teleport with SSO Providers

GitHub External SSO @capnspacehook

Teleport OSS
- GitHub organization without external SSO succeeds
- GitHub organization with external SSO fails
Teleport Enterprise
- GitHub organization without external SSO succeeds
- GitHub organization with external SSO succeeds

`tctl sso` family of commands @Tener

For help with setting up sso connectors, check out the [Quick GitHub/SAML/OIDC Setup Tips]

tctl sso configure helps to construct a valid connector definition:

tctl sso configure github ... creates valid connector definitions
tctl sso configure oidc ... creates valid connector definitions
tctl sso configure saml ... creates valid connector definitions

tctl sso test tests a provided connector definition, which can be loaded from
a file or piped in from tctl sso configure or tctl get --with-secrets. Valid
connectors are accepted; invalid ones are rejected with sensible error messages.

SSO login on remote host @atburke

SSO login on a remote host

tsh should be running on a remote host (e.g. over an SSH session) and use the
local browser to complete an SSO login. Run
tsh login --callback <remote.host>:<port> --bind-addr localhost:<port> --auth <auth>
on the remote host. Note that the --callback URL must be able to resolve to the
--bind-addr over HTTPS.

Teleport Plugins @EdwardDowling

Teleport Operator @hugoShaka

Test deploying a Teleport cluster with the teleport-cluster Helm chart and the operator enabled
Test deploying a standalone operator against Teleport Cloud
Test that operator can reconcile
- TeleportUser
- TeleportRole
- TeleportProvisionToken

AWS Node Joining @hugoShaka

Docs

On EC2 instance with ec2:DescribeInstances permissions for local account:
TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
On EC2 instance with any attached role:
TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
EC2 Join method in IoT mode with node and auth in different AWS accounts
IAM Join method in IoT mode with node and auth in different AWS accounts

Kubernetes Node Joining @hugoShaka

Join a Teleport node running in the same Kubernetes cluster via a Kubernetes in-cluster ProvisionToken
Join a tbot instance running in a different Kubernetes cluster than Teleport with a Kubernetes JWKS ProvisionToken

Azure Node Joining @marcoandredinis

Docs

Join a Teleport node running in an Azure VM

GCP Node Joining @marcoandredinis

Docs

Join a Teleport node running in a GCP VM.

Cloud Labels @atburke

Create an EC2 instance with tags in instance metadata enabled
and with tag foo: bar. Verify that a node running on the instance has label
aws/foo=bar.
Create an Azure VM with tag foo: bar. Verify that a node running on the
instance has label azure/foo=bar.
Create a GCP instance with the required permissions and with label foo: bar
and tag baz: quux. Verify that a node running on the instance has labels
gcp/label/foo=bar and gcp/tag/baz=quux.
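
The expected label names follow a simple prefix scheme, sketched below. `expected_cloud_label` is a hypothetical helper that only encodes the mappings listed above; it is not part of Teleport.

```shell
# expected_cloud_label PROVIDER KIND KEY VALUE
# Prints the node label this test plan expects for a cloud tag or label.
# KIND is "tag" or "label"; only GCP includes it in the label prefix.
expected_cloud_label() {
  provider=$1
  kind=$2
  key=$3
  value=$4
  case "$provider" in
    aws|azure) printf '%s/%s=%s\n' "$provider" "$key" "$value" ;;
    gcp)       printf 'gcp/%s/%s=%s\n' "$kind" "$key" "$value" ;;
    *)         echo "unknown provider: $provider" >&2; return 1 ;;
  esac
}
```

For example, `expected_cloud_label gcp tag baz quux` prints `gcp/tag/baz=quux`, which can then be compared with the labels shown for the node.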

Passwordless @codingllama

This feature has additional build requirements, so it should be tested with a
pre-release build (eg: https://cdn.teleport.dev/tsh-v16.0.0-alpha.2.pkg).

This section complements "Users -> Managing MFA devices". tsh binaries for
each operating system (Linux, macOS and Windows) must be tested separately for
FIDO2 items.

Device Trust @codingllama

Device Trust requires Teleport Enterprise.

This feature has additional build requirements, so it should be tested with a
pre-release build (eg: https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).

Client-side enrollment requires a signed tsh for macOS; make sure to use the
tsh binary from tsh.app.

Additionally, Device Trust Web requires Teleport Connect to be installed (device
authentication for the Web is handled by Connect).

A simple formula for testing device authorization is:

# Before enrollment.
# Replace with other kinds of access, as appropriate (db, kube, etc)
tsh ssh node-that-requires-device-trust
> ERROR: ssh: rejected: administratively prohibited (unauthorized device)

# Register/enroll the device.
tsh device enroll --current-device
tsh logout; tsh login

# After enrollment
tsh ssh node-that-requires-device-trust
> $

[enforcing-device-trust]: https://goteleport.com/docs/access-controls/device-trust/enforcing-device-trust/#app-access-support

Hardware Key Support @Joerger

Hardware Key Support is an Enterprise feature and is not available for OSS.

You will need a YubiKey 4.3+ to test this feature.

This feature has additional build requirements, so it should be tested with a pre-release build (eg: https://cdn.teleport.dev/teleport-ent-v16.0.0-alpha.2-linux-amd64-bin.tar.gz).

Server Access

This test should be carried out on Linux, macOS, and Windows.

Set auth_service.authentication.require_session_mfa: hardware_key_touch in your cluster auth settings and log in.

tsh login
- Prompts for YubiKey touch with message "Tap your YubiKey" (separate from normal MFA prompt).
Server Access tsh ssh
- Requires YubiKey to be connected
- Prompts for touch (if not cached)
Database Access: tsh proxy db --tunnel
- Requires YubiKey to be connected
- Prompts for touch (if not cached)

HSM Support @nklaassen

Docs

Run the full test suite with each HSM/KMS:

$ make run-etcd # in background shell
$
$ # test YubiHSM
$ yubihsm-connector -d # in a background shell
$ cat /etc/yubihsm_pkcs11.conf
# /etc/yubihsm_pkcs11.conf
connector = http://127.0.0.1:12345
debug
$ TELEPORT_TEST_YUBIHSM_PKCS11_PATH=/usr/local/lib/pkcs11/yubihsm_pkcs11.dylib TELEPORT_TEST_YUBIHSM_PIN=0001password YUBIHSM_PKCS11_CONF=/etc/yubihsm_pkcs11.conf go test ./lib/auth/keystore -v --count 1
$ TELEPORT_TEST_YUBIHSM_PKCS11_PATH=/usr/local/lib/pkcs11/yubihsm_pkcs11.dylib TELEPORT_TEST_YUBIHSM_PIN=0001password YUBIHSM_PKCS11_CONF=/etc/yubihsm_pkcs11.conf TELEPORT_ETCD_TEST=1 go test ./integration/hsm -v --count 1 --timeout 20m # this takes ~12 minutes
$
$ # test AWS KMS
$ # log in to AWS locally
$ AWS_ACCOUNT="$(aws sts get-caller-identity | jq -r '.Account')"
$ TELEPORT_TEST_AWS_KMS_ACCOUNT="${AWS_ACCOUNT}" TELEPORT_TEST_AWS_REGION=us-west-2 go test ./lib/auth/keystore -v --count 1
$ TELEPORT_TEST_AWS_KMS_ACCOUNT="${AWS_ACCOUNT}" TELEPORT_TEST_AWS_REGION=us-west-2 TELEPORT_ETCD_TEST=1 go test ./integration/hsm -v --count 1
$
$ # test AWS CloudHSM
$ # set up the CloudHSM cluster and run this on an EC2 that can reach it
$ TELEPORT_TEST_CLOUDHSM_PIN="<CU_username>:<CU_password>" go test ./lib/auth/keystore -v --count 1
$ TELEPORT_TEST_CLOUDHSM_PIN="<CU_username>:<CU_password>" TELEPORT_ETCD_TEST=1 go test ./integration/hsm -v --count 1
$
$ # test GCP KMS
$ # log in to GCP locally
$ TELEPORT_TEST_GCP_KMS_KEYRING=projects/<account>/locations/us-west3/keyRings/<keyring> go test ./lib/auth/keystore -v --count 1
$ TELEPORT_TEST_GCP_KMS_KEYRING=projects/<account>/locations/us-west3/keyRings/<keyring> TELEPORT_ETCD_TEST=1 go test ./integration/hsm -v --count 1

Moderated session @rosstimothy

Create two Teleport users, a moderator and a user. Configure Teleport roles to require that the moderator moderate the user's sessions. Use TELEPORT_HOME to tsh login as the user in one terminal, and the moderator in another.

Ensure the default terminationPolicy of terminate has not been changed.

For each of the following cases, create a moderated session with the user using tsh ssh and join this session with the moderator using tsh join --role moderator:

Ensure that Ctrl+C in the user terminal disconnects the moderator as the session has ended.
Ensure that Ctrl+C in the moderator terminal disconnects the moderator and terminates the user's session as the session no longer has a moderator.
Ensure that t in the moderator terminal terminates the session for all participants.

Performance @rosstimothy @fspmarshall @espadolini

Scaling Test

Scale up the number of nodes/clusters a few times for each configuration below.

Verify that there are no memory/goroutine/file descriptor leaks
Compare the baseline metrics with the previous release to determine if resource usage has increased
Restart all Auth instances and verify that all nodes/clusters reconnect

Perform reverse tunnel node scaling tests for all backend configurations:

etcd - 10k
DynamoDB - 10k
Firestore - 10k
Postgres - 10k

Perform the following additional scaling tests on DynamoDB:

10k direct dial nodes.
500 trusted clusters.

Soak Test

Run a 30-minute soak test directly against direct and tunnel nodes
and via label-based matching. Tests should be run against a Cloud
tenant.

tsh bench ssh --duration=30m user@direct-dial-node ls
tsh bench ssh --duration=30m user@reverse-tunnel-node ls
tsh bench ssh --duration=30m user@foo=bar ls
tsh bench ssh --duration=30m --random user@foo ls

Concurrent Session Test

Cluster with 1k reverse tunnel nodes

Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:

tsh bench web sessions --max=5000 user ls
tsh bench web sessions --max=5000 --web user ls

Verify that all 5000 sessions are able to be established.
Verify that tsh and the web UI are still functional.

Robustness

Connectivity Issues:

Verify that a lack of connectivity to Auth does not prevent access to
resources which do not require a moderated session and in async recording
mode from an already issued certificate.
Verify that a lack of connectivity to Auth prevents access to resources
which require a moderated session and in async recording mode from an already
issued certificate.
Verify that an open session is not terminated when all Auth instances
are restarted.

Teleport with Cloud Providers

AWS @camscale

Deploy Teleport to AWS. Using DynamoDB & S3
Deploy Teleport Enterprise to AWS. Using HA Setup https://goteleport.com/docs/deploy-a-cluster/deployments/aws-ha-autoscale-cluster-terraform/

GCP @marcoandredinis

Deploy Teleport to GCP. Using Cloud Firestore & Cloud Storage
Deploy Teleport to GKE (Google Kubernetes Engine).
Deploy Teleport Enterprise to GCP.

IBM @hugoShaka

Deploy Teleport to IBM Cloud. Using IBM Database for etcd & IBM Object Store
Deploy Teleport to IBM Cloud Kubernetes.
Deploy Teleport Enterprise to IBM Cloud.

Application Access @gabrielcorado

Database Access @greedy52

TLS Routing @greedy52

Verify that teleport proxy v2 configuration starts only a single listener for proxy service, in contrast with v1 configuration.
Given configuration: @GavinFrazar

version: v2
proxy_service:
  enabled: "yes"
  public_addr: ['root.example.com']
  web_listen_addr: 0.0.0.0:3080

There should be a total of three listeners, with only *:3080 for proxy service. Given the configuration above, 3022 and 3025 will be opened for other services.

lsof -i -P | grep teleport | grep LISTEN
  teleport  ...  TCP *:3022 (LISTEN)
  teleport  ...  TCP *:3025 (LISTEN)
  teleport  ...  TCP *:3080 (LISTEN) # <-- proxy service

In contrast for the same configuration with version v1, there should be additional ports 3023 and 3024.

lsof -i -P | grep teleport | grep LISTEN
  teleport  ...  TCP *:3022 (LISTEN)
  teleport  ...  TCP *:3025 (LISTEN)
  teleport  ...  TCP *:3023 (LISTEN) # <-- extra proxy service port
  teleport  ...  TCP *:3024 (LISTEN) # <-- extra proxy service port
  teleport  ...  TCP *:3080 (LISTEN) # <-- proxy service
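
Comparing the two listings by eye is error-prone, so the port sets can be extracted and compared as strings. This is a minimal sketch; `listening_ports` is a hypothetical helper, and it assumes the `lsof -i -P` layout shown above, where the listening address appears as a field ending in `:PORT`.

```shell
# listening_ports: read `lsof -i -P` output on stdin and print the sorted,
# de-duplicated ports that teleport is listening on, separated by spaces.
listening_ports() {
  awk '/teleport/ && /LISTEN/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /:[0-9]+$/) {
        n = split($i, parts, ":")
        print parts[n]
      }
    }
  }' | sort -n -u | paste -s -d ' ' -
}

# Example usage:
#   [ "$(lsof -i -P | listening_ports)" = "3022 3025 3080" ] && echo "v2: single proxy listener"
```

For the v1 configuration the expected string would be `3022 3023 3024 3025 3080`.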

Run Teleport Proxy in multiplex mode auth_service.proxy_listener_mode: "multiplex" @GavinFrazar
- Trusted cluster
  - Setup trusted clusters using single port setup web_proxy_addr == tunnel_addr
```
kind: trusted_cluster
spec:
  ...
  web_proxy_addr: root.example.com:443
  tunnel_addr: root.example.com:443
  ...
```
Database Access
- Verify that tsh db connect works through proxy running in multiplex mode
  - Postgres @Tener
  - MySQL @Tener
  - MariaDB @Tener
  - MongoDB @GavinFrazar
  - CockroachDB @Tener
  - Redis @greedy52
  - MSSQL @gabrielcorado
  - Snowflake @gabrielcorado
  - Elasticsearch. @Tener
  - OpenSearch. @Tener
  - Cassandra/ScyllaDB. @Tener
  - Oracle. @Tener
- Verify connecting to a database through TLS ALPN SNI local proxy tsh proxy db with a GUI client. @GavinFrazar
- Verify connecting to a database through Teleport Connect. @GavinFrazar
Application Access @GavinFrazar
- Verify app access through proxy running in multiplex mode
SSH Access @GavinFrazar
- Connect to an OpenSSH server through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" user@host.example.com
- Connect to an OpenSSH server on leaf-cluster through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" user@node.foo.com
- Verify tsh ssh access through proxy running in multiplex mode
Kubernetes access: @GavinFrazar
- Verify kubernetes access through proxy running in multiplex mode, using tsh
- Verify kubernetes access through Teleport Connect
Teleport Proxy single port multiplex mode behind L7 load balancer
- Agent can join through Proxy and maintain reverse tunnel @GavinFrazar
- tsh login and tctl @GavinFrazar
- SSH Access: tsh ssh and tsh config @GavinFrazar
- Database Access: tsh proxy db and tsh db connect @GavinFrazar
- Application Access: tsh proxy app and tsh aws @GavinFrazar
- Kubernetes Access: tsh proxy kube @GavinFrazar


r0mant · 2024-05-29T00:59:57Z

Desktop Access @probakowski @ibeckermayer

Binaries / OS compatibility

Verify that our software runs on the minimum supported OS versions as per
https://goteleport.com/docs/installation/#operating-system-support

Windows @ravicious

tsh runs on the minimum supported Windows version
Teleport Connect runs on the minimum supported Windows version

Azure offers virtual machines with the Windows 10 2016 LTSB image. This image runs on Windows 10
rev. 1607, which is the exact minimum Windows version that we support.

macOS @camscale

tsh runs on the minimum supported macOS version
tctl runs on the minimum supported macOS version
teleport runs on the minimum supported macOS version
tbot runs on the minimum supported macOS version
Teleport Connect runs on the minimum supported macOS version

Linux @camscale

tsh runs on the minimum supported Linux version
tctl runs on the minimum supported Linux version
teleport runs on the minimum supported Linux version
tbot runs on the minimum supported Linux version
Teleport Connect runs on the minimum supported Linux version

Machine ID @timothyb89

Verify you are able to create a new bot user with tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
- Directly connecting to the auth server
- Connecting to the auth server via the proxy reverse tunnel
Verify that after the renewal period (default 20m, but this can be reduced via configuration), newly generated certificates are placed in the destination directory
Verify that sending both SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated

With an SSH node registered to the Teleport cluster:

Verify you are able to connect to the SSH node using openssh with the generated ssh_config in the destination directory
Verify you are able to connect to the SSH node using tsh with the identity file in the destination directory

With a Postgres DB registered to the Teleport cluster:

Verify you are able to interact with a database using tbot db connect with a database output
Verify you are able to connect to the database using tbot proxy db with a database output
Verify you are able to produce an authenticated tunnel using tbot proxy db --tunnel with a database output and then able to connect to the database through the tunnel without credentials

With a Kubernetes cluster registered to the Teleport cluster:

Verify the kubeconfig produced by a Kubernetes output can be used to run basic commands (e.g. kubectl get pods)

With a HTTP application registered to the Teleport cluster:

Verify the certificates produced by an application output can be used directly against the proxy (e.g. curl --cert ./out/tlscert --key ./out/key https://httpbin.teleport.example.com/headers)
Verify you are able to produce an authenticated tunnel using tbot proxy app httpbin with an application output and then able to connect to the application through the tunnel without credentials curl localhost:port/headers

Host users creation @atburke

Host users creation docs
Host users creation RFD

Verify host users creation functionality
- non-existing users are created automatically
- users are added to groups
  - non-existing configured groups are created
- created users are added to the teleport-system group
- users are cleaned up after their session ends
- cleanup occurs if a program was left running after session ends
- sudoers file creation is successful
- Invalid sudoers files are not created
- existing host users are not modified
- setting disable_create_host_user: true stops user creation from occurring
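
Group membership for a created user can be checked from the output of `id -nG`. The sketch below assumes a test login named `alice`; `in_group` is a hypothetical helper name.

```shell
# in_group GROUP: succeed if GROUP appears as a whole word in the
# space-separated group list on stdin, as printed by `id -nG USER`.
in_group() {
  tr ' ' '\n' | grep -qxF -- "$1"
}

# Example usage on the node, during and after the session:
#   id -nG alice | in_group teleport-system && echo "in teleport-system"
#   getent passwd alice || echo "user has been cleaned up"
```

The match is exact, so a group such as `teleport-system2` does not count as `teleport-system`.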

CA rotations @fspmarshall

Verify the CA rotation functionality itself (by checking in the backend or with tctl get cert_authority)
- standby phase: only active_keys, no additional_trusted_keys
- init phase: active_keys and additional_trusted_keys
- update_clients and update_servers phases: the certs from the init phase are swapped
- standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
- rollback phase (second pass, after completing a regular rotation): same content as in the init phase
- standby phase after rollback: same content as in the previous standby phase
Verify functionality in all phases (clients might have to log in again in lieu of waiting for credentials to expire between phases)
- SSH session in tsh from a previous phase
- SSH session in web UI from a previous phase
- New SSH session with tsh
- New SSH session with web UI
- New SSH session in a child cluster on the same major version
- New SSH session in a child cluster on the previous major version
- New SSH session from a parent cluster
- Application access through a browser
- Application access through curl with tsh apps login
- kubectl get po after tsh kube login
- Database access (no configuration change should be necessary if the database CA isn't rotated, other Teleport functionality should not be affected if only the database CA is rotated)

Proxy Peering

Proxy Peering docs

Verify that Proxy Peering works for the following protocols:
- SSH @Joerger
- Kubernetes @AntonAM
- Database @greedy52
- Windows Desktop @ibeckermayer
- App Access @greedy52

SSH Connection Resumption @fspmarshall

Verify that SSH works, and that resumable SSH is not interrupted across a Teleport Cloud tenant upgrade.

| | Standard node | Non-resuming node | Peered node | Agentless node |
|---|---|---|---|---|
| `tsh ssh` | | | | |
| `tsh ssh --no-resume` | | | | |
| Teleport Connect | | | | |
| Web UI (not resuming) | | | | |
| OpenSSH (standard `tsh config`) | | | | |
| OpenSSH (changing `ProxyCommand` to `tsh proxy ssh --no-resume`) | | | | |

Verify that SSH works, and that resumable SSH is not interrupted across a control plane restart (of either the root or the leaf cluster).

| | Tunnel node | Direct dial node |
|---|---|---|
| `tsh ssh` | | |
| `tsh ssh --no-resume` | | |
| `tsh ssh` (from a root cluster) | | |
| `tsh ssh --no-resume` (from a root cluster) | | |
| OpenSSH (without `ProxyCommand`) | n/a | |
| OpenSSH's `ssh-keyscan` | n/a | |

EC2 Discovery @marcoandredinis

EC2 Discovery docs

Verify EC2 instance discovery
- Only EC2 instances matching given AWS tags have the installer executed on them
- Only the IAM permissions mentioned in the discovery docs are required for operation
- Custom scripts specified in different matchers are executed
- Custom SSM documents specified in different matchers are executed
- New EC2 instances with matching AWS tags are discovered and added to the teleport cluster
  - Large numbers of EC2 instances (51+) are all successfully added to the cluster
- Nodes that have been discovered do not have the install script run on the node multiple times

Azure Discovery @marcoandredinis

Azure Discovery docs

Verify Azure VM discovery
- Only Azure VMs matching given Azure tags have the installer executed on them
- Only the IAM permissions mentioned in the discovery docs are required for operation
- Custom scripts specified in different matchers are executed
- New Azure VMs with matching Azure tags are discovered and added to the teleport cluster
  - Large numbers of Azure VMs (51+) are all successfully added to the cluster
- Nodes that have been discovered do not have the install script run on the node multiple times

GCP Discovery @lxea

GCP Discovery docs

Verify GCP instance discovery
- Only GCP instances matching given GCP tags have the installer executed on them
- Only the IAM permissions mentioned in the discovery docs are required for operation
- Custom scripts specified in different matchers are executed
- New GCP instances with matching GCP tags are discovered and added to the teleport cluster
  - Large numbers of GCP instances (51+) are all successfully added to the cluster
- Nodes that have been discovered do not have the install script run on the node multiple times

IP Pinning @AntonAM

Add a role with pin_source_ip: true (requires Enterprise) to test IP pinning.
Testing will require changing your IP (that Teleport Proxy sees).
Docs: IP Pinning

Verify that it works for SSH Access
- You can access tunnel node with tsh ssh on root cluster
- You can access direct access node with tsh ssh on root cluster
- You can access tunnel node from Web UI on root cluster
- You can access direct access node from Web UI on root cluster
- You can access tunnel node with tsh ssh on leaf cluster
- You can access direct access node with tsh ssh on leaf cluster
- You can access tunnel node from Web UI on leaf cluster
- You can access direct access node from Web UI on leaf cluster
- You can download files from nodes in Web UI (small arrows at top left corner)
- If you change your IP you no longer can access nodes.
Verify that it works for Kube Access
- You can access Kubernetes cluster through standalone Kube service on root cluster
- You can access Kubernetes cluster through agent inside Kubernetes on root cluster
- You can access Kubernetes cluster through standalone Kube service on leaf cluster
- You can access Kubernetes cluster through agent inside Kubernetes on leaf cluster
- If you change your IP you no longer can access Kube clusters.
Verify that it works for DB Access
- You can access DB servers on root cluster
- You can access DB servers on leaf cluster
- If you change your IP you no longer can access DB servers.
Verify that it works for App Access
- You can access App service on root cluster
- You can access App service on leaf cluster
- If you change your IP you no longer can access App services.
Verify that it works for Desktop Access
- You can access Desktop service on root cluster
- You can access Desktop service on leaf cluster
- If you change your IP you no longer can access Desktop services.

Assist @jakule

Assist is not supported by tsh and WebUI is the only way to use it.
Assist test plan is in the core section instead of WebUI as most functionality is implemented in the core.

Configuration
- Assist is disabled by default (OSS, Enterprise)
- Assist can be enabled in the configuration file.
- Assist is disabled in the Cloud.
- Assist is enabled by default in the Cloud Team plan.
- Assist is always disabled when etcd is used as a backend.
SSH integration
- Assist icon is visible in WebUI's Terminal
- A Bash command can be generated in the above window.
- When an output is selected in the Terminal "Explain" option is available, and it generates the summary.

IGS @smallinsky

Teleport SAML Identity Provider @flyinghermit

Verify SAML IdP service provider resource management.

Docs:

Verify SAML IdP guide instructions work.

Manage Service Provider (SP)

saml_idp_service_provider resource can be created, updated and deleted with tctl create/update/delete sp.yaml command.
- SP can be created with name and entity descriptor.
- SP can be created with name, entity_id, acs_url.
  - Verify Entity descriptor is generated.
- Verify attribute mapping configuration works.
- Verify test attribute mapping command. $ tctl idp saml test-attribute-mapping --users <usernames or name of file containing user spec> --sp <name of file containing user spec> --format <json/yaml/defaults to text>

SAML service provider catalog

GCP Workforce Identity Federation
- Verify guided flow works end-to-end, signing into GCP web console from Teleport resource page.
- Verify that when a SAML resource is created with preset value preset: gcp-workforce, Teleport adds
  relay state relay_state: https://console.cloud.google/ value in the resulting resource spec.

Resources

Quick GitHub/SAML/OIDC Setup Tips

ravicious · 2024-05-29T13:49:10Z

Successful login with TOTP logs an error in auth server #42131

nklaassen · 2024-05-30T23:34:10Z

"Cancel and Logout" button does nothing #42209

atburke · 2024-05-31T00:06:40Z

SSH jump host fails for leaf cluster in separate port mode #42210

timothyb89 · 2024-05-31T23:27:57Z

When using the config generated by tsh config, ssh to agentless nodes on a remote cluster fails known hosts check #42252
tsh ssh -R fails to open socket: unexpected user UID for the socket: 0 #42254
ssh host verification to leaf clusters fails when using tsh config when session recording mode set to proxy #42256
tbot proxy app fails: identity is not allowed to reissue certificates #42257

bl-nero · 2024-06-03T18:11:58Z

Unable to connect to an agentless instance using tsh after changing its hostname #42315

capnspacehook · 2024-06-04T04:11:45Z

Misleading error when joining SSH session using Web UI in proxy recording mode #42346

greedy52 · 2024-06-04T15:31:50Z

tsh aws ssm start-session --target <instance-id> fails when KMS encryption is enabled on Session Manager #42371

Not a regression though.

codingllama · 2024-06-04T17:29:56Z

Users without a second factor are locked out in the transition to v16 #42386

Arguable if this is something we need to address, but it seemed better to "document" it anyway.

hugoShaka · 2024-06-04T18:02:12Z

Theme picker text is incorrect by default #42395

Nitpick: the theme picker text is bugged

ibeckermayer · 2024-06-05T00:45:54Z

Pause during Desktop playback sometimes fails to stop progress bar #42467

bl-nero · 2024-06-05T11:58:29Z

Unable to connect to RDP if Windows Desktop Service node name is different from its FQDN #42476

hugoShaka · 2024-06-05T19:12:22Z

Non blocking: etcd-backend emits warning logs on IBM #42511

ibeckermayer · 2024-06-05T21:43:40Z

Desktop sessions are recorded even if all of a user's roles disable recording #42522

ibeckermayer · 2024-06-06T18:20:26Z

Fixes withheld TDP messages in proxy #42578

rosstimothy · 2024-06-10T13:08:09Z

Performance Test Results

Cloud

Load Tests

10k Resources

Soak Tests

Origin: us-east-1 Target: us-east-1

tsh bench ssh --duration=30m root@node-agents-5b8c8bb49-zzh6r-09 /busybox/ls -lah /

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         241 ms            
50         250 ms            
75         262 ms            
90         305 ms            
95         393 ms            
99         1286 ms           
100        4959 ms

Origin: us-west-2 Target: us-east-1

tsh bench ssh --duration=30m root@node-agents-5b8c8bb49-zzh6r-09 /busybox/ls -lah /

* Requests originated: 17992
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         879 ms            
50         890 ms            
75         905 ms            
90         952 ms            
95         1196 ms           
99         1795 ms           
100        2997 ms
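
To make the two soak runs above easier to compare, here is a minimal sketch that computes the per-percentile difference between the cross-region and same-region histograms, along with the effective request rate. The numbers are copied from the results above; the helper names (`deltas`, `effective_rate`) are illustrative and not part of `tsh`.

```python
# Percentile latencies (ms) copied from the two soak-test histograms above.
PERCENTILES = [25, 50, 75, 90, 95, 99, 100]
SAME_REGION_MS = [241, 250, 262, 305, 393, 1286, 4959]    # us-east-1 -> us-east-1
CROSS_REGION_MS = [879, 890, 905, 952, 1196, 1795, 2997]  # us-west-2 -> us-east-1


def deltas(base: list[int], other: list[int]) -> list[int]:
    """Per-percentile difference (other - base), in milliseconds."""
    return [o - b for b, o in zip(base, other)]


def effective_rate(originated: int, duration_s: int) -> float:
    """Average requests per second over the whole run."""
    return originated / duration_s


if __name__ == "__main__":
    for pct, diff in zip(PERCENTILES, deltas(SAME_REGION_MS, CROSS_REGION_MS)):
        print(f"p{pct}: {diff:+d} ms")
    # --duration=30m is 1800 seconds.
    print(f"same-region rate: {effective_rate(17998, 1800):.2f} req/s")
    print(f"cross-region rate: {effective_rate(17992, 1800):.2f} req/s")
```

Through the 90th percentile the cross-region run is a fairly steady 640 ms or so slower, and both runs average just under 10 requests per second.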

etcd¹

30k Resources

500 Trusted Clusters

Postgres¹

30k Resources

Firestore¹

30k Resources

¹ 30k tests were performed using the simulated method described in the v14 Test Plan.

greedy52 · 2024-06-11T20:47:15Z

Database Access load test (PostgreSQL and MySQL)

Setup

Same as the previous test, but in ca-central-1.

EKS with a single node group:

Min: 2, Max: 10 instances.
Instance class: m5.4xlarge
Kubernetes version: 1.27

Teleport cluster (all deployed on the EKS cluster):

DynamoDB backend
3 Auth servers
3 Proxy instances
1 Database Agent

Databases:

Single PostgreSQL RDS instance on a db.t4g.xlarge instance class. Accessed through RDS Proxy with single RW endpoint.
Single MySQL RDS instance on a db.t4g.xlarge instance class. Accessed through RDS Proxy with single RW endpoint.

Note: Databases were configured using discovery running inside the database agent.

tsh bench commands were executed inside the cluster.

MySQL

10 connections/second (90 Percentile 80ms)

# tsh bench mysql mysql-proxy-rdsproxy --db-user=mysql --db-name=mysql --rate=10 --duration=30m

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         62 ms             
50         67 ms             
75         74 ms             
90         80 ms             
95         85 ms             
99         117 ms            
100        703 ms

50 connections/second (90 Percentile 467ms)

# tsh bench mysql mysql-proxy-rdsproxy --db-user=mysql --db-name=mysql --rate=50 --duration=30m

* Requests originated: 89985
* Requests failed: 9
* Last error: io.ReadFull(header) failed. err EOF: connection was bad

Histogram

Percentile Response Duration 
---------- ----------------- 
25         164 ms            
50         246 ms            
75         349 ms            
90         467 ms            
95         552 ms            
99         736 ms            
100        1424 ms

PostgreSQL

10 connections/second (90 Percentile 93ms)

# tsh bench postgres postgres-proxy-rdsproxy --db-user=postgres --db-name=postgres --rate=10 --duration=30m

* Requests originated: 18000
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         74 ms             
50         80 ms             
75         87 ms             
90         93 ms             
95         99 ms             
99         201 ms            
100        1077 ms

50 connections/second (90 Percentile 499ms)

# tsh bench postgres postgres-proxy-rdsproxy --db-user=postgres --db-name=postgres --rate=50 --duration=30m

* Requests originated: 89986
* Requests failed: 27
* Last error: failed to connect to `host=127.0.0.1 user=teleport database=teleport`: failed to receive message (unexpected EOF)

Histogram

Percentile Response Duration 
---------- ----------------- 
25         183 ms            
50         269 ms            
75         375 ms            
90         499 ms            
95         586 ms            
99         791 ms            
100        2217 ms
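
As a quick sanity check on the four runs above, the sketch below recomputes the expected request count (rate × duration) and the failure rate from the reported numbers. The `Run` type and its fields are illustrative, not part of `tsh`; the figures are copied from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One `tsh bench` run, with numbers copied from the results above."""

    name: str
    rate_per_s: int   # value passed to --rate
    duration_s: int   # --duration=30m is 1800 seconds
    originated: int   # "Requests originated"
    failed: int       # "Requests failed"

    @property
    def expected(self) -> int:
        """Requests we would expect if the target rate were held exactly."""
        return self.rate_per_s * self.duration_s

    @property
    def failure_pct(self) -> float:
        """Failed requests as a percentage of originated requests."""
        return 100.0 * self.failed / self.originated


RUNS = [
    Run("mysql @ 10/s", 10, 1800, 17999, 0),
    Run("mysql @ 50/s", 50, 1800, 89985, 9),
    Run("postgres @ 10/s", 10, 1800, 18000, 0),
    Run("postgres @ 50/s", 50, 1800, 89986, 27),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: {run.originated}/{run.expected} originated, "
            f"{run.failure_pct:.3f}% failed"
        )
```

At 50 connections/second this works out to about 0.01% failed requests for MySQL and about 0.03% for PostgreSQL, with both runs originating within 15 requests of the 90,000 target.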

Database Access resources count test

Setup

This is a one-time manual setup:

AWS ALB
DynamoDB backend
proxy: 2 x c6a.xlarge (4cpu, 8gb)
auth: 2 x c6a.2xlarge (8cpu, 16gb)
test: 1 x m6a.2xlarge (8cpu, 32gb)
Real database agents using systemctl

500 databases per agent, 50k keepalives

5k unique db resources in total.

CloudWatch Dashboard

Timestamp Agent Count `db_server` Count Auth CPU% Auth Mem% Proxy CPU% Proxy Mem%
--------- ----------- ----------------- --------- --------- ---------- ----------
17:30     20          10,000            3%        5%        3%         5%
18:20     50          25,000            7%        11%       6%         6%
18:50     100         50,000            15%       22%       12%        8%

20 databases per agent, 10k keepalives

1k unique db resources in total.

CloudWatch Dashboard

Timestamp Agent Count `db_server` Count Auth CPU% Auth Mem% Proxy CPU% Proxy Mem%
--------- ----------- ----------------- --------- --------- ---------- ----------
16:00     100         2,000             1%        3%        1%         5%
16:20     250         5,000             2%        4.5%      3%         6%
16:40     500         10,000            3%        6%        3%         8%
17:00     0           0                 <1%       3%        <1%        3%
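
The `db_server` counts in both tables follow directly from agent count × databases per agent. A minimal sketch of that arithmetic, with the rows copied from the tables above (the function and constant names are illustrative only):

```python
def db_server_count(agents: int, databases_per_agent: int) -> int:
    """Each agent heartbeats one db_server per database it serves."""
    return agents * databases_per_agent


# (agent count, reported db_server count) rows from the two tables above.
ROWS_500_PER_AGENT = [(20, 10_000), (50, 25_000), (100, 50_000)]
ROWS_20_PER_AGENT = [(100, 2_000), (250, 5_000), (500, 10_000), (0, 0)]


def check(rows: list[tuple[int, int]], databases_per_agent: int) -> bool:
    """True if every reported count matches agents * databases_per_agent."""
    return all(
        db_server_count(agents, databases_per_agent) == reported
        for agents, reported in rows
    )


if __name__ == "__main__":
    print("500 per agent consistent:", check(ROWS_500_PER_AGENT, 500))
    print("20 per agent consistent:", check(ROWS_20_PER_AGENT, 20))
```

The peak rows also match the keepalive figures in the headings: 100 agents × 500 databases gives 50k, and 500 agents × 20 databases gives 10k.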

GavinFrazar · 2024-06-11T21:23:44Z

null db access permissions allows one connection only #42804

Tener · 2024-06-12T15:11:17Z

Statically configured Oracle database breaks tctl #42845

greedy52 · 2024-06-12T19:07:45Z

Oracle access failed through trusted cluster #42878

Found by @Tener

r0mant added the test-plan label on May 29, 2024

zmb3 closed this as completed Jun 18, 2024
