Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kube credentials lockfile to prevent possibility of excessive login attempts #26102

Merged
merged 1 commit into from Jun 5, 2023

Conversation

AntonAM
Copy link
Contributor

@AntonAM AntonAM commented May 11, 2023

This PR adds locking mechanism for when tsh kube credentials is called. It had potential to create a lot of open browser tabs, when GUI tool like lens tries to run kube commands repeatedly after user session expired, and SSO login attempts don't succeed.
We add a lock file that on one hand is used to synchronize parallel calls, but main usage is that we don't delete it if there was an error. So when subsequent tsh kube credentials run finds this file, it aborts to not cause excessive login attempts and returns an error asking user to login manually.

Closes #22494
Closes #9450

@github-actions github-actions bot added size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels May 11, 2023
@github-actions github-actions bot requested review from espadolini and Tener May 11, 2023 19:15
@AntonAM AntonAM requested a review from tigrato May 11, 2023 19:16
Comment on lines +613 to +630
// Take a lock while we're trying to issue certificate and possibly relogin
unlock, err := utils.FSTryWriteLockTimeout(ctx, kubeCredLockfilePath, 5*time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you supposed to hold a fs lock while doing remote I/O? utils.FSTryWriteLockTimeout tries to grab the lock every 10 milliseconds, which seems kind of a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my measurements, it creates pretty small additional cpu load, around 0.5-1%. And also it's not a happy path, relevant only when we need to reissue a cert. So I think it's ok in this case.

lib/auth/grpcserver.go Outdated Show resolved Hide resolved
lib/auth/grpcserver_test.go Outdated Show resolved Hide resolved
lib/utils/fs_windows.go Show resolved Hide resolved
tool/tsh/kube.go Outdated Show resolved Hide resolved
lib/client/keystore.go Outdated Show resolved Hide resolved
tool/tsh/kube.go Outdated Show resolved Hide resolved
tool/tsh/kube.go Outdated Show resolved Hide resolved
@@ -662,6 +714,9 @@ func (c *kubeCredentialsCommand) run(cf *CLIConf) error {
return trace.Wrap(err)
}

// Unlock and remove the lockfile so subsequent tsh kube credentials calls don't exit early
unlockKubeCred(true)

return c.writeKeyResponse(cf.Stdout(), k, c.kubeCluster)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling the unlock function multiple times, you can just capture the deleteKubeCredsLock var using a closure and once it executes, the var is read from pointer before calling unlockKubeCred

       var deleteKubeCredsLock = false
	defer func(){
 		unlockKubeCred(deleteKubeCredsLock) 
	}()

	tc, err := makeClient(cf, true)
	if err != nil {
		return trace.Wrap(err)
Expand All
	@@ -628,6 +677,9 @@ func (c *kubeCredentialsCommand) run(cf *CLIConf) error {
		}
		if crt != nil && time.Until(crt.NotAfter) > time.Minute {
			log.Debugf("Re-using existing TLS cert for Kubernetes cluster %q", c.kubeCluster)
			// Unlock and remove the lockfile so subsequent tsh kube credentials calls don't exit early
			deleteKubeCredsLock=true

			return c.writeKeyResponse(cf.Stdout(), k, c.kubeCluster)
		}
		// Otherwise, cert for this k8s cluster is missing or expired. Request
Expand Down
Expand Up
	@@ -662,6 +714,9 @@ func (c *kubeCredentialsCommand) run(cf *CLIConf) error {
		return trace.Wrap(err)
	}

	// Unlock and remove the lockfile so subsequent tsh kube credentials calls don't exit early
	deleteKubeCredsLock=true

return nil, trace.Wrap(err)
}
// Take a lock while we're trying to issue certificate and possibly relogin
unlock, err := utils.FSTryWriteLockTimeout(ctx, kubeCredLockfilePath, 5*time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the operation fails because another // instance already locked the file, should we return ErrKubeCredLockfileFound?

lib/utils/fs_windows.go Outdated Show resolved Hide resolved
lib/client/keystore.go Outdated Show resolved Hide resolved
@AntonAM AntonAM force-pushed the anton/kube-cred-locking branch 2 times, most recently from ec210db to bca65b0 Compare June 1, 2023 00:09
tool/tsh/kube.go Outdated Show resolved Hide resolved
tool/tsh/kube.go Outdated Show resolved Hide resolved
tool/tsh/kube.go Outdated Show resolved Hide resolved
@AntonAM AntonAM added this pull request to the merge queue Jun 5, 2023
Merged via the queue into master with commit 8464dc2 Jun 5, 2023
22 checks passed
@AntonAM AntonAM deleted the anton/kube-cred-locking branch June 5, 2023 13:14
@public-teleport-github-review-bot

@AntonAM See the table below for backport results.

Branch Result
branch/v12 Failed
branch/v13 Create PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/branch/v12 backport/branch/v13 size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
4 participants