aad fallback to real auth if refresh token fails, fixes #82776 #86481

Merged

tdihp
Contributor

@tdihp commented Dec 20, 2019

Signed-off-by: Ping He <tdihp@hotmail.com>

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #82776

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Azure auth module for kubectl now requests login after refresh token expires.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note, do-not-merge/invalid-commit-message, kind/bug, size/M, cncf-cla: yes, needs-sig, and needs-priority labels Dec 20, 2019
@k8s-ci-robot
Contributor

Welcome @tdihp!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @tdihp. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test, sig/api-machinery, and sig/auth labels and removed the needs-sig label Dec 20, 2019
@tanjunchen
Member

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Dec 21, 2019
@tdihp
Contributor Author

tdihp commented Dec 23, 2019

@weinong

@feiskyer
Member

/kind bug
/area provider/azure
/priority important-soon

@k8s-ci-robot added the area/provider/azure and priority/important-soon labels and removed the needs-priority label Dec 23, 2019
@feiskyer
Member

Could you remove fixes #82776 from commit message?

Member

@feiskyer left a comment


looks good in general, could you add unit tests for it?

/retest

@feiskyer
Member

@weinong would you like to have a look?

if err != nil {
return nil, fmt.Errorf("storing the refreshed token in configuration: %v", err)
return nil, fmt.Errorf("acquiring a new fresh token: %v", err)
Contributor

@weinong commented Dec 23, 2019


nit: Failed to acquire new token: %v

Contributor Author


The original code always used ": %v", err for error messages. Maybe I can refactor them in this PR?


// 3. extend if valid token but expired
if token != nil && token.token.IsExpired() {
klog.V(4).Infof("Refreshing token.")
Copy link
Contributor


Info()


// set cfg to cache
if err == nil {
klog.V(4).Infof("Saving cache after load cfg")
Copy link
Contributor


Info()

@feiskyer
Member

/assign @liggitt

@feiskyer
Member

feiskyer commented Mar 1, 2020

/milestone v1.18

@k8s-ci-robot added this to the v1.18 milestone Mar 1, 2020
@smourapina

@liggitt, @feiskyer: Leaving a friendly reminder that we are 1 day away from code freeze (5 March EOD). Please keep this in mind if you still plan this PR for milestone 1.18. Thanks!

}
if !token.token.IsExpired() {

if err == nil {
ts.cache.setToken(azureTokenKey, token)
Copy link
Member


this will now set an expired token in the cache... is that what we want?

if err != nil {
return nil, fmt.Errorf("refreshing the expired token: %v", err)
if _, ok := err.(adal.TokenRefreshError); ok {
Copy link
Member


should this use autorest.IsTokenRefreshError() instead?

@enj
Member

enj commented Mar 5, 2020

Here is how I would approach it (while trying to avoid deeply nested if statements and only returning early on error cases).

Diff from current PR:

diff --git staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
index f97bc6d2309..4949a757f1a 100644
--- staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
+++ staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
@@ -214,29 +214,24 @@ func (ts *azureTokenSource) Token() (*azureToken, error) {
 
 	// retrieve from config if no cache
 	if token == nil {
-		token, err = ts.retrieveTokenFromCfg()
-
+		tokenFromCfg, err := ts.retrieveTokenFromCfg()
 		if err == nil {
-			ts.cache.setToken(azureTokenKey, token)
+			token = tokenFromCfg
 		}
 	}
 
-	// quick exit for caching scenario
-	if token != nil && !token.token.IsExpired() {
-		return token, nil
-	}
-
 	if token != nil && token.token.IsExpired() {
 		klog.V(4).Info("Refreshing token.")
 		token, err = ts.Refresh(token)
-		if err != nil {
-			if _, ok := err.(adal.TokenRefreshError); ok {
-				// When Refresh fails, token will be reset to nil
-				// so that the inner token source will be used to acquire new
-				klog.V(4).Infof("Failed to refresh expired token, proceed to auth: %v", err)
-			} else {
-				return nil, fmt.Errorf("unexpected error when refreshing token: %v", err)
-			}
+		switch {
+		case err == nil:
+			// all good
+		case autorest.IsTokenRefreshError(err):
+			// When Refresh fails, token will be reset to nil
+			// so that the inner token source will be used to acquire new
+			klog.V(4).Infof("Failed to refresh expired token, proceed to auth: %v", err)
+		default:
+			return nil, fmt.Errorf("unexpected error when refreshing token: %v", err)
 		}
 	}
 
@@ -245,17 +240,23 @@ func (ts *azureTokenSource) Token() (*azureToken, error) {
 		if err != nil {
 			return nil, fmt.Errorf("failed acquiring new token: %v", err)
 		}
-		// corner condition, newly got token is valid but expired
-		if token.token.IsExpired() {
-			return nil, fmt.Errorf("newly acquired token is expired")
-		}
 	}
 
-	ts.cache.setToken(azureTokenKey, token)
+	// sanity check
+	if token == nil {
+		return nil, fmt.Errorf("unable to acquire token")
+	}
+
+	// corner condition, newly got token is valid but expired
+	if token.token.IsExpired() {
+		return nil, fmt.Errorf("newly acquired token is expired")
+	}
+
 	err = ts.storeTokenInCfg(token)
 	if err != nil {
 		return nil, fmt.Errorf("storing the refreshed token in configuration: %v", err)
 	}
+	ts.cache.setToken(azureTokenKey, token)
 
 	return token, nil
 }

Diff from master:

diff --git staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
index 11ef7afb213..4949a757f1a 100644
--- staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
+++ staging/src/k8s.io/client-go/plugin/pkg/client/auth/azure/azure.go
@@ -180,6 +180,7 @@ type azureToken struct {
 
 type tokenSource interface {
 	Token() (*azureToken, error)
+	Refresh(*azureToken) (*azureToken, error)
 }
 
 type azureTokenSource struct {
@@ -210,33 +211,53 @@ func (ts *azureTokenSource) Token() (*azureToken, error) {
 
 	var err error
 	token := ts.cache.getToken(azureTokenKey)
+
+	// retrieve from config if no cache
 	if token == nil {
-		token, err = ts.retrieveTokenFromCfg()
-		if err != nil {
-			token, err = ts.source.Token()
-			if err != nil {
-				return nil, fmt.Errorf("acquiring a new fresh token: %v", err)
-			}
-		}
-		if !token.token.IsExpired() {
-			ts.cache.setToken(azureTokenKey, token)
-			err = ts.storeTokenInCfg(token)
-			if err != nil {
-				return nil, fmt.Errorf("storing the token in configuration: %v", err)
-			}
+		tokenFromCfg, err := ts.retrieveTokenFromCfg()
+		if err == nil {
+			token = tokenFromCfg
 		}
 	}
-	if token.token.IsExpired() {
-		token, err = ts.refreshToken(token)
-		if err != nil {
-			return nil, fmt.Errorf("refreshing the expired token: %v", err)
+
+	if token != nil && token.token.IsExpired() {
+		klog.V(4).Info("Refreshing token.")
+		token, err = ts.Refresh(token)
+		switch {
+		case err == nil:
+			// all good
+		case autorest.IsTokenRefreshError(err):
+			// When Refresh fails, token will be reset to nil
+			// so that the inner token source will be used to acquire new
+			klog.V(4).Infof("Failed to refresh expired token, proceed to auth: %v", err)
+		default:
+			return nil, fmt.Errorf("unexpected error when refreshing token: %v", err)
 		}
-		ts.cache.setToken(azureTokenKey, token)
-		err = ts.storeTokenInCfg(token)
+	}
+
+	if token == nil {
+		token, err = ts.source.Token()
 		if err != nil {
-			return nil, fmt.Errorf("storing the refreshed token in configuration: %v", err)
+			return nil, fmt.Errorf("failed acquiring new token: %v", err)
 		}
 	}
+
+	// sanity check
+	if token == nil {
+		return nil, fmt.Errorf("unable to acquire token")
+	}
+
+	// corner condition, newly got token is valid but expired
+	if token.token.IsExpired() {
+		return nil, fmt.Errorf("newly acquired token is expired")
+	}
+
+	err = ts.storeTokenInCfg(token)
+	if err != nil {
+		return nil, fmt.Errorf("storing the refreshed token in configuration: %v", err)
+	}
+	ts.cache.setToken(azureTokenKey, token)
+
 	return token, nil
 }
 
@@ -314,7 +335,13 @@ func (ts *azureTokenSource) storeTokenInCfg(token *azureToken) error {
 	return nil
 }
 
-func (ts *azureTokenSource) refreshToken(token *azureToken) (*azureToken, error) {
+func (ts *azureTokenSource) Refresh(token *azureToken) (*azureToken, error) {
+	return ts.source.Refresh(token)
+}
+
+// refresh outdated token with adal.
+// adal.RefreshTokenError will be returned if error occur during refreshing.
+func (ts *azureTokenSourceDeviceCode) Refresh(token *azureToken) (*azureToken, error) {
 	env, err := azure.EnvironmentFromName(token.environment)
 	if err != nil {
 		return nil, err
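Adding Refresh to the tokenSource interface lets azureTokenSource hand refreshing off to its inner source (`return ts.source.Refresh(token)`), so the device-code source owns the adal-specific refresh logic. Here is a stripped-down sketch of that delegation, using hypothetical stand-in types (`token`, `innerSource`, `wrapperSource`) rather than the real ones:

```go
package main

import "errors"

// token is a hypothetical stand-in for azureToken.
type token struct{ value string }

// tokenSource has the same shape as the interface in the diff: acquiring
// and refreshing both go through one abstraction.
type tokenSource interface {
	Token() (*token, error)
	Refresh(*token) (*token, error)
}

// innerSource is a hypothetical stand-in for azureTokenSourceDeviceCode.
type innerSource struct{ refreshCalls int }

func (s *innerSource) Token() (*token, error) { return &token{value: "login"}, nil }

func (s *innerSource) Refresh(t *token) (*token, error) {
	s.refreshCalls++
	if t == nil {
		return nil, errors.New("no token to refresh")
	}
	return &token{value: t.value + "+refreshed"}, nil
}

// wrapperSource plays the role of azureTokenSource: it forwards Refresh to
// whatever inner source it holds. The real type also caches and persists
// tokens; that part is omitted here.
type wrapperSource struct {
	source tokenSource
}

func (ts *wrapperSource) Token() (*token, error) { return ts.source.Token() }

func (ts *wrapperSource) Refresh(t *token) (*token, error) {
	return ts.source.Refresh(t)
}

// Compile-time checks that both types satisfy tokenSource.
var (
	_ tokenSource = (*innerSource)(nil)
	_ tokenSource = (*wrapperSource)(nil)
)
```

Because the wrapper depends only on the interface, a test can swap in a fake inner source whose Refresh fails. That is a convenient way to exercise the fallback path without talking to AAD.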

@tdihp
Contributor Author

tdihp commented Mar 8, 2020

Thanks @enj for the suggestion! It looks cleaner than my attempt! Let me try it. OK if I borrow in this PR?

@enj
Member

enj commented Mar 9, 2020

> Thanks @enj for the suggestion! It looks cleaner than my attempt! Let me try it. OK if I borrow in this PR?

Sure.

/milestone v1.19

Moving to 1.19 since we are past 1.18 code freeze.

@k8s-ci-robot modified the milestones: v1.18, v1.19 Mar 9, 2020
@k8s-ci-robot removed the lgtm label Mar 16, 2020
@tdihp
Contributor Author

tdihp commented Mar 16, 2020

Applied the changes requested by @liggitt and @enj. The token is now only assigned when err is nil, and I added a comment explaining the early cache (it is there to avoid frequent persist calls). Please review again. Thanks!

@enj
Member

enj commented Mar 17, 2020

This LGTM. @tdihp did you confirm this by running it against AAD?

Squash the change into a single commit.

@tdihp
Contributor Author

tdihp commented Mar 17, 2020

@enj Thanks for confirming. I will verify with my own build again and then squash :)

@liggitt
Member

liggitt commented Mar 18, 2020

lgtm, please squash to a single commit (and don't put the issue number in the commit title, it spams the issue)

… add more tests.

Signed-off-by: Ping He <tdihp@hotmail.com>
@tdihp force-pushed the feature/aad-fallback-real-auth branch from 6ac7bfa to 26c97fa March 22, 2020 09:07
@tdihp
Contributor Author

tdihp commented Mar 24, 2020

Thanks @liggitt , rebased & squashed & modified commit message.

@enj
Member

enj commented Mar 24, 2020

/lgtm

@k8s-ci-robot added the lgtm label Mar 24, 2020
@liggitt
Member

liggitt commented Mar 24, 2020

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, feiskyer, liggitt, tdihp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Mar 24, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Labels
approved, area/provider/azure, cncf-cla: yes, kind/bug, lgtm, ok-to-test, priority/important-soon, release-note, sig/api-machinery, sig/auth, size/L
Development

Successfully merging this pull request may close these issues.

kubectl: Azure AAD authentication should handle refresh token timeout