Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ACR MSI cross-subscription authentication error #77245

Merged
merged 2 commits into from
Apr 30, 2019

Conversation

norshtein
Copy link
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
Method Provide() in interface DockerConfigProvider now takes image name as its parameter(added in a recent PR #75587). So we can parse the login server from the image name, and then interact with the login server to get docker login credential.
Which issue(s) this PR fixes:

Fixes #67892

Special notes for your reviewer:

Does this PR introduce a user-facing change?: no user-facing change

release note:

fix issue that pull image failed from a cross-subscription Azure Container Registry when using MSI to authenticate

/sig azure
/assign @andyzhangx @feiskyer

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 30, 2019
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/azure cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 30, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @norshtein. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 30, 2019
Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also check what if ACR is a public repo, does this PR also work? And what if that image is a docker hub image ...

klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image
parts := strings.Split(image, "/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you use regexp.MustCompile to parse login server, here is an example:

matches := lunPathRE.FindStringSubmatch(deviceInfo)

And also there should be error handling in case image is an invalid image url

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have switched to use regular expression. For a public ACR, below code will handle for it
https://github.com/kubernetes/kubernetes/pull/77245/files#diff-65860720abfa0057bc9fe725fde21bedR220 . For other image registries, they will be filtered out.


if a.config.UseManagedIdentityExtension {
klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

provide an example format for image address and also print out the original image url in the log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example in comment


if a.config.UseManagedIdentityExtension {
klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should filter out non-ACR images here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updated.

@feiskyer
Copy link
Member

@norshtein Could you also add a unit test for this change?

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 30, 2019
@feiskyer feiskyer added this to In progress in Provider Azure via automation Apr 30, 2019
@feiskyer feiskyer moved this from In progress to Needs Review in Provider Azure Apr 30, 2019
@norshtein
Copy link
Contributor Author

@norshtein Could you also add a unit test for this change?

/priority important-soon

If the credential type is MSI, it will interact with MSI endpoint to get a token and write the token to DockerConfigEntry: https://github.com/kubernetes/kubernetes/pull/77245/files#diff-65860720abfa0057bc9fe725fde21bedR255 . It is not a static token similar as service principal. Seems it's hard to add unit test for it...

@andyzhangx
Copy link
Member

andyzhangx commented Apr 30, 2019

for the unit test, I think you could move following code into a seperate function, e.g. isACRImage:

		// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version
		regEx, err := regexp.Compile(`.*\.azurecr\.io|.*\.azurecr\.cn|.*\.azurecr\.de|.*\.azurecr\.us`)
		if err != nil {
			klog.Error("Unexpected error: failed to construct regular expression: %v", err)
		}
		match := regEx.FindAllString(image, -1)
		if len(match) != 1 {
			klog.V(4).Infof("Found the image is not in acr, will not generate the token using MSI")
		}

Then you could add all kinds of unit tests, that would be testable

@norshtein
Copy link
Contributor Author

Cool. Thanks for the reminding, will do.

if err != nil {
klog.Errorf("Failed to list registries: %v", err)
return cfg
klog.Error("Unexpected error: failed to construct regular expression: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only log error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to latest commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be OK to only log error, the error is not fatal and we can't do anything about it. For the worst case, cfg is not correctly filled and pulling image will fail, which is acceptable.

klog.V(2).Infof("loginServer: %s", loginServer)
match := regEx.FindAllString(image, -1)
if len(match) != 1 {
klog.V(4).Infof("Found the image is not in acr, will not generate the token using MSI")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.V(4).Infof("image(%s) is not from ACR, skip MSI authentication", image)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

cred, err := getACRDockerEntryFromARMToken(a, loginServer)
if err != nil {
continue
klog.Errorf("Failed to get docker entry: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.Errorf("getACRDockerEntryFromARMToken(%s) failed with %v", loginServer, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 30, 2019
@@ -93,4 +93,17 @@ func Test(t *testing.T) {
t.Errorf("Missing expected registry: %s", registryName)
}
}

invalidImage := "invalidImage"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a separate unit test, you could refer to follow unit test as example:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in a new function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@norshtein I mean also copy the test code style like following, it's more maintainable:

	tests := []struct {
		ID        string
		expected  string
		expectErr bool
	}{
		{
			ID:        "",
			expected:  "",
			expectErr: true,
		},
		{
			ID:        "foo/",
			expected:  "",
			expectErr: true,
		},
		{
			ID:        "foo/bar",
			expected:  "bar",
			expectErr: false,
		},
		{
			ID:        "foo/bar/baz",
			expected:  "baz",
			expectErr: false,
		},
	}

	for _, test := range tests {
		s, e := getLastSegment(test.ID)
		if test.expectErr && e == nil {
			t.Errorf("Expected err, but it was nil")
			continue
		}
		if !test.expectErr && e != nil {
			t.Errorf("Unexpected error: %v", e)
			continue
		}
		if s != test.expected {
			t.Errorf("expected: %s, got %s", test.expected, s)
		}
	}
}

// If the provided image is not an acr image, this function will return empty string
func parseACRLoginServerFromImage(image string) string {
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version
regEx, err := regexp.Compile(`.*\.azurecr\.io|.*\.azurecr\.cn|.*\.azurecr\.de|.*\.azurecr\.us`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think use regexp.MustCompile is better, define that var as global, if there is error, it won't pass on e2e test, no need to check error every time:

lunPathRE = regexp.MustCompile(`/dev/disk/azure/scsi(?:.*)/lun(.+)`)

@@ -229,6 +223,10 @@ func getLoginServer(registry containerregistry.Registry) string {
}

func getACRDockerEntryFromARMToken(a *acrProvider, loginServer string) (*credentialprovider.DockerConfigEntry, error) {
// Run EnsureFresh to make sure the token is valid and does not expire
if err := a.servicePrincipalToken.EnsureFresh(); err != nil {
klog.Errorf("Failed to ensure fresh service principal token: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we return error here? If return error, no need to log every error

@@ -182,26 +183,19 @@ func (a *acrProvider) Enabled() bool {
}

func (a *acrProvider) Provide(image string) credentialprovider.DockerConfig {
klog.V(2).Infof("try to provide secret for image %s", image)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

klog.V(4).Infof is better, V(2) is the dispalying log level by default, it would flush the log file

@andyzhangx
Copy link
Member

BTW, pls sqush all commits into one @norshtein

@norshtein
Copy link
Contributor Author

Updated and rebased. PTAL @andyzhangx

// parseACRLoginServerFromImage takes image as parameter and returns login server of it.
// If the provided image is not an acr image, this function will return empty string
func parseACRLoginServerFromImage(image string) string {
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment since it's already exists in acrRE

@norshtein norshtein changed the title fix: fix ACR MSI cross-subscription authentication error Fix ACR MSI cross-subscription authentication error Apr 30, 2019
Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better now, one small comment about test coverage.

expected: "",
},
{
image: "foo.azurecr.io/bar/image:version",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test for azurecr.cn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
@feiskyer do you have any comments?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 30, 2019
Copy link
Member

@feiskyer feiskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feiskyer, norshtein

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2019
@norshtein
Copy link
Contributor Author

Many thanks for reviewing! @andyzhangx @feiskyer 😄

@andyzhangx
Copy link
Member

/test pull-kubernetes-kubemark-e2e-gce-big 

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit c419640 into kubernetes:master Apr 30, 2019
Provider Azure automation moved this from Needs Review to Done Apr 30, 2019
if len(match) == 1 {
return match[0]
}
return ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a log for len(match) > 1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image name is already logged here. len(match) > 1 indicates a wrong image is provided, credential provider can't handle this, in later code flow, no credential will be provided and this error will be caught outside the credential provider when trying to pull the image. Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

pull image from a cross-subscription azure container registry does not work by MSI
6 participants