Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ACR MSI cross-subscription authentication error #77245

Merged
merged 2 commits into from Apr 30, 2019

Conversation

@norshtein
Copy link
Contributor

commented Apr 30, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Method Provide() in interface DockerConfigProvider now takes image name as its parameter(added in a recent PR #75587). So we can parse the login server from the image name, and then interact with the login server to get docker login credential.
Which issue(s) this PR fixes:

Fixes #67892

Special notes for your reviewer:

Does this PR introduce a user-facing change?: no user-facing change

release note:

fix issue that pull image failed from a cross-subscription Azure Container Registry when using MSI to authenticate

/sig azure
/assign @andyzhangx @feiskyer

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

Hi @norshtein. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested review from feiskyer and khenidak Apr 30, 2019

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

/ok-to-test

@andyzhangx
Copy link
Member

left a comment

could you also check what if ACR is a public repo, does this PR also work? And what if that image is a docker hub image ...

klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image
parts := strings.Split(image, "/")

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

would you use regexp.MustCompile to parse login server, here is an example:

matches := lunPathRE.FindStringSubmatch(deviceInfo)

And also there should be error handling in case image is an invalid image url

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

I have switched to use regular expression. For a public ACR, below code will handle for it
https://github.com/kubernetes/kubernetes/pull/77245/files#diff-65860720abfa0057bc9fe725fde21bedR220 . For other image registries, they will be filtered out.


if a.config.UseManagedIdentityExtension {
klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

provide an example format for image address and also print out the original image url in the log

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Added an example in comment


if a.config.UseManagedIdentityExtension {
klog.V(4).Infof("listing registries")
result, err := a.registryClient.List(ctx)
// Read login server from image

This comment has been minimized.

Copy link
@feiskyer

feiskyer Apr 30, 2019

Member

should filter out non-ACR images here.

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Code updated.

@feiskyer

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

@norshtein Could you also add a unit test for this change?

/priority important-soon

@feiskyer feiskyer added this to In progress in Provider Azure via automation Apr 30, 2019

@feiskyer feiskyer moved this from In progress to Needs Review in Provider Azure Apr 30, 2019

@norshtein

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

@norshtein Could you also add a unit test for this change?

/priority important-soon

If the credential type is MSI, it will interact with MSI endpoint to get a token and write the token to DockerConfigEntry: https://github.com/kubernetes/kubernetes/pull/77245/files#diff-65860720abfa0057bc9fe725fde21bedR255 . It is not a static token similar as service principal. Seems it's hard to add unit test for it...

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

for the unit test, I think you could move following code into a seperate function, e.g. isACRImage:

		// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version
		regEx, err := regexp.Compile(`.*\.azurecr\.io|.*\.azurecr\.cn|.*\.azurecr\.de|.*\.azurecr\.us`)
		if err != nil {
			klog.Error("Unexpected error: failed to construct regular expression: %v", err)
		}
		match := regEx.FindAllString(image, -1)
		if len(match) != 1 {
			klog.V(4).Infof("Found the image is not in acr, will not generate the token using MSI")
		}

Then you could add all kinds of unit tests, that would be testable

@norshtein

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

Cool. Thanks for the reminding, will do.

if err != nil {
klog.Errorf("Failed to list registries: %v", err)
return cfg
klog.Error("Unexpected error: failed to construct regular expression: %v", err)

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

only log error here?

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Please refer to latest commit.

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

It would be OK to only log error, the error is not fatal and we can't do anything about it. For the worst case, cfg is not correctly filled and pulling image will fail, which is acceptable.

klog.V(2).Infof("loginServer: %s", loginServer)
match := regEx.FindAllString(image, -1)
if len(match) != 1 {
klog.V(4).Infof("Found the image is not in acr, will not generate the token using MSI")

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

klog.V(4).Infof("image(%s) is not from ACR, skip MSI authentication", image)

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Updated.

cred, err := getACRDockerEntryFromARMToken(a, loginServer)
if err != nil {
continue
klog.Errorf("Failed to get docker entry: %v", err)

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

klog.Errorf("getACRDockerEntryFromARMToken(%s) failed with %v", loginServer, err)

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Updated.

@k8s-ci-robot k8s-ci-robot added size/M and removed size/S labels Apr 30, 2019

@@ -93,4 +93,17 @@ func Test(t *testing.T) {
t.Errorf("Missing expected registry: %s", registryName)
}
}

invalidImage := "invalidImage"

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

pls add a separate unit test, you could refer to follow unit test as example:

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Added in a new function.

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

@norshtein I mean also copy the test code style like following, it's more maintainable:

	tests := []struct {
		ID        string
		expected  string
		expectErr bool
	}{
		{
			ID:        "",
			expected:  "",
			expectErr: true,
		},
		{
			ID:        "foo/",
			expected:  "",
			expectErr: true,
		},
		{
			ID:        "foo/bar",
			expected:  "bar",
			expectErr: false,
		},
		{
			ID:        "foo/bar/baz",
			expected:  "baz",
			expectErr: false,
		},
	}

	for _, test := range tests {
		s, e := getLastSegment(test.ID)
		if test.expectErr && e == nil {
			t.Errorf("Expected err, but it was nil")
			continue
		}
		if !test.expectErr && e != nil {
			t.Errorf("Unexpected error: %v", e)
			continue
		}
		if s != test.expected {
			t.Errorf("expected: %s, got %s", test.expected, s)
		}
	}
}
// If the provided image is not an acr image, this function will return empty string
func parseACRLoginServerFromImage(image string) string {
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version
regEx, err := regexp.Compile(`.*\.azurecr\.io|.*\.azurecr\.cn|.*\.azurecr\.de|.*\.azurecr\.us`)

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

I think use regexp.MustCompile is better, define that var as global, if there is error, it won't pass on e2e test, no need to check error every time:

lunPathRE = regexp.MustCompile(`/dev/disk/azure/scsi(?:.*)/lun(.+)`)

@@ -229,6 +223,10 @@ func getLoginServer(registry containerregistry.Registry) string {
}

func getACRDockerEntryFromARMToken(a *acrProvider, loginServer string) (*credentialprovider.DockerConfigEntry, error) {
// Run EnsureFresh to make sure the token is valid and does not expire
if err := a.servicePrincipalToken.EnsureFresh(); err != nil {
klog.Errorf("Failed to ensure fresh service principal token: %v", err)

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

shall we return error here? If return error, no need to log every error

@@ -182,26 +183,19 @@ func (a *acrProvider) Enabled() bool {
}

func (a *acrProvider) Provide(image string) credentialprovider.DockerConfig {
klog.V(2).Infof("try to provide secret for image %s", image)

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

klog.V(4).Infof is better, V(2) is the dispalying log level by default, it would flush the log file

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

BTW, pls sqush all commits into one @norshtein

@norshtein norshtein force-pushed the norshtein:acr-msi-fix branch from b7ee036 to 376eb0a Apr 30, 2019

@norshtein

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

Updated and rebased. PTAL @andyzhangx

// parseACRLoginServerFromImage takes image as parameter and returns login server of it.
// If the provided image is not an acr image, this function will return empty string
func parseACRLoginServerFromImage(image string) string {
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

remove this comment since it's already exists in acrRE

@norshtein norshtein force-pushed the norshtein:acr-msi-fix branch from 376eb0a to 26770f2 Apr 30, 2019

@norshtein norshtein changed the title fix: fix ACR MSI cross-subscription authentication error Fix ACR MSI cross-subscription authentication error Apr 30, 2019

@norshtein norshtein force-pushed the norshtein:acr-msi-fix branch from 26770f2 to b5cdb78 Apr 30, 2019

@andyzhangx
Copy link
Member

left a comment

looks much better now, one small comment about test coverage.

expected: "",
},
{
image: "foo.azurecr.io/bar/image:version",

This comment has been minimized.

Copy link
@andyzhangx

andyzhangx Apr 30, 2019

Member

add test for azurecr.cn

This comment has been minimized.

Copy link
@norshtein

norshtein Apr 30, 2019

Author Contributor

Added.

@norshtein norshtein force-pushed the norshtein:acr-msi-fix branch from e132cfc to a2f4f51 Apr 30, 2019

@andyzhangx
Copy link
Member

left a comment

/lgtm
@feiskyer do you have any comments?

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 30, 2019

@feiskyer
Copy link
Member

left a comment

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feiskyer, norshtein

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@norshtein

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2019

Many thanks for reviewing! @andyzhangx @feiskyer 😄

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Apr 30, 2019

/test pull-kubernetes-kubemark-e2e-gce-big 

@fejta-bot

This comment has been minimized.

Copy link

commented Apr 30, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit c419640 into kubernetes:master Apr 30, 2019

20 checks passed

cla/linuxfoundation norshtein authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

Provider Azure automation moved this from Needs Review to Done Apr 30, 2019

if len(match) == 1 {
return match[0]
}
return ""

This comment has been minimized.

Copy link
@tedyu

tedyu Apr 30, 2019

Contributor

Should there be a log for len(match) > 1 ?

This comment has been minimized.

Copy link
@norshtein

norshtein May 1, 2019

Author Contributor

The image name is already logged here. len(match) > 1 indicates a wrong image is provided, credential provider can't handle this, in later code flow, no credential will be provided and this error will be caught outside the credential provider when trying to pull the image. Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.