-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ACR MSI cross-subscription authentication error #77245
Conversation
Hi @norshtein. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also check what if ACR is a public repo, does this PR also work? And what if that image is a docker hub image ...
klog.V(4).Infof("listing registries") | ||
result, err := a.registryClient.List(ctx) | ||
// Read login server from image | ||
parts := strings.Split(image, "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you use regexp.MustCompile
to parse login server, here is an example:
matches := lunPathRE.FindStringSubmatch(deviceInfo) |
And also there should be error handling in case image is an invalid image url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have switched to use regular expression. For a public ACR, below code will handle for it
https://github.com/kubernetes/kubernetes/pull/77245/files#diff-65860720abfa0057bc9fe725fde21bedR220 . For other image registries, they will be filtered out.
|
||
if a.config.UseManagedIdentityExtension { | ||
klog.V(4).Infof("listing registries") | ||
result, err := a.registryClient.List(ctx) | ||
// Read login server from image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provide an example format for image address and also print out the original image url in the log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example in comment
|
||
if a.config.UseManagedIdentityExtension { | ||
klog.V(4).Infof("listing registries") | ||
result, err := a.registryClient.List(ctx) | ||
// Read login server from image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should filter out non-ACR images here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code updated.
@norshtein Could you also add a unit test for this change? /priority important-soon |
If the credential type is MSI, it will interact with MSI endpoint to get a token and write the token to |
for the unit test, I think you could move following code into a seperate function, e.g.
Then you could add all kinds of unit tests, that would be testable |
Cool. Thanks for the reminding, will do. |
if err != nil { | ||
klog.Errorf("Failed to list registries: %v", err) | ||
return cfg | ||
klog.Error("Unexpected error: failed to construct regular expression: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only log error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refer to latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be OK to only log error, the error is not fatal and we can't do anything about it. For the worst case, cfg
is not correctly filled and pulling image will fail, which is acceptable.
klog.V(2).Infof("loginServer: %s", loginServer) | ||
match := regEx.FindAllString(image, -1) | ||
if len(match) != 1 { | ||
klog.V(4).Infof("Found the image is not in acr, will not generate the token using MSI") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.V(4).Infof("image(%s) is not from ACR, skip MSI authentication", image)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
cred, err := getACRDockerEntryFromARMToken(a, loginServer) | ||
if err != nil { | ||
continue | ||
klog.Errorf("Failed to get docker entry: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.Errorf("getACRDockerEntryFromARMToken(%s) failed with %v", loginServer, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -93,4 +93,17 @@ func Test(t *testing.T) { | |||
t.Errorf("Missing expected registry: %s", registryName) | |||
} | |||
} | |||
|
|||
invalidImage := "invalidImage" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a separate unit test, you could refer to follow unit test as example:
kubernetes/staging/src/k8s.io/legacy-cloud-providers/azure/azure_standard_test.go
Line 52 in cf22c56
func TestGetLastSegment(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in a new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@norshtein I mean also copy the test code style like following, it's more maintainable:
tests := []struct {
ID string
expected string
expectErr bool
}{
{
ID: "",
expected: "",
expectErr: true,
},
{
ID: "foo/",
expected: "",
expectErr: true,
},
{
ID: "foo/bar",
expected: "bar",
expectErr: false,
},
{
ID: "foo/bar/baz",
expected: "baz",
expectErr: false,
},
}
for _, test := range tests {
s, e := getLastSegment(test.ID)
if test.expectErr && e == nil {
t.Errorf("Expected err, but it was nil")
continue
}
if !test.expectErr && e != nil {
t.Errorf("Unexpected error: %v", e)
continue
}
if s != test.expected {
t.Errorf("expected: %s, got %s", test.expected, s)
}
}
}
// If the provided image is not an acr image, this function will return empty string | ||
func parseACRLoginServerFromImage(image string) string { | ||
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version | ||
regEx, err := regexp.Compile(`.*\.azurecr\.io|.*\.azurecr\.cn|.*\.azurecr\.de|.*\.azurecr\.us`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think use regexp.MustCompile
is better, define that var as global, if there is error, it won't pass on e2e test, no need to check error every time:
lunPathRE = regexp.MustCompile(`/dev/disk/azure/scsi(?:.*)/lun(.+)`) |
@@ -229,6 +223,10 @@ func getLoginServer(registry containerregistry.Registry) string { | |||
} | |||
|
|||
func getACRDockerEntryFromARMToken(a *acrProvider, loginServer string) (*credentialprovider.DockerConfigEntry, error) { | |||
// Run EnsureFresh to make sure the token is valid and does not expire | |||
if err := a.servicePrincipalToken.EnsureFresh(); err != nil { | |||
klog.Errorf("Failed to ensure fresh service principal token: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we return error here? If return error, no need to log every error
@@ -182,26 +183,19 @@ func (a *acrProvider) Enabled() bool { | |||
} | |||
|
|||
func (a *acrProvider) Provide(image string) credentialprovider.DockerConfig { | |||
klog.V(2).Infof("try to provide secret for image %s", image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.V(4).Infof
is better, V(2) is the dispalying log level by default, it would flush the log file
BTW, pls sqush all commits into one @norshtein |
Updated and rebased. PTAL @andyzhangx |
// parseACRLoginServerFromImage takes image as parameter and returns login server of it. | ||
// If the provided image is not an acr image, this function will return empty string | ||
func parseACRLoginServerFromImage(image string) string { | ||
// Parse login server from parameter `image`. An example of parameter `image`: foo.azurecr.io/bar/imageName:version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment since it's already exists in acrRE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks much better now, one small comment about test coverage.
expected: "", | ||
}, | ||
{ | ||
image: "foo.azurecr.io/bar/image:version", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test for azurecr.cn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@feiskyer do you have any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: feiskyer, norshtein The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Many thanks for reviewing! @andyzhangx @feiskyer 😄 |
/test pull-kubernetes-kubemark-e2e-gce-big |
/retest Review the full test history for this PR. Silence the bot with an |
if len(match) == 1 { | ||
return match[0] | ||
} | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a log for len(match) > 1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image name is already logged here. len(match) > 1
indicates a wrong image is provided, credential provider can't handle this, in later code flow, no credential will be provided and this error will be caught outside the credential provider when trying to pull the image. Does this make sense?
What type of PR is this?
/kind bug
What this PR does / why we need it:
Method
Provide()
in interfaceDockerConfigProvider
now takes image name as its parameter(added in a recent PR #75587). So we can parse the login server from the image name, and then interact with the login server to get docker login credential.Which issue(s) this PR fixes:
Fixes #67892
Special notes for your reviewer:
Does this PR introduce a user-facing change?: no user-facing change
release note:
/sig azure
/assign @andyzhangx @feiskyer