Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix racy lazyEcrProvider updates #82550

Merged
merged 1 commit into from Sep 13, 2019

Conversation

@wongma7
Copy link
Contributor

commented Sep 10, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it: There is a race in the ECR credentials provider kubernetes 1.11-1.13 and probably earlier that causes kubelet to immediately panic. It's not in 1.14.?+ because the ECR credentials provider was refactored and the patch for that was backported to 1.14 (#78164).

The problem is that LazyProvide is not thread-safe. The variable ecrProvider.getter is written to in 2 places & read from 1:

Write 1: https://github.com/kubernetes/kubernetes/blob/release-1.13/pkg/credentialprovider/aws/aws_credentials.go#L127 (write getter to nil)
Write 2: https://github.com/kubernetes/kubernetes/blob/release-1.13/pkg/credentialprovider/aws/aws_credentials.go#L176 (write getter to value)
Read: https://github.com/kubernetes/kubernetes/blob/release-1.13/pkg/credentialprovider/aws/aws_credentials.go#L192 (read getter)

If between Write 2 and Read in Thread 1, Thread 2 executes Write 1, then Thread 1 will get a nil panic when it does the Read.

Which issue(s) this PR fixes:

Fixes #78164

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2019

@wongma7: This PR is not for the master branch but does not have the cherry-pick-approved label. Adding the do-not-merge/cherry-pick-not-approved label.

To approve the cherry-pick, please ping the kubernetes/patch-release-team in a comment when ready.

See also Kubernetes Patch Releases

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wongma7

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2019

/cc @gyuho
/sig aws

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2019

@wongma7: The label(s) sig/aws cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/cc @gyuho
/sig aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from gyuho Sep 10, 2019

@gyuho
gyuho approved these changes Sep 10, 2019
@gyuho

This comment has been minimized.

Copy link
Member

commented Sep 10, 2019

/lgtm

@tpepper

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

/sig cloud-provider

@nckturner

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

/lgtm

@wongma7

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

/assign @liggitt
A 1.13 bugfix for you to approve.
Can wait until after 1.16 release busywork :)

@liggitt

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

This needs a test demonstrating the race (or races) we are vulnerable to, and demonstrating this fixes them. It's also unclear that this is actually protecting the getter field referenced

@wongma7 wongma7 force-pushed the wongma7:credentials-race branch from 28b7fdd to 7c5a2ba Sep 12, 2019

@k8s-ci-robot k8s-ci-robot added the size/S label Sep 12, 2019

@wongma7

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2019

I've added a test.

Without the lock in LazyProvide:

$ go test ./pkg/credentialprovider/aws/...
PASS
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1362700]

goroutine 44 [running]:
k8s.io/kubernetes/pkg/credentialprovider/aws.(*ecrProvider).Provide(0xc000198060, 0x18)
	/Users/mattwon/go/src/k8s.io/kubernetes/pkg/credentialprovider/aws/aws_credentials.go:195 +0x60
k8s.io/kubernetes/pkg/credentialprovider.(*CachingDockerConfigProvider).Provide(0xc00018c100, 0x0)
	/Users/mattwon/go/src/k8s.io/kubernetes/pkg/credentialprovider/provider.go:117 +0x15a
k8s.io/kubernetes/pkg/credentialprovider/aws.(*lazyEcrProvider).LazyProvide(0xc00008c140, 0x21)
	/Users/mattwon/go/src/k8s.io/kubernetes/pkg/credentialprovider/aws/aws_credentials.go:138 +0x71
created by k8s.io/kubernetes/pkg/credentialprovider/aws.TestConcurrentEcrLazyProvide
	/Users/mattwon/go/src/k8s.io/kubernetes/pkg/credentialprovider/aws/aws_credentials_test.go:173 +0xc9
FAIL	k8s.io/kubernetes/pkg/credentialprovider/aws	0.056s

With the lock in LazyProvide:

$ go test ./pkg/credentialprovider/aws/...
ok  	k8s.io/kubernetes/pkg/credentialprovider/aws	0.124s
Fix racy lazyEcrProvider updates
ref. #78164

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>

@wongma7 wongma7 force-pushed the wongma7:credentials-race branch from 7c5a2ba to 3fd2e2d Sep 12, 2019

@gyuho

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

/lgtm

thanks @wongma7 @liggitt

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 12, 2019

@wongma7

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2019

@liggitt PTAL, test added and extra lock removed. Was informed patch deadline for the final 1.13 release is tomorrow :(

@liggitt

This comment has been minimized.

Copy link
Member

commented Sep 13, 2019

/lgtm
/approve
/priority important-soon

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 25074a1 into kubernetes:release-1.13 Sep 13, 2019

25 checks passed

cla/linuxfoundation gyuho authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-conformance-kind-ipv6 Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Skipped.
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Skipped.
pull-kubernetes-e2e-gce-alpha-features Skipped.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Skipped.
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.