
Add a patch to make ECR credentials work #4084

Merged
merged 5 commits into from Jun 28, 2019

Conversation

jonjohnsonjr
Contributor

@jonjohnsonjr jonjohnsonjr commented May 13, 2019

Fixes #1996

This patches in kubernetes/kubernetes#75585 and kubernetes/kubernetes#75587 in order to use the updated ECR credential provider. We'll be able to drop these patches once we've bumped our k8s dependencies to 1.15, but that's pretty far off.

Proposed Changes

  • Add a patch to enable fetching credentials for ECR.

Release Note


@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2019
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label May 13, 2019
@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/test-and-release It flags unit/e2e/conformance/perf test issues for product features labels May 13, 2019
@knative-prow-robot knative-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 13, 2019
@jonjohnsonjr
Contributor Author

/retest

@jonjohnsonjr
Contributor Author

cc @symfrog @Justin2997 either of you interested in testing this out?

@Justin2997

@jonjohnsonjr That seems good to me. I no longer have the infra to test it, but maybe @lavoiedn can.

@lavoiedn

Funny story, that... our cluster got destroyed and we're still bringing it back 😂

I'm not able to test this at the moment; I'll update here if I get a chance to fix our setup!

@sebgoa
Contributor

sebgoa commented Jun 18, 2019

@tzununbekov can you try to test this?

@tzununbekov
Member

I've installed the jonjohnsonjr:k8schain-patch branch on an EKS cluster and tried to create a simple Knative Service with an image from a private ECR repository. The revision failed to fetch the image with the following message:

Unable to fetch image "730809894724.dkr.ecr.us-east-2.amazonaws.com/knative-test:latest": unsupported status code 401; body: Not Authorized

while a k8s Job on the same cluster, using the same image, succeeded:

  ...
  Normal  Pulling    27m   kubelet, ip-192-168-6-223.us-east-2.compute.internal  pulling image "730809894724.dkr.ecr.us-east-2.amazonaws.com/knative-test:latest"
  Normal  Pulled     27m   kubelet, ip-192-168-6-223.us-east-2.compute.internal  Successfully pulled image "730809894724.dkr.ecr.us-east-2.amazonaws.com/knative-test:latest"
  Normal  Created    27m   kubelet, ip-192-168-6-223.us-east-2.compute.internal  Created container

Is that a known case, or am I missing something?
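For context, a Service manifest reproducing this kind of test might look roughly like the following (a sketch, assuming the v1alpha1 runLatest schema in use around Knative v0.6; the image URL is the one from the error above):

```yaml
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: knative-test
  namespace: default
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: 730809894724.dkr.ecr.us-east-2.amazonaws.com/knative-test:latest
```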

@tzununbekov
Member

UPD: it looks like I need to run the update-deps.sh script to apply the "ecr-provider-hack" patch before testing, but it complains that github.com/knative/test-infra doesn't meet constraints
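The patch-apply step itself is straightforward: after the dependency tool populates vendor/, the script re-applies the local .patch files on top of it. A self-contained sketch of that mechanism (using a throwaway file and patch(1) rather than the real vendor tree; file and patch names here are illustrative):

```shell
#!/bin/sh
# Sketch: create a file, a unified diff against it, and apply the diff
# with patch(1) -- the same mechanism update-deps.sh uses for vendor/.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'old provider\n' > aws_credentials.go
cat > ecr-provider-hack.patch <<'EOF'
--- aws_credentials.go
+++ aws_credentials.go
@@ -1 +1 @@
-old provider
+patched provider
EOF
patch -p0 < ecr-provider-hack.patch
cat aws_credentials.go   # now reads "patched provider"
```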

@lavoiedn

An update from us: we got our cluster back and we're just about to need to test this.

@januaryh you'll need this fix when you try to fetch the image for the rabbitmq adapter/controller.

CC: @alexgervais

@jonjohnsonjr
Contributor Author

@tzununbekov is it a fortio issue? knative/test-infra#968

@jonjohnsonjr
Contributor Author

This seems to be working again.

@tzununbekov
Member

I'm still getting the Unable to fetch image: unsupported status code 401; body: Not Authorized error in Knative Service revisions that use an ECR registry. Can someone else confirm?

@ryanbrainard

I just tried this as well and am still getting Unable to fetch image <image id>: unsupported status code 401; body: Not Authorized. I'm fairly sure I applied the patch correctly, but here's what I did:

  • Baseline: existing EKS cluster already running Knative v0.6.0
  • Checked out this PR
  • Set up cluster admin (not sure if needed)
  • Deployed cert-manager (not sure if needed, but I did see errors about a missing cert-manager before I did this)
  • Deployed Knative Serving
  • Checked my existing service using a private registry (still broken)
  • Created a new service using a private registry (still broken)

Did I miss any steps?

Here is one of the log lines from the controller after I deployed:

{"level":"error","ts":"2019-06-21T07:49:10.808Z","logger":"controller.revision-controller","caller":"revision/revision.go:197","msg":"Failed to reconcile","commit":"6a2d54f","knative.dev/controller":"revision-controller","knative.dev/traceid":"b8b19590-58f3-4017-8e9f-909dd7a85777","knative.dev/key":"default/shaas1-qkqm7","phase":"image digest","error":"unsupported status code 401; body: Not Authorized\n","stacktrace":"github.com/knative/serving/pkg/reconciler/revision.(*Reconciler).reconcile\n\t/Users/rbrainard/Development/go/src/github.com/knative/serving/pkg/reconciler/revision/revision.go:197\ngithub.com/knative/serving/pkg/reconciler/revision.(*Reconciler).Reconcile\n\t/Users/rbrainard/Development/go/src/github.com/knative/serving/pkg/reconciler/revision/revision.go:98\ngithub.com/knative/serving/vendor/github.com/knative/pkg/controller.(*Impl).processNextWorkItem\n\t/Users/rbrainard/Development/go/src/github.com/knative/serving/vendor/github.com/knative/pkg/controller/controller.go:330\ngithub.com/knative/serving/vendor/github.com/knative/pkg/controller.(*Impl).Run.func1\n\t/Users/rbrainard/Development/go/src/github.com/knative/serving/vendor/github.com/knative/pkg/controller/controller.go:282"}

cc: @mattmoor

@jonjohnsonjr
Contributor Author

@tzununbekov @ryanbrainard thanks to both of you for trying this. Can you see if the following brings up anything relevant?

$ kubectl logs -n knative-serving $(kubectl get pod -n knative-serving -l app=controller -oname) | grep AWS

E.g. I hit this but that's expected because I'm running on GKE:

aws_credentials.go:76] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.

@ryanbrainard

@jonjohnsonjr Seeing this:

$ kubectl logs -n knative-serving $(kubectl get pod -n knative-serving -l app=controller -oname) | grep AWS

ERROR: logging before flag.Parse: E0621 07:43:40.372529       1 aws_credentials.go:76] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.

@ryanbrainard

ryanbrainard commented Jun 25, 2019

@jonjohnsonjr Good news, I was able to get this to work! It turned out to be a kiam issue that was blocking this from working last time I tried it. I had to give ECR access via IAM to the knative-serving/controller pod and then it all worked.

@tzununbekov perhaps you have a similar issue? I was able to debug this by looking at the controller's events (kubectl describe pods -n knative-serving -l app=controller) and then tracing those back to the AWS request IDs in CloudTrail (or just looking for recent AssumeRole events with failures). It all made sense once I found the problem, but the indirection that kiam adds tripped me up a bit, so the CloudTrail logs were helpful there.
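For anyone hitting the same wall: the IAM role the controller pod assumes (via kiam's iam.amazonaws.com/role pod annotation, or an instance profile) needs the standard ECR pull permissions. A minimal policy sketch (these are the usual ECR read actions; note that ecr:GetAuthorizationToken does not support resource-level scoping, and you may want to scope the other actions down to specific repository ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}
```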

@jonjohnsonjr jonjohnsonjr changed the title [WIP] Add a patch to make ECR credentials work Add a patch to make ECR credentials work Jun 25, 2019
@jonjohnsonjr jonjohnsonjr marked this pull request as ready for review June 25, 2019 05:41
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2019
@ryanbrainard

@jonjohnsonjr I just re-tested this after your rebase, on a brand-new cluster (I was mucking around a lot on the other one, so I wanted to be 100% sure). Still works 👍

@jonjohnsonjr
Contributor Author

/assign @mattmoor

@jonjohnsonjr
Contributor Author

/retest

Member

@mattmoor mattmoor left a comment


/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonjohnsonjr, mattmoor


@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2019
@knative-prow-robot knative-prow-robot merged commit b0b890a into knative:master Jun 28, 2019
k4leung4 added a commit to k4leung4/serving that referenced this pull request Mar 20, 2020
The patch from knative#4084 has been addressed.
Addresses knative#4549
knative-prow-robot pushed a commit that referenced this pull request Mar 20, 2020
The patch from #4084 has been addressed.
Addresses #4549
Successfully merging this pull request may close these issues.

The Knative Serving controller can not access the same set of private registries as the K8s cluster
9 participants