helm pull/push randomly hangs with GCP Artifact Registry #11415

Closed
dzmoore opened this issue Oct 5, 2022 · 17 comments
Labels: bug, oci (Related to Helm OCI feature), Stale

Comments


dzmoore commented Oct 5, 2022

Hello, I'm seeing an issue on version 3.10.0 of helm where push/pull to/from GCP Artifact Registry seems to randomly hang. For example, a pull will work maybe 2 out of 3 times, and then the next attempt hangs.

When things work, it looks like this:

$ helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE --debug
DEBU[0000] resolving                                     host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=102 response.header.content-type=application/json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="401 Unauthorized" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=354 response.header.content-type=application/vnd.oci.image.manifest.v1+json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-content-digest="sha256:DIGEST_HERE" response.header.docker-distribution-api-version=registry/2.0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] resolved                                      desc.digest="sha256:DIGEST_HERE" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.oci.image.manifest.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=354 response.header.content-type=application/vnd.oci.image.manifest.v1+json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-content-digest="sha256:DIGEST_HERE" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/sha256:DIGEST_HERE"
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.cncf.helm.chart.content.v1.tar+gzip, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.cncf.helm.config.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.accept-ranges=bytes response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.cache-control="private, max-age=0" response.header.content-length=24086 response.header.content-type=application/octet-stream response.header.date="Wed, 05 Oct 2022 23:01:06 GMT" response.header.expires="Wed, 05 Oct 2022 23:01:06 GMT" response.header.server=UploadServer response.header.x-goog-hash="HASH_HERE" response.header.x-guploader-uploadid=ID_HERE response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.accept-ranges=bytes response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=304 response.header.content-type=application/octet-stream response.header.date="Wed, 05 Oct 2022 23:01:06 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] encountered unknown type application/vnd.cncf.helm.config.v1+json; children may not be fetched 
DEBU[0000] encountered unknown type application/vnd.cncf.helm.chart.content.v1.tar+gzip; children may not be fetched 
Pulled: us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:CHART_VERSION_HERE
Digest: sha256:DIGEST_HERE

But then when it hangs, it looks like this:

$ helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE --debug
DEBU[0000] resolving                                     host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=102 response.header.content-type=application/json response.header.date="Wed, 05 Oct 2022 23:01:07 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="401 Unauthorized" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"

Unlike @Unknow0's issue in #10396, I don't think this has to do with multiple keys, since I only have a single entry in my ~/.config/helm/registry/config.json file and I do not have a ~/.docker/config.json file.

Any ideas about what might be going on would be much appreciated!

Output of helm version:

$ helm version
version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.18.6"}

Output of kubectl version:

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Error from server (NotFound): the server could not find the requested resource

Cloud Provider/Platform (AKS, GKE, Minikube etc.): This issue involves GCP Artifact Registry (I am not interacting with a k8s cluster for this issue).

@joejulian joejulian added question/support oci Related to Helm OCI feature labels Oct 5, 2022

wibobm commented Oct 21, 2022

Seeing the same here.

@joejulian
Contributor

It looks to me like helm is never receiving a response.

@dmvariphy

Issue created with Google as well: https://issuetracker.google.com/issues/255248281

As a workaround, you can manually time out the command and retry it, e.g.

/bin/bash -c \
  "until 
  timeout 5 \
    helm pull oci://us-west1-docker.pkg.dev/$PROJECT_NAME/$REPO_NAME/$CHART_NAME --version $CHART_VERSION; 
  do 
    echo 'helm hung; retrying...'; 
  done"

@jonjohnsonjr

How long does it hang? Indefinitely? 60s?

@dmvariphy

@jonjohnsonjr I think it hangs indefinitely.

@jonjohnsonjr

I am not able to reproduce this. I've been running helm 3.10.0 in a loop pulling from us-west1-docker.pkg.dev for ~an hour without any hanging. Where are you running this from? Are you behind any kind of proxy?

@dmvariphy

Thanks for taking a look @jonjohnsonjr. It's interesting you aren't seeing the issue; that's another data point in figuring out what's going on here.

I can see the issue both when running helm locally on my workstation and as part of a build pipeline running on a (non-local) VM. The requests would come out of separate networks in those two scenarios.

As far as I know, I'm not behind a proxy or anything. I will try from another network and see if I can fail to reproduce the issue - so far, I can always reproduce it within 30 seconds or so of trying.

@joejulian
Contributor

Time to break out wireshark. 😁

@jonjohnsonjr

Yeah if it's possible to reproduce this with curl or get a packet capture, I could look into this further.
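For example, a capture scoped to the registry host while reproducing the hang would be a starting point (a sketch; the interface and output filename are placeholders, and since the traffic is TLS only connection-level behaviour will be visible):

# Capture traffic to/from the registry host while reproducing the hang; stop with Ctrl-C.
sudo tcpdump -i any -w helm-hang.pcap host us-west1-docker.pkg.dev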


cmaahs commented Nov 2, 2022

For me it hangs for 300 seconds, and the problem appears to be in the package that helm uses to interface with OCI registries.

https://github.com/oras-project/oras-go


cmaahs commented Nov 4, 2022

Some additional troubleshooting info. I replaced Helm's pull functionality with a couple of simple GAR / Docker V2 API calls:

GET https://us-docker.pkg.dev/v2/<project>/<repository>/manifests/<tag>

Retrieve the digest for media type: application/vnd.cncf.helm.chart.content.v1.tar+gzip

Then use that digest to pull the TGZ

GET https://us-docker.pkg.dev/v2/<project>/<repository>/blobs/<digest>

Switching to this process, I've not had any 300s delays in retrieving the helm TGZ file.

It is unclear where in the ORAS (https://github.com/oras-project/oras-go) code the delay is being introduced.

Running the script below, I have about 150 helm images, and I would hit the 300s delay every 4-10 image pulls. However, this is VERY random, and even more so when run by my co-workers; some experience almost NO delays, perhaps a single 300s delay over the entire run.

#!/usr/bin/env zsh

gcloud auth print-access-token | helm registry login -u oauth2accesstoken --password-stdin https://us-docker.pkg.dev
PROJECT='<gcp project id>'
REPOSITORY="projects/${PROJECT}/locations/us/repositories/helm"   # helm is a specific registry folder containing only Helm OCI images
for i in $(gcloud artifacts packages list --repository ${REPOSITORY} --location all --format json  2>/dev/null | jq -r '.[].name'); do
    for t in $(gcloud artifacts tags list --repository ${REPOSITORY} --package ${REPOSITORY}/packages/${i} --location us 2>/dev/null | awk 'NR>1 {print $1}' | sort -r --version-sort | head -n 1); do
        if [[ -n ${t} ]]; then
            echo "start: $(date)         ${i}:${t}"
            helm pull oci://us-docker.pkg.dev/${PROJECT}/helm/${i} --version ${t} # --debug
            echo "  end: $(date)"
        fi
    done
done

I wrote some simple Go code, using the two API calls described above, to replace the helm pull in the above script, and I have been running it for the past several days with zero 300s delays.
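For reference, a minimal shell sketch of that two-call flow (this is not the Go code mentioned above; the placeholder names are hypothetical, and it assumes Artifact Registry accepts a gcloud access token as a Bearer token, the same credential used for helm registry login above):

#!/usr/bin/env bash
set -euo pipefail

REGISTRY="us-docker.pkg.dev"
PROJECT="my-gcp-project"   # hypothetical GCP project id
REPO="helm"                # hypothetical Artifact Registry repository
CHART="my-chart"           # hypothetical chart (package) name
TAG="1.2.3"                # hypothetical chart version / tag

# Assumption: Artifact Registry accepts a Google OAuth2 access token as a Bearer token.
TOKEN="$(gcloud auth print-access-token)"

# Call 1: fetch the OCI manifest for the tag and extract the digest of the layer
# whose media type is the Helm chart tarball.
DIGEST="$(curl -fsS \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://${REGISTRY}/v2/${PROJECT}/${REPO}/${CHART}/manifests/${TAG}" \
  | jq -r '.layers[] | select(.mediaType == "application/vnd.cncf.helm.chart.content.v1.tar+gzip") | .digest')"

# Call 2: download the blob (the chart .tgz); -L follows the redirect to storage.
curl -fsSL \
  -H "Authorization: Bearer ${TOKEN}" \
  -o "${CHART}-${TAG}.tgz" \
  "https://${REGISTRY}/v2/${PROJECT}/${REPO}/${CHART}/blobs/${DIGEST}"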

@jonjohnsonjr

$ helm version
version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.18.6"}

I did some digging on this... I'm more confused than ever, but what I've discovered:

  1. I can trigger this by running mitmdump and trying to pull with helm. I cannot trigger it any other way.
  2. helm gets stuck here when it attempts to POST /v2/token. AR will return a 405, but we never see this request. I also don't see the POST request in mitmdump, so I'm not sure why it's stuck here.
  3. If I disable http2 via: GODEBUG=http2client=0, everything works.
  4. I can reproduce the same issue with oras v0.12.0 (it gets stuck in the same place), but a recent version of oras (which doesn't hit this path in containerd) doesn't have any issues.

I am convinced there is either a bug in containerd or a bug in how oras was using containerd, but it's probably not worth trying to find the bug given that helm could just upgrade to oras-go v2 instead.
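For reference, the workaround from item 3 can be applied per invocation, e.g. (a sketch using the placeholder chart reference from the report above):

# Disable the Go HTTP/2 client for a single helm invocation.
GODEBUG=http2client=0 helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE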

@joejulian joejulian added bug Categorizes issue or PR as related to a bug. and removed question/support labels Dec 24, 2022
@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Mar 25, 2023
@dmvariphy

We're still seeing the issue, although we are working around it with something like what I mentioned above: #11415 (comment)

@github-actions github-actions bot removed the Stale label Mar 26, 2023
@s-todisco

I have the same issue with GCP Artifact Registry, in particular with "helmfile", but the problem is the same with "helm login". In my case it doesn't seem to hang; it randomly responds "401 - unauthorized". Sometimes it takes 10 attempts to get one successful answer and sometimes only two or three attempts; it's very random.

@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@cjim8889

(quoting @jonjohnsonjr's analysis above)

We believe either GCP Cloud Storage is using a non-standard HTTP/2 implementation or there's a bug in the Go standard library. We are seeing this exact issue with Cloud Storage buckets in multiple different use cases.
