helm pull/push randomly hangs with GCP Artifact Registry #11415

Closed
dzmoore opened this issue Oct 5, 2022 · 17 comments
Labels: bug, oci (Related to Helm OCI feature), Stale

Comments


dzmoore commented Oct 5, 2022

Hello, I'm seeing an issue on version 3.10.0 of helm where push/pull to/from GCP Artifact Registry seems to randomly hang. For example, a pull will work maybe 2 out of 3 times, and then the next attempt hangs.

When things work, it looks like this:

$ helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE --debug
DEBU[0000] resolving                                     host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=102 response.header.content-type=application/json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="401 Unauthorized" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=354 response.header.content-type=application/vnd.oci.image.manifest.v1+json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-content-digest="sha256:DIGEST_HERE" response.header.docker-distribution-api-version=registry/2.0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] resolved                                      desc.digest="sha256:DIGEST_HERE" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.oci.image.manifest.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=354 response.header.content-type=application/vnd.oci.image.manifest.v1+json response.header.date="Wed, 05 Oct 2022 23:01:05 GMT" response.header.docker-content-digest="sha256:DIGEST_HERE" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/sha256:DIGEST_HERE"
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.cncf.helm.chart.content.v1.tar+gzip, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] do request                                    digest="sha256:DIGEST_HERE" request.header.accept="application/vnd.cncf.helm.config.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=GET url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.accept-ranges=bytes response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.cache-control="private, max-age=0" response.header.content-length=24086 response.header.content-type=application/octet-stream response.header.date="Wed, 05 Oct 2022 23:01:06 GMT" response.header.expires="Wed, 05 Oct 2022 23:01:06 GMT" response.header.server=UploadServer response.header.x-goog-hash="HASH_HERE" response.header.x-guploader-uploadid=ID_HERE response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] fetch response received                       digest="sha256:DIGEST_HERE" response.header.accept-ranges=bytes response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=304 response.header.content-type=application/octet-stream response.header.date="Wed, 05 Oct 2022 23:01:06 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="200 OK" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/blobs/sha256:DIGEST_HERE"
DEBU[0000] encountered unknown type application/vnd.cncf.helm.config.v1+json; children may not be fetched 
DEBU[0000] encountered unknown type application/vnd.cncf.helm.chart.content.v1.tar+gzip; children may not be fetched 
Pulled: us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:CHART_VERSION_HERE
Digest: sha256:DIGEST_HERE

But then when it hangs, it looks like this:

$ helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE --debug
DEBU[0000] resolving                                     host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] fetch response received                       host=us-west1-docker.pkg.dev response.header.alt-svc="h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" response.header.content-length=102 response.header.content-type=application/json response.header.date="Wed, 05 Oct 2022 23:01:07 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" response.header.x-content-type-options=nosniff response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="401 Unauthorized" url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://us-west1-docker.pkg.dev/v2/token\",service=\"us-west1-docker.pkg.dev\",scope=\"repository:PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE:pull\"" host=us-west1-docker.pkg.dev
DEBU[0000] do request                                    host=us-west1-docker.pkg.dev request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=Helm/3.10.0 request.method=HEAD url="https://us-west1-docker.pkg.dev/v2/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE/manifests/CHART_VERSION_HERE"

Unlike @Unknow0's issue in #10396, I don't think this has to do with multiple keys, since I only have a single entry in my ~/.config/helm/registry/config.json file and I do not have a ~/.docker/config.json file.

Any ideas about what might be going on would be much appreciated!

Output of helm version:

$ helm version
version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.18.6"}

Output of kubectl version:

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Error from server (NotFound): the server could not find the requested resource

Cloud Provider/Platform (AKS, GKE, Minikube etc.): This issue involves GCP Artifact Registry (I am not interacting with a k8s cluster for this issue).

@joejulian joejulian added question/support oci Related to Helm OCI feature labels Oct 5, 2022

wibobm commented Oct 21, 2022

Seeing the same here.

@joejulian
Contributor

It looks to me like helm is never receiving a response.

@dmvariphy

Issue created with Google as well: https://issuetracker.google.com/issues/255248281

As a workaround, you can manually time out the command and retry it, e.g.

/bin/bash -c \
  "until 
  timeout 5 \
    helm pull oci://us-west1-docker.pkg.dev/$PROJECT_NAME/$REPO_NAME/$CHART_NAME --version $CHART_VERSION; 
  do 
    echo 'helm hung; retrying...'; 
  done"

@jonjohnsonjr

How long does it hang? Indefinitely? 60s?

@dmvariphy

@jonjohnsonjr I think it hangs indefinitely.

@jonjohnsonjr

I am not able to reproduce this. I've been running helm 3.10.0 in a loop pulling from us-west1-docker.pkg.dev for ~an hour without any hanging. Where are you running this from? Are you behind any kind of proxy?

@dmvariphy

Thanks for taking a look @jonjohnsonjr. It's interesting you aren't seeing the issue; that's another data point in figuring out what's going on here.

I can see the issue both when running helm locally on my workstation and as part of a build pipeline running on a (non-local) VM. The requests would come out of separate networks in those two scenarios.

As far as I know, I'm not behind a proxy or anything. I will try from another network and see if I can fail to reproduce the issue - so far, I can always reproduce it within 30 seconds or so of trying.

@joejulian
Contributor

Time to break out wireshark. 😁

@jonjohnsonjr

Yeah if it's possible to reproduce this with curl or get a packet capture, I could look into this further.
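For example, a capture scoped to the registry host while reproducing the hang would be a starting point (a sketch; the interface and output filename are placeholders, and since the traffic is TLS only connection-level behaviour will be visible):

# Capture traffic to/from the registry host while reproducing the hang; stop with Ctrl-C.
sudo tcpdump -i any -w helm-hang.pcap host us-west1-docker.pkg.dev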


cmaahs commented Nov 2, 2022

For me it hangs for 300 seconds, and the problem appears to be in the package that helm uses to interface with OCI registries.

https://github.com/oras-project/oras-go


cmaahs commented Nov 4, 2022

Some additional troubleshooting info. I replaced Helm's pull functionality with a couple of simple GAR / Docker V2 API calls:

GET https://us-docker.pkg.dev/v2/<project>/<repository>/manifests/<tag>

Retrieve the digest for media type: application/vnd.cncf.helm.chart.content.v1.tar+gzip

Then use that digest to pull the TGZ

GET https://us-docker.pkg.dev/v2/<project>/<repository>/blobs/<digest>

Switching to this process, I've not had any 300s delays in retrieving the helm TGZ file.

It is unclear where in the ORAS (https://github.com/oras-project/oras-go) code the delay is being introduced.

Running the script below, I have about 150 helm images, and I would hit the 300s delay every 4-10 image pulls. However, this is VERY random, and even more so when run by my co-workers; some experience almost NO delays, perhaps a single 300s delay over the entire run.

#!/usr/bin/env zsh

gcloud auth print-access-token | helm registry login -u oauth2accesstoken --password-stdin https://us-docker.pkg.dev
PROJECT='<gcp project id>'
REPOSITORY="projects/${PROJECT}/locations/us/repositories/helm"   # helm is a specific registry folder containing only Helm OCI images
for i in $(gcloud artifacts packages list --repository ${REPOSITORY} --location all --format json  2>/dev/null | jq -r '.[].name'); do
    for t in $(gcloud artifacts tags list --repository ${REPOSITORY} --package ${REPOSITORY}/packages/${i} --location us 2>/dev/null | awk 'NR>1 {print $1}' | sort -r --version-sort | head -n 1); do
        if [[ -n ${t} ]]; then
            echo "start: $(date)         ${i}:${t}"
            helm pull oci://us-docker.pkg.dev/${PROJECT}/helm/${i} --version ${t} # --debug
            echo "  end: $(date)"
        fi
    done
done

I wrote some simple Go code, using the two API calls described above, to replace the helm pull in the above script, and I have been running it for the past several days with zero 300s delays.
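For reference, a minimal shell sketch of that two-call flow (this is not the Go code mentioned above; the placeholder names are hypothetical, and it assumes Artifact Registry accepts a gcloud access token as a Bearer token, the same credential used for helm registry login above):

#!/usr/bin/env bash
set -euo pipefail

REGISTRY="us-docker.pkg.dev"
PROJECT="my-gcp-project"   # hypothetical GCP project id
REPO="helm"                # hypothetical Artifact Registry repository
CHART="my-chart"           # hypothetical chart (package) name
TAG="1.2.3"                # hypothetical chart version / tag

# Assumption: Artifact Registry accepts a Google OAuth2 access token as a Bearer token.
TOKEN="$(gcloud auth print-access-token)"

# Call 1: fetch the OCI manifest for the tag and extract the digest of the layer
# whose media type is the Helm chart tarball.
DIGEST="$(curl -fsS \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://${REGISTRY}/v2/${PROJECT}/${REPO}/${CHART}/manifests/${TAG}" \
  | jq -r '.layers[] | select(.mediaType == "application/vnd.cncf.helm.chart.content.v1.tar+gzip") | .digest')"

# Call 2: download the blob (the chart .tgz); -L follows the redirect to storage.
curl -fsSL \
  -H "Authorization: Bearer ${TOKEN}" \
  -o "${CHART}-${TAG}.tgz" \
  "https://${REGISTRY}/v2/${PROJECT}/${REPO}/${CHART}/blobs/${DIGEST}"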

@jonjohnsonjr

$ helm version
version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.18.6"}

I did some digging on this... I'm more confused than ever, but what I've discovered:

  1. I can trigger this by running mitmdump and trying to pull with helm. I cannot trigger it any other way.
  2. helm gets stuck here when it attempts to POST /v2/token. AR will return a 405, but we never see this request. I also don't see the POST request in mitmdump, so I'm not sure why it's stuck here.
  3. If I disable http2 via: GODEBUG=http2client=0, everything works.
  4. I can reproduce the same issue with oras v0.12.0 (it gets stuck in the same place), but a recent version of oras (which doesn't hit this path in containerd) doesn't have any issues.

I am convinced there is either a bug in containerd or a bug in how oras was using containerd, but it's probably not worth trying to find the bug given that helm could just upgrade to oras-go v2 instead.
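For reference, the workaround from item 3 can be applied per invocation, e.g. (a sketch using the placeholder chart reference from the report above):

# Disable the Go HTTP/2 client for a single helm invocation.
GODEBUG=http2client=0 helm pull oci://us-west1-docker.pkg.dev/PROJECT_NAME_HERE/REPO_NAME_HERE/CHART_NAME_HERE --version CHART_VERSION_HERE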

@joejulian joejulian added bug Categorizes issue or PR as related to a bug. and removed question/support labels Dec 24, 2022
@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Mar 25, 2023
@dmvariphy

We're still seeing the issue, although we are working around it with something like what I mentioned above: #11415 (comment)

@github-actions github-actions bot removed the Stale label Mar 26, 2023
@s-todisco

I have the same issue with GCP Artifact Registry, in particular with "helmfile", but the problem is the same with "helm login". In my case it doesn't seem to hang; it randomly responds "401 - unauthorized". Sometimes it takes 10 attempts to get one successful answer and sometimes only two or three attempts; it's very random.

@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@cjim8889

(quoting @jonjohnsonjr's analysis above)

We believe either GCP Cloud Storage is using a non-standard HTTP/2 implementation or there's a bug in the Go standard library. We are seeing this exact issue with Cloud Storage buckets in multiple different use cases.
