helm pull/push randomly hangs with GCP Artifact Registry #11415
Comments
Seeing same here.
It looks to me like helm is never receiving a response.
Issue created with Google as well: https://issuetracker.google.com/issues/255248281

As a workaround, you can manually time out the command and retry it, e.g.

```bash
/bin/bash -c \
"until
  timeout 5 \
    helm pull oci://us-west1-docker.pkg.dev/$PROJECT_NAME/$REPO_NAME/$CHART_NAME --version $CHART_VERSION;
do
  echo 'helm hung; retrying...';
done"
```
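(For what it's worth, this works because GNU coreutils `timeout` exits with status 124 when it kills a hung command, so the `until` loop treats the hang as a failure and runs the pull again.)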
How long does it hang? Indefinitely? 60s?
@jonjohnsonjr I think it hangs indefinitely.
I am not able to reproduce this. I've been running helm 3.10.0 in a loop pulling from us-west1-docker.pkg.dev for ~an hour without any hanging. Where are you running this from? Are you behind any kind of proxy?
Thanks for taking a look @jonjohnsonjr. It's interesting you aren't seeing the issue; that's another data point in figuring out what's going on here. I can see the issue both when running helm locally on my workstation and as part of a build pipeline running on a (not local) VM. The requests come out of separate networks in those two scenarios. As far as I know, I'm not behind a proxy or anything. I will try from another network and see if I can fail to reproduce the issue; so far, I can always reproduce it within 30 seconds or so of trying.
Time to break out wireshark. 😁
Yeah if it's possible to reproduce this with …
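Along the lines of the Wireshark suggestion, a minimal capture sketch (the registry hostname and chart variables are placeholders from the earlier workaround; assumes tcpdump is available and that the hostname resolves to the address helm actually connects to; port 443 traffic is TLS-encrypted, but connection timing and resets are still visible):

```bash
# start a capture scoped to the registry endpoint (placeholder hostname)
sudo tcpdump -i any -w helm-hang.pcap host us-west1-docker.pkg.dev and port 443 &

# reproduce the hang (placeholder chart coordinates)
helm pull oci://us-west1-docker.pkg.dev/$PROJECT_NAME/$REPO_NAME/$CHART_NAME --version $CHART_VERSION

# stop the capture, then open helm-hang.pcap in Wireshark
kill %1
```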
For me it hangs for 300 seconds, and the problem appears to be in the package that helm uses to interface with OCI registries. |
Some additional troubleshooting info. I replaced Helm's pull functionality with a couple of simple GAR / Docker V2 API calls (see the sketch after this list):
1. Retrieve the digest for media type `application/vnd.cncf.helm.chart.content.v1.tar+gzip`.
2. Use that digest to pull the TGZ.
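A minimal sketch of those two calls against the OCI distribution API (`PROJECT`, `CHART`, and `TAG` are placeholder variables; assumes `curl`, `jq`, and a valid `gcloud` access token):

```bash
TOKEN=$(gcloud auth print-access-token)

# 1. Fetch the manifest and extract the digest of the chart-content layer
DIGEST=$(curl -s \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://us-docker.pkg.dev/v2/${PROJECT}/helm/${CHART}/manifests/${TAG}" |
  jq -r '.layers[] | select(.mediaType == "application/vnd.cncf.helm.chart.content.v1.tar+gzip") | .digest')

# 2. Pull the TGZ blob by digest (-L follows the redirect to backing storage)
curl -s -L \
  -H "Authorization: Bearer ${TOKEN}" \
  -o "${CHART}-${TAG}.tgz" \
  "https://us-docker.pkg.dev/v2/${PROJECT}/helm/${CHART}/blobs/${DIGEST}"
```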
Switching to this process, I've not had any 300s delays in retrieving the helm TGZ file. It is unclear where in the ORAS (https://github.com/oras-project/oras-go) code the delay is being introduced. Running this, I have about 150 or so helm images, and I would hit the 300s delay every 4-10 image pulls. However, this is VERY random, and even more so when run by my co-workers; some experience almost NO delays, perhaps a single 300s delay over the entire run.

```zsh
#!/usr/bin/env zsh
gcloud auth print-access-token | helm registry login -u oauth2accesstoken --password-stdin https://us-docker.pkg.dev
PROJECT='<gcp project id>'
REPOSITORY="projects/${PROJECT}/locations/us/repositories/helm" # helm is a specific registry folder containing only Helm OCI images
for i in $(gcloud artifacts packages list --repository ${REPOSITORY} --location all --format json 2>/dev/null | jq -r '.[].name'); do
for t in $(gcloud artifacts tags list --repository ${REPOSITORY} --package ${REPOSITORY}/packages/${i} --location us 2>/dev/null | awk 'NR>1 {print $1}' | sort -r --version-sort | head -n 1); do
if [[ -n ${t} ]]; then
echo "start: $(date) ${i}:${t}"
helm pull oci://us-docker.pkg.dev/${PROJECT}/helm/${i} --version ${t} # --debug
echo " end: $(date)"
fi
done
done
```

I wrote some simple go code to replace the …
I did some digging on this... I'm more confused than ever, but what I've discovered:
I am convinced there is either a bug in containerd or a bug in how oras was using containerd, but it's probably not worth trying to find the bug given that helm could just upgrade to oras-go v2 instead.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
We're still seeing the issue, although we are working around it with something like what I mentioned above: #11415 (comment)
I have the same issue with GCP Artifact Registry, in particular with `helmfile`, but the problem is the same with `helm login`. In my case it doesn't seem to hang; it randomly responds with "401 - unauthorized". Sometimes it takes 10 attempts to get one successful answer, and sometimes only two or three; it's very random.
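A retry wrapper in the spirit of the earlier pull workaround, but for the login step (a sketch only; the registry hostname is a placeholder, and this assumes `gcloud` can mint a fresh access token on each attempt):

```bash
# retry `helm registry login` until it stops returning 401
until gcloud auth print-access-token |
      helm registry login -u oauth2accesstoken --password-stdin https://us-docker.pkg.dev; do
  echo 'login failed; retrying...'
  sleep 2
done
```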
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
We believe either GCP cloud storage is using a non-standard HTTP/2 implementation or there's a bug in the Go standard library. We are seeing this exact issue with cloud buckets in multiple different use cases.
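If the HTTP/2 theory is worth testing, one diagnostic is to force the Go HTTP client to fall back to HTTP/1.1 via the standard `GODEBUG` runtime knob (whether this actually avoids the hang is untested here; the chart coordinates are placeholders):

```bash
# disable HTTP/2 in the Go HTTP client helm uses, then retry the pull
GODEBUG=http2client=0 helm pull oci://us-docker.pkg.dev/${PROJECT}/helm/${CHART} --version ${TAG}
```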
Hello, I'm seeing an issue on version 3.10.0 of helm where push/pull to/from GCP Artifact Registry seems to randomly hang. For example, maybe 2 out of 3 times, a pull will work, but then it seems to hang.
When things work, it looks like this:
But then when it hangs, it looks like this:
Unlike @Unknow0's issue in #10396, I don't think this has to do with multiple keys, since I only have a single entry in my `~/.config/helm/registry/config.json` file and I do not have a `~/.docker/config.json` file. Any ideas about what might be going on would be much appreciated!
Output of `helm version`:

Output of `kubectl version`:

Cloud Provider/Platform (AKS, GKE, Minikube etc.): This issue involves GCP Artifact Registry (I am not interacting with a k8s cluster for this issue).