helm upgrade > timeout on pre-upgrade hook > revision stuck in PENDING_UPGRADE and multiple DEPLOYED revisions arise soon #4558
Comments
This may directly also have led to multiple revisions being considered DEPLOYED (while another is stuck in PENDING_UPGRADE) by running the same upgrade again.
My helm hooks are mostly a DaemonSet that pulls images with init-containers, with a pause image used by the main container. I also have a Job that waits for the DaemonSet to reach the desired number of ready pods. All hooks have the following annotations.
I think that
An indication that this is somewhat correct is that the upgrade was considered to have succeeded in the end, when it really should not (or could not) have! I had asked it to pull images that could never be found in the pre-upgrade hooks, but the actual upgrade happened even though those images did not exist. Somehow, tiller was fooled into believing the hooks completed successfully!
How to get an undeployable Deployment deployed

I tried to create a minimalistic reproduction and ended up with something slightly different, but I bet that this is related. The following chart's deployment should never be deployed, right? Because it has a hook that should keep running for eternity. But it will be deployed if you run two upgrades in succession and have a hook resource already available with the same name and about to terminate.

Chart.yaml:

```yaml
apiVersion: v1
appVersion: "1.0"
description: A Helm chart for Kubernetes
name: issue-4558
version: 0.1.0
```

templates/deployment.yaml:

```yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: never-to-be-installed-deployment
spec:
  selector:
    matchLabels:
      dummy: dummy
  template:
    metadata:
      labels:
        dummy: dummy
    spec:
      containers:
        - name: never-to-be-installed-deployment
          image: "gcr.io/google_containers/pause:3.1"
```

templates/job.yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: never-finishing-job
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: never-finishing-job
          image: "gcr.io/google_containers/pause:3.1"
```

Reproduction commands:

```sh
helm upgrade issue . --install --namespace issue
# abort
helm upgrade issue . --install --namespace issue
```
I have noticed the same issue, but we are not using the
Same:

```
➜ helm history jupyterhub
REVISION  UPDATED                   STATUS           CHART                   DESCRIPTION
1         Mon Aug 12 21:29:28 2019  DEPLOYED         jupyterhub-0.8-ff69a77  Install complete
2         Mon Aug 12 22:04:16 2019  PENDING_UPGRADE  jupyterhub-0.8-ff69a77  Preparing upgrade
```
Is there a workaround for this? Is upgrading to helm3 a solution?
I've just run into this issue and worked around it by performing a rollback.
problem:
fix:
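A minimal sketch of that rollback-based recovery, assuming Helm 3 and placeholder release, namespace, chart path, and revision values:

```sh
# Find the last good revision (status DEPLOYED / deployed).
helm history my-release --namespace my-namespace

# Roll back to it; this clears the stuck PENDING_UPGRADE record
# so the next upgrade can proceed.
helm rollback my-release 1 --namespace my-namespace

# Retry the upgrade afterwards.
helm upgrade my-release ./chart --install --namespace my-namespace
```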
We are running into this same issue with helm 3. The pipeline gets canceled and the helm operation is stuck in pending-upgrade. The current workaround of running a rollback does work, but it isn't great for an automated pipeline unless we add a check beforehand to make sure we "rollback" before deploying. Is there any way to just bypass the "pending-upgrade" status on a new deploy without running a rollback?
We are running on Helm 3.4.1 and run into the same issue from time to time. Worth mentioning that the previous version 3.3.x had no such trouble with the deployments...
Same problem, coming here searching for a reason/fix 👍
We also have the same problem.
Same problem here. Our pipeline gets canceled when there is a new version running, and afterwards we can't deploy anymore because of
We have the same problem in our GitLab pipelines. The workaround (running rollback) is not a good solution for prod CI/CD pipelines.
Also ran into the same issue:

```
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
make: *** [Makefile:131: deploy] Error 1
```

$ helm version
Issue is reproducible on helm2 and helm3.
This happened to me when I SIGTERM'd an upgrade. I solved it by deleting the helm secret associated with this release, e.g.
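A hedged sketch of what that can look like with Helm 3's default secret storage (release name, revision, and namespace are placeholders; confirm the exact secret name before deleting anything):

```sh
# Helm 3 stores one secret per release revision, named sh.helm.release.v1.<release>.v<revision>.
kubectl get secrets --namespace my-namespace | grep sh.helm.release

# Delete the secret for the revision stuck in pending-upgrade, e.g. revision 22.
kubectl delete secret sh.helm.release.v1.my-release.v22 --namespace my-namespace
```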
Any updates on this issue?
Can someone else confirm it's "fixed" in
It didn't fix it for me. I've cancelled a deployment using
Yes, this issue still exists with the new version. We got the same with
Could you tell me how you cancelled your deployment? Like Ctrl+C?
Any solution for this?
+1
Happy to have a contribution to address this. It should probably start with a HIP; also take a look at the contributing doc.
It is observed that when a helm release is in a pending state, another helm release can't be started by FluxCD. FluxCD will not try to apply the newer helm release, but will just error. This prevents us from applying a new helm release over a release with pods stuck in Pending state (just an example).

When the specific message for a helm operation in progress is detected, attempt to recover by moving the older releases to failed state. Move inspired by [1]. To do so, patch the helm secret for the specific release. As an optimization, trigger the FluxCD HelmRelease reconciliation right after.

One future optimization we can do is run an audit to delete the helm releases for which metadata status is a pending operation, but release data is failed (resource that we patched in this commit).

Refactor HelmRelease resource reconciliation trigger, smaller size.

There are upstream references related to this bug, see [2] and [3].

Tests on Debian AIO-SX:
PASS: unlocked enabled available
PASS: platform-integ-apps applied after reproducing error
PASS: inspect sysinv logs, see recovery is attempted
PASS: inspect fluxcd logs, see that HelmRelease reconciliation is triggered as part of recovery

[1]: https://github.com/porter-dev/porter/pull/1685/files
[2]: helm/helm#8987
[3]: helm/helm#4558

Closes-Bug: 1997368
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: I36116ce8d298cc97194062b75db64541661ce84d
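A hedged sketch of that secret-patching approach, assuming Helm 3's default secret storage with GNU coreutils and jq available (release name, revision, and namespace are placeholders; this edits Helm's release metadata directly, so keep a backup of the secret first):

```sh
NS=my-namespace
SECRET=sh.helm.release.v1.my-release.v22   # the stuck revision's secret

# Decode the release payload: Kubernetes' base64 layer, Helm's base64 layer, then gzip.
kubectl -n "$NS" get secret "$SECRET" -o jsonpath='{.data.release}' \
  | base64 -d | base64 -d | gunzip > release.json

# Flip the stuck status to "failed", then re-encode both layers.
jq '.info.status = "failed"' release.json | gzip -c | base64 -w0 | base64 -w0 > release.b64

# Write the patched payload back into the secret.
kubectl -n "$NS" patch secret "$SECRET" --type merge \
  -p "{\"data\":{\"release\":\"$(cat release.b64)\"}}"
```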
It's similar to #8987, right?
Hit the same issue, but stuck in
Helm:
Any ideas on this?
helm v3.10.3: same issue when installing ingress-nginx with helm.
1+ really required |
2+ really required
I'm stuck here. Can't do any helm install.
UPD:
Use this before any upgrade/install (maybe already posted in this issue):
I initially found it here: #5595 (comment)
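A minimal sketch of a pre-deploy guard along those lines, assuming Helm 3 and jq, with placeholder release, namespace, and chart path values:

```sh
RELEASE=my-release
NS=my-namespace

# If the release is stuck in a pending state, roll back to the previous
# revision before upgrading. (A pending-install on revision 1 has nothing
# to roll back to; deleting its release secret is the usual fix there.)
STATUS=$(helm status "$RELEASE" --namespace "$NS" -o json 2>/dev/null | jq -r '.info.status')
case "$STATUS" in
  pending-install|pending-upgrade|pending-rollback)
    helm rollback "$RELEASE" --namespace "$NS"
    ;;
esac

helm upgrade "$RELEASE" ./chart --install --namespace "$NS"
```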
Brilliant, this is exactly what I was looking for.
I'm hitting this bug on helm version 3.12.
The problem is that if there is ever a race condition in the code that causes multiple helm deployments to go through at the same time, this technique can cause corruption.
If you deploy the same module in production multiple times at the exact same time, you have bigger problems than this one, my friend. For other environments, just deploy again. Before avoiding problems occurring once in a million, there are other everyday problems to solve, generally speaking 😉
If you know the previous timeout, something like
provided the clocks are in sync
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Please keep this issue open. It's not resolved.
I agree that there's probably not much helm can do to handle this in all occurrences, but there should at least be some acknowledgement of the issue and some clear guidance on how to resolve it when it occurs. The only workarounds I've seen from reading through the issue pages are:
1. roll back to the last good revision before upgrading again,
2. delete the Helm release secret for the stuck revision, or
3. patch the release secret's status from pending to failed.
2 or 3 is where I think some official guidance could be provided. Also, while helm might not be able to prevent the issue, since it is common and very disruptive, could helm provide some tool to correctly resync a stuck pending status?
Reproduction and symptom

Run helm upgrade with a helm pre-upgrade hook that times out. The upgrade fails with Error: UPGRADE FAILED: timed out waiting for the condition, and helm history my-release-name then shows the last revision stuck:

```
# the last line...
22  Wed Aug 29 17:59:48 2018  PENDING_UPGRADE  jupyterhub-0.7-04ccf1a  Preparing upgrade
```

Expected outcome

The revision should end up in FAILED rather than PENDING_UPGRADE, right?