Helm upgrade for charts that contain statefulsets with .spec.updateStrategy.rollingUpdate.partition is incorrect. #11773
Fixes helm#11773. This change updates the readiness check in ready.go to correctly account for statefulsets that use a partitioned upgrade. These statefulsets only upgrade a subset of the managed pods with each call to helm upgrade, so the upgrade legitimately hits the condition where sts.Status.CurrentRevision != sts.Status.UpdateRevision, which causes Helm to mark the upgrade as failed when in fact it succeeded. This change makes the revision check run only when the partition is unspecified or 0.
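The guarded check described above can be sketched as follows. This is a minimal illustration with hypothetical local types, not the actual Helm code; the real fix operates on k8s.io/api/apps/v1 StatefulSet fields inside pkg/kube/ready.go:

```go
package main

import "fmt"

// Minimal stand-ins for the relevant StatefulSet fields (hypothetical local
// types; the real readiness check uses k8s.io/api/apps/v1 types).
type stsSpec struct {
	Partition *int32 // .spec.updateStrategy.rollingUpdate.partition
}

type stsStatus struct {
	CurrentRevision string
	UpdateRevision  string
}

// revisionCheckApplies reports whether the CurrentRevision == UpdateRevision
// comparison should gate readiness: only when no partition is set (nil) or
// the partition is 0, i.e. a full, non-partitioned rolling update.
func revisionCheckApplies(spec stsSpec) bool {
	return spec.Partition == nil || *spec.Partition == 0
}

func ready(spec stsSpec, status stsStatus) bool {
	if revisionCheckApplies(spec) && status.CurrentRevision != status.UpdateRevision {
		return false // full rollout still in progress
	}
	return true // partitioned rollouts legitimately keep two revisions
}

func main() {
	two := int32(2)
	// Partitioned upgrade: revisions differ, but that is expected.
	fmt.Println(ready(stsSpec{Partition: &two},
		stsStatus{CurrentRevision: "web-6b4fd5c9f6", UpdateRevision: "web-5fbd8d85bc"})) // true
	// Non-partitioned upgrade with differing revisions: not ready yet.
	fmt.Println(ready(stsSpec{},
		stsStatus{CurrentRevision: "web-6b4fd5c9f6", UpdateRevision: "web-5fbd8d85bc"})) // false
}
```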
Proposed fix:
Kubernetes docs on partitions: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#partitions
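To illustrate the partition semantics from those docs (a sketch, not Helm code): for a StatefulSet with N replicas and partition P, only pods with ordinal >= P are moved to the new revision.

```go
package main

import "fmt"

// updatedOrdinals lists the pod ordinals a partitioned rolling update touches:
// for a StatefulSet with `replicas` pods and partition `p`, only ordinals >= p
// move to the new revision; ordinals 0..p-1 stay on the old one.
func updatedOrdinals(replicas, p int32) []int32 {
	var out []int32
	for i := p; i < replicas; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	// replicas=3, partition=2: only the web-2 pod is upgraded,
	// matching the repro described in this issue.
	fmt.Println(updatedOrdinals(3, 2)) // [2]
	// partition=0: a full rolling update of all three pods.
	fmt.Println(updatedOrdinals(3, 0)) // [0 1 2]
}
```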
From a quick look, the error you report is not the one your PR proposes to fix (which comes after, on line 797). The logic at #L384 purports to account for the update partition. Perhaps this is broken somehow? Or there was an actual error during the deployment? (Your output doesn't show the …)
Sorry, I believe the full output wasn't captured correctly in the previous iteration.
Also, here is the status of the pods that shows that web-2 is ready.
I was able to reproduce this. When we do a partitioned upgrade of a StatefulSet, the status has the following content after the StatefulSet is successfully upgraded (partition was set to 2, replicas is 3):

```json
{
  "availableReplicas": 3,
  "collisionCount": 0,
  "observedGeneration": 2,
  "readyReplicas": 3,
  "replicas": 3,
  "currentReplicas": 2,
  "currentRevision": "web-6b4fd5c9f6",
  "updatedReplicas": 1,
  "updateRevision": "web-5fbd8d85bc"
}
```

This basically says there are 2 replicas with currentRevision (old), and there is 1 replica with updateRevision (new). As the partition number changes, this will move, and eventually we will have the same value of currentReplicas and updatedReplicas (i.e. 3) and the same value of currentRevision and updateRevision. This is with partition 0 and replicas 3:

```json
{
  "availableReplicas": 3,
  "collisionCount": 0,
  "observedGeneration": 3,
  "readyReplicas": 3,
  "replicas": 3,
  "currentReplicas": 3,
  "currentRevision": "web-5fbd8d85bc",
  "updatedReplicas": 3,
  "updateRevision": "web-5fbd8d85bc"
}
```

If we look at the kubectl rollout code (which is similar to what we are trying to achieve), they don't check the revision for partitioned rollouts; they just return with success or failure when it is a partitioned rollout.

PS: the original change which caused this was added as part of #10622
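The status fields above can be interpreted the way kubectl rollout status does. This is a sketch with a hypothetical helper name, not Helm's or kubectl's actual code: for a partitioned update, the rollout is complete once updatedReplicas covers the replicas - partition pods that were supposed to move, regardless of the revision fields.

```go
package main

import "fmt"

// partitionedRolloutDone mirrors the idea behind kubectl's rollout status
// logic for a partitioned StatefulSet update (hypothetical helper): only pods
// with ordinal >= partition are updated, so the rollout is complete once
// updatedReplicas reaches replicas-partition, even though currentRevision
// and updateRevision still differ.
func partitionedRolloutDone(replicas, partition, updatedReplicas int32) bool {
	return updatedReplicas >= replicas-partition
}

func main() {
	// Status from the first JSON block above: replicas=3, partition=2,
	// updatedReplicas=1 — the single expected pod (web-2) is updated.
	fmt.Println(partitionedRolloutDone(3, 2, 1)) // true
	// Midway through a full (partition=0) rollout, only 1 of 3 updated.
	fmt.Println(partitionedRolloutDone(3, 0, 1)) // false
}
```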
Yes, that's what the proposed fix is doing: only execute the check if the partition value is unspecified or 0.
Thanks, all makes more sense now. And thanks @bhavin192 for the additional details. I'll put a comment on the PR too; I think it is worthwhile having a test for this functionality (successfully waiting for a partitioned rolling update of a statefulset). If there was test coverage, I suspect we wouldn't be here now :)
Updating the archive for the helm chart, as the earlier archive link is unusable now.
…te. (#11774)

* Fixes Readiness Check for statefulsets using partitioned rolling update.

  Fixes #11773. This change updates the readiness check in ready.go to correctly account for statefulsets that use a partitioned upgrade. These statefulsets only upgrade a subset of the managed pods with each call to helm upgrade, so the upgrade legitimately hits the condition where sts.Status.CurrentRevision != sts.Status.UpdateRevision, which causes Helm to mark the upgrade as failed when in fact it succeeded. This change makes the revision check run only when the partition is unspecified or 0.

* Adding a unit test to verify that a partitioned rolling upgrade for a statefulset works.

Signed-off-by: Aman Nijhawan <anijhawan@yugabyte.com>
Co-authored-by: Aman Nijhawan <anijhawan@yugabyte.com>
Output of helm version:
Output of kubectl version:
Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE
Hi Helm Maintainers.
In our testing we found that helm upgrade for charts that contain statefulsets with .spec.updateStrategy.rollingUpdate.partition does not function as expected. When used with the --wait and --timeout parameters, the following command always hangs:

```shell
helm upgrade helm-test ./ --debug --timeout=900s --wait -n test-helm
```
Here is a simple chart that recreates this issue.
https://github.com/amannijhawan/helm-test-case-anijhawan/blob/gh-pages/example-0.1.0.tgz
Steps to reproduce:

```shell
mkdir repro
cd repro && wget https://github.com/amannijhawan/helm-test-case-anijhawan/raw/gh-pages/example-0.1.0.tgz
tar -xvzf example-0.1.0.tgz
kubectl create ns repro
helm install helm-test . -n repro
helm upgrade helm-test ./ --debug --timeout=100s --wait --values=upgrade.yaml -n repro
```
We see that the helm upgrade fails with the following logs.
(Edited to include the full output that shows the correct logs.)
However, at this point we can check that the upgrade has been successfully applied to one pod, matching the current value of partition.
The issue seems to be this check in ready.go that assumes the upgrade hasn't been applied yet.
helm/pkg/kube/ready.go
Line 397 in 76157c6