
bug- race condition due to unused resource generation metadata/spec #10859

Closed
muang0 opened this issue Apr 14, 2022 · 6 comments · Fixed by #10920

Comments

@muang0 (Contributor) commented on Apr 14, 2022

I would like to propose a new 'ReadinessCheckDelay' argument for the helm 'install', 'upgrade', and 'rollback' commands, based on a race condition we are seeing between helm and a Kubernetes controller. We run helm install with the 'wait' and 'atomic' arguments, but helm does not correctly track resource readiness (we see this in roughly half of our installs that contain a StatefulSet). Reviewing the api-server logs, we found that helm issues both the patch call and the subsequent readiness get call before the StatefulSet controller has had a chance to update the resource. This produces a false positive in the readiness check on the StatefulSet, and helm incorrectly marks the install as completed successfully. The proposed 'ReadinessCheckDelay' argument would add a configurable delay between helm patching the resources and running the readiness checks (see the sketch at the end of this comment). Unfortunately I can't share logs because this is happening in my cluster at work (highly restrictive fintech). I'm happy to contribute this myself, assuming the community is open to the change. Thanks for the feedback :)

Output of helm version: 3.8.0

Output of kubectl version: 1.21.5

Cloud Provider/Platform (AKS, GKE, Minikube etc.): EKS
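
For illustration, here is a minimal Go sketch of the proposed delay. The names `readinessCheckDelay` and `waitForResourcesReady` are hypothetical stand-ins for Helm's real wait machinery, not existing Helm APIs.

```go
// Sketch only: the names below are hypothetical stand-ins, not Helm APIs.
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForResourcesReady stands in for Helm's existing --wait readiness polling.
func waitForResourcesReady(ctx context.Context) error {
	// ... poll the API server until every release resource reports ready ...
	return nil
}

// waitWithDelay sleeps for the proposed ReadinessCheckDelay before starting the
// readiness polling, giving controllers time to observe the patch and refresh
// the resource status.
func waitWithDelay(ctx context.Context, readinessCheckDelay time.Duration) error {
	select {
	case <-time.After(readinessCheckDelay):
	case <-ctx.Done():
		return ctx.Err()
	}
	return waitForResourcesReady(ctx)
}

func main() {
	if err := waitWithDelay(context.Background(), 10*time.Second); err != nil {
		fmt.Println("wait failed:", err)
	}
}
```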

@joejulian (Contributor)

If I'm understanding correctly, this affects the upgrade and rollback processes because, when a StatefulSet or Deployment is updated, it is still considered ready: its subresources (pods, ReplicaSets, etc.) have not been updated yet, so the resource's expectations are still fulfilled (3 of 3 pods ready, for instance), causing the wait to consider the resource ready.

Have you considered filing a Kubernetes bug for this? It feels like this should be handled in the API: when a Deployment, StatefulSet, etc. is patched, it seems like the readiness should perhaps be reset. I'd be really interested in seeing what sig-api-machinery thinks.
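
To make the race concrete, here is a minimal sketch of a replica-count-only readiness check, assuming the standard `k8s.io/api/apps/v1` types; the function name is hypothetical and this is not Helm's actual implementation.

```go
package readiness

import (
	appsv1 "k8s.io/api/apps/v1"
)

// naiveStatefulSetReady only compares replica counts from status. Immediately
// after helm patches the StatefulSet, the status can still describe the
// previous revision (the controller has not observed the new spec yet), so
// this check can return true before the rollout has even started.
func naiveStatefulSetReady(sts *appsv1.StatefulSet) bool {
	want := int32(1) // Kubernetes defaults spec.replicas to 1 when unset
	if sts.Spec.Replicas != nil {
		want = *sts.Spec.Replicas
	}
	return sts.Status.ReadyReplicas == want && sts.Status.UpdatedReplicas == want
}
```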

@joejulian (Contributor)

As for a proposal, that would start as a HIP (Helm Improvement Proposal).

@muang0 (Contributor, Author) commented on Apr 14, 2022

Thanks for the response! I just opened an issue against Kubernetes and tagged sig-api-machinery. Let's see what their analysis is and go from there 😄

@muang0 (Contributor, Author) commented on Apr 19, 2022

@joejulian Check out Jordan's response on the issue I opened against Kubernetes. Incorporating generation checks into the readiness logic seems like a great suggestion. It might also be worth reviewing whether additional resource types could benefit from generation checks. What are your thoughts on this idea and its potential as a HIP? Thanks for the help again!
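
For comparison with the earlier sketch, a generation-aware version of that check might look like the following. Again, the function name is hypothetical and this only illustrates the suggestion; it is not Helm's actual readiness code.

```go
package readiness

import (
	appsv1 "k8s.io/api/apps/v1"
)

// generationAwareStatefulSetReady first confirms that the StatefulSet
// controller has observed the current spec (status.observedGeneration has
// caught up with metadata.generation) before trusting the replica counts,
// closing the race window described earlier in this thread.
func generationAwareStatefulSetReady(sts *appsv1.StatefulSet) bool {
	if sts.Status.ObservedGeneration < sts.Generation {
		return false // status is stale; the patch has not been processed yet
	}
	want := int32(1) // Kubernetes defaults spec.replicas to 1 when unset
	if sts.Spec.Replicas != nil {
		want = *sts.Spec.Replicas
	}
	return sts.Status.ReadyReplicas == want && sts.Status.UpdatedReplicas == want
}
```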

@muang0 changed the title from "Proposal- 'ReadinessCheckDelay' argument" to "bug- race condition due to unused resource generation metadata/spec" on Apr 28, 2022
@github-actions (bot)

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions bot added the Stale label on Jul 28, 2022
@muang0 (Contributor, Author) commented on Jul 28, 2022

Bumping; this isn't stale.
