Fix misjudgment of deployment and statefuleset health status #2928
I think it's due to my comments in #2329 (comment). @XiShanYongYe-Chang Please help confirm exactly how Argo CD assesses the health status. Does the
I understand.
$ k explain deployment.status.updatedReplicas
KIND:     Deployment
VERSION:  apps/v1
FIELD:    updatedReplicas <integer>
DESCRIPTION:
     Total number of non-terminated pods targeted by this deployment that have
     the desired template spec.
Argo CD checks whether the workload is healthy with this logic:
Hi @Fish-pro, maybe we can learn from Argo CD's logic.
@XiShanYongYe-Chang
We can update like this:
--- a/pkg/resourceinterpreter/defaultinterpreter/healthy.go
+++ b/pkg/resourceinterpreter/defaultinterpreter/healthy.go
@@ -43,7 +43,13 @@ func interpretStatefulSetHealth(object *unstructured.Unstructured) (bool, error)
 		return false, err
 	}
-	healthy := (statefulSet.Status.UpdatedReplicas == *statefulSet.Spec.Replicas) && (statefulSet.Generation == statefulSet.Status.ObservedGeneration)
+	healthy := true
+	if statefulSet.Status.UpdatedReplicas < *statefulSet.Spec.Replicas {
+		healthy = false
+	}
+	if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
+		healthy = false
+	}
@XiShanYongYe-Chang OK, I see what you mean.
Hi @Fish-pro, we still need to check the workload's status.observedGeneration; see https://github.com/argoproj/argo-cd/blob/661afe0ad9653418aa0e3e2cc7939e42e0db40ab/resource_customizations/argoproj.io/Rollout/health.lua#L66:
--- a/pkg/resourceinterpreter/defaultinterpreter/healthy.go
+++ b/pkg/resourceinterpreter/defaultinterpreter/healthy.go
@@ -43,8 +43,16 @@ func interpretStatefulSetHealth(object *unstructured.Unstructured) (bool, error)
 		return false, err
 	}
-	healthy := (statefulSet.Status.UpdatedReplicas == *statefulSet.Spec.Replicas) && (statefulSet.Generation == statefulSet.Status.ObservedGeneration)
-	return healthy, nil
+	if statefulSet.Generation != statefulSet.Status.ObservedGeneration {
+		return false, nil
+	}
+	if statefulSet.Status.UpdatedReplicas < *statefulSet.Spec.Replicas {
+		return false, nil
+	}
+	if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
+		return false, nil
+	}
+	return true, nil
In addition, we may need to modify these resources at the same time: Deployment, ReplicaSet, StatefulSet, and DaemonSet.
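To make the order of the three guards easier to follow, here is a minimal sketch that applies them to a simplified struct and reports which guard failed. It is not the code merged in this PR: `workloadCounts` and `checkHealth` are hypothetical names, and the struct only stands in for the real `appsv1` types so the example runs without Kubernetes dependencies.

```go
package main

import "fmt"

// workloadCounts is a hypothetical, simplified stand-in for the fields the
// proposed check reads from a StatefulSet (or a Deployment).
type workloadCounts struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	DesiredReplicas    int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
}

// checkHealth applies the three guards in the order proposed in the review
// and returns a short reason when the workload is not healthy.
func checkHealth(w workloadCounts) (bool, string) {
	if w.Generation != w.ObservedGeneration {
		return false, "status does not yet reflect the latest spec"
	}
	if w.UpdatedReplicas < w.DesiredReplicas {
		return false, "not all replicas run the desired template"
	}
	if w.AvailableReplicas < w.UpdatedReplicas {
		return false, "updated replicas are not all available"
	}
	return true, ""
}

func main() {
	cases := []workloadCounts{
		{Generation: 2, ObservedGeneration: 1, DesiredReplicas: 2, UpdatedReplicas: 2, AvailableReplicas: 2},
		{Generation: 2, ObservedGeneration: 2, DesiredReplicas: 2, UpdatedReplicas: 1, AvailableReplicas: 1},
		{Generation: 2, ObservedGeneration: 2, DesiredReplicas: 2, UpdatedReplicas: 2, AvailableReplicas: 0},
		{Generation: 2, ObservedGeneration: 2, DesiredReplicas: 2, UpdatedReplicas: 2, AvailableReplicas: 2},
	}
	for _, c := range cases {
		healthy, reason := checkHealth(c)
		fmt.Println(healthy, reason)
	}
}
```

Checking the generation first means the replica counts are only trusted once the controller has observed the current spec.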
@XiShanYongYe-Chang Thanks for the reminder. The observedGeneration check is retained.
39dd135 to 7a79b9b
Hi @Fish-pro, thanks~
Can you also help modify ReplicaSet and DaemonSet?
if deploy.Generation != deploy.Status.ObservedGeneration {
	return false, nil
}
I think we can check this first: the current object is up to date only when its generation matches status.observedGeneration.
@XiShanYongYe-Chang The ReplicaSet check already meets these requirements, and the DaemonSet check has been updated.
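A DaemonSet has no spec.replicas, so the same three guards need different status fields. The sketch below shows one plausible mapping onto `desiredNumberScheduled`, `updatedNumberScheduled`, and `numberAvailable`, which are real `apps/v1` DaemonSet status fields. The mapping itself is an assumption and not a copy of the code in this PR, and `daemonSetView` and `daemonSetHealthy` are hypothetical names.

```go
package main

import "fmt"

// daemonSetView is a hypothetical, simplified stand-in for the DaemonSet
// fields that a health check along these lines would read.
type daemonSetView struct {
	Generation             int64 // metadata.generation
	ObservedGeneration     int64 // status.observedGeneration
	DesiredNumberScheduled int32 // status.desiredNumberScheduled
	UpdatedNumberScheduled int32 // status.updatedNumberScheduled
	NumberAvailable        int32 // status.numberAvailable
}

// daemonSetHealthy mirrors the generation, updated, and available guards
// discussed for StatefulSet, using the DaemonSet counterparts (assumed mapping).
func daemonSetHealthy(ds daemonSetView) bool {
	if ds.Generation != ds.ObservedGeneration {
		return false
	}
	if ds.UpdatedNumberScheduled < ds.DesiredNumberScheduled {
		return false
	}
	if ds.NumberAvailable < ds.UpdatedNumberScheduled {
		return false
	}
	return true
}

func main() {
	// Five nodes should run the pod, all five are updated, four are available.
	ds := daemonSetView{
		Generation:             3,
		ObservedGeneration:     3,
		DesiredNumberScheduled: 5,
		UpdatedNumberScheduled: 5,
		NumberAvailable:        4,
	}
	fmt.Println(daemonSetHealthy(ds)) // false: one updated pod is not available
}
```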
Codecov Report
@@ Coverage Diff @@
## master #2928 +/- ##
==========================================
+ Coverage 38.30% 38.56% +0.26%
==========================================
Files 204 205 +1
Lines 18756 18825 +69
==========================================
+ Hits 7184 7260 +76
+ Misses 11146 11136 -10
- Partials 426 429 +3
Thanks a lot!
/lgtm
/cc @RainbowMango
Signed-off-by: chen zechun <zechun.chen@daocloud.io>
898430e to ec8f4c2
@RainbowMango Thanks, updated.
/lgtm
/approve
Thanks.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
…-upstream-release-1.4 Automated cherry pick of #2928: Fix misjudgment of deployment and statefuleset health status
…-upstream-release-1.3 Automated cherry pick of #2928: Fix misjudgment of deployment and statefuleset health status
What type of PR is this?
/kind bug
What this PR does / why we need it:
When I apply the following YAML for a Deployment and a PropagationPolicy (pp),
the health state of the resource under the ResourceBinding is reported as Healthy.
However, the resources propagated to the member clusters at this time are not healthy
[root@dce-10-29-14-21 demo]# k -n test get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/2     2            0           74m
This PR fixes this problem.
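The misjudgment can be reproduced on the numbers from the kubectl output (2 desired, 2 up to date, 0 available). This is a minimal sketch under two assumptions: the old Deployment check had the same shape as the removed StatefulSet line in the review diff, and the generations match (the output does not show them). `deployView`, `oldCheck`, and `newCheck` are hypothetical names.

```go
package main

import "fmt"

// deployView is a hypothetical, simplified stand-in for the Deployment
// fields involved in the health check.
type deployView struct {
	Generation         int64
	ObservedGeneration int64
	Replicas           int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
}

// oldCheck follows the shape of the removed line: it never looks at
// available replicas.
func oldCheck(d deployView) bool {
	return d.UpdatedReplicas == d.Replicas && d.Generation == d.ObservedGeneration
}

// newCheck follows the revised logic from the review discussion.
func newCheck(d deployView) bool {
	if d.Generation != d.ObservedGeneration {
		return false
	}
	if d.UpdatedReplicas < d.Replicas {
		return false
	}
	if d.AvailableReplicas < d.UpdatedReplicas {
		return false
	}
	return true
}

func main() {
	// nginx: READY 0/2, UP-TO-DATE 2, AVAILABLE 0.
	// Generation values are assumed equal; they are not part of the output.
	nginx := deployView{Generation: 1, ObservedGeneration: 1, Replicas: 2, UpdatedReplicas: 2, AvailableReplicas: 0}
	fmt.Println("old check healthy:", oldCheck(nginx)) // true
	fmt.Println("new check healthy:", newCheck(nginx)) // false
}
```

The old form reports Healthy because updatedReplicas counts pods with the desired template whether or not they are available.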
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: