Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix misjudgment of deployment and statefuleset health status #2928

Merged
merged 1 commit into from
Dec 12, 2022

Conversation

Fish-pro
Copy link
Member

@Fish-pro Fish-pro commented Dec 8, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

When I apply the following yaml of the deployment and pp

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:dfaadadffasdfa # image not exist
        name: nginx
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - 10-29-14-21
        - 10-29-14-25

We can find that the health state of the resource under the resourceBinding is Healthy

status:
  aggregatedStatus:
  - applied: true
    clusterName: 10-29-14-21
    health: Healthy
    status:
      replicas: 2
      unavailableReplicas: 2
      updatedReplicas: 2
  - applied: true
    clusterName: 10-29-14-25
    health: Healthy
    status:
      replicas: 2
      unavailableReplicas: 2
      updatedReplicas: 2
  conditions:
  - lastTransitionTime: "2022-12-08T08:34:03Z"
    message: All works have been successfully applied
    reason: FullyAppliedSuccess
    status: "True"
    type: FullyApplied
  - lastTransitionTime: "2022-12-08T08:34:03Z"
    message: Binding has been scheduled
    reason: BindingScheduled
    status: "True"
    type: Scheduled
  schedulerObservedGeneration: 2

However, the resources propagated to the member clusters at this time are not healthy

[root@dce-10-29-14-21 demo]# k -n test get deploy
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
nginx          0/2     2            0           74m

This pr fix this problem

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`/`karmada-agent`: Fixed misjudgment of deployment and statefuleset health status.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 8, 2022
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 8, 2022
@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 8, 2022

@RainbowMango
Copy link
Member

I think it's due to my comments in #2329 (comment).

@XiShanYongYe-Chang Please help to confirm how the argo assess the health status exactly. Does the updated means UP-TO-DATE?

@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 8, 2022

@RainbowMango

I understand updatedReplicas mean UP-TO-DATE

image

image

$ k explain deployment.status.updatedReplicas
KIND:     Deployment
VERSION:  apps/v1

FIELD:    updatedReplicas <integer>

DESCRIPTION:
     Total number of non-terminated pods targeted by this deployment that have
     the desired template spec.

@XiShanYongYe-Chang
Copy link
Member

@XiShanYongYe-Chang Please help to confirm how the argo assess the health status exactly. Does the updated means UP-TO-DATE?

argo-cd check the workload heathy by the logic:
https://github.com/argoproj/argo-cd/blob/661afe0ad9653418aa0e3e2cc7939e42e0db40ab/resource_customizations/argoproj.io/Rollout/health.lua#L8-L17

@XiShanYongYe-Chang
Copy link
Member

Hi @Fish-pro, maybe we can learn from the logic of Argo-cd.

@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 9, 2022

Hi @Fish-pro, maybe we can learn from the logic of Argo-cd.

@XiShanYongYe-Chang
Thanks for your reply. Here we learn the logic of argo. Is there any other ability we need to learn argo?

@XiShanYongYe-Chang
Copy link
Member

Thanks for your reply. Here we learn the logic of argo. Is there any other ability we need to learn argo?

We can update like this:

--- a/pkg/resourceinterpreter/defaultinterpreter/healthy.go
+++ b/pkg/resourceinterpreter/defaultinterpreter/healthy.go
@@ -43,7 +43,13 @@ func interpretStatefulSetHealth(object *unstructured.Unstructured) (bool, error)
                return false, err
        }

-       healthy := (statefulSet.Status.UpdatedReplicas == *statefulSet.Spec.Replicas) && (statefulSet.Generation == statefulSet.Status.ObservedGeneration)
+       healthy := true
+       if statefulSet.Status.UpdatedReplicas < *statefulSet.Spec.Replicas {
+               healthy = false
+       }
+       if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
+               healthy = false
+       }

@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 9, 2022

@XiShanYongYe-Chang Ok, I see what you mean then

@XiShanYongYe-Chang
Copy link
Member

Hi @Fish-pro, we still need to judge the workload status observedGeneration, refer to https://github.com/argoproj/argo-cd/blob/661afe0ad9653418aa0e3e2cc7939e42e0db40ab/resource_customizations/argoproj.io/Rollout/health.lua#L66:

--- a/pkg/resourceinterpreter/defaultinterpreter/healthy.go
+++ b/pkg/resourceinterpreter/defaultinterpreter/healthy.go
@@ -43,8 +43,16 @@ func interpretStatefulSetHealth(object *unstructured.Unstructured) (bool, error)
                return false, err
        }

-       healthy := (statefulSet.Status.UpdatedReplicas == *statefulSet.Spec.Replicas) && (statefulSet.Generation == statefulSet.Status.ObservedGeneration)
-       return healthy, nil
+       if statefulSet.Generation != statefulSet.Status.ObservedGeneration {
+               return false, nil
+       }
+       if statefulSet.Status.UpdatedReplicas < *statefulSet.Spec.Replicas {
+               return false, nil
+       }
+       if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
+               return false, nil
+       }
+       return true, nil

In addition, we may need to modify these resources at the same time: Deployment, ReplicaSet, StatefulSet DaemonSet

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 9, 2022
@karmada-bot
Copy link
Collaborator

@Fish-pro: GitHub didn't allow me to request PR reviews from the following users: updated.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @RainbowMango @XiShanYongYe-Chang updated

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 9, 2022

Hi @Fish-pro, we still need to judge the workload status observedGeneration, refer to https://github.com/argoproj/argo-cd/blob/661afe0ad9653418aa0e3e2cc7939e42e0db40ab/resource_customizations/argoproj.io/Rollout/health.lua#L66:

--- a/pkg/resourceinterpreter/defaultinterpreter/healthy.go
+++ b/pkg/resourceinterpreter/defaultinterpreter/healthy.go
@@ -43,8 +43,16 @@ func interpretStatefulSetHealth(object *unstructured.Unstructured) (bool, error)
                return false, err
        }

-       healthy := (statefulSet.Status.UpdatedReplicas == *statefulSet.Spec.Replicas) && (statefulSet.Generation == statefulSet.Status.ObservedGeneration)
-       return healthy, nil
+       if statefulSet.Generation != statefulSet.Status.ObservedGeneration {
+               return false, nil
+       }
+       if statefulSet.Status.UpdatedReplicas < *statefulSet.Spec.Replicas {
+               return false, nil
+       }
+       if statefulSet.Status.AvailableReplicas < statefulSet.Status.UpdatedReplicas {
+               return false, nil
+       }
+       return true, nil

In addition, we may need to modify these resources at the same time: Deployment, ReplicaSet, StatefulSet DaemonSet

@XiShanYongYe-Chang Thanks for reminding. Judgment is reserved

@Fish-pro Fish-pro force-pushed the fix/healthy branch 2 times, most recently from 39dd135 to 7a79b9b Compare December 9, 2022 04:36
@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 9, 2022

Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Fish-pro, thanks~
Can you help also modify ReplicaSet and DaemonSet?

Comment on lines 41 to 43
if deploy.Generation != deploy.Status.ObservedGeneration {
return false, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can judge at first, the current object is up-to-date only when generation is consistent.

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 9, 2022
@Fish-pro
Copy link
Member Author

Fish-pro commented Dec 9, 2022

Hi @Fish-pro, thanks~ Can you help also modify ReplicaSet and DaemonSet?

@XiShanYongYe-Chang The rs has met the judgment requirements and the daemonset has been updated

@codecov-commenter
Copy link

codecov-commenter commented Dec 9, 2022

Codecov Report

Merging #2928 (898430e) into master (efb50ad) will increase coverage by 0.26%.
The diff coverage is 100.00%.

❗ Current head 898430e differs from pull request most recent head ec8f4c2. Consider uploading reports for the commit ec8f4c2 to get more accurate results

@@            Coverage Diff             @@
##           master    #2928      +/-   ##
==========================================
+ Coverage   38.30%   38.56%   +0.26%     
==========================================
  Files         204      205       +1     
  Lines       18756    18825      +69     
==========================================
+ Hits         7184     7260      +76     
+ Misses      11146    11136      -10     
- Partials      426      429       +3     
Flag Coverage Δ
unittests 38.56% <100.00%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../resourceinterpreter/defaultinterpreter/healthy.go 66.99% <100.00%> (+6.07%) ⬆️
pkg/karmadactl/cmdinit/karmada/rbac.go 88.00% <0.00%> (-0.58%) ⬇️
...ceinterpreter/configurableinterpreter/luavm/lua.go 57.14% <0.00%> (-0.12%) ⬇️
pkg/search/proxy/store/util.go 93.36% <0.00%> (ø)
...kg/controllers/status/cluster_status_controller.go 0.00% <0.00%> (ø)
...einterpreter/configurableinterpreter/luavm/kube.go 67.39% <0.00%> (ø)
pkg/util/names/names.go 96.55% <0.00%> (+0.44%) ⬆️
pkg/util/helper/unstructured.go 89.28% <0.00%> (+78.57%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!
/lgtm
/cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2022
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2022
Signed-off-by: chen zechun <zechun.chen@daocloud.io>
@Fish-pro
Copy link
Member Author

@RainbowMango Thanks,updated

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 12, 2022
@karmada-bot karmada-bot merged commit 37cca1c into karmada-io:master Dec 12, 2022
karmada-bot added a commit that referenced this pull request Dec 12, 2022
…-upstream-release-1.4

Automated cherry pick of #2928: Fix misjudgment of deployment and statefuleset health status
karmada-bot added a commit that referenced this pull request Dec 12, 2022
…-upstream-release-1.3

Automated cherry pick of #2928: Fix misjudgment of deployment and statefuleset health status
@Fish-pro Fish-pro deleted the fix/healthy branch March 18, 2024 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants