
Deleting PropagationPolicy and OverridePolicy at the same time leads to the workload recovering to its original state #3873

Open
wu0407 opened this issue Aug 1, 2023 · 19 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

@wu0407 (Contributor) commented Aug 1, 2023

What happened:
I wanted to clean up the resources on the member cluster, so I deleted the OverridePolicy and the PropagationPolicy at the same time.
The OverridePolicy overrides the Deployment's spec.replicas to 0.
After the delete action, many Pods were created by the Deployment controller on the member cluster.

rule.yaml:

apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: workload-propagation-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
    - apiVersion: apps/v1
      kind: StatefulSet
  propagateDeps: true
  placement:
    clusterAffinity:
      clusterNames:
        - sh5-online
---
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterOverridePolicy
metadata:
  name: workload-override-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
    - apiVersion: apps/v1
      kind: StatefulSet
  overrideRules:
  - overriders:  
      plaintext:
      - path: /spec/replicas
        operator: replace
        value: 0
      - path: /spec/template/spec/tolerations
        operator: add
        value:
        - key: "eks.tke.cloud.tencent.com/eklet"
          operator: "Exists"
          effect: "NoSchedule"
      - path: /spec/template/spec/schedulerName
        operator: replace
        value: "tke-scheduler"

What you expected to happen:
No Pods are created and the Deployment is deleted on the member cluster.
How to reproduce it (as minimally and precisely as possible):

  1. kubectl apply -f rule.yaml
  2. Wait until the resources are synced.
  3. kubectl delete -f rule.yaml
  4. kubectl get pod -w -A
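
For convenience, the same steps as a copy-pasteable sequence (a sketch: it assumes kubectl points at the karmada-apiserver, and that a member-cluster kubeconfig is available as member.config):

# on the Karmada control plane
kubectl apply -f rule.yaml
# wait until the Deployment on the member cluster is synced (spec.replicas overridden to 0)
kubectl --kubeconfig member.config get deployment -A
kubectl delete -f rule.yaml
# watch Pods being created again on the member cluster
kubectl --kubeconfig member.config get pod -w -A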

Anything else we need to know?:

Environment:

  • Karmada version:
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): version.Info{GitVersion:"v1.6.1", GitCommit:"fdc7ac62c70b571d091a795cbe9b9fceac5f1c2c", GitTreeState:"clean", BuildDate:"2023-07-06T03:35:37Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
  • Others:
wu0407 added the kind/bug label on Aug 1, 2023
@wu0407 (Contributor, Author) commented Aug 1, 2023

Similar behavior also occurs when creating an OverridePolicy and a PropagationPolicy at the same time.

@XiShanYongYe-Chang (Member)

Hi @wu0407, thank you for your feedback.

The deletion of PropagationPolicy and the deletion of OverridePolicy may need to be considered separately.

When a user deletes the PropagationPolicy, Karmada disconnects the resource template from the pp, but the ResourceBinding still exists. At that point, when the user also deletes the op, the ResourceBinding is still synchronized down to the member cluster, so the resources in the member cluster revert to being consistent with the template.
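
One way to observe this (a sketch; the kubeconfig and resource names are illustrative): after deleting the pp, the binding still exists on the Karmada control plane and still carries the placement, so it keeps being applied:

# ResourceBindings live in the karmada-apiserver, typically named <resource-name>-<kind>
kubectl --kubeconfig karmada-apiserver.config get resourcebindings.work.karmada.io -A
# inspect the retained spec.placement and the scheduling result in spec.clusters
kubectl --kubeconfig karmada-apiserver.config get resourcebinding nginx-deployment -n default -o yaml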

I would like to understand why you need to delete PropagationPolicy and OverridePolicy, but keep the resource template.

@wu0407 (Contributor, Author) commented Aug 2, 2023

I don't want the Deployment running on that member cluster, but I want it to keep running on the Karmada cluster, because the Karmada apiserver reuses an existing cluster's apiserver.

Related issue: the Deployment is synced back to the member cluster after the cluster rejoins, even though no PropagationPolicy selects that cluster anymore.
The reason is that the ResourceBinding is retained when the PropagationPolicy is deleted, even after the cluster is removed.

@wu0407 (Contributor, Author) commented Aug 2, 2023

The ResourceBinding should be deleted, or the related info needs to be cleaned up. For example, this placement is still retained on the ResourceBinding after the policy is deleted:

spec:
  placement:
    clusterAffinity:
      clusterNames:
      - sh5-online
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 300
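
A hypothetical manual workaround (not an official API; verify the field path against your Karmada version) would be to strip the retained placement from the binding by hand:

# illustrative names; removes the stale placement left over from the deleted policy
kubectl --kubeconfig karmada-apiserver.config patch resourcebinding nginx-deployment \
  -n default --type=json \
  -p='[{"op": "remove", "path": "/spec/placement"}]'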

@chaunceyjiang (Member)

The ResourceBinding should be deleted

The ResourceBinding currently has the same lifecycle as the resource template: if the resource template is not deleted, the ResourceBinding will not be deleted either.

or the related info needs to be cleaned up.

I have created an issue to track this problem. #3886

@XiShanYongYe-Chang (Member)

@wu0407 @chaunceyjiang Thanks! Your discussion has given me a new perspective. In fact, this issue has been discussed before: #3681

One previous conclusion was:

I'm wondering whether we can take a step forward and consider how to make this behavior more in line with user expectations. For example, can we give control of this behavior to users and let them decide whether the rb needs to be synchronized and deleted?

I think we can now refine this conclusion: the lifecycle of the rb and the resource template remains consistent, and users can decide whether to clean up the pp's scheduling information on the rb so that it will no longer be synchronized. What do you think?

@whitewindmills (Member)

I think we can now refine this conclusion: the lifecycle of the rb and the resource template remains consistent, and users can decide whether to clean up the pp's scheduling information on the rb so that it will no longer be synchronized. What do you think?

If users choose to clean up the previous scheduling results, should the resource template still be matched by other new appropriate policies? I think cleanup conflicts with rescheduling. It would be great if you could explain it to me.

@chaunceyjiang (Member)

I think we can now refine this conclusion: the lifecycle of the rb and the resource template remains consistent, and users can decide whether to clean up the pp's scheduling information on the rb so that it will no longer be synchronized. What do you think?

spec:
  placement:
    clusterAffinity:
      clusterNames:
      - sh5-online
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 300

I don't think these are scheduling results. I believe they are scheduling information. Since the PP has already been deleted, the related information should also be cleaned up.

@XiShanYongYe-Chang (Member) commented Aug 3, 2023

If users choose to clean up the previous scheduling results, should the resource template still be matched by other new appropriate policies?

I think that regardless of what users choose, resource templates should still be allowed to match new PropagationPolicies. These two points may not conflict.

I think cleanup conflicts with rescheduling.

When clearing the pp matching information from the resource template, the corresponding information on the rb also needs to be cleaned up.
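
For context, the policy match is recorded as labels on both the resource template and the rb; a sketch of how to inspect them (the label keys can differ across Karmada versions, e.g. clusterpropagationpolicy.karmada.io/name for a ClusterPropagationPolicy):

# show the claim labels on the template and on its binding (illustrative names)
kubectl --kubeconfig karmada-apiserver.config get deployment nginx -n default --show-labels
kubectl --kubeconfig karmada-apiserver.config get resourcebinding nginx-deployment -n default --show-labels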

@XiShanYongYe-Chang (Member)

I don't think these are scheduling results. I believe they are scheduling information. Since the PP has already been deleted,

I agree with you. The statement about the scheduling results is incorrect.

the related information should also be cleaned up.

Do you mean cleaning up directly, without requiring the user to choose?

@chaunceyjiang (Member)

Do you mean cleaning up directly, without requiring the user to choose?

Yes.

@XiShanYongYe-Chang (Member)

I'm inclined to agree, but I'm not sure whether any users rely on this behavior. It might be a good idea to hear what others think at the community meeting.

@XiShanYongYe-Chang (Member)

/cc @RainbowMango

@liangyuanpeng (Contributor) commented Aug 3, 2023

I would recommend introducing a feature gate as a first step; the default behavior remains the same as it is now.

Users can enable the feature gate if they want direct cleanup.

Then, someday in the future, when everything is stable, the default behavior can be changed and the feature gate removed.
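
For illustration, such a gate would be enabled the usual way on karmada-controller-manager; the gate name below is hypothetical, not an existing one:

containers:
- name: karmada-controller-manager
  command:
  - /bin/karmada-controller-manager
  # hypothetical gate name, for illustration only
  - --feature-gates=CleanupPolicyClaimOnDeletion=true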

@liangyuanpeng (Contributor) commented Aug 3, 2023

Then, someday in the future, when everything is stable, the default behavior can be changed and the feature gate removed.

A simple version evolution example:
1.7: alpha
1.8: beta, feature gate enabled by default
1.9: GA

We can simply leave the feature gate enabled by default; it doesn't have to be removed right away.

If a direct cleanup solution is adopted in the end, I suggest notifying users in advance in 1.7 and executing it in 1.8.

@weilaaa (Member) commented Aug 4, 2023

In another issue, #3681, we've already discussed this, e.g.: "I prefer to add a new gate in pp to determine this behavior, and the default is consistent with the current one to be compatible with the existing data. Users can open the new gate in pp to change this behavior if they know what they are doing."

A simple version evolution example:
1.7: alpha
1.8: beta, feature gate enabled by default
1.9: GA

For compatibility reasons, I recommend keeping the current behavior as the default. But if we can do some research, and the results show that users prefer to clean up the rb when deleting the pp, then I suggest we adopt the approach @liangyuanpeng described, just like Kubernetes' strategy for new feature gates.

@weilaaa (Member) commented Aug 4, 2023

By the way, a new feature gate in the pp seems better than a global feature gate switch, because the user can control the action for each pp.

I'll just leave open the question here of what the default action should be.

cc @olderTaoist
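
For illustration, a per-policy gate might look something like this (the field name is hypothetical, not an existing API):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: example
spec:
  # hypothetical field: clean up this policy's claim on the rb when the policy is deleted
  cleanupPolicyClaimOnDeletion: true
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment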

@RainbowMango (Member)

I'm looking at and thinking about this use case:

I don't want the Deployment running on that member cluster, but I want it to keep running on the Karmada cluster, because the Karmada apiserver reuses an existing cluster's apiserver.

A further question: why can't the Deployment be removed, even in the case of reusing the API server?

I feel it'd be better to have a chat about this at the community meeting.

@XiShanYongYe-Chang (Member) commented Aug 10, 2023

Hello everybody, @wu0407 @chaunceyjiang @liangyuanpeng @whitewindmills @weilaaa @olderTaoist

Can we discuss this issue at the community meeting next Tuesday (2023-08-15)? I hope that we can reach some consensus and make progress towards resolving this issue during the meeting.
