---
title: Design of seamless cluster migration scheme
authors:
- "@chaosi-zju"
reviewers:
- "@robot"
- TBD
approvers:
- "@robot"
- TBD

creation-date: 2023-07-08

---

# Design of seamless cluster migration scheme


## Summary

- When users migrate from a single cluster to multiple clusters, a thorny problem arises: how to handle the resources that already exist in the original cluster. In most scenarios, users hope that Karmada can keep the existing resources in the original cluster running normally while taking the cluster over. We refer to this migration method as seamless migration
- Karmada currently offers several ways to achieve seamless migration, but the existing methods require the operator to perceive the resources in the member cluster and to intervene manually, so the user experience is not smooth enough
- This proposal aims to achieve truly seamless migration by extending the API of Karmada custom resources, so that the migration is completely transparent to users


## Motivation

Users migrating from a single cluster to multiple clusters hope that Karmada will take over the Deployments and other resources that already exist in the original single cluster. During the migration, they expect a seamless switch, including:

- **Pods that already exist will not be restarted at any point during the migration**

- When Pods are deployed to a new cluster, there is a verification process; the number of Pod replicas in the original cluster remains unchanged until the verification is completed

- After the migration is completed, the total number of Pod replicas across all clusters remains unchanged


Karmada currently has two ways to achieve the seamless migration described above:



- Method 1: Call `karmadactl promote deployment xxx -C yyy` to take over the existing Deployments and other resources in the original single cluster
- Method 2: Manually write the `PropagationPolicy` and `ResourceTemplate` configurations and apply them through the ApiServer of the host cluster. Note that the annotation `work.karmada.io/conflict-resolution: overwrite` must be added to the `ResourceTemplate`, indicating that when a resource with the same name already exists in the member cluster, it will be taken over by overwriting **(if left unset, it will not be taken over by default)**


> Detailed practical steps can be found in: [Utilizing existing policy capabilities to achieve cluster migration](https://www.yuque.com/chaosi/pwzt9c/ypugrp6452lu36ok)

Both of the above methods require **the operator to perceive the specific native resources to be migrated in the member cluster**. In some `CI/CD` automated pipeline scenarios, **the operator can neither manually modify resource declarations such as the original Deployments, nor see which resources the original cluster contains**. This brings difficulties and challenges to achieving seamless migration.


### Goals

Explore how Karmada can achieve seamless cluster migration in a more complete and universal way, covering more complex user scenarios and making the migration fully transparent to users.

In detail, extend the API of Karmada custom resources so that users can declare in the custom API how a matched resource should be taken over when it conflicts with an existing resource in a member cluster.



### Non-Goals

If a conflicting resource in a member cluster is not taken over, how to ensure that the total number of replicas still meets the expectation of the `ResourceTemplate` follows the original logic and is not within the scope of this proposal.

> For example, if the `ResourceTemplate` specifies that two replicas should be evenly distributed across two member clusters, and member cluster 1 fails to receive its replica because of a conflict that is not taken over, this proposal does not address how that failure is handled.

## Proposal

1. For custom resources such as PropagationPolicy, ResourceBinding, and Work, add a field called `conflictResolution`. The operator can use this field in the PropagationPolicy to specify how a matched resource should be taken over when it conflicts with an existing resource in the member cluster.
2. For native resources such as the ResourceTemplate, preserve the `work.karmada.io/conflict-resolution: overwrite` annotation method. Users can add the `work.karmada.io/conflict-resolution` annotation to specific resources in the ResourceTemplate to override the `conflictResolution` configuration of the PropagationPolicy.


### User Stories (Optional)

#### Story 1

Assume there are many Deployments in the original single cluster and the user wants Karmada to take over all of them. The user only needs to write the following PropagationPolicy:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-pp
spec:
  conflictResolution: overwrite ## new field: when a resource with the same name exists in the member cluster, take it over by overwriting it
  placement:
    clusterAffinity:
      clusterNames:
        - member1
  priority: 0
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
  schedulerName: default-scheduler
```

#### Story 2

Assume there are many Deployments in the original single cluster. The user hopes that most of them will be taken over directly by Karmada, but that a few special Deployments should be ignored when their names conflict with resources in the member cluster. On top of the `PropagationPolicy` of Story 1, the user only needs to add the `work.karmada.io/conflict-resolution` annotation to the individual special Deployments in the `ResourceTemplate`, for example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    work.karmada.io/conflict-resolution: ignore ## preserve the original annotation semantics: explicitly ignore takeover when a resource with the same name exists in the member cluster
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```

#### Story 3

Similarly, the user may want most Deployments to be ignored on conflict, but a few individual Deployments to be forcibly taken over: declare `conflictResolution: ignore` in the PropagationPolicy (or leave it unset), and annotate the individual Deployments in the ResourceTemplate with `work.karmada.io/conflict-resolution: overwrite`, as shown in the sketch below.
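A minimal sketch of that combination, reusing the `deployments-pp` and `nginx-deployment` names from the stories above (values are illustrative):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-pp
spec:
  conflictResolution: ignore ## or leave unset: conflicting resources are not taken over by default
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    work.karmada.io/conflict-resolution: overwrite ## force takeover for this individual Deployment
```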



### Notes/Constraints/Caveats (Optional)

The `conflictResolution` field of the PropagationPolicy and the `work.karmada.io/conflict-resolution` annotation of the ResourceTemplate are subject to the following constraints:

1. Precedence when both are set: the `work.karmada.io/conflict-resolution` annotation in the ResourceTemplate takes precedence over the `conflictResolution` field in the PropagationPolicy
2. Results for different combinations of values:

| PP \ RT | not set | ignore | overwrite | null or invalid |
|---------------------|-----------|--------|-----------|-----------------|
| **not set** | ignore | ignore | overwrite | ignore |
| **ignore** | ignore | ignore | overwrite | ignore |
| **overwrite** | overwrite | ignore | overwrite | overwrite |
| **null or invalid** | ignore | ignore | overwrite | ignore |

> PP refers to the `conflictResolution` field of the PropagationPolicy, RT to the `work.karmada.io/conflict-resolution` annotation of the ResourceTemplate
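The table can be read as a small pure function. A minimal Go sketch of that mapping (the helper name is illustrative, not the actual implementation):

```go
// resolveConflictResolution returns the effective takeover behavior given
// the PropagationPolicy field (pp) and the ResourceTemplate annotation (rt).
// A valid annotation always wins; anything unset or invalid falls back to
// the field, and the overall default is "ignore".
func resolveConflictResolution(pp, rt string) string {
	switch rt {
	case "ignore", "overwrite":
		return rt // a valid RT annotation overrides the PP field
	}
	if pp == "overwrite" {
		return "overwrite"
	}
	return "ignore" // default: do not take over conflicting resources
}
```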


### Risks and Mitigations

None.



## Design Details



### API Changes

1. Add a `string` field `conflictResolution` to `PropagationSpec`:

```go
type PropagationPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior of PropagationPolicy.
	// +required
	Spec PropagationSpec `json:"spec"`
}

// PropagationSpec represents the desired behavior of PropagationPolicy.
type PropagationSpec struct {
	// ConflictResolution declares how to resolve the conflict when the
	// resource template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: add this field
	...
}
```

2. Add a `string` field `conflictResolution` to `ResourceBindingSpec`:

```go
type ResourceBinding struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior.
	Spec ResourceBindingSpec `json:"spec"`

	// Status represents the most recently observed status of the ResourceBinding.
	// +optional
	Status ResourceBindingStatus `json:"status,omitempty"`
}

// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
	// ConflictResolution declares how to resolve the conflict when the
	// resource template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: add this field
	...
}
```

3. Add a `string` field `conflictResolution` to `WorkSpec`:

```go
type Work struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior of Work.
	Spec WorkSpec `json:"spec"`

	// Status represents the most recently observed status of the Work.
	// +optional
	Status WorkStatus `json:"status,omitempty"`
}

// WorkSpec defines the desired state of Work.
type WorkSpec struct {
	// ConflictResolution declares how to resolve the conflict when the
	// resource template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: add this field

	// Workload represents the manifest workload to be deployed on managed cluster.
	Workload WorkloadTemplate `json:"workload,omitempty"`
}

// WorkloadTemplate represents the manifest workload to be deployed on managed cluster.
type WorkloadTemplate struct {
	// Manifests represents a list of resources to be deployed on the managed cluster.
	Manifests []Manifest `json:"manifests,omitempty"`
}

// Manifest represents a resource to be deployed on managed cluster.
type Manifest struct {
	runtime.RawExtension `json:",inline"`
}
```



### Process Logic

1. **ResourceDetector:** Reconcile ----> propagateResource ----> ApplyClusterPolicy ----> BuildResourceBinding ----> CreateOrUpdate

Change: in `BuildResourceBinding`, assign the `conflictResolution` value of the `PropagationPolicy` to the `ResourceBinding`, as sketched below.
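A minimal sketch of that assignment, assuming the field names from the API changes above and the usual Karmada imports (`policyv1alpha1`, `workv1alpha2`, `names`, `unstructured`, `metav1`); the real `BuildResourceBinding` carries more fields and an error path:

```go
func BuildResourceBinding(object *unstructured.Unstructured, policy *policyv1alpha1.PropagationPolicy) *workv1alpha2.ResourceBinding {
	return &workv1alpha2.ResourceBinding{
		ObjectMeta: metav1.ObjectMeta{
			Name:      names.GenerateBindingName(object.GetKind(), object.GetName()),
			Namespace: object.GetNamespace(),
		},
		Spec: workv1alpha2.ResourceBindingSpec{
			// NEW: propagate the policy's conflictResolution to the binding.
			ConflictResolution: policy.Spec.ConflictResolution,
			Resource: workv1alpha2.ObjectReference{
				APIVersion: object.GetAPIVersion(),
				Kind:       object.GetKind(),
				Namespace:  object.GetNamespace(),
				Name:       object.GetName(),
			},
		},
	}
}
```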



2. **ResourceBinding Controller:** Reconcile ----> syncBinding ----> ensureWork ----> CreateOrUpdateWork ----> mergeAnnotations

Change 1: in `CreateOrUpdateWork`, assign the `conflictResolution` of the `ResourceBinding` to the `Work`

Change 2: in `mergeAnnotations`, update the value of the `work.karmada.io/conflict-resolution` annotation on the `workload` according to the `conflictResolution` field and the workload's original `work.karmada.io/conflict-resolution` annotation, as sketched below
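A sketch of Change 2, reusing the `resolveConflictResolution` helper sketched in the constraints section (helper name and placement are illustrative):

```go
// Stamp the effective behavior onto the workload that will be wrapped into
// the Work manifest, so that the Execution Controller can act on it.
func mergeConflictResolution(workload *unstructured.Unstructured, fieldValue string) {
	annotations := workload.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["work.karmada.io/conflict-resolution"] =
		resolveConflictResolution(fieldValue, annotations["work.karmada.io/conflict-resolution"])
	workload.SetAnnotations(annotations)
}
```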



3. **Execution Controller:** Reconcile ----> syncWork ----> syncToClusters ----> tryCreateOrUpdateWorkload ----> ObjectWatcher.Update ----> allowUpdate

> In `allowUpdate`, check whether the `workload` of the `Work` carries the `work.karmada.io/conflict-resolution` annotation; only when the annotation is present with the value `overwrite` will the update be issued to the `workload`.

No changes are needed here.
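For reference, a minimal sketch of that existing gate (the upstream signature takes more parameters):

```go
// An update to a conflicting workload is allowed only when the desired
// object carries the overwrite annotation.
func allowUpdate(desiredObj *unstructured.Unstructured) bool {
	return desiredObj.GetAnnotations()["work.karmada.io/conflict-resolution"] == "overwrite"
}
```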



### Test Plan

Add e2e test cases:

1. Create a host cluster and member clusters, install Karmada in the host cluster, and have it manage the member clusters

2. Create a Deployment in a member cluster

3. Create a PropagationPolicy and ResourceTemplate in the host cluster, and verify that the takeover result for the Deployment that already exists in the member cluster matches the expected values of the `conflictResolution` field and annotation combinations described above
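A hypothetical skeleton of such a case in Ginkgo (the framework Karmada's e2e suite uses); cluster setup and assertions are only outlined:

```go
import ginkgo "github.com/onsi/ginkgo/v2"

var _ = ginkgo.Describe("[conflictResolution] seamless takeover testing", func() {
	ginkgo.It("overwrites an existing deployment when conflictResolution is overwrite", func() {
		// 1. Create a deployment named nginx-deployment directly in member1.
		// 2. On the host cluster, apply a ResourceTemplate with the same name
		//    plus a PropagationPolicy with conflictResolution: overwrite.
		// 3. Assert that the deployment in member1 is taken over and that
		//    its existing pods were not restarted.
	})
})
```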



## Alternatives

- whether this field needs to be added to `ResourceBinding`
- whether this field needs to be added to `Work`
- whether the old annotation method should be abandoned
- the function `ensureWork` may involve some refactoring
