---
title: proposal of seamless cluster migration
authors:
- "@chaosi-zju"
reviewers:
- "@RainbowMango"
- TBD
approvers:
- "@RainbowMango"
- TBD

creation-date: 2023-07-08
---

# Proposal of seamless cluster migration



## Summary

- When users migrate from a single cluster to multiple clusters, a common and troublesome problem arises: how should the resources that already exist in the original cluster be handled? In most scenarios, users hope that `Karmada` can preserve the previous running state of the existing resources in the original cluster while taking it over. We refer to this migration method as seamless migration.

- Currently, `Karmada` already has several ways to achieve seamless migration, such as [karmadactl promote](https://karmada.io/docs/administrator/migration/promote-legacy-workload), but the existing methods require operators to be aware of the resources in the original cluster and to intervene manually, so the user experience is not smooth enough.

- This proposal aims to further optimize seamless migration by extending the `Karmada` custom resource API, so that the migration process is completely transparent to users.




## Motivation

When users migrate from a single cluster to multiple clusters, they hope that `Karmada` will take over the deployments and other resources that already exist in the original single cluster. During the migration process, they expect a seamless switchover, including:

- **Pods that already exist should not be affected during the migration, which means the relevant containers should not be restarted**.

- When Pods are deployed to a new cluster, there is a verification process; the number of Pod replicas in the original cluster remains unchanged until the verification is completed.

- After the migration is completed, the total number of Pod replicas in all clusters remains unchanged.



Karmada currently has two ways to achieve seamless migration as mentioned above:

- **Method 1:** Call `karmadactl promote deployment xxx -C yyy` to take over the existing deployments and other resources in the original single cluster.
- **Method 2:** Manually write the `PropagationPolicy` and `ResourceTemplate` configurations and apply them through the API server of the host cluster. Note that **the annotation `work.karmada.io/conflict-resolution: overwrite` must be added to the `ResourceTemplate`**, indicating that when a resource with the same name already exists in the member cluster, it will be forcibly taken over by overwriting its configuration **(if left blank, the conflicting resource will not be taken over by default)**; a minimal example of this annotation is sketched after the note below.


> Detailed practical steps can be found in: [Utilizing existing policy capabilities to achieve cluster migration](https://www.yuque.com/chaosi/pwzt9c/ypugrp6452lu36ok)
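
For illustration, the annotation used in Method 2 looks like this on a `ResourceTemplate` (the Deployment name is only an example):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    work.karmada.io/conflict-resolution: overwrite # force takeover of the conflicting resource in the member cluster
spec:
  # ... the rest of the Deployment spec is unchanged ...
```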



Both methods above require **the operator to be aware of the specific native resources in the member cluster that are being migrated**. In some `CI/CD` automated pipeline scenarios, **the operator cannot manually modify native resource declarations such as `Deployment`, and the resources contained in the original cluster are not visible to the operator either**. This makes seamless migration difficult to achieve.



### Goals

Explore how `Karmada` can achieve seamless cluster migration in a more complete and universal way, so that more complex user scenarios are covered and the migration is completely transparent to users.

In detail, we propose to extend the API semantics of Karmada custom resources so that users can declare how a matched resource should be taken over when it conflicts with an existing resource in the member cluster.



### Non-Goals

If a resource conflicts in the member cluster and is therefore not taken over, how to ensure that the total number of replicas still meets the expectation of the `ResourceTemplate` follows the original logic and is not within the scope of this proposal.

> For example, the `ResourceTemplate` specifies that two replicas should be evenly allocated to two member clusters. If allocation to member cluster 1 fails because of a conflict that is not taken over, this proposal does not address how that failure is handled.


## Proposal

1. For custom resources such as `PropagationPolicy`, `ResourceBinding`, and `Work`, add a field named `conflictResolution`. The operator can use this field in the `PropagationPolicy` to specify how a matched resource should be taken over when it conflicts with an existing resource in the member cluster.
2. For native resources (the `ResourceTemplate`), keep the existing `work.karmada.io/conflict-resolution: overwrite` annotation method. Users can add the `conflict-resolution` annotation to specific resources in the `ResourceTemplate` to override the `conflictResolution` configuration of the `PropagationPolicy`.



### User Stories (Optional)

#### Story 1

Assume that there are many deployments in the original single cluster and the user wants `Karmada` to fully take over all of them.

They only need to write the following `PropagationPolicy`:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-pp
spec:
  conflictResolution: overwrite ## new field: when a resource with the same name already exists in the member cluster, take it over by overwriting it
  placement:
    clusterAffinity:
      clusterNames:
        - member1
  priority: 0
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
  schedulerName: default-scheduler
```

#### Story 2

Assume that there are many deployments in the original single cluster. The user hopes that most of the deployments will be directly taken over by `Karmada`, but that takeover will be skipped for a few special deployments whose names conflict with resources in the member cluster.

Based on the `PropagationPolicy` of Story 1, users only need to add the `conflict-resolution` annotation to the `ResourceTemplate` of each individual special `Deployment`, for example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    work.karmada.io/conflict-resolution: ignore ## keep the original annotation semantics and explicitly indicate that takeover is skipped when the resource name conflicts in the member cluster
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```

#### Story 3

Similarly, suppose multiple `Deployments` are matched by one `PropagationPolicy`, and the user wants `Karmada` to skip taking over conflicting `Deployments` by default, but to force takeover for a few specified conflicting `Deployments`:

A feasible practice is to declare `conflictResolution: ignore` in the `PropagationPolicy` (or leave the field blank), and to annotate the corresponding `ResourceTemplate` with `work.karmada.io/conflict-resolution: overwrite`, as sketched below.
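
For illustration, the combination could look like this (resource names are only examples):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-pp
spec:
  conflictResolution: ignore # or simply leave this field unset
  placement:
    clusterAffinity:
      clusterNames:
        - member1
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: special-deployment
  annotations:
    work.karmada.io/conflict-resolution: overwrite # force takeover for this specific Deployment only
spec:
  # ... unchanged Deployment spec ...
```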



### Notes/Constraints/Caveats (Optional)

If the `conflictResolution` field of the `PropagationPolicy` and the `conflict-resolution` annotation of the `ResourceTemplate` are specified at the same time, the constraints below apply:

1. Priority: the `conflict-resolution` annotation in the `ResourceTemplate` **>** the `conflictResolution` field in the `PropagationPolicy`
2. Results for the different value combinations:

| PP \ RT | not set | ignore | overwrite | null or invalid |
|---------------------|-----------|--------|-----------|-----------------|
| **not set** | ignore | ignore | overwrite | ignore |
| **ignore** | ignore | ignore | overwrite | ignore |
| **overwrite** | overwrite | ignore | overwrite | overwrite |
| **null or invalid** | ignore | ignore | overwrite | ignore |

> PP refers to `PropagationPolicy` (table rows), RT refers to `ResourceTemplate` (table columns).
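
The rule in the table can be summarized by a small sketch (the function name and string values are illustrative, not the final API):

```go
// effectiveConflictResolution returns the takeover behavior implied by the table:
// a valid annotation on the ResourceTemplate wins; otherwise the PropagationPolicy
// field decides, and anything other than "overwrite" falls back to "ignore".
func effectiveConflictResolution(ppField, rtAnnotation string) string {
	if rtAnnotation == "overwrite" || rtAnnotation == "ignore" {
		return rtAnnotation
	}
	if ppField == "overwrite" {
		return "overwrite"
	}
	return "ignore"
}
```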


### Risks and Mitigations

None.



## Design Details



### API Modification

1. Add a `string` field `conflictResolution` to `PropagationSpec`

```go
type PropagationPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior of PropagationPolicy.
	// +required
	Spec PropagationSpec `json:"spec"`
}

// PropagationSpec represents the desired behavior of PropagationPolicy.
type PropagationSpec struct {
	// ConflictResolution declares how to resolve the conflict when the resource
	// template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: new field to be added
	...
}
```

2. Add a `string` field `conflictResolution` to `ResourceBindingSpec`

```go
type ResourceBinding struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior.
	Spec ResourceBindingSpec `json:"spec"`

	// Status represents the most recently observed status of the ResourceBinding.
	// +optional
	Status ResourceBindingStatus `json:"status,omitempty"`
}

// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
	// ConflictResolution declares how to resolve the conflict when the resource
	// template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: new field to be added
	...
}
```

3. Add a `string` field `conflictResolution` to `WorkSpec`

```go
type Work struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec represents the desired behavior of Work.
	Spec WorkSpec `json:"spec"`

	// Status represents the status of Work.
	// +optional
	Status WorkStatus `json:"status,omitempty"`
}

// WorkSpec defines the desired state of Work.
type WorkSpec struct {
	// ConflictResolution declares how to resolve the conflict when the resource
	// template conflicts with an existing resource in a member cluster.
	// +optional
	ConflictResolution string `json:"conflictResolution,omitempty"` // TODO: new field to be added

	// Workload represents the manifest workload to be deployed on managed cluster.
	Workload WorkloadTemplate `json:"workload,omitempty"`
}

// WorkloadTemplate represents the manifest workload to be deployed on managed cluster.
type WorkloadTemplate struct {
	// Manifests represents a list of resources to be deployed on the managed cluster.
	Manifests []Manifest `json:"manifests,omitempty"`
}

// Manifest represents a resource to be deployed on managed cluster.
type Manifest struct {
	runtime.RawExtension `json:",inline"`
}
```



### Process Logic

1. **ResourceDetector:** Reconcile ----> propagateResource ----> ApplyClusterPolicy ----> BuildResourceBinding ----> CreateOrUpdate

Change: in `BuildResourceBinding`, copy the `conflictResolution` value of the `PropagationPolicy` into the `ResourceBinding`.
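
A minimal sketch of this change, assuming the Go field is named `ConflictResolution` as in the API sketch above (package aliases and variable names are illustrative):

```go
// Inside BuildResourceBinding: propagate the policy-level setting into the binding spec.
binding := &workv1alpha2.ResourceBinding{
	ObjectMeta: metav1.ObjectMeta{Name: bindingName, Namespace: object.GetNamespace()},
	Spec: workv1alpha2.ResourceBindingSpec{
		ConflictResolution: policySpec.ConflictResolution, // new: inherited from the PropagationPolicy
		// ... existing fields (Resource, ReplicaRequirements, ...) unchanged ...
	},
}
```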



2. **ResourceBinding Controller:** Reconcile ----> syncBinding ----> ensureWork ----> CreateOrUpdateWork ----> mergeAnnotations

Change 1: in `CreateOrUpdateWork`, copy the `conflictResolution` value of the `ResourceBinding` into the `Work`.

Change 2: in `mergeAnnotations`, update the value of the `conflict-resolution` annotation on the `workload` based on the `conflictResolution` field and the annotation originally carried by the `workload`.
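
A minimal sketch of the `mergeAnnotations` change, assuming a hypothetical helper name and reusing the precedence rule from the table above:

```go
// mergeConflictResolutionAnnotation fills the workload's conflict-resolution
// annotation from the binding-level field, without overriding a valid value
// already set on the resource template.
func mergeConflictResolutionAnnotation(workload *unstructured.Unstructured, bindingValue string) {
	annotations := workload.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	current := annotations["work.karmada.io/conflict-resolution"]
	// A valid annotation ("overwrite" or "ignore") on the resource template wins;
	// otherwise fall back to the binding-level field.
	if current != "overwrite" && current != "ignore" && bindingValue == "overwrite" {
		annotations["work.karmada.io/conflict-resolution"] = "overwrite"
	}
	workload.SetAnnotations(annotations)
}
```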



3. **Execution Controller:** Reconcile ----> syncWork ----> syncToClusters ----> tryCreateOrUpdateWorkload ----> ObjectWatcher.Update ----> allowUpdate

> In `allowUpdate`, check whether the `workload` carried by the `Work` contains the `work.karmada.io/conflict-resolution` annotation. Only when this annotation is present with the value `overwrite` will the update be applied to the conflicting `workload` in the member cluster.

No changes are required here.
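
For reference, a simplified sketch of what this existing check amounts to (the real `allowUpdate` implementation may consider additional conditions):

```go
// Simplified: a conflicting workload in the member cluster may only be
// overwritten when the desired workload carries the overwrite annotation.
func allowOverwrite(desired *unstructured.Unstructured) bool {
	return desired.GetAnnotations()["work.karmada.io/conflict-resolution"] == "overwrite"
}
```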



### Test Plan

Add e2e test cases:

1) Create a host cluster and member clusters, install `Karmada` in the host cluster, and join the member clusters.

2) Create a `Deployment` in a member cluster.

3) Create a `PropagationPolicy` and a `ResourceTemplate` in the host cluster, then verify that the takeover result for the `Deployment` that already exists in the member cluster matches the expectation for each combination of the `conflictResolution` field and the annotation described above.
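
A rough outline of one such case, assuming hypothetical test helpers (`createDeploymentInMember`, `applyPolicyAndTemplateInHost`, ...) provided by the e2e framework:

```go
func TestOverwriteConflictingDeployment(t *testing.T) {
	// Step 2: a Deployment already exists in the member cluster.
	original := createDeploymentInMember(t, "member1", "nginx-deployment")

	// Step 3: apply a PropagationPolicy with conflictResolution: overwrite and the
	// matching resource template in the host cluster.
	applyPolicyAndTemplateInHost(t, "deployments-pp", "overwrite", "nginx-deployment")

	// Expectation: the pre-existing Deployment is taken over by Karmada and its
	// Pods are not restarted during the takeover.
	assertTakenOverByKarmada(t, "member1", original)
	assertPodsNotRestarted(t, "member1", original)
}
```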



## Alternatives

- Whether this field needs to be added to `ResourceBinding`.
- Whether this field needs to be added to `Work`.
- Whether the old annotation method should be abandoned.
- The function `ensureWork` may involve some refactoring.
