Added proposal about seamless cluster migration
Signed-off-by: chaosi-zju <chaosi@zju.edu.cn>
1 parent 355b52c, commit a34d97a
Showing 6 changed files with 1,100 additions and 0 deletions.
docs/proposals/migration/design-of-seamless-cluster-migration-scheme.md (317 changes: 317 additions & 0 deletions)
---
title: proposal of seamless cluster migration
authors:
- "@chaosi-zju"
reviewers:
- "@RainbowMango"
- TBD
approvers:
- "@RainbowMango"
- TBD

creation-date: 2023-07-08
---

# Proposal of seamless cluster migration

## Summary

- When users migrate from a single cluster to multiple clusters, a common and troublesome problem arises: how should the resources that already exist in the original cluster be handled? In most scenarios, users hope that `Karmada` can preserve the running state of existing resources while it takes over the original cluster. We refer to this migration method as seamless migration.

- Karmada currently provides two ways to migrate resources from existing clusters to `Karmada`, and they are sufficient for most scenarios.

- This proposal aims to further optimize seamless migration by extending the `Karmada` custom resource API, so that more scenarios are covered and users remain completely unaware of the migration while it happens.

<br/>

## Motivation

### Scenarios analysis

#### Background

Typically, users deploy to multiple clusters through a CI/CD pipeline, as shown in the following figure.

![image-20230718101856820](./statics/pic1.png)

However, as the number of clusters grows and managing them individually becomes difficult, users hope to migrate to [Karmada](https://karmada.io/docs/) for multi-cluster management.

So here comes the problem: **can the pods that already exist remain unaffected during the migration, meaning that the relevant containers are not restarted?**

#### Current method

**Karmada currently provides two ways to migrate resources from existing clusters to `Karmada`, and they are sufficient for most scenarios.**

**Method one: By command line tool**

Run a command such as `karmadactl promote deployment xxx -C yyy`. For details, refer to [karmadactl promote](https://karmada.io/docs/administrator/migration/promote-legacy-workload).

![image-20230718110017567](./statics/pic2.png)

**Method two: By automated configuration**

Manually write the `PropagationPolicy` and `ResourceTemplate` configurations, add `work.karmada.io/conflict-resolution: overwrite` to the annotations of the `ResourceTemplate`, and finally call the API of `karmada-apiserver` to apply the configuration.

> The annotation indicates that if a resource with a conflicting name exists in a member cluster, it will be forcibly taken over by overwriting its configuration.

![image-20230718110017567](./statics/pic3.png)

Both methods require collaboration between two roles (`User` and `Operator`). If the people in both roles are developers, both methods are effective.

So, we define the above two methods as "methods for developers".

#### Optimized method

In some particular scenarios, the `User` and the `Operator` are two independent roles:

* The `User` only knows Kubernetes native resources. They are not aware of Karmada and do not want to modify the original YAML config.

* Migration is solely the responsibility of the `Operator`, who can only manage Karmada APIs such as `PropagationPolicy` and is completely unaware of the original resources defined by the `User`.

To meet these scenarios, I want to introduce an optimized method:

![image-20230718110017567](./statics/pic4.png)

We define this method as a "method for platform administrators/SREs".

### Goals

Explore how to enable `Karmada` to achieve seamless cluster migration in a more complete and universal way, so that it covers more complex user scenarios and the migration is fully transparent to users.

In detail, we propose to extend the API semantics of Karmada custom resources to support declaring how to take over an existing resource when there is a conflict in the member cluster.

<br/>

## Proposal

1. For custom resources such as `PropagationPolicy` and `ResourceBinding`, add a field called `conflictResolution`.
The operator can use this field in the `PropagationPolicy` to declare how a potential conflict should be handled when a resource that is being propagated already exists in the target cluster.

2. The field `conflictResolution` has the following values:

* `Overwrite`: resolve the conflict by overwriting the resource with the propagating resource template.
* `Abort`: do not resolve the conflict and stop propagating, to avoid unexpected overwrites (default value).

It defaults to "Abort" in order to avoid unexpected overwrites. "Overwrite" might be useful when migrating legacy cluster resources to Karmada,
in which case the conflict is predictable and Karmada can be instructed to take over the resource by overwriting it.

> A point worth adding: for native resources such as the `ResourceTemplate`, the `work.karmada.io/conflict-resolution: overwrite` annotation method is preserved.
> Users can add the `conflict-resolution` annotation to specific resources in the `ResourceTemplate` to override the `conflictResolution` configuration of the `PropagationPolicy`.

### User Stories (Optional)

#### Story 1

This story covers the scenario described in [Optimized method](#optimized-method).

The existing methods are unable to cope with this scenario.

If this proposal is implemented, the operator can resolve it by writing a `PropagationPolicy` like this:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-pp
spec:
  conflictResolution: Overwrite  # New field: a resource with a conflicting name in the member cluster is taken over by overwriting it
  placement:
    clusterAffinity:
      clusterNames:
        - member1
  priority: 0
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
  schedulerName: default-scheduler
```

Summary: the pain point in this scenario is that the operator can only manage the `PropagationPolicy`. With this method, they only need to add the field `conflictResolution` to the `PropagationPolicy`.

#### Story 2

Suppose there are a thousand `Deployments` that need to be forcibly taken over when conflicts occur.

The existing method can solve the problem by promoting the resources one by one, which is feasible but inefficient.

If this proposal is implemented, the operator can resolve it by writing the same `PropagationPolicy` as in [Story 1](#story-1).

Summary: users can use `resourceSelectors` in the `PropagationPolicy` to apply the `conflictResolution` configuration to a batch of `Deployments`.

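A minimal Go sketch of why one policy covers the whole batch in Story 2: a selector that names only `apiVersion` and `kind` matches every resource of that kind. The `selector` and `resource` types and the `matches` and `countMatches` functions are hypothetical simplifications for illustration, not Karmada's matching code, and the rule that an empty name or namespace acts as a wildcard is an assumption of this sketch.

```go
package main

import "fmt"

// selector is a hypothetical, simplified stand-in for one entry of
// resourceSelectors in a PropagationPolicy.
type selector struct {
	APIVersion string
	Kind       string
	Namespace  string // assumption: empty matches any namespace
	Name       string // assumption: empty matches any name
}

// resource is a hypothetical, minimal description of a resource template.
type resource struct {
	APIVersion, Kind, Namespace, Name string
}

// matches reports whether the selector selects the resource.
func matches(s selector, r resource) bool {
	if s.APIVersion != r.APIVersion || s.Kind != r.Kind {
		return false
	}
	if s.Namespace != "" && s.Namespace != r.Namespace {
		return false
	}
	return s.Name == "" || s.Name == r.Name
}

// countMatches counts how many resources the selector selects.
func countMatches(s selector, rs []resource) int {
	n := 0
	for _, r := range rs {
		if matches(s, r) {
			n++
		}
	}
	return n
}

func main() {
	// Same selector as in the Story 1 policy: all apps/v1 Deployments.
	sel := selector{APIVersion: "apps/v1", Kind: "Deployment"}

	var rs []resource
	for i := 0; i < 1000; i++ {
		rs = append(rs, resource{"apps/v1", "Deployment", "default", fmt.Sprintf("app-%d", i)})
	}
	rs = append(rs, resource{"v1", "ConfigMap", "default", "cfg"})

	fmt.Println(countMatches(sel, rs)) // 1000: every Deployment, not the ConfigMap
}
```

Because the `conflictResolution` value lives on the policy, each selected `Deployment` inherits it without any per-resource command.
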
#### Story 3

Assume that there are many deployments in the original single cluster. Users hope that most of them will be taken over by `Karmada` directly, but that a few special deployments will not be taken over when their names conflict with resources in the member cluster.

Based on the `PropagationPolicy` of [Story 1](#story-1), users only need to add the `conflict-resolution` annotation to the `ResourceTemplate` of each special `Deployment`, for example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    work.karmada.io/conflict-resolution: abort  # Keeps the semantics of the original annotation and explicitly indicates that takeover is skipped when a resource with a conflicting name exists in the member cluster
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```

#### Story 4

Similarly, suppose multiple `Deployments` are selected by one `PropagationPolicy`, and users hope that `Karmada` skips taking over conflicting `Deployments` by default but forcibly takes over individually specified conflicting `Deployments`:

A feasible practice is to declare `conflictResolution: Abort` in the `PropagationPolicy` (or leave it unset), and to add the annotation `work.karmada.io/conflict-resolution: overwrite` to the `ResourceTemplate` of each `Deployment` that should be taken over.

### Notes/Constraints/Caveats (Optional)

If the `conflictResolution` field of the `PropagationPolicy` and the `conflict-resolution` annotation of the `ResourceTemplate` are specified at the same time, the constraints below apply:

1. Priority: the `conflict-resolution` annotation in the `ResourceTemplate` **>** the `conflictResolution` field in the `PropagationPolicy`.

2. Results for each pair of values:

| PP \ RT            | not set / null | abort | overwrite |
| ------------------ | -------------- | ----- | --------- |
| **not set / null** | abort          | abort | overwrite |
| **Abort**          | abort          | abort | overwrite |
| **Overwrite**      | overwrite      | abort | overwrite |

> PP refers to PropagationPolicy, RT refers to ResourceTemplate.

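The priority rule and the table above can be sketched as a small Go function. This is only an illustration under stated assumptions: `effectiveResolution` is a hypothetical helper name, and treating any annotation value other than `overwrite` or `abort` as "not set" is an assumption of this sketch rather than something the proposal specifies.

```go
package main

import "fmt"

// ConflictResolution mirrors the enum proposed in this document.
type ConflictResolution string

const (
	ConflictOverwrite ConflictResolution = "Overwrite"
	ConflictAbort     ConflictResolution = "Abort"
)

// effectiveResolution (hypothetical helper) combines the PropagationPolicy
// field with the ResourceTemplate annotation. The annotation wins when it
// carries a recognized value; otherwise the policy value applies, and an
// unset policy value falls back to Abort.
func effectiveResolution(policy ConflictResolution, annotation string) ConflictResolution {
	switch annotation {
	case "overwrite":
		return ConflictOverwrite
	case "abort":
		return ConflictAbort
	}
	// Assumption: anything else is treated like an unset annotation.
	if policy == ConflictOverwrite {
		return ConflictOverwrite
	}
	return ConflictAbort
}

func main() {
	policies := []ConflictResolution{"", ConflictAbort, ConflictOverwrite}
	annotations := []string{"", "abort", "overwrite"}
	// Print the nine combinations in the same order as the table.
	for _, p := range policies {
		for _, a := range annotations {
			fmt.Printf("PP=%q RT=%q -> %s\n", p, a, effectiveResolution(p, a))
		}
	}
}
```

Stories 3 and 4 correspond to the `Overwrite`/`abort` and `Abort`/`overwrite` cells respectively.
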
### Risks and Mitigations

None.

<br/>

## Design Details

### API Modify

1. Define an enum type with two values for the `conflictResolution` field:

```go
// ConflictResolution describes how to resolve the conflict during the process
// of propagation, especially when the resource already exists in a member cluster.
type ConflictResolution string

const (
	// ConflictOverwrite means that the conflict is resolved by overwriting the
	// resource with the propagating resource template.
	ConflictOverwrite ConflictResolution = "Overwrite"
	// ConflictAbort means that the conflict is not resolved and propagation stops.
	ConflictAbort ConflictResolution = "Abort"
)
```

2. Add the string-typed field `conflictResolution` to `PropagationSpec`:

```go
// PropagationSpec represents the desired behavior of PropagationPolicy.
type PropagationSpec struct {
	// ConflictResolution declares how potential conflict should be handled when
	// a resource that is being propagated already exists in the target cluster.
	//
	// It defaults to "Abort" which means stop propagating to avoid unexpected
	// overwrites. The "Overwrite" might be useful when migrating legacy cluster
	// resources to Karmada, in which case conflict is predictable and Karmada
	// can be instructed to take over the resource by overwriting.
	//
	// +kubebuilder:default="Abort"
	// +kubebuilder:validation:Enum=Abort;Overwrite
	// +optional
	ConflictResolution ConflictResolution `json:"conflictResolution,omitempty"`

	// ... (other existing fields omitted)
}
```

3. Add the string-typed field `conflictResolution` to `ResourceBinding`/`ClusterResourceBinding`:

```go
// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
	// ConflictResolution declares how potential conflict should be handled when
	// a resource that is being propagated already exists in the target cluster.
	//
	// It defaults to "Abort" which means stop propagating to avoid unexpected
	// overwrites. The "Overwrite" might be useful when migrating legacy cluster
	// resources to Karmada, in which case conflict is predictable and Karmada
	// can be instructed to take over the resource by overwriting.
	//
	// +kubebuilder:default="Abort"
	// +kubebuilder:validation:Enum=Abort;Overwrite
	// +optional
	ConflictResolution policyv1alpha1.ConflictResolution `json:"conflictResolution,omitempty"`

	// ... (other existing fields omitted)
}
```

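To make the effect of the markers and the `omitempty` tag more concrete, here is a self-contained Go sketch. It assumes a trimmed-down `propagationSpec` type and two hypothetical helpers, `withDefault` and `validate`, that only imitate what `+kubebuilder:default` and `+kubebuilder:validation:Enum` are meant to achieve. In a real cluster the defaulting and validation would come from the generated CRD schema, not from code like this.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ConflictResolution string

const (
	ConflictOverwrite ConflictResolution = "Overwrite"
	ConflictAbort     ConflictResolution = "Abort"
)

// propagationSpec is a hypothetical stand-in that keeps only the new field.
type propagationSpec struct {
	ConflictResolution ConflictResolution `json:"conflictResolution,omitempty"`
}

// withDefault imitates +kubebuilder:default="Abort" (illustration only).
func withDefault(s propagationSpec) propagationSpec {
	if s.ConflictResolution == "" {
		s.ConflictResolution = ConflictAbort
	}
	return s
}

// validate imitates +kubebuilder:validation:Enum=Abort;Overwrite.
// It is meant to run after defaulting, so an empty value is rejected.
func validate(s propagationSpec) error {
	switch s.ConflictResolution {
	case ConflictAbort, ConflictOverwrite:
		return nil
	}
	return fmt.Errorf("unsupported conflictResolution %q", s.ConflictResolution)
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	unset := propagationSpec{}
	fmt.Println(mustJSON(unset))              // {}
	fmt.Println(mustJSON(withDefault(unset))) // {"conflictResolution":"Abort"}
	fmt.Println(validate(propagationSpec{ConflictResolution: "Merge"}))
}
```

Because of `omitempty`, existing policies that never set the field serialize exactly as before, which is what keeps the change backward compatible.
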
### Process Logic

1. **ResourceDetector:**

Change: assign the `conflictResolution` value of the `PropagationPolicy` to the `ResourceBinding`/`ClusterResourceBinding`.

2. **ResourceBinding Controller:**

Change: update the value of the `conflict-resolution` annotation in the `workload` based on the `conflictResolution` field and the original `conflict-resolution` annotation in the `workload`.

3. **Execution Controller:**

> Previous: determine whether the `work.karmada.io/conflict-resolution` annotation is included in the `workload` of the `Work`.
> Only with this annotation and a value of `overwrite` will the `workload` be synchronized to member clusters.

No changes.

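The three steps can be traced end to end with a rough Go sketch. Everything here is hypothetical: the `policy`, `binding`, and `workload` types and the `detect`, `buildWorkload`, and `shouldSync` functions are illustrative stand-ins, not Karmada controller code. The sketch also assumes that the binding controller writes an explicit `abort` annotation when nothing requests overwriting, and that `conflictExists` is supplied by whatever check detects the naming conflict.

```go
package main

import "fmt"

const conflictResolutionAnnotation = "work.karmada.io/conflict-resolution"

// Hypothetical, minimal stand-ins for the objects involved.
type policy struct{ ConflictResolution string } // "", "Abort" or "Overwrite"
type binding struct{ ConflictResolution string }
type workload struct{ Annotations map[string]string }

// detect models step 1: the detector copies the policy value to the binding.
func detect(p policy) binding {
	return binding{ConflictResolution: p.ConflictResolution}
}

// buildWorkload models step 2: an annotation already present on the template
// wins; otherwise the binding value is written in annotation form.
func buildWorkload(b binding, template workload) workload {
	out := workload{Annotations: map[string]string{}}
	for k, v := range template.Annotations {
		out.Annotations[k] = v
	}
	if v := out.Annotations[conflictResolutionAnnotation]; v == "overwrite" || v == "abort" {
		return out
	}
	if b.ConflictResolution == "Overwrite" {
		out.Annotations[conflictResolutionAnnotation] = "overwrite"
	} else {
		// Assumption: an explicit "abort" is written for the default case.
		out.Annotations[conflictResolutionAnnotation] = "abort"
	}
	return out
}

// shouldSync models step 3, which stays unchanged: a conflicting resource is
// only synchronized when the annotation says "overwrite".
func shouldSync(w workload, conflictExists bool) bool {
	if !conflictExists {
		return true
	}
	return w.Annotations[conflictResolutionAnnotation] == "overwrite"
}

func main() {
	b := detect(policy{ConflictResolution: "Overwrite"})

	plain := buildWorkload(b, workload{})
	fmt.Println(shouldSync(plain, true)) // true: policy asks for takeover

	special := buildWorkload(b, workload{Annotations: map[string]string{
		conflictResolutionAnnotation: "abort",
	}})
	fmt.Println(shouldSync(special, true)) // false: template annotation wins
}
```

Keeping the decision in the annotation is what lets the execution controller stay untouched: only the two upstream steps need to know about the new field.
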
### Test Plan

Add e2e test cases:

1) Create a host cluster and member clusters, install `Karmada` in the host cluster, and join the member clusters.

2) Create a `Deployment` in a member cluster.

3) Create a `PropagationPolicy` and `ResourceTemplate` in the host cluster, and verify that the takeover results for the `Deployments` that
already exist in the member cluster match the expected values for the `conflictResolution` field and the other settings mentioned above.

<br/>

## Alternatives

### Abandon old annotation method

**Q: Should the old annotation method be abandoned?**

A: The old annotation method does have fewer usage scenarios, and users are more likely to use the new method, but I prefer to keep the old annotation method because:

* It keeps compatibility.
* It allows individual exceptions to the `conflictResolution` value of the `PropagationPolicy`, which gives users more flexibility in defining their configuration.

### Less API modify

**Q: Is it necessary to modify the API of `ResourceBinding`?**

A: Honestly, there is no need to modify the API of `ResourceBinding` from the implementation perspective (`conflictResolution` could also be declared through annotations).
Not modifying the API would even make the code cleaner, but I have the following reasons for doing so:

1) We not only want to cover the scenarios in the stories above, but also want to formally recommend the capability and practice of seamless migration to users.
Adding this field to CRDs, including `ResourceBinding`, demonstrates this capability to users more clearly than adding annotations.

2) Adding annotations is just a **compatible** way to handle individual exceptions; removing it would still be justifiable. Assuming it did not exist,
we would still need to modify the API of `ResourceBinding`. In other words, the annotation is just an add-on, and our design should not depend too heavily on it.

3) It is more convenient for code implementation.

<br/>