Extend Binding API for graceful eviction #2273

Merged
53 changes: 53 additions & 0 deletions api/openapi-spec/swagger.json
@@ -15873,6 +15873,46 @@
}
]
},
"com.github.karmada-io.karmada.pkg.apis.work.v1alpha2.GracefulEvictionTask": {
"description": "GracefulEvictionTask represents a graceful eviction task.",
"type": "object",
"required": [
"fromCluster",
"reason",
"producer"
],
"properties": {
"creationTimestamp": {
"description": "CreationTimestamp is a timestamp representing the server time when this object was created. Clients should not set this value to avoid the time inconsistency issue. It is represented in RFC3339 form(like '2021-04-25T10:02:10Z') and is in UTC.\n\nPopulated by the system. Read-only.",
"default": {},
"$ref": "#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.Time"
},
"fromCluster": {
"description": "FromCluster which cluster the eviction perform from.",
"type": "string",
"default": ""
},
"message": {
"description": "Message is a human-readable message indicating details about the eviction. This may be an empty string.",
"type": "string"
},
"producer": {
"description": "Producer indicates the controller who triggered the eviction.",
"type": "string",
"default": ""
},
"reason": {
"description": "Reason contains a programmatic identifier indicating the reason for the eviction. Producers may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty.",
"type": "string",
"default": ""
},
"replicas": {
"description": "Replicas indicates the number of replicas should be evicted. Should be ignored for resource type that doesn't have replica.",
"type": "integer",
"format": "int32"
}
}
},
"com.github.karmada-io.karmada.pkg.apis.work.v1alpha2.NodeClaim": {
"description": "NodeClaim represents the node claim HardNodeAffinity, NodeSelector and Tolerations required by each replica.",
"type": "object",
@@ -16044,6 +16084,14 @@
"$ref": "#/definitions/com.github.karmada-io.karmada.pkg.apis.work.v1alpha2.TargetCluster"
}
},
"gracefulEvictionTasks": {
"description": "GracefulEvictionTasks holds the eviction tasks that are expected to perform the eviction in a graceful way. The intended workflow is: 1. Once the controller(such as 'taint-manager') decided to evict the resource that\n is referenced by current ResourceBinding or ClusterResourceBinding from a target\n cluster, it removes(or scale down the replicas) the target from Clusters(.spec.Clusters)\n and builds a graceful eviction task.\n2. The scheduler may perform a re-scheduler and probably select a substitute cluster\n to take over the evicting workload(resource).\n3. The graceful eviction controller takes care of the graceful eviction tasks and\n performs the final removal after the workload(resource) is available on the substitute\n cluster or exceed the grace termination period(defaults to 10 minutes).",
"type": "array",
"items": {
"default": {},
"$ref": "#/definitions/com.github.karmada-io.karmada.pkg.apis.work.v1alpha2.GracefulEvictionTask"
}
},
"propagateDeps": {
"description": "PropagateDeps tells if relevant resources should be propagated automatically. It is inherited from PropagationPolicy or ClusterPropagationPolicy. default false.",
"type": "boolean"
@@ -16091,6 +16139,11 @@
"default": {},
"$ref": "#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.Condition"
}
},
"schedulerObservedGeneration": {
"description": "SchedulerObservedGeneration is the generation(.metadata.generation) observed by the scheduler. If SchedulerObservedGeneration is less than the generation in metadata means the scheduler hasn't confirmed the scheduling result or hasn't done the schedule yet.",
"type": "integer",
"format": "int64"
}
}
},
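The schema marks fromCluster, reason and producer as required, and the Go JSON tags line up with that: those three fields are always serialized, while replicas and message carry omitempty and are dropped when unset. A small Go sketch using a local copy of the tags (the type and function names are hypothetical; CreationTimestamp is left out to avoid the k8s.io/apimachinery dependency, and in the real type a zero metav1.Time serializes as null rather than being omitted):

```go
package main

import "encoding/json"

// gracefulEvictionTaskJSON copies the JSON tags of v1alpha2.GracefulEvictionTask,
// minus CreationTimestamp (which needs k8s.io/apimachinery).
type gracefulEvictionTaskJSON struct {
	FromCluster string `json:"fromCluster"`
	Replicas    *int32 `json:"replicas,omitempty"`
	Reason      string `json:"reason"`
	Message     string `json:"message,omitempty"`
	Producer    string `json:"producer"`
}

// encodeTask marshals a task; unset optional fields (nil Replicas, empty Message)
// are omitted because of their omitempty tags.
func encodeTask(t gracefulEvictionTaskJSON) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}
```

For a task with only the required fields set (member1, TaintUntolerated, TaintManager), encodeTask returns {"fromCluster":"member1","reason":"TaintUntolerated","producer":"TaintManager"}.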

@@ -264,6 +264,66 @@ spec:
- name
type: object
type: array
gracefulEvictionTasks:
description: 'GracefulEvictionTasks holds the eviction tasks that
are expected to perform the eviction in a graceful way. The intended
workflow is: 1. Once the controller(such as ''taint-manager'') decided
to evict the resource that is referenced by current ResourceBinding
or ClusterResourceBinding from a target cluster, it removes(or scale
down the replicas) the target from Clusters(.spec.Clusters) and
builds a graceful eviction task. 2. The scheduler may perform a
re-scheduler and probably select a substitute cluster to take over
the evicting workload(resource). 3. The graceful eviction controller
takes care of the graceful eviction tasks and performs the final
removal after the workload(resource) is available on the substitute
cluster or exceed the grace termination period(defaults to 10 minutes).'
items:
description: GracefulEvictionTask represents a graceful eviction
task.
properties:
creationTimestamp:
description: "CreationTimestamp is a timestamp representing
the server time when this object was created. Clients should
not set this value to avoid the time inconsistency issue.
It is represented in RFC3339 form(like '2021-04-25T10:02:10Z')
and is in UTC. \n Populated by the system. Read-only."
format: date-time
type: string
fromCluster:
description: FromCluster which cluster the eviction perform
from.
type: string
message:
description: Message is a human-readable message indicating
details about the eviction. This may be an empty string.
maxLength: 1024
type: string
producer:
description: Producer indicates the controller who triggered
the eviction.
type: string
reason:
description: Reason contains a programmatic identifier indicating
the reason for the eviction. Producers may define expected
values and meanings for this field, and whether the values
are considered a guaranteed API. The value should be a CamelCase
string. This field may not be empty.
maxLength: 32
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
replicas:
description: Replicas indicates the number of replicas should
be evicted. Should be ignored for resource type that doesn't
have replica.
format: int32
type: integer
required:
- fromCluster
- producer
- reason
type: object
type: array
propagateDeps:
description: PropagateDeps tells if relevant resources should be propagated
automatically. It is inherited from PropagationPolicy or ClusterPropagationPolicy.
@@ -609,6 +669,13 @@ spec:
- type
type: object
type: array
schedulerObservedGeneration:
description: SchedulerObservedGeneration is the generation(.metadata.generation)
observed by the scheduler. If SchedulerObservedGeneration is less
than the generation in metadata means the scheduler hasn't confirmed
the scheduling result or hasn't done the schedule yet.
format: int64
type: integer
type: object
required:
- spec
67 changes: 67 additions & 0 deletions charts/karmada/_crds/bases/work.karmada.io_resourcebindings.yaml
@@ -264,6 +264,66 @@ spec:
- name
type: object
type: array
gracefulEvictionTasks:
description: 'GracefulEvictionTasks holds the eviction tasks that
are expected to perform the eviction in a graceful way. The intended
workflow is: 1. Once the controller(such as ''taint-manager'') decided
to evict the resource that is referenced by current ResourceBinding
or ClusterResourceBinding from a target cluster, it removes(or scale
down the replicas) the target from Clusters(.spec.Clusters) and
builds a graceful eviction task. 2. The scheduler may perform a
re-scheduler and probably select a substitute cluster to take over
the evicting workload(resource). 3. The graceful eviction controller
takes care of the graceful eviction tasks and performs the final
removal after the workload(resource) is available on the substitute
cluster or exceed the grace termination period(defaults to 10 minutes).'
items:
description: GracefulEvictionTask represents a graceful eviction
task.
properties:
creationTimestamp:
description: "CreationTimestamp is a timestamp representing
the server time when this object was created. Clients should
not set this value to avoid the time inconsistency issue.
It is represented in RFC3339 form(like '2021-04-25T10:02:10Z')
and is in UTC. \n Populated by the system. Read-only."
format: date-time
type: string
fromCluster:
description: FromCluster which cluster the eviction perform
from.
type: string
message:
description: Message is a human-readable message indicating
details about the eviction. This may be an empty string.
maxLength: 1024
type: string
producer:
description: Producer indicates the controller who triggered
the eviction.
type: string
reason:
description: Reason contains a programmatic identifier indicating
the reason for the eviction. Producers may define expected
values and meanings for this field, and whether the values
are considered a guaranteed API. The value should be a CamelCase
string. This field may not be empty.
maxLength: 32
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
replicas:
description: Replicas indicates the number of replicas should
be evicted. Should be ignored for resource type that doesn't
have replica.
format: int32
type: integer
required:
- fromCluster
- producer
- reason
type: object
type: array
propagateDeps:
description: PropagateDeps tells if relevant resources should be propagated
automatically. It is inherited from PropagationPolicy or ClusterPropagationPolicy.
@@ -609,6 +669,13 @@ spec:
- type
type: object
type: array
schedulerObservedGeneration:
description: SchedulerObservedGeneration is the generation(.metadata.generation)
observed by the scheduler. If SchedulerObservedGeneration is less
than the generation in metadata means the scheduler hasn't confirmed
the scheduling result or hasn't done the schedule yet.
format: int64
type: integer
type: object
required:
- spec
63 changes: 63 additions & 0 deletions pkg/apis/work/v1alpha2/binding_types.go
@@ -72,6 +72,22 @@ type ResourceBindingSpec struct {
// +optional
Clusters []TargetCluster `json:"clusters,omitempty"`

// GracefulEvictionTasks holds the eviction tasks that are expected to perform
// the eviction in a graceful way.
// The intended workflow is:
// 1. Once the controller(such as 'taint-manager') decided to evict the resource that
// is referenced by current ResourceBinding or ClusterResourceBinding from a target
// cluster, it removes(or scale down the replicas) the target from Clusters(.spec.Clusters)
Member

> (or scale down the replicas)

Do we need to move the scaled-down replicas into GracefulEvictionTasks?

Member Author

Yes, e.g.

```yaml
spec:
  clusters:
  - name: member2
    replicas: 10
  - name: member1
    replicas: 10
```

Evict 5 replicas from member1, then:

```yaml
spec:
  clusters:
  - name: member2
    replicas: 10
  - name: member1
    replicas: 5
  gracefulEvictionTasks:
  - name: member1
    replicas: 5
    ...
```

Member

This may cause problems. .spec.clusters describes the current replica distribution across clusters. When the number of member2 replicas changes from 10 to 5, the binding-controller will directly change the workload's replica count when it ensures the Work (ensureWork), which conflicts with the eviction behavior.

Member Author

> When the number of member2 replicas changes from 10 to 5,

By member2, do you mean member1?

Member

Clerical error. It does refer to member1.

Member Author

RainbowMango, Jul 30, 2022

Yeah, that would be too complicated. Given that the real use case is still uncertain, I suppose we can remove the replicas from gracefulEvictionTasks and focus only on evicting the entire resource from clusters.

@Garrybest What do you say?

[edit] We can't remove the replicas; for now, just focus on the taint-manager eviction scenario.
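A minimal Go sketch of the author's example above, evicting 5 of member1's 10 replicas. The types are trimmed local copies of TargetCluster and GracefulEvictionTask from this PR, and evictReplicas is a hypothetical helper, not Karmada code; it also assumes that evicting all of a cluster's replicas drops that cluster from .spec.clusters.

```go
package main

import "fmt"

// TargetCluster is a trimmed local copy of v1alpha2.TargetCluster.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// GracefulEvictionTask is a trimmed local copy of v1alpha2.GracefulEvictionTask.
type GracefulEvictionTask struct {
	FromCluster string
	Replicas    *int32
	Reason      string
	Producer    string
}

// evictReplicas (hypothetical) scales the named cluster down by n replicas in
// .spec.clusters and records the evicted replicas as a graceful eviction task.
// If n equals the cluster's replicas, the cluster is removed from the list.
// The input clusters slice is not modified; a new slice is returned.
func evictReplicas(clusters []TargetCluster, tasks []GracefulEvictionTask,
	name string, n int32, reason, producer string) ([]TargetCluster, []GracefulEvictionTask, error) {
	for i, c := range clusters {
		if c.Name != name {
			continue
		}
		if n <= 0 || n > c.Replicas {
			return clusters, tasks, fmt.Errorf("cannot evict %d of %d replicas from %s", n, c.Replicas, name)
		}
		out := make([]TargetCluster, 0, len(clusters))
		out = append(out, clusters[:i]...)
		if remaining := c.Replicas - n; remaining > 0 {
			out = append(out, TargetCluster{Name: name, Replicas: remaining})
		}
		out = append(out, clusters[i+1:]...)
		evicted := n
		task := GracefulEvictionTask{FromCluster: name, Replicas: &evicted, Reason: reason, Producer: producer}
		return out, append(tasks, task), nil
	}
	return clusters, tasks, fmt.Errorf("cluster %s not found", name)
}
```

Keeping member1 in .spec.clusters with 5 replicas is exactly what the next comment flags: the binding-controller would scale the workload down immediately, so the graceful part only applies to the replicas tracked in the task.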

// and builds a graceful eviction task.
// 2. The scheduler may perform a re-scheduler and probably select a substitute cluster
// to take over the evicting workload(resource).
// 3. The graceful eviction controller takes care of the graceful eviction tasks and
// performs the final removal after the workload(resource) is available on the substitute
// cluster or exceed the grace termination period(defaults to 10 minutes).
Member

> the grace termination period(defaults to 10 minutes).

Is this a configurable parameter?

Member Author

Yes, probably. A new field might be needed to hold the configurable parameter.

//
// +optional
GracefulEvictionTasks []GracefulEvictionTask `json:"gracefulEvictionTasks,omitempty"`

// RequiredBy represents the list of Bindings that depend on the referencing resource.
// +optional
RequiredBy []BindingSnapshot `json:"requiredBy,omitempty"`
@@ -142,6 +158,48 @@ type TargetCluster struct {
Replicas int32 `json:"replicas,omitempty"`
}

// GracefulEvictionTask represents a graceful eviction task.
type GracefulEvictionTask struct {
// FromCluster which cluster the eviction perform from.
// +required
FromCluster string `json:"fromCluster"`

// Replicas indicates the number of replicas should be evicted.
// Should be ignored for resource type that doesn't have replica.
// +optional
Replicas *int32 `json:"replicas,omitempty"`

// Reason contains a programmatic identifier indicating the reason for the eviction.
// Producers may define expected values and meanings for this field,
// and whether the values are considered a guaranteed API.
// The value should be a CamelCase string.
// This field may not be empty.
// +required
// +kubebuilder:validation:MaxLength=32
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:Pattern=`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`
Reason string `json:"reason"`

// Message is a human-readable message indicating details about the eviction.
// This may be an empty string.
// +optional
// +kubebuilder:validation:MaxLength=1024
Message string `json:"message,omitempty"`

// Producer indicates the controller who triggered the eviction.
// +required
Producer string `json:"producer"`

// CreationTimestamp is a timestamp representing the server time when this object was
// created.
// Clients should not set this value to avoid the time inconsistency issue.
// It is represented in RFC3339 form(like '2021-04-25T10:02:10Z') and is in UTC.
//
// Populated by the system. Read-only.
// +optional
CreationTimestamp metav1.Time `json:"creationTimestamp,omitempty"`
}
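The kubebuilder markers on Reason make the API server reject values that are empty, longer than 32 characters, or that do not match the pattern. A client-side mirror of those rules can catch bad values before a request is sent; this is a sketch, and validateReason is a hypothetical helper, not part of Karmada.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is copied from the +kubebuilder:validation:Pattern marker on Reason.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

// validateReason (hypothetical) mirrors the MinLength=1, MaxLength=32 and
// Pattern markers on GracefulEvictionTask.Reason. len counts bytes, which
// matches the character count for any value that passes the ASCII-only pattern.
func validateReason(reason string) error {
	if len(reason) < 1 || len(reason) > 32 {
		return fmt.Errorf("reason must be 1-32 characters, got %d", len(reason))
	}
	if !reasonPattern.MatchString(reason) {
		return fmt.Errorf("reason %q does not match %s", reason, reasonPattern.String())
	}
	return nil
}
```

For example, TaintUntolerated passes, while values starting with a digit, ending in a colon, or containing spaces are rejected.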

// BindingSnapshot is a snapshot of a ResourceBinding or ClusterResourceBinding.
type BindingSnapshot struct {
// Namespace represents the namespace of the Binding.
@@ -161,6 +219,11 @@ type BindingSnapshot struct {

// ResourceBindingStatus represents the overall status of the strategy as well as the referenced resources.
type ResourceBindingStatus struct {
// SchedulerObservedGeneration is the generation(.metadata.generation) observed by the scheduler.
// If SchedulerObservedGeneration is less than the generation in metadata means the scheduler hasn't confirmed
// the scheduling result or hasn't done the schedule yet.
// +optional
SchedulerObservedGeneration int64 `json:"schedulerObservedGeneration,omitempty"`
Member

What if the eviction task has not finished yet but a scale-up event arrives? How would SchedulerObservedGeneration help handle that situation?

Member Author

I'm not sure I understand you correctly.

The general idea in my mind is that the scheduler shouldn't schedule the resource (in the binding) to a cluster that is already in one of the graceful eviction tasks.
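Two ideas from this thread can be sketched together in Go: reading SchedulerObservedGeneration against .metadata.generation as the field comment describes, and the author's suggestion that the scheduler skip clusters already named in a graceful eviction task. Both functions are hypothetical illustrations, not the scheduler's actual code.

```go
package main

// GracefulEvictionTask is a trimmed local copy holding only the field used here.
type GracefulEvictionTask struct {
	FromCluster string
}

// schedulingPending (hypothetical) follows the field comment: if the observed
// generation is less than .metadata.generation, the scheduler has not yet
// confirmed a scheduling result for the latest spec.
func schedulingPending(generation, schedulerObservedGeneration int64) bool {
	return schedulerObservedGeneration < generation
}

// excludeEvictingClusters (hypothetical) drops candidate clusters that are
// already the source of a graceful eviction task, so the resource is not
// placed back onto a cluster it is being evicted from.
func excludeEvictingClusters(candidates []string, tasks []GracefulEvictionTask) []string {
	evicting := make(map[string]struct{}, len(tasks))
	for _, t := range tasks {
		evicting[t.FromCluster] = struct{}{}
	}
	var result []string
	for _, c := range candidates {
		if _, ok := evicting[c]; !ok {
			result = append(result, c)
		}
	}
	return result
}
```

Under this idea, a scale-up that arrives mid-eviction bumps .metadata.generation, and the rescheduling triggered by it would still avoid the evicting cluster.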

// Conditions contain the different condition statuses.
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
13 changes: 13 additions & 0 deletions pkg/apis/work/v1alpha2/well_known_constants.go
@@ -59,3 +59,16 @@ const (
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
ResourceTemplateUIDAnnotation = "resourcetemplate.karmada.io/uid"
)

// Define eviction reasons.
const (
// EvictionReasonTaintUntolerated describes the eviction is triggered
// because can not tolerate taint or exceed toleration period of time.
EvictionReasonTaintUntolerated = "TaintUntolerated"
)

// Define eviction producers.
const (
// EvictionProducerTaintManager represents the name of taint manager.
EvictionProducerTaintManager = "TaintManager"
)
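For the taint-manager scenario the review settled on, step 1 of the workflow (remove the target from .spec.clusters and record a task) might look like this with the new constants. A sketch only: the types are trimmed local copies, evictCluster is a hypothetical helper, and CreationTimestamp is left out because the field comment says it is populated by the system.

```go
package main

// Constants copied from this PR's well_known_constants.go.
const (
	EvictionReasonTaintUntolerated = "TaintUntolerated"
	EvictionProducerTaintManager   = "TaintManager"
)

// TargetCluster is a trimmed local copy of v1alpha2.TargetCluster.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// GracefulEvictionTask is a trimmed local copy of v1alpha2.GracefulEvictionTask;
// Replicas is omitted because this scenario evicts the whole resource.
type GracefulEvictionTask struct {
	FromCluster string
	Reason      string
	Message     string
	Producer    string
}

// evictCluster (hypothetical) removes the named cluster from .spec.clusters and
// appends a taint-manager eviction task for it. It returns false, along with
// the unchanged inputs, if the cluster is not in the scheduling result.
func evictCluster(clusters []TargetCluster, tasks []GracefulEvictionTask, name, message string) ([]TargetCluster, []GracefulEvictionTask, bool) {
	remaining := make([]TargetCluster, 0, len(clusters))
	found := false
	for _, c := range clusters {
		if c.Name == name {
			found = true
			continue
		}
		remaining = append(remaining, c)
	}
	if !found {
		return clusters, tasks, false
	}
	tasks = append(tasks, GracefulEvictionTask{
		FromCluster: name,
		Reason:      EvictionReasonTaintUntolerated,
		Message:     message,
		Producer:    EvictionProducerTaintManager,
	})
	return remaining, tasks, true
}
```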
29 changes: 29 additions & 0 deletions pkg/apis/work/v1alpha2/zz_generated.deepcopy.go

Some generated files are not rendered by default.