Skip to content

Commit

Permalink
[release-1.5] Disable default workloadUpdates strategies (#1578)
Browse files Browse the repository at this point in the history
Enabling workloadUpdates strategies by default
can expose unexpected behaviours.
Temporary disable it until we properly
covered all the corner cases.

Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=2017394

This is a manual cherry pick of #1577

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
  • Loading branch information
tiraboschi committed Oct 27, 2021
1 parent f5b767a commit ea58ca8
Show file tree
Hide file tree
Showing 11 changed files with 21 additions and 94 deletions.
11 changes: 1 addition & 10 deletions deploy/crds/hco00.crd.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1294,32 +1294,23 @@ spec:
providers
type: string
workloadUpdateStrategy:
default:
batchEvictionInterval: 1m0s
batchEvictionSize: 10
workloadUpdateMethods:
- LiveMigrate
description: WorkloadUpdateStrategy defines at the cluster level how
to handle automated workload updates
properties:
batchEvictionInterval:
default: 1m0s
description: BatchEvictionInterval Represents the interval to
wait before issuing the next batch of shutdowns
type: string
batchEvictionSize:
default: 10
description: BatchEvictionSize Represents the number of VMIs that
can be forced updated per the BatchShutdownInteral interval
type: integer
workloadUpdateMethods:
default:
- LiveMigrate
description: WorkloadUpdateMethods defines the methods that can
be used to disrupt workloads during automated workload updates.
When multiple methods are present, the least disruptive method
takes precedence over more disruptive methods. For example if
both LiveMigrate and Shutdown methods are listed, only VMs which
both LiveMigrate and Evict methods are listed, only VMs which
are not live migratable will be restarted/shutdown. An empty
list defaults to no automated workload updating.
items:
Expand Down
5 changes: 0 additions & 5 deletions deploy/hco.cr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,4 @@ spec:
parallelMigrationsPerCluster: 5
parallelOutboundMigrationsPerNode: 2
progressTimeout: 150
workloadUpdateStrategy:
batchEvictionInterval: 1m0s
batchEvictionSize: 10
workloadUpdateMethods:
- LiveMigrate
workloads: {}
Original file line number Diff line number Diff line change
Expand Up @@ -1294,32 +1294,23 @@ spec:
providers
type: string
workloadUpdateStrategy:
default:
batchEvictionInterval: 1m0s
batchEvictionSize: 10
workloadUpdateMethods:
- LiveMigrate
description: WorkloadUpdateStrategy defines at the cluster level how
to handle automated workload updates
properties:
batchEvictionInterval:
default: 1m0s
description: BatchEvictionInterval Represents the interval to
wait before issuing the next batch of shutdowns
type: string
batchEvictionSize:
default: 10
description: BatchEvictionSize Represents the number of VMIs that
can be forced updated per the BatchShutdownInteral interval
type: integer
workloadUpdateMethods:
default:
- LiveMigrate
description: WorkloadUpdateMethods defines the methods that can
be used to disrupt workloads during automated workload updates.
When multiple methods are present, the least disruptive method
takes precedence over more disruptive methods. For example if
both LiveMigrate and Shutdown methods are listed, only VMs which
both LiveMigrate and Evict methods are listed, only VMs which
are not live migratable will be restarted/shutdown. An empty
list defaults to no automated workload updating.
items:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1294,32 +1294,23 @@ spec:
providers
type: string
workloadUpdateStrategy:
default:
batchEvictionInterval: 1m0s
batchEvictionSize: 10
workloadUpdateMethods:
- LiveMigrate
description: WorkloadUpdateStrategy defines at the cluster level how
to handle automated workload updates
properties:
batchEvictionInterval:
default: 1m0s
description: BatchEvictionInterval Represents the interval to
wait before issuing the next batch of shutdowns
type: string
batchEvictionSize:
default: 10
description: BatchEvictionSize Represents the number of VMIs that
can be forced updated per the BatchShutdownInteral interval
type: integer
workloadUpdateMethods:
default:
- LiveMigrate
description: WorkloadUpdateMethods defines the methods that can
be used to disrupt workloads during automated workload updates.
When multiple methods are present, the least disruptive method
takes precedence over more disruptive methods. For example if
both LiveMigrate and Shutdown methods are listed, only VMs which
both LiveMigrate and Evict methods are listed, only VMs which
are not live migratable will be restarted/shutdown. An empty
list defaults to no automated workload updating.
items:
Expand Down
8 changes: 4 additions & 4 deletions docs/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ HyperConvergedSpec defines the desired state of HyperConverged
| obsoleteCPUs | ObsoleteCPUs allows avoiding scheduling of VMs for obsolete CPU models | *[HyperConvergedObsoleteCPUs](#hyperconvergedobsoletecpus) | | false |
| commonTemplatesNamespace | CommonTemplatesNamespace defines namespace in which common templates will be deployed. It overrides the default openshift namespace. | *string | | false |
| storageImport | StorageImport contains configuration for importing containerized data | *[StorageImportConfig](#storageimportconfig) | | false |
| workloadUpdateStrategy | WorkloadUpdateStrategy defines at the cluster level how to handle automated workload updates | *[HyperConvergedWorkloadUpdateStrategy](#hyperconvergedworkloadupdatestrategy) | {"workloadUpdateMethods": {"LiveMigrate"}, "batchEvictionSize": 10, "batchEvictionInterval": "1m0s"} | false |
| workloadUpdateStrategy | WorkloadUpdateStrategy defines at the cluster level how to handle automated workload updates | *[HyperConvergedWorkloadUpdateStrategy](#hyperconvergedworkloadupdatestrategy) | | false |

[Back to TOC](#table-of-contents)

Expand All @@ -155,9 +155,9 @@ HyperConvergedWorkloadUpdateStrategy defines options related to updating a KubeV

| Field | Description | Scheme | Default | Required |
| ----- | ----------- | ------ | -------- |-------- |
| workloadUpdateMethods | WorkloadUpdateMethods defines the methods that can be used to disrupt workloads during automated workload updates. When multiple methods are present, the least disruptive method takes precedence over more disruptive methods. For example if both LiveMigrate and Shutdown methods are listed, only VMs which are not live migratable will be restarted/shutdown. An empty list defaults to no automated workload updating. | []string | {"LiveMigrate"} | false |
| batchEvictionSize | BatchEvictionSize Represents the number of VMIs that can be forced updated per the BatchShutdownInteral interval | *int | 10 | false |
| batchEvictionInterval | BatchEvictionInterval Represents the interval to wait before issuing the next batch of shutdowns | *metav1.Duration | "1m0s" | false |
| workloadUpdateMethods | WorkloadUpdateMethods defines the methods that can be used to disrupt workloads during automated workload updates. When multiple methods are present, the least disruptive method takes precedence over more disruptive methods. For example if both LiveMigrate and Evict methods are listed, only VMs which are not live migratable will be restarted/shutdown. An empty list defaults to no automated workload updating. | []string | | false |
| batchEvictionSize | BatchEvictionSize Represents the number of VMIs that can be forced updated per the BatchShutdownInteral interval | *int | | false |
| batchEvictionInterval | BatchEvictionInterval Represents the interval to wait before issuing the next batch of shutdowns | *metav1.Duration | | false |

[Back to TOC](#table-of-contents)

Expand Down
10 changes: 3 additions & 7 deletions docs/cluster-configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -406,19 +406,15 @@ spec:
commonTemplatesNamespace: kubevirt
```

## Enable eventual launcher updates by default
us the HyperConverged `spec.workloadUpdateStrategy` object to define how to handle automated workload updates at the cluster
## Optionally enable launcher updates
The HyperConverged `spec.workloadUpdateStrategy` defines how to handle automated workload updates at the cluster
level.

The `workloadUpdateStrategy` fields are:
* `batchEvictionInterval` - BatchEvictionInterval Represents the interval to wait before issuing the next batch of
shutdowns.

The Default value is `1m`

* `batchEvictionSize` - Represents the number of VMIs that can be forced updated per the BatchShutdownInteral interval

The default value is `10`

* `workloadUpdateMethods` - defines the methods that can be used to disrupt workloads
during automated workload updates.
Expand All @@ -429,7 +425,7 @@ The `workloadUpdateStrategy` fields are:

An empty list defaults to no automated workload updating.

The default values is `LiveMigrate`; `Evict` is not enabled by default being potentially disruptive for the existing workloads.
The feature is not enabled by default being potentially disruptive for the existing workloads.

### workloadUpdateStrategy example
```yaml
Expand Down
22 changes: 0 additions & 22 deletions hack/check_defaults.sh
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@ FGDEFAULTS='{"sriovLiveMigration":true,"withHostPassthroughCPU":false}'
LMDEFAULTS='{"completionTimeoutPerGiB":800,"parallelMigrationsPerCluster":5,"parallelOutboundMigrationsPerNode":2,"progressTimeout":150}'
PERMITTED_HOST_DEVICES_DEFAULT1='{"pciDeviceSelector":"10DE:1DB6","resourceName":"nvidia.com/GV100GL_Tesla_V100"}'
PERMITTED_HOST_DEVICES_DEFAULT2='{"pciDeviceSelector":"10DE:1EB8","resourceName":"nvidia.com/TU104GL_Tesla_T4"}'
WORKLOAD_UPDATE_STRATEGY_DEFAULT='{"batchEvictionInterval":"1m0s","batchEvictionSize":10,"workloadUpdateMethods":["LiveMigrate"]}'

CERTCONFIGPATHS=(
"/spec/certConfig/ca/duration"
Expand Down Expand Up @@ -56,14 +55,6 @@ LMPATHS=(
"/spec"
)

WORKLOAD_UPDATE_STRATEGY_PATHS=(
"/spec/workloadUpdateStrategy/workloadUpdateMethods"
"/spec/workloadUpdateStrategy/batchEvictionSize"
"/spec/workloadUpdateStrategy/batchEvictionInterval"
"/spec/workloadUpdateStrategy"
"/spec"
)

echo "Check that certConfig defaults are behaving as expected"

./hack/retry.sh 10 3 "${KUBECTL_BINARY} patch hco -n \"${INSTALLED_NAMESPACE}\" --type=json kubevirt-hyperconverged -p '[{ \"op\": \"replace\", \"path\": /spec, \"value\": {} }]'"
Expand Down Expand Up @@ -102,16 +93,3 @@ for JPATH in "${LMPATHS[@]}"; do
fi
sleep 2
done

echo "Check that workloadUpdateStrategy defaults are behaving as expected"

./hack/retry.sh 10 3 "${KUBECTL_BINARY} patch hco -n \"${INSTALLED_NAMESPACE}\" --type=json kubevirt-hyperconverged -p '[{ \"op\": \"replace\", \"path\": /spec, \"value\": {} }]'"
for JPATH in "${WORKLOAD_UPDATE_STRATEGY_PATHS[@]}"; do
./hack/retry.sh 10 3 "${KUBECTL_BINARY} patch hco -n \"${INSTALLED_NAMESPACE}\" --type='json' kubevirt-hyperconverged -p '[{ \"op\": \"remove\", \"path\": '\"${JPATH}\"' }]'"
WORKLOAD_UPDATE_STRATEGY=$(${KUBECTL_BINARY} get hco -n "${INSTALLED_NAMESPACE}" kubevirt-hyperconverged -o jsonpath='{.spec.workloadUpdateStrategy}')
if [[ "${WORKLOAD_UPDATE_STRATEGY_DEFAULT}" != "${WORKLOAD_UPDATE_STRATEGY}" ]]; then
echo "Failed checking CR defaults for workloadUpdateStrategy"
exit 1
fi
sleep 2
done
6 changes: 1 addition & 5 deletions pkg/apis/hco/v1beta1/hyperconverged_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,6 @@ type HyperConvergedSpec struct {
StorageImport *StorageImportConfig `json:"storageImport,omitempty"`

// WorkloadUpdateStrategy defines at the cluster level how to handle automated workload updates
// +kubebuilder:default={"workloadUpdateMethods": {"LiveMigrate"}, "batchEvictionSize": 10, "batchEvictionInterval": "1m0s"}
// +optional
WorkloadUpdateStrategy *HyperConvergedWorkloadUpdateStrategy `json:"workloadUpdateStrategy,omitempty"`
}
Expand Down Expand Up @@ -282,26 +281,23 @@ type HyperConvergedWorkloadUpdateStrategy struct {
// WorkloadUpdateMethods defines the methods that can be used to disrupt workloads
// during automated workload updates.
// When multiple methods are present, the least disruptive method takes
// precedence over more disruptive methods. For example if both LiveMigrate and Shutdown
// precedence over more disruptive methods. For example if both LiveMigrate and Evict
// methods are listed, only VMs which are not live migratable will be restarted/shutdown.
// An empty list defaults to no automated workload updating.
//
// +listType=atomic
// +kubebuilder:default={"LiveMigrate"}
// +optional
WorkloadUpdateMethods []string `json:"workloadUpdateMethods,omitempty"`

// BatchEvictionSize Represents the number of VMIs that can be forced updated per
// the BatchShutdownInteral interval
//
// +kubebuilder:default=10
// +optional
BatchEvictionSize *int `json:"batchEvictionSize,omitempty"`

// BatchEvictionInterval Represents the interval to wait before issuing the next
// batch of shutdowns
//
// +kubebuilder:default="1m0s"
// +optional
BatchEvictionInterval *metav1.Duration `json:"batchEvictionInterval,omitempty"`
}
Expand Down
8 changes: 0 additions & 8 deletions pkg/components/components.go
Original file line number Diff line number Diff line change
Expand Up @@ -797,9 +797,6 @@ func GetOperatorCR() *hcov1beta1.HyperConverged {
parallelOutboundMigrationsPerNode := uint32(2)
progressTimeout := int64(150)

batchEvictionSize := 10
batchEvictionInterval := metav1.Duration{Duration: 1 * time.Minute}

return &hcov1beta1.HyperConverged{
TypeMeta: metav1.TypeMeta{
APIVersion: util.APIVersion,
Expand Down Expand Up @@ -829,11 +826,6 @@ func GetOperatorCR() *hcov1beta1.HyperConverged {
ParallelOutboundMigrationsPerNode: &parallelOutboundMigrationsPerNode,
ProgressTimeout: &progressTimeout,
},
WorkloadUpdateStrategy: &hcov1beta1.HyperConvergedWorkloadUpdateStrategy{
WorkloadUpdateMethods: []string{"LiveMigrate"},
BatchEvictionSize: &batchEvictionSize,
BatchEvictionInterval: &batchEvictionInterval,
},
LocalStorageClassName: "",
},
}
Expand Down
20 changes: 7 additions & 13 deletions pkg/controller/operands/kubevirt_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1726,7 +1726,7 @@ Version: 1.2.3`)
Expect(req.Conditions).To(BeEmpty())
})

It("should set Workload Update Strategy to defaults if missing in HCO CR", func() {
It("should not set Workload Update Strategy if missing in HCO CR", func() {
existingResource := NewKubeVirtWithNameOnly(hco)

cl := commonTestUtils.InitClient([]runtime.Object{hco})
Expand All @@ -1745,14 +1745,7 @@ Version: 1.2.3`)

Expect(foundResource.Spec.WorkloadUpdateStrategy).ToNot(BeNil())
kvUpdateStrategy := foundResource.Spec.WorkloadUpdateStrategy
Expect(kvUpdateStrategy.BatchEvictionInterval.Duration.String()).Should(Equal("1m0s"))
Expect(*kvUpdateStrategy.BatchEvictionSize).Should(Equal(defaultBatchEvictionSize))
Expect(kvUpdateStrategy.WorkloadUpdateMethods).Should(HaveLen(1))
Expect(kvUpdateStrategy.WorkloadUpdateMethods).Should(
ContainElements(
kubevirtv1.WorkloadUpdateMethodLiveMigrate,
),
)
Expect(kvUpdateStrategy.WorkloadUpdateMethods).Should(HaveLen(0))

Expect(req.Conditions).To(BeEmpty())
})
Expand All @@ -1763,10 +1756,11 @@ Version: 1.2.3`)
Expect(err).ToNot(HaveOccurred())

modifiedBatchEvictionSize := 5
hco.Spec.WorkloadUpdateStrategy.WorkloadUpdateMethods = []string{"aaa", "bbb", "ccc"}
hco.Spec.WorkloadUpdateStrategy.BatchEvictionInterval = &metav1.Duration{Duration: time.Minute * 3}
hco.Spec.WorkloadUpdateStrategy.BatchEvictionSize = &modifiedBatchEvictionSize

hco.Spec.WorkloadUpdateStrategy = &hcov1beta1.HyperConvergedWorkloadUpdateStrategy{
WorkloadUpdateMethods: []string{"aaa", "bbb", "ccc"},
BatchEvictionSize: &modifiedBatchEvictionSize,
BatchEvictionInterval: &metav1.Duration{Duration: time.Minute * 3},
}
cl := commonTestUtils.InitClient([]runtime.Object{hco, existingKv})
handler := (*genericOperand)(newKubevirtHandler(cl, commonTestUtils.GetScheme()))
res := handler.ensure(req)
Expand Down
3 changes: 3 additions & 0 deletions tools/util/marshaller.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,9 @@ func fixQuoteIssues(yamlBytes []byte) []byte {
s = strings.Replace(s, " '\"", " \"", -1)
s = strings.Replace(s, "\"'\n", "\"\n", -1)

// fix quoted empty square brackets by removing unneeded single quotes...
s = strings.Replace(s, " '[]'", " []", -1)

yamlBytes = []byte(s)
return yamlBytes
}
Expand Down

0 comments on commit ea58ca8

Please sign in to comment.