
Make zone spread only apply within a given revision #1724

Merged: 1 commit, Sep 9, 2022

Conversation

@alvaroaleman (Contributor) commented Sep 2, 2022

We currently apply zone spread for all revisions for a given workload.
This means that a new revision can only be rolled out after a replica of
the old revision was removed.

This PR fixes that by:

  • Moving the calculation of the zone spread affinity into the ApplyTo
    funcs
  • Calculating a hash of the podTemplate there
  • Applying a label to the pod with that hash
  • Using all pod labels including the one with the hash in the
    AntiAffinity rule

As a side effect, this removes the requirement to know the pod's labels by
the time DeploymentConfig.SetDefaults() is called. In many cases, they
weren't known by that time, so it was called with a nil label map, which
resulted in the zone spread code being short-circuited. With this
change, everything that calls SetDefaults and has more than one replica
will get the zone spread, as it makes sense for all components.

I manually verified that with this change and a
sufficiently-sized management cluster, an HA control plane upgrade results
in zero failed scheduling events.
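The hash-label mechanism described above can be sketched as follows. This is an illustrative, self-contained sketch, not the actual HyperShift code: the simplified PodTemplateSpec struct, the label key `hypershift.openshift.io/pod-template-hash`, and the helper names are assumptions, and the real implementation operates on the k8s.io/api/core/v1 types inside the ApplyTo funcs.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Simplified stand-in for corev1.PodTemplateSpec; the real code uses the
// k8s.io/api types. All names here are illustrative assumptions.
type PodTemplateSpec struct {
	Labels     map[string]string `json:"labels"`
	Containers []string          `json:"containers"`
}

// computeTemplateHash derives a short, stable hash from the serialized pod
// template, so every revision of a workload gets a distinct identity.
func computeTemplateHash(t PodTemplateSpec) string {
	b, _ := json.Marshal(t)
	sum := sha256.Sum256(b)
	return fmt.Sprintf("%x", sum[:4])
}

// applyMultizoneSpread stamps the template with the hash label and returns
// the label set an anti-affinity rule would match on. Because the selector
// includes the hash label, the zone spread only applies within one revision.
func applyMultizoneSpread(t *PodTemplateSpec) map[string]string {
	if t.Labels == nil {
		t.Labels = map[string]string{}
	}
	t.Labels["hypershift.openshift.io/pod-template-hash"] = computeTemplateHash(*t)
	// Use all pod labels, including the hash label, in the selector, so
	// pods from a different revision no longer conflict during a rollout.
	selector := make(map[string]string, len(t.Labels))
	for k, v := range t.Labels {
		selector[k] = v
	}
	return selector
}

func main() {
	oldTpl := PodTemplateSpec{Labels: map[string]string{"app": "kube-apiserver"}, Containers: []string{"kube-apiserver:v1"}}
	newTpl := PodTemplateSpec{Labels: map[string]string{"app": "kube-apiserver"}, Containers: []string{"kube-apiserver:v2"}}
	oldSel := applyMultizoneSpread(&oldTpl)
	newSel := applyMultizoneSpread(&newTpl)
	// prints "true": different revisions hash differently, so their
	// anti-affinity selectors differ and a rolling update can proceed.
	fmt.Println(oldSel["hypershift.openshift.io/pod-template-hash"] != newSel["hypershift.openshift.io/pod-template-hash"])
}
```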

Which issue(s) this PR fixes:
ref https://issues.redhat.com/browse/HOSTEDCP-518

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

openshift-ci bot (Contributor) commented Sep 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 2, 2022
@csrwng (Contributor) commented Sep 3, 2022

Looks like you need to run the unit tests with UPDATE=1; other than that, it looks good to me.
Something that occurred to me is that it might be a good idea to extract this code into its own module (not necessarily a separate repo) so it can be reused by other operators that place pods into the control plane namespace, like CNO and storage. Otherwise we need to keep making the same changes there as well.


@eranco74 (Contributor) commented Sep 4, 2022

/test capi-provider-agent-sanity


@eranco74 (Contributor) commented Sep 4, 2022

Prow issue, the test never started
/test capi-provider-agent-sanity

The review thread below is anchored on this diff hunk in setMultizoneSpread:

-func (c *DeploymentConfig) setMultizoneSpread(labels map[string]string) {
-	if labels == nil {
+func (c *DeploymentConfig) setMultizoneSpread(pod *corev1.PodTemplateSpec) {
+	if !c.setDefaults || c.Replicas <= 1 {
 		return
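With the new approach, the rendered pod spec carries the hash label and an anti-affinity rule that matches on it, roughly like the manifest below. This is an illustrative sketch, not output from HyperShift: the label key, the hash value, and the choice of a required (rather than preferred) rule are assumptions here; the standard `topology.kubernetes.io/zone` topology key is what gives the per-zone spread.

```yaml
metadata:
  labels:
    app: kube-apiserver
    hypershift.openshift.io/pod-template-hash: 7f9c2ba4   # illustrative value
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchLabels:
            # All pod labels, including the hash, so only pods of the
            # same revision repel each other across zones.
            app: kube-apiserver
            hypershift.openshift.io/pod-template-hash: 7f9c2ba4
```

Because the old and new revisions carry different hash labels, their selectors no longer match each other's pods, so a new replica can schedule into a zone still occupied by an old-revision replica.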
@enxebre (Member) commented Sep 5, 2022

Do we really need any check? Since we are using a unique hash per Deployment, I think we could simplify and apply this unconditionally now.

@alvaroaleman (Contributor, author)

This just keeps the previous behavior.

@enxebre (Member)

I consider the current behaviour and implementation suboptimal. Using the hash enables us to always apply a consistent set of labels and affinity unconditionally. That simplifies the code, but also reduces the number of different config combinations and the divergence in the clusters we create.

@alvaroaleman (Contributor, author)

Updated

@alvaroaleman (Contributor, author) commented Sep 5, 2022

Btw, this change shaves a good 25% off the runtime of our e2e tests. Without this, we always need more than 2h; with this, we need around 1h30m.

We likely need openshift/release#31897 to make it stable though.

@alvaroaleman (Contributor, author)

@enxebre removed the setDefaults, ptal. Could you also have a look at openshift/release#31897 which is likely required to merge this?

@enxebre (Member) commented Sep 7, 2022

lgtm, unit and e2e fail though

@alvaroaleman (Contributor, author)

/retest-required


@alvaroaleman (Contributor, author)

@enxebre the presubmit finally passed without any failed scheduling event. The upgrade failure is due to OCPBUGS-990; until we have a fix for that promoted into an N-1 release, it won't pass, but that is unrelated to this change.

@enxebre (Member) commented Sep 9, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2022
@alvaroaleman (Contributor, author)

The upgrade job is known broken and everything else passed
/override ci/prow/e2e-aws

openshift-ci bot (Contributor) commented Sep 9, 2022

@alvaroaleman: Overrode contexts on behalf of alvaroaleman: ci/prow/e2e-aws

In response to this:

> The upgrade job is known broken and everything else passed
> /override ci/prow/e2e-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci bot (Contributor) commented Sep 9, 2022

@alvaroaleman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                            Commit   Required  Rerun command
ci/prow/e2e-kubevirt-gcp-ovn         3c60cb9  false     /test e2e-kubevirt-gcp-ovn
ci/prow/capi-provider-agent-sanity   3c60cb9  false     /test capi-provider-agent-sanity



Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

5 participants