Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime #1141

tssurya · 2021-06-28T06:54:46Z

According to https://bugzilla.redhat.com/show_bug.cgi?id=1943334#c0 it takes roughly a minute during upgrades for old pods to get killed and new pods to get created.

The suspicion is that a major chunk of this time is spent pulling the new image onto the node. (@squeed: do we have data to back up this claim that I can point to?)

This PR adds a new pre-puller daemonset. It is basically a no-op whose only job is to pull the new image onto the nodes before the new pods get created, which cuts down on the downtime.

Idea, Co-Authored By: https://github.com/squeed/openshift-cluster-network-operator/commit/42c0d1db5576a2e4e6b16b115e657477dbc33073

squeed · 2021-07-05T15:37:54Z

pkg/network/ovn_kubernetes.go

+		klog.Infof("Rolling out the no-op prepuller daemonset...")
+		return false, true
+	}
+


We should also check that the version of the pre-puller matches the expectedVersion. A mismatch is unlikely, but it could happen if we upgrade and then downgrade.

tssurya · 2021-07-06

Yeah, I have added this in the new PR. PTAL!
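For readers following the thread, the check discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `prePullerStatus`, `decide`, and their fields are hypothetical names, and reading the two return values as (update the node daemonset, render the pre-puller) is an assumption based on the `return false, true` in the snippet.

```go
package main

import "fmt"

// prePullerStatus is a hypothetical, simplified view of the pre-puller
// daemonset. The real operator would read this state from the cluster.
type prePullerStatus struct {
	Version     string // release version the pre-puller was rendered for
	Progressing bool   // true while its pods are still rolling out
}

// decide returns (updateNode, renderPrePuller).
// nodeVersion is the version of the running node daemonset,
// expectedVersion is the release being rolled out, and pp is nil
// when no pre-puller daemonset exists yet.
func decide(nodeVersion, expectedVersion string, pp *prePullerStatus) (bool, bool) {
	// Node pods already run the expected release: nothing to pre-pull.
	if nodeVersion == expectedVersion {
		return true, false
	}
	// An upgrade is needed and no pre-puller exists: roll it out first.
	if pp == nil {
		return false, true
	}
	// Pre-puller left over from another version (for example an upgrade
	// followed by a downgrade): re-render it so it pulls the right image.
	if pp.Version != expectedVersion {
		return false, true
	}
	// Right version, but the image is not yet on every node: keep waiting.
	if pp.Progressing {
		return false, true
	}
	// The image has been pulled everywhere: the node pods can be updated.
	return true, false
}

func main() {
	stale := &prePullerStatus{Version: "4.8.0"}
	ready := &prePullerStatus{Version: "4.9.0"}
	fmt.Println(decide("4.8.0", "4.9.0", nil))
	fmt.Println(decide("4.8.0", "4.9.0", stale))
	fmt.Println(decide("4.8.0", "4.9.0", ready))
}
```

The point of the version comparison is that a pre-puller which has finished rolling out is only useful if it pulled the image for the release now being targeted.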

openshift-ci · 2021-07-06T15:47:43Z

@tssurya: This pull request references Bugzilla bug 1970985, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.9.0) matches configured target release for branch (4.9.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request.

In response to this:

Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tssurya · 2021-07-08T07:14:00Z

/test e2e-gcp-ovn-upgrade
/test e2e-gcp-ovn

jluhrsen · 2021-07-08T21:11:20Z

/retest

tssurya · 2021-07-12T08:36:57Z

/test e2e-gcp-ovn

tssurya · 2021-07-12T11:42:55Z

/test e2e-gcp-ovn

squeed · 2021-07-12T13:21:07Z

Would also like to see e2e-gcp-ovn-upgrade pass (or otherwise look good, logs-wise)

tssurya · 2021-07-12T15:19:11Z

/test e2e-gcp-ovn-upgrade

jluhrsen · 2021-07-13T20:10:53Z

@tssurya, is this ready now, apart from hopefully getting some of these failing checks to pass with a retest?

/retest

jluhrsen · 2021-07-14T04:58:25Z

There is little hope that the gcp-ovn-upgrade or openstack jobs will pass, but both [1][2] are optional, so they shouldn't block this from being merged.

squeed · 2021-07-14T10:29:55Z

I just checked the Loki logs, and everything is as expected: https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%5B%221626206400000%22,%221626210000000%22,%22Grafana%20Cloud%22,%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fpull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade%2F1415041102456557568%5C%22%7D%20%7C%20unpack%20%7C%20namespace%3D%5C%22openshift-network-operator%5C%22%20%7C%3D%20puller%5Cn%22%7D%5D

/approve
/lgtm
woohoo

openshift-bot · 2021-07-22T10:29:52Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-07-22T10:41:51Z

/retest-required

openshift-bot · 2021-07-22T12:41:51Z

/retest-required

openshift-bot · 2021-07-22T12:53:52Z

/retest-required

openshift-bot · 2021-07-22T13:41:51Z

/retest-required

openshift-bot · 2021-07-22T14:29:51Z

/retest-required

openshift-bot · 2021-07-22T14:54:53Z

/retest-required

openshift-bot · 2021-07-22T15:31:54Z

/retest-required

trozet · 2021-07-22T16:00:52Z

/override ci/prow/e2e-gcp-ovn

openshift-ci · 2021-07-22T16:03:31Z

@trozet: Overrode contexts on behalf of trozet: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn

openshift-bot · 2021-07-22T18:09:56Z

/retest-required

openshift-bot · 2021-07-22T20:02:52Z

/retest-required

openshift-bot · 2021-07-22T22:37:44Z

/retest-required

openshift-bot · 2021-07-22T23:03:48Z

/retest-required

openshift-bot · 2021-07-22T23:29:46Z

/retest-required

openshift-bot · 2021-07-22T23:42:45Z

/retest-required

openshift-bot · 2021-07-23T00:08:45Z

/retest-required

openshift-bot · 2021-07-23T00:21:45Z

/retest-required

openshift-bot · 2021-07-23T00:34:45Z

/retest-required

openshift-bot · 2021-07-23T00:47:47Z

/retest-required

openshift-ci · 2021-07-23T00:53:33Z

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn | `c9c0a34` | link | `/test e2e-vsphere-ovn` |
| ci/prow/e2e-vsphere-windows | `c9c0a34` | link | `/test e2e-vsphere-windows` |
| ci/prow/e2e-openstack-ovn | `c9c0a34` | link | `/test e2e-openstack-ovn` |
| ci/prow/e2e-ovn-hybrid-step-registry | `c9c0a34` | link | `/test e2e-ovn-hybrid-step-registry` |
| ci/prow/e2e-azure-ovn | `c9c0a34` | link | `/test e2e-azure-ovn` |
| ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec | `c9c0a34` | link | `/test e2e-metal-ipi-ovn-ipv6-ipsec` |
| ci/prow/e2e-gcp-ovn-upgrade | `c9c0a34` | link | `/test e2e-gcp-ovn-upgrade` |

openshift-bot · 2021-07-23T02:05:45Z

/retest-required

openshift-bot · 2021-07-23T02:18:58Z

/retest-required

openshift-bot · 2021-07-23T02:44:45Z

/retest-required

openshift-bot · 2021-07-23T03:11:08Z

/retest-required

openshift-ci · 2021-07-23T04:21:15Z

@tssurya: All pull requests linked via external trackers have merged:

openshift/cluster-network-operator#1141

Bugzilla bug 1970985 has been moved to the MODIFIED state.

In response to this:

Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime

vrutkovs · 2021-07-28T07:59:27Z

/cherry-pick release-4.8

openshift-cherrypick-robot · 2021-07-28T08:00:05Z

@vrutkovs: #1141 failed to apply on top of branch "release-4.8":

Applying: Add pre-puller ds to reduce upgrade downtime
Using index info to reconstruct a base tree...
M	pkg/bootstrap/types.go
M	pkg/network/ovn_kubernetes.go
M	pkg/network/ovn_kubernetes_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/network/ovn_kubernetes_test.go
CONFLICT (content): Merge conflict in pkg/network/ovn_kubernetes_test.go
Auto-merging pkg/network/ovn_kubernetes.go
CONFLICT (content): Merge conflict in pkg/network/ovn_kubernetes.go
Auto-merging pkg/bootstrap/types.go
CONFLICT (content): Merge conflict in pkg/bootstrap/types.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Add pre-puller ds to reduce upgrade downtime
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.8

openshift-ci bot added the do-not-merge/work-in-progress label Jun 28, 2021

openshift-ci bot requested review from aojea and juanluisvaladas June 28, 2021 06:54

tssurya changed the title ~~[WIP] Add pre-puller ds to reduce upgrade downtime~~ [WIP] SDN-1955: Add pre-puller ds to reduce upgrade downtime Jun 28, 2021

squeed reviewed Jun 28, 2021 (three reviews, each on bindata/network/ovn-kubernetes/pre-puller.yaml; threads now outdated and resolved)

tssurya force-pushed the pre-puller branch from e694059 to 6d1f227 July 4, 2021 06:51

squeed reviewed Jul 5, 2021

tssurya force-pushed the pre-puller branch from 6d1f227 to 16eadb3 July 6, 2021 15:45

tssurya changed the title ~~[WIP] SDN-1955: Add pre-puller ds to reduce upgrade downtime~~ Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime Jul 6, 2021

openshift-ci bot added the bugzilla/severity-high label and removed the do-not-merge/work-in-progress label Jul 6, 2021

openshift-ci bot added the bugzilla/valid-bug label Jul 6, 2021

squeed reviewed Jul 7, 2021 (pkg/util/k8s/unstructured.go; thread now outdated and resolved)

tssurya force-pushed the pre-puller branch from 16eadb3 to 4f9fc56 July 7, 2021 20:39

openshift-ci bot assigned squeed Jul 14, 2021

openshift-ci bot added the lgtm and approved labels Jul 14, 2021

openshift-merge-robot merged commit 38d5752 into openshift:master Jul 23, 2021

tssurya mentioned this pull request Jul 28, 2021

Bug 1987046: Add pre-puller ds to reduce upgrade downtime #1167

Merged
