Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime #1141

Merged
merged 1 commit into from Jul 23, 2021

@tssurya (Contributor) commented Jun 28, 2021

According to https://bugzilla.redhat.com/show_bug.cgi?id=1943334#c0, during upgrades it takes roughly a minute for the old pods to be killed and the new pods to be created.

We suspect that a major chunk of this time is spent pulling the new image onto the node. (@squeed: do we have data backing this claim that I can point to?)

This PR adds a new pre-puller daemonset that is essentially a no-op: it only pulls the new image onto the nodes before the new pods are created, which should cut down the downtime.

Idea, Co-Authored By: https://github.com/squeed/openshift-cluster-network-operator/commit/42c0d1db5576a2e4e6b16b115e657477dbc33073
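
For readers unfamiliar with the pattern, here is a minimal sketch of what a no-op pre-puller daemonset could look like, built as a plain map and printed as JSON. It is not the manifest from this PR: the daemonset name, namespace, label key, image reference and the `sleep` command are all hypothetical placeholders, and only the standard Kubernetes `apps/v1` DaemonSet field names are taken as given.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prepullerManifest builds a minimal DaemonSet manifest whose only job is to
// make every node pull `image`. The name, namespace, label key and command
// are hypothetical placeholders, not the values used by the PR.
func prepullerManifest(name, namespace, image string) map[string]interface{} {
	labels := map[string]interface{}{"app": name}
	return map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "DaemonSet",
		"metadata": map[string]interface{}{
			"name":      name,
			"namespace": namespace,
		},
		"spec": map[string]interface{}{
			"selector": map[string]interface{}{"matchLabels": labels},
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{"labels": labels},
				"spec": map[string]interface{}{
					// Tolerate every taint so the image is pulled on all nodes.
					"tolerations": []interface{}{
						map[string]interface{}{"operator": "Exists"},
					},
					"containers": []interface{}{
						map[string]interface{}{
							"name":  "prepuller",
							"image": image,
							// No-op workload (assumes the image ships a sleep
							// binary that accepts "infinity").
							"command": []interface{}{"/bin/sleep", "infinity"},
						},
					},
				},
			},
		},
	}
}

func main() {
	m := prepullerManifest("example-prepuller", "example-namespace", "registry.example/ovn:new")
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The point of the design is that the kubelet must pull the image before it can start the no-op container, so once the pre-puller pods are running, the real pods that use the same image can start without waiting on a pull.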

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 28, 2021
bindata/network/ovn-kubernetes/pre-puller.yaml (outdated)
pkg/network/ovn_kubernetes.go (outdated)
pkg/network/ovn_kubernetes.go
pkg/network/ovn_kubernetes.go
@tssurya tssurya changed the title [WIP] Add pre-puller ds to reduce upgrade downtime [WIP] SDN-1955: Add pre-puller ds to reduce upgrade downtime Jun 28, 2021
klog.Infof("Rolling out the no-op prepuller daemonset...")
return false, true
}

Copy link
Contributor

We should also check to see that the version of the pre-puller matches the expectedVersion. Unlikely, but it could happen if we upgrade and downgrade.
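
A minimal sketch of the check suggested here, assuming the operator can read the pre-puller's recorded version and rollout counts. `PrepullerStatus` and `prepullComplete` are hypothetical names for illustration, not the types or functions in `pkg/network/ovn_kubernetes.go`.

```go
package main

import "fmt"

// PrepullerStatus is a hypothetical, simplified view of the pre-puller
// daemonset; it is not a type from the PR.
type PrepullerStatus struct {
	Exists    bool
	Version   string // release version recorded on the daemonset
	Desired   int32  // nodes that should run a pre-puller pod
	Updated   int32  // pods running the current template
	Available int32  // pods that are ready
}

// prepullComplete reports whether the new image can be assumed present on
// every node: the pre-puller must exist, carry the expected version, and be
// fully rolled out.
func prepullComplete(s PrepullerStatus, expectedVersion string) bool {
	if !s.Exists {
		return false
	}
	if s.Version != expectedVersion {
		// Stale pre-puller, e.g. left over from an upgrade followed by a
		// downgrade: its image is not the one about to be rolled out.
		return false
	}
	return s.Updated == s.Desired && s.Available == s.Desired
}

func main() {
	s := PrepullerStatus{Exists: true, Version: "4.9.0", Desired: 3, Updated: 3, Available: 3}
	fmt.Println("matching version:", prepullComplete(s, "4.9.0"))
	fmt.Println("stale version:   ", prepullComplete(s, "4.8.0"))
}
```

Without the version comparison, a fully rolled-out pre-puller from a previous release would look "done" even though it pulled a different image, which is the upgrade-then-downgrade case described above.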

Copy link
Contributor Author

Yea I have added this in the new PR. PTAL!

@tssurya tssurya changed the title [WIP] SDN-1955: Add pre-puller ds to reduce upgrade downtime Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime Jul 6, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 6, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 6, 2021

@tssurya: This pull request references Bugzilla bug 1970985, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
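
The three validations listed above can be pictured roughly as follows. This is only an illustrative sketch, not the bot's actual code: the `Bug` and `BranchPolicy` types and the `validateBug` function are hypothetical, and the values in `main` are the ones quoted in the comment.

```go
package main

import "fmt"

// Bug and BranchPolicy are hypothetical types used only for this sketch.
type Bug struct {
	Open          bool
	Status        string
	TargetRelease string
}

type BranchPolicy struct {
	TargetRelease string
	ValidStates   []string
}

// validateBug returns one message per failed validation; an empty result
// means the bug is considered valid for the branch.
func validateBug(b Bug, p BranchPolicy) []string {
	var problems []string
	if !b.Open {
		problems = append(problems, "bug is not open")
	}
	if b.TargetRelease != p.TargetRelease {
		problems = append(problems, fmt.Sprintf(
			"bug target release (%s) does not match branch target release (%s)",
			b.TargetRelease, p.TargetRelease))
	}
	stateOK := false
	for _, s := range p.ValidStates {
		if s == b.Status {
			stateOK = true
			break
		}
	}
	if !stateOK {
		problems = append(problems, fmt.Sprintf(
			"bug state %s is not one of the valid states %v", b.Status, p.ValidStates))
	}
	return problems
}

func main() {
	policy := BranchPolicy{
		TargetRelease: "4.9.0",
		ValidStates:   []string{"NEW", "ASSIGNED", "ON_DEV", "POST"},
	}
	bug := Bug{Open: true, Status: "ASSIGNED", TargetRelease: "4.9.0"}
	fmt.Println("problems found:", len(validateBug(bug, policy)))
}
```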

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request.

In response to this:

Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 6, 2021
pkg/util/k8s/unstructured.go Outdated Show resolved Hide resolved
@tssurya
Copy link
Contributor Author

tssurya commented Jul 8, 2021

/test e2e-gcp-ovn-upgrade
/test e2e-gcp-ovn

@jluhrsen
Copy link
Contributor

jluhrsen commented Jul 8, 2021

/retest

@tssurya
Copy link
Contributor Author

tssurya commented Jul 12, 2021

/test e2e-gcp-ovn

@squeed
Copy link
Contributor

squeed commented Jul 12, 2021

Would also like to see e2e-gcp-ovn-upgrade pass (or otherwise look good, logs-wise)

@tssurya
Copy link
Contributor Author

tssurya commented Jul 12, 2021

/test e2e-gcp-ovn-upgrade

@jluhrsen
Copy link
Contributor

@tssurya, is this ready now, apart from hopefully getting some of the failing checks to pass with a
/retest

@jluhrsen
Copy link
Contributor

There is little hope that the gcp-ovn-upgrade or openstack job will pass, but both [1][2] are optional, so they shouldn't
block this from being merged.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 14, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@trozet
Copy link
Contributor

trozet commented Jul 22, 2021

/override ci/prow/e2e-gcp-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 22, 2021

@trozet: Overrode contexts on behalf of trozet: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 23, 2021

@tssurya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn | c9c0a34 | link | /test e2e-vsphere-ovn |
| ci/prow/e2e-vsphere-windows | c9c0a34 | link | /test e2e-vsphere-windows |
| ci/prow/e2e-openstack-ovn | c9c0a34 | link | /test e2e-openstack-ovn |
| ci/prow/e2e-ovn-hybrid-step-registry | c9c0a34 | link | /test e2e-ovn-hybrid-step-registry |
| ci/prow/e2e-azure-ovn | c9c0a34 | link | /test e2e-azure-ovn |
| ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec | c9c0a34 | link | /test e2e-metal-ipi-ovn-ipv6-ipsec |
| ci/prow/e2e-gcp-ovn-upgrade | c9c0a34 | link | /test e2e-gcp-ovn-upgrade |

Full PR test history. Your PR dashboard.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 38d5752 into openshift:master Jul 23, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 23, 2021

@tssurya: All pull requests linked via external trackers have merged:

Bugzilla bug 1970985 has been moved to the MODIFIED state.

In response to this:

Bug 1970985: SDN-1955: Add pre-puller ds to reduce upgrade downtime

@vrutkovs
Copy link
Member

/cherry-pick release-4.8

@openshift-cherrypick-robot

@vrutkovs: #1141 failed to apply on top of branch "release-4.8":

Applying: Add pre-puller ds to reduce upgrade downtime
Using index info to reconstruct a base tree...
M	pkg/bootstrap/types.go
M	pkg/network/ovn_kubernetes.go
M	pkg/network/ovn_kubernetes_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/network/ovn_kubernetes_test.go
CONFLICT (content): Merge conflict in pkg/network/ovn_kubernetes_test.go
Auto-merging pkg/network/ovn_kubernetes.go
CONFLICT (content): Merge conflict in pkg/network/ovn_kubernetes.go
Auto-merging pkg/bootstrap/types.go
CONFLICT (content): Merge conflict in pkg/bootstrap/types.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Add pre-puller ds to reduce upgrade downtime
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.8

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
8 participants