New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use context to add timeout to cincinnati HTTP request #410
Use context to add timeout to cincinnati HTTP request #410
Conversation
3970ccf
to
17b5248
Compare
unit includes:
|
4ffb363
to
9ebfffc
Compare
/retest |
9ebfffc
to
4923ce5
Compare
/test unit |
/retest |
3 similar comments
/retest |
/retest |
/retest |
4923ce5
to
f5654a9
Compare
/retitle Use context to add timeout to cincinnati HTTP request |
f5654a9
to
292f41e
Compare
d5b7ee7
to
d1af4f9
Compare
c67db6e
to
8ee0d52
Compare
|
8ee0d52
to
6cb435f
Compare
Change CVO wait.Until instances to wait.UntilWithContext making top-level context available to the lower levels of the code. Use the added context to add a timeout to the remote cincinnati HTTP call which has the potential to hang the caller at the mercy of the target server. The timeout has been set to an hour which was determined to be long enough to allow all expected well-behaved requests to complete but short enough to not be catastrophic if the calling thread were to hang.
6cb435f
to
1d1de3b
Compare
/test integration |
/test e2e |
/test e2e-upgrade |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jottofar, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e2e passed, except for /override ci/prow/e2e |
@wking: Overrode contexts on behalf of wking: ci/prow/e2e In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Address a bug introduced by cc1921d (pkg/start: Release leader lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the Operator.Run context would leave the operator with no time to attempt the final sync [1]: E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled With this commit, I'm piping through shutdownContext, which gets a two-minute grace period beyond runContext, to give the operator time to push out that final status (which may include important information like the fact that the incoming release image has completed verification). --- This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final status synchronization, 2021-01-19, openshift#517) back to 4.5. It's not a clean pick, because we're missing changes like: * b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364). * 1d1de3b (Use context to add timeout to cincinnati HTTP request, 2019-01-15, openshift#410). which also touched these lines. But we've gotten this far without backporting rhbz#1822844, and openshift#410 was never associated with a bug in the first place, so instead of pulling back more of 4.6 to get a clean pick, I've just manually reconciled the pick conflicts. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
Address a bug introduced by cc1921d (pkg/start: Release leader lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the Operator.Run context would leave the operator with no time to attempt the final sync [1]: E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled With this commit, I'm piping through shutdownContext, which gets a two-minute grace period beyond runContext, to give the operator time to push out that final status (which may include important information like the fact that the incoming release image has completed verification). --- This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final status synchronization, 2021-01-19, openshift#517) back to 4.5. It's not a clean pick, because we're missing changes like: * b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364). * 1d1de3b (Use context to add timeout to cincinnati HTTP request, 2019-01-15, openshift#410). which also touched these lines. But we've gotten this far without backporting rhbz#1822844, and openshift#410 was never associated with a bug in the first place, so instead of pulling back more of 4.6 to get a clean pick, I've just manually reconciled the pick conflicts. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
Address a bug introduced by cc1921d (pkg/start: Release leader lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the Operator.Run context would leave the operator with no time to attempt the final sync [1]: E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled With this commit, I'm piping through shutdownContext, which gets a two-minute grace period beyond runContext, to give the operator time to push out that final status (which may include important information like the fact that the incoming release image has completed verification). --- This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final status synchronization, 2021-01-19, openshift#517) back to 4.5. It's not a clean pick, because we're missing changes like: * b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364). * 1d1de3b (Use context to add timeout to cincinnati HTTP request, 2019-01-15, openshift#410). which also touched these lines. But we've gotten this far without backporting rhbz#1822844, and openshift#410 was never associated with a bug in the first place, so instead of pulling back more of 4.6 to get a clean pick, I've just manually reconciled the pick conflicts. Removing Start from pkg/start (again) fixes a buggy re-introduction in the manually-backported 20421b6 (*: Add lots of Context and options arguments, 2020-07-24, openshift#470). [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
Change CVO wait.Until instances to wait.UntilWithContext making top-level context available to the lower levels of the code. Use the added context to add a timeout to the remote cincinnati HTTP call which has the potential to hang the caller at the mercy of the target server.
The timeout has been set to an hour which was determined to be long enough to allow all expected well-behaved requests to complete but short enough to not be catastrophic if the calling thread were to hang.