Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use context to add timeout to cincinnati HTTP request #410

Merged

Conversation

jottofar
Copy link
Contributor

@jottofar jottofar commented Jul 17, 2020

Change CVO wait.Until instances to wait.UntilWithContext making top-level context available to the lower levels of the code. Use the added context to add a timeout to the remote cincinnati HTTP call which has the potential to hang the caller at the mercy of the target server.

The timeout has been set to an hour which was determined to be long enough to allow all expected well-behaved requests to complete but short enough to not be catastrophic if the calling thread were to hang.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2020
@jottofar jottofar force-pushed the ota-244-add-context branch 3 times, most recently from 3970ccf to 17b5248 Compare July 17, 2020 22:11
@wking
Copy link
Member

wking commented Jul 18, 2020

unit includes:

# github.com/openshift/cluster-version-operator/pkg/cvo [github.com/openshift/cluster-version-operator/pkg/cvo.test]
pkg/cvo/cvo_scenarios_test.go:114:15: not enough arguments in call to o.sync
	have (string)
	want (context.Context, string)

@jottofar jottofar force-pushed the ota-244-add-context branch 5 times, most recently from 4ffb363 to 9ebfffc Compare July 20, 2020 23:59
@jottofar
Copy link
Contributor Author

/retest

@jottofar
Copy link
Contributor Author

/test unit

@jottofar
Copy link
Contributor Author

/retest

3 similar comments
@jottofar
Copy link
Contributor Author

/retest

@jottofar
Copy link
Contributor Author

/retest

@jottofar
Copy link
Contributor Author

/retest

pkg/cvo/sync_worker.go Outdated Show resolved Hide resolved
pkg/cvo/cvo_test.go Outdated Show resolved Hide resolved
pkg/cvo/cvo.go Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2020
@jottofar
Copy link
Contributor Author

/retitle Use context to add timeout to cincinnati HTTP request

@openshift-ci-robot openshift-ci-robot changed the title Add a context to remote cincinnati calls from CVO Use context to add timeout to cincinnati HTTP request Jul 24, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2020
pkg/cincinnati/cincinnati.go Outdated Show resolved Hide resolved
pkg/cincinnati/cincinnati.go Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2020
@jottofar jottofar force-pushed the ota-244-add-context branch 3 times, most recently from c67db6e to 8ee0d52 Compare July 30, 2020 13:20
@wking
Copy link
Member

wking commented Jul 30, 2020

pkg/cincinnati/cincinnati.go:89:36: undefined: transport

Change CVO wait.Until instances to wait.UntilWithContext making top-level
context available to the lower levels of the code. Use the added context
to add a timeout to the remote cincinnati HTTP call which has the
potential to hang the caller at the mercy of the target server.

The timeout has been set to an hour which was determined to be long
enough to allow all expected well-behaved requests to complete but short
enough to not be catastrophic if the calling thread were to hang.
@jottofar
Copy link
Contributor Author

/test integration

@jottofar
Copy link
Contributor Author

/test e2e

@jottofar
Copy link
Contributor Author

/test e2e-upgrade

@wking
Copy link
Member

wking commented Jul 31, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Copy link
Member

wking commented Jul 31, 2020

e2e passed, except for an end user can use OLM can subscribe to the operator, which is rhbz#1862322.

/override ci/prow/e2e

@openshift-ci-robot
Copy link
Contributor

@wking: Overrode contexts on behalf of wking: ci/prow/e2e

In response to this:

e2e passed, except for an end user can use OLM can subscribe to the operator, which is rhbz#1862322.

/override ci/prow/e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 4c3024d into openshift:master Jul 31, 2020
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 20, 2021
Address a bug introduced by cc1921d (pkg/start: Release leader
lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the
Operator.Run context would leave the operator with no time to attempt
the final sync [1]:

  E0119 22:24:15.924216       1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a
two-minute grace period beyond runContext, to give the operator time
to push out that final status (which may include important information
like the fact that the incoming release image has completed
verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final
status synchronization, 2021-01-19, openshift#517) back to 4.5.  It's not a
clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if
  ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request,
  2019-01-15, openshift#410).

which also touched these lines.  But we've gotten this far without
backporting rhbz#1822844, and openshift#410 was never associated with a bug in
the first place, so instead of pulling back more of 4.6 to get a clean
pick, I've just manually reconciled the pick conflicts.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 23, 2021
Address a bug introduced by cc1921d (pkg/start: Release leader
lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the
Operator.Run context would leave the operator with no time to attempt
the final sync [1]:

  E0119 22:24:15.924216       1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a
two-minute grace period beyond runContext, to give the operator time
to push out that final status (which may include important information
like the fact that the incoming release image has completed
verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final
status synchronization, 2021-01-19, openshift#517) back to 4.5.  It's not a
clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if
  ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request,
  2019-01-15, openshift#410).

which also touched these lines.  But we've gotten this far without
backporting rhbz#1822844, and openshift#410 was never associated with a bug in
the first place, so instead of pulling back more of 4.6 to get a clean
pick, I've just manually reconciled the pick conflicts.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 23, 2021
Address a bug introduced by cc1921d (pkg/start: Release leader
lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the
Operator.Run context would leave the operator with no time to attempt
the final sync [1]:

  E0119 22:24:15.924216       1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a
two-minute grace period beyond runContext, to give the operator time
to push out that final status (which may include important information
like the fact that the incoming release image has completed
verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final
status synchronization, 2021-01-19, openshift#517) back to 4.5.  It's not a
clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if
  ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request,
  2019-01-15, openshift#410).

which also touched these lines.  But we've gotten this far without
backporting rhbz#1822844, and openshift#410 was never associated with a bug in
the first place, so instead of pulling back more of 4.6 to get a clean
pick, I've just manually reconciled the pick conflicts.

Removing Start from pkg/start (again) fixes a buggy re-introduction in
the manually-backported 20421b6 (*: Add lots of Context and options
arguments, 2020-07-24, openshift#470).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants