New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-32091: Add Top-level Context for Create Commands #8063
OCPBUGS-32091: Add Top-level Context for Create Commands #8063
Conversation
@patrickdillon: This pull request references CORS-3241 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cc @andfasano @vincepri |
@@ -310,6 +301,14 @@ func (i *InfraProvider) Provision(dir string, parents asset.Parents) ([]*asset.F | |||
} | |||
|
|||
logrus.Infof("Cluster API resources have been created. Waiting for cluster to become ready...") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this sneak into this PR? This doesn't seem directly related to the work on contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could move to a different commit or PR, but it's somewhat related with the focus of a clean shutdown of the capi controllers.
b3b35dc
to
50009d4
Compare
/assign @vincepri Even though we are using this context when starting the controllers
We still need to handle the CTRL+C user interrupt, because the context will not be done/cancelled with an interrupt, right? How can we ensure that the CAPI system Teardown() function has enough time to complete? In the current implementation, the Teardown function begins, but when the context is cancelled an error is returned which ends up calling logrus.Exit before Teardown reports it is done: installer/cmd/openshift-install/create.go Lines 313 to 323 in 6b4523d
This is similar to what you handled in #7693 and #7864 but I still don't see how to make it work here. |
@patrickdillon In this case we'd have to create a new context, as a parent of the one that we use from signals. We can then wait for |
50009d4
to
dadb4f7
Compare
0ec14af
to
401fccb
Compare
94aecd0
to
1e4ce95
Compare
/cc @bfournie |
0398c63
to
35c47ef
Compare
@patrickdillon: This pull request references CORS-3241 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Tried this with a ctrl-c right after capi started:
|
Yup that is the expected result. The returned error will vary depending on where/why exit. In terms of UX, I wonder if we should move these to debug level but I think that is mostly a nit.
|
Adds a top-level context to be passed down from main, as well as an interrupt handler with graceful shutdown logic. The signal handler allows us to trap a user interrupt and run graceful shutdown logic rather than exiting immediately. This graceful shutdown allows us to run any cleanup operations, particularly we can shutdown locally-running CAPI controller processes. Otherise they can potentially leak and continue running in the background (continuing to perform reconcile actions such as creating cloud resources).
Introduces the asset generator interface, which allows generating assets with a passed in context. Provides an adapter for assets that do not implement GenerateWithContext. The adapter simply wraps the asset and calls the original Generate (without context) function.
Plumbs the top-level context into the agent, create, & gather commands and into the asset graph.
Removes the original signal handler, which is replaced by the signal handler in installer main. pick 1464d8847d main: add top-level context and graceful shutdown
Typically the CAPI system needs to continue to run until bootstrap destroy (because it is used in the bootstrap destroy process). Shut it down if we are preserving bootstrap resources.
Updates tests to accept a context after the interface for the asset store was updated.
35c47ef
to
a132f85
Compare
@patrickdillon: This pull request references Jira Issue OCPBUGS-32091, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: r4f4, vincepri The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@patrickdillon: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
d8c7872
into
openshift:master
@patrickdillon: Jira Issue OCPBUGS-32091: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-32091 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-installer-altinfra-container-v4.16.0-202404121144.p0.gd8c7872.assembly.stream.el8 for distgit ose-installer-altinfra. |
Adds a top-level context when running create commands. The context is passed through the asset store and then to any PostRun functions. The main motivation is for a clean shutdown of CAPI controllers, but there are a wide variety of potential use cases.
This PR uses the pattern introduced in #6009 to enable the introduction of a
GenerateWithContext
function for assets and an adapter to drop the context and call the originalGenerate
function. Currently only the Cluster asset implementsGenerateWithContext
.Currently the PR achieves the primary goal of shutting down capi controllers on interrupt. There are some remaining issues I would like to clear up: