Bug 1741817: mcd: Add /run/machine-config-daemon-force stamp file #1086
This causes the MCD to skip validating against `currentConfig` (or `pendingConfig`).

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1741817

A long time ago, this PR introduced the current model: openshift#245

One aspect of this is we need to avoid reboot loops; that was a real-world problem early in OpenShift development, although it is probably unlikely to recur today.

Another problem is that we can't simply reconcile by default because we don't have a mechanism to coordinate reboots: openshift#662 (comment)

However, this PR should aid disaster recovery scenarios and others where administrators want the MCD to "just do it".
cc: @rphillips
To be clear the DR instructions should then include:
/cc @bergerhoffer
Noted on BZ, but we're looking for the above command to appear as 9.e. in the DR instructions, something like:
@cgwalters: This pull request references Bugzilla bug 1741817, which is valid. The bug has been moved to the POST state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
VPC errors on e2e-aws-op.
/approve
Looks really good.
I didn't test manually (or add an e2e test) yet - deploying this locally I hit a very slow drain in a default aws cluster, MCD on the final worker:
We really need to roll up into status something like "draining node ip-10-0-163-146.us-east-2.compute.internal".
Like a "Working reason".
This looks like the pod that made it slow, and I think it's not listening to drain (same bug as the registry one perhaps, I'll check).
/retest
Doc PR: openshift/openshift-docs#16399
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters, rphillips, runcom
@@ -892,8 +892,12 @@ func (dn *Daemon) checkStateOnFirstRun() error {
		glog.Infof("Validating against current config %s", state.currentConfig.GetName())
		expectedConfig = state.currentConfig
	}
	if !dn.validateOnDiskState(expectedConfig) {
		return fmt.Errorf("unexpected on-disk state validating against %s", expectedConfig.GetName())
	if _, err := os.Stat(constants.MachineConfigDaemonForceFile); err != nil {
Shouldn't we be more specific about the type of error here? Also is there a way to force when there's some issue preventing reading or setting this file successfully?
@cgwalters: All pull requests linked via external trackers have merged. Bugzilla bug 1741817 has been moved to the MODIFIED state. In response to this:
/cherrypick release-4.1
@rphillips: #1086 failed to apply on top of branch "release-4.1":
In response to this:
[release-4.1] Bug 1749271: Backport mcd: Add /run/machine-config-daemon-force stampfile #1086
After some discussion on this I feel like this PR was wrong: what we really want is something more like "automatically reconcile any changes we detect rather than go degraded".
The only thing we need to make sure of is that whatever we're force-applying is reflected in a rendered MC (as it usually is). I think one of the DR scenarios advises modifying the files on the host and then using the force file, which is ineffective if those changes aren't in the rendered machine config as well.