Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Run rpm-ostree directly in -firstboot.service #1802

Closed
wants to merge 1 commit into from

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Jun 9, 2020

Currently machine-config-daemon-host is started by all 3 cases of:

  • directly via presence of /etc/pivot/image-pullspec
  • via machine-config-daemon-firstboot
  • via the MCD pod

This is unnecessary in the second case (we're already running as a systemd unit)
and it's quite confusing to have one systemd unit start another and
watch/replicate its logs.

Motivated by debugging https://bugzilla.redhat.com/show_bug.cgi?id=1842906

In theory this actually widens the race in allowing two things to try to
upgrade at the same time, but it will help us turn machine-config-daemon-host.service
into something more clearly targeted for handling 4.1 bootimages on firstboot, much like
how machine-config-daemon-firstboot-v42.service exists.

Currently `machine-config-daemon-host` is started by all 3 cases of:

- directly via presence of `/etc/pivot/image-pullspec`
- via `machine-config-daemon-firstboot`
- via the MCD pod

This is unnecessary in the second case (we're already running as a systemd unit)
and it's quite confusing to have one systemd unit start another and
watch/replicate its logs.

Motivated by debugging https://bugzilla.redhat.com/show_bug.cgi?id=1842906

In theory this actually *widens* the race in allowing two things to try to
upgrade at the same time, but it will help us turn `machine-config-daemon-host.service`
into something more clearly targeted for handling 4.1 bootimages on firstboot, much like
how `machine-config-daemon-firstboot-v42.service` exists.
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2020
@cgwalters
Copy link
Member Author

Also let me apologize for the horrendous mess around all of this code...in my defense there are some confusing external constraints like the SELinux BZ and handling 4.1 bootimages.

Once both of those get knocked out a whole lot of stuff will get cleaner.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Jun 9, 2020
This is the mirror/dual of
openshift#1802

With this the MCD (when run as a pod) stops using `machine-config-daemon-host.service`.
and creates a dynamic unit instead.

With the combination of both, `machine-config-daemon-host.service` is on the
path to not being used by default and migrating to a "4.1 bootimage aid".

The systemd-run model of creating a unit dynamically is much clearer for what we want here;
conceptually the service is just a dynamic child of this pod (if we could we'd
tie the lifecycle together).  Further:

- Let's shorten our systemd unit names by using the `mco-` prefix
- Inject the `RPMOSTREE_CLIENT_ID`, see coreos/rpm-ostree@016c1c5

For example, one weird semantic of `systemctl start` is that it "joins" if
somehow the service is started for another reason.  But here, if somehow
two instances of the MCD were running then `systemd-run` will say e.g.:
`Failed to start transient service unit: Unit mco-pivot.service already exists.`
@cgwalters
Copy link
Member Author

/hold
I sent this up as a cleanup, and I want to have CI test it, but let's focus on #1804

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2020
@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-upgrade 731138f link /test e2e-gcp-upgrade
ci/prow/e2e-aws 731138f link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters
Copy link
Member Author

And likewise for this, CI isn't telling us anything here because this code only runs in the bootimage and we're not overriding that w/our CI right now.

(This would also be fixed by #1804 (comment) )

return fmt.Errorf("failed to run pivot: %v", err)
// In the cluster case, for now we run indirectly via machine-config-daemon-host.service
// for SELinux reasons, see https://bugzilla.redhat.com/show_bug.cgi?id=1839065
if dn.kubeClient != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering instead of checking for kubeClient if we should check for /etc/ignition-machine-config-encapsulated.json file existence.

@cgwalters
Copy link
Member Author

Closing in favor of piling onto doing this in "one shot" with #1766
(And thanks Sinny for reviewing the code here!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants