Use MCD binary from container in /run/bin #1766
/cc @LorbusChris
When I looked at this in the past, one problem was that all of the OCP containers are built against UBI7 (i.e. RHEL7). And while the MCD is Go and hence all statically linked... there's a catch; see coreos/fedora-coreos-tracker#354.
So, based on that, one approach would be to change our container image to build binaries for both RHEL7 and RHEL8 and then extract the RHEL8 one here.
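One hypothetical way to make the "build both, extract the RHEL8 one" choice explicit rather than hard-coded is to inspect the host's os-release file. This is a minimal Go sketch, not how the MCO extracts its binary. It assumes the image would ship two hypothetical binaries, /usr/bin/machine-config-daemon.rhel7 and /usr/bin/machine-config-daemon.rhel8, and that an el8 host reports PLATFORM_ID="platform:el8" (as RHEL 8 does) while RHEL 7 has no PLATFORM_ID at all.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseOSRelease parses os-release(5) content into KEY -> value pairs,
// stripping surrounding quotes and skipping blank lines and comments.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields
}

// pickMCDBinary returns the (hypothetical) binary path matching the host.
// Assumption: el8 hosts set PLATFORM_ID=platform:el8; RHEL 7 sets none.
func pickMCDBinary(osRelease map[string]string) string {
	if osRelease["PLATFORM_ID"] == "platform:el8" {
		return "/usr/bin/machine-config-daemon.rhel8"
	}
	return "/usr/bin/machine-config-daemon.rhel7"
}

func main() {
	data, err := os.ReadFile("/etc/os-release")
	if err != nil {
		// os-release(5) specifies /usr/lib/os-release as the fallback.
		data, err = os.ReadFile("/usr/lib/os-release")
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading os-release:", err)
		os.Exit(1)
	}
	fmt.Println(pickMCDBinary(parseOSRelease(string(data))))
}
```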
Branch updated from 85fd7e5 to b13a0b8.
We would also need to create a symlink and fix the m-c-d path in https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/machine-config-daemon-firstboot-v42.service#L18
Branch updated from b13a0b8 to dd5a74d.
@vrutkovs thanks for including the proxy settings here! Before this can go in, we need to create UBI8-based MCO images and pull the MCD from there (as Colin mentioned).
@vrutkovs instead of having another container in the payload, we could also build the MCD in a UBI8 environment and stick it into the UBI7 container.
Right,
Just a thought: do we actually need to pull the MCD binary out of the container, or could we run it as a privileged DaemonSet from the container?
The reason we do this is
@vrutkovs mind if I take this PR? I have a pile of changes to debug. This would fix that and other things too.
That would be great! Feel free to make a new one or push to this branch.
Branch updated from dd5a74d to ae20e37.
🎉!
OK, I'm lifting WIP on this. I think the remaining issue is deciding whether we want to block on building a RHEL8 MCD binary into the container image: #1766 (comment). I'm uncertain; it's something we can do later. Should we try this out in master? If we need to back this out, the work of adding another binary to the image wouldn't be useful either.
+1 to that. I haven't hit RHEL7/RHEL8 issues yet, so the fix for this can be postponed.
e2e-gcp-upgrade looks like
/assign sinnykumari
I know this is a big change, but I think we're all going to be way happier debugging this in the future. The "CI covers changes to firstboot" part alone is worth it for all of our sanity.
This is awesome! I ran a few tests locally and they worked as expected.
With this change, day-1 operations like kargs also apply perfectly on node scale-up in an upgraded cluster that was originally 4.1/4.2-based.
Thanks again Colin and Vadim for this PR. Let's get this in 🎉
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cgwalters, sinnykumari, vrutkovs.
/retest
Please review the full test history for this PR and help us cut down flakes.
/retest
Follow-up to drop
/test e2e-proxy |
/test e2e-aws-proxy |
Currently, as of 4.6,
(PR updated by Colin Walters <walters@verbum.org>,
Co-authored-by: Vadim Rutkovsky <vrutkovs@redhat.com>)
Use MCD binary from container in /run/bin
And drop all dependencies at firstboot and cluster time on the presence
of the MCD baked into the host.
This solves a whole pile of problems:
- the updated code, so we get (important) coverage of firstboot paths
- the on-host version and the in-container version
Since we're now injecting our updated code into the bootimage, we can drop the special 4.1 handling with pivot.service and the special 4.2 unit.
Tweak the code run via machine-config-daemon-firstboot to run rpm-ostree directly rather than via the -host unit, which just adds confusion.
And finally: with this, the MCD (when run as a pod) stops using machine-config-daemon-host.service and creates a dynamic unit instead.
With the combination of both, machine-config-daemon-host.service is on the path to not being used by default and migrating to a "4.1 bootimage aid".
The systemd-run model of creating a unit dynamically is much clearer for what we want here; conceptually the service is just a dynamic child of this pod (if we could, we'd tie the lifecycle together). Further:
- mco- prefix
- RPMOSTREE_CLIENT_ID, see coreos/rpm-ostree@016c1c5

Co-authored-by: Vadim Rutkovsky <vrutkovs@redhat.com>