Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MCD binary from container in /run/bin #1766

Merged

Conversation

vrutkovs
Copy link
Member

@vrutkovs vrutkovs commented May 29, 2020

(PR updated by Colin Walters walters@verbum.org,
Co-authored-by: Vadim Rutkovsky vrutkovs@redhat.com )


Use MCD binary from container in /run/bin

And drop all dependencies at firstboot and cluster time on the presence
of the MCD baked into the host.

This solves a whole pile of problems:

  • We need this for OKD (with FCOS)
  • Now pull requests with our CI naturally use
    the updated code, so we get (important) coverage of firstboot paths
  • Relatedly, we don't have to worry about "drift" between
    the on-host version and the in-container version
  • ART can stop building the MCD and RHCOS can drop it

Since we're now injecting our updated code into the bootimage,
we can drop the special 4.1 handling with pivot.service and
the special 4.2 unit.

Tweak the code run via machine-config-daemon-firstboot to
run rpm-ostree directly rather than via the -host unit, which
just adds confusion.

And finally
With this the MCD (when run as a pod) stops using machine-config-daemon-host.service.
and creates a dynamic unit instead.

With the combination of both, machine-config-daemon-host.service is on the
path to not being used by default and migrating to a "4.1 bootimage aid".

The systemd-run model of creating a unit dynamically is much clearer for what we want here;
conceptually the service is just a dynamic child of this pod (if we could we'd
tie the lifecycle together). Further:

  • Let's shorten our systemd unit names by using the mco- prefix
  • Inject the RPMOSTREE_CLIENT_ID, see coreos/rpm-ostree@016c1c5

Co-authored-by: Vadim Rutkovsky vrutkovs@redhat.com

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2020
@vrutkovs
Copy link
Member Author

/cc @LorbusChris

@cgwalters
Copy link
Member

When I was looking at this in the past, one problem is that all of the OCP containers are built against UBI7 (i.e. RHEL7). And while the MCD is Go and hence all statically linked...there's openssl which does come into play with FIPS and differs between the two (we carry openssl-deprecated in RHCOS solely because people are doing this type of "extract binary from a container" thing).

See coreos/fedora-coreos-tracker#354

So based on that one approach would be to change our container image to build binaries for both RHEL7/RHEL8 and then extract the rhel8 one here.

@vrutkovs vrutkovs force-pushed the mcd-pull-if-missing branch 3 times, most recently from 85fd7e5 to b13a0b8 Compare June 2, 2020 06:58
@sinnykumari
Copy link
Contributor

@LorbusChris
Copy link
Member

@vrutkovs thanks for including the proxy settings here!

Before this can go in, we need to create UBI8 based MCO images and pull MCD from there (as Colin mentioned).

@LorbusChris
Copy link
Member

@vrutkovs instead of having another container in the payload, we could also build the MCD in a UBI8 env and stick it into the UBI7 container

@vrutkovs
Copy link
Member Author

vrutkovs commented Jun 2, 2020

Right, golang-1.13 is centos7 based. Do we have a centos8-based builder image?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2020
@LorbusChris
Copy link
Member

just a thought: Do we actually need to pull the MCD binary out of the container, or could we run it as a privileged DaemonSet from the container?

@cgwalters
Copy link
Member

just a thought: Do we actually need to pull the MCD binary out of the container, or could we run it as a privileged DaemonSet from the container?

The reason we do this is

// imported into the MCD itself. The reason we execute on the host is due to SELinux;

@cgwalters
Copy link
Member

@vrutkovs mind if I take this PR? I have a pile of changes to debug
https://bugzilla.redhat.com/show_bug.cgi?id=1842906
and I'm realizing that the lack of CI coverage for the on-host MCD is increasingly problematic for our sanity.

This would fix that and other things too.

@vrutkovs
Copy link
Member Author

mind if I take this PR?

That would be great! Feel free to make a new one or push to this branch

@cgwalters
Copy link
Member

ci/prow/cluster-bootimages — Job succeeded.

🎉 !

@cgwalters cgwalters changed the title WIP: Use MCD binary from container in /run/bin Use MCD binary from container in /run/bin Jun 23, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2020
@cgwalters
Copy link
Member

OK I'm lifting WIP on this.

I think the remaining issue is to determine whether or not we want to block on building a RHEL8 MCD binary into the container image: #1766 (comment)

I'm uncertain...it is something we can do later, and try this out in master? If we need to back this out, then the work of adding another binary into the image wouldn't be useful either.

@vrutkovs
Copy link
Member Author

I'm uncertain...it is something we can do later, and try this out in master?

+1 to that. I haven't hit rhel7/rhel8 issues yet, so the fix for this can be postponed

@cgwalters
Copy link
Member

e2e-gcp-upgrade looks like
https://bugzilla.redhat.com/show_bug.cgi?id=1791162
/test e2e-gcp-upgrade

@cgwalters
Copy link
Member

/assign sinnykumari
for review/approval (hope that's OK MCO team!)

I know this is a big change but...we all going to be way happier debugging this in the future I think. The "CI covers changes to firstboot" alone is worth it for all of our sanity.

Copy link
Contributor

@sinnykumari sinnykumari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, ran few tests locally and they worked as expected!
With this change, day1 operations like kargs also applies perfectly with node scale-up on a 4.1/4.2 based upgraded cluster.

Thanks again Colin and Vadim for this PR. Let's get this in 🎉

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sinnykumari, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@LorbusChris
Copy link
Member

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 617b654 into openshift:master Jun 25, 2020
@cgwalters
Copy link
Member

Followup to drop machine-config-daemon from RHCOS: https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1044

@runcom
Copy link
Member

runcom commented Jul 15, 2020

/test e2e-proxy

@runcom
Copy link
Member

runcom commented Jul 15, 2020

/test e2e-aws-proxy

@cgwalters
Copy link
Member

As of currently in 4.6, /usr/libexec/machine-config-daemon was dropped from RHCOS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants