
MCO-286: Add mode for template controller to write to /usr, spike on building a container #3137

Closed
cgwalters opened this issue May 5, 2022 · 9 comments
Labels: layering, lifecycle/rotten

@cgwalters (Member) commented May 5, 2022

Today the MCO ships a whole lot of static files. In this spike we introduce something like a Dockerfile that does:

FROM machine-config-operator as builder
# Render the MCO's static templates into a single config blob
RUN machine-config-controller extract-static-templates > /srv/out.json

FROM rhel-coreos
COPY --from=builder /srv/out.json /srv/out.json
# Apply the rendered config into the image, then drop the intermediate file
RUN ignition-liveapply /srv/out.json && rm -f /srv/out.json

The output of this would be a new openshift-node-base image that ends up in the release image too. And this image would become the "golden image" that is rolled out by the MCO by default.

For this, the template controller will need to write e.g. what is now /etc/systemd/system/kubelet.service to /usr/lib/systemd/system/kubelet.service, and /usr/local/bin/nodeip-config.sh (really /var/usrlocal/nodeip-config.sh) to /usr/bin/nodeip-config.sh.
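As a minimal sketch of that remapping (the rendered/ input directory and DESTDIR are hypothetical, just to illustrate the moves):

# What is /etc/systemd/system/kubelet.service today lands in /usr/lib
install -D -m 0644 rendered/kubelet.service \
    "$DESTDIR/usr/lib/systemd/system/kubelet.service"
# What is /usr/local/bin (really /var/usrlocal) today lands in /usr/bin
install -D -m 0755 rendered/nodeip-config.sh \
    "$DESTDIR/usr/bin/nodeip-config.sh"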

@cgwalters (Member Author)
To build and extend on this a bit: notice that a huge benefit of this transition is that files the MCO currently owns suddenly move underneath the ostree read-only bind mount. Today, an admin can ssh to a node and vi /etc/systemd/system/kubelet.service, and that will work; the config drift monitoring will hopefully kick in.

In this world, by contrast, when they try to vi /usr/lib/systemd/system/kubelet.service, they will get a permission denied, same as for all the OS binaries.

Of course, nothing stops them from creating /etc/systemd/system/kubelet.service, which overrides the /usr copy per the usual systemd rules, or for that matter from using systemctl edit kubelet.
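Concretely, with stock systemd tooling (nothing MCO-specific here):

# A drop-in, not a full override: this creates
# /etc/systemd/system/kubelet.service.d/override.conf
systemctl edit kubelet
# Show the /usr/lib unit plus any /etc overrides and drop-ins that apply
systemctl cat kubelet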

But...what I do hope we can do on the ostree side is move to an opt-in model where people request a "sealed" system in which /etc is really just a symlink to /usr/etc, and then it's all clearly immutable.

Then, following on from this, I think a powerful model would be enabling people to cryptographically sign their images with e.g. Linux IMA. Then the protection we have can't be subverted with a simple mount -o remount,rw /usr; this would also help mitigate some container breakout/exploit scenarios.
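To make the gap concrete, here's the state of things today on an ostree system (root can simply undo the read-only protection):

findmnt -no OPTIONS /usr     # prints "ro,...": the ostree read-only bind mount
mount -o remount,rw /usr     # ...but this succeeds for root, so it's advisory only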

@cgwalters cgwalters changed the title Add mode for template controller to write to /usr Add mode for template controller to write to /usr, spike on building a container Jun 2, 2022
@cgwalters cgwalters changed the title Add mode for template controller to write to /usr, spike on building a container MCO-286: Add mode for template controller to write to /usr, spike on building a container Jun 2, 2022
@cgwalters (Member Author)
Another big thing we can do once this lands is try a spike where we switch from templating things like kubelet.service to dynamic dispatch. For example, switch to a systemd generator which dispatches on ignition.platform.id (this intersects OCP platforms and CoreOS platforms, which has some nontrivial subtleties).
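As a minimal sketch of what such a generator could look like (the paths, unit names, and per-platform drop-in layout here are hypothetical, purely to illustrate the dispatch):

#!/bin/bash
# Hypothetical /usr/lib/systemd/system-generators/mco-platform-generator.
# systemd passes generators three output directories; the first ("normal"
# priority) is enough for this sketch.
set -euo pipefail
gendir="$1"

# Dispatch on ignition.platform.id from the kernel command line (e.g. aws, gcp).
platform=metal
for arg in $(</proc/cmdline); do
    case "$arg" in
        ignition.platform.id=*) platform="${arg#ignition.platform.id=}" ;;
    esac
done

# Activate the matching (hypothetical) drop-in for kubelet.service.
mkdir -p "$gendir/kubelet.service.d"
ln -sf "/usr/lib/mco-platforms/$platform.conf" \
       "$gendir/kubelet.service.d/10-platform.conf"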

@openshift-bot (Contributor)
Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2022
@cgwalters cgwalters removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2022
@cgwalters (Member Author) commented Oct 5, 2022

And doing this then leads to the next domino: removing the Machine Config Server entirely. Suddenly a vast swath of issues go away (e.g. #784 etc.)
(It'd be replaced by a registry on the bootstrap node if there are MachineConfigs to apply; in the "golden" case of zero configuration, we'd literally just pull the stock openshift-node-base container image this issue describes.)

Another way to look at this: how much we use/depend on Ignition in OpenShift shrinks a lot.

And then...it seems quite viable to have a mode for openshift-install where it can also output a kickstart file for Anaconda, and we could support installing RHEL CoreOS via Anaconda (xref https://bugzilla.redhat.com/show_bug.cgi?id=2125655; the role of the kickstart is basically just to pull our node base image). That's irrelevant for cloud, and we already have all of this covered well with the Assisted Installer etc., but I'm sure there's a not-small percentage of customers on bare metal for whom this would, in practice, make RHEL CoreOS feel much more like RHEL.

EDIT: Actually, another big domino after we drop the MCS is that the need for networking in the initramfs for CoreOS also shrinks: in the cloud case we don't need to do DHCP, just link-local. For hypervisors that give us metadata via a non-IP channel, we go back to not needing networking at all. We do still need initramfs networking for Tang, of course. And we've invested a lot in initramfs networking; as of RHEL 9 and current Fedora it seems to work well.

@cgwalters (Member Author) commented Oct 7, 2022

Oooh. I just had another idea related to this...I'm thinking we could support a flow where we use the stock Fedora/CentOS/RHEL cloud images (AMI, GCP etc.) like this:

A stock RHEL AMI comes up; openshift-install has attached cloud-init data to it.

That cloud-init-injected code entirely re-paves the system: it fetches and deploys the target oscontainer (a beauty of ostree is that it just drops new files into /ostree), then, executing from RAM and the old running root filesystem, reboots into it. Everything that existed there before is gone; we would actually wipe and reinitialize the bootloader state (ESP, MBR, etc.) too. What we'd keep is the provisioned filesystem, and that's it.
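A very rough sketch of that injected step (everything named here is hypothetical; stock RHEL cloud images ship neither ostree nor any of this tooling, so the assumption is that the pulled oscontainer carries the logic itself):

#!/bin/bash
# Hypothetical payload run by cloud-init; illustrative only.
set -euo pipefail
OSCONTAINER=quay.io/example/openshift-node-base:latest   # hypothetical image ref

# Hand the target OS image the whole disk: it deploys itself into /ostree
# and reinitializes the bootloader state (ESP, MBR, etc.) as it goes.
podman run --privileged --pid=host -v /:/target "$OSCONTAINER" \
    repave --root /target                                # hypothetical entrypoint

# Boot into the freshly deployed ostree; nothing of the old OS survives
# except the filesystem itself.
systemctl reboot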

Done! OK well, almost...

Two important details here. First, assuming we've followed the model where all the secrets are embedded in the user data, we would need to inject the pull secret via cloud-init, and then we have a choice:

  • preserve that data across the "re-paving"
  • embed an Ignition config inside the cloud-init config (in a way cloud-init will ignore; see the sketch below), then on the next boot run Ignition the same way we do today!
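One way to realize the second option, sketched as cloud-init user data (the on-disk path, and the mechanism by which the next boot consumes it, are assumptions):

#cloud-config
# Hedged sketch: cloud-init itself stashes the Ignition config on disk, and
# a (hypothetical) next-boot hook feeds it to Ignition.
write_files:
  - path: /boot/ignition/config.ign
    permissions: '0600'
    content: |
      {"ignition": {"version": "3.2.0"}}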

What if we have Ignition that wants to re-partition the disk? Yeah, here's where things would be much better if we had Ignition as an opt-in in these images too, because it actually makes sense to unify the "re-paving" and the "re-partitioning". But in the short term, if Ignition repartitioning is specified, we take another reboot (or we could try to hack things up so that Ignition runs not from the initramfs but from the same running-from-RAM setup, after we've fetched the target OS).

What would be the value in all of this? Well, for one thing, we could stop uploading and managing RHEL CoreOS cloud images...which would be kind of a big deal. It means that for customers who want to install OpenShift in e.g. some private cloud and have already uploaded RHEL guest images, we can just reuse those instead of making them upload, manage, and maintain a different one.

@openshift-bot (Contributor)
Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2023
@openshift-bot (Contributor)
Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2023
@openshift-bot (Contributor)
Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Mar 8, 2023
openshift-ci bot (Contributor) commented Mar 8, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
