The MCO manages the Red Hat Enterprise Linux CoreOS (RHCOS) operating system. Further,
the operating system itself is just another part of the release image, called
`machine-os-content`. In other words, the cluster controls the operating system.
## "Bootimage" vs machine-os-content
We will use the term "bootimage" to mean an initial RHCOS disk image, such
as an AMI, bare metal raw disk image, VMware VMDK, OpenStack qcow2, etc.
These bootimages are built using `coreos-assembler`.
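Depending on the installer version, the pinned bootimages can be inspected directly. As a sketch (newer `openshift-install` binaries can print CoreOS stream metadata; the subcommand and the exact JSON paths below may differ across versions):

```shell
# Sketch: print the AMI the installer has pinned for one region.
# Assumes a recent openshift-install binary and jq; paths are illustrative.
openshift-install coreos print-stream-json \
  | jq -r '.architectures.x86_64.images.aws.regions["us-east-1"].image'
```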
Today, the installer pins the "bootimages"
it uses, and released installers also pin the release image. As noted above,
release images contain
`machine-os-content`, which can be a newer
RHCOS version than the one in the bootimage.
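You can see which `machine-os-content` a given release image references with `oc adm release info` (the release pullspec below is illustrative):

```shell
# Print the machine-os-content pullspec referenced by a release image.
# The release image tag here is only an example.
oc adm release info --image-for=machine-os-content \
  quay.io/openshift-release-dev/ocp-release:4.1.0
```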
It's essential to understand that the bootimage and `machine-os-content`
are both essentially wrappers for an OSTree commit.
The OSTree format is an image format designed for in-place operating system updates; it operates
at the filesystem level (like container images) but (unlike container runtimes) has
tooling to manage things such as the bootloader and the persistence of
`/etc` and `/var`.
On top of the OSTree format, Red Hat Enterprise Linux CoreOS uses `pivot`,
which is a "glue layer" that handles the encapsulation of
an OSTree repository inside the `machine-os-content` container image.
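On a running RHCOS machine, the booted and staged OSTree deployments (including the container image pullspec a deployment came from) can be inspected with `rpm-ostree`:

```shell
# On an RHCOS node, list the booted and any pending OSTree deployments.
# (Reach a node e.g. via `oc debug node/<node>` followed by `chroot /host`.)
rpm-ostree status
```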
## The early pivot
We do not want to require that a new bootimage be released for every update, and in general that can be hard to guarantee in every environment (for example, bare metal PXE setups). Hence, the MCO and installer combine to implement "the early pivot".
By this mechanism the cluster can install using an older bootimage, and
bring the operating system into the state targeted by the
`machine-os-content` in the main release image. Let's step through aspects of
cluster bootstrap/installation and add some information about the early pivot.
In this example we'll use AWS, but the process is similar for booting bare metal machines via PXE or VMs in OpenStack.
1. The `openshift-installer` starts, and uses the AMI it has pinned as the bootstrap node.
2. The bootstrap node's `bootkube.sh` service pulls the release image, which
   contains a reference to the MCO (`machine-config-operator`) and also a
   reference to a newer `machine-os-content`. The `bootkube.sh` service runs the MCO in
   "bootstrap" mode to generate and serve Ignition to the master machines.
   These Ignition configs contain a reference to the newer `machine-os-content` from
   the release image.
3. The master machines boot, pulling their Ignition configs from the bootstrap
   node. As part of that, `pivot.service` runs (which is
   `Before=kubelet.service`, so before the cluster starts). It detects the
   reference to the updated `machine-os-content`, pulls it, uses it to upgrade, and
   reboots.
4. The master machines come up and form a cluster.
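Once the cluster is up, you can confirm which OS version each node actually booted; the node status exposes the running OS image string:

```shell
# Print each node's name and the OS image it booted.
# These jsonpath fields are standard kubelet-reported node status.
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}'
```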
## Workers and management via the MCO
At this point, Ignition has been executed; it only runs once, on a machine's first boot.
The master machines use the `machine-api-operator` to boot the workers. Each worker pulls its Ignition config from the MCS (Machine Config Server) running on the control plane, which has the exact same pivot code. The workers also upgrade and reboot, then join the cluster.
Thereafter, the MCO takes over fully. The
`machine-config-daemon` daemonset lands on the master nodes, and will start watching for updates to
`machine-os-content` from the release image, as well as any changes
to `MachineConfig` objects.
Every change now will be managed by a
`machineconfigpool`, ensuring
that only 1 machine at a time is changed (via the
`maxUnavailable: 1` default).
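That pacing is controlled per pool; a sketch of the relevant `MachineConfigPool` field (this value is already the default, shown explicitly for illustration):

```yaml
# Sketch: limit the worker pool to updating one machine at a time.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  maxUnavailable: 1
```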
However, because of the early pivot, the master nodes have already booted the desired version (and the Ignition config generated at bootstrap time should match the in-cluster one) so the MCO has nothing to do.
But now the MCO is fully in control of operating system updates.
The next time the admin does an
`oc adm upgrade`, if a new `machine-os-content`
is provided in the release image, it will be rolled out to the masters
and workers.
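The rollout can be watched through the pools and the daemon logs:

```shell
# After `oc adm upgrade`, the MCO rolls nodes one at a time per pool:
oc get machineconfigpool
# Per-node progress shows up in the machine-config-daemon pod logs:
oc logs -n openshift-machine-config-operator \
  -l k8s-app=machine-config-daemon --tail=20
```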
At the time of this writing, there is no mechanism to roll out updates to bootimages. For example, in EC2, the AMI used will remain the same for the lifetime of a cluster. It is likely that at some point the machine-api-operator will extract bootimage data from the release image, but this is not yet implemented.