[layering] Separate build controller to handle builds for layered pools #3071

Merged: cgwalters merged 18 commits into openshift:layering from jkyros:build-controllery-bits on May 12, 2022

Conversation

@jkyros
Member

@jkyros jkyros commented Apr 7, 2022

This is the build controller code so far. I will type better words later.

  • Still needs a bunch of cleanup.

  • Works like before where you create the "layered" pool

  • Difference from a user perspective is that desiredConfig becomes a sha256:a7c711e2f7ff0608cfdc601bc03f21b0e6cb825aa7b75bd30da53ee4760b48eb instead of a rendered config

    • If we like that, we probably need to better correlate that image to the machineconfig for the user
  • Build status/failure detection is still in progress

This correlates with MCO-228

Current Release Image: https://quay.io/jkyros/ocp-release:pr-3071-may-11
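The user-visible change described above — desiredConfig holding an image digest rather than a rendered-config name — means consumers have to tell the two apart. A minimal sketch of that distinction, assuming the same `sha256:` prefix test the daemon uses on this branch (the helper name is illustrative; later commits in this PR replace the prefix check with separate annotations):

```go
package main

import (
	"fmt"
	"strings"
)

// isImageConfig reports whether a desiredConfig value refers to a layered
// image digest rather than a rendered MachineConfig. Illustrative helper:
// this branch performs the equivalent "sha256:" prefix check in the daemon.
func isImageConfig(desiredConfig string) bool {
	return strings.HasPrefix(desiredConfig, "sha256:")
}

func main() {
	fmt.Println(isImageConfig("sha256:a7c711e2f7ff0608cfdc601bc03f21b0e6cb825aa7b75bd30da53ee4760b48eb"))
	fmt.Println(isImageConfig("rendered-worker-0123456789abcdef"))
}
```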

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Apr 7, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot requested review from cgwalters and yuqi-zhang April 7, 2022 17:00
Comment thread vendor/github.com/coreos/ignition/v2/config/merge/merge.go Outdated
Contributor

I'm not familiar with the Image API or how the build controller is consuming this, so not sure if this would be possible...but I'm wondering if we could use generated.Name as a tag and then also tag that as latest. Then if I delete a MC, rolling back to the previously rendered config is faster, because you could just check that the image with that tag already exists and tag it as latest.

Contributor

Could we do this once rather than in the sync loop?

Contributor

Would it be possible to tag with the rendered config and see if an image has been built already rather than using time?
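The reviewer's idea — tag each built image with its rendered-config name and check for an existing tag instead of comparing timestamps — can be sketched like this. The map stands in for an ImageStream's tag list; a real implementation would query the OpenShift Image API, and `lookupOrBuild` is a hypothetical helper, not MCO code:

```go
package main

import "fmt"

// lookupOrBuild reuses an already-built image when a tag for the rendered
// config exists, and only invokes build() otherwise. Rolling back to a
// previously rendered config then becomes a cache hit rather than a rebuild.
func lookupOrBuild(tags map[string]string, renderedConfig string, build func() string) string {
	if digest, ok := tags[renderedConfig]; ok {
		return digest // tag exists: no rebuild needed
	}
	digest := build()
	tags[renderedConfig] = digest
	return digest
}

func main() {
	tags := map[string]string{"rendered-worker-old": "sha256:aaa"}
	builds := 0
	build := func() string { builds++; return "sha256:bbb" }

	fmt.Println(lookupOrBuild(tags, "rendered-worker-new", build)) // triggers a build
	fmt.Println(lookupOrBuild(tags, "rendered-worker-old", build)) // reuses existing tag
	fmt.Println(builds)
}
```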

Contributor

I think some of my other comments were based on the assumption that the base image won't have changed. Hmm may have to think about this more on Monday 🤔

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2022
@cgwalters
Member

OK sorry I had only briefly skimmed this before. So, this is a large change that is highly likely to conflict with any other changes (e.g. #3083 ) so...I think we should try to collaborate on this PR, push this forward and get it into the layering branch.

jkyros added 4 commits April 21, 2022 01:47
This removes most of the stuff from the render controller that was added
to the build controller. As a result, the render controller for the most
part goes back to the way it was. There is no avoiding having some
change to the render controller, as it will need to know to ignore the
pools that the build controller is managing.
We need to specify what resources the pool should be creating/owning in
order to manage builds and the relationships between them (triggers,
which buildconfigs should source from which streams, etc). The pool
needs some template/plan to follow when it gets triggered to create and
verify that its resources exist.

This is my not-quite-as-elegant-as-I-had-hoped attempt at that.
This confines the image and build informers to the MCO's namespace.
There are potentially many builds and images in the cluster and watching
all of them and their events is potentially expensive.

As our constraints currently sit, all of the images and builds we will be
dealing with are present in the MCO namespace.
This adds helpers for getting and setting the current image stream for a
pool, separates the per-pool suffix '-base' from the global 'coreos'
imagestream, and adds a per-node stream name constant for later.
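The naming scheme that commit describes — a shared global `coreos` stream plus pool-scoped streams built from the pool name and a suffix like `-base` — can be sketched as follows. The constant and helper names are illustrative stand-ins for the ones added to ctrlcommon:

```go
package main

import "fmt"

// globalBaseStream is the cluster-wide default base image stream.
const globalBaseStream = "coreos"

// imageStreamName scopes a stream to a pool by appending a per-purpose
// suffix (e.g. "-base") to the pool name.
func imageStreamName(poolName, suffix string) string {
	return poolName + suffix
}

func main() {
	fmt.Println(globalBaseStream)
	// Matches the `layered-base` stream in docs/BuildController.md.
	fmt.Println(imageStreamName("layered", "-base"))
}
```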
@jkyros jkyros force-pushed the build-controllery-bits branch from 0c7e0d2 to f5f87e7 on April 21, 2022 06:56
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2022
Comment thread pkg/daemon/daemon.go Outdated
// TODO(jkyros): This is a race I usually win, but I won't always, and we need a better
// way to get this metadata in
if imagestream.Name == pool.Name+ctrlcommon.ImageStreamSuffixRenderedConfig {
ctrl.cheatMachineConfigLabelIntoBuildConfig(imagestream, pool)
Member Author

Yes, this is silly. This is me trying to sneak the dynamic labels onto the buildconfig before it triggers.

I'm injecting the image labels via the dockerfile + args right now, but if a user runs oc start-build, say to trigger after a build fails, the user's build won't have the args, which means it won't get the label, which means we won't know what machineconfig an image contains (and I know we don't need it anymore because our config is the image, but I really do want to be able to show it back somehow).
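The "labels via dockerfile + args" approach can be sketched as below: the controller generates a Dockerfile whose build ARG feeds a LABEL, so the machineconfig name is baked into the image. The ARG/LABEL names and the helper are illustrative, not the exact ones the MCO uses; the caveat from the comment above is that a manual `oc start-build` won't pass the build arg, leaving the label at its default:

```go
package main

import "fmt"

// renderDockerfile emits a Dockerfile that records the rendered
// MachineConfig name as an image label via a build argument.
func renderDockerfile(baseImage string) string {
	return fmt.Sprintf(`FROM %s
ARG MACHINECONFIG=unknown
LABEL machineconfig=$MACHINECONFIG
`, baseImage)
}

func main() {
	fmt.Print(renderDockerfile("layered-base:latest"))
}
```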

// experimentalAddEntitlements grabs the cluster entitlement certificates out of the openshift-config-managed namespace and
// stuffs them into a machineconfig for our layered pool, so we can have entitled builds. This is a terrible practice, and
// we should probably just sync the secrets into our namespace so our builds can use them directly rather than expose them via machineconfig.
func (ctrl *Controller) experimentalAddEntitlements(pool *mcfgv1.MachineConfigPool) {
Member Author

Talking with Colin, we will need to think about the RHEL9 ramifications of this -- because it will probably be wrong 😄

Member

Beyond just the RHEL8 versus RHEL9 versus e.g. C9S issues, I think we also need to handle the case of disconnected installs where a user already has a local RHEL mirror. Us trying to overwrite redhat.repo would then just be broken.

I think we need to de-couple "automatically set up repos" from layering; the former needs to be a different opt-in mechanism.

Name: image.Image,
}
// TODO(jkyros): Kind of cheating using this as metadata showback for the user until we figure out our "level triggering" strategy
pool.Spec.Configuration.Source = []corev1.ObjectReference{
Member Author

Still talking with Colin, we discussed how nice it would be to have some sort of history on the pool -- something like "this update changed X" -- and being able to step back through those.

Comment thread pkg/daemon/daemon.go Outdated
if desiredConfigName, ok := dn.node.Annotations[constants.DesiredMachineConfigAnnotationKey]; ok {
// TODO(jkyros): This works for now, but we should probably add another annotation or at least
// do something more elegant than string prefix parsing
if strings.HasPrefix(desiredConfigName, "sha256:") {
Member Author

Preference has been expressed that we have separate config/image annotations and we clean up the ones we aren't using when we switch.

Member Author

I tried this with the most recent push, but it seems a little more invasive than I'd have liked due to some of the assumptions in the node controller's status (see status.go comment below)
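The "separate config/image annotations" approach, with the stale counterpart cleaned up on switch, can be sketched like this. The desiredConfig key matches the daemon's constant; the desiredImage key and the `setDesired` helper are assumptions for illustration:

```go
package main

import "fmt"

const (
	desiredConfigKey = "machineconfiguration.openshift.io/desiredConfig"
	// Assumed key name for the image counterpart.
	desiredImageKey = "machineconfiguration.openshift.io/desiredImage"
)

// setDesired writes the annotation matching the config kind and deletes
// the other, so presence/absence stays meaningful to consumers.
func setDesired(annos map[string]string, name string, isImage bool) {
	if isImage {
		annos[desiredImageKey] = name
		delete(annos, desiredConfigKey)
		return
	}
	annos[desiredConfigKey] = name
	delete(annos, desiredImageKey)
}

func main() {
	annos := map[string]string{desiredConfigKey: "rendered-worker-abc"}
	setDesired(annos, "sha256:abc", true)
	_, hasConfig := annos[desiredConfigKey]
	fmt.Println(annos[desiredImageKey], hasConfig)
}
```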

@jkyros jkyros force-pushed the build-controllery-bits branch from f5f87e7 to 3443c17 on April 26, 2022 07:35

// Make sure the imagestreams exist so we can populate them with our image builds
for _, imageStreamName := range ensureImageStreams {
_, err := ctrl.ensureImageStreamForPool(pool, imageStreamName, pbr)
Member Author

Something else we talked about was whether the resource creation/ensuring should be done by the operator instead of by the build controller. I put them here because they were "dynamic" per-pool and not created from any sort of manifest, but I wanted to make sure we think through that.

I have maybe some hypershift-y "lego brick" concerns if I need the operator to spawn my objects in order to do a build, but maybe that doesn't really matter.

Comment on lines 139 to 147
}
cconfig, ok := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
if !ok || cconfig == "" {
return false

cconfig, hasConfig := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
cimage, hasImage := node.Annotations[daemonconsts.CurrentImageConfigAnnotationKey]

// if we have an image config present
if hasImage && cimage != "" {
return true
}
Member Author

Things like this are why I was previously hesitant to have a completely separate set of annotations -- I didn't want our changes to be too invasive -- there are a bunch of places where we assume that the presence/absence of daemonconsts.CurrentMachineConfigAnnotationKey means something.

Member

Yeah...fair enough. Hmm.
So definitely a bit more risk to a master merge with this. I think we can do it though, just needs some auditing. I'm also pretty confident our e2es cover this.

Comment thread docs/BuildController.md Outdated
Comment on lines +3 to +10
## This makes a bunch of imagestreams, what are they for?

|imagestream| intended purpose|
|---|---|
|`coreos` | The global default base coreos image (rhel-coreos) |
|`layered-base` | The base image stream our buildconfigs draw from (defaults to coreos:latest) |
|`layered-external-base` |External base image stream. If :latest tag is populated, layered-base uses this instead of coreos:latest|
|`layered-mco-content` | This is the image built by our mco dockerfile where ignition-apply runs |
Member Author

One of the other conversations we also had around this was whether or not the "custom pools" should derive from the worker pool image or if they should build their own. So right now on the layering branch:

  • each pool in "layered/image mode" (custom or not) gets its own set of imagestreams and buildconfigs.

This is okay because the render controller is merging the config for us -- it's handling "config derivation" on the machineconfig front for custom pools, but the image content is still being built 'from base'.

I'm just writing this here so we remember to think through whether "all custom pools derive from worker" is a pattern we want to carry forward into our new derived-image world, or if it is something we only did out of necessity because the pool needed the config in worker (base, templates, etc.) to live?

Comment thread pkg/controller/build/build_controller.go Outdated
Member

Could probably share this with render.go; calling it poolKind could make sense too?

Comment thread docs/BuildController.md Outdated
Comment thread docs/BuildController.md
@cgwalters
Member

/test unit
/test verify
/test images
/test e2e-gcp-op
/test e2e-agnostic-upgrade

@cgwalters
Member

/approve

jkyros added 14 commits May 10, 2022 11:34
The build controller "takes over" for pools that are marked as
layered. It will more or less serve the function of the render
controller but for image based pools.

The render controller currently retains the functionality of generating
the rendered config, and will push that rendered config into an
imagestream every time it gets updated.
Since we're going straight into the configuration now (the configuration
being an image; we can do this because the MachineConfigPool
spec.Configuration is an ObjectRef), we don't need to have multiple
annotations for desiredConfig and desiredImage anymore.

The desiredconfig contains the reference for whatever the configuration
is supposed to be, and the daemon will figure out how to apply it.

This removes the 'halfway-step' desiredImage annotation behavior and
allows the node_controller to assign the image sha256 itself as
desiredConfig.
Before this, we had separate annotations for current/desired config and
current/desired image. We were getting to a particular machineconfig
level "via an image". The MachineConfig level was what we were trying to
reach, not the image.

This changes the approach such that the image is now the desired
configuration that we are trying to reach, and that is what will be
contained in desiredConfig for pools that are performing layered/image
builds.

This removes the old dual-annotation behavior and changes the ordering
slightly in a few places because we used to assume that desiredConfig
was a machineConfig (so we tried to retrieve it right out of the gate
without checking to see what it really was) and the mcd would degrade
if it couldn't retrieve it as a machineconfig.
The changes being made to the render/build controllers and the daemon
have resulted in them losing or gaining informers/listers/clients in
their creation method signatures.

This, of course, breaks all of the tests.

This commit fixes the tests by adjusting the imports and method
signatures in the tests to match our new controller + daemon signatures.
Previously we had put in some less careful RBAC rules to let the build
controller watch images and builds.

This replaces those cluster roles with a namespaced role that only lets
the MCO watch builds and images inside its own namespace.

This corresponds with the controller's build and image informers being
namespaced to the MCO namespace.

This also lets the daemon retrieve images, as the daemon is being passed
a sha256: hash (because that's how openshift images are named) and it
needs to convert it to an actual image reference so rpm-ostree can apply
it.
So after we talked about it, while the string matching on the sha: sum
did work, the desire was to actually specify the kind of the config, so
we're doing that by having separate annotations.

This also cleans the currentConfig/desiredConfig off the node when it's
set to done. I *think* I addressed most of the consequences of that but
there is no way I didn't miss at least one of them somewhere, since the
MCO is heavily dependent on the assumption that currentConfig will
always be there in a lot of cases (which is why I had been avoiding
doing this).

I also removed the imagelister while I was in there and switched to a
direct client call.
The method signature of the daemon's clusterconnect method changed when
I took the lister out, the test was just adjusted to accommodate this.
I'll squash these out if this is what we end up wanting to do, but this
embraces the "separate annotation" way of assigning either an image or a
machineconfig to a node.

There are a lot of places in the node controller status calculation
where it assumes that currentConfig is always there and so I tried to
address those so either the image annotations OR the machineconfig
annotations would satisfy things like the 'managed' check and the 'done'
check.

This makes the node controller assign current/desired image annotations
to nodes in pools where the config is an image, and the standard
current/desired config annotations to nodes in standard pools.
There was an issue where a node would fail to run some pods after it
rebased to an image and rebooted. Somehow during a rebase, a blank
'resolv.conf' gets added at /usr/etc/resolv.conf even though it is not
present in any config or image. Since we were defaulting all the files
previously using rsync for the ostree 3-way merge, we were
unfortunately blanking out resolv.conf. That would have been okay except
as part of that 'rsync' process, we were also losing the selinux context
for /etc/resolv.conf, which meant that NetworkManager couldn't rewrite
it because it was no longer net_t.

This sidesteps the problem for now by only copying files that have
definitely changed between images because we are already calculating
that file list and have it available.

Furthermore, we are running restorecon just to make sure that should
a config image change one of the files with a special context, it will
get set back properly.

This also moves the drain before the file changes -- we don't want the
pods to be affected by any of the config changes (that is why we are
draining, after all).
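The core of that fix — copying only the files that actually differ between the old and new image, so untouched files like resolv.conf keep their on-disk state and SELinux context — can be sketched as below. The maps stand in for path-to-content-hash listings of each image's /usr/etc; `changedFiles` is an illustrative helper, not the daemon's actual function:

```go
package main

import "fmt"

// changedFiles returns only the paths whose content is new or different
// in the new image, i.e. the minimal set that must be copied into /etc.
func changedFiles(oldSums, newSums map[string]string) []string {
	var changed []string
	for path, sum := range newSums {
		if oldSums[path] != sum {
			changed = append(changed, path)
		}
	}
	return changed
}

func main() {
	oldSums := map[string]string{"/etc/hosts": "h1", "/etc/resolv.conf": "r1"}
	newSums := map[string]string{"/etc/hosts": "h2", "/etc/resolv.conf": "r1"}
	fmt.Println(changedFiles(oldSums, newSums)) // resolv.conf is untouched
}
```

In the PR itself, each copied file is then run through `restorecon -Fv` to repair any special SELinux context.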
Previously when we were handling kargs, we were failing to apply them on
the initial transition because the mcdcontent.json file containing them
did not exist in the initial image. We'd rebase then degrade, then the
next time through, it would notice we rebased and skip the section. Then
the next image update would work because the file was already on disk.

Also, the Cat function that was retrieving the kargs data was assuming
that the sequence number was always going to be '0' on deployments, and
so depending on what had happened (live-apply) it would sometimes fail
to find the MCDContent file on disk.

This:
- Decouples the rebase from the kernel arg handling
- Uses the sequence number when reading files out of deployments so the
  files are properly found
- Allows us to fall back to using the on-disk currentConfig to retrieve
  the kernel args during our first machineconfig to image rebase (since
we delete existing, then add, we need to know what existing was)
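The delete-existing-then-add reconcile that commit relies on can be sketched as a simple set difference: to apply new kargs we must first know the old ones (hence the fallback to the on-disk currentConfig). `reconcileKargs` is an illustrative helper, not the daemon's actual function:

```go
package main

import "fmt"

// reconcileKargs computes which kernel arguments to remove (present
// before but no longer desired) and which to add (desired but not
// present), mirroring the delete-then-add ordering in the daemon.
func reconcileKargs(existing, desired []string) (remove, add []string) {
	existingSet := map[string]bool{}
	for _, k := range existing {
		existingSet[k] = true
	}
	desiredSet := map[string]bool{}
	for _, k := range desired {
		desiredSet[k] = true
	}
	for _, k := range existing {
		if !desiredSet[k] {
			remove = append(remove, k)
		}
	}
	for _, k := range desired {
		if !existingSet[k] {
			add = append(add, k)
		}
	}
	return remove, add
}

func main() {
	remove, add := reconcileKargs([]string{"nosmt"}, []string{"nosmt", "mitigations=off"})
	fmt.Println(remove, add)
}
```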
This was just for playing with layering. We currently explode extensions
into lists of packages but do not support arbitrary non-extension
packages.

Until we figure out an elegant in-cluster way (if any) for a user to
supply arbitrary packages, I just hacked this in so they can be specified
in the extensions field.
This is a half-step that gets us out of having to have extra RBAC perms
to let the daemon read secrets.

For now, this just uses the machine-config-daemon's serviceaccount
token which is mounted in /var/run/secrets.

This is also apparently what openshift-controller-manager does to
generate the dockercfg tokens, so having the operator inject them isn't
really any lower risk. (You can reverse-engineer the sa token from the
dockercfg secret).

I don't know that I want to write the serviceaccount token to disk? I
think I'd rather generate some other pull-only secret, but I don't know
if there is a way to do that yet, so this is the best we have.
@jkyros jkyros force-pushed the build-controllery-bits branch from 3443c17 to b895594 on May 11, 2022 20:53
@jkyros jkyros marked this pull request as ready for review May 11, 2022 21:01
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 11, 2022
@openshift-ci openshift-ci Bot requested a review from cheesesashimi May 11, 2022 21:02
Comment thread pkg/daemon/update.go
return err
}
// Fix the selinux context just in case it had special context
err = runCmdSync("restorecon", "-Fv", diffFile)
Member

I think we can actually just use cp -Z btw.

Member

@cgwalters cgwalters left a comment

Love the detailed commit messages! Overall, there's a lot of change here and I didn't review line by line, but direction looks good.

I think we'll be reviewing bits of this more carefully when we're getting ready to merge back to main/master.

Awesome work here!

Comment thread pkg/daemon/writer.go
}

func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, m map[string]string) (*corev1.Node, error) {
func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, add, remove map[string]string) (*corev1.Node, error) {
Member

Ah yes, we're starting to accumulate more merge conflicts...in this case with #3141

Not a blocker of course, just noting

@cgwalters
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2022
@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jkyros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@cgwalters cgwalters changed the title [layering] WIP: Separate build controller to handle builds for layered pools [layering] Separate build controller to handle builds for layered pools May 12, 2022
@cgwalters cgwalters merged commit 08317ef into openshift:layering May 12, 2022
@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2022

@jkyros: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

|Test name|Commit|Required|Rerun command|
|---|---|---|---|
|ci/prow/e2e-gcp-single-node|b895594|false|/test e2e-gcp-single-node|
|ci/prow/e2e-gcp-layering|b895594|true|/test e2e-gcp-layering|
|ci/prow/e2e-gcp-op|b895594|true|/test e2e-gcp-op|

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. layering lgtm Indicates that a PR is ready to be merged.
