Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KernelArguments to MachineConfig #762

Merged
merged 1 commit into from May 27, 2019

Conversation

Projects
None yet
6 participants
@cgwalters
Copy link
Contributor

commented May 15, 2019

We need to support things like adding nosmt to the kernel
commandline, for users that want to choose security over performance;
this is particularly relevant for multi-tenant clusters.
There are a lot of other "tuning" features that are expressed as
kernel cmdline args; down the line we want node-tuning-operator
to write something like machineconfig/05-node-tuning-operator.

Per discussion in the PR, we may consider adding
an operatingsystem CRD or so that would be a high level wrapper
for this, but today everything else is expressed in MachineConfig,
so we need to extend this type.

There have also been some discussions about extending Ignition,
but that's not likely to land soon, and even if it did we still
need to handle "early pivot" pulling machine-os-content, and
scoping both OS updates and kargs into Ignition is too far out.

This is the first step which will support "day 2" configuration.
After this lands, we have further work to have the MCS provide
MachineConfig and not just Ignition, so that we can replace
the "early pivot" with MCD code.

Closes: #388

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from cecbdab to 5b7eac9 May 16, 2019

@openshift-ci-robot openshift-ci-robot added size/L and removed size/M labels May 16, 2019

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

Updated and tested with this MC:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-walters-mctest-worker-nosmt
spec:
  config:
    ignition:
      version: 2.2.0
  kernelArguments:
    - nosmt

And I correctly get nosmt in the kargs on my worker nodes 🎉

Show resolved Hide resolved pkg/daemon/update.go Outdated

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from 5b7eac9 to 02d9c5c May 16, 2019

@ashcrow

This comment has been minimized.

Copy link
Member

commented May 16, 2019

Looks good so far! When this makes it in let's not forget to drop the pivot interface.

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 17, 2019

When this makes it in let's not forget to drop the pivot interface.

Yeah, though we'll need to ratchet by landing this, then switch the installer over to use it, then we can drop the pivot component.

And...hum actually this goes straight into coreos/coreos-installer#13 - we're served Ignition, not MachineConfig, so we won't have KernelArguments. But we really want to handle those kargs during the early pivot so we're not rebooting twice.

Hmm. Hmm. I think what we really want to do is have a mechanism to "invert" things and inject the extra stuff (today osImageURL, w/this PR also kernelArguments) from MachineConfig into the Ignition, and then have a host side service that processes the extra bits.

This would really be moving pivot to be "machineconfig handling for updates + kargsl". I think the simplest thing may be that we actually move the pivot code into the MCD, and when we go to build RHCOS we extract the binary from the MCD container or so, rather than building an RPM. (That entangles the build systems in an interesting way though)

Or: we make this bit common between FCOS/RHCOS and add kargs into Ignition.

@ashcrow

This comment has been minimized.

Copy link
Member

commented May 17, 2019

Or: we make this bit common between FCOS/RHCOS and add kargs into Ignition.

I feel like that would be a big win and help bring the two closer together if it's feasible. /cc @bgilbert @ajeddeloh @dustymabe

@bgilbert

This comment has been minimized.

Copy link
Member

commented May 17, 2019

For Ignition kargs discussion, see coreos/ignition-dracut#81. Note that it currently presumes a mechanism along the lines of ostreedev/ostree#479 (comment).

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from 50354bc to f0c6254 May 17, 2019

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 17, 2019

Oh hey cool, this passed the new e2e test I added!

--- PASS: TestKernelArguments (189.01s)
        mcd_test.go:296: Created kargs-81bd2992-78dc-11e9-b495-0a58ac10166e
        mcd_test.go:115: Pool worker has rendered config kargs-81bd2992-78dc-11e9-b495-0a58ac10166e with rendered-worker-431eab0a737953d758eacf1c99c1b9f1 (waited 2.032938681s)
        mcd_test.go:139: Pool worker has completed rendered-worker-431eab0a737953d758eacf1c99c1b9f1 (waited 2m52.017149132s)
        mcd_test.go:327: Node ip-10-0-131-46.ec2.internal has expected kargs
        mcd_test.go:327: Node ip-10-0-143-244.ec2.internal has expected kargs
        mcd_test.go:327: Node ip-10-0-147-161.ec2.internal has expected kargs

But it failed units...looking at that.

Will look at the other review comments here too.

But we need to close the debate pretty soon - are we all OK with extending MachineConfig in this fashion, or do we want to try creating a new operatingsystem.openshift.io CRD or something?

@runcom

This comment has been minimized.

Copy link
Member

commented May 17, 2019

But we need to close the debate pretty soon - are we all OK with extending MachineConfig in this fashion, or do we want to try creating a new operatingsystem.openshift.io CRD or something?

This is tricky, we got so may inputs and ideas (at a point, we wanted also something on top of MachineConfig to be more sugar to allow for a nice upgrade to ignv2s3). The CRD to allow us oc edit kargs was from the beginning an amazing view and idea (oc edit fips anyone?).

My preference would go to a new CRD though (operatingsystem sounds perfect and extensible/reusable) and it follows what we did with KubeletConfig and ContainerRuntimeConfig (both of them eventually create MachineConfigs so we could also use the same pattern here somehow?). Also, we could still have a nice CRD on top of the MachineConfig and leave the implementation in this PR to land as is now.

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 18, 2019

My preference would go to a new CRD though (operatingsystem sounds perfect and extensible/reusable) and it follows what we did with KubeletConfig and ContainerRuntimeConfig (both of them eventually create MachineConfigs so we could also use the same pattern here somehow?).

Yeah...what has me hesitating about that is all the copy-paste involved in the "sub-controllers". But eh, we can figure that out.

Also, we could still have a nice CRD on top of the MachineConfig and leave the implementation in this PR to land as is now.

That's an interesting point. I had thought that for this we'd "bypass" MachineConfig so we wouldn't have the options twice, but that would involve a bigger rearchitecting.

Maybe there aren't that many special things that aren't basically files (i.e. Ignition), so we won't have to duplicate too much...I am actually having a bit of trouble thinking of anything besides kargs and osimageurl that are special.

@runcom

This comment has been minimized.

Copy link
Member

commented May 18, 2019

I am actually having a bit of trouble thinking of anything besides kargs and osimageurl that are special.

sebooleans? we had a doc at some point 🤔 but maybe sebooleans (timeservers and the like) aren't really operating system

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from f0c6254 to 57fea46 May 18, 2019

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 18, 2019

sebooleans?

Oh yeah that's a good one.

As far as other things like timeservers - this topic is kind of more about things that can't obviously "compile down" to just writing files or changing kernel args. Timeservers should basically just be writing a file. I think FIPS is potentially a good example as you need to run this update-crypto-policy command. But on the other hand...we can write a systemd unit that does it based on a file (or perhaps better, ship the unit with RHCOS by default).

@runcom

This comment has been minimized.

Copy link
Member

commented May 18, 2019

I think FIPS is potentially a good example as you need to run this update-crypto-policy command. But on the other hand...we can write a systemd unit that does it based on a file (or perhaps better, ship the unit with RHCOS by default).

oh yeah, FIPS is the follow up of this anyway (I'm ok on a pre-shipped unit as well if it works for RHCOS, but exercising it w/o a unit would be nice also).

Ok, let's sum this up, I believe we can overload MachineConfig with Kargs|FIPS|Sebooleans and abstract those out with CRDs in follow ups if we really want to (even before next release for kargs|fips).
The MC object is still the way to let MCD know what it has to do to when reconciling to a given config. Am I missing something? 🤔 even if we want sugar on top (or on the side), the MC is still the last unit in the chain that the MCD understands.

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 20, 2019

/retest

@ashcrow

This comment has been minimized.

Copy link
Member

commented May 20, 2019

The container could not be located when the pod was terminated
May 20 14:21:18.352 I test="[Disruptive] Cluster upgrade should maintain a functioning cluster [Feature:ClusterUpgrade] [Suite:openshift] [Serial]" failed
@ashcrow

This comment has been minimized.

Copy link
Member

commented May 20, 2019

/test e2e-aws-upgrade

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 20, 2019

The upgrade failure seems unrelated, seeing that in a few other jobs in https://prow.svc.ci.openshift.org/?job=*e2e-aws-upgrade*&state=failure
/retest

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from 57fea46 to e9563b0 May 20, 2019

@cgwalters cgwalters changed the title WIP: Add KernelArguments to MachineConfig Add KernelArguments to MachineConfig May 20, 2019

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from e9563b0 to b5a804a May 20, 2019

Add KernelArguments to MachineConfig
We need to support things like adding `nosmt` to the kernel
commandline, for users that want to choose security over performance;
this is particularly relevant for multi-tenant clusters.
There are a lot of other "tuning" features that are expressed as
kernel cmdline args; another good exmaple is to have `node-tuning-operator`
to write something like `machineconfig/05-node-tuning-operator`.

Per discussion in the PR, we down the line probably want to add
an `operatingsystem` type or so that would be a high level wrapper
for this, but today everything else is expressed in `MachineConfig`,
so we need to extend this type.

There have also been some discussions about extending Ignition,
but that's not likely to land soon, and even if it did we still
need to handle "early pivot" pulling `machine-os-content`, and
scoping both OS updates and kargs into Ignition is too far out.

This is the first step which will support "day 2" configuration.
After this lands, we have further work to have the MCS provide
`MachineConfig` and not just Ignition, so that we can replace
the "early pivot" with MCD code.

Closes: #388

@cgwalters cgwalters force-pushed the cgwalters:kargs branch from b5a804a to d4ace1c May 20, 2019

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 20, 2019

OK, updated the commit message, and also updated the MachineConfig docs a bit. (I dropped the osImageURL stuff mostly from the user-facing docs since...they shouldn't/can't-really touch that)

I think this is probably good to go ?

@runcom

This comment has been minimized.

Copy link
Member

commented May 21, 2019

/retest

@runcom

This comment has been minimized.

Copy link
Member

commented May 21, 2019

Uhm, looks like upgrade is stuck from e2e?

@cgwalters

This comment has been minimized.

Copy link
Contributor Author

commented May 21, 2019

/retest

@runcom

This comment has been minimized.

Copy link
Member

commented May 27, 2019

Guess this is really good now, @ashcrow @miabbott this is a prereq for FIPS as well so we now have something to test with

/lgtm

@openshift-ci-robot

This comment has been minimized.

Copy link

commented May 27, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 470f219 into openshift:master May 27, 2019

7 checks passed

ci/prow/e2e-aws Job succeeded.
Details
ci/prow/e2e-aws-op Job succeeded.
Details
ci/prow/e2e-aws-upgrade Job succeeded.
Details
ci/prow/images Job succeeded.
Details
ci/prow/unit Job succeeded.
Details
ci/prow/verify Job succeeded.
Details
tide In merge pool.
Details

ashcrow added a commit to ashcrow/pivot that referenced this pull request May 28, 2019

Remove basic kernel tuning functionality from pivot
This functionality has been replaced within the MCO.

See: openshift/machine-config-operator#762

This reverts commit 9383200.

@cgwalters cgwalters referenced this pull request May 28, 2019

Open

kargs phase 2 #798

return nil
}
if dn.OperatingSystem != machineConfigDaemonOSRHCOS {
return fmt.Errorf("Updating kargs on non-RHCOS nodes is not supported: %v", diff)

This comment has been minimized.

Copy link
@runcom

runcom Jun 5, 2019

Member

now that I'm looking at this, should we just return nicely as we do for updateOS 🤔

This comment has been minimized.

Copy link
@runcom
// kargs
if err := dn.updateKernelArguments(oldConfig, newConfig); err != nil {
return err
}

This comment has been minimized.

Copy link
@runcom

runcom Jun 5, 2019

Member

and I've now found that we need to roll this back if we fail 😄

This comment has been minimized.

Copy link
@runcom
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.