New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MCO-588: Update ignition spec to 3.4, disallow ignition KernelArguments for now #3814
MCO-588: Update ignition spec to 3.4, disallow ignition KernelArguments for now #3814
Conversation
@jkyros: This pull request references MCO-588 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Skipping CI for Draft Pull Request. |
/test e2e-gcp-op |
1 similar comment
/test e2e-gcp-op |
/test e2e-gcp-op |
/test e2e-aws-ovn |
/test e2e-gcp-op |
/retest-required |
34adbfd
to
016227a
Compare
/retest-required |
/test e2e-hypershift |
1 similar comment
/test e2e-hypershift |
I'm going to try one more time but there is a nonzero % chance this is actually breaking hypershift since they have that ignition server. I looked through all the logs though and it looks like it's being served up right to the bootstrap server? This PR has never passed though, and others have passed since, so it looks very suspicious, I just don't understand yet how it's happening.
/test e2e-hypershift |
Yep, I think I did break it.
If
Then:
...and...hypershift explicitly asks for 3.2 🤦 So...we don't just need the 4.13 karg "downgrade hacks" for going back to a previous version, we need them for anytime someone asks us for a < 3.3. Ugh. |
/test e2e-aws-ovn |
"hypeshift". ha. helps if I type it right. |
Note a consequence of this is that there will be an extra reboot when kargs are applied, because now:
On one hand, it's good that kargs will be applied even before the OS update. But before now I think most users will only want one reboot |
@jkyros: This pull request references MCO-588 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
because I don't want to break hypershift... 😄 |
@jkyros: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/20ecb370-353b-11ee-8ce4-6cda0d6b9fe0-0 |
that payload job passed, so it seems at least likely this didn't immediately break hypershift |
Rendered MCs are using igniton version 3.4
./oc adm reboot-machine-config-pool mcp/worker mcp/master 95-oc-initiated-reboot-master 3.1.0 2m23s
Hypershift 4.14 could be installed and a 4.13 managed cluster could be deployed. All MCO hypershift test cases passed.
A yum based RHEL8 slave could be added without problems
The upgrade was successful
After the upgrade the old rendered MCs were still using ignition 3.2, and the new rendered MCs were using 3.4. This is the status of the MCs after an upgrade:
We see nothing weird, everything seems to be working as expected.
The worker pool becomes degraded reporting this error
When the offending MC was removed the worker pool stopped being degraded.
The worker pool is degraded, when the offending MC was removed the worker pool stopped being degraded.
Enter into a worker node and remove the iptables rule blocking the output traffic for ignition port
If we ask for lower unsupported versions, a bad request error is returned
Make sure that we can ask for all version of the ignition config to the MCS
We can see that if we ask for a 3.0.0 ignition configuration, the answer is a "400 Bad Request" error. MCO supports 3.0.0 ignition version, so we expected the request for a 3.0.0 configuration to work. It can be reproduced in a 4.13 cluster, so this problem (if this is a problem) is not related to our PR.
If we ask for versions higher that 3.4.0 but lower than 4.0.0 we get always the 3.4.0 version
Note that the kernel arguments are not in the 3.4.0/3.3.0 versions of the ignition configs
A new MC is rendered including the tang config but MCO does nothing. Nodes are not rebooted nor drained when the new rendered config is applied.
With 2.2.0 version it worked fine as well.
Run "TC-63477-Deploy files using all available ignition configs." .All configurations were applied properly.
Both 2.2.0 and 3.4.0 version behaved as expected. New rendered 3.2.0 machineconfigs were created and the units worked as expected
The generated MCs use 3.4 ignition version, as expected:
If we create the KubeletConfig and ContainerRuntimeConfig resources in a 4.13 cluster and we upgrade the cluster to our image, the generated MCs' ignition versions are updated from 3.2 to 3.4.
A full regression was executed (including hypershift and scale test cases) and no problem was found. Summary
We can add the qe-approved label /label qe-approved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, minor comments
oldIgnCfg.Storage.Raid = []ign3types.Raid{ | ||
{ | ||
Name: "data", | ||
Level: "stripe", | ||
Level: &stripe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine but just interested why the change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ignition spec changed -- not my choice. 😄
Sometimes ignition changes what is/isn't a pointer between versions. They try not to, but it happens. The ignition parsing (up) and the ignition converter (down) usually handle this for us (although sometimes it still doesn't line up exactly right, e.g.: coreos/ign-converter#47)
@@ -169,7 +176,7 @@ func (sh *APIHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
serveConf = &converted31 | |||
} else { | |||
// Can only be 2.2 here | |||
converted2, err := ctrlcommon.ConvertRawExtIgnitionToV2(conf) | |||
converted2, err := ctrlcommon.ConvertRawExtIgnitionToV2_2(conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, are we changing the min version as well? or has it always been 2.2 and we are just using better naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just me changing the name to match the others while I was in there since the other ones for V3 all have it specified. I was thinking of the poor soul that might have to do the next ignition bump.
It's still the same V2 version (2.2) it ever was (and hopefully the only version that ever will be)
if version != nil && version.LessThan(*semver.New("3.3.0")) { | ||
// we're converting to a version that doesn't support kargs, so we need to stuff them in machineconfig | ||
if len(conf.KernelArguments.ShouldNotExist) > 0 { | ||
return fmt.Errorf("Can't serve version %s with ignition KernelArguments.ShouldNotExist populated", version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version converts to string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, satisfies stringer interface: https://github.com/coreos/go-semver/blob/6328231d806b2e2199c42ecc055a16ddcc2c6121/semver/semver.go#L106
I think we're fine for now (we can safely upconvert), but to Sergio's point about the SSH key ignition, the installer is pegged to 3.2 right now and generates 3.2 ignition with (but I do know there are folks that are using bleeding edge installers against old MCO's so I wouldn't be in a particular hurry to bump that and break them if we don't need any of the new features) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm, so as I understand this, the MCO in 4.14 will allow you to provide an install time ignition-karg via machineconfig, but not allow existing clusters to retroactively add it. (much like, say, disk partitioning)
I'm curious when MigrateKernelArgsIfNecessary would be used then, since 4.14 rhcos should request 3.4 (I think?) and so the only time you would do this is to install 4.14 and then go find some 4.11 ami or something to use? It's fine to have regardless, just wondering when this could happen.
Also one comment inline
@@ -133,7 +133,7 @@ func (sh *APIHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
w.WriteHeader(http.StatusNotFound) | |||
return | |||
} | |||
// we know we're at 3.2 in code.. serve directly, parsing is expensive... | |||
// we know we're at 3.4 in code.. serve directly, parsing is expensive... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This comment seems to imply that the if reqConfigVer.Equal(*semver.New("3.4.0"))
condition should now just return servedConf. I assume the conversion is really a no-op regardless, but just for cleanup, maybe we can just delete this line (or remove the conversion code if you want)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. It's a no-op, but the right thing to do is take out the conversion, and that is what I have done 😄
Correct.
So I originally added that because I was thinking specifically "if we migrate the kargs from MachineConfig into ignition, we're going to break something like hypershift that explicitly requests 3.2.0", and I didn't want to do that if I could help it. We didn't end up migrating MachineConfig to ignition (boo double-reboot), but I left that function in because I figured if someday anything ever populates ignition kargs (say in a template or something), that instantly breaks anything requesting < 3.3 and that's probably still bad. The converter would choke on it because unsupported fields are populated. and since we're carrying all this "downconvert and serve any version we've ever supported" translation logic around still, I assumed we were trying to keep best-effort compatibility with....old things? If possible? I mean, we haven't fixed bootimages yet....so if I have a cluster with old bootimages that scales up a node against the 4.14 MCS that's serving a config with ignition kargs populated, it's going to request < 3.3, get an error, and fail to complete unless I migrate the args out isn't it? Or am I missing a thing? (This is the part where we find out if John is actually a dunce 😛 ) |
This updates the default "internal" MCO version to 3.4.0 and it will by default try to convert/translate everything it gets to that version. The convention I'm following here is that int3types becomes the base ignition3 version that the MCO is using, and anything called out explicitly like ign3_4types is being used explicitly as that version not as the default version. So (hopefully) the next person that gets to do a migration should only have to go replace ign3types, add the downtranslator, and see what broke.
0477d05
to
126c164
Compare
Rebased and removed the noop per Jerry's comment #3814 (comment). No other changes, functionality remains unchanged. |
Makes sense to me! I'm 100% for having the backward compatibility, was more so of a curiosity question where I was trying to think of a scenario where it would be used as is today. Definitely will be used in the future such as:
This can't happen (without user doing something wonky) directly at the point of this PR I think? But I agree this is good to have since once we do enable you to specify ignition kargs on existing clusters, this will be needed |
/lgtm New changes lgtm, I think Sergio's validation is still accurate, so no need to re-verify |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cdoern, jkyros, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@jkyros: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Stabilise openshift 4.14.0 spec on fcos 1.5 and ignition 3.4 stable specs. The MCO is now capable of understanding Ignition 3.4.0 specs and uses that by default. See: openshift/machine-config-operator#3576 See: openshift/machine-config-operator#3814
ZTP must produce MachineConfig resources with ignition version v3.2.0 Since openshift/machine-config-operator#3814, MCO writes v3.4.0 ignition configuration. openshift/machine-config-operator@63d7be1 is the commit just before the culprit one. Also, machine-config-operator@63d7be1ef18b86826b47c61172c7a9dc7c2b6de1 has a transitive dependency to `github.com/Microsoft/hcsshim@v0.8.7` that cause a `go mod tidy` error: ``` github.com/Microsoft/hcsshim@v0.8.7 requires k8s.io/kubernetes@v1.13.0 requires k8s.io/endpointslice@v0.0.0: reading k8s.io/endpointslice/go.mod at revision v0.0.0: unknown revision v0.0.0 ``` which is fixed in v0.8.8 by microsoft/hcsshim#783 Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
ZTP must produce MachineConfig resources with ignition version v3.2.0 Since openshift/machine-config-operator#3814, MCO writes v3.4.0 ignition configuration. openshift/machine-config-operator@63d7be1 is the commit just before the culprit one. Also, machine-config-operator@63d7be1ef18b86826b47c61172c7a9dc7c2b6de1 has a transitive dependency to `github.com/Microsoft/hcsshim@v0.8.7` that cause a `go mod tidy` error: ``` github.com/Microsoft/hcsshim@v0.8.7 requires k8s.io/kubernetes@v1.13.0 requires k8s.io/endpointslice@v0.0.0: reading k8s.io/endpointslice/go.mod at revision v0.0.0: unknown revision v0.0.0 ``` which is fixed in v0.8.8 by microsoft/hcsshim#783 Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
ZTP must produce MachineConfig resources with ignition version v3.2.0 Since openshift/machine-config-operator#3814, MCO writes v3.4.0 ignition configuration. openshift/machine-config-operator@63d7be1 is the commit just before the culprit one. Also, machine-config-operator@63d7be1ef18b86826b47c61172c7a9dc7c2b6de1 has a transitive dependency to `github.com/Microsoft/hcsshim@v0.8.7` that cause a `go mod tidy` error: ``` github.com/Microsoft/hcsshim@v0.8.7 requires k8s.io/kubernetes@v1.13.0 requires k8s.io/endpointslice@v0.0.0: reading k8s.io/endpointslice/go.mod at revision v0.0.0: unknown revision v0.0.0 ``` which is fixed in v0.8.8 by microsoft/hcsshim#783 Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
- What I did
Ignition bump:
- How to verify it
- Description for the changelog
Closes: MCO-588
Closes: coreos/butane#312