
MCO-588: Update ignition spec to 3.4, disallow ignition KernelArguments for now #3814

Merged
merged 4 commits from the ignition-34-bump branch into openshift:master on Aug 12, 2023

Conversation

@jkyros (Contributor) commented Jul 21, 2023

- What I did
Ignition bump:

  • Bumped the default/internal MCO ignition spec to 3.4
  • Adjusted translation up/down functions and MCS API to match
  • Took our "downgrade hacks" for 4.13 kargs out, including the downgrade kargs tests
  • Marked ignition KernelArguments irreconcilable for now

- How to verify it

- Description for the changelog

Closes: MCO-588
Closes: coreos/butane#312

@openshift-ci-robot (Contributor) commented Jul 21, 2023

@jkyros: This pull request references MCO-588 which is a valid jira issue.

In response to this:

- What I did
Ignition bump:

  • Bumped the default/internal MCO ignition spec to 3.4
  • Adjusted translation up/down functions and MCS API to match
  • Took our "downgrade hacks" for 4.13 kargs out, including the downgrade kargs tests

New kernel args handling:

  • Rendered config generation adds MachineConfig kernel args to the Ignition ShouldExist list and omits them from the MachineConfig portion of the rendered config (rendered config now only contains ignition kernel args)
  • Daemon only applies ignition kernel args, ignores MachineConfig kernel args

- How to verify it

- Description for the changelog

Closes: MCO-588

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 21, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2023
@openshift-ci bot (Contributor) commented Jul 21, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2023
@jkyros (Contributor, Author) commented Jul 21, 2023

/test e2e-gcp-op
/test e2e-aws-ovn

1 similar comment
@jkyros (Contributor, Author) commented Jul 21, 2023

/test e2e-gcp-op
/test e2e-aws-ovn

@jkyros (Contributor, Author) commented Jul 21, 2023

/test e2e-gcp-op

@jkyros (Contributor, Author) commented Jul 21, 2023

/test e2e-aws-ovn

@jkyros (Contributor, Author) commented Jul 21, 2023

/test e2e-gcp-op

@jkyros jkyros marked this pull request as ready for review July 21, 2023 19:55
@jkyros (Contributor, Author) commented Jul 21, 2023

/retest-required

@jkyros jkyros force-pushed the ignition-34-bump branch 2 times, most recently from 34adbfd to 016227a on July 22, 2023 07:23
@jkyros (Contributor, Author) commented Jul 22, 2023

/retest-required

@jkyros (Contributor, Author) commented Jul 23, 2023

/test e2e-hypershift

1 similar comment
@jkyros (Contributor, Author) commented Jul 23, 2023

/test e2e-hypershift

@jkyros (Contributor, Author) commented Jul 24, 2023

I'm going to try one more time, but there is a nonzero chance this is actually breaking hypershift, since they have that ignition server.

I looked through all the logs, though, and it looks like it's being served up right to the bootstrap server? This PR has never passed, and others have passed since, so it looks very suspicious; I just don't understand yet how it's happening.

[  113.378448] ignition[723]: GET https://ignition-server.apps.example-4kbsr.hypershift.local/ignition: attempt #26
[  113.386253] ignition[723]: GET error: Get "https://ignition-server.apps.example-4kbsr.hypershift.local/ignition": dial tcp: lookup ignition-server.apps.example-4kbsr.hypershift.local on 10.0.0.2:53: no such host
[***   ] A start job is running for Ignition (fetch) (1min 50s / no limit)
[***   ] A start job is running for Ignition (fetch) (1min 54s / no limit)

/test e2e-hypershift

@jkyros (Contributor, Author) commented Jul 24, 2023

Yep, I think I did break it.

E0724 22:11:48.265695       1 api.go:163] couldn't convert config for req: {master 0xc0006684c0}, error: unable to convert Ignition spec v3_2 config to v3_1: KernelArguments is not supported on 3.2
I0724 22:11:59.650797       1 api.go:109] Pool master requested by address:"10.0.156.222:37451" User-Agent:"curl/7.76.1" Accept-Header: "*/*"
E0724 22:11:59.701123       1 api.go:183] couldn't convert config for req: {master 0xc000a4e1c0}, error: failed to convert config from spec v3.2 to v2.2: unable to convert Ignition spec v3 config to v2: unable to convert Ignition spec v3_2 config to v3_1: KernelArguments is not supported on 3.2

If

  • ignition kargs are populated in a 3.4 config (which they would be, because I moved them there from MachineConfig), and
  • something asks us for anything < 3.3,

then:

  • we fail to downtranslate, because kargs aren't supported below 3.3.

...and...hypershift explicitly asks for 3.2 🤦

So...we don't just need the 4.13 karg "downgrade hacks" for going back to a previous version, we need them anytime someone asks us for anything < 3.3. Ugh.
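For the record, the shape such a migration hack takes is roughly this (a hypothetical sketch; the helper name and import paths are assumptions, not the actual MCO code):

```go
package example

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
	ign3types "github.com/coreos/ignition/v2/config/v3_4/types"
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// migrateKernelArgs moves ignition shouldExist kargs into the MachineConfig
// karg list (which every spec can carry) before downtranslating to < 3.3,
// so the converter doesn't choke on a field the older specs don't have.
func migrateKernelArgs(mc *mcfgv1.MachineConfig, ignCfg *ign3types.Config, target semver.Version) error {
	if !target.LessThan(*semver.New("3.3.0")) {
		return nil // target spec supports kernelArguments natively
	}
	// shouldNotExist has no MachineConfig equivalent, so it can't be migrated
	if len(ignCfg.KernelArguments.ShouldNotExist) > 0 {
		return fmt.Errorf("can't serve version %s with KernelArguments.ShouldNotExist populated", target)
	}
	for _, arg := range ignCfg.KernelArguments.ShouldExist {
		mc.Spec.KernelArguments = append(mc.Spec.KernelArguments, string(arg))
	}
	ignCfg.KernelArguments = ign3types.KernelArguments{} // clear so downconversion succeeds
	return nil
}
```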

@jkyros (Contributor, Author) commented Jul 25, 2023

/test e2e-aws-ovn
/test e2e-hypeshift

@jkyros
Copy link
Contributor Author

jkyros commented Jul 25, 2023

"hypeshift". ha. helps if I type it right.
/test e2e-hypershift

@cgwalters (Member) commented:

merge MachineConfig kernel args into ignition

Note a consequence of this is that there will be an extra reboot when kargs are applied, because now:

  • Ignition will see kargs, apply them and reboot
  • We enter firstboot, and do the firstboot OS update and reboot

On one hand, it's good that kargs will be applied even before the OS update. But I think most users will only want one reboot.

@openshift-ci-robot (Contributor) commented Jul 25, 2023

@jkyros: This pull request references MCO-588 which is a valid jira issue.

In response to this:

- What I did
Ignition bump:

  • Bumped the default/internal MCO ignition spec to 3.4
  • Adjusted translation up/down functions and MCS API to match
  • Took our "downgrade hacks" for 4.13 kargs out, including the downgrade kargs tests

New kernel args handling:

  • Rendered config generation adds MachineConfig kernel args to the Ignition ShouldExist list and omits them from the MachineConfig portion of the rendered config (rendered config now only contains ignition kernel args)
  • Daemon only applies ignition kernel args, ignores MachineConfig kernel args

- How to verify it

- Description for the changelog

Closes: MCO-588
Closes: coreos/butane#312

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2023
@jkyros (Contributor, Author) commented Aug 7, 2023

because I don't want to break hypershift... 😄
/payload-job periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance

@openshift-ci bot (Contributor) commented Aug 7, 2023

@jkyros: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/20ecb370-353b-11ee-8ce4-6cda0d6b9fe0-0

@jkyros (Contributor, Author) commented Aug 7, 2023

That payload job passed, so it seems at least likely this didn't immediately break hypershift.

@sergiordlr commented:

  • Verify that the default version is 3.4

Rendered MCs are using ignition version 3.4.
The only MCs using version 3.2 are the SSH config ones (maybe they should be changed to 3.4 as well?)

  • Reboot counter is working without problems using a 3.1 MC compatible with OCP 4.6

./oc adm reboot-machine-config-pool mcp/worker mcp/master
./oc adm wait-for-node-reboot nodes --all

95-oc-initiated-reboot-master 3.1.0 2m23s
95-oc-initiated-reboot-worker 3.1.0 2m23s

  • Hypershift

Hypershift 4.14 could be installed and a 4.13 managed cluster could be deployed. All MCO hypershift test cases passed.

  • Add a yum based RHEL8 worker

A yum-based RHEL8 worker could be added without problems

  • Upgrade from 4.13

The upgrade was successful

    history:
    - acceptedRisks: |-
        Target release version="" image="registry.build05.ci.openshift.org/ci-ln-mijdvyb/release:latest" cannot be verified, but continuing anyway because the update was forced: release images that are not accessed via digest cannot be verified
        Forced through blocking failures: Multiple precondition checks failed:
        * Precondition "ClusterVersionUpgradeable" failed because of "AdminAckRequired": Kubernetes 1.27 and therefore OpenShift 4.14 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6958395 for details and instructions.
        * Precondition "EtcdRecentBackup" failed because of "ControllerStarted": RecentBackup: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
        * Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate": RetrievedUpdates=False (VersionNotFound), so the recommended status of updating from 4.13.0-0.nightly-2023-08-07-165810 to 4.14.0-0.ci.test-2023-08-08-065712-ci-ln-mijdvyb-latest is unknown.
      completionTime: "2023-08-08T09:08:28Z"
      image: registry.build05.ci.openshift.org/ci-ln-mijdvyb/release:latest
      startedTime: "2023-08-08T08:08:50Z"
      state: Completed
      verified: false
      version: 4.14.0-0.ci.test-2023-08-08-065712-ci-ln-mijdvyb-latest
    - completionTime: "2023-08-08T07:59:00Z"
      image: registry.ci.openshift.org/ocp/release@sha256:436f2f02eb1aec3070a0f55080dfcf2fbee21d27a315ef5b8cc055a73a83890f
      startedTime: "2023-08-08T07:32:16Z"
      state: Completed
      verified: false
      version: 4.13.0-0.nightly-2023-08-07-165810

After the upgrade the old rendered MCs were still using ignition 3.2, and the new rendered MCs were using 3.4. This is the status of the MCs after an upgrade:

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
00-worker                                          b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
01-master-container-runtime                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
01-master-kubelet                                  b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
01-worker-container-runtime                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
01-worker-kubelet                                  b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
97-master-generated-kubelet                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
97-worker-generated-kubelet                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
98-master-generated-kubelet                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
98-worker-generated-kubelet                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
99-master-generated-registries                     b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
99-master-ssh                                                                                 3.2.0             4h16m
99-worker-generated-registries                     b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             4h12m
99-worker-ssh                                                                                 3.2.0             4h16m
master-chrony-configuration                                                                   3.1.0             4h16m
rendered-master-49d052fa5cc1cfe10e353a9954791aca   b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
rendered-master-76b088a9913b050689dfd21fcb8e8542   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             4h1m
rendered-master-b428c67a123d1ebaa838c1ca0d8dd029   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             4h12m
rendered-master-e5c973d9dcbc34042a253c1828501dd6   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             3h44m
rendered-worker-394f18b4915ae35349d0405b8db85ad0   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             4h1m
rendered-worker-983aace49fe8da62a60a2b2e0a10344e   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             3h44m
rendered-worker-a7df229a952f713cb8b8290a84e5f291   88fcb47c150db0de0bb1507919b55d0e81233139   3.2.0             4h12m
rendered-worker-fc5a764cc80148a1bd3a968c289eb75d   b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             178m
worker-chrony-configuration                                                                   3.1.0             4h16m

We see nothing weird; everything seems to be working as expected.

  • Create a MC with shouldExist and shouldNotExist kernel arguments:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-03-22T12:05:51Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-kernel-arguments-not-exsist
  resourceVersion: "62785"
  uid: 3d10188c-8b4e-4591-80ee-0b5129eb8d42
spec:
  config:
    ignition:
      version: 3.4.0
    kernelArguments:
      shouldExist:
      - enforcing=0
      shouldNotExist:
      - z=0

The worker pool becomes degraded, reporting this error:

  - lastTransitionTime: "2023-08-08T08:19:30Z"
    message: 'Node ip-10-0-1-35.us-east-2.compute.internal is reporting: "can''t reconcile
      config rendered-worker-fb92b811335f14e575317c589325b283 with rendered-worker-473cebad821410d6edf2e122eb30d934:
      ignition kargs section contains changes"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

When the offending MC was removed, the worker pool stopped being degraded.

  • Create a MC using ignition version 3.4 with kernel arguments:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-kernel-arguments-34
spec:
  config:
    ignition:
      version: 3.4.0
    kernelArguments:
      shouldExist:
      - enforcing=0

The worker pool becomes degraded; when the offending MC was removed, the worker pool stopped being degraded.

  • Check that there is no problem when we ask MCS for different ignition versions:

Enter a worker node and remove the iptables rule blocking outbound traffic on the ignition port:

$ oc debug ...
# iptables -D OPENSHIFT-BLOCK-OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable

If we ask for lower, unsupported versions, a bad request error is returned:

sh-5.1#  curl -IH "Accept: application/vnd.coreos.ignition+json; version=2.1.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker
HTTP/1.1 400 Bad Request

Make sure that we can ask the MCS for all versions of the ignition config:

sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=2.2.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"2.2.0"
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.0.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.1.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"3.1.0"
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.2.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"3.2.0"
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.3.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"3.3.0"
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.4.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"3.4.0"
sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.5.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq .ignition.version
"3.4.0"

We can see that if we ask for a 3.0.0 ignition configuration, the answer is a "400 Bad Request" error. The MCO supports the 3.0.0 ignition version, so we expected the request for a 3.0.0 configuration to work. It can be reproduced in a 4.13 cluster, so this problem (if it is a problem) is not related to our PR.

sh-5.1# curl -IsH "Accept: application/vnd.coreos.ignition+json; version=3.0.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker 
HTTP/1.1 400 Bad Request

If we ask for versions higher than 3.4.0 but lower than 4.0.0, we always get the 3.4.0 version (see the 3.5.0 request above), while requests for 4.x versions return a bad request:

sh-5.1# curl -IsH "Accept: application/vnd.coreos.ignition+json; version=4.1.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker 
HTTP/1.1 400 Bad Request

Note that the kernel arguments are not in the 3.4.0/3.3.0 versions of the ignition configs (the grep below returns no output):

sh-5.1# curl -sH "Accept: application/vnd.coreos.ignition+json; version=3.4.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker | jq | grep -i kernela

  • Create a MC using an unsupported luks.tang configuration:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name:  99-cant-convert-34
spec:
  config:
    ignition:
      version: 3.4.0
    storage:
      luks:
      - name: luks-tang
        device: "/dev/sdb"
        clevis:
          tang:
          - url: https://tang.example.com
            thumbprint: REPLACE-THIS-WITH-YOUR-TANG-THUMBPRINT
            advertisement: '{"payload": "...", "protected": "...", "signature": "..."}'

A new MC is rendered including the tang config, but the MCO does nothing: nodes are not rebooted nor drained when the new rendered config is applied.

  • Create a MC using kernel arguments with ignition versions 3.2.0 and 2.2.0:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: test-kernel-arguments-32
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
  - loglevel=7

oc debug node/ip-10-0-1-35.us-east-2.compute.internal -- chroot /host cat /proc/cmdline
Starting pod/ip-10-0-1-35us-east-2computeinternal-debug-gpz94 ...
To use host binaries, run `chroot /host`
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-7da7379e14a5b6b5929f8e7ccf97aad4dbb9739bff9a30fa8ce2eff22524a847/vmlinuz-5.14.0-284.25.1.el9_2.x86_64 ostree=/ostree/boot.0/rhcos/7da7379e14a5b6b5929f8e7ccf97aad4dbb9739bff9a30fa8ce2eff22524a847/0 ignition.platform.id=aws console=tty0 console=ttyS0,115200n8 root=UUID=7296fe25-59fc-486f-a5f2-64062510e720 rw rootflags=prjquota boot=UUID=2fb3cf2e-70e3-49ac-8e35-c616311d01ff systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1 loglevel=7

With the 2.2.0 version it worked fine as well.

  • Create a random test file in /etc using MCs with ignition versions 2.2.0, 3.3.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0

Run "TC-63477-Deploy files using all available ignition configs." .All configurations were applied properly.

  • Create a random unit using MCs with ignition versions 2.2.0 and 3.4.0:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mc-unit-22
spec:
  config:
    ignition:
      version: 2.2.0
    systemd:
      units:
      - name: my22unit.service
        contents: |
          [Unit]
          Description=A simple oneshot service
          
          [Service]
          Type=oneshot
          ExecStart=/bin/bash -c "echo Hello world"
          RemainAfterExit=yes

          [Install]
          WantedBy=multi-user.target

Both the 2.2.0 and 3.4.0 versions behaved as expected: new rendered 3.4.0 machineconfigs were created and the units worked as expected.

sh-5.1# systemctl status my22unit.service
● my22unit.service - A simple oneshot service

sh-5.1# systemctl status my34unit.service
● my34unit.service - A simple oneshot service

  • Create KubeletConfig and ContainerRuntimeConfig resources

The generated MCs use 3.4 ignition version, as expected:

99-worker-generated-containerruntime               b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             9s
99-worker-generated-kubelet                        b92f5d7829d64c1c3eb9db87e61e539b41d5c6af   3.4.0             5s

If we create the KubeletConfig and ContainerRuntimeConfig resources in a 4.13 cluster and we upgrade the cluster to our image, the generated MCs' ignition versions are updated from 3.2 to 3.4.

  • Full regression passed

A full regression was executed (including hypershift and scale test cases) and no problem was found.

Summary

  1. The MCs created at installation time to configure the SSH keys are using ignition version 3.2. After talking with @jkyros, it seems that this is not a problem.
  2. When we ask for ignition config v3.0.0, we get a Bad Request error. This can be reproduced in 4.13, so this is not a problem related to this PR.
sh-5.1# curl -IsH "Accept: application/vnd.coreos.ignition+json; version=3.0.0" -k https://api.sregidor-v343.qe.devcluster.openshift.com:22623/config/worker 
HTTP/1.1 400 Bad Request

We can add the qe-approved label

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Aug 9, 2023
@cdoern (Contributor) left a comment:

looks good, minor comments

 oldIgnCfg.Storage.Raid = []ign3types.Raid{
 	{
 		Name:  "data",
-		Level: "stripe",
+		Level: &stripe,
@cdoern (Contributor):

looks fine but just interested why the change here?

@jkyros (Contributor, Author):

The ignition spec changed -- not my choice. 😄

Sometimes ignition changes what is/isn't a pointer between versions. They try not to, but it happens. The ignition parsing (up) and the ignition converter (down) usually handle this for us (although sometimes it still doesn't line up exactly right, e.g.: coreos/ign-converter#47)
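For example, a minimal sketch against the upstream v3_4 types package (the helper and variable names are illustrative, not MCO code):

```go
package example

import ign3types "github.com/coreos/ignition/v2/config/v3_4/types"

// In spec 3.4, Raid.Level is a *string (it was a plain string in earlier
// specs), so fixtures now have to take the address of a variable.
func exampleRaid() ign3types.Raid {
	stripe := "stripe"
	return ign3types.Raid{
		Name:  "data",
		Level: &stripe,
	}
}
```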

@@ -169,7 +176,7 @@ func (sh *APIHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 		serveConf = &converted31
 	} else {
 		// Can only be 2.2 here
-		converted2, err := ctrlcommon.ConvertRawExtIgnitionToV2(conf)
+		converted2, err := ctrlcommon.ConvertRawExtIgnitionToV2_2(conf)
@cdoern (Contributor):

so, are we changing the min version as well? or has it always been 2.2 and we are just using better naming?

@jkyros (Contributor, Author):
The reason will be displayed to describe this comment to others. Learn more.

That was just me changing the name to match the others while I was in there since the other ones for V3 all have it specified. I was thinking of the poor soul that might have to do the next ignition bump.

It's still the same V2 version (2.2) it ever was (and hopefully the only version that ever will be)

if version != nil && version.LessThan(*semver.New("3.3.0")) {
	// we're converting to a version that doesn't support kargs, so we need to stuff them in machineconfig
	if len(conf.KernelArguments.ShouldNotExist) > 0 {
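		// note: version is a *semver.Version (github.com/coreos/go-semver); it
		// implements fmt.Stringer, so the %s verb formats it as e.g. "3.2.0"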
		return fmt.Errorf("Can't serve version %s with ignition KernelArguments.ShouldNotExist populated", version)
@cdoern (Contributor):

version converts to string?


@jkyros (Contributor, Author) commented Aug 10, 2023

I think we're fine for now (we can safely upconvert), but to Sergio's point about the SSH key ignition, the installer is pegged to 3.2 right now and generates 3.2 ignition with openshift-install create manifests, so we probably want to bump that installer ignition at some point https://github.com/openshift/installer/blob/0be5579f9c69d275db0600f41fac1686068a62c8/pkg/asset/agent/image/ignition.go#L13

(but I do know there are folks that are using bleeding edge installers against old MCO's so I wouldn't be in a particular hurry to bump that and break them if we don't need any of the new features)

@yuqi-zhang (Contributor) left a comment:

Generally lgtm, so as I understand this, the MCO in 4.14 will allow you to provide an install time ignition-karg via machineconfig, but not allow existing clusters to retroactively add it. (much like, say, disk partitioning)

I'm curious when MigrateKernelArgsIfNecessary would be used then, since 4.14 rhcos should request 3.4 (I think?) and so the only time you would do this is to install 4.14 and then go find some 4.11 ami or something to use? It's fine to have regardless, just wondering when this could happen.

Also one comment inline

@@ -133,7 +133,7 @@ func (sh *APIHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
 		w.WriteHeader(http.StatusNotFound)
 		return
 	}
-	// we know we're at 3.2 in code.. serve directly, parsing is expensive...
+	// we know we're at 3.4 in code.. serve directly, parsing is expensive...
@yuqi-zhang (Contributor):

nit: This comment seems to imply that the if reqConfigVer.Equal(*semver.New("3.4.0")) condition should now just return servedConf. I assume the conversion is really a no-op regardless, but just for cleanup, maybe we can just delete this line (or remove the conversion code if you want)

@jkyros (Contributor, Author):

Good catch. It's a no-op, but the right thing to do is take out the conversion, and that is what I have done 😄
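(For reference, the surviving fast path distills to roughly this; a self-contained sketch where the names come from the quoted diff and the function shape is assumed, not the actual handler:)

```go
package example

import (
	"github.com/coreos/go-semver/semver"
	"k8s.io/apimachinery/pkg/runtime"
)

// serveForVersion distills the fast path: when the requested version already
// matches the internal 3.4 spec, serve the stored raw config directly rather
// than round-tripping it through the converter.
func serveForVersion(reqConfigVer semver.Version, conf *runtime.RawExtension) (*runtime.RawExtension, bool) {
	if reqConfigVer.Equal(*semver.New("3.4.0")) {
		return conf, true // already at the internal spec; no conversion needed
	}
	return nil, false // caller falls through to the downtranslation paths
}
```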

@jkyros (Contributor, Author) commented Aug 11, 2023

Generally lgtm, so as I understand this, the MCO in 4.14 will allow you to provide an install time ignition-karg via machineconfig, but not allow existing clusters to retroactively add it. (much like, say, disk partitioning)

Correct.

I'm curious when MigrateKernelArgsIfNecessary would be used then, since 4.14 rhcos should request 3.4 (I think?) and so the only time you would do this is to install 4.14 and then go find some 4.11 ami or something to use? It's fine to have regardless, just wondering when this could happen.

So I originally added that because I was thinking specifically "if we migrate the kargs from MachineConfig into ignition, we're going to break something like hypershift that explicitly requests 3.2.0", and I didn't want to do that if I could help it.

We didn't end up migrating MachineConfig to ignition (boo, double-reboot), but I left that function in because I figured if someday anything ever populates ignition kargs (say, in a template or something), that instantly breaks anything requesting < 3.3, and that's probably still bad. The converter would choke on it because unsupported fields are populated. And since we're carrying all this "downconvert and serve any version we've ever supported" translation logic around still, I assumed we were trying to keep best-effort compatibility with...old things? If possible?

I mean, we haven't fixed bootimages yet....so if I have a cluster with old bootimages that scales up a node against the 4.14 MCS that's serving a config with ignition kargs populated, it's going to request < 3.3, get an error, and fail to complete unless I migrate the args out isn't it? Or am I missing a thing?

(This is the part where we find out if John is actually a dunce 😛 )

This updates the default "internal" MCO version to 3.4.0 and it will by
default try to convert/translate everything it gets to that version.

The convention I'm following here is that ign3types becomes the base
ignition3 version that the MCO is using, and anything called out
explicitly like ign3_4types is being used explicitly as that version
not as the default version. So (hopefully) the next person that gets
to do a migration should only have to go replace ign3types, add the
downtranslator, and see what broke.
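Concretely, the aliasing convention looks something like this (a sketch: the import path follows the upstream ignition repo layout, and the helper shape is assumed rather than copied from the MCO):

```go
package example

import (
	// ign3types is always the MCO's current default spec; the next ignition
	// bump should only need to repoint this import and add a downtranslator.
	ign3types "github.com/coreos/ignition/v2/config/v3_4/types"
)

// NewIgnConfig returns an empty Ignition config at the internal spec version.
func NewIgnConfig() ign3types.Config {
	return ign3types.Config{
		Ignition: ign3types.Ignition{Version: ign3types.MaxVersion.String()},
	}
}
```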
@jkyros (Contributor, Author) commented Aug 11, 2023

Rebased and removed the no-op per Jerry's comment #3814 (comment). No other changes; functionality remains unchanged.

@yuqi-zhang (Contributor) commented:

So I originally added that because I was thinking specifically "if we migrate the kargs from MachineConfig into ignition, we're going to break something like hypershift that explicitly requests 3.2.0", and I didn't want to do that if I could help it.

We didn't end up migrating MachineConfig to ignition (boo, double-reboot), but I left that function in because I figured if someday anything ever populates ignition kargs (say, in a template or something), that instantly breaks anything requesting < 3.3, and that's probably still bad. The converter would choke on it because unsupported fields are populated. And since we're carrying all this "downconvert and serve any version we've ever supported" translation logic around still, I assumed we were trying to keep best-effort compatibility with...old things? If possible?

Makes sense to me! I'm 100% for having the backward compatibility; it was more of a curiosity question where I was trying to think of a scenario where it would be used as-is today. It definitely will be used in the future, such as:

I mean, we haven't fixed bootimages yet....so if I have a cluster with old bootimages that scales up a node against the 4.14 MCS that's serving a config with ignition kargs populated, it's going to request < 3.3, get an error, and fail to complete unless I migrate the args out isn't it? Or am I missing a thing?

This can't happen (without the user doing something wonky) directly at the point of this PR, I think? But I agree this is good to have, since once we do enable specifying ignition kargs on existing clusters, this will be needed.

@yuqi-zhang (Contributor) commented:

/lgtm

New changes lgtm, I think Sergio's validation is still accurate, so no need to re-verify

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2023
@openshift-ci bot (Contributor) commented Aug 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, jkyros, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cdoern,jkyros,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Contributor) commented:

/retest-required

Remaining retests: 0 against base HEAD 63d7be1 and 2 for PR HEAD 126c164 in total

@openshift-ci bot (Contributor) commented Aug 12, 2023

@jkyros: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 2513389 into openshift:master Aug 12, 2023
13 checks passed
travier added a commit to travier/butane that referenced this pull request Sep 21, 2023
Stabilise openshift 4.14.0 spec on fcos 1.5 and ignition 3.4 stable
specs. The MCO is now capable of understanding Ignition 3.4.0 specs and
uses that by default.

See: openshift/machine-config-operator#3576
See: openshift/machine-config-operator#3814
@travier travier mentioned this pull request Sep 21, 2023
zeeke added a commit to zeeke/cnf-features-deploy that referenced this pull request Feb 9, 2024
ZTP must produce MachineConfig resources with ignition version v3.2.0
Since openshift/machine-config-operator#3814, MCO writes v3.4.0 ignition configuration.
openshift/machine-config-operator@63d7be1 is the commit just before the culprit one.

Also, machine-config-operator@63d7be1ef18b86826b47c61172c7a9dc7c2b6de1
has a transitive dependency on `github.com/Microsoft/hcsshim@v0.8.7` that causes
a `go mod tidy` error:

```
github.com/Microsoft/hcsshim@v0.8.7 requires
k8s.io/kubernetes@v1.13.0 requires
k8s.io/endpointslice@v0.0.0: reading k8s.io/endpointslice/go.mod at revision v0.0.0: unknown revision v0.0.0
```
which is fixed in v0.8.8 by microsoft/hcsshim#783

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
zeeke added a commit to zeeke/cnf-features-deploy that referenced this pull request Feb 13, 2024
ZTP must produce MachineConfig resources with ignition version v3.2.0

zeeke added a commit to zeeke/cnf-features-deploy that referenced this pull request Feb 19, 2024
ZTP must produce MachineConfig resources with ignition version v3.2.0
Development

Successfully merging this pull request may close these issues.

openshift experimental spec uses an Ignition spec not supported by the MCO
8 participants