
Add oc ignition convert3 #628

Closed
wants to merge 1 commit into from

Conversation

cgwalters
Member

The transition to Ignition Spec 3 with 4.6 creates a
discontinuity. Some users want to update their bootimages,
e.g. for a cluster originally provisioned as 4.4 but upgraded
in place to 4.6, it should be possible to directly use RHCOS 4.6
bootimages for new workers.

In some cases in fact, this could be required for things like
adding a node with newer hardware.

The main stumbling block here is the pointer ignition config
which is generated by openshift-install. Since the idea is
openshift-install should in theory be disposable after a cluster
is provisioned, let's add this to oc which admins will need anyways.
Vendor and use
https://github.com/coreos/ign-converter
the same as the MCO uses.

xref openshift/enhancements#492 (comment)
xref https://bugzilla.redhat.com/show_bug.cgi?id=1884750
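For context, a minimal sketch of what the conversion means for a typical pointer config. This is illustrative only (the PR vendors https://github.com/coreos/ign-converter for the real translation); it works at the JSON level under the assumption that the pointer config only needs its spec 2 `ignition.config.append` list renamed to spec 3's `ignition.config.merge` and the version bumped:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// convertPointerConfig is a hypothetical stand-in for ign-converter:
// it renames ignition.config.append (spec 2) to ignition.config.merge
// (spec 3) and bumps ignition.version. Real configs need the full
// vendored translator; this only covers the minimal pointer-config shape.
func convertPointerConfig(v2 []byte) ([]byte, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(v2, &cfg); err != nil {
		return nil, err
	}
	ign, ok := cfg["ignition"].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("not an ignition config")
	}
	ign["version"] = "3.1.0"
	if sub, ok := ign["config"].(map[string]interface{}); ok {
		if appendees, ok := sub["append"]; ok {
			// Spec 3 renamed "append" to "merge".
			sub["merge"] = appendees
			delete(sub, "append")
		}
	}
	return json.Marshal(cfg)
}

func main() {
	// A minimal spec 2 pointer config pointing at the MCS worker endpoint.
	v2 := []byte(`{"ignition":{"version":"2.2.0","config":{"append":[{"source":"https://api-int.example:22623/config/worker"}]}}}`)
	out, err := convertPointerConfig(v2)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```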

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
To complete the pull request process, please assign tnozicka after the PR has been reviewed.
You can assign the PR to them by writing /assign @tnozicka in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice

This seems like a good idea to me!! LGTM

@cgwalters
Member Author

OK resolved outstanding comments so far.

@k-keiichi-rh

k-keiichi-rh commented Nov 11, 2020

I tested whether we can add new nodes to the upgraded cluster using the Ignition Config converted by this tool,
and found an issue: adding new nodes to a cluster on vSphere fails.

[ Test conditions ]

  • Upgraded the cluster from OCP4.5 to OCP4.6 on vSphere
  • Converted the Ignition Config that was generated by openshift-install when deploying OCP4.5.
  • Added new nodes to the upgraded cluster using the converted Ignition Config and the OCP4.6 base image.

Adding new nodes failed, and the following error was repeated:

GET https://api-int.share.test:22623/config/worker: attempt GET error: GET "https://api-int.share.test:22623/config/worker": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

This issue is caused by the missing "X509v3 Subject Alternative Name" (SAN) field in the X.509 certificate served at https://api-int.share.test:22623/config/worker.
As of OCP4.6 (Go 1.15 and later), the CommonName field on X.509 certificates is deprecated and the SAN field should be set in the certificates. However, an OCP4.5 cluster on the vSphere platform doesn't set the SAN field.
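The failure mode is easy to reproduce outside the cluster. Below is a hedged Go sketch (the host name is a placeholder, and selfSignedCNOnly is a hypothetical helper, not anything from the MCO) that builds a certificate carrying only a CommonName and no SANs, the way the 4.5-era machine-config-server certificate did, and shows that Go's hostname verification rejects it:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCNOnly creates a self-signed certificate whose subject has
// only a CommonName and deliberately no DNSNames (SANs), mimicking the
// certificate the 4.5-era MCS served.
func selfSignedCNOnly(host string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host}, // no SANs set
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedCNOnly("api-int.share.test")
	if err != nil {
		panic(err)
	}
	// With no SANs present, hostname verification fails in Go 1.15+;
	// this prints the verification error.
	fmt.Println(cert.VerifyHostname("api-int.share.test"))
}
```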

The following patch is related to this issue:

openshift/installer@b2d424b
=> This changes the installer to set the SAN field when deploying OCP4.6 clusters on the vSphere platform.

So this converter tool won't work on the vSphere platform.

To avoid this issue, the following steps are required:

  1. Retrieve the Ignition Config from the upgraded cluster (OCP4.6):
 # curl -H "Accept: application/vnd.coreos.ignition+json; version=3.1.0" -k https://<api-server-url>:22623/config/worker
  2. Add new nodes using the retrieved Ignition Config and the OCP4.6 base image.

Or

The X.509 certificate for the api-int endpoint should be re-created with the SAN field.
I removed machine-config-server-tls in openshift-machine-config-operator.
It was then re-created automatically, but it still didn't have the SAN field.
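For completeness, the retrieval in step 1 above can also be done programmatically. A hedged Go sketch of building the equivalent request (the host is a placeholder; curl's -k maps to InsecureSkipVerify, which is only acceptable here because the served certificate is the very thing being worked around):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newWorkerConfigRequest builds the same request as the curl command
// above: ask the machine-config-server for a spec 3.1.0 rendering of
// the worker config via content negotiation on the Accept header.
func newWorkerConfigRequest(apiServerURL string) (*http.Request, error) {
	req, err := http.NewRequest("GET",
		fmt.Sprintf("https://%s:22623/config/worker", apiServerURL), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.coreos.ignition+json; version=3.1.0")
	return req, nil
}

func main() {
	req, err := newWorkerConfigRequest("api-int.example.test")
	if err != nil {
		panic(err)
	}
	// Equivalent of curl -k; the client is constructed but not used here
	// so the sketch stays offline.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	_ = client
	fmt.Println(req.URL.String())
	fmt.Println(req.Header.Get("Accept"))
}
```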

If this PR is not the right place to discuss this, please let me know and I will move the discussion to the right place.

@cgwalters
Member Author

Right, you hit an unrelated issue in 4.5 ➡️ 4.6 upgrades. This is https://bugzilla.redhat.com/show_bug.cgi?id=1886134 for Ignition, and it probably warrants an entry in the knowledge base and explicitly tracking a fix.

Perhaps that fix could go alongside this in oc; something like oc ignition get-pointer-config that would just regenerate it with an updated cert?

Alternatively we could fix the above bug for Ignition and work around the behavior for a while longer.

@k-keiichi-rh

Hi right, you hit an unrelated issue in 4.5 ➡️ 4.6 upgrades. This is https://bugzilla.redhat.com/show_bug.cgi?id=1886134 for Ignition, and probably warrants an entry in the knowledge base and explicitly tracking a fix.

Thank you for informing me. I will follow the bugzilla.

The transition to Ignition Spec 3 with 4.6 creates a
discontinuity. Some users want to update their bootimages,
e.g. for a cluster originally provisioned as 4.4 but upgraded
in place to 4.6, it should be possible to directly use RHCOS 4.6
bootimages for new workers.

I have a question regarding adding new nodes to the upgraded OCP4.6 cluster by using RHCOS 4.6 boot images directly.
In this case, we have to convert the Spec 2 Ignition Config that was used to install the OCP4.5 cluster.

There are two ways to achieve the above:

  1. Convert the Ignition Config from Ignition Spec 2 to Ignition Spec 3 with "oc ignition convert3"
  2. Get Ignition Spec 3 from the MCO
    ex) "curl -H "Accept: application/vnd.coreos.ignition+json; version=3.1.0" -k https://<api-server-url>:22623/config/worker"

Is there any difference between "Option 1" and "Option 2"?
With Option 1, we have to keep the Ignition Config that was used to install the OCP4.5 cluster.
With Option 2, we can dispose of that Ignition Config and fetch the config for adding new nodes from the existing cluster each time. Option 2 looks like an easier and more maintainable approach than Option 1.

@miabbott
Member

Looks like the new subcommand is missing a help string:

$ ./oc --help | grep -C 3 ignition
  image           Useful commands for managing images
  registry        Commands for working with the registry
  idle            Idle scalable resources
  ignition        
  api-versions    Print the supported API versions on the server, in the form of "group/version"
  api-resources   Print the supported API resources on the server
  cluster-info    Display cluster info

cfg, rpt, err := ignv2.Parse(data)
fmt.Fprintf(os.Stderr, "%s", rpt.String())
if err != nil || rpt.IsFatal() {
	return errors.Errorf("Error parsing spec v2 config: %v\n%v", err, rpt)
}

If a user feeds a v3 config to the command, it returns this same error.

$ jq .ignition.version < ignitionv3.json 
"3.1.0"
$ ./oc ignition convert3 --in ignitionv3.json --out ignitionv3a.json
error: Error parsing spec v2 config: unsupported config version

Could we check the incoming config version before trying to parse?

Member Author

Could we check the incoming config version before trying to parse?

Not really, to get the version we need to parse... I think you're just asking for a better error message? Makes total sense but I think the fix would probably be in ign-converter.


Ignition now has config.util.GetConfigVersion(), which should work with 2.x configs also.
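A sketch of that idea, assuming a JSON-level probe (Ignition's config.util.GetConfigVersion is the real helper; peekVersion here is a hypothetical stand-in): decode only ignition.version before choosing a parser, so a v3 input can get a clearer error than "unsupported config version":

```go
package main

import (
	"encoding/json"
	"fmt"
)

// peekVersion decodes only the ignition.version field so the caller can
// dispatch to the right spec parser (or reject an already-converted v3
// config with a clear message) before a full parse. Note: spec 2.x and
// 3.x both carry ignition.version, so this probe works for either.
func peekVersion(data []byte) (string, error) {
	var probe struct {
		Ignition struct {
			Version string `json:"version"`
		} `json:"ignition"`
	}
	if err := json.Unmarshal(data, &probe); err != nil {
		return "", err
	}
	if probe.Ignition.Version == "" {
		return "", fmt.Errorf("no ignition.version field found")
	}
	return probe.Ignition.Version, nil
}

func main() {
	v, err := peekVersion([]byte(`{"ignition":{"version":"3.1.0"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // prints 3.1.0
}
```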

@cgwalters
Member Author

Get Ignition Spec 3 from MCO
ex) "curl -H "Accept: application/vnd.coreos.ignition+json; version=3.1.0" -k https://<api-server-url>:22623/config/worker"

No, that's going to cause problems because you're getting the full config and not just the pointer. We don't expose an explicit API to retrieve the pointer config today, see discussions in openshift/enhancements#540

@cgwalters
Member Author

Looks like the new subcommand is missing a help string:

Fixed.

@openshift-merge-robot
Contributor

openshift-merge-robot commented Nov 24, 2020

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-cmd 358978d link /test e2e-agnostic-cmd
ci/prow/e2e-aws 358978d link /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@marun

marun commented Dec 15, 2020

/cc @soltysh

@soltysh
Member

soltysh commented Dec 16, 2020

/hold
Adding new commands post-freeze is not acceptable, so at the soonest this can be a 4.8 item. Secondly, I haven't signed off on the addition of a new command yet.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2020
@soltysh soltysh self-assigned this Dec 16, 2020
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@travier
Member

travier commented Mar 22, 2021

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 22, 2021
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 20, 2021

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2021
@travier
Member

travier commented Jun 21, 2021

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2021
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2021
@travier
Member

travier commented Oct 27, 2021

We are now several releases past this migration so we should probably give up on this one.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 26, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 26, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 26, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
