OCP 4.5.x : bootkube.sh[2365]: Error: unknown flag: --etcd-ca-key #4332

Closed
ElCoyote27 opened this issue Nov 1, 2020 · 30 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ElCoyote27

Since around the time 4.6.x came out, I've been unable to deploy 4.5.x: the bootstrap VM now complains that bootkube.sh does not recognize etc_ca_cert (that's from memory) as an argument.

Has anyone seen that?
As for 4.5, I've tried 4.5.11, 4.5.12, 4.5.15, 4.5.16 and 4.5.17 (I always avoid the latest version because I want an update to be available right after deploy).

bootkube.sh prints a usage error and aborts because it doesn't recognize that CLI argument, so the bootstrap never completes.

4.4.29 deploys fine, and so does 4.6.1.

Nov 01 22:19:21 ocp4d-rkk7j-bootstrap kernel: SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap bootkube.sh[2365]: Error: unknown flag: --etcd-ca-key
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap bootkube.sh[2365]: Usage:
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap bootkube.sh[2365]:   cluster-etcd-operator render [flags]
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap bootkube.sh[2365]: Flags:
[.....]
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap bootkube.sh[2365]: unknown flag: --etcd-ca-key
Nov 01 22:19:21 ocp4d-rkk7j-bootstrap systemd[1]: libpod-4ec0e6f761d00041be5579575e7d38d39cdcf10dc7fefe9873c19bdbd9416c51.scope: Consumed 293ms CPU time
@ElCoyote27
Author

This is on libvirt IPI using the ocp_libvirt_ipi role (it has worked for me for months, on many different versions of OCP).

@ElCoyote27
Author

@luisarizmendi Any ideas?

@kalranitin

kalranitin commented Nov 2, 2020

I see the same behavior. In fact, I cannot get 4.6.1 to install either. I have tried both libvirt and bare metal; neither works.

Nov 02 01:10:07 bootstrap.ocp4.coe-example.com bootkube.sh[2695]: Moving OpenShift manifests in with the rest of them
Nov 02 01:10:07 bootstrap.ocp4.coe-example.com bootkube.sh[2695]: Rendering Cluster Version Operator Manifests...
Nov 02 01:10:07 bootstrap.ocp4.coe-example.com bootkube.sh[2695]: Rendering CEO Manifests...
Nov 02 01:10:10 bootstrap.ocp4.coe-example.com bootkube.sh[2695]: Error: unknown flag: --etcd-ca-key

@staebler
Contributor

staebler commented Nov 2, 2020

Are you using a version of openshift-installer that matches the OpenShift version that you are trying to install? The --etcd-ca-key flag was added to the cluster-etcd-operator render command in 4.5 with openshift/cluster-etcd-operator#438. The bootstrap.sh script was changed in the installer in 4.5 with #4150. If you are using an openshift-installer built with the updated bootstrap.sh, then it will not be able to successfully install an OpenShift release built without the cluster-etcd-operator changes.
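A quick way to test for this kind of mismatch is to compare the commit a locally built installer reports against the commit you meant to build. The sketch below parses text in the shape printed by `openshift-install version`, assuming that output contains a `built from commit` line. The function names are hypothetical, and the sample text is illustrative rather than captured from a real run; the two short hashes are the ones quoted later in this thread.

```python
import re
from typing import Optional


def installer_commit(version_output: str) -> Optional[str]:
    """Return the hash from a 'built from commit <sha>' line, or None."""
    match = re.search(r"^built from commit\s+([0-9a-f]{7,40})\s*$",
                      version_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def matches_expected(version_output: str, expected_commit: str) -> bool:
    """True when the reported hash and the expected one share a prefix."""
    reported = installer_commit(version_output)
    if reported is None or not expected_commit:
        return False
    # Accept short and long forms of the same hash.
    return (reported.startswith(expected_commit)
            or expected_commit.startswith(reported))


# Illustrative input only: a1f4344 is the branch head the role fetched,
# 9893a48 is the commit named in this thread for 4.5.15.
sample = "built from commit a1f4344\n"
print(matches_expected(sample, "9893a48"))
```

A `False` result here would mean the installer was built from a different commit than the release it is asked to install.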

@kalranitin

@staebler Yes, I am using an installer of the same version as the OpenShift cluster. For me, OpenShift 4.5.6 works fine with the matching installer.
What I was trying to say is that 4.6.1 (with a matching openshift installer) fails. I tried this install on both bare metal and libvirt, and both are failing as of now.

@ElCoyote27
Author

ElCoyote27 commented Nov 2, 2020

The ocp_libvirt_ipi role -always- uses the same version of the installer as the version of OCP you're trying to install
because it rebuilds the installer with libvirt. In what revision of 4.5.z was #4150 merged?

Ref: https://github.com/luisarizmendi/ocp-libvirt-ipi-role

@ElCoyote27
Author

Since #4150 (comment) seems to imply that the fix was merged only 3 days ago, when can we expect this in a 4.5 release? I tested 4.5.17 and got the same error.
@staebler

@staebler
Contributor

staebler commented Nov 2, 2020

> The ocp_libvirt_ipi role -always- uses the same version of the installer as the version of OCP you're trying to install
> because it rebuilds the installer with libvirt. In what revision of 4.5.z was #4150 merged?
>
> Ref: https://github.com/luisarizmendi/ocp-libvirt-ipi-role

I am not convinced that this is accurate. How does the ocp_libvirt_ipi role know which commit to build in order to match it with the OpenShift release being installed? As far as I can tell, the clone [1] is just pulling the latest from the release-4.5 branch and not the commit matching a particular z-stream release.

[1] https://github.com/luisarizmendi/ocp-libvirt-ipi-role/blob/9938e919ffc6a65f5a742053f6b39e8ef56e3f15/tasks/ocp_deploy.yml#L27-L33

@staebler
Contributor

staebler commented Nov 2, 2020

> Since #4150 (comment) seems to imply that the fix was merged only 3 days ago, when can we expect this in a 4.5 release? I tested 4.5.17 and got the same error.
> @staebler

4.5.17 is scheduled to be released on Nov 5.

@ElCoyote27
Author

I am not too sure how the ansible role does this, but I've been using it for months, and every time I specified an OCP version (4.5.6, 4.4.18, 4.3.36, 4.6.1, anything else), I would see the role download the precise revision of the installer and I would obtain a running OCP cluster with the -same- version.

@staebler
Contributor

staebler commented Nov 2, 2020

> I am not too sure how the ansible role does this, but I've been using it for months, and every time I specified an OCP version (4.5.6, 4.4.18, 4.3.36, 4.6.1, anything else), I would see the role download the precise revision of the installer and I would obtain a running OCP cluster with the -same- version.

This is easy enough to verify. If you try to install 4.5.15, which commit is fetched from the installer repo?

@ElCoyote27
Author

I tore down my 4.4.29 and tried again with 4.5.15.
Here's what I saw:

TASK [ocp-libvirt-ipi-role : Clone Openshift installer repo] ******************************************************************************
task path: /export/home/raistlin/World/Vincent/Code/GIT/virt-OCP/roles/ocp-libvirt-ipi-role/tasks/ocp_deploy.yml:27
changed: [daltigoth.lasthome.solace.krynn] => {"after": "a1f43445e365d186c3359c43961fa8974251edc0", "before": "0227b5f653786d8d58312cd08a2e924e72ae646f", "changed": true, "msg": "Local modifications exist.", "remote_url_changed": false}

@staebler
Contributor

staebler commented Nov 2, 2020

> I tore down my 4.4.29 and tried again with 4.5.15.
> Here's what I saw:
>
> TASK [ocp-libvirt-ipi-role : Clone Openshift installer repo] ******************************************************************************
> task path: /export/home/raistlin/World/Vincent/Code/GIT/virt-OCP/roles/ocp-libvirt-ipi-role/tasks/ocp_deploy.yml:27
> changed: [daltigoth.lasthome.solace.krynn] => {"after": "a1f43445e365d186c3359c43961fa8974251edc0", "before": "0227b5f653786d8d58312cd08a2e924e72ae646f", "changed": true, "msg": "Local modifications exist.", "remote_url_changed": false}

That pulled the most recent commit from the release-4.5 branch. It did not pull the commit necessary to install 4.5.15.

@ElCoyote27
Author

Ah, that's interesting.. so how would one identify the commit needed for 4.5.15? is there a table somewhere? I didn't find relevant tags..
Is there a reason why I'm only running into this now? Shouldn't master be enough for it to work? Or should I try to use the 4.5.17 installer since the role pulls down 'master'?
Thanks,

@staebler
Contributor

staebler commented Nov 2, 2020

> Ah, that's interesting.. so how would one identify the commit needed for 4.5.15? is there a table somewhere? I didn't find relevant tags..

I do not know of an authoritative location where you could find the commit used in the 4.5.15 release. There may be one, but I don't know of it. You can see here that the release was created on Oct 14. From that date and the history of the release-4.5 branch, you can deduce that the commit used was 9893a48.
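The deduction described here (take the newest commit on the branch that is not later than the release build) can be sketched as below. This is a heuristic under stated assumptions: it uses committer timestamps from a local clone, which may differ from merge times, and the function names are hypothetical. On the command line, `git rev-list -n 1 --before=<date> <branch>` performs a similar lookup.

```python
import subprocess
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def last_commit_before(commits: List[Tuple[str, datetime]],
                       cutoff: datetime) -> Optional[str]:
    """Pick the newest commit whose timestamp is not after the cutoff."""
    eligible = [(when, sha) for sha, when in commits if when <= cutoff]
    return max(eligible)[1] if eligible else None


def branch_commits(repo_dir: str, branch: str) -> List[Tuple[str, datetime]]:
    """Read (sha, committer date) pairs for a branch from a local clone."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%H %ct", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    pairs = []
    for line in out.splitlines():
        sha, epoch = line.split()
        pairs.append((sha, datetime.fromtimestamp(int(epoch), tz=timezone.utc)))
    return pairs
```

For example, `last_commit_before(branch_commits("installer", "origin/release-4.5"), datetime(2020, 10, 14, 23, 59, tzinfo=timezone.utc))` would return a candidate for 4.5.15, assuming a clone in `./installer`; the result should still be checked against the branch history by hand.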

> Is there a reason why I'm only running into this now? Shouldn't master be enough for it to work? Or should I try to use the 4.5.17 installer since the role pulls down 'master'?

I guess you have been lucky in the past that there have not been any (noticeable) breaking changes.

Thanks,

@ElCoyote27
Author

I'm still seeing that very issue if I use '4.5.17' (the installer) and master.

@ElCoyote27
Author

ngs for file-filtered logging
Nov 02 16:32:16 ocp4d-q5zrn-bootstrap bootkube.sh[213621]: unknown flag: --etcd-ca-key
Nov 02 16:32:17 ocp4d-q5zrn-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 16:32:17 ocp4d-q5zrn-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Nov 02 16:32:22 ocp4d-q5zrn-bootstrap systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Nov 02 16:32:22 ocp4d-q5zrn-bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 78.
[root@ocp4d-q5zrn-bootstrap ~]# uname -r
4.18.0-193.14.3.el8_2.x86_64

@staebler
Contributor

staebler commented Nov 2, 2020

> I'm still seeing that very issue if I use '4.5.17' (the installer) and master.

4.5.17 was built on Oct 28. The installer changes merged on Oct 29.

@ElCoyote27
Author

@staebler Thanks for detailing that. So I should in fact be waiting for 4.5.18? When will its installer be released? Do we have any idea at the moment?

@staebler
Contributor

staebler commented Nov 2, 2020

> @staebler Thanks for detailing that. So I should in fact be waiting for 4.5.18? When will its installer be released? Do we have any idea at the moment?

There is a weekly cadence for z-stream releases. So, barring something that would push back the release, 4.5.18 should be released on Nov 9.

Note, however, that there is no libvirt installer released. You can use 4.5.17 right now: You just need to build the correct version of the installer.
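Building "the correct version" amounts to pinning the installer checkout to one commit before compiling, rather than taking the branch head. The sketch below lists those steps as data and runs them only on request. It assumes the `TAGS=libvirt hack/build.sh` invocation from the installer's libvirt development notes, which should be verified against the branch being built; the function names and the default `installer` directory are hypothetical.

```python
import os
import subprocess
from typing import Dict, List


def libvirt_build_plan(commit: str, repo_dir: str = "installer") -> List[Dict]:
    """Describe the build steps as data; nothing is executed here."""
    if not commit:
        raise ValueError("a commit hash is required")
    return [
        {"argv": ["git", "clone",
                  "https://github.com/openshift/installer.git", repo_dir],
         "cwd": None, "env": {}},
        # Pin the exact commit instead of the release branch head.
        {"argv": ["git", "checkout", "--detach", commit],
         "cwd": repo_dir, "env": {}},
        # Assumption: TAGS=libvirt enables the libvirt platform in the build.
        {"argv": ["hack/build.sh"], "cwd": repo_dir, "env": {"TAGS": "libvirt"}},
    ]


def run_plan(plan: List[Dict]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for step in plan:
        subprocess.run(step["argv"], cwd=step["cwd"],
                       env={**os.environ, **step["env"]}, check=True)
```

Calling `run_plan(libvirt_build_plan("9893a48"))` would clone, pin, and build in sequence; keeping the plan separate from its execution makes it easy to print and review the commands first.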

@ElCoyote27
Author

this is precisely my issue: in order for IPI to work on libvirt, the ocp_libvirt_ipi role downloads the client and the installer (it checks out master from the desired release branch) and rebuilds an installer with the libvirt bits enabled.. This is precisely where it's been failing for me since last week: because no installer currently carries the matching bits to be able to consume 'master' in 4.5.

https://github.com/luisarizmendi/ocp-libvirt-ipi-role/blob/master/tasks/ocp_deploy.yml#L6-L60

@kalranitin

@staebler I tested 4.5.17 and I am facing the same issue as @ElCoyote27. For me, everything up to 4.5.16 works fine (with matching installer versions), while it breaks from 4.5.17 onwards.

Additionally I am not able to run any successful installs at all on 4.6.1.

These installs have been tried both on libvirt and Bare Metal.

@staebler
Contributor

staebler commented Nov 3, 2020

> this is precisely my issue: in order for IPI to work on libvirt, the ocp_libvirt_ipi role downloads the client and the installer (it checks out master from the desired release branch) and rebuilds an installer with the libvirt bits enabled.. This is precisely where it's been failing for me since last week: because no installer currently carries the matching bits to be able to consume 'master' in 4.5.
>
> https://github.com/luisarizmendi/ocp-libvirt-ipi-role/blob/master/tasks/ocp_deploy.yml#L6-L60

This sounds to me like a problem with https://github.com/luisarizmendi/ocp-libvirt-ipi-role rather than with https://github.com/openshift/installer. Maybe you could try opening an issue against that repo.

@nickhardiman

I'm sure you already know this, but just in case -
This breaks the libvirt HOWTO instructions.
Perhaps a workaround is to change git clone to git clone --branch release-4.4 (I don't know, haven't tried).

@ElCoyote27
Author

I tried 4.5.18 this morning on libvirt and the error is gone. My deployment is now proceeding happily.

@ElCoyote27
Author

@nickhardiman Yes, that would have solved it (4.4 branch was unaffected)

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 3, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 5, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


6 participants