Azure install fails (4.12, 4.13, and 4.14) #7703

Closed
japhar81 opened this issue Nov 9, 2023 · 6 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

japhar81 commented Nov 9, 2023

Version

Tested on release-4.12, release-4.13, and release-4.14.

$ openshift-install version
bin/openshift-install unreleased-master-8673-gc0f108be993307e519f8847264d06bc99a9cc7ad
built from commit c0f108be993307e519f8847264d06bc99a9cc7ad
release image registry.ci.openshift.org/origin/release:4.14
release architecture amd64

Platform:

Azure IPI

What happened?

The installer times out after creating the masters:

INFO Pulling debug logs from the bootstrap machine
DEBUG Using SSH_AUTH_SOCK /private/tmp/com.apple.launchd.V4UxPZEsUw/Listeners to connect to an existing agent
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to use pre-existing agent, make sure the appropriate keys exist in the agent for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.

SSH'ing into one of the masters shows two failed services:

Fedora CoreOS 38.20231002.3.1
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

[systemd]
Failed Units: 2
  ovsdb-server.service
  systemd-sysusers.service

systemd-sysusers log:

Nov 09 18:56:43 hz1-fdpth-master-2 systemd[1]: Starting systemd-sysusers.service - Create System Users...
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/20-setup-groups.conf:24: Conflict with earlier configuration for group 'nobody' in /usr/lib/sysusers.d/00-coreos-nobody.conf:8, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/20-setup-users.conf:13: Conflict with earlier configuration for user 'nobody' in /usr/lib/sysusers.d/00-coreos-nobody.conf:9, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/basic.conf:13: Conflict with earlier configuration for group 'nobody' in /usr/lib/sysusers.d/00-coreos-nobody.conf:8, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/basic.conf:14: Conflict with earlier configuration for user 'nobody' in /usr/lib/sysusers.d/00-coreos-nobody.conf:9, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/chrony.conf:2: Conflict with earlier configuration for user 'chrony' in /usr/lib/sysusers.d/00-coreos-static.conf:21, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/dbus.conf:2: Conflict with earlier configuration for user 'dbus' in /usr/lib/sysusers.d/10-static-extra.conf:19, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/openssh-server.conf:2: Conflict with earlier configuration for user 'sshd' in /usr/lib/sysusers.d/10-static-extra.conf:23, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/openvswitch.conf:2: Conflict with earlier configuration for user 'openvswitch' in /usr/lib/sysusers.d/35-rpmostree-pkg-user-openvswitch.conf:2, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/systemd-resolve.conf:8: Conflict with earlier configuration for user 'systemd-resolve' in /usr/lib/sysusers.d/00-coreos-static.conf:31, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /usr/lib/sysusers.d/systemd-timesync.conf:8: Conflict with earlier configuration for user 'systemd-timesync' in /usr/lib/sysusers.d/00-coreos-static.conf:32, ignoring line.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: Creating group 'hugetlbfs' with GID 978.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: Creating group 'openvswitch' with GID 977.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: Creating group 'unbound' with GID 976.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: Creating user 'openvswitch' (Open vSwitch Daemons) with UID 977 and GID 977.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: Creating user 'unbound' (Unbound DNS resolver) with UID 976 and GID 976.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd-sysusers[799]: /etc/gshadow: Group "openvswitch" already exists.
Nov 09 18:56:43 hz1-fdpth-master-2 systemd[1]: systemd-sysusers.service: Main process exited, code=exited, status=1/FAILURE
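One reading of this log is that `/etc/gshadow` already held an `openvswitch` entry that `/etc/group` lacked, so `systemd-sysusers` aborted before the `openvswitch` user was written, which is what `ovsdb-server` later trips over. The sketch below checks that on a master, assuming the standard colon-separated account files under `/etc`. `account_presence` is a hypothetical helper name, not a tool shipped with Fedora CoreOS or the installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Fedora CoreOS or the installer): report whether
# NAME has an entry in each local account database. ROOT defaults to "/", but can point
# at a copied or mounted root filesystem. Reading shadow and gshadow normally needs root.
account_presence() {
  local name="$1" root="${2:-/}" db file
  for db in passwd group shadow gshadow; do
    file="${root%/}/etc/${db}"
    # Field 1 of every line in these colon-separated files is the user or group name.
    if [ ! -r "$file" ]; then
      echo "${db}: unreadable"
    elif awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
      echo "${db}: present"
    else
      echo "${db}: missing"
    fi
  done
}
```

Run it as root on the affected master, for example `sudo bash -c "$(declare -f account_presence); account_presence openvswitch"`. A `gshadow: present` line next to `group: missing` would be consistent with the "already exists" error above.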

ovsdb-server log:

Nov 09 18:56:48 hz1-fdpth-master-2 systemd[1]: Starting ovsdb-server.service - Open vSwitch Database Unit...
Nov 09 18:56:48 hz1-fdpth-master-2 chown[1434]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Nov 09 18:56:48 hz1-fdpth-master-2 sh[1439]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Nov 09 18:56:48 hz1-fdpth-master-2 sh[1440]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Nov 09 18:56:48 hz1-fdpth-master-2 sh[1441]: /usr/bin/chown: invalid user: ‘openvswitch:hugetlbfs’
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1471]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1472]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1474]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1476]: setpriv: failed to parse reuid: ''
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1478]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1479]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1481]: id: 'openvswitch': no such user
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1483]: setpriv: failed to parse reuid: ''
Nov 09 18:56:48 hz1-fdpth-master-2 ovs-ctl[1484]: install: invalid user 'openvswitch'
Nov 09 18:56:48 hz1-fdpth-master-2 ovsdb-server[1486]: ovs|00001|daemon_unix|EMER|(null): user openvswitch not found, aborting.

This repeats over and over, and the cluster never comes up.

What you expected to happen?

Cluster to come up

How to reproduce it (as minimally and precisely as possible)?

  1. git checkout release-4.14
  2. hack/build.sh
  3. bin/openshift-install create cluster
  4. provide Azure credentials
  5. wait

References

Nothing I could find, except an old libvirt issue that seems to show the same error but has no resolution.

openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Feb 8, 2024

r4f4 (Contributor) commented Feb 22, 2024

Missing TAGS=okd when building the installer, as found in the cross-post.

/close

openshift-ci bot closed this as completed Feb 22, 2024

openshift-ci bot (Contributor) commented Feb 22, 2024

@r4f4: Closing this issue.

fdavalo commented Apr 15, 2024

Same issue with a GCP install of OpenShift. I would like to avoid the OKD workaround here; is a solution in progress?

r4f4 (Contributor) commented Apr 15, 2024

> Same issue with a GCP install of OpenShift. I would like to avoid the OKD workaround here; is a solution in progress?

Which workaround?

fdavalo commented Apr 15, 2024

Sorry, the cause was that the customer rebuilt the OpenShift installer: by default the release image is OKD, and if TAGS=okd is not set, the installation fails. They are rebuilding with the release image pointing to OCP.
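The resolution here comes down to a mismatch between how the installer binary was built and which release image it embeds. A minimal post-build sanity check is sketched below, assuming that a release image under an `/origin/` repository path (as in the `openshift-install version` output in the original report) is an OKD payload and that anything else is OCP. That rule is a heuristic, and `release_flavor` is a hypothetical helper, not an installer command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install): classify the release image that a
# locally built installer reports, reading `openshift-install version` output on stdin.
# Heuristic assumption: an image under an "/origin/" repository path is an OKD payload.
release_flavor() {
  local image
  # Take the third field of the "release image <ref>" line.
  image="$(awk '$1 == "release" && $2 == "image" { print $3; exit }')"
  if [ -z "$image" ]; then
    echo "release_flavor: no 'release image' line on stdin" >&2
    return 1
  fi
  case "$image" in
    */origin/*) echo "okd" ;;  # e.g. registry.ci.openshift.org/origin/release:4.14
    *) echo "ocp" ;;           # anything else is assumed to be an OCP payload
  esac
}
```

Fed the version output from the original report (`bin/openshift-install version | release_flavor`), it prints `okd`. A binary that reports `okd` but was built with plain `hack/build.sh` is the mismatch discussed in this thread; the fixes mentioned here are building with `TAGS=okd hack/build.sh` or rebuilding against an OCP release image.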
