Fresh Installation on FCOS33 fails due to https://github.com/coreos/fedora-coreos-tracker/issues/646 #477

Closed
gtema opened this issue Jan 21, 2021 · 16 comments

Comments

@gtema

gtema commented Jan 21, 2021

Describe the bug

A fresh installation (at least with 4.6.0-0.okd-2020-12-12-135354) using an FCOS 33 image fails with the error described in coreos/fedora-coreos-tracker#646. The initial VM works until it fetches the OKD content: after the bootstrap node reboots for the first time, it no longer has internet access. Manually fixing resolv.conf lets the installation continue, but the same problem occurs later on all master nodes.
Provisioning the same cluster with FCOS 32 works without problems.

Version

4.6.0-0.okd-2020-12-12-135354

How reproducible
always
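
For anyone hitting the same symptom, here is a minimal sketch for checking which nameservers a node's resolv.conf actually lists after the reboot. It assumes the standard `/etc/resolv.conf` path and plain `nameserver <address>` lines; the function name `parse_nameservers` is made up for illustration, and the script only reports what is there, it does not fix anything.

```python
from pathlib import Path


def parse_nameservers(resolv_conf_text):
    """Return the addresses from `nameserver` lines, ignoring comments."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        # Drop anything after a comment marker ('#' or ';').
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    # Assumption: the node uses the standard resolv.conf location.
    path = Path("/etc/resolv.conf")
    text = path.read_text() if path.exists() else ""
    found = parse_nameservers(text)
    print(found if found else "no nameserver entries found in /etc/resolv.conf")
```

An empty result on the bootstrap or master nodes would be consistent with the loss of name resolution described above, but the exact broken state may differ from case to case.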

@devzeronull

Did you take a look at the FAQ => Should I use Fedora CoreOS 32 or 33 for installing on User Provisioned Infrastructure?

Btw., you might run into the Router connectivity issue with OpenShiftSDN in 4.6. One fix on a new cluster could be to use the better and regularly tested OVNKubernetes CNI instead of OpenShiftSDN, but you'll lose egress Router support (see the matrix from the link).

@gtema
Author

gtema commented Jan 22, 2021

Ugh, didn't see that link. I was pretty much creating a new cluster using all the steps I knew from the previous installation, so I didn't really re-read the FAQ.

@devzeronull

I can imagine :( I had some related problems last week (originally using FCOS 32 images that were updated automatically during installation), and shortly before writing here I found the solution on my own while other similar issues popped up here...

@vrutkovs
Member

Please attach a log bundle.

Also, which platform is that?

@fortinj66
Contributor

fortinj66 commented Jan 25, 2021

@vrutkovs this is probably the issue we are working on...

openshift/machine-config-operator#2359

@sandrobonazzola

@Gal-Zaidman @janoszen can you please have a look?

@Gal-Zaidman

> @Gal-Zaidman @janoszen can you please have a look?

I didn't see any mention of oVirt as a platform...
Plus, I don't think that we test OKD and FCOS on oVirt...

@fortinj66
Contributor

@vrutkovs is indicating it is affected in the tracking link above

@Gal-Zaidman

> @vrutkovs is indicating it is affected in the tracking link above

Thanks, missed that :)

@vrutkovs
Member

@m-yosefpor

This should be resolved since https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2021-02-11-022221

Great news 👌 I'll try a new installation with it today.

@m-yosefpor

m-yosefpor commented Feb 11, 2021

Tested fedora-coreos-33.20210117.3.2-openstack.x86_64.qcow2 with https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2021-02-11-022221, UPI: the bootstrap node was initialized successfully. After its reboot, systemd-resolved managed the nameservers correctly, all required pods/services are up and running on the bootstrap node, and it is serving the masters' ign.

However, the installation still fails due to another problem: the masters do not join the cluster. The kubelet and crio services do not start on the masters. There is also this audit log, which might help:

[image: screenshot of the audit log]

I assume there is an SELinux policy issue. However, it does not seem to be related to this issue and is a new one. The systemd-resolved nameserver issue is fixed. 👍

@vrutkovs
Member

> However, the installation still fails due to another problem: the masters do not join the cluster. The kubelet and crio services do not start on the masters. There is also this audit log, which might help:

Please collect a log bundle for this error.

The SELinux issue needs to be reported at bugzilla.redhat.com against Fedora's chrony component.

@m-yosefpor

Sure, @vrutkovs. I'll follow up on the rest in Bugzilla. I've just found this: https://bugzilla.redhat.com/show_bug.cgi?id=1910844.

@fortinj66
Contributor

> Sure, @vrutkovs. I'll follow up on the rest in Bugzilla. I've just found this: https://bugzilla.redhat.com/show_bug.cgi?id=1910844.

That bug was resolved a while ago. Newer dailies should have this fixed.

@m-yosefpor

@fortinj66 @vrutkovs Thank you.
I confirm this issue is fixed. I've tested a UPI installation with the most recent FCOS version and OKD release, and everything is fine.

openshift-install 4.6.0-0.okd-2021-02-14-205305
FCOS 33.20210117.3.2 stable
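
To check whether a given OKD build is at least as new as the release mentioned above as carrying the fix, something like the following could work. This is only a sketch: it assumes tags follow the `<stream>.okd-<YYYY-MM-DD-HHMMSS>` pattern seen in this thread and that the timestamp reflects build order. The helper names `split_release` and `is_at_least` are hypothetical and not part of any OKD tooling.

```python
from datetime import datetime

# Release named in this thread as the one where the issue should be resolved.
FIXED_RELEASE = "4.6.0-0.okd-2021-02-11-022221"


def split_release(tag):
    """Split an OKD tag into (stream, build timestamp)."""
    stream, sep, stamp = tag.partition(".okd-")
    if not sep:
        raise ValueError(f"not an OKD release tag: {tag!r}")
    return stream, datetime.strptime(stamp, "%Y-%m-%d-%H%M%S")


def is_at_least(tag, baseline=FIXED_RELEASE):
    """True if `tag` was built at or after `baseline` in the same stream."""
    stream, built = split_release(tag)
    base_stream, base_built = split_release(baseline)
    if stream != base_stream:
        # Timestamps from different streams are not comparable here.
        raise ValueError("tags are from different release streams")
    return built >= base_built


if __name__ == "__main__":
    for tag in ("4.6.0-0.okd-2020-12-12-135354", "4.6.0-0.okd-2021-02-14-205305"):
        print(tag, "includes the fix (by date):", is_at_least(tag))
```

The comparison is by build date only, so it says nothing about what a particular build actually contains.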
