libvirt with kvm installation failed #2157
Comments
See if you can reach the nodes from outside. If you can't reach them at all, you likely missed the required firewall settings. If you can reach them only by IP and not by hostname, then the DNS change likely didn't take effect (make sure you told NetworkManager to reload its config).
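The triage above can be sketched in Python. This is only a rough sketch, not part of the installer: the function names (`can_connect`, `can_resolve`, `diagnose`) are hypothetical, and it assumes sshd listens on port 22 on the nodes.

```python
import socket


def can_connect(address, port=22, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds (port 22 is an assumption)."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname):
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(ip_reachable, name_resolves):
    """Map the two observations onto the likely cause described in the comment."""
    if not ip_reachable:
        return "firewall"  # not reachable at all: check the firewall settings
    if not name_resolves:
        return "dns"       # reachable by IP only: the DNS change did not take effect
    return "ok"
```

For a given node you would call something like `diagnose(can_connect(node_ip), can_resolve(node_fqdn))`, where `node_ip` and `node_fqdn` are placeholders for the values of your own cluster.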
@zeenix Thanks for the feedback. I could ssh into both the bootstrap and master node through IP. But it did not work using hostnames. I am pretty sure I configured & reloaded NetworkManager.
Right, DNS is your issue. Please verify that you followed every step precisely. Just to be sure, you have to use the fully-qualified hostnames. You can find them in the DHCP leases if you look at the XML of the cluster network that the Installer creates.
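As a rough illustration of reading the hostnames out of that XML: the sketch below parses the output of `virsh net-dumpxml <network>` and collects the DHCP host entries. The sample XML, the network name, the domain and the addresses are all made up for the example, and it assumes the hosts appear as `<host>` elements under `<ip>/<dhcp>`; your network may look different.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; real output comes from `virsh net-dumpxml <network>`.
SAMPLE_XML = """
<network>
  <name>example-net</name>
  <domain name='example.test'/>
  <ip address='192.168.126.1' netmask='255.255.255.0'>
    <dhcp>
      <host mac='52:54:00:00:00:01' name='bootstrap.example.test' ip='192.168.126.10'/>
      <host mac='52:54:00:00:00:02' name='master-0.example.test' ip='192.168.126.11'/>
    </dhcp>
  </ip>
</network>
"""


def dhcp_hosts(network_xml):
    """Return {hostname: ip} for every named DHCP host entry in the network XML."""
    root = ET.fromstring(network_xml.strip())
    return {
        host.get("name"): host.get("ip")
        for host in root.iterfind("./ip/dhcp/host")
        if host.get("name")
    }


if __name__ == "__main__":
    for name, ip in dhcp_hosts(SAMPLE_XML).items():
        print(name, ip)
```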
Oh, and I ended up wasting a lot of time recently when I missed the fact that both the
I just tried again, and actually I could ssh into the master node using the hostname as well. I just used the hostname printed in the log entries. So why did the installation fail in the end?
The reason I failed to ssh into the node previously was that I used the wrong hostnames, the ones printed by virsh
Could you please provide more detailed info on how to get the fully-qualified hostnames?
After sshing into the bootstrap node, I saw lots of error logs,
The |
I created a brand new VM and tried again, following the guide precisely, and it failed again. Once I configured NetworkManager to use dnsmasq and reloaded NetworkManager, all the original nameserver entries in /etc/resolv.conf were gone, and a new record "nameserver 127.0.0.1" was added. Afterwards, when I ran "bin/openshift-install create cluster --log-level debug", I got the following error.
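As far as I know, a single "nameserver 127.0.0.1" entry is what you would expect once NetworkManager hands name resolution to a local dnsmasq instance, so that change alone is probably not the error. A small sketch for checking what /etc/resolv.conf currently points at; the function names are hypothetical and the sample text is made up.

```python
def nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":  # skip blanks and comment lines
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf_text):
    """True if every listed nameserver is a loopback address (e.g. a local dnsmasq)."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(s.startswith("127.") or s == "::1" for s in servers)


SAMPLE = "# Generated by NetworkManager\nnameserver 127.0.0.1\n"
```

On a real machine you would pass in the file contents, for example `uses_only_local_resolver(open("/etc/resolv.conf").read())`.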
I actually used commit 885a442 (Aug 3), and at least I could create the VMs and get the operators mostly online (I omitted the NetworkManager steps). If this is just for testing, maybe you can skip the setup on the installer machine at first, and instead ssh to the newly created machines and check whether
Ah, it's a nested virt case. Make sure you're not bitten by the famous CSR issue. About fully-qualified domain names, you need to be on the latest git master of the Installer binary for that. I only recently fixed that.
That's exactly what I am doing right now (I mean ignoring the NetworkManager/DNS configuration). It took a long time for the majority of the operators/pods to get running, and both kubectl and oc work now. It seems that I ran into the same issue as 1428. But the worker node did not get created, and the bootstrap node did not get destroyed as expected, because the installer ended with "waiting for Kubernetes API: context deadline exceeded".
Thanks for the info. The fully-qualified domain issue was gone after using the latest master code. Regarding the CSR issue, I will take a look later.
I tried again on a powerful physical machine instead of a VM, following the guide precisely, and it's much better now. Both the master node and the worker node were created, and the bootstrap node was destroyed as expected. But the installation still failed in the end,
Version
$ bin/openshift-install version
Other info
Some PODs are in the status of "Preempting", and some are always in "ContainerCreating",
Based on the definition of the deployment "etcd-quorum-guard", the replica count is 3, and the 3 pods must run on different master nodes per the pod anti-affinity. So there should be at least 3 master nodes, otherwise 2 pods will always be in "Pending" status? And there should be at least 2 worker nodes, because the two replicas of the deployment "router-default" must run on different worker nodes per the pod anti-affinity definition. Since this is just for testing, it would be better to create only one master node and one worker node.
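The arithmetic behind that observation, as a minimal sketch: it assumes a required (hard) anti-affinity that allows at most one pod of the deployment per node. The function name is hypothetical, and real scheduling also depends on resources, taints and so on.

```python
def pending_replicas(replicas, eligible_nodes):
    """Replicas that cannot be placed when each eligible node may hold at most one of them."""
    if replicas < 0 or eligible_nodes < 0:
        raise ValueError("counts must be non-negative")
    return max(0, replicas - eligible_nodes)


if __name__ == "__main__":
    # etcd-quorum-guard: 3 replicas, 1 master node
    print(pending_replicas(3, 1))
    # router-default: 2 replicas, 1 worker node
    print(pending_replicas(2, 1))
```

With one master and one worker, this leaves 2 quorum-guard pods and 1 router pod unschedulable, which matches the "Pending" pods described above.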
That's a very different issue now so please file a different one, unless there already is one (I think there is). Did you check the CSR on the VM? That's the only needed info right now here.
Although that's no longer needed if you create a new cluster now. But try reproducing the issue again on the VM and if you can still reproduce the original issue, please check the CSR to be sure.
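One way to check the CSRs is to look for requests that have been neither approved nor denied. The sketch below assumes the JSON comes from `oc get csr -o json` and treats a CSR as pending when its status has no "Approved" or "Denied" condition. The sample data and CSR names are made up, and the function name is hypothetical.

```python
import json


def pending_csr_names(csr_list_json):
    """Return the names of CSRs with no Approved/Denied condition in their status."""
    doc = json.loads(csr_list_json)
    pending = []
    for item in doc.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            pending.append(item["metadata"]["name"])
    return pending


# Illustrative sample only; real input would be the output of `oc get csr -o json`.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "csr-aaaaa"},
         "status": {"conditions": [{"type": "Approved"}]}},
        {"metadata": {"name": "csr-bbbbb"}, "status": {}},
    ],
})

if __name__ == "__main__":
    print(pending_csr_names(SAMPLE))
```

Pending requests can then be inspected and, if they are legitimate, approved with `oc adm certificate approve <name>`.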
I assigned more memory and CPU to the master node, and it's working now.
Installing OpenShift with libvirt on a VM is really time-consuming; it normally takes at least half a day to reproduce or verify an issue. I spent a couple of days fighting with the OpenShift Installer with libvirt on a VM. I will treat this as a low-priority task, since I have just moved all my work to a powerful physical machine.
I understand. I'll close this for now then. Please let us know if you can reproduce the original issue and we can reopen it for you. /close
@zeenix: Closing this issue. In response to this:
Version
Platform:
What happened?
What you expected to happen?
I expected to see successful result.
How to reproduce it (as minimally and precisely as possible)?
I just followed the guide as below,
https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md
Other info