Issue with etcd during OCP 4.3 installation on vSphere #3028
Comments
Can you provide the log bundle from
Port 2379 is not bound on master0, master1, or master2 (same issue with OCP 4.2):
Jan 31 10:21:28 bootstrap bootkube.sh[1782]: https://etcd-2.vmware.cpod-aca-ocp.az-rbx.cloud-garage.net:2379 is unhealthy: failed to connect: dial tcp 172.23.3.64:2379: connect: no route to host
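"No route to host" usually points at routing or firewalling, while "connection refused" means the host answered but nothing listens on the port. A plain TCP connect test against each member can tell these apart. The following is a minimal stdlib-only sketch; the function names are hypothetical, and it only tests TCP reachability, not TLS or actual etcd health.

```python
import socket
from typing import Dict, Iterable, Tuple

# etcd client port (used by the bootkube.sh health checks) and peer port (member-to-member).
ETCD_PORTS = (2379, 2380)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, "no route to host", timeouts and DNS failures all land here.
        return False


def check_etcd_hosts(
    hosts: Iterable[str], ports: Tuple[int, ...] = ETCD_PORTS, timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Map every (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Run it from the bootstrap node, since that is where bootkube.sh performs its checks, e.g. check_etcd_hosts(["etcd-0.ocp4.example.com", "etcd-1.ocp4.example.com", "etcd-2.ocp4.example.com"]) with your own etcd-N names substituted for these placeholder ones.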
Ping @abhinavdahiya: do you have any news, please?
It works with BIND, so the problem is probably with dnsmasq. I'll keep you posted.
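If you want to stay on dnsmasq, the etcd records an OCP 4.3 UPI install expects (an A record for each etcd-N name, plus one _etcd-server-ssl._tcp SRV record per member pointing at port 2380 with priority 0 and weight 10, as in the OpenShift UPI DNS examples) can be expressed with dnsmasq's address= and srv-host= options. This is a sketch under assumptions: dnsmasq_etcd_lines is a hypothetical helper, cluster_domain stands for <cluster_name>.<base_domain>, and the IPs you pass are your own masters' addresses.

```python
from typing import List


def dnsmasq_etcd_lines(cluster_domain: str, master_ips: List[str]) -> List[str]:
    """Build dnsmasq.conf lines for the etcd-N A records and the etcd SRV records.

    cluster_domain is assumed to be "<cluster_name>.<base_domain>", e.g. "ocp4.example.com".
    """
    lines = []
    for i, ip in enumerate(master_ips):
        target = f"etcd-{i}.{cluster_domain}"
        # A record for the etcd-N name. (host-record= would also add a PTR record,
        # which could clash with the master's own reverse-DNS hostname.)
        lines.append(f"address=/{target}/{ip}")
        # srv-host=<service>,<target>,<port>,<priority>,<weight>; 2380 is the etcd peer port.
        lines.append(f"srv-host=_etcd-server-ssl._tcp.{cluster_domain},{target},2380,0,10")
    return lines
```

Printing "\n".join(dnsmasq_etcd_lines("ocp4.example.com", [...])) gives lines you can compare against an existing dnsmasq configuration.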
@ac06012014 Did you ever figure out the root cause of this? I'm having the same issue on bare metal.
Yes. It was caused by etcd SRV record resolution. This project may help you: https://github.com/RedHatOfficial/ocp4-helpernode (you can view its BIND DNS entries).
Hi, I am having the same issue. My etcd SRV entries are as below. They look fine, but I keep getting that error. Any idea?
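To sanity-check SRV entries from a shell, dig +short SRV _etcd-server-ssl._tcp.<cluster_domain> prints one "priority weight port target" line per record. The sketch below is a hypothetical stdlib-only helper that parses that output and flags the most common mistakes: a port other than 2380, targets that are not etcd-N.<cluster_domain>, and an unexpected member count (three is assumed by default).

```python
import re
from typing import List


def check_etcd_srv(dig_short_output: str, cluster_domain: str, expected_members: int = 3) -> List[str]:
    """Return a list of problems; an empty list means the SRV records look consistent."""
    problems = []
    targets = set()
    for line in dig_short_output.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            problems.append(f"unparseable line: {line!r}")
            continue
        _priority, _weight, port, target = fields
        target = target.rstrip(".")  # dig prints fully qualified names with a trailing dot
        if port != "2380":
            problems.append(f"{target}: port is {port}, expected 2380 (etcd peer port)")
        if not re.fullmatch(rf"etcd-\d+\.{re.escape(cluster_domain)}", target):
            problems.append(f"{target}: target is not etcd-N.{cluster_domain}")
        targets.add(target)
    if len(targets) != expected_members:
        problems.append(f"found {len(targets)} distinct targets, expected {expected_members}")
    return problems
```

Each etcd-N target must also resolve to the matching master's IP, which this check does not cover.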
Facing the same problem with 4.3. The SRV records are absolutely fine. In fact, I did a successful install just 41 days ago. Has anyone found a solution yet?
@AlekseyUsov What version did you manage to install?
@alfredzoto 4.3.0.
Try looking at this BIND configuration:
@ac06012014 Just checked: all records are in place. They are absolutely identical to those of the successfully installed cluster. The only difference is time. Something must have broken between then and now.
Again, they are completely identical to those of the previously deployed cluster.
All my configurations seem OK. Please see the attached DNS and HAProxy files.
Mar 30 09:02:05 bootstrap.ocp4.example.com bootkube.sh[1434]: https://etcd-2.ocp4.example.com:2379 is unhealthy: failed to commit proposal: context deadline exceeded
My understanding is that port 2379 is not open, or something similar.
@alfredzoto Yes, it's configured exactly the same way in my environment. The port is not the issue: the subnet is the same as for the already installed cluster, and it's not that the etcd members can't connect to each other; the etcd processes just won't start. And the logs are completely useless.
@AlekseyUsov So the same configuration used to work, but now it doesn't.
@alfredzoto Exactly. I was very careful not to change anything from the recent installation, to make sure I didn't introduce any unknowns. So it seems they were introduced somewhere else.
Here are my DNS entries:
Do you think the SRV records need to use port 2379 instead of 2380, since the error I'm getting is:
Mar 30 09:02:05 bootstrap.ocp4.example.com bootkube.sh[1434]: https://etcd-2.ocp4.example.com:2379 is unhealthy: failed to commit proposal: context deadline exceeded
@alfredzoto I don't think so: 2380/tcp is used for peer-to-peer communication, while client requests use 2379/tcp. My understanding is that the etcd members first form a quorum over 2380/tcp, and then the bootstrap process contacts each of them individually over 2379/tcp.
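That split is also reflected in the DNS records: in the OpenShift 4.x UPI DNS examples, the _etcd-server-ssl._tcp SRV records advertise the peer port 2380, while the bootkube.sh health checks connect to 2379 on each etcd-N name. The sketch below generates the corresponding BIND zone-file lines; bind_etcd_records is a hypothetical helper, and the 86400 TTL and "0 10" priority/weight are taken from those documentation examples.

```python
from typing import List

ETCD_PEER_PORT = 2380  # SRV records advertise the peer port, not the 2379 client port


def bind_etcd_records(cluster_domain: str, master_ips: List[str], ttl: int = 86400) -> List[str]:
    """Return BIND zone-file lines: one A record per etcd-N name, then one SRV record per member."""
    a_records = []
    srv_records = []
    for i, ip in enumerate(master_ips):
        target = f"etcd-{i}.{cluster_domain}."  # trailing dot = fully qualified name
        a_records.append(f"{target} IN A {ip}")
        srv_records.append(
            f"_etcd-server-ssl._tcp.{cluster_domain}. {ttl} IN SRV 0 10 {ETCD_PEER_PORT} {target}"
        )
    return a_records + srv_records
```

Changing the SRV port to 2379 would not fix a failing 2379 health check; the SRV records only tell members where to find each other.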
I'm having the same issue as well. I didn't have it with 4.2, and I used an install script for both the 4.2 and 4.3 installs. Here is part of the bootstrap output from the journalctl command:
Note: this also occurs with VMware vSphere 6.7 U2.
SOLVED: https://github.com/vchintal/ocp4-vsphere-upi-automation/issues/12#issuecomment-612164871
Read the linked comment to see how I solved this with the help of the great @jimbarlow (https://github.com/jimbarlow).
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi there,
Version
OCP 4.3
Platform: vSphere 6.7 U3
What happened?
I'm getting this issue on the bootstrap/master nodes even though I've met all the prerequisites.
journalctl -b -f -u bootkube.service
desc = latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.23.3.55:2379: connect: connection refused""}
Jan 30 18:38:46 master0 bootkube.sh[2802]: https://etcd-0.vmware.cpod-aca-ocp.az-rbx.cloud-garage.net:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Jan 30 18:38:46 master0 bootkube.sh[2802]: https://etcd-1.vmware.cpod-aca-ocp.az-rbx.cloud-garage.net:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Jan 30 18:38:46 master0 bootkube.sh[2802]: https://etcd-2.vmware.cpod-aca-ocp.az-rbx.cloud-garage.net:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Jan 30 18:38:46 master0 bootkube.sh[2802]: Error: unhealthy cluster
Do you have any ideas, please?
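When comparing logs like these across reports, it can help to group the unhealthy endpoints by the error bootkube.sh printed. "no route to host" or "connection refused" (nothing reachable on the port) and "context deadline exceeded" (the health-check request did not complete in time) suggest different causes. A minimal sketch, assuming the log was saved with something like journalctl -b -u bootkube.service > bootkube.log; summarize_unhealthy is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "https://etcd-0.example:2379 is unhealthy: failed to commit proposal: ..."
HEALTH_RE = re.compile(r"(https://\S+) is unhealthy: (.+)$")


def summarize_unhealthy(log_text: str) -> Dict[str, List[str]]:
    """Group unhealthy etcd endpoints by the error message reported for them."""
    by_error = defaultdict(list)
    for line in log_text.splitlines():
        match = HEALTH_RE.search(line)
        if match:
            endpoint, error = match.groups()
            by_error[error.strip()].append(endpoint)
    return dict(by_error)
```

The summary line "Error: unhealthy cluster" is deliberately not matched, so only per-endpoint results are counted.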