
Runtimecfg broken for ovirt #40

Closed
Gal-Zaidman opened this issue Jan 16, 2020 · 6 comments

@Gal-Zaidman
Contributor

Hello,
Yesterday the oVirt OCP CI runs started failing during the night due to change #38.
When debugging the issue we noticed that the masters that started couldn't publish their service because of:
2020-01-16T09:52:44.729133066+00:00 stderr F time="2020-01-16T09:52:44Z" level=fatal msg="Failed to find interface for specified ip" ip=192.168.201.0
We looked at the config file created by runtimecfg, /etc/mdns/config.hcl, and saw that bind_address was set to "192.168.201.0" when it should have been "192.168.201.31".
We built the baremetal-runtime pod without the change and everything went smoothly.

In our env, each master has one interface carrying both an IPv6 and an IPv4 address.
We initially thought that was the root cause, so we tried removing the IPv6 address (since we are not using it), deleting the conf, and restarting the pod, but the problem remained.

If you need an env to try and debug this, we recommend triggering an oVirt job.

@celebdor
Contributor

@Gal-Zaidman restarting the pod does not regenerate the mDNS configuration. Could you delete the configuration and use crictl to delete the pod? If you do, kubelet should create the pod again, and that will trigger the init container again.

@stbenjam
Member

This is happening even on baremetal IPI; all installs started failing for us yesterday.

In mdns-publisher, I see this:

[core@master-0 ~]$ sudo crictl logs 3ec9fcdd466c2
time="2020-01-16T14:54:21Z" level=info msg="Publishing with settings" collision_avoidance=hostname ip=192.168.111.0
time="2020-01-16T14:54:21Z" level=fatal msg="Failed to find interface for specified ip" ip=192.168.111.0

Happy to share access to the env if you'd like

@Gal-Zaidman
Contributor Author

> @Gal-Zaidman restarting the pod does not re-generate the mdns configuration, could you delete the configuration and use crictl to delete the pod? If you do, kubelet should create the pod again and that will trigger the initcontainer again.

That is what we did: we stopped kubelet, removed the mdns pod, removed the file, and then started kubelet again. We made sure the init container ran again.

@cybertron
Member

This is because net.ParseCIDR zeroes out the host bits of the address (the part past the netmask). See https://play.golang.org/p/ZZcvN-JpS6A

I think we need to reconstruct the IPNet by keeping the first two return values from ParseCIDR and then packing them back together.

ip, n, _ := net.ParseCIDR("1.1.1.1/16")
n = &net.IPNet{IP: ip, Mask: n.Mask}

@rgolangh

was this fixed by #42 ?

@cybertron
Member

Technically it was fixed by #41, but yeah it should be working now.
