
Searchdomain crashes DHCP server #868

Closed
lukas-bednar opened this Issue Apr 4, 2018 · 7 comments

@lukas-bednar
Member

lukas-bednar commented Apr 4, 2018

I am running the latest KubeVirt on the just-released OpenShift 3.9, and when creating a VM:

apiVersion: kubevirt.io/v1alpha1
kind: VirtualMachine
metadata:
  annotations:
    presets.virtualmachines.kubevirt.io/presets-applied: kubevirt.io/v1alpha1
  clusterName: ""
  creationTimestamp: 2018-04-04T16:09:00Z
  generation: 0
  labels:
    kubevirt.io/nodeName: kubevirt-executor-lbednar-master1
  name: testvmgn56p
  namespace: kubevirt-test-default
  resourceVersion: "65056"
  selfLink: /apis/kubevirt.io/v1alpha1/namespaces/kubevirt-test-default/virtualmachines/testvmgn56p
  uid: 7cb2a2f0-3822-11e8-b93a-fa163e796a71
spec:
  domain:
    devices: {}
    features:
      acpi:
        enabled: true
    firmware:
      uuid: 52e2e942-8fcf-4a3e-bd7e-01d8e98db910
    machine:
      type: q35
    resources:
      requests:
        memory: 8Mi
  nodeSelector:
    kubernetes.io/hostname: kubevirt-executor-lbednar-master1
  terminationGracePeriodSeconds: 0
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-04-04T16:09:17Z
    message: unexpected EOF
    reason: Synchronizing with the Domain failed.
    status: "False"
    type: Synchronized
  interfaces:
  - ipAddress: 10.128.0.16
  nodeName: kubevirt-executor-lbednar-master1
  phase: Scheduled

The virt-launcher is failing with the following error:

[root@kubevirt-executor-lbednar-master1 ~]# oc logs  -n kubevirt-test-default virt-launcher-testvmgn56p-fxf5k
level=info timestamp=2018-04-04T16:09:16.840080Z pos=virt-launcher.go:120 component=virt-launcher msg="Watchdog file created at /var/run/kubevirt/watchdog-files/kubevirt-test-default_testvmgn56p"
level=info timestamp=2018-04-04T16:09:16.840400Z pos=client.go:164 component=virt-launcher msg="Registered libvirt event notify callback"
level=info timestamp=2018-04-04T16:09:16.840495Z pos=virt-launcher.go:58 component=virt-launcher msg="Marked as ready"
level=info timestamp=2018-04-04T16:09:16.840716Z pos=monitor.go:241 component=virt-launcher msg="Monitoring loop: rate 1s start timeout 5m0s"
level=error timestamp=2018-04-04T16:09:17.385206Z pos=network.go:151 component=virt-launcher msg="Updated Mac for iface: eth0 - 3a:d1:fc:5e:f5:6e"
level=info timestamp=2018-04-04T16:09:17.397269Z pos=network.go:198 component=virt-launcher msg="Found nameservers in /etc/resolv.conf: \ufffd\u0010\u0000\u0014"
level=info timestamp=2018-04-04T16:09:17.397329Z pos=network.go:199 component=virt-launcher msg="Found search domains in /etc/resolv.conf: kubevirt-test-default.svc.cluster.local svc.cluster.local cluster.local openstacklocal"
level=info timestamp=2018-04-04T16:09:17.397342Z pos=dhcp.go:58 component=virt-launcher msg="Starting SingleClientDHCPServer"
level=error timestamp=2018-04-04T16:09:17.397404Z pos=network.go:176 component=virt-launcher msg="failed to run DHCP: Search domain is not valid: 'openstacklocal'"
panic: Search domain is not valid: 'openstacklocal'

goroutine 31 [running]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.(*NetworkUtilsHandler).StartDHCP(0x1c8fc08, 0xc420185b80, 0xc4200714a0)
	/root/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:177 +0x3c0
created by kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.SetupDefaultPodNetwork
	/root/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:303 +0x4b0
virt-launcher exited with code 2

This openstacklocal domain comes from /etc/resolv.conf:

# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local openstacklocal
nameserver 172.16.0.20

This OpenShift node is running as a VM inside an OpenStack cluster, and these values were generated by NetworkManager.
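For reference, here is a minimal sketch of how a resolv.conf `search` line and `nameserver` entries can be pulled apart. This is not KubeVirt's actual parser; `parseResolvConf` is a hypothetical helper, shown only to illustrate where the `openstacklocal` entry enters the picture:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseResolvConf extracts nameservers and search domains from
// resolv.conf-style content. Comment lines and blank lines are skipped;
// the last "search" line wins, matching resolver behavior.
func parseResolvConf(data string) (nameservers, searchDomains []string) {
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		switch fields[0] {
		case "nameserver":
			if len(fields) > 1 {
				nameservers = append(nameservers, fields[1])
			}
		case "search":
			searchDomains = fields[1:]
		}
	}
	return
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	ns, sd := parseResolvConf(string(data))
	fmt.Println("nameservers:", ns)
	fmt.Println("search domains:", sd)
}
```

On the node above, `sd` would include `openstacklocal`, which is the value the DHCP server later rejects.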

I will be happy for any suggestions!
Thanks,
Lukas.

@vladikr
Member

vladikr commented Apr 4, 2018

@mlsorensen could you please take a look? Thanks!

@vladikr
Member

vladikr commented Apr 4, 2018

I think we should trust the pod's configuration more. If it worked for the pod, it should work for the VM as well. In any case, we should skip an "incorrect" domain, but not crash.

Here is a nice approach to follow, I think: https://golang.org/src/net/dnsconfig_unix.go L86.
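A skip-instead-of-crash handler could look roughly like the sketch below. This is not the actual KubeVirt code; `isValidSearchDomain` and `filterSearchDomains` are hypothetical names, and the validator is deliberately loosened to accept single-label entries like `openstacklocal`:

```go
package main

import (
	"fmt"
	"strings"
)

// isValidSearchDomain is a loosened validator: unlike a full FQDN check,
// it accepts single-label "partial" domains such as "openstacklocal" or
// "com", since resolv.conf search entries are only suffixes appended to
// unqualified names. Labels must be alphanumeric/hyphen per RFC 1035,
// with hyphens not at label edges.
func isValidSearchDomain(domain string) bool {
	if domain == "" || len(domain) > 253 {
		return false
	}
	for _, label := range strings.Split(strings.TrimSuffix(domain, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return false
		}
		for i, c := range label {
			switch {
			case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
			case c == '-' && i != 0 && i != len(label)-1:
			default:
				return false
			}
		}
	}
	return true
}

// filterSearchDomains keeps valid entries and logs-and-skips the rest,
// instead of panicking the whole launcher.
func filterSearchDomains(domains []string) []string {
	var kept []string
	for _, d := range domains {
		if isValidSearchDomain(d) {
			kept = append(kept, d)
		} else {
			fmt.Printf("skipping invalid search domain: %q\n", d)
		}
	}
	return kept
}

func main() {
	fmt.Println(filterSearchDomains([]string{
		"cluster.local", "openstacklocal", "bad_domain!",
	}))
	// → [cluster.local openstacklocal]
}
```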

@fabiand fabiand added this to the v1.0 milestone Apr 5, 2018

@fabiand fabiand changed the title from "When trying to schedule VM, failing to run DHCP" to "Searchdomain crashes DHCP server" Apr 5, 2018

@fabiand fabiand added the kind/bug label Apr 5, 2018

@mlsorensen
Contributor

mlsorensen commented Apr 5, 2018

"openstacklocal" is indeed an invalid domain per RFC. How do we want to handle this?

@lukas-bednar
Member

lukas-bednar commented Apr 5, 2018

I would go the way @vladikr mentioned above:

In any case, we should skip an "incorrect" domain, but not crash.

@mlsorensen
Contributor

mlsorensen commented Apr 5, 2018

That would leave us with an incorrect config. One would assume that if it is in the resolv.conf in the pod, it needs to be in the VM as well, to match functionality.

I think perhaps the domain validation should be loosened to allow partial domains like this. 'com' on its own is not a valid domain, but it IS a valid search domain, as searching assumes it will be prepended with something.
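The distinction being drawn here can be illustrated with two toy checks. Both function names are hypothetical, chosen only to contrast a strict FQDN validator with a search-domain validator that tolerates single labels:

```go
package main

import (
	"fmt"
	"strings"
)

// strictFQDN stands in for a validator that requires at least two labels;
// a check like this is what rejects single-label entries such as
// "openstacklocal" or "com".
func strictFQDN(d string) bool {
	return strings.Count(strings.TrimSuffix(d, "."), ".") >= 1
}

// looseSearchDomain accepts any non-empty sequence of non-empty labels,
// including a single label, since a search entry is only a suffix that
// gets appended to unqualified names before lookup.
func looseSearchDomain(d string) bool {
	d = strings.TrimSuffix(d, ".")
	if d == "" {
		return false
	}
	for _, label := range strings.Split(d, ".") {
		if label == "" {
			return false
		}
	}
	return true
}

func main() {
	for _, d := range []string{"com", "openstacklocal", "cluster.local"} {
		fmt.Printf("%-16s strict=%v loose=%v\n", d, strictFQDN(d), looseSearchDomain(d))
	}
}
```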

@mlsorensen
Contributor

mlsorensen commented Apr 5, 2018

Please review the commit and see if it's sufficient. I can add a passthrough to skip domains that are not valid if that's the consensus. I do think that it should react the same way to bad data in resolv.conf as it would if it hit a nil IP. In this case it didn't invisibly provide the user with a vaguely misconfigured VM to troubleshoot, and printed the problem clearly in the launcher log. That's what I would want.

@lukas-bednar
Member

lukas-bednar commented Apr 6, 2018

@mlsorensen Thanks, your fix solved my problem!
