
Stop using a different network namespace #242

Merged (4 commits) on Jul 18, 2017

Conversation

berrange (Contributor) commented Jun 6, 2017

As a first step towards refactoring the VM startup process so that a QEMU wrapper script is no longer used, this stops using the network namespace of the POD. The SPICE listen address is now associated with the libvirtd POD IP addresses. The SPICE console API is updated to reflect this too, so that it still works.

Eventually it might be possible to have some kind of connection forwarder running in the VM POD that connects to the QEMU server in the libvirtd namespace, thus allowing users to connect to SPICE using a virtual service IP assigned to the POD. This is an exercise for future work though.

kubevirt-bot (Contributor)

Can one of the admins verify this patch?

fabiand (Member) commented Jun 6, 2017

Eventually it might be possible to have some kind of connection forwarder running in the VM POD that connects to the QEMU server in the libvirtd namespace, thus allowing users to connect to SPICE using a virtual service IP assigned to the POD. This is an exercise for future work though.

Uh - That's a nice idea - IIRC it was floating around at some point - to keep the k8s feeling.

/cc @rmohr

// to a local UDP socket lets us figure out the IP addr.
// If libvirtd were to move into a private network
// namespace, then we can instead switch to query
// the Status.PodIP field from the libvirtd POD.
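For context, the comment above refers to a common trick: "connecting" a UDP socket (no packets are actually sent) and reading back the socket's local address reveals the IP the kernel would pick for outbound traffic. A minimal sketch of the trick in Go, with a placeholder target address, not the actual KubeVirt code:

package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// Dialing UDP does not send anything; it only asks the kernel which
	// source address it would use to reach the target.
	conn, err := net.Dial("udp", "8.8.8.8:53") // any routable address works
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	fmt.Println(conn.LocalAddr().(*net.UDPAddr).IP)
}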
Member

I'd like to perform the switch to pod networking sooner rather than later.

This will directly break network connectivity, as we'll only have "ip endpoints" inside the pod, and not complete interface/L2 connectivity.

We might be able to work around this by using macvlan/bridge or the like.

Member

@fabiand Are you working with the assumption that it is ok if the libvirt container dies and all VMs go away?

Member

In short, yes - for now.

I've also summarized this in #196 (comment).
Right now - in this early development phase - it is okay for me to live with the fact that VMs are getting killed.
OTOH I'd work with k8s to find a way this could be solved on the k8s layer - as this problem might not be kubevirt specific.

fabiand requested a review from rmohr on June 6, 2017 18:51
rmohr (Member) left a comment

Note that there are a few drawbacks, which I don't really see discussed anywhere:

  • If the libvirt pod uses host network namespace, you would have to care about the firewall on the host for spice connections.
  • If the libvirt pod uses the pod network namespace, you run into the situation that, if you use a networking solution which actually enforces the containerPort mapping (I think e.g. OpenShift does), you can't specify the spice ports in advance and therefore can't reach the spice endpoint via plain tcp.
  • Since switching to the pod network for libvirt seems to be discussed, how do you plan to handle libvirt restarts/crashes? What should happen then to the VMs? What should happen with the pod network namespace?

Further, I really need reasonable arguments why the namespaces are considered to be such a bad thing by you. That "using emulators in production is bad and libvirt will never support namespaces", like you said before, is not really a good argument for why a solution involving namespaces in general is bad. I am definitely open to working without the network namespaces, if there is an equally well working, Kubernetes-integrated solution proposed. As it is done here, I see it as a huge step back, while at the same time not showing how this should be better in the long run.

pkill -P \$! --signal SIG\$1
}

trap "_term TERM" TERM
Member

Why are you removing the traps? When libvirt is running inside a container, we have to forward them because of the PID namespace.

Contributor Author

The traps should be redundant - k8s/docker will kill every process in the namespace, which includes both the qemu-kube wrapper & the real QEMU. Now that we no longer need to clean up the resolv.conf file, we don't need to catch the signals.

Member

We have a functional test which tests if it works. Let me see.


spice := v1.NewSpice(vm.GetObjectMeta().GetName())
spice.Info = v1.SpiceInfo{
Type: "spice",
Host: ip,
Port: port,
Host: d.Host,
Member

To make seamless spice work during migrations, you also have to change things at https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-controller/services/template.go#L133. Otherwise during the migration, the spice connection would be forwarded to the pod ip, which is no longer the case in this PR.

berrange (Contributor Author) commented Jun 7, 2017

As explained in the first commit message, what is done here with moving QEMU into different namespaces is guaranteed to break with future libvirt + QEMU, when libvirt switches over to passing a pre-opened SPICE/VNC listener connection down to QEMU instead of letting QEMU open it itself. This alone makes the current approach untenable - it relies on an assumption about the current implementation of QEMU which is not going to be valid across future releases. IOW we have no option but to run QEMU & libvirtd in the same namespaces if we want to avoid our architecture being broken by future libvirt/qemu.

berrange (Contributor Author) commented Jun 7, 2017

Pushed an update to kill the --pod-ip stuff during migration, since it was never even needed given that the original listen address is always 0.0.0.0.

rmohr (Member) commented Jun 7, 2017

@berrange it was needed, since the original qemu instance needed to know to which IP to redirect the traffic.

rmohr (Member) commented Jun 7, 2017

@berrange I understand that this can break because of libvirt changes (and obviously there are already some changes in progress which will do that), but that is not what I have been asking for. Like with your ideas on how to add libvirt features to ensure that a VM is bound to a running Pod, libvirt could be changed to support namespaces in some way, or libvirt could allow to still let qemu do the work. The question is why, from your point of view, it is definitely the wrong track to integrate with kubernetes namespaces, and how you plan to integrate with kubernetes instead.

berrange (Contributor Author) commented Jun 7, 2017

Breaking the seamless migration support is unavoidable right now since the controller doesn't know the IP address it'll be listening on on the target host. It would be possible to fix this (and some other problems related to migration configuration), but that'll require refactoring other areas of code, so that has to wait. It is already going to be painful to get the changes to switch over to the new VM startup process between the handler & launcher, without having to refactor the migration code at the same time. So breaking seamless migration is a worthwhile tradeoff for getting the architectural stuff working better.

If we want to expose SPICE via the POD IP address then it would be necessary to write some kind of forwarding service that runs a listener in the POD and, when clients connect, proxies traffic onto QEMU's listener. This could be a dumb proxy that actually copies data, but it might be possible to write a more intelligent proxy that just passes the pre-accepted client socket FD into QEMU via libvirt. That is something worth exploring in future. I'm not sure it is worth building such a solution though. Instead it would be better to focus on providing an HTTP connect proxy and/or websockets proxy, at which point you just have a single public-facing IP for all VMs, and the particular VM is determined by the authentication cookie provided to the proxy. Both these approaches have implications for seamless migration because the IP address the user connects to is completely different from the IP address QEMU listens on. So for seamless migration to work, we actually need to pass an IP address to the migrate command manually instead of triggering it off the listen address. So that's another reason why I don't think it is worth trying to make seamless migration work right now.
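For illustration, a minimal sketch of the "dumb proxy" variant described above - a listener in the VM POD that simply copies bytes to QEMU's SPICE listener in the libvirtd namespace. The addresses and ports are hypothetical placeholders, not KubeVirt code:

package main

import (
	"io"
	"log"
	"net"
)

func main() {
	// Hypothetical addresses: the port exposed on the VM pod's IP, and
	// QEMU's SPICE listener reachable in the libvirtd namespace.
	podListen := ":5900"
	spiceTarget := "10.32.0.15:5901"

	ln, err := net.Listen("tcp", podListen)
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			backend, err := net.Dial("tcp", spiceTarget)
			if err != nil {
				log.Print(err)
				return
			}
			defer backend.Close()
			// Copy bytes in both directions until either side closes.
			go io.Copy(backend, c) // client -> QEMU
			io.Copy(c, backend)    // QEMU -> client
		}(client)
	}
}

The more intelligent variant would instead hand the accepted connection FD to QEMU via libvirt rather than copying bytes in user space.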

fabiand (Member) commented Jun 7, 2017

To follow up, as there are good questions:

If the libvirt pod uses host network namespace, you would have to care about the firewall on the host for spice connections.

True

If the libvirt pod uses the pod network namespace, you run into the situation that, if you use a networking solution which actually enforces the containerPort mapping (I think e.g. OpenShift does), you can't specify the spice ports in advance and therefore can't reach the spice endpoint via plain tcp.

According to https://kubernetes.io/docs/concepts/cluster-administration/networking/ it is a reliable assumption that "all containers can communicate with all other containers without NAT".
This means that as long as libvirt is in a pod it can open whatever ports, and every other pod can reach this (libvirt) port.
The remaining gap is to see how we allow (external to the cluster) clients to access this port.
And here a border proxy solution is possible.

Since switching to the pod network for libvirt seems to be discussed, how do you plan to handle libvirt restarts/crashes? What should happen then to the VMs? What should happen with the pod network namespace?

We can use systemd if we want to cover the case where libvirt is crashing.
However, this does not cover e.g. the container crashing or the container needing an update.
Actually, for the latter - the controlled container upgrade - I'd expect that we make the assumption that the node is getting evacuated before this.
This leaves us with the case where the kubelet/container crashes - and that's something we for sure want to deal with.

There are many open questions, but we can slowly walk through them and still have the freedom to give different solutions a try.

admiyo (Contributor) commented Jun 7, 2017

Before reviewing this, I want to know if removing the Namespace is the long term approach. My initial reaction is that it is a mistake. The container should have a very specific network namespace setup that the Virtual machine should inherit.

I think that is what you are targeting, and this is just removing the explicit namespace entry from the front end, but the net effect of breaking a working feature is, in my opinion, unacceptable. A commit like this needs to be atomic enough that we keep moving first.

As the corollary to Murphy's law says: if it is stupid but it works, it's not stupid.

Is there any good justification for making this an interim step, as opposed to part of a larger reengineering that gets us all the way to the VM launch script removal?

rmohr (Member) commented Jun 8, 2017

@fabiand

According to https://kubernetes.io/docs/concepts/cluster-administration/networking/ it is a reliable assumption that "all containers can communicate with all other containers without NAT".
This means that as long as libvirt is in a pod it can open whatever ports, and every other pod can reach this (libvirt) port.
The remaining gap is to see how we allow (external to the cluster) clients to access this port.
And here a border proxy solution is possible.

That is not about NAT, it is about iptable rules on the interface in the container.

rmohr (Member) commented Jun 8, 2017

@berrange

It is already going to be painful to get changes to switch over to the new VM startup process between the handler & launcher, without having to refactor migration code at the same time

As far as I understand, your idea of changing the VM start flow is not related at all to the namespace usage. So you could do your socket detection, startup, ... first without even thinking about the namespaces. Then you could remove the pid namespace, and then you can think about the network namespace and provide an equally capable replacement for that.

rmohr (Member) commented Jun 8, 2017

@berrange I understand that this can break because of libvirt changes (and obviously there are already some changes in progress which will do that), but that is not what I have been asking for. Like with your ideas on how to add libvirt features to ensure that a VM is bound to a running Pod, libvirt could be changed to support namespaces in some way, or libvirt could allow to still let qemu do the work. The question is why, from your point of view, it is definitely the wrong track to integrate with kubernetes namespaces, and how you plan to integrate with kubernetes instead.

@berrange Still waiting for the justifications I asked for here. Neither the startup solution you are trying to accomplish nor the current one is perfect. For both you have to either change libvirt and/or kubernetes, so it is not as if we are now moving to the definitively superior solution, and I want to be sure that this refactoring is not just refactoring for the sake of refactoring. Further, we did not even talk about other solutions at all.

rmohr (Member) commented Jun 8, 2017

@berrange @fabiand To say this clearly, there is no agreement among us maintainers at all on whether the new startup process (removing namespaces and giving the user control of creating a pod) is a good thing.

Further @berrange, I am really missing you trying to convince people that your changes are good.
In your answers to my questions, you seem to assume that everyone has already agreed to your idea. I still don't see the "why" answered. Because, stretching that argument again: neither the current namespace solution nor the solution you envision brings us to a production-ready solution. Both require changes in libvirt and/or kubernetes.

fabiand (Member) commented Jun 8, 2017

That is not about NAT, it is about iptable rules on the interface in the container.

Right - Now I see what you meant about enforcing containerPort.
But according to the docs containerPort is primarily informative (from https://kubernetes.io/docs/api-reference/v1.6/#container-v1-core):

List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.

And these are other words for what I meant above - pods (with pod networking) can reach all ports of all other pods (with pod networking). And I did not find anything in the OpenShift docs saying that Origin behaves differently.

(Btw - We could question how valuable containerPort is, if the current guarantee is that anything bound to 0.0.0.0 will be accessible from the network …)
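For illustration only, this is roughly how a SPICE containerPort declaration could look when building the libvirtd container spec with the Kubernetes Go API types (image name, port number, and current k8s.io/api import path are assumptions); as the quoted docs say, the declaration is primarily informational:

package example

import corev1 "k8s.io/api/core/v1"

// libvirtContainer sketches a container spec that declares the SPICE port.
// Any port bound on 0.0.0.0 inside the pod is reachable from the pod network
// whether or not it is listed here; the declaration mainly documents intent.
func libvirtContainer() corev1.Container {
	return corev1.Container{
		Name:  "libvirtd",
		Image: "kubevirt/libvirt:latest", // hypothetical image
		Ports: []corev1.ContainerPort{{
			Name:          "spice",
			ContainerPort: 5900, // example SPICE port
			Protocol:      corev1.ProtocolTCP,
		}},
	}
}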

fabiand (Member) commented Jun 8, 2017

@berrange @fabiand To say this clearly, there is no agreement among us maintainers at all on whether the new startup process (removing namespaces and giving the user control of creating a pod) is a good thing.

Let me look at it in two parts:
Removing namespaces - So far we have played with the wrapper to have a great integration with Kubernetes - as you know this allows us to connect to the graphical console of a VM using spice, through the pod's IP. This itself is nice in the context of Kubernetes. But the problem with this approach is that we violate the assumptions of libvirt. libvirt is (today) built around the assumption that the daemon and all of its children have the same vision of the world (i.e. no namespaces) - all the details of libvirt are modeled around this strong assumption. Again: with our wrapper we undermine this assumption, and there are no guarantees that our current approach will keep working in the future.
By removing the namespaces (dropping the wrapper) we just respect this assumption and don't undermine it.
From another perspective this can be seen as focusing even more on libvirt.
After all I don't see so much of a problem with dropping the namespaces. WRT the pid namespace - there are issues about the killing of VMs anyway, so it's not getting worse. WRT the network namespace - we will lose the ability to connect to the console via the pod IP for the moment, so this is the only visible change - and there might even be an approach where we can restore this behavior without breaking libvirt's assumptions.
And I'd assume that if we see really strong reasons for having the VMs in namespaces, then I'd say that we need to see how this can be fixed on the libvirt side.
I.e. I personally like the idea of defining networks on the VM pod, with the VM inheriting those networks (one way or the other), and in this idea I'd expect that the VM uses the pod network - which would require libvirt to be able to use the pod's network ns. But that's currently not possible with libvirt, so I'm living with this fact.

User VM pod creation - from my POV there are two main points on this topic:

  • first, (IIUIC) the ability to attach a VM to any pod in Kubernetes (regardless of how it was created) looks like it could be helpful in future
  • second, creating building blocks. The fact that a pod and a VM spec are enough to launch a VM (maybe limited to a simple pod spec) creates one building block. A controller would then technically just be a really valuable addition to support more complex use cases (e.g. map mem/cpu requirements to the pod, in future device requirements, …). Despite that we technically create a little more independence, for me the controller would still be a part of a standard deployment, as it obviously does deliver additional functionality. And let me note that I also don't see too much of a conflict with the current implementation, it's just tearing the controller and the pod a little further apart.

rmohr (Member) commented Jun 16, 2017

@fabiand

Namespaces: especially the PID namespace expresses a need to bind the VM to the Pod lifecycle, and it did not work so great when libvirt was moved into the container, but works now (however, so lightly ignoring how far away libvirt is from running suitably for us in a container is another topic worth its own thread). While this "undermines", as you say, libvirt's expectations, it gives right now better guarantees than not using them (with an unchanged libvirt). That gives a better overall experience when using KubeVirt (even if it is only for the sake of making developing and testing on top of Kubernetes more stable, and allowing us to already define needs in our tests now). Further, this PR, without also using the Pod namespace for libvirt, breaks a few things, including referencing iscsi storage, which is exposed through services. I think it is fair and necessary to ask for a suitable replacement before removing the workaround (and I am absolutely not claiming that it is more than that). If that replacement is then already something production ready, then I find this awesome.

User VM pod creation:

first, (IIUIC) the ability to attach a VM to any pod in Kubernetes (regardless of how it was created) looks like it could be helpful in future

For what?

second, creating building blocks. The fact that a pod and a VM spec are enough to launch a VM (maybe limited to a simple pod spec) creates one building block. A controller would then technically just be a really valuable addition to support more complex use cases (e.g. map mem/cpu requirements to the pod, in future device requirements, …).

It definitely makes the easy use-case extremely hard to understand. One has to shape the Pod in the right way, so that it will work with its VM spec. Further, while one could see our qemu process as just another application inside a Pod, and the VM spec as a config map, on first sight, it is not really the case. It just leads to strange situations, where we are sometimes treating our VM as an application inside the Pod, and sometimes as a Kubernetes entity with its own state. That makes it tremendously hard to implement features like VMPools, ... in a kubernetes-like way. Whereas, if the VM is the atomic entity for KubeVirt, like a Pod is for Kubernetes, expressing the needs with all our controllers is much simpler and much more Kubernetes-like.

I think, if there needs to be some flexibility in designing the VM Pod by the user (which is not even unlikely), it should be done through the VM (e.g. template field like in Deployments, or having extra fields which are passed through, ...).

fabiand (Member) commented Jun 16, 2017

WRT attaching a VM to a pod. I think we are making the issue bigger than it is.
To me it is not a question if we want a controller or not - we do want it because VM handling will be much more convenient on the cluster level. Users would be acting on the VM.
But allowing a VM to be spawned without a controller does not contradict this point at all. After all, we are not suggesting creating the pods manually - we just keep the freedom to do so if necessary.

We could also say: I don't see a strong reason to build the controller into the critical chain to launch a VM.

WRT namespaces

Further this PR, without also using the Pod namespace for libvirt, breaks a few things, including referencing iscsi storage, which is exposed through services.

Sure - something to look at. I wonder if this is due to the fact that currently the libvirt pod is using host networking. And IIUIC it should be fixed by switching to using pod networking (as discussed in #160)

fabiand (Member) commented Jun 16, 2017

Last addition - WRT running VMs - Like with the networking - let's sketch the broader design (including stopped)

rmohr (Member) commented Jun 26, 2017

@berrange if you use the node ip (which we also use to connect to libvirt on the target node) as spice target IP, seamless spice should work fine. Regarding iscsi dns name resolution, could you first make sure that libvirt uses its own network namespace instead of the host one? I think then we can go ahead with this patch.

kubevirt deleted a comment from kubevirt-bot on Jun 28, 2017
fabiand (Member) commented Jun 29, 2017

#279 is related to this one, it enables pod networking for libvirt.

fabiand (Member) left a comment

This change would work for me, it supports #261 / #279, and probably #244.

@rmohr thoughts?

berrange (Contributor Author)

@fabiand I'm not intending to make any change to libvirt POD networking in this PR. It is only about dealing with QEMU namespaces. I have a way to solve the kube DNS problem with QEMU without changing libvirt POD namespaces. So any change to libvirt POD networking can be totally independent.

fabiand (Member) commented Jun 30, 2017

wfm

berrange (Contributor Author)

Updated with workaround for DNS problems & updates for migration

rmohr (Member) left a comment

Sorry, took me some time to run the tests because of #286. So I see the following tests failing:

[Fail] Vmlifecycle New VM with a serial console given [It] should be allowed to connect to the console 
/home/rmohr/go/src/kubevirt.io/kubevirt/tests/utils.go:150

[Fail] Vmlifecycle New VM with a serial console given [It] should be returned that we are running cirros 
/home/rmohr/go/src/kubevirt.io/kubevirt/tests/utils.go:150

[Fail] VmMigration New Migration given [It] Should migrate the VM three times in a row 
/home/rmohr/go/src/kubevirt.io/kubevirt/tests/vm_migration_test.go:127

[Fail] Storage Given a VM and a directly connected Alpine LUN [It] should be successfully started by libvirt 
/home/rmohr/go/src/kubevirt.io/kubevirt/tests/utils.go:150

[Fail] Storage Given a VM and an Alpine PVC [It] should be successfully started by libvirt 
/home/rmohr/go/src/kubevirt.io/kubevirt/tests/utils.go:150

Checking the events

cluster/kubectl.sh get events --watch
2017-07-04 17:01:25 +0200 CEST   2017-07-04 17:01:25 +0200 CEST   1         testvmbss7r   VM                  Warning   SyncFailed   virt-handler, node0   Unable to resolve host 'iscsi-demo-target.default.svc.cluster.local': lookup iscsi-demo-target.default.svc.cluster.local: no such host

I see that resolving the iscsi addresses does not work and that the VMs are stuck in the Scheduling phase. At least some, but most likely all of them, are failing because virt-handler uses the host network namespace. Something I did not think about when looking at the DNS workaround to resolve the storage IPs.

We could stay in the Pod network namespace with virt-handler if we add some logic which resolves the console connection call from virt-api to virt-handler to a Pod IP instead of a host IP. That would be fairly simple.

However, with #160 and #279 in mind (which all require virt-handler to run CNI code in the host network namespace), and the fact that we want to switch libvirt over to the Pod network namespace anyway, I would suggest to use the Pod network namespace for libvirt instead, like suggested in #279.

berrange (Contributor Author) commented Jul 4, 2017

I want to avoid a dependency on the changes to move libvirtd into a private network namespace, because that's such a large topic of discussion that I don't think we'll be in a position to make such a move for a while yet, which would thus potentially block this PR for a long time.

rmohr (Member) commented Jul 4, 2017

Good point. Calling CNI from virt-handler and the libvirt network namespace switch seem to be closely related. Thus I would go with the console lookup change, since the need to run virt-handler within the host namespace only arises with the CNI integration.

At https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-api/rest/console.go#L91, a REST lookup would be needed to get the Pod IP of the virt-handler running on the target node. That should be the only dependency.
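A rough sketch of such a lookup using client-go (the namespace, label selector, and function name are hypothetical, and the exact List signature depends on the client-go version):

package lookup

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// virtHandlerPodIP returns the Pod IP of the virt-handler pod scheduled on
// the given node, so virt-api can reach it on the pod network instead of
// the host network.
func virtHandlerPodIP(ctx context.Context, client kubernetes.Interface, nodeName string) (string, error) {
	pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "daemon=virt-handler", // hypothetical label
		FieldSelector: "spec.nodeName=" + nodeName + ",status.phase=Running",
	})
	if err != nil {
		return "", err
	}
	for _, pod := range pods.Items {
		if pod.ObjectMeta.DeletionTimestamp == nil && pod.Status.PodIP != "" {
			return pod.Status.PodIP, nil
		}
	}
	return "", fmt.Errorf("no running virt-handler pod found on node %s", nodeName)
}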

berrange (Contributor Author) commented Jul 4, 2017

Ok, lemme see if I can fix the console and move virt-handler to a pod ns

berrange (Contributor Author)

When I debugged the tests locally, I found the key problem was that my hostname -> IP address substitution was only done for disks that were backed by PVCs. The test VM was instead using a disk that directly referenced an iSCSI server with no PVC involved. I added extra code to handle IP address conversion for that too. The only test that failed after that was storage_test.go, because it was checking for the IP addresses of the virt-launcher POD as an iSCSI client, where it now needs to check the IP addresses of the libvirtd POD as the iSCSI client.
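A minimal sketch of that kind of hostname-to-IP substitution, assuming a helper that rewrites an iSCSI portal string before the disk definition reaches libvirtd (the helper name and shape are illustrative, not the actual KubeVirt code):

package resolve

import (
	"fmt"
	"net"
)

// resolvePortal rewrites an iSCSI portal (host or host:port) so that any DNS
// name is replaced by an IP address, since libvirtd in the host network
// namespace cannot resolve cluster-internal names.
func resolvePortal(portal string) (string, error) {
	host, port, err := net.SplitHostPort(portal)
	if err != nil {
		// No explicit port in the portal string.
		host, port = portal, ""
	}
	if net.ParseIP(host) != nil {
		return portal, nil // already an IP address, nothing to rewrite
	}
	ips, lookupErr := net.LookupIP(host)
	if lookupErr != nil || len(ips) == 0 {
		return "", fmt.Errorf("unable to resolve host %q: %v", host, lookupErr)
	}
	if port != "" {
		return net.JoinHostPort(ips[0].String(), port), nil
	}
	return ips[0].String(), nil
}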

berrange (Contributor Author)

Updated the branch with the fixes mentioned

fabiand (Member) left a comment

Just a question


vm.Status.Graphics = []v1.VMGraphics{}

podIP, err := d.getVMNodeAddress(vm)
Member

Here it's podIP, but you get the nodeIP.

I'd assume that we want the pod IP once libvirt uses pod networking, right?

Contributor Author

Yes, if libvirtd were to be moved into a private net namespace this would need to be the virtual POD IP addr, instead of node IP addr.

Member

Right - so we probably need to do this once that move is done. But this is outside of the scope of this PR, IIUIC.

rmohr (Member) left a comment

Console tests are still failing in the vagrant environment. Could be that they are passing for you because of a different network setup. Would be interesting to see why it works for you.

Migrations fail too. See in the comments why.


# Migrate
virsh -c $SOURCE migrate --xml $VM.xml $VM $DEST tcp://$POD_IP
virsh -c $SOURCE migrate --xml $VM.xml $VM $DEST spice://$NODE_IP
Member

This change makes the migration tests fail:

error: argument unsupported: unsupported scheme spice in migration URI spice://192.168.121.186

berrange force-pushed the namespaces branch 2 times, most recently from adc225d to f8e98a7, on July 14, 2017 09:33
berrange (Contributor Author)

Hopefully fixed console & mig problems, and resolved conflicts

fabiand (Member) left a comment

Assuming that the bugs will be resolved at some point, this lgtm.

rmohr (Member) commented Jul 17, 2017

Ok, so the traps are still needed. I had to apply

diff --git a/images/libvirt-kubevirt/qemu-kube b/images/libvirt-kubevirt/qemu-kube
index c707317..7307a42 100755
--- a/images/libvirt-kubevirt/qemu-kube
+++ b/images/libvirt-kubevirt/qemu-kube
@@ -98,6 +98,20 @@ set -e
 # Start the qemu process in the cgroups of the docker container
 # to adhere to the resource limitations of the container.
 exec sudo -C 10000 unshare --mount bash -s << END
+
+  function _term() {
+    pkill -P \$! --signal SIG\$1
+  }
+
+  trap "_term TERM" TERM
+  trap "_term INT" INT
+  trap "_term HUP" HUP
+  trap "_term QUIT" QUIT
+  trap "_term TERM" EXIT
+  trap "_term TERM" ERR
+
   cgclassify -g ${CGROUPS}:$CGROUP_PATH --sticky \$\$
-  exec nsenter -t $CONTAINER_PID -p $CMD
+  nsenter -t $CONTAINER_PID -p $CMD &
+
+  wait
 END

Otherwise the qemu process stayed alive on the source system, and migrating back there was not possible.

for _, pod := range pods.Items {
if pod.ObjectMeta.DeletionTimestamp == nil {
hostIP = pod.Status.HostIP
Member

The check for the actual IP now gets very difficult. Depending on the kube proxy mode, in which network namespace the qemu process is, which network provider is used, or if just bridging is used, we always see different IPs connecting on the iscsi target. The BeforeEach block already makes sure that there is no connection to the iscsi target before each test. Maybe we should just check if a connection appears, and not try to check for specific IPs.

Record the VNC/SPICE graphics port details against the
VM object, under the VMStatus struct. This gets populated
with the (potentially dynamically allocated) QEMU port
for VNC/SPICE, and the IP address seen from virt-handler,
which is assumed to be the same as the IP seen from
libvirtd, given both are in the host net namespace.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
berrange (Contributor Author)

It seems that bash is not honouring the 'exec' statement when using #!/bin/sh, but changing to #!/bin/bash causes other problems with the wrapper script.

Moving QEMU into a different network namespace is not a
supported usage of libvirt and will break in future. For
example, when libvirt switches to passing a pre-opened
listener socket for the VNC/SPICE backends, the listening
socket will now be on an IP associated with the libvirtd
POD, despite QEMU being moved into a different network
namespace.

With the network namespace switch removed, the SPICE API
has to be updated to reflect the new IP, and so this info
is obtained from the VM status graphics data added in the
previous commit. The code for changing SPICE IP address
during migration is removed on the assumption that the
SPICE server is set to listen on 0.0.0.0. This is the
implicit default set by the virt-controller.

Since the libvirtd POD is currently using the host network
namespace, it cannot resolve k8s DNS names. Thus when we
configure QEMU to access iSCSI servers, we must resolve
the DNS name to an IP address. To enable this virt-handler
must stay in a POD private network namespace.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
berrange (Contributor Author)

Pushed an update

rmohr (Member) left a comment

@berrange updated the storage tests to work in the vagrant environment and in your local deployment. LGTM now.

In many cases, we don't see the Pod IP connecting. Therefore, remove all
workarounds, to cope with these scenarios, and only check if a new
connection appears when the VM is started. This simplifies the tests and
is almost as good, since the BeforeEach block makes sure that no
connection exists before each test.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
berrange merged commit 02589cf into kubevirt:master on Jul 18, 2017
berrange deleted the namespaces branch on July 18, 2017 13:33
kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Nov 6, 2020
gofmt'd cmd, pkg, and test dirs (no change in cmd)
kubevirt-bot pushed a commit to kubevirt-bot/kubevirt that referenced this pull request Dec 7, 2021
* Configure br_netfilter to fix DNS

Unless the knobs are enabled, DNS replies originate from a wrong
IP address, which breaks in-pod DNS resolution.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

* Load br_netfilter before setting its sysctl knobs

There's a chance the module is not loaded at all.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

* Use sysctl instead of /proc to set br_netfilter knobs

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>