
Decentralized pod networking #686

Merged
merged 16 commits into kubevirt:master from vladikr:decentralized_pod_networking on Feb 20, 2018

Conversation

@vladikr (Member) commented Jan 29, 2018:

TBD

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 393e2aa to f745ea1 Jan 29, 2018

@rmohr referenced this pull request on Jan 31, 2018: Pod Networking #450 (Closed)


// Start DHCP
go func() {
	dhcp.SingleClientDHCPServer(

@rmohr (Member) commented Jan 31, 2018:

I am missing error handling here. We at least want to log something, maybe even stop the whole pod?
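
For illustration, a minimal sketch of the kind of error handling being asked for: log the failure, and optionally fail the whole process so the pod stops. The startDHCP helper and its error return are assumptions for this sketch; the real dhcp.SingleClientDHCPServer signature may differ.

package main

import "log"

// startDHCP stands in for dhcp.SingleClientDHCPServer; that it returns an
// error instead of panicking is an assumption made for this sketch.
func startDHCP() error { return nil }

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := startDHCP(); err != nil {
			// At minimum surface the failure in the virt-launcher logs.
			// Using log.Fatalf here instead would stop the process and
			// thereby fail the whole pod.
			log.Printf("DHCP server failed: %v", err)
		}
	}()
	<-done // in virt-launcher this would be the normal event loop instead
}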

@rmohr (Member) commented Feb 5, 2018:

I still see no error handling in the latest code change. Will you address that?

@rmohr (Member) commented Jan 31, 2018:

I think that it would make sense to introduce mac/ip again in the vm status with this pr. @vladikr what do you think?

@davidvossel (Member) commented Jan 31, 2018:

> I think that it would make sense to introduce mac/ip again in the vm status with this pr. @vladikr what do you think?

I think putting the IP address in the Status section makes a lot of sense. The IP address is actually something we know before the VM gets scheduled. We can write the IP address to the Status field before assigning it to a node even in virt-controller.

I'm uncertain if reporting the MAC in the VM's status matters. I don't see a reason to expose that unless we have a use case. I can think of lots of use-cases for the IP address though.

@davidvossel (Member) commented Jan 31, 2018:

The DHCP server looks fairly straightforward to unit test. We can just start the server and send requests to it.

Have you thought through any possibilities for unit testing the "SetupDefaultPodNetwork" function?

The only thing I could come up with after looking at it for a bit is to make our own Go interface that wraps the netlink functions, and then use mockgen to generate a mock of that interface for the unit tests.
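
For illustration, the wrapper idea might look roughly like this, assuming github.com/vishvananda/netlink; the NetworkHandler name and method set are made up for this sketch. A mock for the unit tests could then be generated from the interface with mockgen.

package network

import "github.com/vishvananda/netlink"

// NetworkHandler wraps the netlink calls the setup code needs, so unit
// tests can substitute a generated mock for the real implementation.
type NetworkHandler interface {
	LinkByName(name string) (netlink.Link, error)
	LinkSetDown(link netlink.Link) error
	LinkSetUp(link netlink.Link) error
	AddrDel(link netlink.Link, addr *netlink.Addr) error
}

// netlinkHandler is the production implementation, delegating to the library.
type netlinkHandler struct{}

func (netlinkHandler) LinkByName(name string) (netlink.Link, error) {
	return netlink.LinkByName(name)
}

func (netlinkHandler) LinkSetDown(link netlink.Link) error {
	return netlink.LinkSetDown(link)
}

func (netlinkHandler) LinkSetUp(link netlink.Link) error {
	return netlink.LinkSetUp(link)
}

func (netlinkHandler) AddrDel(link netlink.Link, addr *netlink.Addr) error {
	return netlink.AddrDel(link, addr)
}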

@vladikr (Member, Author) commented Feb 1, 2018:

> I think that it would make sense to introduce mac/ip again in the vm status with this pr. @vladikr what do you think?

Sure, good point.

@vladikr (Member, Author) commented Feb 1, 2018:

> Have you thought through any possibilities for unit testing the "SetupDefaultPodNetwork" function?

Yea, I'm working on it now. Exactly as you said, trying to put all the networking methods under an interface and mock it.

@rmohr (Member) commented Feb 1, 2018:

> I'm uncertain if reporting the MAC in the VM's status matters. I don't see a reason to expose that unless we have a use case. I can think of lots of use-cases for the IP address though.

I can think of a lot of use-cases for non-pod networks. Only adding the IP right now works for me.

}

// Set interface link to down to change its MAC address
err = netlink.LinkSetDown(link)

@rmohr (Member) commented Feb 1, 2018:

Since we are changing the pod interface, we either first need to collect all data about the interface and save it in an extra file for retries (that is what the code currently tries to do), or we say that the pod failed if we can't set this up (exit with a non-zero exit code).

@rmohr (Member) commented Feb 5, 2018:

@vladikr is that addressed?

@rmohr (Member) commented Feb 1, 2018:

#663 is merged, you can rebase. :)

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from f745ea1 to 9f889f9 Feb 5, 2018

})

Context("on successful Network setup", func() {
It("define a new VIF", func() {

@davidvossel (Member) commented Feb 5, 2018:

nice test 👍

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch 2 times, most recently from b6e3cd3 to 94d7dc1 Feb 6, 2018

@rmohr (Member) commented Feb 6, 2018:

retest this please

@rmohr referenced this pull request on Feb 6, 2018: No nic found in vms #34 (Closed)

@@ -238,6 +240,12 @@ type VirtualMachineCondition struct {
Message string `json:"message,omitempty"`
}

type VirtualMachineNetworkInterface struct {

@rmohr (Member) commented Feb 6, 2018:

Documentation is missing on the struct and all fields.
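
For illustration, a documented version might read roughly like this; the field names and json tags are assumptions (per the discussion above, possibly only the IP ends up being exposed).

// VirtualMachineNetworkInterface holds runtime information about a network
// interface attached to the VirtualMachine, reported in the VM status.
type VirtualMachineNetworkInterface struct {
	// IP is the IP address currently assigned to this interface.
	IP string `json:"ipAddress,omitempty"`
	// MAC is the hardware address of this interface.
	MAC string `json:"mac,omitempty"`
}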

@@ -238,6 +240,12 @@ type VirtualMachineCondition struct {
Message string `json:"message,omitempty"`
}

type VirtualMachineNetworkInterface struct {
Device string `json:"name,omitempty"`

@rmohr (Member) commented Feb 6, 2018:

What is that for?

@vladikr (Member, Author) commented Feb 6, 2018:

Yea. I'll get rid of it.


l, err := dhcpConn.NewUDP4BoundListener(serverIface, ":67")
if err != nil {
	panic(err)

@rmohr (Member) commented Feb 6, 2018:

Can you just use an error channel?

@davidvossel (Member) commented Feb 6, 2018:

what would we do with the error channel in this case?

There's an entrypoint.sh script that's the actual pid 1 of the container. I'm going to add logic in that script to send SIGTERM and eventually SIGKILL to any qemu processes that could still be running as a result of virt-launcher panicking. This means we'll still have a sane shutdown for qemu even if virt-launcher segfaults/panics/etc...

@davidvossel (Member) commented Feb 6, 2018:

I added #713 to address the orphaned qemu process.

@rmohr (Member) commented Feb 6, 2018:

My main problem was the error handling in general (throwing, catching, throwing again). The channel is not so important.

// panic in case the DHCP server failed during the vm creation
// but ignore dhcp errors when the vm is destroyed or shutting down
defer func() {
	if err := recover(); err != nil && !vm.IsFinal() {

@rmohr (Member) commented Feb 6, 2018:

I think an error channel would be much more straightforward.

@vladikr (Member, Author) commented Feb 6, 2018:

Ok, I'll rework it. I thought it was a known pattern.
I went with this approach because it was easier to test.
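
For illustration, a minimal sketch of the error-channel variant under discussion; serveDHCP and the way the channel is consumed are assumptions made for this sketch.

package main

import "log"

// serveDHCP stands in for the single-client DHCP server; returning its
// error instead of panicking is the point of the suggestion.
func serveDHCP() error { return nil }

func main() {
	dhcpErr := make(chan error, 1)
	go func() {
		dhcpErr <- serveDHCP()
	}()

	// The consumer (for example virt-launcher's main loop) can check the
	// channel and decide, based on the VM's state, whether to log or exit.
	select {
	case err := <-dhcpErr:
		if err != nil {
			log.Fatalf("DHCP server failed: %v", err)
		}
	default:
		// server still running, carry on
	}
}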

@@ -194,6 +195,12 @@ func (l *LibvirtDomainManager) preStartHook(vm *v1.VirtualMachine, domain *api.D
}
}

// setup networking
err = network.SetupPodNetwork(vm, domain)

@rmohr (Member) commented Feb 6, 2018:

That still looks like we would retry starting the vm. It is very likely that we can't configure the dhcp server correctly anymore once this has failed, since the network device is most likely already partly deconfigured. Retrying would be pointless.

@davidvossel (Member) commented Feb 6, 2018:

Ugh... you're right. It's technically possible for this preStartHook to get called twice (for example, if there's an error defining the domain with libvirt after the preStartHook completes, the preStartHook will run again on the next try).

As it's written now, a second call to SetupPodNetwork will 100% cause virt-launcher to panic.

Here's how I think we can fix this.

  1. Make SetupPodNetwork Idempotent. This could be done by caching the result at the end of SetupPodNetwork to a flat file and checking to see if that file exists at the beginning of the function. If the file exists, ensure that the domain object has the right info in it and fast succeed.

  2. Be certain that any error we return from SetupPodNetwork has the ability to recover in the event of a second execution of SetupPodNetwork. Any unrecoverable errors should result in virt-launcher exiting.

@rmohr (Member) commented Feb 7, 2018:

Let's be pragmatic. virt-handler guarantees that it does not try to work on the same vm in parallel. That means we can just touch a file after we have collected the data. We can add something to this file, or to another file, once we have successfully started the dhcp server for the first time. Later invocations can then just skip the network setup. The dhcp server can panic if something goes wrong.
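
For illustration, a minimal sketch of the marker-file approach; the path and the SetupPodNetworkOnce wrapper are made up for this sketch.

package network

import (
	"os"
	"path/filepath"
)

// setupDoneFile is an illustrative marker path; the real one would live in
// the launcher's private directory.
const setupDoneFile = "/var/run/kubevirt-private/network-setup-done"

// SetupPodNetworkOnce runs the given setup exactly once per pod. Later
// invocations (for example a retried preStartHook) skip the setup entirely.
func SetupPodNetworkOnce(setup func() error) error {
	if _, err := os.Stat(setupDoneFile); err == nil {
		// Already configured on a previous attempt; nothing to do.
		return nil
	}
	if err := setup(); err != nil {
		// The pod interface may already be partly deconfigured here, so
		// retrying is pointless; the caller should fail the pod instead.
		return err
	}
	if err := os.MkdirAll(filepath.Dir(setupDoneFile), 0755); err != nil {
		return err
	}
	// Touch the marker so that later invocations become a no-op.
	f, err := os.Create(setupDoneFile)
	if err != nil {
		return err
	}
	return f.Close()
}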

nic.MAC = mac

// Remove IP from POD interface
err = Handler.AddrDel(nicLink, &nic.IP)

@rmohr (Member) commented Feb 6, 2018:

From this point on retrying is pointless.

&expect.BExp{R: "cirros login: "},
&expect.BSnd{S: "cirros\n"},
&expect.BExp{R: "Password: "},
&expect.BSnd{S: "cubswin:)\n"},

@rmohr (Member) commented Feb 6, 2018:

Once you rebase on latest master that password is wrong. The new cirros image uses "gocubsgo", I think.

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 94d7dc1 to 429fb46 Feb 6, 2018

@davidvossel (Member) left a comment:

We need to iterate a bit more on how we do the DHCP server.

@vladikr @rmohr What would you all think about a dhcp manager go routine that starts up when virt-launcher is initializing?

There would be a cmd channel and error channel used to communicate with the dhcp manager.

This dhcp manager would wait for a start command on the cmd channel. A start command would result in another goroutine launching dhcp.Serve(). Any additional calls to start the dhcp server would just fast-succeed.

dhcp errors would get sent back on the errors channel. We'd have the ability to inspect the state of the actual VM process to decide how to proceed with handling the error.
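
For illustration, a rough sketch of such a manager; the command type, channel layout, and serveDHCP stand-in are made up for this sketch.

package main

// dhcpCmd is a command understood by the hypothetical DHCP manager.
type dhcpCmd int

const cmdStartDHCP dhcpCmd = iota

// serveDHCP stands in for the real DHCP serve call.
func serveDHCP() error { return nil }

// runDHCPManager waits for commands. The first start command launches the
// server goroutine; any later start command just fast-succeeds. Server
// errors are reported back on errs.
func runDHCPManager(cmds <-chan dhcpCmd, errs chan<- error) {
	started := false
	for cmd := range cmds {
		if cmd == cmdStartDHCP && !started {
			started = true
			go func() {
				if err := serveDHCP(); err != nil {
					// The caller can inspect the state of the VM
					// process to decide how to handle this.
					errs <- err
				}
			}()
		}
	}
}

func main() {
	cmds := make(chan dhcpCmd)
	errs := make(chan error, 1)
	go runDHCPManager(cmds, errs)
	cmds <- cmdStartDHCP
	cmds <- cmdStartDHCP // second start is a no-op
	close(cmds)
}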

var wg sync.WaitGroup

createAndLogin := func() *v1.VirtualMachine {
vm := tests.NewRandomVMWithEphemeralDiskAndUserdata("kubevirt/cirros-registry-disk-demo:devel", "#!/bin/bash\necho 'hello'\n")

@davidvossel (Member) commented Feb 6, 2018:

do you need a VM with userdata here or would a registry disk without cloud-init be okay? I don't see the userdata being a part of the test.

@vladikr (Member, Author) commented Feb 8, 2018:

This is needed to speed up the boot process. Without the userdata, the cirros image will wait for it for some time before giving up.


func BeforeAll(fn func()) {
	first := true
	BeforeEach(func() {

@davidvossel (Member) commented Feb 6, 2018:

Ha, that's one way to do it. I guess ginkgo doesn't have a BeforeAll() function. I like the idea of setting up the environment in parallel before the tests start. It cuts down on test time quite a bit.

@rmohr (Member) commented Feb 7, 2018:

Yes, ginkgo does not have it: onsi/ginkgo#70. I can see why it is not there (it focuses on parallel and randomized test runs), but it would still make sense for us.
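
For reference, the helper quoted above presumably completes to roughly the following sketch, built on ginkgo's BeforeEach.

package tests

import . "github.com/onsi/ginkgo"

// BeforeAll emulates a BeforeAll hook: fn runs once, before the first spec
// that triggers the surrounding BeforeEach.
func BeforeAll(fn func()) {
	first := true
	BeforeEach(func() {
		if first {
			fn()
			first = false
		}
	})
}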

serverAddr.IP,
nic.Gateway,
net.ParseIP(guestDNS),
); err != nil && !vm.IsFinal() {

@davidvossel (Member) commented Feb 6, 2018:

I don't think this is going to do what you're intending it to do.

The VM object passed in to this function won't reflect the actual state of the VM when the DHCP server exits.

@rmohr (Member) commented Feb 7, 2018:

Why is the vm even needed here? We need to make sure that we don't even run that code again.

@vladikr vladikr changed the title [WIP] Decentralized pod networking Decentralized pod networking Feb 7, 2018

@rmohr (Member) commented Feb 7, 2018:

retest this please

@rmohr (Member) left a comment:

> We need to iterate a bit more on how we do the DHCP server.

> @vladikr @rmohr What would you all think about a dhcp manager go routine that starts up when virt-launcher is initializing?

I think we can pretty much leave it as it is. I think we need to make sure anyway that the network setup is either reentrant or that we skip it after the first successful run. I don't see any issue starting the dhcp server with the right config directly.




// Update vm status with interface details
ifaceStatus := decorateInterfaceStatus(vif)
vm.Status.Interfaces = append(vm.Status.Interfaces, *ifaceStatus)

@davidvossel (Member) commented Feb 8, 2018:

Setting the interface here won't result in the updated VM object getting persisted on the cluster.

@vladikr (Member, Author) commented Feb 8, 2018:

> Setting the interface here won't result in the updated VM object getting persisted on the cluster.

Yea, I made the change locally. I'll push it soon.

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 429fb46 to 569ba6d Feb 8, 2018

os.RemoveAll(tmpDir)
})

Context("on successful Network setup", func() {

@cynepco3hahue (Member) commented Feb 12, 2018:

Can you remove the word "Network" from the context, so that the test name reads "Network on successful setup should define a new VIF"?

})

Context("on successful Network setup", func() {
It("define a new VIF", func() {

@cynepco3hahue (Member) commented Feb 12, 2018:

add "should"

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 45ee962 to 51f6291 Feb 15, 2018

@vladikr (Member, Author) commented Feb 15, 2018:

retest this please

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 51f6291 to 1794deb Feb 16, 2018

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch 2 times, most recently from fe588b9 to ad11b66 Feb 19, 2018

@fabiand (Member) commented Feb 19, 2018:

retest this please

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from ad11b66 to 495deec Feb 20, 2018

vladikr and others added some commits Jan 26, 2018

Add initial pod networking library
Co-authored-by: Petr Horáček <phoracek@redhat.com>
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Adjust the Interface struct domain api schema
Additional attributes are needed to make the Interface struct representation
more complete.

- Adding TrustGuestRxFilters as an attribute.
- Add missing Mode to the InterfaceSource

Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Use network library for vm creation
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Remove HostNetworking from the vm container
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Add network dependencies to glide.
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Adjust network interface configuration
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
tests: refactor the networking code for testing
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Update VM status with attributes of a created interface
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Handle dhcp server exceptions in virt-launcher
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
glide update
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Do not try to restart a vm if pod networking is not setup correctly
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Functional test to verify pod network connectivity
Co-authored-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Make SetPodNetwork safe for multiple calls
Signed-off-by: David Vossel <davidvossel@gmail.com>
move back to bridge mode
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
glide.lock update
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>

@vladikr vladikr force-pushed the vladikr:decentralized_pod_networking branch from 495deec to 214f8b9 Feb 20, 2018

@davidvossel (Member) commented Feb 20, 2018:

retest this please

@davidvossel (Member) left a comment:

I found one minor thing when testing this locally that's just related to running the functional tests on a single node. I made a comment about how to address that.

For CI, Jenkins has been able to verify that vm<->pod and node->vm connectivity works. However, there's something going on that is preventing vm->internet from working. I believe that issue is most likely related to DNS on the Jenkins host machines. If the vm->internet connectivity continues to fail in Jenkins, I'm in favor of disabling that test and following up on the issue in a separate PR. I've tested these changes locally in multiple cluster configurations and never encountered the connectivity issue Jenkins hit.

I think this patch series is in a mergeable state at this point.

You've done an amazing job here @vladikr, well done!

table.DescribeTable("should be reachable via the propagated IP from a Pod", func(op v12.NodeSelectorOperator, hostNetwork bool) {

ip := inboundVM.Status.Interfaces[0].IP

@davidvossel (Member) commented Feb 20, 2018:

This test needs to be able to work in single node configurations. Just add this code block here.

//TODO: if the node count is 1, skip when op is v12.NodeSelectorOpNotIn
nodes, err := virtClient.CoreV1().Nodes().List(v13.ListOptions{})
Expect(err).ToNot(HaveOccurred())
Expect(nodes.Items).ToNot(BeEmpty())
if len(nodes.Items) == 1 && op == v12.NodeSelectorOpNotIn {
	Skip("Skip network test that requires multiple nodes when only one node is present.")
}

@vladikr (Member, Author) commented Feb 20, 2018:

Thanks a lot! Will add that in.

Skip network test that requires multiple nodes when only one node is present

Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>

@davidvossel davidvossel merged commit 91025f9 into kubevirt:master Feb 20, 2018

0 of 4 checks passed:
- check-patch.el7.x86_64: Setting up test environment
- continuous-integration/travis-ci/pr: The Travis CI build is in progress
- kubevirt-functional-tests/jenkins/pr/vagrant-release
- standard-ci: Running tests