masquerade, migration: hardcode the bridge MAC addr #5000
Conversation
/assign @EdDev
6960209 → 8eb934a (force-push)
/retest
I like the simplicity, but then one point comes up: What reserves this static MAC address from being used by:
- The pod interface (eth0).
- A secondary interface CNI (multus).
- The VM/VMI user on the interface spec.
- The KMP.
// the hardcoded MAC must be as low as possible, to prevent libvirt from
// assigning a lower MAC to a tap device attached to the bridge - which
// would trigger the bridge's MAC to update. This also applies for the
// dummy connected to the bridge on masquerade binding.
I am unable to recreate this on my 3.10.0-1160.6.1.el7.x86_64 kernel.
Once I set the bridge iface MAC, it is not changed if I attach a dummy.
@AlonaKaplan do you remember the reference we had for this ?
// assigning a lower MAC to a tap device attached to the bridge - which
// would trigger the bridge's MAC to update. This also applies for the
// dummy connected to the bridge on masquerade binding.
const hardcodedMasqueradeMAC = "02:00:00:00:00:00"
How about staticMasqueradeBridgeMAC?
Can do.
Done.
Nothing. In any of those.
@EdDev I'd like to point out that (unless my math is seriously wrong) this PR addresses exactly (2/(2^40))*100 % of the scenarios if multus is not involved, and ((2+M)/2^40)*100 % of the scenarios for M multus interfaces. I.e. this solves a real problem for an insanely high percentage of possibilities. Yes, it does have a couple of corner cases. I think we could document this limitation (not sure where / how).

In the current refactoring of the pod interfaces I'm planning, I intend to propose a way to also cache the bridge interface between the handler / launcher processes, which would allow us to revert this, and use whatever generated MAC the kernel happily assigned for us.

2^40 => number of possible MAC addresses if we force the first octet to be

@RamLavi : one question: does kubemacpool allow you to 'black-list' mac addresses ? I.e. "dear kubemacpool; whatever you do, do not assign this particular MAC". This black list would preferably be 'scoped' per interface.
If I understand correctly, in kubemacpool you have to specify the prefix from which you're (you as in KMP) assigning MACs. It would be a matter of using whatever prefix does not start w/
@maiqueb, Kubemacpool assigns MACs from the pool, defined by the range set when KMP is deployed. BUT, it does allow you to set a MAC Address manually, that may/may not be outside of this range. In other words, there is no "black-list" currently implemented.
I think it is enough to consider that some engineer will just think of the same address you managed to invent, for similar reasons, ending up in a bad setup with little help from the system to understand what is going on. Playing with probabilities can excuse concurrency bugs, leaks and other one-in-a-million issues, but I do not think it should be accepted here.
When the poor operator and/or support engineer has to debug this, I think we will have a hard time reasoning about it. The moment a static MAC address is introduced and reserved, it makes sense to add a validation check to avoid MAC duplication.
Makes sense.
I'm afraid this may collide with Kubemacpool.
02:00:00:00:00:00 is inside KMP's default range (the first one actually), and although it is advised to change the range prefix to be random, we can't assume ppl will change that.
I think every mac reservation should also be supported on KMP's side, otherwise we might end up with this mac colliding with the mac on the virt-launcher pod bridge.
That's half the problem. How will KMP play along with anything that requires a reserved mac address ? My short understanding is: it doesn't. And that includes whatever it is the users have running on their VMs. Furthermore, I'm unsure if we can assume KMP is a dependency for kubevirt. AFAIU, it is not. @phoracek would you shed some light into this ? How can anything with a specific mac address requirement (for whatever weird reason) co-exist with KMP ? What's KubeVirt's take on this ?
/sig network
Correct.
I believe that if the MAC is from the managed pool, we would either reject it (if taken) or register it as taken. If it is outside the pool, we don't care. @RamLavi please keep me honest here.
@RamLavi / @phoracek / @EdDev let's re-start this discussion; the problem won't go away on its own. AFAIU, we either need to implement a mechanism to force the bridge MAC to persist across migration / reboots, or hard-code it, and be done with the issue. If we hard-code, we'd need a mechanism in place to make sure that MAC is not used by the list in #5000 (review). It would require kubemacpool to have an exception list of sorts (I think it is far from a bad idea having it anyway). I would rather hard-code it, since it is a lot simpler. Is there any alternative I am not seeing ?
I would suggest defining a reserved range and placing it in a configmap (on the kubevirt CR?). Then it is a matter of just using a mac in this range in our code (in our case, just the first address in the pool). But I am also ok with starting with a single reserved MAC address in kubevirt. If anyone, including KMP, tries to allocate it, kubevirt should reject it.
@@ -1450,6 +1450,28 @@ var _ = Describe("Validating VMICreate Admitter", func() {
	Expect(len(causes)).To(Equal(1))
	Expect(causes[0].Field).To(Equal("fake.domain.devices.interfaces[0].name"))
})
It("should reject a masquerade interface with a specified MAC address which is reserved by the BindMechanism", func() { | |||
vm := v1.NewMinimalVMI("testvm") |
this is a vmi, not a vm
vm := v1.NewMinimalVMI("testvm")
vmi := v1.NewMinimalVMI("testvmi")
Done.
Well, since KMP only monitors the vm and the change proposed seems to hardcode the mac to the vmi, I fear that KMP won't catch/reject/register it.
The current proposal does this; we can then build on it, and afterwards introduce a range of reserved MACs on the kubevirt CR.
/retest
Thanks, please see inside
pkg/virt-api/webhooks/validating-webhook/admitters/vmi-create-admitter_test.go
afe3bb4 → 78f831b (force-push)
	return nil
}, 120*time.Second).Should(Succeed())
Expect(ping(podIP)).To(Succeed())
Is it possible to check the mac address of the bridge on the virt-launcher pod?
It will be ugly (because we do not assert the API, but some inner detail), but depending on this ping timing is unsafe.
I don't follow the meaning of unsafe here; the point of preserving the MAC between migrations is to remove the need for the ARP tables to stabilize, which should preserve connectivity from the outside world to the guest, once the VM is considered running.
Is there something I am missing here ?
As a side note, yes, it is possible to check the mac of the bridge. And yes, it is ugly. And it doesn't prove anything about preserving the connectivity of the VM once migrated, which is this PR's objective.
EDIT: above I mean that "preserving the connectivity of the VM once migrated" is the issue this PR is trying to fix.
FWIW, I'm running these 2 tests in a loop w/ the untilItFails flag.
$ FUNC_TEST_ARGS='-untilItFails -race' make functest
...
Ran 2 of 813 Specs in 197.550 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 811 Skipped
PASS | FOCUSED
All tests passed...
Will keep running them until they fail.
This was attempt #19
Maybe you should stop now?
...
Ran 2 of 813 Specs in 188.358 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 811 Skipped
PASS | FOCUSED
All tests passed...
Will keep running them until they fail.
This was attempt #25
Dave, this conversation can serve no purpose anymore. Goodbye.
Is there something I am missing here ?
The ping may fail for other reasons, like guest CPU load right after the migration or MAC address duplication on the switch (the mac appears on a different port, so it will not reach the right destination).
I do not think it is safe to assume this is the only reason to have echo packets lost.
As a side note, yes, it is possible to check the mac of the bridge. And yes, it is ugly. And it doesn't prove anything about preserving the connectivity of the VM once migrated, which is this PR's objective.
EDIT: above I mean that "preserving the connectivity of the VM once migrated" is the issue this PR is trying to fix.
I would use "improve", not "solve", here. I am just pointing out that there may be other factors that affect a smooth migration and connectivity.
There is also a logical problem with what you are saying: if the mac is equivalent, we can be sure the ping will work fine (let's assume this is true). However, you cannot assume the other direction, right? So what protects us from someone breaking this feature eventually?
What feature ? That you migrate & immediately ping once the VM is reported as ready ?
This test does.
What test are you missing ? I don't think there's a need to test that we get a hard-coded mac address from the bridge ...
I am just worried that we are not testing this behavior (that the mac is explicitly set to a hard-coded address and is expected to be the same on all similar virt-launchers).
I guess my logical claim is this: if this gets broken and the mac address is no longer the same as we expect, then the ping may still pass from time to time, becoming flaky and leaving us to wonder what happened.
If you are not worried about that, then let's proceed; not covering this explicitly is not worse than what we have today.
If we encounter this in the future, we can handle it then.
78f831b → f2e37c9 (force-push)
I am good with this change, but some questions popped up and it will be nice to verify this before we proceed.
For the reserved mac, I would reserve it for all interfaces in general, no matter the binding. It is easier to do so and can simplify KMP logic. If we need to reserve another mac in the future, it will enter the same category and the logic will be kept simple.
WDYT?
pkg/virt-api/webhooks/validating-webhook/admitters/vmi-create-admitter_test.go
@@ -258,6 +259,8 @@ func validateInterfaceNetworkBasics(field *k8sfield.Path, networkExists bool, id
	causes = appendStatusCauseForSlirpNotEnabled(field, causes, idx)
} else if iface.Masquerade != nil && networkData.Pod == nil {
	causes = appendStatusCauseForMasqueradeWithourPodNetwork(field, causes, idx)
} else if iface.Masquerade != nil && network.IsReserved(iface.MacAddress) {
What will happen in case we have 1 interface with masq and the 2nd with bridge binding?
Initially I thought it may be a problem, but on the other hand it is on a different LAN. Will this cause problems with the ARP resolution? (I think we need to check this)
This also raised another point which I do not know how we can mitigate: if the secondary network uses, for example, bridge binding, this mac may already exist in the network (e.g. as the default gateway), and in such a case we may also have a problem with the ARP table (not sure if the ARP table accepts having a single mac mapped to two different IP addresses).
Can you check this?
What will happen in case we have 1 interface with masq and the 2nd with bridge binding?
Initially I thought it may be a problem, but on the other hand it is on a different LAN. Will this cause problems with the ARP resolution? (I think we need to check this)
I don't see how, since those are different L2 networks.
This also raised another point which I do not know how we can mitigate: if the secondary network uses, for example, bridge binding, this mac may already exist in the network (e.g. as the default gateway), and in such a case we may also have a problem with the ARP table (not sure if the ARP table accepts having a single mac mapped to two different IP addresses).
Again, I don't see how. And furthermore, I don't see how this differs from the standard situation of having the same mac address in different L2 networks, which I don't think is a problem.
Can you check this?
Nevertheless, I can. Below you can find the spec I used; it uses 2 VMs, both with the masquerade pod interface and a secondary macvtap network.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvtap
  annotations:
    k8s.v1.cni.cncf.io/resourceName: macvtap.network.kubevirt.io/eth0
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvtap",
    "mtu": 1500
  }'
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  name: vm1
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: rootfs
      - disk:
          bus: virtio
        name: cloudinit
      interfaces:
      - masquerade: {}
        name: default
        ports:
        - name: ssh
          port: 22
      - macvtap: {}
        name: macvtap
        macAddress: 02:00:00:00:00:00
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  networks:
  - name: default
    pod: {}
  - multus:
      networkName: macvtap
    name: macvtap
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel
    name: rootfs
  - cloudInitNoCloud:
      userData: |
        #!/bin/bash
        echo "fedora" |passwd fedora --stdin
        dhclient eth1
    name: cloudinit
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  name: vm2
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: rootfs
      - disk:
          bus: virtio
        name: cloudinit
      interfaces:
      - masquerade: {}
        name: default
        ports:
        - name: ssh
          port: 22
      - macvtap: {}
        name: macvtap
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  networks:
  - name: default
    pod: {}
  - multus:
      networkName: macvtap
    name: macvtap
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel
    name: rootfs
  - cloudInitNoCloud:
      userData: |
        #!/bin/bash
        echo "fedora" |passwd fedora --stdin
        dhclient eth1
    name: cloudinit
You can find below the macs / ips for the 2 VMs, along with their neighbor tables.
VM1
$ [fedora@vm1 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:19:fa:ee brd ff:ff:ff:ff:ff:ff
altname enp1s0
inet 10.0.2.2/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 86296654sec preferred_lft 86296654sec
inet6 fe80::5054:ff:fe19:faee/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff <===== same MAC the gateway of eth0 has
altname enp2s0
inet 192.168.66.139/24 brd 192.168.66.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::e03c:bcdd:7d90:7e45/64 scope link noprefixroute
valid_lft forever preferred_lft forever
# ping gateway - has 02:00:00:... mac address
[fedora@vm1 ~]$ ping 10.0.2.1
PING 10.0.2.1 (10.0.2.1) 56(84) bytes of data.
64 bytes from 10.0.2.1: icmp_seq=1 ttl=64 time=1.56 ms
64 bytes from 10.0.2.1: icmp_seq=2 ttl=64 time=0.927 ms
--- 10.0.2.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.927/1.244/1.562/0.317 ms
[fedora@vm1 ~]$ ip n
192.168.66.131 dev eth1 lladdr 62:89:2e:cd:75:d7 STALE
10.0.2.1 dev eth0 lladdr 02:00:00:00:00:00 REACHABLE <===== as you can see here
192.168.66.2 dev eth1 lladdr 16:c1:6d:e8:62:dc STALE
fe80::b440:bff:fe6e:c1df dev eth1 lladdr 16:c1:6d:e8:62:dc router STALE
VM2
[fedora@vm2 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:3f:06:10 brd ff:ff:ff:ff:ff:ff
altname enp1s0
inet 10.0.2.2/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 86296796sec preferred_lft 86296796sec
inet6 fe80::5054:ff:fe3f:610/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 62:89:2e:cd:75:d7 brd ff:ff:ff:ff:ff:ff
altname enp2s0
inet 192.168.66.131/24 brd 192.168.66.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::dcf9:be0:d53b:d725/64 scope link noprefixroute
valid_lft forever preferred_lft forever
# 2 shells, one pinging vm1's IP - 192.168.66.139 - another pinging the gateway - 10.0.2.1
[fedora@vm2 ~]$ ping -q 192.168.66.139
[fedora@vm2 ~]$ ping -q 10.0.2.1
# gets us this neighbor table
[fedora@vm2 ~]$ ip n
192.168.66.2 dev eth1 lladdr 16:c1:6d:e8:62:dc STALE
192.168.66.139 dev eth1 lladdr 02:00:00:00:00:00 REACHABLE
10.0.2.1 dev eth0 lladdr 02:00:00:00:00:00 REACHABLE
fe80::b440:bff:fe6e:c1df dev eth1 lladdr 16:c1:6d:e8:62:dc router STALE
The ticks to update the neighbor state happen every BASE_REACHABLE_TIME / 2, and the entries remain in the reachable state for a while, between BASE_REACHABLE_TIME / 2 and 3/2 * BASE_REACHABLE_TIME, as seen in the implementation. In the system above, w/ a 5.6.6 kernel, the base_reachable_time for the interfaces is 30 seconds:
[fedora@vm2 ~]$ cat /proc/sys/net/ipv4/neigh/eth0/base_reachable_time
30
[fedora@vm2 ~]$ cat /proc/sys/net/ipv4/neigh/eth1/base_reachable_time
30
I see that the neighbors persist in the reachable state for around 10-30 seconds, which, given the numbers described above, makes sense to me.
And the traffic works, which I think is the most important metric here.
Thanks!
I personally don't see that as a reason for imposing restrictions on bindings that do not require them. As such, I would avoid doing it.
Thank you!
This needs rebase now.
Since the bridge MAC address no longer changes, the ARP tables do not need to stabilize, which preserves connectivity from the outside world to the guest. This is tested in this commit since the `Eventually` block in the vmi_networking_test for masquerade binding migration can be safely removed. The lowest locally administrated unicast MAC address was chosen for the bridge. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
For now, we reserve a single hardcoded MAC address, 02:00:00:00:00:00, the lowest locally administrated unicast address. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
f2e37c9 → d55538d (force-push)
@maiqueb: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: AlonaKaplan
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does / why we need it:
Since the bridge MAC address no longer changes, the ARP tables do not need to stabilize, which preserves connectivity from the outside world to the guest.
This is tested in this commit since the Eventually block in the vmi_networking_test for masquerade binding migration can be safely removed.
Reserve the hardcoded MAC address when masquerade binding is used; as such, throw an error whenever a user attempts to specify that MAC on a masquerade interface.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4999
Special notes for your reviewer:
Release note: