Macvtap binding plugin #10800

Merged
4 commits merged on Dec 2, 2023

Conversation

@AlonaKaplan
Member

AlonaKaplan commented Nov 28, 2023

What this PR does / why we need it:
This PR allows using macvtap binding as a plugin.

These are the extra steps needed to run a VM with a macvtap binding plugin interface (the rest is the same as with the old API).
Enable the network binding plugin feature gate in the KV CR

kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{"op": "add", "path": "/spec/configuration/developerConfiguration/featureGates/-",   "value": "NetworkBindingPlugins"}]'

Register the macvtap binding plugin in the KV CR

note: use the name of the network-attachment-definition defined earlier.
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{"op": "add", "path": "/spec/configuration/network", "value": {
            "binding": {
                "macvtap": {
                    "domainAttachmentType": "tap"
                }
            }
        }}]'

Create (and apply) a VM with a macvtap binding interface

---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-net-binding-macvtap
  name: vm-net-binding-macvtap
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-net-binding-macvtap
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          interfaces:
          - name: macvtapnet
            binding:
              name: macvtap
            ports:
            - name: http
              port: 80
              protocol: TCP
          rng: {}
        resources:
          requests:
            memory: 1024M
      networks:
      - name: macvtapnet
        multus:
          networkName: macvtapNad
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel
        name: containerdisk

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

Support macvtap as a binding plugin

In case the binding CR doesn't define an annotation, an empty one shouldn't
be added to the multus annotation of the pod.

Signed-off-by: Alona Paz <alkaplan@redhat.com>
@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. sig/network size/L labels Nov 28, 2023
@ormergi
Contributor

ormergi commented Nov 28, 2023

Regarding the example in the PR description, isn't it necessary to create a NAD for the macvtap binding?

@AlonaKaplan
Member Author

/test pull-kubevirt-build
/test pull-kubevirt-build-arm64
/test pull-kubevirt-generate
/test pull-kubevirt-manifests
/test pull-kubevirt-unit-test
/test pull-kubevirt-e2e-k8s-1.27-sig-network

@AlonaKaplan
Member Author

Regarding the example in the PR description, isn't it necessary to create a NAD for the macvtap binding?

Edited the comment to emphasize those are just the extra steps.

@@ -77,7 +77,9 @@ func GenerateMultusCNIAnnotationFromNameScheme(namespace string, interfaces []v1
 if err != nil {
 return "", err
 }
-multusNetworkAnnotationPool.add(*bindingPluginAnnotationData)
+if bindingPluginAnnotationData != nil {
+multusNetworkAnnotationPool.add(*bindingPluginAnnotationData)
Contributor

nit: It seems the root cause of the additional empty element in the Multus annotation is passing bindingPluginAnnotationData to add by value instead of by pointer.
We can avoid yet another if in this function by passing a pointer and performing the nil check inside add.

Member Author

But then the nil check will be in the "add" method, so we need to add a nil check either way.
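
For comparison, here is a minimal sketch of the alternative the reviewer describes, where `add` takes a pointer and skips nil itself. The `annotationPool` and `networkAnnotation` types are hypothetical stand-ins and do not reproduce the real definitions in the KubeVirt source.

```go
package main

import "fmt"

// networkAnnotation is a hypothetical stand-in for one element of the
// Multus network annotation.
type networkAnnotation struct {
	Name      string
	Namespace string
}

// annotationPool is a hypothetical stand-in for multusNetworkAnnotationPool.
type annotationPool struct {
	pool []networkAnnotation
}

// add appends the element to the pool. A nil pointer is skipped, so a
// binding without an annotation never produces an empty element.
func (p *annotationPool) add(a *networkAnnotation) {
	if a == nil {
		return
	}
	p.pool = append(p.pool, *a)
}

func main() {
	var p annotationPool
	p.add(nil) // binding plugin defines no annotation: nothing is added
	p.add(&networkAnnotation{Name: "macvtapnet", Namespace: "default"})
	fmt.Println(len(p.pool))
}
```

Both variants need exactly one nil check; the difference is only whether it lives at the call site, as merged in this PR, or inside `add`.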

Comment on lines 161 to 164
if errors.Is(err, os.ErrNotExist) {
return nil
}
return nil
Contributor

Why is that error ignored?
It seems it will be ignored whether it's ErrNotExist or not.

Member Author

It means that it is OK if the device is just not there; in that case we don't need to change the ownership, since there is no device.

Member

This looks wrong, or I am missing something.
Whether the device is missing or not, you always return nil.

Member Author

You're right, it is a bug. Done.

for _, inf := range vmi.Spec.Domain.Devices.Interfaces {
if inf.Macvtap != nil {
macvtap[inf.Name] = struct{}{}
if domainAttachmentByInterfaceName[inf.Name] == string(v1.Tap) && inf.Masquerade == nil && inf.Bridge == nil {
Contributor

What will happen when the Macvtap interface API is used? Will it work?

Member Author

We will soon see whether the tests pass :) But generally, the old macvtap API should return true here (since it is in the map).

Comment on lines 125 to 126
macvtapLowerDevice = "eth0"
macvtapNetworkName = "net1"
Contributor

Can't these be set as consts?

Member Author

done.

}

BeforeEach(func() {
virtClient = kubevirt.Client()
Contributor

@ormergi ormergi Nov 28, 2023

I think kubevirt.Client() can be called directly; it's implemented as a singleton and it's safe to call it more than once.

Member Author

done.

Comment on lines 142 to 143
var nodeList *k8sv1.NodeList
var nodeName string
Contributor

These seem redundant because they are only used in the BeforeEach.

Member Author

done.

@AlonaKaplan
Member Author

/test pull-kubevirt-build
/test pull-kubevirt-build-arm64
/test pull-kubevirt-generate
/test pull-kubevirt-manifests
/test pull-kubevirt-unit-test
/test pull-kubevirt-e2e-k8s-1.27-sig-network

@AlonaKaplan AlonaKaplan marked this pull request as ready for review November 29, 2023 06:45
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2023
for _, inf := range vmi.Spec.Domain.Devices.Interfaces {
if inf.Macvtap != nil {
macvtap[inf.Name] = struct{}{}
if domainAttachmentByInterfaceName[inf.Name] == string(v1.Tap) && inf.Masquerade == nil && inf.Bridge == nil {
Member

masquerade and bridge
are excluded since we know in advance there is no tap device on those
bindings

We indeed know, but why is that a good reason to keep conditioning on it?
I thought we could generalize it so we do not have to ask. As a side effect, it would ensure we are able to handle it for other future bindings that use a simple tap without a device behind it.

Bottom line, it would be nice to specify here all the users of tap & tap-device.

Member Author

We can generalize it, and everything will work perfectly if inf.Masquerade == nil && inf.Bridge == nil is removed from the condition.
The reason I prefer to keep the exclusion is that the next step is to enter the virt-launcher's network namespace and look for the device. That is a bit expensive, so if we can avoid it, why not?
Other bindings don't have this privilege (knowing in advance whether the tap device is there or not), so for them we have to enter the network namespace to check.
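
The selection rule under discussion can be sketched as a standalone filter. This is only an illustration under simplified assumptions: `iface` is a hypothetical reduction of the VMI interface type (booleans instead of pointer fields), and `tapCandidates` is not a function from the PR.

```go
package main

import "fmt"

// iface is a hypothetical, simplified view of a VMI interface.
type iface struct {
	Name       string
	Masquerade bool // true when the masquerade core binding is used
	Bridge     bool // true when the bridge core binding is used
}

const tapAttachment = "tap"

// tapCandidates returns the interfaces that may have a tap device whose
// ownership needs changing: the domain attachment is tap, and the binding
// is neither masquerade nor bridge, which are known in advance to have no
// such device. Skipping them avoids the more expensive lookup inside the
// launcher's network namespace.
func tapCandidates(ifaces []iface, domainAttachmentByName map[string]string) map[string]struct{} {
	candidates := map[string]struct{}{}
	for _, i := range ifaces {
		if domainAttachmentByName[i.Name] != tapAttachment {
			continue
		}
		if i.Masquerade || i.Bridge {
			continue
		}
		candidates[i.Name] = struct{}{}
	}
	return candidates
}

func main() {
	ifaces := []iface{
		{Name: "default", Masquerade: true},
		{Name: "macvtapnet"},
		{Name: "sriovnet"},
	}
	attachments := map[string]string{"default": "tap", "macvtapnet": "tap"}
	fmt.Println(tapCandidates(ifaces, attachments))
}
```

Dropping the second `if` gives the generalized behavior the reviewer asks about, at the cost of one namespace lookup per masquerade or bridge interface.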

-_, isMacvtapNetwork := macvtap[net.Name]
-if podInterfaceName, exists := networkNameScheme[net.Name]; isMacvtapNetwork && exists {
+_, isTapNetwork := tapNetworks[net.Name]
+if podInterfaceName, exists := networkNameScheme[net.Name]; isTapNetwork && exists {
tapDevices[net.Name] = podInterfaceName
Member

This is odd.

There is a hidden assumption that if there is a device behind a tap domainAttachment, it always has the pod interface name.
Why is that correct? Maybe it is a device that uses the tap* naming.

Well, you could always define this as a "rule", but it feels ugly.
Can this be solved in the future, or will we be stuck with this forever? (I mean, assuming we let this slide in, can it be improved without breaking things?)

Member Author

Yes, this is the rule: it should use the pod interface naming.
If we improve it in the future, we will have to somehow pass the "name" of the tap device.
I don't think it will break anything; we can say that the default is the pod interface name.
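
A rough sketch of how such a future extension could stay backward compatible: an explicit device name wins when one is given, and the pod interface name remains the default. The `overrides` map and the `tapDeviceNames` function are purely hypothetical; nothing like them exists in this PR.

```go
package main

import "fmt"

// tapDeviceNames maps each tap network to the device expected in the pod.
// networkNameScheme maps network names to pod interface names, as in the
// reviewed code. overrides is a hypothetical future input carrying an
// explicit tap device name per network; it may be nil.
func tapDeviceNames(tapNetworks map[string]struct{}, networkNameScheme, overrides map[string]string) map[string]string {
	devices := map[string]string{}
	for netName := range tapNetworks {
		podInterfaceName, exists := networkNameScheme[netName]
		if !exists {
			continue
		}
		if name, ok := overrides[netName]; ok {
			devices[netName] = name
			continue
		}
		// Default rule from this PR: the device carries the pod interface name.
		devices[netName] = podInterfaceName
	}
	return devices
}

func main() {
	tapNetworks := map[string]struct{}{"macvtapnet": {}, "othernet": {}}
	scheme := map[string]string{"macvtapnet": "podaaa", "othernet": "podbbb"}
	overrides := map[string]string{"othernet": "tap0"}
	fmt.Println(tapDeviceNames(tapNetworks, scheme, overrides))
}
```

With a nil `overrides` map the function behaves like the current rule, which is why adding the override later should not break existing bindings.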

Comment on lines 110 to 119
macvtapNetworkConfNAD = `{"apiVersion":"k8s.cni.cncf.io/v1","kind":"NetworkAttachmentDefinition","metadata":{"name":"%s","namespace":"%s", "annotations": {"k8s.v1.cni.cncf.io/resourceName": "macvtap.network.kubevirt.io/%s"}},"spec":{"config":"{ \"cniVersion\": \"0.3.1\", \"name\": \"%s\", \"type\": \"macvtap\"}"}}`
macvtapBindingName = "macvtap"
macvtapLowerDevice = "eth0"
macvtapNetworkName = "net1"
)

createMacvtapNetworkAttachmentDefinition := func(namespace, networkName, macvtapLowerDevice string) error {
macvtapNad := fmt.Sprintf(macvtapNetworkConfNAD, networkName, namespace, macvtapLowerDevice, networkName)
return createNetworkAttachmentDefinition(kubevirt.Client(), networkName, namespace, macvtapNad)
}
Member

Can you please reuse createBasicNetworkAttachmentDefinition or do something similar? This one is too noisy. You know I do not like these closure functions :).

Member Author

Removed it.

Expect(err).NotTo(HaveOccurred())
})

Context("can run a virtual machine with one macvtap interface", func() {
Member

I do not see a need for another indented context here. We are in the "macvtap" context, and if there is a need for some pre-setup, it can be done at that level.

But I also think it is wrong to keep the current separation between the fixture and test body.
The test itself is also about defining the VM correctly.

Member Author

done.

Comment on lines 137 to 139
nodeList = libnode.GetAllSchedulableNodes(kubevirt.Client())
Expect(nodeList.Items).NotTo(BeEmpty(), "schedulable kubernetes nodes must be present")
nodeName := nodeList.Items[0].Name
Member

I do not understand why this is needed.
We do not check this for other tests, what is special about this one?

If there are no scheduled nodes, it should fail on timing out to create the VM.
I do not need to know about these details unless you tell me it is a must.

Member Author

It is not needed, removed.

Expect(nodeList.Items).NotTo(BeEmpty(), "schedulable kubernetes nodes must be present")
nodeName := nodeList.Items[0].Name

chosenMACHW, err := GenerateRandomMac()
Member

Where is this GenerateRandomMac coming from? I guess it should be moved to libnet.

Member Author

func GenerateRandomMac() (net.HardwareAddr, error) {

A separate PR can move it.

Comment on lines 149 to 151
libvmi.WithInterface(
*libvmi.InterfaceWithMac(
&macvtapIface, chosenMAC)),
Member

nit: Can these appear on the same line?

Member Author

done.

libvmi.WithInterface(
*libvmi.InterfaceWithMac(
&macvtapIface, chosenMAC)),
libvmi.WithNetwork(libvmi.MultusNetwork(libvmi.DefaultInterfaceName, macvtapNetworkName)))
Member

I'm having a hard time understanding how macvtap can be the primary network.

Member Author

It is not the primary. Changed the name so it will be clearer.

The old code changed the ownership on the macvtap pod interface.
To allow using macvtap binding as a plugin the code is changed to support
any binding that is using tap domain attachment.
If a device exists, the ownership will be changed (masquerade and bridge
are excluded since we know in advance there is no tap device on those
bindings).


Signed-off-by: Alona Paz <alkaplan@redhat.com>
Member

@EdDev EdDev left a comment

Thank you!

Comment on lines +125 to +126
var vmi *v1.VirtualMachineInstance
var chosenMAC string
Member

nit: You do not need these now.

Expect(err).ToNot(HaveOccurred())
chosenMAC = chosenMACHW.String()

ifaceName := "macvtapIface"
Member

nit: I would call it networkName and set it as a const.

When we use iface, it has too many possible interpretations (logical iface/network name, name on the OS kernel).

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 29, 2023
@AlonaKaplan
Member Author

Raising to approve.

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlonaKaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2023
@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kvaps
Member

kvaps commented Dec 1, 2023

Hi @AlonaKaplan, would you like to add the option from #7648 so that the macvtap method also works for the default podNetworking and other CNIs?

I can look into making this work.

@kubevirt-bot
Contributor

kubevirt-bot commented Dec 1, 2023

@AlonaKaplan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
pull-kubevirt-check-tests-for-flakes | e378ba2 | link | false | /test pull-kubevirt-check-tests-for-flakes


@kubevirt-bot kubevirt-bot merged commit 6017b2a into kubevirt:main Dec 2, 2023
36 of 37 checks passed
@AlonaKaplan
Member Author

/cherry-pick release-1.1

@kubevirt-bot
Contributor

@AlonaKaplan: new pull request created: #10831

In response to this:

/cherry-pick release-1.1


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network size/M
6 participants