Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not try to set VF MAC non-administratively #232

Merged
merged 1 commit into from
Nov 24, 2022
Merged

Do not try to set VF MAC non-administratively #232

merged 1 commit into from
Nov 24, 2022

Conversation

cgoncalves
Copy link
Contributor

The VF MAC address is, when requested, already set administratively via PF and works regardless whether VF trust is on or off.

Trying to set VF MAC address non-administratively without trust on is expected to gracefully fail. For example, here's the dmesg output example:

kernel: i40e 0000:3b:00.0: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
kernel: i40e 0000:3b:00.0: VF 0 failed opcode 10, retval: -1
kernel: iavf 0000:3b:02.0: Failed to add MAC filter, error IAVF_ERR_NVM

The same applies also when releasing the VF.

Tested on Intel XXV710 (i40e/iavf), Intel E810 (ice/iavf) and Mellanox ConnectX-4 (mlx5_core).

Signed-off-by: Carlos Goncalves cgoncalves@redhat.com

@adrianchiris
Copy link
Contributor

to change mac address for SR-IOV you either:

  1. set administrative MAC on PF (with VF index) then unbind bind the VF driver
  2. set administrative MAC on PF (with VF index), bring VF interface down, set its MAC address (to same value) then bring it back up

we are doing the latter. AFAIK this procedure is suppose to work.

@SchSeba
Copy link
Collaborator

SchSeba commented Nov 20, 2022

@adrianchiris you are right we are doing the second option but it's not right.

I know we put the same mac address but for some drivers it still looks like a try to change the mac address on the VF when the VF doesn't have a trust flag on and that is blocked for security reasons

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

small comments :)

return "", fmt.Errorf("failed to set netlink MAC address to %s: %v", hwaddr, err)
}
macAddress = conf.MAC
return fmt.Errorf("error setting temp IF name %s for %s", tempName, linkName)
}

// 4. Change netns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update the number here :)

@@ -14,7 +14,6 @@ type NetlinkManager interface {
LinkSetVfVlan(netlink.Link, int, int) error
LinkSetVfVlanQos(netlink.Link, int, int, int) error
LinkSetVfHardwareAddr(netlink.Link, int, net.HardwareAddr) error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please re-generate the mocks so the function will not be there

LinkState uint32
HostIFName string
SpoofChk bool
AdminMAC string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we can change this to just mac now that we only handle the mac address via the PF

@adrianchiris
Copy link
Contributor

I know we put the same mac address but for some drivers it still looks like a try to change the mac address on the VF when the VF doesn't have a trust flag on and that is blocked for security reasons

so with trust on, its ok to change mac addresses right ? we can configure trust via cni config.
I dont understand why this PR removes existing functionality.

perhaps we should fail CNI call if trust is off but mac address was specified ?

@SchSeba
Copy link
Collaborator

SchSeba commented Nov 20, 2022

Hi @adrianchiris thanks for the comment let me try to answer

That configuration was implemented 2 years ago together with the mac address change from the PF configuration.

So this means that from day 1 it was redundant

we should not failed the CNI if someone want to configure mac address but trust is off.
we should allow it just do it via the pf administrative mac address configuration.

The PF driver will handle the propagation of the mac address to the vf netdevice mac address.

also for example in case of vfio-pci (DPDK) we only configure the mac via the pf administrative configuration that is fine.

@cgoncalves
Copy link
Contributor Author

The PR as-is actually does not work as expected for Mellanox (though I stated it does per testing, my bad).

Setting the VF MAC via PF appears to work but in fact it is not propagated down to the VF (e.g. eno1v0):

# ip link show eno1 | grep 'vf 5'
    vf 5     link/ether e2:f9:b0:31:47:b2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

# ip link show eno1v1
19: eno1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 66:9e:2d:a9:22:7b brd ff:ff:ff:ff:ff:ff
    altname enp25s0f0v1

# ip link set eno1 vf 5 mac e2:f9:b0:31:47:b3

# ip link show eno1 | grep 'vf 5'
    vf 5     link/ether e2:f9:b0:31:47:b3 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

# ip link show eno1v5
23: eno1v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e2:f9:b0:31:47:b2 brd ff:ff:ff:ff:ff:ff
    altname enp25s0f0v5

Running a Kubernetes cluster with this PR, pod VF interfaces will still be configured with the original MAC address while the PF reports the VF has the desired MAC address:

# kubectl exec  -n sriov-operator-tests testpod-server -- ip link show net1
19: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 66:9e:2d:a9:22:7b brd ff:ff:ff:ff:ff:ff
    altname enp25s0f0v1
# ip link show eno1 | grep 'vf 1'
    vf 1     link/ether 20:04:0f:f1:88:a3 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off

I don't know if this is by design or a bug in the Mellanox kernel driver. I looked up for official documentation on how to set VF MAC and all I could find is only via PF (ip link set <PF> vf <VF> mac <MAC>):

@adrianchiris
Copy link
Contributor

Setting the VF MAC via PF appears to work but in fact it is not propagated down to the VF (e.g. eno1v0)

yes, you need to either change MAC on VF netdev (bring interface down, change mac, bring interface up).
or unbind bind VF PCI so it pulls in the updated configuration. this is by design.

for Intel NICs you are seeing the MAC address is updated on the fly for VF netdev when setting their administrative MAC ?

@cgoncalves
Copy link
Contributor Author

for Intel NICs you are seeing the MAC address is updated on the fly for VF netdev when setting their administrative MAC ?

Yes.

@adrianchiris
Copy link
Contributor

adrianchiris commented Nov 21, 2022

i wonder if this is a change in behaviour for intel NIC or was this always failing.

i see support was added around 4 years ago[1]

@cgoncalves i wonder, when the VF interface is up and Admin MAC is changed. what happens to the VF netdev ?

[1] b4a9571

@cgoncalves
Copy link
Contributor Author

cgoncalves commented Nov 21, 2022

It is changed on the fly:

# echo 6 > /sys/class/net/ens1f0/device/sriov_numvfs                                                                                                                                                                                   

# ip link show ens1f0v5
193: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4a:e6:41:68:87:52 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v5

# ip link set  ens1f0v5 up                                                                                                                                                                                                             

# ip link show ens1f0v5                                                                                                                                                                                                                
193: ens1f0v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4a:e6:41:68:87:52 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v5

# ip link set ens1f0 vf 5 mac 4a:e6:41:68:87:53         
                                                                                                                                                                               
# ip link show ens1f0v5                                                                                                                                                                                                                
193: ens1f0v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4a:e6:41:68:87:53 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v

i see support was added around 4 years ago[1]

On Intel SR-IOV NICs, setting the VF netdev MAC address to the same already configured administratively will just be a no-op in the Linux kernel. Setting to a different MAC address, it will fail with -EPERM:

# ip link set ens1f0v5 address 4a:e6:41:68:87:54 
RTNETLINK answers: Permission denied
kernel: i40e 0000:3b:00.0: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
kernel: i40e 0000:3b:00.0: VF 0 failed opcode 10, retval: -1
kernel: iavf 0000:3b:02.0: Failed to add MAC filter, error IAVF_ERR_NVM

@SchSeba
Copy link
Collaborator

SchSeba commented Nov 21, 2022

The next question will be if now we need a split in the code between mlx and intel, or we can introduce a rebind the driver in the sriov-cni

the rebind can have multiple implications like renaming, udev rules, networkManager update, etc so I am not too much in favor of that

@cgoncalves
Copy link
Contributor Author

The next question will be if now we need a split in the code between mlx and intel, or we can introduce a rebind the driver in the sriov-cni

Given we have now confirmed that Mellanox VFs behave that way by design, we might have to yes.

The issue that led to this PR is:
We started seeing recently is that for i40e driver Intel NICs, the driver will accept the VF netdev MAC address request issued by SRIO-CNI while the VF MAC address is still being set administratively/by the PF, leaving the VF configuration inconsistent to the point where IPv6 Neighbor Solicitation packets do not enter the VF. For newer kernels that I tested this issue has been addressed by erroring out. Removing the set of VF MAC netdev would work (but conflicts with Mellanox NICs).

@cgoncalves
Copy link
Contributor Author

Following @SchSeba's suggestion, I could update this PR to add that new logic.

In SetupVF:
  1. check what the driver/vendor is (preferences?)
  2. if result in []string -> set netdev VF MAC address
  3. else -> skip

@adrianchiris WDYT?

@coveralls
Copy link

coveralls commented Nov 21, 2022

Pull Request Test Coverage Report for Build 3540767190

  • 26 of 26 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.4%) to 38.027%

Totals Coverage Status
Change from base Build 3524546022: 1.4%
Covered Lines: 424
Relevant Lines: 1115

💛 - Coveralls

@adrianchiris
Copy link
Contributor

adrianchiris commented Nov 21, 2022

@adrianchiris WDYT?

Id prefer to avoid checking vendor driver if possible. see below some additional thoughts :)

Seems like there are inconsistencies between vendors WRT admin MAC and also spoofchk trust and vlan.
Netronome folks did some comparison. see [1], [2], [3].

@cgoncalves, after setting admin MAC address does the effective mac of the VF netdev change immediately ? or its an async operation ? perhaps @Eoghan1232 has some more info on this.

if it happens immediately (or we can do some sensible polling) then we could read MAC address of the VF and if it differs from desired MAC the set it. in Intel case it would be the same so we will do a no-op.

alternatively is to always try to set it. if it fails with a permission denied, then we could read VF effective MAC if it equals our desired MAC then continue, else fail CNI call. (sensible polling may be needed here as well in case operation is async in kernel).

for cleanup we could use a similar logic.

@cgoncalves what is the cni config you are using ? (are you using trust/spoofchk ?)

[1] https://github.com/Netronome/nic-firmware/blob/master/docs/notes/sriov-trust.rst
[2] https://github.com/Netronome/nic-firmware/blob/master/docs/notes/sriov-spoofchk.rst
[3] https://github.com/Netronome/nic-firmware/blob/master/docs/notes/sriov-vlan.rst

@adrianchiris
Copy link
Contributor

Also @cgoncalves i see the following from you:

On Intel SR-IOV NICs, setting the VF netdev MAC address to the same already configured administratively will just be a no-op in the Linux kernel. Setting to a different MAC address, it will fail with -EPERM:

so the issue is actually only on cleanup ?

because on CMD ADD we would

  1. set admin MAC
  2. set effective MAC to the same value

so it should work according to this. is this the case ?

on cleanup we do the reverse

  1. restore effective MAC
  2. restore admin MAC

here according to what you say, will fail.

then we just need to swap between ReleaseVF and ResetVFConfig on cleanup

@Eoghan1232
Copy link
Collaborator

@cgoncalves, after setting admin MAC address does the effective mac of the VF netdev change immediately ? or its an async operation ? perhaps @Eoghan1232 has some more info on this.

I tested this on a PF/VF with i40e and iavf

ip link set <PF> vf 0 mac 4a:e6:41:68:87:54

ip link show <PF>

3: <PF>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4a:e6:41:68:87:53 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 4a:e6:41:68:87:54 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

The mac address got updated without needing a restart of the VF.

i40e 0000:17:00.0: Setting MAC 4a:e6:41:68:87:54 on VF 0
iavf 0000:17:02.1: Reset warning received from the PF
iavf 0000:17:02.1: Scheduling reset task
i40e 0000:17:00.0: Bring down and up the VF interface to make this change effective.

I can see in dmesg log that the mac gets assigned and it schedules a reset task.
Not sure why it states to bring down/up the VF interface, as it gets update on the fly it seems.

@paravmellanox
Copy link

@cgoncalves, after setting admin MAC address does the effective mac of the VF netdev change immediately ? or its an async operation ? perhaps @Eoghan1232 has some more info on this.

I tested this on a PF/VF with i40e and iavf

ip link set <PF> vf 0 mac 4a:e6:41:68:87:54

ip link show <PF>

3: <PF>: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 4a:e6:41:68:87:53 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 4a:e6:41:68:87:54 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

The mac address got updated without needing a restart of the VF.

i40e 0000:17:00.0: Setting MAC 4a:e6:41:68:87:54 on VF 0
iavf 0000:17:02.1: Reset warning received from the PF
iavf 0000:17:02.1: Scheduling reset task
i40e 0000:17:00.0: Bring down and up the VF interface to make this change effective.

I can see in dmesg log that the mac gets assigned and it schedules a reset task. Not sure why it states to bring down/up the VF interface, as it gets update on the fly it seems.

VF state machine triggering in the kernel on a mac address change is convoluted and racy.
VF side driver may take its own sweet time to complete the reset asynchronously.
An appropriate flow is to VF DRA plugin that follows
a. provision VF
b. bind VF to a right driver (vfio for kubevirt VM) or vendor driver for container/Pod
c. get the necessary devices of choice (rdma, netdev, vdpa, ptp, and more)
d. invoke the CNIs in chain

@cgoncalves
Copy link
Contributor Author

The latest change now instead adds a retry should the driver still be working on propagating the MAC address change down to the VF.

pkg/sriov/sriov.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
just a comment request

pkg/sriov/sriov.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@Eoghan1232 Eoghan1232 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me.
once the comments are added I am happy with it

cmd/sriov/main.go Outdated Show resolved Hide resolved
Some NIC drivers (i.e. i40e/iavf) set their VF MAC addressing
asynchronously when set administratively. This means that while the PF
could already show the VF with the desired MAC address, the netdev VF
may still have the original one. If in this window we issue a netdev VF
MAC address set, the driver will return an error and the pod will fail
to create.

One way to fix this issue would be to not try to set the netdev VF MAC
address, rather simply rely on the MAC address set administratively
already in place. However, other NIC drivers (i.e. mlx5_core) do not
propagate the MAC address down to the netdev VF so for those drivers we
have to continue setting the VF MAC address the same way (via PF and
netdev VF).

This commit addresses the issue with a retry where it waits up to 1
second (5 retries * 200 millisecond sleep) in case driver is still
working on propagating the MAC address change down to the VF.

ResetVFConfig resets a VF administratively. We must run ResetVFConfig
before ReleaseVF because some drivers will error out if we try to reset
netdev VF with trust off. So, reset VF MAC address via PF first.

Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
@Eoghan1232
Copy link
Collaborator

Approved by 2 maintainers - merging this PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants