OCPBUGS-3057: Retry setting VF MAC address #53

Merged
merged 1 commit into openshift:master on Dec 7, 2022

cgoncalves

Cherry-pick from k8snetworkplumbingwg/sriov-cni#232

Some NIC drivers (e.g. i40e/iavf) apply the VF MAC address asynchronously when it is set administratively. This means that while the PF may already show the VF with the desired MAC address, the netdev VF may still have the original one. If we issue a netdev VF MAC address set during this window, the driver returns an error and the pod fails to create.

One way to fix this issue would be to skip setting the netdev VF MAC address and simply rely on the MAC address already set administratively. However, other NIC drivers (e.g. mlx5_core) do not propagate the MAC address down to the netdev VF, so for those drivers we have to continue setting the VF MAC address the same way (via both the PF and the netdev VF).

This commit addresses the issue with a retry: it waits up to 1 second (5 retries * 200 millisecond sleep) in case the driver is still propagating the MAC address change down to the VF.
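
As a rough illustration of the retry described above, here is a minimal Go sketch. It is not the code from this commit: the `retry` helper and the `flakySetter` stand-in for the netdev VF MAC address set are hypothetical names, and only the 5 attempts and the 200 millisecond sleep come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times and sleeps after every failed attempt,
// so 5 attempts with a 200 ms sleep wait at most about 1 second in total.
// Hypothetical helper for illustration; not taken from the commit.
func retry(attempts int, sleep time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(sleep)
	}
	return err
}

// flakySetter stands in for setting the MAC address on the netdev VF.
// It fails the first `failures` calls, then succeeds, and counts its calls.
func flakySetter(failures int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("driver still propagating MAC address")
		}
		return nil
	}, &calls
}

func main() {
	// Simulate a driver that needs two extra rounds before accepting the MAC.
	setMAC, calls := flakySetter(2)
	err := retry(5, 200*time.Millisecond, setMAC)
	fmt.Printf("calls=%d err=%v\n", *calls, err)
}
```

With two simulated failures the set succeeds on the third attempt. If the driver never accepts the address, the last error is returned after the fifth attempt and the caller can fail pod creation as before.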

ResetVFConfig resets a VF administratively. We must run ResetVFConfig before ReleaseVF because some drivers will error out if we try to reset the netdev VF with trust off. So, reset the VF MAC address via the PF first.
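
The ordering constraint can be sketched as follows. This is a hedged illustration, not the project's code: the method names `ResetVFConfig` and `ReleaseVF` come from the description, while the `vfResetter` interface, the argument-free signatures, the `callRecorder` fake, and the choice to abort on the first error are all assumptions.

```go
package main

import "fmt"

// vfResetter is a hypothetical interface. The method names come from the
// description above; the signatures are assumptions for this sketch.
type vfResetter interface {
	ResetVFConfig() error
	ReleaseVF() error
}

// callRecorder is a fake that only records the order in which it is called.
type callRecorder struct {
	calls []string
}

func (r *callRecorder) ResetVFConfig() error {
	r.calls = append(r.calls, "ResetVFConfig")
	return nil
}

func (r *callRecorder) ReleaseVF() error {
	r.calls = append(r.calls, "ReleaseVF")
	return nil
}

// teardownVF resets the VF administratively (via the PF) first and only then
// releases the netdev VF. Aborting on the first error is an assumption.
func teardownVF(m vfResetter) error {
	if err := m.ResetVFConfig(); err != nil {
		return fmt.Errorf("reset VF config via PF: %w", err)
	}
	if err := m.ReleaseVF(); err != nil {
		return fmt.Errorf("release VF: %w", err)
	}
	return nil
}

func main() {
	rec := &callRecorder{}
	if err := teardownVF(rec); err != nil {
		fmt.Println("teardown failed:", err)
		return
	}
	fmt.Println(rec.calls)
}
```

Keeping the two steps behind one function makes the required order explicit in a single place, so a caller cannot release the VF before the administrative reset has run.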

Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>

@openshift-ci-robot added the jira/invalid-bug label Dec 7, 2022
@openshift-ci-robot

@cgoncalves: This pull request references Jira Issue OCPBUGS-3057, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci bot requested review from Billy99 and bn222 December 7, 2022 12:12
@cgoncalves

/jira refresh

@openshift-ci-robot added the jira/valid-bug and bugzilla/valid-bug labels and removed the jira/invalid-bug label Dec 7, 2022
@openshift-ci-robot

@cgoncalves: This pull request references Jira Issue OCPBUGS-3057, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

@cgoncalves

/cherry-pick release-4.12

@openshift-cherrypick-robot

@cgoncalves: once the present PR merges, I will cherry-pick it on top of release-4.12 in a new PR and assign it to you.

@SchSeba

SchSeba commented Dec 7, 2022

/lgtm
/approve

@openshift-ci bot added the lgtm label Dec 7, 2022
@bn222

bn222 commented Dec 7, 2022

/lgtm

@openshift-ci

openshift-ci bot commented Dec 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bn222, cgoncalves, SchSeba

@openshift-ci bot added the approved label Dec 7, 2022
@openshift-ci

openshift-ci bot commented Dec 7, 2022

@cgoncalves: all tests passed!

@openshift-merge-robot merged commit 74fa329 into openshift:master Dec 7, 2022
@openshift-ci-robot

@cgoncalves: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-3057 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@cgoncalves: #53 failed to apply on top of branch "release-4.12":

Applying: Retry setting VF MAC address
Using index info to reconstruct a base tree...
M	cmd/sriov/main.go
M	pkg/sriov/sriov.go
M	pkg/utils/utils.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/utils/utils.go
Auto-merging pkg/sriov/sriov.go
Auto-merging cmd/sriov/main.go
CONFLICT (content): Merge conflict in cmd/sriov/main.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Retry setting VF MAC address
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
