node: e2e: bring up/down SRIOV DP just once #96219
/assign @sjenning
/ok-to-test
The issue I mentioned in #96219 (comment) is very similar to what's happening in https://bugzilla.redhat.com/1878566, but I'm still investigating whether it is the same. I'll keep investigating and file a GitHub issue on k/k if needed.
d87afbb to 6e24efd
The e2e topology manager tests want to check resource alignment using devices, and the easiest devices to use at this moment are the SRIOV devices. The resource alignment test cases are run for each supported policy, in a loop. The tests manage the SRIOV device plugin; up until now, the plugin was set up and torn down at each loop iteration.

There is no real need for that. Each loop iteration must reconfigure (and thus restart) the kubelet, but the device plugin can be set up and torn down just once for all the policies. The kubelet can reconnect just fine to a running device plugin. This way, we greatly reduce the interactions and the complexity of the test environment, making it easier to understand and more robust, and we trim some minutes from the execution time.

However, this patch also hides (it does not solve) a test flake we observed in some environments. The issue is hard to reproduce and not well understood, but it seems to be caused by doing the sriov dp setup/teardown in each policy testing loop. Investigation so far suggests that the kubelet sometimes has stale state after the sriovdp teardown/setup cycle, leading to flakes and false negatives. We tried to address this in kubernetes#95611, with no conclusive results yet.

This patch was posted because overall we believe its gains exceed the drawbacks (hiding the aforementioned flake), and because understanding the potential interaction issues between the sriovdp and the kubelet deserves a separate test.

Signed-off-by: Francesco Romani <fromani@redhat.com>
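The restructuring described in the commit message can be illustrated with a minimal, self-contained Go sketch. It is not the actual e2e code: `fakeEnv`, `runPerPolicy`, `runOnce` and the counters are hypothetical names invented for this illustration, the policy list is only an example, and the order of steps inside the old loop is an assumption. The sketch only counts how often each lifecycle step happens in the two flows.

```go
package main

import "fmt"

// fakeEnv is a hypothetical stand-in for the test environment.
// It only counts lifecycle events; it does not touch a real kubelet or plugin.
type fakeEnv struct {
	pluginSetups    int
	pluginTeardowns int
	kubeletRestarts int
}

func (e *fakeEnv) setupDevicePlugin()           { e.pluginSetups++ }
func (e *fakeEnv) teardownDevicePlugin()        { e.pluginTeardowns++ }
func (e *fakeEnv) restartKubelet(policy string) { e.kubeletRestarts++ }

// runPerPolicy sketches the old flow: the device plugin is set up and
// torn down inside every policy iteration (step order is assumed).
func runPerPolicy(e *fakeEnv, policies []string) {
	for _, p := range policies {
		e.restartKubelet(p) // each policy needs a reconfigured kubelet
		e.setupDevicePlugin()
		// ... resource alignment test cases would run here ...
		e.teardownDevicePlugin()
	}
}

// runOnce sketches the new flow: the device plugin lives across all
// iterations, and only the kubelet is restarted per policy.
func runOnce(e *fakeEnv, policies []string) {
	e.setupDevicePlugin()
	defer e.teardownDevicePlugin()
	for _, p := range policies {
		e.restartKubelet(p) // the kubelet reconnects to the running plugin
		// ... resource alignment test cases would run here ...
	}
}

func main() {
	// Example policy names only; the set the real tests iterate over may differ.
	policies := []string{"none", "best-effort", "restricted", "single-numa-node"}

	before, after := &fakeEnv{}, &fakeEnv{}
	runPerPolicy(before, policies)
	runOnce(after, policies)

	fmt.Printf("per-policy: %+v\n", *before)
	fmt.Printf("once:       %+v\n", *after)
}
```

In this sketch the number of kubelet restarts is the same in both flows, while the device plugin setup/teardown count drops from one per policy to one in total, which is the reduction in interactions the commit message refers to.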
6e24efd to 5610643
/retest
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: fromanirh, sjenning
What type of PR is this?
/kind cleanup
/kind failing-test
/kind flake
What this PR does / why we need it:
Simplify the topology manager e2e tests to make the flow easier to follow, less complex and less fragile
Fixes #
N/A
Special notes for your reviewer:
Please check the commit message for a noteworthy benign side effect of this change.
This seems to be another instance of https://bugzilla.redhat.com/1878566.
Does it deserve a separate GH issue on k/k?
Obsoletes #95611
We didn't see this issue before because it largely depends on the HW on which the test runs.
Additionally, the prow jobs don't run with SRIOV hardware, so it cannot happen in prow.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A