[noderesourcetopology] rewrite accounting of numa-affine resources with scope=container #752
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ffromani
ce6f425 to 898898f
note: at this stage the integration tests are expected to fail - for the right reasons though :)
like this, the expected failures:
/retest
cb364d4 to b81c7a6
8e8065e to 4edb3a9
4edb3a9 to 782512b
85ba7e4 to a9cabd4
/hold more testing ongoing
/cc @philsphicas
@ffromani: GitHub didn't allow me to request PR reviews from the following users: philsphicas. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.
efc52d0 to 73e83e3
/hold cancel
/cc @PiotrProkop
73e83e3 to ec25e3a
Before kubernetes-sigs#710 and kubernetes-sigs#725, we logged the container being processed alongside the pod (identified by its namespace/name pair). That was dropped by mistake, not deliberately. It is useful information when troubleshooting, so let's add it back.

Signed-off-by: Francesco Romani <fromani@redhat.com>
The ephemeral storage resource is not a deciding factor for noderesourcetopology filtering, but it was incorrectly accounted for, causing bad scheduling decisions. First, we add some integration test coverage to catch these issues.

Signed-off-by: Francesco Romani <fromani@redhat.com>
ec25e3a to 9e1d22e
Looks good now, can you just squash the 3rd and 4th commits into one?
Rewrite the accounting of NUMA-local resources when using scope=container. The previous code was too lenient and worked mostly by side effects when dealing with non-NUMA-affine resources.

A non-NUMA-affine resource (aka a host-level resource) is a resource which is not guaranteed to always have a NUMA affinity. CPU and memory (incl. hugepages) always do, but devices may or may not; both options are legal for device plugins. Similarly, ephemeral storage is a prominent example of a resource which should never have a NUMA affinity. The accounting in this case was wrong because the resource was previously considered NUMA affine.

Note: it's legal to configure topology updaters (e.g. NFD) to not advertise CPU and memory in NRT objects. Thus it is best to treat the lack of them as a warning, not as a blocking error. However, if the per-NUMA affine counters go negative, this is definitely an error condition we need to detect and be very loud about.

Signed-off-by: Francesco Romani <fromani@redhat.com>
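The rules in the commit message above can be illustrated with a minimal Go sketch. It is not the plugin's actual code: the `Zone` type and the `FitsZone` and `Subtract` helpers are hypothetical names invented for illustration, quantities are plain `int64` values rather than Kubernetes `resource.Quantity`, and it assumes that a device resource missing from a zone simply has no NUMA affinity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Zone is a hypothetical, simplified view of one NUMA zone:
// resource name -> quantity still available in that zone.
type Zone struct {
	ID        int
	Available map[string]int64
}

// alwaysNUMAAffine: CPU and memory (incl. hugepages) always have NUMA affinity.
func alwaysNUMAAffine(name string) bool {
	return name == "cpu" || name == "memory" || strings.HasPrefix(name, "hugepages-")
}

// neverNUMAAffine: ephemeral storage is a host-level resource.
func neverNUMAAffine(name string) bool {
	return name == "ephemeral-storage"
}

// sortedNames returns the requested resource names in a stable order.
func sortedNames(req map[string]int64) []string {
	names := make([]string, 0, len(req))
	for name := range req {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// FitsZone reports whether a container request fits in the zone.
// Resources the zone does not report are skipped; when such a resource is
// CPU or memory, a warning is returned instead of a blocking error.
func FitsZone(zone Zone, req map[string]int64) (bool, []string) {
	var warnings []string
	for _, name := range sortedNames(req) {
		qty := req[name]
		if qty == 0 || neverNUMAAffine(name) {
			continue // host-level: not a deciding factor
		}
		avail, reported := zone.Available[name]
		if !reported {
			if alwaysNUMAAffine(name) {
				warnings = append(warnings,
					fmt.Sprintf("zone %d does not report %q", zone.ID, name))
			}
			continue // assumed: device without NUMA affinity, or not advertised
		}
		if avail < qty {
			return false, warnings
		}
	}
	return true, warnings
}

// Subtract accounts the request against the zone, touching only resources the
// zone reports. A counter going negative is returned as an error; the zone may
// be left partially updated in that case.
func Subtract(zone Zone, req map[string]int64) error {
	for _, name := range sortedNames(req) {
		avail, reported := zone.Available[name]
		if !reported || neverNUMAAffine(name) {
			continue
		}
		left := avail - req[name]
		if left < 0 {
			return fmt.Errorf("zone %d: resource %q went negative (%d)", zone.ID, name, left)
		}
		zone.Available[name] = left
	}
	return nil
}
```

In this sketch, a request for `cpu`, `memory`, and `ephemeral-storage` against a zone that reports only `cpu` still fits: the missing `memory` yields a warning, and `ephemeral-storage` is ignored entirely. Calling `Subtract` with more `cpu` than the zone has left returns an error instead of silently storing a negative counter.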
plugin.go should contain only the entry point and orchestration code. Let's move all the utilities and logic to other source code files. Trivial code movement with minimal renames.

Signed-off-by: Francesco Romani <fromani@redhat.com>
9e1d22e to 13e9bc7
/lgtm
What type of PR is this?
/kind bug
What this PR does / why we need it:
Rewrite and fix how we treat non-NUMA-affine (aka host-level) resources, adding a bunch of tests and better logs along the way. The most prominent example is ephemeral storage, but we aim to cover the general case.
Which issue(s) this PR fixes:
Fixes #751
Special notes for your reviewer:
N/A
Does this PR introduce a user-facing change?