nodetopologymatch: allow to track only pods with exclusive resources #590

ffromani · 2023-05-23T08:16:39Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Optimize both the foreign pods detection and the overreserve resync loop considering only pods which have exclusive resources assigned (vs considering all of them) because only exclusive resources have NUMA affinity, so only those contribute to per-NUMA accounting. This reduces churn, noise and computing power required. This is also expected to improve the scheduling performance by reducing the time for the resync loop to converge

Which issue(s) this PR fixes:

Fixes N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

nodetopologymatch: track only pods with exclusive resources. Reduces churn and improves the resync latency.

k8s-ci-robot · 2023-05-23T08:16:52Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ffromani]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ffromani · 2023-05-23T08:17:10Z

/hold

still WIP
want to make more progress on nodetopologymatch: improve integration tests #576 before

ffromani · 2023-05-23T12:19:24Z

/retest

looks like a infra glitch

ffromani · 2023-05-24T08:44:20Z

/hold cancel

the PR is reviewable

pkg/noderesourcetopology/cache/foreign_pods.go

ffromani · 2023-05-24T08:46:37Z

/hold

see inline comment

PiotrProkop

small nits, otherwise looks good.

pkg/noderesourcetopology/resourcerequests/exclusive_test.go

pkg/noderesourcetopology/resourcerequests/exclusive.go

pkg/noderesourcetopology/cache/store.go

ffromani · 2023-05-25T09:02:18Z

back to WIP as I revisit the approach and address reviewer's comments

ffromani · 2023-05-26T16:01:15Z

/hold cancel

marking it WIP is enough. Still not reviewable atm (because still WIP)

The `InformerFromHandle` helper was meant to abstract the creation of pod(Shared)Informer and anything related (podLister) from the framework Handle. Having podLister instantiated outside is just a historical accident. Let's fix this with no intended change of behavior. Signed-off-by: Francesco Romani <fromani@redhat.com>

add helper package to identify pods which require some specific resources, most notably exclusive resources (CPU, devices...). We have code already which depends on detecting this pods, and we want to add more logic depending on exclusive resources. Additionally, we add (much more) test coverage for this code. Signed-off-by: Francesco Romani <fromani@redhat.com>

"foreign pods" are pods handled by other schedulers, namely the primary default scheduler. "foreign pods" need to be tracked to make sure the overreserve cache is accurate and trusted. When a foreign pod is detected, the relevant node cache is marked untrusty and verified periodically using the standard resync method. Expose in the configuration the foreign pods setting. Previously, we didn't had this option in the configuration because the foreign pod detection was simple, thus always enabled when the cache was in turn enabled. Tuning the foreign pods detection (to reduce the scheduling latency) benefits now from configurability. The default is the existing behavior, and the cluster admin can now disable entirely the foreign pods detection (perhaps because the NRT plugin is consumed by the main/only scheduler, thus is redundant) or enable it only for pods which have exclusive resources, which is indeed a way to greatly reduce the resync churn and thus the scheduling latency. Finally, the last addition is the option to only consider pods which have exclusive resources assigned to them. This is because only pods which have exclusive resources assigned to them can have NUMA-affine resources, thus contribute to the overreserve accounting. If we ignore pods with non-exclusive resources, we can significantly reduce noise, churn and CPU time. Signed-off-by: Francesco Romani <fromani@redhat.com>

We need to add more fields to pick the right method to compute the PFP, so we replace `types.NamespacedName` with a custom `podData` struct, which will be a superset of the former. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani · 2023-05-29T14:44:46Z

/hold

trying to avoid premature merges

ffromani · 2023-05-30T05:58:00Z

currently hitting: kubernetes/test-infra#29622

ffromani · 2023-06-01T05:36:20Z

/hold cancel

all issues seems to be gone

PiotrProkop · 2023-06-01T07:00:56Z

pkg/noderesourcetopology/cache/store.go

-// the provided Node Resource Topology object.
-func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology) string {
+// the provided Node Resource Topology object. Returns the expected fingerprint and the method to compute it.
+func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, apiconfig.CacheResyncMethod) {


maybe change signature to:

func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, bool) {

and the second return param will be wantsOnlyExclRes.
Then when passing to checkPodFingerprintForNode we can pass only boolean and not pfpMethod

good point. That was the original version of the code. I'm still torn between YAGNI and keep the signature simple (and don't leak apiconfig too much all around the codebase) and be generic and forward looking.
Now that you highlight this, I'm leaning towards reverting to YAGNI and implement your suggestion

When computing the expected node pod fingerprint (PFP), we now look for the attribute describing the computation method provided by the topology-updater agent. If not provided, we fall back to the previous backward-compatible method, which is consider all pods. If the method is supplied, the scheduler plugin now computes the PFP in the same way advertised as the node agent. Finally, in order to keep the user fully in control, add config tunables to force the global behavior of the resync loop. The cluster admin can let the plugin autodetect (default behavior) or force one of the existing methods. Signed-off-by: Francesco Romani <fromani@redhat.com>

Rewrite TestFingerprintFromNRT as tabular test. The test grown a bit wild and it's time to cleanup. No expected change (bar a slight increase by side effect) in coverage. Signed-off-by: Francesco Romani <fromani@redhat.com>

PiotrProkop

/lgtm

k8s-ci-robot requested a review from seanmalloy May 23, 2023 08:16

k8s-ci-robot requested a review from swatisehgal May 23, 2023 08:16

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 23, 2023

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2023

ffromani force-pushed the resync-exclusive-resources branch from 2a01ad5 to f38583b Compare May 24, 2023 08:31

ffromani changed the title ~~WIP: nodetopologymatch: track only pods with exclusive resources~~ nodetopologymatch: track only pods with exclusive resources May 24, 2023

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2023

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023

ffromani commented May 24, 2023

View reviewed changes

pkg/noderesourcetopology/cache/foreign_pods.go Show resolved Hide resolved

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023

PiotrProkop reviewed May 24, 2023

View reviewed changes

pkg/noderesourcetopology/resourcerequests/exclusive_test.go Show resolved Hide resolved

pkg/noderesourcetopology/resourcerequests/exclusive.go Outdated Show resolved Hide resolved

pkg/noderesourcetopology/cache/store.go Outdated Show resolved Hide resolved

ffromani changed the title ~~nodetopologymatch: track only pods with exclusive resources~~ WIP nodetopologymatch: track only pods with exclusive resources May 25, 2023

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2023

ffromani force-pushed the resync-exclusive-resources branch from f38583b to c4ff2f7 Compare May 25, 2023 09:06

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 25, 2023

ffromani changed the title ~~WIP nodetopologymatch: track only pods with exclusive resources~~ WIP nodetopologymatch: allow to track only pods with exclusive resources May 25, 2023

ffromani force-pushed the resync-exclusive-resources branch 2 times, most recently from a77432b to c91ce9b Compare May 26, 2023 16:00

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2023

ffromani force-pushed the resync-exclusive-resources branch from 91357c3 to 74ba154 Compare May 29, 2023 14:31

ffromani added 3 commits May 29, 2023 16:41

nrt: overreserve: add and use podData struct

30100f9

We need to add more fields to pick the right method to compute the PFP, so we replace `types.NamespacedName` with a custom `podData` struct, which will be a superset of the former. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the resync-exclusive-resources branch from 74ba154 to fb7236b Compare May 29, 2023 14:41

ffromani changed the title ~~WIP nodetopologymatch: allow to track only pods with exclusive resources~~ nodetopologymatch: allow to track only pods with exclusive resources May 29, 2023

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2023

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2023

ffromani force-pushed the resync-exclusive-resources branch from fb7236b to 48b753c Compare May 30, 2023 05:53

ffromani force-pushed the resync-exclusive-resources branch from 48b753c to 97728ec Compare May 31, 2023 06:39

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2023

PiotrProkop reviewed Jun 1, 2023

View reviewed changes

ffromani added 2 commits June 1, 2023 10:56

chore: nrt: test: rewrite TestFingerprintFromNRT

f5d7100

Rewrite TestFingerprintFromNRT as tabular test. The test grown a bit wild and it's time to cleanup. No expected change (bar a slight increase by side effect) in coverage. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the resync-exclusive-resources branch from 97728ec to f5d7100 Compare June 1, 2023 09:15

PiotrProkop reviewed Jun 1, 2023

View reviewed changes

k8s-ci-robot assigned PiotrProkop Jun 1, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2023

k8s-ci-robot merged commit cbf2979 into kubernetes-sigs:master Jun 1, 2023
6 checks passed

ffromani deleted the resync-exclusive-resources branch June 1, 2023 09:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nodetopologymatch: allow to track only pods with exclusive resources #590

nodetopologymatch: allow to track only pods with exclusive resources #590

ffromani commented May 23, 2023

k8s-ci-robot commented May 23, 2023

ffromani commented May 23, 2023

ffromani commented May 23, 2023

ffromani commented May 24, 2023

ffromani commented May 24, 2023

PiotrProkop left a comment

ffromani commented May 25, 2023

ffromani commented May 26, 2023

ffromani commented May 29, 2023

ffromani commented May 30, 2023

ffromani commented Jun 1, 2023

PiotrProkop Jun 1, 2023

ffromani Jun 1, 2023 •

edited

Loading

PiotrProkop left a comment

nodetopologymatch: allow to track only pods with exclusive resources #590

nodetopologymatch: allow to track only pods with exclusive resources #590

Conversation

ffromani commented May 23, 2023

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

k8s-ci-robot commented May 23, 2023

ffromani commented May 23, 2023

ffromani commented May 23, 2023

ffromani commented May 24, 2023

ffromani commented May 24, 2023

PiotrProkop left a comment

Choose a reason for hiding this comment

ffromani commented May 25, 2023

ffromani commented May 26, 2023

ffromani commented May 29, 2023

ffromani commented May 30, 2023

ffromani commented Jun 1, 2023

PiotrProkop Jun 1, 2023

Choose a reason for hiding this comment

ffromani Jun 1, 2023 • edited Loading

Choose a reason for hiding this comment

PiotrProkop left a comment

Choose a reason for hiding this comment

ffromani Jun 1, 2023 •

edited

Loading