nodetopologymatch: allow to track only pods with exclusive resources #590
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ffromani
/hold
/retest looks like an infra glitch
2a01ad5 to f38583b
/hold cancel the PR is reviewable
/hold see inline comment
small nits, otherwise looks good.
back to WIP as I revisit the approach and address the reviewers' comments
f38583b to c4ff2f7
a77432b to c91ce9b
/hold cancel marking it WIP is enough. Still not reviewable atm (because still WIP)
The `InformerFromHandle` helper was meant to abstract the creation of the pod (Shared)Informer and anything related (podLister) from the framework Handle. Having podLister instantiated outside is just a historical accident. Let's fix this with no intended change of behavior.

Signed-off-by: Francesco Romani <fromani@redhat.com>
91357c3 to 74ba154
Add a helper package to identify pods which require some specific resources, most notably exclusive resources (CPU, devices...). We already have code which depends on detecting these pods, and we want to add more logic depending on exclusive resources. Additionally, we add (much more) test coverage for this code.

Signed-off-by: Francesco Romani <fromani@redhat.com>

"Foreign pods" are pods handled by other schedulers, most notably the default scheduler. They need to be tracked to keep the overreserve cache accurate and trusted. When a foreign pod is detected, the relevant node cache is marked as untrusted and verified periodically using the standard resync method.

Expose the foreign pods setting in the configuration. Previously we didn't have this option because foreign pod detection was simple, and thus always enabled whenever the cache was enabled. Tuning foreign pod detection (to reduce the scheduling latency) now benefits from configurability. The default is the existing behavior. The cluster admin can now disable foreign pod detection entirely (perhaps because the NRT plugin is consumed by the main/only scheduler, which makes detection redundant) or enable it only for pods which have exclusive resources, which greatly reduces the resync churn and thus the scheduling latency.

Finally, add the option to consider only pods which have exclusive resources assigned to them. Only such pods can have NUMA-affine resources, and thus contribute to the overreserve accounting. Ignoring pods with non-exclusive resources significantly reduces noise, churn and CPU time.

Signed-off-by: Francesco Romani <fromani@redhat.com>
We need to add more fields to pick the right method to compute the PFP, so we replace `types.NamespacedName` with a custom `podData` struct, which will be a superset of the former.

Signed-off-by: Francesco Romani <fromani@redhat.com>
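To illustrate the idea in the commit message above, here is a minimal sketch of a `podData`-like struct that carries the namespace and name plus one extra field. The `HasExclusiveResources` field, the local `NamespacedName` stand-in and the `podsForFingerprint` helper are assumptions made for this example; the actual fields in the PR are not shown on this page.

```go
package main

import "sort"

// NamespacedName is a local stand-in for k8s.io/apimachinery/pkg/types.NamespacedName,
// used here only to keep the sketch free of external dependencies.
type NamespacedName struct {
	Namespace string
	Name      string
}

// podData is a superset of NamespacedName. The extra field is hypothetical.
type podData struct {
	Namespace             string
	Name                  string
	HasExclusiveResources bool
}

// podsForFingerprint (hypothetical helper) selects the pods that should feed
// the fingerprint computation. When onlyExclRes is true, pods without
// exclusive resources are skipped. The result is sorted so it does not depend
// on the order in which the pods were listed.
func podsForFingerprint(pods []podData, onlyExclRes bool) []NamespacedName {
	out := make([]NamespacedName, 0, len(pods))
	for _, p := range pods {
		if onlyExclRes && !p.HasExclusiveResources {
			continue
		}
		out = append(out, NamespacedName{Namespace: p.Namespace, Name: p.Name})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Name < out[j].Name
	})
	return out
}
```

Keeping the selection in one place means the caller only has to decide which method applies and pass a single boolean.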
74ba154 to fb7236b
/hold trying to avoid premature merges
fb7236b to 48b753c
currently hitting: kubernetes/test-infra#29622
48b753c to 97728ec
/hold cancel all issues seem to be gone
- // the provided Node Resource Topology object.
- func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology) string {
+ // the provided Node Resource Topology object. Returns the expected fingerprint and the method to compute it.
+ func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, apiconfig.CacheResyncMethod) {
maybe change signature to:
func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, bool) {
and the second return param will be `wantsOnlyExclRes`. Then, when passing to `checkPodFingerprintForNode`, we can pass only a boolean and not `pfpMethod`.
Good point. That was the original version of the code. I'm still torn between following YAGNI and keeping the signature simple (and not leaking `apiconfig` all around the codebase), and being generic and forward looking. Now that you highlight this, I'm leaning towards reverting to YAGNI and implementing your suggestion.
When computing the expected node pod fingerprint (PFP), we now look for the attribute describing the computation method provided by the topology-updater agent. If it is not provided, we fall back to the previous, backward-compatible method, which is to consider all pods. If the method is supplied, the scheduler plugin now computes the PFP in the same way advertised by the node agent.

Finally, to keep the user fully in control, add config tunables to force the global behavior of the resync loop. The cluster admin can let the plugin autodetect (default behavior) or force one of the existing methods.

Signed-off-by: Francesco Romani <fromani@redhat.com>
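The selection logic described in the commit message above can be sketched as follows, using the `(string, bool)` return shape suggested in the review. This is only an illustration under stated assumptions: the `NodeTopology` and `Attribute` types are simplified stand-ins for the NRT object, and the attribute names, the method value and the constant names are hypothetical, not taken from the PR.

```go
package main

// CacheResyncMethod mirrors the idea of the config tunable. The constant
// names and values below are assumptions made for this sketch.
type CacheResyncMethod string

const (
	ResyncAutodetect             CacheResyncMethod = "Autodetect"
	ResyncAll                    CacheResyncMethod = "All"
	ResyncOnlyExclusiveResources CacheResyncMethod = "OnlyExclusiveResources"
)

// Attribute and NodeTopology are simplified stand-ins for the
// NodeResourceTopology object and its name/value attributes.
type Attribute struct {
	Name  string
	Value string
}

type NodeTopology struct {
	Attributes []Attribute
}

// Hypothetical attribute names and method value.
const (
	attrFingerprint              = "example.io/pod-fingerprint"
	attrFingerprintMethod        = "example.io/pod-fingerprint-method"
	methodWithExclusiveResources = "with-exclusive-resources"
)

func lookupAttribute(attrs []Attribute, name string) (string, bool) {
	for _, a := range attrs {
		if a.Name == name {
			return a.Value, true
		}
	}
	return "", false
}

// podFingerprintForNodeTopology returns the fingerprint advertised by the
// node agent and whether only pods with exclusive resources should be
// considered when recomputing it on the scheduler side.
func podFingerprintForNodeTopology(nrt NodeTopology, configured CacheResyncMethod) (string, bool) {
	pfp, _ := lookupAttribute(nrt.Attributes, attrFingerprint)
	switch configured {
	case ResyncAll:
		return pfp, false // forced by the cluster admin
	case ResyncOnlyExclusiveResources:
		return pfp, true // forced by the cluster admin
	}
	// Autodetect: trust what the node agent advertises, if anything.
	method, ok := lookupAttribute(nrt.Attributes, attrFingerprintMethod)
	if !ok {
		return pfp, false // backward-compatible fallback: consider all pods
	}
	return pfp, method == methodWithExclusiveResources
}
```

Returning a plain boolean keeps the configuration type out of the callers, which is the trade-off discussed in the review thread.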
Rewrite TestFingerprintFromNRT as a table-driven test. The test has grown a bit wild and it's time to clean up. No change in coverage is expected (bar a slight increase as a side effect).

Signed-off-by: Francesco Romani <fromani@redhat.com>
97728ec to f5d7100
/lgtm
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Optimize both the foreign pods detection and the overreserve resync loop by considering only pods which have exclusive resources assigned (instead of all pods). Only exclusive resources have NUMA affinity, so only those pods contribute to per-NUMA accounting. This reduces churn, noise and the computing power required. It is also expected to improve scheduling performance by reducing the time the resync loop takes to converge.
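As a rough illustration of the filtering described above, the sketch below decides whether a pod counts as a tracked foreign pod under each detection mode. Everything here is an assumption made for the example: the `Pod` and `Container` types are simplified stand-ins for the Kubernetes API types, the mode names are hypothetical, and the exclusivity rule (any requested device, or an integral CPU request in a Guaranteed QoS pod, as with the kubelet static CPU manager policy) may differ from what the PR's helper package implements.

```go
package main

// Container and Pod are hypothetical, simplified stand-ins for the Kubernetes
// API types; quantities are plain integers to keep the sketch dependency-free.
type Container struct {
	CPURequestMilli, CPULimitMilli int64
	MemRequestBytes, MemLimitBytes int64
	Devices                        map[string]int64 // device name -> count
}

type Pod struct {
	SchedulerName string
	NodeName      string
	Containers    []Container
}

// ForeignPodsDetectMode is a hypothetical name for the configuration tunable.
type ForeignPodsDetectMode string

const (
	DetectAll                    ForeignPodsDetectMode = "All"
	DetectNone                   ForeignPodsDetectMode = "None"
	DetectOnlyExclusiveResources ForeignPodsDetectMode = "OnlyExclusiveResources"
)

// isGuaranteed approximates the Guaranteed QoS class: every container sets
// CPU and memory requests equal to its limits.
func isGuaranteed(p Pod) bool {
	if len(p.Containers) == 0 {
		return false
	}
	for _, c := range p.Containers {
		if c.CPURequestMilli <= 0 || c.CPURequestMilli != c.CPULimitMilli {
			return false
		}
		if c.MemRequestBytes <= 0 || c.MemRequestBytes != c.MemLimitBytes {
			return false
		}
	}
	return true
}

// HasExclusiveResources reports whether any container requests devices, or
// requests whole CPUs while the pod is in the Guaranteed QoS class.
func HasExclusiveResources(p Pod) bool {
	guaranteed := isGuaranteed(p)
	for _, c := range p.Containers {
		for _, count := range c.Devices {
			if count > 0 {
				return true
			}
		}
		if guaranteed && c.CPURequestMilli%1000 == 0 {
			return true
		}
	}
	return false
}

// TracksAsForeign reports whether a pod should mark its node cache as
// untrusted: it must be bound to a node, owned by another scheduler, and
// allowed by the configured detection mode.
func TracksAsForeign(mode ForeignPodsDetectMode, ownScheduler string, p Pod) bool {
	if p.NodeName == "" || p.SchedulerName == ownScheduler {
		return false
	}
	switch mode {
	case DetectNone:
		return false
	case DetectOnlyExclusiveResources:
		return HasExclusiveResources(p)
	default:
		return true
	}
}
```

With the narrower mode, a burstable pod placed by another scheduler no longer triggers a resync of its node, which is where the reduction in churn would come from.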
Which issue(s) this PR fixes:
Fixes N/A
Special notes for your reviewer:
Does this PR introduce a user-facing change?