Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodetopologymatch: allow to track only pods with exclusive resources #590

Merged

Conversation

ffromani
Copy link
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Optimize both the foreign pods detection and the overreserve resync loop considering only pods which have exclusive resources assigned (vs considering all of them) because only exclusive resources have NUMA affinity, so only those contribute to per-NUMA accounting. This reduces churn, noise and computing power required. This is also expected to improve the scheduling performance by reducing the time for the resync loop to converge

Which issue(s) this PR fixes:

Fixes N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

nodetopologymatch: track only pods with exclusive resources. Reduces churn and improves the resync latency.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 23, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 23, 2023
@ffromani
Copy link
Contributor Author

/hold

  1. still WIP
  2. want to make more progress on nodetopologymatch: improve integration tests #576 before

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2023
@ffromani
Copy link
Contributor Author

/retest

looks like a infra glitch

@ffromani ffromani force-pushed the resync-exclusive-resources branch from 2a01ad5 to f38583b Compare May 24, 2023 08:31
@ffromani ffromani changed the title WIP: nodetopologymatch: track only pods with exclusive resources nodetopologymatch: track only pods with exclusive resources May 24, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2023
@ffromani
Copy link
Contributor Author

/hold cancel

the PR is reviewable

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@ffromani
Copy link
Contributor Author

/hold

see inline comment

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
Copy link
Contributor

@PiotrProkop PiotrProkop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits, otherwise looks good.

pkg/noderesourcetopology/resourcerequests/exclusive.go Outdated Show resolved Hide resolved
pkg/noderesourcetopology/cache/store.go Outdated Show resolved Hide resolved
@ffromani ffromani changed the title nodetopologymatch: track only pods with exclusive resources WIP nodetopologymatch: track only pods with exclusive resources May 25, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2023
@ffromani
Copy link
Contributor Author

back to WIP as I revisit the approach and address reviewer's comments

@ffromani ffromani force-pushed the resync-exclusive-resources branch from f38583b to c4ff2f7 Compare May 25, 2023 09:06
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 25, 2023
@ffromani ffromani changed the title WIP nodetopologymatch: track only pods with exclusive resources WIP nodetopologymatch: allow to track only pods with exclusive resources May 25, 2023
@ffromani ffromani force-pushed the resync-exclusive-resources branch 2 times, most recently from a77432b to c91ce9b Compare May 26, 2023 16:00
@ffromani
Copy link
Contributor Author

/hold cancel

marking it WIP is enough. Still not reviewable atm (because still WIP)

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2023
The `InformerFromHandle` helper was meant to abstract the creation
of pod(Shared)Informer and anything related (podLister) from the
framework Handle. Having podLister instantiated outside is just
a historical accident. Let's fix this with no intended change
of behavior.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the resync-exclusive-resources branch from 91357c3 to 74ba154 Compare May 29, 2023 14:31
add helper package to identify pods which require some specific
resources, most notably exclusive resources (CPU, devices...).
We have code already which depends on detecting this pods, and
we want to add more logic depending on exclusive resources.

Additionally, we add (much more) test coverage for this code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
"foreign pods" are pods handled by other schedulers, namely
the primary default scheduler.
"foreign pods" need to be tracked to make sure the overreserve
cache is accurate and trusted. When a foreign pod is detected,
the relevant node cache is marked untrusty and verified periodically
using the standard resync method.

Expose in the configuration the foreign pods setting.
Previously, we didn't had this option in the configuration
because the foreign pod detection was simple, thus always
enabled when the cache was in turn enabled.

Tuning the foreign pods detection (to reduce the scheduling latency)
benefits now from configurability.

The default is the existing behavior, and the cluster admin
can now disable entirely the foreign pods detection (perhaps because
the NRT plugin is consumed by the main/only scheduler, thus is
redundant) or enable it only for pods which have exclusive resources,
which is indeed a way to greatly reduce the resync churn and thus
the scheduling latency.

Finally, the last addition is the option to only consider pods
which have exclusive resources assigned to them. This is because
only pods which have exclusive resources assigned to them can have
NUMA-affine resources, thus contribute to the overreserve accounting.

If we ignore pods with non-exclusive resources, we can significantly
reduce noise, churn and CPU time.

Signed-off-by: Francesco Romani <fromani@redhat.com>
We need to add more fields to pick the right method to compute
the PFP, so we replace `types.NamespacedName` with a custom `podData`
struct, which will be a superset of the former.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the resync-exclusive-resources branch from 74ba154 to fb7236b Compare May 29, 2023 14:41
@ffromani ffromani changed the title WIP nodetopologymatch: allow to track only pods with exclusive resources nodetopologymatch: allow to track only pods with exclusive resources May 29, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2023
@ffromani
Copy link
Contributor Author

/hold

trying to avoid premature merges

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2023
@ffromani ffromani force-pushed the resync-exclusive-resources branch from fb7236b to 48b753c Compare May 30, 2023 05:53
@ffromani
Copy link
Contributor Author

currently hitting: kubernetes/test-infra#29622

@ffromani ffromani force-pushed the resync-exclusive-resources branch from 48b753c to 97728ec Compare May 31, 2023 06:39
@ffromani
Copy link
Contributor Author

ffromani commented Jun 1, 2023

/hold cancel

all issues seems to be gone

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2023
// the provided Node Resource Topology object.
func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology) string {
// the provided Node Resource Topology object. Returns the expected fingerprint and the method to compute it.
func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, apiconfig.CacheResyncMethod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change signature to:

func podFingerprintForNodeTopology(nrt *topologyv1alpha2.NodeResourceTopology, method apiconfig.CacheResyncMethod) (string, bool) {

and the second return param will be wantsOnlyExclRes.
Then when passing to checkPodFingerprintForNode we can pass only boolean and not pfpMethod

Copy link
Contributor Author

@ffromani ffromani Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. That was the original version of the code. I'm still torn between YAGNI and keep the signature simple (and don't leak apiconfig too much all around the codebase) and be generic and forward looking.
Now that you highlight this, I'm leaning towards reverting to YAGNI and implement your suggestion

When computing the expected node pod fingerprint (PFP), we
now look for the attribute describing the computation method
provided by the topology-updater agent. If not provided, we
fall back to the previous backward-compatible method, which
is consider all pods.

If the method is supplied, the scheduler plugin now computes
the PFP in the same way advertised as the node agent.

Finally, in order to keep the user fully in control, add
config tunables to force the global behavior of the resync
loop. The cluster admin can let the plugin autodetect
(default behavior) or force one of the existing methods.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Rewrite TestFingerprintFromNRT as tabular test.
The test grown a bit wild and it's time to cleanup. No expected
change (bar a slight increase by side effect) in coverage.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the resync-exclusive-resources branch from 97728ec to f5d7100 Compare June 1, 2023 09:15
Copy link
Contributor

@PiotrProkop PiotrProkop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2023
@k8s-ci-robot k8s-ci-robot merged commit cbf2979 into kubernetes-sigs:master Jun 1, 2023
6 checks passed
@ffromani ffromani deleted the resync-exclusive-resources branch June 1, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants