Skip to content

Conversation

@xuzhenglun
Copy link
Contributor

@xuzhenglun xuzhenglun commented Oct 30, 2025

/kind bug

What this PR does / why we need it:

When a plugin is doing filter, each node will generate a event and send it to r.bufferCh. meanwhile, the buffer is flushed in every seconds. So if cluster is larger than the buffer size of channel, r.bufferCh can be easily full and overflowed. it cause plugin in later position barely have chance to be seen.

This fix make sure the buffer can be flushed ASAP, to avoid channel overflow as much as possible.

Which issue(s) this PR is related to:

Fixes: #134972

Special notes for your reviewer:

Without this fix, the chart in the bottom left, is evaluated by:

sum by (extension_point, status) (rate(scheduler_plugin_execution_duration_seconds_sum{plugin="DynamicResources"}[$__rate_interval]))

The line in the chart look very discontinuous, and in most time the duration of Filter only cost 0.

image

And with this fix, the chart looks normal now, and no performance regressed:
image

Furthermore, the data in the chart appears similar to the previous E2E test with 100 nodes: kubernetes/perf-tests#3368
https://prow.k8s.io/?job=ci-kubernetes-e2e-gce-100-node-dra-with-workload

Does this PR introduce a user-facing change?

none

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 30, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 30, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xuzhenglun
Once this PR has been reviewed and has the lgtm label, please assign kerthcet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xuzhenglun xuzhenglun changed the title [WIP] make sure each plugin have even change to be monitor [WIP] make sure each plugin have even chance to be monitor Oct 30, 2025
@xuzhenglun xuzhenglun force-pushed the metric branch 3 times, most recently from f30c1c6 to 0c39134 Compare October 30, 2025 17:26
@xuzhenglun xuzhenglun changed the title [WIP] make sure each plugin have even chance to be monitor [WIP] make sure MetricAsyncRecorder flush buffer ASAP to avoid overflow Oct 31, 2025
@xuzhenglun
Copy link
Contributor Author

/retest

@xuzhenglun xuzhenglun changed the title [WIP] make sure MetricAsyncRecorder flush buffer ASAP to avoid overflow make sure MetricAsyncRecorder flush buffer ASAP to avoid overflow Oct 31, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Inaccurate metric scheduler_plugin_execution_duration_seconds in large cluster

2 participants