Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Surface more information about plugin scores in scheduler" #99914

Merged
merged 1 commit into from Mar 7, 2021

Conversation

Huang-Wei
Copy link
Member

@Huang-Wei Huang-Wei commented Mar 7, 2021

What type of PR is this?

/kind regression
/sig scheduling

What this PR does / why we need it:

This PR reverts #99411.

Tried locally that change v(4) to v(5) works, but that doesn't fix the root cause - users who customized logging level to 5 or greater can still encounter this issue. So I proposed to revert #99411 and dig out why it regresses.

Which issue(s) this PR fixes:

Fixes #99913

NONE

@k8s-ci-robot k8s-ci-robot added kind/regression Categorizes issue or PR as related to a regression from a prior release. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 7, 2021
@k8s-ci-robot
Copy link
Contributor

@Huang-Wei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 7, 2021
@Huang-Wei
Copy link
Member Author

/priority critical-urgent

/assign @damemi
/cc @ahg-g @alculquicondor

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 7, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 7, 2021
@ahg-g
Copy link
Member

ahg-g commented Mar 7, 2021

can you explain the regression?

@ahg-g
Copy link
Member

ahg-g commented Mar 7, 2021

but that doesn't fix the root cause - users who customized logging level to 5 or greater can still encounter this issue.

Which issue?

@ahg-g
Copy link
Member

ahg-g commented Mar 7, 2021

oh, you are referring to #99913

Well, the benchmark should be running with the default v=2 logging level. I am guessing that increasing the logging level to 3 or 4 is done sometimes temporarily.

That being said, I don't quite like the way #99411 was implemented, it seems a bit ad hoc.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 7, 2021
@Huang-Wei
Copy link
Member Author

oh, you are referring to #99913

Yes, I referred to #99913. Sorry, there was a typo.

Well, the benchmark should be running with the default v=2 logging level. I am guessing that increasing the logging level to 3 or 4 is done sometimes temporarily.

Here is the magic:

// Ensure we log at least level 4
v := flag.Lookup("v").Value
level, _ := strconv.Atoi(v.String())
if level < 4 {
v.Set("4")
}

But using which logging level is not the key point - #99411 is beneficial only when the desired logging level is enabled, but when that level is enabled, performance downgrades.

Copy link
Contributor

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance drop is unexpected, though I would like to continue working on this to find a more efficient way to surface information like this.

I think we made some decisions in the original PR that have affected this, such as duplicating the calculation loop in order to separate out the logging information into its own function.

Due to that, we are essentially doing much of this work twice (which would explain the ~50% drop in throughput). I think that some reasonable level of efficiency drop is acceptable, because like @ahg-g said this is not a default logging level. This much is not acceptable.

Thanks for catching this and opening the reversion @Huang-Wei. I'll work to get a new PR with performance testing. Any other suggestions for how to improve this are welcome.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2021
@Huang-Wei
Copy link
Member Author

Thanks for catching this and opening the reversion @Huang-Wei. I'll work to get a new PR with performance testing. Any other suggestions for how to improve this are welcome.

Thanks @damemi ! As code freeze is around the corner, reverting may be the most viable option to unblock other PRs.

And it's easy to reproduce the perf drop, reverting can buy us more time to find out the root cause.

@Huang-Wei
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 9a9a9b0 into kubernetes:master Mar 7, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 7, 2021
@Huang-Wei Huang-Wei deleted the throughput-regression branch March 8, 2021 02:29
@alculquicondor
Copy link
Member

I don't think the performance drop is unexpected. But I don't think it's related to the calculation. It's probably about IO. We have 5000 nodes in those tests. That's a lot of writes.

I just didn't think that tests would run at level 4 :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler throughput drops significantly
5 participants