Change CPUCFSQuotaPeriod default value from 100ms to 100us to match Linux default #111520

Merged
merged 1 commit into from
Aug 24, 2022

Contributor

@paskal paskal commented Jul 28, 2022

cpu.cfs_period_us is 100μs by default despite having an "ms" unit for some unfortunate reason. Documentation: https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management

The intent of this change is to align the k8s default CPUCFSQuotaPeriod value (100ms before this change) with the value used in k8s when CPUCFSQuotaPeriod is not set, and with the Linux CFS default (100μs, 1000x smaller than 100ms).

This PR is a follow-up to #63437, which introduced the default of 100ms in k8s v1.12.

/kind api-change

Does this PR introduce a user-facing change? Yes, described above.

[kubelet] Change default `cpuCFSQuotaPeriod` value with enabled `cpuCFSQuotaPeriod` flag from 100ms to 100µs to match the Linux CFS and k8s defaults. `cpuCFSQuotaPeriod` of 100ms now requires `customCPUCFSQuotaPeriod` flag to be set to work.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/documentation Categorizes issue or PR as related to documentation. labels Jul 28, 2022
@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 28, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: paskal / name: Dmitry Verkhoturov (c74977235aabe8ce87587c1eb20890190a7357ba)

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 28, 2022
@k8s-ci-robot
Contributor

Welcome @paskal!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @paskal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 28, 2022
@k8s-ci-robot k8s-ci-robot added area/code-generation area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 28, 2022
@szuecs
Member

szuecs commented Aug 3, 2022

/retest

@paskal
Contributor Author

paskal commented Aug 3, 2022

Unit tests failed because the yaml file had `us` instead of `µs`; I have fixed that now. I don't know why the e2e tests are failing or how to fix them. Reverting the yaml files to 100ms produces the same failure in the e2e tests, so the failures appear unrelated to those files.

@bobbypage
Member

> Unit tests failed because the yaml file had `us` instead of `µs`; I have fixed that now. I don't know why the e2e tests are failing or how to fix them. Reverting the yaml files to 100ms produces the same failure in the e2e tests, so the failures appear unrelated to those files.

E2E tests need to be passing to get the PR merged.

Please see here - https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/111520/pull-kubernetes-e2e-gce-ubuntu-containerd/1554830092503355392/artifacts/e2e-e2a60cb2ac-a7d53-minion-group-05r1/kubelet.log

Kubelet is failing to start because:

Aug 03 14:35:19.780469 e2e-e2a60cb2ac-a7d53-minion-group-05r1 kubelet[4188]: E0803 14:35:19.772966    4188 run.go:74] "command failed" err="failed to validate kubelet configuration, error: invalid configuration: cpuCFSQuotaPeriod (--cpu-cfs-quota-period) {100µs} requires feature gate CustomCPUCFSQuotaPeriod, path: &TypeMeta{Kind:,APIVersion:,}"
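
A minimal sketch of how a changed default alone could trip this check, assuming the validation compares the configured period against a fixed 100ms constant instead of against whatever default the config applies. The names `fixedDefaultPeriod`, `proposedDefaultPeriod`, and `validatePeriod` are hypothetical and are not taken from the kubelet source; only the error wording mirrors the log line above.

```go
package main

import (
	"fmt"
	"time"
)

// fixedDefaultPeriod is an assumed stand-in for the value the validation
// treats as "the default"; it does not follow changes made to the config default.
const fixedDefaultPeriod = 100 * time.Millisecond

// proposedDefaultPeriod is the default this PR proposed.
const proposedDefaultPeriod = 100 * time.Microsecond

// validatePeriod is a hypothetical, simplified check: any period other than
// fixedDefaultPeriod is rejected unless the feature gate is enabled.
func validatePeriod(period time.Duration, customPeriodGate bool) error {
	if !customPeriodGate && period != fixedDefaultPeriod {
		return fmt.Errorf("cpuCFSQuotaPeriod (--cpu-cfs-quota-period) {%v} requires feature gate CustomCPUCFSQuotaPeriod", period)
	}
	return nil
}

func main() {
	// The old default passes without the gate.
	fmt.Println(validatePeriod(fixedDefaultPeriod, false))
	// The new default is rejected without the gate, even though nobody set it explicitly.
	fmt.Println(validatePeriod(proposedDefaultPeriod, false))
	// With the gate enabled, this simplified check accepts it.
	fmt.Println(validatePeriod(proposedDefaultPeriod, true))
}
```

Under that assumption, no configuration override is needed to hit the error: the new default simply no longer equals the constant the check expects.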

@paskal
Contributor Author

paskal commented Aug 4, 2022

Thanks a lot for the pointer to the log location! I'll fix it, but I'm struggling to understand why my changes triggered that problem, as I only changed an existing value and haven't introduced configuration overrides anywhere in the code.

cpu.cfs_period_us is 100μs by default despite having an "ms" unit
for some unfortunate reason. Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management

The desired effect of that change is to match
k8s default `CPUCFSQuotaPeriod` value (100ms before that change)
with one used in k8s without the `CustomCPUCFSQuotaPeriod` flag enabled
and Linux CFS (100us, 1000x smaller than 100ms).
@paskal
Contributor Author

paskal commented Aug 10, 2022

I'm at a loss as to how to properly fix the error that CI now highlights: for some reason, thousands of lines are supposed to be added to api/openapi-spec/swagger.json and other files such as api/openapi-spec/v3/apis__autoscaling__v2beta2_openapi.json. To me, as a naive contributor, it looks like I've triggered generation of a whole new version of the APIs. Can someone please tell me what the right course of action is here?

@liggitt
Member

liggitt commented Aug 10, 2022

It was an unrelated failure, now resolved in master.

/retest

@paskal
Contributor Author

paskal commented Aug 10, 2022

Thanks!

Reviewers, please review the code and the changelog entry in the PR description: the scope of this PR has changed since it was first opened, so it needs a proper review before merging. #111554 (a code-comments-only change) should be reviewed as well.

@liggitt liggitt added this to API review completed, 1.25 in API Reviews Aug 10, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, paskal, szuecs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@paskal
Contributor Author

paskal commented Aug 20, 2022

@bobbypage, could you please tell me if something is still missing here and in #111554 that prevents merging them into master? I'm not familiar enough with the k8s merge flow to know what is missing here at the moment. In the other PR, approval is missing, but I have already pinged all approvers once and nothing happened.

@PixelOrange

@paskal It looks to me like there are three people still pending approval on this one:

bobbypage
liggitt
mrunalp

It also says tide is pending and the PR isn't mergeable, although I am not familiar with tide. I hope they see it soon; I've been anxiously awaiting this change since you first opened it.

Member

@endocrimes endocrimes left a comment

/lgtm

it'll merge when we thaw the codebase after the release of 1.25.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2022
@dims
Member

dims commented Aug 23, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 23, 2022
@bobbypage
Member

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 08aac4f into kubernetes:master Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Aug 24, 2022
@paskal paskal deleted the paskal/clarify_cfs_period_us branch August 24, 2022 03:20
@paskal
Contributor Author

paskal commented Aug 31, 2022

For the record, I was wrong: the default value is 100ms, not 100μs. PR #112077 reverted these breaking changes, and PR #112123 clarified the code to prevent future mistakes about units of measurement around this feature. I'll try to clarify the linked kernel.org documentation as well.
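
To make the unit relationship concrete, here is a small sketch of converting a period into the integer that a file named `cpu.cfs_period_us` expects, i.e. a count of microseconds. The names `kubeletDefault`, `mistakenDefault`, and `toCFSPeriodUs` are illustrative only and do not come from the k8s source.

```go
package main

import (
	"fmt"
	"time"
)

// kubeletDefault is the 100ms default discussed in this thread.
const kubeletDefault = 100 * time.Millisecond

// mistakenDefault is the 100µs value this PR briefly introduced.
const mistakenDefault = 100 * time.Microsecond

// toCFSPeriodUs converts a duration to the whole number of microseconds
// that would be written to cpu.cfs_period_us.
func toCFSPeriodUs(d time.Duration) int64 {
	return d.Microseconds()
}

func main() {
	fmt.Println(toCFSPeriodUs(kubeletDefault))  // 100000
	fmt.Println(toCFSPeriodUs(mistakenDefault)) // 100
}
```

A period of 100ms is therefore expressed as 100000 in `cpu.cfs_period_us`; the `_us` suffix names the unit of the number, not the magnitude of the default.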

@PixelOrange

@paskal I read through your discussion in #112108 and while I now also follow, I believe that the setting should not include a unit if the default value is a conflicting unit. If the value can never be less than 1ms then there's never a reason for microseconds to come into play. I think the "_us" should be removed from the setting and the current description stating the range of valid values be left unchanged.

Unfortunately, I don't know how to go about requesting that change. I'm coming at this from the same perspective as many other people who've stumbled on these conversations due to poorly performing applications under CFS. What do you suggest?

@paskal
Contributor Author

paskal commented Aug 31, 2022

@PixelOrange, I suggest you check the changes in #112123 and verify whether anything is missing there from your perspective. The valid range of values is 1ms-1s, and the k8s source code will be updated to reflect it.

The `_us` suffix appears only in the Linux kernel and is part of its public API, meaning it will never change. The only thing that could be changed is the documentation around the topic, and I'll try to come up with that change.
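
The 1ms-1s range mentioned above can be sketched as a simple bounds check. This is an illustration under stated assumptions, not the actual k8s validation: `minPeriod`, `maxPeriod`, and `checkPeriodRange` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// minPeriod and maxPeriod encode the valid range stated in this thread.
	minPeriod = 1 * time.Millisecond
	maxPeriod = 1 * time.Second
)

// checkPeriodRange returns an error when d lies outside [minPeriod, maxPeriod].
func checkPeriodRange(d time.Duration) error {
	if d < minPeriod || d > maxPeriod {
		return fmt.Errorf("period %v is outside the valid range [%v, %v]", d, minPeriod, maxPeriod)
	}
	return nil
}

func main() {
	// 100ms is inside the range.
	fmt.Println(checkPeriodRange(100 * time.Millisecond))
	// 100µs is below the lower bound, so a check like this would reject it.
	fmt.Println(checkPeriodRange(100 * time.Microsecond))
}
```

Both bounds are inclusive in this sketch, so exactly 1ms and exactly 1s are accepted.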

@PixelOrange

Everything looks good as far as I can tell.

That's unfortunate about the public API. I found an article on how to submit kernel patches but I imagine that's a daunting task.

If you want a second set of eyes for the documentation updates, please feel free to message me.

@szuecs
Member

szuecs commented Aug 31, 2022

Thanks @paskal for going the long way.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.25
10 participants