
kube-controller-manager v1.26 high cpu usage #118706

Closed · zigmund opened this issue Jun 16, 2023 · 25 comments
Assignees: sxllwx, soltysh
Labels: kind/bug, kind/regression, priority/important-soon, sig/apps, triage/accepted

Comments

zigmund commented Jun 16, 2023

What happened?

Upgraded the cluster from v1.24.6 to v1.26.4 (via v1.25.9), and after that kube-controller-manager started consuming all available CPU:
[screenshot: kube-controller-manager CPU usage]

I also see a massive increase in the rate of workqueue_retries_total{name="cronjob"} - from 2-3 per second to 20-30k:
[screenshot: workqueue_retries_total rate]

I dumped a pprof profile from kube-controller-manager and also see massive cronjob-related load:
[screenshot: pprof profile]

What did you expect to happen?

The same kube-controller-manager CPU usage as before the upgrade.

How can we reproduce it (as minimally and precisely as possible)?

I don't know. We have 7 similar clusters on the same version - the issue is present in only one of them.

Anything else we need to know?

~170 cronjobs
~300 jobs

Deleted some cronjobs and old failed jobs - didn't see any effect.

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.4", GitCommit:"f89670c3aa4059d6999cb42e23ccb4f0b9a03979", GitTreeState:"clean", BuildDate:"2023-04-12T12:05:35Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

none, self-hosted baremetal

OS version

# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ uname -a
Linux kube-[REDACTED] 5.15.0-53-generic #59~20.04.1-Ubuntu SMP Thu Oct 20 15:10:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

ansible

Container runtime (CRI) and version (if applicable)

containerd 1.6.8-1

Related plugins (CNI, CSI, ...) and versions (if applicable)

flannel v0.14.0
zigmund added the kind/bug label Jun 16, 2023
k8s-ci-robot added the needs-sig and needs-triage labels Jun 16, 2023
zigmund (Author) commented Jun 16, 2023

/sig api-machinery

k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label Jun 16, 2023
aojea (Member) commented Jun 16, 2023

/sig scheduling

@alculquicondor I think sig-scheduling is handling the cronjob and job controllers, right?

k8s-ci-robot added the sig/scheduling label Jun 16, 2023
alculquicondor (Member) commented:

No :) I personally look at Job, though

/remove-sig scheduling
/sig apps
cc @soltysh

k8s-ci-robot added the sig/apps label and removed the sig/scheduling label Jun 16, 2023
alculquicondor (Member) commented:

CronJob controller v2 went stable in 1.22, so if anything, the regression is something more recent.

alculquicondor (Member) commented Jun 16, 2023

My wild guess would be that somehow kcm is reconciling all CronJobs even though they are not due for schedule.

@zigmund any chance you can look at the logs and see whether a particular cronjob with a low frequency is being reconciled continuously?

zigmund (Author) commented Jun 17, 2023

@alculquicondor
I checked a few monthly cronjobs and don't see such a problem.

But I do see repeated log entries for the same jobs:

I0617 20:28:00.368553      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:00.369321      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:00.409013      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:01.056718      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:03.946792      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:16.982832      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:17.997649      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:20.009703      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:21.034160      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:21.055513      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868
I0617 20:28:21.066377      12 job_controller.go:514] enqueueing job market-production/api-scheduler-28116868

Also there are many error logs:

E0617 20:37:18.442053      12 cronjob_controllerv2.go:166] error syncing CronJobController market-production/api-scheduler, requeuing: Operation cannot be fulfilled on cronjobs.batch "api-scheduler": the object has been modified; please apply your changes to the latest version and try again

But I see the same log entries in the other, healthy clusters too.

zigmund (Author) commented Jun 19, 2023

@alculquicondor

I started kcm locally to debug, and this is what I found so far (a minimal sketch of the resulting busy loop follows the two examples below):
Two cronjobs are constantly re-added to the workqueue with a negative *requeueAfter: https://github.com/kubernetes/kubernetes/blob/v1.26.4/pkg/controller/cronjob/cronjob_controllerv2.go#L168-L171
The duration is calculated wrongly in getMostRecentScheduleTime: https://github.com/kubernetes/kubernetes/blob/v1.26.4/pkg/controller/cronjob/utils.go#L121-L143

1. Cronjob parts-panel-cronjob-products-data:
Schedule:                      0 16 * * *
Concurrency Policy:            Forbid
Last Schedule Time:  Fri, 16 Jun 2023 16:00:00 +0600
Active Jobs:         parts-panel-cronjob-products-data-28115160

Now is 2023-06-19 13:11:04
There is 1 active long-running job, started 3 days ago; the cronjob has already missed some schedules. getMostRecentScheduleTime should return 2023-06-19 16:00:00 (in the future), but returns 2023-06-18 16:00:00 (in the past).

2. Cronjob parts-panel-cronjob-abcp-products-data:
Schedule:                      0 12 * * 3
Last Schedule Time:  Wed, 31 May 2023 12:00:00 +0600
Starting Deadline Seconds:     30s
Active Jobs:         <none>

Now is 2023-06-19 13:11:04
Schedules at 2023-06-17 12:00:00 and 2023-06-14 12:00:00 were missed for some reason (maybe because of Starting Deadline Seconds; it doesn't matter). getMostRecentScheduleTime should return 2023-06-21 12:00:00 (in the future), but returns 2023-06-14 12:00:00.
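To illustrate why a negative requeueAfter burns CPU, here is a minimal sketch (not the controller's actual code) using client-go's delaying workqueue: AddAfter with a non-positive delay falls through to an immediate Add, so the worker gets the same key back with no pause. The key name is borrowed from the logs above purely for illustration.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	q := workqueue.NewDelayingQueue()
	defer q.ShutDown()

	// A negative delay like the one computed for the two cronjobs above.
	requeueAfter := -24 * time.Hour

	for i := 0; i < 3; i++ {
		// A delay <= 0 means "add immediately", so the key is ready again
		// the moment the worker asks for it - a tight resync loop.
		q.AddAfter("market-production/api-scheduler", requeueAfter)

		key, _ := q.Get()
		fmt.Println("resynced immediately:", key)
		q.Done(key)
	}
}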

zigmund (Author) commented Jun 19, 2023

Workaround: recreated these cronjobs and now kcm works fine.
I looked at the newer code - the issue may already be fixed in v1.27, but that needs checking.

alculquicondor (Member) commented:

@zigmund thanks for the detailed debugging.

Just for clarification: job_controller.go is about Job, not CronJob. Jobs do need to be synced somewhat continuously while they are running.

Feel free to submit a PR if you find the fix.

/triage accepted

k8s-ci-robot added the triage/accepted label and removed the needs-triage label Jun 19, 2023
zigmund (Author) commented Aug 3, 2023

It happened again after another controller-manager restart.

alculquicondor (Member) commented:

Did you have a chance to test in 1.27?

zigmund (Author) commented Aug 3, 2023

@alculquicondor, unfortunately it's not possible to upgrade the cluster in the near future.

alculquicondor (Member) commented:

I see. In any case @soltysh probably has more context about recent changes in CronJob that could solve the issue and can potentially be cherry-picked.

sxllwx (Member) commented Sep 26, 2023

/assign

sxllwx (Member) commented Sep 26, 2023

Commit af1c9e4 seems to have fixed this problem.

I'm looking into how to fix it here and will share some conclusions later.

sxllwx (Member) commented Oct 20, 2023

I have confirmed that #110838 (commit be44d67) corrects the error here.

We need to pay attention to how earliestTime is calculated in the function mostRecentScheduleTime. Previously, only Status#LastScheduleTime and ObjectMeta#CreationTimestamp were involved in the comparison; now Spec#StartingDeadlineSeconds and now() are also involved in the calculation, so the requeue time we calculate is no longer negative.
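A rough sketch of that comparison, with illustrative names rather than the exact upstream code: clamping earliestTime forward by spec.startingDeadlineSeconds keeps the schedule walk close enough to now() that the computed requeue delay can no longer go negative.

package cronjobsketch

import "time"

// earliestScheduleTime illustrates the comparison described above: take the
// later of the creation time and the last schedule time, then clamp it
// forward by the starting-deadline window relative to now.
func earliestScheduleTime(creation, lastSchedule time.Time, startingDeadlineSeconds *int64, now time.Time) time.Time {
	earliest := creation
	if !lastSchedule.IsZero() {
		earliest = lastSchedule // Status#LastScheduleTime wins over ObjectMeta#CreationTimestamp
	}
	if startingDeadlineSeconds != nil {
		deadlineStart := now.Add(-time.Duration(*startingDeadlineSeconds) * time.Second)
		if deadlineStart.After(earliest) {
			earliest = deadlineStart // Spec#StartingDeadlineSeconds and now() enter the comparison
		}
	}
	return earliest
}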

This is the unit test case I use for testing. You can use it for verification if you want.

sxllwx@82d0a0a

sxllwx added a commit to sxllwx/kubernetes that referenced this issue Oct 20, 2023: bug-case reproduced
sxllwx added a commit to sxllwx/kubernetes that referenced this issue Oct 20, 2023
sxllwx (Member) commented Oct 20, 2023

There is another interesting thing here:

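// Tail of getMostRecentScheduleTime (pkg/controller/cronjob/utils.go, v1.26); note the trailing .UTC() conversion on t.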
numberOfMissedSchedules := (timeElapsed / timeBetweenTwoSchedules) + 1
t := time.Unix(t1.Unix()+((numberOfMissedSchedules-1)*timeBetweenTwoSchedules), 0).UTC()
return &t, numberOfMissedSchedules, nil

A t.UTC() conversion is performed on t, which affects how sched calculates the next round's scheduling time (I noticed that our spec does not set a time zone; I recommend setting it to "TZ=UTC 0 12 * * 3" so it is consistent with the UTC hardcoded in the code).

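// Requeue-delay computation in the cronjob controller (v1.26) that consumes getMostRecentScheduleTime; whenever sched.Next(*mostRecentTime) lands in the past, the final Sub(now) goes negative.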
mostRecentTime, _, err := getMostRecentScheduleTime(earliestTime, now, sched)
if err != nil {
// we still have to requeue at some point, so aim for the next scheduling slot from now
mostRecentTime = &now
} else if mostRecentTime == nil {
// no missed schedules since earliestTime
mostRecentTime = &earliestTime
}
t := sched.Next(*mostRecentTime).Add(nextScheduleDelta).Sub(now)
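A small sketch of the time-zone interaction described above, assuming github.com/robfig/cron/v3 (the parser the cronjob controller uses): when the spec carries no TZ, Next evaluates the schedule in whatever Location the input time has, so converting the reference time to UTC can shift the computed next run. The +06:00 zone below just mirrors the timestamps earlier in this issue.

package main

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

func main() {
	// No TZ prefix, like the "0 12 * * 3" cronjob above.
	sched, err := cron.ParseStandard("0 12 * * 3")
	if err != nil {
		panic(err)
	}

	// A +06:00 wall clock, mirroring the "Last Schedule Time ... +0600" output above.
	loc := time.FixedZone("UTC+6", 6*60*60)
	last := time.Date(2023, 6, 14, 12, 0, 0, 0, loc)

	// Same instant, two different Locations, two different "next" runs:
	fmt.Println(sched.Next(last))       // next Wednesday noon in +06:00, i.e. 2023-06-21 12:00 +0600
	fmt.Println(sched.Next(last.UTC())) // 2023-06-14 12:00 UTC, six hours after the input instant
}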

liggitt (Member) commented Oct 20, 2023

> I noticed that our spec does not set the time zone. I recommend setting it to "TZ=UTC 0 12 * * 3"

Don't set TZ in the schedule string (that will be forbidden in future releases); set it in the cronjob spec instead.
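For reference, a hedged sketch using the batch/v1 Go types (spec.timeZone is the field being referred to; the required jobTemplate and other fields are omitted, so this is illustrative rather than an applyable object):

package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	tz := "Asia/Almaty" // illustrative zone matching the +06:00 timestamps above
	cj := batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "parts-panel-cronjob-abcp-products-data"},
		Spec: batchv1.CronJobSpec{
			Schedule: "0 12 * * 3", // no "TZ=" prefix in the schedule string
			TimeZone: &tz,          // the zone lives in spec.timeZone instead
		},
	}
	fmt.Printf("schedule=%q timeZone=%q\n", cj.Spec.Schedule, *cj.Spec.TimeZone)
}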

liggitt (Member) commented Oct 20, 2023

@sxllwx thanks for working on this

@soltysh, now that we have a crisp unit test reproducer, can we prioritize review and backport of a fix? it looks like this changed in 1.25, which is almost at EOL (I'm not even sure there's another patch release planned).

liggitt added the priority/important-soon and kind/regression labels and removed the sig/api-machinery label Oct 20, 2023
soltysh (Contributor) commented Oct 20, 2023

@sxllwx thanks for your effort with the reproducer.

I'll make sure to prioritize this on Monday next week and figure out possible fixes.

soltysh (Contributor) commented Oct 23, 2023

Indeed, #110838 fixed a lot of issues around the schedule-time calculations, especially their performance. Additionally, PRs #118724 and #118940, plus the currently pending #121327 (where I've just added the test case from @sxllwx), improve both accuracy and performance. I'll check which ones we can safely backport to previous releases once the last one merges.

soltysh (Contributor) commented Oct 23, 2023

/assign

zigmund (Author) commented Oct 23, 2023

@soltysh nice, looking forward to the patch releases.

soltysh (Contributor) commented Oct 26, 2023

1.28 pick: #121536 - picks only #121327
1.27 pick: #121537 - picks only #121327
1.26 pick: #121540 - picks #110838 and #121327

zigmund (Author) commented Dec 27, 2023

All related PRs are merged, so it looks like we can close this now.
@soltysh thanks!

zigmund closed this as completed Dec 27, 2023