
pkg/kubelet/cm: cgroup-related cleanups #102218

Merged
merged 2 commits into kubernetes:master on Jun 1, 2021

Conversation

kolyshkin
Contributor

@kolyshkin kolyshkin commented May 21, 2021

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

While working on #102147, I noticed a few odd things in the code that can be straightened out.
This PR does some of that; please see the detailed descriptions in the individual commits.

Which issue(s) this PR fixes:

none

Special notes for your reviewer:

Please review on a commit-by-commit basis.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

none

The propagateControllers helper being removed here was added by commit a9772b2.

In the current codebase, the cgroup being updated was created using
opencontainers/runc's manager.Apply(), which already does controller
propagation, so there is no need to repeat it on every update.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
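
For context on what the removed helper actually did, here is a minimal sketch (illustrative only, not the kubelet's propagateControllers and not runc code): on cgroup v2 a controller can only be used inside a child cgroup if every ancestor lists it in its cgroup.subtree_control file, so "controller propagation" means walking down from the cgroup root and enabling the controllers at each level. runc's manager.Apply() already performs the equivalent when it creates the cgroup, which is why repeating it on every update is redundant. The path and helper name below are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// enableControllers walks from the cgroup v2 root down to (but not including)
// dir, writing e.g. "+cpu +memory +pids" into cgroup.subtree_control at each
// level so that the controllers become usable in the child cgroups below.
func enableControllers(dir string, controllers []string) error {
	const root = "/sys/fs/cgroup" // assumes a unified (cgroup v2) hierarchy
	rel, err := filepath.Rel(root, dir)
	if err != nil {
		return err
	}
	val := "+" + strings.Join(controllers, " +")
	cur := root
	for _, part := range strings.Split(rel, string(filepath.Separator)) {
		file := filepath.Join(cur, "cgroup.subtree_control")
		if err := os.WriteFile(file, []byte(val), 0o644); err != nil {
			return fmt.Errorf("enabling %q in %s: %w", val, cur, err)
		}
		cur = filepath.Join(cur, part)
	}
	return nil
}

func main() {
	// Example path only; the kubelet derives the real path from its cgroup config.
	if err := enableControllers("/sys/fs/cgroup/kubepods/burstable",
		[]string{"cpu", "memory", "pids"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```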
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2021
@k8s-ci-robot
Contributor

Hi @kolyshkin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubelet labels May 21, 2021
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 21, 2021
@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage May 21, 2021
@ehashman ehashman moved this from Waiting on Author to Triage in SIG Node PR Triage May 21, 2021
@ehashman
Member

/ok-to-test
/priority backlog
/triage accepted
/cc @odinuge @fromanirh @klueska

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2021
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage May 21, 2021
@k8s-ci-robot
Contributor

@kolyshkin: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-node-crio-cgrpv2-e2e | f1aee7e | link | /test pull-kubernetes-node-crio-cgrpv2-e2e |
| pull-kubernetes-node-kubelet-serial-crio-cgroupv2 | f1aee7e | link | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2 |
| pull-kubernetes-node-kubelet-serial-crio-cgroupv1 | f1aee7e | link | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kolyshkin
Contributor Author

E2eNode Suite: [sig-node] Summary API [NodeConformance] when querying /stats/summary should report resource usage through the stats api (1m55s)
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:54
Timed out after 90.000s.
Expected
    <string>: Summary
to match fields: {
[.Node.SystemContainers[runtime].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.AvailableBytes:
		Expected
		    <*uint64 | 0xc000a91418>: 3705806848
		to be nil
	}
	, .Node.SystemContainers[pods].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.MajorPageFaults:
		Expected
		    <uint64>: 198
		to be <=
		    <int>: 10
	}
	, .Node.SystemContainers[kubelet].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.AvailableBytes:
		Expected
		    <*uint64 | 0xc000a91560>: 3836538880
		to be nil
	}
	]
}

As far as I understand, there are two issues here.

  1. The test expects MemoryStats.AvailableBytes to be nil for system containers, and yet it reports something close to 4GB for runtime and kubelet (but not for pods).
  2. A higher-than-expected number of MemoryStats.MajorPageFaults is reported for the pods system container.

This is presumably caused by the first commit (removing propagateControllers), though I can't see why that would be the case. Perhaps writing to cgroup.subtree_control somehow reset the state of the controller?

Looking at it.
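
One way to sanity-check that hypothesis (a hypothetical debugging sketch, not code from this PR) is to compare cgroup.controllers, the controllers available in a cgroup, against cgroup.subtree_control, the controllers enabled for its children, at each level of the hierarchy before and after an update, and see whether anything disappears:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// dumpControllers prints, for every level from the cgroup v2 root down to dir,
// which controllers are available and which are enabled for child cgroups.
func dumpControllers(dir string) {
	const root = "/sys/fs/cgroup" // assumes a unified (cgroup v2) hierarchy
	rel, err := filepath.Rel(root, dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	cur := root
	for _, part := range append([]string{""}, strings.Split(rel, string(filepath.Separator))...) {
		cur = filepath.Join(cur, part)
		avail, _ := os.ReadFile(filepath.Join(cur, "cgroup.controllers"))
		enabled, _ := os.ReadFile(filepath.Join(cur, "cgroup.subtree_control"))
		fmt.Printf("%s:\n  available:            %s\n  enabled for children: %s\n",
			cur, strings.TrimSpace(string(avail)), strings.TrimSpace(string(enabled)))
	}
}

func main() {
	// Example path only; use the cgroup the kubelet is actually managing.
	dumpControllers("/sys/fs/cgroup/kubepods/burstable")
}
```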

@odinuge
Member

odinuge commented May 24, 2021

As far as I understand, there are two issues here.

Ahh, sorry, I thought you knew about these test failures... They are expected; I should have stated that explicitly. The failures in pull-kubernetes-node-crio-cgrpv2-e2e will be fixed by bumping cAdvisor, together with the commit changing the conformance test in your first runc bump PR. They are tracked here: #99230

pull-kubernetes-node-kubelet-serial-crio-cgroupv{1,2} are just experimental, so don't worry about those... 😅

Overall I think this PR is ok. There are some issues with controller propagation on v2 in general (some WIP stuff here: #102250), but you know more about the runc part than me.

@odinuge odinuge moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage May 24, 2021
@kolyshkin
Contributor Author

So, this PR can be merged then I guess?

@odinuge
Member

odinuge commented May 25, 2021

So, this PR can be merged then I guess?

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

We don't have any testing of cgroupfs on cgroup v2, and I am pretty sure running that while using systemd will break, no matter what... :/

/cc @giuseppe

@ffromani
Contributor

So, this PR can be merged then I guess?

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

We don't have any testing of cgroupfs on cgroup v2, and I am pretty sure running that while using systemd will break, no matter what... :/

/cc @giuseppe

My 2c: I had a look as well and I fully agree with @odinuge; from my POV all the changes in this PR make sense, with the caveat above about propagateControllers. From my (admittedly quick) review of the history, I'm not sure this wasn't just an accidental duplicate of runc functionality, but it is indeed better to wait for a review from @giuseppe.

@giuseppe
Member


As far as I understand, there are two issues here.

1. The test expects `MemoryStats.AvailableBytes` to be `nil` for system containers, and yet it reports something close to 4GB for `runtime` and `kubelet` (but not for `pods`).


2. Higher than expected number of `MemoryStats.MajorPageFaults` from the `pods` system container.

This is presumably caused by the first commit (removing propagateControllers), though I can't see why that would be the case. Perhaps writing to cgroup.subtree_control somehow reset the state of the controller?

I forgot which tests were failing, but there are two fixes in cAdvisor that AFAICS have not yet been propagated into a release and into Kubernetes:

google/cadvisor#2837
google/cadvisor#2839

The first one especially solves an issue with reading memory stats.

@giuseppe
Member

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

At the time I added that code, runc didn't work with cgroupfs on cgroup v2. Now that this functionality is in libcontainer, I agree it is better to remove it.

@kolyshkin
Contributor Author

kolyshkin commented May 25, 2021

I forgot which tests were failing, but there are two fixes in cAdvisor that AFAICS have not yet been propagated into a release and into Kubernetes:

google/cadvisor#2837
google/cadvisor#2839

The first one especially solves an issue with reading memory stats.

Thanks, opened google/cadvisor#2878 and google/cadvisor#2879 (it might need your consent, @giuseppe, to make the google-cla bot happy).

@matthyx
Contributor

matthyx commented May 26, 2021

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from matthyx May 26, 2021 05:19
@ehashman
Member

/assign @odinuge @fromanirh

/cc @sjenning

Member

@odinuge odinuge left a comment


/lgtm

pkg/kubelet/cm/types.go (review comment, resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2021
@odinuge odinuge moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage May 28, 2021
Member

@giuseppe giuseppe left a comment


LGTM

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, kolyshkin, mrunalp, odinuge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2021
@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Jun 1, 2021
@odinuge
Copy link
Member

odinuge commented Jun 1, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7c7a086 into kubernetes:master Jun 1, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 1, 2021