
pkg/kubelet/cm: cgroup-related cleanups #102218

Merged
merged 2 commits into kubernetes:master on Jun 1, 2021

Conversation

kolyshkin
Contributor

@kolyshkin kolyshkin commented May 21, 2021

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

While working on #102147, I noticed a few odd things in the code that can be straightened out.
This PR does some of that; please see the detailed descriptions in the individual commits.

Which issue(s) this PR fixes:

none

Special notes for your reviewer:

Please review on a commit-by-commit basis.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

none

The propagateControllers helper being removed here was added by commit a9772b2.

In the current codebase, the cgroup being updated was created using
opencontainers/runc's manager.Apply(), which already does controller
propagation, so there is no need to repeat it on every update.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
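
For context on what the removed helper actually did, here is a minimal sketch (illustrative only, not the kubelet's propagateControllers and not runc code): on cgroup v2 a controller can only be used inside a child cgroup if every ancestor lists it in its cgroup.subtree_control file, so "controller propagation" means walking down from the cgroup root and enabling the controllers at each level. runc's manager.Apply() already performs the equivalent when it creates the cgroup, which is why repeating it on every update is redundant. The path and helper name below are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// enableControllers walks from the cgroup v2 root down to (but not including)
// dir, writing e.g. "+cpu +memory +pids" into cgroup.subtree_control at each
// level so that the controllers become usable in the child cgroups below.
func enableControllers(dir string, controllers []string) error {
	const root = "/sys/fs/cgroup" // assumes a unified (cgroup v2) hierarchy
	rel, err := filepath.Rel(root, dir)
	if err != nil {
		return err
	}
	val := "+" + strings.Join(controllers, " +")
	cur := root
	for _, part := range strings.Split(rel, string(filepath.Separator)) {
		file := filepath.Join(cur, "cgroup.subtree_control")
		if err := os.WriteFile(file, []byte(val), 0o644); err != nil {
			return fmt.Errorf("enabling %q in %s: %w", val, cur, err)
		}
		cur = filepath.Join(cur, part)
	}
	return nil
}

func main() {
	// Example path only; the kubelet derives the real path from its cgroup config.
	if err := enableControllers("/sys/fs/cgroup/kubepods/burstable",
		[]string{"cpu", "memory", "pids"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```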
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2021
@k8s-ci-robot
Contributor

Hi @kolyshkin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubelet labels May 21, 2021
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 21, 2021
@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage May 21, 2021
@ehashman ehashman moved this from Waiting on Author to Triage in SIG Node PR Triage May 21, 2021
@ehashman
Member

/ok-to-test
/priority backlog
/triage accepted
/cc @odinuge @fromanirh @klueska

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2021
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage May 21, 2021
@k8s-ci-robot
Contributor

@kolyshkin: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-node-crio-cgrpv2-e2e | f1aee7e | link | /test pull-kubernetes-node-crio-cgrpv2-e2e |
| pull-kubernetes-node-kubelet-serial-crio-cgroupv2 | f1aee7e | link | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2 |
| pull-kubernetes-node-kubelet-serial-crio-cgroupv1 | f1aee7e | link | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kolyshkin
Contributor Author

E2eNode Suite: [sig-node] Summary API [NodeConformance] when querying /stats/summary should report resource usage through the stats api (1m55s)
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/summary_test.go:54
Timed out after 90.000s.
Expected
    <string>: Summary
to match fields: {
[.Node.SystemContainers[runtime].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.AvailableBytes:
		Expected
		    <*uint64 | 0xc000a91418>: 3705806848
		to be nil
	}
	, .Node.SystemContainers[pods].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.MajorPageFaults:
		Expected
		    <uint64>: 198
		to be <=
		    <int>: 10
	}
	, .Node.SystemContainers[kubelet].Memory:
	Expected
	    <string>: MemoryStats
	to match fields: {
	.AvailableBytes:
		Expected
		    <*uint64 | 0xc000a91560>: 3836538880
		to be nil
	}
	]
}

As far as I understand, there are two issues here.

  1. The test expects MemoryStats.AvailableBytes to be nil for system containers, and yet it reports something close to 4GB for runtime and kubelet (but not for pods).
  2. A higher-than-expected number of MemoryStats.MajorPageFaults is reported for the pods system container.

This is presumably caused by the first commit (removing propagateControllers), though I can't see why that would be the case. Perhaps writing to cgroup.subtree_control somehow reset the state of the controller?

Looking at it.
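
One way to sanity-check that hypothesis (a hypothetical debugging sketch, not code from this PR) is to compare cgroup.controllers, the controllers available in a cgroup, against cgroup.subtree_control, the controllers enabled for its children, at each level of the hierarchy before and after an update, and see whether anything disappears:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// dumpControllers prints, for every level from the cgroup v2 root down to dir,
// which controllers are available and which are enabled for child cgroups.
func dumpControllers(dir string) {
	const root = "/sys/fs/cgroup" // assumes a unified (cgroup v2) hierarchy
	rel, err := filepath.Rel(root, dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	cur := root
	for _, part := range append([]string{""}, strings.Split(rel, string(filepath.Separator))...) {
		cur = filepath.Join(cur, part)
		avail, _ := os.ReadFile(filepath.Join(cur, "cgroup.controllers"))
		enabled, _ := os.ReadFile(filepath.Join(cur, "cgroup.subtree_control"))
		fmt.Printf("%s:\n  available:            %s\n  enabled for children: %s\n",
			cur, strings.TrimSpace(string(avail)), strings.TrimSpace(string(enabled)))
	}
}

func main() {
	// Example path only; use the cgroup the kubelet is actually managing.
	dumpControllers("/sys/fs/cgroup/kubepods/burstable")
}
```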

@odinuge
Member

odinuge commented May 24, 2021

As far as I understand, there are two issues here.

Ahh, sorry, I thought you knew about these test failures... They are expected; I should have stated that explicitly. The failures in pull-kubernetes-node-crio-cgrpv2-e2e will be fixed by bumping cAdvisor, together with the commit changing the conformance test in your first runc bump PR. They are tracked here: #99230

pull-kubernetes-node-kubelet-serial-crio-cgroupv{1,2} are just experimental, so don't worry about those... 😅

Overall I think this PR is ok. There are some issues with controller propagation on v2 in general (some WIP stuff here: #102250), but you know more about the runc part than me.

@odinuge odinuge moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage May 24, 2021
@kolyshkin
Contributor Author

So, this PR can be merged then I guess?

@odinuge
Member

odinuge commented May 25, 2021

So, this PR can be merged then I guess?

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

We don't have any testing of cgroupfs on cgroup v2, and I am pretty sure running that while using systemd will break, no matter what... :/

/cc @giuseppe

@ffromani
Contributor

So, this PR can be merged then I guess?

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

We don't have any testing of cgroupfs on cgroup v2, and I am pretty sure running that while using systemd will break, no matter what... :/

/cc @giuseppe

My 2c: I had a look as well and I fully agree with @odinuge; from my POV all the changes in this PR make sense, with the caveat above about propagateControllers. From my (admittedly quick) review of the history, I'm not sure this wasn't just an accidental duplicate of runc functionality, but it is indeed better to wait for a review from @giuseppe.

@giuseppe
Member


As far as I understand, there are two issues here.

1. The test expects `MemoryStats.AvailableBytes` to be `nil` for system containers, and yet it reports something close to 4GB for `runtime` and `kubelet` (but not for `pods`).


2. Higher than expected number of `MemoryStats.MajorPageFaults` from the `pods` system container.

This is presumably caused by the first commit (removing propagateControllers), though I can't see why that would be the case. Perhaps writing to cgroup.subtree_control somehow reset the state of the controller?

I forgot which tests were failing, but there are two fixes in cAdvisor that AFAICS have not yet been propagated into a release and into Kubernetes:

google/cadvisor#2837
google/cadvisor#2839

The first one especially solves an issue with reading memory stats.

@giuseppe
Member

Overall I think it is ok. I just need to think about why propagateControllers was added in the first place. When dealing with systemd (with both the systemd driver and the cgroupfs driver), systemd will manage the controllers and related state for you. It might be that propagateControllers had to run before "update" in a case we don't have tests for. The whole "propagate controllers unless we are in a systemd-delegated subtree" logic is quite dodgy. I'll defer that to @giuseppe.

At the time I added that code, runc didn't work with cgroupfs on cgroup v2. Now that this functionality is in libcontainer, I agree it is better to remove it.

@kolyshkin
Contributor Author

kolyshkin commented May 25, 2021

I forgot which tests were failing, but there are two fixes in cAdvisor that AFAICS have not yet been propagated into a release and into Kubernetes:

google/cadvisor#2837
google/cadvisor#2839

The first one especially solves an issue with reading memory stats.

Thanks, opened google/cadvisor#2878 and google/cadvisor#2879 (it might need your consent, @giuseppe, to make the google-cla bot happy).

@matthyx
Contributor

matthyx commented May 26, 2021

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from matthyx May 26, 2021 05:19
@ehashman
Member

/assign @odinuge @fromanirh

/cc @sjenning

Member

@odinuge odinuge left a comment


/lgtm

pkg/kubelet/cm/types.go (review comment, resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2021
@odinuge odinuge moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage May 28, 2021
Member

@giuseppe giuseppe left a comment


LGTM

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, kolyshkin, mrunalp, odinuge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2021
@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Jun 1, 2021
@odinuge
Copy link
Member

odinuge commented Jun 1, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7c7a086 into kubernetes:master Jun 1, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 1, 2021