Skip to content

Conversation

pacoxu
Copy link
Member

@pacoxu pacoxu commented Oct 18, 2024

/kind bug
quick fix and needs verify

the failed CI /proc/mounts: see https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/128174/pull-kubernetes-cos-cgroupv1-containerd-node-e2e/1847150504468025344.

 /proc/mounts: tmpfs /run tmpfs rw,nosuid,nodev,size=803108k,nr_inodes=819200,mode=755 0 0
 /proc/mounts: tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755 0 0
 /proc/mounts: cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
 /proc/mounts: pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
 /proc/mounts: bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
 /proc/mounts: cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0

The cri-o cgroupv1 env.

I also checked another env using RHEL 8

[root@localhost ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.4 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.4 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.4:GA"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.4"
[root@localhost ~]# cat /proc/mounts | grep cgroup
tmpfs /sys/fs/cgroup tmpfs ro,seclabel,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,seclabel,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,seclabel,nosuid,nodev,noexec,relatime,pids 0 0

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Oct 18, 2024
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 18, 2024
@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 1fa79b5 to 66f2b64 Compare October 18, 2024 02:51
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 18, 2024
@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

/cc @neolit123

@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 821b88c to 1eb5efa Compare October 18, 2024 03:21
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/hold

do you need additional verification for this?
drop the hold if needed, please.

it has to be backported to branch release-1.9 too.
(manually)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2024
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 18, 2024
@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

I needs to write more test and verify it in k/k presbumit CI.

Comment on lines -106 to +109
if st.Type == unix.CGROUP2_SUPER_MAGIC {
if isCgroupsV2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense and drops the golang.org/x/sys dependency

@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 1eb5efa to 5e86f49 Compare October 18, 2024 06:31
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2024
@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

/proc/mounts: tmpfs /run tmpfs rw,nosuid,nodev,size=803108k,nr_inodes=819200,mode=755 0 0
/proc/mounts: tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755 0 0
/proc/mounts: cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
/proc/mounts: cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
/proc/mounts: pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
/proc/mounts: bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0

For this combined mode, I am not sure how to behavior is best practise for system-validator.

  • overall, it used cgroups v1; but there is /sys/fs/cgroup/unified.
  • [current] for cgroup v1, we checked /proc/cgroups
  • [current] for cgroup v2, we checked /sys/fs/cgroup/<tmp>/cgroup.controllers
  • for this combined more,
    • [current] using cgroup v1 way, we checked /proc/cgroups
    • should we also check. /sys/fs/cgroup/unified/cgroup.controllers (With this PR, we still do not combine the controllers together. )

@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

/cc @kannon92 @haircommander

@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

In pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e, /sys/fs/cgroup/unified/cgroup.controllers is empty.

@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 5e86f49 to 17c3ea1 Compare October 18, 2024 08:07
@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

do you need additional verification for this?
drop the hold if needed, please.

I run some test in kubernetes/kubernetes#128174. The current results LGTM.

#43 (comment) is the only concern now.

  • if /sys/fs/cgroup is mount using tmpfs, can we assume it is cgroup v1?

BTW, /proc/cgroups seems to be another solution. runc is using this file to check cgroup controllers.

https://github.com/kubernetes/kubernetes/blob/0b071fd37ef59158e54ecb88aad2c31751b360ff/vendor/github.com/opencontainers/runc/libcontainer/cgroups/utils.go#L115-L132

	f, err := os.Open("/proc/cgroups")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	subsystems := []string{}

	s := bufio.NewScanner(f)
	for s.Scan() {
		text := s.Text()
		if text[0] != '#' {
			parts := strings.Fields(text)
			if len(parts) >= 4 && parts[3] != "0" {
				subsystems = append(subsystems, parts[0])
			}
		}
	}

I tried to collect some cases in https://gist.github.com/pacoxu/2768337b62ccc150e8623d25ea890cfc.

@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

BTW, /proc/cgroups seems to be another solution. runc is using this file to check cgroup controllers.

https://github.com/kubernetes/kubernetes/blob/0b071fd37ef59158e54ecb88aad2c31751b360ff/vendor/github.com/opencontainers/runc/libcontainer/cgroups/utils.go#L115-L132

	f, err := os.Open("/proc/cgroups")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	subsystems := []string{}

	s := bufio.NewScanner(f)
	for s.Scan() {
		text := s.Text()
		if text[0] != '#' {
			parts := strings.Fields(text)
			if len(parts) >= 4 && parts[3] != "0" {
				subsystems = append(subsystems, parts[0])
			}
		}
	}

According to https://docs.kernel.org/admin-guide/cgroup-v2.html

/proc/cgroups is meaningless for v2. Use “cgroup.controllers” or “cgroup.stat” files at the root instead.

This is not a correct solution then.

@neolit123
Copy link
Member

BTW, /proc/cgroups seems to be another solution. runc is using this file to check cgroup controllers.

we are already doing that here:

func (c *CgroupsValidator) getCgroupV1Subsystems() ([]string, error) {

if /sys/fs/cgroup is mount using tmpfs, can we assume it is cgroup v1?

google-ing about it, seems like both v1 and v2 can mount to /sys/fs/cgroup with tmpfs.
but someone needs the confirm.
what is the problem assuming that both versions can mount like that?

@neolit123
Copy link
Member

cc @KentaTada

@pacoxu
Copy link
Member Author

pacoxu commented Oct 18, 2024

@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 17c3ea1 to 7e84124 Compare October 18, 2024 08:47
@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from 7e84124 to ff4711e Compare October 18, 2024 08:57
// If default unified mount point is available, return it directly.
if fields[1] == defaultUnifiedMountPoint {
if fields[2] == "tmpfs" {
c, err := os.Open(filepath.Join(defaultUnifiedMountPoint, "cgroup.controllers"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this more seems like a line to verify it is v2, not verify it's not v1. I don't personally know of a situation the kernel would return tmpfs but also populate cgroup.controllers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a common way to check if this is cgroup v1 when the mount type is tmpfs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/pacoxu/2768337b62ccc150e8623d25ea890cfc I collected several examples:

this more seems like a line to verify it is v2, not verify it's not v1. I don't personally know of a situation the kernel would return tmpfs but also populate cgroup.controllers

To verify it is not v1, we may check /sys/fs/cgroup/cpu or /sys/fs/cgroup/memory which is used by kubelet directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a520b48 is to address this comment. PTAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or check /sys/fs/cgroup/memory in /proc/mounts like below:

scanner := bufio.NewScanner(f)
	var cgroupV1MountPoint string
+	var isDefault bool
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "cgroup") {
			continue
		}
		// Example fields: `cgroup2 /sys/fs/cgroup cgroup2 rw,seclabel,nosuid,nodev,noexec,relatime 0 0`.
		fields := strings.Fields(line)
		if len(fields) >= 3 {
+			if isDefault {
+				if fields[1] == defaultUnifiedMountPoint + "/memory" {
+					return defaultUnifiedMountPoint, fields[2] == "cgroup2", nil
+				}
+				continue
+			}

			// If default unified mount point is available, return it directly.
			if fields[1] == defaultUnifiedMountPoint {
+				isDefault = true
				if fields[2] == "tmpfs" {
+					continue
				}
				return defaultUnifiedMountPoint, fields[2] == "cgroup2", nil
			}
			switch fields[2] {
			case "cgroup2":
				// Return the first cgroups v2 mount point directly.
				return fields[1], true, nil
			case "cgroup":
				// Set the first cgroups v1 mount point only,
				// and continue the loop to find if there is a cgroups v2 mount point.
				if len(cgroupV1MountPoint) == 0 {
					cgroupV1MountPoint = fields[1]
				}
			}
		}

The cons here are

  • no checking local file

@pacoxu pacoxu force-pushed the check-unifiedMountpoint branch from a520b48 to d4dab26 Compare October 21, 2024 06:55
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pacoxu i would assume we want this for k8s 1.32?

@KentaTada @haircommander @kannon92
please provide guidance on how to solve the pending cgroups v1/v2 detection issue that @pacoxu describes.

@haircommander
Copy link

LGTM

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/hold

drop hold if ready to go.
you must backport this to the release-1.9 branch of this repo.
then i will create release 1.9.1

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Copy link
Member Author

pacoxu commented Nov 4, 2024

Acccording to my test based on kubernetes/kubernetes#128174, this should not fail the k/k CIs again.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2024
@k8s-ci-robot k8s-ci-robot merged commit 06642e8 into kubernetes:main Nov 4, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants