cAdvisor only sees pause container scope on cgroup v2 + systemd-cgroup (subcontainer compat dirs not created) #13067

@a7i

Description

On a host running unified cgroup v2 with systemd-cgroup = "true" in runsc.toml,
cAdvisor only reports per-scope metrics for the pause container's cgroup scope.
Application and sidecar containers in the same pod have no cri-containerd-*.scope
directory under the pod slice, so cAdvisor's inotify watcher never discovers them
and /metrics/cadvisor is missing CPU / memory / network series for them.

The kubelet's CRI-backed /stats/summary endpoint reports the values fine, so the
data is available -- it just isn't surfaced through cAdvisor.

This looks like the cgroup v2 + systemd analogue of #6500, which was fixed for v1
non-systemd in #6657 by creating empty subcontainer cgroup directories so cAdvisor
could detect them. Tracing the current code:

  • runsc/container/container.go::setupCgroupForSubcontainer calls cgroupInstall(conf, cg, &specs.LinuxResources{}).
  • For systemd v2, the underlying cgroupSystemd.Install() (in runsc/cgroup/systemd.go)
    only stages dbus properties; the systemd transient scope unit (and therefore the
    cgroup directory on the host) is only created in Join() -- which is not called
    for these compat-only subcontainer cgroups.

Net effect: with systemd-cgroup = "true" on a v2 host, the empty subcontainer
cgroup directory that would let cAdvisor discover the container is never created,
so per-container cAdvisor metrics regress to "pause-only" for every gVisor pod.

This regresses anything keyed off cAdvisor: kubectl top pod, container-level
CPU/memory dashboards, VPA recommendations driven from cAdvisor series, etc.

Expected

For a pod with N user containers running under runsc + systemd-cgroup = "true"
on a cgroup v2 host, cAdvisor should expose per-container series for each user
container (matching what runc exposes today and what kubelet's /stats/summary
already reports via CRI for the same pod). The fix should be the equivalent of
#6657, but on the systemd v2 code path -- e.g. ensure setupCgroupForSubcontainer
on systemd v2 actually materializes an empty cgroup directory under the pod
slice that cAdvisor can inotify-watch.

Actual

Only the pause container's scope is visible to cAdvisor; user containers are
absent from /metrics/cadvisor, and the per-container cgroup directories
(memory.current, memory.stat, memory.max, cpu.stat, ...) do not exist
under the pod slice on the host.

Steps to reproduce

Cluster: cgroup v2 unified host (stat -fc %T /sys/fs/cgroup -> cgroup2fs),
containerd 2.x, kubernetes 1.35, runsc registered as RuntimeClass: gvisor,
runsc.toml:

[runsc_config]
  net-raw = "true"
  systemd-cgroup = "true"

Pod: runtimeClassName: gvisor, two containers (sidecar istio-proxy +
application container).

$ kubectl get pods -n <namespace> -l role=<app> -owide
NAME                READY   STATUS    NODE
<app>-...-rls5m     2/2     Running   <node>

$ kubectl get pods -n <namespace> -l role=<app> -oyaml | grep runtimeClass
    runtimeClassName: gvisor

cAdvisor (kubelet /metrics/cadvisor)

Total container_memory_working_set_bytes series on the node: 346.
For the gVisor pod, only 2 series, both at the pod-slice / pause-scope level:

$ kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics/cadvisor" \
    | grep container_memory_working_set_bytes | grep <app>
container_memory_working_set_bytes{container="",
  id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice",
  image="",name="",namespace="<namespace>",pod="<app>-...-rls5m"} 2.268966912e+09 ...
container_memory_working_set_bytes{container="",
  id="/kubepods.slice/.../cri-containerd-<pause-id>.scope",
  image="localhost/kubernetes/pause:latest",
  name="<pause-id>",namespace="<namespace>",pod="<app>-...-rls5m"} 2.268966912e+09 ...

No series for container="istio-proxy" or container="<app>".
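One way to quantify the gap directly from the scrape (a throwaway helper, not
part of any tool; metric and pod names as above):

```shell
# Count working-set series for one pod, split by whether the `container`
# label is populated. On the affected node the "named" count is 0 even
# though the pod runs two user containers.
count_series() {
  # $1 = metrics dump file, $2 = pod name substring
  grep 'container_memory_working_set_bytes' "$1" | grep "pod=\"$2" |
    awk '{ if ($0 ~ /container=""/) empty++; else named++ }
         END { printf "empty-label=%d named=%d\n", empty+0, named+0 }'
}

# kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics/cadvisor" > cadvisor.txt
# count_series cadvisor.txt "<app>"
```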

kubelet /stats/summary (CRI-backed) for the same pod

Both containers show real CPU/memory:

{
  "name": "istio-proxy",
  "cpu":    { "usageNanoCores": 18618234, "usageCoreNanoSeconds": 305217528000 },
  "memory": { "workingSetBytes": 146890752, "usageBytes": 146890752 }
}
{
  "name": "<app>",
  "cpu":    { "usageNanoCores": 19078357, "usageCoreNanoSeconds": 328723623000 },
  "memory": { "workingSetBytes": 2868473856, "usageBytes": 2868473856 }
}

So the data exists; it's just not exposed in the cAdvisor cgroup tree.

Host filesystem

Under /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/:

  • only the pause container's cri-containerd-<pause-id>.scope/ exists
  • no cri-containerd-*.scope/ directory for the app or sidecar container
  • consequently, no memory.current / memory.stat / memory.max for those
    containers on the host

For comparison, a runc pod on the same node has one scope per user container
under the pod slice with the usual v2 files populated.
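The host-side comparison can be scripted (assumes the unified hierarchy is
mounted at /sys/fs/cgroup; the slice path below is a placeholder):

```shell
# List the per-container scope directories directly under a pod slice.
# For a runc pod this prints one cri-containerd-*.scope per container
# (pause included); for the affected gVisor pod it prints only the
# pause scope. Uses GNU find's -printf.
list_scopes() {
  # $1 = absolute path to the pod slice directory
  find "$1" -maxdepth 1 -type d -name 'cri-containerd-*.scope' \
    -printf '%f\n' | sort
}

# list_scopes /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice
```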

Environment

runsc version

runsc version release-20260302.0
spec: 1.1.0-rc.1

(Repros identically on stock release; we run a local build that cherry-picks
#12686 and #12688 on top of release-20260316.0 for Istio DNS capture, but
neither patch touches cgroup code.)

uname

Linux ... 6.x ... x86_64 GNU/Linux (Ubuntu 22.04, EKS worker)

kubectl

Client Version: v1.35.x
Server Version: v1.35.x

containerd

containerd 2.2.2, runtime registered as:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runsc.options]
  TypeUrl    = "io.containerd.runsc.v1.options"
  ConfigPath = "/etc/containerd/runsc.toml"
