
Expose cgroup v2 memory.events as Prometheus metrics #3870

Merged
dims merged 1 commit into google:master from sohankunkerkar:add-memory-events-upstream
May 13, 2026


@sohankunkerkar (Contributor)

Kubernetes KEP-2570 (MemoryQoS) uses cgroup v2 memory.high for throttling and memory.min/memory.low for memory protection. To observe the effect of these settings, operators need visibility into memory pressure events. cAdvisor does not currently read the memory.events cgroup file, and the existing container_oom_events_total metric is derived from kernel log parsing rather than from cgroup counters.

Read memory.events on cgroup v2 and expose two new Prometheus counter metrics:

  • container_memory_events_high_total: number of times the container's processes were throttled (and forced into direct reclaim) because usage exceeded memory.high
  • container_memory_events_max_total: number of times the container's memory usage was about to exceed memory.max

xref: kubernetes/enhancements#2570

@sohankunkerkar (Contributor, Author)

cc @haircommander

Comment thread container/libcontainer/handler.go
sohankunkerkar force-pushed the add-memory-events-upstream branch from 921423a to 1a98acf on May 7, 2026 17:29
@sohankunkerkar (Contributor, Author)

@dims could you take a look at it?

@dims (Collaborator) commented May 12, 2026

@sohankunkerkar no tests at all? :(

  • add a unit test in container/libcontainer/handler_test.go?
  • integration/tests/metrics/prometheus_test.go::TestCoreMemoryMetricsExist (add to memoryMetrics array)
  • integration/tests/api/event_test.go::TestOomKillEventConstraint (same setup as the OOM test, but poll /metrics until container_memory_events_max_total{name=~containerID} > 0.)

sohankunkerkar force-pushed the add-memory-events-upstream branch from 1a98acf to 9d965e7 on May 13, 2026 14:18
@sohankunkerkar (Contributor, Author)

@dims I addressed your comments. Could you take a look at it again? Thanks!

@dims (Collaborator) commented May 13, 2026

@sohankunkerkar https://github.com/google/cadvisor/actions/runs/25805017191/job/75842992013?pr=3870

Kubernetes KEP-2570 (MemoryQoS) uses cgroup v2 memory.high for
throttling and memory.min/memory.low for memory protection. To observe
the effect of these settings, operators need visibility into memory
pressure events. cadvisor currently does not read the memory.events
cgroup file — the existing container_oom_events_total metric comes from
kernel log parsing, not cgroup counters.

Read memory.events on cgroup v2 and expose two new Prometheus counter
metrics:

- container_memory_events_high_total: times the container was throttled
  for breaching memory.high
- container_memory_events_max_total: times the container's usage hit
  memory.max

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
sohankunkerkar force-pushed the add-memory-events-upstream branch from 9d965e7 to cc66235 on May 13, 2026 19:20
dims merged commit e3eecca into google:master on May 13, 2026
10 checks passed
@dims (Collaborator) commented May 13, 2026

thanks @sohankunkerkar

@sohankunkerkar (Contributor, Author)

@dims Thanks for reviewing this PR. I had one quick question: do we have any plans to cut a new release of cAdvisor anytime soon? We might need it for testing the MemoryQoS feature in Kubernetes.

@dims (Collaborator) commented May 13, 2026

@sohankunkerkar I try to do at least one release of cAdvisor to support k8s. Will take stock soon-ish. Do you need this in short order? (weeks? days?)

@sohankunkerkar (Contributor, Author)

Thanks for the update! It would be ideal if we could get that once the v1.37 branch opens, or before the feature freeze maybe?

3 participants