feat: add cgroup CFS throttling metrics to OTel meter #1114
Merged
fenos merged 1 commit on May 21, 2026
Pull request overview
Adds Linux cgroup CFS throttling metrics to the project’s OpenTelemetry metrics setup so container CPU throttling can be correlated with runtime behavior (e.g., GC pauses).
Changes:
- Introduces `installCgroupCpuMetrics(meter)` to detect cgroup v1/v2 and register four observable CFS instruments based on `cpu.stat`.
- Wires cgroup metrics installation into OTel meter provider initialization.
- Adds unit tests for cpu.stat parsing and the new metric installation behavior; updates existing OTel metrics tests to mock the new module.
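
As a rough illustration of the `cpu.stat` parsing mentioned above, here is a minimal sketch. It is not the PR's code: the `CfsStats` type and `parseCpuStat` name are hypothetical. It relies on the kernel's documented field names (`nr_periods`, `nr_throttled`, and `throttled_time` in nanoseconds on cgroup v1 or `throttled_usec` in microseconds on cgroup v2).

```typescript
// Hypothetical shape; the PR's actual types are not shown in this conversation.
interface CfsStats {
  periods: number;
  throttledPeriods: number;
  throttledTimeNs: number;
}

// Parses the text of a cgroup cpu.stat file. Returns null when the CFS
// bandwidth fields are absent (for example, the cpu controller is not enabled).
function parseCpuStat(content: string): CfsStats | null {
  const fields = new Map<string, number>();
  for (const line of content.split("\n")) {
    const [key, raw] = line.trim().split(/\s+/);
    if (!key || raw === undefined) continue; // skip blank or malformed lines
    const value = Number(raw);
    if (Number.isFinite(value)) fields.set(key, value);
  }

  const periods = fields.get("nr_periods");
  const throttledPeriods = fields.get("nr_throttled");
  if (periods === undefined || throttledPeriods === undefined) return null;

  // cgroup v1 reports throttled_time in nanoseconds;
  // cgroup v2 reports throttled_usec in microseconds.
  const v1Ns = fields.get("throttled_time");
  const v2Usec = fields.get("throttled_usec");
  const throttledTimeNs =
    v1Ns ?? (v2Usec !== undefined ? v2Usec * 1000 : undefined);
  if (throttledTimeNs === undefined) return null;

  return { periods, throttledPeriods, throttledTimeNs };
}
```

Normalising to nanoseconds inside the parser means the instruments can use one unit regardless of cgroup version.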
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/internal/monitoring/otel-metrics.ts | Installs cgroup CFS metrics immediately after setting the global meter provider. |
| src/internal/monitoring/otel-metrics.test.ts | Mocks the new cgroup metrics module and updates MeterProvider mock to include getMeter. |
| src/internal/monitoring/cgroup-cpu-metrics.ts | Implements cgroup detection, cpu.stat parsing, and observable metric registration. |
| src/internal/monitoring/cgroup-cpu-metrics.test.ts | Adds coverage for parsing and observable metric behavior (including throttled ratio guards). |
Coverage Report for CI Build 26211996717
Coverage increased (+0.07%) to 74.943%.
No coverage regressions found. (Coveralls)
fenos approved these changes on May 21, 2026
ferhatelmas approved these changes on May 21, 2026
Register four observable instruments on the existing MeterProvider to surface container CPU bandwidth-control state so GC pauses can be correlated with throttling in our observability backend:

- process.cpu.cfs.periods (counter)
- process.cpu.cfs.throttled_periods (counter)
- process.cpu.cfs.throttled_time (counter, ns)
- process.cpu.cfs.throttled_ratio (gauge, delta-based)

Supports cgroup v1 and v2, short-circuits cleanly on non-Linux and when cpu.stat is unreadable, and reads the file from the OTel observable callback (no separate timer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
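
To make "gauge, delta-based" concrete, the following is a minimal sketch of how a throttled ratio could be derived from two consecutive counter samples. The `createThrottledRatioTracker` name is hypothetical, and reporting `0` on the first sample, on zero-period intervals, and on counter resets is an assumption; the PR may instead skip the observation in those cases.

```typescript
// Returns a function that, given the current cumulative counters, yields the
// fraction of periods throttled since the previous call.
function createThrottledRatioTracker(): (
  periods: number,
  throttledPeriods: number,
) => number {
  let prev: { periods: number; throttledPeriods: number } | null = null;

  return (periods, throttledPeriods) => {
    const last = prev;
    prev = { periods, throttledPeriods };

    // First sample: there is no previous reading to diff against.
    if (last === null) return 0;

    const deltaPeriods = periods - last.periods;
    const deltaThrottled = throttledPeriods - last.throttledPeriods;

    // Guard against division by zero (no periods elapsed) and against
    // counters going backwards (assumed here to mean a reset).
    if (deltaPeriods <= 0 || deltaThrottled < 0) return 0;

    return deltaThrottled / deltaPeriods;
  };
}
```

Called once per collection from the observable callback, a tracker like this reflects throttling in the most recent interval rather than over the whole lifetime of the cgroup.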
1f318b8 to 6e4791e
Summary
- Adds `installCgroupCpuMetrics(meter)` in `src/internal/monitoring/cgroup-cpu-metrics.ts`. Detects cgroup v1 vs v2 at startup, parses `cpu.stat` (normalising the throttled-time field to nanoseconds), and registers four observable instruments on the provided OTel `Meter`:
  - `process.cpu.cfs.periods`: total CFS periods elapsed (counter, `{period}`)
  - `process.cpu.cfs.throttled_periods`: periods the cgroup was throttled (counter, `{period}`)
  - `process.cpu.cfs.throttled_time`: total throttled time (counter, `ns`)
  - `process.cpu.cfs.throttled_ratio`: fraction of recent periods throttled (gauge, `1`), delta-based with a divide-by-zero guard on first sample and zero-period intervals.
- Installed from `otel-metrics.ts` right after `metrics.setGlobalMeterProvider(meterProvider)`.
- Reads `cpu.stat` from the OTel observable callback (no separate `setInterval`). Non-Linux platforms and missing/unreadable `cpu.stat` short-circuit cleanly with a single debug log. Mid-lifetime read failures are caught and reported with a once-per-process warn log so they don't surface as SDK warnings.

Motivation
The app runs in containers (ECS / Kubernetes), and we want to correlate intermittent multi-100ms GC pauses with CFS throttling. These metrics are project-local: they are not yet part of the OTel semantic conventions, as the module's top comment documents.
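
For readers unfamiliar with the cgroup layouts, the detection and short-circuit behaviour described in the Summary might look roughly like the sketch below. The `detectCpuStatPath` name and the injectable `platform`/`exists` parameters are hypothetical, and the paths are the conventional mount points (`/sys/fs/cgroup/cgroup.controllers` marking a v2 unified hierarchy); the PR's actual lookup logic is not shown in this conversation and may differ.

```typescript
import { existsSync } from "node:fs";

type CgroupCpuStat = { version: 1 | 2; path: string };

// Returns where cpu.stat is expected to live, or null when CFS metrics
// cannot be collected (non-Linux host or no cpu.stat at the usual paths).
function detectCpuStatPath(
  platform: string = process.platform,
  exists: (path: string) => boolean = existsSync,
): CgroupCpuStat | null {
  if (platform !== "linux") return null;

  // The cgroup v2 unified hierarchy exposes cgroup.controllers at its root.
  if (exists("/sys/fs/cgroup/cgroup.controllers")) {
    const path = "/sys/fs/cgroup/cpu.stat";
    return exists(path) ? { version: 2, path } : null;
  }

  // cgroup v1 mounts the cpu controller separately or together with cpuacct.
  const v1Candidates = [
    "/sys/fs/cgroup/cpu/cpu.stat",
    "/sys/fs/cgroup/cpu,cpuacct/cpu.stat",
  ];
  for (const path of v1Candidates) {
    if (exists(path)) return { version: 1, path };
  }
  return null;
}
```

Injecting `platform` and `exists` keeps the function testable on any OS, which is one way a unit test could cover the non-Linux and missing-file branches without touching the real filesystem.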
Test plan
- `npx tsc -noEmit` clean
- `npx biome check` clean on touched files
- `npm run test:unit -- otel-metrics.test.ts cgroup-cpu-metrics.test.ts` → 9/9 passing

🤖 Generated with Claude Code