fix(observability): v0.12.1 — CPU mCPU relabel + memory chart labels by eyriehq-bot[bot] · Pull Request #20 · eyriehq/plugins

eyriehq-bot · 2026-04-27T09:01:12Z

Summary

MGupta deployed v0.12.0 to iwdev and surfaced two issues against the per-pod chart grid:

Issue 1 — CPU values 1000x too large

v0.12.0's max(Value) fix correctly suppressed argMax-induced spike noise (the iwdev average dropped from 84.735 → 22.710), but the unit label was still wrong. A cross-check against kubectl top pods -n infrawatch for the same iwagent pod and window shows the rate values match millicores 1:1. In other words, the v0.12.0 simulation's "0.022–0.035 cores" had the right physical magnitude, but the deployed chart plotted the same usage as ~22 and labelled it "cores", overstating it by 1000x.

Most likely root cause (TODO to verify upstream): the kubelet-stats pipeline on iwdev appears to emit k8s.pod.cpu.time deltas in milliseconds rather than seconds. Specifically:

- a pod using ~0.022 cores accumulates 0.022 × 60 = 1.32 cpu-seconds (= 1320 cpu-ms) of cumulative growth over 60 s
- if that delta arrives in milliseconds, dividing 1320 by the 60 s bucket gives 22, exactly what we display
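The arithmetic above can be replayed as a small Python sketch. It assumes the rate is computed as (cumulative delta) / (bucket width in seconds), which is how the worked example reads; the helper name `rate_from_delta` is hypothetical and is not the plugin's code.

```python
# Hypothetical sketch of the unit trace; rate_from_delta is illustrative only.
BUCKET_SECONDS = 60.0


def rate_from_delta(delta: float, bucket_seconds: float = BUCKET_SECONDS) -> float:
    """Turn a cumulative-counter delta over one bucket into a per-second rate."""
    return delta / bucket_seconds


actual_cores = 0.022  # roughly what kubectl top reports as ~22m

# If the counter were in cpu-seconds, the rate would come out in cores.
delta_seconds = actual_cores * BUCKET_SECONDS  # ~1.32 cpu-seconds
# If the counter is really in cpu-milliseconds, the same delta is 1000x larger.
delta_millis = delta_seconds * 1000  # ~1320 cpu-ms

print(f"counter in s  -> {rate_from_delta(delta_seconds):.3f} (cores)")
print(f"counter in ms -> {rate_from_delta(delta_millis):.1f} (millicores)")
```

The second line prints 22.0, the value the v0.12.0 chart was labelling as cores.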

Safe fix for this cycle (per brief): relabel the chart unit to mCPU so the chart frame matches the data the user already sees side by side with kubectl top. The backend rate calc is unchanged; only labels and display formatting change.

Issue 2 — Memory chart label leakage

The Y-axis was rendering the literal `"B/KB/MB/GB"` string (the unit-suffix list, not a label) → changed to "Utilization" (the label describes what the chart shows; the tick formatter already auto-scales).
The header next to the avg readout was echoing the same literal as a trailing unit → removed. Auto-scaled values ("31.2M", "22 mCPU") are self-contained.
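A Python sketch of the labelling convention described above; the real implementation is the JavaScript `resolveUnitLabel` in ServiceChart.jsx, which this does not reproduce. Assumptions: the metric keys are "cpu" and "memory" (as in the `resolveUnitLabel("cpu")` / `resolveUnitLabel("memory")` calls in the commit message), and the byte formatter uses decimal (1000) steps with K/M/G suffixes, a guess consistent with the "31.2M" readout rather than something the PR states. Both function names are hypothetical.

```python
# Hypothetical Python port of the label convention; not the plugin's JS code.

def resolve_unit_label(metric: str) -> str:
    """Y-axis label: describes what the chart shows, not a list of suffixes."""
    labels = {
        "cpu": "mCPU",  # values are millicores (see Issue 1)
        "memory": "Utilization",  # the per-tick formatter carries the byte unit
    }
    return labels.get(metric, "")


def format_bytes_compact(value: float) -> str:
    """Auto-scale a byte count to a self-contained string such as '31.2M'.

    Assumption: decimal steps of 1000 with K/M/G suffixes.
    """
    for divisor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= divisor:
            return f"{value / divisor:.1f}{suffix}"
    return f"{value:.0f}B"


print(resolve_unit_label("memory"))  # Utilization
print(format_bytes_compact(31_200_000))  # 31.2M
```

Because each tick and the header value already carry their own suffix, the axis label is free to describe the quantity instead of repeating units.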

Changes

| File | Change |
|---|---|
| metrics/components/ServiceChart.jsx | resolveUnitLabel → mCPU / Utilization. formatMetric for CPU → mCPU-friendly (one decimal non-compact, integer compact, "<1" floor). Header trailing-unit span removed. Tooltip: CPU appends "mCPU", memory shows the auto-scaled value alone. |
| metrics/components/MetricsTab.jsx | Page summary unit "cores" → "mCPU". |
| metrics/router.py | Per-resource Pod CPU cards (CLUSTER + POD) unit "cores" → "mCPU". v0.12.1 docstring on _query_pod_cpu_time recording the unit-trace finding + TODO. |
| _shared/otel_queries.py | Same TODO on explorer_cpu_per_pod_query. |
| metrics/components/ux.test.js | resolveUnitLabel + formatMetric tests rewritten for the new labels and millicore formatter. |
| metrics/plugin.json | 1.3.0 → 1.3.1. |
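The CPU formatting rules listed for formatMetric ("one decimal non-compact, integer compact, "<1" floor") can be sketched in Python. This is a hypothetical port, not the JSX code: the function names are illustrative, and which call sites use compact mode (assumed here to be axis ticks) and how exactly zero is rendered are assumptions.

```python
# Hypothetical port of the CPU branch of formatMetric; the real code is JS.

def format_cpu_mcpu(millicores: float, compact: bool = False) -> str:
    """Format a millicore value for display.

    - values strictly between 0 and 1 render as "<1" (floor for near-idle pods)
    - compact mode (assumed: axis ticks) rounds to an integer
    - non-compact mode (assumed: header, tooltip) keeps one decimal
    """
    if 0 < millicores < 1:
        return "<1"
    if compact:
        return f"{millicores:.0f}"
    return f"{millicores:.1f}"


def format_cpu_readout(millicores: float) -> str:
    """Header/tooltip style: the value plus the mCPU suffix."""
    return f"{format_cpu_mcpu(millicores)} mCPU"


print(format_cpu_readout(22.71))  # 22.7 mCPU
print(format_cpu_mcpu(22.71, compact=True))  # 23
print(format_cpu_mcpu(0.4))  # <1
```

The "<1" floor avoids printing misleading values like "0.0" for a pod that is using a fraction of a millicore.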

Before / after (iwdev, iwagent pod)

| | v0.12.0 deployed | v0.12.1 (this PR) |
|---|---|---|
| CPU Y-axis label | cores | mCPU |
| CPU header avg | avg 22.710 cores | avg 22.7 mCPU |
| Memory Y-axis label | B/KB/MB/GB | Utilization |
| Memory header avg | avg 31.2M B/KB/MB/GB | avg 31.2M |
| kubectl top pods -n infrawatch | ~22m | ~22m |

Numbers now match kubectl top 1:1.
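The 1:1 comparison can be scripted as a sanity check. A minimal Python sketch, assuming the kubectl value is copied by hand from the CPU column of kubectl top (Kubernetes CPU quantities use an "m" suffix for millicores and bare numbers for whole cores). The helper names and the 10% relative tolerance are hypothetical; some slack is needed because kubectl top is a point-in-time sample while the chart shows an average over a window.

```python
# Hypothetical helper for the manual cross-check; not part of the plugin.

def kubectl_cpu_to_millicores(quantity: str) -> float:
    """Parse a kubectl-style CPU quantity ('22m' or '0.5') into millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1])
    return float(quantity) * 1000  # bare numbers are whole cores


def matches_kubectl(chart_avg_mcpu: float, kubectl_cpu: str,
                    tolerance: float = 0.1) -> bool:
    """True if the chart average is within `tolerance` (relative) of kubectl."""
    expected = kubectl_cpu_to_millicores(kubectl_cpu)
    return abs(chart_avg_mcpu - expected) <= tolerance * expected


print(matches_kubectl(22.7, "22m"))  # True: same scale
print(matches_kubectl(22.7, "22"))  # False: 22 cores would be 22000 mCPU
```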

Test plan

react-scripts test src/plugins/metrics/components/ux.test.js — 148/148 passing
On merge: GitHub Release on plugins repo to trigger image build
OSS submodule bump (companion PR on infrawatchlabs/infrawatch)
Visually verify on iwdev once the helm chart picks up the new image: chart Y-axis shows "mCPU" + "Utilization", header has no trailing literal, average matches kubectl top

Follow-up (not in this PR)

Trace k8s.pod.cpu.time from kubelet → kubeletstats receiver → ClickHouse and confirm whether the upstream unit is genuinely milliseconds or whether some helm transform is applying a 1000x scale. Once the unit is confirmed, fix it at source by pushing the divisor into rate_per_pod_from_cumulative, and revert the chart unit to "cores".
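A rough idea of what "pushing the divisor into the rate calc" could look like, as a Python sketch. The real signature and data shapes of rate_per_pod_from_cumulative are not shown in this PR, so everything here is an assumption: the function name `rate_cores_from_cumulative` is hypothetical, per-pod input is taken to be (unix_seconds, cumulative_cpu_time) pairs, and negative deltas are treated as counter resets (e.g. a pod restart) and skipped.

```python
# Hypothetical sketch of the follow-up; the real rate_per_pod_from_cumulative
# signature and data shapes are not shown in this PR.
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, float]  # (unix_seconds, cumulative cpu time)

MS_PER_SECOND = 1000.0


def rate_cores_from_cumulative(
    samples: Iterable[Sample], counter_in_ms: bool = True
) -> List[Tuple[float, float]]:
    """Convert one pod's cumulative CPU-time samples into (ts, cores) points."""
    divisor = MS_PER_SECOND if counter_in_ms else 1.0
    points: List[Tuple[float, float]] = []
    prev: Optional[Sample] = None
    for ts, value in sorted(samples):
        if prev is not None:
            prev_ts, prev_value = prev
            dt = ts - prev_ts
            delta = value - prev_value
            # Skip counter resets (negative delta) and non-advancing timestamps.
            if dt > 0 and delta >= 0:
                # delta / divisor gives cpu-seconds; / dt gives cores.
                points.append((ts, delta / divisor / dt))
        prev = (ts, value)
    return points


print(rate_cores_from_cumulative([(0, 0), (60, 1320), (120, 2640)]))
```

With the divisor in place, the iwdev example becomes ~0.022 cores per bucket instead of 22, so the chart label could go back to "cores" without any display-side scaling.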

MGupta deployed v0.12.0 to iwdev and surfaced two issues:

ISSUE 1: CPU values are 1000x too large

v0.12.0's ``max(Value)`` fix correctly suppressed argMax-induced spike noise (iwdev average dropped from 84.735 to 22.710), but the unit label was still wrong: 22.710 matches ``kubectl top``'s millicore reading 1:1 for the same pod, NOT cores. Empirically, the kubelet-stats pipeline on iwdev appears to emit ``k8s.pod.cpu.time`` deltas in milliseconds rather than seconds, so the rate calc produces millicores. Tracing the upstream unit and pushing a divisor into ``rate_per_pod_from_cumulative`` is the proper fix; for now we relabel the chart unit to **mCPU** so the chart frame matches the data the user sees next to ``kubectl top``.

Affects:
- ServiceChart.jsx: resolveUnitLabel("cpu") → "mCPU"
- ServiceChart.jsx: formatMetric for CPU now mCPU-friendly (one decimal non-compact, integer compact, "<1" floor)
- MetricsTab.jsx page summary: "cores" → "mCPU"
- router.py per-resource Pod CPU cards: unit "cores" → "mCPU"

Backend rate computation is unchanged; only labels and display formatting change. TODO documented in router.py + otel_queries.py: trace the upstream kubelet-stats unit and push the divisor into the rate calc once confirmed.

ISSUE 2: Memory chart label leakage

``resolveUnitLabel("memory")`` was returning the literal unit-suffix list "B/KB/MB/GB", i.e. the format string itself, not a chart label. The Y-axis rendered that string verbatim, and the chart header next to the avg readout echoed it again ("avg 31.2M B/KB/MB/GB"). Fixes:
- Y-axis label → "Utilization" (the label describes WHAT the chart shows; the auto-scaled tick formatter carries the unit on each tick).
- Header trailing-unit ``<span>`` removed for both metrics; auto-scaled values ("31.2M", "22 mCPU") are self-contained.
- Tooltip mirrors the header convention: CPU appends "mCPU", memory shows the auto-scaled value alone.

Tests: 148/148 passing (resolveUnitLabel + formatMetric tests rewritten for the new mCPU/Utilization labels and millicore-friendly CPU formatter). Bumps metrics plugin to 1.3.1.

eyriehq-bot Bot mentioned this pull request Apr 27, 2026

chore(plugins): v0.12.1 — bump submodule to lift CPU mCPU + memory label fixes eyriehq/eyriehq#137

Merged


mguptahub merged commit 61eca0a into main Apr 27, 2026
1 check failed

mguptahub deleted the arun/v0.12.1-cpu-mcpu-memory-labels branch April 27, 2026 09:05
