fix(observability): v0.12.1 — CPU mCPU relabel + memory chart labels #20
Merged
MGupta deployed v0.12.0 to iwdev and surfaced two issues:
ISSUE 1 — CPU values are 1000x too large
v0.12.0's ``max(Value)`` fix correctly suppressed argMax-induced spike noise
(iwdev average dropped from 84.735 to 22.710), but the unit label was still
wrong: 22.710 matches ``kubectl top``'s millicore reading 1:1 for the same
pod, NOT cores. Empirically the kubelet-stats pipeline on iwdev appears to
emit ``k8s.pod.cpu.time`` deltas in milliseconds rather than seconds, so the
rate calc produces millicores. Tracing the upstream unit + pushing a divisor
into ``rate_per_pod_from_cumulative`` is the proper fix; for now we relabel
the chart unit to **mCPU** so the chart frame matches the data the user sees
next to ``kubectl top``. Affects:
- ServiceChart.jsx: resolveUnitLabel("cpu") → "mCPU"
- ServiceChart.jsx: formatMetric for CPU now mCPU-friendly (one decimal
non-compact, integer compact, "<1" floor)
- MetricsTab.jsx page summary: "cores" → "mCPU"
- router.py per-resource Pod CPU cards: unit "cores" → "mCPU"
Backend rate computation is unchanged — only labels + display formatting.
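The display rule listed above for CPU values (one decimal non-compact, integer compact, "<1" floor) can be sketched as follows. This is a Python illustration of the rule only, not the shipped `formatMetric` in ServiceChart.jsx; the function name `format_mcpu`, the handling of zero, and applying the "<1" floor in both modes are assumptions.

```python
def format_mcpu(value, compact=False):
    """Format a millicore reading for display (illustrative sketch only)."""
    if value <= 0:
        # Assumption: an idle pod renders as a plain zero.
        return "0"
    if value < 1:
        # Sub-millicore readings collapse to a floor marker.
        return "<1"
    if compact:
        # Compact mode: whole millicores. Python rounds exact halves to
        # even, which may differ from the JavaScript formatter.
        return str(round(value))
    # Non-compact mode: one decimal place.
    return f"{value:.1f}"
```

Under these assumptions, the iwdev average of 22.710 would read "22.7" in non-compact mode and "23" in compact mode.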
TODO documented in router.py + otel_queries.py: trace upstream kubelet-stats
unit and push divisor into rate calc once confirmed.
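The unit argument can be checked with plain arithmetic. The sketch below assumes the rate is a counter delta divided by elapsed wall-clock seconds; `rate_from_cumulative` is a hypothetical stand-in, not the actual `rate_per_pod_from_cumulative`, and the sample deltas are invented to reproduce the 22.710 reading.

```python
def rate_from_cumulative(prev_value, curr_value, elapsed_s):
    """Rate of a cumulative counter: delta divided by elapsed seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (curr_value - prev_value) / elapsed_s


# Illustrative numbers: a pod averaging 22.71 millicores over 60 s uses
# 0.02271 * 60 = 1.3626 CPU-seconds, which is 1362.6 CPU-milliseconds.
WINDOW_S = 60.0
cpu_seconds_delta = 1.3626
cpu_millis_delta = 1362.6

# Counter in seconds: the rate comes out in cores.
rate_if_seconds = rate_from_cumulative(0.0, cpu_seconds_delta, WINDOW_S)

# Counter in milliseconds: the same formula yields millicores.
rate_if_millis = rate_from_cumulative(0.0, cpu_millis_delta, WINDOW_S)

# Hypothetical follow-up: a divisor that converts back to cores, to be
# applied only once the upstream unit is confirmed.
MILLIS_PER_SECOND = 1000.0
rate_in_cores = rate_if_millis / MILLIS_PER_SECOND
```

If the upstream counter really is in milliseconds, the unchanged rate formula already produces millicores, which is consistent with the 1:1 match against ``kubectl top``.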
ISSUE 2 — Memory chart label leakage
``resolveUnitLabel("memory")`` was returning the literal unit-suffix list
"B/KB/MB/GB" — the format string itself, not a chart label. The Y-axis
rendered that string verbatim, and the chart header next to the avg readout
echoed it again ("avg 31.2M B/KB/MB/GB"). Fixes:
- Y-axis label → "Utilization" (label describes WHAT the chart shows;
the auto-scaled tick formatter carries the unit on each tick).
- Header trailing-unit ``<span>`` removed for both metrics — auto-scaled
values ("31.2M", "22 mCPU") are self-contained.
- Tooltip mirrors header convention: CPU appends "mCPU", memory shows
the auto-scaled value alone.
Tests: 148/148 passing (resolveUnitLabel + formatMetric tests rewritten for
the new mCPU/Utilization labels and millicore-friendly CPU formatter).
Bumps metrics plugin to 1.3.1.
Summary
MGupta deployed v0.12.0 to iwdev and surfaced two issues against the per-pod chart grid:
Issue 1 — CPU values 1000x too large
v0.12.0's max(Value) fix correctly suppressed argMax-induced spike noise (iwdev average dropped from 84.735 → 22.710), but the unit label was still wrong. Cross-check against kubectl top pods -n infrawatch for the same iwagent pod / window shows the rate values match millicores 1:1, i.e., the v0.12.0 simulation's "0.022–0.035 cores" was right by raw scale but the chart label of "cores" was off by 1000x.
Most likely root cause (TODO to verify upstream): the kubelet-stats pipeline on iwdev appears to emit k8s.pod.cpu.time deltas in milliseconds rather than seconds.
Safe fix this cycle (per brief): relabel chart unit to mCPU so the chart frame matches the data the user already sees side-by-side with kubectl top. Backend rate calc is unchanged; only labels + display formatting move.
Issue 2 — Memory chart label leakage
Changes
Before / after (iwdev, iwagent pod)
Numbers now match kubectl top 1:1.
Test plan
Follow-up (not in this PR)
Trace k8s.pod.cpu.time from kubelet → kubeletstats receiver → ClickHouse and confirm whether the upstream unit is genuinely milliseconds, or whether some helm transform is applying a 1000x scale. If we can fix it at source, push the divisor into rate_per_pod_from_cumulative and revert the chart unit to "cores".