
CPU metric is bimodal — use a delta sample instead of top -bn1 #23

@pachev


What we see

Both the KPI tile and the CPU-over-time chart on the server-detail page tend to report values clustered near 0% or near 100%, with very little in between, even on idle hosts (e.g. my LXC). The chart looks spiky, the tile flips between extremes, and the two often disagree because they sample at slightly different times (KPI from ConnectionCheck, chart from StatsCollect).

Why

Both metrics come from `Mast.Hosts.Metrics.parse_cpu/1` (lib/mast/hosts/metrics.ex:18), which reads the idle field out of the first `%Cpu(s):` line of `top -bn1`. That line is an instant snapshot — over the ~1ms window `top` measures, a core is essentially always fully idle or fully busy. Averaged usage isn't actually being measured.

Options

  1. `top -bn2 -d 1`, take the second iteration. Top's second pass is a delta over the elapsed second, so we get a real 1s-averaged number. Small SSH command change (`-bn1` → `-bn2 -d 1`; without `-d`, top waits its default delay, 3s unless overridden) plus picking the second `%Cpu(s):` line instead of the first in the regex. Biggest realism win for the smallest diff.
  2. Read `/proc/stat` twice ~250ms apart and compute the delta ourselves. Most accurate, no top-output fragility, but new code.
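  3. Surface `load_1` as the KPI instead of CPU%. Already kernel-smoothed. Misleading on multi-core, since it isn't normalized by core count and doesn't map to a CPU percentage (load=2 means something very different on 2 cores than on 8), but honest about what it is.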
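A minimal sketch of the math behind option (2), assuming the two `/proc/stat` reads (e.g. `cat /proc/stat` over SSH, ~250ms apart) happen elsewhere and their raw text is passed in. `CpuFromProcStat` and its functions are hypothetical names. It counts `iowait` as idle and leaves `guest`/`guest_nice` out of the total because the kernel already folds them into `user`/`nice`; both are common conventions rather than the only possible choices.

```elixir
defmodule CpuFromProcStat do
  @moduledoc """
  Hypothetical sketch: CPU busy % from two /proc/stat snapshots.
  Sampling (SSH, timing) is out of scope here.
  """

  # Parses the aggregate "cpu " line into {idle, total} jiffies.
  # Field order per proc(5): user nice system idle iowait irq softirq steal guest guest_nice.
  # Only the first 8 are summed: guest/guest_nice are already counted in user/nice.
  @spec totals(String.t()) :: {:ok, {non_neg_integer(), non_neg_integer()}} | :error
  def totals(proc_stat) do
    case Enum.find(String.split(proc_stat, "\n"), &String.starts_with?(&1, "cpu ")) do
      nil ->
        :error

      line ->
        fields =
          line
          |> String.split()
          |> tl()
          |> Enum.take(8)
          |> Enum.map(&String.to_integer/1)

        # idle + iowait both count as "not busy"
        idle = Enum.at(fields, 3, 0) + Enum.at(fields, 4, 0)
        {:ok, {idle, Enum.sum(fields)}}
    end
  end

  # Busy percentage over the interval between the two snapshots.
  @spec usage(String.t(), String.t()) :: {:ok, float()} | :error
  def usage(sample0, sample1) do
    with {:ok, {idle0, total0}} <- totals(sample0),
         {:ok, {idle1, total1}} <- totals(sample1),
         dt when dt > 0 <- total1 - total0 do
      {:ok, Float.round(100.0 * (dt - (idle1 - idle0)) / dt, 1)}
    else
      # missing "cpu " line, or no time elapsed between samples
      _ -> :error
    end
  end
end
```

Because both snapshots are cumulative counters, the result is the average over exactly the gap between reads, with no dependency on top's output format.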

Recommend (1) when we pick this up.
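A minimal sketch of the parsing side of (1), assuming procps-ng batch output (`%Cpu(s):` lines with a `.` decimal separator; older procps prints `Cpu(s):` without the `%`). `CpuFromTop.parse_cpu_last/1` is a hypothetical name, not the current `Mast.Hosts.Metrics.parse_cpu/1`. It takes the last `%Cpu(s):` line rather than indexing the second one, which selects the delta frame for `-bn2` output.

```elixir
defmodule CpuFromTop do
  @moduledoc """
  Hypothetical sketch: CPU usage from the last `%Cpu(s):` line of
  `top -bn2 -d 1` output (the second frame, i.e. the 1s delta).
  """

  @spec parse_cpu_last(String.t()) :: {:ok, float()} | :error
  def parse_cpu_last(output) do
    output
    |> String.split("\n")
    |> Enum.filter(&String.starts_with?(String.trim_leading(&1), "%Cpu(s):"))
    # last frame = delta over the -d interval; first frame is the bimodal snapshot
    |> List.last()
    |> idle_to_usage()
  end

  defp idle_to_usage(nil), do: :error

  defp idle_to_usage(line) do
    # Captures the number right before "id", e.g. "95.4 id" or "ni,100.0 id".
    with [_, idle_str] <- Regex.run(~r/([\d.]+)\s*id\b/, line),
         {idle, _rest} <- Float.parse(idle_str) do
      {:ok, Float.round(100.0 - idle, 1)}
    else
      _ -> :error
    end
  end
end
```

Picking "last" instead of "second" also means the same function still returns a value for single-frame `-bn1` output, which may help while both probes are being switched over.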

Out of scope here

Not changing the chart renderer or the schema. `server_stats.stats["cpu"]` keeps the same shape; only the value gets more meaningful.

Where

  • `lib/mast/hosts/metrics.ex`: `parse_cpu/1`
  • `lib/mast/workers/connection_check.ex`: `top -bn1 | head -3` (the `head -3` has to change too; with `-bn2` it would keep only the first frame's `%Cpu(s):` line)
  • `lib/mast/workers/stats_collect.ex`: the same `top` call lives here too; both probes need the new command

Labels: enhancement