health: stability events log + live system metrics page #732
Merged
backend/stability:
EventLog persists structured events to $SNAP_COMMON/stability-events.jsonl
(one JSON line per zram setup / file-swap disable / pressure detection /
SIGTERM / SIGKILL). Watcher and Zram both accept an optional *EventLog
and append events alongside their existing zap logging.
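A minimal sketch of how appending one JSON line per event could look. The `Event` field names, the `path` field, and the nil-receiver check (one way to make the log "optional" for Watcher and Zram) are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"os"
)

// Event is a hypothetical shape for one stability event; the real field
// names in the PR may differ.
type Event struct {
	Time   string         `json:"time"` // RFC 3339 timestamp
	Kind   string         `json:"kind"` // e.g. "zram_enabled", "sigkill"
	Fields map[string]any `json:"fields,omitempty"`
}

// EventLog appends events to a JSON Lines file at path.
type EventLog struct {
	path string
}

// marshalLine encodes one event as a single newline-terminated JSON line.
func marshalLine(e Event) ([]byte, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// Append writes one event to the end of the log file, creating it if needed.
// A nil *EventLog is a no-op, so callers can treat the log as optional
// (an assumption about how the optional argument is handled).
func (l *EventLog) Append(e Event) error {
	if l == nil {
		return nil
	}
	line, err := marshalLine(e)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(l.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(line)
	return err
}
```

With this shape, a caller holding a nil log can call `Append` unconditionally next to its zap logging.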
backend/health:
New package. Collector reads /proc/stat, /proc/meminfo, /proc/diskstats,
/proc/net/dev for live system metrics; statfs of each non-snap mountpoint
for capacity. Health{} bundles the EventLog reader + Collector for use
by REST.
backend/rest:
Two new admin-secured endpoints:
GET /rest/settings/health/events?limit=N -> recent stability events
GET /rest/settings/health/metrics -> single Snapshot
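A rough sketch of handling the `limit` query parameter on the events endpoint. The default of 50, the cap of 1000, the `recent` callback, and the handler name are all assumptions; the admin check that secures the real endpoints is not shown.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strconv"
)

// parseLimit reads a ?limit=N value, falling back to def when it is missing
// or invalid and clamping it to limitMax.
func parseLimit(raw string, def, limitMax int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	if n > limitMax {
		return limitMax
	}
	return n
}

// eventsHandler serves recent events as JSON. recent is a hypothetical
// lookup standing in for the EventLog reader.
func eventsHandler(recent func(limit int) ([]json.RawMessage, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		limit := parseLimit(r.URL.Query().Get("limit"), 50, 1000)
		events, err := recent(limit)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(events)
	}
}
```

Clamping the limit server-side keeps a client from requesting an unbounded read of the log.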
web/platform:
New Health.vue. Polls metrics every 2 s (computes CPU% / disk-IO KB/s /
net rate from snapshot deltas), events every 10 s. Shows per-mount usage
bars, swap usage, and the stability event history. Listed under Settings
with a 'favorite' material icon.
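The delta arithmetic behind the CPU% and KB/s figures can be sketched as follows. The real computation lives in the Vue component; this Go version only mirrors the math, and the function names and reset handling are assumptions.

```go
package main

// cpuPercent returns the share of busy ticks between two snapshots of
// cumulative counters, as a percentage.
func cpuPercent(prevBusy, prevTotal, curBusy, curTotal uint64) float64 {
	if curTotal <= prevTotal || curBusy < prevBusy {
		return 0 // no elapsed ticks, or the counters were reset
	}
	return 100 * float64(curBusy-prevBusy) / float64(curTotal-prevTotal)
}

// kbPerSec converts a growth in a cumulative byte counter over an interval
// into KB/s (1 KB = 1024 bytes here).
func kbPerSec(prevBytes, curBytes uint64, seconds float64) float64 {
	if seconds <= 0 || curBytes < prevBytes {
		return 0
	}
	return float64(curBytes-prevBytes) / 1024 / seconds
}
```

For example, with a 2 s poll interval, a counter that grew by 4096 bytes corresponds to 2 KB/s.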
CPU ticks advance with sine-shaped busy/idle, mem/swap usage oscillates so the live bars actually move, net/disk byte counters grow with random deltas to produce realistic KB/s rates. Events list includes a recent SIGTERM + earlier SIGKILL chain plus a zram_enabled + swapoff_file pair matching what the borisarm64 OOM stress test produced.
el-table forces a fixed wide layout that overflowed on mobile. Switch to a vertical list of cards (max-width 720px) with a colored left border per event kind (red for kills, orange for pressure, green for zram/swap actions). Time wraps below kind on narrow screens via flex-wrap.
Each event now has a material icon (warning/cancel for kills, priority for pressure, memory/swap for zram) tinted to match the left-border accent color, a white rounded card with soft shadow, and relative time (e.g. '2m ago') with absolute timestamp on hover. Tighter padding + smaller fonts under 600px.
Wrap events H2 + list in .settingsblock so they get the same 1024px-capped centered container as the CPU/memory/disk sections. Drop the now-redundant 720px event-list cap.
Other views import these in their <style> block. Routes are lazy-loaded, so refreshing on /health was loading Health.vue cold — without those imports the page lost the site-wide typography/layout rules and the event-card material icons fell back to text. Matches the pattern in Settings.vue / Logs.vue / etc.
- Switch <style scoped> to plain <style>, matching Settings/Logs. Under scoped, the @import 'site.css' was rewritten to data-v-… selectors, so it didn't reach the global menu/header on cold refreshes of /health.
- Refactor disks/network rows into a flex 'metric-row' with the name on the left, a tabular-num value on the right, and an optional bar below. This keeps the layout stable when rate digits change.
- Add 16 px horizontal padding to the events block under 1024 px so the event cards are inset from the screen edge like the rest of the content.
Wrap the two col2 columns in a flex .health-row (40 px gap, wrap on narrow screens) — gives the CPU/memory and disks/network blocks real breathing room on desktop. Cap the events block at 720 px and center it so it doesn't stretch full settingsblock width.
EventLog: Recent() previously decoded the entire jsonl file into a slice before trimming, so a 100k-event log would allocate the full set even for a Recent(10) call. Reworked to use a fixed-size ring buffer (capped at the requested limit) so memory use is O(limit), not O(file size). Append() now rotates the file when it exceeds 256 KiB: it keeps the newest 1000 events and rewrites atomically via tmpfile+rename, so disk usage is also bounded.
i18n: Added the new settings.health label and the full health.* dictionary to ar/de/es/fr/hi/ja/pt/ru/zh-CN. Previously only en.json had them, so other locales fell back to English mid-page.
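The ring-buffer idea behind the Recent() rework can be sketched like this. It keeps raw lines rather than decoded events, and the `ring`, `newRing`, and `recentLines` names are hypothetical; the PR's own implementation may differ in detail.

```go
package main

import (
	"bufio"
	"io"
)

// ring keeps the last len(buf) strings pushed into it.
type ring struct {
	buf  []string
	next int  // index the next push will overwrite
	full bool // true once the buffer has wrapped at least once
}

func newRing(limit int) *ring { return &ring{buf: make([]string, limit)} }

// push stores s, overwriting the oldest entry once the ring is full.
func (r *ring) push(s string) {
	if len(r.buf) == 0 {
		return
	}
	r.buf[r.next] = s
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// items returns the retained strings, oldest first.
func (r *ring) items() []string {
	if !r.full {
		return append([]string(nil), r.buf[:r.next]...)
	}
	out := make([]string, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	return append(out, r.buf[:r.next]...)
}

// recentLines streams src line by line and returns the last limit lines,
// holding at most limit lines in memory at any time. Note that
// bufio.Scanner rejects lines longer than 64 KiB by default.
func recentLines(src io.Reader, limit int) ([]string, error) {
	if limit <= 0 {
		return nil, nil
	}
	rb := newRing(limit)
	sc := bufio.NewScanner(src)
	for sc.Scan() {
		rb.push(sc.Text())
	}
	return rb.items(), sc.Err()
}
```

Because only `limit` slots are ever allocated, a Recent(10)-style call over a very large file still holds just ten lines.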
readCPU/readMemory/readDisks/readNet now hang off *Collector so they own the procDir path instead of taking it as an argument. Snapshot() no longer threads filepath.Join calls. Tests cover each method individually plus the end-to-end Snapshot path through a shared newTestCollector helper. Matches the same OO style we use for the EventLog / Watcher / Zram.
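A sketch of the method-on-Collector shape, using readMemory as the example. The `Memory` type, the fields it extracts, and this signature for `newTestCollector` are assumptions; the PR's helper most likely takes a `*testing.T` instead of returning a cleanup function.

```go
package main

import (
	"bufio"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Collector reads metrics from a proc-style directory ("/proc" in production).
type Collector struct {
	procDir string
}

// Memory holds values in kB, as reported by meminfo.
type Memory struct {
	TotalKB, AvailableKB uint64
}

// readMemory parses <procDir>/meminfo; the path is owned by the Collector
// rather than passed in by the caller.
func (c *Collector) readMemory() (Memory, error) {
	f, err := os.Open(filepath.Join(c.procDir, "meminfo"))
	if err != nil {
		return Memory{}, err
	}
	defer f.Close()

	var m Memory
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. ["MemTotal:", "16384", "kB"]
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "MemTotal:":
			m.TotalKB = v
		case "MemAvailable:":
			m.AvailableKB = v
		}
	}
	return m, sc.Err()
}

// newTestCollector writes fixture files into a temp dir and returns a
// Collector rooted there plus a cleanup function (hypothetical variant).
func newTestCollector(files map[string]string) (*Collector, func(), error) {
	dir, err := os.MkdirTemp("", "proc")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(content), 0o644); err != nil {
			cleanup()
			return nil, nil, err
		}
	}
	return &Collector{procDir: dir}, cleanup, nil
}
```

Pointing `procDir` at a fixture directory lets each read method be tested without touching the real /proc.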
Every commit on a branch with an open PR was creating two builds (push + pull_request), and Drone serializes them, so the queue piled up faster than it drained. Drop pull_request from the event trigger: the push build for the same commit already validates everything the PR check would, and merging is unblocked once that one is green.
Summary
$SNAP_COMMON/stability-events.jsonl: every zram setup, file-swap disable, pressure detection, and SIGTERM/SIGKILL is recorded with timestamp + structured fields.
No UI framework migration in this PR; uses existing Element Plus (`el-progress`, `el-table`).
Test plan