feat: multi-node viewer, combine, and UX improvements #795

Merged
thinkingfish merged 24 commits into iopsystems:main from thinkingfish:feat/multi-node-combine
Apr 17, 2026

thinkingfish (Member) commented Apr 16, 2026

Summary

Multi-node/multi-instance support for the viewer and parquet combine tool, plus several UX improvements.

Multi-node/multi-instance

  • Combine: restructure per_source_metadata into a nested { source_type: { node: metadata } } format; propagate node labels from service files
  • Viewer: node selector dropdown, instance selector for service sections, PromQL label injection for multi-node queries
  • System Info: multi-node system info page with per-node hardware details and sticky jump-to-node navigation
  • Metadata: dedicated metadata page with organized file metadata display
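As a rough illustration of the nested per_source_metadata layout, the Rust sketch below converts the earlier flat composite keys (source:label, e.g. rezolus:web01) into the { source_type: { node: metadata } } shape. The function name nest_per_source_metadata and the choice to keep each metadata value as an opaque JSON string are assumptions for illustration, not the PR's code.

```rust
use std::collections::BTreeMap;

/// Nested layout: source type -> node/instance label -> metadata.
/// Metadata is kept as an opaque JSON string (an assumption of this sketch).
pub type PerSourceMetadata = BTreeMap<String, BTreeMap<String, String>>;

/// Hypothetical migration from flat "source:label" keys to the nested form.
/// Keys without a ':' are filed under an empty label.
pub fn nest_per_source_metadata(flat: &BTreeMap<String, String>) -> PerSourceMetadata {
    let mut nested = PerSourceMetadata::new();
    for (key, metadata) in flat {
        // Split on the first ':' only, so a label may itself contain ':'.
        let (source, label) = key.split_once(':').unwrap_or((key.as_str(), ""));
        nested
            .entry(source.to_string())
            .or_default()
            .insert(label.to_string(), metadata.clone());
    }
    nested
}
```

Using BTreeMap keeps node and instance names sorted, which gives a stable order for anything that lists them (such as a node selector).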

KPI validation

  • Validate service extension KPIs at load time by running PromQL queries; mark empty KPIs as available = false
  • Hide charts with no data in all sections (except cgroup charts, which await a group selection)
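A minimal sketch of load-time KPI validation, assuming a hypothetical run_query callback that stands in for the viewer's TSDB query path and returns the number of series a PromQL query produces. The Kpi struct here is a simplified stand-in with only the fields this check needs; it is not the actual type in service_extension.rs.

```rust
/// Simplified stand-in for a service-extension KPI (fields are assumptions).
pub struct Kpi {
    pub query: String,
    pub available: bool,
}

/// Run each KPI's PromQL query once and mark the KPI unavailable when the
/// query yields no series. `run_query` is hypothetical and returns the number
/// of result series, or an error message.
pub fn validate_kpis<F>(kpis: &mut [Kpi], run_query: F)
where
    F: Fn(&str) -> Result<usize, String>,
{
    for kpi in kpis.iter_mut() {
        // A failed query is treated the same as an empty one (assumption).
        kpi.available = matches!(run_query(kpi.query.as_str()), Ok(n) if n > 0);
    }
}
```

Doing this once at load time means the rendering path only needs to check the available flag rather than re-running queries per chart.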

Code quality

  • Extract Kpi::effective_query() to service_extension.rs, removing duplicate implementations from viewer/mod.rs and annotate.rs
  • Fix CounterGroup/GaugeGroup ambiguity caused by metriken re-exports: use explicit imports in sampler stats
  • Fix node/instance switching stuck on "Loading..." by bypassing same-route guard
  • Remove raw JSON from system info page
  • Fix metadata sidebar link arrow styling

Infrastructure

  • Switch metriken dependencies from crates.io to git to pick up the latest query engine
  • Multi-endpoint recording support

Test plan

  • cargo test passes
  • cargo clippy clean
  • Load single-node parquet: viewer works as before, empty charts hidden
  • Load combined multi-node parquet: node selector works, switching reloads data
  • System info page shows all nodes with sticky jump buttons
  • Metadata page shows organized file metadata
  • Service KPI sections hide unavailable KPIs in notes
  • Cgroup charts still show placeholders awaiting group selection

🤖 Generated with Claude Code

thinkingfish and others added 12 commits April 16, 2026 23:07
Add node and instance disambiguation to parquet combine, enabling
merging of recordings from multiple hosts (rezolus agents) and multiple
service processes/containers.

- Add --node and --instance flags to the recorder for labeling at
  capture time
- Prefix combined columns with node:: or instance:: to avoid conflicts
- Inject node/instance as Arrow field metadata so they become PromQL
  labels automatically (zero TSDB changes)
- Validate label uniqueness and auto-assign instance numbers when
  metadata is absent
- Store per-node systeminfo in per_source_metadata with composite keys
  (source:label)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
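The labeling rules in this commit (unique labels, auto-numbered instances when metadata is absent, prefixed column names) can be sketched in Rust as follows. The helper names assign_labels and prefix_column are hypothetical, and the sketch assumes the prefix is the node or instance label followed by ::; it does not cover writing the Arrow field metadata.

```rust
use std::collections::HashSet;

/// Hypothetical: resolve one label per input. Explicit labels (e.g. from
/// --node/--instance) must be unique; unlabeled inputs get 0, 1, 2, ...
/// skipping any number already used as an explicit label.
pub fn assign_labels(inputs: &[Option<String>]) -> Result<Vec<String>, String> {
    let mut used = HashSet::new();
    // Pass 1: reject duplicate explicit labels.
    for label in inputs.iter().flatten() {
        if !used.insert(label.clone()) {
            return Err(format!("duplicate label: {label}"));
        }
    }
    // Pass 2: auto-number the inputs that had no label.
    let mut next = 0usize;
    let mut labels = Vec::with_capacity(inputs.len());
    for label in inputs {
        match label {
            Some(l) => labels.push(l.clone()),
            None => {
                while used.contains(&next.to_string()) {
                    next += 1;
                }
                let l = next.to_string();
                used.insert(l.clone());
                labels.push(l);
                next += 1;
            }
        }
    }
    Ok(labels)
}

/// Prefix a column so identical metric names from different inputs cannot collide.
pub fn prefix_column(label: &str, column: &str) -> String {
    format!("{label}::{column}")
}
```

Validating explicit labels before auto-numbering ensures an auto-assigned number never collides with an explicit label that appears later in the input list.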
Add a new /api/v1/file_metadata endpoint that returns all parquet
file-level key-value metadata as a JSON object. Update
extract_parquet_metadata to build this JSON by iterating all KV pairs
(parsing values as JSON where possible, falling back to strings).

Add build_multinode_systeminfo() which reads per_source_metadata to
construct a per-node systeminfo map for combined multi-node files.
When multiple nodes are present, the systeminfo endpoint returns a
JSON object keyed by node name instead of the flat single-node format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
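A sketch of the single-node versus multi-node response shape of the systeminfo endpoint. It assumes a hypothetical per_node map from node name to that node's systeminfo JSON (already serialized); build_systeminfo and SystemInfo are illustrative names, not the viewer's actual build_multinode_systeminfo().

```rust
use std::collections::BTreeMap;

/// The two response shapes: flat for one node, keyed by node name otherwise.
pub enum SystemInfo {
    Single(String),
    MultiNode(BTreeMap<String, String>),
}

/// Hypothetical: `per_node` maps node name -> that node's systeminfo JSON.
pub fn build_systeminfo(per_node: BTreeMap<String, String>) -> Option<SystemInfo> {
    match per_node.len() {
        0 => None,
        1 => per_node.into_values().next().map(SystemInfo::Single),
        _ => Some(SystemInfo::MultiNode(per_node)),
    }
}

impl SystemInfo {
    /// Render the endpoint's JSON body. Values are already JSON; node names
    /// are assumed not to need JSON escaping in this sketch.
    pub fn to_json(&self) -> String {
        match self {
            SystemInfo::Single(info) => info.clone(),
            SystemInfo::MultiNode(nodes) => {
                let entries: Vec<String> = nodes
                    .iter()
                    .map(|(node, info)| format!("\"{node}\":{info}"))
                    .collect();
                format!("{{{}}}", entries.join(","))
            }
        }
    }
}
```

Keeping the flat shape for single-node files means existing single-file recordings render exactly as before.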
…data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tadata page

Also restores Source/Version fallback in TopNav metadata popup for
non-combined single-file recordings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Release builds no longer need swap because CARGO_BUILD_JOBS=2 limits
parallel compilation enough to stay within memory.
thinkingfish force-pushed the feat/multi-node-combine branch from b290652 to 7f0d1fe April 17, 2026 06:14
Use the latest metriken from the main branch to pick up histogram 1.0
and other API changes. Fix GaugeGroup/CounterGroup ambiguity in
gpu/macos/stats.rs caused by metriken now re-exporting these types.
…itching fix

Restructure per_source_metadata from flat composite keys to nested
format: { "rezolus": { "web01": {...} }, "cachecannon": { "0": {...} } }.
Updates combine, viewer backend, and all frontend code to match.

Validate KPI availability at viewer load time by running PromQL queries
against the TSDB, so template-derived dashboards hide KPIs with no data
(e.g. zero-traffic histogram charts) instead of showing empty charts.

Fix node/instance switching getting stuck on "Loading..." by using
m.route.set() instead of m.redraw() to force the route's onmatch
handler to re-run and fetch fresh data for the selected node.
Replace `use crate::agent::*` with explicit imports in stats files that
also have `use metriken::*`, since metriken now re-exports CounterGroup
and GaugeGroup types that conflict with the agent's own implementations.
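The ambiguity and its fix follow from Rust's name-resolution rules: when two glob imports both provide CounterGroup, any use of that name is an ambiguity error (E0659), while an explicit use declaration takes precedence over glob imports of the same name. The modules below are simplified stand-ins for crate::agent and metriken, not the real crates.

```rust
// Stand-in for the agent's own metric group type (simplified).
mod agent {
    pub struct CounterGroup {
        pub name: &'static str,
    }
    pub fn helper() -> u32 {
        42
    }
}

// Stand-in for a crate that now also re-exports a `CounterGroup`.
mod metriken_like {
    pub struct CounterGroup;
    pub fn other() -> u32 {
        7
    }
}

mod stats {
    // If `use crate::agent::*;` were also here, using `CounterGroup`
    // would be rejected as ambiguous (E0659).
    use crate::metriken_like::*;
    // An explicit import shadows glob imports of the same name.
    use crate::agent::{helper, CounterGroup};

    pub fn make() -> (CounterGroup, u32) {
        (CounterGroup { name: "cpu" }, helper() + other())
    }
}
```

The glob import of the metriken stand-in can stay, since other() is still resolved through it; only the conflicting names need explicit imports.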
The previous fix used m.route.set(m.route.get()), which hits the
same-route guard that returns a never-resolving Promise. Instead,
directly re-fetch and re-process the current section data after
clearing caches, then redraw with the fresh results.
The metadata page now shows all file metadata, making the raw JSON
dump on the system info page redundant.
Add a navigation bar with buttons for each node at the top of the
multi-node system info page. Clicking a button smooth-scrolls to
that node's section.
- Extract Kpi::effective_query() to service_extension.rs; remove duplicate
  implementations from viewer/mod.rs and annotate.rs
- Hide charts with no data (display:none) instead of showing muted placeholders;
  cgroup charts are unaffected (noCollapse skips the no-data class)
- Add missing CSS for metadata sidebar link arrow to match query explorer
  and system info link styles
thinkingfish changed the title from "feat(parquet): multi-node/multi-instance combine support" to "feat: multi-node viewer, combine, and UX improvements" Apr 17, 2026
thinkingfish and others added 4 commits April 17, 2026 02:16
overflow:hidden on #section-content was preventing position:sticky from
working. Change to overflow-x:hidden and set top to header height so
the nav bar sticks just below the top navbar.
The injectLabel function only handled metrics with braces (metric{labels})
or followed by brackets/parens (metric[5m]). Bare gauge metric names in
expressions like "memory_total - memory_available" were missed, causing
multi-node combined files to show one line per node instead of filtering
to the selected node.

Rewrote injectLabel as a single-pass regex that handles all three forms:
metric{labels}, metric[duration], and bare metric names in expressions.
Also fixes overflow:hidden → overflow-x:hidden on #section-content to
unblock sticky positioning for the system info node nav.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
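The three selector forms can be handled in one left-to-right pass. The PR's injectLabel is a JavaScript regex; the sketch below is a hand-written scanner in Rust instead, so it is an alternative technique rather than a port. It assumes a non-exhaustive list of words that are never metric names (aggregation and binary operators) and leaves selectors that have no metric name untouched.

```rust
// Words in metric position that are not metrics (assumed, non-exhaustive).
const NOT_METRICS: &[&str] = &[
    "sum", "min", "max", "avg", "count", "group", "stddev", "stdvar", "topk",
    "bottomk", "quantile", "count_values", "and", "or", "unless", "bool", "offset",
];
// Modifiers followed by a parenthesised list of label names.
const LABEL_LIST: &[&str] = &["by", "without", "on", "ignoring", "group_left", "group_right"];

fn is_ident_char(c: char, first: bool) -> bool {
    c.is_ascii_alphabetic() || c == '_' || c == ':' || (!first && c.is_ascii_digit())
}

/// Copy the quoted string starting at chars[*i] verbatim, honouring backslash escapes.
fn copy_string(chars: &[char], i: &mut usize, out: &mut String) {
    let quote = chars[*i];
    out.push(quote);
    *i += 1;
    while *i < chars.len() {
        let c = chars[*i];
        out.push(c);
        *i += 1;
        if c == '\\' && *i < chars.len() {
            out.push(chars[*i]);
            *i += 1;
        } else if c == quote {
            return;
        }
    }
}

/// Copy verbatim through the first `close` that is not inside a quoted string.
fn copy_through(chars: &[char], i: &mut usize, out: &mut String, close: char) {
    while *i < chars.len() {
        let c = chars[*i];
        if c == '"' || c == '\'' {
            copy_string(chars, i, out);
            continue;
        }
        out.push(c);
        *i += 1;
        if c == close {
            return;
        }
    }
}

/// Add `label="value"` to every metric selector in a PromQL expression.
pub fn inject_label(expr: &str, label: &str, value: &str) -> String {
    let matcher = format!("{label}=\"{value}\"");
    let chars: Vec<char> = expr.chars().collect();
    let mut out = String::with_capacity(expr.len() + 32);
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '"' || c == '\'' {
            copy_string(&chars, &mut i, &mut out);
        } else if c == '[' || c == '{' {
            // Range like [5m], or a selector without a metric name: untouched.
            copy_through(&chars, &mut i, &mut out, if c == '[' { ']' } else { '}' });
        } else if c.is_ascii_digit() {
            // Number or duration literal (0.5, 5m): copy verbatim.
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '.') {
                out.push(chars[i]);
                i += 1;
            }
        } else if is_ident_char(c, true) {
            let start = i;
            while i < chars.len() && is_ident_char(chars[i], false) {
                i += 1;
            }
            let ident: String = chars[start..i].iter().collect();
            out.push_str(&ident);
            // Peek at the next non-whitespace character.
            let mut j = i;
            while j < chars.len() && chars[j].is_whitespace() {
                j += 1;
            }
            let next = chars.get(j).copied();
            if LABEL_LIST.contains(&ident.as_str()) {
                if next == Some('(') {
                    out.extend(&chars[i..j]);
                    i = j;
                    copy_through(&chars, &mut i, &mut out, ')'); // label names, not metrics
                }
            } else if NOT_METRICS.contains(&ident.as_str()) || next == Some('(') {
                // Operator keyword or function call such as rate(: leave as-is.
            } else if next == Some('{') {
                // metric{...}: put the matcher first inside the existing braces.
                out.extend(&chars[i..=j]);
                i = j + 1;
                out.push_str(&matcher);
                let mut k = i;
                while k < chars.len() && chars[k].is_whitespace() {
                    k += 1;
                }
                if chars.get(k) != Some(&'}') {
                    out.push(',');
                }
                copy_through(&chars, &mut i, &mut out, '}');
            } else {
                // Bare metric name, or metric followed by [duration].
                out.push('{');
                out.push_str(&matcher);
                out.push('}');
            }
        } else {
            out.push(c);
            i += 1;
        }
    }
    out
}
```

For example, inject_label("memory_total - memory_available", "node", "web01") yields memory_total{node="web01"} - memory_available{node="web01"}, which is the bare-gauge case the earlier version missed. Quoted strings are copied verbatim, so string arguments to functions are not mistaken for metric names.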
When combining multi-node parquet files, --pinned <node> sets which
rezolus node the viewer selects by default. Stored as pinned_node in
parquet file metadata. Validates the node name matches an actual input.
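A sketch of the --pinned check, assuming the combine tool already has the list of rezolus node labels when it validates the flag; validate_pinned is a hypothetical name, and the returned value is what would be written as pinned_node.

```rust
/// Hypothetical check for `--pinned <node>`: the pinned node must be one of
/// the rezolus node labels being combined.
pub fn validate_pinned(pinned: Option<&str>, nodes: &[String]) -> Result<Option<String>, String> {
    match pinned {
        None => Ok(None),
        Some(p) if nodes.iter().any(|n| n == p) => Ok(Some(p.to_string())),
        Some(p) => Err(format!(
            "--pinned {p} does not match any input node ({})",
            nodes.join(", ")
        )),
    }
}
```

Failing at combine time, rather than in the viewer, surfaces a typo in the node name before the combined file is written.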
thinkingfish marked this pull request as ready for review April 17, 2026 10:07
thinkingfish merged commit 945c712 into iopsystems:main Apr 17, 2026
22 of 25 checks passed