
Pipeline Plan 170

Seth Ford edited this page Mar 1, 2026 · 1 revision

Plan: Dashboard Real-Time Health & Anomaly Visualization

Issue: #170
Design doc: docs/plans/2026-03-01-dashboard-health-anomaly-design.md

Summary

Add a new "Health" tab to the Shipwright dashboard that provides a unified real-time view of pipeline health scoring (0-100), anomaly detection alerts with drill-down, cost burn rate vs budget, stage-level progress with historical comparison, and DORA metrics with targets. All data streams via the existing WebSocket FleetState mechanism for live updates without page refresh.

Alternatives Considered

| Approach | Description | Chosen? | Why |
| --- | --- | --- | --- |
| A: Scatter across tabs | Health in Overview, anomalies in Insights, DORA in Metrics | No | No unified health view; user hops between 3 tabs |
| B: New "Health" tab | Dedicated tab with all health/anomaly visualization | Yes | Unified view, clean separation, meets all acceptance criteria |
| C: Header overlay | Always-visible header panel | No | Cramped header, complex management |

Files to Modify

New Files

| File | Purpose |
| --- | --- |
| dashboard/src/views/health.ts | Health tab view (init/render/destroy): gauge, alerts, cost burn, stages, DORA |
| dashboard/src/views/health.test.ts | Unit tests for health view render functions |

Modified Files

| File | Changes |
| --- | --- |
| dashboard/src/types/api.ts | Add HealthInfo, AnomalyAlert, StageProgressInfo types; extend FleetState; add "health" to TabId |
| dashboard/server.ts | Add computeHealthScore(), getHealthTrend(), getAnomalies(), getStageProgress(); extend getFleetState(); add 3 REST endpoints |
| dashboard/src/main.ts | Import and registerView("health", healthView) |
| dashboard/public/index.html | Add "Health" tab button + tab-panel container |
| dashboard/public/styles.css | Health-specific CSS: gauge, alerts, progress bars, DORA cards, responsive grid |
| dashboard/src/components/header.ts | Add compact health score badge near connection dot |
| scripts/sw-dashboard-e2e-test.sh | Add health endpoint tests + FleetState health field verification |
| scripts/sw-server-api-test.sh | Add tests for /api/health/trend, /api/health/anomalies, /api/health/stages |

Implementation Steps

Step 1: TypeScript Types (api.ts)

Add new interfaces and extend existing types:

export interface HealthInfo {
  score: number;           // 0-100 composite health score
  verdict: "green" | "yellow" | "red" | "critical"; // narrowed from string so invalid verdicts fail to compile
  signals: {
    momentum: number;      // 0-100 — iteration velocity
    convergence: number;   // 0-100 — error trend direction
    budget: number;        // 0-100 — cost utilization
    errorMaturity: number; // 0-100 — unique/total error ratio
  };
  activePipelines: number;
}

export interface AnomalyAlert {
  id: string;
  metric: string;          // e.g. "build.duration", "test.failures"
  value: number;           // current value
  baseline: number;        // expected value
  severity: "warning" | "critical";
  rootCause: string;       // predicted root cause
  factors: string[];       // contributing factors
  actions: string[];       // suggested actions
  ts: string;              // ISO timestamp
  issue?: number;
}

export interface StageProgressInfo {
  pipelineIssue: number;
  stage: string;
  currentDuration_s: number;
  avgDuration_s: number;
  count: number;           // historical sample count
  status: "on-track" | "slow" | "fast";
}

// Extend FleetState:
export interface FleetState {
  // ... existing fields unchanged ...
  health?: HealthInfo;
}

// Extend TabId:
export type TabId = /* existing */ | "health";

Step 2: Server Health Computation (server.ts)

Implement computeHealthScore() as a weighted sum of four 0-100 signals, mirroring the weights in sw-pipeline-vitals.sh:

Momentum (25%):       Average (iteration / maxIterations) across active pipelines × 100
Convergence (35%):    Error count trend: compare last-hour errors vs previous-hour
Budget (20%):         100 - pct_used (from cost info)
Error Maturity (20%): unique_errors / total_errors ratio × 100

  • Add state.health = computeHealthScore(events, daemonState, costInfo) in getFleetState()
  • When no pipelines are active: score = 100 (healthy idle), verdict = "green"
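
The weighting above can be sketched as a small pure function. This is a minimal illustration, not the planned implementation: the names combineSignals, verdictFor, and clampPct are hypothetical, and the verdict cut-offs (80/60/40) are assumptions, since the plan does not define them. Only the four weights and the idle case come from the text above.

```typescript
// Sketch only: weights and idle behaviour come from the plan;
// helper names and verdict thresholds are assumptions.
type Verdict = "green" | "yellow" | "red" | "critical";

interface HealthSignals {
  momentum: number;      // 0-100
  convergence: number;   // 0-100
  budget: number;        // 0-100
  errorMaturity: number; // 0-100
}

// Weights from the plan: 25% / 35% / 20% / 20%.
const WEIGHTS: HealthSignals = {
  momentum: 0.25,
  convergence: 0.35,
  budget: 0.2,
  errorMaturity: 0.2,
};

// Keep each signal inside the 0-100 range before weighting.
const clampPct = (n: number): number => Math.min(100, Math.max(0, n));

// Hypothetical thresholds; adjust to match sw-pipeline-vitals.sh.
function verdictFor(score: number): Verdict {
  if (score >= 80) return "green";
  if (score >= 60) return "yellow";
  if (score >= 40) return "red";
  return "critical";
}

// Pass null when no pipelines are active (healthy idle).
function combineSignals(
  signals: HealthSignals | null
): { score: number; verdict: Verdict } {
  if (signals === null) return { score: 100, verdict: "green" };
  const score = Math.round(
    WEIGHTS.momentum * clampPct(signals.momentum) +
      WEIGHTS.convergence * clampPct(signals.convergence) +
      WEIGHTS.budget * clampPct(signals.budget) +
      WEIGHTS.errorMaturity * clampPct(signals.errorMaturity)
  );
  return { score, verdict: verdictFor(score) };
}
```

Keeping the combination step separate from signal extraction would let the unit tests cover the arithmetic without mocking events or cost data.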

Step 3: Server REST Endpoints (server.ts)

GET /api/health/trend?days=7

  • Scan ~/.shipwright/progress/issue-*.json files for historical snapshots
  • Group by day, compute daily average score
  • Return: { points: [{ ts, score, verdict }] }
  • Cache with 60s TTL
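
The group-by-day step could look roughly like the sketch below. The Snapshot shape (ts plus score) is an assumption about what the progress files contain, and dailyAverages is a hypothetical helper name; days are bucketed in UTC, and verdict is omitted because its thresholds are not defined in this plan.

```typescript
// Sketch only: the snapshot shape is assumed, not taken from the real files.
interface Snapshot {
  ts: string;    // ISO timestamp
  score: number; // 0-100 health score at that time
}

interface TrendPoint {
  ts: string;    // UTC day, YYYY-MM-DD
  score: number; // rounded daily average
}

const DAY_MS = 24 * 60 * 60 * 1000;

function dailyAverages(
  snapshots: Snapshot[],
  days: number,
  now: Date = new Date()
): TrendPoint[] {
  if (days <= 0) return [];
  const end = now.getTime();
  const cutoff = end - days * DAY_MS;
  const buckets = new Map<string, { sum: number; n: number }>();

  for (const s of snapshots) {
    const t = Date.parse(s.ts);
    // Skip unparseable timestamps and anything outside the window.
    if (Number.isNaN(t) || t < cutoff || t > end) continue;
    const day = new Date(t).toISOString().slice(0, 10);
    const bucket = buckets.get(day) ?? { sum: 0, n: 0 };
    bucket.sum += s.score;
    bucket.n += 1;
    buckets.set(day, bucket);
  }

  // ISO day strings sort chronologically when compared as text.
  return Array.from(buckets.entries())
    .sort((a, b) => a[0].localeCompare(b[0]))
    .map(([day, { sum, n }]) => ({ ts: day, score: Math.round(sum / n) }));
}
```

Accepting now as a parameter keeps the function deterministic, which fits the plan's goal of avoiding timing-sensitive tests.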

GET /api/health/anomalies

  • Read events from last 24h
  • Compare stage durations, failure rates against computed baselines (EMA from historical events)
  • Classify: warning (>2σ), critical (>3σ)
  • Return: { anomalies: [{ id, metric, value, baseline, severity, rootCause, factors, actions, ts }] }
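
A rough sketch of the baseline-and-classify step follows. Several details are assumptions because the plan leaves them open: the smoothing factor (alpha = 0.3), the use of the population standard deviation of the history as σ, treating deviations in either direction as anomalous, and the helper names ema, stdDev, and classifyAnomaly.

```typescript
// Sketch only: alpha, the sigma definition, and two-sided detection are assumptions.
type Severity = "warning" | "critical";

// Exponential moving average; later samples weigh more.
function ema(values: number[], alpha = 0.3): number {
  let acc = values[0];
  for (let i = 1; i < values.length; i++) {
    acc = alpha * values[i] + (1 - alpha) * acc;
  }
  return acc;
}

// Population standard deviation of the history.
function stdDev(values: number[]): number {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) * (b - mean), 0) / values.length;
  return Math.sqrt(variance);
}

// Returns null when there is too little history to judge.
function classifyAnomaly(
  history: number[],
  current: number
): { baseline: number; severity: Severity | null } | null {
  if (history.length < 2) return null;
  const baseline = ema(history);
  const sigma = stdDev(history);
  // A perfectly flat history gives sigma = 0; skip rather than divide by zero.
  if (sigma === 0) return { baseline, severity: null };
  const deviations = Math.abs(current - baseline) / sigma;
  if (deviations > 3) return { baseline, severity: "critical" };
  if (deviations > 2) return { baseline, severity: "warning" };
  return { baseline, severity: null };
}
```

For metrics where only increases matter (stage duration, failure rate), a one-sided check on current - baseline may be the better fit.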

GET /api/health/stages

  • For each active pipeline, compute current stage duration
  • Compare to historical average for that stage (from events)
  • Return: { stages: [{ pipelineIssue, stage, currentDuration_s, avgDuration_s, count, status }] }
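
The status field could be derived from the ratio of current to average duration, as in the sketch below. The 25% tolerance band and the function name stageStatus are assumptions; the plan names the three statuses but not the thresholds.

```typescript
// Sketch only: the tolerance band is a hypothetical default.
type StageStatus = "on-track" | "slow" | "fast";

function stageStatus(
  currentDuration_s: number,
  avgDuration_s: number,
  tolerance = 0.25
): StageStatus {
  // Without a usable historical average there is nothing to compare against.
  if (avgDuration_s <= 0) return "on-track";
  const ratio = currentDuration_s / avgDuration_s;
  if (ratio > 1 + tolerance) return "slow";
  if (ratio < 1 - tolerance) return "fast";
  return "on-track";
}
```

Note that a stage still in progress always starts below its average, so "fast" is probably only meaningful once the stage has completed; how to treat running stages is left open here.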

Step 4: HTML Structure (index.html)

Add tab button in <nav class="tab-nav">:

<button class="tab-btn" data-tab="health">
  <svg viewBox="0 0 16 16" width="14" height="14"><!-- heart/pulse icon --></svg>
  Health
</button>

Add tab panel:

<div class="tab-panel" id="tab-health" style="display:none">
  <section class="health-score-section" aria-label="Pipeline Health Score">
    <div id="health-gauge-wrap"></div>
    <div id="health-trend-wrap"></div>
  </section>
  <section class="anomaly-alerts-section" aria-label="Anomaly Alerts">
    <h2 class="section-heading">Anomaly Alerts</h2>
    <div id="anomaly-alerts-list"></div>
  </section>
  <div class="health-bottom-row">
    <section class="cost-burn-section" aria-label="Cost Burn Rate">
      <h2 class="section-heading">Cost vs Budget</h2>
      <div id="cost-burn-gauge-wrap"></div>
    </section>
    <section class="stage-progress-section" aria-label="Stage Progress">
      <h2 class="section-heading">Stage Progress</h2>
      <div id="stage-progress-list"></div>
    </section>
  </div>
  <section class="dora-cards-section" aria-label="DORA Metrics">
    <h2 class="section-heading">DORA Metrics</h2>
    <div id="dora-health-cards" class="dora-cards-grid"></div>
  </section>
</div>

Step 5: CSS Styles (styles.css)

Key new styles:

  • .health-score-section — 2-column grid (gauge + trend)
  • .health-gauge — SVG radial gauge with gradient fill
  • .anomaly-card — card with severity-colored left border
  • .anomaly-drilldown — expandable detail panel (max-height transition)
  • .cost-burn-gauge — SVG arc gauge with gradient
  • .stage-progress-bar — horizontal bar with avg marker
  • .dora-cards-grid — 4-column responsive grid
  • .dora-card-health — card with grade badge and target comparison
  • Responsive: stack to 1 column at 320px, 2 columns at 768px

Step 6: Health View (health.ts)

export const healthView: View = {
  init() {
    // Fetch trend data, anomalies, stage progress via REST
    fetchHealthTrend();
    fetchAnomalies();
    fetchStageProgress();
  },
  render(state: FleetState) {
    // Real-time updates from WebSocket
    renderHealthGauge(state.health);
    renderCostBurnGauge(state.cost);
    renderDoraCards(state.dora);
    // Stage progress also updated from WS (active pipelines change)
  },
  destroy() {
    // Clean up click handlers on anomaly cards
  }
};

Widget functions:

  • renderHealthGauge(health) — SVG circle with stroke-dasharray animation, verdict color
  • renderTrendSparkline(points) — SVG polyline with area fill
  • renderAnomalyAlerts(anomalies) — sorted cards with click-to-expand drill-down
  • renderCostBurnGauge(cost) — SVG arc (180°) with spent/budget ratio
  • renderStageProgress(stages) — horizontal bars with avg marker line
  • renderDoraCards(dora) — 4 metric cards with grade badge (Elite/High/Medium/Low)
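
For the stroke-dasharray gauge, the geometry reduces to splitting the circle's circumference into a filled and an unfilled segment. The sketch below shows that calculation as a DOM-free helper; gaugeDash is a hypothetical name and the two-decimal rounding is an arbitrary choice.

```typescript
// Sketch only: computes the stroke-dasharray value for a circular gauge.
function gaugeDash(
  score: number,
  radius: number
): { circumference: number; dashArray: string } {
  const pct = Math.min(100, Math.max(0, score)) / 100;
  const circumference = 2 * Math.PI * radius;
  const filled = pct * circumference;
  // First number is the drawn arc length, second is the gap.
  const dashArray = `${filled.toFixed(2)} ${(circumference - filled).toFixed(2)}`;
  return { circumference, dashArray };
}

// Example: a score of 50 on a circle of radius 50 fills half the ring.
const half = gaugeDash(50, 50);
console.log(half.dashArray); // "157.08 157.08"
```

Returning the string instead of writing to the DOM makes the "correct SVG attributes for score 0, 50, 100" test case a plain string comparison.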

Step 7: Register View (main.ts)

import { healthView } from "./views/health";
registerView("health", healthView);

Step 8: Header Health Badge (header.ts)

Add after connection dot in renderCostTicker() or as separate renderHealthBadge():

<span class="health-badge health-{verdict}" title="Health: {score}/100">
  {score}
</span>

Click handler: switchTab("health").
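
One way to build the badge markup is a string-returning helper like the one below. The name healthBadgeHtml and the choice to render nothing when health is absent are assumptions; the class names and title format follow the template above.

```typescript
// Sketch only: returns badge markup, or an empty string when health is missing.
type Verdict = "green" | "yellow" | "red" | "critical";

interface BadgeHealth {
  score: number;
  verdict: Verdict;
}

function healthBadgeHtml(health?: BadgeHealth): string {
  // FleetState.health is optional, so older servers may not send it.
  if (!health) return "";
  const score = Math.round(Math.min(100, Math.max(0, health.score)));
  return (
    `<span class="health-badge health-${health.verdict}" ` +
    `title="Health: ${score}/100">${score}</span>`
  );
}
```

Because verdict is a closed union and score is coerced to a clamped integer, no user-controlled text reaches the markup.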

Step 9: Health View Tests (health.test.ts)

Test cases:

  1. renderHealthGauge — correct SVG attributes for score 0, 50, 100
  2. renderHealthGauge — verdict colors map correctly (green/yellow/red/critical)
  3. renderAnomalyAlerts — renders correct number of alert cards
  4. renderAnomalyAlerts — drill-down expands on click
  5. renderAnomalyAlerts — empty state shows "No anomalies detected"
  6. renderCostBurnGauge — arc fill matches pct_used
  7. renderStageProgress — bar widths proportional to duration/avg
  8. renderDoraCards — all 4 metrics rendered with correct grades
  9. renderTrendSparkline — polyline points match data
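
Test case 9 is easiest to write if the polyline geometry lives in a pure helper. The sketch below is one possible shape, with sparklinePoints as a hypothetical name; it assumes scores are on the 0-100 scale and that SVG y grows downward.

```typescript
// Sketch only: maps 0-100 scores onto an SVG polyline "points" string.
function sparklinePoints(
  scores: number[],
  width: number,
  height: number
): string {
  if (scores.length === 0) return "";
  // A single point sits at x = 0; otherwise spread evenly across the width.
  const step = scores.length > 1 ? width / (scores.length - 1) : 0;
  return scores
    .map((score, i) => {
      const pct = Math.min(100, Math.max(0, score)) / 100;
      const x = i * step;
      const y = height - pct * height; // invert: higher score is higher on screen
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}

console.log(sparklinePoints([0, 50, 100], 100, 40));
// "0.0,40.0 50.0,20.0 100.0,0.0"
```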

Step 10: E2E Tests (sw-dashboard-e2e-test.sh)

Add mock data:

  • Progress snapshots in $HOME/.shipwright/progress/
  • Historical events with varying health scores

Test:

  • /api/health/trend returns array with expected structure
  • /api/health/anomalies returns anomaly array
  • /api/health/stages returns stage progress for active pipelines
  • FleetState from /ws contains health field with score/verdict/signals

Step 11: API Tests (sw-server-api-test.sh)

Test each endpoint:

  • /api/health/trend?days=7 → response has points array
  • /api/health/anomalies → response has anomalies array with required fields
  • /api/health/stages → response has stages array
  • /api/health/trend?days=0 → graceful handling (empty array)
  • /api/health/trend?days=365 → capped at reasonable limit
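
The days=0 and days=365 cases suggest the query parameter is normalized before use. A minimal sketch of that normalization follows; the 90-day cap, the fallback to 7 for missing or invalid input, and the name parseTrendDays are all assumptions, since the plan only says "capped at reasonable limit".

```typescript
// Sketch only: the cap and default are hypothetical values.
const MAX_TREND_DAYS = 90;
const DEFAULT_TREND_DAYS = 7;

function parseTrendDays(raw: string | null): number {
  if (raw === null || raw.trim() === "") return DEFAULT_TREND_DAYS;
  const n = Number(raw);
  if (!Number.isFinite(n)) return DEFAULT_TREND_DAYS;
  // Negative values collapse to 0, which yields an empty trend.
  return Math.min(MAX_TREND_DAYS, Math.max(0, Math.floor(n)));
}
```

With a URL object, the input would typically come from url.searchParams.get("days"), which returns null when the parameter is absent.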

Step 12: Build and Verify

npm run build   # TypeScript compiles, no errors
npm test        # All existing + new tests pass

Task Checklist

  • Task 1: Add TypeScript types (HealthInfo, AnomalyAlert, StageProgressInfo) to api.ts
  • Task 2: Implement computeHealthScore() in server.ts
  • Task 3: Add health field to getFleetState() and register 3 REST endpoints
  • Task 4: Add Health tab HTML structure to index.html
  • Task 5: Add Health-specific CSS to styles.css
  • Task 6: Create health.ts view with all widget render functions
  • Task 7: Register health view in main.ts
  • Task 8: Add compact health score badge to header
  • Task 9: Write unit tests (health.test.ts)
  • Task 10: Extend E2E tests for health endpoints
  • Task 11: Extend API tests for health endpoints
  • Task 12: Build and verify — npm run build + npm test

Testing Approach

| Layer | File | What |
| --- | --- | --- |
| Unit | health.test.ts | Render functions, score computation, edge cases |
| API | sw-server-api-test.sh | REST endpoint response shapes |
| E2E | sw-dashboard-e2e-test.sh | Full mock server with health data flow |
| Build | npm run build | TypeScript compilation, no regressions |

Definition of Done

  • Health tab visible in dashboard with gauge showing 0-100 score
  • 7-day trend sparkline renders historical data
  • Anomaly alerts list with severity + root cause
  • Click anomaly → drill-down with contributing factors and actions
  • Cost burn gauge shows spent vs budget with rate
  • Stage progress bars show duration vs historical average
  • DORA cards show all 4 metrics with grade and target
  • All widgets update via WebSocket (no page refresh)
  • Health badge in header for at-a-glance status
  • All existing tests pass
  • New tests pass
  • Responsive at 320px, 768px, 1024px, 1440px
  • Keyboard accessible (tab, Enter/Space for drill-down)

Risk Analysis

| Risk | What Could Break | Mitigation |
| --- | --- | --- |
| FleetState shape change | Existing views might fail if they destructure strictly | All new fields are optional; existing fields unchanged |
| Server perf degradation | Health score computed on every 2s push | Lightweight computation (no subprocess, no heavy I/O); progress file reads cached |
| Test flakiness | E2E tests are timing-sensitive | Use deterministic mock data; no real WebSocket timing |
| CSS conflicts | New styles collide with existing ones | Prefix all new classes with health- or scope under .tab-panel#tab-health |
