Pipeline Design 170
Shipwright's dashboard (Bun WebSocket server + vanilla TypeScript frontend) currently serves 13 tab views via a reactive Store → View pattern. FleetState is computed server-side every 2s and broadcast to up to 50 WebSocket clients. The dashboard already displays cost, DORA grades, pipeline status, and agent health — but these are scattered across Overview, Metrics, and Insights tabs. There is no unified health-at-a-glance surface.
Issue #170 requests: a composite health score (0-100) with trend, anomaly alerts with drill-down, cost burn gauge, stage progress vs historical avg, and DORA cards — all live-updating via WebSocket.
Constraints:
- Server is a single 5800-line `server.ts` (Bun runtime, SQLite + JSONL fallback)
- Frontend uses no framework: raw DOM manipulation with a `View` interface (init/render/destroy)
- All CSS uses `--var` design tokens (`--cyan`, `--abyss`, `--rose`, etc.)
- Tab panels are pre-rendered in HTML, toggled by `switchTab()`
- FleetState broadcast uses JSON string deduplication, so payload bloat impacts all clients
- Bash 3.2 compatibility required for test scripts
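The JSON string deduplication constraint can be pictured with a small sketch. This is not the code in `server.ts`: `createDedupBroadcaster` and the `Client` shape are hypothetical names, and the sketch assumes that deduplication means skipping a send when the serialized payload is identical to the previous one.

```typescript
// Hypothetical client shape: anything that can receive a string frame.
interface Client {
  send(payload: string): void;
}

// Returns a broadcast function that serializes the state once and skips
// the send entirely when the JSON string matches the previous broadcast.
function createDedupBroadcaster(clients: Client[]) {
  let lastPayload: string | null = null;

  return function broadcast(state: unknown): boolean {
    const payload = JSON.stringify(state);
    if (payload === lastPayload) return false; // unchanged: nothing sent
    lastPayload = payload;
    for (const client of clients) client.send(payload);
    return true; // changed: one frame sent to every client
  };
}
```

Under this assumption, every field added to FleetState that changes between cycles defeats the skip, which is why the payload size matters for all connected clients.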
Add a dedicated "Health" tab (Approach B) with server-side health score computation piggybacking on the existing getFleetState() cycle. Heavy data (7-day trend, anomaly details, stage history) is served via 3 new REST endpoints fetched lazily on tab open — NOT stuffed into the WebSocket payload.
- Health score computed in TypeScript on the server, not by shelling out to `sw-pipeline-vitals.sh`. This avoids subprocess overhead in the 2s broadcast loop. The four signals (momentum 25%, convergence 35%, budget 20%, error maturity 20%) mirror the vitals engine weights.
- FleetState extended with a small `health?` optional field (~200 bytes) containing only `score`, `verdict`, `signals`, and `activePipelines`. Trend and anomaly data are NOT in FleetState.
- Three REST endpoints for on-demand data: `/api/health/trend`, `/api/health/anomalies`, `/api/health/stages`. Each is cached server-side (60s TTL for trend, 30s for anomalies/stages).
- Header health badge provides at-a-glance status without switching tabs. Clicking it navigates to the Health tab.
- All new types are additive and optional: zero risk to the existing 13 views or their tests.
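The weighting in the first decision can be sketched as a pure function. This is an illustration under stated assumptions, not the planned `computeHealthScore()`: how each signal is derived from events is not specified here, so the hypothetical `scoreFromSignals` helper takes the four signals as already-computed inputs. The verdict thresholds are the ones listed in the validation criteria (green >= 75, yellow >= 50, red >= 25); rounding to an integer is an assumption.

```typescript
type Verdict = "green" | "yellow" | "red" | "critical";

interface Signals {
  momentum: number;      // 0-100
  convergence: number;   // 0-100
  budget: number;        // 0-100
  errorMaturity: number; // 0-100
}

// Weights as stated in the decision: 25% / 35% / 20% / 20%.
const WEIGHTS: Signals = { momentum: 0.25, convergence: 0.35, budget: 0.2, errorMaturity: 0.2 };

const clamp = (n: number): number => Math.min(100, Math.max(0, n));

// Thresholds follow the validation criteria: green >= 75, yellow >= 50, red >= 25.
function verdictFor(score: number): Verdict {
  if (score >= 75) return "green";
  if (score >= 50) return "yellow";
  if (score >= 25) return "red";
  return "critical";
}

// Hypothetical helper: combines already-derived signals into score + verdict.
function scoreFromSignals(signals: Signals): { score: number; verdict: Verdict } {
  const weighted =
    clamp(signals.momentum) * WEIGHTS.momentum +
    clamp(signals.convergence) * WEIGHTS.convergence +
    clamp(signals.budget) * WEIGHTS.budget +
    clamp(signals.errorMaturity) * WEIGHTS.errorMaturity;
  const score = Math.round(clamp(weighted)); // integer rounding is an assumption
  return { score, verdict: verdictFor(score) };
}
```

Because the function does no I/O, it stays cheap enough to run inside the 2s broadcast cycle.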
┌─────────────────────────────────────────────────────────────────┐
│  server.ts                                                      │
│                                                                 │
│  getFleetState()                                                │
│    ├── readEvents()          ─── events.jsonl / SQLite          │
│    ├── readDaemonState()     ─── daemon-state.json / SQLite     │
│    ├── getCostInfo()         ─── costs.json + budget.json       │
│    ├── calculateDoraGrades() ─── 7-day event scan               │
│    └── computeHealthScore()  ─── NEW: momentum/convergence/     │
│        │                         budget/errorMaturity           │
│        └── returns HealthInfo (appended to FleetState.health)   │
│                                                                 │
│  REST endpoints (lazy-fetched):                                 │
│    GET /api/health/trend     ─── getHealthTrend(days)           │
│    GET /api/health/anomalies ─── getAnomalies(events)           │
│    GET /api/health/stages    ─── getStageProgress(events, jobs) │
│                                                                 │
│  broadcastToClients(fleetState) ── every 2s via WebSocket       │
└────────────┬──────────────────────┬─────────────────────────────┘
             │ WS push              │ REST responses
             ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│  Browser Client                                                 │
│                                                                 │
│  ws.ts ──onmessage──► store.set("fleetState", data)             │
│          │                                                      │
│          ├──► header.ts: renderHealthBadge()                    │
│          │    (score + verdict color)                           │
│          │                                                      │
│          └──► health.ts (if active tab):                        │
│               ├── renderHealthGauge(health)                     │
│               ├── renderCostBurnGauge(cost)                     │
│               └── renderDoraCards(dora)                         │
│                                                                 │
│  api.ts (on tab open):                                          │
│    fetchHealthTrend()   ──► renderTrendSparkline(points)        │
│    fetchAnomalies()     ──► renderAnomalyAlerts(anomalies)      │
│    fetchStageProgress() ──► renderStageProgress(stages)         │
└─────────────────────────────────────────────────────────────────┘
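The lazy-fetched REST endpoints in the diagram are each cached server-side, with the TTLs given in the decision (60s for trend, 30s for anomalies and stages). A minimal sketch of such a cache follows. `createTtlCache` is a hypothetical helper, not an existing function in `server.ts`, and the injectable clock is only there so the expiry logic can be exercised without waiting.

```typescript
// Hypothetical helper: caches one computed value per key for ttlMs milliseconds.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: T; expiresAt: number }>();

  return function getOrCompute(key: string, compute: () => T): T {
    const hit = entries.get(key);
    if (hit !== undefined && hit.expiresAt > now()) return hit.value; // still fresh
    const value = compute(); // missing or expired: recompute and store
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// TTLs taken from the decision: 60s for trend, 30s for anomalies and stages.
const TREND_TTL_MS = 60_000;
const ANOMALY_TTL_MS = 30_000;
```

With this shape, repeated tab opens within the TTL window reuse the last computed response instead of re-reading progress files.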
// Pure function: no I/O, no subprocess
function computeHealthScore(
  events: DaemonEvent[],
  daemonState: DaemonState,
  costInfo: CostInfo
): HealthInfo;
// Errors: returns { score: 100, verdict: "green", ... } when no data (healthy idle)

// File I/O (reads progress snapshots), cached 60s
function getHealthTrend(days: number): { points: Array<{ ts: string; score: number; verdict: string }> };
// Errors: returns { points: [] } on missing/corrupt files

// Reads events, compares to baselines using EMA
function getAnomalies(events: DaemonEvent[]): { anomalies: AnomalyAlert[] };
// Errors: returns { anomalies: [] } on computation failure

// Reads active jobs + historical events
function getStageProgress(events: DaemonEvent[], pipelines: PipelineInfo[]): { stages: StageProgressInfo[] };
// Errors: returns { stages: [] } on empty input

export interface HealthInfo {
  score: number; // 0-100, clamped
  verdict: "green" | "yellow" | "red" | "critical";
  signals: {
    momentum: number; // 0-100
    convergence: number; // 0-100
    budget: number; // 0-100
    errorMaturity: number; // 0-100
  };
  activePipelines: number;
}
export interface AnomalyAlert {
  id: string;
  metric: string;
  value: number;
  baseline: number;
  severity: "warning" | "critical";
  rootCause: string;
  factors: string[];
  actions: string[];
  ts: string;
  issue?: number;
}
export interface StageProgressInfo {
  pipelineIssue: number;
  stage: string;
  currentDuration_s: number;
  avgDuration_s: number;
  count: number;
  status: "on-track" | "slow" | "fast";
}
// Extended (additive):
export interface FleetState {
  /* ... existing fields unchanged ... */
  health?: HealthInfo; // NEW, optional
}
export type TabId = /* existing 13 */ | "health";

export const healthView: View = {
  init(): void; // Fetches trend, anomalies, stages via REST
  render(state: FleetState): void; // Updates gauge, cost, DORA from WS data
  destroy(): void; // Removes click listeners on anomaly cards
};
// Pure render functions (DOM manipulation):
function renderHealthGauge(el: HTMLElement, health: HealthInfo): void;
function renderTrendSparkline(el: HTMLElement, points: TrendPoint[]): void;
function renderAnomalyAlerts(el: HTMLElement, anomalies: AnomalyAlert[]): void;
function renderCostBurnGauge(el: HTMLElement, cost: CostInfo): void;
function renderStageProgress(el: HTMLElement, stages: StageProgressInfo[]): void;
function renderDoraCards(el: HTMLElement, dora: DoraGrades): void;

export async function fetchHealthTrend(days?: number): Promise<{ points: TrendPoint[] }>;
export async function fetchAnomalies(): Promise<{ anomalies: AnomalyAlert[] }>;
export async function fetchStageProgress(): Promise<{ stages: StageProgressInfo[] }>;
// All follow existing pattern: catch → return default empty response

1. REAL-TIME (every 2s):
   server: getFleetState()
     → computeHealthScore(events, daemonState, costInfo)
     → FleetState.health = { score, verdict, signals, activePipelines }
     → broadcastToClients(fleetState)
     → ws.onmessage → store.set("fleetState")
       → header: renderHealthBadge(state.health)
       → health tab (if active): renderHealthGauge, renderCostBurnGauge, renderDoraCards

2. LAZY (on tab open, cached):
   health.init()
     → fetchHealthTrend(7) → GET /api/health/trend?days=7
       → server: getHealthTrend(7) reads ~/.shipwright/progress/issue-*.json
       → returns { points: [...] } → renderTrendSparkline()
     → fetchAnomalies() → GET /api/health/anomalies
       → server: getAnomalies(events) compares stage durations/failures vs EMA baselines
       → returns { anomalies: [...] } → renderAnomalyAlerts()
     → fetchStageProgress() → GET /api/health/stages
       → server: getStageProgress(events, pipelines) compares current vs avg
       → returns { stages: [...] } → renderStageProgress()

3. USER INTERACTION:
   click anomaly card → toggle .anomaly-drilldown visibility (local state)
   click health badge → switchTab("health")
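The anomaly step in the lazy flow compares current values against EMA baselines. One way this comparison might look is sketched below. Everything specific here is an assumption for illustration: the smoothing factor of 0.3, the 1.5x and 2x thresholds, and the helper names `emaBaseline` and `classifyAgainstBaseline` are not taken from the project.

```typescript
// Exponential moving average over historical samples (oldest first).
// alpha is the smoothing factor; 0.3 is an illustrative choice, not a project value.
function emaBaseline(samples: number[], alpha = 0.3): number | null {
  if (samples.length === 0) return null; // no baseline data
  let ema = samples[0];
  for (let i = 1; i < samples.length; i++) {
    ema = alpha * samples[i] + (1 - alpha) * ema;
  }
  return ema;
}

type Severity = "warning" | "critical";

// Hypothetical thresholds: 1.5x the baseline is a warning, 2x is critical.
function classifyAgainstBaseline(value: number, baseline: number | null): Severity | null {
  if (baseline === null || baseline <= 0) return null; // nothing to compare against
  const ratio = value / baseline;
  if (ratio >= 2) return "critical";
  if (ratio >= 1.5) return "warning";
  return null;
}
```

Returning `null` when there is no baseline matches the documented behavior of reporting an empty anomaly list rather than raising.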
| Component | Error Source | Handling |
|---|---|---|
| `computeHealthScore()` | Missing events/cost data | Returns healthy idle state (score=100, verdict="green") |
| `getHealthTrend()` | Missing/corrupt progress files | Returns `{ points: [] }`; sparkline shows empty state |
| `getAnomalies()` | No baseline data | Returns `{ anomalies: [] }`; alerts section shows "No anomalies" |
| `getStageProgress()` | No active pipelines | Returns `{ stages: [] }`; progress section shows empty state |
| REST fetch failures | Network/auth errors | API client catches, returns default response; view shows last known data |
| WebSocket disconnect | Network loss | Existing reconnect logic preserves last FleetState; health badge shows "—" |
| `healthView.init()` | Any throw | Caught by existing tab error boundary in `router.ts`; shows retry button |
| `healthView.render()` | Bad FleetState shape | Guard: `if (!state.health) return;` skips render, no crash |
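The "catch, then return a default response" handling for REST fetch failures can be sketched generically. The names `withDefault` and `fetchHealthTrendSketch` are hypothetical, and `getJson` is an injected stand-in for whatever HTTP helper the dashboard's API client actually uses, so the example does not depend on a global `fetch`.

```typescript
// Hypothetical wrapper: runs an async request and falls back to a default
// response on any rejection, so callers never see a thrown network error.
async function withDefault<T>(request: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await request();
  } catch {
    return fallback;
  }
}

interface TrendPoint {
  ts: string;
  score: number;
  verdict: string;
}

// `getJson` is an injected stand-in for the dashboard's real HTTP helper.
function fetchHealthTrendSketch(
  getJson: (url: string) => Promise<{ points: TrendPoint[] }>,
  days = 7,
): Promise<{ points: TrendPoint[] }> {
  return withDefault(() => getJson(`/api/health/trend?days=${days}`), { points: [] });
}
```

A view built on this wrapper only has to handle the empty case, never an exception from the network layer.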
- Scatter across existing tabs. Pros: no new tab, minimal new code. Cons: no unified health view, user must hop 3 tabs, doesn't meet acceptance criteria for a cohesive dashboard. Rejected because it fragments the monitoring experience.
- Header overlay/panel. Pros: always visible without tab switching. Cons: cramped header, complex overlay z-index management, hard to fit 5 widgets plus drill-down in a header panel. Rejected because it requires header restructuring with high blast radius.
- Shell out to `sw-pipeline-vitals.sh`. Pros: reuses existing bash logic. Cons: adds ~200ms subprocess per 2s broadcast, doesn't scale with 50 clients, bash output parsing is fragile. Rejected for performance: TypeScript computation is synchronous and fast.
- Put all data in FleetState. Pros: single data source. Cons: 7-day trend + anomaly details adds ~5KB per broadcast × 50 clients × every 2s = significant bandwidth waste. Rejected for payload bloat. Only the 200-byte summary goes in FleetState; heavy data is REST-fetched lazily.
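The bandwidth argument in the last alternative can be made concrete with a small calculation. The sketch below uses only the figures quoted above (about 5KB of heavy data versus the roughly 200-byte summary, 50 clients, one broadcast every 2 seconds) and takes 1KB as 1000 bytes; the function name is hypothetical.

```typescript
// Aggregate outbound bandwidth for a periodic broadcast, in bytes per second.
function broadcastBytesPerSecond(payloadBytes: number, clients: number, intervalSeconds: number): number {
  return (payloadBytes * clients) / intervalSeconds;
}

// Extra heavy data pushed to every client on every cycle.
const heavyBytesPerSecond = broadcastBytesPerSecond(5_000, 50, 2); // 125,000 B/s

// The small health summary that does go into FleetState.
const summaryBytesPerSecond = broadcastBytesPerSecond(200, 50, 2); // 5,000 B/s
```

Under these figures the heavy payload would cost about 25 times more sustained bandwidth than the summary, regardless of whether any client has the Health tab open.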
App
├── Header (existing)
│   ├── ConnectionDot (existing)
│   ├── CostTicker (existing)
│   └── HealthBadge (NEW)              ← state: fleetState.health
│       └── onClick → switchTab("health")
│
├── TabNav (existing, extended)
│   └── "Health" button (NEW)
│
└── Main
    └── HealthView (NEW, tab="health")
        ├── HealthScoreSection         ← state: fleetState.health (WS)
        │   ├── HealthGauge (SVG circle)
        │   └── TrendSparkline         ← local: healthTrend (REST, cached)
        │
        ├── AnomalyAlertsSection       ← local: anomalies (REST, cached)
        │   └── AlertCard[]            ← local: expandedAlertId (view state)
        │       └── DrillDown (conditional render)
        │
        ├── BottomRow
        │   ├── CostBurnGauge          ← state: fleetState.cost (WS)
        │   └── StageProgressList      ← local: stages (REST, cached)
        │       └── StageProgressBar[]
        │
        └── DoraCardsSection           ← state: fleetState.dora (WS)
            └── DoraCard × 4
State ownership: WS-driven data lives in the global store (health, cost, dora). REST-fetched data (trend, anomalies, stages) lives as module-level variables in health.ts, refreshed on init().
| Data | Source | Update Frequency | Storage |
|---|---|---|---|
| `health.score/verdict/signals` | WebSocket FleetState | Every 2s | `store.fleetState.health` |
| `cost` | WebSocket FleetState | Every 2s | `store.fleetState.cost` (existing) |
| `dora` | WebSocket FleetState | Every 2s | `store.fleetState.dora` (existing) |
| 7-day trend points | REST `/api/health/trend` | On tab open | Module-local in `health.ts` |
| Anomaly alerts | REST `/api/health/anomalies` | On tab open | Module-local in `health.ts` |
| Stage progress | REST `/api/health/stages` | On tab open | Module-local in `health.ts` |
| Expanded alert ID | User click | On interaction | Module-local in `health.ts` |
No new store keys needed — REST data is tab-scoped and discarded on destroy().
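The split between store-driven and tab-scoped state can be illustrated with a stripped-down view. This is a sketch under simplifying assumptions, not the planned `health.ts`: the types are minimal stand-ins, `createHealthView` is a hypothetical factory, `paint` replaces the DOM render functions, and `loadTrend` is synchronous here although the real fetch is asynchronous.

```typescript
// Minimal stand-ins for the dashboard types; the real ones live in dashboard/src/types/api.ts.
interface HealthInfo { score: number; verdict: string }
interface FleetState { health?: HealthInfo }
interface View {
  init(): void;
  render(state: FleetState): void;
  destroy(): void;
}

// Hypothetical factory: `paint` stands in for the DOM render functions and
// `loadTrend` for the REST fetch, so the lifecycle can be followed without a browser.
function createHealthView(
  paint: (health: HealthInfo, trend: number[]) => void,
  loadTrend: () => number[],
): View {
  let trend: number[] = []; // tab-scoped, REST-sourced data (never placed in the store)

  return {
    init() {
      trend = loadTrend(); // lazy load happens on tab open
    },
    render(state) {
      if (!state.health) return; // missing field: skip the render, do not crash
      paint(state.health, trend);
    },
    destroy() {
      trend = []; // discard tab-scoped data when the tab closes
    },
  };
}
```

Keeping the trend in a closure rather than the global store means other views never observe it, which is the property the statement about store keys relies on.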
- Health gauge: `aria-label="Pipeline health score: {score} out of 100, status {verdict}"`, not color-only
- Alert severity: text label + icon, not just colored border
- Anomaly drill-down: `<button>` wrapper on cards, `aria-expanded`, keyboard Enter/Space
- Focus management: visible focus ring (existing `--cyan` outline), logical tab order
- Color contrast: all text uses `--text-primary` on `--abyss` (passes 4.5:1 per existing design)
- Semantic HTML: `<section aria-label>`, `<h2>`, `<button>`, `<ul>`/`<li>` for alerts
- Live updates: `aria-live="polite"` on health score container for screen reader announcements
- Touch targets: all clickable elements min 44×44px
| Breakpoint | Layout |
|---|---|
| 320px (mobile) | Single column stack. Gauge shrinks to 120px. DORA cards 1×4 vertical. Alert cards full width. |
| 768px (tablet) | 2-column grid: gauge + trend side-by-side, cost + stages side-by-side. DORA cards 2×2. |
| 1024px (desktop) | Full designed layout. DORA cards 4×1 row. All sections visible without scroll. |
| 1440px (wide) | Wider cards with more sparkline data points. Gauge at full 200px. |
- `dashboard/src/views/health.ts`: Health tab view with 6 render functions
- `dashboard/src/views/health.test.ts`: Vitest unit tests (9 test cases)

- `dashboard/src/types/api.ts`: Add `HealthInfo`, `AnomalyAlert`, `StageProgressInfo`; extend `FleetState`; extend `TabId`
- `dashboard/server.ts`: Add `computeHealthScore()`, `getHealthTrend()`, `getAnomalies()`, `getStageProgress()`; extend `getFleetState()`; register 3 REST endpoints
- `dashboard/src/main.ts`: Import and `registerView("health", healthView)`
- `dashboard/src/core/api.ts`: Add `fetchHealthTrend()`, `fetchAnomalies()`, `fetchStageProgress()`
- `dashboard/public/index.html`: Add Health tab button + panel
- `dashboard/public/styles.css`: Health-specific CSS (~150 lines)
- `dashboard/src/components/header.ts`: Add `renderHealthBadge()` after connection dot
- `scripts/sw-dashboard-e2e-test.sh`: Add health endpoint + FleetState verification
- `scripts/sw-server-api-test.sh`: Add 3 endpoint tests

- None new. All data sources (`events.jsonl`, `costs.json`, `budget.json`, `progress/`) are already read by the server.

- `server.ts` size: Already 5800 lines. Adding ~150 lines of health computation is acceptable but approaching the threshold where extraction to a module would help. Monitor.
- `computeHealthScore()` in broadcast loop: Must remain synchronous and fast (<5ms). No file I/O, no subprocesses. The REST endpoints handle heavy lifting separately.
- CSS specificity: Prefix all new classes with `health-` to avoid collisions with existing 13 views' styles.
- FleetState deduplication: Adding `health` to FleetState means the score will change every 2s as pipelines progress, reducing deduplication effectiveness. Acceptable: the score is small and clients need the updates.

- `npm run build` compiles with zero errors (TypeScript strict mode)
- `npm test` passes all existing 102 test suites + new `health.test.ts`
- Health tab renders gauge at score=0, score=50, score=100 (boundary cases)
- Health tab renders correct verdict colors: green (>=75), yellow (>=50), red (>=25), critical (<25)
- Empty state (no pipelines): score=100, "No active pipelines" message, empty alerts/stages
- Anomaly drill-down toggles on Enter/Space key (keyboard accessible)
- FleetState WebSocket message includes `health` field with valid structure
- `/api/health/trend?days=7` returns `{ points: [...] }` array
- `/api/health/anomalies` returns `{ anomalies: [...] }` with required fields per item
- `/api/health/stages` returns `{ stages: [...] }` with status classification
- Header health badge updates every 2s and navigates to Health tab on click
- Layout responsive at 320px, 768px, 1024px, 1440px (no horizontal overflow)
- No regressions in existing Overview, Metrics, or Insights tab rendering