feat(dashboard): add Experiments tab backed by gh aw experiments list#42213
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
gh aw experiments list
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. All 6 changed files are dashboard source files (.mjs, package-lock.json, .ts, .ts, .js, .html). Test Quality Sentinel skipped. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #42213 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100). |
There was a problem hiding this comment.
Pull request overview
Adds an Experiments tab to the Agentic Workflows Dashboard canvas extension, surfacing experiment-branch summaries from gh aw experiments list --json through a new server route and UI pagination.
Changes:
- Added an
/api/experimentsendpoint in the extension server and alistExperimentscanvas action (paged). - Added an Experiments tab in the web UI, including fetch/pagination wiring and a date formatter helper.
- Extended the TypeScript dashboard model/mock app with an
ExperimentInfotype and mock experiment data.
Show a summary per file
| File | Description |
|---|---|
| .github/extensions/agentic-workflows-dashboard/web/index.html | Adds the Experiments tab UI and renders experiment rows + pagination controls. |
| .github/extensions/agentic-workflows-dashboard/web/app.js | Adds experiments fetch/load state, paging, and a date formatting helper used by the new tab. |
| .github/extensions/agentic-workflows-dashboard/src/models.ts | Introduces the ExperimentInfo TypeScript interface for experiment list items. |
| .github/extensions/agentic-workflows-dashboard/src/app.ts | Extends the TS mock dashboard state/tabs and adds a mock experiment data builder. |
| .github/extensions/agentic-workflows-dashboard/package-lock.json | Updates the lockfile (notably removing @github/copilot-sdk entries). |
| .github/extensions/agentic-workflows-dashboard/extension.mjs | Adds cached getExperiments(), /api/experiments route, and listExperiments action; updates system message docs. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Files not reviewed (1)
- .github/extensions/agentic-workflows-dashboard/package-lock.json: Generated file
- Files reviewed: 5/6 changed files
- Comments generated: 2
- Review effort level: Low
| function formatDate(iso) { | ||
| if (!iso) return "—"; | ||
| const d = new Date(iso); | ||
| return isNaN(d.getTime()) ? "—" : d.toLocaleString(); | ||
| } |
| <div class="text-small color-fg-muted">Variants</div> | ||
| <div class="text-bold" x-text="exp.experiments"></div> |
There was a problem hiding this comment.
🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 82.6 AIC · ⌖ 6.98 AIC · ⊞ 4.9K
| if (hit) return hit; | ||
| const raw = await runGhAw(["experiments", "list", "--json"]); | ||
| const data = JSON.parse(raw); | ||
| const experiments = Array.isArray(data) ? data : []; |
There was a problem hiding this comment.
When gh aw experiments list --json exits cleanly (exit code 0) but returns a non-array payload — e.g. an error object like {"error":"unknown subcommand"} — the Array.isArray(data) ? data : [] branch silently caches [] for 60 s and the UI shows "No experiments found" instead of surfacing the actual error.
Contrast with getDefinitions() (line 70), which caches the raw parsed value, and getRuns() (line 91), which does logsData.runs ?? [] (narrower key access).
Consider throwing when the shape is unexpected so the server's top-level catch (line 181) returns a 500 that fetchExperiments() in web/app.js will display in errorExperiments:
if (!Array.isArray(data)) {
throw new Error(data?.error ?? `Unexpected response: ${JSON.stringify(data).slice(0, 120)}`);
}@copilot please address this.
| { id: "definitions", label: "Workflows", counter: "definitions" }, | ||
| { id: "runs", label: "Runs", counter: "runs" }, | ||
| { id: "details", label: "Run details" }, | ||
| { id: "experiments", label: "Experiments", counter: "experiments" }, |
There was a problem hiding this comment.
Tab ordering nit: placing "Experiments" between "Run details" (a context panel for a selected run) and "Commands" breaks the data-view grouping. A more natural order would put the two top-level data views next to each other:
Workflows | Runs | Experiments | Run details | Commands
Non-blocking — happy to defer or keep the current order if it was intentional.
Review SummaryApplied 📌 Inline comments at a glance
The overall architecture is clean and the feature follows established patterns well. Issues 1 and 2 are the priority before merge. @copilot please address the review comments above.
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /grill-with-docs, and /improve-codebase-architecture — requesting changes on two correctness issues before merge.
📋 Key Themes & Highlights
Blocking Issues
- Unguarded
JSON.parseingetExperiments()— a CLI error or unavailable subcommand will crash the HTTP route rather than returning a graceful empty state. resp.json()beforeresp.okinfetchExperiments()— non-JSON error responses produce a confusing JSON parse error instead of the real HTTP error.
Non-blocking Issues
ExperimentInfo.experiments: number— tautological name; the UI already calls it "Variants", so the model should too.- snake_case fields in a TypeScript interface are unusual in this codebase and should be either documented or mapped to camelCase at the API boundary.
formatDate()is a pure multi-branch utility with no test coverage; Vitest is already configured.- Redundant ternary in the template —
formatDatealready handles falsy inputs.
Positive Highlights
- ✅ Clean end-to-end symmetry:
getExperiments()→/api/experiments→fetchExperiments()→ Alpine state — follows the exact same shape as definitions and runs. - ✅ 60-second cache reuse is correct — no new cache key needed per-page.
- ✅ Loading / error / empty states are all handled in the UI, with a helpful "Run
gh aw experiments list" empty-state hint. - ✅ Removing
@github/copilot-sdkfrom production dependencies is a meaningful cleanup. - ✅
formatDate'sInvalid Dateguard is exactly the right defensive approach.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 93.3 AIC · ⌖ 8.37 AIC · ⊞ 6.6K
Comment /matt to run again
| const hit = getCached("experiments"); | ||
| if (hit) return hit; | ||
| const raw = await runGhAw(["experiments", "list", "--json"]); | ||
| const data = JSON.parse(raw); |
There was a problem hiding this comment.
[/tdd] JSON.parse(raw) has no error handling — if gh aw experiments list --json is unavailable or returns non-JSON output, this throws uncaught and crashes the /api/experiments route.
💡 Suggested fix
async function getExperiments() {
const hit = getCached("experiments");
if (hit) return hit;
try {
const raw = await runGhAw(["experiments", "list", "--json"]);
const data = JSON.parse(raw);
const experiments = Array.isArray(data) ? data : [];
setCached("experiments", experiments);
return experiments;
} catch (err) {
console.error("getExperiments failed:", err.message);
return [];
}
}The same gap exists in the existing getDefinitions()/getRuns(), but those are pre-existing. A try/catch here (returning [] on failure) keeps the route alive and lets the UI show its error state.
@copilot please address this.
| this.errorExperiments = ""; | ||
| try { | ||
| const resp = await fetch("/api/experiments"); | ||
| const data = await resp.json(); |
There was a problem hiding this comment.
[/tdd] resp.json() is called before resp.ok is checked — if the server returns a non-JSON error response (e.g. a plain-text 500), the call throws a JSON parse error and the actual HTTP error is swallowed.
💡 Suggested pattern
async fetchExperiments() {
this.loadingExperiments = true;
this.errorExperiments = "";
try {
const resp = await fetch("/api/experiments");
if (!resp.ok) {
const text = await resp.text();
throw new Error(text || `HTTP ${resp.status}`);
}
const data = await resp.json();
this.experiments = Array.isArray(data) ? data : [];
this.loadExperimentPage(1);
} catch (e) {
this.errorExperiments = `Failed to load experiments: ${e.message}`;
} finally {
this.loadingExperiments = false;
}
},fetchDefinitions and fetchRuns have the same ordering issue; fixing it here improves the pattern for future consistency.
@copilot please address this.
| export interface ExperimentInfo { | ||
| workflow_id: string; | ||
| branch: string; | ||
| experiments: number; |
There was a problem hiding this comment.
[/grill-with-docs] experiments: number is tautologically named — the field is on ExperimentInfo and the word experiments doesn't communicate what it counts. The HTML renders it under a "Variants" heading, suggesting the domain term is already established.
💡 Suggested rename
export interface ExperimentInfo {
workflow_id: string;
branch: string;
variants: number; // was: experiments (count of A/B variants)
total_runs: number;
last_run: string;
}Update the matching field in web/app.js/web/index.html (exp.experiments → exp.variants) and the buildExperiments mock in src/app.ts. If the Go CLI emits experiments, map it at the API boundary rather than carrying the ambiguous name into the TypeScript model.
@copilot please address this.
| hasPreviousPage: boolean; | ||
| } | ||
|
|
||
| export interface ExperimentInfo { |
There was a problem hiding this comment.
[/grill-with-docs] All fields in ExperimentInfo use snake_case (workflow_id, total_runs, last_run) while TypeScript convention and the existing codebase prefer camelCase. This appears to be a direct mapping from the Go CLI JSON output, but there is no comment explaining the deliberate choice.
💡 Two options
Option A — add a doc comment (minimal change):
/** Direct JSON mapping from `gh aw experiments list --json` — snake_case intentional. */
export interface ExperimentInfo {
workflow_id: string;
...
}Option B — camelCase in the model, map at the API boundary in extension.mjs:
export interface ExperimentInfo {
workflowId: string;
branch: string;
variants: number;
totalRuns: number;
lastRun: string;
}Option A is lower risk for this PR; Option B aligns with the rest of the codebase conventions.
@copilot please address this.
| tabs: dashboardTabs, | ||
| activeTab: "definitions", | ||
| pageSize: 20, | ||
| definitions: [], |
There was a problem hiding this comment.
[/tdd] formatDate() has two distinct branches (falsy guard, Invalid Date guard) but no unit tests. It's a pure function that's easy to test, and incorrect date handling would silently show stale or confusing data to users.
💡 Suggested tests (using existing Vitest setup)
// formatDate.test.js (or add to an existing test file)
import { describe, it, expect } from 'vitest';
import { formatDate } from './app.js'; // or extract and export
describe('formatDate', () => {
it('returns em-dash for null', () => expect(formatDate(null)).toBe('—'));
it('returns em-dash for undefined', () => expect(formatDate(undefined)).toBe('—'));
it('returns em-dash for empty string', () => expect(formatDate('')).toBe('—'));
it('returns em-dash for invalid date string', () => expect(formatDate('not-a-date')).toBe('—'));
it('returns a non-empty string for valid ISO date', () => {
expect(formatDate('2024-01-01T00:00:00Z')).not.toBe('—');
});
});@copilot please address this.
| <div> | ||
| <div class="text-bold" x-text="exp.workflow_id"></div> | ||
| <div class="color-fg-muted text-small mt-1">Branch: <code x-text="exp.branch"></code></div> | ||
| <div class="color-fg-muted text-small">Last run: <span x-text="exp.last_run ? formatDate(exp.last_run) : '—'"></span></div> |
There was a problem hiding this comment.
[/improve-codebase-architecture] The ternary exp.last_run ? formatDate(exp.last_run) : '—' is redundant — formatDate() already returns '—' for falsy inputs. Keeping this in the template obscures the contract of formatDate for future readers.
Simplify to:
<span x-text="formatDate(exp.last_run)"></span>@copilot please address this.
There was a problem hiding this comment.
No blocking issues found.
The HTTP route handler wraps all async calls in a top-level try/catch that sends {error: ...} JSON at 500, so CLI failures (missing subcommand, non-JSON output) are gracefully surfaced to the client. The cache has a 60 s TTL and a /api/refresh path. Client-side error handling in fetchExperiments() is consistent with fetchDefinitions()/fetchRuns().
Minor nit (non-blocking): The template guard exp.last_run ? formatDate(exp.last_run) : '—' in index.html is redundant — formatDate already returns '—' for falsy input.
🔎 Code quality review by PR Code Quality Reviewer · 99.2 AIC · ⌖ 7.03 AIC · ⊞ 5.2K
Comment /review to run again
|
🎉 This pull request is included in a new release. Release: |
Adds an Experiments tab to the
agentic-workflows-dashboardcanvas, exposing A/B experiment branch data fromgh aw experiments list --json.Extension server (
extension.mjs)getExperiments()— runsgh aw experiments list --json, 60s cached/api/experimentsHTTP routelistExperimentscanvas action (paged, same contract aslistDefinitions/listRuns)Web UI (
web/app.js,web/index.html)fetchExperiments()+loadExperimentPage()with loading/error stateformatDate()helper (guardsInvalid Date)TypeScript source (
src/models.ts,src/app.ts)ExperimentInfointerface matching Go CLI JSON schema (workflow_id,branch,experiments,total_runs,last_run)DashboardTabIdand counter union types; mock data builder for local dev