Skip to content

Prepare SPA dashboard release path#44

Merged
WellDunDun merged 16 commits intodevfrom
WellDunDun/winnipeg-v5
Mar 14, 2026
Merged

Prepare SPA dashboard release path#44
WellDunDun merged 16 commits intodevfrom
WellDunDun/winnipeg-v5

Conversation

@WellDunDun
Copy link
Collaborator

This packages the current release-prep work onto one branch. It makes the SPA/dashboard path coherent by sharing the v2 dashboard contract, adds a root dev workflow, and updates tests for SPA-default serving. It also switches orchestrate to autonomous-by-default description evolution with explicit review-required mode instead of mandatory approval. Finally, it promotes the current product/reset, SQLite, MCP, and grader-eval plans into tracked docs under docs/exec-plans/active/.

@coderabbitai
Copy link

coderabbitai bot commented Mar 14, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Local SPA dashboard server and dev/start flows (includes --no-open and conditional dev scripts).
    • Orchestrate-centric autonomous scheduling with install/dry-run support.
  • Improvements

    • Dashboard migrated to SPA-at-root with v2 API and centralized payload contracts.
    • Orchestrator: approval modes (auto/review), candidate prioritization, and safer auto-rollback.
    • CI/publish updated to build the dashboard SPA.
  • Documentation

    • Updated docs, workflows, and guides for dashboard, orchestrate, and scheduling.

Walkthrough

Consolidates dashboard payload types into a shared contract; removes the embedded static HTML dashboard and replaces it with a SPA served by a dashboard server exposing v2 APIs and optional loaders; refactors orchestrate/schedule to use approvalMode ("auto" | "review") and adds schedule install helpers and CI dashboard build steps.

Changes

Cohort / File(s) Summary
Dashboard contracts & types
cli/selftune/dashboard-contract.ts, apps/local-dashboard/src/types.ts
Add centralized dashboard payload contract; move many local interfaces to a shared module and re-export them; add UI-only types (SkillHealthStatus, SkillCard).
Dashboard server & CLI
cli/selftune/dashboard-server.ts, cli/selftune/dashboard.ts, apps/local-dashboard/HANDOFF.md
Remove embedded-HTML export flow; server now serves SPA from spaDir, exposes /api/health, /api/v2/overview, /api/v2/skills/:name, optional overviewLoader/skillReportLoader hooks; CLI simplified to server startup with --port/--no-open.
Static artifact removal
dashboard/index.html
Delete monolithic static HTML/JS/CSS dashboard (migrated to SPA).
Local dev scripts & packaging
package.json, apps/local-dashboard/package.json
Add root dev and dev:dashboard scripts; change app dev script to include --no-open; remove dashboard/ from package files.
Local DB / queries
cli/selftune/localdb/queries.ts
Move payload interfaces to contract; make safeParseJsonArray generic; use typed parsing for skills/evidence fields.
Orchestrator & scheduling
cli/selftune/orchestrate.ts, cli/selftune/schedule.ts, cli/selftune/cron/setup.ts, tests
Replace booleans (dryRun, autoApprove) with `approvalMode: "auto"
Tests: dashboard & badge routes
tests/dashboard/*, tests/dashboard/dashboard-server.test.ts, tests/dashboard/badge-routes.test.ts
Refactor tests to use new contract types and loader fixtures; add temporary SPA dir fixtures; switch server host to 127.0.0.1; update expectations to v2 endpoints and new payload shapes.
Tests: CLI & sandbox
tests/dashboard/dashboard.test.ts, tests/sandbox/run-sandbox.ts
Remove HTML-export tests; add CLI smoke tests that start server and poll / and /api/v2/overview; add runDashboardSmokeTest.
Docs & exec plans
docs/**, skill/**, ARCHITECTURE.md, README.md, ROADMAP.md
Document SPA-as-default, update CLI usage to --port/--no-open, add execution plans (SQLite materialization, grader), update escalation/tech-debt notes and workflows to orchestrate-centric model.
CI / publish
.github/workflows/ci.yml, .github/workflows/publish.yml
Add dashboard SPA build step/job running bun run build:dashboard in CI and publish workflows.
New public APIs / exports
cli/selftune/dashboard-contract.ts, cli/selftune/schedule.ts
Introduce typed contract interfaces (OverviewResponse, SkillReportResponse, etc.) and schedule install exports (ScheduleInstallArtifact, ScheduleInstallResult, selectInstallFormat, buildInstallPlan, installSchedule, applyCronArtifact).

Sequence Diagram(s)

sequenceDiagram
    participant Browser as Browser (SPA)
    participant Server as Dashboard Server
    participant Loader as overviewLoader / skillReportLoader
    participant DB as Optional DB/materialization
    Note over Browser,Server: SPA requests data on demand
    Browser->>Server: GET / (SPA shell)
    Server-->>Browser: 200 HTML/JS assets (from spaDir)
    Browser->>Server: GET /api/v2/overview
    Server->>Loader: overviewLoader()? or DB query
    alt Loader present
        Loader-->>Server: OverviewResponse
    else DB/materialized
        DB-->>Server: OverviewResponse
    end
    Server-->>Browser: 200 JSON (OverviewResponse)
    Browser->>Server: GET /api/v2/skills/:name
    Server->>Loader: skillReportLoader(skillName)? or DB query
    alt Loader returns report
        Loader-->>Server: SkillReportResponse
        Server-->>Browser: 200 JSON
    else no report
        Server-->>Browser: 404
    end
Loading
sequenceDiagram
    participant Scheduler as Scheduler (cron/launchd/systemd)
    participant CLI as selftune orchestrate
    participant Sync as Sync step
    participant Select as Candidate selector
    participant Evolve as Evolve
    participant Watch as Watch
    Note over Scheduler,CLI: Orchestrate runs periodically or manually
    Scheduler->>CLI: run orchestrate (--approvalMode)
    CLI->>Sync: sync --sync-first
    Sync-->>CLI: synced data
    CLI->>Select: selectCandidates()
    Select-->>CLI: ordered candidates
    CLI->>Evolve: evolve(candidates, dryRun = approvalMode=="review")
    alt approvalMode == "review"
        Evolve-->>CLI: dry-run (no commit)
    else approvalMode == "auto"
        Evolve-->>CLI: applied changes
    end
    CLI->>Watch: watch(evolved, autoRollback=true)
    Watch-->>CLI: monitor results / rollback if needed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning Title does not follow conventional commits format (feat|fix|docs|chore|refactor|test|perf|ci); uses generic phrasing instead of required prefix. Rewrite title using conventional commits format, e.g., 'feat: prepare SPA dashboard release path' or 'chore: consolidate SPA dashboard and release-prep work'.
Docstring Coverage ⚠️ Warning Docstring coverage is 34.15% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed Description clearly relates to the changeset, detailing SPA consolidation, contract sharing, dev workflow, orchestrate changes, and promotion of exec plans.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch WellDunDun/winnipeg-v5
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/dashboard/dashboard-server.test.ts (2)

273-288: ⚠️ Potential issue | 🟡 Minor

Assert rollback failure behavior explicitly.

This test currently validates only response shape. Given the stubbed rollback path is a failure, assert success === false and the error payload so regressions in error handling are caught.

Proposed fix
     expect(res.status).toBe(200);
     const data = await res.json();
-    expect(data).toHaveProperty("success");
-    expect(typeof data.success).toBe("boolean");
+    expect(data.success).toBe(false);
+    expect(data.error).toContain("rollback blocked in test");

As per coding guidelines: tests/**/*.ts should include coverage of error paths, not just happy paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dashboard/dashboard-server.test.ts` around lines 273 - 288, The test
"returns JSON response with proposalId validation" currently only checks
response shape; update it to assert the rollback failure explicitly by verifying
that the parsed JSON (variable data) has data.success === false and that an
error payload exists (e.g., data.error or data.message) with the expected
type/string content; locate the test by its description and the call to
fetch(`/api/actions/rollback`) and add assertions for data.success === false and
for the presence and type/value of the error field to ensure failure paths are
covered.

367-406: ⚠️ Potential issue | 🟠 Major

Guarantee server teardown with try/finally in lifecycle tests.

Line [385] and Line [405] call stop() only on the happy path. If an assertion fails first, the server can leak and destabilize subsequent tests.

Proposed fix
   it("can start and stop cleanly", async () => {
     const s = await startDashboardServer({
       port: 0,
       host: "127.0.0.1",
       openBrowser: false,
       dataLoader: () => fakeData,
       statusLoader: () => ({
         skills: [],
         unmatchedQueries: 0,
         pendingProposals: 0,
         lastSession: null,
         system: { healthy: true, pass: 0, fail: 0, warn: 0 },
       }),
     });
-    expect(s).toHaveProperty("stop");
-    expect(s).toHaveProperty("port");
-    expect(typeof s.port).toBe("number");
-    expect(s.port).toBeGreaterThan(0);
-    s.stop();
+    try {
+      expect(s).toHaveProperty("stop");
+      expect(s).toHaveProperty("port");
+      expect(typeof s.port).toBe("number");
+      expect(s.port).toBeGreaterThan(0);
+    } finally {
+      s.stop();
+    }
   }, 30000);

   it("exposes port after binding", async () => {
     const s = await startDashboardServer({
       port: 0,
       host: "127.0.0.1",
       openBrowser: false,
       dataLoader: () => fakeData,
       statusLoader: () => ({
         skills: [],
         unmatchedQueries: 0,
         pendingProposals: 0,
         lastSession: null,
         system: { healthy: true, pass: 0, fail: 0, warn: 0 },
       }),
     });
-    const res = await fetch(`http://127.0.0.1:${s.port}/api/data`);
-    expect(res.status).toBe(200);
-    s.stop();
+    try {
+      const res = await fetch(`http://127.0.0.1:${s.port}/api/data`);
+      expect(res.status).toBe(200);
+    } finally {
+      s.stop();
+    }
   }, 15000);

As per coding guidelines: tests/**/*.ts should use isolated tests with proper setup/teardown.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dashboard/dashboard-server.test.ts` around lines 367 - 406, The two
tests that call startDashboardServer (the "can start and stop cleanly" and
"exposes port after binding" tests) currently call s.stop() only on the happy
path; change each to ensure teardown by wrapping the server usage in a
try/finally so s.stop() is always invoked (or check s and call s.stop() in
finally). Locate the tests that call startDashboardServer and replace the direct
s.stop() at the end with a try { ... assertions and any fetch calls } finally {
if (s && typeof s.stop === "function") s.stop(); } to guarantee no server leak
across tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/exec-plans/active/product-reset-and-shipping.md`:
- Around line 137-173: Remove the duplicated "Current Recommendations" heading
near the end of the section (the second occurrence shown in the diff) so the
document has a single top-level "Current Recommendations" heading; locate the
duplicate string "## Current Recommendations" and delete that redundant heading
(or merge any unique content beneath it into the original section) so
markdownlint MD024 is resolved.

In `@package.json`:
- Line 53: The dev npm script currently reuses any process listening on port
7888 based solely on lsof; update the "dev" script to validate that the listener
is the dashboard server before reusing the port by performing an HTTP health
probe (e.g., curl or wget) against the dashboard health endpoint (such as
/health or /api/health) after detecting a listener; only proceed to "Using
existing dashboard server on 7888" if the probe returns a successful response
(200), otherwise print a clear error and exit so the script does not incorrectly
proxy to a different service. Target the "dev" script entry in package.json and
replace the simple lsof-only branch with a two-step check: lsof -> health HTTP
probe -> reuse or fail fast with a descriptive message.

---

Outside diff comments:
In `@tests/dashboard/dashboard-server.test.ts`:
- Around line 273-288: The test "returns JSON response with proposalId
validation" currently only checks response shape; update it to assert the
rollback failure explicitly by verifying that the parsed JSON (variable data)
has data.success === false and that an error payload exists (e.g., data.error or
data.message) with the expected type/string content; locate the test by its
description and the call to fetch(`/api/actions/rollback`) and add assertions
for data.success === false and for the presence and type/value of the error
field to ensure failure paths are covered.
- Around line 367-406: The two tests that call startDashboardServer (the "can
start and stop cleanly" and "exposes port after binding" tests) currently call
s.stop() only on the happy path; change each to ensure teardown by wrapping the
server usage in a try/finally so s.stop() is always invoked (or check s and call
s.stop() in finally). Locate the tests that call startDashboardServer and
replace the direct s.stop() at the end with a try { ... assertions and any fetch
calls } finally { if (s && typeof s.stop === "function") s.stop(); } to
guarantee no server leak across tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2ed601f1-bacc-4706-96ae-54fa9d21d039

📥 Commits

Reviewing files that changed from the base of the PR and between aec5f4f and 94c8f67.

📒 Files selected for processing (13)
  • apps/local-dashboard/HANDOFF.md
  • apps/local-dashboard/src/types.ts
  • cli/selftune/dashboard-contract.ts
  • cli/selftune/localdb/queries.ts
  • cli/selftune/orchestrate.ts
  • docs/exec-plans/active/grader-prompt-evals.md
  • docs/exec-plans/active/local-sqlite-materialization.md
  • docs/exec-plans/active/mcp-tool-descriptions.md
  • docs/exec-plans/active/product-reset-and-shipping.md
  • package.json
  • skill/Workflows/Dashboard.md
  • tests/dashboard/dashboard-server.test.ts
  • tests/orchestrate.test.ts

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
skill/SKILL.md (1)

49-60: ⚠️ Potential issue | 🟡 Minor

Remove duplicate selftune dashboard quick-reference entry.

Line [49] and Line [60] repeat the same command, which can cause documentation drift.

Proposed cleanup
 selftune cron list
 selftune cron remove [--dry-run]
-selftune dashboard [--port <port>] [--no-open]
 selftune evolve-body --skill <name> --skill-path <path> --target <routing_table|full_body> [--dry-run]
As per coding guidelines: `skill/**/*.md` should have clear step-by-step instructions and no ambiguous references.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/SKILL.md` around lines 49 - 60, Remove the duplicate quick-reference
entry for the command "selftune dashboard" in SKILL.md: locate both occurrences
of the "selftune dashboard [--port <port>] [--no-open]" lines (the repeated
entry at the top and the one near the end) and delete the duplicate so the
command appears only once in the quick-reference section, ensuring the remaining
list preserves original ordering and spacing.
tests/sandbox/run-sandbox.ts (1)

134-135: ⚠️ Potential issue | 🟡 Minor

Duplicate import of readdirSync.

readdirSync is already imported at line 13 from "node:fs". This import statement is redundant.

🛠️ Proposed fix
-// Need readdirSync for transcripts
-import { readdirSync } from "node:fs";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sandbox/run-sandbox.ts` around lines 134 - 135, Remove the redundant
import of readdirSync (the duplicate import statement that imports from
"node:fs"); keep the original import at the top (the existing readdirSync import
used throughout) and delete the later duplicate import so there is only one
import of readdirSync in the file (ensure usages of readdirSync remain
unchanged, e.g., any calls in run-sandbox.ts).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/dashboard-server.ts`:
- Around line 703-713: The decodeURIComponent call that derives skillName can
throw a URIError for malformed percent-encoding; wrap the decode step in a
try/catch around
decodeURIComponent(url.pathname.slice("/api/v2/skills/".length)) and if a
URIError (or any decoding error) is thrown return a Response.json({ error:
"Malformed skill name" }, { status: 400, headers: corsHeaders() }) instead of
letting it bubble up; keep the existing flow that calls
getSkillReportResponse(skillName) when decoding succeeds.

In `@docs/escalation-policy.md`:
- Around line 55-56: The escalation trigger list uses a short path
"dashboard-contract.ts" on one line but a fully-qualified repo path
"cli/selftune/localdb/queries.ts" on another; update the shorter entry to the
full repository path (e.g., replace "dashboard-contract.ts" with its full path
such as "cli/<actual-subdir>/dashboard-contract.ts" or whatever the canonical
repo path is) so both entries use fully-qualified paths for consistency and
unambiguous ownership.

In `@tests/dashboard/dashboard-server.test.ts`:
- Around line 271-279: The test fails because the server doesn't have SPA assets
to serve the fallback; before calling getServer() in the "returns SPA fallback
for client-side routes" test, create a minimal SPA index HTML at the server's
static assets directory (the same path the server uses to serve the SPA shell)
so the fallback route can return 200 and the '<div id="root"></div>' marker;
alternatively update the test to invoke getServer({ useMockAssets: true }) or
similar initialization flag if available, or modify getServer to create a
temporary index.html in the static/public build directory when assets are
missing so the route handler that serves the SPA shell returns the expected
HTML.
- Line 99: The variable serverPromise is typed with a Promise that resolves to
an object containing server: unknown; replace unknown with the concrete server
type returned by startDashboardServer to preserve type information — either
import/use the specific interface/type (e.g., DashboardServer) or derive it via
ReturnType<typeof startDashboardServer> (and wrap with Awaited<> if needed) and
update the declaration of serverPromise to Promise<{ server: ThatType; stop: ()
=> void; port: number }> | null so the test gets proper IDE/type-safety for the
server instance.
- Around line 155-168: Tests expect the dashboard SPA build but CI doesn't build
it, causing startDashboardServer() to return 503; fix by either adding a CI
build step (run the existing build:dashboard script before bun test) or make the
tests resilient by mocking/conditionalizing the SPA layer: in
tests/dashboard/dashboard-server.test.ts replace direct
readRootHtml()/startDashboardServer() file reads with a mock that returns a
minimal HTML shell (containing '<div id="root"></div>' and '/assets/') when
apps/local-dashboard/dist/index.html is missing, or skip those GET / assertions
if the artifact is absent; update getServer()/readRootHtml() usage accordingly
so tests pass without requiring a prior build.

---

Outside diff comments:
In `@skill/SKILL.md`:
- Around line 49-60: Remove the duplicate quick-reference entry for the command
"selftune dashboard" in SKILL.md: locate both occurrences of the "selftune
dashboard [--port <port>] [--no-open]" lines (the repeated entry at the top and
the one near the end) and delete the duplicate so the command appears only once
in the quick-reference section, ensuring the remaining list preserves original
ordering and spacing.

In `@tests/sandbox/run-sandbox.ts`:
- Around line 134-135: Remove the redundant import of readdirSync (the duplicate
import statement that imports from "node:fs"); keep the original import at the
top (the existing readdirSync import used throughout) and delete the later
duplicate import so there is only one import of readdirSync in the file (ensure
usages of readdirSync remain unchanged, e.g., any calls in run-sandbox.ts).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ee809e3b-15f2-4195-b559-7b446a709314

📥 Commits

Reviewing files that changed from the base of the PR and between 94c8f67 and 273bd39.

📒 Files selected for processing (19)
  • ARCHITECTURE.md
  • CHANGELOG.md
  • README.md
  • ROADMAP.md
  • apps/local-dashboard/HANDOFF.md
  • apps/local-dashboard/package.json
  • cli/selftune/dashboard-server.ts
  • cli/selftune/dashboard.ts
  • dashboard/index.html
  • docs/design-docs/sandbox-claude-code.md
  • docs/design-docs/sandbox-test-harness.md
  • docs/escalation-policy.md
  • docs/exec-plans/tech-debt-tracker.md
  • package.json
  • skill/SKILL.md
  • skill/Workflows/Dashboard.md
  • tests/dashboard/dashboard-server.test.ts
  • tests/dashboard/dashboard.test.ts
  • tests/sandbox/run-sandbox.ts
💤 Files with no reviewable changes (1)
  • dashboard/index.html

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
docs/exec-plans/active/product-reset-and-shipping.md (1)

175-175: ⚠️ Potential issue | 🟡 Minor

Remove duplicate Current Recommendations heading (MD024).

Line [175] repeats an H2 already used at Line [139], which trips markdownlint no-duplicate-heading.

Proposed fix
-## Current Recommendations
-
 ### 1. Make the SPA the real default dashboard path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/exec-plans/active/product-reset-and-shipping.md` at line 175, Remove the
duplicate H2 "Current Recommendations" that appears twice in this document by
deleting the second occurrence (the repetition at the later location) so only
one "Current Recommendations" H2 remains; locate the duplicate heading text
"Current Recommendations" and remove that redundant line to satisfy markdownlint
MD024.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 28-30: Add bun dependency caching around the existing "uses:
oven-sh/setup-bun" and "run: bun install" steps: add an actions/cache@v3
restore/save step keyed on bun.lockb (e.g. key: ${{ runner.os }}-bun-${{
hashFiles('bun.lockb') }}) and cache the bun cache directory (e.g. ~/.bun or
$HOME/.bun) so subsequent runs reuse installed packages; verify the exact cache
path and any built-in caching options in oven-sh/setup-bun docs and place the
restore step before "run: bun install" and the save step after it (or use a
single actions/cache entry that handles both restore and save).

In `@docs/exec-plans/active/telemetry-field-map.md`:
- Line 5: The document telemetry-field-map.md has a mismatched status label
"Status: Reference" while it resides in the active exec-plans collection; either
move the document to the reference/archive collection or update the in-file
status line to "Status: Active" so the label matches its location—edit the file
telemetry-field-map.md and either relocate it to the reference collection or
change the status header from "Reference" to "Active".
- Line 3: Update the verification comment line that currently reads "<!--
Verified: 2026-03-10 -->" to reflect the current review date by changing it to
"<!-- Verified: 2026-03-14 -->" so the document's verification timestamp is
accurate.

---

Duplicate comments:
In `@docs/exec-plans/active/product-reset-and-shipping.md`:
- Line 175: Remove the duplicate H2 "Current Recommendations" that appears twice
in this document by deleting the second occurrence (the repetition at the later
location) so only one "Current Recommendations" H2 remains; locate the duplicate
heading text "Current Recommendations" and remove that redundant line to satisfy
markdownlint MD024.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d138f785-dfa5-4fb5-b727-9ce4e7c0237b

📥 Commits

Reviewing files that changed from the base of the PR and between 273bd39 and bee281c.

📒 Files selected for processing (9)
  • .github/workflows/ci.yml
  • .github/workflows/publish.yml
  • docs/exec-plans/active/grader-prompt-evals.md
  • docs/exec-plans/active/local-sqlite-materialization.md
  • docs/exec-plans/active/mcp-tool-descriptions.md
  • docs/exec-plans/active/multi-agent-sandbox.md
  • docs/exec-plans/active/product-reset-and-shipping.md
  • docs/exec-plans/active/telemetry-field-map.md
  • docs/exec-plans/completed/dashboard-spa-cutover.md

Comment on lines +28 to +30
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2
- run: bun install
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider caching bun dependencies.

While the existing jobs also lack caching, adding bun cache would speed up all jobs. This is optional but beneficial for larger dependency sets.

♻️ Optional: Add bun caching
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
       - uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2
+        with:
+          bun-version: latest
       - run: bun install
+      - uses: actions/cache@v4
+        with:
+          path: ~/.bun/install/cache
+          key: ${{ runner.os }}-bun-${{ hashFiles('**/bun.lockb') }}
+          restore-keys: |
+            ${{ runner.os }}-bun-
       - run: bun run build:dashboard

Note: oven-sh/setup-bun may have built-in caching options — verify current action docs.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2
- run: bun install
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2
with:
bun-version: latest
- run: bun install
- uses: actions/cache@v4
with:
path: ~/.bun/install/cache
key: ${{ runner.os }}-bun-${{ hashFiles('**/bun.lockb') }}
restore-keys: |
${{ runner.os }}-bun-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 28 - 30, Add bun dependency caching
around the existing "uses: oven-sh/setup-bun" and "run: bun install" steps: add
an actions/cache@v3 restore/save step keyed on bun.lockb (e.g. key: ${{
runner.os }}-bun-${{ hashFiles('bun.lockb') }}) and cache the bun cache
directory (e.g. ~/.bun or $HOME/.bun) so subsequent runs reuse installed
packages; verify the exact cache path and any built-in caching options in
oven-sh/setup-bun docs and place the restore step before "run: bun install" and
the save step after it (or use a single actions/cache entry that handles both
restore and save).

@@ -2,7 +2,7 @@

<!-- Verified: 2026-03-10 -->
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Update the verification date.

The verification date is 2026-03-10, but this document is being modified on 2026-03-14. Update the date to reflect this review.

📅 Proposed update
-<!-- Verified: 2026-03-10 -->
+<!-- Verified: 2026-03-14 -->
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<!-- Verified: 2026-03-10 -->
<!-- Verified: 2026-03-14 -->
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/exec-plans/active/telemetry-field-map.md` at line 3, Update the
verification comment line that currently reads "<!-- Verified: 2026-03-10 -->"
to reflect the current review date by changing it to "<!-- Verified: 2026-03-14
-->" so the document's verification timestamp is accurate.

@@ -2,7 +2,7 @@

<!-- Verified: 2026-03-10 -->

**Status:** Active
**Status:** Reference
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Status "Reference" contradicts the active/ directory path.

The status label now reads "Reference", but the file remains in docs/exec-plans/active/. Either move the file to a reference/ or archive/ directory, or keep the status as "Active" if this document is still actively maintained.

📂 Suggested path correction

Move the file to reflect its reference status:

# Rename/move the file
git mv docs/exec-plans/active/telemetry-field-map.md docs/exec-plans/reference/telemetry-field-map.md

Or revert the status label if the document should remain active.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/exec-plans/active/telemetry-field-map.md` at line 5, The document
telemetry-field-map.md has a mismatched status label "Status: Reference" while
it resides in the active exec-plans collection; either move the document to the
reference/archive collection or update the in-file status line to "Status:
Active" so the label matches its location—edit the file telemetry-field-map.md
and either relocate it to the reference collection or change the status header
from "Reference" to "Active".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/dashboard-server.ts`:
- Around line 425-432: The code treats any provided spaDir (from options?.spaDir
or findSpaDir()) as valid; instead, verify the directory actually contains
index.html before announcing/using it. Modify the spaDir detection logic (around
the spaDir variable assignment and the console.log/console.warn block) to check
for existence/readability of path.join(spaDir, "index.html") and if the file is
missing or inaccessible, log the warning and set spaDir to undefined so the
server will use the intended 503 fallback for "/" rather than attempting to
serve a broken SPA.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d4c22e6a-651f-4e58-aedd-6a37dd907673

📥 Commits

Reviewing files that changed from the base of the PR and between 23e5380 and 789ebef.

📒 Files selected for processing (8)
  • cli/selftune/dashboard-server.ts
  • docs/escalation-policy.md
  • docs/exec-plans/active/product-reset-and-shipping.md
  • docs/exec-plans/active/telemetry-normalization.md
  • docs/exec-plans/reference/telemetry-field-map.md
  • package.json
  • tests/dashboard/badge-routes.test.ts
  • tests/dashboard/dashboard-server.test.ts

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
skill/Workflows/Initialize.md (1)

19-25: ⚠️ Potential issue | 🟡 Minor

Document the real --agent values.

Lines 45-51 of cli/selftune/init.ts accept claude_code, codex, opencode, openclaw, and unknown. This table still advertises claude and omits openclaw, so selftune init --agent claude does not actually honor the override.

As per coding guidelines, skill workflow documents should have no ambiguous references.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Initialize.md` around lines 19 - 25, The README table
documents the wrong/ambiguous --agent values; update the Flag table in
Initialize.md to list the actual accepted values used by cli/selftune/init.ts
(claude_code, codex, opencode, openclaw, unknown) instead of "claude" and ensure
the description clarifies these are exact token values to pass to the --agent
option so that using selftune init --agent <value> matches the switch in
cli/selftune/init.ts.
cli/selftune/orchestrate.ts (1)

377-383: ⚠️ Potential issue | 🔴 Critical

Guard watch rollback in dry-run/review mode.

Line 382 forces autoRollback: true, which can still mutate state during --dry-run and --review-required. That contradicts the no-mutation behavior described by this command.

Proposed fix
-      const watchResult = await _watch({
+      const watchResult = await _watch({
         skillName,
         skillPath,
         windowSessions: 20,
         regressionThreshold: 0.1,
-        autoRollback: true,
+        autoRollback: !options.dryRun && options.approvalMode === "auto",
         syncFirst: false,
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/orchestrate.ts` around lines 377 - 383, The call to _watch
currently forces autoRollback: true which can mutate state during CLI modes that
should be non-mutating; change the options passed to _watch (the call where
skillName/skillPath/windowSessions/regressionThreshold/autoRollback/syncFirst
are set) so autoRollback is conditional (e.g., only true when not in dry-run and
not in review-required mode) by using the command flags/variables that represent
dry-run/review-required (e.g., dryRun or reviewRequired) to compute autoRollback
as false in those modes; ensure the rest of the _watch call and downstream logic
still handle a false autoRollback gracefully.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/init.ts`:
- Around line 411-412: The CLI currently short-circuits when an existing config
is detected, causing the "--enable-autonomy" flag to be a no-op and hiding
scheduler activation failures; modify the configExists early-return logic in
cli/selftune/init.ts so that if options["enable-autonomy"] is true the code does
not exit early and proceeds to the scheduler activation block (the code that
produces scheduleResult), and after attempting activation check
scheduleResult.activated and, on false, call process.exit with a non-zero code
(and log an actionable error) to surface failures; apply the same change to the
parallel retry/second-path block referenced around the other schedule attempt so
both code paths bypass the early exit when "enable-autonomy" is set and both
exit non-zero on scheduleResult.activated === false.

In `@cli/selftune/orchestrate.ts`:
- Around line 82-85: The statusWeight ternary in function candidatePriority is
misformatted and failing CI; reformat the block to match project style by
running Biome on cli/selftune/orchestrate.ts (or manually adjust the formatting
of the statusWeight expression and surrounding lines) so the statusWeight
declaration and the subsequent missedWeight line conform to the repository's
Biome rules; ensure the candidatePriority function formatting is consistent with
other functions in the file and commit the formatted file.

In `@cli/selftune/schedule.ts`:
- Around line 271-276: The current cron install flow returns an artifact and
uses activationCommands that run `crontab ${path}`, which overwrites the user's
entire crontab; instead modify the activation step to merge a selftune-managed
block into the existing crontab: read the current crontab (e.g., via `crontab
-l`), locate markers such as "# BEGIN SELFTUNE" / "# END SELFTUNE", replace that
block with the output of generateCrontab(), or append a new marked block if none
exists, write the merged content to a temp file and apply it with `crontab
<tempfile>`; update the code paths that build artifacts/activationCommands (the
branch that checks format === "cron", the const path variable and
generateCrontab()) to implement this safe merge instead of blindly running
`crontab ${path}`.
- Around line 431-449: When handling the values.install branch, wrap the call to
installSchedule(...) in a try/catch and on any thrown error print a concise
error via console.error and call process.exit(1); after a successful call, check
the returned result (the installSchedule result object) and if result.activated
is false or the install indicates failure (e.g., non-empty failure flag or
unexpected result.dryRun state per your semantics) emit a concise error message
with context and call process.exit(1); otherwise continue to print the JSON
success payload. Reference: values.install branch, installSchedule(...), result,
result.activated, result.dryRun.

In `@docs/integration-guide.md`:
- Around line 390-395: The code block currently shows just the command
"orchestrate" which is incomplete; update the example to show the full runnable
command "selftune orchestrate" (replace the lone "orchestrate" token in the
docs/integration-guide.md snippet with "selftune orchestrate") so readers can
copy and run it successfully.

In `@skill/Workflows/Schedule.md`:
- Around line 19-23: Replace the ambiguous CLI example "orchestrate" with the
full command "selftune orchestrate" in the Schedule.md content and any related
sentences so the example is copy/pasteable and consistent with the paragraph
that references the CLI entry point; update the text around the core selftune
automation loop to use the unique symbol "selftune orchestrate" instead of the
bare "orchestrate" to ensure the workflow example is valid.

In `@tests/schedule/schedule.test.ts`:
- Line 184: The test asserts a POSIX-style path string and will fail on Windows;
update the assertion in tests/schedule/schedule.test.ts to be separator-agnostic
by comparing against a platform-correct path built with Node's path utilities
(e.g., use path.join('.selftune','schedule','selftune.crontab') or normalize
both sides) when checking result.artifacts[0]?.path so the assertion works on
all OSes.
- Around line 154-185: Add tests that cover the error/invalid-format paths for
the install helper functions: call selectInstallFormat with an
unsupported/invalid platform or an explicit invalid format and assert it returns
{ ok: false } (or the error shape your function uses); call buildInstallPlan
with an unsupported format (e.g., "invalid-format") and assert it throws or
returns an error result; call installSchedule with an invalid format and with
runCommand that would be invoked on activation and assert it does not activate
and returns an error/did-fail response (check result.activated is false and
result.dryRun handling for errors). Reference selectInstallFormat,
buildInstallPlan, and installSchedule when locating the code to test and make
assertions against their error-return semantics consistent with the module.

---

Outside diff comments:
In `@cli/selftune/orchestrate.ts`:
- Around line 377-383: The call to _watch currently forces autoRollback: true
which can mutate state during CLI modes that should be non-mutating; change the
options passed to _watch (the call where
skillName/skillPath/windowSessions/regressionThreshold/autoRollback/syncFirst
are set) so autoRollback is conditional (e.g., only true when not in dry-run and
not in review-required mode) by using the command flags/variables that represent
dry-run/review-required (e.g., dryRun or reviewRequired) to compute autoRollback
as false in those modes; ensure the rest of the _watch call and downstream logic
still handle a false autoRollback gracefully.

In `@skill/Workflows/Initialize.md`:
- Around line 19-25: The README table documents the wrong/ambiguous --agent
values; update the Flag table in Initialize.md to list the actual accepted
values used by cli/selftune/init.ts (claude_code, codex, opencode, openclaw,
unknown) instead of "claude" and ensure the description clarifies these are
exact token values to pass to the --agent option so that using selftune init
--agent <value> matches the switch in cli/selftune/init.ts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6a0406a-3ba3-4c63-b03f-4147d5383dea

📥 Commits

Reviewing files that changed from the base of the PR and between 789ebef and 61c7dec.

📒 Files selected for processing (11)
  • cli/selftune/cron/setup.ts
  • cli/selftune/init.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/schedule.ts
  • docs/integration-guide.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Schedule.md
  • tests/cron/setup.test.ts
  • tests/orchestrate.test.ts
  • tests/schedule/schedule.test.ts

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 40-47: Update the Quick proof path in README.md to warn about the
destructive nature of the `sync --force` flag: replace or supplement the current
`npx selftune@latest sync --force` line with `npx selftune@latest sync` as the
recommended first-run command, and add a one-line caution beneath the block
stating that `--force` will overwrite/rebuild local state and should only be
used when you explicitly need to rebuild from scratch (use `--force` only when
necessary).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ddc60f4-4d93-4d99-a2a1-5d86ca1c0cb4

📥 Commits

Reviewing files that changed from the base of the PR and between 61c7dec and 4ea63c3.

📒 Files selected for processing (4)
  • README.md
  • skill/SKILL.md
  • skill/Workflows/Orchestrate.md
  • skill/references/setup-patterns.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 51-57: Add a one-line explanation of what autonomous mode does
immediately before the "Autonomy quick start:" code block (the block that begins
with the command "npx selftune@latest init --enable-autonomy"), e.g. "Autonomous
mode enables automatic deployment of low-risk skill improvements with validation
and rollback safeguards." Keep it concise and inline with the existing README
formatting so the new sentence appears directly above the three example
commands.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 00e4a3c3-383d-45ae-9bd2-994d376ade1f

📥 Commits

Reviewing files that changed from the base of the PR and between f0ed152 and 8c5db2e.

📒 Files selected for processing (1)
  • README.md

Comment on lines +51 to +57
Autonomy quick start:

```bash
npx selftune@latest init --enable-autonomy
npx selftune@latest orchestrate --dry-run
npx selftune@latest schedule --install --dry-run
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add context for autonomy mode.

The autonomy quick start jumps directly to commands without explaining what autonomous mode does. Add a brief one-liner (e.g., "Autonomous mode enables auto-deployment of low-risk skill improvements with validation and rollback safeguards") before the code block so users understand what they're opting into.

📝 Suggested clarification
+Autonomy quick start (auto-deploy validated low-risk improvements):
-Autonomy quick start:
 
 ```bash
 npx selftune@latest init --enable-autonomy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 51 - 57, Add a one-line explanation of what
autonomous mode does immediately before the "Autonomy quick start:" code block
(the block that begins with the command "npx selftune@latest init
--enable-autonomy"), e.g. "Autonomous mode enables automatic deployment of
low-risk skill improvements with validation and rollback safeguards." Keep it
concise and inline with the existing README formatting so the new sentence
appears directly above the three example commands.

@WellDunDun WellDunDun merged commit 3f6c6c3 into dev Mar 14, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant