feat(iter-4): web GUI, robustness, notifications, review UI, per-iter scope #4
Merged
New packages/web workspace with:
- Dashboard: project list with status badges, click to detail
- Project detail: tabs for Status, History, Logs, Config
- Phase indicator: visual stepper showing loop progress
- Log viewer: SSE-based real-time log streaming
- Judge assessment display with determination + quality score
- Iteration history table with judge verdicts
- Loop controls: start/stop/resume/review/document
- Feedback form: textarea + resume when loop is paused
- Config display: all four agent roles, limits, policies
Server changes:
- CORS middleware for Vite dev proxy
- Loop events SSE endpoint (GET /api/projects/:id/loop/events)
- Static file serving from packages/web/dist/ (production)
- SPA fallback for hash routing
Dark theme, plain CSS, responsive layout, no UI framework.
- Render max 500 lines (truncate older), use single <pre> instead of individual <div> per line -- fixes DOM bloat on large logs
- Auto-scroll detection: disables when user scrolls up, shows 'scroll to bottom' button
- Line count shown in log header
- Disabled buttons now use pointer-events:none + lower opacity to make the disabled state unambiguous
Review/Document buttons now have transparent background with purple border, clearly distinguishable from the solid primary Start Loop button and from disabled state.
Only poll loop status every 3s when the project is running/paused. For idle/completed/stopped projects, fetch once on mount then stop. Reduces unnecessary network requests when nothing is changing.
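The polling rule described above can be sketched as a small helper. This is only an illustration: the status names, `pollIntervalFor`, and `startStatusPolling` are hypothetical and are not taken from the cfcf code.

```typescript
// Hypothetical status values; the real project status type may differ.
type ProjectStatus = "idle" | "running" | "paused" | "completed" | "stopped";

const ACTIVE_POLL_MS = 3000; // the 3s cadence mentioned in the commit message

/** Returns the poll interval in ms, or null when a single fetch on mount is enough. */
function pollIntervalFor(status: ProjectStatus): number | null {
  return status === "running" || status === "paused" ? ACTIVE_POLL_MS : null;
}

/**
 * Fetches once immediately, then keeps polling only for active statuses.
 * Returns a stop function suitable for an effect cleanup.
 */
function startStatusPolling(status: ProjectStatus, fetchOnce: () => void): () => void {
  fetchOnce(); // always fetch once on mount
  const interval = pollIntervalFor(status);
  if (interval === null) return () => {}; // nothing is changing: no timer at all
  const timer = setInterval(fetchOnce, interval);
  return () => clearInterval(timer);
}
```

In a React component this would presumably live in an effect keyed on the status, so the timer is torn down and re-evaluated whenever the status changes.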
Removed the MAX_RENDERED_LINES=500 limit that hid 54K+ earlier lines. All lines now rendered in a single <pre> block which the browser handles efficiently. Full log is scrollable from start to end.
Shows spinner + '(loading...)' text while SSE is streaming lines. Wait cursor on the log viewer container during loading. Line count only shown after first lines arrive.
Removed confusing green dot / 'complete' status indicator. Replaced with always-visible 'top' and 'bottom' buttons for navigating large logs. Cleaner header: iteration info + line count on left, navigation buttons on right.
Top button hidden when scrolled to top, bottom button hidden when at bottom. Both appear when scrolling in the middle of the log.
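One way to express the top/bottom visibility rule is a pure function over the scroll metrics of the log container. The names and the pixel tolerance below are assumptions for illustration, not values from the PR.

```typescript
interface ScrollMetrics {
  scrollTop: number;    // pixels scrolled from the top
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the content
}

const EDGE_TOLERANCE_PX = 4; // assumed slack so sub-pixel rounding does not flicker the buttons

/** Decides which navigation buttons to show for the current scroll position. */
function navButtonVisibility(m: ScrollMetrics): { showTop: boolean; showBottom: boolean } {
  const atTop = m.scrollTop <= EDGE_TOLERANCE_PX;
  const atBottom = m.scrollTop + m.clientHeight >= m.scrollHeight - EDGE_TOLERANCE_PX;
  return { showTop: !atTop, showBottom: !atBottom };
}
```

The same `atBottom` test is a natural fit for auto-scroll detection: follow new lines only while the user is already at the bottom.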
Old config files (created before iteration 3/4) lack these fields. Config validation now backfills them with the dev agent's adapter instead of throwing. Fixes 500 Internal Server Error when creating projects with a pre-existing config.
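A minimal sketch of the backfill idea, assuming the missing fields are the `architectAgent` / `documenterAgent` roles named in the PR summary. The `devAgent` field name and the `AgentConfig` shape are assumptions made for this example.

```typescript
interface AgentConfig {
  adapter: string;
}

interface ProjectConfig {
  devAgent: AgentConfig;         // hypothetical field name for the dev agent
  architectAgent?: AgentConfig;  // may be missing in older config files
  documenterAgent?: AgentConfig; // may be missing in older config files
}

/** Fills missing agent roles with the dev agent's adapter instead of rejecting the config. */
function backfillAgentRoles(config: ProjectConfig): Required<ProjectConfig> {
  return {
    ...config,
    architectAgent: config.architectAgent ?? { adapter: config.devAgent.adapter },
    documenterAgent: config.documenterAgent ?? { adapter: config.devAgent.adapter },
  };
}
```

Existing values are kept untouched, so a config that already names a different adapter for one role is not overwritten.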
The SSE log endpoint now polls the log file during active loop iterations, streaming new content as it's written. Previously it only read the file once (for completed iterations) or used the old single-iteration runner's in-memory state (which the loop doesn't populate). Now detects live iterations via getLoopState() and tails the file until the iteration completes.
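The tail-until-complete behaviour can be sketched with an offset tracker and a polling driver. To keep the example self-contained, the file is abstracted as a `readSnapshot` callback that returns the full current content. That abstraction and all names here are hypothetical; a real implementation would more likely read only the bytes past the stored offset.

```typescript
interface LogTail {
  poll(): string[];  // complete lines that appeared since the last poll
  flush(): string[]; // any trailing partial line, emitted once the writer is done
}

/** Tracks how much of the log has been delivered and buffers an unfinished last line. */
function createLogTail(readSnapshot: () => string): LogTail {
  let offset = 0;
  let partial = "";
  return {
    poll(): string[] {
      const snapshot = readSnapshot();
      if (snapshot.length < offset) { offset = 0; partial = ""; } // file was truncated
      const chunk = snapshot.slice(offset);
      offset = snapshot.length;
      const pieces = (partial + chunk).split("\n");
      partial = pieces.pop() ?? ""; // last piece has no newline yet
      return pieces;
    },
    flush(): string[] {
      const rest = partial;
      partial = "";
      return rest ? [rest] : [];
    },
  };
}

/** Polls while the iteration is live, then drains once more and stops. Returns a cancel function. */
function tailWhileLive(
  tail: LogTail,
  isLive: () => boolean,
  onLines: (lines: string[]) => void,
  intervalMs = 500,
): () => void {
  const tick = () => {
    const live = isLive(); // sample liveness before reading so the final write is not missed
    const lines = tail.poll();
    if (lines.length > 0) onLines(lines);
    if (!live) {
      const rest = tail.flush();
      if (rest.length > 0) onLines(rest);
      clearInterval(timer);
    }
  };
  const timer = setInterval(tick, intervalMs);
  tick();
  return () => clearInterval(timer);
}
```

In the server, `isLive` would presumably be backed by something like `getLoopState()`, and `onLines` would write SSE events.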
Review button now appears first (recommended flow before Start Loop). Document button last. Secondary buttons use brighter color (primary-hover) for better visibility on dark background. Hover state goes to white for clear interactivity feedback.
Clicking Review or Document now shows 'Reviewing...' / 'Documenting...' with a spinner status message. Polls the status endpoint every 3s until complete. All buttons disabled while review/document is running. Fixes the 'flash and nothing happens' issue — the action was succeeding but there was no visual feedback.
All three (Review, Start Loop, Document) are equal workflow actions and should have the same visual weight.
… empty-state
Fixes from live testing:
1. Add 'documenting' phase to LoopPhase.
- iteration-loop.ts: sets phase to 'documenting' before spawning
the documenter post-SUCCESS, then 'completed' after. Push happens
after documenter completes.
- Mirrored in web types.
- StatusBadge and PhaseIndicator show the new phase.
2. isLoopActive now derives from loopState.phase, not project.status.
The project.status field was sometimes stale; phase is canonical.
ProjectDetail manages loop state via useEffect + setInterval now,
polling only when phase is in an active set including 'documenting'.
This fixes: logs stop updating when documenter is running
(loop showed 'completed' before documenter finished).
3. Server SSE live-check also includes 'documenting'.
4. History tab shows an info note when loopState.iterations is empty
but project.currentIteration > 0 (historical iterations from prior
runs are not persisted — see git log for full history).
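Items 2 and 4 above can be sketched as two small predicates. Apart from 'documenting', 'completed', and 'paused', the phase names below are hypothetical placeholders; the actual LoopPhase union is not shown in this PR text.

```typescript
// Hypothetical phase names, except "documenting" which this commit adds.
type LoopPhase =
  | "idle" | "preparing" | "dev_executing" | "judging" | "documenting"
  | "paused" | "completed" | "failed" | "stopped";

// Phases during which logs and status should keep updating.
const ACTIVE_PHASES: ReadonlySet<LoopPhase> = new Set<LoopPhase>([
  "preparing", "dev_executing", "judging", "documenting",
]);

/** Derives activity from the canonical loop phase, ignoring the possibly stale project.status. */
function isLoopActive(loopState: { phase: LoopPhase } | null): boolean {
  return loopState !== null && ACTIVE_PHASES.has(loopState.phase);
}

/** True when earlier runs happened but their iterations are not in the current loop state. */
function showHistoricalNote(iterationsInState: number, currentIteration: number): boolean {
  return iterationsInState === 0 && currentIteration > 0;
}
```

Keeping the active set in one place means the web client and the server SSE live-check can share the same definition of "active".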
New scripts/:
- setup-test-repos.sh: creates /tmp/cfcf-calc and /tmp/cfcf-tracker with git init + initial commit + problem-pack files copied from problem-packs/ (not auto-committed -- user reviews first)
- cleanup-test-repos.sh: safe removal of all test state (repos, project configs, loop state, logs). Preserves global config.
- scripts/README.md: usage docs
Plan additions for iteration 5:
- autoDocumenter / autoReviewSpecs config flags
- Git merge strategy: --no-ff to preserve iteration boundaries in history
- Optional branch cleanup after successful merge
Each iteration's merge to main now creates a merge commit instead of
fast-forwarding. This keeps iteration boundaries visible in git log
--graph and makes it easier to see what each iteration contributed.
The merge commit messages are already informative ('Merge cfcf
iteration N') so the graph view becomes a useful project history.
Moved this item from iteration 5 plan to iteration 4 (done).
Previously the script only did direct rm on the config directories. Now it:
1. Checks if the server is reachable
2. If yes, uses DELETE /api/projects/:id to let the server clean up in-memory state and any future teardown logic
3. Falls back to direct file removal otherwise
4. Always does a final rm as a safety net
Uses the API directly (not 'cfcf project delete') because the CLI command has an interactive yes/no prompt that doesn't work in scripts.
Previously the script did 'rm -rf' on the entire projects directory and logs directory, which would wipe ALL cfcf projects including any real ones the user had set up. Now the script:
1. Scans project configs and identifies only those with repoPath starting with /tmp/cfcf-
2. Deletes matching projects via API (if server is up) and their specific config dir + logs dir
3. Deletes matching /tmp/cfcf-* repos
4. Leaves everything else untouched
Reports exactly what will be deleted before prompting for confirmation.
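The actual script is a shell script, so the following is only a TypeScript rendering of its selection rule. The `ProjectEntry` shape and function names are hypothetical.

```typescript
interface ProjectEntry {
  id: string;
  repoPath: string;
}

const TEST_REPO_PREFIX = "/tmp/cfcf-"; // only repos under this prefix count as test state

/** Picks only projects whose repo lives under the test prefix; everything else is left alone. */
function selectTestProjects(projects: ProjectEntry[]): ProjectEntry[] {
  return projects.filter((p) => p.repoPath.startsWith(TEST_REPO_PREFIX));
}

/** Builds the human-readable plan shown before asking for confirmation. */
function describeDeletionPlan(projects: ProjectEntry[]): string[] {
  return selectTestProjects(projects).map((p) => `project ${p.id} (repo ${p.repoPath})`);
}
```

Selecting by an explicit prefix rather than deleting whole directories is what keeps real projects out of the blast radius.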
Major refactor for a cleaner agent-invocation UX:
**History**
- New packages/core/src/project-history.ts: persistent history.json per project tracking every agent run (review, iteration, document) across loop restarts
- architect-runner, documenter-runner, iteration-loop all write/update history events with unique IDs, start/complete timestamps, status
**Log file naming**
- Architect/documenter logs are now sequence-numbered so re-runs preserve history: architect-001.log, architect-002.log, etc.
- Dev/judge logs unchanged (iteration-NNN-dev.log / iteration-NNN-judge.log)
- New getAgentRunLogPath() + nextAgentRunSequence() helpers
- New getLogPathByFilename() with safety checks (no path traversal)
**Server**
- GET /api/projects/:id/history -- returns all events
- GET /api/projects/:id/logs/:filename -- generic log streaming for any log file (iteration/architect/documenter/judge), with live tailing when the agent is running
**Web UI**
- New ProjectHistory component replaces IterationHistory. Unified timeline of reviews, iterations, documents with a log link per event (dev+judge for iterations)
- LogViewer now takes a LogTarget (projectId + logFile + label), uses the generic /logs/:filename endpoint
- Log state lifted to ProjectDetail so SSE streaming persists across tab switches (all tabs kept mounted, toggled via display:none)
- LoopControls simplified: removed separate status polling, the log stream IS the feedback. Clicking Review/Start Loop/Document auto-switches to Logs tab and streams that agent's log
**Docs**
- Updated plan.md (iteration 4), docs/api/server-api.md, docs/guides/cli-usage.md, docs/design/agent-process-and-context.md
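The log naming scheme and the path-traversal guard could look roughly like this. The signatures of the real `getAgentRunLogPath()`, `nextAgentRunSequence()`, and `getLogPathByFilename()` helpers are not shown in this PR text, so the functions below use different, hypothetical names and are a sketch only.

```typescript
import * as path from "node:path";

type SequencedRole = "architect" | "documenter";

/** Builds names like architect-001.log from a role and a 1-based sequence number. */
function agentRunLogName(role: SequencedRole, sequence: number): string {
  return `${role}-${String(sequence).padStart(3, "0")}.log`;
}

/** Finds the next free sequence number from the file names already in the log directory. */
function nextSequence(role: SequencedRole, existingFiles: string[]): number {
  const pattern = new RegExp(`^${role}-(\\d+)\\.log$`);
  let max = 0;
  for (const file of existingFiles) {
    const match = pattern.exec(file);
    if (match) max = Math.max(max, Number(match[1]));
  }
  return max + 1;
}

/** Resolves a client-supplied file name inside logDir, or returns null if it could escape it. */
function safeLogPath(logDir: string, filename: string): string | null {
  // Reject anything containing a directory component, and anything that is not a .log file.
  if (filename !== path.basename(filename) || !filename.endsWith(".log")) return null;
  const root = path.resolve(logDir);
  const resolved = path.resolve(root, filename);
  // Defence in depth: the resolved path must still sit directly under the log directory.
  return resolved.startsWith(root + path.sep) ? resolved : null;
}
```

Returning `null` lets the route handler answer with a 404 or 400 without ever touching the filesystem for a suspicious name.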
…n iteration 5 Both CLI (cfcf run) and web UI (Start Loop) must respect the flags identically. Added per-run override support (CLI flags, web toggles) and discoverability requirement (effective flag values visible in the Config tab).
Treats all three agent invocations (review, loop, document) as siblings
sharing the same state machine: start → progress indicator → stop button
→ log stream → history event.
Core:
- stopReview() and stopDocument() kill the running process and update
history event to 'failed' with 'Stopped by user'
- cleanupStaleRunningEvents() on server startup marks any orphaned
'running' events as failed (recovers from crashes / restarts)
- Runners store ManagedProcess refs in a map keyed by projectId so
stop functions can kill them
Server:
- POST /api/projects/:id/review/stop
- POST /api/projects/:id/document/stop
- start.ts: runs cleanupAllStaleRunningEvents() on startup
Web:
- LoopControls: Review button → Stop Review (red) while running.
Same for Document. Start Loop disabled while any sibling agent runs.
New ActiveAgent type ('loop' | 'review' | 'document' | null) passed
from ProjectDetail.
- PhaseIndicator: now takes agentType ('loop' | 'review' | 'document')
with appropriate phase sequence for each. review: prepare→executing→
collecting. document: prepare→executing. loop: unchanged.
- ProjectDetail: fetches review + document status alongside loop state
and history. Derives activeAgent. Polls all endpoints every 3s when
anything is running, every 10s otherwise (fixes the 'history stuck
at running' bug).
- Status tab: shows progress section for review/document runs, not
just loop state.
- Header shows a 'review running' / 'document running' tag next to
the project name when applicable.
Docs:
- Added POST /review/stop and /document/stop to API reference
- Updated plan.md iteration 4 section with the unified model
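The unified model above can be sketched as follows. `ActiveAgent` and the review/document phase sequences come from this commit message. The `RunStatuses` shape, the function names, and the priority order when several flags are set are assumptions.

```typescript
type ActiveAgent = "loop" | "review" | "document" | null;

// Hypothetical summary of the three status endpoints polled by ProjectDetail.
interface RunStatuses {
  loopActive: boolean;
  reviewRunning: boolean;
  documentRunning: boolean;
}

/** Collapses the three sibling statuses into a single active agent (assumed priority: loop first). */
function deriveActiveAgent(s: RunStatuses): ActiveAgent {
  if (s.loopActive) return "loop";
  if (s.reviewRunning) return "review";
  if (s.documentRunning) return "document";
  return null;
}

/** 3s while anything is running, 10s otherwise, as described for ProjectDetail. */
function pollIntervalMs(active: ActiveAgent): number {
  return active === null ? 10_000 : 3_000;
}

// Phase sequences for the two non-loop agents; the loop sequence is unchanged and omitted here.
const PHASE_SEQUENCE: Record<"review" | "document", readonly string[]> = {
  review: ["prepare", "executing", "collecting"],
  document: ["prepare", "executing"],
};

/** Start Loop (and sibling start buttons) are disabled while any agent runs. */
function canStart(active: ActiveAgent): boolean {
  return active === null;
}
```

Polling at a slow rate even when idle is what lets a history event that finished between fast polls eventually show as completed.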
Without --verbose, Claude Code in print mode (-p) is silent until the final response, which makes the log viewer appear empty while the agent is working. --verbose shows turn-by-turn text output so users can watch progress, matching Codex's verbose-by-default behavior. Consistent live progress across agents.
Each iteration is now a table with columns: # | Status | Title | Notes. Status uses: ✅ Done · 🟡 In progress · ❌ Not started · ⏸ Deferred · ⚠️ Blocked. The Notes column records outcomes, commit references, and reasons for deferral -- this is what I re-read at the start of a new session to reorient. Iteration 4 now shows the real state: web GUI + unified agent-run model + history tracking are done (items 4.1-4.11); diff viewer, CLI convenience commands (cfcf log / push / prepare), error handling audit, token tracking, and notifications are still not started (items 4.12-4.18); cross-project knowledge, reflection, sandbox research, and binary self-hosting are deferred to iteration 5.
Dropped during refactor. Now item 5.6: 'cfcf server start' works from compiled binary without a Bun runtime. Renumbered 5.6→5.7 (embed templates), 5.7→5.8 (installer), etc. Clarified dependency chain: 5.6 (binary self-hosting) + 5.7 (embed templates) are prerequisites for 5.8 (installer) so users don't need a separate template download.
…ndbox) 4.19-4.22 previously had very terse deferral notes. Expanded with full context so anyone reading the plan can understand what the deferred item actually is, without having to cross-reference iteration 5. Also enriched 5.12 (Tier 3 Reflection) with full description of what a reflection agent does and how it complements the per-iteration judge.
…tatus)
Also enriched the plan's 4.18 description (notification hooks).
Changes:
- DocumentHistoryEvent gets docsFileCount, committed, exitCode fields
- documenter-runner counts .md files in docs/ after each run
- iteration-loop updates committed=true after its post-SUCCESS commit (standalone document runs set committed=false since they don't auto-commit)
- ProjectHistory renders a DocumentResult: 'N docs' + '✓ committed' or '(not committed)' as appropriate
Fixes the UX inconsistency where Review showed readiness, Iteration showed judge determination + quality + merged, but Document showed nothing.
Makes cfcf resilient to crashes, signal interrupts, and unexpected agent
failures. The system now always leaves consistent state on disk regardless
of how or when it's interrupted.
Changes:
Core:
- New active-processes.ts: central registry of all spawned agent processes
keyed by projectId + role. registerProcess, killProjectProcesses,
killAllActiveProcesses. 8 unit tests.
- All runners (architect, documenter, iteration-loop dev/judge) now
register their processes with this registry.
- New cleanupStaleActiveLoops() in iteration-loop.ts: on startup, any
loop-state.json with an active phase is marked 'failed' with a
descriptive error (was orphaned by a crash/restart).
- All fire-and-forget .catch() handlers in runners are now themselves
try/catch-wrapped so a failure to record an error doesn't silently
swallow it -- the error goes to console.error.
Server:
- start.ts: graceful shutdown on SIGINT/SIGTERM. Kills all tracked
processes, marks their history events as failed, removes PID file,
exits. Second signal forces immediate exit.
- Installs process.on('unhandledRejection') and
process.on('uncaughtException') handlers that log and trigger
graceful shutdown before exiting.
- Warns at startup when running in watch mode (BUN_WATCH=1 or
--watch in execArgv) so the user knows file changes will kill
active agent runs.
- Calls cleanupStaleActiveLoops() on startup alongside the existing
cleanupAllStaleRunningEvents().
Web:
- ProjectDetail shows an improved error banner for loop failures:
title + message (monospace) + hint text for server-restart cases.
- Banner is now subtle (left-border accent) instead of a solid red
background.
Tests: 154 pass (up from 146), 8 new for the active-processes registry.
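A minimal sketch of the registry and the two-stage signal handling described in this commit. The function names `registerProcess`, `killProjectProcesses`, and `killAllActiveProcesses` appear in the commit message, but their signatures, the `Killable` interface, the role names, and `createShutdownHandler` are assumptions made for illustration.

```typescript
interface Killable {
  kill(): void;
}

type AgentRole = "dev" | "judge" | "architect" | "documenter"; // assumed role names

// projectId -> role -> process handle
const registry = new Map<string, Map<AgentRole, Killable>>();

/** Tracks a spawned process; returns an unregister function to call when it exits normally. */
function registerProcess(projectId: string, role: AgentRole, proc: Killable): () => void {
  let roles = registry.get(projectId);
  if (!roles) {
    roles = new Map();
    registry.set(projectId, roles);
  }
  roles.set(role, proc);
  const byRole = roles;
  return () => {
    if (byRole.get(role) === proc) byRole.delete(role); // do not remove a newer process
    if (byRole.size === 0 && registry.get(projectId) === byRole) registry.delete(projectId);
  };
}

/** Kills every tracked process of one project and returns how many were killed. */
function killProjectProcesses(projectId: string): number {
  const roles = registry.get(projectId);
  if (!roles) return 0;
  registry.delete(projectId);
  for (const proc of roles.values()) proc.kill();
  return roles.size;
}

/** Kills everything that is still tracked, e.g. during graceful shutdown. */
function killAllActiveProcesses(): number {
  let killed = 0;
  for (const projectId of [...registry.keys()]) killed += killProjectProcesses(projectId);
  return killed;
}

/** First signal runs the graceful path; a second signal forces an immediate exit. */
function createShutdownHandler(graceful: () => void, forceExit: () => void): () => void {
  let shuttingDown = false;
  return () => {
    if (shuttingDown) {
      forceExit();
      return;
    }
    shuttingDown = true;
    graceful();
  };
}
```

In a server entry point the handler would presumably be attached with `process.on("SIGINT", handler)` and `process.on("SIGTERM", handler)`, with the graceful path also marking history events as failed and removing the PID file.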
When running unattended, cfcf now notifies the user at key moments so they can walk away and come back only when needed -- the dark factory operating mode.
Events (v1):
- loop.paused (cadence, anomaly, user_input_needed, max_iterations)
- loop.completed (success, failure, stopped, max_iterations)
- agent.failed (non-zero exit + no signals)
Channels (v1):
- terminal-bell: ASCII BEL to stderr, terminal beeps or flashes
- macos: native Notification Center via osascript
- linux: native desktop notification via notify-send
- log: JSON Lines audit trail at ~/.cfcf/logs/<project>/notifications.log
Features:
- Fire-and-forget dispatch (never blocks the loop)
- Per-channel 5-second timeout (one slow channel can't stall others)
- A failing channel logs via console.error but doesn't crash the dispatcher
- Global + per-project config with cascading defaults
- Asked during 'cfcf init' (OS-appropriate channel auto-selected)
- Config displayed in 'cfcf config show' and web UI Config tab
Implementation:
- packages/core/src/notifications/: dispatcher + 4 channels + registry
- Wired into iteration-loop (paused, completed, loop failure), architect-runner (agent.failed), documenter-runner (agent.failed)
- 8 new unit tests for dispatcher with mock channels (no actual shell-outs in tests)
Deferred to iteration 5:
- webhook channel
- rate limiting
- additional events (iteration.completed, review.completed, etc.)
- web UI editing of notification config
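The dispatch features listed above (fire-and-forget, per-channel timeout, isolated failures) might be sketched like this. The event names and the 5-second default come from the commit message. The `Channel` interface, `withTimeout`, `dispatch`, and `notify` are hypothetical names, not the actual cfcf dispatcher.

```typescript
interface NotificationEvent {
  type: "loop.paused" | "loop.completed" | "agent.failed";
  project: string;
  message: string;
}

interface Channel {
  name: string;
  send(event: NotificationEvent): Promise<void>;
}

/** Rejects if the work does not settle within ms; always clears its timer. */
function withTimeout(work: Promise<void>, ms: number, label: string): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

/** Sends to all channels in parallel; one slow or failing channel cannot affect the others. */
async function dispatch(
  event: NotificationEvent,
  channels: Channel[],
  timeoutMs = 5000,
): Promise<{ channel: string; ok: boolean }[]> {
  return Promise.all(
    channels.map(async (channel) => {
      try {
        // Promise.resolve().then(...) also converts a synchronous throw into a rejection.
        await withTimeout(Promise.resolve().then(() => channel.send(event)), timeoutMs, channel.name);
        return { channel: channel.name, ok: true };
      } catch (err) {
        console.error(`[notify] channel ${channel.name} failed:`, err);
        return { channel: channel.name, ok: false };
      }
    }),
  );
}

/** Fire-and-forget entry point: the caller (e.g. the loop) never awaits delivery. */
function notify(event: NotificationEvent, channels: Channel[]): void {
  void dispatch(event, channels);
}
```

Because every channel promise is caught individually, `dispatch` itself never rejects, which is what makes the un-awaited call in `notify` safe.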
- CHANGELOG.md: add 0.4.0 section covering web GUI, server API, operational robustness (4.16), and notifications (4.18)
- README.md: update status to iteration 4, refresh architecture diagram
- docs/api/server-api.md: document "documenting" loop phase
- docs/design/agent-process-and-context.md: reflect Documenter role in docs responsibility rows
- docs/design/cfcf-stack.md: mark web GUI as available in iteration 4, update notifications channels
- docs/design/technical-design.md: refreshed system diagram with Active Processes, Notifications Dispatcher, History Store, Graceful Shutdown; added new persistence files
- docs/guides/workflow.md: note CLI / web GUI parity
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Persist the full parsed ArchitectSignals inline on ReviewHistoryEvent
(the repo file cfcf-docs/cfcf-architect-signals.json is overwritten on
every review run, so inline persistence is what makes prior reviews
viewable in the UI).
New ArchitectReview React component renders:
- Readiness badge (READY / NEEDS_REFINEMENT / BLOCKED) with color code
- Guidance banner keyed to readiness ("Edit files under problem-pack/
and rerun Review" for NEEDS_REFINEMENT, etc.)
- Collapsible Gaps / Suggestions / Risks sections, with gaps +
suggestions auto-expanded when the review needs refinement
- Collapsible recommended_approach section
Integration points:
- Status tab: replaces the one-line "Readiness: X" placeholder with
the full component for the latest completed review
- History tab: readiness cell becomes a clickable pill; expands an
inline detail row with ArchitectReview in compact mode
Backward-compat: pre-4.23 review events without `signals` still render
their readiness label as plain text.
Tests: 4 new cases in packages/core/src/project-history.test.ts
covering signals round-trip, legacy entries, and iteration events not
accidentally getting a signals field.
Docs: plan.md item 4.23 added; CHANGELOG Unreleased section;
server-api.md history example; agent-process-and-context.md now
explains the scratchpad vs persisted-event distinction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
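The inline-persistence and backward-compat rules from this commit can be sketched as a view-model function. `signals`, the readiness values, and `recommended_approach` are named in the commit message. The remaining field names, the `ReviewView` type, and `toReviewView` are assumptions for illustration.

```typescript
type Readiness = "READY" | "NEEDS_REFINEMENT" | "BLOCKED";

interface ArchitectSignals {
  readiness: Readiness;
  gaps: string[];
  suggestions: string[];
  risks: string[];
  recommended_approach?: string;
}

interface ReviewHistoryEvent {
  id: string;
  type: "review";
  readiness?: Readiness;      // plain label, assumed to exist on older events
  signals?: ArchitectSignals; // absent on pre-4.23 events
}

type ReviewView =
  | { mode: "full"; signals: ArchitectSignals; expanded: { gaps: boolean; suggestions: boolean; risks: boolean } }
  | { mode: "label-only"; label: string };

/** Chooses between the full ArchitectReview rendering and the legacy plain-text label. */
function toReviewView(event: ReviewHistoryEvent): ReviewView {
  if (!event.signals) {
    return { mode: "label-only", label: event.readiness ?? "unknown" };
  }
  // Gaps and suggestions start expanded only when the review asks for refinement.
  const needsWork = event.signals.readiness === "NEEDS_REFINEMENT";
  return {
    mode: "full",
    signals: event.signals,
    expanded: { gaps: needsWork, suggestions: needsWork, risks: false },
  };
}
```

Since the signals travel inside the history event, a plain JSON round trip through history.json is enough to keep earlier reviews viewable after the scratchpad file is overwritten.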
…pt (4.24)
Two independent changes bundled:
## 4.25 -- Live elapsed-time counter on PhaseIndicator
While an agent run (iteration / review / document) is active, show the
running duration next to the subtitle row, e.g.
"Iteration 2 · 2m 14s"
"Review run 1 · 00m 42s"
- Shared `packages/web/src/utils/time.ts` with `formatDuration` /
`formatDurationOrRunning`. Same format used by the live timer and
the existing History tab Duration column (ProjectHistory now imports
from the shared util instead of a local copy).
- New `useElapsed` hook (1s tick, no server calls). The hook clears its
interval when the run enters a terminal state.
- Freeze behavior: hide timer on completed/failed/stopped; render
frozen on paused (so the user sees how long the iteration has taken
before deciding how to resume).
- Tabular-nums + monospace so the digits don't shift every second.
First web-package test suite: `packages/web/src/utils/time.test.ts`
(9 unit tests for the formatter, including sub-minute / sub-hour / h+
/ invalid-input / Date.now()-fallback cases). Root `test` script now
runs `bun test packages/web` as well; `test:web` alias added.
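A formatter covering the listed test categories (sub-minute, sub-hour, hours, invalid input, Date.now() fallback) could look like the sketch below. The output format is an assumption: the two examples quoted above show minutes both padded and unpadded, so the real `formatDuration` may well differ. The name `formatElapsed` and the injectable clock are hypothetical.

```typescript
/**
 * Formats the time between two ISO timestamps.
 * When endIso is omitted the run is treated as still in progress and the clock is used.
 */
function formatElapsed(startIso: string, endIso?: string, now: () => number = Date.now): string {
  const start = Date.parse(startIso);
  const end = endIso === undefined ? now() : Date.parse(endIso);
  if (Number.isNaN(start) || Number.isNaN(end) || end < start) return "-"; // invalid input

  const totalSeconds = Math.floor((end - start) / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;

  if (hours > 0) return `${hours}h ${minutes}m ${seconds}s`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```

Passing the clock as a parameter keeps the in-progress case deterministic in unit tests, and a hook like `useElapsed` would only need to re-render once per second and call the formatter again.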
## 4.24 -- One-phase-per-iteration prompt
While running the tracker example, the user observed that a hint file
instructing the dev agent to "map phases to iterations, stop after
each, update progress" produced clean checkpointed iterations. That
behavior is exactly what cfcf's iteration loop is designed for, so
promote it into the built-in prompts:
- `packages/core/src/templates/process.md`: new "Iteration Scope --
one phase per iteration" section telling the dev agent to read
plan.md, execute only the next pending chunk, mark completed items
[x] with brief notes, and exit.
- `packages/core/src/templates/cfcf-architect-instructions.md`:
plan.md outline now maps phases to concrete iterations
("## Iteration 1 -- Foundation") instead of generic phases, so
downstream dev iterations have a ready-made checkpoint schedule.
- `packages/core/src/templates/plan.md`: comment block now models the
same iteration structure so the dev agent has a guide even when no
architect ran.
Docs: plan.md items 4.24 + 4.25 added; CHANGELOG Unreleased updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit only updated the static `process.md` template, which is copied into a project via `copyTemplateIfMissing` on first iteration -- so existing projects (whose `process.md` was copied before this change) never saw the new instruction. It also left the one-line CLI prompt silent about iteration scoping, and the adapter question (does this reach Codex too?) was implicit.
Fix: inject the discipline at three levels so it reaches every run, every project, and both adapters:
1. `context-assembler.generateInstructionContent()` now embeds an "Iteration Scope -- one phase per iteration" section in the Tier-1 instruction file that is regenerated fresh every iteration. This is the live channel that reaches existing projects. The content is written by iteration-loop.ts to whichever filename the dev adapter declares (`CLAUDE.md` for Claude Code, `AGENTS.md` for Codex) -- same discipline, both adapters, no code duplication.
2. The one-line dev-agent CLI prompt in iteration-loop.ts:660 now spells out "execute only the next pending chunk from plan.md" directly, so the agent sees the discipline even before reading its instruction file.
3. The static `process.md` template already got the long-form "Iteration Scope" section in the previous commit; kept as the canonical reference for new projects.
Tests: new context-assembler assertion that "Iteration Scope" + "one phase per iteration" + "cfcf-docs/plan.md" appear in the generated instruction content for BOTH iteration 1 (maps phases to iterations) and later iterations (picks up next pending chunk).
Docs sweep for "each phase = clean iteration = clean session":
- CLAUDE.md (project root): design principle #6 expanded to describe the discipline and where it's injected
- README.md: "How It Works" flow now shows architect-maps-phases + one-phase-per-iteration + fresh-process-picks-up-from-plan.md
- docs/guides/workflow.md: new "One phase per iteration, one clean session per phase" subsection + updated iteration-loop diagram
- docs/design/agent-process-and-context.md: iteration-model step 2 explains the discipline and references generateInstructionContent()
- docs/design/technical-design.md: Tier 1 context strategy now lists "Iteration Scope discipline" as an always-included component
- docs/plan.md: 4.24 entry rewritten to describe the three-level injection and both-adapters coverage
- CHANGELOG.md: Unreleased 4.24 entry mirrors the above
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
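The first injection level (a section regenerated every iteration and written to the adapter's instruction file) can be sketched as below. The adapter ids, function names, and the wording of the section body are placeholders. Only the section title, the two file names, and the iteration-1 versus later-iteration distinction come from the commit message.

```typescript
type DevAdapter = "claude-code" | "codex"; // hypothetical adapter ids

// Instruction file each adapter reads, as stated in the commit message.
const INSTRUCTION_FILE: Record<DevAdapter, string> = {
  "claude-code": "CLAUDE.md",
  codex: "AGENTS.md",
};

/** Builds the scope section; the body text here is illustrative, not the real prompt. */
function iterationScopeSection(iteration: number): string {
  const focus =
    iteration === 1
      ? "Map the phases in cfcf-docs/plan.md to iterations, then execute only the first one."
      : "Execute only the next pending chunk from cfcf-docs/plan.md.";
  return [
    "## Iteration Scope -- one phase per iteration",
    "",
    focus,
    "Mark completed items [x] with brief notes, then exit.",
  ].join("\n");
}

/** Produces the file name and content to write before each dev-agent run. */
function buildInstructionFile(
  adapter: DevAdapter,
  iteration: number,
  baseContent: string,
): { filename: string; content: string } {
  return {
    filename: INSTRUCTION_FILE[adapter],
    content: `${baseContent.trimEnd()}\n\n${iterationScopeSection(iteration)}\n`,
  };
}
```

Because the section is generated rather than copied once, projects created before the change pick it up on their next iteration without any migration step.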
fstamatelopoulos
added a commit
that referenced
this pull request
Apr 19, 2026
feat(iter-4): web GUI, robustness, notifications, review UI, per-iter scope
5 tasks
Summary
Ship iteration 4: a full web GUI, operational robustness (graceful shutdown +
state persistence), notification hooks, richer architect-review presentation,
live elapsed-time timer, and a uniform "one phase per iteration" discipline
enforced on every dev-agent run.
The CLI remains the primary headless interface; the web GUI served by the
same Hono server is the monitoring-and-control surface.
4.6 -- Web GUI (React + Vite, served by Hono)
4.16 -- Robust error handling + graceful shutdown
and marks history events + loop states as failed
marked failed
`unhandledRejection` / `uncaughtException` handlers trigger graceful shutdown
4.18 -- Notification hooks
`loop.paused`, `loop.completed`, `agent.failed`
`terminal-bell`, `macos` (osascript), `linux` (notify-send), `log` (JSON Lines)
never blocks another
`cfcf init`
4.23 -- Architect review presentation
`ArchitectSignals` now persisted inline on `ReviewHistoryEvent.signals` (the repo scratchpad file is overwritten every run, so inline persistence is what keeps prior reviews viewable)
`ArchitectReview` React component renders readiness, guidance ("Edit files under `problem-pack/` and rerun Review" etc.), and collapsible gaps / suggestions / risks / recommended_approach
(clickable readiness pill expands an inline detail row)
4.24 -- Per-iteration plan execution discipline (three-level injection)
`context-assembler.generateInstructionContent()` embeds an "Iteration Scope -- one phase per iteration" section in the Tier-1 instruction file (`CLAUDE.md` for Claude Code, `AGENTS.md` for Codex) regenerated every iteration -- reaches existing projects whose static `process.md` was copied before the change
chunk from `plan.md`"
`process.md`, architect `plan.md` outline) model the same phases-as-iterations structure for new projects
adapter specifies
user-authored hint; promoted into the core prompts so every project gets checkpointed iterations by default
4.25 -- Live elapsed-time counter
`formatDuration` + `useElapsed` hook (1s tick, no server calls)
the shared util)
Other changes
`--no-ff` merges for iteration branches to preserve history
`/tmp/cfcf-*`
`--verbose` for live progress in logs
`-a never` before `exec`)
`architectAgent` / `documenterAgent`
Repo docs
All design and user docs updated:
`CHANGELOG`, `README`, `CLAUDE.md`, `docs/plan.md`, `docs/design/{agent-process-and-context,cfcf-stack,technical-design}.md`, `docs/api/server-api.md`, `docs/guides/workflow.md`.
Test plan
`bun run typecheck` passes
`bun run test` passes (170 core + 24 server + 2 cli + 9 web = 205 tests)
`bun run build:web` produces the bundle served by the Hono server
Status tab's ArchitectReview component; expand a History row's readiness pill
"Iteration 1" while running; confirm it disappears on completion
section in its `CLAUDE.md` / `AGENTS.md` each run and executes one phase at a time, updating `cfcf-docs/plan.md` before exiting
`kill -INT <server-pid>` while a loop is running; confirm the agent process is terminated and the history event is marked failed
fires (terminal bell / macOS banner / log line)