richsak/webster

Webster

A council of 9 Claude Managed Agents that autonomously audits a small-business landing page every week, synthesizes the findings, and opens a PR with the proposed redesign.

Built with Opus 4.7 — Anthropic × Cerebral Valley Hackathon submission (deadline 2026-04-26).

The one-line pitch

A council of 9 Claude Managed Agents audits a landing page once a week, synthesizes findings across SEO, brand-voice, compliance, conversion, copy, and rendered-layout lenses, and hands the operator a reviewable draft PR. The win is cycle time — the analytical loop runs in tens of minutes instead of multi-week agency rounds — and a runtime mechanism (Critic Genealogy) where Opus 4.7 detects an unowned audit gap and registers a brand-new specialist agent against the live API mid-run. A human still reviews the PR before it ships.

The hero moment — Critic Genealogy

weekly trigger
   │
   ▼
Planner ──► SEO / brand / compliance / conversion / copy / monitor / visual-reviewer
   │                         │
   │                         └── out-of-scope overlap found
   ▼
Critic Genealogy (Opus 4.7) ──► registers new specialist ──► reruns on council branch
   │
   ▼
Redesigner ──► draft PR for human review

The 5 pre-registered critics (SEO, brand-voice, functional-health-compliance, conversion, copy) each cover one scope. They commit findings to a shared branch and flag cross-cutting issues in their Out of scope sections.

When >=2 critics flag the same scope as unowned, scripts/critic-genealogy.ts asks Opus 4.7 to (a) author a new critic spec, (b) register it live via POST /v1/agents, and (c) invoke it on the same council branch — all at runtime, no human in the loop.
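The ">=2 critics flag the same scope" rule can be illustrated with a small sketch. It is not the logic in scripts/critic-genealogy.ts, which hands the judgment to Opus 4.7; it only shows the threshold idea. The `OutOfScopeFlags` shape, the `findUnownedScopes` name, and the scope identifiers are assumptions made for this example.

```typescript
// Hypothetical shape: one entry per critic, listing the scopes it tagged
// in its "Out of scope" section. The real findings format may differ.
interface OutOfScopeFlags {
  critic: string;
  flaggedScopes: string[];
}

// Scopes that already have an owner (identifiers assumed for illustration).
const OWNED_SCOPES = new Set<string>([
  "seo",
  "brand-voice",
  "functional-health-compliance",
  "conversion",
  "copy",
]);

// Returns scopes that no critic owns and that at least `threshold`
// distinct critics flagged, sorted alphabetically.
function findUnownedScopes(flags: OutOfScopeFlags[], threshold = 2): string[] {
  const votes = new Map<string, Set<string>>();
  for (const { critic, flaggedScopes } of flags) {
    for (const scope of flaggedScopes) {
      const key = scope.trim().toLowerCase();
      if (OWNED_SCOPES.has(key)) continue; // someone already covers it
      const voters = votes.get(key) ?? new Set<string>();
      voters.add(critic); // a Set, so one critic counts once per scope
      votes.set(key, voters);
    }
  }
  return Array.from(votes.entries())
    .filter(([, voters]) => voters.size >= threshold)
    .map(([scope]) => scope)
    .sort();
}
```

A deterministic pre-filter like this could decide whether the model call is worth making at all; authoring the new critic spec would still be left to the model.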

Proof it works: fixture dry-run against the 5 committed critics' findings produces a schema-valid accessibility-critic spec with 7 focus bullets, 6 cross-tagged out-of-scope bullets, and a WCAG-tuned severity rubric. Live Opus call, ~$0.03, ~15s:

bun scripts/critic-genealogy.ts --fixtures scripts/__tests__/fixtures/genealogy --dry-run

Architecture

            weekly trigger (cron, manual, or wbs prompt)
                              │
                              ▼
          Claude Code orchestrator session (Opus 4.7)
                              │
                              ▼
                      Planner (Opus 4.7)
                 memory + verdicts → plan.md
                              │
               fans out 7 sessions in parallel ─────────────┐
                                                            │
  ┌───────┬──────────┬──────────┬────────┬────────┬─────────┴─┬───────────────┐
  │  SEO  │  brand   │ FH-compl │  CRO   │  copy  │  monitor  │ visual-review │
  │ Sonnet│ Sonnet   │  Sonnet  │ Sonnet │ Sonnet │  Haiku    │   Opus 4.7    │
  │  4.6  │   4.6    │   4.6    │  4.6   │  4.6   │   4.5     │    (post)     │
  └───┬───┴────┬─────┴─────┬────┴────┬───┴────┬───┴─────┬─────┴──────┬────────┘
      │        │           │         │        │         │            │
      └────────┴─── each critic commits via GitHub MCP ──┴────────────┘
                  to council/<week-date> branch          │
                                                         ▼
                     Critic Genealogy (Opus 4.7, runtime)
                     gap? → new critic spec → POST /v1/agents
                          → POST /v1/sessions → commits findings
                                                         │
                                                         ▼
                              Redesigner (Opus 4.7)
                              reads all findings on branch
                              commits proposal.md + decision.json
                                                         │
                                                         ▼
                              Draft PR opened
                              human merges → Cloudflare redeploys

Why this composition wins: Managed Agents give each critic a pre-registered scope, MCP tools, and vault credentials. Runtime agent registration via POST /v1/agents lets the orchestrator spawn specialists mid-run, a capability that callable_agents (research preview) would restrict by gating. Full detail is in context/ARCHITECTURE.md.
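As a rough illustration of the runtime registration step, the sketch below only builds the HTTP request for POST /v1/agents and does not send it. The `AgentSpec` shape and the `buildRegisterAgentRequest` helper are hypothetical. The `x-api-key` and `anthropic-version` headers follow the general Anthropic API convention; any beta header the Managed Agents endpoints may require is left out here and should be checked against the API documentation.

```typescript
// Hypothetical spec shape: only `name` is assumed, everything else is opaque.
interface AgentSpec {
  name: string;
  [key: string]: unknown;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a registration request for a new agent spec.
function buildRegisterAgentRequest(
  spec: AgentSpec,
  apiKey: string,
  baseUrl = "https://api.anthropic.com",
): HttpRequest {
  if (apiKey.length === 0) throw new Error("missing API key");
  return {
    url: `${baseUrl}/v1/agents`, // endpoint named in this README
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      // Assumption: a beta header may also be needed; omitted in this sketch.
    },
    body: JSON.stringify(spec),
  };
}
```

Separating request construction from sending keeps the step testable offline. A caller would pass the result to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.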

Honest scope note: the site/ fork of the demo substrate LP and the Claude Code Routine cron wiring are manual in this submission — the composition does not depend on either. The redesigner currently emits proposal.md (the PR body) instead of proposal.diff; diff mode becomes a one-file change once the site source lands.

For judges

30-second pitch: Webster is an autonomous landing-page improvement council. Nine Claude Managed Agents plan, audit, monitor, synthesize, and package one weekly redesign proposal; the standout demo is Critic Genealogy, where Opus 4.7 detects an unowned audit gap and registers a new specialist at runtime.

Live-run evidence: the operator surface is the /webster-weekly-council skill (library: SKILL.md index + on-demand phase references + helper scripts); the full single-page runbook lives at prompts/second-wbs-session.md. Registration IDs live in environments/webster-council-env.id and context/*/id.txt. Run artifacts are written under history/<week>/ when the weekly run executes.

Demo arc artifacts: an 11-week simulated council run, browsable week by week as files. Start at demo-output/landing-page/INDEX.md for the narrated walk-through. Each week directory under demo-output/landing-page/w00..w10/ contains desktop/mobile/tablet screenshots, heatmap JSON+SVG, synthetic analytics, and the visual reviewer's markdown verdict. Anthropic Managed Agents memory-store provisioning is captured at assets/memory-stores-screenshots/. The render pipeline that turns these per-week assets into a timelapse video is submission tooling and lives outside the public repo.

Hero code: scripts/critic-genealogy.ts is the runtime specialist-spawn path; scripts/__tests__/critic-genealogy.test.ts and scripts/__tests__/fixtures/genealogy are the fixture proof.

Validate locally: run bun install once, then bun run validate for type-check, zero-warning lint, format, agent schemas, findings format, markdown, and tests.

5-minute judge tour

If you're evaluating this submission and have five minutes:

  1. Read the 30-second pitch + hero moment above (you're here) — that's the architecture and the novel-mechanic claim in one screen.
  2. Open demo-output/landing-page/INDEX.md, a narrated walk-through of the 11-week LP timelapse: one paragraph per week, with links to that week's screenshots, heatmap, and visual-reviewer verdict.
  3. Click into one week's visual-review.md (e.g. w04/visual-review.md for the largest beat, w10/visual-review.md for the terminal polish) — that's what the council actually wrote about its own changes.
  4. Read scripts/critic-genealogy.ts — the hero file. Two tools (report_no_gap / report_gap), Opus 4.7 picks one, then drafts a JSON spec, registers it via POST /v1/agents, and invokes it via POST /v1/sessions — all at runtime.
  5. Optional, if a terminal is handy: bun install && bun scripts/critic-genealogy.ts --fixtures scripts/__tests__/fixtures/genealogy --dry-run. Live Opus 4.7 call against the committed fixture findings, ~15s wall clock, prints the new critic spec it would have registered.
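The two-tool decision in step 4 can be sketched as follows. The tool names report_no_gap and report_gap come from this README, and the `name` / `description` / `input_schema` layout follows the Anthropic Messages API tool format. The input fields (`reason`, `scope`, `rationale`) and the `interpretToolUse` helper are assumptions for illustration and may not match scripts/critic-genealogy.ts.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required: string[] };
}

// Two mutually exclusive tools: the model must pick exactly one.
const GAP_TOOLS: ToolDefinition[] = [
  {
    name: "report_no_gap",
    description: "Every flagged scope is already owned by an existing critic.",
    input_schema: { type: "object", properties: { reason: { type: "string" } }, required: ["reason"] },
  },
  {
    name: "report_gap",
    description: "At least two critics flagged a scope that no critic owns.",
    input_schema: {
      type: "object",
      properties: { scope: { type: "string" }, rationale: { type: "string" } },
      required: ["scope", "rationale"],
    },
  },
];

// Simplified tool_use content block (the API also includes an `id` field).
interface ToolUseBlock {
  type: "tool_use";
  name: string;
  input: Record<string, unknown>;
}

type GapDecision =
  | { gap: false; reason: string }
  | { gap: true; scope: string; rationale: string };

// Turns the model's tool call into a typed decision, rejecting anything unexpected.
function interpretToolUse(block: ToolUseBlock): GapDecision {
  switch (block.name) {
    case "report_no_gap":
      return { gap: false, reason: String(block.input.reason ?? "") };
    case "report_gap": {
      const scope = block.input.scope;
      if (typeof scope !== "string" || scope.length === 0) {
        throw new Error("report_gap requires a non-empty scope");
      }
      return { gap: true, scope, rationale: String(block.input.rationale ?? "") };
    }
    default:
      throw new Error(`unexpected tool: ${block.name}`);
  }
}
```

Forcing the choice through tools, not free text, gives the caller a structured branch point: only a `gap: true` decision would proceed to spec drafting and registration.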

agents/production/ holds the 9 pre-registered specs; agents/simulation/ holds the 1:1 simulation mirror used for the timelapse run. prompts/second-wbs-session.md is the production weekly orchestrator (locked); skills/webster-weekly-council/SKILL.md is the same flow as a Claude Code skill.

What's in the repo

webster/
├── agents/
│   ├── production/      9 Managed Agent specs that run Nicolette's live council
│   └── simulation/      9 LP-sim specs (1:1 mirror) that drive the timelapse demo
├── context/             architecture, features, quality gates, per-critic findings dirs
├── environments/        webster-council-env.json (single Anthropic environment)
├── prompts/             first-wbs-session.md (bootstrap), second-wbs-session.md (weekly run runbook)
├── scripts/             validate-agents, validate-findings, critic-genealogy
├── skills/              webster-weekly-council (operator surface for the weekly run),
│                        webster-onboarding (first-time setup for a new operator),
│                        webster-lp-audit (shared critic discipline),
│                        webster-browser-audit (Playwright-headless audit capability)
├── .github/workflows/   CI: type + lint + format + schema + findings + markdown + tests
├── .husky/              pre-commit runs the same gates locally
└── AGENTS.md            operator guide for in-repo work

The weekly flow

The live council runner is a Claude Code library skill, /webster-weekly-council: a slim SKILL.md index, on-demand phase references under references/, and reusable helper scripts under scripts/. The single-page bash-in-markdown runbook at prompts/second-wbs-session.md is the same flow as one scrollable page. Both produce identical artifacts. The flow:

  1. Seeds 10 weeks of mock analytics on first run (monitor needs baselines to diff).
  2. Prepares a shared council/YYYY-MM-DD branch.
  3. Runs the planner — marshals history/memory.jsonl, recent verdicts, and monitor anomalies; writes history/YYYY-MM-DD/plan.md.
  4. Fans out 7 Managed Agent sessions (monitor + 5 critics + visual-reviewer) — each commits context/critics/<scope>/findings.md via GitHub MCP.
  5. Validates findings via bun scripts/validate-findings.ts.
  6. Runs the redesigner — commits history/YYYY-MM-DD/proposal.md + decision.json.
  7. Opens a draft PR.
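The branch and artifact paths named in the steps above all derive from one run date. A minimal sketch of that naming scheme follows; `councilRunPaths` and `findingsPath` are hypothetical helpers, not functions from the repo, and the UTC date is an assumption based on the `date -u` usage elsewhere in this README.

```typescript
interface RunPaths {
  branch: string;
  plan: string;
  proposal: string;
  decision: string;
}

// Derives the council branch and history/ artifact paths for a run date.
// toISOString() is always UTC, so the first 10 characters are YYYY-MM-DD in UTC.
function councilRunPaths(now: Date): RunPaths {
  const day = now.toISOString().slice(0, 10);
  const dir = `history/${day}`;
  return {
    branch: `council/${day}`,
    plan: `${dir}/plan.md`,
    proposal: `${dir}/proposal.md`,
    decision: `${dir}/decision.json`,
  };
}

// Path each critic commits its findings to, per step 4.
function findingsPath(scope: string): string {
  return `context/critics/${scope}/findings.md`;
}
```

For example, `councilRunPaths(new Date("2026-04-20T23:30:00Z")).branch` evaluates to `council/2026-04-20`.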

Wall-clock per run is in the tens of minutes; the bulk of that is the parallel critic fan-out, not orchestration overhead.

Submission note: all 9 agent specs are registered against the live Anthropic API (IDs in environments/webster-council-env.id + context/*/id.txt), the genealogy hero is live-validated (~$0.03 Opus 4.7 dry-run documented above), and the full orchestration prompt is committed. The end-to-end fan-out that produces history/YYYY-MM-DD/ artifacts is the operator-triggered weekly run, so history/ is empty at submission time by design. The loop has been exercised component by component.

Quality gates

Strict validation discipline. One command:

bun run validate

Chains: tsc --noEmit → eslint --max-warnings 0 → prettier --check → agent+environment schema validation → findings format validation → markdownlint → bun test. Every gate is blocking. Pre-commit hook enforces the same set. CI enforces the same set on push + PR. See context/QUALITY-GATES.md.
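"Every gate is blocking" amounts to running the gates in order and stopping at the first non-zero exit code. The sketch below shows that pattern with an injected runner so it can be exercised without spawning processes. The `Gate` and `Runner` types and the `runGates` function are illustrative, and the gate list includes only commands quoted in this README; the actual validate script in package.json may be wired differently.

```typescript
interface Gate {
  name: string;
  command: string;
}

// A subset of the chain, using only commands quoted in this README.
const GATES: Gate[] = [
  { name: "types", command: "tsc --noEmit" },
  { name: "lint", command: "eslint --max-warnings 0" },
  { name: "format", command: "prettier --check" },
  { name: "findings", command: "bun scripts/validate-findings.ts" },
  { name: "tests", command: "bun test" },
];

// Runs a command and returns its exit code. Injected so tests can fake it.
type Runner = (command: string) => number;

interface GateResult {
  passed: string[];
  failed: string | null;
}

// Runs gates in order; the first non-zero exit code blocks everything after it.
function runGates(gates: Gate[], run: Runner): GateResult {
  const passed: string[] = [];
  for (const gate of gates) {
    if (run(gate.command) !== 0) {
      return { passed, failed: gate.name };
    }
    passed.push(gate.name);
  }
  return { passed, failed: null };
}
```

With a runner that fails only the eslint command, `runGates(GATES, run)` reports `failed: "lint"` and never reaches the later gates.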

Current state: 29 test files green via bun run validate, 0 lint warnings, 0 type errors, 18 JSON specs valid, 6 findings files valid.

Prize-lane alignment

  • Best Use of Claude Managed Agents — 9 pre-registered production agents (with a 1:1 sim mirror in agents/simulation/) + runtime-registered genealogy critics, all invoked via /v1/sessions with vault-bound GitHub MCP (no tokens in user.message).
  • Creative Exploration — runtime critic genealogy. Gap detection → template-cloned spec → live POST /v1/agents → immediate invocation. The emergent-capability demo beat.

Running it yourself

Prerequisites

  • bun >= 1.3.0
  • jq (for bash scripts inside the prompts)
  • gh CLI (authenticated to the target repo)
  • git with commit-signing configured
  • An Anthropic API key stored in the macOS keychain under service anthropic-webster. If it is missing, the first session prints the exact security add-generic-password command.

The wbs alias (project convention)

The wbs @prompts/... commands below assume a shell alias that launches Claude Code into Webster's dispatcher mode (Opus 4.7, 1M context, custom system prompt at .claude/dispatcher.md, custom settings at .claude/dispatcher-settings.json). Add to your shell rc:

alias wbs='cd ~/Projects/webster && claude --dangerously-skip-permissions --model claude-opus-4-7 \
  --settings .claude/dispatcher-settings.json \
  --system-prompt "$(cat .claude/dispatcher.md)"'

Or run the equivalent claude --settings ... --system-prompt ... directly without aliasing. Either works.

Bootstrap (one-time)

wbs @prompts/first-wbs-session.md

Registers the single environment + 9 production agents against the Anthropic API. Runs an SEO hello-world to prove the council loop end-to-end. Artifacts: environments/webster-council-env.id + context/{monitor,redesigner,critics/*}/id.txt.

Weekly council run

In Claude Code (primary):

/webster-weekly-council

Or as a single-page prompt (fallback):

wbs @prompts/second-wbs-session.md

Both run the full planner + fan-out + redesigner + draft PR described above. The skill loads phase references on demand (smaller per-turn context budget); the prompt is one readable file.

Spawn a genealogy critic manually

bun scripts/critic-genealogy.ts --branch council/$(date -u +%Y-%m-%d)

Reads the week's findings, asks Opus 4.7 whether any scope is unowned, and, if so, authors, registers, and invokes a new critic. Use --fixtures scripts/__tests__/fixtures/genealogy --dry-run to see the flow without making API writes.

Meta-attribution

Every layer uses Opus 4.7 as author:

| Layer | Opus 4.7 role |
| --- | --- |
| 9 agent specs (agents/production/*.json) | Drafted during bootstrap session, validated against live API |
| Bootstrap + weekly prompts | Opus-authored during dispatcher sessions; in git history |
| Critic Genealogy script | Opus-authored; see dcf5726 + e474301 |
| Redesigner synthesis | Opus 4.7 at runtime; its decision.json outputs live in history/<date>/ |
| Runtime critic spawning | Opus 4.7 selects the gap AND authors the new spec via tool_use |

Repo is entirely MIT. No Anthropic or third-party proprietary code.

License

MIT. See LICENSE.
