hookdeck · alexbouchardd · Apr 12, 2026 · Apr 7, 2026 · Apr 8, 2026 · Apr 8, 2026
diff --git a/.cursor/rules/agent-evaluation-authoring.mdc b/.cursor/rules/agent-evaluation-authoring.mdc
@@ -0,0 +1,14 @@
+---
+description: Authoring standards for docs/agent-evaluation (no eval leakage in user turns)
+globs: docs/agent-evaluation/**/*
+---
+
+When editing anything under `docs/agent-evaluation/`, read and follow **`docs/agent-evaluation/AGENTS.md`**.
+
+**Quick guardrails for `scenarios/*.md`:**
+
+- **`### Turn N — User`** blockquotes = in-character **product engineer** speech only.
+- **Never** in user lines: `Option 1/2/3`, `Turn 0`, `scenario`, `eval`, `success criteria`, `scoreScenario`, references to “the prompt/instructions you already have” or named template sections.
+- Put rubric detail in **`## Success criteria`** / **Intent** / **Failure modes**, not in the user quote.
+
+Full checklist and rationale: **`docs/agent-evaluation/AGENTS.md`**.
diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
@@ -0,0 +1,70 @@
+# Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
+# Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
+# Triggers: workflow_dispatch, or push (main) / pull_request when docs / OpenAPI / agent-eval / TS SDK paths change.
+# Each run incurs Anthropic API charges (agent + judge).
+# Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
+# (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
+# See docs/agent-evaluation/README.md § CI (recommended slice).
+name: Docs agent eval (CI slice)
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+    paths:
+      - "docs/content/**"
+      - "docs/apis/**"
+      - "docs/agent-evaluation/**"
+      - "docs/README.md"
+      - "docs/AGENTS.md"
+      - "sdks/outpost-typescript/**"
+      - ".github/workflows/docs-agent-eval-ci.yml"
+  pull_request:
+    paths:
+      - "docs/content/**"
+      - "docs/apis/**"
+      - "docs/agent-evaluation/**"
+      - "docs/README.md"
+      - "docs/AGENTS.md"
+      - "sdks/outpost-typescript/**"
+      - ".github/workflows/docs-agent-eval-ci.yml"
+
+jobs:
+  eval-ci:
+    # Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
+    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    defaults:
+      run:
+        working-directory: docs/agent-evaluation
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+          cache: npm
+          cache-dependency-path: docs/agent-evaluation/package-lock.json
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Run eval CI slice (scenarios 01, 02)
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          EVAL_TEST_DESTINATION_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+          EVAL_LOCAL_DOCS: "1"
+        run: ./scripts/ci-eval.sh
+
+      - name: Execute generated curl + TypeScript artifacts (live Outpost)
+        env:
+          OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
+          OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+          OUTPOST_API_BASE_URL: https://api.outpost.hookdeck.com/2025-07-01
+          OUTPOST_CI_PUBLISH_TOPIC: user.created
+        run: ./scripts/execute-ci-artifacts.sh
diff --git a/.gitignore b/.gitignore
@@ -1,10 +1,15 @@
 # Environment variables
 .env
+.env.ci
 .outpost.yaml
 
 # Built binaries
 /dist
 /bin
+
+# Documentation (local build artifacts; content lives under docs/content/)
+/docs/dist/
+/docs/TEMP-*.md
 /tmp
 
 # Golang test coverage

diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,5 @@
+# Coding agent notes (Outpost)
+
+When you change files under **`docs/agent-evaluation/`** (scenarios, scoring, harness docs), read and apply **[`docs/agent-evaluation/AGENTS.md`](docs/agent-evaluation/AGENTS.md)** first. It defines anti–“teach to the test” rules for user-turn wording and scenario structure.
+
+For this repo’s PR review format, see **`CLAUDE.md`**.
diff --git a/README.md b/README.md
@@ -53,7 +53,7 @@ Outpost is built and maintained by [Hookdeck](https://hookdeck.com?ref=github-ou
 
 ![Outpost architecture](docs/public/images/architecture.png)
 
-Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn more about the Outpost architecture and design.
+Read [Outpost Concepts](https://hookdeck.com/docs/outpost/concepts) to learn more about the Outpost architecture and design.
 
 ## Features
 
@@ -70,17 +70,17 @@ Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn mor
 - **Webhook best practices**: Opt-out webhook best practices, such as headers for idempotency, timestamp and signature, and signature rotation.
 - **SDKs and MCP server**: Go, Python, and TypeScript SDK are available. Outpost also ships with an MCP server. All generated by [Speakeasy](https://speakeasy.com).
 
-See the [Outpost Features](https://outpost.hookdeck.com/docs/features) for more information.
+See the [Outpost Features](https://hookdeck.com/docs/outpost/features) for more information.
 
 ## Documentation
 
-- [Overview](https://outpost.hookdeck.com/docs/overview)
-- [Concepts](https://outpost.hookdeck.com/docs/concepts)
-- [Quickstarts](https://outpost.hookdeck.com/docs/quickstarts)
-- [Features](https://outpost.hookdeck.com/docs/features)
-- [Guides](https://outpost.hookdeck.com/docs/guides)
-- [API Reference](https://outpost.hookdeck.com/docs/api)
-- [Configuration Reference](https://outpost.hookdeck.com/docs/references/configuration)
+- [Overview](https://hookdeck.com/docs/outpost/overview)
+- [Concepts](https://hookdeck.com/docs/outpost/concepts)
+- [Quickstarts](https://hookdeck.com/docs/outpost/quickstarts)
+- [Features](https://hookdeck.com/docs/outpost/features)
+- [Guides](https://hookdeck.com/docs/outpost/guides)
+- [API Reference](https://hookdeck.com/docs/outpost/api)
+- [Configuration Reference](https://hookdeck.com/docs/outpost/self-hosting/configuration)
 
 _The Outpost documentation is built using the [Zudoku documentation framework](https://zuplo.link/outpost)._
 
@@ -144,7 +144,7 @@ For other cloud Redis services or self-hosted Redis clusters, set `REDIS_CLUSTER
 ```sh
 go run cmd/redis-debug/main.go your-redis-host 6379 password 0 [tls] [cluster]
 ```
-See the [Redis Troubleshooting Guide](https://docs.outpost.hookdeck.com/references/troubleshooting-redis) for detailed guidance.
+See the [Redis Troubleshooting Guide](https://hookdeck.com/docs/outpost/self-hosting/guides/troubleshooting-redis) for detailed guidance.
 
 Start the Outpost dependencies and services:
 
@@ -241,7 +241,7 @@ Open the `redirect_url` link to view the Outpost portal.
 
 ![Dashboard homepage](docs/public/images/dashboard-homepage.png)
 
-Continue to use the [Outpost API](https://outpost.hookdeck.com/docs/api) or the Outpost portal to add and test more destinations.
+Continue to use the [Outpost API](https://hookdeck.com/docs/outpost/api) or the Outpost portal to add and test more destinations.
 
 ## Contributing
 

diff --git a/build/entrypoint.sh b/build/entrypoint.sh
@@ -23,7 +23,7 @@ if ! /usr/local/bin/outpost migrate init --current --log-format=json; then
     echo "  docker run --rm hookdeck/outpost migrate --help"
     echo ""
     echo "Learn more about Outpost migration workflow at:"
-    echo "  https://outpost.hookdeck.com/docs/guides/migration"
+    echo "  https://hookdeck.com/docs/outpost/self-hosting/guides/migration"
     echo ""
     exit 1
 fi

diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
@@ -0,0 +1,37 @@
+# Copy to .env and fill in. .env is gitignored at the repo root.
+
+# Required for npm run eval (Claude Agent SDK — calls Anthropic only)
+ANTHROPIC_API_KEY=
+
+# Required for Turn 0 template (test webhook URL injected into the prompt)
+EVAL_TEST_DESTINATION_URL=
+
+# Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project.
+# The harness does not read this key; you (or a future verifier) use it after the run.
+# OUTPOST_API_KEY=   # required for ./scripts/execute-ci-artifacts.sh after eval:ci, and for the GitHub Actions CI execution step
+# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
+# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id   # often same as EVAL_TEST_DESTINATION_URL
+# OUTPOST_CI_PUBLISH_TOPIC=user.created   # optional; publish topic for npm run smoke:execute-ci (must exist in project)
+
+# Optional (see npm run eval -- --help)
+# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
+# EVAL_TOPICS_LIST=- user.created
+# EVAL_DOCS_URL=https://hookdeck.com/docs/outpost
+# EVAL_LOCAL_DOCS=1
+# EVAL_LLMS_FULL_URL=
+# Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow:
+# EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash
+# EVAL_MODEL=
+# EVAL_MAX_TURNS=40
+# Long runs (scenarios 08–10): periodic stderr heartbeats while each agent query is in flight
+# EVAL_PROGRESS=1
+# EVAL_PROGRESS_INTERVAL_MS=30000
+# EVAL_PERMISSION_MODE=dontAsk
+# EVAL_PERSIST_SESSION=true
+# Debug only: allow Write/Edit outside the per-run workspace (not recommended)
+# EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1
+
+# Scoring is ON by default after each scenario (heuristic + LLM). Opt out:
+# EVAL_NO_SCORE_HEURISTIC=1
+# EVAL_NO_SCORE_LLM=1
+# EVAL_SCORE_MODEL=claude-sonnet-4-20250514
diff --git a/docs/agent-evaluation/AGENTS.md b/docs/agent-evaluation/AGENTS.md
@@ -0,0 +1,46 @@
+# Agent evaluation — authoring rules for humans & coding agents
+
+This file applies to **everything under `docs/agent-evaluation/`** (scenarios, README, tracker, harness TypeScript). Follow it when adding or editing eval specs so we do not **teach to the test** or confuse **evaluator docs** with **in-character user speech**.
+
+## Who reads what
+
+| Audience | Content |
+|----------|---------|
+| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
+| **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |
+
+**Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.
+
+## Anti-leakage rules (user turns)
+
+In **`### Turn N — User`** blockquotes, **do not** use:
+
+- **Option 1 / 2 / 3** (those labels exist only inside the dashboard template; a real user says what they want in plain language).
+- **Turn 0**, **Turn 1**, or any **turn** numbering (that is script metadata).
+- Phrases like **“the instructions you already have”**, **“the full-stack section of the prompt”**, **“follow the Hookdeck Outpost template”** as a stand-in for requirements (the model already has Turn 0; state the *product ask*, not a pointer to a doc section).
+- **“Match the prompt”**, **“dashboard prompt”**, **“eval”**, **“scenario”**, **“success criteria”**, **heuristic names**, **`scoreScenarioNN`**.
+
+**Do** use natural operator language: stack, repo, product behavior, security (key on server), domain topics, README/env, Hookdeck project/topics **as the customer would say them**.
+
+It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `scoreScenarioNN`, Turn 0, Option 3, etc. — those sections are not pasted as the user.
+
+## Alignment without parroting
+
+- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdoc`.
+- **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”), not **cite** where in the template that is spelled out.
+
+If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.
+
+## Pre-merge checklist (scenarios)
+
+Before merging changes to `scenarios/*.md`:
+
+- [ ] Every **`> ...` user** line reads like a **real customer** message (read-aloud test).
+- [ ] No **Option N** / **Turn 0** / **scenario** / **prompt section** leakage in user blockquotes.
+- [ ] **Success criteria** still state the full bar; no requirement has been removed from the criteria and kept only in user text.
+- [ ] If integration depth or rubrics changed, **`src/score-transcript.ts`** and the **README** scenario table are updated to match.
+
+## Where Cursor loads this
+
+- A **repo-root** [`AGENTS.md`](../../AGENTS.md) points here so agents see this folder’s rules.
+- [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) applies when editing paths under `docs/agent-evaluation/`.