From 1638fe15c58c8fbaa42d0d4b22b90101787e2466 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Tue, 7 Apr 2026 13:03:41 +0100
Subject: [PATCH 01/47] docs: add Hookdeck Outpost managed quickstarts and
 agent prompt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add self-contained quickstarts for curl, TypeScript, Python, and Go
against the managed API, with Settings → Secrets, env-based examples,
and verification via Hookdeck Console and project logs.

Nest Quickstarts nav under Hookdeck Outpost (above Self-Hosted) and
add an agent prompt template page for dashboard copy/paste.

Include TEMP-hookdeck-outpost-onboarding-status.md for GA tracking.

Made-with: Cursor
---
 ...TEMP-hookdeck-outpost-onboarding-status.md |  29 ++++
 docs/pages/quickstarts.mdx                    |  14 +-
 .../hookdeck-outpost-agent-prompt.mdx         |  68 ++++++++
 .../quickstarts/hookdeck-outpost-curl.mdx     |  96 +++++++++++
 .../pages/quickstarts/hookdeck-outpost-go.mdx | 163 ++++++++++++++++++
 .../quickstarts/hookdeck-outpost-python.mdx   | 134 ++++++++++++++
 .../hookdeck-outpost-typescript.mdx           | 135 +++++++++++++++
 docs/zudoku.config.ts                         |  63 +++++--
 8 files changed, 690 insertions(+), 12 deletions(-)
 create mode 100644 docs/TEMP-hookdeck-outpost-onboarding-status.md
 create mode 100644 docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
 create mode 100644 docs/pages/quickstarts/hookdeck-outpost-curl.mdx
 create mode 100644 docs/pages/quickstarts/hookdeck-outpost-go.mdx
 create mode 100644 docs/pages/quickstarts/hookdeck-outpost-python.mdx
 create mode 100644 docs/pages/quickstarts/hookdeck-outpost-typescript.mdx

diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
new file mode 100644
index 000000000..9faa176f0
--- /dev/null
+++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md
@@ -0,0 +1,29 @@
+# Hookdeck Outpost onboarding — status (temporary)
+
+**Purpose:** Track implementation status for the managed quickstarts, agent prompt, and related work. **Delete this file** when tracking moves elsewhere (e.g. Linear, parent epic).
+
+**Last updated:** 2026-04-07
+
+---
+
+## Done (Outpost OSS repo)
+
+- Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx`
+- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx`
+- Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**)
+- `quickstarts.mdx` index: managed vs self-hosted links
+- Content aligned with product copy: API key from **Settings → Secrets**, standard markdown (no `:::tip`), verify via Hookdeck Console + project logs
+- SDK examples: env vars section, numbered quickstart scripts with step comments
+
+## Pending / follow-up
+
+- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm all doc links resolve on production docs URL
+- **Test destination URL:** When `console.hookdeck.com` (or equivalent) has a stable public URL format, update quickstarts if it replaces “create a Console Source” instructions
+- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection (`{{API_BASE_URL}}`, `{{TOPICS_LIST}}`, `{{TEST_DESTINATION_URL}}`, `{{DOCS_URL}}`, optional `{{LLMS_FULL_URL}}`); env var UI for `OUTPOST_API_KEY` (not in prompt body)
+- **Hookdeck Astro site:** Consume MDX, `llms.txt` / `llms-full.txt` / `.md` exports, canonical `DOCS_URL` (e.g. `https://hookdeck.com/outpost/docs`)
+- **Deferred (not blocking GA):** Broader docs IA (“Self-Hosted” under Guides, redirects for moved pages) per original plan
+
+## References
+
+- OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`)
+- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
\ No newline at end of file
diff --git a/docs/pages/quickstarts.mdx b/docs/pages/quickstarts.mdx
index e5a74ee7e..13f6aaa5b 100644
--- a/docs/pages/quickstarts.mdx
+++ b/docs/pages/quickstarts.mdx
@@ -2,7 +2,19 @@
 title: "Outpost Quickstarts"
 ---
 
-Get started with Outpost by following one of the quickstarts:
+## Hookdeck Outpost (managed)
+
+Use Hookdeck’s hosted Outpost API with your dashboard API key and preconfigured topics:
+
+- [curl](/docs/quickstarts/hookdeck-outpost-curl)
+- [TypeScript](/docs/quickstarts/hookdeck-outpost-typescript)
+- [Python](/docs/quickstarts/hookdeck-outpost-python)
+- [Go](/docs/quickstarts/hookdeck-outpost-go)
+- [Agent prompt template](/docs/quickstarts/hookdeck-outpost-agent-prompt) (for AI-assisted integration)
+
+## Self-hosted
+
+Run Outpost in your own infrastructure:
 
 - [Docker with RabbitMQ or AWS SQS via LocalStack](/docs/quickstarts/docker)
 - [Kubernetes with RabbitMQ](/docs/quickstarts/kubernetes)
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
new file mode 100644
index 000000000..1f2a3a394
--- /dev/null
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -0,0 +1,68 @@
+---
+title: "Hookdeck Outpost — agent prompt template"
+description: "Copy-paste template for AI coding agents. Dashboard teams should inject the placeholders server-side or client-side."
+---
+
+This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). **Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately. API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs).
+
+## Template
+
+```
+## Hookdeck Outpost integration
+
+You are helping integrate Hookdeck Outpost into a platform to deliver events (webhooks and event destinations) to the platform's customers.
+
+### Credentials
+
+- API base URL: {{API_BASE_URL}}
+- API key (Outpost API key from the project **Settings → Secrets**): read from the `OUTPOST_API_KEY` environment variable (never ask the user to paste the key into chat)
+
+### Configured topics
+
+{{TOPICS_LIST}}
+
+### Test destination
+
+Use this URL to verify event delivery (webhook destination): {{TEST_DESTINATION_URL}}
+
+### Documentation
+
+- Getting started (curl): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl
+- TypeScript quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript
+- Python quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-python
+- Go quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
+- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
+- API reference: {{DOCS_URL}}/api
+- Destination types: {{DOCS_URL}}/destinations
+- SDK documentation: {{DOCS_URL}}/sdks
+
+### What to do
+
+Ask the user which of the following they want:
+
+1. **Try it out** — Create a minimal script that runs through the full flow: create a tenant, add a webhook destination, publish a test event. Ask which language they prefer (TypeScript, Python, Go, or curl) and follow the matching quickstart doc.
+
+2. **Build a minimal example** — Scaffold a small app with a simple UI that demonstrates tenant creation, destination management, and event publishing. Ask which framework they prefer.
+
+3. **Integrate with an existing app** — Inspect the codebase for language and framework, then integrate Outpost: add the SDK (or use REST), create tenants when customers onboard, and publish events at the right points in application logic.
+
+For all modes, read the relevant quickstart documentation before writing code.
+
+**Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+```
+
+## Placeholder reference
+
+| Placeholder | Example | Notes |
+|-------------|---------|--------|
+| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
+| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config |
+| `{{TEST_DESTINATION_URL}}` | Unique URL from Hookdeck Console Source, or operator’s test endpoint | May be TBC until `console.hookdeck.com` flow is finalized |
+| `{{DOCS_URL}}` | `https://hookdeck.com/outpost/docs` | Public docs root (no trailing slash) |
+| `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
+
+## Operator checklist (dashboard UI)
+
+- Show **API base URL** and **topics** next to the copyable prompt.
+- Explain that the **API key** is the Outpost API key from **Settings → Secrets**, and show **environment variables**: `OUTPOST_API_KEY` (value with copy button), optional `OUTPOST_API_BASE_URL`, and `OUTPOST_TEST_WEBHOOK_URL` when the quickstart examples need a test webhook URL.
+- Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history.
diff --git a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx
new file mode 100644
index 000000000..c7614b614
--- /dev/null
+++ b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx
@@ -0,0 +1,96 @@
+---
+title: "Hookdeck Outpost Quickstart: curl"
+---
+
+[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service: a control plane and delivery layer for event destinations (webhooks, queues, and more) scoped per **tenant**—each tenant is one of your platform’s customers.
+
+This quickstart uses the REST API with `curl`. Topics are assumed to be configured already in the Hookdeck dashboard; use a topic name that exists there when you publish.
+
+## Prerequisites
+
+- A Hookdeck account with an Outpost project
+- An **API key** (Outpost API key) from your project: **Settings → Secrets**
+- **Topics** already configured in the dashboard (for example `user.created`, `order.completed`)
+- API base URL: `https://api.outpost.hookdeck.com/2025-07-01`
+
+## Set up credentials
+
+In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. That value is the same Outpost API key you use for the REST API and the SDKs.
+
+Store the API key and base URL in your shell (or in a `.env` file you `source`):
+
+```sh
+export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"
+export OUTPOST_API_KEY="your_api_key"
+```
+
+Use them in the requests below as `$OUTPOST_API_BASE_URL` and `$OUTPOST_API_KEY`.
+
+## Create a tenant
+
+Each tenant maps to one of your customers. Pick a stable ID from your own system (for example a team or account ID).
+
+```sh
+TENANT_ID="customer_acme_001"
+
+curl --request PUT "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID" \
+  --header "Authorization: Bearer $OUTPOST_API_KEY"
+```
+
+## Create a webhook destination
+
+Subscribe the tenant to one or more topics you configured in the dashboard. Set `config.url` to an HTTPS endpoint you control.
+
+If you do not have your own endpoint yet, open [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs), create a **Source**, and paste that Source URL as the webhook URL below (or any HTTPS URL you own). Replace `REPLACE_WITH_YOUR_WEBHOOK_URL` accordingly.
+
+Replace `user.created` with a topic that exists in your project if needed.
+
+```sh
+curl --request POST "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/destinations" \
+  --header "Authorization: Bearer $OUTPOST_API_KEY" \
+  --header "Content-Type: application/json" \
+  --data '{
+    "type": "webhook",
+    "topics": ["user.created"],
+    "config": {
+      "url": "REPLACE_WITH_YOUR_WEBHOOK_URL"
+    }
+  }'
+```
+
+To receive every configured topic on this destination, set `"topics": ["*"]` instead.
+
+## Publish a test event
+
+Use the same tenant ID and a `topic` that matches both your dashboard configuration and the destination’s `topics`.
+
+```sh
+curl --request POST "$OUTPOST_API_BASE_URL/publish" \
+  --header "Authorization: Bearer $OUTPOST_API_KEY" \
+  --header "Content-Type: application/json" \
+  --data '{
+    "tenant_id": "'"$TENANT_ID"'",
+    "topic": "user.created",
+    "eligible_for_retry": true,
+    "metadata": {
+      "source": "quickstart"
+    },
+    "data": {
+      "user_id": "user_123"
+    }
+  }'
+```
+
+A `202` response means the event was accepted for delivery.
+
+## Verify delivery
+
+- In **Hookdeck Console**, inspect the connection or destination you used (for example the Source you created) and confirm the webhook request and payload look correct.
+- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** (and any deliveries or event views your project exposes) to confirm the event was processed and delivered.
+
+## Next steps
+
+- [Destination types](/docs/destinations) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more
+- [Tenant user portal](/docs/features/tenant-user-portal) — optional UI for tenants to manage their own destinations
+- [SDKs](/docs/sdks) — TypeScript, Python, Go, and others
+- [API reference](/docs/api/authentication) — full REST API
diff --git a/docs/pages/quickstarts/hookdeck-outpost-go.mdx b/docs/pages/quickstarts/hookdeck-outpost-go.mdx
new file mode 100644
index 000000000..c70ff9986
--- /dev/null
+++ b/docs/pages/quickstarts/hookdeck-outpost-go.mdx
@@ -0,0 +1,163 @@
+---
+title: "Hookdeck Outpost Quickstart: Go"
+---
+
+[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Use **tenants** for each customer, **destinations** for delivery targets, and **topics** aligned with your dashboard configuration.
+
+## Prerequisites
+
+- A Hookdeck account with an Outpost project
+- An **API key** (Outpost API key) from your project: **Settings → Secrets**
+- **Topics** already configured in the dashboard
+- [Go](https://go.dev/) 1.22+ recommended
+- API base URL: `https://api.outpost.hookdeck.com/2025-07-01`
+
+## Install the SDK
+
+```sh
+go get github.com/hookdeck/outpost/sdks/outpost-go
+```
+
+## Set up credentials
+
+In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell:
+
+```sh
+export OUTPOST_API_KEY="your_api_key"
+export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"
+```
+
+If `OUTPOST_API_BASE_URL` is unset, the SDK uses its default production server URL.
+
+## Set environment variables
+
+Set these in the shell where you run `go run .` (or inject them the way your deployment platform expects).
+
+1. **`OUTPOST_API_KEY`** — **Required.** From **Settings → Secrets**. The program exits if it is missing.
+
+2. **`OUTPOST_API_BASE_URL`** — **Optional.** When set, the client is configured with `WithServerURL`. Otherwise the Go SDK uses its default Hookdeck Outpost production URL.
+
+3. **`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** Webhook destination URL (HTTPS). Use your own server or a [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs) **Source** URL for a quick test.
+
+## Create and run the quickstart program
+
+Save the following as `main.go` in a small Go module (run `go mod init` with a module path of your choice first, then `go get github.com/hookdeck/outpost/sdks/outpost-go`).
+
+The program **(1)** configures the client with your API key, **(2)** upserts a tenant, **(3)** creates a webhook destination for your topic, **(4)** publishes one event, and **(5)** prints the destination and event IDs.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"log"
+	"os"
+
+	outpostgo "github.com/hookdeck/outpost/sdks/outpost-go"
+	"github.com/hookdeck/outpost/sdks/outpost-go/models/components"
+)
+
+func main() {
+	ctx := context.Background()
+
+	//
+	// --- 1. Authenticated client (API key from Settings → Secrets) ---
+	//
+
+	apiKey := os.Getenv("OUTPOST_API_KEY")
+	if apiKey == "" {
+		log.Fatal("Set OUTPOST_API_KEY")
+	}
+
+	opts := []outpostgo.SDKOption{outpostgo.WithSecurity(apiKey)}
+	if base := os.Getenv("OUTPOST_API_BASE_URL"); base != "" {
+		opts = append(opts, outpostgo.WithServerURL(base))
+	}
+
+	s := outpostgo.New(opts...)
+
+	//
+	// --- 2. Tenant id, topic name, and webhook URL (from env) ---
+	//
+	// tenantID = one of your customers in Outpost.
+	// topic    = must match a topic configured in the dashboard.
+	//
+
+	tenantID := "customer_acme_001"
+	topic := "user.created"
+
+	webhookURL := os.Getenv("OUTPOST_TEST_WEBHOOK_URL")
+	if webhookURL == "" {
+		log.Fatal("Set OUTPOST_TEST_WEBHOOK_URL (e.g. a Hookdeck Console Source URL)")
+	}
+
+	//
+	// --- 3. Create or update the tenant ---
+	//
+
+	if _, err := s.Tenants.Upsert(ctx, tenantID, nil); err != nil {
+		log.Fatal(err)
+	}
+
+	//
+	// --- 4. Webhook destination: events on `topic` are POSTed to this URL ---
+	//
+
+	destBody := components.CreateDestinationCreateWebhook(
+		components.DestinationCreateWebhook{
+			Topics: components.CreateTopicsArrayOfStr([]string{topic}),
+			Config: components.WebhookConfig{URL: webhookURL},
+		},
+	)
+
+	createRes, err := s.Destinations.Create(ctx, tenantID, destBody)
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	if createRes != nil && createRes.GetDestinationWebhook() != nil {
+		fmt.Println("Destination id:", createRes.GetDestinationWebhook().GetID())
+	}
+
+	//
+	// --- 5. Publish one event ---
+	//
+
+	pubRes, err := s.Publish.Event(ctx, components.PublishRequest{
+		TenantID:         outpostgo.String(tenantID),
+		Topic:            outpostgo.String(topic),
+		EligibleForRetry: outpostgo.Bool(true),
+		Metadata:         map[string]string{"source": "quickstart"},
+		Data:             map[string]any{"user_id": "user_123"},
+	})
+
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	if pubRes != nil && pubRes.GetPublishResponse() != nil {
+		fmt.Println("Published event id:", pubRes.GetPublishResponse().GetID())
+	}
+}
+```
+
+Run:
+
+```sh
+go run .
+```
+
+To subscribe the destination to all topics, use `components.CreateTopicsTopicsEnum(components.TopicsEnumWildcard)` instead of `CreateTopicsArrayOfStr`.
+
+## Verify delivery
+
+- In **Hookdeck Console**, confirm the webhook hit your test URL.
+- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered.
+
+## Next steps
+
+- [Destination types](/docs/destinations)
+- [Tenant user portal](/docs/features/tenant-user-portal)
+- [SDKs](/docs/sdks)
+- [API reference](/docs/api/authentication)
diff --git a/docs/pages/quickstarts/hookdeck-outpost-python.mdx b/docs/pages/quickstarts/hookdeck-outpost-python.mdx
new file mode 100644
index 000000000..f49fd28f1
--- /dev/null
+++ b/docs/pages/quickstarts/hookdeck-outpost-python.mdx
@@ -0,0 +1,134 @@
+---
+title: "Hookdeck Outpost Quickstart: Python"
+---
+
+[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Each **tenant** is one of your customers; **destinations** receive events; **topics** must match what you configured in the dashboard.
+
+## Prerequisites
+
+- A Hookdeck account with an Outpost project
+- An **API key** (Outpost API key) from your project: **Settings → Secrets**
+- **Topics** already configured in the dashboard
+- Python 3.9+ recommended
+- API base URL: `https://api.outpost.hookdeck.com/2025-07-01`
+
+## Install the SDK
+
+```sh
+pip install outpost_sdk
+```
+
+## Set up credentials
+
+In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell:
+
+```sh
+export OUTPOST_API_KEY="your_api_key"
+export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"
+```
+
+The SDK defaults to the production API base URL when `server_url` is omitted.
+
+## Set environment variables
+
+Set these in the same shell before you run the script (or load them with your preferred `.env` helper).
+
+1. **`OUTPOST_API_KEY`** — **Required.** From **Settings → Secrets**. Without it the script exits, because every API call must be authenticated.
+
+2. **`OUTPOST_API_BASE_URL`** — **Optional.** Passed through as `server_url` on the client. Omit it to use the SDK default production URL for Hookdeck Outpost.
+
+3. **`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** Webhook destinations need an HTTPS URL. Use your own endpoint or a [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs) **Source** URL for a quick, no-server test.
+
+## Create and run the quickstart script
+
+Save as `outpost_quickstart.py`.
+
+The script **(1)** creates an authenticated client, **(2)** upserts a tenant, **(3)** creates a webhook destination subscribed to your topic, **(4)** publishes one test event, and **(5)** prints the event id.
+
+```python
+import os
+
+from outpost_sdk import Outpost
+
+#
+# --- 1. Authenticated client (API key from Settings → Secrets) ---
+#
+
+api_key = os.environ.get("OUTPOST_API_KEY")
+if not api_key:
+    raise SystemExit("Set OUTPOST_API_KEY")
+
+base_url = os.environ.get("OUTPOST_API_BASE_URL")
+client = Outpost(api_key=api_key, server_url=base_url)
+
+#
+# --- 2. Tenant id, topic name, and webhook URL (from env) ---
+#
+# tenant_id = one of your customers in Outpost.
+# topic     = must match a topic configured in the dashboard.
+#
+
+tenant_id = "customer_acme_001"
+topic = "user.created"
+
+webhook_url = os.environ.get("OUTPOST_TEST_WEBHOOK_URL")
+if not webhook_url:
+    raise SystemExit(
+        "Set OUTPOST_TEST_WEBHOOK_URL (e.g. a Hookdeck Console Source URL)"
+    )
+
+#
+# --- 3. Create or update the tenant ---
+#
+
+client.tenants.upsert(tenant_id=tenant_id)
+
+#
+# --- 4. Webhook destination: events on `topic` are POSTed to this URL ---
+#
+
+client.destinations.create(
+    tenant_id=tenant_id,
+    body={
+        "type": "webhook",
+        "topics": [topic],
+        "config": {"url": webhook_url},
+    },
+)
+
+#
+# --- 5. Publish one event ---
+#
+
+published = client.publish.event(
+    request={
+        "tenant_id": tenant_id,
+        "topic": topic,
+        "eligible_for_retry": True,
+        "metadata": {"source": "quickstart"},
+        "data": {"user_id": "user_123"},
+    }
+)
+
+print("Published event id:", published.id)
+```
+
+Run:
+
+```sh
+python outpost_quickstart.py
+```
+
+To receive all configured topics, set `"topics": ["*"]` in the destination body instead of `[topic]`.
+
+## Verify delivery
+
+- In **Hookdeck Console**, confirm the webhook hit your test URL.
+- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered.
+
+## Next steps
+
+- [Destination types](/docs/destinations)
+- [Tenant user portal](/docs/features/tenant-user-portal)
+- [SDKs](/docs/sdks)
+- [API reference](/docs/api/authentication)
diff --git a/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx b/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx
new file mode 100644
index 000000000..a3bbfe04a
--- /dev/null
+++ b/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx
@@ -0,0 +1,135 @@
+---
+title: "Hookdeck Outpost Quickstart: TypeScript"
+---
+
+[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Each **tenant** represents one of your platform’s customers; **destinations** are where events are delivered; **topics** route events to the right destinations.
+
+This quickstart uses the official TypeScript SDK. Configure **topics** in the Hookdeck dashboard before publishing—use a topic name that exists there in the code below.
+
+## Prerequisites
+
+- A Hookdeck account with an Outpost project
+- An **API key** (Outpost API key) from your project: **Settings → Secrets**
+- **Topics** already configured in the dashboard
+- [Node.js](https://nodejs.org/) 18+ recommended
+- API base URL: `https://api.outpost.hookdeck.com/2025-07-01`
+
+## Install the SDK
+
+```sh
+npm install @hookdeck/outpost-sdk
+```
+
+## Set up credentials
+
+In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell:
+
+```sh
+export OUTPOST_API_KEY="your_api_key"
+export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"
+```
+
+The SDK defaults to the production API base URL, so `OUTPOST_API_BASE_URL` is only needed if you want to be explicit or point at another environment.
+
+## Set environment variables
+
+Before you run the quickstart script, define these in the same terminal session (or load them from a `.env` file if your tooling supports it).
+
+1. **`OUTPOST_API_KEY`** — **Required.** Copy the Outpost API key from **Settings → Secrets** in your project. The script passes this to the SDK as the Bearer token. Without it, the script stops with an error.
+
+2. **`OUTPOST_API_BASE_URL`** — **Optional.** Only set this if you need to override the API host. For Hookdeck Outpost you can omit it entirely: the SDK already uses `https://api.outpost.hookdeck.com/2025-07-01`.
+
+3. **`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** The script creates a webhook destination, which must point at an HTTPS URL. Easiest path: open [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs), create a **Source**, copy its URL, and assign it to this variable so you can see the webhook payload without deploying your own server.
+
+## Create and run the quickstart script
+
+Save the following as `outpost-quickstart.ts`.
+
+The script **(1)** builds an authenticated SDK client, **(2)** ensures a tenant exists, **(3)** adds a webhook destination subscribed to your topic, **(4)** publishes one test event, and **(5)** prints the event id.
+
+```typescript
+import { Outpost } from "@hookdeck/outpost-sdk";
+
+//
+// --- 1. Authenticated client (API key from Settings → Secrets) ---
+//
+
+const apiKey = process.env.OUTPOST_API_KEY;
+if (!apiKey) {
+  throw new Error("Set OUTPOST_API_KEY");
+}
+
+const outpost = new Outpost({
+  apiKey,
+  ...(process.env.OUTPOST_API_BASE_URL
+    ? { serverURL: process.env.OUTPOST_API_BASE_URL }
+    : {}),
+});
+
+//
+// --- 2. Tenant id, topic name, and webhook URL (from env) ---
+//
+// tenantId = one of your customers in Outpost.
+// topic    = must match a topic configured in the dashboard.
+//
+
+const tenantId = "customer_acme_001";
+const topic = "user.created";
+
+const webhookUrl = process.env.OUTPOST_TEST_WEBHOOK_URL;
+if (!webhookUrl) {
+  throw new Error(
+    "Set OUTPOST_TEST_WEBHOOK_URL to an HTTPS endpoint (e.g. a Hookdeck Console Source URL)",
+  );
+}
+
+//
+// --- 3. Create or update the tenant ---
+//
+
+await outpost.tenants.upsert(tenantId);
+
+//
+// --- 4. Webhook destination: Outpost delivers events on `topic` to this URL ---
+//
+
+await outpost.destinations.create(tenantId, {
+  type: "webhook",
+  topics: [topic],
+  config: { url: webhookUrl },
+});
+
+//
+// --- 5. Publish one event (delivered to destinations subscribed to `topic`) ---
+//
+
+const published = await outpost.publish.event({
+  tenantId,
+  topic,
+  eligibleForRetry: true,
+  metadata: { source: "quickstart" },
+  data: { user_id: "user_123" },
+});
+
+console.log("Published event id:", published.id);
+```
+
+Run:
+
+```sh
+npx tsx outpost-quickstart.ts
+```
+
+To subscribe the destination to all topics, pass `topics: ["*"]` instead of `[topic]`.
+
+## Verify delivery
+
+- In **Hookdeck Console**, inspect the Source or connection you used for `OUTPOST_TEST_WEBHOOK_URL` and confirm the webhook request arrived as expected.
+- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered.
+
+## Next steps
+
+- [Destination types](/docs/destinations)
+- [Tenant user portal](/docs/features/tenant-user-portal)
+- [SDKs](/docs/sdks)
+- [API reference](/docs/api/authentication)
diff --git a/docs/zudoku.config.ts b/docs/zudoku.config.ts
index 1687bd4c9..ec7164478 100644
--- a/docs/zudoku.config.ts
+++ b/docs/zudoku.config.ts
@@ -86,19 +86,60 @@ const config: ZudokuConfig = {
         collapsible: false,
         items: [
           {
-            type: "doc",
-            label: "Docker",
-            id: "quickstarts/docker",
+            type: "category",
+            label: "Hookdeck Outpost",
+            collapsed: false,
+            collapsible: true,
+            items: [
+              {
+                type: "doc",
+                label: "curl",
+                id: "quickstarts/hookdeck-outpost-curl",
+              },
+              {
+                type: "doc",
+                label: "TypeScript",
+                id: "quickstarts/hookdeck-outpost-typescript",
+              },
+              {
+                type: "doc",
+                label: "Python",
+                id: "quickstarts/hookdeck-outpost-python",
+              },
+              {
+                type: "doc",
+                label: "Go",
+                id: "quickstarts/hookdeck-outpost-go",
+              },
+              {
+                type: "doc",
+                label: "Agent prompt",
+                id: "quickstarts/hookdeck-outpost-agent-prompt",
+              },
+            ],
           },
           {
-            type: "doc",
-            label: "Kubernetes",
-            id: "quickstarts/kubernetes",
-          },
-          {
-            type: "doc",
-            label: "Railway",
-            id: "quickstarts/railway",
+            type: "category",
+            label: "Self-Hosted",
+            collapsed: false,
+            collapsible: true,
+            items: [
+              {
+                type: "doc",
+                label: "Docker",
+                id: "quickstarts/docker",
+              },
+              {
+                type: "doc",
+                label: "Kubernetes",
+                id: "quickstarts/kubernetes",
+              },
+              {
+                type: "doc",
+                label: "Railway",
+                id: "quickstarts/railway",
+              },
+            ],
           },
         ],
       },

From e0897218161fde31573856e5926d2b5900a56520 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 11:23:18 +0100
Subject: [PATCH 02/47] docs: add Outpost agent evaluation harness and
 scenarios

- Claude Agent SDK runner with explicit --scenario/--scenarios/--all, per-run workspace
- Heuristic + LLM scoring vs scenario Success criteria; score-transcript 01-10
- Scenarios: basics, minimal apps, existing-app integration baselines
- CI slice (eval:ci), SCENARIO-RUN-TRACKER, prompt template Files on disk guidance
- Allow committing docs/**/.env.example under docs/.gitignore
- TEMP status and README updates

Made-with: Cursor
---
 docs/.gitignore                               |    1 +
 ...TEMP-hookdeck-outpost-onboarding-status.md |   87 +-
 docs/agent-evaluation/.env.example            |   31 +
 docs/agent-evaluation/README.md               |  197 ++
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md |   53 +
 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md |   22 +
 .../fixtures/placeholder-values-for-turn0.md  |   27 +
 docs/agent-evaluation/package-lock.json       | 2096 +++++++++++++++++
 docs/agent-evaluation/package.json            |   25 +
 docs/agent-evaluation/results/.gitignore      |    5 +
 docs/agent-evaluation/results/README.md       |   57 +
 .../results/RUN-RECORDING.template.md         |   36 +
 .../scenarios/01-basics-curl.md               |   48 +
 .../scenarios/02-basics-typescript.md         |   45 +
 .../scenarios/03-basics-python.md             |   43 +
 .../scenarios/04-basics-go.md                 |   38 +
 .../scenarios/05-app-nextjs.md                |   58 +
 .../scenarios/06-app-fastapi.md               |   47 +
 .../scenarios/07-app-go-http.md               |   46 +
 .../scenarios/08-integrate-nextjs-existing.md |   59 +
 .../09-integrate-fastapi-existing.md          |   52 +
 .../scenarios/10-integrate-go-existing.md     |   51 +
 docs/agent-evaluation/scripts/ci-eval.sh      |   22 +
 docs/agent-evaluation/scripts/run-scenario.sh |   46 +
 docs/agent-evaluation/src/llm-judge.ts        |  230 ++
 docs/agent-evaluation/src/run-agent-eval.ts   |  527 +++++
 docs/agent-evaluation/src/score-eval.ts       |  183 ++
 docs/agent-evaluation/src/score-transcript.ts | 1119 +++++++++
 docs/agent-evaluation/tsconfig.json           |   15 +
 .../hookdeck-outpost-agent-prompt.mdx         |   16 +-
 30 files changed, 5267 insertions(+), 15 deletions(-)
 create mode 100644 docs/agent-evaluation/.env.example
 create mode 100644 docs/agent-evaluation/README.md
 create mode 100644 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
 create mode 100644 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
 create mode 100644 docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
 create mode 100644 docs/agent-evaluation/package-lock.json
 create mode 100644 docs/agent-evaluation/package.json
 create mode 100644 docs/agent-evaluation/results/.gitignore
 create mode 100644 docs/agent-evaluation/results/README.md
 create mode 100644 docs/agent-evaluation/results/RUN-RECORDING.template.md
 create mode 100644 docs/agent-evaluation/scenarios/01-basics-curl.md
 create mode 100644 docs/agent-evaluation/scenarios/02-basics-typescript.md
 create mode 100644 docs/agent-evaluation/scenarios/03-basics-python.md
 create mode 100644 docs/agent-evaluation/scenarios/04-basics-go.md
 create mode 100644 docs/agent-evaluation/scenarios/05-app-nextjs.md
 create mode 100644 docs/agent-evaluation/scenarios/06-app-fastapi.md
 create mode 100644 docs/agent-evaluation/scenarios/07-app-go-http.md
 create mode 100644 docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
 create mode 100644 docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
 create mode 100644 docs/agent-evaluation/scenarios/10-integrate-go-existing.md
 create mode 100755 docs/agent-evaluation/scripts/ci-eval.sh
 create mode 100755 docs/agent-evaluation/scripts/run-scenario.sh
 create mode 100644 docs/agent-evaluation/src/llm-judge.ts
 create mode 100644 docs/agent-evaluation/src/run-agent-eval.ts
 create mode 100644 docs/agent-evaluation/src/score-eval.ts
 create mode 100644 docs/agent-evaluation/src/score-transcript.ts
 create mode 100644 docs/agent-evaluation/tsconfig.json

diff --git a/docs/.gitignore b/docs/.gitignore
index d777781b5..1f70a5a5a 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -27,6 +27,7 @@ yarn-error.log*
 
 # env files (can opt-in for commiting if needed)
 .env*
+!.env.example
 
 # typescript
 *.tsbuildinfo
diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
index 9faa176f0..1d481b17f 100644
--- a/docs/TEMP-hookdeck-outpost-onboarding-status.md
+++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md
@@ -6,24 +6,93 @@
 
 ---
 
+## Agent eval harness — **implemented**; **prompt validation in progress**
+
+The automated harness in `docs/agent-evaluation/` is in place. **What it does today:**
+
+| Area | Status |
+|------|--------|
+| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with **`Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, **`cwd`** = `results/runs/<stamp>-scenario-NN/` |
+| **Artifacts** | `transcript.json`, optional **`heuristic-score.json`** + **`llm-score.json`** (LLM reads each scenario **`## Success criteria`**), agent-written files beside the transcript |
+| **Heuristics** | `score-transcript.ts` — **`scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) |
+| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next **`leerob/next-saas-starter`**, FastAPI **`philipokiokio/FastAPI_SAAS_Template`**, Go **`devinterface/startersaas-go-api`**) |
+| **CLI** | **`npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless **`--no-score`** / **`--no-score-llm`** or **`EVAL_NO_SCORE_*`**. **Exit 1** if any enabled score fails |
+| **CI** | **`npm run eval:ci`** = **`--scenarios 01,02`** + heuristic **and** LLM judge. **`scripts/ci-eval.sh`** — requires **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`** |
+| **Re-score** | `npm run score -- --run <run-dir> [--llm] [--write]` |
+
+**Operational**
+
+- Prefer a normal runner / full permissions for session persistence (`~/.claude/...`); tight sandboxes can break multi-turn resume.
+- **Validate the prompt in stages** (simple → complex); exact commands below.
+
+### Recommended run order (test evals → stress prompt)
+
+Run from **`docs/agent-evaluation/`** with **`.env`** set (**`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions.
+
+**Stage A — basics (fast, minimal tooling)**
+
+```sh
+npm run eval -- --scenarios 01,02,03,04
+```
+
+**Stage B — minimal example apps**
+
+```sh
+npm run eval -- --scenarios 05,06,07
+```
+
+**Stage C — existing-app integration (clone + integrate; slowest)**
+
+```sh
+npm run eval -- --scenarios 08,09,10
+```
+
+**Full suite (explicit cost)**
+
+```sh
+npm run eval -- --all
+```
+
+After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live **`OUTPOST_API_KEY`**) remains a separate human step per scenario.
+
+---
+
+## Agent eval automation (original plan — historical)
+
+1. **In-repo runner** — ✅ Node + Agent SDK (not shell-only `curl`).
+2. **Default backend: Anthropic** — ✅ Agent SDK.
+3. **Claude Code CLI** — Optional local path only (unchanged).
+4. **OpenAI adapter** — Still optional / not implemented.
+5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs **`## Success criteria`**.
+6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready).
+
+**Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners.
+
+---
+
 ## Done (Outpost OSS repo)
 
 - Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx`
-- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx`
+- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` (includes **Files on disk** guidance)
 - Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**)
 - `quickstarts.mdx` index: managed vs self-hosted links
-- Content aligned with product copy: API key from **Settings → Secrets**, standard markdown (no `:::tip`), verify via Hookdeck Console + project logs
-- SDK examples: env vars section, numbered quickstart scripts with step comments
+- Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs
+- SDK quickstarts: env vars, step-commented scripts
+- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, **`SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md`
 
 ## Pending / follow-up
 
-- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm all doc links resolve on production docs URL
-- **Test destination URL:** When `console.hookdeck.com` (or equivalent) has a stable public URL format, update quickstarts if it replaces “create a Console Source” instructions
-- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection (`{{API_BASE_URL}}`, `{{TOPICS_LIST}}`, `{{TEST_DESTINATION_URL}}`, `{{DOCS_URL}}`, optional `{{LLMS_FULL_URL}}`); env var UI for `OUTPOST_API_KEY` (not in prompt body)
-- **Hookdeck Astro site:** Consume MDX, `llms.txt` / `llms-full.txt` / `.md` exports, canonical `DOCS_URL` (e.g. `https://hookdeck.com/outpost/docs`)
-- **Deferred (not blocking GA):** Broader docs IA (“Self-Hosted” under Guides, redirects for moved pages) per original plan
+- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or **`--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear
+- **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming)
+- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links
+- **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes
+- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection; env UI for `OUTPOST_API_KEY` (not in prompt body)
+- **Hookdeck Astro site:** MDX, `llms.txt` / `llms-full.txt`, canonical `DOCS_URL`
+- **CI workflow:** Optional GitHub Actions job for `eval:ci` with secrets
+- **Deferred (not blocking GA):** Broader docs IA per original plan
 
 ## References
 
 - OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`)
-- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
\ No newline at end of file
+- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
+- Eval harness: `docs/agent-evaluation/README.md`
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
new file mode 100644
index 000000000..6f1e3eb48
--- /dev/null
+++ b/docs/agent-evaluation/.env.example
@@ -0,0 +1,31 @@
+# Copy to .env and fill in. .env is gitignored at the repo root.
+
+# Required for npm run eval (Claude Agent SDK — calls Anthropic only)
+ANTHROPIC_API_KEY=
+
+# Required for Turn 0 template (test webhook URL injected into the prompt)
+EVAL_TEST_DESTINATION_URL=
+
+# Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project.
+# The harness does not read this key; you (or a future verifier) use it after the run.
+# OUTPOST_API_KEY=
+# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
+# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id   # often same as EVAL_TEST_DESTINATION_URL
+
+# Optional (see npm run eval -- --help)
+# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
+# EVAL_TOPICS_LIST=- user.created
+# EVAL_DOCS_URL=https://outpost.hookdeck.com/docs
+# EVAL_LOCAL_DOCS=1
+# EVAL_LLMS_FULL_URL=
+# Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow:
+# EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash
+# EVAL_MODEL=
+# EVAL_MAX_TURNS=40
+# EVAL_PERMISSION_MODE=dontAsk
+# EVAL_PERSIST_SESSION=true
+
+# Scoring is ON by default after each scenario (heuristic + LLM). Opt out:
+# EVAL_NO_SCORE_HEURISTIC=1
+# EVAL_NO_SCORE_LLM=1
+# EVAL_SCORE_MODEL=claude-sonnet-4-20250514
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
new file mode 100644
index 000000000..274921647
--- /dev/null
+++ b/docs/agent-evaluation/README.md
@@ -0,0 +1,197 @@
+# Agent evaluation — Hookdeck Outpost onboarding
+
+This folder contains **manual** scenario specs (markdown) and an **automated** runner that uses the [Claude Agent SDK](https://platform.claude.com/docs/en/agent-sdk/overview) (`src/run-agent-eval.ts`).
+
+## Where success criteria live
+
+| What | Where |
+|------|--------|
+| **Human checklist** (full eval, including execution) | Each file under [`scenarios/`](scenarios/) — section **Success criteria** (static + **Execution (full pass)** rows). |
+| **Manual run write-up** | [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) — copy to a local file under `results/` (gitignored). |
+| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). |
+| **LLM judge** (Anthropic vs **`## Success criteria`** in each scenario) | [`src/llm-judge.ts`](src/llm-judge.ts) — runs after each scenario unless **`--no-score-llm`**; also `npm run score -- --llm`. |
+
+**Deliberate scope:** `npm run eval` **requires** **`--scenario`**, **`--scenarios`**, or **`--all`**. There is no silent “run everything” default — you choose the scenarios and accept the cost. **Each** run writes **`transcript.json`**, **`heuristic-score.json`**, and **`llm-score.json`** (the judge reads the same **Success criteria** as humans). The process exits **1** if any enabled score fails.
+
+Opt out of scoring: **`--no-score`** (skips the heuristic scorer only), **`--no-score-llm`** (skips the Success-criteria LLM judge), or **`.env`**: **`EVAL_NO_SCORE_HEURISTIC=1`**, **`EVAL_NO_SCORE_LLM=1`**. Transcript-only: **`npm run eval -- --no-score --no-score-llm`**.
+
+Each scenario run uses one directory:
+
+`results/runs/<ISO-stamp>-scenario-NN/`
+
+- **`transcript.json`** — full SDK log  
+- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)  
+- **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code.
+
+Re-score a finished run without re-invoking the agent:
+
+- **`npm run score -- --run results/runs/<dir>`** — heuristic (add **`--llm`** for LLM only, **`--write`** to persist sidecars).
+
+Legacy flat files `*-scenario-NN.json` next to `runs/` are still accepted by **`npm run score`** for older runs.
+
+**Execution** (live Outpost) is still not auto-verified; the LLM is instructed to set `execution_in_transcript.pass` to **null** unless the transcript itself reports HTTP results.
+
+## Automated runs (Claude Agent SDK)
+
+From `docs/agent-evaluation/`:
+
+```sh
+npm install
+cp .env.example .env   # then edit: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, …
+npm run eval -- --scenario 01
+npm run eval -- --scenarios 01,02,08
+npm run eval -- --all   # explicit full suite (every scenario file)
+npm run eval:ci         # same as --scenarios 01,02 + heuristic + LLM judge (see § CI)
+npm run eval -- --dry-run
+```
+
+The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`). Shell exports still override `.env` if both are set.
+
+### CI (recommended slice)
+
+For **pull-request or main-branch** automation, run **two** scenarios only:
+
+| Scenario | Why |
+|----------|-----|
+| **01** (curl) | Shortest path: managed API, tenant → destination → publish, no `npm install` / framework scaffold. Cheap signal that the prompt + heuristics still align with the curl quickstart. |
+| **02** (TypeScript) | Most common integration style: **`@hookdeck/outpost-sdk`**, env vars, same API flow in code. Still much faster than **05** (Next.js) or **08** (clone a full SaaS repo). |
+
+**Commands:**
+
+```sh
+cd docs/agent-evaluation && npm ci && npm run eval:ci
+# or: ./scripts/ci-eval.sh   # requires ANTHROPIC_API_KEY + EVAL_TEST_DESTINATION_URL in the environment
+```
+
+`eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist.
+
+**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**, run from `docs/agent-evaluation` with a normal runner (Claude Agent SDK needs session filesystem access — avoid tight sandboxes; see **Permissions / failures** below). **`OUTPOST_API_KEY`** is still not required for transcript-only CI.
+
+- **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run.
+- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}`.
+- **`OUTPOST_API_KEY`** — **not** read by the automated runner, but **required if you want a full evaluation**: without it you can only judge the transcript (plausible curl/SDK text). To verify that **generated commands or code actually work**, put the same Outpost API key you use against the managed API in **`docs/agent-evaluation/.env`** (or export it) and run the agent’s output against a real project. The onboarding prompt tells operators to keep that key in **`.env`** and never paste it into chat.
+- **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
+
+- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (`## Template`) with placeholders filled from environment variables.
+- Transcripts are written to `results/runs/<stamp>-scenario-NN/transcript.json` (gitignored).
+
+See `npm run eval -- --help` for env vars (`EVAL_TOOLS`, `EVAL_MODEL`, etc.).
+
+### Permissions / failures (why a run might not work)
+
+Two different things get called “permissions”:
+
+1. **Cursor (or CI) sandbox and `tsx`** — The `tsx` **CLI** opens an IPC pipe in `/tmp` (or similar), which some sandboxes block (`listen EPERM`). This repo’s `npm run eval` uses **`node --import tsx`** instead so Node loads the tsx **loader** only (no CLI IPC). If you still see EPERM, run the same command in a normal terminal outside the sandbox, or use `npm run eval:tsx-cli` only where IPC is allowed.
+
+2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell).
+
+Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials.
+
+### Transcript vs execution (full pass)
+
+`npm run eval` only captures **what the model produced**; it does **not** call Outpost. Treat that as **transcript review**.
+
+A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them:
+
+1. Add **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** when the artifact expects them) to `docs/agent-evaluation/.env`, then make them available in the shell where you run the generated code: load the file via `dotenv`, `source` it, or copy it into the directory where the code runs.
+2. Run the agent’s commands or start its app and complete the flows the scenario describes.
+3. Record pass/fail in your run notes ([`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md)).
+
+## Single source of truth for the dashboard prompt
+
+The **full prompt template** (the text operators paste as Turn 0) lives in **one** place:
+
+**[`docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** — use the fenced block under **## Template**.
+
+For eval runs, example placeholder substitutions (non-secret) are in [`fixtures/placeholder-values-for-turn0.md`](fixtures/placeholder-values-for-turn0.md) only. That file intentionally **does not** duplicate the template.
+
+The Hookdeck dashboard should eventually render the **same** template body from product-side source; until then, this MDX page is the documentation canonical copy.
+
+## How to run an evaluation (manual)
+
+1. **Turn 0:** Open the [agent prompt MDX](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)).
+2. **Pick a scenario:** e.g. [`scenarios/01-basics-curl.md`](scenarios/01-basics-curl.md).
+3. **New agent thread:** Paste Turn 0, then follow each **Turn N — User** line from the scenario verbatim (or as specified).
+4. **Judge output:** Use the scenario’s **Success criteria** checkboxes (human decision).
+5. **Record:** Copy [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) to a local filename under `results/` (see [`results/README.md`](results/README.md)); those files are **gitignored** by default.
+
+### Helper script (optional)
+
+From the repo root:
+
+```sh
+./docs/agent-evaluation/scripts/run-scenario.sh 01
+```
+
+This **only prints** paths and reminders. It does **not** start an agent or call OpenAI/Anthropic/etc.
+
+## Judging results
+
+- **Automated runs:** use **Success criteria** in each `scenarios/*.md` (definition of pass). Each **`npm run eval -- --scenario|scenarios|all`** run applies **heuristic + LLM** scorers unless you pass **`--no-score`** / **`--no-score-llm`**; **Execution** rows stay manual unless you add a verifier.
+- **Manual runs:** use the checklist in [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md).
+
+There is still **no single portable “IDE agent” CLI** for all vendors; the SDK runner is the supported path for headless Anthropic-based CI.
+
+## Measuring scenarios
+
+| Layer | What it answers | Where |
+|--------|-----------------|--------|
+| **Definition** | What “good” means (product + transcript) | **`## Success criteria`** in each [`scenarios/*.md`](scenarios/) |
+| **Heuristic** | Fast, deterministic signal from transcript JSON | [`src/score-transcript.ts`](src/score-transcript.ts) — combines assistant text with **Write/Edit tool inputs** and tool results so on-disk artifacts count |
+| **LLM judge** | Structured pass/fail vs the same **Success criteria** | After each scenario when **`--no-score-llm`** is not set; or `npm run score -- --run <dir> --llm` — [`src/llm-judge.ts`](src/llm-judge.ts) |
+| **Execution** | Live API / app smoke test | Human (or future script); not automated here |
+
+**Heuristic functions** (failed checks set **`npm run eval`** / **`npm run score`** exit **1** when that scorer ran):
+
+| Scenario | Function | Topics covered (summary) |
+|----------|----------|---------------------------|
+| 01 | `scoreScenario01` | Managed URL, tenant PUT, webhook destination POST, publish `data`, no key leak, optional verify turn |
+| 02 | `scoreScenario02` | TS SDK, `Outpost`, env key, tenants/destinations/publish, webhook env, run command |
+| 03 | `scoreScenario03` | Python SDK import, client, same API calls, env, webhook URL |
+| 04 | `scoreScenario04` | Go module, `New`/`WithSecurity`, Upsert/Create/Publish, env, webhook URL |
+| 05 | `scoreScenario05` | Next.js signals, TS SDK, API routes, two flows, server env key, no `NEXT_PUBLIC_` key, README, optional stress-turn Hookdeck hint |
+| 06 | `scoreScenario06` | FastAPI, `outpost_sdk`, uvicorn, server env, two flows, README, webhook docs |
+| 07 | `scoreScenario07` | `net/http`, Go SDK + `CreateDestinationCreateWebhook`, HTML UI, two flows, `go run`, README |
+| 08 | `scoreScenario08` | Clone **next-saas-starter** (or git baseline), TS SDK, publish/destinations/tenants, server env key, per-customer webhook story |
+| 09 | `scoreScenario09` | Clone **FastAPI_SAAS_Template** (or git baseline), `outpost_sdk`, integration + domain hook, env key |
+| 10 | `scoreScenario10` | Clone **startersaas-go-api** (or git baseline), Go Outpost SDK, publish + handler hook, env key |
+
+The **`SCENARIO_IDS_WITH_HEURISTIC_RUBRIC`** export in `score-transcript.ts` lists IDs **01–10** for tooling.
+
+## Scenarios
+
+To record each **`npm run eval -- --scenario …`** run, automated scores, and **whether you ran the generated code** with `OUTPOST_API_KEY`, use **[`SCENARIO-RUN-TRACKER.md`](SCENARIO-RUN-TRACKER.md)** (committed; not under `results/`, which is gitignored).
+
+| ID | File | Goal |
+|----|------|------|
+| 1 | [scenarios/01-basics-curl.md](scenarios/01-basics-curl.md) | Minimal **curl** only (managed API). |
+| 2 | [scenarios/02-basics-typescript.md](scenarios/02-basics-typescript.md) | Minimal **TypeScript** script (`@hookdeck/outpost-sdk`). |
+| 3 | [scenarios/03-basics-python.md](scenarios/03-basics-python.md) | Minimal **Python** script (`outpost_sdk`). |
+| 4 | [scenarios/04-basics-go.md](scenarios/04-basics-go.md) | Minimal **Go** program (`outpost-go`). |
+| 5 | [scenarios/05-app-nextjs.md](scenarios/05-app-nextjs.md) | Small **Next.js** app: UI to register a webhook destination and trigger a test publish. |
+| 6 | [scenarios/06-app-fastapi.md](scenarios/06-app-fastapi.md) | Small **FastAPI** app with the same UX as scenario 5. |
+| 7 | [scenarios/07-app-go-http.md](scenarios/07-app-go-http.md) | Small **Go** `net/http` app + simple HTML UI (same UX as scenario 5). |
+| 8 | [scenarios/08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | **Existing Next.js SaaS** baseline — add outbound webhooks via Outpost ([leerob/next-saas-starter](https://github.com/leerob/next-saas-starter)). |
+| 9 | [scenarios/09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | **Existing FastAPI SaaS** baseline — Outpost integration ([philipokiokio/FastAPI_SAAS_Template](https://github.com/philipokiokio/FastAPI_SAAS_Template)). |
+| 10 | [scenarios/10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | **Existing Go SaaS API** baseline — Outpost integration ([devinterface/startersaas-go-api](https://github.com/devinterface/startersaas-go-api)). |
+
+Scenarios **1–4** align with **“Try it out”**; **5–7** with **“Build a minimal example”**; **8–10** with **“Integrate with an existing app”** using pinned OSS baselines (Java / .NET can be added later the same way).
+
+## Agent skills recommendation
+
+**Recommend yes** for teams standardizing on Hookdeck’s skill pack: the [outpost skill](https://github.com/hookdeck/agent-skills/tree/main/skills/outpost) gives agents a consistent overview (tenants, destinations, topics, curl shape) and links into docs.
+
+**Caveats (update the skill in `hookdeck/agent-skills`, not in this repo):**
+
+1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../pages/quickstarts/hookdeck-outpost-curl.mdx), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy.
+2. **REST paths** — Examples must use **`/tenants/{id}`**, not `PUT $BASE_URL/$TENANT_ID` (that path is wrong for the real API).
+3. **Naming** — Align env var naming with docs (`OUTPOST_API_KEY` or documented dashboard name), not ad-hoc `HOOKDECK_API_KEY` unless the dashboard literally uses that string.
+4. **Router vs. deep skills** — Today `outpost` is one monolithic `SKILL.md`. The skill itself mentions **future** destination-specific skills (`outpost-webhooks`, etc.). For scale, consider either **sections** with clear headings or **child skills** (e.g. `outpost-managed-quickstart`, `outpost-self-hosted`) once content grows—without forcing users to install many tiles for the common case.
+
+Until the skill is updated, agents should still be pointed at the **quickstart MDX pages** in this repo (or production docs URLs); the skill is supplementary.
+
+## Related docs
+
+- [Agent prompt template (SSoT)](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)
+- [Upstream skill notes](SKILL-UPSTREAM-NOTES.md)
+- [TEMP tracking note](../TEMP-hookdeck-outpost-onboarding-status.md)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
new file mode 100644
index 000000000..ac620193f
--- /dev/null
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -0,0 +1,53 @@
+# Scenario run tracker
+
+Use this table while you **run scenarios one at a time** and **execute the generated artifacts** against a real Outpost project.
+
+## How to use
+
+1. **Automated agent eval** (from `docs/agent-evaluation/`):
+
+   ```sh
+   npm run eval -- --scenario <NN>
+   ```
+
+   Each run creates **`results/runs/<ISO-stamp>-scenario-<NN>/`** with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones).
+
+2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console).
+
+3. **Execution (generated code):** with **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.).
+
+4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`.
+
+---
+
+## Tracker
+
+| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
+|----|---------------|-----------------------------------|-----------|-----------|----------------------------|-------|
+| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | |
+| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | |
+| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | |
+| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | |
+| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | |
+| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | |
+| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | |
+| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | |
+| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | |
+| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+
+### Column hints
+
+| Column | Meaning |
+|--------|---------|
+| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` |
+| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) |
+| **LLM judge** | `llm-score.json` → `overall_transcript_pass` |
+| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` |
+
+### Status legend (suggested)
+
+Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A**, or ✅ / ❌ / —
+
+---
+
+Full harness docs: [README.md](README.md).
diff --git a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
new file mode 100644
index 000000000..6c8de7367
--- /dev/null
+++ b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
@@ -0,0 +1,22 @@
+# Notes for updating `hookdeck/agent-skills` — `skills/outpost`
+
+Apply these in the **[agent-skills](https://github.com/hookdeck/agent-skills)** repository, not in Outpost OSS.
+
+## Recommended direction
+
+1. **Lead with managed Hookdeck Outpost** — Link prominently to managed quickstarts (curl, TypeScript, Python, Go) and `https://api.outpost.hookdeck.com/2025-07-01`.
+2. **Fix REST examples** — Tenant upsert must be `PUT {base}/tenants/{tenant_id}`, not `PUT {base}/{tenant_id}`.
+3. **Align env naming** — Match product/docs: Outpost API key from project **Settings → Secrets**, typically loaded as `OUTPOST_API_KEY` in examples; avoid introducing `HOOKDECK_API_KEY` unless the dashboard literally uses that name.
+4. **Self-hosted section** — Keep Docker/Kubernetes/Railway as a secondary path with `http://localhost:3333/api/v1` and correct `/tenants/...` paths.
+5. **Optional: split later** — If the file grows, add `outpost-managed.md` / `outpost-self-hosted.md` fragments or separate skills; keep the default tile entrypoint short.
+
+## Concrete issues in current `SKILL.md` (as of fetch against `main`)
+
+- **Wrong curl path:** `curl -X PUT "$BASE_URL/$TENANT_ID"` should target `/tenants/$TENANT_ID` relative to the API base (managed base has no `/api/v1` prefix).
+- **Managed auth row:** Verify the exact dashboard copy for the secret name and env var conventions; link to the Hookdeck Outpost project settings, not just the generic dashboard secrets page, if the URLs differ.
+- **Tile summary:** `tile.json` says “self-hosted relay”; the summary string should reflect managed Outpost once GA positioning is final.
+
+## Cross-links from this repo
+
+- Onboarding prompt template: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
+- Manual agent eval harness: `docs/agent-evaluation/README.md`
\ No newline at end of file
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
new file mode 100644
index 000000000..39d344677
--- /dev/null
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -0,0 +1,27 @@
+# Placeholder values for Turn 0 (eval / local testing)
+
+The **prompt template itself** lives in one place only:
+
+**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
+
+Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
+
+For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner needs only **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. To score a **full** eval (one that checks the generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Setting the optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links.
+
+---
+
+## Example substitutions (non-secret)
+
+| Placeholder | Example |
+|-------------|---------|
+| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` |
+| `{{TOPICS_LIST}}` | `- user.created` |
+| `{{TEST_DESTINATION_URL}}` | The Hookdeck Console **Source** URL that the dashboard supplies (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` |
+| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) |
+| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL |
+
+---
+
+## Dashboard implementation note
+
+When this text is embedded in the Hookdeck product, the **same** template body should be rendered from a single dashboard/backend source so that docs and product stay aligned. The MDX page in this repo is the **canonical** documentation copy until the product source is wired to match it.
diff --git a/docs/agent-evaluation/package-lock.json b/docs/agent-evaluation/package-lock.json
new file mode 100644
index 000000000..12d5ab75e
--- /dev/null
+++ b/docs/agent-evaluation/package-lock.json
@@ -0,0 +1,2096 @@
+{
+  "name": "outpost-agent-evaluation",
+  "version": "1.0.0",
+  "lockfileVersion": 3,
+  "requires": true,
+  "packages": {
+    "": {
+      "name": "outpost-agent-evaluation",
+      "version": "1.0.0",
+      "dependencies": {
+        "@anthropic-ai/claude-agent-sdk": "^0.2.92",
+        "dotenv": "^16.4.7"
+      },
+      "devDependencies": {
+        "tsx": "^4.19.4",
+        "typescript": "^5.8.3"
+      },
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@anthropic-ai/claude-agent-sdk": {
+      "version": "0.2.92",
+      "resolved": "https://registry.npmjs.org/@anthropic-ai/claude-agent-sdk/-/claude-agent-sdk-0.2.92.tgz",
+      "integrity": "sha512-loYyxVUC5gBwHjGi9Fv0b84mduJTp9Z3Pum+y/7IVQDb4NynKfVQl6l4VeDKZaW+1QTQtd25tY4hwUznD7Krqw==",
+      "license": "SEE LICENSE IN README.md",
+      "dependencies": {
+        "@anthropic-ai/sdk": "^0.80.0",
+        "@modelcontextprotocol/sdk": "^1.27.1"
+      },
+      "engines": {
+        "node": ">=18.0.0"
+      },
+      "optionalDependencies": {
+        "@img/sharp-darwin-arm64": "^0.34.2",
+        "@img/sharp-darwin-x64": "^0.34.2",
+        "@img/sharp-linux-arm": "^0.34.2",
+        "@img/sharp-linux-arm64": "^0.34.2",
+        "@img/sharp-linux-x64": "^0.34.2",
+        "@img/sharp-linuxmusl-arm64": "^0.34.2",
+        "@img/sharp-linuxmusl-x64": "^0.34.2",
+        "@img/sharp-win32-arm64": "^0.34.2",
+        "@img/sharp-win32-x64": "^0.34.2"
+      },
+      "peerDependencies": {
+        "zod": "^4.0.0"
+      }
+    },
+    "node_modules/@anthropic-ai/sdk": {
+      "version": "0.80.0",
+      "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.80.0.tgz",
+      "integrity": "sha512-WeXLn7zNVk3yjeshn+xZHvld6AoFUOR3Sep6pSoHho5YbSi6HwcirqgPA5ccFuW8QTVJAAU7N8uQQC6Wa9TG+g==",
+      "license": "MIT",
+      "dependencies": {
+        "json-schema-to-ts": "^3.1.1"
+      },
+      "bin": {
+        "anthropic-ai-sdk": "bin/cli"
+      },
+      "peerDependencies": {
+        "zod": "^3.25.0 || ^4.0.0"
+      },
+      "peerDependenciesMeta": {
+        "zod": {
+          "optional": true
+        }
+      }
+    },
+    "node_modules/@babel/runtime": {
+      "version": "7.29.2",
+      "resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.2.tgz",
+      "integrity": "sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=6.9.0"
+      }
+    },
+    "node_modules/@esbuild/aix-ppc64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.27.7.tgz",
+      "integrity": "sha512-EKX3Qwmhz1eMdEJokhALr0YiD0lhQNwDqkPYyPhiSwKrh7/4KRjQc04sZ8db+5DVVnZ1LmbNDI1uAMPEUBnQPg==",
+      "cpu": [
+        "ppc64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "aix"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/android-arm": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.27.7.tgz",
+      "integrity": "sha512-jbPXvB4Yj2yBV7HUfE2KHe4GJX51QplCN1pGbYjvsyCZbQmies29EoJbkEc+vYuU5o45AfQn37vZlyXy4YJ8RQ==",
+      "cpu": [
+        "arm"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "android"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/android-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.27.7.tgz",
+      "integrity": "sha512-62dPZHpIXzvChfvfLJow3q5dDtiNMkwiRzPylSCfriLvZeq0a1bWChrGx/BbUbPwOrsWKMn8idSllklzBy+dgQ==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "android"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/android-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.27.7.tgz",
+      "integrity": "sha512-x5VpMODneVDb70PYV2VQOmIUUiBtY3D3mPBG8NxVk5CogneYhkR7MmM3yR/uMdITLrC1ml/NV1rj4bMJuy9MCg==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "android"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/darwin-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.27.7.tgz",
+      "integrity": "sha512-5lckdqeuBPlKUwvoCXIgI2D9/ABmPq3Rdp7IfL70393YgaASt7tbju3Ac+ePVi3KDH6N2RqePfHnXkaDtY9fkw==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/darwin-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.27.7.tgz",
+      "integrity": "sha512-rYnXrKcXuT7Z+WL5K980jVFdvVKhCHhUwid+dDYQpH+qu+TefcomiMAJpIiC2EM3Rjtq0sO3StMV/+3w3MyyqQ==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/freebsd-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.27.7.tgz",
+      "integrity": "sha512-B48PqeCsEgOtzME2GbNM2roU29AMTuOIN91dsMO30t+Ydis3z/3Ngoj5hhnsOSSwNzS+6JppqWsuhTp6E82l2w==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "freebsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/freebsd-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.27.7.tgz",
+      "integrity": "sha512-jOBDK5XEjA4m5IJK3bpAQF9/Lelu/Z9ZcdhTRLf4cajlB+8VEhFFRjWgfy3M1O4rO2GQ/b2dLwCUGpiF/eATNQ==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "freebsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-arm": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.27.7.tgz",
+      "integrity": "sha512-RkT/YXYBTSULo3+af8Ib0ykH8u2MBh57o7q/DAs3lTJlyVQkgQvlrPTnjIzzRPQyavxtPtfg0EopvDyIt0j1rA==",
+      "cpu": [
+        "arm"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.27.7.tgz",
+      "integrity": "sha512-RZPHBoxXuNnPQO9rvjh5jdkRmVizktkT7TCDkDmQ0W2SwHInKCAV95GRuvdSvA7w4VMwfCjUiPwDi0ZO6Nfe9A==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-ia32": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.27.7.tgz",
+      "integrity": "sha512-GA48aKNkyQDbd3KtkplYWT102C5sn/EZTY4XROkxONgruHPU72l+gW+FfF8tf2cFjeHaRbWpOYa/uRBz/Xq1Pg==",
+      "cpu": [
+        "ia32"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-loong64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.27.7.tgz",
+      "integrity": "sha512-a4POruNM2oWsD4WKvBSEKGIiWQF8fZOAsycHOt6JBpZ+JN2n2JH9WAv56SOyu9X5IqAjqSIPTaJkqN8F7XOQ5Q==",
+      "cpu": [
+        "loong64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-mips64el": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.27.7.tgz",
+      "integrity": "sha512-KabT5I6StirGfIz0FMgl1I+R1H73Gp0ofL9A3nG3i/cYFJzKHhouBV5VWK1CSgKvVaG4q1RNpCTR2LuTVB3fIw==",
+      "cpu": [
+        "mips64el"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-ppc64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.27.7.tgz",
+      "integrity": "sha512-gRsL4x6wsGHGRqhtI+ifpN/vpOFTQtnbsupUF5R5YTAg+y/lKelYR1hXbnBdzDjGbMYjVJLJTd2OFmMewAgwlQ==",
+      "cpu": [
+        "ppc64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-riscv64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.27.7.tgz",
+      "integrity": "sha512-hL25LbxO1QOngGzu2U5xeXtxXcW+/GvMN3ejANqXkxZ/opySAZMrc+9LY/WyjAan41unrR3YrmtTsUpwT66InQ==",
+      "cpu": [
+        "riscv64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-s390x": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.27.7.tgz",
+      "integrity": "sha512-2k8go8Ycu1Kb46vEelhu1vqEP+UeRVj2zY1pSuPdgvbd5ykAw82Lrro28vXUrRmzEsUV0NzCf54yARIK8r0fdw==",
+      "cpu": [
+        "s390x"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/linux-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.27.7.tgz",
+      "integrity": "sha512-hzznmADPt+OmsYzw1EE33ccA+HPdIqiCRq7cQeL1Jlq2gb1+OyWBkMCrYGBJ+sxVzve2ZJEVeePbLM2iEIZSxA==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/netbsd-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.27.7.tgz",
+      "integrity": "sha512-b6pqtrQdigZBwZxAn1UpazEisvwaIDvdbMbmrly7cDTMFnw/+3lVxxCTGOrkPVnsYIosJJXAsILG9XcQS+Yu6w==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "netbsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/netbsd-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.27.7.tgz",
+      "integrity": "sha512-OfatkLojr6U+WN5EDYuoQhtM+1xco+/6FSzJJnuWiUw5eVcicbyK3dq5EeV/QHT1uy6GoDhGbFpprUiHUYggrw==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "netbsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/openbsd-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.27.7.tgz",
+      "integrity": "sha512-AFuojMQTxAz75Fo8idVcqoQWEHIXFRbOc1TrVcFSgCZtQfSdc1RXgB3tjOn/krRHENUB4j00bfGjyl2mJrU37A==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "openbsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/openbsd-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.27.7.tgz",
+      "integrity": "sha512-+A1NJmfM8WNDv5CLVQYJ5PshuRm/4cI6WMZRg1by1GwPIQPCTs1GLEUHwiiQGT5zDdyLiRM/l1G0Pv54gvtKIg==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "openbsd"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/openharmony-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.27.7.tgz",
+      "integrity": "sha512-+KrvYb/C8zA9CU/g0sR6w2RBw7IGc5J2BPnc3dYc5VJxHCSF1yNMxTV5LQ7GuKteQXZtspjFbiuW5/dOj7H4Yw==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "openharmony"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/sunos-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.27.7.tgz",
+      "integrity": "sha512-ikktIhFBzQNt/QDyOL580ti9+5mL/YZeUPKU2ivGtGjdTYoqz6jObj6nOMfhASpS4GU4Q/Clh1QtxWAvcYKamA==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "sunos"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/win32-arm64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.27.7.tgz",
+      "integrity": "sha512-7yRhbHvPqSpRUV7Q20VuDwbjW5kIMwTHpptuUzV+AA46kiPze5Z7qgt6CLCK3pWFrHeNfDd1VKgyP4O+ng17CA==",
+      "cpu": [
+        "arm64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "win32"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/win32-ia32": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.27.7.tgz",
+      "integrity": "sha512-SmwKXe6VHIyZYbBLJrhOoCJRB/Z1tckzmgTLfFYOfpMAx63BJEaL9ExI8x7v0oAO3Zh6D/Oi1gVxEYr5oUCFhw==",
+      "cpu": [
+        "ia32"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "win32"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@esbuild/win32-x64": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.7.tgz",
+      "integrity": "sha512-56hiAJPhwQ1R4i+21FVF7V8kSD5zZTdHcVuRFMW0hn753vVfQN8xlx4uOPT4xoGH0Z/oVATuR82AiqSTDIpaHg==",
+      "cpu": [
+        "x64"
+      ],
+      "dev": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "win32"
+      ],
+      "engines": {
+        "node": ">=18"
+      }
+    },
+    "node_modules/@hono/node-server": {
+      "version": "1.19.13",
+      "resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.19.13.tgz",
+      "integrity": "sha512-TsQLe4i2gvoTtrHje625ngThGBySOgSK3Xo2XRYOdqGN1teR8+I7vchQC46uLJi8OF62YTYA3AhSpumtkhsaKQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=18.14.1"
+      },
+      "peerDependencies": {
+        "hono": "^4"
+      }
+    },
+    "node_modules/@img/sharp-darwin-arm64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-darwin-arm64/-/sharp-darwin-arm64-0.34.5.tgz",
+      "integrity": "sha512-imtQ3WMJXbMY4fxb/Ndp6HBTNVtWCUI0WdobyheGf5+ad6xX8VIDO8u2xE4qc/fr08CKG/7dDseFtn6M6g/r3w==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-darwin-arm64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-darwin-x64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-darwin-x64/-/sharp-darwin-x64-0.34.5.tgz",
+      "integrity": "sha512-YNEFAF/4KQ/PeW0N+r+aVVsoIY0/qxxikF2SWdp+NRkmMB7y9LBZAVqQ4yhGCm/H3H270OSykqmQMKLBhBJDEw==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-darwin-x64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-libvips-darwin-arm64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-arm64/-/sharp-libvips-darwin-arm64-1.2.4.tgz",
+      "integrity": "sha512-zqjjo7RatFfFoP0MkQ51jfuFZBnVE2pRiaydKJ1G/rHZvnsrHAOcQALIi9sA5co5xenQdTugCvtb1cuf78Vf4g==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-darwin-x64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-x64/-/sharp-libvips-darwin-x64-1.2.4.tgz",
+      "integrity": "sha512-1IOd5xfVhlGwX+zXv2N93k0yMONvUlANylbJw1eTah8K/Jtpi15KC+WSiaX/nBmbm2HxRM1gZ0nSdjSsrZbGKg==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-linux-arm": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm/-/sharp-libvips-linux-arm-1.2.4.tgz",
+      "integrity": "sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A==",
+      "cpu": [
+        "arm"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-linux-arm64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm64/-/sharp-libvips-linux-arm64-1.2.4.tgz",
+      "integrity": "sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-linux-x64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-x64/-/sharp-libvips-linux-x64-1.2.4.tgz",
+      "integrity": "sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-linuxmusl-arm64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-arm64/-/sharp-libvips-linuxmusl-arm64-1.2.4.tgz",
+      "integrity": "sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-libvips-linuxmusl-x64": {
+      "version": "1.2.4",
+      "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-x64/-/sharp-libvips-linuxmusl-x64-1.2.4.tgz",
+      "integrity": "sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-linux-arm": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm/-/sharp-linux-arm-0.34.5.tgz",
+      "integrity": "sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw==",
+      "cpu": [
+        "arm"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-linux-arm": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-linux-arm64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm64/-/sharp-linux-arm64-0.34.5.tgz",
+      "integrity": "sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-linux-arm64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-linux-x64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-linux-x64/-/sharp-linux-x64-0.34.5.tgz",
+      "integrity": "sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-linux-x64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-linuxmusl-arm64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-arm64/-/sharp-linuxmusl-arm64-0.34.5.tgz",
+      "integrity": "sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-linuxmusl-arm64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-linuxmusl-x64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-x64/-/sharp-linuxmusl-x64-0.34.5.tgz",
+      "integrity": "sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "Apache-2.0",
+      "optional": true,
+      "os": [
+        "linux"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      },
+      "optionalDependencies": {
+        "@img/sharp-libvips-linuxmusl-x64": "1.2.4"
+      }
+    },
+    "node_modules/@img/sharp-win32-arm64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-win32-arm64/-/sharp-win32-arm64-0.34.5.tgz",
+      "integrity": "sha512-WQ3AgWCWYSb2yt+IG8mnC6Jdk9Whs7O0gxphblsLvdhSpSTtmu69ZG1Gkb6NuvxsNACwiPV6cNSZNzt0KPsw7g==",
+      "cpu": [
+        "arm64"
+      ],
+      "license": "Apache-2.0 AND LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "win32"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@img/sharp-win32-x64": {
+      "version": "0.34.5",
+      "resolved": "https://registry.npmjs.org/@img/sharp-win32-x64/-/sharp-win32-x64-0.34.5.tgz",
+      "integrity": "sha512-+29YMsqY2/9eFEiW93eqWnuLcWcufowXewwSNIT6UwZdUUCrM3oFjMWH/Z6/TMmb4hlFenmfAVbpWeup2jryCw==",
+      "cpu": [
+        "x64"
+      ],
+      "license": "Apache-2.0 AND LGPL-3.0-or-later",
+      "optional": true,
+      "os": [
+        "win32"
+      ],
+      "engines": {
+        "node": "^18.17.0 || ^20.3.0 || >=21.0.0"
+      },
+      "funding": {
+        "url": "https://opencollective.com/libvips"
+      }
+    },
+    "node_modules/@modelcontextprotocol/sdk": {
+      "version": "1.29.0",
+      "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.29.0.tgz",
+      "integrity": "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ==",
+      "license": "MIT",
+      "dependencies": {
+        "@hono/node-server": "^1.19.9",
+        "ajv": "^8.17.1",
+        "ajv-formats": "^3.0.1",
+        "content-type": "^1.0.5",
+        "cors": "^2.8.5",
+        "cross-spawn": "^7.0.5",
+        "eventsource": "^3.0.2",
+        "eventsource-parser": "^3.0.0",
+        "express": "^5.2.1",
+        "express-rate-limit": "^8.2.1",
+        "hono": "^4.11.4",
+        "jose": "^6.1.3",
+        "json-schema-typed": "^8.0.2",
+        "pkce-challenge": "^5.0.0",
+        "raw-body": "^3.0.0",
+        "zod": "^3.25 || ^4.0",
+        "zod-to-json-schema": "^3.25.1"
+      },
+      "engines": {
+        "node": ">=18"
+      },
+      "peerDependencies": {
+        "@cfworker/json-schema": "^4.1.1",
+        "zod": "^3.25 || ^4.0"
+      },
+      "peerDependenciesMeta": {
+        "@cfworker/json-schema": {
+          "optional": true
+        },
+        "zod": {
+          "optional": false
+        }
+      }
+    },
+    "node_modules/accepts": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz",
+      "integrity": "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==",
+      "license": "MIT",
+      "dependencies": {
+        "mime-types": "^3.0.0",
+        "negotiator": "^1.0.0"
+      },
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/ajv": {
+      "version": "8.18.0",
+      "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.18.0.tgz",
+      "integrity": "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A==",
+      "license": "MIT",
+      "dependencies": {
+        "fast-deep-equal": "^3.1.3",
+        "fast-uri": "^3.0.1",
+        "json-schema-traverse": "^1.0.0",
+        "require-from-string": "^2.0.2"
+      },
+      "funding": {
+        "type": "github",
+        "url": "https://github.com/sponsors/epoberezkin"
+      }
+    },
+    "node_modules/ajv-formats": {
+      "version": "3.0.1",
+      "resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-3.0.1.tgz",
+      "integrity": "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==",
+      "license": "MIT",
+      "dependencies": {
+        "ajv": "^8.0.0"
+      },
+      "peerDependencies": {
+        "ajv": "^8.0.0"
+      },
+      "peerDependenciesMeta": {
+        "ajv": {
+          "optional": true
+        }
+      }
+    },
+    "node_modules/body-parser": {
+      "version": "2.2.2",
+      "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-2.2.2.tgz",
+      "integrity": "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA==",
+      "license": "MIT",
+      "dependencies": {
+        "bytes": "^3.1.2",
+        "content-type": "^1.0.5",
+        "debug": "^4.4.3",
+        "http-errors": "^2.0.0",
+        "iconv-lite": "^0.7.0",
+        "on-finished": "^2.4.1",
+        "qs": "^6.14.1",
+        "raw-body": "^3.0.1",
+        "type-is": "^2.0.1"
+      },
+      "engines": {
+        "node": ">=18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/bytes": {
+      "version": "3.1.2",
+      "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz",
+      "integrity": "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/call-bind-apply-helpers": {
+      "version": "1.0.2",
+      "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz",
+      "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==",
+      "license": "MIT",
+      "dependencies": {
+        "es-errors": "^1.3.0",
+        "function-bind": "^1.1.2"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/call-bound": {
+      "version": "1.0.4",
+      "resolved": "https://registry.npmjs.org/call-bound/-/call-bound-1.0.4.tgz",
+      "integrity": "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg==",
+      "license": "MIT",
+      "dependencies": {
+        "call-bind-apply-helpers": "^1.0.2",
+        "get-intrinsic": "^1.3.0"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/content-disposition": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.0.1.tgz",
+      "integrity": "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/content-type": {
+      "version": "1.0.5",
+      "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.5.tgz",
+      "integrity": "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/cookie": {
+      "version": "0.7.2",
+      "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz",
+      "integrity": "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/cookie-signature": {
+      "version": "1.2.2",
+      "resolved": "https://registry.npmjs.org/cookie-signature/-/cookie-signature-1.2.2.tgz",
+      "integrity": "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=6.6.0"
+      }
+    },
+    "node_modules/cors": {
+      "version": "2.8.6",
+      "resolved": "https://registry.npmjs.org/cors/-/cors-2.8.6.tgz",
+      "integrity": "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw==",
+      "license": "MIT",
+      "dependencies": {
+        "object-assign": "^4",
+        "vary": "^1"
+      },
+      "engines": {
+        "node": ">= 0.10"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/cross-spawn": {
+      "version": "7.0.6",
+      "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz",
+      "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==",
+      "license": "MIT",
+      "dependencies": {
+        "path-key": "^3.1.0",
+        "shebang-command": "^2.0.0",
+        "which": "^2.0.1"
+      },
+      "engines": {
+        "node": ">= 8"
+      }
+    },
+    "node_modules/debug": {
+      "version": "4.4.3",
+      "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz",
+      "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==",
+      "license": "MIT",
+      "dependencies": {
+        "ms": "^2.1.3"
+      },
+      "engines": {
+        "node": ">=6.0"
+      },
+      "peerDependenciesMeta": {
+        "supports-color": {
+          "optional": true
+        }
+      }
+    },
+    "node_modules/depd": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz",
+      "integrity": "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/dotenv": {
+      "version": "16.6.1",
+      "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz",
+      "integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==",
+      "license": "BSD-2-Clause",
+      "engines": {
+        "node": ">=12"
+      },
+      "funding": {
+        "url": "https://dotenvx.com"
+      }
+    },
+    "node_modules/dunder-proto": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
+      "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==",
+      "license": "MIT",
+      "dependencies": {
+        "call-bind-apply-helpers": "^1.0.1",
+        "es-errors": "^1.3.0",
+        "gopd": "^1.2.0"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/ee-first": {
+      "version": "1.1.1",
+      "resolved": "https://registry.npmjs.org/ee-first/-/ee-first-1.1.1.tgz",
+      "integrity": "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==",
+      "license": "MIT"
+    },
+    "node_modules/encodeurl": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz",
+      "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/es-define-property": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz",
+      "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/es-errors": {
+      "version": "1.3.0",
+      "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
+      "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/es-object-atoms": {
+      "version": "1.1.1",
+      "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz",
+      "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==",
+      "license": "MIT",
+      "dependencies": {
+        "es-errors": "^1.3.0"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/esbuild": {
+      "version": "0.27.7",
+      "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.27.7.tgz",
+      "integrity": "sha512-IxpibTjyVnmrIQo5aqNpCgoACA/dTKLTlhMHihVHhdkxKyPO1uBBthumT0rdHmcsk9uMonIWS0m4FljWzILh3w==",
+      "dev": true,
+      "hasInstallScript": true,
+      "license": "MIT",
+      "bin": {
+        "esbuild": "bin/esbuild"
+      },
+      "engines": {
+        "node": ">=18"
+      },
+      "optionalDependencies": {
+        "@esbuild/aix-ppc64": "0.27.7",
+        "@esbuild/android-arm": "0.27.7",
+        "@esbuild/android-arm64": "0.27.7",
+        "@esbuild/android-x64": "0.27.7",
+        "@esbuild/darwin-arm64": "0.27.7",
+        "@esbuild/darwin-x64": "0.27.7",
+        "@esbuild/freebsd-arm64": "0.27.7",
+        "@esbuild/freebsd-x64": "0.27.7",
+        "@esbuild/linux-arm": "0.27.7",
+        "@esbuild/linux-arm64": "0.27.7",
+        "@esbuild/linux-ia32": "0.27.7",
+        "@esbuild/linux-loong64": "0.27.7",
+        "@esbuild/linux-mips64el": "0.27.7",
+        "@esbuild/linux-ppc64": "0.27.7",
+        "@esbuild/linux-riscv64": "0.27.7",
+        "@esbuild/linux-s390x": "0.27.7",
+        "@esbuild/linux-x64": "0.27.7",
+        "@esbuild/netbsd-arm64": "0.27.7",
+        "@esbuild/netbsd-x64": "0.27.7",
+        "@esbuild/openbsd-arm64": "0.27.7",
+        "@esbuild/openbsd-x64": "0.27.7",
+        "@esbuild/openharmony-arm64": "0.27.7",
+        "@esbuild/sunos-x64": "0.27.7",
+        "@esbuild/win32-arm64": "0.27.7",
+        "@esbuild/win32-ia32": "0.27.7",
+        "@esbuild/win32-x64": "0.27.7"
+      }
+    },
+    "node_modules/escape-html": {
+      "version": "1.0.3",
+      "resolved": "https://registry.npmjs.org/escape-html/-/escape-html-1.0.3.tgz",
+      "integrity": "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==",
+      "license": "MIT"
+    },
+    "node_modules/etag": {
+      "version": "1.8.1",
+      "resolved": "https://registry.npmjs.org/etag/-/etag-1.8.1.tgz",
+      "integrity": "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/eventsource": {
+      "version": "3.0.7",
+      "resolved": "https://registry.npmjs.org/eventsource/-/eventsource-3.0.7.tgz",
+      "integrity": "sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA==",
+      "license": "MIT",
+      "dependencies": {
+        "eventsource-parser": "^3.0.1"
+      },
+      "engines": {
+        "node": ">=18.0.0"
+      }
+    },
+    "node_modules/eventsource-parser": {
+      "version": "3.0.6",
+      "resolved": "https://registry.npmjs.org/eventsource-parser/-/eventsource-parser-3.0.6.tgz",
+      "integrity": "sha512-Vo1ab+QXPzZ4tCa8SwIHJFaSzy4R6SHf7BY79rFBDf0idraZWAkYrDjDj8uWaSm3S2TK+hJ7/t1CEmZ7jXw+pg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=18.0.0"
+      }
+    },
+    "node_modules/express": {
+      "version": "5.2.1",
+      "resolved": "https://registry.npmjs.org/express/-/express-5.2.1.tgz",
+      "integrity": "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw==",
+      "license": "MIT",
+      "peer": true,
+      "dependencies": {
+        "accepts": "^2.0.0",
+        "body-parser": "^2.2.1",
+        "content-disposition": "^1.0.0",
+        "content-type": "^1.0.5",
+        "cookie": "^0.7.1",
+        "cookie-signature": "^1.2.1",
+        "debug": "^4.4.0",
+        "depd": "^2.0.0",
+        "encodeurl": "^2.0.0",
+        "escape-html": "^1.0.3",
+        "etag": "^1.8.1",
+        "finalhandler": "^2.1.0",
+        "fresh": "^2.0.0",
+        "http-errors": "^2.0.0",
+        "merge-descriptors": "^2.0.0",
+        "mime-types": "^3.0.0",
+        "on-finished": "^2.4.1",
+        "once": "^1.4.0",
+        "parseurl": "^1.3.3",
+        "proxy-addr": "^2.0.7",
+        "qs": "^6.14.0",
+        "range-parser": "^1.2.1",
+        "router": "^2.2.0",
+        "send": "^1.1.0",
+        "serve-static": "^2.2.0",
+        "statuses": "^2.0.1",
+        "type-is": "^2.0.1",
+        "vary": "^1.1.2"
+      },
+      "engines": {
+        "node": ">= 18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/express-rate-limit": {
+      "version": "8.3.2",
+      "resolved": "https://registry.npmjs.org/express-rate-limit/-/express-rate-limit-8.3.2.tgz",
+      "integrity": "sha512-77VmFeJkO0/rvimEDuUC5H30oqUC4EyOhyGccfqoLebB0oiEYfM7nwPrsDsBL1gsTpwfzX8SFy2MT3TDyRq+bg==",
+      "license": "MIT",
+      "dependencies": {
+        "ip-address": "10.1.0"
+      },
+      "engines": {
+        "node": ">= 16"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/express-rate-limit"
+      },
+      "peerDependencies": {
+        "express": ">= 4.11"
+      }
+    },
+    "node_modules/fast-deep-equal": {
+      "version": "3.1.3",
+      "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz",
+      "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==",
+      "license": "MIT"
+    },
+    "node_modules/fast-uri": {
+      "version": "3.1.0",
+      "resolved": "https://registry.npmjs.org/fast-uri/-/fast-uri-3.1.0.tgz",
+      "integrity": "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==",
+      "funding": [
+        {
+          "type": "github",
+          "url": "https://github.com/sponsors/fastify"
+        },
+        {
+          "type": "opencollective",
+          "url": "https://opencollective.com/fastify"
+        }
+      ],
+      "license": "BSD-3-Clause"
+    },
+    "node_modules/finalhandler": {
+      "version": "2.1.1",
+      "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-2.1.1.tgz",
+      "integrity": "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA==",
+      "license": "MIT",
+      "dependencies": {
+        "debug": "^4.4.0",
+        "encodeurl": "^2.0.0",
+        "escape-html": "^1.0.3",
+        "on-finished": "^2.4.1",
+        "parseurl": "^1.3.3",
+        "statuses": "^2.0.1"
+      },
+      "engines": {
+        "node": ">= 18.0.0"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/forwarded": {
+      "version": "0.2.0",
+      "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz",
+      "integrity": "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/fresh": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/fresh/-/fresh-2.0.0.tgz",
+      "integrity": "sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/fsevents": {
+      "version": "2.3.3",
+      "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz",
+      "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==",
+      "dev": true,
+      "hasInstallScript": true,
+      "license": "MIT",
+      "optional": true,
+      "os": [
+        "darwin"
+      ],
+      "engines": {
+        "node": "^8.16.0 || ^10.6.0 || >=11.0.0"
+      }
+    },
+    "node_modules/function-bind": {
+      "version": "1.1.2",
+      "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
+      "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
+      "license": "MIT",
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/get-intrinsic": {
+      "version": "1.3.0",
+      "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz",
+      "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==",
+      "license": "MIT",
+      "dependencies": {
+        "call-bind-apply-helpers": "^1.0.2",
+        "es-define-property": "^1.0.1",
+        "es-errors": "^1.3.0",
+        "es-object-atoms": "^1.1.1",
+        "function-bind": "^1.1.2",
+        "get-proto": "^1.0.1",
+        "gopd": "^1.2.0",
+        "has-symbols": "^1.1.0",
+        "hasown": "^2.0.2",
+        "math-intrinsics": "^1.1.0"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/get-proto": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz",
+      "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==",
+      "license": "MIT",
+      "dependencies": {
+        "dunder-proto": "^1.0.1",
+        "es-object-atoms": "^1.0.0"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/get-tsconfig": {
+      "version": "4.13.7",
+      "resolved": "https://registry.npmjs.org/get-tsconfig/-/get-tsconfig-4.13.7.tgz",
+      "integrity": "sha512-7tN6rFgBlMgpBML5j8typ92BKFi2sFQvIdpAqLA2beia5avZDrMs0FLZiM5etShWq5irVyGcGMEA1jcDaK7A/Q==",
+      "dev": true,
+      "license": "MIT",
+      "dependencies": {
+        "resolve-pkg-maps": "^1.0.0"
+      },
+      "funding": {
+        "url": "https://github.com/privatenumber/get-tsconfig?sponsor=1"
+      }
+    },
+    "node_modules/gopd": {
+      "version": "1.2.0",
+      "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz",
+      "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/has-symbols": {
+      "version": "1.1.0",
+      "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
+      "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/hasown": {
+      "version": "2.0.2",
+      "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
+      "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==",
+      "license": "MIT",
+      "dependencies": {
+        "function-bind": "^1.1.2"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/hono": {
+      "version": "4.12.12",
+      "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.12.tgz",
+      "integrity": "sha512-p1JfQMKaceuCbpJKAPKVqyqviZdS0eUxH9v82oWo1kb9xjQ5wA6iP3FNVAPDFlz5/p7d45lO+BpSk1tuSZMF4Q==",
+      "license": "MIT",
+      "peer": true,
+      "engines": {
+        "node": ">=16.9.0"
+      }
+    },
+    "node_modules/http-errors": {
+      "version": "2.0.1",
+      "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.1.tgz",
+      "integrity": "sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==",
+      "license": "MIT",
+      "dependencies": {
+        "depd": "~2.0.0",
+        "inherits": "~2.0.4",
+        "setprototypeof": "~1.2.0",
+        "statuses": "~2.0.2",
+        "toidentifier": "~1.0.1"
+      },
+      "engines": {
+        "node": ">= 0.8"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/iconv-lite": {
+      "version": "0.7.2",
+      "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.7.2.tgz",
+      "integrity": "sha512-im9DjEDQ55s9fL4EYzOAv0yMqmMBSZp6G0VvFyTMPKWxiSBHUj9NW/qqLmXUwXrrM7AvqSlTCfvqRb0cM8yYqw==",
+      "license": "MIT",
+      "dependencies": {
+        "safer-buffer": ">= 2.1.2 < 3.0.0"
+      },
+      "engines": {
+        "node": ">=0.10.0"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/inherits": {
+      "version": "2.0.4",
+      "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz",
+      "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==",
+      "license": "ISC"
+    },
+    "node_modules/ip-address": {
+      "version": "10.1.0",
+      "resolved": "https://registry.npmjs.org/ip-address/-/ip-address-10.1.0.tgz",
+      "integrity": "sha512-XXADHxXmvT9+CRxhXg56LJovE+bmWnEWB78LB83VZTprKTmaC5QfruXocxzTZ2Kl0DNwKuBdlIhjL8LeY8Sf8Q==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 12"
+      }
+    },
+    "node_modules/ipaddr.js": {
+      "version": "1.9.1",
+      "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-1.9.1.tgz",
+      "integrity": "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.10"
+      }
+    },
+    "node_modules/is-promise": {
+      "version": "4.0.0",
+      "resolved": "https://registry.npmjs.org/is-promise/-/is-promise-4.0.0.tgz",
+      "integrity": "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ==",
+      "license": "MIT"
+    },
+    "node_modules/isexe": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz",
+      "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==",
+      "license": "ISC"
+    },
+    "node_modules/jose": {
+      "version": "6.2.2",
+      "resolved": "https://registry.npmjs.org/jose/-/jose-6.2.2.tgz",
+      "integrity": "sha512-d7kPDd34KO/YnzaDOlikGpOurfF0ByC2sEV4cANCtdqLlTfBlw2p14O/5d/zv40gJPbIQxfES3nSx1/oYNyuZQ==",
+      "license": "MIT",
+      "funding": {
+        "url": "https://github.com/sponsors/panva"
+      }
+    },
+    "node_modules/json-schema-to-ts": {
+      "version": "3.1.1",
+      "resolved": "https://registry.npmjs.org/json-schema-to-ts/-/json-schema-to-ts-3.1.1.tgz",
+      "integrity": "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==",
+      "license": "MIT",
+      "dependencies": {
+        "@babel/runtime": "^7.18.3",
+        "ts-algebra": "^2.0.0"
+      },
+      "engines": {
+        "node": ">=16"
+      }
+    },
+    "node_modules/json-schema-traverse": {
+      "version": "1.0.0",
+      "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz",
+      "integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==",
+      "license": "MIT"
+    },
+    "node_modules/json-schema-typed": {
+      "version": "8.0.2",
+      "resolved": "https://registry.npmjs.org/json-schema-typed/-/json-schema-typed-8.0.2.tgz",
+      "integrity": "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==",
+      "license": "BSD-2-Clause"
+    },
+    "node_modules/math-intrinsics": {
+      "version": "1.1.0",
+      "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
+      "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      }
+    },
+    "node_modules/media-typer": {
+      "version": "1.1.0",
+      "resolved": "https://registry.npmjs.org/media-typer/-/media-typer-1.1.0.tgz",
+      "integrity": "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/merge-descriptors": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-2.0.0.tgz",
+      "integrity": "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=18"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/sindresorhus"
+      }
+    },
+    "node_modules/mime-db": {
+      "version": "1.54.0",
+      "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz",
+      "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/mime-types": {
+      "version": "3.0.2",
+      "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.2.tgz",
+      "integrity": "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==",
+      "license": "MIT",
+      "dependencies": {
+        "mime-db": "^1.54.0"
+      },
+      "engines": {
+        "node": ">=18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/ms": {
+      "version": "2.1.3",
+      "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
+      "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==",
+      "license": "MIT"
+    },
+    "node_modules/negotiator": {
+      "version": "1.0.0",
+      "resolved": "https://registry.npmjs.org/negotiator/-/negotiator-1.0.0.tgz",
+      "integrity": "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/object-assign": {
+      "version": "4.1.1",
+      "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
+      "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=0.10.0"
+      }
+    },
+    "node_modules/object-inspect": {
+      "version": "1.13.4",
+      "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz",
+      "integrity": "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/on-finished": {
+      "version": "2.4.1",
+      "resolved": "https://registry.npmjs.org/on-finished/-/on-finished-2.4.1.tgz",
+      "integrity": "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==",
+      "license": "MIT",
+      "dependencies": {
+        "ee-first": "1.1.1"
+      },
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/once": {
+      "version": "1.4.0",
+      "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz",
+      "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==",
+      "license": "ISC",
+      "dependencies": {
+        "wrappy": "1"
+      }
+    },
+    "node_modules/parseurl": {
+      "version": "1.3.3",
+      "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz",
+      "integrity": "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/path-key": {
+      "version": "3.1.1",
+      "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz",
+      "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=8"
+      }
+    },
+    "node_modules/path-to-regexp": {
+      "version": "8.4.2",
+      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.4.2.tgz",
+      "integrity": "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA==",
+      "license": "MIT",
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/pkce-challenge": {
+      "version": "5.0.1",
+      "resolved": "https://registry.npmjs.org/pkce-challenge/-/pkce-challenge-5.0.1.tgz",
+      "integrity": "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=16.20.0"
+      }
+    },
+    "node_modules/proxy-addr": {
+      "version": "2.0.7",
+      "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz",
+      "integrity": "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg==",
+      "license": "MIT",
+      "dependencies": {
+        "forwarded": "0.2.0",
+        "ipaddr.js": "1.9.1"
+      },
+      "engines": {
+        "node": ">= 0.10"
+      }
+    },
+    "node_modules/qs": {
+      "version": "6.15.0",
+      "resolved": "https://registry.npmjs.org/qs/-/qs-6.15.0.tgz",
+      "integrity": "sha512-mAZTtNCeetKMH+pSjrb76NAM8V9a05I9aBZOHztWy/UqcJdQYNsf59vrRKWnojAT9Y+GbIvoTBC++CPHqpDBhQ==",
+      "license": "BSD-3-Clause",
+      "dependencies": {
+        "side-channel": "^1.1.0"
+      },
+      "engines": {
+        "node": ">=0.6"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/range-parser": {
+      "version": "1.2.1",
+      "resolved": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz",
+      "integrity": "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/raw-body": {
+      "version": "3.0.2",
+      "resolved": "https://registry.npmjs.org/raw-body/-/raw-body-3.0.2.tgz",
+      "integrity": "sha512-K5zQjDllxWkf7Z5xJdV0/B0WTNqx6vxG70zJE4N0kBs4LovmEYWJzQGxC9bS9RAKu3bgM40lrd5zoLJ12MQ5BA==",
+      "license": "MIT",
+      "dependencies": {
+        "bytes": "~3.1.2",
+        "http-errors": "~2.0.1",
+        "iconv-lite": "~0.7.0",
+        "unpipe": "~1.0.0"
+      },
+      "engines": {
+        "node": ">= 0.10"
+      }
+    },
+    "node_modules/require-from-string": {
+      "version": "2.0.2",
+      "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz",
+      "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=0.10.0"
+      }
+    },
+    "node_modules/resolve-pkg-maps": {
+      "version": "1.0.0",
+      "resolved": "https://registry.npmjs.org/resolve-pkg-maps/-/resolve-pkg-maps-1.0.0.tgz",
+      "integrity": "sha512-seS2Tj26TBVOC2NIc2rOe2y2ZO7efxITtLZcGSOnHHNOQ7CkiUBfw0Iw2ck6xkIhPwLhKNLS8BO+hEpngQlqzw==",
+      "dev": true,
+      "license": "MIT",
+      "funding": {
+        "url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1"
+      }
+    },
+    "node_modules/router": {
+      "version": "2.2.0",
+      "resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz",
+      "integrity": "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ==",
+      "license": "MIT",
+      "dependencies": {
+        "debug": "^4.4.0",
+        "depd": "^2.0.0",
+        "is-promise": "^4.0.0",
+        "parseurl": "^1.3.3",
+        "path-to-regexp": "^8.0.0"
+      },
+      "engines": {
+        "node": ">= 18"
+      }
+    },
+    "node_modules/safer-buffer": {
+      "version": "2.1.2",
+      "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz",
+      "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==",
+      "license": "MIT"
+    },
+    "node_modules/send": {
+      "version": "1.2.1",
+      "resolved": "https://registry.npmjs.org/send/-/send-1.2.1.tgz",
+      "integrity": "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ==",
+      "license": "MIT",
+      "dependencies": {
+        "debug": "^4.4.3",
+        "encodeurl": "^2.0.0",
+        "escape-html": "^1.0.3",
+        "etag": "^1.8.1",
+        "fresh": "^2.0.0",
+        "http-errors": "^2.0.1",
+        "mime-types": "^3.0.2",
+        "ms": "^2.1.3",
+        "on-finished": "^2.4.1",
+        "range-parser": "^1.2.1",
+        "statuses": "^2.0.2"
+      },
+      "engines": {
+        "node": ">= 18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/serve-static": {
+      "version": "2.2.1",
+      "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-2.2.1.tgz",
+      "integrity": "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw==",
+      "license": "MIT",
+      "dependencies": {
+        "encodeurl": "^2.0.0",
+        "escape-html": "^1.0.3",
+        "parseurl": "^1.3.3",
+        "send": "^1.2.0"
+      },
+      "engines": {
+        "node": ">= 18"
+      },
+      "funding": {
+        "type": "opencollective",
+        "url": "https://opencollective.com/express"
+      }
+    },
+    "node_modules/setprototypeof": {
+      "version": "1.2.0",
+      "resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz",
+      "integrity": "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==",
+      "license": "ISC"
+    },
+    "node_modules/shebang-command": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz",
+      "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==",
+      "license": "MIT",
+      "dependencies": {
+        "shebang-regex": "^3.0.0"
+      },
+      "engines": {
+        "node": ">=8"
+      }
+    },
+    "node_modules/shebang-regex": {
+      "version": "3.0.0",
+      "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz",
+      "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=8"
+      }
+    },
+    "node_modules/side-channel": {
+      "version": "1.1.0",
+      "resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.1.0.tgz",
+      "integrity": "sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw==",
+      "license": "MIT",
+      "dependencies": {
+        "es-errors": "^1.3.0",
+        "object-inspect": "^1.13.3",
+        "side-channel-list": "^1.0.0",
+        "side-channel-map": "^1.0.1",
+        "side-channel-weakmap": "^1.0.2"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/side-channel-list": {
+      "version": "1.0.0",
+      "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.0.tgz",
+      "integrity": "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA==",
+      "license": "MIT",
+      "dependencies": {
+        "es-errors": "^1.3.0",
+        "object-inspect": "^1.13.3"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/side-channel-map": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/side-channel-map/-/side-channel-map-1.0.1.tgz",
+      "integrity": "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA==",
+      "license": "MIT",
+      "dependencies": {
+        "call-bound": "^1.0.2",
+        "es-errors": "^1.3.0",
+        "get-intrinsic": "^1.2.5",
+        "object-inspect": "^1.13.3"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/side-channel-weakmap": {
+      "version": "1.0.2",
+      "resolved": "https://registry.npmjs.org/side-channel-weakmap/-/side-channel-weakmap-1.0.2.tgz",
+      "integrity": "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A==",
+      "license": "MIT",
+      "dependencies": {
+        "call-bound": "^1.0.2",
+        "es-errors": "^1.3.0",
+        "get-intrinsic": "^1.2.5",
+        "object-inspect": "^1.13.3",
+        "side-channel-map": "^1.0.1"
+      },
+      "engines": {
+        "node": ">= 0.4"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/ljharb"
+      }
+    },
+    "node_modules/statuses": {
+      "version": "2.0.2",
+      "resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz",
+      "integrity": "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/toidentifier": {
+      "version": "1.0.1",
+      "resolved": "https://registry.npmjs.org/toidentifier/-/toidentifier-1.0.1.tgz",
+      "integrity": "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==",
+      "license": "MIT",
+      "engines": {
+        "node": ">=0.6"
+      }
+    },
+    "node_modules/ts-algebra": {
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/ts-algebra/-/ts-algebra-2.0.0.tgz",
+      "integrity": "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==",
+      "license": "MIT"
+    },
+    "node_modules/tsx": {
+      "version": "4.21.0",
+      "resolved": "https://registry.npmjs.org/tsx/-/tsx-4.21.0.tgz",
+      "integrity": "sha512-5C1sg4USs1lfG0GFb2RLXsdpXqBSEhAaA/0kPL01wxzpMqLILNxIxIOKiILz+cdg/pLnOUxFYOR5yhHU666wbw==",
+      "dev": true,
+      "license": "MIT",
+      "dependencies": {
+        "esbuild": "~0.27.0",
+        "get-tsconfig": "^4.7.5"
+      },
+      "bin": {
+        "tsx": "dist/cli.mjs"
+      },
+      "engines": {
+        "node": ">=18.0.0"
+      },
+      "optionalDependencies": {
+        "fsevents": "~2.3.3"
+      }
+    },
+    "node_modules/type-is": {
+      "version": "2.0.1",
+      "resolved": "https://registry.npmjs.org/type-is/-/type-is-2.0.1.tgz",
+      "integrity": "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==",
+      "license": "MIT",
+      "dependencies": {
+        "content-type": "^1.0.5",
+        "media-typer": "^1.1.0",
+        "mime-types": "^3.0.0"
+      },
+      "engines": {
+        "node": ">= 0.6"
+      }
+    },
+    "node_modules/typescript": {
+      "version": "5.9.3",
+      "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz",
+      "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
+      "dev": true,
+      "license": "Apache-2.0",
+      "bin": {
+        "tsc": "bin/tsc",
+        "tsserver": "bin/tsserver"
+      },
+      "engines": {
+        "node": ">=14.17"
+      }
+    },
+    "node_modules/unpipe": {
+      "version": "1.0.0",
+      "resolved": "https://registry.npmjs.org/unpipe/-/unpipe-1.0.0.tgz",
+      "integrity": "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/vary": {
+      "version": "1.1.2",
+      "resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
+      "integrity": "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==",
+      "license": "MIT",
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
+    "node_modules/which": {
+      "version": "2.0.2",
+      "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz",
+      "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==",
+      "license": "ISC",
+      "dependencies": {
+        "isexe": "^2.0.0"
+      },
+      "bin": {
+        "node-which": "bin/node-which"
+      },
+      "engines": {
+        "node": ">= 8"
+      }
+    },
+    "node_modules/wrappy": {
+      "version": "1.0.2",
+      "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz",
+      "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==",
+      "license": "ISC"
+    },
+    "node_modules/zod": {
+      "version": "4.3.6",
+      "resolved": "https://registry.npmjs.org/zod/-/zod-4.3.6.tgz",
+      "integrity": "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg==",
+      "license": "MIT",
+      "peer": true,
+      "funding": {
+        "url": "https://github.com/sponsors/colinhacks"
+      }
+    },
+    "node_modules/zod-to-json-schema": {
+      "version": "3.25.2",
+      "resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.25.2.tgz",
+      "integrity": "sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA==",
+      "license": "ISC",
+      "peerDependencies": {
+        "zod": "^3.25.28 || ^4"
+      }
+    }
+  }
+}
diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json
new file mode 100644
index 000000000..900af5e2d
--- /dev/null
+++ b/docs/agent-evaluation/package.json
@@ -0,0 +1,25 @@
+{
+  "name": "outpost-agent-evaluation",
+  "version": "1.0.0",
+  "private": true,
+  "type": "module",
+  "description": "Claude Agent SDK harness for Outpost onboarding scenario evals",
+  "scripts": {
+    "eval": "node --import tsx src/run-agent-eval.ts",
+    "eval:ci": "node --import tsx src/run-agent-eval.ts -- --scenarios 01,02",
+    "eval:tsx-cli": "tsx src/run-agent-eval.ts",
+    "score": "node --import tsx src/score-eval.ts",
+    "typecheck": "tsc --noEmit"
+  },
+  "engines": {
+    "node": ">=18"
+  },
+  "dependencies": {
+    "@anthropic-ai/claude-agent-sdk": "^0.2.92",
+    "dotenv": "^16.4.7"
+  },
+  "devDependencies": {
+    "tsx": "^4.19.4",
+    "typescript": "^5.8.3"
+  }
+}
diff --git a/docs/agent-evaluation/results/.gitignore b/docs/agent-evaluation/results/.gitignore
new file mode 100644
index 000000000..3a2f71330
--- /dev/null
+++ b/docs/agent-evaluation/results/.gitignore
@@ -0,0 +1,5 @@
+# Ignore local run recordings; keep README + template committed
+*
+!.gitignore
+!README.md
+!RUN-RECORDING.template.md
diff --git a/docs/agent-evaluation/results/README.md b/docs/agent-evaluation/results/README.md
new file mode 100644
index 000000000..0ed815986
--- /dev/null
+++ b/docs/agent-evaluation/results/README.md
@@ -0,0 +1,57 @@
+# Agent evaluation — results
+
+This directory holds **manual run write-ups** and, under **`runs/`**, **automated** artifacts from `npm run eval`. Almost everything here is **gitignored** by default (see [`.gitignore`](.gitignore)).
+
+Full workflow and env vars: **[`../README.md`](../README.md)**.
+
+---
+
+## Automated runs (`runs/`)
+
+From `docs/agent-evaluation/`:
+
+```sh
+npm run eval -- --scenario 01
+npm run eval -- --scenarios 01,02
+npm run eval -- --all
+```
+
+Each run is a **directory** (same timestamp stem, all gitignored):
+
+`runs/<stamp>-scenario-NN/`
+
+| Path in run dir                         | What it is                                                                                                                       |
+| --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
+| `transcript.json`                       | Full Claude Agent SDK transcript (`meta` + `messages`).                                                                          |
+| `heuristic-score.json`                  | **Heuristic** transcript checks ([`../src/score-transcript.ts`](../src/score-transcript.ts)); rubrics **01–10** (`scoreScenario01`–`10`). |
+| `llm-score.json`                        | **LLM judge** output ([`../src/llm-judge.ts`](../src/llm-judge.ts)) vs **`## Success criteria`** in the scenario markdown.       |
+| *(other files)*                         | Anything the agent **`Write`**s (e.g. `outpost-quickstart.sh`); SDK **`cwd`** is this directory.                                 |
+
+Legacy flat `runs/<stamp>-scenario-NN.json` (and `*.score.json` / `*.llm-score.json` beside it) still work with **`npm run score`**.
+
+Re-score an existing run without re-running the agent:
+
+```sh
+npm run score -- --run results/runs/<stamp>-scenario-NN --write
+npm run score -- --run results/runs/<stamp>-scenario-NN --llm --write
+```
+
+**Execution** results (curl/SDK runs against live Outpost with `OUTPOST_API_KEY`) are **not** captured in these JSON files. Treat the **Execution (full pass)** rows in [`../scenarios/`](../scenarios/) as a separate human or CI step unless you add a verifier script.
+
+---
+
+## Manual run recordings
+
+For **IDE-only** or ad-hoc runs (no `npm run eval`):
+
+1. Copy [`RUN-RECORDING.template.md`](RUN-RECORDING.template.md) to a **local-only** name (e.g. `2026-04-08-s01-cursor.md`) in this directory.
+2. Fill in transcript summary, heuristic/LLM pointers if you ran `npm run score` separately, **Execution verification**, and notes.
+3. Do not commit raw recordings unless your policy allows it; anonymized summaries in a PR are fine.
+
+Success criteria for every scenario: **[`../scenarios/*.md`](../scenarios/)**, section **Success criteria**.
+
+---
+
+## Template
+
+See [`RUN-RECORDING.template.md`](RUN-RECORDING.template.md).
\ No newline at end of file
diff --git a/docs/agent-evaluation/results/RUN-RECORDING.template.md b/docs/agent-evaluation/results/RUN-RECORDING.template.md
new file mode 100644
index 000000000..047b9fa84
--- /dev/null
+++ b/docs/agent-evaluation/results/RUN-RECORDING.template.md
@@ -0,0 +1,36 @@
+# Agent eval recording (copy this file, rename, fill in)
+
+**Scenario:** (e.g. `01-basics-curl` — link to `../scenarios/....md`)  
+**Date:** YYYY-MM-DD  
+**Agent / client:** (e.g. Cursor Agent, Claude Code, Copilot Chat)  
+**Model:** (if known)  
+**Outpost skill enabled?** yes / no  
+
+## Environment
+
+- Docs / prompt source: (commit SHA or “main @ date”)
+- Hookdeck project: throwaway / prod (describe)
+
+## Transcript summary
+
+(Optional bullets — do not paste secrets.)
+
+- Turn 0: …
+- Turn 1: …
+
+## Success criteria (from scenario doc)
+
+Copy the checklist from the scenario and mark **PASS** / **FAIL** / **N/A**.
+
+- …
+
+## Execution verification (full pass)
+
+Did you run the generated curl / script / app against a **live** Outpost project with **`OUTPOST_API_KEY`** (and related env vars)?
+
+- **Execution:** PASS / FAIL / SKIPPED (transcript-only)
+- Notes (HTTP status codes, error bodies — no secrets):
+
+## Notes / regressions
+
+…
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md
new file mode 100644
index 000000000..b7a491861
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/01-basics-curl.md
@@ -0,0 +1,48 @@
+# Scenario 1 — Basics with curl
+
+## Intent
+
+Agent should produce a **minimal shell + curl** flow against the **managed** API (no SDK), matching the official curl quickstart. Prefer a **single runnable shell script** (e.g. `outpost-quickstart.sh`) that sets variables and runs all curls, so the operator can `chmod +x` and run once; inline copy-paste blocks are acceptable if the user asked only for “commands.”
+
+## Preconditions
+
+- `OUTPOST_API_KEY` set in the environment (user states this; agent must not ask for the raw key in chat).
+- Topics include at least one topic used in the script (e.g. `user.created`).
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Save the shell script there with **Write** (e.g. `outpost-quickstart.sh`), not only as a fenced block in chat, so the run folder is reviewable on disk.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> I only want the basics using **curl** against the managed API. No SDK. Give me a **single shell script** I can save and run (e.g. `bash outpost-quickstart.sh`) that: creates a tenant, adds a webhook destination for my test URL, and publishes one event. Use the topic from the prompt. Use `OUTPOST_API_KEY` from the environment (document that I should `export` it or load `.env`). If you can’t provide a file, paste one script block I can save as `.sh`.
+
+### Turn 2 — User (optional probe)
+
+> Show me how to verify delivery after I run those commands.
+
+## Success criteria
+
+**Measurement:** Heuristic rubric `scoreScenario01` in [`../src/score-transcript.ts`](../src/score-transcript.ts) (assistant text + tool-written script content). LLM judge: `npm run score -- --run <run-dir> --llm`. Execution row remains manual.
+
+- Uses managed base URL `https://api.outpost.hookdeck.com/2025-07-01` (or explicit `OUTPOST_API_BASE_URL`), **not** `localhost:3333/api/v1`, unless the user asked for self-hosted.
+- Tenant: `PUT .../tenants/{tenant_id}` with `Authorization: Bearer` (or documents equivalent).
+- Destination: `POST .../tenants/{tenant_id}/destinations` with `type: webhook`, `topics` including the configured topic or `*`, and `config.url` pointing at a test HTTPS URL (env or placeholder).
+- Publish: `POST .../publish` with `tenant_id`, `topic`, and a top-level JSON field **`data`** (the event payload object; see OpenAPI `PublishRequest` and the curl quickstart), not `payload`. Typically also `eligible_for_retry`.
+- Delivers as one **shell script** (or one fenced `bash` block meant to be saved as `.sh`), not only three unrelated snippets without a shebang/variables.
+- Does **not** embed a pasted API key in the reply.
+- Verification mentions Hookdeck Console / dashboard logs if Turn 2 was asked.
+- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect **2xx** on tenant upsert and destination create, **202** (or documented success) on publish, and a visible delivery to the test webhook URL (Hookdeck Console / project logs, or `GET .../attempts` as appropriate). *Skip only if you are doing transcript-only triage.*
+
+## Failure modes to note
+
+- Wrong path (`PUT /{tenant}` without `/tenants/`).
+- Mixing self-hosted base path with managed host.
+- Skipping topic alignment with dashboard configuration.
+
diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md
new file mode 100644
index 000000000..9a2fc40a7
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md
@@ -0,0 +1,45 @@
+# Scenario 2 — Basics with TypeScript
+
+## Intent
+
+Agent should produce a **single runnable `.ts` file** using `@hookdeck/outpost-sdk`, following the managed TypeScript quickstart pattern.
+
+## Preconditions
+
+- Node 18+; user can run `npx tsx`.
+- `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` available as env vars.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Write the script and any `package.json` there with **Write** / **Edit**; use **Bash** for `npm install`, `npx tsx`, etc., so the folder is a runnable mini-project.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 1 — try it out. Use **TypeScript** only: one script file, use `@hookdeck/outpost-sdk`, read `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` from the environment. Create tenant, webhook destination for the topic in the prompt, publish one test event, print the event id.
+
+### Turn 2 — User (optional)
+
+> How do I run it?
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- Depends on `@hookdeck/outpost-sdk`; uses `Outpost` client with `apiKey` from `process.env.OUTPOST_API_KEY`.
+- Calls `tenants.upsert`, `destinations.create` (webhook), `publish.event`.
+- Uses a topic that matches the dashboard list from the prompt (or asks which topic if ambiguous).
+- Webhook URL from `OUTPOST_TEST_WEBHOOK_URL` (or clearly documented env).
+- No API key in source; fails fast if env missing.
+- Mentions `npx tsx script.ts` or equivalent run instructions.
+- **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional `OUTPOST_API_BASE_URL` set, the generated script runs to completion (no uncaught API errors) and prints or logs an event id or other clear success signal. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Defaulting to localhost API without user asking for self-hosted.
+- Using raw `fetch` when user asked for TypeScript SDK specifically.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/03-basics-python.md b/docs/agent-evaluation/scenarios/03-basics-python.md
new file mode 100644
index 000000000..2d9ecb88b
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/03-basics-python.md
@@ -0,0 +1,43 @@
+# Scenario 3 — Basics with Python
+
+## Intent
+
+Agent should produce a **single Python script** using `outpost_sdk`, equivalent to scenario 2.
+
+## Preconditions
+
+- Python 3.9+; `pip install outpost_sdk`.
+- `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` set.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Save `*.py`, `requirements.txt` or `pyproject.toml` with **Write** / **Edit**; use **Bash** for `pip` / `uv` installs so the run directory is self-contained.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 1 — try it out. Use **Python** with `outpost_sdk`. Read credentials from the environment. Same flow: tenant, webhook destination, one publish, print event id.
+
+### Turn 2 — User (optional)
+
+> Keep it to one file I can run with `python`.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- [ ] `from outpost_sdk import Outpost` (or equivalent documented import path).
+- [ ] `Outpost(api_key=..., server_url=...)` with optional base URL from env.
+- [ ] `tenants.upsert`, `destinations.create`, `publish.event` with correct shapes.
+- [ ] Topic aligned with prompt; webhook URL from env.
+- [ ] No secrets in file.
+- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Using `requests` only when user asked for the official SDK.
diff --git a/docs/agent-evaluation/scenarios/04-basics-go.md b/docs/agent-evaluation/scenarios/04-basics-go.md
new file mode 100644
index 000000000..29622c6a1
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/04-basics-go.md
@@ -0,0 +1,38 @@
+# Scenario 4 — Basics with Go
+
+## Intent
+
+Agent should produce a **small Go program** using `github.com/hookdeck/outpost/sdks/outpost-go`, equivalent to scenarios 2–3.
+
+## Preconditions
+
+- Go toolchain; module with `outpost-go` dependency.
+- `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` set.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Write `go.mod`, `main.go`, etc. with **Write** / **Edit**; use **Bash** for `go mod init`, `go mod tidy`, and `go run` so the folder is a complete module.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 1 — try it out. Use **Go** and the official Outpost Go SDK. Environment variables for API key and test webhook URL. Tenant upsert, webhook destination, publish one event, print ids.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario04` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- [ ] `outpostgo.New` with `WithSecurity` (and optional `WithServerURL`).
+- [ ] `Tenants.Upsert`, `Destinations.Create` with `CreateDestinationCreateWebhook` (or correct union wrapper), `Publish.Event`.
+- [ ] Topic and tenant id explicit; matches prompt topics.
+- [ ] No API key in source.
+- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional server URL env vars set, `go run …` succeeds and prints ids or clear success. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Passing raw struct to `Create` without `CreateDestinationCreateWebhook` wrapper (common compile mistake).
diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md
new file mode 100644
index 000000000..3e5ffa10b
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md
@@ -0,0 +1,58 @@
+# Scenario 5 — Minimal example app (Next.js)
+
+## Intent
+
+Agent scaffolds a **minimal Next.js** app (App Router or Pages Router acceptable) with a **simple UI** that lets an operator:
+
+1. Register a **webhook destination** for a tenant (URL input + submit).
+2. After registration, **trigger a test publish** to a configured topic so the destination receives an event.
+
+Server-side code must call Outpost with the API key from **environment** (e.g. `OUTPOST_API_KEY`), never exposed to the browser.
+
+## Preconditions
+
+- User has Node 18+; comfortable creating a Next app.
+- `OUTPOST_API_KEY`, managed base URL, at least one topic, and `OUTPOST_TEST_WEBHOOK_URL` or user-supplied URL pattern documented.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. You **must** scaffold the Next.js app **into that directory** (e.g. `npx create-next-app@latest` with flags for non-interactive use) using **Bash**, then implement routes/server code with **Write** / **Edit**. Chat-only snippets are not enough for this scenario—the run folder should contain a real project tree reviewers can `npm install && npm run dev`.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 2 — build a minimal example. I want **Next.js**. Very small UI: field for webhook URL, button to create the webhook destination for tenant `demo_tenant` (or let me edit tenant id in the UI), and a button to send one test event on topic `user.created` (or the first topic from the prompt). Use the Outpost TypeScript SDK on the server only.
+
+### Turn 2 — User (optional)
+
+> Add a short README with env vars and `npm run dev` steps.
+
+### Turn 3 — User (stress)
+
+> I do not have a public URL yet — what should I use for the webhook URL field?
+
+Expected: agent suggests Hookdeck Console Source URL or similar, aligned with quickstarts.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario05` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- Next.js project structure with install/run instructions.
+- API routes or server actions perform Outpost calls; **no API key** in client bundles.
+- UI flow covers **create destination** and **publish** (two distinct actions visible to the user).
+- Tenant id and topic are configurable or clearly documented constants.
+- Uses managed base URL by default.
+- README lists required env vars.
+- **Execution (full pass):** After `npm install` and `npm run dev` (or documented command), a manual smoke test completes **both** flows: register webhook destination and trigger test publish, without 5xx from your app’s Outpost calls and with Outpost accepting the requests. Requires `OUTPOST_API_KEY` and related env in `.env.local` or as documented. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Calling Outpost directly from browser-side code with embedded key.
+- Only publishing without a UI path to register the destination first.
+- Hard-coding localhost Outpost without user request.
+
diff --git a/docs/agent-evaluation/scenarios/06-app-fastapi.md b/docs/agent-evaluation/scenarios/06-app-fastapi.md
new file mode 100644
index 000000000..1f00b5f68
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/06-app-fastapi.md
@@ -0,0 +1,47 @@
+# Scenario 6 — Minimal example app (FastAPI + Jinja or HTMX)
+
+## Intent
+
+Same product behavior as [scenario 5](05-app-nextjs.md), but the stack is **Python FastAPI**:
+
+- Server renders a **simple HTML form** (Jinja2 templates, HTMX, or minimal static HTML served by FastAPI).
+- Endpoints (or form posts) call `outpost_sdk` with env-based API key.
+- User can submit webhook URL → create destination; user can trigger test publish.
+
+## Preconditions
+
+- Python 3.9+; `fastapi`, `uvicorn`, `outpost_sdk`.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Create the FastAPI app **in that directory**: add source files with **Write** / **Edit**, install deps with **Bash** (`pip` / `uv`). The run folder must be a small but complete app (not only code pasted in chat).
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 2 — minimal example with **FastAPI**. Single small app: HTML page with webhook URL field, button to register destination for tenant `demo_tenant`, button to publish one test event. Use `outpost_sdk` only on the server. Keep it to a few files.
+
+### Turn 2 — User (optional)
+
+> Document env vars and `uvicorn` command in README.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario06` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- [ ] FastAPI app runs with one command documented (`uvicorn ...`).
+- [ ] Outpost calls only server-side; API key from environment.
+- [ ] Two user-visible actions: **register webhook** and **publish test event**.
+- [ ] Managed API base URL by default.
+- [ ] README with `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` or equivalent.
+- [ ] **Execution (full pass):** App starts (`uvicorn` or as documented); manual smoke test completes **register webhook** and **publish test event** without server errors on Outpost calls. Env vars set including `OUTPOST_API_KEY`. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Exposing API key to templates/inline JS.
+- Using only `curl` subprocesses when user asked for FastAPI + SDK.
diff --git a/docs/agent-evaluation/scenarios/07-app-go-http.md b/docs/agent-evaluation/scenarios/07-app-go-http.md
new file mode 100644
index 000000000..cfdd594a9
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/07-app-go-http.md
@@ -0,0 +1,46 @@
+# Scenario 7 — Minimal example app (Go net/http)
+
+## Intent
+
+Same behavior as scenarios 5–6: **small Go program** using `net/http` (no heavy framework required) that serves **basic HTML** with:
+
+1. Form or fields for webhook URL → create webhook destination (via `outpost-go`).
+2. Control to **publish** one test event.
+
+## Preconditions
+
+- Go 1.22+; `outpost-go` module.
+
+## Automated eval (Claude Agent SDK)
+
+The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Initialize the module and server **there** (`go mod init`, `go get`, etc. via **Bash**; `main.go` / `handlers.go` via **Write** / **Edit**). Reviewers should be able to `go run .` from the run directory after the eval.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> Option 2 — minimal example in **Go**. Standard library HTTP server, simple HTML page: register webhook destination for a fixed tenant id, then button to publish one event. Use the official Go SDK for Outpost calls. API key from environment.
+
+### Turn 2 — User (optional)
+
+> Keep everything in `main.go` if reasonable, or split `handlers.go` — your choice, but stay small.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario07` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- `go run .` (or `go run main.go`) documented.
+- HTML UI with two flows: **create destination**, **publish**.
+- SDK used server-side only; `OUTPOST_API_KEY` from env.
+- Correct `CreateDestinationCreateWebhook` usage.
+- README lists env vars and port.
+- **Execution (full pass):** `go run …` starts the server; manual smoke test completes **create destination** and **publish** through the HTML UI without Outpost API failures. `OUTPOST_API_KEY` (and related env) set. *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Embedding API key in HTML/JS.
+- Omitting publish action after destination registration.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
new file mode 100644
index 000000000..56cd9c9b0
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -0,0 +1,59 @@
+# Scenario 8 — Integrate Outpost into an existing Next.js SaaS app
+
+## Intent
+
+Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboard) and need **outbound webhooks** for their customers. This scenario measures whether the agent can **clone a known open-source baseline**, understand where **domain events** happen, and **wire Hookdeck Outpost** so events are **published** to Outpost (with **per-tenant webhook destinations** documented or implemented).
+
+**Baseline application (pin this in evals):** [**leerob/next-saas-starter**](https://github.com/leerob/next-saas-starter) — Next.js, PostgreSQL, Drizzle, team/member flows, MIT license. It is a common reference for “real” SaaS structure; adjust the prompt if you standardize on another repo.
+
+## Preconditions
+
+- Node 18+; `git` available.
+- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
+
+## Automated eval (Claude Agent SDK)
+
+The harness **`cwd`** is an empty directory under `results/runs/<stamp>-scenario-08/`. The agent should **`git clone`** the baseline into that workspace (or a subdirectory), **`npm` / `pnpm install`** via **Bash**, then **Write** / **Edit** integration code. Reviewers inspect the run folder and transcript.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+
+### Turn 1 — User
+
+> **Option 3 — integrate with an existing app.** Clone **`https://github.com/leerob/next-saas-starter`** into this workspace (subdirectory is fine), install dependencies per its README, and get it in a state where we could run it locally.
+>
+> Then integrate **Hookdeck Outpost** for **outbound webhooks** to our customers:
+>
+> 1. Use the official **`@hookdeck/outpost-sdk`** on the **server only** (API routes, server actions, or equivalent — never expose `OUTPOST_API_KEY` to the browser).
+> 2. Pick **one meaningful domain event** in this starter (e.g. team or member lifecycle — choose something that actually exists in the code) and **`publish`** an event to Outpost with a **topic** from the Turn 0 prompt (or document the topic constant).
+> 3. Document how an operator registers a **webhook destination** per **tenant/customer** (REST flow or small admin UI is fine). Use the test destination URL from Turn 0 where helpful.
+> 4. Add or update a **README section** listing required env vars (`OUTPOST_API_KEY`, optional base URL, anything else you add).
+
+### Turn 2 — User (optional)
+
+> Where should we call **`tenants.upsert`** relative to our own tenant/customer model?
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+
+- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork) with clone + install steps reflected in the transcript or run directory.
+- **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
+- At least one **publish** (or equivalent) tied to a **real code path** in the baseline (not dead code).
+- **Topic** aligns with Turn 0 configuration or is clearly named and documented.
+- **Per-customer webhook** story is explained: destination creation / subscription to topic.
+- README (or equivalent) lists **env vars** for Outpost.
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). *Skip only for transcript-only triage.*
+
+## Failure modes to note
+
+- Pasting a greenfield Next app instead of integrating the **cloned** baseline.
+- Publishing only from a demo route unrelated to the product model.
+- Calling Outpost from client components with secrets.
+
+## Future baselines
+
+Java / .NET “existing app” scenarios can follow the same shape: fixed public baseline repo + Option 3 Turn 1 + Success criteria + `scoreScenarioNN`.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
new file mode 100644
index 000000000..72c63ef86
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -0,0 +1,52 @@
+# Scenario 9 — Integrate Outpost into an existing FastAPI SaaS app
+
+## Intent
+
+Same as [scenario 8](08-integrate-nextjs-existing.md), but the stack is **Python + FastAPI** with a **multi-tenant / org** style baseline.
+
+**Baseline application (pin this in evals):** [**philipokiokio/FastAPI_SAAS_Template**](https://github.com/philipokiokio/FastAPI_SAAS_Template) — FastAPI, organizations, permissions, Alembic, MIT-style OSS template commonly used as a starting point. Substitute only if you document another baseline in the scenario and update heuristics.
+
+## Preconditions
+
+- Python 3.10+; `git` available.
+
+## Automated eval (Claude Agent SDK)
+
+**`cwd`** is `results/runs/<stamp>-scenario-09/`. Expect **`git clone`**, **`pip` / `uv`**, then **Write** / **Edit** for Outpost integration.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled.
+
+### Turn 1 — User
+
+> **Option 3 — integrate with an existing app.** Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`** into this workspace, install dependencies per its README (venv + `pip install -r requirements.txt` or `uv` as you prefer).
+>
+> Integrate **Hookdeck Outpost** for **outbound webhooks**:
+>
+> 1. Use **`outpost_sdk`** only in **server** code (routers, services — never embed the API key in templates or static JS).
+> 2. Hook **`publish.event`** (and tenant/destination setup as needed) to **one real domain event** in this template (e.g. org membership or user lifecycle — pick something that exists in the codebase).
+> 3. Document how operators register **webhook destinations** per tenant/customer and which **topic** you publish on (use topics from Turn 0 when possible).
+> 4. Document **`OUTPOST_API_KEY`** and **`uvicorn`** (or equivalent) run instructions in README.
+
+### Turn 2 — User (optional)
+
+> Should **`tenants.upsert`** run at org creation or lazily on first publish?
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
+
+- Cloned **FastAPI_SAAS_Template** (or documented alternative) with install steps.
+- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path.
+- API key from **environment** or secure settings — not hard-coded or exposed to clients.
+- **Topic** and **destination** story documented.
+- README updated for env + run.
+- **Execution (full pass):** App starts; trigger path fires publish; Outpost accepts. *Skip for transcript-only.*
+
+## Failure modes to note
+
+- Greenfield FastAPI “hello world” instead of the **cloned** template.
+- Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
new file mode 100644
index 000000000..c8f91c79e
--- /dev/null
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -0,0 +1,51 @@
+# Scenario 10 — Integrate Outpost into an existing Go SaaS API
+
+## Intent
+
+Same integration goal as [scenarios 8–9](08-integrate-nextjs-existing.md), for a **Go** REST API baseline with **auth and typical SaaS** structure.
+
+**Baseline application (pin this in evals):** [**devinterface/startersaas-go-api**](https://github.com/devinterface/startersaas-go-api) — Go API, JWT, MongoDB, Stripe hooks, Docker — MIT license, small enough to clone in an eval. If you standardize on another Go SaaS boilerplate, update this file and `scoreScenario10`’s baseline check.
+
+## Preconditions
+
+- Go 1.21+; `git` available.
+
+## Automated eval (Claude Agent SDK)
+
+**`cwd`** is `results/runs/<stamp>-scenario-10/`. Expect **`git clone`**, **`go mod`** / **`go get`** for **`outpost-go`**, then source edits.
+
+## Conversation script
+
+### Turn 0
+
+Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled.
+
+### Turn 1 — User
+
+> **Option 3 — integrate with an existing app.** Clone **`https://github.com/devinterface/startersaas-go-api`** into this workspace and make it build (`go build` / `go test ./...` as appropriate per the repo).
+>
+> Add **Hookdeck Outpost** for **outbound webhooks** to customers:
+>
+> 1. Use the official **Go SDK** (`github.com/hookdeck/outpost/sdks/outpost-go` or current module path from docs).
+> 2. **`OUTPOST_API_KEY`** from environment only.
+> 3. On **one real domain event** in this API (e.g. user registration, subscription, or another existing handler), call **`Publish.Event`** (and **`Tenants` / `Destinations`** as needed) with a **topic** from Turn 0.
+> 4. Document how to register **webhook destinations** per tenant and which env vars to set. Mention the Hookdeck test destination URL from Turn 0 where useful.
+
+### Turn 2 — User (optional)
+
+> Show where **`CreateDestinationCreateWebhook`** fits if we let each customer paste a webhook URL in a settings API.
+
+## Success criteria
+
+**Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
+
+- Cloned **startersaas-go-api** (or documented alternative) with build instructions attempted.
+- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path.
+- No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
+- **Topic** + **destination** documentation for operators.
+- **Execution (full pass):** Server runs; trigger handler; Outpost accepts publish. *Skip for transcript-only.*
+
+## Failure modes to note
+
+- New `main.go` only, without using the **cloned** baseline’s routes/models.
+- Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations.
diff --git a/docs/agent-evaluation/scripts/ci-eval.sh b/docs/agent-evaluation/scripts/ci-eval.sh
new file mode 100755
index 000000000..4197c8b92
--- /dev/null
+++ b/docs/agent-evaluation/scripts/ci-eval.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+# CI-friendly agent eval: scenarios 01+02 with heuristic + LLM judge (Success criteria from each scenario .md).
+#
+# Required secrets (e.g. GitHub Actions): ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL
+# Optional: same vars in docs/agent-evaluation/.env for local runs.
+#
+# Scenarios: 01 = curl quickstart shape; 02 = TypeScript SDK script. See README § CI.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$ROOT"
+
+if [[ -z "${ANTHROPIC_API_KEY:-}" ]]; then
+  echo "ci-eval: ANTHROPIC_API_KEY is not set" >&2
+  exit 1
+fi
+if [[ -z "${EVAL_TEST_DESTINATION_URL:-}" ]]; then
+  echo "ci-eval: EVAL_TEST_DESTINATION_URL is not set" >&2
+  exit 1
+fi
+
+exec npm run eval:ci
diff --git a/docs/agent-evaluation/scripts/run-scenario.sh b/docs/agent-evaluation/scripts/run-scenario.sh
new file mode 100755
index 000000000..7b24d3291
--- /dev/null
+++ b/docs/agent-evaluation/scripts/run-scenario.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+# Manual agent evaluation helper: prints paths and Turn 0 instructions.
+# Does NOT invoke an LLM or run automated tests.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+REPO_ROOT="$(cd "$ROOT/../.." && pwd)"
+
+usage() {
+  echo "Usage: $0 <01|02|03|04|05|06|07|08|09|10>"
+  echo "Prints the scenario file path and how to obtain Turn 0 from the single source of truth."
+  echo ""
+  echo "This script does not call an API or start an agent."
+}
+
+if [[ "${1:-}" == "-h" || "${1:-}" == "--help" || -z "${1:-}" ]]; then
+  usage
+  exit 0
+fi
+
+id="$1"
+shopt -s nullglob
+matches=( "$ROOT/scenarios/${id}"-*.md )
+shopt -u nullglob
+
+if [[ ${#matches[@]} -eq 0 ]]; then
+  echo "No scenario matching: scenarios/${id}-*.md" >&2
+  exit 1
+fi
+
+scenario="${matches[0]}"
+
+echo "=== Outpost agent eval (manual) ==="
+echo ""
+echo "Scenario file:"
+echo "  $scenario"
+echo ""
+echo "Turn 0 — copy the fenced block under '## Template' from:"
+echo "  $REPO_ROOT/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx"
+echo ""
+echo "Placeholder examples (not the template):"
+echo "  $ROOT/fixtures/placeholder-values-for-turn0.md"
+echo ""
+echo "Record results (local copy; see results/.gitignore):"
+echo "  cp \"$ROOT/results/RUN-RECORDING.template.md\" \"$ROOT/results/$(date +%F)-s${id}-<client>.md\""
+echo ""
diff --git a/docs/agent-evaluation/src/llm-judge.ts b/docs/agent-evaluation/src/llm-judge.ts
new file mode 100644
index 000000000..b3e9ae0b9
--- /dev/null
+++ b/docs/agent-evaluation/src/llm-judge.ts
@@ -0,0 +1,230 @@
+/**
+ * LLM-as-judge scoring via Anthropic Messages API.
+ * Feeds scenario Success criteria + assistant transcript; returns structured JSON from the model.
+ */
+
+import { readFile } from "node:fs/promises";
+import { basename, dirname, join } from "node:path";
+import { extractTranscriptScoringText } from "./score-transcript.js";
+
+const ANTHROPIC_MESSAGES_URL = "https://api.anthropic.com/v1/messages";
+const DEFAULT_SCORE_MODEL = "claude-sonnet-4-20250514";
+const MAX_TRANSCRIPT_CHARS = 180_000;
+
+export interface LlmCriterionJudgment {
+  readonly criterion: string;
+  readonly pass: boolean;
+  readonly evidence: string;
+}
+
+export interface LlmJudgeReport {
+  readonly version: 1;
+  readonly model: string;
+  readonly runFile: string;
+  readonly scenarioFile: string;
+  readonly overall_transcript_pass: boolean;
+  /** LLM cannot run curls; always note limits */
+  readonly execution_in_transcript: {
+    readonly pass: boolean | null;
+    readonly note: string;
+  };
+  readonly criteria: readonly LlmCriterionJudgment[];
+  readonly summary: string;
+}
+
+interface RunJson {
+  meta?: {
+    scenarioId?: string;
+    scenarioFile?: string;
+    turns?: readonly { label?: string; messageCount?: number }[];
+  };
+  messages?: unknown[];
+}
+
+export function extractSuccessCriteriaMarkdown(fullMd: string): string {
+  const anchor = "## Success criteria";
+  const i = fullMd.indexOf(anchor);
+  if (i === -1) {
+    return "(No ## Success criteria section found.)";
+  }
+  const rest = fullMd.slice(i);
+  const sub = rest.slice(anchor.length);
+  const rel = sub.search(/\n## [A-Za-z]/);
+  return rel === -1 ? rest.trim() : rest.slice(0, anchor.length + rel).trim();
+}
+
+function stripJsonFence(text: string): string {
+  const t = text.trim();
+  const m = t.match(/^```(?:json)?\s*([\s\S]*?)```$/m);
+  if (m) return m[1].trim();
+  return t;
+}
+
+function parseJudgeJson(text: string): Omit<LlmJudgeReport, "model" | "runFile" | "scenarioFile" | "version"> & {
+  version?: number;
+} {
+  const raw = stripJsonFence(text);
+  const parsed = JSON.parse(raw) as Record<string, unknown>;
+  const overall = Boolean(parsed.overall_transcript_pass);
+  const criteriaIn = parsed.criteria;
+  const criteria: LlmCriterionJudgment[] = [];
+  if (Array.isArray(criteriaIn)) {
+    for (const c of criteriaIn) {
+      if (typeof c !== "object" || c === null) continue;
+      const o = c as Record<string, unknown>;
+      criteria.push({
+        criterion: String(o.criterion ?? o.id ?? "unnamed"),
+        pass: Boolean(o.pass),
+        evidence: String(o.evidence ?? ""),
+      });
+    }
+  }
+  const exec = parsed.execution_in_transcript;
+  let execution_in_transcript: LlmJudgeReport["execution_in_transcript"] = {
+    pass: null,
+    note: "Not specified by judge.",
+  };
+  if (typeof exec === "object" && exec !== null) {
+    const e = exec as Record<string, unknown>;
+    execution_in_transcript = {
+      pass: typeof e.pass === "boolean" ? e.pass : null,
+      note: String(e.note ?? ""),
+    };
+  }
+  return {
+    overall_transcript_pass: overall,
+    execution_in_transcript: execution_in_transcript,
+    criteria,
+    summary: String(parsed.summary ?? ""),
+  };
+}
+
+const JUDGE_SYSTEM = `You are an expert evaluator for Hookdeck Outpost onboarding documentation and API usage.
+You judge whether an AI assistant's replies satisfy the scenario's Success criteria (markdown checklist from the scenario spec).
+Be strict: a criterion passes only if the transcript (including code the model wrote via tools) clearly satisfies it.
+You cannot run shell or HTTP — do not claim execution passed; use execution_in_transcript.pass = null and explain in note.
+Output ONLY valid JSON (no markdown fences, no commentary outside JSON) matching this shape:
+{
+  "overall_transcript_pass": boolean,
+  "execution_in_transcript": { "pass": null, "note": "string explaining you did not execute code" },
+  "criteria": [
+    { "criterion": "short label from checklist", "pass": boolean, "evidence": "1-3 sentences; quote or paraphrase assistant" }
+  ],
+  "summary": "2-4 sentences overall"
+}
+Map each major bullet/checkbox line from Success criteria to one criteria[] entry (merge tiny sub-bullets if needed).`;
+
+export async function llmJudgeRun(options: {
+  readonly runPath: string;
+  readonly scenarioMdPath: string;
+  readonly apiKey: string;
+  readonly model?: string;
+}): Promise<LlmJudgeReport> {
+  const model = options.model?.trim() || process.env.EVAL_SCORE_MODEL?.trim() || DEFAULT_SCORE_MODEL;
+  const rawRun = await readFile(options.runPath, "utf8");
+  const data = JSON.parse(rawRun) as RunJson;
+  const scenarioFile = data.meta?.scenarioFile ?? "unknown.md";
+  const scenarioMd = await readFile(options.scenarioMdPath, "utf8");
+  const criteriaBlock = extractSuccessCriteriaMarkdown(scenarioMd);
+
+  let transcript = extractTranscriptScoringText(data.messages);
+  if (transcript.length > MAX_TRANSCRIPT_CHARS) {
+    transcript =
+      transcript.slice(0, MAX_TRANSCRIPT_CHARS) +
+      "\n\n[… transcript truncated for judge context …]\n";
+  }
+
+  const userContent = `## Success criteria (from scenario spec — your rubric)
+
+${criteriaBlock}
+
+---
+
+## Transcript for review (assistant text plus tool-written file contents and tool inputs from the run JSON)
+
+${transcript}
+
+---
+
+Judge the transcript against the Success criteria. Remember: execution (running curl against a live API) is NOT evidenced here unless the transcript explicitly describes successful HTTP results; normally set execution_in_transcript.pass to null.`;
+
+  const res = await fetch(ANTHROPIC_MESSAGES_URL, {
+    method: "POST",
+    headers: {
+      "content-type": "application/json",
+      "x-api-key": options.apiKey,
+      "anthropic-version": "2023-06-01",
+    },
+    body: JSON.stringify({
+      model,
+      max_tokens: 8192,
+      system: JUDGE_SYSTEM,
+      messages: [{ role: "user", content: userContent }],
+    }),
+  });
+
+  if (!res.ok) {
+    const errText = await res.text();
+    throw new Error(`Anthropic API ${res.status}: ${errText.slice(0, 2000)}`);
+  }
+
+  const body = (await res.json()) as {
+    content?: readonly { type?: string; text?: string }[];
+  };
+  const textBlock = body.content?.find((c) => c.type === "text");
+  const text = textBlock?.text ?? "";
+  let judged: ReturnType<typeof parseJudgeJson>;
+  try {
+    judged = parseJudgeJson(text);
+  } catch {
+    throw new Error(
+      `Judge did not return parseable JSON. First 800 chars:\n${text.slice(0, 800)}`,
+    );
+  }
+
+  return {
+    version: 1,
+    model,
+    runFile: options.runPath,
+    scenarioFile,
+    overall_transcript_pass: judged.overall_transcript_pass,
+    execution_in_transcript: judged.execution_in_transcript,
+    criteria: judged.criteria,
+    summary: judged.summary,
+  };
+}
+
+export function scenarioMdPathFromRun(
+  evalRoot: string,
+  scenarioFile: string | undefined,
+): string {
+  if (!scenarioFile?.trim()) {
+    throw new Error("Run JSON meta.scenarioFile is missing");
+  }
+  return join(evalRoot, "scenarios", scenarioFile);
+}
+
+export function formatLlmReportHuman(r: LlmJudgeReport): string {
+  const lines: string[] = [
+    `LLM judge (${r.model})`,
+    `Transcript: ${r.runFile}`,
+    `Scenario: ${r.scenarioFile}`,
+  ];
+  if (basename(r.runFile) === "transcript.json") {
+    lines.push(`Run directory: ${dirname(r.runFile)}`);
+  }
+  lines.push(
+    "",
+    `Overall transcript pass: ${r.overall_transcript_pass ? "YES" : "NO"}`,
+    `Execution (from transcript only): pass=${String(r.execution_in_transcript.pass)} — ${r.execution_in_transcript.note}`,
+    "",
+    "Per criterion:",
+  );
+  for (const c of r.criteria) {
+    lines.push(`  [${c.pass ? "PASS" : "FAIL"}] ${c.criterion}`);
+    lines.push(`         ${c.evidence}`);
+  }
+  lines.push("");
+  lines.push(`Summary: ${r.summary}`);
+  return lines.join("\n");
+}
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
new file mode 100644
index 000000000..72464f3a2
--- /dev/null
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -0,0 +1,527 @@
+/**
+ * Automated Outpost onboarding agent evals via the Claude Agent SDK.
+ *
+ * Requires ANTHROPIC_API_KEY (and EVAL_TEST_DESTINATION_URL). Does not call Outpost.
+ * For a full eval, humans (or a separate verifier) run generated artifacts using OUTPOST_API_KEY — see README.
+ *
+ * @see https://platform.claude.com/docs/en/agent-sdk/overview
+ */
+
+import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+import { parseArgs } from "node:util";
+import dotenv from "dotenv";
+import {
+  query,
+  type Options,
+  type SDKMessage,
+  type SDKSystemMessage,
+} from "@anthropic-ai/claude-agent-sdk";
+import { llmJudgeRun, scenarioMdPathFromRun } from "./llm-judge.js";
+import { scoreRunFile } from "./score-transcript.js";
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = dirname(__filename);
+
+/** `docs/agent-evaluation/` */
+const EVAL_ROOT = join(__dirname, "..");
+
+dotenv.config({ path: join(EVAL_ROOT, ".env") });
+/** Outpost repository root */
+const REPO_ROOT = join(EVAL_ROOT, "..", "..");
+const PROMPT_MDX = join(
+  REPO_ROOT,
+  "docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx",
+);
+const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
+const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
+
+function isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage {
+  return m.type === "system" && m.subtype === "init";
+}
+
+function extractTemplateFromMdx(mdx: string): string {
+  const idx = mdx.indexOf("## Template");
+  if (idx === -1) {
+    throw new Error("Could not find ## Template in hookdeck-outpost-agent-prompt.mdx");
+  }
+  const after = mdx.slice(idx);
+  const fenceStart = after.indexOf("```");
+  if (fenceStart === -1) {
+    throw new Error("No opening code fence after ## Template");
+  }
+  const contentStart = after.indexOf("\n", fenceStart) + 1;
+  const fenceEnd = after.indexOf("```", contentStart);
+  if (fenceEnd === -1) {
+    throw new Error("No closing code fence for ## Template");
+  }
+  return after.slice(contentStart, fenceEnd).trim();
+}
+
+function envFlagTruthy(v: string | undefined): boolean {
+  if (!v) return false;
+  const s = v.trim().toLowerCase();
+  return s === "1" || s === "true" || s === "yes";
+}
+
+/** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */
+function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string {
+  const f = (...parts: string[]) => join(repoRoot, ...parts);
+  let block = `### Documentation (local repository — unpublished)
+
+Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). The paths below are absolute filesystem paths into that checkout:
+
+- Getting started (curl): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
+- TypeScript quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
+- Python quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
+- Go quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\`
+- API reference (human-oriented pages under): \`${f("docs/pages/references/")}\`
+- OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\`
+- Destination types: \`${f("docs/pages/destinations/")}\`
+- SDKs overview: \`${f("docs/pages/sdks.mdx")}\``;
+  if (llmsFullUrl) {
+    block += `\n- Full docs bundle: ${llmsFullUrl}`;
+  }
+  return block;
+}
+
+function applyPlaceholders(
+  template: string,
+  env: NodeJS.ProcessEnv,
+  repoRoot: string,
+): string {
+  const apiBase =
+    env.EVAL_API_BASE_URL ?? "https://api.outpost.hookdeck.com/2025-07-01";
+  const topics = env.EVAL_TOPICS_LIST ?? "- user.created";
+  const testUrl = env.EVAL_TEST_DESTINATION_URL?.trim();
+  if (!testUrl) {
+    throw new Error(
+      "Set EVAL_TEST_DESTINATION_URL to your Hookdeck Console Source URL (same value the dashboard injects as {{TEST_DESTINATION_URL}})",
+    );
+  }
+  const docsUrl = env.EVAL_DOCS_URL ?? "https://outpost.hookdeck.com/docs";
+  const llms = env.EVAL_LLMS_FULL_URL?.trim() ?? "";
+  const useLocalDocs = envFlagTruthy(env.EVAL_LOCAL_DOCS);
+
+  let base = template;
+  if (useLocalDocs) {
+    const docSection = /^### Documentation\n\n[\s\S]*?(?=\n### What to do\b)/m;
+    if (!docSection.test(base)) {
+      throw new Error(
+        "EVAL_LOCAL_DOCS is set but the prompt template has no ### Documentation section before ### What to do",
+      );
+    }
+    base = base.replace(
+      docSection,
+      localDocumentationBlock(repoRoot, llms || undefined),
+    );
+  }
+
+  let out = base
+    .replaceAll("{{API_BASE_URL}}", apiBase)
+    .replaceAll("{{TOPICS_LIST}}", topics)
+    .replaceAll("{{TEST_DESTINATION_URL}}", testUrl)
+    .replaceAll("{{DOCS_URL}}", docsUrl)
+    .replaceAll("{{LLMS_FULL_URL}}", llms);
+
+  if (!llms) {
+    out = out
+      .split("\n")
+      .filter((line) => !/Full docs bundle/i.test(line))
+      .join("\n");
+  }
+
+  return out;
+}
+
+interface ParsedTurn {
+  readonly num: number;
+  readonly title: string;
+  readonly body: string;
+  readonly optional: boolean;
+}
+
+function parseScenarioTurns(markdown: string): ParsedTurn[] {
+  const lines = markdown.split(/\r?\n/);
+  const turns: ParsedTurn[] = [];
+  let i = 0;
+
+  while (i < lines.length) {
+    const line = lines[i];
+    const m = line.match(/^### Turn (\d+)\s*(.*)$/);
+    if (m) {
+      const num = Number(m[1]);
+      const restOfTitle = m[2] ?? "";
+      const title = `Turn ${m[1]}${restOfTitle ? ` ${restOfTitle}` : ""}`;
+      const optional = /optional/i.test(title);
+      i++;
+      const bodyLines: string[] = [];
+      while (i < lines.length) {
+        const L = lines[i];
+        if (/^### /.test(L)) {
+          break;
+        }
+        if (/^## /.test(L)) {
+          break;
+        }
+        bodyLines.push(L);
+        i++;
+      }
+      turns.push({
+        num,
+        title,
+        body: bodyLines.join("\n").trim(),
+        optional,
+      });
+      continue;
+    }
+    i++;
+  }
+
+  return turns.sort((a, b) => a.num - b.num);
+}
+
+function extractUserMessage(turnBody: string): string {
+  const quoted: string[] = [];
+  for (const line of turnBody.split(/\r?\n/)) {
+    const q = line.match(/^\s*>\s?(.*)$/);
+    if (q) {
+      quoted.push(q[1]);
+    }
+  }
+  const fromBlockquote = quoted.join("\n").trim();
+  if (fromBlockquote) {
+    return fromBlockquote;
+  }
+  return turnBody.replace(/^\s*$/gm, "").trim();
+}
+
+function serializeMessage(message: SDKMessage): unknown {
+  try {
+    return JSON.parse(
+      JSON.stringify(message, (_, v) => (typeof v === "bigint" ? v.toString() : v)),
+    );
+  } catch {
+    return { _nonSerializable: String(message) };
+  }
+}
+
+async function listScenarioFiles(): Promise<string[]> {
+  const names = await readdir(SCENARIOS_DIR);
+  return names
+    .filter((n) => /^\d{2}-.*\.md$/.test(n))
+    .sort();
+}
+
+function idFromFilename(file: string): string {
+  return file.slice(0, 2);
+}
+
+async function runScenarioQuery(
+  prompt: string,
+  options: Options,
+): Promise<{ messages: unknown[]; sessionId?: string }> {
+  const messages: unknown[] = [];
+  let sessionId: string | undefined;
+
+  const q = query({ prompt, options });
+  for await (const message of q) {
+    messages.push(serializeMessage(message));
+    if (isInitSystemMessage(message)) {
+      sessionId = message.session_id;
+    }
+  }
+
+  return { messages, sessionId };
+}
+
+async function runOneScenario(
+  scenarioFile: string,
+  filledTemplate: string,
+  opts: {
+    skipOptional: boolean;
+    baseOptions: Options;
+  },
+): Promise<{
+  scenarioId: string;
+  scenarioFile: string;
+  turns: Array<{ label: string; messageCount: number }>;
+  sessionId?: string;
+  allMessages: unknown[];
+}> {
+  const path = join(SCENARIOS_DIR, scenarioFile);
+  const md = await readFile(path, "utf8");
+  const parsed = parseScenarioTurns(md);
+
+  const userTurns = parsed
+    .filter((t) => t.num >= 1)
+    .filter((t) => !t.optional || !opts.skipOptional)
+    .map((t) => ({
+      label: t.title,
+      text: extractUserMessage(t.body),
+    }))
+    .filter((t) => t.text.length > 0);
+
+  const prompts = [filledTemplate, ...userTurns.map((t) => t.text)];
+
+  const allMessages: unknown[] = [];
+  let sessionId: string | undefined;
+  const turnStats: Array<{ label: string; messageCount: number }> = [];
+
+  for (let i = 0; i < prompts.length; i++) {
+    const label = i === 0 ? "Turn 0 (dashboard prompt)" : userTurns[i - 1]?.label ?? `Turn ${i}`;
+    const before = allMessages.length;
+    const { messages, sessionId: sid } = await runScenarioQuery(prompts[i]!, {
+      ...opts.baseOptions,
+      resume: sessionId,
+    });
+    if (sid) {
+      sessionId = sid;
+    }
+    allMessages.push(...messages);
+    turnStats.push({
+      label,
+      messageCount: allMessages.length - before,
+    });
+  }
+
+  return {
+    scenarioId: idFromFilename(scenarioFile),
+    scenarioFile,
+    turns: turnStats,
+    sessionId,
+    allMessages,
+  };
+}
+
+function defaultEvalTools(env: NodeJS.ProcessEnv): string {
+  if (env.EVAL_TOOLS?.trim()) {
+    return env.EVAL_TOOLS.trim();
+  }
+  // dontAsk + allowedTools: only listed tools are pre-approved; others are denied.
+  // Write/Edit: materialize scripts and apps into the per-run directory (agent cwd).
+  // Bash: npm/npx/go mod/pip/uv for app scenarios (05–07) and installs for 02–04.
+  // WebFetch: omitted when EVAL_LOCAL_DOCS uses repo paths + Read instead.
+  return envFlagTruthy(env.EVAL_LOCAL_DOCS)
+    ? "Read,Glob,Grep,Write,Edit,Bash"
+    : "Read,Glob,Grep,WebFetch,Write,Edit,Bash";
+}
+
+function buildBaseOptions(agentWorkspaceCwd: string): Options {
+  const toolsRaw = defaultEvalTools(process.env);
+  const allowedTools = toolsRaw
+    .split(",")
+    .map((s) => s.trim())
+    .filter(Boolean);
+
+  const mode = (process.env.EVAL_PERMISSION_MODE ?? "dontAsk") as NonNullable<
+    Options["permissionMode"]
+  >;
+
+  const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "40");
+  const persistSession = process.env.EVAL_PERSIST_SESSION !== "false";
+
+  const o: Options = {
+    cwd: agentWorkspaceCwd,
+    allowedTools,
+    permissionMode: mode,
+    maxTurns: Number.isFinite(maxTurns) ? maxTurns : 40,
+    persistSession,
+    env: {
+      ...process.env,
+      CLAUDE_AGENT_SDK_CLIENT_APP: "outpost-docs-agent-eval/1.0.0",
+    } as Record<string, string | undefined>,
+  };
+
+  if (process.env.EVAL_MODEL?.trim()) {
+    o.model = process.env.EVAL_MODEL.trim();
+  }
+
+  return o;
+}
+
+async function main(): Promise<void> {
+  const { values } = parseArgs({
+    options: {
+      scenario: { type: "string" },
+      scenarios: { type: "string" },
+      all: { type: "boolean", default: false },
+      "skip-optional": { type: "boolean", default: false },
+      "dry-run": { type: "boolean", default: false },
+      "no-score": { type: "boolean", default: false },
+      "no-score-llm": { type: "boolean", default: false },
+      help: { type: "boolean", short: "h", default: false },
+    },
+    allowPositionals: false,
+  });
+
+  if (values.help) {
+    console.log(`
+Outpost agent evaluation (Claude Agent SDK)
+
+Usage:
+  npm run eval -- --scenario 01
+  npm run eval -- --scenarios 01,02,05
+  npm run eval -- --all              # deliberate: every scenario (costly)
+  npm run eval -- --skip-optional
+  npm run eval -- --no-score         # skip heuristic-score.json
+  npm run eval -- --no-score-llm     # skip llm-score.json (no Success-criteria judge)
+  npm run eval -- --no-score --no-score-llm   # transcripts only
+  npm run eval -- --dry-run
+
+You must pass --scenario, --scenarios, or --all so the set of runs is explicit (cost and scope).
+After each scenario: transcript + heuristic-score.json + llm-score.json (judge uses ## Success criteria) unless disabled above.
+Exit 1 if any enabled score fails.
+
+Environment:
+  Values can be set in docs/agent-evaluation/.env (loaded automatically) or exported in the shell.
+  ANTHROPIC_API_KEY     Required
+  EVAL_TEST_DESTINATION_URL   Required — Hookdeck Console Source URL (fed into {{TEST_DESTINATION_URL}})
+  EVAL_API_BASE_URL     Optional (default: managed production URL)
+  EVAL_TOPICS_LIST      Optional
+  EVAL_DOCS_URL         Optional (ignored for doc links when EVAL_LOCAL_DOCS is set)
+  EVAL_LOCAL_DOCS       Set to 1/true/yes to replace Documentation URLs with repo file paths (unpublished docs)
+  EVAL_LLMS_FULL_URL    Optional (the "Full docs bundle" line is omitted from the prompt if unset)
+  EVAL_TOOLS            Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
+  EVAL_MODEL            Optional
+  EVAL_MAX_TURNS        Optional (default: 40)
+  EVAL_PERMISSION_MODE  Optional (default: dontAsk)
+  EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
+
+Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets
+  results/runs/<stamp>-scenario-NN/transcript.json
+  heuristic-score.json and llm-score.json unless disabled (see above).
+Alternatively, set EVAL_NO_SCORE_HEURISTIC=1 or EVAL_NO_SCORE_LLM=1 in .env to skip scoring without flags.
+
+Each run uses results/runs/<stamp>-scenario-NN/ as agent cwd so Write creates files there.
+`);
+    process.exit(0);
+  }
+
+  if (!process.env.ANTHROPIC_API_KEY?.trim()) {
+    console.error("Missing ANTHROPIC_API_KEY");
+    process.exit(1);
+  }
+
+  const mdx = await readFile(PROMPT_MDX, "utf8");
+  const template = extractTemplateFromMdx(mdx);
+  const filledTemplate = applyPlaceholders(template, process.env, REPO_ROOT);
+
+  const allFiles = await listScenarioFiles();
+  let selected: string[];
+
+  if (values.all) {
+    selected = allFiles;
+  } else if (values.scenarios) {
+    const ids = values.scenarios.split(",").map((s) => s.trim().padStart(2, "0"));
+    selected = allFiles.filter((f) => ids.includes(idFromFilename(f)));
+    const missing = ids.filter((id) => !selected.some((f) => idFromFilename(f) === id));
+    if (missing.length) {
+      console.error("Unknown scenario id(s):", missing.join(", "));
+      process.exit(1);
+    }
+  } else if (values.scenario) {
+    const id = values.scenario.padStart(2, "0");
+    selected = allFiles.filter((f) => idFromFilename(f) === id);
+    if (selected.length === 0) {
+      console.error("Unknown scenario:", values.scenario);
+      process.exit(1);
+    }
+  } else {
+    console.error(
+      "Choose which scenarios to run (cost is proportional): --scenario <id>, --scenarios id,id, or --all for the full set.",
+    );
+    console.error(`Available: ${allFiles.map((f) => idFromFilename(f)).join(", ")}`);
+    process.exit(1);
+  }
+
+  if (values["dry-run"]) {
+    console.log("Dry run: would execute", selected.join(", "));
+    console.log("Turn 0 length (chars):", filledTemplate.length);
+    process.exit(0);
+  }
+
+  await mkdir(RUNS_DIR, { recursive: true });
+  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
+
+  const wantScore =
+    !values["no-score"] &&
+    !envFlagTruthy(process.env.EVAL_NO_SCORE_HEURISTIC);
+  const wantLlm =
+    !values["no-score-llm"] &&
+    !envFlagTruthy(process.env.EVAL_NO_SCORE_LLM);
+
+  let anyScoreFailure = false;
+
+  console.error(
+    `Running ${selected.length} scenario(s): ${selected.join(", ")} (heuristic=${String(wantScore)}, llm=${String(wantLlm)})`,
+  );
+
+  for (const file of selected) {
+    const scenarioIdEarly = idFromFilename(file);
+    const runDir = join(RUNS_DIR, `${stamp}-scenario-${scenarioIdEarly}`);
+    await mkdir(runDir, { recursive: true });
+
+    const baseOptions = buildBaseOptions(runDir);
+    console.error(`\n>>> Scenario ${file} (workspace ${runDir}) ...`);
+    const result = await runOneScenario(file, filledTemplate, {
+      skipOptional: values["skip-optional"] ?? false,
+      baseOptions,
+    });
+
+    const outPath = join(runDir, "transcript.json");
+    const payload = {
+      meta: {
+        scenarioId: result.scenarioId,
+        scenarioFile: result.scenarioFile,
+        runDirectory: runDir,
+        agentWorkspaceCwd: runDir,
+        repositoryRoot: REPO_ROOT,
+        completedAt: new Date().toISOString(),
+        sessionId: result.sessionId,
+        turns: result.turns,
+      },
+      messages: result.allMessages,
+    };
+
+    await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8");
+    console.error(`Wrote ${outPath}`);
+
+    if (wantScore) {
+      const report = await scoreRunFile(outPath);
+      const scorePath = join(runDir, "heuristic-score.json");
+      await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8");
+      console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`);
+      if (report.overallTranscriptPass === false) {
+        anyScoreFailure = true;
+      }
+    }
+
+    if (wantLlm) {
+      const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile);
+      const llmReport = await llmJudgeRun({
+        runPath: outPath,
+        scenarioMdPath: scenarioPath,
+        apiKey: process.env.ANTHROPIC_API_KEY!.trim(),
+      });
+      const llmPath = join(runDir, "llm-score.json");
+      await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
+      console.error(
+        `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`,
+      );
+      if (!llmReport.overall_transcript_pass) {
+        anyScoreFailure = true;
+      }
+    }
+  }
+
+  if (anyScoreFailure) {
+    process.exit(1);
+  }
+}
+
+main().catch((err) => {
+  console.error(err);
+  process.exit(1);
+});
diff --git a/docs/agent-evaluation/src/score-eval.ts b/docs/agent-evaluation/src/score-eval.ts
new file mode 100644
index 000000000..4c720060d
--- /dev/null
+++ b/docs/agent-evaluation/src/score-eval.ts
@@ -0,0 +1,183 @@
+/**
+ * CLI: score a transcript JSON from npm run eval.
+ *
+ * Usage:
+ *   npm run score -- --run results/runs/2026-...-scenario-01.json
+ *   npm run score -- --latest
+ *   npm run score -- --latest --scenario 01
+ *   npm run score -- --run <file>.json --llm --write   # Anthropic judge → .llm-score.json
+ */
+
+import { readFile, writeFile } from "node:fs/promises";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+import { parseArgs } from "node:util";
+import dotenv from "dotenv";
+import {
+  formatLlmReportHuman,
+  llmJudgeRun,
+  scenarioMdPathFromRun,
+  type LlmJudgeReport,
+} from "./llm-judge.js";
+import {
+  findLatestRunFile,
+  formatScoreReportHuman,
+  resolveTranscriptJsonPath,
+  scoreRunFile,
+  scoreSidecarPaths,
+  type ScoreReport,
+} from "./score-transcript.js";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const EVAL_ROOT = join(__dirname, "..");
+dotenv.config({ path: join(EVAL_ROOT, ".env") });
+
+const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
+
+async function main(): Promise<void> {
+  const { values, positionals } = parseArgs({
+    options: {
+      run: { type: "string" },
+      latest: { type: "boolean", default: false },
+      scenario: { type: "string" },
+      json: { type: "boolean", default: false },
+      write: { type: "boolean", default: false },
+      llm: { type: "boolean", default: false },
+      "no-heuristic": { type: "boolean", default: false },
+      help: { type: "boolean", short: "h", default: false },
+    },
+    allowPositionals: true,
+  });
+
+  if (values.help) {
+    console.log(`
+Score an eval transcript.
+
+  npm run score -- --run results/runs/<stamp>-scenario-01/transcript.json
+  npm run score -- --run results/runs/<stamp>-scenario-01   # directory ok
+  npm run score -- --latest [--scenario 01]
+  npm run score -- --latest --write     # heuristic-score.json in run dir (add --llm for llm-score.json)
+  npm run score -- --llm [--write]      # Anthropic judge (needs ANTHROPIC_API_KEY)
+  npm run score -- --llm --no-heuristic # LLM only (no regex heuristic)
+
+Heuristic: src/score-transcript.ts. LLM: reads scenarios/*.md Success criteria + assistant text; model from EVAL_SCORE_MODEL (default claude-sonnet-4-20250514).
+
+Options:
+  --run <path>      transcript.json, a run directory, or legacy flat *-scenario-NN.json
+  --latest          Newest transcript (nested run dir or legacy flat file)
+  --scenario <id>   With --latest, filter scenario-0<id>
+  --json            Print machine-readable JSON only (heuristic report by default; LLM report with --llm --no-heuristic; { heuristic, llm } with --llm)
+  --write           Write sidecar file(s) for enabled scorers
+  --llm             Call Anthropic Messages API to judge against Success criteria
+  --no-heuristic    Skip regex heuristic (use with --llm for API-only scoring)
+`);
+    process.exit(0);
+  }
+
+  let runPath: string | null = values.run ?? null;
+  if (values.latest) {
+    runPath = await findLatestRunFile(RUNS_DIR, values.scenario);
+    if (!runPath) {
+      console.error("No matching run JSON in", RUNS_DIR);
+      process.exit(1);
+    }
+  }
+
+  if (!runPath && positionals[0]) {
+    runPath = positionals[0];
+  }
+
+  if (!runPath) {
+    console.error("Provide --run <path> or --latest");
+    process.exit(1);
+  }
+
+  let transcriptPath: string;
+  try {
+    transcriptPath = await resolveTranscriptJsonPath(runPath);
+  } catch (e) {
+    console.error(String(e));
+    process.exit(1);
+  }
+
+  const doHeuristic = !values["no-heuristic"];
+  const doLlm = values.llm;
+
+  if (!doHeuristic && !doLlm) {
+    console.error("Nothing to run: enable heuristic (default) or pass --llm");
+    process.exit(1);
+  }
+
+  let heuristicReport: ScoreReport | null = null;
+  let llmReport: LlmJudgeReport | null = null;
+  let fail = false;
+
+  if (doHeuristic) {
+    heuristicReport = await scoreRunFile(transcriptPath);
+    if (heuristicReport.overallTranscriptPass === false) {
+      fail = true;
+    }
+  }
+
+  if (doLlm) {
+    const key = process.env.ANTHROPIC_API_KEY?.trim();
+    if (!key) {
+      console.error("Missing ANTHROPIC_API_KEY for --llm");
+      process.exit(1);
+    }
+    const raw = await readFile(transcriptPath, "utf8");
+    const meta = JSON.parse(raw) as { meta?: { scenarioFile?: string } };
+    const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, meta.meta?.scenarioFile);
+    llmReport = await llmJudgeRun({
+      runPath: transcriptPath,
+      scenarioMdPath: scenarioPath,
+      apiKey: key,
+    });
+    if (!llmReport.overall_transcript_pass) {
+      fail = true;
+    }
+  }
+
+  if (values.json) {
+    if (doLlm && values["no-heuristic"]) {
+      console.log(JSON.stringify(llmReport, null, 2));
+    } else if (doHeuristic && !doLlm) {
+      console.log(JSON.stringify(heuristicReport, null, 2));
+    } else {
+      console.log(
+        JSON.stringify({ heuristic: heuristicReport, llm: llmReport }, null, 2),
+      );
+    }
+  } else {
+    if (heuristicReport) {
+      console.log(formatScoreReportHuman(heuristicReport));
+      console.log("");
+    }
+    if (llmReport) {
+      console.log(formatLlmReportHuman(llmReport));
+    }
+  }
+
+  if (values.write) {
+    const { heuristic: heuristicOut, llm: llmOut } = scoreSidecarPaths(transcriptPath);
+    if (heuristicReport) {
+      await writeFile(heuristicOut, `${JSON.stringify(heuristicReport, null, 2)}\n`, "utf8");
+      if (!values.json) {
+        console.error(`Wrote ${heuristicOut}`);
+      }
+    }
+    if (llmReport) {
+      await writeFile(llmOut, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
+      if (!values.json) {
+        console.error(`Wrote ${llmOut}`);
+      }
+    }
+  }
+
+  process.exit(fail ? 1 : 0);
+}
+
+main().catch((e) => {
+  console.error(e);
+  process.exit(1);
+});
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
new file mode 100644
index 000000000..5ba55459b
--- /dev/null
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -0,0 +1,1119 @@
+/**
+ * Heuristic transcript scoring for agent eval runs.
+ * Maps to human checklist items in scenarios/*.md — not a substitute for execution verification.
+ */
+
+import { readFile, readdir, stat } from "node:fs/promises";
+import { basename, dirname, join } from "node:path";
+
+export interface CheckResult {
+  readonly id: string;
+  readonly pass: boolean;
+  readonly detail: string;
+}
+
+export interface TranscriptScore {
+  readonly passed: number;
+  readonly total: number;
+  readonly checks: readonly CheckResult[];
+  readonly fraction: number;
+}
+
+export interface ScoreReport {
+  readonly runFile: string;
+  readonly scenarioId: string;
+  readonly scenarioFile: string;
+  readonly transcript: TranscriptScore;
+  /** Automated harness does not run Outpost; execution stays manual or a future verifier. */
+  readonly execution: { readonly status: "not_automated"; readonly note: string };
+  /** null when no automated transcript rubric exists for this scenario yet */
+  readonly overallTranscriptPass: boolean | null;
+}
+
+interface RunJson {
+  meta?: {
+    scenarioId?: string;
+    scenarioFile?: string;
+    turns?: readonly { label?: string; messageCount?: number }[];
+  };
+  messages?: unknown[];
+}
+
+export function extractAssistantText(messages: unknown[] | undefined): string {
+  if (!messages?.length) return "";
+  let out = "";
+  for (const m of messages) {
+    if (typeof m !== "object" || m === null) continue;
+    const o = m as Record<string, unknown>;
+    if (o.type !== "assistant") continue;
+    const inner = o.message;
+    if (typeof inner !== "object" || inner === null) continue;
+    const msg = inner as Record<string, unknown>;
+    const content = msg.content;
+    if (!Array.isArray(content)) continue;
+    for (const block of content) {
+      if (typeof block !== "object" || block === null) continue;
+      const b = block as Record<string, unknown>;
+      if (b.type === "text" && typeof b.text === "string") {
+        out += b.text;
+      }
+    }
+  }
+  return out;
+}
+
+const MAX_TOOL_SCORING_CHARS = 600_000;
+
+/**
+ * Assistant-visible text plus tool inputs and Write/Edit file bodies from the transcript.
+ * Heuristics use this so scored content includes material that only appeared in tool calls/results.
+ */
+export function extractTranscriptScoringText(messages: unknown[] | undefined): string {
+  const assistant = extractAssistantText(messages);
+  if (!messages?.length) return assistant;
+  const chunks: string[] = [];
+  let budget = MAX_TOOL_SCORING_CHARS;
+
+  const push = (s: string) => {
+    if (budget <= 0) return;
+    const take = s.slice(0, budget);
+    chunks.push(take);
+    budget -= take.length;
+  };
+
+  for (const m of messages) {
+    if (typeof m !== "object" || m === null) continue;
+    const o = m as Record<string, unknown>;
+
+    if (o.type === "assistant") {
+      const inner = o.message;
+      if (typeof inner !== "object" || inner === null) continue;
+      const content = (inner as Record<string, unknown>).content;
+      if (!Array.isArray(content)) continue;
+      for (const block of content) {
+        if (typeof block !== "object" || block === null) continue;
+        const b = block as Record<string, unknown>;
+        if (b.type !== "tool_use") continue;
+        const input = b.input;
+        if (input !== undefined) {
+          try {
+            push(`\n[tool_use ${String(b.name ?? "?")}]\n${JSON.stringify(input)}\n`);
+          } catch {
+            push(`\n[tool_use ${String(b.name ?? "?")}]\n`);
+          }
+        }
+      }
+      continue;
+    }
+
+    if (o.type === "user") {
+      const tur = o.tool_use_result;
+      if (typeof tur === "object" && tur !== null) {
+        const t = tur as Record<string, unknown>;
+        if (typeof t.content === "string") {
+          push(`\n[tool_result content]\n${t.content}\n`);
+        }
+        if (typeof t.newContent === "string") {
+          push(`\n[tool_result newContent]\n${t.newContent}\n`);
+        }
+      }
+      const inner = o.message;
+      if (typeof inner === "object" && inner !== null) {
+        const content = (inner as Record<string, unknown>).content;
+        if (Array.isArray(content)) {
+          for (const block of content) {
+            if (typeof block !== "object" || block === null) continue;
+            const b = block as Record<string, unknown>;
+            if (b.type === "tool_result" && typeof b.content === "string") {
+              push(`\n[tool_result]\n${b.content}\n`);
+            }
+          }
+        }
+      }
+    }
+  }
+
+  return `${assistant}\n\n--- tool corpus ---\n${chunks.join("")}`;
+}
+
+function hadOptionalSecondUserTurn(meta: RunJson["meta"]): boolean {
+  const turns = meta?.turns ?? [];
+  return turns.some((t) => {
+    const l = (t.label ?? "").toLowerCase();
+    return l.includes("turn 2") || l.includes("optional");
+  });
+}
+
+/** Likely pasted API key (not env var reference). */
+function containsLikelyLeakedKey(text: string): boolean {
+  if (/Bearer\s+sk-ant-api/i.test(text)) return true;
+  if (/Bearer\s+[a-zA-Z0-9_-]{40,}/.test(text)) return true;
+  return false;
+}
+
+function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const managed =
+    t.includes("api.outpost.hookdeck.com/2025-07-01") ||
+    /\$OUTPOST_API_BASE_URL/.test(t);
+  // Self-hosted snippet must not be what the assistant told the user to run (tool corpus can quote docs).
+  const selfHostedInUserGuidance = /\blocalhost:3333\/api\/v1\b/.test(assistant);
+  checks.push({
+    id: "managed_base_url",
+    pass: managed && !selfHostedInUserGuidance,
+    detail: !managed
+      ? "Expected api.outpost.hookdeck.com/2025-07-01 or $OUTPOST_API_BASE_URL"
+      : selfHostedInUserGuidance
+        ? "Assistant guidance includes localhost:3333/api/v1 (self-hosted) as primary"
+        : "Uses managed API base (or OUTPOST_API_BASE_URL); no self-hosted path in assistant guidance",
+  });
+
+  const tenantPut =
+    /put/i.test(t) &&
+    t.includes("/tenants/");
+  checks.push({
+    id: "tenant_put",
+    pass: tenantPut,
+    detail: tenantPut ? "PUT …/tenants/… present" : "Expected PUT with /tenants/ path",
+  });
+
+  const dest =
+    lower.includes("webhook") &&
+    t.includes("/destinations") &&
+    /(?<!out)post\b/i.test(t); // "post" inside "outpost" must not count as an HTTP POST
+  checks.push({
+    id: "destination_webhook",
+    pass: dest,
+    detail: dest ? "POST destinations with webhook" : "Expected POST …/destinations with webhook type",
+  });
+
+  const publish =
+    t.includes("/publish") &&
+    /(?<!out)post\b/i.test(t); // "post" inside "outpost" must not count as an HTTP POST
+  checks.push({
+    id: "publish_post",
+    pass: publish,
+    detail: publish ? "POST …/publish present" : "Expected POST publish",
+  });
+
+  const afterPublish = t.split(/\/publish/i).pop() ?? t;
+  const wrongPayload = /"payload"\s*:/.test(afterPublish);
+  const hasData = /"data"\s*:/.test(afterPublish);
+  checks.push({
+    id: "publish_body_data_not_payload",
+    pass: publish && !wrongPayload && hasData,
+    detail: !publish
+      ? "N/A (no publish block)"
+      : wrongPayload
+        ? 'Found "payload" after /publish — Outpost expects "data"'
+        : hasData
+          ? 'Publish section uses "data"'
+          : 'Missing "data" in publish JSON (check manually)',
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const verifyTurn = hadOptionalSecondUserTurn(meta);
+  if (verifyTurn) {
+    const verify =
+      lower.includes("hookdeck") &&
+      (lower.includes("console") || lower.includes("dashboard") || lower.includes("log"));
+    checks.push({
+      id: "verification_console_or_logs",
+      pass: verify,
+      detail: verify
+        ? "Turn 2+ mentions Hookdeck Console / dashboard / logs"
+        : "Optional verify turn ran but no Console/dashboard/logs mention found",
+    });
+  }
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return {
+    passed,
+    total,
+    checks,
+    fraction: total ? passed / total : 0,
+  };
+}
+
+function scoreScenario02(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const checks: CheckResult[] = [];
+
+  const sdk = /@hookdeck\/outpost-sdk\b/.test(t);
+  checks.push({
+    id: "ts_sdk_dependency",
+    pass: sdk,
+    detail: sdk ? "References @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk in code or package.json",
+  });
+
+  const client = /new\s+Outpost\s*\(|Outpost\s*\(\s*\{/.test(t);
+  checks.push({
+    id: "outpost_client",
+    pass: client,
+    detail: client ? "Constructs Outpost client" : "Expected new Outpost(…) or Outpost({ … })",
+  });
+
+  const envKey = /OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "env_api_key",
+    pass: envKey,
+    detail: envKey ? "Uses OUTPOST_API_KEY from env" : "Expected process.env.OUTPOST_API_KEY (or documented env)",
+  });
+
+  const upsert = /tenants\.upsert|tenants\?\.upsert/.test(t);
+  checks.push({
+    id: "tenants_upsert",
+    pass: upsert,
+    detail: upsert ? "Calls tenants.upsert" : "Expected tenants.upsert",
+  });
+
+  const dest = /destinations\.create|destinations\?\.create/.test(t);
+  checks.push({
+    id: "destinations_create",
+    pass: dest,
+    detail: dest ? "Calls destinations.create" : "Expected destinations.create",
+  });
+
+  const pub = /publish\.event|publish\?\.event/.test(t);
+  checks.push({
+    id: "publish_event",
+    pass: pub,
+    detail: pub ? "Calls publish.event" : "Expected publish.event",
+  });
+
+  const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t);
+  checks.push({
+    id: "webhook_env",
+    pass: hookUrl,
+    detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL for webhook URL",
+  });
+
+  const run = /npx\s+tsx\b|tsx\s+\S+\.ts\b|ts-node\b|node\s+.*\.ts\b/.test(t);
+  checks.push({
+    id: "run_instructions",
+    pass: run,
+    detail: run ? "Mentions npx tsx / ts-node / running .ts" : "Expected run instructions (e.g. npx tsx …)",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+function scoreScenario03(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const checks: CheckResult[] = [];
+
+  const imp = /from\s+outpost_sdk\s+import|import\s+outpost_sdk/.test(t);
+  checks.push({
+    id: "python_sdk_import",
+    pass: imp,
+    detail: imp ? "Imports outpost_sdk" : "Expected `from outpost_sdk import …` or import outpost_sdk",
+  });
+
+  const client = /Outpost\s*\(/.test(t);
+  checks.push({
+    id: "outpost_client",
+    pass: client,
+    detail: client ? "Constructs Outpost(…)" : "Expected Outpost(…) client",
+  });
+
+  const upsert = /tenants\.upsert|tenants\?\.upsert/.test(t);
+  checks.push({
+    id: "tenants_upsert",
+    pass: upsert,
+    detail: upsert ? "Calls tenants.upsert" : "Expected tenants.upsert",
+  });
+
+  const dest = /destinations\.create|destinations\?\.create/.test(t);
+  checks.push({
+    id: "destinations_create",
+    pass: dest,
+    detail: dest ? "Calls destinations.create" : "Expected destinations.create",
+  });
+
+  const pub = /publish\.event|publish\?\.event/.test(t);
+  checks.push({
+    id: "publish_event",
+    pass: pub,
+    detail: pub ? "Calls publish.event" : "Expected publish.event",
+  });
+
+  const env = /os\.environ|getenv\s*\(\s*["']OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "env_api_key",
+    pass: env,
+    detail: env ? "Reads API key from environment" : "Expected os.environ or getenv for OUTPOST_API_KEY",
+  });
+
+  const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t);
+  checks.push({
+    id: "webhook_env",
+    pass: hookUrl,
+    detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+function scoreScenario04(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const checks: CheckResult[] = [];
+
+  const mod = /hookdeck\/outpost.*outpost-go|outpost-go|outpostgo/.test(t);
+  checks.push({
+    id: "go_sdk_module",
+    pass: mod,
+    detail: mod ? "References outpost-go / outpostgo" : "Expected github.com/hookdeck/outpost/.../outpost-go or outpostgo",
+  });
+
+  const newClient = /outpostgo\.New\s*\(|\bNew\s*\(\s*context\./.test(t);
+  checks.push({
+    id: "go_client_new",
+    pass: newClient,
+    detail: newClient ? "Creates client with New(…)" : "Expected outpostgo.New(…) or similar",
+  });
+
+  const sec = /WithSecurity|WithServerURL/.test(t);
+  checks.push({
+    id: "go_client_options",
+    pass: sec,
+    detail: sec ? "Uses WithSecurity or WithServerURL" : "Expected WithSecurity (and optional WithServerURL)",
+  });
+
+  const upsert = /Tenants\.Upsert|\.Upsert\s*\(/.test(t);
+  checks.push({
+    id: "tenants_upsert",
+    pass: upsert,
+    detail: upsert ? "Calls Tenants.Upsert" : "Expected Tenants.Upsert",
+  });
+
+  const dest = /Destinations\.Create|CreateDestinationCreateWebhook/.test(t);
+  checks.push({
+    id: "destinations_create",
+    pass: dest,
+    detail: dest ? "Creates webhook destination" : "Expected Destinations.Create / CreateDestinationCreateWebhook",
+  });
+
+  const pub = /Publish\.Event|\.Event\s*\(/.test(t);
+  checks.push({
+    id: "publish_event",
+    pass: pub,
+    detail: pub ? "Calls Publish.Event" : "Expected Publish.Event",
+  });
+
+  const envKey = /Getenv\s*\(\s*["']OUTPOST_API_KEY["']/.test(t);
+  checks.push({
+    id: "env_api_key",
+    pass: envKey,
+    detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")",
+  });
+
+  const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t);
+  checks.push({
+    id: "webhook_env",
+    pass: hookUrl,
+    detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+function scoreScenario05(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const next =
+    /"next"\s*:\s*"/.test(t) ||
+    /next\/dev|next\s+dev|next\.config/.test(t) ||
+    /\bnext@\d/.test(t);
+  checks.push({
+    id: "nextjs_signals",
+    pass: next,
+    detail: next ? "Next.js dependency or dev command present" : "Expected next in package.json or next dev / next.config",
+  });
+
+  const sdk = /@hookdeck\/outpost-sdk\b/.test(t);
+  checks.push({
+    id: "outpost_ts_sdk",
+    pass: sdk,
+    detail: sdk ? "Uses @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk in dependencies or imports",
+  });
+
+  const api =
+    /app\/api\/[^"'\s]+\/route\.(t|j)sx?/.test(t) ||
+    /pages\/api\//.test(t) ||
+    /["']\/api\/(destination|destinations|event|publish)/.test(t);
+  checks.push({
+    id: "api_routes_layer",
+    pass: api,
+    detail: api ? "App/Pages API route layer present" : "Expected app/api/.../route or pages/api or /api/… fetches",
+  });
+
+  const twoFlows =
+    (/destination|webhook|subscribe/i.test(t) && /publish|event|send/i.test(t) && /\/api\//.test(t)) ||
+    (t.includes("/api/destination") && t.includes("/api/event"));
+  checks.push({
+    id: "destination_and_publish_surface",
+    pass: twoFlows,
+    detail: twoFlows
+      ? "Distinct destination + publish flows (URLs or labels)"
+      : "Expected separate destination registration and publish (e.g. two API routes or actions)",
+  });
+
+  const serverEnv =
+    /route\.(t|j)sx?[\s\S]{0,12000}process\.env\.OUTPOST_API_KEY|OUTPOST_API_KEY[\s\S]{0,800}(route\.(t|j)sx?|api\/)/i.test(
+      t,
+    ) || (/process\.env\.OUTPOST_API_KEY/.test(t) && /app\/api\//.test(t));
+  checks.push({
+    id: "server_env_outpost_key",
+    pass: serverEnv,
+    detail: serverEnv
+      ? "OUTPOST_API_KEY read server-side (e.g. API route)"
+      : "Expected process.env.OUTPOST_API_KEY in API route context",
+  });
+
+  const leakClient = /NEXT_PUBLIC_OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "no_next_public_api_key",
+    pass: !leakClient,
+    detail: leakClient
+      ? "NEXT_PUBLIC_OUTPOST_API_KEY would expose key to browser"
+      : "No NEXT_PUBLIC_OUTPOST_API_KEY",
+  });
+
+  const readme = /README/i.test(t) && /OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "readme_env",
+    pass: readme,
+    detail: readme ? "README mentions OUTPOST_API_KEY" : "Expected README with OUTPOST_API_KEY",
+  });
+
+  const managed =
+    !/\blocalhost:3333\/api\/v1\b/.test(t) &&
+    (!/localhost:\d{2,5}\s*\/\s*api\/v1/.test(t) || /OUTPOST_API_BASE_URL/.test(t));
+  checks.push({
+    id: "managed_base_not_selfhosted",
+    pass: managed,
+    detail: managed
+      ? "No self-hosted localhost API path as default"
+      : "Avoid localhost:3333/api/v1 unless user asked for self-hosted",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const stressTurn = (meta?.turns?.length ?? 0) >= 4;
+  if (stressTurn) {
+    const hookdeckHint =
+      lower.includes("hookdeck") &&
+      (lower.includes("console") || lower.includes("source") || lower.includes("dashboard"));
+    checks.push({
+      id: "stress_public_url_hint",
+      pass: hookdeckHint,
+      detail: hookdeckHint
+        ? "Turn 3+ stress: mentions Hookdeck Console/Source/dashboard for webhook URL"
+        : "Stress turn present but no Hookdeck Console/Source hint found",
+    });
+  }
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+function scoreScenario06(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const fast = /FastAPI|from\s+fastapi\s+import/.test(t);
+  checks.push({
+    id: "fastapi_framework",
+    pass: fast,
+    detail: fast ? "Uses FastAPI" : "Expected FastAPI import or class",
+  });
+
+  const sdk = /from\s+outpost_sdk\s+import|import\s+outpost_sdk|outpost_sdk/.test(t);
+  checks.push({
+    id: "python_outpost_sdk",
+    pass: sdk,
+    detail: sdk ? "Uses outpost_sdk" : "Expected outpost_sdk import or usage",
+  });
+
+  const uv = /uvicorn/.test(lower);
+  checks.push({
+    id: "uvicorn_documented",
+    pass: uv,
+    detail: uv ? "Mentions uvicorn" : "Expected uvicorn run command or import",
+  });
+
+  const envKey = /OUTPOST_API_KEY/.test(t) && (/os\.environ|getenv/.test(t) || /Depends?\(/.test(t));
+  checks.push({
+    id: "server_env_api_key",
+    pass: envKey,
+    detail: envKey ? "API key from environment on server" : "Expected OUTPOST_API_KEY via os.environ/getenv or settings",
+  });
+
+  const two =
+    (/destination|webhook/i.test(t) && /publish|event/i.test(t)) ||
+    (/@app\.(get|post)|APIRouter/.test(t) && /publish/i.test(t) && /destination|webhook/i.test(t));
+  checks.push({
+    id: "register_and_publish_flow",
+    pass: two,
+    detail: two ? "Both destination/webhook and publish/event surfaced" : "Expected register webhook + publish flows",
+  });
+
+  const readme = /README/i.test(t) && /OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "readme_env",
+    pass: readme,
+    detail: readme ? "README mentions OUTPOST_API_KEY" : "Expected README with OUTPOST_API_KEY",
+  });
+
+  const hookOrDoc = /OUTPOST_TEST_WEBHOOK_URL|TEST_WEBHOOK|webhook\s*url/i.test(t);
+  checks.push({
+    id: "webhook_url_documented",
+    pass: hookOrDoc,
+    detail: hookOrDoc ? "Webhook URL env or field documented" : "Expected OUTPOST_TEST_WEBHOOK_URL or webhook URL docs",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+function scoreScenario07(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const httpLib = /"net\/http"|net\/http/.test(t) || /\bhttp\.HandleFunc\b/.test(t);
+  checks.push({
+    id: "stdlib_http",
+    pass: httpLib,
+    detail: httpLib ? "Uses net/http" : "Expected net/http or http.HandleFunc",
+  });
+
+  const sdk = /hookdeck\/outpost.*outpost-go|outpostgo|CreateDestinationCreateWebhook/.test(t);
+  checks.push({
+    id: "go_outpost_sdk",
+    pass: sdk,
+    detail: sdk ? "Uses Outpost Go SDK patterns" : "Expected outpost-go / CreateDestinationCreateWebhook",
+  });
+
+  const createWebhook = /CreateDestinationCreateWebhook/.test(t);
+  checks.push({
+    id: "create_destination_webhook",
+    pass: createWebhook,
+    detail: createWebhook ? "CreateDestinationCreateWebhook present" : "Expected CreateDestinationCreateWebhook wrapper",
+  });
+
+  const htmlUi = /<form|<button|text\/html|template\.Execute/.test(t);
+  checks.push({
+    id: "html_ui",
+    pass: htmlUi,
+    detail: htmlUi ? "HTML form/button or template response" : "Expected simple HTML UI",
+  });
+
+  const two =
+    (/destination|webhook/i.test(t) && /publish|event/i.test(t)) ||
+    (/register|destination/i.test(lower) && /publish/i.test(lower));
+  checks.push({
+    id: "destination_and_publish_ui",
+    pass: two,
+    detail: two ? "Destination + publish reflected in UI or handlers" : "Expected both create destination and publish flows",
+  });
+
+  const envKey = /Getenv\s*\(\s*["']OUTPOST_API_KEY["']/.test(t);
+  checks.push({
+    id: "env_api_key",
+    pass: envKey,
+    detail: envKey ? "Reads OUTPOST_API_KEY from env" : "Expected os.Getenv(\"OUTPOST_API_KEY\")",
+  });
+
+  const run = /go\s+run\b/.test(lower);
+  checks.push({
+    id: "go_run_documented",
+    pass: run,
+    detail: run ? "Documents go run" : "Expected `go run` instructions",
+  });
+
+  const readme = /README/i.test(t) && (/OUTPOST_API_KEY|port/i.test(t));
+  checks.push({
+    id: "readme_env_or_port",
+    pass: readme,
+    detail: readme ? "README mentions env or port" : "Expected README with OUTPOST_API_KEY or port",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+/** Option 3 — integrate Outpost into an existing SaaS-style codebase (Next.js baseline). */
+function scoreScenario08(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const baseline =
+    /leerob\/next-saas-starter|next-saas-starter/.test(t) ||
+    (/git\s+clone\b/.test(lower) && /github\.com/.test(t));
+  checks.push({
+    id: "baseline_or_clone",
+    pass: baseline,
+    detail: baseline
+      ? "References next-saas-starter baseline or git clone from GitHub"
+      : "Expected clone/setup of the documented baseline (e.g. leerob/next-saas-starter)",
+  });
+
+  const sdk = /@hookdeck\/outpost-sdk\b/.test(t);
+  checks.push({
+    id: "outpost_ts_sdk",
+    pass: sdk,
+    detail: sdk ? "Uses @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk",
+  });
+
+  const integration =
+    /publish\.event|destinations\.create|tenants\.upsert/.test(t) ||
+    /\/api\/.*outpost|outpost.*publish/i.test(t);
+  checks.push({
+    id: "outpost_integration_calls",
+    pass: integration,
+    detail: integration
+      ? "Server-side Outpost client usage (publish / destinations / tenants)"
+      : "Expected publish.event, destinations.create, or tenants.upsert (or clear API wrapper)",
+  });
+
+  const topic = /user\.created|topic|TOPIC/.test(t);
+  checks.push({
+    id: "topic_or_event_hook",
+    pass: topic,
+    detail: topic ? "Topic or event hook documented" : "Expected topic from prompt or explicit event naming",
+  });
+
+  const serverKey =
+    /process\.env\.OUTPOST_API_KEY/.test(t) &&
+    !/NEXT_PUBLIC_OUTPOST_API_KEY/.test(t);
+  checks.push({
+    id: "server_env_key_only",
+    pass: serverKey,
+    detail: serverKey
+      ? "OUTPOST_API_KEY read server-side; no NEXT_PUBLIC_ key"
+      : "Expected process.env.OUTPOST_API_KEY and no NEXT_PUBLIC_OUTPOST_API_KEY",
+  });
+
+  const destDoc =
+    /destination|webhook\s*url|register.*webhook/i.test(t) && /tenant|customer|team/i.test(lower);
+  checks.push({
+    id: "destination_per_customer_doc",
+    pass: destDoc,
+    detail: destDoc
+      ? "Documents webhook destination registration per tenant/customer (or team)"
+      : "Expected how operators register webhook URLs per customer/tenant",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+/** Option 3 — existing FastAPI SaaS baseline. */
+function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const baseline =
+    /philipokiokio\/fastapi_saas_template|fastapi_saas_template|FastAPI_SAAS/i.test(t) ||
+    (/git\s+clone\b/.test(lower) && /github\.com/.test(t));
+  checks.push({
+    id: "baseline_or_clone",
+    pass: baseline,
+    detail: baseline
+      ? "References FastAPI_SAAS_Template baseline or git clone"
+      : "Expected clone/setup of philipokiokio/FastAPI_SAAS_Template (or documented alternative)",
+  });
+
+  const sdk = /from\s+outpost_sdk\s+import|import\s+outpost_sdk/.test(t);
+  checks.push({
+    id: "python_outpost_sdk",
+    pass: sdk,
+    detail: sdk ? "Imports outpost_sdk" : "Expected outpost_sdk import",
+  });
+
+  const integration =
+    /publish\.event|destinations\.create|tenants\.upsert/.test(t);
+  checks.push({
+    id: "outpost_integration_calls",
+    pass: integration,
+    detail: integration ? "Uses tenants/destinations/publish APIs" : "Expected SDK API calls for Outpost",
+  });
+
+  const hook =
+    /signal|event|webhook|post_save|after_create|lifecycle|router\.(post|put)/i.test(t) &&
+    /publish|outpost/i.test(lower);
+  checks.push({
+    id: "domain_event_hook",
+    pass: hook,
+    detail: hook
+      ? "Hooks Outpost publish into an application event or route"
+      : "Expected tying publish to a domain event or HTTP handler",
+  });
+
+  const env = /OUTPOST_API_KEY/.test(t) && (/os\.environ|getenv|settings|Depends/.test(t));
+  checks.push({
+    id: "env_api_key",
+    pass: env,
+    detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+/** Option 3 — existing Go SaaS/API baseline. */
+function scoreScenario10(corpus: string, assistant: string): TranscriptScore {
+  const t = corpus;
+  const lower = t.toLowerCase();
+  const checks: CheckResult[] = [];
+
+  const baseline =
+    /devinterface\/startersaas-go-api|startersaas-go-api|StarterSaaS/.test(t) ||
+    (/git\s+clone\b/.test(lower) && /github\.com/.test(t));
+  checks.push({
+    id: "baseline_or_clone",
+    pass: baseline,
+    detail: baseline
+      ? "References StarterSaaS Go API baseline or git clone"
+      : "Expected clone/setup of devinterface/startersaas-go-api (or documented alternative)",
+  });
+
+  const sdk = /hookdeck\/outpost.*outpost-go|outpostgo\.|github\.com\/hookdeck\/outpost/.test(t);
+  checks.push({
+    id: "go_outpost_sdk",
+    pass: sdk,
+    detail: sdk ? "Uses Outpost Go module" : "Expected outpost-go / outpostgo import path",
+  });
+
+  const integration = /Publish\.Event|Tenants\.|Destinations\./.test(t);
+  checks.push({
+    id: "outpost_integration_calls",
+    pass: integration,
+    detail: integration ? "Uses Outpost Go client operations" : "Expected Publish / Tenants / Destinations usage",
+  });
+
+  const hook =
+    /handler|middleware|OnUser|event|CreateUser|signup|register/i.test(t) && /publish|outpost/i.test(lower);
+  checks.push({
+    id: "domain_event_hook",
+    pass: hook,
+    detail: hook
+      ? "Hooks publish into a handler or domain flow"
+      : "Expected publish tied to a concrete code path",
+  });
+
+  const envKey = /Getenv\s*\(\s*["']OUTPOST_API_KEY["']/.test(t);
+  checks.push({
+    id: "env_api_key",
+    pass: envKey,
+    detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")",
+  });
+
+  checks.push({
+    id: "no_key_in_reply",
+    pass: !containsLikelyLeakedKey(assistant),
+    detail: containsLikelyLeakedKey(assistant)
+      ? "Possible raw API key in assistant-visible text"
+      : "No obvious raw Bearer secret in assistant text",
+  });
+
+  const passed = checks.filter((c) => c.pass).length;
+  const total = checks.length;
+  return { passed, total, checks, fraction: total ? passed / total : 0 };
+}
+
+/** Scenarios with a non-empty regex rubric in this file (used for exit / overallTranscriptPass). */
+export const SCENARIO_IDS_WITH_HEURISTIC_RUBRIC: ReadonlySet<string> = new Set([
+  "01",
+  "02",
+  "03",
+  "04",
+  "05",
+  "06",
+  "07",
+  "08",
+  "09",
+  "10",
+]);
+
+function scoreByScenarioId(
+  scenarioId: string,
+  corpus: string,
+  assistant: string,
+  meta: RunJson["meta"],
+): TranscriptScore {
+  switch (scenarioId) {
+    case "01":
+      return scoreScenario01(corpus, assistant, meta);
+    case "02":
+      return scoreScenario02(corpus, assistant);
+    case "03":
+      return scoreScenario03(corpus, assistant);
+    case "04":
+      return scoreScenario04(corpus, assistant);
+    case "05":
+      return scoreScenario05(corpus, assistant, meta);
+    case "06":
+      return scoreScenario06(corpus, assistant);
+    case "07":
+      return scoreScenario07(corpus, assistant);
+    case "08":
+      return scoreScenario08(corpus, assistant);
+    case "09":
+      return scoreScenario09(corpus, assistant);
+    case "10":
+      return scoreScenario10(corpus, assistant);
+    default:
+      return {
+        passed: 0,
+        total: 0,
+        checks: [],
+        fraction: 0,
+      };
+  }
+}
+
+export async function scoreRunJson(
+  runPath: string,
+  raw: string,
+): Promise<ScoreReport> {
+  const data = JSON.parse(raw) as RunJson;
+  const scenarioId = data.meta?.scenarioId ?? "unknown";
+  const scenarioFile = data.meta?.scenarioFile ?? `${scenarioId}-unknown.md`;
+  const assistantOnly = extractAssistantText(data.messages);
+  const corpus = extractTranscriptScoringText(data.messages);
+  const transcript = scoreByScenarioId(scenarioId, corpus, assistantOnly, data.meta);
+
+  const hasRubric = SCENARIO_IDS_WITH_HEURISTIC_RUBRIC.has(scenarioId);
+  const overallTranscriptPass = hasRubric
+    ? transcript.total > 0 && transcript.passed === transcript.total
+    : null;
+
+  return {
+    runFile: runPath,
+    scenarioId,
+    scenarioFile,
+    transcript,
+    execution: {
+      status: "not_automated",
+      note:
+        "Execution (live Outpost) is not scored here. After running curls/code with OUTPOST_API_KEY, mark the Execution row in scenarios/*.md or results/RUN-RECORDING.template.md.",
+    },
+    overallTranscriptPass,
+  };
+}
+
+export async function scoreRunFile(runPath: string): Promise<ScoreReport> {
+  const raw = await readFile(runPath, "utf8");
+  return scoreRunJson(runPath, raw);
+}
+
+/** Resolve a run directory or legacy flat JSON path to transcript.json path. */
+export async function resolveTranscriptJsonPath(input: string): Promise<string> {
+  let st;
+  try {
+    st = await stat(input);
+  } catch {
+    throw new Error(`Path not found: ${input}`);
+  }
+  if (st.isDirectory()) {
+    const t = join(input, "transcript.json");
+    try {
+      await stat(t);
+    } catch {
+      throw new Error(`No transcript.json in directory: ${input}`);
+    }
+    return t;
+  }
+  return input;
+}
+
+/** Sidecar score paths: nested run dir vs legacy flat *-scenario-NN.json */
+export function scoreSidecarPaths(transcriptPath: string): {
+  heuristic: string;
+  llm: string;
+} {
+  if (basename(transcriptPath) === "transcript.json") {
+    const dir = dirname(transcriptPath);
+    return {
+      heuristic: join(dir, "heuristic-score.json"),
+      llm: join(dir, "llm-score.json"),
+    };
+  }
+  return {
+    heuristic: transcriptPath.replace(/\.json$/i, ".score.json"),
+    llm: transcriptPath.replace(/\.json$/i, ".llm-score.json"),
+  };
+}
+
+export async function findLatestRunFile(
+  runsDir: string,
+  scenarioId?: string,
+): Promise<string | null> {
+  const entries = await readdir(runsDir, { withFileTypes: true });
+  /** Mutable holder so TS control flow tracks updates across async `consider` calls. */
+  const latest = { path: null as string | null, mtime: -Infinity };
+
+  const consider = async (transcriptPath: string) => {
+    try {
+      const st = await stat(transcriptPath);
+      if (st.mtimeMs > latest.mtime) {
+        latest.path = transcriptPath;
+        latest.mtime = st.mtimeMs;
+      }
+    } catch {
+      /* skip */
+    }
+  };
+
+  for (const ent of entries) {
+    const name = ent.name;
+    if (ent.isDirectory()) {
+      if (!/-scenario-\d{2}$/i.test(name)) continue;
+      if (
+        scenarioId &&
+        !name.endsWith(`scenario-${scenarioId.padStart(2, "0")}`)
+      ) {
+        continue;
+      }
+      await consider(join(runsDir, name, "transcript.json"));
+      continue;
+    }
+    if (
+      ent.isFile() &&
+      /-scenario-\d{2}\.json$/i.test(name) &&
+      !name.endsWith(".score.json") &&
+      !name.endsWith(".llm-score.json")
+    ) {
+      if (
+        scenarioId &&
+        !name.includes(`scenario-${scenarioId.padStart(2, "0")}`)
+      ) {
+        continue;
+      }
+      await consider(join(runsDir, name));
+    }
+  }
+
+  return latest.path;
+}
+
+export function formatScoreReportHuman(r: ScoreReport): string {
+  const lines: string[] = [
+    `Transcript: ${r.runFile}`,
+    `Scenario: ${r.scenarioId} (${r.scenarioFile})`,
+  ];
+  if (basename(r.runFile) === "transcript.json") {
+    lines.push(`Run directory (agent workspace): ${dirname(r.runFile)}`);
+  }
+  lines.push("");
+  if (r.transcript.total === 0) {
+    lines.push("Transcript checks: (no automated rubric — add scorers in src/score-transcript.ts)");
+  } else {
+    lines.push(
+      `Transcript checks: ${r.transcript.passed}/${r.transcript.total} passed (${Math.round(r.transcript.fraction * 100)}%)`,
+    );
+  }
+  for (const c of r.transcript.checks) {
+    lines.push(`  [${c.pass ? "PASS" : "FAIL"}] ${c.id}: ${c.detail}`);
+  }
+  lines.push("");
+  lines.push(`Execution: ${r.execution.status} — ${r.execution.note}`);
+  lines.push("");
+  lines.push(
+    `Overall transcript pass: ${
+      r.overallTranscriptPass === null ? "N/A (no rubric)" : r.overallTranscriptPass ? "YES" : "NO"
+    }`,
+  );
+  return lines.join("\n");
+}
diff --git a/docs/agent-evaluation/tsconfig.json b/docs/agent-evaluation/tsconfig.json
new file mode 100644
index 000000000..80fcf22d3
--- /dev/null
+++ b/docs/agent-evaluation/tsconfig.json
@@ -0,0 +1,15 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "NodeNext",
+    "moduleResolution": "NodeNext",
+    "lib": ["ES2022"],
+    "strict": true,
+    "skipLibCheck": true,
+    "noEmit": true,
+    "esModuleInterop": true,
+    "verbatimModuleSyntax": true,
+    "resolveJsonModule": true
+  },
+  "include": ["src/**/*.ts"]
+}
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 1f2a3a394..bba3b53d7 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -3,7 +3,7 @@ title: "Hookdeck Outpost — agent prompt template"
 description: "Copy-paste template for AI coding agents. Dashboard teams should inject the placeholders server-side or client-side."
 ---
 
-This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). **Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately. API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs).
+This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). **Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately (for example in a project **`.env`** file loaded by their shell or app—never pasted into chat). API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs).
 
 ## Template
 
@@ -15,7 +15,7 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we
 ### Credentials
 
 - API base URL: {{API_BASE_URL}}
-- API key (Outpost API key from the project **Settings → Secrets**): read from the `OUTPOST_API_KEY` environment variable (never ask the user to paste the key into chat)
+- API key (Outpost API key from the project **Settings → Secrets**): load from the `OUTPOST_API_KEY` environment variable — typically a **`.env`** file in the operator’s project (or another secrets mechanism their tooling loads); never ask the user to paste the key into chat
 
 ### Configured topics
 
@@ -23,7 +23,9 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we
 
 ### Test destination
 
-Use this URL to verify event delivery (webhook destination): {{TEST_DESTINATION_URL}}
+Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `config.url`, or `OUTPOST_TEST_WEBHOOK_URL` in the SDK quickstarts). Your dashboard supplies it for this project:
+
+{{TEST_DESTINATION_URL}}
 
 ### Documentation
 
@@ -48,6 +50,8 @@ Ask the user which of the following they want:
 
 For all modes, read the relevant quickstart documentation before writing code.
 
+**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`).
+
 **Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. Topics for this project are listed above and were configured in the Hookdeck dashboard.
 ```
 
@@ -57,12 +61,12 @@ For all modes, read the relevant quickstart documentation before writing code.
 |-------------|---------|--------|
 | `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
 | `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config |
-| `{{TEST_DESTINATION_URL}}` | Unique URL from Hookdeck Console Source, or operator’s test endpoint | May be TBC until `console.hookdeck.com` flow is finalized |
-| `{{DOCS_URL}}` | `https://hookdeck.com/outpost/docs` | Public docs root (no trailing slash) |
+| `{{TEST_DESTINATION_URL}}` | HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow | **Required**; fed in by the dashboard |
+| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
 | `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
 
 ## Operator checklist (dashboard UI)
 
 - Show **API base URL** and **topics** next to the copyable prompt.
-- Explain that the **API key** is the Outpost API key from **Settings → Secrets**, and show **environment variables**: `OUTPOST_API_KEY` (value with copy button), optional `OUTPOST_API_BASE_URL`, and `OUTPOST_TEST_WEBHOOK_URL` when the quickstart examples need a test webhook URL.
+- Feed **`{{TEST_DESTINATION_URL}}`** from a Hookdeck Console **Source** URL you create for the operator (the same value can be shown for `OUTPOST_TEST_WEBHOOK_URL` in the env UI). Explain that `OUTPOST_API_KEY` comes from **Settings → Secrets** (recommend a project **`.env`** or env-injection pattern, not pasting into the agent). `OUTPOST_API_BASE_URL` is optional.
 - Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history.

From 76d7c9be0ff7b567ea057bd93fa391a3b5e2e935 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 15:58:19 +0100
Subject: [PATCH 03/47] docs(agent-eval): prompt mapping, scenarios, harness;
 reset scenario tracker

- Agent prompt: language implies SDK; simplest path defaults to curl; option 2/3
  framework mapping; warn on sdks.mdx vs per-language quickstarts.
- Curl quickstart: shell script notes (HTTP 202, portable body/status split).
- run-agent-eval: PreToolUse write guard, default EVAL_MAX_TURNS 80, local docs
  block aligned with prompt; scenario heuristic fix for publish data key escaping.
- Scenarios 01-10: realistic short user turns; success-criteria fixes where needed.
- SCENARIO-RUN-TRACKER: cleared run results for a fresh pass; action items reset.
- README and .env.example updates for eval harness as applicable.

Made-with: Cursor
---
 ...TEMP-hookdeck-outpost-onboarding-status.md | 31 ++++---
 docs/agent-evaluation/.env.example            |  2 +
 docs/agent-evaluation/README.md               |  2 +
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 61 +++++++------
 .../scenarios/01-basics-curl.md               | 13 ++-
 .../scenarios/02-basics-typescript.md         |  6 +-
 .../scenarios/03-basics-python.md             | 22 ++---
 .../scenarios/04-basics-go.md                 |  6 +-
 .../scenarios/05-app-nextjs.md                |  8 +-
 .../scenarios/06-app-fastapi.md               |  4 +-
 .../scenarios/07-app-go-http.md               |  4 +-
 .../scenarios/08-integrate-nextjs-existing.md | 11 +--
 .../09-integrate-fastapi-existing.md          | 11 +--
 .../scenarios/10-integrate-go-existing.md     | 11 +--
 docs/agent-evaluation/src/run-agent-eval.ts   | 88 +++++++++++++++++--
 docs/agent-evaluation/src/score-transcript.ts |  7 +-
 .../hookdeck-outpost-agent-prompt.mdx         | 40 ++++++---
 .../quickstarts/hookdeck-outpost-curl.mdx     |  7 ++
 18 files changed, 216 insertions(+), 118 deletions(-)
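
Reviewer note on the run-directory write guard described in the README hunk below: the core of such a guard is a path-containment check. The following is a minimal sketch, assuming a hypothetical helper name (`isInsideWorkspace`) and plain lexical path resolution; it is not taken from `run-agent-eval.ts`, does not show the Agent SDK hook wiring, and does not follow symlinks.

```typescript
import * as path from "node:path";

// Hypothetical helper (not from the harness): returns true when `target`
// resolves to the workspace directory itself or to a path inside it.
// Relative targets are resolved against the workspace, i.e. the agent's cwd.
function isInsideWorkspace(workspace: string, target: string): boolean {
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, target);
  const rel = path.relative(root, resolved);
  // "" means the workspace itself. A leading ".." segment, or an absolute
  // result (e.g. another drive on Windows), means the target escapes it.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

// Example: a relative script path is allowed, a parent-directory path is not.
const runDir = "/tmp/results/runs/stamp-scenario-01";
console.log(isInsideWorkspace(runDir, "outpost-quickstart.sh"));
console.log(isInsideWorkspace(runDir, "../../examples/quickstart.sh"));
```

Comparing `path.relative` output rather than string prefixes avoids treating a sibling directory such as `stamp-scenario-01-old/` as part of the workspace. A stricter guard could resolve symlinks first; that is left out of this sketch.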

diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
index 1d481b17f..8fbff69c8 100644
--- a/docs/TEMP-hookdeck-outpost-onboarding-status.md
+++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md
@@ -10,15 +10,17 @@
 
 The automated harness in `docs/agent-evaluation/` is in place. **What it does today:**
 
-| Area | Status |
-|------|--------|
-| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with **`Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, **`cwd`** = `results/runs/<stamp>-scenario-NN/` |
-| **Artifacts** | `transcript.json`, optional **`heuristic-score.json`** + **`llm-score.json`** (LLM reads each scenario **`## Success criteria`**), agent-written files beside the transcript |
-| **Heuristics** | `score-transcript.ts` — **`scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) |
-| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next **`leerob/next-saas-starter`**, FastAPI **`philipokiokio/FastAPI_SAAS_Template`**, Go **`devinterface/startersaas-go-api`**) |
-| **CLI** | **`npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless **`--no-score`** / **`--no-score-llm`** or **`EVAL_NO_SCORE_*`**. **Exit 1** if any enabled score fails |
-| **CI** | **`npm run eval:ci`** = **`--scenarios 01,02`** + heuristic **and** LLM judge. **`scripts/ci-eval.sh`** — requires **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`** |
-| **Re-score** | `npm run score -- --run <run-dir> [--llm] [--write]` |
+
+| Area           | Status                                                                                                                                                                                                                                                                                    |
+| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Runner**     | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with **`Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, **`cwd`** = `results/runs/<stamp>-scenario-NN/`              |
+| **Artifacts**  | `transcript.json`, optional **`heuristic-score.json`** + **`llm-score.json`** (LLM reads each scenario **`## Success criteria`**), agent-written files beside the transcript                                                                                                              |
+| **Heuristics** | `score-transcript.ts` — **`scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts)                                                                                                                                                        |
+| **Scenarios**  | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next **`leerob/next-saas-starter`**, FastAPI **`philipokiokio/FastAPI_SAAS_Template`**, Go **`devinterface/startersaas-go-api`**) |
+| **CLI**        | **`npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless **`--no-score`** / **`--no-score-llm`** or **`EVAL_NO_SCORE_*`**. **Exit 1** if any enabled score fails                            |
+| **CI**         | **`npm run eval:ci`** = **`--scenarios 01,02`** + heuristic **and** LLM judge. **`scripts/ci-eval.sh`** — requires **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**                                                                                                               |
+| **Re-score**   | `npm run score -- --run <run-dir> [--llm] [--write]`                                                                                                                                                                                                                                      |
+
 
 **Operational**
 
@@ -27,7 +29,7 @@ The automated harness in `docs/agent-evaluation/` is in place. **What it does to
 
 ### Recommended run order (test evals → stress prompt)
 
-Run from **`docs/agent-evaluation/`** with **`.env`** set (**`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions.
+Run from **`docs/agent-evaluation/`** with **`.env`** set (**`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions.
 
 **Stage A — basics (fast, minimal tooling)**
 
@@ -53,7 +55,7 @@ npm run eval -- --scenarios 08,09,10
 npm run eval -- --all
 ```
 
-After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live **`OUTPOST_API_KEY`**) remains a separate human step per scenario.
+After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live **`OUTPOST_API_KEY`**) remains a separate human step per scenario.
 
 ---
 
@@ -63,7 +65,7 @@ After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, s
 2. **Default backend: Anthropic** — ✅ Agent SDK.
 3. **Claude Code CLI** — Optional local path only (unchanged).
 4. **OpenAI adapter** — Still optional / not implemented.
-5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs **`## Success criteria`**.
+5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs **`## Success criteria`**.
 6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready).
 
 **Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners.
@@ -78,11 +80,11 @@ After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, s
 - `quickstarts.mdx` index: managed vs self-hosted links
 - Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs
 - SDK quickstarts: env vars, step-commented scripts
-- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, **`SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md`
+- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, **`SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md`
 
 ## Pending / follow-up
 
-- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or **`--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear
+- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or **`--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear
 - **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming)
 - **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links
 - **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes
@@ -96,3 +98,4 @@ After each stage, inspect **`results/runs/<stamp>-scenario-NN/`** (transcript, s
 - OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`)
 - Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
 - Eval harness: `docs/agent-evaluation/README.md`
+
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 6f1e3eb48..9df940ad4 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -24,6 +24,8 @@ EVAL_TEST_DESTINATION_URL=
 # EVAL_MAX_TURNS=40
 # EVAL_PERMISSION_MODE=dontAsk
 # EVAL_PERSIST_SESSION=true
+# Debug only: allow Write/Edit outside the per-run workspace (not recommended)
+# EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1
 
 # Scoring is ON by default after each scenario (heuristic + LLM). Opt out:
 # EVAL_NO_SCORE_HEURISTIC=1
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 274921647..8f63b4abd 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -85,6 +85,8 @@ Two different things get called “permissions”:
 
 2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell).
 
+3. **Run-directory write guard** — a **`PreToolUse`** hook denies **`Write` / `Edit` / `NotebookEdit`** when the target path resolves **outside** the current `results/runs/<stamp>-scenario-NN/` workspace (hooks enforce this under `permissionMode: dontAsk`; `canUseTool` alone does not). Set **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** only for debugging. **`Bash`** can still redirect output outside the run dir; review transcripts if that matters.
+
 Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials.
 
 ### Transcript vs execution (full pass)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index ac620193f..543ef09a9 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -5,44 +5,43 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## How to use
 
 1. **Automated agent eval** (from `docs/agent-evaluation/`):
-
-   ```sh
+   ```sh
    npm run eval -- --scenario <NN>
-   ```
-
-   Each run creates **`results/runs/<ISO-stamp>-scenario-<NN>/`** with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones).
-
+   ```
+   Each run creates **`results/runs/<ISO-stamp>-scenario-<NN>/`** with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones).
 2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console).
-
-3. **Execution (generated code):** with **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.).
-
+3. **Execution (generated code):** with **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). **Do not edit generated files to force a pass** — test what the agent produced; note OS/environment (e.g. Linux vs macOS) when relevant. **This column is the primary bar for “does the output actually work?”** Heuristic and LLM scores are supplementary.
 4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`.
 
 ---
 
 ## Tracker
 
-| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
-|----|---------------|-----------------------------------|-----------|-----------|----------------------------|-------|
-| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | |
-| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | |
-| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | |
-| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | |
-| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | |
-| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | |
-| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | |
-| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | |
-| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | |
-| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
+| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | ---------------------------- | ----- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               |                                  |           |           |                              |       |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                 |                                  |           |           |                              |       |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                  |           |           |                              |       |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                  |           |           |                              |       |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                  |           |           |                              |       |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                  |           |           |                              |       |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                  |           |           |                              |       |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                  |           |           |                              |       |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                  |           |           |                              |       |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)         |                                  |           |           |                              |       |
+
 
 ### Column hints
 
-| Column | Meaning |
-|--------|---------|
-| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` |
-| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) |
-| **LLM judge** | `llm-score.json` → `overall_transcript_pass` |
-| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` |
+
+| Column            | Meaning                                                                                                    |
+| ----------------- | ---------------------------------------------------------------------------------------------------------- |
+| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json`                      |
+| **Heuristic**     | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`)                                     |
+| **LLM judge**     | `llm-score.json` → `overall_transcript_pass`                                                               |
+| **Execution**     | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` |
+
 
 ### Status legend (suggested)
 
@@ -50,4 +49,10 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
 
 ---
 
-Full harness docs: [README.md](README.md).
+## Action items
+
+Add bullet or table rows here when something should be tracked across runs (docs gaps, harness changes, etc.). *None recorded yet for this pass.*
+
+---
+
+Full harness docs: [README.md](README.md).
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md
index b7a491861..ad48add99 100644
--- a/docs/agent-evaluation/scenarios/01-basics-curl.md
+++ b/docs/agent-evaluation/scenarios/01-basics-curl.md
@@ -11,7 +11,7 @@ Agent should produce a **minimal shell + curl** flow against the **managed** API
 
 ## Automated eval (Claude Agent SDK)
 
-The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. Save the shell script there with **Write** (e.g. `outpost-quickstart.sh`), not only as a fenced block in chat, so the run folder is reviewable on disk.
+The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/<stamp>-scenario-NN/`. **`Write` / `Edit` / `NotebookEdit` targets are restricted** to that directory (paths that resolve elsewhere are denied). Save the script as e.g. **`outpost-quickstart.sh`** in that folder (relative path or a path under the run dir), not under `examples/` or the repo root.
 
 ## Conversation script
 
@@ -21,15 +21,15 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 ### Turn 1 — User
 
-> I only want the basics using **curl** against the managed API. No SDK. Give me a **single shell script** I can save and run (e.g. `bash outpost-quickstart.sh`) that: creates a tenant, adds a webhook destination for my test URL, and publishes one event. Use the topic from the prompt. Use `OUTPOST_API_KEY` from the environment (document that I should `export` it or load `.env`). If you can’t provide a file, paste one script block I can save as `.sh`.
+> I want option 1 — **the simplest thing possible**. I don’t need a framework or SDK; just the smallest path to see tenant → webhook → publish working.
 
-### Turn 2 — User (optional probe)
+### Turn 2 — User (optional)
 
-> Show me how to verify delivery after I run those commands.
+> How do I know the event actually reached my test URL?
 
 ## Success criteria
 
-**Measurement:** Heuristic rubric `scoreScenario01` in [`../src/score-transcript.ts`](../src/score-transcript.ts) (assistant text + tool-written script content). LLM judge: `npm run score -- --run <run-dir> --llm`. Execution row remains manual.
+**Measurement:** Heuristic rubric `scoreScenario01` in [`../src/score-transcript.ts`](../src/score-transcript.ts) (assistant text + tool-written script content). LLM judge: `npm run score -- --run <run-dir> --llm`. Execution row remains manual.
 
 - Uses managed base URL `https://api.outpost.hookdeck.com/2025-07-01` (or explicit `OUTPOST_API_BASE_URL`), **not** `localhost:3333/api/v1`, unless the user asked for self-hosted.
 - Tenant: `PUT .../tenants/{tenant_id}` with `Authorization: Bearer` (or documents equivalent).
@@ -44,5 +44,4 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 - Wrong path (`PUT /{tenant}` without `/tenants/`).
 - Mixing self-hosted base path with managed host.
-- Skipping topic alignment with dashboard configuration.
-
+- Skipping topic alignment with dashboard configuration.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md
index 9a2fc40a7..a403bab6d 100644
--- a/docs/agent-evaluation/scenarios/02-basics-typescript.md
+++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md
@@ -21,15 +21,15 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 ### Turn 1 — User
 
-> Option 1 — try it out. Use **TypeScript** only: one script file, use `@hookdeck/outpost-sdk`, read `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` from the environment. Create tenant, webhook destination for the topic in the prompt, publish one test event, print the event id.
+> Option 1. Let’s do it in **TypeScript**.
 
 ### Turn 2 — User (optional)
 
-> How do I run it?
+> How do I run it locally?
 
 ## Success criteria
 
-**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([README.md § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
 
 - Depends on `@hookdeck/outpost-sdk`; uses `Outpost` client with `apiKey` from `process.env.OUTPOST_API_KEY`.
 - Calls `tenants.upsert`, `destinations.create` (webhook), `publish.event`.
diff --git a/docs/agent-evaluation/scenarios/03-basics-python.md b/docs/agent-evaluation/scenarios/03-basics-python.md
index 2d9ecb88b..880b3c5e1 100644
--- a/docs/agent-evaluation/scenarios/03-basics-python.md
+++ b/docs/agent-evaluation/scenarios/03-basics-python.md
@@ -17,27 +17,27 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
 
 ### Turn 1 — User
 
-> Option 1 — try it out. Use **Python** with `outpost_sdk`. Read credentials from the environment. Same flow: tenant, webhook destination, one publish, print event id.
+> Option 1. I’d like to use **Python**.
 
 ### Turn 2 — User (optional)
 
-> Keep it to one file I can run with `python`.
+> One file I can run with `python` is enough.
 
 ## Success criteria
 
-**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([README.md § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
 
-- [ ] `from outpost_sdk import Outpost` (or equivalent documented import path).
-- [ ] `Outpost(api_key=..., server_url=...)` with optional base URL from env.
-- [ ] `tenants.upsert`, `destinations.create`, `publish.event` with correct shapes.
-- [ ] Topic aligned with prompt; webhook URL from env.
-- [ ] No secrets in file.
-- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.*
+- `from outpost_sdk import Outpost` (or equivalent documented import path).
+- `Outpost(api_key=..., server_url=...)` with optional base URL from env.
+- `tenants.upsert`, `destinations.create`, `publish.event` as in the **Python quickstart** (including `request=` for publish where the SDK requires it).
+- Topic aligned with prompt; webhook URL from env.
+- No secrets in file.
+- **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.*
 
 ## Failure modes to note
 
-- Using `requests` only when user asked for the official SDK.
+- Using `requests` only when user asked for the official SDK.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/04-basics-go.md b/docs/agent-evaluation/scenarios/04-basics-go.md
index 29622c6a1..7d575c62f 100644
--- a/docs/agent-evaluation/scenarios/04-basics-go.md
+++ b/docs/agent-evaluation/scenarios/04-basics-go.md
@@ -21,7 +21,11 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 ### Turn 1 — User
 
-> Option 1 — try it out. Use **Go** and the official Outpost Go SDK. Environment variables for API key and test webhook URL. Tenant upsert, webhook destination, publish one event, print ids.
+> Option 1. I want to try it in **Go**.
+
+### Turn 2 — User (optional)
+
+> Keep the program small — one `main` or a couple of files is fine.
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md
index 3e5ffa10b..f44061775 100644
--- a/docs/agent-evaluation/scenarios/05-app-nextjs.md
+++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md
@@ -26,17 +26,17 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 ### Turn 1 — User
 
-> Option 2 — build a minimal example. I want **Next.js**. Very small UI: field for webhook URL, button to create the webhook destination for tenant `demo_tenant` (or let me edit tenant id in the UI), and a button to send one test event on topic `user.created` (or the first topic from the prompt). Use the Outpost TypeScript SDK on the server only.
+> Option 2 — a **tiny demo app**. Can we use **Next.js**? I want a minimal page: somewhere to put a webhook URL, register it for a customer, and a way to fire one test event.
 
 ### Turn 2 — User (optional)
 
-> Add a short README with env vars and `npm run dev` steps.
+> Can you add a short README — what goes in `.env` and how I start the dev server?
 
 ### Turn 3 — User (stress)
 
-> I do not have a public URL yet — what should I use for the webhook URL field?
+> I don’t have a public webhook URL yet. What should I put in that field?
 
-Expected: agent suggests Hookdeck Console Source URL or similar, aligned with quickstarts.
+*Expected:* agent points to a Hookdeck Console Source URL (or equivalent) consistent with the quickstarts and Turn 0 test destination.
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/06-app-fastapi.md b/docs/agent-evaluation/scenarios/06-app-fastapi.md
index 1f00b5f68..704415e33 100644
--- a/docs/agent-evaluation/scenarios/06-app-fastapi.md
+++ b/docs/agent-evaluation/scenarios/06-app-fastapi.md
@@ -24,11 +24,11 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 ### Turn 1 — User
 
-> Option 2 — minimal example with **FastAPI**. Single small app: HTML page with webhook URL field, button to register destination for tenant `demo_tenant`, button to publish one test event. Use `outpost_sdk` only on the server. Keep it to a few files.
+> Option 2 — **FastAPI**, same idea as a tiny demo: simple HTML, register a webhook for a tenant, button to send one test event. Keep the codebase small.
 
 ### Turn 2 — User (optional)
 
-> Document env vars and `uvicorn` command in README.
+> README with env vars and how to run it would help.
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/07-app-go-http.md b/docs/agent-evaluation/scenarios/07-app-go-http.md
index cfdd594a9..5dfdd85e2 100644
--- a/docs/agent-evaluation/scenarios/07-app-go-http.md
+++ b/docs/agent-evaluation/scenarios/07-app-go-http.md
@@ -23,11 +23,11 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 ### Turn 1 — User
 
-> Option 2 — minimal example in **Go**. Standard library HTTP server, simple HTML page: register webhook destination for a fixed tenant id, then button to publish one event. Use the official Go SDK for Outpost calls. API key from environment.
+> Option 2 — **Go** with the standard library: small HTTP server, basic HTML, register a webhook and publish one test event.
 
 ### Turn 2 — User (optional)
 
-> Keep everything in `main.go` if reasonable, or split `handlers.go` — your choice, but stay small.
+> One or two files is fine if you can keep it readable.
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 56cd9c9b0..fc1594ff0 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -23,18 +23,13 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 ### Turn 1 — User
 
-> **Option 3 — integrate with an existing app.** Clone **`https://github.com/leerob/next-saas-starter`** into this workspace (subdirectory is fine), install dependencies per its README, and get it in a state where we could run it locally.
+> Option 3 — I’m not starting from scratch. Please clone **`https://github.com/leerob/next-saas-starter`** here, install it, and get it runnable. Then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
 >
-> Then integrate **Hookdeck Outpost** for **outbound webhooks** to our customers:
->
-> 1. Use the official **`@hookdeck/outpost-sdk`** on the **server only** (API routes, server actions, or equivalent — never expose `OUTPOST_API_KEY` to the browser).
-> 2. Pick **one meaningful domain event** in this starter (e.g. team or member lifecycle — choose something that actually exists in the code) and **`publish`** an event to Outpost with a **topic** from the Turn 0 prompt (or document the topic constant).
-> 3. Document how an operator registers a **webhook destination** per **tenant/customer** (REST flow or small admin UI is fine). Use the test destination URL from Turn 0 where helpful.
-> 4. Add or update a **README section** listing required env vars (`OUTPOST_API_KEY`, optional base URL, anything else you add).
+> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
 
 ### Turn 2 — User (optional)
 
-> Where should we call **`tenants.upsert`** relative to our own tenant/customer model?
+> When should we create or sync the Outpost **tenant** with our own customer or team model?
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index 72c63ef86..dd8270921 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -22,18 +22,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> **Option 3 — integrate with an existing app.** Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`** into this workspace, install dependencies per its README (venv + `pip install -r requirements.txt` or `uv` as you prefer).
+> Option 3 — integrate Outpost into a real codebase. Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`**, set it up from its README, then add **Hookdeck Outpost** for customer webhooks.
 >
-> Integrate **Hookdeck Outpost** for **outbound webhooks**:
->
-> 1. Use **`outpost_sdk`** only in **server** code (routers, services — never embed the API key in templates or static JS).
-> 2. Hook **`publish.event`** (and tenant/destination setup as needed) to **one real domain event** in this template (e.g. org membership or user lifecycle — pick something that exists in the codebase).
-> 3. Document how operators register **webhook destinations** per tenant/customer and which **topic** you publish on (use topics from Turn 0 when possible).
-> 4. Document **`OUTPOST_API_KEY`** and **`uvicorn`** (or equivalent) run instructions in README.
+> Hook publishing to **one real event** that already exists in the app (orgs, users, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
 
 ### Turn 2 — User (optional)
 
-> Should **`tenants.upsert`** run at org creation or lazily on first publish?
+> Should we create the Outpost tenant when the org is created, or lazily on first publish?
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index c8f91c79e..1408caa57 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -22,18 +22,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> **Option 3 — integrate with an existing app.** Clone **`https://github.com/devinterface/startersaas-go-api`** into this workspace and make it build (`go build` / `go test` ./… as appropriate per the repo).
+> Option 3 — existing Go API. Clone **`https://github.com/devinterface/startersaas-go-api`**, get it building, then add **Hookdeck Outpost** for outbound webhooks.
 >
-> Add **Hookdeck Outpost** for **outbound webhooks** to customers:
->
-> 1. Use the official **Go SDK** (`github.com/hookdeck/outpost/sdks/outpost-go` or current module path from docs).
-> 2. **`OUTPOST_API_KEY`** from environment only.
-> 3. On **one real domain event** in this API (e.g. user registration, subscription, or another existing handler), call **`Publish.Event`** (and **`Tenants` / `Destinations`** as needed) with a **topic** from Turn 0.
-> 4. Document how to register **webhook destinations** per tenant and which env vars to set. Mention the Hookdeck test destination URL from Turn 0 where useful.
+> Use **one real handler** as the publish trigger (signup, billing, etc.). API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
 
 ### Turn 2 — User (optional)
 
-> Show where **`CreateDestinationCreateWebhook`** fits if we let each customer paste a webhook URL in a settings API.
+> If customers submit a webhook URL in a settings endpoint, where does destination creation live?
 
 ## Success criteria
 
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 72464f3a2..25c67f459 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -8,12 +8,13 @@
  */
 
 import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
-import { join, dirname } from "node:path";
+import { dirname, join, resolve, sep } from "node:path";
 import { fileURLToPath } from "node:url";
 import { parseArgs } from "node:util";
 import dotenv from "dotenv";
 import {
   query,
+  type HookInput,
   type Options,
   type SDKMessage,
   type SDKSystemMessage,
@@ -68,18 +69,38 @@ function envFlagTruthy(v: string | undefined): boolean {
 /** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */
 function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string {
   const f = (...parts: string[]) => join(repoRoot, ...parts);
+  const languageSdkBlock = `### Language → SDK vs HTTP
+
+Map what the user says (they rarely name packages):
+
+- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. Publish success is **HTTP 202**; see curl quickstart for script portability (avoid GNU-only \`head -n -1\`).
+- **TypeScript** or **Node** → TypeScript quickstart + \`@hookdeck/outpost-sdk\` as in that doc.
+- **Python** → Python quickstart + \`outpost_sdk\`; \`publish.event(request={{...}})\` as in that doc — not TS-style kwargs.
+- **Go** → Go quickstart + official Go SDK as in that doc.
+- Explicit **curl** / **HTTP only** / **REST** → curl quickstart + OpenAPI.
+
+**Small app (option 2):** Next.js → TS SDK server-side; FastAPI → Python SDK; Go net/http → Go SDK — use that language’s quickstart for Outpost shapes.
+
+**Existing app (option 3):** Official SDK for the repo’s language (or REST if they refuse SDK).
+
+Do **not** mix TS call shapes into Python.`;
+
   let block = `### Documentation (local repository — unpublished)
 
 Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). Paths are absolute from the repository root:
 
-- Getting started (curl): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
-- TypeScript quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
-- Python quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
-- Go quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\`
+Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy).
+
+- Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
+- TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
+- Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
+- Go quickstart (Go SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\`
 - API reference (human-oriented pages under): \`${f("docs/pages/references/")}\`
 - OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\`
 - Destination types: \`${f("docs/pages/destinations/")}\`
-- SDKs overview: \`${f("docs/pages/sdks.mdx")}\``;
+- SDKs overview (TS-heavy): \`${f("docs/pages/sdks.mdx")}\` — prefer the language quickstart over this for Python/Go/TS code.
+
+${languageSdkBlock}`;
   if (llmsFullUrl) {
     block += `\n- Full docs bundle: ${llmsFullUrl}`;
   }
@@ -295,6 +316,48 @@ async function runOneScenario(
   };
 }
 
+/** True if resolved `filePath` is `runDir` or a path inside it (never outside). */
+function filePathIsInsideRunDir(runDir: string, filePath: string): boolean {
+  const root = resolve(runDir);
+  const target = resolve(filePath);
+  if (target === root) return true;
+  const prefix = root.endsWith(sep) ? root : root + sep;
+  return target.startsWith(prefix);
+}
+
+function toolInputFilePath(toolName: string, toolInput: unknown): string | undefined {
+  if (toolName !== "Write" && toolName !== "Edit" && toolName !== "NotebookEdit") {
+    return undefined;
+  }
+  if (typeof toolInput !== "object" || toolInput === null) return undefined;
+  const input = toolInput as Record<string, unknown>;
+  for (const k of ["file_path", "path", "notebook_path"] as const) {
+    const v = input[k];
+    if (typeof v === "string" && v.length > 0) return v;
+  }
+  return undefined;
+}
+
+/**
+ * PreToolUse hook: deny Write/Edit/NotebookEdit outside the run dir.
+ * `canUseTool` is not reliable under `permissionMode: dontAsk`; hooks return a `permissionDecision` instead.
+ */
+function createRunDirPreToolHook(runDir: string) {
+  return async (input: HookInput) => {
+    if (input.hook_event_name !== "PreToolUse") return {};
+    const candidate = toolInputFilePath(input.tool_name, input.tool_input);
+    if (!candidate) return {};
+    if (filePathIsInsideRunDir(runDir, candidate)) return {};
+    return {
+      hookSpecificOutput: {
+        hookEventName: "PreToolUse" as const,
+        permissionDecision: "deny" as const,
+        permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${runDir} (e.g. outpost-quickstart.sh). Refused: ${resolve(candidate)}`,
+      },
+    };
+  };
+}
+
 function defaultEvalTools(env: NodeJS.ProcessEnv): string {
   if (env.EVAL_TOOLS?.trim()) {
     return env.EVAL_TOOLS.trim();
@@ -319,14 +382,14 @@ function buildBaseOptions(agentWorkspaceCwd: string): Options {
     Options["permissionMode"]
   >;
 
-  const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "40");
+  const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "80");
   const persistSession = process.env.EVAL_PERSIST_SESSION !== "false";
 
   const o: Options = {
     cwd: agentWorkspaceCwd,
     allowedTools,
     permissionMode: mode,
-    maxTurns: Number.isFinite(maxTurns) ? maxTurns : 40,
+    maxTurns: Number.isFinite(maxTurns) ? maxTurns : 80,
     persistSession,
     env: {
       ...process.env,
@@ -334,6 +397,12 @@ function buildBaseOptions(agentWorkspaceCwd: string): Options {
     } as Record<string, string | undefined>,
   };
 
+  if (!envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD)) {
+    o.hooks = {
+      PreToolUse: [{ hooks: [createRunDirPreToolHook(agentWorkspaceCwd)] }],
+    };
+  }
+
   if (process.env.EVAL_MODEL?.trim()) {
     o.model = process.env.EVAL_MODEL.trim();
   }
@@ -385,9 +454,10 @@ Environment:
   EVAL_LLMS_FULL_URL    Optional (omit docs line if unset)
   EVAL_TOOLS            Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
   EVAL_MODEL            Optional
-  EVAL_MAX_TURNS        Optional (default: 40)
+  EVAL_MAX_TURNS        Optional (default: 80; npm/go mod installs can exceed 40)
   EVAL_PERMISSION_MODE  Optional (default: dontAsk)
   EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
+  EVAL_DISABLE_WORKSPACE_WRITE_GUARD  Set to 1 to allow Write/Edit outside the run dir (not recommended)
 
 Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets
   results/runs/<stamp>-scenario-NN/transcript.json
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 5ba55459b..ec7455243 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -200,8 +200,11 @@ function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta"
   });
 
   const afterPublish = t.split(/\/publish/i).pop() ?? t;
-  const wrongPayload = /"payload"\s*:/.test(afterPublish);
-  const hasData = /"data"\s*:/.test(afterPublish);
+  // Tool corpus JSON-stringifies Write bodies, so bash-escaped keys look like \"data\": not "data":
+  const wrongPayload =
+    /"payload"\s*:/.test(afterPublish) || /\\"payload\\"\s*:/.test(afterPublish);
+  const hasData =
+    /"data"\s*:/.test(afterPublish) || /\\"data\\"\s*:/.test(afterPublish);
   checks.push({
     id: "publish_body_data_not_payload",
     pass: publish && !wrongPayload && hasData,
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index bba3b53d7..8e6afe122 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -29,26 +29,44 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 
 ### Documentation
 
-- Getting started (curl): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl
-- TypeScript quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript
-- Python quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-python
-- Go quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
+- Getting started (curl / HTTP only, no SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl
+- TypeScript quickstart (TypeScript SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript
+- Python quickstart (Python SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-python
+- Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
 - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
-- API reference: {{DOCS_URL}}/api
+- API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
 - Destination types: {{DOCS_URL}}/destinations
-- SDK documentation: {{DOCS_URL}}/sdks
+- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+
+### Language → SDK vs HTTP
+
+Operators rarely name packages or SDK details. **You** map what they say to the right doc and dependency:
+
+**“Try it out” — interpret their words**
+
+- **Simplest / fastest / minimal / least setup / “just show me” / no framework** (and they do **not** name TypeScript, Python, or Go) → treat as **curl**: **curl quickstart** + **OpenAPI** for exact JSON. One runnable shell script is ideal. **No SDK.**
+- **TypeScript** or **Node** → **TypeScript quickstart**; use the **official TypeScript SDK** (`@hookdeck/outpost-sdk`) exactly as that quickstart shows. The user does not need to say “SDK.”
+- **Python** → **Python quickstart**; use **`outpost_sdk`** as that quickstart shows (e.g. Python `publish.event` uses `request={{...}}` — **not** TypeScript-style kwargs on the method).
+- **Go** → **Go quickstart**; use the **official Go SDK** as that quickstart shows.
+- They explicitly want **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI.
+
+Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python).
+
+**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes.
+
+**Option 3 (existing app)** — Use the **official SDK for the repo’s language** on the server (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for shapes; integrate on **real** domain paths, not throwaway demos.
 
 ### What to do
 
-Ask the user which of the following they want:
+Guide the conversation, then act:
 
-1. **Try it out** — Create a minimal script that runs through the full flow: create a tenant, add a webhook destination, publish a test event. Ask which language they prefer (TypeScript, Python, Go, or curl) and follow the matching quickstart doc.
+1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
 
-2. **Build a minimal example** — Scaffold a small app with a simple UI that demonstrates tenant creation, destination management, and event publishing. Ask which framework they prefer.
+2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only.
 
-3. **Integrate with an existing app** — Inspect the codebase for language and framework, then integrate Outpost: add the SDK (or use REST), create tenants when customers onboard, and publish events at the right points in application logic.
+3. **Integrate with an existing app** — Clone or open their codebase; add Outpost per **Option 3** above; document env vars and operator steps.
 
-For all modes, read the relevant quickstart documentation before writing code.
+For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code.
 
 **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`).
 
diff --git a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx
index c7614b614..6194262f8 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx
@@ -83,6 +83,13 @@ curl --request POST "$OUTPOST_API_BASE_URL/publish" \
 
 A `202` response means the event was accepted for delivery.
 
+## Shell scripts: status codes and portability
+
+If you combine API response bodies with `curl --write-out '\n%{http_code}'`:
+
+- **Publish** success is **HTTP 202**, so conditional checks that accept only 200/201 will wrongly fail. Treat **202** as success.
+- **Portability:** GNU `head -n -1` (“all lines but the last”) is **not** available on macOS BSD `head`. Prefer splitting with **`sed '$d'`** (body) and **`tail -n 1`** (status), or another POSIX-friendly approach, so the same script runs on Linux and macOS.
+
 ## Verify delivery
 
 - In **Hookdeck Console**, inspect the connection or destination you used (for example the Source you created) and confirm the webhook request and payload look correct.
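
Note on the run-directory write guard added to `run-agent-eval.ts` in this patch: the containment check appends a path separator before the prefix comparison, which is what stops a sibling directory such as `run-01-backup` from passing as "inside" `run-01`. The following is a minimal standalone sketch of that check, not the harness code itself. The function name `isInsideDir` and all paths are hypothetical and chosen only for illustration.

```typescript
import { resolve, sep } from "node:path";

/** Returns true when `filePath` resolves to `dir` itself or to a path beneath it. */
function isInsideDir(dir: string, filePath: string): boolean {
  const root = resolve(dir);
  const target = resolve(filePath);
  if (target === root) return true;
  // Compare against "root + separator" so "/a/run-01-backup" is not treated as inside "/a/run-01".
  const prefix = root.endsWith(sep) ? root : root + sep;
  return target.startsWith(prefix);
}

// Hypothetical run directory, for illustration only.
const runDir = "/tmp/eval/run-01";

console.log(isInsideDir(runDir, "/tmp/eval/run-01/outpost-quickstart.sh")); // true
console.log(isInsideDir(runDir, "/tmp/eval/run-01-backup/script.sh")); // false (sibling with shared prefix)
console.log(isInsideDir(runDir, "/tmp/eval/run-01/../run-02/script.sh")); // false (".." is normalized first)
```

As in the patch, a relative `filePath` is resolved against the process working directory rather than against `dir`, so a relative path is only accepted when the process already runs inside the run directory.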

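Note on the `score-transcript.ts` change in the patch above: once a script body has been passed through `JSON.stringify`, every `"` becomes `\"`, so a pattern written for `"data":` no longer matches. The sketch below illustrates that effect under the assumption of a single level of escaping. The helper name `mentionsJsonKey` is hypothetical, and it assumes keys are plain words with no regex metacharacters.

```typescript
/**
 * Matches `"key":` in raw text and `\"key\":` in text that has been
 * JSON-stringified once. Assumes `key` contains no regex metacharacters.
 */
function mentionsJsonKey(corpus: string, key: string): boolean {
  const raw = new RegExp(`"${key}"\\s*:`);
  const escaped = new RegExp(`\\\\"${key}\\\\"\\s*:`);
  return raw.test(corpus) || escaped.test(corpus);
}

// Hypothetical publish body, for illustration only.
const body = '{"topic":"user.created","data":{"id":1}}';
const stringified = JSON.stringify(body); // quotes inside are now \"

console.log(/"data"\s*:/.test(stringified)); // false: the raw-only pattern misses the escaped key
console.log(mentionsJsonKey(stringified, "data")); // true
console.log(mentionsJsonKey(stringified, "payload")); // false
```

Text that has been escaped more than once (for example a bash-escaped body that is then stringified) carries extra backslashes and would need a looser pattern than this sketch uses.
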
From 3bc54696115c81416fbd892cc1b7ca7a78f3f0bd Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:02:51 +0100
Subject: [PATCH 04/47] docs(agent-eval): record fresh scenario 01 eval run in
 tracker

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 543ef09a9..e02009ff5 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -19,17 +19,17 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 
 
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
-| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | ---------------------------- | ----- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               |                                  |           |           |                              |       |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                 |                                  |           |           |                              |       |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                  |           |           |                              |       |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                  |           |           |                              |       |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                  |           |           |                              |       |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                  |           |           |                              |       |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                  |           |           |                              |       |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                  |           |           |                              |       |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                  |           |           |                              |       |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)         |                                  |           |           |                              |       |
+| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | -------------------------- | ----- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T14-58-40-850Z-scenario-01` | Pass (7/7) | Pass | — | Eval exit 0. Artifact: **`try-it-out.sh`**. **Execution** (manual): set `OUTPOST_API_KEY`, run script; uses `curl --fail-with-body` (2xx includes **202** on publish). |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   |                                  |           |           |                            |       |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                  |           |           |                            |       |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                  |           |           |                            |       |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                  |           |           |                            |       |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                  |           |           |                            |       |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                  |           |           |                            |       |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                  |           |           |                            |       |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                  |           |           |                            |       |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                  |           |           |                            |       |
 
 
 ### Column hints

From 241dae68334115a3478db5f352d6bb94bcacd6e6 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:06:54 +0100
Subject: [PATCH 05/47] fix(agent-eval): remove harness-only 202/head hints
 from local docs block

- Point local EVAL_LOCAL_DOCS guidance at full curl quickstart instead
- Reword scenario 01 execution criteria to reference quickstart/OpenAPI
---
 docs/agent-evaluation/scenarios/01-basics-curl.md | 2 +-
 docs/agent-evaluation/src/run-agent-eval.ts       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md
index ad48add99..6aa12b215 100644
--- a/docs/agent-evaluation/scenarios/01-basics-curl.md
+++ b/docs/agent-evaluation/scenarios/01-basics-curl.md
@@ -38,7 +38,7 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 - Delivers as one **shell script** (or one fenced `bash` block meant to be saved as `.sh`), not only three unrelated snippets without a shebang/variables.
 - Does **not** embed a pasted API key in the reply.
 - Verification mentions Hookdeck Console / dashboard logs if Turn 2 was asked.
-- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect **2xx** on tenant upsert and destination create, **202** (or documented success) on publish, and a visible delivery to the test webhook URL (Hookdeck Console / project logs, or `GET .../attempts` as appropriate). *Skip only if you are doing transcript-only triage.*
+- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect success per the **curl quickstart** and **OpenAPI** (tenant and destination typically 2xx; publish uses the documented success status—often **202**). Confirm delivery via Hookdeck Console / project logs (or `GET .../attempts` as appropriate). *Skip only if you are doing transcript-only triage.*
 
 ## Failure modes to note
 
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 25c67f459..87abd3b78 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -73,7 +73,7 @@ function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefin
 
 Map what the user says (they rarely name packages):
 
-- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. Publish success is **HTTP 202**; see curl quickstart for script portability (avoid GNU-only \`head -n -1\`).
+- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. Read the **entire** curl quickstart (it covers REST responses and any shell portability notes for scripts).
 - **TypeScript** or **Node** → TypeScript quickstart + \`@hookdeck/outpost-sdk\` as in that doc.
 - **Python** → Python quickstart + \`outpost_sdk\`; \`publish.event(request={{...}})\` as in that doc — not TS-style kwargs.
 - **Go** → Go quickstart + official Go SDK as in that doc.

From 6b1fd4b13ee282043559ce177b8be8b60bb1edc0 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:11:44 +0100
Subject: [PATCH 06/47] docs(agent-eval): update scenario 01 tracker after
 re-run and execution pass

---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index e02009ff5..bf45cc4ee 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
-| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | -------------------------- | ----- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T14-58-40-850Z-scenario-01` | Pass (7/7) | Pass | — | Eval exit 0. Artifact: **`try-it-out.sh`**. **Execution** (manual): set `OUTPOST_API_KEY`, run script; uses `curl --fail-with-body` (2xx includes **202** on publish). |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   |                                  |           |           |                            |       |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                  |           |           |                            |       |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                  |           |           |                            |       |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                  |           |           |                            |       |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                  |           |           |                            |       |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                  |           |           |                            |       |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                  |           |           |                            |       |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                  |           |           |                            |       |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                  |           |           |                            |       |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                  |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   |                                        |            |           |                            |                                                                                                                                                                        |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                        |            |           |                            |                                                                                                                                                                        |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                        |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                        |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                        |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                        |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |            |           |                            |                                                                                                                                                                        |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |            |           |                            |                                                                                                                                                                        |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |            |           |                            |                                                                                                                                                                        |
 
 
 ### Column hints

From 556b77f62c65a00c91348bf97f384f1183dd13cc Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:32:08 +0100
Subject: [PATCH 07/47] docs(agent-eval): record scenario 02 run and execution
 pass

---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index bf45cc4ee..b9f0f5156 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -21,7 +21,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                  |
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   |                                        |            |           |                            |                                                                                                                                                                        |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
 | 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                        |            |           |                            |                                                                                                                                                                        |
 | 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                        |
 | 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                        |

From 46e6dcc45b1e5037a6bd44fb14403d30f306f605 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:34:09 +0100
Subject: [PATCH 08/47] docs(agent-eval): fix tracker table formatting and
 artifact markdown

---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index b9f0f5156..a05e6b5f3 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                  |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                      |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
 | 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                        |            |           |                            |                                                                                                                                                                        |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                        |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                        |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                        |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                        |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |            |           |                            |                                                                                                                                                                        |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |            |           |                            |                                                                                                                                                                        |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |            |           |                            |                                                                                                                                                                        |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                        |            |           |                            |                                                                                                                                                                                            |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                                            |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                                            |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |            |           |                            |                                                                                                                                                                                            |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |            |           |                            |                                                                                                                                                                                            |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |            |           |                            |                                                                                                                                                                                            |
 
 
 ### Column hints

From f57b59db4dae8f0e54eea51073e5bd1bdbe1c10e Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 16:48:01 +0100
Subject: [PATCH 09/47] docs(agent-eval): record scenario 03 run and execution
 pass

---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index a05e6b5f3..69fd31de0 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -22,7 +22,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
 | 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           |                                        |            |           |                            |                                                                                                                                                                                            |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
 | 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                                            |
 | 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                                            |
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |

From 803b51c0413b8a37e7c5cf1795bfaf061ba59a31 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 17:04:08 +0100
Subject: [PATCH 10/47] docs(agent-eval): record scenario 04 run and execution
 pass

---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 69fd31de0..cedd90ff6 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -23,7 +23,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
 | 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
 | 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   |                                        |            |           |                            |                                                                                                                                                                                            |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
 | 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                                            |
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |

From f600652b26c1e6bd74708cce1a6f8197e96e79f9 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 17:22:11 +0100
Subject: [PATCH 11/47] docs(agent-eval): record scenario 05 run and execution
 pass

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index cedd90ff6..576da3add 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -20,11 +20,11 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                      |
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 |                                        |            |           |                            |                                                                                                                                                                                            |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.  |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go**`, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                    |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10) | Pass    | Pass                       | `EVAL_LOCAL_DOCS=1`. App in **`outpost-nextjs-demo/`** (`@hookdeck/outpost-sdk` npm). `npm run build` OK. Dev on :3010: `POST /api/register` and `POST /api/publish` **200** with `docs/agent-evaluation/.env`. Next.js workspace-root warning (nested lockfiles). |
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
 | 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |            |           |                            |                                                                                                                                                                                            |

From e1e5154ed85731ce5dbed20ea21fdeaa83bd7053 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Wed, 8 Apr 2026 20:41:00 +0100
Subject: [PATCH 12/47] docs: Outpost mental model, UI guide agnostic URLs,
 agent prompt links

- Expand concepts with SaaS/platform flow; refine building-your-own-ui (API root, paths, no localhost:3333 in examples)
- Agent prompt: link concepts, UI guide, topics; tighten option-2 guidance
- Eval harness: local docs list includes concepts, building-your-own-ui, topics
- SCENARIO-RUN-TRACKER: scenario 05 assessment for 17-21-22 run, heuristic notes
- Minor scenario 05 doc tweak

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 63 +++++++++++++++----
 .../scenarios/05-app-nextjs.md                |  3 +-
 docs/agent-evaluation/src/run-agent-eval.ts   |  3 +
 docs/pages/concepts.mdx                       | 33 +++++++---
 docs/pages/guides/building-your-own-ui.mdx    | 51 +++++++++++----
 .../hookdeck-outpost-agent-prompt.mdx         |  9 ++-
 6 files changed, 125 insertions(+), 37 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 576da3add..e9b506e5c 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic  | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                      |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.     |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.  |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go**`, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                    |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10) | Pass    | Pass                       | `EVAL_LOCAL_DOCS=1`. App in **`outpost-nextjs-demo/`** (`@hookdeck/outpost-sdk` npm). `npm run build` OK. Dev on :3010: `POST /api/register` and `POST /api/publish` **200** with `docs/agent-evaluation/.env`. Next.js workspace-root warning (nested lockfiles). |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |            |           |                            |                                                                                                                                                                                            |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |            |           |                            |                                                                                                                                                                                            |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |            |           |                            |                                                                                                                                                                                            |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |            |           |                            |                                                                                                                                                                                            |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                        |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                       |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                   |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                    |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                      |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | **`nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
 
 
 ### Column hints
@@ -49,9 +49,46 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
 
 ---
 
+## Scenario 05 — assessment (`2026-04-08T17-21-22-170Z`)
+
+**Status:** This is the **current focus run** for scenario 05 reviews (not `2026-04-08T16-12-10-708Z`).
+
+
+| Dimension         | Result                                                                                                                                                                                                                                                                                                                            |
+| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Run directory** | `results/runs/2026-04-08T17-21-22-170Z-scenario-05/`                                                                                                                                                                                                                                                                              |
+| **Artifact**      | `nextjs-webhook-demo/` — Next.js App Router, `@hookdeck/outpost-sdk`, Outpost calls **only** in `app/api/**/route.ts` (managed API via SDK default unless `OUTPOST_API_BASE_URL` is set).                                                                                                                                         |
+| **Heuristic**     | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../pages/guides/building-your-own-ui.mdx) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. |
+| **LLM judge**     | **Pass** — matches scenario 05 success criteria (Next.js structure, server-side SDK, distinct destination + publish UI, tenant/topic handling, README env, managed default).                                                                                                                                                      |
+| **Execution**     | **Pass** (re-checked): `npm run build` in `nextjs-webhook-demo/`; `npm run dev` with `docs/agent-evaluation/.env`; `POST /api/destinations` → **201**, `POST /api/publish` → **200**.                                                                                                                                             |
+
+
+**What the app demonstrates (UX / model):**
+
+1. **Tenant** — Editable tenant id; copy states destinations and publishes are scoped to it.
+2. **Register webhook destination** — URL field + **topic checkboxes** populated from **`GET /api/topics`** (server lists topics from Outpost); **`POST /api/destinations`** upserts tenant and creates webhook destination for selected topics.
+3. **Destinations list** — **`GET /api/destinations?tenantId=`** table (type, target, topics) with refresh — matches “tenant → many destinations” mental model.
+4. **Publish test event** — Separate action; **`POST /api/publish`** with chosen topic; UI notes fan-out to matching destinations.
+
+**Comparison — older run `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`):** Simpler two-route app (`/api/register`, `/api/publish`), **fixed topic** in routes, **no** topics or destinations list APIs, **10/10** heuristic (no offending doc fragment in corpus). Useful as a minimal baseline; **17-21-22** is the richer assessment target.
+
+---
+
+## Scenario 05 heuristic — `managed_base_not_selfhosted`
+
+Scenario 05 includes a regex check (`managed_base_not_selfhosted`) in [`src/score-transcript.ts`](src/score-transcript.ts) (`scoreScenario05`). It looks at the **whole scoring corpus**: assistant-visible text **plus** content that ended up in the transcript from tools (e.g. **Read** of a doc file), not just files in the run folder.
+
+- It fails if the corpus contains a **self-hosted** default API path: specifically the literal substring `localhost:3333/api/v1` (Outpost’s common local dev URL), or a similar `localhost:<port>/api/v1` pattern, unless `OUTPOST_API_BASE_URL` also appears (see code for the exact conditions).
+- **Historical cause:** Older [Building your own UI](../pages/guides/building-your-own-ui.mdx) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost.
+- **Doc update:** `docs/pages/guides/building-your-own-ui.mdx` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus.
+- **Run `2026-04-08T16-12-10-708Z`:** heuristic **10/10**, `overallTranscriptPass: true`.
+- **Run `2026-04-08T17-21-22-170Z`:** heuristic **9/10**, `overallTranscriptPass: false` — failed `managed_base_not_selfhosted`; LLM judge still **passed**; transcript included **Read** of the **previous** `building-your-own-ui.mdx` with `localhost:3333/api/v1`.
+
+**Possible follow-ups:** narrow the heuristic to tool-written files under the run workspace only, or exclude content read from known doc paths from the corpus that this check scans.
+
 ## Action items
 
-Add bullet or table rows here when something should be tracked across runs (docs gaps, harness changes, etc.). *None recorded yet for this pass.*
+- Scenario 05: optionally re-run eval after the UI guide rewrite to confirm `managed_base_not_selfhosted` no longer false-positives on that doc **Read**; then consider whether the heuristic can be narrowed (see **§ Scenario 05 heuristic** above).
 
 ---
 
diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md
index f44061775..bc4aca4db 100644
--- a/docs/agent-evaluation/scenarios/05-app-nextjs.md
+++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md
@@ -54,5 +54,4 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag
 
 - Calling Outpost directly from browser-side code with embedded key.
 - Only publishing without a UI path to register the destination first.
-- Hard-coding localhost Outpost without user request.
-
+- Hard-coding localhost Outpost without user request.
\ No newline at end of file
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 87abd3b78..bc7629f53 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -91,6 +91,9 @@ Do **not** rely on live public documentation URLs for this session. Read these f
 
 Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy).
 
+- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/pages/concepts.mdx")}\`
+- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/pages/guides/building-your-own-ui.mdx")}\`
+- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/pages/features/topics.mdx")}\`
 - Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
 - TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
 - Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
diff --git a/docs/pages/concepts.mdx b/docs/pages/concepts.mdx
index 841f64249..a74e927bb 100644
--- a/docs/pages/concepts.mdx
+++ b/docs/pages/concepts.mdx
@@ -2,14 +2,31 @@
 title: "Outpost Concepts"
 ---
 
+## How this fits your product
+
+If you run a **SaaS**, **platform**, or **API product** and want each of **your customers** to receive webhooks or other event destinations, Outpost gives you a **multi-tenant** control plane for that.
+
+At a high level, the same mental model as a single-tenant webhook product still applies: something happens in your system (**event**), it belongs to a category (**topic**), and the consumer cares about **where** it should be delivered (**URL**, queue, etc.). Outpost adds one layer: those subscriptions live **per customer** in your product, which maps to a **tenant** in Outpost.
+
+**Typical flow:**
+
+1. **Map your customer to a tenant** — Each organization, team, or account in your app should have a stable **tenant id** in Outpost (often the same id you already use internally). Create or upsert that tenant when the customer is ready to use outbound events (onboarding, first visit to integrations, etc.).
+2. **Each tenant has zero or more destinations** — A **destination** is a concrete subscription: it combines a **destination type** (webhook, SQS, Hookdeck, …), one or more **topics** the customer wants to receive, and **type-specific configuration** (for a webhook, the HTTPS **endpoint URL** and signing secret; for a queue, the queue identifier; and so on). One tenant may have several destinations (e.g. production vs staging endpoints, or different systems).
+3. **Your backend publishes events** — When something happens, your **server** calls the publish API (or SDK) with **`tenant_id`**, **`topic`**, and payload metadata. Outpost does **not** infer the tenant from the browser; publishing uses your **platform** credentials and explicit tenant scope.
+4. **Outpost delivers to matching destinations** — For that tenant, every destination whose **topic subscription** includes the event’s topic gets a delivery attempt. A single publish can fan out to **many** destinations or to **none** if no destination subscribes to that topic.
+
+**What to build in your UI (conceptually):** screens or flows scoped to the **current customer** (tenant): list their **destinations**, **create or edit** a destination (choose type → choose topics → enter URL or other config), and surfaces for **events and delivery attempts** when you want users to inspect what was sent and how delivery behaved. Your UI talks to Outpost **through your backend** (recommended) or via **per-tenant JWT**, never by embedding your platform API key in the browser. See the [Building your own UI](/docs/guides/building-your-own-ui) guide for screen-level structure and API patterns.
+
+For topic subscription behavior (wildcard `*`, multiple topics, fan-out), see [Topics](/docs/features/topics).
+
 ## Models
 
-- **Tenants**: A tenant represents a user/team/organization in your product.
-- **Destination Types**: The type of destination where events will be delivered. For example, webhook, Hookdeck, or AWS SQS.
-- **Destinations**: A destination is a specific instance of a destination type. For example, a webhook destination with a specific URL.
-- **Topics**: A topic is a way to categorize events and is a common concept found in Pub/Sub messaging. For example, a `user.created` event might be categorized under the user topic.
-- **Events**: An event is a piece of data that represents an action that occurred in your system. For example, a user signed up or a payment was processed.
-- **Delivery Attempts**: A delivery attempt represents the result of an attempt to deliver an event to a destination.
+- **Tenants**: A tenant represents a user, team, or organization **in your product**—the customer who owns their own destinations and receives their own deliveries.
+- **Destination types**: The kind of endpoint where events are delivered (webhook, Hookdeck, AWS SQS, …). The set of types is configured on the Outpost deployment.
+- **Destinations**: A **subscription** for one tenant: an instance of a destination type plus **which topics** to receive and **where** to deliver (webhook URL, queue name, Hookdeck token, etc.). A tenant may have **many** destinations.
+- **Topics**: Labels for categories of events (e.g. `user.created`). Your platform configures which topics exist; destinations **subscribe** to one or more topics; publish calls include a **topic** so Outpost knows which subscriptions match.
+- **Events**: A record of something that happened in your system, published into Outpost with tenant, topic, and payload. Delivery attempts record how each destination received (or failed to receive) that event.
+- **Delivery attempts**: The outcome of trying to deliver one event to one destination (success, failure, retries, response metadata).
 
 ## Architecture
 
@@ -41,9 +58,9 @@ Required for log storage.
 - PostgreSQL
 - ClickHouse
 
-## Tenant Destination Types
+## Supported destination types
 
-Event destination types belonging to Outpost tenants where events are delivered.
+These are the **destination types** your tenants can choose when creating a destination (see **Models** above).
 
 - Webhooks
 - Hookdeck Event Gateway
diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index 2edbcc6ad..73fe8c135 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -10,20 +10,50 @@ Within this guide, we will use the User Portal as a reference implementation for
 
 In this guide, we will assume you are using React (client-side) to build your own UI, but the same principles can be applied to any other framework.
 
+## UI structure and flow
+
+Outpost’s tenant portal is a good reference for how screens map to the **tenant → destinations → topics → delivery target** model. When you build your own UI, keep the same structure so operators and end users are not forced into a misleading “single global webhook URL” mental model.
+
+**Tenant context**
+
+- Everything below is **scoped to one tenant**—the signed-in customer in your SaaS or the account selected in your platform. That tenant id is what you pass to Outpost when listing or creating destinations and when publishing from your backend.
+- If you use JWT auth against Outpost, the token is issued **for that tenant**; if you proxy through your API, your routes should resolve the current customer to a `tenant_id` and forward it on list/create/publish calls.
+
+**Recommended areas / screens**
+
+| Area | Purpose |
+| ---- | ------- |
+| **Destinations list** | Show all destinations for the current tenant (each row is one subscription: type, human-readable **target** such as webhook URL, subscribed topics). Entry point to edit, disable, or remove. |
+| **Create destination** | Multi-step flow aligned with the API: (1) **choose destination type**, (2) **select topics** (from the topics configured on your Outpost project—often checkboxes or multi-select), (3) **configure** type-specific fields (e.g. webhook URL, credentials). Optional: instructions or remote setup links from the destination type schema. |
+| **Events and delivery attempts** | List recent events for the tenant and inspect **delivery attempts** per event or destination so users can see outcomes, failures, and retries—similar to the portal’s event and log experience. |
+
+For how tenants, destinations, and topics fit together in a multi-tenant product, see [Outpost Concepts](/docs/concepts)—especially **How this fits your product**.
+
 ## Authentication
 
 To perform API calls on behalf of your tenants, you can either generate a JWT token, which can be used client-side to make Outpost API calls, or you can proxy any API requests to the Outpost API through your own API. When proxying through your own API, you can ensure the API call is made for the currently authenticated tenant using the API `tenant_id` parameter.
 
 Proxying through your own API can be useful if you want to limit access to some configuration or functionality of Outpost.
 
+### API base URL (managed and self-hosted)
+
+Examples below use a single variable **`API_URL`** (or **`OUTPOST_API_BASE_URL`** in shell snippets): the **root URL for Outpost’s HTTP API**, with **no trailing slash**. Paths in this guide match the [OpenAPI specification](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
+
+- **Hookdeck Outpost (managed):** use the base URL from your project (for example `https://api.outpost.hookdeck.com/2025-07-01`). The [managed curl quickstart](/docs/quickstarts/hookdeck-outpost-curl) uses the same pattern.
+- **Self-hosted Outpost:** use your deployment’s public origin **plus** whatever path prefix your install uses (commonly **`/api/v1`**), e.g. `https://outpost.internal.example.com/api/v1`. For local dev, use your actual host and port (see your deployment docs—do not assume a specific port in shared snippets).
+
+Do **not** hardcode `localhost` in product docs or copy-paste snippets meant for operators; always substitute your real base URL. The React snippets assume `API_URL` already includes any `/api/v1` segment so that `${API_URL}/tenants/destinations` resolves correctly for your environment.
+
 ### Generating a JWT Token (Optional)
 
 You can generate a JWT token by using the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token).
 
 ```bash
-curl --location 'localhost:3333/api/v1/tenants/<TENANT_ID>/token' \
-  --header 'Content-Type: application/json' \
-  --header 'Authorization: Bearer <API_KEY>' \
+export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"   # or your self-hosted root, e.g. …/api/v1
+TENANT_ID="<TENANT_ID>"
+
+curl --request GET "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/token" \
+  --header "Authorization: Bearer <ADMIN_API_KEY>"
 ```
 
 ## Fetching Destination Type Schema
@@ -36,14 +66,15 @@ Destinations are listed using the [List Destinations API](/docs/api/destinations
 
 ```tsx
 // React example to fetch and render a list of destinations
+// API_URL = Outpost API root (managed project URL or self-hosted origin + /api/v1)
 
 const [destinations, setDestinations] = useState([]);
 
 const [destination_types, setDestinationTypes] = useState([]);
 
 const fetchDestinations = async () => {
-  // Get the tenant destinations
-  const response = await fetch(`${API_URL}/api/v1/tenants/destinations`, {
+  // Get the tenant destinations (the tenant is inferred from the JWT; see Authentication)
+  const response = await fetch(`${API_URL}/tenants/destinations`, {
     headers: {
       Authorization: `Bearer ${token}`,
     },
@@ -54,8 +85,7 @@ const fetchDestinations = async () => {
 };
 
 const fetchDestinationTypes = async () => {
-  // Get the destination types schemas
-  const response = await fetch(`${API_URL}/api/v1/destination-types`, {
+  const response = await fetch(`${API_URL}/destination-types`, {
     headers: {
       Authorization: `Bearer ${token}`,
     },
@@ -120,8 +150,7 @@ The list of available destination types is rendered from the list of destination
 const [destination_types, setDestinationTypes] = useState([]);
 
 const fetchDestinationTypes = async () => {
-  // Get the destination types schemas
-  const response = await fetch(`${API_URL}/api/v1/destination-types`, {
+  const response = await fetch(`${API_URL}/destination-types`, {
     headers: {
       Authorization: `Bearer ${token}`,
     },
@@ -183,7 +212,7 @@ Available topics are returned from the [List Topics API](/docs/api/topics#list-t
 const [topics, setTopics] = useState([]);
 
 const fetchTopics = async () => {
-  const response = await fetch(`${API_URL}/api/v1/topics`, {
+  const response = await fetch(`${API_URL}/topics`, {
     headers: {
       Authorization: `Bearer ${token}`,
     },
@@ -341,7 +370,7 @@ Events are listed using the [List Events API](/docs/api/events#list-events). You
 const [events, setEvents] = useState([]);
 
 const fetchEvents = async () => {
-  const response = await fetch(`${API_URL}/api/v1/tenants/events`, {
+  const response = await fetch(`${API_URL}/tenants/events`, {
     headers: {
       Authorization: `Bearer ${token}`,
     },
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 8e6afe122..a36ea94a9 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -35,7 +35,10 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 - Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
 - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
 - API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
+- **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
+- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
 - Destination types: {{DOCS_URL}}/destinations
+- Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
 - SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
 
 ### Language → SDK vs HTTP
@@ -52,7 +55,7 @@ Operators rarely name packages or SDK details. **You** map what they say to the
 
 Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python).
 
-**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes.
+**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
 
 **Option 3 (existing app)** — Use the **official SDK for the repo’s language** on the server (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for shapes; integrate on **real** domain paths, not throwaway demos.
 
@@ -62,7 +65,7 @@ Guide the conversation, then act:
 
 1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
 
-2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only.
+2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
 
 3. **Integrate with an existing app** — Clone or open their codebase; add Outpost per **Option 3** above; document env vars and operator steps.
 
@@ -70,7 +73,7 @@ For all modes, read the **single** language-appropriate quickstart (and OpenAPI
 
 **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`).
 
-**Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+**Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
 ```
 
 ## Placeholder reference

From 1c6042be32ff39c6f54b6a5938f19a3bb53b232f Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 11:41:38 +0100
Subject: [PATCH 13/47] docs(agent-eval): record scenario 06–07 runs and execution passes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 26 +++++++++----------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index e9b506e5c..7bc14b506 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                        |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                       |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                   |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                    |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                      |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                              |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                             |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                            |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                        |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                         |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                           |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | **`nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                      |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
 
 
 ### Column hints
@@ -66,7 +66,7 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
 **What the app demonstrates (UX / model):**
 
 1. **Tenant** — Editable tenant id; copy states destinations and publishes are scoped to it.
-2. **Register webhook destination** — URL field + **topic checkboxes** populated from `**GET /api/topics`** (server lists topics from Outpost); `**POST /api/destinations**` upserts tenant and creates webhook destination for selected topics.
+2. **Register webhook destination** — URL field + **topic checkboxes** populated from **`GET /api/topics`** (server lists topics from Outpost); **`POST /api/destinations`** upserts tenant and creates webhook destination for selected topics.
 3. **Destinations list** — `**GET /api/destinations?tenantId=`** table (type, target, topics) with refresh — matches “tenant → many destinations” mental model.
 4. **Publish test event** — Separate action; `**POST /api/publish`** with chosen topic; UI notes fan-out to matching destinations.
 

From 89afda8ec7bb8ef0b0f07f997ff89d03a65c9e87 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 16:32:29 +0100
Subject: [PATCH 14/47] docs: fix List Topics UI example for string[] API
 response
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GET /topics returns a JSON array of topic names (OpenAPI). The React snippet
incorrectly treated items as objects with id and name, which misled readers
and agent integrations. Use the string as key, value, and label to match the
API and TypeScript SDK (topicsList → Array<string>).
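
As a rough illustration of the shape change (a sketch only: `TopicOption`, `toTopicOptions`, and the sample topic names are hypothetical and not part of the docs or the SDK), a `string[]` response can be mapped to checkbox props like this:

```typescript
// Hypothetical helper; these names are illustrative and not part of Outpost.
interface TopicOption {
  key: string;
  value: string;
  label: string;
}

// GET /topics is described above as returning a JSON array of topic names,
// so each string serves as key, value, and label at once.
function toTopicOptions(topics: string[]): TopicOption[] {
  return topics.map((topic) => ({ key: topic, value: topic, label: topic }));
}

// Sample topic names are made up for the example.
const options = toTopicOptions(["user.created", "user.deleted"]);
console.log(options.map((o) => o.label).join(", "));
```

Treating each item as an object (`topic.id`, `topic.name`) against this response would leave every key, value, and label `undefined`, which is the failure the fix addresses.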

Made-with: Cursor
---
 docs/pages/guides/building-your-own-ui.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index 73fe8c135..3b5e1711b 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -235,9 +235,9 @@ return (
     <h1>Select topics</h1>
     <form onSubmit={handleSubmit}>
       {topics.map((topic) => (
-        <label key={topic.id}>
-          <input type="checkbox" name="topics" value={topic.id} />
-          {topic.name}
+        <label key={topic}>
+          <input type="checkbox" name="topics" value={topic} />
+          {topic}
         </label>
       ))}
   </div>

From 78845ab3f859e940ef26dee45fd69868463d90e9 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 16:35:50 +0100
Subject: [PATCH 15/47] feat(agent-eval): declarative pre-steps via Eval
 harness section

- Add eval-harness.ts to parse eval-harness fenced JSON (git_clone + agentCwd).
- Runner applies pre-steps per scenario, sets agent cwd and write guard to
  the run directory, passes scenario markdown once into runOneScenario.
- Transcript meta includes evalHarness summary; document EVAL_SKIP_HARNESS_PRE_STEPS.
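
For orientation, a minimal sketch of what a scenario's harness section might look like and how the fenced JSON can be pulled out. It assumes the field names from this patch (`preSteps`, `git_clone`, `into`, `depth`, `agentCwd`); the repository URL, the directory name, and the `extractHarnessJson` helper are placeholders, and the sketch skips the heading scoping and validation that `parseEvalHarness` performs.

```typescript
// Sketch only: a simplified stand-in for the extraction step, without validation.
// The fence marker is built at runtime so this example contains no literal fence.
const FENCE = "`".repeat(3);

// Placeholder scenario markdown; the URL and folder name are not real values.
const scenarioMarkdown = [
  "# Scenario (example)",
  "",
  "## Eval harness",
  "",
  FENCE + "eval-harness",
  JSON.stringify(
    {
      preSteps: [
        { type: "git_clone", url: "https://example.com/baseline.git", into: "baseline", depth: 1 },
      ],
      agentCwd: "baseline",
    },
    null,
    2,
  ),
  FENCE,
  "",
  "## Turns",
].join("\n");

// Hypothetical helper: returns the parsed JSON, or undefined when no block exists.
function extractHarnessJson(markdown: string): unknown {
  const pattern = new RegExp(FENCE + "eval-harness\\s*\\n([\\s\\S]*?)" + FENCE);
  const fence = markdown.match(pattern);
  if (!fence || fence[1] === undefined) return undefined;
  return JSON.parse(fence[1].trim());
}

const harness = extractHarnessJson(scenarioMarkdown) as {
  preSteps: { type: string; into: string }[];
  agentCwd: string;
};
console.log(harness.agentCwd, harness.preSteps.length);
```

In this example the clone target and `agentCwd` are the same single path segment, so the agent would start inside the cloned folder while writes stay confined to the run directory.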

Made-with: Cursor
---
 docs/agent-evaluation/README.md             |   1 +
 docs/agent-evaluation/src/eval-harness.ts   | 226 ++++++++++++++++++++
 docs/agent-evaluation/src/run-agent-eval.ts |  37 +++-
 3 files changed, 254 insertions(+), 10 deletions(-)
 create mode 100644 docs/agent-evaluation/src/eval-harness.ts

diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 8f63b4abd..7b5e6b439 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -71,6 +71,7 @@ cd docs/agent-evaluation && npm ci && npm run eval:ci
 - **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}`.
 - **`OUTPOST_API_KEY`** — **not** read by the automated runner, but **required if you want a full evaluation**: without it you can only judge the transcript (plausible curl/SDK text). To verify that **generated commands or code actually work**, put the same Outpost API key you use against the managed API in **`docs/agent-evaluation/.env`** (or export it) and run the agent’s output against a real project. The onboarding prompt tells operators to keep that key in **`.env`** and never paste it into chat.
 - **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
+- **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** — skip **`git_clone`** (and any future **`preSteps`**) declared in a scenario’s **`## Eval harness`** JSON block; useful offline or when the baseline folder is already present.
 
 - **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (`## Template`) with placeholders filled from environment variables.
 - Transcripts are written to `results/runs/<stamp>-scenario-NN/transcript.json` (gitignored).
diff --git a/docs/agent-evaluation/src/eval-harness.ts b/docs/agent-evaluation/src/eval-harness.ts
new file mode 100644
index 000000000..d8facfba8
--- /dev/null
+++ b/docs/agent-evaluation/src/eval-harness.ts
@@ -0,0 +1,226 @@
+/**
+ * Declarative pre-steps for agent eval scenarios (see `## Eval harness` in scenario markdown).
+ */
+
+import { existsSync } from "node:fs";
+import { readdir } from "node:fs/promises";
+import { join, resolve, sep } from "node:path";
+
+export interface EvalHarnessConfig {
+  readonly preSteps: HarnessPreStep[];
+  /** Directory under the run folder for the agent process `cwd` (default `"."` = run dir). */
+  readonly agentCwd: string;
+}
+
+export type HarnessPreStep = GitClonePreStep;
+
+export interface GitClonePreStep {
+  readonly type: "git_clone";
+  readonly url: string;
+  /** Target directory name under the run dir (single segment, no `..`). */
+  readonly into: string;
+  readonly depth?: number;
+  /** If set and `process.env[urlEnv]` is non-empty, use it instead of `url`. */
+  readonly urlEnv?: string;
+}
+
+const DEFAULT_CONFIG: EvalHarnessConfig = { preSteps: [], agentCwd: "." };
+
+function envFlagTruthy(v: string | undefined): boolean {
+  if (!v) return false;
+  const s = v.trim().toLowerCase();
+  return s === "1" || s === "true" || s === "yes";
+}
+
+/** Resolved path must stay under `root` (no `..` escape). */
+export function pathMustStayInsideRunDir(root: string, relativeOrAbsolute: string): string {
+  const resolved = resolve(relativeOrAbsolute);
+  const r = resolve(root);
+  if (resolved === r) return resolved;
+  const prefix = r.endsWith(sep) ? r : r + sep;
+  if (!resolved.startsWith(prefix)) {
+    throw new Error(`Path escapes run directory: ${relativeOrAbsolute} -> ${resolved}`);
+  }
+  return resolved;
+}
+
+function assertSingleRunSubdir(name: string, field: string): void {
+  if (!name || name === "." || name === "..") {
+    throw new Error(`eval-harness: invalid ${field} (empty, ., or ..)`);
+  }
+  if (name.includes("/") || name.includes("\\") || name.includes("..")) {
+    throw new Error(`eval-harness: ${field} must be a single path segment: ${JSON.stringify(name)}`);
+  }
+}
+
+function isRecord(v: unknown): v is Record<string, unknown> {
+  return typeof v === "object" && v !== null && !Array.isArray(v);
+}
+
+function parseGitCloneStep(raw: Record<string, unknown>, index: number): GitClonePreStep {
+  const url = raw.url;
+  const into = raw.into;
+  if (typeof url !== "string" || url.length === 0) {
+    throw new Error(`eval-harness: preSteps[${index}] git_clone requires non-empty string "url"`);
+  }
+  if (typeof into !== "string" || into.length === 0) {
+    throw new Error(`eval-harness: preSteps[${index}] git_clone requires non-empty string "into"`);
+  }
+  assertSingleRunSubdir(into, "into");
+  const depth = raw.depth;
+  if (depth !== undefined && (typeof depth !== "number" || !Number.isInteger(depth) || depth < 1)) {
+    throw new Error(`eval-harness: preSteps[${index}] git_clone "depth" must be a positive integer`);
+  }
+  const urlEnv = raw.urlEnv;
+  if (urlEnv !== undefined && (typeof urlEnv !== "string" || urlEnv.length === 0)) {
+    throw new Error(`eval-harness: preSteps[${index}] git_clone "urlEnv" must be a non-empty string`);
+  }
+  return {
+    type: "git_clone",
+    url,
+    into,
+    ...(depth !== undefined ? { depth } : {}),
+    ...(urlEnv ? { urlEnv } : {}),
+  };
+}
+
+function parsePreStep(raw: unknown, index: number): HarnessPreStep {
+  if (!isRecord(raw)) {
+    throw new Error(`eval-harness: preSteps[${index}] must be an object`);
+  }
+  const t = raw.type;
+  if (t === "git_clone") {
+    return parseGitCloneStep(raw, index);
+  }
+  throw new Error(`eval-harness: preSteps[${index}] unknown type ${JSON.stringify(t)}`);
+}
+
+/**
+ * Parse `## Eval harness` and a ```eval-harness JSON block. Missing section → default (no pre-steps, cwd = run dir).
+ */
+export function parseEvalHarness(markdown: string): EvalHarnessConfig {
+  const m = markdown.match(/^## Eval harness\s*$/m);
+  if (!m || m.index === undefined) {
+    return DEFAULT_CONFIG;
+  }
+  const afterHeader = markdown.slice(m.index + m[0].length);
+  const nextH2 = afterHeader.match(/^## [^\s#]/m);
+  const section = nextH2?.index !== undefined ? afterHeader.slice(0, nextH2.index) : afterHeader;
+  const fence = section.match(/```eval-harness\s*\n([\s\S]*?)```/);
+  if (!fence) {
+    throw new Error(
+      'Scenario has "## Eval harness" but no ```eval-harness ... ``` JSON block (add one, or remove the heading).',
+    );
+  }
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(fence[1]!.trim());
+  } catch (e) {
+    throw new Error(
+      `eval-harness: invalid JSON in ## Eval harness block: ${e instanceof Error ? e.message : String(e)}`,
+    );
+  }
+  if (!isRecord(parsed)) {
+    throw new Error("eval-harness: root must be a JSON object");
+  }
+  const preRaw = parsed.preSteps;
+  const preSteps: HarnessPreStep[] = [];
+  if (preRaw !== undefined) {
+    if (!Array.isArray(preRaw)) {
+      throw new Error('eval-harness: "preSteps" must be an array');
+    }
+    for (let i = 0; i < preRaw.length; i++) {
+      preSteps.push(parsePreStep(preRaw[i], i));
+    }
+  }
+  let agentCwd = ".";
+  const ac = parsed.agentCwd;
+  if (ac !== undefined) {
+    if (typeof ac !== "string") {
+      throw new Error('eval-harness: "agentCwd" must be a string');
+    }
+    agentCwd = ac.trim() || ".";
+  }
+  if (agentCwd !== "." && agentCwd !== "") {
+    assertSingleRunSubdir(agentCwd, "agentCwd");
+  } else {
+    agentCwd = ".";
+  }
+  return { preSteps, agentCwd };
+}
+
+async function dirLooksCloned(target: string): Promise<boolean> {
+  if (!existsSync(target)) return false;
+  const entries = await readdir(target);
+  return entries.length > 0;
+}
+
+async function runGitClone(runDir: string, step: GitClonePreStep): Promise<void> {
+  const url =
+    (step.urlEnv && process.env[step.urlEnv]?.trim()) || step.url;
+  if (!url) {
+    throw new Error(
+      `eval-harness: git_clone into ${step.into} has no URL (set "url" or env ${step.urlEnv ?? "(none)"})`,
+    );
+  }
+  const target = join(runDir, step.into);
+  if (await dirLooksCloned(target)) {
+    console.error(`Harness: skip git_clone (directory already non-empty): ${target}`);
+    return;
+  }
+  const { execFile } = await import("node:child_process");
+  const { promisify } = await import("node:util");
+  const execFileAsync = promisify(execFile);
+  const depth = step.depth ?? 1;
+  console.error(`Harness: git clone -> ${target}`);
+  try {
+    await execFileAsync("git", ["clone", "--depth", String(depth), url, target], {
+      cwd: runDir,
+      maxBuffer: 50 * 1024 * 1024,
+    });
+  } catch (err) {
+    if (await dirLooksCloned(target)) {
+      return;
+    }
+    throw new Error(
+      `Harness git_clone failed (${url} -> ${target}): ${err instanceof Error ? err.message : String(err)}`,
+    );
+  }
+}
+
+/**
+ * Run harness pre-steps and return absolute agent cwd + run dir for the write guard.
+ */
+export async function applyEvalHarness(
+  runDir: string,
+  config: EvalHarnessConfig,
+): Promise<{ agentCwd: string; writeGuardRoot: string }> {
+  const writeGuardRoot = runDir;
+  const skip = envFlagTruthy(process.env.EVAL_SKIP_HARNESS_PRE_STEPS);
+
+  if (!skip) {
+    for (const step of config.preSteps) {
+      if (step.type === "git_clone") {
+        await runGitClone(runDir, step);
+      }
+    }
+  } else if (config.preSteps.length > 0) {
+    console.error("Harness: EVAL_SKIP_HARNESS_PRE_STEPS set — skipped all preSteps.");
+  }
+
+  const relative = config.agentCwd === "." ? "" : config.agentCwd;
+  const agentCwd = relative ? join(runDir, relative) : runDir;
+  pathMustStayInsideRunDir(runDir, agentCwd);
+
+  if (!existsSync(agentCwd)) {
+    if (skip) {
+      console.error(
+        `Harness: agent cwd ${agentCwd} missing (pre-steps skipped); falling back to run dir ${runDir}`,
+      );
+      return { agentCwd: runDir, writeGuardRoot };
+    }
+    throw new Error(`Harness: agent cwd does not exist after pre-steps: ${agentCwd}`);
+  }
+
+  return { agentCwd, writeGuardRoot };
+}
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index bc7629f53..7201e6e51 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -19,6 +19,7 @@ import {
   type SDKMessage,
   type SDKSystemMessage,
 } from "@anthropic-ai/claude-agent-sdk";
+import { applyEvalHarness, parseEvalHarness } from "./eval-harness.js";
 import { llmJudgeRun, scenarioMdPathFromRun } from "./llm-judge.js";
 import { scoreRunFile } from "./score-transcript.js";
 
@@ -266,6 +267,8 @@ async function runOneScenario(
   opts: {
     skipOptional: boolean;
     baseOptions: Options;
+    /** When set, avoids a second read of the scenario file (same content as harness parse). */
+    scenarioMarkdown?: string;
   },
 ): Promise<{
   scenarioId: string;
@@ -275,7 +278,7 @@ async function runOneScenario(
   allMessages: unknown[];
 }> {
   const path = join(SCENARIOS_DIR, scenarioFile);
-  const md = await readFile(path, "utf8");
+  const md = opts.scenarioMarkdown ?? (await readFile(path, "utf8"));
   const parsed = parseScenarioTurns(md);
 
   const userTurns = parsed
@@ -345,17 +348,17 @@ function toolInputFilePath(toolName: string, toolInput: unknown): string | undef
  * PreToolUse hook: deny Write/Edit/NotebookEdit outside the run dir.
  * `canUseTool` is not reliable under `permissionMode: dontAsk`; hooks receive `permissionDecision` instead.
  */
-function createRunDirPreToolHook(runDir: string) {
+function createRunDirPreToolHook(allowedRootDir: string) {
   return async (input: HookInput) => {
     if (input.hook_event_name !== "PreToolUse") return {};
     const candidate = toolInputFilePath(input.tool_name, input.tool_input);
     if (!candidate) return {};
-    if (filePathIsInsideRunDir(runDir, candidate)) return {};
+    if (filePathIsInsideRunDir(allowedRootDir, candidate)) return {};
     return {
       hookSpecificOutput: {
         hookEventName: "PreToolUse" as const,
         permissionDecision: "deny" as const,
-        permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${runDir} (e.g. outpost-quickstart.sh). Refused: ${resolve(candidate)}`,
+        permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${allowedRootDir} (e.g. outpost-quickstart.sh). Refused: ${resolve(candidate)}`,
       },
     };
   };
@@ -374,7 +377,11 @@ function defaultEvalTools(env: NodeJS.ProcessEnv): string {
     : "Read,Glob,Grep,WebFetch,Write,Edit,Bash";
 }
 
-function buildBaseOptions(agentWorkspaceCwd: string): Options {
+/**
+ * @param agentWorkspaceCwd — process cwd for the agent (per-run directory, or a subfolder when the scenario defines `agentCwd` in ## Eval harness).
+ * @param writeGuardRoot — PreToolUse hook allows Write/Edit only under this path (usually the per-run directory so the clone stays inside it).
+ */
+function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Options {
   const toolsRaw = defaultEvalTools(process.env);
   const allowedTools = toolsRaw
     .split(",")
@@ -402,7 +409,7 @@ function buildBaseOptions(agentWorkspaceCwd: string): Options {
 
   if (!envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD)) {
     o.hooks = {
-      PreToolUse: [{ hooks: [createRunDirPreToolHook(agentWorkspaceCwd)] }],
+      PreToolUse: [{ hooks: [createRunDirPreToolHook(writeGuardRoot)] }],
     };
   }
 
@@ -461,13 +468,14 @@ Environment:
   EVAL_PERMISSION_MODE  Optional (default: dontAsk)
   EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
   EVAL_DISABLE_WORKSPACE_WRITE_GUARD  Set to 1 to allow Write/Edit outside the run dir (not recommended)
+  EVAL_SKIP_HARNESS_PRE_STEPS       Set to 1 to skip ## Eval harness preSteps (git_clone, etc.); see scenario markdown
 
 Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets
   results/runs/<stamp>-scenario-NN/transcript.json
   heuristic-score.json and llm-score.json unless disabled (see above).
 Also set EVAL_NO_SCORE_HEURISTIC=1 or EVAL_NO_SCORE_LLM=1 in .env to skip scoring without flags.
 
-Each run uses results/runs/<stamp>-scenario-NN/ as agent cwd so Write creates files there.
+Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JSON) to clone a baseline into a subfolder first.
 `);
     process.exit(0);
   }
@@ -536,11 +544,16 @@ Each run uses results/runs/<stamp>-scenario-NN/ as agent cwd so Write creates fi
     const runDir = join(RUNS_DIR, `${stamp}-scenario-${scenarioIdEarly}`);
     await mkdir(runDir, { recursive: true });
 
-    const baseOptions = buildBaseOptions(runDir);
-    console.error(`\n>>> Scenario ${file} (workspace ${runDir}) ...`);
+    const scenarioPath = join(SCENARIOS_DIR, file);
+    const scenarioMd = await readFile(scenarioPath, "utf8");
+    const harnessConfig = parseEvalHarness(scenarioMd);
+    const { agentCwd, writeGuardRoot } = await applyEvalHarness(runDir, harnessConfig);
+    const baseOptions = buildBaseOptions(agentCwd, writeGuardRoot);
+    console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
     const result = await runOneScenario(file, filledTemplate, {
       skipOptional: values["skip-optional"] ?? false,
       baseOptions,
+      scenarioMarkdown: scenarioMd,
     });
 
     const outPath = join(runDir, "transcript.json");
@@ -549,7 +562,11 @@ Each run uses results/runs/<stamp>-scenario-NN/ as agent cwd so Write creates fi
         scenarioId: result.scenarioId,
         scenarioFile: result.scenarioFile,
         runDirectory: runDir,
-        agentWorkspaceCwd: runDir,
+        agentWorkspaceCwd: agentCwd,
+        evalHarness: {
+          preStepCount: harnessConfig.preSteps.length,
+          agentCwd: harnessConfig.agentCwd,
+        },
         repositoryRoot: REPO_ROOT,
         completedAt: new Date().toISOString(),
         sessionId: result.sessionId,
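
Note on the cwd and write-guard split above: the agent cwd may be a subfolder of the run directory, while the write guard stays rooted at the run directory. The sketch below shows one way such a containment check and cwd resolution could work. The names `isInsideDir` and `resolveAgentCwd` are hypothetical; the real helpers (`pathMustStayInsideRunDir`, `filePathIsInsideRunDir`) are not shown in this patch and may behave differently.

```typescript
import { isAbsolute, join, relative, resolve, sep } from "node:path";

// Hypothetical helper: true when `candidate` resolves to `rootDir` or a path below it.
// Both paths are resolved against the process cwd before comparison.
function isInsideDir(rootDir: string, candidate: string): boolean {
  const root = resolve(rootDir);
  const target = resolve(candidate);
  const rel = relative(root, target);
  if (rel === "") return true; // same directory
  if (rel === ".." || rel.startsWith(".." + sep)) return false; // climbs out of root
  return !isAbsolute(rel); // absolute result means a different drive on Windows
}

// Hypothetical helper: maps the harness `agentCwd` value ("." or a subfolder)
// to a concrete path and refuses anything that escapes the run directory.
function resolveAgentCwd(runDir: string, agentCwd: string): string {
  const sub = agentCwd === "." ? "" : agentCwd;
  const full = sub ? join(runDir, sub) : runDir;
  if (!isInsideDir(runDir, full)) {
    throw new Error(`agent cwd escapes run dir: ${full}`);
  }
  return full;
}
```

Comparing via `relative()` rather than a string prefix avoids treating a sibling such as `run-dir-2` as being inside `run-dir`.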

From 77f26089f59c7fa9a0405f69c1ed67058b9d07d2 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 16:36:02 +0100
Subject: [PATCH 16/47] docs(agent-eval): harness blocks for existing-app scenarios 08–10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add ## Eval harness JSON (git_clone + agentCwd) for Next.js, FastAPI, Go baselines.
- Turn 1 stays in the user's voice (repo already present) without naming the eval harness.
- Align Automated eval and success criteria with pre-cloned workspace model.

Made-with: Cursor
---
 .../scenarios/08-integrate-nextjs-existing.md | 33 +++++++++++++++----
 .../09-integrate-fastapi-existing.md          | 23 +++++++++++--
 .../scenarios/10-integrate-go-existing.md     | 23 +++++++++++--
 3 files changed, 66 insertions(+), 13 deletions(-)
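
Note on the harness block format added below: each scenario carries a fenced `eval-harness` JSON block with `preSteps` and `agentCwd`, and a non-empty `urlEnv` variable overrides `url`. A minimal sketch of how such a block might be read, assuming one block per scenario file. The names `parseHarnessBlock` and `resolveCloneUrl` and the defaults (empty `preSteps`, `agentCwd` of ".") are illustrative assumptions; the actual `parseEvalHarness` in `eval-harness.ts` is not part of this patch and may differ.

```typescript
// Shapes mirror the JSON in the scenario files; treat them as assumptions.
type GitCloneStep = {
  type: "git_clone";
  url: string;
  into: string;
  depth?: number;
  urlEnv?: string;
};

type EvalHarnessConfig = {
  preSteps: GitCloneStep[];
  agentCwd: string;
};

// Built at runtime so this sketch contains no literal fence markers.
const FENCE = "`".repeat(3);

// Hypothetical parser: extracts the first eval-harness fenced block and parses it as JSON.
// Scenarios without a block get an empty config (no preSteps, cwd = run dir).
function parseHarnessBlock(markdown: string): EvalHarnessConfig {
  const re = new RegExp(FENCE + "eval-harness\\s*\\n([\\s\\S]*?)\\n" + FENCE);
  const match = re.exec(markdown);
  if (!match) return { preSteps: [], agentCwd: "." };
  const raw = JSON.parse(match[1] ?? "{}") as Partial<EvalHarnessConfig>;
  return { preSteps: raw.preSteps ?? [], agentCwd: raw.agentCwd ?? "." };
}

// Hypothetical resolver: a non-empty environment variable named by urlEnv wins over url.
function resolveCloneUrl(
  step: GitCloneStep,
  env: Record<string, string | undefined>,
): string {
  const override = step.urlEnv ? env[step.urlEnv]?.trim() : undefined;
  return override ? override : step.url;
}
```

With this shape, pointing `EVAL_NEXT_SAAS_BASELINE_URL` at a fork would swap the clone source without editing the scenario file.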

diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index fc1594ff0..94c9b65ab 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -2,7 +2,7 @@
 
 ## Intent
 
-Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboard) and need **outbound webhooks** for their customers. This scenario measures whether the agent can **clone a known open-source baseline**, understand where **domain events** happen, and **wire Hookdeck Outpost** so events are **published** to Outpost (with **per-tenant webhook destinations** documented or implemented).
+Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboard) and need **outbound webhooks** for their customers. This scenario measures whether the agent can work **inside an existing app tree** (here: a pinned open-source baseline), understand where **domain events** happen, and **wire Hookdeck Outpost** so events are **published** to Outpost (with **per-tenant webhook destinations** documented or implemented).
 
 **Baseline application (pin this in evals):** [**leerob/next-saas-starter**](https://github.com/leerob/next-saas-starter) — Next.js, PostgreSQL, Drizzle, team/member flows, MIT license. It is a common reference for “real” SaaS structure; adjust the prompt if you standardize on another repo.
 
@@ -11,9 +11,28 @@ Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboa
 - Node 18+; `git` available.
 - Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
 
+## Eval harness
+
+The runner executes **`preSteps`** below with shell **`cwd`** = `results/runs/<stamp>-scenario-08/` before Turn 0. **`agentCwd`** is the SDK process working directory (the baseline repo root), resolved relative to the run directory. Set **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** to skip preSteps; in that case, if **`agentCwd`** does not exist, the harness falls back to the run directory (without the flag, a missing **`agentCwd`** is an error). When **`urlEnv`** is set and that variable is non-empty, it overrides **`url`**.
+
+```eval-harness
+{
+  "preSteps": [
+    {
+      "type": "git_clone",
+      "url": "https://github.com/leerob/next-saas-starter.git",
+      "into": "next-saas-starter",
+      "depth": 1,
+      "urlEnv": "EVAL_NEXT_SAAS_BASELINE_URL"
+    }
+  ],
+  "agentCwd": "next-saas-starter"
+}
+```
+
 ## Automated eval (Claude Agent SDK)
 
-The harness **`cwd`** is an empty directory under `results/runs/<stamp>-scenario-08/`. The agent should **`git clone`** the baseline into that workspace (or a subdirectory), **`npm` / `pnpm install`** via **Bash**, then **Write** / **Edit** integration code. Reviewers inspect the run folder and transcript.
+Same as other scenarios, except the agent starts **inside** the cloned tree above. Expect **`npm` / `pnpm install`** via **Bash**, then **Write** / **Edit** for Outpost. Reviewers inspect that tree plus `transcript.json`.
 
 ## Conversation script
 
@@ -23,7 +42,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 ### Turn 1 — User
 
-> Option 3 — I’m not starting from scratch. Please clone **`https://github.com/leerob/next-saas-starter`** here, install it, and get it runnable. Then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
+> Option 3 — I’m not starting from scratch. **We’re already in the Next.js SaaS app in this workspace** — the baseline repo is checked out here. Install dependencies and get it runnable, then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
 >
 > I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
 
@@ -35,20 +54,20 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 **Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
 
-- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork) with clone + install steps reflected in the transcript or run directory.
+- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
 - **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
 - At least one **publish** (or equivalent) tied to a **real code path** in the baseline (not dead code).
 - **Topic** aligns with Turn 0 configuration or is clearly named and documented.
 - **Per-customer webhook** story is explained: destination creation / subscription to topic.
 - README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). *Skip only for transcript-only triage.*
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). Run smoke tests from **`results/runs/…-scenario-08/next-saas-starter/`**; skip this step only for transcript-only triage.
 
 ## Failure modes to note
 
-- Pasting a greenfield Next app instead of integrating the **cloned** baseline.
+- Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
 - Publishing only from a demo route unrelated to the product model.
 - Calling Outpost from client components with secrets.
 
 ## Future baselines
 
-Java / .NET “existing app” scenarios can follow the same shape: fixed public baseline repo + Option 3 Turn 1 + Success criteria + `scoreScenarioNN`.
+Java / .NET “existing app” scenarios can follow the same shape: harness pre-clones a fixed public baseline into the run workspace + Option 3 Turn 1 (user already “in” the app) + Success criteria + `scoreScenarioNN`.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index dd8270921..c80554dc4 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -10,9 +10,26 @@ Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + F
 
 - Python 3.10+; `git` available.
 
+## Eval harness
+
+```eval-harness
+{
+  "preSteps": [
+    {
+      "type": "git_clone",
+      "url": "https://github.com/philipokiokio/FastAPI_SAAS_Template.git",
+      "into": "FastAPI_SAAS_Template",
+      "depth": 1,
+      "urlEnv": "EVAL_FASTAPI_SAAS_BASELINE_URL"
+    }
+  ],
+  "agentCwd": "FastAPI_SAAS_Template"
+}
+```
+
 ## Automated eval (Claude Agent SDK)
 
-**`cwd`** is `results/runs/<stamp>-scenario-09/`. Expect **`git clone`**, **`pip` / `uv`**, then **Write** / **Edit** for Outpost integration.
+The agent starts **inside** the cloned baseline above. Expect **`pip` / `uv`** setup from the template README, then **Write** / **Edit** for Outpost integration.
 
 ## Conversation script
 
@@ -22,7 +39,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> Option 3 — integrate Outpost into a real codebase. Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`**, set it up from its README, then add **Hookdeck Outpost** for customer webhooks.
+> Option 3 — integrate Outpost into a real codebase. **We’re already in the FastAPI SaaS template in this workspace** — the repository is present here. Set it up from its README, then add **Hookdeck Outpost** for customer webhooks.
 >
 > Hook publishing to **one real event** that already exists in the app (orgs, users, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
 
@@ -34,7 +51,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
-- Cloned **FastAPI_SAAS_Template** (or documented alternative) with install steps.
+- **FastAPI_SAAS_Template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
 - **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path.
 - API key from **environment** or secure settings — not hard-coded or exposed to clients.
 - **Topic** and **destination** story documented.
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index 1408caa57..bbe96d80f 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -10,9 +10,26 @@ Same integration goal as [scenarios 8–9](08-integrate-nextjs-existing.md), for
 
 - Go 1.21+; `git` available.
 
+## Eval harness
+
+```eval-harness
+{
+  "preSteps": [
+    {
+      "type": "git_clone",
+      "url": "https://github.com/devinterface/startersaas-go-api.git",
+      "into": "startersaas-go-api",
+      "depth": 1,
+      "urlEnv": "EVAL_GO_SAAS_BASELINE_URL"
+    }
+  ],
+  "agentCwd": "startersaas-go-api"
+}
+```
+
 ## Automated eval (Claude Agent SDK)
 
-**`cwd`** is `results/runs/<stamp>-scenario-10/`. Expect **`git clone`**, **`go mod`** / **`go get`** for **`outpost-go`**, then source edits.
+The agent starts **inside** the cloned baseline above. Expect **`go mod`** / **`go get`** for **`outpost-go`**, then source edits.
 
 ## Conversation script
 
@@ -22,7 +39,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> Option 3 — existing Go API. Clone **`https://github.com/devinterface/startersaas-go-api`**, get it building, then add **Hookdeck Outpost** for outbound webhooks.
+> Option 3 — existing Go API. **We’re already in the startersaas-go-api tree in this workspace** — the repository is present here. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
 >
 > Use **one real handler** as the publish trigger (signup, billing, etc.). API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
 
@@ -34,7 +51,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
-- Cloned **startersaas-go-api** (or documented alternative) with build instructions attempted.
+- **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
 - **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path.
 - No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
 - **Topic** + **destination** documentation for operators.

From e766d9864d3b3ca9e6842c82b9810150fb8775ea Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 16:36:13 +0100
Subject: [PATCH 17/47] chore(agent-eval): update SCENARIO-RUN-TRACKER for
 recent runs

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 7bc14b506..197269399 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                             |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                            |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                        |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                         |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                           |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                      |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                   |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                               |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                              |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                          |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                           |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                             |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | **`nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                        |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                   |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                     |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **`## Eval harness`** pre-clone + **agent cwd** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/**`, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).   |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                     |
 
 
 ### Column hints

From 43a3f3c49815481f53920bcf3c83e225ebd22541 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 23:07:19 +0100
Subject: [PATCH 18/47] docs(pages): guide Option 3 full-stack Outpost
 integration

Expand the copy-paste agent template so that, for existing apps with a
product UI, the agent wires both the backend (BFF, server SDK) and the
frontend (which calls the app's own API only). Point to Concepts and
Building your own UI before building destination screens; allow an
API-only path when there is no customer UI.

Made-with: Cursor
---
 .../quickstarts/hookdeck-outpost-agent-prompt.mdx  | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
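
Reviewer note: the backend guidance added in this patch (server-side
Outpost calls, responses that return only what the customer should see)
could be sketched roughly as below. This is a minimal illustration, not
the Outpost SDK's API: `PlatformDestination`, `CustomerDestinationView`,
`toCustomerView`, and `listDestinationsForTenant` are hypothetical names,
and the `fetchDestinations` callback stands in for whatever server-side
SDK or REST call the app actually uses.

```typescript
// Hypothetical shape of a destination as the server sees it.
// Field names are assumptions for illustration, not the SDK's real types.
interface PlatformDestination {
  id: string;
  type: string;
  topics: string[];
  config: { url?: string };
  credentials?: Record<string, string>; // must never reach the browser
}

// What the authenticated BFF route returns to the signed-in customer.
interface CustomerDestinationView {
  id: string;
  type: string;
  topics: string[];
  url: string | null;
}

// Copy only the customer-safe fields; credentials are dropped by construction.
function toCustomerView(d: PlatformDestination): CustomerDestinationView {
  return {
    id: d.id,
    type: d.type,
    topics: [...d.topics],
    url: d.config.url ?? null,
  };
}

// The tenant id comes from the server-side session, never from client input.
// `fetchDestinations` is a placeholder for the real server-side Outpost call.
function listDestinationsForTenant(
  sessionTenantId: string,
  fetchDestinations: (tenantId: string) => PlatformDestination[],
): CustomerDestinationView[] {
  return fetchDestinations(sessionTenantId).map(toCustomerView);
}
```

The allow-list mapping is deliberate: new server-side fields stay hidden
from the frontend unless they are added to the view type explicitly.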

diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index a36ea94a9..28e9cc9af 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -57,7 +57,13 @@ Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.
 
 **Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
 
-**Option 3 (existing app)** — Use the **official SDK for the repo’s language** on the server (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for shapes; integrate on **real** domain paths, not throwaway demos.
+**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos.
+
+**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage webhook destinations inside the product**, not only via raw API or Swagger:
+
+- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, URLs, topics—never the platform API key).
+- **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
+- **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
 
 ### What to do
 
@@ -67,11 +73,11 @@ Guide the conversation, then act:
 
 2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
 
-3. **Integrate with an existing app** — Clone or open their codebase; add Outpost per **Option 3** above; document env vars and operator steps.
+3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
 
-For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code.
+For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. For **Option 3** with a UI, also read **Building your own UI** before implementing destination-management screens.
 
-**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`).
+**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
 
 **Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
 ```

From 7ecdbeead71b6d6a7657b169ad53dcd55edb898c Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 23:07:33 +0100
Subject: [PATCH 19/47] docs(agent-eval): scenario 09 uses full-stack FastAPI
 template

Pin scenario 09 to fastapi/full-stack-fastapi-template (React + Pydantic v2).
Update scoreScenario09 baseline check, README index, TEMP onboarding
status, and SCENARIO-RUN-TRACKER notes. Optional clone URL override:
EVAL_FASTAPI_BASELINE_URL.

Made-with: Cursor
---
 ...TEMP-hookdeck-outpost-onboarding-status.md |  2 +-
 docs/agent-evaluation/README.md               |  4 +-
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 24 +++++-----
 .../09-integrate-fastapi-existing.md          | 45 ++++++++++++-------
 docs/agent-evaluation/src/score-transcript.ts |  7 ++-
 5 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
index 8fbff69c8..e37ec4a9d 100644
--- a/docs/TEMP-hookdeck-outpost-onboarding-status.md
+++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md
@@ -16,7 +16,7 @@ The automated harness in `docs/agent-evaluation/` is in place. **What it does to
 | **Runner**     | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with `**Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, `**cwd`** = `results/runs/<stamp>-scenario-NN/`              |
 | **Artifacts**  | `transcript.json`, optional `**heuristic-score.json`** + `**llm-score.json`** (LLM reads each scenario `**## Success criteria**`), agent-written files beside the transcript                                                                                                              |
 | **Heuristics** | `score-transcript.ts` — `**scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts)                                                                                                                                                        |
-| **Scenarios**  | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next `**leerob/next-saas-starter`**, FastAPI `**philipokiokio/FastAPI_SAAS_Template`**, Go `**devinterface/startersaas-go-api**`) |
+| **Scenarios**  | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next `**leerob/next-saas-starter`**, FastAPI `**fastapi/full-stack-fastapi-template`**, Go `**devinterface/startersaas-go-api**`) |
 | **CLI**        | `**npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless `**--no-score`** / `**--no-score-llm`** or `**EVAL_NO_SCORE_***`. **Exit 1** if any enabled score fails                            |
 | **CI**         | `**npm run eval:ci`** = `**--scenarios 01,02`** + heuristic **and** LLM judge. `**scripts/ci-eval.sh`** — requires `**ANTHROPIC_API_KEY`**, `**EVAL_TEST_DESTINATION_URL**`                                                                                                               |
 | **Re-score**   | `npm run score -- --run <run-dir> [--llm] [--write]`                                                                                                                                                                                                                                      |
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 7b5e6b439..4a591adb8 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -156,7 +156,7 @@ There is still **no single portable “IDE agent” CLI** for all vendors; the S
 | 06 | `scoreScenario06` | FastAPI, `outpost_sdk`, uvicorn, server env, two flows, README, webhook docs |
 | 07 | `scoreScenario07` | `net/http`, Go SDK + `CreateDestinationCreateWebhook`, HTML UI, two flows, `go run`, README |
 | 08 | `scoreScenario08` | Clone **next-saas-starter** (or git baseline), TS SDK, publish/destinations/tenants, server env key, per-customer webhook story |
-| 09 | `scoreScenario09` | Clone **FastAPI_SAAS_Template** (or git baseline), `outpost_sdk`, integration + domain hook, env key |
+| 09 | `scoreScenario09` | Clone **full-stack-fastapi-template** (or git baseline), `outpost_sdk`, integration + domain hook, env key |
 | 10 | `scoreScenario10` | Clone **startersaas-go-api** (or git baseline), Go Outpost SDK, publish + handler hook, env key |
 
 Export **`SCENARIO_IDS_WITH_HEURISTIC_RUBRIC`** in `score-transcript.ts` lists IDs **01–10** for tooling.
@@ -175,7 +175,7 @@ To record each **`npm run eval -- --scenario …`** run, automated scores, and *
 | 6 | [scenarios/06-app-fastapi.md](scenarios/06-app-fastapi.md) | Small **FastAPI** app with the same UX as scenario 5. |
 | 7 | [scenarios/07-app-go-http.md](scenarios/07-app-go-http.md) | Small **Go** `net/http` app + simple HTML UI (same UX as scenario 5). |
 | 8 | [scenarios/08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | **Existing Next.js SaaS** baseline — add outbound webhooks via Outpost ([leerob/next-saas-starter](https://github.com/leerob/next-saas-starter)). |
-| 9 | [scenarios/09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | **Existing FastAPI SaaS** baseline — Outpost integration ([philipokiokio/FastAPI_SAAS_Template](https://github.com/philipokiokio/FastAPI_SAAS_Template)). |
+| 9 | [scenarios/09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | **Existing FastAPI full-stack** baseline — Outpost integration ([fastapi/full-stack-fastapi-template](https://github.com/fastapi/full-stack-fastapi-template)). |
 | 10 | [scenarios/10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | **Existing Go SaaS API** baseline — Outpost integration ([devinterface/startersaas-go-api](https://github.com/devinterface/startersaas-go-api)). |
 
 Scenarios **1–4** align with **“Try it out”**; **5–7** with **“Build a minimal example”**; **8–10** with **“Integrate with an existing app”** using pinned OSS baselines (Java / .NET can be added later the same way).
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 197269399..f7a207a1d 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                               |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                              |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                          |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                           |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                             |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                        |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                   |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                     |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd**` = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/**`, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder). |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T20-48-16-530Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact `full-stack-fastapi-template/` under run dir. **Execution (macOS, Docker):** Injected `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` into project `.env`. **Agent gap:** `outpost_sdk` was in `backend/pyproject.toml` but **not** in repo-root `uv.lock` — first `docker compose … up` left backend crashing (`ModuleNotFoundError: outpost_sdk`); regenerated lock with `uv lock` (via one-off `python:3.10-slim` + `pip install uv`) then rebuild. **Host ports:** local **5432** / **8000** were busy — temporarily mapped DB **54333→5432** and backend **8001→8000** in `compose.override.yml` for the smoke run; reverted to template defaults after `docker compose -p outpost-s09-exec down`. **Checks:** `GET /api/v1/utils/health-check/` **200**; `GET /docs` **200**; OpenAPI lists `/api/v1/webhooks/destinations`; `POST /api/v1/users/signup` **200** (Outpost tenant upsert + `user.created` publish — HTTP 2xx to Hookdeck API in logs); `GET /api/v1/webhooks/destinations` with new user JWT **200** `[]`. Superuser listing destinations before any tenant upsert returned **502** (expected: Outpost **404** tenant not found). **Legacy:** `2026-04-09T15-51-44-184Z-scenario-09` (`FastAPI_SAAS_Template/`). |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
 
 
 ### Column hints
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index c80554dc4..36f31229b 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -2,13 +2,17 @@
 
 ## Intent
 
-Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + FastAPI** with a **multi-tenant / org** style baseline.
+Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + FastAPI** with a **multi-tenant / team** style baseline that also ships a **real web UI** (so operators can exercise dashboards, not only OpenAPI).
 
-**Baseline application (pin this in evals):** [**philipokiokio/FastAPI_SAAS_Template**](https://github.com/philipokiokio/FastAPI_SAAS_Template) — FastAPI, organizations, permissions, Alembic, MIT-style OSS template commonly used as a starting point. Substitute only if you document another baseline in the scenario and update heuristics.
+**Baseline application (pin this in evals):** [**fastapi/full-stack-fastapi-template**](https://github.com/fastapi/full-stack-fastapi-template) — maintained full-stack app: **FastAPI** backend (SQLModel, **Pydantic v2**), **React + TypeScript + Vite** frontend, PostgreSQL, Docker Compose, JWT auth, MIT license. Substitute only if you document another baseline in the scenario and update heuristics.
+
+**Supersedes:** The previous pin [**philipokiokio/FastAPI_SAAS_Template**](https://github.com/philipokiokio/FastAPI_SAAS_Template) (stale dependencies, API-only, no product UI).
 
 ## Preconditions
 
-- Python 3.10+; `git` available.
+- Python 3.10+; **Node.js 18+** (for the frontend); `git` available.
+- **Docker** (recommended) — template dev flow uses Docker Compose for API, DB, and frontend; see repository `development.md`.
+- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
 
 ## Eval harness
 
@@ -17,19 +21,21 @@ Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + F
   "preSteps": [
     {
       "type": "git_clone",
-      "url": "https://github.com/philipokiokio/FastAPI_SAAS_Template.git",
-      "into": "FastAPI_SAAS_Template",
+      "url": "https://github.com/fastapi/full-stack-fastapi-template.git",
+      "into": "full-stack-fastapi-template",
       "depth": 1,
-      "urlEnv": "EVAL_FASTAPI_SAAS_BASELINE_URL"
+      "urlEnv": "EVAL_FASTAPI_BASELINE_URL"
     }
   ],
-  "agentCwd": "FastAPI_SAAS_Template"
+  "agentCwd": "full-stack-fastapi-template"
 }
 ```
 
+Optional: set **`EVAL_FASTAPI_BASELINE_URL`** to override the clone URL (fork or pinned commit).
+
 ## Automated eval (Claude Agent SDK)
 
-The agent starts **inside** the cloned baseline above. Expect **`pip` / `uv`** setup from the template README, then **Write** / **Edit** for Outpost integration.
+The agent starts **inside** the cloned baseline above. Expect **`docker compose`** and/or **`uv` / `pip`** per **`development.md`** and **`backend/README.md`**, then **Write** / **Edit** for Outpost integration (backend-first; UI hooks optional but encouraged when they clarify the customer webhook story).
 
 ## Conversation script
 
@@ -39,26 +45,31 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> Option 3 — integrate Outpost into a real codebase. **We’re already in the FastAPI SaaS template in this workspace** — the repository is present here. Set it up from its README, then add **Hookdeck Outpost** for customer webhooks.
+> Option 3 — integrate Outpost into a real codebase. **We’re already in the full-stack FastAPI template in this workspace** — the repository is present here. Follow the project’s dev docs to get backend (and frontend if useful) running, then add **Hookdeck Outpost** for customer webhooks.
 >
-> Hook publishing to **one real event** that already exists in the app (orgs, users, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
+> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
 
 ### Turn 2 — User (optional)
 
-> Should we create the Outpost tenant when the org is created, or lazily on first publish?
+> When should we create or sync the Outpost **tenant** with our own customer or team model?
 
 ## Success criteria
 
 **Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
-- **FastAPI_SAAS_Template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
-- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path.
+- **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
+- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets).
 - API key from **environment** or secure settings — not hard-coded or exposed to clients.
-- **Topic** and **destination** story documented.
-- README updated for env + run.
-- **Execution (full pass):** App starts; trigger path fires publish; Outpost accepts. *Skip for transcript-only.*
+- **Topic** and **destination** story documented (README or inline); if the app has a UI, linking or exposing **safe** controls for webhook URLs is a plus.
+- README (or equivalent) lists **env vars** for Outpost.
+- **Execution (full pass):** Stack runs per template docs; trigger path fires publish; Outpost accepts. *Skip for transcript-only.*
 
 ## Failure modes to note
 
-- Greenfield FastAPI “hello world” instead of the **cloned** template.
+- Greenfield FastAPI “hello world” instead of the **cloned** baseline.
 - Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
+- Putting `OUTPOST_API_KEY` in `VITE_*` / `NEXT_PUBLIC_*` variables or client bundles.
+
+## Future baselines
+
+Other “existing FastAPI app” pins can follow the same shape: harness pre-clone + Option 3 Turn 1 + success criteria + `scoreScenario09`.
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index ec7455243..976bd9bcd 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -799,13 +799,16 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
 
   const baseline =
     /philipokiokio\/fastapi_saas_template|fastapi_saas_template|FastAPI_SAAS/i.test(t) ||
+    /fastapi\/full-stack-fastapi-template|full-stack-fastapi-template|full_stack_fastapi_template/i.test(
+      t,
+    ) ||
     (/git\s+clone\b/.test(lower) && /github\.com/.test(t));
   checks.push({
     id: "baseline_or_clone",
     pass: baseline,
     detail: baseline
-      ? "References FastAPI_SAAS_Template baseline or git clone"
-      : "Expected clone/setup of philipokiokio/FastAPI_SAAS_Template (or documented alternative)",
+      ? "References FastAPI baseline (full-stack template or legacy SaaS template) or git clone"
+      : "Expected clone/setup of fastapi/full-stack-fastapi-template (or documented alternative)",
   });
 
   const sdk = /from\s+outpost_sdk\s+import|import\s+outpost_sdk/.test(t);

From 12bca0d0aede32876d65506f4ab06d2fc85cddc2 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Thu, 9 Apr 2026 23:29:21 +0100
Subject: [PATCH 20/47] docs(agent-eval): record scenario 09 re-eval after
 prompt update

Run 2026-04-09T22-16-54-750Z-scenario-09: heuristic 6/6, LLM pass. Point
execution notes to prior Docker smoke on 20-48 stamp.

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index f7a207a1d..c153e608c 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -28,7 +28,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
 | 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T20-48-16-530Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact `full-stack-fastapi-template/` under run dir. **Execution (macOS, Docker):** Injected `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` into project `.env`. **Agent gap:** `outpost_sdk` was in `backend/pyproject.toml` but **not** in repo-root `uv.lock` — first `docker compose … up` left backend crashing (`ModuleNotFoundError: outpost_sdk`); regenerated lock with `uv lock` (via one-off `python:3.10-slim` + `pip install uv`) then rebuild. **Host ports:** local **5432** / **8000** were busy — temporarily mapped DB **54333→5432** and backend **8001→8000** in `compose.override.yml` for the smoke run; reverted to template defaults after `docker compose -p outpost-s09-exec down`. **Checks:** `GET /api/v1/utils/health-check/` **200**; `GET /docs` **200**; OpenAPI lists `/api/v1/webhooks/destinations`; `POST /api/v1/users/signup` **200** (Outpost tenant upsert + `user.created` publish — HTTP 2xx to Hookdeck API in logs); `GET /api/v1/webhooks/destinations` with new user JWT **200** `[]`. Superuser listing destinations before any tenant upsert returned **502** (expected: Outpost **404** tenant not found). **Legacy:** `2026-04-09T15-51-44-184Z-scenario-09` (`FastAPI_SAAS_Template/`). |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Re-run after **Option 3 full-stack prompt** update (`hookdeck-outpost-agent-prompt.mdx`); artifact `full-stack-fastapi-template/`; ~7m wall. **Execution:** not re-smoked on this stamp; prior Docker smoke + Outpost checks documented on run `2026-04-09T20-48-16-530Z-scenario-09` (lockfile `uv lock`, host port overrides, signup + `GET /api/v1/webhooks/destinations`). **Older:** `2026-04-09T15-51-44-184Z-scenario-09` (`FastAPI_SAAS_Template/`). |
 | 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
 
 

From 9bad0219cca18029bdd295f356594fd35f2d5318 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 00:43:33 +0100
Subject: [PATCH 21/47] docs: scenario 09 tracker, agent prompt, BYO UI
 events/retry guidance

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md |  48 +++++--
 docs/pages/guides/building-your-own-ui.mdx    | 121 +++++++++---------
 .../hookdeck-outpost-agent-prompt.mdx         |  20 ++-
 3 files changed, 118 insertions(+), 71 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index c153e608c..82a96208b 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,20 +18,44 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Re-run after **Option 3 full-stack prompt** update (`hookdeck-outpost-agent-prompt.mdx`); artifact `full-stack-fastapi-template/`; ~7m wall. **Execution:** not re-smoked on this stamp; prior Docker smoke + Outpost checks documented on run `2026-04-09T20-48-16-530Z-scenario-09` (lockfile `uv lock`, host port overrides, signup + `GET /api/v1/webhooks/destinations`). **Older:** `2026-04-09T15-51-44-184Z-scenario-09` (`FastAPI_SAAS_Template/`). |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                                                                                                                                                              |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                          |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                           |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                             |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                        |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                   |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                     |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                 |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
 
 
+### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
+
+Work applied **after** the agent transcript so the FastAPI + React artifact matches current integration guidance (eval honesty + local execution). The template tree under `results/runs/<stamp>-scenario-09/` is **not committed** (see `results/.gitignore`); repo **docs** and **prompt** updates that back this scenario **are** in git.
+
+**Frontend / router**
+
+- **TanStack Router:** `frontend/src/routeTree.gen.ts` — register `/_layout/webhooks` (agent added the route file but not the generated tree).
+- **API base URL:** the webhooks page called browser-relative `/api/...` paths, which resolved against nginx; switched to the backend base URL (`OpenAPI.BASE` / `VITE_API_URL`).
+- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
+
+**Backend**
+
+- **`POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
+- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
+
+**Dashboard UI (webhooks page)**
+
+- **Send test event**, **Event activity** (filter by destination, select event → attempts table, **Retry** on failed attempts).
+
+**Docs & prompt (repository)**
+
+- [Building your own UI](../pages/guides/building-your-own-ui.mdx) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
+- [Agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
+
 ### Column hints
 
 
diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index 3b5e1711b..e3d90fad5 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -25,7 +25,7 @@ Outpost’s tenant portal is a good reference for how screens map to the **tenan
 | ---- | ------- |
 | **Destinations list** | Show all destinations for the current tenant (each row is one subscription: type, human-readable **target** such as webhook URL, subscribed topics). Entry point to edit, disable, or remove. |
 | **Create destination** | Multi-step flow aligned with the API: (1) **choose destination type**, (2) **select topics** (from the topics configured on your Outpost project—often checkboxes or multi-select), (3) **configure** type-specific fields (e.g. webhook URL, credentials). Optional: instructions or remote setup links from the destination type schema. |
-| **Events and delivery attempts** | List recent events for the tenant and inspect **delivery attempts** per event or destination so users can see outcomes, failures, and retries—similar to the portal’s event and log experience. |
+| **Events and delivery attempts** | List recent events for the tenant, optionally scoped to one **destination**, and inspect **delivery attempts** (success/failure, response metadata). Support **manual retry** from the UI when an attempt failed—see [Events, attempts, and retries](#events-attempts-and-retries) below. |
 
 For how tenants, destinations, and topics fit together in a multi-tenant product, see [Outpost Concepts](/docs/concepts)—especially **How this fits your product**.
 
@@ -60,6 +60,14 @@ curl --request GET "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/token" \
 
 The destination type schema can be fetched using the [Destination Types Schema API](/docs/api/schemas). It can be used to render destination information such as the destination type icon and label. Additionally, the schema includes the destination type configuration fields, which can be used to render the destination configuration UI.
 
+Each entry returned by `GET /destination-types` includes:
+
+- **`type`** — string identifier for the destination kind (for example `webhook`). Use this as the stable key when mapping rows to [list destinations](/docs/api/destinations#list-destinations) results (`destination.type` refers to the same value). It is **not** named `id` in the API.
+- **`label`**, **`description`**, **`icon`** — display metadata; **`icon`** is typically an SVG string (some examples and older code may call this field `svg`—the JSON field is **`icon`**).
+- **`config_fields`** and **`credential_fields`** — arrays of field definitions for the configuration step (snake_case in JSON responses).
+
+Always align your UI types with the [OpenAPI schema](/docs/api) or a live response—do not assume generic names like `id` for the destination type identifier.
+
 ## Listing Configured Destinations
 
 Destinations are listed using the [List Destinations API](/docs/api/destinations#list-destinations). Destinations can be listed by type and topic. Since each destination type has different configuration, the `target` field can be used to display a recognizable label for the destination, such as the Webhook URL, the SQS queue URL, or Hookdeck Source Name associated with the destination. Each destination type will return a sensible `target` value to display.
@@ -104,21 +112,25 @@ if (!destination_types || !destinations) {
   return <div>Loading...</div>;
 }
 
-const destination_type_map = destination_types.reduce((acc, type) => {
-  acc[type.id] = type;
+// Key by `type` (API identifier), not `id` — see "Fetching Destination Type Schema" above.
+const destination_type_map = destination_types.reduce((acc, dt) => {
+  if (dt.type) acc[dt.type] = dt;
   return acc;
 }, {});
 
 return (
   <ul>
-    {destinations.map((destination) => (
+    {destinations.map((destination) => {
+      const meta = destination_type_map[destination.type];
+      if (!meta) return null;
+      return (
       <li key={destination.id}>
         <span
           dangerouslySetInnerHTML={{
-            __html: destination_type_map[destination.type].svg,
+            __html: meta.icon ?? "",
           }}
         />
-        <h2>{destination_type_map[destination.type].label}</h2>
+        <h2>{meta.label}</h2>
         {destination.target_url ? (
           <a
             href={destination.target_url}
@@ -131,7 +143,8 @@ return (
           <p>{destination.target}</p>
         )}
       </li>
-    ))}
+      );
+    })}
   </ul>
 );
 ```
@@ -179,22 +192,22 @@ return (
   <div>
     <h1>Choose a destination type</h1>
     <form onSubmit={handleSubmit}>
-      {destinations?.map((destination) => (
-        <label key={destination.type}>
+      {destination_types.map((dt) => (
+        <label key={dt.type}>
           <input
             type="radio"
             name="type"
-            value={destination.type}
+            value={dt.type}
             required
             defaultChecked={
-              defaultValue ? defaultValue.type === destination.type : undefined
+              defaultValue ? defaultValue.type === dt.type : undefined
             }
           />
           <p>
-            <span dangerouslySetInnerHTML={{ __html: destination.icon }} />
-            {destination.label}
+            <span dangerouslySetInnerHTML={{ __html: dt.icon ?? "" }} />
+            {dt.label}
           </p>
-          <p>{destination.description}</p>
+          <p>{dt.description}</p>
         </label>
       ))}
     </form>
@@ -240,6 +253,7 @@ return (
           {topic}
         </label>
       ))}
+    </form>
   </div>
 );
 ```
@@ -248,7 +262,7 @@ You can find the source code of the `TopicPicker.tsx` component of the User Port
 
 ### Configuring the Destination
 
-Using the destination type schema for the selected destination type, you can render a form to create and manage destinations configuration. The configuration fields are found in the `configuration_fields` and `credentials_fields` arrays of the destination type schema.
+Using the destination type schema for the selected destination type, you can render a form to create and manage destinations configuration. The configuration fields are found in the **`config_fields`** and **`credential_fields`** arrays of the destination type schema (snake_case in JSON responses).
 
 To render your form, you should render all fields from both arrays. Note that some of the `credentials_fields` will be obfuscated once the destination is created, and in order to edit the input, the value must be cleared first.
 
@@ -293,18 +307,22 @@ const DestinationConfigForm = ({
   }
 
   const type_schema = destination_types.find(
-    (type) => type.id === destination_type
+    (t) => t.type === destination_type
   );
 
+  if (!type_schema) {
+    return <div>Unknown destination type</div>;
+  }
+
   return (
     <>
-      {destination_type_schema.remote_setup_url ? (
+      {type_schema.remote_setup_url ? (
         <a
-          href={destination_type_schema.remote_setup_url}
+          href={type_schema.remote_setup_url}
           target="_blank"
           rel="noopener noreferrer"
         >
-          Setup in {destination_type_schema.label}
+          Setup in {type_schema.label}
         </a>
       ) : (
         <button onClick={showInstructionsModal}>
@@ -314,7 +332,10 @@ const DestinationConfigForm = ({
         </button>
       )}
       <form onSubmit={handleSubmit}>
-        {[...type_schema.config_fields, ...type_schema.credential_fields].map(
+        {[
+          ...(type_schema.config_fields ?? []),
+          ...(type_schema.credential_fields ?? []),
+        ].map(
           (field) => (
             <div key={field.key}>
               <label htmlFor={field.key}>
@@ -352,8 +373,7 @@ const DestinationConfigForm = ({
               )}
               {field.description && <p>{field.description}</p>}
             </div>
-          )
-        )}
+        ))}
       </form>
     </>
   );
@@ -362,45 +382,32 @@ const DestinationConfigForm = ({
 
 You can find the source code of the `DestinationConfigForm.tsx` component of the User Portal here: [DestinationConfigForm.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx#L14)
 
-## Listing Events
-
-Events are listed using the [List Events API](/docs/api/events#list-events). You can use the `topic` parameter to filter the events by topic or the `destination_id` parameter to filter the events by destination.
+## Events, attempts, and retries
 
-```tsx
-const [events, setEvents] = useState([]);
+This section ties together **how customers see what was delivered** and **how they recover from failures**—without duplicating full UI code (see the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) and [OpenAPI](/docs/api) for request/response shapes).
 
-const fetchEvents = async () => {
-  const response = await fetch(`${API_URL}/tenants/events`, {
-    headers: {
-      Authorization: `Bearer ${token}`,
-    },
-  });
-};
+### How the pieces fit
 
-useEffect(() => {
-  fetchEvents();
-}, []);
+1. **Destinations list** — Each row is a subscription (type, target URL or label, topics). From here, users typically open **“Activity”**, **“Events”**, or **“Logs”** for that destination, or you filter a shared events view by `destination_id`.
+2. **Events** — An event is something your **backend published** (topic + payload). The [List Events API](/docs/api/events#list-events) returns a **paginated** list. Important query dimensions:
+   - **`destination_id`** — only events that were routed to that destination (ideal for a per-destination screen).
+   - **`topic`**, **time ranges**, **pagination** (`limit`, `next` / `prev` cursors) — for broader “recent activity” views.
+   With a **tenant JWT**, results are limited to that tenant; with an **admin API key**, supply **`tenant_id`** (your BFF usually injects it from the signed-in customer).
+3. **Attempts** — Each row is one **delivery try** to a destination (status, HTTP code, timing, optional response payload). Link attempts to events via **`event_id`** and **`destination_id`**.
+   - Tenant-wide: [List attempts](/docs/api/attempts#list-attempts) with `event_id` (and optionally `destination_id`).
+   - Destination-scoped: `GET /tenants/{tenant_id}/destinations/{destination_id}/attempts` — see [OpenAPI / tenant destination attempts](/docs/api) (same filters, including `event_id` when drilling down).
+4. **Automatic vs manual retry** — Outpost [retries failed deliveries automatically](/docs/features/event-delivery) (backoff, limits). **Manual retry** lets a user trigger another delivery after fixing their endpoint—use [Retry event delivery](/docs/api/attempts#retry-attempt) (`POST /retry` with **`event_id`** and **`destination_id`**). The destination must be enabled and subscribed to the event’s topic; disabled destinations cannot be retried.
 
-if (!events) {
-  return <div>Loading...</div>;
-}
+### What to expose in your dashboard UI
 
-return (
-  <div>
-    <h1>Events</h1>
-    <ul>
-      {events.map((event) => (
-        <li key={event.id}>
-          <h2>{event.id}</h2>
-          <p>{event.created_at}</p>
-          <p>{event.payload}</p>
-        </li>
-      ))}
-    </ul>
-  </div>
-);
-```
+| User need | API direction |
+| --------- | ------------- |
+| “What fired for my webhook?” | List **events** filtered by **`destination_id`**, then list **attempts** for the chosen **`event_id`** (and destination). |
+| “Why did it fail?” | Show attempt **status**, **code**, and **response** fields (when included); link to your own docs on fixing URLs, auth, or timeouts. |
+| “Send it again” | **Retry** button on failed attempts (or on the event row if you only show one destination) → `POST /retry`. Surface the API result in the UI: **202** on success versus **400** on rejection (e.g. destination disabled). |
 
-For each event, you can retrieve all its associated delivery attempts using the [List Event Attempts API](/docs/api/events#list-event-attempts).
+### Implementation notes
 
-You can find the source code of the `Events.tsx` component of the User Portal here: [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx)
+- **Pagination:** Event and attempt list endpoints are cursor-paged; your UI should pass through **`next`** / **`prev`** (or offer “Load more”) so the views stay usable for tenants with a high event volume.
+- **Auth:** If the browser never sees the admin key, proxy these endpoints from your backend and attach the platform **Outpost API key** server-side, scoping **`tenant_id`** to the logged-in customer—same pattern as destination CRUD.
+- **Reference UI:** The portal’s destination flow includes event listing for a destination—see [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) as a reference layout (not a copy-paste requirement).
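
The events, attempts, and retry flow described in the hunk above can be sketched as a few pure helpers that a backend-for-frontend route might use. This is a minimal sketch and not part of the patch: the function and type names are hypothetical, and only the parameter names (`tenant_id`, `destination_id`, `topic`, `limit`, `next`, `event_id`) and the 202 / 400 statuses come from the guide text. Check request and response shapes against the OpenAPI reference before relying on them.

```typescript
// Hypothetical helpers for a BFF that proxies Outpost with the admin API key.
// Names are illustrative assumptions; they are not part of the Outpost SDK.

type ListEventsParams = {
  tenantId: string; // injected server-side from the signed-in customer
  destinationId?: string; // scope the list to one destination
  topic?: string;
  limit?: number;
  next?: string; // cursor returned by a previous page
};

// Build the query string for an events list request.
function buildEventsQuery(p: ListEventsParams): string {
  const pairs: Array<[string, string]> = [["tenant_id", p.tenantId]];
  if (p.destinationId) pairs.push(["destination_id", p.destinationId]);
  if (p.topic) pairs.push(["topic", p.topic]);
  if (p.limit !== undefined) pairs.push(["limit", String(p.limit)]);
  if (p.next) pairs.push(["next", p.next]);
  return pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
}

// JSON body for a manual retry: the guide names event_id and destination_id.
function buildRetryBody(eventId: string, destinationId: string): string {
  return JSON.stringify({ event_id: eventId, destination_id: destinationId });
}

// Translate the retry response status into a message for the dashboard.
function describeRetryStatus(status: number): string {
  if (status === 202) return "Retry accepted";
  if (status === 400) {
    return "Retry rejected (for example, the destination is disabled)";
  }
  return `Unexpected status ${status}`;
}
```

A route handler would append `buildEventsQuery(...)` to the events list URL, send `buildRetryBody(...)` as the `POST /retry` payload with the platform key attached server-side, and show `describeRetryStatus(response.status)` next to the Retry button.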
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 28e9cc9af..9e08b3e03 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -36,7 +36,7 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
 - API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
 - **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
-- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
+- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; **events list**, delivery **attempts**, **manual retry**; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
 - Destination types: {{DOCS_URL}}/destinations
 - Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
 - SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
@@ -63,6 +63,8 @@ Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.
 
 - **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, URLs, topics—never the platform API key).
 - **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
+- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint—see **Building your own UI** for how this links to destinations and to automatic retries in Outpost.
+- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). Treat this as **best practice** for SaaS products that offer webhooks: it proves end-to-end delivery without waiting on production traffic and matches what operators expect (similar to “send test webhook” in major platforms). Implement it **by default** for this integration path; the product team can remove or gate it later, but skipping it makes verification much harder.
 - **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
 
 ### What to do
@@ -73,10 +75,24 @@ Guide the conversation, then act:
 
 2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
 
-3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
+3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure—**including test publish**, an **events** list (and attempts / **retry** where appropriate), unless the operator explicitly asks to omit parts. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
 
 For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. For **Option 3** with a UI, also read **Building your own UI** before implementing destination-management screens.
 
+### Before you stop (verify)
+
+Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip the existing-repo items for a standalone script or curl-only flow).
+
+**Always (when you produced or changed runnable code):**
+
+- [ ] **Ran** the smallest end-to-end check that fits this task (e.g. run the script or shell flow once, exercise one new API path, or smoke the UI/API flow you added) and saw a clear success signal (e.g. event id, HTTP 2xx, or expected output).
+- [ ] **Secrets:** The platform Outpost API key remains **server-side** / **environment** only — not in client bundles, not hard-coded in committed source.
+- [ ] **Repeatable:** Env vars, how to run, and how to verify with the test destination above are stated briefly (README, comments, or chat — match the task size; a one-file script may need only inline or chat notes).
+
+**When editing an existing application repository (Option 3 or equivalent):**
+
+- [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
+
 **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
 
 **Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.

From 7ab552d7ade4faa2c0c6b6b9c6ab585471b67097 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:23:54 +0100
Subject: [PATCH 22/47] docs: refresh Building Your Own UI guide

Reword for customer-facing UI builders: clearer tenant/auth framing,
configurable API base URL, less internal jargon and emphasis noise.
Add implementation checklists for planning, destinations, activity,
and safe rendering without duplicating the OpenAPI mapping tables.

Made-with: Cursor
---
 docs/pages/guides/building-your-own-ui.mdx | 495 ++++++---------------
 1 file changed, 146 insertions(+), 349 deletions(-)

diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index e3d90fad5..fd8496b76 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -2,51 +2,87 @@
 title: "Building Your Own UI"
 ---
 
-While Outpost offers a Tenant User Portal, you may want to build your own UI for users to manage their destinations and view their events.
+While Outpost offers a Tenant User Portal, you may want to build your own UI so your customers can manage their destinations and view delivery activity.
 
-The portal is built using the Outpost API with JWT authentication. You can leverage the same API to build your own UI.
+The portal uses the same Outpost API you can call from your product. Its source is a useful reference ([`internal/portal`](https://github.com/hookdeck/outpost/tree/main/internal/portal), React); you are not required to match its stack.
 
-Within this guide, we will use the User Portal as a reference implementation for a simple UI. You can find the full source code for the User Portal [here](https://github.com/hookdeck/outpost/tree/main/internal/portal).
+This guide is framework-agnostic. It describes screens, flows, and how they map to the API. For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
 
-In this guide, we will assume you are using React (client-side) to build your own UI, but the same principles can be applied to any other framework.
+### Working from OpenAPI
+
+Each screen should map to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows.
+
+Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples.
+
+If the browser calls Outpost directly, use the tenant JWT flows documented in OpenAPI. If you proxy through your backend (often called a BFF), your server performs the same operations with your session and injects `tenant_id` where the admin-key flows require it.
+
+The portal shows full UI code for complex forms; this page avoids long framework-specific snippets so the spec stays the single place for shapes and validation.
 
 ## UI structure and flow
 
-Outpost’s tenant portal is a good reference for how screens map to the **tenant → destinations → topics → delivery target** model. When you build your own UI, keep the same structure so operators and end users are not forced into a misleading “single global webhook URL” mental model.
+The tenant portal illustrates how screens map to tenant → destinations → topics → delivery target. Following that shape helps your customers understand subscriptions and targets instead of a single anonymous “webhook URL.”
 
 **Tenant context**
 
-- Everything below is **scoped to one tenant**—the signed-in customer in your SaaS or the account selected in your platform. That tenant id is what you pass to Outpost when listing or creating destinations and when publishing from your backend.
-- If you use JWT auth against Outpost, the token is issued **for that tenant**; if you proxy through your API, your routes should resolve the current customer to a `tenant_id` and forward it on list/create/publish calls.
+- Everything below applies to one tenant at a time: the signed-in account in your SaaS (your customer). Use that account’s `tenant_id` when listing or creating destinations and when publishing from your backend.
+- With a tenant JWT, the token is scoped to that tenant. If you proxy through your API, resolve the signed-in account to `tenant_id` and forward it on list, create, and publish calls.
 
 **Recommended areas / screens**
 
 | Area | Purpose |
 | ---- | ------- |
-| **Destinations list** | Show all destinations for the current tenant (each row is one subscription: type, human-readable **target** such as webhook URL, subscribed topics). Entry point to edit, disable, or remove. |
-| **Create destination** | Multi-step flow aligned with the API: (1) **choose destination type**, (2) **select topics** (from the topics configured on your Outpost project—often checkboxes or multi-select), (3) **configure** type-specific fields (e.g. webhook URL, credentials). Optional: instructions or remote setup links from the destination type schema. |
-| **Events and delivery attempts** | List recent events for the tenant, optionally scoped to one **destination**, and inspect **delivery attempts** (success/failure, response metadata). Support **manual retry** from the UI when an attempt failed—see [Events, attempts, and retries](#events-attempts-and-retries) below. |
+| Destinations list | All destinations for the current tenant (type, human-readable target such as webhook URL, queue name, or Hookdeck label, plus subscribed topics). Entry point to edit, disable, or remove. |
+| Create destination | Multi-step flow: (1) choose destination type, (2) select topics from your Outpost project configuration, (3) fill type-specific config from the type schema. Optional: instructions or remote setup URL from the schema. |
+| Events and delivery attempts | Default pattern: open activity from a destination (events, then attempts, then retry in that context). Optional: a tenant-wide activity view with a destination filter for support or power users. See [Default information architecture](#default-information-architecture-multi-destination-products) and [Events, attempts, and retries](#events-attempts-and-retries). |
+
+### Default information architecture (multi-destination products)
+
+When a tenant can have many destinations (of any [destination type](/docs/destinations) your project enables), the primary path is destination → activity: people usually ask “what was delivered to this subscription?” rather than wanting all traffic in one undifferentiated list. The same API applies for webhooks, queues, and other types; only the create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
+
+To list events and attempts, reuse the same endpoints everywhere and vary the query parameters (for example `destination_id`, cursors) rather than inventing parallel client-side contracts. Pagination and auth details are defined in [OpenAPI](/docs/api); [Events, attempts, and retries](#events-attempts-and-retries) below summarizes how those endpoints support common UI needs.
+
+**Example routes** (rename to fit your product—integrations, event destinations, webhooks, etc.):
+
+| Example route | What it does | Spec |
+| ------------- | ------------ | ---- |
+| `…/destinations` or `…/integrations` | Hub: list destinations; create or drill down | [Listing destinations](#listing-configured-destinations) · [List destinations](/docs/api/destinations#list-destinations) |
+| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/destinations); `GET /destination-types` in [OpenAPI](/docs/api)), then topics and config | [Creating a destination](#creating-a-destination) |
+| `…/destinations/:destinationId` | Detail: edit config, enable/disable, topics | [OpenAPI](/docs/api) — Destinations |
+| `…/destinations/:destinationId/activity` | Activity for this destination: events, attempts, retry | [Events, attempts, and retries](#events-attempts-and-retries) · [List events](/docs/api/events#list-events) · [List attempts](/docs/api/attempts#list-attempts) |
+| `…/activity` (optional) | Tenant-wide activity; optional filter by `destination_id` | Same list-events operation with different query params ([OpenAPI](/docs/api)) |
+
+For the conceptual model, see [Outpost Concepts](/docs/concepts), especially “How this fits your product.”
+
+## OpenAPI: core operations for a tenant dashboard
 
-For how tenants, destinations, and topics fit together in a multi-tenant product, see [Outpost Concepts](/docs/concepts)—especially **How this fits your product**.
+| Goal | OpenAPI entry point | In the UI |
+| ---- | ------------------- | --------- |
+| Types, labels, icons, dynamic form defs | [Destination types / schema](/docs/api/schemas) — `GET /destination-types` | Type picker; join list rows on `destination.type` (the type id is `type`, not a separate `id` on the type object). |
+| Topics for subscriptions | [Topics](/docs/api/topics#list-topics) — `GET /topics` | Checkboxes or multi-select on create/update. |
+| List destinations | [List destinations](/docs/api/destinations#list-destinations) | Main table; show `target` / `target_url` per schema. |
+| Create destination | [Create destination](/docs/api/destinations#create-destination) | Body: `type`, `topics`, type-specific `config` / credentials per spec. |
+| Get / update / delete | [OpenAPI](/docs/api) — Destinations | Detail and edit flows. |
+| Tenant JWT (optional browser calls) | [Tenant JWT](/docs/api/tenants#get-tenant-jwt-token) | Short-lived token; BFF is often simpler if you need to hide capabilities. |
+| Events, attempts, retry | [Events](/docs/api/events#list-events), [Attempts](/docs/api/attempts#list-attempts), [Retry](/docs/api/attempts#retry-attempt) | Activity and recovery; see below. |
 
 ## Authentication
 
-To perform API calls on behalf of your tenants, you can either generate a JWT token, which can be used client-side to make Outpost API calls, or you can proxy any API requests to the Outpost API through your own API. When proxying through your own API, you can ensure the API call is made for the currently authenticated tenant using the API `tenant_id` parameter.
+You can issue a tenant JWT for client-side calls to Outpost, or proxy requests through your own API. With a proxy, attach your platform’s Outpost API key on the server and scope each call to the authenticated tenant (for example via `tenant_id` on admin-key routes).
 
-Proxying through your own API can be useful if you want to limit access to some configuration or functionality of Outpost.
+Proxying is useful when you want to restrict which Outpost features are exposed or keep the admin key off the client entirely.
 
 ### API base URL (managed and self-hosted)
 
-Examples below use a single variable **`API_URL`** (or **`OUTPOST_API_BASE_URL`** in shell snippets): the **root URL for Outpost’s HTTP API**, with **no trailing slash**. Paths in this guide match the [OpenAPI specification](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
+Use one configurable base URL for Outpost (no trailing slash), for example `API_URL` or `OUTPOST_API_BASE_URL`. Paths in this guide match [OpenAPI](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
 
-- **Hookdeck Outpost (managed):** use the base URL from your project (for example `https://api.outpost.hookdeck.com/2025-07-01`). The [managed curl quickstart](/docs/quickstarts/hookdeck-outpost-curl) uses the same pattern.
-- **Self-hosted Outpost:** use your deployment’s public origin **plus** whatever path prefix your install uses (commonly **`/api/v1`**), e.g. `https://outpost.internal.example.com/api/v1`. For local dev, use your actual host and port (see your deployment docs—do not assume a specific port in shared snippets).
+- **Managed Hookdeck Outpost:** use the base URL from your project (see the [curl quickstart](/docs/quickstarts/hookdeck-outpost-curl)).
+- **Self-hosted:** use your deployment’s public origin plus any path prefix (often `/api/v1`). Local development should still read host and port from configuration or environment so the same code works in staging and production.
 
-Do **not** hardcode `localhost` in product docs or copy-paste snippets meant for operators; always substitute your real base URL. The React snippets assume `API_URL` already includes any `/api/v1` segment so that `${API_URL}/tenants/destinations` resolves correctly for your environment.
+In your product, treat the base URL like any other external service: load it from config or env, not from literals baked into client bundles.
 
 ### Generating a JWT Token (Optional)
 
-You can generate a JWT token by using the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token).
+See the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token).
 
 ```bash
 export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01"   # or your self-hosted root, e.g. …/api/v1
@@ -56,358 +92,119 @@ curl --request GET "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/token" \
   --header "Authorization: Bearer <ADMIN_API_KEY>"
 ```
 
-## Fetching Destination Type Schema
-
-The destination type schema can be fetched using the [Destination Types Schema API](/docs/api/schemas). It can be used to render destination information such as the destination type icon and label. Additionally, the schema includes the destination type configuration fields, which can be used to render the destination configuration UI.
-
-Each entry returned by `GET /destination-types` includes:
-
-- **`type`** — string identifier for the destination kind (for example `webhook`). Use this as the stable key when mapping rows to [list destinations](/docs/api/destinations#list-destinations) results (`destination.type` refers to the same value). It is **not** named `id` in the API.
-- **`label`**, **`description`**, **`icon`** — display metadata; **`icon`** is typically an SVG string (some examples and older code may call this field `svg`—the JSON field is **`icon`**).
-- **`config_fields`** and **`credential_fields`** — arrays of field definitions for the configuration step (snake_case in JSON responses).
-
-Always align your UI types with the [OpenAPI schema](/docs/api) or a live response—do not assume generic names like `id` for the destination type identifier.
-
-## Listing Configured Destinations
-
-Destinations are listed using the [List Destinations API](/docs/api/destinations#list-destinations). Destinations can be listed by type and topic. Since each destination type has different configuration, the `target` field can be used to display a recognizable label for the destination, such as the Webhook URL, the SQS queue URL, or Hookdeck Source Name associated with the destination. Each destination type will return a sensible `target` value to display.
-
-```tsx
-// React example to fetch and render a list of destinations
-// API_URL = Outpost API root (managed project URL or self-hosted origin + /api/v1)
-
-const [destinations, setDestinations] = useState([]);
-
-const [destination_types, setDestinationTypes] = useState([]);
-
-const fetchDestinations = async () => {
-  // Get the tenant destinations (JWT infers tenant — see Authentication API)
-  const response = await fetch(`${API_URL}/tenants/destinations`, {
-    headers: {
-      Authorization: `Bearer ${token}`,
-    },
-  });
-
-  const destinations = await response.json();
-  setDestinations(destinations);
-};
-
-const fetchDestinationTypes = async () => {
-  const response = await fetch(`${API_URL}/destination-types`, {
-    headers: {
-      Authorization: `Bearer ${token}`,
-    },
-  });
-
-  const destination_types = await response.json();
-  setDestinationTypes(destination_types);
-};
-
-useEffect(() => {
-  fetchDestinations();
-  fetchDestinationTypes();
-}, []);
-
-if (!destination_types || !destinations) {
-  return <div>Loading...</div>;
-}
-
-// Key by `type` (API identifier), not `id` — see "Fetching Destination Type Schema" above.
-const destination_type_map = destination_types.reduce((acc, dt) => {
-  if (dt.type) acc[dt.type] = dt;
-  return acc;
-}, {});
-
-return (
-  <ul>
-    {destinations.map((destination) => {
-      const meta = destination_type_map[destination.type];
-      if (!meta) return null;
-      return (
-      <li key={destination.id}>
-        <span
-          dangerouslySetInnerHTML={{
-            __html: meta.icon ?? "",
-          }}
-        />
-        <h2>{meta.label}</h2>
-        {destination.target_url ? (
-          <a
-            href={destination.target_url}
-            target="_blank"
-            rel="noopener noreferrer"
-          >
-            {destination.target_url}
-          </a>
-        ) : (
-          <p>{destination.target}</p>
-        )}
-      </li>
-      );
-    })}
-  </ul>
-);
-```
+## Destination type metadata and dynamic config
 
-You can find the source code of the `DestinationList.tsx` component of the User Portal here: [DestinationList.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/DestinationsList/DestinationList.tsx)
-
-## Creating a Destination
-
-To create a destination, the form will require three steps: one to choose the destination type, one to select the topics (optional), and one to configure the destination.
-
-### Choosing the Destination Type
-
-The list of available destination types is rendered from the list of destination types fetched from the API.
-
-```tsx
-const [destination_types, setDestinationTypes] = useState([]);
-
-const fetchDestinationTypes = async () => {
-  const response = await fetch(`${API_URL}/destination-types`, {
-    headers: {
-      Authorization: `Bearer ${token}`,
-    },
-  });
-
-  const destination_types = await response.json();
-  setDestinationTypes(destination_types);
-};
-
-useEffect(() => {
-  fetchDestinationTypes();
-}, []);
-
-const handleSubmit = (e: React.FormEvent<HTMLFormElement>) => {
-  e.preventDefault();
-  const formData = new FormData(e.target as HTMLFormElement);
-  const destination_type = formData.get("type");
-  goToNextStep(destination_type);
-};
-
-if (!destination_types) {
-  return <div>Loading...</div>;
-}
-
-return (
-  <div>
-    <h1>Choose a destination type</h1>
-    <form onSubmit={handleSubmit}>
-      {destination_types.map((dt) => (
-        <label key={dt.type}>
-          <input
-            type="radio"
-            name="type"
-            value={dt.type}
-            required
-            defaultChecked={
-              defaultValue ? defaultValue.type === dt.type : undefined
-            }
-          />
-          <p>
-            <span dangerouslySetInnerHTML={{ __html: dt.icon ?? "" }} />
-            {dt.label}
-          </p>
-          <p>{dt.description}</p>
-        </label>
-      ))}
-    </form>
-  </div>
-);
-```
+`GET /destination-types` returns everything needed to render type pickers and config forms. See the [Destination Types Schema API](/docs/api/schemas).
 
-You can find the source code of the `CreateDestination.tsx` component of the User Portal here: [CreateDestination.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/CreateDestination/CreateDestination.tsx)
-
-### Selecting Topics
-
-Available topics are returned from the [List Topics API](/docs/api/topics#list-topics). You can display the list of topics as a list of checkboxes to capture the user input.
-
-```tsx
-const [topics, setTopics] = useState([]);
-
-const fetchTopics = async () => {
-  const response = await fetch(`${API_URL}/topics`, {
-    headers: {
-      Authorization: `Bearer ${token}`,
-    },
-  });
-
-  const topics = await response.json();
-  setTopics(topics);
-};
-
-useEffect(() => {
-  fetchTopics();
-}, []);
-
-if (!topics) {
-  return <div>Loading...</div>;
-}
-
-return (
-  <div>
-    <h1>Select topics</h1>
-    <form onSubmit={handleSubmit}>
-      {topics.map((topic) => (
-        <label key={topic}>
-          <input type="checkbox" name="topics" value={topic} />
-          {topic}
-        </label>
-      ))}
-    </form>
-  </div>
-);
-```
+Each entry typically includes (confirm names and optionality in OpenAPI):
 
-You can find the source code of the `TopicPicker.tsx` component of the User Portal here: [TopicPicker.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/TopicPicker/TopicPicker.tsx)
+- `type` — Stable identifier (e.g. `webhook`). Matches `destination.type` on list rows; not named `id` on the type object.
+- `label`, `description`, `icon` — Display metadata; `icon` is often an SVG string (some older code used the name `svg`). Sanitize if you render inline HTML.
+- `config_fields`, `credential_fields` — Field definitions for the config step (snake_case in JSON). Include every field from both arrays on create and edit.
+- `instructions` — Markdown for complex setup (for example cloud resources).
+- `remote_setup_url` — Optional external setup flow before or instead of inline fields.
 
-### Configuring the Destination
+### Dynamic field shape (for forms)
 
-Using the destination type schema for the selected destination type, you can render a form to create and manage destinations configuration. The configuration fields are found in the **`config_fields`** and **`credential_fields`** arrays of the destination type schema (snake_case in JSON responses).
+Field objects are fully described in OpenAPI. Typically each has `key`, `label`, `type` (`text` or `checkbox`), `required`, an optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive`. Render a `sensitive` field as a password-style input; its value may be masked after the destination is created, so the user must clear the field before editing it.
 
-To render your form, you should render all fields from both arrays. Note that some of the `credentials_fields` will be obfuscated once the destination is created, and in order to edit the input, the value must be cleared first.
+**Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx) maps schema fields to inputs.
 
-The input field schema is as follows:
+### Remote setup URL
 
-```ts
-type InputField = {
-  type: "text" | "checkbox"; // Only text and checkbox fields are supported
-  required: boolean; // If true, the field will be required
-  description?: string; // Field description, to use as a tooltip
-  sensitive?: boolean; // If true, the field will be obfuscated once the destination is created and should be treated as a password input
-  default?: string; // Default value for the field
-  minlength?: number; // Minimum length for the field
-  maxlength?: number; // Maximum length for the field
-  pattern?: string; // Regex validation pattern, to use with the input's pattern attribute
-};
-```
+When `remote_setup_url` is present, you can send users through an external setup flow (for example Hookdeck-managed configuration) instead of relying only on inline fields.
 
-#### Remote Setup URL
-
-Some destination type schemas have a `remote_setup_url` property that contains a URL to a page where the destination can be configured. Destinations that support remote URLs have a simplified setup flow that doesn't require instructions. For example, with the Hookdeck destination, the user is taken through a setup flow managed by Hookdeck to configure the destination.
-
-The URL is optional but provides a better user experience than following sometimes lengthy instructions to configure the destination.
-
-#### Instructions
-
-Each destination type schema has an `instructions` property that contains instructions to configure the destination as a markdown string. These instructions should be displayed to the user to help them configure the destination, as for some destination types, such as AWS, the necessary configuration can be complex and require multiple steps by the user within AWS.
-
-Example of a destination configuration form:
-
-```tsx
-const DestinationConfigForm = ({
-  destination_type,
-}: {
-  destination_type: string;
-}) => {
-  const [destination_types, setDestinationTypes] = useState([]);
-  //... Fetch the destination type schema
-
-  if (!destination_types) {
-    return <div>Loading...</div>;
-  }
-
-  const type_schema = destination_types.find(
-    (t) => t.type === destination_type
-  );
-
-  if (!type_schema) {
-    return <div>Unknown destination type</div>;
-  }
-
-  return (
-    <>
-      {type_schema.remote_setup_url ? (
-        <a
-          href={type_schema.remote_setup_url}
-          target="_blank"
-          rel="noopener noreferrer"
-        >
-          Setup in {type_schema.label}
-        </a>
-      ) : (
-        <button onClick={showInstructionsModal}>
-          {" "}
-          // Modal not implemented just for example
-          {showInstructions ? "Hide instructions" : "Show instructions"}
-        </button>
-      )}
-      <form onSubmit={handleSubmit}>
-        {[
-          ...(type_schema.config_fields ?? []),
-          ...(type_schema.credential_fields ?? []),
-        ].map(
-          (field) => (
-            <div key={field.key}>
-              <label htmlFor={field.key}>
-                {field.label}
-                {field.required && <span>\*</span>}
-              </label>
-              {field.type === "text" && (
-                <>
-                  <input
-                    type={
-                      "sensitive" in field && field.sensitive
-                        ? "password"
-                        : "text"
-                    }
-                    placeholder={""}
-                    id={field.key}
-                    name={field.key}
-                    defaultValue={field.default}
-                    required={field.required}
-                    minLength={field.minlength}
-                    maxLength={field.maxlength}
-                    pattern={field.pattern}
-                  />
-                </>
-              )}
-              {field.type === "checkbox" && (
-                <input
-                  type="checkbox"
-                  id={field.key}
-                  name={field.key}
-                  defaultChecked={false}
-                  disabled={field.disabled}
-                  required={field.required}
-                />
-              )}
-              {field.description && <p>{field.description}</p>}
-            </div>
-        ))}
-      </form>
-    </>
-  );
-};
-```
+### Instructions
+
+Render `instructions` as markdown when the destination type needs context beyond simple fields.
+
+## Listing configured destinations
+
+Use the [List Destinations API](/docs/api/destinations#list-destinations). OpenAPI describes variants for admin API key (tenant in path or query) versus tenant JWT (tenant inferred from the token); choose the operations that match how you authenticate.
+
+- Call the list endpoint and render each destination’s `type`, `target`, `target_url` (when present), and subscribed topics.
+- Optionally fetch `GET /destination-types` in parallel and map `type` string → schema row for `label` and `icon`.
+- Link each row to destination detail and destination-scoped activity ([Default information architecture](#default-information-architecture-multi-destination-products)).
+
+**Reference:** [DestinationList.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/DestinationsList/DestinationList.tsx)
 
-You can find the source code of the `DestinationConfigForm.tsx` component of the User Portal here: [DestinationConfigForm.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx#L14)
+## Creating a destination
+
+The product flow is three steps; the API is typically one [create destination](/docs/api/destinations#create-destination) request once you have `type`, `topics`, and `config` (plus credentials if required). OpenAPI defines the body.
+
+### Step 1 — Choose destination type
+
+- Data: `GET /destination-types` ([schemas](/docs/api/schemas)).
+- Show each type’s `label`, `description`, and `icon`; store the chosen `type` string.
+
+**Reference:** [CreateDestination.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/CreateDestination/CreateDestination.tsx)
+
+### Step 2 — Select topics
+
+- Data: `GET /topics` ([list topics](/docs/api/topics#list-topics)).
+- Collect topic strings, or `*` for all topics, as allowed by the create schema.
+
+**Reference:** [TopicPicker.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/TopicPicker/TopicPicker.tsx)
+
+### Step 3 — Configure the destination
+
+- Read `config_fields` and `credential_fields` for the selected type from `GET /destination-types` (or a single-type endpoint if you use one—see OpenAPI).
+- If `remote_setup_url` is set, consider that flow first.
+- Otherwise render fields per [Dynamic field shape](#dynamic-field-shape-for-forms) and submit via [Create destination](/docs/api/destinations#create-destination).
+
+**Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx)
 
 ## Events, attempts, and retries
 
-This section ties together **how customers see what was delivered** and **how they recover from failures**—without duplicating full UI code (see the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) and [OpenAPI](/docs/api) for request/response shapes).
+This section connects what your customers see (what was delivered, what failed, how to retry) to the API. Request and response shapes live in [OpenAPI](/docs/api); the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) shows one full implementation.
 
 ### How the pieces fit
 
-1. **Destinations list** — Each row is a subscription (type, target URL or label, topics). From here, users typically open **“Activity”**, **“Events”**, or **“Logs”** for that destination, or you filter a shared events view by `destination_id`.
-2. **Events** — An event is something your **backend published** (topic + payload). The [List Events API](/docs/api/events#list-events) returns a **paginated** list. Important query dimensions:
-   - **`destination_id`** — only events that were routed to that destination (ideal for a per-destination screen).
-   - **`topic`**, **time ranges**, **pagination** (`limit`, `next` / `prev` cursors) — for broader “recent activity” views.
-   With a **tenant JWT**, results are limited to that tenant; with an **admin API key**, supply **`tenant_id`** (your BFF usually injects it from the signed-in customer).
-3. **Attempts** — Each row is one **delivery try** to a destination (status, HTTP code, timing, optional response payload). Link attempts to events via **`event_id`** and **`destination_id`**.
-   - Tenant-wide: [List attempts](/docs/api/attempts#list-attempts) with `event_id` (and optionally `destination_id`).
-   - Destination-scoped: `GET /tenants/{tenant_id}/destinations/{destination_id}/attempts` — see [OpenAPI / tenant destination attempts](/docs/api) (same filters, including `event_id` when drilling down).
-4. **Automatic vs manual retry** — Outpost [retries failed deliveries automatically](/docs/features/event-delivery) (backoff, limits). **Manual retry** lets a user trigger another delivery after fixing their endpoint—use [Retry event delivery](/docs/api/attempts#retry-attempt) (`POST /retry` with **`event_id`** and **`destination_id`**). The destination must be enabled and subscribed to the event’s topic; disabled destinations cannot be retried.
+1. **Destinations list** — Each row is a subscription. By default, link into destination-scoped activity ([Default information architecture](#default-information-architecture-multi-destination-products)). An optional tenant-wide activity route should still call the same list endpoints with different query parameters, not a separate unofficial API contract.
+2. **Events** — Your backend published each event (topic + payload). [List events](/docs/api/events#list-events) is paginated. Common filters: `destination_id` for a per-destination screen; `topic`, time ranges, and `limit` / `next` / `prev` for broader views. With a tenant JWT, results are limited to that tenant; with an admin key, supply `tenant_id` (your backend usually injects it for the signed-in account).
+3. **Attempts** — One row per delivery try (status, HTTP code, timing, optional response). Tie attempts to events with `event_id` and `destination_id`. Tenant-wide: [list attempts](/docs/api/attempts#list-attempts). Destination-scoped routes are under [OpenAPI](/docs/api) (tenant destination attempts).
+4. **Retry** — Outpost [retries automatically](/docs/features/event-delivery) with backoff. [Manual retry](/docs/api/attempts#retry-attempt) is `POST /retry` with `event_id` and `destination_id` after the customer fixes their endpoint. The destination must be enabled and subscribed to the event’s topic.
 
 ### What to expose in your dashboard UI
 
 | User need | API direction |
 | --------- | ------------- |
-| “What fired for my webhook?” | List **events** filtered by **`destination_id`**, then list **attempts** for the chosen **`event_id`** (and destination). |
-| “Why did it fail?” | Show attempt **status**, **code**, and **response** fields (when included); link to your own docs on fixing URLs, auth, or timeouts. |
-| “Send it again” | **Retry** button on failed attempts (or on the event row if you only show one destination) → `POST /retry`. Show **202** / success vs **400** (e.g. destination disabled) from the API. |
+| “What was delivered here?” (this destination) | List events with `destination_id`, then list attempts for the chosen `event_id` (and destination as needed)—same idea for webhooks, queues, and other types. |
+| “Why did it fail?” | Surface attempt status, code, and response when present; link to your docs on URLs, auth, or timeouts. |
+| “Send it again” | Retry on failed attempts → `POST /retry`; handle 202 (accepted) versus errors such as 400 for a disabled destination. |
 
 ### Implementation notes
 
-- **Pagination:** Event and attempt list endpoints are cursor-paged; your UI should pass through **`next`** / **`prev`** (or “Load more”) so busy tenants are usable.
-- **Auth:** If the browser never sees the admin key, proxy these endpoints from your backend and attach the platform **Outpost API key** server-side, scoping **`tenant_id`** to the logged-in customer—same pattern as destination CRUD.
-- **Reference UI:** The portal’s destination flow includes event listing for a destination—see [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) as a reference layout (not a copy-paste requirement).
+- Event and attempt lists use cursor pagination; pass through `next` and `prev` (or “load more”) for busy tenants.
+- If the browser never holds the admin key, proxy these calls through your backend with the platform key and the correct `tenant_id`, same as destination CRUD.
+- **Reference:** [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) for destination-scoped activity layout.
+
+## Implementation checklists
+
+These are readiness checks: they do not replace the tables above or OpenAPI. Use them to confirm nothing important was skipped before shipping or when reviewing an implementation.
+
+### Planning and contract
+
+- [ ] Every call is scoped to the correct tenant (`tenant_id` on admin-key routes, or tenant inferred from JWT).
+- [ ] Outpost base URL comes from configuration or environment for dev, staging, and production (not a single hardcoded host in app code).
+- [ ] You chose an auth approach (browser JWT, server-side proxy/BFF, or mix) and use the matching OpenAPI operations and headers consistently.
+- [ ] Dynamic destination UI (labels, icons, form fields) is driven by `GET /destination-types`, not copied field lists from examples.
+
+### Destinations experience
+
+- [ ] List view shows type, human-readable target, and subscribed topics; each row links to the destination’s detail/edit view and to its destination-scoped activity.
+- [ ] Create flow covers: pick type → select topics (`GET /topics`) → collect `config` and credentials per the selected type’s `config_fields` and `credential_fields`.
+- [ ] When a type exposes `instructions` or `remote_setup_url`, the UI surfaces them (markdown / external flow) so customers are not blocked on opaque fields.
+- [ ] Detail view supports the lifecycle operations your product needs (view, update, delete, enable/disable), per OpenAPI and your product policy.
+
+### Activity, attempts, and retries
+
+- [ ] Default path is destination → events → attempts; optional tenant-wide activity still uses the same list endpoints with different query parameters.
+- [ ] Cursor pagination is implemented for busy tenants (`next` / `prev` or equivalent “load more”).
+- [ ] Failed deliveries show enough context (status, HTTP code, response when present) for customers to fix their side.
+- [ ] Manual retry is available where appropriate; errors such as disabled destination are handled with a clear message.
+
+### Content from the API
+
+- [ ] Inline icons and `instructions` markdown are escaped or sanitized before rendering, since they may contain HTML or untrusted strings.
+- [ ] Sensitive credential fields respect masking and “clear to edit” behavior described in the spec.

From 320c039c9d3ba93d7fae492a230c6a623a78b605 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:28:32 +0100
Subject: [PATCH 23/47] =?UTF-8?q?docs(eval):=20align=20scenarios=2008?=
 =?UTF-8?q?=E2=80=9310,=20prompt,=20and=20heuristics?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Agent prompt: topic reconciliation, domain vs test publish, full-stack UI
  guidance; remove eval-flavored Turn 0 / next-run wording in template.
- score-transcript: publish_beyond_test_only for 08/09/10 (domain publish).
- Scenarios + README: success criteria and Turn 1 nudges match prompt.
- SCENARIO-RUN-TRACKER: scenario 09 review notes marked resolved.

Made-with: Cursor
---
 docs/agent-evaluation/README.md               | 10 +++-
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 11 +++-
 .../scenarios/08-integrate-nextjs-existing.md | 13 ++---
 .../09-integrate-fastapi-existing.md          | 10 ++--
 .../scenarios/10-integrate-go-existing.md     |  9 ++--
 docs/agent-evaluation/src/score-transcript.ts | 50 +++++++++++++++++++
 .../hookdeck-outpost-agent-prompt.mdx         | 30 +++++++----
 7 files changed, 107 insertions(+), 26 deletions(-)

diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 4a591adb8..87941a677 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -8,7 +8,7 @@ This folder contains **manual** scenario specs (markdown) and an **automated** r
 |------|--------|
 | **Human checklist** (full eval, including execution) | Each file under [`scenarios/`](scenarios/) — section **Success criteria** (static + **Execution (full pass)** rows). |
 | **Manual run write-up** | [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) — copy to a local file under `results/` (gitignored). |
-| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). |
+| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). Scenarios **08–10** include **`publish_beyond_test_only`** (domain publish signal vs test-only). |
 | **LLM judge** (Anthropic vs **`## Success criteria`** in each scenario) | [`src/llm-judge.ts`](src/llm-judge.ts) — runs after each scenario unless **`--no-score-llm`**; also `npm run score -- --llm`. |
 
 **Deliberate scope:** `npm run eval` **requires** **`--scenario`**, **`--scenarios`**, or **`--all`**. There is no silent “run everything” default — you choose the scenarios and accept the cost. After **each** run: **`transcript.json`**, **`heuristic-score.json`**, and **`llm-score.json`** (judge reads the same **Success criteria** as humans). Exit **1** if any enabled score fails.
@@ -100,6 +100,14 @@ A **full pass** also answers: *did the generated curl / script / app succeed aga
 2. Run the agent’s commands or start its app and complete the flows the scenario describes.
 3. Record pass/fail in your run notes ([`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md)).
 
+#### Integration scenarios (08–10): depth to verify
+
+These measure **Option 3** (existing app), not a greenfield demo. When you **execute** the artifact:
+
+- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when Turn 0 is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
+- **Domain publish:** Prefer a smoke step that performs a **real product action** (signup, create entity, etc.) and confirms that Outpost accepts the publish, not **only** a “send test event” button.
+- **Heuristic `publish_beyond_test_only`:** [`score-transcript.ts`](src/score-transcript.ts) adds a weak automated check that the transcript corpus suggests publish beyond synthetic test-only paths; it is **not** a substitute for execution or the LLM judge reading **Success criteria**.
+
 ## Single source of truth for the dashboard prompt
 
 The **full prompt template** (the text operators paste as Turn 0) lives in **one** place:
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 82a96208b..806d0c14a 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -28,7 +28,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                   |
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                     |
 | 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                 |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
 | 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
 
 
@@ -56,6 +56,15 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 - [Building your own UI](../pages/guides/building-your-own-ui.mdx) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
 - [Agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
 
+### Scenario 09 — review notes (resolved, 2026-04-10)
+
+Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
+
+1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + Turn 1 copy, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+
+The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
+
 ### Column hints
 
 
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 94c9b65ab..542a030fe 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -44,7 +44,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 > Option 3 — I’m not starting from scratch. **We’re already in the Next.js SaaS app in this workspace** — the baseline repo is checked out here. Install dependencies and get it runnable, then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
 >
-> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
+> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. **Publish topic names should follow the app’s domain**; if Turn 0’s configured list is missing any name you need, document what to **add in the Outpost project**—don’t retarget real features to wrong topics just to match the list unless I explicitly asked for a minimal demo. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
 
 ### Turn 2 — User (optional)
 
@@ -56,16 +56,17 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 - Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
 - **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
-- At least one **publish** (or equivalent) tied to a **real code path** in the baseline (not dead code).
-- **Topic** aligns with Turn 0 configuration or is clearly named and documented.
-- **Per-customer webhook** story is explained: destination creation / subscription to topic.
+- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in Turn 0, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
+- At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
+- **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
 - README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). Run smoke tests from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Optionally also run a test publish. Run smoke tests from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
 
 ## Failure modes to note
 
 - Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
-- Publishing only from a demo route unrelated to the product model.
+- Publishing only from a demo or **test-only** route with no domain path.
+- **Topics** in code with no README telling the operator to **add** them in Hookdeck when Turn 0 was incomplete (or silently retargeting domain logic to unrelated Turn 0 names).
 - Calling Outpost from client components with secrets.
 
 ## Future baselines
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index 36f31229b..97a54756c 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -47,7 +47,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 > Option 3 — integrate Outpost into a real codebase. **We’re already in the full-stack FastAPI template in this workspace** — the repository is present here. Follow the project’s dev docs to get backend (and frontend if useful) running, then add **Hookdeck Outpost** for customer webhooks.
 >
-> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
+> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). **Topic strings should match that domain**; if Turn 0’s list doesn’t include the right names yet, document what the operator must **add in the Outpost project**—don’t contort the app to arbitrary topics unless this is explicitly a minimal wiring pass. Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
 
 ### Turn 2 — User (optional)
 
@@ -58,17 +58,19 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 **Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
 - **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
-- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets).
+- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
 - API key from **environment** or secure settings — not hard-coded or exposed to clients.
-- **Topic** and **destination** story documented (README or inline); if the app has a UI, linking or exposing **safe** controls for webhook URLs is a plus.
+- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs Turn 0 are resolved by **operator adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
+- **Destination** story documented; if the app has a UI, linking or exposing **safe** controls for customer destinations is a plus; **tenant id** usage is consistent with the publish calls.
 - README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** Stack runs per template docs; trigger path fires publish; Outpost accepts. *Skip for transcript-only.*
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. A test-publish button may be used **additionally** for smoke. *Skip for transcript-only.*
 
 ## Failure modes to note
 
 - Greenfield FastAPI “hello world” instead of the **cloned** baseline.
 - Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
 - Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*` / client bundles.
+- **Only** test/synthetic publish with no domain hook.
 
 ## Future baselines
 
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index bbe96d80f..be24d501e 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -41,7 +41,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 > Option 3 — existing Go API. **We’re already in the startersaas-go-api tree in this workspace** — the repository is present here. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
 >
-> Use **one real handler** as the publish trigger (signup, billing, etc.). API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
+> Use **one real handler** as the publish trigger (signup, billing, etc.). **`topic` values should match that domain**; if Turn 0’s list is incomplete, document what to **add in the Outpost project**—don’t bend the handler to wrong topic names just to match the prompt unless this is explicitly minimal wiring. API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
 
 ### Turn 2 — User (optional)
 
@@ -52,12 +52,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 **Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
 - **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
-- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path.
+- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path—not only a test-only route unless wiring-only scope was agreed.
 - No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
-- **Topic** + **destination** documentation for operators.
-- **Execution (full pass):** Server runs; trigger handler; Outpost accepts publish. *Skip for transcript-only.*
+- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented) + **destination** documentation for operators; **tenant** mapping consistent.
+- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. *Skip for transcript-only.*
 
 ## Failure modes to note
 
 - New `main.go` only, without using the **cloned** baseline’s routes/models.
 - Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations.
+- Publish only from a **test** helper with no real handler path.
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 976bd9bcd..9bc8df7d4 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -151,6 +151,29 @@ function containsLikelyLeakedKey(text: string): boolean {
   return false;
 }
 
+/**
+ * Option 3 (08–10): corpus should show publish on a real domain path, not only a synthetic
+ * “test event” / publish-test helper. Passes on multiple publish sites, on one publish without
+ * test-only markers, or on one publish with test markers plus domain-path markers. Weak signal: confirm with scenario Success criteria + execution smoke.
+ */
+function corpusSuggestsPublishBeyondTestOnly(corpus: string): boolean {
+  const t = corpus;
+  const publishHits = t.match(/publish\.event|Publish\.Event|PublishEvent/gi);
+  if (!publishHits?.length) return false;
+  if (publishHits.length >= 2) return true;
+  const lower = t.toLowerCase();
+  const testish =
+    /publish-test|publish_test|publishtest|test_publish|send test|synthetic.*(event|publish)|test event/.test(
+      lower,
+    );
+  if (!testish) return true;
+  const domainish =
+    /signup|register|user\.created|item\.|order\.|after_commit|post_save|on_.*create|createuser|create.?item|router\.(post|put|patch)|@router\.(post|put|patch)|handler\.|func.*create|def create_/.test(
+      lower,
+    ) && /publish|outpost/.test(lower);
+  return domainish;
+}
+
 function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore {
   const t = corpus;
   const lower = t.toLowerCase();
@@ -778,6 +801,15 @@ function scoreScenario08(corpus: string, assistant: string): TranscriptScore {
       : "Expected how operators register webhook URLs per customer/tenant",
   });
 
+  const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+  checks.push({
+    id: "publish_beyond_test_only",
+    pass: beyondTest,
+    detail: beyondTest
+      ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+      : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),
@@ -844,6 +876,15 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
     detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env",
   });
 
+  const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+  checks.push({
+    id: "publish_beyond_test_only",
+    pass: beyondTest,
+    detail: beyondTest
+      ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+      : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),
@@ -905,6 +946,15 @@ function scoreScenario10(corpus: string, assistant: string): TranscriptScore {
     detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")",
   });
 
+  const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+  checks.push({
+    id: "publish_beyond_test_only",
+    pass: beyondTest,
+    detail: beyondTest
+      ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+      : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 9e08b3e03..16f348e09 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -10,7 +10,7 @@ This page is a **reference template** for the Hookdeck Outpost onboarding flow.
 ```
 ## Hookdeck Outpost integration
 
-You are helping integrate Hookdeck Outpost into a platform to deliver events (webhooks and event destinations) to the platform's customers.
+You are helping integrate Hookdeck Outpost into a platform to deliver events to the platform's customers via **event destinations** (webhook URLs, cloud queues, Hookdeck, and other supported types—see **{{DOCS_URL}}/destinations**).
 
 ### Credentials
 
@@ -21,6 +21,12 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we
 
 {{TOPICS_LIST}}
 
+These names must **exist in the Outpost project** (dashboard) for publishes and destination subscriptions to work.
+
+**Naming:** In typical B2B SaaS, lifecycle topics like **`user.created`** mean an **end-user of the tenant’s account** (your customer’s customer—e.g. a team member), **not** your platform’s internal operator or staff. Use topic names that match **your product’s domain** (`order.shipped`, `item.deleted`, …) when those are the real events.
+
+**Reconciliation (default):** Derive **`topic` strings in code** from **real state changes** in the app. If **Configured topics** above is missing a name the app should emit, **do not** bend the product model to fit the list—tell the operator to **add that topic in the Outpost project** (Hookdeck) and to **refresh `{{TOPICS_LIST}}`** in the dashboard so a regenerated prompt matches the project. Only narrow or rename domain publishes when the operator **explicitly** asks for a minimal wiring demo with a fixed topic set.
+
 ### Test destination
 
 Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `config.url`, or `OUTPOST_TEST_WEBHOOK_URL` in the SDK quickstarts). Your dashboard supplies it for this project:
@@ -36,7 +42,7 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
 - API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
 - **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
-- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; **events list**, delivery **attempts**, **manual retry**; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
+- **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
 - Destination types: {{DOCS_URL}}/destinations
 - Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
 - SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
@@ -57,21 +63,21 @@ Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.
 
 **Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
 
-**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos.
+**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos. **Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** must either appear under **Configured topics** above **or** be documented for the operator with **“add this topic in the Outpost project”** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path** (CRUD handler, service after commit, job, etc.), not only a “send test event” / synthetic demo route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
 
-**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage webhook destinations inside the product**, not only via raw API or Swagger:
+**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage event destinations** (every **destination type** the project enables—webhook, queues, Hookdeck, etc.; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**, not only via raw API or Swagger:
 
-- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, URLs, topics—never the platform API key).
+- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, **targets** / config summaries, topics—never the platform API key).
 - **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
-- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint—see **Building your own UI** for how this links to destinations and to automatic retries in Outpost.
-- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). Treat this as **best practice** for SaaS products that offer webhooks: it proves end-to-end delivery without waiting on production traffic and matches what operators expect (similar to “send test webhook” in major platforms). Implement it **by default** for this integration path; the product team can remove or gate it later, but skipping it makes verification much harder.
+- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint or downstream config—see **Building your own UI** (default: **destination → activity**) for how this links to destinations and to automatic retries in Outpost.
+- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). This is **complementary** to domain publishes: it proves wiring (destination + topic subscription + delivery) without waiting on real traffic. It **does not replace** a `publish` on a real domain path. The test topic can be any **configured** topic; domain publishes should use topics that match the events you document.
 - **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
 
 ### What to do
 
 Guide the conversation, then act:
 
-1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
+1. **Try it out** — Minimal path: tenant → **one destination** (often a webhook for quick verification) → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
 
 2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
 
@@ -91,11 +97,14 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 
 **When editing an existing application repository (Option 3 or equivalent):**
 
+- [ ] **Topic reconciliation:** Every **`topic`** in `publish` is either in **Configured topics** above **or** README/chat tells the operator exactly which topics to **add in Hookdeck**—**domain-first**; do not retarget real features to wrong topic names to match an incomplete **Configured topics** list unless the operator explicitly asked for a minimal demo scope.
+- [ ] **Domain publish:** At least one **`publish` on a real application path** (entity create/update, signup, etc.), not solely a synthetic “test event” endpoint—unless the operator explicitly scoped the task to wiring-only.
+- [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
 - [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
 
 **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
 
-**Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+**Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
 ```
 
 ## Placeholder reference
@@ -103,7 +112,7 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 | Placeholder | Example | Notes |
 |-------------|---------|--------|
 | `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
-| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config |
+| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config — operators should keep this aligned with what the integrated app will **publish** and what destinations subscribe to |
 | `{{TEST_DESTINATION_URL}}` | **Required** — HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow (fed in by the dashboard). |
 | `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
 | `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
@@ -111,5 +120,6 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 ## Operator checklist (dashboard UI)
 
 - Show **API base URL** and **topics** next to the copyable prompt.
+- **`{{TOPICS_LIST}}`:** Should match what the **integrated product** will publish (domain-first). If the baseline app emits events the project does not list yet, **add topics in Hookdeck** and refresh this list—avoid expecting the agent to **reshape the app** to fit a stale default (e.g. only `user.created` when the real model is `item.*`).
 - Feed **`{{TEST_DESTINATION_URL}}`** from a Hookdeck Console **Source** URL you create for the operator (same value can be shown for `OUTPOST_TEST_WEBHOOK_URL` in env UI). Explain **Settings → Secrets** for `OUTPOST_API_KEY` (recommend a project **`.env`** or env-injection pattern, not pasting into the agent). Optional `OUTPOST_API_BASE_URL`.
 - Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history.
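
The `publish_beyond_test_only` check added to `score-transcript.ts` in this patch can be tried in isolation. The sketch below is a standalone copy of `corpusSuggestsPublishBeyondTestOnly` with the redundant alias dropped; the `samples` strings are hypothetical transcript fragments written for illustration, not output from real eval runs.

```typescript
// Standalone copy of the heuristic from this patch, for illustration only.
function corpusSuggestsPublishBeyondTestOnly(corpus: string): boolean {
  const publishHits = corpus.match(/publish\.event|Publish\.Event|PublishEvent/gi);
  if (!publishHits?.length) return false; // no publish call anywhere
  if (publishHits.length >= 2) return true; // multiple publish sites pass outright
  const lower = corpus.toLowerCase();
  const testish =
    /publish-test|publish_test|publishtest|test_publish|send test|synthetic.*(event|publish)|test event/.test(
      lower,
    );
  if (!testish) return true; // single publish with no test-only markers
  // Single publish plus test markers: require some sign of a domain path.
  const domainish =
    /signup|register|user\.created|item\.|order\.|after_commit|post_save|on_.*create|createuser|create.?item|router\.(post|put|patch)|@router\.(post|put|patch)|handler\.|func.*create|def create_/.test(
      lower,
    ) && /publish|outpost/.test(lower);
  return domainish;
}

// Hypothetical corpus fragments (assumptions, not real transcripts).
const samples: Record<string, string> = {
  noPublish: "README only, no SDK calls yet.",
  testOnly: "client.Publish.Event(ctx, req) // wired to the send test button",
  domainPlusTest:
    "signup flow calls outpost.publish.event(...); a separate send test button exists",
  domainOnly: "outpost.publish.event({ topic: 'order.shipped' })",
};

const results: Record<string, boolean> = {};
for (const name of Object.keys(samples)) {
  results[name] = corpusSuggestsPublishBeyondTestOnly(samples[name]);
}
console.log(results);
```

Under these inputs only `noPublish` and `testOnly` fail. Because the check is keyword-based, it stays a weak signal and should still be confirmed against the scenario Success criteria and an execution smoke test, as the function's doc comment says.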

From 97aaa246256747379b2cca6531919652d3193229 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:54:56 +0100
Subject: [PATCH 24/47] =?UTF-8?q?docs(eval):=20de-meta=20user=20turns=20in?=
 =?UTF-8?q?=20scenarios=208=E2=80=9310?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Rewrite Turn 1 blockquotes as natural operator speech; drop Option 3,
Turn 0, and prompt-section references. Align success-criteria wording
with configured onboarding topics. Update the tracker to reference user-turn scripts.

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md |  2 +-
 .../scenarios/08-integrate-nextjs-existing.md | 12 ++++----
 .../09-integrate-fastapi-existing.md          | 29 +++++++++++--------
 .../scenarios/10-integrate-go-existing.md     |  4 +--
 4 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 806d0c14a..aba77c2a5 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -61,7 +61,7 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
 
 1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + Turn 1 copy, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
 
 The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
 
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 542a030fe..9471a654c 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -9,7 +9,7 @@ Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboa
 ## Preconditions
 
 - Node 18+; `git` available.
-- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
+- Same **initial onboarding prompt** as other scenarios (`OUTPOST_API_KEY` **not** in the pasted text; test destination URL from dashboard).
 
 ## Eval harness
 
@@ -42,9 +42,9 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 ### Turn 1 — User
 
-> Option 3 — I’m not starting from scratch. **We’re already in the Next.js SaaS app in this workspace** — the baseline repo is checked out here. Install dependencies and get it runnable, then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
+> I’m integrating into our existing **Next.js** SaaS app—you’re in this repo with me. Install dependencies, get it running, then add **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
 >
-> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. **Publish topic names should follow the app’s domain**; if Turn 0’s configured list is missing any name you need, document what to **add in the Outpost project**—don’t retarget real features to wrong topics just to match the list unless I explicitly asked for a minimal demo. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
+> Tie it to **real product behavior** (not a throwaway demo page). I need a clear story for **how each customer registers their webhook** and which topics they receive. Use **topic names that match our domain**; if Hookdeck doesn’t list a topic we need yet, tell me exactly what to add in the project—don’t point our code at the wrong names just to match a short list unless I’ve said we’re only doing a quick wiring spike. Document env vars and setup in the **README**. Keep the Outpost API key on the **server** only.
 
 ### Turn 2 — User (optional)
 
@@ -56,7 +56,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 - Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
 - **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
-- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in Turn 0, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
+- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in the **configured project list** from onboarding, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
 - At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
 - **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
 - README (or equivalent) lists **env vars** for Outpost.
@@ -66,9 +66,9 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 - Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
 - Publishing only from a demo or **test-only** route with no domain path.
-- **Topics** in code with no README telling the operator to **add** them in Hookdeck when Turn 0 was incomplete (or silently retargeting domain logic to unrelated Turn 0 names).
+- **Topics** in code with no README telling the operator to **add** them in Hookdeck when the onboarding topic list was incomplete (or silently retargeting domain logic to unrelated configured names).
 - Calling Outpost from client components with secrets.
 
 ## Future baselines
 
-Java / .NET “existing app” scenarios can follow the same shape: harness pre-clones a fixed public baseline into the run workspace + Option 3 Turn 1 (user already “in” the app) + Success criteria + `scoreScenarioNN`.
+Java / .NET “existing app” scenarios can follow the same shape: harness pre-clones a fixed public baseline into the run workspace + a natural-language **integration** Turn 1 + Success criteria + `scoreScenarioNN`.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index 97a54756c..c24787d0a 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -12,7 +12,7 @@ Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + F
 
 - Python 3.10+; **Node.js 18+** (for the frontend); `git` available.
 - **Docker** (recommended) — template dev flow uses Docker Compose for API, DB, and frontend; see repository `development.md`.
-- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
+- Same **initial onboarding prompt** as other scenarios (`OUTPOST_API_KEY` **not** in the pasted text; test destination URL from dashboard).
 
 ## Eval harness
 
@@ -45,9 +45,11 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> Option 3 — integrate Outpost into a real codebase. **We’re already in the full-stack FastAPI template in this workspace** — the repository is present here. Follow the project’s dev docs to get backend (and frontend if useful) running, then add **Hookdeck Outpost** for customer webhooks.
+> This workspace is our **full-stack FastAPI + React** product (the template we ship). Follow the repo’s dev docs to bring up API, DB, and frontend, then integrate **Hookdeck Outpost** for **per-customer webhooks**.
 >
-> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). **Topic strings should match that domain**; if Turn 0’s list doesn’t include the right names yet, document what the operator must **add in the Outpost project**—don’t contort the app to arbitrary topics unless this is explicitly a minimal wiring pass. Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
+> I want customers to manage **destinations** from the product (or through our authenticated API), a **separate** way to **fire a test event** that isn’t pretending to be production traffic, and enough **delivery visibility** that they can see **events**, **attempts**, and **retry** when something failed—all **through our backend**, never with the platform API key in the browser.
+>
+> Wire **publish** into **one real workflow** we already have (signups, records, teams—whatever fits this codebase). **Topics** should match that workflow. If Hookdeck doesn’t list a name we need, document what I should add there; don’t reshape the product around random topic strings unless I’ve said this is wiring-only. Document env vars and how **tenant** maps to our customer or team model. Don’t expose the API key to clients.
 
 ### Turn 2 — User (optional)
 
@@ -55,23 +57,26 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ## Success criteria
 
-**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
+**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge (reads this section); execution manual.
 
 - **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
 - **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
-- API key from **environment** or secure settings — not hard-coded or exposed to clients.
-- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs Turn 0 are resolved by **operator adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
-- **Destination** story documented; if the app has a UI, linking or exposing **safe** controls for customer destinations is a plus; **tenant id** usage consistent with publish.
-- README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. A test-publish button may be used **additionally** for smoke. *Skip for transcript-only.*
+- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **also** expected for this baseline so operators can smoke-test wiring without waiting on production traffic—it **does not** replace the domain publish requirement.
+- API key from **environment** or secure backend settings only — not hard-coded, not exposed via **`NEXT_PUBLIC_*`**, **`VITE_*`**, or other client-visible env patterns.
+- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs the **configured project topic list** from onboarding are resolved by **adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
+- **Destinations + tenant:** Per-customer (or per-team) **destination** management is **documented** and, where this template ships a dashboard, implemented with **safe** UI or BFF routes (list/create/edit as appropriate). **`tenant_id`** (or equivalent) is consistent between publish and destination APIs.
+- **Delivery visibility (full-stack bar):** Because this baseline includes a **customer-facing UI**, the product should expose **event activity** aligned with [Building your own UI](../../pages/guides/building-your-own-ui.mdx): customers can see **events** (e.g. filterable by destination), **attempts** for a selected event, and **manual retry** for failed deliveries—all via **your** authenticated backend calling Outpost (admin key server-side), not from the browser with the platform key. Omit only if the user explicitly scoped the task to **backend-only** or excluded activity UI.
+- **Operator docs:** Root **README**, **backend/README**, **development.md**, or **`.env.example`** (whichever the template uses) lists **Outpost env vars** and how to run and verify.
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Optionally exercise test publish and activity/retry in the UI. *Skip for transcript-only.*
 
 ## Failure modes to note
 
 - Greenfield FastAPI “hello world” instead of the **cloned** baseline.
 - Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
-- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*` / client bundles.
-- **Only** test/synthetic publish with no domain hook.
+- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*`, `VITE_*`, or other client bundles.
+- **Only** test/synthetic publish with no domain hook, or **only** domain publish with no **separate** test-publish control when a dashboard is in scope.
+- **No** events/attempts/retry surfaced for customers when the baseline includes a product UI and the user did not ask to skip that scope.
 
 ## Future baselines
 
-Other “existing FastAPI app” pins can follow the same shape: harness pre-clone + Option 3 Turn 1 + success criteria + `scoreScenario09`.
+Other “existing FastAPI app” pins can follow the same shape: harness pre-clone + natural-language integration Turn 1 + success criteria + `scoreScenario09`.
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index be24d501e..7daab8da6 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -39,9 +39,9 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 ### Turn 1 — User
 
-> Option 3 — existing Go API. **We’re already in the startersaas-go-api tree in this workspace** — the repository is present here. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
+> Existing **Go** API—you’re in this repo with me. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
 >
-> Use **one real handler** as the publish trigger (signup, billing, etc.). **`topic` values should match that domain**; if Turn 0’s list is incomplete, document what to **add in the Outpost project**—don’t bend the handler to wrong topic names just to match the prompt unless this is explicitly minimal wiring. API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
+> Trigger **publish** from **one real handler** (signup, billing, etc.—not a throwaway test-only route by itself). **`topic` values should match that domain**. If our Hookdeck project’s topic list is missing something, document what to add; don’t point production code at the wrong names just to match a stub list unless I’ve said this is a minimal wiring pass. **`OUTPOST_API_KEY`** from env only. Explain how customers register webhook URLs and what to put in **README** / env. Use the **test receiver URL** from our Hookdeck setup when you want to prove delivery end-to-end.
 
 ### Turn 2 — User (optional)
 

From d5eef9129ef0a50fb500b11b26e10b8cf4fef1c2 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:56:20 +0100
Subject: [PATCH 25/47] feat(eval): extend scenario 09 transcript heuristics

Add no_client_bundled_outpost_key and readme_or_env_docs checks to
scoreScenario09 (align with full-stack success criteria).

Made-with: Cursor
---
 docs/agent-evaluation/src/score-transcript.ts | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)
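
For reviewers, a minimal standalone sketch of the client-key check this patch adds to `scoreScenario09`. The four patterns are copied from the hunk below; the wrapper name `detectClientKeyLeak` and the constant `CLIENT_KEY_LEAK_PATTERNS` are illustrative assumptions, not harness code.

```typescript
// Patterns copied from the scoreScenario09 hunk; the wrapper itself is hypothetical.
const CLIENT_KEY_LEAK_PATTERNS: RegExp[] = [
  // Assignment in a .env file or object literal, e.g. NEXT_PUBLIC_OUTPOST_API_KEY=...
  /NEXT_PUBLIC_OUTPOST_API_KEY\s*[=:]/,
  /VITE_OUTPOST_API_KEY\s*[=:]/,
  // Read access from client-visible env objects.
  /process\.env\.NEXT_PUBLIC_OUTPOST_API_KEY\b/,
  /import\.meta\.env\.(?:VITE_OUTPOST_API_KEY|NEXT_PUBLIC_OUTPOST_API_KEY)\b/,
];

// Returns true when the corpus appears to wire the Outpost key into client-visible env.
function detectClientKeyLeak(corpus: string): boolean {
  return CLIENT_KEY_LEAK_PATTERNS.some((re) => re.test(corpus));
}
```

Because the assignment patterns require `=` or `:` after the variable name, prose that only warns against a `NEXT_PUBLIC_` prefix is not flagged, while a server-side `OUTPOST_API_KEY=...` line passes untouched.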

diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 9bc8df7d4..73bc8d5c8 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -876,6 +876,19 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
     detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env",
   });
 
+  const clientKeyLeak =
+    /NEXT_PUBLIC_OUTPOST_API_KEY\s*[=:]/.test(t) ||
+    /VITE_OUTPOST_API_KEY\s*[=:]/.test(t) ||
+    /process\.env\.NEXT_PUBLIC_OUTPOST_API_KEY\b/.test(t) ||
+    /import\.meta\.env\.(?:VITE_OUTPOST_API_KEY|NEXT_PUBLIC_OUTPOST_API_KEY)\b/.test(t);
+  checks.push({
+    id: "no_client_bundled_outpost_key",
+    pass: !clientKeyLeak,
+    detail: clientKeyLeak
+      ? "Corpus suggests Outpost API key wired into client-visible env — keep server-side only"
+      : "No client env assignment/access for OUTPOST_API_KEY (NEXT_PUBLIC_/VITE_) in corpus",
+  });
+
   const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
   checks.push({
     id: "publish_beyond_test_only",
@@ -885,6 +898,17 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
       : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
   });
 
+  const readmeOrEnvDocs =
+    /OUTPOST_API_KEY/.test(t) &&
+    /README|development\.md|\.env\.example|backend\/readme/i.test(t);
+  checks.push({
+    id: "readme_or_env_docs",
+    pass: readmeOrEnvDocs,
+    detail: readmeOrEnvDocs
+      ? "README / development.md / .env.example (or similar) touches OUTPOST_API_KEY"
+      : "Expected operator docs listing OUTPOST env vars (see scenario Success criteria)",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),

From e415e33004172fe15fc1c0575e89400c47199266 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:56:27 +0100
Subject: [PATCH 26/47] feat(eval): persist run lifecycle sidecars

Write eval-run-started.json at scenario start; eval-failure.json on
uncaught errors; eval-aborted.json on SIGTERM/SIGINT. Register signal
handlers so interrupted runs leave a trace (SIGKILL still silent).

Made-with: Cursor
---
 docs/agent-evaluation/src/run-agent-eval.ts | 162 ++++++++++++++------
 1 file changed, 116 insertions(+), 46 deletions(-)
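
The signal handlers below exit with 143 for SIGTERM and 130 for SIGINT, which matches the common shell convention of 128 plus the signal number. A minimal sketch of that mapping and of the abort payload shape follows; `SIGNAL_NUMBERS`, `exitCodeForSignal`, and `buildAbortPayload` are hypothetical names, and the signal numbers assume the usual POSIX values (SIGINT = 2, SIGTERM = 15).

```typescript
type AbortSignalName = "SIGINT" | "SIGTERM";

// Assumed POSIX signal numbers; the patch hard-codes the resulting exit codes instead.
const SIGNAL_NUMBERS: Record<AbortSignalName, number> = { SIGINT: 2, SIGTERM: 15 };

// Shell convention: a process terminated by signal N reports exit status 128 + N.
function exitCodeForSignal(signal: AbortSignalName): number {
  return 128 + SIGNAL_NUMBERS[signal];
}

// Builds the JSON text for an eval-aborted sidecar (subset of the fields in the patch).
function buildAbortPayload(signal: AbortSignalName, pid: number, now: Date): string {
  return `${JSON.stringify({ abortedAt: now.toISOString(), signal, pid }, null, 2)}\n`;
}
```

The payload is built as a plain string so a handler can write it with a synchronous call such as `writeFileSync` before calling `process.exit`, since pending asynchronous writes are not guaranteed to finish once the process exits.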

diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 7201e6e51..62c0d4ea1 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -7,6 +7,7 @@
  * @see https://platform.claude.com/docs/en/agent-sdk/overview
  */
 
+import { writeFileSync } from "node:fs";
 import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
 import { dirname, join, resolve, sep } from "node:path";
 import { fileURLToPath } from "node:url";
@@ -39,6 +40,41 @@ const PROMPT_MDX = join(
 const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
 const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
 
+/** Set while a scenario is in progress so SIGTERM/SIGINT can leave a sidecar (not SIGKILL). */
+let activeRunDirForSignal: string | null = null;
+
+function registerEvalSignalHandlers(): void {
+  const recordAbort = (signal: string) => {
+    if (!activeRunDirForSignal) return;
+    try {
+      writeFileSync(
+        join(activeRunDirForSignal, "eval-aborted.json"),
+        `${JSON.stringify(
+          {
+            abortedAt: new Date().toISOString(),
+            signal,
+            pid: process.pid,
+            note: "Process exited before transcript.json was written; long agent turns often print little to stdout.",
+          },
+          null,
+          2,
+        )}\n`,
+        "utf8",
+      );
+    } catch {
+      // best-effort
+    }
+  };
+  process.once("SIGTERM", () => {
+    recordAbort("SIGTERM");
+    process.exit(143);
+  });
+  process.once("SIGINT", () => {
+    recordAbort("SIGINT");
+    process.exit(130);
+  });
+}
+
 function isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage {
   return m.type === "system" && m.subtype === "init";
 }
@@ -539,6 +575,8 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     `Running ${selected.length} scenario(s): ${selected.join(", ")} (heuristic=${String(wantScore)}, llm=${String(wantLlm)})`,
   );
 
+  registerEvalSignalHandlers();
+
   for (const file of selected) {
     const scenarioIdEarly = idFromFilename(file);
     const runDir = join(RUNS_DIR, `${stamp}-scenario-${scenarioIdEarly}`);
@@ -550,59 +588,91 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     const { agentCwd, writeGuardRoot } = await applyEvalHarness(runDir, harnessConfig);
     const baseOptions = buildBaseOptions(agentCwd, writeGuardRoot);
     console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
-    const result = await runOneScenario(file, filledTemplate, {
-      skipOptional: values["skip-optional"] ?? false,
-      baseOptions,
-      scenarioMarkdown: scenarioMd,
-    });
 
-    const outPath = join(runDir, "transcript.json");
-    const payload = {
-      meta: {
-        scenarioId: result.scenarioId,
-        scenarioFile: result.scenarioFile,
-        runDirectory: runDir,
-        agentWorkspaceCwd: agentCwd,
-        evalHarness: {
-          preStepCount: harnessConfig.preSteps.length,
-          agentCwd: harnessConfig.agentCwd,
+    activeRunDirForSignal = runDir;
+    await writeFile(
+      join(runDir, "eval-run-started.json"),
+      `${JSON.stringify(
+        {
+          startedAt: new Date().toISOString(),
+          pid: process.pid,
+          scenarioFile: file,
+          scenarioId: scenarioIdEarly,
+          note: "If you see this without transcript.json, the run may still be in progress, was interrupted (SIGTERM/SIGINT writes eval-aborted.json), crashed, or was SIGKILL’d (no sidecar). The agent phase often logs little until the turn completes.",
         },
-        repositoryRoot: REPO_ROOT,
-        completedAt: new Date().toISOString(),
-        sessionId: result.sessionId,
-        turns: result.turns,
-      },
-      messages: result.allMessages,
-    };
+        null,
+        2,
+      )}\n`,
+      "utf8",
+    );
 
-    await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8");
-    console.error(`Wrote ${outPath}`);
+    try {
+      const result = await runOneScenario(file, filledTemplate, {
+        skipOptional: values["skip-optional"] ?? false,
+        baseOptions,
+        scenarioMarkdown: scenarioMd,
+      });
 
-    if (wantScore) {
-      const report = await scoreRunFile(outPath);
-      const scorePath = join(runDir, "heuristic-score.json");
-      await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8");
-      console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`);
-      if (report.overallTranscriptPass === false) {
-        anyScoreFailure = true;
+      const outPath = join(runDir, "transcript.json");
+      const payload = {
+        meta: {
+          scenarioId: result.scenarioId,
+          scenarioFile: result.scenarioFile,
+          runDirectory: runDir,
+          agentWorkspaceCwd: agentCwd,
+          evalHarness: {
+            preStepCount: harnessConfig.preSteps.length,
+            agentCwd: harnessConfig.agentCwd,
+          },
+          repositoryRoot: REPO_ROOT,
+          completedAt: new Date().toISOString(),
+          sessionId: result.sessionId,
+          turns: result.turns,
+        },
+        messages: result.allMessages,
+      };
+
+      await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8");
+      console.error(`Wrote ${outPath}`);
+
+      if (wantScore) {
+        const report = await scoreRunFile(outPath);
+        const scorePath = join(runDir, "heuristic-score.json");
+        await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8");
+        console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`);
+        if (report.overallTranscriptPass === false) {
+          anyScoreFailure = true;
+        }
       }
-    }
 
-    if (wantLlm) {
-      const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile);
-      const llmReport = await llmJudgeRun({
-        runPath: outPath,
-        scenarioMdPath: scenarioPath,
-        apiKey: process.env.ANTHROPIC_API_KEY!.trim(),
-      });
-      const llmPath = join(runDir, "llm-score.json");
-      await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
-      console.error(
-        `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`,
-      );
-      if (!llmReport.overall_transcript_pass) {
-        anyScoreFailure = true;
+      if (wantLlm) {
+        const scenarioPathForJudge = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile);
+        const llmReport = await llmJudgeRun({
+          runPath: outPath,
+          scenarioMdPath: scenarioPathForJudge,
+          apiKey: process.env.ANTHROPIC_API_KEY!.trim(),
+        });
+        const llmPath = join(runDir, "llm-score.json");
+        await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
+        console.error(
+          `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`,
+        );
+        if (!llmReport.overall_transcript_pass) {
+          anyScoreFailure = true;
+        }
       }
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      const stack = err instanceof Error ? err.stack : undefined;
+      await writeFile(
+        join(runDir, "eval-failure.json"),
+        `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack }, null, 2)}\n`,
+        "utf8",
+      );
+      console.error(`Eval scenario failed (${file}):`, err);
+      throw err;
+    } finally {
+      activeRunDirForSignal = null;
     }
   }
 

From cbb6c516aec8cf7294695a0099ef967150030932 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 02:56:44 +0100
Subject: [PATCH 27/47] docs(eval): authoring AGENTS, README, shared Cursor
 rule

Add docs/agent-evaluation/AGENTS.md (anti-leakage checklist), root
AGENTS.md pointer, and a Cursor rule scoped to docs/agent-evaluation/.
Document run sidecars, re-scoring, integration verification wording,
and scenario 09 heuristic summary. Fix placeholder fixtures markdown.

Made-with: Cursor
---
 .cursor/rules/agent-evaluation-authoring.mdc  | 14 ++++++
 AGENTS.md                                     |  5 ++
 docs/agent-evaluation/AGENTS.md               | 46 +++++++++++++++++++
 docs/agent-evaluation/README.md               | 20 +++++---
 .../fixtures/placeholder-values-for-turn0.md  | 14 +++---
 5 files changed, 86 insertions(+), 13 deletions(-)
 create mode 100644 .cursor/rules/agent-evaluation-authoring.mdc
 create mode 100644 AGENTS.md
 create mode 100644 docs/agent-evaluation/AGENTS.md
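
The anti-leakage rules added in `docs/agent-evaluation/AGENTS.md` could, in principle, be spot-checked mechanically. The sketch below is hypothetical tooling and is not part of this patch: `findUserTurnLeaks` and `BANNED_IN_USER_TURNS` are invented names, and the banned-term list is a partial reading of the checklist rather than an authoritative rule set.

```typescript
// Hypothetical lint: flags harness vocabulary inside "### Turn N ... User" blockquotes.
const BANNED_IN_USER_TURNS: RegExp[] = [
  /\bOption\s+[123]\b/i,
  /\bTurn\s+\d+\b/i,
  /\bscenario\b/i,
  /\beval\b/i,
  /\bsuccess criteria\b/i,
  /scoreScenario\d*/,
];

function findUserTurnLeaks(markdown: string): string[] {
  const leaks: string[] = [];
  let inUserTurn = false;
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      // Any heading ends the previous section; only user-turn headings open a checked one.
      inUserTurn = /^###\s+Turn\s+\d+\s+\S+\s+User/.test(line);
      continue;
    }
    if (inUserTurn && line.startsWith(">") && BANNED_IN_USER_TURNS.some((re) => re.test(line))) {
      leaks.push(line);
    }
  }
  return leaks;
}
```

Only blockquote lines under a user-turn heading are inspected, so **Success criteria**, **Failure modes**, and **Intent** sections can still name Turn 0 or `scoreScenarioNN`, as the authoring rules allow.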

diff --git a/.cursor/rules/agent-evaluation-authoring.mdc b/.cursor/rules/agent-evaluation-authoring.mdc
new file mode 100644
index 000000000..34e509cce
--- /dev/null
+++ b/.cursor/rules/agent-evaluation-authoring.mdc
@@ -0,0 +1,14 @@
+---
+description: Authoring standards for docs/agent-evaluation (no eval leakage in user turns)
+globs: docs/agent-evaluation/**/*
+---
+
+When editing anything under `docs/agent-evaluation/`, read and follow **`docs/agent-evaluation/AGENTS.md`**.
+
+**Quick guardrails for `scenarios/*.md`:**
+
+- **`### Turn N — User`** blockquotes = in-character **product engineer** speech only.
+- **Never** in user lines: `Option 1/2/3`, `Turn 0`, `scenario`, `eval`, `success criteria`, `scoreScenario`, references to “the prompt/instructions you already have” or named template sections.
+- Put rubric detail in **`## Success criteria`** / **Intent** / **Failure modes**, not in the user quote.
+
+Full checklist and rationale: **`docs/agent-evaluation/AGENTS.md`**.
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..0fb773eda
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,5 @@
+# Coding agent notes (Outpost)
+
+When you change files under **`docs/agent-evaluation/`** (scenarios, scoring, harness docs), read and apply **[`docs/agent-evaluation/AGENTS.md`](docs/agent-evaluation/AGENTS.md)** first. It defines anti–“teach to the test” rules for user-turn wording and scenario structure.
+
+For this repo’s PR review format, see **`CLAUDE.md`**.
diff --git a/docs/agent-evaluation/AGENTS.md b/docs/agent-evaluation/AGENTS.md
new file mode 100644
index 000000000..5ab942505
--- /dev/null
+++ b/docs/agent-evaluation/AGENTS.md
@@ -0,0 +1,46 @@
+# Agent evaluation — authoring rules for humans & coding agents
+
+This file applies to **everything under `docs/agent-evaluation/`** (scenarios, README, tracker, harness TypeScript). Follow it when adding or editing eval specs so we do not **teach to the test** or confuse **evaluator docs** with **in-character user speech**.
+
+## Who reads what
+
+| Audience | Content |
+|----------|---------|
+| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
+| **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |
+
+**Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.
+
+## Anti-leakage rules (user turns)
+
+In **`### Turn N — User`** blockquotes, **do not** use:
+
+- **Option 1 / 2 / 3** (those labels exist only inside the dashboard template; a real user says what they want in plain language).
+- **Turn 0**, **Turn 1**, or any **turn** numbering (that is script metadata).
+- Phrases like **“the instructions you already have”**, **“the full-stack section of the prompt”**, **“follow the Hookdeck Outpost template”** as a stand-in for requirements (the model already has Turn 0; state the *product ask*, not a pointer to a doc section).
+- **“Match the prompt”**, **“dashboard prompt”**, **“eval”**, **“scenario”**, **“success criteria”**, **heuristic names**, **`scoreScenarioNN`**.
+
+**Do** use natural operator language: stack, repo, product behavior, security (key on server), domain topics, README/env, Hookdeck project/topics **as the customer would say them**.
+
+It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `scoreScenarioNN`, Turn 0, Option 3, etc. — those sections are not pasted as the user.
+
+## Alignment without parroting
+
+- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdx`.
+- **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”) not **cite** where in the template that is spelled out.
+
+If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.
+
+## Pre-merge checklist (scenarios)
+
+Before merging changes to `scenarios/*.md`:
+
+- [ ] Every **`> ...` user** line reads like a **real customer** message (read aloud test).
+- [ ] No **Option N** / **Turn 0** / **scenario** / **prompt section** leakage in user blockquotes.
+- [ ] **Success criteria** still state the full bar; nothing has been removed from the criteria and moved only into user text.
+- [ ] If integration depth or rubrics changed, **`src/score-transcript.ts`** and the scenario table in [`README.md`](README.md) are updated.
+
+## Where Cursor loads this
+
+- A **repo-root** [`AGENTS.md`](../../AGENTS.md) points here so agents see this folder’s rules.
+- [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) applies when editing paths under `docs/agent-evaluation/`.
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 87941a677..3cf6ffb4d 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -2,6 +2,8 @@
 
 This folder contains **manual** scenario specs (markdown) and an **automated** runner that uses the [Claude Agent SDK](https://platform.claude.com/docs/en/agent-sdk/overview) (`src/run-agent-eval.ts`).
 
+**Authoring standards (user-turn wording, no eval leakage):** [`AGENTS.md`](AGENTS.md) — also enforced via [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) when editing here.
+
 ## Where success criteria live
 
 | What | Where |
@@ -19,13 +21,17 @@ Each scenario run uses one directory:
 
 `results/runs/<ISO-stamp>-scenario-NN/`
 
-- **`transcript.json`** — full SDK log  
-- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)  
+- **`transcript.json`** — full SDK log (written only **after** the agent finishes all turns — long runs may show little console output until then)
+- **`eval-run-started.json`** — created as soon as a scenario begins (pid, scenario id); if present **without** `transcript.json`, the run was interrupted, is still going, crashed, or was **SIGKILL**’d (no sidecar for SIGKILL)
+- **`eval-failure.json`** — uncaught exception while running the scenario (agent turns, transcript write, or scoring)
+- **`eval-aborted.json`** — **SIGTERM** or **SIGINT** (e.g. stopping the process) before completion
+- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)
 - **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code.
 
-Re-score a finished run without re-invoking the agent:
+Re-score a finished run without re-invoking the agent — uses **today’s** [`src/score-transcript.ts`](src/score-transcript.ts) and **scenario markdown on disk** (so LLM criteria update when you edit **`## Success criteria`**):
 
-- **`npm run score -- --run results/runs/<dir>`** — heuristic (add **`--llm`** for LLM only, **`--write`** to persist sidecars).
+- **`npm run score -- --run results/runs/<dir> --write`** — refresh **`heuristic-score.json`**
+- Add **`--llm`** to also re-run the judge and write **`llm-score.json`** (needs **`ANTHROPIC_API_KEY`**)
 
 Legacy flat files `*-scenario-NN.json` next to `runs/` are still accepted by **`npm run score`** for older runs.
 
@@ -102,9 +108,9 @@ A **full pass** also answers: *did the generated curl / script / app succeed aga
 
 #### Integration scenarios (08–10): depth to verify
 
-These measure **Option 3** (existing app), not a greenfield demo. When you **execute** the artifact:
+These measure **existing-app integration**, not a greenfield demo. When you **execute** the artifact:
 
-- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when Turn 0 is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
+- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when the **configured topic list from onboarding** is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
 - **Domain publish:** Prefer a smoke step that performs a **real product action** (signup, create entity, etc.) and observe an accepted publish—not **only** a “send test event” button.
 - **Heuristic `publish_beyond_test_only`:** [`score-transcript.ts`](src/score-transcript.ts) adds a weak automated check that the transcript corpus suggests publish beyond synthetic test-only paths; it is **not** a substitute for execution or the LLM judge reading **Success criteria**.
 
@@ -164,7 +170,7 @@ There is still **no single portable “IDE agent” CLI** for all vendors; the S
 | 06 | `scoreScenario06` | FastAPI, `outpost_sdk`, uvicorn, server env, two flows, README, webhook docs |
 | 07 | `scoreScenario07` | `net/http`, Go SDK + `CreateDestinationCreateWebhook`, HTML UI, two flows, `go run`, README |
 | 08 | `scoreScenario08` | Clone **next-saas-starter** (or git baseline), TS SDK, publish/destinations/tenants, server env key, per-customer webhook story |
-| 09 | `scoreScenario09` | Clone **full-stack-fastapi-template** (or git baseline), `outpost_sdk`, integration + domain hook, env key |
+| 09 | `scoreScenario09` | Clone **full-stack-fastapi-template** (or git baseline), `outpost_sdk`, integration + domain hook, env key, no client `NEXT_PUBLIC_`/`VITE_` key wiring, `publish_beyond_test_only`, README/env docs signal |
 | 10 | `scoreScenario10` | Clone **startersaas-go-api** (or git baseline), Go Outpost SDK, publish + handler hook, env key |
 
 Export **`SCENARIO_IDS_WITH_HEURISTIC_RUBRIC`** in `score-transcript.ts` lists IDs **01–10** for tooling.
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index 39d344677..2336f6352 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -12,13 +12,15 @@ For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), t
 
 ## Example substitutions (non-secret)
 
-| Placeholder | Example |
-|-------------|---------|
-| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` |
-| `{{TOPICS_LIST}}` | `- user.created` |
+
+| Placeholder                | Example                                                                                                                                                             |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `{{API_BASE_URL}}`         | `https://api.outpost.hookdeck.com/2025-07-01`                                                                                                                       |
+| `{{TOPICS_LIST}}`          | `- user.created`                                                                                                                                                    |
 | `{{TEST_DESTINATION_URL}}` | Hookdeck Console **Source** URL the dashboard feeds in (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` |
-| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) |
-| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL |
+| `{{DOCS_URL}}`             | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`)                                                                                        |
+| `{{LLMS_FULL_URL}}`        | Omit the line in the template if unused, or your public `llms-full.txt` URL                                                                                         |
+
 
 ---
 

From ce0be6b7407aa1005ca36f94fcbfd0b1fe09a546 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 10:38:01 +0100
Subject: [PATCH 28/47] feat(agent-evaluation): read/bash sandbox and sibling
 harness sidecars

Restrict PreToolUse Read/Glob/Grep to the run directory (and docs/ when
EVAL_LOCAL_DOCS is set). Block Bash commands that touch the monorepo root
outside those areas; deny the Agent tool unless EVAL_ALLOW_AGENT_TOOL is
set. Split the read and write guard env vars.

Write eval-started, eval-failure, and eval-aborted next to the run folder
under results/runs/ so the agent cannot read harness metadata. SIGTERM/
SIGINT abort payload includes runDirectory.

Made-with: Cursor
---
 docs/agent-evaluation/src/run-agent-eval.ts | 289 +++++++++++++++++---
 1 file changed, 253 insertions(+), 36 deletions(-)
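
Reviewer note (illustrative only, not part of the patch): the sibling sidecar layout works only if the containment check compares against the run directory plus a trailing separator. The sketch below assumes that behavior; `isInsideDir` and the sample paths are hypothetical stand-ins, not the harness's own `filePathIsInsideRunDir`.

```typescript
import { basename, join, resolve, sep } from "node:path";

// Hypothetical helper: true when `candidate` resolves to `root` or to a path beneath it.
function isInsideDir(root: string, candidate: string): boolean {
  const base = resolve(root);
  const target = resolve(candidate);
  if (target === base) return true;
  // The trailing separator keeps ".../runs/x.eval-started.json" from matching ".../runs/x".
  const prefix = base.endsWith(sep) ? base : base + sep;
  return target.startsWith(prefix);
}

// Hypothetical sample paths that mirror the layout described in the commit message.
const runsDir = resolve("results", "runs");
const runDir = join(runsDir, "2026-01-01T00-00-00-000Z-scenario-01");
const sidecar = join(runsDir, `${basename(runDir)}.eval-started.json`);

console.log(isInsideDir(runDir, join(runDir, "transcript.json"))); // true
console.log(isInsideDir(runDir, sidecar)); // false: sibling of the run folder, not a child
```

Since `results/runs/` lives under `docs/agent-evaluation/`, the `docs/` read allowance under `EVAL_LOCAL_DOCS=1` would appear to cover the sidecar files as well; that may be worth confirming.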

diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 62c0d4ea1..3c34c7d24 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -9,7 +9,7 @@
 
 import { writeFileSync } from "node:fs";
 import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
-import { dirname, join, resolve, sep } from "node:path";
+import { basename, dirname, join, resolve, sep } from "node:path";
 import { fileURLToPath } from "node:url";
 import { parseArgs } from "node:util";
 import dotenv from "dotenv";
@@ -40,20 +40,39 @@ const PROMPT_MDX = join(
 const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
 const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
 
-/** Set while a scenario is in progress so SIGTERM/SIGINT can leave a sidecar (not SIGKILL). */
-let activeRunDirForSignal: string | null = null;
+/**
+ * Harness-only status files next to the run folder (not inside `runDir`) so the agent sandbox cannot Read them.
+ * Example: `…/runs/2026-…-scenario-08/transcript.json` vs `…/runs/2026-…-scenario-08.eval-started.json`.
+ */
+function harnessSidecarPaths(runDir: string): {
+  started: string;
+  failure: string;
+  aborted: string;
+} {
+  const stem = basename(runDir);
+  return {
+    started: join(RUNS_DIR, `${stem}.eval-started.json`),
+    failure: join(RUNS_DIR, `${stem}.eval-failure.json`),
+    aborted: join(RUNS_DIR, `${stem}.eval-aborted.json`),
+  };
+}
+
+/** Paths for SIGTERM/SIGINT abort sidecar while a scenario is in progress (not SIGKILL). */
+let activeHarnessAbortContext: { readonly path: string; readonly runDirectory: string } | null = null;
 
 function registerEvalSignalHandlers(): void {
   const recordAbort = (signal: string) => {
-    if (!activeRunDirForSignal) return;
+    const ctx = activeHarnessAbortContext;
+    if (!ctx) return;
     try {
       writeFileSync(
-        join(activeRunDirForSignal, "eval-aborted.json"),
+        ctx.path,
         `${JSON.stringify(
           {
             abortedAt: new Date().toISOString(),
             signal,
             pid: process.pid,
+            runDirectory: ctx.runDirectory,
             note: "Process exited before transcript.json was written; long agent turns often print little to stdout.",
           },
           null,
@@ -367,7 +386,42 @@ function filePathIsInsideRunDir(runDir: string, filePath: string): boolean {
   return target.startsWith(prefix);
 }
 
-function toolInputFilePath(toolName: string, toolInput: unknown): string | undefined {
+function resolveMaybeRelativePath(p: string, agentCwd: string): string {
+  if (p.startsWith(sep) || /^[A-Za-z]:[\\/]/.test(p)) {
+    return resolve(p);
+  }
+  return resolve(agentCwd, p);
+}
+
+/** Read/Glob/Grep may touch the run directory, or (with local docs) only `repoRoot/docs`. */
+function pathAllowedForReadTool(
+  absPath: string,
+  runDir: string,
+  repoRoot: string,
+  localDocs: boolean,
+): boolean {
+  const p = resolve(absPath);
+  if (filePathIsInsideRunDir(runDir, p)) return true;
+  if (localDocs && filePathIsInsideRunDir(join(repoRoot, "docs"), p)) return true;
+  return false;
+}
+
+/**
+ * Bash: block commands that reference the Outpost repo root unless the reference stays under
+ * `runDir` or (local docs) `repoRoot/docs`.
+ */
+function bashCommandAllowed(command: string, runDir: string, repoRoot: string, localDocs: boolean): boolean {
+  const rr = resolve(repoRoot);
+  const rd = resolve(runDir);
+  const docRoot = localDocs ? resolve(join(repoRoot, "docs")) : null;
+  if (!command.includes(rr)) return true;
+  if (command.includes(rd)) return true;
+  if (docRoot && command.includes(docRoot)) return true;
+  if (localDocs && command.includes(join(repoRoot, "docs"))) return true;
+  return false;
+}
+
+function toolInputWritePath(toolName: string, toolInput: unknown): string | undefined {
   if (toolName !== "Write" && toolName !== "Edit" && toolName !== "NotebookEdit") {
     return undefined;
   }
@@ -380,23 +434,140 @@ function toolInputFilePath(toolName: string, toolInput: unknown): string | undef
   return undefined;
 }
 
+function toolInputReadFilePath(toolInput: unknown): string | undefined {
+  if (typeof toolInput !== "object" || toolInput === null) return undefined;
+  const v = (toolInput as Record<string, unknown>).file_path;
+  return typeof v === "string" && v.length > 0 ? v : undefined;
+}
+
+function preToolDeny(reason: string) {
+  return {
+    hookSpecificOutput: {
+      hookEventName: "PreToolUse" as const,
+      permissionDecision: "deny" as const,
+      permissionDecisionReason: reason,
+    },
+  };
+}
+
+/**
+ * Appended to Turn 0 so the model does not treat the Hookdeck Outpost monorepo as the integration target.
+ */
+function buildWorkspaceBoundaryAppendix(
+  runDir: string,
+  agentCwd: string,
+  repoRoot: string,
+  localDocs: boolean,
+): string {
+  const docsPath = join(repoRoot, "docs");
+  const docBullet = localDocs
+    ? `\n- You **may** use Read/Glob/Grep only under **\`${docsPath}\`** when following the **Documentation (local repository)** paths in this prompt—not elsewhere under **\`${repoRoot}\`** (no \`sdks/\`, \`internal/\`, \`go.mod\` at repo root, etc.).`
+    : `\n- Do **not** read or search the Hookdeck Outpost checkout on disk outside **\`${runDir}\`**; use the documentation URLs already listed above.`;
+
+  return `
+
+### Workspace boundary (automated eval session)
+
+- The **integration target** is **only** under **\`${runDir}\`** (shell cwd: **\`${agentCwd}\`**). Install dependencies, add SDK usage, routes, UI, and env/README notes **there**.
+- Do **not** use Read, Glob, Grep, or Bash to explore **\`${repoRoot}\`** except:${docBullet}
+- Do **not** use the **Agent** tool to spider the monorepo or another tree; implement the integration directly in this workspace.
+`;
+}
+
 /**
- * PreToolUse hook: deny Write/Edit/NotebookEdit outside the run dir.
- * `canUseTool` is not reliable under `permissionMode: dontAsk`; hooks receive `permissionDecision` instead.
+ * PreToolUse hook: Write/Edit only under run dir; Read/Glob/Grep/Bash constrained to run dir (+ docs/ when EVAL_LOCAL_DOCS).
+ * `EVAL_DISABLE_WORKSPACE_READ_GUARD=1` — allow Read/Glob/Grep/Bash/Agent outside the sandbox.
+ * `EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1` — allow Write/Edit outside the run directory (read sandbox unchanged unless also disabled above).
  */
-function createRunDirPreToolHook(allowedRootDir: string) {
+function createRunDirPreToolHook(ctx: {
+  allowedRootDir: string;
+  agentCwd: string;
+  runDir: string;
+  repoRoot: string;
+  localDocs: boolean;
+  readGuardOn: boolean;
+  writeGuardOn: boolean;
+}) {
+  const { allowedRootDir, agentCwd, runDir, repoRoot, localDocs, readGuardOn, writeGuardOn } = ctx;
+
   return async (input: HookInput) => {
     if (input.hook_event_name !== "PreToolUse") return {};
-    const candidate = toolInputFilePath(input.tool_name, input.tool_input);
-    if (!candidate) return {};
-    if (filePathIsInsideRunDir(allowedRootDir, candidate)) return {};
-    return {
-      hookSpecificOutput: {
-        hookEventName: "PreToolUse" as const,
-        permissionDecision: "deny" as const,
-        permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${allowedRootDir} (e.g. outpost-quickstart.sh). Refused: ${resolve(candidate)}`,
-      },
-    };
+
+    if (readGuardOn && input.tool_name === "Agent" && !envFlagTruthy(process.env.EVAL_ALLOW_AGENT_TOOL)) {
+      return preToolDeny(
+        "Outpost agent-eval: the Agent subagent is disabled for fair scoring (set EVAL_ALLOW_AGENT_TOOL=1 to allow).",
+      );
+    }
+
+    if (readGuardOn && input.tool_name === "Read") {
+      const raw = toolInputReadFilePath(input.tool_input);
+      if (raw) {
+        const abs = resolveMaybeRelativePath(raw, agentCwd);
+        if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+          return preToolDeny(
+            `Outpost agent-eval: Read must stay under the scenario run directory or (with EVAL_LOCAL_DOCS) ${join(repoRoot, "docs")}. Refused: ${abs}`,
+          );
+        }
+      }
+      return {};
+    }
+
+    if (readGuardOn && input.tool_name === "Glob") {
+      const inp = input.tool_input;
+      if (typeof inp === "object" && inp !== null) {
+        const pathRaw = (inp as Record<string, unknown>).path;
+        if (typeof pathRaw === "string" && pathRaw.length > 0) {
+          const abs = resolveMaybeRelativePath(pathRaw, agentCwd);
+          if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+            return preToolDeny(
+              `Outpost agent-eval: Glob path must stay under the run directory or repo docs/. Refused: ${abs}`,
+            );
+          }
+        }
+      }
+      return {};
+    }
+
+    if (readGuardOn && input.tool_name === "Grep") {
+      const inp = input.tool_input;
+      if (typeof inp === "object" && inp !== null) {
+        const pathRaw = (inp as Record<string, unknown>).path;
+        if (typeof pathRaw === "string" && pathRaw.length > 0) {
+          const abs = resolveMaybeRelativePath(pathRaw, agentCwd);
+          if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+            return preToolDeny(
+              `Outpost agent-eval: Grep path must stay under the run directory or repo docs/. Refused: ${abs}`,
+            );
+          }
+        }
+      }
+      return {};
+    }
+
+    if (readGuardOn && input.tool_name === "Bash") {
+      const inp = input.tool_input;
+      if (typeof inp === "object" && inp !== null) {
+        const cmd = (inp as Record<string, unknown>).command;
+        if (typeof cmd === "string" && cmd.trim().length > 0) {
+          if (!bashCommandAllowed(cmd, runDir, repoRoot, localDocs)) {
+            return preToolDeny(
+              `Outpost agent-eval: Bash must not traverse the Outpost monorepo outside this run (or docs/ when EVAL_LOCAL_DOCS=1). Refused command prefix: ${cmd.slice(0, 120)}${cmd.length > 120 ? "…" : ""}`,
+            );
+          }
+        }
+      }
+      return {};
+    }
+
+    if (writeGuardOn) {
+      const candidate = toolInputWritePath(input.tool_name, input.tool_input);
+      if (candidate && !filePathIsInsideRunDir(allowedRootDir, candidate)) {
+        return preToolDeny(
+          `Outpost agent-eval: ${input.tool_name} must target only the scenario run directory tree. Use a path under ${allowedRootDir}. Refused: ${resolve(candidate)}`,
+        );
+      }
+    }
+    return {};
   };
 }
 
@@ -413,11 +584,14 @@ function defaultEvalTools(env: NodeJS.ProcessEnv): string {
     : "Read,Glob,Grep,WebFetch,Write,Edit,Bash";
 }
 
-/**
- * @param agentWorkspaceCwd — process cwd for the agent (per-run directory, or a subfolder when the scenario defines `agentCwd` in ## Eval harness).
- * @param writeGuardRoot — PreToolUse hook allows Write/Edit only under this path (usually the per-run directory so the clone stays inside it).
- */
-function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Options {
+function buildBaseOptions(ctx: {
+  agentCwd: string;
+  writeGuardRoot: string;
+  runDir: string;
+  repoRoot: string;
+  localDocs: boolean;
+}): Options {
+  const { agentCwd, writeGuardRoot, runDir, repoRoot, localDocs } = ctx;
   const toolsRaw = defaultEvalTools(process.env);
   const allowedTools = toolsRaw
     .split(",")
@@ -432,7 +606,7 @@ function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Op
   const persistSession = process.env.EVAL_PERSIST_SESSION !== "false";
 
   const o: Options = {
-    cwd: agentWorkspaceCwd,
+    cwd: agentCwd,
     allowedTools,
     permissionMode: mode,
     maxTurns: Number.isFinite(maxTurns) ? maxTurns : 80,
@@ -443,9 +617,25 @@ function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Op
     } as Record<string, string | undefined>,
   };
 
-  if (!envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD)) {
+  const readGuardOn = !envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_READ_GUARD);
+  const writeGuardOn = !envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD);
+  if (readGuardOn || writeGuardOn) {
     o.hooks = {
-      PreToolUse: [{ hooks: [createRunDirPreToolHook(writeGuardRoot)] }],
+      PreToolUse: [
+        {
+          hooks: [
+            createRunDirPreToolHook({
+              allowedRootDir: writeGuardRoot,
+              agentCwd,
+              runDir,
+              repoRoot,
+              localDocs,
+              readGuardOn,
+              writeGuardOn,
+            }),
+          ],
+        },
+      ],
     };
   }
 
@@ -504,6 +694,8 @@ Environment:
   EVAL_PERMISSION_MODE  Optional (default: dontAsk)
   EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
   EVAL_DISABLE_WORKSPACE_WRITE_GUARD  Set to 1 to allow Write/Edit outside the run dir (not recommended)
+  EVAL_DISABLE_WORKSPACE_READ_GUARD   Set to 1 to allow Read/Glob/Grep/Bash/Agent outside the run dir (+ docs/ when local)
+  EVAL_ALLOW_AGENT_TOOL               Set to 1 to allow the Agent subagent (default: denied for fair scoring)
   EVAL_SKIP_HARNESS_PRE_STEPS       Set to 1 to skip ## Eval harness preSteps (git_clone, etc.); see scenario markdown
 
 Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets
@@ -554,8 +746,17 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
   }
 
   if (values["dry-run"]) {
+    const localDocs = envFlagTruthy(process.env.EVAL_LOCAL_DOCS);
+    const sampleRun = join(RUNS_DIR, "dry-run-example-scenario");
+    const sampleAgent = join(sampleRun, "app-baseline");
+    const boundarySample = buildWorkspaceBoundaryAppendix(sampleRun, sampleAgent, REPO_ROOT, localDocs);
     console.log("Dry run: would execute", selected.join(", "));
-    console.log("Turn 0 length (chars):", filledTemplate.length);
+    console.log(
+      "Turn 0 base template (chars):",
+      filledTemplate.length,
+      "| + workspace boundary (~chars):",
+      boundarySample.length,
+    );
     process.exit(0);
   }
 
@@ -586,19 +787,35 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     const scenarioMd = await readFile(scenarioPath, "utf8");
     const harnessConfig = parseEvalHarness(scenarioMd);
     const { agentCwd, writeGuardRoot } = await applyEvalHarness(runDir, harnessConfig);
-    const baseOptions = buildBaseOptions(agentCwd, writeGuardRoot);
+    const localDocs = envFlagTruthy(process.env.EVAL_LOCAL_DOCS);
+    const baseOptions = buildBaseOptions({
+      agentCwd,
+      writeGuardRoot,
+      runDir,
+      repoRoot: REPO_ROOT,
+      localDocs,
+    });
+    const turn0Prompt =
+      filledTemplate + buildWorkspaceBoundaryAppendix(runDir, agentCwd, REPO_ROOT, localDocs);
     console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
 
-    activeRunDirForSignal = runDir;
+    const sidecars = harnessSidecarPaths(runDir);
+    activeHarnessAbortContext = { path: sidecars.aborted, runDirectory: runDir };
     await writeFile(
-      join(runDir, "eval-run-started.json"),
+      sidecars.started,
       `${JSON.stringify(
         {
           startedAt: new Date().toISOString(),
           pid: process.pid,
           scenarioFile: file,
           scenarioId: scenarioIdEarly,
-          note: "If you see this without transcript.json, the run may still be in progress, was interrupted (SIGTERM/SIGINT writes eval-aborted.json), crashed, or was SIGKILL’d (no sidecar). The agent phase often logs little until the turn completes.",
+          runDirectory: runDir,
+          harnessSidecars: {
+            started: sidecars.started,
+            failure: sidecars.failure,
+            aborted: sidecars.aborted,
+          },
+          note: "Transcript and score JSON live under runDirectory. Harness *.eval-*.json paths are siblings of the run folder (not inside it) so the agent cannot read eval metadata.",
         },
         null,
         2,
@@ -607,7 +824,7 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     );
 
     try {
-      const result = await runOneScenario(file, filledTemplate, {
+      const result = await runOneScenario(file, turn0Prompt, {
         skipOptional: values["skip-optional"] ?? false,
         baseOptions,
         scenarioMarkdown: scenarioMd,
@@ -665,14 +882,14 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
       const message = err instanceof Error ? err.message : String(err);
       const stack = err instanceof Error ? err.stack : undefined;
       await writeFile(
-        join(runDir, "eval-failure.json"),
-        `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack }, null, 2)}\n`,
+        sidecars.failure,
+        `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack, runDirectory: runDir }, null, 2)}\n`,
         "utf8",
       );
       console.error(`Eval scenario failed (${file}):`, err);
       throw err;
     } finally {
-      activeRunDirForSignal = null;
+      activeHarnessAbortContext = null;
     }
   }
 

From 8ab658f2435f7d83ed161f5bc9846598244dba99 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 10:38:48 +0100
Subject: [PATCH 29/47] docs(agent-evaluation): document sidecars, sandbox, and
 env vars

Describe sibling *.eval-*.json harness files and expanded PreToolUse
permissions (read guard, bash, Agent tool).

Made-with: Cursor
---
 docs/agent-evaluation/README.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
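
Reviewer note (illustrative only, not part of the patch): the Bash rule documented here is a substring check on the command text, so it helps to see which commands it accepts. The sketch below restates the logic of `bashCommandAllowed` from the previous patch under a hypothetical name, with a made-up repository root of `/tmp/outpost`.

```typescript
import { join, resolve } from "node:path";

// Restates the substring logic from the previous patch; the name is hypothetical.
function bashCommandAllowedSketch(
  command: string,
  runDir: string,
  repoRoot: string,
  localDocs: boolean,
): boolean {
  const rr = resolve(repoRoot);
  const rd = resolve(runDir);
  const docRoot: string | null = localDocs ? resolve(join(repoRoot, "docs")) : null;
  if (!command.includes(rr)) return true; // no absolute repo path in the text: allowed
  if (command.includes(rd)) return true; // mentions the run directory: allowed
  if (docRoot && command.includes(docRoot)) return true; // mentions docs/ (local docs only)
  return false;
}

// Hypothetical paths for illustration.
const repoRoot = resolve("/tmp/outpost");
const runDir = join(repoRoot, "docs", "agent-evaluation", "results", "runs", "stamp-scenario-01");

console.log(bashCommandAllowedSketch(`ls ${join(repoRoot, "sdks")}`, runDir, repoRoot, false)); // false
console.log(bashCommandAllowedSketch(`ls ${runDir}`, runDir, repoRoot, false)); // true
console.log(bashCommandAllowedSketch(`ls ${join(repoRoot, "docs", "pages")}`, runDir, repoRoot, true)); // true
// Relative traversal never contains the absolute root, so the check does not see it.
console.log(bashCommandAllowedSketch("cat ../../../../../go.mod", runDir, repoRoot, false)); // true
```

The last case suggests the guard is best read as a deterrent against accidental exploration rather than a hard sandbox; transcripts remain the place to confirm what Bash actually touched.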

diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 3cf6ffb4d..94246c975 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -22,9 +22,11 @@ Each scenario run uses one directory:
 `results/runs/<ISO-stamp>-scenario-NN/`
 
 - **`transcript.json`** — full SDK log (written only **after** the agent finishes all turns — long runs may show little console output until then)
-- **`eval-run-started.json`** — created as soon as a scenario begins (pid, scenario id); if present **without** `transcript.json`, the run was interrupted, is still going, crashed, or was **SIGKILL**’d (no sidecar for SIGKILL)
-- **`eval-failure.json`** — uncaught exception before a transcript was written
-- **`eval-aborted.json`** — **SIGTERM** or **SIGINT** (e.g. stopping the process) before completion
+- **Harness sidecars (siblings of the run folder, not inside it)** — so the agent sandbox cannot read them:
+  - **`<stamp>-scenario-NN.eval-started.json`** — written when the scenario begins (pid, scenario id, paths)
+  - **`<stamp>-scenario-NN.eval-failure.json`** — uncaught exception before `transcript.json`
+  - **`<stamp>-scenario-NN.eval-aborted.json`** — **SIGTERM** / **SIGINT** before completion (not **SIGKILL**)
+  If **`transcript.json`** is missing, check these files next to **`…/runs/<stamp>-scenario-NN/`** (same directory as the run folder, not inside it).
 - **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)
 - **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code.
 
@@ -92,7 +94,12 @@ Two different things get called “permissions”:
 
 2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell).
 
-3. **Run-directory write guard** — a **`PreToolUse`** hook denies **`Write` / `Edit` / `NotebookEdit`** when the target path resolves **outside** the current `results/runs/<stamp>-scenario-NN/` workspace (hooks enforce this under `permissionMode: dontAsk`; `canUseTool` alone does not). Set **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** only for debugging. **`Bash`** can still redirect output outside the run dir; review transcripts if that matters.
+3. **Run-directory sandbox (`PreToolUse`)** — Under `permissionMode: dontAsk`, hooks enforce boundaries (not `canUseTool` alone):
+   - **Write / Edit / NotebookEdit** — target path must resolve under `results/runs/<stamp>-scenario-NN/`. **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** disables this only (debug).
+   - **Read / Glob / Grep** — paths must stay under that same run directory or, when **`EVAL_LOCAL_DOCS=1`**, under **`docs/`** of the Outpost repo (for local MDX/OpenAPI only). **`EVAL_DISABLE_WORKSPACE_READ_GUARD=1`** disables the Read/Glob/Grep/Bash/Agent checks (restores the behavior from before the workspace sandbox).
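+   - **Bash** — a substring check on the command text: commands that contain the absolute path of the Outpost repository root are denied unless they also contain the run-dir path or (with local docs) the **`docs/`** path. Relative paths are not inspected; review transcripts if that matters.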
+   - **Agent** (subagent) — **denied by default** so runs cannot spider the monorepo for “free” SDK context. **`EVAL_ALLOW_AGENT_TOOL=1`** to opt in.
+   - Turn 0 also appends a short **workspace boundary** block (absolute run-dir paths) so the model treats only the clone as the product under integration.
 
 Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials.
 

From cc6e7e05111bb2c7fc0dbce517408571e4b63ad0 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 10:38:56 +0100
Subject: [PATCH 30/47] docs(agent-evaluation): update scenario 01 tracker row

Record 2026-04-10 run, quickstart.sh artifact, execution smoke test, and
sibling harness sidecar layout.

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index aba77c2a5..0ffd50a42 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -20,7 +20,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0.                                                                                                                                                                                                                                                                                                                                                              |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                                                                      |
 | 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                          |
 | 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                           |
 | 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                             |

From cee7ff4dae0b83fb8737571e6a07bebbf41f9688 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 10:59:35 +0100
Subject: [PATCH 31/47] docs(agent-evaluation): update scenario 02 tracker row

Record 2026-04-10 run: heuristic 9/9 pass, LLM fail (script vs Next.js
mismatch), execution pass via outpost-quickstart.ts.

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 0ffd50a42..461926046 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -21,7 +21,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                                                                      |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                          |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T09-39-06-362Z-scenario-02` | Pass (9/9)             | **Fail**  | Pass                       | `EVAL_LOCAL_DOCS=1`. Agent produced a **Next.js app** plus **`outpost-quickstart.ts`**; LLM judge **failed** (`overall_transcript_pass=false`) — expected a minimal single-file script + `npx tsx` story, not a full UI (see `llm-score.json` criteria). Heuristic still 9/9. **Execution:** `npx tsx outpost-quickstart.ts` with run-dir `.env` (`OUTPOST_API_KEY`); tenant/destination/publish succeeded (printed event id). Harness sidecars sibling under `results/runs/`.                                                                                                                                           |
 | 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                           |
 | 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                             |
 | 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                        |

From 60f73f43332f77404f70b229de62456b5cf59406 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 12:08:22 +0100
Subject: [PATCH 32/47] docs: scope-router Outpost agent prompt and refresh
 basics tracker rows
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Restructure hookdeck-outpost-agent-prompt.mdx with Quick path / new minimal
app / existing app guidance, default-to-smallest behavior, language vs
architecture, doc list split, mapping hints, and explicit anti-over-build
rules.

Update SCENARIO-RUN-TRACKER for scenarios 01–03 with recent eval runs
(heuristic, LLM, execution notes, sibling harness sidecars).

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md |  30 ++---
 .../hookdeck-outpost-agent-prompt.mdx         | 103 +++++++++++++-----
 2 files changed, 91 insertions(+), 42 deletions(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 461926046..c043acaa7 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                                                                      |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T09-39-06-362Z-scenario-02` | Pass (9/9)             | **Fail**  | Pass                       | `EVAL_LOCAL_DOCS=1`. Agent produced a **Next.js app** plus **`outpost-quickstart.ts`**; LLM judge **failed** (`overall_transcript_pass=false`) — expected a minimal single-file script + `npx tsx` story, not a full UI (see `llm-score.json` criteria). Heuristic still 9/9. **Execution:** `npx tsx outpost-quickstart.ts` with run-dir `.env` (`OUTPOST_API_KEY`); tenant/destination/publish succeeded (printed event id). Harness sidecars sibling under `results/runs/`.                                                                                                                                           |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`.                                                                                                                                                                                                                                                                                                                                                           |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                             |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                        |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                   |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                     |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                 |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail).                                                  |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`.                                                                                                                                 |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                      |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                 |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                            |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                              |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                          |
 | 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
 
 
 ### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
@@ -40,12 +40,12 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 
 - **TanStack Router:** `frontend/src/routeTree.gen.ts` — register `/_layout/webhooks` (agent added the route file but not the generated tree).
 - **API base URL:** webhooks page used browser-relative `/api/...` against nginx; switched to backend base (`OpenAPI.BASE` / `VITE_API_URL`).
-- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
+- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
 
 **Backend**
 
-- **`POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
-- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
+- **`POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
+- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
 
 **Dashboard UI (webhooks page)**
 
@@ -61,7 +61,7 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
 
 1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
 
 The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
 
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 16f348e09..875bd739b 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -35,59 +35,108 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 
 ### Documentation
 
+**Core (read for every path):**
+
 - Getting started (curl / HTTP only, no SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl
 - TypeScript quickstart (TypeScript SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript
 - Python quickstart (Python SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-python
 - Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
-- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
 - API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
 - **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
+- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
+- SDK overview: {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+
+**When you build customer-facing UI or integrate into an existing product (not for quick path only):**
+
 - **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
 - Destination types: {{DOCS_URL}}/destinations
 - Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
-- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+
+### Scope: choose the right depth (read before you build)
+
+Operators often give **short** answers (“TypeScript example,” “show me in Go”). **You** infer **how much** to build from their words—not from habit, and **not** from the language alone.
+
+**Three paths** (dashboard or chat may use other labels—“try it out,” “small demo app,” “our existing codebase,” or “Option 1 / 2 / 3”—map them to the same three):
+
+1. **Quick path** — Smallest runnable artifact: one **shell script** (curl) or **one source file** run with `npx tsx`, `python`, `go run`, etc., exactly as that language’s **quickstart** describes. No application framework, no multi-route server, no dev-server “project,” unless they clearly asked for an app.
+2. **New minimal application** — They want a **new** small service or UI (pages, forms, a demo they can open in a browser). Use the **official SDK on the server** for whatever stack they name; stay **framework-agnostic** unless they specify a framework—do not impose one.
+3. **Existing application** — They are changing **their current codebase**. Same SDK-on-server rules; integrate on **real** domain paths. Use the **full-stack** guidance in **Existing application (full-stack products)** below when the repo already has customer-facing UI.
+
+**Default when scope is ambiguous:** Prefer **Quick path**. If they only name a language, say “example,” “quickstart,” “try it,” “just show me,” or similar—and they do **not** ask for an app, UI, pages, a server project, or changes **inside their repo**—deliver **only** the quickstart-shaped artifact for that language (or curl if they gave no language). **Brief user messages are normal;** map them to the **smallest** matching path.
+
+**Language ≠ architecture:** **TypeScript**, **Python**, and **Go** select **which quickstart and SDK** to use. They do **not** mean “build a web application.” If they want an app or a full integration, they will signal it (“small dashboard,” “add to our backend,” “we use X in production,” etc.). If scope is still truly unclear, ask **one** clarifying question.
+
+**Do not over-build:**
+
+- **Quick path** → **No** framework scaffold (no app router, no `create-*-app`, no Express/FastAPI/Go HTTP **project** just to demo Outpost). One file or one shell script is enough.
+- **Quick path** → Do **not** default to a large stack because the language was TypeScript or Node; a **single `.ts` file** per the TypeScript quickstart is the right shape unless they asked for more.
+- **New minimal application** → Do **not** ship full portal depth (events UI, retry flows, every destination type) unless they asked for that level; grow into **Building your own UI** when they want customer-grade destination management.
+- **Existing application** → Do **not** stop at a throwaway demo route when they asked for real integration; follow **Minimum integration depth** under that section.
+
+### If the operator said… (mapping hints)
+
+| They said (examples) | Likely path |
+|----------------------|-------------|
+| “Example,” “quickstart,” “fastest,” “simplest,” “just show me,” or **only** a language name with no app/repo context | **Quick path** |
+| “Small app,” “UI,” “page,” “form,” “demo site,” “dashboard” (greenfield, not their production repo) | **New minimal application** |
+| “Our app,” “existing code,” “add to my API,” “integrate into this repo,” “we already run …” | **Existing application** |
+
+Use judgment; when two paths seem possible, prefer **Quick path** unless they clearly want UI or repo integration.
 
 ### Language → SDK vs HTTP
 
-Operators rarely name packages or SDK details. **You** map what they say to the right doc and dependency:
+**You** map their words to the right doc—**after** you have chosen **scope** above.
 
-**“Try it out” — interpret their words**
+- **No language named** + simplest / minimal / “just show me” / no framework → **curl quickstart** + OpenAPI. One runnable shell script. **No SDK.**
+- **TypeScript** or **Node** → **TypeScript quickstart** + **`@hookdeck/outpost-sdk`** as that doc shows. They do not need to say “SDK.”
+- **Python** → **Python quickstart** + **`outpost_sdk`** (e.g. Python `publish.event` uses `request={{...}}`, **not** TypeScript-style object fields passed as Python kwargs).
+- **Go** → **Go quickstart** + official Go SDK as that doc shows.
+- **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI.
 
-- **Simplest / fastest / minimal / least setup / “just show me” / no framework** (and they do **not** name TypeScript, Python, or Go) → treat as **curl**: **curl quickstart** + **OpenAPI** for exact JSON. One runnable shell script is ideal. **No SDK.**
-- **TypeScript** or **Node** → **TypeScript quickstart**; use the **official TypeScript SDK** (`@hookdeck/outpost-sdk`) exactly as that quickstart shows. The user does not need to say “SDK.”
-- **Python** → **Python quickstart**; use **`outpost_sdk`** as that quickstart shows (e.g. Python `publish.event` uses `request={{...}}` — **not** TypeScript-style kwargs on the method).
-- **Go** → **Go quickstart**; use the **official Go SDK** as that quickstart shows.
-- They explicitly want **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI.
+Do **not** mix argument styles across languages (e.g. do not apply TypeScript `publish.event({ ... })` shapes to Python).
 
-Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python).
+### Quick path — how to deliver
 
-**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
+Goal: tenant → **one destination** (often webhook to `{{TEST_DESTINATION_URL}}` / `OUTPOST_TEST_WEBHOOK_URL`) → **publish** → clear success (event id, HTTP 2xx, log line).
 
-**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos. **Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** must either appear under **Configured topics** above **or** be documented for the operator with **“add this topic in the Outpost project”** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path** (CRUD handler, service after commit, job, etc.), not only a “send test event” / synthetic demo route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
+- Default to **curl** when they want the absolute minimum and did not name a language.
+- When they name **TypeScript**, **Python**, or **Go**, produce **only** what that language’s **quickstart** describes—typically **one file** (plus `package.json` / `go.mod` / venv if the quickstart needs it), not a full application tree.
+- Ask only for env vars and details the quickstart still needs.
 
-**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage event destinations** (every **destination type** the project enables—webhook, queues, Hookdeck, etc.; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**, not only via raw API or Swagger:
+### New minimal application
 
-- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, **targets** / config summaries, topics—never the platform API key).
-- **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
-- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint or downstream config—see **Building your own UI** (default: **destination → activity**) for how this links to destinations and to automatic retries in Outpost.
-- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). This is **complementary** to domain publishes: it proves wiring (destination + topic subscription + delivery) without waiting on real traffic. It **does not replace** a `publish` on a real domain path. The test topic can be any **configured** topic; domain publishes should use topics that match the events you document.
-- **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
+When they want a **new** small app (not quick path): use the **official SDK on the server** for **their** stack. **Do not** treat any single framework as the default—follow what they name (or ask once). Prefer each language’s **quickstart** for Outpost call shapes, then add routes/pages as their stack requires.
 
-### What to do
+**Before** designing screens or forms, read **Concepts** and **Building your own UI** (under Documentation): reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not one anonymous webhook field unless they ask for that simplification).
 
-Guide the conversation, then act:
+For a **tiny** demo, keep **tenant** in scope, model **create destination** as **topics + delivery target**, and provide a **separate** way to **publish a test event** so they can verify delivery; avoid one giant form unless they insist. An events / attempts UI is optional for the smallest demo; add it when matching **Building your own UI**.
 
-1. **Try it out** — Minimal path: tenant → **one destination** (often a webhook for quick verification) → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
+### Existing application
 
-2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
+Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they refuse SDKs). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, entities, workflows), not only in throwaway demos.
 
-3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure—**including test publish**, an **events** list (and attempts / **retry** where appropriate), unless the operator explicitly asks to omit parts. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
+**Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** appears under **Configured topics** above **or** the operator is told to **add that topic in the Outpost project** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path**, not only a synthetic “test event” route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
+
+### Existing application (full-stack products)
+
+If the codebase already has **customer-facing UI** (dashboard, settings, integrations) **or** a client that talks to **your** API, operators usually want customers to **manage destinations** (every **destination type** the project enables; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**:
+
+- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits, **publish** on real domain events, and add **authenticated routes** (BFF, server handlers, or server actions, whichever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics), **never** the platform API key.
+- **Frontend:** **Logged-in** clients call **your** backend (session, JWT, existing API client)—**not** Hookdeck’s API directly; **not** the Outpost SDK in the browser. Reuse their design system and routing. **Before** building screens, read **Concepts** and **Building your own UI**: **tenant scope**, **multiple destinations**, **destination = topics + delivery target** (avoid one undifferentiated “webhook” field unless they want that simplification).
+- **Events and retries:** Surface **events** (filter by **destination** when useful) and **attempts** per event; offer **manual retry** for failed attempts (server-side retry API with `event_id` and `destination_id`) after the customer fixes their endpoint or downstream config. See **Building your own UI** (default: **destination → activity**).
+- **Test publish (recommended when shipping destination UI):** A **separate** control that **publishes a test event** for the signed-in tenant (server-side `publish` to a configured topic). Complementary to domain publishes; **does not replace** a real domain `publish`.
+- **API-only products:** Document how tenants manage destinations via **your** API; keep the platform key on the server.
+
+### What to do
 
-For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. For **Option 3** with a UI, also read **Building your own UI** before implementing destination-management screens.
+1. **Infer scope** from **Scope** + **If the operator said…** (default **Quick path** when unclear).
+2. **Map language** under **Language → SDK vs HTTP**.
+3. **Execute** the matching section: **Quick path**, **New minimal application**, or **Existing application** (+ **full-stack** subsection when applicable).
+4. Read the **single** language-appropriate quickstart (and OpenAPI for raw HTTP) before coding. For existing apps with UI, read **Building your own UI** before destination-management screens.
 
 ### Before you stop (verify)
 
-Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip the existing-repo items for a standalone script or curl-only flow).
+Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip existing-repo items for a standalone script or curl-only flow).
 
 **Always (when you produced or changed runnable code):**
 
@@ -95,14 +144,14 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 - [ ] **Secrets:** The platform Outpost API key remains **server-side** / **environment** only — not in client bundles, not hard-coded in committed source.
 - [ ] **Repeatable:** Env vars, how to run, and how to verify with the test destination above are stated briefly (README, comments, or chat — match the task size; a one-file script may need only inline or chat notes).
 
-**When editing an existing application repository (Option 3 or equivalent):**
+**When editing an existing application repository (Existing application or equivalent):**
 
 - [ ] **Topic reconciliation:** Every **`topic`** in `publish` is either in **Configured topics** above **or** README/chat tells the operator exactly which topics to **add in Hookdeck**—**domain-first**; do not retarget real features to wrong topic names to match an incomplete **Configured topics** list unless the operator explicitly asked for a minimal demo scope.
 - [ ] **Domain publish:** At least one **`publish` on a real application path** (entity create/update, signup, etc.), not solely a synthetic “test event” endpoint—unless the operator explicitly scoped the task to wiring-only.
 - [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
 - [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
 
-**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
+**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat, so they can run, diff, and commit. Keep everything for one task in the same directory. For a **new minimal application**, scaffold and install dependencies as you normally would (`npm` / `npx`, `go mod`, `pip` or `uv`). For **existing** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes customer-facing UI; do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
 
 **Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
 ```

From 33653ddb3f9e03104e918fc4d11bc17b11f0c402 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 12:52:57 +0100
Subject: [PATCH 33/47] fix(api): add DestinationSchemaField.key to OpenAPI
 spec

The API and registry metadata always returned key on config_fields and
credential_fields; the published schema omitted it and examples did not
validate against the corrected shape. Align DestinationSchemaField and
embedded destination-types examples with the wire format.

Made-with: Cursor
---
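
Note for reviewers: a minimal sketch of how the `key` property added to `DestinationSchemaField` might be used when assembling a create/update body. It assumes the form values arrive already split into `config` and `credentials` dicts. `WEBHOOK_TYPE`, `build_section`, and `build_destination_body` are hypothetical names for illustration, not part of any Outpost SDK, and the metadata literal is trimmed to `key`, `type`, and `required`.

```python
from typing import Any

# Hypothetical, trimmed destination-type metadata in the wire (snake_case) shape.
WEBHOOK_TYPE = {
    "config_fields": [{"key": "url", "type": "text", "required": True}],
    "credential_fields": [{"key": "secret", "type": "text", "required": False}],
}


def build_section(fields: list[dict[str, Any]], values: dict[str, Any]) -> dict[str, Any]:
    """Place each submitted value under the field's `key`.

    Raises ValueError when a field marked required has no value.
    """
    section: dict[str, Any] = {}
    for field in fields:
        key = field["key"]
        value = values.get(key)
        if value not in ("", None):
            section[key] = value
        elif field.get("required"):
            raise ValueError(f"missing required field: {key}")
    return section


def build_destination_body(dest_type: dict[str, Any], form: dict[str, Any]) -> dict[str, Any]:
    """Build the `config` / `credentials` part of a create or update request body."""
    return {
        "config": build_section(dest_type["config_fields"], form.get("config", {})),
        "credentials": build_section(dest_type["credential_fields"], form.get("credentials", {})),
    }
```

Driving the body from the metadata rather than from hard-coded field names is the point of the schema change: the same function would handle any destination type whose fields carry a `key`.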
 docs/apis/openapi.yaml      | 24 +++++++++++++++++++++++-
 docs/pages/destinations.mdx |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/docs/apis/openapi.yaml b/docs/apis/openapi.yaml
index 8f944fed2..95047557f 100644
--- a/docs/apis/openapi.yaml
+++ b/docs/apis/openapi.yaml
@@ -2008,8 +2008,14 @@ components:
             $ref: "#/components/schemas/DestinationSchemaField"
     DestinationSchemaField:
       type: object
-      required: [type, required]
+      required: [type, required, key]
       properties:
+        key:
+          type: string
+          description: >-
+            Property name for this value inside the destination `config` or `credentials` object
+            on create/update (for example `url` for a webhook endpoint URL).
+          example: "url"
         type:
           type: string
           enum: [text, checkbox, key_value_map, select]
@@ -3688,6 +3694,7 @@ paths:
                       instructions: "Enter the URL..."
                       config_fields: [
                           {
+                            key: "url",
                             type: "text",
                             label: "URL",
                             description: "The URL to send the webhook to.",
@@ -3697,6 +3704,7 @@ paths:
                         ]
                       credential_fields: [
                           {
+                            key: "secret",
                             type: "text",
                             label: "Secret",
                             description: "Optional signing secret.",
@@ -3711,30 +3719,35 @@ paths:
                       config_fields:
                         [
                           {
+                            key: "brokers",
                             type: "text",
                             label: "Brokers",
                             description: "Comma-separated list of Kafka broker addresses.",
                             required: true,
                           },
                           {
+                            key: "topic",
                             type: "text",
                             label: "Topic",
                             description: "The Kafka topic to publish messages to.",
                             required: true,
                           },
                           {
+                            key: "tls",
                             type: "checkbox",
                             label: "TLS",
                             description: "Enable TLS for the connection.",
                             default: "true",
                           },
                           {
+                            key: "partition_key_template",
                             type: "text",
                             label: "Partition Key Template",
                             description: "JMESPath template to extract the partition key from the event payload.",
                             required: false,
                           },
                           {
+                            key: "sasl_mechanism",
                             type: "select",
                             label: "SASL Mechanism",
                             description: "SASL authentication mechanism.",
@@ -3749,12 +3762,14 @@ paths:
                       credential_fields:
                         [
                           {
+                            key: "username",
                             type: "text",
                             label: "Username",
                             description: "SASL username for authentication.",
                             required: true,
                           },
                           {
+                            key: "password",
                             type: "text",
                             label: "Password",
                             description: "SASL password for authentication.",
@@ -3770,12 +3785,14 @@ paths:
                       config_fields:
                         [
                           {
+                            key: "queue_url",
                             type: "text",
                             label: "Queue URL",
                             description: "The URL of the SQS queue.",
                             required: true,
                           },
                           {
+                            key: "endpoint",
                             type: "text",
                             label: "Endpoint",
                             description: "Optional custom AWS endpoint URL.",
@@ -3785,6 +3802,7 @@ paths:
                       credential_fields:
                         [
                           {
+                            key: "key",
                             type: "text",
                             label: "Key",
                             description: "AWS Access Key ID.",
@@ -3792,6 +3810,7 @@ paths:
                             sensitive: true,
                           },
                           {
+                            key: "secret",
                             type: "text",
                             label: "Secret",
                             description: "AWS Secret Access Key.",
@@ -3799,6 +3818,7 @@ paths:
                             sensitive: true,
                           },
                           {
+                            key: "session",
                             type: "text",
                             label: "Session",
                             description: "Optional AWS Session Token.",
@@ -3843,6 +3863,7 @@ paths:
                     # remote_setup_url is optional, omitted here
                     config_fields: [
                         {
+                          key: "url",
                           type: "text",
                           label: "URL",
                           description: "The URL to send the webhook to.",
@@ -3852,6 +3873,7 @@ paths:
                       ]
                     credential_fields: [
                         {
+                          key: "secret",
                           type: "text",
                           label: "Secret",
                           description: "Optional signing secret.",
diff --git a/docs/pages/destinations.mdx b/docs/pages/destinations.mdx
index 4280108aa..7936153c0 100644
--- a/docs/pages/destinations.mdx
+++ b/docs/pages/destinations.mdx
@@ -59,6 +59,7 @@ For example, for the `webhook` type:
   "remote_setup_url": null,
   "config_fields": [
     {
+      "key": "url",
       "type": "text",
       "label": "URL",
       "description": "The URL to send the event to",

From e7d220964d7c187ac9afc6c4da200a7d0f409dec Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 18:27:50 +0100
Subject: [PATCH 34/47] docs: refine Building your own UI guide and onboarding
 agent prompt

Rebalance audience and IA (SDK-first server usage, wire JSON in later sections).
Shorten prompt invariants with links; align with integration checklist.

Made-with: Cursor
---
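
Note for reviewers: a minimal sketch of the "translate once on the server" option described in the new wire-JSON section, assuming a backend that holds camelCase metadata (as a TypeScript-style SDK might expose it) and wants to return OpenAPI-style snake_case to the browser. `to_snake` and `normalize_keys` are hypothetical helpers, not part of any Outpost SDK. The regex is naive: it would split acronyms letter by letter and would also rename user-supplied keys inside nested values, so it is only suitable for metadata such as destination types.

```python
import re
from typing import Any


def to_snake(name: str) -> str:
    """Convert a camelCase property name to snake_case; snake_case passes through."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize_keys(value: Any) -> Any:
    """Recursively rename dict keys to snake_case; leaves non-dict values untouched."""
    if isinstance(value, dict):
        return {to_snake(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value
```

Whichever shape the server settles on, the rendering code should read exactly that shape; mixing `configFields` and `config_fields` across layers is the mismatch the guide warns about.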
 docs/pages/guides/building-your-own-ui.mdx    | 37 ++++++++++++++++---
 .../hookdeck-outpost-agent-prompt.mdx         | 20 +++++++++-
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index fd8496b76..a2d1cd28c 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -4,17 +4,21 @@ title: "Building Your Own UI"
 
 While Outpost offers a Tenant User Portal, you may want to build your own UI so your customers can manage their destinations and view delivery activity.
 
+This page is for **teams shipping that experience**—usually product engineers and anyone designing settings, integrations, or support tooling around webhooks and other destination types. It is framework-agnostic: screens, flows, and how they map to Outpost. If you use an **AI coding assistant** with Hookdeck’s optional [integration prompt](/docs/quickstarts/hookdeck-outpost-agent-prompt), that document carries workflow-specific instructions; this guide stays focused on what your **customers** should see and what your **backend** should enforce.
+
 The portal uses the same Outpost API you can call from your product. Its source is a useful reference ([`internal/portal`](https://github.com/hookdeck/outpost/tree/main/internal/portal), React); you are not required to match its stack.
 
-This guide is framework-agnostic. It describes screens, flows, and how they map to the API. For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
+For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
+
+**Prefer official SDKs on the server** where Hookdeck provides them for your backend language—see the [SDK overview](/docs/sdks) and the **curl**, **TypeScript**, **Python**, or **Go** quickstart in this documentation for runnable examples. The SDKs wrap the same API: less boilerplate, typed clients, and fewer raw HTTP mistakes. Use **OpenAPI** as the contract for **wire JSON** (especially when your browser or BFF returns JSON that should match the HTTP API), for generated clients, or when you integrate from a stack without a first-party SDK.
 
 ### Working from OpenAPI
 
-Each screen should map to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows.
+Map each surface in your product to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows, and implement those operations with the **official SDK** on your backend when available.
 
-Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples.
+Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples. Each field object includes a **`key`**: the property name inside the destination’s `config` or `credentials` object (for example `url` for a webhook). This is documented on **`DestinationSchemaField`** in [OpenAPI](/docs/api).
 
-If the browser calls Outpost directly, use the tenant JWT flows documented in OpenAPI. If you proxy through your backend (often called a BFF), your server performs the same operations with your session and injects `tenant_id` where the admin-key flows require it.
+Whether the browser uses a **tenant JWT** or talks only to **your** API, the operations are the ones in OpenAPI; see [Authentication](#authentication) for how credentials and `tenant_id` are applied.
 
 The portal shows full UI code for complex forms; this page avoids long framework-specific snippets so the spec stays the single place for shapes and validation.
 
@@ -71,6 +75,12 @@ You can issue a tenant JWT for client-side calls to Outpost, or proxy requests t
 
 Proxying is useful when you want to restrict which Outpost features are exposed or to keep the admin key off the client entirely.
 
+### Browser, your API, and Outpost (BFF pattern)
+
+In a typical **backend-for-frontend** arrangement, the customer’s browser calls **your** product API only. Your servers call Outpost with the **platform** API key and the correct **`tenant_id`** for the signed-in account. Teams refer to this as a **BFF**, an **Outpost proxy**, or a server-side integration layer—the pattern is the same.
+
+The alternative is for the browser to call Outpost **directly** using a short-lived **tenant JWT** ([Generating a JWT Token](#generating-a-jwt-token-optional) below). Many products prefer a proxy so the admin key never ships to the client and so they can limit which Outpost capabilities the UI may invoke.
+
 ### API base URL (managed and self-hosted)
 
 Use one configurable base URL for Outpost (no trailing slash), for example `API_URL` or `OUTPOST_API_BASE_URL`. Paths in this guide match [OpenAPI](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
@@ -106,10 +116,22 @@ Each entry typically includes (confirm names and optionality in OpenAPI):
 
 ### Dynamic field shape (for forms)
 
-Field objects are fully described in OpenAPI. Typically each has `key`, `label`, `type` (text vs checkbox), `required`, optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive` (password-style; values may be masked after create—clear to edit).
+Field objects are fully described in OpenAPI (`DestinationSchemaField`), including **`key`** (where to place the value in `config` / `credentials` on create/update). Each field has `label`, `type` (text vs checkbox vs select vs key-value map), `required`, optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive` (password-style; values may be masked after create—clear to edit). On submit, map each value to the **`key`** Outpost expects inside `config` / `credentials`, regardless of how property names were transformed earlier in your stack—see [Wire JSON, SDK responses, and your UI](#wire-json-sdk-responses-and-your-ui).
 
 **Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx) maps schema fields to inputs.
 
+### Wire JSON, SDK responses, and your UI
+
+This section matters whether you use an **official SDK** on the server (recommended when available) or raw HTTP: the **HTTP API** always follows [OpenAPI](/docs/api), while SDKs present language-native types to your backend code.
+
+HTTP responses from Outpost on the wire use JSON property names that match OpenAPI—typically **snake_case** (for example `config_fields`, `credential_fields`, and `remote_setup_url` on `GET /destination-types`).
+
+Official **SDKs** deserialize into language-native structures; names often differ from the wire format (for example TypeScript may expose **camelCase** such as `configFields` and `credentialFields`). Mutations use each SDK’s documented request types, which may not mirror OpenAPI field names literally.
+
+When a **browser** loads destination-type metadata via **your** backend, it receives whatever JSON your server returns. Options include forwarding the **raw** Outpost response body (so the client matches OpenAPI) or translating once on the server and treating that as your product’s contract. In all cases, create and update bodies must still place each value under the schema field’s **`key`** inside `config` and `credentials` as defined in OpenAPI.
+
+**Shape mismatches** between layers often appear as missing dynamic fields or create errors referencing absent `config.*` keys (for example `config.url` for webhooks). Comparing the **actual** JSON your UI receives with the property names your rendering code expects (`config_fields` versus `configFields`, and similar) usually isolates the problem.
+
 ### Remote setup URL
 
 When `remote_setup_url` is present, you can link users through an external setup flow (for example Hookdeck-managed configuration) instead of only inline fields.
@@ -181,12 +203,15 @@ This section connects what your customers see (what was delivered, what failed,
 
 ## Implementation checklists
 
-These are readiness checks: they do not replace the tables above or OpenAPI. Use them to confirm nothing important was skipped before ship or when reviewing an implementation.
+Use these lists before launch, in design or code review, or when comparing your tenant experience to the patterns above. They do not replace OpenAPI, security review, or testing against your deployment.
+
+For **customer-facing** destination and delivery UI, work through **Planning and contract**, **Destinations experience**, and **Activity, attempts, and retries** at minimum. Skip rows that clearly do not apply (for example, if you only expose destinations through your own API and have no in-app activity screens—document how customers verify delivery instead).
 
 ### Planning and contract
 
 - [ ] Every call is scoped to the correct tenant (`tenant_id` on admin-key routes, or tenant inferred from JWT).
 - [ ] Outpost base URL comes from configuration or environment for dev, staging, and production (not a single hardcoded host in app code).
+- [ ] Server-side Outpost calls use an **official SDK** when Hookdeck ships one for your language; raw HTTP or generated OpenAPI clients are fine when they fit better.
 - [ ] You chose an auth approach (browser JWT, server-side proxy/BFF, or mix) and use the matching OpenAPI operations and headers consistently.
 - [ ] Dynamic destination UI (labels, icons, form fields) is driven by `GET /destination-types`, not copied field lists from examples.
 
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 875bd739b..f8551a81e 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -46,6 +46,8 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
 - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
 - SDK overview: {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
 
+**SDK vs OpenAPI (BFF / dashboard UI):** **Prefer the official server SDK** when Hookdeck provides one for the repo’s backend language (**{{DOCS_URL}}/sdks**). Keep these invariants: (1) **Wire JSON** matches **OpenAPI** (often **snake_case**), while **SDKs** rename fields in language-native types (e.g. TypeScript **camelCase**). (2) **Browser** code must expect exactly the JSON shape your BFF returns: either forward the raw `GET /destination-types` body unchanged, or **normalize** once on the server and treat that shape as the contract. (3) On create/update, each value goes under its schema field’s **`key`** inside `config` / `credentials` per OpenAPI. **Calling** Outpost: use **SDK** types when using the SDK; use **OpenAPI** for raw `fetch` / curl. Detail: **{{DOCS_URL}}/guides/building-your-own-ui#authentication** and **{{DOCS_URL}}/guides/building-your-own-ui#wire-json-sdk-responses-and-your-ui**.
+
 **When you build customer-facing UI or integrate into an existing product (not for quick path only):**
 
 - **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
@@ -121,7 +123,7 @@ Use the **official SDK for the repo’s backend language** on the **server** (or
 
 If the codebase already has **customer-facing UI** (dashboard, settings, integrations) **or** a client that talks to **your** API, operators usually want customers to **manage destinations** (every **destination type** the project enables; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**:
 
-- **Backend:** **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. **Tenant** upsert/sync where it fits, **publish** on real domain events, and **authenticated routes** (BFF, server handlers, server actions—whatever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics—**never** the platform API key).
+- **Backend:** **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. **Tenant** upsert/sync where it fits, **publish** on real domain events, and **authenticated routes** (backend-for-frontend / BFF, server handlers, server actions—whatever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics—**never** the platform API key).
 - **Frontend:** **Logged-in** clients call **your** backend (session, JWT, existing API client)—**not** Hookdeck’s API directly; **not** the Outpost SDK in the browser. Reuse their design system and routing. **Before** building screens, read **Concepts** and **Building your own UI**: **tenant scope**, **multiple destinations**, **destination = topics + delivery target** (avoid one undifferentiated “webhook” field unless they want that simplification).
 - **Events and retries:** Surface **events** (filter by **destination** when useful) and **attempts** per event; offer **manual retry** for failed attempts (server-side retry API with `event_id` and `destination_id`) after they fix downstream—see **Building your own UI** (default **destination → activity**).
 - **Test publish (recommended when shipping destination UI):** A **separate** control that **publishes a test event** for the signed-in tenant (server-side `publish` to a configured topic). Complementary to domain publishes; **does not replace** a real domain `publish`.
@@ -151,9 +153,13 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 - [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
 - [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
 
+**When you added or changed customer-facing destination management in an existing full-stack product** (dashboard, settings, or integrations UI—per **Existing application (full-stack products)** above):
+
+- [ ] **Full-stack UI bar:** Walked **Planning and contract**, **Destinations experience**, and **Activity, attempts, and retries** in **{{DOCS_URL}}/guides/building-your-own-ui#implementation-checklists** and confirmed the implementation matches: list rows reach **detail** and **destination-scoped activity** (events → attempts → manual retry as appropriate), **dynamic** create (and edit if you expose it) is driven by **`GET /destination-types`** (including each field’s **`key`** in `config` / `credentials`), and a **separate server-side test publish** control exists when customers can manage destinations. *Skip this item if the product is **API-only** (no customer UI for destinations) or the operator explicitly excluded activity / test UI—then document verification instead (README, Outpost dashboard, or curl to list events/attempts).*
+
 **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **new minimal application**, scaffold and install dependencies as you normally would (`npm` / `npx`, `go mod`, `pip` or `uv`). For **existing** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes customer-facing UI—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
 
-**Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+**Concepts:** Read **{{DOCS_URL}}/concepts** for tenants, destinations as subscriptions, topics, and how **publish** fans out. Use **{{DOCS_URL}}/guides/building-your-own-ui** for recommended screens and implementation checklists. The **Configured topics** section above lists this project’s topic names (as set in the dashboard) and explains **`user.*`** naming semantics.
 ```
 
 ## Placeholder reference
@@ -166,6 +172,16 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 | `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
 | `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
 
+### Building your own UI — where the detail lives
+
+Product guidance is consolidated in **[Building your own UI](/docs/guides/building-your-own-ui)**:
+
+- **[Implementation checklists](/docs/guides/building-your-own-ui#implementation-checklists)** — ship/review rows for destinations and activity (referenced from **Before you stop (verify)** in the template above; not duplicated here).
+- **[Authentication](/docs/guides/building-your-own-ui#authentication)** — browser vs your API vs Outpost (**BFF** pattern) and JWT option.
+- **[Wire JSON, SDK responses, and your UI](/docs/guides/building-your-own-ui#wire-json-sdk-responses-and-your-ui)** — snake_case wire vs SDK names, `key` in `config` / `credentials`, shape mismatches.
+
+That page is written for **teams integrating Outpost** (engineers, PMs, reviewers). **Agent evaluation** in the Outpost repository (`docs/agent-evaluation/scenarios/`, scenarios **8–10** for existing-app baselines) uses the same implementation checklist when a run includes **customer-facing** destination UI—see each scenario’s success criteria for links.
+
 ## Operator checklist (dashboard UI)
 
 - Show **API base URL** and **topics** next to the copyable prompt.
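
Note on the wire JSON guidance in this patch: the sketch below illustrates the rule that the wire format is snake_case, a UI contract may be camelCase, and create bodies place each value under the schema field's `key` inside `config` and `credentials`. Only `config_fields`, `credential_fields`, `remote_setup_url`, `key`, `config`, and `credentials` come from the docs; the `type` and `label` properties, the interface names, and both helper functions are assumptions made for illustration and are not the SDK's API.

```typescript
// Hypothetical shapes: only the snake_case field names and `key` come from the guide.
interface SchemaField {
  key: string;
  label?: string; // assumed display property
}

interface WireDestinationType {
  type: string; // assumed identifier property
  config_fields: SchemaField[];
  credential_fields: SchemaField[];
  remote_setup_url?: string;
}

interface UiDestinationType {
  type: string;
  configFields: SchemaField[];
  credentialFields: SchemaField[];
  remoteSetupUrl?: string;
}

// Translate once on the server so the browser sees a single, stable contract.
function toUiDestinationType(wire: WireDestinationType): UiDestinationType {
  return {
    type: wire.type,
    configFields: wire.config_fields,
    credentialFields: wire.credential_fields,
    remoteSetupUrl: wire.remote_setup_url,
  };
}

// Copy each submitted value under its schema `key`; ignore values with no matching field.
function pickByKey(
  fields: SchemaField[],
  values: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  fields.forEach((field) => {
    if (field.key in values) {
      out[field.key] = values[field.key];
    }
  });
  return out;
}

// Build the `config` / `credentials` portion of a create or update body.
function buildCreateBody(
  destinationType: UiDestinationType,
  values: Record<string, string>,
): { type: string; config: Record<string, string>; credentials: Record<string, string> } {
  return {
    type: destinationType.type,
    config: pickByKey(destinationType.configFields, values),
    credentials: pickByKey(destinationType.credentialFields, values),
  };
}
```

If a form built this way produces an empty `config`, the usual cause is the shape mismatch the guide describes: the rendering code read `configFields` from a payload that still used `config_fields`, or the reverse.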

From 8f240ec1ce3293e3b428fe7a553d317b3316818c Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 18:27:58 +0100
Subject: [PATCH 35/47] =?UTF-8?q?docs(eval):=20tighten=20scenarios=2008?=
 =?UTF-8?q?=E2=80=9310=20and=20transcript=20heuristics?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Stricter success criteria with guide/prompt references; align placeholders.
Add heuristic checks for activity and test-publish signals where applicable.

Made-with: Cursor
---
 .../fixtures/placeholder-values-for-turn0.md  |  8 ++---
 .../scenarios/08-integrate-nextjs-existing.md | 12 +++++--
 .../09-integrate-fastapi-existing.md          | 10 +++---
 .../scenarios/10-integrate-go-existing.md     | 10 ++++--
 docs/agent-evaluation/src/score-transcript.ts | 34 +++++++++++++++++++
 5 files changed, 61 insertions(+), 13 deletions(-)
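
Note on the transcript heuristics: the two signals this patch adds to `scoreScenario08` can be exercised in isolation with the sketch below. The regular expressions are copied from the diff; the `Check` interface, the `heuristicChecks` wrapper, and the shortened `detail` strings are assumptions made so the example is self-contained, and they do not reproduce the real `TranscriptScore` type.

```typescript
// Hypothetical stand-in for one entry of the scorer's checks array.
interface Check {
  id: string;
  pass: boolean;
  detail: string;
}

// Patterns copied from the patch to score-transcript.ts.
const ACTIVITY_PATTERN =
  /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i;
const OUTPOST_CONTEXT_PATTERN = /(outpost|destination|tenant)/i;
const TEST_PUBLISH_PATTERN =
  /(test\s*publish|publish\s*test|send\s*test\s*event|\/api\/.*test|test.?event)/i;

// Run both keyword heuristics against a transcript string.
function heuristicChecks(transcript: string): Check[] {
  // Activity only counts when it appears alongside Outpost-related vocabulary.
  const activity =
    ACTIVITY_PATTERN.test(transcript) && OUTPOST_CONTEXT_PATTERN.test(transcript);
  const testPublish = TEST_PUBLISH_PATTERN.test(transcript);
  return [
    {
      id: "delivery_activity_signals",
      pass: activity,
      detail: activity ? "activity signal found" : "no activity signal",
    },
    {
      id: "separate_test_publish_signal",
      pass: testPublish,
      detail: testPublish ? "test publish signal found" : "no test publish signal",
    },
  ];
}
```

These are keyword signals rather than proof of behavior: a transcript that merely mentions a retry next to the word "tenant" passes the first check, which is why the scenarios pair the heuristics with an LLM judge and a manual execution row.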

diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index 2336f6352..f17f94ce6 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -2,11 +2,11 @@
 
 The **prompt template itself** lives in one place only:
 
-**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
+**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`). Copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
 
-Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
+Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
 
-For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner only needs **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. To score a **full** eval (generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links.
+For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner only needs **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. To score a **full** eval (generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links.
 
 ---
 
@@ -26,4 +26,4 @@ For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), t
 
 ## Dashboard implementation note
 
-When this text is embedded in the Hookdeck product, the **same** template body should be rendered from one dashboard/backend source so docs and product stay aligned. The MDX page in this repo is the documentation **canonical** copy until product source is wired to match it.
+When this text is embedded in the Hookdeck product, the **same** template body should be rendered from one dashboard/backend source so docs and product stay aligned. The MDX page in this repo is the documentation **canonical** copy until product source is wired to match it.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 9471a654c..74ad08253 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -54,17 +54,23 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 **Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
 
+**Contract:** The baseline ships a **customer-facing dashboard**. Treat it like **Existing application (full-stack products)** in [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). The detailed UI bar is **not** repeated here; use **[Building your own UI: Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists)** (*Planning and contract*, *Destinations experience*, *Activity, attempts, and retries*). The agent must self-verify with **Before you stop (verify)** in the same prompt (full-stack UI item).
+
 - Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
 - **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
 - **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in the **configured project list** from onboarding, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
-- At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
-- **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
+- **Domain publish:** At least one **`publish` on a real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route.
+- **Separate test publish:** A **distinct** server-side control (button, action, or route) that publishes a **test** event for the signed-in tenant—**in addition to** domain publish; does **not** satisfy the domain-publish requirement by itself (see prompt).
+- **Full-stack destination + activity UI:** Customers can **drill into** a destination (detail or edit—per product policy), reach **destination-scoped activity** (events / attempts / manual retry for failures) via **your** authenticated routes, and **create** destinations using **dynamic** fields from **`GET /destination-types`** (each field’s **`key`** → `config` / `credentials`). **List rows** link or navigate into that flow—not **only** create + delete with no detail or activity. Omit sub-items only if Turn 1 explicitly scoped **backend-only** or excluded activity UI (then document how operators verify delivery instead).
+- **Per-customer webhook** story: **tenant ↔ customer** mapping is consistent for publish and destination APIs.
 - README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Optionally also run a test publish. Smoke from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Exercise **test publish** and **activity / retry** in the UI when present. Smoke from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
 
 ## Failure modes to note
 
 - Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
+- **List-only** destinations (no drill-down to detail or destination-scoped activity) while the baseline still has a product dashboard—unless the user explicitly scoped backend-only.
+- **No separate test publish** when customers can manage destinations from the UI.
 - Publishing only from a demo or **test-only** route with no domain path.
 - **Topics** in code with no README telling the operator to **add** them in Hookdeck when the onboarding topic list was incomplete (or silently retargeting domain logic to unrelated configured names).
 - Calling Outpost from client components with secrets.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index c24787d0a..bd171fb3c 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -59,15 +59,16 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge (reads this section); execution manual.
 
+**Contract:** Same full-stack bar as scenario **8**, pinned to this template. **Canonical checklist:** [Building your own UI: Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists). **Agent self-verify:** [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) → *Before you stop (verify)* (full-stack UI item). Do not duplicate checklist rows in transcripts; confirm against the guide.
+
 - **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
 - **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
-- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **also** expected for this baseline so operators can smoke-test wiring without waiting on production traffic—it **does not** replace the domain publish requirement.
+- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **required** for this baseline—it **does not** replace the domain publish requirement.
 - API key from **environment** or secure backend settings only — not hard-coded, not exposed via **`NEXT_PUBLIC_*`**, **`VITE_*`**, or other client-visible env patterns.
 - **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs the **configured project topic list** from onboarding are resolved by **adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
-- **Destinations + tenant:** Per-customer (or per-team) **destination** management is **documented** and, where this template ships a dashboard, implemented with **safe** UI or BFF routes (list/create/edit as appropriate). **`tenant_id`** (or equivalent) is consistent between publish and destination APIs.
-- **Delivery visibility (full-stack bar):** Because this baseline includes a **customer-facing UI**, the product should expose **event activity** aligned with [Building your own UI](../../pages/guides/building-your-own-ui.mdx): customers can see **events** (e.g. filterable by destination), **attempts** for a selected event, and **manual retry** for failed deliveries—all via **your** authenticated backend calling Outpost (admin key server-side), not from the browser with the platform key. Omit only if the user explicitly scoped the task to **backend-only** or excluded activity UI.
+- **Destinations + tenant:** Per-customer (or per-team) **destination** management via **authenticated** UI or BFF routes: **list**, **create**, and **drill-down** (detail and **destination-scoped activity**—events, attempts, **manual retry**). **Dynamic** forms from **`GET /destination-types`** with correct **`key`** → `config` / `credentials`. **`tenant_id`** is consistent between publish and destination APIs. Omit drill-down / activity only if Turn 1 scoped **backend-only** or excluded activity UI (document verification instead).
 - **Operator docs:** Root **README**, **backend/README**, **development.md**, or **`.env.example`** (whichever the template uses) lists **Outpost env vars** and how to run and verify.
-- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Optionally exercise test publish and activity/retry in the UI. *Skip for transcript-only.*
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Exercise **test publish** and **activity / retry** in the UI when in scope. *Skip for transcript-only.*
 
 ## Failure modes to note
 
@@ -76,6 +77,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 - Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*`, `VITE_*`, or other client bundles.
 - **Only** test/synthetic publish with no domain hook, or **only** domain publish with no **separate** test-publish control when a dashboard is in scope.
 - **No** events/attempts/retry surfaced for customers when the baseline includes a product UI and the user did not ask to skip that scope.
+- **Flat list** of destinations with no navigation to **detail** or **per-destination activity** (same as scenario 8 failure mode).
 
 ## Future baselines
 
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index 7daab8da6..c9ab15366 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -51,14 +51,20 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
+**Contract:** This baseline is an **API-first** Go service (no first-party customer dashboard in the pin). It does **not** inherit the full **[Building your own UI](../../pages/guides/building-your-own-ui.mdx)** dashboard checklist wholesale—agents follow **[Existing application](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx#existing-application)** (minimum integration depth) plus **API-only** guidance in **Existing application (full-stack products)** (*Document how tenants manage destinations via **your** API*). If a future pin adds a UI, scenarios should be updated to require the **Implementation checklists** linked above.
+
 - **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
 - **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path—not only a test-only route unless wiring-only scope was agreed.
 - No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
-- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented) + **destination** documentation for operators; **tenant** mapping consistent.
-- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. *Skip for transcript-only.*
+- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented); **tenant** mapping consistent everywhere Outpost is called.
+- **Customer webhook registration:** At least one **concrete** story—**implemented** authenticated route(s) and/or **OpenAPI/README**—for how a customer **creates or updates** a webhook destination (URL + topics) for their tenant. Prefer real **`Destinations.Create`** (or update) calls over prose-only if the Turn 1 story asks where destination creation lives.
+- **Test / verify delivery:** A **separate** mechanism from domain publish: e.g. documented **`curl`** + test receiver URL, a **small admin/test publish** endpoint, or README steps to trigger a test event—so operators can prove end-to-end delivery without relying solely on production traffic. Domain publish remains **required**; test-only wiring does **not** replace it (see prompt *Before you stop*).
+- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. Optionally exercise documented test publish / destination registration. *Skip for transcript-only.*
 
 ## Failure modes to note
 
 - New `main.go` only, without using the **cloned** baseline’s routes/models.
 - Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations.
 - Publish only from a **test** helper with no real handler path.
+- **Vague** “customers paste a URL somewhere” with no API contract, handler, or README steps for destination creation when the conversation asked for it.
+- **No** operator-facing way to smoke-test delivery (test publish or documented curl) when README promises outbound webhooks.
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 73bc8d5c8..b3c4df2c9 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -810,6 +810,28 @@ function scoreScenario08(corpus: string, assistant: string): TranscriptScore {
       : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
   });
 
+  const fullStackSignals =
+    /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i.test(
+      t,
+    ) && /(outpost|destination|tenant)/i.test(t);
+  checks.push({
+    id: "delivery_activity_signals",
+    pass: fullStackSignals,
+    detail: fullStackSignals
+      ? "Transcript mentions delivery visibility (attempts/events/retry/activity) with Outpost context"
+      : "Scenario 8 expects destination-scoped activity UI — see Building your own UI checklists + success criteria",
+  });
+
+  const testPublishSeparate =
+    /(test\s*publish|publish\s*test|send\s*test\s*event|\/api\/.*test|test.?event)/i.test(t);
+  checks.push({
+    id: "separate_test_publish_signal",
+    pass: testPublishSeparate,
+    detail: testPublishSeparate
+      ? "Separate test publish / test event control mentioned"
+      : "Expected distinct test-publish path or control (see scenario 8 success criteria)",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),
@@ -909,6 +931,18 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
       : "Expected operator docs listing OUTPOST env vars (see scenario Success criteria)",
   });
 
+  const fullStackSignals09 =
+    /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i.test(
+      t,
+    ) && /(outpost|destination|tenant)/i.test(t);
+  checks.push({
+    id: "delivery_activity_signals",
+    pass: fullStackSignals09,
+    detail: fullStackSignals09
+      ? "Transcript mentions delivery visibility (attempts/events/retry/activity) with Outpost context"
+      : "Scenario 9 expects full-stack activity UI — see Building your own UI checklists + success criteria",
+  });
+
   checks.push({
     id: "no_key_in_reply",
     pass: !containsLikelyLeakedKey(assistant),

From 2031762e50cb067d0807ff9cfa79cf87b302f200 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 18:28:05 +0100
Subject: [PATCH 36/47] docs(eval): document wall time for heavy baseline
 scenarios
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Explain 08–10 clone/install cost, sparse console output, and operator knobs
(Ctrl+C, EVAL_SKIP_HARNESS_PRE_STEPS, EVAL_MAX_TURNS, --no-score-llm).
Mirror a short note in run-agent-eval --help output.

Made-with: Cursor
---
 docs/agent-evaluation/README.md             | 11 +++++++++++
 docs/agent-evaluation/src/run-agent-eval.ts |  7 ++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)
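
Note on the operator knobs: the README text in this patch describes three controls, `EVAL_MAX_TURNS` (default 80), `EVAL_SKIP_HARNESS_PRE_STEPS=1`, and `--no-score-llm`. The sketch below shows one way such knobs could be resolved into a single settings object. It is not the runner's actual parsing code; the `EvalKnobs` interface and `readEvalKnobs` function are hypothetical, and the fallback for invalid turn counts is an assumption.

```typescript
// Hypothetical settings object; field names are illustrative only.
interface EvalKnobs {
  maxTurns: number;
  skipPreSteps: boolean;
  scoreLlm: boolean;
}

const DEFAULT_MAX_TURNS = 80; // default documented in the README and --help text

// Resolve knobs from an environment map and an argument list.
// Taking both as parameters keeps the function testable without touching process state.
function readEvalKnobs(
  env: Record<string, string | undefined>,
  argv: string[],
): EvalKnobs {
  const parsed = parseInt(env["EVAL_MAX_TURNS"] || "", 10);
  return {
    // Assumption: fall back to the default when the value is missing or not a positive integer.
    maxTurns: !isNaN(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_TURNS,
    // The README documents the value "1" for skipping git_clone pre-steps.
    skipPreSteps: env["EVAL_SKIP_HARNESS_PRE_STEPS"] === "1",
    // The LLM judge runs unless the flag is present.
    scoreLlm: argv.indexOf("--no-score-llm") === -1,
  };
}
```

A smoke run against an already cloned baseline would then correspond to something like `readEvalKnobs({ EVAL_MAX_TURNS: "20", EVAL_SKIP_HARNESS_PRE_STEPS: "1" }, ["--no-score-llm"])`, with the caveat from the README that a low turn cap often ends the run before the success criteria are met.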

diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 94246c975..5dfb9330c 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -55,6 +55,17 @@ npm run eval -- --dry-run
 
 The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`). Shell exports still override `.env` if both are set.
 
+### Wall time (scenarios **08–10** and other heavy baselines)
+
+Scenarios that **`git clone`** a full SaaS template and run **`npm` / `pnpm` / `docker compose`** installs are **slow by design**. Expect **roughly 30–90+ minutes** of wall time for a single run of **08**, **09**, or **10** (clone + install + several agent turns). The harness prints little to the terminal until **`transcript.json`** is written at the end, so a healthy run can look hung.
+
+- **Stop early:** **Ctrl+C** (**SIGINT**) in the terminal running `npm run eval`. The runner writes **`*-scenario-NN.eval-aborted.json`** next to the run folder (see **Harness sidecars** at the top of this file).
+- **Skip re-clone:** If the baseline is already under the run directory, **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** skips **`git_clone`** from the scenario harness (see each scenario’s **`## Eval harness`** block).
+- **Cap agent length (smoke only):** **`EVAL_MAX_TURNS`** (default **80**) limits SDK turns; lowering it may end the run sooner but often **fails** the integration before success criteria are met—use for debugging, not a real pass.
+- **Save judge time only:** **`--no-score-llm`** skips the Success-criteria LLM judge at the end (saves a few minutes; you lose that rubric).
+
+For **fast** automated signal in CI, use **`eval:ci`** (**01** + **02** only)—not **08**.
+
 ### CI (recommended slice)
 
 For **pull-request or main-branch** automation, run **two** scenarios only:
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 3c34c7d24..ba1129170 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -690,7 +690,7 @@ Environment:
   EVAL_LLMS_FULL_URL    Optional (omit docs line if unset)
   EVAL_TOOLS            Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
   EVAL_MODEL            Optional
-  EVAL_MAX_TURNS        Optional (default: 80; npm/go mod installs can exceed 40)
+  EVAL_MAX_TURNS        Optional (default: 80; npm/go mod installs can exceed 40; lower only for smoke — may not finish 08–10)
   EVAL_PERMISSION_MODE  Optional (default: dontAsk)
   EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
   EVAL_DISABLE_WORKSPACE_WRITE_GUARD  Set to 1 to allow Write/Edit outside the run dir (not recommended)
@@ -798,6 +798,11 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     const turn0Prompt =
       filledTemplate + buildWorkspaceBoundaryAppendix(runDir, agentCwd, REPO_ROOT, localDocs);
     console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
+    if (scenarioIdEarly === "08" || scenarioIdEarly === "09" || scenarioIdEarly === "10") {
+      console.error(
+        "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). See README § Wall time.",
+      );
+    }
 
     const sidecars = harnessSidecarPaths(runDir);
     activeHarnessAbortContext = { path: sidecars.aborted, runDirectory: runDir };

From 4186ca3ddb2ff666cb1e8c5c17249d67757c22d7 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 18:28:11 +0100
Subject: [PATCH 37/47] docs(eval): update scenario run tracker for scenario 08

Record primary run 2026-04-10T14-29-04-214Z-scenario-08, heuristic 10/10,
execution pass, and execution notes (seed/dev, schema key vs SDK).

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 39 ++++++++++++-------
 1 file changed, 25 insertions(+), 14 deletions(-)
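
Reviewer note (not part of the diff): the "destination-schema `key` vs SDK" item in the execution notes concerns building the create-destination config from each schema field's `key` rather than its display label. A minimal sketch of that mapping is below. The field shape (`key`, `label`) and the function name `configFromFormValues` are assumptions for illustration; the real schema and SDK types may differ.

```typescript
// Assumed shape of a destination schema field; only `key` and `label` are modelled here.
interface SchemaField {
  key: string;
  label: string;
}

/**
 * Builds a config object keyed by each field's `key` from form values that a UI
 * collected under the field's label. Fields without a submitted value are skipped.
 */
function configFromFormValues(
  fields: SchemaField[],
  valuesByLabel: Record<string, string>,
): Record<string, string> {
  const config: Record<string, string> = {};
  for (const field of fields) {
    const value = valuesByLabel[field.label];
    if (value !== undefined) config[field.key] = value;
  }
  return config;
}

// Hypothetical usage with a single webhook URL field.
const exampleFields: SchemaField[] = [{ key: "url", label: "URL" }];
console.log(configFromFormValues(exampleFields, { URL: "https://example.com/hook" }));
```

If an SDK parser drops `key`, a mapping like this has nothing stable to key on, which is why the notes suggest a raw fetch in the BFF as a stopgap.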

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index c043acaa7..7c789f207 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,20 +18,31 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 ## Tracker
 
 
-| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: `**quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail).                                                  |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`.                                                                                                                                 |
-| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                      |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                 |
-| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                            |
-| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                              |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder).                                                                                                                                          |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: `**quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                                     |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail).                                                                          |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`.                                                                                                                                                   |
+| 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                           |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                                      |
+| 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                                                 |
+| 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                   |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Harness **`next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`.                      |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 
 
+### Scenario 08 — execution notes (`2026-04-10T14-29-04-214Z-scenario-08`)
+
+**Execution:** **Pass** — operator QA on **`next-saas-starter/`** (artifact **not** committed; run folder under `results/runs/` is gitignored).
+
+Reproducibility / gotchas:
+
+- **`pnpm db:migrate`** — succeeds against local Postgres when `POSTGRES_URL` is set (see clone `README.md`).
+- **`pnpm db:seed`** — as generated, importing `stripe` from **`lib/payments/stripe.ts`** pulls Outpost and **`server-only`**, which throws when the seed script runs under **`tsx`** (not the Next server). Common **local** fix: instantiate **`Stripe`** directly in **`lib/db/seed.ts`** with the same **`apiVersion`** as the payments module so seed does not load that file. Requires valid **Stripe** keys in `.env`. Re-running seed after a successful run fails on duplicate **`test@test.com`** — expected.
+- **`pnpm dev`** — if another **`next dev`** already holds **`.next/dev/lock`** for this tree, stop it or remove the lock; port **3000** may be taken (Next picks another port). Turbopack may warn about multiple lockfiles when the app sits under the monorepo — see Next’s **`turbopack.root`** guidance if needed.
+- **Destination schema `key`** — API returns `key` on schema fields; older SDK parsers may strip it and break create-destination payloads keyed from labels. Regenerating the SDKs fixes this; until then, a BFF raw fetch + mapping keeps the UI aligned with the API.
+
 ### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
 
 Work applied **after** the agent transcript so the FastAPI + React artifact matches current integration guidance (eval honesty + local execution). The template tree under `results/runs/<stamp>-scenario-09/` is **not committed** (see `results/.gitignore`); repo **docs** and **prompt** updates that back this scenario **are** in git.
@@ -40,12 +51,12 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 
 - **TanStack Router:** `frontend/src/routeTree.gen.ts` — register `/_layout/webhooks` (agent added the route file but not the generated tree).
 - **API base URL:** webhooks page used browser-relative `/api/...` against nginx; switched to backend base (`OpenAPI.BASE` / `VITE_API_URL`).
-- **Destination types:** Outpost JSON uses `**type`** and `**icon**` (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
+- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
 
 **Backend**
 
 - `**POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
-- `**GET /api/v1/webhooks/events**`, `**GET /api/v1/webhooks/attempts**`, `**POST /api/v1/webhooks/retry**` — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
+- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
 
 **Dashboard UI (webhooks page)**
 

From c83d43d14895d9c0d21c1b41e2919b422198ca88 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 19:16:05 +0100
Subject: [PATCH 38/47] docs: drop destinations overview hub; clarify OSS
 hosting in concepts

- Remove redundant destinations/overview.mdoc; link to overview#supported-destinations
  from quickstarts, building-your-own-ui, nav, and redirects (/destinations)
- Document MAX_DESTINATIONS_PER_TENANT and DESTINATIONS_METADATA_PATH under
  self-hosting configuration
- Concepts: Hookdeck hosts same open-source Outpost; GitHub feature requests for all
- Ignore docs/dist and docs/TEMP-*.md; remove temp onboarding status file

Made-with: Cursor
---
 .gitignore                                    |   1 +
 ...TEMP-hookdeck-outpost-onboarding-status.md | 101 ------------------
 docs/content/concepts.mdoc                    |   8 +-
 docs/content/destinations/overview.mdoc       |  99 -----------------
 docs/content/guides/building-your-own-ui.mdoc |   4 +-
 docs/content/nav.json                         |   1 -
 .../quickstarts/hookdeck-outpost-curl.mdoc    |   2 +-
 .../quickstarts/hookdeck-outpost-go.mdoc      |   2 +-
 .../quickstarts/hookdeck-outpost-python.mdoc  |   2 +-
 .../hookdeck-outpost-typescript.mdoc          |   2 +-
 docs/content/redirects.json                   |   2 +-
 docs/content/self-hosting/configuration.mdoc  |   7 ++
 12 files changed, 20 insertions(+), 211 deletions(-)
 delete mode 100644 docs/TEMP-hookdeck-outpost-onboarding-status.md
 delete mode 100644 docs/content/destinations/overview.mdoc

diff --git a/.gitignore b/.gitignore
index 3ba3f42d2..64578dcf3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,7 @@
 
 # Documentation (local build artifacts; content lives under docs/content/)
 /docs/dist/
+/docs/TEMP-*.md
 /tmp
 
 # Golang test coverage
diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
deleted file mode 100644
index e37ec4a9d..000000000
--- a/docs/TEMP-hookdeck-outpost-onboarding-status.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# Hookdeck Outpost onboarding — status (temporary)
-
-**Purpose:** Track implementation status for the managed quickstarts, agent prompt, and related work. **Delete this file** when tracking moves elsewhere (e.g. Linear, parent epic).
-
-**Last updated:** 2026-04-07
-
----
-
-## Agent eval harness — **implemented**; **prompt validation in progress**
-
-The automated harness in `docs/agent-evaluation/` is in place. **What it does today:**
-
-
-| Area           | Status                                                                                                                                                                                                                                                                                    |
-| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Runner**     | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with `**Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, `**cwd`** = `results/runs/<stamp>-scenario-NN/`              |
-| **Artifacts**  | `transcript.json`, optional `**heuristic-score.json`** + `**llm-score.json`** (LLM reads each scenario `**## Success criteria**`), agent-written files beside the transcript                                                                                                              |
-| **Heuristics** | `score-transcript.ts` — `**scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts)                                                                                                                                                        |
-| **Scenarios**  | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next `**leerob/next-saas-starter`**, FastAPI `**fastapi/full-stack-fastapi-template`**, Go `**devinterface/startersaas-go-api**`) |
-| **CLI**        | `**npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless `**--no-score`** / `**--no-score-llm`** or `**EVAL_NO_SCORE_***`. **Exit 1** if any enabled score fails                            |
-| **CI**         | `**npm run eval:ci`** = `**--scenarios 01,02`** + heuristic **and** LLM judge. `**scripts/ci-eval.sh`** — requires `**ANTHROPIC_API_KEY`**, `**EVAL_TEST_DESTINATION_URL**`                                                                                                               |
-| **Re-score**   | `npm run score -- --run <run-dir> [--llm] [--write]`                                                                                                                                                                                                                                      |
-
-
-**Operational**
-
-- Prefer a normal runner / full permissions for session persistence (`~/.claude/...`); tight sandboxes can break multi-turn resume.
-- **Validate the prompt in stages** (simple → complex); exact commands below.
-
-### Recommended run order (test evals → stress prompt)
-
-Run from `**docs/agent-evaluation/`** with `**.env`** set (`**ANTHROPIC_API_KEY**`, `**EVAL_TEST_DESTINATION_URL**`). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions.
-
-**Stage A — basics (fast, minimal tooling)**
-
-```sh
-npm run eval -- --scenarios 01,02,03,04
-```
-
-**Stage B — minimal example apps**
-
-```sh
-npm run eval -- --scenarios 05,06,07
-```
-
-**Stage C — existing-app integration (clone + integrate; slowest)**
-
-```sh
-npm run eval -- --scenarios 08,09,10
-```
-
-**Full suite (explicit cost)**
-
-```sh
-npm run eval -- --all
-```
-
-After each stage, inspect `**results/runs/<stamp>-scenario-NN/**` (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live `**OUTPOST_API_KEY`**) remains a separate human step per scenario.
-
----
-
-## Agent eval automation (original plan — historical)
-
-1. **In-repo runner** — ✅ Node + Agent SDK (not shell-only `curl`).
-2. **Default backend: Anthropic** — ✅ Agent SDK.
-3. **Claude Code CLI** — Optional local path only (unchanged).
-4. **OpenAI adapter** — Still optional / not implemented.
-5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs `**## Success criteria`**.
-6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready).
-
-**Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners.
-
----
-
-## Done (Outpost OSS repo)
-
-- Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx`
-- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` (includes **Files on disk** guidance)
-- Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**)
-- `quickstarts.mdx` index: managed vs self-hosted links
-- Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs
-- SDK quickstarts: env vars, step-commented scripts
-- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, `**SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md`
-
-## Pending / follow-up
-
-- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or `**--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear
-- **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming)
-- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links
-- **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes
-- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection; env UI for `OUTPOST_API_KEY` (not in prompt body)
-- **Hookdeck Astro site:** MDX, `llms.txt` / `llms-full.txt`, canonical `DOCS_URL`
-- **CI workflow:** Optional GitHub Actions job for `eval:ci` with secrets
-- **Deferred (not blocking GA):** Broader docs IA per original plan
-
-## References
-
-- OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`)
-- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
-- Eval harness: `docs/agent-evaluation/README.md`
-
diff --git a/docs/content/concepts.mdoc b/docs/content/concepts.mdoc
index 90259efb8..270764d7d 100644
--- a/docs/content/concepts.mdoc
+++ b/docs/content/concepts.mdoc
@@ -100,10 +100,12 @@ The following destination types are available for your tenants to configure:
 - [Azure Service Bus](/docs/outpost/destinations/azure-service-bus)
 - [GCP Pub/Sub](/docs/outpost/destinations/gcp-pubsub)
 - [RabbitMQ (AMQP)](/docs/outpost/destinations/rabbitmq)
-- [Amazon EventBridge (planned)](https://github.com/hookdeck/outpost/issues/201)
-- [Kafka (planned)](https://github.com/hookdeck/outpost/issues/141)
+- [Kafka](/docs/outpost/destinations/kafka)
+- Amazon EventBridge (planned)
 
-If there is an event destination type that you would like to see supported, [open a feature request](https://github.com/hookdeck/outpost/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.md&title=%F0%9F%9A%80+Feature%3A+).
+**Hookdeck Outpost** is the same [open-source Outpost](https://github.com/hookdeck/outpost) project, operated on Hookdeck’s infrastructure. We do not maintain a separate hosted fork; what we run tracks the public codebase.
+
+If there is an event destination type you would like to see supported, [open a feature request on GitHub](https://github.com/hookdeck/outpost/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.md&title=%F0%9F%9A%80+Feature%3A+).
 
 For a diagram of how the API, delivery, and log services connect in **self-hosted** deployments, see [Self-hosting architecture](/docs/outpost/self-hosting/architecture).
 
diff --git a/docs/content/destinations/overview.mdoc b/docs/content/destinations/overview.mdoc
deleted file mode 100644
index 890ce3db0..000000000
--- a/docs/content/destinations/overview.mdoc
+++ /dev/null
@@ -1,99 +0,0 @@
----
-title: "Destinations"
-description: "Supported destination types, creating destinations, filters, and dynamic configuration from the API."
----
-
-Outpost supports multiple event destination types. Each tenant can have multiple destinations, up to a maximum set by the `MAX_DESTINATIONS_PER_TENANT` environment variable (defaulting to `20`).
-
-> We recommend setting the `MAX_DESTINATIONS_PER_TENANT` value as low as is appropriate for your use case to prevent abuse and performance degradation. Updating the value to a lower value later will not delete existing destinations.
-
-## Supported Destinations
-
-| Destination | Description |
-| ----------- | ----------- |
-| [Webhook](/docs/outpost/destinations/webhook) | Send events via HTTP POST to a URL |
-| [Hookdeck](/docs/outpost/destinations/hookdeck) | Route events through Hookdeck Event Gateway |
-| [AWS Kinesis](/docs/outpost/destinations/aws-kinesis) | Stream events to Amazon Kinesis |
-| [AWS SQS](/docs/outpost/destinations/aws-sqs) | Send events to an Amazon SQS queue |
-| [AWS S3](/docs/outpost/destinations/aws-s3) | Store events in an Amazon S3 bucket |
-| [Azure Service Bus](/docs/outpost/destinations/azure-service-bus) | Send events to Azure Service Bus |
-| [GCP Pub/Sub](/docs/outpost/destinations/gcp-pubsub) | Publish events to Google Cloud Pub/Sub |
-| [RabbitMQ](/docs/outpost/destinations/rabbitmq) | Send events to a RabbitMQ exchange |
-
-See the [Outpost overview](/docs/outpost/overview) and [GitHub issues](https://github.com/hookdeck/outpost/issues) for planned destination types. To be eligible as a destination type, it must be asynchronous in nature and not run any business logic.
-
-## Creating a Destination
-
-Destinations can be registered through the tenant portal or via the API. Each destination type has its own configuration and credentials. Refer to the [Create Destination API](/docs/outpost/api/destinations#create-destination) for the required `config` and `credentials` fields for each destination type.
-
-```sh
-curl --location 'https://<OUTPOST_API_URL>/api/v1/tenants/<TENANT_ID>/destinations' \
---header 'Content-Type: application/json' \
---header 'Authorization: Bearer <API_KEY>' \
---data '{
-  "type": "<TYPE>",
-  "topics": ["<TOPIC>"],
-  "config": { ... },
-  "credentials": { ... }
-}'
-```
-
-## Destination Filtering
-
-Destinations can be configured with filters to selectively receive only events matching specific criteria. This allows tenants to create fine-grained routing rules based on event properties.
-
-See the [Filters](/docs/outpost/features/filter) documentation for the complete filter syntax and examples.
-
-## Getting Destination Types & Schemas
-
-When using the API, you may want to build your own UI to capture user input on the destination configuration. Since each destination requires a specific configuration, the `GET /destination-types` endpoint provides a JSON schema for standardized input fields for each destination type.
-
-For example, for the `webhook` type:
-
-```json
-{
-  "type": "webhook",
-  "label": "Webhook",
-  "description": "Send events via an HTTP POST request to a URL",
-  "icon": "<svg />",
-  "instructions": "Some *markdown*",
-  "remote_setup_url": null,
-  "config_fields": [
-    {
-      "key": "url",
-      "type": "text",
-      "label": "URL",
-      "description": "The URL to send the event to",
-      "pattern": "/((([A-Za-z]{3,9}:(?://)?)(?:[-;:&=+$,w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=+$,w]+@)[A-Za-z0-9.-]+)((?:/[+~%/.w-_]*)???(?:[-+=&;%@.w_]*)#?(?:[w]*))?)/",
-      "required": true
-    }
-  ],
-  "credential_fields": []
-}
-```
-
-### `config_fields` `Field[]`
-
-Config fields are non-secret values that can be stored and displayed to the user in plain text.
-
-### `credential_fields` `Field[]`
-
-Credential fields are secret values that will be AES encrypted and obfuscated to the user. Some credentials may not be obfuscated; the destination type dictates the obfuscation logic.
-
-### `instructions` `string`
-
-Some destinations will require instructions to configure. For instance, with Pub/Sub, the user will need to create a service account and grant some permissions to that service account. The value is a markdown string to be rendered with any markdown rendering library. Images will be hosted through the GitHub repository.
-
-### `remote_setup_url`
-
-Some destinations may have OAuth flow or other managed setup flow that can be triggered with a link. If a `remote_setup_url` is set, then the user should be prompted to follow the link to configure the destination.
-
-See the [building your own UI guide](/docs/outpost/guides/building-your-own-ui) for recommended UI patterns and wireframes for implementation in your own app.
-
-## Customizing Destination Type Definitions & Instructions
-
-The destination type definitions (label, description, icon, etc) and instructions can be customized by setting the `DESTINATIONS_METADATA_PATH` environment variable to a path on disk containing the destination type definitions and instructions. Outpost will load both the default destination type definitions and any custom destination type definitions and merge them.
-
-> Note: Core fields (`config_fields` and `credential_fields`) cannot be overridden via custom metadata. Only non-core fields such as `label`, `description`, `icon`, and `instructions` can be customized.
-
-The metadata path is a directory containing a subdirectory for each destination type. Each destination type directory contains a `metadata.json` file and an `instructions.md` file. You can find the default destination type definitions and instructions in the [outpost-providers](https://github.com/hookdeck/outpost/tree/main/internal/destregistry/metadata/providers) folder.
diff --git a/docs/content/guides/building-your-own-ui.mdoc b/docs/content/guides/building-your-own-ui.mdoc
index 405a3ff7c..59ea3ae51 100644
--- a/docs/content/guides/building-your-own-ui.mdoc
+++ b/docs/content/guides/building-your-own-ui.mdoc
@@ -42,7 +42,7 @@ The tenant portal illustrates how screens map to tenant → destinations → top
 
 ### Default information architecture (multi-destination products)
 
-When a tenant can have many destinations—of any [destination type](/docs/outpost/destinations/overview) your project enables—the primary path is destination → activity: people ask “what was delivered to this subscription?” rather than seeing all traffic in one undifferentiated list. The same API applies for webhooks, queues, and other types; only create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
+When a tenant can have many destinations—of any [destination type](/docs/outpost/overview#supported-destinations) your project enables—the primary path is destination → activity: people ask “what was delivered to this subscription?” rather than seeing all traffic in one undifferentiated list. The same API applies for webhooks, queues, and other types; only create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
 
 For list events and list attempts, reuse the same endpoints everywhere: vary query parameters (for example `destination_id`, cursors) rather than inventing parallel client-side contracts. Pagination and auth details are defined in [OpenAPI](/docs/outpost/api); [Events, attempts, and retries](#events-attempts-and-retries) below summarizes how those endpoints support common UI needs.
 
@@ -51,7 +51,7 @@ For list events and list attempts, reuse the same endpoints everywhere: vary que
 | Example route | What it does | Spec |
 | ------------- | ------------ | ---- |
 | `…/destinations` or `…/integrations` | Hub: list destinations; create or drill down | [Listing destinations](#listing-configured-destinations) · [List destinations](/docs/outpost/api/destinations#list-destinations) |
-| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/outpost/destinations/overview); `GET /destination-types` in [OpenAPI](/docs/outpost/api)), then topics and config | [Creating a destination](#creating-a-destination) |
+| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/outpost/overview#supported-destinations); `GET /destination-types` in [OpenAPI](/docs/outpost/api)), then topics and config | [Creating a destination](#creating-a-destination) |
 | `…/destinations/:destinationId` | Detail: edit config, enable/disable, topics | [OpenAPI](/docs/outpost/api) — Destinations |
 | `…/destinations/:destinationId/activity` | Activity for this destination: events, attempts, retry | [Events, attempts, and retries](#events-attempts-and-retries) · [List events](/docs/outpost/api/events#list-events) · [List attempts](/docs/outpost/api/attempts#list-attempts) |
 | `…/activity` (optional) | Tenant-wide activity; optional filter by `destination_id` | Same list-events operation with different query params ([OpenAPI](/docs/outpost/api)) |
diff --git a/docs/content/nav.json b/docs/content/nav.json
index 525f6eb4d..f7147e05b 100644
--- a/docs/content/nav.json
+++ b/docs/content/nav.json
@@ -56,7 +56,6 @@
       "label": "Destinations",
       "sections": [
         [
-          { "slug": "destinations/overview", "title": "Overview" },
           { "slug": "destinations/webhook", "title": "Webhook" },
           {
             "slug": "destinations/hookdeck",
diff --git a/docs/content/quickstarts/hookdeck-outpost-curl.mdoc b/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
index a14900d86..69cabaeea 100644
--- a/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
@@ -97,7 +97,7 @@ If you combine API response bodies with `curl --write-out '\n%{http_code}'`:
 
 ## Next steps
 
-- [Destination types](/docs/outpost/destinations/overview) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more
+- [Destination types](/docs/outpost/overview#supported-destinations) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more
 - [Tenant user portal](/docs/outpost/features/tenant-user-portal) — optional UI for tenants to manage their own destinations
 - [SDKs](/docs/outpost/sdks) — TypeScript, Python, Go, and others
 - [API reference](/docs/outpost/api) — full REST API
diff --git a/docs/content/quickstarts/hookdeck-outpost-go.mdoc b/docs/content/quickstarts/hookdeck-outpost-go.mdoc
index 1bc22ad0e..1b8f999d4 100644
--- a/docs/content/quickstarts/hookdeck-outpost-go.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-go.mdoc
@@ -157,7 +157,7 @@ For all topics on that destination, use `components.CreateTopicsTopicsEnum(compo
 
 ## Next steps
 
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
 - [Tenant user portal](/docs/outpost/features/tenant-user-portal)
 - [SDKs](/docs/outpost/sdks)
 - [API reference](/docs/outpost/api)
diff --git a/docs/content/quickstarts/hookdeck-outpost-python.mdoc b/docs/content/quickstarts/hookdeck-outpost-python.mdoc
index 75a02c2b4..37a627001 100644
--- a/docs/content/quickstarts/hookdeck-outpost-python.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-python.mdoc
@@ -128,7 +128,7 @@ Use `topics: ["*"]` on the destination to receive all configured topics.
 
 ## Next steps
 
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
 - [Tenant user portal](/docs/outpost/features/tenant-user-portal)
 - [SDKs](/docs/outpost/sdks)
 - [API reference](/docs/outpost/api)
diff --git a/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc b/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
index a51dabc11..c58381103 100644
--- a/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
@@ -129,7 +129,7 @@ To subscribe the destination to all topics, pass `topics: ["*"]` instead of `[to
 
 ## Next steps
 
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
 - [Tenant user portal](/docs/outpost/features/tenant-user-portal)
 - [SDKs](/docs/outpost/sdks)
 - [API reference](/docs/outpost/api)
diff --git a/docs/content/redirects.json b/docs/content/redirects.json
index ec6304c5b..cc0570e92 100644
--- a/docs/content/redirects.json
+++ b/docs/content/redirects.json
@@ -21,7 +21,7 @@
   },
   {
     "from": "/docs/outpost/destinations",
-    "to": "/docs/outpost/destinations/overview"
+    "to": "/docs/outpost/overview#supported-destinations"
   },
   {
     "from": "/docs/outpost/guides",
diff --git a/docs/content/self-hosting/configuration.mdoc b/docs/content/self-hosting/configuration.mdoc
index ec8e9126c..b7af25c83 100644
--- a/docs/content/self-hosting/configuration.mdoc
+++ b/docs/content/self-hosting/configuration.mdoc
@@ -102,6 +102,13 @@ Choose one for event log persistence:
 | `ALERT_CONSECUTIVE_FAILURE_COUNT` | `20` | Consecutive failures before alert triggers |
 | `ALERT_AUTO_DISABLE_DESTINATION` | `true` | Auto-disable destination when failure count reaches 100% |
 
+## Destinations
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `MAX_DESTINATIONS_PER_TENANT` | `20` | Maximum destinations each tenant may create. Set as low as is practical for your product to limit abuse and load; lowering this value later does **not** remove destinations that already exist. |
+| `DESTINATIONS_METADATA_PATH` | — | Optional. Filesystem path to a directory of [custom destination metadata](https://github.com/hookdeck/outpost/tree/main/internal/destregistry/metadata/providers) (per-type `metadata.json` and `instructions.md`). Non-core fields such as `label`, `description`, `icon`, and `instructions` can be customized; `config_fields` and `credential_fields` cannot be overridden. |
+
 ## Webhook Behavior
 
 | Variable | Default | Description |

From 14ab5a27348869aac36f969cca84861cc01a2cf0 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 22:59:40 +0100
Subject: [PATCH 39/47] docs: use hookdeck.com/docs/outpost for production doc
 links

Update README, OpenAPI contact URL, entrypoint migration hint, and example
READMEs so public links match Outpost docs on Hookdeck.

Made-with: Cursor
---
 README.md                                     | 22 +++++++++----------
 build/entrypoint.sh                           |  2 +-
 docs/apis/openapi.yaml                        |  2 +-
 examples/azure/README.md                      |  2 +-
 .../demos/dashboard-integration/README.md     |  2 +-
 examples/kubernetes/README.md                 |  2 +-
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index b5978eb84..1e00d30e7 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ Outpost is built and maintained by [Hookdeck](https://hookdeck.com?ref=github-ou
 
 ![Outpost architecture](docs/public/images/architecture.png)
 
-Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn more about the Outpost architecture and design.
+Read [Outpost Concepts](https://hookdeck.com/docs/outpost/concepts) to learn more about the Outpost architecture and design.
 
 ## Features
 
@@ -70,17 +70,17 @@ Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn mor
 - **Webhook best practices**: Opt-out webhook best practices, such as headers for idempotency, timestamp and signature, and signature rotation.
 - **SDKs and MCP server**: Go, Python, and TypeScript SDK are available. Outpost also ships with an MCP server. All generated by [Speakeasy](https://speakeasy.com).
 
-See the [Outpost Features](https://outpost.hookdeck.com/docs/features) for more information.
+See the [Outpost Features](https://hookdeck.com/docs/outpost/features) for more information.
 
 ## Documentation
 
-- [Overview](https://outpost.hookdeck.com/docs/overview)
-- [Concepts](https://outpost.hookdeck.com/docs/concepts)
-- [Quickstarts](https://outpost.hookdeck.com/docs/quickstarts)
-- [Features](https://outpost.hookdeck.com/docs/features)
-- [Guides](https://outpost.hookdeck.com/docs/guides)
-- [API Reference](https://outpost.hookdeck.com/docs/api)
-- [Configuration Reference](https://outpost.hookdeck.com/docs/references/configuration)
+- [Overview](https://hookdeck.com/docs/outpost/overview)
+- [Concepts](https://hookdeck.com/docs/outpost/concepts)
+- [Quickstarts](https://hookdeck.com/docs/outpost/quickstarts)
+- [Features](https://hookdeck.com/docs/outpost/features)
+- [Guides](https://hookdeck.com/docs/outpost/guides)
+- [API Reference](https://hookdeck.com/docs/outpost/api)
+- [Configuration Reference](https://hookdeck.com/docs/outpost/self-hosting/configuration)
 
 _The Outpost documentation is built using the [Zudoku documentation framework](https://zuplo.link/outpost)._
 
@@ -144,7 +144,7 @@ For other cloud Redis services or self-hosted Redis clusters, set `REDIS_CLUSTER
 ```sh
 go run cmd/redis-debug/main.go your-redis-host 6379 password 0 [tls] [cluster]
 ```
-See the [Redis Troubleshooting Guide](https://docs.outpost.hookdeck.com/references/troubleshooting-redis) for detailed guidance.
+See the [Redis Troubleshooting Guide](https://hookdeck.com/docs/outpost/self-hosting/guides/troubleshooting-redis) for detailed guidance.
 
 Start the Outpost dependencies and services:
 
@@ -241,7 +241,7 @@ Open the `redirect_url` link to view the Outpost portal.
 
 ![Dashboard homepage](docs/public/images/dashboard-homepage.png)
 
-Continue to use the [Outpost API](https://outpost.hookdeck.com/docs/api) or the Outpost portal to add and test more destinations.
+Continue to use the [Outpost API](https://hookdeck.com/docs/outpost/api) or the Outpost portal to add and test more destinations.
 
 ## Contributing
 
diff --git a/build/entrypoint.sh b/build/entrypoint.sh
index ab22587f8..fce97672c 100755
--- a/build/entrypoint.sh
+++ b/build/entrypoint.sh
@@ -23,7 +23,7 @@ if ! /usr/local/bin/outpost migrate init --current --log-format=json; then
     echo "  docker run --rm hookdeck/outpost migrate --help"
     echo ""
     echo "Learn more about Outpost migration workflow at:"
-    echo "  https://outpost.hookdeck.com/docs/guides/migration"
+    echo "  https://hookdeck.com/docs/outpost/self-hosting/guides/migration"
     echo ""
     exit 1
 fi
diff --git a/docs/apis/openapi.yaml b/docs/apis/openapi.yaml
index 4c03e4e20..ba3309cc6 100644
--- a/docs/apis/openapi.yaml
+++ b/docs/apis/openapi.yaml
@@ -7,7 +7,7 @@ info:
   contact:
     name: Outpost Support
     email: support@hookdeck.com
-    url: https://outpost.hookdeck.com/docs
+    url: https://hookdeck.com/docs/outpost
 security:
   - AdminApiKey: []
   - TenantJwt: []
diff --git a/examples/azure/README.md b/examples/azure/README.md
index b2434da4f..12d9a8e9a 100644
--- a/examples/azure/README.md
+++ b/examples/azure/README.md
@@ -368,7 +368,7 @@ For most users, `azure-deploy.sh` offers a balance of automation, reliability, a
 
 If you are not using the `dependencies.sh` and `local-deploy.sh` scripts to provision your infrastructure, you will need to create the `.env.outpost` and `.env.runtime` files manually.
 
-See the [Configure Azure Service Bus as the Outpost Internal Message Queue](https://outpost.hookdeck.com/docs/guides/service-bus-internal-mq) guide for more details on the environment variables required for Outpost and how to create the values.
+See the [Configure Azure Service Bus as the Outpost Internal Message Queue](https://hookdeck.com/docs/outpost/self-hosting/guides/service-bus-internal-mq) guide for more details on the environment variables required for Outpost and how to create the values.
 
 ### `.env.outpost`
 
diff --git a/examples/demos/dashboard-integration/README.md b/examples/demos/dashboard-integration/README.md
index a8026b4e9..f085ac596 100644
--- a/examples/demos/dashboard-integration/README.md
+++ b/examples/demos/dashboard-integration/README.md
@@ -49,7 +49,7 @@ A Next.js application demonstrating how to integrate Outpost with an API platfor
    TOPICS=user.created,user.updated,order.completed,payment.processed,subscription.created
    ```
 
-   For a full list of Outpost configuration options, see [Outpost Configuration](https://outpost.hookdeck.com/docs/references/configuration)
+   For a full list of Outpost configuration options, see [Outpost Configuration](https://hookdeck.com/docs/outpost/self-hosting/configuration)
 
 4. **Start the complete stack** (PostgreSQL, Redis, RabbitMQ, and Outpost):
 
diff --git a/examples/kubernetes/README.md b/examples/kubernetes/README.md
index 14519f954..5cfe9d147 100644
--- a/examples/kubernetes/README.md
+++ b/examples/kubernetes/README.md
@@ -1 +1 @@
-See https://outpost.hookdeck.com/docs/quickstarts/kubernetes
\ No newline at end of file
+See https://hookdeck.com/docs/outpost/self-hosting/quickstarts/kubernetes
\ No newline at end of file

From 81b2aff46afc37015dc9bbf9f9645de80dd83867 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 23:00:19 +0100
Subject: [PATCH 40/47] docs(eval): Hookdeck prod as default {{DOCS_URL}}; fix
 harness doc paths

- Default EVAL_DOCS_URL to https://hookdeck.com/docs/outpost
- Replace invalid destinations directory path with overview.mdoc and
  destinations/webhook.mdoc
- Document placeholder examples in agent prompt and fixtures

Made-with: Cursor
---
 docs/agent-evaluation/.env.example            |  5 +-
 .../fixtures/placeholder-values-for-turn0.md  |  4 +-
 docs/agent-evaluation/src/run-agent-eval.ts   | 96 ++++++++++++++-----
 .../hookdeck-outpost-agent-prompt.mdoc        |  2 +-
 4 files changed, 79 insertions(+), 28 deletions(-)

diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 9df940ad4..9f1392e98 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -15,13 +15,16 @@ EVAL_TEST_DESTINATION_URL=
 # Optional (see npm run eval -- --help)
 # EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
 # EVAL_TOPICS_LIST=- user.created
-# EVAL_DOCS_URL=https://outpost.hookdeck.com/docs
+# EVAL_DOCS_URL=https://hookdeck.com/docs/outpost
 # EVAL_LOCAL_DOCS=1
 # EVAL_LLMS_FULL_URL=
 # Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow:
 # EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash
 # EVAL_MODEL=
 # EVAL_MAX_TURNS=40
+# Long runs (08–10): periodic stderr heartbeats while each agent query is in flight
+# EVAL_PROGRESS=1
+# EVAL_PROGRESS_INTERVAL_MS=30000
 # EVAL_PERMISSION_MODE=dontAsk
 # EVAL_PERSIST_SESSION=true
 # Debug only: allow Write/Edit outside the per-run workspace (not recommended)
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index f17f94ce6..152bcf9d3 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -2,7 +2,7 @@
 
 The **prompt template itself** lives in one place only:
 
-`**[hookdeck-outpost-agent-prompt.mdx](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
+`**[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`** (from repo root: `docs/content/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
 
 Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project `**.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
 
@@ -18,7 +18,7 @@ For `**npm run eval -- --scenario …**` (or `**--scenarios**` / `**--all**`), t
 | `{{API_BASE_URL}}`         | `https://api.outpost.hookdeck.com/2025-07-01`                                                                                                                       |
 | `{{TOPICS_LIST}}`          | `- user.created`                                                                                                                                                    |
 | `{{TEST_DESTINATION_URL}}` | Hookdeck Console **Source** URL the dashboard feeds in (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` |
-| `{{DOCS_URL}}`             | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`)                                                                                        |
+| `{{DOCS_URL}}`             | `https://hookdeck.com/docs/outpost` (same path segments as `/docs/outpost/…` on hookdeck.com; see `docs/content/nav.json`)                                              |
 | `{{LLMS_FULL_URL}}`        | Omit the line in the template if unused, or your public `llms-full.txt` URL                                                                                         |
 
 
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index ba1129170..26781248a 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -35,7 +35,7 @@ dotenv.config({ path: join(EVAL_ROOT, ".env") });
 const REPO_ROOT = join(EVAL_ROOT, "..", "..");
 const PROMPT_MDX = join(
   REPO_ROOT,
-  "docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx",
+  "docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc",
 );
 const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
 const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
@@ -101,7 +101,9 @@ function isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage {
 function extractTemplateFromMdx(mdx: string): string {
   const idx = mdx.indexOf("## Template");
   if (idx === -1) {
-    throw new Error("Could not find ## Template in hookdeck-outpost-agent-prompt.mdx");
+    throw new Error(
+      "Could not find ## Template in hookdeck-outpost-agent-prompt.mdoc",
+    );
   }
   const after = mdx.slice(idx);
   const fenceStart = after.indexOf("```");
@@ -122,6 +124,15 @@ function envFlagTruthy(v: string | undefined): boolean {
   return s === "1" || s === "true" || s === "yes";
 }
 
+/** Wall-clock heartbeat while the SDK stream is quiet (e.g. long Bash / blocked subprocess). */
+function evalProgressIntervalMs(): number {
+  const n = Number(process.env.EVAL_PROGRESS_INTERVAL_MS ?? "30000");
+  if (!Number.isFinite(n) || n < 5000) {
+    return 30000;
+  }
+  return n;
+}
+
 /** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */
 function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string {
   const f = (...parts: string[]) => join(repoRoot, ...parts);
@@ -145,19 +156,19 @@ Do **not** mix TS call shapes into Python.`;
 
 Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). Paths are absolute from the repository root:
 
-Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy).
+Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdoc\` (TS-heavy).
 
-- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/pages/concepts.mdx")}\`
-- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/pages/guides/building-your-own-ui.mdx")}\`
-- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/pages/features/topics.mdx")}\`
-- Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
-- TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
-- Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
-- Go quickstart (Go SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\`
-- API reference (human-oriented pages under): \`${f("docs/pages/references/")}\`
+- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/content/concepts.mdoc")}\`
+- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/content/guides/building-your-own-ui.mdoc")}\`
+- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/content/features/topics.mdoc")}\`
+- Getting started (curl / HTTP only): \`${f("docs/content/quickstarts/hookdeck-outpost-curl.mdoc")}\`
+- TypeScript quickstart (TS SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-typescript.mdoc")}\`
+- Python quickstart (Python SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-python.mdoc")}\`
+- Go quickstart (Go SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-go.mdoc")}\`
+- Docs content (browse for feature pages): \`${f("docs/content/")}\`
 - OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\`
-- Destination types: \`${f("docs/pages/destinations/")}\`
-- SDKs overview (TS-heavy): \`${f("docs/pages/sdks.mdx")}\` — prefer the language quickstart over this for Python/Go/TS code.
+- **Destination types** (summary + links): \`${f("docs/content/overview.mdoc")}\` — *Supported destinations*; per-type detail in \`docs/content/destinations/*.mdoc\` (e.g. \`${f("docs/content/destinations/webhook.mdoc")}\`)
+- SDKs overview (TS-heavy): \`${f("docs/content/sdks.mdoc")}\` — prefer the language quickstart over this for Python/Go/TS code.
 
 ${languageSdkBlock}`;
   if (llmsFullUrl) {
@@ -180,7 +191,7 @@ function applyPlaceholders(
       "Set EVAL_TEST_DESTINATION_URL to your Hookdeck Console Source URL (same value the dashboard injects as {{TEST_DESTINATION_URL}})",
     );
   }
-  const docsUrl = env.EVAL_DOCS_URL ?? "https://outpost.hookdeck.com/docs";
+  const docsUrl = env.EVAL_DOCS_URL ?? "https://hookdeck.com/docs/outpost";
   const llms = env.EVAL_LLMS_FULL_URL?.trim() ?? "";
   const useLocalDocs = envFlagTruthy(env.EVAL_LOCAL_DOCS);
 
@@ -301,18 +312,49 @@ function idFromFilename(file: string): string {
 async function runScenarioQuery(
   prompt: string,
   options: Options,
+  progress?: { readonly phaseLabel: string },
 ): Promise<{ messages: unknown[]; sessionId?: string }> {
   const messages: unknown[] = [];
   let sessionId: string | undefined;
+  const progressOn = envFlagTruthy(process.env.EVAL_PROGRESS);
+  const label = progress?.phaseLabel ?? "agent query";
+  let msgCount = 0;
+  let interval: ReturnType<typeof setInterval> | undefined;
 
-  const q = query({ prompt, options });
-  for await (const message of q) {
-    messages.push(serializeMessage(message));
-    if (isInitSystemMessage(message)) {
-      sessionId = message.session_id;
+  if (progressOn && progress) {
+    const maxTurns = options.maxTurns;
+    console.error(
+      `[eval] ${label}: starting (EVAL_PROGRESS=1; heartbeat every ${evalProgressIntervalMs()}ms; maxTurns=${String(maxTurns)})`,
+    );
+    interval = setInterval(() => {
+      console.error(
+        `[eval] ${label}: still running (${msgCount} SDK message(s) so far — subprocess or model may be busy with no new stream events)`,
+      );
+    }, evalProgressIntervalMs());
+  }
+
+  try {
+    const q = query({ prompt, options });
+    for await (const message of q) {
+      msgCount += 1;
+      messages.push(serializeMessage(message));
+      if (isInitSystemMessage(message)) {
+        sessionId = message.session_id;
+      }
+      if (progressOn && progress && msgCount % 25 === 0) {
+        console.error(`[eval] ${label}: ${msgCount} SDK message(s) received`);
+      }
+    }
+  } finally {
+    if (interval !== undefined) {
+      clearInterval(interval);
     }
   }
 
+  if (progressOn && progress) {
+    console.error(`[eval] ${label}: finished this query (${msgCount} SDK message(s))`);
+  }
+
   return { messages, sessionId };
 }
 
@@ -354,10 +396,14 @@ async function runOneScenario(
   for (let i = 0; i < prompts.length; i++) {
     const label = i === 0 ? "Turn 0 (dashboard prompt)" : userTurns[i - 1]?.label ?? `Turn ${i}`;
     const before = allMessages.length;
-    const { messages, sessionId: sid } = await runScenarioQuery(prompts[i]!, {
-      ...opts.baseOptions,
-      resume: sessionId,
-    });
+    const { messages, sessionId: sid } = await runScenarioQuery(
+      prompts[i]!,
+      {
+        ...opts.baseOptions,
+        resume: sessionId,
+      },
+      { phaseLabel: label },
+    );
     if (sid) {
       sessionId = sid;
     }
@@ -691,6 +737,8 @@ Environment:
   EVAL_TOOLS            Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
   EVAL_MODEL            Optional
   EVAL_MAX_TURNS        Optional (default: 80; npm/go mod installs can exceed 40; lower only for smoke — may not finish 08–10)
+  EVAL_PROGRESS         Set to 1/true/yes — log heartbeats to stderr during each agent query (see EVAL_PROGRESS_INTERVAL_MS)
+  EVAL_PROGRESS_INTERVAL_MS  Optional wall-clock heartbeat interval while the SDK stream is quiet (default: 30000; non-numeric values or values below 5000 fall back to 30000)
   EVAL_PERMISSION_MODE  Optional (default: dontAsk)
   EVAL_PERSIST_SESSION  Set to "false" to disable session persistence (breaks multi-turn resume)
   EVAL_DISABLE_WORKSPACE_WRITE_GUARD  Set to 1 to allow Write/Edit outside the run dir (not recommended)
@@ -800,7 +848,7 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
     console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
     if (scenarioIdEarly === "08" || scenarioIdEarly === "09" || scenarioIdEarly === "10") {
       console.error(
-        "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). See README § Wall time.",
+        "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). Set EVAL_PROGRESS=1 for stderr heartbeats. See README § Wall time.",
       );
     }
 
diff --git a/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc b/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
index 73a28923a..7422c5e38 100644
--- a/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
@@ -169,7 +169,7 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
 | `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
 | `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config — operators should keep this aligned with what the integrated app will **publish** and what destinations subscribe to |
 | `{{TEST_DESTINATION_URL}}` | **Required** — HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow (fed in by the dashboard). |
-| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
+| `{{DOCS_URL}}` | `https://hookdeck.com/docs/outpost` | Production **Outpost** docs base on Hookdeck (no trailing slash). Template paths append the same segments as **`/docs/outpost/…`** on the docs site (see `docs/content/nav.json`). For unpublished docs, evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
 | `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
 
 ### Building your own UI — where the detail lives
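
Note on the heartbeat interval introduced in this patch: the sketch below restates the rule in `evalProgressIntervalMs` as a pure function so the fallback behaviour is easy to check. `parseProgressIntervalMs` is a hypothetical name used only for illustration; it takes the raw value as a parameter instead of reading `process.env`, and otherwise assumes the same logic as the harness function.

```typescript
// Hypothetical pure variant of evalProgressIntervalMs, for illustration only.
function parseProgressIntervalMs(raw: string | undefined): number {
  const n = Number(raw ?? "30000");
  // Non-numeric input becomes NaN; values under 5000 are rejected, not clamped.
  if (!Number.isFinite(n) || n < 5000) {
    return 30000;
  }
  return n;
}

console.log(parseProgressIntervalMs(undefined)); // 30000
console.log(parseProgressIntervalMs("10000")); // 10000
console.log(parseProgressIntervalMs("1000")); // 30000
console.log(parseProgressIntervalMs("abc")); // 30000
```

Under this rule, a value such as `1000` does not yield a 5-second heartbeat; it silently returns to the 30-second default.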

From 4d7a91a3e263eccc595baac1479ee1acbc15be55 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 23:00:42 +0100
Subject: [PATCH 41/47] docs(agent-evaluation): refresh tracker, scenarios, and
 harness docs

- Point scenario and script links at docs/content paths (.mdoc)
- Update SCENARIO-RUN-TRACKER for latest heuristic-pass runs
- Revise README and AGENTS for current layout
- Remove SKILL-UPSTREAM-NOTES (obsolete)

Made-with: Cursor
---
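
Note on the tracker rule added in this patch ("latest folder matching `results/runs/*-scenario-<NN>` whose `heuristic-score.json` has `overallTranscriptPass: true`"): a minimal sketch of that selection, assuming each run has already been read into a `{ name, overallTranscriptPass }` record. `RunScore` and `pickLatestPassing` are hypothetical names, and the assumption that `overallTranscriptPass` is a top-level boolean in `heuristic-score.json` is not confirmed by this patch.

```typescript
// Hypothetical record: one entry per directory under results/runs/.
interface RunScore {
  name: string; // e.g. "2026-04-10T19-54-20-037Z-scenario-09"
  overallTranscriptPass: boolean; // assumed top-level field of heuristic-score.json
}

// Returns the newest passing run directory for a scenario, or undefined if none passed.
function pickLatestPassing(runs: RunScore[], scenarioId: string): string | undefined {
  const suffix = `-scenario-${scenarioId}`;
  const passing = runs
    .filter((r) => r.name.endsWith(suffix) && r.overallTranscriptPass)
    .map((r) => r.name)
    // Stamps are fixed-width and ISO-like, so lexicographic order is chronological.
    .sort();
  return passing.length > 0 ? passing[passing.length - 1] : undefined;
}
```

Sorting by name rather than by file modification time keeps the result stable when run folders are copied or touched after the fact.
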
 docs/agent-evaluation/AGENTS.md               |  4 +--
 docs/agent-evaluation/README.md               | 11 ++++---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 32 +++++++++----------
 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md | 22 -------------
 .../scenarios/01-basics-curl.md               |  2 +-
 .../scenarios/02-basics-typescript.md         |  2 +-
 .../scenarios/03-basics-python.md             |  2 +-
 .../scenarios/04-basics-go.md                 |  2 +-
 .../scenarios/05-app-nextjs.md                |  2 +-
 .../scenarios/06-app-fastapi.md               |  2 +-
 .../scenarios/07-app-go-http.md               |  2 +-
 .../scenarios/08-integrate-nextjs-existing.md |  4 +--
 .../09-integrate-fastapi-existing.md          |  4 +--
 .../scenarios/10-integrate-go-existing.md     |  4 +--
 docs/agent-evaluation/scripts/run-scenario.sh |  2 +-
 15 files changed, 38 insertions(+), 59 deletions(-)
 delete mode 100644 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md

diff --git a/docs/agent-evaluation/AGENTS.md b/docs/agent-evaluation/AGENTS.md
index 5ab942505..ea6cee0d8 100644
--- a/docs/agent-evaluation/AGENTS.md
+++ b/docs/agent-evaluation/AGENTS.md
@@ -6,7 +6,7 @@ This file applies to **everything under `docs/agent-evaluation/`** (scenarios, R
 
 | Audience | Content |
 |----------|---------|
-| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
+| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
 | **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |
 
 **Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.
@@ -26,7 +26,7 @@ It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `
 
 ## Alignment without parroting
 
-- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdx`.
+- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdoc`.
 - **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”) not **cite** where in the template that is spelled out.
 
 If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 5dfb9330c..40eed004b 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -59,6 +59,7 @@ The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`).
 
 Scenarios that **`git clone`** a full SaaS template and run **`npm` / `pnpm` / `docker compose`** installs are **slow by design**. Expect **roughly 30–90+ minutes** of wall time for a single run of **08**, **09**, or **10** (clone + install + several agent turns). The harness prints little to the terminal until **`transcript.json`** is written at the end, which can look hung.
 
+- **Progress on stderr:** set **`EVAL_PROGRESS=1`** so the runner prints **periodic lines** (default every **30s** per agent query, plus every **25** SDK messages). You still see activity when the agent is inside a **long Bash** call and the SDK emits **no** new messages for a while. Tune with **`EVAL_PROGRESS_INTERVAL_MS`** (values below **5000** fall back to the **30000** default rather than being clamped). Progress logging is off by default so CI and short runs stay quiet.
 - **Stop early:** **Ctrl+C** (**SIGINT**) in the terminal running `npm run eval`. The runner writes **`*-scenario-NN.eval-aborted.json`** next to the run folder (see **Harness sidecars** at the top of this file).
 - **Skip re-clone:** If the baseline is already under the run directory, **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** skips **`git_clone`** from the scenario harness (see each scenario’s **`## Eval harness`** block).
 - **Cap agent length (smoke only):** **`EVAL_MAX_TURNS`** (default **80**) limits SDK turns; lowering it may end the run sooner but often **fails** the integration before success criteria are met—use for debugging, not a real pass.
@@ -92,7 +93,7 @@ cd docs/agent-evaluation && npm ci && npm run eval:ci
 - **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
 - **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** — skip **`git_clone`** (and any future **`preSteps`**) declared in a scenario’s **`## Eval harness`** JSON block; useful offline or when the baseline folder is already present.
 
-- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (`## Template`) with placeholders filled from environment variables.
+- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (`## Template`) with placeholders filled from environment variables.
 - Transcripts are written to `results/runs/<stamp>-scenario-NN/transcript.json` (gitignored).
 
 See `npm run eval -- --help` for env vars (`EVAL_TOOLS`, `EVAL_MODEL`, etc.).
@@ -136,7 +137,7 @@ These measure **existing-app integration**, not a greenfield demo. When you **ex
 
 The **full prompt template** (the text operators paste as Turn 0) lives in **one** place:
 
-**[`docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** — use the fenced block under **## Template**.
+**[`docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)** — use the fenced block under **## Template**.
 
 For eval runs, example placeholder substitutions (non-secret) are in [`fixtures/placeholder-values-for-turn0.md`](fixtures/placeholder-values-for-turn0.md) only. That file intentionally **does not** duplicate the template.
 
@@ -144,7 +145,7 @@ The Hookdeck dashboard should eventually render the **same** template body from
 
 ## How to run an evaluation (manual)
 
-1. **Turn 0:** Open the [agent prompt MDX](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)).
+1. **Turn 0:** Open the [agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)).
 2. **Pick a scenario:** e.g. [`scenarios/01-basics-curl.md`](scenarios/01-basics-curl.md).
 3. **New agent thread:** Paste Turn 0, then follow each **Turn N — User** line from the scenario verbatim (or as specified).
 4. **Judge output:** Use the scenario’s **Success criteria** checkboxes (human decision).
@@ -218,7 +219,7 @@ Scenarios **1–4** align with **“Try it out”**; **5–7** with **“Build a
 
 **Caveats (update the skill in `hookdeck/agent-skills`, not in this repo):**
 
-1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../pages/quickstarts/hookdeck-outpost-curl.mdx), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy.
+1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../content/quickstarts/hookdeck-outpost-curl.mdoc), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy.
 2. **REST paths** — Examples must use **`/tenants/{id}`**, not `PUT $BASE_URL/$TENANT_ID` (that path is wrong for the real API).
 3. **Naming** — Align env var naming with docs (`OUTPOST_API_KEY` or documented dashboard name), not ad-hoc `HOOKDECK_API_KEY` unless the dashboard literally uses that string.
 4. **Router vs. deep skills** — Today `outpost` is one monolithic `SKILL.md`. The skill itself mentions **future** destination-specific skills (`outpost-webhooks`, etc.). For scale, consider either **sections** with clear headings or **child skills** (e.g. `outpost-managed-quickstart`, `outpost-self-hosted`) once content grows—without forcing users to install many tiles for the common case.
@@ -227,6 +228,6 @@ Until the skill is updated, agents should still be pointed at the **quickstart M
 
 ## Related docs
 
-- [Agent prompt template (SSoT)](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)
+- [Agent prompt template (SSoT)](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)
 - [Upstream skill notes](SKILL-UPSTREAM-NOTES.md)
 - [TEMP tracking note](../TEMP-hookdeck-outpost-onboarding-status.md)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 7c789f207..b5443c60a 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -9,7 +9,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
    npm run eval -- --scenario <NN>
   ```
    Each run creates `**results/runs/<ISO-stamp>-scenario-<NN>/**` with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones).
-2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console).
+2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console). **Run directory** should be the **latest** folder matching `results/runs/*-scenario-<NN>` whose `heuristic-score.json` has **`overallTranscriptPass: true`** (re-scan directories when updating this file).
 3. **Execution (generated code):** with `**OUTPOST_API_KEY`** (and `**OUTPOST_TEST_WEBHOOK_URL`** / `**OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). **Do not edit generated files to force a pass** — test what the agent produced; note OS/environment (e.g. Linux vs macOS) when relevant. **This column is the primary bar for “does the output actually work?”** Heuristic and LLM scores are supplementary.
 4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`.
 
@@ -21,14 +21,14 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | ID  | Scenario file                                                                  | Run directory (`results/runs/…`)       | Heuristic              | LLM judge | Execution (generated code) | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
 | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | 01  | [01-basics-curl.md](scenarios/01-basics-curl.md)                               | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7)             | Pass      | Pass                       | Artifact: `**quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0.                                                                                                                                                                                                                                     |
-| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail).                                                                          |
-| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`.                                                                                                                                                   |
+| 02  | [02-basics-typescript.md](scenarios/02-basics-typescript.md)                   | `2026-04-10T15-01-35-359Z-scenario-02` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; harness sidecars sibling under `results/runs/`. Earlier passes: `2026-04-10T10-49-02-890Z-scenario-02`, `2026-04-10T10-34-35-461Z-scenario-02`. Over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail).                                                                                        |
+| 03  | [03-basics-python.md](scenarios/03-basics-python.md)                           | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`.                                                                                                                                                   |
 | 04  | [04-basics-go.md](scenarios/04-basics-go.md)                                   | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK.                                                                                                                                                                                                                                                                                                                                                                                                           |
-| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass      | Pass                       | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI).                                                                                                                                                                                                                                                                                                      |
+| 05  | [05-app-nextjs.md](scenarios/05-app-nextjs.md)                                 | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10)           | Pass      | Pass                       | **Last heuristic-pass run:** `**outpost-nextjs-demo/`** — simpler two-route app (`/api/register`, `/api/publish`), fixed topic. Richer app + assessment: **§ Scenario 05 — assessment** (`**nextjs-webhook-demo/`** in `2026-04-08T17-21-22-170Z-scenario-05`) — LLM + execution pass; heuristic **9/10** (`managed_base_not_selfhosted`, doc-corpus).                                                                                                                                                                                                                              |
 | 06  | [06-app-fastapi.md](scenarios/06-app-fastapi.md)                               | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303.                                                                                                                                                                                                                 |
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                   |
-| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
-| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`.                      |
+| 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
+| 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-10T19-54-20-037Z-scenario-09` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact:** `full-stack-fastapi-template/` under run dir (**gitignored**). **Heuristic + LLM** from this stamp; harness sidecars sibling under `results/runs/`. Docker: default **5173** / **8000** / **1080** / **1025**; if host **5432** is taken, map DB e.g. **54334:5432** in `compose.override.yml`. After a **fresh DB volume**, clear the SPA token or **re-login** — stale JWT → **404 User not found** on `/api/v1/users/me` and `/api/v1/outpost/destinations`. **§ Scenario 09 — post-agent work** (below) still describes template fixes vs baseline. **Legacy runs:** `2026-04-10T19-22-02-903Z-scenario-09`, `2026-04-09T22-16-54-750Z-scenario-09` (6/6), `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
 | 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 
 
@@ -43,7 +43,7 @@ Reproducibility / gotchas:
 - **`pnpm dev`** — if another `**next dev**` already holds **`.next/dev/lock`** for this tree, stop it or remove the lock; port **3000** may be taken (Next picks another port). Turbopack may warn about multiple lockfiles when the app sits under the monorepo — see Next’s **`turbopack.root`** guidance if needed.
 - **Destination schema `key`** — API returns `key` on schema fields; older SDK parses may strip it and break create-destination payloads keyed from labels. Regenerating SDKs (or a BFF raw fetch + mapping) aligns the UI with the API until then.
 
-### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
+### Scenario 09 — post-agent work (representative: `2026-04-09T22-16-54-750Z-scenario-09`; latest eval stamp `2026-04-10T19-54-20-037Z-scenario-09`)
 
 Work applied **after** the agent transcript so the FastAPI + React artifact matches current integration guidance (eval honesty + local execution). The template tree under `results/runs/<stamp>-scenario-09/` is **not committed** (see `results/.gitignore`); repo **docs** and **prompt** updates that back this scenario **are** in git.
 
@@ -64,15 +64,15 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
 
 **Docs & prompt (repository)**
 
-- [Building your own UI](../pages/guides/building-your-own-ui.mdx) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
-- [Agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
+- [Building your own UI](../content/guides/building-your-own-ui.mdoc) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
+- [Agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
 
 ### Scenario 09 — review notes (resolved, 2026-04-10)
 
 Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
 
-1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic `**publish_beyond_test_only`** in `[src/score-transcript.ts](src/score-transcript.ts)` cover what we measure.
+1. **Event activity IA** — [Building your own UI](../content/guides/building-your-own-ui.mdoc) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic `**publish_beyond_test_only`** in `[src/score-transcript.ts](src/score-transcript.ts)` cover what we measure.
 
 The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
 
@@ -81,7 +81,7 @@ The **copied agent template** (the `## Hookdeck Outpost integration` block) inte
 
 | Column            | Meaning                                                                                                    |
 | ----------------- | ---------------------------------------------------------------------------------------------------------- |
-| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json`                      |
+| **Run directory** | Latest `results/runs/*-scenario-<NN>` with `heuristic-score.json` → `overallTranscriptPass: true` (folder contains `transcript.json`) |
 | **Heuristic**     | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`)                                     |
 | **LLM judge**     | `llm-score.json` → `overall_transcript_pass`                                                               |
 | **Execution**     | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` |
@@ -95,14 +95,14 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
 
 ## Scenario 05 — assessment (`2026-04-08T17-21-22-170Z`)
 
-**Status:** This is the **current focus run** for scenario 05 reviews (not `2026-04-08T16-12-10-708Z`).
+**Status:** Deep-dive on the **richer** Next.js artifact (`nextjs-webhook-demo/`). The **tracker table** row for scenario **05** points at **`2026-04-08T16-12-10-708Z-scenario-05`** (`outpost-nextjs-demo/`) as the **latest heuristic-pass** run (10/10); this section documents **`17-21-22`** separately because it failed that check while still passing LLM + execution.
 
 
 | Dimension         | Result                                                                                                                                                                                                                                                                                                                            |
 | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | **Run directory** | `results/runs/2026-04-08T17-21-22-170Z-scenario-05/`                                                                                                                                                                                                                                                                              |
 | **Artifact**      | `nextjs-webhook-demo/` — Next.js App Router, `@hookdeck/outpost-sdk`, Outpost calls **only** in `app/api/**/route.ts` (managed API via SDK default unless `OUTPOST_API_BASE_URL` is set).                                                                                                                                         |
-| **Heuristic**     | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../pages/guides/building-your-own-ui.mdx) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. |
+| **Heuristic**     | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../content/guides/building-your-own-ui.mdoc) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. |
 | **LLM judge**     | **Pass** — matches scenario 05 success criteria (Next.js structure, server-side SDK, distinct destination + publish UI, tenant/topic handling, README env, managed default).                                                                                                                                                      |
 | **Execution**     | **Pass** (re-checked): `npm run build` in `nextjs-webhook-demo/`; `npm run dev` with `docs/agent-evaluation/.env`; `POST /api/destinations` → **201**, `POST /api/publish` → **200**.                                                                                                                                             |
 
@@ -123,8 +123,8 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
 Scenario 05 includes a regex check (`managed_base_not_selfhosted`) in `[src/score-transcript.ts](../src/score-transcript.ts)` (`scoreScenario05`). It looks at the **whole scoring corpus**: assistant-visible text **plus** content that ended up in the transcript from tools (e.g. **Read** of a doc file), not just files in the run folder.
 
 - It fails if the corpus contains a **self-hosted** default API path: specifically the literal substring `localhost:3333/api/v1` (Outpost’s common local dev URL), or a similar `localhost:<port> / api/v1` pattern, unless `OUTPOST_API_BASE_URL` also appears (see code for the exact conditions).
-- **Historical cause:** Older [Building your own UI](../pages/guides/building-your-own-ui.mdx) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost.
-- **Doc update:** `docs/pages/guides/building-your-own-ui.mdx` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus.
+- **Historical cause:** Older [Building your own UI](../content/guides/building-your-own-ui.mdoc) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost.
+- **Doc update:** `docs/content/guides/building-your-own-ui.mdoc` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus.
 - **Run `2026-04-08T16-12-10-708Z`:** heuristic **10/10**, `overallTranscriptPass: true`.
 - **Run `2026-04-08T17-21-22-170Z`:** heuristic **9/10**, `overallTranscriptPass: false` — failed `managed_base_not_selfhosted`; LLM judge still **passed**; transcript included **Read** of the **previous** `building-your-own-ui.mdx` with `localhost:3333/api/v1`.
 
diff --git a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
deleted file mode 100644
index 6c8de7367..000000000
--- a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# Notes for updating `hookdeck/agent-skills` — `skills/outpost`
-
-Apply these in the **[agent-skills](https://github.com/hookdeck/agent-skills)** repository, not in Outpost OSS.
-
-## Recommended direction
-
-1. **Lead with managed Hookdeck Outpost** — Link prominently to managed quickstarts (curl, TypeScript, Python, Go) and `https://api.outpost.hookdeck.com/2025-07-01`.
-2. **Fix REST examples** — Tenant upsert must be `PUT {base}/tenants/{tenant_id}`, not `PUT {base}/{tenant_id}`.
-3. **Align env naming** — Match product/docs: Outpost API key from project **Settings → Secrets**, typically loaded as `OUTPOST_API_KEY` in examples; avoid introducing `HOOKDECK_API_KEY` unless the dashboard literally uses that name.
-4. **Self-hosted section** — Keep Docker/Kubernetes/Railway as a secondary path with `http://localhost:3333/api/v1` and correct `/tenants/...` paths.
-5. **Optional: split later** — If the file grows, add `outpost-managed.md` / `outpost-self-hosted.md` fragments or separate skills; keep the default tile entrypoint short.
-
-## Concrete issues in current `SKILL.md` (as of fetch against `main`)
-
-- **Wrong curl path:** `curl -X PUT "$BASE_URL/$TENANT_ID"` should target `/tenants/$TENANT_ID` relative to the API base (managed base has no `/api/v1` prefix).
-- **Managed auth row** — Verify exact dashboard copy for secret name and env var conventions; link to Hookdeck Outpost project settings, not only generic dashboard secrets if URLs differ.
-- **Tile summary** — `tile.json` says “self-hosted relay”; managed Outpost should be reflected in the summary string when GA positioning is final.
-
-## Cross-links from this repo
-
-- Onboarding prompt template: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
-- Manual agent eval harness: `docs/agent-evaluation/README.md`
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md
index 6aa12b215..7d90026f4 100644
--- a/docs/agent-evaluation/scenarios/01-basics-curl.md
+++ b/docs/agent-evaluation/scenarios/01-basics-curl.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to an empty directory under `docs/agent-evalu
 
 ### Turn 0
 
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md
index a403bab6d..afbc4b7f2 100644
--- a/docs/agent-evaluation/scenarios/02-basics-typescript.md
+++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/03-basics-python.md b/docs/agent-evaluation/scenarios/03-basics-python.md
index 880b3c5e1..c0d747373 100644
--- a/docs/agent-evaluation/scenarios/03-basics-python.md
+++ b/docs/agent-evaluation/scenarios/03-basics-python.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/04-basics-go.md b/docs/agent-evaluation/scenarios/04-basics-go.md
index 7d575c62f..e1d8a6db8 100644
--- a/docs/agent-evaluation/scenarios/04-basics-go.md
+++ b/docs/agent-evaluation/scenarios/04-basics-go.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md
index bc4aca4db..c6f861f4a 100644
--- a/docs/agent-evaluation/scenarios/05-app-nextjs.md
+++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md
@@ -22,7 +22,7 @@ The harness sets the agent **cwd** to an empty directory under `docs/agent-evalu
 
 ### Turn 0
 
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/06-app-fastapi.md b/docs/agent-evaluation/scenarios/06-app-fastapi.md
index 704415e33..db8bb76f6 100644
--- a/docs/agent-evaluation/scenarios/06-app-fastapi.md
+++ b/docs/agent-evaluation/scenarios/06-app-fastapi.md
@@ -20,7 +20,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/07-app-go-http.md b/docs/agent-evaluation/scenarios/07-app-go-http.md
index 5dfdd85e2..03f9fe31c 100644
--- a/docs/agent-evaluation/scenarios/07-app-go-http.md
+++ b/docs/agent-evaluation/scenarios/07-app-go-http.md
@@ -19,7 +19,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/<stamp
 
 ### Turn 0
 
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
 
 ### Turn 1 — User
 
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 74ad08253..8d459ccfe 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -38,7 +38,7 @@ Same as other scenarios, except the agent starts **inside** the cloned tree abov
 
 ### Turn 0
 
-Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
+Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md).
 
 ### Turn 1 — User
 
@@ -54,7 +54,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
 
 **Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
 
-**Contract:** The baseline ships a **customer-facing dashboard**. Treat it like **Existing application (full-stack products)** in [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). The detailed UI bar is **not** repeated here—use **[Building your own UI — Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists)** (*Planning and contract*, *Destinations experience*, *Activity, attempts, and retries*). The agent must self-verify with **Before you stop (verify)** in the same prompt (full-stack UI item).
+**Contract:** The baseline ships a **customer-facing dashboard**. Treat it like **Existing application (full-stack products)** in [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). The detailed UI bar is **not** repeated here—use **[Building your own UI — Implementation checklists](../../content/guides/building-your-own-ui.mdoc#implementation-checklists)** (*Planning and contract*, *Destinations experience*, *Activity, attempts, and retries*). The agent must self-verify with **Before you stop (verify)** in the same prompt (full-stack UI item).
 
 - Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
 - **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index bd171fb3c..fe4e1ed18 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -41,7 +41,7 @@ The agent starts **inside** the cloned baseline above. Expect **`docker compose`
 
 ### Turn 0
 
-Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled.
+Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) with placeholders filled.
 
 ### Turn 1 — User
 
@@ -59,7 +59,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge (reads this section); execution manual.
 
-**Contract:** Same full-stack bar as scenario **8**, pinned to this template. **Canonical checklist:** [Building your own UI — Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists). **Agent self-verify:** [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) → *Before you stop (verify)* (full-stack UI item). Do not duplicate checklist rows in transcripts—confirm against the guide.
+**Contract:** Same full-stack bar as scenario **8**, pinned to this template. **Canonical checklist:** [Building your own UI — Implementation checklists](../../content/guides/building-your-own-ui.mdoc#implementation-checklists). **Agent self-verify:** [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) → *Before you stop (verify)* (full-stack UI item). Do not duplicate checklist rows in transcripts—confirm against the guide.
 
 - **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
 - **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index c9ab15366..01ca61438 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -35,7 +35,7 @@ The agent starts **inside** the cloned baseline above. Expect **`go mod`** / **`
 
 ### Turn 0
 
-Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled.
+Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdoc`](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) with placeholders filled.
 
 ### Turn 1 — User
 
@@ -51,7 +51,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
 
 **Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
 
-**Contract:** This baseline is an **API-first** Go service (no first-party customer dashboard in the pin). It does **not** inherit the full **[Building your own UI](../../pages/guides/building-your-own-ui.mdx)** dashboard checklist wholesale—agents follow **[Existing application](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx#existing-application)** (minimum integration depth) plus **API-only** guidance in **Existing application (full-stack products)** (*Document how tenants manage destinations via **your** API*). If a future pin adds a UI, scenarios should be updated to require the **Implementation checklists** linked above.
+**Contract:** This baseline is an **API-first** Go service (no first-party customer dashboard in the pin). It does **not** inherit the full **[Building your own UI](../../content/guides/building-your-own-ui.mdoc)** dashboard checklist wholesale—agents follow **[Existing application](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc#existing-application)** (minimum integration depth) plus **API-only** guidance in **Existing application (full-stack products)** (*Document how tenants manage destinations via **your** API*). If a future pin adds a UI, scenarios should be updated to require the **Implementation checklists** linked above.
 
 - **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
 - **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path—not only a test-only route unless wiring-only scope was agreed.
diff --git a/docs/agent-evaluation/scripts/run-scenario.sh b/docs/agent-evaluation/scripts/run-scenario.sh
index 7b24d3291..de47f2c87 100755
--- a/docs/agent-evaluation/scripts/run-scenario.sh
+++ b/docs/agent-evaluation/scripts/run-scenario.sh
@@ -36,7 +36,7 @@ echo "Scenario file:"
 echo "  $scenario"
 echo ""
 echo "Turn 0 — copy the fenced block under '## Template' from:"
-echo "  $REPO_ROOT/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx"
+echo "  $REPO_ROOT/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc"
 echo ""
 echo "Placeholder examples (not the template):"
 echo "  $ROOT/fixtures/placeholder-values-for-turn0.md"

From b7316f48a2ec7e49c1b7640868dae8224ee27c52 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Fri, 10 Apr 2026 23:43:59 +0100
Subject: [PATCH 42/47] docs(eval): record scenario 10 pass in run tracker

Log 2026-04-10T22-14-20-704Z-scenario-10 with heuristic/LLM/execution
results and execution notes (Go baseline, signup smoke, Hookdeck probe).

Made-with: Cursor
---
 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index b5443c60a..f55ad9cf7 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -29,7 +29,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
 | 07  | [07-app-go-http.md](scenarios/07-app-go-http.md)                               | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time.                                                                                                                                                                                                                                                   |
 | 08  | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md)   | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
 | 09  | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-10T19-54-20-037Z-scenario-09` | Pass (10/10)           | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. **Artifact:** `full-stack-fastapi-template/` under run dir (**gitignored**). **Heuristic + LLM** from this stamp; harness sidecars sibling under `results/runs/`. Docker: default **5173** / **8000** / **1080** / **1025**; if host **5432** is taken, map DB e.g. **54334:5432** in `compose.override.yml`. After a **fresh DB volume**, clear the SPA token or **re-login** — stale JWT → **404 User not found** on `/api/v1/users/me` and `/api/v1/outpost/destinations`. **§ Scenario 09 — post-agent work** (below) still describes template fixes vs baseline. **Legacy runs:** `2026-04-10T19-22-02-903Z-scenario-09`, `2026-04-09T22-16-54-750Z-scenario-09` (6/6), `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           |                                        |                        |           |                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| 10  | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md)           | `2026-04-10T22-14-20-704Z-scenario-10` | Pass (7/7)             | Pass      | Pass                       | `EVAL_LOCAL_DOCS=1`. Harness clone **`startersaas-go-api/`** under run dir (**gitignored**); pin [**devinterface/startersaas-go-api**](https://github.com/devinterface/startersaas-go-api). **Execution:** `go build` OK; **`docker compose build`** fails on baseline **Go 1.21** image vs **`go 1.22`** in `go.mod` (upstream Dockerfile). **Smoke:** Mongo **:27018**, `go run .`, **`POST /api/v1/auth/signup`** with **`privacyAccepted` / `marketingAccepted` as JSON booleans** → **200**; log **`[outpost] published user.created`**. **Outpost delivery** to Hookdeck Source verified with a distinct **`POST /publish`** probe (tenant + webhook destination + event). |
 
 
 ### Scenario 08 — execution notes (`2026-04-10T14-29-04-214Z-scenario-08`)

From 66fd663ffa9b9b6121b6e263a8dc91224d6f84aa Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Sat, 11 Apr 2026 00:16:24 +0100
Subject: [PATCH 43/47] ci(docs): agent eval workflow with live Outpost
 execution

Add docs-agent-eval-ci.yml: run scenarios 01+02 with EVAL_LOCAL_DOCS=1 and
the heuristic + LLM judge, then run execute-ci-artifacts.sh (curl +
TypeScript) against live Outpost using OUTPOST_API_KEY. Trigger on changes
to docs content/apis, the agent-evaluation harness (ignoring tracker,
results, and README noise), the TypeScript SDK, and the workflow itself.
Add .env.ci to .gitignore for local CI secrets; document the required
secrets and the execution step in the README.

Made-with: Cursor
---
 .github/workflows/docs-agent-eval-ci.yml      | 77 +++++++++++++++
 .gitignore                                    |  1 +
 docs/agent-evaluation/.env.example            |  2 +-
 docs/agent-evaluation/README.md               | 11 ++-
 docs/agent-evaluation/results/README.md       |  2 +-
 docs/agent-evaluation/scripts/ci-eval.sh      |  1 +
 .../scripts/execute-ci-artifacts.sh           | 99 +++++++++++++++++++
 docs/agent-evaluation/src/score-transcript.ts |  2 +-
 8 files changed, 187 insertions(+), 8 deletions(-)
 create mode 100644 .github/workflows/docs-agent-eval-ci.yml
 create mode 100755 docs/agent-evaluation/scripts/execute-ci-artifacts.sh
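
Note: execute-ci-artifacts.sh picks the newest *-scenario-01 run directory
by mtime and derives the paired *-scenario-02 directory by swapping the
suffix. The sketch below isolates that selection so it can be tried without
a real eval run. The function names and the "stamp-a" / "stamp-b" directory
names are hypothetical; the real run ids are ISO stamps.

```sh
#!/usr/bin/env bash
# Sketch only: reproduces the selection idea with throwaway directories.
set -eu

# Print mtime in epoch seconds; GNU stat first, BSD stat as a fallback.
mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Print the newest *-scenario-01 directory under $1 (non-zero if none).
# The body runs in a subshell so its variables stay local.
newest_scenario_01() (
  best=0
  newest=""
  for d in "$1"/*-scenario-01; do
    [ -d "$d" ] || continue
    m=$(mtime "$d")
    if [ "$m" -ge "$best" ]; then
      best=$m
      newest=$d
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Derive the paired scenario-02 directory by swapping the suffix.
paired_scenario_02() {
  printf '%s\n' "${1%-scenario-01}-scenario-02"
}

# Demo data: two hypothetical batches with explicit, different mtimes.
demo_runs=$(mktemp -d)
mkdir "$demo_runs/stamp-a-scenario-01" "$demo_runs/stamp-a-scenario-02"
mkdir "$demo_runs/stamp-b-scenario-01" "$demo_runs/stamp-b-scenario-02"
touch -t 202601010000 "$demo_runs/stamp-a-scenario-01"
touch -t 202601020000 "$demo_runs/stamp-b-scenario-01"

d01=$(newest_scenario_01 "$demo_runs")
d02=$(paired_scenario_02 "$d01")
echo "scenario 01: ${d01##*/}"
echo "scenario 02: ${d02##*/}"

rm -rf "$demo_runs"
```

Selecting by mtime rather than by parsing the stamp keeps the script short,
but it assumes nothing touches an older run directory between eval:ci and
the execution step.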

diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
new file mode 100644
index 000000000..3367a7f06
--- /dev/null
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -0,0 +1,77 @@
+# Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
+# Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
+# Triggers when local docs / OpenAPI / eval harness / TypeScript SDK change; ignores human-only files under results/ and tracker/README/AGENTS.
+# Each run bills Anthropic (agent + judge).
+# Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
+# (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
+# See docs/agent-evaluation/README.md § CI (recommended slice).
+name: Docs agent eval (CI slice)
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "docs/content/**"
+      - "docs/apis/**"
+      - "docs/agent-evaluation/**"
+      - "docs/README.md"
+      - "docs/AGENTS.md"
+      - "sdks/outpost-typescript/**"
+      - ".github/workflows/docs-agent-eval-ci.yml"
+    paths-ignore:
+      - "docs/agent-evaluation/results/**"
+      - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
+      - "docs/agent-evaluation/README.md"
+      - "docs/agent-evaluation/AGENTS.md"
+  pull_request:
+    paths:
+      - "docs/content/**"
+      - "docs/apis/**"
+      - "docs/agent-evaluation/**"
+      - "docs/README.md"
+      - "docs/AGENTS.md"
+      - "sdks/outpost-typescript/**"
+      - ".github/workflows/docs-agent-eval-ci.yml"
+    paths-ignore:
+      - "docs/agent-evaluation/results/**"
+      - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
+      - "docs/agent-evaluation/README.md"
+      - "docs/agent-evaluation/AGENTS.md"
+
+jobs:
+  eval-ci:
+    # Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
+    if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    defaults:
+      run:
+        working-directory: docs/agent-evaluation
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+          cache: npm
+          cache-dependency-path: docs/agent-evaluation/package-lock.json
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Run eval CI slice (scenarios 01, 02)
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          EVAL_TEST_DESTINATION_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+          EVAL_LOCAL_DOCS: "1"
+        run: ./scripts/ci-eval.sh
+
+      - name: Execute generated curl + TypeScript artifacts (live Outpost)
+        env:
+          OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
+          OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+        run: ./scripts/execute-ci-artifacts.sh
diff --git a/.gitignore b/.gitignore
index 64578dcf3..23b769f99 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 # Environment variables
 .env
+.env.ci
 .outpost.yaml
 
 # Built binaries
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 9f1392e98..7728e88d5 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -8,7 +8,7 @@ EVAL_TEST_DESTINATION_URL=
 
 # Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project.
 # The harness does not read this key; you (or a future verifier) use it after the run.
-# OUTPOST_API_KEY=
+# OUTPOST_API_KEY=   # required for ./scripts/execute-ci-artifacts.sh after eval:ci; GitHub Actions CI execution step
 # OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
 # OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id   # often same as EVAL_TEST_DESTINATION_URL
 
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 40eed004b..7cae6826c 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -81,16 +81,17 @@ For **pull-request or main-branch** automation, run **two** scenarios only:
 ```sh
 cd docs/agent-evaluation && npm ci && npm run eval:ci
 # or: ./scripts/ci-eval.sh   # requires ANTHROPIC_API_KEY + EVAL_TEST_DESTINATION_URL in the environment
+# after a successful eval:ci, live Outpost smoke: OUTPOST_API_KEY + OUTPOST_TEST_WEBHOOK_URL ./scripts/execute-ci-artifacts.sh
 ```
 
 `eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist.
 
-**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**, run from `docs/agent-evaluation` with a normal runner (Claude Agent SDK needs session filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **`OUTPOST_API_KEY`** is still not required for transcript-only CI.
+**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. Workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`** (agent **reads docs from the repo**), then **`./scripts/execute-ci-artifacts.sh`**: picks the **newest** **`*-scenario-01`** / **`*-scenario-02`** pair from **`results/runs/`**, runs the generated **`.sh`** then **`npx tsx`** on the TypeScript artifact (**`npm install`** in the **02** run dir when **`package.json`** exists). **`OUTPOST_TEST_WEBHOOK_URL`** in CI is set from the same secret as **`EVAL_TEST_DESTINATION_URL`**. Triggers on pushes to **`main`** and on **pull requests** when **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, root **`docs/README.md`** / **`docs/AGENTS.md`**, or **`docs/agent-evaluation/**`** change, except **`paths-ignore`**: **`results/**`**, **`SCENARIO-RUN-TRACKER.md`**, **`README.md`**, and **`AGENTS.md`** under **`docs/agent-evaluation/`**. Uses **`ubuntu-latest`** (Claude Agent SDK needs normal filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **Fork PRs** skip this job (secrets are not available).
 
 - **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run.
-- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}`.
-- **`OUTPOST_API_KEY`** — **not** read by the automated runner, but **required if you want a full evaluation**: without it you can only judge the transcript (plausible curl/SDK text). To verify that **generated commands or code actually work**, put the same Outpost API key you use against the managed API in **`docs/agent-evaluation/.env`** (or export it) and run the agent’s output against a real project. The onboarding prompt tells operators to keep that key in **`.env`** and never paste it into chat.
-- **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
+- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}` (and, in CI, reused as **`OUTPOST_TEST_WEBHOOK_URL`** for execution).
+- **`OUTPOST_API_KEY`** — required for **`execute-ci-artifacts.sh`** and for **GitHub Actions** execution after **`eval:ci`**. For **local** transcript-only runs you can omit it. Put the key in **`docs/agent-evaluation/.env`** (or export); never paste it into chat.
+- **`EVAL_LOCAL_DOCS=1`** — Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (agent uses **Read** on **`docs/`** instead of **WebFetch** to production). Use locally when validating unpublished docs; **GitHub Actions** sets this for **`docs-agent-eval-ci.yml`**.
 - **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** — skip **`git_clone`** (and any future **`preSteps`**) declared in a scenario’s **`## Eval harness`** JSON block; useful offline or when the baseline folder is already present.
 
 - **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (`## Template`) with placeholders filled from environment variables.
@@ -117,7 +118,7 @@ Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOO
 
 ### Transcript vs execution (full pass)
 
-`npm run eval` only captures **what the model produced**; it does **not** call Outpost. Treat that as **transcript review**.
+`npm run eval` only captures **what the model produced**; by itself it does **not** call Outpost (transcript review). **`./scripts/execute-ci-artifacts.sh`** (and the **GitHub Actions** workflow’s second step) runs the **01** shell + **02** TypeScript outputs against **live** Outpost when **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** are set.
 
 A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them:
 
diff --git a/docs/agent-evaluation/results/README.md b/docs/agent-evaluation/results/README.md
index 0ed815986..9fe1615cc 100644
--- a/docs/agent-evaluation/results/README.md
+++ b/docs/agent-evaluation/results/README.md
@@ -36,7 +36,7 @@ npm run score -- --run results/runs/<stamp>-scenario-NN --write
 npm run score -- --run results/runs/<stamp>-scenario-NN --llm --write
 ```
 
-**Execution** (curl/SDK against live Outpost with `OUTPOST_API_KEY`) is **not** produced by these JSON files. Treat the **Execution (full pass)** rows in `[../scenarios/](../scenarios/)` as a separate human or CI step unless you add a verifier script.
+**Execution** (curl/SDK against live Outpost with `OUTPOST_API_KEY`) is **not** recorded in these JSON files. Use **`../scripts/execute-ci-artifacts.sh`** after **`eval:ci`**, or the second step in **`.github/workflows/docs-agent-eval-ci.yml`**, and the **Execution (full pass)** rows in `[../scenarios/](../scenarios/)` for human notes.
 
 ---
 
diff --git a/docs/agent-evaluation/scripts/ci-eval.sh b/docs/agent-evaluation/scripts/ci-eval.sh
index 4197c8b92..980442967 100755
--- a/docs/agent-evaluation/scripts/ci-eval.sh
+++ b/docs/agent-evaluation/scripts/ci-eval.sh
@@ -5,6 +5,7 @@
 # Optional: same vars in docs/agent-evaluation/.env for local runs.
 #
 # Scenarios: 01 = curl quickstart shape; 02 = TypeScript SDK script. See README § CI.
+# After success, run ./scripts/execute-ci-artifacts.sh with OUTPOST_API_KEY + OUTPOST_TEST_WEBHOOK_URL for live Outpost (CI does this automatically).
 set -euo pipefail
 
 ROOT="$(cd "$(dirname "$0")/.." && pwd)"
diff --git a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
new file mode 100755
index 000000000..1e67ae1da
--- /dev/null
+++ b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
@@ -0,0 +1,99 @@
+#!/usr/bin/env bash
+# After a successful eval:ci (same ISO stamp for scenario-01 and scenario-02), run generated
+# curl script and TypeScript quickstart against live Outpost (tenant → destination → publish).
+#
+# Required env: OUTPOST_API_KEY, OUTPOST_TEST_WEBHOOK_URL (often same URL as EVAL_TEST_DESTINATION_URL)
+# Optional: OUTPOST_API_BASE_URL (managed default if unset)
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+RUNS="$ROOT/results/runs"
+
+if [[ -z "${OUTPOST_API_KEY:-}" ]]; then
+  echo "execute-ci-artifacts: OUTPOST_API_KEY is not set" >&2
+  exit 1
+fi
+if [[ -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
+  echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL is not set" >&2
+  exit 1
+fi
+
+if [[ ! -d "$RUNS" ]]; then
+  echo "execute-ci-artifacts: missing $RUNS (run eval:ci first)" >&2
+  exit 1
+fi
+
+# Latest scenario-01 run directory by mtime (same batch shares stamp with scenario-02).
+d01=""
+best=0
+for d in "$RUNS"/*-scenario-01; do
+  [[ -d "$d" ]] || continue
+  m=$(stat -c %Y "$d" 2>/dev/null || stat -f %m "$d")
+  if (( m >= best )); then
+    best=$m
+    d01=$d
+  fi
+done
+
+if [[ -z "$d01" ]]; then
+  echo "execute-ci-artifacts: no *-scenario-01 directory under $RUNS" >&2
+  exit 1
+fi
+
+prefix=${d01%-scenario-01}
+d02="${prefix}-scenario-02"
+if [[ ! -d "$d02" ]]; then
+  echo "execute-ci-artifacts: expected paired run dir missing: $d02" >&2
+  exit 1
+fi
+
+pick_sh() {
+  local dir=$1 f
+  for f in "$dir"/*quickstart*.sh "$dir"/outpost*.sh; do
+    [[ -f "$f" ]] && { echo "$f"; return 0; }
+  done
+  for f in "$dir"/*.sh; do
+    [[ -f "$f" ]] && { echo "$f"; return 0; }
+  done
+  return 1
+}
+
+pick_ts() {
+  local dir=$1 f
+  for f in "$dir"/outpost-quickstart.ts "$dir"/*quickstart*.ts; do
+    [[ -f "$f" ]] && { echo "$f"; return 0; }
+  done
+  for f in "$dir"/*.ts; do
+    [[ -f "$f" ]] && { echo "$f"; return 0; }
+  done
+  return 1
+}
+
+echo "execute-ci-artifacts: scenario 01 dir=$d01"
+sh_path=$(pick_sh "$d01") || {
+  echo "execute-ci-artifacts: no .sh script found in $d01" >&2
+  exit 1
+}
+echo "execute-ci-artifacts: running bash $sh_path"
+export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
+[[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
+chmod +x "$sh_path" 2>/dev/null || true
+# Run from the scenario 01 run dir so relative paths in the generated script behave.
+cd "$d01"
+bash "$sh_path"
+
+echo "execute-ci-artifacts: scenario 02 dir=$d02"
+ts_path=$(pick_ts "$d02") || {
+  echo "execute-ci-artifacts: no .ts file found in $d02" >&2
+  exit 1
+}
+echo "execute-ci-artifacts: running npx tsx $ts_path (from $d02)"
+cd "$d02"
+if [[ -f package.json ]]; then
+  npm install --no-audit --no-fund
+fi
+export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
+[[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
+npx --yes tsx "$ts_path"
+
+echo "execute-ci-artifacts: OK (scenario 01 shell + scenario 02 TypeScript)"
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index b3c4df2c9..2dbfb3d59 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -24,7 +24,7 @@ export interface ScoreReport {
   readonly scenarioId: string;
   readonly scenarioFile: string;
   readonly transcript: TranscriptScore;
-  /** Automated harness does not run Outpost; execution stays manual or a future verifier. */
+  /** Automated harness does not run Outpost; use `scripts/execute-ci-artifacts.sh` or CI for live 01/02 smoke. */
   readonly execution: { readonly status: "not_automated"; readonly note: string };
   /** null when no automated transcript rubric exists for this scenario yet */
   readonly overallTranscriptPass: boolean | null;

From 736a23fcf11b5cd208efe1dd5283b5279388749c Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Sat, 11 Apr 2026 00:20:47 +0100
Subject: [PATCH 44/47] ci(docs): allow workflow_dispatch for manual agent eval
 runs

Made-with: Cursor
---
 .github/workflows/docs-agent-eval-ci.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
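
Note: the job condition changes from "push, or same-repo PR" to "anything
that is not a pull_request, or same-repo PR". The sketch below models both
expressions as shell predicates to show why the old form would skip a
manual run. It assumes a workflow_dispatch event carries no pull_request
payload, so the head repo compares as empty; the repository names are
hypothetical.

```sh
#!/usr/bin/env bash
# Sketch only: arguments are <event_name> <pr_head_repo> <repository>.
set -eu

# Old condition: event is push, or the PR head repo equals this repo.
old_should_run() {
  [ "$1" = "push" ] || [ "$2" = "$3" ]
}

# New condition: any non-pull_request event, or a same-repo PR.
new_should_run() {
  [ "$1" != "pull_request" ] || [ "$2" = "$3" ]
}

# Run a predicate and print "run" or "skip".
report() {
  if "$@"; then echo run; else echo skip; fi
}

repo="example-org/example-repo"   # hypothetical repository name

echo "old, workflow_dispatch: $(report old_should_run workflow_dispatch "" "$repo")"
echo "new, workflow_dispatch: $(report new_should_run workflow_dispatch "" "$repo")"
echo "new, same-repo PR:      $(report new_should_run pull_request "$repo" "$repo")"
echo "new, fork PR:           $(report new_should_run pull_request "fork-user/example-repo" "$repo")"
```

Under these assumptions only the new predicate lets workflow_dispatch
through, while fork PRs are still skipped.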

diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index 3367a7f06..6647af05f 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -8,6 +8,7 @@
 name: Docs agent eval (CI slice)
 
 on:
+  workflow_dispatch:
   push:
     branches:
       - main
@@ -42,7 +43,7 @@ on:
 jobs:
   eval-ci:
     # Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
-    if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
+    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
     runs-on: ubuntu-latest
     timeout-minutes: 60
     defaults:

From 9ab377128eaffc3829de2d01c02453eb0d06e8c6 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Sat, 11 Apr 2026 00:22:12 +0100
Subject: [PATCH 45/47] ci(docs): fix workflow YAML (paths vs paths-ignore);
 document dispatch

GitHub rejects paths + paths-ignore on the same event; drop paths-ignore.
README: manual workflow_dispatch; note broader path matches.

Made-with: Cursor
---
 .github/workflows/docs-agent-eval-ci.yml | 12 +-----------
 docs/agent-evaluation/README.md          |  2 +-
 2 files changed, 2 insertions(+), 12 deletions(-)
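
Note: with paths-ignore gone, every file under docs/agent-evaluation/ now
matches the remaining paths filter, including the README and results that
the dropped block tried to exclude. The sketch below approximates the
filter with shell case patterns (where * also matches /) standing in for
the workflow's ** globs. It is not GitHub's matcher, and the sample file
paths are hypothetical.

```sh
#!/usr/bin/env bash
# Sketch only: approximate the workflow's `paths` list with case patterns.
set -eu

# Return 0 when the changed path would match the `paths` filter.
matches_paths() {
  case "$1" in
    docs/content/*|docs/apis/*|docs/agent-evaluation/*) return 0 ;;
    docs/README.md|docs/AGENTS.md) return 0 ;;
    sdks/outpost-typescript/*) return 0 ;;
    .github/workflows/docs-agent-eval-ci.yml) return 0 ;;
    *) return 1 ;;
  esac
}

for p in \
  docs/agent-evaluation/README.md \
  docs/agent-evaluation/results/runs/example/transcript.json \
  docs/content/quickstarts/hookdeck-outpost-curl.mdoc \
  cmd/example/main.go
do
  if matches_paths "$p"; then
    echo "trigger   $p"
  else
    echo "no match  $p"
  fi
done
```

GitHub's documented way to combine includes and excludes on one event is
!-prefixed patterns inside paths; this patch does not use that and accepts
the extra runs instead.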

diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index 6647af05f..f5ea2c63d 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -1,6 +1,6 @@
 # Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
 # Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
-# Triggers when local docs / OpenAPI / eval harness / TypeScript SDK change; ignores human-only files under results/ and tracker/README/AGENTS.
+# Triggers: workflow_dispatch, or push (main) / pull_request when docs / OpenAPI / agent-eval / TS SDK paths change.
 # Each run bills Anthropic (agent + judge).
 # Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
 # (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
@@ -20,11 +20,6 @@ on:
       - "docs/AGENTS.md"
       - "sdks/outpost-typescript/**"
       - ".github/workflows/docs-agent-eval-ci.yml"
-    paths-ignore:
-      - "docs/agent-evaluation/results/**"
-      - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
-      - "docs/agent-evaluation/README.md"
-      - "docs/agent-evaluation/AGENTS.md"
   pull_request:
     paths:
       - "docs/content/**"
@@ -34,11 +29,6 @@ on:
       - "docs/AGENTS.md"
       - "sdks/outpost-typescript/**"
       - ".github/workflows/docs-agent-eval-ci.yml"
-    paths-ignore:
-      - "docs/agent-evaluation/results/**"
-      - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
-      - "docs/agent-evaluation/README.md"
-      - "docs/agent-evaluation/AGENTS.md"
 
 jobs:
   eval-ci:
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 7cae6826c..14df8f51e 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -86,7 +86,7 @@ cd docs/agent-evaluation && npm ci && npm run eval:ci
 
 `eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist.
 
-**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. Workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`** (agent **reads docs from the repo**), then **`./scripts/execute-ci-artifacts.sh`**: picks the **newest** **`*-scenario-01`** / **`*-scenario-02`** pair from **`results/runs/`**, runs the generated **`.sh`** then **`npx tsx`** on the TypeScript artifact (**`npm install`** in the **02** run dir when **`package.json`** exists). **`OUTPOST_TEST_WEBHOOK_URL`** in CI is set from the same secret as **`EVAL_TEST_DESTINATION_URL`**. Triggers on pushes to **`main`** and on **pull requests** when **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, root **`docs/README.md`** / **`docs/AGENTS.md`**, or **`docs/agent-evaluation/**`** change, except **`paths-ignore`**: **`results/**`**, **`SCENARIO-RUN-TRACKER.md`**, **`README.md`**, and **`AGENTS.md`** under **`docs/agent-evaluation/`**. Uses **`ubuntu-latest`** (Claude Agent SDK needs normal filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **Fork PRs** skip this job (secrets are not available).
+**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. Workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`** (agent **reads docs from the repo**), then **`./scripts/execute-ci-artifacts.sh`**: picks the **newest** **`*-scenario-01`** / **`*-scenario-02`** pair from **`results/runs/`**, runs the generated **`.sh`** then **`npx tsx`** on the TypeScript artifact (**`npm install`** in the **02** run dir when **`package.json`** exists). **`OUTPOST_TEST_WEBHOOK_URL`** in CI is set from the same secret as **`EVAL_TEST_DESTINATION_URL`**. Triggers on **`workflow_dispatch`** (manual: Actions → **Docs agent eval (CI slice)** → **Run workflow**, pick branch), pushes to **`main`**, and **pull requests** when **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, root **`docs/README.md`** / **`docs/AGENTS.md`**, or **`docs/agent-evaluation/**`** change (GitHub does not allow **`paths`** + **`paths-ignore`** together on the same event, so edits under e.g. **`docs/agent-evaluation/README.md`** also match **`docs/agent-evaluation/**`** and can trigger a run). Uses **`ubuntu-latest`** (Claude Agent SDK needs normal filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **Fork PRs** skip this job (secrets are not available).
 
 - **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run.
 - **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}` (and, in CI, reused as **`OUTPOST_TEST_WEBHOOK_URL`** for execution).

From 49d571354683758a31545b6dee344d9b9b7a6a27 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Sat, 11 Apr 2026 00:24:17 +0100
Subject: [PATCH 46/47] =?UTF-8?q?fix(agent-eval):=20eval:ci=20argv=20?=
 =?UTF-8?q?=E2=80=94=20drop=20stray=20--=20before=20--scenarios?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Node's parseArgs treats a bare -- as the end of options, so --scenarios and
its value were parsed as positionals and failed with
ERR_PARSE_ARGS_UNEXPECTED_POSITIONAL in CI.

Made-with: Cursor
---
 docs/agent-evaluation/package.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
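
Note: the failure comes from the general "--" end-of-options convention.
The sketch below shows the same convention with the shell's getopts
builtin; it is an analogy only, not Node's parseArgs, and the -s flag and
parse function are hypothetical stand-ins for --scenarios.

```sh
#!/usr/bin/env bash
# Sketch only: how a leading "--" turns a would-be option into positionals.
set -eu

# Parse an optional "-s <list>" flag, then report what is left over.
# The body runs in a subshell so OPTIND and friends do not leak.
parse() (
  OPTIND=1
  scenarios=""
  while getopts "s:" opt; do
    case "$opt" in
      s) scenarios=$OPTARG ;;
      *) exit 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  echo "scenarios=${scenarios:-<none>} positionals=$#"
)

parse -s 01,02        # flag is recognised
parse -- -s 01,02     # "--" ends option parsing; "-s 01,02" become positionals
```

A parser that rejects unexpected positionals, as the eval harness does
according to the error above, fails on the second form instead of silently
ignoring the flag.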

diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json
index 900af5e2d..73d7d379d 100644
--- a/docs/agent-evaluation/package.json
+++ b/docs/agent-evaluation/package.json
@@ -6,7 +6,7 @@
   "description": "Claude Agent SDK harness for Outpost onboarding scenario evals",
   "scripts": {
     "eval": "node --import tsx src/run-agent-eval.ts",
-    "eval:ci": "node --import tsx src/run-agent-eval.ts -- --scenarios 01,02",
+    "eval:ci": "node --import tsx src/run-agent-eval.ts --scenarios 01,02",
     "eval:tsx-cli": "tsx src/run-agent-eval.ts",
     "score": "node --import tsx src/score-eval.ts",
     "typecheck": "tsc --noEmit"

From 052e48f2d9dc3c1472d4e5abc0705d7a26c8fc93 Mon Sep 17 00:00:00 2001
From: Phil Leggetter <phil@leggetter.co.uk>
Date: Sat, 11 Apr 2026 11:45:06 +0100
Subject: [PATCH 47/47] fix(agent-eval): execution defaults, smoke test, CI env
 for live Outpost

- execute-ci-artifacts: fall back to EVAL_TEST_DESTINATION_URL when
  OUTPOST_TEST_WEBHOOK_URL is unset; default OUTPOST_API_BASE_URL with :=
  so an empty value from .env no longer drops the /2025-07-01 version path;
  clearer errors when the shell or TypeScript artifact fails
- Add smoke-test-execute-ci-artifacts.sh + npm run smoke:execute-ci
  (destination topics ["*"]; loads .env then .env.ci)
- CI execution step: set OUTPOST_API_BASE_URL + OUTPOST_CI_PUBLISH_TOPIC
- README troubleshooting (404) and .env.example OUTPOST_CI_PUBLISH_TOPIC

Made-with: Cursor
---
 .github/workflows/docs-agent-eval-ci.yml      |   2 +
 docs/agent-evaluation/.env.example            |   1 +
 docs/agent-evaluation/README.md               |  10 ++
 docs/agent-evaluation/package.json            |   1 +
 .../scripts/execute-ci-artifacts.sh           |  18 ++-
 .../smoke-test-execute-ci-artifacts.sh        | 126 ++++++++++++++++++
 6 files changed, 155 insertions(+), 3 deletions(-)
 create mode 100755 docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
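
Note: the two environment defaults in this patch rely on the colon forms of
parameter expansion, which treat an empty value the same as an unset one.
The sketch below contrasts = with := for the base URL and shows the nested
:- fallback for the webhook URL. The after_plain and after_colon variables
are introduced here for illustration, and the webhook URL is the
placeholder from .env.example.

```sh
#!/usr/bin/env bash
# Sketch only: empty-vs-unset handling for the two defaults in this patch.
set -eu

DEFAULT_BASE="https://api.outpost.hookdeck.com/2025-07-01"

# Simulate "OUTPOST_API_BASE_URL=" left empty in a sourced .env file.
OUTPOST_API_BASE_URL=""

# Without the colon, a set-but-empty variable is left alone.
: "${OUTPOST_API_BASE_URL=$DEFAULT_BASE}"
after_plain=$OUTPOST_API_BASE_URL

# With the colon, empty counts as unset and the default is assigned.
: "${OUTPOST_API_BASE_URL:=$DEFAULT_BASE}"
after_colon=$OUTPOST_API_BASE_URL

echo "after =   '${after_plain}'"
echo "after :=  '${after_colon}'"

# Webhook fallback: prefer OUTPOST_TEST_WEBHOOK_URL, else
# EVAL_TEST_DESTINATION_URL, else empty (the script then errors out).
unset OUTPOST_TEST_WEBHOOK_URL
EVAL_TEST_DESTINATION_URL="https://hkdk.events/your-source-id"
OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
echo "webhook   '${OUTPOST_TEST_WEBHOOK_URL}'"
```

The trailing :- inside the nested expansion keeps the script safe under
set -u when neither variable is defined.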

diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index f5ea2c63d..49fb76e87 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -65,4 +65,6 @@ jobs:
         env:
           OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
           OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+          OUTPOST_API_BASE_URL: https://api.outpost.hookdeck.com/2025-07-01
+          OUTPOST_CI_PUBLISH_TOPIC: user.created
         run: ./scripts/execute-ci-artifacts.sh
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 7728e88d5..79e210a37 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -11,6 +11,7 @@ EVAL_TEST_DESTINATION_URL=
 # OUTPOST_API_KEY=   # required for ./scripts/execute-ci-artifacts.sh after eval:ci; GitHub Actions CI execution step
 # OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
 # OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id   # often same as EVAL_TEST_DESTINATION_URL
+# OUTPOST_CI_PUBLISH_TOPIC=user.created   # optional; publish topic for npm run smoke:execute-ci (must exist in project)
 
 # Optional (see npm run eval -- --help)
 # EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 14df8f51e..1c5799797 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -120,6 +120,16 @@ Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOO
 
 `npm run eval` only captures **what the model produced**; by itself it does **not** call Outpost (transcript review). **`./scripts/execute-ci-artifacts.sh`** (and the **GitHub Actions** workflow’s second step) runs the **01** shell + **02** TypeScript outputs against **live** Outpost when **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** are set.
 
+**Local smoke (no agent):** to verify secrets and the managed API the same way CI does—without depending on a fresh eval transcript—run from **`docs/agent-evaluation/`** with **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** set (e.g. **`source .env`**):
+
+```sh
+npm run smoke:execute-ci
+```
+
+That writes a temporary **`*-scenario-01` / `*-scenario-02`** pair under **`results/runs/`** with hand-maintained scripts: shell destination uses **`topics: ["*"]`** so you do not need every topic name pre-created; publish still uses **`OUTPOST_CI_PUBLISH_TOPIC`** (default **`user.created`**, overridable in the environment), which **must exist** in your Outpost project’s topic list. **`execute-ci-artifacts.sh`** was not exercised end-to-end in-repo before CI; use this command after changing execution logic.
+
+**CI `curl: (22) … 404`:** the agent-generated shell script is calling an Outpost URL that returned **404**. Common causes: wrong **`OUTPOST_API_BASE_URL`** in the script (CI now sets the managed URL explicitly), or a **publish/destination topic** that does not exist in the project tied to **`OUTPOST_API_KEY`**. Ensure **`user.created`** is configured in that project, or set **`OUTPOST_CI_PUBLISH_TOPIC`** to a topic you do have. Compare the failing **`curl`** line in the Actions log with the [curl quickstart](../content/quickstarts/hookdeck-outpost-curl.mdoc).
+
 A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them:
 
 1. Add **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** when the artifact expects them) to `docs/agent-evaluation/.env` so your shell has them after `dotenv` or when you `source` / copy into the directory where you run the code.
diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json
index 73d7d379d..f9812c162 100644
--- a/docs/agent-evaluation/package.json
+++ b/docs/agent-evaluation/package.json
@@ -7,6 +7,7 @@
   "scripts": {
     "eval": "node --import tsx src/run-agent-eval.ts",
     "eval:ci": "node --import tsx src/run-agent-eval.ts --scenarios 01,02",
+    "smoke:execute-ci": "bash scripts/smoke-test-execute-ci-artifacts.sh",
     "eval:tsx-cli": "tsx src/run-agent-eval.ts",
     "score": "node --import tsx src/score-eval.ts",
     "typecheck": "tsc --noEmit"
diff --git a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
index 1e67ae1da..03c046d8c 100755
--- a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
+++ b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
@@ -13,11 +13,17 @@ if [[ -z "${OUTPOST_API_KEY:-}" ]]; then
   echo "execute-ci-artifacts: OUTPOST_API_KEY is not set" >&2
   exit 1
 fi
+export OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
 if [[ -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
-  echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL is not set" >&2
+  echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL or EVAL_TEST_DESTINATION_URL must be set" >&2
   exit 1
 fi
 
+# Managed API default (agent-generated scripts often expect this in the environment).
+# Use := so empty string from .env is treated like unset (otherwise curl hits /tenants without /2025-07-01 → 404).
+: "${OUTPOST_API_BASE_URL:=https://api.outpost.hookdeck.com/2025-07-01}"
+export OUTPOST_API_BASE_URL
+
 if [[ ! -d "$RUNS" ]]; then
   echo "execute-ci-artifacts: missing $RUNS (run eval:ci first)" >&2
   exit 1
@@ -80,7 +86,10 @@ export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
 chmod +x "$sh_path" 2>/dev/null || true
 # Run from the scenario 01 run dir so relative paths in the generated script behave.
 cd "$d01"
-bash "$sh_path"
+bash "$sh_path" || {
+  echo "execute-ci-artifacts: scenario 01 shell failed (curl exit 22 = HTTP error). 404 is often a wrong path or a publish/destination topic that is not configured in your Outpost project. Set OUTPOST_API_BASE_URL if needed; try npm run smoke:execute-ci (uses destination topics [\"*\"])." >&2
+  exit 1
+}
 
 echo "execute-ci-artifacts: scenario 02 dir=$d02"
 ts_path=$(pick_ts "$d02") || {
@@ -94,6 +103,9 @@ if [[ -f package.json ]]; then
 fi
 export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
 [[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
-npx --yes tsx "$ts_path"
+npx --yes tsx "$ts_path" || {
+  echo "execute-ci-artifacts: scenario 02 TypeScript failed. Check OUTPOST_API_KEY, OUTPOST_TEST_WEBHOOK_URL, and that OUTPOST_CI_PUBLISH_TOPIC (default user.created) exists in the project. Try: npm run smoke:execute-ci" >&2
+  exit 1
+}
 
 echo "execute-ci-artifacts: OK (scenario 01 shell + scenario 02 TypeScript)"
diff --git a/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
new file mode 100755
index 000000000..e85d1869b
--- /dev/null
+++ b/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+# Local / operator check for the same path as CI: materialize a fresh *-scenario-01 / *-scenario-02
+# pair with hand-maintained scripts (wildcard destination topics), then run execute-ci-artifacts.sh.
+#
+# Requires: OUTPOST_API_KEY and OUTPOST_TEST_WEBHOOK_URL (or EVAL_TEST_DESTINATION_URL); .env and .env.ci in docs/agent-evaluation are sourced automatically if present
+# Optional: OUTPOST_API_BASE_URL, OUTPOST_CI_PUBLISH_TOPIC (default user.created — must exist in your project)
+#
+# Does not invoke the agent. Use this to verify secrets and managed API access before relying on CI execution.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$ROOT"
+if [[ -f .env ]]; then
+  set -a
+  # shellcheck disable=SC1091
+  source .env
+  set +a
+fi
+if [[ -f .env.ci ]]; then
+  set -a
+  # shellcheck disable=SC1091
+  source .env.ci
+  set +a
+fi
+
+# Same as CI: webhook URL is often stored as EVAL_TEST_DESTINATION_URL in .env / .env.ci
+export OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
+
+if [[ -z "${OUTPOST_API_KEY:-}" || -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
+  echo "smoke-test-execute-ci: set OUTPOST_API_KEY and OUTPOST_TEST_WEBHOOK_URL (or EVAL_TEST_DESTINATION_URL), e.g. source .env" >&2
+  exit 1
+fi
+
+RUNS="$ROOT/results/runs"
+mkdir -p "$RUNS"
+
+STAMP="ci-smoke-$(date -u +%Y-%m-%dT%H-%M-%S)-$(printf '%03d' $((RANDOM % 1000)))Z"
+d01="$RUNS/${STAMP}-scenario-01"
+d02="$RUNS/${STAMP}-scenario-02"
+mkdir -p "$d01" "$d02"
+
+PUBLISH_TOPIC="${OUTPOST_CI_PUBLISH_TOPIC:-user.created}"
+
+# Shell: managed API, unique tenant, destination topics * (no dashboard topic list required), then publish.
+cat > "$d01/outpost_quickstart.sh" << 'EOSH'
+#!/usr/bin/env bash
+set -euo pipefail
+BASE="${OUTPOST_API_BASE_URL:-https://api.outpost.hookdeck.com/2025-07-01}"
+TENANT_ID="ci_smoke_${RANDOM}_$(date +%s)"
+TOPIC="${OUTPOST_CI_PUBLISH_TOPIC:-user.created}"
+DEST_JSON="$(OUTPOST_TEST_WEBHOOK_URL="$OUTPOST_TEST_WEBHOOK_URL" python3 -c '
+import json, os
+print(json.dumps({"type": "webhook", "topics": ["*"], "config": {"url": os.environ["OUTPOST_TEST_WEBHOOK_URL"]}}))
+')"
+curl -sS -f -X PUT "$BASE/tenants/$TENANT_ID" \
+  -H "Authorization: Bearer $OUTPOST_API_KEY" -o /dev/null
+curl -sS -f -X POST "$BASE/tenants/$TENANT_ID/destinations" \
+  -H "Authorization: Bearer $OUTPOST_API_KEY" -H "Content-Type: application/json" \
+  -d "$DEST_JSON" -o /dev/null
+curl -sS -f -X POST "$BASE/publish" \
+  -H "Authorization: Bearer $OUTPOST_API_KEY" -H "Content-Type: application/json" \
+  -d "$(TENANT_ID="$TENANT_ID" TOPIC="$TOPIC" python3 -c '
+import json, os
+print(json.dumps({
+  "tenant_id": os.environ["TENANT_ID"],
+  "topic": os.environ["TOPIC"],
+  "eligible_for_retry": True,
+  "metadata": {"source": "ci-smoke-sh"},
+  "data": {"smoke": True},
+}))
+')" -o /dev/null -w "publish_http=%{http_code}\n"
+echo "smoke shell OK tenant=$TENANT_ID"
+EOSH
+chmod +x "$d01/outpost_quickstart.sh"
+
+# TypeScript: same semantics (wildcard subscription); publish uses OUTPOST_CI_PUBLISH_TOPIC.
+cat > "$d02/package.json" << 'EOJSON'
+{
+  "name": "ci-smoke-outpost-ts",
+  "private": true,
+  "type": "module",
+  "dependencies": {
+    "@hookdeck/outpost-sdk": "^0.9.0"
+  }
+}
+EOJSON
+
+cat > "$d02/outpost-quickstart.ts" << 'EOTS'
+import { Outpost } from "@hookdeck/outpost-sdk";
+
+const apiKey = process.env.OUTPOST_API_KEY;
+if (!apiKey) throw new Error("Set OUTPOST_API_KEY");
+const webhookUrl = process.env.OUTPOST_TEST_WEBHOOK_URL;
+if (!webhookUrl) throw new Error("Set OUTPOST_TEST_WEBHOOK_URL");
+
+const outpost = new Outpost({
+  apiKey,
+  ...(process.env.OUTPOST_API_BASE_URL
+    ? { serverURL: process.env.OUTPOST_API_BASE_URL }
+    : {}),
+});
+
+const tenantId = `ci_smoke_ts_${Math.random().toString(36).slice(2)}_${Date.now()}`;
+const topic = process.env.OUTPOST_CI_PUBLISH_TOPIC ?? "user.created";
+
+await outpost.tenants.upsert(tenantId);
+await outpost.destinations.create(tenantId, {
+  type: "webhook",
+  topics: ["*"],
+  config: { url: webhookUrl },
+});
+const published = await outpost.publish.event({
+  tenantId,
+  topic,
+  eligibleForRetry: true,
+  metadata: { source: "ci-smoke-ts" },
+  data: { smoke: true },
+});
+console.log("smoke ts OK event id:", published.id);
+EOTS
+
+touch "$d01" "$d02"
+echo "smoke-test-execute-ci: wrote $d01 and $d02 (publish topic=$PUBLISH_TOPIC)"
+export OUTPOST_CI_PUBLISH_TOPIC="$PUBLISH_TOPIC"
+./scripts/execute-ci-artifacts.sh
+echo "smoke-test-execute-ci: OK"