);
```
@@ -248,7 +262,7 @@ You can find the source code of the `TopicPicker.tsx` component of the User Port
### Configuring the Destination
-Using the destination type schema for the selected destination type, you can render a form to create and manage destinations configuration. The configuration fields are found in the `configuration_fields` and `credentials_fields` arrays of the destination type schema.
+Using the destination type schema for the selected destination type, you can render a form to create and manage the destination's configuration. The configuration fields are found in the **`config_fields`** and **`credential_fields`** arrays of the destination type schema (snake_case in JSON responses).
To render your form, render all fields from both arrays. Note that some `credential_fields` values are obfuscated once the destination is created; to edit one, the user must clear the existing value first.
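A minimal sketch of merging the two arrays into one form, assuming each field definition carries at least a `key` (check the full field schema in OpenAPI). `formFieldsFor` and `splitValues` are hypothetical helper names, and splitting submitted values into `config` and `credentials` objects is an assumption to confirm against the Create destination request body.

```typescript
// Trimmed-down shapes for illustration; OpenAPI defines the real field schema.
interface FieldDef {
  key: string;
  label?: string;
  required?: boolean;
}

interface DestinationTypeSchema {
  type: string;
  config_fields?: FieldDef[];
  credential_fields?: FieldDef[];
}

interface FormField extends FieldDef {
  source: "config" | "credentials"; // which array the field came from
}

// Render every field from both arrays, remembering where each one came from.
function formFieldsFor(schema: DestinationTypeSchema): FormField[] {
  return [
    ...(schema.config_fields ?? []).map((f) => ({ ...f, source: "config" as const })),
    ...(schema.credential_fields ?? []).map((f) => ({ ...f, source: "credentials" as const })),
  ];
}

// Split submitted values back into separate config and credentials objects,
// skipping fields the user left empty.
function splitValues(
  fields: FormField[],
  values: Record<string, string>,
): { config: Record<string, string>; credentials: Record<string, string> } {
  const config: Record<string, string> = {};
  const credentials: Record<string, string> = {};
  for (const field of fields) {
    const value = values[field.key];
    if (value === undefined || value === "") continue;
    (field.source === "config" ? config : credentials)[field.key] = value;
  }
  return { config, credentials };
}
```

Keeping the `source` tag on each field lets one generic form component render both arrays while the submit handler still knows where each value belongs.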
@@ -293,18 +307,22 @@ const DestinationConfigForm = ({
}
const type_schema = destination_types.find(
- (type) => type.id === destination_type
+ (t) => t.type === destination_type
);
+ if (!type_schema) {
+ return
>
);
@@ -362,45 +382,32 @@ const DestinationConfigForm = ({
You can find the source code of the `DestinationConfigForm.tsx` component of the User Portal here: [DestinationConfigForm.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx#L14)
-## Listing Events
-
-Events are listed using the [List Events API](/docs/api/events#list-events). You can use the `topic` parameter to filter the events by topic or the `destination_id` parameter to filter the events by destination.
+## Events, attempts, and retries
-```tsx
-const [events, setEvents] = useState([]);
+This section ties together **how customers see what was delivered** and **how they recover from failures**—without duplicating full UI code (see the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) and [OpenAPI](/docs/api) for request/response shapes).
-const fetchEvents = async () => {
- const response = await fetch(`${API_URL}/tenants/events`, {
- headers: {
- Authorization: `Bearer ${token}`,
- },
- });
-};
+### How the pieces fit
-useEffect(() => {
- fetchEvents();
-}, []);
+1. **Destinations list** — Each row is a subscription (type, target URL or label, topics). From here, users typically open **“Activity”**, **“Events”**, or **“Logs”** for that destination, or you filter a shared events view by `destination_id`.
+2. **Events** — An event is something your **backend published** (topic + payload). The [List Events API](/docs/api/events#list-events) returns a **paginated** list. Important query dimensions:
+ - **`destination_id`** — only events that were routed to that destination (ideal for a per-destination screen).
+ - **`topic`**, **time ranges**, **pagination** (`limit`, `next` / `prev` cursors) — for broader “recent activity” views.
+ With a **tenant JWT**, results are limited to that tenant; with an **admin API key**, supply **`tenant_id`** (your BFF usually injects it from the signed-in customer).
+3. **Attempts** — Each row is one **delivery try** to a destination (status, HTTP code, timing, optional response payload). Link attempts to events via **`event_id`** and **`destination_id`**.
+ - Tenant-wide: [List attempts](/docs/api/attempts#list-attempts) with `event_id` (and optionally `destination_id`).
+ - Destination-scoped: `GET /tenants/{tenant_id}/destinations/{destination_id}/attempts` — see [OpenAPI / tenant destination attempts](/docs/api) (same filters, including `event_id` when drilling down).
+4. **Automatic vs manual retry** — Outpost [retries failed deliveries automatically](/docs/features/event-delivery) (backoff, limits). **Manual retry** lets a user trigger another delivery after fixing their endpoint—use [Retry event delivery](/docs/api/attempts#retry-attempt) (`POST /retry` with **`event_id`** and **`destination_id`**). The destination must be enabled and subscribed to the event’s topic; disabled destinations cannot be retried.
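A framework-agnostic sketch of the two drill-down calls, meant to run on your server (BFF) because it sends the platform API key. The attempts path and the retry body follow the list above; `attemptsUrl`, `retryDelivery`, and the injected `fetchFn` are hypothetical names, and how `tenant_id` is supplied on the retry call when using an admin key should be confirmed in OpenAPI.

```typescript
// Minimal injected fetch signature so the sketch does not depend on a specific runtime.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Destination-scoped attempts, optionally narrowed to one event.
function attemptsUrl(apiUrl: string, tenantId: string, destinationId: string, eventId?: string): string {
  const path =
    `/tenants/${encodeURIComponent(tenantId)}` +
    `/destinations/${encodeURIComponent(destinationId)}/attempts`;
  const params = new URLSearchParams();
  if (eventId) params.set("event_id", eventId);
  const qs = params.toString();
  return qs ? `${apiUrl}${path}?${qs}` : `${apiUrl}${path}`;
}

// Manual retry: POST /retry with event_id and destination_id.
async function retryDelivery(
  fetchFn: FetchLike,
  apiUrl: string,
  apiKey: string,
  eventId: string,
  destinationId: string,
): Promise<{ accepted: boolean; status: number }> {
  const res = await fetchFn(`${apiUrl}/retry`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ event_id: eventId, destination_id: destinationId }),
  });
  // Return the status instead of throwing so the UI can explain a 400
  // (for example, a disabled destination) next to the Retry button.
  return { accepted: res.ok, status: res.status };
}
```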
-if (!events) {
- return
Loading...
;
-}
+### What to expose in your dashboard UI
-return (
-
-
Events
-
- {events.map((event) => (
-
-
{event.id}
-
{event.created_at}
-
{event.payload}
-
- ))}
-
-
-);
-```
+| User need | API direction |
+| --------- | ------------- |
+| “What fired for my webhook?” | List **events** filtered by **`destination_id`**, then list **attempts** for the chosen **`event_id`** (and destination). |
+| “Why did it fail?” | Show attempt **status**, **code**, and **response** fields (when included); link to your own docs on fixing URLs, auth, or timeouts. |
+| “Send it again” | **Retry** button on failed attempts (or on the event row if you only show one destination) → `POST /retry`. Reflect the API response in the UI: **202** / success vs **400** (e.g. destination disabled). |
-For each event, you can retrieve all its associated delivery attempts using the [List Event Attempts API](/docs/api/events#list-event-attempts).
+### Implementation notes
-You can find the source code of the `Events.tsx` component of the User Portal here: [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx)
+- **Pagination:** Event and attempt list endpoints are cursor-paged; your UI should pass through **`next`** / **`prev`** (or “Load more”) so busy tenants are usable.
+- **Auth:** If the browser never sees the admin key, proxy these endpoints from your backend and attach the platform **Outpost API key** server-side, scoping **`tenant_id`** to the logged-in customer—same pattern as destination CRUD.
+- **Reference UI:** The portal’s destination flow includes event listing for a destination—see [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) as a reference layout (not a copy-paste requirement).
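A sketch of the “Load more” pattern, assuming you map each list response onto a hypothetical `Page<T>` with `items` and an opaque `next` cursor (the real response field names are in OpenAPI). `collectPages` stops after `maxPages` and hands back the remaining cursor so the UI can fetch more on demand.

```typescript
// Assumed page shape for illustration; map the real list response onto it.
interface Page<T> {
  items: T[];
  next?: string | null; // opaque cursor for the following page; absent on the last page
}

// Walks a cursor-paged list, stopping after maxPages so busy tenants stay responsive.
async function collectPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 5,
): Promise<{ items: T[]; next?: string }> {
  const items: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    items.push(...page.items);
    if (!page.next) return { items }; // reached the end of the list
    cursor = page.next;
  }
  return { items, next: cursor }; // more pages remain: offer "Load more" with this cursor
}
```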
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 28e9cc9af..9e08b3e03 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -36,7 +36,7 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
- API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
- **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
-- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
+- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; **events list**, delivery **attempts**, **manual retry**; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
- Destination types: {{DOCS_URL}}/destinations
- Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
@@ -63,6 +63,8 @@ Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.
- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, URLs, topics—never the platform API key).
- **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
+- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint—see **Building your own UI** for how this links to destinations and to automatic retries in Outpost.
+- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). Treat this as **best practice** for SaaS products that offer webhooks: it proves end-to-end delivery without waiting on production traffic and matches what operators expect (similar to “send test webhook” in major platforms). Implement it **by default** for this integration path; the product team can remove or gate it later, but skipping it makes verification much harder.
- **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
### What to do
@@ -73,10 +75,24 @@ Guide the conversation, then act:
2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
-3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
+3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure—**including test publish**, an **events** list (and attempts / **retry** where appropriate), unless the operator explicitly asks to omit parts. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. For **Option 3** with a UI, also read **Building your own UI** before implementing destination-management screens.
+### Before you stop (verify)
+
+Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip the existing-repo items for a standalone script or curl-only flow).
+
+**Always (when you produced or changed runnable code):**
+
+- [ ] **Ran** the smallest end-to-end check that fits this task (e.g. run the script or shell flow once, exercise one new API path, or smoke the UI/API flow you added) and saw a clear success signal (e.g. event id, HTTP 2xx, or expected output).
+- [ ] **Secrets:** The platform Outpost API key remains **server-side** / **environment** only — not in client bundles, not hard-coded in committed source.
+- [ ] **Repeatable:** Env vars, how to run, and how to verify with the test destination above are stated briefly (README, comments, or chat — match the task size; a one-file script may need only inline or chat notes).
+
+**When editing an existing application repository (Option 3 or equivalent):**
+
+- [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
+
**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
**Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
From 7ab552d7ade4faa2c0c6b6b9c6ab585471b67097 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:23:54 +0100
Subject: [PATCH 22/47] docs: refresh Building Your Own UI guide
Reword for customer-facing UI builders: clearer tenant/auth framing,
configurable API base URL, less internal jargon and emphasis noise.
Add implementation checklists for planning, destinations, activity,
and safe rendering without duplicating the OpenAPI mapping tables.
Made-with: Cursor
---
docs/pages/guides/building-your-own-ui.mdx | 495 ++++++---------------
1 file changed, 146 insertions(+), 349 deletions(-)
diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index e3d90fad5..fd8496b76 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -2,51 +2,87 @@
title: "Building Your Own UI"
---
-While Outpost offers a Tenant User Portal, you may want to build your own UI for users to manage their destinations and view their events.
+While Outpost offers a Tenant User Portal, you may want to build your own UI so your customers can manage their destinations and view delivery activity.
-The portal is built using the Outpost API with JWT authentication. You can leverage the same API to build your own UI.
+The portal uses the same Outpost API you can call from your product. Its source is a useful reference ([`internal/portal`](https://github.com/hookdeck/outpost/tree/main/internal/portal), React); you are not required to match its stack.
-Within this guide, we will use the User Portal as a reference implementation for a simple UI. You can find the full source code for the User Portal [here](https://github.com/hookdeck/outpost/tree/main/internal/portal).
+This guide is framework-agnostic. It describes screens, flows, and how they map to the API. For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
-In this guide, we will assume you are using React (client-side) to build your own UI, but the same principles can be applied to any other framework.
+### Working from OpenAPI
+
+Each screen should map to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows.
+
+Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples.
+
+If the browser calls Outpost directly, use the tenant JWT flows documented in OpenAPI. If you proxy through your backend (often called a BFF), your server authenticates the user with your own session, calls the same operations with the platform API key, and injects `tenant_id` where the admin-key flows require it.
+
+The portal shows full UI code for complex forms; this page avoids long framework-specific snippets so the spec stays the single place for shapes and validation.
## UI structure and flow
-Outpost’s tenant portal is a good reference for how screens map to the **tenant → destinations → topics → delivery target** model. When you build your own UI, keep the same structure so operators and end users are not forced into a misleading “single global webhook URL” mental model.
+The tenant portal illustrates how screens map to tenant → destinations → topics → delivery target. Following that shape helps your customers understand subscriptions and targets instead of a single anonymous “webhook URL.”
**Tenant context**
-- Everything below is **scoped to one tenant**—the signed-in customer in your SaaS or the account selected in your platform. That tenant id is what you pass to Outpost when listing or creating destinations and when publishing from your backend.
-- If you use JWT auth against Outpost, the token is issued **for that tenant**; if you proxy through your API, your routes should resolve the current customer to a `tenant_id` and forward it on list/create/publish calls.
+- Everything below applies to one tenant at a time: the signed-in account in your SaaS (your customer). Use that account’s `tenant_id` when listing or creating destinations and when publishing from your backend.
+- With a tenant JWT, the token is scoped to that tenant. If you proxy through your API, resolve the signed-in account to `tenant_id` and forward it on list, create, and publish calls.
**Recommended areas / screens**
| Area | Purpose |
| ---- | ------- |
-| **Destinations list** | Show all destinations for the current tenant (each row is one subscription: type, human-readable **target** such as webhook URL, subscribed topics). Entry point to edit, disable, or remove. |
-| **Create destination** | Multi-step flow aligned with the API: (1) **choose destination type**, (2) **select topics** (from the topics configured on your Outpost project—often checkboxes or multi-select), (3) **configure** type-specific fields (e.g. webhook URL, credentials). Optional: instructions or remote setup links from the destination type schema. |
-| **Events and delivery attempts** | List recent events for the tenant, optionally scoped to one **destination**, and inspect **delivery attempts** (success/failure, response metadata). Support **manual retry** from the UI when an attempt failed—see [Events, attempts, and retries](#events-attempts-and-retries) below. |
+| Destinations list | All destinations for the current tenant (type, human-readable target such as webhook URL, queue name, or Hookdeck label, plus subscribed topics). Entry point to edit, disable, or remove. |
+| Create destination | Multi-step flow: (1) choose destination type, (2) select topics from your Outpost project configuration, (3) fill type-specific config from the type schema. Optional: instructions or remote setup URL from the schema. |
+| Events and delivery attempts | Default pattern: open activity from a destination (events, then attempts, then retry in that context). Optional: a tenant-wide activity view with a destination filter for support or power users. See [Default information architecture](#default-information-architecture-multi-destination-products) and [Events, attempts, and retries](#events-attempts-and-retries). |
+
+### Default information architecture (multi-destination products)
+
+When a tenant can have many destinations (of any [destination type](/docs/destinations) your project enables), the primary path is destination → activity: customers usually ask “what was delivered to this subscription?” rather than scanning all traffic in one undifferentiated list. The same API applies to webhooks, queues, and other types; only the create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
+
+For list events and list attempts, reuse the same endpoints everywhere: vary query parameters (for example `destination_id`, cursors) rather than inventing parallel client-side contracts. Pagination and auth details are defined in [OpenAPI](/docs/api); [Events, attempts, and retries](#events-attempts-and-retries) below summarizes how those endpoints support common UI needs.
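To illustrate reusing one list-events call across views, this sketch varies only the query string. It takes the List events URL as an argument instead of hardcoding a path, and uses the query names mentioned in this guide (`destination_id`, `topic`, `limit`, `next`, `prev`); the helper names and the page size of 20 are illustrative assumptions.

```typescript
// Query names used in this guide; confirm the full set in OpenAPI.
interface EventQuery {
  destination_id?: string;
  topic?: string;
  limit?: number;
  next?: string;
  prev?: string;
}

// Appends only the parameters that are set, in the object's key order.
function withQuery(endpoint: string, query: EventQuery): string {
  const params = new URLSearchParams();
  for (const key of Object.keys(query) as (keyof EventQuery)[]) {
    const value = query[key];
    if (value !== undefined && value !== "") params.set(key, String(value));
  }
  const qs = params.toString();
  return qs ? `${endpoint}?${qs}` : endpoint;
}

// Per-destination activity screen: the default entry point.
function destinationActivity(eventsEndpoint: string, destinationId: string, next?: string): string {
  return withQuery(eventsEndpoint, { destination_id: destinationId, limit: 20, next });
}

// Optional tenant-wide view: the same call, with optional filters.
function tenantActivity(
  eventsEndpoint: string,
  filter: Pick<EventQuery, "destination_id" | "topic"> = {},
): string {
  return withQuery(eventsEndpoint, { ...filter, limit: 20 });
}
```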
+
+**Example routes** (rename to fit your product—integrations, event destinations, webhooks, etc.):
+
+| Example route | What it does | Spec |
+| ------------- | ------------ | ---- |
+| `…/destinations` or `…/integrations` | Hub: list destinations; create or drill down | [Listing destinations](#listing-configured-destinations) · [List destinations](/docs/api/destinations#list-destinations) |
+| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/destinations); `GET /destination-types` in [OpenAPI](/docs/api)), then topics and config | [Creating a destination](#creating-a-destination) |
+| `…/destinations/:destinationId` | Detail: edit config, enable/disable, topics | [OpenAPI](/docs/api) — Destinations |
+| `…/destinations/:destinationId/activity` | Activity for this destination: events, attempts, retry | [Events, attempts, and retries](#events-attempts-and-retries) · [List events](/docs/api/events#list-events) · [List attempts](/docs/api/attempts#list-attempts) |
+| `…/activity` (optional) | Tenant-wide activity; optional filter by `destination_id` | Same list-events operation with different query params ([OpenAPI](/docs/api)) |
+
+For the conceptual model, see [Outpost Concepts](/docs/concepts), especially “How this fits your product.”
+
+## OpenAPI: core operations for a tenant dashboard
-For how tenants, destinations, and topics fit together in a multi-tenant product, see [Outpost Concepts](/docs/concepts)—especially **How this fits your product**.
+| Goal | OpenAPI entry point | In the UI |
+| ---- | ------------------- | --------- |
+| Types, labels, icons, dynamic form defs | [Destination types / schema](/docs/api/schemas) — `GET /destination-types` | Type picker; join list rows on `destination.type` (the type id is `type`, not a separate `id` on the type object). |
+| Topics for subscriptions | [Topics](/docs/api/topics#list-topics) — `GET /topics` | Checkboxes or multi-select on create/update. |
+| List destinations | [List destinations](/docs/api/destinations#list-destinations) | Main table; show `target` / `target_url` per schema. |
+| Create destination | [Create destination](/docs/api/destinations#create-destination) | Body: `type`, `topics`, type-specific `config` / credentials per spec. |
+| Get / update / delete | [OpenAPI](/docs/api) — Destinations | Detail and edit flows. |
+| Tenant JWT (optional browser calls) | [Tenant JWT](/docs/api/tenants#get-tenant-jwt-token) | Short-lived token; BFF is often simpler if you need to hide capabilities. |
+| Events, attempts, retry | [Events](/docs/api/events#list-events), [Attempts](/docs/api/attempts#list-attempts), [Retry](/docs/api/attempts#retry-attempt) | Activity and recovery; see below. |
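A minimal sketch of the join described in the first row, using trimmed-down types that are assumptions (the full schemas are in OpenAPI). Rows are matched on `type`; rather than skipping rows whose type metadata is missing, this version falls back to the raw type string so a destination never silently disappears from the list.

```typescript
// Trimmed-down shapes for illustration only.
interface DestinationType {
  type: string; // stable identifier, e.g. "webhook"; there is no separate `id`
  label: string;
  icon?: string;
}

interface Destination {
  id: string;
  type: string;
  topics: string[];
  target?: string;
}

interface DestinationRow {
  id: string;
  label: string;
  icon?: string;
  target: string;
  topics: string[];
}

// Join list rows to type metadata on `destination.type`.
function toRows(destinations: Destination[], types: DestinationType[]): DestinationRow[] {
  const byType: Record<string, DestinationType> = {};
  for (const t of types) byType[t.type] = t;
  return destinations.map((d) => {
    const meta: DestinationType | undefined = byType[d.type];
    return {
      id: d.id,
      label: meta?.label ?? d.type, // fall back to the raw type if metadata is missing
      icon: meta?.icon,
      target: d.target ?? "",
      topics: d.topics,
    };
  });
}
```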
## Authentication
-To perform API calls on behalf of your tenants, you can either generate a JWT token, which can be used client-side to make Outpost API calls, or you can proxy any API requests to the Outpost API through your own API. When proxying through your own API, you can ensure the API call is made for the currently authenticated tenant using the API `tenant_id` parameter.
+You can issue a tenant JWT for client-side calls to Outpost, or proxy requests through your own API. With a proxy, attach your platform’s Outpost API key on the server and scope each call to the authenticated tenant (for example via `tenant_id` on admin-key routes).
-Proxying through your own API can be useful if you want to limit access to some configuration or functionality of Outpost.
+Proxying is useful when you want to restrict which Outpost features are exposed or to keep the admin key off the client entirely.
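A sketch of the proxy pattern, with the platform key held server-side and every call scoped to the signed-in customer. It assumes an admin-key route of the form `/tenants/{tenant_id}/destinations` (confirm in OpenAPI); `Session`, `tenantIdFor`, and `makeOutpostProxy` are hypothetical placeholders for your own auth layer and module structure.

```typescript
// Minimal injected fetch signature so the sketch does not depend on a specific runtime.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical session shape: whatever your auth layer attaches to a request.
interface Session {
  accountId: string;
}

// Server-side only: the platform API key is captured here and never returned.
function makeOutpostProxy(
  fetchFn: FetchLike,
  apiUrl: string,
  apiKey: string,
  tenantIdFor: (session: Session) => string,
) {
  return {
    // Lists destinations for the signed-in customer's tenant only.
    async listDestinations(session: Session): Promise<unknown> {
      const tenantId = tenantIdFor(session);
      const res = await fetchFn(`${apiUrl}/tenants/${encodeURIComponent(tenantId)}/destinations`, {
        headers: { Authorization: `Bearer ${apiKey}` },
      });
      if (!res.ok) throw new Error(`Outpost responded with ${res.status}`);
      return res.json();
    },
  };
}
```

Your route handler would call `listDestinations` with the current session and return only the fields the customer should see.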
### API base URL (managed and self-hosted)
-Examples below use a single variable **`API_URL`** (or **`OUTPOST_API_BASE_URL`** in shell snippets): the **root URL for Outpost’s HTTP API**, with **no trailing slash**. Paths in this guide match the [OpenAPI specification](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
+Use one configurable base URL for Outpost (no trailing slash), for example `API_URL` or `OUTPOST_API_BASE_URL`. Paths in this guide match [OpenAPI](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
-- **Hookdeck Outpost (managed):** use the base URL from your project (for example `https://api.outpost.hookdeck.com/2025-07-01`). The [managed curl quickstart](/docs/quickstarts/hookdeck-outpost-curl) uses the same pattern.
-- **Self-hosted Outpost:** use your deployment’s public origin **plus** whatever path prefix your install uses (commonly **`/api/v1`**), e.g. `https://outpost.internal.example.com/api/v1`. For local dev, use your actual host and port (see your deployment docs—do not assume a specific port in shared snippets).
+- **Managed Hookdeck Outpost:** use the base URL from your project (see the [curl quickstart](/docs/quickstarts/hookdeck-outpost-curl)).
+- **Self-hosted:** use your deployment’s public origin plus any path prefix (often `/api/v1`). Local development should still read host and port from configuration or environment so the same code works in staging and production.
-Do **not** hardcode `localhost` in product docs or copy-paste snippets meant for operators; always substitute your real base URL. The React snippets assume `API_URL` already includes any `/api/v1` segment so that `${API_URL}/tenants/destinations` resolves correctly for your environment.
+In your product, treat the base URL like any other external service: load it from config or env, not from literals baked into client bundles.
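For example, a small loader that reads the base URL from configuration and removes any trailing slash, so joined paths stay well-formed. The variable name matches the shell snippet below (`OUTPOST_API_BASE_URL`); the function names are illustrative. In Node you might call it as `outpostBaseUrl(process.env)`.

```typescript
// Resolves the Outpost API root from configuration, without a trailing slash.
function outpostBaseUrl(env: Record<string, string | undefined>): string {
  const raw = env["OUTPOST_API_BASE_URL"]?.trim();
  if (!raw) throw new Error("OUTPOST_API_BASE_URL is not set");
  return raw.replace(/\/+$/, "");
}

// Joins a spec path such as "/topics" onto the base, adding the leading slash if needed.
function outpostUrl(base: string, path: string): string {
  return `${base}${path.startsWith("/") ? path : `/${path}`}`;
}
```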
### Generating a JWT Token (Optional)
-You can generate a JWT token by using the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token).
+See the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token).
```bash
export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" # or your self-hosted root, e.g. …/api/v1
@@ -56,358 +92,119 @@ curl --request GET "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/token" \
--header "Authorization: Bearer "
```
-## Fetching Destination Type Schema
-
-The destination type schema can be fetched using the [Destination Types Schema API](/docs/api/schemas). It can be used to render destination information such as the destination type icon and label. Additionally, the schema includes the destination type configuration fields, which can be used to render the destination configuration UI.
-
-Each entry returned by `GET /destination-types` includes:
-
-- **`type`** — string identifier for the destination kind (for example `webhook`). Use this as the stable key when mapping rows to [list destinations](/docs/api/destinations#list-destinations) results (`destination.type` refers to the same value). It is **not** named `id` in the API.
-- **`label`**, **`description`**, **`icon`** — display metadata; **`icon`** is typically an SVG string (some examples and older code may call this field `svg`—the JSON field is **`icon`**).
-- **`config_fields`** and **`credential_fields`** — arrays of field definitions for the configuration step (snake_case in JSON responses).
-
-Always align your UI types with the [OpenAPI schema](/docs/api) or a live response—do not assume generic names like `id` for the destination type identifier.
-
-## Listing Configured Destinations
-
-Destinations are listed using the [List Destinations API](/docs/api/destinations#list-destinations). Destinations can be listed by type and topic. Since each destination type has different configuration, the `target` field can be used to display a recognizable label for the destination, such as the Webhook URL, the SQS queue URL, or Hookdeck Source Name associated with the destination. Each destination type will return a sensible `target` value to display.
-
-```tsx
-// React example to fetch and render a list of destinations
-// API_URL = Outpost API root (managed project URL or self-hosted origin + /api/v1)
-
-const [destinations, setDestinations] = useState([]);
-
-const [destination_types, setDestinationTypes] = useState([]);
-
-const fetchDestinations = async () => {
- // Get the tenant destinations (JWT infers tenant — see Authentication API)
- const response = await fetch(`${API_URL}/tenants/destinations`, {
- headers: {
- Authorization: `Bearer ${token}`,
- },
- });
-
- const destinations = await response.json();
- setDestinations(destinations);
-};
-
-const fetchDestinationTypes = async () => {
- const response = await fetch(`${API_URL}/destination-types`, {
- headers: {
- Authorization: `Bearer ${token}`,
- },
- });
-
- const destination_types = await response.json();
- setDestinationTypes(destination_types);
-};
-
-useEffect(() => {
- fetchDestinations();
- fetchDestinationTypes();
-}, []);
-
-if (!destination_types || !destinations) {
- return
Loading...
;
-}
-
-// Key by `type` (API identifier), not `id` — see "Fetching Destination Type Schema" above.
-const destination_type_map = destination_types.reduce((acc, dt) => {
- if (dt.type) acc[dt.type] = dt;
- return acc;
-}, {});
-
-return (
-
- {destinations.map((destination) => {
- const meta = destination_type_map[destination.type];
- if (!meta) return null;
- return (
-
-);
-```
+## Destination type metadata and dynamic config
-You can find the source code of the `DestinationList.tsx` component of the User Portal here: [DestinationList.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/DestinationsList/DestinationList.tsx)
-
-## Creating a Destination
-
-To create a destination, the form will require three steps: one to choose the destination type, one to select the topics (optional), and one to configure the destination.
-
-### Choosing the Destination Type
-
-The list of available destination types is rendered from the list of destination types fetched from the API.
-
-```tsx
-const [destination_types, setDestinationTypes] = useState([]);
-
-const fetchDestinationTypes = async () => {
- const response = await fetch(`${API_URL}/destination-types`, {
- headers: {
- Authorization: `Bearer ${token}`,
- },
- });
-
- const destination_types = await response.json();
- setDestinationTypes(destination_types);
-};
-
-useEffect(() => {
- fetchDestinationTypes();
-}, []);
-
-const handleSubmit = (e: React.FormEvent) => {
- e.preventDefault();
- const formData = new FormData(e.target as HTMLFormElement);
- const destination_type = formData.get("type");
- goToNextStep(destination_type);
-};
-
-if (!destination_types) {
- return
Loading...
;
-}
-
-return (
-
-
Choose a destination type
-
- {destination_types.map((dt) => (
-
-
-
-
- {dt.label}
-
-
{dt.description}
-
- ))}
-
-
-);
-```
+`GET /destination-types` returns everything needed to render type pickers and config forms. See the [Destination Types Schema API](/docs/api/schemas).
-You can find the source code of the `CreateDestination.tsx` component of the User Portal here: [CreateDestination.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/CreateDestination/CreateDestination.tsx)
-
-### Selecting Topics
-
-Available topics are returned from the [List Topics API](/docs/api/topics#list-topics). You can display the list of topics as a list of checkboxes to capture the user input.
-
-```tsx
-const [topics, setTopics] = useState([]);
-
-const fetchTopics = async () => {
- const response = await fetch(`${API_URL}/topics`, {
- headers: {
- Authorization: `Bearer ${token}`,
- },
- });
-
- const topics = await response.json();
- setTopics(topics);
-};
-
-useEffect(() => {
- fetchTopics();
-}, []);
-
-if (!topics) {
- return
-);
-```
+Each entry typically includes (confirm names and optionality in OpenAPI):
-You can find the source code of the `TopicPicker.tsx` component of the User Portal here: [TopicPicker.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/TopicPicker/TopicPicker.tsx)
+- `type` — Stable identifier (e.g. `webhook`). Matches `destination.type` on list rows; not named `id` on the type object.
+- `label`, `description`, `icon` — Display metadata; `icon` is often an SVG string (some older code used the name `svg`). Sanitize if you render inline HTML.
+- `config_fields`, `credential_fields` — Field definitions for the config step (snake_case in JSON). Include every field from both arrays on create and edit.
+- `instructions` — Markdown for complex setup (for example cloud resources).
+- `remote_setup_url` — Optional external setup flow before or instead of inline fields.
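The sketch below shows one way to model these entries on the client. It mirrors the field names in the list above and indexes the `GET /destination-types` response by `type`. The interface names and the `buildTypeMap` / `allFields` helpers are hypothetical, and the optionality of each property is assumed. Check both against OpenAPI.

```typescript
// Hypothetical client-side types mirroring the fields listed above.
// Names and optionality are assumptions; confirm them in OpenAPI.
interface FieldDefinition {
  key: string;
  label: string;
  type: "text" | "checkbox";
  required: boolean;
  description?: string;
  sensitive?: boolean;
  default?: string;
  disabled?: boolean;
  minlength?: number;
  maxlength?: number;
  pattern?: string;
}

interface DestinationTypeSchema {
  type: string; // stable identifier, e.g. "webhook"
  label: string;
  description?: string;
  icon?: string; // often an SVG string; sanitize before inline rendering
  config_fields: FieldDefinition[];
  credential_fields: FieldDefinition[];
  instructions?: string; // markdown
  remote_setup_url?: string;
}

// Index the GET /destination-types response by `type` (not `id`),
// so list rows can resolve labels and icons from destination.type.
function buildTypeMap(
  schemas: DestinationTypeSchema[],
): Map<string, DestinationTypeSchema> {
  const map = new Map<string, DestinationTypeSchema>();
  for (const schema of schemas) {
    if (schema.type) map.set(schema.type, schema);
  }
  return map;
}

// Every field the config step must render: config first, then credentials.
function allFields(schema: DestinationTypeSchema): FieldDefinition[] {
  return [...schema.config_fields, ...schema.credential_fields];
}
```

Keying the map by `type` lets list rows look up `label` and `icon` directly from `destination.type`.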
-### Configuring the Destination
+### Dynamic field shape (for forms)
-Using the destination type schema for the selected destination type, you can render a form to create and manage destinations configuration. The configuration fields are found in the **`config_fields`** and **`credential_fields`** arrays of the destination type schema (snake_case in JSON responses).
+Field objects are fully described in OpenAPI. Typically each has `key`, `label`, `type` (`text` or `checkbox`), `required`, optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive`. A sensitive field renders as a password input. After the destination is created, its value may be masked, and the user must clear the field before entering a new value.
-To render your form, you should render all fields from both arrays. Note that some of the `credentials_fields` will be obfuscated once the destination is created, and in order to edit the input, the value must be cleared first.
+**Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx) maps schema fields to inputs.
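To illustrate the kind of mapping that component performs, here is a framework-neutral sketch that turns one field definition into `<input>` attributes. `toInputAttributes` and `InputAttributes` are hypothetical names, and the field shape is assumed from the paragraph above.

```typescript
// Hypothetical field shape, assumed from the description above.
interface FieldDefinition {
  key: string;
  label: string;
  type: "text" | "checkbox";
  required: boolean;
  description?: string;
  sensitive?: boolean;
  default?: string;
  disabled?: boolean;
  minlength?: number;
  maxlength?: number;
  pattern?: string;
}

// Framework-neutral attributes you could spread onto an <input>.
interface InputAttributes {
  name: string;
  type: "text" | "password" | "checkbox";
  required: boolean;
  disabled: boolean;
  title?: string; // description, shown as a tooltip
  minLength?: number;
  maxLength?: number;
  pattern?: string;
  defaultValue?: string;
}

function toInputAttributes(field: FieldDefinition): InputAttributes {
  const attrs: InputAttributes = {
    name: field.key,
    // Sensitive text fields behave like passwords; checkboxes stay checkboxes.
    type:
      field.type === "checkbox"
        ? "checkbox"
        : field.sensitive
          ? "password"
          : "text",
    required: field.required,
    disabled: field.disabled ?? false,
  };
  if (field.description) attrs.title = field.description;
  // Length, pattern, and default values only apply to text-like inputs here.
  if (field.type === "text") {
    if (field.minlength !== undefined) attrs.minLength = field.minlength;
    if (field.maxlength !== undefined) attrs.maxLength = field.maxlength;
    if (field.pattern) attrs.pattern = field.pattern;
    if (field.default !== undefined) attrs.defaultValue = field.default;
  }
  return attrs;
}
```

The sketch leaves out checkbox defaults because this section does not say how they are encoded. It also does not model masked values, so an edit form still needs its own clear-to-edit handling for sensitive fields.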
-The input field schema is as follows:
+### Remote setup URL
-```ts
-type InputField = {
- type: "text" | "checkbox"; // Only text and checkbox fields are supported
- required: boolean; // If true, the field will be required
- description?: string; // Field description, to use as a tooltip
- sensitive?: boolean; // If true, the field will be obfuscated once the destination is created and should be treated as a password input
- default?: string; // Default value for the field
- minlength?: number; // Minimum length for the field
- maxlength?: number; // Maximum length for the field
- pattern?: string; // Regex validation pattern, to use with the input's pattern attribute
-};
-```
+When `remote_setup_url` is present, you can send users through an external setup flow (for example, Hookdeck-managed configuration) rather than relying only on inline fields.
-#### Remote Setup URL
-
-Some destination type schemas have a `remote_setup_url` property that contains a URL to a page where the destination can be configured. Destinations that support remote URLs have a simplified setup flow that doesn't require instructions. For example, with the Hookdeck destination, the user is taken through a setup flow managed by Hookdeck to configure the destination.
-
-The URL is optional but provides a better user experience than following sometimes lengthy instructions to configure the destination.
-
-#### Instructions
-
-Each destination type schema has an `instructions` property that contains instructions to configure the destination as a markdown string. These instructions should be displayed to the user to help them configure the destination, as for some destination types, such as AWS, the necessary configuration can be complex and require multiple steps by the user within AWS.
-
-Example of a destination configuration form:
-
-```tsx
-const DestinationConfigForm = ({
- destination_type,
-}: {
- destination_type: string;
-}) => {
- const [destination_types, setDestinationTypes] = useState([]);
- //... Fetch the destination type schema
-
- if (!destination_types) {
- return
- >
- );
-};
-```
+### Instructions
+
+Render `instructions` as markdown when the destination type needs context beyond simple fields.
+
+## Listing configured destinations
+
+Use the [List Destinations API](/docs/api/destinations#list-destinations). OpenAPI describes separate variants for admin API key auth (tenant in the path or query) and tenant JWT auth (tenant inferred from the token). Use the operations that match how you authenticate.
+
+- For each destination, render `type`, `target`, `target_url` (when present), and the subscribed topics.
+- Optionally fetch `GET /destination-types` in parallel and map each `type` string to its schema entry for `label` and `icon`.
+- Link each row to destination detail and destination-scoped activity ([Default information architecture](#default-information-architecture-multi-destination-products)).
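The per-row mapping can be sketched as follows. It assumes each list item exposes `type`, `target`, optional `target_url`, and `topics` as described above, plus an `id`. The field names are assumptions to confirm in the List Destinations schema. `DestinationListItem`, `TypeMeta`, `DestinationRow`, and `describeRow` are hypothetical. One design choice here differs from a simple lookup: rows whose type is missing from the schema still render with the raw `type` string instead of being dropped.

```typescript
// Hypothetical shapes; confirm field names in the List Destinations schema.
interface DestinationListItem {
  id: string;
  type: string;
  target: string;
  target_url?: string;
  topics: string[];
}

interface TypeMeta {
  label: string;
  icon?: string;
}

interface DestinationRow {
  id: string;
  label: string;
  icon?: string;
  target: string;
  href?: string;
  topics: string;
}

function describeRow(
  destination: DestinationListItem,
  typeMeta: Map<string, TypeMeta>, // keyed by the destination type string
): DestinationRow {
  const meta = typeMeta.get(destination.type);
  return {
    id: destination.id,
    // Fall back to the raw type when the schema has no entry for it.
    label: meta?.label ?? destination.type,
    icon: meta?.icon,
    target: destination.target,
    href: destination.target_url,
    topics: destination.topics.includes("*")
      ? "All topics"
      : destination.topics.join(", "),
  };
}
```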
+
+**Reference:** [DestinationList.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/DestinationsList/DestinationList.tsx)
-You can find the source code of the `DestinationConfigForm.tsx` component of the User Portal here: [DestinationConfigForm.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx#L14)
+## Creating a destination
+
+The product flow is three steps; the API is typically one [create destination](/docs/api/destinations#create-destination) request once you have `type`, `topics`, and `config` (plus credentials if required). OpenAPI defines the body.
+
+### Step 1 — Choose destination type
+
+- Data: `GET /destination-types` ([schemas](/docs/api/schemas)).
+- Show each type’s `label`, `description`, and `icon`; store the chosen `type` string.
+
+**Reference:** [CreateDestination.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/CreateDestination/CreateDestination.tsx)
+
+### Step 2 — Select topics
+
+- Data: `GET /topics` ([list topics](/docs/api/topics#list-topics)).
+- Collect topic strings, or `*` for all topics, as allowed by the create schema.
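Before submitting, normalize the checkbox selection. The sketch below is a minimal example under two assumptions, both of which should be confirmed in the create schema: the create body takes `topics` as a string array, and the wildcard is sent as `["*"]`. `normalizeTopics` is a hypothetical helper.

```typescript
// Normalize a topic selection from checkboxes before building the create body.
// Assumption: the wildcard "*" is sent as ["*"]; confirm in OpenAPI.
function normalizeTopics(selected: string[], available: string[]): string[] {
  if (selected.includes("*")) return ["*"];
  // Keep only topics returned by GET /topics and drop duplicates.
  const chosen = new Set(selected);
  // Follow the order of the available list so payloads are stable.
  return available.filter((t) => chosen.has(t));
}
```

An empty result means nothing was selected. Block submission in that case unless the schema allows it.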
+
+**Reference:** [TopicPicker.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/TopicPicker/TopicPicker.tsx)
+
+### Step 3 — Configure the destination
+
+- Read `config_fields` and `credential_fields` for the selected type from `GET /destination-types` (or a single-type endpoint if you use one—see OpenAPI).
+- If `remote_setup_url` is set, consider that flow first.
+- Otherwise render fields per [Dynamic field shape](#dynamic-field-shape-for-forms) and submit via [Create destination](/docs/api/destinations#create-destination).
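This step can be sketched as a function with two paths. If the type offers `remote_setup_url`, it hands off to that flow. Otherwise it splits form values into `config` and `credentials` and checks required fields. Several things here are assumptions based on the text above and should be confirmed in the create-destination schema: the body keys `type`, `topics`, `config`, and `credentials`, and the use of string-valued maps. `planCreate` and its types are hypothetical, and the reduced `FieldDefinition` keeps only what this step needs. The sketch always prefers the remote flow when one exists; a real UI might offer both options.

```typescript
// Reduced, hypothetical field and schema shapes for this step only.
interface FieldDefinition {
  key: string;
  required: boolean;
}

interface DestinationTypeSchema {
  type: string;
  config_fields: FieldDefinition[];
  credential_fields: FieldDefinition[];
  remote_setup_url?: string;
}

// Assumed create body; confirm key names in OpenAPI.
interface CreateDestinationBody {
  type: string;
  topics: string[];
  config: Record<string, string>;
  credentials: Record<string, string>;
}

type CreatePlan =
  | { kind: "remote"; url: string }
  | { kind: "inline"; body: CreateDestinationBody }
  | { kind: "invalid"; missing: string[] };

function planCreate(
  schema: DestinationTypeSchema,
  topics: string[],
  values: Record<string, string>, // form values keyed by field `key`
): CreatePlan {
  // Prefer the external setup flow when the type offers one.
  if (schema.remote_setup_url) {
    return { kind: "remote", url: schema.remote_setup_url };
  }
  const missing: string[] = [];
  const pick = (fields: FieldDefinition[]): Record<string, string> => {
    const out: Record<string, string> = {};
    for (const field of fields) {
      const value = values[field.key];
      if (value === undefined || value === "") {
        if (field.required) missing.push(field.key);
        continue; // omit empty optional fields
      }
      out[field.key] = value;
    }
    return out;
  };
  const config = pick(schema.config_fields);
  const credentials = pick(schema.credential_fields);
  if (missing.length > 0) return { kind: "invalid", missing };
  return {
    kind: "inline",
    body: { type: schema.type, topics, config, credentials },
  };
}
```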
+
+**Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx)
## Events, attempts, and retries
-This section ties together **how customers see what was delivered** and **how they recover from failures**—without duplicating full UI code (see the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) and [OpenAPI](/docs/api) for request/response shapes).
+This section connects what your customers see (what was delivered, what failed, how to retry) to the API. Request and response shapes live in [OpenAPI](/docs/api); the [portal](https://github.com/hookdeck/outpost/tree/main/internal/portal) shows one full implementation.
### How the pieces fit
-1. **Destinations list** — Each row is a subscription (type, target URL or label, topics). From here, users typically open **“Activity”**, **“Events”**, or **“Logs”** for that destination, or you filter a shared events view by `destination_id`.
-2. **Events** — An event is something your **backend published** (topic + payload). The [List Events API](/docs/api/events#list-events) returns a **paginated** list. Important query dimensions:
- - **`destination_id`** — only events that were routed to that destination (ideal for a per-destination screen).
- - **`topic`**, **time ranges**, **pagination** (`limit`, `next` / `prev` cursors) — for broader “recent activity” views.
- With a **tenant JWT**, results are limited to that tenant; with an **admin API key**, supply **`tenant_id`** (your BFF usually injects it from the signed-in customer).
-3. **Attempts** — Each row is one **delivery try** to a destination (status, HTTP code, timing, optional response payload). Link attempts to events via **`event_id`** and **`destination_id`**.
- - Tenant-wide: [List attempts](/docs/api/attempts#list-attempts) with `event_id` (and optionally `destination_id`).
- - Destination-scoped: `GET /tenants/{tenant_id}/destinations/{destination_id}/attempts` — see [OpenAPI / tenant destination attempts](/docs/api) (same filters, including `event_id` when drilling down).
-4. **Automatic vs manual retry** — Outpost [retries failed deliveries automatically](/docs/features/event-delivery) (backoff, limits). **Manual retry** lets a user trigger another delivery after fixing their endpoint—use [Retry event delivery](/docs/api/attempts#retry-attempt) (`POST /retry` with **`event_id`** and **`destination_id`**). The destination must be enabled and subscribed to the event’s topic; disabled destinations cannot be retried.
+1. **Destinations list** — Each row is a subscription. By default, link into destination-scoped activity ([Default information architecture](#default-information-architecture-multi-destination-products)). An optional tenant-wide activity route should still call the same list endpoints with different query parameters, not a separate unofficial API contract.
+2. **Events** — Your backend published each event (topic + payload). [List events](/docs/api/events#list-events) is paginated. Common filters: `destination_id` for a per-destination screen; `topic`, time ranges, and `limit` / `next` / `prev` for broader views. With a tenant JWT, results are limited to that tenant; with an admin key, supply `tenant_id` (your backend usually injects it for the signed-in account).
+3. **Attempts** — One row per delivery try (status, HTTP code, timing, optional response). Tie attempts to events with `event_id` and `destination_id`. Tenant-wide: [list attempts](/docs/api/attempts#list-attempts). Destination-scoped: `GET /tenants/{tenant_id}/destinations/{destination_id}/attempts` (see tenant destination attempts in [OpenAPI](/docs/api)).
+4. **Retry** — Outpost [retries automatically](/docs/features/event-delivery) with backoff. [Manual retry](/docs/api/attempts#retry-attempt) is `POST /retry` with `event_id` and `destination_id` after the customer fixes their endpoint. The destination must be enabled and subscribed to the event’s topic.
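The drill-down above (events for one destination, then attempts for one event) needs only a few query parameters. The sketch below builds those query strings. Parameter names come from the text above. Time-range parameters are omitted because their names are not given here. OpenAPI defines which endpoint paths accept which parameters, and whether `tenant_id` goes in the path or the query. `ActivityQuery` and `toQueryString` are hypothetical.

```typescript
// Hypothetical filter shape; parameter names follow the text above.
interface ActivityQuery {
  tenant_id?: string; // admin-key calls only; omit with a tenant JWT
  destination_id?: string;
  event_id?: string; // attempts drill-down
  topic?: string;
  limit?: number;
  next?: string; // opaque cursor from the previous page
  prev?: string;
}

// Serialize only the parameters that are actually set.
function toQueryString(query: ActivityQuery): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined && value !== "") params.set(key, String(value));
  }
  const qs = params.toString();
  return qs ? `?${qs}` : "";
}
```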
### What to expose in your dashboard UI
| User need | API direction |
| --------- | ------------- |
-| “What fired for my webhook?” | List **events** filtered by **`destination_id`**, then list **attempts** for the chosen **`event_id`** (and destination). |
-| “Why did it fail?” | Show attempt **status**, **code**, and **response** fields (when included); link to your own docs on fixing URLs, auth, or timeouts. |
-| “Send it again” | **Retry** button on failed attempts (or on the event row if you only show one destination) → `POST /retry`. Show **202** / success vs **400** (e.g. destination disabled) from the API. |
+| “What was delivered here?” (this destination) | List events with `destination_id`, then list attempts for the chosen `event_id` (and destination as needed)—same idea for webhooks, queues, and other types. |
+| “Why did it fail?” | Surface attempt status, code, and response when present; link to your docs on URLs, auth, or timeouts. |
+| “Send it again” | Retry on failed attempts → `POST /retry`; show `202` as accepted and surface errors (for example `400` when the destination is disabled). |
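The retry row can be sketched as a request builder plus a status mapper. This sketch assumes `POST /retry` takes a JSON body with `event_id` and `destination_id`, and that `202` means the retry was accepted. Treating `400` as a rejection (for example, a disabled destination) is one illustrative case, not an exhaustive list. `buildRetryRequest`, `describeRetryStatus`, and `RetryRequest` are hypothetical, and `baseUrl` stands for your configured Outpost API base URL.

```typescript
// Hypothetical helpers; confirm the body and status codes in OpenAPI.
interface RetryRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildRetryRequest(
  baseUrl: string,
  token: string,
  eventId: string,
  destinationId: string,
): RetryRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/retry`, // avoid a double slash
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ event_id: eventId, destination_id: destinationId }),
    },
  };
}

type RetryOutcome = "accepted" | "rejected" | "error";

function describeRetryStatus(status: number): { outcome: RetryOutcome; message: string } {
  if (status >= 200 && status < 300) return { outcome: "accepted", message: "Retry accepted." };
  if (status === 400) {
    return {
      outcome: "rejected",
      message: "Retry rejected: check that the destination is enabled and subscribed to this topic.",
    };
  }
  return { outcome: "error", message: `Retry failed with HTTP ${status}.` };
}
```

You can pass `url` and `init` straight to `fetch` and then feed `response.status` to `describeRetryStatus`.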
### Implementation notes
-- **Pagination:** Event and attempt list endpoints are cursor-paged; your UI should pass through **`next`** / **`prev`** (or “Load more”) so busy tenants are usable.
-- **Auth:** If the browser never sees the admin key, proxy these endpoints from your backend and attach the platform **Outpost API key** server-side, scoping **`tenant_id`** to the logged-in customer—same pattern as destination CRUD.
-- **Reference UI:** The portal’s destination flow includes event listing for a destination—see [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) as a reference layout (not a copy-paste requirement).
+- Event and attempt lists use cursor pagination; pass through `next` and `prev` (or “load more”) for busy tenants.
+- If the browser never holds the admin key, proxy these calls through your backend with the platform key and the correct `tenant_id`, same as destination CRUD.
+- **Reference:** [Events.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/scenes/Destination/Events/Events.tsx) for destination-scoped activity layout.
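The first two notes can be sketched together. The first helper runs on the backend and forces `tenant_id` to the signed-in account, ignoring any value the browser sent. The second is a bounded "load more" loop over opaque cursors. The page shape `{ items, next }` is an illustrative assumption only; map it from the real list response defined in OpenAPI. The scoping helper applies only to operations that take `tenant_id` as a query parameter. `scopeToTenant`, `Page`, and `loadPages` are hypothetical.

```typescript
// Server-side: never trust a tenant_id supplied by the browser.
function scopeToTenant(
  clientParams: URLSearchParams,
  tenantId: string,
): URLSearchParams {
  const scoped = new URLSearchParams(clientParams); // copy; do not mutate input
  scoped.set("tenant_id", tenantId); // replaces every client-sent value
  return scoped;
}

// Assumed page shape; adapt from the actual list response in OpenAPI.
interface Page<T> {
  items: T[];
  next?: string;
}

// Follow `next` cursors until they run out or maxPages is reached,
// which keeps a "load more" view bounded for busy tenants.
async function loadPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages: number,
): Promise<{ items: T[]; next?: string }> {
  const items: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    items.push(...page.items);
    cursor = page.next;
    if (!cursor) break;
  }
  return { items, next: cursor }; // keep `next` to resume on "load more"
}
```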
+
+## Implementation checklists
+
+These are readiness checks: they do not replace the tables above or OpenAPI. Use them to confirm nothing important was skipped before ship or when reviewing an implementation.
+
+### Planning and contract
+
+- [ ] Every call is scoped to the correct tenant (`tenant_id` on admin-key routes, or tenant inferred from JWT).
+- [ ] Outpost base URL comes from configuration or environment for dev, staging, and production (not a single hardcoded host in app code).
+- [ ] You chose an auth approach (browser JWT, server-side proxy/BFF, or mix) and use the matching OpenAPI operations and headers consistently.
+- [ ] Dynamic destination UI (labels, icons, form fields) is driven by `GET /destination-types`, not copied field lists from examples.
+
+### Destinations experience
+
+- [ ] List view shows type, human-readable target, and subscribed topics; each row links to the destination's detail/edit view and to its destination-scoped activity.
+- [ ] Create flow covers: pick type → select topics (`GET /topics`) → collect `config` and credentials per the selected type’s `config_fields` and `credential_fields`.
+- [ ] When a type exposes `instructions` or `remote_setup_url`, the UI surfaces them (markdown / external flow) so customers are not blocked on opaque fields.
+- [ ] The detail view supports the lifecycle operations your product needs (view, update, delete, enable/disable), per OpenAPI and your product policy.
+
+### Activity, attempts, and retries
+
+- [ ] Default path is destination → events → attempts; optional tenant-wide activity still uses the same list endpoints with different query parameters.
+- [ ] Cursor pagination is implemented for busy tenants (`next` / `prev` or equivalent “load more”).
+- [ ] Failed deliveries show enough context (status, HTTP code, response when present) for customers to fix their side.
+- [ ] Manual retry is available where appropriate; errors such as disabled destination are handled with a clear message.
+
+### Content from the API
+
+- [ ] Inline icons and `instructions` markdown are sanitized before rendering, since they may contain HTML or untrusted strings.
+- [ ] Sensitive credential fields respect masking and “clear to edit” behavior described in the spec.
From 320c039c9d3ba93d7fae492a230c6a623a78b605 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:28:32 +0100
Subject: [PATCH 23/47] =?UTF-8?q?docs(eval):=20align=20scenarios=2008?=
=?UTF-8?q?=E2=80=9310,=20prompt,=20and=20heuristics?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Agent prompt: topic reconciliation, domain vs test publish, full-stack UI
guidance; remove eval-flavored Turn 0 / next-run wording in template.
- score-transcript: publish_beyond_test_only for 08/09/10 (domain publish).
- Scenarios + README: success criteria and Turn 1 nudges match prompt.
- SCENARIO-RUN-TRACKER: scenario 09 review notes marked resolved.
Made-with: Cursor
---
docs/agent-evaluation/README.md | 10 +++-
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 11 +++-
.../scenarios/08-integrate-nextjs-existing.md | 13 ++---
.../09-integrate-fastapi-existing.md | 10 ++--
.../scenarios/10-integrate-go-existing.md | 9 ++--
docs/agent-evaluation/src/score-transcript.ts | 50 +++++++++++++++++++
.../hookdeck-outpost-agent-prompt.mdx | 30 +++++++----
7 files changed, 107 insertions(+), 26 deletions(-)
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 4a591adb8..87941a677 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -8,7 +8,7 @@ This folder contains **manual** scenario specs (markdown) and an **automated** r
|------|--------|
| **Human checklist** (full eval, including execution) | Each file under [`scenarios/`](scenarios/) — section **Success criteria** (static + **Execution (full pass)** rows). |
| **Manual run write-up** | [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) — copy to a local file under `results/` (gitignored). |
-| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). |
+| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). Scenarios **08–10** include **`publish_beyond_test_only`** (domain publish signal vs test-only). |
| **LLM judge** (Anthropic vs **`## Success criteria`** in each scenario) | [`src/llm-judge.ts`](src/llm-judge.ts) — runs after each scenario unless **`--no-score-llm`**; also `npm run score -- --llm`. |
**Deliberate scope:** `npm run eval` **requires** **`--scenario`**, **`--scenarios`**, or **`--all`**. There is no silent “run everything” default — you choose the scenarios and accept the cost. After **each** run: **`transcript.json`**, **`heuristic-score.json`**, and **`llm-score.json`** (judge reads the same **Success criteria** as humans). Exit **1** if any enabled score fails.
@@ -100,6 +100,14 @@ A **full pass** also answers: *did the generated curl / script / app succeed aga
2. Run the agent’s commands or start its app and complete the flows the scenario describes.
3. Record pass/fail in your run notes ([`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md)).
+#### Integration scenarios (08–10): depth to verify
+
+These measure **Option 3** (existing app), not a greenfield demo. When you **execute** the artifact:
+
+- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when Turn 0 is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
+- **Domain publish:** Prefer a smoke step that performs a **real product action** (signup, create entity, etc.) and observe an accepted publish—not **only** a “send test event” button.
+- **Heuristic `publish_beyond_test_only`:** [`score-transcript.ts`](src/score-transcript.ts) adds a weak automated check that the transcript corpus suggests publish beyond synthetic test-only paths; it is **not** a substitute for execution or the LLM judge reading **Success criteria**.
+
## Single source of truth for the dashboard prompt
The **full prompt template** (the text operators paste as Turn 0) lives in **one** place:
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 82a96208b..806d0c14a 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -28,7 +28,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder). |
-| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
+| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
@@ -56,6 +56,15 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
- [Building your own UI](../pages/guides/building-your-own-ui.mdx) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
- [Agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
+### Scenario 09 — review notes (resolved, 2026-04-10)
+
+Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
+
+1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + Turn 1 copy, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+
+The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
+
### Column hints
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 94c9b65ab..542a030fe 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -44,7 +44,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
> Option 3 — I’m not starting from scratch. **We’re already in the Next.js SaaS app in this workspace** — the baseline repo is checked out here. Install dependencies and get it runnable, then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
>
-> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
+> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. **Publish topic names should follow the app’s domain**; if Turn 0’s configured list is missing any name you need, document what to **add in the Outpost project**—don’t retarget real features to wrong topics just to match the list unless I explicitly asked for a minimal demo. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
### Turn 2 — User (optional)
@@ -56,16 +56,17 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
- **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
-- At least one **publish** (or equivalent) tied to a **real code path** in the baseline (not dead code).
-- **Topic** aligns with Turn 0 configuration or is clearly named and documented.
-- **Per-customer webhook** story is explained: destination creation / subscription to topic.
+- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in Turn 0, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
+- At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
+- **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
- README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). Run smoke tests from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Optionally also run a test publish. Smoke from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
## Failure modes to note
- Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
-- Publishing only from a demo route unrelated to the product model.
+- Publishing only from a demo or **test-only** route with no domain path.
+- **Topics** in code with no README telling the operator to **add** them in Hookdeck when Turn 0 was incomplete (or silently retargeting domain logic to unrelated Turn 0 names).
- Calling Outpost from client components with secrets.
## Future baselines
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index 36f31229b..97a54756c 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -47,7 +47,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
> Option 3 — integrate Outpost into a real codebase. **We’re already in the full-stack FastAPI template in this workspace** — the repository is present here. Follow the project’s dev docs to get backend (and frontend if useful) running, then add **Hookdeck Outpost** for customer webhooks.
>
-> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
+> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). **Topic strings should match that domain**; if Turn 0’s list doesn’t include the right names yet, document what the operator must **add in the Outpost project**—don’t contort the app to arbitrary topics unless this is explicitly a minimal wiring pass. Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
### Turn 2 — User (optional)
@@ -58,17 +58,19 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
- **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
-- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets).
+- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
- API key from **environment** or secure settings — not hard-coded or exposed to clients.
-- **Topic** and **destination** story documented (README or inline); if the app has a UI, linking or exposing **safe** controls for webhook URLs is a plus.
+- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs Turn 0 are resolved by **operator adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
+- **Destination** story documented; if the app has a UI, linking or exposing **safe** controls for customer destinations is a plus; **tenant id** usage consistent with publish.
- README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** Stack runs per template docs; trigger path fires publish; Outpost accepts. *Skip for transcript-only.*
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. A test-publish button may be used **additionally** for smoke. *Skip for transcript-only.*
## Failure modes to note
- Greenfield FastAPI “hello world” instead of the **cloned** baseline.
- Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*` / client bundles.
+- **Only** test/synthetic publish with no domain hook.
## Future baselines
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index bbe96d80f..be24d501e 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -41,7 +41,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
> Option 3 — existing Go API. **We’re already in the startersaas-go-api tree in this workspace** — the repository is present here. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
>
-> Use **one real handler** as the publish trigger (signup, billing, etc.). API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
+> Use **one real handler** as the publish trigger (signup, billing, etc.). **`topic` values should match that domain**; if Turn 0’s list is incomplete, document what to **add in the Outpost project**—don’t bend the handler to wrong topic names just to match the prompt unless this is explicitly minimal wiring. API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
### Turn 2 — User (optional)
@@ -52,12 +52,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
**Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
- **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
-- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path.
+- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path—not only a test-only route unless wiring-only scope was agreed.
- No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
-- **Topic** + **destination** documentation for operators.
-- **Execution (full pass):** Server runs; trigger handler; Outpost accepts publish. *Skip for transcript-only.*
+- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented) + **destination** documentation for operators; **tenant** mapping consistent.
+- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. *Skip for transcript-only.*
## Failure modes to note
- New `main.go` only, without using the **cloned** baseline’s routes/models.
- Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations.
+- Publish only from a **test** helper with no real handler path.
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 976bd9bcd..9bc8df7d4 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -151,6 +151,29 @@ function containsLikelyLeakedKey(text: string): boolean {
return false;
}
+/**
+ * Option 3 (08–10): corpus should show publish on a real domain path, not only a synthetic
+ * “test event” / publish-test helper. Multiple publish sites, or one publish without test-only
+ * markers, passes. Weak signal — confirm with scenario Success criteria + execution smoke.
+ */
+function corpusSuggestsPublishBeyondTestOnly(corpus: string): boolean {
+ const t = corpus;
+ const publishHits = t.match(/publish\.event|Publish\.Event|PublishEvent/gi);
+ if (!publishHits?.length) return false;
+ if (publishHits.length >= 2) return true;
+ const lower = t.toLowerCase();
+ const testish =
+ /publish-test|publish_test|publishtest|test_publish|send test|synthetic.*(event|publish)|test event/.test(
+ lower,
+ );
+ if (!testish) return true;
+ const domainish =
+ /signup|register|user\.created|item\.|order\.|after_commit|post_save|on_.*create|createuser|create.?item|router\.(post|put|patch)|@router\.(post|put|patch)|handler\.|func.*create|def create_/.test(
+ lower,
+ ) && /publish|outpost/.test(lower);
+ return domainish;
+}
+
function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore {
const t = corpus;
const lower = t.toLowerCase();
@@ -778,6 +801,15 @@ function scoreScenario08(corpus: string, assistant: string): TranscriptScore {
: "Expected how operators register webhook URLs per customer/tenant",
});
+ const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+ checks.push({
+ id: "publish_beyond_test_only",
+ pass: beyondTest,
+ detail: beyondTest
+ ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+ : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
@@ -844,6 +876,15 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env",
});
+ const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+ checks.push({
+ id: "publish_beyond_test_only",
+ pass: beyondTest,
+ detail: beyondTest
+ ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+ : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
@@ -905,6 +946,15 @@ function scoreScenario10(corpus: string, assistant: string): TranscriptScore {
detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")",
});
+ const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
+ checks.push({
+ id: "publish_beyond_test_only",
+ pass: beyondTest,
+ detail: beyondTest
+ ? "Publish appears beyond a synthetic test-only path (or multiple publish sites)"
+ : "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 9e08b3e03..16f348e09 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -10,7 +10,7 @@ This page is a **reference template** for the Hookdeck Outpost onboarding flow.
```
## Hookdeck Outpost integration
-You are helping integrate Hookdeck Outpost into a platform to deliver events (webhooks and event destinations) to the platform's customers.
+You are helping integrate Hookdeck Outpost into a platform to deliver events to the platform's customers via **event destinations** (webhook URLs, cloud queues, Hookdeck, and other supported types—see **{{DOCS_URL}}/destinations**).
### Credentials
@@ -21,6 +21,12 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we
{{TOPICS_LIST}}
+These names must **exist in the Outpost project** (dashboard) for publishes and destination subscriptions to work.
+
+**Naming:** In typical B2B SaaS, lifecycle topics like **`user.created`** mean an **end-user of the tenant’s account** (your customer’s customer—e.g. a team member), **not** your platform’s internal operator or staff. Use topic names that match **your product’s domain** (`order.shipped`, `item.deleted`, …) when those are the real events.
+
+**Reconciliation (default):** Derive **`topic` strings in code** from **real state changes** in the app. If **Configured topics** above is missing a name the app should emit, **do not** bend the product model to fit the list—tell the operator to **add that topic in the Outpost project** (Hookdeck) and to **refresh `{{TOPICS_LIST}}`** in the dashboard so a regenerated prompt matches the project. Only narrow or rename domain publishes when the operator **explicitly** asks for a minimal wiring demo with a fixed topic set.
+
### Test destination
Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `config.url`, or `OUTPOST_TEST_WEBHOOK_URL` in the SDK quickstarts). Your dashboard supplies it for this project:
@@ -36,7 +42,7 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
- API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
- **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
-- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; **events list**, delivery **attempts**, **manual retry**; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui
+- **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
- Destination types: {{DOCS_URL}}/destinations
- Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
@@ -57,21 +63,21 @@ Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.
**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
-**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos.
+**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos. **Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** must either appear under **Configured topics** above **or** be documented for the operator with **“add this topic in the Outpost project”** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path** (CRUD handler, service after commit, job, etc.), not only a “send test event” / synthetic demo route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
-**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage webhook destinations inside the product**, not only via raw API or Swagger:
+**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage event destinations** (every **destination type** the project enables—webhook, queues, Hookdeck, etc.; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**, not only via raw API or Swagger:
-- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, URLs, topics—never the platform API key).
+- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, **targets** / config summaries, topics—never the platform API key).
- **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
-- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint—see **Building your own UI** for how this links to destinations and to automatic retries in Outpost.
-- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). Treat this as **best practice** for SaaS products that offer webhooks: it proves end-to-end delivery without waiting on production traffic and matches what operators expect (similar to “send test webhook” in major platforms). Implement it **by default** for this integration path; the product team can remove or gate it later, but skipping it makes verification much harder.
+- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint or downstream config—see **Building your own UI** (default: **destination → activity**) for how this links to destinations and to automatic retries in Outpost.
+- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). This is **complementary** to domain publishes: it proves wiring (destination + topic subscription + delivery) without waiting on real traffic. It **does not replace** a `publish` on a real domain path. The test topic can be any **configured** topic; domain publishes should use topics that match the events you document.
- **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
### What to do
Guide the conversation, then act:
-1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
+1. **Try it out** — Minimal path: tenant → **one destination** (often a webhook for quick verification) → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
@@ -91,11 +97,14 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
**When editing an existing application repository (Option 3 or equivalent):**
+- [ ] **Topic reconciliation:** Every **`topic`** in `publish` is either in **Configured topics** above **or** README/chat tells the operator exactly which topics to **add in Hookdeck**—**domain-first**; do not retarget real features to wrong topic names to match an incomplete **Configured topics** list unless the operator explicitly asked for a minimal demo scope.
+- [ ] **Domain publish:** At least one **`publish` on a real application path** (entity create/update, signup, etc.), not solely a synthetic “test event” endpoint—unless the operator explicitly scoped the task to wiring-only.
+- [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
- [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
-**Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+**Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
```
## Placeholder reference
@@ -103,7 +112,7 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
| Placeholder | Example | Notes |
|-------------|---------|--------|
| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
-| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config |
+| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config — operators should keep this aligned with what the integrated app will **publish** and what destinations subscribe to |
| `{{TEST_DESTINATION_URL}}` | **Required** — HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow (fed in by the dashboard). |
| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
| `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
@@ -111,5 +120,6 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
## Operator checklist (dashboard UI)
- Show **API base URL** and **topics** next to the copyable prompt.
+- **`{{TOPICS_LIST}}`:** Should match what the **integrated product** will publish (domain-first). If the baseline app emits events the project does not list yet, **add topics in Hookdeck** and refresh this list—avoid expecting the agent to **reshape the app** to fit a stale default (e.g. only `user.created` when the real model is `item.*`).
- Feed **`{{TEST_DESTINATION_URL}}`** from a Hookdeck Console **Source** URL you create for the operator (same value can be shown for `OUTPOST_TEST_WEBHOOK_URL` in env UI). Explain **Settings → Secrets** for `OUTPOST_API_KEY` (recommend a project **`.env`** or env-injection pattern, not pasting into the agent). Optional `OUTPOST_API_BASE_URL`.
- Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history.
From 97aaa246256747379b2cca6531919652d3193229 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:54:56 +0100
Subject: [PATCH 24/47] docs(eval): de-meta user turns in scenarios 8–10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Rewrite Turn 1 blockquotes as natural operator speech; drop Option 3,
Turn 0, and prompt-section references. Align success-criteria wording
with the configured onboarding topics. Point the tracker at the user-turn scripts.
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
.../scenarios/08-integrate-nextjs-existing.md | 12 ++++----
.../09-integrate-fastapi-existing.md | 29 +++++++++++--------
.../scenarios/10-integrate-go-existing.md | 4 +--
4 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 806d0c14a..aba77c2a5 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -61,7 +61,7 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + Turn 1 copy, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 542a030fe..9471a654c 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -9,7 +9,7 @@ Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboa
## Preconditions
- Node 18+; `git` available.
-- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
+- Same **initial onboarding prompt** as other scenarios (`OUTPOST_API_KEY` **not** in the pasted text; test destination URL from dashboard).
## Eval harness
@@ -42,9 +42,9 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
### Turn 1 — User
-> Option 3 — I’m not starting from scratch. **We’re already in the Next.js SaaS app in this workspace** — the baseline repo is checked out here. Install dependencies and get it runnable, then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
+> I’m integrating into our existing **Next.js** SaaS app—you’re in this repo with me. Install dependencies, get it running, then add **Hookdeck Outpost** so we can send **outbound webhooks** to our customers.
>
-> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. **Publish topic names should follow the app’s domain**; if Turn 0’s configured list is missing any name you need, document what to **add in the Outpost project**—don’t retarget real features to wrong topics just to match the list unless I explicitly asked for a minimal demo. Put whatever I need to configure in the README (env vars, etc.). Keep secrets on the server only.
+> Tie it to **real product behavior** (not a throwaway demo page). I need a clear story for **how each customer registers their webhook** and which topics they receive. Use **topic names that match our domain**; if Hookdeck doesn’t list a topic we need yet, tell me exactly what to add in the project—don’t point our code at the wrong names just to match a short list unless I’ve said we’re only doing a quick wiring spike. Document env vars and setup in the **README**. Keep the Outpost API key on the **server** only.
### Turn 2 — User (optional)
@@ -56,7 +56,7 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
- **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
-- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in Turn 0, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
+- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in the **configured project list** from onboarding, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
- At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
- **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
- README (or equivalent) lists **env vars** for Outpost.
@@ -66,9 +66,9 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
- Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
- Publishing only from a demo or **test-only** route with no domain path.
-- **Topics** in code with no README telling the operator to **add** them in Hookdeck when Turn 0 was incomplete (or silently retargeting domain logic to unrelated Turn 0 names).
+- **Topics** in code with no README telling the operator to **add** them in Hookdeck when the onboarding topic list was incomplete (or silently retargeting domain logic to unrelated configured names).
- Calling Outpost from client components with secrets.
## Future baselines
-Java / .NET “existing app” scenarios can follow the same shape: harness pre-clones a fixed public baseline into the run workspace + Option 3 Turn 1 (user already “in” the app) + Success criteria + `scoreScenarioNN`.
+Java / .NET “existing app” scenarios can follow the same shape: harness pre-clones a fixed public baseline into the run workspace + a natural-language **integration** Turn 1 + Success criteria + `scoreScenarioNN`.
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index 97a54756c..c24787d0a 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -12,7 +12,7 @@ Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + F
- Python 3.10+; **Node.js 18+** (for the frontend); `git` available.
- **Docker** (recommended) — template dev flow uses Docker Compose for API, DB, and frontend; see repository `development.md`.
-- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard).
+- Same **initial onboarding prompt** as other scenarios (`OUTPOST_API_KEY` **not** in the pasted text; test destination URL from dashboard).
## Eval harness
@@ -45,9 +45,11 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
### Turn 1 — User
-> Option 3 — integrate Outpost into a real codebase. **We’re already in the full-stack FastAPI template in this workspace** — the repository is present here. Follow the project’s dev docs to get backend (and frontend if useful) running, then add **Hookdeck Outpost** for customer webhooks.
+> This workspace is our **full-stack FastAPI + React** product (the template we ship). Follow the repo’s dev docs to bring up API, DB, and frontend, then integrate **Hookdeck Outpost** for **per-customer webhooks**.
>
-> Hook publishing to **one real event** that already exists in the app (users, items, teams, whatever fits). **Topic strings should match that domain**; if Turn 0’s list doesn’t include the right names yet, document what the operator must **add in the Outpost project**—don’t contort the app to arbitrary topics unless this is explicitly a minimal wiring pass. Document topics, how tenants register webhook URLs, and env vars. Don’t leak the API key to the client.
+> I want customers to manage **destinations** from the product (or through our authenticated API), a **separate** way to **fire a test event** that isn’t pretending to be production traffic, and enough **delivery visibility** that they can see **events**, **attempts**, and **retry** when something failed—all **through our backend**, never with the platform API key in the browser.
+>
+> Wire **publish** into **one real workflow** we already have (signups, records, teams—whatever fits this codebase). **Topics** should match that workflow. If Hookdeck doesn’t list a name we need, document what I should add there; don’t reshape the product around random topic strings unless I’ve said this is wiring-only. Document env vars and how **tenant** maps to our customer or team model. Don’t expose the API key to clients.
### Turn 2 — User (optional)
@@ -55,23 +57,26 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
## Success criteria
-**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
+**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge (reads this section); execution manual.
- **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
-- API key from **environment** or secure settings — not hard-coded or exposed to clients.
-- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs Turn 0 are resolved by **operator adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
-- **Destination** story documented; if the app has a UI, linking or exposing **safe** controls for customer destinations is a plus; **tenant id** usage consistent with publish.
-- README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. A test-publish button may be used **additionally** for smoke. *Skip for transcript-only.*
+- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **also** expected for this baseline so operators can smoke-test wiring without waiting on production traffic—it **does not** replace the domain publish requirement.
+- API key from **environment** or secure backend settings only — not hard-coded, not exposed via **`NEXT_PUBLIC_*`**, **`VITE_*`**, or other client-visible env patterns.
+- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs the **configured project topic list** from onboarding are resolved by **adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
+- **Destinations + tenant:** Per-customer (or per-team) **destination** management is **documented** and, where this template ships a dashboard, implemented with **safe** UI or BFF routes (list/create/edit as appropriate). **`tenant_id`** (or equivalent) is consistent between publish and destination APIs.
+- **Delivery visibility (full-stack bar):** Because this baseline includes a **customer-facing UI**, the product should expose **event activity** aligned with [Building your own UI](../../pages/guides/building-your-own-ui.mdx): customers can see **events** (e.g. filterable by destination), **attempts** for a selected event, and **manual retry** for failed deliveries—all via **your** authenticated backend calling Outpost (admin key server-side), not from the browser with the platform key. Omit only if the user explicitly scoped the task to **backend-only** or excluded activity UI.
+- **Operator docs:** Root **README**, **backend/README**, **development.md**, or **`.env.example`** (whichever the template uses) lists the **Outpost env vars** and explains how to run and verify the integration.
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Optionally exercise test publish and activity/retry in the UI. *Skip for transcript-only.*
## Failure modes to note
- Greenfield FastAPI “hello world” instead of the **cloned** baseline.
- Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**.
-- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*` / client bundles.
-- **Only** test/synthetic publish with no domain hook.
+- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*`, `VITE_*`, or other client bundles.
+- **Only** test/synthetic publish with no domain hook, or **only** domain publish with no **separate** test-publish control when a dashboard is in scope.
+- **No** events/attempts/retry surfaced for customers when the baseline includes a product UI and the user did not ask to skip that scope.
## Future baselines
-Other “existing FastAPI app” pins can follow the same shape: harness pre-clone + Option 3 Turn 1 + success criteria + `scoreScenario09`.
+Other “existing FastAPI app” pins can follow the same shape: harness pre-clone + natural-language integration Turn 1 + success criteria + `scoreScenario09`.
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index be24d501e..7daab8da6 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -39,9 +39,9 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
### Turn 1 — User
-> Option 3 — existing Go API. **We’re already in the startersaas-go-api tree in this workspace** — the repository is present here. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
+> Existing **Go** API—you’re in this repo with me. Get it building, then add **Hookdeck Outpost** for outbound webhooks.
>
-> Use **one real handler** as the publish trigger (signup, billing, etc.). **`topic` values should match that domain**; if Turn 0’s list is incomplete, document what to **add in the Outpost project**—don’t bend the handler to wrong topic names just to match the prompt unless this is explicitly minimal wiring. API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps.
+> Trigger **publish** from **one real handler** (signup, billing, etc.—not a throwaway test-only route by itself). **`topic` values should match that domain**. If our Hookdeck project’s topic list is missing something, document what to add; don’t point production code at the wrong names just to match a stub list unless I’ve said this is a minimal wiring pass. **`OUTPOST_API_KEY`** from env only. Explain how customers register webhook URLs and what to put in **README** / env. Use the **test receiver URL** from our Hookdeck setup when you want to prove delivery end-to-end.
### Turn 2 — User (optional)
From d5eef9129ef0a50fb500b11b26e10b8cf4fef1c2 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:56:20 +0100
Subject: [PATCH 25/47] feat(eval): extend scenario 09 transcript heuristics
Add no_client_bundled_outpost_key and readme_or_env_docs checks to
scoreScenario09 (align with full-stack success criteria).
Made-with: Cursor
---
docs/agent-evaluation/src/score-transcript.ts | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 9bc8df7d4..73bc8d5c8 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -876,6 +876,19 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env",
});
+ const clientKeyLeak =
+ /NEXT_PUBLIC_OUTPOST_API_KEY\s*[=:]/.test(t) ||
+ /VITE_OUTPOST_API_KEY\s*[=:]/.test(t) ||
+ /process\.env\.NEXT_PUBLIC_OUTPOST_API_KEY\b/.test(t) ||
+ /import\.meta\.env\.(?:VITE_OUTPOST_API_KEY|NEXT_PUBLIC_OUTPOST_API_KEY)\b/.test(t);
+ checks.push({
+ id: "no_client_bundled_outpost_key",
+ pass: !clientKeyLeak,
+ detail: clientKeyLeak
+ ? "Corpus suggests Outpost API key wired into client-visible env — keep server-side only"
+ : "No client env assignment/access for OUTPOST_API_KEY (NEXT_PUBLIC_/VITE_) in corpus",
+ });
+
const beyondTest = corpusSuggestsPublishBeyondTestOnly(t);
checks.push({
id: "publish_beyond_test_only",
@@ -885,6 +898,17 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
: "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
});
+ const readmeOrEnvDocs =
+ /OUTPOST_API_KEY/.test(t) &&
+ /README|development\.md|\.env\.example|backend\/readme/i.test(t);
+ checks.push({
+ id: "readme_or_env_docs",
+ pass: readmeOrEnvDocs,
+ detail: readmeOrEnvDocs
+ ? "README / development.md / .env.example (or similar) touches OUTPOST_API_KEY"
+ : "Expected operator docs listing OUTPOST env vars (see scenario Success criteria)",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
From e415e33004172fe15fc1c0575e89400c47199266 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:56:27 +0100
Subject: [PATCH 26/47] feat(eval): persist run lifecycle sidecars
Write eval-run-started.json at scenario start; eval-failure.json on
uncaught errors; eval-aborted.json on SIGTERM/SIGINT. Register signal
handlers so interrupted runs leave a trace (SIGKILL still silent).
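A hypothetical helper, not part of this patch, showing how an operator
script might read a run directory using only the sidecar names above. The
helper and its state labels are illustrative assumptions:

```typescript
// Hypothetical helper (not part of this patch): infer what happened to one
// eval run from the file names present in its run directory.
type RunState = "completed" | "failed" | "aborted" | "running-or-killed" | "unknown";

function classifyRun(fileNames: readonly string[]): RunState {
  const has = new Set(fileNames);
  // An uncaught error in the scenario body writes eval-failure.json, then rethrows.
  if (has.has("eval-failure.json")) return "failed";
  // SIGTERM/SIGINT write eval-aborted.json before process.exit().
  if (has.has("eval-aborted.json")) return "aborted";
  // transcript.json appears only after every agent turn has finished
  // (scoring sidecars may still be pending).
  if (has.has("transcript.json")) return "completed";
  // Start marker alone: still running, crashed without an exception, or SIGKILL'd.
  if (has.has("eval-run-started.json")) return "running-or-killed";
  return "unknown";
}
```

For example, call classifyRun(readdirSync(runDir)) with readdirSync from
node:fs. Failure and abort are checked first because either can be written
after transcript.json exists (for example, during scoring).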
Made-with: Cursor
---
docs/agent-evaluation/src/run-agent-eval.ts | 162 ++++++++++++++------
1 file changed, 116 insertions(+), 46 deletions(-)
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 7201e6e51..62c0d4ea1 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -7,6 +7,7 @@
* @see https://platform.claude.com/docs/en/agent-sdk/overview
*/
+import { writeFileSync } from "node:fs";
import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
import { dirname, join, resolve, sep } from "node:path";
import { fileURLToPath } from "node:url";
@@ -39,6 +40,41 @@ const PROMPT_MDX = join(
const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
+/** Set while a scenario is in progress so SIGTERM/SIGINT can leave a sidecar (not SIGKILL). */
+let activeRunDirForSignal: string | null = null;
+
+function registerEvalSignalHandlers(): void {
+ const recordAbort = (signal: string) => {
+ if (!activeRunDirForSignal) return;
+ try {
+ writeFileSync(
+ join(activeRunDirForSignal, "eval-aborted.json"),
+ `${JSON.stringify(
+ {
+ abortedAt: new Date().toISOString(),
+ signal,
+ pid: process.pid,
+ note: "Process exited before transcript.json was written; long agent turns often print little to stdout.",
+ },
+ null,
+ 2,
+ )}\n`,
+ "utf8",
+ );
+ } catch {
+ // best-effort
+ }
+ };
+ process.once("SIGTERM", () => {
+ recordAbort("SIGTERM");
+ process.exit(143);
+ });
+ process.once("SIGINT", () => {
+ recordAbort("SIGINT");
+ process.exit(130);
+ });
+}
+
function isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage {
return m.type === "system" && m.subtype === "init";
}
@@ -539,6 +575,8 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
`Running ${selected.length} scenario(s): ${selected.join(", ")} (heuristic=${String(wantScore)}, llm=${String(wantLlm)})`,
);
+ registerEvalSignalHandlers();
+
for (const file of selected) {
const scenarioIdEarly = idFromFilename(file);
const runDir = join(RUNS_DIR, `${stamp}-scenario-${scenarioIdEarly}`);
@@ -550,59 +588,91 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
const { agentCwd, writeGuardRoot } = await applyEvalHarness(runDir, harnessConfig);
const baseOptions = buildBaseOptions(agentCwd, writeGuardRoot);
console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
- const result = await runOneScenario(file, filledTemplate, {
- skipOptional: values["skip-optional"] ?? false,
- baseOptions,
- scenarioMarkdown: scenarioMd,
- });
- const outPath = join(runDir, "transcript.json");
- const payload = {
- meta: {
- scenarioId: result.scenarioId,
- scenarioFile: result.scenarioFile,
- runDirectory: runDir,
- agentWorkspaceCwd: agentCwd,
- evalHarness: {
- preStepCount: harnessConfig.preSteps.length,
- agentCwd: harnessConfig.agentCwd,
+ activeRunDirForSignal = runDir;
+ await writeFile(
+ join(runDir, "eval-run-started.json"),
+ `${JSON.stringify(
+ {
+ startedAt: new Date().toISOString(),
+ pid: process.pid,
+ scenarioFile: file,
+ scenarioId: scenarioIdEarly,
+ note: "If you see this without transcript.json, the run may still be in progress, was interrupted (SIGTERM/SIGINT writes eval-aborted.json), crashed, or was SIGKILL’d (no sidecar). The agent phase often logs little until the turn completes.",
},
- repositoryRoot: REPO_ROOT,
- completedAt: new Date().toISOString(),
- sessionId: result.sessionId,
- turns: result.turns,
- },
- messages: result.allMessages,
- };
+ null,
+ 2,
+ )}\n`,
+ "utf8",
+ );
- await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8");
- console.error(`Wrote ${outPath}`);
+ try {
+ const result = await runOneScenario(file, filledTemplate, {
+ skipOptional: values["skip-optional"] ?? false,
+ baseOptions,
+ scenarioMarkdown: scenarioMd,
+ });
- if (wantScore) {
- const report = await scoreRunFile(outPath);
- const scorePath = join(runDir, "heuristic-score.json");
- await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8");
- console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`);
- if (report.overallTranscriptPass === false) {
- anyScoreFailure = true;
+ const outPath = join(runDir, "transcript.json");
+ const payload = {
+ meta: {
+ scenarioId: result.scenarioId,
+ scenarioFile: result.scenarioFile,
+ runDirectory: runDir,
+ agentWorkspaceCwd: agentCwd,
+ evalHarness: {
+ preStepCount: harnessConfig.preSteps.length,
+ agentCwd: harnessConfig.agentCwd,
+ },
+ repositoryRoot: REPO_ROOT,
+ completedAt: new Date().toISOString(),
+ sessionId: result.sessionId,
+ turns: result.turns,
+ },
+ messages: result.allMessages,
+ };
+
+ await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8");
+ console.error(`Wrote ${outPath}`);
+
+ if (wantScore) {
+ const report = await scoreRunFile(outPath);
+ const scorePath = join(runDir, "heuristic-score.json");
+ await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8");
+ console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`);
+ if (report.overallTranscriptPass === false) {
+ anyScoreFailure = true;
+ }
}
- }
- if (wantLlm) {
- const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile);
- const llmReport = await llmJudgeRun({
- runPath: outPath,
- scenarioMdPath: scenarioPath,
- apiKey: process.env.ANTHROPIC_API_KEY!.trim(),
- });
- const llmPath = join(runDir, "llm-score.json");
- await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
- console.error(
- `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`,
- );
- if (!llmReport.overall_transcript_pass) {
- anyScoreFailure = true;
+ if (wantLlm) {
+ const scenarioPathForJudge = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile);
+ const llmReport = await llmJudgeRun({
+ runPath: outPath,
+ scenarioMdPath: scenarioPathForJudge,
+ apiKey: process.env.ANTHROPIC_API_KEY!.trim(),
+ });
+ const llmPath = join(runDir, "llm-score.json");
+ await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8");
+ console.error(
+ `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`,
+ );
+ if (!llmReport.overall_transcript_pass) {
+ anyScoreFailure = true;
+ }
}
+ } catch (err) {
+ const message = err instanceof Error ? err.message : String(err);
+ const stack = err instanceof Error ? err.stack : undefined;
+ await writeFile(
+ join(runDir, "eval-failure.json"),
+ `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack }, null, 2)}\n`,
+ "utf8",
+ );
+ console.error(`Eval scenario failed (${file}):`, err);
+ throw err;
+ } finally {
+ activeRunDirForSignal = null;
}
}
From cbb6c516aec8cf7294695a0099ef967150030932 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 02:56:44 +0100
Subject: [PATCH 27/47] docs(eval): authoring AGENTS, README, shared Cursor
rule
Add docs/agent-evaluation/AGENTS.md (anti-leakage checklist), root
AGENTS.md pointer, and a Cursor rule scoped to docs/agent-evaluation/.
Document run sidecars, re-scoring, integration verification wording,
and scenario 09 heuristic summary. Fix placeholder fixtures markdown.
Made-with: Cursor
---
.cursor/rules/agent-evaluation-authoring.mdc | 14 ++++++
AGENTS.md | 5 ++
docs/agent-evaluation/AGENTS.md | 46 +++++++++++++++++++
docs/agent-evaluation/README.md | 20 +++++---
.../fixtures/placeholder-values-for-turn0.md | 14 +++---
5 files changed, 86 insertions(+), 13 deletions(-)
create mode 100644 .cursor/rules/agent-evaluation-authoring.mdc
create mode 100644 AGENTS.md
create mode 100644 docs/agent-evaluation/AGENTS.md
diff --git a/.cursor/rules/agent-evaluation-authoring.mdc b/.cursor/rules/agent-evaluation-authoring.mdc
new file mode 100644
index 000000000..34e509cce
--- /dev/null
+++ b/.cursor/rules/agent-evaluation-authoring.mdc
@@ -0,0 +1,14 @@
+---
+description: Authoring standards for docs/agent-evaluation (no eval leakage in user turns)
+globs: docs/agent-evaluation/**/*
+---
+
+When editing anything under `docs/agent-evaluation/`, read and follow **`docs/agent-evaluation/AGENTS.md`**.
+
+**Quick guardrails for `scenarios/*.md`:**
+
+- **`### Turn N — User`** blockquotes = in-character **product engineer** speech only.
+- **Never** in user lines: `Option 1/2/3`, `Turn 0`, `scenario`, `eval`, `success criteria`, `scoreScenario`, references to “the prompt/instructions you already have” or named template sections.
+- Put rubric detail in **`## Success criteria`** / **Intent** / **Failure modes**, not in the user quote.
+
+Full checklist and rationale: **`docs/agent-evaluation/AGENTS.md`**.
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..0fb773eda
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,5 @@
+# Coding agent notes (Outpost)
+
+When you change files under **`docs/agent-evaluation/`** (scenarios, scoring, harness docs), read and apply **[`docs/agent-evaluation/AGENTS.md`](docs/agent-evaluation/AGENTS.md)** first. It defines anti–“teach to the test” rules for user-turn wording and scenario structure.
+
+For this repo’s PR review format, see **`CLAUDE.md`**.
diff --git a/docs/agent-evaluation/AGENTS.md b/docs/agent-evaluation/AGENTS.md
new file mode 100644
index 000000000..5ab942505
--- /dev/null
+++ b/docs/agent-evaluation/AGENTS.md
@@ -0,0 +1,46 @@
+# Agent evaluation — authoring rules for humans & coding agents
+
+This file applies to **everything under `docs/agent-evaluation/`** (scenarios, README, tracker, harness TypeScript). Follow it when adding or editing eval specs so we do not **teach to the test** or confuse **evaluator docs** with **in-character user speech**.
+
+## Who reads what
+
+| Audience | Content |
+|----------|---------|
+| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
+| **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |
+
+**Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.
+
+## Anti-leakage rules (user turns)
+
+In **`### Turn N — User`** blockquotes, **do not** use:
+
+- **Option 1 / 2 / 3** (those labels exist only inside the dashboard template; a real user says what they want in plain language).
+- **Turn 0**, **Turn 1**, or any **turn** numbering (that is script metadata).
+- Phrases like **“the instructions you already have”**, **“the full-stack section of the prompt”**, **“follow the Hookdeck Outpost template”** as a stand-in for requirements (the model already has Turn 0; state the *product ask*, not a pointer to a doc section).
+- **“Match the prompt”**, **“dashboard prompt”**, **“eval”**, **“scenario”**, **“success criteria”**, **heuristic names**, **`scoreScenarioNN`**.
+
+**Do** use natural operator language: stack, repo, product behavior, security (key on server), domain topics, README/env, Hookdeck project/topics **as the customer would say them**.
+
+It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `scoreScenarioNN`, Turn 0, Option 3, etc. — those sections are not pasted as the user.
+
+## Alignment without parroting
+
+- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdx`.
+- **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”) not **cite** where in the template that is spelled out.
+
+If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.
+
+## Pre-merge checklist (scenarios)
+
+Before merging changes to `scenarios/*.md`:
+
+- [ ] Every **`> ...` user** line reads like a **real customer** message (read-aloud test).
+- [ ] No **Option N** / **Turn 0** / **scenario** / **prompt section** leakage in user blockquotes.
+- [ ] **Success criteria** still state the full bar; no requirement was moved out of criteria into user text.
+- [ ] If integration depth or rubrics changed, **`src/score-transcript.ts`** and the scenario table in **[`README.md`](README.md)** are updated.
+
+## Where Cursor loads this
+
+- A **repo-root** [`AGENTS.md`](../../AGENTS.md) points here so agents see this folder’s rules.
+- [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) applies when editing paths under `docs/agent-evaluation/`.
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 87941a677..3cf6ffb4d 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -2,6 +2,8 @@
This folder contains **manual** scenario specs (markdown) and an **automated** runner that uses the [Claude Agent SDK](https://platform.claude.com/docs/en/agent-sdk/overview) (`src/run-agent-eval.ts`).
+**Authoring standards (user-turn wording, no eval leakage):** [`AGENTS.md`](AGENTS.md) — also enforced via [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) when editing here.
+
## Where success criteria live
| What | Where |
@@ -19,13 +21,17 @@ Each scenario run uses one directory:
`results/runs/-scenario-NN/`
-- **`transcript.json`** — full SDK log
-- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)
+- **`transcript.json`** — full SDK log (written only **after** the agent finishes all turns — long runs may show little console output until then)
+- **`eval-run-started.json`** — created as soon as a scenario begins (pid, scenario id); if present **without** `transcript.json`, the run is still in progress, was interrupted, crashed, or was **SIGKILL**’d (SIGKILL leaves no sidecar)
+- **`eval-failure.json`** — uncaught exception while the scenario runs or is scored (message and stack; `transcript.json` may or may not exist)
+- **`eval-aborted.json`** — **SIGTERM** or **SIGINT** (e.g. `kill` or Ctrl+C) before the scenario completes
+- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)
- **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code.
-Re-score a finished run without re-invoking the agent:
+Re-score a finished run without re-invoking the agent — uses **today’s** [`src/score-transcript.ts`](src/score-transcript.ts) and **scenario markdown on disk** (so LLM criteria update when you edit **`## Success criteria`**):
-- **`npm run score -- --run results/runs/`** — heuristic (add **`--llm`** for LLM only, **`--write`** to persist sidecars).
+- **`npm run score -- --run results/runs/ --write`** — refresh **`heuristic-score.json`**
+- Add **`--llm`** to also re-run the judge and write **`llm-score.json`** (needs **`ANTHROPIC_API_KEY`**)
Legacy flat files `*-scenario-NN.json` next to `runs/` are still accepted by **`npm run score`** for older runs.
@@ -102,9 +108,9 @@ A **full pass** also answers: *did the generated curl / script / app succeed aga
#### Integration scenarios (08–10): depth to verify
-These measure **Option 3** (existing app), not a greenfield demo. When you **execute** the artifact:
+These measure **existing-app integration**, not a greenfield demo. When you **execute** the artifact:
-- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when Turn 0 is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
+- **Topic reconciliation:** Confirm README maps **`publish` topics** to **real domain events** and, when the **configured topic list from onboarding** is incomplete, tells the operator to **add topics in Hookdeck**—not to retarget the app to a stale list (unless the scenario was explicitly wiring-only).
- **Domain publish:** Prefer a smoke step that performs a **real product action** (signup, create entity, etc.) and observe an accepted publish—not **only** a “send test event” button.
- **Heuristic `publish_beyond_test_only`:** [`score-transcript.ts`](src/score-transcript.ts) adds a weak automated check that the transcript corpus suggests publish beyond synthetic test-only paths; it is **not** a substitute for execution or the LLM judge reading **Success criteria**.
@@ -164,7 +170,7 @@ There is still **no single portable “IDE agent” CLI** for all vendors; the S
| 06 | `scoreScenario06` | FastAPI, `outpost_sdk`, uvicorn, server env, two flows, README, webhook docs |
| 07 | `scoreScenario07` | `net/http`, Go SDK + `CreateDestinationCreateWebhook`, HTML UI, two flows, `go run`, README |
| 08 | `scoreScenario08` | Clone **next-saas-starter** (or git baseline), TS SDK, publish/destinations/tenants, server env key, per-customer webhook story |
-| 09 | `scoreScenario09` | Clone **full-stack-fastapi-template** (or git baseline), `outpost_sdk`, integration + domain hook, env key |
+| 09 | `scoreScenario09` | Clone **full-stack-fastapi-template** (or git baseline), `outpost_sdk`, integration + domain hook, env key, no client `NEXT_PUBLIC_`/`VITE_` key wiring, `publish_beyond_test_only`, README/env docs signal |
| 10 | `scoreScenario10` | Clone **startersaas-go-api** (or git baseline), Go Outpost SDK, publish + handler hook, env key |
Export **`SCENARIO_IDS_WITH_HEURISTIC_RUBRIC`** in `score-transcript.ts` lists IDs **01–10** for tooling.
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index 39d344677..2336f6352 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -12,13 +12,15 @@ For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), t
## Example substitutions (non-secret)
-| Placeholder | Example |
-|-------------|---------|
-| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` |
-| `{{TOPICS_LIST}}` | `- user.created` |
+
+| Placeholder | Example |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` |
+| `{{TOPICS_LIST}}` | `- user.created` |
| `{{TEST_DESTINATION_URL}}` | Hookdeck Console **Source** URL the dashboard feeds in (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` |
-| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) |
-| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL |
+| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) |
+| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL |
+
---
From ce0be6b7407aa1005ca36f94fcbfd0b1fe09a546 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 10:38:01 +0100
Subject: [PATCH 28/47] feat(agent-evaluation): read/bash sandbox and sibling
harness sidecars
Restrict PreToolUse Read/Glob/Grep to the run directory (and docs/ when
EVAL_LOCAL_DOCS). Block Bash that touches the monorepo root outside those
areas; deny Agent unless EVAL_ALLOW_AGENT_TOOL. Split read vs write guard
env vars.
Write eval-started, eval-failure, and eval-aborted next to the run folder
under results/runs/ so the agent cannot read harness metadata. SIGTERM/
SIGINT abort payload includes runDirectory.
Made-with: Cursor
---
docs/agent-evaluation/src/run-agent-eval.ts | 289 +++++++++++++++++---
1 file changed, 253 insertions(+), 36 deletions(-)
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 62c0d4ea1..3c34c7d24 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -9,7 +9,7 @@
import { writeFileSync } from "node:fs";
import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
-import { dirname, join, resolve, sep } from "node:path";
+import { basename, dirname, join, resolve, sep } from "node:path";
import { fileURLToPath } from "node:url";
import { parseArgs } from "node:util";
import dotenv from "dotenv";
@@ -40,20 +40,39 @@ const PROMPT_MDX = join(
const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
-/** Set while a scenario is in progress so SIGTERM/SIGINT can leave a sidecar (not SIGKILL). */
-let activeRunDirForSignal: string | null = null;
+/**
+ * Harness-only status files next to the run folder (not inside `runDir`) so the agent sandbox cannot Read them.
+ * Example: `…/runs/2026-…-scenario-08/transcript.json` vs `…/runs/2026-…-scenario-08.eval-started.json`.
+ */
+function harnessSidecarPaths(runDir: string): {
+ started: string;
+ failure: string;
+ aborted: string;
+} {
+ const stem = basename(runDir);
+ return {
+ started: join(RUNS_DIR, `${stem}.eval-started.json`),
+ failure: join(RUNS_DIR, `${stem}.eval-failure.json`),
+ aborted: join(RUNS_DIR, `${stem}.eval-aborted.json`),
+ };
+}
+
+/** Paths for SIGTERM/SIGINT abort sidecar while a scenario is in progress (not SIGKILL). */
+let activeHarnessAbortContext: { readonly path: string; readonly runDirectory: string } | null = null;
function registerEvalSignalHandlers(): void {
const recordAbort = (signal: string) => {
- if (!activeRunDirForSignal) return;
+ const ctx = activeHarnessAbortContext;
+ if (!ctx) return;
try {
writeFileSync(
- join(activeRunDirForSignal, "eval-aborted.json"),
+ ctx.path,
`${JSON.stringify(
{
abortedAt: new Date().toISOString(),
signal,
pid: process.pid,
+ runDirectory: ctx.runDirectory,
note: "Process exited before transcript.json was written; long agent turns often print little to stdout.",
},
null,
@@ -367,7 +386,42 @@ function filePathIsInsideRunDir(runDir: string, filePath: string): boolean {
return target.startsWith(prefix);
}
-function toolInputFilePath(toolName: string, toolInput: unknown): string | undefined {
+function resolveMaybeRelativePath(p: string, agentCwd: string): string {
+ if (p.startsWith(sep) || /^[A-Za-z]:[\\/]/.test(p)) {
+ return resolve(p);
+ }
+ return resolve(agentCwd, p);
+}
+
+/** Read/Glob/Grep may touch the run directory, or (with local docs) only `repoRoot/docs`. */
+function pathAllowedForReadTool(
+ absPath: string,
+ runDir: string,
+ repoRoot: string,
+ localDocs: boolean,
+): boolean {
+ const p = resolve(absPath);
+ if (filePathIsInsideRunDir(runDir, p)) return true;
+ if (localDocs && filePathIsInsideRunDir(join(repoRoot, "docs"), p)) return true;
+ return false;
+}
+
+/**
+ * Bash: block commands that reference the Outpost repo root unless the reference stays under
+ * `runDir` or (local docs) `repoRoot/docs`.
+ */
+function bashCommandAllowed(command: string, runDir: string, repoRoot: string, localDocs: boolean): boolean {
+ const rr = resolve(repoRoot);
+ const rd = resolve(runDir);
+ const docRoot = localDocs ? resolve(join(repoRoot, "docs")) : null;
+ if (!command.includes(rr)) return true;
+ if (command.includes(rd)) return true;
+ if (docRoot && command.includes(docRoot)) return true;
+ if (localDocs && command.includes(join(repoRoot, "docs"))) return true;
+ return false;
+}
+
+function toolInputWritePath(toolName: string, toolInput: unknown): string | undefined {
if (toolName !== "Write" && toolName !== "Edit" && toolName !== "NotebookEdit") {
return undefined;
}
@@ -380,23 +434,140 @@ function toolInputFilePath(toolName: string, toolInput: unknown): string | undef
return undefined;
}
+function toolInputReadFilePath(toolInput: unknown): string | undefined {
+ if (typeof toolInput !== "object" || toolInput === null) return undefined;
+ const v = (toolInput as Record<string, unknown>).file_path;
+ return typeof v === "string" && v.length > 0 ? v : undefined;
+}
+
+function preToolDeny(reason: string) {
+ return {
+ hookSpecificOutput: {
+ hookEventName: "PreToolUse" as const,
+ permissionDecision: "deny" as const,
+ permissionDecisionReason: reason,
+ },
+ };
+}
+
+/**
+ * Appended to Turn 0 so the model does not treat the Hookdeck Outpost monorepo as the integration target.
+ */
+function buildWorkspaceBoundaryAppendix(
+ runDir: string,
+ agentCwd: string,
+ repoRoot: string,
+ localDocs: boolean,
+): string {
+ const docsPath = join(repoRoot, "docs");
+ const docBullet = localDocs
+ ? `\n- You **may** use Read/Glob/Grep only under **\`${docsPath}\`** when following the **Documentation (local repository)** paths in this prompt—not elsewhere under **\`${repoRoot}\`** (no \`sdks/\`, \`internal/\`, \`go.mod\` at repo root, etc.).`
+ : `\n- Do **not** read or search the Hookdeck Outpost checkout on disk outside **\`${runDir}\`**; use the documentation URLs already listed above.`;
+
+ return `
+
+### Workspace boundary (automated eval session)
+
+- The **integration target** is **only** under **\`${runDir}\`** (shell cwd: **\`${agentCwd}\`**). Install dependencies, add SDK usage, routes, UI, and env/README notes **there**.
+- Do **not** use Read, Glob, Grep, or Bash to explore **\`${repoRoot}\`** except:${docBullet}
+- Do **not** use the **Agent** tool to spider the monorepo or another tree; implement the integration directly in this workspace.
+`;
+}
+
/**
- * PreToolUse hook: deny Write/Edit/NotebookEdit outside the run dir.
- * `canUseTool` is not reliable under `permissionMode: dontAsk`; hooks receive `permissionDecision` instead.
+ * PreToolUse hook: Write/Edit/NotebookEdit only under run dir; Read/Glob/Grep/Bash constrained to run dir (+ docs/ when EVAL_LOCAL_DOCS); Agent denied unless EVAL_ALLOW_AGENT_TOOL.
+ * `EVAL_DISABLE_WORKSPACE_READ_GUARD=1` — allow Read/Glob/Grep/Bash/Agent outside the sandbox.
+ * `EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1` — allow Write/Edit outside the run directory (read sandbox unchanged unless also disabled above).
*/
-function createRunDirPreToolHook(allowedRootDir: string) {
+function createRunDirPreToolHook(ctx: {
+ allowedRootDir: string;
+ agentCwd: string;
+ runDir: string;
+ repoRoot: string;
+ localDocs: boolean;
+ readGuardOn: boolean;
+ writeGuardOn: boolean;
+}) {
+ const { allowedRootDir, agentCwd, runDir, repoRoot, localDocs, readGuardOn, writeGuardOn } = ctx;
+
return async (input: HookInput) => {
if (input.hook_event_name !== "PreToolUse") return {};
- const candidate = toolInputFilePath(input.tool_name, input.tool_input);
- if (!candidate) return {};
- if (filePathIsInsideRunDir(allowedRootDir, candidate)) return {};
- return {
- hookSpecificOutput: {
- hookEventName: "PreToolUse" as const,
- permissionDecision: "deny" as const,
- permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${allowedRootDir} (e.g. outpost-quickstart.sh). Refused: ${resolve(candidate)}`,
- },
- };
+
+ if (readGuardOn && input.tool_name === "Agent" && !envFlagTruthy(process.env.EVAL_ALLOW_AGENT_TOOL)) {
+ return preToolDeny(
+ "Outpost agent-eval: the Agent subagent is disabled for fair scoring (set EVAL_ALLOW_AGENT_TOOL=1 to allow).",
+ );
+ }
+
+ if (readGuardOn && input.tool_name === "Read") {
+ const raw = toolInputReadFilePath(input.tool_input);
+ if (raw) {
+ const abs = resolveMaybeRelativePath(raw, agentCwd);
+ if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+ return preToolDeny(
+ `Outpost agent-eval: Read must stay under the scenario run directory or (with EVAL_LOCAL_DOCS) ${join(repoRoot, "docs")}. Refused: ${abs}`,
+ );
+ }
+ }
+ return {};
+ }
+
+ if (readGuardOn && input.tool_name === "Glob") {
+ const inp = input.tool_input;
+ if (typeof inp === "object" && inp !== null) {
+ const pathRaw = (inp as Record<string, unknown>).path;
+ if (typeof pathRaw === "string" && pathRaw.length > 0) {
+ const abs = resolveMaybeRelativePath(pathRaw, agentCwd);
+ if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+ return preToolDeny(
+ `Outpost agent-eval: Glob path must stay under the run directory or repo docs/. Refused: ${abs}`,
+ );
+ }
+ }
+ }
+ return {};
+ }
+
+ if (readGuardOn && input.tool_name === "Grep") {
+ const inp = input.tool_input;
+ if (typeof inp === "object" && inp !== null) {
+ const pathRaw = (inp as Record<string, unknown>).path;
+ if (typeof pathRaw === "string" && pathRaw.length > 0) {
+ const abs = resolveMaybeRelativePath(pathRaw, agentCwd);
+ if (!pathAllowedForReadTool(abs, runDir, repoRoot, localDocs)) {
+ return preToolDeny(
+ `Outpost agent-eval: Grep path must stay under the run directory or repo docs/. Refused: ${abs}`,
+ );
+ }
+ }
+ }
+ return {};
+ }
+
+ if (readGuardOn && input.tool_name === "Bash") {
+ const inp = input.tool_input;
+ if (typeof inp === "object" && inp !== null) {
+ const cmd = (inp as Record<string, unknown>).command;
+ if (typeof cmd === "string" && cmd.trim().length > 0) {
+ if (!bashCommandAllowed(cmd, runDir, repoRoot, localDocs)) {
+ return preToolDeny(
+ `Outpost agent-eval: Bash must not traverse the Outpost monorepo outside this run (or docs/ when EVAL_LOCAL_DOCS=1). Refused command prefix: ${cmd.slice(0, 120)}${cmd.length > 120 ? "…" : ""}`,
+ );
+ }
+ }
+ }
+ return {};
+ }
+
+ if (writeGuardOn) {
+ const candidate = toolInputWritePath(input.tool_name, input.tool_input);
+ if (candidate && !filePathIsInsideRunDir(allowedRootDir, candidate)) {
+ return preToolDeny(
+ `Outpost agent-eval: ${input.tool_name} must target only the scenario run directory tree. Use a path under ${allowedRootDir}. Refused: ${resolve(candidate)}`,
+ );
+ }
+ }
+ return {};
};
}
@@ -413,11 +584,14 @@ function defaultEvalTools(env: NodeJS.ProcessEnv): string {
: "Read,Glob,Grep,WebFetch,Write,Edit,Bash";
}
-/**
- * @param agentWorkspaceCwd — process cwd for the agent (per-run directory, or a subfolder when the scenario defines `agentCwd` in ## Eval harness).
- * @param writeGuardRoot — PreToolUse hook allows Write/Edit only under this path (usually the per-run directory so the clone stays inside it).
- */
-function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Options {
+function buildBaseOptions(ctx: {
+ agentCwd: string;
+ writeGuardRoot: string;
+ runDir: string;
+ repoRoot: string;
+ localDocs: boolean;
+}): Options {
+ const { agentCwd, writeGuardRoot, runDir, repoRoot, localDocs } = ctx;
const toolsRaw = defaultEvalTools(process.env);
const allowedTools = toolsRaw
.split(",")
@@ -432,7 +606,7 @@ function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Op
const persistSession = process.env.EVAL_PERSIST_SESSION !== "false";
const o: Options = {
- cwd: agentWorkspaceCwd,
+ cwd: agentCwd,
allowedTools,
permissionMode: mode,
maxTurns: Number.isFinite(maxTurns) ? maxTurns : 80,
@@ -443,9 +617,25 @@ function buildBaseOptions(agentWorkspaceCwd: string, writeGuardRoot: string): Op
} as Record,
};
- if (!envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD)) {
+ const readGuardOn = !envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_READ_GUARD);
+ const writeGuardOn = !envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD);
+ if (readGuardOn || writeGuardOn) {
o.hooks = {
- PreToolUse: [{ hooks: [createRunDirPreToolHook(writeGuardRoot)] }],
+ PreToolUse: [
+ {
+ hooks: [
+ createRunDirPreToolHook({
+ allowedRootDir: writeGuardRoot,
+ agentCwd,
+ runDir,
+ repoRoot,
+ localDocs,
+ readGuardOn,
+ writeGuardOn,
+ }),
+ ],
+ },
+ ],
};
}
@@ -504,6 +694,8 @@ Environment:
EVAL_PERMISSION_MODE Optional (default: dontAsk)
EVAL_PERSIST_SESSION Set to "false" to disable session persistence (breaks multi-turn resume)
EVAL_DISABLE_WORKSPACE_WRITE_GUARD Set to 1 to allow Write/Edit outside the run dir (not recommended)
+ EVAL_DISABLE_WORKSPACE_READ_GUARD Set to 1 to allow Read/Glob/Grep/Bash/Agent outside the run dir (+ docs/ when local)
+ EVAL_ALLOW_AGENT_TOOL Set to 1 to allow the Agent subagent (default: denied for fair scoring)
EVAL_SKIP_HARNESS_PRE_STEPS Set to 1 to skip ## Eval harness preSteps (git_clone, etc.); see scenario markdown
Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets
@@ -554,8 +746,17 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
}
if (values["dry-run"]) {
+ const localDocs = envFlagTruthy(process.env.EVAL_LOCAL_DOCS);
+ const sampleRun = join(RUNS_DIR, "dry-run-example-scenario");
+ const sampleAgent = join(sampleRun, "app-baseline");
+ const boundarySample = buildWorkspaceBoundaryAppendix(sampleRun, sampleAgent, REPO_ROOT, localDocs);
console.log("Dry run: would execute", selected.join(", "));
- console.log("Turn 0 length (chars):", filledTemplate.length);
+ console.log(
+ "Turn 0 base template (chars):",
+ filledTemplate.length,
+ "| + workspace boundary (~chars):",
+ boundarySample.length,
+ );
process.exit(0);
}
@@ -586,19 +787,35 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
const scenarioMd = await readFile(scenarioPath, "utf8");
const harnessConfig = parseEvalHarness(scenarioMd);
const { agentCwd, writeGuardRoot } = await applyEvalHarness(runDir, harnessConfig);
- const baseOptions = buildBaseOptions(agentCwd, writeGuardRoot);
+ const localDocs = envFlagTruthy(process.env.EVAL_LOCAL_DOCS);
+ const baseOptions = buildBaseOptions({
+ agentCwd,
+ writeGuardRoot,
+ runDir,
+ repoRoot: REPO_ROOT,
+ localDocs,
+ });
+ const turn0Prompt =
+ filledTemplate + buildWorkspaceBoundaryAppendix(runDir, agentCwd, REPO_ROOT, localDocs);
console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
- activeRunDirForSignal = runDir;
+ const sidecars = harnessSidecarPaths(runDir);
+ activeHarnessAbortContext = { path: sidecars.aborted, runDirectory: runDir };
await writeFile(
- join(runDir, "eval-run-started.json"),
+ sidecars.started,
`${JSON.stringify(
{
startedAt: new Date().toISOString(),
pid: process.pid,
scenarioFile: file,
scenarioId: scenarioIdEarly,
- note: "If you see this without transcript.json, the run may still be in progress, was interrupted (SIGTERM/SIGINT writes eval-aborted.json), crashed, or was SIGKILL’d (no sidecar). The agent phase often logs little until the turn completes.",
+ runDirectory: runDir,
+ harnessSidecars: {
+ started: sidecars.started,
+ failure: sidecars.failure,
+ aborted: sidecars.aborted,
+ },
+ note: "Transcript and score JSON live under runDirectory. Harness *.eval-*.json paths are siblings of the run folder (not inside it) so the agent cannot read eval metadata.",
},
null,
2,
@@ -607,7 +824,7 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
);
try {
- const result = await runOneScenario(file, filledTemplate, {
+ const result = await runOneScenario(file, turn0Prompt, {
skipOptional: values["skip-optional"] ?? false,
baseOptions,
scenarioMarkdown: scenarioMd,
@@ -665,14 +882,14 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
const message = err instanceof Error ? err.message : String(err);
const stack = err instanceof Error ? err.stack : undefined;
await writeFile(
- join(runDir, "eval-failure.json"),
- `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack }, null, 2)}\n`,
+ sidecars.failure,
+ `${JSON.stringify({ failedAt: new Date().toISOString(), message, stack, runDirectory: runDir }, null, 2)}\n`,
"utf8",
);
console.error(`Eval scenario failed (${file}):`, err);
throw err;
} finally {
- activeRunDirForSignal = null;
+ activeHarnessAbortContext = null;
}
}
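The sibling-sidecar layout above only hides harness metadata if the containment check compares against the run directory plus a path separator. `…/scenario-01.eval-started.json` begins with the string `…/scenario-01`, but it is not inside that folder. The following is a minimal sketch of that check. It uses POSIX paths so the result does not depend on the host OS. `isInsideDir`, `siblingSidecars`, and `readAllowed` are hypothetical names, the `/work/outpost` checkout path is illustrative, and deriving the parent with `dirname` assumes the run folder sits directly in the runs directory.

```typescript
import { posix } from "node:path";

// Hypothetical helper: true when `target` is `root` itself or lies below it.
// Comparing against `root + "/"` stops a sibling such as `<root>.eval-started.json`
// from passing a bare string-prefix test.
function isInsideDir(root: string, target: string): boolean {
  const r = posix.resolve(root);
  const t = posix.resolve(target);
  const prefix = r.endsWith(posix.sep) ? r : r + posix.sep;
  return t === r || t.startsWith(prefix);
}

// Hypothetical sidecar naming: `<stem>.eval-*.json` next to the run folder, not inside it.
// Assumes the run folder sits directly in the runs directory.
function siblingSidecars(runDir: string): { started: string; failure: string; aborted: string } {
  const parent = posix.dirname(posix.resolve(runDir));
  const stem = posix.basename(runDir);
  return {
    started: posix.join(parent, `${stem}.eval-started.json`),
    failure: posix.join(parent, `${stem}.eval-failure.json`),
    aborted: posix.join(parent, `${stem}.eval-aborted.json`),
  };
}

// Hypothetical read rule: the run directory, plus docs/ when local docs are on.
// Passing `runsDir` keeps other runs and harness sidecars out of the docs allowance.
function readAllowed(
  target: string,
  runDir: string,
  docsDir: string | null,
  runsDir: string | null = null,
): boolean {
  if (isInsideDir(runDir, target)) return true;
  if (docsDir === null || !isInsideDir(docsDir, target)) return false;
  return runsDir === null || !isInsideDir(runsDir, target);
}

const repoRoot = "/work/outpost"; // illustrative checkout location
const runsDir = `${repoRoot}/docs/agent-evaluation/results/runs`;
const runDir = `${runsDir}/2026-04-10T09-28-52-764Z-scenario-01`;
const docsDir = `${repoRoot}/docs`;
const sidecars = siblingSidecars(runDir);

console.log(readAllowed(`${runDir}/quickstart.sh`, runDir, null)); // true
console.log(sidecars.started.startsWith(runDir)); // true, yet the file is a sibling
console.log(readAllowed(sidecars.started, runDir, null)); // false
console.log(readAllowed(sidecars.started, runDir, docsDir)); // true: docs/ contains the runs dir
console.log(readAllowed(sidecars.started, runDir, docsDir, runsDir)); // false
```

In the layout this series describes, runs live under `docs/agent-evaluation/results/runs/`. A blanket `docs/` read allowance (the `EVAL_LOCAL_DOCS=1` case) therefore also reaches sibling sidecars and earlier runs. The optional `runsDir` argument sketches one way to carve that directory back out.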
From 8ab658f2435f7d83ed161f5bc9846598244dba99 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 10:38:48 +0100
Subject: [PATCH 29/47] docs(agent-evaluation): document sidecars, sandbox, and
env vars
Describe sibling *.eval-*.json harness files and expanded PreToolUse
permissions (read guard, bash, Agent tool).
Made-with: Cursor
---
docs/agent-evaluation/README.md | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 3cf6ffb4d..94246c975 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -22,9 +22,11 @@ Each scenario run uses one directory:
`results/runs/-scenario-NN/`
- **`transcript.json`** — full SDK log (written only **after** the agent finishes all turns — long runs may show little console output until then)
-- **`eval-run-started.json`** — created as soon as a scenario begins (pid, scenario id); if present **without** `transcript.json`, the run was interrupted, is still going, crashed, or was **SIGKILL**’d (no sidecar for SIGKILL)
-- **`eval-failure.json`** — uncaught exception before a transcript was written
-- **`eval-aborted.json`** — **SIGTERM** or **SIGINT** (e.g. stopping the process) before completion
+- **Harness sidecars (siblings of the run folder, not inside it)** — so the agent sandbox cannot read them:
+ - **`-scenario-NN.eval-started.json`** — written when the scenario begins (pid, scenario id, paths)
+ - **`-scenario-NN.eval-failure.json`** — uncaught exception before `transcript.json`
+ - **`-scenario-NN.eval-aborted.json`** — **SIGTERM** / **SIGINT** before completion (not **SIGKILL**)
+ If **`transcript.json`** is missing, check these files next to **`…/runs/-scenario-NN/`**, that is, in **`results/runs/`** alongside the run folder rather than inside it.
- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above)
- **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code.
@@ -92,7 +94,12 @@ Two different things get called “permissions”:
2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell).
-3. **Run-directory write guard** — a **`PreToolUse`** hook denies **`Write` / `Edit` / `NotebookEdit`** when the target path resolves **outside** the current `results/runs/-scenario-NN/` workspace (hooks enforce this under `permissionMode: dontAsk`; `canUseTool` alone does not). Set **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** only for debugging. **`Bash`** can still redirect output outside the run dir; review transcripts if that matters.
+3. **Run-directory sandbox (`PreToolUse`)** — Under `permissionMode: dontAsk`, hooks enforce boundaries (not `canUseTool` alone):
+ - **Write / Edit / NotebookEdit** — target path must resolve under `results/runs/-scenario-NN/`. **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** disables this only (debug).
+ - **Read / Glob / Grep** — must stay under that same run directory or, when **`EVAL_LOCAL_DOCS=1`**, under **`docs/`** of the Outpost repo (local MDX/OpenAPI only). **`EVAL_DISABLE_WORKSPACE_READ_GUARD=1`** disables the Read/Glob/Grep/Bash/Agent checks (restores pre–workspace-sandbox behavior).
+ - **Bash** — commands must not reference the Outpost repository root on disk unless the reference stays inside the run dir or (with local docs) inside **`docs/`**.
+ - **Agent** (subagent) — **denied by default** so runs cannot spider the monorepo for “free” SDK context. **`EVAL_ALLOW_AGENT_TOOL=1`** to opt in.
+ - Turn 0 also appends a short **workspace boundary** block (absolute run-dir paths) so the model treats only the clone as the product under integration.
Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials.
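The guard switches described in the README, and wired into `buildBaseOptions` in the previous patch, reduce to a small decision over three environment variables. The following is a hedged sketch of that decision. `flagTruthy` is a hypothetical stand-in for the harness's `envFlagTruthy`, whose accepted values are not shown here. This version assumes `1`, `true`, and `yes`. `agentPassesHook` only means the PreToolUse hook does not deny the Agent tool. Under `dontAsk`, Agent must still be listed in `allowedTools` (via `EVAL_TOOLS`) to run.

```typescript
// Hypothetical stand-in for envFlagTruthy; accepted spellings are an assumption.
function flagTruthy(value: string | undefined): boolean {
  return value !== undefined && ["1", "true", "yes"].includes(value.trim().toLowerCase());
}

interface GuardConfig {
  readGuardOn: boolean; // Read/Glob/Grep/Bash/Agent checks
  writeGuardOn: boolean; // Write/Edit/NotebookEdit checks
  hookInstalled: boolean; // PreToolUse hook registered at all
  agentPassesHook: boolean; // hook does not deny Agent (allowedTools still applies)
}

function guardConfig(env: Record<string, string | undefined>): GuardConfig {
  const readGuardOn = !flagTruthy(env.EVAL_DISABLE_WORKSPACE_READ_GUARD);
  const writeGuardOn = !flagTruthy(env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD);
  return {
    readGuardOn,
    writeGuardOn,
    hookInstalled: readGuardOn || writeGuardOn,
    // Agent is only denied while the read guard is on and no opt-in is set.
    agentPassesHook: !readGuardOn || flagTruthy(env.EVAL_ALLOW_AGENT_TOOL),
  };
}

console.log(guardConfig({}));
console.log(guardConfig({ EVAL_ALLOW_AGENT_TOOL: "1" }));
console.log(
  guardConfig({ EVAL_DISABLE_WORKSPACE_READ_GUARD: "1", EVAL_DISABLE_WORKSPACE_WRITE_GUARD: "1" }),
);
```

Passing `process.env` works directly, since its type is compatible with `Record<string, string | undefined>`.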
From cc6e7e05111bb2c7fc0dbce517408571e4b63ad0 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 10:38:56 +0100
Subject: [PATCH 30/47] docs(agent-evaluation): update scenario 01 tracker row
Record 2026-04-10 run, quickstart.sh artifact, execution smoke test, and
sibling harness sidecar layout.
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index aba77c2a5..0ffd50a42 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -20,7 +20,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. |
+| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
From cee7ff4dae0b83fb8737571e6a07bebbf41f9688 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 10:59:35 +0100
Subject: [PATCH 31/47] docs(agent-evaluation): update scenario 02 tracker row
Record 2026-04-10 run: heuristic 9/9 pass, LLM fail (script vs Next.js
mismatch), execution pass via outpost-quickstart.ts.
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 0ffd50a42..461926046 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -21,7 +21,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
-| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. |
+| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T09-39-06-362Z-scenario-02` | Pass (9/9) | **Fail** | Pass | `EVAL_LOCAL_DOCS=1`. Agent produced a **Next.js app** plus **`outpost-quickstart.ts`**; LLM judge **failed** (`overall_transcript_pass=false`) — expected a minimal single-file script + `npx tsx` story, not a full UI (see `llm-score.json` criteria). Heuristic still 9/9. **Execution:** `npx tsx outpost-quickstart.ts` with run-dir `.env` (`OUTPOST_API_KEY`); tenant/destination/publish succeeded (printed event id). Harness sidecars sibling under `results/runs/`. |
| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
From 60f73f43332f77404f70b229de62456b5cf59406 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 12:08:22 +0100
Subject: [PATCH 32/47] docs: scope-router Outpost agent prompt and refresh
basics tracker rows
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Restructure hookdeck-outpost-agent-prompt.mdx with Quick path / new minimal
app / existing app guidance, default-to-smallest behavior, language vs
architecture, doc list split, mapping hints, and explicit anti-over-build
rules.
Update SCENARIO-RUN-TRACKER for scenarios 01–03 with recent eval runs
(heuristic, LLM, execution notes, sibling harness sidecars).
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 30 ++---
.../hookdeck-outpost-agent-prompt.mdx | 103 +++++++++++++-----
2 files changed, 91 insertions(+), 42 deletions(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 461926046..c043acaa7 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener
## Tracker
-| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
-| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T09-39-06-362Z-scenario-02` | Pass (9/9) | **Fail** | Pass | `EVAL_LOCAL_DOCS=1`. Agent produced a **Next.js app** plus **`outpost-quickstart.ts`**; LLM judge **failed** (`overall_transcript_pass=false`) — expected a minimal single-file script + `npx tsx` story, not a full UI (see `llm-score.json` criteria). Heuristic still 9/9. **Execution:** `npx tsx outpost-quickstart.ts` with run-dir `.env` (`OUTPOST_API_KEY`); tenant/destination/publish succeeded (printed event id). Harness sidecars sibling under `results/runs/`. |
-| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. |
-| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
-| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
-| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
-| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
-| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder). |
+| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
+| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail). |
+| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`. |
+| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
+| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | **`nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
+| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
+| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
+| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`## Eval harness`** pre-clone + **`agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/**`, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder). |
| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
@@ -40,12 +40,12 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
- **TanStack Router:** `frontend/src/routeTree.gen.ts` — register `/_layout/webhooks` (agent added the route file but not the generated tree).
- **API base URL:** webhooks page used browser-relative `/api/...` against nginx; switched to backend base (`OpenAPI.BASE` / `VITE_API_URL`).
-- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
+- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
**Backend**
-- **`POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
-- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
+- **`POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
+- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
**Dashboard UI (webhooks page)**
@@ -61,7 +61,7 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
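The topic-reconciliation criterion referenced in these review notes boils down to a difference between the topics that code passes to `publish` and the topics configured in the Outpost project. A minimal TypeScript sketch of that check follows; `findUnconfiguredTopics` is a hypothetical helper written for illustration, not the `publish_beyond_test_only` heuristic or any other code in `src/score-transcript.ts`.

```typescript
// Hypothetical helper (not part of the eval harness or score-transcript.ts):
// lists topics that code publishes but the Outpost project does not configure.
function findUnconfiguredTopics(
  publishedTopics: readonly string[],
  configuredTopics: readonly string[],
): string[] {
  const missing: string[] = [];
  for (const topic of publishedTopics) {
    const isConfigured = configuredTopics.indexOf(topic) !== -1;
    const alreadyReported = missing.indexOf(topic) !== -1;
    // Report each unconfigured topic once, in first-seen order.
    if (!isConfigured && !alreadyReported) missing.push(topic);
  }
  return missing;
}

// Example: topics found in `publish` calls vs. topics configured in the project.
const missing = findUnconfiguredTopics(
  ["order.created", "order.shipped", "user.invited", "order.created"],
  ["order.created", "user.invited"],
);
// missing is ["order.shipped"]: the domain-first fix is to add that topic in the
// Outpost project, not to rename the domain event to fit a stale list.
console.log(missing);
```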
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 16f348e09..875bd739b 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -35,59 +35,108 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
### Documentation
+**Core (read for every path):**
+
- Getting started (curl / HTTP only, no SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl
- TypeScript quickstart (TypeScript SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript
- Python quickstart (Python SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-python
- Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go
-- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
- API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api
- **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts
+- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
+- SDK overview: {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+
+**When you build customer-facing UI or integrate into an existing product (not for quick path only):**
+
- **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
- Destination types: {{DOCS_URL}}/destinations
- Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics
-- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+
+### Scope: choose the right depth (read before you build)
+
+Operators often give **short** answers (“TypeScript example,” “show me in Go”). **You** infer **how much** to build from their words—not from habit, and **not** from the language alone.
+
+**Three paths** (dashboard or chat may use other labels—“try it out,” “small demo app,” “our existing codebase,” or “Option 1 / 2 / 3”—map them to the same three):
+
+1. **Quick path** — Smallest runnable artifact: one **shell script** (curl) or **one source file** run with `npx tsx`, `python`, `go run`, etc., exactly as that language’s **quickstart** describes. No application framework, no multi-route server, no dev-server “project,” unless they clearly asked for an app.
+2. **New minimal application** — They want a **new** small service or UI (pages, forms, a demo they can open in a browser). Use the **official SDK on the server** for whatever stack they name; stay **framework-agnostic** unless they specify a framework—do not impose one.
+3. **Existing application** — They are changing **their current codebase**. Same SDK-on-server rules; integrate on **real** domain paths. Use the **full-stack** guidance in **Existing application (full-stack products)** below when the repo already has customer-facing UI.
+
+**Default when scope is ambiguous:** Prefer **Quick path**. If they only name a language, say “example,” “quickstart,” “try it,” “just show me,” or similar—and they do **not** ask for an app, UI, pages, a server project, or changes **inside their repo**—deliver **only** the quickstart-shaped artifact for that language (or curl if they gave no language). **Brief user messages are normal;** map them to the **smallest** matching path.
+
+**Language ≠ architecture:** **TypeScript**, **Python**, and **Go** select **which quickstart and SDK** to use. They do **not** mean “build a web application.” If they want an app or a full integration, they will signal it (“small dashboard,” “add to our backend,” “we use X in production,” etc.)—or ask **one** clarifying question if truly unclear.
+
+**Do not over-build:**
+
+- **Quick path** → **No** framework scaffold (no app router, no `create-*-app`, no Express/FastAPI/Go HTTP **project** just to demo Outpost). One file or one shell script is enough.
+- **Quick path** → Do **not** default to a large stack because the language was TypeScript or Node; a **single `.ts` file** per the TypeScript quickstart is the right shape unless they asked for more.
+- **New minimal application** → Do **not** ship full portal depth (events UI, retry flows, every destination type) unless they asked for that level; grow into **Building your own UI** when they want customer-grade destination management.
+- **Existing application** → Do **not** stop at a throwaway demo route when they asked for real integration; follow **Minimum integration depth** under that section.
+
+### If the operator said… (mapping hints)
+
+| They said (examples) | Likely path |
+|----------------------|-------------|
+| “Example,” “quickstart,” “fastest,” “simplest,” “just show me,” or **only** a language name with no app/repo context | **Quick path** |
+| “Small app,” “UI,” “page,” “form,” “demo site,” “dashboard” (greenfield, not their production repo) | **New minimal application** |
+| “Our app,” “existing code,” “add to my API,” “integrate into this repo,” “we already run …” | **Existing application** |
+
+Use judgment; when two paths seem possible, prefer **Quick path** unless they clearly want UI or repo integration.
### Language → SDK vs HTTP
-Operators rarely name packages or SDK details. **You** map what they say to the right doc and dependency:
+**You** map their words to the right doc—**after** you have chosen **scope** above.
-**“Try it out” — interpret their words**
+- **No language named** + simplest / minimal / “just show me” / no framework → **curl quickstart** + OpenAPI. One runnable shell script. **No SDK.**
+- **TypeScript** or **Node** → **TypeScript quickstart** + **`@hookdeck/outpost-sdk`** as that doc shows. They do not need to say “SDK.”
+- **Python** → **Python quickstart** + **`outpost_sdk`** (e.g. Python `publish.event` uses `request={{...}}` — **not** TypeScript-style kwargs).
+- **Go** → **Go quickstart** + official Go SDK as that doc shows.
+- **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI.
-- **Simplest / fastest / minimal / least setup / “just show me” / no framework** (and they do **not** name TypeScript, Python, or Go) → treat as **curl**: **curl quickstart** + **OpenAPI** for exact JSON. One runnable shell script is ideal. **No SDK.**
-- **TypeScript** or **Node** → **TypeScript quickstart**; use the **official TypeScript SDK** (`@hookdeck/outpost-sdk`) exactly as that quickstart shows. The user does not need to say “SDK.”
-- **Python** → **Python quickstart**; use **`outpost_sdk`** as that quickstart shows (e.g. Python `publish.event` uses `request={{...}}` — **not** TypeScript-style kwargs on the method).
-- **Go** → **Go quickstart**; use the **official Go SDK** as that quickstart shows.
-- They explicitly want **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI.
+Do **not** mix argument styles across languages (e.g. do not apply TypeScript `publish.event({ ... })` shapes to Python).
-Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python).
+### Quick path — how to deliver
-**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape).
+Goal: tenant → **one destination** (often webhook to `{{TEST_DESTINATION_URL}}` / `OUTPOST_TEST_WEBHOOK_URL`) → **publish** → clear success (event id, HTTP 2xx, log line).
-**Option 3 (existing app)** — Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, core entities, workflows), not throwaway demos. **Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** must either appear under **Configured topics** above **or** be documented for the operator with **“add this topic in the Outpost project”** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path** (CRUD handler, service after commit, job, etc.), not only a “send test event” / synthetic demo route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
+- Default to **curl** when they want the absolute minimum and did not name a language.
+- When they name **TypeScript**, **Python**, or **Go**, produce **only** what that language’s **quickstart** describes—typically **one file** (plus `package.json` / `go.mod` / venv if the quickstart needs it), not a full application tree.
+- Ask only for env vars and details the quickstart still needs.
-**Full-stack existing apps (backend + product UI)** — If the codebase already has a **customer-facing UI** (dashboard, settings, integrations, account area) **or** a mobile app that talks to your API, assume operators want customers to **manage event destinations** (every **destination type** the project enables—webhook, queues, Hookdeck, etc.; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**, not only via raw API or Swagger:
+### New minimal application
-- **Backend:** Keep **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. Implement **tenant** upsert/sync where it fits your model, **publish** on real domain events, and **authenticated HTTP routes** (BFF / API routes / server actions—whatever matches the stack) that list, create, update, or delete destinations for the **currently signed-in customer’s** tenant. Those handlers call Outpost with the platform credentials; responses return only what the customer should see (e.g. destination ids, **targets** / config summaries, topics—never the platform API key).
-- **Frontend:** Wire **logged-in** pages to **your** backend endpoints (session cookie, JWT, or your existing API client)—**not** to Hookdeck’s API directly and **not** with the Outpost SDK in the browser. Reuse your design system and routing. **Before** building screens, read **Concepts** and **Building your own UI** in the Documentation list: flows should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (avoid a single undifferentiated “webhook” field that hides topics unless the operator asks for that simplification).
-- **Events and retries in the product UI:** Surface an **events** view (filterable by **destination** when useful) so customers can see what was published, plus **delivery attempts** per event (success/failure, response hints). For **failed** attempts, offer **manual retry** (server-side `POST /retry` with `event_id` and `destination_id`) after they fix their endpoint or downstream config—see **Building your own UI** (default: **destination → activity**) for how this links to destinations and to automatic retries in Outpost.
-- **Send test events (strongly recommended for full-stack / Option 3):** When you ship customer-facing destination management, also add a **separate** control or screen that **publishes a test event** for the signed-in tenant (server-side `publish` to a selectable topic, same pattern as the test destination URL above). This is **complementary** to domain publishes: it proves wiring (destination + topic subscription + delivery) without waiting on real traffic. It **does not replace** a `publish` on a real domain path. The test topic can be any **configured** topic; domain publishes should use topics that match the events you document.
-- **API-only or headless products:** If there is **no** customer UI, document how tenants manage destinations through **your** documented API (OpenAPI, etc.); still keep the platform key on the server.
+When they want a **new** small app (not quick path): use the **official SDK on the server** for **their** stack. **Do not** treat any single framework as the default—follow what they name (or ask once). Prefer each language’s **quickstart** for Outpost call shapes, then add routes/pages as their stack requires.
-### What to do
+**Before** designing screens or forms, read **Concepts** and **Building your own UI** (under Documentation): reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not one anonymous webhook field unless they ask for that simplification).
-Guide the conversation, then act:
+For a **tiny** demo, keep **tenant** in scope, **create destination** as **topics + delivery target**, and a **separate** way to **publish a test event** so they can verify delivery—avoid one giant form unless they insist. Events / attempts UI is optional for the smallest demo; add it when matching **Building your own UI**.
-1. **Try it out** — Minimal path: tenant → **one destination** (often a webhook for quick verification) → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.).
+### Existing application
-2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room.
+Use the **official SDK for the repo’s backend language** on the **server** (or REST/OpenAPI if they refuse SDKs). Read that language’s quickstart for call shapes; integrate on **real** domain paths (signup, entities, workflows), not throwaway demos only.
-3. **Integrate with an existing app** — Open their codebase; implement **Option 3**. For repos that ship a **product UI**, integrate **both** server and client: backend Outpost calls plus customer-facing screens (or clear extension points) wired through **your** authenticated API, following **Building your own UI** for structure—**including test publish**, an **events** list (and attempts / **retry** where appropriate), unless the operator explicitly asks to omit parts. Document env vars, tenant mapping, topics, and how to verify delivery (e.g. `{{TEST_DESTINATION_URL}}` or the Hookdeck dashboard).
+**Minimum integration depth:** (1) **Topic reconciliation** — every **`topic` in `publish`** appears under **Configured topics** above **or** the operator is told to **add that topic in the Outpost project** (prefer fixing the project to match the domain, not retargeting domain logic to a stale list). (2) **Domain publish** — at least one **`publish` on a real state-change path**, not only a synthetic “test event” route. (3) **Same tenant mapping** everywhere you call Outpost for that customer.
+
+### Existing application (full-stack products)
+
+If the codebase already has **customer-facing UI** (dashboard, settings, integrations) **or** a client that talks to **your** API, operators usually want customers to **manage destinations** (every **destination type** the project enables; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**:
+
+- **Backend:** **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. **Tenant** upsert/sync where it fits, **publish** on real domain events, and **authenticated routes** (BFF, server handlers, server actions—whatever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics—**never** the platform API key).
+- **Frontend:** **Logged-in** clients call **your** backend (session, JWT, existing API client)—**not** Hookdeck’s API directly; **not** the Outpost SDK in the browser. Reuse their design system and routing. **Before** building screens, read **Concepts** and **Building your own UI**: **tenant scope**, **multiple destinations**, **destination = topics + delivery target** (avoid one undifferentiated “webhook” field unless they want that simplification).
+- **Events and retries:** Surface **events** (filter by **destination** when useful) and **attempts** per event; offer **manual retry** for failed attempts (server-side retry API with `event_id` and `destination_id`) after they fix downstream—see **Building your own UI** (default **destination → activity**).
+- **Test publish (recommended when shipping destination UI):** A **separate** control that **publishes a test event** for the signed-in tenant (server-side `publish` to a configured topic). Complementary to domain publishes; **does not replace** a real domain `publish`.
+- **API-only products:** Document how tenants manage destinations via **your** API; keep the platform key on the server.
+
+### What to do
-For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. For **Option 3** with a UI, also read **Building your own UI** before implementing destination-management screens.
+1. **Infer scope** from **Scope** + **If the operator said…** (default **Quick path** when unclear).
+2. **Map language** under **Language → SDK vs HTTP**.
+3. **Execute** the matching section: **Quick path**, **New minimal application**, or **Existing application** (+ **full-stack** subsection when applicable).
+4. Read the **single** language-appropriate quickstart (and OpenAPI for raw HTTP) before coding. For existing apps with UI, read **Building your own UI** before destination-management screens.
### Before you stop (verify)
-Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip the existing-repo items for a standalone script or curl-only flow).
+Apply **only** the items below that fit the task; **skip** any that do not apply (e.g. skip existing-repo items for a standalone script or curl-only flow).
**Always (when you produced or changed runnable code):**
@@ -95,14 +144,14 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
- [ ] **Secrets:** The platform Outpost API key remains **server-side** / **environment** only — not in client bundles, not hard-coded in committed source.
- [ ] **Repeatable:** Env vars, how to run, and how to verify with the test destination above are stated briefly (README, comments, or chat — match the task size; a one-file script may need only inline or chat notes).
-**When editing an existing application repository (Option 3 or equivalent):**
+**When editing an existing application repository (Existing application or equivalent):**
- [ ] **Topic reconciliation:** Every **`topic`** in `publish` is either in **Configured topics** above **or** README/chat tells the operator exactly which topics to **add in Hookdeck**—**domain-first**; do not retarget real features to wrong topic names to match an incomplete **Configured topics** list unless the operator explicitly asked for a minimal demo scope.
- [ ] **Domain publish:** At least one **`publish` on a real application path** (entity create/update, signup, etc.), not solely a synthetic “test event” endpoint—unless the operator explicitly scoped the task to wiring-only.
- [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
- [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
-**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). For **Option 3** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes a customer-facing app—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
+**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **new minimal application**, scaffold and install dependencies as you normally would (`npm` / `npx`, `go mod`, `pip` or `uv`). For **existing** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes customer-facing UI—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
**Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
```
From 33653ddb3f9e03104e918fc4d11bc17b11f0c402 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 12:52:57 +0100
Subject: [PATCH 33/47] fix(api): add DestinationSchemaField.key to OpenAPI
spec
The API and registry metadata always returned key on config_fields and
credential_fields; the published schema omitted it and examples did not
validate against the corrected shape. Align DestinationSchemaField and
embedded destination-types examples with the wire format.
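A minimal TypeScript sketch of what the `key` property enables: a form rendered from `config_fields` and `credential_fields` can place each submitted value under that field's `key` inside `config` or `credentials`. The `collectByKey` helper, the trimmed `DestinationSchemaField` interface, the `user.created` topic, and the exact create-body shape are illustrative assumptions, not the published API; confirm request shapes against OpenAPI.

```typescript
// Trimmed, assumed subset of DestinationSchemaField; label, description,
// sensitive, and validation properties are omitted for brevity.
interface DestinationSchemaField {
  key: string;
  type: "text" | "checkbox" | "key_value_map" | "select";
  required?: boolean;
}

// Hypothetical helper: gathers form values for one field array
// (config_fields or credential_fields) into an object keyed by field.key.
function collectByKey(
  fields: DestinationSchemaField[],
  formValues: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const field of fields) {
    const value = formValues[field.key];
    if (value === undefined || value === "") {
      if (field.required) {
        throw new Error(`missing required field "${field.key}"`);
      }
      continue; // optional and empty: leave it out of the body
    }
    out[field.key] = value;
  }
  return out;
}

// Webhook fields as in the destination-types example in this patch.
const webhookConfigFields: DestinationSchemaField[] = [
  { key: "url", type: "text", required: true },
];
const webhookCredentialFields: DestinationSchemaField[] = [
  { key: "secret", type: "text", required: false },
];

// Assumed create-body shape; the topic name is a placeholder.
const body = {
  type: "webhook",
  topics: ["user.created"],
  config: collectByKey(webhookConfigFields, { url: "https://example.com/hooks" }),
  credentials: collectByKey(webhookCredentialFields, { secret: "" }),
};

console.log(JSON.stringify(body));
```

Keeping config and credentials in separate value maps avoids collisions when both arrays happen to use the same `key`.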
Made-with: Cursor
---
docs/apis/openapi.yaml | 24 +++++++++++++++++++++++-
docs/pages/destinations.mdx | 1 +
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/docs/apis/openapi.yaml b/docs/apis/openapi.yaml
index 8f944fed2..95047557f 100644
--- a/docs/apis/openapi.yaml
+++ b/docs/apis/openapi.yaml
@@ -2008,8 +2008,14 @@ components:
$ref: "#/components/schemas/DestinationSchemaField"
DestinationSchemaField:
type: object
- required: [type, required]
+ required: [type, required, key]
properties:
+ key:
+ type: string
+ description: >-
+ Property name for this value inside the destination `config` or `credentials` object
+ on create/update (for example `url` for a webhook endpoint URL).
+ example: "url"
type:
type: string
enum: [text, checkbox, key_value_map, select]
@@ -3688,6 +3694,7 @@ paths:
instructions: "Enter the URL..."
config_fields: [
{
+ key: "url",
type: "text",
label: "URL",
description: "The URL to send the webhook to.",
@@ -3697,6 +3704,7 @@ paths:
]
credential_fields: [
{
+ key: "secret",
type: "text",
label: "Secret",
description: "Optional signing secret.",
@@ -3711,30 +3719,35 @@ paths:
config_fields:
[
{
+ key: "brokers",
type: "text",
label: "Brokers",
description: "Comma-separated list of Kafka broker addresses.",
required: true,
},
{
+ key: "topic",
type: "text",
label: "Topic",
description: "The Kafka topic to publish messages to.",
required: true,
},
{
+ key: "tls",
type: "checkbox",
label: "TLS",
description: "Enable TLS for the connection.",
default: "true",
},
{
+ key: "partition_key_template",
type: "text",
label: "Partition Key Template",
description: "JMESPath template to extract the partition key from the event payload.",
required: false,
},
{
+ key: "sasl_mechanism",
type: "select",
label: "SASL Mechanism",
description: "SASL authentication mechanism.",
@@ -3749,12 +3762,14 @@ paths:
credential_fields:
[
{
+ key: "username",
type: "text",
label: "Username",
description: "SASL username for authentication.",
required: true,
},
{
+ key: "password",
type: "text",
label: "Password",
description: "SASL password for authentication.",
@@ -3770,12 +3785,14 @@ paths:
config_fields:
[
{
+ key: "queue_url",
type: "text",
label: "Queue URL",
description: "The URL of the SQS queue.",
required: true,
},
{
+ key: "endpoint",
type: "text",
label: "Endpoint",
description: "Optional custom AWS endpoint URL.",
@@ -3785,6 +3802,7 @@ paths:
credential_fields:
[
{
+ key: "key",
type: "text",
label: "Key",
description: "AWS Access Key ID.",
@@ -3792,6 +3810,7 @@ paths:
sensitive: true,
},
{
+ key: "secret",
type: "text",
label: "Secret",
description: "AWS Secret Access Key.",
@@ -3799,6 +3818,7 @@ paths:
sensitive: true,
},
{
+ key: "session",
type: "text",
label: "Session",
description: "Optional AWS Session Token.",
@@ -3843,6 +3863,7 @@ paths:
# remote_setup_url is optional, omitted here
config_fields: [
{
+ key: "url",
type: "text",
label: "URL",
description: "The URL to send the webhook to.",
@@ -3852,6 +3873,7 @@ paths:
]
credential_fields: [
{
+ key: "secret",
type: "text",
label: "Secret",
description: "Optional signing secret.",
diff --git a/docs/pages/destinations.mdx b/docs/pages/destinations.mdx
index 4280108aa..7936153c0 100644
--- a/docs/pages/destinations.mdx
+++ b/docs/pages/destinations.mdx
@@ -59,6 +59,7 @@ For example, for the `webhook` type:
"remote_setup_url": null,
"config_fields": [
{
+ "key": "url",
"type": "text",
"label": "URL",
"description": "The URL to send the event to",
From e7d220964d7c187ac9afc6c4da200a7d0f409dec Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 18:27:50 +0100
Subject: [PATCH 34/47] docs: refine Building your own UI guide and onboarding
agent prompt
Rebalance audience and IA (SDK-first server usage, wire JSON in later sections).
Shorten prompt invariants with links; align with integration checklist.
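As a hedged illustration of the wire-vs-SDK naming issue the guide now covers, this TypeScript sketch accepts a destination-type object in either snake_case wire form (`config_fields`, `credential_fields`, `remote_setup_url`) or a camelCase form (`configFields`, `credentialFields`) and returns one internal shape. `normalizeDestinationType`, `DestinationTypeMeta`, and the camelCase name `remoteSetupUrl` are hypothetical; check your SDK's actual types before relying on them.

```typescript
// Assumed minimal field shape; single-word properties such as key and type
// are spelled the same in snake_case and camelCase.
interface SchemaField {
  key: string;
  type: string;
  label?: string;
  required?: boolean;
}

// Hypothetical internal shape your UI renders from.
interface DestinationTypeMeta {
  type: string;
  configFields: SchemaField[];
  credentialFields: SchemaField[];
  remoteSetupUrl: string | null;
}

function normalizeDestinationType(raw: Record<string, unknown>): DestinationTypeMeta {
  // Prefer the wire (snake_case) name, then fall back to the camelCase variant.
  const pickFields = (wireName: string, camelName: string): SchemaField[] => {
    const value = raw[wireName] ?? raw[camelName];
    if (!Array.isArray(value)) {
      // Fail loudly instead of silently rendering a form with no fields.
      throw new Error(
        `destination type "${String(raw["type"])}" has no ${wireName} / ${camelName} array`,
      );
    }
    return value as SchemaField[];
  };
  const setupUrl = raw["remote_setup_url"] ?? raw["remoteSetupUrl"];
  return {
    type: String(raw["type"]),
    configFields: pickFields("config_fields", "configFields"),
    credentialFields: pickFields("credential_fields", "credentialFields"),
    remoteSetupUrl: typeof setupUrl === "string" ? setupUrl : null,
  };
}

// Same webhook metadata, once as wire JSON and once as camelCase JSON.
const fromWire = normalizeDestinationType({
  type: "webhook",
  config_fields: [{ key: "url", type: "text", required: true }],
  credential_fields: [{ key: "secret", type: "text" }],
  remote_setup_url: null,
});
const fromSdk = normalizeDestinationType({
  type: "webhook",
  configFields: [{ key: "url", type: "text", required: true }],
  credentialFields: [{ key: "secret", type: "text" }],
});

console.log(fromWire.configFields.map((f) => f.key), fromSdk.configFields.map((f) => f.key));
```

Throwing when neither array is present surfaces a shape mismatch immediately, rather than showing up later as a create error for a missing `config.*` key.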
Made-with: Cursor
---
docs/pages/guides/building-your-own-ui.mdx | 37 ++++++++++++++++---
.../hookdeck-outpost-agent-prompt.mdx | 20 +++++++++-
2 files changed, 49 insertions(+), 8 deletions(-)
diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx
index fd8496b76..a2d1cd28c 100644
--- a/docs/pages/guides/building-your-own-ui.mdx
+++ b/docs/pages/guides/building-your-own-ui.mdx
@@ -4,17 +4,21 @@ title: "Building Your Own UI"
While Outpost offers a Tenant User Portal, you may want to build your own UI so your customers can manage their destinations and view delivery activity.
+This page is for **teams shipping that experience**—usually product engineers and anyone designing settings, integrations, or support tooling around webhooks and other destination types. It is framework-agnostic: screens, flows, and how they map to Outpost. If you use an **AI coding assistant** with Hookdeck’s optional [integration prompt](/docs/quickstarts/hookdeck-outpost-agent-prompt), that document carries workflow-specific instructions; this guide stays focused on what your **customers** should see and what your **backend** should enforce.
+
The portal uses the same Outpost API you can call from your product. Its source is a useful reference ([`internal/portal`](https://github.com/hookdeck/outpost/tree/main/internal/portal), React); you are not required to match its stack.
-This guide is framework-agnostic. It describes screens, flows, and how they map to the API. For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
+For paths, query parameters, request and response JSON, status codes, and authentication, use the [OpenAPI specification](/docs/api) as the authoritative contract. If anything here disagrees with OpenAPI, trust the spec.
+
+**Prefer official SDKs on the server** where Hookdeck provides them for your backend language—see the [SDK overview](/docs/sdks) and the **curl**, **TypeScript**, **Python**, or **Go** quickstart in this documentation for runnable examples. The SDKs wrap the same API: less boilerplate, typed clients, and fewer raw HTTP mistakes. Use **OpenAPI** as the contract for **wire JSON** (especially when your browser or BFF returns JSON that should match the HTTP API), for generated clients, or when you integrate from a stack without a first-party SDK.
### Working from OpenAPI
-Each screen should map to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows.
+Map each surface in your product to named operations in the spec (list destinations, create destination, list events, and so on). Use the published schemas for request bodies and list rows, and implement those operations with the **official SDK** on your backend when available.
-Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples.
+Destination type labels, icons, and dynamic form fields come from `GET /destination-types`—specifically `config_fields` and `credential_fields` (see [Destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config)). That response is the source for field keys and types, not guesses from older examples. Each field object includes a **`key`**: the property name inside the destination’s `config` or `credentials` object (for example `url` for a webhook). This is documented on **`DestinationSchemaField`** in [OpenAPI](/docs/api).
-If the browser calls Outpost directly, use the tenant JWT flows documented in OpenAPI. If you proxy through your backend (often called a BFF), your server performs the same operations with your session and injects `tenant_id` where the admin-key flows require it.
+Whether the browser uses a **tenant JWT** or talks only to **your** API, the operations are the ones in OpenAPI; see [Authentication](#authentication) for how credentials and `tenant_id` are applied.
The portal shows full UI code for complex forms; this page avoids long framework-specific snippets so the spec stays the single place for shapes and validation.
@@ -71,6 +75,12 @@ You can issue a tenant JWT for client-side calls to Outpost, or proxy requests t
Proxying is useful when you want to restrict which Outpost features are exposed or to keep the admin key off the client entirely.
+### Browser, your API, and Outpost (BFF pattern)
+
+In a typical **backend-for-frontend** arrangement, the customer’s browser calls **your** product API only. Your servers call Outpost with the **platform** API key and the correct **`tenant_id`** for the signed-in account. Teams refer to this as a **BFF**, an **Outpost proxy**, or a server-side integration layer—the pattern is the same.
+
+The alternative is for the browser to call Outpost **directly** using a short-lived **tenant JWT** ([Generating a JWT Token](#generating-a-jwt-token-optional) below). Many products prefer a proxy so the admin key never ships to the client and so they can limit which Outpost capabilities the UI may invoke.
+
### API base URL (managed and self-hosted)
Use one configurable base URL for Outpost (no trailing slash), for example `API_URL` or `OUTPOST_API_BASE_URL`. Paths in this guide match [OpenAPI](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …).
@@ -106,10 +116,22 @@ Each entry typically includes (confirm names and optionality in OpenAPI):
### Dynamic field shape (for forms)
-Field objects are fully described in OpenAPI. Typically each has `key`, `label`, `type` (text vs checkbox), `required`, optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive` (password-style; values may be masked after create—clear to edit).
+Field objects are fully described in OpenAPI (`DestinationSchemaField`), including **`key`** (where to place the value in `config` / `credentials` on create/update). Each field has `label`, `type` (text vs checkbox vs select vs key-value map), `required`, optional `description`, validation (`minlength`, `maxlength`, `pattern`), `default`, `disabled`, and `sensitive` (password-style; values may be masked after create—clear to edit). On submit, map each value to the **`key`** Outpost expects inside `config` / `credentials`, regardless of how property names were transformed earlier in your stack—see [Wire JSON, SDK responses, and your UI](#wire-json-sdk-responses-and-your-ui).
**Reference:** [DestinationConfigFields.tsx](https://github.com/hookdeck/outpost/blob/main/internal/portal/src/common/DestinationConfigFields/DestinationConfigFields.tsx) maps schema fields to inputs.
+### Wire JSON, SDK responses, and your UI
+
+This section matters whether you use an **official SDK** on the server (recommended when available) or raw HTTP: the **HTTP API** always follows [OpenAPI](/docs/api), while SDKs present language-native types to your backend code.
+
+HTTP responses from Outpost on the wire use JSON property names that match OpenAPI—typically **snake_case** (for example `config_fields`, `credential_fields`, and `remote_setup_url` on `GET /destination-types`).
+
+Official **SDKs** deserialize into language-native structures; names often differ from the wire format (for example TypeScript may expose **camelCase** such as `configFields` and `credentialFields`). Mutations use each SDK’s documented request types, which may not mirror OpenAPI field names literally.
+
+When a **browser** loads destination-type metadata via **your** backend, it receives whatever JSON your server returns. Options include forwarding the **raw** Outpost response body (so the client matches OpenAPI) or translating once on the server and treating that as your product’s contract. In all cases, create and update bodies must still place each value under the schema field’s **`key`** inside `config` and `credentials` as defined in OpenAPI.
+
+**Shape mismatches** between layers often appear as missing dynamic fields or create errors referencing absent `config.*` keys (for example `config.url` for webhooks). Comparing the **actual** JSON your UI receives with the property names your rendering code expects (`config_fields` versus `configFields`, and similar) usually isolates the problem.
+
### Remote setup URL
When `remote_setup_url` is present, you can link users through an external setup flow (for example Hookdeck-managed configuration) instead of only inline fields.
@@ -181,12 +203,15 @@ This section connects what your customers see (what was delivered, what failed,
## Implementation checklists
-These are readiness checks: they do not replace the tables above or OpenAPI. Use them to confirm nothing important was skipped before ship or when reviewing an implementation.
+Use these lists before launch, in design or code review, or when comparing your tenant experience to the patterns above. They do not replace OpenAPI, security review, or testing against your deployment.
+
+For **customer-facing** destination and delivery UI, work through **Planning and contract**, **Destinations experience**, and **Activity, attempts, and retries** at minimum. Skip rows that clearly do not apply (for example, if you only expose destinations through your own API and have no in-app activity screens—document how customers verify delivery instead).
### Planning and contract
- [ ] Every call is scoped to the correct tenant (`tenant_id` on admin-key routes, or tenant inferred from JWT).
- [ ] Outpost base URL comes from configuration or environment for dev, staging, and production (not a single hardcoded host in app code).
+- [ ] Server-side Outpost calls use an **official SDK** when Hookdeck ships one for your language; raw HTTP or generated OpenAPI clients are fine when they fit better.
- [ ] You chose an auth approach (browser JWT, server-side proxy/BFF, or mix) and use the matching OpenAPI operations and headers consistently.
- [ ] Dynamic destination UI (labels, icons, form fields) is driven by `GET /destination-types`, not copied field lists from examples.
diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
index 875bd739b..f8551a81e 100644
--- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
+++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx
@@ -46,6 +46,8 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `
- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}}
- SDK overview: {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs).
+**SDK vs OpenAPI (BFF / dashboard UI):** **Prefer the official server SDK** when Hookdeck provides one for the repo’s backend language (**{{DOCS_URL}}/sdks**). Keep these invariants: (1) **Wire JSON** matches **OpenAPI** (often **snake_case**). **SDKs** rename fields in language types (e.g. TypeScript **camelCase**). (2) The **browser** should consume the same JSON shape your BFF actually returns—or the server should **normalize** (e.g. forward raw `GET /destination-types`). (3) On create/update, each schema field’s **`key`** maps into `config` / `credentials` per OpenAPI. **Calling** Outpost: use **SDK** types when using the SDK; use **OpenAPI** for raw `fetch` / curl. Detail: **{{DOCS_URL}}/guides/building-your-own-ui#authentication** and **{{DOCS_URL}}/guides/building-your-own-ui#wire-json-sdk-responses-and-your-ui**.
+
**When you build customer-facing UI or integrate into an existing product (not for quick path only):**
- **Building your own UI — screen structure and flow** (list destinations—**any type**; create: choose **type** → topics → type-specific config; **events** / **attempts** / **manual retry**; tenant scope; default **destination → activity**): {{DOCS_URL}}/guides/building-your-own-ui
@@ -121,7 +123,7 @@ Use the **official SDK for the repo’s backend language** on the **server** (or
If the codebase already has **customer-facing UI** (dashboard, settings, integrations) **or** a client that talks to **your** API, operators usually want customers to **manage destinations** (every **destination type** the project enables; see **{{DOCS_URL}}/destinations** and **`GET /destination-types`** in OpenAPI) **inside the product**:
-- **Backend:** **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. **Tenant** upsert/sync where it fits, **publish** on real domain events, and **authenticated routes** (BFF, server handlers, server actions—whatever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics—**never** the platform API key).
+- **Backend:** **`OUTPOST_API_KEY`** and all Outpost SDK usage **server-side only**. **Tenant** upsert/sync where it fits, **publish** on real domain events, and **authenticated routes** (backend-for-frontend / BFF, server handlers, server actions—whatever matches **their** stack) to list/create/update/delete destinations for the **signed-in customer’s** tenant. Handlers call Outpost with platform credentials; responses expose only what the customer should see (ids, targets, topics—**never** the platform API key).
- **Frontend:** **Logged-in** clients call **your** backend (session, JWT, existing API client)—**not** Hookdeck’s API directly; **not** the Outpost SDK in the browser. Reuse their design system and routing. **Before** building screens, read **Concepts** and **Building your own UI**: **tenant scope**, **multiple destinations**, **destination = topics + delivery target** (avoid one undifferentiated “webhook” field unless they want that simplification).
- **Events and retries:** Surface **events** (filter by **destination** when useful) and **attempts** per event; offer **manual retry** for failed attempts (server-side retry API with `event_id` and `destination_id`) after they fix downstream—see **Building your own UI** (default **destination → activity**).
- **Test publish (recommended when shipping destination UI):** A **separate** control that **publishes a test event** for the signed-in tenant (server-side `publish` to a configured topic). Complementary to domain publishes; **does not replace** a real domain `publish`.
@@ -151,9 +153,13 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
- [ ] **Test publish (if you added one):** Kept as a **separate** control from domain logic; does not satisfy the domain-publish item by itself.
- [ ] **Build integrity:** Generated outputs, route or module registries, and dependency lockfiles are **consistent** with new or edited source so a **clean** install + typecheck or build (or the repo’s documented CI step) would pass.
+**When you added or changed customer-facing destination management in an existing full-stack product** (dashboard, settings, or integrations UI—per **Existing application (full-stack products)** above):
+
+- [ ] **Full-stack UI bar:** Walked **Planning and contract**, **Destinations experience**, and **Activity, attempts, and retries** in **{{DOCS_URL}}/guides/building-your-own-ui#implementation-checklists** and confirmed the implementation matches: list rows reach **detail** and **destination-scoped activity** (events → attempts → manual retry as appropriate), **dynamic** create (and edit if you expose it) is driven by **`GET /destination-types`** (including each field’s **`key`** in `config` / `credentials`), and a **separate server-side test publish** control exists when customers can manage destinations. *Skip this item if the product is **API-only** (no customer UI for destinations) or the operator explicitly excluded activity / test UI—then document verification instead (README, Outpost dashboard, or curl to list events/attempts).*
+
**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **new minimal application**, scaffold and install dependencies as you normally would (`npm` / `npx`, `go mod`, `pip` or `uv`). For **existing** full-stack products, change both **backend and frontend** (or equivalent UI layer) when the repo already includes customer-facing UI—do not stop at OpenAPI-only unless the product is genuinely API-only or the operator asks to skip UI work.
-**Concepts:** Each **tenant** is one of the platform’s customers (an org/account you sell to). A tenant has **zero or more destinations**; each **destination** is a **subscription**—a **destination type** (webhook, queue, Hookdeck, …) plus **which topics** to receive and **where** to deliver (type-specific: URL, queue name, etc.). Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Topic names should reflect **your product’s events**; **`user.*`** usually means **users inside that tenant’s account**, not your company’s internal operators. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard.
+**Concepts:** Read **{{DOCS_URL}}/concepts** for tenants, destinations as subscriptions, topics, and how **publish** fans out. Use **{{DOCS_URL}}/guides/building-your-own-ui** for recommended screens and implementation checklists. **Configured topics** above lists this project’s topic names (dashboard); **`user.*`** naming semantics are explained under **Configured topics** in this prompt.
```
## Placeholder reference
@@ -166,6 +172,16 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
| `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
+### Building your own UI — where the detail lives
+
+Product guidance is consolidated in **[Building your own UI](/docs/guides/building-your-own-ui)**:
+
+- **[Implementation checklists](/docs/guides/building-your-own-ui#implementation-checklists)** — ship/review rows for destinations and activity (referenced from **Before you stop (verify)** in the template above; not duplicated here).
+- **[Authentication](/docs/guides/building-your-own-ui#authentication)** — browser vs your API vs Outpost (**BFF** pattern) and JWT option.
+- **[Wire JSON, SDK responses, and your UI](/docs/guides/building-your-own-ui#wire-json-sdk-responses-and-your-ui)** — snake_case wire vs SDK names, `key` in `config` / `credentials`, shape mismatches.
+
+That page is written for **teams integrating Outpost** (engineers, PMs, reviewers). **Agent evaluation** in the Outpost repository (`docs/agent-evaluation/scenarios/`, scenarios **8–10** for existing-app baselines) uses the same implementation checklist when a run includes **customer-facing** destination UI—see each scenario’s success criteria for links.
+
## Operator checklist (dashboard UI)
- Show **API base URL** and **topics** next to the copyable prompt.
From 8f240ec1ce3293e3b428fe7a553d317b3316818c Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 18:27:58 +0100
Subject: [PATCH 35/47] docs(eval): tighten scenarios 08–10 and transcript heuristics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Stricter success criteria with guide/prompt references; align placeholders.
Add heuristic checks for activity and test-publish signals where applicable.
Made-with: Cursor
---
.../fixtures/placeholder-values-for-turn0.md | 8 ++---
.../scenarios/08-integrate-nextjs-existing.md | 12 +++++--
.../09-integrate-fastapi-existing.md | 10 +++---
.../scenarios/10-integrate-go-existing.md | 10 ++++--
docs/agent-evaluation/src/score-transcript.ts | 34 +++++++++++++++++++
5 files changed, 61 insertions(+), 13 deletions(-)
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index 2336f6352..f17f94ce6 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -2,11 +2,11 @@
The **prompt template itself** lives in one place only:
-**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
+**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
-Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
+Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
-For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner only needs **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. To score a **full** eval (generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links.
+For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner only needs **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. To score a **full** eval (generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links.
---
@@ -26,4 +26,4 @@ For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), t
## Dashboard implementation note
-When this text is embedded in the Hookdeck product, the **same** template body should be rendered from one dashboard/backend source so docs and product stay aligned. The MDX page in this repo is the documentation **canonical** copy until product source is wired to match it.
+When this text is embedded in the Hookdeck product, the **same** template body should be rendered from one dashboard/backend source so docs and product stay aligned. The MDX page in this repo is the documentation **canonical** copy until product source is wired to match it.
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
index 9471a654c..74ad08253 100644
--- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
+++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md
@@ -54,17 +54,23 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa
**Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual.
+**Contract:** The baseline ships a **customer-facing dashboard**. Treat it like **Existing application (full-stack products)** in [`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). The detailed UI bar is **not** repeated here—use **[Building your own UI — Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists)** (*Planning and contract*, *Destinations experience*, *Activity, attempts, and retries*). The agent must self-verify with **Before you stop (verify)** in the same prompt (full-stack UI item).
+
- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork): harness clone under the run directory plus install / integration steps reflected in the transcript or that tree.
- **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key.
- **Topic reconciliation:** README or inline notes map **each `publish` topic** to a **real domain event**; if the app needs topics not in the **configured project list** from onboarding, instructions say to **add them in Hookdeck** (domain-first—not reshaping product logic to fit a stale default list unless wiring-only scope was agreed).
-- At least one **publish** on a **real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route. A separate test publish for wiring checks is fine but does **not** replace this.
-- **Per-customer webhook** story is explained: destination creation / subscription to topic; **tenant ↔ customer** mapping is consistent for publish and destination APIs.
+- **Domain publish:** At least one **`publish` on a real domain path** (signup, CRUD, billing, etc.)—**not** only a synthetic “test event” route.
+- **Separate test publish:** A **distinct** server-side control (button, action, or route) that publishes a **test** event for the signed-in tenant—**in addition to** domain publish; does **not** satisfy the domain-publish requirement by itself (see prompt).
+- **Full-stack destination + activity UI:** Customers can **drill into** a destination (detail or edit—per product policy), reach **destination-scoped activity** (events / attempts / manual retry for failures) via **your** authenticated routes, and **create** destinations using **dynamic** fields from **`GET /destination-types`** (each field’s **`key`** → `config` / `credentials`). **List rows** link or navigate into that flow—not **only** create + delete with no detail or activity. Omit sub-items only if Turn 1 explicitly scoped **backend-only** or excluded activity UI (then document how operators verify delivery instead).
+- **Per-customer webhook** story: **tenant ↔ customer** mapping is consistent for publish and destination APIs.
- README (or equivalent) lists **env vars** for Outpost.
-- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Optionally also run a test publish. Smoke from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
+- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; perform a **real in-app action** that triggers the domain publish and confirm Outpost accepts it (2xx/202). Exercise **test publish** and **activity / retry** in the UI when present. Smoke from **`results/runs/…-scenario-08/next-saas-starter/`** (not transcript-only triage).
## Failure modes to note
- Pasting a greenfield Next app instead of integrating the **baseline** in the workspace.
+- **List-only** destinations (no drill-down to detail or destination-scoped activity) while the baseline still has a product dashboard—unless the user explicitly scoped backend-only.
+- **No separate test publish** when customers can manage destinations from the UI.
- Publishing only from a demo or **test-only** route with no domain path.
- **Topics** in code with no README telling the operator to **add** them in Hookdeck when the onboarding topic list was incomplete (or silently retargeting domain logic to unrelated configured names).
- Calling Outpost from client components with secrets.
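Scenario 08 requires the Outpost API key to stay server-side, with no `NEXT_PUBLIC_*` variant (scenario 09 also rules out `VITE_*`). Below is a minimal TypeScript sketch of that rule. It assumes a plain env record. The helper names (`isClientVisibleEnvName`, `findLeakyOutpostKeyNames`, `readOutpostApiKey`) are hypothetical and are not part of the Outpost SDK or the eval harness.

```typescript
// Hypothetical helpers, not part of the Outpost SDK or the eval harness.
// Prefixes that Next.js and Vite expose to browser code.
const CLIENT_VISIBLE_PREFIXES: readonly string[] = ["NEXT_PUBLIC_", "VITE_"];

// True when a bundler would expose this variable to client bundles.
function isClientVisibleEnvName(name: string): boolean {
  return CLIENT_VISIBLE_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Names that look like an Outpost key but sit under a client-visible prefix,
// e.g. NEXT_PUBLIC_OUTPOST_API_KEY.
function findLeakyOutpostKeyNames(names: readonly string[]): string[] {
  return names.filter(
    (name) => isClientVisibleEnvName(name) && /OUTPOST.*KEY/i.test(name),
  );
}

// Server-side read: fail loudly instead of falling back to a public variable.
function readOutpostApiKey(env: Record<string, string | undefined>): string {
  const key = env["OUTPOST_API_KEY"];
  if (!key) {
    throw new Error("OUTPOST_API_KEY is not set (server-side env only)");
  }
  return key;
}
```

A check like this could run at server startup or in a lint step. It only inspects variable names, so it cannot catch a key that is hard-coded in source.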
diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
index c24787d0a..bd171fb3c 100644
--- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
+++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md
@@ -59,15 +59,16 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge (reads this section); execution manual.
+**Contract:** Same full-stack bar as scenario **8**, pinned to this template. **Canonical checklist:** [Building your own UI — Implementation checklists](../../pages/guides/building-your-own-ui.mdx#implementation-checklists). **Agent self-verify:** [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) → *Before you stop (verify)* (full-stack UI item). Do not duplicate checklist rows in transcripts—confirm against the guide.
+
- **full-stack-fastapi-template** (or documented alternative) present via harness **`preSteps`** with install steps in the transcript or tree.
- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path in the **backend** (server-side only for secrets)—**not** only a synthetic test-publish endpoint unless the scenario was explicitly scoped to wiring-only.
-- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **also** expected for this baseline so operators can smoke-test wiring without waiting on production traffic—it **does not** replace the domain publish requirement.
+- **Domain + test publish:** At least one **`publish` on a real domain path** (entity create/update, signup, etc.). A **separate** test-publish path or control is **required** for this baseline—it **does not** replace the domain publish requirement.
- API key from **environment** or secure backend settings only — not hard-coded, not exposed via **`NEXT_PUBLIC_*`**, **`VITE_*`**, or other client-visible env patterns.
- **Topic reconciliation:** each **`topic` in code** ties to a real domain event; gaps vs the **configured project topic list** from onboarding are resolved by **adding topics in Hookdeck** (documented), not by retargeting domain logic to a mismatched list unless wiring-only scope was agreed.
-- **Destinations + tenant:** Per-customer (or per-team) **destination** management is **documented** and, where this template ships a dashboard, implemented with **safe** UI or BFF routes (list/create/edit as appropriate). **`tenant_id`** (or equivalent) is consistent between publish and destination APIs.
-- **Delivery visibility (full-stack bar):** Because this baseline includes a **customer-facing UI**, the product should expose **event activity** aligned with [Building your own UI](../../pages/guides/building-your-own-ui.mdx): customers can see **events** (e.g. filterable by destination), **attempts** for a selected event, and **manual retry** for failed deliveries—all via **your** authenticated backend calling Outpost (admin key server-side), not from the browser with the platform key. Omit only if the user explicitly scoped the task to **backend-only** or excluded activity UI.
+- **Destinations + tenant:** Per-customer (or per-team) **destination** management via **authenticated** UI or BFF routes: **list**, **create**, and **drill-down** (detail and **destination-scoped activity**—events, attempts, **manual retry**). **Dynamic** forms from **`GET /destination-types`** with correct **`key`** → `config` / `credentials`. **`tenant_id`** is consistent between publish and destination APIs. Omit drill-down / activity only if Turn 1 scoped **backend-only** or excluded activity UI (document verification instead).
- **Operator docs:** Root **README**, **backend/README**, **development.md**, or **`.env.example`** (whichever the template uses) lists **Outpost env vars** and how to run and verify.
-- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Optionally exercise test publish and activity/retry in the UI. *Skip for transcript-only.*
+- **Execution (full pass):** Stack runs per template docs; trigger a **real domain action** that fires publish; Outpost accepts. Exercise **test publish** and **activity / retry** in the UI when in scope. *Skip for transcript-only.*
## Failure modes to note
@@ -76,6 +77,7 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
- Putting `OUTPOST_API_KEY` in `NEXT_PUBLIC_*`, `VITE_*`, or other client bundles.
- **Only** test/synthetic publish with no domain hook, or **only** domain publish with no **separate** test-publish control when a dashboard is in scope.
- **No** events/attempts/retry surfaced for customers when the baseline includes a product UI and the user did not ask to skip that scope.
+- **Flat list** of destinations with no navigation to **detail** or **per-destination activity** (same as scenario 8 failure mode).
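Scenarios 08 and 09 both require create forms built dynamically from `GET /destination-types`, with each field's `key` mapped into `config` or `credentials`. The sketch below shows that mapping. The field and payload shapes are assumptions drawn from these docs (snake_case `config_fields` / `credential_fields`, a `key` on each field, and a `{ type, topics, config, credentials }` body), so check the API reference for the exact schema. `buildCreateDestinationBody` is a hypothetical helper name.

```typescript
// Assumed minimal shapes; other schema properties (label, type, ...) omitted.
interface SchemaField {
  key: string;
  required?: boolean;
}

interface DestinationTypeSchema {
  type: string; // e.g. "webhook"
  config_fields: SchemaField[];
  credential_fields: SchemaField[];
}

// Assumed create-destination body; verify against the API reference.
interface CreateDestinationBody {
  type: string;
  topics: string[];
  config: Record<string, string>;
  credentials: Record<string, string>;
}

// Builds the payload from form values keyed by each field's `key`
// (never by its display label), failing fast on missing required fields.
function buildCreateDestinationBody(
  schema: DestinationTypeSchema,
  topics: string[],
  formValues: Record<string, string>,
): CreateDestinationBody {
  const pick = (fields: SchemaField[]): Record<string, string> => {
    const out: Record<string, string> = {};
    for (const field of fields) {
      const value = formValues[field.key];
      if (value === undefined || value === "") {
        if (field.required) {
          throw new Error(`Missing required field: ${field.key}`);
        }
        continue; // optional and empty: leave it out of the payload
      }
      out[field.key] = value;
    }
    return out;
  };
  return {
    type: schema.type,
    topics,
    config: pick(schema.config_fields),
    credentials: pick(schema.credential_fields),
  };
}
```

Keying by `key` rather than by the display label is the important choice. Label text can change or be localized, while `key` is the name the API uses in `config` and `credentials`.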
## Future baselines
diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
index 7daab8da6..c9ab15366 100644
--- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
+++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md
@@ -51,14 +51,20 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu
**Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual.
+**Contract:** This baseline is an **API-first** Go service (no first-party customer dashboard in the pin). It does **not** inherit the full **[Building your own UI](../../pages/guides/building-your-own-ui.mdx)** dashboard checklist wholesale—agents follow **[Existing application](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx#existing-application)** (minimum integration depth) plus **API-only** guidance in **Existing application (full-stack products)** (*Document how tenants manage destinations via **your** API*). If a future pin adds a UI, scenarios should be updated to require the **Implementation checklists** linked above.
+
- **startersaas-go-api** (or documented alternative) present via harness **`preSteps`** with build instructions attempted in the transcript or tree.
- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path—not only a test-only route unless wiring-only scope was agreed.
- No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only.
-- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented) + **destination** documentation for operators; **tenant** mapping consistent.
-- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. *Skip for transcript-only.*
+- **Topic reconciliation** (domain-first; operator adds missing Hookdeck topics as documented); **tenant** mapping consistent everywhere Outpost is called.
+- **Customer webhook registration:** At least one **concrete** story—**implemented** authenticated route(s) and/or **OpenAPI/README**—for how a customer **creates or updates** a webhook destination (URL + topics) for their tenant. Prefer real **`Destinations.Create`** (or update) calls over prose-only if the Turn 1 story asks where destination creation lives.
+- **Test / verify delivery:** A **separate** mechanism from domain publish: e.g. documented **`curl`** + test receiver URL, a **small admin/test publish** endpoint, or README steps to trigger a test event—so operators can prove end-to-end delivery without relying solely on production traffic. Domain publish remains **required**; test-only wiring does **not** replace it (see prompt *Before you stop*).
+- **Execution (full pass):** Server runs; trigger the **domain** handler; Outpost accepts publish. Optionally exercise documented test publish / destination registration. *Skip for transcript-only.*
## Failure modes to note
- New `main.go` only, without using the **cloned** baseline’s routes/models.
- Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations.
- Publish only from a **test** helper with no real handler path.
+- **Vague** “customers paste a URL somewhere” with no API contract, handler, or README steps for destination creation when the conversation asked for it.
+- **No** operator-facing way to smoke-test delivery (test publish or documented curl) when README promises outbound webhooks.
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index 73bc8d5c8..b3c4df2c9 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -810,6 +810,28 @@ function scoreScenario08(corpus: string, assistant: string): TranscriptScore {
: "Expected domain publish (not only publish-test / send test) — see scenario Success criteria",
});
+ const fullStackSignals =
+ /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i.test(
+ t,
+ ) && /(outpost|destination|tenant)/i.test(t);
+ checks.push({
+ id: "delivery_activity_signals",
+ pass: fullStackSignals,
+ detail: fullStackSignals
+ ? "Transcript mentions delivery visibility (attempts/events/retry/activity) with Outpost context"
+ : "Scenario 8 expects destination-scoped activity UI — see Building your own UI checklists + success criteria",
+ });
+
+ const testPublishSeparate =
+ /(test\s*publish|publish\s*test|send\s*test\s*event|\/api\/.*test|test.?event)/i.test(t);
+ checks.push({
+ id: "separate_test_publish_signal",
+ pass: testPublishSeparate,
+ detail: testPublishSeparate
+ ? "Separate test publish / test event control mentioned"
+ : "Expected distinct test-publish path or control (see scenario 8 success criteria)",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
@@ -909,6 +931,18 @@ function scoreScenario09(corpus: string, assistant: string): TranscriptScore {
: "Expected operator docs listing OUTPOST env vars (see scenario Success criteria)",
});
+ const fullStackSignals09 =
+ /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i.test(
+ t,
+ ) && /(outpost|destination|tenant)/i.test(t);
+ checks.push({
+ id: "delivery_activity_signals",
+ pass: fullStackSignals09,
+ detail: fullStackSignals09
+ ? "Transcript mentions delivery visibility (attempts/events/retry/activity) with Outpost context"
+ : "Scenario 9 expects full-stack activity UI — see Building your own UI checklists + success criteria",
+ });
+
checks.push({
id: "no_key_in_reply",
pass: !containsLikelyLeakedKey(assistant),
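The new scorer checks above are keyword heuristics. A small harness makes it easier to see what they accept. The regular expressions are copied from the diff. The wrapper functions and sample strings are illustrative only, because the scorer's `t` is derived from the transcript in code that this patch does not show.

```typescript
// Patterns copied from the scoreScenario08 / scoreScenario09 diff.
const DELIVERY_ACTIVITY_RE =
  /(attempt|retry|list\s*attempt|destination[_-]?scoped|\/activity|\/attempts|events?\s*\(|list\s*events|manual\s*retry)/i;
const OUTPOST_CONTEXT_RE = /(outpost|destination|tenant)/i;
const TEST_PUBLISH_RE =
  /(test\s*publish|publish\s*test|send\s*test\s*event|\/api\/.*test|test.?event)/i;

// Mirrors `delivery_activity_signals`: an activity keyword plus Outpost context.
function deliveryActivitySignal(t: string): boolean {
  return DELIVERY_ACTIVITY_RE.test(t) && OUTPOST_CONTEXT_RE.test(t);
}

// Mirrors `separate_test_publish_signal` (scenario 08 only).
function separateTestPublishSignal(t: string): boolean {
  return TEST_PUBLISH_RE.test(t);
}

// Illustrative transcript fragments, not taken from real runs.
const samples: string[] = [
  "Added GET /api/outpost/attempts and a manual retry button per destination",
  "Destinations list with create and delete only",
  "Will retry the npm install for this tenant",
  "Added a Send test event button for the tenant",
];

for (const text of samples) {
  console.log(`${deliveryActivitySignal(text)}\t${separateTestPublishSignal(text)}\t${text}`);
}
```

The third sample shows the trade-off: any mention of `retry` plus `tenant` satisfies `delivery_activity_signals`. For that reason, the LLM judge and the execution check carry more weight than this signal alone. Scenarios 08 and 09 use the identical activity pattern, so hoisting it into one shared constant (as this sketch does) would keep the two scorers from drifting apart.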
From 2031762e50cb067d0807ff9cfa79cf87b302f200 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 18:28:05 +0100
Subject: [PATCH 36/47] docs(eval): document wall time for heavy baseline
scenarios
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Explain 08–10 clone/install cost, sparse console output, and operator knobs
(Ctrl+C, EVAL_SKIP_HARNESS_PRE_STEPS, EVAL_MAX_TURNS, --no-score-llm).
Mirror a short note in run-agent-eval --help output.
Made-with: Cursor
---
docs/agent-evaluation/README.md | 11 +++++++++++
docs/agent-evaluation/src/run-agent-eval.ts | 7 ++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
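This patch documents two operator knobs and one notice condition. The sketch below restates them as plain TypeScript for readers who wire similar checks. It is hypothetical and is not the actual `run-agent-eval.ts` code. It assumes that `EVAL_MAX_TURNS` falls back to the documented default of 80 when unset or invalid. It also assumes that only the literal value `1` (the value the README shows) enables `EVAL_SKIP_HARNESS_PRE_STEPS`.

```typescript
// Hypothetical sketch; not the actual run-agent-eval.ts implementation.
const DEFAULT_MAX_TURNS = 80; // documented default for EVAL_MAX_TURNS
const HEAVY_BASELINE_SCENARIOS: ReadonlySet<string> = new Set(["08", "09", "10"]);

interface EvalKnobs {
  maxTurns: number;
  skipHarnessPreSteps: boolean;
}

function readEvalKnobs(env: Record<string, string | undefined>): EvalKnobs {
  const raw = env["EVAL_MAX_TURNS"];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return {
    // Fall back to the default when unset, non-numeric, or not positive.
    maxTurns: Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_TURNS,
    // Assumption: only the literal "1" enables skipping the harness git_clone.
    skipHarnessPreSteps: env["EVAL_SKIP_HARNESS_PRE_STEPS"] === "1",
  };
}

// Scenarios that clone a full baseline and install dependencies (long wall time).
function isHeavyBaseline(scenarioId: string): boolean {
  return HEAVY_BASELINE_SCENARIOS.has(scenarioId);
}
```

If more heavy baselines are added later, a `Set` lookup like `isHeavyBaseline` is one option. The patch itself uses an explicit `===` chain on `scenarioIdEarly`.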
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 94246c975..5dfb9330c 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -55,6 +55,17 @@ npm run eval -- --dry-run
The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`). Shell exports still override `.env` if both are set.
+### Wall time (scenarios **08–10** and other heavy baselines)
+
+Scenarios that **`git clone`** a full SaaS template and run **`npm` / `pnpm` / `docker compose`** installs are **slow by design**. Expect **roughly 30–90+ minutes** of wall time for a single run of **08**, **09**, or **10** (clone + install + several agent turns). The harness prints little to the terminal until **`transcript.json`** is written at the end, so a healthy run can look hung.
+
+- **Stop early:** **Ctrl+C** (**SIGINT**) in the terminal running `npm run eval`. The runner writes **`*-scenario-NN.eval-aborted.json`** next to the run folder (see **Harness sidecars** at the top of this file).
+- **Skip re-clone:** If the baseline is already under the run directory, **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** skips **`git_clone`** from the scenario harness (see each scenario’s **`## Eval harness`** block).
+- **Cap agent length (smoke only):** **`EVAL_MAX_TURNS`** (default **80**) limits SDK turns; lowering it may end the run sooner but often **fails** the integration before success criteria are met—use for debugging, not a real pass.
+- **Save judge time only:** **`--no-score-llm`** skips the Success-criteria LLM judge at the end (saves a few minutes; you lose that rubric).
+
+For **fast** automated signal in CI, use **`eval:ci`** (**01** + **02** only)—not **08**.
+
### CI (recommended slice)
For **pull-request or main-branch** automation, run **two** scenarios only:
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index 3c34c7d24..ba1129170 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -690,7 +690,7 @@ Environment:
EVAL_LLMS_FULL_URL Optional (omit docs line if unset)
EVAL_TOOLS Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
EVAL_MODEL Optional
- EVAL_MAX_TURNS Optional (default: 80; npm/go mod installs can exceed 40)
+ EVAL_MAX_TURNS Optional (default: 80; npm/go mod installs can exceed 40; lower only for smoke — may not finish 08–10)
EVAL_PERMISSION_MODE Optional (default: dontAsk)
EVAL_PERSIST_SESSION Set to "false" to disable session persistence (breaks multi-turn resume)
EVAL_DISABLE_WORKSPACE_WRITE_GUARD Set to 1 to allow Write/Edit outside the run dir (not recommended)
@@ -798,6 +798,11 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
const turn0Prompt =
filledTemplate + buildWorkspaceBoundaryAppendix(runDir, agentCwd, REPO_ROOT, localDocs);
console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
+ if (scenarioIdEarly === "08" || scenarioIdEarly === "09" || scenarioIdEarly === "10") {
+ console.error(
+ "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). See README § Wall time.",
+ );
+ }
const sidecars = harnessSidecarPaths(runDir);
activeHarnessAbortContext = { path: sidecars.aborted, runDirectory: runDir };
From 4186ca3ddb2ff666cb1e8c5c17249d67757c22d7 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 18:28:11 +0100
Subject: [PATCH 37/47] docs(eval): update scenario run tracker for scenario 08
Record primary run 2026-04-10T14-29-04-214Z-scenario-08, heuristic 10/10,
execution pass, and execution notes (seed/dev, schema key vs SDK).
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 39 ++++++++++++-------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index c043acaa7..7c789f207 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -18,20 +18,31 @@ Use this table while you **run scenarios one at a time** and **execute the gener
## Tracker
-| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
-| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: `**quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
-| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail). |
-| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`. |
-| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
-| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
-| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
-| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
-| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-09T14-48-16-906Z-scenario-08` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**## Eval harness`** pre-clone + `**agent cwd`** = `next-saas-starter/` under run dir; artifact colocated (`app/api/outpost/`**, dashboard webhooks, `@hookdeck/outpost-sdk`). **Execution:** `npx tsc --noEmit` in `…/next-saas-starter/` — **exit 0**. Eval ~13 min wall time. Earlier run `2026-04-09T11-08-32-505Z-scenario-08`: work had landed outside run dir (no app tree in folder). |
-| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
+| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: **`quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
+| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail). |
+| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: **`outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`. |
+| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
+| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | **`nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
+| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
+| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
+| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Harness **`next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
+| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
+| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+### Scenario 08 — execution notes (`2026-04-10T14-29-04-214Z-scenario-08`)
+
+**Execution:** **Pass** — operator QA on **`next-saas-starter/`** (artifact **not** committed; run folder under `results/runs/` is gitignored).
+
+Reproducibility / gotchas:
+
+- **`pnpm db:migrate`** — succeeds against local Postgres when `POSTGRES_URL` is set (see the cloned repo’s `README.md`).
+- **`pnpm db:seed`** — as generated, importing `stripe` from **`lib/payments/stripe.ts`** pulls in Outpost and **`server-only`**, which throws when the seed script runs under **`tsx`** (not the Next server). Common **local** fix: instantiate **`Stripe`** directly in **`lib/db/seed.ts`** with the same **`apiVersion`** as the payments module so seed does not load that file. Requires valid **Stripe** keys in `.env`. Re-running seed after a successful run fails on duplicate **`test@test.com`** (expected).
+- **`pnpm dev`** — if another **`next dev`** already holds **`.next/dev/lock`** for this tree, stop it or remove the lock; port **3000** may be taken (Next picks another port). Turbopack may warn about multiple lockfiles when the app sits under the monorepo; see Next’s **`turbopack.root`** guidance if needed.
+- **Destination schema `key`** — the API returns `key` on each schema field, but older SDK builds may drop it during parsing, so a UI that falls back to keying fields by label sends broken create-destination payloads. Regenerating the SDKs fixes this; until then, a BFF that fetches the raw API response and maps fields by `key` keeps the UI aligned with the API.
+
### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
Work applied **after** the agent transcript so the FastAPI + React artifact matches current integration guidance (eval honesty + local execution). The template tree under `results/runs/-scenario-09/` is **not committed** (see `results/.gitignore`); repo **docs** and **prompt** updates that back this scenario **are** in git.
@@ -40,12 +51,12 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
- **TanStack Router:** `frontend/src/routeTree.gen.ts` — register `/_layout/webhooks` (agent added the route file but not the generated tree).
- **API base URL:** webhooks page used browser-relative `/api/...` against nginx; switched to backend base (`OpenAPI.BASE` / `VITE_API_URL`).
-- **Destination types:** Outpost JSON uses `**type`** and `**icon**` (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
+- **Destination types:** Outpost JSON uses **`type`** and **`icon`** (not `id` / `svg`); fixed controlled radios / **Next** in the create wizard.
**Backend**
- `**POST /api/v1/webhooks/publish-test`** — synthetic `publish` for integration testing.
-- `**GET /api/v1/webhooks/events**`, `**GET /api/v1/webhooks/attempts**`, `**POST /api/v1/webhooks/retry**` — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
+- **`GET /api/v1/webhooks/events`**, **`GET /api/v1/webhooks/attempts`**, **`POST /api/v1/webhooks/retry`** — BFF proxies for tenant-scoped **events list**, **attempts**, and **manual retry** (admin key server-side).
**Dashboard UI (webhooks page)**
From c83d43d14895d9c0d21c1b41e2919b422198ca88 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 19:16:05 +0100
Subject: [PATCH 38/47] docs: drop destinations overview hub; clarify OSS
hosting in concepts
- Remove redundant destinations/overview.mdoc; link to overview#supported-destinations
from quickstarts, building-your-own-ui, nav, and redirects (/destinations)
- Document MAX_DESTINATIONS_PER_TENANT and DESTINATIONS_METADATA_PATH under
self-hosting configuration
- Concepts: note that Hookdeck hosts the same open-source Outpost; direct all feature requests to GitHub
- Ignore docs/dist and docs/TEMP-*.md; remove temp onboarding status file
Made-with: Cursor
---
.gitignore | 1 +
...TEMP-hookdeck-outpost-onboarding-status.md | 101 ------------------
docs/content/concepts.mdoc | 8 +-
docs/content/destinations/overview.mdoc | 99 -----------------
docs/content/guides/building-your-own-ui.mdoc | 4 +-
docs/content/nav.json | 1 -
.../quickstarts/hookdeck-outpost-curl.mdoc | 2 +-
.../quickstarts/hookdeck-outpost-go.mdoc | 2 +-
.../quickstarts/hookdeck-outpost-python.mdoc | 2 +-
.../hookdeck-outpost-typescript.mdoc | 2 +-
docs/content/redirects.json | 2 +-
docs/content/self-hosting/configuration.mdoc | 7 ++
12 files changed, 20 insertions(+), 211 deletions(-)
delete mode 100644 docs/TEMP-hookdeck-outpost-onboarding-status.md
delete mode 100644 docs/content/destinations/overview.mdoc
diff --git a/.gitignore b/.gitignore
index 3ba3f42d2..64578dcf3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,7 @@
# Documentation (local build artifacts; content lives under docs/content/)
/docs/dist/
+/docs/TEMP-*.md
/tmp
# Golang test coverage
diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md
deleted file mode 100644
index e37ec4a9d..000000000
--- a/docs/TEMP-hookdeck-outpost-onboarding-status.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# Hookdeck Outpost onboarding — status (temporary)
-
-**Purpose:** Track implementation status for the managed quickstarts, agent prompt, and related work. **Delete this file** when tracking moves elsewhere (e.g. Linear, parent epic).
-
-**Last updated:** 2026-04-07
-
----
-
-## Agent eval harness — **implemented**; **prompt validation in progress**
-
-The automated harness in `docs/agent-evaluation/` is in place. **What it does today:**
-
-
-| Area | Status |
-| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with `**Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, `**cwd`** = `results/runs/-scenario-NN/` |
-| **Artifacts** | `transcript.json`, optional `**heuristic-score.json`** + `**llm-score.json`** (LLM reads each scenario `**## Success criteria**`), agent-written files beside the transcript |
-| **Heuristics** | `score-transcript.ts` — `**scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) |
-| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next `**leerob/next-saas-starter`**, FastAPI `**fastapi/full-stack-fastapi-template`**, Go `**devinterface/startersaas-go-api**`) |
-| **CLI** | `**npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless `**--no-score`** / `**--no-score-llm`** or `**EVAL_NO_SCORE_***`. **Exit 1** if any enabled score fails |
-| **CI** | `**npm run eval:ci`** = `**--scenarios 01,02`** + heuristic **and** LLM judge. `**scripts/ci-eval.sh`** — requires `**ANTHROPIC_API_KEY`**, `**EVAL_TEST_DESTINATION_URL**` |
-| **Re-score** | `npm run score -- --run [--llm] [--write]` |
-
-
-**Operational**
-
-- Prefer a normal runner / full permissions for session persistence (`~/.claude/...`); tight sandboxes can break multi-turn resume.
-- **Validate the prompt in stages** (simple → complex); exact commands below.
-
-### Recommended run order (test evals → stress prompt)
-
-Run from `**docs/agent-evaluation/`** with `**.env`** set (`**ANTHROPIC_API_KEY**`, `**EVAL_TEST_DESTINATION_URL**`). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions.
-
-**Stage A — basics (fast, minimal tooling)**
-
-```sh
-npm run eval -- --scenarios 01,02,03,04
-```
-
-**Stage B — minimal example apps**
-
-```sh
-npm run eval -- --scenarios 05,06,07
-```
-
-**Stage C — existing-app integration (clone + integrate; slowest)**
-
-```sh
-npm run eval -- --scenarios 08,09,10
-```
-
-**Full suite (explicit cost)**
-
-```sh
-npm run eval -- --all
-```
-
-After each stage, inspect `**results/runs/-scenario-NN/**` (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live `**OUTPOST_API_KEY`**) remains a separate human step per scenario.
-
----
-
-## Agent eval automation (original plan — historical)
-
-1. **In-repo runner** — ✅ Node + Agent SDK (not shell-only `curl`).
-2. **Default backend: Anthropic** — ✅ Agent SDK.
-3. **Claude Code CLI** — Optional local path only (unchanged).
-4. **OpenAI adapter** — Still optional / not implemented.
-5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs `**## Success criteria`**.
-6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready).
-
-**Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners.
-
----
-
-## Done (Outpost OSS repo)
-
-- Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx`
-- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` (includes **Files on disk** guidance)
-- Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**)
-- `quickstarts.mdx` index: managed vs self-hosted links
-- Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs
-- SDK quickstarts: env vars, step-commented scripts
-- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, `**SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md`
-
-## Pending / follow-up
-
-- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or `**--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear
-- **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming)
-- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links
-- **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes
-- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection; env UI for `OUTPOST_API_KEY` (not in prompt body)
-- **Hookdeck Astro site:** MDX, `llms.txt` / `llms-full.txt`, canonical `DOCS_URL`
-- **CI workflow:** Optional GitHub Actions job for `eval:ci` with secrets
-- **Deferred (not blocking GA):** Broader docs IA per original plan
-
-## References
-
-- OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`)
-- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
-- Eval harness: `docs/agent-evaluation/README.md`
-
diff --git a/docs/content/concepts.mdoc b/docs/content/concepts.mdoc
index 90259efb8..270764d7d 100644
--- a/docs/content/concepts.mdoc
+++ b/docs/content/concepts.mdoc
@@ -100,10 +100,12 @@ The following destination types are available for your tenants to configure:
- [Azure Service Bus](/docs/outpost/destinations/azure-service-bus)
- [GCP Pub/Sub](/docs/outpost/destinations/gcp-pubsub)
- [RabbitMQ (AMQP)](/docs/outpost/destinations/rabbitmq)
-- [Amazon EventBridge (planned)](https://github.com/hookdeck/outpost/issues/201)
-- [Kafka (planned)](https://github.com/hookdeck/outpost/issues/141)
+- [Kafka](/docs/outpost/destinations/kafka)
+- Amazon EventBridge (planned)
-If there is an event destination type that you would like to see supported, [open a feature request](https://github.com/hookdeck/outpost/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.md&title=%F0%9F%9A%80+Feature%3A+).
+**Hookdeck Outpost** is the same [open-source Outpost](https://github.com/hookdeck/outpost) project, operated on Hookdeck’s infrastructure. We do not maintain a separate hosted fork; what we run tracks the public codebase.
+
+If there is an event destination type you would like to see supported, [open a feature request on GitHub](https://github.com/hookdeck/outpost/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.md&title=%F0%9F%9A%80+Feature%3A+).
For a diagram of how the API, delivery, and log services connect in **self-hosted** deployments, see [Self-hosting architecture](/docs/outpost/self-hosting/architecture).
diff --git a/docs/content/destinations/overview.mdoc b/docs/content/destinations/overview.mdoc
deleted file mode 100644
index 890ce3db0..000000000
--- a/docs/content/destinations/overview.mdoc
+++ /dev/null
@@ -1,99 +0,0 @@
----
-title: "Destinations"
-description: "Supported destination types, creating destinations, filters, and dynamic configuration from the API."
----
-
-Outpost supports multiple event destination types. Each tenant can have multiple destinations, up to a maximum set by the `MAX_DESTINATIONS_PER_TENANT` environment variable (defaulting to `20`).
-
-> We recommend setting the `MAX_DESTINATIONS_PER_TENANT` value as low as is appropriate for your use case to prevent abuse and performance degradation. Updating the value to a lower value later will not delete existing destinations.
-
-## Supported Destinations
-
-| Destination | Description |
-| ----------- | ----------- |
-| [Webhook](/docs/outpost/destinations/webhook) | Send events via HTTP POST to a URL |
-| [Hookdeck](/docs/outpost/destinations/hookdeck) | Route events through Hookdeck Event Gateway |
-| [AWS Kinesis](/docs/outpost/destinations/aws-kinesis) | Stream events to Amazon Kinesis |
-| [AWS SQS](/docs/outpost/destinations/aws-sqs) | Send events to an Amazon SQS queue |
-| [AWS S3](/docs/outpost/destinations/aws-s3) | Store events in an Amazon S3 bucket |
-| [Azure Service Bus](/docs/outpost/destinations/azure-service-bus) | Send events to Azure Service Bus |
-| [GCP Pub/Sub](/docs/outpost/destinations/gcp-pubsub) | Publish events to Google Cloud Pub/Sub |
-| [RabbitMQ](/docs/outpost/destinations/rabbitmq) | Send events to a RabbitMQ exchange |
-
-See the [Outpost overview](/docs/outpost/overview) and [GitHub issues](https://github.com/hookdeck/outpost/issues) for planned destination types. To be eligible as a destination type, it must be asynchronous in nature and not run any business logic.
-
-## Creating a Destination
-
-Destinations can be registered through the tenant portal or via the API. Each destination type has its own configuration and credentials. Refer to the [Create Destination API](/docs/outpost/api/destinations#create-destination) for the required `config` and `credentials` fields for each destination type.
-
-```sh
-curl --location 'https:///api/v1/tenants//destinations' \
---header 'Content-Type: application/json' \
---header 'Authorization: Bearer ' \
---data '{
- "type": "",
- "topics": [""],
- "config": { ... },
- "credentials": { ... }
-}'
-```
-
-## Destination Filtering
-
-Destinations can be configured with filters to selectively receive only events matching specific criteria. This allows tenants to create fine-grained routing rules based on event properties.
-
-See the [Filters](/docs/outpost/features/filter) documentation for the complete filter syntax and examples.
-
-## Getting Destination Types & Schemas
-
-When using the API, you may want to build your own UI to capture user input on the destination configuration. Since each destination requires a specific configuration, the `GET /destination-types` endpoint provides a JSON schema for standardized input fields for each destination type.
-
-For example, for the `webhook` type:
-
-```json
-{
- "type": "webhook",
- "label": "Webhook",
- "description": "Send events via an HTTP POST request to a URL",
- "icon": "",
- "instructions": "Some *markdown*",
- "remote_setup_url": null,
- "config_fields": [
- {
- "key": "url",
- "type": "text",
- "label": "URL",
- "description": "The URL to send the event to",
- "pattern": "/((([A-Za-z]{3,9}:(?://)?)(?:[-;:&=+$,w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=+$,w]+@)[A-Za-z0-9.-]+)((?:/[+~%/.w-_]*)???(?:[-+=&;%@.w_]*)#?(?:[w]*))?)/",
- "required": true
- }
- ],
- "credential_fields": []
-}
-```
-
-### `config_fields` `Field[]`
-
-Config fields are non-secret values that can be stored and displayed to the user in plain text.
-
-### `credential_fields` `Field[]`
-
-Credential fields are secret values that will be AES encrypted and obfuscated to the user. Some credentials may not be obfuscated; the destination type dictates the obfuscation logic.
-
-### `instructions` `string`
-
-Some destinations will require instructions to configure. For instance, with Pub/Sub, the user will need to create a service account and grant some permissions to that service account. The value is a markdown string to be rendered with any markdown rendering library. Images will be hosted through the GitHub repository.
-
-### `remote_setup_url`
-
-Some destinations may have OAuth flow or other managed setup flow that can be triggered with a link. If a `remote_setup_url` is set, then the user should be prompted to follow the link to configure the destination.
-
-See the [building your own UI guide](/docs/outpost/guides/building-your-own-ui) for recommended UI patterns and wireframes for implementation in your own app.
-
-## Customizing Destination Type Definitions & Instructions
-
-The destination type definitions (label, description, icon, etc) and instructions can be customized by setting the `DESTINATIONS_METADATA_PATH` environment variable to a path on disk containing the destination type definitions and instructions. Outpost will load both the default destination type definitions and any custom destination type definitions and merge them.
-
-> Note: Core fields (`config_fields` and `credential_fields`) cannot be overridden via custom metadata. Only non-core fields such as `label`, `description`, `icon`, and `instructions` can be customized.
-
-The metadata path is a directory containing a subdirectory for each destination type. Each destination type directory contains a `metadata.json` file and an `instructions.md` file. You can find the default destination type definitions and instructions in the [outpost-providers](https://github.com/hookdeck/outpost/tree/main/internal/destregistry/metadata/providers) folder.
diff --git a/docs/content/guides/building-your-own-ui.mdoc b/docs/content/guides/building-your-own-ui.mdoc
index 405a3ff7c..59ea3ae51 100644
--- a/docs/content/guides/building-your-own-ui.mdoc
+++ b/docs/content/guides/building-your-own-ui.mdoc
@@ -42,7 +42,7 @@ The tenant portal illustrates how screens map to tenant → destinations → top
### Default information architecture (multi-destination products)
-When a tenant can have many destinations—of any [destination type](/docs/outpost/destinations/overview) your project enables—the primary path is destination → activity: people ask “what was delivered to this subscription?” rather than seeing all traffic in one undifferentiated list. The same API applies for webhooks, queues, and other types; only create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
+When a tenant can have many destinations—of any [destination type](/docs/outpost/overview#supported-destinations) your project enables—the primary path is destination → activity: people ask “what was delivered to this subscription?” rather than seeing all traffic in one undifferentiated list. The same API applies for webhooks, queues, and other types; only create/edit forms differ, driven by [destination type metadata and dynamic config](#destination-type-metadata-and-dynamic-config).
For list events and list attempts, reuse the same endpoints everywhere: vary query parameters (for example `destination_id`, cursors) rather than inventing parallel client-side contracts. Pagination and auth details are defined in [OpenAPI](/docs/outpost/api); [Events, attempts, and retries](#events-attempts-and-retries) below summarizes how those endpoints support common UI needs.
@@ -51,7 +51,7 @@ For list events and list attempts, reuse the same endpoints everywhere: vary que
| Example route | What it does | Spec |
| ------------- | ------------ | ---- |
| `…/destinations` or `…/integrations` | Hub: list destinations; create or drill down | [Listing destinations](#listing-configured-destinations) · [List destinations](/docs/outpost/api/destinations#list-destinations) |
-| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/outpost/destinations/overview); `GET /destination-types` in [OpenAPI](/docs/outpost/api)), then topics and config | [Creating a destination](#creating-a-destination) |
+| `…/destinations/new` (or wizard) | Create destination: choose type ([types](/docs/outpost/overview#supported-destinations); `GET /destination-types` in [OpenAPI](/docs/outpost/api)), then topics and config | [Creating a destination](#creating-a-destination) |
| `…/destinations/:destinationId` | Detail: edit config, enable/disable, topics | [OpenAPI](/docs/outpost/api) — Destinations |
| `…/destinations/:destinationId/activity` | Activity for this destination: events, attempts, retry | [Events, attempts, and retries](#events-attempts-and-retries) · [List events](/docs/outpost/api/events#list-events) · [List attempts](/docs/outpost/api/attempts#list-attempts) |
| `…/activity` (optional) | Tenant-wide activity; optional filter by `destination_id` | Same list-events operation with different query params ([OpenAPI](/docs/outpost/api)) |
diff --git a/docs/content/nav.json b/docs/content/nav.json
index 525f6eb4d..f7147e05b 100644
--- a/docs/content/nav.json
+++ b/docs/content/nav.json
@@ -56,7 +56,6 @@
"label": "Destinations",
"sections": [
[
- { "slug": "destinations/overview", "title": "Overview" },
{ "slug": "destinations/webhook", "title": "Webhook" },
{
"slug": "destinations/hookdeck",
diff --git a/docs/content/quickstarts/hookdeck-outpost-curl.mdoc b/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
index a14900d86..69cabaeea 100644
--- a/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-curl.mdoc
@@ -97,7 +97,7 @@ If you combine API response bodies with `curl --write-out '\n%{http_code}'`:
## Next steps
-- [Destination types](/docs/outpost/destinations/overview) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more
+- [Destination types](/docs/outpost/overview#supported-destinations) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more
- [Tenant user portal](/docs/outpost/features/tenant-user-portal) — optional UI for tenants to manage their own destinations
- [SDKs](/docs/outpost/sdks) — TypeScript, Python, Go, and others
- [API reference](/docs/outpost/api) — full REST API
diff --git a/docs/content/quickstarts/hookdeck-outpost-go.mdoc b/docs/content/quickstarts/hookdeck-outpost-go.mdoc
index 1bc22ad0e..1b8f999d4 100644
--- a/docs/content/quickstarts/hookdeck-outpost-go.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-go.mdoc
@@ -157,7 +157,7 @@ For all topics on that destination, use `components.CreateTopicsTopicsEnum(compo
## Next steps
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
- [Tenant user portal](/docs/outpost/features/tenant-user-portal)
- [SDKs](/docs/outpost/sdks)
- [API reference](/docs/outpost/api)
diff --git a/docs/content/quickstarts/hookdeck-outpost-python.mdoc b/docs/content/quickstarts/hookdeck-outpost-python.mdoc
index 75a02c2b4..37a627001 100644
--- a/docs/content/quickstarts/hookdeck-outpost-python.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-python.mdoc
@@ -128,7 +128,7 @@ Use `topics: ["*"]` on the destination to receive all configured topics.
## Next steps
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
- [Tenant user portal](/docs/outpost/features/tenant-user-portal)
- [SDKs](/docs/outpost/sdks)
- [API reference](/docs/outpost/api)
diff --git a/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc b/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
index a51dabc11..c58381103 100644
--- a/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-typescript.mdoc
@@ -129,7 +129,7 @@ To subscribe the destination to all topics, pass `topics: ["*"]` instead of `[to
## Next steps
-- [Destination types](/docs/outpost/destinations/overview)
+- [Destination types](/docs/outpost/overview#supported-destinations)
- [Tenant user portal](/docs/outpost/features/tenant-user-portal)
- [SDKs](/docs/outpost/sdks)
- [API reference](/docs/outpost/api)
diff --git a/docs/content/redirects.json b/docs/content/redirects.json
index ec6304c5b..cc0570e92 100644
--- a/docs/content/redirects.json
+++ b/docs/content/redirects.json
@@ -21,7 +21,7 @@
},
{
"from": "/docs/outpost/destinations",
- "to": "/docs/outpost/destinations/overview"
+ "to": "/docs/outpost/overview#supported-destinations"
},
{
"from": "/docs/outpost/guides",
diff --git a/docs/content/self-hosting/configuration.mdoc b/docs/content/self-hosting/configuration.mdoc
index ec8e9126c..b7af25c83 100644
--- a/docs/content/self-hosting/configuration.mdoc
+++ b/docs/content/self-hosting/configuration.mdoc
@@ -102,6 +102,13 @@ Choose one for event log persistence:
| `ALERT_CONSECUTIVE_FAILURE_COUNT` | `20` | Consecutive failures before alert triggers |
| `ALERT_AUTO_DISABLE_DESTINATION` | `true` | Auto-disable destination when failure count reaches 100% |
+## Destinations
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `MAX_DESTINATIONS_PER_TENANT` | `20` | Maximum destinations each tenant may create. Set as low as is practical for your product to limit abuse and load; lowering this value later does **not** remove destinations that already exist. |
+| `DESTINATIONS_METADATA_PATH` | — | Optional. Filesystem path to a directory of [custom destination metadata](https://github.com/hookdeck/outpost/tree/main/internal/destregistry/metadata/providers) (per-type `metadata.json` and `instructions.md`). Non-core fields such as `label`, `description`, `icon`, and `instructions` can be customized; `config_fields` and `credential_fields` cannot be overridden. |
+
## Webhook Behavior
| Variable | Default | Description |
From 14ab5a27348869aac36f969cca84861cc01a2cf0 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 22:59:40 +0100
Subject: [PATCH 39/47] docs: use hookdeck.com/docs/outpost for production doc
links
Update README, OpenAPI contact URL, entrypoint migration hint, and example
READMEs so public links match Outpost docs on Hookdeck.
Made-with: Cursor
---
README.md | 22 +++++++++----------
build/entrypoint.sh | 2 +-
docs/apis/openapi.yaml | 2 +-
examples/azure/README.md | 2 +-
.../demos/dashboard-integration/README.md | 2 +-
examples/kubernetes/README.md | 2 +-
6 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/README.md b/README.md
index b5978eb84..1e00d30e7 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ Outpost is built and maintained by [Hookdeck](https://hookdeck.com?ref=github-ou

-Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn more about the Outpost architecture and design.
+Read [Outpost Concepts](https://hookdeck.com/docs/outpost/concepts) to learn more about the Outpost architecture and design.
## Features
@@ -70,17 +70,17 @@ Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn mor
- **Webhook best practices**: Opt-out webhook best practices, such as headers for idempotency, timestamp and signature, and signature rotation.
- **SDKs and MCP server**: Go, Python, and TypeScript SDK are available. Outpost also ships with an MCP server. All generated by [Speakeasy](https://speakeasy.com).
-See the [Outpost Features](https://outpost.hookdeck.com/docs/features) for more information.
+See the [Outpost Features](https://hookdeck.com/docs/outpost/features) for more information.
## Documentation
-- [Overview](https://outpost.hookdeck.com/docs/overview)
-- [Concepts](https://outpost.hookdeck.com/docs/concepts)
-- [Quickstarts](https://outpost.hookdeck.com/docs/quickstarts)
-- [Features](https://outpost.hookdeck.com/docs/features)
-- [Guides](https://outpost.hookdeck.com/docs/guides)
-- [API Reference](https://outpost.hookdeck.com/docs/api)
-- [Configuration Reference](https://outpost.hookdeck.com/docs/references/configuration)
+- [Overview](https://hookdeck.com/docs/outpost/overview)
+- [Concepts](https://hookdeck.com/docs/outpost/concepts)
+- [Quickstarts](https://hookdeck.com/docs/outpost/quickstarts)
+- [Features](https://hookdeck.com/docs/outpost/features)
+- [Guides](https://hookdeck.com/docs/outpost/guides)
+- [API Reference](https://hookdeck.com/docs/outpost/api)
+- [Configuration Reference](https://hookdeck.com/docs/outpost/self-hosting/configuration)
_The Outpost documentation is built using the [Zudoku documentation framework](https://zuplo.link/outpost)._
@@ -144,7 +144,7 @@ For other cloud Redis services or self-hosted Redis clusters, set `REDIS_CLUSTER
```sh
go run cmd/redis-debug/main.go your-redis-host 6379 password 0 [tls] [cluster]
```
-See the [Redis Troubleshooting Guide](https://docs.outpost.hookdeck.com/references/troubleshooting-redis) for detailed guidance.
+See the [Redis Troubleshooting Guide](https://hookdeck.com/docs/outpost/self-hosting/guides/troubleshooting-redis) for detailed guidance.
Start the Outpost dependencies and services:
@@ -241,7 +241,7 @@ Open the `redirect_url` link to view the Outpost portal.

-Continue to use the [Outpost API](https://outpost.hookdeck.com/docs/api) or the Outpost portal to add and test more destinations.
+Continue to use the [Outpost API](https://hookdeck.com/docs/outpost/api) or the Outpost portal to add and test more destinations.
## Contributing
diff --git a/build/entrypoint.sh b/build/entrypoint.sh
index ab22587f8..fce97672c 100755
--- a/build/entrypoint.sh
+++ b/build/entrypoint.sh
@@ -23,7 +23,7 @@ if ! /usr/local/bin/outpost migrate init --current --log-format=json; then
echo " docker run --rm hookdeck/outpost migrate --help"
echo ""
echo "Learn more about Outpost migration workflow at:"
- echo " https://outpost.hookdeck.com/docs/guides/migration"
+ echo " https://hookdeck.com/docs/outpost/self-hosting/guides/migration"
echo ""
exit 1
fi
diff --git a/docs/apis/openapi.yaml b/docs/apis/openapi.yaml
index 4c03e4e20..ba3309cc6 100644
--- a/docs/apis/openapi.yaml
+++ b/docs/apis/openapi.yaml
@@ -7,7 +7,7 @@ info:
contact:
name: Outpost Support
email: support@hookdeck.com
- url: https://outpost.hookdeck.com/docs
+ url: https://hookdeck.com/docs/outpost
security:
- AdminApiKey: []
- TenantJwt: []
diff --git a/examples/azure/README.md b/examples/azure/README.md
index b2434da4f..12d9a8e9a 100644
--- a/examples/azure/README.md
+++ b/examples/azure/README.md
@@ -368,7 +368,7 @@ For most users, `azure-deploy.sh` offers a balance of automation, reliability, a
If you are not using the `dependencies.sh` and `local-deploy.sh` scripts to provision your infrastructure, you will need to create the `.env.outpost` and `.env.runtime` files manually.
-See the [Configure Azure Service Bus as the Outpost Internal Message Queue](https://outpost.hookdeck.com/docs/guides/service-bus-internal-mq) guide for more details on the environment variables required for Outpost and how to create the values.
+See the [Configure Azure Service Bus as the Outpost Internal Message Queue](https://hookdeck.com/docs/outpost/self-hosting/guides/service-bus-internal-mq) guide for more details on the environment variables required for Outpost and how to create the values.
### `.env.outpost`
diff --git a/examples/demos/dashboard-integration/README.md b/examples/demos/dashboard-integration/README.md
index a8026b4e9..f085ac596 100644
--- a/examples/demos/dashboard-integration/README.md
+++ b/examples/demos/dashboard-integration/README.md
@@ -49,7 +49,7 @@ A Next.js application demonstrating how to integrate Outpost with an API platfor
TOPICS=user.created,user.updated,order.completed,payment.processed,subscription.created
```
- For a full list of Outpost configuration options, see [Outpost Configuration](https://outpost.hookdeck.com/docs/references/configuration)
+ For a full list of Outpost configuration options, see [Outpost Configuration](https://hookdeck.com/docs/outpost/self-hosting/configuration)
4. **Start the complete stack** (PostgreSQL, Redis, RabbitMQ, and Outpost):
diff --git a/examples/kubernetes/README.md b/examples/kubernetes/README.md
index 14519f954..5cfe9d147 100644
--- a/examples/kubernetes/README.md
+++ b/examples/kubernetes/README.md
@@ -1 +1 @@
-See https://outpost.hookdeck.com/docs/quickstarts/kubernetes
\ No newline at end of file
+See https://hookdeck.com/docs/outpost/self-hosting/quickstarts/kubernetes
\ No newline at end of file
From 81b2aff46afc37015dc9bbf9f9645de80dd83867 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 23:00:19 +0100
Subject: [PATCH 40/47] docs(eval): Hookdeck prod as default {{DOCS_URL}}; fix
harness doc paths
- Default EVAL_DOCS_URL to https://hookdeck.com/docs/outpost
- Replace invalid destinations directory path with overview + webhook mdoc
- Document placeholder examples in agent prompt and fixtures
Made-with: Cursor
---
docs/agent-evaluation/.env.example | 5 +-
.../fixtures/placeholder-values-for-turn0.md | 4 +-
docs/agent-evaluation/src/run-agent-eval.ts | 96 ++++++++++++++-----
.../hookdeck-outpost-agent-prompt.mdoc | 2 +-
4 files changed, 79 insertions(+), 28 deletions(-)
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 9df940ad4..9f1392e98 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -15,13 +15,16 @@ EVAL_TEST_DESTINATION_URL=
# Optional (see npm run eval -- --help)
# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
# EVAL_TOPICS_LIST=- user.created
-# EVAL_DOCS_URL=https://outpost.hookdeck.com/docs
+# EVAL_DOCS_URL=https://hookdeck.com/docs/outpost
# EVAL_LOCAL_DOCS=1
# EVAL_LLMS_FULL_URL=
# Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow:
# EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash
# EVAL_MODEL=
# EVAL_MAX_TURNS=40
+# Long runs (08–10): periodic stderr heartbeats while each agent query is in flight
+# EVAL_PROGRESS=1
+# EVAL_PROGRESS_INTERVAL_MS=30000
# EVAL_PERMISSION_MODE=dontAsk
# EVAL_PERSIST_SESSION=true
# Debug only: allow Write/Edit outside the per-run workspace (not recommended)
diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
index f17f94ce6..152bcf9d3 100644
--- a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
+++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md
@@ -2,7 +2,7 @@
The **prompt template itself** lives in one place only:
-`**[hookdeck-outpost-agent-prompt.mdx](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
+`**[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`** (from repo root: `docs/content/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below.
Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project `**.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible.
@@ -18,7 +18,7 @@ For `**npm run eval -- --scenario …**` (or `**--scenarios**` / `**--all**`), t
| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` |
| `{{TOPICS_LIST}}` | `- user.created` |
| `{{TEST_DESTINATION_URL}}` | Hookdeck Console **Source** URL the dashboard feeds in (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` |
-| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) |
+| `{{DOCS_URL}}` | `https://hookdeck.com/docs/outpost` (same path segments as `/docs/outpost/…` on hookdeck.com; see `docs/content/nav.json`) |
| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL |
diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts
index ba1129170..26781248a 100644
--- a/docs/agent-evaluation/src/run-agent-eval.ts
+++ b/docs/agent-evaluation/src/run-agent-eval.ts
@@ -35,7 +35,7 @@ dotenv.config({ path: join(EVAL_ROOT, ".env") });
const REPO_ROOT = join(EVAL_ROOT, "..", "..");
const PROMPT_MDX = join(
REPO_ROOT,
- "docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx",
+ "docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc",
);
const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios");
const RUNS_DIR = join(EVAL_ROOT, "results", "runs");
@@ -101,7 +101,9 @@ function isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage {
function extractTemplateFromMdx(mdx: string): string {
const idx = mdx.indexOf("## Template");
if (idx === -1) {
- throw new Error("Could not find ## Template in hookdeck-outpost-agent-prompt.mdx");
+ throw new Error(
+ "Could not find ## Template in hookdeck-outpost-agent-prompt.mdoc",
+ );
}
const after = mdx.slice(idx);
const fenceStart = after.indexOf("```");
@@ -122,6 +124,15 @@ function envFlagTruthy(v: string | undefined): boolean {
return s === "1" || s === "true" || s === "yes";
}
+/** Wall-clock heartbeat while the SDK stream is quiet (e.g. long Bash / blocked subprocess). */
+function evalProgressIntervalMs(): number {
+ const n = Number(process.env.EVAL_PROGRESS_INTERVAL_MS ?? "30000");
+ if (!Number.isFinite(n) || n < 5000) {
+ return 30000;
+ }
+ return n;
+}
+
/** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */
function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string {
const f = (...parts: string[]) => join(repoRoot, ...parts);
@@ -145,19 +156,19 @@ Do **not** mix TS call shapes into Python.`;
Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). Paths are absolute from the repository root:
-Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy).
+Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdoc\` (TS-heavy).
-- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/pages/concepts.mdx")}\`
-- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/pages/guides/building-your-own-ui.mdx")}\`
-- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/pages/features/topics.mdx")}\`
-- Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\`
-- TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\`
-- Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\`
-- Go quickstart (Go SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\`
-- API reference (human-oriented pages under): \`${f("docs/pages/references/")}\`
+- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/content/concepts.mdoc")}\`
+- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/content/guides/building-your-own-ui.mdoc")}\`
+- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/content/features/topics.mdoc")}\`
+- Getting started (curl / HTTP only): \`${f("docs/content/quickstarts/hookdeck-outpost-curl.mdoc")}\`
+- TypeScript quickstart (TS SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-typescript.mdoc")}\`
+- Python quickstart (Python SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-python.mdoc")}\`
+- Go quickstart (Go SDK): \`${f("docs/content/quickstarts/hookdeck-outpost-go.mdoc")}\`
+- Docs content (browse for feature pages): \`${f("docs/content/")}\`
- OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\`
-- Destination types: \`${f("docs/pages/destinations/")}\`
-- SDKs overview (TS-heavy): \`${f("docs/pages/sdks.mdx")}\` — prefer the language quickstart over this for Python/Go/TS code.
+- **Destination types** (summary + links): \`${f("docs/content/overview.mdoc")}\` — *Supported destinations*; per-type detail in \`docs/content/destinations/*.mdoc\` (e.g. \`${f("docs/content/destinations/webhook.mdoc")}\`)
+- SDKs overview (TS-heavy): \`${f("docs/content/sdks.mdoc")}\` — prefer the language quickstart over this for Python/Go/TS code.
${languageSdkBlock}`;
if (llmsFullUrl) {
@@ -180,7 +191,7 @@ function applyPlaceholders(
"Set EVAL_TEST_DESTINATION_URL to your Hookdeck Console Source URL (same value the dashboard injects as {{TEST_DESTINATION_URL}})",
);
}
- const docsUrl = env.EVAL_DOCS_URL ?? "https://outpost.hookdeck.com/docs";
+ const docsUrl = env.EVAL_DOCS_URL ?? "https://hookdeck.com/docs/outpost";
const llms = env.EVAL_LLMS_FULL_URL?.trim() ?? "";
const useLocalDocs = envFlagTruthy(env.EVAL_LOCAL_DOCS);
@@ -301,18 +312,49 @@ function idFromFilename(file: string): string {
async function runScenarioQuery(
prompt: string,
options: Options,
+ progress?: { readonly phaseLabel: string },
): Promise<{ messages: unknown[]; sessionId?: string }> {
const messages: unknown[] = [];
let sessionId: string | undefined;
+ const progressOn = envFlagTruthy(process.env.EVAL_PROGRESS);
+ const label = progress?.phaseLabel ?? "agent query";
+ let msgCount = 0;
+ let interval: ReturnType<typeof setInterval> | undefined;
- const q = query({ prompt, options });
- for await (const message of q) {
- messages.push(serializeMessage(message));
- if (isInitSystemMessage(message)) {
- sessionId = message.session_id;
+ if (progressOn && progress) {
+ const maxTurns = options.maxTurns;
+ console.error(
+ `[eval] ${label}: starting (EVAL_PROGRESS=1; heartbeat every ${evalProgressIntervalMs()}ms; maxTurns=${String(maxTurns)})`,
+ );
+ interval = setInterval(() => {
+ console.error(
+ `[eval] ${label}: still running (${msgCount} SDK message(s) so far — subprocess or model may be busy with no new stream events)`,
+ );
+ }, evalProgressIntervalMs());
+ }
+
+ try {
+ const q = query({ prompt, options });
+ for await (const message of q) {
+ msgCount += 1;
+ messages.push(serializeMessage(message));
+ if (isInitSystemMessage(message)) {
+ sessionId = message.session_id;
+ }
+ if (progressOn && progress && msgCount > 0 && msgCount % 25 === 0) {
+ console.error(`[eval] ${label}: ${msgCount} SDK message(s) received`);
+ }
+ }
+ } finally {
+ if (interval !== undefined) {
+ clearInterval(interval);
}
}
+ if (progressOn && progress) {
+ console.error(`[eval] ${label}: finished this query (${msgCount} SDK message(s))`);
+ }
+
return { messages, sessionId };
}
@@ -354,10 +396,14 @@ async function runOneScenario(
for (let i = 0; i < prompts.length; i++) {
const label = i === 0 ? "Turn 0 (dashboard prompt)" : userTurns[i - 1]?.label ?? `Turn ${i}`;
const before = allMessages.length;
- const { messages, sessionId: sid } = await runScenarioQuery(prompts[i]!, {
- ...opts.baseOptions,
- resume: sessionId,
- });
+ const { messages, sessionId: sid } = await runScenarioQuery(
+ prompts[i]!,
+ {
+ ...opts.baseOptions,
+ resume: sessionId,
+ },
+ { phaseLabel: label },
+ );
if (sid) {
sessionId = sid;
}
@@ -691,6 +737,8 @@ Environment:
EVAL_TOOLS Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README)
EVAL_MODEL Optional
EVAL_MAX_TURNS Optional (default: 80; npm/go mod installs can exceed 40; lower only for smoke — may not finish 08–10)
+ EVAL_PROGRESS Set to 1/true/yes — log heartbeats to stderr during each agent query (see EVAL_PROGRESS_INTERVAL_MS)
+ EVAL_PROGRESS_INTERVAL_MS Optional (default: 30000, min 5000) — wall-clock heartbeat while the SDK stream is quiet
EVAL_PERMISSION_MODE Optional (default: dontAsk)
EVAL_PERSIST_SESSION Set to "false" to disable session persistence (breaks multi-turn resume)
EVAL_DISABLE_WORKSPACE_WRITE_GUARD Set to 1 to allow Write/Edit outside the run dir (not recommended)
@@ -800,7 +848,7 @@ Agent cwd is usually the run directory. Scenarios may define ## Eval harness (JS
console.error(`\n>>> Scenario ${file} (run dir ${runDir}, agent cwd ${agentCwd}) ...`);
if (scenarioIdEarly === "08" || scenarioIdEarly === "09" || scenarioIdEarly === "10") {
console.error(
- "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). See README § Wall time.",
+ "Note: Scenarios 08–10 clone a full baseline and install deps — often 30–90+ min wall time with sparse console output until transcript.json. Ctrl+C aborts (writes *.eval-aborted.json). Set EVAL_PROGRESS=1 for stderr heartbeats. See README § Wall time.",
);
}
diff --git a/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc b/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
index 73a28923a..7422c5e38 100644
--- a/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
+++ b/docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc
@@ -169,7 +169,7 @@ Apply **only** the items below that fit the task; **skip** any that do not apply
| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt |
| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config — operators should keep this aligned with what the integrated app will **publish** and what destinations subscribe to |
| `{{TEST_DESTINATION_URL}}` | **Required** — HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow (fed in by the dashboard). |
-| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
+| `{{DOCS_URL}}` | `https://hookdeck.com/docs/outpost` | Production **Outpost** docs base on Hookdeck (no trailing slash). Template paths append the same segments as **`/docs/outpost/…`** on the docs site (see `docs/content/nav.json`). For unpublished docs, evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). |
| `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet |
### Building your own UI — where the detail lives
From 4d7a91a3e263eccc595baac1479ee1acbc15be55 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Fri, 10 Apr 2026 23:00:42 +0100
Subject: [PATCH 41/47] docs(agent-evaluation): refresh tracker, scenarios, and
harness docs
- Point scenario and script links at docs/content paths (.mdoc)
- Update SCENARIO-RUN-TRACKER for latest heuristic-pass runs
- Revise README and AGENTS for current layout
- Remove SKILL-UPSTREAM-NOTES (obsolete)
Made-with: Cursor
---
docs/agent-evaluation/AGENTS.md | 4 +--
docs/agent-evaluation/README.md | 11 ++++---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 32 +++++++++----------
docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md | 22 -------------
.../scenarios/01-basics-curl.md | 2 +-
.../scenarios/02-basics-typescript.md | 2 +-
.../scenarios/03-basics-python.md | 2 +-
.../scenarios/04-basics-go.md | 2 +-
.../scenarios/05-app-nextjs.md | 2 +-
.../scenarios/06-app-fastapi.md | 2 +-
.../scenarios/07-app-go-http.md | 2 +-
.../scenarios/08-integrate-nextjs-existing.md | 4 +--
.../09-integrate-fastapi-existing.md | 4 +--
.../scenarios/10-integrate-go-existing.md | 4 +--
docs/agent-evaluation/scripts/run-scenario.sh | 2 +-
15 files changed, 38 insertions(+), 59 deletions(-)
delete mode 100644 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
diff --git a/docs/agent-evaluation/AGENTS.md b/docs/agent-evaluation/AGENTS.md
index 5ab942505..ea6cee0d8 100644
--- a/docs/agent-evaluation/AGENTS.md
+++ b/docs/agent-evaluation/AGENTS.md
@@ -6,7 +6,7 @@ This file applies to **everything under `docs/agent-evaluation/`** (scenarios, R
| Audience | Content |
|----------|---------|
-| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
+| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
| **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |
**Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.
@@ -26,7 +26,7 @@ It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `
## Alignment without parroting
-- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdx`.
+- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdoc`.
- **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”) not **cite** where in the template that is spelled out.
If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 5dfb9330c..40eed004b 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -59,6 +59,7 @@ The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`).
Scenarios that **`git clone`** a full SaaS template and run **`npm` / `pnpm` / `docker compose`** installs are **slow by design**. Expect **roughly 30–90+ minutes** of wall time for a single run of **08**, **09**, or **10** (clone + install + several agent turns). The harness prints little to the terminal until **`transcript.json`** is written at the end, which can look hung.
+- **Progress on stderr:** set **`EVAL_PROGRESS=1`** so the runner prints **periodic lines** (default every **30s** per agent query, plus every **25** SDK messages). You still see activity when the agent is inside a **long Bash** call and the SDK emits **no** new messages for a while. Tune with **`EVAL_PROGRESS_INTERVAL_MS`** (minimum **5000**). Default is off so CI and short runs stay quiet.
- **Stop early:** **Ctrl+C** (**SIGINT**) in the terminal running `npm run eval`. The runner writes **`*-scenario-NN.eval-aborted.json`** next to the run folder (see **Harness sidecars** at the top of this file).
- **Skip re-clone:** If the baseline is already under the run directory, **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** skips **`git_clone`** from the scenario harness (see each scenario’s **`## Eval harness`** block).
- **Cap agent length (smoke only):** **`EVAL_MAX_TURNS`** (default **80**) limits SDK turns; lowering it may end the run sooner but often **fails** the integration before success criteria are met—use for debugging, not a real pass.
@@ -92,7 +93,7 @@ cd docs/agent-evaluation && npm ci && npm run eval:ci
- **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
- **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** — skip **`git_clone`** (and any future **`preSteps`**) declared in a scenario’s **`## Eval harness`** JSON block; useful offline or when the baseline folder is already present.
-- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (`## Template`) with placeholders filled from environment variables.
+- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (`## Template`) with placeholders filled from environment variables.
- Transcripts are written to `results/runs/-scenario-NN/transcript.json` (gitignored).
See `npm run eval -- --help` for env vars (`EVAL_TOOLS`, `EVAL_MODEL`, etc.).
@@ -136,7 +137,7 @@ These measure **existing-app integration**, not a greenfield demo. When you **ex
The **full prompt template** (the text operators paste as Turn 0) lives in **one** place:
-**[`docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** — use the fenced block under **## Template**.
+**[`docs/content/quickstarts/hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)** — use the fenced block under **## Template**.
For eval runs, example placeholder substitutions (non-secret) are in [`fixtures/placeholder-values-for-turn0.md`](fixtures/placeholder-values-for-turn0.md) only. That file intentionally **does not** duplicate the template.
@@ -144,7 +145,7 @@ The Hookdeck dashboard should eventually render the **same** template body from
## How to run an evaluation (manual)
-1. **Turn 0:** Open the [agent prompt MDX](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)).
+1. **Turn 0:** Open the [agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)).
2. **Pick a scenario:** e.g. [`scenarios/01-basics-curl.md`](scenarios/01-basics-curl.md).
3. **New agent thread:** Paste Turn 0, then follow each **Turn N — User** line from the scenario verbatim (or as specified).
4. **Judge output:** Use the scenario’s **Success criteria** checkboxes (human decision).
@@ -218,7 +219,7 @@ Scenarios **1–4** align with **“Try it out”**; **5–7** with **“Build a
**Caveats (update the skill in `hookdeck/agent-skills`, not in this repo):**
-1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../pages/quickstarts/hookdeck-outpost-curl.mdx), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy.
+1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../content/quickstarts/hookdeck-outpost-curl.mdoc), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy.
2. **REST paths** — Examples must use **`/tenants/{id}`**, not `PUT $BASE_URL/$TENANT_ID` (that path is wrong for the real API).
3. **Naming** — Align env var naming with docs (`OUTPOST_API_KEY` or documented dashboard name), not ad-hoc `HOOKDECK_API_KEY` unless the dashboard literally uses that string.
4. **Router vs. deep skills** — Today `outpost` is one monolithic `SKILL.md`. The skill itself mentions **future** destination-specific skills (`outpost-webhooks`, etc.). For scale, consider either **sections** with clear headings or **child skills** (e.g. `outpost-managed-quickstart`, `outpost-self-hosted`) once content grows—without forcing users to install many tiles for the common case.
@@ -227,6 +228,6 @@ Until the skill is updated, agents should still be pointed at the **quickstart M
## Related docs
-- [Agent prompt template (SSoT)](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)
+- [Agent prompt template (SSoT)](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)
- [Upstream skill notes](SKILL-UPSTREAM-NOTES.md)
- [TEMP tracking note](../TEMP-hookdeck-outpost-onboarding-status.md)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index 7c789f207..b5443c60a 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -9,7 +9,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
npm run eval -- --scenario
```
Each run creates `**results/runs/-scenario-/**` with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones).
-2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console).
+2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console). **Run directory** should be the **latest** folder matching `results/runs/*-scenario-` whose `heuristic-score.json` has **`overallTranscriptPass: true`** (re-scan directories when updating this file).
3. **Execution (generated code):** with `**OUTPOST_API_KEY`** (and `**OUTPOST_TEST_WEBHOOK_URL`** / `**OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). **Do not edit generated files to force a pass** — test what the agent produced; note OS/environment (e.g. Linux vs macOS) when relevant. **This column is the primary bar for “does the output actually work?”** Heuristic and LLM scores are supplementary.
4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`.
@@ -21,14 +21,14 @@ Use this table while you **run scenarios one at a time** and **execute the gener
| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes |
| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-10T09-28-52-764Z-scenario-01` | Pass (7/7) | Pass | Pass | Artifact: `**quickstart.sh`**. Heuristic + LLM from `npm run eval -- --scenario 01`; harness sidecars are sibling `*.eval-*.json` under `results/runs/` (not inside run dir). Execution: `OUTPOST_API_KEY` from `docs/agent-evaluation/.env` + `bash quickstart.sh` in run dir; tenant **200**, destination **201**, publish **202**; exit 0. |
-| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T10-34-35-461Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script in transcript: tenant, destination, event id). Harness sidecars sibling under `results/runs/`. Earlier over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail). |
-| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`. |
+| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-10T15-01-35-359Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1` after **scope-router** update to [agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK)—**no** Next.js scaffold. Heuristic + LLM pass; harness sidecars sibling under `results/runs/`. Earlier passes: `2026-04-10T10-49-02-890Z-scenario-02`, `2026-04-10T10-34-35-461Z-scenario-02`. Over-build run: `2026-04-10T09-39-06-362Z-scenario-02` (Next.js + script; LLM fail). |
+| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-10T11-02-19-073Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1` with [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Artifact: `**outpost_quickstart.py`** + `.env.example` (`python-dotenv`, `outpost_sdk`)—**no** web framework. Heuristic + LLM pass; judge `execution_in_transcript` **pass** (agent ran script; printed event id). Harness sidecars sibling under `results/runs/`. Earlier run: `2026-04-08T15-34-12-720Z-scenario-03`. |
| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. |
-| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). |
+| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10) | Pass | Pass | **Last heuristic-pass run:** `**outpost-nextjs-demo/`** — simpler two-route app (`/api/register`, `/api/publish`), fixed topic. Richer app + assessment: **§ Scenario 05 — assessment** (`**nextjs-webhook-demo/`** in `2026-04-08T17-21-22-170Z-scenario-05`) — LLM + execution pass; heuristic **9/10** (`managed_base_not_selfhosted`, doc-corpus). |
| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. |
| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
-| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
-| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-09T22-16-54-750Z-scenario-09` | Pass (6/6) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact** lives under `results/runs/…` (**gitignored**): `full-stack-fastapi-template/` + Docker **outpost-local-s09**; ports **5173** / **8001** / **54333** / **1080**. **§ Scenario 09 — post-agent work** lists everything applied after the agent transcript (incl. test publish, events/attempts/retry UI, docs + prompt). **§ Scenario 09 — review notes** — closed (IA + domain topics guidance landed in BYO UI + prompt). **Legacy runs:** `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
+| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
+| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-10T19-54-20-037Z-scenario-09` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact:** `full-stack-fastapi-template/` under run dir (**gitignored**). **Heuristic + LLM** from this stamp; harness sidecars sibling under `results/runs/`. Docker: default **5173** / **8000** / **1080** / **1025**; if host **5432** is taken, map DB e.g. **54334:5432** in `compose.override.yml`. After a **fresh DB volume**, clear the SPA token or **re-login** — stale JWT → **404 User not found** on `/api/v1/users/me` and `/api/v1/outpost/destinations`. **§ Scenario 09 — post-agent work** (below) still describes template fixes vs baseline. **Legacy runs:** `2026-04-10T19-22-02-903Z-scenario-09`, `2026-04-09T22-16-54-750Z-scenario-09` (6/6), `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
@@ -43,7 +43,7 @@ Reproducibility / gotchas:
- **`pnpm dev`** — if another `**next dev**` already holds **`.next/dev/lock`** for this tree, stop it or remove the lock; port **3000** may be taken (Next picks another port). Turbopack may warn about multiple lockfiles when the app sits under the monorepo — see Next’s **`turbopack.root`** guidance if needed.
- **Destination schema `key`** — API returns `key` on schema fields; older SDK parses may strip it and break create-destination payloads keyed from labels. Regenerating SDKs (or a BFF raw fetch + mapping) aligns the UI with the API until then.
-### Scenario 09 — post-agent work (`2026-04-09T22-16-54-750Z-scenario-09`)
+### Scenario 09 — post-agent work (representative: `2026-04-09T22-16-54-750Z-scenario-09`; latest eval stamp `2026-04-10T19-54-20-037Z-scenario-09`)
Work applied **after** the agent transcript so the FastAPI + React artifact matches current integration guidance (eval honesty + local execution). The template tree under `results/runs/-scenario-09/` is **not committed** (see `results/.gitignore`); repo **docs** and **prompt** updates that back this scenario **are** in git.
@@ -64,15 +64,15 @@ Work applied **after** the agent transcript so the FastAPI + React artifact matc
**Docs & prompt (repository)**
-- [Building your own UI](../pages/guides/building-your-own-ui.mdx) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
-- [Agent prompt template](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
+- [Building your own UI](../content/guides/building-your-own-ui.mdoc) — destination-type field fixes; **Events, attempts, and retries** section (features, how they connect, links to API).
+- [Agent prompt template](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) — full-stack guidance mentions **events list**, **attempts**, **retry**, alongside test publish.
### Scenario 09 — review notes (resolved, 2026-04-10)
Operator feedback from exercising the FastAPI full-stack artifact is **closed** in-repo:
-1. **Event activity IA** — [Building your own UI](../pages/guides/building-your-own-ui.mdx) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
-2. **Domain topics + real publishes vs test-only** — [Agent prompt](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic `**publish_beyond_test_only`** in `[src/score-transcript.ts](src/score-transcript.ts)` cover what we measure.
+1. **Event activity IA** — [Building your own UI](../content/guides/building-your-own-ui.mdoc) documents **default** destination → activity and **optional** tenant-wide activity with the same list endpoints; no open doc gap.
+2. **Domain topics + real publishes vs test-only** — [Agent prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (topic reconciliation, domain publish, test publish as separate), scenarios **08–10** success criteria + user-turn scripts, [README](README.md) execution notes, and heuristic **`publish_beyond_test_only`** in [`src/score-transcript.ts`](src/score-transcript.ts) cover what we measure.
The **copied agent template** (the `## Hookdeck Outpost integration` block) intentionally stays **scenario-agnostic**: it does not name eval baselines, harness repos, or scenario IDs—only product-level integration guidance and doc links.
@@ -81,7 +81,7 @@ The **copied agent template** (the `## Hookdeck Outpost integration` block) inte
| Column | Meaning |
| ----------------- | ---------------------------------------------------------------------------------------------------------- |
-| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` |
+| **Run directory** | Latest `results/runs/*-scenario-` with `heuristic-score.json` → `overallTranscriptPass: true` (folder contains `transcript.json`) |
| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) |
| **LLM judge** | `llm-score.json` → `overall_transcript_pass` |
| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` |
@@ -95,14 +95,14 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
## Scenario 05 — assessment (`2026-04-08T17-21-22-170Z`)
-**Status:** This is the **current focus run** for scenario 05 reviews (not `2026-04-08T16-12-10-708Z`).
+**Status:** Deep-dive on the **richer** Next.js artifact (`nextjs-webhook-demo/`). The **tracker table** row for scenario **05** points at **`2026-04-08T16-12-10-708Z-scenario-05`** (`outpost-nextjs-demo/`) as the **latest heuristic-pass** run (10/10); this section documents **`17-21-22`** separately because it failed that check while still passing LLM + execution.
| Dimension | Result |
| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Run directory** | `results/runs/2026-04-08T17-21-22-170Z-scenario-05/` |
| **Artifact** | `nextjs-webhook-demo/` — Next.js App Router, `@hookdeck/outpost-sdk`, Outpost calls **only** in `app/api/**/route.ts` (managed API via SDK default unless `OUTPOST_API_BASE_URL` is set). |
-| **Heuristic** | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../pages/guides/building-your-own-ui.mdx) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. |
+| **Heuristic** | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../content/guides/building-your-own-ui.mdoc) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. |
| **LLM judge** | **Pass** — matches scenario 05 success criteria (Next.js structure, server-side SDK, distinct destination + publish UI, tenant/topic handling, README env, managed default). |
| **Execution** | **Pass** (re-checked): `npm run build` in `nextjs-webhook-demo/`; `npm run dev` with `docs/agent-evaluation/.env`; `POST /api/destinations` → **201**, `POST /api/publish` → **200**. |
@@ -123,8 +123,8 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A*
Scenario 05 includes a regex check (`managed_base_not_selfhosted`) in `[src/score-transcript.ts](../src/score-transcript.ts)` (`scoreScenario05`). It looks at the **whole scoring corpus**: assistant-visible text **plus** content that ended up in the transcript from tools (e.g. **Read** of a doc file), not just files in the run folder.
- It fails if the corpus contains a **self-hosted** default API path: specifically the literal substring `localhost:3333/api/v1` (Outpost’s common local dev URL), or a similar `localhost: / api/v1` pattern, unless `OUTPOST_API_BASE_URL` also appears (see code for the exact conditions).
-- **Historical cause:** Older [Building your own UI](../pages/guides/building-your-own-ui.mdx) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost.
-- **Doc update:** `docs/pages/guides/building-your-own-ui.mdx` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus.
+- **Historical cause:** Older [Building your own UI](../content/guides/building-your-own-ui.mdoc) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost.
+- **Doc update:** `docs/content/guides/building-your-own-ui.mdoc` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus.
- **Run `2026-04-08T16-12-10-708Z`:** heuristic **10/10**, `overallTranscriptPass: true`.
- **Run `2026-04-08T17-21-22-170Z`:** heuristic **9/10**, `overallTranscriptPass: false` — failed `managed_base_not_selfhosted`; LLM judge still **passed**; transcript included **Read** of the **previous** `building-your-own-ui.mdx` with `localhost:3333/api/v1`.
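The tracker's **Run directory** rule (the latest `results/runs/*-scenario-NN` whose `heuristic-score.json` reports `overallTranscriptPass: true`) can be applied mechanically. Here is a rough bash sketch. It assumes that ISO-stamped directory names sort chronologically and that matching the flag with `grep` is good enough. It does not use a JSON parser, so a nested key with the same name could fool it. `latest_passing_run` is a hypothetical helper and is not part of the harness.

```sh
#!/usr/bin/env bash
# Hypothetical helper: newest <runs>/*-scenario-NN whose heuristic-score.json
# contains "overallTranscriptPass": true (plain-text match, not a JSON parser).
# Assumes <ISO-stamp>-scenario-NN names, so name order is time order.

latest_passing_run() {
  local runs=$1 nn=$2 d best=""
  for d in "$runs"/*-scenario-"$nn"; do
    # Skips the unexpanded glob (no runs) and runs that were never scored.
    [[ -f "$d/heuristic-score.json" ]] || continue
    grep -Eq '"overallTranscriptPass"[[:space:]]*:[[:space:]]*true' \
      "$d/heuristic-score.json" || continue
    if [[ "$d" > "$best" ]]; then
      best=$d
    fi
  done
  [[ -n "$best" ]] || return 1
  printf '%s\n' "$best"
}
```

Usage: `latest_passing_run results/runs 09`. The helper exits non-zero when no scored, passing run exists for that scenario.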
diff --git a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
deleted file mode 100644
index 6c8de7367..000000000
--- a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# Notes for updating `hookdeck/agent-skills` — `skills/outpost`
-
-Apply these in the **[agent-skills](https://github.com/hookdeck/agent-skills)** repository, not in Outpost OSS.
-
-## Recommended direction
-
-1. **Lead with managed Hookdeck Outpost** — Link prominently to managed quickstarts (curl, TypeScript, Python, Go) and `https://api.outpost.hookdeck.com/2025-07-01`.
-2. **Fix REST examples** — Tenant upsert must be `PUT {base}/tenants/{tenant_id}`, not `PUT {base}/{tenant_id}`.
-3. **Align env naming** — Match product/docs: Outpost API key from project **Settings → Secrets**, typically loaded as `OUTPOST_API_KEY` in examples; avoid introducing `HOOKDECK_API_KEY` unless the dashboard literally uses that name.
-4. **Self-hosted section** — Keep Docker/Kubernetes/Railway as a secondary path with `http://localhost:3333/api/v1` and correct `/tenants/...` paths.
-5. **Optional: split later** — If the file grows, add `outpost-managed.md` / `outpost-self-hosted.md` fragments or separate skills; keep the default tile entrypoint short.
-
-## Concrete issues in current `SKILL.md` (as of fetch against `main`)
-
-- **Wrong curl path:** `curl -X PUT "$BASE_URL/$TENANT_ID"` should target `/tenants/$TENANT_ID` relative to the API base (managed base has no `/api/v1` prefix).
-- **Managed auth row** — Verify exact dashboard copy for secret name and env var conventions; link to Hookdeck Outpost project settings, not only generic dashboard secrets if URLs differ.
-- **Tile summary** — `tile.json` says “self-hosted relay”; managed Outpost should be reflected in the summary string when GA positioning is final.
-
-## Cross-links from this repo
-
-- Onboarding prompt template: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`
-- Manual agent eval harness: `docs/agent-evaluation/README.md`
\ No newline at end of file
diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md
index 6aa12b215..7d90026f4 100644
--- a/docs/agent-evaluation/scenarios/01-basics-curl.md
+++ b/docs/agent-evaluation/scenarios/01-basics-curl.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to an empty directory under `docs/agent-evalu
### Turn 0
-Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
+Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdoc](../../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc)`, with `{{…}}` filled using your project or `[fixtures/placeholder-values-for-turn0.md](../fixtures/placeholder-values-for-turn0.md)`.
### Turn 1 — User
diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md
index a403bab6d..afbc4b7f2 100644
--- a/docs/agent-evaluation/scenarios/02-basics-typescript.md
+++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md
@@ -17,7 +17,7 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/
Date: Fri, 10 Apr 2026 23:43:59 +0100
Subject: [PATCH 42/47] docs(eval): record scenario 10 pass in run tracker
Log 2026-04-10T22-14-20-704Z-scenario-10 with heuristic/LLM/execution
results and execution notes (Go baseline, signup smoke, Hookdeck probe).
Made-with: Cursor
---
docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
index b5443c60a..f55ad9cf7 100644
--- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
+++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md
@@ -29,7 +29,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener
| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. |
| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | `2026-04-10T14-29-04-214Z-scenario-08` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1` + [scope-router prompt](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc). Harness `**next-saas-starter/`** under run dir (gitignored). **Execution pass** — operator QA (Postgres, `.env`, migrate/seed/dev, Outpost UI/API). See **§ Scenario 08 — execution notes** for reproducibility (seed/`server-only`, destination-schema `key` vs SDK). Earlier: `2026-04-10T11-08-35-921Z-scenario-08` (8/8), `2026-04-09T14-48-16-906Z-scenario-08`, `2026-04-09T11-08-32-505Z-scenario-08`. |
| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | `2026-04-10T19-54-20-037Z-scenario-09` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **Artifact:** `full-stack-fastapi-template/` under run dir (**gitignored**). **Heuristic + LLM** from this stamp; harness sidecars sibling under `results/runs/`. Docker: default **5173** / **8000** / **1080** / **1025**; if host **5432** is taken, map DB e.g. **54334:5432** in `compose.override.yml`. After a **fresh DB volume**, clear the SPA token or **re-login** — stale JWT → **404 User not found** on `/api/v1/users/me` and `/api/v1/outpost/destinations`. **§ Scenario 09 — post-agent work** (below) still describes template fixes vs baseline. **Legacy runs:** `2026-04-10T19-22-02-903Z-scenario-09`, `2026-04-09T22-16-54-750Z-scenario-09` (6/6), `2026-04-09T20-48-16-530Z-scenario-09`, `2026-04-09T15-51-44-184Z-scenario-09`. |
-| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | |
+| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | `2026-04-10T22-14-20-704Z-scenario-10` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Harness clone **`startersaas-go-api/`** under run dir (**gitignored**); pin [**devinterface/startersaas-go-api**](https://github.com/devinterface/startersaas-go-api). **Execution:** `go build` OK; **`docker compose build`** fails on baseline **Go 1.21** image vs **`go 1.22`** in `go.mod` (upstream Dockerfile). **Smoke:** Mongo **:27018**, `go run .`, **`POST /api/v1/auth/signup`** with **`privacyAccepted` / `marketingAccepted` as JSON booleans** → **200**; log **`[outpost] published user.created`**. **Outpost delivery** to Hookdeck Source verified with a distinct **`POST /publish`** probe (tenant + webhook destination + event). |
### Scenario 08 — execution notes (`2026-04-10T14-29-04-214Z-scenario-08`)
From 66fd663ffa9b9b6121b6e263a8dc91224d6f84aa Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Sat, 11 Apr 2026 00:16:24 +0100
Subject: [PATCH 43/47] ci(docs): agent eval workflow with live Outpost
execution
Add docs-agent-eval-ci.yml: scenarios 01+02 with EVAL_LOCAL_DOCS, heuristic
+ LLM judge, then execute-ci-artifacts.sh (curl + TypeScript) using
OUTPOST_API_KEY. Trigger on docs content/apis, agent-evaluation harness
(ignoring tracker/results README noise), TypeScript SDK, and workflow edits.
Ignore .env.ci for local secret template; document secrets and execution in
README.
Made-with: Cursor
---
.github/workflows/docs-agent-eval-ci.yml | 77 +++++++++++++++
.gitignore | 1 +
docs/agent-evaluation/.env.example | 2 +-
docs/agent-evaluation/README.md | 11 ++-
docs/agent-evaluation/results/README.md | 2 +-
docs/agent-evaluation/scripts/ci-eval.sh | 1 +
.../scripts/execute-ci-artifacts.sh | 99 +++++++++++++++++++
docs/agent-evaluation/src/score-transcript.ts | 2 +-
8 files changed, 187 insertions(+), 8 deletions(-)
create mode 100644 .github/workflows/docs-agent-eval-ci.yml
create mode 100755 docs/agent-evaluation/scripts/execute-ci-artifacts.sh
diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
new file mode 100644
index 000000000..3367a7f06
--- /dev/null
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -0,0 +1,77 @@
+# Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
+# Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
+# Triggers when local docs / OpenAPI / eval harness / TypeScript SDK change; ignores human-only files under results/ and tracker/README/AGENTS.
+# Each run bills Anthropic (agent + judge).
+# Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
+# (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
+# See docs/agent-evaluation/README.md § CI (recommended slice).
+name: Docs agent eval (CI slice)
+
+on:
+ push:
+ branches:
+ - main
+ paths:
+ - "docs/content/**"
+ - "docs/apis/**"
+ - "docs/agent-evaluation/**"
+ - "docs/README.md"
+ - "docs/AGENTS.md"
+ - "sdks/outpost-typescript/**"
+ - ".github/workflows/docs-agent-eval-ci.yml"
+      # One event cannot combine paths and paths-ignore; exclude with "!" patterns.
+      - "!docs/agent-evaluation/results/**"
+      - "!docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
+      - "!docs/agent-evaluation/README.md"
+      - "!docs/agent-evaluation/AGENTS.md"
+  pull_request:
+    paths:
+      - "docs/content/**"
+      - "docs/apis/**"
+      - "docs/agent-evaluation/**"
+      - "docs/README.md"
+      - "docs/AGENTS.md"
+      - "sdks/outpost-typescript/**"
+      - ".github/workflows/docs-agent-eval-ci.yml"
+      # One event cannot combine paths and paths-ignore; exclude with "!" patterns.
+      - "!docs/agent-evaluation/results/**"
+      - "!docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
+      - "!docs/agent-evaluation/README.md"
+      - "!docs/agent-evaluation/AGENTS.md"
+
+jobs:
+ eval-ci:
+ # Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
+ if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
+ runs-on: ubuntu-latest
+ timeout-minutes: 60
+ defaults:
+ run:
+ working-directory: docs/agent-evaluation
+
+ steps:
+ - name: Checkout code
+ uses: actions/checkout@v4
+
+ - name: Set up Node.js
+ uses: actions/setup-node@v4
+ with:
+ node-version: "20"
+ cache: npm
+ cache-dependency-path: docs/agent-evaluation/package-lock.json
+
+ - name: Install dependencies
+ run: npm ci
+
+ - name: Run eval CI slice (scenarios 01, 02)
+ env:
+ ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+ EVAL_TEST_DESTINATION_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+ EVAL_LOCAL_DOCS: "1"
+ run: ./scripts/ci-eval.sh
+
+ - name: Execute generated curl + TypeScript artifacts (live Outpost)
+ env:
+ OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
+ OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+ run: ./scripts/execute-ci-artifacts.sh
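Each run of this workflow bills Anthropic, so it can help to predict locally whether a change will start it. Below is a minimal bash sketch that approximates the workflow's include and exclude path lists. The function names `eval_ci_triggered_by` and `any_path_triggers_eval_ci` are hypothetical. Bash `case` globs let `*` match across `/`, which only approximates GitHub's `**` matching, so treat the result as a hint rather than a guarantee.

```sh
#!/usr/bin/env bash
# Hypothetical local approximation of the docs-agent-eval-ci.yml path filters.
# In bash `case` patterns, `*` also matches `/`, standing in for GitHub's `**`.

# Succeeds (exit 0) if this one changed path would count toward triggering the job.
eval_ci_triggered_by() {
  local path=$1
  case "$path" in
    # Exclusions are checked first, so they win over the broader includes.
    docs/agent-evaluation/results/*) return 1 ;;
    docs/agent-evaluation/SCENARIO-RUN-TRACKER.md) return 1 ;;
    docs/agent-evaluation/README.md | docs/agent-evaluation/AGENTS.md) return 1 ;;
    # Inclusions copied from the workflow's paths list.
    docs/content/* | docs/apis/* | docs/agent-evaluation/*) return 0 ;;
    docs/README.md | docs/AGENTS.md) return 0 ;;
    sdks/outpost-typescript/*) return 0 ;;
    .github/workflows/docs-agent-eval-ci.yml) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads newline-separated paths on stdin; succeeds if at least one triggers.
any_path_triggers_eval_ci() {
  local f
  while IFS= read -r f; do
    eval_ci_triggered_by "$f" && return 0
  done
  return 1
}
```

For example, `git diff --name-only origin/main...HEAD | any_path_triggers_eval_ci && echo "eval CI would run"`. GitHub starts the workflow when at least one changed file passes the filter, which is why the helper stops at the first match.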
diff --git a/.gitignore b/.gitignore
index 64578dcf3..23b769f99 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
# Environment variables
.env
+.env.ci
.outpost.yaml
# Built binaries
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 9f1392e98..7728e88d5 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -8,7 +8,7 @@ EVAL_TEST_DESTINATION_URL=
# Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project.
# The harness does not read this key; you (or a future verifier) use it after the run.
-# OUTPOST_API_KEY=
+# OUTPOST_API_KEY= # required by ./scripts/execute-ci-artifacts.sh (run after eval:ci, locally or as the GitHub Actions execution step)
# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id # often same as EVAL_TEST_DESTINATION_URL
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 40eed004b..7cae6826c 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -81,16 +81,17 @@ For **pull-request or main-branch** automation, run **two** scenarios only:
```sh
cd docs/agent-evaluation && npm ci && npm run eval:ci
# or: ./scripts/ci-eval.sh # requires ANTHROPIC_API_KEY + EVAL_TEST_DESTINATION_URL in the environment
+# after a successful eval:ci, run the live Outpost smoke (needs OUTPOST_API_KEY + OUTPOST_TEST_WEBHOOK_URL): ./scripts/execute-ci-artifacts.sh
```
`eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist.
-**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**, run from `docs/agent-evaluation` with a normal runner (Claude Agent SDK needs session filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **`OUTPOST_API_KEY`** is still not required for transcript-only CI.
+**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. The workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`**, so the agent **reads docs from the repo**. It then runs **`./scripts/execute-ci-artifacts.sh`**, which takes the **newest** **`*-scenario-01`** run and its **`*-scenario-02`** twin from **`results/runs/`**. That script runs the generated **`.sh`**, then runs **`npx tsx`** on the TypeScript artifact, after running **`npm install`** in the **02** run dir when it has a **`package.json`**. In CI, **`OUTPOST_TEST_WEBHOOK_URL`** comes from the same secret as **`EVAL_TEST_DESTINATION_URL`**. The workflow triggers on pushes to **`main`** and on **pull requests** that change any of these paths: **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, **`docs/README.md`**, **`docs/AGENTS.md`**, the workflow file itself, or **`docs/agent-evaluation/**`**. Changes to **`results/**`**, **`SCENARIO-RUN-TRACKER.md`**, **`README.md`**, and **`AGENTS.md`** under **`docs/agent-evaluation/`** are excluded. The job runs on **`ubuntu-latest`** because the Claude Agent SDK needs normal filesystem access, so avoid tight sandboxes (see **Permissions / failures** above). **Fork PRs** skip this job because repository secrets are not available to them.
- **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run.
-- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}`.
-- **`OUTPOST_API_KEY`** — **not** read by the automated runner, but **required if you want a full evaluation**: without it you can only judge the transcript (plausible curl/SDK text). To verify that **generated commands or code actually work**, put the same Outpost API key you use against the managed API in **`docs/agent-evaluation/.env`** (or export it) and run the agent’s output against a real project. The onboarding prompt tells operators to keep that key in **`.env`** and never paste it into chat.
-- **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production).
+- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}` (and, in CI, reused as **`OUTPOST_TEST_WEBHOOK_URL`** for execution).
+- **`OUTPOST_API_KEY`** — required for **`execute-ci-artifacts.sh`** and for **GitHub Actions** execution after **`eval:ci`**. For **local** transcript-only runs you can omit it. Put the key in **`docs/agent-evaluation/.env`** (or export); never paste it into chat.
+- **`EVAL_LOCAL_DOCS=1`** — Turn 0 replaces public doc URLs with **absolute paths to Markdoc (`.mdoc`) and OpenAPI files in this repo**, so the agent uses **Read** on **`docs/`** instead of **WebFetch** against production. Use it locally when validating unpublished docs. **GitHub Actions** sets it for **`docs-agent-eval-ci.yml`**.
- **`EVAL_SKIP_HARNESS_PRE_STEPS=1`** — skip **`git_clone`** (and any future **`preSteps`**) declared in a scenario’s **`## Eval harness`** JSON block; useful offline or when the baseline folder is already present.
- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) (`## Template`) with placeholders filled from environment variables.
@@ -117,7 +118,7 @@ Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOO
### Transcript vs execution (full pass)
-`npm run eval` only captures **what the model produced**; it does **not** call Outpost. Treat that as **transcript review**.
+`npm run eval` only captures **what the model produced**; by itself it does **not** call Outpost (transcript review). **`./scripts/execute-ci-artifacts.sh`** (and the **GitHub Actions** workflow’s second step) runs the **01** shell + **02** TypeScript outputs against **live** Outpost when **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** are set.
A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them:
diff --git a/docs/agent-evaluation/results/README.md b/docs/agent-evaluation/results/README.md
index 0ed815986..9fe1615cc 100644
--- a/docs/agent-evaluation/results/README.md
+++ b/docs/agent-evaluation/results/README.md
@@ -36,7 +36,7 @@ npm run score -- --run results/runs/-scenario-NN --write
npm run score -- --run results/runs/-scenario-NN --llm --write
```
-**Execution** (curl/SDK against live Outpost with `OUTPOST_API_KEY`) is **not** produced by these JSON files. Treat the **Execution (full pass)** rows in `[../scenarios/](../scenarios/)` as a separate human or CI step unless you add a verifier script.
+**Execution** (curl/SDK against live Outpost with `OUTPOST_API_KEY`) is **not** recorded in these JSON files. To execute the **01**/**02** artifacts, run **`../scripts/execute-ci-artifacts.sh`** after **`eval:ci`**. In GitHub Actions, the second step of **`.github/workflows/docs-agent-eval-ci.yml`** does this for you. Record human notes in the **Execution (full pass)** rows in [`../scenarios/`](../scenarios/).
---
diff --git a/docs/agent-evaluation/scripts/ci-eval.sh b/docs/agent-evaluation/scripts/ci-eval.sh
index 4197c8b92..980442967 100755
--- a/docs/agent-evaluation/scripts/ci-eval.sh
+++ b/docs/agent-evaluation/scripts/ci-eval.sh
@@ -5,6 +5,7 @@
# Optional: same vars in docs/agent-evaluation/.env for local runs.
#
# Scenarios: 01 = curl quickstart shape; 02 = TypeScript SDK script. See README § CI.
+# After success, run ./scripts/execute-ci-artifacts.sh with OUTPOST_API_KEY + OUTPOST_TEST_WEBHOOK_URL for live Outpost (CI does this automatically).
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
diff --git a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
new file mode 100755
index 000000000..1e67ae1da
--- /dev/null
+++ b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
@@ -0,0 +1,99 @@
+#!/usr/bin/env bash
+# After a successful eval:ci (same ISO stamp for scenario-01 and scenario-02), run generated
+# curl script and TypeScript quickstart against live Outpost (tenant → destination → publish).
+#
+# Required env: OUTPOST_API_KEY, OUTPOST_TEST_WEBHOOK_URL (often same URL as EVAL_TEST_DESTINATION_URL)
+# Optional: OUTPOST_API_BASE_URL (managed default if unset)
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+RUNS="$ROOT/results/runs"
+
+if [[ -z "${OUTPOST_API_KEY:-}" ]]; then
+ echo "execute-ci-artifacts: OUTPOST_API_KEY is not set" >&2
+ exit 1
+fi
+if [[ -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
+ echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL is not set" >&2
+ exit 1
+fi
+
+if [[ ! -d "$RUNS" ]]; then
+ echo "execute-ci-artifacts: missing $RUNS (run eval:ci first)" >&2
+ exit 1
+fi
+
+# Latest scenario-01 run directory by mtime (same batch shares stamp with scenario-02).
+d01=""
+best=0
+for d in "$RUNS"/*-scenario-01; do
+ [[ -d "$d" ]] || continue
+ m=$(stat -c %Y "$d" 2>/dev/null || stat -f %m "$d")
+ if (( m >= best )); then
+ best=$m
+ d01=$d
+ fi
+done
+
+if [[ -z "$d01" ]]; then
+ echo "execute-ci-artifacts: no *-scenario-01 directory under $RUNS" >&2
+ exit 1
+fi
+
+prefix=${d01%-scenario-01}
+d02="${prefix}-scenario-02"
+if [[ ! -d "$d02" ]]; then
+ echo "execute-ci-artifacts: expected paired run dir missing: $d02" >&2
+ exit 1
+fi
+
+pick_sh() {
+ local dir=$1 f
+ for f in "$dir"/*quickstart*.sh "$dir"/outpost*.sh; do
+ [[ -f "$f" ]] && { echo "$f"; return 0; }
+ done
+ for f in "$dir"/*.sh; do
+ [[ -f "$f" ]] && { echo "$f"; return 0; }
+ done
+ return 1
+}
+
+pick_ts() {
+ local dir=$1 f
+ for f in "$dir"/outpost-quickstart.ts "$dir"/*quickstart*.ts; do
+ [[ -f "$f" ]] && { echo "$f"; return 0; }
+ done
+ for f in "$dir"/*.ts; do
+ [[ -f "$f" ]] && { echo "$f"; return 0; }
+ done
+ return 1
+}
+
+echo "execute-ci-artifacts: scenario 01 dir=$d01"
+sh_path=$(pick_sh "$d01") || {
+ echo "execute-ci-artifacts: no .sh script found in $d01" >&2
+ exit 1
+}
+echo "execute-ci-artifacts: running bash $sh_path"
+export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
+[[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
+chmod +x "$sh_path" 2>/dev/null || true
+# Run from the scenario 01 run dir so relative paths in the generated script behave.
+cd "$d01"
+bash "$sh_path"
+
+echo "execute-ci-artifacts: scenario 02 dir=$d02"
+ts_path=$(pick_ts "$d02") || {
+ echo "execute-ci-artifacts: no .ts file found in $d02" >&2
+ exit 1
+}
+echo "execute-ci-artifacts: running npx tsx $ts_path (from $d02)"
+cd "$d02"
+if [[ -f package.json ]]; then
+ npm install --no-audit --no-fund
+fi
+export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
+[[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
+npx --yes tsx "$ts_path"
+
+echo "execute-ci-artifacts: OK (scenario 01 shell + scenario 02 TypeScript)"
diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts
index b3c4df2c9..2dbfb3d59 100644
--- a/docs/agent-evaluation/src/score-transcript.ts
+++ b/docs/agent-evaluation/src/score-transcript.ts
@@ -24,7 +24,7 @@ export interface ScoreReport {
readonly scenarioId: string;
readonly scenarioFile: string;
readonly transcript: TranscriptScore;
- /** Automated harness does not run Outpost; execution stays manual or a future verifier. */
+ /** Automated harness does not run Outpost; use `scripts/execute-ci-artifacts.sh` or CI for live 01/02 smoke. */
readonly execution: { readonly status: "not_automated"; readonly note: string };
/** null when no automated transcript rubric exists for this scenario yet */
readonly overallTranscriptPass: boolean | null;
From 736a23fcf11b5cd208efe1dd5283b5279388749c Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Sat, 11 Apr 2026 00:20:47 +0100
Subject: [PATCH 44/47] ci(docs): allow workflow_dispatch for manual agent eval
runs
Made-with: Cursor
---
.github/workflows/docs-agent-eval-ci.yml | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index 3367a7f06..6647af05f 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -8,6 +8,7 @@
name: Docs agent eval (CI slice)
on:
+ workflow_dispatch:
push:
branches:
- main
@@ -42,7 +43,7 @@ on:
jobs:
eval-ci:
# Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
- if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
+ if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
runs-on: ubuntu-latest
timeout-minutes: 60
defaults:
From 9ab377128eaffc3829de2d01c02453eb0d06e8c6 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Sat, 11 Apr 2026 00:22:12 +0100
Subject: [PATCH 45/47] ci(docs): fix workflow YAML (paths vs paths-ignore);
document dispatch
GitHub rejects paths + paths-ignore on the same event; drop paths-ignore.
README: manual workflow_dispatch; note broader path matches.
Made-with: Cursor
---
.github/workflows/docs-agent-eval-ci.yml | 12 +-----------
docs/agent-evaluation/README.md | 2 +-
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index 6647af05f..f5ea2c63d 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -1,6 +1,6 @@
# Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
# Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
-# Triggers when local docs / OpenAPI / eval harness / TypeScript SDK change; ignores human-only files under results/ and tracker/README/AGENTS.
+# Triggers: workflow_dispatch, or push (main) / pull_request when docs / OpenAPI / agent-eval / TS SDK paths change.
# Each run bills Anthropic (agent + judge).
# Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
# (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
@@ -20,11 +20,6 @@ on:
- "docs/AGENTS.md"
- "sdks/outpost-typescript/**"
- ".github/workflows/docs-agent-eval-ci.yml"
- paths-ignore:
- - "docs/agent-evaluation/results/**"
- - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
- - "docs/agent-evaluation/README.md"
- - "docs/agent-evaluation/AGENTS.md"
pull_request:
paths:
- "docs/content/**"
@@ -34,11 +29,6 @@ on:
- "docs/AGENTS.md"
- "sdks/outpost-typescript/**"
- ".github/workflows/docs-agent-eval-ci.yml"
- paths-ignore:
- - "docs/agent-evaluation/results/**"
- - "docs/agent-evaluation/SCENARIO-RUN-TRACKER.md"
- - "docs/agent-evaluation/README.md"
- - "docs/agent-evaluation/AGENTS.md"
jobs:
eval-ci:
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 7cae6826c..14df8f51e 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -86,7 +86,7 @@ cd docs/agent-evaluation && npm ci && npm run eval:ci
`eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist.
-**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. Workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`** (agent **reads docs from the repo**), then **`./scripts/execute-ci-artifacts.sh`**: picks the **newest** **`*-scenario-01`** / **`*-scenario-02`** pair from **`results/runs/`**, runs the generated **`.sh`** then **`npx tsx`** on the TypeScript artifact (**`npm install`** in the **02** run dir when **`package.json`** exists). **`OUTPOST_TEST_WEBHOOK_URL`** in CI is set from the same secret as **`EVAL_TEST_DESTINATION_URL`**. Triggers on pushes to **`main`** and on **pull requests** when **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, root **`docs/README.md`** / **`docs/AGENTS.md`**, or **`docs/agent-evaluation/**`** change, except **`paths-ignore`**: **`results/**`**, **`SCENARIO-RUN-TRACKER.md`**, **`README.md`**, and **`AGENTS.md`** under **`docs/agent-evaluation/`**. Uses **`ubuntu-latest`** (Claude Agent SDK needs normal filesystem access — avoid tight sandboxes; see **Permissions / failures** above). **Fork PRs** skip this job (secrets are not available).
+**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**, and **`OUTPOST_API_KEY`**. Workflow **`.github/workflows/docs-agent-eval-ci.yml`** runs **`./scripts/ci-eval.sh`** with **`EVAL_LOCAL_DOCS=1`** (agent **reads docs from the repo**), then **`./scripts/execute-ci-artifacts.sh`**, which picks the **newest** **`*-scenario-01`** / **`*-scenario-02`** pair from **`results/runs/`** and runs the generated **`.sh`**, then **`npx tsx`** on the TypeScript artifact (**`npm install`** in the **02** run dir when **`package.json`** exists). **`OUTPOST_TEST_WEBHOOK_URL`** in CI is set from the same secret as **`EVAL_TEST_DESTINATION_URL`**. The workflow triggers on **`workflow_dispatch`** (manual: Actions → **Docs agent eval (CI slice)** → **Run workflow**, pick branch), and on pushes to **`main`** or **pull requests** that change **`docs/content/**`**, **`docs/apis/**`**, **`sdks/outpost-typescript/**`**, root **`docs/README.md`** / **`docs/AGENTS.md`**, or **`docs/agent-evaluation/**`**. GitHub does not allow **`paths`** and **`paths-ignore`** on the same event, so edits such as **`docs/agent-evaluation/README.md`** also match **`docs/agent-evaluation/**`** and can trigger a run. Uses **`ubuntu-latest`** (the Claude Agent SDK needs normal filesystem access, so avoid tight sandboxes; see **Permissions / failures** above). **Fork PRs** skip this job (secrets are not available).
- **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run.
- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}` (and, in CI, reused as **`OUTPOST_TEST_WEBHOOK_URL`** for execution).
From 49d571354683758a31545b6dee344d9b9b7a6a27 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Sat, 11 Apr 2026 00:24:17 +0100
Subject: [PATCH 46/47] =?UTF-8?q?fix(agent-eval):=20eval:ci=20argv=20?=
=?UTF-8?q?=E2=80=94=20drop=20stray=20--=20before=20--scenarios?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
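Node's util.parseArgs treats a bare -- as the end of options, so every argument
after it is positional. Node only consumes -- when it appears before the script
path; after the script path it is passed through to the script, so with
"node ... src/run-agent-eval.ts -- --scenarios 01,02" the --scenarios flag
failed with ERR_PARSE_ARGS_UNEXPECTED_POSITIONAL in CI.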
Made-with: Cursor
---
docs/agent-evaluation/package.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json
index 900af5e2d..73d7d379d 100644
--- a/docs/agent-evaluation/package.json
+++ b/docs/agent-evaluation/package.json
@@ -6,7 +6,7 @@
"description": "Claude Agent SDK harness for Outpost onboarding scenario evals",
"scripts": {
"eval": "node --import tsx src/run-agent-eval.ts",
- "eval:ci": "node --import tsx src/run-agent-eval.ts -- --scenarios 01,02",
+ "eval:ci": "node --import tsx src/run-agent-eval.ts --scenarios 01,02",
"eval:tsx-cli": "tsx src/run-agent-eval.ts",
"score": "node --import tsx src/score-eval.ts",
"typecheck": "tsc --noEmit"
From 052e48f2d9dc3c1472d4e5abc0705d7a26c8fc93 Mon Sep 17 00:00:00 2001
From: Phil Leggetter
Date: Sat, 11 Apr 2026 11:45:06 +0100
Subject: [PATCH 47/47] fix(agent-eval): execution defaults, smoke test, CI env
for live Outpost
- execute-ci-artifacts: fall back to EVAL_TEST_DESTINATION_URL for the webhook URL;
default OUTPOST_API_BASE_URL with := (an empty value from .env no longer drops
the /2025-07-01 version path); clearer errors on shell/TS failure
- Add smoke-test-execute-ci-artifacts.sh + npm run smoke:execute-ci (destination
topics ["*"]; loads .env, then .env.ci)
- CI execution step: OUTPOST_API_BASE_URL + OUTPOST_CI_PUBLISH_TOPIC
- README troubleshooting (404) and .env.example OUTPOST_CI_PUBLISH_TOPIC
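
Minimal sketch (POSIX sh) of the expansion semantics behind the first bullet,
assuming .env leaves OUTPOST_API_BASE_URL defined but blank. Variable names
mirror execute-ci-artifacts.sh; the webhook URL is a placeholder, not a real
endpoint.

```sh
# Sketch: why ":=" (not "=") is needed when .env leaves a variable blank.
# Names mirror execute-ci-artifacts.sh; the webhook URL is a placeholder.
default_base="https://api.outpost.hookdeck.com/2025-07-01"

# "${VAR=default}" assigns only when VAR is unset, so a blank value survives.
OUTPOST_API_BASE_URL=""
: "${OUTPOST_API_BASE_URL=$default_base}"
after_plain="$OUTPOST_API_BASE_URL"
printf 'with =  : [%s]\n' "$after_plain"

# "${VAR:=default}" assigns when VAR is unset OR empty, so the default wins.
OUTPOST_API_BASE_URL=""
: "${OUTPOST_API_BASE_URL:=$default_base}"
printf 'with := : [%s]\n' "$OUTPOST_API_BASE_URL"

# Nested ":-" fallback: prefer OUTPOST_TEST_WEBHOOK_URL, else EVAL_TEST_DESTINATION_URL.
OUTPOST_TEST_WEBHOOK_URL=""
EVAL_TEST_DESTINATION_URL="https://example.com/webhook"
OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
printf 'webhook : [%s]\n' "$OUTPOST_TEST_WEBHOOK_URL"
```

Run standalone, the "=" line prints an empty value while the ":=" line prints
the managed URL, which is the difference this commit relies on.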
Made-with: Cursor
---
.github/workflows/docs-agent-eval-ci.yml | 2 +
docs/agent-evaluation/.env.example | 1 +
docs/agent-evaluation/README.md | 10 ++
docs/agent-evaluation/package.json | 1 +
.../scripts/execute-ci-artifacts.sh | 18 ++-
.../smoke-test-execute-ci-artifacts.sh | 126 ++++++++++++++++++
6 files changed, 155 insertions(+), 3 deletions(-)
create mode 100755 docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
diff --git a/.github/workflows/docs-agent-eval-ci.yml b/.github/workflows/docs-agent-eval-ci.yml
index f5ea2c63d..49fb76e87 100644
--- a/.github/workflows/docs-agent-eval-ci.yml
+++ b/.github/workflows/docs-agent-eval-ci.yml
@@ -65,4 +65,6 @@ jobs:
env:
OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
+ OUTPOST_API_BASE_URL: https://api.outpost.hookdeck.com/2025-07-01
+ OUTPOST_CI_PUBLISH_TOPIC: user.created
run: ./scripts/execute-ci-artifacts.sh
diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example
index 7728e88d5..79e210a37 100644
--- a/docs/agent-evaluation/.env.example
+++ b/docs/agent-evaluation/.env.example
@@ -11,6 +11,7 @@ EVAL_TEST_DESTINATION_URL=
# OUTPOST_API_KEY= # required for ./scripts/execute-ci-artifacts.sh after eval:ci; GitHub Actions CI execution step
# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id # often same as EVAL_TEST_DESTINATION_URL
+# OUTPOST_CI_PUBLISH_TOPIC=user.created # optional; publish topic for npm run smoke:execute-ci (must exist in project)
# Optional (see npm run eval -- --help)
# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md
index 14df8f51e..1c5799797 100644
--- a/docs/agent-evaluation/README.md
+++ b/docs/agent-evaluation/README.md
@@ -120,6 +120,16 @@ Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOO
`npm run eval` only captures **what the model produced**; by itself it does **not** call Outpost (transcript review). **`./scripts/execute-ci-artifacts.sh`** (and the **GitHub Actions** workflow’s second step) runs the **01** shell + **02** TypeScript outputs against **live** Outpost when **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** are set.
+**Local smoke (no agent):** to verify secrets and the managed API the same way CI does—without depending on a fresh eval transcript—run from **`docs/agent-evaluation/`** with **`OUTPOST_API_KEY`** and **`OUTPOST_TEST_WEBHOOK_URL`** set (e.g. **`source .env`**):
+
+```sh
+npm run smoke:execute-ci
+```
+
+That writes a temporary **`*-scenario-01` / `*-scenario-02`** pair under **`results/runs/`** with hand-maintained scripts. Both destinations subscribe with **`topics: ["*"]`**, so you do not need every topic name pre-created; publish still uses **`OUTPOST_CI_PUBLISH_TOPIC`** (default **`user.created`**, overridable in the environment), which **must exist** in your Outpost project’s topic list. Because **`execute-ci-artifacts.sh`** was not exercised end to end in the repo before it ran in CI, use this command to check it after changing the execution logic.
+
+**CI `curl: (22) … 404`:** the agent-generated shell script is calling an Outpost URL that returned **404**. Common causes: wrong **`OUTPOST_API_BASE_URL`** in the script (CI now sets the managed URL explicitly), or a **publish/destination topic** that does not exist in the project tied to **`OUTPOST_API_KEY`**. Ensure **`user.created`** is configured in that project, or set **`OUTPOST_CI_PUBLISH_TOPIC`** to a topic you do have. Compare the failing **`curl`** line in the Actions log with the [curl quickstart](../content/quickstarts/hookdeck-outpost-curl.mdoc).
+
A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them:
1. Add **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** when the artifact expects them) to `docs/agent-evaluation/.env` so your shell has them after `dotenv` or when you `source` / copy into the directory where you run the code.
diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json
index 73d7d379d..f9812c162 100644
--- a/docs/agent-evaluation/package.json
+++ b/docs/agent-evaluation/package.json
@@ -7,6 +7,7 @@
"scripts": {
"eval": "node --import tsx src/run-agent-eval.ts",
"eval:ci": "node --import tsx src/run-agent-eval.ts --scenarios 01,02",
+ "smoke:execute-ci": "bash scripts/smoke-test-execute-ci-artifacts.sh",
"eval:tsx-cli": "tsx src/run-agent-eval.ts",
"score": "node --import tsx src/score-eval.ts",
"typecheck": "tsc --noEmit"
diff --git a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
index 1e67ae1da..03c046d8c 100755
--- a/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
+++ b/docs/agent-evaluation/scripts/execute-ci-artifacts.sh
@@ -13,11 +13,17 @@ if [[ -z "${OUTPOST_API_KEY:-}" ]]; then
echo "execute-ci-artifacts: OUTPOST_API_KEY is not set" >&2
exit 1
fi
+export OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
if [[ -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
- echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL is not set" >&2
+ echo "execute-ci-artifacts: OUTPOST_TEST_WEBHOOK_URL or EVAL_TEST_DESTINATION_URL must be set" >&2
exit 1
fi
+# Managed API default (agent-generated scripts often expect this in the environment).
+# Use := so empty string from .env is treated like unset (otherwise curl hits /tenants without /2025-07-01 → 404).
+: "${OUTPOST_API_BASE_URL:=https://api.outpost.hookdeck.com/2025-07-01}"
+export OUTPOST_API_BASE_URL
+
if [[ ! -d "$RUNS" ]]; then
echo "execute-ci-artifacts: missing $RUNS (run eval:ci first)" >&2
exit 1
@@ -80,7 +86,10 @@ export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
chmod +x "$sh_path" 2>/dev/null || true
# Run from the scenario 01 run dir so relative paths in the generated script behave.
cd "$d01"
-bash "$sh_path"
+bash "$sh_path" || {
+ echo "execute-ci-artifacts: scenario 01 shell failed (curl exit 22 = HTTP error). 404 is often a wrong path or a publish/destination topic that is not configured in your Outpost project. Set OUTPOST_API_BASE_URL if needed; try npm run smoke:execute-ci (uses destination topics [\"*\"])." >&2
+ exit 1
+}
echo "execute-ci-artifacts: scenario 02 dir=$d02"
ts_path=$(pick_ts "$d02") || {
@@ -94,6 +103,9 @@ if [[ -f package.json ]]; then
fi
export OUTPOST_API_KEY OUTPOST_TEST_WEBHOOK_URL
[[ -n "${OUTPOST_API_BASE_URL:-}" ]] && export OUTPOST_API_BASE_URL
-npx --yes tsx "$ts_path"
+npx --yes tsx "$ts_path" || {
+ echo "execute-ci-artifacts: scenario 02 TypeScript failed. Check OUTPOST_API_KEY, OUTPOST_TEST_WEBHOOK_URL, and that OUTPOST_CI_PUBLISH_TOPIC (default user.created) exists in the project. Try: npm run smoke:execute-ci" >&2
+ exit 1
+}
echo "execute-ci-artifacts: OK (scenario 01 shell + scenario 02 TypeScript)"
diff --git a/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh b/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
new file mode 100755
index 000000000..e85d1869b
--- /dev/null
+++ b/docs/agent-evaluation/scripts/smoke-test-execute-ci-artifacts.sh
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+# Local / operator check for the same path as CI: materialize a fresh *-scenario-01 / *-scenario-02
+# pair with hand-maintained scripts (wildcard destination topics), then run execute-ci-artifacts.sh.
+#
+# Requires: OUTPOST_API_KEY, OUTPOST_TEST_WEBHOOK_URL (source docs/agent-evaluation/.env or export)
+# Optional: OUTPOST_API_BASE_URL, OUTPOST_CI_PUBLISH_TOPIC (default user.created — must exist in your project)
+#
+# Does not invoke the agent. Use this to verify secrets and managed API before relying on CI execution.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$ROOT"
+if [[ -f .env ]]; then
+ set -a
+ # shellcheck disable=SC1091
+ source .env
+ set +a
+fi
+if [[ -f .env.ci ]]; then
+ set -a
+ # shellcheck disable=SC1091
+ source .env.ci
+ set +a
+fi
+
+# Same as CI: webhook URL is often stored as EVAL_TEST_DESTINATION_URL in .env / .env.ci
+export OUTPOST_TEST_WEBHOOK_URL="${OUTPOST_TEST_WEBHOOK_URL:-${EVAL_TEST_DESTINATION_URL:-}}"
+
+if [[ -z "${OUTPOST_API_KEY:-}" || -z "${OUTPOST_TEST_WEBHOOK_URL:-}" ]]; then
+ echo "smoke-test-execute-ci: set OUTPOST_API_KEY and OUTPOST_TEST_WEBHOOK_URL (or EVAL_TEST_DESTINATION_URL), e.g. source .env" >&2
+ exit 1
+fi
+
+RUNS="$ROOT/results/runs"
+mkdir -p "$RUNS"
+
+STAMP="ci-smoke-$(date -u +%Y-%m-%dT%H-%M-%S)-$(printf '%03d' $((RANDOM % 1000)))Z"
+d01="$RUNS/${STAMP}-scenario-01"
+d02="$RUNS/${STAMP}-scenario-02"
+mkdir -p "$d01" "$d02"
+
+PUBLISH_TOPIC="${OUTPOST_CI_PUBLISH_TOPIC:-user.created}"
+
+# Shell: managed API, unique tenant, destination topics * (no dashboard topic list required), then publish.
+cat > "$d01/outpost_quickstart.sh" << 'EOSH'
+#!/usr/bin/env bash
+set -euo pipefail
+BASE="${OUTPOST_API_BASE_URL:-https://api.outpost.hookdeck.com/2025-07-01}"
+TENANT_ID="ci_smoke_${RANDOM}_$(date +%s)"
+TOPIC="${OUTPOST_CI_PUBLISH_TOPIC:-user.created}"
+DEST_JSON="$(OUTPOST_TEST_WEBHOOK_URL="$OUTPOST_TEST_WEBHOOK_URL" python3 -c '
+import json, os
+print(json.dumps({"type": "webhook", "topics": ["*"], "config": {"url": os.environ["OUTPOST_TEST_WEBHOOK_URL"]}}))
+')"
+curl -sS -f -X PUT "$BASE/tenants/$TENANT_ID" \
+ -H "Authorization: Bearer $OUTPOST_API_KEY" -o /dev/null
+curl -sS -f -X POST "$BASE/tenants/$TENANT_ID/destinations" \
+ -H "Authorization: Bearer $OUTPOST_API_KEY" -H "Content-Type: application/json" \
+ -d "$DEST_JSON" -o /dev/null
+curl -sS -f -X POST "$BASE/publish" \
+ -H "Authorization: Bearer $OUTPOST_API_KEY" -H "Content-Type: application/json" \
+ -d "$(TENANT_ID="$TENANT_ID" TOPIC="$TOPIC" python3 -c '
+import json, os
+print(json.dumps({
+ "tenant_id": os.environ["TENANT_ID"],
+ "topic": os.environ["TOPIC"],
+ "eligible_for_retry": True,
+ "metadata": {"source": "ci-smoke-sh"},
+ "data": {"smoke": True},
+}))
+')" -o /dev/null -w "publish_http=%{http_code}\n"
+echo "smoke shell OK tenant=$TENANT_ID"
+EOSH
+chmod +x "$d01/outpost_quickstart.sh"
+
+# TypeScript: same semantics (wildcard subscription); publish uses OUTPOST_CI_PUBLISH_TOPIC.
+cat > "$d02/package.json" << 'EOJSON'
+{
+ "name": "ci-smoke-outpost-ts",
+ "private": true,
+ "type": "module",
+ "dependencies": {
+ "@hookdeck/outpost-sdk": "^0.9.0"
+ }
+}
+EOJSON
+
+cat > "$d02/outpost-quickstart.ts" << 'EOTS'
+import { Outpost } from "@hookdeck/outpost-sdk";
+
+const apiKey = process.env.OUTPOST_API_KEY;
+if (!apiKey) throw new Error("Set OUTPOST_API_KEY");
+const webhookUrl = process.env.OUTPOST_TEST_WEBHOOK_URL;
+if (!webhookUrl) throw new Error("Set OUTPOST_TEST_WEBHOOK_URL");
+
+const outpost = new Outpost({
+ apiKey,
+ ...(process.env.OUTPOST_API_BASE_URL
+ ? { serverURL: process.env.OUTPOST_API_BASE_URL }
+ : {}),
+});
+
+const tenantId = `ci_smoke_ts_${Math.random().toString(36).slice(2)}_${Date.now()}`;
+const topic = process.env.OUTPOST_CI_PUBLISH_TOPIC ?? "user.created";
+
+await outpost.tenants.upsert(tenantId);
+await outpost.destinations.create(tenantId, {
+ type: "webhook",
+ topics: ["*"],
+ config: { url: webhookUrl },
+});
+const published = await outpost.publish.event({
+ tenantId,
+ topic,
+ eligibleForRetry: true,
+ metadata: { source: "ci-smoke-ts" },
+ data: { smoke: true },
+});
+console.log("smoke ts OK event id:", published.id);
+EOTS
+
+touch "$d01" "$d02"
+echo "smoke-test-execute-ci: wrote $d01 and $d02 (publish topic=$PUBLISH_TOPIC)"
+export OUTPOST_CI_PUBLISH_TOPIC="$PUBLISH_TOPIC"
+./scripts/execute-ci-artifacts.sh
+echo "smoke-test-execute-ci: OK"