From 1638fe15c58c8fbaa42d0d4b22b90101787e2466 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Tue, 7 Apr 2026 13:03:41 +0100 Subject: [PATCH 01/47] docs: add Hookdeck Outpost managed quickstarts and agent prompt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add self-contained quickstarts for curl, TypeScript, Python, and Go against the managed API, with Settings → Secrets, env-based examples, and verification via Hookdeck Console and project logs. Nest Quickstarts nav under Hookdeck Outpost (above Self-Hosted) and add an agent prompt template page for dashboard copy/paste. Include TEMP-hookdeck-outpost-onboarding-status.md for GA tracking. Made-with: Cursor --- ...TEMP-hookdeck-outpost-onboarding-status.md | 29 ++++ docs/pages/quickstarts.mdx | 14 +- .../hookdeck-outpost-agent-prompt.mdx | 68 ++++++++ .../quickstarts/hookdeck-outpost-curl.mdx | 96 +++++++++++ .../pages/quickstarts/hookdeck-outpost-go.mdx | 163 ++++++++++++++++++ .../quickstarts/hookdeck-outpost-python.mdx | 134 ++++++++++++++ .../hookdeck-outpost-typescript.mdx | 135 +++++++++++++++ docs/zudoku.config.ts | 63 +++++-- 8 files changed, 690 insertions(+), 12 deletions(-) create mode 100644 docs/TEMP-hookdeck-outpost-onboarding-status.md create mode 100644 docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx create mode 100644 docs/pages/quickstarts/hookdeck-outpost-curl.mdx create mode 100644 docs/pages/quickstarts/hookdeck-outpost-go.mdx create mode 100644 docs/pages/quickstarts/hookdeck-outpost-python.mdx create mode 100644 docs/pages/quickstarts/hookdeck-outpost-typescript.mdx diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md new file mode 100644 index 000000000..9faa176f0 --- /dev/null +++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md @@ -0,0 +1,29 @@ +# Hookdeck Outpost onboarding — status (temporary) + +**Purpose:** Track implementation status for the managed quickstarts, agent 
prompt, and related work. **Delete this file** when tracking moves elsewhere (e.g. Linear, parent epic). + +**Last updated:** 2026-04-07 + +--- + +## Done (Outpost OSS repo) + +- Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx` +- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` +- Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**) +- `quickstarts.mdx` index: managed vs self-hosted links +- Content aligned with product copy: API key from **Settings → Secrets**, standard markdown (no `:::tip`), verify via Hookdeck Console + project logs +- SDK examples: env vars section, numbered quickstart scripts with step comments + +## Pending / follow-up + +- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm all doc links resolve on production docs URL +- **Test destination URL:** When `console.hookdeck.com` (or equivalent) has a stable public URL format, update quickstarts if it replaces “create a Console Source” instructions +- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection (`{{API_BASE_URL}}`, `{{TOPICS_LIST}}`, `{{TEST_DESTINATION_URL}}`, `{{DOCS_URL}}`, optional `{{LLMS_FULL_URL}}`); env var UI for `OUTPOST_API_KEY` (not in prompt body) +- **Hookdeck Astro site:** Consume MDX, `llms.txt` / `llms-full.txt` / `.md` exports, canonical `DOCS_URL` (e.g. 
`https://hookdeck.com/outpost/docs`) +- **Deferred (not blocking GA):** Broader docs IA (“Self-Hosted” under Guides, redirects for moved pages) per original plan + +## References + +- OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`) +- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx` \ No newline at end of file diff --git a/docs/pages/quickstarts.mdx b/docs/pages/quickstarts.mdx index e5a74ee7e..13f6aaa5b 100644 --- a/docs/pages/quickstarts.mdx +++ b/docs/pages/quickstarts.mdx @@ -2,7 +2,19 @@ title: "Outpost Quickstarts" --- -Get started with Outpost by following one of the quickstarts: +## Hookdeck Outpost (managed) + +Use Hookdeck’s hosted Outpost API with your dashboard API key and preconfigured topics: + +- [curl](/docs/quickstarts/hookdeck-outpost-curl) +- [TypeScript](/docs/quickstarts/hookdeck-outpost-typescript) +- [Python](/docs/quickstarts/hookdeck-outpost-python) +- [Go](/docs/quickstarts/hookdeck-outpost-go) +- [Agent prompt template](/docs/quickstarts/hookdeck-outpost-agent-prompt) (for AI-assisted integration) + +## Self-hosted + +Run Outpost in your own infrastructure: - [Docker with RabbitMQ or AWS SQS via LocalStack](/docs/quickstarts/docker) - [Kubernetes with RabbitMQ](/docs/quickstarts/kubernetes) diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx new file mode 100644 index 000000000..1f2a3a394 --- /dev/null +++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx @@ -0,0 +1,68 @@ +--- +title: "Hookdeck Outpost — agent prompt template" +description: "Copy-paste template for AI coding agents. Dashboard teams should inject the placeholders server-side or client-side." +--- + +This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). 
**Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately. API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs). + +## Template + +``` +## Hookdeck Outpost integration + +You are helping integrate Hookdeck Outpost into a platform to deliver events (webhooks and event destinations) to the platform's customers. + +### Credentials + +- API base URL: {{API_BASE_URL}} +- API key (Outpost API key from the project **Settings → Secrets**): read from the `OUTPOST_API_KEY` environment variable (never ask the user to paste the key into chat) + +### Configured topics + +{{TOPICS_LIST}} + +### Test destination + +Use this URL to verify event delivery (webhook destination): {{TEST_DESTINATION_URL}} + +### Documentation + +- Getting started (curl): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl +- TypeScript quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript +- Python quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-python +- Go quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-go +- Full docs bundle (when available on the public site): {{LLMS_FULL_URL}} +- API reference: {{DOCS_URL}}/api +- Destination types: {{DOCS_URL}}/destinations +- SDK documentation: {{DOCS_URL}}/sdks + +### What to do + +Ask the user which of the following they want: + +1. **Try it out** — Create a minimal script that runs through the full flow: create a tenant, add a webhook destination, publish a test event. Ask which language they prefer (TypeScript, Python, Go, or curl) and follow the matching quickstart doc. + +2. **Build a minimal example** — Scaffold a small app with a simple UI that demonstrates tenant creation, destination management, and event publishing. Ask which framework they prefer. + +3. 
**Integrate with an existing app** — Inspect the codebase for language and framework, then integrate Outpost: add the SDK (or use REST), create tenants when customers onboard, and publish events at the right points in application logic. + +For all modes, read the relevant quickstart documentation before writing code. + +**Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. Topics for this project are listed above and were configured in the Hookdeck dashboard. +``` + +## Placeholder reference + +| Placeholder | Example | Notes | +|-------------|---------|--------| +| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt | +| `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config | +| `{{TEST_DESTINATION_URL}}` | Unique URL from Hookdeck Console Source, or operator’s test endpoint | May be TBC until `console.hookdeck.com` flow is finalized | +| `{{DOCS_URL}}` | `https://hookdeck.com/outpost/docs` | Public docs root (no trailing slash) | +| `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet | + +## Operator checklist (dashboard UI) + +- Show **API base URL** and **topics** next to the copyable prompt. +- Explain that the **API key** is the Outpost API key from **Settings → Secrets**, and show **environment variables**: `OUTPOST_API_KEY` (value with copy button), optional `OUTPOST_API_BASE_URL`, and `OUTPOST_TEST_WEBHOOK_URL` when the quickstart examples need a test webhook URL. +- Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history. 
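
A minimal sketch of how the `{{PLACEHOLDERS}}` in the template could be rendered before the prompt is shown to the operator. The helper name `render_prompt` and the choice to fail on a missing value are assumptions for illustration, not the dashboard's or the eval runner's actual implementation. The API key is deliberately not a placeholder, in line with the checklist above.

```python
import re

# Matches placeholders of the form {{NAME}} as used in the template above.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {{NAME}} with values[NAME]; raise if a value is missing.

    Hypothetical helper: failing loudly avoids shipping a prompt that still
    contains a raw placeholder.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder {name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Two lines borrowed from the template; the values are the examples from the
# placeholder reference table. OUTPOST_API_KEY is intentionally absent.
template = "- API base URL: {{API_BASE_URL}}\n- API reference: {{DOCS_URL}}/api"
rendered = render_prompt(
    template,
    {
        "API_BASE_URL": "https://api.outpost.hookdeck.com/2025-07-01",
        "DOCS_URL": "https://hookdeck.com/outpost/docs",
    },
)
print(rendered)
```

If `{{LLMS_FULL_URL}}` is not live yet, the table above suggests omitting that line from the template before rendering rather than substituting an empty value.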
diff --git a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx new file mode 100644 index 000000000..c7614b614 --- /dev/null +++ b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx @@ -0,0 +1,96 @@ +--- +title: "Hookdeck Outpost Quickstart: curl" +--- + +[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service: a control plane and delivery layer for event destinations (webhooks, queues, and more) scoped per **tenant**—each tenant is one of your platform’s customers. + +This quickstart uses the REST API with `curl`. Topics are assumed to be configured already in the Hookdeck dashboard; use a topic name that exists there when you publish. + +## Prerequisites + +- A Hookdeck account with an Outpost project +- An **API key** (Outpost API key) from your project: **Settings → Secrets** +- **Topics** already configured in the dashboard (for example `user.created`, `order.completed`) +- API base URL: `https://api.outpost.hookdeck.com/2025-07-01` + +## Set up credentials + +In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. That value is the same Outpost API key you use for the REST API and the SDKs. + +Store the API key and base URL in your shell (or in a `.env` file you `source`): + +```sh +export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" +export OUTPOST_API_KEY="your_api_key" +``` + +Use them in the requests below as `$OUTPOST_API_BASE_URL` and `$OUTPOST_API_KEY`. + +## Create a tenant + +Each tenant maps to one of your customers. Pick a stable ID from your own system (for example a team or account ID). 
+ +```sh +TENANT_ID="customer_acme_001" + +curl --request PUT "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID" \ + --header "Authorization: Bearer $OUTPOST_API_KEY" +``` + +## Create a webhook destination + +Subscribe the tenant to one or more topics you configured in the dashboard. Set `config.url` to an HTTPS endpoint you control. + +If you do not have your own endpoint yet, open [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs), create a **Source**, and paste that Source URL as the webhook URL below (or any HTTPS URL you own). Replace `REPLACE_WITH_YOUR_WEBHOOK_URL` accordingly. + +Replace `user.created` with a topic that exists in your project if needed. + +```sh +curl --request POST "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/destinations" \ + --header "Authorization: Bearer $OUTPOST_API_KEY" \ + --header "Content-Type: application/json" \ + --data '{ + "type": "webhook", + "topics": ["user.created"], + "config": { + "url": "REPLACE_WITH_YOUR_WEBHOOK_URL" + } + }' +``` + +To receive every configured topic on this destination, set `"topics": ["*"]` instead. + +## Publish a test event + +Use the same tenant ID and a `topic` that matches both your dashboard configuration and the destination’s `topics`. + +```sh +curl --request POST "$OUTPOST_API_BASE_URL/publish" \ + --header "Authorization: Bearer $OUTPOST_API_KEY" \ + --header "Content-Type: application/json" \ + --data '{ + "tenant_id": "'"$TENANT_ID"'", + "topic": "user.created", + "eligible_for_retry": true, + "metadata": { + "source": "quickstart" + }, + "data": { + "user_id": "user_123" + } + }' +``` + +A `202` response means the event was accepted for delivery. + +## Verify delivery + +- In **Hookdeck Console**, inspect the connection or destination you used (for example the Source you created) and confirm the webhook request and payload look correct. 
+- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** (and any deliveries or event views your project exposes) to confirm the event was processed and delivered. + +## Next steps + +- [Destination types](/docs/destinations) — webhooks, AWS SQS, RabbitMQ, Hookdeck, and more +- [Tenant user portal](/docs/features/tenant-user-portal) — optional UI for tenants to manage their own destinations +- [SDKs](/docs/sdks) — TypeScript, Python, Go, and others +- [API reference](/docs/api/authentication) — full REST API diff --git a/docs/pages/quickstarts/hookdeck-outpost-go.mdx b/docs/pages/quickstarts/hookdeck-outpost-go.mdx new file mode 100644 index 000000000..c70ff9986 --- /dev/null +++ b/docs/pages/quickstarts/hookdeck-outpost-go.mdx @@ -0,0 +1,163 @@ +--- +title: "Hookdeck Outpost Quickstart: Go" +--- + +[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Use **tenants** for each customer, **destinations** for delivery targets, and **topics** aligned with your dashboard configuration. + +## Prerequisites + +- A Hookdeck account with an Outpost project +- An **API key** (Outpost API key) from your project: **Settings → Secrets** +- **Topics** already configured in the dashboard +- [Go](https://go.dev/) 1.22+ recommended +- API base URL: `https://api.outpost.hookdeck.com/2025-07-01` + +## Install the SDK + +```sh +go get github.com/hookdeck/outpost/sdks/outpost-go +``` + +## Set up credentials + +In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell: + +```sh +export OUTPOST_API_KEY="your_api_key" +export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" +``` + +If `OUTPOST_API_BASE_URL` is unset, the SDK uses its default production server URL. 
+ +## Set environment variables + +Set these in the shell where you run `go run .` (or inject them the way your deployment platform expects). + +1. **`OUTPOST_API_KEY`** — **Required.** From **Settings → Secrets**. The program exits if it is missing. + +2. **`OUTPOST_API_BASE_URL`** — **Optional.** When set, the client is configured with `WithServerURL`. Otherwise the Go SDK uses its default Hookdeck Outpost production URL. + +3. **`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** Webhook destination URL (HTTPS). Use your own server or a [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs) **Source** URL for a quick test. + +## Create and run the quickstart program + +Use `main.go` in a small module (after `go get github.com/hookdeck/outpost/sdks/outpost-go`). + +The program (**1)** configures the client with your API key, (**2)** upserts a tenant, (**3)** creates a webhook destination for your topic, (**4)** publishes one event, and (**5)** prints ids. + +```go +package main + +import ( + "context" + "fmt" + "log" + "os" + + outpostgo "github.com/hookdeck/outpost/sdks/outpost-go" + "github.com/hookdeck/outpost/sdks/outpost-go/models/components" +) + +func main() { + ctx := context.Background() + + // + // --- 1. Authenticated client (API key from Settings → Secrets) --- + // + + apiKey := os.Getenv("OUTPOST_API_KEY") + if apiKey == "" { + log.Fatal("Set OUTPOST_API_KEY") + } + + opts := []outpostgo.SDKOption{outpostgo.WithSecurity(apiKey)} + if base := os.Getenv("OUTPOST_API_BASE_URL"); base != "" { + opts = append(opts, outpostgo.WithServerURL(base)) + } + + s := outpostgo.New(opts...) + + // + // --- 2. Tenant id, topic name, and webhook URL (from env) --- + // + // tenantID = one of your customers in Outpost. + // topic = must match a topic configured in the dashboard. 
+ // + + tenantID := "customer_acme_001" + topic := "user.created" + + webhookURL := os.Getenv("OUTPOST_TEST_WEBHOOK_URL") + if webhookURL == "" { + log.Fatal("Set OUTPOST_TEST_WEBHOOK_URL (e.g. a Hookdeck Console Source URL)") + } + + // + // --- 3. Create or update the tenant --- + // + + if _, err := s.Tenants.Upsert(ctx, tenantID, nil); err != nil { + log.Fatal(err) + } + + // + // --- 4. Webhook destination: events on `topic` are POSTed to this URL --- + // + + destBody := components.CreateDestinationCreateWebhook( + components.DestinationCreateWebhook{ + Topics: components.CreateTopicsArrayOfStr([]string{topic}), + Config: components.WebhookConfig{URL: webhookURL}, + }, + ) + + createRes, err := s.Destinations.Create(ctx, tenantID, destBody) + if err != nil { + log.Fatal(err) + } + + if createRes != nil && createRes.GetDestinationWebhook() != nil { + fmt.Println("Destination id:", createRes.GetDestinationWebhook().GetID()) + } + + // + // --- 5. Publish one event --- + // + + pubRes, err := s.Publish.Event(ctx, components.PublishRequest{ + TenantID: outpostgo.String(tenantID), + Topic: outpostgo.String(topic), + EligibleForRetry: outpostgo.Bool(true), + Metadata: map[string]string{"source": "quickstart"}, + Data: map[string]any{"user_id": "user_123"}, + }) + + if err != nil { + log.Fatal(err) + } + + if pubRes != nil && pubRes.GetPublishResponse() != nil { + fmt.Println("Published event id:", pubRes.GetPublishResponse().GetID()) + } +} +``` + +Run: + +```sh +go run . +``` + +For all topics on that destination, use `components.CreateTopicsTopicsEnum(components.TopicsEnumWildcard)` instead of `CreateTopicsArrayOfStr`. + +## Verify delivery + +- In **Hookdeck Console**, confirm the webhook hit your test URL. +- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered. 
+ +## Next steps + +- [Destination types](/docs/destinations) +- [Tenant user portal](/docs/features/tenant-user-portal) +- [SDKs](/docs/sdks) +- [API reference](/docs/api/authentication) diff --git a/docs/pages/quickstarts/hookdeck-outpost-python.mdx b/docs/pages/quickstarts/hookdeck-outpost-python.mdx new file mode 100644 index 000000000..f49fd28f1 --- /dev/null +++ b/docs/pages/quickstarts/hookdeck-outpost-python.mdx @@ -0,0 +1,134 @@ +--- +title: "Hookdeck Outpost Quickstart: Python" +--- + +[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Each **tenant** is one of your customers; **destinations** receive events; **topics** must match what you configured in the dashboard. + +## Prerequisites + +- A Hookdeck account with an Outpost project +- An **API key** (Outpost API key) from your project: **Settings → Secrets** +- **Topics** already configured in the dashboard +- Python 3.9+ recommended +- API base URL: `https://api.outpost.hookdeck.com/2025-07-01` + +## Install the SDK + +```sh +pip install outpost_sdk +``` + +## Set up credentials + +In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell: + +```sh +export OUTPOST_API_KEY="your_api_key" +export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" +``` + +The SDK defaults to the production API base URL when `server_url` is omitted. + +## Set environment variables + +Set these in the same shell before you run the script (or load them with your preferred `.env` helper). + +1. **`OUTPOST_API_KEY`** — **Required.** From **Settings → Secrets**. Without it the script exits, because every API call must be authenticated. + +2. **`OUTPOST_API_BASE_URL`** — **Optional.** Passed through as `server_url` on the client. Omit it to use the SDK default production URL for Hookdeck Outpost. + +3. 
**`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** Webhook destinations need an HTTPS URL. Use your own endpoint or a [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs) **Source** URL for a quick, no-server test. + +## Create and run the quickstart script + +Save as `outpost_quickstart.py`. + +The script (**1)** creates an authenticated client, (**2)** upserts a tenant, (**3)** creates a webhook destination subscribed to your topic, (**4)** publishes one test event, and (**5)** prints the event id. + +```python +import os + +from outpost_sdk import Outpost + +# +# --- 1. Authenticated client (API key from Settings → Secrets) --- +# + +api_key = os.environ.get("OUTPOST_API_KEY") +if not api_key: + raise SystemExit("Set OUTPOST_API_KEY") + +base_url = os.environ.get("OUTPOST_API_BASE_URL") +client = Outpost(api_key=api_key, server_url=base_url) + +# +# --- 2. Tenant id, topic name, and webhook URL (from env) --- +# +# tenant_id = one of your customers in Outpost. +# topic = must match a topic configured in the dashboard. +# + +tenant_id = "customer_acme_001" +topic = "user.created" + +webhook_url = os.environ.get("OUTPOST_TEST_WEBHOOK_URL") +if not webhook_url: + raise SystemExit( + "Set OUTPOST_TEST_WEBHOOK_URL (e.g. a Hookdeck Console Source URL)" + ) + +# +# --- 3. Create or update the tenant --- +# + +client.tenants.upsert(tenant_id=tenant_id) + +# +# --- 4. Webhook destination: events on `topic` are POSTed to this URL --- +# + +client.destinations.create( + tenant_id=tenant_id, + body={ + "type": "webhook", + "topics": [topic], + "config": {"url": webhook_url}, + }, +) + +# +# --- 5. 
Publish one event --- +# + +published = client.publish.event( + request={ + "tenant_id": tenant_id, + "topic": topic, + "eligible_for_retry": True, + "metadata": {"source": "quickstart"}, + "data": {"user_id": "user_123"}, + } +) + +print("Published event id:", published.id) +``` + +Run: + +```sh +python outpost_quickstart.py +``` + +Set `"topics": ["*"]` in the destination body to receive all configured topics. + +## Verify delivery + +- In **Hookdeck Console**, confirm the webhook hit your test URL. +- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered. + +## Next steps + +- [Destination types](/docs/destinations) +- [Tenant user portal](/docs/features/tenant-user-portal) +- [SDKs](/docs/sdks) +- [API reference](/docs/api/authentication) diff --git a/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx b/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx new file mode 100644 index 000000000..a3bbfe04a --- /dev/null +++ b/docs/pages/quickstarts/hookdeck-outpost-typescript.mdx @@ -0,0 +1,135 @@ +--- +title: "Hookdeck Outpost Quickstart: TypeScript" +--- + +[Hookdeck Outpost](https://outpost.hookdeck.com) is Hookdeck’s managed [Outpost](https://github.com/hookdeck/outpost) service. Each **tenant** represents one of your platform’s customers; **destinations** are where events are delivered; **topics** route events to the right destinations. + +This quickstart uses the official TypeScript SDK. Configure **topics** in the Hookdeck dashboard before publishing—use a topic name that exists there in the code below. 
+ +## Prerequisites + +- A Hookdeck account with an Outpost project +- An **API key** (Outpost API key) from your project: **Settings → Secrets** +- **Topics** already configured in the dashboard +- [Node.js](https://nodejs.org/) 18+ recommended +- API base URL: `https://api.outpost.hookdeck.com/2025-07-01` + +## Install the SDK + +```sh +npm install @hookdeck/outpost-sdk +``` + +## Set up credentials + +In the Hookdeck Dashboard, open your Outpost project, go to **Settings → Secrets**, and create or copy an API key. Export it (and optionally the base URL) in your shell: + +```sh +export OUTPOST_API_KEY="your_api_key" +export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" +``` + +The SDK defaults to the production API base URL, so `OUTPOST_API_BASE_URL` is only needed if you want to be explicit or point at another environment. + +## Set environment variables + +Before you run the quickstart script, define these in the same terminal session (or load them from a `.env` file if your tooling supports it). + +1. **`OUTPOST_API_KEY`** — **Required.** Copy the Outpost API key from **Settings → Secrets** in your project. The script passes this to the SDK as the Bearer token. Without it, the script stops with an error. + +2. **`OUTPOST_API_BASE_URL`** — **Optional.** Only set this if you need to override the API host. For Hookdeck Outpost you can omit it entirely: the SDK already uses `https://api.outpost.hookdeck.com/2025-07-01`. + +3. **`OUTPOST_TEST_WEBHOOK_URL`** — **Required for this walkthrough.** The script creates a webhook destination, which must point at an HTTPS URL. Easiest path: open [Hookdeck Console](https://console.hookdeck.com?ref=outpost-docs), create a **Source**, copy its URL, and assign it to this variable so you can see the webhook payload without deploying your own server. + +## Create and run the quickstart script + +Save the following as `outpost-quickstart.ts`. 
+ +The script (**1)** builds an authenticated SDK client, (**2)** ensures a tenant exists, (**3)** adds a webhook destination subscribed to your topic, (**4)** publishes one test event, and (**5)** prints the event id. + +```typescript +import { Outpost } from "@hookdeck/outpost-sdk"; + +// +// --- 1. Authenticated client (API key from Settings → Secrets) --- +// + +const apiKey = process.env.OUTPOST_API_KEY; +if (!apiKey) { + throw new Error("Set OUTPOST_API_KEY"); +} + +const outpost = new Outpost({ + apiKey, + ...(process.env.OUTPOST_API_BASE_URL + ? { serverURL: process.env.OUTPOST_API_BASE_URL } + : {}), +}); + +// +// --- 2. Tenant id, topic name, and webhook URL (from env) --- +// +// tenantId = one of your customers in Outpost. +// topic = must match a topic configured in the dashboard. +// + +const tenantId = "customer_acme_001"; +const topic = "user.created"; + +const webhookUrl = process.env.OUTPOST_TEST_WEBHOOK_URL; +if (!webhookUrl) { + throw new Error( + "Set OUTPOST_TEST_WEBHOOK_URL to an HTTPS endpoint (e.g. a Hookdeck Console Source URL)", + ); +} + +// +// --- 3. Create or update the tenant --- +// + +await outpost.tenants.upsert(tenantId); + +// +// --- 4. Webhook destination: Outpost delivers events on `topic` to this URL --- +// + +await outpost.destinations.create(tenantId, { + type: "webhook", + topics: [topic], + config: { url: webhookUrl }, +}); + +// +// --- 5. Publish one event (delivered to destinations subscribed to `topic`) --- +// + +const published = await outpost.publish.event({ + tenantId, + topic, + eligibleForRetry: true, + metadata: { source: "quickstart" }, + data: { user_id: "user_123" }, +}); + +console.log("Published event id:", published.id); +``` + +Run: + +```sh +npx tsx outpost-quickstart.ts +``` + +To subscribe the destination to all topics, pass `topics: ["*"]` instead of `[topic]`. 
+ +## Verify delivery + +- In **Hookdeck Console**, inspect the Source or connection you used for `OUTPOST_TEST_WEBHOOK_URL` and confirm the webhook request arrived as expected. +- In the **Hookdeck Dashboard**, open **your Outpost project** and review **logs** to confirm the event was processed and delivered. + +## Next steps + +- [Destination types](/docs/destinations) +- [Tenant user portal](/docs/features/tenant-user-portal) +- [SDKs](/docs/sdks) +- [API reference](/docs/api/authentication) diff --git a/docs/zudoku.config.ts b/docs/zudoku.config.ts index 1687bd4c9..ec7164478 100644 --- a/docs/zudoku.config.ts +++ b/docs/zudoku.config.ts @@ -86,19 +86,60 @@ const config: ZudokuConfig = { collapsible: false, items: [ { - type: "doc", - label: "Docker", - id: "quickstarts/docker", + type: "category", + label: "Hookdeck Outpost", + collapsed: false, + collapsible: true, + items: [ + { + type: "doc", + label: "curl", + id: "quickstarts/hookdeck-outpost-curl", + }, + { + type: "doc", + label: "TypeScript", + id: "quickstarts/hookdeck-outpost-typescript", + }, + { + type: "doc", + label: "Python", + id: "quickstarts/hookdeck-outpost-python", + }, + { + type: "doc", + label: "Go", + id: "quickstarts/hookdeck-outpost-go", + }, + { + type: "doc", + label: "Agent prompt", + id: "quickstarts/hookdeck-outpost-agent-prompt", + }, + ], }, { - type: "doc", - label: "Kubernetes", - id: "quickstarts/kubernetes", - }, - { - type: "doc", - label: "Railway", - id: "quickstarts/railway", + type: "category", + label: "Self-Hosted", + collapsed: false, + collapsible: true, + items: [ + { + type: "doc", + label: "Docker", + id: "quickstarts/docker", + }, + { + type: "doc", + label: "Kubernetes", + id: "quickstarts/kubernetes", + }, + { + type: "doc", + label: "Railway", + id: "quickstarts/railway", + }, + ], }, ], }, From e0897218161fde31573856e5926d2b5900a56520 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 11:23:18 +0100 Subject: [PATCH 02/47] docs: add Outpost 
agent evaluation harness and scenarios - Claude Agent SDK runner with explicit --scenario/--scenarios/--all, per-run workspace - Heuristic + LLM scoring vs scenario Success criteria; score-transcript 01-10 - Scenarios: basics, minimal apps, existing-app integration baselines - CI slice (eval:ci), SCENARIO-RUN-TRACKER, prompt template Files on disk guidance - Allow committing docs/**/.env.example under docs/.gitignore - TEMP status and README updates Made-with: Cursor --- docs/.gitignore | 1 + ...TEMP-hookdeck-outpost-onboarding-status.md | 87 +- docs/agent-evaluation/.env.example | 31 + docs/agent-evaluation/README.md | 197 ++ docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 53 + docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md | 22 + .../fixtures/placeholder-values-for-turn0.md | 27 + docs/agent-evaluation/package-lock.json | 2096 +++++++++++++++++ docs/agent-evaluation/package.json | 25 + docs/agent-evaluation/results/.gitignore | 5 + docs/agent-evaluation/results/README.md | 57 + .../results/RUN-RECORDING.template.md | 36 + .../scenarios/01-basics-curl.md | 48 + .../scenarios/02-basics-typescript.md | 45 + .../scenarios/03-basics-python.md | 43 + .../scenarios/04-basics-go.md | 38 + .../scenarios/05-app-nextjs.md | 58 + .../scenarios/06-app-fastapi.md | 47 + .../scenarios/07-app-go-http.md | 46 + .../scenarios/08-integrate-nextjs-existing.md | 59 + .../09-integrate-fastapi-existing.md | 52 + .../scenarios/10-integrate-go-existing.md | 51 + docs/agent-evaluation/scripts/ci-eval.sh | 22 + docs/agent-evaluation/scripts/run-scenario.sh | 46 + docs/agent-evaluation/src/llm-judge.ts | 230 ++ docs/agent-evaluation/src/run-agent-eval.ts | 527 +++++ docs/agent-evaluation/src/score-eval.ts | 183 ++ docs/agent-evaluation/src/score-transcript.ts | 1119 +++++++++ docs/agent-evaluation/tsconfig.json | 15 + .../hookdeck-outpost-agent-prompt.mdx | 16 +- 30 files changed, 5267 insertions(+), 15 deletions(-) create mode 100644 docs/agent-evaluation/.env.example create mode 100644 
docs/agent-evaluation/README.md create mode 100644 docs/agent-evaluation/SCENARIO-RUN-TRACKER.md create mode 100644 docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md create mode 100644 docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md create mode 100644 docs/agent-evaluation/package-lock.json create mode 100644 docs/agent-evaluation/package.json create mode 100644 docs/agent-evaluation/results/.gitignore create mode 100644 docs/agent-evaluation/results/README.md create mode 100644 docs/agent-evaluation/results/RUN-RECORDING.template.md create mode 100644 docs/agent-evaluation/scenarios/01-basics-curl.md create mode 100644 docs/agent-evaluation/scenarios/02-basics-typescript.md create mode 100644 docs/agent-evaluation/scenarios/03-basics-python.md create mode 100644 docs/agent-evaluation/scenarios/04-basics-go.md create mode 100644 docs/agent-evaluation/scenarios/05-app-nextjs.md create mode 100644 docs/agent-evaluation/scenarios/06-app-fastapi.md create mode 100644 docs/agent-evaluation/scenarios/07-app-go-http.md create mode 100644 docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md create mode 100644 docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md create mode 100644 docs/agent-evaluation/scenarios/10-integrate-go-existing.md create mode 100755 docs/agent-evaluation/scripts/ci-eval.sh create mode 100755 docs/agent-evaluation/scripts/run-scenario.sh create mode 100644 docs/agent-evaluation/src/llm-judge.ts create mode 100644 docs/agent-evaluation/src/run-agent-eval.ts create mode 100644 docs/agent-evaluation/src/score-eval.ts create mode 100644 docs/agent-evaluation/src/score-transcript.ts create mode 100644 docs/agent-evaluation/tsconfig.json diff --git a/docs/.gitignore b/docs/.gitignore index d777781b5..1f70a5a5a 100644 --- a/docs/.gitignore +++ b/docs/.gitignore @@ -27,6 +27,7 @@ yarn-error.log* # env files (can opt-in for commiting if needed) .env* +!.env.example # typescript *.tsbuildinfo diff --git 
a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md index 9faa176f0..1d481b17f 100644 --- a/docs/TEMP-hookdeck-outpost-onboarding-status.md +++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md @@ -6,24 +6,93 @@ --- +## Agent eval harness — **implemented**; **prompt validation in progress** + +The automated harness in `docs/agent-evaluation/` is in place. **What it does today:** + +| Area | Status | +|------|--------| +| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with **`Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, **`cwd`** = `results/runs/-scenario-NN/` | +| **Artifacts** | `transcript.json`, optional **`heuristic-score.json`** + **`llm-score.json`** (LLM reads each scenario **`## Success criteria`**), agent-written files beside the transcript | +| **Heuristics** | `score-transcript.ts` — **`scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) | +| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next **`leerob/next-saas-starter`**, FastAPI **`philipokiokio/FastAPI_SAAS_Template`**, Go **`devinterface/startersaas-go-api`**) | +| **CLI** | **`npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless **`--no-score`** / **`--no-score-llm`** or **`EVAL_NO_SCORE_*`**. **Exit 1** if any enabled score fails | +| **CI** | **`npm run eval:ci`** = **`--scenarios 01,02`** + heuristic **and** LLM judge. 
**`scripts/ci-eval.sh`** — requires **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`** | +| **Re-score** | `npm run score -- --run [--llm] [--write]` | + +**Operational** + +- Prefer a normal runner / full permissions for session persistence (`~/.claude/...`); tight sandboxes can break multi-turn resume. +- **Validate the prompt in stages** (simple → complex); exact commands below. + +### Recommended run order (test evals → stress prompt) + +Run from **`docs/agent-evaluation/`** with **`.env`** set (**`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions. + +**Stage A — basics (fast, minimal tooling)** + +```sh +npm run eval -- --scenarios 01,02,03,04 +``` + +**Stage B — minimal example apps** + +```sh +npm run eval -- --scenarios 05,06,07 +``` + +**Stage C — existing-app integration (clone + integrate; slowest)** + +```sh +npm run eval -- --scenarios 08,09,10 +``` + +**Full suite (explicit cost)** + +```sh +npm run eval -- --all +``` + +After each stage, inspect **`results/runs/-scenario-NN/`** (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live **`OUTPOST_API_KEY`**) remains a separate human step per scenario. + +--- + +## Agent eval automation (original plan — historical) + +1. **In-repo runner** — ✅ Node + Agent SDK (not shell-only `curl`). +2. **Default backend: Anthropic** — ✅ Agent SDK. +3. **Claude Code CLI** — Optional local path only (unchanged). +4. **OpenAI adapter** — Still optional / not implemented. +5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs **`## Success criteria`**. +6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready). + +**Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners. 
+ +--- + ## Done (Outpost OSS repo) - Managed quickstarts: `hookdeck-outpost-curl.mdx`, `-typescript.mdx`, `-python.mdx`, `-go.mdx` -- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` +- Agent prompt template page: `hookdeck-outpost-agent-prompt.mdx` (includes **Files on disk** guidance) - Zudoku sidebar: **Quickstarts → Hookdeck Outpost** (above **Self-Hosted**) - `quickstarts.mdx` index: managed vs self-hosted links -- Content aligned with product copy: API key from **Settings → Secrets**, standard markdown (no `:::tip`), verify via Hookdeck Console + project logs -- SDK examples: env vars section, numbered quickstart scripts with step comments +- Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs +- SDK quickstarts: env vars, step-commented scripts +- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, **`SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md` ## Pending / follow-up -- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm all doc links resolve on production docs URL -- **Test destination URL:** When `console.hookdeck.com` (or equivalent) has a stable public URL format, update quickstarts if it replaces “create a Console Source” instructions -- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection (`{{API_BASE_URL}}`, `{{TOPICS_LIST}}`, `{{TEST_DESTINATION_URL}}`, `{{DOCS_URL}}`, optional `{{LLMS_FULL_URL}}`); env var UI for `OUTPOST_API_KEY` (not in prompt body) -- **Hookdeck Astro site:** Consume MDX, `llms.txt` / `llms-full.txt` / `.md` exports, canonical `DOCS_URL` (e.g. 
`https://hookdeck.com/outpost/docs`) -- **Deferred (not blocking GA):** Broader docs IA (“Self-Hosted” under Guides, redirects for moved pages) per original plan +- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or **`--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear +- **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming) +- **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links +- **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes +- **Hookdeck Dashboard:** Two-step onboarding (topics → copy agent prompt) with placeholder injection; env UI for `OUTPOST_API_KEY` (not in prompt body) +- **Hookdeck Astro site:** MDX, `llms.txt` / `llms-full.txt`, canonical `DOCS_URL` +- **CI workflow:** Optional GitHub Actions job for `eval:ci` with secrets +- **Deferred (not blocking GA):** Broader docs IA per original plan ## References - OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`) -- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx` \ No newline at end of file +- Agent template source: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx` +- Eval harness: `docs/agent-evaluation/README.md` diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example new file mode 100644 index 000000000..6f1e3eb48 --- /dev/null +++ b/docs/agent-evaluation/.env.example @@ -0,0 +1,31 @@ +# Copy to .env and fill in. .env is gitignored at the repo root. 
+ +# Required for npm run eval (Claude Agent SDK — calls Anthropic only) +ANTHROPIC_API_KEY= + +# Required for Turn 0 template (test webhook URL injected into the prompt) +EVAL_TEST_DESTINATION_URL= + +# Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project. +# The harness does not read this key; you (or a future verifier) use it after the run. +# OUTPOST_API_KEY= +# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01 +# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id # often same as EVAL_TEST_DESTINATION_URL + +# Optional (see npm run eval -- --help) +# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01 +# EVAL_TOPICS_LIST=- user.created +# EVAL_DOCS_URL=https://outpost.hookdeck.com/docs +# EVAL_LOCAL_DOCS=1 +# EVAL_LLMS_FULL_URL= +# Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow: +# EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash +# EVAL_MODEL= +# EVAL_MAX_TURNS=40 +# EVAL_PERMISSION_MODE=dontAsk +# EVAL_PERSIST_SESSION=true + +# Scoring is ON by default after each scenario (heuristic + LLM). Opt out: +# EVAL_NO_SCORE_HEURISTIC=1 +# EVAL_NO_SCORE_LLM=1 +# EVAL_SCORE_MODEL=claude-sonnet-4-20250514 diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md new file mode 100644 index 000000000..274921647 --- /dev/null +++ b/docs/agent-evaluation/README.md @@ -0,0 +1,197 @@ +# Agent evaluation — Hookdeck Outpost onboarding + +This folder contains **manual** scenario specs (markdown) and an **automated** runner that uses the [Claude Agent SDK](https://platform.claude.com/docs/en/agent-sdk/overview) (`src/run-agent-eval.ts`). + +## Where success criteria live + +| What | Where | +|------|--------| +| **Human checklist** (full eval, including execution) | Each file under [`scenarios/`](scenarios/) — section **Success criteria** (static + **Execution (full pass)** rows). 
| +| **Manual run write-up** | [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) — copy to a local file under `results/` (gitignored). | +| **Automated transcript rubric** (regex heuristics) | [`src/score-transcript.ts`](src/score-transcript.ts) — `scoreScenario01`–`scoreScenario10` (assistant text + tool-written file corpus). | +| **LLM judge** (Anthropic vs **`## Success criteria`** in each scenario) | [`src/llm-judge.ts`](src/llm-judge.ts) — runs after each scenario unless **`--no-score-llm`**; also `npm run score -- --llm`. | + +**Deliberate scope:** `npm run eval` **requires** **`--scenario`**, **`--scenarios`**, or **`--all`**. There is no silent “run everything” default — you choose the scenarios and accept the cost. After **each** run: **`transcript.json`**, **`heuristic-score.json`**, and **`llm-score.json`** (judge reads the same **Success criteria** as humans). Exit **1** if any enabled score fails. + +Opt out of scoring: **`--no-score`** (heuristic only), **`--no-score-llm`** (drops the Success-criteria judge), or **`.env`**: **`EVAL_NO_SCORE_HEURISTIC=1`**, **`EVAL_NO_SCORE_LLM=1`**. Transcript-only: **`npm run eval -- --no-score --no-score-llm`**. + +Each scenario run uses one directory: + +`results/runs/-scenario-NN/` + +- **`transcript.json`** — full SDK log +- **`heuristic-score.json`** / **`llm-score.json`** — by default (unless disabled above) +- **Agent-written files** — the SDK **`cwd`** is this directory. Defaults include **`Write`**, **`Edit`**, and **`Bash`** for clones, installs, and generated code. + +Re-score a finished run without re-invoking the agent: + +- **`npm run score -- --run results/runs/`** — heuristic (add **`--llm`** for LLM only, **`--write`** to persist sidecars). + +Legacy flat files `*-scenario-NN.json` next to `runs/` are still accepted by **`npm run score`** for older runs. 
+ +**Execution** (live Outpost) is still not auto-verified; the LLM is instructed to set `execution_in_transcript.pass` to **null** unless the transcript itself reports HTTP results. + +## Automated runs (Claude Agent SDK) + +From `docs/agent-evaluation/`: + +```sh +npm install +cp .env.example .env # then edit: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, … +npm run eval -- --scenario 01 +npm run eval -- --scenarios 01,02,08 +npm run eval -- --all # explicit full suite (every scenario file) +npm run eval:ci # same as --scenarios 01,02 + heuristic + LLM judge (see § CI) +npm run eval -- --dry-run +``` + +The runner loads **`docs/agent-evaluation/.env`** automatically (via `dotenv`). Shell exports still override `.env` if both are set. + +### CI (recommended slice) + +For **pull-request or main-branch** automation, run **two** scenarios only: + +| Scenario | Why | +|----------|-----| +| **01** (curl) | Shortest path: managed API, tenant → destination → publish, no `npm install` / framework scaffold. Cheap signal that the prompt + heuristics still align with the curl quickstart. | +| **02** (TypeScript) | Most common integration style: **`@hookdeck/outpost-sdk`**, env vars, same API flow in code. Still much faster than **05** (Next.js) or **08** (clone a full SaaS repo). | + +**Commands:** + +```sh +cd docs/agent-evaluation && npm ci && npm run eval:ci +# or: ./scripts/ci-eval.sh # requires ANTHROPIC_API_KEY + EVAL_TEST_DESTINATION_URL in the environment +``` + +`eval:ci` is **`npm run eval -- --scenarios 01,02`**: both **heuristic** checks and the **LLM judge** (grounded in each scenario’s **`## Success criteria`**). Skipping the judge would leave you with regex-only signal, which does not encode the product checklist. 
+ +**GitHub Actions:** add repository secrets **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**, run from `docs/agent-evaluation` with a normal runner (Claude Agent SDK needs session filesystem access — avoid tight sandboxes; see **Permissions / failures** below). **`OUTPOST_API_KEY`** is still not required for transcript-only CI. + +- **`ANTHROPIC_API_KEY`** — required for the agent and for the **LLM judge** (Success criteria) after each scenario you run. +- **`EVAL_TEST_DESTINATION_URL`** — required for Turn 0; same Source URL as `{{TEST_DESTINATION_URL}}`. +- **`OUTPOST_API_KEY`** — **not** read by the automated runner, but **required if you want a full evaluation**: without it you can only judge the transcript (plausible curl/SDK text). To verify that **generated commands or code actually work**, put the same Outpost API key you use against the managed API in **`docs/agent-evaluation/.env`** (or export it) and run the agent’s output against a real project. The onboarding prompt tells operators to keep that key in **`.env`** and never paste it into chat. +- **`EVAL_LOCAL_DOCS=1`** — before public docs are live, set this so Turn 0 replaces public doc URLs with **absolute paths to MDX/OpenAPI files in this repo** (so the agent should use **Read** on local files instead of WebFetch to production). + +- **Turn 0** text is built from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) (`## Template`) with placeholders filled from environment variables. +- Transcripts are written to `results/runs/-scenario-NN/transcript.json` (gitignored). + +See `npm run eval -- --help` for env vars (`EVAL_TOOLS`, `EVAL_MODEL`, etc.). + +### Permissions / failures (why a run might not work) + +Two different things get called “permissions”: + +1. **Cursor (or CI) sandbox and `tsx`** — The `tsx` **CLI** opens an IPC pipe in `/tmp` (or similar), which some sandboxes block (`listen EPERM`).
This repo’s `npm run eval` uses **`node --import tsx`** instead so Node loads the tsx **loader** only (no CLI IPC). If you still see EPERM, run the same command in a normal terminal outside the sandbox, or use `npm run eval:tsx-cli` only where IPC is allowed. + +2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell). + +Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials. + +### Transcript vs execution (full pass) + +`npm run eval` only captures **what the model produced**; it does **not** call Outpost. Treat that as **transcript review**. + +A **full pass** also answers: *did the generated curl / script / app succeed against a live Outpost project?* Each scenario’s **Success criteria** ends with **Execution** checkboxes for that step. To run them: + +1. Add **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** when the artifact expects them) to `docs/agent-evaluation/.env` so your shell has them after `dotenv` or when you `source` / copy into the directory where you run the code. +2. Run the agent’s commands or start its app and complete the flows the scenario describes. +3. Record pass/fail in your run notes ([`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md)). 
+ +## Single source of truth for the dashboard prompt + +The **full prompt template** (the text operators paste as Turn 0) lives in **one** place: + +**[`docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** — use the fenced block under **## Template**. + +For eval runs, example placeholder substitutions (non-secret) are in [`fixtures/placeholder-values-for-turn0.md`](fixtures/placeholder-values-for-turn0.md) only. That file intentionally **does not** duplicate the template. + +The Hookdeck dashboard should eventually render the **same** template body from product-side source; until then, this MDX page is the documentation canonical copy. + +## How to run an evaluation (manual) + +1. **Turn 0:** Open the [agent prompt MDX](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), copy **## Template**, replace `{{…}}` (see [placeholder examples](fixtures/placeholder-values-for-turn0.md)). +2. **Pick a scenario:** e.g. [`scenarios/01-basics-curl.md`](scenarios/01-basics-curl.md). +3. **New agent thread:** Paste Turn 0, then follow each **Turn N — User** line from the scenario verbatim (or as specified). +4. **Judge output:** Use the scenario’s **Success criteria** checkboxes (human decision). +5. **Record:** Copy [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md) to a local filename under `results/` (see [`results/README.md`](results/README.md)); those files are **gitignored** by default. + +### Helper script (optional) + +From the repo root: + +```sh +./docs/agent-evaluation/scripts/run-scenario.sh 01 +``` + +This **only prints** paths and reminders. It does **not** start an agent or call OpenAI/Anthropic/etc. + +## Judging results + +- **Automated runs:** use **Success criteria** in each `scenarios/*.md` (definition of pass). 
Each **`npm run eval -- --scenario|scenarios|all`** run applies **heuristic + LLM** scorers unless you pass **`--no-score`** / **`--no-score-llm`**; **Execution** rows stay manual unless you add a verifier. +- **Manual runs** use the checklist in [`results/RUN-RECORDING.template.md`](results/RUN-RECORDING.template.md). + +There is still **no single portable “IDE agent” CLI** for all vendors; the SDK runner is the supported path for headless Anthropic-based CI. + +## Measuring scenarios + +| Layer | What it answers | Where | +|--------|-----------------|--------| +| **Definition** | What “good” means (product + transcript) | **`## Success criteria`** in each [`scenarios/*.md`](scenarios/) | +| **Heuristic** | Fast, deterministic signal from transcript JSON | [`src/score-transcript.ts`](src/score-transcript.ts) — combines assistant text with **Write/Edit tool inputs** and tool results so on-disk artifacts count | +| **LLM judge** | Structured pass/fail vs the same **Success criteria** | After each scenario when **`--no-score-llm`** is not set; or `npm run score -- --run --llm` — [`src/llm-judge.ts`](src/llm-judge.ts) | +| **Execution** | Live API / app smoke test | Human (or future script); not automated here | + +**Heuristic functions** (failed checks set **`npm run eval`** / **`npm run score`** exit **1** when that scorer ran): + +| Scenario | Function | Topics covered (summary) | +|----------|----------|---------------------------| +| 01 | `scoreScenario01` | Managed URL, tenant PUT, webhook destination POST, publish `data`, no key leak, optional verify turn | +| 02 | `scoreScenario02` | TS SDK, `Outpost`, env key, tenants/destinations/publish, webhook env, run command | +| 03 | `scoreScenario03` | Python SDK import, client, same API calls, env, webhook URL | +| 04 | `scoreScenario04` | Go module, `New`/`WithSecurity`, Upsert/Create/Publish, env, webhook URL | +| 05 | `scoreScenario05` | Next.js signals, TS SDK, API routes, two flows, server env key, no 
`NEXT_PUBLIC_` key, README, optional stress-turn Hookdeck hint | +| 06 | `scoreScenario06` | FastAPI, `outpost_sdk`, uvicorn, server env, two flows, README, webhook docs | +| 07 | `scoreScenario07` | `net/http`, Go SDK + `CreateDestinationCreateWebhook`, HTML UI, two flows, `go run`, README | +| 08 | `scoreScenario08` | Clone **next-saas-starter** (or git baseline), TS SDK, publish/destinations/tenants, server env key, per-customer webhook story | +| 09 | `scoreScenario09` | Clone **FastAPI_SAAS_Template** (or git baseline), `outpost_sdk`, integration + domain hook, env key | +| 10 | `scoreScenario10` | Clone **startersaas-go-api** (or git baseline), Go Outpost SDK, publish + handler hook, env key | + +Export **`SCENARIO_IDS_WITH_HEURISTIC_RUBRIC`** in `score-transcript.ts` lists IDs **01–10** for tooling. + +## Scenarios + +To record each **`npm run eval -- --scenario …`** run, automated scores, and **whether you ran the generated code** with `OUTPOST_API_KEY`, use **[`SCENARIO-RUN-TRACKER.md`](SCENARIO-RUN-TRACKER.md)** (committed; not under `results/`, which is gitignored). + +| ID | File | Goal | +|----|------|------| +| 1 | [scenarios/01-basics-curl.md](scenarios/01-basics-curl.md) | Minimal **curl** only (managed API). | +| 2 | [scenarios/02-basics-typescript.md](scenarios/02-basics-typescript.md) | Minimal **TypeScript** script (`@hookdeck/outpost-sdk`). | +| 3 | [scenarios/03-basics-python.md](scenarios/03-basics-python.md) | Minimal **Python** script (`outpost_sdk`). | +| 4 | [scenarios/04-basics-go.md](scenarios/04-basics-go.md) | Minimal **Go** program (`outpost-go`). | +| 5 | [scenarios/05-app-nextjs.md](scenarios/05-app-nextjs.md) | Small **Next.js** app: UI to register a webhook destination and trigger a test publish. | +| 6 | [scenarios/06-app-fastapi.md](scenarios/06-app-fastapi.md) | Small **FastAPI** app with the same UX as scenario 5. 
| +| 7 | [scenarios/07-app-go-http.md](scenarios/07-app-go-http.md) | Small **Go** `net/http` app + simple HTML UI (same UX as scenario 5). | +| 8 | [scenarios/08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | **Existing Next.js SaaS** baseline — add outbound webhooks via Outpost ([leerob/next-saas-starter](https://github.com/leerob/next-saas-starter)). | +| 9 | [scenarios/09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | **Existing FastAPI SaaS** baseline — Outpost integration ([philipokiokio/FastAPI_SAAS_Template](https://github.com/philipokiokio/FastAPI_SAAS_Template)). | +| 10 | [scenarios/10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | **Existing Go SaaS API** baseline — Outpost integration ([devinterface/startersaas-go-api](https://github.com/devinterface/startersaas-go-api)). | + +Scenarios **1–4** align with **“Try it out”**; **5–7** with **“Build a minimal example”**; **8–10** with **“Integrate with an existing app”** using pinned OSS baselines (Java / .NET can be added later the same way). + +## Agent skills recommendation + +**Recommend yes** for teams standardizing on Hookdeck’s skill pack: the [outpost skill](https://github.com/hookdeck/agent-skills/tree/main/skills/outpost) gives agents a consistent overview (tenants, destinations, topics, curl shape) and links into docs. + +**Caveats (update the skill in `hookdeck/agent-skills`, not in this repo):** + +1. **Managed-first** — The published skill is still **self-hosted heavy** (Docker block first; managed is a short table). For Hookdeck Outpost GA, the skill should foreground [managed quickstarts](../pages/quickstarts/hookdeck-outpost-curl.mdx), `https://api.outpost.hookdeck.com/2025-07-01`, **Settings → Secrets**, and `OUTPOST_API_KEY` / optional `OUTPOST_API_BASE_URL` to match product copy. +2. **REST paths** — Examples must use **`/tenants/{id}`**, not `PUT $BASE_URL/$TENANT_ID` (that path is wrong for the real API). +3. 
**Naming** — Align env var naming with docs (`OUTPOST_API_KEY` or documented dashboard name), not ad-hoc `HOOKDECK_API_KEY` unless the dashboard literally uses that string. +4. **Router vs. deep skills** — Today `outpost` is one monolithic `SKILL.md`. The skill itself mentions **future** destination-specific skills (`outpost-webhooks`, etc.). For scale, consider either **sections** with clear headings or **child skills** (e.g. `outpost-managed-quickstart`, `outpost-self-hosted`) once content grows—without forcing users to install many tiles for the common case. + +Until the skill is updated, agents should still be pointed at the **quickstart MDX pages** in this repo (or production docs URLs); the skill is supplementary. + +## Related docs + +- [Agent prompt template (SSoT)](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) +- [Upstream skill notes](SKILL-UPSTREAM-NOTES.md) +- [TEMP tracking note](../TEMP-hookdeck-outpost-onboarding-status.md) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md new file mode 100644 index 000000000..ac620193f --- /dev/null +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -0,0 +1,53 @@ +# Scenario run tracker + +Use this table while you **run scenarios one at a time** and **execute the generated artifacts** against a real Outpost project. + +## How to use + +1. **Automated agent eval** (from `docs/agent-evaluation/`): + + ```sh + npm run eval -- --scenario + ``` + + Each run creates **`results/runs/-scenario-/`** with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones). + +2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console). + +3. 
**Execution (generated code):** with **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). + +4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`. + +--- + +## Tracker + +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +|----|---------------|-----------------------------------|-----------|-----------|----------------------------|-------| +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | + +### Column hints + +| Column | Meaning | +|--------|---------| +| **Run directory** | e.g. 
`2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` | +| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) | +| **LLM judge** | `llm-score.json` → `overall_transcript_pass` | +| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` | + +### Status legend (suggested) + +Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A**, or ✅ / ❌ / — + +--- + +Full harness docs: [README.md](README.md). diff --git a/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md new file mode 100644 index 000000000..6c8de7367 --- /dev/null +++ b/docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md @@ -0,0 +1,22 @@ +# Notes for updating `hookdeck/agent-skills` — `skills/outpost` + +Apply these in the **[agent-skills](https://github.com/hookdeck/agent-skills)** repository, not in Outpost OSS. + +## Recommended direction + +1. **Lead with managed Hookdeck Outpost** — Link prominently to managed quickstarts (curl, TypeScript, Python, Go) and `https://api.outpost.hookdeck.com/2025-07-01`. +2. **Fix REST examples** — Tenant upsert must be `PUT {base}/tenants/{tenant_id}`, not `PUT {base}/{tenant_id}`. +3. **Align env naming** — Match product/docs: Outpost API key from project **Settings → Secrets**, typically loaded as `OUTPOST_API_KEY` in examples; avoid introducing `HOOKDECK_API_KEY` unless the dashboard literally uses that name. +4. **Self-hosted section** — Keep Docker/Kubernetes/Railway as a secondary path with `http://localhost:3333/api/v1` and correct `/tenants/...` paths. +5. **Optional: split later** — If the file grows, add `outpost-managed.md` / `outpost-self-hosted.md` fragments or separate skills; keep the default tile entrypoint short. 
+ +## Concrete issues in current `SKILL.md` (as of fetch against `main`) + +- **Wrong curl path:** `curl -X PUT "$BASE_URL/$TENANT_ID"` should target `/tenants/$TENANT_ID` relative to the API base (managed base has no `/api/v1` prefix). +- **Managed auth row** — Verify exact dashboard copy for secret name and env var conventions; link to Hookdeck Outpost project settings, not only generic dashboard secrets if URLs differ. +- **Tile summary** — `tile.json` says “self-hosted relay”; managed Outpost should be reflected in the summary string when GA positioning is final. + +## Cross-links from this repo + +- Onboarding prompt template: `docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx` +- Manual agent eval harness: `docs/agent-evaluation/README.md` \ No newline at end of file diff --git a/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md new file mode 100644 index 000000000..39d344677 --- /dev/null +++ b/docs/agent-evaluation/fixtures/placeholder-values-for-turn0.md @@ -0,0 +1,27 @@ +# Placeholder values for Turn 0 (eval / local testing) + +The **prompt template itself** lives in one place only: + +**[`hookdeck-outpost-agent-prompt.mdx`](../../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx)** (from repo root: `docs/pages/quickstarts/...`) — copy the fenced block under **## Template**, then replace each `{{PLACEHOLDER}}` using the table below. + +Do **not** paste real API keys into chat. Have operators put `OUTPOST_API_KEY` in a project **`.env`** (or another loader), not in the agent transcript. Use a throwaway Hookdeck project when possible. + +For **`npm run eval -- --scenario …`** (or **`--scenarios`** / **`--all`**), the runner only needs **`ANTHROPIC_API_KEY`** and **`EVAL_TEST_DESTINATION_URL`**. 
To score a **full** eval (generated commands/code actually work), you still need **`OUTPOST_API_KEY`** (and usually **`OUTPOST_TEST_WEBHOOK_URL`**) when you **execute** the agent’s output afterward. Optional **`EVAL_LOCAL_DOCS=1`** points Turn 0 at repo paths instead of live `{{DOCS_URL}}` links. + +--- + +## Example substitutions (non-secret) + +| Placeholder | Example | +|-------------|---------| +| `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | +| `{{TOPICS_LIST}}` | `- user.created` | +| `{{TEST_DESTINATION_URL}}` | Hookdeck Console **Source** URL the dashboard feeds in (for automated evals, set `EVAL_TEST_DESTINATION_URL` to the same value). Example: `https://hkdk.events/...` | +| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` (local Zudoku: same paths under `/docs`) | +| `{{LLMS_FULL_URL}}` | Omit the line in the template if unused, or your public `llms-full.txt` URL | + +--- + +## Dashboard implementation note + +When this text is embedded in the Hookdeck product, the **same** template body should be rendered from one dashboard/backend source so docs and product stay aligned. The MDX page in this repo is the documentation **canonical** copy until product source is wired to match it. 
diff --git a/docs/agent-evaluation/package-lock.json b/docs/agent-evaluation/package-lock.json new file mode 100644 index 000000000..12d5ab75e --- /dev/null +++ b/docs/agent-evaluation/package-lock.json @@ -0,0 +1,2096 @@ +{ + "name": "outpost-agent-evaluation", + "version": "1.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "outpost-agent-evaluation", + "version": "1.0.0", + "dependencies": { + "@anthropic-ai/claude-agent-sdk": "^0.2.92", + "dotenv": "^16.4.7" + }, + "devDependencies": { + "tsx": "^4.19.4", + "typescript": "^5.8.3" + }, + "engines": { + "node": ">=18" + } + }, + "node_modules/@anthropic-ai/claude-agent-sdk": { + "version": "0.2.92", + "resolved": "https://registry.npmjs.org/@anthropic-ai/claude-agent-sdk/-/claude-agent-sdk-0.2.92.tgz", + "integrity": "sha512-loYyxVUC5gBwHjGi9Fv0b84mduJTp9Z3Pum+y/7IVQDb4NynKfVQl6l4VeDKZaW+1QTQtd25tY4hwUznD7Krqw==", + "license": "SEE LICENSE IN README.md", + "dependencies": { + "@anthropic-ai/sdk": "^0.80.0", + "@modelcontextprotocol/sdk": "^1.27.1" + }, + "engines": { + "node": ">=18.0.0" + }, + "optionalDependencies": { + "@img/sharp-darwin-arm64": "^0.34.2", + "@img/sharp-darwin-x64": "^0.34.2", + "@img/sharp-linux-arm": "^0.34.2", + "@img/sharp-linux-arm64": "^0.34.2", + "@img/sharp-linux-x64": "^0.34.2", + "@img/sharp-linuxmusl-arm64": "^0.34.2", + "@img/sharp-linuxmusl-x64": "^0.34.2", + "@img/sharp-win32-arm64": "^0.34.2", + "@img/sharp-win32-x64": "^0.34.2" + }, + "peerDependencies": { + "zod": "^4.0.0" + } + }, + "node_modules/@anthropic-ai/sdk": { + "version": "0.80.0", + "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.80.0.tgz", + "integrity": "sha512-WeXLn7zNVk3yjeshn+xZHvld6AoFUOR3Sep6pSoHho5YbSi6HwcirqgPA5ccFuW8QTVJAAU7N8uQQC6Wa9TG+g==", + "license": "MIT", + "dependencies": { + "json-schema-to-ts": "^3.1.1" + }, + "bin": { + "anthropic-ai-sdk": "bin/cli" + }, + "peerDependencies": { + "zod": "^3.25.0 || ^4.0.0" + }, + "peerDependenciesMeta": { + 
"zod": { + "optional": true + } + } + }, + "node_modules/@babel/runtime": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.2.tgz", + "integrity": "sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==", + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@esbuild/aix-ppc64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.27.7.tgz", + "integrity": "sha512-EKX3Qwmhz1eMdEJokhALr0YiD0lhQNwDqkPYyPhiSwKrh7/4KRjQc04sZ8db+5DVVnZ1LmbNDI1uAMPEUBnQPg==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "aix" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-arm": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.27.7.tgz", + "integrity": "sha512-jbPXvB4Yj2yBV7HUfE2KHe4GJX51QplCN1pGbYjvsyCZbQmies29EoJbkEc+vYuU5o45AfQn37vZlyXy4YJ8RQ==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.27.7.tgz", + "integrity": "sha512-62dPZHpIXzvChfvfLJow3q5dDtiNMkwiRzPylSCfriLvZeq0a1bWChrGx/BbUbPwOrsWKMn8idSllklzBy+dgQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.27.7.tgz", + "integrity": "sha512-x5VpMODneVDb70PYV2VQOmIUUiBtY3D3mPBG8NxVk5CogneYhkR7MmM3yR/uMdITLrC1ml/NV1rj4bMJuy9MCg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + 
"node": ">=18" + } + }, + "node_modules/@esbuild/darwin-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.27.7.tgz", + "integrity": "sha512-5lckdqeuBPlKUwvoCXIgI2D9/ABmPq3Rdp7IfL70393YgaASt7tbju3Ac+ePVi3KDH6N2RqePfHnXkaDtY9fkw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/darwin-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.27.7.tgz", + "integrity": "sha512-rYnXrKcXuT7Z+WL5K980jVFdvVKhCHhUwid+dDYQpH+qu+TefcomiMAJpIiC2EM3Rjtq0sO3StMV/+3w3MyyqQ==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/freebsd-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.27.7.tgz", + "integrity": "sha512-B48PqeCsEgOtzME2GbNM2roU29AMTuOIN91dsMO30t+Ydis3z/3Ngoj5hhnsOSSwNzS+6JppqWsuhTp6E82l2w==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/freebsd-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.27.7.tgz", + "integrity": "sha512-jOBDK5XEjA4m5IJK3bpAQF9/Lelu/Z9ZcdhTRLf4cajlB+8VEhFFRjWgfy3M1O4rO2GQ/b2dLwCUGpiF/eATNQ==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-arm": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.27.7.tgz", + "integrity": "sha512-RkT/YXYBTSULo3+af8Ib0ykH8u2MBh57o7q/DAs3lTJlyVQkgQvlrPTnjIzzRPQyavxtPtfg0EopvDyIt0j1rA==", + "cpu": [ + "arm" + ], + "dev": true, + 
"license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.27.7.tgz", + "integrity": "sha512-RZPHBoxXuNnPQO9rvjh5jdkRmVizktkT7TCDkDmQ0W2SwHInKCAV95GRuvdSvA7w4VMwfCjUiPwDi0ZO6Nfe9A==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-ia32": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.27.7.tgz", + "integrity": "sha512-GA48aKNkyQDbd3KtkplYWT102C5sn/EZTY4XROkxONgruHPU72l+gW+FfF8tf2cFjeHaRbWpOYa/uRBz/Xq1Pg==", + "cpu": [ + "ia32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-loong64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.27.7.tgz", + "integrity": "sha512-a4POruNM2oWsD4WKvBSEKGIiWQF8fZOAsycHOt6JBpZ+JN2n2JH9WAv56SOyu9X5IqAjqSIPTaJkqN8F7XOQ5Q==", + "cpu": [ + "loong64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-mips64el": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.27.7.tgz", + "integrity": "sha512-KabT5I6StirGfIz0FMgl1I+R1H73Gp0ofL9A3nG3i/cYFJzKHhouBV5VWK1CSgKvVaG4q1RNpCTR2LuTVB3fIw==", + "cpu": [ + "mips64el" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-ppc64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.27.7.tgz", + "integrity": 
"sha512-gRsL4x6wsGHGRqhtI+ifpN/vpOFTQtnbsupUF5R5YTAg+y/lKelYR1hXbnBdzDjGbMYjVJLJTd2OFmMewAgwlQ==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-riscv64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.27.7.tgz", + "integrity": "sha512-hL25LbxO1QOngGzu2U5xeXtxXcW+/GvMN3ejANqXkxZ/opySAZMrc+9LY/WyjAan41unrR3YrmtTsUpwT66InQ==", + "cpu": [ + "riscv64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-s390x": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.27.7.tgz", + "integrity": "sha512-2k8go8Ycu1Kb46vEelhu1vqEP+UeRVj2zY1pSuPdgvbd5ykAw82Lrro28vXUrRmzEsUV0NzCf54yARIK8r0fdw==", + "cpu": [ + "s390x" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.27.7.tgz", + "integrity": "sha512-hzznmADPt+OmsYzw1EE33ccA+HPdIqiCRq7cQeL1Jlq2gb1+OyWBkMCrYGBJ+sxVzve2ZJEVeePbLM2iEIZSxA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/netbsd-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.27.7.tgz", + "integrity": "sha512-b6pqtrQdigZBwZxAn1UpazEisvwaIDvdbMbmrly7cDTMFnw/+3lVxxCTGOrkPVnsYIosJJXAsILG9XcQS+Yu6w==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "netbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/netbsd-x64": { + "version": "0.27.7", + "resolved": 
"https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.27.7.tgz", + "integrity": "sha512-OfatkLojr6U+WN5EDYuoQhtM+1xco+/6FSzJJnuWiUw5eVcicbyK3dq5EeV/QHT1uy6GoDhGbFpprUiHUYggrw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "netbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openbsd-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.27.7.tgz", + "integrity": "sha512-AFuojMQTxAz75Fo8idVcqoQWEHIXFRbOc1TrVcFSgCZtQfSdc1RXgB3tjOn/krRHENUB4j00bfGjyl2mJrU37A==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openbsd-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.27.7.tgz", + "integrity": "sha512-+A1NJmfM8WNDv5CLVQYJ5PshuRm/4cI6WMZRg1by1GwPIQPCTs1GLEUHwiiQGT5zDdyLiRM/l1G0Pv54gvtKIg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openharmony-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.27.7.tgz", + "integrity": "sha512-+KrvYb/C8zA9CU/g0sR6w2RBw7IGc5J2BPnc3dYc5VJxHCSF1yNMxTV5LQ7GuKteQXZtspjFbiuW5/dOj7H4Yw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/sunos-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.27.7.tgz", + "integrity": "sha512-ikktIhFBzQNt/QDyOL580ti9+5mL/YZeUPKU2ivGtGjdTYoqz6jObj6nOMfhASpS4GU4Q/Clh1QtxWAvcYKamA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "sunos" + ], + "engines": { + 
"node": ">=18" + } + }, + "node_modules/@esbuild/win32-arm64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.27.7.tgz", + "integrity": "sha512-7yRhbHvPqSpRUV7Q20VuDwbjW5kIMwTHpptuUzV+AA46kiPze5Z7qgt6CLCK3pWFrHeNfDd1VKgyP4O+ng17CA==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/win32-ia32": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.27.7.tgz", + "integrity": "sha512-SmwKXe6VHIyZYbBLJrhOoCJRB/Z1tckzmgTLfFYOfpMAx63BJEaL9ExI8x7v0oAO3Zh6D/Oi1gVxEYr5oUCFhw==", + "cpu": [ + "ia32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/win32-x64": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.7.tgz", + "integrity": "sha512-56hiAJPhwQ1R4i+21FVF7V8kSD5zZTdHcVuRFMW0hn753vVfQN8xlx4uOPT4xoGH0Z/oVATuR82AiqSTDIpaHg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@hono/node-server": { + "version": "1.19.13", + "resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.19.13.tgz", + "integrity": "sha512-TsQLe4i2gvoTtrHje625ngThGBySOgSK3Xo2XRYOdqGN1teR8+I7vchQC46uLJi8OF62YTYA3AhSpumtkhsaKQ==", + "license": "MIT", + "engines": { + "node": ">=18.14.1" + }, + "peerDependencies": { + "hono": "^4" + } + }, + "node_modules/@img/sharp-darwin-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-darwin-arm64/-/sharp-darwin-arm64-0.34.5.tgz", + "integrity": "sha512-imtQ3WMJXbMY4fxb/Ndp6HBTNVtWCUI0WdobyheGf5+ad6xX8VIDO8u2xE4qc/fr08CKG/7dDseFtn6M6g/r3w==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + 
"darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-darwin-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-darwin-x64/-/sharp-darwin-x64-0.34.5.tgz", + "integrity": "sha512-YNEFAF/4KQ/PeW0N+r+aVVsoIY0/qxxikF2SWdp+NRkmMB7y9LBZAVqQ4yhGCm/H3H270OSykqmQMKLBhBJDEw==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-libvips-darwin-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-arm64/-/sharp-libvips-darwin-arm64-1.2.4.tgz", + "integrity": "sha512-zqjjo7RatFfFoP0MkQ51jfuFZBnVE2pRiaydKJ1G/rHZvnsrHAOcQALIi9sA5co5xenQdTugCvtb1cuf78Vf4g==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-darwin-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-x64/-/sharp-libvips-darwin-x64-1.2.4.tgz", + "integrity": "sha512-1IOd5xfVhlGwX+zXv2N93k0yMONvUlANylbJw1eTah8K/Jtpi15KC+WSiaX/nBmbm2HxRM1gZ0nSdjSsrZbGKg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm/-/sharp-libvips-linux-arm-1.2.4.tgz", + "integrity": 
"sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A==", + "cpu": [ + "arm" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm64/-/sharp-libvips-linux-arm64-1.2.4.tgz", + "integrity": "sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-x64/-/sharp-libvips-linux-x64-1.2.4.tgz", + "integrity": "sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-arm64/-/sharp-libvips-linuxmusl-arm64-1.2.4.tgz", + "integrity": "sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-x64/-/sharp-libvips-linuxmusl-x64-1.2.4.tgz", + "integrity": 
"sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-linux-arm": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm/-/sharp-linux-arm-0.34.5.tgz", + "integrity": "sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw==", + "cpu": [ + "arm" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm64/-/sharp-linux-arm64-0.34.5.tgz", + "integrity": "sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-x64/-/sharp-linux-x64-0.34.5.tgz", + "integrity": "sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-x64": 
"1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-arm64/-/sharp-linuxmusl-arm64-0.34.5.tgz", + "integrity": "sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linuxmusl-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-x64/-/sharp-linuxmusl-x64-0.34.5.tgz", + "integrity": "sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linuxmusl-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-win32-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-arm64/-/sharp-win32-arm64-0.34.5.tgz", + "integrity": "sha512-WQ3AgWCWYSb2yt+IG8mnC6Jdk9Whs7O0gxphblsLvdhSpSTtmu69ZG1Gkb6NuvxsNACwiPV6cNSZNzt0KPsw7g==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-win32-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-x64/-/sharp-win32-x64-0.34.5.tgz", + "integrity": 
"sha512-+29YMsqY2/9eFEiW93eqWnuLcWcufowXewwSNIT6UwZdUUCrM3oFjMWH/Z6/TMmb4hlFenmfAVbpWeup2jryCw==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@modelcontextprotocol/sdk": { + "version": "1.29.0", + "resolved": "https://registry.npmjs.org/@modelcontextprotocol/sdk/-/sdk-1.29.0.tgz", + "integrity": "sha512-zo37mZA9hJWpULgkRpowewez1y6ML5GsXJPY8FI0tBBCd77HEvza4jDqRKOXgHNn867PVGCyTdzqpz0izu5ZjQ==", + "license": "MIT", + "dependencies": { + "@hono/node-server": "^1.19.9", + "ajv": "^8.17.1", + "ajv-formats": "^3.0.1", + "content-type": "^1.0.5", + "cors": "^2.8.5", + "cross-spawn": "^7.0.5", + "eventsource": "^3.0.2", + "eventsource-parser": "^3.0.0", + "express": "^5.2.1", + "express-rate-limit": "^8.2.1", + "hono": "^4.11.4", + "jose": "^6.1.3", + "json-schema-typed": "^8.0.2", + "pkce-challenge": "^5.0.0", + "raw-body": "^3.0.0", + "zod": "^3.25 || ^4.0", + "zod-to-json-schema": "^3.25.1" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "@cfworker/json-schema": "^4.1.1", + "zod": "^3.25 || ^4.0" + }, + "peerDependenciesMeta": { + "@cfworker/json-schema": { + "optional": true + }, + "zod": { + "optional": false + } + } + }, + "node_modules/accepts": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz", + "integrity": "sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==", + "license": "MIT", + "dependencies": { + "mime-types": "^3.0.0", + "negotiator": "^1.0.0" + }, + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/ajv": { + "version": "8.18.0", + "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.18.0.tgz", + "integrity": "sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A==", + "license": 
"MIT", + "dependencies": { + "fast-deep-equal": "^3.1.3", + "fast-uri": "^3.0.1", + "json-schema-traverse": "^1.0.0", + "require-from-string": "^2.0.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/ajv-formats": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-3.0.1.tgz", + "integrity": "sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==", + "license": "MIT", + "dependencies": { + "ajv": "^8.0.0" + }, + "peerDependencies": { + "ajv": "^8.0.0" + }, + "peerDependenciesMeta": { + "ajv": { + "optional": true + } + } + }, + "node_modules/body-parser": { + "version": "2.2.2", + "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-2.2.2.tgz", + "integrity": "sha512-oP5VkATKlNwcgvxi0vM0p/D3n2C3EReYVX+DNYs5TjZFn/oQt2j+4sVJtSMr18pdRr8wjTcBl6LoV+FUwzPmNA==", + "license": "MIT", + "dependencies": { + "bytes": "^3.1.2", + "content-type": "^1.0.5", + "debug": "^4.4.3", + "http-errors": "^2.0.0", + "iconv-lite": "^0.7.0", + "on-finished": "^2.4.1", + "qs": "^6.14.1", + "raw-body": "^3.0.1", + "type-is": "^2.0.1" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/bytes": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz", + "integrity": "sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/call-bind-apply-helpers": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz", + "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "function-bind": "^1.1.2" + }, 
+ "engines": { + "node": ">= 0.4" + } + }, + "node_modules/call-bound": { + "version": "1.0.4", + "resolved": "https://registry.npmjs.org/call-bound/-/call-bound-1.0.4.tgz", + "integrity": "sha512-+ys997U96po4Kx/ABpBCqhA9EuxJaQWDQg7295H4hBphv3IZg0boBKuwYpt4YXp6MZ5AmZQnU/tyMTlRpaSejg==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "get-intrinsic": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/content-disposition": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.0.1.tgz", + "integrity": "sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/content-type": { + "version": "1.0.5", + "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.5.tgz", + "integrity": "sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/cookie": { + "version": "0.7.2", + "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz", + "integrity": "sha512-yki5XnKuf750l50uGTllt6kKILY4nQ1eNIQatoXEByZ5dWgnKqbnqmTrBE5B4N7lrMJKQ2ytWMiTO2o0v6Ew/w==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/cookie-signature": { + "version": "1.2.2", + "resolved": "https://registry.npmjs.org/cookie-signature/-/cookie-signature-1.2.2.tgz", + "integrity": "sha512-D76uU73ulSXrD1UXF4KE2TMxVVwhsnCgfAyTg9k8P6KGZjlXKrOLe4dJQKI3Bxi5wjesZoFXJWElNWBjPZMbhg==", + "license": "MIT", + "engines": { + "node": ">=6.6.0" + } + }, + "node_modules/cors": { + "version": "2.8.6", + "resolved": "https://registry.npmjs.org/cors/-/cors-2.8.6.tgz", + 
"integrity": "sha512-tJtZBBHA6vjIAaF6EnIaq6laBBP9aq/Y3ouVJjEfoHbRBcHBAHYcMh/w8LDrk2PvIMMq8gmopa5D4V8RmbrxGw==", + "license": "MIT", + "dependencies": { + "object-assign": "^4", + "vary": "^1" + }, + "engines": { + "node": ">= 0.10" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/cross-spawn": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", + "license": "MIT", + "dependencies": { + "path-key": "^3.1.0", + "shebang-command": "^2.0.0", + "which": "^2.0.1" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, + "node_modules/depd": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz", + "integrity": "sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/dotenv": { + "version": "16.6.1", + "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.6.1.tgz", + "integrity": "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow==", + "license": "BSD-2-Clause", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://dotenvx.com" + } + }, + "node_modules/dunder-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz", + "integrity": 
"sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.1", + "es-errors": "^1.3.0", + "gopd": "^1.2.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/ee-first": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/ee-first/-/ee-first-1.1.1.tgz", + "integrity": "sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==", + "license": "MIT" + }, + "node_modules/encodeurl": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz", + "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/es-define-property": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz", + "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-errors": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz", + "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-object-atoms": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz", + "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/esbuild": { + "version": "0.27.7", + "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.27.7.tgz", + "integrity": 
"sha512-IxpibTjyVnmrIQo5aqNpCgoACA/dTKLTlhMHihVHhdkxKyPO1uBBthumT0rdHmcsk9uMonIWS0m4FljWzILh3w==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "bin": { + "esbuild": "bin/esbuild" + }, + "engines": { + "node": ">=18" + }, + "optionalDependencies": { + "@esbuild/aix-ppc64": "0.27.7", + "@esbuild/android-arm": "0.27.7", + "@esbuild/android-arm64": "0.27.7", + "@esbuild/android-x64": "0.27.7", + "@esbuild/darwin-arm64": "0.27.7", + "@esbuild/darwin-x64": "0.27.7", + "@esbuild/freebsd-arm64": "0.27.7", + "@esbuild/freebsd-x64": "0.27.7", + "@esbuild/linux-arm": "0.27.7", + "@esbuild/linux-arm64": "0.27.7", + "@esbuild/linux-ia32": "0.27.7", + "@esbuild/linux-loong64": "0.27.7", + "@esbuild/linux-mips64el": "0.27.7", + "@esbuild/linux-ppc64": "0.27.7", + "@esbuild/linux-riscv64": "0.27.7", + "@esbuild/linux-s390x": "0.27.7", + "@esbuild/linux-x64": "0.27.7", + "@esbuild/netbsd-arm64": "0.27.7", + "@esbuild/netbsd-x64": "0.27.7", + "@esbuild/openbsd-arm64": "0.27.7", + "@esbuild/openbsd-x64": "0.27.7", + "@esbuild/openharmony-arm64": "0.27.7", + "@esbuild/sunos-x64": "0.27.7", + "@esbuild/win32-arm64": "0.27.7", + "@esbuild/win32-ia32": "0.27.7", + "@esbuild/win32-x64": "0.27.7" + } + }, + "node_modules/escape-html": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/escape-html/-/escape-html-1.0.3.tgz", + "integrity": "sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==", + "license": "MIT" + }, + "node_modules/etag": { + "version": "1.8.1", + "resolved": "https://registry.npmjs.org/etag/-/etag-1.8.1.tgz", + "integrity": "sha512-aIL5Fx7mawVa300al2BnEE4iNvo1qETxLrPI/o05L7z6go7fCw1J6EQmbK4FmJ2AS7kgVF/KEZWufBfdClMcPg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/eventsource": { + "version": "3.0.7", + "resolved": "https://registry.npmjs.org/eventsource/-/eventsource-3.0.7.tgz", + "integrity": 
"sha512-CRT1WTyuQoD771GW56XEZFQ/ZoSfWid1alKGDYMmkt2yl8UXrVR4pspqWNEcqKvVIzg6PAltWjxcSSPrboA4iA==", + "license": "MIT", + "dependencies": { + "eventsource-parser": "^3.0.1" + }, + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/eventsource-parser": { + "version": "3.0.6", + "resolved": "https://registry.npmjs.org/eventsource-parser/-/eventsource-parser-3.0.6.tgz", + "integrity": "sha512-Vo1ab+QXPzZ4tCa8SwIHJFaSzy4R6SHf7BY79rFBDf0idraZWAkYrDjDj8uWaSm3S2TK+hJ7/t1CEmZ7jXw+pg==", + "license": "MIT", + "engines": { + "node": ">=18.0.0" + } + }, + "node_modules/express": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/express/-/express-5.2.1.tgz", + "integrity": "sha512-hIS4idWWai69NezIdRt2xFVofaF4j+6INOpJlVOLDO8zXGpUVEVzIYk12UUi2JzjEzWL3IOAxcTubgz9Po0yXw==", + "license": "MIT", + "peer": true, + "dependencies": { + "accepts": "^2.0.0", + "body-parser": "^2.2.1", + "content-disposition": "^1.0.0", + "content-type": "^1.0.5", + "cookie": "^0.7.1", + "cookie-signature": "^1.2.1", + "debug": "^4.4.0", + "depd": "^2.0.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "finalhandler": "^2.1.0", + "fresh": "^2.0.0", + "http-errors": "^2.0.0", + "merge-descriptors": "^2.0.0", + "mime-types": "^3.0.0", + "on-finished": "^2.4.1", + "once": "^1.4.0", + "parseurl": "^1.3.3", + "proxy-addr": "^2.0.7", + "qs": "^6.14.0", + "range-parser": "^1.2.1", + "router": "^2.2.0", + "send": "^1.1.0", + "serve-static": "^2.2.0", + "statuses": "^2.0.1", + "type-is": "^2.0.1", + "vary": "^1.1.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/express-rate-limit": { + "version": "8.3.2", + "resolved": "https://registry.npmjs.org/express-rate-limit/-/express-rate-limit-8.3.2.tgz", + "integrity": "sha512-77VmFeJkO0/rvimEDuUC5H30oqUC4EyOhyGccfqoLebB0oiEYfM7nwPrsDsBL1gsTpwfzX8SFy2MT3TDyRq+bg==", + "license": "MIT", + "dependencies": { + 
"ip-address": "10.1.0" + }, + "engines": { + "node": ">= 16" + }, + "funding": { + "url": "https://github.com/sponsors/express-rate-limit" + }, + "peerDependencies": { + "express": ">= 4.11" + } + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "license": "MIT" + }, + "node_modules/fast-uri": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/fast-uri/-/fast-uri-3.1.0.tgz", + "integrity": "sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/fastify" + }, + { + "type": "opencollective", + "url": "https://opencollective.com/fastify" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/finalhandler": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-2.1.1.tgz", + "integrity": "sha512-S8KoZgRZN+a5rNwqTxlZZePjT/4cnm0ROV70LedRHZ0p8u9fRID0hJUZQpkKLzro8LfmC8sx23bY6tVNxv8pQA==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "on-finished": "^2.4.1", + "parseurl": "^1.3.3", + "statuses": "^2.0.1" + }, + "engines": { + "node": ">= 18.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/forwarded": { + "version": "0.2.0", + "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz", + "integrity": "sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/fresh": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/fresh/-/fresh-2.0.0.tgz", + "integrity": 
"sha512-Rx/WycZ60HOaqLKAi6cHRKKI7zxWbJ31MhntmtwMoaTeF7XFH9hhBp8vITaMidfljRQ6eYWCKkaTK+ykVJHP2A==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/fsevents": { + "version": "2.3.3", + "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz", + "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^8.16.0 || ^10.6.0 || >=11.0.0" + } + }, + "node_modules/function-bind": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", + "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-intrinsic": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz", + "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "es-define-property": "^1.0.1", + "es-errors": "^1.3.0", + "es-object-atoms": "^1.1.1", + "function-bind": "^1.1.2", + "get-proto": "^1.0.1", + "gopd": "^1.2.0", + "has-symbols": "^1.1.0", + "hasown": "^2.0.2", + "math-intrinsics": "^1.1.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-proto": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz", + "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==", + "license": "MIT", + "dependencies": { + "dunder-proto": "^1.0.1", + "es-object-atoms": "^1.0.0" + }, + "engines": { + "node": ">= 0.4" + } + }, 
+ "node_modules/get-tsconfig": { + "version": "4.13.7", + "resolved": "https://registry.npmjs.org/get-tsconfig/-/get-tsconfig-4.13.7.tgz", + "integrity": "sha512-7tN6rFgBlMgpBML5j8typ92BKFi2sFQvIdpAqLA2beia5avZDrMs0FLZiM5etShWq5irVyGcGMEA1jcDaK7A/Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "resolve-pkg-maps": "^1.0.0" + }, + "funding": { + "url": "https://github.com/privatenumber/get-tsconfig?sponsor=1" + } + }, + "node_modules/gopd": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", + "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/has-symbols": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz", + "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/hasown": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz", + "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==", + "license": "MIT", + "dependencies": { + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/hono": { + "version": "4.12.12", + "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.12.tgz", + "integrity": "sha512-p1JfQMKaceuCbpJKAPKVqyqviZdS0eUxH9v82oWo1kb9xjQ5wA6iP3FNVAPDFlz5/p7d45lO+BpSk1tuSZMF4Q==", + "license": "MIT", + "peer": true, + "engines": { + "node": ">=16.9.0" + } + }, + "node_modules/http-errors": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/http-errors/-/http-errors-2.0.1.tgz", + "integrity": 
"sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==", + "license": "MIT", + "dependencies": { + "depd": "~2.0.0", + "inherits": "~2.0.4", + "setprototypeof": "~1.2.0", + "statuses": "~2.0.2", + "toidentifier": "~1.0.1" + }, + "engines": { + "node": ">= 0.8" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/iconv-lite": { + "version": "0.7.2", + "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.7.2.tgz", + "integrity": "sha512-im9DjEDQ55s9fL4EYzOAv0yMqmMBSZp6G0VvFyTMPKWxiSBHUj9NW/qqLmXUwXrrM7AvqSlTCfvqRb0cM8yYqw==", + "license": "MIT", + "dependencies": { + "safer-buffer": ">= 2.1.2 < 3.0.0" + }, + "engines": { + "node": ">=0.10.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/inherits": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", + "license": "ISC" + }, + "node_modules/ip-address": { + "version": "10.1.0", + "resolved": "https://registry.npmjs.org/ip-address/-/ip-address-10.1.0.tgz", + "integrity": "sha512-XXADHxXmvT9+CRxhXg56LJovE+bmWnEWB78LB83VZTprKTmaC5QfruXocxzTZ2Kl0DNwKuBdlIhjL8LeY8Sf8Q==", + "license": "MIT", + "engines": { + "node": ">= 12" + } + }, + "node_modules/ipaddr.js": { + "version": "1.9.1", + "resolved": "https://registry.npmjs.org/ipaddr.js/-/ipaddr.js-1.9.1.tgz", + "integrity": "sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g==", + "license": "MIT", + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/is-promise": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/is-promise/-/is-promise-4.0.0.tgz", + "integrity": "sha512-hvpoI6korhJMnej285dSg6nu1+e6uxs7zG3BYAm5byqDsgJNWwxzM6z6iZiAgQR4TJ30JmBTOwqZUw3WlyH3AQ==", + 
"license": "MIT" + }, + "node_modules/isexe": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", + "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==", + "license": "ISC" + }, + "node_modules/jose": { + "version": "6.2.2", + "resolved": "https://registry.npmjs.org/jose/-/jose-6.2.2.tgz", + "integrity": "sha512-d7kPDd34KO/YnzaDOlikGpOurfF0ByC2sEV4cANCtdqLlTfBlw2p14O/5d/zv40gJPbIQxfES3nSx1/oYNyuZQ==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/panva" + } + }, + "node_modules/json-schema-to-ts": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/json-schema-to-ts/-/json-schema-to-ts-3.1.1.tgz", + "integrity": "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==", + "license": "MIT", + "dependencies": { + "@babel/runtime": "^7.18.3", + "ts-algebra": "^2.0.0" + }, + "engines": { + "node": ">=16" + } + }, + "node_modules/json-schema-traverse": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz", + "integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==", + "license": "MIT" + }, + "node_modules/json-schema-typed": { + "version": "8.0.2", + "resolved": "https://registry.npmjs.org/json-schema-typed/-/json-schema-typed-8.0.2.tgz", + "integrity": "sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==", + "license": "BSD-2-Clause" + }, + "node_modules/math-intrinsics": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz", + "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/media-typer": { + "version": "1.1.0", + "resolved": 
"https://registry.npmjs.org/media-typer/-/media-typer-1.1.0.tgz", + "integrity": "sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/merge-descriptors": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-2.0.0.tgz", + "integrity": "sha512-Snk314V5ayFLhp3fkUREub6WtjBfPdCPY1Ln8/8munuLuiYhsABgBVWsozAG+MWMbVEvcdcpbi9R7ww22l9Q3g==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/mime-db": { + "version": "1.54.0", + "resolved": "https://registry.npmjs.org/mime-db/-/mime-db-1.54.0.tgz", + "integrity": "sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/mime-types": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/mime-types/-/mime-types-3.0.2.tgz", + "integrity": "sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==", + "license": "MIT", + "dependencies": { + "mime-db": "^1.54.0" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "license": "MIT" + }, + "node_modules/negotiator": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/negotiator/-/negotiator-1.0.0.tgz", + "integrity": "sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/object-assign": { + "version": "4.1.1", 
+ "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", + "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/object-inspect": { + "version": "1.13.4", + "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz", + "integrity": "sha512-W67iLl4J2EXEGTbfeHCffrjDfitvLANg0UlX3wFUUSTx92KXRFegMHUVgSqE+wvhAbi4WqjGg9czysTV2Epbew==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/on-finished": { + "version": "2.4.1", + "resolved": "https://registry.npmjs.org/on-finished/-/on-finished-2.4.1.tgz", + "integrity": "sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==", + "license": "MIT", + "dependencies": { + "ee-first": "1.1.1" + }, + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/once": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz", + "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==", + "license": "ISC", + "dependencies": { + "wrappy": "1" + } + }, + "node_modules/parseurl": { + "version": "1.3.3", + "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz", + "integrity": "sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/path-key": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz", + "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/path-to-regexp": { + "version": "8.4.2", + "resolved": 
"https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.4.2.tgz", + "integrity": "sha512-qRcuIdP69NPm4qbACK+aDogI5CBDMi1jKe0ry5rSQJz8JVLsC7jV8XpiJjGRLLol3N+R5ihGYcrPLTno6pAdBA==", + "license": "MIT", + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/pkce-challenge": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/pkce-challenge/-/pkce-challenge-5.0.1.tgz", + "integrity": "sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ==", + "license": "MIT", + "engines": { + "node": ">=16.20.0" + } + }, + "node_modules/proxy-addr": { + "version": "2.0.7", + "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz", + "integrity": "sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg==", + "license": "MIT", + "dependencies": { + "forwarded": "0.2.0", + "ipaddr.js": "1.9.1" + }, + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/qs": { + "version": "6.15.0", + "resolved": "https://registry.npmjs.org/qs/-/qs-6.15.0.tgz", + "integrity": "sha512-mAZTtNCeetKMH+pSjrb76NAM8V9a05I9aBZOHztWy/UqcJdQYNsf59vrRKWnojAT9Y+GbIvoTBC++CPHqpDBhQ==", + "license": "BSD-3-Clause", + "dependencies": { + "side-channel": "^1.1.0" + }, + "engines": { + "node": ">=0.6" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/range-parser": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/range-parser/-/range-parser-1.2.1.tgz", + "integrity": "sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/raw-body": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/raw-body/-/raw-body-3.0.2.tgz", + "integrity": "sha512-K5zQjDllxWkf7Z5xJdV0/B0WTNqx6vxG70zJE4N0kBs4LovmEYWJzQGxC9bS9RAKu3bgM40lrd5zoLJ12MQ5BA==", + "license": "MIT", + 
"dependencies": { + "bytes": "~3.1.2", + "http-errors": "~2.0.1", + "iconv-lite": "~0.7.0", + "unpipe": "~1.0.0" + }, + "engines": { + "node": ">= 0.10" + } + }, + "node_modules/require-from-string": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/require-from-string/-/require-from-string-2.0.2.tgz", + "integrity": "sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/resolve-pkg-maps": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/resolve-pkg-maps/-/resolve-pkg-maps-1.0.0.tgz", + "integrity": "sha512-seS2Tj26TBVOC2NIc2rOe2y2ZO7efxITtLZcGSOnHHNOQ7CkiUBfw0Iw2ck6xkIhPwLhKNLS8BO+hEpngQlqzw==", + "dev": true, + "license": "MIT", + "funding": { + "url": "https://github.com/privatenumber/resolve-pkg-maps?sponsor=1" + } + }, + "node_modules/router": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz", + "integrity": "sha512-nLTrUKm2UyiL7rlhapu/Zl45FwNgkZGaCpZbIHajDYgwlJCOzLSk+cIPAnsEqV955GjILJnKbdQC1nVPz+gAYQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.0", + "depd": "^2.0.0", + "is-promise": "^4.0.0", + "parseurl": "^1.3.3", + "path-to-regexp": "^8.0.0" + }, + "engines": { + "node": ">= 18" + } + }, + "node_modules/safer-buffer": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/safer-buffer/-/safer-buffer-2.1.2.tgz", + "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==", + "license": "MIT" + }, + "node_modules/send": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/send/-/send-1.2.1.tgz", + "integrity": "sha512-1gnZf7DFcoIcajTjTwjwuDjzuz4PPcY2StKPlsGAQ1+YH20IRVrBaXSWmdjowTJ6u8Rc01PoYOGHXfP1mYcZNQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.4.3", + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "etag": "^1.8.1", + "fresh": "^2.0.0", + 
"http-errors": "^2.0.1", + "mime-types": "^3.0.2", + "ms": "^2.1.3", + "on-finished": "^2.4.1", + "range-parser": "^1.2.1", + "statuses": "^2.0.2" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/serve-static": { + "version": "2.2.1", + "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-2.2.1.tgz", + "integrity": "sha512-xRXBn0pPqQTVQiC8wyQrKs2MOlX24zQ0POGaj0kultvoOCstBQM5yvOhAVSUwOMjQtTvsPWoNCHfPGwaaQJhTw==", + "license": "MIT", + "dependencies": { + "encodeurl": "^2.0.0", + "escape-html": "^1.0.3", + "parseurl": "^1.3.3", + "send": "^1.2.0" + }, + "engines": { + "node": ">= 18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/setprototypeof": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz", + "integrity": "sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==", + "license": "ISC" + }, + "node_modules/shebang-command": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", + "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==", + "license": "MIT", + "dependencies": { + "shebang-regex": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/shebang-regex": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz", + "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/side-channel": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/side-channel/-/side-channel-1.1.0.tgz", + "integrity": 
"sha512-ZX99e6tRweoUXqR+VBrslhda51Nh5MTQwou5tnUDgbtyM0dBgmhEDtWGP/xbKn6hqfPRHujUNwz5fy/wbbhnpw==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.3", + "side-channel-list": "^1.0.0", + "side-channel-map": "^1.0.1", + "side-channel-weakmap": "^1.0.2" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-list": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.0.tgz", + "integrity": "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "object-inspect": "^1.13.3" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-map": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-map/-/side-channel-map-1.0.1.tgz", + "integrity": "sha512-VCjCNfgMsby3tTdo02nbjtM/ewra6jPHmpThenkTYh8pG9ucZ/1P8So4u4FGBek/BjpOVsDCMoLA/iuBKIFXRA==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/side-channel-weakmap": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/side-channel-weakmap/-/side-channel-weakmap-1.0.2.tgz", + "integrity": "sha512-WPS/HvHQTYnHisLo9McqBHOJk2FkHO/tlpvldyrnem4aeQp4hai3gythswg6p01oSoTl58rcpiFAjF2br2Ak2A==", + "license": "MIT", + "dependencies": { + "call-bound": "^1.0.2", + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.5", + "object-inspect": "^1.13.3", + "side-channel-map": "^1.0.1" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + 
"node_modules/statuses": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/statuses/-/statuses-2.0.2.tgz", + "integrity": "sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/toidentifier": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/toidentifier/-/toidentifier-1.0.1.tgz", + "integrity": "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==", + "license": "MIT", + "engines": { + "node": ">=0.6" + } + }, + "node_modules/ts-algebra": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/ts-algebra/-/ts-algebra-2.0.0.tgz", + "integrity": "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==", + "license": "MIT" + }, + "node_modules/tsx": { + "version": "4.21.0", + "resolved": "https://registry.npmjs.org/tsx/-/tsx-4.21.0.tgz", + "integrity": "sha512-5C1sg4USs1lfG0GFb2RLXsdpXqBSEhAaA/0kPL01wxzpMqLILNxIxIOKiILz+cdg/pLnOUxFYOR5yhHU666wbw==", + "dev": true, + "license": "MIT", + "dependencies": { + "esbuild": "~0.27.0", + "get-tsconfig": "^4.7.5" + }, + "bin": { + "tsx": "dist/cli.mjs" + }, + "engines": { + "node": ">=18.0.0" + }, + "optionalDependencies": { + "fsevents": "~2.3.3" + } + }, + "node_modules/type-is": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/type-is/-/type-is-2.0.1.tgz", + "integrity": "sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==", + "license": "MIT", + "dependencies": { + "content-type": "^1.0.5", + "media-typer": "^1.1.0", + "mime-types": "^3.0.0" + }, + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/typescript": { + "version": "5.9.3", + "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz", + "integrity": 
"sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "tsc": "bin/tsc", + "tsserver": "bin/tsserver" + }, + "engines": { + "node": ">=14.17" + } + }, + "node_modules/unpipe": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/unpipe/-/unpipe-1.0.0.tgz", + "integrity": "sha512-pjy2bYhSsufwWlKwPc+l3cN7+wuJlK6uz0YdJEOlQDbl6jo/YlPi4mb8agUkVC8BF7V8NuzeyPNqRksA3hztKQ==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/vary": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz", + "integrity": "sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==", + "license": "MIT", + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/which": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", + "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==", + "license": "ISC", + "dependencies": { + "isexe": "^2.0.0" + }, + "bin": { + "node-which": "bin/node-which" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/wrappy": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", + "license": "ISC" + }, + "node_modules/zod": { + "version": "4.3.6", + "resolved": "https://registry.npmjs.org/zod/-/zod-4.3.6.tgz", + "integrity": "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg==", + "license": "MIT", + "peer": true, + "funding": { + "url": "https://github.com/sponsors/colinhacks" + } + }, + "node_modules/zod-to-json-schema": { + "version": "3.25.2", + "resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.25.2.tgz", + "integrity": 
"sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA==", + "license": "ISC", + "peerDependencies": { + "zod": "^3.25.28 || ^4" + } + } + } +} diff --git a/docs/agent-evaluation/package.json b/docs/agent-evaluation/package.json new file mode 100644 index 000000000..900af5e2d --- /dev/null +++ b/docs/agent-evaluation/package.json @@ -0,0 +1,25 @@ +{ + "name": "outpost-agent-evaluation", + "version": "1.0.0", + "private": true, + "type": "module", + "description": "Claude Agent SDK harness for Outpost onboarding scenario evals", + "scripts": { + "eval": "node --import tsx src/run-agent-eval.ts", + "eval:ci": "node --import tsx src/run-agent-eval.ts -- --scenarios 01,02", + "eval:tsx-cli": "tsx src/run-agent-eval.ts", + "score": "node --import tsx src/score-eval.ts", + "typecheck": "tsc --noEmit" + }, + "engines": { + "node": ">=18" + }, + "dependencies": { + "@anthropic-ai/claude-agent-sdk": "^0.2.92", + "dotenv": "^16.4.7" + }, + "devDependencies": { + "tsx": "^4.19.4", + "typescript": "^5.8.3" + } +} diff --git a/docs/agent-evaluation/results/.gitignore b/docs/agent-evaluation/results/.gitignore new file mode 100644 index 000000000..3a2f71330 --- /dev/null +++ b/docs/agent-evaluation/results/.gitignore @@ -0,0 +1,5 @@ +# Ignore local run recordings; keep README + template committed +* +!.gitignore +!README.md +!RUN-RECORDING.template.md diff --git a/docs/agent-evaluation/results/README.md b/docs/agent-evaluation/results/README.md new file mode 100644 index 000000000..0ed815986 --- /dev/null +++ b/docs/agent-evaluation/results/README.md @@ -0,0 +1,57 @@ +# Agent evaluation — results + +This directory holds **manual run write-ups** and, under `**runs/`**, **automated** artifacts from `npm run eval`. Almost everything here is **gitignored** by default (see `[.gitignore](.gitignore)`). + +Full workflow and env vars: `**[../README.md](../README.md)`**. 
+ + --- + + ## Automated runs (`runs/`) + + From `docs/agent-evaluation/`: + + ```sh +npm run eval -- --scenario 01 +npm run eval -- --scenarios 01,02 +npm run eval -- --all + ``` + + Each run is a **directory** (same timestamp stem, all gitignored): + + `runs/-scenario-NN/` + + | Path in run dir | What it is | + | --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | + | `transcript.json` | Full Claude Agent SDK transcript (`meta` + `messages`). | + | `heuristic-score.json` | **Heuristic** transcript checks (`[../src/score-transcript.ts](../src/score-transcript.ts)`); rubrics **01–10** (`scoreScenario01`–`10`). | + | `llm-score.json` | **LLM judge** output (`[../src/llm-judge.ts](../src/llm-judge.ts)`) vs **`## Success criteria`** in the scenario markdown. | + | *(other files)* | Anything the agent **`Write`**s (e.g. `outpost-quickstart.sh`); SDK **`cwd`** is this directory. | + + Legacy flat `runs/-scenario-NN.json` (and `*.score.json` / `*.llm-score.json` beside it) still work with **`npm run score`**. + + Re-score an existing run without re-running the agent: + + ```sh +npm run score -- --run results/runs/-scenario-NN --write +npm run score -- --run results/runs/-scenario-NN --llm --write + ``` + + **Execution** (curl/SDK against live Outpost with `OUTPOST_API_KEY`) is **not** produced by these JSON files. Treat the **Execution (full pass)** rows in `[../scenarios/](../scenarios/)` as a separate human or CI step unless you add a verifier script. + + --- + + ## Manual run recordings + + For **IDE-only** or ad-hoc runs (no `npm run eval`): + + 1. Copy `[RUN-RECORDING.template.md](RUN-RECORDING.template.md)` to a **local-only** name (e.g. `2026-04-08-s01-cursor.md`) in this directory. + 2. Fill in transcript summary, heuristic/LLM pointers if you ran `npm run score` separately, **Execution verification**, and notes. + 3. 
Do not commit raw recordings unless your policy allows it; anonymized summaries in a PR are fine. + + Success criteria for every scenario: **`[../scenarios/*.md](../scenarios/)`** — section **Success criteria**. + + --- + + ## Template + + See `[RUN-RECORDING.template.md](RUN-RECORDING.template.md)`. \ No newline at end of file diff --git a/docs/agent-evaluation/results/RUN-RECORDING.template.md b/docs/agent-evaluation/results/RUN-RECORDING.template.md new file mode 100644 index 000000000..047b9fa84 --- /dev/null +++ b/docs/agent-evaluation/results/RUN-RECORDING.template.md @@ -0,0 +1,36 @@ +# Agent eval recording (copy this file, rename, fill in) + +**Scenario:** (e.g. `01-basics-curl` — link to `../scenarios/....md`) +**Date:** YYYY-MM-DD +**Agent / client:** (e.g. Cursor Agent, Claude Code, Copilot Chat) +**Model:** (if known) +**Outpost skill enabled?** yes / no + +## Environment + +- Docs / prompt source: (commit SHA or “main @ date”) +- Hookdeck project: throwaway / prod (describe) + +## Transcript summary + +(Optional bullets — do not paste secrets.) + +- Turn 0: … +- Turn 1: … + +## Success criteria (from scenario doc) + +Copy the checklist from the scenario and mark **PASS** / **FAIL** / **N/A**. + +- … + +## Execution verification (full pass) + +Did you run the generated curl / script / app against a **live** Outpost project with **`OUTPOST_API_KEY`** (and related env vars)? 
+ +- **Execution:** PASS / FAIL / SKIPPED (transcript-only) +- Notes (HTTP status codes, error bodies — no secrets): + +## Notes / regressions + +… \ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md new file mode 100644 index 000000000..b7a491861 --- /dev/null +++ b/docs/agent-evaluation/scenarios/01-basics-curl.md @@ -0,0 +1,48 @@ +# Scenario 1 — Basics with curl + +## Intent + +Agent should produce a **minimal shell + curl** flow against the **managed** API (no SDK), matching the official curl quickstart. Prefer a **single runnable shell script** (e.g. `outpost-quickstart.sh`) that sets variables and runs all curls, so the operator can `chmod +x` and run once; inline copy-paste blocks are acceptable if the user asked only for “commands.” + +## Preconditions + +- `OUTPOST_API_KEY` set in the environment (user states this; agent must not ask for the raw key in chat). +- Topics include at least one topic used in the script (e.g. `user.created`). + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/-scenario-NN/`. Save the shell script there with **Write** (e.g. `outpost-quickstart.sh`), not only as a fenced block in chat, so the run folder is reviewable on disk. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> I only want the basics using **curl** against the managed API. No SDK. Give me a **single shell script** I can save and run (e.g. `bash outpost-quickstart.sh`) that: creates a tenant, adds a webhook destination for my test URL, and publishes one event. Use the topic from the prompt. 
Use `OUTPOST_API_KEY` from the environment (document that I should `export` it or load `.env`). If you can’t provide a file, paste one script block I can save as `.sh`. + +### Turn 2 — User (optional probe) + +> Show me how to verify delivery after I run those commands. + +## Success criteria + +**Measurement:** Heuristic rubric `scoreScenario01` in [`../src/score-transcript.ts`](../src/score-transcript.ts) (assistant text + tool-written script content). LLM judge: `npm run score -- --run --llm`. Execution row remains manual. + +- Uses managed base URL `https://api.outpost.hookdeck.com/2025-07-01` (or explicit `OUTPOST_API_BASE_URL`), **not** `localhost:3333/api/v1`, unless the user asked for self-hosted. +- Tenant: `PUT .../tenants/{tenant_id}` with `Authorization: Bearer` (or documents equivalent). +- Destination: `POST .../tenants/{tenant_id}/destinations` with `type: webhook`, `topics` including the configured topic or `*`, and `config.url` pointing at a test HTTPS URL (env or placeholder). +- Publish: `POST .../publish` with `tenant_id`, `topic`, and a top-level JSON field **`data`** (the event payload object — see OpenAPI `PublishRequest` and curl quickstart). Not `payload`. Typically also `eligible_for_retry`. +- Delivers as one **shell script** (or one fenced `bash` block meant to be saved as `.sh`), not only three unrelated snippets without a shebang/variables. +- Does **not** embed a pasted API key in the reply. +- Verification mentions Hookdeck Console / dashboard logs if Turn 2 was asked. +- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect **2xx** on tenant upsert and destination create, **202** (or documented success) on publish, and a visible delivery to the test webhook URL (Hookdeck Console / project logs, or `GET .../attempts` as appropriate). 
*Skip only if you are doing transcript-only triage.* + +## Failure modes to note + +- Wrong path (`PUT /{tenant}` without `/tenants/`). +- Mixing self-hosted base path with managed host. +- Skipping topic alignment with dashboard configuration. + diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md new file mode 100644 index 000000000..9a2fc40a7 --- /dev/null +++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md @@ -0,0 +1,45 @@ +# Scenario 2 — Basics with TypeScript + +## Intent + +Agent should produce a **single runnable `.ts` file** using `@hookdeck/outpost-sdk`, following the managed TypeScript quickstart pattern. + +## Preconditions + +- Node 18+; user can run `npx tsx`. +- `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` available as env vars. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/-scenario-NN/`. Write the script and any `package.json` there with **Write** / **Edit**; use **Bash** for `npm install`, `npx tsx`, etc., so the folder is a runnable mini-project. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 1 — try it out. Use **TypeScript** only: one script file, use `@hookdeck/outpost-sdk`, read `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` from the environment. Create tenant, webhook destination for the topic in the prompt, publish one test event, print the event id. + +### Turn 2 — User (optional) + +> How do I run it? 
+ +## Success criteria + +**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- Depends on `@hookdeck/outpost-sdk`; uses `Outpost` client with `apiKey` from `process.env.OUTPOST_API_KEY`. +- Calls `tenants.upsert`, `destinations.create` (webhook), `publish.event`. +- Uses a topic that matches the dashboard list from the prompt (or asks which topic if ambiguous). +- Webhook URL from `OUTPOST_TEST_WEBHOOK_URL` (or clearly documented env). +- No API key in source; fails fast if env missing. +- Mentions `npx tsx script.ts` or equivalent run instructions. +- **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional `OUTPOST_API_BASE_URL` set, the generated script runs to completion (no uncaught API errors) and prints or logs an event id or other clear success signal. *Skip only for transcript-only triage.* + +## Failure modes to note + +- Defaulting to localhost API without user asking for self-hosted. +- Using raw `fetch` when user asked for TypeScript SDK specifically. \ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/03-basics-python.md b/docs/agent-evaluation/scenarios/03-basics-python.md new file mode 100644 index 000000000..2d9ecb88b --- /dev/null +++ b/docs/agent-evaluation/scenarios/03-basics-python.md @@ -0,0 +1,43 @@ +# Scenario 3 — Basics with Python + +## Intent + +Agent should produce a **single Python script** using `outpost_sdk`, equivalent to scenario 2. + +## Preconditions + +- Python 3.9+; `pip install outpost_sdk`. +- `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` set. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/-scenario-NN/`. 
Save `*.py`, `requirements.txt` or `pyproject.toml` with **Write** / **Edit**; use **Bash** for `pip` / `uv` installs so the run directory is self-contained. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 1 — try it out. Use **Python** with `outpost_sdk`. Read credentials from the environment. Same flow: tenant, webhook destination, one publish, print event id. + +### Turn 2 — User (optional) + +> Keep it to one file I can run with `python`. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- [ ] `from outpost_sdk import Outpost` (or equivalent documented import path). +- [ ] `Outpost(api_key=..., server_url=...)` with optional base URL from env. +- [ ] `tenants.upsert`, `destinations.create`, `publish.event` with correct shapes. +- [ ] Topic aligned with prompt; webhook URL from env. +- [ ] No secrets in file. +- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.* + +## Failure modes to note + +- Using `requests` only when user asked for the official SDK. 
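Reviewer note on scenario 3: the checklist items "webhook URL from env" and "No secrets in file" can be sketched with the standard library alone. The helper below is a hypothetical illustration, not part of `outpost_sdk` or of this patch. The names `load_outpost_env` and `OutpostEnv` are assumptions, the default base URL is the managed URL quoted in scenario 1, and `OUTPOST_API_BASE_URL` is the optional override named there.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Managed API base URL quoted in scenario 1; assumed here as the default.
DEFAULT_BASE_URL = "https://api.outpost.hookdeck.com/2025-07-01"


@dataclass(frozen=True)
class OutpostEnv:
    """Settings a generated script would need (hypothetical container)."""

    api_key: str
    test_webhook_url: str
    base_url: str


def load_outpost_env(env: Mapping[str, str] = os.environ) -> OutpostEnv:
    """Read Outpost settings from the environment and fail fast if any are missing."""
    required = ("OUTPOST_API_KEY", "OUTPOST_TEST_WEBHOOK_URL")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        # Report variable names only; never echo secret values.
        raise RuntimeError("Missing required env vars: " + ", ".join(missing))
    return OutpostEnv(
        api_key=env["OUTPOST_API_KEY"].strip(),
        test_webhook_url=env["OUTPOST_TEST_WEBHOOK_URL"].strip(),
        # The optional override falls back to the managed default when unset or blank.
        base_url=env.get("OUTPOST_API_BASE_URL", "").strip() or DEFAULT_BASE_URL,
    )
```

A generated script could pass the resulting values to `Outpost(api_key=..., server_url=...)` as the checklist describes; the exact SDK call shapes are left to the official quickstart.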
diff --git a/docs/agent-evaluation/scenarios/04-basics-go.md b/docs/agent-evaluation/scenarios/04-basics-go.md new file mode 100644 index 000000000..29622c6a1 --- /dev/null +++ b/docs/agent-evaluation/scenarios/04-basics-go.md @@ -0,0 +1,38 @@ +# Scenario 4 — Basics with Go + +## Intent + +Agent should produce a **small Go program** using `github.com/hookdeck/outpost/sdks/outpost-go`, equivalent to scenarios 2–3. + +## Preconditions + +- Go toolchain; module with `outpost-go` dependency. +- `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` set. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/-scenario-NN/`. Write `go.mod`, `main.go`, etc. with **Write** / **Edit**; use **Bash** for `go mod init`, `go mod tidy`, and `go run` so the folder is a complete module. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 1 — try it out. Use **Go** and the official Outpost Go SDK. Environment variables for API key and test webhook URL. Tenant upsert, webhook destination, publish one event, print ids. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario04` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- [ ] `outpostgo.New` with `WithSecurity` (and optional `WithServerURL`). +- [ ] `Tenants.Upsert`, `Destinations.Create` with `CreateDestinationCreateWebhook` (or correct union wrapper), `Publish.Event`. +- [ ] Topic and tenant id explicit; matches prompt topics. +- [ ] No API key in source. 
+- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional server URL env vars set, `go run …` succeeds and prints ids or clear success. *Skip only for transcript-only triage.* + +## Failure modes to note + +- Passing raw struct to `Create` without `CreateDestinationCreateWebhook` wrapper (common compile mistake). diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md new file mode 100644 index 000000000..3e5ffa10b --- /dev/null +++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md @@ -0,0 +1,58 @@ +# Scenario 5 — Minimal example app (Next.js) + +## Intent + +Agent scaffolds a **minimal Next.js** app (App Router or Pages Router acceptable) with a **simple UI** that lets an operator: + +1. Register a **webhook destination** for a tenant (URL input + submit). +2. After registration, **trigger a test publish** to a configured topic so the destination receives an event. + +Server-side code must call Outpost with the API key from **environment** (e.g. `OUTPOST_API_KEY`), never exposed to the browser. + +## Preconditions + +- User has Node 18+; comfortable creating a Next app. +- `OUTPOST_API_KEY`, managed base URL, at least one topic, and `OUTPOST_TEST_WEBHOOK_URL` or user-supplied URL pattern documented. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/-scenario-NN/`. You **must** scaffold the Next.js app **into that directory** (e.g. `npx create-next-app@latest` with flags for non-interactive use) using **Bash**, then implement routes/server code with **Write** / **Edit**. Chat-only snippets are not enough for this scenario—the run folder should contain a real project tree reviewers can `npm install && npm run dev`. 
+ +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 2 — build a minimal example. I want **Next.js**. Very small UI: field for webhook URL, button to create the webhook destination for tenant `demo_tenant` (or let me edit tenant id in the UI), and a button to send one test event on topic `user.created` (or the first topic from the prompt). Use the Outpost TypeScript SDK on the server only. + +### Turn 2 — User (optional) + +> Add a short README with env vars and `npm run dev` steps. + +### Turn 3 — User (stress) + +> I do not have a public URL yet — what should I use for the webhook URL field? + +Expected: agent suggests Hookdeck Console Source URL or similar, aligned with quickstarts. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario05` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- Next.js project structure with install/run instructions. +- API routes or server actions perform Outpost calls; **no API key** in client bundles. +- UI flow covers **create destination** and **publish** (two distinct actions visible to the user). +- Tenant id and topic are configurable or clearly documented constants. +- Uses managed base URL by default. +- README lists required env vars. +- **Execution (full pass):** After `npm install` and `npm run dev` (or documented command), a manual smoke test completes **both** flows: register webhook destination and trigger test publish, without 5xx from your app’s Outpost calls and with Outpost accepting the requests. Requires `OUTPOST_API_KEY` and related env in `.env.local` or as documented. 
*Skip only for transcript-only triage.* + +## Failure modes to note + +- Calling Outpost directly from browser-side code with embedded key. +- Only publishing without a UI path to register the destination first. +- Hard-coding localhost Outpost without user request. + diff --git a/docs/agent-evaluation/scenarios/06-app-fastapi.md b/docs/agent-evaluation/scenarios/06-app-fastapi.md new file mode 100644 index 000000000..1f00b5f68 --- /dev/null +++ b/docs/agent-evaluation/scenarios/06-app-fastapi.md @@ -0,0 +1,47 @@ +# Scenario 6 — Minimal example app (FastAPI + Jinja or HTMX) + +## Intent + +Same product behavior as [scenario 5](05-app-nextjs.md), but stack is **Python FastAPI**: + +- Server renders a **simple HTML form** (Jinja2 templates, HTMX, or minimal static HTML served by FastAPI). +- Endpoints (or form posts) call `outpost_sdk` with env-based API key. +- User can submit webhook URL → create destination; user can trigger test publish. + +## Preconditions + +- Python 3.9+; `fastapi`, `uvicorn`, `outpost_sdk`. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/-scenario-NN/`. Create the FastAPI app **in that directory**: add source files with **Write** / **Edit**, install deps with **Bash** (`pip` / `uv`). The run folder must be a small but complete app (not only code pasted in chat). + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 2 — minimal example with **FastAPI**. Single small app: HTML page with webhook URL field, button to register destination for tenant `demo_tenant`, button to publish one test event. Use `outpost_sdk` only on the server. Keep it to a few files. 
+ +### Turn 2 — User (optional) + +> Document env vars and `uvicorn` command in README. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario06` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- [ ] FastAPI app runs with one command documented (`uvicorn ...`). +- [ ] Outpost calls only server-side; API key from environment. +- [ ] Two user-visible actions: **register webhook** and **publish test event**. +- [ ] Managed API base URL by default. +- [ ] README with `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL` or equivalent. +- [ ] **Execution (full pass):** App starts (`uvicorn` or as documented); manual smoke test completes **register webhook** and **publish test event** without server errors on Outpost calls. Env vars set including `OUTPOST_API_KEY`. *Skip only for transcript-only triage.* + +## Failure modes to note + +- Exposing API key to templates/inline JS. +- Using only `curl` subprocesses when user asked for FastAPI + SDK. diff --git a/docs/agent-evaluation/scenarios/07-app-go-http.md b/docs/agent-evaluation/scenarios/07-app-go-http.md new file mode 100644 index 000000000..cfdd594a9 --- /dev/null +++ b/docs/agent-evaluation/scenarios/07-app-go-http.md @@ -0,0 +1,46 @@ +# Scenario 7 — Minimal example app (Go net/http) + +## Intent + +Same behavior as scenarios 5–6: **small Go program** using `net/http` (no heavy framework required) that serves **basic HTML** with: + +1. Form or fields for webhook URL → create webhook destination (via `outpost-go`). +2. Control to **publish** one test event. + +## Preconditions + +- Go 1.22+; `outpost-go` module. + +## Automated eval (Claude Agent SDK) + +The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/-scenario-NN/`. Initialize the module and server **there** (`go mod init`, `go get`, etc. 
via **Bash**; `main.go` / `handlers.go` via **Write** / **Edit**). Reviewers should be able to `go run .` from the run directory after the eval. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). + +### Turn 1 — User + +> Option 2 — minimal example in **Go**. Standard library HTTP server, simple HTML page: register webhook destination for a fixed tenant id, then button to publish one event. Use the official Go SDK for Outpost calls. API key from environment. + +### Turn 2 — User (optional) + +> Keep everything in `main.go` if reasonable, or split `handlers.go` — your choice, but stay small. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario07` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- `go run .` (or `go run main.go`) documented. +- HTML UI with two flows: **create destination**, **publish**. +- SDK used server-side only; `OUTPOST_API_KEY` from env. +- Correct `CreateDestinationCreateWebhook` usage. +- README lists env vars and port. +- **Execution (full pass):** `go run …` starts the server; manual smoke test completes **create destination** and **publish** through the HTML UI without Outpost API failures. `OUTPOST_API_KEY` (and related env) set. *Skip only for transcript-only triage.* + +## Failure modes to note + +- Embedding API key in HTML/JS. +- Omitting publish action after destination registration. 
\ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md new file mode 100644 index 000000000..56cd9c9b0 --- /dev/null +++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md @@ -0,0 +1,59 @@ +# Scenario 8 — Integrate Outpost into an existing Next.js SaaS app + +## Intent + +Operators often have a **production-shaped SaaS codebase** (auth, teams, dashboard) and need **outbound webhooks** for their customers. This scenario measures whether the agent can **clone a known open-source baseline**, understand where **domain events** happen, and **wire Hookdeck Outpost** so events are **published** to Outpost (with **per-tenant webhook destinations** documented or implemented). + +**Baseline application (pin this in evals):** [**leerob/next-saas-starter**](https://github.com/leerob/next-saas-starter) — Next.js, PostgreSQL, Drizzle, team/member flows, MIT license. It is a common reference for “real” SaaS structure; adjust the prompt if you standardize on another repo. + +## Preconditions + +- Node 18+; `git` available. +- Same Turn 0 placeholders as other scenarios (`OUTPOST_API_KEY` **not** in the prompt text; test destination URL from dashboard). + +## Automated eval (Claude Agent SDK) + +The harness **`cwd`** is an empty directory under `results/runs/-scenario-08/`. The agent should **`git clone`** the baseline into that workspace (or a subdirectory), **`npm` / `pnpm install`** via **Bash**, then **Write** / **Edit** integration code. Reviewers inspect the run folder and transcript. + +## Conversation script + +### Turn 0 + +Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx), with `{{…}}` filled using your project or [`fixtures/placeholder-values-for-turn0.md`](../fixtures/placeholder-values-for-turn0.md). 
+ +### Turn 1 — User + +> **Option 3 — integrate with an existing app.** Clone **`https://github.com/leerob/next-saas-starter`** into this workspace (subdirectory is fine), install dependencies per its README, and get it in a state where we could run it locally. +> +> Then integrate **Hookdeck Outpost** for **outbound webhooks** to our customers: +> +> 1. Use the official **`@hookdeck/outpost-sdk`** on the **server only** (API routes, server actions, or equivalent — never expose `OUTPOST_API_KEY` to the browser). +> 2. Pick **one meaningful domain event** in this starter (e.g. team or member lifecycle — choose something that actually exists in the code) and **`publish`** an event to Outpost with a **topic** from the Turn 0 prompt (or document the topic constant). +> 3. Document how an operator registers a **webhook destination** per **tenant/customer** (REST flow or small admin UI is fine). Use the test destination URL from Turn 0 where helpful. +> 4. Add or update a **README section** listing required env vars (`OUTPOST_API_KEY`, optional base URL, anything else you add). + +### Turn 2 — User (optional) + +> Where should we call **`tenants.upsert`** relative to our own tenant/customer model? + +## Success criteria + +**Measurement:** Heuristic `scoreScenario08` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. + +- Baseline app is the documented **next-saas-starter** (or an explicitly justified fork) with clone + install steps reflected in the transcript or run directory. +- **Outpost TypeScript SDK** used **server-side only**; no `NEXT_PUBLIC_*` API key. +- At least one **publish** (or equivalent) tied to a **real code path** in the baseline (not dead code). +- **Topic** aligns with Turn 0 configuration or is clearly named and documented. 
+- **Per-customer webhook** story is explained: destination creation / subscription to topic. +- README (or equivalent) lists **env vars** for Outpost. +- **Execution (full pass):** With `OUTPOST_API_KEY` set, the app runs; a manual path triggers the integrated publish and Outpost accepts the request (2xx/202 as appropriate). *Skip only for transcript-only triage.* + +## Failure modes to note + +- Pasting a greenfield Next app instead of integrating the **cloned** baseline. +- Publishing only from a demo route unrelated to the product model. +- Calling Outpost from client components with secrets. + +## Future baselines + +Java / .NET “existing app” scenarios can follow the same shape: fixed public baseline repo + Option 3 Turn 1 + Success criteria + `scoreScenarioNN`. diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md new file mode 100644 index 000000000..72c63ef86 --- /dev/null +++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md @@ -0,0 +1,52 @@ +# Scenario 9 — Integrate Outpost into an existing FastAPI SaaS app + +## Intent + +Same as [scenario 8](08-integrate-nextjs-existing.md), but stack is **Python + FastAPI** with a **multi-tenant / org** style baseline. + +**Baseline application (pin this in evals):** [**philipokiokio/FastAPI_SAAS_Template**](https://github.com/philipokiokio/FastAPI_SAAS_Template) — FastAPI, organizations, permissions, Alembic, MIT-style OSS template commonly used as a starting point. Substitute only if you document another baseline in the scenario and update heuristics. + +## Preconditions + +- Python 3.10+; `git` available. + +## Automated eval (Claude Agent SDK) + +**`cwd`** is `results/runs/-scenario-09/`. Expect **`git clone`**, **`pip` / `uv`**, then **Write** / **Edit** for Outpost integration. 
+ +## Conversation script + +### Turn 0 + +Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled. + +### Turn 1 — User + +> **Option 3 — integrate with an existing app.** Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`** into this workspace, install dependencies per its README (venv + `pip install -r requirements.txt` or `uv` as you prefer). +> +> Integrate **Hookdeck Outpost** for **outbound webhooks**: +> +> 1. Use **`outpost_sdk`** only in **server** code (routers, services — never embed the API key in templates or static JS). +> 2. Hook **`publish.event`** (and tenant/destination setup as needed) to **one real domain event** in this template (e.g. org membership or user lifecycle — pick something that exists in the codebase). +> 3. Document how operators register **webhook destinations** per tenant/customer and which **topic** you publish on (use topics from Turn 0 when possible). +> 4. Document **`OUTPOST_API_KEY`** and **`uvicorn`** (or equivalent) run instructions in README. + +### Turn 2 — User (optional) + +> Should **`tenants.upsert`** run at org creation or lazily on first publish? + +## Success criteria + +**Measurement:** Heuristic `scoreScenario09` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual. + +- Cloned **FastAPI_SAAS_Template** (or documented alternative) with install steps. +- **`outpost_sdk`** with **`publish.event`** (and related calls as needed) on a **real** code path. +- API key from **environment** or secure settings — not hard-coded or exposed to clients. +- **Topic** and **destination** story documented. +- README updated for env + run. +- **Execution (full pass):** App starts; trigger path fires publish; Outpost accepts. *Skip for transcript-only.* + +## Failure modes to note + +- Greenfield FastAPI “hello world” instead of the **cloned** template. 
+- Using raw `httpx` to Outpost when the scenario asks for **`outpost_sdk`**. diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md new file mode 100644 index 000000000..c8f91c79e --- /dev/null +++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md @@ -0,0 +1,51 @@ +# Scenario 10 — Integrate Outpost into an existing Go SaaS API + +## Intent + +Same integration goal as [scenarios 8–9](08-integrate-nextjs-existing.md), for a **Go** REST API baseline with **auth and typical SaaS** structure. + +**Baseline application (pin this in evals):** [**devinterface/startersaas-go-api**](https://github.com/devinterface/startersaas-go-api) — Go API, JWT, MongoDB, Stripe hooks, Docker — MIT license, small enough to clone in an eval. If you standardize on another Go SaaS boilerplate, update this file and `scoreScenario10`’s baseline check. + +## Preconditions + +- Go 1.21+; `git` available. + +## Automated eval (Claude Agent SDK) + +**`cwd`** is `results/runs/-scenario-10/`. Expect **`git clone`**, **`go mod`** / **`go get`** for **`outpost-go`**, then source edits. + +## Conversation script + +### Turn 0 + +Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/quickstarts/hookdeck-outpost-agent-prompt.mdx) with placeholders filled. + +### Turn 1 — User + +> **Option 3 — integrate with an existing app.** Clone **`https://github.com/devinterface/startersaas-go-api`** into this workspace and make it build (`go build` / `go test` ./… as appropriate per the repo). +> +> Add **Hookdeck Outpost** for **outbound webhooks** to customers: +> +> 1. Use the official **Go SDK** (`github.com/hookdeck/outpost/sdks/outpost-go` or current module path from docs). +> 2. **`OUTPOST_API_KEY`** from environment only. +> 3. On **one real domain event** in this API (e.g. 
user registration, subscription, or another existing handler), call **`Publish.Event`** (and **`Tenants` / `Destinations`** as needed) with a **topic** from Turn 0. +> 4. Document how to register **webhook destinations** per tenant and which env vars to set. Mention the Hookdeck test destination URL from Turn 0 where useful. + +### Turn 2 — User (optional) + +> Show where **`CreateDestinationCreateWebhook`** fits if we let each customer paste a webhook URL in a settings API. + +## Success criteria + +**Measurement:** Heuristic `scoreScenario10` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge; execution manual. + +- Cloned **startersaas-go-api** (or documented alternative) with build instructions attempted. +- **Outpost Go SDK** used with **`Publish.Event`** (and related types) on a **real** handler path. +- No API key in source; **`os.Getenv("OUTPOST_API_KEY")`** (or config loader) only. +- **Topic** + **destination** documentation for operators. +- **Execution (full pass):** Server runs; trigger handler; Outpost accepts publish. *Skip for transcript-only.* + +## Failure modes to note + +- New `main.go` only, without using the **cloned** baseline’s routes/models. +- Wrong `Create` shape without **`CreateDestinationCreateWebhook`** when creating webhook destinations. diff --git a/docs/agent-evaluation/scripts/ci-eval.sh b/docs/agent-evaluation/scripts/ci-eval.sh new file mode 100755 index 000000000..4197c8b92 --- /dev/null +++ b/docs/agent-evaluation/scripts/ci-eval.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash +# CI-friendly agent eval: scenarios 01+02 with heuristic + LLM judge (Success criteria from each scenario .md). +# +# Required secrets (e.g. GitHub Actions): ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL +# Optional: same vars in docs/agent-evaluation/.env for local runs. +# +# Scenarios: 01 = curl quickstart shape; 02 = TypeScript SDK script. See README § CI. +set -euo pipefail + +ROOT="$(cd "$(dirname "$0")/.." 
&& pwd)" +cd "$ROOT" + +if [[ -z "${ANTHROPIC_API_KEY:-}" ]]; then + echo "ci-eval: ANTHROPIC_API_KEY is not set" >&2 + exit 1 +fi +if [[ -z "${EVAL_TEST_DESTINATION_URL:-}" ]]; then + echo "ci-eval: EVAL_TEST_DESTINATION_URL is not set" >&2 + exit 1 +fi + +exec npm run eval:ci diff --git a/docs/agent-evaluation/scripts/run-scenario.sh b/docs/agent-evaluation/scripts/run-scenario.sh new file mode 100755 index 000000000..7b24d3291 --- /dev/null +++ b/docs/agent-evaluation/scripts/run-scenario.sh @@ -0,0 +1,46 @@ +#!/usr/bin/env bash +# Manual agent evaluation helper: prints paths and Turn 0 instructions. +# Does NOT invoke an LLM or run automated tests. +set -euo pipefail + +ROOT="$(cd "$(dirname "$0")/.." && pwd)" +REPO_ROOT="$(cd "$ROOT/../.." && pwd)" + +usage() { + echo "Usage: $0 <01|02|03|04|05|06|07|08|09|10>" + echo "Prints the scenario file path and how to obtain Turn 0 from the single source of truth." + echo "" + echo "This script does not call an API or start an agent." +} + +if [[ "${1:-}" == "-h" || "${1:-}" == "--help" || -z "${1:-}" ]]; then + usage + exit 0 +fi + +id="$1" +shopt -s nullglob +matches=( "$ROOT/scenarios/${id}"-*.md ) +shopt -u nullglob + +if [[ ${#matches[@]} -eq 0 ]]; then + echo "No scenario matching: scenarios/${id}-*.md" >&2 + exit 1 +fi + +scenario="${matches[0]}" + +echo "=== Outpost agent eval (manual) ===" +echo "" +echo "Scenario file:" +echo " $scenario" +echo "" +echo "Turn 0 — copy the fenced block under '## Template' from:" +echo " $REPO_ROOT/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx" +echo "" +echo "Placeholder examples (not the template):" +echo " $ROOT/fixtures/placeholder-values-for-turn0.md" +echo "" +echo "Record results (local copy; see results/.gitignore):" +echo " cp \"$ROOT/results/RUN-RECORDING.template.md\" \"$ROOT/results/$(date +%F)-s${id}-.md\"" +echo "" diff --git a/docs/agent-evaluation/src/llm-judge.ts b/docs/agent-evaluation/src/llm-judge.ts new file mode 100644 index 000000000..b3e9ae0b9 
--- /dev/null +++ b/docs/agent-evaluation/src/llm-judge.ts @@ -0,0 +1,230 @@ +/** + * LLM-as-judge scoring via Anthropic Messages API. + * Feeds scenario Success criteria + assistant transcript; returns structured JSON from the model. + */ + +import { readFile } from "node:fs/promises"; +import { basename, dirname, join } from "node:path"; +import { extractTranscriptScoringText } from "./score-transcript.js"; + +const ANTHROPIC_MESSAGES_URL = "https://api.anthropic.com/v1/messages"; +const DEFAULT_SCORE_MODEL = "claude-sonnet-4-20250514"; +const MAX_TRANSCRIPT_CHARS = 180_000; + +export interface LlmCriterionJudgment { + readonly criterion: string; + readonly pass: boolean; + readonly evidence: string; +} + +export interface LlmJudgeReport { + readonly version: 1; + readonly model: string; + readonly runFile: string; + readonly scenarioFile: string; + readonly overall_transcript_pass: boolean; + /** LLM cannot run curls; always note limits */ + readonly execution_in_transcript: { + readonly pass: boolean | null; + readonly note: string; + }; + readonly criteria: readonly LlmCriterionJudgment[]; + readonly summary: string; +} + +interface RunJson { + meta?: { + scenarioId?: string; + scenarioFile?: string; + turns?: readonly { label?: string; messageCount?: number }[]; + }; + messages?: unknown[]; +} + +export function extractSuccessCriteriaMarkdown(fullMd: string): string { + const anchor = "## Success criteria"; + const i = fullMd.indexOf(anchor); + if (i === -1) { + return "(No ## Success criteria section found.)"; + } + const rest = fullMd.slice(i); + const sub = rest.slice(anchor.length); + const rel = sub.search(/\n## [A-Za-z]/); + return rel === -1 ? 
rest.trim() : rest.slice(0, anchor.length + rel).trim(); +} + +function stripJsonFence(text: string): string { + const t = text.trim(); + const m = t.match(/^```(?:json)?\s*([\s\S]*?)```$/m); + if (m) return m[1].trim(); + return t; +} + +function parseJudgeJson(text: string): Omit<LlmJudgeReport, "version" | "model" | "runFile" | "scenarioFile"> & { + version?: number; +} { + const raw = stripJsonFence(text); + const parsed = JSON.parse(raw) as Record<string, unknown>; + const overall = Boolean(parsed.overall_transcript_pass); + const criteriaIn = parsed.criteria; + const criteria: LlmCriterionJudgment[] = []; + if (Array.isArray(criteriaIn)) { + for (const c of criteriaIn) { + if (typeof c !== "object" || c === null) continue; + const o = c as Record<string, unknown>; + criteria.push({ + criterion: String(o.criterion ?? o.id ?? "unnamed"), + pass: Boolean(o.pass), + evidence: String(o.evidence ?? ""), + }); + } + } + const exec = parsed.execution_in_transcript; + let execution_in_transcript: LlmJudgeReport["execution_in_transcript"] = { + pass: null, + note: "Not specified by judge.", + }; + if (typeof exec === "object" && exec !== null) { + const e = exec as Record<string, unknown>; + execution_in_transcript = { + pass: typeof e.pass === "boolean" ? e.pass : null, + note: String(e.note ?? ""), + }; + } + return { + overall_transcript_pass: overall, + execution_in_transcript: execution_in_transcript, + criteria, + summary: String(parsed.summary ?? ""), + }; +} + +const JUDGE_SYSTEM = `You are an expert evaluator for Hookdeck Outpost onboarding documentation and API usage. +You judge whether an AI assistant's replies satisfy the scenario's Success criteria (markdown checklist from the scenario spec). +Be strict: a criterion passes only if the transcript (including code the model wrote via tools) clearly satisfies it. +You cannot run shell or HTTP — do not claim execution passed; use execution_in_transcript.pass = null and explain in note. 
+Output ONLY valid JSON (no markdown fences, no commentary outside JSON) matching this shape: +{ + "overall_transcript_pass": boolean, + "execution_in_transcript": { "pass": null, "note": "string explaining you did not execute code" }, + "criteria": [ + { "criterion": "short label from checklist", "pass": boolean, "evidence": "1-3 sentences; quote or paraphrase assistant" } + ], + "summary": "2-4 sentences overall" +} +Map each major bullet/checkbox line from Success criteria to one criteria[] entry (merge tiny sub-bullets if needed).`; + +export async function llmJudgeRun(options: { + readonly runPath: string; + readonly scenarioMdPath: string; + readonly apiKey: string; + readonly model?: string; +}): Promise<LlmJudgeReport> { + const model = options.model?.trim() || process.env.EVAL_SCORE_MODEL?.trim() || DEFAULT_SCORE_MODEL; + const rawRun = await readFile(options.runPath, "utf8"); + const data = JSON.parse(rawRun) as RunJson; + const scenarioFile = data.meta?.scenarioFile ?? "unknown.md"; + const scenarioMd = await readFile(options.scenarioMdPath, "utf8"); + const criteriaBlock = extractSuccessCriteriaMarkdown(scenarioMd); + + let transcript = extractTranscriptScoringText(data.messages); + if (transcript.length > MAX_TRANSCRIPT_CHARS) { + transcript = + transcript.slice(0, MAX_TRANSCRIPT_CHARS) + + "\n\n[… transcript truncated for judge context …]\n"; + } + + const userContent = `## Success criteria (from scenario spec — your rubric) + +${criteriaBlock} + +--- + +## Transcript for review (assistant text plus tool-written file contents and tool inputs from the run JSON) + +${transcript} + +--- + +Judge the transcript against the Success criteria. 
Remember: execution (running curl against a live API) is NOT evidenced here unless the transcript explicitly describes successful HTTP results; normally set execution_in_transcript.pass to null.`; + + const res = await fetch(ANTHROPIC_MESSAGES_URL, { + method: "POST", + headers: { + "content-type": "application/json", + "x-api-key": options.apiKey, + "anthropic-version": "2023-06-01", + }, + body: JSON.stringify({ + model, + max_tokens: 8192, + system: JUDGE_SYSTEM, + messages: [{ role: "user", content: userContent }], + }), + }); + + if (!res.ok) { + const errText = await res.text(); + throw new Error(`Anthropic API ${res.status}: ${errText.slice(0, 2000)}`); + } + + const body = (await res.json()) as { + content?: readonly { type?: string; text?: string }[]; + }; + const textBlock = body.content?.find((c) => c.type === "text"); + const text = textBlock?.text ?? ""; + let judged: ReturnType<typeof parseJudgeJson>; + try { + judged = parseJudgeJson(text); + } catch { + throw new Error( + `Judge did not return parseable JSON. First 800 chars:\n${text.slice(0, 800)}`, + ); + } + + return { + version: 1, + model, + runFile: options.runPath, + scenarioFile, + overall_transcript_pass: judged.overall_transcript_pass, + execution_in_transcript: judged.execution_in_transcript, + criteria: judged.criteria, + summary: judged.summary, + }; +} + +export function scenarioMdPathFromRun( + evalRoot: string, + scenarioFile: string | undefined, +): string { + if (!scenarioFile?.trim()) { + throw new Error("Run JSON meta.scenarioFile is missing"); + } + return join(evalRoot, "scenarios", scenarioFile); +} + +export function formatLlmReportHuman(r: LlmJudgeReport): string { + const lines: string[] = [ + `LLM judge (${r.model})`, + `Transcript: ${r.runFile}`, + `Scenario: ${r.scenarioFile}`, + ]; + if (basename(r.runFile) === "transcript.json") { + lines.push(`Run directory: ${dirname(r.runFile)}`); + } + lines.push( + "", + `Overall transcript pass: ${r.overall_transcript_pass ? 
"YES" : "NO"}`, + `Execution (from transcript only): pass=${String(r.execution_in_transcript.pass)} — ${r.execution_in_transcript.note}`, + "", + "Per criterion:", + ); + for (const c of r.criteria) { + lines.push(` [${c.pass ? "PASS" : "FAIL"}] ${c.criterion}`); + lines.push(` ${c.evidence}`); + } + lines.push(""); + lines.push(`Summary: ${r.summary}`); + return lines.join("\n"); +} diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts new file mode 100644 index 000000000..72464f3a2 --- /dev/null +++ b/docs/agent-evaluation/src/run-agent-eval.ts @@ -0,0 +1,527 @@ +/** + * Automated Outpost onboarding agent evals via the Claude Agent SDK. + * + * Requires ANTHROPIC_API_KEY (and EVAL_TEST_DESTINATION_URL). Does not call Outpost. + * For a full eval, humans (or a separate verifier) run generated artifacts using OUTPOST_API_KEY — see README. + * + * @see https://platform.claude.com/docs/en/agent-sdk/overview + */ + +import { mkdir, readdir, readFile, writeFile } from "node:fs/promises"; +import { join, dirname } from "node:path"; +import { fileURLToPath } from "node:url"; +import { parseArgs } from "node:util"; +import dotenv from "dotenv"; +import { + query, + type Options, + type SDKMessage, + type SDKSystemMessage, +} from "@anthropic-ai/claude-agent-sdk"; +import { llmJudgeRun, scenarioMdPathFromRun } from "./llm-judge.js"; +import { scoreRunFile } from "./score-transcript.js"; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = dirname(__filename); + +/** `docs/agent-evaluation/` */ +const EVAL_ROOT = join(__dirname, ".."); + +dotenv.config({ path: join(EVAL_ROOT, ".env") }); +/** Outpost repository root */ +const REPO_ROOT = join(EVAL_ROOT, "..", ".."); +const PROMPT_MDX = join( + REPO_ROOT, + "docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx", +); +const SCENARIOS_DIR = join(EVAL_ROOT, "scenarios"); +const RUNS_DIR = join(EVAL_ROOT, "results", "runs"); + +function 
isInitSystemMessage(m: SDKMessage): m is SDKSystemMessage { + return m.type === "system" && m.subtype === "init"; +} + +function extractTemplateFromMdx(mdx: string): string { + const idx = mdx.indexOf("## Template"); + if (idx === -1) { + throw new Error("Could not find ## Template in hookdeck-outpost-agent-prompt.mdx"); + } + const after = mdx.slice(idx); + const fenceStart = after.indexOf("```"); + if (fenceStart === -1) { + throw new Error("No opening code fence after ## Template"); + } + const contentStart = after.indexOf("\n", fenceStart) + 1; + const fenceEnd = after.indexOf("```", contentStart); + if (fenceEnd === -1) { + throw new Error("No closing code fence for ## Template"); + } + return after.slice(contentStart, fenceEnd).trim(); +} + +function envFlagTruthy(v: string | undefined): boolean { + if (!v) return false; + const s = v.trim().toLowerCase(); + return s === "1" || s === "true" || s === "yes"; +} + +/** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */ +function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string { + const f = (...parts: string[]) => join(repoRoot, ...parts); + let block = `### Documentation (local repository — unpublished) + +Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). 
Paths are absolute from the repository root: + +- Getting started (curl): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\` +- TypeScript quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\` +- Python quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\` +- Go quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\` +- API reference (human-oriented pages under): \`${f("docs/pages/references/")}\` +- OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\` +- Destination types: \`${f("docs/pages/destinations/")}\` +- SDKs overview: \`${f("docs/pages/sdks.mdx")}\``; + if (llmsFullUrl) { + block += `\n- Full docs bundle: ${llmsFullUrl}`; + } + return block; +} + +function applyPlaceholders( + template: string, + env: NodeJS.ProcessEnv, + repoRoot: string, +): string { + const apiBase = + env.EVAL_API_BASE_URL ?? "https://api.outpost.hookdeck.com/2025-07-01"; + const topics = env.EVAL_TOPICS_LIST ?? "- user.created"; + const testUrl = env.EVAL_TEST_DESTINATION_URL?.trim(); + if (!testUrl) { + throw new Error( + "Set EVAL_TEST_DESTINATION_URL to your Hookdeck Console Source URL (same value the dashboard injects as {{TEST_DESTINATION_URL}})", + ); + } + const docsUrl = env.EVAL_DOCS_URL ?? "https://outpost.hookdeck.com/docs"; + const llms = env.EVAL_LLMS_FULL_URL?.trim() ?? 
""; + const useLocalDocs = envFlagTruthy(env.EVAL_LOCAL_DOCS); + + let base = template; + if (useLocalDocs) { + const docSection = /^### Documentation\n\n[\s\S]*?(?=\n### What to do\b)/m; + if (!docSection.test(base)) { + throw new Error( + "EVAL_LOCAL_DOCS is set but the prompt template has no ### Documentation section before ### What to do", + ); + } + base = base.replace( + docSection, + localDocumentationBlock(repoRoot, llms || undefined), + ); + } + + let out = base + .replaceAll("{{API_BASE_URL}}", apiBase) + .replaceAll("{{TOPICS_LIST}}", topics) + .replaceAll("{{TEST_DESTINATION_URL}}", testUrl) + .replaceAll("{{DOCS_URL}}", docsUrl) + .replaceAll("{{LLMS_FULL_URL}}", llms); + + if (!llms) { + out = out + .split("\n") + .filter((line) => !/Full docs bundle/i.test(line)) + .join("\n"); + } + + return out; +} + +interface ParsedTurn { + readonly num: number; + readonly title: string; + readonly body: string; + readonly optional: boolean; +} + +function parseScenarioTurns(markdown: string): ParsedTurn[] { + const lines = markdown.split(/\r?\n/); + const turns: ParsedTurn[] = []; + let i = 0; + + while (i < lines.length) { + const line = lines[i]; + const m = line.match(/^### Turn (\d+)\s*(.*)$/); + if (m) { + const num = Number(m[1]); + const restOfTitle = m[2] ?? ""; + const title = `Turn ${m[1]}${restOfTitle ? 
` ${restOfTitle}` : ""}`; + const optional = /optional/i.test(title); + i++; + const bodyLines: string[] = []; + while (i < lines.length) { + const L = lines[i]; + if (/^### /.test(L)) { + break; + } + if (/^## /.test(L)) { + break; + } + bodyLines.push(L); + i++; + } + turns.push({ + num, + title, + body: bodyLines.join("\n").trim(), + optional, + }); + continue; + } + i++; + } + + return turns.sort((a, b) => a.num - b.num); +} + +function extractUserMessage(turnBody: string): string { + const quoted: string[] = []; + for (const line of turnBody.split(/\r?\n/)) { + const q = line.match(/^\s*>\s?(.*)$/); + if (q) { + quoted.push(q[1]); + } + } + const fromBlockquote = quoted.join("\n").trim(); + if (fromBlockquote) { + return fromBlockquote; + } + return turnBody.replace(/^\s*$/gm, "").trim(); +} + +function serializeMessage(message: SDKMessage): unknown { + try { + return JSON.parse( + JSON.stringify(message, (_, v) => (typeof v === "bigint" ? v.toString() : v)), + ); + } catch { + return { _nonSerializable: String(message) }; + } +} + +async function listScenarioFiles(): Promise { + const names = await readdir(SCENARIOS_DIR); + return names + .filter((n) => /^\d{2}-.*\.md$/.test(n)) + .sort(); +} + +function idFromFilename(file: string): string { + return file.slice(0, 2); +} + +async function runScenarioQuery( + prompt: string, + options: Options, +): Promise<{ messages: unknown[]; sessionId?: string }> { + const messages: unknown[] = []; + let sessionId: string | undefined; + + const q = query({ prompt, options }); + for await (const message of q) { + messages.push(serializeMessage(message)); + if (isInitSystemMessage(message)) { + sessionId = message.session_id; + } + } + + return { messages, sessionId }; +} + +async function runOneScenario( + scenarioFile: string, + filledTemplate: string, + opts: { + skipOptional: boolean; + baseOptions: Options; + }, +): Promise<{ + scenarioId: string; + scenarioFile: string; + turns: Array<{ label: string; messageCount: 
number }>; + sessionId?: string; + allMessages: unknown[]; +}> { + const path = join(SCENARIOS_DIR, scenarioFile); + const md = await readFile(path, "utf8"); + const parsed = parseScenarioTurns(md); + + const userTurns = parsed + .filter((t) => t.num >= 1) + .filter((t) => !t.optional || !opts.skipOptional) + .map((t) => ({ + label: t.title, + text: extractUserMessage(t.body), + })) + .filter((t) => t.text.length > 0); + + const prompts = [filledTemplate, ...userTurns.map((t) => t.text)]; + + const allMessages: unknown[] = []; + let sessionId: string | undefined; + const turnStats: Array<{ label: string; messageCount: number }> = []; + + for (let i = 0; i < prompts.length; i++) { + const label = i === 0 ? "Turn 0 (dashboard prompt)" : userTurns[i - 1]?.label ?? `Turn ${i}`; + const before = allMessages.length; + const { messages, sessionId: sid } = await runScenarioQuery(prompts[i]!, { + ...opts.baseOptions, + resume: sessionId, + }); + if (sid) { + sessionId = sid; + } + allMessages.push(...messages); + turnStats.push({ + label, + messageCount: allMessages.length - before, + }); + } + + return { + scenarioId: idFromFilename(scenarioFile), + scenarioFile, + turns: turnStats, + sessionId, + allMessages, + }; +} + +function defaultEvalTools(env: NodeJS.ProcessEnv): string { + if (env.EVAL_TOOLS?.trim()) { + return env.EVAL_TOOLS.trim(); + } + // dontAsk + allowedTools: only listed tools are pre-approved; others are denied. + // Write/Edit: materialize scripts and apps into the per-run directory (agent cwd). + // Bash: npm/npx/go mod/pip/uv for app scenarios (05–07) and installs for 02–04. + // WebFetch: omitted when EVAL_LOCAL_DOCS uses repo paths + Read instead. + return envFlagTruthy(env.EVAL_LOCAL_DOCS) + ? 
"Read,Glob,Grep,Write,Edit,Bash" + : "Read,Glob,Grep,WebFetch,Write,Edit,Bash"; +} + +function buildBaseOptions(agentWorkspaceCwd: string): Options { + const toolsRaw = defaultEvalTools(process.env); + const allowedTools = toolsRaw + .split(",") + .map((s) => s.trim()) + .filter(Boolean); + + const mode = (process.env.EVAL_PERMISSION_MODE ?? "dontAsk") as NonNullable< + Options["permissionMode"] + >; + + const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "40"); + const persistSession = process.env.EVAL_PERSIST_SESSION !== "false"; + + const o: Options = { + cwd: agentWorkspaceCwd, + allowedTools, + permissionMode: mode, + maxTurns: Number.isFinite(maxTurns) ? maxTurns : 40, + persistSession, + env: { + ...process.env, + CLAUDE_AGENT_SDK_CLIENT_APP: "outpost-docs-agent-eval/1.0.0", + } as Record, + }; + + if (process.env.EVAL_MODEL?.trim()) { + o.model = process.env.EVAL_MODEL.trim(); + } + + return o; +} + +async function main(): Promise { + const { values } = parseArgs({ + options: { + scenario: { type: "string" }, + scenarios: { type: "string" }, + all: { type: "boolean", default: false }, + "skip-optional": { type: "boolean", default: false }, + "dry-run": { type: "boolean", default: false }, + "no-score": { type: "boolean", default: false }, + "no-score-llm": { type: "boolean", default: false }, + help: { type: "boolean", short: "h", default: false }, + }, + allowPositionals: false, + }); + + if (values.help) { + console.log(` +Outpost agent evaluation (Claude Agent SDK) + +Usage: + npm run eval -- --scenario 01 + npm run eval -- --scenarios 01,02,05 + npm run eval -- --all # deliberate: every scenario (costly) + npm run eval -- --skip-optional + npm run eval -- --no-score # skip heuristic-score.json + npm run eval -- --no-score-llm # skip llm-score.json (no Success-criteria judge) + npm run eval -- --no-score --no-score-llm # transcripts only + npm run eval -- --dry-run + +You must pass --scenario, --scenarios, or --all so the set of runs is explicit (cost 
and scope). +After each scenario: transcript + heuristic-score.json + llm-score.json (judge uses ## Success criteria) unless disabled above. +Exit 1 if any enabled score fails. + +Environment: + Values can be set in docs/agent-evaluation/.env (loaded automatically) or exported in the shell. + ANTHROPIC_API_KEY Required + EVAL_TEST_DESTINATION_URL Required — Hookdeck Console Source URL (fed into {{TEST_DESTINATION_URL}}) + EVAL_API_BASE_URL Optional (default: managed production URL) + EVAL_TOPICS_LIST Optional + EVAL_DOCS_URL Optional (ignored for doc links when EVAL_LOCAL_DOCS is set) + EVAL_LOCAL_DOCS Set to 1/true/yes to replace Documentation URLs with repo file paths (unpublished docs) + EVAL_LLMS_FULL_URL Optional (omit docs line if unset) + EVAL_TOOLS Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README) + EVAL_MODEL Optional + EVAL_MAX_TURNS Optional (default: 40) + EVAL_PERMISSION_MODE Optional (default: dontAsk) + EVAL_PERSIST_SESSION Set to "false" to disable session persistence (breaks multi-turn resume) + +Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets + results/runs/-scenario-NN/transcript.json + heuristic-score.json and llm-score.json unless disabled (see above). +Also set EVAL_NO_SCORE_HEURISTIC=1 or EVAL_NO_SCORE_LLM=1 in .env to skip scoring without flags. + +Each run uses results/runs/-scenario-NN/ as agent cwd so Write creates files there. 
+`); + process.exit(0); + } + + if (!process.env.ANTHROPIC_API_KEY?.trim()) { + console.error("Missing ANTHROPIC_API_KEY"); + process.exit(1); + } + + const mdx = await readFile(PROMPT_MDX, "utf8"); + const template = extractTemplateFromMdx(mdx); + const filledTemplate = applyPlaceholders(template, process.env, REPO_ROOT); + + const allFiles = await listScenarioFiles(); + let selected: string[]; + + if (values.all) { + selected = allFiles; + } else if (values.scenarios) { + const ids = values.scenarios.split(",").map((s) => s.trim()); + selected = allFiles.filter((f) => ids.includes(idFromFilename(f))); + const missing = ids.filter((id) => !selected.some((f) => idFromFilename(f) === id)); + if (missing.length) { + console.error("Unknown scenario id(s):", missing.join(", ")); + process.exit(1); + } + } else if (values.scenario) { + const id = values.scenario.padStart(2, "0"); + selected = allFiles.filter((f) => idFromFilename(f) === id); + if (selected.length === 0) { + console.error("Unknown scenario:", values.scenario); + process.exit(1); + } + } else { + console.error( + "Choose which scenarios to run (cost is proportional): --scenario , --scenarios id,id, or --all for the full set.", + ); + console.error(`Available: ${allFiles.map((f) => idFromFilename(f)).join(", ")}`); + process.exit(1); + } + + if (values["dry-run"]) { + console.log("Dry run: would execute", selected.join(", ")); + console.log("Turn 0 length (chars):", filledTemplate.length); + process.exit(0); + } + + await mkdir(RUNS_DIR, { recursive: true }); + const stamp = new Date().toISOString().replace(/[:.]/g, "-"); + + const wantScore = + !values["no-score"] && + !envFlagTruthy(process.env.EVAL_NO_SCORE_HEURISTIC); + const wantLlm = + !values["no-score-llm"] && + !envFlagTruthy(process.env.EVAL_NO_SCORE_LLM); + + let anyScoreFailure = false; + + console.error( + `Running ${selected.length} scenario(s): ${selected.join(", ")} (heuristic=${String(wantScore)}, llm=${String(wantLlm)})`, + ); + + for 
(const file of selected) { + const scenarioIdEarly = idFromFilename(file); + const runDir = join(RUNS_DIR, `${stamp}-scenario-${scenarioIdEarly}`); + await mkdir(runDir, { recursive: true }); + + const baseOptions = buildBaseOptions(runDir); + console.error(`\n>>> Scenario ${file} (workspace ${runDir}) ...`); + const result = await runOneScenario(file, filledTemplate, { + skipOptional: values["skip-optional"] ?? false, + baseOptions, + }); + + const outPath = join(runDir, "transcript.json"); + const payload = { + meta: { + scenarioId: result.scenarioId, + scenarioFile: result.scenarioFile, + runDirectory: runDir, + agentWorkspaceCwd: runDir, + repositoryRoot: REPO_ROOT, + completedAt: new Date().toISOString(), + sessionId: result.sessionId, + turns: result.turns, + }, + messages: result.allMessages, + }; + + await writeFile(outPath, JSON.stringify(payload, null, 2), "utf8"); + console.error(`Wrote ${outPath}`); + + if (wantScore) { + const report = await scoreRunFile(outPath); + const scorePath = join(runDir, "heuristic-score.json"); + await writeFile(scorePath, `${JSON.stringify(report, null, 2)}\n`, "utf8"); + console.error(`Wrote ${scorePath} (transcript: ${report.transcript.passed}/${report.transcript.total}, overallTranscriptPass=${String(report.overallTranscriptPass)})`); + if (report.overallTranscriptPass === false) { + anyScoreFailure = true; + } + } + + if (wantLlm) { + const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, result.scenarioFile); + const llmReport = await llmJudgeRun({ + runPath: outPath, + scenarioMdPath: scenarioPath, + apiKey: process.env.ANTHROPIC_API_KEY!.trim(), + }); + const llmPath = join(runDir, "llm-score.json"); + await writeFile(llmPath, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8"); + console.error( + `Wrote ${llmPath} (LLM overall_transcript_pass=${String(llmReport.overall_transcript_pass)})`, + ); + if (!llmReport.overall_transcript_pass) { + anyScoreFailure = true; + } + } + } + + if (anyScoreFailure) { + 
process.exit(1); + } +} + +main().catch((err) => { + console.error(err); + process.exit(1); +}); diff --git a/docs/agent-evaluation/src/score-eval.ts b/docs/agent-evaluation/src/score-eval.ts new file mode 100644 index 000000000..4c720060d --- /dev/null +++ b/docs/agent-evaluation/src/score-eval.ts @@ -0,0 +1,183 @@ +/** + * CLI: score a transcript JSON from npm run eval. + * + * Usage: + * npm run score -- --run results/runs/2026-...-scenario-01.json + * npm run score -- --latest + * npm run score -- --latest --scenario 01 + * npm run score -- --run .json --llm --write # Anthropic judge → .llm-score.json + */ + +import { readFile, writeFile } from "node:fs/promises"; +import { join, dirname } from "node:path"; +import { fileURLToPath } from "node:url"; +import { parseArgs } from "node:util"; +import dotenv from "dotenv"; +import { + formatLlmReportHuman, + llmJudgeRun, + scenarioMdPathFromRun, + type LlmJudgeReport, +} from "./llm-judge.js"; +import { + findLatestRunFile, + formatScoreReportHuman, + resolveTranscriptJsonPath, + scoreRunFile, + scoreSidecarPaths, + type ScoreReport, +} from "./score-transcript.js"; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const EVAL_ROOT = join(__dirname, ".."); +dotenv.config({ path: join(EVAL_ROOT, ".env") }); + +const RUNS_DIR = join(EVAL_ROOT, "results", "runs"); + +async function main(): Promise<void> { + const { values, positionals } = parseArgs({ + options: { + run: { type: "string" }, + latest: { type: "boolean", default: false }, + scenario: { type: "string" }, + json: { type: "boolean", default: false }, + write: { type: "boolean", default: false }, + llm: { type: "boolean", default: false }, + "no-heuristic": { type: "boolean", default: false }, + help: { type: "boolean", short: "h", default: false }, + }, + allowPositionals: true, + }); + + if (values.help) { + console.log(` +Score an eval transcript. 
+ + npm run score -- --run results/runs/-scenario-01/transcript.json + npm run score -- --run results/runs/-scenario-01 # directory ok + npm run score -- --latest [--scenario 01] + npm run score -- --write # heuristic-score.json + llm-score.json in run dir + npm run score -- --llm [--write] # Anthropic judge (needs ANTHROPIC_API_KEY) + npm run score -- --llm --no-heuristic # LLM only (no regex heuristic) + +Heuristic: src/score-transcript.ts. LLM: reads scenarios/*.md Success criteria + assistant text; model from EVAL_SCORE_MODEL (default claude-sonnet-4-20250514). + +Options: + --run transcript.json, a run directory, or legacy flat *-scenario-NN.json + --latest Newest transcript (nested run dir or legacy flat file) + --scenario With --latest, filter scenario-0 + --json Print machine-readable JSON only (last scorer: heuristic or LLM if --llm-only) + --write Write sidecar file(s) for enabled scorers + --llm Call Anthropic Messages API to judge against Success criteria + --no-heuristic Skip regex heuristic (use with --llm for API-only scoring) +`); + process.exit(0); + } + + let runPath: string | null = values.run ?? 
null; + if (values.latest) { + runPath = await findLatestRunFile(RUNS_DIR, values.scenario); + if (!runPath) { + console.error("No matching run JSON in", RUNS_DIR); + process.exit(1); + } + } + + if (!runPath && positionals[0]) { + runPath = positionals[0]; + } + + if (!runPath) { + console.error("Provide --run or --latest"); + process.exit(1); + } + + let transcriptPath: string; + try { + transcriptPath = await resolveTranscriptJsonPath(runPath); + } catch (e) { + console.error(String(e)); + process.exit(1); + } + + const doHeuristic = !values["no-heuristic"]; + const doLlm = values.llm; + + if (!doHeuristic && !doLlm) { + console.error("Nothing to run: enable heuristic (default) or pass --llm"); + process.exit(1); + } + + let heuristicReport: ScoreReport | null = null; + let llmReport: LlmJudgeReport | null = null; + let fail = false; + + if (doHeuristic) { + heuristicReport = await scoreRunFile(transcriptPath); + if (heuristicReport.overallTranscriptPass === false) { + fail = true; + } + } + + if (doLlm) { + const key = process.env.ANTHROPIC_API_KEY?.trim(); + if (!key) { + console.error("Missing ANTHROPIC_API_KEY for --llm"); + process.exit(1); + } + const raw = await readFile(transcriptPath, "utf8"); + const meta = JSON.parse(raw) as { meta?: { scenarioFile?: string } }; + const scenarioPath = scenarioMdPathFromRun(EVAL_ROOT, meta.meta?.scenarioFile); + llmReport = await llmJudgeRun({ + runPath: transcriptPath, + scenarioMdPath: scenarioPath, + apiKey: key, + }); + if (!llmReport.overall_transcript_pass) { + fail = true; + } + } + + if (values.json) { + if (doLlm && values["no-heuristic"]) { + console.log(JSON.stringify(llmReport, null, 2)); + } else if (doHeuristic && !doLlm) { + console.log(JSON.stringify(heuristicReport, null, 2)); + } else { + console.log( + JSON.stringify({ heuristic: heuristicReport, llm: llmReport }, null, 2), + ); + } + } else { + if (heuristicReport) { + console.log(formatScoreReportHuman(heuristicReport)); + console.log(""); + } + if 
(llmReport) { + console.log(formatLlmReportHuman(llmReport)); + } + } + + if (values.write) { + const { heuristic: heuristicOut, llm: llmOut } = scoreSidecarPaths(transcriptPath); + if (heuristicReport) { + await writeFile(heuristicOut, `${JSON.stringify(heuristicReport, null, 2)}\n`, "utf8"); + if (!values.json) { + console.error(`Wrote ${heuristicOut}`); + } + } + if (llmReport) { + await writeFile(llmOut, `${JSON.stringify(llmReport, null, 2)}\n`, "utf8"); + if (!values.json) { + console.error(`Wrote ${llmOut}`); + } + } + } + + process.exit(fail ? 1 : 0); +} + +main().catch((e) => { + console.error(e); + process.exit(1); +}); diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts new file mode 100644 index 000000000..5ba55459b --- /dev/null +++ b/docs/agent-evaluation/src/score-transcript.ts @@ -0,0 +1,1119 @@ +/** + * Heuristic transcript scoring for agent eval runs. + * Maps to human checklist items in scenarios/*.md — not a substitute for execution verification. + */ + +import { readFile, readdir, stat } from "node:fs/promises"; +import { basename, dirname, join } from "node:path"; + +export interface CheckResult { + readonly id: string; + readonly pass: boolean; + readonly detail: string; +} + +export interface TranscriptScore { + readonly passed: number; + readonly total: number; + readonly checks: readonly CheckResult[]; + readonly fraction: number; +} + +export interface ScoreReport { + readonly runFile: string; + readonly scenarioId: string; + readonly scenarioFile: string; + readonly transcript: TranscriptScore; + /** Automated harness does not run Outpost; execution stays manual or a future verifier. 
*/ + readonly execution: { readonly status: "not_automated"; readonly note: string }; + /** null when no automated transcript rubric exists for this scenario yet */ + readonly overallTranscriptPass: boolean | null; +} + +interface RunJson { + meta?: { + scenarioId?: string; + scenarioFile?: string; + turns?: readonly { label?: string; messageCount?: number }[]; + }; + messages?: unknown[]; +} + +export function extractAssistantText(messages: unknown[] | undefined): string { + if (!messages?.length) return ""; + let out = ""; + for (const m of messages) { + if (typeof m !== "object" || m === null) continue; + const o = m as Record<string, unknown>; + if (o.type !== "assistant") continue; + const inner = o.message; + if (typeof inner !== "object" || inner === null) continue; + const msg = inner as Record<string, unknown>; + const content = msg.content; + if (!Array.isArray(content)) continue; + for (const block of content) { + if (typeof block !== "object" || block === null) continue; + const b = block as Record<string, unknown>; + if (b.type === "text" && typeof b.text === "string") { + out += b.text; + } + } + } + return out; +} + +const MAX_TOOL_SCORING_CHARS = 600_000; + +/** + * Assistant-visible text plus tool inputs and Write/Edit file bodies from the transcript. + * Heuristics use this so scored content includes material that only appeared in tool calls/results. 
+ */ +export function extractTranscriptScoringText(messages: unknown[] | undefined): string { + const assistant = extractAssistantText(messages); + if (!messages?.length) return assistant; + const chunks: string[] = []; + let budget = MAX_TOOL_SCORING_CHARS; + + const push = (s: string) => { + if (budget <= 0) return; + const take = s.slice(0, budget); + chunks.push(take); + budget -= take.length; + }; + + for (const m of messages) { + if (typeof m !== "object" || m === null) continue; + const o = m as Record<string, unknown>; + + if (o.type === "assistant") { + const inner = o.message; + if (typeof inner !== "object" || inner === null) continue; + const content = (inner as Record<string, unknown>).content; + if (!Array.isArray(content)) continue; + for (const block of content) { + if (typeof block !== "object" || block === null) continue; + const b = block as Record<string, unknown>; + if (b.type !== "tool_use") continue; + const input = b.input; + if (input !== undefined) { + try { + push(`\n[tool_use ${String(b.name ?? "?")}]\n${JSON.stringify(input)}\n`); + } catch { + push(`\n[tool_use ${String(b.name ?? 
"?")}]\n`); + } + } + } + continue; + } + + if (o.type === "user") { + const tur = o.tool_use_result; + if (typeof tur === "object" && tur !== null) { + const t = tur as Record<string, unknown>; + if (typeof t.content === "string") { + push(`\n[tool_result content]\n${t.content}\n`); + } + if (typeof t.newContent === "string") { + push(`\n[tool_result newContent]\n${t.newContent}\n`); + } + } + const inner = o.message; + if (typeof inner === "object" && inner !== null) { + const content = (inner as Record<string, unknown>).content; + if (Array.isArray(content)) { + for (const block of content) { + if (typeof block !== "object" || block === null) continue; + const b = block as Record<string, unknown>; + if (b.type === "tool_result" && typeof b.content === "string") { + push(`\n[tool_result]\n${b.content}\n`); + } + } + } + } + } + } + + return `${assistant}\n\n--- tool corpus ---\n${chunks.join("")}`; +} + +function hadOptionalSecondUserTurn(meta: RunJson["meta"]): boolean { + const turns = meta?.turns ?? []; + return turns.some((t) => { + const l = (t.label ?? "").toLowerCase(); + return l.includes("turn 2") || l.includes("optional"); + }); +} + +/** Likely pasted API key (not env var reference). */ +function containsLikelyLeakedKey(text: string): boolean { + if (/Bearer\s+sk-ant-api/i.test(text)) return true; + if (/Bearer\s+[a-zA-Z0-9_-]{40,}/.test(text)) return true; + return false; +} + +function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const managed = + t.includes("api.outpost.hookdeck.com/2025-07-01") || + /\$OUTPOST_API_BASE_URL/.test(t); + // Self-hosted snippet must not be what the assistant told the user to run (tool corpus can quote docs). + const selfHostedInUserGuidance = /\blocalhost:3333\/api\/v1\b/.test(assistant); + checks.push({ + id: "managed_base_url", + pass: managed && !selfHostedInUserGuidance, + detail: !managed + ? 
"Expected api.outpost.hookdeck.com/2025-07-01 or $OUTPOST_API_BASE_URL" + : selfHostedInUserGuidance + ? "Assistant guidance includes localhost:3333/api/v1 (self-hosted) as primary" + : "Uses managed API base (or OUTPOST_API_BASE_URL); no self-hosted path in assistant guidance", + }); + + const tenantPut = + /PUT|put/i.test(t) && + (t.includes("/tenants/") || t.includes("/tenants/$") || t.includes("/tenants/${")); + checks.push({ + id: "tenant_put", + pass: tenantPut, + detail: tenantPut ? "PUT …/tenants/… present" : "Expected PUT with /tenants/ path", + }); + + const dest = + lower.includes("webhook") && + (t.includes("/destinations") || t.includes("/destinations\"")) && + (lower.includes("post") || t.includes("-X POST") || t.includes("-X post")); + checks.push({ + id: "destination_webhook", + pass: dest, + detail: dest ? "POST destinations with webhook" : "Expected POST …/destinations with webhook type", + }); + + const publish = + (t.includes("/publish") || t.includes("/publish\"")) && + (lower.includes("post") || t.includes("-X POST")); + checks.push({ + id: "publish_post", + pass: publish, + detail: publish ? "POST …/publish present" : "Expected POST publish", + }); + + const afterPublish = t.split(/\/publish/i).pop() ?? t; + const wrongPayload = /"payload"\s*:/.test(afterPublish); + const hasData = /"data"\s*:/.test(afterPublish); + checks.push({ + id: "publish_body_data_not_payload", + pass: publish && !wrongPayload && hasData, + detail: !publish + ? "N/A (no publish block)" + : wrongPayload + ? 'Found "payload" after /publish — Outpost expects "data"' + : hasData + ? 'Publish section uses "data"' + : 'Missing "data" in publish JSON (check manually)', + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? 
"Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const verifyTurn = hadOptionalSecondUserTurn(meta); + if (verifyTurn) { + const verify = + lower.includes("hookdeck") && + (lower.includes("console") || lower.includes("dashboard") || lower.includes("log")); + checks.push({ + id: "verification_console_or_logs", + pass: verify, + detail: verify + ? "Turn 2+ mentions Hookdeck Console / dashboard / logs" + : "Optional verify turn ran but no Console/dashboard/logs mention found", + }); + } + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { + passed, + total, + checks, + fraction: total ? passed / total : 0, + }; +} + +function scoreScenario02(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const checks: CheckResult[] = []; + + const sdk = /@hookdeck\/outpost-sdk\b/.test(t); + checks.push({ + id: "ts_sdk_dependency", + pass: sdk, + detail: sdk ? "References @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk in code or package.json", + }); + + const client = /new\s+Outpost\s*\(|Outpost\s*\(\s*\{/.test(t); + checks.push({ + id: "outpost_client", + pass: client, + detail: client ? "Constructs Outpost client" : "Expected new Outpost(…) or Outpost({ … })", + }); + + const envKey = /process\.env\.OUTPOST_API_KEY|OUTPOST_API_KEY/.test(t); + checks.push({ + id: "env_api_key", + pass: envKey, + detail: envKey ? "Uses OUTPOST_API_KEY from env" : "Expected process.env.OUTPOST_API_KEY (or documented env)", + }); + + const upsert = /tenants\.upsert|tenants\?\.upsert/.test(t); + checks.push({ + id: "tenants_upsert", + pass: upsert, + detail: upsert ? "Calls tenants.upsert" : "Expected tenants.upsert", + }); + + const dest = /destinations\.create|destinations\?\.create/.test(t); + checks.push({ + id: "destinations_create", + pass: dest, + detail: dest ? 
"Calls destinations.create" : "Expected destinations.create", + }); + + const pub = /publish\.event|publish\?\.event/.test(t); + checks.push({ + id: "publish_event", + pass: pub, + detail: pub ? "Calls publish.event" : "Expected publish.event", + }); + + const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t); + checks.push({ + id: "webhook_env", + pass: hookUrl, + detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL for webhook URL", + }); + + const run = /npx\s+tsx\b|tsx\s+\S+\.ts\b|ts-node\b|node\s+.*\.ts\b/.test(t); + checks.push({ + id: "run_instructions", + pass: run, + detail: run ? "Mentions npx tsx / ts-node / running .ts" : "Expected run instructions (e.g. npx tsx …)", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +function scoreScenario03(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const checks: CheckResult[] = []; + + const imp = /from\s+outpost_sdk\s+import|import\s+outpost_sdk/.test(t); + checks.push({ + id: "python_sdk_import", + pass: imp, + detail: imp ? "Imports outpost_sdk" : "Expected `from outpost_sdk import …` or import outpost_sdk", + }); + + const client = /Outpost\s*\(/.test(t); + checks.push({ + id: "outpost_client", + pass: client, + detail: client ? "Constructs Outpost(…)" : "Expected Outpost(…) client", + }); + + const upsert = /tenants\.upsert|tenants\?\.upsert/.test(t); + checks.push({ + id: "tenants_upsert", + pass: upsert, + detail: upsert ? 
"Calls tenants.upsert" : "Expected tenants.upsert", + }); + + const dest = /destinations\.create|destinations\?\.create/.test(t); + checks.push({ + id: "destinations_create", + pass: dest, + detail: dest ? "Calls destinations.create" : "Expected destinations.create", + }); + + const pub = /publish\.event|publish\?\.event/.test(t); + checks.push({ + id: "publish_event", + pass: pub, + detail: pub ? "Calls publish.event" : "Expected publish.event", + }); + + const env = /os\.environ|getenv\s*\(\s*["']OUTPOST_API_KEY/.test(t); + checks.push({ + id: "env_api_key", + pass: env, + detail: env ? "Reads API key from environment" : "Expected os.environ or getenv for OUTPOST_API_KEY", + }); + + const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t); + checks.push({ + id: "webhook_env", + pass: hookUrl, + detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +function scoreScenario04(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const checks: CheckResult[] = []; + + const mod = /hookdeck\/outpost.*outpost-go|outpost-go|outpostgo/.test(t); + checks.push({ + id: "go_sdk_module", + pass: mod, + detail: mod ? "References outpost-go / outpostgo" : "Expected github.com/hookdeck/outpost/.../outpost-go or outpostgo", + }); + + const newClient = /outpostgo\.New\s*\(|\bNew\s*\(\s*context\./.test(t); + checks.push({ + id: "go_client_new", + pass: newClient, + detail: newClient ? 
"Creates client with New(…)" : "Expected outpostgo.New(…) or similar", + }); + + const sec = /WithSecurity|WithServerURL/.test(t); + checks.push({ + id: "go_client_options", + pass: sec, + detail: sec ? "Uses WithSecurity or WithServerURL" : "Expected WithSecurity (and optional WithServerURL)", + }); + + const upsert = /Tenants\.Upsert|\.Upsert\s*\(/.test(t); + checks.push({ + id: "tenants_upsert", + pass: upsert, + detail: upsert ? "Calls Tenants.Upsert" : "Expected Tenants.Upsert", + }); + + const dest = /Destinations\.Create|CreateDestinationCreateWebhook/.test(t); + checks.push({ + id: "destinations_create", + pass: dest, + detail: dest ? "Creates webhook destination" : "Expected Destinations.Create / CreateDestinationCreateWebhook", + }); + + const pub = /Publish\.Event|\.Event\s*\(/.test(t); + checks.push({ + id: "publish_event", + pass: pub, + detail: pub ? "Calls Publish.Event" : "Expected Publish.Event", + }); + + const envKey = /Getenv\s*\(\s*["']OUTPOST_API_KEY["']/.test(t); + checks.push({ + id: "env_api_key", + pass: envKey, + detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")", + }); + + const hookUrl = /OUTPOST_TEST_WEBHOOK_URL/.test(t); + checks.push({ + id: "webhook_env", + pass: hookUrl, + detail: hookUrl ? "Uses OUTPOST_TEST_WEBHOOK_URL" : "Expected OUTPOST_TEST_WEBHOOK_URL", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? 
passed / total : 0 }; +} + +function scoreScenario05(corpus: string, assistant: string, meta: RunJson["meta"]): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const next = + /"next"\s*:\s*"/.test(t) || + /next\/dev|next\s+dev|next\.config/.test(t) || + /\bnext@\d/.test(t); + checks.push({ + id: "nextjs_signals", + pass: next, + detail: next ? "Next.js dependency or dev command present" : "Expected next in package.json or next dev / next.config", + }); + + const sdk = /@hookdeck\/outpost-sdk\b/.test(t); + checks.push({ + id: "outpost_ts_sdk", + pass: sdk, + detail: sdk ? "Uses @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk in dependencies or imports", + }); + + const api = + /app\/api\/[^"'\s]+\/route\.(t|j)sx?/.test(t) || + /pages\/api\//.test(t) || + /["']\/api\/(destination|destinations|event|publish)/.test(t); + checks.push({ + id: "api_routes_layer", + pass: api, + detail: api ? "App/Pages API route layer present" : "Expected app/api/.../route or pages/api or /api/… fetches", + }); + + const twoFlows = + (/destination|webhook|subscribe/i.test(t) && /publish|event|send/i.test(t) && /\/api\//.test(t)) || + (t.includes("/api/destination") && t.includes("/api/event")); + checks.push({ + id: "destination_and_publish_surface", + pass: twoFlows, + detail: twoFlows + ? "Distinct destination + publish flows (URLs or labels)" + : "Expected separate destination registration and publish (e.g. two API routes or actions)", + }); + + const serverEnv = + /route\.(t|j)sx?[\s\S]{0,12000}process\.env\.OUTPOST_API_KEY|OUTPOST_API_KEY[\s\S]{0,800}(route\.(t|j)sx?|api\/)/i.test( + t, + ) || (/process\.env\.OUTPOST_API_KEY/.test(t) && /app\/api\//.test(t)); + checks.push({ + id: "server_env_outpost_key", + pass: serverEnv, + detail: serverEnv + ? "OUTPOST_API_KEY read server-side (e.g. 
API route)" + : "Expected process.env.OUTPOST_API_KEY in API route context", + }); + + const leakClient = /NEXT_PUBLIC_OUTPOST_API_KEY/.test(t); + checks.push({ + id: "no_next_public_api_key", + pass: !leakClient, + detail: leakClient + ? "NEXT_PUBLIC_OUTPOST_API_KEY would expose key to browser" + : "No NEXT_PUBLIC_OUTPOST_API_KEY", + }); + + const readme = /README/i.test(t) && /OUTPOST_API_KEY/.test(t); + checks.push({ + id: "readme_env", + pass: readme, + detail: readme ? "README mentions OUTPOST_API_KEY" : "Expected README with OUTPOST_API_KEY", + }); + + const managed = + !/\blocalhost:3333\/api\/v1\b/.test(t) && + (!/localhost:\d{2,5}\s*\/\s*api\/v1/.test(t) || /OUTPOST_API_BASE_URL/.test(t)); + checks.push({ + id: "managed_base_not_selfhosted", + pass: managed, + detail: managed + ? "No self-hosted localhost API path as default" + : "Avoid localhost:3333/api/v1 unless user asked for self-hosted", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const stressTurn = (meta?.turns?.length ?? 0) >= 4; + if (stressTurn) { + const hookdeckHint = + lower.includes("hookdeck") && + (lower.includes("console") || lower.includes("source") || lower.includes("dashboard")); + checks.push({ + id: "stress_public_url_hint", + pass: hookdeckHint, + detail: hookdeckHint + ? "Turn 3+ stress: mentions Hookdeck Console/Source/dashboard for webhook URL" + : "Stress turn present but no Hookdeck Console/Source hint found", + }); + } + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? 
passed / total : 0 }; +} + +function scoreScenario06(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const fast = /FastAPI|from\s+fastapi\s+import/.test(t); + checks.push({ + id: "fastapi_framework", + pass: fast, + detail: fast ? "Uses FastAPI" : "Expected FastAPI import or class", + }); + + const sdk = /from\s+outpost_sdk\s+import|import\s+outpost_sdk|outpost_sdk/.test(t); + checks.push({ + id: "python_outpost_sdk", + pass: sdk, + detail: sdk ? "Uses outpost_sdk" : "Expected outpost_sdk import or usage", + }); + + const uv = /uvicorn/.test(lower); + checks.push({ + id: "uvicorn_documented", + pass: uv, + detail: uv ? "Mentions uvicorn" : "Expected uvicorn run command or import", + }); + + const envKey = /OUTPOST_API_KEY/.test(t) && (/os\.environ|getenv/.test(t) || /Depends?\(/.test(t)); + checks.push({ + id: "server_env_api_key", + pass: envKey, + detail: envKey ? "API key from environment on server" : "Expected OUTPOST_API_KEY via os.environ/getenv or settings", + }); + + const two = + (/destination|webhook/i.test(t) && /publish|event/i.test(t)) || + (/@app\.(get|post)|APIRouter/.test(t) && /publish/i.test(t) && /destination|webhook/i.test(t)); + checks.push({ + id: "register_and_publish_flow", + pass: two, + detail: two ? "Both destination/webhook and publish/event surfaced" : "Expected register webhook + publish flows", + }); + + const readme = /README/i.test(t) && /OUTPOST_API_KEY/.test(t); + checks.push({ + id: "readme_env", + pass: readme, + detail: readme ? "README mentions OUTPOST_API_KEY" : "Expected README with OUTPOST_API_KEY", + }); + + const hookOrDoc = /OUTPOST_TEST_WEBHOOK_URL|TEST_WEBHOOK|webhook\s*url/i.test(t); + checks.push({ + id: "webhook_url_documented", + pass: hookOrDoc, + detail: hookOrDoc ? 
"Webhook URL env or field documented" : "Expected OUTPOST_TEST_WEBHOOK_URL or webhook URL docs", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +function scoreScenario07(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const httpLib = /"net\/http"|net\/http/.test(t) || /\bhttp\.HandleFunc\b/.test(t); + checks.push({ + id: "stdlib_http", + pass: httpLib, + detail: httpLib ? "Uses net/http" : "Expected net/http or http.HandleFunc", + }); + + const sdk = /hookdeck\/outpost.*outpost-go|outpostgo|CreateDestinationCreateWebhook/.test(t); + checks.push({ + id: "go_outpost_sdk", + pass: sdk, + detail: sdk ? "Uses Outpost Go SDK patterns" : "Expected outpost-go / CreateDestinationCreateWebhook", + }); + + const createWebhook = /CreateDestinationCreateWebhook/.test(t); + checks.push({ + id: "create_destination_webhook", + pass: createWebhook, + detail: createWebhook ? "CreateDestinationCreateWebhook present" : "Expected CreateDestinationCreateWebhook wrapper", + }); + + const htmlUi = / c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +/** Option 3 — integrate Outpost into an existing SaaS-style codebase (Next.js baseline). 
*/ +function scoreScenario08(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const baseline = + /leerob\/next-saas-starter|next-saas-starter/.test(t) || + (/git\s+clone\b/.test(lower) && /github\.com/.test(t)); + checks.push({ + id: "baseline_or_clone", + pass: baseline, + detail: baseline + ? "References next-saas-starter baseline or git clone from GitHub" + : "Expected clone/setup of the documented baseline (e.g. leerob/next-saas-starter)", + }); + + const sdk = /@hookdeck\/outpost-sdk\b/.test(t); + checks.push({ + id: "outpost_ts_sdk", + pass: sdk, + detail: sdk ? "Uses @hookdeck/outpost-sdk" : "Expected @hookdeck/outpost-sdk", + }); + + const integration = + /publish\.event|destinations\.create|tenants\.upsert/.test(t) || + /\/api\/.*outpost|outpost.*publish/i.test(t); + checks.push({ + id: "outpost_integration_calls", + pass: integration, + detail: integration + ? "Server-side Outpost client usage (publish / destinations / tenants)" + : "Expected publish.event, destinations.create, or tenants.upsert (or clear API wrapper)", + }); + + const topic = /user\.created|topic|TOPIC/.test(t); + checks.push({ + id: "topic_or_event_hook", + pass: topic, + detail: topic ? "Topic or event hook documented" : "Expected topic from prompt or explicit event naming", + }); + + const serverKey = + /process\.env\.OUTPOST_API_KEY/.test(t) && + !/NEXT_PUBLIC_OUTPOST_API_KEY/.test(t); + checks.push({ + id: "server_env_key_only", + pass: serverKey, + detail: serverKey + ? "OUTPOST_API_KEY read server-side; no NEXT_PUBLIC_ key" + : "Expected process.env.OUTPOST_API_KEY and no NEXT_PUBLIC_OUTPOST_API_KEY", + }); + + const destDoc = + /destination|webhook\s*url|register.*webhook/i.test(t) && /tenant|customer|team/i.test(lower); + checks.push({ + id: "destination_per_customer_doc", + pass: destDoc, + detail: destDoc + ? 
"Documents webhook destination registration per tenant/customer (or team)" + : "Expected how operators register webhook URLs per customer/tenant", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +/** Option 3 — existing FastAPI SaaS baseline. */ +function scoreScenario09(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const baseline = + /philipokiokio\/fastapi_saas_template|fastapi_saas_template|FastAPI_SAAS/i.test(t) || + (/git\s+clone\b/.test(lower) && /github\.com/.test(t)); + checks.push({ + id: "baseline_or_clone", + pass: baseline, + detail: baseline + ? "References FastAPI_SAAS_Template baseline or git clone" + : "Expected clone/setup of philipokiokio/FastAPI_SAAS_Template (or documented alternative)", + }); + + const sdk = /from\s+outpost_sdk\s+import|import\s+outpost_sdk/.test(t); + checks.push({ + id: "python_outpost_sdk", + pass: sdk, + detail: sdk ? "Imports outpost_sdk" : "Expected outpost_sdk import", + }); + + const integration = + /publish\.event|destinations\.create|tenants\.upsert/.test(t); + checks.push({ + id: "outpost_integration_calls", + pass: integration, + detail: integration ? "Uses tenants/destinations/publish APIs" : "Expected SDK API calls for Outpost", + }); + + const hook = + /signal|event|webhook|post_save|after_create|lifecycle|router\.(post|put)/i.test(t) && + /publish|outpost/i.test(lower); + checks.push({ + id: "domain_event_hook", + pass: hook, + detail: hook + ? 
"Hooks Outpost publish into an application event or route" + : "Expected tying publish to a domain event or HTTP handler", + }); + + const env = /OUTPOST_API_KEY/.test(t) && (/os\.environ|getenv|settings|Depends/.test(t)); + checks.push({ + id: "env_api_key", + pass: env, + detail: env ? "API key from environment / settings" : "Expected OUTPOST_API_KEY from env", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +/** Option 3 — existing Go SaaS/API baseline. */ +function scoreScenario10(corpus: string, assistant: string): TranscriptScore { + const t = corpus; + const lower = t.toLowerCase(); + const checks: CheckResult[] = []; + + const baseline = + /devinterface\/startersaas-go-api|startersaas-go-api|StarterSaaS/.test(t) || + (/git\s+clone\b/.test(lower) && /github\.com/.test(t)); + checks.push({ + id: "baseline_or_clone", + pass: baseline, + detail: baseline + ? "References StarterSaaS Go API baseline or git clone" + : "Expected clone/setup of devinterface/startersaas-go-api (or documented alternative)", + }); + + const sdk = /hookdeck\/outpost.*outpost-go|outpostgo\.|github\.com\/hookdeck\/outpost/.test(t); + checks.push({ + id: "go_outpost_sdk", + pass: sdk, + detail: sdk ? "Uses Outpost Go module" : "Expected outpost-go / outpostgo import path", + }); + + const integration = /Publish\.Event|Tenants\.|Destinations\./.test(t); + checks.push({ + id: "outpost_integration_calls", + pass: integration, + detail: integration ? 
"Uses Outpost Go client operations" : "Expected Publish / Tenants / Destinations usage", + }); + + const hook = + /handler|middleware|OnUser|event|CreateUser|signup|register/i.test(t) && /publish|outpost/i.test(lower); + checks.push({ + id: "domain_event_hook", + pass: hook, + detail: hook + ? "Hooks publish into a handler or domain flow" + : "Expected publish tied to a concrete code path", + }); + + const envKey = /Getenv\s*\(\s*["']OUTPOST_API_KEY["']/.test(t); + checks.push({ + id: "env_api_key", + pass: envKey, + detail: envKey ? "Reads OUTPOST_API_KEY via os.Getenv" : "Expected os.Getenv(\"OUTPOST_API_KEY\")", + }); + + checks.push({ + id: "no_key_in_reply", + pass: !containsLikelyLeakedKey(assistant), + detail: containsLikelyLeakedKey(assistant) + ? "Possible raw API key in assistant-visible text" + : "No obvious raw Bearer secret in assistant text", + }); + + const passed = checks.filter((c) => c.pass).length; + const total = checks.length; + return { passed, total, checks, fraction: total ? passed / total : 0 }; +} + +/** Scenarios with a non-empty regex rubric in this file (used for exit / overallTranscriptPass). 
*/ +export const SCENARIO_IDS_WITH_HEURISTIC_RUBRIC: ReadonlySet<string> = new Set([ + "01", + "02", + "03", + "04", + "05", + "06", + "07", + "08", + "09", + "10", +]); + +function scoreByScenarioId( + scenarioId: string, + corpus: string, + assistant: string, + meta: RunJson["meta"], +): TranscriptScore { + switch (scenarioId) { + case "01": + return scoreScenario01(corpus, assistant, meta); + case "02": + return scoreScenario02(corpus, assistant); + case "03": + return scoreScenario03(corpus, assistant); + case "04": + return scoreScenario04(corpus, assistant); + case "05": + return scoreScenario05(corpus, assistant, meta); + case "06": + return scoreScenario06(corpus, assistant); + case "07": + return scoreScenario07(corpus, assistant); + case "08": + return scoreScenario08(corpus, assistant); + case "09": + return scoreScenario09(corpus, assistant); + case "10": + return scoreScenario10(corpus, assistant); + default: + return { + passed: 0, + total: 0, + checks: [], + fraction: 0, + }; + } +} + +export async function scoreRunJson( + runPath: string, + raw: string, +): Promise<ScoreReport> { + const data = JSON.parse(raw) as RunJson; + const scenarioId = data.meta?.scenarioId ?? "unknown"; + const scenarioFile = data.meta?.scenarioFile ?? `${scenarioId}-unknown.md`; + const assistantOnly = extractAssistantText(data.messages); + const corpus = extractTranscriptScoringText(data.messages); + const transcript = scoreByScenarioId(scenarioId, corpus, assistantOnly, data.meta); + + const hasRubric = SCENARIO_IDS_WITH_HEURISTIC_RUBRIC.has(scenarioId); + const overallTranscriptPass = hasRubric + ? transcript.total > 0 && transcript.passed === transcript.total + : null; + + return { + runFile: runPath, + scenarioId, + scenarioFile, + transcript, + execution: { + status: "not_automated", + note: + "Execution (live Outpost) is not scored here. 
After running curls/code with OUTPOST_API_KEY, mark the Execution row in scenarios/*.md or results/RUN-RECORDING.template.md.", + }, + overallTranscriptPass, + }; +} + +export async function scoreRunFile(runPath: string): Promise<ScoreReport> { + const raw = await readFile(runPath, "utf8"); + return scoreRunJson(runPath, raw); +} + +/** Resolve a run directory or legacy flat JSON path to transcript.json path. */ +export async function resolveTranscriptJsonPath(input: string): Promise<string> { + let st; + try { + st = await stat(input); + } catch { + throw new Error(`Path not found: ${input}`); + } + if (st.isDirectory()) { + const t = join(input, "transcript.json"); + try { + await stat(t); + } catch { + throw new Error(`No transcript.json in directory: ${input}`); + } + return t; + } + return input; +} + +/** Sidecar score paths: nested run dir vs legacy flat *-scenario-NN.json */ +export function scoreSidecarPaths(transcriptPath: string): { + heuristic: string; + llm: string; +} { + if (basename(transcriptPath) === "transcript.json") { + const dir = dirname(transcriptPath); + return { + heuristic: join(dir, "heuristic-score.json"), + llm: join(dir, "llm-score.json"), + }; + } + return { + heuristic: transcriptPath.replace(/\.json$/i, ".score.json"), + llm: transcriptPath.replace(/\.json$/i, ".llm-score.json"), + }; +} + +export async function findLatestRunFile( + runsDir: string, + scenarioId?: string, +): Promise<string | null> { + const entries = await readdir(runsDir, { withFileTypes: true }); + /** Mutable holder so TS control flow tracks updates across async `consider` calls. 
*/ + const latest = { path: null as string | null, mtime: -Infinity }; + + const consider = async (transcriptPath: string) => { + try { + const st = await stat(transcriptPath); + if (st.mtimeMs > latest.mtime) { + latest.path = transcriptPath; + latest.mtime = st.mtimeMs; + } + } catch { + /* skip */ + } + }; + + for (const ent of entries) { + const name = ent.name; + if (ent.isDirectory()) { + if (!/-scenario-\d{2}$/i.test(name)) continue; + if ( + scenarioId && + !name.endsWith(`scenario-${scenarioId.padStart(2, "0")}`) + ) { + continue; + } + await consider(join(runsDir, name, "transcript.json")); + continue; + } + if ( + ent.isFile() && + /-scenario-\d{2}\.json$/i.test(name) && + !name.endsWith(".score.json") && + !name.endsWith(".llm-score.json") + ) { + if ( + scenarioId && + !name.includes(`scenario-${scenarioId.padStart(2, "0")}`) + ) { + continue; + } + await consider(join(runsDir, name)); + } + } + + return latest.path; +} + +export function formatScoreReportHuman(r: ScoreReport): string { + const lines: string[] = [ + `Transcript: ${r.runFile}`, + `Scenario: ${r.scenarioId} (${r.scenarioFile})`, + ]; + if (basename(r.runFile) === "transcript.json") { + lines.push(`Run directory (agent workspace): ${dirname(r.runFile)}`); + } + lines.push(""); + if (r.transcript.total === 0) { + lines.push("Transcript checks: (no automated rubric — add scorers in src/score-transcript.ts)"); + } else { + lines.push( + `Transcript checks: ${r.transcript.passed}/${r.transcript.total} passed (${Math.round(r.transcript.fraction * 100)}%)`, + ); + } + for (const c of r.transcript.checks) { + lines.push(` [${c.pass ? "PASS" : "FAIL"}] ${c.id}: ${c.detail}`); + } + lines.push(""); + lines.push(`Execution: ${r.execution.status} — ${r.execution.note}`); + lines.push(""); + lines.push( + `Overall transcript pass: ${ + r.overallTranscriptPass === null ? "N/A (no rubric)" : r.overallTranscriptPass ? 
"YES" : "NO" + }`, + ); + return lines.join("\n"); +} diff --git a/docs/agent-evaluation/tsconfig.json b/docs/agent-evaluation/tsconfig.json new file mode 100644 index 000000000..80fcf22d3 --- /dev/null +++ b/docs/agent-evaluation/tsconfig.json @@ -0,0 +1,15 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "NodeNext", + "moduleResolution": "NodeNext", + "lib": ["ES2022"], + "strict": true, + "skipLibCheck": true, + "noEmit": true, + "esModuleInterop": true, + "verbatimModuleSyntax": true, + "resolveJsonModule": true + }, + "include": ["src/**/*.ts"] +} diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx index 1f2a3a394..bba3b53d7 100644 --- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx +++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx @@ -3,7 +3,7 @@ title: "Hookdeck Outpost — agent prompt template" description: "Copy-paste template for AI coding agents. Dashboard teams should inject the placeholders server-side or client-side." --- -This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). **Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately. API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs). +This page is a **reference template** for the Hookdeck Outpost onboarding flow. Replace `{{PLACEHOLDERS}}` with values from the operator’s project (or render them in the dashboard). **Do not** put the API key in the prompt; the operator sets `OUTPOST_API_KEY` separately (for example in a project **`.env`** file loaded by their shell or app—never pasted into chat). API keys are created under the Outpost project: **Settings → Secrets** (the same Outpost API key used by the REST API and SDKs). 
## Template @@ -15,7 +15,7 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we ### Credentials - API base URL: {{API_BASE_URL}} -- API key (Outpost API key from the project **Settings → Secrets**): read from the `OUTPOST_API_KEY` environment variable (never ask the user to paste the key into chat) +- API key (Outpost API key from the project **Settings → Secrets**): load from the `OUTPOST_API_KEY` environment variable — typically a **`.env`** file in the operator’s project (or another secrets mechanism their tooling loads); never ask the user to paste the key into chat ### Configured topics @@ -23,7 +23,9 @@ You are helping integrate Hookdeck Outpost into a platform to deliver events (we ### Test destination -Use this URL to verify event delivery (webhook destination): {{TEST_DESTINATION_URL}} +Use this **Hookdeck Console Source** URL to verify event delivery (the webhook `config.url`, or `OUTPOST_TEST_WEBHOOK_URL` in the SDK quickstarts). Your dashboard supplies it for this project: + +{{TEST_DESTINATION_URL}} ### Documentation @@ -48,6 +50,8 @@ Ask the user which of the following they want: For all modes, read the relevant quickstart documentation before writing code. +**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). + **Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. 
Topics for this project are listed above and were configured in the Hookdeck dashboard. ``` @@ -57,12 +61,12 @@ For all modes, read the relevant quickstart documentation before writing code. |-------------|---------|--------| | `{{API_BASE_URL}}` | `https://api.outpost.hookdeck.com/2025-07-01` | Safe to embed in the prompt | | `{{TOPICS_LIST}}` | Bullet list or comma-separated topic names | From dashboard config | -| `{{TEST_DESTINATION_URL}}` | Unique URL from Hookdeck Console Source, or operator’s test endpoint | May be TBC until `console.hookdeck.com` flow is finalized | -| `{{DOCS_URL}}` | `https://hookdeck.com/outpost/docs` | Public docs root (no trailing slash) | +| `{{TEST_DESTINATION_URL}}` | **Required** — HTTPS URL of the Hookdeck Console **Source** created for this onboarding flow (fed in by the dashboard). | +| `{{DOCS_URL}}` | `https://outpost.hookdeck.com/docs` | Public docs root (no trailing slash). For unpublished docs, automated evals can set **`EVAL_LOCAL_DOCS=1`** so the Documentation section is replaced with repository file paths (see `docs/agent-evaluation/README.md`). | | `{{LLMS_FULL_URL}}` | `https://hookdeck.com/outpost/docs/llms-full.txt` | Optional; omit the line if not live yet | ## Operator checklist (dashboard UI) - Show **API base URL** and **topics** next to the copyable prompt. -- Explain that the **API key** is the Outpost API key from **Settings → Secrets**, and show **environment variables**: `OUTPOST_API_KEY` (value with copy button), optional `OUTPOST_API_BASE_URL`, and `OUTPOST_TEST_WEBHOOK_URL` when the quickstart examples need a test webhook URL. +- Feed **`{{TEST_DESTINATION_URL}}`** from a Hookdeck Console **Source** URL you create for the operator (same value can be shown for `OUTPOST_TEST_WEBHOOK_URL` in env UI). Explain **Settings → Secrets** for `OUTPOST_API_KEY` (recommend a project **`.env`** or env-injection pattern, not pasting into the agent). Optional `OUTPOST_API_BASE_URL`. 
- Keep the **API key out of the prompt text** to reduce exposure via model logs and chat history. From 76d7c9be0ff7b567ea057bd93fa391a3b5e2e935 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 15:58:19 +0100 Subject: [PATCH 03/47] docs(agent-eval): prompt mapping, scenarios, harness; reset scenario tracker - Agent prompt: language implies SDK; simplest path defaults to curl; option 2/3 framework mapping; warn on sdks.mdx vs per-language quickstarts. - Curl quickstart: shell script notes (HTTP 202, portable body/status split). - run-agent-eval: PreToolUse write guard, default EVAL_MAX_TURNS 80, local docs block aligned with prompt; scenario heuristic fix for publish data key escaping. - Scenarios 01-10: realistic short user turns; success-criteria fixes where needed. - SCENARIO-RUN-TRACKER: cleared run results for a fresh pass; action items reset. - README and .env.example updates for eval harness as applicable. Made-with: Cursor --- ...TEMP-hookdeck-outpost-onboarding-status.md | 31 ++++--- docs/agent-evaluation/.env.example | 2 + docs/agent-evaluation/README.md | 2 + docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 61 +++++++------ .../scenarios/01-basics-curl.md | 13 ++- .../scenarios/02-basics-typescript.md | 6 +- .../scenarios/03-basics-python.md | 22 ++--- .../scenarios/04-basics-go.md | 6 +- .../scenarios/05-app-nextjs.md | 8 +- .../scenarios/06-app-fastapi.md | 4 +- .../scenarios/07-app-go-http.md | 4 +- .../scenarios/08-integrate-nextjs-existing.md | 11 +-- .../09-integrate-fastapi-existing.md | 11 +-- .../scenarios/10-integrate-go-existing.md | 11 +-- docs/agent-evaluation/src/run-agent-eval.ts | 88 +++++++++++++++++-- docs/agent-evaluation/src/score-transcript.ts | 7 +- .../hookdeck-outpost-agent-prompt.mdx | 40 ++++++--- .../quickstarts/hookdeck-outpost-curl.mdx | 7 ++ 18 files changed, 216 insertions(+), 118 deletions(-) diff --git a/docs/TEMP-hookdeck-outpost-onboarding-status.md b/docs/TEMP-hookdeck-outpost-onboarding-status.md index 
1d481b17f..8fbff69c8 100644 --- a/docs/TEMP-hookdeck-outpost-onboarding-status.md +++ b/docs/TEMP-hookdeck-outpost-onboarding-status.md @@ -10,15 +10,17 @@ The automated harness in `docs/agent-evaluation/` is in place. **What it does today:** -| Area | Status | -|------|--------| -| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with **`Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, **`cwd`** = `results/runs/-scenario-NN/` | -| **Artifacts** | `transcript.json`, optional **`heuristic-score.json`** + **`llm-score.json`** (LLM reads each scenario **`## Success criteria`**), agent-written files beside the transcript | -| **Heuristics** | `score-transcript.ts` — **`scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) | -| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next **`leerob/next-saas-starter`**, FastAPI **`philipokiokio/FastAPI_SAAS_Template`**, Go **`devinterface/startersaas-go-api`**) | -| **CLI** | **`npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless **`--no-score`** / **`--no-score-llm`** or **`EVAL_NO_SCORE_*`**. **Exit 1** if any enabled score fails | -| **CI** | **`npm run eval:ci`** = **`--scenarios 01,02`** + heuristic **and** LLM judge. 
**`scripts/ci-eval.sh`** — requires **`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`** | -| **Re-score** | `npm run score -- --run [--llm] [--write]` | + +| Area | Status | +| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Runner** | `src/run-agent-eval.ts` — **## Template** from `hookdeck-outpost-agent-prompt.mdx`, `{{…}}` from env, multi-turn scenarios, **Claude Agent SDK** with `**Read` / `Glob` / `Grep` / `WebFetch` / `Write` / `Edit` / `Bash`**, `**cwd`** = `results/runs/-scenario-NN/` | +| **Artifacts** | `transcript.json`, optional `**heuristic-score.json`** + `**llm-score.json`** (LLM reads each scenario `**## Success criteria**`), agent-written files beside the transcript | +| **Heuristics** | `score-transcript.ts` — `**scoreScenario01`–`scoreScenario10`** on assistant text + tool corpus (so **Write**/Edit content counts) | +| **Scenarios** | **01–04:** try-it-out (curl, TS, Python, Go). **05–07:** minimal UIs (Next, FastAPI, Go `net/http`). **08–10:** Option 3 — integrate into pinned repos (Next `**leerob/next-saas-starter`**, FastAPI `**philipokiokio/FastAPI_SAAS_Template`**, Go `**devinterface/startersaas-go-api**`) | +| **CLI** | `**npm run eval` requires `--scenario`, `--scenarios`, or `--all`** — no accidental full-suite run. Default scoring = **heuristic + LLM judge** unless `**--no-score`** / `**--no-score-llm`** or `**EVAL_NO_SCORE_***`. **Exit 1** if any enabled score fails | +| **CI** | `**npm run eval:ci`** = `**--scenarios 01,02`** + heuristic **and** LLM judge. 
`**scripts/ci-eval.sh`** — requires `**ANTHROPIC_API_KEY`**, `**EVAL_TEST_DESTINATION_URL**` | +| **Re-score** | `npm run score -- --run [--llm] [--write]` | + **Operational** @@ -27,7 +29,7 @@ The automated harness in `docs/agent-evaluation/` is in place. **What it does to ### Recommended run order (test evals → stress prompt) -Run from **`docs/agent-evaluation/`** with **`.env`** set (**`ANTHROPIC_API_KEY`**, **`EVAL_TEST_DESTINATION_URL`**). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions. +Run from `**docs/agent-evaluation/`** with `**.env`** set (`**ANTHROPIC_API_KEY**`, `**EVAL_TEST_DESTINATION_URL**`). Use a normal terminal (not a restricted sandbox) for reliable SDK sessions. **Stage A — basics (fast, minimal tooling)** @@ -53,7 +55,7 @@ npm run eval -- --scenarios 08,09,10 npm run eval -- --all ``` -After each stage, inspect **`results/runs/-scenario-NN/`** (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live **`OUTPOST_API_KEY`**) remains a separate human step per scenario. +After each stage, inspect `**results/runs/-scenario-NN/**` (transcript, scores, on-disk artifacts). **Goal:** confirm the **dashboard prompt** + **Success criteria** hold across stacks; **Execution** (live `**OUTPOST_API_KEY`**) remains a separate human step per scenario. --- @@ -63,7 +65,7 @@ After each stage, inspect **`results/runs/-scenario-NN/`** (transcript, s 2. **Default backend: Anthropic** — ✅ Agent SDK. 3. **Claude Code CLI** — Optional local path only (unchanged). 4. **OpenAI adapter** — Still optional / not implemented. -5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs **`## Success criteria`**. +5. **Judging** — ✅ Transcripts on disk; ✅ heuristics; ✅ LLM-as-judge vs `**## Success criteria`**. 6. **CI shape** — ✅ `eval:ci` + docs; **GitHub Actions workflow** not committed (add `workflow_dispatch` + secrets when ready). 
**Avoid as primary design:** brittle hand-rolled JSON in bash, or CLI-only gates that break for contributors and headless runners. @@ -78,11 +80,11 @@ After each stage, inspect **`results/runs/-scenario-NN/`** (transcript, s - `quickstarts.mdx` index: managed vs self-hosted links - Content aligned with product copy: API key from **Settings → Secrets**, verify via Hookdeck Console + project logs - SDK quickstarts: env vars, step-commented scripts -- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, **`SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md` +- **Agent evaluation:** `docs/agent-evaluation/` — scenarios **01–10**, dual scoring, explicit CLI, CI slice, `**SCENARIO-RUN-TRACKER.md`** (per-scenario + execution log), `results/README.md`, `fixtures/`, `SKILL-UPSTREAM-NOTES.md` ## Pending / follow-up -- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or **`--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear +- **Prompt + eval validation (in progress):** Run stages **A → B → C** above (or `**--all`** when deliberate); record pass/fail per scenario; adjust prompt or heuristics if systematic failures appear - **hookdeck/agent-skills:** Refresh `skills/outpost/SKILL.md` using `docs/agent-evaluation/SKILL-UPSTREAM-NOTES.md` (managed-first, correct `/tenants/` paths, env naming) - **QA:** Run TypeScript, Python, and Go examples against live managed API; confirm production doc links - **Test destination URL:** When Console has a stable public URL story, align quickstarts if copy changes @@ -96,3 +98,4 @@ After each stage, inspect **`results/runs/-scenario-NN/`** (transcript, s - OpenAPI / managed base URL: `https://api.outpost.hookdeck.com/2025-07-01` (in `docs/apis/openapi.yaml` `servers`) - Agent template source: 
`docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx` - Eval harness: `docs/agent-evaluation/README.md` + diff --git a/docs/agent-evaluation/.env.example b/docs/agent-evaluation/.env.example index 6f1e3eb48..9df940ad4 100644 --- a/docs/agent-evaluation/.env.example +++ b/docs/agent-evaluation/.env.example @@ -24,6 +24,8 @@ EVAL_TEST_DESTINATION_URL= # EVAL_MAX_TURNS=40 # EVAL_PERMISSION_MODE=dontAsk # EVAL_PERSIST_SESSION=true +# Debug only: allow Write/Edit outside the per-run workspace (not recommended) +# EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1 # Scoring is ON by default after each scenario (heuristic + LLM). Opt out: # EVAL_NO_SCORE_HEURISTIC=1 diff --git a/docs/agent-evaluation/README.md b/docs/agent-evaluation/README.md index 274921647..8f63b4abd 100644 --- a/docs/agent-evaluation/README.md +++ b/docs/agent-evaluation/README.md @@ -85,6 +85,8 @@ Two different things get called “permissions”: 2. **Claude Agent SDK `dontAsk` + `allowedTools`** — In `dontAsk` mode, tools **not** listed in `allowedTools` are denied (no prompt). Defaults include **`Write`**, **`Edit`**, and **`Bash`** so app scenarios can scaffold and install dependencies inside the per-run directory. With **`EVAL_LOCAL_DOCS=1`**: **`Read,Glob,Grep,Write,Edit,Bash`**. Otherwise **`Read,Glob,Grep,WebFetch,Write,Edit,Bash`**. Narrow **`EVAL_TOOLS`** only if you need a stricter harness (e.g. transcript-only, no shell). +3. **Run-directory write guard** — a **`PreToolUse`** hook denies **`Write` / `Edit` / `NotebookEdit`** when the target path resolves **outside** the current `results/runs/-scenario-NN/` workspace (hooks enforce this under `permissionMode: dontAsk`; `canUseTool` alone does not). Set **`EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1`** only for debugging. **`Bash`** can still redirect output outside the run dir; review transcripts if that matters. + Changing **`EVAL_PERMISSION_MODE`** is usually unnecessary; widening **`EVAL_TOOLS`** (or using local docs) fixes most tool denials. 
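
The containment check behind the run-directory write guard can be sketched as below. This is an assumption about the shape of the check, not a copy of the hook in `src/run-agent-eval.ts`: `isInsideWorkspace` is a hypothetical helper, and it compares paths lexically without resolving symlinks.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical helper: illustrates the idea only; the harness's real hook may differ.
function isInsideWorkspace(workspaceDir: string, targetPath: string): boolean {
  const root = resolve(workspaceDir);
  // Relative targets are interpreted against the workspace (the agent's cwd).
  const target = resolve(root, targetPath);
  const rel = relative(root, target);
  // A path escapes the workspace when it climbs out via ".." or, on Windows,
  // when it sits on another drive (relative() then returns an absolute path).
  const outside = rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel);
  return !outside;
}
```

A hook built on a check like this would deny the tool call when the function returns `false`. Symlinks are not followed here, so a stricter guard might resolve real paths first.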
### Transcript vs execution (full pass) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index ac620193f..543ef09a9 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -5,44 +5,43 @@ Use this table while you **run scenarios one at a time** and **execute the gener ## How to use 1. **Automated agent eval** (from `docs/agent-evaluation/`): - - ```sh + ```sh npm run eval -- --scenario - ``` - - Each run creates **`results/runs/-scenario-/`** with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones). - + ``` + Each run creates `**results/runs/-scenario-/**` with `transcript.json`, `heuristic-score.json`, `llm-score.json`, and whatever the agent wrote (scripts, apps, clones). 2. **Fill the table:** paste or note the **run directory** (stamp), mark **Heuristic** / **LLM** pass or fail (from the sidecars or console). - -3. **Execution (generated code):** with **`OUTPOST_API_KEY`** (and **`OUTPOST_TEST_WEBHOOK_URL`** / **`OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). - +3. **Execution (generated code):** with `**OUTPOST_API_KEY`** (and `**OUTPOST_TEST_WEBHOOK_URL`** / `**OUTPOST_API_BASE_URL`** if needed) in your shell or `.env`, run the artifact the scenario expects — e.g. `bash outpost-quickstart.sh`, `npx tsx …`, `python …`, `go run …`, `npm run dev` in the generated app folder. Mark **Pass** / **Fail** / **Skip** and add **Notes** (HTTP status, delivery in Hookdeck Console, etc.). **Do not edit generated files to force a pass** — test what the agent produced; note OS/environment (e.g. Linux vs macOS) when relevant. 
**This column is the primary bar for “does the output actually work?”** Heuristic and LLM scores are supplementary. 4. **Optional:** copy a row to your local run log under `results/` if you use `RUN-RECORDING.template.md`. --- ## Tracker -| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -|----|---------------|-----------------------------------|-----------|-----------|----------------------------|-------| -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | + +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | ---------------------------- | ----- | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | 
[07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | + ### Column hints -| Column | Meaning | -|--------|---------| -| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` | -| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) | -| **LLM judge** | `llm-score.json` → `overall_transcript_pass` | -| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` | + +| Column | Meaning | +| ----------------- | ---------------------------------------------------------------------------------------------------------- | +| **Run directory** | e.g. `2026-04-07T15-00-00-000Z-scenario-01` — the folder containing `transcript.json` | +| **Heuristic** | `heuristic-score.json` → `overallTranscriptPass` (or `passed`/`total`) | +| **LLM judge** | `llm-score.json` → `overall_transcript_pass` | +| **Execution** | Your smoke test of the **produced** script/app with real credentials — **not** automated by `npm run eval` | + ### Status legend (suggested) @@ -50,4 +49,10 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A* --- -Full harness docs: [README.md](README.md). +## Action items + +Add bullet or table rows here when something should be tracked across runs (docs gaps, harness changes, etc.). *None recorded yet for this pass.* + +--- + +Full harness docs: [README.md](README.md). 
\ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md index b7a491861..ad48add99 100644 --- a/docs/agent-evaluation/scenarios/01-basics-curl.md +++ b/docs/agent-evaluation/scenarios/01-basics-curl.md @@ -11,7 +11,7 @@ Agent should produce a **minimal shell + curl** flow against the **managed** API ## Automated eval (Claude Agent SDK) -The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/-scenario-NN/`. Save the shell script there with **Write** (e.g. `outpost-quickstart.sh`), not only as a fenced block in chat, so the run folder is reviewable on disk. +The harness sets the agent **cwd** to an empty directory under `docs/agent-evaluation/results/runs/-scenario-NN/`. **`Write` / `Edit` / `NotebookEdit` paths are enforced** to that directory only (absolute paths elsewhere are denied). Save the script as e.g. **`outpost-quickstart.sh`** in that folder (relative path or a path under the run dir), not under `examples/` or the repo root. ## Conversation script @@ -21,15 +21,15 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag ### Turn 1 — User -> I only want the basics using **curl** against the managed API. No SDK. Give me a **single shell script** I can save and run (e.g. `bash outpost-quickstart.sh`) that: creates a tenant, adds a webhook destination for my test URL, and publishes one event. Use the topic from the prompt. Use `OUTPOST_API_KEY` from the environment (document that I should `export` it or load `.env`). If you can’t provide a file, paste one script block I can save as `.sh`. +> I want option 1 — **the simplest thing possible**. I don’t need a framework or SDK; just the smallest path to see tenant → webhook → publish working. -### Turn 2 — User (optional probe) +### Turn 2 — User (optional) -> Show me how to verify delivery after I run those commands. 
+> How do I know the event actually reached my test URL? ## Success criteria -**Measurement:** Heuristic rubric `scoreScenario01` in [`../src/score-transcript.ts`](../src/score-transcript.ts) (assistant text + tool-written script content). LLM judge: `npm run score -- --run --llm`. Execution row remains manual. +**Measurement:** Heuristic rubric `scoreScenario01` in `[../src/score-transcript.ts](../src/score-transcript.ts)` (assistant text + tool-written script content). LLM judge: `npm run score -- --run --llm`. Execution row remains manual. - Uses managed base URL `https://api.outpost.hookdeck.com/2025-07-01` (or explicit `OUTPOST_API_BASE_URL`), **not** `localhost:3333/api/v1`, unless the user asked for self-hosted. - Tenant: `PUT .../tenants/{tenant_id}` with `Authorization: Bearer` (or documents equivalent). @@ -44,5 +44,4 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag - Wrong path (`PUT /{tenant}` without `/tenants/`). - Mixing self-hosted base path with managed host. -- Skipping topic alignment with dashboard configuration. - +- Skipping topic alignment with dashboard configuration. \ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/02-basics-typescript.md b/docs/agent-evaluation/scenarios/02-basics-typescript.md index 9a2fc40a7..a403bab6d 100644 --- a/docs/agent-evaluation/scenarios/02-basics-typescript.md +++ b/docs/agent-evaluation/scenarios/02-basics-typescript.md @@ -21,15 +21,15 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag ### Turn 1 — User -> Option 1 — try it out. Use **TypeScript** only: one script file, use `@hookdeck/outpost-sdk`, read `OUTPOST_API_KEY` and `OUTPOST_TEST_WEBHOOK_URL` from the environment. Create tenant, webhook destination for the topic in the prompt, publish one test event, print the event id. +> Option 1. Let’s do it in **TypeScript**. ### Turn 2 — User (optional) -> How do I run it? +> How do I run it locally? 
## Success criteria -**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. +**Measurement:** Heuristic `scoreScenario02` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the bullets below ([README.md § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. - Depends on `@hookdeck/outpost-sdk`; uses `Outpost` client with `apiKey` from `process.env.OUTPOST_API_KEY`. - Calls `tenants.upsert`, `destinations.create` (webhook), `publish.event`. diff --git a/docs/agent-evaluation/scenarios/03-basics-python.md b/docs/agent-evaluation/scenarios/03-basics-python.md index 2d9ecb88b..880b3c5e1 100644 --- a/docs/agent-evaluation/scenarios/03-basics-python.md +++ b/docs/agent-evaluation/scenarios/03-basics-python.md @@ -17,27 +17,27 @@ The harness sets the agent **cwd** to `docs/agent-evaluation/results/runs/ Option 1 — try it out. Use **Python** with `outpost_sdk`. Read credentials from the environment. Same flow: tenant, webhook destination, one publish, print event id. +> Option 1. I’d like to use **Python**. ### Turn 2 — User (optional) -> Keep it to one file I can run with `python`. +> One file I can run with `python` is enough. ## Success criteria -**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([`README.md` § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. +**Measurement:** Heuristic `scoreScenario03` in [`src/score-transcript.ts`](../src/score-transcript.ts); LLM judge maps the checklist below ([README.md § Measuring scenarios](../README.md#measuring-scenarios)). Execution row is manual. -- [ ] `from outpost_sdk import Outpost` (or equivalent documented import path). 
-- [ ] `Outpost(api_key=..., server_url=...)` with optional base URL from env. -- [ ] `tenants.upsert`, `destinations.create`, `publish.event` with correct shapes. -- [ ] Topic aligned with prompt; webhook URL from env. -- [ ] No secrets in file. -- [ ] **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.* +- `from outpost_sdk import Outpost` (or equivalent documented import path). +- `Outpost(api_key=..., server_url=...)` with optional base URL from env. +- `tenants.upsert`, `destinations.create`, `publish.event` as in the **Python quickstart** (including `request=` for publish where the SDK requires it). +- Topic aligned with prompt; webhook URL from env. +- No secrets in file. +- **Execution (full pass):** With `OUTPOST_API_KEY`, `OUTPOST_TEST_WEBHOOK_URL`, and optional base URL env vars set, `python …` (as documented) completes without API errors and prints an event id or clear success. *Skip only for transcript-only triage.* ## Failure modes to note -- Using `requests` only when user asked for the official SDK. +- Using `requests` only when user asked for the official SDK. \ No newline at end of file diff --git a/docs/agent-evaluation/scenarios/04-basics-go.md b/docs/agent-evaluation/scenarios/04-basics-go.md index 29622c6a1..7d575c62f 100644 --- a/docs/agent-evaluation/scenarios/04-basics-go.md +++ b/docs/agent-evaluation/scenarios/04-basics-go.md @@ -21,7 +21,11 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa ### Turn 1 — User -> Option 1 — try it out. Use **Go** and the official Outpost Go SDK. Environment variables for API key and test webhook URL. Tenant upsert, webhook destination, publish one event, print ids. +> Option 1. I want to try it in **Go**. 
+ +### Turn 2 — User (optional) + +> Keep the program small — one `main` or a couple of files is fine. ## Success criteria diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md index 3e5ffa10b..f44061775 100644 --- a/docs/agent-evaluation/scenarios/05-app-nextjs.md +++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md @@ -26,17 +26,17 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag ### Turn 1 — User -> Option 2 — build a minimal example. I want **Next.js**. Very small UI: field for webhook URL, button to create the webhook destination for tenant `demo_tenant` (or let me edit tenant id in the UI), and a button to send one test event on topic `user.created` (or the first topic from the prompt). Use the Outpost TypeScript SDK on the server only. +> Option 2 — a **tiny demo app**. Can we use **Next.js**? I want a minimal page: somewhere to put a webhook URL, register it for a customer, and a way to fire one test event. ### Turn 2 — User (optional) -> Add a short README with env vars and `npm run dev` steps. +> Can you add a short README — what goes in `.env` and how I start the dev server? ### Turn 3 — User (stress) -> I do not have a public URL yet — what should I use for the webhook URL field? +> I don’t have a public webhook URL yet. What should I put in that field? -Expected: agent suggests Hookdeck Console Source URL or similar, aligned with quickstarts. +*Expected:* agent points to a Hookdeck Console Source URL (or equivalent) consistent with the quickstarts and Turn 0 test destination. 
## Success criteria diff --git a/docs/agent-evaluation/scenarios/06-app-fastapi.md b/docs/agent-evaluation/scenarios/06-app-fastapi.md index 1f00b5f68..704415e33 100644 --- a/docs/agent-evaluation/scenarios/06-app-fastapi.md +++ b/docs/agent-evaluation/scenarios/06-app-fastapi.md @@ -24,11 +24,11 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa ### Turn 1 — User -> Option 2 — minimal example with **FastAPI**. Single small app: HTML page with webhook URL field, button to register destination for tenant `demo_tenant`, button to publish one test event. Use `outpost_sdk` only on the server. Keep it to a few files. +> Option 2 — **FastAPI**, same idea as a tiny demo: simple HTML, register a webhook for a tenant, button to send one test event. Keep the codebase small. ### Turn 2 — User (optional) -> Document env vars and `uvicorn` command in README. +> README with env vars and how to run it would help. ## Success criteria diff --git a/docs/agent-evaluation/scenarios/07-app-go-http.md b/docs/agent-evaluation/scenarios/07-app-go-http.md index cfdd594a9..5dfdd85e2 100644 --- a/docs/agent-evaluation/scenarios/07-app-go-http.md +++ b/docs/agent-evaluation/scenarios/07-app-go-http.md @@ -23,11 +23,11 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag ### Turn 1 — User -> Option 2 — minimal example in **Go**. Standard library HTTP server, simple HTML page: register webhook destination for a fixed tenant id, then button to publish one event. Use the official Go SDK for Outpost calls. API key from environment. +> Option 2 — **Go** with the standard library: small HTTP server, basic HTML, register a webhook and publish one test event. ### Turn 2 — User (optional) -> Keep everything in `main.go` if reasonable, or split `handlers.go` — your choice, but stay small. +> One or two files is fine if you can keep it readable. 
## Success criteria diff --git a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md index 56cd9c9b0..fc1594ff0 100644 --- a/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md +++ b/docs/agent-evaluation/scenarios/08-integrate-nextjs-existing.md @@ -23,18 +23,13 @@ Paste the **## Template** block from [`hookdeck-outpost-agent-prompt.mdx`](../pa ### Turn 1 — User -> **Option 3 — integrate with an existing app.** Clone **`https://github.com/leerob/next-saas-starter`** into this workspace (subdirectory is fine), install dependencies per its README, and get it in a state where we could run it locally. +> Option 3 — I’m not starting from scratch. Please clone **`https://github.com/leerob/next-saas-starter`** here, install it, and get it runnable. Then wire in **Hookdeck Outpost** so we can send **outbound webhooks** to our customers. > -> Then integrate **Hookdeck Outpost** for **outbound webhooks** to our customers: -> -> 1. Use the official **`@hookdeck/outpost-sdk`** on the **server only** (API routes, server actions, or equivalent — never expose `OUTPOST_API_KEY` to the browser). -> 2. Pick **one meaningful domain event** in this starter (e.g. team or member lifecycle — choose something that actually exists in the code) and **`publish`** an event to Outpost with a **topic** from the Turn 0 prompt (or document the topic constant). -> 3. Document how an operator registers a **webhook destination** per **tenant/customer** (REST flow or small admin UI is fine). Use the test destination URL from Turn 0 where helpful. -> 4. Add or update a **README section** listing required env vars (`OUTPOST_API_KEY`, optional base URL, anything else you add). +> I need this tied to **something real in the app** (not a throwaway demo page), and I need to understand how each customer gets their webhook registered. Put whatever I need to configure in the README (env vars, etc.). 
Keep secrets on the server only. ### Turn 2 — User (optional) -> Where should we call **`tenants.upsert`** relative to our own tenant/customer model? +> When should we create or sync the Outpost **tenant** with our own customer or team model? ## Success criteria diff --git a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md index 72c63ef86..dd8270921 100644 --- a/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md +++ b/docs/agent-evaluation/scenarios/09-integrate-fastapi-existing.md @@ -22,18 +22,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu ### Turn 1 — User -> **Option 3 — integrate with an existing app.** Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`** into this workspace, install dependencies per its README (venv + `pip install -r requirements.txt` or `uv` as you prefer). +> Option 3 — integrate Outpost into a real codebase. Clone **`https://github.com/philipokiokio/FastAPI_SAAS_Template`**, set it up from its README, then add **Hookdeck Outpost** for customer webhooks. > -> Integrate **Hookdeck Outpost** for **outbound webhooks**: -> -> 1. Use **`outpost_sdk`** only in **server** code (routers, services — never embed the API key in templates or static JS). -> 2. Hook **`publish.event`** (and tenant/destination setup as needed) to **one real domain event** in this template (e.g. org membership or user lifecycle — pick something that exists in the codebase). -> 3. Document how operators register **webhook destinations** per tenant/customer and which **topic** you publish on (use topics from Turn 0 when possible). -> 4. Document **`OUTPOST_API_KEY`** and **`uvicorn`** (or equivalent) run instructions in README. +> Hook publishing to **one real event** that already exists in the app (orgs, users, whatever fits). Document topics, how tenants register webhook URLs, and env vars. 
Don’t leak the API key to the client. ### Turn 2 — User (optional) -> Should **`tenants.upsert`** run at org creation or lazily on first publish? +> Should we create the Outpost tenant when the org is created, or lazily on first publish? ## Success criteria diff --git a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md index c8f91c79e..1408caa57 100644 --- a/docs/agent-evaluation/scenarios/10-integrate-go-existing.md +++ b/docs/agent-evaluation/scenarios/10-integrate-go-existing.md @@ -22,18 +22,13 @@ Paste the **## Template** from [`hookdeck-outpost-agent-prompt.mdx`](../pages/qu ### Turn 1 — User -> **Option 3 — integrate with an existing app.** Clone **`https://github.com/devinterface/startersaas-go-api`** into this workspace and make it build (`go build` / `go test` ./… as appropriate per the repo). +> Option 3 — existing Go API. Clone **`https://github.com/devinterface/startersaas-go-api`**, get it building, then add **Hookdeck Outpost** for outbound webhooks. > -> Add **Hookdeck Outpost** for **outbound webhooks** to customers: -> -> 1. Use the official **Go SDK** (`github.com/hookdeck/outpost/sdks/outpost-go` or current module path from docs). -> 2. **`OUTPOST_API_KEY`** from environment only. -> 3. On **one real domain event** in this API (e.g. user registration, subscription, or another existing handler), call **`Publish.Event`** (and **`Tenants` / `Destinations`** as needed) with a **topic** from Turn 0. -> 4. Document how to register **webhook destinations** per tenant and which env vars to set. Mention the Hookdeck test destination URL from Turn 0 where useful. +> Use **one real handler** as the publish trigger (signup, billing, etc.). API key from env only. Document how customers register webhook URLs and what to set in env. Use the test destination from the dashboard prompt where it helps. 
### Turn 2 — User (optional) -> Show where **`CreateDestinationCreateWebhook`** fits if we let each customer paste a webhook URL in a settings API. +> If customers submit a webhook URL in a settings endpoint, where does destination creation live? ## Success criteria diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts index 72464f3a2..25c67f459 100644 --- a/docs/agent-evaluation/src/run-agent-eval.ts +++ b/docs/agent-evaluation/src/run-agent-eval.ts @@ -8,12 +8,13 @@ */ import { mkdir, readdir, readFile, writeFile } from "node:fs/promises"; -import { join, dirname } from "node:path"; +import { dirname, join, resolve, sep } from "node:path"; import { fileURLToPath } from "node:url"; import { parseArgs } from "node:util"; import dotenv from "dotenv"; import { query, + type HookInput, type Options, type SDKMessage, type SDKSystemMessage, @@ -68,18 +69,38 @@ function envFlagTruthy(v: string | undefined): boolean { /** When docs are not published yet, point the agent at MDX/OpenAPI paths in this repo. */ function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefined): string { const f = (...parts: string[]) => join(repoRoot, ...parts); + const languageSdkBlock = `### Language → SDK vs HTTP + +Map what the user says (they rarely name packages): + +- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. Publish success is **HTTP 202**; see curl quickstart for script portability (avoid GNU-only \`head -n -1\`). +- **TypeScript** or **Node** → TypeScript quickstart + \`@hookdeck/outpost-sdk\` as in that doc. +- **Python** → Python quickstart + \`outpost_sdk\`; \`publish.event(request={{...}})\` as in that doc — not TS-style kwargs. +- **Go** → Go quickstart + official Go SDK as in that doc. +- Explicit **curl** / **HTTP only** / **REST** → curl quickstart + OpenAPI. 
+ +**Small app (option 2):** Next.js → TS SDK server-side; FastAPI → Python SDK; Go net/http → Go SDK — use that language’s quickstart for Outpost shapes. + +**Existing app (option 3):** Official SDK for the repo’s language (or REST if they refuse SDK). + +Do **not** mix TS call shapes into Python.`; + let block = `### Documentation (local repository — unpublished) Do **not** rely on live public documentation URLs for this session. Read these files from the Outpost checkout (for example with the **Read** tool). Paths are absolute from the repository root: -- Getting started (curl): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\` -- TypeScript quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\` -- Python quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\` -- Go quickstart: \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\` +Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy). + +- Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\` +- TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\` +- Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\` +- Go quickstart (Go SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-go.mdx")}\` - API reference (human-oriented pages under): \`${f("docs/pages/references/")}\` - OpenAPI spec (machine-readable): \`${f("docs/apis/openapi.yaml")}\` - Destination types: \`${f("docs/pages/destinations/")}\` -- SDKs overview: \`${f("docs/pages/sdks.mdx")}\``; +- SDKs overview (TS-heavy): \`${f("docs/pages/sdks.mdx")}\` — prefer the language quickstart over this for Python/Go/TS code. 
+ +${languageSdkBlock}`; if (llmsFullUrl) { block += `\n- Full docs bundle: ${llmsFullUrl}`; } @@ -295,6 +316,48 @@ async function runOneScenario( }; } +/** True if resolved `filePath` is `runDir` or a path inside it (never outside). */ +function filePathIsInsideRunDir(runDir: string, filePath: string): boolean { + const root = resolve(runDir); + const target = resolve(filePath); + if (target === root) return true; + const prefix = root.endsWith(sep) ? root : root + sep; + return target.startsWith(prefix); +} + +function toolInputFilePath(toolName: string, toolInput: unknown): string | undefined { + if (toolName !== "Write" && toolName !== "Edit" && toolName !== "NotebookEdit") { + return undefined; + } + if (typeof toolInput !== "object" || toolInput === null) return undefined; + const input = toolInput as Record<string, unknown>; + for (const k of ["file_path", "path", "notebook_path"] as const) { + const v = input[k]; + if (typeof v === "string" && v.length > 0) return v; + } + return undefined; +} + +/** + * PreToolUse hook: deny Write/Edit/NotebookEdit outside the run dir. + * `canUseTool` is not reliable under `permissionMode: dontAsk`; hooks receive `permissionDecision` instead. + */ +function createRunDirPreToolHook(runDir: string) { + return async (input: HookInput) => { + if (input.hook_event_name !== "PreToolUse") return {}; + const candidate = toolInputFilePath(input.tool_name, input.tool_input); + if (!candidate) return {}; + if (filePathIsInsideRunDir(runDir, candidate)) return {}; + return { + hookSpecificOutput: { + hookEventName: "PreToolUse" as const, + permissionDecision: "deny" as const, + permissionDecisionReason: `Outpost agent-eval: ${input.tool_name} must target only the scenario workspace. Use a path under ${runDir} (e.g. outpost-quickstart.sh).
Refused: ${resolve(candidate)}`, + }, + }; + }; +} + function defaultEvalTools(env: NodeJS.ProcessEnv): string { if (env.EVAL_TOOLS?.trim()) { return env.EVAL_TOOLS.trim(); @@ -319,14 +382,14 @@ function buildBaseOptions(agentWorkspaceCwd: string): Options { Options["permissionMode"] >; - const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "40"); + const maxTurns = Number(process.env.EVAL_MAX_TURNS ?? "80"); const persistSession = process.env.EVAL_PERSIST_SESSION !== "false"; const o: Options = { cwd: agentWorkspaceCwd, allowedTools, permissionMode: mode, - maxTurns: Number.isFinite(maxTurns) ? maxTurns : 40, + maxTurns: Number.isFinite(maxTurns) ? maxTurns : 80, persistSession, env: { ...process.env, @@ -334,6 +397,12 @@ function buildBaseOptions(agentWorkspaceCwd: string): Options { } as Record, }; + if (!envFlagTruthy(process.env.EVAL_DISABLE_WORKSPACE_WRITE_GUARD)) { + o.hooks = { + PreToolUse: [{ hooks: [createRunDirPreToolHook(agentWorkspaceCwd)] }], + }; + } + if (process.env.EVAL_MODEL?.trim()) { o.model = process.env.EVAL_MODEL.trim(); } @@ -385,9 +454,10 @@ Environment: EVAL_LLMS_FULL_URL Optional (omit docs line if unset) EVAL_TOOLS Optional, comma-separated (default: Read,Glob,Grep[,WebFetch],Write,Edit,Bash — see README) EVAL_MODEL Optional - EVAL_MAX_TURNS Optional (default: 40) + EVAL_MAX_TURNS Optional (default: 80; npm/go mod installs can exceed 40) EVAL_PERMISSION_MODE Optional (default: dontAsk) EVAL_PERSIST_SESSION Set to "false" to disable session persistence (breaks multi-turn resume) + EVAL_DISABLE_WORKSPACE_WRITE_GUARD Set to 1 to allow Write/Edit outside the run dir (not recommended) Outputs under docs/agent-evaluation/results/runs/ (gitignored): each scenario gets results/runs/-scenario-NN/transcript.json diff --git a/docs/agent-evaluation/src/score-transcript.ts b/docs/agent-evaluation/src/score-transcript.ts index 5ba55459b..ec7455243 100644 --- a/docs/agent-evaluation/src/score-transcript.ts +++ 
b/docs/agent-evaluation/src/score-transcript.ts @@ -200,8 +200,11 @@ function scoreScenario01(corpus: string, assistant: string, meta: RunJson["meta" }); const afterPublish = t.split(/\/publish/i).pop() ?? t; - const wrongPayload = /"payload"\s*:/.test(afterPublish); - const hasData = /"data"\s*:/.test(afterPublish); + // Tool corpus JSON-stringifies Write bodies, so bash-escaped keys look like \"data\": not "data": + const wrongPayload = + /"payload"\s*:/.test(afterPublish) || /\\"payload\\"\s*:/.test(afterPublish); + const hasData = + /"data"\s*:/.test(afterPublish) || /\\"data\\"\s*:/.test(afterPublish); checks.push({ id: "publish_body_data_not_payload", pass: publish && !wrongPayload && hasData, diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx index bba3b53d7..8e6afe122 100644 --- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx +++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx @@ -29,26 +29,44 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook ` ### Documentation -- Getting started (curl): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl -- TypeScript quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript -- Python quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-python -- Go quickstart: {{DOCS_URL}}/quickstarts/hookdeck-outpost-go +- Getting started (curl / HTTP only, no SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-curl +- TypeScript quickstart (TypeScript SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-typescript +- Python quickstart (Python SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-python +- Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}} -- API reference: {{DOCS_URL}}/api +- API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api - Destination types: {{DOCS_URL}}/destinations -- SDK 
documentation: {{DOCS_URL}}/sdks +- SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs). + +### Language → SDK vs HTTP + +Operators rarely name packages or SDK details. **You** map what they say to the right doc and dependency: + +**“Try it out” — interpret their words** + +- **Simplest / fastest / minimal / least setup / “just show me” / no framework** (and they do **not** name TypeScript, Python, or Go) → treat as **curl**: **curl quickstart** + **OpenAPI** for exact JSON. One runnable shell script is ideal. **No SDK.** +- **TypeScript** or **Node** → **TypeScript quickstart**; use the **official TypeScript SDK** (`@hookdeck/outpost-sdk`) exactly as that quickstart shows. The user does not need to say “SDK.” +- **Python** → **Python quickstart**; use **`outpost_sdk`** as that quickstart shows (e.g. Python `publish.event` uses `request={{...}}` — **not** TypeScript-style kwargs on the method). +- **Go** → **Go quickstart**; use the **official Go SDK** as that quickstart shows. +- They explicitly want **curl**, **HTTP only**, or **REST** without a language SDK → **curl quickstart** + OpenAPI. + +Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python). + +**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. + +**Option 3 (existing app)** — Use the **official SDK for the repo’s language** on the server (or REST/OpenAPI if they insist on no SDK). 
Read that language’s quickstart for shapes; integrate on **real** domain paths, not throwaway demos. ### What to do -Ask the user which of the following they want: +Guide the conversation, then act: -1. **Try it out** — Create a minimal script that runs through the full flow: create a tenant, add a webhook destination, publish a test event. Ask which language they prefer (TypeScript, Python, Go, or curl) and follow the matching quickstart doc. +1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.). -2. **Build a minimal example** — Scaffold a small app with a simple UI that demonstrates tenant creation, destination management, and event publishing. Ask which framework they prefer. +2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. -3. **Integrate with an existing app** — Inspect the codebase for language and framework, then integrate Outpost: add the SDK (or use REST), create tenants when customers onboard, and publish events at the right points in application logic. +3. **Integrate with an existing app** — Clone or open their codebase; add Outpost per **Option 3** above; document env vars and operator steps. -For all modes, read the relevant quickstart documentation before writing code. +For all modes, read the **single** language-appropriate quickstart (and OpenAPI when implementing raw HTTP) before writing code. 
**Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). diff --git a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx index c7614b614..6194262f8 100644 --- a/docs/pages/quickstarts/hookdeck-outpost-curl.mdx +++ b/docs/pages/quickstarts/hookdeck-outpost-curl.mdx @@ -83,6 +83,13 @@ curl --request POST "$OUTPOST_API_BASE_URL/publish" \ A `202` response means the event was accepted for delivery. +## Shell scripts: status codes and portability + +If you combine API response bodies with `curl --write-out '\n%{http_code}'`: + +- **Publish** success is **HTTP 202** (not only 200/201). Treat **202** as success in conditional checks. +- **Portability:** GNU `head -n -1` (“all lines but the last”) is **not** available on macOS BSD `head`. Prefer splitting with **`sed '$d'`** (body) and **`tail -n 1`** (status), or another POSIX-friendly approach, so the same script runs on Linux and macOS. + ## Verify delivery - In **Hookdeck Console**, inspect the connection or destination you used (for example the Source you created) and confirm the webhook request and payload look correct. 
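The portability note added to the curl quickstart above (split the body with `sed '$d'` and the status with `tail -n 1`, and treat **202** as success) can be sketched roughly as follows. This is a minimal illustration, not the quickstart's own script: the helper names `split_body` and `split_status` are hypothetical, and the response is simulated with `printf` so the sketch runs without an API key. The placeholder JSON body is not the real Outpost response shape.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the quickstart): split a response captured
# with `curl --write-out '\n%{http_code}'` into body and status code.
split_body() { printf '%s\n' "$1" | sed '$d'; }      # every line except the last
split_status() { printf '%s\n' "$1" | tail -n 1; }   # last line only

# Simulated capture; a real script would assign the output of a curl call here.
# The JSON below is a placeholder, not an actual Outpost response.
response=$(printf '{"placeholder":"body"}\n202')

body=$(split_body "$response")
status=$(split_status "$response")

# Publish is documented as returning 202, so accept it alongside 200 and 201.
case "$status" in
  200|201|202) echo "ok ($status): $body" ;;
  *) echo "failed ($status): $body" >&2; exit 1 ;;
esac
```

Both `sed '$d'` and `tail -n 1` are specified by POSIX, so the same split should behave identically with GNU and BSD userlands, unlike `head -n -1`.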
From 3bc54696115c81416fbd892cc1b7ca7a78f3f0bd Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:02:51 +0100 Subject: [PATCH 04/47] docs(agent-eval): record fresh scenario 01 eval run in tracker Made-with: Cursor --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index 543ef09a9..e02009ff5 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -19,17 +19,17 @@ Use this table while you **run scenarios one at a time** and **execute the gener | ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | ---------------------------- | ----- | -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | | | | | | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | +| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | -------------------------- | ----- | +| 01 | 
[01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T14-58-40-850Z-scenario-01` | Pass (7/7) | Pass | — | Eval exit 0. Artifact: **`try-it-out.sh`**. **Execution** (manual): set `OUTPOST_API_KEY`, run script; uses `curl --fail-with-body` (2xx includes **202** on publish). | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | ### Column hints From 241dae68334115a3478db5f352d6bb94bcacd6e6 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:06:54 +0100 Subject: [PATCH 05/47] fix(agent-eval): remove harness-only 202/head hints from local docs block - Point local EVAL_LOCAL_DOCS guidance at full curl quickstart instead - Reword scenario 01 execution criteria to reference quickstart/OpenAPI --- docs/agent-evaluation/scenarios/01-basics-curl.md | 2 +- docs/agent-evaluation/src/run-agent-eval.ts | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/agent-evaluation/scenarios/01-basics-curl.md b/docs/agent-evaluation/scenarios/01-basics-curl.md index ad48add99..6aa12b215 100644 --- a/docs/agent-evaluation/scenarios/01-basics-curl.md +++ b/docs/agent-evaluation/scenarios/01-basics-curl.md @@ -38,7 +38,7 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag - Delivers as one **shell script** (or one fenced `bash` block meant to be saved as `.sh`), not 
only three unrelated snippets without a shebang/variables. - Does **not** embed a pasted API key in the reply. - Verification mentions Hookdeck Console / dashboard logs if Turn 2 was asked. -- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect **2xx** on tenant upsert and destination create, **202** (or documented success) on publish, and a visible delivery to the test webhook URL (Hookdeck Console / project logs, or `GET .../attempts` as appropriate). *Skip only if you are doing transcript-only triage.* +- **Execution (full pass):** With `OUTPOST_API_KEY` (and `OUTPOST_API_BASE_URL` if the snippet uses it) set in your environment, run the agent’s tenant → destination → publish sequence against a real project. Expect success per the **curl quickstart** and **OpenAPI** (tenant and destination typically 2xx; publish uses the documented success status—often **202**). Confirm delivery via Hookdeck Console / project logs (or `GET .../attempts` as appropriate). *Skip only if you are doing transcript-only triage.* ## Failure modes to note diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts index 25c67f459..87abd3b78 100644 --- a/docs/agent-evaluation/src/run-agent-eval.ts +++ b/docs/agent-evaluation/src/run-agent-eval.ts @@ -73,7 +73,7 @@ function localDocumentationBlock(repoRoot: string, llmsFullUrl: string | undefin Map what the user says (they rarely name packages): -- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. Publish success is **HTTP 202**; see curl quickstart for script portability (avoid GNU-only \`head -n -1\`). +- **Simplest / minimal / least setup** and no language named → **curl** quickstart + OpenAPI; one shell script; **no SDK**. 
Read the **entire** curl quickstart (it covers REST responses and any shell portability notes for scripts). - **TypeScript** or **Node** → TypeScript quickstart + \`@hookdeck/outpost-sdk\` as in that doc. - **Python** → Python quickstart + \`outpost_sdk\`; \`publish.event(request={{...}})\` as in that doc — not TS-style kwargs. - **Go** → Go quickstart + official Go SDK as in that doc. From 6b1fd4b13ee282043559ce177b8be8b60bb1edc0 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:11:44 +0100 Subject: [PATCH 06/47] docs(agent-eval): update scenario 01 tracker after re-run and execution pass --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index e02009ff5..bf45cc4ee 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener ## Tracker -| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -| --- | ------------------------------------------------------------------------------ | -------------------------------- | --------- | --------- | -------------------------- | ----- | -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T14-58-40-850Z-scenario-01` | Pass (7/7) | Pass | — | Eval exit 0. Artifact: **`try-it-out.sh`**. **Execution** (manual): set `OUTPOST_API_KEY`, run script; uses `curl --fail-with-body` (2xx includes **202** on publish). 
| -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. 
| +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | ### Column hints From 556b77f62c65a00c91348bf97f384f1183dd13cc Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:32:08 +0100 Subject: [PATCH 07/47] docs(agent-eval): record scenario 02 run and execution pass --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index bf45cc4ee..b9f0f5156 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -21,7 +21,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener | ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | 
`EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | | | | | | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | | 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | | 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | | 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | From 46e6dcc45b1e5037a6bd44fb14403d30f306f605 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:34:09 +0100 Subject: [PATCH 08/47] docs(agent-eval): fix tracker table formatting and artifact markdown --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index b9f0f5156..a05e6b5f3 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener ## Tracker -| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 01 | 
[01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | | 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. 
| -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | ### Column hints From f57b59db4dae8f0e54eea51073e5bd1bdbe1c10e Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 16:48:01 +0100 Subject: [PATCH 09/47] docs(agent-eval): record scenario 03 run and execution pass --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index a05e6b5f3..69fd31de0 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -22,7 +22,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener | --- | 
------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | | 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | | | | | | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. 
| | 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | | 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | | 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | From 803b51c0413b8a37e7c5cf1795bfaf061ba59a31 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 17:04:08 +0100 Subject: [PATCH 10/47] docs(agent-eval): record scenario 04 run and execution pass --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index 69fd31de0..cedd90ff6 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -23,7 +23,7 @@ Use this table while you **run scenarios one at a time** and **execute the gener | 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | | 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | | 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. 
| -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | | | | | | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | | 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | | 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | | 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | From f600652b26c1e6bd74708cce1a6f8197e96e79f9 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 17:22:11 +0100 Subject: [PATCH 11/47] docs(agent-eval): record scenario 05 run and execution pass Made-with: Cursor --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index cedd90ff6..576da3add 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -20,11 +20,11 @@ Use this table while you **run scenarios one at a time** and **execute the gener | ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | | --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.sh`**. 
Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: **`outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: **`main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | | | | | | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. 
Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go**`, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. App in **`outpost-nextjs-demo/`** (`@hookdeck/outpost-sdk` npm). `npm run build` OK. Dev on :3010: `POST /api/register` and `POST /api/publish` **200** with `docs/agent-evaluation/.env`. Next.js workspace-root warning (nested lockfiles). | | 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | | 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | | 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | From e1e5154ed85731ce5dbed20ea21fdeaa83bd7053 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Wed, 8 Apr 2026 20:41:00 +0100 Subject: [PATCH 12/47] docs: Outpost mental model, UI guide agnostic URLs, agent prompt links - Expand concepts with SaaS/platform flow; refine building-your-own-ui (API root, paths, no localhost:3333 in examples) - Agent prompt: link concepts, UI guide, topics; tighten option-2 guidance - Eval harness: local docs list includes concepts, building-your-own-ui, topics - SCENARIO-RUN-TRACKER: scenario 05 assessment for 17-21-22 run, heuristic notes - Minor scenario 05 doc tweak Made-with: Cursor --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 63 +++++++++++++++---- .../scenarios/05-app-nextjs.md | 3 +- docs/agent-evaluation/src/run-agent-eval.ts | 3 + docs/pages/concepts.mdx | 33 +++++++--- docs/pages/guides/building-your-own-ui.mdx | 51 +++++++++++---- .../hookdeck-outpost-agent-prompt.mdx | 9 ++- 6 files changed, 
125 insertions(+), 37 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index 576da3add..e9b506e5c 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener ## Tracker -| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------- | --------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. 
Artifacts: `**main.go**`, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T16-12-10-708Z-scenario-05` | Pass (10/10) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. App in **`outpost-nextjs-demo/`** (`@hookdeck/outpost-sdk` npm). `npm run build` OK. Dev on :3010: `POST /api/register` and `POST /api/publish` **200** with `docs/agent-evaluation/.env`. Next.js workspace-root warning (nested lockfiles). | -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. 
| +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | ### Column hints @@ -49,9 +49,46 @@ Use short text or symbols in cells, e.g. 
**Pass** / **Fail** / **Skip** / **N/A* --- +## Scenario 05 — assessment (`2026-04-08T17-21-22-170Z`) + +**Status:** This is the **current focus run** for scenario 05 reviews (not `2026-04-08T16-12-10-708Z`). + + +| Dimension | Result | +| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Run directory** | `results/runs/2026-04-08T17-21-22-170Z-scenario-05/` | +| **Artifact** | `nextjs-webhook-demo/` — Next.js App Router, `@hookdeck/outpost-sdk`, Outpost calls **only** in `app/api/**/route.ts` (managed API via SDK default unless `OUTPOST_API_BASE_URL` is set). | +| **Heuristic** | **9/10**; `overallTranscriptPass` false — single failure: `managed_base_not_selfhosted` because the transcript corpus included a **Read** of older [Building your own UI](../pages/guides/building-your-own-ui.mdx) containing `localhost:3333/api/v1`. The **generated app does not** use that URL. See § Scenario 05 heuristic. | +| **LLM judge** | **Pass** — matches scenario 05 success criteria (Next.js structure, server-side SDK, distinct destination + publish UI, tenant/topic handling, README env, managed default). | +| **Execution** | **Pass** (re-checked): `npm run build` in `nextjs-webhook-demo/`; `npm run dev` with `docs/agent-evaluation/.env`; `POST /api/destinations` → **201**, `POST /api/publish` → **200**. | + + +**What the app demonstrates (UX / model):** + +1. **Tenant** — Editable tenant id; copy states destinations and publishes are scoped to it. +2. **Register webhook destination** — URL field + **topic checkboxes** populated from `**GET /api/topics`** (server lists topics from Outpost); `**POST /api/destinations**` upserts tenant and creates webhook destination for selected topics. +3. 
**Destinations list** — `**GET /api/destinations?tenantId=`** table (type, target, topics) with refresh — matches “tenant → many destinations” mental model. +4. **Publish test event** — Separate action; `**POST /api/publish`** with chosen topic; UI notes fan-out to matching destinations. + +**Comparison — older run `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`):** Simpler two-route app (`/api/register`, `/api/publish`), **fixed topic** in routes, **no** topics or destinations list APIs, **10/10** heuristic (no offending doc fragment in corpus). Useful as a minimal baseline; **17-21-22** is the richer assessment target. + +--- + +## Scenario 05 heuristic — `managed_base_not_selfhosted` + +Scenario 05 includes a regex check (`managed_base_not_selfhosted`) in `[src/score-transcript.ts](../src/score-transcript.ts)` (`scoreScenario05`). It looks at the **whole scoring corpus**: assistant-visible text **plus** content that ended up in the transcript from tools (e.g. **Read** of a doc file), not just files in the run folder. + +- It fails if the corpus contains a **self-hosted** default API path: specifically the literal substring `localhost:3333/api/v1` (Outpost’s common local dev URL), or a similar `localhost: / api/v1` pattern, unless `OUTPOST_API_BASE_URL` also appears (see code for the exact conditions). +- **Historical cause:** Older [Building your own UI](../pages/guides/building-your-own-ui.mdx) curl examples used `localhost:3333/api/v1`. If the agent **read** that page during a run, those lines were embedded in `transcript.json`, the check fired, and `overallTranscriptPass` became **false** even when the **generated Next.js app** only used the **managed** SDK default. That was a **harness / doc-corpus** interaction, not proof the app targeted local Outpost. +- **Doc update:** `docs/pages/guides/building-your-own-ui.mdx` was rewritten to be **managed / self-hosted agnostic** (`OUTPOST_API_BASE_URL`, OpenAPI-shaped paths). 
Examples **no longer contain** the literal `localhost:3333/api/v1`, so a future eval whose corpus only picks up the current file should **not** fail this check for that substring. Re-run scenario 05 to confirm; other `localhost` patterns could still match if they appear elsewhere in the corpus. +- **Run `2026-04-08T16-12-10-708Z`:** heuristic **10/10**, `overallTranscriptPass: true`. +- **Run `2026-04-08T17-21-22-170Z`:** heuristic **9/10**, `overallTranscriptPass: false` — failed `managed_base_not_selfhosted`; LLM judge still **passed**; transcript included **Read** of the **previous** `building-your-own-ui.mdx` with `localhost:3333/api/v1`. + +**Possible follow-ups:** narrow the heuristic to tool-written files under the run workspace only, or exclude known doc paths from the substring that triggers this check. + ## Action items -Add bullet or table rows here when something should be tracked across runs (docs gaps, harness changes, etc.). *None recorded yet for this pass.* +- Scenario 05: optionally re-run eval after the UI guide rewrite to confirm `managed_base_not_selfhosted` no longer false-positives on that doc **Read**; then consider whether the heuristic can be narrowed (see § above). --- diff --git a/docs/agent-evaluation/scenarios/05-app-nextjs.md b/docs/agent-evaluation/scenarios/05-app-nextjs.md index f44061775..bc4aca4db 100644 --- a/docs/agent-evaluation/scenarios/05-app-nextjs.md +++ b/docs/agent-evaluation/scenarios/05-app-nextjs.md @@ -54,5 +54,4 @@ Paste the **## Template** block from `[hookdeck-outpost-agent-prompt.mdx](../pag - Calling Outpost directly from browser-side code with embedded key. - Only publishing without a UI path to register the destination first. -- Hard-coding localhost Outpost without user request. - +- Hard-coding localhost Outpost without user request. 
\ No newline at end of file diff --git a/docs/agent-evaluation/src/run-agent-eval.ts b/docs/agent-evaluation/src/run-agent-eval.ts index 87abd3b78..bc7629f53 100644 --- a/docs/agent-evaluation/src/run-agent-eval.ts +++ b/docs/agent-evaluation/src/run-agent-eval.ts @@ -91,6 +91,9 @@ Do **not** rely on live public documentation URLs for this session. Read these f Follow **Language → SDK vs HTTP** below for mapping user intent to the **single** right quickstart. Prefer language quickstarts over \`sdks.mdx\` (TS-heavy). +- **Concepts** (tenants, destinations as subscriptions, topics, how this fits a SaaS/platform): \`${f("docs/pages/concepts.mdx")}\` +- **Building your own UI** (screen structure: list destinations, create flow type → topics → config): \`${f("docs/pages/guides/building-your-own-ui.mdx")}\` +- **Topics** (destination topic subscriptions, fan-out): \`${f("docs/pages/features/topics.mdx")}\` - Getting started (curl / HTTP only): \`${f("docs/pages/quickstarts/hookdeck-outpost-curl.mdx")}\` - TypeScript quickstart (TS SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-typescript.mdx")}\` - Python quickstart (Python SDK): \`${f("docs/pages/quickstarts/hookdeck-outpost-python.mdx")}\` diff --git a/docs/pages/concepts.mdx b/docs/pages/concepts.mdx index 841f64249..a74e927bb 100644 --- a/docs/pages/concepts.mdx +++ b/docs/pages/concepts.mdx @@ -2,14 +2,31 @@ title: "Outpost Concepts" --- +## How this fits your product + +If you run a **SaaS**, **platform**, or **API product** and want each of **your customers** to receive webhooks or other event destinations, Outpost gives you a **multi-tenant** control plane for that. + +At a high level, the same mental model as a single-tenant webhook product still applies: something happens in your system (**event**), it belongs to a category (**topic**), and the consumer cares about **where** it should be delivered (**URL**, queue, etc.). 
Outpost adds one layer: those subscriptions live **per customer** in your product, which maps to a **tenant** in Outpost. + +**Typical flow:** + +1. **Map your customer to a tenant** — Each organization, team, or account in your app should have a stable **tenant id** in Outpost (often the same id you already use internally). Create or upsert that tenant when the customer is ready to use outbound events (onboarding, first visit to integrations, etc.). +2. **Each tenant has zero or more destinations** — A **destination** is a concrete subscription: it combines a **destination type** (webhook, SQS, Hookdeck, …), one or more **topics** the customer wants to receive, and **type-specific configuration** (for a webhook, the HTTPS **endpoint URL** and signing secret; for a queue, the queue identifier; and so on). One tenant may have several destinations (e.g. production vs staging endpoints, or different systems). +3. **Your backend publishes events** — When something happens, your **server** calls the publish API (or SDK) with **`tenant_id`**, **`topic`**, and payload metadata. Outpost does **not** infer the tenant from the browser; publishing uses your **platform** credentials and explicit tenant scope. +4. **Outpost delivers to matching destinations** — For that tenant, every destination whose **topic subscription** includes the event’s topic gets a delivery attempt. A single publish can fan out to **many** destinations or to **none** if no destination subscribes to that topic. + +**What to build in your UI (conceptually):** screens or flows scoped to the **current customer** (tenant): list their **destinations**, **create or edit** a destination (choose type → choose topics → enter URL or other config), and surfaces for **events and delivery attempts** when you want users to inspect what was sent and how delivery behaved. Your UI talks to Outpost **through your backend** (recommended) or via **per-tenant JWT**, never by embedding your platform API key in the browser. 
See the [Building your own UI](/docs/guides/building-your-own-ui) guide for screen-level structure and API patterns. + +For topic subscription behavior (wildcard `*`, multiple topics, fan-out), see [Topics](/docs/features/topics). + ## Models -- **Tenants**: A tenant represents a user/team/organization in your product. -- **Destination Types**: The type of destination where events will be delivered. For example, webhook, Hookdeck, or AWS SQS. -- **Destinations**: A destination is a specific instance of a destination type. For example, a webhook destination with a specific URL. -- **Topics**: A topic is a way to categorize events and is a common concept found in Pub/Sub messaging. For example, a `user.created` event might be categorized under the user topic. -- **Events**: An event is a piece of data that represents an action that occurred in your system. For example, a user signed up or a payment was processed. -- **Delivery Attempts**: A delivery attempt represents the result of an attempt to deliver an event to a destination. +- **Tenants**: A tenant represents a user, team, or organization **in your product**—the customer who owns their own destinations and receives their own deliveries. +- **Destination types**: The kind of endpoint where events are delivered (webhook, Hookdeck, AWS SQS, …). The set of types is configured on the Outpost deployment. +- **Destinations**: A **subscription** for one tenant: an instance of a destination type plus **which topics** to receive and **where** to deliver (webhook URL, queue name, Hookdeck token, etc.). A tenant may have **many** destinations. +- **Topics**: Labels for categories of events (e.g. `user.created`). Your platform configures which topics exist; destinations **subscribe** to one or more topics; publish calls include a **topic** so Outpost knows which subscriptions match. +- **Events**: A unit of something that happened in your system, published into Outpost with tenant, topic, and payload. 
Delivery attempts record how each destination received (or failed) that event. +- **Delivery attempts**: The outcome of trying to deliver one event to one destination (success, failure, retries, response metadata). ## Architecture @@ -41,9 +58,9 @@ Required for log storage. - PostgreSQL - ClickHouse -## Tenant Destination Types +## Supported destination types -Event destination types belonging to Outpost tenants where events are delivered. +These are the **destination types** your tenants can choose when creating a destination (see **Models** above). - Webhooks - Hookdeck Event Gateway diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx index 2edbcc6ad..73fe8c135 100644 --- a/docs/pages/guides/building-your-own-ui.mdx +++ b/docs/pages/guides/building-your-own-ui.mdx @@ -10,20 +10,50 @@ Within this guide, we will use the User Portal as a reference implementation for In this guide, we will assume you are using React (client-side) to build your own UI, but the same principles can be applied to any other framework. +## UI structure and flow + +Outpost’s tenant portal is a good reference for how screens map to the **tenant → destinations → topics → delivery target** model. When you build your own UI, keep the same structure so operators and end users are not forced into a misleading “single global webhook URL” mental model. + +**Tenant context** + +- Everything below is **scoped to one tenant**—the signed-in customer in your SaaS or the account selected in your platform. That tenant id is what you pass to Outpost when listing or creating destinations and when publishing from your backend. +- If you use JWT auth against Outpost, the token is issued **for that tenant**; if you proxy through your API, your routes should resolve the current customer to a `tenant_id` and forward it on list/create/publish calls. 
+ +**Recommended areas / screens** + +| Area | Purpose | +| ---- | ------- | +| **Destinations list** | Show all destinations for the current tenant (each row is one subscription: type, human-readable **target** such as webhook URL, subscribed topics). Entry point to edit, disable, or remove. | +| **Create destination** | Multi-step flow aligned with the API: (1) **choose destination type**, (2) **select topics** (from the topics configured on your Outpost project—often checkboxes or multi-select), (3) **configure** type-specific fields (e.g. webhook URL, credentials). Optional: instructions or remote setup links from the destination type schema. | +| **Events and delivery attempts** | List recent events for the tenant and inspect **delivery attempts** per event or destination so users can see outcomes, failures, and retries—similar to the portal’s event and log experience. | + +For how tenants, destinations, and topics fit together in a multi-tenant product, see [Outpost Concepts](/docs/concepts)—especially **How this fits your product**. + ## Authentication To perform API calls on behalf of your tenants, you can either generate a JWT token, which can be used client-side to make Outpost API calls, or you can proxy any API requests to the Outpost API through your own API. When proxying through your own API, you can ensure the API call is made for the currently authenticated tenant using the API `tenant_id` parameter. Proxying through your own API can be useful if you want to limit access to some configuration or functionality of Outpost. +### API base URL (managed and self-hosted) + +Examples below use a single variable **`API_URL`** (or **`OUTPOST_API_BASE_URL`** in shell snippets): the **root URL for Outpost’s HTTP API**, with **no trailing slash**. Paths in this guide match the [OpenAPI specification](/docs/api) (`/tenants/...`, `/topics`, `/destination-types`, …). 
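
One way to enforce the "no trailing slash" rule is to normalize the base URL once and join paths through a single helper. This is a minimal sketch: `normalizeApiUrl` and `apiPath` are hypothetical names, and `https://outpost.example.com/api/v1` is a placeholder, not a real endpoint.

```typescript
// Trim whitespace and strip trailing slashes from the configured API root.
function normalizeApiUrl(raw: string): string {
  return raw.trim().replace(/\/+$/, "");
}

// Join the API root and a path with exactly one slash between them.
function apiPath(apiUrl: string, path: string): string {
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return `${normalizeApiUrl(apiUrl)}${suffix}`;
}

// Placeholder base URL; substitute your managed or self-hosted root.
const exampleUrl = apiPath("https://outpost.example.com/api/v1/", "topics");
console.log(exampleUrl);
```

With a helper like this, a stray trailing slash in configuration does not produce `//` in request URLs.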
+ +- **Hookdeck Outpost (managed):** use the base URL from your project (for example `https://api.outpost.hookdeck.com/2025-07-01`). The [managed curl quickstart](/docs/quickstarts/hookdeck-outpost-curl) uses the same pattern. +- **Self-hosted Outpost:** use your deployment’s public origin **plus** whatever path prefix your install uses (commonly **`/api/v1`**), e.g. `https://outpost.internal.example.com/api/v1`. For local dev, use your actual host and port (see your deployment docs—do not assume a specific port in shared snippets). + +Do **not** hardcode `localhost` in product docs or copy-paste snippets meant for operators; always substitute your real base URL. The React snippets assume `API_URL` already includes any `/api/v1` segment so that `${API_URL}/tenants/destinations` resolves correctly for your environment. + ### Generating a JWT Token (Optional) You can generate a JWT token by using the [Tenant JWT Token API](/docs/api/tenants#get-tenant-jwt-token). ```bash -curl --location 'localhost:3333/api/v1/tenants//token' \ - --header 'Content-Type: application/json' \ - --header 'Authorization: Bearer ' \ +export OUTPOST_API_BASE_URL="https://api.outpost.hookdeck.com/2025-07-01" # or your self-hosted root, e.g. 
…/api/v1 +TENANT_ID="" + +curl --request GET "$OUTPOST_API_BASE_URL/tenants/$TENANT_ID/token" \ + --header "Authorization: Bearer " ``` ## Fetching Destination Type Schema @@ -36,14 +66,15 @@ Destinations are listed using the [List Destinations API](/docs/api/destinations ```tsx // React example to fetch and render a list of destinations +// API_URL = Outpost API root (managed project URL or self-hosted origin + /api/v1) const [destinations, setDestinations] = useState([]); const [destination_types, setDestinationTypes] = useState([]); const fetchDestinations = async () => { - // Get the tenant destinations - const response = await fetch(`${API_URL}/api/v1/tenants/destinations`, { + // Get the tenant destinations (JWT infers tenant — see Authentication API) + const response = await fetch(`${API_URL}/tenants/destinations`, { headers: { Authorization: `Bearer ${token}`, }, @@ -54,8 +85,7 @@ const fetchDestinations = async () => { }; const fetchDestinationTypes = async () => { - // Get the destination types schemas - const response = await fetch(`${API_URL}/api/v1/destination-types`, { + const response = await fetch(`${API_URL}/destination-types`, { headers: { Authorization: `Bearer ${token}`, }, @@ -120,8 +150,7 @@ The list of available destination types is rendered from the list of destination const [destination_types, setDestinationTypes] = useState([]); const fetchDestinationTypes = async () => { - // Get the destination types schemas - const response = await fetch(`${API_URL}/api/v1/destination-types`, { + const response = await fetch(`${API_URL}/destination-types`, { headers: { Authorization: `Bearer ${token}`, }, @@ -183,7 +212,7 @@ Available topics are returned from the [List Topics API](/docs/api/topics#list-t const [topics, setTopics] = useState([]); const fetchTopics = async () => { - const response = await fetch(`${API_URL}/api/v1/topics`, { + const response = await fetch(`${API_URL}/topics`, { headers: { Authorization: `Bearer ${token}`, }, @@ -341,7 
+370,7 @@ Events are listed using the [List Events API](/docs/api/events#list-events). You const [events, setEvents] = useState([]); const fetchEvents = async () => { - const response = await fetch(`${API_URL}/api/v1/tenants/events`, { + const response = await fetch(`${API_URL}/tenants/events`, { headers: { Authorization: `Bearer ${token}`, }, diff --git a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx index 8e6afe122..a36ea94a9 100644 --- a/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx +++ b/docs/pages/quickstarts/hookdeck-outpost-agent-prompt.mdx @@ -35,7 +35,10 @@ Use this **Hookdeck Console Source** URL to verify event delivery (the webhook ` - Go quickstart (Go SDK): {{DOCS_URL}}/quickstarts/hookdeck-outpost-go - Full docs bundle (when available on the public site): {{LLMS_FULL_URL}} - API reference and OpenAPI (REST JSON shapes and status codes): {{DOCS_URL}}/api +- **Concepts — how tenants, destinations (subscriptions), topics, and publish fit a SaaS/platform:** {{DOCS_URL}}/concepts +- **Building your own UI — screen structure and flow** (list destinations, create destination: type → topics → config; tenant scope): {{DOCS_URL}}/guides/building-your-own-ui - Destination types: {{DOCS_URL}}/destinations +- Topics and destination subscriptions (fan-out, `*`): {{DOCS_URL}}/features/topics - SDK overview (mostly TypeScript-shaped examples): {{DOCS_URL}}/sdks — use **only** for high-level context; for **TypeScript, Python, or Go** code, follow that language’s **quickstart** for correct method signatures (e.g. Python `publish.event` uses `request={{...}}`, not TypeScript-style spreads as Python kwargs). ### Language → SDK vs HTTP @@ -52,7 +55,7 @@ Operators rarely name packages or SDK details. **You** map what they say to the Do **not** mix patterns across languages (e.g. do not apply TypeScript `publish.event({ ... })` argument style to Python). 
-**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. +**Option 2 (small app)** — Map framework to the matching official SDK on the **server only**: e.g. **Next.js** → TypeScript SDK + patterns from the TypeScript quickstart and your Next conventions; **FastAPI** → Python SDK; **Go + net/http** → Go SDK. Prefer each language’s **quickstart** for Outpost call shapes. **Before** designing pages or forms, read **Concepts** and **Building your own UI** in the Documentation list: the UI should reflect **tenant scope**, **multiple destinations per tenant**, and **destination = topic subscription + delivery target** (not a single anonymous webhook field unless the user explicitly asks for that simplified shape). **Option 3 (existing app)** — Use the **official SDK for the repo’s language** on the server (or REST/OpenAPI if they insist on no SDK). Read that language’s quickstart for shapes; integrate on **real** domain paths, not throwaway demos. @@ -62,7 +65,7 @@ Guide the conversation, then act: 1. **Try it out** — Minimal path: tenant → webhook destination → publish → print event id (or show success). If they want the **simplest** path, default to **curl** without making them say “curl.” If they name **TypeScript**, **Python**, or **Go**, use **only** that language’s quickstart and implied SDK. Ask only for what the quickstart and runnability still need (env vars, etc.). -2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. +2. **Build a minimal example** — Small UI + server; use the **SDK for that stack** (see **Option 2** above) or REST if they choose HTTP-only. 
Follow **Concepts** + **Building your own UI** for the real product model. For a **tiny** demo (e.g. one page), still keep the model visible: **tenant** in scope, **create destination** as **topics + delivery target** (not one undifferentiated “webhook” field that hides topics), and a **separate** control or flow to **publish a test event** so the operator can verify delivery—avoid collapsing tenant setup, destination creation, and publish into a single form unless the user insists. An events or attempts view is optional for the smallest demo but matches the portal pattern when you have room. 3. **Integrate with an existing app** — Clone or open their codebase; add Outpost per **Option 3** above; document env vars and operator steps. @@ -70,7 +73,7 @@ For all modes, read the **single** language-appropriate quickstart (and OpenAPI **Files on disk:** When your environment supports it, **write runnable artifacts into the operator’s project workspace** (real files: scripts, app source, `package.json`, `go.mod`, README) rather than only pasting long code in chat—so they can run, diff, and commit. Keep everything for one task in the same directory. For **minimal example apps** (option 2), scaffold and install dependencies there as you normally would (for example `npm` / `npx`, `go mod`, `pip` or `uv`). -**Concepts:** Each tenant is one of the platform's customers. Destinations are where events are delivered (webhook URLs, queues, etc.). Events are published with a **topic**; only destinations subscribed to that topic receive the event. Topics for this project are listed above and were configured in the Hookdeck dashboard. +**Concepts:** Each **tenant** is one of the platform’s customers. A tenant has **zero or more destinations**; each **destination** is a **subscription**—a destination type (e.g. webhook) plus **which topics** to receive and **where** to deliver (e.g. HTTPS URL). 
Your **backend** publishes with **`tenant_id`**, **`topic`**, and payload; Outpost fans out to every destination of that tenant that subscribes to that topic. Read **{{DOCS_URL}}/concepts** and **{{DOCS_URL}}/guides/building-your-own-ui** for the full model and recommended screens. Topics for this project are listed above and were configured in the Hookdeck dashboard. ``` ## Placeholder reference From 1c6042be32ff39c6f54b6a5938f19a3bb53b232f Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Thu, 9 Apr 2026 11:41:38 +0100 Subject: [PATCH 13/47] =?UTF-8?q?docs(agent-eval):=20record=20scenario=200?= =?UTF-8?q?6=E2=80=9307=20runs=20and=20execution=20passes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Made-with: Cursor --- docs/agent-evaluation/SCENARIO-RUN-TRACKER.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md index e9b506e5c..7bc14b506 100644 --- a/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md +++ b/docs/agent-evaluation/SCENARIO-RUN-TRACKER.md @@ -18,18 +18,18 @@ Use this table while you **run scenarios one at a time** and **execute the gener ## Tracker -| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | -| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. 
Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | -| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | -| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | -| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | -| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). 
| -| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | | | | | | -| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | | | | | | -| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | -| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | -| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | +| ID | Scenario file | Run directory (`results/runs/…`) | Heuristic | LLM judge | Execution (generated code) | Notes | +| --- | ------------------------------------------------------------------------------ | -------------------------------------- | ---------------------- | --------- | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 01 | [01-basics-curl.md](scenarios/01-basics-curl.md) | `2026-04-08T15-07-08-923Z-scenario-01` | Pass (7/7) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.sh`**. Execution: `docs/agent-evaluation/.env` + `./outpost-quickstart.sh`; tenant 200, destination 201, publish **202**; exit 0. | +| 02 | [02-basics-typescript.md](scenarios/02-basics-typescript.md) | `2026-04-08T15-16-50-424Z-scenario-02` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost-quickstart.ts`** + `package.json` (SDK). Ran `npx tsx outpost-quickstart.ts` with `docs/agent-evaluation/.env`; tenant, destination, publish OK. | +| 03 | [03-basics-python.md](scenarios/03-basics-python.md) | `2026-04-08T15-34-12-720Z-scenario-03` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifact: `**outpost_quickstart.py`**. 
Execution: `python3 -m venv .venv`, `pip install outpost_sdk`, `docs/agent-evaluation/.env` + `python outpost_quickstart.py`. | +| 04 | [04-basics-go.md](scenarios/04-basics-go.md) | `2026-04-08T15-48-31-367Z-scenario-04` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. Artifacts: `**main.go`**, `go.mod` (replace → repo `sdks/outpost-go`). `docs/agent-evaluation/.env` + `go run .`; tenant, destination, publish OK. | +| 05 | [05-app-nextjs.md](scenarios/05-app-nextjs.md) | `2026-04-08T17-21-22-170Z-scenario-05` | 9/10; overall **Fail** | Pass | Pass | `**nextjs-webhook-demo/`** — primary assessed run; see **§ Scenario 05 — assessment (`17-21-22`)**. Heuristic failure is `managed_base_not_selfhosted` (doc-corpus), not the app. Earlier: `2026-04-08T16-12-10-708Z` (`outpost-nextjs-demo/`, 10/10 heuristic, simpler UI). | +| 06 | [06-app-fastapi.md](scenarios/06-app-fastapi.md) | `2026-04-09T08-38-42-008Z-scenario-06` | Pass (8/8) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. `**main.py`** + `requirements.txt`, `outpost_sdk` + FastAPI. HTML: destinations list, add webhook (topics from API + URL), publish test event, delete. Execution: `python3 -m venv .venv`, `pip install -r requirements.txt`, run-dir `.env`, `uvicorn main:app` on :8766; **GET /** 200, **POST /destinations** 303, **POST /publish** 303. | +| 07 | [07-app-go-http.md](scenarios/07-app-go-http.md) | `2026-04-09T09-10-23-291Z-scenario-07` | Pass (9/9) | Pass | Pass | `EVAL_LOCAL_DOCS=1`. **`go-portal-demo/`** — `main.go` + `templates/`, `net/http`, `outpost-go` (`replace` → repo `sdks/outpost-go`). Multi-step create destination + **GET/POST /publish**. Execution: `PORT=8777` + key/base from `docs/agent-evaluation/.env`; **GET /** 200, **POST /publish** 200. Eval ~25 min wall time. 
| +| 08 | [08-integrate-nextjs-existing.md](scenarios/08-integrate-nextjs-existing.md) | | | | | | +| 09 | [09-integrate-fastapi-existing.md](scenarios/09-integrate-fastapi-existing.md) | | | | | | +| 10 | [10-integrate-go-existing.md](scenarios/10-integrate-go-existing.md) | | | | | | ### Column hints @@ -66,7 +66,7 @@ Use short text or symbols in cells, e.g. **Pass** / **Fail** / **Skip** / **N/A* **What the app demonstrates (UX / model):** 1. **Tenant** — Editable tenant id; copy states destinations and publishes are scoped to it. -2. **Register webhook destination** — URL field + **topic checkboxes** populated from `**GET /api/topics`** (server lists topics from Outpost); `**POST /api/destinations**` upserts tenant and creates webhook destination for selected topics. +2. **Register webhook destination** — URL field + **topic checkboxes** populated from `**GET /api/topics`** (server lists topics from Outpost); `**POST /api/destinations`** upserts tenant and creates webhook destination for selected topics. 3. **Destinations list** — `**GET /api/destinations?tenantId=`** table (type, target, topics) with refresh — matches “tenant → many destinations” mental model. 4. **Publish test event** — Separate action; `**POST /api/publish`** with chosen topic; UI notes fan-out to matching destinations. From 89afda8ec7bb8ef0b0f07f997ff89d03a65c9e87 Mon Sep 17 00:00:00 2001 From: Phil Leggetter Date: Thu, 9 Apr 2026 16:32:29 +0100 Subject: [PATCH 14/47] docs: fix List Topics UI example for string[] API response MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GET /topics returns a JSON array of topic names (OpenAPI). The React snippet incorrectly treated items as objects with id and name, which misled readers and agent integrations. Use the string as key, value, and label to match the API and TypeScript SDK (topicsList → Array). 
Made-with: Cursor --- docs/pages/guides/building-your-own-ui.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/pages/guides/building-your-own-ui.mdx b/docs/pages/guides/building-your-own-ui.mdx index 73fe8c135..3b5e1711b 100644 --- a/docs/pages/guides/building-your-own-ui.mdx +++ b/docs/pages/guides/building-your-own-ui.mdx @@ -235,9 +235,9 @@ return (

Select topics

{topics.map((topic) => ( -