Merged
48 commits
1638fe1
docs: add Hookdeck Outpost managed quickstarts and agent prompt
leggetter Apr 7, 2026
e089721
docs: add Outpost agent evaluation harness and scenarios
leggetter Apr 8, 2026
76d7c9b
docs(agent-eval): prompt mapping, scenarios, harness; reset scenario …
leggetter Apr 8, 2026
3bc5469
docs(agent-eval): record fresh scenario 01 eval run in tracker
leggetter Apr 8, 2026
241dae6
fix(agent-eval): remove harness-only 202/head hints from local docs b…
leggetter Apr 8, 2026
6b1fd4b
docs(agent-eval): update scenario 01 tracker after re-run and executi…
leggetter Apr 8, 2026
556b77f
docs(agent-eval): record scenario 02 run and execution pass
leggetter Apr 8, 2026
46e6dcc
docs(agent-eval): fix tracker table formatting and artifact markdown
leggetter Apr 8, 2026
f57b59d
docs(agent-eval): record scenario 03 run and execution pass
leggetter Apr 8, 2026
803b51c
docs(agent-eval): record scenario 04 run and execution pass
leggetter Apr 8, 2026
f600652
docs(agent-eval): record scenario 05 run and execution pass
leggetter Apr 8, 2026
e1e5154
docs: Outpost mental model, UI guide agnostic URLs, agent prompt links
leggetter Apr 8, 2026
1c6042b
docs(agent-eval): record scenario 06–07 runs and execution passes
leggetter Apr 9, 2026
89afda8
docs: fix List Topics UI example for string[] API response
leggetter Apr 9, 2026
78845ab
feat(agent-eval): declarative pre-steps via Eval harness section
leggetter Apr 9, 2026
77f2608
docs(agent-eval): harness blocks for existing-app scenarios 08–10
leggetter Apr 9, 2026
e766d98
chore(agent-eval): update SCENARIO-RUN-TRACKER for recent runs
leggetter Apr 9, 2026
43a3f3c
docs(pages): guide Option 3 full-stack Outpost integration
leggetter Apr 9, 2026
7ecdbee
docs(agent-eval): scenario 09 uses full-stack FastAPI template
leggetter Apr 9, 2026
12bca0d
docs(agent-eval): record scenario 09 re-eval after prompt update
leggetter Apr 9, 2026
9bad021
docs: scenario 09 tracker, agent prompt, BYO UI events/retry guidance
leggetter Apr 9, 2026
7ab552d
docs: refresh Building Your Own UI guide
leggetter Apr 10, 2026
320c039
docs(eval): align scenarios 08–10, prompt, and heuristics
leggetter Apr 10, 2026
97aaa24
docs(eval): de-meta user turns in scenarios 8–10
leggetter Apr 10, 2026
d5eef91
feat(eval): extend scenario 09 transcript heuristics
leggetter Apr 10, 2026
e415e33
feat(eval): persist run lifecycle sidecars
leggetter Apr 10, 2026
cbb6c51
docs(eval): authoring AGENTS, README, shared Cursor rule
leggetter Apr 10, 2026
ce0be6b
feat(agent-evaluation): read/bash sandbox and sibling harness sidecars
leggetter Apr 10, 2026
8ab658f
docs(agent-evaluation): document sidecars, sandbox, and env vars
leggetter Apr 10, 2026
cc6e7e0
docs(agent-evaluation): update scenario 01 tracker row
leggetter Apr 10, 2026
cee7ff4
docs(agent-evaluation): update scenario 02 tracker row
leggetter Apr 10, 2026
60f73f4
docs: scope-router Outpost agent prompt and refresh basics tracker rows
leggetter Apr 10, 2026
33653dd
fix(api): add DestinationSchemaField.key to OpenAPI spec
leggetter Apr 10, 2026
e7d2209
docs: refine Building your own UI guide and onboarding agent prompt
leggetter Apr 10, 2026
8f240ec
docs(eval): tighten scenarios 08–10 and transcript heuristics
leggetter Apr 10, 2026
2031762
docs(eval): document wall time for heavy baseline scenarios
leggetter Apr 10, 2026
4186ca3
docs(eval): update scenario run tracker for scenario 08
leggetter Apr 10, 2026
8bf761a
Merge origin/feat/refactor-docs into feat/hookdeck-outpost-quickstarts
leggetter Apr 10, 2026
c83d43d
docs: drop destinations overview hub; clarify OSS hosting in concepts
leggetter Apr 10, 2026
14ab5a2
docs: use hookdeck.com/docs/outpost for production doc links
leggetter Apr 10, 2026
81b2aff
docs(eval): Hookdeck prod as default {{DOCS_URL}}; fix harness doc paths
leggetter Apr 10, 2026
4d7a91a
docs(agent-evaluation): refresh tracker, scenarios, and harness docs
leggetter Apr 10, 2026
b7316f4
docs(eval): record scenario 10 pass in run tracker
leggetter Apr 10, 2026
66fd663
ci(docs): agent eval workflow with live Outpost execution
leggetter Apr 10, 2026
736a23f
ci(docs): allow workflow_dispatch for manual agent eval runs
leggetter Apr 10, 2026
9ab3771
ci(docs): fix workflow YAML (paths vs paths-ignore); document dispatch
leggetter Apr 10, 2026
49d5713
fix(agent-eval): eval:ci argv — drop stray -- before --scenarios
leggetter Apr 10, 2026
052e48f
fix(agent-eval): execution defaults, smoke test, CI env for live Outpost
leggetter Apr 11, 2026
14 changes: 14 additions & 0 deletions .cursor/rules/agent-evaluation-authoring.mdc
@@ -0,0 +1,14 @@
---
description: Authoring standards for docs/agent-evaluation (no eval leakage in user turns)
globs: docs/agent-evaluation/**/*
---

When editing anything under `docs/agent-evaluation/`, read and follow **`docs/agent-evaluation/AGENTS.md`**.

**Quick guardrails for `scenarios/*.md`:**

- **`### Turn N — User`** blockquotes = in-character **product engineer** speech only.
- **Never** in user lines: `Option 1/2/3`, `Turn 0`, `scenario`, `eval`, `success criteria`, `scoreScenario`, references to “the prompt/instructions you already have” or named template sections.
- Put rubric detail in **`## Success criteria`** / **Intent** / **Failure modes**, not in the user quote.

Full checklist and rationale: **`docs/agent-evaluation/AGENTS.md`**.
70 changes: 70 additions & 0 deletions .github/workflows/docs-agent-eval-ci.yml
@@ -0,0 +1,70 @@
# Runs scenarios 01+02 (curl + TypeScript SDK) with heuristic + LLM judge.
# Sets EVAL_LOCAL_DOCS=1 so the agent reads repo docs under docs/ (not production WebFetch).
# Triggers: workflow_dispatch, or push (main) / pull_request when docs / OpenAPI / agent-eval / TS SDK paths change.
# Each run bills Anthropic (agent + judge).
# Requires repo secrets: ANTHROPIC_API_KEY, EVAL_TEST_DESTINATION_URL, OUTPOST_API_KEY
# (OUTPOST_TEST_WEBHOOK_URL uses the same URL as EVAL_TEST_DESTINATION_URL in CI.)
# See docs/agent-evaluation/README.md § CI (recommended slice).
name: Docs agent eval (CI slice)

on:
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - "docs/content/**"
      - "docs/apis/**"
      - "docs/agent-evaluation/**"
      - "docs/README.md"
      - "docs/AGENTS.md"
      - "sdks/outpost-typescript/**"
      - ".github/workflows/docs-agent-eval-ci.yml"
  pull_request:
    paths:
      - "docs/content/**"
      - "docs/apis/**"
      - "docs/agent-evaluation/**"
      - "docs/README.md"
      - "docs/AGENTS.md"
      - "sdks/outpost-typescript/**"
      - ".github/workflows/docs-agent-eval-ci.yml"

jobs:
  eval-ci:
    # Fork PRs cannot use repository secrets; skip instead of failing a required-looking job.
    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    timeout-minutes: 60
    defaults:
      run:
        working-directory: docs/agent-evaluation

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: npm
          cache-dependency-path: docs/agent-evaluation/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Run eval CI slice (scenarios 01, 02)
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          EVAL_TEST_DESTINATION_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
          EVAL_LOCAL_DOCS: "1"
        run: ./scripts/ci-eval.sh

      - name: Execute generated curl + TypeScript artifacts (live Outpost)
        env:
          OUTPOST_API_KEY: ${{ secrets.OUTPOST_API_KEY }}
          OUTPOST_TEST_WEBHOOK_URL: ${{ secrets.EVAL_TEST_DESTINATION_URL }}
          OUTPOST_API_BASE_URL: https://api.outpost.hookdeck.com/2025-07-01
          OUTPOST_CI_PUBLISH_TOPIC: user.created
        run: ./scripts/execute-ci-artifacts.sh
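Reviewer sketch (not part of the diff): the job-level `if:` guard in the workflow above can be read as a small predicate. The function name and parameters below are hypothetical, and the repository names are example values only. GitHub evaluates the real expression itself, so this is just one way to reason about which events run the job.

```typescript
// Hypothetical helper mirroring the workflow's job-level `if:` expression.
// Names are illustrative; nothing here is called by the workflow.
function shouldRunEvalJob(
  eventName: string,
  headRepoFullName: string | undefined,
  repository: string,
): boolean {
  // Non-PR events (push, workflow_dispatch) always run.
  if (eventName !== "pull_request") return true;
  // PR events run only when the head branch lives in the same repository,
  // because fork PRs do not receive repository secrets.
  // GitHub expressions ignore case when comparing strings, so lower-case both sides.
  return headRepoFullName?.toLowerCase() === repository.toLowerCase();
}

// Example values only.
console.log(shouldRunEvalJob("push", undefined, "hookdeck/outpost"));
console.log(shouldRunEvalJob("pull_request", "someone/outpost", "hookdeck/outpost"));
```

Under these assumptions the first call reports that the job runs and the second reports that it is skipped, which matches the intent stated in the workflow comment: skip fork PRs rather than fail for lack of secrets.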
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,10 +1,15 @@
# Environment variables
.env
.env.ci
.outpost.yaml

# Built binaries
/dist
/bin

# Documentation (local build artifacts; content lives under docs/content/)
/docs/dist/
/docs/TEMP-*.md
/tmp

# Golang test coverage
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -0,0 +1,5 @@
# Coding agent notes (Outpost)

When you change files under **`docs/agent-evaluation/`** (scenarios, scoring, harness docs), read and apply **[`docs/agent-evaluation/AGENTS.md`](docs/agent-evaluation/AGENTS.md)** first. It defines anti–“teach to the test” rules for user-turn wording and scenario structure.

For this repo’s PR review format, see **`CLAUDE.md`**.
22 changes: 11 additions & 11 deletions README.md
@@ -53,7 +53,7 @@ Outpost is built and maintained by [Hookdeck](https://hookdeck.com?ref=github-ou

![Outpost architecture](docs/public/images/architecture.png)

-Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn more about the Outpost architecture and design.
+Read [Outpost Concepts](https://hookdeck.com/docs/outpost/concepts) to learn more about the Outpost architecture and design.

## Features

@@ -70,17 +70,17 @@ Read [Outpost Concepts](https://outpost.hookdeck.com/docs/concepts) to learn mor
- **Webhook best practices**: Opt-out webhook best practices, such as headers for idempotency, timestamp and signature, and signature rotation.
- **SDKs and MCP server**: Go, Python, and TypeScript SDK are available. Outpost also ships with an MCP server. All generated by [Speakeasy](https://speakeasy.com).

-See the [Outpost Features](https://outpost.hookdeck.com/docs/features) for more information.
+See the [Outpost Features](https://hookdeck.com/docs/outpost/features) for more information.

## Documentation

-- [Overview](https://outpost.hookdeck.com/docs/overview)
-- [Concepts](https://outpost.hookdeck.com/docs/concepts)
-- [Quickstarts](https://outpost.hookdeck.com/docs/quickstarts)
-- [Features](https://outpost.hookdeck.com/docs/features)
-- [Guides](https://outpost.hookdeck.com/docs/guides)
-- [API Reference](https://outpost.hookdeck.com/docs/api)
-- [Configuration Reference](https://outpost.hookdeck.com/docs/references/configuration)
+- [Overview](https://hookdeck.com/docs/outpost/overview)
+- [Concepts](https://hookdeck.com/docs/outpost/concepts)
+- [Quickstarts](https://hookdeck.com/docs/outpost/quickstarts)
+- [Features](https://hookdeck.com/docs/outpost/features)
+- [Guides](https://hookdeck.com/docs/outpost/guides)
+- [API Reference](https://hookdeck.com/docs/outpost/api)
+- [Configuration Reference](https://hookdeck.com/docs/outpost/self-hosting/configuration)

_The Outpost documentation is built using the [Zudoku documentation framework](https://zuplo.link/outpost)._

@@ -144,7 +144,7 @@ For other cloud Redis services or self-hosted Redis clusters, set `REDIS_CLUSTER
```sh
go run cmd/redis-debug/main.go your-redis-host 6379 password 0 [tls] [cluster]
```
-See the [Redis Troubleshooting Guide](https://docs.outpost.hookdeck.com/references/troubleshooting-redis) for detailed guidance.
+See the [Redis Troubleshooting Guide](https://hookdeck.com/docs/outpost/self-hosting/guides/troubleshooting-redis) for detailed guidance.

Start the Outpost dependencies and services:

@@ -241,7 +241,7 @@ Open the `redirect_url` link to view the Outpost portal.

![Dashboard homepage](docs/public/images/dashboard-homepage.png)

-Continue to use the [Outpost API](https://outpost.hookdeck.com/docs/api) or the Outpost portal to add and test more destinations.
+Continue to use the [Outpost API](https://hookdeck.com/docs/outpost/api) or the Outpost portal to add and test more destinations.

## Contributing

2 changes: 1 addition & 1 deletion build/entrypoint.sh
@@ -23,7 +23,7 @@ if ! /usr/local/bin/outpost migrate init --current --log-format=json; then
echo " docker run --rm hookdeck/outpost migrate --help"
echo ""
echo "Learn more about Outpost migration workflow at:"
-echo " https://outpost.hookdeck.com/docs/guides/migration"
+echo " https://hookdeck.com/docs/outpost/self-hosting/guides/migration"
echo ""
exit 1
fi
37 changes: 37 additions & 0 deletions docs/agent-evaluation/.env.example
@@ -0,0 +1,37 @@
# Copy to .env and fill in. .env is gitignored at the repo root.

# Required for npm run eval (Claude Agent SDK — calls Anthropic only)
ANTHROPIC_API_KEY=

# Required for Turn 0 template (test webhook URL injected into the prompt)
EVAL_TEST_DESTINATION_URL=

# Strongly recommended for a *full* eval: run the agent’s curl/script/app against a real project.
# The harness does not read this key; you (or a future verifier) use it after the run.
# OUTPOST_API_KEY= # required for ./scripts/execute-ci-artifacts.sh after eval:ci; GitHub Actions CI execution step
# OUTPOST_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
# OUTPOST_TEST_WEBHOOK_URL=https://hkdk.events/your-source-id # often same as EVAL_TEST_DESTINATION_URL
# OUTPOST_CI_PUBLISH_TOPIC=user.created # optional; publish topic for npm run smoke:execute-ci (must exist in project)

# Optional (see npm run eval -- --help)
# EVAL_API_BASE_URL=https://api.outpost.hookdeck.com/2025-07-01
# EVAL_TOPICS_LIST=- user.created
# EVAL_DOCS_URL=https://hookdeck.com/docs/outpost
# EVAL_LOCAL_DOCS=1
# EVAL_LLMS_FULL_URL=
# Default includes Write, Edit, Bash (per-run workspace + installs). Override to narrow:
# EVAL_TOOLS=Read,Glob,Grep,WebFetch,Write,Edit,Bash
# EVAL_MODEL=
# EVAL_MAX_TURNS=40
# Long runs (08–10): periodic stderr heartbeats while each agent query is in flight
# EVAL_PROGRESS=1
# EVAL_PROGRESS_INTERVAL_MS=30000
# EVAL_PERMISSION_MODE=dontAsk
# EVAL_PERSIST_SESSION=true
# Debug only: allow Write/Edit outside the per-run workspace (not recommended)
# EVAL_DISABLE_WORKSPACE_WRITE_GUARD=1

# Scoring is ON by default after each scenario (heuristic + LLM). Opt out:
# EVAL_NO_SCORE_HEURISTIC=1
# EVAL_NO_SCORE_LLM=1
# EVAL_SCORE_MODEL=claude-sonnet-4-20250514
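Reviewer sketch (not part of the diff): the last block of `.env.example` says scoring is on by default and is disabled per scorer with `EVAL_NO_SCORE_HEURISTIC=1` or `EVAL_NO_SCORE_LLM=1`. One way that default-on, opt-out logic might be resolved is shown below. `resolveScoring`, `ScoringPlan`, and the rule that only the literal value `"1"` opts out are assumptions for illustration; the harness's actual parsing is not shown in this diff.

```typescript
// Assumed shape of an environment map (e.g. what a harness would read at startup).
type Env = Record<string, string | undefined>;

interface ScoringPlan {
  heuristic: boolean;
  llm: boolean;
  scoreModel: string | undefined;
}

// Hypothetical resolver: both scorers default to ON; "1" opts out of each.
function resolveScoring(env: Env): ScoringPlan {
  const optedOut = (name: string): boolean => env[name] === "1";
  return {
    heuristic: !optedOut("EVAL_NO_SCORE_HEURISTIC"),
    llm: !optedOut("EVAL_NO_SCORE_LLM"),
    // An unset or empty EVAL_SCORE_MODEL leaves the choice to the caller's default.
    scoreModel: env["EVAL_SCORE_MODEL"] || undefined,
  };
}

// Example: disable only the LLM judge.
console.log(resolveScoring({ EVAL_NO_SCORE_LLM: "1" }));
```

Keeping the flags negative (`EVAL_NO_SCORE_*`) means an empty `.env` still yields a fully scored run, which is consistent with the "ON by default" comment in the file.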
46 changes: 46 additions & 0 deletions docs/agent-evaluation/AGENTS.md
@@ -0,0 +1,46 @@
# Agent evaluation — authoring rules for humans & coding agents

This file applies to **everything under `docs/agent-evaluation/`** (scenarios, README, tracker, harness TypeScript). Follow it when adding or editing eval specs so we do not **teach to the test** or confuse **evaluator docs** with **in-character user speech**.

## Who reads what

| Audience | Content |
|----------|---------|
| **The model under test** | Turn 0 = pasted [`hookdeck-outpost-agent-prompt.mdoc`](../content/quickstarts/hookdeck-outpost-agent-prompt.mdoc) template only, plus **Turn N — User** blockquotes (verbatim user role-play). |
| **Humans / harness** | Intent, preconditions, eval harness JSON, Success criteria, Failure modes, `score-transcript.ts`, README. |

**Never** put harness vocabulary into **user** lines. The user is a product engineer, not an eval runner.

## Anti-leakage rules (user turns)

In **`### Turn N — User`** blockquotes, **do not** use:

- **Option 1 / 2 / 3** (those labels exist only inside the dashboard template; a real user says what they want in plain language).
- **Turn 0**, **Turn 1**, or any **turn** numbering (that is script metadata).
- Phrases like **“the instructions you already have”**, **“the full-stack section of the prompt”**, **“follow the Hookdeck Outpost template”** as a stand-in for requirements (the model already has Turn 0; state the *product ask*, not a pointer to a doc section).
- **“Match the prompt”**, **“dashboard prompt”**, **“eval”**, **“scenario”**, **“success criteria”**, **heuristic names**, **`scoreScenarioNN`**.

**Do** use natural operator language: stack, repo, product behavior, security (key on server), domain topics, README/env, Hookdeck project/topics **as the customer would say them**.

It is fine for **Success criteria**, **Failure modes**, and **Intent** to name `scoreScenarioNN`, Turn 0, Option 3, etc. — those sections are not pasted as the user.

## Alignment without parroting

- **Product bar** (domain publish, topic reconciliation, full-stack UI depth) belongs in **Success criteria** and in the **prompt template** in `hookdeck-outpost-agent-prompt.mdoc`.
- **User turns** should **request outcomes** (“I need customers to see failed deliveries and retry”) not **cite** where in the template that is spelled out.

If you add a new requirement, update **Success criteria** (and heuristics only when a **durable, low–false-positive** check exists). Do not stuff the verbatim rubric into the user quote.

## Pre-merge checklist (scenarios)

Before merging changes to `scenarios/*.md`:

- [ ] Every **`> ...` user** line reads like a **real customer** message (read aloud test).
- [ ] No **Option N** / **Turn 0** / **scenario** / **prompt section** leakage in user blockquotes.
- [ ] **Success criteria** still state the full bar; nothing has been removed from the criteria and left only in user text.
- [ ] If integration depth or rubrics changed, **`src/score-transcript.ts`** and the **README** scenario table are updated to match.

## Where Cursor loads this

- A **repo-root** [`AGENTS.md`](../../AGENTS.md) points here so agents see this folder’s rules.
- [`.cursor/rules/agent-evaluation-authoring.mdc`](../../.cursor/rules/agent-evaluation-authoring.mdc) applies when editing paths under `docs/agent-evaluation/`.
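
Reviewer sketch (not part of the diff): the anti-leakage rules in `docs/agent-evaluation/AGENTS.md` could be backed by a small lint over `scenarios/*.md`. The function below is a hypothetical illustration, not something this PR adds. The banned terms come from the rules above, but the regular expressions, the `findLeaks` name, and the assumption that a user turn runs from a `### Turn N ... User` heading to the next heading are all mine.

```typescript
// Patterns derived from the anti-leakage list; the exact matching is an assumption.
const BANNED: RegExp[] = [
  /\boption\s+[123]\b/i,
  /\bturn\s+\d+\b/i,
  /\bscenario\b/i,
  /\beval\b/i,
  /\bsuccess criteria\b/i,
  /scoreScenario/i,
  /\bdashboard prompt\b/i,
  /\bmatch the prompt\b/i,
];

interface Leak {
  line: number; // 1-based line number in the scenario file
  text: string;
  pattern: string;
}

// Scans only blockquote lines that sit under a user-turn heading.
function findLeaks(markdown: string): Leak[] {
  const leaks: Leak[] = [];
  let inUserTurn = false;
  markdown.split("\n").forEach((raw, i) => {
    const line = raw.trim();
    if (line.startsWith("#")) {
      // A "### Turn N <separator> User" heading opens a user turn; any other heading closes it.
      inUserTurn = /^###\s+Turn\s+\d+\s+\S+\s+User\b/.test(line);
      return;
    }
    if (!inUserTurn || !line.startsWith(">")) return;
    for (const re of BANNED) {
      if (re.test(line)) leaks.push({ line: i + 1, text: line, pattern: re.source });
    }
  });
  return leaks;
}
```

A check like this only catches literal vocabulary. The "read aloud test" in the pre-merge checklist still needs a human, since a user line can leak rubric detail without using any banned word.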