Skip to content

Fix self-hosting secrets and env docs#774

Merged
simple-agent-manager[bot] merged 1 commit intomainfrom
sam/self-hosting-docs-env-fixes
Apr 21, 2026
Merged

Fix self-hosting secrets and env docs#774
simple-agent-manager[bot] merged 1 commit intomainfrom
sam/self-hosting-docs-env-fixes

Conversation

@simple-agent-manager
Copy link
Copy Markdown
Contributor

@simple-agent-manager simple-agent-manager Bot commented Apr 21, 2026

Summary

  • Correct self-hosting docs for Cloudflare API token scopes, health checks, DNS routing, runtime vars, and Worker secrets.
  • Fix GitHub Actions webhook secret mapping from GH_WEBHOOK_SECRET to Worker GITHUB_WEBHOOK_SECRET.
  • Align env/config references with current task-title, trial orchestrator, and TTS defaults.

Validation

  • pnpm lint
  • pnpm typecheck
  • pnpm test
  • Additional validation run: pnpm exec prettier --check ..., bash -n scripts/deploy/configure-secrets.sh, pnpm exec tsc -p scripts/deploy/tsconfig.json --noEmit, pnpm --filter @simple-agent-manager/www build, git diff --check, stale-string grep pass.

Staging Verification (REQUIRED for all code changes — merge-blocking)

All checkboxes below are mandatory for any PR that changes runtime code (.ts, .tsx, .go, etc.). Write N/A: docs-only ONLY if the PR contains zero runtime code changes. See .claude/rules/13-staging-verification.md.

  • Staging deployment green — N/A: docs and deployment-config validation only; no app runtime behavior changed.
  • Live app verified via Playwright — N/A: no UI/runtime app behavior changed.
  • Existing workflows confirmed working — N/A: no UI/runtime app behavior changed; deployment script validation and CI cover the changed path.
  • New feature/fix verified on staging — N/A: no live staging behavior to exercise before merge; this PR updates docs, workflow secret mapping, and deploy validation metadata.
  • Infrastructure verification completed — deployment script syntax, deploy TypeScript check, CI Validate Deploy Scripts, and Pulumi Infrastructure Tests passed for the changed deployment/infra-adjacent paths.
  • Mobile and desktop verification notes added for UI changes — N/A: no UI changes.

Staging Verification Evidence

N/A: this PR does not change app runtime behavior. The infra-adjacent deployment changes were validated with bash -n scripts/deploy/configure-secrets.sh, pnpm exec tsc -p scripts/deploy/tsconfig.json --noEmit, CI Validate Deploy Scripts, and CI Pulumi Infrastructure Tests.

UI Compliance Checklist (Required for UI changes)

  • Mobile-first layout verified — N/A: no UI changes.
  • Accessibility checks completed — N/A: no UI changes.
  • Shared UI components used or exception documented — N/A: no UI changes.
  • Playwright visual audit run locally — N/A: no UI changes.

End-to-End Verification (Required for multi-component changes)

  • Data flow traced from user input to final outcome with code path citations (see .claude/rules/10-e2e-verification.md)
  • Capability test exercises the complete happy path across system boundaries
  • All spec/doc assumptions about existing behavior verified against code (not just "read the code")
  • If any gap exists between automated test coverage and full E2E, manual verification steps documented below

Data Flow Trace

  • GitHub Actions GH_WEBHOOK_SECRET is validated by .github/workflows/deploy-reusable.yml, passed to scripts/deploy/configure-secrets.sh, and mapped to the Worker secret name GITHUB_WEBHOOK_SECRET.
  • Deployment secret requirements are represented in scripts/deploy/types.ts and consumed by deploy validation paths.
  • Runtime Worker vars are defined at top-level in apps/api/wrangler.toml and copied into generated environment sections by scripts/deploy/sync-wrangler-config.ts.
  • The documented /health endpoint matches apps/api/src/index.ts, which returns status: healthy and a timestamp.

Untested Gaps

No live staging deploy was run because this PR does not change app runtime behavior. The changed deploy-script path was validated locally and by CI deploy-script/infrastructure checks.

Post-Mortem (Required for bug fix PRs)

N/A: not a production bug fix. This PR corrects stale self-hosting documentation and deployment secret mapping documentation/configuration.

What broke

N/A: not a production bug fix.

Root cause

N/A: not a production bug fix.

Class of bug

N/A: not a production bug fix.

Why it wasn't caught

N/A: not a production bug fix.

Process fix included in this PR

Updated environment reference and validation documentation in .claude/skills/env-reference/SKILL.md, .claude/agents/env-validator/ENV_VALIDATOR.md, .claude/rules/07-env-and-urls.md, AGENTS.md, CLAUDE.md, and .specify/memory/constitution.md.

Post-mortem file

N/A: not a production bug fix.

Specialist Review Evidence (Required for agent-authored PRs)

  • All dispatched reviewers completed and findings addressed before merge
  • If any reviewer did NOT complete: needs-human-review label added and merge deferred to human
Reviewer Status Outcome
env-validator PASS GH/GITHUB mapping and env reference consistency reviewed directly against code and docs.
doc-sync-validator PASS Self-hosting, configuration, architecture, and docs-site copies aligned with implementation.
cloudflare-specialist PASS Cloudflare token scopes, routes, Worker vars/secrets, and current docs checked.

Exceptions (If any)

  • Scope: no live staging deploy for docs/deploy-script-only correction.
  • Rationale: no application runtime path or UI behavior changed; validation covered formatting, deploy scripts, docs build, typecheck, lint, tests, and CI deploy-script/infra checks.
  • Expiration: this PR only.

Agent Preflight (Required)

  • Preflight completed before code changes

Classification

  • external-api-change
  • cross-component-change
  • business-logic-change
  • public-surface-change
  • docs-sync-change
  • security-sensitive-change
  • ui-change
  • infra-change

External References

Official documentation consulted before coding: Cloudflare Wrangler environments (https://developers.cloudflare.com/workers/wrangler/environments/), Cloudflare API token permissions (https://developers.cloudflare.com/fundamentals/api/reference/permissions/), Cloudflare Origin CA keys/deprecation (https://developers.cloudflare.com/fundamentals/api/get-started/ca-keys/, https://developers.cloudflare.com/fundamentals/api/reference/deprecations/), and Cloudflare Worker routes (https://developers.cloudflare.com/workers/configuration/routing/routes/).

Codebase Impact Analysis

Affected components and paths: .github/workflows/deploy-reusable.yml for GitHub Environment secret validation and pass-through, scripts/deploy/configure-secrets.sh for GH-to-Worker secret mapping, scripts/deploy/types.ts for required secret metadata, scripts/deploy/sync-wrangler-config.ts for route comment accuracy, apps/api/.env.example for current env defaults, docs under docs/, docs-site content under apps/www/src/content/docs/docs/, and agent/reference docs under .claude/, AGENTS.md, CLAUDE.md, and .specify/memory/constitution.md.

Documentation & Specs

Updated docs/guides/self-hosting.md, apps/www/src/content/docs/docs/guides/self-hosting.md, apps/www/src/content/docs/docs/reference/configuration.md, apps/www/src/content/docs/docs/guides/chat-features.md, docs/architecture/secrets-taxonomy.md, docs/architecture/credential-security.md, apps/www/src/content/docs/docs/architecture/security.md, docs/guides/deployment-troubleshooting.md, .claude/skills/env-reference/SKILL.md, .claude/agents/env-validator/ENV_VALIDATOR.md, .claude/rules/07-env-and-urls.md, AGENTS.md, CLAUDE.md, and .specify/memory/constitution.md.

Constitution & Risk Check

Checked Principle XI for configurable values and no hardcoded URLs/timeouts/limits beyond documented defaults. Key risks were GitHub Actions GH_* versus Worker GITHUB_* naming drift, Cloudflare token-scope drift, non-inheritable Wrangler environment vars, and stale self-hosting DNS/health-check instructions; those are now documented and mapped consistently.

@simple-agent-manager simple-agent-manager Bot force-pushed the sam/self-hosting-docs-env-fixes branch from 50cfc45 to 3f1de7f Compare April 21, 2026 10:13
@simple-agent-manager simple-agent-manager Bot force-pushed the sam/self-hosting-docs-env-fixes branch from 3f1de7f to 68f23cf Compare April 21, 2026 10:18
@sonarqubecloud
Copy link
Copy Markdown

@simple-agent-manager simple-agent-manager Bot merged commit 6f13b8f into main Apr 21, 2026
19 checks passed
@simple-agent-manager simple-agent-manager Bot deleted the sam/self-hosting-docs-env-fixes branch April 21, 2026 10:22
simple-agent-manager Bot added a commit that referenced this pull request Apr 23, 2026
* fix: make GH_WEBHOOK_SECRET optional in deploy validation

The webhook secret is not essential for staging testing. Downgrade from
a hard deploy blocker to a notice, matching the existing pattern used
for CF_ORIGIN_CA_KEY. This unblocks staging deployments that have been
failing since PR #774 added the validation check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: create ACP session in ProjectData DO during task execution

The task runner created a D1 agent_sessions record and started the agent
on the node, but never created an ACP session in the ProjectData DO.
Without this, the chat session's agentSessionId lookup returned null,
preventing the browser from establishing an ACP WebSocket connection.
This caused the "Agent offline" banner to appear even while the agent
was actively running and producing output.

Now creates the ACP session (pending → assigned → running) in the
ProjectData DO during the agent_session step, linking it to the chat
session so the browser can connect via ACP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align ACP session ID with D1 agent session ID for WebSocket routing

The browser passes the ACP session ID from the ProjectData DO as the
sessionId query parameter when connecting to the VM agent WebSocket.
But the VM agent looks up sessions by the D1 agent session ID that was
used during createAgentSessionOnNode. These were different IDs, so the
VM agent couldn't find the running session and created a new empty one,
reporting status 'idle' instead of 'ready'.

Fix: pass the D1 agent session ID as an explicit ID when creating the
ACP session in the ProjectData DO, ensuring both IDs match. Add an
optional 'id' parameter to createAcpSession throughout the stack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant