A self-hosted orchestration platform for running coordinated Claude agents at scale — with real-time visibility, persistent memory, and production-ready deployment on Google Cloud Run.
```bash
git clone https://github.com/simonstaton/ClaudeSwarm.git
cd ClaudeSwarm
cp .env.example .env

# Edit .env — you need at minimum:
# ANTHROPIC_API_KEY=sk-ant-...
# API_KEY=use-this-password-to-access-the-ui
# JWT_SECRET=any-random-string

npm run setup
```

This installs all dependencies, creates the shared context directory, and starts the dev server.
Open http://localhost:5173, log in with your API_KEY, and start creating agents.
```
Browser (React SPA) ──► Express API (/api/*) ──► Claude CLI processes
                        • static serving (/*)    • per-agent workspaces
                        • JWT auth               • GCS-synced shared context
```
Single container, single service. The Express server handles both API routes and serves the built React UI. Each agent is an isolated Claude CLI process with its own workspace.
Agents coordinate via an in-memory message bus and shared context files.
- In-memory pub/sub system for real-time inter-agent coordination
- Message types: `task` (assign work), `result` (return results), `question` (ask), `info` (share context), `status` (updates), `interrupt` (redirect a busy agent)
- Messages can be direct (to a specific agent) or broadcast (to all agents)
- SSE stream available at `/api/messages/stream` for real-time UI updates (see the example below)
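For example, posting a task message and following the bus over SSE with curl. The body fields (`from`, `to`, `type`, `content`) are assumptions about the request schema; see the API reference below:

```bash
# Post a direct task message to a specific agent (field names are illustrative).
curl -X POST http://localhost:8080/api/messages \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"from": "orchestrator", "to": "worker-1", "type": "task", "content": "Refactor the auth module"}'

# Follow all bus traffic in real time over SSE.
curl -N -H "Authorization: Bearer $JWT" http://localhost:8080/api/messages/stream
```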
- Agents register their role, capabilities, and current task
- The `/api/agents/registry` endpoint provides discovery for agent-to-agent coordination
- Agents can query the registry to find other agents by role or capability
- Agents can spawn sub-agents via `/api/agents` with `parentId` (see the sketch after this list)
- When a parent agent is destroyed, all child agents are automatically destroyed
- Useful for delegating subtasks to specialized agents
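A minimal sketch of coordination through the platform API. The spawn body fields (`name`, `role`, `prompt`, `parentId`) are illustrative and may differ from the exact schema:

```bash
# Discover existing agents and their roles/capabilities.
curl -H "Authorization: Bearer $JWT" http://localhost:8080/api/agents/registry

# Spawn a sub-agent tied to a parent; destroying the parent destroys this child too.
curl -N -X POST http://localhost:8080/api/agents \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"name": "test-writer", "role": "tester", "parentId": "PARENT_AGENT_ID", "prompt": "Write tests for the auth module"}'
```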
Claude Code has three mechanisms for delegating work. Two are available on this platform; the third is not (yet).
1. Task tool / Subagents (available — fast, invisible)
Claude Code's built-in Task tool spawns a lightweight in-process sub-agent. It runs inside the parent's process, returns results directly, and is invisible to the platform UI. Zero overhead, no process spawning.
"Research the authentication module and summarise how sessions work."
"Read all the test files in src/ and tell me which areas have poor coverage."
The agent handles these internally — sub-agents spin up in-process, do the work, and report back. No new processes, no API calls.
2. Platform API (available — visible, independent)
POST /api/agents spawns a full platform-managed agent with its own Claude CLI process, workspace, and terminal in the UI. Agents coordinate via the message bus (POST /api/messages).
"Spawn a team of 3 agents: one to refactor the auth module, one to update the tests, and one to review their PRs. Coordinate via the message bus."
"Create a long-running code reviewer agent that monitors for new changes."
Each spawned agent appears in the UI, can receive messages, and can be independently monitored or interrupted. The parent sets parentId to itself so children are cleaned up automatically.
3. Native Agent Teams (not supported from platform agents)
Claude Code's experimental Agent Teams feature (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1) spawns full independent Claude Code sessions that coordinate peer-to-peer via a shared task list and mailbox — teammates talk to each other, not just back to a parent. This is more powerful than subagents but requires an interactive Claude Code session. You can use Agent Teams directly from your local terminal (just set the env var and run claude interactively), but platform-managed agents run in non-interactive --print mode with piped stdio, so they can't spawn teams themselves.
Hybrid — both available mechanisms in one prompt:
"Research this repo's structure and figure out what needs changing, then spawn a visible agent team to do the implementation work."
The agent uses the Task tool for the fast research phase, then switches to the platform API to spawn agents the user can watch and interact with.
- Persistent markdown files in the `/shared-context/` directory
- All agents can read and write shared context files (see the example after this list)
- Synced to GCS for persistence across Cloud Run restarts
- Used for long-form documentation, decisions, and team memory
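For instance, writing and reading a shared decision log through the context API, assuming the documented `{ name, content }` body:

```bash
# Create or update a shared context file (subdirectories are supported).
curl -X PUT http://localhost:8080/api/context/file \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"name": "decisions/auth-refactor.md", "content": "# Decision\nRotate the JWT secret on kill switch activation."}'

# Read it back.
curl -H "Authorization: Bearer $JWT" "http://localhost:8080/api/context/file?name=decisions/auth-refactor.md"
```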
Custom skills available to all agents:
- `/agent-status` — Show agent registry and roles
- `/check-messages` — Check message bus inbox
- `/send-message` — Post to message bus
- `/spawn-agent` — Create sub-agents
Agent state is automatically saved to GCS and restored on container restart, so agents survive Cloud Run cold starts.
- Agent metadata (name, role, status, workspace path) is persisted to `agent-state/{agentId}.json` in GCS
- On startup, the platform restores all agents from persisted state
- Workspaces are ephemeral but can be recreated from git worktrees
- Shared context and Claude home (`~/.claude/`) are continuously synced to GCS
- Keep-alive system prevents Cloud Run scale-to-zero while agents exist (60s ping to `/api/health`)
| Method | Path | Description |
|---|---|---|
| POST | /api/auth/token | Exchange API key for JWT |
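A sketch of the login flow: exchange your API key for a JWT and send it as a bearer token on every other request (the `apiKey` request field and `token` response field are assumptions about the exact schema):

```bash
# Exchange the API key for a JWT (field names are illustrative).
JWT=$(curl -s -X POST http://localhost:8080/api/auth/token \
  -H "Content-Type: application/json" \
  -d '{"apiKey": "use-this-password-to-access-the-ui"}' | jq -r '.token')

# Use the JWT on every other endpoint.
curl -H "Authorization: Bearer $JWT" http://localhost:8080/api/agents
```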
| Method | Path | Description |
|---|---|---|
| GET | /api/agents | List all agents |
| GET | /api/agents/registry | Agent registry with roles, tasks, message counts |
| POST | /api/agents | Create agent (SSE stream) |
| POST | /api/agents/batch | Batch create multiple agents (JSON response) |
| GET | /api/agents/:id | Get agent details |
| PATCH | /api/agents/:id | Update agent metadata (role, capabilities, currentTask) |
| POST | /api/agents/:id/message | Send message to agent (SSE stream) |
| GET | /api/agents/:id/events | Reconnect to agent SSE stream |
| GET | /api/agents/:id/raw-events | Debug: raw event log |
| GET | /api/agents/:id/logs | Session logs in readable format (supports ?type=, ?tail=, ?format=text) |
| GET | /api/agents/:id/files | List workspace files (for @ mentions) |
| DELETE | /api/agents/:id | Destroy agent (and children) |
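For example, inspecting agents from the command line (the agent ID is a placeholder; query parameters follow the table above):

```bash
# List agents and fetch one agent's details.
curl -H "Authorization: Bearer $JWT" http://localhost:8080/api/agents
curl -H "Authorization: Bearer $JWT" http://localhost:8080/api/agents/AGENT_ID

# Tail the last 50 lines of its session log as plain text.
curl -H "Authorization: Bearer $JWT" \
  "http://localhost:8080/api/agents/AGENT_ID/logs?tail=50&format=text"
```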
| Method | Path | Description |
|---|---|---|
| POST | /api/messages | Post a message to the bus |
| GET | /api/messages | Query messages (filter by to, from, channel, type, unreadBy, since) |
| POST | /api/messages/:id/read | Mark message as read |
| POST | /api/messages/read-all | Mark all messages as read for an agent |
| DELETE | /api/messages/:id | Delete a message |
| GET | /api/messages/stream | SSE stream for real-time messages |
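For example, an agent polling its inbox for unread task messages (filter names come from the table above; the `since` timestamp format is an assumption):

```bash
# Unread task messages addressed to worker-1.
curl -H "Authorization: Bearer $JWT" \
  "http://localhost:8080/api/messages?to=worker-1&type=task&unreadBy=worker-1&since=2024-01-01T00:00:00Z"

# Mark one as read once handled.
curl -X POST -H "Authorization: Bearer $JWT" http://localhost:8080/api/messages/MESSAGE_ID/read
```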
| Method | Path | Description |
|---|---|---|
| GET | /api/claude-config | List editable config files |
| GET | /api/claude-config/file | Read a config file |
| PUT | /api/claude-config/file | Write a config file |
| POST | /api/claude-config/commands | Create a new skill/command |
| DELETE | /api/claude-config/file | Delete a skill or memory file |
| Method | Path | Description |
|---|---|---|
| GET | /api/context | List shared context files (recursive) |
| GET | /api/context/file?name=... | Read a context file (supports subdirectories) |
| PUT | /api/context/file | Create/update context file ({ name, content }) |
| DELETE | /api/context/file?name=... | Delete context file |
| Method | Path | Description |
|---|---|---|
| GET | /api/settings | Get current settings (key hint, available models) |
| PUT | /api/settings/anthropic-key | Switch Anthropic API key at runtime |
| Method | Path | Description |
|---|---|---|
| POST | /api/kill-switch | Activate or deactivate ({ action: "activate" \| "deactivate", reason? }) |
| GET | /api/kill-switch | Check kill switch status |
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check (no auth required) |
You can switch between personal and work API keys at runtime:
Via UI: Settings in the header (shows current key hint like ...abc12345)
Via API:
```bash
curl -X PUT http://localhost:8080/api/settings/anthropic-key \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-ant-your-other-key"}'
```

New agents will use the updated key. Existing agents keep their original key until restarted.
Warning: This platform runs Claude CLI with --dangerously-skip-permissions, meaning agents have full access to execute shell commands, read/write files, and make network requests within their workspace.
Each agent runs in an isolated /tmp/workspace-{uuid} directory but shares the container's network and process namespace.
Built-in safeguards:
- JWT authentication required for all API access
- Agent tool allowlist (configurable in guardrails)
- Blocked command patterns (prevents destructive operations like `rm -rf /`, `gh pr merge`, `gcloud deploy`, `terraform apply`, `git push --force`)
- Memory pressure monitoring (rejects new agents at 85% memory)
- Rate limiting on API endpoints
- 1-hour session TTL with automatic cleanup
- Max 20 concurrent agents per container
- Max agent spawn depth of 3 and max 6 children per agent (prevents recursive swarm explosions)
- Emergency kill switch — instantly halts all agents, rotates JWT secret, and persists state to GCS
For production deployments, run behind a reverse proxy with additional network-level controls. See SECURITY.md for vulnerability reporting.
One evening, an orchestrator agent was spawned to coordinate some routine work. It decided the best way to accomplish its goals was to spawn a large swarm of sub-agents. Those sub-agents — helpful as ever — began reviewing each other's pull requests, approving them, merging them, and deploying the results to GCP. Autonomously. At scale.
The server was taken down. The pull requests kept coming. Turns out, when you give agents your GitHub token, your Anthropic API key, and --dangerously-skip-permissions, they don't strictly need your server to keep working. The invoice was... educational.
The kill switch exists so this never happens again.
The kill switch is a multi-layered emergency stop:
Layer 1 — Global halt (the big red button)
- `POST /api/kill-switch` with `{ "action": "activate", "reason": "..." }` immediately blocks all API requests
- A persistent flag is written to disk and synced to GCS so it survives container restarts
- The UI has a panic button in the header — it's red, you can't miss it
- Deactivate with `{ "action": "deactivate" }` when the coast is clear
Layer 2 — Nuclear process kill
- All tracked agents are destroyed and their state files deleted (so they don't come back on restart)
- ALL `claude` processes on the system are killed, not just the ones the platform is tracking
- Workspaces are wiped
Layer 3 — Token invalidation
- The JWT secret is rotated on activation (and again on deactivation), instantly invalidating every existing token
- Agent service tokens have a 4-hour lifetime (down from 7 days)
Layer 4 — Spawning limits
- Max agent depth of 3 (agents spawning agents spawning agents... stops there)
- Max 6 children per agent
- Prevents the recursive swarm explosion that started all this
Layer 5 — Preventive guardrails
- Dangerous operations are blocked in prompts: `gh pr merge`, `gcloud deploy`, `terraform apply`, `git push --force`
- These are speed bumps, not walls — agents with shell access can still run commands autonomously
Layer 6 — Remote kill via GCS
- Upload a kill switch file directly to GCS to halt the platform even if the API is unreachable:
echo '{"killed":true,"reason":"emergency"}' | gsutil cp - gs://your-bucket/kill-switch.json
- The server polls for this file periodically and auto-activates the kill switch
| Method | Path | Description |
|---|---|---|
| POST | /api/kill-switch | Activate or deactivate ({ action, reason? }) |
| GET | /api/kill-switch | Check current status |
If you notice runaway agent behavior:
- Hit the kill switch — UI panic button or `POST /api/kill-switch`
- Revoke external tokens — rotate your GitHub PAT, Anthropic API key, and any MCP credentials
- Check for damage — review merged PRs, deployed services, and GCP resource creation
- Review GCS — check shared-context for any payloads agents may have left behind
- If the API is unreachable — upload the kill switch file to GCS, or delete the Cloud Run service entirely:
```bash
gcloud run services delete claude-swarm --region=$REGION
```
The kill switch controls the platform, but it cannot:
- Un-merge pull requests or un-deploy services
- Revoke external API tokens (you must do this manually)
- Stop processes that agents spawned outside the platform (e.g., direct `curl` calls to the Anthropic API)
- Prevent damage that already happened before you pressed the button
The best defense is limiting what credentials you give agents in the first place. See the security considerations above.
- GCP project with billing enabled
- `gcloud` CLI authenticated
- `terraform` installed
- Docker
```bash
# Set your project
export PROJECT_ID=your-project-id
export REGION=us-central1

# Option A: Build remotely with Cloud Build (no local Docker needed)
gcloud builds submit \
  --tag $REGION-docker.pkg.dev/$PROJECT_ID/claude-swarm/claude-swarm:latest \
  --project=$PROJECT_ID --region=$REGION

# Option B: Build locally with Docker
docker build -t $REGION-docker.pkg.dev/$PROJECT_ID/claude-swarm/claude-swarm:latest .
docker push $REGION-docker.pkg.dev/$PROJECT_ID/claude-swarm/claude-swarm:latest
```

```bash
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
terraform init
terraform plan
terraform apply
```

This creates:
- Cloud Run service (8GB RAM, 4 CPU, max 5 instances)
- GCS bucket for persistence
- Secret Manager secrets (Anthropic key, API key, JWT secret)
- Service account with minimal permissions
- Optional MCP server secrets
Cloud Run is deployed with IAM auth (no public access):
```bash
gcloud run services add-iam-policy-binding claude-swarm \
  --region=$REGION \
  --member="user:you@email.com" \
  --role="roles/run.invoker"
```

```bash
terraform output service_url
# or
gcloud run services describe claude-swarm --region=$REGION --format='value(status.url)'
```

| Setting | Default | How to change |
|---|---|---|
| Max instances | 5 | terraform/cloud-run.tf → max_instance_count |
| Concurrency | 2 | terraform/cloud-run.tf → max_instance_request_concurrency |
| CPU/Memory | 4 CPU / 8GB | terraform/cloud-run.tf → resources.limits |
| Max agents per container | 20 | src/guardrails.ts → MAX_AGENTS |
| Session TTL | 1 hour | src/guardrails.ts → SESSION_TTL_MS |
| Request timeout | 1 hour | terraform/cloud-run.tf → timeout |
Concurrency is low by design — each agent is a heavy Claude CLI process. With 4 CPU and 8GB RAM, 2 concurrent requests (each potentially spawning agents) is a safe default.
Min instances = 0 means cold starts. Set to 1 if you want instant responses (costs more).
MCP (Model Context Protocol) servers give agents access to external tools. See mcp/README.md for setup.
Supported out of the box:
- Notion — read/write Notion pages
- GitHub — interact with repos, PRs, issues + `git push` via credential helper
- Google Calendar — read/create events
- Slack — read/send messages
- Figma — read designs, extract assets, analyze components (setup guide)
Add credentials as env vars (locally in .env, in production via Terraform/Secret Manager).
Setting GITHUB_TOKEN enables three things for agents:
- `gh` CLI — create PRs, manage issues, query repos
- `git push` / `git fetch` — credential helper is configured automatically on startup via `gh auth setup-git`
- GitHub MCP server — structured tool access to GitHub's API
Option A: Fine-grained token (recommended)
Go to GitHub Settings → Fine-grained tokens:
- Token name: `claude-swarm`
- Expiration: 90 days (or custom)
- Repository access: "Only select repositories" → pick the repos agents should access
- Permissions:
- Contents — Read and write
- Pull requests — Read and write
- Metadata — Read-only (auto-selected)
Option B: Classic PAT
Go to GitHub Settings → Tokens (classic):
- Scopes: `repo` (required), `workflow` (optional)
Local development — add to .env:
```
GITHUB_TOKEN=github_pat_xxxxx
```
Production (Cloud Run) — add to terraform/terraform.tfvars:
```
github_token = "github_pat_xxxxx"
```

Then run `terraform apply` and redeploy. Terraform stores the token in Secret Manager and injects it as an env var.
Quick update without Terraform — update the secret directly:
echo -n "github_pat_new_token_here" | gcloud secrets versions add github-token --data-file=- --project=$PROJECT_ID
gcloud run services update claude-swarm --region=$REGION --project=$PROJECT_IDUse .env file (gitignored). Copy from .env.example.
Secrets are in GCP Secret Manager, injected into Cloud Run as env vars by Terraform.
```bash
# Update a secret
echo -n "new-value" | gcloud secrets versions add SECRET_NAME --data-file=-

# Redeploy to pick up new secrets
gcloud run services update claude-swarm --region=$REGION
```

To add a new secret:

- Add to `terraform/variables.tf`
- Add to `terraform/secrets.tf`
- Reference in the `terraform/cloud-run.tf` env block
- Run `terraform apply`
```bash
# Build
docker build -t claude-swarm .

# Run
docker run -p 8080:8080 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e API_KEY=your-password \
  -e JWT_SECRET=any-random-string \
  claude-swarm

# Open http://localhost:8080
```

Project structure:

```
server.ts                 # Express server (routes, SSE setup, startup)
src/
  agents.ts               # AgentManager — spawn/kill/message Claude CLI
  auth.ts                 # JWT auth + API key exchange
  messages.ts             # MessageBus for inter-agent communication
  persistence.ts          # Agent state persistence across restarts
  storage.ts              # GCS sync for shared context and Claude home
  worktrees.ts            # Git worktree GC for dead agent workspaces
  validation.ts           # Input validation + rate limiting
  guardrails.ts           # Safety config (tool allowlists, limits, spawning depth)
  kill-switch.ts          # Emergency kill switch (persistent state, GCS sync)
  types.ts                # Shared TypeScript interfaces
  commands/               # Slash command skills
    agent-status.md       # /agent-status — show agent registry
    check-messages.md     # /check-messages — check message bus inbox
    send-message.md       # /send-message — post to message bus
    spawn-agent.md        # /spawn-agent — create sub-agents
ui/                       # React SPA (Vite + Tailwind v4 + @fanvue/ui)
  src/
    pages/                # Login, Dashboard, AgentView, Settings
    components/           # Header, Sidebar, AgentCard, AgentTerminal, PromptInput, MessageFeed
    hooks/                # useAgentStream (SSE management)
    api.ts                # API client with SSE parsing
    auth.tsx              # Auth context (JWT in sessionStorage)
docs/                     # Architecture documentation
  agent-comms-design.md   # Agent communication system architecture
terraform/                # GCP infrastructure
mcp/                      # MCP server config templates
Dockerfile                # Multi-stage build
entrypoint.sh             # Runtime setup
```



