diff --git a/.github/workflows/env-examples.yml b/.github/workflows/env-examples.yml
index 94632516..0fad9853 100644
--- a/.github/workflows/env-examples.yml
+++ b/.github/workflows/env-examples.yml
@@ -4,7 +4,13 @@ on:
   push:
     branches:
       - main
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
   pull_request:
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
 
 permissions:
   contents: read
diff --git a/.github/workflows/lint-typecheck.yml b/.github/workflows/lint-typecheck.yml
index 742576f6..e1808c38 100644
--- a/.github/workflows/lint-typecheck.yml
+++ b/.github/workflows/lint-typecheck.yml
@@ -4,7 +4,13 @@ on:
   push:
     branches:
       - main
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
   pull_request:
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
 
 permissions:
   contents: read
diff --git a/.github/workflows/requirements-sync.yml b/.github/workflows/requirements-sync.yml
index ac2de1e5..cfb38fb4 100644
--- a/.github/workflows/requirements-sync.yml
+++ b/.github/workflows/requirements-sync.yml
@@ -4,7 +4,13 @@ on:
   push:
     branches:
       - main
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
   pull_request:
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
 
 permissions:
   contents: read
diff --git a/.github/workflows/security.yml b/.github/workflows/security.yml
index 1a46ab3d..6a3a047e 100644
--- a/.github/workflows/security.yml
+++ b/.github/workflows/security.yml
@@ -2,8 +2,14 @@ name: security
 
 on:
   pull_request:
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
   push:
     branches: [main]
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
 
 permissions:
   contents: read
diff --git a/.github/workflows/unit-tests.yml b/.github/workflows/unit-tests.yml
index 3c24da35..ea66cb07 100644
--- a/.github/workflows/unit-tests.yml
+++ b/.github/workflows/unit-tests.yml
@@ -4,7 +4,13 @@ on:
   push:
     branches:
       - main
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
   pull_request:
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
 
 permissions:
   contents: read
diff --git a/AGENTS.md b/AGENTS.md
index 6b985607..f10a9d0d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,710 +1,80 @@
 # AGENTS.md — FlowMesh
 
-This file provides guidance to coding agents (Claude Code, Codex, Cursor, and
-similar tools) working in this repository. It follows the cross-agent
-`AGENTS.md` convention; agent-specific entry files (e.g. `CLAUDE.md`) at this
-level simply import from here.
-
-## Project Overview
-
-FlowMesh is a service fabric for running LLM agentic workflows on distributed
-GPU workers. It accepts workflow definitions (YAML, JSON, or n8n graph format),
-parses them into a DAG of tasks, schedules and dispatches each task to a
-suitable worker, and collects results and artifacts. It supports inference
-(vLLM, HF transformers, diffusers), training (SFT, LoRA, DPO, PPO),
-retrieval-augmented generation, agent execution, SSH-style interactive
-sessions, and arbitrary container jobs.
-
-The codebase is organized as a **uv workspace** with SDK and CLI packages:
-
-| Package | Path | Purpose |
-|---------|------|---------|
-| `flowmesh` (root) | `src/` | Server, Worker, shared schemas |
-| `flowmesh-sdk` | `sdk/` | Public Python SDK for the server API |
-| `flowmesh-sdk-stack` | `sdk/stack/` | Stack and node helpers for local deployments |
-| `flowmesh-cli` | `cli/` | Typer CLI entry point + core commands |
-| `flowmesh-cli-stack` | `cli/stack/` | Stack deployment commands |
-
-## Architecture
-
-```
-Client (CLI / SDK / API)
-  │
-  ▼  HTTP (default 8000)
-Server  ─── FastAPI orchestrator
-  │  - YAML / JSON / n8n workflow parsing
-  │  - DAG dependency resolution, task scheduling
-  │  - Dispatch: best-fit / first-fit / min-satisfying worker selection
-  │  - Task merging, stage stickiness, context reuse, epoch scheduling
-  │  - SSE log streaming, metrics recording
-  │  - Redis (control + telemetry channels) for pub/sub and task state
-  │
-  ├──▶ Redis Pub/Sub ──▶ Supervisor(s) ──▶ gRPC ──▶ Worker(s)
-  │                           │                       │
-  │                           │  Worker lifecycle     │  Stateless GPU executors
-  │                           │  Docker / Vast.ai     │  19 executor types
-  │                           │  adapters             │
-  │                           │                       │
-  │                           ▼                       ▼
-  │                      Worker Registry         Results / Artifacts
-  │
-  └──▶ Redis streams: logs, events, task queues
-```
-
-### Components
-
-The runtime is two top-level processes:
-
-1. **Server** (`src/server/`) — Central FastAPI orchestrator (default HTTP
-   port 8000). Hosts workflow / task / dispatch logic and the **Supervisor
-   subsystem** under `src/server/supervisor/`, which manages per-node worker
-   lifecycle, runs the worker-facing gRPC server (default port 50051), and
-   drives the Docker / Vast.ai worker adapters.
-
-2. **Worker** (`src/worker/`) — Stateless executor process. Connects to a
-   supervisor via gRPC, receives tasks, runs executors, reports results.
-
-### Communication Protocols
-
-- **Server ↔ Supervisor (within a node)**: `multiprocessing.Queue` between
-  the parent server and its supervisor child process for command/response,
-  plus `nodes:events` Redis pub/sub for telemetry.
-- **Server ↔ Supervisor (across nodes)**: Redis pub/sub on
-  `node:{id}:dispatch`, `node:{id}:cmds`, `nodes:events`, `nodes:responses`.
-- **Supervisor ↔ Worker**: bidirectional gRPC (proto stubs at
-  `src/shared/grpc/supervisor/v1/`).
-- **Client ↔ Server**: REST API over HTTP.
-
-### Prefer CLI and SDK
-
-When interacting with FlowMesh, prefer the **CLI** (`flowmesh`) or **SDK**
-(`flowmesh` Python package) over raw HTTP calls or shell scripts. The CLI and
-SDK handle pagination, error formatting, and SSE streaming.
-
-### Hook Plugin Extension Points
-
-External integrations (auth, submission policy, usage tracking) plug into the
-server through three protocol hooks defined in `src/server/hooks/`:
-
-- `IdentityProvider` — resolve a bearer token to a `PrincipalContext`
-  (iterated from `auth/security.py`). With no providers registered, auth is
-  a no-op and `authenticate_api_key` returns a default admin principal.
-- `SubmissionGuard` — pre-submit precondition (iterated from
-  `routers/v1/workflows.py`).
-- `UsageSink` — fan-out per-task usage rows after a task completes
-  (iterated from `services/monitoring.py`). Typical consumers: billing,
-  audit, observability.
-
-A plugin is any Python module that exposes a top-level `install()`. The
-server loads `FLOWMESH_PLUGINS` (comma-separated module names) inside its
-FastAPI lifespan and treats `install()` as either:
-
-- a sync function returning `None` — the plugin appends its adapters to the
-  registries in `server.hooks` and returns; or
-- an `@asynccontextmanager async def install()` — the plugin owns resources
-  with a lifecycle (a SQLAlchemy engine, an HTTP client, a background task)
-  that need teardown on server shutdown. The loader holds an
-  `AsyncExitStack`, enters each ctx-manager `install()` on startup, and
-  unwinds them in reverse order on shutdown.
-
-Plugins live anywhere on `sys.path` — in-tree under `src/server/<name>/`,
-sibling-mounted under `/app/src/<name>/`, or a pip-installable wheel. Core
-never references plugin module names; each plugin self-filters internally.
-
-OSS ships no DB itself. Plugins that need persistence bring their own engine
-and manage it via the ctx-manager `install()` form. Example:
-
-```python
-# myorg_auth_plugin/__init__.py
-import os
-from contextlib import asynccontextmanager
-from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
-
-from server.hooks import IDENTITY_PROVIDERS
-
-
-class _MyOrgAuth:
-    name = "myorg.auth"
-    def __init__(self, sessionmaker): self._sm = sessionmaker
-    async def resolve(self, raw_token, logger):
-        async with self._sm() as session:
-            ...
-
-@asynccontextmanager
-async def install():
-    engine = create_async_engine(os.environ["MYORG_DATABASE_URL"])
-    IDENTITY_PROVIDERS.append(_MyOrgAuth(async_sessionmaker(engine)))
-    try:
-        yield
-    finally:
-        await engine.dispose()
-```
-
-## API Reference (Server: `http://localhost:8000`)
-
-### Workflows
-
-| Method | Path | Description |
-|--------|------|-------------|
-| POST | `/api/v1/workflows` | Submit workflow (YAML `text/plain` or JSON; set `Workflow-Format: n8n` header for n8n) |
-| POST | `/api/v1/workflows/validate` | Validate without executing |
-| GET | `/api/v1/workflows` | List workflows (filterable) |
-| GET | `/api/v1/workflows/{id}` | Get workflow details |
-| GET | `/api/v1/workflows/{id}/logs` | Query logs (limit, before/after cursors) |
-| GET | `/api/v1/workflows/{id}/logs/stream` | SSE log stream |
-| POST | `/api/v1/workflows/{id}/cancel` | Cancel workflow |
-
-Workflow statuses: `PENDING`, `DISPATCHED`, `FAILED`, `CANCELLED`, `DONE`
-
-### Tasks
-
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/v1/tasks` | List tasks (filter by workflow_id, status, task_type, assigned_worker) |
-| GET | `/api/v1/tasks/{id}` | Task details |
-| GET | `/api/v1/tasks/{id}/logs` | Query task logs |
-| GET | `/api/v1/tasks/{id}/logs/stream` | SSE task log stream |
-
-Task statuses: `PENDING`, `DISPATCHED`, `FAILED`, `CANCELLED`, `DONE`
-
-### SSH
-
-| Method | Path | Description |
-|--------|------|-------------|
-| WS | `/api/v1/ssh/tasks/{task_id}/proxy` | WebSocket SSH proxy for proxy-mode SSH tasks |
-| GET | `/api/v1/ssh/connections` | List active server-audited SSH proxy/forward connections |
-| TCP | `<SERVER_SSH_FORWARD_PUBLIC_HOST>:<forward_port>` | Raw TCP per-task forward-mode port |
-
-Server policy toggles:
-- `ENABLE_SERVER_SSH_PROXY` — WebSocket SSH proxy endpoint
-- `ENABLE_SERVER_SSH_FORWARD` — TCP forward listener
-- `ENABLE_SERVER_SSH_CONNECTION_AUDIT` — best-effort SSH connection audit
-
-### Results
-
-| Method | Path | Description |
-|--------|------|-------------|
-| POST | `/api/v1/results` | Submit task result (worker → server) |
-| GET | `/api/v1/results/{task_id}` | Get task result |
-| GET | `/api/v1/results/{task_id}/bundle` | Download tar.gz bundle (`?include=results,artifacts,logs,all`) |
-| POST | `/api/v1/results/{task_id}/files` | Upload artifact (multipart) |
-| GET | `/api/v1/results/{task_id}/files/{filename}` | Download artifact |
-| GET | `/api/v1/results/{task_id}/logs` | Download archived logs.jsonl |
-
-### Workers
-
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/v1/workers` | List workers (filter by alias, namespace, cluster, status, tags) |
-| GET | `/api/v1/workers/{id}` | Worker details |
-
-Worker statuses: `STARTING`, `IDLE`, `BUSY`, `STOPPING`, `STOPPED`
-
-### Nodes (Supervisors)
-
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/api/v1/nodes` | List nodes |
-| POST | `/api/v1/nodes/register` | Register a node |
-| GET | `/api/v1/nodes/{id}` | Node details |
-| GET | `/api/v1/nodes/{id}/workers` | List workers under a node |
-| POST | `/api/v1/nodes/{id}/workers/register` | Register worker under node |
-| POST | `/api/v1/nodes/{id}/workers/{name}/start` | Start worker |
-| POST | `/api/v1/nodes/{id}/workers/{name}/stop` | Stop worker |
-
-### Stack (local single-node lifecycle)
-
-`/api/v1/stack/workers/...` — wraps node-registered workers with local-only
-container lifecycle, used by `flowmesh stack worker {up,down,start,stop}`.
-
-### System
-
-| Method | Path | Description |
-|--------|------|-------------|
-| GET | `/healthz` | Top-level health check |
-| GET | `/api/v1/system/metrics` | System metrics snapshot |
-
-## Object Identifiers
-
-Object IDs use 3-character type prefixes:
-
-- `wfl-` workflows
-- `tsk-` tasks
-- `ssn-` SSH session containers
-- `scn-` SSH connection (server-side audit row)
-- `cmd-` supervisor commands
-
-ID factories live in `src/shared/utils/ids.py`. Always use `new_*_id()`
-helpers; never use `uuid4()` or `secrets.token_hex` for IDs.
-
-## SDK Usage
-
-### Sync Client
-
-```python
-from flowmesh import FlowMesh
-
-client = FlowMesh(base_url="http://localhost:8000", api_key="...")
-
-# Submit a workflow
-wf = client.workflows.submit_yaml(open("templates/echo_local.yaml").read())
-
-# Watch progress
-for ev in client.workflows.stream_logs(wf.workflow_id):
-    print(ev.line)
-
-# Inspect tasks
-tasks = client.tasks.list(workflow_id=wf.workflow_id)
-result = client.results.get(tasks[0].task_id)
-```
-
-### Async Client
-
-```python
-from flowmesh import AsyncFlowMesh
-
-async with AsyncFlowMesh(base_url="...", api_key="...") as client:
-    wf = await client.workflows.submit_yaml(...)
-    async for ev in client.workflows.stream_logs(wf.workflow_id):
-        ...
-```
-
-## CLI Commands
-
-The CLI entry point is `flowmesh`. Top-level commands and command groups:
-
-```
-flowmesh info | health | logout
-flowmesh workflow {submit, validate, list, info, watch, cancel, logs}
-flowmesh task {list, info, watch, stop, logs}
-flowmesh worker {list, info}
-flowmesh node {list, info, worker}
-flowmesh node worker {list, start, stop}
-flowmesh ssh {connect, run, proxy, connections}
-flowmesh result {fetch, download}
-flowmesh system {metrics}
-flowmesh stack {build, push, pull, pullall, up, down, restart, ps, logs}
-flowmesh stack worker {up, start, stop, down, list, pull}
-```
-
-`flowmesh stack` (from `flowmesh-cli-stack`) builds Docker images, runs the
-local Compose stack, and manages local-only worker containers. Core commands
-work against any reachable FlowMesh server.
-
-## Workflow YAML Format
-
-### Single Task
-
-```yaml
-apiVersion: flowmesh/v1
-kind: InferenceTask
-metadata:
-  name: hello-inference
-spec:
-  taskType: inference
-  resources:
-    hardware: { gpu: { type: any, count: 1 } }
-  model:
-    source: { type: huggingface, identifier: TinyLlama/TinyLlama-1.1B-Chat-v1.0 }
-    vllm: { gpu_memory_utilization: 0.5 }
-  data:
-    type: list
-    items:
-      - - role: user
-          content: What is the capital of France?
-  inference: { max_tokens: 64, temperature: 0.0 }
-  output:
-    destination: { type: http }
-```
-
-### Multi-Stage DAG
-
-```yaml
-apiVersion: flowmesh/v1
-kind: Workflow
-spec:
-  stages:
-    - name: extract
-      spec:
-        taskType: inference
-        ...
-        data:
-          type: list
-          items:
-            - - role: user
-                content: "Extract entities from: {{input}}"
-    - name: summarize
-      dependsOn: [extract]
-      spec:
-        taskType: inference
-        ...
-        data:
-          type: list
-          items:
-            - - role: user
-                content: "Summarize: {{extract.output}}"
-```
-
-### Graph DAG
-
-`taskType: graph_template` — topology-aware multi-input prompts with parent
-output substitution and validation. See
-`src/worker/executors/utils/graph_templates.py` for the templating contract.
-
-### Schedule Hints
-
-Workflows can declare scheduling preferences via
-`metadata.annotations.schedule_hint`:
-
-- `epoch_groups: [[<task_name>, ...], ...]` — epoch-ordered execution; tasks
-  in epoch `n` only dispatch after every task in epoch `n-1` succeeds.
-- `schedule_in_epoch_order: true` — for dependent DAGs, prefer
-  position-in-epoch tie-breaks during dispatch.
-
-## Task Types & Executor Registry
-
-The worker resolves `spec.taskType` against an executor registry in
-`src/worker/runner.py`. Built-in executors:
-
-| `taskType` | Executor | Use case |
-|-----------|----------|----------|
-| `echo` | `EchoExecutor` | Echo input back as result (smoke tests) |
-| `inference` | `VLLMExecutor` / `TransformersExecutor` | LLM inference |
-| `diffusion` | `DiffusersExecutor` | Image / video diffusion models |
-| `omni_text2{audio,image,speech,general}` | `Omni*Executor` | Multimodal generation |
-| `training` | `SFTExecutor` / `LoRASFTExecutor` / `DPOExecutor` / `PPOExecutor` | Fine-tuning |
-| `rag` | `RAGExecutor` | Retrieval-augmented generation |
-| `agent` | `AgentExecutor` | Tool-using LLM agent (utu / youtu-agent backend) |
-| `data_profiling` | `DataProfilingExecutor` | DataFrame profiling |
-| `data_retrieval` | `DataRetrievalExecutor` | DataFrame loading from sources |
-| `ssh` | `SSHExecutor` | Interactive SSH session or non-interactive container job |
-
-Helper utilities live in `src/worker/executors/utils/` (`artifacts`,
-`checkpoints`, `data_utils`, `graph_templates`, `huggingface`, `safe_eval`).
-Cross-cutting behavior is in `src/worker/executors/mixins/`
-(`data`, `governance`, `inference`, `training`).
-
-### Agent Executor (utu / youtu-agent)
-
-Requires `UTU_LLM_TYPE`, `UTU_LLM_MODEL`, `UTU_LLM_BASE_URL`, `UTU_LLM_API_KEY`
-to run. Optional: `SERPER_API_KEY`, `JINA_API_KEY` for search tools.
-
-## Environment Variables
-
-The canonical declared set lives in
-`cli/stack/src/flowmesh_cli_stack/env_schema.py` and is mirrored to
-`cli/stack/src/flowmesh_cli_stack/assets/.env.example`. Run
-`uv run scripts/dev/check_env_examples.py --write` after schema edits.
-
-### Server (selected)
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `REDIS_CONTROL_URL` | `redis://localhost:6379/0` | Redis control channel |
-| `REDIS_TELEMETRY_URL` | `redis://localhost:6380/0` | Redis telemetry channel |
-| `DATABASE_URL` | – | Postgres connection string |
-| `RESULTS_DIR` | `./results` | Server-side results directory |
-| `SERVER_HTTP_PORT` | `8000` | Public HTTP port |
-| `SERVER_GRPC_PORT` | `50051` | Supervisor gRPC port |
-| `ORCHESTRATOR_DISPATCH_MODE` | `adaptive` | Scheduler mode |
-| `ORCHESTRATOR_WORKER_SELECTION` | `best_fit` | `best_fit`, `first_fit`, `min_satisfying` |
-| `SCHEDULER_LAMBDA_INFERENCE` | `0.4` | Inference task weight |
-| `SCHEDULER_LAMBDA_TRAINING` | `0.8` | Training task weight |
-| `SCHEDULER_LAMBDA_OTHER` | `0.5` | Other-task weight |
-| `SCHEDULER_SELECTION_JITTER` | `1e-3` | Tie-break jitter |
-| `ENABLE_TASK_MERGE` | `true` | DAG-level task coalescing |
-| `TASK_MERGE_MAX_BATCH_SIZE` | `4` | Max merged tasks per dispatch |
-| `ENABLE_CONTEXT_REUSE` | `true` | Bias toward workers with cached models |
-| `WORKER_CACHE_TTL_SEC` | `3600` | Cache metadata TTL |
-| `ENABLE_STAGE_WEIGHT_STICKINESS` | `false` | Pin stages to checkpoint-producing workers |
-| `ENABLE_WORKER_WATCHDOG` | `true` | Worker death detection |
-| `WORKER_DEATH_GRACE_SEC` | `60` | Grace period before marking dead |
-| `FLOWMESH_PLUGINS` | – | Comma-separated plugin module names |
-| `LOG_LEVEL` | `INFO` | Server log level |
-
-### Worker (selected)
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `WORKER_TOKEN` | – | Auth token for supervisor gRPC |
-| `SUPERVISOR_GRPC_TARGET` | – | Supervisor gRPC endpoint |
-| `RESULTS_DIR` | `./results_workers` | Task output directory |
-| `WORKER_TAGS` | `` | Scheduler hints |
-| `WORKER_COST_PER_HOUR` | `1.0` | Cost metadata |
-| `WORKER_UPLOAD_RESULTS` | `false` | Upload results when no destination set |
-| `HF_CACHE_DIR` | – | Shared HuggingFace cache mount |
-| `HEARTBEAT_INTERVAL_SEC` | `30` | Heartbeat cadence |
-
-### Supervisor (selected)
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `NODE_NAMESPACE` / `NODE_CLUSTER` / `NODE_ALIAS` | defaults | Identity |
-| `NODE_TAGS` | `` | Scheduler hints (CSV) |
-| `SUPERVISOR_GRPC_DISABLE_SERVER_TLS` | `false` | Local-only insecure gRPC |
-| `SUPERVISOR_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS` | `true` | gRPC keepalive |
-| `SUPERVISOR_GRPC_EXTERNAL_PORT` | – | External port (when port-forwarded) |
-| `SERVER_GRPC_TLS_*` | – | TLS certificate files |
-
-## Directory Structure
-
-```
-src/
-  server/                      # FastAPI orchestrator
-    auth/                      # OSS auth shim (no-op when IDENTITY_PROVIDERS is empty; otherwise delegates to the chain)
-    clients/                   # Redis client(s)
-    config.py                  # Server config + DispatchConfig
-    db/                        # SQLAlchemy session factory and migrations (no-op in OSS)
-    dispatcher/                # Dispatch loop, worker selector, stage stickiness, context reuse
-    env.py                     # Centralized env reads
-    hooks/                     # IdentityProvider / SubmissionGuard / UsageSink ABCs + registries
-    main.py                    # Entrypoint, FLOWMESH_PLUGINS loader, EventMonitor wiring
-    registries/                # Worker / Node registries (Redis-backed)
-    routers/v1/                # workflows, tasks, results, workers, nodes, ssh, stack, system
-    schemas/                   # Pydantic models for API + DB
-    services/                  # monitoring, log streaming, ssh forwarding, runtime
-    supervisor/                # Per-node agent (gRPC server, adapters, lifecycle)
-      adapters/                # Docker / Vast.ai worker adapters
-      services/                # gRPC server, command listener, task listener, relay
-    task/                      # parser, runtime, models, merge / epoch helpers
-  shared/
-    schemas/                   # Cross-cutting schemas (NodeInfo, etc.)
-    grpc/supervisor/v1/        # Generated proto stubs (used by server's supervisor subsystem + worker)
-    tasks/                     # Workflow/task spec models
-    utils/                     # JSON, parsing, time, ids
-  worker/
-    config.py                  # Worker config (loaded from env)
-    docker/                    # Worker Dockerfiles (CPU + GPU)
-    executors/                 # Executor implementations
-      mixins/                  # data, governance, inference, training
-      utils/                   # artifacts, checkpoints, data_utils, graph_templates, huggingface, safe_eval
-    runner.py                  # Task lifecycle (execute, write results, upload artifacts)
-    supervisor_client.py       # gRPC client to supervisor
-cli/
-  src/flowmesh_cli/            # Core CLI commands (flowmesh-cli)
-  stack/src/flowmesh_cli_stack/   # Stack deployment commands (flowmesh-cli-stack)
-sdk/
-  src/flowmesh/                # Public SDK (flowmesh-sdk)
-  stack/src/flowmesh_stack/    # Stack helpers (flowmesh-sdk-stack)
-proto/
-  supervisor/v1/supervisor.proto  # gRPC service definition
-templates/                     # Example workflow YAMLs
-tests/
-  server/                      # Server-side unit tests
-  worker/                      # Worker-side unit tests
-  shared/                      # Shared schema tests
-  cli/                         # CLI tests
-  sdk/                         # SDK tests
-scripts/dev/                   # Developer scripts (compile_protos.sh, sync_requirements.py, check_env_examples.py)
-```
-
-## Development Workflow
-
-### Setup
-
-```bash
-pip install uv
-uv sync --all-extras                  # All Python deps including dev tooling
-uv run scripts/dev/compile_protos.sh  # Regenerate proto stubs (only when proto changes)
-```
-
-### Format / lint / type-check
-
-```bash
-uv run pre-commit run --all-files
-```
-
-Hooks: gitleaks, isort, black, ruff, codespell, mypy, sync_requirements,
-check_env_examples (via `scripts/dev/check_env_examples.py`).
-
-### Tests
-
-```bash
-uv run pytest tests/ --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
-```
-
-The cleanup-gpu test requires NVIDIA hardware and isolated processes; skip it
-in normal CI.
-
-### Run locally
-
-```bash
-uv run flowmesh stack up                          # Server + Postgres + Redis + Supervisor
-uv run flowmesh stack worker up cpu 1             # 1 CPU worker
-uv run flowmesh stack worker up gpu --targets 0   # 1 GPU worker pinned to GPU 0
-uv run flowmesh workflow submit templates/echo_local.yaml
-```
-
-### Docker
-
-Build images locally:
-
-```bash
-uv run flowmesh stack build server
-uv run flowmesh stack build flowmesh_worker_cpu flowmesh_worker_gpu
-uv run flowmesh stack build flowmesh_ssh_cpu flowmesh_ssh_gpu
-```
-
-Always rebuild the affected worker image after changing executor code; stale
-images silently mask code regressions.
-
-## Code Style
-
-- Python 3.12+ (see `[project]` in `pyproject.toml`).
-- Top-level imports only; inline imports only to break circular imports.
-- Prefer `typing.Any` over `object` in annotations; only use `object` when
-  `Any` is semantically wrong.
-- Use `# type: ignore[<error-code>]` only after exhausting alternatives. Never
-  use a bare `# type: ignore`.
-- Prefer `X | Y` over `typing.Union[X, Y]` and `X | None` over `typing.Optional[X]`.
-- Don't write `from __future__ import annotations` unless strictly necessary;
-  use `typing.Self` or quoted forward references instead.
-- Avoid `hasattr` / `getattr` that bypass type checking. Use `isinstance`
-  guards. Acceptable `getattr` uses: dynamic dispatch, providing a default,
-  accessing untyped third-party APIs.
-- Default to no comments. Comment only when the *why* is non-obvious. Names
-  self-document.
-- Don't add comments referencing the current task, fix, or callers ("used by
-  X", "added for Y", "handles issue #123") — those rot as the codebase
-  evolves.
-
-### Security Rules (bandit-enforced)
-
-CI runs `bandit` with no severity / confidence threshold. Every finding must
-either have a source-level fix, a documented skip in `[tool.bandit]` in
-`pyproject.toml`, or a per-line `# nosec BXXX` paired with an inline TODO
-that names the planned fix. A bare `# nosec` with no rule code and no TODO
-is disallowed — silencing a finding without a written reason defeats the
-audit.
-
-When writing new code, follow these rules so bandit stays green:
-
-- **B113 (request timeouts)** — every `requests.get/post/...` call must pass
-  `timeout=`. Hung connections are denial-of-service; no implicit defaults.
-- **B202 (archive extraction)** — `tarfile.extractall(..., filter="data")` is
-  required (Python 3.12+). For zipfile, iterate `infolist()`, validate that
-  each member resolves under the destination, and extract per-member; never
-  call `zipfile.extractall` on untrusted archives.
-- **B310 (urlopen)** — don't use `urllib.request.urlopen`; use the `requests`
-  library and validate the URL scheme (`http`/`https` only) before fetching.
-- **B324 (insecure hash)** — `hashlib.md5(..., usedforsecurity=False)` is
-  required when MD5 is used for cache keys / fingerprints. Never use MD5 for
-  anything that crosses a security boundary.
-- **B506 (yaml load)** — always use `yaml.safe_load`. `yaml.load(...,
-  Loader=yaml.FullLoader)` is forbidden.
-- **B603 (subprocess call site)** — every `subprocess.run/call/Popen/...`
-  needs a per-line `# nosec B603` with a one-line written rationale (e.g.
-  `argv list, no shell=True, absolute path via shutil.which()`). The B404
-  blanket rule is project-skipped because B602 (`shell=True`) and B607
-  (partial path) catch the actually-dangerous patterns; B603 is enforced
-  per-site so every shellout is visible at the call line.
-- **B607 (subprocess with partial path)** — prefer the vendored SDK
-  (`pynvml`, `docker-py`, `GitPython`) over shelling out via `nvidia-smi` /
-  `docker` / `git`. If shelling out is unavoidable, the absolute path must
-  be provided.
-- **B614 (torch.load)** — `torch.load(..., weights_only=True)` is required.
-  Pickle deserialization is RCE waiting to happen.
-- **B701 (jinja2 autoescape)** — `Environment(autoescape=select_autoescape())`
-  is required. The default `False` is unsafe even for non-HTML templates.
-- **B108 (hardcoded /tmp)** — use `tempfile.gettempdir()` or
-  `tempfile.NamedTemporaryFile` for local-host work. The literal `"/tmp"` in
-  Python source is forbidden; if a Linux container path is genuinely
-  intended (e.g. an in-container sentinel), construct it from
-  `PurePosixPath` segments rather than as a single string constant.
-
-Skipped rules and the rationale for each are listed in `[tool.bandit]`.
-When the rationale stops being true (e.g. a sandbox stops being a sandbox),
-remove the skip and fix the call sites — don't widen the skip list silently.
-
-### Dependency CVE scanning (pip-audit)
-
-CI runs `pip-audit` against each generated requirements file
-(`src/server/requirements.txt`, `src/worker/requirements/requirements.txt`,
-`src/worker/requirements/requirements.gpu.txt`). The job lives in
-`.github/workflows/security.yml`.
-
-When pip-audit reports a new CVE, the only real fix is to bump the
-offending dep in `pyproject.toml`, then `uv lock` + `uv run
-scripts/dev/sync_requirements.py --write`. Silencing a finding via
-`--ignore-vuln` is a last resort; every silenced GHSA needs a written
-reason, same rule as bandit. The currently-ignored advisories and the
-upgrade blocker that justifies each are listed below.
-
-| GHSA | Package | Fix version | Why ignored |
-|------|---------|-------------|-------------|
-| GHSA-69w3-r845-3855 | transformers | 5.0.0rc3 | held by vllm/vllm-omni 0.18 compatibility |
-| GHSA-pf3h-qjgv-vcpr | vllm | 0.19.0 | held by transformers 4.57 + adjacent inference deps |
-| GHSA-pq5c-rjhq-qp7p | vllm | 0.19.0 | same |
-| GHSA-3mwp-wvh9-7528 | vllm | 0.19.0 | same |
-| GHSA-cfh3-3jmp-rvhc | pillow | 12.1.1 | gradio 5.50 caps pillow<12 (transitive via vllm-omni) |
-| GHSA-whj4-6x5x-4v2j | pillow | 12.2.0 | same cap |
-| GHSA-vfmq-68hx-4jfw | lxml | 6.1.0 | crawl4ai 0.8.6 caps lxml<6 |
-| GHSA-39mp-8hj3-5c49 | gradio | 6.7.0 | vllm-omni 0.18 pins gradio==5.50 |
-| GHSA-h3h8-3v2v-rg7m | gradio | 6.6.0 | same pin |
-| GHSA-jmh7-g254-2cq9 | gradio | 6.6.0 | same pin |
-| GHSA-pfjf-5gxr-995x | gradio | 6.6.0 | same pin |
-| GHSA-w8v5-vhqr-4h9v | diskcache | (none) | upstream unmaintained, no fixed version published |
-
-When a blocker lifts (e.g. transformers 5 ↔ vllm 0.19 line stabilizes),
-drop the corresponding `--ignore-vuln` flag from the workflow rather than
-extending the rationale to unrelated packages.
-
-## Commit Conventions
-
-- Single-line subject in imperative mood; no body unless a non-obvious "why"
-  is needed.
-- Conventional prefixes: `feat:`, `fix:`, `refactor:`, `chore:`, `docs:`,
-  `style:`, `test:`. Scope optional: `feat(server): ...`.
-- Sign off (`--signoff`) is required for code coming from coding agents
-  (Claude Code, Codex, Cursor, etc.).
-- One logical change per commit. Don't batch unrelated changes.
-
-## Key Patterns
-
-### Task State Machine
-
-`PENDING` → `DISPATCHED` → (`DONE` | `FAILED` | `CANCELLED`).
-A retried task transitions back to `PENDING` until exhausted.
-
-### Redis Channel Conventions
-
-- `flowmesh:control:*` — control plane (task assignments, cancellations,
-  worker lifecycle).
-- `flowmesh:telemetry:*` — telemetry (heartbeats, status updates).
-- `flowmesh:logs:task:{task_id}` — per-task log stream.
-- `flowmesh:logs:workflow:{workflow_id}` — per-workflow log stream.
-
-Stream lengths are bounded via `LOG_STREAM_MAXLEN_TASK` and
-`LOG_STREAM_MAXLEN_WORKFLOW`. Streams expire `LOG_STREAM_TTL_SEC` after close.
-
-### Cursor-Based Pagination
-
-List endpoints (`/api/v1/workflows`, `/api/v1/tasks`, log queries) accept
-`limit` and `before` / `after` cursors. The cursor is an opaque base64
-encoding of `(timestamp, id)`; do not parse client-side.
-
-### Task Merging
-
-Compatible adjacent tasks in a DAG (same `taskType`, model, hardware shape,
-and merge key) coalesce into a single dispatch. The merged children are
-carried in `WorkerTaskMessage.merged_children`. The runtime hands the worker
-a single message; the worker writes per-child results into a `children`
-section of the response. The dispatcher then fans out synthetic
-TASK_SUCCEEDED / TASK_FAILED events for each child.
-
-Merge can be disabled with `ENABLE_TASK_MERGE=false`.
-
-### Stage Stickiness
-
-When `ENABLE_STAGE_WEIGHT_STICKINESS=true`, the dispatcher pins stages that
-reference an upstream stage's checkpoint to the worker that produced it,
-falling back to normal selection if that worker is unavailable or stale.
-Mostly relevant for training pipelines where reusing the on-disk checkpoint
-saves repeated downloads.
-
-### Context Reuse / Cache Affinity
-
-Workers report cached models and datasets via `WorkerHardware`. The
-dispatcher's `_cached_worker_candidates` filters the candidate pool to
-workers whose cache covers the task's model/dataset references. Stale cache
-entries (older than `WORKER_CACHE_TTL_SEC`) are ignored.
+Routing doc for coding agents (Claude Code, Codex, Cursor, …) working in
+this repository. The full project rules live in dedicated docs; read the
+ones relevant to your task before editing.
+
+FlowMesh is a service fabric for running LLM agentic workflows on
+distributed GPU workers. The server parses a workflow, turns it into a
+DAG of tasks, dispatches each task to a worker, and collects results
+and artifacts.
+
+## Where to read what
+
+- **[`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)** — topology diagram,
+  components (server / supervisor / worker), communication protocols,
+  object IDs, task state machine, directory map, runtime behavior
+  (task merging, stage stickiness, context reuse, log streams), plugin
+  hooks. Read before any cross-component change.
+- **[`docs/CODE_STYLE.md`](docs/CODE_STYLE.md)** — Python rules,
+  docstring conventions, bandit security rules and `# nosec` policy,
+  pip-audit policy. Read before writing any source code.
+- **[`CONTRIBUTING.md`](CONTRIBUTING.md)** — setup, pre-commit hooks,
+  testing, dependency-pin workflow, **commit and PR title conventions**,
+  DCO sign-off. Read before committing or opening a PR.
+- **[`docs/API.md`](docs/API.md)** — common server REST endpoints
+  (workflows, tasks, results, workers, nodes, SSH, system) and cursor
+  pagination contract. Read before calling the server directly or
+  changing a router.
+- **[`docs/SDK.md`](docs/SDK.md)** — client usage, common operations, error contract.
+- **[`docs/CLI.md`](docs/CLI.md)** — `flowmesh ...` command groups,
+  common workflows (submit/watch/logs), local stack lifecycle, SSH tasks.
+- **[`docs/EXECUTORS.md`](docs/EXECUTORS.md)** — `taskType → Executor`
+  registry table, helper utilities, and the `AgentExecutor` env
+  requirements (`UTU_LLM_*`, `SERPER_API_KEY`, `JINA_API_KEY`).
+- **[`docs/WORKFLOWS.md`](docs/WORKFLOWS.md)** — workflow YAML format
+  hierarchy: single task, multi-stage DAG (`spec.stages`), graph DAG
+  (`taskType: graph_template`), and schedule hints (`epoch_groups`,
+  `schedule_in_epoch_order`).
+- **[`docs/ENV.md`](docs/ENV.md)** — curated server / worker /
+  supervisor env var tables (the knobs you actually tune). Full schema
+  in `cli/stack/src/flowmesh_cli_stack/env_schema.py`.
+- **[`docs/PLUGINS.md`](docs/PLUGINS.md)** — plugin extension contract,
+  loader semantics (`FLOWMESH_PLUGINS`), and a worked example.
+
+Concrete examples and runnable workflows live in `templates/`.
+When code, APIs, CLI commands, SDK methods, env vars, workflow formats, or
+runtime behavior change, update the corresponding docs in the same PR.
+
+## Reminders
+
+The full justification lives in the linked docs; these are the rules
+that come up most often, surfaced here so you can't miss them.
+
+- **PR title type**: one of `feat | fix | refactor | chore | test |
+  perf | docs`. Anything else fails the title check
+  (`scripts/ci/check_pr_title.py`). Use `docs:` for doc-only PRs —
+  those skip the code-related CI jobs (lint, tests, security,
+  env/requirements sync).
+- **Docstrings**: describe what the code *does*, not what it
+  *replaced*. No "in-process replacement for X", no "previously did Y".
+- **Comments**: default to none. Add a short one only when the *why* is
+  non-obvious; let names self-document.
+- **Test failures**: never ignore one. CI guards the suite, so a red
+  test on `main` is a CI gap, not a permission to skip — fix it as part
+  of your PR even when the failure looks unrelated to your change.
+  Every PR ships a runnable build.
+- **Cluster management goes through SDK / CLI**: reach for `flowmesh ...`
+  or the Python SDK. Raw `docker` is a last-resort escape hatch when the
+  SDK / CLI doesn't expose what you need.
+
+## Dev workflow
+
+Use `CONTRIBUTING.md` for setup, hooks, tests, dependency pins, and
+commit rules. Use `docs/CLI.md` for local stack and workflow commands.
+
+## Auto-loaded references
+
+@docs/ARCHITECTURE.md
+@docs/CODE_STYLE.md
+@CONTRIBUTING.md
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index dd9cdfc6..044bd88f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -56,8 +56,27 @@ This installs three hook stages:
   [DCO sign-off](#signing-off-commits-dco) line to your commit message.
 - **commit-msg** — verifies the sign-off is present (safety net).
 
+## Commit message and PR title conventions
+
+- Allowed prefixes for commit messages and PR titles (enforced for **PR titles** by
+  `scripts/ci/check_pr_title.py`): `feat:`, `fix:`, `refactor:`,
+  `chore:`, `test:`, `perf:`, `docs:`. Use `docs:` for doc-only PRs —
+  those skip the code-related CI jobs (lint, tests, security,
+  env/requirements sync).
+- Optional scope: `feat(server): ...`. Optional `[BREAKING]` prefix
+  *before* the type for breaking changes: `[BREAKING] feat: ...`.
+- For commit messages, the subject should be a short sentence about the key changes;
+  add body only when the reason for the changes need elaboration.
+- Sign off (`--signoff`) is required for every commit.
+- One logical change per commit. Don't batch unrelated changes.
+
 ## Code Style
 
+See [`docs/CODE_STYLE.md`](docs/CODE_STYLE.md) for the project's Python
+rules, docstring conventions, bandit security rules, and pip-audit
+policy. The tools below run automatically via pre-commit; running them
+manually is rarely necessary.
+
 | Tool | Purpose | Config |
 |------|---------|--------|
 | [gitleaks](https://github.com/gitleaks/gitleaks) | Committed-secret detection | - |
@@ -91,7 +110,7 @@ If your change touches shared schemas or proto definitions, verify downstream co
 Dependency versions live in two places with different styles:
 
 - **`pyproject.toml`** — `>=X.Y.Z` lower bounds. Expresses a compatibility floor; lets uv resolve the current acceptable version.
-- **`src/worker/requirements/requirements{,.gpu}.txt`** — exact `==X.Y.Z` pins. These feed the worker Docker images (`uv pip install --requirement …`), which need deterministic, reproducible installs.
+- **`src/worker/requirements/requirements{,.gpu}.txt`** — exact `==X.Y.Z` pins for deterministic, reproducible environment of worker environment.
 
 The requirements files are **auto-generated** from `pyproject.toml` + `uv.lock` by `scripts/dev/sync_requirements.py`. Do not edit them by hand.
 
@@ -123,5 +142,7 @@ git push --force-with-lease
 
 ## Project Layout
 
-FlowMesh follows a multi-tier architecture (Server / Worker) with
-shared schemas, SDK, and CLI packages.
+FlowMesh follows a multi-tier architecture (Server / Worker) with shared
+schemas, SDK, and CLI packages. See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
+for the topology diagram, communication protocols, directory map, and
+runtime behavior (task merging, stickiness, context reuse, etc.).
diff --git a/docs/API.md b/docs/API.md
new file mode 100644
index 00000000..cebac686
--- /dev/null
+++ b/docs/API.md
@@ -0,0 +1,87 @@
+# API reference (common endpoints)
+
+Server runs at `http://localhost:8000` by default. The router source of
+truth is `src/server/routers/v1/*.py`.
+
+Workflow and task statuses (used in payloads and filters):
+`PENDING`, `DISPATCHED`, `FAILED`, `CANCELLED`, `DONE`. Worker statuses:
+`STARTING`, `IDLE`, `BUSY`, `STOPPING`, `STOPPED`.
+
+## Workflows
+
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/v1/workflows` | Submit a workflow. Body is YAML (`text/plain`) or JSON; set `Workflow-Format: n8n` for n8n graphs. |
+| POST | `/api/v1/workflows/validate` | Parse without executing. |
+| GET | `/api/v1/workflows` | List workflows (`workflow_id`, `owner`, `status`, cursor pagination). |
+| GET | `/api/v1/workflows/{id}` | Workflow details + per-task summary. |
+| GET | `/api/v1/workflows/{id}/logs` | Query logs (`limit`, `before`/`after` cursors). |
+| GET | `/api/v1/workflows/{id}/logs/stream` | SSE log stream. |
+| POST | `/api/v1/workflows/{id}/cancel` | Cancel a workflow and all in-flight tasks. |
+
+## Tasks
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/tasks` | List tasks. Filters: `workflow_id`, `status`, `task_type`, `assigned_worker`. |
+| GET | `/api/v1/tasks/{id}` | Task details. |
+| GET | `/api/v1/tasks/{id}/logs` | Query task logs. |
+| GET | `/api/v1/tasks/{id}/logs/stream` | SSE task log stream. |
+
+## Results
+
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/v1/results` | Submit task result (worker → server). |
+| GET | `/api/v1/results/{task_id}` | Get task result JSON. |
+| GET | `/api/v1/results/{task_id}/bundle` | Download tar.gz bundle (`?include=results,artifacts,logs,all`). |
+| POST | `/api/v1/results/{task_id}/files` | Upload artifact (multipart). |
+| GET | `/api/v1/results/{task_id}/files/{filename}` | Download artifact. |
+| GET | `/api/v1/results/{task_id}/logs` | Download archived `logs.jsonl`. |
+
+## Traces
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/traces/workflows/{workflow_id}/{trace_type}` | Fetch workflow trace JSONL rows. `trace_type` is `spans`, `assets`, or `lineage`. |
+| GET | `/api/v1/traces/workflows/analyze/{workflow_id}` | Run the trace analyzer and return a profile summary. |
+| POST | `/api/v1/traces/tasks/{task_id}/{trace_type}` | Upload a per-task trace JSONL file. `trace_type` is `spans`, `assets`, or `lineage`. |
+
+## Workers and nodes
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/workers` | List workers. Filters: `alias`, `namespace`, `cluster`, `status`, `tags`. |
+| GET | `/api/v1/workers/{id}` | Worker details + hardware. |
+| GET | `/api/v1/nodes` | List nodes (supervisors). |
+| POST | `/api/v1/nodes/register` | Register a node. |
+| GET | `/api/v1/nodes/{id}/workers` | List workers under a node. |
+| POST | `/api/v1/nodes/{id}/workers/register` | Register worker under node. |
+| POST | `/api/v1/nodes/{id}/workers/{name}/{start,stop}` | Start/stop a worker. |
+
+`/api/v1/stack/workers/...` wraps node-registered workers with local-only
+container lifecycle and is what `flowmesh stack worker {up,down,...}`
+calls.
+
+## SSH
+
+| Method | Path | Description |
+|--------|------|-------------|
+| WS | `/api/v1/ssh/tasks/{task_id}/proxy` | WebSocket SSH proxy for proxy-mode SSH tasks. |
+| GET | `/api/v1/ssh/connections` | List active server-audited SSH proxy/forward connections. |
+
+Server policy toggles: `ENABLE_SERVER_SSH_PROXY`,
+`ENABLE_SERVER_SSH_FORWARD`, `ENABLE_SERVER_SSH_CONNECTION_AUDIT`.
+
+## System
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/healthz` | Top-level health check. |
+| GET | `/api/v1/system/metrics` | System metrics snapshot. |
+
+## Cursor pagination
+
+List endpoints (`/api/v1/workflows`, `/api/v1/tasks`, log queries)
+accept `limit` and `before` / `after` cursors. The cursor is an opaque
+base64 of `(timestamp, id)`; do not parse client-side.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 00000000..2a3ca71c
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,129 @@
+# Architecture
+
+FlowMesh is a service fabric for running LLM agentic workflows on
+distributed GPU workers. The server parses a workflow (YAML / JSON / n8n),
+turns it into a DAG of tasks, dispatches each task to a worker, and
+collects results and artifacts.
+
+## Workspace layout
+
+The codebase is a **uv workspace** with these packages:
+
+| Package | Path | Purpose |
+|---------|------|---------|
+| `flowmesh` (root) | `src/` | Server, Worker, shared schemas |
+| `flowmesh-sdk` | `sdk/` | Public Python SDK |
+| `flowmesh-sdk-stack` | `sdk/stack/` | Stack/node helpers |
+| `flowmesh-cli` | `cli/` | Typer CLI (`flowmesh ...`) |
+| `flowmesh-cli-stack` | `cli/stack/` | Stack deployment commands |
+
+## Topology
+
+```
+Client (CLI / SDK / API) ──▶ Server (FastAPI orchestrator, :8000)
+                                │
+                                ├── Redis (control + telemetry pub/sub, log streams)
+                                │
+                                └─▶ Supervisor (per-node) ──gRPC──▶ Worker (executor)
+```
+
+The runtime is two top-level processes:
+
+1. **Server** (`src/server/`) — FastAPI orchestrator at `:8000`. Hosts
+   workflow / task / dispatch logic and the **Supervisor subsystem**
+   (`src/server/supervisor/`), which manages per-node worker lifecycle,
+   runs the worker-facing gRPC server (`:50051`), and drives the
+   Docker / Vast.ai worker adapters.
+2. **Worker** (`src/worker/`) — stateless executor. Connects to a
+   supervisor via gRPC, receives tasks, runs the matching executor,
+   reports results.
+
+## Communication
+
+- **server ↔ supervisor (same node)** — `multiprocessing.Queue`.
+- **server ↔ supervisor (across nodes)** — Redis pub/sub.
+- **supervisor ↔ worker** — bidirectional gRPC. Proto stubs at
+  `src/shared/grpc/supervisor/v1/`.
+- **client ↔ server** — REST.
+
+## Object IDs
+
+3-char prefixes: `wfl-` workflows, `tsk-` tasks, `ssn-` SSH sessions,
+`scn-` SSH connection rows, `cmd-` supervisor commands. Always use
+`new_*_id()` helpers in `src/shared/utils/ids.py`. Never use `uuid4()` or
+`secrets.token_hex` for IDs.
+
+## Task state machine
+
+`PENDING → DISPATCHED → (DONE | FAILED | CANCELLED)`. Retried tasks
+cycle back to `PENDING` until exhausted.
+
+## Directory map
+
+```
+src/
+  server/               FastAPI orchestrator
+    auth/                 OSS no-op auth shim; delegates to IDENTITY_PROVIDERS chain
+    db/                   SQLAlchemy session factory + migrations (no-op in OSS)
+    dispatcher/           Dispatch loop, worker selector, stage stickiness, context reuse
+    hooks/                Plugin extension ABCs + registries
+    main.py               Entrypoint, FLOWMESH_PLUGINS loader, EventMonitor wiring
+    registries/           Worker / Node registries (Redis-backed)
+    routers/v1/           workflows, tasks, results, workers, nodes, ssh, stack, system
+    services/             monitoring, log streaming, ssh forwarding, runtime
+    supervisor/           Per-node agent (gRPC server, adapters, lifecycle)
+    task/                 parser, runtime, models, merge / epoch helpers
+  shared/
+    grpc/supervisor/v1/   Generated proto stubs (server + worker)
+    schemas/              Cross-cutting schemas
+    tasks/                Workflow/task spec models
+    utils/                JSON, parsing, time, ids
+  worker/
+    docker/               Worker Dockerfiles (CPU + GPU)
+    executors/            Executor implementations
+      mixins/               data, governance, inference, training
+      utils/                artifacts, checkpoints, data_utils, distributed,
+                            graph_templates, huggingface, safe_eval
+    runner.py             Task lifecycle (execute, write results, upload artifacts)
+cli/                    Typer CLI (`flowmesh`)
+sdk/                    Public Python SDK
+proto/                  gRPC service definition
+templates/              Example workflow YAMLs
+tests/{server,worker,shared,cli,sdk}/
+scripts/dev/            compile_protos, sync_requirements, check_env_examples
+```
+
+## Key runtime behavior
+
+- **Task merging.** Compatible adjacent tasks in a DAG (same `taskType`,
+  model, hardware shape, and merge key) coalesce into a single dispatch.
+  Merged children ride on `WorkerTaskMessage.merged_children`; the worker
+  writes per-child results into `result.children`; the dispatcher fans
+  out synthetic `TASK_SUCCEEDED` / `TASK_FAILED` events. Disable with
+  `ENABLE_TASK_MERGE=false`.
+- **Stage stickiness** (`ENABLE_STAGE_WEIGHT_STICKINESS=true`) — the
+  dispatcher pins stages that reference an upstream stage's checkpoint
+  to the worker that produced it, falling back to normal selection when
+  unavailable or stale. Mostly relevant for training pipelines reusing
+  on-disk checkpoints.
+- **Context reuse.** Workers report cached models/datasets in their
+  `WorkerHardware`. The dispatcher's `_cached_worker_candidates` filters
+  to workers whose cache covers the task's references; entries older
+  than `WORKER_CACHE_TTL_SEC` are ignored.
+- **Cursor pagination.** List endpoints accept `limit` and `before` /
+  `after` cursors. The cursor is an opaque base64 of `(timestamp, id)`;
+  do not parse client-side.
+- **Redis channels.** The runtime uses three namespaces:
+  - `flowmesh:control:*` — control plane (task assignments,
+    cancellations, worker lifecycle).
+  - `flowmesh:telemetry:*` — telemetry (heartbeats, status updates).
+  - `flowmesh:logs:task:{task_id}` and
+    `flowmesh:logs:workflow:{wfl_id}` — log streams, bounded by
+    `LOG_STREAM_MAXLEN_TASK` / `LOG_STREAM_MAXLEN_WORKFLOW` and
+    expired `LOG_STREAM_TTL_SEC` after close.
+
+## Plugin extension points
+
+Server extension points are loaded via the `FLOWMESH_PLUGINS` env var.
+Full contract, loader semantics, and a worked example live in
+[`docs/PLUGINS.md`](PLUGINS.md).
diff --git a/docs/CLI.md b/docs/CLI.md
new file mode 100644
index 00000000..0bfd67eb
--- /dev/null
+++ b/docs/CLI.md
@@ -0,0 +1,90 @@
+# CLI usage (`flowmesh`)
+
+The CLI entry point is `flowmesh` (provided by `flowmesh-cli`). Run any
+subcommand with `-h` / `--help` for the full flag list — this doc covers
+the common ones. Stack commands (`flowmesh stack ...`) come from the
+separate `flowmesh-cli-stack` package and operate on the local Compose
+stack and its containers; everything else works against any reachable
+FlowMesh server.
+
+## Top-level groups
+
+```
+flowmesh info | health | logout
+flowmesh workflow {submit, validate, list, info, watch, cancel, logs}
+flowmesh task     {list, info, watch, stop, logs}
+flowmesh worker   {list, info}
+flowmesh node     {list, info, worker {list, start, stop}}
+flowmesh ssh      {connect, run, proxy, connections}
+flowmesh result   {fetch, download}
+flowmesh trace    {fetch, analyze}
+flowmesh system   {metrics}
+flowmesh stack    {build, push, pull, pullall, up, down, restart, ps, logs}
+flowmesh stack worker {up, start, stop, down, list, pull}
+```
+
+## Common workflows
+
+Submit a workflow and watch it:
+
+```bash
+flowmesh workflow submit templates/echo_local.yaml
+flowmesh workflow watch <wfl-id>              # blocks until DONE / FAILED
+flowmesh workflow logs show <wfl-id>          # recent log entries
+flowmesh workflow logs stream <wfl-id>        # SSE log stream
+flowmesh workflow logs download <wfl-id> -o logs/
+```
+
+List, filter, paginate:
+
+```bash
+flowmesh workflow list --status DONE
+flowmesh task list --workflow-id <wfl-id> --status FAILED
+```
+
+Pull results / artifacts:
+
+```bash
+flowmesh result fetch <tsk-id>                                  # JSON result
+flowmesh result download <tsk-id> --include all -o bundle.tgz   # tar.gz bundle
+```
+
+Fetch or analyze workflow traces:
+
+```bash
+flowmesh trace fetch spans <wfl-id> -o spans.jsonl
+flowmesh trace analyze <wfl-id> --format critical-path
+```
+
+`trace fetch` accepts `spans`, `assets`, or `lineage`. `trace analyze
+--format` accepts `rich`, `critical-path` (`cp`), `end-to-end` (`e2e`),
+`queuing`, `lineage`, or `json`.
+
+## Local stack lifecycle
+
+`.env` controls registry, version tag, and ports — see
+`cli/stack/src/flowmesh_cli_stack/assets/.env.example`. Set
+`FLOWMESH_VERSION` to a PR-identifying slug (e.g. `myfeature`) so parallel
+PRs don't overwrite each other's local images.
+
+```bash
+flowmesh stack up                          # Server + Redis + Supervisor
+flowmesh stack worker up cpu 1             # 1 CPU worker
+flowmesh stack worker up gpu --targets 0   # 1 GPU worker pinned to GPU 0
+flowmesh stack down
+```
+
+After changing executor code, rebuild the affected image before bringing
+the stack back up — running containers don't pick up source changes:
+
+```bash
+flowmesh stack build flowmesh_worker_cpu flowmesh_worker_gpu
+```
+
+## SSH tasks
+
+```bash
+flowmesh ssh connect <tsk-id>          # interactive shell into an SSH task
+flowmesh ssh run <tsk-id> -- <cmd>     # one-shot exec
+flowmesh ssh connections               # list active proxy/forward connections
+```
diff --git a/docs/CODE_STYLE.md b/docs/CODE_STYLE.md
new file mode 100644
index 00000000..86b0b72d
--- /dev/null
+++ b/docs/CODE_STYLE.md
@@ -0,0 +1,110 @@
+# Code style
+
+## Python
+
+- Python 3.12+ (see `[project]` in `pyproject.toml`).
+- Top-level imports only; inline imports only to break circular imports.
+- Prefer `typing.Any` over `object`; use `object` only when `Any` is
+  semantically wrong.
+- Prefer `X | Y` and `X | None` over `typing.Union[X, Y]` /
+  `typing.Optional[X]`.
+- Don't write `from __future__ import annotations` — use `typing.Self` or
+  quoted forward references instead.
+- No `hasattr` / `getattr` that bypasses type checking. Use `isinstance`
+  guards. Acceptable `getattr` uses: dynamic dispatch, providing a
+  default, reaching into untyped third-party APIs.
+- `# type: ignore[<error-code>]` only after exhausting alternatives.
+  Never a bare `# type: ignore`.
+
+## Comments and docstrings
+
+- Default to **no comments**. Comment only when *why* is non-obvious.
+  Names self-document.
+- Don't reference the current task / fix / caller in comments ("used by
+  X", "added for Y", "handles issue #123") — those rot.
+- Docstrings describe what the function/module *does*, not what it
+  *replaced*. No "in-process replacement for X", no "previously did Y" —
+  read the docstring as if seeing the code for the first time.
+
+## Object IDs
+
+3-char prefixes: `wfl-`, `tsk-`, `ssn-`, `scn-`, `cmd-`. Always use
+`new_*_id()` helpers in `src/shared/utils/ids.py`. Never use `uuid4()`
+or `secrets.token_hex` for IDs.
+
+## Security rules (bandit-enforced)
+
+CI runs `bandit` with no severity / confidence threshold. Every finding
+must have a source-level fix, a documented skip in `[tool.bandit]` in
+`pyproject.toml`, or a per-line `# nosec BXXX` with a one-line written
+rationale at the call site. A bare `# nosec` (no rule code, no reason)
+is disallowed.
+
+When writing new code, follow these rules:
+
+- **B113** — every `requests.get/post/...` call passes `timeout=`. No
+  implicit defaults; hung connections are a DoS.
+- **B202** — `tarfile.extractall(..., filter="data")` (Python 3.12+).
+  For zipfile, iterate `infolist()`, validate each member resolves under
+  the destination, extract per-member. Never `zipfile.extractall` on
+  untrusted archives.
+- **B310** — don't use `urllib.request.urlopen`. Use `requests` and
+  validate the URL scheme (`http`/`https` only) before fetching.
+- **B324** — `hashlib.md5(..., usedforsecurity=False)` is required for
+  cache-key / fingerprint use. Never MD5 across a security boundary.
+- **B506** — `yaml.safe_load`, never `yaml.load(..., Loader=FullLoader)`.
+- **B603** — every `subprocess.run/call/Popen/...` needs a per-line
+  `# nosec B603` with a one-line rationale (e.g. `argv list, no
+  shell=True, absolute path via shutil.which()`). The B404 import-level
+  rule is project-skipped because B602/B607 catch the actually-dangerous
+  patterns; B603 is enforced per-site so every shellout is visible at
+  the call line.
+- **B607** — prefer the vendored SDK (`pynvml`, `docker-py`, `GitPython`)
+  over shelling out via `nvidia-smi` / `docker` / `git`. If shelling out
+  is unavoidable, the absolute path must be provided.
+- **B614** — `torch.load(..., weights_only=True)`. Pickle deserialization
+  is RCE waiting to happen.
+- **B701** — `Environment(autoescape=select_autoescape())`. The default
+  `False` is unsafe even for non-HTML templates.
+- **B108** — use `tempfile.gettempdir()` or `tempfile.NamedTemporaryFile`.
+  The literal `"/tmp"` in Python source is forbidden; for an
+  in-container sentinel, build it from `PurePosixPath` segments.
+
+When a documented `[tool.bandit]` skip stops being true (e.g. a sandbox
+stops being a sandbox), remove the skip and fix the call sites — don't
+widen the skip list silently.
+
+## Dependency CVE scanning (pip-audit)
+
+CI runs `pip-audit` against each generated requirements file
+(`src/server/requirements.txt`,
+`src/worker/requirements/requirements.txt`,
+`src/worker/requirements/requirements.gpu.txt`). The job lives in
+`.github/workflows/security.yml`.
+
+When pip-audit reports a new CVE, the only real fix is to bump the
+offending dep in `pyproject.toml`, then `uv lock` and `uv run
+scripts/dev/sync_requirements.py --write`. Silencing via `--ignore-vuln`
+is a last resort; every silenced GHSA needs a written upgrade-blocker.
+The currently-ignored advisories and the upgrade blocker that justifies
+each are listed below; the same list is encoded as `--ignore-vuln`
+flags in `.github/workflows/security.yml`.
+
+| GHSA | Package | Fix version | Why ignored |
+|------|---------|-------------|-------------|
+| GHSA-69w3-r845-3855 | transformers | 5.0.0rc3 | held by vllm/vllm-omni 0.18 compatibility |
+| GHSA-pf3h-qjgv-vcpr | vllm | 0.19.0 | held by transformers 4.57 + adjacent inference deps |
+| GHSA-pq5c-rjhq-qp7p | vllm | 0.19.0 | same |
+| GHSA-3mwp-wvh9-7528 | vllm | 0.19.0 | same |
+| GHSA-cfh3-3jmp-rvhc | pillow | 12.1.1 | gradio 5.50 caps pillow<12 (transitive via vllm-omni) |
+| GHSA-whj4-6x5x-4v2j | pillow | 12.2.0 | same cap |
+| GHSA-vfmq-68hx-4jfw | lxml | 6.1.0 | crawl4ai 0.8.6 caps lxml<6 |
+| GHSA-39mp-8hj3-5c49 | gradio | 6.7.0 | vllm-omni 0.18 pins gradio==5.50 |
+| GHSA-h3h8-3v2v-rg7m | gradio | 6.6.0 | same pin |
+| GHSA-jmh7-g254-2cq9 | gradio | 6.6.0 | same pin |
+| GHSA-pfjf-5gxr-995x | gradio | 6.6.0 | same pin |
+| GHSA-w8v5-vhqr-4h9v | diskcache | (none) | upstream unmaintained, no fixed version published |
+
+When a blocker lifts (e.g. transformers 5 ↔ vllm 0.19 line stabilizes),
+drop the corresponding `--ignore-vuln` flag from the workflow and the
+row from this table — don't extend the rationale to unrelated packages.
diff --git a/docs/ENV.md b/docs/ENV.md
new file mode 100644
index 00000000..efe5cf6b
--- /dev/null
+++ b/docs/ENV.md
@@ -0,0 +1,59 @@
+# Environment variables (curated)
+
+The canonical declared set lives in
+`cli/stack/src/flowmesh_cli_stack/env_schema.py` and is mirrored to
+`cli/stack/src/flowmesh_cli_stack/assets/.env.example`. Run
+`uv run scripts/dev/check_env_examples.py --write` after schema edits.
+
+The tables below curate the knobs you actually tune. Anything not
+listed here is in `.env.example`.
+
+## Server
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `REDIS_CONTROL_URL` | `redis://localhost:6379/0` | Redis control channel |
+| `REDIS_TELEMETRY_URL` | `redis://localhost:6380/0` | Redis telemetry channel |
+| `DATABASE_URL` | – | Postgres connection string |
+| `RESULTS_DIR` | `./results` | Server-side results directory |
+| `SERVER_HTTP_PORT` | `8000` | Public HTTP port |
+| `SERVER_GRPC_PORT` | `50051` | Supervisor gRPC port |
+| `ORCHESTRATOR_DISPATCH_MODE` | `adaptive` | Scheduler mode |
+| `ORCHESTRATOR_WORKER_SELECTION` | `best_fit` | `best_fit`, `first_fit`, `min_satisfying` |
+| `SCHEDULER_LAMBDA_INFERENCE` | `0.4` | Inference task weight |
+| `SCHEDULER_LAMBDA_TRAINING` | `0.8` | Training task weight |
+| `SCHEDULER_LAMBDA_OTHER` | `0.5` | Other-task weight |
+| `SCHEDULER_SELECTION_JITTER` | `1e-3` | Tie-break jitter |
+| `ENABLE_TASK_MERGE` | `true` | DAG-level task coalescing |
+| `TASK_MERGE_MAX_BATCH_SIZE` | `4` | Max merged tasks per dispatch |
+| `ENABLE_CONTEXT_REUSE` | `true` | Bias toward workers with cached models |
+| `WORKER_CACHE_TTL_SEC` | `3600` | Cache metadata TTL |
+| `ENABLE_STAGE_WEIGHT_STICKINESS` | `false` | Pin stages to checkpoint-producing workers |
+| `ENABLE_WORKER_WATCHDOG` | `true` | Worker death detection |
+| `WORKER_DEATH_GRACE_SEC` | `60` | Grace period before marking dead |
+| `FLOWMESH_PLUGINS` | – | Comma-separated plugin module names |
+| `LOG_LEVEL` | `INFO` | Server log level |
+
+## Worker
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `WORKER_TOKEN` | – | Auth token for supervisor gRPC |
+| `SUPERVISOR_GRPC_TARGET` | – | Supervisor gRPC endpoint |
+| `RESULTS_DIR` | `./results_workers` | Task output directory |
+| `WORKER_TAGS` | `` | Scheduler hints |
+| `WORKER_COST_PER_HOUR` | `1.0` | Cost metadata |
+| `WORKER_UPLOAD_RESULTS` | `false` | Upload results when no destination set |
+| `HF_CACHE_DIR` | – | Shared HuggingFace cache mount |
+| `HEARTBEAT_INTERVAL_SEC` | `30` | Heartbeat cadence |
+
+## Supervisor
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NODE_NAMESPACE` / `NODE_CLUSTER` / `NODE_ALIAS` | defaults | Identity |
+| `NODE_TAGS` | `` | Scheduler hints (CSV) |
+| `SUPERVISOR_GRPC_DISABLE_SERVER_TLS` | `false` | Local-only insecure gRPC |
+| `SUPERVISOR_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS` | `true` | gRPC keepalive |
+| `SUPERVISOR_GRPC_EXTERNAL_PORT` | – | External port (when port-forwarded) |
+| `SERVER_GRPC_TLS_*` | – | TLS certificate files |
diff --git a/docs/EXECUTORS.md b/docs/EXECUTORS.md
new file mode 100644
index 00000000..ed02e903
--- /dev/null
+++ b/docs/EXECUTORS.md
@@ -0,0 +1,39 @@
+# Task types and executor registry
+
+The worker resolves `spec.taskType` against an executor registry in
+`src/worker/runner.py`. Built-in executors:
+
+| `taskType` | Executor | Use case |
+|-----------|----------|----------|
+| `echo` | `EchoExecutor` | Echo input back as result (smoke tests) |
+| `inference` | `VLLMExecutor` / `TransformersExecutor` | LLM inference |
+| `diffusion` | `DiffusersExecutor` | Image / video diffusion models |
+| `omni_text2{audio,image,speech,general}` | `Omni*Executor` | Multimodal generation |
+| `training` | `SFTExecutor` / `LoRASFTExecutor` / `DPOExecutor` / `PPOExecutor` | Fine-tuning |
+| `rag` | `RAGExecutor` | Retrieval-augmented generation |
+| `agent` | `AgentExecutor` | Tool-using LLM agent (utu / youtu-agent backend) |
+| `data_profiling` | `DataProfilingExecutor` | DataFrame profiling |
+| `data_retrieval` | `DataRetrievalExecutor` | DataFrame loading from sources |
+| `ssh` | `SSHExecutor` | Interactive SSH session or non-interactive container job |
+
+Helper utilities live in `src/worker/executors/utils/` (`artifacts`,
+`checkpoints`, `data_utils`, `distributed`, `graph_templates`,
+`huggingface`, `safe_eval`). Cross-cutting behavior is in
+`src/worker/executors/mixins/` (`data`, `governance`, `inference`,
+`training`).
+
+## Agent executor (utu / youtu-agent)
+
+`AgentExecutor` requires the following env vars to run; the executor
+asserts them at import time, so a worker without them fails the task
+immediately:
+
+- `UTU_LLM_TYPE` — provider kind (e.g. `chat.completions`).
+- `UTU_LLM_MODEL` — model identifier.
+- `UTU_LLM_BASE_URL` — LLM endpoint base URL.
+- `UTU_LLM_API_KEY` — LLM API key.
+
+Optional, for the search tools:
+
+- `SERPER_API_KEY`
+- `JINA_API_KEY`
diff --git a/docs/PLUGINS.md b/docs/PLUGINS.md
new file mode 100644
index 00000000..69e8476d
--- /dev/null
+++ b/docs/PLUGINS.md
@@ -0,0 +1,66 @@
+# Plugin extension points
+
+External integrations (auth, submission policy, usage tracking) plug
+into the server through three protocol hooks defined in
+`src/server/hooks/`:
+
+- `IdentityProvider` — resolve a bearer token to a `PrincipalContext`
+  (iterated from `auth/security.py`). With no providers registered,
+  auth is a no-op and `authenticate_api_key` returns a default admin
+  principal.
+- `SubmissionGuard` — pre-submit precondition (iterated from
+  `routers/v1/workflows.py`).
+- `UsageSink` — fan-out per-task usage rows after a task completes
+  (iterated from `services/monitoring.py`). Typical consumers: billing,
+  audit, observability.
+
+## How plugins are loaded
+
+A plugin is any Python module that exposes a top-level `install()`. The
+server loads `FLOWMESH_PLUGINS` (comma-separated module names) inside
+its FastAPI lifespan and treats `install()` as either:
+
+- a sync function returning `None` — the plugin appends its adapters to
+  the registries in `server.hooks` and returns; or
+- an `@asynccontextmanager async def install()` — the plugin owns
+  resources with a lifecycle (a SQLAlchemy engine, an HTTP client, a
+  background task) that need teardown on server shutdown. The loader
+  holds an `AsyncExitStack`, enters each ctx-manager `install()` on
+  startup, and unwinds them in reverse order on shutdown.
+
+Plugins live anywhere on `sys.path` — in-tree under
+`src/server/<name>/`, sibling-mounted under `/app/src/<name>/`, or a
+pip-installable wheel. Core never references plugin module names; each
+plugin self-filters internally.
+
+## Plugins with their own DB
+
+OSS ships no DB itself. Plugins that need persistence bring their own
+engine and manage it via the ctx-manager `install()` form:
+
+```python
+# myorg_auth_plugin/__init__.py
+import os
+from contextlib import asynccontextmanager
+from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
+
+from server.hooks import IDENTITY_PROVIDERS
+
+
+class _MyOrgAuth:
+    name = "myorg.auth"
+    def __init__(self, sessionmaker): self._sm = sessionmaker
+    async def resolve(self, raw_token, logger):
+        async with self._sm() as session:
+            ...
+
+
+@asynccontextmanager
+async def install():
+    engine = create_async_engine(os.environ["MYORG_DATABASE_URL"])
+    IDENTITY_PROVIDERS.append(_MyOrgAuth(async_sessionmaker(engine)))
+    try:
+        yield
+    finally:
+        await engine.dispose()
+```
diff --git a/docs/SDK.md b/docs/SDK.md
new file mode 100644
index 00000000..fa3615e2
--- /dev/null
+++ b/docs/SDK.md
@@ -0,0 +1,70 @@
+# SDK usage (Python)
+
+The Python SDK lives in `sdk/src/flowmesh/`. The public surface exposes
+two clients — `FlowMesh` (sync) and `AsyncFlowMesh` (async) — both
+configured the same way and exposing the same resource groups
+(`workflows`, `tasks`, `workers`, `nodes`, `results`, `system`,
+`ssh`, `traces`). Both handle pagination, error formatting, and SSE
+streaming for you; prefer them over raw HTTP calls.
+
+## Sync client
+
+```python
+from flowmesh import FlowMesh
+
+client = FlowMesh(base_url="http://localhost:8000", api_key="...")
+
+# Submit a workflow from a YAML file.
+workflow_text = open("templates/echo_local.yaml").read()
+wf = client.workflows.submit(workflow_text)
+
+# Stream logs until the workflow finishes.
+for ev in client.workflows.stream_logs(wf.workflow_id):
+    print(ev.line)
+
+# Inspect tasks and pull a result.
+tasks = client.tasks.list(workflow_id=wf.workflow_id)
+result = client.results.get(tasks[0].task_id)
+```
+
+## Async client
+
+```python
+from flowmesh import AsyncFlowMesh
+
+async with AsyncFlowMesh(base_url="...", api_key="...") as client:
+    wf = await client.workflows.submit(yaml_text)
+    async for ev in client.workflows.stream_logs(wf.workflow_id):
+        ...
+```
+
+## Common operations
+
+- **Submit YAML / JSON / n8n** — `client.workflows.submit(text_or_mapping)`;
+  pass `workflow_format="n8n"` for n8n graphs.
+- **Validate without executing** — `client.workflows.validate(text_or_mapping)`.
+- **List with filters and pagination** — `client.workflows.list(status=...)`
+  and `client.tasks.list(workflow_id=..., status=...)`; pass raw cursor
+  params with `query_params=[("before", cursor), ("limit", "100")]`.
+- **Stream logs** — `client.workflows.stream_logs(wf_id)` and
+  `client.tasks.stream_logs(task_id)` yield server-sent events; the
+  iterator stops when the source closes.
+- **Pull artifacts** — `client.results.get(task_id)` for the result
+  payload, `client.results.download_bundle(task_id, include="all")`
+  for the tar.gz.
+- **Fetch and analyze traces** — `client.traces.fetch(wf_id, "spans")`
+  yields JSONL rows; `client.traces.analyze(wf_id)` returns a profile
+  summary.
+- **Cancel** — `client.workflows.cancel(wf_id)`.
+
+## Cursor pagination
+
+Cursor-enabled calls take `limit` and `before` / `after` params.
+Cursors are opaque base64 strings — pass them through; do not parse
+them.
+
+## Errors
+
+The SDK raises `flowmesh.FlowMeshError` (and a small set of subclasses
+for auth / not-found / rate-limit) instead of returning HTTP error
+shapes. Wrap your calls accordingly.
diff --git a/docs/WORKFLOWS.md b/docs/WORKFLOWS.md
new file mode 100644
index 00000000..c2fa4b6d
--- /dev/null
+++ b/docs/WORKFLOWS.md
@@ -0,0 +1,81 @@
+# Workflow YAML format
+
+Workflows are submitted as YAML (or JSON) to `POST /api/v1/workflows`
+(see [`docs/API.md`](API.md)). The `templates/` directory contains
+runnable examples for each shape; this page documents the spec
+hierarchy and the cross-cutting features.
+
+## Single task
+
+```yaml
+apiVersion: flowmesh/v1
+kind: InferenceTask
+metadata:
+  name: hello-inference
+spec:
+  taskType: inference
+  resources:
+    hardware: { gpu: { type: any, count: 1 } }
+  model:
+    source: { type: huggingface, identifier: TinyLlama/TinyLlama-1.1B-Chat-v1.0 }
+    vllm: { gpu_memory_utilization: 0.5 }
+  data:
+    type: list
+    items:
+      - - role: user
+          content: What is the capital of France?
+  inference: { max_tokens: 64, temperature: 0.0 }
+  output:
+    destination: { type: http }
+```
+
+## Multi-stage DAG
+
+```yaml
+apiVersion: flowmesh/v1
+kind: Workflow
+spec:
+  stages:
+    - name: extract
+      spec:
+        taskType: inference
+        ...
+        data:
+          type: list
+          items:
+            - - role: user
+                content: "Extract entities from: {{input}}"
+    - name: summarize
+      dependsOn: [extract]
+      spec:
+        taskType: inference
+        ...
+        data:
+          type: list
+          items:
+            - - role: user
+                content: "Summarize: {{extract.output}}"
+```
+
+`spec.stages[].dependsOn` declares the DAG edges; the dispatcher
+schedules each stage once all of its dependencies are `DONE`.
+Substitutions like `{{extract.output}}` are resolved against the
+upstream stage's result.
+
+## Graph DAG
+
+`taskType: graph_template` — topology-aware multi-input prompts with
+parent output substitution and validation. See
+`src/worker/executors/utils/graph_templates.py` for the templating
+contract.
+
+## Schedule hints
+
+Workflows can declare scheduling preferences via
+`metadata.annotations.schedule_hint`:
+
+- `epoch_groups: [[<task_name>, ...], ...]` — epoch-ordered execution;
+  tasks in epoch `n` only dispatch after every task in epoch `n-1`
+  succeeds.
+- `schedule_in_epoch_order: true` — for dependent DAGs, prefer
+  position-in-epoch tie-breaks during dispatch.
diff --git a/scripts/ci/check_pr_title.py b/scripts/ci/check_pr_title.py
index e6e02915..33c5d9c9 100644
--- a/scripts/ci/check_pr_title.py
+++ b/scripts/ci/check_pr_title.py
@@ -9,7 +9,7 @@
     print("❌ PR title is empty.")
     sys.exit(1)
 
-ALLOWED_TYPES = ["feat", "fix", "refactor", "chore", "test", "perf"]
+ALLOWED_TYPES = ["feat", "fix", "refactor", "chore", "test", "perf", "docs"]
 
 # Strip optional [1/N] progress prefix
 progress_match = re.match(r"^\[\d+/[\dNn]+\]\s*(.+)$", pr_title, re.IGNORECASE)
@@ -25,7 +25,7 @@
     core_title = pr_title
     is_breaking = False
 
-# Validate type: feat, fix, refactor, chore, test, perf
+# Validate type: feat, fix, refactor, chore, test, perf, docs
 types_re = "|".join(re.escape(t) for t in ALLOWED_TYPES)
 type_match = re.match(rf"^({types_re}):\s+.+$", core_title, re.IGNORECASE)
 if not type_match: