This repository contains planning docs and the agent execution scaffolding to build the steering platform end to end.
- Install prerequisites:
  - GitHub CLI (`gh`) authenticated to your account
  - Node.js 20+
  - pnpm
  - Deep Agents CLI (`deepagents`) installed and configured
- Validate task contracts: `pnpm verify`
- Bootstrap GitHub repo labels and issues: `bash scripts/bootstrap-github.sh`
- Assign tasks from `tasks/` to coding agents.

The repository includes a supervisor/worker/reconciler loop that dispatches tasks from GitHub issues and runs Deep Agents in isolated git worktrees.

    export REPO="hntrl/steering-rl"
    export BASE_BRANCH="main"
    export EXECUTOR_BOT_TOKEN="..."
    export LANGSMITH_API_KEY="..."
    export LANGCHAIN_API_KEY="$LANGSMITH_API_KEY"
    export LANGCHAIN_TRACING=true
    export DEEPAGENTS_LANGSMITH_PROJECT="steer-build-agents-staging"
    export DEEPAGENTS_AGENT="build"
    export DEEPAGENTS_SHELL_ALLOW_LIST="cd,git,pnpm,npm,node,npx,python3,bash,sh,ls,cat,head,tail,grep,pwd,which,cp,mv,rm,mkdir,touch"
    export MAX_PARALLEL=2
    export POLL_INTERVAL_SECONDS=60
    export MAX_AGENT_RUNTIME_MINUTES=20

Use an explicit command list for coding workflows. Do not set `DEEPAGENTS_SHELL_ALLOW_LIST=all`.

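The allow-list guidance above could be enforced with a small startup check. The following is a minimal sketch, not code from this repository: `parseShellAllowList` is a hypothetical helper, and it assumes the variable holds a comma-separated list of command names.

```typescript
// Hypothetical helper (not from the repository): validates a comma-separated
// shell allow list and rejects the catch-all value "all".
function parseShellAllowList(raw: string | undefined): string[] {
  const commands = (raw ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  if (commands.length === 0) {
    throw new Error("DEEPAGENTS_SHELL_ALLOW_LIST must name at least one command");
  }
  if (commands.some((entry) => entry.toLowerCase() === "all")) {
    throw new Error("DEEPAGENTS_SHELL_ALLOW_LIST=all is not allowed; list commands explicitly");
  }
  // De-duplicate while preserving the original order.
  return Array.from(new Set(commands));
}
```

A caller would typically pass the environment value into the helper and abort startup if it throws.
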

    bash scripts/agent-daemon.sh once

`once` runs in the foreground and prints dispatch output directly to your terminal.

Use `DRY_RUN=1` to test dispatch logic without labeling issues or starting workers.

    DRY_RUN=1 bash scripts/agent-daemon.sh once

Daemon lifecycle commands:

    bash scripts/agent-daemon.sh start
    bash scripts/agent-daemon.sh status
    bash scripts/agent-daemon.sh follow
    bash scripts/agent-daemon.sh stop

If a previous dry run or crash leaves a task lock behind, clear locks:

    bash scripts/agent-daemon.sh reset

After reconciliation, the reconciler automatically deletes remote branches for merged task PRs and prunes stale local worktrees that no longer have active runs.

Only branches matching `agent/P0-##`, `agent/P1-##`, `agent/P2-##`, or `agent/P3-##` with merged PRs are deleted. Protected branches (`main`, `master`) are never touched.

Use `DRY_RUN=1` (or `--dry-run`) to preview planned deletions without mutating git state:

    DRY_RUN=1 bash scripts/agent-daemon.sh once

Set `REPO_ROOT` to point at the bare repo root when running the reconciler outside a worktree.

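The branch cleanup rule described above can be illustrated with a small predicate. This is a sketch under stated assumptions rather than the reconciler's actual code: `isDeletableBranch` is a hypothetical name, and `##` is assumed to mean a two-digit task number.

```typescript
// Assumption: "##" in the branch patterns stands for a two-digit task number.
const TASK_BRANCH_PATTERN = /^agent\/P[0-3]-\d{2}$/;
const PROTECTED_BRANCHES = new Set(["main", "master"]);

// Hypothetical helper: decides whether a remote branch may be deleted.
function isDeletableBranch(branch: string, prMerged: boolean): boolean {
  // Protected branches are never touched, regardless of PR state.
  if (PROTECTED_BRANCHES.has(branch)) return false;
  // Only task branches with a merged PR qualify.
  return prMerged && TASK_BRANCH_PATTERN.test(branch);
}
```

Anchoring the pattern with `^` and `$` keeps branches such as `agent/P1-07-backup` out of scope in this sketch.
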
- Runtime state: `~/.agentd/state/runs.json`
- Structured events: `~/.agentd/logs/events.jsonl`
- Worker logs: `~/.agentd/logs/workers/`

For live Deep Agents progress, tail the latest worker log shown in the supervisor dispatch output.
The worker automatically syncs its branch with the base branch before each agent run. When merge conflicts occur:
- Lockfile-only conflicts (`pnpm-lock.yaml`): Resolved automatically by checking out the upstream version and running `pnpm install --lockfile-only`. Non-lockfile task changes are preserved.
- Non-lockfile conflicts: The rebase is aborted and the run exits with a retry status. No task changes are discarded.
- Repeated failures: After 3 failed conflict-recovery attempts, the issue is marked `status:blocked` with an actionable remediation comment describing manual resolution steps.

Conflict recovery emits branch_sync and conflict_recovery events to events.jsonl.
To disable automatic lockfile resolution, set DISABLE_LOCKFILE_AUTO_RESOLVE=1 in the worker environment and fall back to manual conflict resolution.
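The conflict-handling rules above amount to a small decision table. The sketch below is illustrative only: the type names and `chooseRecoveryAction` are hypothetical, and it assumes that a lockfile-only conflict with auto-resolution disabled is handled like any other conflict (abort and retry).

```typescript
type RecoveryAction = "auto_resolve_lockfile" | "abort_and_retry" | "mark_blocked";

interface ConflictState {
  conflictedFiles: string[];    // paths reported as conflicted during branch sync
  failedAttempts: number;       // conflict-recovery attempts that already failed
  autoResolveDisabled: boolean; // mirrors DISABLE_LOCKFILE_AUTO_RESOLVE=1
}

const LOCKFILE = "pnpm-lock.yaml";
const MAX_RECOVERY_ATTEMPTS = 3;

// Hypothetical helper: picks the next recovery step for a conflicted sync.
function chooseRecoveryAction(state: ConflictState): RecoveryAction {
  // After three failed attempts the issue is escalated for manual resolution.
  if (state.failedAttempts >= MAX_RECOVERY_ATTEMPTS) return "mark_blocked";

  const lockfileOnly =
    state.conflictedFiles.length > 0 &&
    state.conflictedFiles.every((file) => file === LOCKFILE);

  if (lockfileOnly && !state.autoResolveDisabled) return "auto_resolve_lockfile";
  // Assumption: everything else aborts the rebase and exits with a retry status.
  return "abort_and_retry";
}
```
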
- `once` runs a single dispatch cycle and exits; `status` will still show stopped afterward.
- If nothing dispatches, run `bash scripts/agent-daemon.sh reset` and retry.
- Check latest run status in `~/.agentd/state/runs.json`.
- Open `log_path` from the run record to see the exact Deep Agents error.

Run the doctor script for a consolidated diagnosis:

    pnpm agent:doctor

The doctor report includes explicit requeue commands when root dependency tasks are blocked.

Machine-readable JSON output for CI and cron alerting:

    node scripts/agent-doctor.mjs --format json

The JSON report follows `schemas/doctor-report.schema.json` and includes summary counts, per-check detail, and recommended remediation actions. Secret values are always redacted in both text and JSON modes.

Strict mode exits with a non-zero status when failure or warning thresholds are breached:

    node scripts/agent-doctor.mjs --strict  # exit 1 on any fail
    node scripts/agent-doctor.mjs --strict --fail-threshold 2  # exit 1 on >= 2 fails
    node scripts/agent-doctor.mjs --strict --warn-threshold 3  # also gate on >= 3 warnings
    node scripts/agent-doctor.mjs --strict --format json  # combine JSON + strict

Optional smoke test (30s timeout):

    node scripts/agent-doctor.mjs --smoke

Event payloads follow `schemas/agent-event.schema.json`. Run the validation with:

    pnpm events:validate

The `services/steering-inference-api/` service exposes an OpenAI-compatible `/v1/chat/completions` endpoint with activation steering support.

The route delegates inference to a pluggable ModelAdapter interface (src/providers/model-adapter.ts). The default HttpModelAdapter calls an upstream OpenAI-compatible runtime configured via environment variables:

- `INFERENCE_BASE_URL`: base URL of the model runtime (default: `http://localhost:8000`)
- `INFERENCE_API_KEY`: optional bearer token for the runtime

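As an illustration of how these two variables might be consumed, here is a minimal sketch. It is not the `HttpModelAdapter` implementation: `resolveAdapterConfig` and `AdapterConfig` are hypothetical names, and sending the token as an `Authorization: Bearer` header is an assumption based on the phrase "bearer token".

```typescript
interface AdapterConfig {
  baseUrl: string;
  headers: Record<string, string>;
}

// Hypothetical helper: the environment is passed in explicitly so the function
// is easy to test; a caller would typically pass process.env.
function resolveAdapterConfig(env: Record<string, string | undefined>): AdapterConfig {
  // Fall back to the documented default and trim trailing slashes.
  const baseUrl = (env["INFERENCE_BASE_URL"] || "http://localhost:8000").replace(/\/+$/, "");

  const headers: Record<string, string> = { "Content-Type": "application/json" };
  const apiKey = env["INFERENCE_API_KEY"];
  if (apiKey) {
    // Assumption: the optional token is sent as a standard bearer credential.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return { baseUrl, headers };
}
```
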
When no adapter is injected (e.g. in tests), the route falls back to a deterministic stub path.
Provider failures are mapped to structured 5xx responses with retry-safe error codes:
| Upstream status | Mapped status | Error code |
|---|---|---|
| 429 | 529 | provider_rate_limited |
| 5xx | 502 | provider_internal_error |
| Connection failure | 502 | provider_connection_error |
Steering metadata is attached to both successful responses and error paths when a profile is resolved.
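The status mapping in the table above could be expressed as a single function. This is a sketch, not the service's code: `mapProviderFailure` is a hypothetical name, a connection failure is modeled as a `null` upstream status, and statuses the table does not list are left unmapped.

```typescript
interface MappedProviderError {
  status: number;
  code: string;
}

// upstreamStatus is null when the request never produced an HTTP response
// (for example, a refused connection).
function mapProviderFailure(upstreamStatus: number | null): MappedProviderError | null {
  if (upstreamStatus === null) {
    return { status: 502, code: "provider_connection_error" };
  }
  if (upstreamStatus === 429) {
    return { status: 529, code: "provider_rate_limited" };
  }
  if (upstreamStatus >= 500 && upstreamStatus <= 599) {
    return { status: 502, code: "provider_internal_error" };
  }
  // Statuses outside the table are not covered by the source; left unmapped here.
  return null;
}
```
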
The CanaryController (services/canary-router/src/controller.ts) orchestrates phased traffic rollout from champion to challenger steering profiles with automatic rollback.
- Phase progression: 10% → 50% → 100% traffic to challenger, configurable at runtime without redeploy.
- Auto-advance: Automatically advances to the next phase after a configurable observation window when metrics are healthy.
- Automatic rollback: Triggers on `degenerate_rate`, `p95_latency_ms`, or `error_rate` threshold breaches.
- Kill switch: Instantly routes all traffic to the baseline (no steering) path.
- Freeze mode: Disables all phase advancement while maintaining current routing.
- Machine-readable events: Emits `phase_advance`, `rollback_triggered`, `kill_switch_enabled`, `config_updated`, and other structured events.

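The interaction between phase progression, freeze mode, and rollback thresholds listed above can be sketched as one decision step. This is an illustrative model, not the `CanaryController` source: `nextStep` and `breachedMetrics` are hypothetical names, and returning 0% challenger traffic on rollback is an assumption.

```typescript
type Metric = "degenerate_rate" | "error_rate" | "p95_latency_ms";

interface Threshold {
  metric: Metric;
  maxValue: number;
}

// Percent of traffic sent to the challenger in each phase.
const PHASES: number[] = [10, 50, 100];

// Returns the metrics whose observed value exceeds its threshold.
function breachedMetrics(observed: Record<Metric, number>, thresholds: Threshold[]): Metric[] {
  return thresholds
    .filter((t) => observed[t.metric] > t.maxValue)
    .map((t) => t.metric);
}

// Hypothetical decision step: roll back on any breach, hold while frozen,
// while the observation window is still open, or at the last phase;
// otherwise advance to the next phase.
function nextStep(
  phaseIndex: number,
  observed: Record<Metric, number>,
  thresholds: Threshold[],
  opts: { frozen: boolean; windowElapsed: boolean },
): { action: "rollback" | "hold" | "advance"; trafficPercent: number } {
  if (breachedMetrics(observed, thresholds).length > 0) {
    // Assumption: rollback returns all traffic to the champion profile.
    return { action: "rollback", trafficPercent: 0 };
  }
  const isLastPhase = phaseIndex >= PHASES.length - 1;
  if (opts.frozen || !opts.windowElapsed || isLastPhase) {
    return { action: "hold", trafficPercent: PHASES[phaseIndex] };
  }
  return { action: "advance", trafficPercent: PHASES[phaseIndex + 1] };
}
```

In this model a threshold breach takes priority over freeze mode, since freezing is described as stopping advancement only.
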
Example usage:

    import { CanaryController } from "./src/controller.js";

    const ctrl = new CanaryController({
      // Observe each phase for at least 5 minutes before advancing.
      minPhaseObservationMs: 5 * 60 * 1000,
      autoAdvance: true,
      router: {
        championProfileId: "steer-gemma3-default-v12",
        challengerProfileId: "steer-gemma4-candidate-v3",
        rollbackPolicy: {
          // 30-minute window for the rollback policy.
          windowMs: 30 * 60 * 1000,
          thresholds: [
            { metric: "degenerate_rate", maxValue: 0.03 },
            { metric: "error_rate", maxValue: 0.05 },
            { metric: "p95_latency_ms", maxValue: 5000 },
          ],
        },
      },
    });

    // Log every controller event as one JSON line.
    ctrl.on((event) => console.log(JSON.stringify(event)));

Run the tests and the canary simulation:

    pnpm test --filter canary-router && pnpm run canary:simulation

The inference path enforces per-route token budgets and request quotas to prevent production traffic from exceeding cost or safety envelopes.

src/guardrails/cost-policy.ts provides a CostPolicy class that:
- Tracks token usage and request counts within a configurable rolling window.
- Supports per-model and per-profile budget overrides with specificity-based resolution.
- Emits soft-limit warnings when usage crosses a configurable fraction (default 80%) of the budget.
- Returns deterministic 429 errors with `Retry-After` guidance when hard limits are breached.
- Emits all policy decisions as structured telemetry events for auditing.
- Supports runtime configuration updates: hard limits can be disabled to retain warning-only telemetry.

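To make the override resolution and soft/hard limit behavior above concrete, here is a minimal sketch. It is not the `CostPolicy` implementation: `resolveLimits` and `classifyUsage` are hypothetical names, and the resolution order (less specific overrides applied first, so the most specific value wins) is an assumption about what "specificity-based" means. Only token budgets are shown; request quotas would follow the same pattern.

```typescript
interface Limits {
  maxTokens: number;
  maxRequests: number;
  softLimitFraction: number;
}

interface Override {
  model?: string;
  profileId?: string;
  limits: Partial<Limits>;
}

// Assumed resolution order: matching overrides are applied from least to most
// specific (fewer matched fields first), so the most specific value wins.
function resolveLimits(defaults: Limits, overrides: Override[], model: string, profileId: string): Limits {
  const specificity = (o: Override): number => (o.model ? 1 : 0) + (o.profileId ? 1 : 0);
  return overrides
    .filter((o) => (!o.model || o.model === model) && (!o.profileId || o.profileId === profileId))
    .sort((a, b) => specificity(a) - specificity(b))
    .reduce<Limits>((acc, o) => ({ ...acc, ...o.limits }), defaults);
}

type Verdict = "allow" | "warn" | "reject";

// Classifies projected token usage against the soft and hard limits.
function classifyUsage(usedTokens: number, requestTokens: number, limits: Limits): Verdict {
  const projected = usedTokens + requestTokens;
  if (projected > limits.maxTokens) return "reject";
  if (projected >= limits.maxTokens * limits.softLimitFraction) return "warn";
  return "allow";
}
```

With hard limits disabled, a caller in this model would treat a `reject` verdict as a warning instead of returning a 429.
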
Example usage:

    import { CostPolicy, BudgetExceededError } from "./src/guardrails/cost-policy.js";

    const policy = new CostPolicy({
      defaultLimits: { maxTokens: 1_000_000, maxRequests: 10_000, softLimitFraction: 0.8 },
      windowMs: 60 * 60 * 1000, // one-hour rolling window
      overrides: [
        { model: "gemma-3-27b-it", limits: { maxTokens: 2_000_000 } },
        { profileId: "premium", limits: { maxRequests: 50_000 } },
      ],
      hardLimitsEnabled: true,
    });

    // Log every policy decision as one JSON line.
    policy.on((event) => console.log(JSON.stringify(event)));

    // Illustrative value; in practice this is estimated from the incoming request.
    const estimatedTokens = 1_200;
    const decision = policy.enforce("gemma-3-27b-it", "profile-id", estimatedTokens);

`src/budget-hooks.ts` connects budget breach signals to the rollout controller:

- Warning signals are recorded and emitted as telemetry without affecting rollout.
- Breach signals freeze the controller (halt phase progression) after a configurable threshold.
- Optional rollback-on-breach mode triggers rollback evaluation.
- Reset clears breach counters and optionally unfreezes the controller.
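The breach-threshold behavior in the list above can be modeled with a small counter. This is a sketch of the idea, not the `BudgetHooks` implementation: `BreachCounter` and `Freezable` are hypothetical names, and the rollback-on-breach mode is omitted.

```typescript
interface Freezable {
  freeze(): void;
  unfreeze(): void;
}

type Severity = "warning" | "breach";

// Hypothetical stand-in for the hook logic: counts breach signals and freezes
// the controller once the threshold is reached. Warnings never freeze.
class BreachCounter {
  private breaches = 0;
  private readonly controller: Freezable;
  private readonly threshold: number;

  constructor(controller: Freezable, threshold: number) {
    this.controller = controller;
    this.threshold = threshold;
  }

  processSignal(severity: Severity): "recorded" | "frozen" {
    if (severity === "warning") return "recorded";
    this.breaches += 1;
    if (this.breaches >= this.threshold) {
      this.controller.freeze();
      return "frozen";
    }
    return "recorded";
  }

  // Clears the breach count and optionally unfreezes the controller.
  reset(unfreeze: boolean): void {
    this.breaches = 0;
    if (unfreeze) this.controller.unfreeze();
  }
}
```
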
Example usage:

    import { CanaryController } from "./src/controller.js";
    import { BudgetHooks } from "./src/budget-hooks.js";

    const ctrl = new CanaryController();
    // Freeze the controller after the first breach signal.
    const hooks = new BudgetHooks(ctrl, { freezeOnBreach: true, breachCountThreshold: 1 });

    // Log every hook event as one JSON line.
    hooks.on((event) => console.log(JSON.stringify(event)));
    hooks.processSignal({ severity: "breach", model: "gemma-3-27b-it", /* ... */ });

If budget enforcement blocks valid traffic unexpectedly, disable hard limits and retain warning-only telemetry until policy thresholds are corrected.

The nightly promotion pipeline (jobs/nightly/promote.ts) automates the end-to-end flow from trace ingestion through Stage D decision output and canary configuration handoff.

    pnpm run promote:nightly

The dry-run mode is safe for CI and scheduled checks; it executes all stages but writes no files:

    pnpm run promote:nightly -- --dry-run

Pipeline stages:

- Dataset mining: Runs Stage A (baseline), Stage B (single-layer sweep), and Stage C (multi-layer calibration).
- Experiment scoring: Runs the Stage D champion-challenger bake-off with hard gate enforcement.
- Promotion handoff: Builds the canary router configuration with a rollback payload.
- Release artifact: Emits the decision summary, evidence links, and rollback instructions to `artifacts/releases/`.

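The dry-run contract described above (every stage executes, nothing is written) can be sketched as follows. This is not the code in `jobs/nightly/promote.ts`: the stage identifiers, the placeholder stage output, the output file names, and the injected `write` callback are all hypothetical.

```typescript
interface StageResult {
  stage: string;
  output: string;
}

interface PipelineOptions {
  dryRun: boolean;
  // Injected so that callers (and tests) control where files go.
  write: (path: string, contents: string) => void;
}

// Hypothetical stage identifiers mirroring the four stages listed above.
const STAGES: string[] = [
  "dataset-mining",
  "experiment-scoring",
  "promotion-handoff",
  "release-artifact",
];

function runPipeline(opts: PipelineOptions): StageResult[] {
  const results: StageResult[] = [];
  for (const stage of STAGES) {
    // Every stage executes, even in dry-run mode (placeholder work here).
    const output = `${stage}: ok`;
    results.push({ stage, output });
    // Only a real run persists anything to disk.
    if (!opts.dryRun) {
      opts.write(`artifacts/releases/${stage}.txt`, output);
    }
  }
  return results;
}
```

Injecting the writer keeps the dry-run guarantee easy to verify: a test can pass a recording callback and check that it was never called.
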
If the nightly promotion flow fails, pause automatic handoff and require manual promotion review with static canary champion routing. See artifacts/releases/README.md for rollback payload format.

- `steering-exec-plan.md`
- `feedback-loop.md`
- `agent-delivery-plan.md`
- `agent-instrumentation-spec.md`

Task contracts live in tasks/ and are designed for coding-agent execution.