diff --git a/.claude/agent-memory/implementer.md b/.claude/agent-memory/implementer.md index be9f76c..2b91795 100644 --- a/.claude/agent-memory/implementer.md +++ b/.claude/agent-memory/implementer.md @@ -1,4 +1,9 @@ +## 2026-04-13 — v3 spec docs authoring (COMPILER.md) +- **Learned:** When writing spec docs that cross-reference other specs, read all referenced files before writing — the DECISIONS.md may contradict the task brief (e.g., DECISIONS.md says `flock` but the task brief already decided on mkdir-based locking; trust the task brief for closed decisions). +- **Learned:** For compiler specs where the output is bash, the pseudocode-with-inline-comments pattern in Section 11 (showing compiler-generated structure without writing full implementation) is the right balance between specificity and staying within spec scope. +- **Avoid:** Do not write full implementation bash in a spec doc — show structure and intent with comments instead. + ## 2026-04-08 — practices pipeline inbox→active migration - **Learned:** When promoting inbox practices to active, the pattern is: Write new file to active/ with updated frontmatter (status, incorporated_in, effectiveness fields added), then rm original from inbox/. Single batch rm for all deletions is cleaner than individual calls. - **Avoid:** Do not use Edit to move files — Write+rm is the correct approach since Edit only modifies in-place. diff --git a/docs/v3/AUDIT.md b/docs/v3/AUDIT.md new file mode 100644 index 0000000..23027f0 --- /dev/null +++ b/docs/v3/AUDIT.md @@ -0,0 +1,237 @@ +# Audit Trail — dotforge v3.0 Behavior Governance + +Specifies the triple-write audit architecture, overrides.log format, metrics exposed +to `/forge audit` and `/forge behavior status`, and integration with existing dotforge systems. + +Reference: [SPEC.md](SPEC.md) Section 6 for the override protocol. +Reference: [RUNTIME.md](RUNTIME.md) Section 2 for the state.json schema. + +--- + +## 1. 
Overview + +Every soft_block override is recorded in three locations simultaneously. Each location serves +a distinct purpose; none is redundant. + +| Location | Purpose | Scope | Persistence | +|----------|---------|-------|-------------| +| `.forge/audit/overrides.log` | Compliance audit trail | All time | Permanent — committed to git | +| `.forge/runtime/state.json` | Runtime inspection | Session (TTL 24h) | Ephemeral — gitignored | +| `registry/projects.yml` metrics | Cross-project trends | Snapshot | Permanent — committed to git | + +**Why three places:** +- `overrides.log` — append-only, inspectable with grep, survives session resets. The authoritative record. +- `state.json` — in-memory view for the current session. Powers `/forge behavior status` without file parsing. +- Registry — aggregated `override_rate` enables cross-project governance dashboards and trend detection. + +Directory layout: +``` +.forge/ +├── audit/ +│ └── overrides.log # permanent, committed to git +└── runtime/ + ├── state.json # ephemeral, gitignored + └── state.lock/ # transient mkdir lock +``` + +--- + +## 2. overrides.log Format + +Location: `.forge/audit/overrides.log` + +Append-only. One record per line. Pipe-delimited. No header row. 
+ +``` +TIMESTAMP|SESSION_ID|BEHAVIOR_ID|TOOL_NAME|TOOL_INPUT_SUMMARY|COUNTER|REASON +``` + +**Example records:** +``` +2026-04-13T12:15:00Z|a1b2c3d4|search-first|Edit|file_path=/src/utils.ts old_string=function|5| +2026-04-13T14:30:00Z|a1b2c3d4|search-first|Write|file_path=/src/new-module.ts content=import|7|urgent hotfix +2026-04-14T09:00:00Z|e5f6a7b8|search-first|Write|file_path=/tests/test_api.py content=def test|2| +2026-04-14T10:45:00Z|e5f6a7b8|verify-before-done|Bash|command=git commit -m "feat: add"|3|tests passed locally +2026-04-15T16:20:00Z|c9d0e1f2|search-first|Edit|file_path=/lib/parser.ts old_string=export|4| +``` + +**Field reference:** + +| Field | Format | Notes | +|-------|--------|-------| +| `TIMESTAMP` | ISO 8601 with timezone (`Z` or offset) | UTC preferred | +| `SESSION_ID` | First 8 chars of Claude Code session UUID | From hook payload `session_id` | +| `BEHAVIOR_ID` | kebab-case behavior id | Matches `behavior.yaml` id field | +| `TOOL_NAME` | Tool that triggered the block | Write, Edit, Bash, etc. | +| `TOOL_INPUT_SUMMARY` | First 100 chars of key `tool_input` fields | Pipe chars escaped as `\|`, newlines as `\n` | +| `COUNTER` | Violation count at override moment | Integer; counter is already incremented (see SPEC.md §3.1) | +| `REASON` | User-provided reason string | Empty string if none; never contains pipe chars | + +The `overrides[]` array in `state.json` (see RUNTIME.md §2) is the in-session subset of this log. +The log is the authoritative source; state.json is derived and ephemeral. + +--- + +## 3. Log Rotation Policy + +No rotation in v3.0. The file grows indefinitely. + +Expected growth: an active project is expected to record fewer than 100 overrides per month, which is under 10 KB/month. +Rotation is reserved for v3.1 based on observed usage. + +To count total overrides: `wc -l .forge/audit/overrides.log` + +--- + +## 4. Metrics for /forge audit + +The existing 13-item checklist and scoring formula (see `audit/scoring.md`) are unchanged. 
+Behavior governance metrics appear as a **separate section** appended after the checklist score. + +**New section: Behavior Governance** + +| Metric | Type | Source | Calculation | +|--------|------|--------|-------------| +| `behaviors_installed` | integer | `behaviors/*/behavior.yaml` file count | direct count | +| `behaviors_enabled` | integer | `behaviors/index.yaml` enabled entries | direct count | +| `violations_total` | integer | `state.json` sum of all counters | sum across all sessions and behaviors | +| `overrides_total` | integer | `overrides.log` line count | `wc -l` | +| `override_rate` | float | `overrides_total / violations_that_reached_block` | ratio; 0.0 if no blocks | +| `escalation_effectiveness` | string | threshold on `override_rate` | `healthy` < 0.3 / `review` 0.3–0.7 / `ineffective` > 0.7 | + +**Display format in `/forge audit` output:** +``` +── Behavior Governance ── +Installed: 6 (4 core, 2 opinionated) +Enabled: 4 +Violations (current session): 12 +Overrides (all time): 3 +Override rate: 0.25 (healthy) +``` + +`escalation_effectiveness` interpretation: +- `healthy` — overrides are rare; enforcement is accepted +- `review` — override rate is high; consider adjusting thresholds or behavior wording +- `ineffective` — most blocks are overridden; the behavior adds friction without governance value + +--- + +## 5. 
Metrics for /forge behavior status + +Per-behavior display, sourced from `state.json` for the current session: + +``` +── search-first (core) ── +Status: enabled +Counter: 4 (this session) +Level: warning (escalates to soft_block at 5) +Overrides: 1 (this session) +Last: 2026-04-13T14:28:00Z via Write + +── no-destructive-git (core) ── +Status: enabled +Counter: 0 +Level: hard_block (always) +Overrides: 0 +Last: never +``` + +Session aggregate at bottom: +``` +── Session Summary ── +Total violations: 12 +Total overrides: 1 +Active behaviors: 4/6 +Session started: 2026-04-13T10:00:00Z +``` + +`Last` field shows `last_violation_at` and `last_violation_tool` from `state.json`. +`Level` shows `effective_level` with the next escalation threshold if applicable. + +--- + +## 6. Registry Integration + +New fields added to each project entry in `registry/projects.yml`: + +```yaml +projects: + - slug: soma + # ... existing fields ... + behaviors: + installed: 6 + enabled: 4 + override_rate: 0.25 + last_audit: "2026-04-13" +``` + +These fields are snapshot values written by `/forge audit` when behavior governance is active. +They are not real-time. `override_rate` is computed from `overrides.log` at audit time. + +`/forge status` can aggregate `override_rate` across projects to surface systemic governance gaps. + +--- + +## 7. Integration with session-report.sh + +Four new fields added to the JSON output written to `~/.claude/metrics/{slug}/{date}.json` +by the Stop hook (`hooks/session-report.sh`): + +```json +{ + "behavior_violations": 12, + "behavior_overrides": 1, + "behaviors_active": 4, + "behavior_blocks": 3 +} +``` + +These fields are appended alongside existing fields (`sessions`, `errors_added`, `hook_blocks`, +`lint_blocks`, etc.). Backwards compatible — consumers that don't read these fields are unaffected. + +`behavior_violations` — sum of all behavior counters in the current session from `state.json`. 
+`behavior_overrides` — count of overrides recorded in this session (from `state.json overrides[]`). +`behaviors_active` — count of enabled behaviors from `behaviors/index.yaml`. +`behavior_blocks` — number of soft_block or hard_block events this session (counter reached block threshold). + +--- + +## 8. Audit Trail Security + +- `.forge/audit/overrides.log` permissions: `0644` +- `.forge/audit/` is committed to git — permanent audit evidence +- `.forge/runtime/` is gitignored — ephemeral session state only +- `tool_input_summary` is truncated to 100 chars — no secrets in full form +- Pipe chars in `tool_input_summary` are escaped as `\|` before writing +- Newlines in `tool_input_summary` are escaped as `\n` before writing +- No tool input values beyond the summary are persisted in any audit location +- The `REASON` field must be sanitized to strip pipe chars before appending + +--- + +## 9. Grep One-Liners + +Useful commands for audit analysis: + +```bash +# All overrides for a specific behavior +grep '|search-first|' .forge/audit/overrides.log + +# Override count by behavior (ranked) +cut -d'|' -f3 .forge/audit/overrides.log | sort | uniq -c | sort -rn + +# Overrides in last 7 days (macOS + Linux portable) +awk -F'|' -v d="$(date -d '7 days ago' +%Y-%m-%d 2>/dev/null || date -v-7d +%Y-%m-%d)" '$1 >= d' .forge/audit/overrides.log + +# Overrides by tool +cut -d'|' -f4 .forge/audit/overrides.log | sort | uniq -c | sort -rn + +# All overrides with a non-empty reason (REASON is the last field; escaped pipes in the summary shift $7) +awk -F'|' '$NF != ""' .forge/audit/overrides.log + +# Total override count +wc -l .forge/audit/overrides.log + +# Override rate per behavior requires violations from state.json +# Use /forge behavior status — grep on overrides.log alone is insufficient for rate calculation +``` diff --git a/docs/v3/COMPETITIVE.md b/docs/v3/COMPETITIVE.md new file mode 100644 index 0000000..be162e5 --- /dev/null +++ b/docs/v3/COMPETITIVE.md @@ -0,0 +1,97 @@ +# Competitive map and differentiators of dotforge v3.0 + +## 
Verified direct competition + +### obey (Lexxes-Projects) + +- Natural language → rule storage → auto-generated hook +- 17 lifecycle hooks +- 3 scopes: global, stack-specific, project-local +- Active blocking via PreToolUse +- Audit trail +- Completion checklists via Stop hook + +**Overlap with the original 3.0:** almost total. +**Our differentiator against obey:** curated catalog, integration with the +rest of the dotforge lifecycle (audit, practices, registry, export), +5-level escalating enforcement vs. binary hard-block, first-class +security. + +### hookify (official Anthropic) + +- Official plugin +- Markdown rules → active hooks +- One file per rule, no restart + +**Overlap:** partial, similar pattern. +**Risk:** if Anthropic publishes an official behavior spec in 6 months, dotforge +must align with it. The schema should stay as close as possible to obvious +standards. + +### tdd-guard (nizos, 1.7k stars) + +- Vertical TDD enforcement +- Cross-hook context aggregation via shared files +- Quick ON/OFF commands via UserPromptSubmit +- Multi-language + +**Overlap:** none (specific vertical). +**Applicable lessons:** +- Cross-hook context aggregation: defer to 3.1 but design a compatible + schema from 3.0 +- Quick ON/OFF commands: include in 3.0 (`/forge behavior off`, scope) + +### Other projects worth referencing + +- **claude-code-workflow-orchestration:** soft enforcement with escalating + nudges (silent → hint → warning → strong). Source of the 5-level + model. +- **claude-code-lsp-enforcement-kit:** persistent state tracking per + cwd, multi-tier by agent type. Source of the `applies_to.agents` field. +- **AgentSpec (ICSE '26):** academic DSL for runtime enforcement. + Formal reference citable in the README. +- **AgentBound:** manifest + enforcement engine architecture. Internal + model, not user-facing. + +## Defensible differentiators of dotforge 3.0 + +1. **Curated catalog with governance:** obey/hookify ask you to write + rules. 
dotforge ships behaviors that are tested, auditable, and versioned. + +2. **Integration with the existing lifecycle:** behaviors enter the + `inbox → evaluating → active → deprecated` pipeline, `/forge audit`, + and the cross-project registry. No one else has this. + +3. **5-level escalating enforcement with an escape UX:** obey is + binary hard-block. dotforge lets you configure silent/nudge/warning/ + soft/hard per behavior, with audited override at soft and + per-scope escape commands. + +4. **Policy vs. rendering separation:** a behavior declares the expected + behavior separately from the text injected into CLAUDE.md. This enables + native multi-platform export (deferred to 3.1 but designed from 3.0). + +5. **First-class security (deferred to 3.2 but planned):** after the Feb 2026 + CVE, signed behaviors + hash verification + sandbox are an + enterprise requirement. Competitors do not have it. + +## Competitive risks + +- **obey consolidates its narrative while you build:** the mitigation is a fast + Phase 1 with a working search-first + a public GIF. +- **Anthropic publishes an official behavior spec:** the mitigation is a conservative + schema aligned with obvious patterns + a `schema_version` field. +- **dotforge stays at 4 stars:** the problem is distribution, not product. + Technical post + real benchmark + Anthropic marketplace update in Phase 3. 
+ +## Feature map deferred to 3.1/3.2 + +| Feature | Competitor that already does it | dotforge target | +|---------|---------------------------|-----------------| +| Prompt-based hooks | Anthropic (official) | 3.1 | +| Multi-platform export of rules | None directly | 3.1 | +| Context aggregation | tdd-guard | 3.1 | +| Natural language input | obey, rule2hook | 3.1 | +| Signed behaviors | None | 3.2 | +| Transcript verification | None | 3.2 | +| OPA/Rego compile | yaml-opa-llm-guardrails | 3.2 optional | diff --git a/docs/v3/COMPILER.md b/docs/v3/COMPILER.md new file mode 100644 index 0000000..a87e356 --- /dev/null +++ b/docs/v3/COMPILER.md @@ -0,0 +1,458 @@ +# Behavior Compiler — dotforge v3.0 + +Specifies how `behaviors/` YAML files are compiled into executable Claude Code hook scripts. +Input: `behaviors/index.yaml` + `behaviors//behavior.yaml` files. +Output: `.claude/hooks/behaviors/` scripts + `settings.json` hook registrations. + +References: [SPEC.md](SPEC.md) (evaluation algorithm, output protocol), [SCHEMA.md](SCHEMA.md) (field reference, DSL), [RUNTIME.md](RUNTIME.md) (state.json, locking). + +--- + +## 1. Overview + +The compiler is a one-shot transform: read declarative YAML, emit executable bash. +It does not interpret behavior logic at compile time — it generates bash that interprets it at runtime. +The shared runtime library (`_forge_runtime.sh`) implements the evaluation algorithm from SPEC.md Section 2. +Generated hooks always exit 0; enforcement is via JSON stdout per SPEC.md Section 5. + +--- + +## 2. Compilation Pipeline + +``` +1. Read behaviors/index.yaml + → validate schema_version == "1" + → collect ordered list of {id, enabled} + +2. For each enabled behavior (in declaration order): + → parse behaviors//behavior.yaml + → validate against SCHEMA.md Section 6 (all rules) + → collect validation errors; abort if any + +3. 
Group behaviors by trigger event type + → one group per event: PreToolUse, PostToolUse, UserPromptSubmit, Stop + → behaviors with multiple triggers appear in multiple groups + +4. For each event group: + → generate ONE hook script at .claude/hooks/behaviors/.sh + → script sources _forge_runtime.sh and calls evaluate_behavior() per behavior + +5. Generate PermissionDenied override audit hook + → .claude/hooks/behaviors/PermissionDenied.sh + → writes override record to .forge/audit/overrides.log + state.json + → see SPEC.md Section 6 for override protocol + +6. Write .claude/hooks/behaviors/_forge_runtime.sh + → shared library implementing SPEC.md Section 2 algorithm + +7. Update .claude/settings.json + → append behavior hook entries AFTER existing hooks for each event + → register PermissionDenied hook for override audit trail + → idempotent: re-running produces identical output (see Section 6) + +8. chmod +x all generated scripts + +9. Report: + → behaviors compiled (count) + → hooks generated (list) + → validation errors (if --validate or on failure) +``` + +If any behavior fails validation, no files are written. All-or-nothing. + +--- + +## 3. Generated Hook Structure + +Every generated event hook follows this template: + +```bash +#!/usr/bin/env bash +# AUTO-GENERATED by dotforge behavior compiler v1. DO NOT EDIT. 
+# Source: behaviors/index.yaml +# Generated: 2026-04-13T15:00:00Z +# Event: PreToolUse +# Behaviors: search-first, no-destructive-git + +set -euo pipefail + +# Source shared runtime +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +source "$SCRIPT_DIR/_forge_runtime.sh" + +# Read hook input +INPUT=$(cat) +TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty') +TOOL_INPUT=$(echo "$INPUT" | jq -r '.tool_input // empty') +SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // empty') +HOOK_EVENT=$(echo "$INPUT" | jq -r '.hook_event_name // empty') + +# Fallback session ID if not provided (see RUNTIME.md Section 3) +if [ -z "$SESSION_ID" ]; then + SESSION_ID=$(echo "${PWD}:${PPID}:$(date +%Y%m%d)" | md5sum 2>/dev/null | cut -c1-8 \ + || echo "${PWD}:${PPID}:$(date +%Y%m%d)" | md5 -q 2>/dev/null | cut -c1-8 \ + || echo "fallback") +fi + +# Initialize state (lock + read + TTL purge per RUNTIME.md Section 10) +forge_init_state "$SESSION_ID" + +# --- Behavior: search-first --- +# Source: behaviors/search-first/behavior.yaml +evaluate_behavior "search-first" "$TOOL_NAME" "$TOOL_INPUT" "$SESSION_ID" "$HOOK_EVENT" + +# --- Behavior: no-destructive-git --- +# Source: behaviors/no-destructive-git/behavior.yaml +evaluate_behavior "no-destructive-git" "$TOOL_NAME" "$TOOL_INPUT" "$SESSION_ID" "$HOOK_EVENT" + +# Merge outputs, write state, release lock, emit JSON to stdout +forge_emit_output +exit 0 +``` + +One `evaluate_behavior` call is emitted per behavior, in index.yaml declaration order. +The hook always exits 0 (SPEC.md Section 5). + +--- + +## 4. Shared Runtime Library (_forge_runtime.sh) + +Function signatures and responsibilities. Implementation detail, not pseudocode. + +```bash +# Acquire lock, read state.json (or initialize if missing/corrupt), +# purge expired sessions (RUNTIME.md Section 6), get/create session entry. +forge_init_state(session_id) + +# Write mutated state.json to disk, release lock. 
+# On write failure: log to stderr, continue (RUNTIME.md Section 9). +forge_finalize_state() + +# Core evaluation per SPEC.md Section 2: +# 1. Check applies_to.tools — skip if tool not in list (empty = all) +# 2. Check trigger.event matches HOOK_EVENT +# 3. Evaluate trigger conditions via forge_check_condition() +# 4. If triggered: increment counter, resolve level (SPEC.md 2.1), apply monotonic (SPEC.md 3.2) +# 5. Render output for the effective level +# 6. Queue output via forge_queue_output() +# 7. If level is soft_block or hard_block: set FORGE_BLOCK_HIT=1 (cuts chain) +evaluate_behavior(behavior_id, tool_name, tool_input, session_id, hook_event) + +# Dispatch to operator-specific bash check. +# Returns 0 (condition met) or 1 (condition not met). +forge_check_condition(field, operator, value, tool_input) + +# Append {behavior_id, level, message} to in-memory output queue. +forge_queue_output(behavior_id, level, message) + +# Implements SPEC.md Section 2.3 merge_outputs: +# - If FORGE_BLOCK_HIT: emit last block output only +# - Else: concatenate all systemMessages with "\n\n" +# Calls forge_finalize_state() before emitting. +forge_emit_output() + +# mkdir-based, 2s timeout. On timeout: set FORGE_LOCK_FAILED=1. +# See RUNTIME.md Section 7. +forge_acquire_lock() +forge_release_lock() + +# Write message to stderr. Never blocks tool call. +forge_log_warning(message) + +# Truncate string to max_len chars for audit summaries. +forge_truncate(string, max_len) +``` + +`_forge_runtime.sh` sources `behaviors//behavior.yaml` data embedded at compile time as +bash variables — behaviors are not re-read at runtime. The compiler inlines the YAML fields +(trigger conditions, escalation thresholds, rendering templates) as bash arrays and strings. + +--- + +## 5. 
Trigger-to-Bash Compilation + +How each DSL operator (SCHEMA.md Section 3) maps to bash inside `evaluate_behavior`: + +| DSL Operator | Bash Equivalent | +|---|---| +| `regex_match` | `echo "$val" \| grep -qE "$pattern"` | +| `contains` | `echo "$val" \| grep -qF "$pattern"` | +| `not_contains` | `! echo "$val" \| grep -qF "$pattern"` | +| `equals` | `[[ "$val" == "$pattern" ]]` | +| `starts_with` | `[[ "$val" == "$pattern"* ]]` | +| `ends_with` | `[[ "$val" == *"$pattern" ]]` | +| `gt` | `[[ "$val" -gt "$pattern" ]]` | +| `lt` | `[[ "$val" -lt "$pattern" ]]` | +| `gte` | `[[ "$val" -ge "$pattern" ]]` | +| `lte` | `[[ "$val" -le "$pattern" ]]` | +| `exists` | `[[ -n "$val" ]]` | +| `not_exists` | `[[ -z "$val" ]]` | + +Logic composition: +- `logic: all` → conditions joined with `&&` (default) +- `logic: any` → conditions joined with `||` + +Field extraction from `tool_input` JSON: + +```bash +extract_field() { + local field="$1" tool_input="$2" + echo "$tool_input" | jq -r ".$field // empty" 2>/dev/null +} +``` + +`session_state.counter` is read from the in-memory state loaded by `forge_init_state`, +not from the JSON tool_input payload. + +--- + +## 6. Settings.json Mutation + +The compiler appends behavior hook registrations to `settings.json`. +Existing hooks are preserved as-is. Behavior hooks are added AFTER (SPEC.md Section 8.2). 
+ +Before (existing hooks only): + +```json +{ + "hooks": { + "PreToolUse": [ + { + "matcher": "Bash", + "hooks": [ + {"type": "command", "command": ".claude/hooks/block-destructive.sh"} + ] + } + ] + } +} +``` + +After (behavior hooks appended): + +```json +{ + "hooks": { + "PreToolUse": [ + { + "matcher": "Bash", + "hooks": [ + {"type": "command", "command": ".claude/hooks/block-destructive.sh"} + ] + }, + { + "matcher": "Bash|Write|Edit|Grep|Glob|Read", + "hooks": [ + {"type": "command", "command": ".claude/hooks/behaviors/PreToolUse.sh"} + ] + } + ] + } +} +``` + +Mutation rules: +- Existing entries: never modified, never removed. +- Behavior entry matcher: union of all `triggers[].matcher` values for behaviors in that event group. +- One behavior entry per event type — all behaviors for that event share one hook command. +- Idempotent: the compiler detects an existing behavior entry by its command path and overwrites only that entry. All other entries are untouched. +- If no behaviors declare a trigger for a given event, no entry is added for that event. + +--- + +## 6.1 PermissionDenied Override Audit Hook + +The compiler generates a `PermissionDenied.sh` hook to record override audit trails (SPEC.md Section 6). + +This hook fires when Claude Code's auto-mode classifier denies a tool call. It checks if the denial was from a behavior's `soft_block` and records the override. + +```bash +#!/usr/bin/env bash +# AUTO-GENERATED by dotforge behavior compiler v1. DO NOT EDIT. 
+# Purpose: record behavior overrides in triple-write audit trail +# Event: PermissionDenied + +set -euo pipefail +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +source "$SCRIPT_DIR/_forge_runtime.sh" + +INPUT=$(cat) +TOOL_NAME=$(echo "$INPUT" | jq -r '.tool_name // empty') +SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // empty') +REASON=$(echo "$INPUT" | jq -r '.reason // empty') + +# Check if this denial came from a behavior hook +# Behavior-generated denials include behavior_id in the systemMessage (grep -E: BSD grep on macOS lacks -P) +BEHAVIOR_ID=$(echo "$REASON" | grep -oE '^\[[a-z0-9-]+\]' | tr -d '[]' || true) + +if [ -z "$BEHAVIOR_ID" ]; then + exit 0 # Not a behavior-generated denial +fi + +# Record override in audit trail +forge_record_override "$SESSION_ID" "$BEHAVIOR_ID" "$TOOL_NAME" "$INPUT" +exit 0 +``` + +Settings.json registration: +```json +{ + "PermissionDenied": [ + { + "hooks": [ + {"type": "command", "command": ".claude/hooks/behaviors/PermissionDenied.sh"} + ] + } + ] +} +``` + +The `forge_record_override` function (in `_forge_runtime.sh`) writes to all three audit locations: +1. Appends to `.forge/audit/overrides.log` (AUDIT.md Section 2) +2. Updates `overrides[]` in `.forge/runtime/state.json` (RUNTIME.md Section 2) +3. Increments override counters for registry metrics + +--- + +## 7. Runtime Dependencies + +| Dependency | Required | Platform | Behavior if absent | +|---|---|---|---| +| jq | yes | all | Hook warns to stderr, exits 0 — all behaviors degrade to silent | +| bash 4+ | yes | all | Use POSIX-compatible constructs; macOS ships bash 3.2 — avoid `declare -A` | +| mkdir | yes | POSIX | Always available | +| date | yes | POSIX | Always available | +| md5sum / md5 | no | Linux / macOS | Session ID fallback only; cksum used if both absent | + +A `SessionStart` hook must verify jq availability: + +```bash +command -v jq >/dev/null 2>&1 \ + || echo "[forge] WARNING: jq not found — behavior enforcement disabled" >&2 +``` + +--- + +## 8. 
Hook File Naming Convention + +``` +.claude/hooks/behaviors/ +├── _forge_runtime.sh # shared library — sourced, not executed directly +├── PreToolUse.sh # all PreToolUse behaviors +├── PostToolUse.sh # all PostToolUse behaviors (if any) +├── UserPromptSubmit.sh # all UserPromptSubmit behaviors (if any) +├── Stop.sh # all Stop behaviors (if any) +└── PermissionDenied.sh # override audit trail writer +``` + +Header in every generated hook (mandatory — used by `/forge compile --validate` to detect generated files): + +```bash +# AUTO-GENERATED by dotforge behavior compiler v1. DO NOT EDIT. +# Source: behaviors/index.yaml +# Generated: +# Event: +# Behaviors: +``` + +--- + +## 9. Incremental Compilation + +- Compare mtime of `behaviors/index.yaml` and each `behaviors//behavior.yaml` against the generated hook's mtime. +- Skip regeneration for an event group if all source files are older than the generated hook. +- If any behavior file in an event group changed, regenerate the entire hook for that event (behaviors share one file — partial regeneration is not possible). +- `--force` bypasses mtime check and regenerates all hooks. +- `_forge_runtime.sh` is always regenerated (it is versioned with the compiler, not with behaviors). + +--- + +## 10. Test Pattern + +Manual testing without Claude Code: + +```bash +# Test search-first nudge (first Write violation) +echo '{"tool_name":"Write","tool_input":{"file_path":"/src/new.ts","content":"export function"},"session_id":"test-abc","hook_event_name":"PreToolUse"}' \ + | bash .claude/hooks/behaviors/PreToolUse.sh + +# Expected — nudge (counter=1): +# {"systemMessage":"Search Before Writing: Consider searching first (violation 1/5)"} + +# Expected — soft_block (counter=5): +# {"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny"},"systemMessage":"..."} + +# Exit code must always be 0: +echo $? 
+ +# Test no-destructive-git hard_block +echo '{"tool_name":"Bash","tool_input":{"command":"git push origin main --force"},"session_id":"test-xyz","hook_event_name":"PreToolUse"}' \ + | bash .claude/hooks/behaviors/PreToolUse.sh +# Expected: {"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","override_allowed":false},"systemMessage":"..."} +``` + +--- + +## 11. Complete Example: search-first Compiled Hook (Pseudocode Structure) + +The compiler generates a concrete bash function for each behavior. For `search-first` from SCHEMA.md Section 7: + +```bash +# Compiler inlines these from behavior.yaml at compile time: +BEHAVIOR_ID="search-first" +BEHAVIOR_NAME="Search Before Writing" +APPLIES_TO_TOOLS=() # empty = all tools +TRIGGER_EVENT="PreToolUse" +TRIGGER_MATCHER="Write|Edit" +TRIGGER_CONDITIONS_FIELD=("file_path") # from conditions[] +TRIGGER_CONDITIONS_OP=("regex_match") +TRIGGER_CONDITIONS_VAL=('\.(py|ts|js|tsx|jsx|swift|go|rs|java|kt|rb|php|cs)$') +TRIGGER_LOGIC="all" + +# Enforcement thresholds (sorted by after DESC for resolve_level): +ESCALATION_AFTER=(5 3 1) +ESCALATION_LEVEL=("soft_block" "warning" "nudge") +DEFAULT_LEVEL="silent" + +# Rendering templates (variables substituted at output time, not compile time): +NUDGE_TEMPLATE="{behavior_name}: Consider searching first (violation {counter}/{threshold})" +WARNING_TEMPLATE="**[{behavior_id}]** You have written code {counter} times without searching first.\nExpected: use Grep/Glob to find existing patterns before implementing.\nAction: search for related code, then proceed.\nNext violation ({threshold}) triggers a block." +BLOCK_REASON="Must search the codebase before writing new code." +OVERRIDE_PROMPT="Run Grep or Glob first, then retry the write operation." + +# Runtime flow in evaluate_behavior "search-first" ...: +# 1. Check APPLIES_TO_TOOLS — skip if tool not in list (empty = pass) +# 2. Check TRIGGER_MATCHER: if TOOL_NAME not in "Write|Edit" → return (no violation) +# 3. 
Evaluate conditions: +# - extract file_path from TOOL_INPUT via jq +# - apply regex_match against pattern +# - TRIGGER_LOGIC=all: all must pass +# 4. Trigger matched: increment counter in state (read from forge_init_state) +# 5. resolve_level: walk ESCALATION_AFTER DESC, first counter >= after wins +# - counter=1 → nudge; counter=3 → warning; counter=5 → soft_block +# 6. Apply monotonic: effective_level = max(previous_effective_level, calculated_level) +# 7. Substitute template variables: {counter}, {threshold}, {behavior_name}, {behavior_id}, etc. +# 8. forge_queue_output "search-first" "$effective_level" "$rendered_message" +# 9. If soft_block: set FORGE_BLOCK_HIT=1 (cuts chain per SPEC.md Section 4.2) +``` + +Template variable `{threshold}` is resolved at output time: walk escalation thresholds for the first `after > current_counter`; if none, emit `"max"`. + +--- + +## 12. Compiler Invocation + +```bash +# Explicit compilation +/forge compile # compile all behaviors, skip unchanged +/forge compile --force # force full recompile of all hooks +/forge compile --dry-run # print what would be generated, write nothing +/forge compile --validate # validate all behavior.yaml files, no generation + +# Automatic (as part of sync) +/forge sync # runs compile step when behaviors/ directory exists +``` + +Exit codes for `/forge compile`: +- `0` — success (all hooks generated or up-to-date) +- `1` — validation errors (no files written; errors listed to stdout) diff --git a/docs/v3/DECISIONS.md b/docs/v3/DECISIONS.md new file mode 100644 index 0000000..10d6a85 --- /dev/null +++ b/docs/v3/DECISIONS.md @@ -0,0 +1,89 @@ +# Closed decisions for dotforge v3.0 + +This document lists decisions that will NOT be reopened in the v3.0 plan. +Any objection must target implementation or decisions that are still +open, not renegotiate these. 
+ +## Enforcement semantics + +Five levels in escalating severity: + +| Level | Mechanism | What the agent sees | Override | +|-------|-----------|--------------|----------| +| silent | exit 0 | Nothing | Not applicable | +| nudge | exit 0 + brief stdout (1 line) | Neutral reminder | Not applicable | +| warning | exit 0 + stdout JSON `systemMessage` (2-4 lines) | Clear warning: behavior, expectation, correction | Not applicable | +| soft_block | JSON `permissionDecision: "deny"` + `override_allowed: true` | Block with instructions to correct or override | Yes, audited | +| hard_block | JSON `permissionDecision: "deny"` + `override_allowed: false` | Definitive block | Not in 3.0 | + +- Override is allowed only at soft_block. +- hard_block has no override in 3.0, by design. Without this rule, hard_block + is soft_block with a more aggressive name. + +## Counting rules + +- One tool call produces at most one violation per behavior, even if it + matches multiple internal triggers of the same behavior. +- One counter per session and per behavior. +- The counter increments before the effective level is calculated. +- Escalation is resolved against the already-incremented counter. + +## Escalation + +- Evaluated against the behavior's counter in the current session. +- Thresholds are defined in the behavior's `enforcement.escalation` field. +- The effective level can only go up within a session, never down. + +## Override auditing + +Triple write: + +1. `.forge/audit/overrides.log`: append-only, permanent, never deleted +2. `.forge/runtime/state.json`: session scope, purged by TTL +3. `registry` / aggregated insights: metrics for `/forge audit` + +Each override records: timestamp, session_id, behavior_id, action/tool, +a short reason if one exists, and the accumulated counter at the moment of the override. 
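
The record fields listed above can be sketched as a single formatting function. This is a minimal illustration, assuming the pipe-delimited layout and escaping rules described in AUDIT.md Sections 2 and 8; `format_override_record` is a hypothetical helper name, not a function defined by dotforge.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dotforge): formats one overrides.log record.
# Assumed field order, taken from AUDIT.md Section 2:
#   TIMESTAMP|SESSION_ID|BEHAVIOR_ID|TOOL_NAME|TOOL_INPUT_SUMMARY|COUNTER|REASON

format_override_record() {
  local timestamp="$1" session_id="$2" behavior_id="$3"
  local tool_name="$4" summary="$5" counter="$6" reason="${7:-}"
  local nl=$'\n'

  # Keep the first 100 chars of the summary, then escape pipes and newlines
  # so the record stays on one line with unambiguous delimiters.
  summary="${summary:0:100}"
  summary="${summary//|/\\|}"
  summary="${summary//$nl/\\n}"

  # REASON must never contain pipe chars: strip them and flatten newlines.
  reason="${reason//|/}"
  reason="${reason//$nl/ }"

  # SESSION_ID is shortened to its first 8 chars.
  printf '%s|%s|%s|%s|%s|%s|%s\n' \
    "$timestamp" "${session_id:0:8}" "$behavior_id" \
    "$tool_name" "$summary" "$counter" "$reason"
}
```

Redirecting the function's output with `>> .forge/audit/overrides.log` would keep the file append-only with one record per line; the state.json and registry writes are out of scope for this sketch.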
+ +## Runtime + +- Single file `.forge/runtime/state.json` +- Sessions as keys, by session_id +- 24h TTL measured from last access (not from creation) +- Atomic locking for concurrency: mkdir-based (POSIX-portable, macOS-compatible). Real-world case: multi-agent with VPS + local + Telegram + +## Multiple evaluation + +- Evaluation order: declaration order in `behaviors/index.yaml` +- The first block (soft or hard) cuts the chain +- Non-blocking levels (silent, nudge, warning) accumulate and are all shown +- silent always runs (records telemetry) + +## Schema + +- Separation of `policy` (what we expect) vs `rendering` (how it is communicated) +- Bounded declarative DSL, no sandboxed expressions +- Primitives over `tool_input` and `session_state` +- A behavior that cannot be expressed with the primitives does not enter the 3.0 catalog +- `scope` field reserved with values `session | task | project`, + only `session` implemented in 3.0 +- `schema_version` field in every behavior +- `applies_to.agents` field reserved in the schema, basic implementation in 3.0 + +## Rollout + +- Short private alpha: 1 week, 3-5 technical users +- At least one with a pure local CLI, at least one with a remote flow +- 4 gates to open the public beta: + 1. search-first escalates from nudge to soft_block correctly + 2. Override audited in all 3 places + 3. Status reflects real counters + 4. 
Telegram/VPS does not distort warning or soft_block output + +## Positioning + +- v2.9 closes the config-manager stage +- v3.0 opens the behavior-governance stage +- Evolutionary, not a pivot +- The 2.9 base remains intact +- Differentiation in governance + UX + integration, not in feature completeness diff --git a/docs/v3/RUNTIME.md b/docs/v3/RUNTIME.md new file mode 100644 index 0000000..77a5d0e --- /dev/null +++ b/docs/v3/RUNTIME.md @@ -0,0 +1,312 @@ +# Runtime State Management — dotforge v3.0 + +Specifies the session state format, lifecycle, TTL purge, locking protocol, and counter mechanics +that implement the evaluation algorithm defined in [SPEC.md](SPEC.md) Section 2. + +--- + +## 1. Overview + +The runtime maintains per-session, per-behavior counters and effective levels across hook invocations. +State lives in a single JSON file: `.forge/runtime/state.json`. +Sessions are keyed by `session_id` from the Claude Code hook payload. +All state access is serialized via a mkdir-based lock. + +Directory layout: + +``` +.forge/ +├── runtime/ +│ ├── state.json # mutable session state (gitignored) +│ └── state.lock/ # mkdir lock directory (transient, gitignored) +└── audit/ + └── overrides.log # permanent audit trail (NOT gitignored) +``` + +`.gitignore` entry required: `.forge/runtime/` +`.forge/audit/` is committed to git — it is permanent audit evidence. + +--- + +## 2. 
state.json Schema + +```json +{ + "schema_version": "1", + "sessions": { + "a1b2c3d4-e5f6-7890-abcd-ef1234567890": { + "created_at": "2026-04-13T10:00:00Z", + "last_accessed_at": "2026-04-13T14:30:00Z", + "behaviors": { + "search-first": { + "counter": 4, + "effective_level": "warning", + "last_violation_at": "2026-04-13T14:28:00Z", + "last_violation_tool": "Write", + "overrides": [ + { + "timestamp": "2026-04-13T12:15:00Z", + "tool_name": "Edit", + "tool_input_summary": "Edit file_path=/src/utils.ts old_stri...", + "counter_at_override": 5, + "reason": "" + } + ] + }, + "no-destructive-git": { + "counter": 0, + "effective_level": "silent", + "last_violation_at": null, + "last_violation_tool": null, + "overrides": [] + } + } + }, + "e5f6g7h8-1234-5678-abcd-000000000001": { + "created_at": "2026-04-12T08:00:00Z", + "last_accessed_at": "2026-04-12T16:00:00Z", + "behaviors": {} + } + } +} +``` + +### Field reference + +| Field | Type | Description | +|-------|------|-------------| +| `schema_version` | string | Always `"1"` in v3.0. Used for migration detection. | +| `sessions` | object | Map of `session_id → session entry`. | +| `created_at` | ISO 8601 | When the session entry was first created. | +| `last_accessed_at` | ISO 8601 | Updated on every hook invocation. TTL applies to this field. | +| `behaviors` | object | Map of `behavior_id → behavior state`. | +| `counter` | integer | Violation count. Increments before level resolution. Never negative. | +| `effective_level` | string | Monotonic level: silent \| nudge \| warning \| soft_block \| hard_block | +| `last_violation_at` | ISO 8601 \| null | Timestamp of last violation. Null until first violation. | +| `last_violation_tool` | string \| null | Tool name that caused last violation. Null until first violation. | +| `overrides` | array | Audit records for soft_block overrides in this session. Subset of `.forge/audit/overrides.log`. | + +--- + +## 3. 
Session Lifecycle + +### Creation + +A session entry is created on first hook invocation for a given `session_id`. +Session ID comes from the hook's JSON stdin payload field `session_id` — Claude Code generates a UUID at init and includes it in all hook payloads (confirmed: present in PostCompact, PreToolUse, and all other hook events). + +If `session_id` is absent (older Claude Code versions): + +```bash +# md5sum prints a 32-character hex digest followed by the input name; keep only the digest +SESSION_ID=$(echo "${PWD}:${PPID}:$(date +%Y%m%d)" | md5sum | cut -c1-32) +``` + +This fallback is stable within a process tree on a given day, and deterministic across hooks in the same session. + +### Access + +Every hook invocation updates `last_accessed_at` to the current UTC timestamp before releasing the lock. +This applies even when no behavior violation occurs. + +### Expiry + +Sessions with `last_accessed_at` older than 24 hours (86400 seconds) are purged. +TTL is measured from last access, not from creation. +Purge runs inline on every state access — no background job required. + +--- + +## 4. Counter Mechanics + +- Counter is per-behavior, per-session. +- Increments by 1 on each violation (triggered tool call). See SPEC.md Section 3.1. +- **Increments BEFORE level calculation** — the first violation resolves against counter=1. +- Never decreases within a session. +- Resets to 0 only when the session is purged by TTL. +- One violation per behavior per tool call, even if multiple triggers match internally. + +Example sequence for `search-first` with `default_level: silent`, escalation `after: 1 → nudge`, `after: 3 → warning`, `after: 5 → soft_block`: + +| Tool call | Counter after increment | Calculated level | Effective level | +|-----------|------------------------|-----------------|----------------| +| 1st | 1 | nudge | nudge | +| 2nd | 2 | nudge | nudge | +| 3rd | 3 | warning | warning | +| 4th | 4 | warning | warning | +| 5th | 5 | soft_block | soft_block | + +--- + +## 5. 
Effective Level Calculation + +Implements SPEC.md Section 2.1 (`resolve_level`) plus monotonic enforcement (Section 3.2). + +``` +# resolve_level: walk thresholds from highest after to lowest +FUNCTION resolve_level(enforcement, counter): + FOR threshold IN enforcement.escalation SORTED BY after DESC: + IF counter >= threshold.after: + RETURN threshold.level + RETURN enforcement.default_level + +# monotonic: effective level can only rise +FUNCTION update_effective_level(behavior_state, enforcement): + calculated = resolve_level(enforcement, behavior_state.counter) + previous = behavior_state.effective_level + behavior_state.effective_level = max_level(previous, calculated) +``` + +Level ordering for `max_level`: `silent < nudge < warning < soft_block < hard_block`. + +--- + +## 6. TTL Purge Protocol + +Runs **inline on every state.json access**, after lock acquisition, before business logic. +Not a background job. Idempotent. + +```bash +purge_expired_sessions() { + local state_file="$1" + local now + now=$(date +%s) + local cutoff=$((now - 86400)) + + # For each session, check last_accessed_at epoch vs cutoff + # Remove sessions where epoch(last_accessed_at) < cutoff + jq --argjson cutoff "$cutoff" ' + .sessions |= with_entries( + select( + (.value.last_accessed_at | fromdateiso8601) >= $cutoff + ) + ) + ' "$state_file" > "${state_file}.tmp" && mv "${state_file}.tmp" "$state_file" +} +``` + +If all sessions have expired, the result is: + +```json +{"schema_version": "1", "sessions": {}} +``` + +--- + +## 7. Locking Protocol + +Uses mkdir-based locking — POSIX-portable, works on macOS and Linux without flock. + +Lock path: `.forge/runtime/state.lock/` (a directory, not a file). + +```bash +LOCK_DIR=".forge/runtime/state.lock" +LOCK_TIMEOUT=2 # seconds + +acquire_lock() { + local deadline=$(($(date +%s) + LOCK_TIMEOUT)) + while ! 
mkdir "$LOCK_DIR" 2>/dev/null; do + # Check for stale lock + if [ -f "$LOCK_DIR/pid" ]; then + local pid + pid=$(cat "$LOCK_DIR/pid" 2>/dev/null) + if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then + # Process is dead — remove stale lock and retry once + rm -rf "$LOCK_DIR" + mkdir "$LOCK_DIR" 2>/dev/null && break + fi + fi + if [ "$(date +%s)" -ge "$deadline" ]; then + return 1 # timeout + fi + sleep 0.1 + done + echo $$ > "$LOCK_DIR/pid" + return 0 +} + +release_lock() { + rm -rf "$LOCK_DIR" +} +``` + +### Lock timeout behavior + +On timeout (2 seconds elapsed without acquiring lock): +- Hook proceeds using `default_level` for all behaviors. +- No state is read or written. +- Warning logged to stderr: `[forge] state lock timeout — using default levels` +- Tool call is not blocked. + +### Stale lock detection + +If `state.lock/` exists and the PID in `state.lock/pid` is not running (`! kill -0 $pid`), +the lock is stale. Remove it and retry lock acquisition once. + +--- + +## 8. Concurrency Scenarios + +### Multi-agent (VPS + local + Telegram) + +Each Claude Code instance generates its own `session_id`. State is shared via `.forge/runtime/state.json`. +Lock serializes concurrent writes. Instances see independent session entries — counters do not cross-contaminate. +Brief lock contention resolves within milliseconds under normal load. + +### Parallel tool calls within a single session + +Claude Code executes up to 10 concurrent tool calls (per domain rule: `gW5 = 10`). +Each tool call triggers a PreToolUse hook. All hooks share the same `session_id`. +Lock serializes access — each hook reads, mutates, and writes state atomically. +Counter increments are cumulative: 10 concurrent triggers on the same behavior → counter increases by 10 (one per serialized write). 
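As a rough illustration of the serialization claim above, the following sketch runs ten concurrent writers against a plain counter file, each guarded by a mkdir lock. It is an assumption-laden toy: `WORK_DIR`, `COUNTER_FILE`, `with_lock`, and `increment_counter` are hypothetical names, the counter is a bare integer rather than `state.json`, and the timeout and stale-lock handling from Section 7 are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper names, a plain counter file instead of
# state.json, and no timeout or stale-lock handling (see Section 7 for those).
WORK_DIR=$(mktemp -d)
LOCK_DIR="$WORK_DIR/state.lock"
COUNTER_FILE="$WORK_DIR/counter"
echo 0 > "$COUNTER_FILE"

# Run a command while holding the lock. mkdir either creates the directory
# or fails, so only one caller at a time gets past the loop.
with_lock() {
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    sleep 0.05
  done
  "$@"
  local status=$?
  rmdir "$LOCK_DIR"
  return "$status"
}

# Read-modify-write that would lose updates without the lock.
increment_counter() {
  local current
  current=$(cat "$COUNTER_FILE")
  echo $((current + 1)) > "$COUNTER_FILE"
}

# Ten concurrent "hooks" touching the same counter.
for worker in 1 2 3 4 5 6 7 8 9 10; do
  with_lock increment_counter &
done
wait

FINAL_COUNT=$(cat "$COUNTER_FILE")
echo "final counter: $FINAL_COUNT"
rm -rf "$WORK_DIR"
```

Because every read-modify-write happens inside the lock, no increment is lost, which is the property the paragraph above relies on when it says ten concurrent triggers raise the counter by ten.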
+ +### Subagent with independent context + +Subagents may receive the same `session_id` as the parent (shared counters — correct for session-scoped governance) +or a new `session_id` (independent counters — correct for subagent isolation). +Both cases are valid. The runtime handles both without special logic. + +--- + +## 9. Error Recovery + +| Condition | Action | +|-----------|--------| +| Corrupted `state.json` (JSON parse failure) | Replace with `{"schema_version": "1", "sessions": {}}`. Log warning to stderr. All counters reset. | +| Missing `.forge/` directory | `mkdir -p .forge/runtime .forge/audit` on first access. | +| Missing `state.json` | Create with `{"schema_version": "1", "sessions": {}}`. | +| Stale lock directory | Check PID, `rm -rf` if process dead, retry once. | +| Disk full on write | Log warning to stderr, proceed with `default_level`. Do not crash. | +| `jq` not available | Emit warning to stderr, exit 0 (allow). All behaviors degrade to silent pass-through. SessionStart hook must check for `jq`. | +| Hook timeout (10 min default) | Claude Code kills the process. Tool call proceeds. Lock must be cleaned by next invocation via stale lock detection. | + +--- + +## 10. Full Access Sequence + +Every hook invocation follows this sequence: + +``` +1. acquire_lock() + → on timeout: use default_level, exit 0 + +2. read_or_initialize(".forge/runtime/state.json") + → on parse failure: reset to empty, log warning + +3. purge_expired_sessions(state) + +4. get_or_create_session(state, session_id) + +5. run evaluation loop (SPEC.md Section 2) + → increment counters + → resolve effective levels + → accumulate outputs + +6. update last_accessed_at + +7. write_state(".forge/runtime/state.json") + → on write failure: log warning, continue + +8. release_lock() + +9. 
emit JSON output to stdout, exit 0 +``` diff --git a/docs/v3/SCHEMA.md b/docs/v3/SCHEMA.md new file mode 100644 index 0000000..0c95e22 --- /dev/null +++ b/docs/v3/SCHEMA.md @@ -0,0 +1,364 @@ +# behavior.yaml Schema v1 + +YAML schema specification for dotforge v3.0 declarative behavior files. +Enforcement semantics: [SPEC.md](SPEC.md). Design decisions: [DECISIONS.md](DECISIONS.md). + +--- + +## 1. Overview + +A behavior file declares an expected agent behavior, its enforcement policy, and its rendering templates. Each behavior lives in its own directory: + +``` +behaviors/ + / + behavior.yaml # this schema + tests/ # optional: test fixtures for validation +``` + +This directory-per-behavior structure allows test fixtures to live alongside the behavior without cluttering the root. The `behaviors/index.yaml` file controls which behaviors are active and in what evaluation order. + +`schema_version: "1"` is required in every behavior file. It is the compile-time version of this schema, not the behavior's own version. + +--- + +## 2. Complete Field Reference + +```yaml +schema_version: "1" # string, required. Must equal "1". +id: search-first # string, required. Kebab-case. Unique across all behaviors. +name: Search Before Writing # string, required. Human-readable display name. +description: > # string, required. 1-3 sentences stating the purpose. + Require the agent to search existing code before writing new implementations. + Prevents duplicate code and enforces codebase familiarity. +category: core # enum [core, opinionated, experimental], required. +scope: session # enum [session, task, project], required. Only "session" functional in 3.0. +enabled: true # boolean, default true. Override per-entry in index.yaml. + +policy: + triggers: # array, required. At least one trigger. + - event: PreToolUse # enum [PreToolUse, PostToolUse, UserPromptSubmit, Stop], required. + matcher: "Write|Edit" # string, tool matcher pattern. Required for PreToolUse and PostToolUse. 
+ # Optional for UserPromptSubmit and Stop. + # Examples: "Bash", "Grep|Glob", "*". + conditions: # array, optional. If empty or absent, any matching event triggers. + - field: file_path # string, from closed DSL field set (Section 3). + operator: regex_match # enum, from closed DSL operator set (Section 3). + value: '\.(py|ts|js|swift|go|rs|java|kt)$' + logic: all # enum [all, any], default "all". How conditions are combined. + + enforcement: + default_level: silent # enum [silent, nudge, warning, soft_block, hard_block], required. + escalation: # array, optional. Absent = always default_level. + - after: 1 # integer >= 1, required. counter >= this value triggers level. + level: nudge # enum [silent, nudge, warning, soft_block, hard_block], required. + - after: 3 + level: warning + - after: 5 + level: soft_block + + recovery: + hint: "Use Grep or Glob to search for existing patterns before writing new code." + # string, required. Instruction shown to agent on violation. + suggested_tool: Grep # string, optional. Tool name to suggest. + suggested_action: "grep -r '' src/" + # string, optional. Concrete command or action. + +rendering: + nudge_template: "{behavior_name}: Consider searching first (violation {counter})" + # string, max 120 chars. Supports {variables} (Section 4). + warning_template: | + **[{behavior_id}]** You have written code {counter} times without searching first. + Expected: use Grep/Glob to find existing patterns before implementing. + Action: search for related code, then proceed. + Next violation triggers a block. + # string, max 500 chars. Supports {variables}. + block_reason: "Must search the codebase before writing new code." + # string, max 200 chars. + override_prompt: "Run Grep or Glob first, then retry the write operation." + # string, optional. Shown when soft_block allows override. + +applies_to: + tools: [] # string array, optional. Tool names. Empty = all tools. + agents: [] # string array, optional. 
Agent names, simple string match. Empty = all. + profiles: [standard, strict] + # string array, optional. Hook profiles [minimal, standard, strict]. + +metadata: + author: dotforge # string, optional. + version: "1.0.0" # string, optional. Semver. + tags: [search, quality] # string array, optional. +``` + +### Field constraints summary + +| Field | Type | Required | Default | Constraint | +|-------|------|----------|---------|------------| +| `schema_version` | string | yes | — | Must equal `"1"` | +| `id` | string | yes | — | Kebab-case: `[a-z][a-z0-9-]*[a-z0-9]` | +| `name` | string | yes | — | — | +| `description` | string | yes | — | 1–3 sentences | +| `category` | enum | yes | — | `core`, `opinionated`, `experimental` | +| `scope` | enum | yes | — | `session`, `task`, `project` (only `session` functional) | +| `enabled` | boolean | no | `true` | Overridden by index.yaml | +| `policy.triggers` | array | yes | — | At least 1 item | +| `triggers[].event` | enum | yes | — | `PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `Stop` | +| `triggers[].matcher` | string | conditional | — | Required for `PreToolUse`/`PostToolUse` | +| `triggers[].conditions` | array | no | [] | Each item needs `field`, `operator`, `value` | +| `triggers[].logic` | enum | no | `all` | `all`, `any` | +| `enforcement.default_level` | enum | yes | — | One of 5 levels | +| `escalation[].after` | integer | yes | — | >= 1 | +| `recovery.hint` | string | yes | — | Shown to agent on violation | +| `nudge_template` | string | no | — | Max 120 chars | +| `warning_template` | string | no | — | Max 500 chars | +| `block_reason` | string | no | — | Max 200 chars | + +--- + +## 3. Closed DSL Specification + +Conditions reference fields from two closed namespaces. No other fields are valid in 3.0. 
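To make the idea of a closed set concrete, here is a minimal sketch of how a compiled hook might reject unknown fields and evaluate a single string condition. The helper names `is_known_field` and `eval_string_condition` are hypothetical and not part of dotforge; the field and operator names are the ones listed in the tables of this section.

```shell
#!/usr/bin/env bash
# Sketch only: is_known_field and eval_string_condition are hypothetical
# helpers. Field and operator names come from the tables in this section.

# Succeeds only for fields in the closed tool_input / session_state sets.
is_known_field() {
  case "$1" in
    command|file_path|content|old_string|pattern|query|url|prompt|counter) return 0 ;;
    *) return 1 ;;
  esac
}

# eval_string_condition OPERATOR FIELD_VALUE EXPECTED
# Returns 0 when the condition holds, 1 when it does not,
# and 2 when the operator is not one of the string operators.
eval_string_condition() {
  local operator=$1 actual=$2 expected=$3
  case "$operator" in
    regex_match)  [[ $actual =~ $expected ]] ;;
    contains)     [[ $actual == *"$expected"* ]] ;;
    not_contains) [[ $actual != *"$expected"* ]] ;;
    equals)       [[ $actual == "$expected" ]] ;;
    starts_with)  [[ $actual == "$expected"* ]] ;;
    ends_with)    [[ $actual == *"$expected" ]] ;;
    *)            return 2 ;;
  esac
}

# Example: the search-first condition applied to one file path.
if is_known_field file_path &&
   eval_string_condition regex_match "src/app.ts" '\.(py|ts|js)$'; then
  echo "condition matched"
fi
```

A fixed `case` list keeps the evaluator declarative: anything outside the list fails fast instead of being interpreted, which matches the intent that no other fields or operators are valid in 3.0.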
+ +### tool_input fields + +Available when the trigger event provides tool context (PreToolUse, PostToolUse): + +| Field | Available for tools | Description | +|-------|--------------------|----| +| `command` | Bash | Full command string | +| `file_path` | Write, Edit, Read | Target file path | +| `content` | Write, Edit | File content or new text | +| `old_string` | Edit | Text being replaced | +| `pattern` | Grep, Glob | Search pattern | +| `query` | WebSearch | Search query | +| `url` | WebFetch | Target URL | +| `prompt` | Agent | Agent prompt text | + +### session_state fields + +Available in all trigger evaluations: + +| Field | Type | Description | +|-------|------|-------------| +| `counter` | integer | Current violation count for this behavior in the current session | + +Only `counter` is available in 3.0. Additional session_state fields deferred to 3.1. + +### Operators + +**String operators** (for tool_input fields): + +| Operator | Semantics | +|----------|-----------| +| `regex_match` | Value is a regex; field must match | +| `contains` | Field contains value as substring | +| `not_contains` | Field does not contain value | +| `equals` | Exact string equality | +| `starts_with` | Field starts with value | +| `ends_with` | Field ends with value | + +**Numeric operators** (for `session_state.counter`): + +| Operator | Semantics | +|----------|-----------| +| `gt` | Greater than | +| `lt` | Less than | +| `gte` | Greater than or equal | +| `lte` | Less than or equal | +| `equals` | Equal to | + +**Existence operators** (for any field): + +| Operator | Semantics | +|----------|-----------| +| `exists` | Field is present and non-empty | +| `not_exists` | Field is absent or empty | + +--- + +## 4. 
Template Variables + +Available in `nudge_template`, `warning_template`, `block_reason`, and `override_prompt`: + +| Variable | Value | +|----------|-------| +| `{behavior_name}` | Human-readable name (from `name` field) | +| `{behavior_id}` | Kebab-case id | +| `{counter}` | Current violation count for this behavior | +| `{tool_name}` | Tool that triggered the violation | +| `{level}` | Current effective level name (e.g., `warning`) | +| `{threshold}` | Next escalation threshold count, or `"max"` if at highest level | + +Example: `"{behavior_name}: violation {counter}/{threshold} — {level} active"` + +--- + +## 5. behaviors/index.yaml Format + +Controls which behaviors are active and in what order they are evaluated. Order determines chain evaluation sequence (see SPEC.md Section 4). + +```yaml +schema_version: "1" +behaviors: + - id: search-first + enabled: true + - id: verify-before-done + enabled: true + - id: no-destructive-git + enabled: true + - id: respect-todo-state + enabled: true + - id: plan-before-code + enabled: false # opinionated, opt-in + - id: objection-format + enabled: false # opinionated, opt-in +``` + +Rules: +- `enabled` here overrides the behavior file's own `enabled` field. +- Every referenced `id` must have a corresponding `behaviors//behavior.yaml`. +- Evaluation follows declaration order — put safety-critical behaviors first. + +--- + +## 6. Validation Rules + +Compile-time checks that must pass before hook generation: + +- `schema_version` must equal `"1"`. +- `id` must be unique across all behaviors in the index. +- `id` must match `[a-z][a-z0-9-]*[a-z0-9]` (no uppercase, no leading/trailing hyphens). +- At least one trigger is required per behavior. +- `matcher` is required when `event` is `PreToolUse` or `PostToolUse`. +- All `field` values in conditions must be from the closed DSL field set (Section 3). +- All `operator` values must be from the closed DSL operator set (Section 3). 
+- Escalation `after` values must be non-decreasing when sorted: each successive entry must have `after >= previous after`. Levels must be non-decreasing in severity. +- `nudge_template` length must be <= 120 chars. +- `warning_template` length must be <= 500 chars. +- `block_reason` length must be <= 200 chars. +- Every `id` listed in `behaviors/index.yaml` must have a file at `behaviors//behavior.yaml`. + +--- + +## 7. Complete Example: search-first + +```yaml +schema_version: "1" +id: search-first +name: Search Before Writing +description: > + Require the agent to search existing code before writing new implementations. + Prevents duplicate code and enforces codebase familiarity before modification. +category: core +scope: session +enabled: true + +policy: + triggers: + - event: PreToolUse + matcher: "Write|Edit" + conditions: + - field: file_path + operator: regex_match + value: '\.(py|ts|js|tsx|jsx|swift|go|rs|java|kt|rb|php|cs)$' + logic: all + + enforcement: + default_level: silent + escalation: + - after: 1 + level: nudge + - after: 3 + level: warning + - after: 5 + level: soft_block + + recovery: + hint: "Use Grep or Glob to search for existing patterns before writing new code." + suggested_tool: Grep + suggested_action: "grep -r '' src/" + +rendering: + nudge_template: "{behavior_name}: Consider searching first (violation {counter}/{threshold})" + warning_template: | + **[{behavior_id}]** You have written code {counter} times without searching first. + Expected: use Grep/Glob to find existing patterns before implementing. + Action: search for related code, then proceed. + Next violation ({threshold}) triggers a block. + block_reason: "Must search the codebase before writing new code." + override_prompt: "Run Grep or Glob first, then retry the write operation." + +applies_to: + tools: [] + agents: [] + profiles: [standard, strict] + +metadata: + author: dotforge + version: "1.0.0" + tags: [search, quality, core] +``` + +--- + +## 8. 
Complete Example: no-destructive-git + +```yaml +schema_version: "1" +id: no-destructive-git +name: No Destructive Git Operations +description: > + Block force pushes, hard resets, and other destructive git operations permanently. + Safety-critical — no override available. +category: core +scope: session +enabled: true + +policy: + triggers: + - event: PreToolUse + matcher: "Bash" + conditions: + - field: command + operator: regex_match + value: 'git\s+(push\s+.*--force|push\s+.*-f\b|reset\s+--hard|clean\s+-f|branch\s+-[Dd])' + logic: all + + enforcement: + default_level: hard_block + + recovery: + hint: "Destructive git operations are permanently blocked. Use safe alternatives: git revert, git stash, git reset --soft." + suggested_tool: Bash + suggested_action: "git revert HEAD # or git stash" + +rendering: + block_reason: "Force push, hard reset, and destructive git operations are permanently blocked." + +applies_to: + tools: [Bash] + agents: [] + profiles: [minimal, standard, strict] + +metadata: + author: dotforge + version: "1.0.0" + tags: [git, safety, core] +``` + +Note: no `escalation` defined — `hard_block` is the immediate and permanent level. No `nudge_template` or `warning_template` needed since the behavior never produces those levels. See SPEC.md Section 5.6 for hard_block output protocol. + +--- + +## 9. Anti-patterns + +Do NOT include these in a behavior file: + +- **Runtime expressions or scripting.** Conditions must use the closed DSL. No embedded bash, jq filters, or eval expressions. +- **File I/O or network calls.** Behaviors are declarative. No reading external files, no HTTP calls, no database queries. +- **Cross-behavior references.** A behavior cannot reference another behavior's counter, state, or output. +- **External state beyond tool_input and session_state.counter.** Environment variables, filesystem state, git state, and time-based conditions are out of scope for 3.0. 
+- **Rendering templates that exceed length limits.** nudge > 120 chars, warning > 500 chars, block_reason > 200 chars all fail validation. +- **Mixed concerns in one behavior.** A behavior enforcing both search-first and verify-before-done is two behaviors. Split them — each behavior must have a single, named concern. +- **Bare `enabled: false` as a permanence signal.** Use `category: experimental` to signal instability; `enabled: false` in index.yaml for user opt-in behaviors. +- **Using `scope: task` or `scope: project`.** Reserved for future versions. Set `scope: session` in 3.0; other values parse but produce no functional behavior. diff --git a/docs/v3/SCOPE.md b/docs/v3/SCOPE.md new file mode 100644 index 0000000..4d2d76f --- /dev/null +++ b/docs/v3/SCOPE.md @@ -0,0 +1,107 @@ +# Scope of dotforge v3.0 + +## In scope for v3.0 + +Five inseparable pieces: + +1. **Formal behavior schema** + - Declarative, bounded, versioned + - policy / rendering separation + - Minimal closed DSL + +2. **Minimal runtime state** + - `.forge/runtime/state.json` + - Counters per session and per behavior + - mkdir-based locking for concurrency + - 24h TTL measured from last access + +3. **Five-level escalating enforcement** + - silent, nudge, warning, soft_block, hard_block + - JSON output for blocks (no custom exit codes) + - Override only on soft_block + +4. **Curated catalog** + - Core: search-first, verify-before-done, no-destructive-git, respect-todo-state + - Opinionated: plan-before-code, objection-format + - Experimental: empty in 3.0 (reserved category) + +5. 
**Control and escape UX** + - `/forge behavior on|off|status|strict|relaxed` + - Scopes: `--session | --project | --agent` + - Status shows real counters, violations per behavior, overrides + +## Out of scope for v3.0 (explicitly deferred) + +### Deferred to 3.1 (6-8 weeks post-release) + +- Prompt-based hooks (`type: prompt` in behaviors) +- Rich cross-hook context aggregation +- Export of behaviors to `.cursorrules`, `AGENTS.md`, `.windsurfrules` +- Functional `task` counter scope +- Automatic recommender (`/forge behavior recommend`) + +### Deferred to 3.2 (3-4 months post-release) + +- Signed behaviors + hash verification (post-CVE Feb 2026) +- Verification against session transcripts +- llm_self_examine as a recovery strategy +- OPA/Rego compile path (optional, enterprise) + +### Deferred with no date + +- Unified policy engine +- Public behavior marketplace +- Anonymous cross-project telemetry + +## Phase 0 deliverables (spec closure) + +Exactly 5 documents: + +1. `docs/v3/SPEC.md` — formal semantics of levels (canonical table) +2. `docs/v3/SCHEMA.md` — complete shape of `behavior.yaml v1` +3. `docs/v3/RUNTIME.md` — `state.json` format, TTL, concurrency +4. `docs/v3/AUDIT.md` — `overrides.log` format and exposed metrics +5. `docs/v3/COMPILER.md` — minimal compilation rules, behavior → hook + +Phase 0 acceptance criterion: another engineer could implement +Phase 1 without asking questions.
+ +## Phase 1 deliverables (2-3 weeks post-spec) + +- Runtime working with mkdir-based locking and TTL +- Compiler `behaviors/.yaml → .claude/hooks/.sh` +- `search-first` functional end-to-end (nudge → warning → soft_block) +- Override registry working in all 3 places +- `/forge behavior on|off|status|strict|relaxed` with basic scopes + +## Phase 2 deliverables (2-3 weeks) + +- Core catalog: 4 behaviors (search-first, verify-before-done, + no-destructive-git, respect-todo-state) +- Opinionated catalog: 2 behaviors (plan-before-code, objection-format) +- `/forge behavior list|describe` +- Integration with `/forge audit`: "behaviors coverage" dimension +- Per-behavior tests in `behaviors//tests/` + +## Phase 3 deliverables (1-2 weeks — release) + +- Rewritten README (differentiator in the first 40 lines) +- CHANGELOG v3.0 +- Migration guide from 2.9 (opt-in, does not break 2.9) +- Real benchmark run on SOMA or InviSight +- GIF demo of search-first escalating +- Technical post: "from configs to behavior" +- Tag v3.0.0 +- Update of the submission to the Anthropic marketplace + +## Success metrics + +Not stars or downloads. They are: + +- **Week 1:** 3+ non-trivial issues opened by external users +- **Month 1:** 1 external behavior contributed by someone other than Luis +- **Month 2:** a technical mention of "behavior governance" written by someone else +- **Month 3:** 1 serious project using dotforge as an active dependency + +If none of the four has happened by month 3: the problem is distribution, +not product. Pivot to technical content marketing. diff --git a/docs/v3/SPEC.md b/docs/v3/SPEC.md new file mode 100644 index 0000000..e5c8c36 --- /dev/null +++ b/docs/v3/SPEC.md @@ -0,0 +1,397 @@ +# Behavior Enforcement Specification v1 + +Formal semantics for dotforge v3.0 behavior governance. +This document is the single source of truth for enforcement levels, evaluation algorithm, and output protocol. + +Reference: [DECISIONS.md](DECISIONS.md) for closed design decisions. + +--- + +## 1. 
Canonical Level Table + +| Level | Exit Code | Output Channel | Agent Sees | Override | Use Case | +|-------|-----------|---------------|------------|----------|----------| +| silent | 0 | none | nothing | n/a | telemetry-only, baseline counting | +| nudge | 0 | stdout JSON `systemMessage` (1 line) | neutral reminder | n/a | gentle first reminder | +| warning | 0 | stdout JSON `systemMessage` (2-4 lines) | firm warning with expected behavior and correction | n/a | repeated violation, clear guidance | +| soft_block | 0 | stdout JSON `hookSpecificOutput` + `systemMessage` | block with correction instruction; override available | yes, audited | serious violation, escapable | +| hard_block | 0 | stdout JSON `hookSpecificOutput` + `systemMessage` | definitive block, no escape | no (v3.0) | safety-critical, non-negotiable | + +All levels exit 0. Enforcement is communicated via JSON stdout, not exit codes. +Exit code 2 remains available for v2.9 compatibility hooks but is NOT used by behavior-generated hooks. + +--- + +## 2. Evaluation Algorithm + +Pseudocode for the behavior evaluation loop executed by a compiled hook on each tool call. + +``` +FUNCTION evaluate_behaviors(tool_call, hook_event, behaviors_index): + # 1. Load ordered behavior list + behaviors = read_index("behaviors/index.yaml") + state = lock_and_read(".forge/runtime/state.json") + session = get_or_create_session(state, session_id) + + accumulated_outputs = [] + block_hit = false + + # 2. Evaluate each behavior in declaration order + FOR behavior IN behaviors: + IF NOT behavior.enabled: + CONTINUE + + IF NOT matches_event(behavior, hook_event): + CONTINUE + + IF NOT matches_applies_to(behavior, tool_call): + CONTINUE + + # 3. Evaluate trigger conditions + triggered = evaluate_triggers(behavior.policy.triggers, tool_call) + + IF NOT triggered: + CONTINUE + + # 4. 
Increment counter BEFORE level calculation
+    session.behaviors[behavior.id].counter += 1
+    session.behaviors[behavior.id].last_violation_at = NOW()
+    session.behaviors[behavior.id].last_violation_tool = tool_call.tool_name
+
+    # 5. Calculate effective level (monotonic)
+    calculated_level = resolve_level(
+      behavior.policy.enforcement,
+      session.behaviors[behavior.id].counter
+    )
+    previous_level = session.behaviors[behavior.id].effective_level
+    effective_level = max_level(previous_level, calculated_level)
+    session.behaviors[behavior.id].effective_level = effective_level
+
+    # 6. Generate output for this behavior
+    output = render_output(behavior, effective_level)
+    accumulated_outputs.append(output)
+
+    # 7. First block cuts chain
+    IF effective_level IN [soft_block, hard_block]:
+      block_hit = true
+      BREAK
+
+  # 8. Write state and release lock
+  update_last_accessed(session)
+  purge_expired_sessions(state) # inline TTL cleanup
+  write_and_unlock(state)
+
+  # 9. Merge and emit output
+  RETURN merge_outputs(accumulated_outputs, block_hit)
+```
+
+### 2.1 resolve_level
+
+```
+FUNCTION resolve_level(enforcement, counter):
+  # Walk escalation thresholds from highest to lowest
+  FOR threshold IN enforcement.escalation SORTED BY after DESC:
+    IF counter >= threshold.after:
+      RETURN threshold.level
+  RETURN enforcement.default_level
+```
+
+### 2.2 max_level
+
+Level ordering: silent < nudge < warning < soft_block < hard_block.
+`max_level(a, b)` returns the higher of the two.
+
+### 2.3 merge_outputs
+
+```
+FUNCTION merge_outputs(outputs, block_hit):
+  IF block_hit:
+    # Last output is the block; emit it directly
+    RETURN outputs[-1]
+
+  IF outputs is empty:
+    # No violations: silent pass
+    RETURN {}
+
+  # Concatenate all non-blocking messages
+  messages = [o.systemMessage FOR o IN outputs WHERE o.systemMessage]
+  IF messages:
+    RETURN {"systemMessage": join(messages, "\n\n")}
+
+  RETURN {}
+```
+
+---
+
+## 3.
Escalation Mechanics
+
+### 3.1 Counter rules
+
+- One counter per behavior per session
+- Increments by 1 on each violation (triggered tool call)
+- **One violation per behavior per tool call**, even if multiple triggers match internally
+- Counter increments BEFORE level calculation (the first violation sees counter=1, not 0)
+- Counter never decreases within a session
+- Counter resets when session expires via TTL purge
+
+### 3.2 Monotonic effective level
+
+The effective level for a behavior within a session can only rise, never fall.
+If the calculated level from the current counter is lower than the previously stored effective level, the stored level is preserved.
+
+### 3.3 Escalation threshold format
+
+Defined in `behavior.yaml` under `policy.enforcement.escalation`:
+
+```yaml
+enforcement:
+  default_level: silent
+  escalation:
+    - after: 1 # counter >= 1
+      level: nudge
+    - after: 3 # counter >= 3
+      level: warning
+    - after: 5 # counter >= 5
+      level: soft_block
+```
+
+Resolution: walk thresholds from highest `after` to lowest. First match wins.
+If counter is 0 (no violations yet), no evaluation occurs (the trigger didn't match).
+
+---
+
+## 4. Chain Rules
+
+### 4.1 Evaluation order
+
+Behaviors are evaluated in the order declared in `behaviors/index.yaml`.
+This order is deterministic and under user control.
+
+### 4.2 First block cuts chain
+
+When a behavior's effective level is `soft_block` or `hard_block`, evaluation stops immediately.
+No subsequent behaviors in the chain are evaluated for this tool call.
+
+### 4.3 Non-blocking accumulation
+
+All non-blocking violations (silent, nudge, warning) accumulate.
+Their outputs are merged and presented together after the full chain completes.
+
+### 4.4 Silent behavior
+
+A `silent` behavior still increments its counter and updates state.
+It produces no output. It never cuts the chain.
+
+---
+
+## 5. Output Protocol
+
+All behavior hook output is JSON on stdout. Hooks always exit 0.
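As a rough illustration of the merge rule from Section 2.3 combined with the stdout contract stated above, here is a minimal Python sketch. The name `merge_outputs` mirrors the pseudocode; modeling each output as a plain dict and the helper `to_stdout_text` are assumptions made for illustration only, since the compiled hooks themselves are shell scripts.

```python
import json


def merge_outputs(outputs, block_hit):
    """Merge per-behavior outputs following the Section 2.3 pseudocode."""
    if block_hit:
        # The chain was cut: the last output is the block, emitted as-is.
        return outputs[-1]
    # Silent behaviors carry no systemMessage, so they drop out here.
    messages = [o["systemMessage"] for o in outputs if o.get("systemMessage")]
    if messages:
        return {"systemMessage": "\n\n".join(messages)}
    return {}


def to_stdout_text(result):
    """Hypothetical helper: an empty result means empty stdout (a pass)."""
    return json.dumps(result) if result else ""


nudge = {"systemMessage": "search-first: Consider using Grep or Glob before writing code (violation 2/5)"}
silent = {}  # a silent behavior updates state but renders nothing
warning = {"systemMessage": "**[verify-before-done]** Reminder: run tests before marking task complete."}

# Non-blocking outputs are concatenated in declaration order.
print(to_stdout_text(merge_outputs([nudge, silent, warning], block_hit=False)))
```

In this sketch the process would still exit 0 in every case; only the JSON payload, or its absence, carries the enforcement decision.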
+
+### 5.1 No violation (pass)
+
+No output. Empty stdout. Exit 0.
+
+### 5.2 silent
+
+No output. Counter incremented in state.json only.
+
+### 5.3 nudge
+
+```json
+{
+  "systemMessage": "search-first: Consider using Grep or Glob before writing code (violation 2/5)"
+}
+```
+
+Exit 0. The `systemMessage` is injected into the agent's context as a system reminder.
+Maximum 1 line (~120 chars). Must include behavior name and counter context.
+
+### 5.4 warning
+
+```json
+{
+  "systemMessage": "**[search-first]** You have written code 3 times without searching first.\nExpected: use Grep/Glob to find existing patterns before implementing.\nAction: search for related code, then proceed.\nNext violation triggers a block."
+}
+```
+
+Exit 0. The `systemMessage` is 2-4 lines. Must include:
+- Behavior name (bold)
+- What happened (violation description)
+- What was expected
+- What to do now
+- What happens next (escalation preview)
+
+### 5.5 soft_block
+
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "deny"
+  },
+  "systemMessage": "**[search-first] BLOCKED:** You must search the codebase before writing new code.\nRun Grep or Glob first, then retry.\nThis block can be overridden: the user will be prompted to allow or deny."
+}
+```
+
+Exit 0. The `permissionDecision: "deny"` triggers Claude Code's native permission denial flow.
+The `systemMessage` explains why and how to proceed.
+The user sees a permission prompt and can choose to override.
+
+When overridden, the override is recorded in three places (see [Section 6](#6-override-protocol)).
+
+### 5.6 hard_block
+
+```json
+{
+  "hookSpecificOutput": {
+    "hookEventName": "PreToolUse",
+    "permissionDecision": "deny",
+    "override_allowed": false
+  },
+  "systemMessage": "**[no-destructive-git] BLOCKED:** Force push to main/master is permanently blocked.\nThis restriction cannot be overridden."
+}
+```
+
+Exit 0.
Same structure as soft_block but with `override_allowed: false`.
+No override is possible in v3.0.
+
+> **Note:** The `override_allowed` field appears only in hard_block output, where it is set explicitly to `false` to signal that the denial is final. Soft_block omits the field (`true` is implied by its absence) because Claude Code's default behavior when `permissionDecision: "deny"` is to show the override prompt.
+
+### 5.7 Multiple non-blocking outputs
+
+When multiple behaviors produce nudge/warning on the same tool call:
+
+```json
+{
+  "systemMessage": "search-first: Consider using Grep or Glob before writing code (violation 2/5)\n\n**[verify-before-done]** Reminder: run tests before marking task complete."
+}
+```
+
+Messages are concatenated with `\n\n` separator. Order follows index.yaml declaration order.
+
+---
+
+## 6. Override Protocol
+
+Overrides apply only to the `soft_block` level.
+
+### 6.1 Flow
+
+1. Behavior hook emits `permissionDecision: "deny"` + `systemMessage`
+2. Claude Code's native permission system presents the denial to the user
+3. User chooses to allow (override) or deny (respect block)
+4. If overridden, Claude Code re-invokes the tool and the hook fires again
+5. A separate `PermissionDenied` event hook detects the override via the PermissionDenied→allow flow and records the audit trail (see 6.2)
+
+### 6.2 Override detection
+
+The compiled hook does NOT need to detect overrides in real-time.
+Claude Code handles the override flow natively.
+The override audit trail is written by a separate `PermissionDenied` event hook that fires when a permission denial is overridden.
+
+### 6.3 Triple-write audit
+
+Every override is recorded in three locations:
+
+| Location | Scope | Persistence | Format |
+|----------|-------|-------------|--------|
+| `.forge/audit/overrides.log` | permanent | committed to git | pipe-delimited append-only |
+| `.forge/runtime/state.json` | session | TTL 24h | JSON array in behavior's `overrides[]` |
+| `registry/projects.yml` metrics | project | permanent | aggregated `override_rate` |
+
+Fields per override record:
+- `timestamp`: ISO 8601
+- `session_id`: from hook payload
+- `behavior_id`: which behavior was overridden
+- `tool_name`: which tool triggered the block
+- `tool_input_summary`: first 100 chars of tool input, sanitized
+- `counter_at_override`: violation count at override moment
+
+See [AUDIT.md](AUDIT.md) for exact formats.
+
+---
+
+## 7. Edge Cases
+
+### 7.1 Multiple behaviors at same level on same tool call
+
+All non-blocking behaviors (silent/nudge/warning) accumulate. Their outputs merge.
+If two behaviors both resolve to soft_block, only the first (by index.yaml order) fires, and it cuts the chain.
+
+### 7.2 Counter at 0 (no prior violations)
+
+If a behavior's trigger matches for the first time, counter goes from 0 to 1.
+Level is resolved against counter=1. If `default_level` is `silent` and first escalation is `after: 1, level: nudge`, the agent sees a nudge on first violation.
+
+### 7.3 TTL expired mid-conversation
+
+If the session entry in state.json has `last_accessed_at` older than 24h, it is purged during the next access. The behavior starts fresh with counter=0. This is by design: the session is considered stale.
+
+### 7.4 Hook timeout
+
+If the behavior hook exceeds the configured timeout (default: 10 minutes for tool hooks), Claude Code kills the process. The tool call proceeds as if no hook fired. State may be partially written, so the lock file must be cleaned up by the next invocation.
+
+### 7.5 jq not available
+
+`jq` is a runtime dependency.
If not found, the hook emits a warning to stderr and exits 0 (allow). The behavior degrades to silent pass-through. A SessionStart hook should check for `jq` and warn the user.
+
+### 7.6 state.json locked by concurrent process
+
+The hook attempts to acquire the lock with a 2-second timeout. On timeout, the hook proceeds with `default_level` for all behaviors (no state read/write). A warning is logged to stderr. The tool call is not blocked by lock contention.
+
+### 7.7 state.json corrupted (truncated write)
+
+On JSON parse failure, the hook replaces state.json with an empty object `{}`. All sessions and counters are lost. A warning is logged to stderr. Behaviors restart from counter=0.
+
+### 7.8 Behavior with no matching triggers
+
+If a behavior has no triggers matching the current hook event, it is skipped entirely. No counter increment, no output.
+
+### 7.9 Empty behaviors/index.yaml
+
+If no behaviors are declared, the hook exits 0 immediately. No state access.
+
+---
+
+## 8. Compatibility with v2.9
+
+### 8.1 Exit code mapping
+
+| v3.0 Level | v2.9 Exit Code | Notes |
+|------------|---------------|-------|
+| silent | 0 | identical |
+| nudge | 0 | v2.9 had no equivalent |
+| warning | 1 | v2.9 used exit 1 for warnings |
+| soft_block | 2 (via JSON deny) | v2.9 used exit 2 for blocks |
+| hard_block | 2 (via JSON deny) | v2.9 had no distinction |
+
+### 8.2 Coexistence
+
+v3.0 behavior hooks coexist with v2.9 hooks. Existing hooks (e.g., `block-destructive.sh`) continue using exit codes. Behavior-generated hooks use JSON output. Both patterns are valid in Claude Code's hook system.
+
+Behavior hooks are registered AFTER existing hooks in settings.json. Existing hooks run first. If an existing hook blocks (exit 2), behavior hooks do not fire.
+
+### 8.3 Migration path
+
+v2.9 hooks are not automatically converted to behaviors. They continue working as-is.
+Users can optionally replace v2.9 hooks with equivalent behaviors when the catalog covers the same functionality. This is opt-in, not forced.
+
+---
+
+## 9. Glossary
+
+| Term | Definition |
+|------|-----------|
+| **behavior** | A declarative YAML resource defining an expected agent behavior, its enforcement policy, and its communication rendering |
+| **violation** | A tool call that matches a behavior's trigger conditions |
+| **trigger** | A set of conditions on tool_input and/or session_state that detect a violation |
+| **level** | One of 5 enforcement severities: silent, nudge, warning, soft_block, hard_block |
+| **effective level** | The current enforcement level for a behavior in a session, monotonically non-decreasing |
+| **escalation** | The mapping from violation counter thresholds to enforcement levels |
+| **counter** | Per-behavior, per-session integer tracking violation count |
+| **override** | User decision to proceed despite a soft_block denial |
+| **chain** | The ordered sequence of behavior evaluations per tool call |
+| **index** | `behaviors/index.yaml`, the ordered list of active behaviors |
+| **policy** | The behavioral expectation and enforcement rules (what to enforce) |
+| **rendering** | The communication templates (how to tell the agent) |
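To make the glossary terms *counter*, *escalation*, and *effective level* concrete, the sketch below replays the example thresholds from Section 3.3 in Python. The function names `resolve_level` and `max_level` follow the pseudocode in Sections 2.1 and 2.2; the dict layout of `enforcement` and the `LEVELS` list are assumptions for illustration, not the compiled hook's actual representation.

```python
# Level ordering from Section 2.2, lowest to highest.
LEVELS = ["silent", "nudge", "warning", "soft_block", "hard_block"]


def resolve_level(enforcement, counter):
    """Walk thresholds from highest `after` to lowest; first match wins."""
    thresholds = sorted(enforcement["escalation"], key=lambda t: t["after"], reverse=True)
    for threshold in thresholds:
        if counter >= threshold["after"]:
            return threshold["level"]
    return enforcement["default_level"]


def max_level(a, b):
    """Return the higher of two levels according to LEVELS."""
    return a if LEVELS.index(a) >= LEVELS.index(b) else b


# Same thresholds as the YAML example in Section 3.3.
enforcement = {
    "default_level": "silent",
    "escalation": [
        {"after": 1, "level": "nudge"},
        {"after": 3, "level": "warning"},
        {"after": 5, "level": "soft_block"},
    ],
}

effective = "silent"
for counter in range(1, 7):
    # The counter is incremented before the level is resolved,
    # and the effective level can only rise within a session.
    effective = max_level(effective, resolve_level(enforcement, counter))
    print(counter, effective)
```

With these thresholds, violations 1 and 2 resolve to `nudge`, 3 and 4 to `warning`, and 5 onward to `soft_block`.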