Merged
5 changes: 5 additions & 0 deletions .claude/agent-memory/implementer.md
@@ -1,4 +1,9 @@

## 2026-04-13 — v3 spec docs authoring (COMPILER.md)
- **Learned:** When writing spec docs that cross-reference other specs, read all referenced files before writing — the DECISIONS.md may contradict the task brief (e.g., DECISIONS.md says `flock` but the task brief already decided on mkdir-based locking; trust the task brief for closed decisions).
- **Learned:** For compiler specs where the output is bash, the pseudocode-with-inline-comments pattern in Section 11 (showing compiler-generated structure without writing full implementation) is the right balance between specificity and staying within spec scope.
- **Avoid:** Do not write full implementation bash in a spec doc — show structure and intent with comments instead.

## 2026-04-08 — practices pipeline inbox→active migration
- **Learned:** When promoting inbox practices to active, the pattern is: Write new file to active/ with updated frontmatter (status, incorporated_in, effectiveness fields added), then rm original from inbox/. Single batch rm for all deletions is cleaner than individual calls.
- **Avoid:** Do not use Edit to move files — Write+rm is the correct approach since Edit only modifies in-place.
237 changes: 237 additions & 0 deletions docs/v3/AUDIT.md
@@ -0,0 +1,237 @@
# Audit Trail — dotforge v3.0 Behavior Governance

Specifies the triple-write audit architecture, overrides.log format, metrics exposed
to `/forge audit` and `/forge behavior status`, and integration with existing dotforge systems.

Reference: [SPEC.md](SPEC.md) Section 6 for the override protocol.
Reference: [RUNTIME.md](RUNTIME.md) Section 2 for the state.json schema.

---

## 1. Overview

Every soft_block override is recorded in three locations simultaneously. Each location serves
a distinct purpose; none is redundant.

| Location | Purpose | Scope | Persistence |
|----------|---------|-------|-------------|
| `.forge/audit/overrides.log` | Compliance audit trail | All time | Permanent — committed to git |
| `.forge/runtime/state.json` | Runtime inspection | Session (TTL 24h) | Ephemeral — gitignored |
| `registry/projects.yml` metrics | Cross-project trends | Snapshot | Permanent — committed to git |

**Why three places:**
- `overrides.log`: append-only, inspectable with grep, survives session resets. The authoritative record.
- `state.json`: on-disk snapshot of the current session. Powers `/forge behavior status` without parsing the log.
- Registry: aggregated `override_rate` enables cross-project governance dashboards and trend detection.

Directory layout:
```
.forge/
├── audit/
│   └── overrides.log      # permanent, committed to git
└── runtime/
    ├── state.json         # ephemeral, gitignored
    └── state.lock/        # transient mkdir lock
```
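
The `state.lock/` entry relies on `mkdir` being atomic: only one process can create the directory, so its existence acts as the lock. A minimal sketch of that pattern follows. The function name `with_state_lock`, the retry limit, and the sleep interval are illustrative assumptions, not part of this spec.

```shell
# Sketch only: serialize writers with an atomic mkdir lock.
# LOCK_DIR follows the layout above; everything else here is hypothetical.
LOCK_DIR="${LOCK_DIR:-.forge/runtime/state.lock}"

with_state_lock() {
  local tries=0
  # mkdir either creates the directory (lock acquired) or fails (lock held).
  # It also fails if the parent directory is missing, so create that first.
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge 50 ]; then
      echo "could not acquire $LOCK_DIR" >&2
      return 1
    fi
    sleep 0.1   # fractional sleep works on GNU and macOS, but is not POSIX
  done
  "$@"                    # run the protected command
  local status=$?
  rmdir "$LOCK_DIR"       # release the lock
  return "$status"
}
```

Usage would look like `with_state_lock some_writer_command arg1 arg2`. The wrapper returns the wrapped command's exit status. A production version would also need to handle stale locks left behind by a crashed process, which this sketch does not attempt.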

---

## 2. overrides.log Format

Location: `.forge/audit/overrides.log`

Append-only. One record per line. Pipe-delimited. No header row.

```
TIMESTAMP|SESSION_ID|BEHAVIOR_ID|TOOL_NAME|TOOL_INPUT_SUMMARY|COUNTER|REASON
```

**Example records:**
```
2026-04-13T12:15:00Z|a1b2c3d4|search-first|Edit|file_path=/src/utils.ts old_string=function|5|
2026-04-13T14:30:00Z|a1b2c3d4|search-first|Write|file_path=/src/new-module.ts content=import|7|urgent hotfix
2026-04-14T09:00:00Z|e5f6a7b8|search-first|Write|file_path=/tests/test_api.py content=def test|2|
2026-04-14T10:45:00Z|e5f6a7b8|verify-before-done|Bash|command=git commit -m "feat: add"|3|tests passed locally
2026-04-15T16:20:00Z|9c0d1e2f|search-first|Edit|file_path=/lib/parser.ts old_string=export|4|
```

**Field reference:**

| Field | Format | Notes |
|-------|--------|-------|
| `TIMESTAMP` | ISO 8601 with timezone (`Z` or offset) | UTC preferred |
| `SESSION_ID` | First 8 chars of Claude Code session UUID | From hook payload `session_id` |
| `BEHAVIOR_ID` | kebab-case behavior id | Matches `behavior.yaml` id field |
| `TOOL_NAME` | Tool that triggered the block | Write, Edit, Bash, etc. |
| `TOOL_INPUT_SUMMARY` | First 100 chars of key `tool_input` fields | Pipe chars escaped as `\|`, newlines as `\n` |
| `COUNTER` | Violation count at override moment | Integer; counter is already incremented (see SPEC.md §3.1) |
| `REASON` | User-provided reason string | Empty string if none; never contains pipe chars |

The `overrides[]` array in `state.json` (see RUNTIME.md §2) is the in-session subset of this log.
The log is the authoritative source; state.json is derived and ephemeral.
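
The field rules above can be sketched as a small record formatter. The helper names (`escape_summary`, `sanitize_reason`, `format_override_record`) are hypothetical and the argument order simply mirrors the field order. Truncating before escaping is a choice made here so that a cut never splits an escape sequence. Replacing newlines in `REASON` with spaces is also an assumption, since the spec only requires stripping pipes.

```shell
# Hypothetical helpers illustrating the record format; not the spec's implementation.

escape_summary() {
  # Pass the raw text through the environment so awk sees it unmodified.
  RAW="$1" awk 'BEGIN {
    s = substr(ENVIRON["RAW"], 1, 100)  # truncate first (chars or bytes, per awk/locale)
    gsub(/\|/, "\\|", s)                # pipe    -> backslash + pipe
    gsub(/\n/, "\\n", s)                # newline -> backslash + n
    printf "%s", s
  }'
}

sanitize_reason() {
  # Strip pipes as the spec requires; flattening newlines is an added assumption.
  printf '%s' "$1" | tr -d '|' | tr '\n' ' '
}

# Args: timestamp session_uuid behavior_id tool_name raw_summary counter reason
format_override_record() {
  printf '%s|%s|%s|%s|%s|%s|%s\n' \
    "$1" "$(printf '%s' "$2" | cut -c1-8)" "$3" "$4" \
    "$(escape_summary "$5")" "$6" "$(sanitize_reason "$7")"
}
```

A caller would append the result to the log, for example `format_override_record "$ts" "$session_id" search-first Edit "$summary" 5 "" >> .forge/audit/overrides.log`.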

---

## 3. Log Rotation Policy

No rotation in v3.0. The file grows indefinitely.

Expected growth: fewer than 100 overrides per month for an active project. At roughly 100 bytes per record, that is under 10 KB/month.
Rotation is reserved for v3.1 based on observed usage.

To count total overrides: `wc -l .forge/audit/overrides.log`

---

## 4. Metrics for /forge audit

The existing 13-item checklist and scoring formula (see `audit/scoring.md`) are unchanged.
Behavior governance metrics appear as a **separate section** appended after the checklist score.

**New section: Behavior Governance**

| Metric | Type | Source | Calculation |
|--------|------|--------|-------------|
| `behaviors_installed` | integer | `behaviors/*/behavior.yaml` file count | direct count |
| `behaviors_enabled` | integer | `behaviors/index.yaml` enabled entries | direct count |
| `violations_total` | integer | `state.json` sum of all counters | sum across all behaviors in the current session |
| `overrides_total` | integer | `overrides.log` line count | `wc -l` |
| `override_rate` | float | `overrides_total / violations_that_reached_block` | ratio; 0.0 if no blocks |
| `escalation_effectiveness` | string | threshold on `override_rate` | `healthy` < 0.3 / `review` 0.3–0.7 / `ineffective` > 0.7 |

**Display format in `/forge audit` output:**
```
── Behavior Governance ──
Installed: 6 (4 core, 2 opinionated)
Enabled: 4
Violations (current session): 12
Overrides (all time): 3
Override rate: 0.25 (healthy)
```

`escalation_effectiveness` interpretation:
- `healthy`: overrides are rare; enforcement is accepted
- `review`: override rate is elevated; consider adjusting thresholds or behavior wording
- `ineffective`: most blocks are overridden; the behavior adds friction without governance value
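
As a sketch of the calculation in the table above, the rate and its label can be computed with `awk`. The function names are hypothetical, and the caller is assumed to supply both counts. The boundary values 0.3 and 0.7 are treated as `review`, following the table's ranges. Output is formatted to two decimals to match the display example.

```shell
# Hypothetical helpers; the inputs are assumed to come from overrides.log and state.json.

# Args: overrides_total violations_that_reached_block
override_rate() {
  awk -v o="$1" -v b="$2" 'BEGIN {
    if (b == 0) printf "0.00"        # no blocks: the rate is defined as zero
    else        printf "%.2f", o / b
  }'
}

# Args: override_rate
classify_effectiveness() {
  awk -v r="$1" 'BEGIN {
    if (r < 0.3)       print "healthy"
    else if (r <= 0.7) print "review"
    else               print "ineffective"
  }'
}
```

For example, `classify_effectiveness "$(override_rate 3 12)"` yields `healthy`, since 3 / 12 = 0.25.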

---

## 5. Metrics for /forge behavior status

Per-behavior display, sourced from `state.json` for the current session:

```
── search-first (core) ──
Status: enabled
Counter: 4 (this session)
Level: warning (escalates to soft_block at 5)
Overrides: 1 (this session)
Last: 2026-04-13T14:28:00Z via Write

── no-destructive-git (core) ──
Status: enabled
Counter: 0
Level: hard_block (always)
Overrides: 0
Last: never
```

Session aggregate at bottom:
```
── Session Summary ──
Total violations: 12
Total overrides: 1
Active behaviors: 4/6
Session started: 2026-04-13T10:00:00Z
```

- The `Last` field shows `last_violation_at` and `last_violation_tool` from `state.json`.
- The `Level` field shows `effective_level` with the next escalation threshold if applicable.

---

## 6. Registry Integration

New fields added to each project entry in `registry/projects.yml`:

```yaml
projects:
  - slug: soma
    # ... existing fields ...
    behaviors:
      installed: 6
      enabled: 4
      override_rate: 0.25
      last_audit: "2026-04-13"
```

These fields are snapshot values written by `/forge audit` when behavior governance is active.
They are not real-time. `override_rate` is computed at audit time, with the override count taken from `overrides.log`.

`/forge status` can aggregate `override_rate` across projects to surface systemic governance gaps.
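
One possible form of that aggregation is a plain mean over every `override_rate` entry in the registry. The sketch below is a line-oriented scan rather than a YAML parser, so it assumes each value sits on its own `override_rate: <number>` line as in the example above. Both the function name and the choice of a mean are illustrative assumptions.

```shell
# Hypothetical helper: average override_rate across all project entries.
# Arg: path to the registry file (e.g. registry/projects.yml)
mean_override_rate() {
  awk '
    $1 == "override_rate:" { sum += $2; n++ }   # collect every snapshot value
    END {
      if (n == 0) print "0.00"                  # no projects report a rate yet
      else        printf "%.2f\n", sum / n
    }
  ' "$1"
}
```

A mean hides outliers, so a dashboard might equally list the projects whose rate falls in the `ineffective` band instead.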

---

## 7. Integration with session-report.sh

Four new fields added to the JSON output written to `~/.claude/metrics/{slug}/{date}.json`
by the Stop hook (`hooks/session-report.sh`):

```json
{
  "behavior_violations": 12,
  "behavior_overrides": 1,
  "behaviors_active": 4,
  "behavior_blocks": 3
}
```

These fields are appended alongside existing fields (`sessions`, `errors_added`, `hook_blocks`,
`lint_blocks`, etc.). Backwards compatible — consumers that don't read these fields are unaffected.

- `behavior_violations`: sum of all behavior counters in the current session from `state.json`.
- `behavior_overrides`: count of overrides recorded in this session (from `state.json overrides[]`).
- `behaviors_active`: count of enabled behaviors from `behaviors/index.yaml`.
- `behavior_blocks`: number of soft_block or hard_block events this session (counter reached block threshold).

---

## 8. Audit Trail Security

- `.forge/audit/overrides.log` permissions: `0644`
- `.forge/audit/` is committed to git — permanent audit evidence
- `.forge/runtime/` is gitignored — ephemeral session state only
- `tool_input_summary` is truncated to 100 chars, which limits how much of any secret is stored but does not guarantee that none is
- Pipe chars in `tool_input_summary` are escaped as `\|` before writing
- Newlines in `tool_input_summary` are escaped as `\n` before writing
- No tool input values beyond the summary are persisted in any audit location
- The `REASON` field must be sanitized to strip pipe chars before appending
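
Because the escaping rules guarantee exactly six unescaped pipes per record, the log can be checked for malformed lines. The following is a minimal sketch with a hypothetical function name. It assumes a backslash immediately before a pipe always marks an escaped pipe, which the format does not strictly guarantee if a summary happens to end in a backslash.

```shell
# Hypothetical integrity check: every record must have 7 fields (6 delimiters).
# Arg: path to the log. Exits non-zero if any line is malformed.
validate_overrides_log() {
  awk '
    {
      line = $0
      gsub(/\\\|/, "", line)      # remove escaped pipes so only delimiters remain
      n = gsub(/\|/, "", line)    # gsub returns the number of delimiters removed
      if (n != 6) {
        printf "line %d: expected 6 delimiters, found %d\n", NR, n
        bad = 1
      }
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Running `validate_overrides_log .forge/audit/overrides.log` before committing would flag records written without proper escaping.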

---

## 9. Grep One-Liners

Useful commands for audit analysis:

```bash
# All overrides for a specific behavior
grep '|search-first|' .forge/audit/overrides.log

# Override count by behavior (ranked)
cut -d'|' -f3 .forge/audit/overrides.log | sort | uniq -c | sort -rn

# Overrides in last 7 days (macOS + Linux portable; assumes UTC timestamps)
awk -F'|' -v d="$(date -u -d '7 days ago' +%Y-%m-%d 2>/dev/null || date -u -v-7d +%Y-%m-%d)" '$1 >= d' .forge/audit/overrides.log

# Overrides by tool
cut -d'|' -f4 .forge/audit/overrides.log | sort | uniq -c | sort -rn

# All overrides with a non-empty reason
# (REASON is always the last field; escaped pipes in the summary can shift field 7)
awk -F'|' '$NF != ""' .forge/audit/overrides.log

# Total override count
wc -l .forge/audit/overrides.log

# Override rate per behavior requires violations from state.json
# Use /forge behavior status — grep on overrides.log alone is insufficient for rate calculation
```
97 changes: 97 additions & 0 deletions docs/v3/COMPETITIVE.md
@@ -0,0 +1,97 @@
# Competitive map and differentiators of dotforge v3.0

## Verified direct competition

### obey (Lexxes-Projects)

- Natural language → rule storage → auto-generated hook
- 17 lifecycle hooks
- 3 scopes: global, stack-specific, project-local
- Active blocking via PreToolUse
- Audit trail
- Completion checklists via Stop hook

**Overlap with the original 3.0:** almost total.
**Our differentiator against obey:** curated catalog, integration with the
rest of the dotforge lifecycle (audit, practices, registry, export),
graduated 5-level enforcement vs. binary hard-block, first-class
security.

### hookify (official Anthropic)

- Official plugin
- Markdown rules → active hooks
- One file per rule, no restart

**Overlap:** partial, similar pattern.
**Risk:** if Anthropic publishes an official behavior spec within 6 months,
dotforge must align with it. The schema should stay as close as possible
to obvious standards.

### tdd-guard (nizos, 1.7k stars)

- Vertical TDD enforcement
- Cross-hook context aggregation via shared files
- Quick ON/OFF commands via UserPromptSubmit
- Multi-language

**Overlap:** none (specific vertical).
**Applicable lessons:**
- Cross-hook context aggregation: defer to 3.1 but design a compatible
  schema from 3.0
- Quick ON/OFF commands: include in 3.0 (`/forge behavior off`, scope)

### Other projects worth referencing

- **claude-code-workflow-orchestration:** soft enforcement with graduated
  nudges (silent → hint → warning → strong). Source of the 5-level
  model.
- **claude-code-lsp-enforcement-kit:** persistent state tracking per
  cwd, multi-tier by agent type. Source of the `applies_to.agents` field.
- **AgentSpec (ICSE '26):** academic DSL for runtime enforcement.
  Formal reference citable in the README.
- **AgentBound:** manifest + enforcement engine architecture. Internal
  model, not user-facing.

## Defensible differentiators of dotforge 3.0

1. **Curated catalog with governance:** obey/hookify ask you to write
   the rules. dotforge ships tested, auditable, versioned behaviors.

2. **Integration with the existing lifecycle:** behaviors enter the
   `inbox → evaluating → active → deprecated` pipeline, `/forge audit`,
   and the cross-project registry. Nobody else has this.

3. **Graduated 5-level enforcement with escape UX:** obey is binary
   hard-block. dotforge lets you configure silent/nudge/warning/
   soft/hard per behavior, with an audited override at soft and
   per-scope escape commands.

4. **Policy vs. rendering separation:** a behavior declares the expected
   behavior separately from the text injected into CLAUDE.md. This enables
   native multi-platform export (deferred to 3.1 but designed from 3.0).

5. **First-class security (deferred to 3.2 but planned):** post-CVE
   Feb 2026, signed behaviors + hash verification + sandbox are an
   enterprise requirement. Competitors do not have it.

## Competitive risks

- **obey consolidates its narrative while you are still building:** the
  mitigation is a fast Phase 1 with a working search-first + a public GIF.
- **Anthropic publishes an official behavior spec:** the mitigation is a
  conservative schema aligned with obvious patterns + a `schema_version` field.
- **dotforge stays at 4 stars:** the problem is distribution, not product.
  Technical post + real benchmark + Anthropic marketplace update in Phase 3.

## Feature map deferred to 3.1/3.2

| Feature | Competitor that already does it | dotforge target |
|---------|---------------------------------|-----------------|
| Prompt-based hooks | Anthropic (official) | 3.1 |
| Multi-platform export of rules | None directly | 3.1 |
| Context aggregation | tdd-guard | 3.1 |
| Natural language input | obey, rule2hook | 3.1 |
| Signed behaviors | None | 3.2 |
| Transcript verification | None | 3.2 |
| OPA/Rego compile | yaml-opa-llm-guardrails | 3.2 (optional) |