Forgen

When your agent says "done", forgen makes it prove it.
Turn-level self-verification + personalized rules for Claude Code and Codex CLI, at $0 extra API cost.

npm version · License: MIT · Node.js >= 20

First Block · Quick Start · How It Works · 4-Axis · Harness Vision · Commands · Architecture · Safety

English · 한국어 · 日本語 · 简体中文


The first block (30 seconds)

You've been burned: Claude says "tests pass, implementation done" — you run it — it doesn't work. forgen closes that gap.

You:     "Update the auth middleware."
Claude:  ...makes edits to src/middleware/auth.ts...
Claude:  "Implementation complete. Confidence 95/100."

[forgen:stop-guard/builtin:self-score-inflation]
1 self-score claim declared (95/100). 0 measurement tool calls — no
execution/verification evidence backing the number. Include test/build/curl
results in the turn and respond again.

Claude:  "I scored without measuring. Running the real tests first..."
         $ npm test
         "31 passed / 0 failed. auth middleware implementation complete."

[forgen] ✓ approved

What just happened: Claude's Stop hook detected a score claim (95/100) without any measurement tool call (Bash / NotebookEdit) in the turn — one of forgen's three built-in meta guards (TEST-1 fact vs agreement, TEST-2 self-score inflation, TEST-3 conclusion/verification ratio). Claude read the block reason, retracted, ran the real test, and re-submitted. Zero extra API calls — it all happened in the same session turn Claude was going to produce anyway.

The same mechanism also fires when Claude writes conclusions faster than evidence ("done. passed. shipped. verified." with no measurement context), or claims facts ("the tests pass") without ever having executed them. You can also define custom rules (e.g. "require npm test evidence before saying 'done' in this repo") via forgen compound --rule — they slot into the same Stop-hook dispatcher.

This is Mech-B self-check prompt-inject. It works because Claude Code's Stop hook accepts decision: "block" + reason, and Claude in the next turn reads that reason as input. Codex CLI gets the same treatment via the symmetric host adapter (v0.4.3, multi-host core design). We verified it end-to-end on 10 scenarios at $1.74 total cost (A1 spike report), and v0.4.1 added built-in guards so you get the first block without writing any rule.
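
The Stop-hook contract behind this can be sketched in a few lines. This is an illustrative reconstruction, not forgen's actual source: the type names and the score/measurement heuristics are simplified assumptions, but the `decision: "block"` + `reason` response shape matches the Claude Code Stop-hook interface described above.

```typescript
// Sketch of a Mech-B style stop guard (illustrative, not forgen's code).
// It blocks a turn that claims a numeric self-score without any
// measurement tool call recorded in the same turn.

interface TurnSummary {
  finalMessage: string;     // Claude's closing message for the turn
  toolCallsUsed: string[];  // tool names invoked during the turn
}

interface StopHookResult {
  decision?: "block";       // omitted => the turn is approved
  reason?: string;          // fed back to Claude as next-turn input
}

const MEASUREMENT_TOOLS = new Set(["Bash", "NotebookEdit"]);

function selfScoreGuard(turn: TurnSummary): StopHookResult {
  const scoreClaim = /\b\d{1,3}\s*\/\s*100\b/.test(turn.finalMessage);
  const measured = turn.toolCallsUsed.some((t) => MEASUREMENT_TOOLS.has(t));
  if (scoreClaim && !measured) {
    return {
      decision: "block",
      reason:
        "Self-score claimed without any measurement tool call. " +
        "Include test/build/curl output in the turn and respond again.",
    };
  }
  return {}; // approve
}
```

An empty result approves the turn; a `decision: "block"` result re-prompts Claude with the reason as input, which is what produces the retract-and-retest loop shown above.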

v0.4.3 self-correction story: the same guards detected their own 16-day false-positive (strict φ 65.66% — 84% from a single Korean-regex bug), and the forgen-eval introspect testbed (alpha) flagged a TEST-1 wiring gap on top of it. Both fixes shipped in v0.4.3 — forgen finding and fixing forgen. Details in CHANGELOG.

🎬 See it happen (27 seconds):

# Watch the full loop live — actual hook, actual rule, actual block/approve cycle
bash docs/demo/mech-b-demo.sh

# Or replay the pre-recorded asciinema cast
asciinema play docs/demo/mech-b-block-unblock.cast

See docs/demo/README.md for what's real vs simulated in the demo.


Two developers. Same Claude. Completely different behavior.

The Trust Layer above is one pillar. The other is personalization — still the reason you'd keep forgen around after the first block.

Developer A is careful. They want Claude to run all tests, explain reasoning, and ask before touching anything outside the current file.

Developer B moves fast. They want Claude to make assumptions, fix related files automatically, and report results in two lines.

Without forgen, both developers get the same generic Claude. With forgen, each gets a Claude that works the way they work.

Developer A's Claude:                    Developer B's Claude:
"I found 3 related issues.               "Fixed login + 2 related files.
Before proceeding, should I also          Tests pass. One risk: session
fix the session handler? Here's           timeout not covered. Done."
my analysis of each..."

Forgen profiles your work style, learns from your corrections, and renders personalized rules that Claude follows every session.


The harness carries you

Personalization is the surface. The deeper idea: every session leaves a trace, and those traces compound into a harness that reasons like you do. Your corrections, your conventions, your trade-off preferences — extracted from conversation, stored under ~/.forgen/me/, and replayed back to Claude on every future session.

Conversation  ──►  Extracted: solution / rule / behavior / profile update
                   ─────────────────────────────────────────────────────
                                         │
                                         ▼
Next session  ◄──  Injected: UserPromptSubmit context + rendered rules
                   + Stop-hook guards calibrated to *your* standards

After a few weeks this harness stops being "a tool that enforces rules" and starts being a portable bundle of how you judge work. One command exports it:

forgen compound export    # → forgen-knowledge-YYYY-MM-DD.tar.gz
                          #   (rules + solutions + behavior — your philosophy)
forgen compound import <path>   # replay it on another machine

That's the north star: a Claude on your laptop that judges like you do, and a tarball you can carry.


What happens when you use forgen

First run (one time, ~1 minute)

npm install -g @wooojin/forgen
forgen

Forgen detects this is your first run and launches a 4-question onboarding. Each question is a concrete scenario:

  Q1: Ambiguous implementation request

  You receive "improve the login feature." Requirements are
  unclear and adjacent modules may be affected.

  A) Clarify requirements/scope first. Ask if scope expansion is possible.
  B) Proceed if within same flow. Check when major scope expansion appears.
  C) Make reasonable assumptions and fix adjacent files directly.

  Choice (A/B/C):

Four questions. Four axes measured. Your profile is created with a pack for each axis plus fine-grained facets. A personalized rule file is rendered and placed where Claude reads it.

Every session (daily use)

forgen                    # Use this instead of `claude`

Behind the scenes:

  1. Harness loads your profile from ~/.forgen/me/forge-profile.json
  2. Preset manager composes the session: global safety rules + pack base rules + personal overlays + session overlays
  3. Rule renderer converts everything into natural language and writes ~/.claude/rules/v1-rules.md
  4. Claude Code starts and reads those rules as behavioral instructions
  5. Safety hooks activate: blocking dangerous commands, filtering secrets, detecting prompt injection
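
The composition step (2) can be sketched as a layered merge. A minimal sketch under stated assumptions — the `Rule` shape and `composeSession` name are hypothetical — but the layering order follows the steps above, and safety rules are treated as hard constraints that later layers cannot override:

```typescript
// Sketch of session composition (illustrative; forgen's real preset
// manager is richer). Later layers win on id conflicts, except safety
// rules, which are immutable hard constraints.

interface Rule {
  id: string;
  text: string;
  layer: "safety" | "pack" | "personal" | "session";
}

function composeSession(layers: Rule[][]): Rule[] {
  const byId = new Map<string, Rule>();
  for (const layer of layers) {
    for (const rule of layer) {
      const existing = byId.get(rule.id);
      if (existing?.layer === "safety") continue; // safety never overridden
      byId.set(rule.id, rule);
    }
  }
  return [...byId.values()];
}
```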

When you correct Claude

You say: "Don't refactor files I didn't ask you to touch."

Claude calls the correction-record MCP tool. The correction is stored as structured evidence with axis classification (judgment_philosophy), kind (avoid-this), and confidence score. A temporary rule is created for immediate effect in the current session.

Between sessions (automatic)

When a session ends, auto-compound extracts:

  • Solutions (reusable patterns with context)
  • Behavioral observations (how you work)
  • A session learning summary

Facets are micro-adjusted based on accumulated evidence. If your corrections consistently point away from your current pack, mismatch detection triggers after 3 sessions and recommends a pack change.
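
The 3-session mismatch trigger can be sketched as follows. Illustrative only: the direction labels and the "every correction in every recent session" criterion are simplifying assumptions; forgen's real heuristics are richer.

```typescript
// Sketch of rolling mismatch detection: recommend a pack change only when
// corrections in each of the last `window` sessions consistently point
// away from the current pack.

type Direction = "toward-pack" | "away-from-pack";

function shouldRecommendPackChange(
  recentSessions: Direction[][], // corrections per session, newest last
  window = 3,
): boolean {
  if (recentSessions.length < window) return false;
  return recentSessions
    .slice(-window)
    .every((s) => s.length > 0 && s.every((d) => d === "away-from-pack"));
}
```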

Next session

Updated rules are rendered with your corrections included. Compound knowledge is searchable via MCP. Retrieval precision grows as your personal accumulation grows — the mechanism is in place from day 1 (starter-pack covers common dev queries on a fresh install), and the signal-to-noise ratio improves over roughly 2–4 weeks of real use as low-fitness solutions are auto-demoted and your specific patterns get promoted.


Quick Start

# 1. Install (MUST use -g — forgen is a global CLI)
npm install -g @wooojin/forgen

# 2. Register forgen on your host(s) — Claude Code, Codex, or both
forgen install both         # 3-choice interactive: claude / codex / both
# or non-interactive:
forgen install claude
forgen install codex

# 3. First run — 4-question onboarding (English or Korean)
forgen                       # default: Claude
forgen --runtime codex       # use Codex
forgen config default-host codex   # set persistent default

Prerequisites

  • Node.js >= 20 (>= 22 recommended for SQLite session search)
  • At least one host installed and authenticated:
    • Claude Code — npm i -g @anthropic-ai/claude-code
    • Codex CLI — install per Codex docs
    • Or both — forgen install both registers symmetric hooks/MCP for each

Vendor dependency: Forgen wraps Claude Code and Codex CLI symmetrically (Claude is the behavior reference; Codex extends with equivalence). Upstream API/CLI changes may affect behavior. Tested with Claude Code 1.0.x / 2.1.x and Codex 0.x.

Isolated / CI / Docker usage

Forgen's home directory is ~/.forgen by default, but can be overridden per-process:

# Fresh isolated home — does NOT touch your real ~/.forgen
FORGEN_HOME=/tmp/forgen-clean forgen init       # provisions 15-solution starter pack
FORGEN_HOME=/tmp/forgen-clean forgen stats      # shows stats from the isolated home
FORGEN_HOME=/tmp/forgen-clean claude -p "..."   # hooks inherit the env → isolated logs

Claude Code hook processes inherit the parent env, so any claude command prefixed with FORGEN_HOME=... routes all state (rules, solutions, behavior, enforcement logs) into that directory. Useful for:

  • CI pipelines validating forgen against a pinned seed set
  • Reproducing buyer-first-day experience without polluting your real home
  • Running multiple personas on one machine
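
The per-process override amounts to a one-line home-directory resolution, sketched here (illustrative; forgen's own path handling may differ):

```typescript
// Sketch: resolve forgen's home, honoring the FORGEN_HOME override
// described above before falling back to ~/.forgen.
import * as os from "node:os";
import * as path from "node:path";

function forgenHome(env: NodeJS.ProcessEnv = process.env): string {
  return env.FORGEN_HOME ?? path.join(os.homedir(), ".forgen");
}
```

Because child processes inherit the environment, the same resolution applies inside hook processes spawned by the host CLI.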

Docker / remote servers (OAuth limitation): Claude Code stores its OAuth session in the OS keychain (macOS Keychain / libsecret / Windows Credential Manager). Mounting ~/.claude.json alone is not enough in a fresh Linux container because the keychain-bound refresh is missing. For container use, set ANTHROPIC_API_KEY in the container env instead. Host-native usage (macOS, Linux workstations) works with the normal claude login flow — no API key needed.

Migrations

forgen migrate implicit-feedback backfills the category field on pre-v0.4.1 entries in ~/.forgen/state/implicit-feedback.jsonl. Idempotent — safe to re-run.


Why forgen

| | Generic Claude Code | oh-my-claudecode | forgen |
| --- | --- | --- | --- |
| Same for everyone | Yes | Yes | No |
| Learns from corrections | No | No | Yes |
| Evidence-based lifecycle | No | No | Yes |
| Auto-retires bad patterns | No | No | Yes |
| Personalized rules | No | No | Yes |
| Runtime dependencies | - | many | 3 |

When to use forgen

Good fit:

  • Long-running projects where Claude learns your patterns over weeks
  • Developers with strong preferences about how AI should behave
  • Codebases with recurring patterns that benefit from compound knowledge

Not a fit:

  • One-off scripts or throwaway prototypes
  • Environments without Claude Code
  • Teams that need identical AI behavior for all members (forgen is personal, not team-wide)

forgen + oh-my-claudecode: They work together. OMC provides orchestration (agents, workflows); forgen provides personalization (profile, learning). See Coexistence Guide.


How It Works

The learning loop

                          +-------------------+
                          |    Onboarding     |
                          |  (4 questions)    |
                          +--------+----------+
                                   |
                                   v
                   +-------------------------------+
                   |        Profile Created         |
                   |  4 axes x pack + facets + trust |
                   +-------------------------------+
                                   |
           +-----------------------+------------------------+
           |                                                |
           v                                                |
  +------------------+                                      |
  | Rules Rendered   |   ~/.claude/rules/v1-rules.md        |
  | to Claude format |                                      |
  +--------+---------+                                      |
           |                                                |
           v                                                |
  +------------------+                                      |
  | Session Runs     |   Claude follows your rules          |
  |   You correct    | ---> correction-record MCP           |
  |   Claude learns  |      Evidence stored                 |
  +--------+---------+      Temp rule created               |
           |                                                |
           v                                                |
  +------------------+                                      |
  | Session Ends     |   auto-compound extracts:            |
  |                  |   solutions + observations + summary  |
  +--------+---------+                                      |
           |                                                |
           v                                                |
  +------------------+                                      |
  | Facets Adjusted  |   micro-adjustments to profile       |
  | Mismatch Check   |   rolling 3-session analysis         |
  +--------+---------+                                      |
           |                                                |
           +------------------------------------------------+
                    (next session: updated rules)

Compound knowledge

Knowledge accumulates across sessions with a trust-based lifecycle:

experiment (0.30) → candidate (0.55) → verified (0.75) → mature (0.90)

Each solution starts as an experiment. As it gets reflected in your code across sessions, it's automatically promoted. Negative evidence triggers a circuit breaker (auto-retire). This means only patterns that actually work for you survive.
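
The lifecycle ladder above can be sketched as a threshold lookup. The trust cut-offs are taken from the ladder; the circuit-breaker criterion (3+ pieces of negative evidence) is an assumption for illustration, not forgen's exact rule.

```typescript
// Sketch of the trust-based lifecycle. Thresholds follow the ladder
// above; the negative-evidence cutoff is an illustrative assumption.

type Status = "experiment" | "candidate" | "verified" | "mature" | "retired";

const THRESHOLDS: [Status, number][] = [
  ["mature", 0.90],
  ["verified", 0.75],
  ["candidate", 0.55],
  ["experiment", 0.30],
];

function statusFor(trust: number, negativeEvidence: number): Status {
  // Circuit breaker: repeated negative evidence auto-retires the solution.
  if (negativeEvidence >= 3) return "retired";
  for (const [status, min] of THRESHOLDS) {
    if (trust >= min) return status;
  }
  return "experiment";
}
```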

| Type | Source | How Claude uses it |
| --- | --- | --- |
| Solutions | Extracted from sessions | Auto-injected when relevant to your prompt (TF-IDF + BM25 + bigram ensemble) |
| Skills | 10 built-in + promoted from verified solutions | Activated by keyword (deep-interview, forge-loop, ship, etc.) |
| Behavioral patterns | Auto-detected at 3+ observations | Applied to forge-behavioral.md |
| Evidence | Corrections + observations | Drives facet adjustments + rule creation |

Solution auto-injection

Every prompt you type is matched against your accumulated solutions. Relevant ones are automatically injected into Claude's context — no manual lookup needed.

You type: "fix the error handling in the API"
                    ↓
solution-injector matches: starter-error-handling-patterns (0.70)
                    ↓
Claude sees: "Matched solutions: error-handling-patterns [pattern|0.70]
             Use try/catch with specific error types. Always log original error..."
                    ↓
Claude has your accumulated patterns in context while drafting the response.

Precision gates (v0.3.2+): matches below relevance 0.3 or with only a single common-word tag overlap are filtered before injection so Claude's context doesn't get polluted by low-signal hits. Cold-start boost (v0.4.1+): when your outcome history has < 5 champion/active solutions (first days after install), the injection threshold is relaxed to 0.2 so starter-pack solutions can actually surface; once your own patterns accumulate the threshold returns to the standard 0.3.
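
The cold-start rule above reduces to a small threshold function, sketched here with the values stated in the paragraph (0.3 standard, 0.2 while fewer than 5 champion/active solutions exist); function names are illustrative:

```typescript
// Sketch of the injection threshold with cold-start relaxation, using
// the numbers described above.

function injectionThreshold(championCount: number): number {
  return championCount < 5 ? 0.2 : 0.3;
}

function shouldInject(relevance: number, championCount: number): boolean {
  return relevance >= injectionThreshold(championCount);
}
```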

10 built-in skills

Curated, compound-native skills. Each integrates with your accumulated knowledge — effectiveness compounds as your personal solution base grows.

Core chain (build → learn):

| Skill | Trigger | What it does |
| --- | --- | --- |
| deep-interview | "deep-interview", "딥인터뷰" | Weighted 4-dimension ambiguity scoring, 3 challenge modes (Contrarian/Simplifier/Ontologist), ontology tracking |
| forge-loop | "forge-loop", "끝까지" | PRD-based iteration loop. Stop hook prevents polite-stop. Verifier enforcement with fresh evidence |
| compound | "복리화", "compound" | Extract reusable patterns with 5-Question quality filter. Health dashboard included |

Management chain (review → tune):

| Skill | Trigger | What it does |
| --- | --- | --- |
| retro | "retro", "회고" | Weekly retrospective: git analysis + compound health + learning trend + 3 recommendations |
| learn | "learn prune", "compound 정리" | 5 subcommands: search/stats/prune/export/import. Stale & duplicate detection |
| calibrate | "calibrate", "프로필 보정" | Evidence-based profile adjustment. Max 2 axes per calibration. Threshold: 3+ corrections in the same direction |

Independent skills:

| Skill | Trigger | What it does |
| --- | --- | --- |
| ship | "ship", "배포" | 15-step pipeline. "Never ask, just do" philosophy. Review Readiness Dashboard + Verification Gate |
| code-review | "code review", "리뷰" | Confidence 1-10 calibration, Critical 5 categories (SQL/race/LLM trust/secrets/enum), auto-fix |
| architecture-decision | "adr" | Weighted trade-off matrix, ADR lifecycle, reversibility classification |
| docker | "docker", "컨테이너" | Multi-stage builds, security hardening, 10 failure modes |

13 built-in agents

Sub-agents with physically separated tool access, Failure_Modes_To_Avoid sections, and Good/Bad examples. Invoked via Agent(subagent_type: "ch-<name>"). The ch- prefix avoids collisions with OMC / built-in Claude Code agents.

Read-only (investigation / review):

| Agent | Model | Role |
| --- | --- | --- |
| ch-explore | Haiku | Fast codebase explorer — file/pattern search, structure mapping |
| ch-analyst | Opus | Requirements analyst — uncovers hidden constraints via Socratic inquiry |
| ch-architect | Opus | Strategic architecture advisor |
| ch-code-reviewer | Opus | Unified reviewer — quality + security (OWASP) + performance (absorbs former security-reviewer / performance-reviewer) |
| ch-critic | Opus | Final quality gate — plan/code verifier |

Plan-only:

| Agent | Model | Role |
| --- | --- | --- |
| ch-planner | Opus | Strategic planning — decomposes tasks, identifies risks, creates actionable plans |
| ch-solution-evolver | Opus | Proposes 3 novel compound-solution candidates from a weakness report (Phase 4 evolution loop) |

Write-enabled (implementation / verification):

| Agent | Model | Role |
| --- | --- | --- |
| ch-executor | Sonnet | Code implementation — compound-aware, absorbs refactoring & simplification |
| ch-debugger | Sonnet | Root-cause debugger — isolates regressions, analyzes stack traces |
| ch-test-engineer | Sonnet | Test strategist — integration/E2E coverage, TDD, flaky-test hardening |
| ch-designer | Sonnet | UI/UX — component architecture, accessibility, responsive design |
| ch-git-master | Sonnet | Git workflows — atomic commits, rebasing, history management (Bash limited to git) |
| ch-verifier | Sonnet | Completion verifier — evidence collection, test adequacy, manual test scenarios (compound-aware) |

Absorbed in this redesign: security-reviewer / performance-reviewer → ch-code-reviewer, refactoring-expert / code-simplifier → ch-executor, qa-tester → ch-verifier; scientist / writer removed.

Session management

| Feature | What happens |
| --- | --- |
| Session brief | Before context compaction, a structured brief is saved and restored in the next session |
| Drift detection | EWMA-based edit rate tracking → warning at 15 edits, critical at 30, hard stop at 50 |
| Agent output validation | When Claude spawns sub-agents, their output quality is automatically verified |
| Auto-compact | At 120K chars accumulated, Claude is instructed to compact context |
| Pending compound | After 20+ prompt sessions, a compound extraction is auto-triggered next session |
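
The drift-detection thresholds above can be sketched as a small counter with an EWMA-smoothed rate alongside it. Illustrative only: the smoothing factor and class shape are assumptions; the 15/30/50 levels are the ones stated above.

```typescript
// Sketch of drift detection: level escalates with the edit count
// (warning 15, critical 30, hard stop 50), while an EWMA tracks the
// recent edit rate. Alpha is an illustrative assumption.

type DriftLevel = "ok" | "warning" | "critical" | "hard-stop";

class DriftDetector {
  private ewma = 0;
  private edits = 0;

  recordEdit(weight = 1, alpha = 0.3): DriftLevel {
    this.edits += 1;
    this.ewma = alpha * weight + (1 - alpha) * this.ewma;
    if (this.edits >= 50) return "hard-stop";
    if (this.edits >= 30) return "critical";
    if (this.edits >= 15) return "warning";
    return "ok";
  }

  get editRate(): number {
    return this.ewma; // smoothed recent edit rate
  }
}
```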

4-Axis Personalization

Each axis has 3 packs. Each pack includes fine-grained facets (numerical values from 0-1) that are micro-adjusted over time based on your corrections.

Quality/Safety

| Pack | What Claude does |
| --- | --- |
| Conservative | Runs all tests before reporting done. Checks types. Verifies edge cases. Won't say "complete" until everything passes. |
| Balanced | Runs key checks, summarizes remaining risks. Balances thoroughness with speed. |
| Speed-first | Quick smoke test. Reports results and risks immediately. Prioritizes delivery. |

Autonomy

| Pack | What Claude does |
| --- | --- |
| Confirm-first | Asks before touching adjacent files. Clarifies ambiguous requirements. Requests approval for scope expansion. |
| Balanced | Proceeds within the same flow. Checks when major scope expansion appears. |
| Autonomous | Makes reasonable assumptions. Fixes related files directly. Reports what was done after. |

Judgment

| Pack | What Claude does |
| --- | --- |
| Minimal-change | Preserves existing structure. Does not refactor working code. Keeps modification scope minimal. |
| Balanced | Focuses on the current task. Suggests improvements when clearly beneficial. |
| Structural | Proactively suggests structural improvements. Prefers abstraction and reusable design. Maintains architectural consistency. |

Communication

| Pack | What Claude does |
| --- | --- |
| Concise | Code and results only. No proactive elaboration. Explains only when asked. |
| Balanced | Summarizes key changes and reasons. Invites follow-up questions. |
| Detailed | Explains what, why, impact, and alternatives. Provides educational context. Structures reports with sections. |

What the rendered rules actually look like

When forgen composes your session, it renders a v1-rules.md file that Claude reads. Here are two real examples showing how different profiles produce completely different Claude behavior.

Example 1: Conservative + Confirm-first + Structural + Detailed

[Conservative quality / Confirm-first autonomy / Structural judgment / Detailed communication]

## Must Not
- Never commit or expose .env, credentials, or API keys.
- Never execute destructive commands (rm -rf, DROP, force-push) without user confirmation.

## Working Defaults
- Trust: Dangerous bypass disabled. Always confirm before destructive commands or sensitive path access.
- Proactively suggest structural improvements when you spot repeated patterns or tech debt.
- Prefer abstraction and reusable design, but avoid over-abstraction.
- Maintain architectural consistency across changes.

## When To Ask
- Clarify requirements before starting ambiguous tasks.
- Ask before modifying files outside the explicitly requested scope.

## How To Validate
- Run all related tests, type checks, and key verifications before reporting completion.
- Do not say "done" until all checks pass.

## How To Report
- Explain what changed, why, impact scope, and alternatives considered.
- Provide educational context — why this approach is better, compare with alternatives.
- Structure reports: changes, reasoning, impact, next steps.

## Evidence Collection
- When the user corrects your behavior ("don't do that", "always do X", "stop doing Y"), call the correction-record MCP tool to record it as evidence.
- kind: fix-now (immediate fix), prefer-from-now (going forward), avoid-this (never do this)
- axis_hint: quality_safety, autonomy, judgment_philosophy, communication_style
- Do not record general feedback — only explicit behavioral corrections.

Example 2: Speed-first + Autonomous + Minimal-change + Concise

[Speed-first quality / Autonomous autonomy / Minimal-change judgment / Concise communication]

## Must Not
- Never commit or expose .env, credentials, or API keys.
- Never execute destructive commands (rm -rf, DROP, force-push) without user confirmation.

## Working Defaults
- Trust: Minimal runtime friction. Free execution except explicit bans and destructive commands.
- Preserve existing code structure. Do not refactor working code unnecessarily.
- Keep modification scope minimal. Change adjacent files only when strictly necessary.
- Secure evidence (tests, error logs) before making changes.

## How To Validate
- Quick smoke test. Report results and risks immediately.

## How To Report
- Keep responses short and to the point. Focus on code and results.
- Only elaborate when asked. Do not proactively write long explanations.

## Evidence Collection
- When the user corrects your behavior ("don't do that", "always do X", "stop doing Y"), call the correction-record MCP tool to record it as evidence.
- kind: fix-now (immediate fix), prefer-from-now (going forward), avoid-this (never do this)
- axis_hint: quality_safety, autonomy, judgment_philosophy, communication_style
- Do not record general feedback — only explicit behavioral corrections.

Same Claude. Same codebase. Completely different working style, driven by a 1-minute onboarding.


Commands

Core

forgen                          # Start Claude Code with personalization
forgen "fix the login bug"      # Start with a prompt
forgen --resume                 # Resume previous session

Personalization

forgen onboarding               # Run 4-question onboarding
forgen forge --profile          # View current profile
forgen forge --reset soft       # Reset profile (soft / learning / full)
forgen forge --export           # Export profile

Inspection

forgen stats                    # Trust-layer dashboard (rules, corrections, blocks 7d, assist today, philosophy)
forgen recall [--limit N] [--show]
                                # Recent compound recalls surfaced to Claude (with body preview)
forgen last-block               # Most recent block event with rule detail
forgen inspect profile          # 4-axis profile with packs and facets
forgen inspect rules            # Active and suppressed rules
forgen inspect corrections      # Correction history (alias: evidence)
forgen inspect session          # Current session state
forgen inspect violations       # Recent block events (--last N)
forgen me                       # Personal dashboard (shortcut for inspect profile)

Rule management

forgen rule list                # List active + suppressed rules
forgen rule suppress <id>       # Disable a rule (hard rules refused)
forgen rule activate <id>       # Re-activate a suppressed rule
forgen rule scan [--apply]      # Run lifecycle triggers (promote/demote/retire)
forgen rule health-scan         # Scan drift → Mech downgrade candidates
forgen rule classify            # Propose enforce_via for legacy rules

Knowledge management

forgen compound                 # Preview accumulated knowledge
forgen compound --save          # Save auto-analyzed patterns
forgen compound list            # List all solutions with status
forgen compound inspect <name>  # Show full solution details
forgen compound --lifecycle     # Run promotion/demotion check
forgen compound --verify <name> # Manually promote to verified
forgen compound export          # Export knowledge as tar.gz
forgen compound import <path>   # Import knowledge archive
forgen skill promote <name>     # Promote a verified solution to a skill
forgen skill list               # List promoted skills

System

forgen init                     # Initialize project (+ 15 starter-pack solutions)
forgen migrate [implicit-feedback|all]
                                # One-shot schema migrations (idempotent)
forgen doctor                   # System diagnostics (10 categories + harness maturity)
forgen doctor --prune-state     # Daily hygiene: state GC + T4 rule decay (90d idle → retire)
forgen dashboard                # Knowledge overview (6 sections)
forgen config hooks             # View hook status + context budget
forgen config hooks --regenerate # Regenerate hooks
forgen mcp list                 # List installed MCP servers
forgen mcp add <name>           # Add MCP server from template
forgen mcp templates            # Show available templates
forgen notepad show             # View session notepad
forgen uninstall                # Remove forgen cleanly

Rule lifecycle (v0.4.0, ADR-001/002)

forgen classify-enforce          # Preview enforce_via proposals for existing rules
forgen classify-enforce --apply  # Save proposed enforce_via (skips already-set rules)
forgen classify-enforce --apply --force  # Overwrite existing enforce_via
forgen rule-meta-scan            # Preview Mech demotion candidates (drift.jsonl → A→B→C)
forgen rule-meta-scan --apply    # Persist demotions + meta_promotions history
forgen lifecycle-scan            # Preview T2~T5 + Meta (T1 fires inline on correction, not via CLI)
forgen lifecycle-scan --apply    # Apply all lifecycle state transitions

Rule enforcement is 3-axis (ADR-001):

  • Mech-A (hook-BLOCK) — mechanical checks (rm -rf, artifact presence). Violation blocks immediately.
  • Mech-B (self-check) — natural-language rules. Stop hook feeds self-check question back to Claude via decision: "block" + reason. Zero extra API cost.
  • Mech-C (drift-measure) — long-term bias tracking only.
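
The 3-axis split can be sketched as a dispatch on a rule's `enforce_via` value. Illustrative only: the outcome shape and function names are assumptions; the behavior per axis follows the descriptions above.

```typescript
// Sketch of 3-axis enforcement dispatch: Mech-A blocks mechanically,
// Mech-B blocks with a self-check reason Claude reads next turn,
// Mech-C only records for drift measurement.

type EnforceVia = "mech-a" | "mech-b" | "mech-c";

interface EnforcementOutcome {
  blocked: boolean;
  feedback?: string; // Mech-B: reason fed back via the Stop hook
  logged?: boolean;  // Mech-C: drift measurement only
}

function enforce(
  via: EnforceVia,
  violated: boolean,
  ruleText: string,
): EnforcementOutcome {
  switch (via) {
    case "mech-a": // mechanical check: violation blocks immediately
      return { blocked: violated };
    case "mech-b": // self-check: block with a reason for the next turn
      return violated
        ? { blocked: true, feedback: `Self-check failed: ${ruleText}` }
        : { blocked: false };
    case "mech-c": // drift measurement: never blocks, only records
      return { blocked: false, logged: violated };
  }
}
```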

Rule lifecycle (ADR-002): rules auto-flag / suppress / retire / merge / supersede based on T1~T5 + Meta signals. Details in docs/adr/ADR-002-rule-lifecycle-engine.md.

Release self-gate (v0.4.0, ADR-003)

Three CI gates prove forgen does not violate its own L1 rules before release:

node scripts/self-gate.cjs          # Static: mock-in-prod, secrets, enforce_via, release-artifact
node scripts/self-gate-runtime.cjs  # Runtime smoke: 6 hook scenarios
node scripts/self-gate-release.cjs  # Tag-only: version/tag/CHANGELOG/dist/e2e-report consistency

Triggered by .github/workflows/self-gate.yml on push main / PR main / tag v*. Dogfood opt-in: see .forgen/README.md.

MCP tools (available to Claude during sessions)

| Tool | Purpose |
| --- | --- |
| compound-search | Search accumulated knowledge by query (TF-IDF + BM25 + bigram ensemble) |
| compound-read | Read full solution content (Progressive Disclosure Tier 3) |
| compound-list | List solutions with status/type/scope filters |
| compound-stats | Overview statistics by status, type, scope |
| session-search | Search past session conversations (SQLite FTS5, Node.js 22+) |
| correction-record | Record user corrections as structured evidence |
| profile-read | Read current personalization profile |
| rule-list | List active personalization rules by category |

Architecture

~/.forgen/                           Personalization home
|-- me/
|   |-- forge-profile.json           4-axis profile (packs + facets + trust)
|   |-- rules/                       Rule store (one JSON file per rule)
|   |-- behavior/                    Evidence store (corrections + observations)
|   |-- recommendations/             Pack recommendations (onboarding + mismatch)
|   +-- solutions/                   Compound knowledge
|-- state/
|   |-- sessions/                    Session effective state snapshots
|   +-- raw-logs/                    Raw session logs (7-day TTL auto-cleanup)
|-- sessions.db                      SQLite session history (Node.js 22+)
+-- config.json                      Global config (locale, trust, packs)

~/.claude/
|-- settings.json                    Hooks + env vars injected by harness
|-- rules/
|   |-- forge-behavioral.md          Learned behavioral patterns (auto-generated)
|   +-- v1-rules.md                  Rendered personalization rules (per-session)
|-- commands/forgen/                 Slash commands (promoted skills)
+-- .claude.json                     MCP server registration

Data flow

forge-profile.json                   Source of truth for personalization
        |
        v
preset-manager.ts                    Composes session state:
  global safety rules                  hard constraints (always active)
  + base pack rules                    from profile packs
  + personal overlays                  from correction-generated rules
  + session overlays                   temporary rules from current session
  + runtime capability detection       trust policy adjustment
        |
        v
rule-renderer.ts                     Converts Rule[] to natural language:
  filter (active only)                 pipeline: filter -> dedupe -> group ->
  dedupe (render_key)                  order -> template -> budget (4000 chars)
  group by section
  order: Must Not -> Working Defaults -> When To Ask -> How To Validate -> How To Report
        |
        v
~/.claude/rules/v1-rules.md         What Claude actually reads
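
The renderer pipeline above (filter → dedupe → group → order → template → budget) can be sketched compactly. Illustrative only: the rule fields and budget enforcement are simplified assumptions; the section order and 4000-char budget are the ones stated in the diagram.

```typescript
// Sketch of the rule-renderer pipeline: keep active rules, dedupe on
// render key, group into sections, emit in fixed order, cap at budget.

interface RenderableRule {
  renderKey: string;
  section: string;
  text: string;
  active: boolean;
}

const SECTION_ORDER = [
  "Must Not",
  "Working Defaults",
  "When To Ask",
  "How To Validate",
  "How To Report",
];

function renderRules(rules: RenderableRule[], budget = 4000): string {
  const seen = new Set<string>();
  const bySection = new Map<string, string[]>();
  for (const r of rules) {
    if (!r.active || seen.has(r.renderKey)) continue; // filter + dedupe
    seen.add(r.renderKey);
    const list = bySection.get(r.section) ?? [];
    list.push(`- ${r.text}`);
    bySection.set(r.section, list);
  }
  let out = "";
  for (const section of SECTION_ORDER) { // fixed section order
    const lines = bySection.get(section);
    if (!lines) continue;
    out += `## ${section}\n${lines.join("\n")}\n\n`;
  }
  return out.slice(0, budget); // hard character budget
}
```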

Safety

Safety hooks are automatically registered in settings.json and run on every tool call Claude makes.

| Hook | Trigger | What it does |
| --- | --- | --- |
| pre-tool-use | Before any tool execution | Blocks `rm -rf`, `curl \| sh`, `--force` push, and other dangerous patterns |
| db-guard | SQL operations | Blocks `DROP TABLE`, WHERE-less `DELETE`, `TRUNCATE` |
| secret-filter | File writes and outputs | Warns when API keys, tokens, or credentials are about to be exposed |
| slop-detector | After code generation | Detects TODO remnants, `eslint-disable`, `as any`, `@ts-ignore`, empty `catch` |
| prompt-injection-filter | All inputs | Blocks prompt-injection attempts with pattern + heuristic detection |
| context-guard | During session | Warns at 50 prompts / 200K chars, auto-compacts at 120K, handles session handoff |
| rate-limiter | MCP tool calls | Prevents excessive MCP tool invocations |
| drift-detector | File edits | EWMA-based drift score: warning → critical → hard stop at 50 edits |
| agent-validator | Agent tool output | Warns on empty, failed, or truncated sub-agent output |

Safety rules are hard constraints -- they cannot be overridden by pack selection or corrections. The "Must Not" section in rendered rules is always present regardless of profile.
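A minimal sketch of what a pre-tool-use command guard could look like -- the patterns mirror the table above, but the function shape and pattern list are assumptions, not forgen's implementation:

```typescript
// Illustrative dangerous-command patterns; not forgen's actual list.
const DANGEROUS_PATTERNS: [RegExp, string][] = [
  [/\brm\s+-rf?\b/, "recursive delete"],
  [/curl[^|]*\|\s*(ba)?sh/, "pipe remote script to shell"],
  [/git\s+push\b.*--force\b/, "force push"],
];

// Returns a block decision with a reason Claude can read and act on.
function checkCommand(cmd: string): { allow: boolean; reason?: string } {
  for (const [pattern, reason] of DANGEROUS_PATTERNS) {
    if (pattern.test(cmd)) return { allow: false, reason };
  }
  return { allow: true };
}
```

In the real hook the block decision would be written back to Claude Code via the hook's JSON output, so the reason string surfaces in the session the same way the Stop-guard messages do.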


Key Design Decisions

  • 4-axis profile, not preference toggles. Each axis has a pack (coarse) and facets (fine-grained, 0-1 numerical values). Packs give stable behavior; facets allow micro-adjustment without full reclassification.

  • Evidence-based learning, not regex matching. Corrections are structured data (CorrectionRequest with kind, axis_hint, message). Claude classifies them; algorithms apply them. No pattern matching on user input.

  • Pack + overlay model. Base packs provide stable defaults. Personal overlays from corrections layer on top. Session overlays add temporary rules for the current session. Conflict resolution: session > personal > pack (global safety is always a hard constraint).

  • Rules rendered as natural language. The v1-rules.md file contains English (or Korean) sentences, not configuration. Claude reads instructions like "Do not refactor working code unnecessarily" -- the same way a human mentor would give guidance.

  • Mismatch detection. Rolling 3-session analysis checks if your corrections consistently diverge from your current pack. When detected, forgen proposes a pack re-recommendation rather than silently drifting.

  • Runtime trust computation. Your desired trust policy is reconciled with Claude Code's actual runtime permission mode. If Claude Code runs with --dangerously-skip-permissions, forgen adjusts the effective trust level accordingly.

  • Internationalization. English and Korean fully supported. Language selected at onboarding, applied throughout (onboarding questions, rendered rules, CLI output).
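The pack + overlay precedence described above (session > personal > pack, with global safety as a hard constraint on top) can be sketched as follows; the types and function are hypothetical, not forgen's internal API:

```typescript
// Hypothetical overlay model -- a sketch of the precedence, not real code.
type Layer = "pack" | "personal" | "session";
interface Overlay { key: string; value: string; layer: Layer }

const PRECEDENCE: Layer[] = ["pack", "personal", "session"]; // low -> high

function resolve(overlays: Overlay[], safety: Record<string, string>) {
  const merged: Record<string, string> = {};
  // Apply layers in precedence order so higher layers overwrite lower ones.
  for (const layer of PRECEDENCE) {
    for (const o of overlays) {
      if (o.layer === layer) merged[o.key] = o.value;
    }
  }
  // Global safety rules are hard constraints: they overwrite everything.
  return { ...merged, ...safety };
}
```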


Coexistence

Forgen detects other Claude Code plugins (oh-my-claudecode, superpowers, claude-mem) at install time and automatically reduces its context injection by 50% ("yielding principle"). Core safety and compound hooks always remain active. Conflicting skills are skipped when another plugin already provides them.
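A minimal sketch of the yielding principle, assuming a hypothetical base context budget of 8000 characters (the plugin names come from the paragraph above; the budget number and function are illustrative):

```typescript
// Peer plugins whose presence triggers yielding (from the paragraph above).
const KNOWN_PEERS = ["oh-my-claudecode", "superpowers", "claude-mem"];

// Halve the context injection budget when any known peer is installed.
// The 8000-char base is an assumed placeholder, not forgen's real value.
function contextBudget(installedPlugins: string[], base = 8000): number {
  const peerPresent = installedPlugins.some(p => KNOWN_PEERS.includes(p));
  return peerPresent ? Math.floor(base / 2) : base;
}
```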

Better with claude-mem (recommended pairing)

forgen and claude-mem solve complementary halves of the trust gap:

|  | forgen | claude-mem |
| --- | --- | --- |
| Job | Enforcement — block unverified claims | Recall — inject relevant past sessions |
| Trigger | Stop / PreToolUse hooks | UserPromptSubmit hook |
| Cost | $0 (in-turn block/reason) | $0 (vector recall, local) |

Install both as separate Claude Code plugins (plugin model — forgen does not bundle claude-mem, so its AGPL-3.0 license stays at arm's length). When both are present, forgen's auto-detect yields context budget so claude-mem's recall has room to land, and the orchestration contract — order, failure isolation, Stop-hook ownership — is documented in ADR-004. The pairing is one of the 5 arms tracked by forgen-eval (see the claude-mem spike).

You:        "fix the auth flow"
claude-mem: ↓ recalls past auth-flow session, injects 3 relevant chunks
forgen:     ↓ matches your "no mock as proof" rule, primes Stop guard
Claude:     edits → declares done → forgen Stop hook blocks (no test ran)
            → re-runs test → approved

See Coexistence Guide for the full plugin-detection matrix.


Documentation

| Document | Description |
| --- | --- |
| Hooks Reference | 19 hooks across 3 tiers — events, timeouts, behavior |
| Coexistence Guide | Using forgen alongside oh-my-claudecode |
| forgen-eval testbed | Alpha self-measurement package — multi-host parity, 7-axis metrics, drift detection (private workspace, v0.4.3+) |
| Multi-host core design | Codex/Claude symmetric host adapter spec |
| ADR-005 forgen-eval architecture | Self-measurement testbed module design |
| CHANGELOG | Version history and release notes |

License

MIT
