Give your AI coding agent a memory that learns from failure — and a debugger that knows where to look.
AI agents repeat the same mistakes because they forget everything between sessions. Debug Bank fixes this: a pattern-first debugging system that checks "have I seen this before?" in 30 seconds, provides targeted breakpoints for runtime debuggers, and catches known failure patterns before they ship.
One-liner: Drop a `CLAUDE.md` into your project. Your agent never makes the same debugging mistake twice — and when it uses a debugger, it knows exactly which breakpoints to set.
```bash
curl -O https://raw.githubusercontent.com/soleimanmansouri/debug-bank/main/CLAUDE.md
```

```mermaid
graph TD
    BUG["Bug Reported"] --> PC["Step 1: Pattern Check (30s)"]
    DEPLOY["About to Deploy"] --> PDS["Pre-Deploy Scan"]
    PDS -->|"Patterns flagged"| REVIEW["Review / Fix before shipping"]
    PDS -->|"No matches"| SHIP["Deploy"]
    REVIEW --> SHIP
    PC -->|"Match found"| VERIFY["Verify known fix applies"]
    PC -->|"No match"| REPRODUCE["Step 2: Reproduce"]
    VERIFY -->|"Confirmed"| FIX["Step 6: Fix"]
    VERIFY -->|"Doesn't apply"| REPRODUCE
    REPRODUCE --> HYPOTHESIZE["Step 3: Hypothesize (2-3 ranked causes)"]
    HYPOTHESIZE --> ISOLATE["Step 4: Isolate (binary search)"]
    ISOLATE -->|"3 failures"| STOP["STOP — 3-Exchange Rule"]
    STOP --> REPLAN["Re-plan / Add logging / Switch strategy"]
    REPLAN --> HYPOTHESIZE
    ISOLATE -->|"Found it"| DIAGNOSE["Step 5: Diagnose (trace call chain)"]
    DIAGNOSE --> FIX
    FIX --> RECORD["Step 7: Record trajectory"]
    RECORD --> PB["Pattern Bank grows"]
    RECORD --> DC["Domain Catalog grows"]
    PB -.->|"Next bug"| PC
    DC -.->|"Next bug"| PC
    PB -.->|"Next deploy"| PDS
    DC -.->|"Next deploy"| PDS
    style BUG fill:#ff6b6b,stroke:#333,color:#fff
    style DEPLOY fill:#fd9644,stroke:#333,color:#fff
    style PC fill:#4ecdc4,stroke:#333,color:#fff
    style PDS fill:#4ecdc4,stroke:#333,color:#fff
    style STOP fill:#ff6b6b,stroke:#333,color:#fff
    style FIX fill:#95e77e,stroke:#333,color:#000
    style RECORD fill:#a29bfe,stroke:#333,color:#fff
    style PB fill:#74b9ff,stroke:#333,color:#000
    style DC fill:#74b9ff,stroke:#333,color:#000
    style SHIP fill:#95e77e,stroke:#333,color:#000
```
Six components that compound over time, plus a runtime bridge that makes debuggers smart:
| Component | What It Does | How It Helps |
|---|---|---|
| Pattern Bank (P01-P22) | Generalized root cause patterns with debugger strategies | 30-second match + targeted breakpoints |
| Symptom Classifier | Keyword-driven symptom → pattern lookup | Structured hypothesis ranking before touching code |
| Debug Subagent Protocol | Pattern-guided runtime debugging via PDB/JDB | 2-4 targeted breakpoints instead of 15+ blind ones |
| Domain Catalogs | Bugs organized by subsystem | Search by symptom type, not by date |
| Feedback Rules | User corrections → enforceable rules | Agent adapts to YOUR working style |
| Pre-Deploy Scanner | Scans git diff against pattern keywords before shipping | Catches known failure classes before they reach production |
```
Layer 3: KNOWLEDGE  ← Debug Bank (patterns, protocol, memory, classifier)
Layer 2: RUNTIME    ← Debug Subagent Protocol (breakpoints, variables, call stacks)
Layer 1: STATIC     ← Most agents today (grep, read, guess, retry)
```
Most coding agents are stuck at Layer 1. Tools like Debug2Fix move them to Layer 2 — but their debug subagent starts from scratch every time. Debug Bank bridges Layer 2 and Layer 3: when your agent matches a pattern, it gets targeted breakpoints and watch expressions from the pattern's debugger strategy, not a blind stepping session. The result: fewer steps to diagnosis, higher-quality fixes from canonical solutions, and graceful fallback to exploratory mode for novel bugs.
AI coding agents are expensive debugging partners:
- They re-investigate bugs they've seen before — from scratch, every time
- They circle through 5+ failed attempts before finding root causes
- They can't learn from corrections — "I told you this yesterday" doesn't stick
- They have no pattern recognition — a P08 (Config Chain Gap) looks brand new every time
Stack Overflow data: AI-generated code has 2.66x more formatting problems and 1.5-2x more security bugs than human code. Much of this comes from agents not learning from past failures.
Google's ReasoningBank research showed that distilling failures into reusable patterns yields +8.3% on WebArena and +4.6% on SWE-Bench. Debug Bank is a production-ready implementation of that concept.
```bash
# Claude Code
curl -O https://raw.githubusercontent.com/soleimanmansouri/debug-bank/main/CLAUDE.md
cp -r skills/debug-trajectory ~/.claude/skills/
cp -r skills/pattern-check ~/.claude/skills/

# Cross-agent (Codex, Gemini CLI)
cp AGENTS.md /path/to/your/project/
cp -r patterns/ /path/to/your/project/patterns/

# Cursor
cat CLAUDE.md >> /path/to/your/project/.cursorrules
```

Works in 30 seconds. No dependencies. No infrastructure. Just markdown files your agent reads.
The single most impactful rule in this repo:
> If 3 rounds of iterative fixing show no progress: STOP. Re-plan from scratch, add logging, or switch strategy entirely.
This prevents the #1 failure mode of AI agents — circular debugging that wastes tokens and produces nothing. After switching strategy, the counter resets.
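The counter logic behind the rule can be sketched as follows (the class and method names here are illustrative, not part of the repo):

```python
class ThreeExchangeRule:
    """Stop iterative fixing after 3 failed attempts; reset on strategy switch."""

    LIMIT = 3

    def __init__(self) -> None:
        self.failures = 0

    def record_failure(self) -> bool:
        """Record one failed fix attempt. Returns True when it is time to stop."""
        self.failures += 1
        return self.failures >= self.LIMIT

    def switch_strategy(self) -> None:
        # Re-planning or switching strategy resets the counter.
        self.failures = 0
```

The key design point is the reset: stopping is not permanent, it just forces a deliberate change of approach before further attempts.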
Before a bug ships is the cheapest time to catch it. The pre-deploy scanner checks your git diff against the keywords for all 22 patterns and flags any matches before you deploy.
What it does:
- Greps the staged diff for keywords linked to each pattern (e.g., `observer`, `subscribe`, `multiple writers`, `fallback`, `retry`)
- Prints a ranked list of flagged patterns with their quick-check
- Exits non-zero when matches are found, so it can block a deploy pipeline
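At its core, the scan is a keyword match over added diff lines. A minimal Python sketch of that logic (the keyword map below is a tiny illustrative subset; the actual implementation is the bash script `integrations/pre-deploy-check.sh`):

```python
import subprocess

# Illustrative subset only; the real keyword lists live in the pattern files.
PATTERN_KEYWORDS = {
    "P03": ("Observer/Hook Multiplier", ["observer", "subscribe"]),
    "P08": ("Config Resolution Chain Gap", ["fallback"]),
}


def staged_diff() -> str:
    """Read the staged diff, as the real scanner does."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout


def scan_diff(diff_text: str) -> list[str]:
    """Return IDs of patterns whose keywords appear on added lines."""
    added = [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")  # skip file headers
    ]
    return [
        pid
        for pid, (_name, keywords) in PATTERN_KEYWORDS.items()
        if any(kw in line for line in added for kw in keywords)
    ]
```

Only added lines are scanned, so deleting code that mentions `subscribe` does not trigger a false flag.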
Run it manually:
```bash
bash integrations/pre-deploy-check.sh
```

Hook it into Claude Code so that it runs automatically before every deploy action. See the full setup guide: `integrations/claude-code-pre-deploy.md`
Example output:
```
[debug-bank] Pre-Deploy Pattern Scan
Scanning git diff for known failure patterns...

FLAGGED  P03  Observer/Hook Multiplier
         keyword: subscribe
         Check: Deduplicate by event/frame ID

FLAGGED  P08  Config Resolution Chain Gap
         keyword: fallback
         Check: Trace the full fallback chain

2 pattern(s) flagged. Review before deploying.
Exit code: 1
```
No flagged patterns means a clean scan — the script exits 0 and the deploy proceeds.
Before the 7-step protocol begins, run the symptom through the Symptom Classifier. It maps keywords to pattern IDs with confidence scoring:
```
INPUT:  "The greeting plays twice on every call"
OUTPUT: Primary:   P03 (Observer Multiplier) — HIGH — 3/3 checklist
        Secondary: P01 (Wrapper Defaults)    — MEDIUM — 1/3 checklist
        → Debugger: break on observer callback, watch frame.id hit count
```
The classifier covers 25+ symptom signals, 5 compound pattern triggers, and outputs targeted breakpoints when a pattern's debugger strategy is available. See the full keyword index and usage protocol in classifier/symptom-classifier.md.
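The matching step can be sketched as keyword counting with a confidence threshold (the signal table and thresholds below are illustrative assumptions; the real index lives in `classifier/symptom-classifier.md`):

```python
# Illustrative subset of the signal index.
SIGNALS = {
    "P03": {"name": "Observer Multiplier", "keywords": ["twice", "duplicate", "double"]},
    "P01": {"name": "Wrapper Defaults", "keywords": ["default", "wrapper"]},
}


def classify(symptom: str) -> list[tuple[str, str, int]]:
    """Rank candidate patterns by keyword hits in the symptom description."""
    text = symptom.lower()
    ranked = [
        # Confidence thresholds here are a sketch, not the repo's scoring.
        (pid, "HIGH" if hits >= 2 else "MEDIUM", hits)
        for pid, info in SIGNALS.items()
        if (hits := sum(kw in text for kw in info["keywords"]))
    ]
    ranked.sort(key=lambda r: -r[2])
    return ranked
```

Ranking rather than picking a single winner matters: the protocol wants 2-3 hypotheses ordered by confidence, not one guess.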
When your agent has access to a runtime debugger (PDB, JDB), the Debug Subagent Protocol defines how a main agent delegates targeted investigations to a specialized debug subagent.
Unlike brute-force approaches like Debug2Fix (which explore from scratch), this protocol feeds the subagent pattern-specific breakpoints:
| Approach | Starting Knowledge | Avg Breakpoints | Steps to Diagnosis |
|---|---|---|---|
| Debug2Fix (brute-force) | None | 8-15 | 15-25 |
| Debug Bank v3 (pattern-guided) | Pattern match + debugger strategy | 2-4 | 5-12 |
Three delegation modes based on classifier confidence:
- High confidence: "Confirm P02. Set breakpoints on `context_manager.save` and `observer.save_transcript_turn`. Watch `inspect.stack()` at each write."
- Low confidence: "Investigate whether P08 applies. Break at each config resolution level, report which source provides the value."
- No match: Exploratory mode — inspect locals at error site, walk the call stack.
Full spec with typed tool signatures, evidence format, and integration points: protocol/debug-subagent.md.
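One way to picture the high-confidence mode: a pattern's debugger strategy becomes a concrete PDB command list the subagent runs, instead of blind stepping. The field names and file locations below are hypothetical, not the spec's actual schema:

```python
# Hypothetical debugger-strategy record for a P02 (Multiple Writers) match.
STRATEGY = {
    "pattern": "P02",
    "breakpoints": ["context_manager.py:42", "observer.py:87"],
    "watch": ["inspect.stack()[1].function"],
}


def to_pdb_commands(strategy: dict) -> list[str]:
    """Turn a pattern's debugger strategy into targeted pdb commands."""
    cmds = [f"break {bp}" for bp in strategy["breakpoints"]]
    # pdb's `display` re-evaluates the expression at every stop,
    # so each write site reports who is calling it.
    cmds += [f"display {expr}" for expr in strategy["watch"]]
    cmds.append("continue")
    return cmds
```

Two breakpoints plus a watch expression is exactly the 2-4 breakpoint budget the comparison table above describes.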
Each pattern has: description, 30-second checklist, real-world examples, fix strategy, prevention guide, and debugger strategy (targeted breakpoints, watch expressions, isolation technique for PDB/JDB).
| ID | Pattern | Quick Check |
|---|---|---|
| P01 | Wrapper/Decorator Default Mismatch | Audit ALL parent class defaults when wrapping |
| P03 | Observer/Hook Multiplier | Deduplicate by event/frame ID |
| P05 | Context-Dependent Flag Duality | Check if any context needs the opposite value |
| P20 | Filler/Background Audio Pipeline Contention | Ensure only one source writes to the audio pipeline at a time |
| P21 | Untested Handler Path After Shared Code Change | Test ALL handlers in files you changed, not just the one you edited |
| ID | Pattern | Quick Check |
|---|---|---|
| P02 | Multiple Write Sources → Corruption | Grep for ALL writes to the same target |
| P09 | Auto-Apply Pipeline Writing Feedback as Data | Validate payload matches target field structure |
| ID | Pattern | Quick Check |
|---|---|---|
| P07 | Stale/Dead Config | Trace where runtime actually reads from |
| P08 | Config Resolution Chain Gap | Trace the full fallback chain |
| P10 | Contradictory Multi-Source Config | Validate ALL sibling fields match provider |
| ID | Pattern | Quick Check |
|---|---|---|
| P06 | Dependency Resolution Cascade | Check lock file after adding any dependency |
| ID | Pattern | Quick Check |
|---|---|---|
| P11 | Credential Expression Scope Limitation | Test credential expressions with echo/log |
| P12 | Expression Engine Corrupts Non-JSON Bodies | Use JSON-based APIs in workflow engines |
| P13 | Parse Code Matches Errors as Success | Check for error indicators BEFORE extracting data |
| P14 | Expression Evaluation Requires Prefix | Add prefix if template renders as literal |
| P15 | Multi-Output Node Rejects Valid Returns | Use parallel single-output nodes |
| P16 | Binary Data Is Reference-Based | Use helper methods to read actual data |
| ID | Pattern | Quick Check |
|---|---|---|
| P04 | LLM Copies Example Text as Behavior | No action-like text in prompts |
| P17 | Model Speaks Everything in Context | Keep speakable text out of conversation history |
| P18 | Model Loops Without Stop Signal | Set precise timeouts, add idempotency guards |
| P19 | Prompt Engineering Has Hard Limits | Switch to code-level after 2 failed prompt fixes |
| P22 | Iterative Fix Regression (Failswitch) | STOP after 2 failed fixes — deep analyze before attempt 3 |
Single-file bugs are for practice. Real production bugs span services, databases, and timing boundaries. The scenarios/ directory contains self-contained L3-L4 debugging environments where the symptom is in one place and the root cause is somewhere else entirely.
| # | Name | Tier | Patterns | Key Challenge |
|---|---|---|---|---|
| S01 | Stale Cache Race | L4 | P02 + P08 | Cache invalidation arrives after consumer reads stale data |
| S02 | Retry Storm Amplification | L4 | P06 + P03 | Library upgrade changes retry defaults, cascading across services |
| S03 | Silent Schema Drift | L3 | P07 + P02 + P13 | Migration runs but service reads stale schema cache |
Each scenario includes: system architecture, red herrings, full investigation path, solution, and blast-radius analysis. See scenarios/README.md for the full guide.
Anonymized postmortem reports from real incidents. Each goes beyond "what broke" to cover timeline, false leads, blast radius, and — most importantly — systemic mitigation that prevents the entire CLASS of incident.
| # | Title | Duration | Impact | Patterns |
|---|---|---|---|---|
| PM01 | The Invisible Throttle | 4.5 hours | 12% of requests silently degraded | P07 + P13 |
| PM02 | Midnight Migration | 2 hours | Full outage + 30 min data loss | P02 + P08 |
| PM03 | The Helpful Retry | 35 minutes | $23K in duplicate charges | P06 + P03 |
See postmortems/README.md for the template and writing guide.
Real bugs rarely match a single pattern. Compositions document common pattern pairings, why they amplify each other, and how to detect the combination.
| ID | Composition | Patterns | Signal |
|---|---|---|---|
| C01 | Write Race + Stale Fallback | P02 + P08 | Intermittent stale data that self-heals then re-breaks |
| C02 | Upgrade Cascade + Retry Multiplier | P06 + P03 | Traffic amplification after dependency update |
| C03 | Silent Success + Stale Config | P13 + P07 | Wrong results, no errors, 100% "success" rate |
| C04 | LLM Hallucination + Missing Stop | P04 + P18 | AI agent loops wrong behavior confidently |
| C05 | Prompt Limits + Flag Duality | P19 + P05 | Prompt fix breaks opposite context |
See compositions/README.md for investigation strategies.
The protocol scales with the bug's scope. Use the Difficulty Tiers guide to right-size your investigation:
| Tier | Scope | Time Budget | Example |
|---|---|---|---|
| L1 | Single file | 5-30 min | Off-by-one, wrong variable, missing null check |
| L2 | Multi-file, single service | 30 min - 2 hours | Controller returns wrong data due to service layer bug |
| L3 | Multi-service | 2-8 hours | Service A writes correctly, service B reads stale cache |
| L4 | Distributed / timing | 4 hours - 2 days | Cache invalidation race, retry storm, eventual consistency violation |
When you correct your agent, the correction becomes a persistent rule:
```markdown
---
name: no-mocking-database
type: feedback
---
Integration tests must hit a real database, not mocks.

**Why:** Prior incident where mock/prod divergence masked a broken migration.

**How to apply:** Any test file touching database operations.
```

The **Why** lets the agent judge edge cases instead of blindly following rules. After 30+ rules, the agent rarely needs the same correction twice.
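Because the rule files are plain markdown with `---` frontmatter, loading them needs no special tooling. A minimal sketch, assuming the simple `key: value` frontmatter shown above (the function name is illustrative):

```python
def parse_rule(text: str) -> dict:
    """Split '---' frontmatter from a feedback rule's body (no YAML library)."""
    _, front, body = text.split("---", 2)
    meta = dict(line.split(":", 1) for line in front.strip().splitlines())
    return {k.strip(): v.strip() for k, v in meta.items()} | {"body": body.strip()}
```

A real loader would tolerate nested YAML and missing fields; this sketch only shows that the format is trivially machine-readable.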
```
debug-bank/
├── CLAUDE.md                     # Drop-in for Claude Code
├── AGENTS.md                     # Cross-agent (Codex, Gemini CLI, Cursor)
├── protocol/
│   ├── debug-trajectory.md       # The 7-step protocol
│   ├── debug-subagent.md         # v3: Pattern-guided debug subagent spec
│   ├── 3-exchange-rule.md        # When to stop and re-plan
│   ├── difficulty-tiers.md       # L1-L4 scale selector
│   └── feedback-capture.md       # Corrections → persistent rules
├── classifier/
│   └── symptom-classifier.md     # v3: Symptom → pattern matcher with confidence scoring
├── patterns/
│   ├── P01 through P22           # 22 patterns, each with debugger strategy
│   └── TEMPLATE.md               # Add your own (includes debugger_strategy section)
├── compositions/                 # Common pattern combinations
│   ├── C01 through C05           # 5 documented compositions
│   └── README.md
├── scenarios/                    # Multi-service debugging challenges
│   ├── S01 through S03           # L3-L4 scenarios with full solutions
│   ├── TEMPLATE.md
│   └── README.md
├── postmortems/                  # Anonymized production incidents
│   ├── PM01 through PM03         # With blast radius + systemic mitigation
│   ├── TEMPLATE.md
│   └── README.md
├── memory/
│   ├── schema.md                 # Memory file format
│   ├── feedback-rules.md         # Behavioral rule structure
│   └── domain-catalogs.md        # Organizing bugs by subsystem
├── skills/
│   ├── debug-trajectory/SKILL.md # Claude Code skill
│   └── pattern-check/SKILL.md    # Pre-investigation scan
├── examples/                     # 20 real bug trajectories
│   ├── voice-pipeline/
│   ├── api-integration/
│   └── config-management/
└── integrations/                 # Setup guides per agent
    ├── claude-code.md
    ├── codex-cli.md
    ├── gemini-cli.md
    ├── cursor.md
    ├── pre-deploy-check.sh       # Bash scanner: git diff → pattern keywords
    └── claude-code-pre-deploy.md # Claude Code hook integration guide
```
Compound learning — Every bug fix teaches the system. After 50 bugs, most issues resolve at Step 1 (pattern match).
Transfers across projects — P02 (Multiple Writers) and P08 (Config Chain Gap) appear in web apps, APIs, pipelines, and infrastructure. The pattern bank moves with you.
User-driven self-improvement — Feedback rules capture corrections with WHY context. The agent gets better at matching your expectations over time.
Evidence-based — Every pattern has a checklist. Every catalog entry links to a pattern ID. Nothing is "just trust me."
| Research | Contribution | How Debug Bank Uses It |
|---|---|---|
| Google ReasoningBank (2025) | Distilling reasoning from failures yields +8.3% WebArena, +4.6% SWE-Bench | Pattern bank + domain catalogs = production implementation of this concept |
| AgentDebug (ICLR 2026) | Agent Error Taxonomy across 5 failure categories, +24% all-correct accuracy | P01-P22 categories map to and extend the taxonomy |
| Debug2Fix (2026) | Subagent debugger architecture, +12-22% fix rate via PDB/JDB | Debug subagent protocol adds pattern-guided breakpoints to this architecture |
| debug-gym (2025) | Text-based interactive debugging environment for LLM agents | Debugger strategy fields designed to be compatible with debug-gym tool interface |
| Trajectory-based learning | Searchable, pattern-linked debug entries | Every recorded trajectory feeds the classifier and grows the pattern bank |
Add patterns: Copy patterns/TEMPLATE.md, assign the next P-number, submit a PR with a real-world example.
Add domain catalogs: Create a directory under examples/ with bug entries following memory/domain-catalogs.md.
Share feedback rules: The best rules include a clear Why that helps the agent judge edge cases.
Built from months of production debugging across diverse software systems. Battle-tested on 100+ real bugs before being open-sourced.
Created by Soleiman Mansouri.