Code Agent with Persistent Codebase Memory

A coding agent that builds and queries a structured, persistent memory of a codebase instead of re-reading files at every step. It indexes file purposes, symbol signatures, cross-file dependencies, and design decisions into a SQLite store, then queries that store before touching the filesystem. Stale entries are detected via content hashing and invalidated automatically.

The result: fewer redundant file reads, consistent decisions across long multi-step tasks, and an explicit trace proving both.

Architecture

The system has five components, each with a single responsibility:

Component	File	Role
Executor	`src/executor.py`	Orchestrates the per-step loop: decompose task → query memory → check staleness → read only if needed → act → log decisions
Memory Store	`src/memory_store.py`	SQLite-backed persistence with three tables (`files`, `symbols`, `decisions`), WAL journaling, indexed symbol lookups, and foreign-key cascades
Query Interface	`src/query.py`	Two-tier retrieval: fast path-and-symbol matching first, LLM-based semantic ranking only when the fast path finds too little. Staleness is a hard gate — stale entries are never returned
Staleness Checker	`src/staleness.py`	SHA-256 content hashing. Compares stored hash against current file on disk. Enforced as a non-bypassable gate inside the query path
Indexer	`src/indexer.py`	Extracts structured facts from source files via LLM. Uses file-type-aware prompts (code vs config vs test) because a single generic prompt produces inconsistent results
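
The two-tier retrieval described for the Query Interface can be sketched roughly as follows. This is an illustrative sketch, not the code in `src/query.py`: the function names, the shape of `index` (file path mapped to symbol names), and the `is_stale` and `semantic_rank` callables are all assumptions. The threshold of 2 is the one quoted in the design decisions.

```python
from typing import Callable

# Assumed constant: the LLM tier fires when the fast path finds fewer than 2 results.
MIN_FAST_RESULTS = 2


def fast_match(task: str, index: dict[str, list[str]]) -> list[str]:
    """Tier 1: files whose basename or symbol names appear in the task text.

    Naive substring matching, chosen only to keep the sketch short.
    """
    text = task.lower()
    hits = []
    for path, symbols in index.items():
        basename = path.rsplit("/", 1)[-1].lower()
        if basename in text or any(s.lower() in text for s in symbols):
            hits.append(path)
    return hits


def query(
    task: str,
    index: dict[str, list[str]],
    is_stale: Callable[[str], bool],
    semantic_rank: Callable[[str, list[str]], list[str]],
) -> list[str]:
    """Return fresh candidate files for a task, cheapest tier first."""
    candidates = fast_match(task, index)
    if len(candidates) < MIN_FAST_RESULTS:
        # Tier 2: fall back to the (expensive) semantic ranker.
        for path in semantic_rank(task, list(index)):
            if path not in candidates:
                candidates.append(path)
    # Hard gate: stale entries are dropped no matter which tier found them.
    return [p for p in candidates if not is_stale(p)]
```

Applying the staleness filter as the last statement of `query` is what makes it a gate rather than a convention: a caller cannot obtain results that skipped the check.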

Data flow per step

┌─────────────────────────────────────────────────────────────────┐
│  For each step in the decomposed task:                         │
│                                                                 │
│  1. Query memory for files/symbols relevant to this step        │
│  2. Staleness gate: check content hash for every match          │
│     ├─ Fresh → serve from memory (no file I/O)                  │
│     └─ Stale → invalidate, add to "needs fresh read" set        │
│  3. Read only files that memory couldn't cover                  │
│  4. Index freshly read files into SQLite for future steps       │
│  5. Execute the step via LLM, with prior decisions as context   │
│  6. If a naming/pattern decision was made, log it to the        │
│     decision table so later steps follow it consistently        │
└─────────────────────────────────────────────────────────────────┘
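
Steps 1 to 4 of this loop can be sketched as below. It is a simplified model under stated assumptions, not the project's executor: memory is a plain dict of path to content hash, `current_hash` and `read_and_index` are hypothetical callables, and the LLM execution and decision logging of steps 5 and 6 are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    served_from_memory: list[str] = field(default_factory=list)
    fresh_reads: list[str] = field(default_factory=list)


def run_step(
    relevant: list[str],
    memory: dict[str, str],
    current_hash: Callable[[str], str],
    read_and_index: Callable[[str], str],
) -> StepResult:
    """Serve fresh entries from memory; re-read and re-index everything else."""
    result = StepResult()
    for path in relevant:
        stored = memory.get(path)
        if stored is not None and stored == current_hash(path):
            # Fresh: the stored facts can be used as-is.
            result.served_from_memory.append(path)
        else:
            # Missing or stale: invalidate, then read and index again.
            memory.pop(path, None)
            memory[path] = read_and_index(path)
            result.fresh_reads.append(path)
    return result
```

Running the same step twice against an unchanged file yields one fresh read followed by one memory hit, which is the pattern the per-step trace in the results reports.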

SQLite schema

files     (file_path PK, content_hash, purpose, depends_on JSON, indexed_at)
symbols   (id PK, file_path FK→files CASCADE, name, signature, line, kind)
decisions (id PK, decision, reasoning, step_number, related_files JSON, timestamp)

-- Indexed for fast lookups
CREATE INDEX idx_symbols_name      ON symbols(name);
CREATE INDEX idx_symbols_file_path ON symbols(file_path);
CREATE INDEX idx_decisions_step    ON decisions(step_number);
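
The shorthand above could be expanded into concrete DDL along these lines. Column types and `NOT NULL` constraints are assumptions, as is the helper name `open_store`; only the table, column, and index names come from the schema listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    file_path    TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,
    purpose      TEXT,
    depends_on   TEXT,      -- JSON-encoded list of paths
    indexed_at   TEXT
);
CREATE TABLE IF NOT EXISTS symbols (
    id        INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL REFERENCES files(file_path) ON DELETE CASCADE,
    name      TEXT NOT NULL,
    signature TEXT,
    line      INTEGER,
    kind      TEXT
);
CREATE TABLE IF NOT EXISTS decisions (
    id            INTEGER PRIMARY KEY,
    decision      TEXT NOT NULL,
    reasoning     TEXT,
    step_number   INTEGER,
    related_files TEXT,     -- JSON-encoded list of paths
    timestamp     TEXT
);
CREATE INDEX IF NOT EXISTS idx_symbols_name      ON symbols(name);
CREATE INDEX IF NOT EXISTS idx_symbols_file_path ON symbols(file_path);
CREATE INDEX IF NOT EXISTS idx_decisions_step    ON decisions(step_number);
"""


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) the store with WAL journaling and foreign keys enabled."""
    conn = sqlite3.connect(db_path)
    # WAL applies to on-disk databases; an in-memory database ignores it.
    conn.execute("PRAGMA journal_mode=WAL")
    # SQLite leaves foreign-key enforcement off unless each connection enables it.
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(SCHEMA)
    return conn
```

With `foreign_keys` enabled, deleting a row from `files` also removes its rows in `symbols`, which is the cascade the staleness invalidation relies on. Without that pragma the `ON DELETE CASCADE` clause is silently ignored.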

What a memory entry looks like

{
  "file_path": "src/auth/session.py",
  "content_hash": "a3f9...",
  "purpose": "Manages user session creation, validation, and expiry",
  "key_symbols": [
    {"name": "create_session", "signature": "create_session(user_id: str) -> Session", "line": 24, "kind": "function"},
    {"name": "validate_session", "signature": "validate_session(token: str) -> bool", "line": 41, "kind": "function"}
  ],
  "depends_on": ["src/auth/tokens.py", "src/db/models.py"],
  "indexed_at": "2026-06-22T18:45:11Z"
}

Results

Evaluated on encode/httpx (23 source files), a production HTTP client library.

Task: Add a request_id parameter to all public API functions and propagate it through the Client and AsyncClient call chains for distributed tracing, following a consistent naming convention.

Read efficiency

Metric	Memory Agent	Naive Baseline
File reads	4	7
Memory queries	8	0
Memory hits	3	0
Reads avoided	3	—
Reduction	42.9%	—

Decision consistency

The agent logged a naming convention (request_id: str | None = None) at step 1. Across the remaining 7 steps, it produced 19 explicit decision-reuse events — each one a later step retrieving and following the convention from the decision table rather than re-deciding it independently.

DECIDED  step 1: parameter naming and default value convention
DECIDED  step 2: request_id parameter naming and default value convention

REUSED   step 3: followed → "request_id: default None, backward compatible"
REUSED   step 5: followed → "request_id: default None, backward compatible"
REUSED   step 6: followed → "request_id: default None, backward compatible"
REUSED   step 7: followed → "request_id: default None, backward compatible"
REUSED   step 8: followed → "request_id: default None, backward compatible"
  ... (19 total reuse events across steps 2-8)
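
A decision log that supports this kind of reuse might look like the sketch below. The function names are hypothetical and the column types are assumed; the idea is only that each decision is stored with its step number so a later step can retrieve everything decided before it.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_decision_log(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the decisions table if needed and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS decisions (
               id            INTEGER PRIMARY KEY,
               decision      TEXT NOT NULL,
               reasoning     TEXT,
               step_number   INTEGER NOT NULL,
               related_files TEXT,
               timestamp     TEXT
           )"""
    )
    return conn


def log_decision(conn, step: int, decision: str, reasoning: str, related_files: list[str]) -> None:
    """Record a decision together with the step that made it."""
    conn.execute(
        "INSERT INTO decisions (decision, reasoning, step_number, related_files, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (decision, reasoning, step, json.dumps(related_files),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def decisions_before(conn, step: int) -> list[tuple[int, str]]:
    """Return (step_number, decision) pairs logged by earlier steps, oldest first."""
    rows = conn.execute(
        "SELECT step_number, decision FROM decisions "
        "WHERE step_number < ? ORDER BY step_number, id",
        (step,),
    )
    return rows.fetchall()
```

In this sketch, each pair returned by `decisions_before` and passed into a step's prompt would count as one reuse event, which is one plausible way the counts in the trace could grow as more decisions accumulate.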

Per-step trace

Step	Description	Memory	Fresh Reads	Decisions Reused
1	Review public API in `_api.py`	MISS	1	—
2	Establish naming convention	MISS	0	1
3	Modify `_api.py` functions	HIT	0	2
4	Review `Client`/`AsyncClient` in `_client.py`	MISS	1	2
5	Modify `Client` class	HIT	0	3
6	Modify `AsyncClient` class	HIT	0	4
7	Validate parameter propagation	MISS	0	5
8	Update documentation	MISS	2	2

Steps 3, 5, and 6 were memory hits and required zero file reads: the agent served everything from its indexed memory of _api.py and _client.py. Steps 1 and 4 were the cold reads of those two files; every subsequent reference to them was a memory hit.
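
The headline numbers can be re-derived from the per-step trace. The snippet below transcribes the table (treating the "—" in step 1 as zero, which is an assumption) and recomputes the totals; `summarize` is a hypothetical helper, not part of the project.

```python
# (step, memory result, fresh reads, decisions reused), transcribed from the trace table.
TRACE = [
    (1, "MISS", 1, 0),
    (2, "MISS", 0, 1),
    (3, "HIT", 0, 2),
    (4, "MISS", 1, 2),
    (5, "HIT", 0, 3),
    (6, "HIT", 0, 4),
    (7, "MISS", 0, 5),
    (8, "MISS", 2, 2),
]
NAIVE_READS = 7  # file reads by the naive baseline, from the read-efficiency table


def summarize(trace, naive_reads):
    """Aggregate the per-step trace into the read-efficiency metrics."""
    fresh_reads = sum(row[2] for row in trace)
    reads_avoided = naive_reads - fresh_reads
    return {
        "file_reads": fresh_reads,
        "memory_queries": len(trace),
        "memory_hits": sum(1 for row in trace if row[1] == "HIT"),
        "reuse_events": sum(row[3] for row in trace),
        "reads_avoided": reads_avoided,
        "reduction": reads_avoided / naive_reads,
    }


summary = summarize(TRACE, NAIVE_READS)
print(f"{summary['reduction']:.1%}")  # 42.9%
```

The reduction is reads avoided divided by baseline reads, 3 / 7, which rounds to the 42.9% reported above.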

Quickstart

# Clone the repository first, then set up
cd code_memory_agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # add your OPENROUTER_API_KEY

# Run against any codebase
python -m code_memory_agent.run \
  --codebase /path/to/repo \
  --task "rename all helper functions to snake_case consistently" \
  -v

# Run tests
pytest tests/ -v

CLI options

Flag	Description
`--codebase`	Path to the target repository root
`--task`	Natural language description of the multi-step task
`--memory-file`	Path to SQLite DB (default: `<codebase>/.code_memory.db`)
`--output`	Path to save the JSON efficiency report
`-v`	Verbose logging — shows per-step memory/staleness decisions
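
For orientation, the flags above map onto an `argparse` definition roughly like the following. This is a sketch, not the contents of `run.py`: which flags are required, the `verbose` destination name, and the helper names are assumptions. The default database location follows the table.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="code_memory_agent.run")
    parser.add_argument("--codebase", type=Path, required=True,
                        help="Path to the target repository root")
    parser.add_argument("--task", required=True,
                        help="Natural language description of the multi-step task")
    parser.add_argument("--memory-file", type=Path, default=None,
                        help="Path to SQLite DB (default: <codebase>/.code_memory.db)")
    parser.add_argument("--output", type=Path, default=None,
                        help="Path to save the JSON efficiency report")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="Verbose logging")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and resolve the default memory file inside the codebase."""
    args = build_parser().parse_args(argv)
    if args.memory_file is None:
        args.memory_file = args.codebase / ".code_memory.db"
    return args
```

Resolving the default after parsing is needed because it depends on another argument's value, which `argparse` defaults cannot express directly.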

Tests

17 unit tests covering the staleness and storage layer:

test_staleness.py::TestComputeHash::test_same_content_same_hash           PASSED
test_staleness.py::TestComputeHash::test_different_content_different_hash  PASSED
test_staleness.py::TestComputeHash::test_nonexistent_file_returns_empty   PASSED
test_staleness.py::TestIsStaleFresh::test_freshly_indexed_file_is_not_stale PASSED
test_staleness.py::TestIsStaleModified::test_modified_file_is_stale       PASSED
test_staleness.py::TestIsStaleModified::test_unmodified_sibling_stays_fresh PASSED
test_staleness.py::TestIsStaleEdgeCases::test_never_indexed_file_is_stale PASSED
test_staleness.py::TestIsStaleEdgeCases::test_deleted_file_is_stale       PASSED
test_staleness.py::TestInvalidation::test_invalidation_removes_only_stale_file PASSED
test_staleness.py::TestInvalidation::test_invalidation_of_fresh_file_does_nothing PASSED
test_staleness.py::TestInvalidation::test_invalidation_clears_symbols     PASSED
test_staleness.py::TestBatchStaleness::test_batch_check                   PASSED
test_staleness.py::TestMemoryStoreSQL::test_set_and_get_file              PASSED
test_staleness.py::TestMemoryStoreSQL::test_overwrite_file_replaces_symbols PASSED
test_staleness.py::TestMemoryStoreSQL::test_decision_log                  PASSED
test_staleness.py::TestMemoryStoreSQL::test_summary_stats                 PASSED
test_staleness.py::TestMemoryStoreSQL::test_persistence_across_reopen     PASSED

Key properties tested:

Isolation: modifying file A does not invalidate file B's memory
Cascade: invalidating a file removes its symbol index entries
Persistence: data survives store close and reopen
Correctness: deleted files, never-indexed files, and modified files are all correctly identified as stale
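
A checker with these properties could be as small as the sketch below. It is not the project's `src/staleness.py`: the names `compute_hash` and `is_stale` are inferred from the test names, and the `stored_hashes` dict stands in for the `content_hash` column of the `files` table.

```python
import hashlib
from pathlib import Path


def compute_hash(path) -> str:
    """SHA-256 hex digest of the file's bytes, or "" if the file does not exist.

    Hashing reads the bytes from disk but never passes them to the LLM.
    """
    p = Path(path)
    if not p.is_file():
        return ""
    return hashlib.sha256(p.read_bytes()).hexdigest()


def is_stale(path, stored_hashes: dict[str, str]) -> bool:
    """True if the file was never indexed, was deleted, or has changed."""
    stored = stored_hashes.get(str(path))
    if stored is None:
        return True  # never indexed
    current = compute_hash(path)
    # An empty hash means the file is gone; any mismatch means it was modified.
    return current == "" or current != stored
```

Because each file is compared only against its own stored hash, modifying one file cannot affect the verdict for another, which is the isolation property listed above.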

Project structure

code_memory_agent/
├── src/
│   ├── memory_store.py    # SQLite schema, CRUD, symbol index, decision log
│   ├── indexer.py         # LLM-powered structured extraction (code/config/test)
│   ├── staleness.py       # SHA-256 content-hash invalidation
│   ├── query.py           # Two-tier memory lookup with staleness hard gate
│   ├── executor.py        # Task decomposition and per-step orchestration
│   └── llm_client.py      # OpenRouter client with retry logic
├── benchmark/
│   └── efficiency_report.py  # Naive-baseline comparison, rich terminal output
├── tests/
│   └── test_staleness.py  # 17 unit tests for staleness + storage correctness
├── results/
│   └── httpx_run.json     # Full trace from the httpx evaluation
├── assets/
│   └── architecture.svg   # System architecture diagram
├── run.py                 # CLI entry point
├── requirements.txt
└── .env.example

Design decisions

Decision	Rationale
SQLite over JSON files	ACID transactions, indexed symbol lookups via SQL, WAL for read concurrency, foreign-key cascades for cleanup — without requiring a running server
SHA-256 staleness as a hard gate	A memory system that serves stale facts as current is worse than no memory. The check lives inside the query path and cannot be bypassed by the executor
File-type-aware indexer prompts	Config files have settings, test files have test functions, code files have signatures. A single generic extraction prompt produces inconsistent results across these
Two-tier query (path match → LLM ranking)	Most lookups can be resolved by matching file paths or symbol names mentioned in the task description. LLM-based semantic ranking is expensive and only fires when the fast path finds fewer than 2 results
Executor plans but does not modify files	Keeps the benchmark clean (measuring reads, not write correctness) and the demo safe. The action trace describes exactly what changes would be made
Decision log with step attribution	Enables the consistency proof: step N logs a convention, step N+3 retrieves it and follows it. The trace captures both the logging and the reuse explicitly
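
As an illustration of the file-type-aware indexer row, the dispatch might look like this. The classification heuristics and the prompt wording are invented for the sketch; the source only states that code, config, and test files get different prompts.

```python
from pathlib import PurePosixPath

# Assumed heuristics; the real indexer may classify files differently.
CONFIG_SUFFIXES = {".toml", ".yaml", ".yml", ".json", ".ini", ".cfg"}
CONFIG_NAMES = {".env", ".env.example"}

# Placeholder prompt texts, one per file kind.
PROMPTS = {
    "code": "List the purpose, public symbols with signatures, and imports of this file.",
    "config": "List the purpose of this file and the settings it defines.",
    "test": "List the purpose of this file and the test functions it contains.",
}


def classify(path: str) -> str:
    """Return "test", "config", or "code" for a repository-relative path."""
    p = PurePosixPath(path)
    name = p.name
    if name.startswith("test_") or name.endswith("_test.py") or "tests" in p.parts:
        return "test"
    if p.suffix in CONFIG_SUFFIXES or name in CONFIG_NAMES:
        return "config"
    return "code"


def prompt_for(path: str) -> str:
    """Pick the extraction prompt that matches the file's kind."""
    return PROMPTS[classify(path)]
```

Checking for test files before config files means a JSON fixture under `tests/` is treated as test material, a choice made here only for simplicity.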

Tech stack

Python 3.13
SQLite with WAL journaling and foreign keys
OpenAI SDK via OpenRouter (model-agnostic — works with GPT-4o-mini, Claude, etc.)
Rich for terminal output
pytest for unit tests

Requirements

Python 3.11+
An OpenRouter API key (or any OpenAI-compatible endpoint)
No GPU required — this is an orchestration project using API calls
