oladri-renuka/code-memory-agent

Code Agent with Persistent Codebase Memory

A coding agent that builds and queries a structured, persistent memory of a codebase instead of re-reading files at every step. It indexes file purposes, symbol signatures, cross-file dependencies, and design decisions into a SQLite store, then queries that store before touching the filesystem. Stale entries are detected via content hashing and invalidated automatically.

The result: fewer redundant file reads and consistent decisions across long multi-step tasks, with an explicit trace that records both.

Architecture

Architecture diagram showing the five-component pipeline: Executor orchestrates Query Interface, Staleness Checker, Indexer, and SQLite Memory Store

The system has five components, each with a single responsibility:

| Component | File | Role |
|---|---|---|
| Executor | src/executor.py | Orchestrates the per-step loop: decompose task → query memory → check staleness → read only if needed → act → log decisions |
| Memory Store | src/memory_store.py | SQLite-backed persistence with three tables (files, symbols, decisions), WAL journaling, indexed symbol lookups, and foreign-key cascades |
| Query Interface | src/query.py | Two-tier retrieval: fast path-and-symbol matching first, LLM-based semantic ranking only when the fast path finds fewer than 2 results. Staleness is a hard gate: stale entries are never returned |
| Staleness Checker | src/staleness.py | SHA-256 content hashing. Compares the stored hash against the current file on disk. Enforced as a non-bypassable gate inside the query path |
| Indexer | src/indexer.py | Extracts structured facts from source files via LLM. Uses file-type-aware prompts (code vs config vs test) because a single generic prompt produces inconsistent results |
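
As a rough illustration of the Indexer's file-type-aware prompting, the sketch below picks a prompt by file type. The prompt texts, the `classify_file` and `prompt_for` helpers, and the classification rules are all assumptions made for this example; the actual prompts in src/indexer.py are not shown in this README.

```python
from pathlib import Path

# Hypothetical prompt templates, one per file type named in the table above.
PROMPTS = {
    "code": "List each function and class with its signature and line number.",
    "config": "List each setting with its key, value, and purpose.",
    "test": "List each test function and the behavior it covers.",
}

# Assumed set of suffixes treated as configuration files.
CONFIG_SUFFIXES = {".toml", ".yaml", ".yml", ".json", ".ini", ".cfg"}


def classify_file(path: str) -> str:
    """Classify a path as 'test', 'config', or 'code' using simple name rules."""
    p = Path(path)
    if p.name.startswith("test_") or p.name.endswith("_test.py") or "tests" in p.parts:
        return "test"
    if p.suffix.lower() in CONFIG_SUFFIXES:
        return "config"
    return "code"


def prompt_for(path: str) -> str:
    """Return the extraction prompt matching the file's type."""
    return PROMPTS[classify_file(path)]
```

Checking the test rule before the config rule means a fixture such as tests/data.json is indexed as a test artifact; whether the real indexer makes the same choice is not stated.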

Data flow per step

┌─────────────────────────────────────────────────────────────────┐
│  For each step in the decomposed task:                         │
│                                                                 │
│  1. Query memory for files/symbols relevant to this step        │
│  2. Staleness gate: check content hash for every match          │
│     ├─ Fresh → serve from memory (no file I/O)                  │
│     └─ Stale → invalidate, add to "needs fresh read" set        │
│  3. Read only files that memory couldn't cover                  │
│  4. Index freshly read files into SQLite for future steps       │
│  5. Execute the step via LLM, with prior decisions as context   │
│  6. If a naming/pattern decision was made, log it to the        │
│     decision table so later steps follow it consistently        │
└─────────────────────────────────────────────────────────────────┘
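
The loop above can be sketched in a few lines, with the LLM calls stubbed out. This is a minimal illustration, not the code in src/executor.py: `run_step`, the in-memory `memory` dict, and the placeholder "purpose" string are assumptions standing in for the SQLite store and the LLM indexer.

```python
import hashlib
from pathlib import Path


def compute_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; empty string if the file does not exist."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return ""


def run_step(candidates: list[Path], memory: dict[str, dict], reads: list[str]) -> dict[str, str]:
    """Build the context for one step, reading only files memory cannot serve."""
    context: dict[str, str] = {}
    for path in candidates:
        key = str(path)
        entry = memory.get(key)
        current = compute_hash(path)
        if entry is not None and current and entry["content_hash"] == current:
            context[key] = entry["purpose"]  # fresh: serve from memory, no read
            continue
        memory.pop(key, None)  # stale or never indexed: invalidate
        if not current:
            continue  # file was deleted, nothing to read
        text = path.read_text()  # fresh read
        reads.append(key)
        # Placeholder for the LLM indexer: store a trivial "purpose".
        memory[key] = {"content_hash": current, "purpose": f"{len(text.splitlines())} lines"}
        context[key] = memory[key]["purpose"]
    return context
```

Calling `run_step` twice on an unchanged file appends to `reads` only once; editing the file between calls forces a second read.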

SQLite schema

files     (file_path PK, content_hash, purpose, depends_on JSON, indexed_at)
symbols   (id PK, file_path FK→files CASCADE, name, signature, line, kind)
decisions (id PK, decision, reasoning, step_number, related_files JSON, timestamp)

-- Indexed for fast lookups
CREATE INDEX idx_symbols_name      ON symbols(name);
CREATE INDEX idx_symbols_file_path ON symbols(file_path);
CREATE INDEX idx_decisions_step    ON decisions(step_number);
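
For readers who want to try the schema, here is one way it could be written out with Python's built-in sqlite3 module. The column types, NOT NULL constraints, and the `open_store` helper are assumptions; the README only lists column names, keys, and indexes.

```python
import sqlite3

# Column types are assumed; names, keys, and indexes follow the summary above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    file_path    TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,
    purpose      TEXT,
    depends_on   TEXT,   -- JSON-encoded list of paths
    indexed_at   TEXT
);
CREATE TABLE IF NOT EXISTS symbols (
    id        INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL REFERENCES files(file_path) ON DELETE CASCADE,
    name      TEXT NOT NULL,
    signature TEXT,
    line      INTEGER,
    kind      TEXT
);
CREATE TABLE IF NOT EXISTS decisions (
    id            INTEGER PRIMARY KEY,
    decision      TEXT NOT NULL,
    reasoning     TEXT,
    step_number   INTEGER,
    related_files TEXT,  -- JSON-encoded list of paths
    timestamp     TEXT
);
CREATE INDEX IF NOT EXISTS idx_symbols_name      ON symbols(name);
CREATE INDEX IF NOT EXISTS idx_symbols_file_path ON symbols(file_path);
CREATE INDEX IF NOT EXISTS idx_decisions_step    ON decisions(step_number);
"""


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) the store with WAL journaling and FK enforcement."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # only takes effect for on-disk databases
    conn.execute("PRAGMA foreign_keys=ON")   # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```

The foreign_keys pragma is per connection, so without it the ON DELETE CASCADE on symbols would be silently ignored.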

What a memory entry looks like

{
  "file_path": "src/auth/session.py",
  "content_hash": "a3f9...",
  "purpose": "Manages user session creation, validation, and expiry",
  "key_symbols": [
    {"name": "create_session", "signature": "create_session(user_id: str) -> Session", "line": 24, "kind": "function"},
    {"name": "validate_session", "signature": "validate_session(token: str) -> bool", "line": 41, "kind": "function"}
  ],
  "depends_on": ["src/auth/tokens.py", "src/db/models.py"],
  "indexed_at": "2026-06-22T18:45:11Z"
}
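
An entry like this spans two tables: one row in files and one row per symbol in symbols. The sketch below shows one plausible mapping; `entry_to_rows` is a hypothetical helper, and storing depends_on as a JSON string is inferred from the "depends_on JSON" column in the schema.

```python
import json


def entry_to_rows(entry: dict) -> tuple[tuple, list[tuple]]:
    """Split a memory entry into a files row and a list of symbols rows."""
    file_row = (
        entry["file_path"],
        entry["content_hash"],
        entry["purpose"],
        json.dumps(entry["depends_on"]),  # list of paths stored as JSON text
        entry["indexed_at"],
    )
    symbol_rows = [
        (entry["file_path"], s["name"], s["signature"], s["line"], s["kind"])
        for s in entry["key_symbols"]
    ]
    return file_row, symbol_rows
```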

Results

Evaluated on encode/httpx (23 source files), a production HTTP client library.

Task: Add a request_id parameter to all public API functions and propagate it through the Client and AsyncClient call chains for distributed tracing, following a consistent naming convention.

Read efficiency

| Metric | Memory Agent | Naive Baseline |
|---|---|---|
| File reads | 4 | 7 |
| Memory queries | 8 | 0 |
| Memory hits | 3 | 0 |
| Reads avoided | 3 | |
| Reduction | 42.9% | |
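
The reduction figure is the reads avoided divided by the baseline reads. A small check of that arithmetic (the `read_reduction` helper is named here for illustration only):

```python
def read_reduction(baseline_reads: int, agent_reads: int) -> float:
    """Percentage of baseline file reads that the agent avoided."""
    if baseline_reads <= 0:
        raise ValueError("baseline_reads must be positive")
    return 100.0 * (baseline_reads - agent_reads) / baseline_reads


# 7 baseline reads, 4 agent reads: 3 avoided out of 7.
print(f"{read_reduction(7, 4):.1f}%")  # 42.9%
```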

Decision consistency

The agent logged a naming convention (request_id: str | None = None) at step 1. Across the remaining 7 steps, it produced 19 explicit decision-reuse events — each one a later step retrieving and following the convention from the decision table rather than re-deciding it independently.

DECIDED  step 1: parameter naming and default value convention
DECIDED  step 2: request_id parameter naming and default value convention

REUSED   step 3: followed → "request_id: default None, backward compatible"
REUSED   step 5: followed → "request_id: default None, backward compatible"
REUSED   step 6: followed → "request_id: default None, backward compatible"
REUSED   step 7: followed → "request_id: default None, backward compatible"
REUSED   step 8: followed → "request_id: default None, backward compatible"
  ... (19 total reuse events across steps 2-8)
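
The log-then-reuse pattern behind this trace can be sketched with an in-memory stand-in for the decisions table. `DecisionLog` and its methods are hypothetical, and the sketch assumes a reuse event is recorded each time a later step retrieves an earlier decision; how the agent itself counts reuse is not spelled out here.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    step_number: int
    decision: str
    reasoning: str = ""


@dataclass
class DecisionLog:
    decisions: list[Decision] = field(default_factory=list)
    reuse_events: list[tuple[int, str]] = field(default_factory=list)

    def log(self, step_number: int, decision: str, reasoning: str = "") -> None:
        """Record a decision made at the given step."""
        self.decisions.append(Decision(step_number, decision, reasoning))

    def prior(self, step_number: int) -> list[Decision]:
        """Return decisions from earlier steps and record each as a reuse event."""
        found = [d for d in self.decisions if d.step_number < step_number]
        self.reuse_events.extend((step_number, d.decision) for d in found)
        return found


log = DecisionLog()
log.log(1, "request_id: str | None = None", "backward compatible default")
for step in (3, 5):
    for d in log.prior(step):
        print(f"REUSED   step {step}: followed -> {d.decision!r}")
```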

Per-step trace

| Step | Description | Memory | Fresh Reads | Decisions Reused |
|---|---|---|---|---|
| 1 | Review public API in _api.py | MISS | 1 | |
| 2 | Establish naming convention | MISS | 0 | 1 |
| 3 | Modify _api.py functions | HIT | 0 | 2 |
| 4 | Review Client/AsyncClient in _client.py | MISS | 1 | 2 |
| 5 | Modify Client class | HIT | 0 | 3 |
| 6 | Modify AsyncClient class | HIT | 0 | 4 |
| 7 | Validate parameter propagation | MISS | 0 | 5 |
| 8 | Update documentation | MISS | 2 | 2 |

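Steps 3, 5, and 6 were memory hits that required zero file reads: the agent served everything from its indexed memory of _api.py and _client.py. Going by the step descriptions, step 1 was the cold start for _api.py and step 4 for _client.py (one fresh read each); the later work on those files in steps 3, 5, and 6 was served from memory.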
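
The per-step trace adds up to the headline numbers reported under Read efficiency and Decision consistency. A quick cross-check, transcribing the trace by hand and treating the blank "Decisions Reused" cell for step 1 as 0 (an assumption):

```python
# (step, memory, fresh_reads, decisions_reused), copied from the per-step trace.
TRACE = [
    (1, "MISS", 1, 0),  # blank "Decisions Reused" cell treated as 0
    (2, "MISS", 0, 1),
    (3, "HIT", 0, 2),
    (4, "MISS", 1, 2),
    (5, "HIT", 0, 3),
    (6, "HIT", 0, 4),
    (7, "MISS", 0, 5),
    (8, "MISS", 2, 2),
]

fresh_reads = sum(row[2] for row in TRACE)
memory_hits = sum(1 for row in TRACE if row[1] == "HIT")
reuse_events = sum(row[3] for row in TRACE)

print(fresh_reads, memory_hits, reuse_events)  # 4 3 19
```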

Quickstart

# Clone and set up
cd code_memory_agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # add your OPENROUTER_API_KEY

# Run against any codebase
python -m code_memory_agent.run \
  --codebase /path/to/repo \
  --task "rename all helper functions to snake_case consistently" \
  -v

# Run tests
pytest tests/ -v

CLI options

| Flag | Description |
|---|---|
| --codebase | Path to the target repository root |
| --task | Natural language description of the multi-step task |
| --memory-file | Path to SQLite DB (default: <codebase>/.code_memory.db) |
| --output | Path to save the JSON efficiency report |
| -v | Verbose logging: shows per-step memory/staleness decisions |

Tests

17 unit tests covering the staleness and storage layer:

test_staleness.py::TestComputeHash::test_same_content_same_hash           PASSED
test_staleness.py::TestComputeHash::test_different_content_different_hash  PASSED
test_staleness.py::TestComputeHash::test_nonexistent_file_returns_empty   PASSED
test_staleness.py::TestIsStaleFresh::test_freshly_indexed_file_is_not_stale PASSED
test_staleness.py::TestIsStaleModified::test_modified_file_is_stale       PASSED
test_staleness.py::TestIsStaleModified::test_unmodified_sibling_stays_fresh PASSED
test_staleness.py::TestIsStaleEdgeCases::test_never_indexed_file_is_stale PASSED
test_staleness.py::TestIsStaleEdgeCases::test_deleted_file_is_stale       PASSED
test_staleness.py::TestInvalidation::test_invalidation_removes_only_stale_file PASSED
test_staleness.py::TestInvalidation::test_invalidation_of_fresh_file_does_nothing PASSED
test_staleness.py::TestInvalidation::test_invalidation_clears_symbols     PASSED
test_staleness.py::TestBatchStaleness::test_batch_check                   PASSED
test_staleness.py::TestMemoryStoreSQL::test_set_and_get_file              PASSED
test_staleness.py::TestMemoryStoreSQL::test_overwrite_file_replaces_symbols PASSED
test_staleness.py::TestMemoryStoreSQL::test_decision_log                  PASSED
test_staleness.py::TestMemoryStoreSQL::test_summary_stats                 PASSED
test_staleness.py::TestMemoryStoreSQL::test_persistence_across_reopen     PASSED

Key properties tested:

  • Isolation: modifying file A does not invalidate file B's memory
  • Cascade: invalidating a file removes its symbol index entries
  • Persistence: data survives store close and reopen
  • Correctness: deleted files, never-indexed files, and modified files are all correctly identified as stale
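
The isolation and correctness properties can be reproduced with a few lines of hashing code. The sketch below is not the test suite in tests/test_staleness.py; `is_stale` and `check_properties` are hypothetical names, and a plain dict stands in for the stored hashes.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_hash(path: Path) -> str:
    """SHA-256 of the file's bytes; empty string if the file does not exist."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return ""


def is_stale(path: Path, stored: dict[str, str]) -> bool:
    """Stale if the file is missing, was never indexed, or its hash changed."""
    current = compute_hash(path)
    return not current or stored.get(str(path)) != current


def check_properties() -> dict[str, bool]:
    """Exercise the documented staleness properties in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b, c = (Path(tmp) / name for name in ("a.py", "b.py", "c.py"))
        a.write_text("A = 1\n")
        b.write_text("B = 1\n")
        c.write_text("C = 1\n")  # exists on disk but is never indexed
        stored = {str(p): compute_hash(p) for p in (a, b)}
        results = {"fresh": not is_stale(a, stored) and not is_stale(b, stored)}
        a.write_text("A = 2\n")  # modify A only
        results["isolation"] = is_stale(a, stored) and not is_stale(b, stored)
        results["never_indexed"] = is_stale(c, stored)
        b.unlink()
        results["deleted"] = is_stale(b, stored)
        return results
```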

Project structure

code_memory_agent/
├── src/
│   ├── memory_store.py    # SQLite schema, CRUD, symbol index, decision log
│   ├── indexer.py         # LLM-powered structured extraction (code/config/test)
│   ├── staleness.py       # SHA-256 content-hash invalidation
│   ├── query.py           # Two-tier memory lookup with staleness hard gate
│   ├── executor.py        # Task decomposition and per-step orchestration
│   └── llm_client.py      # OpenRouter client with retry logic
├── benchmark/
│   └── efficiency_report.py  # Naive-baseline comparison, rich terminal output
├── tests/
│   └── test_staleness.py  # 17 unit tests for staleness + storage correctness
├── results/
│   └── httpx_run.json     # Full trace from the httpx evaluation
├── assets/
│   └── architecture.svg   # System architecture diagram
├── run.py                 # CLI entry point
├── requirements.txt
└── .env.example

Design decisions

| Decision | Rationale |
|---|---|
| SQLite over JSON files | ACID transactions, indexed symbol lookups via SQL, WAL for read concurrency, and foreign-key cascades for cleanup, all without requiring a running server |
| SHA-256 staleness as a hard gate | A memory system that serves stale facts as current is worse than no memory. The check lives inside the query path and cannot be bypassed by the executor |
| File-type-aware indexer prompts | Config files have settings, test files have test functions, code files have signatures. A single generic extraction prompt produces inconsistent results across these |
| Two-tier query (path match → LLM ranking) | Most lookups can be resolved by matching file paths or symbol names mentioned in the task description. LLM-based semantic ranking is expensive and only fires when the fast path finds fewer than 2 results |
| Executor plans but does not modify files | Keeps the benchmark clean (measuring reads, not write correctness) and the demo safe. The action trace describes exactly what changes would be made |
| Decision log with step attribution | Enables the consistency proof: step N logs a convention, step N+3 retrieves it and follows it. The trace captures both the logging and the reuse explicitly |
Tech stack

  • Python 3.13 (3.11+ is the stated minimum; see Requirements)
  • SQLite with WAL journaling and foreign keys
  • OpenAI SDK via OpenRouter (model-agnostic — works with GPT-4o-mini, Claude, etc.)
  • Rich for terminal output
  • pytest for unit tests

Requirements

  • Python 3.11+
  • An OpenRouter API key (or any OpenAI-compatible endpoint)
  • No GPU required — this is an orchestration project using API calls
