Code graph extraction for LLM-assisted debugging. Turn a codebase into a directed property graph so an LLM gets surgical, structured, bounded context — the broken node plus its neighbors — instead of the whole repository.
LLMs fail on large codebases for three reasons:
- Too much context → noise, confusion, hallucination
- Too little context → blind spots, wrong fixes
- No structure → the model can't see how components relate
Bugs don't live in isolation — they live in the relationships between components. ContextAI makes those relationships explicit and traversable by representing every meaningful piece of code as a node and every relationship as an edge. When debugging, you feed the LLM only the broken node, its adjacent nodes, and the connecting edges.
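As a rough illustration of that idea, here is a minimal sketch of selecting a node, its direct neighbors, and the connecting edges. The toy graph, the node ids, and the `neighborhood` helper are all hypothetical; this is not ContextAI's implementation or its on-disk format.

```python
# Hypothetical toy graph: node ids mapped to attributes, plus typed directed edges.
NODES = {
    "route:/login": {"type": "API_ENDPOINT"},
    "fn:login_handler": {"type": "ROUTE_HANDLER"},
    "fn:check_password": {"type": "FUNCTION"},
    "table:users": {"type": "TABLE"},
    "fn:send_newsletter": {"type": "FUNCTION"},
}
EDGES = [
    ("route:/login", "HANDLES", "fn:login_handler"),
    ("fn:login_handler", "CALLS", "fn:check_password"),
    ("fn:check_password", "READS", "table:users"),
    ("fn:send_newsletter", "READS", "table:users"),
]


def neighborhood(node_id, nodes, edges):
    """Return the node, its direct neighbors, and the edges that connect them."""
    touching = [e for e in edges if node_id in (e[0], e[2])]
    ids = {node_id} | {e[0] for e in touching} | {e[2] for e in touching}
    return {"nodes": {i: nodes[i] for i in sorted(ids)}, "edges": touching}


# Only the suspect function and what it touches end up in the context.
context = neighborhood("fn:check_password", NODES, EDGES)
```

The unrelated `fn:send_newsletter` node is left out even though it shares a table with the suspect function, which is the point of bounding the context to one hop.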
source code ──▶ extraction pipeline ──▶ directed property graph ──▶ bounded LLM context
                (AST · framework ·      (nodes + edges,             (a node + its
                patterns · runtime)     fully typed schema)         neighborhood)
ContextAI scans a Python project — .py modules and .ipynb notebooks — and emits a typed graph (graph.json) plus an interactive visualization (graph.html). Each node carries its signature, side effects, error-handling profile, and complexity; each edge carries its contract, criticality, and failure behavior.
Requirements: Python 3.11+
# install
pip install -e . # or: pip install -r requirements.txt
# extract a graph from any file or directory
contextai path/to/your/project/ # or: python3 run.py path/to/your/project/
# outputs (under out/, gitignored):
# out/graph.json: all nodes + edges
# out/index.html: dashboard with coverage · connectivity health · interactive graph
# out/graph.html: raw interactive graph (pyvis, self-contained)

Open out/index.html for the full picture: extraction coverage, connectivity health (fragmentation, isolated nodes, sinks/sources), node-type distribution, and the live graph with a code/location details panel, all in one self-contained file.
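To make the connectivity terms concrete, here is a minimal sketch of how isolated nodes, sources, sinks, and islands could be computed. It assumes a graph dict with a `nodes` list (each with `id`) and an `edges` list (each with `from` and `to`); that top-level shape, the `connectivity_report` name, and the exact definitions used are assumptions, not the dashboard's actual logic.

```python
def connectivity_report(graph):
    """Classify nodes by degree and count weakly connected components."""
    node_ids = [n["id"] for n in graph["nodes"]]
    in_deg = {i: 0 for i in node_ids}
    out_deg = {i: 0 for i in node_ids}
    undirected = {i: set() for i in node_ids}
    for edge in graph["edges"]:
        src, dst = edge["from"], edge["to"]
        out_deg[src] += 1
        in_deg[dst] += 1
        undirected[src].add(dst)
        undirected[dst].add(src)

    # Count "islands" (weakly connected components) with an iterative DFS.
    seen, islands = set(), 0
    for start in node_ids:
        if start in seen:
            continue
        islands += 1
        stack = [start]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(undirected[cur] - seen)

    return {
        "isolated": [i for i in node_ids if in_deg[i] == 0 and out_deg[i] == 0],
        "sources": [i for i in node_ids if in_deg[i] == 0 and out_deg[i] > 0],
        "sinks": [i for i in node_ids if out_deg[i] == 0 and in_deg[i] > 0],
        "islands": islands,
    }


# Hypothetical four-node graph: a chain a -> b -> c plus a disconnected node d.
sample = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}],
    "edges": [{"from": "a", "to": "b"}, {"from": "b", "to": "c"}],
}
report = connectivity_report(sample)
```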
Run it against the bundled benchmark (the VS Code Flask tutorial):
python3 run.py python-sample-vscode-flask-tutorial/

| Layer | Tools | Gets you | Status |
|---|---|---|---|
| 1. AST + types | Python ast (.py + .ipynb) | functions, classes, imports, call sites, signatures, side effects | ✅ Implemented |
| 2. Framework conventions | custom parsers | route → handler, templates, static assets (Flask) | ✅ Flask; others planned |
| 2.5. Pattern matching | regex on source | DB tables, hardcoded URLs, cache ops, template refs | 🟡 Partial |
| 3. Runtime tracing | sys.settrace + asyncio hooks | actual call chains, dynamic dispatch, async fan-out | ✅ Implemented |
| 4. LLM pass | Claude / GPT | semantic intent, implicit relationships | ⏳ Planned |
No single method captures everything: static analysis sees what code says, runtime tracing sees what it does. ContextAI merges both into one graph.
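For a feel of what the regex-based pattern layer (2.5) involves, here is a minimal sketch that pulls hardcoded URLs and SQL table names out of source text. The two patterns and the helper names are illustrative assumptions; ContextAI's own patterns are not shown in this README and may differ.

```python
import re

# Quoted http(s) URLs. Illustrative pattern, not the project's own.
URL_RE = re.compile(r"""["'](https?://[^"'\s]+)["']""")
# Identifier after an uppercase SQL keyword. Uppercase only, so Python's own
# lowercase "from x import y" lines are not mistaken for table references.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)")

SAMPLE = '''
API = "https://api.example.com/v1/users"

def load(conn):
    return conn.execute("SELECT id FROM users JOIN orders ON users.id = orders.user_id")
'''


def find_urls(source):
    """Return hardcoded URLs in order of appearance."""
    return URL_RE.findall(source)


def find_tables(source):
    """Return the distinct table names referenced in SQL-looking strings."""
    return sorted(set(TABLE_RE.findall(source)))


urls = find_urls(SAMPLE)
tables = find_tables(SAMPLE)
```

Regexes like these miss lowercase SQL and dynamically built queries, which is consistent with the layer being marked partial above.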
run.py
├─ Pass 1: walk files → emit NODES (ast_extractor, flask_convention_extractor)
└─ Pass 2: resolve IDs → emit EDGES (edge_extractor)
graph/extractors/runtime/ ← trace a running app and merge real call chains
├─ tracer.py sys.settrace + asyncio monkey-patches
├─ call_log.py captured calls → call_log.json
├─ script_runner.py drive a target script/entry point under the tracer
└─ edge_injector.py merge runtime calls into the static graph
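The two static passes can be sketched with nothing but the standard `ast` module. This is a simplified illustration under stated assumptions: the `extract` function, the `module.name` id scheme, and the tuple-shaped edges are hypothetical and much smaller than what ast_extractor and edge_extractor emit.

```python
import ast

SOURCE = '''
def helper(x):
    return x + 1

def main():
    print(helper(2))
'''


def extract(source, module="demo"):
    """Pass 1 emits FUNCTION nodes; pass 2 resolves call sites into CALLS edges."""
    tree = ast.parse(source)
    funcs = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    # Pass 1: one node per function definition, keyed by a stable id.
    nodes = {f"{module}.{f.name}": {"type": "FUNCTION", "line": f.lineno} for f in funcs}

    # Pass 2: every node id exists now, so call targets can be resolved.
    edges = []
    for f in funcs:
        for call in ast.walk(f):
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                target = f"{module}.{call.func.id}"
                if target in nodes:  # names without a node (e.g. builtins) are skipped
                    edges.append((f"{module}.{f.name}", "CALLS", target))
    return nodes, edges


nodes, edges = extract(SOURCE)
```

Splitting the work this way means a call to a function defined later in the file still resolves, because all ids are known before any edge is emitted. The sketch ignores methods, attribute calls, and nested functions.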
schema.py is the source of truth (Pydantic v2).
Nodes carry: id, type, name, location, code, signature (typed inputs/outputs), side_effects, error_handling, and metadata (complexity, test coverage, staleness).
Edges carry: id, type, from, to, direction, contract (input/output shape), criticality, on_failure (retry / default / throw / circuit-break), and performance.
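A trimmed-down picture of that shape, using stdlib dataclasses as a stand-in so the sketch runs without dependencies. The real definitions live in schema.py and use Pydantic v2; the field types, the subset of fields, and the `to_json` helper here are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class OnFailure(str, Enum):
    # The four failure behaviors listed above.
    RETRY = "retry"
    DEFAULT = "default"
    THROW = "throw"
    CIRCUIT_BREAK = "circuit-break"


@dataclass
class Node:
    id: str
    type: str
    name: str
    location: str  # assumed to be a "file:line" string in this sketch
    side_effects: list[str] = field(default_factory=list)


@dataclass
class Edge:
    id: str
    type: str
    from_: str  # "from" is a Python keyword, hence the trailing underscore
    to: str
    criticality: str
    on_failure: OnFailure = OnFailure.THROW

    def to_json(self):
        """Serialize with the field names the graph uses ("from", not "from_")."""
        return {
            "id": self.id,
            "type": self.type,
            "from": self.from_,
            "to": self.to,
            "criticality": self.criticality,
            "on_failure": self.on_failure.value,
        }


# Hypothetical ids and values, only to show the serialized shape.
edge = Edge(id="e1", type="CALLS", from_="fn:a", to="fn:b", criticality="high")
payload = edge.to_json()
```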
Node types
- Boundary: API_ENDPOINT, MESSAGE_CONSUMER, CRON_JOB
- Logic: FUNCTION, CLASS, MIDDLEWARE, ROUTE_HANDLER
- Data: SCHEMA, MODEL, DTO, DATABASE, TABLE, COLLECTION
- Infra: MESSAGE_QUEUE, FILE_STORAGE, EXTERNAL_LIBRARY
- Scaffolding: MODULE_INIT, ENTRY_POINT, FILE, TEMPLATE, STATIC_ASSET
Edge types
| Category | Edges |
|---|---|
| Call | CALLS, CALLS_ASYNC, DELEGATES_TO |
| Dependency | IMPORTS, IMPORTS_SIDE_EFFECT, INHERITS, IMPLEMENTS, INSTANTIATES, INJECTS |
| Data | READS, WRITES, VALIDATES, TRANSFORMS, MAPS_TO, RETURNS |
| Communication | HANDLES, GUARDS, PUBLISHES_TO, SUBSCRIBES_TO, CALLS_EXTERNAL, RENDERS, SERVES_STATIC |
| App wiring | USES_APP_INSTANCE |
The analysis tools below all default to graph.json in the current directory.
python3 tools/graph_connectivity.py # health score, isolated nodes, islands
python3 tools/coverage_from_graph.py # % of source lines covered by nodes
python3 tools/graph_duplicates.py # duplicate / overlapping node ranges
python3 tools/diff_graph.py a.json b.json # diff two graphs

python3 tools/run_with_tracing.py \
--target your_app/main.py \
--project-root your_app/ \
--build-static \
--output graph_runtime.json

Runs your app under the tracer, then merges observed call chains into the static graph (confirming static edges, filling gaps, and adding runtime-only edges).
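The capture half of that process rests on `sys.settrace`. Below is a minimal sketch of recording caller/callee pairs while a function runs; the `trace_calls` helper and the example functions are hypothetical, and the project's tracer.py additionally handles asyncio, which this does not.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func under sys.settrace and count (caller, callee) name pairs."""
    calls = Counter()

    def tracer(frame, event, arg):
        # A "call" event fires once for each new Python-level frame.
        if event == "call" and frame.f_back is not None:
            calls[(frame.f_back.f_code.co_name, frame.f_code.co_name)] += 1
        return None  # no per-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


def parse(text):
    return text.strip()


def handle(text):
    return parse(text).upper()


result, calls = trace_calls(handle, " hi ")
```

Built-in methods such as `str.strip` do not produce `call` events under `sys.settrace`, so only the Python-level chain from `handle` to `parse` is recorded here.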
graph/api.py is the only surface the MCP server (and any other client) should import — never reach into extractors or GraphStore directly.
from graph.api import (
build_graph, load_graph, find_node, get_context, list_gaps, get_edge_path,
run_trace, merge_trace, start_trace, stop_trace,
)

Querying. get_context(store, node_id, depth=2, direction="in") returns a bounded subgraph around a node. It is incoming-biased by default: deep predecessors (who calls this, i.e. the blast radius), one shallow successor hop, and a successor pull around gap nodes. Pass direction="out" / "both" to change the bias.
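The incoming bias can be sketched as a depth-limited breadth-first walk over predecessors plus a single successor hop. The `bounded_context` function below is a hypothetical stand-in that works on a plain edge list; it is not `get_context`, and it leaves out the successor pull around gap nodes.

```python
from collections import deque


def bounded_context(edges, node_id, depth=2):
    """Collect predecessors up to `depth` hops, plus one hop of successors."""
    preds, succs = {}, {}
    for src, dst in edges:
        succs.setdefault(src, set()).add(dst)
        preds.setdefault(dst, set()).add(src)

    selected = {node_id}
    queue = deque([(node_id, 0)])
    while queue:
        cur, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand callers beyond the depth limit
        for pred in preds.get(cur, ()):
            if pred not in selected:
                selected.add(pred)
                queue.append((pred, dist + 1))

    selected |= succs.get(node_id, set())  # one shallow successor hop
    kept = [(s, d) for s, d in edges if s in selected and d in selected]
    return selected, kept


# Hypothetical call chain: x -> a -> b -> c -> d -> e, queried at c.
chain = [("x", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
selected, kept = bounded_context(chain, "c", depth=2)
```

Queried at `c`, the walk keeps the callers `b` and `a`, the direct callee `d`, and drops `x` (three hops up) and `e` (two hops down).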
Runtime tracing has three capture modes, all converging on one merge:
| Mode | Entry point | Use it for |
|---|---|---|
| One-shot script / IDE run | run_trace(target, project_root, …) | capture + merge a single script or entry point in one call |
| Long-running session | start_trace(project_root) … stop_trace(project_root, base_graph, output) | a server or worker traced across many requests without restarting; every call in between is unioned into one capture |
| Per-request web | TracingMiddleware / AsyncTracingMiddleware (tools/trace_middleware.py) | trace one request at a time, triggered by an X-Trace: 1 header |
All three feed merge_trace(base_graph, call_log, project_root, output) — the single seam that folds a runtime call log onto a base graph. The base is a parameter: pass the static graph to merge a single action, or a prior runtime graph to accumulate a sequence of actions (call counts sum, edges union). The static graph is never mutated — every merge writes a fresh overlay.
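The merge semantics (counts sum, edges union, base untouched) can be sketched on plain dicts. Everything below is an assumption made for illustration: the `merge_call_log` name, the `call_count` and `count` fields, and keying edges by their endpoints only. It is not the signature or behavior of `merge_trace` itself.

```python
import copy


def merge_call_log(base_graph, call_log):
    """Fold observed calls onto a copy of base_graph and return the copy."""
    merged = copy.deepcopy(base_graph)  # the base graph is never mutated
    by_pair = {(e["from"], e["to"]): e for e in merged["edges"]}
    for call in call_log:
        key = (call["from"], call["to"])
        edge = by_pair.get(key)
        if edge is None:
            # Runtime-only edge: not present in the base graph, so add it.
            edge = {"from": call["from"], "to": call["to"], "type": "CALLS"}
            merged["edges"].append(edge)
            by_pair[key] = edge
        # Counts accumulate, so merging onto a prior overlay sums them.
        edge["call_count"] = edge.get("call_count", 0) + call["count"]
    return merged


static = {"edges": [{"from": "a", "to": "b", "type": "CALLS"}]}
log = [{"from": "a", "to": "b", "count": 2}, {"from": "b", "to": "c", "count": 1}]

first = merge_call_log(static, log)   # single action merged onto the static graph
second = merge_call_log(first, log)   # same action again, merged onto the overlay
```

Passing `first` back in as the base is what accumulation looks like in this sketch: the edge set stays the same while the counts double.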
pip install pytest
pytest -q

The suite (153 tests) is built on an inductive strategy: every atomic extraction pattern (structural, web/API, data access, messaging, signatures, data flow, async runtime) has a minimal fixture and an exact-count assertion. The premise is that if the extractor handles every base case, it handles their combinations.
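A base-case test in that style might look like the sketch below. It uses the standard `ast` module in place of the project's extractor, and the fixture, helper, and test name are hypothetical rather than taken from the actual suite.

```python
import ast

# Minimal fixture: exactly one atomic pattern (one function calling another).
FIXTURE = "def a():\n    return b()\n\ndef b():\n    return 1\n"


def count_functions_and_calls(source):
    """Stand-in for an extractor: count function definitions and call sites."""
    nodes = list(ast.walk(ast.parse(source)))
    functions = sum(isinstance(n, ast.FunctionDef) for n in nodes)
    calls = sum(isinstance(n, ast.Call) for n in nodes)
    return functions, calls


def test_single_call_pattern():
    # Exact counts rather than ">= 1": an extra node and a missing node both fail.
    assert count_functions_and_calls(FIXTURE) == (2, 1)
```

Asserting exact counts is what lets a test catch duplicate or phantom nodes as well as missing ones.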
tests/
    test_static_induction.py    structural · web · data · messaging · signatures · data flow
    test_phase2_edges.py        call resolution, dynamic dispatch, super(), properties
    test_phase3_runtime.py      tracer capture + edge-injector merge + end-to-end
    test_phase3_http.py         HTTP / async routing patterns
    test_runtime_api.py         public API: direction-aware context, merge/run/session tracing
    test_notebook_extractor.py  .ipynb flattening + node/edge extraction across cells
    test_dashboard.py           metrics builder + self-contained dashboard generation
    fixtures/                   minimal atomic patterns per test

schema.py                              NodeSchema + EdgeSchema (Pydantic, source of truth)
run.py                                 entry point: extract → store → visualize (contextai CLI)
graph/
    api.py                             public API consumed by the MCP server
    extractors/
        ast_extractor.py               Python AST → nodes (functions, classes, schemas, …)
        edge_extractor.py              all edge types
        notebook_extractor.py          .ipynb → flatten code cells → reuse AST pipeline
        flask_convention_extractor.py  templates + static assets
        runtime/                       sys.settrace tracer + call log + script runner + edge injector
    store/graph_store.py               NetworkX graph + JSON persistence + direction-aware neighbor traversal
    visualizer/visualizer.py           pyvis HTML output (out/graph.html)
tools/                                 connectivity, coverage, duplicates, diff, tracing
benchmarks/flask-tutorial/             hand-authored ground-truth graph (diff target)
dashboard/                             self-contained dashboard → out/index.html
    metrics.py                         reuses the coverage + connectivity tools → one payload
    dashboard.py / template.html       embed graph + metrics into a single HTML file
out/                                   generated artifacts (gitignored)
docs/                                  design + planning notes
tests/                                 inductive test suite + fixtures
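The notebook path in the layout above (flatten code cells, then reuse the AST pipeline) can be sketched as follows. It assumes the standard .ipynb JSON layout, with a `cells` list whose entries have `cell_type` and `source`; the helper names and the `(cell index, line in cell)` location format are hypothetical and may not match notebook_extractor.py.

```python
import ast
import json


def flatten_notebook(nb_json):
    """Join code cells into one source string and remember where each line came from."""
    notebook = json.loads(nb_json)
    lines, origin = [], []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no code
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for offset, line in enumerate(text.splitlines(), start=1):
            lines.append(line)
            origin.append((index, offset))  # (cell index, line within that cell)
    return "\n".join(lines), origin


def function_locations(nb_json):
    """Map each function name to the cell-aware location of its definition."""
    code, origin = flatten_notebook(nb_json)
    tree = ast.parse(code)
    return {
        n.name: origin[n.lineno - 1]
        for n in ast.walk(tree)
        if isinstance(n, ast.FunctionDef)
    }


# Hypothetical three-cell notebook: markdown, an import cell, a function cell.
NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo\n"]},
        {"cell_type": "code", "source": "import math\n"},
        {"cell_type": "code", "source": ["def area(r):\n", "    return math.pi * r * r\n"]},
    ]
})
locations = function_locations(NOTEBOOK)
```

Cells containing IPython magics (lines starting with `%` or `!`) would not parse as plain Python, so a fuller version would need to strip or skip them first.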
- Static extraction (AST): nodes, edges, signatures, side effects
- Flask framework conventions (routes, templates, static assets)
- Runtime tracing (sync + async call chains)
- Inductive test suite (153 tests)
- Jupyter .ipynb notebook extraction: code cells → AST pipeline, with cell-aware locations
- MCP server: expose the graph to LLM clients as a tool (docs/MCP_SERVER_PLAN.md)
- LLM integration: neighborhood-context retrieval for debugging
- More frameworks (Django, FastAPI), git/version metadata, derived impact edges (AFFECTS, DEPENDS_ON, TRIGGERS)
- Multi-language extraction (JS/TS → UI_COMPONENT)
See docs/ for design and planning notes.
Alpha. The static extractor and runtime tracer work and are covered by tests. The LLM/MCP consumption layer — the part that turns the graph into better debugging answers — is in active development.
No license has been chosen yet. Until one is added, all rights are reserved by the author.