Parent Epic
Part of #5 — Integrate Graphify for zero-cost code entity extraction
Task
Create agent_notes/services/code_graph.py — a boundary module that encapsulates all Graphify interaction. No Graphify types leak into the rest of the codebase; every function works with plain Python dicts and Path objects.
Location
/agent_notes/services/code_graph.py (new file, follows existing pattern: wiki_backend.py, memory_backend.py, credentials.py)
Functions
1. graphify_available() -> bool
def graphify_available() -> bool:
    """Return True if the graphifyy package is importable."""
    try:
        import graphify.extract  # noqa: F401
        return True
    except ImportError:
        return False
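A variant that probes importability without actually importing the package is sketched below. It relies on the standard library's importlib.util.find_spec; the helper name module_available is hypothetical and only illustrates the idea for top-level module names.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module named `name` can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken packages or modules without a spec
        return False


has_json = module_available("json")
has_bogus = module_available("surely_not_installed_pkg_xyz")
print(has_json, has_bogus)
```

Unlike the try/import version, this does not execute the package's import-time code, so it is cheaper but also cannot detect a package that is present yet fails while importing.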
2. extract_code_graph(folder_path, *, extensions=None, skip_dirs=None) -> dict
Core extraction function. Runs tree-sitter parsing via Graphify's Python API.
Parameters:
folder_path: Path — directory to scan
extensions: set[str] | None — allowed code extensions (default: _CODE_EXTENSIONS)
skip_dirs: set[str] | None — directories to skip (reuse wiki_backend._SKIP_DIRS)
Returns:
{
    "nodes": [
        {"id": "auth_userservice", "label": "UserService", "source_file": "auth.py",
         "source_location": "L42", "type": "class"}
    ],
    "edges": [
        {"source": "auth_userservice", "target": "payments_gateway",
         "relation": "calls", "confidence": "EXTRACTED"}
    ],
    "communities": {0: ["auth_userservice", "auth_login"], 1: ["payments_gateway"]},
    "cohesion": {0: 0.85, 1: 0.72},
    "god_nodes": [{"label": "UserService", "degree": 12}],
    "stats": {"files_parsed": 5, "nodes": 23, "edges": 41, "communities": 3}
}
Implementation logic:
def extract_code_graph(folder_path: Path, *, extensions=None, skip_dirs=None):
    # Lazy imports: `import agent_notes` keeps working without Graphify installed
    from graphify.extract import collect_files, extract
    from graphify.build import build_from_json
    from graphify.cluster import cluster, score_all
    from graphify.analyze import god_nodes

    # Step 1: Collect code files
    code_files = collect_files(folder_path)

    # Step 2: Filter by extensions (None falls back to _CODE_EXTENSIONS,
    # matching the documented default)
    if extensions is None:
        extensions = _CODE_EXTENSIONS
    code_files = [f for f in code_files if f.suffix in extensions]

    # Step 3: Filter by skip_dirs if specified
    if skip_dirs:
        code_files = [f for f in code_files
                      if not any(d in f.parts for d in skip_dirs)]
    if not code_files:
        return _empty_graph()

    # Step 4: Extract AST (zero API cost)
    extraction = extract(code_files)
    if not extraction.get("nodes"):
        return _empty_graph()

    # Step 5: Build graph
    G = build_from_json(extraction)

    # Step 6: Community detection, cohesion scores, and god nodes
    communities = cluster(G)
    cohesion = score_all(G, communities)
    gods = god_nodes(G)

    # Step 7: Convert to plain dicts so no Graphify/NetworkX types leak out
    nodes = [
        {
            "id": n,
            "label": G.nodes[n].get("label", n),
            "source_file": G.nodes[n].get("source_file", ""),
            "source_location": G.nodes[n].get("source_location", ""),
            "type": G.nodes[n].get("file_type", "code"),
        }
        for n in G.nodes
    ]
    edges = [
        {
            "source": u,
            "target": v,
            "relation": d.get("relation", "related"),
            "confidence": d.get("confidence", "EXTRACTED"),
        }
        for u, v, d in G.edges(data=True)
    ]
    return {
        "nodes": nodes,
        "edges": edges,
        "communities": {k: list(v) for k, v in communities.items()},
        "cohesion": {k: v for k, v in cohesion.items()},
        "god_nodes": gods,
        "stats": {
            "files_parsed": len(code_files),
            "nodes": len(nodes),
            "edges": len(edges),
            "communities": len(communities),
        },
    }
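The filtering in steps 2 and 3 can be tried in isolation, without Graphify. The sketch below is a hypothetical helper (filter_code_files and its root parameter are not part of the planned module). It differs from the logic above in one respect that may be worth considering: it checks only directory components below the scan root, so a skipped name that happens to appear in an ancestor directory or as a file name does not exclude the file.

```python
from pathlib import Path


def filter_code_files(files, root, extensions=None, skip_dirs=None):
    """Keep files with an allowed suffix that sit outside skipped directories."""
    kept = []
    for f in files:
        if extensions is not None and f.suffix not in extensions:
            continue
        # Only directory components below `root` are compared to skip_dirs
        rel_dirs = f.relative_to(root).parent.parts
        if skip_dirs and any(part in skip_dirs for part in rel_dirs):
            continue
        kept.append(f)
    return kept


root = Path("/work/build/project")  # note "build" in an ancestor directory
files = [
    root / "src" / "auth.py",
    root / "node_modules" / "lib" / "index.js",
    root / "docs" / "readme.md",
    root / "dist" / "gen.py",
]
result = filter_code_files(
    files, root, {".py", ".js"}, {"node_modules", "dist", "build"}
)
print(result)
```

Only src/auth.py survives here, even though the root path itself contains a directory named build.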
3. graph_to_wiki_terms(graph_data) -> dict
Maps Graphify nodes and communities to wiki-compatible entity and concept names.
Mapping rules:
| Graphify node | Condition | Wiki type | Example |
| --- | --- | --- | --- |
| class | any degree | entity | "UserService" |
| function (top-level) | degree >= 3 | entity | "process_payment" |
| function (method) | skip | - | stays inside class page |
| module / file | degree >= 2 | entity | "auth" |
| Leiden community | size >= 2 | concept | "Authentication System" |
Community naming algorithm:
- Collect source_file values from all community member nodes
- Extract common path prefix (e.g., auth/, payments/)
- If prefix gives a meaningful directory name → use it title-cased
- Otherwise → use the highest-degree node's label + "Module" suffix
- Deduplicate against existing concept names
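The naming steps above can be sketched as follows. This is only one possible reading: it treats any non-empty shared directory as "meaningful", uses the last component of that directory as the name, and deduplicates by appending a counter. All three choices, and the function name name_community, are assumptions rather than decisions stated in this issue.

```python
import os


def name_community(source_files, top_label, existing):
    """Pick a concept name from a shared directory, else from the top node."""
    dirs = [os.path.dirname(p) for p in source_files]
    # commonpath needs non-empty input; files in the root have dirname ""
    prefix = os.path.commonpath(dirs) if dirs and all(dirs) else ""
    base = os.path.basename(prefix)
    if base:
        name = base.replace("_", " ").replace("-", " ").title()
    else:
        name = f"{top_label} Module"
    # Deduplicate against concept names that are already taken
    candidate, n = name, 2
    while candidate in existing:
        candidate = f"{name} {n}"
        n += 1
    return candidate


shared = name_community(["auth/login.py", "auth/tokens.py"], "UserService", set())
mixed = name_community(["auth/login.py", "payments/gw.py"], "UserService", set())
taken = name_community(["auth/a.py", "auth/b.py"], "UserService", {"Auth"})
print(shared, "|", mixed, "|", taken)
```

A production version would likely want a stricter test for "meaningful" (for example, rejecting generic directories such as src or lib).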
Returns:
{
    "entities": ["UserService", "PaymentGateway", "process_payment"],
    "concepts": ["Authentication", "Payment Processing"],
    "edges_by_entity": {
        "UserService": [
            {"target": "PaymentGateway", "relation": "calls"},
            {"target": "login", "relation": "contains"}
        ]
    }
}
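The entity half of this mapping, combined with the trivial-node filter listed next, could look roughly like the sketch below. It is not the planned implementation. It assumes each node's "type" field holds values such as "class", "function", "method", "module", or "rationale" (this issue only shows "class"), it derives degree from the edge list, and the name select_entities is hypothetical. Concept naming is left out.

```python
from collections import Counter

_SKIP_LABELS = {"__init__", "__main__", "setup"}
# Minimum degree per node type; types not listed never become entities
_MIN_DEGREE = {"class": 0, "function": 3, "module": 2}


def select_entities(graph_data):
    """Return (entities, edges_by_entity) from an extract_code_graph-style dict."""
    degree = Counter()
    out_relations = {}
    for e in graph_data["edges"]:
        degree[e["source"]] += 1
        degree[e["target"]] += 1
        out_relations.setdefault(e["source"], []).append(e["relation"])

    all_labels = {n["id"]: n["label"] for n in graph_data["nodes"]}
    entity_ids = []
    for n in graph_data["nodes"]:
        label, kind = n["label"], n["type"]
        if label.startswith("_") or label in _SKIP_LABELS:
            continue  # private or boilerplate name
        rels = out_relations.get(n["id"], [])
        if kind == "module" and rels and all(r == "contains" for r in rels):
            continue  # bare container module
        # "method" and "rationale" are absent from _MIN_DEGREE and drop out
        if kind in _MIN_DEGREE and degree[n["id"]] >= _MIN_DEGREE[kind]:
            entity_ids.append(n["id"])

    wanted = set(entity_ids)
    edges_by_entity = {}
    for e in graph_data["edges"]:
        if e["source"] in wanted and e["target"] in all_labels:
            edges_by_entity.setdefault(all_labels[e["source"]], []).append(
                {"target": all_labels[e["target"]], "relation": e["relation"]}
            )
    return [all_labels[i] for i in entity_ids], edges_by_entity


def _node(node_id, label, kind):
    return {"id": node_id, "label": label, "type": kind}


def _edge(source, target, relation):
    return {"source": source, "target": target, "relation": relation}


demo = {
    "nodes": [
        _node("svc", "UserService", "class"),
        _node("gw", "PaymentGateway", "class"),
        _node("login", "login", "method"),
        _node("pay", "process_payment", "function"),
        _node("helper", "_helper", "function"),
        _node("auth", "auth", "module"),
        _node("note", "why retry", "rationale"),
    ],
    "edges": [
        _edge("svc", "gw", "calls"),
        _edge("svc", "login", "contains"),
        _edge("pay", "gw", "calls"),
        _edge("pay", "svc", "calls"),
        _edge("pay", "helper", "calls"),
        _edge("auth", "svc", "contains"),
        _edge("auth", "pay", "contains"),
    ],
}
entities, by_entity = select_entities(demo)
print(entities)
```

In this demo, edge targets are kept even when the target is not itself an entity (login is a method), which mirrors the example return value above.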
Implementation detail — filtering trivial nodes:
- Skip nodes whose label starts with _ (private/internal)
- Skip nodes whose label is __init__, __main__, setup
- Skip "rationale" type nodes (Graphify extracts # NOTE: comments as rationale nodes)
- Skip file-level module nodes that are just containers (only have "contains" edges out)
4. save_graph_json(wiki_root, slug, graph_data) -> Path
import json


def save_graph_json(wiki_root: Path, slug: str, graph_data: dict) -> Path:
    """Write graph.json to raw/<slug>-graph.json. Returns the path."""
    raw_dir = wiki_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{slug}-graph.json"
    path.write_text(json.dumps(graph_data, indent=2, default=str))
    return path
Storage rationale: raw/ is the immutable source material directory. The graph is derived from source code — it belongs with source data. .obsidianignore already excludes raw/ from Obsidian indexing.
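One detail worth keeping in mind when reading the file back: JSON object keys are always strings, so the integer community IDs in "communities" and "cohesion" come back as "0", "1", and so on. The sketch below demonstrates the round trip in a temporary directory; the file name demo-graph.json and the int conversion on load are illustrative assumptions, not part of the planned API.

```python
import json
import tempfile
from pathlib import Path

graph = {
    "communities": {0: ["auth_userservice", "auth_login"], 1: ["payments_gateway"]},
    "cohesion": {0: 0.85, 1: 0.72},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "raw" / "demo-graph.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

# Integer keys were serialized as strings; convert them back when needed
restored = {int(k): v for k, v in loaded["communities"].items()}
print(list(loaded["communities"]), list(restored))
```

Any consumer that indexes communities by integer ID would need a conversion like this after loading.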
5. Helper: _empty_graph() -> dict
def _empty_graph():
    return {
        "nodes": [], "edges": [],
        "communities": {}, "cohesion": {},
        "god_nodes": [],
        "stats": {"files_parsed": 0, "nodes": 0, "edges": 0, "communities": 0},
    }
6. Constant: _CODE_EXTENSIONS
_CODE_EXTENSIONS = {
    ".py", ".ts", ".js", ".tsx", ".jsx",
    ".go", ".rs", ".java", ".cpp", ".c", ".h",
    ".rb", ".swift", ".kt", ".cs", ".scala",
    ".php", ".lua", ".groovy", ".jl",
    ".f90", ".pas",
}
This matches Graphify's supported tree-sitter languages.
Potential Issues
- Graphify's collect_files() vs our file walking: collect_files() has its own filtering logic, so we may get a different file set than wiki_ingest_folder(). Solution: use our own file list from the walk loop where possible, or at minimum filter collect_files() output with our _SKIP_DIRS and extensions.
- NetworkX graph iteration order: G.nodes and G.edges(data=True) iterate in insertion order on Python 3.7+, but community assignment is non-deterministic (Leiden uses randomization). This is fine: we only need consistent node IDs, not consistent community assignment.
- Large repositories: extract() on a 1000+ file repo could take 10-30 seconds (tree-sitter is fast but not instant). This is acceptable for a one-time ingest operation, but document that large repos may take a moment.
- extract() with cache_root: The v7 API supports extract(code_files, cache_root=Path(".")) for caching parsed results. We should pass a cache path to avoid re-parsing on --update runs. Use wiki_root / "raw" as cache root.
- Import safety: All Graphify imports are lazy (inside function bodies), so import agent_notes never fails even when graphifyy isn't installed.
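For the first issue above, one way to control the file set ourselves is to walk the tree with os.walk and prune skipped directories in place. The sketch below is a hypothetical helper (walk_code_files is not an existing function, and how wiki_ingest_folder() actually walks is not shown in this issue); its output could then be handed to extract() instead of the collect_files() result.

```python
import os
import tempfile
from pathlib import Path


def walk_code_files(root, extensions, skip_dirs):
    """Yield files under `root` with an allowed suffix, pruning skipped dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix in extensions:
                yield path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("src/auth.py", "node_modules/x/index.js", "README.md"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("", encoding="utf-8")
    found = [
        p.relative_to(root).as_posix()
        for p in walk_code_files(root, {".py", ".js"}, {"node_modules"})
    ]
print(found)
```

Sorting directory and file names makes the resulting list stable across runs, which helps keep node insertion order reproducible.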
Dependencies