Parent Epic
Part of #5 — Integrate Graphify for zero-cost code entity extraction
Task
Modify wiki_ingest_folder() in agent_notes/services/wiki_backend.py to automatically run Graphify extraction when code files are present and the package is available.
File
agent_notes/services/wiki_backend.py — function wiki_ingest_folder() (lines 338-441)
Current Flow
wiki_ingest_folder(folder_path)
├── Walk files, filter by extension/.gitignore/_SKIP_DIRS
├── Concatenate with "--- FILE: <rel> ---" markers
├── Chunk if > 2MB
└── Call wiki_ingest(concepts=caller_provided, entities=caller_provided)
Problem: concepts and entities are almost always None when called programmatically — the caller (LLM agent or CLI) doesn't know what's in the code yet.
New Flow
wiki_ingest_folder(folder_path)
├── Walk files, filter by extension/.gitignore/_SKIP_DIRS
├── Track has_code flag during walk
├── Concatenate with "--- FILE: <rel> ---" markers
│
├── [NEW] if has_code and graphify_available():
│   ├── extract_code_graph(folder_path, extensions, skip_dirs)
│   ├── graph_to_wiki_terms(graph_data)
│   ├── save_graph_json(wiki_root, slug, graph_data)
│   └── Merge discovered terms with caller-provided ones
│
├── Chunk if > 2MB
└── Call wiki_ingest(concepts=merged, entities=merged)
Implementation Details
Step 1: Add _CODE_EXTENSIONS constant (near line 306)
_CODE_EXTENSIONS = {
    ".py", ".ts", ".js", ".tsx", ".jsx",
    ".go", ".rs", ".java", ".cpp", ".c", ".h",
    ".rb", ".swift", ".kt", ".cs", ".scala",
    ".php", ".lua", ".groovy",
}
Step 2: Track has_code during file walk (inside the for loop, lines 364-387)
Add before the loop:
has_code = False
Inside the loop, after the extension filter passes (after line 374):
if file.suffix in _CODE_EXTENSIONS:
    has_code = True
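The flag tracking in Step 2 can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the actual body of wiki_ingest_folder(): the function name folder_has_code, the shortened CODE_EXTENSIONS set, and the use of rglob for the walk are assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Assumed subset of the _CODE_EXTENSIONS constant from Step 1.
CODE_EXTENSIONS = {".py", ".ts", ".js", ".go", ".rs"}


def folder_has_code(folder: Path, allowed_exts: set[str]) -> bool:
    """Return True if any file that passes the extension filter is a code file."""
    has_code = False
    for file in sorted(folder.rglob("*")):
        if not file.is_file() or file.suffix not in allowed_exts:
            continue  # plays the role of the existing extension filter
        if file.suffix in CODE_EXTENSIONS:
            has_code = True
    return has_code


def demo() -> tuple[bool, bool]:
    """Check a docs-only folder, then the same folder after adding a .py file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "README.md").write_text("# docs only\n")
        docs_only = folder_has_code(root, {".md", ".py"})
        (root / "main.py").write_text("print('hi')\n")
        with_code = folder_has_code(root, {".md", ".py"})
    return docs_only, with_code
```

Note that the comparison is case-sensitive, as in the snippet above it: a file named MAIN.PY would not set the flag unless the suffix is lowercased first.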
Step 3: Insert Graphify extraction block (after line 389, before line 391)
# ── Graphify auto-extraction (zero-cost entity discovery) ────────
graphify_concepts: list[str] = []
graphify_entities: list[str] = []
if has_code:
    try:
        from .code_graph import (
            graphify_available,
            extract_code_graph,
            graph_to_wiki_terms,
            save_graph_json,
        )
        if graphify_available():
            graph_data = extract_code_graph(
                folder_path,
                extensions=(allowed_exts & _CODE_EXTENSIONS) if allowed_exts != _DEFAULT_EXTENSIONS else None,
                skip_dirs=_SKIP_DIRS,
            )
            if graph_data["stats"]["nodes"] > 0:
                wiki_terms = graph_to_wiki_terms(graph_data)
                graphify_entities = wiki_terms["entities"]
                graphify_concepts = wiki_terms["concepts"]
                # Persist graph alongside raw content
                _slug_name = slug if 'slug' in dir() else _slug(title or folder_path.name)
                save_graph_json(wiki_root, _slug_name, graph_data)
    except Exception:
        pass  # Graphify failure must never break ingestion
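The block above relies on graphify_available() from the code_graph module, which this issue does not define. One possible shape for it is sketched below, assuming the package is importable under the top-level name "graphify" (an assumption for illustration) and that a cached lookup is acceptable. The helper module_available is hypothetical.

```python
import importlib.util
from functools import lru_cache


@lru_cache(maxsize=None)
def module_available(name: str) -> bool:
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def graphify_available() -> bool:
    # "graphify" as the import name is an assumption, not confirmed by the issue.
    return module_available("graphify")
```

Caching the result with lru_cache keeps repeated ingest calls from re-running the finder lookup. The trade-off is that a package installed while the process is running is not noticed until restart.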
Step 4: Merge terms before wiki_ingest() calls
Add helper function:
def _merge_unique(base: list[str], extra: list[str]) -> list[str]:
    """Merge two lists preserving order, removing duplicates (case-insensitive)."""
    seen = {x.lower() for x in base}
    result = list(base)
    for item in extra:
        if item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)
    return result
Before both wiki_ingest() calls (lines 421 and 432), merge:
merged_concepts = _merge_unique(concepts or [], graphify_concepts)
merged_entities = _merge_unique(entities or [], graphify_entities)
Then pass concepts=merged_concepts, entities=merged_entities instead of concepts=concepts, entities=entities.
Step 5: Add graph.json reference to source page (optional enhancement)
In wiki_ingest(), if a graph.json was saved, add its path to the sources list in the source page frontmatter. This is optional — the graph.json is discoverable by convention (raw/<slug>-graph.json).
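A rough sketch of what Step 5 could look like is below. It assumes the raw/ directory sits directly under wiki_root and that sources is a plain list of relative path strings. Both helper names (graph_json_path, sources_with_graph) are hypothetical and not part of the existing codebase.

```python
import tempfile
from pathlib import Path


def graph_json_path(wiki_root: Path, slug: str) -> Path:
    """Conventional location of the graph file: raw/<slug>-graph.json."""
    return wiki_root / "raw" / f"{slug}-graph.json"


def sources_with_graph(sources: list[str], wiki_root: Path, slug: str) -> list[str]:
    """Return sources plus the graph.json path, if that file exists on disk."""
    graph_path = graph_json_path(wiki_root, slug)
    if not graph_path.is_file():
        return list(sources)
    rel = graph_path.relative_to(wiki_root).as_posix()
    return list(sources) if rel in sources else [*sources, rel]


def demo() -> tuple[list[str], list[str]]:
    """Compare the sources list before and after a graph file is saved."""
    with tempfile.TemporaryDirectory() as tmp:
        wiki_root = Path(tmp)
        before = sources_with_graph(["raw/my-repo.md"], wiki_root, "my-repo")
        (wiki_root / "raw").mkdir()
        graph_json_path(wiki_root, "my-repo").write_text("{}")
        after = sources_with_graph(["raw/my-repo.md"], wiki_root, "my-repo")
    return before, after
```

Checking for the file on disk rather than passing a flag keeps wiki_ingest() independent of whether Graphify ran.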
Insertion Points (exact line references)
| What | Where | Line |
|------|-------|------|
| _CODE_EXTENSIONS constant | After _DEFAULT_EXTENSIONS | ~306 |
| has_code = False | Before for file in sorted(...) | ~363 |
| has_code = True | Inside loop, after extension check | ~375 |
| Graphify extraction block | After raw_content = "".join(parts) | ~390 |
| _merge_unique() helper | Before wiki_ingest_folder() or as module-level | ~337 |
| Merged args to wiki_ingest() (chunked) | Replace concepts=concepts | ~428 |
| Merged args to wiki_ingest() (single) | Replace concepts=concepts | ~439 |
Edge Cases
- Folder with no code files (only .md/.yaml):
has_code stays False, Graphify block skipped entirely. Zero overhead.
- Graphify not installed:
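graphify_available() returns False and the block is skipped. The only cost is the availability check itself. Python does not cache failed imports, so the check should memoize its result if it is called often.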
- Graphify extraction returns empty:
stats.nodes == 0 check skips term mapping. Falls through to original behavior.
- Graphify crashes:
except Exception: pass catches any ordinary exception (but not KeyboardInterrupt or SystemExit, which derive from BaseException). Ingestion continues without entity discovery.
- Caller provides entities AND Graphify discovers more:
_merge_unique() combines both, deduplicating case-insensitively. Caller's entities come first (higher priority).
- Very large folder (1000+ files):
extract() may take 10-30s. This is acceptable for a one-time ingest. Tree-sitter is O(n) in file size.
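The empty-result and crash cases above can be exercised with a small wrapper like the one below. It is a sketch under assumptions: discover_terms is a hypothetical helper, and the two callables stand in for extract_code_graph() and graph_to_wiki_terms(). Unlike the bare pass in Step 3, it logs the failure, which is a suggested variation and not part of the specified behavior.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def discover_terms(
    extract: Callable[[], dict],
    to_terms: Callable[[dict], dict],
) -> tuple[list[str], list[str]]:
    """Return (concepts, entities); fall back to empty lists on any failure."""
    try:
        graph_data = extract()
        if graph_data["stats"]["nodes"] == 0:
            return [], []  # empty graph: nothing to map
        terms = to_terms(graph_data)
        return terms["concepts"], terms["entities"]
    except Exception:
        # Extraction failure must never break ingestion; record it and move on.
        logger.warning("Graphify extraction failed; continuing without it", exc_info=True)
        return [], []


def _crash() -> dict:
    raise RuntimeError("parser blew up")


def _terms(graph_data: dict) -> dict:
    return {"concepts": ["Caching"], "entities": ["WikiBackend"]}


crashed = discover_terms(_crash, _terms)
empty = discover_terms(lambda: {"stats": {"nodes": 0}}, _terms)
found = discover_terms(lambda: {"stats": {"nodes": 2}}, _terms)
```

In all three cases the caller receives two lists, so the merge in Step 4 needs no special handling.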
Testing
See #11 for test specifications.
Dependencies