Update wiki-compiler and obsidian-memory skill for graph.json #11

@verkligheten

Description

Parent Epic

Part of #5 — Integrate Graphify for zero-cost code entity extraction

Task

Update agent instructions so the wiki-compiler leverages graph.json when available (skipping expensive grep-based discovery), and the obsidian-memory skill documents the folder auto-detection feature.

Files

  1. agent_notes/data/agents/wiki-compiler.md
  2. agent_notes/data/skills/obsidian-memory/SKILL.md

Changes to wiki-compiler.md

Add a new section after the existing "## Process" section:

### Pre-extracted graph (when available)

Before starting the Discover step, check if a graph file exists:

    ls raw/*-graph.json

If found, read the graph JSON to accelerate compilation:

1. **Skip Discover** — the graph already contains all code entities with their source locations.
   - `nodes[].label` = entity names (classes, functions, modules)
   - `nodes[].source_file` = which file to read
   - `nodes[].source_location` = line number (e.g., "L42")
   - `nodes[].type` = "class", "function", "module", "rationale"

2. **Use edges for relationships** — no need to grep for cross-references.
   - `edges[].relation` = "calls", "imports", "uses", "inherits", "contains"
   - `edges[].confidence` = "EXTRACTED" (deterministic from AST), "INFERRED", "AMBIGUOUS"

3. **Use communities for grouping** — Leiden algorithm clusters related entities.
   - `communities` = `{community_id: [node_ids]}`
   - `cohesion` = `{community_id: score}` (higher = tighter coupling)

4. **Use god_nodes for priority** — compile the most-connected entities first.
   - `god_nodes[].label` = entity name
   - `god_nodes[].degree` = number of connections

5. **Go directly to Read** — use `source_file` and `source_location` to read the actual code, then write the domain narrative.

This saves significant tokens: entity discovery is free (tree-sitter extracted), and you focus exclusively on semantic enrichment — writing the "why" narratives that AST parsing can't provide.
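The field layout above could be consumed roughly as follows. This is a minimal sketch, not the agent-notes implementation: it assumes `nodes`, `edges`, and `god_nodes` are top-level arrays in the graph file, that `god_nodes` can be joined to `nodes` by `label`, and that `source_location` always looks like "L42". The helper names (`parse_line`, `build_compile_plan`, `extracted_edges`, `load_first_graph`) are hypothetical.

```python
import json
from pathlib import Path


def parse_line(location):
    """Turn a source_location such as "L42" into 42; return None if absent or malformed."""
    if isinstance(location, str) and location.startswith("L") and location[1:].isdigit():
        return int(location[1:])
    return None


def build_compile_plan(graph):
    """Order entities by god_node degree (highest first) and attach where to read each one."""
    # Assumption: labels are unique enough to join god_nodes to nodes.
    by_label = {node["label"]: node for node in graph.get("nodes", [])}
    ranked = sorted(graph.get("god_nodes", []), key=lambda g: g["degree"], reverse=True)
    plan = []
    for god in ranked:
        node = by_label.get(god["label"])
        if node is None:
            continue  # god node with no matching entry in nodes[]
        plan.append({
            "label": god["label"],
            "degree": god["degree"],
            "source_file": node.get("source_file"),
            "line": parse_line(node.get("source_location")),
        })
    return plan


def extracted_edges(graph):
    """Keep only edges marked EXTRACTED, i.e. derived deterministically from the AST."""
    return [e for e in graph.get("edges", []) if e.get("confidence") == "EXTRACTED"]


def load_first_graph(raw_dir="raw"):
    """Return the first raw/*-graph.json as a dict, or None when no graph file exists."""
    matches = sorted(Path(raw_dir).glob("*-graph.json"))
    if not matches:
        return None
    return json.loads(matches[0].read_text(encoding="utf-8"))
```

Calling `build_compile_plan(load_first_graph())` when a graph is present yields the read order for the Read step; when `load_first_graph()` returns `None`, the compiler would fall back to the normal Discover step.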

**Example workflow with graph.json:**

```bash
# Read the start of the graph
head -n 100 raw/my-project-graph.json

# Identify top entities from god_nodes
# Read their source files at the specified locations
# Write concept/entity pages with domain narratives
agent-notes memory add "UserService" "..." entity wiki-compiler
```

## Changes to obsidian-memory/SKILL.md

Update the ingest workflow section to document folder auto-detection:

````markdown
### Folder ingestion with auto-extraction

When the first argument to `ingest` is a directory path, the CLI automatically:
1. Walks the directory (respects `.gitignore`; skips `__pycache__`, `.git`, and `node_modules`)
2. If graphifyy is installed: runs tree-sitter AST extraction (zero API cost)
3. Discovers code entities (classes, functions, modules) and their relationships
4. Saves extraction as `raw/<slug>-graph.json`
5. Creates entity stub pages for discovered classes and high-connectivity functions
6. Creates concept pages for detected code communities (Leiden algorithm)
7. Falls back to text-only ingestion if graphifyy is not installed

```bash
# Ingest a code project (Graphify auto-extracts if installed)
agent-notes memory ingest /path/to/project "Project summary"
agent-notes memory ingest ./src "Source code analysis"
agent-notes memory ingest ~/code/my-app

# Install graphifyy for zero-cost code extraction
# (quoted so shells such as zsh do not expand the brackets)
pip install "agent-notes[graph]"
```

After folder ingestion, run `agent-notes memory lint` to see which stub pages need compilation. The wiki-compiler can then leverage the saved graph.json to skip discovery and focus on writing domain narratives.
````


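To make the directory walk, the installed-or-not check, and the output path from the ingestion steps concrete, here is a rough sketch. It is not the CLI's actual code. `.gitignore` handling is left out, the slug rule (lowercased directory name with runs of non-alphanumerics collapsed to hyphens) is an assumption, and the function names are hypothetical. The install check looks up the `graphifyy` distribution by name, so it does not depend on knowing the package's import name.

```python
import os
import re
from importlib import metadata
from pathlib import Path

# Directories the issue says the walk skips.
SKIP_DIRS = {"__pycache__", ".git", "node_modules"}


def iter_project_files(root):
    """Yield files under root, never descending into SKIP_DIRS.

    .gitignore rules are intentionally not handled in this sketch.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk skips these subtrees entirely.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            yield Path(dirpath) / name


def graph_extraction_available(dist_name="graphifyy"):
    """Return True when the named distribution is installed, without importing it."""
    try:
        metadata.version(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False


def graph_output_path(project_dir, raw_dir="raw"):
    """Build raw/<slug>-graph.json; the slug rule used here is an assumption."""
    name = Path(project_dir).resolve().name
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "project"
    return Path(raw_dir) / f"{slug}-graph.json"
```

An ingest routine built on this would branch on `graph_extraction_available()`: run extraction and write to `graph_output_path(...)` when it returns `True`, and otherwise take the text-only path described in step 7.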
## Rationale

These are prompt changes, not code changes. They teach the LLM wiki-compiler to:
1. **Check for graph.json first** — if it exists, skip the expensive grep + read discovery loop
2. **Use structural data** — entities, relationships, and communities are already known
3. **Focus on semantics** — the LLM's value is in writing "why" narratives, not in discovering "what" exists

This is where the bulk of the LLM cost savings comes from in practice: the wiki-compiler currently spends 60-70% of its tokens on discovery (grepping, reading files to find entities), and only 30-40% on actually writing the wiki page content.
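
As a back-of-the-envelope reading of those percentages: let $T$ be the tokens a compilation run uses today and $d$ the fraction spent on discovery. Assuming, optimistically, that the graph removes the discovery cost entirely and that reading the graph JSON costs nothing, the remaining cost and the reduction factor would be:

```latex
T_{\text{new}} = (1 - d)\,T, \qquad \frac{T}{T_{\text{new}}} = \frac{1}{1 - d}

d = 0.6 \;\Rightarrow\; T_{\text{new}} = 0.4\,T,\quad \frac{T}{T_{\text{new}}} = 2.5
\qquad
d = 0.7 \;\Rightarrow\; T_{\text{new}} = 0.3\,T,\quad \frac{T}{T_{\text{new}}} \approx 3.3
```

This is an upper bound under those assumptions: the graph file itself consumes some tokens, so the real saving would be somewhat smaller.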

## Dependencies

- #8 (graph.json must be saved during ingest for these instructions to work)
