v0.1.9 — Janitor + Distiller Pipeline Rewrites

@ssdavidai released this 23 Feb 08:38

What Changed

Janitor: 3-Stage Pipeline

Replaces the monolithic single-LLM-call janitor with a focused 3-stage pipeline:

  • Stage 1: Autofix (pure Python). Deterministic fixes for missing fields, invalid types/statuses, and field type mismatches. Directory mismatches, orphans, and duplicates are flagged with a janitor_note for manual review. This stage makes no LLM calls.
  • Stage 2: Link Repair (LLM per-link). Broken wikilinks are matched against candidates found via vault search. Unambiguous matches are fixed in Python; ambiguous cases get a focused per-link LLM call with candidate context.
  • Stage 3: Enrich (LLM per-stub). Stub records are enriched using only existing vault context and, for person/org records only, verifiable public facts. Strict constraints guard against LLM-generated filler content: every fact must be traceable.
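
The Stage 2 split between "fixed in Python" and "sent to the LLM" can be sketched roughly as follows. This is a minimal illustration, not the code shipped in alfred-vault: the function names, the substring-based candidate search, and the "unresolved" outcome for links with no candidates are all assumptions made for the example.

```python
import re

# Captures the target of a wikilink, stopping at "|" (alias) or "#" (heading).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_links(text, known_titles):
    """Return wikilink targets in `text` that match no known record title."""
    targets = [m.strip() for m in WIKILINK.findall(text)]
    return [t for t in targets if t not in known_titles]


def search_candidates(target, known_titles):
    """Hypothetical stand-in for vault search: case-insensitive substring match."""
    needle = target.casefold()
    return sorted(t for t in known_titles if needle in t.casefold())


def triage_link(target, candidates):
    """Decide how one broken link is handled, given its search candidates."""
    if len(candidates) == 1:
        # Unambiguous: rewrite the link deterministically, no LLM call.
        return ("fix", candidates[0])
    if not candidates:
        # Assumption: with nothing to choose from, leave the link for review.
        return ("unresolved", None)
    # Ambiguous: hand only this link and its candidates to the LLM.
    return ("ask_llm", candidates)
```

With known titles "Acme Corp", "Acme Labs", and "Jane Doe", a link to "jane doe" would be fixed in Python, while a link to "Acme" would go to the LLM with both Acme records as candidate context.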

Distiller: Multi-Stage Pipeline + Meta-Analysis

Replaces the monolithic distiller with a 2-pass architecture:

Pass A — Per-source extraction:

  • Stage 1: Extract (LLM per-source). Analyzes one source record at a time, with keyword signal hints and compact dedup context. Each call includes only the 5 learn-type schemas instead of all 22 reference templates.
  • Stage 2: Dedup + Merge (pure Python). Merges learnings across sources using fuzzy title matching based on the overlap coefficient. Deduplicates against existing learning records at a stricter threshold.
  • Stage 3: Create (LLM per-learning). A focused per-learning call with a single type schema creates well-formed vault records.
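
For reference, the overlap coefficient of two token sets A and B is |A ∩ B| / min(|A|, |B|). The sketch below shows how Stage 2 style title matching could work with it. The tokenizer, both threshold values, and the function names are hypothetical, and "stricter" is read here as "requires a higher score"; the release does not state the actual values.

```python
import re

# Hypothetical thresholds; the real values are not given in these notes.
MERGE_THRESHOLD = 0.8      # cross-source merge within one run
EXISTING_THRESHOLD = 0.9   # stricter check against records already in the vault


def tokens(title):
    """Lowercase alphanumeric word set for a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|) over the token sets of two titles."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def merge_candidates(candidates, existing_titles):
    """Keep one title per near-duplicate group, skipping known learnings."""
    merged = []
    for title in candidates:
        if any(overlap_coefficient(title, e) >= EXISTING_THRESHOLD
               for e in existing_titles):
            continue  # already recorded in the vault
        if any(overlap_coefficient(title, m) >= MERGE_THRESHOLD
               for m in merged):
            continue  # same learning extracted from another source
        merged.append(title)
    return merged
```

Because the denominator is the smaller set, a short title whose words all appear in a longer title scores 1.0, which suits titles that differ only by an added qualifier.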

Pass B — Cross-learning meta-analysis:

  • Scans all learning records in the vault, clusters by project and type
  • LLM analyzes each cluster for contradictions between decisions, shared assumptions, and emergent syntheses
  • Creates higher-order learning records that link the reasoning graph together
  • This is the "from having things to having reasoning" layer
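
The clustering step in Pass B could look something like the sketch below. The record shape (dicts with `project`, `type`, and `title` keys), the example type values, and the rule of dropping single-record clusters are assumptions made for illustration; the notes only say that records are clustered by project and type.

```python
from collections import defaultdict


def cluster_learnings(records):
    """Group learning titles by (project, type).

    Assumption: clusters with fewer than two records are dropped, since
    finding contradictions or shared assumptions needs something to compare.
    """
    clusters = defaultdict(list)
    for rec in records:
        key = (rec.get("project"), rec.get("type"))
        clusters[key].append(rec["title"])
    return {key: titles for key, titles in clusters.items() if len(titles) >= 2}


# Hypothetical records; field names and values are illustrative only.
records = [
    {"project": "alpha", "type": "decision", "title": "Use Redis for caching"},
    {"project": "alpha", "type": "decision", "title": "Avoid external caches"},
    {"project": "alpha", "type": "assumption", "title": "Traffic stays low"},
    {"project": "beta", "type": "decision", "title": "Ship weekly"},
]
clusters = cluster_learnings(records)
```

Each surviving cluster would then be passed to one LLM call, keeping the context limited to records that can plausibly contradict or reinforce each other.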

Other

  • Saves distiller_signals on source records (visible in Obsidian)
  • Saves distiller_learnings wikilinks back to source records after extraction
  • Legacy single-call path preserved for Claude/Zo backends across all tools
  • Updated README with janitor + distiller pipeline descriptions

Install

pip install alfred-vault==0.1.9