
🔬 Smarter Fact Extraction (Import Pipeline) #218

@hurttlocker

Description


Summary

Augment the import pipeline with LLM-powered fact extraction to capture the WHY behind decisions, implicit relationships, and confidence calibration — things the rule-based extractor misses.

Priority: HIGH IMPACT — improves quality of every future import.

Spec

Modified Pipeline: internal/extract/extract.go

Current flow: file → chunk → rule-based extract → facts
New flow: file → chunk → rule-based extract → LLM enrichment (optional) → facts

func EnrichFacts(ctx context.Context, llm llm.Provider, chunk string, ruleFacts []Fact) ([]Fact, error)

What the LLM adds:

  1. Decision reasoning: "Q locked ORB config" → also extracts "because IEX volume filter was the problem"
  2. Implicit relationships: "SB needs this for Eyes Web" → links SB ↔ Eyes Web ↔ health
  3. Confidence calibration: "we might try X" (tentative, 0.4) vs "X is locked" (definitive, 0.9)
  4. Missed facts: Things the rule extractor skipped but are clearly important

LLM Prompt Strategy

  • Send: the raw chunk + the rule-extracted facts
  • Ask: "What additional facts are missing? What reasoning/relationships did the rules miss?"
  • Format: structured JSON output matching existing Fact schema
  • Dedup: compare LLM facts against rule facts before inserting (fuzzy match on subject+predicate)
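The dedup step could be sketched as below: normalize subject+predicate into a key, treat exact key matches as duplicates, and fall back to token overlap for fuzzy matching. The Jaccard measure and the 0.8 threshold are assumptions for illustration, not a decided design.

```go
package main

import (
	"fmt"
	"strings"
)

// normKey lowercases and collapses whitespace so trivially different
// phrasings of the same subject+predicate collide on the same key.
func normKey(subject, predicate string) string {
	clean := func(s string) string {
		return strings.Join(strings.Fields(strings.ToLower(s)), " ")
	}
	return clean(subject) + "|" + clean(predicate)
}

// tokenOverlap returns the Jaccard similarity of two key strings —
// a cheap stand-in for whatever fuzzy matcher the project adopts.
func tokenOverlap(a, b string) float64 {
	as, bs := strings.Fields(a), strings.Fields(b)
	inA := map[string]bool{}
	union := map[string]bool{}
	for _, t := range as {
		inA[t] = true
		union[t] = true
	}
	inter := 0
	for _, t := range bs {
		union[t] = true
		if inA[t] {
			inter++
		}
	}
	if len(union) == 0 {
		return 0
	}
	return float64(inter) / float64(len(union))
}

// isDuplicate reports whether a candidate LLM fact key is close enough
// to an existing rule fact key to be dropped (0.8 is a tunable guess).
func isDuplicate(candidate, existing string) bool {
	return candidate == existing || tokenOverlap(candidate, existing) >= 0.8
}

func main() {
	a := normKey("Q", "locked ORB config")
	b := normKey("  q ", "Locked  ORB config")
	fmt.Println(isDuplicate(a, b)) // normalization makes these identical
}
```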

CLI Integration

cortex import notes.md --extract --enrich              # rule extract + LLM enrichment
cortex import notes.md --extract --enrich --llm google/gemini-3-flash
cortex import notes.md --extract                       # unchanged (rule-based only)

Sync Integration

cortex connect sync --provider file --extract --enrich  # enriched sync

Files to Create/Modify

  • internal/extract/enrich.go — LLM enrichment logic
  • internal/extract/enrich_test.go — tests (mock LLM)
  • internal/search/prompts/enrich_facts.txt — prompt template
  • internal/extract/extract.go — wire enrichment into pipeline
  • cmd/cortex/main.go — add --enrich flag to import command
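The actual CLI framework in cmd/cortex/main.go is not shown here, so the following flag wiring is a standard-library sketch only; the `importFlags` helper and the `--enrich`-requires-`--extract` rule are illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
)

// importFlags sketches how --enrich could sit next to --extract on the
// import command. Uses only the stdlib flag package for illustration;
// the real cmd/cortex/main.go may use a different framework.
func importFlags(args []string) (extract, enrich bool, model string, files []string, err error) {
	fs := flag.NewFlagSet("import", flag.ContinueOnError)
	fs.BoolVar(&extract, "extract", false, "run rule-based fact extraction")
	fs.BoolVar(&enrich, "enrich", false, "add LLM enrichment after rule extraction")
	fs.StringVar(&model, "llm", "", "override LLM provider/model")
	if err = fs.Parse(args); err != nil {
		return
	}
	// Enrichment only makes sense on top of rule extraction (an assumed
	// constraint, consistent with the pipeline order above).
	if enrich && !extract {
		err = fmt.Errorf("--enrich requires --extract")
		return
	}
	files = fs.Args()
	return
}

func main() {
	_, enrich, model, files, err := importFlags(
		[]string{"--extract", "--enrich", "--llm", "google/gemini-3-flash", "notes.md"})
	fmt.Println(enrich, model, files, err)
}
```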

Benchmark Test Spec

Test Corpus

Use 10 real memory files from the Cortex test fixtures (or anonymized versions):

  • 3 daily notes (decisions, conversations, progress)
  • 2 MEMORY.md sections (curated facts)
  • 2 trading journal entries (technical decisions)
  • 1 agent handoff doc
  • 1 set of meeting notes
  • 1 config change log

Metrics (per file, per model)

Metric                          | Target
Latency (per chunk)             | <3s
Tokens in                       | <500
Tokens out                      | <300
Cost per import                 | <$0.01
New facts found (vs rule-only)  | ≥20% more
New fact quality (0-5 rubric)   | ≥3.0 avg
False positive rate             | <15%

Quality Rubric for New Facts

  • 5: Critical fact that rule extractor missed entirely
  • 4: Useful relationship or reasoning not in rule output
  • 3: Valid but somewhat obvious fact
  • 2: Marginally useful, borderline noise
  • 1: Duplicate of existing rule-extracted fact
  • 0: Wrong or hallucinated fact

Benchmark Script

Create scripts/benchmark_enrich.go:

  • Runs all 10 files through both models
  • Compares LLM-enriched facts vs rule-only facts
  • A human rater scores each new fact on the quality rubric
  • Outputs: new fact count, quality scores, cost, latency per model

Acceptance Criteria

  • --enrich flag works on cortex import
  • LLM enrichment is additive (never removes rule-extracted facts)
  • Dedup prevents inserting near-duplicate facts
  • Confidence calibration adjusts fact confidence scores
  • Relationship extraction creates proper subject→predicate→object triples
  • Graceful fallback on LLM error (rule-only results still saved)
  • Benchmark results documented in PR
  • All existing tests pass
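The graceful-fallback criterion amounts to this call-site pattern; `enrich` here is a deliberately failing stub standing in for a real LLM call that times out or returns malformed output.

```go
package main

import (
	"errors"
	"fmt"
)

// enrich is a stub that always fails, standing in for an LLM call.
func enrich(chunk string, ruleFacts []string) ([]string, error) {
	return nil, errors.New("llm: request timed out")
}

// importChunk shows the fallback the acceptance criteria require: on any
// enrichment error, the rule-extracted facts are still saved.
func importChunk(chunk string, ruleFacts []string) []string {
	facts, err := enrich(chunk, ruleFacts)
	if err != nil {
		fmt.Printf("enrichment failed, saving rule-only facts: %v\n", err)
		return ruleFacts
	}
	return facts
}

func main() {
	saved := importChunk("Q locked ORB config", []string{"Q locked ORB config"})
	fmt.Println(len(saved)) // rule fact survives the LLM failure
}
```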

Dependencies

Estimated Cost

$0.01-0.05 per import cycle × 8 cycles/day = $0.08-0.40/day, i.e. roughly $2.40-12.00/month.

Metadata

Labels: benchmark (performance/cost benchmarking), llm (LLM integration features), v0.9.0 (LLM-Augmented Intelligence)
