
🔬 Smarter Fact Extraction (Import Pipeline) #218

@hurttlocker

Description


Summary

Augment the import pipeline with LLM-powered fact extraction to capture the WHY behind decisions, implicit relationships, and confidence calibration — things the rule-based extractor misses.

Priority: HIGH IMPACT — improves quality of every future import.

Spec

Modified Pipeline: internal/extract/extract.go

Current flow: file → chunk → rule-based extract → facts
New flow: file → chunk → rule-based extract → LLM enrichment (optional) → facts

func EnrichFacts(ctx context.Context, llm llm.Provider, chunk string, ruleFacts []Fact) ([]Fact, error)

What the LLM adds:

  1. Decision reasoning: "Q locked ORB config" → also extracts "because IEX volume filter was the problem"
  2. Implicit relationships: "SB needs this for Eyes Web" → links SB ↔ Eyes Web ↔ health
  3. Confidence calibration: "we might try X" (tentative, 0.4) vs "X is locked" (definitive, 0.9)
  4. Missed facts: Things the rule extractor skipped but are clearly important

LLM Prompt Strategy

  • Send: the raw chunk + the rule-extracted facts
  • Ask: "What additional facts are missing? What reasoning/relationships did the rules miss?"
  • Format: structured JSON output matching existing Fact schema
  • Dedup: compare LLM facts against rule facts before inserting (fuzzy match on subject+predicate)
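The dedup step could be sketched as below: normalize subject+predicate into a key, treat exact key matches as duplicates, and fall back to token overlap for fuzzy matching. The Jaccard measure and the 0.8 threshold are assumptions for illustration, not a decided design.

```go
package main

import (
	"fmt"
	"strings"
)

// normKey lowercases and collapses whitespace so trivially different
// phrasings of the same subject+predicate collide on the same key.
func normKey(subject, predicate string) string {
	clean := func(s string) string {
		return strings.Join(strings.Fields(strings.ToLower(s)), " ")
	}
	return clean(subject) + "|" + clean(predicate)
}

// tokenOverlap returns the Jaccard similarity of two key strings —
// a cheap stand-in for whatever fuzzy matcher the project adopts.
func tokenOverlap(a, b string) float64 {
	as, bs := strings.Fields(a), strings.Fields(b)
	inA := map[string]bool{}
	union := map[string]bool{}
	for _, t := range as {
		inA[t] = true
		union[t] = true
	}
	inter := 0
	for _, t := range bs {
		union[t] = true
		if inA[t] {
			inter++
		}
	}
	if len(union) == 0 {
		return 0
	}
	return float64(inter) / float64(len(union))
}

// isDuplicate reports whether a candidate LLM fact key is close enough
// to an existing rule fact key to be dropped (0.8 is a tunable guess).
func isDuplicate(candidate, existing string) bool {
	return candidate == existing || tokenOverlap(candidate, existing) >= 0.8
}

func main() {
	a := normKey("Q", "locked ORB config")
	b := normKey("  q ", "Locked  ORB config")
	fmt.Println(isDuplicate(a, b)) // normalization makes these identical
}
```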

CLI Integration

cortex import notes.md --extract --enrich              # rule extract + LLM enrichment
cortex import notes.md --extract --enrich --llm google/gemini-3-flash
cortex import notes.md --extract                       # unchanged (rule-based only)

Sync Integration

cortex connect sync --provider file --extract --enrich  # enriched sync

Files to Create/Modify

  • internal/extract/enrich.go — LLM enrichment logic
  • internal/extract/enrich_test.go — tests (mock LLM)
  • internal/search/prompts/enrich_facts.txt — prompt template
  • internal/extract/extract.go — wire enrichment into pipeline
  • cmd/cortex/main.go — add --enrich flag to import command
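The actual CLI framework in cmd/cortex/main.go is not shown here, so the following flag wiring is a standard-library sketch only; the `importFlags` helper and the `--enrich`-requires-`--extract` rule are illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
)

// importFlags sketches how --enrich could sit next to --extract on the
// import command. Uses only the stdlib flag package for illustration;
// the real cmd/cortex/main.go may use a different framework.
func importFlags(args []string) (extract, enrich bool, model string, files []string, err error) {
	fs := flag.NewFlagSet("import", flag.ContinueOnError)
	fs.BoolVar(&extract, "extract", false, "run rule-based fact extraction")
	fs.BoolVar(&enrich, "enrich", false, "add LLM enrichment after rule extraction")
	fs.StringVar(&model, "llm", "", "override LLM provider/model")
	if err = fs.Parse(args); err != nil {
		return
	}
	// Enrichment only makes sense on top of rule extraction (an assumed
	// constraint, consistent with the pipeline order above).
	if enrich && !extract {
		err = fmt.Errorf("--enrich requires --extract")
		return
	}
	files = fs.Args()
	return
}

func main() {
	_, enrich, model, files, err := importFlags(
		[]string{"--extract", "--enrich", "--llm", "google/gemini-3-flash", "notes.md"})
	fmt.Println(enrich, model, files, err)
}
```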

Benchmark Test Spec

Test Corpus

Use 10 real memory files from the Cortex test fixtures (or anonymized versions):

  • 3 daily notes (decisions, conversations, progress)
  • 2 MEMORY.md sections (curated facts)
  • 2 trading journal entries (technical decisions)
  • 1 agent handoff doc
  • 1 set of meeting notes
  • 1 config change log

Metrics (per file, per model)

Metric                          | Target
Latency (per chunk)             | <3s
Tokens in                       | <500
Tokens out                      | <300
Cost per import                 | <$0.01
New facts found (vs rule-only)  | ≥20% more
New fact quality (0-5 rubric)   | ≥3.0 avg
False positive rate             | <15%

Quality Rubric for New Facts

  • 5: Critical fact that rule extractor missed entirely
  • 4: Useful relationship or reasoning not in rule output
  • 3: Valid but somewhat obvious fact
  • 2: Marginally useful, borderline noise
  • 1: Duplicate of existing rule-extracted fact
  • 0: Wrong or hallucinated fact

Benchmark Script

Create scripts/benchmark_enrich.go:

  • Runs all 10 files through both models
  • Compares LLM-enriched facts vs rule-only facts
  • A human rater scores each new fact on the quality rubric
  • Outputs: new fact count, quality scores, cost, latency per model

Acceptance Criteria

  • --enrich flag works on cortex import
  • LLM enrichment is additive (never removes rule-extracted facts)
  • Dedup prevents inserting near-duplicate facts
  • Confidence calibration adjusts fact confidence scores
  • Relationship extraction creates proper subject→predicate→object triples
  • Graceful fallback on LLM error (rule-only results still saved)
  • Benchmark results documented in PR
  • All existing tests pass
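The graceful-fallback criterion amounts to this call-site pattern; `enrich` here is a deliberately failing stub standing in for a real LLM call that times out or returns malformed output.

```go
package main

import (
	"errors"
	"fmt"
)

// enrich is a stub that always fails, standing in for an LLM call.
func enrich(chunk string, ruleFacts []string) ([]string, error) {
	return nil, errors.New("llm: request timed out")
}

// importChunk shows the fallback the acceptance criteria require: on any
// enrichment error, the rule-extracted facts are still saved.
func importChunk(chunk string, ruleFacts []string) []string {
	facts, err := enrich(chunk, ruleFacts)
	if err != nil {
		fmt.Printf("enrichment failed, saving rule-only facts: %v\n", err)
		return ruleFacts
	}
	return facts
}

func main() {
	saved := importChunk("Q locked ORB config", []string{"Q locked ORB config"})
	fmt.Println(len(saved)) // rule fact survives the LLM failure
}
```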

Dependencies

Estimated Cost

$0.01-0.05 per import cycle × 8 cycles/day = $0.08-0.40/day, i.e. roughly $2.40-12.00/month.

Metadata

Labels: benchmark (performance/cost benchmarking), llm (LLM integration features), v0.9.0 (LLM-Augmented Intelligence)
