Closed
Labels
benchmark, llm, v0.9.0
Description
Summary
Reclassify the 426K+ facts stuck in the catch-all kv (key-value) type bucket into proper semantic types: decision, preference, identity, relationship, temporal, state, etc.
Priority: MEDIUM IMPACT. One-time cleanup of the existing corpus plus ongoing classification for new facts.
Spec
New File: internal/extract/classify.go
    type FactClassification struct {
        FactID     int64
        OldType    string
        NewType    string
        Confidence float64
    }

    func ClassifyFacts(ctx context.Context, llm llm.Provider, facts []Fact) ([]FactClassification, error)

Classification types (existing in schema):
- decision: "ORB config locked Feb 9"
- preference: "Q prefers reasoning then conclusion"
- identity: "SB has scleritis"
- relationship: "Niot works on Eyes Web"
- temporal: "eBay token expires 2027-07-28"
- state: "ADA ML220 bot is LIVE"
- config: "Hawk model is Haiku 4.5"
- kv: generic key-value (fallback only)
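The type list above could be encoded as a closed set so that any label the LLM returns outside the schema falls back to kv. This is a minimal sketch, not the project's code: the `FactType` name, the constants, and `normalizeType` are hypothetical, and it assumes labels should be matched case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// FactType is a hypothetical string type for the schema's fact types.
type FactType string

const (
	TypeDecision     FactType = "decision"
	TypePreference   FactType = "preference"
	TypeIdentity     FactType = "identity"
	TypeRelationship FactType = "relationship"
	TypeTemporal     FactType = "temporal"
	TypeState        FactType = "state"
	TypeConfig       FactType = "config"
	TypeKV           FactType = "kv" // generic fallback
)

// knownTypes holds the types listed in the issue.
var knownTypes = map[FactType]bool{
	TypeDecision: true, TypePreference: true, TypeIdentity: true,
	TypeRelationship: true, TypeTemporal: true, TypeState: true,
	TypeConfig: true, TypeKV: true,
}

// normalizeType trims and lower-cases a label and returns it if it is a
// known type; anything else maps to the kv fallback.
func normalizeType(label string) FactType {
	t := FactType(strings.ToLower(strings.TrimSpace(label)))
	if knownTypes[t] {
		return t
	}
	return TypeKV
}

func main() {
	fmt.Println(normalizeType(" Decision "))
	fmt.Println(normalizeType("opinion"))
}
```

Mapping unknown labels to kv keeps a malformed LLM answer from introducing types the schema does not define.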
Batch Processing
- Process facts in batches of 50 (fits in context window, amortizes overhead)
- LLM receives batch + type definitions + examples
- Returns JSON array of reclassifications
- Only update facts where new type confidence > 0.8
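The batching and confidence rules above can be sketched as two small helpers. This is an illustration under assumptions: the `Fact` fields, `splitBatches`, and `filterConfident` are hypothetical names, and skipping results whose type did not change is an added assumption rather than something the issue states.

```go
package main

import "fmt"

// Fact is a hypothetical stand-in for the project's fact record.
type Fact struct {
	ID   int64
	Type string
	Text string
}

// FactClassification mirrors the struct proposed in the spec.
type FactClassification struct {
	FactID     int64
	OldType    string
	NewType    string
	Confidence float64
}

// splitBatches cuts facts into consecutive slices of at most size items.
// A non-positive size falls back to the spec's default of 50.
func splitBatches(facts []Fact, size int) [][]Fact {
	if size <= 0 {
		size = 50
	}
	var batches [][]Fact
	for start := 0; start < len(facts); start += size {
		end := start + size
		if end > len(facts) {
			end = len(facts)
		}
		batches = append(batches, facts[start:end])
	}
	return batches
}

// filterConfident keeps results whose confidence is strictly above the
// threshold and whose type actually changes (the second condition is an
// assumption of this sketch).
func filterConfident(results []FactClassification, threshold float64) []FactClassification {
	var kept []FactClassification
	for _, r := range results {
		if r.Confidence > threshold && r.NewType != r.OldType {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	batches := splitBatches(make([]Fact, 120), 50)
	fmt.Println(len(batches), len(batches[2]))

	results := []FactClassification{
		{FactID: 1, OldType: "kv", NewType: "state", Confidence: 0.93},
		{FactID: 2, OldType: "kv", NewType: "config", Confidence: 0.80},
		{FactID: 3, OldType: "kv", NewType: "kv", Confidence: 0.99},
	}
	fmt.Println(len(filterConfident(results, 0.8)))
}
```

Note that a confidence of exactly 0.8 is rejected here, matching the "> 0.8" wording in the spec.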
CLI Integration
    cortex classify --dry-run                        # show what would change
    cortex classify --llm openai/gpt-5.1-codex-mini  # reclassify all kv facts
    cortex classify --batch-size 100                 # larger batches
    cortex classify --limit 1000                     # process N facts only (for testing)

Import Integration
New facts extracted with --enrich get classified inline. Without --enrich, use existing rule-based classification (unchanged).
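For reference, the classify flags listed under CLI Integration could be parsed with Go's standard `flag` package roughly as follows. This is a sketch only: `classifyOptions` and `parseClassifyArgs` are hypothetical names, the default batch size of 50 comes from the Batch Processing section, and treating `--limit 0` as "no limit" is an assumption.

```go
package main

import (
	"flag"
	"fmt"
)

// classifyOptions is a hypothetical container for the subcommand's flags.
type classifyOptions struct {
	DryRun    bool
	Model     string
	BatchSize int
	Limit     int // 0 means no limit (assumption)
}

// parseClassifyArgs parses the arguments that follow "cortex classify".
func parseClassifyArgs(args []string) (classifyOptions, error) {
	var opts classifyOptions
	fs := flag.NewFlagSet("classify", flag.ContinueOnError)
	fs.BoolVar(&opts.DryRun, "dry-run", false, "show what would change")
	fs.StringVar(&opts.Model, "llm", "", "provider/model used for classification")
	fs.IntVar(&opts.BatchSize, "batch-size", 50, "facts per LLM request")
	fs.IntVar(&opts.Limit, "limit", 0, "process at most N facts")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.BatchSize <= 0 {
		return opts, fmt.Errorf("batch-size must be positive, got %d", opts.BatchSize)
	}
	if opts.Limit < 0 {
		return opts, fmt.Errorf("limit must not be negative, got %d", opts.Limit)
	}
	return opts, nil
}

func main() {
	opts, err := parseClassifyArgs([]string{"--dry-run", "--batch-size", "100", "--limit", "1000"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```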
Files to Create/Modify
- internal/extract/classify.go: batch classification logic
- internal/extract/classify_test.go: tests (mock LLM)
- internal/search/prompts/classify_facts.txt: prompt template with type definitions + examples
- cmd/cortex/main.go: add cortex classify subcommand
- internal/store/fact_store.go: add UpdateFactType() method
Benchmark Test Spec
Test Corpus
Sample 200 random kv facts from the live Cortex DB, manually pre-label 50 of them.
Metrics (per batch of 50, per model)
| Metric | Target |
|---|---|
| Latency (per batch) | <5s |
| Tokens in | <2000 |
| Tokens out | <1000 |
| Cost per batch | <$0.01 |
| Accuracy (vs human labels) | ≥80% |
| kv → specific type rate | ≥60% of kv facts reclassified |
| False reclassification rate | <10% |
Accuracy Rubric
- Correct: Matches human-assigned type exactly
- Acceptable: Reasonable type (e.g., labeled "config" when "state" is also valid)
- Wrong: Incorrect type that would mislead search/filtering
- Harmful: Reclassified a correctly-typed fact incorrectly
Benchmark Script
Create scripts/benchmark_classify.go:
- Samples 200 kv facts from DB
- Runs through both models in batches of 50
- Compares to pre-labeled subset
- Outputs: accuracy, type distribution before/after, cost
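The comparison and output steps above might be computed along these lines. This is a sketch under assumptions: predictions and human labels are taken to be maps from fact ID to type name, accuracy counts only exact matches (the "Correct" rubric level), and the helper names are hypothetical.

```go
package main

import "fmt"

// accuracy returns the fraction of human-labeled facts whose predicted
// type matches the label exactly. Labeled facts with no prediction count
// as misses.
func accuracy(predicted, labels map[int64]string) float64 {
	if len(labels) == 0 {
		return 0
	}
	correct := 0
	for id, want := range labels {
		if predicted[id] == want {
			correct++
		}
	}
	return float64(correct) / float64(len(labels))
}

// reclassifiedRate returns the fraction of sampled kv facts that ended up
// with a type other than kv.
func reclassifiedRate(predicted map[int64]string) float64 {
	if len(predicted) == 0 {
		return 0
	}
	moved := 0
	for _, t := range predicted {
		if t != "kv" {
			moved++
		}
	}
	return float64(moved) / float64(len(predicted))
}

// distribution counts how many facts carry each type.
func distribution(predicted map[int64]string) map[string]int {
	counts := make(map[string]int)
	for _, t := range predicted {
		counts[t]++
	}
	return counts
}

func main() {
	predicted := map[int64]string{1: "state", 2: "config", 3: "kv", 4: "decision"}
	labels := map[int64]string{1: "state", 2: "state"}

	fmt.Printf("accuracy: %.2f\n", accuracy(predicted, labels))
	fmt.Printf("kv -> specific rate: %.2f\n", reclassifiedRate(predicted))
	fmt.Println("distribution:", distribution(predicted))
}
```

Scoring the "Acceptable" rubric level would need a per-fact set of allowed types rather than a single label, which this sketch leaves out.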
Acceptance Criteria
- cortex classify subcommand works with both models
- --dry-run shows changes without applying
- Batch processing with configurable batch size
- Confidence threshold (0.8) prevents low-confidence reclassifications
- --limit flag for incremental processing
- Benchmark results documented in PR
- One-time bulk reclassification completes in <10 minutes for 426K facts
- All existing tests pass
Dependencies
- Query Expansion (Pre-Search) #216 (Query Expansion): uses internal/llm/ adapter
Estimated Cost
- One-time bulk: ~426K facts / 50 per batch = 8,520 batches × $0.01 = **$85** (one-time)
- Ongoing: pennies per import (new facts only)
- Optimization: Only process kv type facts, skip already-classified