🏷️ Auto-Classification on Import #219

@hurttlocker

Description

Summary

Reclassify the 426K+ facts stuck in the catch-all kv (key-value) type bucket into proper semantic types: decision, preference, identity, relationship, temporal, state, etc.

Priority: MEDIUM IMPACT — one-time cleanup of existing corpus + ongoing classification for new facts.

Spec

New File: internal/extract/classify.go

type FactClassification struct {
    FactID     int64
    OldType    string
    NewType    string
    Confidence float64
}

func ClassifyFacts(ctx context.Context, llm llm.Provider, facts []Fact) ([]FactClassification, error)

Classification types (existing in schema):

  • decision — "ORB config locked Feb 9"
  • preference — "Q prefers reasoning then conclusion"
  • identity — "SB has scleritis"
  • relationship — "Niot works on Eyes Web"
  • temporal — "eBay token expires 2027-07-28"
  • state — "ADA ML220 bot is LIVE"
  • config — "Hawk model is Haiku 4.5"
  • kv — generic key-value (fallback only)
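Because the type set is fixed in the schema, any label the LLM returns should be validated before writing, with unrecognized labels falling back to the generic kv bucket. A minimal sketch (the `normalizeType` helper is an illustrative name, not part of the spec):

```go
package main

import (
	"fmt"
	"strings"
)

// validTypes mirrors the classification types that already exist in the schema.
var validTypes = map[string]bool{
	"decision": true, "preference": true, "identity": true,
	"relationship": true, "temporal": true, "state": true,
	"config": true, "kv": true,
}

// normalizeType lowercases and trims an LLM-returned type and falls back
// to the generic kv bucket when the model invents a label outside the schema.
func normalizeType(t string) string {
	t = strings.ToLower(strings.TrimSpace(t))
	if validTypes[t] {
		return t
	}
	return "kv"
}

func main() {
	fmt.Println(normalizeType(" Decision ")) // decision
	fmt.Println(normalizeType("sentiment"))  // kv (not in schema)
}
```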

Batch Processing

  • Process facts in batches of 50 (fits in context window, amortizes overhead)
  • LLM receives batch + type definitions + examples
  • Returns JSON array of reclassifications
  • Only update facts where new type confidence > 0.8
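The batching and confidence-threshold steps above can be sketched as two pure helpers; `chunk` and `applyThreshold` are illustrative names under the assumption that batching happens over fact IDs:

```go
package main

import "fmt"

// FactClassification matches the struct from the spec.
type FactClassification struct {
	FactID     int64
	OldType    string
	NewType    string
	Confidence float64
}

// chunk splits ids into batches of at most n (50 in the spec), so each
// LLM call fits in the context window and amortizes prompt overhead.
func chunk(ids []int64, n int) [][]int64 {
	var out [][]int64
	for len(ids) > 0 {
		k := n
		if len(ids) < k {
			k = len(ids)
		}
		out = append(out, ids[:k])
		ids = ids[k:]
	}
	return out
}

// applyThreshold keeps only reclassifications the model is confident
// about (> 0.8 in the spec) that actually change the type.
func applyThreshold(cs []FactClassification, min float64) []FactClassification {
	var kept []FactClassification
	for _, c := range cs {
		if c.Confidence > min && c.NewType != c.OldType {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	fmt.Println(len(chunk(make([]int64, 120), 50))) // 3 batches: 50 + 50 + 20
	cs := []FactClassification{
		{FactID: 1, OldType: "kv", NewType: "decision", Confidence: 0.95},
		{FactID: 2, OldType: "kv", NewType: "state", Confidence: 0.6},
	}
	fmt.Println(len(applyThreshold(cs, 0.8))) // 1: only the 0.95 result survives
}
```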

CLI Integration

cortex classify --dry-run                              # show what would change
cortex classify --llm openai/gpt-5.1-codex-mini       # reclassify all kv facts
cortex classify --batch-size 100                       # larger batches
cortex classify --limit 1000                           # process N facts only (for testing)

Import Integration

New facts extracted with --enrich get classified inline. Without --enrich, use existing rule-based classification (unchanged).

Files to Create/Modify

  • internal/extract/classify.go β€” batch classification logic
  • internal/extract/classify_test.go β€” tests (mock LLM)
  • internal/search/prompts/classify_facts.txt β€” prompt template with type definitions + examples
  • cmd/cortex/main.go β€” add cortex classify subcommand
  • internal/store/fact_store.go β€” add UpdateFactType() method

Benchmark Test Spec

Test Corpus

Sample 200 random kv facts from the live Cortex DB and manually pre-label 50 of them.

Metrics (per batch of 50, per model)

| Metric | Target |
|---|---|
| Latency (per batch) | <5s |
| Tokens in | <2000 |
| Tokens out | <1000 |
| Cost per batch | <$0.01 |
| Accuracy (vs human labels) | ≥80% |
| kv → specific type rate | ≥60% of kv facts reclassified |
| False reclassification rate | <10% |

Accuracy Rubric

  • Correct: Matches human-assigned type exactly
  • Acceptable: Reasonable type (e.g., labeled "config" when "state" is also valid)
  • Wrong: Incorrect type that would mislead search/filtering
  • Harmful: Reclassified a correctly-typed fact incorrectly

Benchmark Script

Create scripts/benchmark_classify.go:

  • Samples 200 kv facts from DB
  • Runs through both models in batches of 50
  • Compares to pre-labeled subset
  • Outputs: accuracy, type distribution before/after, cost
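The comparison against the pre-labeled subset reduces to a small scoring function. A sketch of the exact-match accuracy metric (the "Correct" bucket of the rubric; the `accuracy` helper is an assumed name):

```go
package main

import "fmt"

// accuracy compares model-predicted types to human labels for the
// pre-labeled subset and returns the fraction that match exactly.
func accuracy(predicted, labeled map[int64]string) float64 {
	if len(labeled) == 0 {
		return 0
	}
	correct := 0
	for id, want := range labeled {
		if predicted[id] == want {
			correct++
		}
	}
	return float64(correct) / float64(len(labeled))
}

func main() {
	labeled := map[int64]string{1: "decision", 2: "state", 3: "config", 4: "temporal"}
	predicted := map[int64]string{1: "decision", 2: "state", 3: "kv", 4: "temporal"}
	fmt.Printf("%.2f\n", accuracy(predicted, labeled)) // 0.75
}
```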

Acceptance Criteria

  • cortex classify subcommand works with both models
  • --dry-run shows changes without applying
  • Batch processing with configurable batch size
  • Confidence threshold (0.8) prevents low-confidence reclassifications
  • --limit flag for incremental processing
  • Benchmark results documented in PR
  • One-time bulk reclassification completes in <10 minutes for 426K facts (at ~5s per batch across 8,520 batches, this implies running many batches concurrently)
  • All existing tests pass

Dependencies

Estimated Cost

  • One-time bulk: ~426K facts / 50 per batch ≈ 8,520 batches × $0.01 ≈ **$85** (one-time)
  • Ongoing: pennies per import (new facts only)
  • Optimization: only process kv-type facts; skip already-classified ones

Metadata

Labels

  • benchmark — Performance/cost benchmarking
  • llm — LLM integration features
  • v0.9.0 — v0.9.0 LLM-Augmented Intelligence
