taOSmd optimisations: semantic compression, memify, recall scoring, intent-aware retrieval #198

@jaylfc

Description

Summary

Implement advanced memory system optimisations inspired by SimpleMem, Cognee, LangMem, and on-device LLM research.

Optimisations (priority order)

1. Semantic Compression (SimpleMem)

Compress raw conversation into self-contained memory units with resolved coreferences and absolute timestamps. SimpleMem reports a 26.4% F1 improvement alongside a 30× token reduction.

  • Distill dialogue windows into compact facts
  • Resolve pronouns ("he" → "Jay", "it" → "taOS")
  • Add absolute timestamps instead of relative ones
  • Store compressed units alongside raw archive (zero-loss preserved)
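A minimal sketch of the compression step. The names here (`MemoryUnit`, `compress_window`) are illustrative, not existing taOSmd APIs, and the rule-based substitutions stand in for what would really be an LLM distillation prompt:

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MemoryUnit:
    """One self-contained fact distilled from a dialogue window."""
    fact: str            # coreferences resolved, timestamps absolute
    timestamp: str       # absolute ISO time, never "yesterday"
    source_turns: tuple  # indices into the raw archive (zero-loss link)

def compress_window(turns, speaker, now):
    """Toy stand-in for LLM distillation: resolve the first-person
    pronoun to the known speaker and replace 'yesterday' with an
    absolute date."""
    units = []
    for i, text in enumerate(turns):
        fact = re.sub(r"\bI\b", speaker, text)
        fact = re.sub(r"\byesterday\b",
                      (now - timedelta(days=1)).date().isoformat(),
                      fact, flags=re.IGNORECASE)
        units.append(MemoryUnit(fact=fact,
                                timestamp=now.isoformat(),
                                source_turns=(i,)))
    return units
```

Keeping `source_turns` pointing back into the raw archive is what preserves the zero-loss guarantee: the compressed unit is an index, not a replacement.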

2. Memory Memification (Cognee)

Self-improving graph that evolves over time:

  • Prune stale/unused nodes periodically
  • Strengthen frequently-accessed connections (increase confidence)
  • Reweight edges based on usage signals
  • Derive new facts from existing patterns (transitive relationships)
  • Runs as a background dreaming task
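The pruning/strengthening/derivation loop above could look roughly like this. Everything here (class name, weights, decay factor, staleness cutoff) is a hypothetical sketch of the Cognee-style idea, not an existing implementation:

```python
import time

class MemoryGraph:
    """Toy self-improving graph: edges carry a confidence weight that
    is strengthened on access and decayed/pruned by a background pass."""

    def __init__(self):
        # (src, dst) -> {"weight": float, "last_access": float}
        self.edges = {}

    def access(self, src, dst, now=None):
        """Usage signal: each access bumps the edge's confidence."""
        e = self.edges.setdefault((src, dst),
                                  {"weight": 0.1, "last_access": 0.0})
        e["weight"] = min(1.0, e["weight"] + 0.1)
        e["last_access"] = now if now is not None else time.time()

    def dream(self, now, stale_after=86400 * 30, floor=0.05):
        """Background 'dreaming' pass: decay all edges, drop stale or
        weak ones, then derive transitive facts (A->B, B->C => A->C)
        from the strong edges that survive."""
        for key in list(self.edges):
            e = self.edges[key]
            e["weight"] *= 0.9
            if e["weight"] < floor or now - e["last_access"] > stale_after:
                del self.edges[key]
        strong = [k for k, e in self.edges.items() if e["weight"] > 0.5]
        for (a, b) in strong:
            for (b2, c) in strong:
                if b == b2 and a != c and (a, c) not in self.edges:
                    self.edges[(a, c)] = {"weight": 0.3,
                                          "last_access": now}
```

Derived edges start at a deliberately low weight so that a bad transitive inference dies off in a later dreaming pass unless usage confirms it.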

3. Recall Scoring (LangMem)

Multi-signal retrieval ranking: `score = similarity × importance × recency × strength`

  • Importance: how critical is this fact (preference > trivia)
  • Recency: recently accessed/created facts rank higher
  • Strength: frequency of access/confirmation
  • Currently we only use semantic similarity
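One way to realise the product formula, assuming an exponential recency decay and a saturating strength term (both are modelling choices, not something LangMem prescribes):

```python
import math

def recall_score(similarity, importance, access_count, last_access, now,
                 half_life=86400 * 7):
    """Multi-signal ranking sketch:
    score = similarity * importance * recency * strength.
    recency halves every `half_life` seconds since last access;
    strength saturates toward 1.0 with repeated access/confirmation."""
    recency = 0.5 ** ((now - last_access) / half_life)
    strength = 1.0 - math.exp(-access_count / 3.0)
    return similarity * importance * recency * strength
```

With all four signals in [0, 1], the product stays in [0, 1] and any single weak signal (e.g. a stale fact) pulls the whole score down, which is the behaviour we want versus today's similarity-only ranking.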

4. Intent-Aware Retrieval (SimpleMem)

Classify query intent before searching:

  • Factual query → search KG first (structured facts)
  • Recent context → search archive (recent events)
  • Semantic exploration → search QMD (vector similarity)
  • Currently we search all layers regardless of intent
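A keyword heuristic can stand in for the intent classifier to show the routing shape; the route names (`kg`, `archive`, `qmd`) echo the layers named in this issue, and a real version would likely classify with the LLM instead:

```python
import re

# Layer names from this issue: KG (structured facts),
# archive (recent events), QMD (vector similarity).
ROUTES = {"factual": "kg", "recent": "archive", "semantic": "qmd"}

def classify_intent(query):
    """Tiny keyword heuristic standing in for an LLM intent classifier."""
    q = query.lower()
    if re.search(r"\b(who|what|when|where|which)\b", q):
        return "factual"
    if re.search(r"\b(recent|today|yesterday|last|latest)\b", q):
        return "recent"
    return "semantic"

def retrieve(query):
    """Route the query to one layer first instead of fanning out to all."""
    intent = classify_intent(query)
    return intent, ROUTES[intent]
```

Even a crude router is a win here: searching one layer first and falling back only on a miss avoids paying all three retrieval costs on every query.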

5. KV Cache Compression (HAN Lab)

For on-device inference:

  • StreamingLLM attention sinks for infinite-length generation
  • DuoAttention head-specific treatment
  • Would let Pi handle much longer context windows
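The StreamingLLM eviction policy reduces to "keep a few sink tokens plus a sliding window"; a position-level sketch (the `num_sinks`/`window` values are illustrative defaults, not tuned for any model):

```python
def evict_kv(cache_positions, num_sinks=4, window=1020):
    """StreamingLLM-style KV eviction sketch: always retain the first
    `num_sinks` positions (attention sinks) plus the most recent
    `window` positions; everything in between is dropped, giving a
    fixed cache size regardless of generation length."""
    if len(cache_positions) <= num_sinks + window:
        return list(cache_positions)
    return list(cache_positions[:num_sinks]) + list(cache_positions[-window:])
```

DuoAttention would refine this by applying the sink+window policy only to "streaming" heads while "retrieval" heads keep the full cache.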

Research References

Current Baseline

  • KG: 100% precision on structured queries, 13ms
  • Regex extraction: 80% on simple text, 15ms
  • LLM extraction: 72% recall, 17s/turn (Qwen3-4B on NPU)
  • Combined retrieval: 46% precision
