Summary
Implement advanced memory system optimisations inspired by SimpleMem, Cognee, LangMem, and on-device LLM research.
Optimisations (priority order)
1. Semantic Compression (SimpleMem)
Compress raw conversation into self-contained memory units with resolved coreferences and absolute timestamps. SimpleMem reports a 26.4% F1 improvement and a 30x token reduction.
- Distill dialogue windows into compact facts
- Resolve pronouns ("he" → "Jay", "it" → "taOS")
- Add absolute timestamps instead of relative ones
- Store compressed units alongside the raw archive (zero information loss)
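The compression step can be sketched as follows. This is a toy illustration with a hand-written coreference map and relative-date substitution; a real implementation would use an LLM for both. The `MemoryUnit` fields and `compress_turn` name are assumptions, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MemoryUnit:
    fact: str          # self-contained, coreference-free statement
    timestamp: str     # absolute ISO timestamp (not "yesterday")
    source_turn: int   # pointer back into the raw archive (zero-loss)

def compress_turn(text: str, coref_map: dict, now: datetime, turn: int) -> MemoryUnit:
    """Toy coreference + timestamp resolution; a real system would call an LLM."""
    # resolve pronouns ("He" -> "Jay", "it" -> "taOS")
    for pronoun, entity in coref_map.items():
        text = text.replace(pronoun, entity)
    # replace relative time references with absolute dates
    if "yesterday" in text:
        text = text.replace("yesterday", (now - timedelta(days=1)).date().isoformat())
    return MemoryUnit(fact=text, timestamp=now.isoformat(), source_turn=turn)

unit = compress_turn("He deployed it yesterday",
                     {"He": "Jay", "it": "taOS"},
                     datetime(2025, 6, 2, 9, 0), turn=42)
print(unit.fact)  # Jay deployed taOS 2025-06-01
```

The compressed unit keeps a `source_turn` pointer so the raw dialogue remains recoverable from the archive.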
2. Memory Memification (Cognee)
Self-improving graph that evolves over time:
- Prune stale/unused nodes periodically
- Strengthen frequently-accessed connections (increase confidence)
- Reweight edges based on usage signals
- Derive new facts from existing patterns (transitive relationships)
- Runs as a background dreaming task
3. Recall Scoring (LangMem)
Multi-signal retrieval ranking: score = similarity * importance * recency * strength
- Importance: how critical is this fact (preference > trivia)
- Recency: recently accessed/created facts rank higher
- Strength: frequency of access/confirmation
- Currently we only use semantic similarity
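The multiplicative score above can be sketched directly. The exponential recency decay, its half-life, and the saturating strength normalisation are illustrative choices, not LangMem's exact formulas.

```python
import math, time

def recall_score(similarity: float, importance: float,
                 last_access: float, strength: int,
                 half_life_days: float = 30.0) -> float:
    """score = similarity * importance * recency * strength."""
    age_days = (time.time() - last_access) / 86400
    recency = 0.5 ** (age_days / half_life_days)      # halves every 30 days
    strength_norm = 1 - math.exp(-strength / 5)       # saturates with access count
    return similarity * importance * recency * strength_norm

now = time.time()
fresh = recall_score(similarity=0.7, importance=0.9, last_access=now, strength=10)
stale = recall_score(similarity=0.7, importance=0.9,
                     last_access=now - 120 * 86400, strength=10)
```

With equal similarity, the fresh fact outranks the 120-day-old one, which is exactly the behaviour similarity-only retrieval cannot express.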
4. Intent-Aware Retrieval (SimpleMem)
Classify query intent before searching:
- Factual query → search KG first (structured facts)
- Recent context → search archive (recent events)
- Semantic exploration → search QMD (vector similarity)
- Currently we search all layers regardless of intent
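A routing sketch, using a keyword heuristic as a stand-in for a real intent classifier (SimpleMem would use a learned or LLM-based one). The layer names `kg`, `archive`, and `qmd` follow the list above; the route order per intent is an assumption.

```python
def classify_intent(query: str) -> str:
    """Crude keyword heuristic; a real classifier would replace this."""
    q = query.lower()
    if any(w in q for w in ("yesterday", "last", "recent", "earlier")):
        return "recent"        # -> raw archive (recent events)
    if any(w in q for w in ("when", "what is", "who", "where")):
        return "factual"       # -> knowledge graph (structured facts)
    return "semantic"          # -> QMD (vector similarity)

# per-intent search order instead of querying every layer
ROUTES = {"factual": ["kg", "qmd"],
          "recent": ["archive", "qmd"],
          "semantic": ["qmd", "kg"]}

def retrieve(query: str, stores: dict) -> list:
    for layer in ROUTES[classify_intent(query)]:
        hits = stores[layer](query)
        if hits:               # stop at the first layer that answers
            return hits
    return []
```

Early-exit routing means a factual query never pays for a vector search when the KG already answers it.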
5. KV Cache Compression (HAN Lab)
For on-device inference:
- StreamingLLM attention sinks for infinite-length generation
- DuoAttention head-specific treatment
- Would let Pi handle much longer context windows
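The StreamingLLM eviction policy is simple to state: keep the first few token positions (the attention sinks) plus a sliding window of the most recent positions, and drop everything in between. A sketch of the position-selection step, with illustrative defaults:

```python
def streaming_kv_keep(cache_len: int, num_sinks: int = 4, window: int = 512) -> list:
    """Indices of KV-cache positions to retain under StreamingLLM eviction:
    the first num_sinks positions (attention sinks) + the last window positions."""
    if cache_len <= num_sinks + window:
        return list(range(cache_len))          # nothing to evict yet
    return list(range(num_sinks)) + list(range(cache_len - window, cache_len))

keep = streaming_kv_keep(1000, num_sinks=4, window=8)
print(keep)  # [0, 1, 2, 3, 992, 993, 994, 995, 996, 997, 998, 999]
```

Because the retained set is bounded at `num_sinks + window` regardless of generation length, memory stays constant while generation length is unbounded; DuoAttention refines this by applying the window only to heads that tolerate it.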
Research References
Current Baseline
- KG: 100% precision on structured queries, 13ms
- Regex extraction: 80% on simple text, 15ms
- LLM extraction: 72% recall, 17s/turn (Qwen3-4B on NPU)
- Combined retrieval: 46% precision