Summary
Implement advanced memory system optimisations inspired by SimpleMem, Cognee, LangMem, and on-device LLM research.
Optimisations (priority order)
1. Semantic Compression (SimpleMem)
Compress raw conversation into self-contained memory units with resolved coreferences and absolute timestamps. SimpleMem reports a 26.4% F1 improvement and a 30x token reduction.
- Distill dialogue windows into compact facts
- Resolve pronouns ("he" → "Jay", "it" → "taOS")
- Add absolute timestamps instead of relative ones
- Store compressed units alongside the raw archive (zero information loss)
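The compression step can be sketched as follows. This is a toy illustration with a hand-written coreference map and relative-date substitution; a real implementation would use an LLM for both. The `MemoryUnit` fields and `compress_turn` name are assumptions, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MemoryUnit:
    fact: str          # self-contained, coreference-free statement
    timestamp: str     # absolute ISO timestamp (not "yesterday")
    source_turn: int   # pointer back into the raw archive (zero-loss)

def compress_turn(text: str, coref_map: dict, now: datetime, turn: int) -> MemoryUnit:
    """Toy coreference + timestamp resolution; a real system would call an LLM."""
    # resolve pronouns ("He" -> "Jay", "it" -> "taOS")
    for pronoun, entity in coref_map.items():
        text = text.replace(pronoun, entity)
    # replace relative time references with absolute dates
    if "yesterday" in text:
        text = text.replace("yesterday", (now - timedelta(days=1)).date().isoformat())
    return MemoryUnit(fact=text, timestamp=now.isoformat(), source_turn=turn)

unit = compress_turn("He deployed it yesterday",
                     {"He": "Jay", "it": "taOS"},
                     datetime(2025, 6, 2, 9, 0), turn=42)
print(unit.fact)  # Jay deployed taOS 2025-06-01
```

The compressed unit keeps a `source_turn` pointer so the raw dialogue remains recoverable from the archive.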
2. Memory Memification (Cognee)
Self-improving graph that evolves over time:
- Prune stale/unused nodes periodically
- Strengthen frequently-accessed connections (increase confidence)
- Reweight edges based on usage signals
- Derive new facts from existing patterns (transitive relationships)
- Runs as a background dreaming task
3. Recall Scoring (LangMem)
Multi-signal retrieval ranking: score = similarity * importance * recency * strength
- Importance: how critical is this fact (preference > trivia)
- Recency: recently accessed/created facts rank higher
- Strength: frequency of access/confirmation
- Currently we only use semantic similarity
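The multiplicative score above can be sketched directly. The exponential recency decay, its half-life, and the saturating strength normalisation are illustrative choices, not LangMem's exact formulas.

```python
import math, time

def recall_score(similarity: float, importance: float,
                 last_access: float, strength: int,
                 half_life_days: float = 30.0) -> float:
    """score = similarity * importance * recency * strength."""
    age_days = (time.time() - last_access) / 86400
    recency = 0.5 ** (age_days / half_life_days)      # halves every 30 days
    strength_norm = 1 - math.exp(-strength / 5)       # saturates with access count
    return similarity * importance * recency * strength_norm

now = time.time()
fresh = recall_score(similarity=0.7, importance=0.9, last_access=now, strength=10)
stale = recall_score(similarity=0.7, importance=0.9,
                     last_access=now - 120 * 86400, strength=10)
```

With equal similarity, the fresh fact outranks the 120-day-old one, which is exactly the behaviour similarity-only retrieval cannot express.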
4. Intent-Aware Retrieval (SimpleMem)
Classify query intent before searching:
- Factual query → search KG first (structured facts)
- Recent context → search archive (recent events)
- Semantic exploration → search QMD (vector similarity)
- Currently we search all layers regardless of intent
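A routing sketch, using a keyword heuristic as a stand-in for a real intent classifier (SimpleMem would use a learned or LLM-based one). The layer names `kg`, `archive`, and `qmd` follow the list above; the route order per intent is an assumption.

```python
def classify_intent(query: str) -> str:
    """Crude keyword heuristic; a real classifier would replace this."""
    q = query.lower()
    if any(w in q for w in ("yesterday", "last", "recent", "earlier")):
        return "recent"        # -> raw archive (recent events)
    if any(w in q for w in ("when", "what is", "who", "where")):
        return "factual"       # -> knowledge graph (structured facts)
    return "semantic"          # -> QMD (vector similarity)

# per-intent search order instead of querying every layer
ROUTES = {"factual": ["kg", "qmd"],
          "recent": ["archive", "qmd"],
          "semantic": ["qmd", "kg"]}

def retrieve(query: str, stores: dict) -> list:
    for layer in ROUTES[classify_intent(query)]:
        hits = stores[layer](query)
        if hits:               # stop at the first layer that answers
            return hits
    return []
```

Early-exit routing means a factual query never pays for a vector search when the KG already answers it.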
5. KV Cache Compression (HAN Lab)
For on-device inference:
- StreamingLLM attention sinks for infinite-length generation
- DuoAttention head-specific treatment
- Would let Pi handle much longer context windows
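The StreamingLLM eviction policy is simple to state: keep the first few token positions (the attention sinks) plus a sliding window of the most recent positions, and drop everything in between. A sketch of the position-selection step, with illustrative defaults:

```python
def streaming_kv_keep(cache_len: int, num_sinks: int = 4, window: int = 512) -> list:
    """Indices of KV-cache positions to retain under StreamingLLM eviction:
    the first num_sinks positions (attention sinks) + the last window positions."""
    if cache_len <= num_sinks + window:
        return list(range(cache_len))          # nothing to evict yet
    return list(range(num_sinks)) + list(range(cache_len - window, cache_len))

keep = streaming_kv_keep(1000, num_sinks=4, window=8)
print(keep)  # [0, 1, 2, 3, 992, 993, 994, 995, 996, 997, 998, 999]
```

Because the retained set is bounded at `num_sinks + window` regardless of generation length, memory stays constant while generation length is unbounded; DuoAttention refines this by applying the window only to heads that tolerate it.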
Research References
Current Baseline
- KG: 100% precision on structured queries, 13ms
- Regex extraction: 80% on simple text, 15ms
- LLM extraction: 72% recall, 17s/turn (Qwen3-4B on NPU)
- Combined retrieval: 46% precision