
Add benchmark suite for memory quality evaluation #54

@rohitg00

Description


Problem

The project has no published benchmarks or evaluation scores, so there is no way to demonstrate its quality against alternative memory systems. As of 2026, published benchmark results are the primary credibility signal for memory systems.

Proposed Solution

  1. Implement evaluation scripts against standard benchmarks (LOCOMO, LongMemEval, AMA-Bench)
  2. Create an internal eval harness that measures:
    • Retrieval precision/recall at k
    • Compression quality scores (reusing the existing `scoreCompression`)
    • End-to-end accuracy (store → retrieve → use in context)
    • Latency percentiles (P50, P95, P99)
  3. Add benchmark results to README
  4. Run benchmarks in CI to catch regressions
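A minimal sketch of the retrieval metrics from step 2. It assumes each evaluation query carries the ranked memory IDs returned by retrieval and a labelled set of relevant IDs. The `EvalQuery` type and all function names are hypothetical and are not existing APIs of the project.

```typescript
// Hypothetical shape of one labelled evaluation query; the issue does not
// specify how retrieval output or ground truth is stored.
type EvalQuery = {
  retrievedIds: string[];   // memory IDs returned by retrieval, best match first
  relevantIds: Set<string>; // ground-truth memory IDs that should be retrieved
};

// Count distinct relevant items among the top-k retrieved IDs.
function hitsAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (!Number.isInteger(k) || k <= 0) throw new Error("k must be a positive integer");
  return retrieved
    .slice(0, k)
    // keep only the first occurrence of each ID so repeats are not double-counted
    .filter((id, i, arr) => arr.indexOf(id) === i && relevant.has(id)).length;
}

// Precision@k: relevant hits divided by k (returning fewer than k results counts as misses).
function precisionAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  return hitsAtK(retrieved, relevant, k) / k;
}

// Recall@k: relevant hits divided by the total number of relevant items.
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) throw new Error("recall is undefined without relevant items");
  return hitsAtK(retrieved, relevant, k) / relevant.size;
}

// Macro-average both metrics over queries, skipping queries with no relevant items.
function meanAtK(queries: EvalQuery[], k: number): { precision: number; recall: number } {
  const scored = queries.filter((q) => q.relevantIds.size > 0);
  if (scored.length === 0) throw new Error("no queries with relevant items");
  let precision = 0;
  let recall = 0;
  for (const q of scored) {
    precision += precisionAtK(q.retrievedIds, q.relevantIds, k);
    recall += recallAtK(q.retrievedIds, q.relevantIds, k);
  }
  return { precision: precision / scored.length, recall: recall / scored.length };
}
```

Dividing precision by `k` rather than by the number of results returned penalises a retriever that returns too few memories. Excluding queries without labels keeps recall well-defined. Both are conventions, and other choices are equally valid if they are documented next to the published numbers.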
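The latency percentiles in step 2 and the CI regression check in step 4 could fit together as follows. This sketch uses the nearest-rank percentile definition. The baseline comparison, the `tolerance` parameter, and all names are assumptions about how a CI gate might work; the issue does not specify them.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. No interpolation, so the result is always a real sample.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (!(p > 0 && p <= 100)) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs give an exact rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

type LatencySummary = { p50: number; p95: number; p99: number };

function summarizeLatency(samplesMs: number[]): LatencySummary {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}

// Hypothetical regression gate: names of the percentiles that are slower than
// the stored baseline by more than `tolerance` (e.g. 0.1 means 10%).
function findRegressions(
  current: LatencySummary,
  baseline: LatencySummary,
  tolerance: number,
): string[] {
  const keys: (keyof LatencySummary)[] = ["p50", "p95", "p99"];
  return keys.filter((key) => current[key] > baseline[key] * (1 + tolerance));
}
```

A CI job might load a baseline summary committed to the repository, run the benchmark, and fail when `findRegressions` returns a non-empty list. With fewer than 100 samples, the nearest-rank P99 is simply the maximum, so tail-latency numbers need a reasonably large sample count to be meaningful.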

Impact

Establishes credibility and enables data-driven optimization decisions.

Metadata


• Assignees: No one assigned
• Labels: enhancement (New feature or request), high (High priority)
• Projects: None
• Milestone: None
• Relationships: None yet
• Development: No branches or pull requests
