v1.4.26 — Temporal Reasoning + LongMemEval Benchmarks

@sachitrafa released this 25 May 18:31

What's new

Temporal reasoning boost

Memory retrieval now understands time expressions. When you ask "what did we discuss recently?" or "what changed last week?", the system resolves the expression to a concrete date window and adds 0.25 to the score of every memory whose created_at falls inside it. Memories inside the window can then outrank older memories whose base score is higher by less than 0.25.

Supported expressions: recently, last week, yesterday, this month, last N days, and more.
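
As a rough illustration of the window-plus-boost idea, the sketch below resolves a few of the listed expressions to a date window and adds 0.25 to scores inside it. The function names, the seven-day reading of "recently", and the Monday-based "last week" are assumptions made for this example, not the rules implemented in src/services/temporal.py.

```python
import re
from datetime import datetime, timedelta

BOOST = 0.25  # boost value stated in these release notes


def resolve_window(expression, now):
    """Return a (start, end) datetime window, or None if unrecognized."""
    expr = expression.lower().strip()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if expr == "yesterday":
        return midnight - timedelta(days=1), midnight
    if expr == "recently":
        return now - timedelta(days=7), now  # assumption: "recently" = 7 days
    if expr == "last week":
        # Assumption: the previous Monday-to-Monday calendar week.
        this_monday = midnight - timedelta(days=midnight.weekday())
        return this_monday - timedelta(days=7), this_monday
    if expr == "this month":
        return midnight.replace(day=1), now
    match = re.fullmatch(r"last (\d+) days", expr)
    if match:
        return now - timedelta(days=int(match.group(1))), now
    return None


def boosted_score(base_score, created_at, window):
    """Add BOOST when created_at falls inside the half-open window."""
    if window is not None and window[0] <= created_at < window[1]:
        return base_score + BOOST
    return base_score


now = datetime(2026, 5, 25, 18, 31)
window = resolve_window("last 3 days", now)
older = boosted_score(0.60, datetime(2026, 5, 1), window)    # outside window
recent = boosted_score(0.45, datetime(2026, 5, 24), window)  # inside window
print(recent > older)  # True
```

In this example the recent memory starts 0.15 below the older one and ends 0.10 above it, which is the reordering the boost is meant to produce.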

yourmemory-ask — query memory from the terminal

No Claude API call. No MCP client. Just ask directly against the local server:

yourmemory-ask "what database does this project use?"
yourmemory-ask "what did we decide about authentication?"

Streams the answer back to stdout. Useful for scripts, debugging, or quickly checking what's stored before starting a session.

Memory dashboard — http://localhost:3033/ui

A local web UI that starts automatically alongside the MCP server. Shows all stored memories with:

  • Strength bars — visualizes Ebbinghaus decay in real time (strong → fading → critical)
  • Category filter — browse by fact, strategy, assumption, failure
  • Sort by strength or recency
  • Per-agent tabs — see memories scoped to each connected agent separately

No setup needed. Open it while Claude is running to watch memories form and decay.
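
The strength bars can be read against the Ebbinghaus forgetting curve, in which retention decays exponentially with elapsed time. The sketch below uses the textbook form R = exp(-t / S). The stability parameter and the 0.6 and 0.3 cutoffs for the three labels are hypothetical values chosen for illustration; the dashboard's actual formula and thresholds may differ.

```python
import math


def retention(days_elapsed, stability_days):
    """Textbook forgetting curve: R = exp(-t / S)."""
    return math.exp(-days_elapsed / stability_days)


def strength_label(value, fading_below=0.6, critical_below=0.3):
    """Map a retention value to a label. Cutoffs are hypothetical."""
    if value < critical_below:
        return "critical"
    if value < fading_below:
        return "fading"
    return "strong"


# With an assumed stability of 10 days, a memory moves through all three states.
for days in (0, 10, 30):
    value = retention(days, stability_days=10)
    print(days, round(value, 3), strength_label(value))
```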

created_at support in store_memory

Pass a timestamp when storing a memory — useful for backfilling historical context or seeding a database with past decisions.

store_memory(content="Switched to BM25 + vector hybrid retrieval", created_at="2026-05-17")

Works across all three backends: SQLite, DuckDB, and Postgres.
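
For a backfill, it can help to validate dates before issuing any store_memory calls. The sketch below turns (content, date) pairs into argument dictionaries using only the two parameters shown above. The helper name build_backfill_args is hypothetical, and the assumption that created_at takes an ISO date string rests solely on the "2026-05-17" example; other accepted formats are not documented here.

```python
from datetime import date


def build_backfill_args(decisions):
    """Turn (content, ISO date) pairs into store_memory argument dicts."""
    calls = []
    for content, day in decisions:
        date.fromisoformat(day)  # raises ValueError on a malformed date
        calls.append({"content": content, "created_at": day})
    return calls


history = [
    ("Switched to BM25 + vector hybrid retrieval", "2026-05-17"),
]
for args in build_backfill_args(history):
    print(args)  # pass these as keyword arguments to store_memory
```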

noGraph ablation flag

The new noGraph: true option on the retrieve endpoint skips entity graph expansion and returns BM25 + vector results only. Useful for benchmarking, debugging, or low-latency use cases.
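
A request carrying the flag might be built as in the sketch below. Only the noGraph name and its boolean value come from these notes. The URL path, the port for the retrieve endpoint, the use of POST, and the "query" field are placeholders assumed for the example, so check the server's API before relying on them. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: placeholder URL; the real retrieve path and port may differ.
RETRIEVE_URL = "http://localhost:3033/retrieve"


def build_retrieve_request(query, no_graph=False):
    """Build (but do not send) a JSON POST request for the retrieve endpoint."""
    payload = {"query": query}  # assumption: hypothetical field name
    if no_graph:
        payload["noGraph"] = True  # flag name taken from the release notes
    return urllib.request.Request(
        RETRIEVE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_retrieve_request("what database does this project use?", no_graph=True)
print(request.data.decode("utf-8"))
```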


Benchmarks

LongMemEval-S — 89.4% Recall@5

500 questions, ~53 distractor sessions each. Full breakdown by question type:

Question type              Recall@5
single-session-user        72.9%
single-session-assistant   98.2%
multi-session              95.5%
temporal-reasoning         84.2%
knowledge-update           90.0%

Script: benchmarks/longmemeval_temporal.py
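
For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@k: a question counts as a hit when at least one gold item appears among the top k retrieved results, and the score is the fraction of questions that hit. This "any gold item" definition and the function names are assumptions for illustration; the benchmark script may score differently.

```python
def hit_at_k(ranked_ids, gold_ids, k=5):
    """True when any gold item appears in the top k retrieved ids."""
    return any(item in gold_ids for item in ranked_ids[:k])


def mean_recall_at_k(results, k=5):
    """Fraction of (ranked_ids, gold_ids) pairs that score a hit."""
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in results)
    return hits / len(results)


# Toy data: the first question hits at rank 3, the second misses at rank 6.
toy_results = [
    (["a", "b", "c"], {"c"}),
    (["x", "y", "z", "w", "v", "gold"], {"gold"}),
]
print(mean_recall_at_k(toy_results))  # 0.5
```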

Temporal boost ablation — honest finding

The +0.25 boost adds 0 percentage points on LongMemEval's temporal-reasoning questions. Those questions are event-anchored ("when did X happen?"), not window-anchored ("recently"). The boost is designed for the latter: real assistant queries where the user says "recently" or "last week". Full write-up in BENCHMARKS.md.


Install / upgrade

pip install --upgrade yourmemory
yourmemory-setup

Then open http://localhost:3033/ui to see the dashboard.


Full changelog

  • feat temporal reasoning boost in src/services/temporal.py
  • feat yourmemory-ask CLI — query memory without Claude, streams to stdout
  • feat memory dashboard at /ui — strength bars, decay visualization, agent tabs
  • feat created_at field on store_memory MCP tool + all three DB backends
  • feat noGraph flag on retrieve endpoint for BM25+vector-only mode
  • docs LongMemEval benchmark results published to README and BENCHMARKS.md
  • docs README overhaul — Table of Contents, all badges, cleaner install flow
  • chore version bump 1.4.25 → 1.4.26, published to PyPI