Candidate Pool | Intent Plan | Memory Chain | Active Memory | SFT + RL
MemChain is a read-time memory policy for long-dialogue agents. Given a user question and a candidate memory pool, it produces an explicit memory trace before the answer model is called:
- 🧭 intent_plan: what evidence the question requires.
- 🧩 memory_actions: which memories to keep, drop, refine, merge, or stop on.
- 🔗 memory_chain: ordered evidence steps with memory citations.
- 📝 active_memories: compact answer-ready memories passed to a frozen answer model.
The candidate-memory pool is answer-blind: it uses the question and dialogue provenance, not the gold answer. This keeps retrieval, policy learning, and answer generation separated and auditable.
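As an illustration of the answer-blind contract, the sketch below strips answer-bearing fields from a sample before anything retrieval-related sees it. The helper name `answer_blind_view` and the `ANSWER_FIELDS` set are hypothetical and are not taken from the repository; only the `gold_answer` field name comes from the policy input format described in this README.

```python
from typing import Any

# Assumption: "gold_answer" is the only answer-bearing field in a sample.
ANSWER_FIELDS = {"gold_answer"}


def answer_blind_view(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of the sample without answer-bearing fields."""
    return {key: value for key, value in sample.items() if key not in ANSWER_FIELDS}


# Hypothetical sample row, for illustration only.
sample = {
    "sample_id": "s1",
    "question": "Where did Ana move?",
    "gold_answer": "Lisbon",
}
view = answer_blind_view(sample)
```

Candidate construction would then receive `view` rather than `sample`, so the gold answer cannot influence which memories enter the pool.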
📦 Install
git clone https://github.com/mayiwen0212/MemChain.git
cd MemChain
pip install -e ".[dev]"
pytest -q
🧠 Build a candidate pool and run the policy
python scripts/build_memory_pool.py \
--input examples/minimal_dialogue.jsonl \
--output examples/out/minimal_candidates.jsonl \
--max-candidates 8 \
--no-dense
python scripts/run_heuristic_policy.py \
--input examples/out/minimal_candidates.jsonl \
--output examples/out/minimal_policy_outputs.jsonl \
--keep-k 3
🔍 Inspect one MemChain policy output
head -n 1 examples/out/minimal_policy_outputs.jsonl | python -m json.tool | head -80
The example is compact, but it follows the same public contract as the full experiments: normalized dialogue history, answer-blind candidate construction, structured policy output, and active-memory composition.
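For inspection from Python instead of the shell, a minimal sketch along these lines could summarize one policy row. The function `summarize_policy_row` is hypothetical, the sample row is made up, and the `DROP` action label is an assumption (this README only shows `KEEP` explicitly).

```python
import json
from collections import Counter


def summarize_policy_row(row: dict) -> dict:
    """Count actions, chain steps, and active memories in one policy output row."""
    actions = Counter(a.get("action", "UNKNOWN") for a in row.get("memory_actions", []))
    return {
        "intent": row.get("intent_plan", {}).get("intent"),
        "action_counts": dict(actions),
        "chain_steps": len(row.get("memory_chain", [])),
        "active_memories": len(row.get("active_memories", [])),
    }


# Made-up row in the documented output shape; "DROP" is an assumed label.
line = json.dumps({
    "intent_plan": {"intent": "fact_lookup"},
    "memory_actions": [
        {"action": "KEEP", "memory_id": "m1"},
        {"action": "DROP", "memory_id": "m2"},
    ],
    "memory_chain": [{"step_id": "c1", "memory_ids": ["m1"]}],
    "active_memories": ["Ana moved to Lisbon in May."],
})
summary = summarize_policy_row(json.loads(line))
```

To apply it to the real output, read the first line of `examples/out/minimal_policy_outputs.jsonl` and pass it through `json.loads` in place of `line`.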
MemChain follows this read-time pipeline:
- Convert long dialogue history into provenance-preserving candidate memories.
- Infer the evidence need from the question.
- Build a bounded candidate pool with lexical, entity, temporal, and multi-hop retrieval views.
- Produce a structured MemChain trace with actions and cited chain steps.
- Pass only question + active_memories to the answer model.
This design makes memory use explicit instead of silently stuffing raw retrieved notes into the answer context.
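The pipeline above can be sketched in a few lines. This is a toy illustration only: it uses a single lexical-overlap view instead of the lexical, entity, temporal, and multi-hop views, and the names `build_candidate_pool` and `compose_answer_prompt`, the memory dict layout, and the prompt format are all assumptions rather than the repository's API.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_candidate_pool(question: str, memories: list[dict], max_candidates: int = 8) -> list[dict]:
    """Keep memories sharing at least one token with the question, bounded in size."""
    q_tokens = tokenize(question)
    scored = []
    for memory in memories:
        overlap = len(q_tokens & tokenize(memory["text"]))
        if overlap > 0:
            scored.append((overlap, memory))
    # Stable sort: ties keep their original dialogue order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:max_candidates]]


def compose_answer_prompt(question: str, active_memories: list[str]) -> str:
    """The answer model sees only the question and the active memories."""
    lines = ["Memories:"]
    lines += [f"- {text}" for text in active_memories]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up memories that keep their session provenance.
memories = [
    {"memory_id": "m1", "session_id": "s1", "text": "Ana moved to Lisbon in May."},
    {"memory_id": "m2", "session_id": "s2", "text": "Ben adopted a cat."},
    {"memory_id": "m3", "session_id": "s3", "text": "Ana started a new job in Lisbon."},
]
question = "Where did Ana move?"
pool = build_candidate_pool(question, memories, max_candidates=2)
prompt = compose_answer_prompt(question, [pool[0]["text"]])
```

Here the policy step is reduced to "keep the top candidate"; in MemChain that decision is made by the structured trace, and the unrelated memory about Ben never reaches the answer prompt.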
The repository includes a complete compact benchmark file:
examples/minimal_dialogue.jsonl
It contains:
- three timestamped dialogue sessions,
- two QA pairs,
- answer-session provenance,
- one fact lookup question and one temporal tracking question.
After running the quick-start commands, the generated policy rows contain:
{
"intent_plan": {"intent": "fact_lookup", "...": "..."},
"memory_actions": [{"action": "KEEP", "memory_id": "..."}],
"memory_chain": [{"step_id": "c1", "memory_ids": ["..."]}],
"active_memories": ["..."]
}
The MemChain policy is trained with an SFT-to-RL workflow. The intended setup, which we used in our experiments, is 8 x H200 GPUs. The released repository keeps the method-facing modules, schema, memory-pool construction, policy IO, reward utilities, metrics, and smoke tests.
Private dataset paths, model checkpoints, API keys, service endpoints, raw benchmark dumps, and machine-specific launch scripts are intentionally not included.
memchain/
  data/benchmarks/base.py        # normalized dialogue/session/QA dataclasses
  memory_pool/intent_guided.py   # answer-blind candidate memory pool
  schema.py                      # public MemChain data schema
  framework.py                   # framework wrapper and active-memory composer
  prompts.py                     # policy and teacher prompts
  policy_io.py                   # policy JSON parsing and SFT row export
  reward.py                      # active-memory reward utility
  metrics.py                     # trace and active-memory metrics
  llm/                           # optional OpenAI-compatible clients
scripts/
  build_memory_pool.py           # build candidate pools from normalized JSONL
  run_heuristic_policy.py        # deterministic policy sanity check
examples/
  minimal_dialogue.jsonl         # runnable open-source example
tests/
  test_core.py
Input data should be normalized as dialogues with sessions and QA pairs. The
core dataclasses are in memchain/data/benchmarks/base.py.
Each policy input uses:
- sample_id
- question
- candidate_memories
- optional gold_answer
- optional metadata
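A policy input row with these fields could be modeled roughly as follows. The `PolicyInput` dataclass and its `to_jsonl` method are hypothetical and are not the classes in `memchain/schema.py`; the field types are assumptions, since this README only lists the field names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class PolicyInput:
    """Hypothetical container mirroring the documented policy input fields."""

    sample_id: str
    question: str
    candidate_memories: list[dict[str, Any]]
    gold_answer: Optional[str] = None  # optional; assumed to be a string
    metadata: dict[str, Any] = field(default_factory=dict)  # optional

    def to_jsonl(self) -> str:
        """Serialize to one JSON line, omitting unset optional fields."""
        row = asdict(self)
        if row["gold_answer"] is None:
            del row["gold_answer"]
        if not row["metadata"]:
            del row["metadata"]
        return json.dumps(row, ensure_ascii=False)


# Made-up example row.
policy_input = PolicyInput(
    sample_id="s1",
    question="Where did Ana move?",
    candidate_memories=[{"memory_id": "m1", "text": "Ana moved to Lisbon in May."}],
)
encoded = policy_input.to_jsonl()
```

Omitting the unset optional fields keeps inference-time rows free of a `gold_answer` key altogether.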
Each policy output uses:
- intent_plan
- memory_actions
- memory_chain
- active_memories
- sufficiency
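A lightweight structural check on a policy output might look like the sketch below. `check_policy_output` is a hypothetical helper, not part of the repository; the rule that every cited memory ID must exist in the candidate pool is an assumption about what "memory citations" should satisfy, and the `sufficiency` value shown is a placeholder because its type is not specified here.

```python
REQUIRED_KEYS = (
    "intent_plan",
    "memory_actions",
    "memory_chain",
    "active_memories",
    "sufficiency",
)


def check_policy_output(row: dict, candidate_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the row passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in row]
    for step in row.get("memory_chain", []):
        for memory_id in step.get("memory_ids", []):
            # Assumption: chain steps may only cite memories from the candidate pool.
            if memory_id not in candidate_ids:
                problems.append(
                    f"step {step.get('step_id')} cites unknown memory {memory_id}"
                )
    return problems


# Made-up rows for illustration.
good_row = {
    "intent_plan": {"intent": "fact_lookup"},
    "memory_actions": [{"action": "KEEP", "memory_id": "m1"}],
    "memory_chain": [{"step_id": "c1", "memory_ids": ["m1"]}],
    "active_memories": ["Ana moved to Lisbon in May."],
    "sufficiency": True,  # placeholder value; the real type is not documented here
}
bad_row = {
    "intent_plan": {"intent": "fact_lookup"},
    "memory_actions": [],
    "memory_chain": [{"step_id": "c1", "memory_ids": ["m9"]}],
    "active_memories": [],
}
good_problems = check_policy_output(good_row, {"m1", "m2"})
bad_problems = check_policy_output(bad_row, {"m1", "m2"})
```

A check of this kind can run before active memories are handed to the answer model, so malformed traces are caught early.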
This repository contains the MemChain core implementation for open-source inspection and extension. It does not include third-party comparison code, benchmark raw data, trained checkpoints, evaluation outputs, paper drafts, private API keys, private endpoints, or machine-specific paths.
@misc{memchain2026,
title = {MemChain: Learning Interpretable Memory Traces for Memory-Augmented LLM Agents},
author = {Ma, Yiwen},
year = {2026},
note = {Open-source code release}
}
This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank the following projects and teams:
- 🔍 Embedding Backend: Sentence Transformers and compatible embedding models, such as Qwen3-Embedding, for optional dense candidate-memory search.
- 🧮 Retrieval Core: NumPy-backed in-memory dense scoring plus MemChain's BM25, entity, temporal, and feedback retrieval views for provenance-grounded candidate pools.
- 📊 Benchmark: LoCoMo - long-context memory evaluation framework.
- 📚 Benchmark: LongMemEval - long-range memory evaluation benchmark for conversational agents.
- ⚖️ Evaluation Judge: LoCoMo-Refined by Memorax AI - refined judge support for more reliable long-memory answer evaluation.
