GitHub - mayiwen0212/MemChain: Official implementation of MemChain for interpretable memory reasoning in memory-augmented LLMs.

Learning Interpretable Memory Traces for Memory-Augmented LLM Agents

Candidate Pool · Intent Plan · Memory Chain · Active Memory · SFT + RL

✨ Highlights

MemChain is a read-time memory policy for long-dialogue agents. Given a user question and a candidate memory pool, it produces an explicit memory trace before the answer model is called:

🧭 intent_plan: what evidence the question requires.
🧩 memory_actions: which memories to keep, drop, refine, merge, or stop on.
🔗 memory_chain: ordered evidence steps with memory citations.
📝 active_memories: compact answer-ready memories passed to a frozen answer model.

The candidate-memory pool is answer-blind: it uses the question and dialogue provenance, not the gold answer. This keeps retrieval, policy learning, and answer generation separated and auditable.
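One way to picture the answer-blind constraint is a guard that removes answer fields from a sample before any pool construction sees it. The sketch below is illustrative only: `strip_answer_fields` and `ANSWER_FIELDS` are hypothetical names, not part of the MemChain package, and the sample content is invented. The only field name taken from this README is `gold_answer`, which the Data Interface section lists as optional.

```python
from typing import Any

# Assumption: "gold_answer" is the only answer-bearing field in a sample.
ANSWER_FIELDS = frozenset({"gold_answer"})


def strip_answer_fields(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of the sample without answer-bearing fields."""
    return {key: value for key, value in sample.items() if key not in ANSWER_FIELDS}


# Invented toy sample, for illustration only.
sample = {
    "sample_id": "demo-1",
    "question": "Where did Ana move?",
    "gold_answer": "Lisbon",
}
blind_sample = strip_answer_fields(sample)
print(sorted(blind_sample))
```

Anything that builds or scores candidates would then receive `blind_sample`, so the gold answer cannot leak into retrieval.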

🚀 Quick Start

📦 Install

git clone https://github.com/mayiwen0212/MemChain.git
cd MemChain
pip install -e ".[dev]"
pytest -q

🧠 Build a candidate pool and run the policy

python scripts/build_memory_pool.py \
  --input examples/minimal_dialogue.jsonl \
  --output examples/out/minimal_candidates.jsonl \
  --max-candidates 8 \
  --no-dense

python scripts/run_heuristic_policy.py \
  --input examples/out/minimal_candidates.jsonl \
  --output examples/out/minimal_policy_outputs.jsonl \
  --keep-k 3

🔍 Inspect one MemChain policy output

head -n 1 examples/out/minimal_policy_outputs.jsonl | python -m json.tool | head -80

The example is compact, but it follows the same public contract as the full experiments: normalized dialogue history, answer-blind candidate construction, structured policy output, and active-memory composition.
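The same inspection can be done from Python, which is handy when `head` is unavailable or when the row needs further processing. This is a minimal sketch that assumes the output file is standard JSON Lines (one JSON object per line); `read_first_row` is a hypothetical helper, not a function shipped with the repository.

```python
import json
from pathlib import Path


def read_first_row(path):
    """Return the first non-empty JSONL row as a dict, or None if there are no rows."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return json.loads(line)
    return None


# Path produced by the quick-start commands; skipped if it has not been generated yet.
output_path = Path("examples/out/minimal_policy_outputs.jsonl")
if output_path.exists():
    row = read_first_row(output_path)
    if row is not None:
        print(sorted(row))  # top-level field names of the first policy row
```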

🧭 Method Overview

MemChain follows this read-time pipeline:

1. Convert long dialogue history into provenance-preserving candidate memories.
2. Infer the evidence need from the question.
3. Build a bounded candidate pool with lexical, entity, temporal, and multi-hop retrieval views.
4. Produce a structured MemChain trace with actions and cited chain steps.
5. Pass only question + active_memories to the answer model.

This design makes memory use explicit instead of silently stuffing raw retrieved notes into the answer context.
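The pipeline can be sketched end to end with a toy stand-in. The code below is not the MemChain implementation: `Memory`, `build_pool`, and `heuristic_trace` are hypothetical names, the scoring is plain token overlap (a crude proxy for the lexical view only; entity, temporal, and multi-hop views are not modeled), the intent is hard-coded, and the uppercase `DROP` label is an assumption modeled on the `KEEP` label shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    memory_id: str
    session_id: str  # provenance: which dialogue session the memory came from
    text: str


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (token.strip(".,?!").lower() for token in text.split())
    return {token for token in tokens if token}


def build_pool(question, memories, max_candidates=8):
    """Bounded candidate pool ranked by token overlap with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(m.text)), m) for m in memories]
    scored = [(score, m) for score, m in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].memory_id))
    return [m for _, m in scored[:max_candidates]]


def heuristic_trace(pool, keep_k=3):
    """Keep the top-k candidates, drop the rest, and cite each kept memory."""
    kept, dropped = pool[:keep_k], pool[keep_k:]
    actions = [{"action": "KEEP", "memory_id": m.memory_id} for m in kept]
    actions += [{"action": "DROP", "memory_id": m.memory_id} for m in dropped]
    return {
        "intent_plan": {"intent": "fact_lookup"},  # fixed here; a real policy infers it
        "memory_actions": actions,
        "memory_chain": [
            {"step_id": f"c{i}", "memory_ids": [m.memory_id]}
            for i, m in enumerate(kept, start=1)
        ],
        "active_memories": [m.text for m in kept],
    }


# Invented toy memories, for illustration only.
memories = [
    Memory("m1", "s1", "Ana moved to Lisbon in May."),
    Memory("m2", "s2", "Ana adopted a cat."),
    Memory("m3", "s3", "The weather was sunny."),
]
question = "Where did Ana move to?"
pool = build_pool(question, memories)
trace = heuristic_trace(pool, keep_k=1)

# Only the question and the active memories would reach the answer model.
answer_model_input = {"question": question, "active_memories": trace["active_memories"]}
print(answer_model_input)
```

The point of the sketch is the separation of stages: the pool builder never sees an answer, the trace records every keep or drop decision, and the answer model input contains nothing beyond the question and the composed memories.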

🧪 Run the Example

The repository includes a complete, compact benchmark file:

examples/minimal_dialogue.jsonl

It contains:

three timestamped dialogue sessions,
two QA pairs,
answer-session provenance,
one fact-lookup question and one temporal-tracking question.

After running the quick-start commands, each generated policy row has the following shape (values elided):

{
  "intent_plan": {"intent": "fact_lookup", "...": "..."},
  "memory_actions": [{"action": "KEEP", "memory_id": "..."}],
  "memory_chain": [{"step_id": "c1", "memory_ids": ["..."]}],
  "active_memories": ["..."]
}
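Because the trace is explicit, simple audits can be run over it. The sketch below compares the memories marked `KEEP` with the memories cited in the chain. `audit_citations` is a hypothetical helper, and the expectation that chain steps cite only kept memories is an assumption made for illustration; rows that use the refine or merge actions may legitimately differ.

```python
def audit_citations(row):
    """Compare KEEP actions with chain citations in one policy row."""
    kept = {
        action["memory_id"]
        for action in row.get("memory_actions", [])
        if action.get("action") == "KEEP"
    }
    cited = {
        memory_id
        for step in row.get("memory_chain", [])
        for memory_id in step.get("memory_ids", [])
    }
    return {
        "cited_but_not_kept": sorted(cited - kept),
        "kept_but_not_cited": sorted(kept - cited),
    }


# Invented toy row with the field layout shown above.
example_row = {
    "intent_plan": {"intent": "fact_lookup"},
    "memory_actions": [
        {"action": "KEEP", "memory_id": "m1"},
        {"action": "KEEP", "memory_id": "m2"},
    ],
    "memory_chain": [{"step_id": "c1", "memory_ids": ["m1", "m3"]}],
    "active_memories": ["toy memory text"],
}
report = audit_citations(example_row)
print(report)
```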

🏋️ Training

In our experiments, the MemChain policy is trained with an SFT-to-RL workflow on 8 × H200 GPUs, which is also the intended training setup. The released repository keeps the method-facing modules, schema, memory-pool construction, policy IO, reward utilities, metrics, and smoke tests.

Private dataset paths, model checkpoints, API keys, service endpoints, raw benchmark dumps, and machine-specific launch scripts are intentionally not included.

🗂️ Code Layout

memchain/
  data/benchmarks/base.py        # normalized dialogue/session/QA dataclasses
  memory_pool/intent_guided.py   # answer-blind candidate memory pool
  schema.py                      # public MemChain data schema
  framework.py                   # framework wrapper and active-memory composer
  prompts.py                     # policy and teacher prompts
  policy_io.py                   # policy JSON parsing and SFT row export
  reward.py                      # active-memory reward utility
  metrics.py                     # trace and active-memory metrics
  llm/                           # optional OpenAI-compatible clients
scripts/
  build_memory_pool.py           # build candidate pools from normalized JSONL
  run_heuristic_policy.py        # deterministic policy sanity check
examples/
  minimal_dialogue.jsonl         # runnable open-source example
tests/
  test_core.py

🔌 Data Interface

Input data should be normalized as dialogues with sessions and QA pairs. The core dataclasses are in memchain/data/benchmarks/base.py.

Each policy input uses:

sample_id
question
candidate_memories
optional gold_answer
optional metadata

Each policy output uses:

intent_plan
memory_actions
memory_chain
active_memories
sufficiency
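A lightweight field check can catch malformed rows before they reach training or evaluation. This is a sketch under stated assumptions: the field names come from the lists above, but `missing_fields` and the constant names are hypothetical, and treating every listed output field as required is an assumption. The actual schema lives in memchain/schema.py.

```python
POLICY_INPUT_REQUIRED = ("sample_id", "question", "candidate_memories")
POLICY_INPUT_OPTIONAL = ("gold_answer", "metadata")
# Assumption: all five output fields are mandatory in every row.
POLICY_OUTPUT_REQUIRED = (
    "intent_plan",
    "memory_actions",
    "memory_chain",
    "active_memories",
    "sufficiency",
)


def missing_fields(row, required):
    """Return the required field names absent from a row, in declaration order."""
    return [name for name in required if name not in row]


# Invented toy rows, for illustration only.
policy_input = {"sample_id": "demo-1", "question": "Where did Ana move?", "candidate_memories": []}
policy_output = {"intent_plan": {}, "memory_actions": [], "memory_chain": [], "active_memories": []}

print(missing_fields(policy_input, POLICY_INPUT_REQUIRED))
print(missing_fields(policy_output, POLICY_OUTPUT_REQUIRED))
```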

📦 Scope

This repository contains the MemChain core implementation for open-source inspection and extension. It does not include third-party comparison code, benchmark raw data, trained checkpoints, evaluation outputs, paper drafts, private API keys, private endpoints, or machine-specific paths.

📝 Citation

@misc{memchain2026,
  title  = {MemChain: Learning Interpretable Memory Traces for Memory-Augmented LLM Agents},
  author = {Ma, Yiwen},
  year   = {2026},
  note   = {Open-source code release}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

We would like to thank the following projects and teams:

🔍 Embedding Backend: Sentence Transformers and compatible embedding models, such as Qwen3-Embedding, for optional dense candidate-memory search.
🧮 Retrieval Core: NumPy-backed in-memory dense scoring plus MemChain's BM25, entity, temporal, and feedback retrieval views for provenance-grounded candidate pools.
📊 Benchmark: LoCoMo - long-context memory evaluation framework.
📚 Benchmark: LongMemEval - long-range memory evaluation benchmark for conversational agents.
⚖️ Evaluation Judge: LoCoMo-Refined by Memorax AI - refined judge support for more reliable long-memory answer evaluation.
