
mayiwen0212/MemChain


MemChain

Learning Interpretable Memory Traces for Memory-Augmented LLM Agents


Quick Start · Method Overview · Run the Example · Training · Code Layout

Figure: MemChain overview (Candidate Pool → Intent Plan → Memory Chain → Active Memory; policy trained with SFT + RL).

✨ Highlights

MemChain is a read-time memory policy for long-dialogue agents. Given a user question and a candidate memory pool, it produces an explicit memory trace before the answer model is called:

  • 🧭 intent_plan: what evidence the question requires.
  • 🧩 memory_actions: which memories to keep, drop, refine, merge, or stop on.
  • 🔗 memory_chain: ordered evidence steps with memory citations.
  • 📝 active_memories: compact answer-ready memories passed to a frozen answer model.
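To make the action vocabulary concrete, here is a minimal sketch of how an ordered list of memory actions could be turned into active memories. It is not the code in memchain/schema.py or memchain/framework.py: the `MemoryAction` class, the `merge_ids` and `text` fields, and the rule that STOP ends processing are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryAction:
    """Hypothetical action record; only `action` and `memory_id` appear in this README."""
    action: str                                         # KEEP, DROP, REFINE, MERGE, or STOP
    memory_id: str | None = None                        # target for KEEP / DROP / REFINE
    merge_ids: list[str] = field(default_factory=list)  # assumed: sources for MERGE
    text: str | None = None                             # assumed: rewritten text for REFINE / MERGE


def apply_actions(pool: dict[str, str], actions: list[MemoryAction]) -> list[str]:
    """Apply actions in order to a {memory_id: text} pool and return active memories."""
    active: list[str] = []
    for act in actions:
        if act.action == "STOP":
            break  # assumed semantics: the policy judged the evidence sufficient
        if act.action == "KEEP" and act.memory_id in pool:
            active.append(pool[act.memory_id])
        elif act.action == "REFINE" and act.memory_id in pool:
            # Fall back to the original text if no rewrite was supplied.
            active.append(act.text or pool[act.memory_id])
        elif act.action == "MERGE":
            sources = [pool[m] for m in act.merge_ids if m in pool]
            if sources:
                active.append(act.text or " ".join(sources))
        # DROP and unknown memory ids contribute nothing.
    return active
```

For example, keeping `m1`, dropping `m2`, refining `m3`, and then stopping yields two active memories; anything after STOP is ignored.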

The candidate-memory pool is answer-blind: it uses the question and dialogue provenance, not the gold answer. This keeps retrieval, policy learning, and answer generation separated and auditable.
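One simple way to enforce this separation in your own pipeline is to hand the pool builder an explicit answer-free view of each sample and fail fast if an answer field leaks through. This is a sketch under assumptions: `gold_answer` is the only answer-bearing field (taken from the policy-input list below), and `answer_blind_view` / `assert_answer_blind` are hypothetical helpers, not repository functions.

```python
from typing import Any

# Assumed answer-bearing field, taken from the policy-input field list in this README.
ANSWER_FIELDS = frozenset({"gold_answer"})


def answer_blind_view(sample: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a sample without answer fields, for pool construction."""
    return {key: value for key, value in sample.items() if key not in ANSWER_FIELDS}


def assert_answer_blind(sample: dict[str, Any]) -> None:
    """Raise if an answer field is present in a pool-construction input."""
    leaked = ANSWER_FIELDS.intersection(sample)  # iterating a dict yields its keys
    if leaked:
        raise ValueError(f"answer fields passed to pool builder: {sorted(leaked)}")
```

The original sample is left untouched, so the gold answer remains available for evaluation or training signals downstream.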

🚀 Quick Start

📦 Install

git clone https://github.com/mayiwen0212/MemChain.git
cd MemChain
pip install -e ".[dev]"
pytest -q

🧠 Build a candidate pool and run the policy

python scripts/build_memory_pool.py \
  --input examples/minimal_dialogue.jsonl \
  --output examples/out/minimal_candidates.jsonl \
  --max-candidates 8 \
  --no-dense

python scripts/run_heuristic_policy.py \
  --input examples/out/minimal_candidates.jsonl \
  --output examples/out/minimal_policy_outputs.jsonl \
  --keep-k 3

🔍 Inspect one MemChain policy output

head -n 1 examples/out/minimal_policy_outputs.jsonl | python -m json.tool | head -80

The example is compact, but it follows the same public contract as the full experiments: normalized dialogue history, answer-blind candidate construction, structured policy output, and active-memory composition.
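Beyond eyeballing one row with `json.tool`, a few lines of Python can check every row of the generated JSONL for the trace fields shown in this README. This is a hypothetical helper, not part of the repository; it checks only the four fields in the example output, and leaves out `sufficiency` because this README does not say whether the heuristic policy always emits it.

```python
import json
from pathlib import Path

# The four trace fields shown in this README's example output.
REQUIRED_KEYS = ("intent_plan", "memory_actions", "memory_chain", "active_memories")


def missing_keys(row: dict) -> list[str]:
    """Return the required trace fields absent from one policy row, in schema order."""
    return [key for key in REQUIRED_KEYS if key not in row]


def check_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to missing fields for every incomplete row."""
    problems: dict[int, list[str]] = {}
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            gaps = missing_keys(json.loads(line))
            if gaps:
                problems[lineno] = gaps
    return problems
```

Running `check_file("examples/out/minimal_policy_outputs.jsonl")` after the quick-start commands should return an empty dict if every row carries the full trace.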

🧭 Method Overview

MemChain follows this read-time pipeline:

  1. Convert long dialogue history into provenance-preserving candidate memories.
  2. Infer the evidence need from the question.
  3. Build a bounded candidate pool with lexical, entity, temporal, and multi-hop retrieval views.
  4. Produce a structured MemChain trace with actions and cited chain steps.
  5. Pass only question + active_memories to the answer model.
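Step 3 combines several ranked retrieval views into one pool of bounded size. The sketch below uses plain round-robin interleaving with de-duplication and a hard cap. That fusion rule is an assumption chosen for clarity; the repository's intent-guided builder (memchain/memory_pool/intent_guided.py) may weight or order views differently. `max_candidates` plays a role similar to the `--max-candidates` flag in the quick start.

```python
from itertools import zip_longest


def merge_views(views: dict[str, list[str]], max_candidates: int) -> list[str]:
    """Interleave ranked retrieval views into one bounded, de-duplicated pool.

    Round-robin: rank 1 of every view, then rank 2, and so on, skipping ids
    already taken, until `max_candidates` ids are collected.
    """
    pool: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*views.values()):  # shorter views pad with None
        for memory_id in tier:
            if memory_id is None or memory_id in seen:
                continue
            seen.add(memory_id)
            pool.append(memory_id)
            if len(pool) == max_candidates:
                return pool
    return pool
```

Round-robin guarantees that each view contributes its top hits before any view contributes its second tier. That matters when, for example, a temporal question's best evidence ranks poorly lexically.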

This design makes memory use explicit instead of silently stuffing raw retrieved notes into the answer context.
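In practice, this means the answer model's input is built from the question and `active_memories` alone, with no raw candidates or trace internals. A minimal sketch follows. The template wording and the `compose_answer_prompt` name are hypothetical, not the prompts in memchain/prompts.py.

```python
def compose_answer_prompt(question: str, active_memories: list[str]) -> str:
    """Build the answer-model input from the question and active memories only.

    Hypothetical template for illustration; candidate memories, actions, and
    the memory chain are deliberately left out.
    """
    lines = ["Answer the question using only the memories below.", "", "Memories:"]
    lines += [f"[{i}] {memory}" for i, memory in enumerate(active_memories, start=1)]
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```

Because the answer model is frozen, everything that changes between runs is visible in this one compact string, which is what makes the memory use auditable.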

🧪 Run the Example

The repository includes a small, self-contained benchmark file:

examples/minimal_dialogue.jsonl

It contains:

  • three timestamped dialogue sessions,
  • two QA pairs,
  • answer-session provenance,
  • one fact lookup question and one temporal tracking question.

After running the quick-start commands, each generated policy row contains fields like the following (values abbreviated):

{
  "intent_plan": {"intent": "fact_lookup", "...": "..."},
  "memory_actions": [{"action": "KEEP", "memory_id": "..."}],
  "memory_chain": [{"step_id": "c1", "memory_ids": ["..."]}],
  "active_memories": ["..."]
}
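Because each `memory_chain` step cites memory ids, traces like this can be checked mechanically. The hypothetical check below flags citations that point outside the candidate pool or at a memory the same trace dropped. Both rules are illustrative assumptions, not the definitions used in memchain/metrics.py.

```python
def citation_problems(row: dict, candidate_ids: set[str]) -> list[tuple[str, str, str]]:
    """Return (step_id, memory_id, reason) for each questionable chain citation."""
    dropped = {
        action.get("memory_id")
        for action in row.get("memory_actions", [])
        if action.get("action") == "DROP"
    }
    problems: list[tuple[str, str, str]] = []
    for step in row.get("memory_chain", []):
        step_id = step.get("step_id", "?")
        for memory_id in step.get("memory_ids", []):
            if memory_id not in candidate_ids:
                problems.append((step_id, memory_id, "not in candidate pool"))
            elif memory_id in dropped:
                problems.append((step_id, memory_id, "cites a dropped memory"))
    return problems
```

An empty result means every chain step is grounded in a memory the policy actually saw and did not discard.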

🏋️ Training

In our experiments, the MemChain policy is trained with an SFT-to-RL workflow on eight H200 GPUs (8 × H200). The released repository includes the method-facing modules, schema, memory-pool construction, policy IO, reward utilities, metrics, and smoke tests.

Private dataset paths, model checkpoints, API keys, service endpoints, raw benchmark dumps, and machine-specific launch scripts are intentionally not included.
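This README does not document what memchain/reward.py computes. For readers experimenting with the RL stage, the sketch below shows one plausible shape for an active-memory reward: token recall of the gold answer within the active memories, minus a penalty for passing more memories than a budget. Every detail (the formula, `max_memories`, `penalty`) is a hypothetical assumption. Note that `gold_answer` is an optional policy input, which is consistent with using it only for training signals and not for pool construction.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lower-cased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def active_memory_reward(active_memories: list[str], gold_answer: str,
                         max_memories: int = 5, penalty: float = 0.1) -> float:
    """Hypothetical reward: gold-answer token recall minus an over-budget penalty."""
    gold = _tokens(gold_answer)
    if not gold:
        return 0.0
    covered = _tokens(" ".join(active_memories))
    recall = len(gold & covered) / len(gold)
    # Penalize each active memory beyond the budget to keep the context compact.
    overflow = max(0, len(active_memories) - max_memories)
    return recall - penalty * overflow
```

The length term pushes the policy toward compact active memories, in line with the "compact, answer-ready" goal described above.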

🗂️ Code Layout

memchain/
  data/benchmarks/base.py        # normalized dialogue/session/QA dataclasses
  memory_pool/intent_guided.py   # answer-blind candidate memory pool
  schema.py                      # public MemChain data schema
  framework.py                   # framework wrapper and active-memory composer
  prompts.py                     # policy and teacher prompts
  policy_io.py                   # policy JSON parsing and SFT row export
  reward.py                      # active-memory reward utility
  metrics.py                     # trace and active-memory metrics
  llm/                           # optional OpenAI-compatible clients
scripts/
  build_memory_pool.py           # build candidate pools from normalized JSONL
  run_heuristic_policy.py        # deterministic policy sanity check
examples/
  minimal_dialogue.jsonl         # runnable open-source example
tests/
  test_core.py

🔌 Data Interface

Input data should be normalized as dialogues with sessions and QA pairs. The core dataclasses are in memchain/data/benchmarks/base.py.
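For orientation, here is a minimal sketch of what a normalized dialogue might look like, together with step 1 of the method: flattening turns into candidate memories that keep their session provenance. The class and field names are hypothetical stand-ins, not the dataclasses in memchain/data/benchmarks/base.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Session:
    session_id: str
    timestamp: str                                   # e.g. an ISO-8601 date
    turns: list[Turn] = field(default_factory=list)


@dataclass
class QAPair:
    question: str
    answer: str
    evidence_sessions: list[str] = field(default_factory=list)  # answer-session provenance


@dataclass
class Dialogue:
    dialogue_id: str
    sessions: list[Session] = field(default_factory=list)
    qa_pairs: list[QAPair] = field(default_factory=list)

    def to_candidates(self) -> list[dict]:
        """Flatten turns into candidate memories tagged with session and time."""
        return [
            {
                "memory_id": f"{s.session_id}:{i}",
                "text": f"{t.speaker}: {t.text}",
                "session_id": s.session_id,
                "timestamp": s.timestamp,
            }
            for s in self.sessions
            for i, t in enumerate(s.turns)
        ]
```

Keeping `session_id` and `timestamp` on every candidate is what lets later chain steps cite evidence precisely and lets temporal questions be answered.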

Each policy input contains:

  • sample_id
  • question
  • candidate_memories
  • optional gold_answer
  • optional metadata

Each policy output contains:

  • intent_plan
  • memory_actions
  • memory_chain
  • active_memories
  • sufficiency
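The two field lists above can be written down as typed records. The sketch below uses `TypedDict`, with `gold_answer` and `metadata` marked optional as listed. The inner types are assumptions: this README does not specify them (in particular, the type of `sufficiency`), and the authoritative definitions live in memchain/schema.py.

```python
from typing import Any, TypedDict


class _PolicyInputRequired(TypedDict):
    sample_id: str
    question: str
    candidate_memories: list[dict[str, Any]]


class PolicyInput(_PolicyInputRequired, total=False):
    gold_answer: str           # optional per this README
    metadata: dict[str, Any]   # optional per this README


class PolicyOutput(TypedDict):
    intent_plan: dict[str, Any]
    memory_actions: list[dict[str, Any]]
    memory_chain: list[dict[str, Any]]
    active_memories: list[str]
    sufficiency: Any           # type not specified in this README


def missing_output_fields(row: dict[str, Any]) -> set[str]:
    """Return the PolicyOutput fields absent from a parsed row."""
    return set(PolicyOutput.__required_keys__) - set(row)
```

The required/optional split is recorded on the classes themselves, so it can be inspected at runtime as well as by a type checker.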

📦 Scope

This repository contains the MemChain core implementation for open-source inspection and extension. It does not include third-party comparison code, benchmark raw data, trained checkpoints, evaluation outputs, paper drafts, private API keys, private endpoints, or machine-specific paths.

📝 Citation

@misc{memchain2026,
  title  = {MemChain: Learning Interpretable Memory Traces for Memory-Augmented LLM Agents},
  author = {Ma, Yiwen},
  year   = {2026},
  note   = {Open-source code release}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

We would like to thank the following projects and teams:

  • 🔍 Embedding Backend: Sentence Transformers and compatible embedding models, such as Qwen3-Embedding, for optional dense candidate-memory search.
  • 🧮 Retrieval Core: NumPy-backed in-memory dense scoring plus MemChain's BM25, entity, temporal, and feedback retrieval views for provenance-grounded candidate pools.
  • 📊 Benchmark: LoCoMo - long-context memory evaluation framework.
  • 📚 Benchmark: LongMemEval - long-range memory evaluation benchmark for conversational agents.
  • ⚖️ Evaluation Judge: LoCoMo-Refined by Memorax AI - refined judge support for more reliable long-memory answer evaluation.

About

Official implementation of MemChain for interpretable memory reasoning in memory-augmented LLMs.
