MEMO

Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

Yunfei Xie^*,1, Kevin Wang^*,+,2, Bobby Cheng^*,4, Jianzhu Yao³, Zhizhou Sha², Alexander Duffy⁵, Yihan Xi², Hongyuan Mei⁶, Cheston Tan⁴, Chen Wei^†,1, Pramod Viswanath^†,3, Zhangyang Wang^†,2

¹Rice University, ²The University of Texas at Austin, ³Princeton University, ⁴A*STAR, ⁵Good Start Labs, ⁶TTIC

^*Equal Contribution ⁺Project Leader ^†Equal Advising

Introduction

MEMO is a self-play framework that treats inference-time context as an optimizable, agentic object by coupling retention and exploration. Retention maintains a persistent memory bank that distills self-play trajectories into structured insights, consolidates them via CRUD-style updates, and injects them as priors during subsequent play. Exploration performs tournament-style prompt evolution with uncertainty-aware selection via TrueSkill, and uses prioritized replay to revisit vital states for sample-efficient coverage.
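
As a rough picture of the retention side, the memory bank can be thought of as a small insight store with create, read, update, and delete operations whose contents are added to the prompt before the next game. The sketch below is illustrative only: the `Insight` and `MemoryBank` names, the `support` counter, and the top-k read are assumptions, not the classes under `mpr/memory/`.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """One distilled lesson from self-play (hypothetical schema)."""
    text: str
    support: int = 1  # how many trajectories backed this insight


@dataclass
class MemoryBank:
    """Toy persistent store with CRUD-style consolidation."""
    insights: dict[int, Insight] = field(default_factory=dict)
    _next_id: int = 0

    def create(self, text: str) -> int:
        insight_id = self._next_id
        self.insights[insight_id] = Insight(text)
        self._next_id += 1
        return insight_id

    def read(self, top_k: int = 3) -> list[str]:
        # Return the best-supported insights, to be injected as priors.
        ranked = sorted(self.insights.values(), key=lambda i: i.support, reverse=True)
        return [i.text for i in ranked[:top_k]]

    def update(self, insight_id: int, text: str) -> None:
        # Refine an existing insight and record the extra supporting evidence.
        insight = self.insights[insight_id]
        insight.text = text
        insight.support += 1

    def delete(self, insight_id: int) -> None:
        self.insights.pop(insight_id, None)


def build_prompt(base_prompt: str, bank: MemoryBank) -> str:
    """Append stored insights to the base prompt."""
    priors = bank.read()
    if not priors:
        return base_prompt
    bullet_list = "\n".join(f"- {p}" for p in priors)
    return f"{base_prompt}\n\nInsights from earlier games:\n{bullet_list}"
```

In this toy version an empty bank leaves the base prompt untouched, so the first generation plays without priors.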

Across five text-based games, MEMO raises mean win rate from 24.9% to 49.5% for GPT-4o-mini and from 21.7% to 44.3% for Qwen-2.5-7B-Instruct, using a budget of only 2,000 self-play games per task.

Left: Standard LLM agents discard experience after each game. Middle: RL approaches learn from reward signals but require many samples. Right: MEMO recognizes patterns from trajectories, extracts insights, and accumulates them in a memory bank across generations.

The system combines:

Tournament-based evaluation using TrueSkill / win-rate ratings
Trajectory memory for learning from successful game patterns
Replay buffer for prioritized experience replay of strategic game states
Multiple evolution strategies (keep-best, random exploration, memory-guided, trajectory-based, crossover)
Integration with TextArena for diverse multi-agent text-based games
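
To illustrate the difference between the two rating options in the first item, the sketch below ranks candidates either by raw win rate or by a conservative skill estimate, the mean minus three standard deviations. The `Candidate` record and the factor of 3 are assumptions for illustration; the rating update itself is left to the TrueSkill library and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    wins: int
    games: int
    mu: float     # estimated skill mean
    sigma: float  # uncertainty of that estimate


def fitness(candidate: Candidate, method: str = "trueskill") -> float:
    """Score a candidate by win rate or by a conservative skill estimate."""
    if method == "winrate":
        return candidate.wins / candidate.games if candidate.games else 0.0
    if method == "trueskill":
        # Penalize uncertainty: a lucky prompt with few games ranks lower.
        return candidate.mu - 3.0 * candidate.sigma
    raise ValueError(f"unknown fitness method: {method!r}")


def rank(candidates: list[Candidate], method: str) -> list[str]:
    """Return candidate names from best to worst under the chosen method."""
    ordered = sorted(candidates, key=lambda c: fitness(c, method), reverse=True)
    return [c.name for c in ordered]
```

A candidate with a high win rate over a handful of games can top the win-rate ranking and still fall behind a well-tested candidate under the uncertainty-aware score.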

Key Features

Self-Play Prompt Evolution — Prompts compete against each other and evolve based on performance
Trajectory Memory System — Extracts strategic insights from games to guide prompt improvements
Replay Buffer — Prioritized sampling of past game states to diversify tournament play
Multiple Evolution Strategies — Keep top performers, random exploration, memory-guided generation, trajectory-based, crossover
Flexible Evaluation — Evaluate evolved prompts against configurable lists of eval models
W&B Integration — Track performance, token usage, and diversity metrics across generations
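
The replay buffer idea can be sketched as weighted sampling over stored game states. This is a minimal stdlib-only illustration, not the code in `mpr/replaybuffer/`: the `ReplayBuffer` interface, string-valued states, and lowest-priority eviction are all assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBuffer:
    """Toy prioritized buffer: higher-priority states are sampled more often."""
    capacity: int
    rng: random.Random = field(default_factory=random.Random)
    _items: list[tuple[float, str]] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, state: str, priority: float) -> None:
        if priority <= 0:
            raise ValueError("priority must be positive")
        self._items.append((priority, state))
        if len(self._items) > self.capacity:
            # Evict the lowest-priority state to stay within capacity.
            self._items.remove(min(self._items, key=lambda item: item[0]))

    def sample(self, k: int) -> list[str]:
        """Draw k states with probability proportional to priority."""
        if not self._items:
            return []
        weights = [priority for priority, _ in self._items]
        states = [state for _, state in self._items]
        return self.rng.choices(states, weights=weights, k=k)
```

Sampled states could then serve as starting positions for tournament games, so that play does not always begin from the same opening.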

Supported Games

MEMO works with any TextArena environment. The games we commonly evaluate on:

Game	Env ID	Players	Type	Description
SimpleNegotiation	`SimpleNegotiation-v0-short`	2	Negotiation	Trading game where players negotiate resource exchanges to maximize inventory value, with asymmetric valuations of five resources (Wheat, Wood, Sheep, Brick, Ore).
SimpleTak	`SimpleTak-v0`	2	Strategy	Connection game where players place stones to form a continuous path connecting opposite edges of the board.
KuhnPoker	`KuhnPoker-v0-short`	2	Imperfect Info	Simplified poker variant using a 3-card deck (J, Q, K) with a single betting round.
TicTacToe	`TicTacToe-v0`	2	Strategy	Classic grid game — take turns marking cells to form three in a row.
Briscola	`Briscola-v0`	2–4	Trick-Taking	Italian card game using a 40-card deck with trump suits; collect tricks to score points.

Several games support length variants (e.g., -short, -medium, -long, -extreme) that control the number of rounds or turns per game.

Installation

Prerequisites

Python >= 3.10
API keys for LLM providers (OpenRouter, OpenAI, etc.) in a .env file

Clone with Submodules

git clone <repo-url>
cd <repo>
git submodule update --init  # Do NOT use --recursive

Warning: Do not use --recurse-submodules or --recursive. The TextArena submodule contains a nested self-referential submodule that should not be initialized.

Install Dependencies

pip install -e .
# or
pip install -r requirements.txt

This installs TextArena from the mpr branch submodule along with all other dependencies (wandb, trueskill, torch, etc.).

Environment Variables

Create a .env file in mpr/ with your API keys, or export them as environment variables:

OPENROUTER_API_KEY=sk-or-...
WANDB_API_KEY=...

Quick Start

Run a prompt evolution experiment using a shell script:

bash mpr/scripts/tests_latest/simplenegotiation.sh

Or configure and run directly:

python -m mpr.self_play_prompt_evolution_memory \
    --model "gpt-4o-mini" \
    --baseline-model "gpt-4o-mini" \
    --base-prompt "You are playing a two-player game. Make valid moves to win. Submit the move enclosed by \boxed{}." \
    --eval-model-list google/gemini-2.5-flash-lite \
    --env "SimpleNegotiation-v0-short" \
    --generations 5 \
    --tournament-rounds 25 \
    --eval-rounds 25 \
    --population-size 8 \
    --keep-ratio 0.25 \
    --random-ratio 0.75 \
    --fitness-method winrate \
    --use-replay true \
    --skip-baseline-eval

Key Configuration Options

Parameter	Description	Default
`--population-size`	Number of prompt candidates per generation	10
`--generations`	Number of evolution generations	10
`--tournament-rounds`	Games per self-play tournament phase	5
`--eval-rounds`	Games per evaluation phase	3
`--keep-ratio`	Fraction of top prompts kept as elites	0.3
`--random-ratio`	Fraction of new random prompts	0.2
`--memory-guided-ratio`	Fraction using memory insights	0.0
`--trajectory-ratio`	Fraction from trajectory-based learning	0.3
`--crossover-ratio`	Fraction from crossover	0.2
`--fitness-method`	`"winrate"` or `"trueskill"`	`"trueskill"`
`--temperature`	Sampling temperature (0.0 = deterministic)	1.0
`--use-replay`	Enable replay buffer for diverse play	`false`
`--skip-baseline-eval`	Skip generation-0 baseline evaluation	`false`
`--eval-model-list`	Models to evaluate best candidate against	—

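Note: The five evolution ratios (--keep-ratio, --random-ratio, --memory-guided-ratio, --trajectory-ratio, --crossover-ratio) must sum to 1.0, so when you override one, adjust the others to match.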
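
The sum constraint is easy to check before launching a run. The helper below is a hypothetical pre-flight check, not part of the CLI; the `check_ratios` name, the dictionary keys, and the tolerance are assumptions.

```python
import math

RATIO_NAMES = ("keep", "random", "memory_guided", "trajectory", "crossover")


def check_ratios(ratios: dict[str, float], tol: float = 1e-6) -> float:
    """Raise ValueError unless the evolution ratios are valid and sum to 1.0."""
    unknown = set(ratios) - set(RATIO_NAMES)
    if unknown:
        raise ValueError(f"unknown ratio names: {sorted(unknown)}")
    if any(value < 0 for value in ratios.values()):
        raise ValueError("ratios must be non-negative")
    # Names that are not supplied count as 0.0 in this sketch.
    total = sum(ratios.get(name, 0.0) for name in RATIO_NAMES)
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"evolution ratios sum to {total:.3f}, expected 1.0")
    return total
```

With the defaults from the table (0.3, 0.2, 0.0, 0.3, 0.2) the check passes; changing a single ratio without rebalancing the rest makes it raise.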

Evolution Strategies

Keep — Retain top-performing prompts unchanged
Random — Generate novel prompts (optionally guided by memory insights)
Memory-Guided — Use extracted game insights to improve prompts
Trajectory-Based — Learn from game trajectories and reflections
Crossover — Combine elements from high-performing prompts
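
One way to picture how these strategies share a generation is to turn each ratio into a number of population slots. The sketch below uses largest-remainder rounding so the counts always add up to the population size; the `allocate_slots` helper and this rounding rule are assumptions, and the engine in `mpr/prompts/` may round differently.

```python
import math


def allocate_slots(population_size: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split a population into per-strategy counts (assumes ratios sum to 1.0)."""
    exact = {name: ratio * population_size for name, ratio in ratios.items()}
    counts = {name: math.floor(value) for name, value in exact.items()}
    leftover = population_size - sum(counts.values())
    # Hand the remaining slots to the strategies with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts
```

For a population of 8 with a keep ratio of 0.25 and a random ratio of 0.75, this yields 2 elites and 6 newly generated prompts.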

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    SelfPlayPromptEvolution                   │
│                    (Main Orchestrator)                       │
└─────────────────────────────────────────────────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐
    │  Tournament │    │  Prompt     │    │  Trajectory     │
    │  System     │    │  Evolution  │    │  Memory         │
    │             │    │  Engine     │    │  System         │
    └─────────────┘    └─────────────┘    └─────────────────┘
           │                  │                  │
           ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐
    │  AgentPool  │    │  Prompt     │    │  Game Insights  │
    │  + Ratings  │    │  Candidates │    │  + Replay Buffer│
    └─────────────┘    └─────────────┘    └─────────────────┘
           │
           ▼
    ┌─────────────┐
    │  GameRunner │──────▶ TextArena Environments
    └─────────────┘

Project Structure

├── mpr/                                    # MEMO system
│   ├── self_play_prompt_evolution_memory.py # Main orchestrator & CLI entry point
│   ├── cores/                              # Core components
│   │   ├── game_runner.py                  # Async game execution
│   │   ├── templates.py                    # Prompt templates for agents
│   │   ├── tournament_scheduler.py         # Game scheduling (round-robin, vs_best, etc.)
│   │   └── evaluation.py                   # Evaluation utilities
│   ├── tournament/                         # Tournament system
│   │   ├── tournament.py                   # Tournament orchestration & eval pipeline
│   │   └── agent_pool.py                   # Agent management & TrueSkill ratings
│   ├── evaluation/                         # Tournament runners
│   │   └── simple_memory_eval.py           # Main tournament execution (run_tournament)
│   ├── memory/                             # Trajectory memory system
│   │   ├── trajectory_memory_system.py     # Memory system, MemoryEnhancedAgent
│   │   ├── xml_crud_operations.py          # XML-based memory CRUD operations
│   │   └── prompts/                        # LLM prompts for memory operations
│   │       ├── memory_merge/               # Insight merging prompts
│   │       ├── abstract_gen/               # State abstract generation prompts
│   │       └── abstract_merge/             # State abstract merging prompts
│   ├── prompts/                            # Prompt evolution
│   │   └── prompt_evolution_engine.py      # Population management & evolution strategies
│   ├── replaybuffer/                       # Experience replay
│   │   └── replaybuffer.py                 # Prioritized replay buffer
│   ├── utils/                              # Utilities
│   │   ├── output_manager.py               # Log/output directory management
│   │   └── evolution_reporter.py           # Evolution summary reporting
│   └── scripts/                            # Run scripts
│       ├── tests_latest/                   # Current experiment scripts
│       │   ├── simplenegotiation.sh
│       │   ├── simpletak.sh
│       │   └── briscola.sh
│       └── simple_neg_test.sh              # Quick test script
├── TextArena/                              # TextArena submodule (mpr branch)
├── pyproject.toml
├── requirements.txt
└── logs/                                   # Experiment outputs (generated at runtime)
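
The scheduling modes named in the tree above (round-robin, vs_best) can be sketched as simple pairing functions. This is an illustration under stated assumptions, not the contents of `tournament_scheduler.py`: the function names and the choice to play every matchup from both seats are hypothetical.

```python
from itertools import permutations


def round_robin(agents: list[str]) -> list[tuple[str, str]]:
    """Every ordered pair, so each matchup is played from both seats."""
    return list(permutations(agents, 2))


def vs_best(agents: list[str], best: str) -> list[tuple[str, str]]:
    """Each challenger meets the current best once in each seat."""
    games: list[tuple[str, str]] = []
    for challenger in agents:
        if challenger == best:
            continue
        games.append((challenger, best))
        games.append((best, challenger))
    return games
```

Round-robin grows quadratically with the population, while pairing only against the current best grows linearly, which is one reason a scheduler might offer both.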

Citation

If you find this work useful, please cite:

@article{xie2025memo,
  title={MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games},
  author={Xie, Yunfei and Wang, Kevin and Cheng, Bobby and Yao, Jianzhu and Sha, Zhizhou and Duffy, Alexander and Xi, Yihan and Mei, Hongyuan and Tan, Cheston and Wei, Chen and Viswanath, Pramod and Wang, Zhangyang},
  year={2025}
}

Acknowledgments

The authors thank Good Start Labs and Sentient for financially supporting the experiment costs of this work.

Built on top of:

TextArena — Multi-agent text-based game framework
