AReaL: A Large-Scale Asynchronous Reinforcement Learning System

AReaL is an open-source fully asynchronous reinforcement learning training system for large reasoning and agentic models, developed by members from Tsinghua IIIS and the AReaL Team at Ant Group. Built upon the open-source project ReaLHF, we are fully committed to open-source principles by providing the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable—we hope you enjoy our project just as much as you'd enjoy real milk tea. Cheers!

AReaL Highlights

📰 News

[2026/03/02] We provide a complete example to train your own 🦞 OpenClaw agent by simply pointing base_url and api_key at AReaL's RL service - no complicated dependencies, no code changes, works with any agentic runtime!

[2026/02/06] We are delighted to introduce AReaL-SEA, a self-evolving data synthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT 5 and achieves performance comparable to Gemini 3.0 Pro on $\tau^2$-bench! Check out the paper, model, data, and code.

[2026/01/15] Congrats to our friends at CAMEL-AI for open-sourcing SETA, their terminal agent RL project trained with AReaL! Check out their training workflow and the announcement on X.

🚀 Getting Started

First, install the package:

git clone https://github.com/inclusionAI/AReaL
cd AReaL
pip install uv
uv sync --extra cuda

Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:

python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local

To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in the YAML file to point to your shared storage):

python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml \
    cluster.n_nodes=2 cluster.n_gpus_per_node=8 \
    scheduler.type=ray
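The commands above pass dotted `key=value` arguments after `--config` to override values from the YAML file. The following is a minimal sketch of how such overrides could be merged into a nested configuration. It is an illustration only, not AReaL's actual parser: `apply_overrides` and the starting `config` dictionary are hypothetical names and values introduced for this example.

```python
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Merge dotted key=value overrides into a nested config dict (in place)."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        node = config
        *parents, leaf = key.split(".")
        # Walk (and create, if missing) the intermediate sections.
        for name in parents:
            node = node.setdefault(name, {})
        # Keep integers as integers; everything else stays a string.
        node[leaf] = int(raw) if raw.lstrip("-").isdigit() else raw
    return config


# Hypothetical stand-in for values loaded from the YAML file.
config = {
    "cluster": {"n_nodes": 1, "n_gpus_per_node": 8},
    "scheduler": {"type": "local"},
}
apply_overrides(
    config,
    ["cluster.n_nodes=2", "cluster.n_gpus_per_node=8", "scheduler.type=ray"],
)
print(config)
```

Values given on the command line take precedence over the file, which is why the same YAML can serve both the single-node and the Ray cluster runs.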

For comprehensive setup instructions, see our quickstart guide.

📚 Examples

Math & Reasoning

| Task | Description | Performance |
| --- | --- | --- |
| Math | GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more | - |
| Multi-Turn Math | Multi-turn math agent with reward discounting across turns | Training Curve |
| LoRA Math | Parameter-efficient math training with LoRA (SGLang/vLLM backends) | - |
| Countdown | Countdown numbers game with custom rewards | Training Curve |
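The Multi-Turn Math entry mentions reward discounting across turns. One plausible reading, sketched below, is that a correct answer earns less the more turns the agent needed. This is an assumption made for illustration: the function name `discounted_reward`, the parameter `turn_discount`, and the geometric form are hypothetical and not taken from the AReaL example itself.

```python
def discounted_reward(base_reward: float, turns_used: int, turn_discount: float = 0.9) -> float:
    """Scale a final reward by a per-turn discount (hypothetical scheme).

    The first turn is undiscounted; each additional turn multiplies the
    reward by `turn_discount` once more.
    """
    if turns_used < 1:
        raise ValueError("turns_used must be at least 1")
    return base_reward * turn_discount ** (turns_used - 1)


# Compare an answer found on the first turn with one found on the third.
for turns in (1, 3):
    print(turns, discounted_reward(1.0, turns, turn_discount=0.5))
```

Under this scheme the policy is nudged toward solving the problem in fewer turns, while late corrections still receive some credit.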

Agentic RL

Vision-Language Models

| Task | Description | Performance |
| --- | --- | --- |
| VLM | Geometry3K and CLEVR Count 70K visual reasoning with GRPO | - |
| VLM on NPU | VLM training on Huawei NPU hardware | Benchmark Results |

Alignment & Infrastructure

🔧 Support Matrix

🧠 Algorithms

All RL algorithms support both asynchronous and synchronous training; setting max_head_offpolicyness=0 selects the synchronous version. See Asynchronous RL Guide.
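To make the role of max_head_offpolicyness more concrete, here is a minimal sketch of a staleness check, assuming the setting bounds how many policy versions a rollout may lag behind the trainer. The function `is_fresh_enough` and its arguments are hypothetical and simplified; AReaL's actual admission logic may differ.

```python
def is_fresh_enough(sample_version: int, trainer_version: int, max_head_offpolicyness: int) -> bool:
    """Return True if a rollout is recent enough to be used for training.

    `sample_version` is the policy version that generated the rollout and
    `trainer_version` is the version currently being trained (both hypothetical
    counters for this sketch).
    """
    lag = trainer_version - sample_version
    return 0 <= lag <= max_head_offpolicyness


# With a bound of 0, only rollouts from the current policy are accepted,
# which corresponds to synchronous (on-policy) training.
print(is_fresh_enough(sample_version=5, trainer_version=5, max_head_offpolicyness=0))
print(is_fresh_enough(sample_version=4, trainer_version=5, max_head_offpolicyness=0))
# A larger bound lets generation run ahead of, or behind, training by a few versions.
print(is_fresh_enough(sample_version=3, trainer_version=5, max_head_offpolicyness=2))
```

A larger bound trades some on-policy accuracy for throughput, since generation no longer has to wait for each weight update.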

Models

| Model Family | Megatron | PyTorch FSDP | PyTorch Archon | Notes |
| --- | --- | --- | --- | --- |
| Qwen2/3 | ✅ | ✅ | ✅ | - |
| Qwen3-MoE | ✅ | ✅ | ✅ | - |
| Qwen2.5-VL | ❌ | ✅ | ❌ | Vision-language model |
| Qwen3-VL | ❌ | ✅ | ❌ | Vision-language model |
| Gemma 3 | ❌ | ✅ | ❌ | Vision-language model |
| Other Hugging Face LLM | ❌ | ✅ | ❌ | Compatibility depends on the version of transformers |

Check the AI Coding Assistant Guide and Archon Reference for how to integrate new models into AReaL.

Training Backends

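| Backend | DP | Tensor Parallel | Sequence Parallel within TP | Context Parallel | Pipeline Parallel | Expert Parallel | 1D Sequence Packing | LoRA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Megatron | ✅ (ZeRO-1) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| PyTorch FSDP | ✅ (FSDP2) | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| PyTorch Archon | ✅ (FSDP2) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |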

Inference Backends

| Backend | Tensor Parallel | Context Parallel | Pipeline Parallel | Data Parallel Attention | Expert Parallel |
| --- | --- | --- | --- | --- | --- |
| vLLM | ✅ | ❓ | ✅ | ❓ | ❓ |
| SGLang | ✅ | ❌ | ❌ | ✅ | ✅ |

📖 Resources

Tutorial

Code Walkthrough

Best Practices

Customization

Algorithms

Reference

🤝 Contributing

We warmly welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or helping others, your contribution is valued. Please check our Contributing Guide for detailed information.

# Fork and clone the repository
git clone https://github.com/YOUR-USERNAME/AReaL
cd AReaL

# Install uv and sync dependencies
pip install uv
# Use `--extra cuda` on Linux with CUDA for full functionality
uv sync --extra cuda --group dev
# Or without CUDA support
# uv sync --group dev

# Set up pre-commit hooks (formatting, linting, commit message checks)
pre-commit install --install-hooks

# Make changes
git checkout -b feat/gpt-o5
git add .
# `git commit` will automatically check your files and commit messages
git commit -m "feat: implement gpt-o5 training loop"
git push

🗺️ Future Roadmap

AReaL is under active development with planned minor releases weekly and major releases monthly. We warmly welcome community engagement and contributions. We are also actively hiring interns and full-time employees with open positions in both the US and China.

🙏 Acknowledgments

We gratefully acknowledge that major contributors are from the AReaL Team at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University and Ant Group.

We have also received invaluable assistance from the following groups (listed alphabetically):

The Data Intelligence Lab at Ant Research for their data support

@HwVanICI for support on vLLM, LoRA, NPU integration, and more

The Relaxed System Lab at HKUST for seamless collaboration on numerous system-related aspects

The SGLang team for supporting custom weight update features and their contributions during AReaL-lite development

The Super Computing Technology (SCT) team at Ant Group for their expertise in large-scale cluster operations and maintenance

Special thanks to @Lyken17 for providing valuable suggestions throughout the API design process

We also deeply appreciate all pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other outstanding projects, including but not limited to DeepScaleR, Open-Reasoner-Zero, OpenRLHF, VeRL, SGLang, QwQ, Light-R1, and DAPO.

📄 Citation

