
kidding-404/nano-verl

nano-verl

Chinese version

A lightweight verl-style RL training framework implemented from scratch.

Core Features

  1. Readability: nano-verl has about 6K lines of code, compared with 90K+ lines in verl.
  2. Distributed training: uses FSDP and vLLM as the training and inference backends, respectively, with Ray for distributed management. Supports rollout load balancing, dynamic batching, padding removal, and more.
  3. Asynchronous support: supports one-step-off-policy asynchronous training, enabled by setting trainer.mode=one_step_off.
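The one-step-off-policy mode in item 3 can be pictured as a two-stage pipeline: while the trainer updates on the current batch, the next batch is already being generated with weights that will be one update old by the time it is consumed. The following is a toy sketch of that scheduling idea only, not nano-verl's implementation; `generate_rollout`, `train_step`, and `one_step_off_loop` are hypothetical stand-ins, and a thread pool takes the place of the Ray workers.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_rollout(policy_version: int, step: int) -> dict:
    """Hypothetical stand-in for the rollout engine."""
    return {"step": step, "policy_version": policy_version}


def train_step(policy_version: int, batch: dict) -> int:
    """Hypothetical stand-in for one trainer update; returns the new version."""
    return policy_version + 1


def one_step_off_loop(num_steps: int) -> list[tuple[int, int]]:
    """Return (trainer_version, rollout_version) for each training step."""
    log = []
    version = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Prefetch the first batch with the initial weights.
        pending = pool.submit(generate_rollout, version, 0)
        for step in range(num_steps):
            batch = pending.result()
            # Launch the next rollout before training, so it runs with the
            # weights that exist now and overlaps with the update below.
            if step + 1 < num_steps:
                pending = pool.submit(generate_rollout, version, step + 1)
            log.append((version, batch["policy_version"]))
            version = train_step(version, batch)
    return log


if __name__ == "__main__":
    for trainer_v, rollout_v in one_step_off_loop(4):
        print(f"trainer at v{trainer_v}, batch generated by v{rollout_v}")
```

In this sketch the batch consumed at each step was generated by weights at most one update behind the trainer, which is the trade-off the mode name refers to: rollout and training overlap, at the cost of slightly stale samples.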

Installation

  1. Clone the code:
     git clone https://github.com/kidding-404/nano-verl.git
     cd nano-verl
  2. Install dependencies with uv:
     uv sync
  3. Find the flash-attn wheel that matches your Python, PyTorch, and CUDA versions, then install it separately:
     uv run pip install <flash_attn_wheel_url>
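To pick a wheel, it helps to know which Python, PyTorch, CUDA, and C++ ABI combination the environment uses. The helper below is a small sketch that prints the tag fragments to look for in a wheel filename. The tag format (`cp310`, `torch2.6`, `cu12`, `cxx11abiFALSE`) is an assumption based on how recent flash-attn release wheels appear to be named, and `flash_attn_wheel_tags` is a hypothetical helper, so check the result against the actual filenames on the release page.

```python
import sys


def flash_attn_wheel_tags(python_version, torch_version, cuda_version, cxx11_abi):
    """Build the tag fragments to look for in a flash-attn wheel filename.

    The naming scheme is an assumption; verify it against the release page.
    """
    py_major, py_minor = python_version[:2]
    # Drop any local suffix such as "+cu124" before splitting the version.
    torch_major, torch_minor = torch_version.split("+")[0].split(".")[:2]
    cuda_major = cuda_version.split(".")[0]
    return {
        "python": f"cp{py_major}{py_minor}",
        "torch": f"torch{torch_major}.{torch_minor}",
        "cuda": f"cu{cuda_major}",
        "abi": f"cxx11abi{'TRUE' if cxx11_abi else 'FALSE'}",
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed; run `uv sync` first")
    else:
        if torch.version.cuda is None:
            print("this torch build has no CUDA support")
        else:
            print(flash_attn_wheel_tags(
                sys.version_info,
                torch.__version__,
                torch.version.cuda,
                torch.compiled_with_cxx11_abi(),
            ))
```

Saved as, say, `wheel_tags.py` (a hypothetical filename), it can be run inside the project environment with `uv run python wheel_tags.py`.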

Quick Start

Train Qwen3-0.6B on the GSM8K dataset with the single-GPU config:

uv run python main.py --config configs/gsm8k-qwen3-0.6b-single-gpu.yaml

You can also train Qwen3-1.7B asynchronously on two GPUs:

uv run python main.py --config configs/gsm8k-qwen3-1.7b-1p1-async.yaml

Benchmark

Test configuration:

  • Model: Qwen3-4B
  • Training set: DAPO-17K
  • Reward: accuracy reward, +1 for a correct answer and -1 otherwise
  • Steps: 150
  • Global batch size: 64
  • Rollout n: 8
  • Prompt length: 1024
  • Response length: 8192
  • Hardware: 1 node, 8 x NVIDIA H100 80GB HBM3
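As a rough illustration of the reward and batch settings above, the sketch below shows a binary accuracy reward and the per-step sample count they imply. The string normalization in `accuracy_reward` is a hypothetical choice, since the README does not describe how answers are extracted or compared, and the arithmetic assumes the global batch size counts prompts rather than responses.

```python
def accuracy_reward(predicted: str, reference: str) -> float:
    """Return +1.0 if the normalized answers match, otherwise -1.0."""

    def normalize(answer: str) -> str:
        # Hypothetical normalization: trim, drop thousands separators, lowercase.
        return answer.strip().replace(",", "").lower()

    return 1.0 if normalize(predicted) == normalize(reference) else -1.0


# Values from the test configuration above.
PROMPTS_PER_STEP = 64  # assumed to count prompts, not responses
ROLLOUT_N = 8
PROMPT_LENGTH = 1024
RESPONSE_LENGTH = 8192

# Each prompt is sampled ROLLOUT_N times, so every step scores this many responses.
responses_per_step = PROMPTS_PER_STEP * ROLLOUT_N

# Upper bound on tokens per step if every sequence hit both length limits.
max_tokens_per_step = responses_per_step * (PROMPT_LENGTH + RESPONSE_LENGTH)

if __name__ == "__main__":
    print(accuracy_reward(" 1,024 ", "1024"), accuracy_reward("7", "8"))
    print(responses_per_step, max_tokens_per_step)
```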

Reward curve:

[Figure: nano-verl vs verl reward convergence]

Performance comparison:

| Setting | AIME24 avg16 | AIME24 pass@16 | AIME25 avg16 | AIME25 pass@16 |
|---|---|---|---|---|
| Qwen3-4B Base | 0.4333 | 0.7000 | 0.3563 | 0.5333 |
| Qwen3-4B + verl | 0.5313 | 0.8333 | 0.4417 | 0.6667 |
| Qwen3-4B + nano-verl | 0.535 | 0.8333 | 0.429 | 0.6667 |
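The README does not define the two metrics. A common reading, assumed here, is that avg16 is the mean accuracy over 16 sampled answers per problem and pass@16 is the fraction of problems solved by at least one of the 16 samples. The reported pass@16 values are consistent with that reading on a 30-problem set (for example, 0.7000 = 21/30). A minimal sketch under that assumption, with `avg_at_k` and `pass_at_k` as hypothetical helper names:

```python
def avg_at_k(correct: list[list[bool]]) -> float:
    """Mean accuracy over every sample of every problem."""
    total_samples = sum(len(row) for row in correct)
    return sum(sum(row) for row in correct) / total_samples


def pass_at_k(correct: list[list[bool]]) -> float:
    """Fraction of problems with at least one correct sample."""
    return sum(any(row) for row in correct) / len(correct)


if __name__ == "__main__":
    # Toy data: 3 problems, 4 samples each (the benchmark uses 16 samples).
    results = [
        [True, False, False, False],
        [False, False, False, False],
        [True, True, True, True],
    ]
    print(avg_at_k(results), pass_at_k(results))
```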
