Skip to content

v0.0.1

Choose a tag to compare

@adohe adohe released this 16 Jun 03:38
· 38 commits to main since this release

Features

Self-Contained Single-Node RL Training

AReno ships as a single Python package with its own CUDA kernels, tensor-parallel inference engine, and OpenAI-compatible serving — no external training/inference backend to wire together. The RL loop is a short cycle of Trainer calls:

from areno.api import Trainer, ArenoConfig, Areno, SamplingParams, gspo_loss_fn

trainer = Trainer(
    world_size=1,
    model_path="Qwen/Qwen3-0.6B",
    backend_type=Areno,
    custom_config=ArenoConfig(tp_size=1),
)
trainer.init()

Or from the CLI:

areno train --ckpt Qwen/Qwen3-0.6B --dataset-path gsm8k:main \
  --reward-fn-path examples/math/math_verify_reward.py --algo gspo --tp-size 4

Swap algorithms via --algo: sft, dpo, gspo, grpo, ppo.

Agentic RL with Tool-Calling Trajectories

Built-in support for training agents that call tools and produce multi-turn trajectories. The trainer provides a local OpenAI-compatible proxy during rollout, parses tool calls, logs message-level trajectories, and assigns rewards at token boundaries — no external agent framework needed.

Key design choices:

  • Continuous batching: new samples enter as completions finish, maximizing GPU utilization during multi-turn rollout
  • Async rollout: separate event loop eliminates GIL contention with the training autograd engine
  • Shared tool parser: same parsing logic in training rollout and areno serve, ensuring consistent behavior

OpenAI-Compatible Serving

areno serve provides /v1/chat/completions and /v1/completions with tensor-parallel inference:

areno serve --model-path /path/to/model --tp-size 1 --port 8000

Multi-Model Support

Per-family model adapters for Qwen3, Qwen3.5, LLaMA, Gemma4, Bailing, and MiniCPM-V, registered through areno/models/registry.py. Add a new model family by creating areno/models/<family>/ — no core changes needed.

PyPI Distribution

Install via pip install areno --no-build-isolation. Automated sdist release workflow publishes from git tags.

Fixes

  • Unified reward function contract across all algorithms
  • Fixed PPO loss function export missing from public API
  • Improved agentic rollout batching and trajectory coalescing
  • Fixed CUDA extension build for sdist/metadata-only installs
  • Aligned serve sampling defaults with training rollout

Documentation and Examples

  • README with installation guide, quick start, and feature highlights
  • Developer contributing guide
  • CLI and SDK operation guides for training, serving, and agentic rollout
  • Agentic rollout examples with rollout sessions

What's Changed

Full Changelog: 3447d53...0c476db