rlib 3.0.0
Some much-needed, AI-powered TLC
A large modernisation of the library covering packaging, the env layer, the agent / trainer split, configuration, the CLI, docs and CI. This release contains breaking changes for users still on legacy `gym` and for any code that imports from `rlib.networks`, `rlib.utils.SyncMultiEnvTrainer`, `rlib.utils.VecEnv`, `rlib.utils.wrappers` or `rlib.utils.gym_compat`.
Highlights
- New `rlib.envs` package: a single canonical Gymnasium 5-tuple contract (`RLEnv` / `RLVecEnv` ABCs). `BatchEnv` (multiprocessing), `DummyBatchEnv` (in-process), Atari + classic-control wrappers, and the `ApplePicker` exploration env all live here. The `(terminated, truncated) → done` collapse happens once, in `RLVecEnv.merge_done` / `merge_info`.
- New `rlib.training` package, promoted out of `rlib.utils`. Adds `TrainerConfig` + per-agent subclasses, `TrainMode` (`StrEnum`), a `Returns` enum-of-functions wrapping `nstep_return` / `lambda_return` / GAE, a pluggable `Validator`, tqdm progress bars on every training loop with a live `score / loss / fps` postfix, and auto-logged hyperparameters.
- New `rlib.agent.Agent` base class + `ModelConfig`: every agent is split into `model.py` (network + frozen-dataclass config) and `trainer.py` (loop + per-trainer config). Reusable network blocks live in `rlib.models` (replacing `rlib.networks`).
- YAML CLI: every agent module is now runnable as

  ```
  python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
  python -m rlib.PPO examples/paper/configs/atari_ppo.yaml --set agent.config.lr=3e-4
  ```

  Configs support Hydra-style `constructor: dotted.path`, `partial: true`, `${name}` interpolation, and the helper factories `atari_envs(...)` / `classic_envs(...)` / `clone_module(...)`.
- Examples + paper reproductions: `cartpole_a2c.py`, `atari_ppo.py`, `montezuma_rnd.py`, plus `paper/` with 11 Python recipe scripts and 11 matching YAML configs reproducing every (agent, env-class) pair from arXiv:1910.09281.
- 🌐 Docs site: MkDocs + Material with an auto-generated mkdocstrings API reference for every public module, deployed at https://jhare96.github.io/reinforcement-learning/.
- Packaging + CI: Apache 2.0 license, PEP 621 `pyproject.toml` with `[atari]` / `[classic]` / `[mujoco]` / `[docs]` / `[dev]` extras, a Dockerfile, and PEP 561 `py.typed`. GitHub Actions CI runs `ruff` + `mypy` + `pytest` (Python 3.11 + 3.12) + `python -m build` + `twine check`. The Makefile mirrors CI 1:1 (`make ci` runs the same steps locally). Pre-commit config included.
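The `(terminated, truncated) → done` collapse in the `rlib.envs` bullet amounts to an element-wise OR over the batch of environments. The NumPy sketch below illustrates the idea; `merge_done` / `merge_info` here are standalone functions written for illustration, not the actual `RLVecEnv` methods (whose signatures may differ), and recording truncation under the legacy-gym `"TimeLimit.truncated"` info key is an assumption.

```python
import numpy as np


def merge_done(terminated: np.ndarray, truncated: np.ndarray) -> np.ndarray:
    """Collapse Gymnasium's (terminated, truncated) pair into one `done` flag per env."""
    # An episode is over if it reached a terminal state OR was cut short (e.g. by a time limit).
    return np.logical_or(terminated, truncated)


def merge_info(infos: list[dict], terminated: np.ndarray, truncated: np.ndarray) -> list[dict]:
    """Copy each env's info dict and record whether its episode was only truncated."""
    merged = []
    for info, term, trunc in zip(infos, terminated, truncated):
        info = dict(info)  # copy so the caller's dicts are not mutated
        # Assumed convention (legacy gym's TimeLimit key): truncated-but-not-terminated
        # episodes can still bootstrap from the value of their last observation.
        info["TimeLimit.truncated"] = bool(trunc and not term)
        merged.append(info)
    return merged


terminated = np.array([True, False, False])
truncated = np.array([False, True, False])
print(merge_done(terminated, truncated).tolist())  # [True, True, False]
infos = merge_info([{}, {}, {}], terminated, truncated)
print([info["TimeLimit.truncated"] for info in infos])  # [False, True, False]
```

Keeping the truncation flag alongside the merged `done` is one way a trainer can still treat a time-limit cut-off differently from a true terminal state when computing returns.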
Breaking changes
- Python 3.11+ required (was 3.8+), driven by `enum.member` in the `Returns` enum.
- PyTorch 1.13+ and Gymnasium 0.29+ required.
- Legacy `gym` support removed. Use `gymnasium.make` (re-exported as `rlib.envs.make`).
- `rlib.utils.gym_compat` removed.
- `rlib.networks` removed; see `rlib.agent` (base + `ModelConfig`) and `rlib.models` (network blocks).
- `rlib.utils.SyncMultiEnvTrainer` moved to `rlib.training`.
- `rlib.utils.VecEnv` / `rlib.utils.wrappers` moved to `rlib.envs` / `rlib.envs.wrappers`.
- Trainers + agents are config-only: pass a `TrainerConfig` / `ModelConfig` instead of a long kwargs list. Trainers access agents through `self.agent` (was `self.model`).
Migration
```diff
-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig
-from rlib.utils.VecEnv import BatchEnv, DummyBatchEnv
+from rlib.envs import BatchEnv, DummyBatchEnv
-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers import AtariEnv
-from rlib.networks.networks import NatureCNN
+from rlib.models import NatureCNN
-from rlib.utils.gym_compat import gym
+import gymnasium as gym
```

```python
# v2 trainer call
trainer = A2C(envs, model=model, total_steps=int(1e5), nsteps=5, ...)

# v3 trainer call
trainer = A2CTrainer(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, ...,
))
```

Fixed
- Several latent bugs in trainers, models and replay memory were shaken out by the refactor (`fastsample().item()` on 0-d tensors, RND/RANDAL observation-shape mismatches, an n-step DDQN `nsteps` typo, `get_value` shape, an `ApplePicker.generate_random_locs` membership bug, etc.).
- The supported surface is now lint-, format- and mypy-clean.
Acknowledgements
Wrappers adapted from OpenAI Baselines (see NOTICE). The RANDAL agent originated in arXiv:1910.09281.
Full Changelog: 2.0.0...v3.0.0