v3.0.0

jhare96 released this on 09 May 20:12 (commit 8dfd0af)

rlib 3.0.0

Some AI-powered, much-needed TLC.

A large modernisation of the library, covering packaging, the env layer, the agent / trainer split, configuration, the CLI, docs and CI. This release contains breaking changes for users still on legacy gym and for any code that imports from rlib.networks, rlib.utils.SyncMultiEnvTrainer or rlib.utils.VecEnv.

Highlights

  • New rlib.envs package: a single canonical Gymnasium 5-tuple contract (RLEnv / RLVecEnv ABCs). BatchEnv (multiprocessing), DummyBatchEnv (in-process), Atari + classic-control wrappers, and the ApplePicker exploration env all live here. The (terminated, truncated) → done collapse happens in exactly one place: RLVecEnv.merge_done / merge_info.
  • New rlib.training package — promoted out of rlib.utils. Adds TrainerConfig + per-agent subclasses, TrainMode (StrEnum), a Returns enum-of-functions wrapping nstep_return / lambda_return / GAE, a pluggable Validator, tqdm progress bars on every training loop with live score / loss / fps postfix, and auto-logged hyperparameters.
  • New rlib.agent.Agent base class + ModelConfig: every agent is now split into model.py (network + frozen-dataclass config) and trainer.py (loop + per-trainer config). Reusable network blocks live in rlib.models (replacing rlib.networks).
  • YAML CLI — every agent module is now runnable as
    python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
    python -m rlib.PPO examples/paper/configs/atari_ppo.yaml --set agent.config.lr=3e-4
    Hydra-style constructor: dotted.path, partial: true, ${name} interpolation, and helper factories atari_envs(...) / classic_envs(...) / clone_module(...).
  • Examples + paper reproductions: cartpole_a2c.py, atari_ppo.py, montezuma_rnd.py, plus examples/paper/ with 11 Python recipe scripts and 11 matching YAML configs reproducing every (agent, env-class) pair from arXiv:1910.09281.
  • 🌐 Docs site — MkDocs + Material with auto-generated mkdocstrings API reference for every public module. Deployed at https://jhare96.github.io/reinforcement-learning/.
  • Packaging + CI — Apache 2.0 license, PEP 621 pyproject.toml with [atari] / [classic] / [mujoco] / [docs] / [dev] extras, Dockerfile, PEP 561 py.typed. GitHub Actions CI: ruff + mypy + pytest (3.11 + 3.12) + python -m build + twine check. Makefile mirrors CI 1:1 (make ci runs the same steps locally). Pre-commit config included.
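
As a rough illustration of the (terminated, truncated) → done collapse described above, here is a minimal NumPy sketch for a batch of environments. The function names mirror RLVecEnv.merge_done / merge_info, but these are hypothetical standalone versions written for this note, not rlib's code, and the info key used to record truncation is an assumption.

```python
import numpy as np


def merge_done(terminated: np.ndarray, truncated: np.ndarray) -> np.ndarray:
    """Collapse Gymnasium's two end-of-episode flags into one done flag per env."""
    # An episode is over whether it hit a terminal state or was cut off (e.g. a time limit).
    return np.logical_or(terminated, truncated)


def merge_info(terminated: np.ndarray, truncated: np.ndarray, infos: list[dict]) -> list[dict]:
    """Copy each env's info dict and record whether the episode was only truncated."""
    merged = []
    for term, trunc, info in zip(terminated, truncated, infos):
        entry = dict(info)
        # A truncated (non-terminal) state should still be bootstrapped from V(s'),
        # so keep that distinction available after collapsing to a single flag.
        # The key name is an assumption borrowed from legacy gym's TimeLimit wrapper.
        entry["TimeLimit.truncated"] = bool(trunc and not term)
        merged.append(entry)
    return merged


terminated = np.array([True, False, False])
truncated = np.array([False, True, False])
print(merge_done(terminated, truncated))  # [ True  True False]
print([i["TimeLimit.truncated"] for i in merge_info(terminated, truncated, [{}, {}, {}])])  # [False, True, False]
```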

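To make the YAML CLI's --set overrides and ${name} interpolation more concrete, the sketch below applies both to the kind of nested dict that yaml.safe_load returns. It is not rlib's config loader: apply_override and interpolate are hypothetical helpers, the env_id / trainer keys are made up, and the rules assumed here (dotted keys create nested mappings, override values parse as Python literals, ${a.b} resolves from the config root) may differ from the real implementation.

```python
import ast
import re
from typing import Any

# Matches ${name} or ${dotted.name}; the accepted syntax here is an assumption.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def _lookup(root: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'agent.config.lr' from the config root."""
    node: Any = root
    for part in dotted.split("."):
        node = node[part]
    return node


def apply_override(cfg: dict, override: str) -> None:
    """Apply one 'dotted.key=value' override in place, creating parents as needed."""
    key, raw = override.split("=", 1)
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    try:
        # Numbers, lists, None, True/False parse as Python literals; anything else stays a string.
        node[leaf] = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        node[leaf] = raw


def interpolate(node: Any, root: dict) -> Any:
    """Return a copy of node with every ${name} reference resolved against root.

    No cycle detection: a self-referencing config would recurse forever.
    """
    if isinstance(node, dict):
        return {k: interpolate(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [interpolate(v, root) for v in node]
    if isinstance(node, str):
        whole = _REF.fullmatch(node)
        if whole:
            # A value that is exactly one reference keeps the referenced value's type.
            return interpolate(_lookup(root, whole.group(1)), root)
        # References embedded in a longer string are substituted as text.
        return _REF.sub(lambda m: str(_lookup(root, m.group(1))), node)
    return node


cfg = {
    "env_id": "CartPole-v1",
    "agent": {"config": {"lr": 1e-3}},
    "trainer": {"env": "${env_id}", "log_dir": "runs/a2c_${env_id}"},
}
apply_override(cfg, "agent.config.lr=3e-4")
resolved = interpolate(cfg, cfg)
print(resolved["agent"]["config"]["lr"])  # 0.0003
print(resolved["trainer"]["log_dir"])     # runs/a2c_CartPole-v1
```

Resolving interpolation into a fresh dict, rather than in place, leaves the raw config (with its ${...} placeholders) available for logging alongside the resolved values.
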
Breaking changes

  • Python 3.11+ required (was 3.8+), because the Returns enum uses enum.member, which was added in Python 3.11.
  • PyTorch 1.13+, Gymnasium 0.29+ required.
  • Legacy gym support removed. Use gymnasium.make (re-exported as rlib.envs.make).
  • rlib.utils.gym_compat removed.
  • rlib.networks removed — see rlib.agent (base + ModelConfig) and rlib.models (network blocks).
  • rlib.utils.SyncMultiEnvTrainer moved to rlib.training.
  • rlib.utils.VecEnv / rlib.utils.wrappers moved to rlib.envs / rlib.envs.wrappers.
  • Trainers + agents are config-only — pass a TrainerConfig / ModelConfig instead of a long kwargs list. Trainers consume agents through self.agent (was self.model).

Migration

-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig

-from rlib.utils.VecEnv  import BatchEnv, DummyBatchEnv
+from rlib.envs           import BatchEnv, DummyBatchEnv

-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers  import AtariEnv

-from rlib.networks.networks import NatureCNN
+from rlib.models            import NatureCNN

-from rlib.utils.gym_compat import gym
+import gymnasium as gym

# v2 trainer call
trainer = A2C(envs, model=model, total_steps=int(1e5), nsteps=5, ...)

# v3 trainer call
trainer = A2CTrainer(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, ...,
))
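
To show the config-only style independently of rlib's real classes, here is a sketch using a hypothetical ExampleTrainerConfig and ExampleTrainer. The field names and defaults are illustrative, and making the trainer config frozen is an assumption (the release describes ModelConfig as a frozen dataclass). Hyperparameters live in one immutable object, the trainer keeps the agent as self.agent, and variants are derived with dataclasses.replace rather than by mutation.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExampleTrainerConfig:
    """Hypothetical stand-in for a per-trainer config; fields and defaults are illustrative."""

    total_steps: int = 100_000
    nsteps: int = 5
    gamma: float = 0.99

    def __post_init__(self) -> None:
        # Validate once, when the config is built (dataclasses.replace re-runs this).
        if self.nsteps < 1:
            raise ValueError(f"nsteps must be >= 1, got {self.nsteps}")


class ExampleTrainer:
    """Hypothetical trainer: positional envs / agent / val_envs plus a single config object."""

    def __init__(self, envs: Any, agent: Any, val_envs: Any, config: ExampleTrainerConfig) -> None:
        self.envs = envs
        self.agent = agent  # v3 naming; v2 trainers stored this as self.model
        self.val_envs = val_envs
        self.config = config


base = ExampleTrainerConfig(total_steps=int(1e5), nsteps=5)
# Derive a variant instead of mutating: base is left untouched.
long_run = dataclasses.replace(base, total_steps=int(1e6))
trainer = ExampleTrainer(envs=None, agent=None, val_envs=None, config=long_run)
print(trainer.config.total_steps, base.total_steps)  # 1000000 100000

try:
    base.gamma = 0.9  # frozen dataclasses reject attribute assignment
except dataclasses.FrozenInstanceError as err:
    print(type(err).__name__)  # FrozenInstanceError
```

Because dataclasses.replace re-runs __post_init__, a derived config gets the same validation as the one it was copied from.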

Fixed

  • Several latent bugs in trainers, models and replay memory that the refactor shook out (fastsample().item() on 0-d tensors, RND/RANDAL observation-shape mismatches, n-step DDQN nsteps typo, get_value shape, ApplePicker.generate_random_locs membership bug, etc.).
  • The supported surface is now lint-, format- and mypy-clean.

Acknowledgements

Wrappers adapted from OpenAI Baselines (see NOTICE). The RANDAL agent originated in arXiv:1910.09281.

Full Changelog: 2.0.0...v3.0.0