rlib 3.0.0

Some AI-powered, much-needed TLC

A large modernisation of the library covering packaging, the env layer, the agent / trainer split, configuration, the CLI, docs and CI. This release contains breaking changes for users still on legacy gym, or anything that imported from rlib.networks, rlib.utils.SyncMultiEnvTrainer, or rlib.utils.VecEnv.

Highlights

New rlib.envs package — single canonical Gymnasium 5-tuple contract (RLEnv / RLVecEnv ABCs). BatchEnv (multiprocessing), DummyBatchEnv (in-process), Atari + classic-control wrappers, and the ApplePicker exploration env all live here. The (terminated, truncated) → done collapse happens once in RLVecEnv.merge_done / merge_info.
New rlib.training package — promoted out of rlib.utils. Adds TrainerConfig + per-agent subclasses, TrainMode (StrEnum), a Returns enum-of-functions wrapping nstep_return / lambda_return / GAE, a pluggable Validator, tqdm progress bars on every training loop with live score / loss / fps postfix, and auto-logged hyperparameters.
New rlib.agent.Agent base class + ModelConfig — every agent split into model.py (network + frozen-dataclass config) and trainer.py (loop + per-trainer config). Reusable network blocks live in rlib.models (replacing rlib.networks).
YAML CLI — every agent module is now runnable as
```
python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
python -m rlib.PPO examples/paper/configs/atari_ppo.yaml --set agent.config.lr=3e-4
```
Hydra-style constructor: dotted.path, partial: true, ${name} interpolation, and helper factories atari_envs(...) / classic_envs(...) / clone_module(...).
Examples + paper reproductions — cartpole_a2c.py, atari_ppo.py, montezuma_rnd.py, plus examples/paper with 11 Python recipe scripts and 11 matching YAML configs reproducing every (agent, env-class) pair from arXiv:1910.09281.
🌐 Docs site — MkDocs + Material with auto-generated mkdocstrings API reference for every public module. Deployed at https://jhare96.github.io/reinforcement-learning/.
Packaging + CI — Apache 2.0 license, PEP 621 pyproject.toml with [atari] / [classic] / [mujoco] / [docs] / [dev] extras, Dockerfile, PEP 561 py.typed. GitHub Actions CI: ruff + mypy + pytest (3.11 + 3.12) + python -m build + twine check. Makefile mirrors CI 1:1 (make ci runs the same steps locally). Pre-commit config included.
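
The (terminated, truncated) → done collapse mentioned for rlib.envs can be pictured as an element-wise OR across a batch of environments. The sketch below is only an illustration under that assumption: `merge_done_sketch` is a hypothetical helper operating on plain Python lists, and the real RLVecEnv.merge_done may use a different signature and array types.

```python
from typing import Sequence


def merge_done_sketch(
    terminated: Sequence[bool], truncated: Sequence[bool]
) -> list[bool]:
    """Collapse Gymnasium's two episode-end flags into one flag per env.

    Hypothetical helper for illustration; not rlib's actual merge_done.
    """
    if len(terminated) != len(truncated):
        raise ValueError("terminated and truncated must have the same length")
    # An episode is over if it reached a terminal state (terminated)
    # or was cut short, e.g. by a time limit (truncated).
    return [bool(t) or bool(u) for t, u in zip(terminated, truncated)]


# Three environments: the first terminated, the third was truncated.
done = merge_done_sketch([True, False, False], [False, False, True])
print(done)  # prints [True, False, True]
```

Doing the collapse in one place means training loops that only care about episode boundaries can keep reading a single flag, while code that needs the distinction can still look at the raw 5-tuple.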

Breaking changes

Python 3.11+ required (was 3.8+). Driven by the use of enum.member, which was added in Python 3.11, in the Returns enum.
PyTorch 1.13+, Gymnasium 0.29+ required.
Legacy gym support removed. Use gymnasium.make (re-exported as rlib.envs.make).
rlib.utils.gym_compat removed.
rlib.networks removed — see rlib.agent (base + ModelConfig) and rlib.models (network blocks).
rlib.utils.SyncMultiEnvTrainer moved to rlib.training.
rlib.utils.VecEnv / rlib.utils.wrappers moved to rlib.envs / rlib.envs.wrappers.
Trainers + agents are config-only — pass a TrainerConfig / ModelConfig instead of a long kwargs list. Trainers consume agents through self.agent (was self.model).

Migration

-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig

-from rlib.utils.VecEnv  import BatchEnv, DummyBatchEnv
+from rlib.envs           import BatchEnv, DummyBatchEnv

-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers  import AtariEnv

-from rlib.networks.networks import NatureCNN
+from rlib.models            import NatureCNN

-from rlib.utils.gym_compat import gym
+import gymnasium as gym

# v2 trainer call
trainer = A2C(envs, model=model, total_steps=int(1e5), nsteps=5, ...)

# v3 trainer call
trainer = A2CTrainer(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, ...,
))
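
The config-only style in the v3 call can be illustrated with frozen dataclasses. This is a sketch under assumptions, not rlib's definitions: the class names `TrainerConfigSketch` and `A2CConfigSketch` and the `value_coeff` field are hypothetical, and only `total_steps` and `nsteps` are taken from the call above. The notes describe the model config as a frozen dataclass; whether TrainerConfig is frozen too is assumed here.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TrainerConfigSketch:
    """Hypothetical base config shared by every trainer."""

    total_steps: int = 100_000
    nsteps: int = 5


@dataclass(frozen=True)
class A2CConfigSketch(TrainerConfigSketch):
    """Hypothetical per-agent subclass adding one agent-specific field."""

    value_coeff: float = 0.5


config = A2CConfigSketch(total_steps=int(1e5), nsteps=5)

# Frozen configs cannot be mutated in place...
try:
    config.nsteps = 20  # type: ignore[misc]
except FrozenInstanceError:
    print("config is immutable")

# ...so variants are derived with dataclasses.replace instead.
longer_rollouts = replace(config, nsteps=20)
print(config.nsteps, longer_rollouts.nsteps)
```

Keeping hyperparameters in one immutable object is also what makes features such as auto-logged hyperparameters straightforward, since the whole config can be serialised in one go.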

Fixed

Several latent bugs in trainers, models and replay memory shaken out by the refactor (fastsample().item() on 0-d tensors, RND/RANDAL observation-shape mismatches, n-step DDQN nsteps typo, get_value shape, ApplePicker.generate_random_locs membership bug, etc.).
Lint, format and mypy clean across the supported surface.

Acknowledgements

Wrappers adapted from OpenAI Baselines (see NOTICE). The RANDAL agent originated in arXiv:1910.09281.

Full Changelog: 2.0.0...v3.0.0
