Releases: jhare96/reinforcement-learning
v3.0.0
rlib 3.0.0
Some AI-powered, much-needed TLC
A large modernisation of the library covering packaging, the env layer, the agent / trainer split, configuration, the CLI, docs and CI. This release contains breaking changes for users still on legacy gym and for any code that imports from rlib.networks, rlib.utils.SyncMultiEnvTrainer, or rlib.utils.VecEnv.
Highlights
- New rlib.envs package: single canonical Gymnasium 5-tuple contract (RLEnv / RLVecEnv ABCs). BatchEnv (multiprocessing), DummyBatchEnv (in-process), Atari + classic-control wrappers, and the ApplePicker exploration env all live here. The (terminated, truncated) → done collapse happens once in RLVecEnv.merge_done / merge_info.
- New rlib.training package: promoted out of rlib.utils. Adds TrainerConfig + per-agent subclasses, TrainMode (StrEnum), a Returns enum-of-functions wrapping nstep_return / lambda_return / GAE, a pluggable Validator, tqdm progress bars on every training loop with live score / loss / fps postfix, and auto-logged hyperparameters.
- New rlib.agent.Agent base class + ModelConfig: every agent split into model.py (network + frozen-dataclass config) and trainer.py (loop + per-trainer config). Reusable network blocks live in rlib.models (replacing rlib.networks).
- YAML CLI: every agent module is now runnable as

    python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
    python -m rlib.PPO examples/paper/configs/atari_ppo.yaml --set agent.config.lr=3e-4

  Hydra-style constructor: dotted.path, partial: true, ${name} interpolation, and helper factories atari_envs(...) / classic_envs(...) / clone_module(...).
- Examples + paper reproductions: cartpole_a2c.py, atari_ppo.py, montezuma_rnd.py, plus paper with 11 Python recipe scripts and 11 matching YAML configs reproducing every (agent, env-class) pair from arXiv:1910.09281.
- 🌐 Docs site: MkDocs + Material with auto-generated mkdocstrings API reference for every public module. Deployed at https://jhare96.github.io/reinforcement-learning/.
- Packaging + CI: Apache 2.0 license, PEP 621 pyproject.toml with [atari] / [classic] / [mujoco] / [docs] / [dev] extras, Dockerfile, PEP 561 py.typed. GitHub Actions CI: ruff + mypy + pytest (3.11 + 3.12) + python -m build + twine check. Makefile mirrors CI 1:1 (make ci runs the same steps locally). Pre-commit config included.
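The (terminated, truncated) → done collapse from the rlib.envs highlight can be pictured with a minimal sketch. It is an illustration only: collapse_done is a hypothetical helper written for these notes, not the signature of RLVecEnv.merge_done, which is not shown here, and plain Python lists stand in for whatever array type the library uses.

```python
from typing import Sequence


def collapse_done(terminated: Sequence[bool], truncated: Sequence[bool]) -> list[bool]:
    """Merge Gymnasium's two end-of-episode flags into one done flag per env.

    Hypothetical helper for illustration; not the library's merge_done.
    """
    if len(terminated) != len(truncated):
        raise ValueError("terminated and truncated must have one entry per env")
    # An episode is over if it ended naturally OR was cut off (e.g. a time limit).
    return [bool(t) or bool(u) for t, u in zip(terminated, truncated)]


# Flags as a batch of four envs might report them after one step.
terminated = [False, True, False, False]
truncated = [False, False, True, False]
done = collapse_done(terminated, truncated)
print(done)  # [False, True, True, False]
```

Code that needs to treat time-limit truncation differently from a true terminal state would still have to keep the original flags around.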
Breaking changes
- Python 3.11+ required (was 3.8+). Driven by enum.member in the Returns enum.
- PyTorch 1.13+, Gymnasium 0.29+ required.
- Legacy gym support removed. Use gymnasium.make (re-exported as rlib.envs.make). rlib.utils.gym_compat removed.
- rlib.networks removed: see rlib.agent (base + ModelConfig) and rlib.models (network blocks).
- rlib.utils.SyncMultiEnvTrainer moved to rlib.training.
- rlib.utils.VecEnv / rlib.utils.wrappers moved to rlib.envs / rlib.envs.wrappers.
- Trainers + agents are config-only: pass a TrainerConfig / ModelConfig instead of a long kwargs list. Trainers consume agents through self.agent (was self.model).
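To illustrate why enum.member drives the Python 3.11 floor, here is a minimal sketch of an enum-of-functions. A plain function assigned in an Enum body is treated as a method rather than a member, so it has to be wrapped. Everything below is an assumption made for illustration: ReturnsSketch, the shared function signature and the two return estimators are stand-ins, not the library's Returns enum or its nstep_return / lambda_return implementations. The import fallback is only there so the sketch also runs on older interpreters.

```python
from enum import Enum

try:
    from enum import member  # available from Python 3.11
except ImportError:
    # Fallback for older interpreters: a functools.partial object is not
    # a descriptor, so Enum also keeps it as a member instead of a method.
    from functools import partial as member


def nstep_return(rewards, values, last_value, gamma=0.99, lam=1.0):
    """Discounted return per step, bootstrapped from last_value.

    values and lam are unused; they are accepted only to keep a shared signature.
    """
    out, running = [], last_value
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def lambda_return(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Lambda-return: blends the one-step target with the longer return."""
    next_values = list(values[1:]) + [last_value]
    out, running = [], last_value
    for r, v_next in zip(reversed(rewards), reversed(next_values)):
        running = r + gamma * ((1.0 - lam) * v_next + lam * running)
        out.append(running)
    return out[::-1]


class ReturnsSketch(Enum):
    """Hypothetical enum-of-functions; each member wraps one estimator."""

    NSTEP = member(nstep_return)
    LAMBDA = member(lambda_return)

    def __call__(self, *args, **kwargs):
        # Delegate to the wrapped function so members are directly callable.
        return self.value(*args, **kwargs)


rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.5, 0.5]
print(ReturnsSketch.NSTEP(rewards, values, last_value=1.0, gamma=0.5))
# [1.625, 1.25, 2.5]
# Members can also be looked up by name, e.g. from a config string.
print(ReturnsSketch["LAMBDA"](rewards, values, last_value=1.0, gamma=0.5, lam=0.0))
# [1.25, 0.25, 2.5]
```

With lam=1.0 the lambda-return in this sketch reduces to the n-step return, and with lam=0.0 it reduces to the one-step target r + gamma * V(next state).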
Migration
-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig
-from rlib.utils.VecEnv import BatchEnv, DummyBatchEnv
+from rlib.envs import BatchEnv, DummyBatchEnv
-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers import AtariEnv
-from rlib.networks.networks import NatureCNN
+from rlib.models import NatureCNN
-from rlib.utils.gym_compat import gym
+import gymnasium as gym

# v2 trainer call
trainer = A2C(envs, model=model, total_steps=int(1e5), nsteps=5, ...)
# v3 trainer call
trainer = A2CTrainer(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, ...,
))

Fixed
- Several latent bugs in trainers, models and replay memory shaken out by the refactor (fastsample().item() on 0-d tensors, RND/RANDAL observation-shape mismatches, n-step DDQN nsteps typo, get_value shape, ApplePicker.generate_random_locs membership bug, etc.).
- Lint, format and mypy clean across the supported surface.
Acknowledgements
Wrappers adapted from OpenAI Baselines (see NOTICE). The RANDAL agent originated in arXiv:1910.09281.
Full Changelog: 2.0.0...v3.0.0
rlib 2.0.0
PyTorch conversion of synchronous reinforcement learning algorithms
rlib 1.0.0
A small reinforcement learning library used for the MSc dissertation project 'Dealing with sparse rewards in reinforcement learning' at the University of Sheffield.
Uses TensorFlow v1.14 as the framework for training the neural network models used by the RL agents.
This repository has working implementations of the following reinforcement learning agents:
- Advantage Actor Critic (A2C)
- Synchronous n-step Double Deep Q Network (Sync-DDQN)
- Proximal Policy Optimisation (PPO)
- Random Network Distillation (RND)
- UNREAL-A2C2, an A2C-CNN version of the UNREAL agent
- Random Network Distillation with Auxiliary Learning (RANDAL), a novel solution combining the UNREAL and RND agents