Releases: jhare96/reinforcement-learning

v3.0.0

09 May 20:12
8dfd0af

rlib 3.0.0

Some much-needed, AI-powered TLC

A large modernisation of the library covering packaging, the env layer, the agent / trainer split, configuration, the CLI, docs and CI. This release contains breaking changes for anyone still on legacy gym, and for any code that imported from rlib.networks, rlib.utils.SyncMultiEnvTrainer, or rlib.utils.VecEnv.

Highlights

  • New rlib.envs package — single canonical Gymnasium 5-tuple contract (RLEnv / RLVecEnv ABCs). BatchEnv (multiprocessing), DummyBatchEnv (in-process), Atari + classic-control wrappers, and the ApplePicker exploration env all live here. The (terminated, truncated) → done collapse happens once in RLVecEnv.merge_done / merge_info.
  • New rlib.training package — promoted out of rlib.utils. Adds TrainerConfig + per-agent subclasses, TrainMode (StrEnum), a Returns enum-of-functions wrapping nstep_return / lambda_return / GAE, a pluggable Validator, tqdm progress bars on every training loop with live score / loss / fps postfix, and auto-logged hyperparameters.
  • New rlib.agent.Agent base class + ModelConfig — every agent split into model.py (network + frozen-dataclass config) and trainer.py (loop + per-trainer config). Reusable network blocks live in rlib.models (replacing rlib.networks).
  • YAML CLI — every agent module is now runnable as
    python -m rlib.A2C examples/paper/configs/classic_a2c.yaml
    python -m rlib.PPO examples/paper/configs/atari_ppo.yaml --set agent.config.lr=3e-4
    Hydra-style constructor: dotted.path, partial: true, ${name} interpolation, and helper factories atari_envs(...) / classic_envs(...) / clone_module(...).
  • Examples + paper reproductions: cartpole_a2c.py, atari_ppo.py, montezuma_rnd.py, plus an examples/paper directory with 11 Python recipe scripts and 11 matching YAML configs reproducing every (agent, env-class) pair from arXiv:1910.09281.
  • 🌐 Docs site — MkDocs + Material with auto-generated mkdocstrings API reference for every public module. Deployed at https://jhare96.github.io/reinforcement-learning/.
  • Packaging + CI — Apache 2.0 license, PEP 621 pyproject.toml with [atari] / [classic] / [mujoco] / [docs] / [dev] extras, Dockerfile, PEP 561 py.typed. GitHub Actions CI: ruff + mypy + pytest (3.11 + 3.12) + python -m build + twine check. Makefile mirrors CI 1:1 (make ci runs the same steps locally). Pre-commit config included.
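
The (terminated, truncated) → done collapse mentioned in the first highlight can be sketched roughly as below. This is an illustration only, not rlib's actual RLVecEnv.merge_done: the function name merge_done_sketch and the plain-list inputs are assumptions made for the example.

```python
from typing import Sequence


def merge_done_sketch(
    terminated: Sequence[bool], truncated: Sequence[bool]
) -> list[bool]:
    """Collapse per-env (terminated, truncated) flags into a single done flag.

    Hypothetical helper for illustration; rlib's real merge_done may differ.
    """
    if len(terminated) != len(truncated):
        raise ValueError("terminated and truncated must have the same length")
    # An episode is over if it ended naturally OR was cut off (e.g. a time limit).
    return [bool(t) or bool(tr) for t, tr in zip(terminated, truncated)]


# Three parallel environments: one terminated, one truncated, one still running.
done = merge_done_sketch([True, False, False], [False, True, False])
print(done)  # prints [True, True, False]
```

Doing the merge in one place means downstream training code only ever sees a single done flag per environment.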

Breaking changes

  • Python 3.11+ required (was 3.8+). Driven by enum.member in the Returns enum.
  • PyTorch 1.13+, Gymnasium 0.29+ required.
  • Legacy gym support removed. Use gymnasium.make (re-exported as rlib.envs.make).
  • rlib.utils.gym_compat removed.
  • rlib.networks removed — see rlib.agent (base + ModelConfig) and rlib.models (network blocks).
  • rlib.utils.SyncMultiEnvTrainer moved to rlib.training.
  • rlib.utils.VecEnv / rlib.utils.wrappers moved to rlib.envs / rlib.envs.wrappers.
  • Trainers + agents are config-only — pass a TrainerConfig / ModelConfig instead of a long kwargs list. Trainers consume agents through self.agent (was self.model).

Migration

-from rlib.utils.SyncMultiEnvTrainer import SyncMultiEnvTrainer
+from rlib.training import SyncMultiEnvTrainer, TrainerConfig

-from rlib.utils.VecEnv  import BatchEnv, DummyBatchEnv
+from rlib.envs           import BatchEnv, DummyBatchEnv

-from rlib.utils.wrappers import AtariEnv
+from rlib.envs.wrappers  import AtariEnv

-from rlib.networks.networks import NatureCNN
+from rlib.models            import NatureCNN

-from rlib.utils.gym_compat import gym
+import gymnasium as gym

# v2 trainer call
trainer = A2C(envs, model=model, total_steps=int(1e5), nsteps=5, ...)

# v3 trainer call
trainer = A2CTrainer(envs, agent, val_envs, config=TrainerConfig(
    total_steps=int(1e5), nsteps=5, ...,
))
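
The config-object style of the v3 call can be illustrated with a small frozen dataclass. This is a sketch under assumptions, not rlib's TrainerConfig: the class name TrainerConfigSketch, its defaults, and the gamma field are hypothetical, and it assumes trainer configs follow the frozen-dataclass pattern the release describes for model configs.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TrainerConfigSketch:
    """Hypothetical stand-in for a trainer config; field names are illustrative."""

    total_steps: int = 100_000
    nsteps: int = 5
    gamma: float = 0.99


base = TrainerConfigSketch(total_steps=int(1e5), nsteps=5)

# Frozen dataclasses reject in-place mutation...
try:
    base.nsteps = 20
except FrozenInstanceError:
    print("config is immutable")

# ...so variants are created with dataclasses.replace instead.
longer_rollout = replace(base, nsteps=20)
print(base.nsteps, longer_rollout.nsteps)  # prints 5 20
```

Bundling hyperparameters this way keeps trainer signatures short and makes a run's settings easy to log or compare as a single object.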

Fixed

  • Several latent bugs in trainers, models and replay memory shaken out by the refactor (fastsample().item() on 0-d tensors, RND/RANDAL observation-shape mismatches, n-step DDQN nsteps typo, get_value shape, ApplePicker.generate_random_locs membership bug, etc.).
  • Lint, format and mypy clean across the supported surface.

Acknowledgements

Wrappers adapted from OpenAI Baselines (see NOTICE). The RANDAL agent originated in arXiv:1910.09281.

Full Changelog: 2.0.0...v3.0.0

rlib 2.0.0

27 Aug 16:18

PyTorch conversion of the synchronous reinforcement learning algorithms.

rlib 1.0.0

27 Aug 16:30
0744a52

A small reinforcement learning library used for the MSc dissertation project 'Dealing with sparse rewards in reinforcement learning' at the University of Sheffield.

Uses TensorFlow v1.14 as the framework for training the neural network models used by the RL agents.

This repository has working implementations of the following reinforcement learning agents:

  1. Advantage Actor Critic (A2C)
  2. Synchronous n-step Double Deep Q Network (Sync-DDQN)
  3. Proximal Policy Optimisation (PPO)
  4. Random Network Distillation (RND)
  5. UNREAL-A2C2, an A2C-CNN version of the UNREAL agent
  6. Random Network Distillation with Auxiliary Learning (RANDAL), a novel solution combining the UNREAL and RND agents