v0.1.0

Pre-release

@keraJLi released this 02 Sep 15:55
· 56 commits to main since this release

This update changes the algorithm interface:

Before (v0.0.x):

    from rejax import PPO, PPOConfig
    config = PPOConfig.create(**kwargs)
    PPO.train(config, rng)

After (v0.1.0):

    from rejax import PPO
    ppo = PPO.create(**kwargs)
    ppo.train(rng)

Rationale:

  1. It's simpler and more intuitive.
  2. Parameters and algorithm subroutines depend on each other (e.g. an algorithm that samples from a replay buffer also has the buffer's size as a hyperparameter). Collecting them in the same class modularizes the algorithm architecture.
  3. We can eliminate a lot of boilerplate code by inheriting from mixins that have both parameters and subroutines.
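
To illustrate points 2 and 3, here is a minimal plain-Python sketch of a mixin that carries both a hyperparameter and the subroutine that depends on it. All names in it (ReplayBufferMixin, ToyAlgorithm, buffer_size, batch_size, learning_rate) are hypothetical and chosen for illustration only; this is not rejax's actual class hierarchy, and it uses the standard library instead of JAX.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBufferMixin:
    """Hypothetical mixin: buffer hyperparameters plus the buffer subroutines."""

    buffer_size: int = 4  # hyperparameter used by store()
    batch_size: int = 2   # hyperparameter used by sample()
    _storage: list = field(default_factory=list, repr=False)

    def store(self, transition):
        # Append the transition and drop the oldest one once the buffer is full.
        self._storage.append(transition)
        if len(self._storage) > self.buffer_size:
            self._storage.pop(0)

    def sample(self, rng):
        # Draw up to batch_size transitions without replacement.
        k = min(self.batch_size, len(self._storage))
        return rng.sample(self._storage, k)


@dataclass
class ToyAlgorithm(ReplayBufferMixin):
    """Hypothetical algorithm that inherits parameters and subroutines together."""

    learning_rate: float = 0.001

    @classmethod
    def create(cls, **kwargs):
        # Mirrors the merged interface: hyperparameters go straight to the algorithm.
        return cls(**kwargs)

    def train(self, rng):
        # Stand-in for a training loop: store ten dummy transitions, then sample.
        for step in range(10):
            self.store(step)
        return self.sample(rng)


algo = ToyAlgorithm.create(buffer_size=4, batch_size=2)
batch = algo.train(random.Random(0))
```

Because the buffer size lives next to the code that enforces it, any algorithm inheriting from the mixin gets both without a separate config class.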

What's Changed

  • Merged config and algorithm
  • New algorithm: Implicit Quantile Networks by Dabney et al., 2018
  • New algorithm: Parallelised Q Networks by Gallici, Fellows et al., 2024
  • Removed DDPG, as it is now a special case of TD3
  • Added support for more than two critics to SAC and TD3
  • Changed default hyperparameters (mostly to powers of 2)
  • Renamed hyperparameters: gradient_steps -> num_epochs, tau -> polyak
  • Removed rejax.evaluate.make_evaluate (use rejax.evaluate.evaluate instead)
  • Moved rejax.algos.networks and rejax.algos.buffers to rejax
  • New module: rejax.compat implements loading environments from different packages. It currently supports gymnax, brax, and navix
  • Removed rejax.brax2gymnax (use the new rejax.compat instead)
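
When migrating keyword arguments written for v0.0.x, the two renamed hyperparameters can be mapped mechanically. The helper below is a hypothetical sketch, not part of rejax: migrate_kwargs and RENAMED_HYPERPARAMETERS are names made up for this example. It only renames keys; it does not adjust values, so it is worth checking whether a value's convention or default changed along with its name.

```python
# Renames listed in these release notes (old name -> new name).
RENAMED_HYPERPARAMETERS = {
    "gradient_steps": "num_epochs",
    "tau": "polyak",
}


def migrate_kwargs(old_kwargs):
    """Return a copy of old_kwargs with renamed hyperparameter keys."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        new_name = RENAMED_HYPERPARAMETERS.get(name, name)
        if new_name in new_kwargs:
            # Both the old and the new name were passed; refuse to guess.
            raise ValueError(f"duplicate hyperparameter after renaming: {new_name}")
        new_kwargs[new_name] = value
    return new_kwargs


old = {"gradient_steps": 4, "tau": 0.95, "learning_rate": 3e-4}
new = migrate_kwargs(old)
```

The resulting dictionary could then be passed on, for example as PPO.create(**new), assuming the remaining keys are still valid in v0.1.0.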

Full Changelog: v0.0.1...v0.1.0