v0.1.0
Pre-release
This update changes the algorithm interface:
| Before (v0.0.x) | After (v0.1.0) |
|---|---|
|
|
Rationale:
- It's simpler and more intuitive
- Parameters and algorithm subroutines depend on each other (e.g. an algorithm that samples from a replay buffer also has the buffer's size as a hyperparameter). It makes sense to collect them in the same class to modularize the algorithm architecture.
- We can eliminate a lot of boilerplate code by inheriting from mixins that have both parameters and subroutines.
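To illustrate the rationale above, here is a minimal sketch of how mixins can bundle a hyperparameter with the subroutine that depends on it. It is not taken from the rejax code base: `ReplayBufferMixin`, `TargetNetworkMixin`, `ToyAlgorithm`, and all of their methods are hypothetical names, and the meaning given to `polyak` (the weight kept on the old target value) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReplayBufferMixin:
    """Hypothetical mixin: the buffer size and the code that uses it live together."""

    buffer_size: int = 4  # hyperparameter

    def make_buffer(self):
        return []

    def store(self, buffer, transition):
        # Keep only the most recent `buffer_size` transitions.
        return (buffer + [transition])[-self.buffer_size:]


@dataclass
class TargetNetworkMixin:
    """Hypothetical mixin: the averaging coefficient and the target update live together."""

    polyak: float = 0.5  # assumed here to be the weight kept on the old target

    def update_target(self, target, online):
        return self.polyak * target + (1.0 - self.polyak) * online


@dataclass
class ToyAlgorithm(ReplayBufferMixin, TargetNetworkMixin):
    """Inherits both the hyperparameters and the subroutines, with no separate config object."""

    learning_rate: float = 0.25


algo = ToyAlgorithm(buffer_size=2, polyak=0.5)
buffer = algo.make_buffer()
for transition in [1, 2, 3]:
    buffer = algo.store(buffer, transition)
```

Because every field has a default and the instance is built with keyword arguments, a new algorithm only needs to list the mixins it combines and any hyperparameters of its own.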
What's Changed
- Merged config and algorithm
- New algorithm: Implicit Quantile Networks by Dabney et al., 2018
- New algorithm: Parallelised Q Networks by Gallici, Fellows et al., 2024
- Removed DDPG, as it is now a special case of TD3
- Added support for more than two critics to SAC and TD3
- Changed default hyperparameters (to be powers of 2 mostly)
- Changed the names of hyperparameters (`gradient_steps` -> `num_epochs`, `tau` -> `polyak`)
- Removed `rejax.evaluate.make_evaluate`, use `rejax.evaluate.evaluate` instead
- Moved `rejax.algos.networks` and `rejax.algos.buffers` to `rejax`
- New module: `rejax.compat` implements loading environments from different packages. Currently supports gymnax, brax, and navix
- Removed `rejax.brax2gymnax` (use the new `rejax.compat` instead)
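As background for the change that lets SAC and TD3 use more than two critics, the sketch below shows one common way to form a bootstrapped target from any number of critic estimates: take the minimum. This is a generic illustration, not the rejax implementation; the function name `td_target` and its arguments are hypothetical, and it leaves out the entropy term that SAC adds.

```python
def td_target(reward, done, discount, next_q_values):
    """Bootstrapped target using the most pessimistic critic estimate.

    `next_q_values` holds one estimate per critic for the next state-action
    pair; its length is not fixed, so two, three, or more critics all work.
    """
    min_q = min(next_q_values)
    return reward + discount * (1.0 - done) * min_q


# Three critics disagree about the next state's value; the smallest one is used.
target = td_target(reward=1.0, done=0.0, discount=0.5, next_q_values=[4.0, 2.0, 6.0])
```

Taking the minimum generalizes the two-critic case and tends to reduce value overestimation, at the cost of training additional networks.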
Full Changelog: v0.0.1...v0.1.0