Release v0.1.0 · keraJLi/rejax

This update changes the algorithm interface:

| Before (v0.0.x) | After (v0.1.0) |
| --- | --- |
| `from rejax import PPO, PPOConfig`<br>`config = PPOConfig.create(**kwargs)`<br>`PPO.train(config, rng)` | `from rejax import PPO`<br>`ppo = PPO.create(**kwargs)`<br>`ppo.train(rng)` |

Rationale:

It's simpler and more intuitive.
Parameters and algorithm subroutines depend on each other (e.g. an algorithm that samples from a replay buffer also has the buffer's size as a hyperparameter). Collecting them in the same class keeps the algorithm architecture modular.
We can eliminate a lot of boilerplate code by inheriting from mixins that have both parameters and subroutines.
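
The mixin idea can be illustrated with a small, pure-Python sketch. It is not rejax's actual implementation: `ReplayBufferMixin`, `ToyAlgorithm`, and all field names are hypothetical, and Python's `random.Random` stands in for a JAX PRNG key. The sketch only shows how a hyperparameter (`buffer_size`) and the subroutines that depend on it can live in one inheritable class, and how the `create(**kwargs)` / `train(rng)` pattern from the table above then needs no separate config object.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBufferMixin:
    # Hypothetical mixin: the hyperparameters and the subroutines
    # that depend on them are defined side by side.
    buffer_size: int = 8
    batch_size: int = 2
    buffer: list = field(default_factory=list)

    def store(self, transition):
        self.buffer.append(transition)
        # Drop the oldest transitions once the buffer exceeds its capacity.
        del self.buffer[: max(0, len(self.buffer) - self.buffer_size)]

    def sample(self, rng):
        # Sample without replacement, never more than the buffer holds.
        return rng.sample(self.buffer, min(self.batch_size, len(self.buffer)))


@dataclass
class ToyAlgorithm(ReplayBufferMixin):
    # Algorithm-specific hyperparameter, added next to the inherited ones.
    learning_rate: float = 0.001

    @classmethod
    def create(cls, **kwargs):
        # One object holds both the hyperparameters and the training logic.
        return cls(**kwargs)

    def train(self, rng):
        # Stand-in for a training loop: store 20 dummy transitions,
        # then draw one batch from the buffer.
        for step in range(20):
            self.store(step)
        return self.sample(rng)


algo = ToyAlgorithm.create(buffer_size=4, batch_size=3)
batch = algo.train(random.Random(0))
```

Any algorithm that inherits from the mixin gets both the buffer hyperparameters and the buffer subroutines without repeating either, which is the boilerplate saving described above.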

What's Changed

Merged config and algorithm
New algorithm: Implicit Quantile Networks by Dabney et al., 2018
New algorithm: Parallelised Q Networks by Gallici, Fellows et al., 2024
Removed DDPG, as it is now a special case of TD3
Added support for more than two critics to SAC and TD3
Changed default hyperparameters (mostly to powers of 2)
Renamed hyperparameters: gradient_steps -> num_epochs, tau -> polyak
Removed rejax.evaluate.make_evaluate; use rejax.evaluate.evaluate instead
Moved rejax.algos.networks and rejax.algos.buffers to the top-level rejax package
New module: rejax.compat loads environments from different packages. It currently supports gymnax, brax, and navix
Removed rejax.brax2gymnax (use the new rejax.compat instead)
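
For code that still passes the old hyperparameter names, the renames listed above could be applied with a small helper along these lines. This is a hypothetical sketch, not part of rejax: `migrate_kwargs` and `RENAMED_HYPERPARAMETERS` are made-up names, and the helper only renames keys. It assumes values carry over unchanged, which the release notes do not state, and it does not account for the changed defaults.

```python
# Renames taken from the v0.1.0 release notes.
RENAMED_HYPERPARAMETERS = {
    "gradient_steps": "num_epochs",
    "tau": "polyak",
}


def migrate_kwargs(old_kwargs):
    """Return a copy of old_kwargs with v0.0.x names replaced by v0.1.0 names."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        new_name = RENAMED_HYPERPARAMETERS.get(name, name)
        if new_name in new_kwargs:
            # Both the old and the new name were given; refuse to guess.
            raise ValueError(f"duplicate hyperparameter after renaming: {new_name}")
        new_kwargs[new_name] = value
    return new_kwargs


old = {"gradient_steps": 4, "tau": 0.95, "learning_rate": 0.0003}
new = migrate_kwargs(old)
```

The result could then be passed to the new interface, e.g. `PPO.create(**new)`, provided the algorithm in question accepts those hyperparameters.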

Full Changelog: v0.0.1...v0.1.0