Release SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN · Stable-Baselines-Team/stable-baselines3-contrib

Breaking Changes:

Upgraded to Stable-Baselines3 >= 2.3.0
The default learning_starts parameter of QRDQN has been changed to be consistent with the other off-policy algorithms

# SB3 < 2.3.0 default hyperparameters; 50_000 corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
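Code that relied on the old default can pass learning_starts explicitly. The following is a minimal sketch, not part of the release: the helper names qrdqn_kwargs and make_model and the CartPole-v1 environment are hypothetical choices for illustration, and only the two values 50_000 and 100 come from the notes above.

```python
from typing import Any, Dict

# Values taken from the release notes above.
OLD_LEARNING_STARTS = 50_000  # default before 2.3.0 (Atari-oriented)
NEW_LEARNING_STARTS = 100     # default from 2.3.0 onward


def qrdqn_kwargs(keep_old_default: bool) -> Dict[str, Any]:
    """Hypothetical helper: make the learning_starts choice explicit."""
    value = OLD_LEARNING_STARTS if keep_old_default else NEW_LEARNING_STARTS
    return {"learning_starts": value}


def make_model(env_id: str = "CartPole-v1", keep_old_default: bool = True):
    """Hypothetical helper: build a QRDQN model with an explicit learning_starts."""
    # Imported lazily so qrdqn_kwargs can be used without sb3_contrib installed.
    from sb3_contrib import QRDQN

    return QRDQN("MlpPolicy", env_id, **qrdqn_kwargs(keep_old_default))
```

Passing the value explicitly keeps training runs reproducible across the upgrade, since the behavior no longer depends on which default the installed version ships.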

New Features:

Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
Log the success rate rollout/success_rate, when available, for on-policy algorithms
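To illustrate what a success-rate metric of this kind measures, here is a simplified sketch and not the library's implementation. It assumes the environment reports an is_success entry in its info dict when an episode ends, and that the rate is averaged over a sliding window of recent episodes. The class name SuccessTracker and the window size of 100 are assumptions made for this example.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class SuccessTracker:
    """Hypothetical tracker: mean of is_success over the most recent episodes."""

    def __init__(self, window_size: int = 100) -> None:
        # Old entries are dropped automatically once the window is full.
        self.buffer: Deque[bool] = deque(maxlen=window_size)

    def update(self, infos: List[Dict[str, Any]], dones: List[bool]) -> None:
        # Record a result only for environments whose episode just ended
        # and whose info dict actually carries an is_success flag.
        for info, done in zip(infos, dones):
            is_success = info.get("is_success")
            if done and is_success is not None:
                self.buffer.append(bool(is_success))

    def success_rate(self) -> Optional[float]:
        # None means "not available", so nothing would be logged.
        if not self.buffer:
            return None
        return sum(self.buffer) / len(self.buffer)
```

Returning None when no episode has reported is_success mirrors the "when available" wording: environments that never set the flag produce no metric at all.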

Others:

Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
Fixed sb3_contrib/common/maskable/*.py type annotations
Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
Fixed sb3_contrib/common/vec_env/async_eval.py type annotations

Documentation:

Added notes about MaskablePPO (evaluation and multi-process) (@icheered)

Full Changelog: v2.2.1...v2.3.0
