SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.3.0
- The default learning_starts parameter of QRDQN has been changed from 50_000 to 100 to be consistent with the other off-policy algorithms:
# SB3 < 2.3.0 default hyperparameters; 50_000 corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
New Features:
- Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
- Log success rate rollout/success_rate when available for on-policy algorithms
Others:
- Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
- Fixed sb3_contrib/common/maskable/*.py type annotations
- Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
- Fixed sb3_contrib/common/vec_env/async_eval.py type annotations
Documentation:
- Add some additional notes about MaskablePPO (evaluation and multi-process) (@icheered)
Full Changelog: v2.2.1...v2.3.0
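
Note on the learning_starts change listed under Breaking Changes: the parameter sets how many environment steps are collected before gradient updates begin, so the old default could leave a short training run with no updates at all. The sketch below is a simplified, library-free illustration of that gating and is not the actual SB3 training loop. The helper name count_gradient_updates, the 10_000-step budget, and the train_freq value of 4 are assumptions chosen for illustration.

```python
def count_gradient_updates(
    total_timesteps: int,
    learning_starts: int,
    train_freq: int = 4,  # assumed: one gradient update every 4 environment steps
) -> int:
    """Count the updates a simplified off-policy loop would perform.

    Hypothetical helper: an update happens only once the warm-up phase
    (learning_starts steps) is over, and then every train_freq steps.
    """
    updates = 0
    for step in range(1, total_timesteps + 1):
        warmup_done = step > learning_starts
        if warmup_done and step % train_freq == 0:
            updates += 1
    return updates


# Same illustrative training budget, old vs. new default for learning_starts.
old_default = count_gradient_updates(10_000, learning_starts=50_000)
new_default = count_gradient_updates(10_000, learning_starts=100)
print(f"updates with learning_starts=50_000: {old_default}")
print(f"updates with learning_starts=100:    {new_default}")
```

Scripts that relied on the previous behavior (for example Atari experiments) can keep it by passing learning_starts=50_000 explicitly, as in the commented-out line in the Breaking Changes section.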