
Upgrade to use RLlib master (almost v1.3). #10

Open: wants to merge 15 commits into base: master
Conversation

Manuscrit (Collaborator)

Remove the use of lock_replay during training (it must not be used in LTFT).
Create submodule marltoolbox.utils.log.
Move the model-summarizing methods into a helper class.
Use before_init_loss instead of after_init (policy class factory arg).

Maxime Riché added 15 commits on April 15, 2021 at 15:28
Remove the use of lock_replay during training (it must not be used in LTFT).
Create submodule marltoolbox.utils.log.
Move the model-summarizing methods into a helper class.
Use before_init_loss instead of after_init (policy class factory arg).
Fix some tests.
Add augmented R2D2.
Add examples with R2D2.
Add some end-to-end tests for amTFT vs. exploiter, meta-game, and R2D2.
Fix a performance issue in entropy computation.
Some refactoring of configs and hyperparameters (DQN, R2D2, LOLA-PG).
Tune hyperparameters for R2D2.
A few corrections.
A few style changes.
Partial refactoring of the coin game envs tests.
Add logging & plot of exploration temperature.
…than 2.
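One of the commits above fixes a speed issue in entropy computation. The usual cause of such slowdowns is a Python loop over the batch; a minimal sketch of the vectorized alternative (the function name and array shapes are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

def batch_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of each row of a (batch, n_actions) probability matrix.

    A single vectorized NumPy expression replaces a per-sample Python loop,
    which is typically where this kind of performance fix comes from.
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

# A uniform distribution over 4 actions has entropy log(4)
print(batch_entropy(np.full((1, 4), 0.25)))  # ≈ [1.3863]
```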

Add rolling average for the LOLA-PG reward centering and normalization.
- "punishment helped" in coin games (CGs)
- customizable matrix game
- log cooperative coins in vectorized MCPCG
Add the "punishment helped" option in vectorized_ssd_mm_coin_game.py.
Add new plots by default in cross- and self-play evaluation.
Add script to plot bar chart summary figure.
…ns (instead of 2 or 3 for LOLA-Exact and instead of 2 for SOS-Exact)