Upgrade to use RLLib master (almost v1.3). #10

Manuscrit · 2021-04-15T13:30:09Z

Remove the use of lock_replay during training (it must not be used in LTFT).
Create submodule marltoolbox.utils.log.
Move methods to summarize a model into a helper class.
Use before_init_loss instead of after_init (policy class factory arg).

Maxime Riché added 15 commits April 15, 2021 15:28

Upgrade to use RLLib master (almost v1.3).

9f8a1b2

Remove the use of lock_replay during training (it must not be used in LTFT). Create submodule marltoolbox.utils.log. Move methods to summarize a model into a helper class. Use before_init_loss instead of after_init (policy class factory arg).

Use _compute_actions_helper instead of compute_actions in policies. Small fixes.

4e0bf57

Fix some tests, _compute_actions_helper, and update_target.

9888b75

Fix amTFT exploiter.

5599bf6

Fix some tests. Add augmented R2D2. Add examples with R2D2. Add some end-to-end tests for amTFT vs exploiter, meta game and R2D2.

Support LSTM in amTFT.

00927ab

Fix a performance issue in the entropy computation. Refactor some configs and hyperparameters (DQN, R2D2, LOLA-PG).

Add meta game experiment with LOLA-Exact.

07c35cc

Tune hyperparameters for R2D2. A few corrections and style changes.

Add several options to stabilize the LOLA-PG training.

c43019c

Fix bug in cross_play evaluator.

c1051f0

Partial refactoring of the coin game envs tests. Add logging & plot of exploration temperature.

Add SSDMMCG vectorized.

67c8f79

Fix amTFT: how the environment is used during the rollouts

6f8f7e4

Change LOLA-Exact to make it work with discrete action spaces larger than 2. Add rolling average for the LOLA-PG reward centering and normalization.

9dd9738

Add stuff to the environments:

028be50

- punishment helped in CGs
- customizable matrix game
- coop coins log in vectorized MCPCG

Add meta game experiments with various meta solvers (like: alpha-rank, replicator dynamic)

46ff4b0

Add more meta algorithms in meta game experiment.

c65be87

Add the "punishment helped" option in vectorized_ssd_mm_coin_game.py. Add new plots by default in cross- and self-play evaluation. Add script to plot bar chart summary figure.

Adapt LOLA-Exact and SOS-Exact to work with matrix games with N actions (instead of 2 or 3 for LOLA-Exact and instead of 2 for SOS-Exact)

92ae8b4
