[WIP] Refactoring RL models #3

hill-a · 2018-07-18T13:20:22Z

refactored A2C, ACER, ACKTR, DDPG, DeepQ, GAIL, TRPO, PPO1 and PPO2 under a single common base class
added callback to refactored algorithm training
added saving and loading to refactored algorithms
refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies
added new policies for most algorithms (Mlp, MlpLstm, MlpLnLstm, Cnn, CnnLstm and CnnLnLstm)
added dynamic environment switching (so continual RL learning is now feasible)
added prediction from observation and action probability from observation for all the algorithms
fixed graph issues so that model names no longer collide
fixed behavior_clone weight loading for GAIL
fixed TensorFlow using all the GPU VRAM
fixed models so that they are all compatible with vectorized environments
fixed set_global_seed to update gym.spaces's random seed
fixed PPO1 and TRPO performance issues when learning identity function
added new tests for loading, saving, continuous actions and learning the identity function
fixed DQN wrapping for Atari
added saving and loading for VecNormalize wrapper
added automatic detection of action space (for the policy network)
fixed ACER buffer using hard-coded constants that assumed n_stack=4
fixed some RL algorithms not clipping actions to the action_space bounds when using gym.spaces.Box
refactored algorithms can take either a gym.Env or a str (if the environment name is registered)
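
To illustrate the shape of the changes listed above (a shared base class, a training callback, saving and loading, environment switching, and accepting either an environment or a registered name), here is a minimal standalone sketch. It is not the code from this PR: `BaseModel`, `ENV_REGISTRY`, the callback signature, and the convention that a callback returning `False` stops training are all assumptions made for illustration, and the registry merely stands in for gym's environment registry.

```python
import os
import pickle
import tempfile

# Hypothetical registry standing in for gym's registered environments.
ENV_REGISTRY = {"Identity-v0": lambda: {"name": "Identity-v0"}}


class BaseModel:
    """Illustrative base class; not the stable-baselines implementation."""

    def __init__(self, env, learning_rate=1e-3):
        # Accept either an environment object or a registered name.
        self.env = ENV_REGISTRY[env]() if isinstance(env, str) else env
        self.learning_rate = learning_rate
        self.num_timesteps = 0

    def learn(self, total_timesteps, callback=None):
        for _ in range(total_timesteps):
            self.num_timesteps += 1  # placeholder for one training step
            # Assumed convention: the callback may return False to stop early.
            if callback is not None and callback(self) is False:
                break
        return self

    def set_env(self, env):
        # Swap the environment without touching the learned state.
        self.env = env

    def save(self, path):
        # Only hyperparameters and counters are stored, not the environment.
        data = {"learning_rate": self.learning_rate,
                "num_timesteps": self.num_timesteps}
        with open(path, "wb") as file_handler:
            pickle.dump(data, file_handler)

    @classmethod
    def load(cls, path, env=None):
        with open(path, "rb") as file_handler:
            data = pickle.load(file_handler)
        model = cls(env, learning_rate=data["learning_rate"])
        model.num_timesteps = data["num_timesteps"]
        return model


# Usage: train with an early-stopping callback, save, reload, switch env.
model = BaseModel("Identity-v0")
model.learn(100, callback=lambda m: m.num_timesteps < 10)
save_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
model.save(save_path)
loaded = BaseModel.load(save_path)
loaded.set_env(ENV_REGISTRY["Identity-v0"]())
```

Keeping the environment out of the saved file is one possible design choice: it lets a reloaded model be attached to a different environment afterwards, which is the pattern the dynamic environment switching item describes.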

TODO:

Finish refactoring HER
Refactor ACKTR and ACER for continuous implementation


hill-a added 25 commits July 18, 2018 11:43

added more policies to A2C and PPO2

edd9bf7

parameter hotfix

b39cbbd

removed useless continuous argument from a2c policies

1f6d3a9

fixed tests

6c50d71

refactored a2c

b359e7e

added base class for RL models

5d91a66

refactored acer + fixes

f9f5e9e

partial refactor of acktr

5e33e14

refactored DDPG + added DDPG CNN policies

d43624d

refactored deepq + ddpg alterations

2cbb22d

refactored TRPO

6497b64

fixed a2c test

c374e57

fixed TRPO argument issue

d62fdc8

refactored PPO2 + corrected loading calls

99ea253

refactored PPO1

7281158

refactored PPO1

233ead8

changed total timestep management + bugfixes

d86303f

refactored environment management + improved all the saving methods

9fef74f

fixed test issues

ba1ee44

finished action prediction and action proba

247c6b4

fixed graphs + added more identity tests + acer upgrades

a3746c7

fixed models not supporting VecEnvs

e7e3dca

fixed seed issues + fixed training issues + added saving and loading tests

4c1ad6e

fixed tests + verbosity fixed

7c8200f

added continuous tests + ddpg cleaned up

1de90b2

araffin changed the base branch from fixes_cleanup to master July 27, 2018 14:42

hill-a added 4 commits July 27, 2018 17:24

fixed continuous actions for A2C

21772de

added spaces.MultiDiscrete and spaces.MultiBinary to A2C policies

17e0134

changed acer policies

64713a8

fixed a2c issues

3779b0c

hill-a and others added 23 commits August 19, 2018 01:49

fixed ACER runner issues

49fd8cc

fixed ACER with discrete observation space

d451bca

Update ACER hyperparams for test_identity + minor edits

66aae4a

Restore test identity

93e5bc1

Update ACER hyperparams

47c877b

Trying to fix CI

75fa2dc

Revert to default ACER hyperparams

2d29ece

Update ACER test

bc001f3

improved vectorized environment changing

ff3665a

merged with refactoring branch

dae6d3d

forced ACER to only use environment with the same n_envs

f1c47c1

fixes and cleanup

5a0669c

Update test identity

8b6c8e2

+ Make discrete env episodic + update ACER/ACKTR hyperparams

Merge branch 'refactoring' of github.com:hill-a/stable-baselines into refactoring

fbc10ff

Update PPO1 hyperparams for test

66e088b

fixed early reset issues with monitor + extra error messages

2b82f46

Merge branch 'refactoring' of https://github.com/hill-a/stable-baselines into refactoring

16acde7

Fix test identity hyperparams

f98ad5d

Merge branch 'refactoring' of github.com:hill-a/stable-baselines into refactoring

62061c1

hotfix

8bb4186

Merge branch 'refactoring' of https://github.com/hill-a/stable-baselines into refactoring

c9d0725

Bump version

afd31eb

fixed ACER + added setup files

282e2ec

araffin merged commit 282e2ec into master Aug 29, 2018

hill-a deleted the refactoring branch August 29, 2018 11:49

lukepolson mentioned this pull request Apr 30, 2020

Using Saved Model as Enemy Policy in Custom Environment (while training in a subprocvecenv) #835

Open

araffin added a commit that referenced this pull request Apr 19, 2021

Fixed typo (#3)

b83a434

* Fixed typo * Update changelog.rst Co-authored-by: Rouslan Placella <rouslan@placella.com>

winthropharvey mentioned this pull request Oct 27, 2021

[Question] How to save and load a trained model into Tensorflow 1 and 2? #974

Open

araffin mentioned this pull request Oct 4, 2023

[Bug]: Callback returns aren't handled properly by either OffPolicyAlgorithm or OnPolicyAlgorithm DLR-RM/stable-baselines3#1706

Closed

