Compare PPO with spinning up #579

lihuoran · 2023-02-09T06:28:56Z

Description

Modify the PPO algorithm:
- Refine the preprocess_batch logic.
- Switch the order in which the critic and the actor are trained within a train step.
Remove forbid_last from the replay memory. It serves no purpose now that the PPO algorithm has been refined.
Make the PPO hyper-parameters exactly identical to those of the Spinning Up version.
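
For readers unfamiliar with the reference implementation, here is a minimal sketch of the batch preprocessing that Spinning Up's PPO performs (GAE-lambda advantages plus rewards-to-go), together with the default hyper-parameters from the `spinup.ppo` function signature. This is not the code of this PR: `finish_path`, `discount_cumsum` and `SPINNING_UP_PPO_DEFAULTS` are illustrative names, and whether MARO's preprocess_batch follows exactly this shape is an assumption.

```python
from typing import List, Sequence, Tuple

# Defaults of Spinning Up's `ppo` function; the values actually used
# in this PR are not shown in the description.
SPINNING_UP_PPO_DEFAULTS = {
    "steps_per_epoch": 4000,
    "gamma": 0.99,
    "lam": 0.97,
    "clip_ratio": 0.2,
    "pi_lr": 3e-4,
    "vf_lr": 1e-3,
    "train_pi_iters": 80,
    "train_v_iters": 80,
    "target_kl": 0.01,
}


def discount_cumsum(values: Sequence[float], discount: float) -> List[float]:
    """Return out[t] = sum over k of discount**k * values[t + k]."""
    out = [0.0] * len(values)
    running = 0.0
    for t in reversed(range(len(values))):
        running = values[t] + discount * running
        out[t] = running
    return out


def finish_path(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float = SPINNING_UP_PPO_DEFAULTS["gamma"],
    lam: float = SPINNING_UP_PPO_DEFAULTS["lam"],
) -> Tuple[List[float], List[float]]:
    """Compute GAE-lambda advantages and rewards-to-go for one trajectory segment.

    `last_value` bootstraps the return when the segment is cut off before
    the episode ends; pass 0.0 when the episode terminated.
    """
    rews = list(rewards) + [last_value]
    vals = list(values) + [last_value]
    # One-step TD residuals: r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = [rews[t] + gamma * vals[t + 1] - vals[t] for t in range(len(rewards))]
    advantages = discount_cumsum(deltas, gamma * lam)
    returns = discount_cumsum(rews, gamma)[:-1]
    return advantages, returns
```

In Spinning Up, each update runs the policy iterations first and the value-function iterations afterwards; whether that is the order this PR switches to is not stated above.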

Linked issue(s)/Pull request(s)

issue_number

Type of Change

Related Component

Simulation toolkit
RL toolkit
Distributed toolkit

Has Been Tested

OS:
- Windows
- Mac OS
- Linux
Python version:
- 3.7
- 3.8
- 3.9
Key information snapshot(s):

Needs Follow Up Actions

New release package
New docker image

Checklist

Add/update the related comments
Add/update the related tests
Add/update the related documentations
Update the dependent downstream modules usage

maro/rl/training/algorithms/base/ac_ppo_base.py

tests/rl/gym_wrapper/env_sampler.py

Jinyu-W · 2023-02-09T10:32:40Z

tests/rl/gym_wrapper/env_sampler.py

@@ -75,6 +75,7 @@ def post_collect(self, info_list: list, ep: int) -> None:
        self.metrics.update(cur)
        # clear validation metrics
        self.metrics = {k: v for k, v in self.metrics.items() if not k.startswith("val/")}
+        self._sample_rewards.clear()


Is this a bug fix so that only the current episode's statistics are recorded?
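
To illustrate what the added `self._sample_rewards.clear()` line changes, here is a small sketch of a per-episode reward accumulator. `EpisodeRewardTracker`, `record` and the `avg_reward` key are hypothetical stand-ins, not the actual env sampler in tests/rl/gym_wrapper/env_sampler.py; only the `_sample_rewards` attribute name comes from the diff.

```python
from typing import Dict, List


class EpisodeRewardTracker:
    """Hypothetical stand-in for the sampler's reward bookkeeping."""

    def __init__(self) -> None:
        self._sample_rewards: List[float] = []
        self.metrics: Dict[str, float] = {}

    def record(self, reward: float) -> None:
        """Store one reward observed during the current episode."""
        self._sample_rewards.append(reward)

    def post_collect(self, clear: bool = True) -> None:
        """Summarize the stored rewards, then optionally reset the buffer."""
        if self._sample_rewards:
            total = sum(self._sample_rewards)
            self.metrics["avg_reward"] = total / len(self._sample_rewards)
        if clear:
            # Without this reset, the next episode's average would
            # also include rewards from all earlier episodes.
            self._sample_rewards.clear()


def run_two_episodes(clear: bool) -> float:
    """Return the average reported after the second of two episodes."""
    tracker = EpisodeRewardTracker()
    for episode_rewards in ([1.0, 1.0], [3.0, 3.0]):
        for reward in episode_rewards:
            tracker.record(reward)
        tracker.post_collect(clear=clear)
    return tracker.metrics["avg_reward"]
```

With `clear=True` the second episode reports only its own average; with `clear=False` the figure is diluted by the first episode's rewards.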

* PPO, SAC, DDPG passed
* Explore in SAC
* Test GYM on server
* Sync server changes
* pre-commit
* Ready to try on server
* .
* .
* .
* .
* .
* Performance OK
* Move to tests
* Remove old versions
* PPO done
* Start to test AC
* Start to test SAC
* SAC test passed
* Multiple round in evaluation
* Modify config.yml
* Add Callbacks
* [wip] SAC performance not good
* [wip] still not good
* update for some PR comments; Add a MARKDOWN file (#576) Co-authored-by: Jinyu Wang <wang.jinyu@microsoft.com>
* Use FullyConnected to replace mlp
* Update action bound
* ???
* Change gym env wrapper metrics logci
* Change gym env wrapper metrics logci
* refine env_sampler.sample under step mode
* Add DDPG. Performance not good...
* Add DDPG. Performance not good...
* wip
* Sounds like sac works
* Refactor file structure
* Refactor file structure
* Refactor file structure
* Pre-commit
* Pre commit
* Minor refinement of CIM RL
* Jinyu/rl workflow refine (#578)
* remove useless files; add device mapping; update pdoc
* add default checkpoint path; fix distributed worker log path issue; update example log path
* update performance doc
* remove tests/rl/algorithms folder
* Resolve PR comments
* Compare PPO with spinning up (#579)
* [wip] compare PPO
* PPO matching
* Revert unnecessary changes
* Minor
* Minor
* SAC Test parameters update (#580)
* fix sac to_device issue; update sac gym test parameters
* add rl test performance plot func
* update sac eval interval config
* update sac checkpoint interval config
* fix callback issue
* update plot func
* update plot func
* update plot func
* update performance doc; upload performance images
* Minor fix in callbacks; refine plot.py format.
* Add n_interactions. Use n_interactions to plot curves.
* pre-commit

---------

Co-authored-by: Huoran Li <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Episode truncation & early stopping (#581)
* Add truncated logic
* (To be tested) early stop
* Early stop test passed
* Test passed
* Random action. To be tested.
* Warmup OK
* Pre-commit
* random seed
* Revert pre-commit config

---------

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <wang.jinyu@microsoft.com>

lihuoran and others added 3 commits February 8, 2023 13:54

[wip] compare PPO

b22de3e

PPO matching

553f5c8

Revert unnecessary changes

cf135ac

lihuoran requested a review from Jinyu-W February 9, 2023 06:28

Minor

caaca8c

Jinyu-W reviewed Feb 9, 2023

Minor

b8a01bb

Jinyu-W approved these changes Feb 9, 2023

Jinyu-W merged commit ab5e675 into rl_workflow_refine Feb 9, 2023

Jinyu-W deleted the huoran/rl_workflow_refine branch February 9, 2023 11:55
