Episode truncation & early stopping #581

lihuoran · 2023-02-14T08:02:42Z

Description

Add episode truncation to EnvSampler. Truncation is not the same as duration in Env:
- duration is an internal attribute of Env. Once Env has executed all the steps in its duration, it "terminates".
- "Truncation" is an EnvSampler concept that Env does not need to be aware of. When a single episode reaches the maximum episode length, it is "truncated".
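
To illustrate the distinction, here is a minimal sketch. `ToyEnv` and `run_episode` are hypothetical names invented for this example and are not MARO's actual `Env` or `EnvSampler` API; only the idea of a sampler-side `max_episode_length` comes from this PR.

```python
from typing import Optional, Tuple


class ToyEnv:
    """Hypothetical stand-in for Env: it terminates on its own after `duration` steps."""

    def __init__(self, duration: int) -> None:
        self.duration = duration
        self.tick = 0

    def reset(self) -> None:
        self.tick = 0

    def step(self) -> bool:
        """Advance one step and return True once the environment has terminated."""
        self.tick += 1
        return self.tick >= self.duration


def run_episode(env: ToyEnv, max_episode_length: Optional[int] = None) -> Tuple[int, bool, bool]:
    """Run one episode and return (steps, terminated, truncated)."""
    env.reset()
    steps, terminated, truncated = 0, False, False
    while not (terminated or truncated):
        terminated = env.step()
        steps += 1
        # Truncation is decided on the sampler side; the environment never sees it.
        if not terminated and max_episode_length is not None and steps >= max_episode_length:
            truncated = True
    return steps, terminated, truncated
```

In this sketch, an episode that hits the sampler-side limit first ends with `truncated=True` and `terminated=False`, while the environment itself is left untouched and could in principle keep stepping.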
Refine the callback-related logic. Add early stopping to the training pipeline.
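
The description does not state the early-stopping criterion, so the following is only a sketch of one common approach: stop when an evaluation metric has not improved for `patience` consecutive evaluations. The `EarlyStopping` class, the `patience` parameter, and the `train` helper are assumptions made for illustration, not the callback interface added in this PR.

```python
from typing import Iterable, Optional


class EarlyStopping:
    """Hypothetical patience-based stopper; higher metric values are treated as better."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best: Optional[float] = None
        self.stale = 0  # Number of consecutive evaluations without improvement

    def should_stop(self, metric: float) -> bool:
        if self.best is None or metric > self.best:
            self.best = metric
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


def train(eval_metrics: Iterable[float], patience: int) -> int:
    """Consume one evaluation metric per epoch and return the number of epochs actually run."""
    stopper = EarlyStopping(patience)
    epochs = 0
    for metric in eval_metrics:
        epochs += 1
        if stopper.should_stop(metric):
            break
    return epochs
```

A real pipeline would call something like `should_stop` from an evaluation callback rather than iterating over a precomputed list of metrics.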

Linked issue(s)/Pull request(s)

issue_number

Type of Change

Related Component

Simulation toolkit
RL toolkit
Distributed toolkit

Has Been Tested

OS:
- Windows
- Mac OS
- Linux
Python version:
- 3.7
- 3.8
- 3.9
Key information snapshot(s):

Needs Follow Up Actions

New release package
New docker image

Checklist

Add/update the related comments
Add/update the related tests
Add/update the related documentation
Update the dependent downstream modules usage

.pre-commit-config.yaml

Jinyu-W · 2023-02-17T05:47:32Z

maro/rl/rollout/env_sampler.py

@@ -267,6 +274,8 @@ def __init__(
        self._transition_cache: List[CacheElement] = []
        self._agent_last_index: Dict[Any, int] = {}  # Index of last occurrence of agent in self._transition_cache
        self._reward_eval_delay = reward_eval_delay
+        self._max_episode_length = max_episode_length


This feels like a duplicate of max_tick, but the "truncated" info is different from "is_done". Strange.

lihuoran · 2023-02-17

max_tick is a concept of Env, while max_episode_length is a concept of EnvSampler. Suggestions for a more readable name are more than welcome.

* PPO, SAC, DDPG passed
* Explore in SAC
* Test GYM on server
* Sync server changes
* pre-commit
* Ready to try on server
* .
* .
* .
* .
* .
* Performance OK
* Move to tests
* Remove old versions
* PPO done
* Start to test AC
* Start to test SAC
* SAC test passed
* Multiple round in evaluation
* Modify config.yml
* Add Callbacks
* [wip] SAC performance not good
* [wip] still not good
* update for some PR comments; Add a MARKDOWN file (#576) Co-authored-by: Jinyu Wang <wang.jinyu@microsoft.com>
* Use FullyConnected to replace mlp
* Update action bound
* ???
* Change gym env wrapper metrics logci
* Change gym env wrapper metrics logci
* refine env_sampler.sample under step mode
* Add DDPG. Performance not good...
* Add DDPG. Performance not good...
* wip
* Sounds like sac works
* Refactor file structure
* Refactor file structure
* Refactor file structure
* Pre-commit
* Pre commit
* Minor refinement of CIM RL
* Jinyu/rl workflow refine (#578)
* remove useless files; add device mapping; update pdoc
* add default checkpoint path; fix distributed worker log path issue; update example log path
* update performance doc
* remove tests/rl/algorithms folder
* Resolve PR comments
* Compare PPO with spinning up (#579)
* [wip] compare PPO
* PPO matching
* Revert unnecessary changes
* Minor
* Minor
* SAC Test parameters update (#580)
* fix sac to_device issue; update sac gym test parameters
* add rl test performance plot func
* update sac eval interval config
* update sac checkpoint interval config
* fix callback issue
* update plot func
* update plot func
* update plot func
* update performance doc; upload performance images
* Minor fix in callbacks; refine plot.py format.
* Add n_interactions. Use n_interactions to plot curves.
* pre-commit

---------

Co-authored-by: Huoran Li <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>

* Episode truncation & early stopping (#581)
* Add truncated logic
* (To be tested) early stop
* Early stop test passed
* Test passed
* Random action. To be tested.
* Warmup OK
* Pre-commit
* random seed
* Revert pre-commit config

---------

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <wang.jinyu@microsoft.com>

lihuoran and others added 4 commits February 14, 2023 11:02

Add truncated logic

719eeb7

(To be tested) early stop

6efb8ec

Early stop test passed

2578d03

Test passed

0cf4235

lihuoran requested a review from Jinyu-W February 14, 2023 08:02

lihuoran and others added 4 commits February 16, 2023 15:58

Random action. To be tested.

c853824

Warmup OK

ac7bc1a

Pre-commit

ef3fe5f

random seed

88d769c

Jinyu-W reviewed Feb 17, 2023

Revert pre-commit config

cc5170c

Jinyu-W merged commit 9371949 into rl_workflow_refine Feb 17, 2023

Jinyu-W deleted the huoran/rl_workflow_refine branch February 17, 2023 06:15
