[Algorithm] Update PPO examples #1495
Conversation
This is my suggestion to update the PPO examples. I have one script for MuJoCo and one for Atari, since the architectures and the env transforms are different. I think this is easier to read, but we could also have a single script, since the rest is almost the same. Maybe we could all review it, as @matteobettini suggested, and agree on a template for the other examples? What do you think? @vmoens @BY571
I think it's great. Having two separate files is fine; it's better to have clarity than highly engineered, poorly tested and unreadable single scripts :)
examples/ppo/ppo_mujoco.py
```python
# Test logging
with torch.no_grad(), set_exploration_type(ExplorationType.MODE):
    if (collected_frames - frames_in_batch) // cfg.logger.test_interval < (
        collected_frames // cfg.logger.test_interval
    ):
```
Can't you use a variable from the enumeration of the collector here?
I'm doing it implicitly, since collected_frames is "number of batches collected" * frames_in_batch. I can make it more explicit:
```python
if (i - 1) * frames_in_batch // test_interval < i * frames_in_batch // test_interval:
```
Why not `if i % cfg.eval.evaluation_interval == 0`?
Ah, because your test interval is in frames. Then `if collected_frames_till_now % test_interval == 0`?
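To make this thread concrete, here is a hedged sketch of the frame-based trigger (the helper name is mine, not the script's). The floor-division comparison fires whenever a multiple of test_interval is crossed within a batch, which a plain modulo check would miss when frames_in_batch does not divide test_interval evenly:

```python
def should_evaluate(collected_frames, frames_in_batch, test_interval):
    # Fires iff a multiple of test_interval was crossed in the last batch,
    # even when frames_in_batch does not divide test_interval.
    prev = (collected_frames - frames_in_batch) // test_interval
    curr = collected_frames // test_interval
    return prev < curr

# 1000-frame boundary crossed between 950 and 1050 collected frames:
assert should_evaluate(collected_frames=1050, frames_in_batch=100, test_interval=1000)
assert not should_evaluate(collected_frames=950, frames_in_batch=100, test_interval=1000)
```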
examples/ppo/ppo_mujoco.py
```python
episode_rewards = data["next", "episode_reward"][data["next", "done"]]
if len(episode_rewards) > 0:
    logger.log_scalar(
        "reward_train", episode_rewards.mean().item(), collected_frames
    )
```
I usually use names like "train/reward"; this makes wandb automatically divide them into different panels.
Here is the stuff I log for training:
rl/examples/multiagent/utils/logging.py, line 76 in 147de71: `to_log.update(`
and this is some of what I log for eval:
rl/examples/multiagent/utils/logging.py, line 122 in 147de71: `"eval/episode_reward_min": min(rewards),`
I think we at least need to time the scripts and log times for both collection and training.
Makes sense to use this kind of naming. The metrics will probably vary a bit from example to example, but at least we can agree to always use "train/..." and "eval/...".
Adding the timing also makes sense.
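A minimal sketch of that timing suggestion (the function and argument names are illustrative, not the script's): wall-clock the collection and training phases separately each iteration and log them under the same prefix convention.

```python
import time

def train(collector, logger, run_epochs):
    # Time collection and training separately on every collector iteration.
    collected_frames = 0
    sampling_start = time.time()
    for data in collector:
        sampling_time = time.time() - sampling_start
        collected_frames += data.numel()

        training_start = time.time()
        run_epochs(data)  # the PPO optimisation epochs
        training_time = time.time() - training_start

        logger.log_scalar("train/sampling_time", sampling_time, step=collected_frames)
        logger.log_scalar("train/training_time", training_time, step=collected_frames)
        sampling_start = time.time()
```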
@albertbou92 we also need to update the examples CI (cc @BY571)
examples/ppo/utils_mujoco.py
```python
import gym
```
Since you're using HalfCheetah-v4, we can use gymnasium, no?
This version of HalfCheetah crashes with the gym version from the CI.
The CI uses gym 0.23 for D4RL compatibility, and gymnasium for newer stuff.
(welcome to gym wonderland)
I used HalfCheetah-v4 in case the CI env did not have MuJoCo; the default config has HalfCheetah-v3.
But using gymnasium seems to work out of the box, so I changed to that.
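For reference, a minimal sketch of what the gymnasium-backed env construction could look like, assuming torchrl's GymEnv wrapper; the actual utils_mujoco.py builds a larger transform stack, and the env name here just follows the discussion above.

```python
from torchrl.envs import Compose, RewardSum, TransformedEnv
from torchrl.envs.libs.gym import GymEnv

def make_env(env_name="HalfCheetah-v4", device="cpu"):
    # GymEnv can pick up gymnasium as its backend when it is installed.
    env = GymEnv(env_name, device=device)
    return TransformedEnv(env, Compose(RewardSum()))
```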
For Atari, we can now speed up training with the new vectorised envs, right?
I think so, I can check.
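One option for batching Atari envs is torchrl's ParallelEnv, sketched below; note that "the new vectorised envs" in the comment may refer to a different mechanism (e.g. gymnasium's native vector envs), and the env name is only illustrative.

```python
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv

def make_parallel_env(env_name="PongNoFrameskip-v4", num_workers=8):
    # Run num_workers copies of the env in subprocesses, batched together.
    return ParallelEnv(num_workers, lambda: GymEnv(env_name))
```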
@albertbou92 the examples tests are failing
Solved! Also for A2C @vmoens
LGTM!
Co-authored-by: vmoens <vincentmoens@gmail.com>
Description
Updated PPO examples. The scripts now reproduce the Atari and MuJoCo results from the original PPO paper.
Some common improvements are added (such as recomputing the advantage at every epoch).
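As an illustration of the advantage recomputation, here is a hedged sketch assuming torchrl's GAE module and ClipPPOLoss-style loss keys (names may differ from the actual script, and minibatching is omitted for brevity): GAE is recomputed on the rollout before each PPO epoch so the advantages track the updated value network.

```python
import torch
from torchrl.objectives.value import GAE

def run_ppo_epochs(data, critic, loss_module, optimizer, num_epochs=10):
    adv_module = GAE(gamma=0.99, lmbda=0.95, value_network=critic)
    for _ in range(num_epochs):
        with torch.no_grad():
            adv_module(data)  # refresh "advantage" / "value_target" in data
        loss_vals = loss_module(data)
        loss = (
            loss_vals["loss_objective"]
            + loss_vals["loss_critic"]
            + loss_vals["loss_entropy"]
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```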