[RLlib] Fix issues with action masking examples #38095

ArturNiederfahrenhorst · 2023-08-03T22:10:56Z

Why are these changes needed?

Fixes #37707 and a set of other issues:
The action masking example was broken, but CI did not pick this up because the error did not propagate.
The tf1 code path of the example was broken.
The example needed to be lifted into the RL Modules API.

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

kouroshHakha

LGTM

ArturNiederfahrenhorst · 2023-08-04T01:14:30Z

@sven1977 I need your TensorFlow help here.
It appears that the same code fails for TF but works for torch.
Here is what I've found out so far:

Sampling runs fine.
If you compute gradients, they are infinite (the loss produced by the learner is not infinite, but the gradients are).
If you don't add the infinite mask (i.e. remove the code below), the gradients look normal and there is no error.

inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
masked_logits = logits + inf_mask
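
For readers unfamiliar with the two lines above, here is a framework-free sketch of what they compute in the forward pass. It is only an illustration and does not reproduce the TF gradient problem discussed in this thread. The helper names (`log_mask`, `mask_logits`, `softmax`) are hypothetical, and `FLOAT32_MIN` is written out as the value of `tf.float32.min`.

```python
import math

# Most negative finite float32 value (the value of tf.float32.min).
FLOAT32_MIN = -3.4028234663852886e38


def log_mask(m):
    # math.log(0.0) raises ValueError, so emulate log(0) == -inf explicitly.
    return math.log(m) if m > 0 else -math.inf


def mask_logits(logits, action_mask):
    # Clamp log(mask) to a large finite negative number, then add it to the logits.
    # Allowed actions (mask == 1) get +0.0, disallowed ones (mask == 0) get FLOAT32_MIN.
    return [l + max(log_mask(m), FLOAT32_MIN) for l, m in zip(logits, action_mask)]


def softmax(xs):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
action_mask = [1.0, 0.0, 1.0]  # action 1 is not allowed
probs = softmax(mask_logits(logits, action_mask))
# probs[1] is exactly 0.0; the remaining probability mass is split between actions 0 and 2.
print(probs)
```

The masked logits stay finite in this forward computation, which is consistent with the observation above that the loss is not infinite.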

sven1977 · 2023-08-07T16:07:23Z

Let me debug the new example script on tf2. ...

sven1977 · 2023-08-08T18:18:40Z

Root cause: in tf2, for some reason, the gradients for the last action layer (the one producing the 100 action logits) are already computed as NaNs in the very first training update step. This leads to a broken NN and the final crash, because one of the action labels is 100 (out of bounds; valid labels are 0 to 99).
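
One way to surface this kind of failure at the first update, rather than at the later out-of-bounds crash, is to check gradients for non-finite values before applying them. The following is a minimal, framework-free sketch. The function name and the dict-of-lists layout are hypothetical and not part of RLlib.

```python
import math


def find_non_finite(named_grads):
    """Return the names of gradients that contain NaN or +/-inf values.

    `named_grads` maps a (hypothetical) layer name to a flat list of floats.
    """
    return [
        name
        for name, values in named_grads.items()
        if any(not math.isfinite(v) for v in values)
    ]


# Illustrative values only; these are not taken from the actual run.
grads = {
    "hidden": [0.1, -0.2],
    "action_logits": [float("nan"), 0.3],
}
bad = find_non_finite(grads)
if bad:
    print("non-finite gradients in:", bad)
```

In TensorFlow itself, `tf.debugging.check_numerics` serves a similar purpose for individual tensors.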

sven1977 · 2023-08-08T18:19:00Z

Confirmed that torch is working fine ...

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 · 2023-08-09T10:22:41Z

This is fixed now (new example script is running fine on my end). Waiting for tests to pass, then we can merge.

sven1977

LGTM. Thanks for the fix @ArturNiederfahrenhorst !

Signed-off-by: NripeshN <nn2012@hw.ac.uk>

PhilippWillms · 2023-08-15T20:18:27Z

@sven1977, @ArturNiederfahrenhorst: I appreciate your work on this topic!

After installing the latest nightly build to run the new example file, I ran into two issues:

Even if I do not run the example with torch, I need to install torch, because the rl_module example file loads the corresponding imports.
Which torch version did you use for implementing this PR? With torch==2.0.1, I get the following exception:

File "/mnt/d/git/ray/rllib/examples/action_masking.py", line 46, in <module>
from ray.rllib.examples.rl_module.action_masking_rlm import (
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/examples/rl_module/action_masking_rlm.py", line 4, in <module>
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_rl_module.py", line 7, in <module>
from ray.rllib.core.rl_module.torch import TorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/core/rl_module/torch/__init__.py", line 1, in <module>
from .torch_rl_module import TorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/core/rl_module/torch/torch_rl_module.py", line 11, in <module>
from ray.rllib.models.torch.torch_distributions import TorchDistribution
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/models/torch/torch_distributions.py", line 22, in <module>
class TorchDistribution(Distribution, abc.ABC):
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/models/torch/torch_distributions.py", line 51, in TorchDistribution
sample_shape=torch.Size(),

Checking outside of the Ray import chain shows interesting torch behavior that leads to the issue:

$ python -c "import torch; sample_shape=torch.Size(); print(sample_shape)"

torch.Size([])
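
The first point above (torch being required on a TensorFlow-only setup) is the kind of problem a guarded import is commonly used to avoid. Here is a minimal sketch using only the standard library. `optional_import` is a hypothetical helper for illustration and not the mechanism used in this PR.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        return None


# Hypothetical usage: only run torch-specific code if torch is available.
#   torch = optional_import("torch")
#   if torch is not None:
#       ...  # torch-only code path

json_mod = optional_import("json")  # stdlib module, always present
missing = optional_import("surely_not_an_installed_module_xyz")
print(json_mod is not None, missing is None)
```

With this pattern, a module that defines both torch and tf variants can still be imported when only one framework is installed, as long as the framework-specific code is skipped when the import returns `None`.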

Signed-off-by: harborn <gangsheng.wu@intel.com>

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

Signed-off-by: Victor <vctr.y.m@example.com>

PhilippWillms · 2024-01-25T20:23:45Z

The issue reported earlier does not occur anymore in ray 2.9.1. Tested with Torch 2.0.1 and the examples/action_masking file in the following version: https://github.com/ray-project/ray/blob/84d17cad835665631c2f68f6fb332e2973028fab/rllib/examples/action_masking.py

ArturNiederfahrenhorst added 3 commits August 3, 2023 11:13

Add --no-tune to raise errors in CI

8bcbd77

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

Remove clutter code for action mask example and introduce working tor…

85d228e

…ch example

fix tf2 issues

dea2374

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

ArturNiederfahrenhorst assigned kouroshHakha Aug 3, 2023

ArturNiederfahrenhorst requested review from sven1977, gjoliver, avnishn, smorad, maxpumperla, kouroshHakha and krfricke as code owners August 3, 2023 22:10

ArturNiederfahrenhorst changed the title ~~[RLlib] Fix issues with~~ [RLlib] Fix issues with action masking examples Aug 3, 2023

kouroshHakha approved these changes Aug 3, 2023

TF failing

fc6667a

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>

ArturNiederfahrenhorst assigned sven1977 and unassigned kouroshHakha Aug 4, 2023

sven1977 added 4 commits August 8, 2023 19:12

Merge branch 'master' of https://github.com/ray-project/ray into fixa…

3183099

…ctionmasking

Merge branch 'master' into fixactionmasking

fb8c806

Merge branch 'master' of https://github.com/ray-project/ray into fixa…

24196dd

…ctionmasking

Merge remote-tracking branch 'artur/fixactionmasking' into fixactionm…

3a2c3e0

…asking

wip

7defd58

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 approved these changes Aug 9, 2023

sven1977 added 3 commits August 9, 2023 12:23

Merge branch 'master' into fixactionmasking

e37b6e3

LINT

d7f7544

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge remote-tracking branch 'artur/fixactionmasking' into fixactionm…

d0e3257

…asking

sven1977 added 2 commits August 9, 2023 16:46

fix

f25afd7

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fix

f30630d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 merged commit 1c29b98 into ray-project:master Aug 9, 2023
45 of 49 checks passed

NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023

[RLlib] Fix issues with action masking examples. (ray-project#38095)

59527e2

Signed-off-by: NripeshN <nn2012@hw.ac.uk>

harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023

[RLlib] Fix issues with action masking examples. (ray-project#38095)

3dd834b

Signed-off-by: harborn <gangsheng.wu@intel.com>

harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023

[RLlib] Fix issues with action masking examples. (ray-project#38095)

64c7fee

PhilippWillms mentioned this pull request Aug 17, 2023

[RLlib] PPO instantiation requires torch even though tf is to be used #38561

Open

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[RLlib] Fix issues with action masking examples. (ray-project#38095)

b922e9a

Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

[RLlib] Fix issues with action masking examples. (ray-project#38095)

c35f5b6

Signed-off-by: Victor <vctr.y.m@example.com>
