Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Fix issues with action masking examples #38095

Merged
merged 14 commits into from Aug 9, 2023

Conversation

ArturNiederfahrenhorst
Copy link
Contributor

Why are these changes needed?

Fixes #37707 and a set of other issues.
The action masking example was broken but this was not picked up by CI because the error did not propagate.
The tf1 code path of the example was broken.
The example needed to be liften into RL Modules API.

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
@ArturNiederfahrenhorst ArturNiederfahrenhorst changed the title [RLlib] Fix issues with [RLlib] Fix issues with action masking examples Aug 3, 2023
Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
@ArturNiederfahrenhorst
Copy link
Contributor Author

@sven1977 Need you tensorflow help here.
It appears that the same code is failing for TF but working for torch.
Here is what I've found out so far:

  • Sampling runs fine
  • If you compute gradients, they will be infinite (loss produced by learner is not infinite, but gradients are)
  • If you don't add the infinite mask (remove the code below), gradients are look normal + no error
inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
masked_logits = logits + inf_mask

@sven1977
Copy link
Contributor

sven1977 commented Aug 7, 2023

Let me debug the new example script on tf2. ...

@sven1977
Copy link
Contributor

sven1977 commented Aug 8, 2023

Root cause: The gradients in tf2 - for some reason - for the last action layer (the one producing the 100 action logits) are being computed as NaNs already in the very first training update step. This then leads to a broken NN and the final crash because one of the action labels is 100 (out of bounds, should be 0 to 99).

@sven1977
Copy link
Contributor

sven1977 commented Aug 8, 2023

Confirmed that torch is working fine ...

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977
Copy link
Contributor

sven1977 commented Aug 9, 2023

This is fixed now (new example script is running fine on my end). Waiting for tests to pass, then we can merge.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix @ArturNiederfahrenhorst !

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 merged commit 1c29b98 into ray-project:master Aug 9, 2023
45 of 49 checks passed
NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
@PhilippWillms
Copy link

PhilippWillms commented Aug 15, 2023

@sven1977, @ArturNiederfahrenhorst : Appreciate your work on that topic!

Installing latest nightly build to run the new example file, I stumbled across two topics:

  1. Even if I do not call the example with torch, I need to install torch cause in the rl_module example file the corresponding imports are loaded.
  2. Which torch version did you use for implementing this PR ? Because with torch==2.0.1, I get the following exception

File "/mnt/d/git/ray/rllib/examples/action_masking.py", line 46, in
from ray.rllib.examples.rl_module.action_masking_rlm import (
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/examples/rl_module/action_masking_rlm.py", line 4, in
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_rl_module.py", line 7, in
from ray.rllib.core.rl_module.torch import TorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/core/rl_module/torch/init.py", line 1, in
from .torch_rl_module import TorchRLModule
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/core/rl_module/torch/torch_rl_module.py", line 11, in
from ray.rllib.models.torch.torch_distributions import TorchDistribution
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/models/torch/torch_distributions.py", line 22, in
class TorchDistribution(Distribution, abc.ABC):
File "/home/philipp/miniconda3/envs/tf-39-gpu-linux/lib/python3.9/site-packages/ray/rllib/models/torch/torch_distributions.py", line 51, in TorchDistribution
sample_shape=torch.Size(),

Verification outside of the ray cascade shows interesting behavior of torch leading to the issue

$ python -c "import torch; sample_shape=torch.Size(); print(sample_shape)"

torch.Size([])

harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
Signed-off-by: harborn <gangsheng.wu@intel.com>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Victor <vctr.y.m@example.com>
@PhilippWillms
Copy link

The issue reported earlier does not occur anymore in ray 2.9.1. Tested with Torch 2.0.1 and the examples/action_masking file in the following version: https://github.com/ray-project/ray/blob/84d17cad835665631c2f68f6fb332e2973028fab/rllib/examples/action_masking.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RLlib] Action masking is not working in Ray 2.6.1
4 participants