[RLlib] Fix bug in SingleAgentEnvRunner: Calling sample() would always force-reset the vector env (even if episodes were not completed in a previous call). #41168
Conversation
rllib/algorithms/ppo/ppo.py
Outdated
# TEST
#import pickle
#import torch
#import os
#with open("train_batch.pkl", "wb") as file:
#    pickle.dump(train_batch, file)
#torch.save(self.workers.local_worker().policy_map["default_policy"].model.state_dict(), "model_weights.pth")
# END TEST
remove?
Sorry, will comb through changes and clean everything up before merging. Thanks for this catch!
done
rllib/algorithms/ppo/ppo.py
Outdated
# TEST
#import pickle
#import torch
#import os
#from ray.rllib.utils.test_utils import check
#rw_path = "/Users/sven/ray_results/PPO_2023-11-15_13-28-16/PPO_CartPoleDebug_74001_00000_0_2023-11-15_13-28-16"
#with open(os.path.join(rw_path, "train_batch.pkl"), "rb") as file:
#    rw_train_batch = pickle.load(file)
#rw_state_dict = torch.load(os.path.join(rw_path, "model_weights.pth"))
#self.workers.local_worker().module.load_state_dict(rw_state_dict)
# END TEST
remove?
done
rllib/algorithms/ppo/ppo.py
Outdated
#check(train_batch["advantages"], rw_train_batch["default_policy"]["advantages"], rtol=0.000001)
#check(train_batch["vf_preds"], rw_train_batch["default_policy"]["vf_preds"])
#check(train_batch["value_targets"], rw_train_batch["default_policy"]["value_targets"])
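For context, the commented-out lines above use RLlib's test utility check() to compare train-batch tensors within a relative tolerance. A minimal sketch of the same idea using NumPy (the function name check_close is hypothetical, not RLlib's actual implementation):

```python
import numpy as np

def check_close(a, b, rtol=1e-5, atol=1e-8):
    """Raise AssertionError unless a and b are element-wise close."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if not np.allclose(a, b, rtol=rtol, atol=atol):
        raise AssertionError(f"arrays differ: {a} vs {b}")

# Within tolerance: passes silently.
check_close([1.0, 2.0], [1.0, 2.0 + 1e-9])
```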
remove?
done
Why are these changes needed?
Fix bug in SingleAgentEnvRunner: Calling sample() repeatedly would always force-reset the vector env (even if episodes were not completed in a previous call).
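A minimal sketch of the pattern this fix restores (all names here are hypothetical toys, not RLlib's actual classes): a runner should force-reset its env only before the very first sample() call or when explicitly asked, and otherwise continue the episodes left unfinished by the previous call.

```python
class TinyEnv:
    """Toy episodic env: an episode ends after 3 steps."""
    def __init__(self):
        self.t = 0
        self.resets = 0

    def reset(self):
        self.t = 0
        self.resets += 1

    def step(self):
        self.t += 1
        return self.t, self.t >= 3  # (obs, done)


class TinyEnvRunner:
    """Continues episodes across sample() calls instead of always resetting."""
    def __init__(self, env):
        self.env = env
        self.needs_reset = True  # True only before the very first sample()

    def sample(self, num_steps, force_reset=False):
        if force_reset or self.needs_reset:
            self.env.reset()
            self.needs_reset = False
        steps = []
        for _ in range(num_steps):
            obs, done = self.env.step()
            steps.append(obs)
            if done:
                self.env.reset()  # start a fresh episode mid-sample
        return steps


env = TinyEnv()
runner = TinyEnvRunner(env)
runner.sample(2)   # episode still unfinished after 2 steps
runner.sample(2)   # the buggy version would force-reset here, losing progress
print(env.resets)  # 2: one initial reset, one after the 3-step episode finished
```

The buggy behavior corresponds to resetting unconditionally at the top of sample(), which discards partially completed episodes between calls.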
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I'm adding a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.