[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. #7238
Conversation
…torch_memory_leak # Conflicts: # rllib/agents/ppo/tests/test_ppo.py
…torch_memory_leak
…torch_memory_leak # Conflicts: # rllib/agents/ppo/ppo_torch_policy.py
…torch_memory_leak # Conflicts: # rllib/agents/ppo/ppo_torch_policy.py
Test FAILed.
rllib/agents/a3c/a3c_torch_policy.py (Outdated)

      sample_batch, last_r, policy.config["gamma"], policy.config["lambda"],
      policy.config["use_gae"], policy.config["use_critic"])
    + with torch.no_grad():
Could we actually move this into the postprocess method defined in torch_policy_template? That way it will work automatically for all torch policies and we don't need to clutter the individual ones.
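For illustration, a minimal sketch of what that could look like in the template's postprocessing hook. The postprocess_fn attribute and the hook signature here are assumptions in the spirit of RLlib's build_torch_policy, not the exact code in this PR:

    import torch

    def postprocess(policy, sample_batch, other_agent_batches=None,
                    episode=None):
        # If the algorithm registered no postprocessor, pass the batch through.
        if policy.postprocess_fn is None:
            return sample_batch
        # Postprocessing (e.g. GAE via compute_advantages) only reads the
        # model; running it under no_grad() keeps autograd from retaining
        # the value-function graph and leaking memory.
        with torch.no_grad():
            return policy.postprocess_fn(
                policy, sample_batch, other_agent_batches, episode)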
Sure, will do.
rllib/agents/ppo/ppo_torch_policy.py (Outdated)

    - "total_loss": policy.loss_obj.loss.cpu().detach().numpy(),
    - "policy_loss": policy.loss_obj.mean_policy_loss.cpu().detach().numpy(),
    - "vf_loss": policy.loss_obj.mean_vf_loss.cpu().detach().numpy(),
    + "total_loss": policy.loss_obj.loss.item(),
Similarly, could we automatically apply .item() to the dict values returned in the common torch policy class?
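A sketch of such a conversion helper; convert_stats is a hypothetical name, and the real template may structure this differently:

    import torch

    def convert_stats(stats):
        # Replace every torch.Tensor in the stats dict with a plain Python
        # scalar (or NumPy array), so logging holds no references to the
        # autograd graph or to GPU memory.
        converted = {}
        for key, value in stats.items():
            if isinstance(value, torch.Tensor):
                converted[key] = (value.item() if value.numel() == 1
                                  else value.detach().cpu().numpy())
            else:
                converted[key] = value
        return converted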
All done. ... Waiting for tests.
The main question here is whether we can automatically insert these conversions in the template, to avoid having to do this for each algo (which could be brittle).
…torch_memory_leak
Test FAILed.
Test FAILed.
@ericl Everything is handled by the template now, which also does the numpy/item conversion AND takes care of the no_grad. Individual TorchPolicies don't have to worry about this anymore.
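As a rough illustration of how the template could centralize both pieces; names like stats_fn and extra_grad_info follow RLlib's policy-template conventions but are assumptions here, not verbatim from this PR:

    import torch

    def extra_grad_info(policy, train_batch, stats_fn):
        # Run the per-algorithm stats_fn, then convert any scalar tensors
        # with .item(), so individual policies can return raw tensors
        # without leaking graph references into the metrics dict.
        stats = stats_fn(policy, train_batch) if stats_fn else {}
        return {
            key: (value.item()
                  if isinstance(value, torch.Tensor) and value.numel() == 1
                  else value)
            for key, value in stats.items()
        }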
Test FAILed.
Test PASSed.
Test FAILed.
Test FAILed.
Test FAILed.
Looks great!
rllib/agents/a3c/a3c_torch_policy.py (Outdated)

    - completed = sample_batch[SampleBatch.DONES][-1]
    - if completed:
    + if sample_batch[SampleBatch.DONES][-1]:
Prefer to use intermediate variables to clarify the value of a long expression when possible.
Possibly related to the changes?
I'll make custom_tf_policy size=medium and see whether that fixes it. I've changed back to the intermediate var.
Test FAILed.
@ericl Tests all pass. Please merge.
PPO torch has a memory leak due to a missing torch.no_grad() around compute_advantages.

PPO torch also produces lots of intermediary Tensors (which are then garbage collected) when run on CPU or GPU. This is due to the reporting code creating volatile CPU Tensors and then numpy'ing them.
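A minimal sketch of the fix pattern, assuming the compute_advantages helper and config keys shown in the a3c diff above; the helper is passed in as an argument to keep the sketch self-contained:

    import torch

    def postprocess_with_gae(policy, sample_batch, last_r, compute_advantages):
        # Without no_grad(), compute_advantages' value-function forward
        # passes stay attached to the autograd graph and are never freed,
        # which is the leak described above.
        with torch.no_grad():
            return compute_advantages(
                sample_batch, last_r,
                policy.config["gamma"], policy.config["lambda"],
                policy.config["use_gae"], policy.config["use_critic"])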
Closes #6962
I've run scripts/format.sh to lint the changes in this PR.