
[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. #7238

Merged 48 commits into ray-project:master on Feb 22, 2020

Conversation

@sven1977 (Contributor) commented Feb 20, 2020

PPO torch has a memory leak due to a missing torch.no_grad() around compute_advantages.

PPO torch produces lots of intermediary Tensors (which are then garbage collected) when run on CPU or GPU. This is due to the reporting code creating short-lived CPU tensors and then converting them to numpy.


Closes #6962
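A minimal sketch of the leak and the fix described above (the function and names here are toy stand-ins, not RLlib's actual compute_advantages): any postprocessing arithmetic involving tensors that require gradients stays attached to the autograd graph, keeping its intermediates alive, unless the call runs under torch.no_grad().

```python
import torch

def compute_value_targets(rewards, values, gamma=0.99):
    # Toy stand-in (hypothetical) for an advantage/return computation.
    returns = torch.zeros_like(rewards)
    running = torch.tensor(0.0)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns - values

values = torch.randn(5, requires_grad=True)   # e.g. value-function outputs
rewards = torch.ones(5)

# Without no_grad, the result is still tied to the autograd graph, so the
# graph (and its intermediate tensors) cannot be freed.
leaky = compute_value_targets(rewards, values)
assert leaky.requires_grad

# Wrapping the call in torch.no_grad() produces a graph-free result, so
# memory is released immediately after each batch.
with torch.no_grad():
    safe = compute_value_targets(rewards, values)
assert not safe.requires_grad
```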

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22195/

@ericl ericl self-assigned this Feb 20, 2020
sample_batch, last_r, policy.config["gamma"], policy.config["lambda"],
policy.config["use_gae"], policy.config["use_critic"])

with torch.no_grad():
Contributor:

Could we actually move this into the postprocess method defined in torch_policy_template? That way it will work automatically for all torch policies and we don't need to clutter the individual ones.

Contributor Author:

Sure, will do.

"total_loss": policy.loss_obj.loss.cpu().detach().numpy(),
"policy_loss": policy.loss_obj.mean_policy_loss.cpu().detach().numpy(),
"vf_loss": policy.loss_obj.mean_vf_loss.cpu().detach().numpy(),
"total_loss": policy.loss_obj.loss.item(),
Contributor:

Similarly, could we automatically apply .item() to the dict values returned in the common torch policy class?
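One way to do that, sketched with a hypothetical helper name (not necessarily RLlib's actual implementation), is to walk the stats dict once in the common torch policy class and convert every tensor value:

```python
import torch

def tensors_to_python(stats):
    """Hypothetical helper: convert torch.Tensor values in a stats dict to
    plain Python numbers / numpy arrays, so the reporting path keeps no
    references into the autograd graph."""
    out = {}
    for k, v in stats.items():
        if isinstance(v, torch.Tensor):
            # Scalars become Python numbers; larger tensors become numpy.
            out[k] = v.item() if v.numel() == 1 else v.detach().cpu().numpy()
        else:
            out[k] = v
    return out

stats = {"total_loss": torch.tensor(1.5, requires_grad=True), "lr": 3e-4}
clean = tensors_to_python(stats)
```

Each policy's stats_fn can then return raw tensors and the template applies the conversion uniformly.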

Contributor Author:

All done. ... Waiting for tests.

@ericl (Contributor) left a comment:

The main question here is whether we can automatically insert these conversions in the template, to avoid having to do this for each algo (which could be brittle).

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22213/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22220/

@sven1977 (Contributor Author)

@ericl Everything is handled by the template now, which also does the numpy/item conversion AND takes care of the no_grad. Individual TorchPolicies don't have to worry about this anymore.
All RLlib tests pass.
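In template form, both fixes can live in one place. A sketch under assumed, illustrative names (this is not RLlib's exact torch_policy_template API):

```python
import torch

class TorchPolicyTemplate:
    """Sketch of centralizing both fixes in a shared template class, so
    individual policies never repeat them (names are hypothetical)."""

    def __init__(self, postprocess_fn, stats_fn):
        self._postprocess_fn = postprocess_fn
        self._stats_fn = stats_fn

    def postprocess_trajectory(self, batch):
        # torch.no_grad() here plugs the memory leak for every torch
        # policy at once.
        with torch.no_grad():
            return self._postprocess_fn(batch)

    def stats(self, batch):
        # Convert reported tensors to Python scalars once, centrally, so
        # reporting creates no extra CPU tensors that must be gc'd.
        raw = self._stats_fn(batch)
        return {k: v.item() if isinstance(v, torch.Tensor) else v
                for k, v in raw.items()}

# Usage with toy postprocess and stats functions:
policy = TorchPolicyTemplate(
    postprocess_fn=lambda b: b * 2,
    stats_fn=lambda b: {"total_loss": b.sum()},
)
batch = torch.ones(3, requires_grad=True)
processed = policy.postprocess_trajectory(batch)   # graph-free result
loss_stats = policy.stats(batch)                   # plain Python floats
```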

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22210/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22227/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22238/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22241/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22246/

@ericl (Contributor) left a comment:

Looks great!

- completed = sample_batch[SampleBatch.DONES][-1]
- if completed:
+ if sample_batch[SampleBatch.DONES][-1]:
Contributor:

Prefer to use intermediate variables to clarify the value of a long expression when possible.

@ericl (Contributor) commented Feb 21, 2020

//rllib:examples/custom_tf_policy                                       TIMEOUT in 3 out of 3 in 75.0s
  Stats over 3 runs: max = 75.0s, min = 75.0s, avg = 75.0s, dev = 0.0s
  /home/travis/.cache/bazel/_bazel_travis/b88c129a127452fc94033a29d9f90e20/execroot/com_github_ray_project_ray/bazel-out/k8-opt/testlogs/rllib/examples/custom_tf_policy/test.log
  /home/travis/.cache/bazel/_bazel_travis/b88c129a127452fc94033a29d9f90e20/execroot/com_github_ray_project_ray/bazel-out/k8-opt/testlogs/rllib/examples/custom_tf_policy/test_attempts/attempt_1.log
  /home/travis/.cache/bazel/_bazel_travis/b88c129a127452fc94033a29d9f90e20/execroot/com_github_ray_project_ray/bazel-out/k8-opt/testlogs/rllib/examples/custom_tf_policy/test_attempts/attempt_2.log

Possibly related to the changes?

@sven1977 (Contributor Author)

I'll change custom_tf_policy to size=medium and see whether that fixes it. I also changed back the intermediate var.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22273/

@sven1977 (Contributor Author)

@ericl Tests all pass. Please merge.

@ericl ericl merged commit e2edca4 into ray-project:master Feb 22, 2020
@sven1977 sven1977 deleted the ppo_torch_memory_leak branch March 3, 2020 10:16
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Development

Successfully merging this pull request may close these issues.

[RLlib] PPO torch over 5X slower than tensorflow on atari and uses up all RAM
4 participants