Add default hooks to save tensors on CPU #61928
Conversation
Fix #57100 [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediations
As of commit 679572c (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI.
Fix #57100. Creates a context manager `torch.autograd.graph.save_on_cpu()` under which all tensors saved during the forward pass are copied to CPU, then copied back to the appropriate device for the backward pass. [ghstack-poisoned]
@Varal7 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fix #57100. Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are moved* to CPU, then copied back to the appropriate device for the backward pass.

*If the tensor was already on CPU, the entire operation is a no-op. If the tensor is on GPU, we move it to pinned memory during packing so that the unpacking can be done asynchronously.

With the current PR, hooks are set with `torch.autograd.graph.set_save_on_cpu_hooks()` and unset with `torch.autograd.graph.reset_saved_tensors_default_hooks`. In the near future, we want to make these hooks thread-local and expose a context manager `torch.autograd.graph.save_on_cpu`.

See [benchmark](#61928 (comment)) and [note about training large models](#61928 (comment))

Differential Revision: [D29848526](https://our.internmc.facebook.com/intern/diff/D29848526)

[ghstack-poisoned]
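As a rough sketch of the pack/unpack pair such default hooks install (illustrative only, not the PR's exact code; the names `pack_to_cpu` and `unpack_from_cpu` are made up for this example):

```python
import torch

def pack_to_cpu(tensor):
    # Runs during the forward pass: stage the saved tensor on CPU.
    # Pinned memory is used (when CUDA is available and the tensor is
    # dense) so that the copy back to GPU can be asynchronous.
    packed = torch.empty(
        tensor.size(),
        dtype=tensor.dtype,
        layout=tensor.layout,
        pin_memory=(torch.cuda.is_available() and not tensor.is_sparse),
    )
    packed.copy_(tensor)
    return (tensor.device, packed)

def unpack_from_cpu(packed):
    # Runs during the backward pass: copy the tensor back to its
    # original device. non_blocking=True lets the copy overlap with
    # computation because the source is in pinned memory.
    device, tensor = packed
    return tensor.to(device, non_blocking=True)
```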
This PR adds docstrings for the CPU hooks introduced in #61928. Also uncomments the warning about pinned memory in the CUDA semantics docs. Depends on #62361. For now, the docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks

Differential Revision: [D29990129](https://our.internmc.facebook.com/intern/diff/D29990129)

[ghstack-poisoned]
@Varal7 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```python
storage = torch.empty(
    tensor.size(),
    dtype=tensor.dtype,
    layout=tensor.layout,
    pin_memory=(torch.cuda.is_available() and not tensor.is_sparse))
storage.copy_(tensor)
```
Why can't we do `storage = tensor.to(device='cpu', pin_memory=True)`?

Also nit: we should rename `storage` to something else; `storage` can be confused with `torch.Storage`.
I don't think that `pin_memory` is an acceptable argument of `torch.Tensor.to`. Ok for the `storage` rename.
> I don't think that `pin_memory` is an acceptable argument of `torch.Tensor.to`.

Good point, thanks for the clarification.
Would something like `storage = tensor.to("cpu", non_blocking=True).pin_memory()` work?
That would mean two copies (device → pageable CPU memory, then pageable → pinned).
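A small sketch to make the copy count concrete (assumes a CUDA device is available; variable names are illustrative):

```python
import torch

x = torch.randn(1024, 1024, device="cuda")

# Two copies: device -> pageable CPU memory, then pageable -> pinned.
two_copies = x.to("cpu", non_blocking=True).pin_memory()

# One copy: allocate pinned CPU memory up front, then copy device -> pinned.
pinned = torch.empty(x.size(), dtype=x.dtype, layout=x.layout, pin_memory=True)
pinned.copy_(x)
```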
This reverts commit 9beb279. [ghstack-poisoned]
Summary:
Pull Request resolved: #62410

This PR adds docstrings for the CPU hooks introduced in #61928. Also uncomments the warning about pinned memory in the CUDA semantics docs. Depends on #62361. For now, the docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29990129

Pulled By: Varal7

fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4
Stack from ghstack:
Fix #57100.
Creates a context manager `torch.autograd.graph.save_on_cpu()` under which all tensors saved during the forward pass are moved* to CPU, then copied back to the appropriate device for the backward pass.

*If the tensor was already on CPU, the entire operation is a no-op. If the user so desires, we move the tensor to pinned memory during packing so that the unpacking can be done asynchronously.

With the current PR, hooks are registered globally, across threads. In the near future, we want to make these hooks thread-local.
See benchmark and note about training large models
Differential Revision: D29848526