[Autograd] Use in-place input accumulation fast path for dense Tensors. #88339

Closed · wants to merge 7 commits

Conversation

@robieta commented Nov 2, 2022:

Stack from ghstack (oldest at bottom):

There is a fast path in InputBuffer to steal memory when the use count shows the buffer holds the only reference; however, it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors, so there was no reason to live dangerously. However, I've noticed large Tensors in internal models that would benefit from this optimization as well.

Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7
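
For orientation, here is a minimal sketch of the accumulation step this PR optimizes, assuming the shape shown in the diffs later in this thread; it is not the exact merged source.

```cpp
#include <ATen/ATen.h>
#include <vector>

// Sketch only: steal old_var's buffer when nothing else can observe it.
static void accumulate(
    std::vector<at::Tensor>& buffer,
    size_t pos,
    at::Tensor&& var) {
  auto& old_var = buffer[pos];
  if (old_var.is_contiguous() && old_var.use_count() == 1 &&
      old_var.storage().use_count() == 1) {
    // We hold the only reference to old_var and its storage, so an in-place
    // add is unobservable and avoids allocating a third full-size Tensor.
    buffer[pos] = old_var.add_(var);
  } else {
    buffer[pos] = var + old_var;  // out-of-place fallback
  }
}
```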

@pytorch-bot commented Nov 2, 2022:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88339

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 835e532:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

```
@@ -86,13 +86,12 @@ static void accumulate(
      } else {
        buffer[pos] = var + old_var;
      }
    } else if (
        old_var.is_contiguous() && old_var.use_count() == 1 &&
```
A Collaborator commented on this diff:

I don't think we want to take this path when both `var` and `old_var` are sparse; who even knows if we have working in-place sparse-sparse addition, and in any case it's a conceptually weird operation, so you might want to push this case to the else branch.

@robieta (Author) replied Nov 2, 2022:

If `old_var` is sparse it will get caught by the first part of the conditional (which has a `!var.is_sparse()` on its steal check). Maybe I should add a cautionary comment?
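
For context, a sketch of the branch ordering being described, reconstructed from this thread rather than copied from the source: a sparse `old_var` is handled by the first branch, so the dense steal never sees one.

```cpp
// Reconstruction (assumed, not verbatim): sparse old_var is caught first.
if (old_var.is_sparse()) {
  // Steal var's dense memory instead; note the !var.is_sparse() guard.
  if (!var.is_sparse() && var.is_contiguous() && var.use_count() == 1 &&
      var.storage().use_count() == 1) {
    buffer[pos] = var.add_(old_var);
  } else {
    buffer[pos] = var + old_var;
  }
} else if (old_var.is_contiguous() && old_var.use_count() == 1 &&
           old_var.storage().use_count() == 1) {
  buffer[pos] = old_var.add_(var);  // the new dense fast path
} else {
  buffer[pos] = var + old_var;
}
```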

Collaborator replied:

Ah yeah that would be good, thanks for checking!

@robieta added the `module: autograd` (Related to torch.autograd, and the autograd engine in general) and `release notes: autograd` (release notes category) labels on Nov 2, 2022.
@robieta commented Nov 2, 2022:

Looks like I also need a `!old_var._is_zerotensor()` check. @anjali411 is there a more general mutability check that I should use?

robieta pushed an update to the stack. The revised condition:
```
    } else if (
        old_var.is_contiguous() && !old_var._is_zerotensor() &&
        old_var.use_count() == 1 && old_var.storage().use_count() == 1) {
      buffer[pos] = old_var.add_(var);
```
A Collaborator commented on this diff:

I think there are a few things wrong here that need updating. In particular, looking at the logic for AccumulateGrad (see the "Gradient Layout Contract" comment there) shows a lot of possible failure cases here:

  • We need to check that grad mode is disabled. This in-place update might be invalid if `old_var` is saved by autograd.
  • You need to guard for `.is_sparse_csr()`, which is not covered by `.is_sparse()`.
  • We want `is_non_overlapping_and_dense` on top of `is_contiguous`, as we really don't want weird memory-overlapping Tensors to be written in place.
  • Why the special case on `zero_tensor` here? (A sketch collecting these checks follows this list.)
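
A sketch of a guard collecting these review points; the helper name `can_accumulate_inplace` is illustrative, not necessarily what the merged code uses.

```cpp
#include <ATen/ATen.h>

// Illustrative helper (assumed name): true only when v can safely be
// mutated in place by the accumulation fast path.
static bool can_accumulate_inplace(const at::Tensor& v) {
  return !v.is_sparse() && !v.is_sparse_csr() &&  // CSR is not covered by is_sparse()
      !v._is_zerotensor() &&                      // ZeroTensors are immutable
      v.is_non_overlapping_and_dense() &&         // stronger than is_contiguous()
      v.use_count() == 1 && v.storage().use_count() == 1;
  // Grad mode must additionally be checked by the caller; see below.
}
```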

@robieta (Author) replied:

> We need to check that grad mode is disabled. This inplace might be invalid if old_var is saved by autograd

Should that be `GradMode::is_enabled()` or a property of `old_var`?

SGTM on `is_sparse_csr` and `is_non_overlapping_and_dense`.

> Why the special case on zero_tensor here?

ZeroTensors are immutable.

Collaborator replied:

> is_non_overlapping_and_dense on top of is_contiguous

Instead of `is_contiguous`? Yeah, that's reasonable. IIRC it errors out on sparse tensors though, so the logic will need to become more tortured.

@robieta (Author) replied:

FWIW I'm also updating the path where `var` is used, so there will be sparse checks before any other checks.

Collaborator replied:

> Should that be GradMode::is_enabled() or a property of old_var?

`GradMode::is_enabled()`, because it might not be OK to modify `old_var` even if it doesn't require gradients.
Note that it could still be invalid when grad mode is disabled, but that should be unlikely enough here that we're OK.
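
Putting the two points together, a sketch of the call-site guard, reusing the illustrative `can_accumulate_inplace` helper from the earlier sketch:

```cpp
#include <ATen/core/grad_mode.h>

// Sketch: only steal when grad mode is off, since old_var may have been
// saved for backward by a preceding operation.
if (!at::GradMode::is_enabled() && can_accumulate_inplace(old_var)) {
  buffer[pos] = old_var.add_(var);
} else {
  buffer[pos] = var + old_var;
}
```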

@robieta commented Nov 2, 2022:

Updated with more rigorous checks and comments.

@robieta commented Nov 2, 2022:

The latest failure is `torch.autograd.gradcheck.GradcheckError: While computing batched gradients, got: Cannot access storage of BatchedTensorImpl`. Nested Tensor, sparse Tensor, ZeroTensor, BatchedTensor, etc. will all blow up if various methods are called, and it's virtually impossible to get everything right, much less future-proof it. I'll fix the unit test failures, but I'm increasingly coming to the conclusion that we need to add something like `is_plain_old_tensor()` to Tensor.

@zou3519 (Contributor) commented Nov 2, 2022:

There's an `isTensorSubclassLike` check for this, though it is not completely comprehensive yet:

```cpp
inline bool isTensorSubclassLike(const Tensor& tensor) {
```

We probably want to exclude tensors for which `isTensorSubclassLike` returns true from this fast path.

@robieta commented Nov 3, 2022:

> there's an isTensorSubclassLike check for this, though it is not completely comprehensive yet ... We probably want to exclude tensors for which isTensorSubclassLike returns true from this fast path

This is AWESOME! (CC @chaekit, @aaronenyeshi, @slgong-fb, we should use this in profiler too.) AFAICT it doesn't check nested or any other C++ subclass? (And if not, do you think it should?) Plus it includes sparse and sparse_csr, so we shouldn't need those explicit checks.
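
A hedged usage sketch of the check being discussed; `at::isTensorSubclassLike` lives in `ATen/TensorSubclassLikeUtils.h`, and the exclusion below reflects this thread's conclusion rather than the exact merged condition.

```cpp
#include <ATen/TensorSubclassLikeUtils.h>

// Exclude subclass-like tensors (Batched, and per this thread also
// sparse/sparse_csr) from the steal path in one check, rather than
// enumerating each special TensorImpl.
const bool plain_tensors =
    !at::isTensorSubclassLike(var) && !at::isTensorSubclassLike(old_var);
```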

robieta pushed a commit that referenced this pull request Nov 3, 2022 (Pull Request resolved: #88339; ghstack-source-id: 172517219; Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)).
@robieta requested a review from ngimel · November 3, 2022 01:42
robieta pushed a commit that referenced this pull request Nov 3, 2022 (ghstack-source-id: 172596747).
@robieta commented Nov 3, 2022:

I also had to gate on `var` not being a subclass because of a forward-mode AD test. In theory I could have made the test check grad mode, but given how varied Tensor subclasses can be, it seems like a better idea to just exclude them from the fast path. PTAL

robieta pushed a commit that referenced this pull request Nov 7, 2022 (ghstack-source-id: 172919004).
robieta pushed a commit that referenced this pull request Nov 8, 2022 (ghstack-source-id: 172984864).
@robieta commented Nov 8, 2022:

@pytorchbot merge -f "test failure is unrelated. (failed to install triton)"

@pytorchmergebot commented:

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

@mehtanirav (Contributor) commented:

@pytorchbot revert -m "Internal test failures" -c ghfirst

@pytorchmergebot commented:

@pytorchbot successfully started a revert job.

@pytorchmergebot commented:
@robieta your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Nov 11, 2022: Revert "[Autograd] Use in-place input accumulation fast path for dense Tensors. (#88339)"

This reverts commit 8f66ae4.

Reverted #88339 on behalf of https://github.com/mehtanirav due to Internal test failures
robieta pushed commits that referenced this pull request Dec 5, 2022, relanding the change as #90217: identical to #88339 except with a `.has_storage()` check before `.storage()`. Differential Revision: [D41737935](https://our.internmc.facebook.com/intern/diff/D41737935/) (ghstack-source-id: 175170552). Pull Request resolved: #90217.
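
A sketch of the extra guard the reland adds, assuming the motivation is the `BatchedTensorImpl` failure reported above (tensors without storage throw when `.storage()` is called):

```cpp
// has_storage() must come first: calling .storage() on, e.g., a
// BatchedTensorImpl throws rather than returning an empty storage.
const bool storage_unique =
    old_var.has_storage() && old_var.storage().use_count() == 1;
```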
kulinseth pushed commits to kulinseth/pytorch that referenced this pull request Dec 10, 2022: a cherry-pick of the original change (pytorch#88339, approved by https://github.com/ngimel) followed by its revert (reverts commit 8f66ae4, on behalf of https://github.com/mehtanirav due to internal test failures).
robieta pushed further updates to the reland (#90217, "Identical to #88339 except with a `.has_storage()` check before `.storage()`", Differential Revision [D41737935](https://our.internmc.facebook.com/intern/diff/D41737935/)) on Jan 3, Jan 31, Feb 9, and Feb 10, 2023.
pytorchmergebot pushed a commit that referenced this pull request Feb 10, 2023: [Autograd] Use in-place input accumulation fast path for dense Tensors. (#90217)

Identical to #88339 except with a `.has_storage()` check before `.storage()`.

Differential Revision: [D41737935](https://our.internmc.facebook.com/intern/diff/D41737935/)

Pull Request resolved: #90217
Approved by: https://github.com/ngimel
@facebook-github-bot deleted the gh/robieta/148/head branch June 8, 2023 18:34
Labels: ciflow/periodic · ciflow/trunk · Merged · module: autograd (Related to torch.autograd, and the autograd engine in general) · release notes: autograd (release notes category) · Reverted
6 participants