[Autograd] Use in-place input accumulation fast path for dense Tensors. #88339
Conversation
There is a fast path in InputBuffer to steal memory when the use count is zero; however, it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors, so there was no reason to live dangerously. However, I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88339
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Failure as of commit 835e532. This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/csrc/autograd/input_buffer.cpp (Outdated)

    @@ -86,13 +86,12 @@ static void accumulate(
        } else {
          buffer[pos] = var + old_var;
        }
      } else if (
          old_var.is_contiguous() && old_var.use_count() == 1 &&
I don't think we want to take this path when both `var` and `old_var` are sparse; who even knows if we have working inplace sparse-sparse addition, and in any case it's a conceptually weird operation, so you might want to push this case to the `else` branch.
If `old_var` is sparse it will get hooked by the first part of the conditional. (Which has a `!var.is_sparse()` on its steal check.) Maybe I should add a cautionary comment?
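For readers following along, here is a minimal sketch of the branch ordering being described. It is illustrative only, simplified from the surrounding diff rather than copied from the PR's code, and `accumulate_sketch` is a made-up name:

```cpp
#include <ATen/ATen.h>
#include <vector>

using Variable = at::Tensor;

// Simplified sketch (not the PR's exact code) of the branch ordering discussed
// above: a sparse `old_var` is caught by the first branch, whose steal check
// requires `var` to be dense, so the dense steal path never sees sparse input.
static void accumulate_sketch(
    std::vector<Variable>& buffer,
    const size_t pos,
    Variable&& var) {
  auto& old_var = buffer[pos];
  if (old_var.is_sparse()) {
    if (!var.is_sparse() && var.is_contiguous() && var.use_count() == 1 &&
        var.storage().use_count() == 1) {
      buffer[pos] = var.add_(old_var);  // steal the dense incoming gradient
    } else {
      buffer[pos] = var + old_var;      // e.g. sparse + sparse: allocate a result
    }
  } else if (
      old_var.is_contiguous() && old_var.use_count() == 1 &&
      old_var.storage().use_count() == 1) {
    buffer[pos] = old_var.add_(var);    // new dense fast path: reuse old_var's memory
  } else {
    buffer[pos] = old_var + var;        // general (allocating) path
  }
}
```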
Ah yeah that would be good, thanks for checking!
Looks like I also need a
…ense Tensors." There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 [ghstack-poisoned]
    } else if (
        old_var.is_contiguous() && !old_var._is_zerotensor() &&
        old_var.use_count() == 1 && old_var.storage().use_count() == 1) {
      buffer[pos] = old_var.add_(var);
I think there are a few things wrong here that need updating. In particular, looking at the logic for AccumulateGrad shows a lot of possible failure cases here:

// "Gradient Layout Contract"

- We need to check that grad mode is disabled. This inplace might be invalid if old_var is saved by autograd.
- You need to guard for `.is_sparse_csr()`, which is not covered by `.is_sparse()`.
- We want is_non_overlapping_and_dense on top of is_contiguous, as we really don't want weird memory-overlapping Tensors to be written inplace.
- Why the special case on zero_tensor here?
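Pulling those review points together, here is a hedged sketch of what a stricter guard could look like. It is illustrative only: `can_steal_old_var` is a made-up helper, and the exact checks and their order in the PR may differ.

```cpp
#include <ATen/ATen.h>
#include <c10/core/GradMode.h>

// Made-up helper that sketches the checks listed above; not the PR's code.
static bool can_steal_old_var(const at::Tensor& old_var) {
  return
      // Autograd must not be recording: an in-place add_ could clobber a
      // value that some node has saved for backward.
      !c10::GradMode::is_enabled() &&
      // Sparse layouts first, COO *and* CSR, since is_sparse() does not cover
      // CSR; per the discussion, layout queries may not be valid for sparse.
      !old_var.is_sparse() && !old_var.is_sparse_csr() &&
      // ZeroTensors are immutable, so they can never be the in-place target.
      !old_var._is_zerotensor() &&
      // Stronger than is_contiguous(): no self-overlapping memory to write.
      old_var.is_non_overlapping_and_dense() &&
      // We hold the last reference to the Tensor and to its storage.
      old_var.use_count() == 1 && old_var.storage().use_count() == 1;
}
```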
> We need to check that grad mode is disabled. This inplace might be invalid if old_var is saved by autograd

Should that be `GradMode::is_enabled()` or a property of `old_var`?

SGTM on `is_sparse_csr` and `is_non_overlapping_and_dense`.

> Why the special case on zero_tensor here?

ZeroTensors are immutable
> is_non_overlapping_and_dense on top of is_contiguous

Instead of is_contiguous? Yeah, that's reasonable. IIRC it errors out on sparse tensors though, so the logic will need to become more tortured.
FWIW I'm also updating the path where `var` is used, so there will be sparse checks before any other checks.
> Should that be GradMode::is_enabled() or a property of old_var?

`GradMode::is_enabled()`, because it might not be ok to modify `old_var` even if it doesn't require gradients. Note that it could still be invalid when grad mode is disabled, but that should be unlikely enough here that we're ok.
…ense Tensors." There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 [ghstack-poisoned]
Updated with more rigorous checks and comments.
The latest failure is
There's an isTensorSubclassLike check for this, though it is not completely comprehensive yet:
We probably want to exclude tensors for which isTensorSubclassLike is True from this fast path.
This is AWESOME! (CC @chaekit, @aaronenyeshi, @slgong-fb we should use this in profiler too.) AFAICT it doesn't check nested or any other C++ subclass? (And if not, do you think it should?) Plus it includes sparse and sparse_csr, so we shouldn't need those explicit checks.
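To make the suggestion concrete, here is a hedged sketch of a "vanilla Tensor" test built around `at::isTensorSubclassLike` (from `ATen/TensorSubclassLikeUtils.h`). The helper name is made up, and the explicit `is_nested()` guard reflects the open question above rather than anything settled in the thread:

```cpp
#include <ATen/ATen.h>
#include <ATen/TensorSubclassLikeUtils.h>

// Made-up helper: only "vanilla" dense Tensors are eligible for in-place
// accumulation. Per the comment above, isTensorSubclassLike() also covers
// sparse and sparse_csr, so separate layout checks may be unnecessary;
// ZeroTensor and nested Tensors are excluded explicitly here.
static bool is_vanilla_tensor(const at::Tensor& t) {
  return !at::isTensorSubclassLike(t) && !t._is_zerotensor() && !t.is_nested();
}
```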
…ense Tensors." There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 [ghstack-poisoned]
Pull Request resolved: #88339 There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. ghstack-source-id: 172517219 Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)
…ense Tensors." There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 [ghstack-poisoned]
Pull Request resolved: #88339 There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. ghstack-source-id: 172596747 Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)
I also had to gate on
…ense Tensors." There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 [ghstack-poisoned]
Pull Request resolved: #88339 There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. ghstack-source-id: 172919004 Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)
Pull Request resolved: #88339 There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. ghstack-source-id: 172984864 Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/)
@pytorchbot merge -f "test failure is unrelated. (failed to install triton)"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Internal test failures" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
@robieta your PR has been successfully reverted.
…e Tensors. (#88339)" This reverts commit 8f66ae4. Reverted #88339 on behalf of https://github.com/mehtanirav due to Internal test failures
…e Tensors. Identical to #88339 except with a `.has_storage()` check before `.storage()`. Differential Revision: [D41737935](https://our.internmc.facebook.com/intern/diff/D41737935/) [ghstack-poisoned]
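For clarity on the relanded change (#90217), here is a minimal sketch of the adjusted ownership check. The assumption, not stated outright in the thread, is that calling `.storage()` on a Tensor without storage can throw, which is why `has_storage()` must short-circuit first; the helper name is made up.

```cpp
#include <ATen/ATen.h>

// Sketch of the #90217 adjustment: only inspect the storage's use count when a
// storage actually exists, so Tensors without storage fall back to the slow path.
static bool holds_last_reference(const at::Tensor& t) {
  return t.use_count() == 1 && t.has_storage() && t.storage().use_count() == 1;
}
```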
…e Tensors. (#90217) Identical to #88339 except with a `.has_storage()` check before `.storage()`. Differential Revision: [D41737935](https://our.internmc.facebook.com/intern/diff/D41737935/) Pull Request resolved: #90217 Approved by: https://github.com/ngimel
Stack from ghstack (oldest at bottom):
There is a fast path in InputBuffer to steal memory when the use count is zero; however, it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors, so there was no reason to live dangerously. However, I've noticed large Tensors in internal models which would benefit from this optimization as well.
Differential Revision: D40946601
cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7