[quant][core][gpu][bug fix] Added clone and contiguous() to broadcasted_bias tensor in quantized cudnn linear op #75944
Conversation
CI failures summary and remediations, as of commit 18948a3 (more details on the Dr. CI page):
1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
broadcast_to seems to be the same as expand (https://pytorch.org/docs/stable/generated/torch.broadcast_to.html?highlight=broadcast_to#torch.broadcast_to). I feel expand might be slightly more popular than broadcast_to; maybe we can use that.
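For context on why the two calls are interchangeable here: both broadcast_to and expand return a zero-stride view over the original storage rather than copying data. A minimal sketch of that stride trick, using NumPy as a stand-in for the torch semantics (the shapes here are illustrative, not from the PR):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

bias = np.arange(4, dtype=np.float32)            # pretend 1-D bias, shape (4,)

# broadcast_to: view of bias replicated along a new leading dim
b1 = np.broadcast_to(bias, (3, 4))

# expand-style equivalent: same effect, done explicitly with a zero stride
# along the replicated dimension, so every "row" aliases the same memory
b2 = as_strided(bias, shape=(3, 4), strides=(0, bias.strides[0]))

assert (b1 == b2).all()
assert b1.strides == b2.strides == (0, 4)        # zero stride = no copy
```

Either spelling produces the same aliasing view, which is exactly why the PR needs clone/contiguous afterwards regardless of which one is used.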
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)
Pull Request resolved: #75944 Reviewed By: jerryzh168 Differential Revision: D35717355 Pulled By: dzdang fbshipit-source-id: bc5e47666e4d0a8e1a544a094008520a290e5d25
Pull Request resolved: #75944 Approved by: https://github.com/jerryzh168 (cherry picked from commit 381e725)
Stack from ghstack (oldest at bottom):
Summary:
The previous implementation of broadcasted_bias in the quantized cudnn linear op has two issues:
1) broadcasted_bias is a view of the input bias tensor. This is not desired, as any modification to broadcasted_bias is also applied to the input bias. To remedy this, we clone the input bias tensor.
2) Calling broadcast_to doesn't affect the underlying storage, which is problematic for the cudnn operations. We need a fully materialized broadcasted tensor, rather than a view (which is what broadcast_to returns). To remedy this, we call contiguous().
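Both issues come down to broadcast views sharing storage with their base tensor. The effect can be sketched with NumPy, whose broadcast_to has analogous view semantics to the torch call (the shapes and values here are illustrative, not taken from the op):

```python
import numpy as np

bias = np.arange(4, dtype=np.float32)        # pretend 1-D bias, shape (4,)
view = np.broadcast_to(bias, (3, 4))         # view, no new storage allocated

# Issue 2: the broadcast dim has stride 0, so the result is not contiguous
# and all "rows" point at the same underlying memory.
assert view.strides[0] == 0
assert not view.flags['C_CONTIGUOUS']

# Issue 1: the view aliases the base, so changes to bias show through it.
bias[0] = 100.0
assert view[2, 0] == 100.0

# The fix: materialize a real (3, 4) buffer, analogous to the PR's
# clone() + contiguous(). NumPy's .copy() does both at once here.
materialized = np.broadcast_to(bias, (3, 4)).copy()
bias[0] = -1.0                               # later edits no longer leak in
assert materialized[2, 0] == 100.0
assert materialized.flags['C_CONTIGUOUS']
```

After the copy, the buffer is dense and independent, which is what the cudnn kernels require of the bias operand.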
Test plan:
python test/test_quantization.py -k test_linear_cudnn
Differential Revision: D35717355