Use lru_cache to cache indices data for bsr_scatter_mm. #111470
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111470.
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 2807fda with merge base 57c7aa1. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: b4fae5af4bdef106c06b631da5d004fb09f0b8f8 Pull Request resolved: #111470
```python
def bsr_scatter_mm_indices_data(bsr, other, indices_format='bsr_strided_mm_compressed', **meta_input):
    """Computes indices data for :func:`scatter_mm` used in BSR and
    strided tensor matrix multiplication.

class TensorAsKey:
```
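For orientation, here is a minimal, self-contained sketch of the caching pattern the PR title describes: wrap the BSR index tensors in a hashable key object and memoize the expensive indices-data computation with `functools.lru_cache`. The helper name `_bsr_scatter_mm_indices_data_cached` and the exact key fields are illustrative assumptions, not the code under review.

```python
import functools


class TensorAsKey:
    """Hashable wrapper so an index tensor can serve as an lru_cache key.

    Sketch only: the tensor is identified by its data pointer and layout,
    and a strong reference to it is kept (see the lifetime discussion below).
    """

    def __init__(self, t):
        self._tensor = t  # strong reference; keeps the tensor alive while cached
        self.key = (t.data_ptr(), tuple(t.shape), t.stride(), t.dtype, t.device)

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, TensorAsKey) and self.key == other.key


def bsr_scatter_mm_indices_data(bsr, other, indices_format='bsr_strided_mm_compressed'):
    # Wrap the BSR index tensors in hashable keys so that repeated calls with
    # the same sparse tensor hit the cache instead of recomputing.
    key = (TensorAsKey(bsr.crow_indices()), TensorAsKey(bsr.col_indices()),
           tuple(bsr.shape), tuple(other.shape), indices_format)
    return _bsr_scatter_mm_indices_data_cached(key)


@functools.lru_cache(maxsize=128)  # functools' default cache size
def _bsr_scatter_mm_indices_data_cached(key):
    # the expensive computation of scatter_mm indices data would go here
    ...
```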
cc @albanD
I think this might not always work?
It depends on how you define "not working". The approach is not exact in the sense that otherwise-equal tensors may map to different keys, but that does not break tensor caching.
On the other hand, it is better than having every tensor map to a distinct key, including tensors that share storage and have equal strides, which would make tensor caching rather pointless.
There are (obviously) two key risks here:
- Correctness: two Tensors are seen as equal, but they're actually not.
- Performance: a lot of Tensors are kept alive that would otherwise have been deallocated, causing what may look like an OOM (or making the use of sparsity have a worse memory footprint than not using sparsity at all).
I'm mostly worried about correctness for now.
I think the key you implemented here is comprehensive, but @albanD tends to have a complete view of the data structure. I want to make sure we're not missing anything here.
I'm very confused by this. How do you know the content of these Tensors hasn't changed?
Yes, you are right in general: two tensors having the same data pointer and strides may be unequal if they have different, say, conjugate bits.
In this particular case, we only consider the integer tensors returned by the crow/col/ccol/row_indices methods, so I think we are relatively safe here. To make sure that TensorAsKey is used to wrap only integer tensors, we can introduce a dtype check in its constructor. Are there any tensor bits specific to integer tensors?
Re performance: to avoid unnecessary memory consumption, we can reduce the LRU cache size. Currently, we are using the default size of 128.
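For concreteness, a dtype guard for the constructor and an explicit, smaller cache size might look like the sketch below; the helper name `_check_index_dtype` and the `maxsize=16` value are illustrative assumptions, not code from this PR.

```python
import functools

import torch


def _check_index_dtype(t):
    # hypothetical guard for TensorAsKey's constructor: only the integer
    # index tensors returned by crow/col/ccol/row_indices should be wrapped
    if t.dtype not in (torch.int32, torch.int64):
        raise TypeError(f"expected an integer index tensor, got {t.dtype}")


# shrinking maxsize from functools' default of 128 bounds how many
# indices-data entries (and wrapped index tensors) the cache keeps alive
@functools.lru_cache(maxsize=16)
def _bsr_scatter_mm_indices_data_cached(key):
    ...
```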
> How do you know the content of these Tensors hasn't changed?

@albanD As said above, the feature is used only for the integer tensors returned by the crow/col/ccol/row_indices methods. Such tensors cannot be changed in place (well, nothing prevents changing the content of a tensor, but an arbitrary change to the indices of a sparse tensor will likely cause a crash).
If you're happy with ignoring this kind of silent correctness issue, I would:
- Update the comment to make it clearer that we want to match any Tensor that views the same memory, even if they're not the same Tensor object (object sameness doesn't use equality, so no need to mention it here).
- Add `storage_offset` to your key.
- Find a way to use a weakref to avoid holding onto the Tensor directly.
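For illustration, a sketch of what those three suggestions could look like in `TensorAsKey`; the field names and structure here are assumptions, not the actual change, which landed in the follow-up PR #112076 referenced further down.

```python
import weakref


class TensorAsKey:
    """Hashable key for a tensor, identified by the memory region it views."""

    def __init__(self, obj):
        # weak reference only, so the cache does not extend the tensor's lifetime
        self._obj_ref = weakref.ref(obj)
        # storage_offset() is included in the key, as suggested above
        self.key = (obj.data_ptr(), obj.storage_offset(), tuple(obj.shape),
                    obj.stride(), obj.dtype, obj.device)
        self._hash = hash(self.key)

    @property
    def obj(self):
        # None once the wrapped tensor has been garbage-collected
        return self._obj_ref()

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, TensorAsKey):
            return NotImplemented
        # a key whose tensor is gone can no longer be trusted to match anything
        if self.obj is None or other.obj is None:
            return self is other
        return self.key == other.key
```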
Let's do this as a follow-up PR.
… is 16. (#111489) Pull Request resolved: #111489 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470
As in the title. The figures below illustrate the performance differences between bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure shows the performance equilibrium point, i.e. the BSR tensor sparsity value at which bsr_dense_mm has the same performance characteristics as torch.matmul. The second figure shows the speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points, relative to bsr_dense_mm with default meta parameters. In sum, this PR speeds up `bsr_dense_mm` by about 50%, depending on the BSR tensor shape and block size, and lowers the performance equilibrium points of BSR tensor sparsity for matmul operations with strided tensors. <img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%"> Pull Request resolved: #111760 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470, #111489
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
ghstack-source-id: 387ab6555d39da7c6afe9a0534304aae478689e8 Pull Request resolved: #112076
ghstack-source-id: 80454291cb4c1209f8479f374bb74e16c70626cd Pull Request resolved: #112076
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
ghstack-source-id: 0e4f72b72cd925feab6fad78b8443a874a1eac46 Pull Request resolved: #112076
ghstack-source-id: 112bf9e37dd54d463322c0b937fbbd32d63483b7 Pull Request resolved: #112076
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
This PR addresses the discussion items in #111470 (comment), that is:
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.

Pull Request resolved: #112076 Approved by: https://github.com/cpuhrsch ghstack dependencies: #112154
Pull Request resolved: pytorch#111470 Approved by: https://github.com/cpuhrsch ghstack dependencies: pytorch#110396
pytorch#111796) Pull Request resolved: pytorch#111796 Approved by: https://github.com/cpuhrsch ghstack dependencies: pytorch#110396, pytorch#111470, pytorch#111489, pytorch#111760