Use lru_cache to cache indices data for bsr_scatter_mm. #111470
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111470.
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 2807fda with merge base 57c7aa1. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: b4fae5af4bdef106c06b631da5d004fb09f0b8f8 Pull Request resolved: #111470
```python
def bsr_scatter_mm_indices_data(bsr, other, indices_format='bsr_strided_mm_compressed', **meta_input):
    """Computes indices data for :func:`scatter_mm` used in BSR and
    strided tensor matrix multiplication.

class TensorAsKey:
```
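For orientation, here is a minimal, self-contained sketch of the caching pattern the PR title describes: wrap the BSR index tensors in a hashable key object and memoize the expensive indices-data computation with `functools.lru_cache`. The helper name `_bsr_scatter_mm_indices_data_cached` and the exact key fields are illustrative assumptions, not the code under review.

```python
import functools


class TensorAsKey:
    """Hashable wrapper so an index tensor can serve as an lru_cache key.

    Sketch only: the tensor is identified by its data pointer and layout,
    and a strong reference to it is kept (see the lifetime discussion below).
    """

    def __init__(self, t):
        self._tensor = t  # strong reference; keeps the tensor alive while cached
        self.key = (t.data_ptr(), tuple(t.shape), t.stride(), t.dtype, t.device)

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, TensorAsKey) and self.key == other.key


def bsr_scatter_mm_indices_data(bsr, other, indices_format='bsr_strided_mm_compressed'):
    # Wrap the BSR index tensors in hashable keys so that repeated calls with
    # the same sparse tensor hit the cache instead of recomputing.
    key = (TensorAsKey(bsr.crow_indices()), TensorAsKey(bsr.col_indices()),
           tuple(bsr.shape), tuple(other.shape), indices_format)
    return _bsr_scatter_mm_indices_data_cached(key)


@functools.lru_cache(maxsize=128)  # functools' default cache size
def _bsr_scatter_mm_indices_data_cached(key):
    # the expensive computation of scatter_mm indices data would go here
    ...
```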
cc @albanD
I think this might not always work?
It depends on how you define "not working". The approach is not exact in the sense that otherwise-equal tensors may map to different keys, but that does not break tensor caching.
On the other hand, it is better than having every tensor map to a distinct key, including tensors that share storage and have equal strides, which would make tensor caching rather pointless.
There are (obviously) two key risks here:
- Correctness: two Tensors are seen as equal, but they're actually not.
- Performance: a lot of Tensors are kept alive that would otherwise have been deallocated, causing what may look like an OOM (or making the use of sparsity have a worse memory footprint than not using sparsity at all).
I'm mostly worried about correctness for now.
I think the key you implemented here is comprehensive, but @albanD tends to have a complete view of the data structure. I want to make sure we're not missing anything here.
I'm very confused by this. How do you know the content of these Tensors hasn't changed?
Yes, you are right in general: two tensors having the same data pointer and strides may be unequal if they have different, say, conjugate bits.
In this particular case, we only consider the integer tensors returned by the crow/col/ccol/row_indices methods, so I think we are relatively safe here. To make sure that TensorAsKey is used to wrap only integer tensors, we can introduce a dtype check in its constructor. Are there any tensor bits specific to integer tensors?
Re performance: to avoid unnecessary memory consumption, we can reduce the LRU cache size. Currently, we are using the default size of 128.
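For concreteness, a dtype guard for the constructor and an explicit, smaller cache size might look like the sketch below; the helper name `_check_index_dtype` and the `maxsize=16` value are illustrative assumptions, not code from this PR.

```python
import functools

import torch


def _check_index_dtype(t):
    # hypothetical guard for TensorAsKey's constructor: only the integer
    # index tensors returned by crow/col/ccol/row_indices should be wrapped
    if t.dtype not in (torch.int32, torch.int64):
        raise TypeError(f"expected an integer index tensor, got {t.dtype}")


# shrinking maxsize from functools' default of 128 bounds how many
# indices-data entries (and wrapped index tensors) the cache keeps alive
@functools.lru_cache(maxsize=16)
def _bsr_scatter_mm_indices_data_cached(key):
    ...
```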
> How do you know the content of these Tensors hasn't changed?

@albanD As said above, the feature is used only for the integer tensors returned by the crow/col/ccol/row_indices methods. Such tensors cannot be changed in place (well, nothing prevents changing the content of a tensor, but an arbitrary change to the indices of a sparse tensor will likely cause a crash).
If you're happy with ignoring this kind of silent correctness issue, I would:
- Update the comment to make it clearer that we want to match any Tensor that views the same memory, even if they're not the same Tensor object (object sameness doesn't use equality, so no need to mention it here).
- Add `storage_offset` to your key.
- Find a way to use a weakref to avoid holding onto the Tensor directly.
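For illustration, a sketch of what those three suggestions could look like in `TensorAsKey`; the field names and structure here are assumptions, not the actual change, which landed in the follow-up PR #112076 referenced further down.

```python
import weakref


class TensorAsKey:
    """Hashable key for a tensor, identified by the memory region it views."""

    def __init__(self, obj):
        # weak reference only, so the cache does not extend the tensor's lifetime
        self._obj_ref = weakref.ref(obj)
        # storage_offset() is included in the key, as suggested above
        self.key = (obj.data_ptr(), obj.storage_offset(), tuple(obj.shape),
                    obj.stride(), obj.dtype, obj.device)
        self._hash = hash(self.key)

    @property
    def obj(self):
        # None once the wrapped tensor has been garbage-collected
        return self._obj_ref()

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, TensorAsKey):
            return NotImplemented
        # a key whose tensor is gone can no longer be trusted to match anything
        if self.obj is None or other.obj is None:
            return self is other
        return self.key == other.key
```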
Let's do this as a follow-up PR.
… is 16. (#111489) Pull Request resolved: #111489 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470
As in the title. The figures below illustrate the performance differences between bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure shows the performance equilibrium point, i.e. the BSR tensor sparsity value at which bsr_dense_mm has the same performance characteristics as torch.matmul. The second figure shows the speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points, relative to bsr_dense_mm with default meta parameters. In sum, this PR speeds up `bsr_dense_mm` by about 50%, depending on the BSR tensor shape and block size, and lowers the performance equilibrium points of BSR tensor sparsity for matmul operations with strided tensors. <img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%"> Pull Request resolved: #111760 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470, #111489
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
ghstack-source-id: 387ab6555d39da7c6afe9a0534304aae478689e8 Pull Request resolved: #112076
ghstack-source-id: 80454291cb4c1209f8479f374bb74e16c70626cd Pull Request resolved: #112076
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
ghstack-source-id: 0e4f72b72cd925feab6fad78b8443a874a1eac46 Pull Request resolved: #112076
ghstack-source-id: 112bf9e37dd54d463322c0b937fbbd32d63483b7 Pull Request resolved: #112076
…llow-up to #111470)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
This PR addresses the discussion items in #111470 (comment), that is:
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.

Pull Request resolved: #112076 Approved by: https://github.com/cpuhrsch ghstack dependencies: #112154
Pull Request resolved: pytorch#111470 Approved by: https://github.com/cpuhrsch ghstack dependencies: pytorch#110396
pytorch#111796) Pull Request resolved: pytorch#111796 Approved by: https://github.com/cpuhrsch ghstack dependencies: pytorch#110396, pytorch#111470, pytorch#111489, pytorch#111760