
Use lru_cache to cache indices data for bsr_scatter_mm. #111470

Closed

Conversation

pytorch-bot bot commented Oct 18, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111470

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2807fda with merge base 57c7aa1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pearu added a commit that referenced this pull request Oct 18, 2023
ghstack-source-id: b4fae5af4bdef106c06b631da5d004fb09f0b8f8
Pull Request resolved: #111470
def bsr_scatter_mm_indices_data(bsr, other, indices_format='bsr_strided_mm_compressed', **meta_input):
    """Computes indices data for :func:`scatter_mm` used in BSR and
    strided tensor matrix multiplication.
    """

class TensorAsKey:
Contributor commented:

cc @albanD

I think this might not always work?

Collaborator (author) commented:

It depends on how you define "not working". The approach is not exact, in the sense that otherwise-equal tensors may map to different keys, but that does not break tensor caching.
On the other hand, it is better than every tensor mapping to a distinct key, including tensors that share storage and have equal strides, which would make tensor caching rather pointless.
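
For illustration, a small snippet (not from the PR) showing why identity-based keys would defeat caching:

```python
import torch

a = torch.arange(6)
v1, v2 = a[2:], a[2:]
assert v1 is not v2                    # distinct objects: identity keys would miss
assert v1.data_ptr() == v2.data_ptr()  # but they view the same memory
assert v1.stride() == v2.stride()      # with the same layout
```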

Contributor commented:

There are (obviously) two key risks here:

  1. Correctness: two Tensors are seen as equal, but they're actually not.
  2. Performance: a lot of Tensors are kept alive that would otherwise have been deallocated, causing what may appear to be an OOM (or making the use of sparsity result in a worse memory footprint than not using sparsity).

I'm mostly worried about Correctness for now.

I think the key you implemented here is comprehensive, but @albanD tends to have a complete view of the data structure. I want to make sure we're not missing anything here.

Collaborator commented:

I'm very confused by this. How do you know the content of these Tensors hasn't changed?

Collaborator (author) commented:

Yes, you are right in general: two tensors with the same data pointer and strides may still be unequal if they have different, say, conjugate bits.

In this particular case, we consider only the integer tensors returned by the crow/col/ccol/row_indices methods, so I think we are relatively safe here. To make sure that TensorAsKey is used to wrap only integer tensors, we could introduce a dtype check in its constructor. Are there any tensor bits specific to integer tensors?
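
As a hedged sketch, such a constructor guard could look like the following (illustrative only, not the PR's code; the accepted dtypes are an assumption):

```python
import torch

# Hypothetical guard for TensorAsKey.__init__: only the integer index
# tensors returned by crow/col/ccol/row_indices should be wrapped, since
# they carry no conjugate/negative bits that could distinguish otherwise
# equal-looking tensors.
def _check_key_dtype(obj: torch.Tensor) -> None:
    if obj.dtype not in (torch.int32, torch.int64):
        raise TypeError(
            f"TensorAsKey expects an integer index tensor, got {obj.dtype}"
        )
```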

Re performance: to avoid unnecessary memory consumption, we can reduce the LRU cache size. Currently, we use the default size of 128.
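
For instance, a minimal sketch of that knob (the maxsize below is illustrative, not the PR's choice):

```python
import functools

@functools.lru_cache(maxsize=16)  # functools default is 128; a smaller bound keeps fewer tensors alive
def _cached_indices_data(key, indices_format):
    # `key` stands for a hashable wrapper (such as the TensorAsKey discussed
    # here) around the BSR index tensors; the expensive indices computation
    # for scatter_mm would run in the body.
    ...
```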

Collaborator (author) commented:

> How do you know the content of these Tensors hasn't changed?

@albanD As said above, this feature is used only for the integer tensors returned by the crow/col/ccol/row_indices methods. Such tensors cannot be changed in place (well, nothing prevents changing the content of a tensor, but an arbitrary change to the indices of a sparse tensor will likely cause a crash).

Collaborator commented:

If you're happy with ignoring this kind of silent correctness issue, I would:

  • Update the comment to make it clearer that we want to match any Tensor that looks at the same memory, even if they're not the same Tensor object (object sameness doesn't use equality, so there is no need to mention it here).
  • Add `storage_offset` to your key.
  • Find a way to use weakref to avoid holding onto the Tensor directly (see the sketch below).
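
A minimal sketch combining these suggestions (an illustration under the assumptions discussed in this thread, not the code that eventually landed in #112076):

```python
import weakref

import torch


class TensorAsKey:
    """Hashable wrapper that matches any tensor viewing the same memory
    with the same layout, without keeping the tensor alive."""

    def __init__(self, obj: torch.Tensor):
        # Weak reference only: a cache entry must not extend the tensor's
        # lifetime (one of the review suggestions above).
        self._obj_ref = weakref.ref(obj)
        self._key = (
            obj.data_ptr(),
            obj.storage_offset(),
            obj.stride(),
            obj.shape,
            obj.dtype,
        )
        self._hash = hash(self._key)

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other) -> bool:
        if not isinstance(other, TensorAsKey):
            return NotImplemented
        # If either referent has been garbage-collected, its data_ptr may
        # already belong to an unrelated tensor, so never report a match.
        if self._obj_ref() is None or other._obj_ref() is None:
            return False
        return self._key == other._key
```

With such a key, two distinct view objects over the same storage with equal strides compare equal, while a key whose referent has died can never produce a stale cache hit.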

Contributor commented:

Let's do this as a follow up PR.

Collaborator (author) commented:

@albanD @cpuhrsch here's the follow-up PR: #112076

pytorchmergebot pushed a commit that referenced this pull request Oct 23, 2023
As in the title.

The figures below illustrate the performance differences between bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure shows the performance equilibrium point, i.e. the BSR tensor sparsity at which bsr_dense_mm has the same performance characteristics as torch.matmul. The second figure shows the speedups from using optimized meta parameters in bsr_dense_mm, at its performance equilibrium points, relative to bsr_dense_mm with default meta parameters.

In sum, this PR speeds up `bsr_dense_mm` by about 50% (depending on the BSR tensor shape and block size) and lowers the performance equilibrium points of BSR tensor sparsity for matmul operations between sparse and strided tensors.

<img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%">

Pull Request resolved: #111760
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489
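
For context, a hedged usage sketch of the kernel being tuned; torch.sparse._triton_ops is an internal module, so the entry point and dtypes here are assumptions, and a CUDA device with Triton is required:

```python
import torch
from torch.sparse._triton_ops import bsr_dense_mm  # internal API, may change

# Multiply a blocked-sparse (BSR) operand with a dense one; the block size
# must divide the matrix dimensions.
lhs = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
dense = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
lhs_bsr = lhs.to_sparse_bsr(blocksize=(32, 32))

out = bsr_dense_mm(lhs_bsr, dense)  # picks tuned meta parameters internally
```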
pearu added a commit that referenced this pull request Oct 25, 2023
pearu added a commit that referenced this pull request Oct 26, 2023
This PR addresses the discussion items in #111470 (comment), that is:
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.
pearu added a commit that referenced this pull request Oct 26, 2023
ghstack-source-id: 387ab6555d39da7c6afe9a0534304aae478689e8
Pull Request resolved: #112076
pearu added a commit that referenced this pull request Oct 27, 2023
ghstack-source-id: 80454291cb4c1209f8479f374bb74e16c70626cd
Pull Request resolved: #112076
pearu added a commit that referenced this pull request Oct 27, 2023
ghstack-source-id: 0e4f72b72cd925feab6fad78b8443a874a1eac46
Pull Request resolved: #112076
@facebook-github-bot facebook-github-bot deleted the gh/pearu/121/head branch October 27, 2023 14:25
pearu added a commit that referenced this pull request Oct 29, 2023
ghstack-source-id: 112bf9e37dd54d463322c0b937fbbd32d63483b7
Pull Request resolved: #112076
pytorchmergebot pushed a commit that referenced this pull request Oct 30, 2023
Pull Request resolved: #112076
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154