Skip to content

Conversation

pearu
Copy link
Collaborator

@pearu pearu commented Oct 25, 2023

This PR addresses the discussion items in #111470 (comment), that is,

  • use weakref when storing tensors as keys,
  • add storage_offset to the key data,
  • and revise the description of the TensorAsKey utility.

Stack from ghstack (oldest at bottom):

@pytorch-bot pytorch-bot bot added release notes: sparse release notes category labels Oct 25, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112076

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4902f95 with merge base f5088d2 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…)"

This PR addresses the discussion items in #111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.




[ghstack-poisoned]
pearu added a commit that referenced this pull request Oct 26, 2023
a = non_zero_row_indices * (Ms * N)
r_offsets = (a + b).view(-1)
c_indices = crow_indices
# crow_indices consitutes a part of a key in lru_cache as well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about what this means tbh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of _bsr_scatter_mm_indices_data function is:

@lru_cache(maxsize=128)
def _bsr_scatter_mm_indices_data(indices_format, M, K, N, Ms, Ks, nbatches, SPLIT_N, crow_indices_as_key, col_indices_as_key):
    ...

which means that a key to LRU cache constitutes of all arguments to the function:

lru_cache_key = (indices_format, M, K, N, Ms, Ks, nbatches, SPLIT_N, crow_indices_as_key, col_indices_as_key)

while the value of the LRU cache is the result of the function. Notice that the argument crow_indices_as_key is TensorAsKey(crow_indices).
On the other hand, the result of the function is a tuple (indices_format, c_indices, r_offsets, q_offsets) where originally c_indices was crow_indices.

So, this means that the tensor crow_indices is stored both in the key as well as in the value of LRU cache. Now, when the crow_indices variable goes out of scope, the corresponding tensor ought to be garbage collected (assuming that crow_indices appears as weakref instance in the key). However, the tensor cannot be garbage collected because it is referenced by the value of the LRU cache. Therefore, the crow_indices tensor must be cloned so that garbage collection would be effective when the crow_indices variable goes out of the scope, and later, when the LRU cache will overfill, the cloned tensor will be cleaned up as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, after thinking about it, I think the crow_indices tensor in the value part of the LRU cache ought to be weakref as well. Then we don't have to wait for an overfill for the tensor to be garbage collected. I'll fix it unless I missed something right now..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now removed. It turns out that it is not sufficient to use crow_indices and col_indices as keys because these have shorter lifetime than the original sparse compressed tensor. Now the original tensor is wrapped with TensorAsKey to make sure that the cached item exists in the life time of the original tensor.

…)"

This PR addresses the discussion items in #111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.




[ghstack-poisoned]
pearu added a commit that referenced this pull request Oct 27, 2023
…)"

This PR addresses the discussion items in #111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.




[ghstack-poisoned]
pearu added a commit that referenced this pull request Oct 27, 2023
…)"

This PR addresses the discussion items in #111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.




[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Oct 30, 2023
…2337)

This PR fixes
```
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```
that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch.

Pull Request resolved: #112337
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154, #112076
@facebook-github-bot facebook-github-bot deleted the gh/pearu/126/head branch November 3, 2023 14:27
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
…ytorch#112076)

This PR addresses the discussion items in pytorch#111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.

Pull Request resolved: pytorch#112076
Approved by: https://github.com/cpuhrsch
ghstack dependencies: pytorch#112154
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
…orch#112337)

This PR fixes
```
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```
that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch.

Pull Request resolved: pytorch#112337
Approved by: https://github.com/cpuhrsch
ghstack dependencies: pytorch#112154, pytorch#112076
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…ytorch#112076)

This PR addresses the discussion items in pytorch#111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.

Pull Request resolved: pytorch#112076
Approved by: https://github.com/cpuhrsch
ghstack dependencies: pytorch#112154
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…orch#112337)

This PR fixes
```
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```
that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch.

Pull Request resolved: pytorch#112337
Approved by: https://github.com/cpuhrsch
ghstack dependencies: pytorch#112154, pytorch#112076
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants