Use weakref in storing tensors as keys (follow-up to #111470) #112076
…)" This PR addresses the discussion items in #111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. [ghstack-poisoned]
torch/sparse/_triton_ops.py
Outdated
a = non_zero_row_indices * (Ms * N)
r_offsets = (a + b).view(-1)
c_indices = crow_indices
# crow_indices constitutes a part of a key in lru_cache as well
I'm confused about what this means tbh.
The definition of the `_bsr_scatter_mm_indices_data` function is:
```python
@lru_cache(maxsize=128)
def _bsr_scatter_mm_indices_data(indices_format, M, K, N, Ms, Ks, nbatches, SPLIT_N, crow_indices_as_key, col_indices_as_key):
    ...
```
which means that the key to the LRU cache consists of all arguments to the function:
```python
lru_cache_key = (indices_format, M, K, N, Ms, Ks, nbatches, SPLIT_N, crow_indices_as_key, col_indices_as_key)
```
while the value of the LRU cache is the result of the function. Notice that the argument `crow_indices_as_key` is `TensorAsKey(crow_indices)`.
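To make the key mechanics concrete, here is a minimal sketch of how `functools.lru_cache` treats a hashable wrapper passed as an argument. `KeyWrapper` and `indices_data` are hypothetical stand-ins (not the real `TensorAsKey` or `_bsr_scatter_mm_indices_data`), and the tuple keys are made-up values chosen only for illustration.

```python
from functools import lru_cache


class KeyWrapper:
    """Hypothetical stand-in for TensorAsKey: hashable, compared by a precomputed key."""

    def __init__(self, key):
        self.key = key
        self._hash = hash(key)

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return isinstance(other, KeyWrapper) and self.key == other.key


@lru_cache(maxsize=128)
def indices_data(indices_format, M, K, N, crow_indices_as_key):
    # Placeholder for the expensive computation; not the real function body.
    return (indices_format, M * K * N)


# All positional arguments together form the cache key.
indices_data("scatter_mm", 4, 4, 4, KeyWrapper(("ptr", 0)))  # first call: miss
indices_data("scatter_mm", 4, 4, 4, KeyWrapper(("ptr", 0)))  # equal wrapper: hit
indices_data("scatter_mm", 4, 4, 4, KeyWrapper(("ptr", 8)))  # different wrapper: miss
info = indices_data.cache_info()
```

Two distinct wrapper instances that hash and compare equal map to the same cache entry, which is why the wrapper's `__hash__` and `__eq__` decide what counts as "the same tensor" for caching purposes.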
On the other hand, the result of the function is a tuple `(indices_format, c_indices, r_offsets, q_offsets)` where originally `c_indices` was `crow_indices`.
So, this means that the tensor `crow_indices` is stored both in the key and in the value of the LRU cache. Now, when the `crow_indices` variable goes out of scope, the corresponding tensor ought to be garbage collected (assuming that `crow_indices` appears as a weakref instance in the key). However, the tensor cannot be garbage collected because it is referenced by the value of the LRU cache. Therefore, the `crow_indices` tensor must be cloned so that garbage collection is effective when the `crow_indices` variable goes out of scope; later, when the LRU cache overfills, the cloned tensor is cleaned up as well.
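The effect described above can be reproduced without torch. The sketch below is an illustration under assumptions: `FakeTensor` and `WeakKey` are hypothetical stand-ins, and the two cached functions exist only to contrast a cached value that references the keyed object with one that holds a clone.

```python
import gc
import weakref
from functools import lru_cache


class FakeTensor:
    """Hypothetical stand-in for a tensor; plain instances support weak references."""

    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        return FakeTensor(self.data)


class WeakKey:
    """Hypothetical key wrapper that holds only a weak reference to its object."""

    def __init__(self, obj):
        self._ref = weakref.ref(obj)
        self._hash = id(obj)

    @property
    def obj(self):
        return self._ref()

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return (isinstance(other, WeakKey)
                and self.obj is not None
                and self.obj is other.obj)


@lru_cache(maxsize=128)
def returns_same_object(key):
    return key.obj  # the cached value strongly references the keyed object


@lru_cache(maxsize=128)
def returns_clone(key):
    return key.obj.clone()  # the cached value is independent of the keyed object


def is_collected_after_del(cached_fn):
    t = FakeTensor([0, 2, 4])
    probe = weakref.ref(t)
    cached_fn(WeakKey(t))
    del t  # the variable goes out of scope
    gc.collect()
    return probe() is None


kept_alive_by_value = not is_collected_after_del(returns_same_object)
freed_when_cloned = is_collected_after_del(returns_clone)
```

In the first case the cache value keeps the object alive even though the key is weak; in the second, only the clone stays in the cache until the entry is evicted.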
Hmm, after thinking about it, I think the `crow_indices` tensor in the value part of the LRU cache ought to be a weakref as well. Then we don't have to wait for an overfill for the tensor to be garbage collected. I'll fix it unless I'm missing something right now.
This comment is now removed. It turns out that it is not sufficient to use `crow_indices` and `col_indices` as keys because these have a shorter lifetime than the original sparse compressed tensor. Now the original tensor is wrapped with `TensorAsKey` to make sure that the cached item exists for the lifetime of the original tensor.
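A rough sketch of keying on the original object rather than on its short-lived members might look as follows. This is not the actual `TensorAsKey` code: `FakeCompressedTensor`, `ObjectAsKey`, and `index_metadata` are hypothetical names, the use of a data pointer in the key is an assumption (the PR text only mentions `storage_offset`), and the rule that a dead key matches only itself is one possible design choice.

```python
import gc
import weakref


class FakeCompressedTensor:
    """Hypothetical stand-in for a sparse compressed tensor."""

    def __init__(self, data_ptr, storage_offset, shape):
        self._meta = (data_ptr, storage_offset, shape)

    def index_metadata(self):
        # Cheap metadata describing where the index data lives (assumed fields).
        return self._meta


class ObjectAsKey:
    """Hashable wrapper tied to the lifetime of the wrapped object."""

    def __init__(self, obj):
        self._obj_ref = weakref.ref(obj)  # does not keep obj alive
        self.key = obj.index_metadata()   # no element-wise data comparison
        self._hash = hash(self.key)

    @property
    def obj(self):
        """Return the wrapped object if it is still alive, otherwise None."""
        return self._obj_ref()

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, ObjectAsKey):
            return False
        if self.obj is None or other.obj is None:
            # Assumed policy: a dead key only equals itself.
            return self is other
        return self.key == other.key


a = FakeCompressedTensor(data_ptr=0x1000, storage_offset=0, shape=(4, 4))
b = FakeCompressedTensor(data_ptr=0x1000, storage_offset=2, shape=(4, 4))

same_object_same_key = ObjectAsKey(a) == ObjectAsKey(a)
offset_distinguishes = ObjectAsKey(a) != ObjectAsKey(b)

stale = ObjectAsKey(a)
del a
gc.collect()
stale_is_dead = stale.obj is None
stale_matches_nothing_else = stale != ObjectAsKey(b)
```

Including the storage offset in the key separates two objects that share a data pointer but view different regions, and the weak reference lets the wrapper report when the original object is gone.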
…2337) This PR fixes ``` RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered ``` that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch. Pull Request resolved: #112337 Approved by: https://github.com/cpuhrsch ghstack dependencies: #112154, #112076
…ytorch#112076) This PR addresses the discussion items in pytorch#111470 (comment), that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. Pull Request resolved: pytorch#112076 Approved by: https://github.com/cpuhrsch ghstack dependencies: pytorch#112154
This PR addresses the discussion items in #111470 (comment), that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.

Stack from ghstack (oldest at bottom):