Bug in element-wise multiplication of torch.sparse_csr_tensors on GPU - 0's in result considered significant - PyTorch 2.1.1 #114529
Comments
Just to clarify: when you are comparing 2.0 vs 2.1, are you using the same CUDA versions or different ones?
I checked the versions through this command:
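(The actual command was not preserved in this copy of the issue; for illustration only, a typical way to check both versions from Python.)

```python
import torch

print(torch.__version__)   # e.g. 2.1.1
print(torch.version.cuda)  # CUDA version PyTorch was built against
```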
@amjames Can you take a look?
Took a while to trace this through, but confirmed there is a deviation in behaviour here between the CUDA and CPU paths. Some notes:
We don't explicitly promise that we won't produce outputs with explicit zeros. We do strive to have coalesced inputs produce coalesced outputs wherever possible, but technically our definition of coalesced does not include "no explicit zeros will be stored". The comment does seem to hold up, though: after a quick modification forcing all sparse-sparse multiply on the CPU through the generic algorithm used for CUDA, the generic path is noticeably slower than the one taken currently.
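(A small illustration of that point, added for clarity rather than taken from the thread: coalesce() merges duplicate indices but does not drop zero values, so a coalesced tensor can still store an explicit zero.)

```python
import torch

# An explicit zero stored at (0, 0) survives coalescing.
i = torch.tensor([[0, 1], [0, 1]])
v = torch.tensor([0.0, 2.0])
coo = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()
print(coo.is_coalesced())  # True
print(coo.values())        # tensor([0., 2.]): the zero is still stored
```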
The unexpected result is not really wrong; the expectation is slightly flawed, since a sparse tensor may have explicit zeros. But the divergence of behavior between the CPU and CUDA paths here is something I would consider a minor bug. Working on a fix now.
I think it would make sense to keep the operation algorithm separate from the algorithm that eliminates explicit zeros. This is relevant for efficiency (one may want to eliminate explicit zeros only as a last step, after applying a sequence of operations, because elimination is expensive) as well as for having generic algorithms that support both masked and non-masked semantics of sparse tensors (under masked semantics, keeping explicit zeros may be preferred when mask invariance of operations is required).
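(A minimal sketch of such a separate elimination step, not from the thread: the hypothetical helper below drops explicit zeros via a dense round trip. The round trip is deliberately naive, relying on the fact that to_sparse_csr() only stores nonzero entries, and it illustrates why one would run elimination once at the end rather than after every operation; a real implementation would filter values/col_indices and rebuild crow_indices instead.)

```python
import torch

def eliminate_explicit_zeros(csr: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: rebuild a CSR tensor without explicit zeros.

    The dense round trip is simple but expensive in time and memory,
    matching the point that zero elimination is best kept as a
    separate, deferred step.
    """
    return csr.to_dense().to_sparse_csr()
```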
@pearu In this case the algorithm that does not produce explicit zeros in the output is already separate from the generic intersection kernel, and it is faster, at least in the case where it is used (both inputs coalesced).
Compressed formats need their own intersection primitive to avoid such issues. It should be fairly straightforward to implement, since it shares some logic with the COO case. With COO, however, materializing zeros is the only way to avoid device synchronizations when at least one argument is coalesced.
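(To make the idea of a dedicated intersection primitive concrete, a hedged sketch, assuming a per-row two-pointer merge over the sorted column indices is what is meant; this is not the actual PyTorch kernel.)

```python
import torch

def csr_mul_intersection(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical CSR * CSR elementwise multiply that only emits
    entries stored in BOTH inputs: a two-pointer merge per row over
    the sorted column indices."""
    assert a.shape == b.shape
    a_crow, a_col, a_val = a.crow_indices(), a.col_indices(), a.values()
    b_crow, b_col, b_val = b.crow_indices(), b.col_indices(), b.values()
    out_crow, out_col, out_val = [0], [], []
    for r in range(a.shape[0]):
        i, i_end = int(a_crow[r]), int(a_crow[r + 1])
        j, j_end = int(b_crow[r]), int(b_crow[r + 1])
        while i < i_end and j < j_end:
            if a_col[i] == b_col[j]:  # both sides store this column
                out_col.append(int(a_col[i]))
                out_val.append(a_val[i].item() * b_val[j].item())
                i += 1
                j += 1
            elif a_col[i] < b_col[j]:
                i += 1
            else:
                j += 1
        out_crow.append(len(out_col))
    return torch.sparse_csr_tensor(
        torch.tensor(out_crow, dtype=torch.int64),
        torch.tensor(out_col, dtype=torch.int64),
        torch.tensor(out_val, dtype=a.dtype),
        size=tuple(a.shape))
```

On a GPU the output sizes here are data-dependent, which is presumably why avoiding a device synchronization pushes an implementation toward materializing zeros instead.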
🐛 Describe the bug
Problem:
In PyTorch 2.1.1, the result of element-wise multiplication between two torch.sparse_csr_tensors treats 0s as significant values and retains them in the sparse representation. This is not the expected behavior. Moreover, the bug occurs only when the operation is run on a GPU.

Code:
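(The original snippet was not preserved in this copy of the issue; the following is a hypothetical reconstruction of the kind of reproduction described, with made-up shapes and values.)

```python
import torch

def make_csr(crow, col, val, device):
    return torch.sparse_csr_tensor(
        torch.tensor(crow), torch.tensor(col), torch.tensor(val),
        size=(2, 2), device=device)

for device in ("cpu", "cuda"):
    # Sparsity patterns overlap only at (0, 0); (1, 1) is stored in `a` alone.
    a = make_csr([0, 1, 2], [0, 1], [1.0, 2.0], device)
    b = make_csr([0, 1, 1], [0], [3.0], device)
    print(device, (a * b).values())
    # Per the report: CPU keeps only the true nonzero, while the
    # 2.1.1 CUDA path also stores an explicit 0.
```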
Output (Torch 2.0.0, GPU): expected output.
Output (Torch 2.1.1, CPU): expected output.
Output (Torch 2.1.1, GPU): the resulting 0 from the multiplication is treated as significant and stored.
Versions
cc @alexsamardzic @nikitaved @pearu @cpuhrsch @amjames @bhosmer