Data access pattern in the loop in add_out_dense_sparse_csr_cuda could be pretty bad #59104
Labels
module: sparse
Related to torch.sparse
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
As referenced here:
pytorch/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu
Line 114 in a7b796c
The data access pattern in this loop could be pretty bad. Moreover, the work per thread could be wildly imbalanced if the number of nonzeros varies a lot between rows.
For reference, compare with:
pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh
Line 66 in 9e60c7d
In that code, neighboring threads access neighboring values, rather than a single thread looping over all elements of a row. The loop here could be improved in the same way.
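To make the difference concrete, here is a rough CPU-side C++ model of the two thread-to-element mappings. It is only a sketch, not the actual kernel: `CsrMatrix`, `row_per_thread`, `warp_per_row`, and `simulate` are hypothetical names made up for illustration, threads are simulated serially by a loop over `tid`, and it assumes column indices are unique within a row (so lanes sharing a row would never write the same output element).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical CSR container, for illustration only (not an ATen type).
struct CsrMatrix {
  int64_t n_rows = 0;
  int64_t n_cols = 0;
  std::vector<int64_t> crow_indices;  // size n_rows + 1
  std::vector<int64_t> col_indices;   // size nnz
  std::vector<float> values;          // size nnz
};

constexpr int64_t kWarpSize = 32;  // assumed warp width

// Mapping A: thread `tid` owns row `tid` and walks all of its nonzeros.
// At a given loop step, neighboring threads read entries of values[] that
// are a whole row apart, and each thread's trip count is its row's nnz.
// Returns the number of loop iterations this simulated thread executed.
int64_t row_per_thread(const CsrMatrix& a, float alpha, int64_t tid,
                       std::vector<float>& out) {
  if (tid >= a.n_rows) return 0;
  int64_t steps = 0;
  for (int64_t i = a.crow_indices[tid]; i < a.crow_indices[tid + 1]; ++i) {
    out[tid * a.n_cols + a.col_indices[i]] += alpha * a.values[i];
    ++steps;
  }
  return steps;
}

// Mapping B: kWarpSize consecutive threads share one row and stride over it.
// At a given loop step, lanes 0..31 read 32 consecutive entries of values[],
// and a long row is split across the lanes instead of serialized in one thread.
int64_t warp_per_row(const CsrMatrix& a, float alpha, int64_t tid,
                     std::vector<float>& out) {
  const int64_t row = tid / kWarpSize;
  const int64_t lane = tid % kWarpSize;
  if (row >= a.n_rows) return 0;
  int64_t steps = 0;
  for (int64_t i = a.crow_indices[row] + lane; i < a.crow_indices[row + 1];
       i += kWarpSize) {
    out[row * a.n_cols + a.col_indices[i]] += alpha * a.values[i];
    ++steps;
  }
  return steps;
}

using Mapping = int64_t (*)(const CsrMatrix&, float, int64_t,
                            std::vector<float>&);

// Runs every simulated thread serially, accumulating out += alpha * a, and
// returns the longest per-thread trip count as a crude imbalance measure.
int64_t simulate(Mapping mapping, int64_t n_threads, const CsrMatrix& a,
                 float alpha, std::vector<float>& out) {
  int64_t longest = 0;
  for (int64_t tid = 0; tid < n_threads; ++tid) {
    longest = std::max(longest, mapping(a, alpha, tid, out));
  }
  return longest;
}
```

Both mappings produce the same dense result; what changes is which thread touches which element. With `row_per_thread` you would launch `n_rows` threads, with `warp_per_row` you would launch `n_rows * kWarpSize`. The second mapping shortens the longest per-thread loop for rows with many nonzeros, though rows of very different lengths still leave some warps with more work than others.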
Originally posted by @ngimel in #59011 (comment)
cc @aocsa @nikitaved @pearu @cpuhrsch @IvanYashchuk