Data access pattern in the loop in add_out_dense_sparse_csr_cuda could be pretty bad #59104

aocsa · 2021-05-27T20:46:30Z

As referenced here:

pytorch/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu

Line 114 in a7b796c

thrust::for_each(

The data access pattern in this loop could be pretty bad. Moreover, the work could be wildly imbalanced across threads if the sparsity pattern varies a lot between rows.

For reference, compare with:

pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh

Line 66 in 9e60c7d

op(dst + linearId2, src + linearId2);

There, neighboring threads access neighboring values, rather than a single thread looping over all elements in a row. So the loop in add_out_dense_sparse_csr_cuda can be improved.
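To make the two mappings concrete, here is a minimal CPU-side sketch in C++ that models them for `dense += alpha * csr`. It is not the PyTorch kernel: the `Csr` struct and both function names are hypothetical, the outer loop index stands in for a GPU thread index, and the per-value variant assumes the CSR tensor has no duplicate (row, col) entries, so no two "threads" write the same dense element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR container, used only for this illustration.
struct Csr {
  int64_t rows, cols;
  std::vector<int64_t> crow;  // row offsets, size rows + 1
  std::vector<int64_t> col;   // column index of each stored value
  std::vector<float> val;     // stored values
};

// Mapping A: one "thread" per row, looping over that row's stored values.
// Work per thread equals the row length, so a few long rows dominate, and
// neighboring threads read col/val at offsets that are a whole row apart.
void add_row_per_thread(std::vector<float>& dense, const Csr& a, float alpha) {
  assert(static_cast<int64_t>(a.crow.size()) == a.rows + 1);
  for (int64_t row = 0; row < a.rows; ++row) {  // row plays the thread index
    for (int64_t k = a.crow[row]; k < a.crow[row + 1]; ++k) {
      dense[row * a.cols + a.col[k]] += alpha * a.val[k];
    }
  }
}

// Mapping B: one "thread" per stored value. Neighboring threads read
// neighboring col/val entries, and every thread does the same amount of
// work apart from a binary search over the row offsets.
void add_nnz_per_thread(std::vector<float>& dense, const Csr& a, float alpha) {
  assert(static_cast<int64_t>(a.crow.size()) == a.rows + 1);
  const int64_t nnz = static_cast<int64_t>(a.val.size());
  for (int64_t k = 0; k < nnz; ++k) {  // k plays the thread index
    // The owning row is the last one whose start offset is <= k;
    // upper_bound skips over empty rows correctly.
    auto it = std::upper_bound(a.crow.begin(), a.crow.end(), k);
    const int64_t row = static_cast<int64_t>(it - a.crow.begin()) - 1;
    dense[row * a.cols + a.col[k]] += alpha * a.val[k];
  }
}
```

Both functions produce the same result; they differ only in how work is split. In a real kernel the row lookup in mapping B could also be precomputed or shared across a warp, which is a design choice this sketch does not explore.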

Originally posted by @ngimel in #59011 (comment)

cc @aocsa @nikitaved @pearu @cpuhrsch @IvanYashchuk

This was referenced May 27, 2021

CUDA support in the CSR layout: sparse_to_dense/add_sparse_csr #59011

Closed

Sparse tensor CSR layout for CUDA #56485

Open

ngimel added the "module: sparse" (related to torch.sparse) and "triaged" (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Jun 1, 2021

pearu added this to To do in Sparse tensors Aug 10, 2021

pearu moved this from To do to To Do: CSR in Sparse tensors Aug 10, 2021
