Matlab mex wrappers to NVIDIA cuSPARSE (https://developer.nvidia.com/cusparse).
Uses int32 and single precision to save memory (Matlab sparse uses int64 and double).
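As a rough illustration of the memory saving, the sketch below estimates storage for the benchmark matrix whose dimensions are quoted in the timings. It assumes compressed storage with one index and one value per nonzero, plus one pointer per column (Matlab sparse) or per row (the int32/single layout). The variable names are illustrative and the actual gpuSparse layout may differ in detail.

```matlab
% Rough storage estimate for compressed sparse formats (illustrative only).
m  = 221401;      % rows of the benchmark matrix
n  = 213331;      % columns of the benchmark matrix
nz = 23609791;    % number of nonzeros

% Matlab sparse on 64-bit systems: compressed columns with double values,
% 8-byte row indices and n+1 8-byte column pointers.
bytes_double = nz*(8 + 8) + (n + 1)*8;

% Assumed int32/single layout: compressed rows with single values,
% 4-byte column indices and m+1 4-byte row pointers.
bytes_single = nz*(4 + 4) + (m + 1)*4;

ratio = bytes_single / bytes_double;
fprintf('int64/double: %.1f MB\n', bytes_double/2^20);
fprintf('int32/single: %.1f MB\n', bytes_single/2^20);
fprintf('ratio       : %.3f\n', ratio);
```

Under these assumptions the int32/single copy needs about half the memory of the Matlab sparse copy.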
- Save in a folder called @gpuSparse on the Matlab path
- Run A = gpuSparse('recompile') to trigger compilation of the mex files
- Recommended: CUDA 11 for much faster transpose-multiply
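The steps above can be sketched as a short setup script. The variable name found is illustrative. gpuDevice requires the Parallel Computing Toolbox, and the ToolkitVersion it reports is the CUDA toolkit used by the Matlab release, which is assumed here (not guaranteed) to match the one used when compiling the mex files.

```matlab
% Setup sketch: check that the class folder is visible, then compile.
found = ~isempty(which('gpuSparse'));   % true if @gpuSparse is on the path

if found
    d = gpuDevice();                    % select and query the current GPU
    fprintf('GPU: %s, CUDA toolkit %g\n', d.Name, d.ToolkitVersion);
    A = gpuSparse('recompile');         % trigger compilation of the mex files
else
    disp('gpuSparse not found: add the parent folder of @gpuSparse to the path.');
end
```

Note that the parent folder of @gpuSparse goes on the path, not the @gpuSparse folder itself, since Matlab discovers class folders through their parent.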
Due to memory layout (row/col-major), multiply and transpose-multiply differ in performance.

    size(A) = 221,401 x 213,331
    nnz(A)  = 23,609,791 (0.05%)
    AT      = precomputed transpose of A

    CPU sparse
    A*x  (sparse)   : Elapsed time is 1.370207 seconds.
    AT*y (sparse)   : Elapsed time is 1.347447 seconds.
    A'*y (sparse)   : Elapsed time is 0.267259 seconds.

    GPU sparse
    A*x  (gpuArray) : Elapsed time is 0.137195 seconds.
    AT*y (gpuArray) : Elapsed time is 0.106331 seconds.
    A'*y (gpuArray) : Elapsed time is 0.232057 seconds. (CUDA 11)
    A'*y (gpuArray) : Elapsed time is 16.733638 seconds.

    GPU gpuSparse
    A*x  (gpuSparse): Elapsed time is 0.068451 seconds.
    AT*y (gpuSparse): Elapsed time is 0.063651 seconds.
    A'*y (gpuSparse): Elapsed time is 0.059236 seconds. (CUDA 11)
    A'*y (gpuSparse): Elapsed time is 3.094271 seconds.
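A minimal sketch of how the three products can be timed, using only Matlab's built-in CPU sparse type. This is not the script that produced the numbers above: the matrix is random and much smaller, and all variable names are illustrative. For GPU arrays the timings would also need a synchronization step such as wait(gpuDevice) before each toc, since GPU calls can return before the work finishes.

```matlab
% CPU-only timing sketch with Matlab's built-in sparse type.
m = 20000;                    % rows (illustrative size)
n = 15000;                    % columns (illustrative size)
A  = sprand(m, n, 0.0005);    % random sparse matrix, about 0.05% dense
AT = A';                      % precomputed transpose of A
x  = randn(n, 1);
y  = randn(m, 1);

tic; Ax  = A*x;   fprintf('A*x  (sparse) : %f seconds\n', toc);
tic; ATy = AT*y;  fprintf('AT*y (sparse) : %f seconds\n', toc);
tic; Aty = A'*y;  fprintf('A''*y (sparse) : %f seconds\n', toc);

% Both transpose-multiply routes should agree up to rounding error.
relerr = norm(ATy - Aty) / norm(ATy);
fprintf('relative difference: %g\n', relerr);
```

Comparing AT*y with A'*y separates the cost of the multiply itself from the cost of handling the transpose, which is the distinction the table draws.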