puzzlef/vector-multiplication-cuda

Comparing approaches for CUDA-based vector multiplication.

In each of the experiments given below, we multiply two floating-point vectors x and y, with the number of elements ranging from 10^6 to 10^9, using CUDA. Each element count is attempted with various approaches, and each approach is run 5 times to obtain a reliable time measurement. Multiplication here stands in for any memory-aligned, independent element-wise operation, that is, a map() operation.
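As a rough illustration of the map() being measured, here is a minimal CUDA sketch that assumes one thread per element and timing with CUDA events. The names multiplyKernel and timeMultiply, the default block size of 256, and averaging the 5 runs are assumptions made for illustration, not the repository's own code.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Element-wise product z[i] = x[i] * y[i], one thread per element.
// Each output depends only on inputs at the same index, so this is a
// plain map() and needs no synchronization between threads.
template <class T>
__global__ void multiplyKernel(T *z, const T *x, const T *y, size_t N) {
  size_t i = blockIdx.x * size_t(blockDim.x) + threadIdx.x;
  if (i < N) z[i] = x[i] * y[i];  // guard the partially filled last block
}

// Hypothetical helper: launches ceil(N / B) blocks of B threads `repeat`
// times and returns the mean kernel time in milliseconds (CUDA events).
template <class T>
float timeMultiply(T *z, const T *x, const T *y, size_t N,
                   unsigned B = 256, int repeat = 5) {
  assert(N > 0 && B > 0 && repeat > 0);
  unsigned G = unsigned((N + B - 1) / B);  // enough blocks to cover N
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  float total = 0;
  for (int r = 0; r < repeat; ++r) {
    cudaEventRecord(start);
    multiplyKernel<<<G, B>>>(z, x, y, N);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait for this run to finish
    float ms = 0;
    cudaEventElapsedTime(&ms, start, stop);
    total += ms;
  }
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return total / repeat;
}
```

Averaging is only one choice; taking the minimum of the runs is another common way to reduce noise.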


Adjusting Launch config

In this experiment (adjust-launch), we multiply two floating-point vectors x and y using CUDA. Each element count is attempted with various CUDA launch configs. Results indicate that a grid_limit of 16384 or 32768 and a block_size of 128 or 256 are suitable for both float and double. Using a grid_limit of MAX and a block_size of 256 could be a decent choice.
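One common way to honour a grid_limit is a grid-stride loop: the launch uses min(ceil(N / block_size), grid_limit) blocks, and each thread walks the vector in strides of the total thread count. The sketch below assumes this is how grid_limit is applied; gridSize, multiplyGridStride and multiplyLaunch are hypothetical names, and the defaults simply pick values from the ranges reported above.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Grid-stride loop: a grid of any size sweeps a vector of any length.
// Thread t handles indices t, t + S, t + 2S, ... with S = total threads.
template <class T>
__global__ void multiplyGridStride(T *z, const T *x, const T *y, size_t N) {
  size_t S = size_t(gridDim.x) * blockDim.x;
  for (size_t i = blockIdx.x * size_t(blockDim.x) + threadIdx.x; i < N; i += S)
    z[i] = x[i] * y[i];
}

// Hypothetical helper: blocks needed to cover N elements with B threads
// each, capped at gridLimit (the "grid_limit" knob).
inline unsigned gridSize(size_t N, unsigned B, unsigned gridLimit) {
  assert(B > 0 && gridLimit > 0);
  size_t G = (N + B - 1) / B;
  return unsigned(std::min<size_t>(G, gridLimit));
}

// Hypothetical launcher; defaults are taken from the reported ranges.
template <class T>
void multiplyLaunch(T *z, const T *x, const T *y, size_t N,
                    unsigned B = 256, unsigned gridLimit = 32768) {
  unsigned G = gridSize(N, B, gridLimit);
  if (G == 0) return;  // empty vector: a zero-block launch would be an error
  multiplyGridStride<<<G, B>>>(z, x, y, N);
}
```

When the cap is never reached (for instance with a very large grid_limit), each thread handles at most one element and the loop body runs once; a smaller cap trades extra loop iterations per thread for fewer blocks.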


Adjusting Thread duty

In this experiment (adjust-duty), we compare various per-thread duty numbers for CUDA-based vector multiplication. Each element count is attempted with various CUDA launch configs and per-thread duties. Results indicate no significant difference between this approach and the adjust-launch approach.
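A possible sketch of a per-thread duty, assuming duty means the number of elements each thread processes per pass. Each block covers a tile of block_size * duty elements, and thread t touches tile offsets t, t + block_size, and so on, so consecutive threads still read consecutive addresses. Whether adjust-duty uses this strided layout or contiguous per-thread chunks is an assumption; multiplyDuty, dutyGridSize and multiplyDutyLaunch are hypothetical names.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Each thread processes `duty` elements per pass. A block owns a tile of
// blockDim.x * duty elements; thread t touches offsets t, t + blockDim.x,
// t + 2 * blockDim.x, ... so adjacent threads access adjacent addresses.
// A grid-stride loop over tiles covers whatever the capped grid cannot.
template <class T>
__global__ void multiplyDuty(T *z, const T *x, const T *y, size_t N, unsigned duty) {
  size_t B = blockDim.x;
  size_t tile = B * duty;                    // elements per block per pass
  size_t stride = size_t(gridDim.x) * tile;  // elements per grid per pass
  for (size_t base = blockIdx.x * tile; base < N; base += stride) {
    for (unsigned k = 0; k < duty; ++k) {
      size_t i = base + k * B + threadIdx.x;
      if (i < N) z[i] = x[i] * y[i];
    }
  }
}

// Hypothetical helper: blocks needed when each block covers B * duty
// elements, capped at gridLimit.
inline unsigned dutyGridSize(size_t N, unsigned B, unsigned duty, unsigned gridLimit) {
  assert(B > 0 && duty > 0 && gridLimit > 0);
  size_t tile = size_t(B) * duty;
  size_t G = (N + tile - 1) / tile;
  return unsigned(std::min<size_t>(G, gridLimit));
}

template <class T>
void multiplyDutyLaunch(T *z, const T *x, const T *y, size_t N,
                        unsigned B, unsigned duty, unsigned gridLimit) {
  unsigned G = dutyGridSize(N, B, duty, gridLimit);
  if (G == 0) return;  // empty vector: nothing to launch
  multiplyDuty<<<G, B>>>(z, x, y, N, duty);
}
```

Giving each thread a contiguous chunk instead would be simpler, but neighbouring threads would then access addresses duty elements apart, which generally hurts memory coalescing.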


