puzzlef/prefix-scan-cuda

Design of an efficient algorithm for parallel prefix-scan of a large array of values on GPUs.

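In the run below, CUB and Thrust perform similarly at the larger sizes: both scan 10^9 values in about 20 ms, whereas the OpenMP scan with 64 threads takes roughly 730 to 820 ms.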


```text
OMP_NUM_THREADS=64
{1.000e+06 values} -> {0000000.8ms, 1500000 last_value} inclusiveScanOmp
{1.000e+06 values} -> {0000000.4ms, 1500000 last_value} inclusiveScanCudaCub
{1.000e+06 values} -> {0000000.3ms, 1500000 last_value} inclusiveScanCudaThrust
{1.000e+06 values} -> {0000000.8ms, 1499997 last_value} exclusiveScanOmp
{1.000e+06 values} -> {0000000.1ms, 1499997 last_value} exclusiveScanCudaCub
{1.000e+06 values} -> {0000000.3ms, 1499997 last_value} exclusiveScanCudaThrust
{1.000e+07 values} -> {0000004.9ms, 15000000 last_value} inclusiveScanOmp
{1.000e+07 values} -> {0000000.3ms, 15000000 last_value} inclusiveScanCudaCub
{1.000e+07 values} -> {0000000.5ms, 15000000 last_value} inclusiveScanCudaThrust
{1.000e+07 values} -> {0000004.9ms, 14999997 last_value} exclusiveScanOmp
{1.000e+07 values} -> {0000000.3ms, 14999997 last_value} exclusiveScanCudaCub
{1.000e+07 values} -> {0000000.5ms, 14999997 last_value} exclusiveScanCudaThrust
{1.000e+08 values} -> {0000074.7ms, 150000000 last_value} inclusiveScanOmp
{1.000e+08 values} -> {0000002.1ms, 150000000 last_value} inclusiveScanCudaCub
{1.000e+08 values} -> {0000002.4ms, 150000000 last_value} inclusiveScanCudaThrust
{1.000e+08 values} -> {0000076.0ms, 149999997 last_value} exclusiveScanOmp
{1.000e+08 values} -> {0000002.1ms, 149999997 last_value} exclusiveScanCudaCub
{1.000e+08 values} -> {0000002.3ms, 149999997 last_value} exclusiveScanCudaThrust
{1.000e+09 values} -> {0000815.9ms, 1500000000 last_value} inclusiveScanOmp
{1.000e+09 values} -> {0000020.3ms, 1500000000 last_value} inclusiveScanCudaCub
{1.000e+09 values} -> {0000020.7ms, 1500000000 last_value} inclusiveScanCudaThrust
{1.000e+09 values} -> {0000733.8ms, 1499999997 last_value} exclusiveScanOmp
{1.000e+09 values} -> {0000020.1ms, 1499999997 last_value} exclusiveScanCudaCub
{1.000e+09 values} -> {0000020.4ms, 1499999997 last_value} exclusiveScanCudaThrust
```
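
To make the `last_value` column easier to read, here is a minimal sequential sketch of what an inclusive and an exclusive scan compute. It is not the repository's implementation: the names `inclusiveScanSeq`, `exclusiveScanSeq`, and `makeInput` are hypothetical, and the input pattern `x[i] = i % 4` is an assumption that happens to be consistent with the reported values; the source does not state how the input is generated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive scan: out[i] = x[0] + ... + x[i].
std::vector<std::int64_t> inclusiveScanSeq(const std::vector<std::int64_t>& x) {
  std::vector<std::int64_t> out(x.size());
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += x[i];   // add the current element first ...
    out[i] = sum;  // ... so it is included in out[i]
  }
  return out;
}

// Exclusive scan: out[i] = x[0] + ... + x[i-1], with out[0] = 0.
std::vector<std::int64_t> exclusiveScanSeq(const std::vector<std::int64_t>& x) {
  std::vector<std::int64_t> out(x.size());
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = sum;  // store the sum of all earlier elements ...
    sum += x[i];   // ... then add the current element
  }
  return out;
}

// ASSUMPTION (not from the source): input values cycle through 0, 1, 2, 3.
std::vector<std::int64_t> makeInput(std::size_t n) {
  std::vector<std::int64_t> x(n);
  for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<std::int64_t>(i % 4);
  return x;
}
```

Under that assumed input, with 1000000 values the last inclusive result is 1500000 and the last exclusive result is 1499997. The two differ by the final input element (3), which is the same gap that separates the inclusive and exclusive `last_value` entries in the output above.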
