# <span style="color:green"> Objective </span>

- To learn to write and analyze a high-performance scan kernel
    - Interleaved reduction trees
    - Thread index to data mapping
    - Barrier synchronization
    - Work efficiency analysis

<hr style="height:2px">

# <span style="color:green"> A Better Parallel Scan Algorithm </span>

1. Read input from device global memory to shared memory
2. Iterate log2(n) times (rounded up): the stride starts at 1 and doubles each iteration while it is less than n

![alt tag](img/3.png)
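
The loop in step 2 can be checked on the host with a sequential C++ simulation of one section. This is a minimal sketch, not part of the original material. The function name `kogge_stone_scan_sim` is hypothetical. Copying `XY` into `old` at the start of each iteration is an assumption: it stands in for the lockstep behavior that barriers provide, where every thread reads only values produced by the previous iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side simulation of steps 1-2 for a single section.
// Each outer iteration models all threads running in lockstep: every
// "thread" t reads from the snapshot `old`, so no thread sees a value
// written by another thread during the same iteration.
std::vector<float> kogge_stone_scan_sim(const std::vector<float>& X) {
    std::vector<float> XY = X;                        // step 1: "load" into shared memory
    const std::size_t n = XY.size();
    for (std::size_t stride = 1; stride < n; stride *= 2) {  // step 2: log2(n) rounded up
        std::vector<float> old = XY;                  // values before this iteration
        for (std::size_t t = stride; t < n; ++t)      // threads t >= stride are active
            XY[t] = old[t] + old[t - stride];
    }
    return XY;                                        // inclusive scan of X
}
```

With the input {3, 1, 7, 0, 4, 1, 6, 3}, the three iterations (stride 1, 2, 4) yield the inclusive scan {3, 4, 11, 11, 15, 16, 22, 25}.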



<hr style="height:2px">

# <span style="color:green"> A Better Parallel Scan Algorithm </span>

1. Read input from device global memory to shared memory
2. Iterate log2(n) times (rounded up): the stride starts at 1 and doubles each iteration while it is less than n
![alt tag](img/4.png)

3. Write output from shared memory to device global memory

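
Putting the three steps together, the sketch below is again a hypothetical host-side simulation (`sectioned_scan_sim` is an illustrative name). It splits the input into sections of `section_size` elements, one per simulated thread block, and uses the global index `i = block * section_size + t`, which mirrors `blockIdx.x * blockDim.x + threadIdx.x`. Padding unused slots with 0 is an assumption, chosen because 0 is the identity for addition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side simulation of steps 1-3, one section per "block".
std::vector<float> sectioned_scan_sim(const std::vector<float>& X, std::size_t section_size) {
    assert(section_size > 0);
    const std::size_t n = X.size();
    std::vector<float> Y(n);
    const std::size_t num_blocks = (n + section_size - 1) / section_size;
    for (std::size_t block = 0; block < num_blocks; ++block) {
        // Step 1: load the section into "shared memory", padding with 0
        std::vector<float> XY(section_size, 0.0f);
        for (std::size_t t = 0; t < section_size; ++t) {
            const std::size_t i = block * section_size + t;  // thread index to data mapping
            if (i < n) XY[t] = X[i];
        }
        // Step 2: lockstep iterations, stride doubling while below section_size
        for (std::size_t stride = 1; stride < section_size; stride *= 2) {
            std::vector<float> old = XY;
            for (std::size_t t = stride; t < section_size; ++t)
                XY[t] = old[t] + old[t - stride];
        }
        // Step 3: write the in-range part of the section to "global memory"
        for (std::size_t t = 0; t < section_size; ++t) {
            const std::size_t i = block * section_size + t;
            if (i < n) Y[i] = XY[t];
        }
    }
    return Y;
}
```

Each section is scanned independently. With `section_size = 4`, the input {1, 1, 1, 1, 1, 1} produces {1, 2, 3, 4, 1, 2}, so the second section's values do not include the sum of the first section.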

<hr style="height:2px">

# <span style="color:green"> Handling Dependencies </span>

- During every iteration, each thread can overwrite the input of another thread
    - Barrier synchronization ensures that all inputs have been properly generated
    - Each thread secures (copies into a register) the input operand that another thread may overwrite
    - Barrier synchronization is required to ensure that all threads have secured their inputs
    - All threads perform addition and write output

![alt tag](img/6.png)
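
The need for the second barrier can be illustrated with a host-side C++ sketch; the function `scan_schedule_sim` is hypothetical. With `secure_operands` set, every simulated thread copies its operand before any thread writes, matching the read, barrier, write sequence above. Without it, threads update `XY` in place in index order. This models one possible schedule in which a thread reads a value that another thread already overwrote during the same iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation comparing two schedules of the scan iterations.
// secure_operands == true : read phase (all operands secured), then write phase.
// secure_operands == false: in-place updates in index order, so thread t may
//                           read XY[t - stride] after thread t - stride changed it.
std::vector<int> scan_schedule_sim(std::vector<int> XY, bool secure_operands) {
    const std::size_t n = XY.size();
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        if (secure_operands) {
            std::vector<int> in1(n, 0);
            for (std::size_t t = stride; t < n; ++t) in1[t] = XY[t - stride];  // read phase
            // (barrier here: every operand has been secured)
            for (std::size_t t = stride; t < n; ++t) XY[t] += in1[t];         // write phase
        } else {
            for (std::size_t t = stride; t < n; ++t) XY[t] += XY[t - stride]; // hazard
        }
    }
    return XY;
}
```

For the input {1, 1, 1, 1}, the secured version returns the correct scan {1, 2, 3, 4}, while the in-place schedule returns {1, 2, 4, 6}.
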
<hr style="height:2px">

# <span style="color:green"> A Work-Inefficient Scan Kernel </span>

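```cpp
#ifndef SECTION_SIZE
#define SECTION_SIZE 1024  // assumed value; the kernel requires blockDim.x == SECTION_SIZE
#endif

__global__ void work_inefficient_scan_kernel(float *X, float *Y, int InputSize) {
    __shared__ float XY[SECTION_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Step 1: load one element per thread; pad out-of-range slots with 0
    XY[threadIdx.x] = (i < InputSize) ? X[i] : 0.0f;
    // Step 2: iterative scan on XY. Every thread executes every iteration,
    // so all threads in the block reach each __syncthreads() barrier.
    for (unsigned int stride = 1; stride < blockDim.x; stride *= 2) {
        __syncthreads();                        // previous iteration's writes are visible
        float in1 = 0.0f;
        if (threadIdx.x >= stride)
            in1 = XY[threadIdx.x - stride];     // secure the operand in a register
        __syncthreads();                        // all operands secured before any write
        if (threadIdx.x >= stride)
            XY[threadIdx.x] += in1;
    }
    __syncthreads();
    // Step 3: write the section's result to global memory
    if (i < InputSize) { Y[i] = XY[threadIdx.x]; }
}
```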
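
The kernel is called work-inefficient because of how many additions it performs. In the iteration with a given stride, each thread with `threadIdx.x >= stride` does one addition. For a section of n elements (n a power of two) the total is therefore n·log2(n) − (n − 1), compared with n − 1 for a sequential scan. The counting functions below are a hypothetical sketch of that analysis, not part of the original kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: additions done by the iterative scan on n elements.
// In the iteration with a given stride, threads stride..n-1 each add once.
std::size_t iterative_scan_adds(std::size_t n) {
    std::size_t adds = 0;
    for (std::size_t stride = 1; stride < n; stride *= 2)
        adds += n - stride;
    return adds;
}

// Hypothetical helper: a sequential inclusive scan adds once per element after the first.
std::size_t sequential_scan_adds(std::size_t n) {
    return n == 0 ? 0 : n - 1;
}
```

For n = 1024 this gives 9,217 additions versus 1,023 for the sequential loop, roughly nine times as much work.
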
<hr style="height:2px">

<footer>
<cite> GPU NVIDIA Teaching Kit - University of Illinois </cite>
</footer>