## Correlation Coefficient Example
The following example calculates the correlation coefficient across a stack of five image layers.<br>
The sample is an array of length 5. It is correlated against the stack pixel by pixel, where:<br>
sample[5] is correlated against the five layer values of each pixel (the z-axis of the stack), such as arrayData[:,0,0].

We start the code by loading the dependencies and declaring the variables:

In [1]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
from pycuda.gpuarray import to_gpu

layers = 5
column = 10
depth = 10
layersResults = 6 #result layer and correlation formula values (5 of them)

## Initialize our arrays
At the same time, we load dummy data where:<br>
arrayData holds a single constant pixel value per layer<br>
arrayControl holds the signature we are looking for<br>
arrayResult holds the correlation coefficient in layer [0], and layers [1] to [5] hold the sums used in the equation.<br>
Details on the equation follow.

In [2]:
arrayData = numpy.full((layers, column, depth), 0)
layers = numpy.int32(layers) #define the layer count
#populate each layer with fixed values
arrayData[0,:,:] = 25
arrayData[1,:,:] = 25
arrayData[2,:,:] = 27
arrayData[3,:,:] = 31
arrayData[4,:,:] = 32

arrayData = arrayData.astype(numpy.int32)
arrayData_gpu = cuda.mem_alloc(arrayData.nbytes)
cuda.memcpy_htod(arrayData_gpu, arrayData)

arrayData_Answer = numpy.empty_like(arrayData)

arrayResult = numpy.zeros([layersResults, column, depth], dtype = numpy.float32)
arrayResult_gpu = cuda.mem_alloc(arrayResult.nbytes)
cuda.memcpy_htod(arrayResult_gpu, arrayResult)
arrayResult_Answer = numpy.empty_like(arrayResult)

arrayControl = numpy.zeros(layers, dtype = numpy.int32)
arrayControl[0] = 15
arrayControl[1] = 18
arrayControl[2] = 21
arrayControl[3] = 24
arrayControl[4] = 27

arrayControl_gpu = cuda.mem_alloc(arrayControl.nbytes)
cuda.memcpy_htod(arrayControl_gpu, arrayControl)

#used to generate the lookup table for correlation coefficient (5 depth only)
arrayTemp = numpy.full((5, column, depth), 0)
arrayTemp = arrayData.astype(numpy.int32)
arrayTemp_gpu = cuda.mem_alloc(arrayTemp.nbytes)
cuda.memcpy_htod(arrayTemp_gpu, arrayTemp)

The correlation coefficient equation is:<br><br>

\begin{equation*}
r = \frac{n \left(\sum XY \right) - \left( \sum X \right)\left( \sum Y \right)}
{\sqrt{\left[ n\sum X^2 - \left( \sum X\right)^2 \right]\left[ n\sum Y^2 - \left(  \sum Y\right)^2\right]}}
\end{equation*}

We use the following two arrays (these were set above during array initialization).<br>
X[ ] = {15, 18, 21, 24, 27} (Control 1D array)<br>
Y[ ] = {25, 25, 27, 31, 32} (Data where layer 1 = 25, layer 2 = 25, etc...)<br><br>
The following table calculates the values required for the formula (last row).<br>

|ID|  X   |  Y   |  X*Y |  X*X |  Y*Y |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|Value-1|    15|    25|   375|   225|   625|
|Value-2|    18|    25|   450|   324|   625|
|Value-3|    21|    27|   567|   441|   729|
|Value-4|    24|    31|   744|   576|   961|
|Value-5|    27|    32|   864|   729|  1024|
|sum-->|105|140|3000|2295|3964|

We now substitute each value from the 'sum' row into our formula and get:<br>

\begin{equation*}
r = \frac{5 \left(3000 \right) - \left( 105 \right)\left( 140 \right)}
{\sqrt{\left[ 5\left(2295\right) - \left( 105\right)\left( 105\right) \right]\left[ 5\left( 3964\right)  - \left( 140\right)\left( 140\right)\right]}}
\end{equation*}
Thus:

\begin{equation*}
r = 0.9535
\end{equation*}
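
The arithmetic above can be double-checked on the CPU with a few lines of plain Python. This is only a sketch for verification: `pearson_r` is a helper name introduced here (it is not part of the notebook), and it builds the same five sums as the table before applying the formula.

```python
import math

def pearson_r(x, y):
    """Pearson correlation built from the five sums used in the table."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

X = [15, 18, 21, 24, 27]   # control signature
Y = [25, 25, 27, 31, 32]   # one pixel's value in each of the five layers

r = pearson_r(X, Y)
print(round(r, 4))         # 0.9535
```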

## Here we create the kernel that will be loaded into the GPU
The kernel does the following:<br>
01. define our parallel processing dimensions
02. loop through each layer and find the entries for the correlation equation
03. calculate the correlation coefficient
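
The kernel addresses the 3D arrays through a single flat index, `idx + (x_width * idy) + (x_width * y_width) * layer`. The sketch below reproduces that arithmetic in plain Python to show that it matches the row-major (C order) layout of a NumPy array of shape (layers, column, depth). It assumes the launch configuration used in this example, block=(depth, column, 1) and grid=(1, 1), so that `x_width` equals `depth` and `y_width` equals `column`. `flat_index` is a hypothetical helper written only for this illustration.

```python
def flat_index(layer, idy, idx, x_width, y_width):
    """Same arithmetic as the kernel: pixel offset plus one full plane per layer."""
    return idx + (x_width * idy) + (x_width * y_width) * layer

layers, column, depth = 5, 10, 10
x_width, y_width = depth, column   # assumes block=(depth, column, 1), grid=(1, 1)

# Walk the array in row-major order (layer, then row, then column);
# the flat index should simply count 0, 1, 2, ...
expected = 0
for layer in range(layers):
    for idy in range(column):
        for idx in range(depth):
            assert flat_index(layer, idy, idx, x_width, y_width) == expected
            expected += 1

print(expected)   # total number of elements visited
```

If the block and grid sizes did not match the array shape exactly, `x_width` and `y_width` would no longer equal the array dimensions and the index would point at the wrong element, so this mapping only holds under the stated assumption.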

In [3]:
mod = SourceModule("""
    #include <math.h>
    __global__ void getLayer(int *arrayData, int * arrayControl, float *arrayResult, int layers)
    {
        //step 01
        int idx = threadIdx.x + (blockIdx.x * blockDim.x); // x coordinate (numpy axis 2) 
        int idy = threadIdx.y + (blockIdx.y * blockDim.y); // y coordinate (numpy axis 1) 
        int x_width = (blockDim.x * gridDim.x); 
        int y_width = (blockDim.y * gridDim.y);                            
        
        //step 02
        for(int i = 0; i < layers; i++)
        {
            //sum of X
            arrayResult[idx + (x_width * idy) + (x_width * y_width) * 1] +=
                arrayControl[i];            
            
            //sum of Y
            arrayResult[idx + (x_width * idy) + (x_width * y_width) * 2] +=
                arrayData[idx + (x_width * idy) + (x_width * y_width) * i];            
            
            //sum of X*Y
            arrayResult[idx + (x_width * idy) + (x_width * y_width) * 3] +=
                arrayData[idx + (x_width * idy) + (x_width * y_width) * i] *
                arrayControl[i];            
            
            //sum of X*X
            arrayResult[idx + (x_width * idy) + (x_width * y_width) * 4] +=
                arrayControl[i] *
                arrayControl[i];            
            
            //sum of Y*Y
            arrayResult[idx + (x_width * idy) + (x_width * y_width) * 5] +=
                arrayData[idx + (x_width * idy) + (x_width * y_width) * i] *
                arrayData[idx + (x_width * idy) + (x_width * y_width) * i];            
        }
        
        //step 03
        int pixel = idx + (x_width * idy);   // offset of this pixel within one layer
        int plane = x_width * y_width;       // number of pixels in one layer
        float n     = (float)layers;         // number of samples per pixel
        float sumX  = arrayResult[pixel + plane * 1];
        float sumY  = arrayResult[pixel + plane * 2];
        float sumXY = arrayResult[pixel + plane * 3];
        float sumXX = arrayResult[pixel + plane * 4];
        float sumYY = arrayResult[pixel + plane * 5];

        arrayResult[pixel + plane * 0] =
            (n * sumXY - sumX * sumY)
            /
            sqrtf((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
    }
    """)

Get the kernel function from the compiled module and execute it. The launch uses a single block of depth x column threads, one thread per pixel.<br>
Once the calculations are complete, copy the data from GPU memory to CPU memory.

In [4]:
func = mod.get_function("getLayer")
func(arrayData_gpu, arrayControl_gpu, arrayResult_gpu, layers, block=(depth, column, 1), grid=(1,1))

cuda.memcpy_dtoh(arrayData_Answer, arrayData_gpu)
cuda.memcpy_dtoh(arrayResult_Answer, arrayResult_gpu)
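
To sanity-check the GPU output, the same per-pixel correlation can be computed on the CPU with NumPy. The sketch below is an independent reference and not part of the original notebook: `correlation_per_pixel` is a name introduced here, and the dummy data is rebuilt locally so the block runs without a GPU.

```python
import numpy

def correlation_per_pixel(data, control):
    """CPU reference: correlate `control` with the layer axis (axis 0) of `data`."""
    data = data.astype(numpy.float64)
    control = control.astype(numpy.float64)
    n = data.shape[0]
    x = control[:, None, None]            # broadcast the signature over every pixel
    sum_x = control.sum()
    sum_y = data.sum(axis=0)
    sum_xy = (x * data).sum(axis=0)
    sum_xx = (control ** 2).sum()
    sum_yy = (data ** 2).sum(axis=0)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = numpy.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Rebuild the dummy data from this example on the host.
data = numpy.empty((5, 10, 10), dtype=numpy.int32)
for layer, value in enumerate([25, 25, 27, 31, 32]):
    data[layer, :, :] = value
control = numpy.array([15, 18, 21, 24, 27], dtype=numpy.int32)

reference = correlation_per_pixel(data, control)
print(reference.shape)
print(numpy.round(reference[0, 0], 4))
```

With the notebook's arrays in scope, a comparison such as `numpy.allclose(arrayResult_Answer[0], reference, atol=1e-4)` would be expected to hold; the tolerance is a guess that allows for the kernel working in 32-bit floats.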

### Print the results
Where:<br>
1. arrayResult_Answer[0,:,:] = correlation coefficient for each pixel
2. arrayResult_Answer[1,:,:] = \\(\sum X\\)
3. arrayResult_Answer[2,:,:] = \\(\sum Y\\)
4. arrayResult_Answer[3,:,:] = \\(\sum XY\\)
5. arrayResult_Answer[4,:,:] = \\(\sum XX\\)
6. arrayResult_Answer[5,:,:] = \\(\sum YY\\)


In [5]:
numpy.set_printoptions(formatter={'float': lambda x: "{0:0.4f}".format(x)})
print("----=== correlation coefficient ===----")
print(arrayResult_Answer[0,:,:])
numpy.set_printoptions(formatter={'float': lambda x: "{0:0.0f}".format(x)})
print("----=== sum X ===----")
print(arrayResult_Answer[1,:,:])
print("----=== sum Y ===----")
print(arrayResult_Answer[2,:,:])
print("----=== sum XY ===----")
print(arrayResult_Answer[3,:,:])
print("----=== sum XX ===----")
print(arrayResult_Answer[4,:,:])
print("----=== sum YY ===----")
print(arrayResult_Answer[5,:,:])

----=== correlation coefficient ===----
[[0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]
 [0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535 0.9535]]
----=== sum X ===----
[[105 105 105 105 105 105 105 105 105 105]
 [105 105 105 105 105 105 105 105 105 105]
 [105 105 105 105 105 105 105 105 105 105]
 [105 105 105 105 105 105 105 105 105 105]
 [105 105 105 105 105 105 105 105