To successfully complete this assignment, you must do the required reading, watch the provided videos, and complete all instructions.  The embedded Google form must be entirely filled out and submitted on or before **11:59pm on Sunday February 24th**.  Students must come to class the next day prepared to discuss the material covered in this assignment.

# Pre-Class Assignment: Introduction to CUDA

### Goals for today's pre-class assignment 

1. CPU vs GPU
2. CUDA on the HPCC
3. Assignment wrap-up

----

# 1. CPU vs GPU

Okay, these videos are just kind of fun and try to give a broad overview of the differences between GPUs and CPUs.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("-P28LKWTzrI",width=640,height=360)

&#9989; <font color=red>**QUESTION:**</font>  Think about the analogy demonstrated in the Mythbusters video.  Serial code is the single robot arm and the GPU is the array.  Using the same basic analogy, describe a program that would represent a shared memory system such as OpenMP. 

Put your answer to the above question here

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("1kypaBjJ-pg",width=640,height=360)

&#9989; <font color=red>**QUESTION:**</font>  What does the acronym GPGPU stand for?  

Put your answer to the above question here

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("lQVV5JCd74I",width=640,height=360)

&#9989; <font color=red>**QUESTION:**</font> What is the name of the CUDA command that allocates memory on the GPU?

Put your answer to the above question here

&#9989; <font color=red>**QUESTION:**</font> What is the name of the CUDA command that copies memory from the CPU to the GPU?

Put your answer to the above question here

&#9989; <font color=red>**QUESTION:**</font> What is the difference between the "HOST" and the "DEVICE" in CUDA terminology?

Put your answer to the above question here

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("IzU4AVcMFys",width=640,height=360)

# 2. CUDA on the HPCC

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("gzkKtcRpA6A",width=640,height=360)

Commands from the video:

    ssh dev-intel16-k80
    module load powertools
    getexample cuda
    cd cuda
    module load CUDA/9.2.8
    nvcc -o simple_cuda simple.cu
    ./simple_cuda
    

&#9989; <font color=red>**DO THIS:**</font> Here is the code as it was modified in the video.  See if you can get it working.
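
The `%%writefile` cell below writes to `NCode/simple.cu`, and it fails if the `NCode` directory does not exist yet. The following is a minimal pre-flight sketch, not part of the video: it creates the directory and checks whether `nvcc` is visible to the notebook. The helper name `prepare_workspace` is hypothetical, and the hint about loading the CUDA module before starting Jupyter is an assumption about how your session was launched.

```python
import os
import shutil


def prepare_workspace(code_dir="NCode"):
    """Create the source directory if needed and look for nvcc on the PATH.

    Hypothetical helper for this notebook; returns (absolute directory, nvcc path or None).
    """
    os.makedirs(code_dir, exist_ok=True)   # no error if the directory already exists
    nvcc_path = shutil.which("nvcc")       # None when nvcc is not on the PATH
    return os.path.abspath(code_dir), nvcc_path


code_dir, nvcc_path = prepare_workspace()
print("Source directory:", code_dir)
if nvcc_path is None:
    # Assumption: the CUDA module may need to be loaded before Jupyter is started.
    print("nvcc was not found on the PATH; the compile step will likely fail here.")
else:
    print("nvcc found at:", nvcc_path)
```

If `nvcc` is not found, the commands from the video can still be run directly in a terminal on the GPU development node.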

In [None]:
%%writefile NCode/simple.cu

#include "cuda.h"
#include <iostream>
#define CUDA_CALL(x) {cudaError_t cuda_error__ = (x); if (cuda_error__) std::cout << "CUDA error: " << #x << " returned " << cudaGetErrorString(cuda_error__) << std::endl;}


__global__ void theKernel(float * our_array)
{
    //This is array flattening, (Array Width * Y Index + X Index)
    int index = (gridDim.x * blockDim.x) * \
              (blockIdx.y * blockDim.y + threadIdx.y) + \
              (blockIdx.x * blockDim.x + threadIdx.x);
    our_array[index] = (float) index;
}


void printGrid(float an_array[16][16])
{
    for (int i = 0; i < 16; i++)
    {
        for (int j = 0; j < 16; j++)
        {
            std::cout << an_array[i][j];
            std::cout << " ";
        }
        std::cout << std::endl;
    }
}


int main()
{
    float our_array[16][16];

    for (int i = 0; i < 16; i++)
    {
        for (int j = 0; j < 16; j++)
        {
            our_array[i][j] = i;
        }
    }

    //STEP 1: ALLOCATE
    float * our_array_d;
    int size = sizeof(float)*256;
    CUDA_CALL(cudaMalloc((void **) &our_array_d, size));

    //STEP 2: TRANSFER
    CUDA_CALL(cudaMemcpy(our_array_d, our_array, size, cudaMemcpyHostToDevice));

    //STEP 3: SET UP
    dim3 blockSize(8,8,1);
    dim3 gridSize(2,2,1);

    //STEP 4: RUN
    theKernel <<<gridSize, blockSize>>> (our_array_d);

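    //STEP 5: TRANSFER
    //Print the host array before the copy back: each row still holds its row index
    printGrid(our_array);
    CUDA_CALL(cudaMemcpy(our_array, our_array_d, size, cudaMemcpyDeviceToHost));
    std::cout << "--------------------" << std::endl;
    //Print it again after the copy: each element now holds its flattened index
    printGrid(our_array);

    //STEP 6: CLEAN UP
    CUDA_CALL(cudaFree(our_array_d));
    return 0;
}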

In [None]:
# Compile the CUDA source with nvcc
!nvcc -o simple_cuda NCode/simple.cu

In [None]:
#Run Example
!./simple_cuda
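
The index arithmetic in `theKernel` can be checked without a GPU. The sketch below is a plain Python imitation of that calculation, not CUDA itself: it loops over every block and thread in the 2x2 grid of 8x8 blocks used in `main` and confirms that each of the 256 array elements is assigned to exactly one thread. The function name `flattened_index` and the tuple layout `(x, y)` are illustrative choices, not CUDA API.

```python
def flattened_index(block_idx, thread_idx, block_dim, grid_dim):
    """Mirror the index arithmetic in theKernel for one thread (x and y only)."""
    width = grid_dim[0] * block_dim[0]                 # threads per row of the whole grid
    y = block_idx[1] * block_dim[1] + thread_idx[1]    # global row of this thread
    x = block_idx[0] * block_dim[0] + thread_idx[0]    # global column of this thread
    return width * y + x                               # Array Width * Y Index + X Index


block_dim = (8, 8)   # matches dim3 blockSize(8,8,1)
grid_dim = (2, 2)    # matches dim3 gridSize(2,2,1)

# Visit every (block, thread) pair, as the GPU would in some unspecified order.
indices = [
    flattened_index((bx, by), (tx, ty), block_dim, grid_dim)
    for by in range(grid_dim[1])
    for bx in range(grid_dim[0])
    for ty in range(block_dim[1])
    for tx in range(block_dim[0])
]

# Every element of the 16x16 array is written by exactly one thread.
assert sorted(indices) == list(range(256))

# Thread (1, 0) of block (1, 0) sits at global column 9, row 0.
print(flattened_index((1, 0), (1, 0), block_dim, grid_dim))  # 9
# Thread (0, 0) of block (0, 1) sits at global column 0, row 8.
print(flattened_index((0, 1), (0, 0), block_dim, grid_dim))  # 128
```

This is why the second grid printed by `simple_cuda` should count from 0 to 255 row by row: each thread stores its own flattened index.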

----
# 3. Assignment wrap-up

Please fill out the form that appears when you run the code below.  **You must completely fill this out in order to receive credit for the assignment!**

[Direct Link](https://docs.google.com/forms/d/e/1FAIpQLScj9YBw_FM5hLzZgmwKbt4JiWvaSQsFDDEHJromcXBTfcLI6w/viewform)

&#9989; <font color=red>**QUESTION:**</font> Approximately how long (in minutes) did this assignment take for you to complete?

Put your answer to the above question here

&#9989; <font color=red>**QUESTION:**</font>  What questions do you have, if any, about any of the topics discussed in this assignment after working through the Jupyter notebook?

Put your answer to the above question here

&#9989; <font color=red>**QUESTION:**</font>  Do you have any further questions or comments about this material, or anything else that's going on in class? 

Put your answer to the above question here

In [None]:
from IPython.display import HTML
HTML(
"""
<iframe 
	src="https://docs.google.com/forms/d/e/1FAIpQLScj9YBw_FM5hLzZgmwKbt4JiWvaSQsFDDEHJromcXBTfcLI6w/viewform?embedded=true" 
	width="100%" 
	height="1200px" 
	frameborder="0" 
	marginheight="0" 
	marginwidth="0">
	Loading...
</iframe>
"""
)

---------
### Congratulations, we're done!

To get credit for this assignment you must fill out and submit the above Google Form on or before the assignment due date.


**Course Resources:**
- [Syllabus](https://tinyurl.com/y75cnzam)
- [Preliminary Schedule](https://tinyurl.com/CMSE314-Schedule)
- [Git Repository](https://gitlab.msu.edu/colbrydi/cmse401-s19)
- [Jargon Jar and Command History](https://tinyurl.com/CMSE314-JargonJar) 



&#169; Copyright 2019,  Michigan State University Board of Trustees