## **Hello world CUDA example**


Let's create our first CUDA program. Modify the skeleton below into a minimal CUDA program that:

- defines 2 blocks with 4 threads each
- prints a "Hello World" message that also reports the number of each thread within its block (hint: use the built-in variable ```threadIdx.x```)



In [1]:
!nvidia-smi

Mon Jul 12 14:56:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
%%sh
cat > hello.cu << EOF
#include <stdio.h>
 
#define NUM_BLOCKS 4
#define BLOCK_SIZE 1
 
__global__ void hello()
{
    int idx = blockIdx.x;
    printf("Hello world! I'm a thread in block %d\n", idx);
}
 
 
int main(int argc,char **argv)
{
    // launch the kernel
    hello<<<NUM_BLOCKS, BLOCK_SIZE>>>();
 
    // force the printf()s to flush
    cudaDeviceSynchronize();
 
    printf("That's all!\n");
 
    return 0;
}
EOF

In [3]:
!nvcc -o hello_cuda hello.cu && ./hello_cuda

Hello world! I'm a thread in block 1
Hello world! I'm a thread in block 3
Hello world! I'm a thread in block 2
Hello world! I'm a thread in block 0
That's all!
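
The skeleton ignores the return values of the CUDA calls, so a failed launch (for example, when the binary does not match the GPU or driver) would print only `That's all!` with no explanation. As a minimal sketch, assuming the same 4 blocks of 1 thread, the launch can be checked with `cudaGetLastError()` and the return value of `cudaDeviceSynchronize()`. The helper `check` is a hypothetical name introduced here for illustration; it is not part of the skeleton.

```cuda
#include <stdio.h>

#define NUM_BLOCKS 4
#define BLOCK_SIZE 1

__global__ void hello()
{
    int idx = blockIdx.x;
    printf("Hello world! I'm a thread in block %d\n", idx);
}

// Hypothetical helper (not in the original skeleton):
// prints a message if a CUDA call failed and returns true on success.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void)
{
    // launch the kernel
    hello<<<NUM_BLOCKS, BLOCK_SIZE>>>();

    // errors from the launch itself (e.g. an invalid configuration)
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // wait for the kernel to finish; this also flushes the device printf()s
    // and reports errors raised while the kernel was running
    if (!check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    printf("That's all!\n");

    return 0;
}
```

When everything works, this variant prints the same kind of messages as the skeleton; the extra checks only matter when something goes wrong.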


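The block numbers in the output of `hello_cuda` are not sorted: blocks are scheduled independently, so their order can differ between runs.

Once you have modified the skeleton, compare your program with our solution: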

In [4]:
%%sh
cat > hello_exercise.cu << EOF
#include <stdio.h>
 
#define NUM_BLOCKS 2
#define BLOCK_SIZE 4
 
__global__ void hello()
{
    int bidx = blockIdx.x;
    int tidx = threadIdx.x;
    printf("Hello world! I'm a thread %d in block %d\n", tidx, bidx);
}
 
 
int main(int argc,char **argv)
{
    // launch the kernel
    hello<<<NUM_BLOCKS, BLOCK_SIZE>>>();
 
    // force the printf()s to flush
    cudaDeviceSynchronize();
 
    printf("That's all!\n");
 
    return 0;
}
EOF

In [5]:
!nvcc -o hello_cuda_exercise hello_exercise.cu && ./hello_cuda_exercise

Hello world! I'm a thread 0 in block 1
Hello world! I'm a thread 1 in block 1
Hello world! I'm a thread 2 in block 1
Hello world! I'm a thread 3 in block 1
Hello world! I'm a thread 0 in block 0
Hello world! I'm a thread 1 in block 0
Hello world! I'm a thread 2 in block 0
Hello world! I'm a thread 3 in block 0
That's all!
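
As a possible follow-up, the block and thread numbers printed by the solution can be combined into a single index that is unique across the whole grid. The sketch below assumes the same launch configuration (2 blocks of 4 threads) and uses the built-in variable `blockDim.x`, which holds the number of threads per block. The helper `global_index` is a hypothetical name introduced for illustration.

```cuda
#include <stdio.h>

#define NUM_BLOCKS 2
#define BLOCK_SIZE 4

// Hypothetical helper: flatten (block, thread) into one index over the grid.
__host__ __device__ int global_index(int block, int block_size, int thread)
{
    return block * block_size + thread;
}

__global__ void hello()
{
    int bidx = blockIdx.x;
    int tidx = threadIdx.x;
    // blockDim.x is the number of threads per block (BLOCK_SIZE here)
    int gidx = global_index(bidx, blockDim.x, tidx);
    printf("Hello world! I'm thread %d in block %d (global index %d)\n",
           tidx, bidx, gidx);
}

int main(void)
{
    // launch the kernel
    hello<<<NUM_BLOCKS, BLOCK_SIZE>>>();

    // force the printf()s to flush
    cudaDeviceSynchronize();

    printf("That's all!\n");

    return 0;
}
```

With this configuration each global index from 0 to 7 appears exactly once. The order in which the two blocks print is not guaranteed and may change between runs.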
