<a href="https://colab.research.google.com/github/ggruszczynski/gpu_colab/blob/main/10_intro_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction - Syntactic sugar

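The prefix of a command determines how the notebook interprets it:

- `!` executes a bash command directly in a temporary **subshell**.

- `%` executes one of the line magic commands defined in IPython.

- `%%my_native_language` on the first line of a cell is a cell magic: it defines the language used to interpret the whole cell (for example `%%bash`).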

Some of the magic commands defined by IPython deliberately mirror bash commands, but they differ in the implementation details.

For example, running the `!cd` bash command does not persistently change your directory, because it runs in a temporary subshell that exits as soon as the command finishes. Running the `%cd` magic command, however, changes the working directory of the notebook itself:

```.sh
!pwd
# /content

!cd sample_data/
!pwd
# /content

%cd sample_data/
!pwd
# /content/sample_data
```

Reference <https://ipython.readthedocs.io/en/stable/interactive/magics.html>
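The same distinction can be reproduced in plain Python, outside of any notebook. This is only a sketch of the idea, not how IPython implements `!` and `%cd`: `subprocess.run(..., shell=True)` stands in for `!cd`, `os.chdir` stands in for `%cd`, and the temporary directory is a hypothetical substitute for `sample_data/`.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # hypothetical scratch directory standing in for sample_data/

# Comparable to `!cd target`: the change happens inside a child shell
# and is lost as soon as that shell exits.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_subshell = os.getcwd()

# Comparable to `%cd target`: the change happens in the Python process itself.
os.chdir(target)
after_chdir = os.getcwd()

print("unchanged after subshell cd:", after_subshell == start)
print("changed after os.chdir:", os.path.samefile(after_chdir, target))

os.chdir(start)  # restore the original working directory
```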

In [None]:
# an example of mixing python and shell in one cell

# this is python (default interpreter)
import numpy as np
print(2*np.exp([1,2,3])) 

# these are an IPython magic (%) and shell commands (!)
%env  MY_VARIABLE=123 
!pwd
!echo "my shell variable $MY_VARIABLE"

[ 5.43656366 14.7781122  40.17107385]
env: MY_VARIABLE=123
/content
my shell variable 123
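A variable set with `%env` lives in the environment of the notebook process, which is why commands started afterwards can read it. The sketch below mimics that in plain Python, assuming that assigning to `os.environ` is a fair stand-in for `%env`; the child process is a second Python interpreter rather than bash, so the example does not depend on a particular shell.

```python
import os
import subprocess
import sys

# Comparable to `%env MY_VARIABLE=123`: set the variable in this process.
os.environ["MY_VARIABLE"] = "123"

# Any child process started afterwards inherits the environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MY_VARIABLE'])"],
    capture_output=True,
    text=True,
    check=True,
)
child_value = child.stdout.strip()
print("child process sees:", child_value)
```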


## Get the material

In [6]:
!git clone https://github.com/ggruszczynski/gpu_colab.git

Cloning into 'gpu_colab'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 75 (delta 28), reused 22 (delta 5), pack-reused 15
Unpacking objects: 100% (75/75), done.


In [7]:
!ls

gpu_colab  sample_data	src


In [8]:
%cd gpu_colab/code_samples

/content/gpu_colab/code_samples


## Create a file, compile & run!

In [9]:
%%file hello.cpp
#include <iostream>

int main() {
    std::cout << "Hello World!";
    return 0;
}

Writing hello.cpp


In [11]:
%%bash
g++ hello.cpp -o hello
echo "===print working directory and its content==="
pwd
ls
echo "===execute the program==="
./hello

===print working directory and its content===
/content/gpu_colab/code_samples
ex1_hello_world.cu
ex2_vector_add.cu
ex3_matrix_add.cu
ex4_parallel_reduction.cu
ex5_thrust_reduction.cu
ex6_thrust_saxpy.cu
gpu_batch.sh
hello
hello.cpp
hello_thrust.cu
===execute the program===
Hello World!

## cpp (auto) magic 

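This section explains how to register a custom cell magic that wraps your cell: it writes the cell content to a file, compiles it, and runs the resulting program.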

In [12]:
from IPython.core.magic import register_cell_magic

In [13]:
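@register_cell_magic
def cpp(line, cell):
  # save the cell content as a C++ source file
  with open('a.cpp', 'w') as f:
    f.write(cell)
  # compile, and run the binary only if compilation succeeded
  !g++ a.cpp && ./a.out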

In [14]:
%%cpp
#include <iostream>
int main() {
    std::cout << "Hello World!";
    return 0;
}

Hello World!
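Outside of a notebook, the same write, compile and run sequence can be sketched with `subprocess`. The helper name `compile_and_run` is hypothetical and not part of the original material; it assumes a `g++` executable on the `PATH` and returns `None` when none is found, so the sketch stays runnable on machines without a compiler.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def compile_and_run(source, compiler="g++"):
    """Compile `source` and return the program's stdout, or None if the compiler is missing."""
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "a.cpp"
        exe = Path(tmp) / "a.out"
        src.write_text(source)
        # check=True raises CalledProcessError on a compile error,
        # so a stale binary is never executed.
        subprocess.run([compiler, str(src), "-o", str(exe)], check=True)
        done = subprocess.run([str(exe)], check=True, capture_output=True, text=True)
        return done.stdout


hello = '#include <iostream>\nint main() { std::cout << "Hello World!"; return 0; }\n'
output = compile_and_run(hello)
print(output)
```

In a notebook, a function like this could be called from the body of a magic registered with `register_cell_magic`, passing the `cell` argument as `source`.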

In [15]:
cpp_header = """
#include <iostream> 
#include <string>
#include <iterator> 
#include <utility> 
#include <map>
using namespace std;
"""

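@register_cell_magic
def cpp(line, cell):
  # wrap bare statements in main(); the newlines keep a trailing
  # "//" comment in the cell from commenting out the closing brace
  if ' main()' not in cell:
    cell = "int main(){\n" + cell + "\n}"
  with open('a.cpp', 'w') as f:
    f.write(cpp_header + cell)
  # compile, and run the binary only if compilation succeeded
  !g++ a.cpp && ./a.out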

In [16]:
%%cpp
std::cout << "Hello World!";

Hello World!

In [17]:
%%cpp
for(int i=0; i<5; i++) {
    cout << i;
}

cout << endl;
pair <int, string> PAIR1; 

PAIR1.first = 100; 
PAIR1.second = "lat!" ; 

cout << PAIR1.first << " "; 
cout << PAIR1.second << endl; 

01234
100 lat!
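The wrapping step of the magic is plain string manipulation, so it can be examined on its own. The sketch below isolates it in a hypothetical helper, `wrap_snippet`, with a shortened header (`CPP_HEADER` is an assumption, not the full header from the cell above). It adds `main()` only when the cell does not already define one.

```python
CPP_HEADER = "#include <iostream>\nusing namespace std;\n"


def wrap_snippet(cell, header=CPP_HEADER):
    """Return a complete translation unit for a cell that may lack main()."""
    if " main()" not in cell:
        # The newline before the closing brace keeps a trailing
        # "//" comment from swallowing it.
        cell = "int main() {\n" + cell + "\n}\n"
    return header + cell


snippet = 'cout << "Hello World!";  // trailing comment'
wrapped = wrap_snippet(snippet)
print(wrapped)

complete = "int main() { return 0; }"
untouched = wrap_snippet(complete)  # already has main(), only the header is added
```

The `' main()'` test is a simple substring check, so a cell that declares `main` with arguments or different spacing would still be wrapped; a stricter check is left out to keep the sketch short.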


# Activate GPU

- To get access to a GPU, open the *Runtime* menu and select *Change runtime type*, then choose GPU as the hardware accelerator. It might take a minute for your notebook to connect to a GPU.
- To check whether a GPU is attached to your session, run the code cell below with the ``!nvidia-smi`` command by pressing ``SHIFT-ENTER`` on it.

In [18]:
!nvidia-smi

Fri Apr  8 10:29:06 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
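The same check can be made from Python, which is handy when later cells should behave differently depending on whether a GPU is attached. This is a minimal sketch: the function name `gpu_summary` is hypothetical, and it relies only on `nvidia-smi -L` (list GPUs), returning `None` when the tool is missing or reports an error.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU list printed by `nvidia-smi -L`, or None if no GPU tooling is available."""
    if shutil.which("nvidia-smi") is None:
        return None
    done = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if done.returncode != 0:
        return None
    return done.stdout.strip()


summary = gpu_summary()
if summary:
    print(summary)
else:
    print("No GPU detected; check the runtime type.")
```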

In [19]:
%%file hello_cuda.cu

#include <stdio.h>

// function qualifiers:
// __global__ launched by CPU on device (must return void)
// __device__ called from other GPU functions (never CPU)
// __host__ can be executed by CPU
// (can be used together with __device__)

// kernel launch:
// f_name<<<blocks,threads_per_block>>>(p1,... pN)

__global__ void print_from_gpu(void) {
    int tidx = blockIdx.x*blockDim.x+threadIdx.x;
    printf("Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> %d = %d * %d + %d \n",
    tidx, blockIdx.x, blockDim.x, threadIdx.x);
}

int main(void) {
    printf("Hello World from host!\n");

    print_from_gpu<<<2,3>>>();  // <<<blocks, threads_per_block>>>
    cudaDeviceSynchronize();
    printf("-------------------------------\n");
    dim3 grid_dim(2,1,1);
    dim3 block_dim(3,1,1);
    print_from_gpu<<<grid_dim, block_dim>>>();  // <<<blocks, threads_per_block>>>
    cudaDeviceSynchronize();
    return 0;
}

Writing hello_cuda.cu
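The index arithmetic in the kernel can be checked on the CPU before compiling anything. The sketch below enumerates the same one-dimensional launch configuration, `<<<2,3>>>`, in Python; `global_thread_ids` is a hypothetical helper, and the loop visits blocks in ascending order, whereas on a GPU the blocks may be scheduled, and their lines printed, in any order.

```python
def global_thread_ids(num_blocks, threads_per_block):
    """List (tidx, blockIdx.x, blockDim.x, threadIdx.x) for a 1D launch."""
    rows = []
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            # Same formula as in the kernel: blockIdx.x * blockDim.x + threadIdx.x
            tidx = block_idx * threads_per_block + thread_idx
            rows.append((tidx, block_idx, threads_per_block, thread_idx))
    return rows


rows = global_thread_ids(2, 3)
for tidx, block_idx, block_dim, thread_idx in rows:
    print(f"{tidx} = {block_idx} * {block_dim} + {thread_idx}")
```

Every thread obtains a distinct index from 0 to 5, which is what makes this formula suitable for addressing array elements in a kernel.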


## Check the version of your GPU card
If you received an older GPU such as a Tesla K80 (check the output of the `!nvidia-smi` command), add the `-gencode arch=compute_35,code=sm_35` flags to the `nvcc` invocation so that the binary targets that card's architecture.

In [26]:
%%bash

CUDA_SUFF=35
nvcc -gencode arch=compute_${CUDA_SUFF},code=sm_${CUDA_SUFF} ./hello_cuda.cu -o hello_cuda
./hello_cuda

Hello World from host!
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 3 = 1 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 4 = 1 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 5 = 1 * 3 + 2 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 0 = 0 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 1 = 0 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 2 = 0 * 3 + 2 
-------------------------------
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 3 = 1 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 4 = 1 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 5 = 1 * 3 + 2 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 0 = 0 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x
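When the same notebook is run on different cards, it can help to build the `nvcc` command in one place. The sketch below only assembles the argument list and does not invoke the compiler; `nvcc_command` is a hypothetical helper, and the `-gencode` string follows the pattern used in the cell above. The right suffix for a given card has to be looked up for that card; the value 35 is simply the one used in this notebook for the K80.

```python
def nvcc_command(source, output, cuda_suff=None):
    """Build an nvcc argument list, optionally targeting one GPU architecture."""
    cmd = ["nvcc"]
    if cuda_suff is not None:
        # Same flag pattern as in the bash cell: arch=compute_XX,code=sm_XX
        cmd += ["-gencode", f"arch=compute_{cuda_suff},code=sm_{cuda_suff}"]
    cmd += [source, "-o", output]
    return cmd


k80_cmd = nvcc_command("./hello_cuda.cu", "hello_cuda", cuda_suff=35)
default_cmd = nvcc_command("./hello_cuda.cu", "hello_cuda")
print(" ".join(k80_cmd))
print(" ".join(default_cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where `nvcc` is installed.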



## If you were lucky to get a more recent GPU (like a Tesla T4)...

you can install a Python wrapper to run `%%cu` cells directly:

```.sh
%pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin
```

then,

```
%%cu 

your cell with cuda code...
```


In [22]:
%pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-szmf90vm
  Running command git clone -q https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-szmf90vm


In [23]:
%load_ext nvcc_plugin

The nvcc_plugin extension is already loaded. To reload it, use:
  %reload_ext nvcc_plugin


In [None]:
%%cu 

#include <stdio.h>

__global__ void print_from_gpu(void) {
    int tidx = blockIdx.x*blockDim.x+threadIdx.x;
    printf("Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> %d = %d * %d + %d \n",
    tidx, blockIdx.x, blockDim.x, threadIdx.x);
}

int main(void) {
    printf("Hello World from host!\n");

    print_from_gpu<<<2,3>>>();  // <<<blocks, threads_per_block>>>

    cudaDeviceSynchronize();
    return 0;
}