<a href="https://colab.research.google.com/github/ggruszczynski/gpu_colab/blob/main/10_intro_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction - Syntax sugar

When you prefix a command with

- `!`, it is executed as a bash command in a temporary **subshell**.

- `%`, it is executed as one of the line magic commands defined in IPython.

- `%%`, it is executed as a cell magic that applies to the whole cell; for example, `%%bash` has the entire cell interpreted by bash instead of Python.

Some of the magic commands defined by IPython deliberately mirror bash commands, but they differ in the implementation details.

For example, running `!cd` does not persistently change your directory, because the command runs in a temporary subshell that exits immediately. Running the `%cd` magic command, however, changes the working directory of the notebook process itself, so the change persists:

```.sh
!pwd
# /content

!cd sample_data/
!pwd
# /content

%cd sample_data/
!pwd
# /content/sample_data
```
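
The same difference can be reproduced in plain Python, outside of any magic. The sketch below is only an illustration of the mechanism: `subprocess.run` stands in for the `!` subshell and `os.chdir` stands in for `%cd`. The temporary directory is an assumption made for the example, not part of the original notebook.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # throwaway directory used only for this demo

# Like `!cd target`: the change happens in a child shell and is lost when it exits.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_subshell = os.getcwd()

# Like `%cd target`: the change happens in the current (notebook) process.
os.chdir(target)
after_chdir = os.getcwd()

os.chdir(start)  # restore the original working directory

print("unchanged after subshell cd:", after_subshell == start)
print("changed after os.chdir:", os.path.samefile(after_chdir, target))
```
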

Reference <https://ipython.readthedocs.io/en/stable/interactive/magics.html>

In [1]:
# an example of mixing python and shell in one cell

# this is python (default interpreter)
import numpy as np
print(2*np.exp([1,2,3]))

# this is an IPython magic (%env) followed by bash shell commands (!)
%env  MY_VARIABLE=123
!pwd
!echo "my shell variable ${MY_VARIABLE}"

[ 5.43656366 14.7781122  40.17107385]
env: MY_VARIABLE=123
/content
my shell variable 123
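
The shell command can see `MY_VARIABLE` because `%env` sets the variable in the notebook process, and every child process started afterwards inherits that environment. A minimal sketch of the same idea in plain Python, assuming a child Python interpreter as a stand-in for the `!` subshell:

```python
import os
import subprocess
import sys

# Similar in effect to `%env MY_VARIABLE=123`: set the variable in this process.
os.environ["MY_VARIABLE"] = "123"

# The child process (standing in for the `!` subshell) inherits the environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MY_VARIABLE'])"],
    capture_output=True,
    text=True,
    check=True,
)
inherited_value = child.stdout.strip()
print("value seen by the child process:", inherited_value)
```
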


## Get the material

In [2]:
!git clone https://github.com/ggruszczynski/gpu_colab.git

Cloning into 'gpu_colab'...
remote: Enumerating objects: 390, done.
remote: Counting objects: 100% (64/64), done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 390 (delta 29), reused 31 (delta 12), pack-reused 326
Receiving objects: 100% (390/390), 22.35 MiB | 24.77 MiB/s, done.
Resolving deltas: 100% (155/155), done.


In [3]:
!ls

gpu_colab


In [4]:
%cd gpu_colab/code_samples

[Errno 2] No such file or directory: 'gpu_colab/code_samples'
/content


## Create a file, compile & run!

In [5]:
%%file hello.cpp
#include <iostream>

int main() {
    std::cout << "Hello World!";
    return 0;
}

Writing hello.cpp


In [6]:
%%bash
g++ hello.cpp -o hello
echo "===print working directory and its content==="
pwd
ls
echo "===execute the program==="
./hello

===print working directory and its content===
/content
gpu_colab
hello
hello.cpp
===execute the program===
Hello World!
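
The write, compile, and run steps above can also be driven from Python with `subprocess`, which is handy when you want the program's output as a Python string. This is a sketch rather than part of the original notebook: `compile_and_run` is a hypothetical helper, it assumes `g++` is on the `PATH`, and it returns `None` when no compiler is found.

```python
import os
import shutil
import subprocess
import tempfile

HELLO_CPP = '''
#include <iostream>

int main() {
    std::cout << "Hello World!";
    return 0;
}
'''

def compile_and_run(source, compiler="g++"):
    """Compile C++ source in a temporary directory and return the program's stdout.

    Returns None when the compiler is not installed.
    """
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "hello.cpp")
        exe = os.path.join(workdir, "hello")
        with open(src, "w") as f:
            f.write(source)
        # check=True raises CalledProcessError if compilation or execution fails.
        subprocess.run([compiler, src, "-o", exe], check=True)
        result = subprocess.run([exe], capture_output=True, text=True, check=True)
        return result.stdout

output = compile_and_run(HELLO_CPP)
print(output)
```
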

## cpp (auto) magic

This section shows how to register your own cell magic that wraps a cell: it writes the cell body to a file, compiles it, and runs the resulting program.

In [7]:
from IPython.core.magic import register_cell_magic

In [8]:
@register_cell_magic
def cpp(line, cell):
  with open('a.cpp', 'w') as f:
    f.write(cell)
  !g++ a.cpp
  !./a.out

In [9]:
%%cpp
#include <iostream>
int main() {
    std::cout << "Hello World!";
    return 0;
}

Hello World!

In [10]:
cpp_header = """
#include <iostream>
#include <string>
#include <iterator>
#include <utility>
#include <map>
using namespace std;
"""

@register_cell_magic
def cpp(line, cell):
  if ' main()' not in cell:
    cell = "int main(){" + cell + "}"
  with open('a.cpp', 'w') as f:
    f.write(cpp_header + cell)
  !g++ a.cpp
  !./a.out
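
The source-assembly step of this magic can be tested on its own, without a compiler. The sketch below isolates it in a hypothetical `build_source` function. Unlike the cell above, it detects `main` with a regular expression so that variants such as `int main(void)` are also recognised; that is a swapped-in detection rule, not the one used in the notebook, and the shortened header is an assumption for the example.

```python
import re

CPP_HEADER = "#include <iostream>\nusing namespace std;\n"

# Matches `int main(` with arbitrary spacing, e.g. `int main()` or `int main(void)`.
MAIN_PATTERN = re.compile(r"\bint\s+main\s*\(")

def build_source(cell, header=CPP_HEADER):
    """Return full C++ source text, wrapping the cell in main() if it has none."""
    if not MAIN_PATTERN.search(cell):
        cell = "int main() {\n" + cell + "\n}\n"
    return header + cell

snippet = 'cout << "Hello World!";'
full_program = 'int main(void) { cout << "hi"; return 0; }'

wrapped = build_source(snippet)         # gets an int main() wrapper
untouched = build_source(full_program)  # already has main, only the header is added
print(wrapped)
```
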

In [11]:
%%cpp
std::cout << "Hello World!";

Hello World!

In [12]:
%%cpp
for(int i=0; i<5; i++) {
    cout << i;
}

cout << endl;
pair <int, string> PAIR1;

PAIR1.first = 100;
PAIR1.second = "lat!" ;

cout << PAIR1.first << " ";
cout << PAIR1.second << endl;

01234
100 lat!


# Activate GPU

- To get access to a GPU, click on the *Runtime* menu and select *Change runtime type*. Choose GPU as the hardware accelerator. It might take a minute for your notebook to connect to a GPU.
- To check whether a GPU has been connected to your session, run the code cell below, which contains the ``!nvidia-smi`` command, by pressing ``SHIFT-ENTER`` on it.

In [13]:
!nvidia-smi

Tue Oct 31 10:06:45 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
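
If you would rather check for a GPU from Python than read the table by eye, the query mode of `nvidia-smi` can be called through `subprocess`. A minimal sketch, assuming `nvidia-smi` is on the `PATH` when a GPU runtime is active; `list_gpus` is a hypothetical helper name.

```python
import shutil
import subprocess

def list_gpus():
    """Return a list of 'name, total memory' strings, or None if no GPU is usable."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not installed, most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # tool present but no usable driver or GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print(gpus if gpus else "No GPU detected")
```
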

In [14]:
%%file hello_cuda.cu

#include <stdio.h>

// function qualifiers:
// __global__ launched by CPU on device (must return void)
// __device__ called from other GPU functions (never CPU)
// __host__ can be executed by CPU
// (can be used together with __device__)

// kernel launch:
// f_name<<<blocks,threads_per_block>>>(p1,... pN)

__global__ void print_from_gpu(void) {
    int tidx = blockIdx.x*blockDim.x+threadIdx.x;
    printf("Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> %d = %d * %d + %d \n",
    tidx, blockIdx.x, blockDim.x, threadIdx.x);
}

int main(void) {
    printf("Hello World from host!\n");

    print_from_gpu<<<2,3>>>();  // <<<blocks, threads_per_block>>>
    cudaDeviceSynchronize();
    printf("-------------------------------\n");
    dim3 grid_dim(2,1,1);
    dim3 block_dim(3,1,1);
    print_from_gpu<<<grid_dim, block_dim>>>();  // <<<blocks, threads_per_block>>>
    cudaDeviceSynchronize();
    return 0;
}

Writing hello_cuda.cu
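
The index arithmetic in the kernel, `blockIdx.x*blockDim.x+threadIdx.x`, can be worked through on the CPU. The sketch below enumerates the global thread ids for a `<<<2,3>>>` launch in plain Python; `global_thread_ids` is a hypothetical helper written for this illustration. On a real GPU the blocks are not guaranteed to run in this order, so the printed lines of the kernel may appear in a different sequence.

```python
def global_thread_ids(num_blocks, threads_per_block):
    """Enumerate (global_id, block_idx, thread_idx) the way the kernel computes them."""
    ids = []
    for block_idx in range(num_blocks):              # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):  # plays the role of threadIdx.x
            # blockDim.x equals threads_per_block for a 1D launch
            tidx = block_idx * threads_per_block + thread_idx
            ids.append((tidx, block_idx, thread_idx))
    return ids

launch = global_thread_ids(2, 3)  # mirrors print_from_gpu<<<2,3>>>()
for tidx, block_idx, thread_idx in launch:
    print(f"{tidx} = {block_idx} * 3 + {thread_idx}")
```
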


## Check the version of your GPU card
If you received an older GPU such as a Tesla K80 (check the output of the `!nvidia-smi` command), add the `-gencode arch=compute_35,code=sm_35` flags to the nvcc compiler.
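
The two numbers in the `-gencode` flag come from the compute capability of the card: capability 3.5 gives `compute_35`/`sm_35`, and a Tesla T4 (capability 7.5) gives `compute_75`/`sm_75`. A small sketch of that mapping, where `gencode_flags` is a hypothetical helper and the capability string is assumed to be something you looked up for your card:

```python
def gencode_flags(compute_capability):
    """Build nvcc -gencode flags from a capability string such as '7.5'."""
    major, minor = compute_capability.split(".")
    suffix = f"{int(major)}{int(minor)}"
    return ["-gencode", f"arch=compute_{suffix},code=sm_{suffix}"]

flags_t4 = gencode_flags("7.5")  # a Tesla T4 reports compute capability 7.5
command = ["nvcc", *flags_t4, "./hello_cuda.cu", "-o", "hello_cuda"]
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into a `%%bash` cell as a single line.
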

In [15]:
%%bash

CUDA_SUFF=70
nvcc -gencode arch=compute_${CUDA_SUFF},code=sm_${CUDA_SUFF} ./hello_cuda.cu -o hello_cuda
./hello_cuda

Hello World from host!
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 3 = 1 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 4 = 1 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 5 = 1 * 3 + 2 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 0 = 0 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 1 = 0 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 2 = 0 * 3 + 2 
-------------------------------
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 3 = 1 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 4 = 1 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 5 = 1 * 3 + 2 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 0 = 0 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x

## If you were lucky enough to get a more recent GPU (like a Tesla T4)...

You can install a Python wrapper to run `%%cu` cells directly:

```.sh
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin
```

then,

```
%%cu

your cell with cuda code...
```


In [16]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-inp847xf
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-inp847xf
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit 0a71d56e5dce3ff1f0dd2c47c29367629262f527
  Preparing metadata (setup.py) ... done


In [17]:
%load_ext nvcc_plugin

created output directory at /content/src
Out bin /content/result.out


In [18]:
%%cu

#include <stdio.h>

__global__ void print_from_gpu(void) {
    int tidx = blockIdx.x*blockDim.x+threadIdx.x;
    printf("Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> %d = %d * %d + %d \n",
    tidx, blockIdx.x, blockDim.x, threadIdx.x);
}

int main(void) {
    printf("Hello World from host!\n");

    print_from_gpu<<<2,3>>>();  // <<<blocks, threads_per_block>>>

    cudaDeviceSynchronize();
    return 0;
}

Hello World from host!
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 3 = 1 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 4 = 1 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 5 = 1 * 3 + 2 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 0 = 0 * 3 + 0 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 1 = 0 * 3 + 1 
Hello from device! My threadId = blockIdx.x *blockDim.x + threadIdx.x <=> 2 = 0 * 3 + 2 

