<a href="https://colab.research.google.com/github/knoel99/learn_cuda/blob/master/01_easier_intro_to_cuda.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An even easier introduction to CUDA

Source: https://developer.nvidia.com/blog/even-easier-introduction-cuda/

Noob notes on C++ are added along the way.

# Requirements
- Learn how to run C++ code in colab
- Select a colab runtime with a GPU

In [None]:
# Test C++ code
%%writefile hello.cpp
#include <iostream>
using namespace std;

int main() {
  cout << "Hello from Colab!" << endl;
  return 0;
}

Writing hello.cpp


In [None]:
# Compile with g++
!g++ hello.cpp -o hello
!./hello

Hello from Colab!


noob notes:
- `<iostream>` is the standard library header that provides `cout`, used here to print to the terminal
- writing `using namespace std;` lets you write `cout` directly instead of `std::cout`
- `cout` stands for "character output" (often read as "console output"); it is a stream object, not a function

# Addition of two arrays

This tutorial studies a single function: the element-wise addition of two arrays of about 1 million elements each.

In [None]:
%%writefile addition.cpp
#include <iostream>
#include <math.h>

// Add two arrays
void add(int n, float *x, float *y) {
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void) {
  int N = 1<<20; // 1 M elements

  float *x = new float[N];
  float *y = new float[N];

  // Init the two arrays with a for loop.
  // The tutorial says "init arrays on the host": the host is the CPU and its
  // memory, as opposed to the device (the GPU and its memory).
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the CPU
  add(N, x, y);

  // Check for errors (all elements should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N ; i++)
    maxError = fmax(maxError, fabs(y[i] - 3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  delete [] x;
  delete [] y;

  return 0;

}

Overwriting addition.cpp


In [None]:
# Compile and run
!g++ addition.cpp -o addition
!./addition

Max error: 0


# Noob note

Meaning of `int N = 1<<20;`

- `1<<20` means 2^20: the `<<` operator shifts the bits of 1 to the left by 20 positions. Each of the two arrays has 1 048 576 elements.
- Each element of the array is a float, stored on 4 bytes.
- Each array therefore takes 4 * 2^20 bytes, about 4 MB in memory.


Some examples (the sizes apply when each unit is one byte):
- `1 << 10` = 1 024 (~1 kB)
- `1 << 20` = 1 048 576 (~1 MB)
- `1 << 30` = 1 073 741 824 (~1 GB)

Why pass pointers as function arguments instead of the arrays themselves, as in Python?


In Python we have:
```python
def add(a, b):
    for i in range(len(a)):
        b[i] = a[i] + b[i]

x = [1.0] * 2**20
y = [2.0] * 2**20
add(x, y)
```

In C++:
```cpp
void add(int n, float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

int main() {
    int N = 1<<20;
    float *x = new float[N];
    float *y = new float[N];

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    add(N, x, y);

    // Free memory
    delete [] x;
    delete [] y;
    return 0;
}
```

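In a function parameter list, the two signatures below are equivalent: the compiler treats an array parameter such as `float x[]` as the pointer `float *x`. The pointer form is the usual convention. Either way the function receives the address of the array rather than a copy, which is why `add` can modify `y` in place, just like the Python version modifies its list.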

Let's test it.

```cpp
void add(int n, float *x, float *y);
void add(int n, float x[], float y[]);
```

In [None]:
%%writefile test.cpp
#include <iostream>

