# CuPy Tutorial

This short tutorial introduces CuPy, a library that presents itself as the equivalent of NumPy for GPUs.

In [1]:
import numpy as np
import cupy as cp

Here is a small example showing how similar the two libraries are:

In [None]:
x_cpu = np.array([1, 2, 3])
l2_cpu = np.linalg.norm(x_cpu)
print(l2_cpu)

# We can calculate it on GPU with CuPy in a similar way:
x_gpu = cp.array([1, 2, 3])
l2_gpu = cp.linalg.norm(x_gpu)
print(l2_gpu)



# Move arrays to a device

`cupy.asarray()` can be used to move a `numpy.ndarray`, a list, or any object that can be passed to `numpy.array()` to the current device:
```python
x_cpu = np.array([1, 2, 3])
x_gpu = cp.asarray(x_cpu)  # move the data to the current device.
```
`cupy.asarray()` also accepts a `cupy.ndarray`, so the same function can transfer an array between devices:

```python
# Requires a machine with at least two GPUs.
with cp.cuda.Device(0):
    x_gpu_0 = cp.array([1, 2, 3])  # create an array on GPU 0
with cp.cuda.Device(1):
    x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1
```


# Exercise 1:

CuPy sandbox: try out matrix multiplications and compare their speed with NumPy as the array size varies.

In [None]:
from time import time
size = 50
a_cpu = np.random.randn(100, size)
b_cpu = np.random.randn(size, 100)

a_gpu = cp.asarray(a_cpu)
b_gpu = cp.asarray(b_cpu)

In [None]:
t = time()
for i in range(100):
    pass  # TODO: multiply a_cpu by b_cpu with NumPy here
print("100 matrix multiplications for arrays of size (100x{}) performed on cpu in {}".format(size, time()-t))

t = time()
for i in range(100):
    pass  # TODO: multiply a_gpu by b_gpu with CuPy here
cp.cuda.Stream.null.synchronize()  # CuPy runs asynchronously: wait for the GPU to finish before reading the clock
print("100 matrix multiplications for arrays of size (100x{}) performed on gpu in {}".format(size, time()-t))
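
Timing GPU code requires one extra step. CuPy launches kernels asynchronously, so a call can return before the GPU has finished its work. One possible approach is sketched below. It uses a small helper, `time_matmul`, whose name and `sync` argument are hypothetical and not part of CuPy. The helper runs one warm-up product to absorb one-time setup costs, then calls a synchronization function before and after the timed loop.

```python
from time import perf_counter

import numpy as np


def time_matmul(a, b, n_iter=100, sync=lambda: None):
    """Return the wall-clock time of `n_iter` products a @ b.

    `sync` is called before and after the timed loop; pass
    cupy.cuda.Stream.null.synchronize when a and b are CuPy arrays.
    """
    a @ b   # warm-up: the first call may include one-time setup work
    sync()
    start = perf_counter()
    for _ in range(n_iter):
        a @ b
    sync()  # wait until all queued work has completed before stopping the clock
    return perf_counter() - start


size = 50
a_cpu = np.random.randn(100, size)
b_cpu = np.random.randn(size, 100)
print("numpy:", time_matmul(a_cpu, b_cpu))
```

With CuPy, the same helper would be called as `time_matmul(a_gpu, b_gpu, sync=cp.cuda.Stream.null.synchronize)`. To see where the GPU starts to pay off, repeat both measurements for increasing values of `size`. For small arrays, kernel launch overhead can dominate.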


# Exercise 2:

Compute the SVD (singular value decomposition) of the given array using CuPy, and compare its running time with NumPy's.

In [None]:
from time import time
a1 = np.random.randn(500, 500)

In [None]:
t = time()
# numpy code here

print(time()-t)

In [None]:
t = time()
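# cupy code here (move a1 to the GPU with cp.asarray first)

cp.cuda.Stream.null.synchronize()  # wait for the GPU to finish before reading the clock
print(time()-t)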
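
Both libraries expose the SVD through the same `linalg.svd` function, so a single sketch can serve both cells. The helper below, `svd_with_check`, is a hypothetical name and not part of either library. It takes the array module `xp` as an argument, computes the reduced SVD, and checks the factorization by rebuilding the input.

```python
import numpy as np


def svd_with_check(a, xp):
    """Reduced SVD of `a` computed with module `xp` (numpy or cupy).

    Returns (u, s, vt, err), where err is the largest absolute entry of
    u @ diag(s) @ vt - a; it should be close to zero.
    """
    u, s, vt = xp.linalg.svd(a, full_matrices=False)
    err = float(xp.abs(u @ xp.diag(s) @ vt - a).max())
    return u, s, vt, err


a1 = np.random.randn(500, 500)
u, s, vt, err = svd_with_check(a1, np)
print("numpy reconstruction error:", err)
```

For the CuPy cell, move the data to the GPU first and pass the `cupy` module, for example `svd_with_check(cp.asarray(a1), cp)`. Then synchronize before reading the clock, as in the cell above.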

# Exercise 3 (Bonus):

Write a user-defined kernel. A short tutorial on writing kernels is available here: https://docs-cupy.chainer.org/en/stable/tutorial/kernel.html

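The exercise is to use a kernel to compute the softmax function (https://fr.wikipedia.org/wiki/Fonction_softmax) over the inputs x, y, z (3 categories). For each index i, the output for x is exp(x[i]) / (exp(x[i]) + exp(y[i]) + exp(z[i])), and likewise for y and z.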

In [3]:
x = cp.random.randn(100)
y = cp.random.randn(100)
z = cp.random.randn(100)
