# Notebook 0 : Introduction to CuPy

Negin Sobhani and Deepak Cherian  
Computational & Information Systems Lab (CISL)  
negins@ucar.edu, dcherian@ucar.edu  

------------

## Introduction to CuPy
CuPy is an open-source, GPU-accelerated array library for Python that is compatible with NumPy. It uses NVIDIA CUDA to run operations on the GPU, which can make numerical computations significantly faster than on the CPU. CuPy provides a NumPy-like interface for array manipulation and supports a wide range of mathematical operations, making it a powerful tool for scientific computing.

### Import NumPy and CuPy

In [1]:
## Import NumPy and CuPy
import cupy as cp
import numpy as np

### Creating Arrays in CuPy vs. NumPy

In [11]:
# create a 1D array with 5 elements on CPU
arr_cpu = np.array([1, 2, 3, 4, 5])
print(arr_cpu)

# create a 1D array with 5 elements on GPU
arr_gpu = cp.array([1, 2, 3, 4, 5])
print(arr_gpu)

[1 2 3 4 5]
[1 2 3 4 5]


You can also create multi-dimensional arrays.

In [3]:
# create a 2D array of zeros with 3 rows and 4 columns
arr_cpu = np.zeros((3, 4))
print(arr_cpu)

arr_gpu = cp.zeros((3, 4))
print(arr_gpu)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


### Basic Operations 

In [12]:
import numpy as np
import cupy as cp

# NumPy: Create an array
numpy_a = np.array([1, 2, 3, 4, 5])

# CuPy: Create an array
cupy_a = cp.array([1, 2, 3, 4, 5])

# Basic arithmetic operations
numpy_b = numpy_a + 2
cupy_b = cupy_a + 2

numpy_c = numpy_a * 2
cupy_c = cupy_a * 2

numpy_d = numpy_a.dot(numpy_a)
cupy_d = cupy_a.dot(cupy_a)

# Reshaping arrays
numpy_e = numpy_a.reshape(5, 1)
cupy_e = cupy_a.reshape(5, 1)

# Transposing arrays
numpy_f = numpy_e.T
cupy_f = cupy_e.T

# More complex example: softmax (element-wise exponential divided by its sum)
numpy_g = np.exp(numpy_a) / np.sum(np.exp(numpy_a))
cupy_g = cp.exp(cupy_a) / cp.sum(cp.exp(cupy_a))
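
Because the two libraries share an interface, a quick check is to compare their results on the host. The following is a minimal sketch, not part of the original notebook: `to_host` is a hypothetical helper, and the `try`/`except` fallback to NumPy is an assumption so that the cell still runs on a machine without a usable GPU.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
    xp = cp
except Exception:
    xp = np  # assumption: fall back to NumPy when no usable GPU is present


def to_host(a):
    """Return `a` as a NumPy array (hypothetical helper)."""
    return a.get() if hasattr(a, "get") else np.asarray(a)


numpy_a = np.array([1, 2, 3, 4, 5])
xp_a = xp.array([1, 2, 3, 4, 5])

# Same normalized-exponential (softmax) expression, once per library
numpy_g = np.exp(numpy_a) / np.sum(np.exp(numpy_a))
xp_g = xp.exp(xp_a) / xp.sum(xp.exp(xp_a))

# Compare on the host, allowing for floating-point round-off
results_match = np.allclose(numpy_g, to_host(xp_g))
dot_match = int(numpy_a.dot(numpy_a)) == int(xp_a.dot(xp_a))

print("softmax results match:", results_match)
print("dot products match:", dot_match)
```

Comparing with a tolerance (`np.allclose`) rather than exact equality is a deliberate choice here, since CPU and GPU math libraries may round floating-point results slightly differently.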

### Checking which device a CuPy array is on using `.device`


In [24]:
cupy_g.device

<CUDA Device 0>
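
Besides the `.device` attribute, an array's type can be tested with `isinstance(x, cp.ndarray)`, and `cp.get_array_module` returns the module (NumPy or CuPy) that matches its arguments. The sketch below uses both to write a function that works on either kind of array. The names `describe` and `softmax` are hypothetical helpers, and the fallback for a missing GPU is an assumption made so the example runs anywhere.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
except Exception:
    cp = None  # assumption: no usable GPU, so only the NumPy path runs


def describe(a):
    """Report where an array lives (hypothetical helper)."""
    if cp is not None and isinstance(a, cp.ndarray):
        return f"CuPy array on {a.device}"
    return "NumPy array on the host"


def softmax(a):
    """Device-agnostic softmax: the array module is chosen from the input."""
    xp = cp.get_array_module(a) if cp is not None else np
    e = xp.exp(a - xp.max(a))  # subtract the max for numerical stability
    return e / xp.sum(e)


host = np.array([1.0, 2.0, 3.0])
print(describe(host))
print(describe(softmax(host)))

if cp is not None:
    device = cp.asarray(host)
    print(describe(device))
    print(describe(softmax(device)))
```

Writing functions against `xp` instead of `np` or `cp` directly keeps the result on whichever device the input came from, which avoids accidental transfers.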

### Moving Data between Host and Device


In [13]:
# Move data to GPU
arr_gpu = cp.asarray(arr_cpu)

# Move data back to host
arr_cpu = cp.asnumpy(arr_gpu)
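
A round trip is an easy way to confirm that a transfer preserves the data. The sketch below copies an array to the GPU with `cp.asarray` and back with `cp.asnumpy`, and also shows the equivalent `.get()` method on CuPy arrays. `round_trip` is a hypothetical helper, and the no-GPU fallback (a plain host copy) is an assumption for portability.

```python
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
except Exception:
    cp = None  # assumption: no usable GPU is available


def round_trip(host_array):
    """Copy host -> device -> host and return the host copy (hypothetical helper)."""
    if cp is None:
        return host_array.copy()      # fallback: nothing to transfer
    gpu_array = cp.asarray(host_array)  # host -> device
    via_function = cp.asnumpy(gpu_array)  # device -> host
    via_method = gpu_array.get()          # same transfer, method form
    assert np.array_equal(via_function, via_method)
    return via_function


arr_cpu = np.arange(12, dtype=np.float64).reshape(3, 4)
arr_back = round_trip(arr_cpu)

print(type(arr_back).__name__)
print("data preserved:", np.array_equal(arr_cpu, arr_back))
```

Each transfer crosses the host-device link, so it is usually worth keeping data on the GPU across several operations rather than copying back after every step.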

## CuPy Implemented Functions
CuPy has equivalents for many commonly used NumPy functions, but not all. Almost all of CuPy's functions use the same call as their NumPy equivalents, for example `np.exp` and `cp.exp`, or `np.sum` and `cp.sum`.
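
One way to see which functions carry over is to check for a name in both modules. The sketch below does this for a small, hand-picked set of names; the list is an illustrative assumption, not an official compatibility table, and the check only needs CuPy to be importable (no GPU work is done).

```python
import numpy as np

try:
    import cupy as cp
except Exception:
    cp = None  # assumption: CuPy may not be installed where this runs

# Hand-picked sample of NumPy function names (illustrative only)
names = ["array", "zeros", "exp", "sum", "dot", "reshape", "transpose", "linspace"]

rows = []
for name in names:
    in_numpy = hasattr(np, name)
    in_cupy = hasattr(cp, name) if cp is not None else None  # None = unknown
    rows.append((name, in_numpy, in_cupy))
    print(f"np.{name:<10} cp.{name:<10} in CuPy: {in_cupy}")
```

A result of `None` in the last column means CuPy could not be imported, so nothing can be said about that name on the current machine.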


## CuPy vs NumPy: Speed Comparison

In [19]:
import time

# create two 1000x1000 matrices
n = 1000

a_np = np.random.rand(n, n)
b_np = np.random.rand(n, n)

a_cp = cp.asarray(a_np)
b_cp = cp.asarray(b_np)

# perform matrix multiplication with NumPy and time it
start_time = time.time()
c_np = np.dot(a_np, b_np)
end_time = time.time()

numpy_time = end_time - start_time
print("NumPy time:", numpy_time, "seconds")

# perform matrix multiplication with CuPy and time it
start_time = time.time()
c_cp = cp.dot(a_cp, b_cp)
cp.cuda.Stream.null.synchronize()  # wait for GPU computation to finish
end_time = time.time()

cupy_time = end_time - start_time

print("CuPy time:", cupy_time, "seconds")
print("CuPy provides a", round(numpy_time / cupy_time, 2), "x speedup over NumPy.")

NumPy time: 0.031229019165039062 seconds
CuPy time: 0.0006198883056640625 seconds
CuPy provides a 50.38 x speedup over NumPy.
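
A single timed call can include one-time costs such as library initialization, so the measured speedup may vary from run to run. The sketch below is one possible refinement, under stated assumptions: `best_time` is a hypothetical helper that performs a warm-up call, uses `time.perf_counter`, and reports the best of several repeats. The GPU measurement is skipped when no usable GPU is found.

```python
import time
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
except Exception:
    cp = None  # assumption: no usable GPU is available


def best_time(xp, a, b, repeats=5):
    """Best-of-`repeats` wall time for xp.dot(a, b) (hypothetical helper)."""
    on_gpu = cp is not None and xp is cp

    def sync():
        if on_gpu:
            cp.cuda.Stream.null.synchronize()  # wait for GPU work to finish

    xp.dot(a, b)  # warm-up call, not timed
    sync()

    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        xp.dot(a, b)
        sync()
        times.append(time.perf_counter() - start)
    return min(times)


n = 1000
a_np = np.random.rand(n, n)
b_np = np.random.rand(n, n)

numpy_time = best_time(np, a_np, b_np)
print("NumPy best time:", numpy_time, "seconds")

if cp is not None:
    cupy_time = best_time(cp, cp.asarray(a_np), cp.asarray(b_np))
    print("CuPy best time:", cupy_time, "seconds")
    print("Speedup:", round(numpy_time / cupy_time, 2), "x")
```

Recent CuPy releases also provide `cupyx.profiler.benchmark`, which handles warm-up and synchronization itself and may be a better fit for careful measurements.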


Now, let's make the same comparison with other array sizes:

In [20]:
for n in [10, 100, 1000, 5000, 10000]:
    print("n =", n)

    # create two nxn matrices
    a_np = np.random.rand(n, n)
    b_np = np.random.rand(n, n)
    a_cp = cp.asarray(a_np)
    b_cp = cp.asarray(b_np)

    # perform matrix multiplication with NumPy and time it
    start_time = time.time()
    c_np = np.dot(a_np, b_np)
    end_time = time.time()
    numpy_time = end_time - start_time

    # perform matrix multiplication with CuPy and time it
    start_time = time.time()
    c_cp = cp.dot(a_cp, b_cp)
    cp.cuda.Stream.null.synchronize()  # wait for GPU computation to finish
    end_time = time.time()
    cupy_time = end_time - start_time

    # print the speedup
    print("CuPy provides a", round(numpy_time / cupy_time, 2), "x speedup over NumPy.\n")

n = 10
CuPy provides a 0.28 x speedup over NumPy.

n = 100
CuPy provides a 1.45 x speedup over NumPy.

n = 1000
CuPy provides a 52.8 x speedup over NumPy.

n = 5000
CuPy provides a 64.45 x speedup over NumPy.

n = 10000
CuPy provides a 66.44 x speedup over NumPy.
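
For n = 10 the GPU is slower than the CPU, which is consistent with fixed per-call overhead (such as kernel launch latency) dominating very small workloads. The timings above also exclude the host-to-device copy done by `cp.asarray`. The sketch below, an illustration under stated assumptions, times the transfers and the computation separately; `timed` is a hypothetical helper, and the GPU part is skipped when no usable GPU is found.

```python
import time
import numpy as np

try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
except Exception:
    cp = None  # assumption: no usable GPU is available


def timed(fn):
    """Run fn() and return (result, elapsed seconds) (hypothetical helper)."""
    start = time.perf_counter()
    result = fn()
    if cp is not None:
        cp.cuda.Stream.null.synchronize()  # include pending GPU work
    return result, time.perf_counter() - start


n = 1000
a_np = np.random.rand(n, n)
b_np = np.random.rand(n, n)

c_np, numpy_time = timed(lambda: np.dot(a_np, b_np))
print("NumPy compute:", numpy_time, "seconds")

if cp is not None:
    cp.dot(cp.ones((2, 2)), cp.ones((2, 2)))  # warm-up, not timed
    (a_cp, b_cp), to_gpu = timed(lambda: (cp.asarray(a_np), cp.asarray(b_np)))
    c_cp, compute = timed(lambda: cp.dot(a_cp, b_cp))
    c_back, to_host = timed(lambda: cp.asnumpy(c_cp))
    print("host -> device:", to_gpu, "seconds")
    print("GPU compute:   ", compute, "seconds")
    print("device -> host:", to_host, "seconds")
    print("end-to-end speedup:",
          round(numpy_time / (to_gpu + compute + to_host), 2), "x")
```

If the end-to-end speedup is much lower than the compute-only speedup, the transfers are the bottleneck, and keeping intermediate results on the GPU is likely to help.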

