The following code creates a big array and operates on it using NumPy:

In [None]:
N      = 200_000_000   # number of float32 elements
ITERS  = 40
SEED   = 42

In [None]:
%%timeit
# %%timeit times the whole cell, including allocation and random fill,
# so no manual timer is needed.
import numpy as np

rng = np.random.default_rng(SEED)

# Allocate a big array: N float32 values at 4 bytes each (~800 MB for N = 200_000_000)
x = rng.random(N, dtype=np.float32)

for _ in range(ITERS):
    # use in-place ops to avoid extra allocations
    np.sin(x, out=x)
    x *= 1.000001
    x += 1e-6
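
The in-place calls in the loop matter at this size, since every temporary array would cost as much memory as x itself. As a small illustration (a sketch on a tiny array, with variable names chosen only for this example), the following compares a ufunc call that allocates a new result with one that writes into the existing buffer:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 8, dtype=np.float32)

# Out-of-place: np.sin allocates a fresh array for the result.
y = np.sin(x)
allocates_new = not np.shares_memory(x, y)

# In-place: with out=x the result is written into x's buffer,
# and the returned object is x itself.
z = np.sin(x, out=x)
reuses_buffer = z is x

# Augmented assignment also keeps the same buffer and dtype.
x *= 1.000001
x += 1e-6
same_dtype = x.dtype == np.float32

print(allocates_new, reuses_buffer, same_dtype)
```

The same pattern should carry over to CuPy, whose ufuncs also accept an out argument.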

Now, let's see how the same workload performs with CuPy.
First, we check whether an NVIDIA GPU is available:

In [None]:
!nvidia-smi

Then, we check that the CUDA toolkit is installed and which version it is:

In [None]:
!nvcc --version
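
The two shell checks above can also be done from Python, which is handy when a notebook should skip the GPU cells gracefully. This is a minimal sketch using only the standard library. The helper names tool_available and first_line are made up for this example, and finding nvidia-smi on PATH only suggests that a driver is installed, not that a usable GPU is attached:

```python
import shutil
import subprocess

def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

def first_line(cmd: list[str]) -> str:
    """Run cmd and return the first line of its stdout, or '' on failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return ""
    lines = out.stdout.strip().splitlines()
    return lines[0] if lines else ""

has_driver = tool_available("nvidia-smi")  # NVIDIA driver utilities present
has_toolkit = tool_available("nvcc")       # CUDA compiler present

print("nvidia-smi found:", has_driver)
print("nvcc found:", has_toolkit)
if has_toolkit:
    print(first_line(["nvcc", "--version"]))
```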

Before using CuPy, we need to install it. The cupy-cuda12x package is the prebuilt wheel for CUDA 12.x:

In [None]:
%pip install cupy-cuda12x

Now, let's check how much faster these NumPy operations can become with CuPy. The following code mirrors the NumPy version above, with the import changed to CuPy and np replaced by cp:

In [None]:
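%%timeit
import cupy as cp

rng = cp.random.default_rng(SEED)

# Allocate a big array on the GPU: N float32 values at 4 bytes each (~800 MB for N = 200_000_000)
x = rng.random(N, dtype=cp.float32)

for _ in range(ITERS):
    # use in-place ops to avoid extra allocations
    cp.sin(x, out=x)
    x *= 1.000001
    x += 1e-6

# CuPy launches GPU kernels asynchronously; wait for them to finish
# so the measured time covers the actual computation.
cp.cuda.Stream.null.synchronize()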
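
For a side-by-side comparison, the loop can be wrapped in a function that accepts either array module. The sketch below is one possible way to do this, not the only one: run_workload and _sync are hypothetical helper names, the array is deliberately small so the cell runs quickly anywhere, and the CuPy branch runs only if CuPy imports and the CUDA runtime reports at least one device. Because GPU kernels run asynchronously, the timer is stopped only after an explicit synchronization.

```python
import time
import numpy as np

try:
    import cupy as cp
    # Importing CuPy can succeed on a machine without a usable GPU,
    # so also ask the CUDA runtime how many devices it sees.
    gpu_ok = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp, gpu_ok = None, False

def _sync(xp):
    """Block until queued GPU work is done; a no-op for NumPy."""
    if xp is not np:
        xp.cuda.Stream.null.synchronize()

def run_workload(xp, n, iters, seed=42):
    """Time the sin/scale/shift loop with array module xp (numpy or cupy)."""
    x = xp.random.default_rng(seed).random(n, dtype=xp.float32)
    _sync(xp)  # make sure the random fill is finished before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        xp.sin(x, out=x)
        x *= 1.000001
        x += 1e-6
    _sync(xp)  # wait for the GPU before stopping the timer
    return time.perf_counter() - t0, x

# Small size so the sketch runs quickly anywhere.
cpu_seconds, cpu_x = run_workload(np, 1_000_000, 5)
print(f"NumPy: {cpu_seconds:.4f} s")
if gpu_ok:
    gpu_seconds, _ = run_workload(cp, 1_000_000, 5)
    print(f"CuPy:  {gpu_seconds:.4f} s")
```

The first CuPy call to each operation includes one-time kernel compilation, so a warm-up run before measuring gives a fairer number. At small sizes the GPU may not be faster at all, since launch overhead dominates.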