# Exercise - NumPy to CuPy - `ndarray` Basics

Let's revisit our first NumPy exercise and try porting it to CuPy.

In [15]:
import numpy as np
import cupy as cp

xp = cp

Create the input data array with the numbers `1` to `500_000_000`.

In [16]:
arr = xp.arange(1, 500_000_001)
arr

array([        1,         2,         3, ..., 499999998, 499999999,
       500000000])

Calculate how large the array is in GB with `nbytes`.

In [17]:
arr.nbytes / 1e9

4.0
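The `4.0` above follows from simple arithmetic: 500,000,000 elements times 8 bytes per 64-bit integer is 4e9 bytes. The sketch below checks the same relationship on a much smaller array. It uses NumPy so it runs without a GPU; the names `small`, `bytes_per_element`, and `total_bytes` are illustrative, and the explicit `dtype` is an assumption made so the result does not depend on the platform's default integer type.

```python
import numpy as np

xp = np  # assumption: NumPy stand-in; CuPy arrays expose the same attributes

# 1,000 integers, forced to 64 bits so each element takes 8 bytes.
small = xp.arange(1, 1_001, dtype=xp.int64)

bytes_per_element = small.itemsize          # bytes used by one element
total_bytes = small.size * bytes_per_element

# `nbytes` is the element count multiplied by the bytes per element.
assert small.nbytes == total_bytes
```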

How many dimensions does the array have?

In [18]:
arr.ndim # `len(arr.shape)` also works, but is longer to type.

1

How many elements does the array have?

In [19]:
arr.size # For a 1D array, `arr.shape[0]` also works, but `arr.size` multiplies the sizes of all dimensions.

500000000

What is the shape of the array?

In [20]:
arr.shape

(500000000,)
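The three attributes above are easier to tell apart on an array with more than one dimension. This is a minimal sketch using a hypothetical 3-by-4 array named `grid`, written with NumPy so it runs anywhere.

```python
import math

import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

grid = xp.zeros((3, 4))

n_dims = grid.ndim    # number of axes
n_elems = grid.size   # total number of elements

# `ndim` is the length of `shape`; `size` is the product of its entries.
assert n_dims == len(grid.shape)
assert n_elems == math.prod(grid.shape)
```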

Create a new array with `5_000_000` elements containing equally spaced values between `0` and `1000` (inclusive).

In [21]:
arr = xp.linspace(0, 1000, 5_000_000, endpoint=True)
arr

array([0.0000000e+00, 2.0000004e-04, 4.0000008e-04, ..., 9.9999960e+02,
       9.9999980e+02, 1.0000000e+03])
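With `endpoint=True`, the spacing between neighbouring values is `(stop - start) / (num - 1)`, which is why the second value above is roughly `2.0000004e-04` (that is, `1000 / 4_999_999`) rather than exactly `2e-04`. The sketch below shows the same formula on a small case; `start`, `stop`, `num`, and `step` are illustrative names.

```python
import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

start, stop, num = 0, 1000, 11
values = xp.linspace(start, stop, num, endpoint=True)

# Both endpoints are included, so there are `num - 1` gaps between values.
step = (stop - start) / (num - 1)

# Every gap between consecutive values has the same width.
assert xp.allclose(xp.diff(values), step)
```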

Create a random array that is `10_000` by `5_000`.

In [22]:
arr = xp.random.rand(10_000, 5_000)
arr

array([[0.30388162, 0.15822512, 0.00286212, ..., 0.61119194, 0.24825195,
        0.21273523],
       [0.34926561, 0.6586961 , 0.88773327, ..., 0.13818685, 0.43676093,
        0.81181609],
       [0.72333416, 0.86124634, 0.59755301, ..., 0.23371152, 0.28002043,
        0.69306837],
       ...,
       [0.92018622, 0.54895313, 0.01193115, ..., 0.98508866, 0.50779427,
        0.48790585],
       [0.48466694, 0.86603534, 0.45511161, ..., 0.10740881, 0.48809623,
        0.36465386],
       [0.57836419, 0.65872789, 0.46128811, ..., 0.44591221, 0.28785918,
        0.55808207]])

Sort that array. By default, `sort` orders values along the last axis, so each row is sorted independently.

In [23]:
arr = xp.sort(arr)
arr

array([[7.39767191e-05, 2.96843576e-04, 6.12392604e-04, ...,
        9.99809127e-01, 9.99848644e-01, 9.99904338e-01],
       [3.36582828e-04, 4.44429575e-04, 1.05085069e-03, ...,
        9.99596910e-01, 9.99840365e-01, 9.99869106e-01],
       [1.66456844e-04, 3.18810953e-04, 3.64184564e-04, ...,
        9.99592186e-01, 9.99746421e-01, 9.99968319e-01],
       ...,
       [3.82024716e-05, 1.41508506e-04, 1.56391343e-04, ...,
        9.99706703e-01, 9.99875133e-01, 9.99977863e-01],
       [1.15853027e-04, 4.38459673e-04, 5.02324922e-04, ...,
        9.99388580e-01, 9.99629079e-01, 9.99832854e-01],
       [5.14081430e-05, 1.49291079e-04, 1.86902801e-04, ...,
        9.99762622e-01, 9.99903318e-01, 9.99966381e-01]])
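Each row of the output above is in ascending order, but the rows are not ordered relative to one another, because the sort runs along the last axis. A small sketch of how the `axis` argument changes the result, using a hypothetical 2-by-3 array `m`:

```python
import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

m = xp.array([[3, 1, 2],
              [0, 7, 5]])

# Default (axis=-1): sort the values within each row.
by_row = xp.sort(m)

# axis=0: sort the values within each column instead.
by_col = xp.sort(m, axis=0)
```

Here `by_row` is `[[1, 2, 3], [0, 5, 7]]` and `by_col` is `[[0, 1, 2], [3, 7, 5]]`.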

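Let's benchmark CuPy's sort against NumPy's sort. Note that `arr` was already sorted in the previous cell while `arr_np` is fresh random data, so treat the numbers as a rough comparison rather than a like-for-like measurement.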

In [24]:
arr_np = np.random.rand(10_000, 5_000)

%timeit np.sort(arr_np)
%timeit cp.sort(arr)

439 ms ± 41.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
137 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Reshape the CuPy array to have the last dimension of length `5`.

In [25]:
arr = arr.reshape((-1, 5))
# -1 will infer the size of that dimension from the rest. Would also accept: arr.reshape((10_000_000, 5))
arr

array([[7.39767191e-05, 2.96843576e-04, 6.12392604e-04, 8.04477820e-04,
        8.86757439e-04],
       [9.81560029e-04, 1.03055077e-03, 1.37487574e-03, 1.44260585e-03,
        1.51815177e-03],
       [2.08492668e-03, 2.42945307e-03, 2.46300063e-03, 2.80049017e-03,
        2.86212441e-03],
       ...,
       [9.95224845e-01, 9.96613791e-01, 9.97715213e-01, 9.98106364e-01,
        9.98446307e-01],
       [9.98664727e-01, 9.98694409e-01, 9.98727565e-01, 9.99084565e-01,
        9.99629672e-01],
       [9.99639734e-01, 9.99688036e-01, 9.99762622e-01, 9.99903318e-01,
        9.99966381e-01]])
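The `-1` works because the total element count is fixed: 10,000 times 5,000 is 50,000,000 elements, and dividing by a last dimension of 5 leaves 10,000,000 rows. A minimal sketch of the same inference on a 20-element array (the names `flat` and `rows_of_five` are illustrative):

```python
import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

flat = xp.arange(20)

# -1 is replaced by whatever length makes the element count match: 20 // 5 = 4.
rows_of_five = flat.reshape((-1, 5))

# Reshaping keeps the same elements in the same order.
assert rows_of_five.size == flat.size
```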

Find the sum of each row. Rows are indexed along axis 0, but summing a row means adding up its values across the columns, so the reduction runs along axis 1.

In [26]:
arr_sum = xp.sum(arr, axis=1) # You could also write `arr.sum(axis=1)`.
arr_sum

array([2.67444816e-03, 6.34774416e-03, 1.26399950e-02, ...,
       4.98610652e+00, 4.99480094e+00, 4.99896009e+00])
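One way to remember the axis convention is that `axis` names the dimension that disappears from the result. The sketch below compares both choices on a hypothetical 2-by-3 array `m`.

```python
import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

m = xp.array([[1, 2, 3],
              [4, 5, 6]])

# axis=1 collapses the columns: one total per row, shape (2,).
row_sums = xp.sum(m, axis=1)

# axis=0 collapses the rows: one total per column, shape (3,).
col_sums = xp.sum(m, axis=0)
```

Here `row_sums` is `[6, 15]` and `col_sums` is `[5, 7, 9]`.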

Normalize each row of the reshaped array by dividing it by the row sums you just computed, using broadcasting.

In [27]:
arr_normalized = arr / arr_sum[:, xp.newaxis]
arr_normalized

array([[0.02766055, 0.11099246, 0.22897905, 0.30080143, 0.33156651],
       [0.15463132, 0.16234913, 0.21659281, 0.22726276, 0.23916398],
       [0.1649468 , 0.19220364, 0.19485772, 0.22155786, 0.22643398],
       ...,
       [0.1995996 , 0.19987816, 0.20009906, 0.20017751, 0.20024568],
       [0.19994085, 0.19994679, 0.19995343, 0.2000249 , 0.20013404],
       [0.19996954, 0.1999792 , 0.19999412, 0.20002226, 0.20003488]])
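The `[:, xp.newaxis]` step matters because of shapes: the row sums have shape `(n,)`, and broadcasting aligns trailing dimensions, so they need shape `(n, 1)` before they can be stretched across the columns. A small sketch with a hypothetical 3-by-2 array `m`:

```python
import numpy as np

xp = np  # assumption: NumPy stand-in for the notebook's `xp`

m = xp.array([[1.0, 3.0],
              [2.0, 2.0],
              [1.0, 4.0]])

row_sums = m.sum(axis=1)            # shape (3,)
as_column = row_sums[:, xp.newaxis]  # shape (3, 1)

# (3, 2) / (3, 1): the single column of sums is broadcast across both columns.
normalized = m / as_column
```

An equivalent option is `m.sum(axis=1, keepdims=True)`, which returns the `(3, 1)` shape directly.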

Verify that your normalized array is actually normalized by checking that every row sums to 1.

If we try to use `np.testing.assert_allclose` with CuPy arrays, we get an error because CuPy arrays don't implicitly convert to NumPy arrays. We'll learn more about this in the next section.

In [28]:
xp.testing.assert_allclose(xp.sum(arr_normalized, axis=1), 1.0)
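The check uses `assert_allclose` rather than exact equality because floating-point sums rarely land on exactly `1.0`. The sketch below illustrates the difference with NumPy's version of the function, whose default relative tolerance is `1e-07`; the names `total` and `exact_match` are illustrative.

```python
import numpy as np

# Rounding error: this sum is not exactly 0.3 in binary floating point.
total = 0.1 + 0.2

# An exact comparison fails because of that rounding error.
exact_match = (total == 0.3)

# A tolerance-based comparison passes; it raises AssertionError otherwise.
np.testing.assert_allclose(total, 0.3)
```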