NumPy provides functions for aggregation. Each takes an array as an argument and returns either a scalar or another array. Most of them also have counterparts implemented as methods of the ndarray class.

First let's create a 2-dimensional array and then demonstrate some of the functions used for aggregation:

In [2]:
import numpy as np
A = np.arange(1, 13).reshape(4, 3)
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [4]:
# The first function we're going to talk about is np.sum. It sums all the elements in the array and returns a scalar.
np.sum(A)

78

In [5]:
# Alternatively, we can use the method:
A.sum()

78

In [7]:
# We can also aggregate over a specific axis. With axis 1 the sum runs across the columns,
# so we get an array with one sum per row.
A.sum(axis = 1)

array([ 6, 15, 24, 33])

In [8]:
# And now one sum per column (axis 0):
A.sum(axis = 0)

array([22, 26, 30])
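
One way to keep the axis numbers straight is to look at shapes: the axis we pass is the one that disappears from the result. The sketch below reuses the same array. The keepdims argument is not used elsewhere in this section and appears here only as an illustration, and the names row_sums_2d and shares are made up for the example.

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)

row_sums = A.sum(axis=1)   # axis 1 (columns) is collapsed: one sum per row
col_sums = A.sum(axis=0)   # axis 0 (rows) is collapsed: one sum per column
print(row_sums.shape)      # (4,)
print(col_sums.shape)      # (3,)

# keepdims=True keeps the collapsed axis with length 1,
# so the result still broadcasts against the original array.
row_sums_2d = A.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)   # (4, 1)

# Share of each element in its row total.
shares = A / row_sums_2d
print(shares.sum(axis=1))  # every row adds up to 1, up to floating-point rounding
```

Keeping the dimension is mostly useful when the aggregate is combined with the original array again, as in the division above.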

The other functions and methods work the same way. Here are some examples:

In [9]:
# Let's say we need the average of all the elements in each row:
A.mean(axis = 1)

array([ 2.,  5.,  8., 11.])

In [10]:
# And now the product of all the elements in each column:
A.prod(axis = 0)

array([ 280,  880, 1944])

In [12]:
# And now the standard deviation for each column.
A.std(axis = 0)

array([3.35410197, 3.35410197, 3.35410197])
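
Note that std divides by the number of elements by default (the population standard deviation, ddof=0). If the sample standard deviation is needed instead, the ddof argument can be set to 1. A small sketch comparing the two on the same array; the variable names are only for illustration.

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)

# Default: ddof=0, the squared deviations are divided by N (here N = 4).
population_std = A.std(axis=0)

# ddof=1: the squared deviations are divided by N - 1 (here 3).
sample_std = A.std(axis=0, ddof=1)

print(population_std)
print(sample_std)
```

With only four values per column the two results differ noticeably; the gap shrinks as the number of elements grows.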

In [13]:
# We can also aggregate cumulatively. Without an axis argument, cumsum flattens the array and returns the running sum of all its elements.
A.cumsum()

array([ 1,  3,  6, 10, 15, 21, 28, 36, 45, 55, 66, 78], dtype=int32)

In [15]:
# And here are the cumulative products by column:
A.cumprod(axis = 0)

array([[   1,    2,    3],
       [   4,   10,   18],
       [  28,   80,  162],
       [ 280,  880, 1944]], dtype=int32)
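
The cumulative methods tie back to the plain aggregations: when an axis is given, the result keeps the shape of the input, and the last entry along that axis equals the total for that row or column. A short sketch on the same array, with illustrative variable names:

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)

# Running sum along each row; the shape stays (4, 3).
running_sums = A.cumsum(axis=1)
print(running_sums)

# The last column of the running sums matches the row totals.
print(np.array_equal(running_sums[:, -1], A.sum(axis=1)))

# Likewise, the last row of the column-wise running products
# matches the column products.
running_prods = A.cumprod(axis=0)
print(np.array_equal(running_prods[-1], A.prod(axis=0)))
```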

In [16]:
# We also have the np.min and np.max functions for the minimum and maximum value. Here are the minimum values in each row:
np.min(A, axis = 1)

array([ 1,  4,  7, 10])

In [17]:
# The np.argmin and np.argmax functions, on the other hand, return the index of the minimum and maximum value respectively.
# Without an axis argument, the index refers to the flattened array:
A.argmax()

11
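
The result 11 is a position in the flattened array, not a (row, column) pair. If the 2-dimensional position is needed, np.unravel_index can convert it. The sketch below also shows argmax and argmin with an axis argument, which return one index per row or column; the variable names are only for illustration.

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)

flat_index = A.argmax()                            # position in the flattened array
row, col = np.unravel_index(flat_index, A.shape)   # convert to (row, column)
print(flat_index, row, col)                        # 11 3 2
print(A[row, col])                                 # 12

# With an axis, we get one index per row or per column.
max_col_per_row = A.argmax(axis=1)   # column index of the maximum in each row
min_row_per_col = A.argmin(axis=0)   # row index of the minimum in each column
print(max_col_per_row)               # [2 2 2 2]
print(min_row_per_col)               # [0 0 0]
```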

In [18]:
# Finally, two very frequently used functions: np.all, which returns True if all elements are nonzero, and np.any,
# which returns True if any of the elements is nonzero.

# In our array all elements are nonzero, so both methods return True:
A.all()

True

In [19]:
A.any()

True
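
Since every element of A is nonzero, the two methods cannot be told apart on it. The sketch below uses a second array, B, which is made up for this illustration and contains zeros. It also shows a common pattern: applying all and any to the Boolean array produced by a comparison.

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)
B = np.array([[0, 0, 0], [1, 0, 2]])   # hypothetical array with some zeros

print(B.all())          # False: at least one element is zero
print(B.any())          # True: at least one element is nonzero
print(B.any(axis=1))    # [False  True]: the first row contains only zeros

# Comparisons return Boolean arrays, which all and any reduce the same way.
all_positive = (A > 0).all()     # is every element of A greater than 0?
any_above_12 = (A > 12).any()    # is any element of A greater than 12?
print(all_positive, any_above_12)   # True False
```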

EXERCISE

Here's an array:

X = np.array([[4, 0, 6], [2, 7, 7], [1, 1, 2]])

First find the average value of each row, then check for each column whether all of its elements are nonzero.

SOLUTION

In [22]:
X = np.array([[4, 0, 6], [2, 7, 7], [1, 1, 2]])
X

array([[4, 0, 6],
       [2, 7, 7],
       [1, 1, 2]])

In [23]:
X.mean(axis = 1)

array([3.33333333, 5.33333333, 1.33333333])

In [24]:
X.all(axis = 0)

array([ True, False,  True])