In [1]:
import numpy as np

# Aggregate Functions

These functions are useful when we wish to summarise the information contained in an array.

For binary ufuncs, some useful aggregates can be computed directly from the ufunc object. For example, to reduce an array with a particular operation, we can use the reduce method of any binary ufunc. A reduction repeatedly applies the operation to the elements of an array until only a single result remains.

For example, calling reduce on the add ufunc returns the sum of all elements in the array:

In [2]:
x = np.arange(1, 6)
np.add.reduce(x)

15
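The same reduce method works with other binary ufuncs. Here is a minimal sketch that multiplies all the elements together and then reduces a small 2-D array (a hypothetical `grid`) along a chosen axis. The `axis` argument of `reduce` defaults to 0.

```python
import numpy as np

x = np.arange(1, 6)

# Product of all elements: 1 * 2 * 3 * 4 * 5
total_product = np.multiply.reduce(x)
print(total_product)  # 120

# On a 2-D array, reduce collapses one axis (axis=0 by default)
grid = np.arange(1, 7).reshape(2, 3)      # [[1, 2, 3], [4, 5, 6]]
col_sums = np.add.reduce(grid)            # collapse the rows -> [5, 7, 9]
row_sums = np.add.reduce(grid, axis=1)    # collapse the columns -> [6, 15]
print(col_sums, row_sums)
```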

If we’d like to store all the intermediate results of the computation, we can instead use accumulate:

In [None]:
np.add.accumulate(x)
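For `x = np.arange(1, 6)`, `np.add.accumulate` yields the running totals, and its last entry equals the result of reduce. The sketch below also tries `np.multiply.accumulate` (running products) and compares the result with `np.cumsum`, which gives the same output for addition on a 1-D array.

```python
import numpy as np

x = np.arange(1, 6)

running_sum = np.add.accumulate(x)        # [1, 3, 6, 10, 15]
running_prod = np.multiply.accumulate(x)  # [1, 2, 6, 24, 120]

# The final accumulated value is the same as the full reduction
print(running_sum[-1] == np.add.reduce(x))        # True
# np.cumsum gives the same running totals
print(np.array_equal(running_sum, np.cumsum(x)))  # True
```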

### outer 
Outer products

Finally, any binary ufunc can compute the output for all pairs of elements from two inputs using the outer method. This lets you, in one line, build things like a multiplication table:

In [3]:
x = np.arange(1, 6)
np.multiply.outer(x, x)

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
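The outer method is not limited to `multiply`. In the sketch below, the same table is rebuilt with broadcasting by turning one copy of `x` into a column. `np.add.outer` then produces an addition table.

```python
import numpy as np

x = np.arange(1, 6)

table = np.multiply.outer(x, x)

# Equivalent broadcasting form: column vector (5, 1) times row vector (5,)
table_bcast = x[:, np.newaxis] * x
print(np.array_equal(table, table_bcast))  # True

# Addition table: entry [i, j] is x[i] + x[j]
add_table = np.add.outer(x, x)
print(add_table[0])  # [2 3 4 5 6]
```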

### inner

In [None]:
vector = np.array([1,2,3,4], int)
matrix1 = np.array([[1,2,3], [4,5,6], [7,8,9]], int)
matrix2 = np.array([[1,1,1], [0,0,0], [1,1,1]], int)

print("Inner of Matrix 1 and Matrix 2\n {}".format(np.inner(matrix1, matrix2)))
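For 2-D arrays, `np.inner` sums products over the last axis of both inputs. Entry `[i, j]` is therefore the dot product of row `i` of the first matrix with row `j` of the second. The sketch below checks this against `matrix1 @ matrix2.T`. This transpose-based equivalence is stated here for 2-D inputs only.

```python
import numpy as np

matrix1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix2 = np.array([[1, 1, 1], [0, 0, 0], [1, 1, 1]])

inner = np.inner(matrix1, matrix2)

# Same result as multiplying by the transpose of the second matrix
print(np.array_equal(inner, matrix1 @ matrix2.T))  # True

# For 1-D vectors, np.inner is the ordinary dot product
print(np.inner([1, 2, 3], [4, 5, 6]))  # 32
```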

### cross

In [None]:
vector = np.array([1,2,3,4], int)
matrix1 = np.array([[1,2,3], [4,5,6], [7,8,9]], int)
matrix2 = np.array([[1,1,1], [0,0,0], [1,1,1]], int)

# Cross operator
print("Cross of Matrix 1 and Matrix 2\n {}".format(np.cross(matrix1, matrix2)))
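When given two matrices, `np.cross` treats each row as a 3-D vector and returns the row-wise cross products. The sketch below uses a single pair of vectors (the first rows of `matrix1` and `matrix2`). It checks the defining property that the result is perpendicular to both inputs, and shows that swapping the arguments flips the sign.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, 1, 1])

w = np.cross(u, v)
print(w)  # [-1  2 -1]

# The cross product is orthogonal to both inputs, so both dot products are 0
print(np.dot(w, u), np.dot(w, v))  # 0 0

# Swapping the arguments flips the sign
print(np.array_equal(np.cross(v, u), -w))  # True
```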

## Aggregations: Min, Max, and Everything in Between
As a quick example, consider computing the sum of all values in an array. Python itself can do this using the built-in sum function:

In [4]:
import numpy as np
L = np.random.random(100)
sum(L)

54.48174277460942

The syntax is quite similar to that of NumPy’s sum function, and the result is the same in the simplest case, apart from floating-point rounding in the last few digits:

In [5]:
np.sum(L)

54.481742774609394
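The two sums above differ only in the last digits because the additions happen in a different order. The built-in `sum` adds left to right, while `np.sum` typically uses pairwise summation. The minimal sketch below compares the two with a tolerance rather than exact equality. The seed is arbitrary, and `math.fsum` is used as a correctly rounded reference.

```python
import math
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
L = rng.random(100)

py_total = sum(L)       # left-to-right addition in Python
np_total = np.sum(L)    # NumPy's compiled reduction

# Compare with a tolerance instead of ==
print(np.isclose(py_total, np_total))  # True
# math.fsum returns a correctly rounded sum to compare against
print(abs(math.fsum(L) - np_total) < 1e-12)
```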

However, because it executes the operation in compiled code, NumPy’s version of the operation is computed much more quickly:

In [6]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

162 ms ± 42.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.35 ms ± 50.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [7]:
%timeit min(big_array)
%timeit np.min(big_array)

57.4 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
588 µs ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!

Additionally, most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point NaN value. Some of these NaN-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.
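To see the difference, here is a small sketch with one missing value. The ordinary aggregates propagate `NaN`, while the NaN-safe versions skip it.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 4.0])

print(np.sum(data))      # nan: a single NaN propagates to the result
print(np.nansum(data))   # 8.0: the NaN is ignored
print(np.nanmean(data))  # mean of the 3 valid values, 8.0 / 3
print(np.nanmax(data))   # 4.0
```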

**Aggregation functions available in NumPy**

|Function Name |NaN-safe Version |Description|
|:---|:---|:---|
|np.sum| np.nansum |Compute sum of elements|
|np.prod| np.nanprod |Compute product of elements|
|np.mean| np.nanmean |Compute mean of elements|
|np.std| np.nanstd |Compute standard deviation|
|np.var| np.nanvar |Compute variance|
|np.min| np.nanmin |Find minimum value|
|np.max| np.nanmax |Find maximum value|
|np.argmin| np.nanargmin |Find index of minimum value|
|np.argmax| np.nanargmax |Find index of maximum value|
|np.median| np.nanmedian |Compute median of elements|
|np.percentile| np.nanpercentile |Compute rank-based statistics of elements|
|np.any| N/A |Evaluate whether any elements are true|
|np.all| N/A |Evaluate whether all elements are true|
|np.corrcoef| N/A |Compute correlation coefficients|
|np.cumsum| np.nancumsum |Compute cumulative sum of elements|
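The sketch below tries a few of the less obvious entries in the table on a small, made-up array of scores.

```python
import numpy as np

scores = np.array([72, 95, 61, 88, 95])

print(np.argmax(scores))          # 1: index of the first occurrence of the maximum
print(np.percentile(scores, 50))  # 88.0: the 50th percentile is the median
print(np.any(scores < 65))        # True: at least one score is below 65
print(np.all(scores > 60))        # True: every score is above 60
print(np.cumsum(scores))          # running totals
```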

In [9]:
%timeit min(big_array)
%timeit np.min(big_array)
%timeit big_array.min()

56.1 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
580 µs ± 56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
533 µs ± 9.36 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [None]:
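a = np.array([-4, -2, 1, 3, 5])
print(a.max())       # largest value
print(a.min())       # smallest value
print(a.mean())      # arithmetic mean
print(a.std())       # standard deviation
print(a.argmax())    # `argmax` and `argmin` return the index of the maximum and minimum values in the array.
print(a.argmin())
print(np.median(a))  # ndarray has no .median() method; use the np.median function
print(a.sum())
# Axis-wise sums need a 2-D array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.sum(axis=0))  # sum of each column
print(b.sum(axis=1))  # sum of each row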

In [None]:
# Array-wise sum
a.sum()

In [None]:
# Array-wise minimum value
a.min()

In [None]:
# Maximum value of each column (axis=0 collapses the rows)
b.max(axis=0)

In [None]:
# Cumulative sum along each row (axis=1)
b.cumsum(axis=1)

In [None]:
# Mean
a.mean()

In [None]:
# Median (ndarray has no .median() method, so use the np.median function)
np.median(b)

In [None]:
# Correlation coefficients between the rows of b (ndarray has no .corrcoef() method)
np.corrcoef(b)

In [None]:
# Standard deviation
np.std(b) 

# Linear Algebra
#### Apply Basic Linear Algebra Operations

NumPy provides the ```np.linalg``` package to apply common linear algebra operations, such as:
* ```np.linalg.inv```: Inverse of a matrix
* ```np.linalg.det```: Determinant of a matrix
* ```np.linalg.eig```: Eigenvalues and eigenvectors of a matrix
    
You can also multiply matrices using ```np.dot(a, b)``` or the ```@``` operator.


In [12]:
# np.linalg documentation
help(np.linalg)

Help on package numpy.linalg in numpy:

NAME
    numpy.linalg

DESCRIPTION
    Core Linear Algebra Tools
    -------------------------
    Linear algebra basics:
    
    - norm            Vector or matrix norm
    - inv             Inverse of a square matrix
    - solve           Solve a linear system of equations
    - det             Determinant of a square matrix
    - lstsq           Solve linear least-squares problem
    - pinv            Pseudo-inverse (Moore-Penrose) calculated using a singular
                      value decomposition
    - matrix_power    Integer power of a square matrix
    
    Eigenvalues and decompositions:
    
    - eig             Eigenvalues and vectors of a square matrix
    - eigh            Eigenvalues and eigenvectors of a Hermitian matrix
    - eigvals         Eigenvalues of a square matrix
    - eigvalsh        Eigenvalues of a Hermitian matrix
    - qr              QR decomposition of a matrix
    - svd             Singular value decomposition 

In [13]:
# Creating arrays
a = np.arange(1, 10).reshape(3, 3)
b = np.arange(1, 13).reshape(3, 4)
print(a)
print(b)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [14]:
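# Inverse. Caution: `a` is singular (its determinant is 0), so the huge values
# below are floating-point noise, not a true inverse; inv may also raise LinAlgError here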
np.linalg.inv(a)

array([[ 3.15251974e+15, -6.30503948e+15,  3.15251974e+15],
       [-6.30503948e+15,  1.26100790e+16, -6.30503948e+15],
       [ 3.15251974e+15, -6.30503948e+15,  3.15251974e+15]])
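With an invertible matrix, `np.linalg.inv` gives a result you can trust. The sketch below uses a small 2x2 matrix chosen for illustration and checks that its product with the inverse is the identity up to rounding. It also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly.

```python
import numpy as np

m = np.array([[2.0, 1.0], [1.0, 3.0]])   # det = 2*3 - 1*1 = 5, so invertible

m_inv = np.linalg.inv(m)

# m @ m_inv should be the 2x2 identity, up to floating-point error
print(np.allclose(m @ m_inv, np.eye(2)))  # True

# Solve m @ x = rhs directly instead of computing m_inv @ rhs
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(m, rhs)
print(x)
```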

In [15]:
# Determinant (mathematically 0 for this matrix; the tiny nonzero value is rounding error)
np.linalg.det(a)

-9.51619735392994e-16
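Because of rounding, the determinant of a singular matrix often comes out as a tiny nonzero number like the one above rather than exactly 0. The sketch below shows two safer checks: comparing against 0 with a tolerance, or computing the rank with `np.linalg.matrix_rank`.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

det = np.linalg.det(a)
# Treat tiny values as zero rather than testing det == 0
print(np.isclose(det, 0.0))  # True

# Rank 2 < 3: the rows are linearly dependent (row 3 = 2 * row 2 - row 1)
print(np.linalg.matrix_rank(a))  # 2
```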

In [16]:
# Eigenvalues and eigenvectors (the third eigenvalue, about -1e-15, is 0 up to rounding)
np.linalg.eig(a)

(array([ 1.61168440e+01, -1.11684397e+00, -9.75918483e-16]),
 array([[-0.23197069, -0.78583024,  0.40824829],
        [-0.52532209, -0.08675134, -0.81649658],
        [-0.8186735 ,  0.61232756,  0.40824829]]))
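`np.linalg.eig` returns the eigenvalues and a matrix whose columns are the eigenvectors. The sketch below checks the defining relation `a @ v == lam * v` for every pair at once, using broadcasting to scale each column by its own eigenvalue.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
eigenvalues, eigenvectors = np.linalg.eig(a)

# Column i of `eigenvectors` pairs with eigenvalues[i];
# broadcasting multiplies column i by eigenvalues[i]
print(np.allclose(a @ eigenvectors, eigenvectors * eigenvalues))  # True

# Check a single pair explicitly
v0 = eigenvectors[:, 0]
print(np.allclose(a @ v0, eigenvalues[0] * v0))  # True
```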

In [17]:
# Multiply matrices
np.dot(a, b)

array([[ 38,  44,  50,  56],
       [ 83,  98, 113, 128],
       [128, 152, 176, 200]])
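For 2-D arrays, `np.dot` performs matrix multiplication, so the inner dimensions must match. The (3, 3) `a` times the (3, 4) `b` therefore gives a (3, 4) result. The sketch below shows the equivalent `@` operator, and shows that reversing the order fails because the shapes no longer line up.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
b = np.arange(1, 13).reshape(3, 4)

product = a @ b                                # same as np.dot(a, b) for 2-D arrays
print(np.array_equal(product, np.dot(a, b)))   # True
print(product.shape)                           # (3, 4)

# (3, 4) @ (3, 3): inner dimensions 4 and 3 differ, so NumPy raises ValueError
try:
    b @ a
except ValueError as err:
    print("Shape mismatch:", err)
```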