Credits: The following are notes taken while working through Machine Learning with Python Cookbook by Chris Albon


This notebook covers the most common NumPy operations we are likely to run into while working on machine learning workflows.

# Creating a Vector

* A vector has magnitude (size) and direction:

/Solution/ - Use NumPy to create a one-dimensional array

In [2]:
# Load library
import numpy as np

In [4]:
# Create a vector as a row
vector_row = np.array([1, 2, 3])
vector_row

array([1, 2, 3])

In [7]:
# Create a vector as a column
vector_column = np.array([[1],[2],[3]])
vector_column

array([[1],
       [2],
       [3]])

# Creating a Matrix

* In mathematics, a matrix (plural: matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns.

/Solution/ - Use NumPy to create a two-dimensional array

NumPy also has a dedicated matrix data structure (shown in the last cell of this section). However, it is not recommended for two reasons:
 * First, arrays are the de facto standard data structure of NumPy (a de facto standard is a convention that has become dominant through widespread use rather than a formal rule).
 * Second, the vast majority of NumPy operations return arrays, not matrix objects.

In [8]:
# Load library
import numpy as np

In [11]:
# Create a matrix
matrix = np.array([[1,2],[1,2],[1,2]])
matrix

array([[1, 2],
       [1, 2],
       [1, 2]])

In [13]:
# NumPy also has a dedicated matrix data structure
# (np.mat was removed in NumPy 2.0; np.asmatrix works in old and new versions)
matrix = np.asmatrix([[1,2],[1,2],[1,2]])
matrix

matrix([[1, 2],
        [1, 2],
        [1, 2]])

# Creating a Sparse Matrix

For example, imagine a matrix where the columns are every movie on Netflix, the rows are every Netflix user, and the values are how many times a user has watched that particular movie. This matrix would have tens of thousands of columns and millions of rows! However, since most users do not watch most movies, the vast majority of elements would be zero.

/Problem/ - Given data with very few nonzero values, you want to efficiently represent it.

/Solution/ - Create a sparse matrix

In [14]:
# Load libraries
import numpy as np
from scipy import sparse

In [15]:
# Create a matrix
matrix = np.array([[0, 0],
                   [0, 1],
                   [3, 0]])

In [19]:
# Create compressed sparse row (CSR) matrix
matrix_sparse = sparse.csr_matrix(matrix)
matrix_sparse

<3x2 sparse matrix of type '<class 'numpy.int64'>'
	with 2 stored elements in Compressed Sparse Row format>

In [24]:
# View sparse matrix
print(matrix_sparse)

  (1, 1)	1
  (2, 0)	3


In [21]:
# Create larger matrix
matrix_large = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                         [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                         [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [27]:
# Create compressed sparse row (CSR) matrix
matrix_large_sparse = sparse.csr_matrix(matrix_large)
matrix_large_sparse

<3x10 sparse matrix of type '<class 'numpy.int64'>'
	with 2 stored elements in Compressed Sparse Row format>

In [26]:
# View larger sparse matrix
print(matrix_large_sparse)

  (1, 1)	1
  (2, 0)	3
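
To make the efficiency claim concrete, here is a minimal sketch that compares the memory used by a dense array with the memory used by its CSR version. The 1,000 x 1,000 size and the variable names are assumptions chosen for illustration; the byte counts depend on the platform's integer width, so no exact figures are quoted.

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix: 1,000 x 1,000 with only two nonzero entries
dense = np.zeros((1000, 1000))
dense[1, 1] = 1
dense[2, 0] = 3

# Convert to compressed sparse row (CSR) format
dense_sparse = sparse.csr_matrix(dense)

# Number of explicitly stored (nonzero) values
stored = dense_sparse.nnz

# Bytes used by the dense array vs. the three arrays backing the CSR matrix
dense_bytes = dense.nbytes
sparse_bytes = (dense_sparse.data.nbytes
                + dense_sparse.indices.nbytes
                + dense_sparse.indptr.nbytes)

# Converting back to a dense array recovers the original values
round_trip = dense_sparse.toarray()
```

Adding more zero-valued columns increases `dense_bytes` but leaves `stored` unchanged, which matches the two examples above where both matrices report 2 stored elements.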


# Selecting Elements

/Problem/ - You need to select one or more elements in a vector or matrix.

/Solution/ - Use NumPy array indexing and slicing

In [28]:
# Load library
import numpy as np

In [31]:
# Create row vector
vector = np.array([1, 2, 3, 4, 5, 6])

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])


In [32]:
# Select third element of vector
vector[2]

3

In [34]:
# Select second row, second column from matrix
matrix[1,1]

5

NumPy arrays are zero-indexed.

NumPy offers a wide variety of methods for selecting (i.e., indexing and slicing) elements or groups of elements in arrays.

In [35]:
# Select all elements of a vector
vector[:]

array([1, 2, 3, 4, 5, 6])

In [36]:
# Select everything up to and including the third element
vector[:3]

array([1, 2, 3])

In [37]:
# Select everything after the third element
vector[3:]

array([4, 5, 6])

In [38]:
# Select the last element
vector[-1]

6

In [39]:
# Select the first two rows and all columns of a matrix
matrix[:2,:]

array([[1, 2, 3],
       [4, 5, 6]])

In [42]:
# Select all rows and the second column
matrix[:,1:2]

array([[2],
       [5],
       [8]])
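
Two further selection styles are worth knowing alongside slices: boolean masks and integer-array indexing. The sketch below also shows why `matrix[:,1:2]` returns a column-shaped result while a plain integer index does not. The variable names are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Slicing with 1:2 keeps the column axis, so the result is 2D with shape (3, 1)
column_2d = matrix[:, 1:2]

# Indexing with a plain integer drops that axis, giving a 1D array of shape (3,)
column_1d = matrix[:, 1]

# Boolean mask: select every element greater than 5
greater_than_five = matrix[matrix > 5]

# Integer-array indexing: select the first and third rows
first_and_third_rows = matrix[[0, 2], :]
```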

# Describing a Matrix ( shape, size, and dimensions )

/Solution/ - Use shape, size, and ndim

In [43]:
# Load library
import numpy as np

In [44]:
# Create matrix
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

In [45]:
# View number of rows and columns
matrix.shape

(3, 4)

In [46]:
# View number of elements (rows * columns)
matrix.size

12

In [47]:
# View number of dimensions
matrix.ndim

2

# Applying Operations to Elements

/Problem/ - You want to apply some function to multiple elements in an array.

/Solution/ - Use NumPy’s vectorize

NumPy’s vectorize class converts a function into a function that can apply to all elements in an array or slice of an array.

It’s worth noting that vectorize is essentially a for loop over the elements and does not increase performance.

In [48]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

In [54]:
# Create function that adds 100 to something
add_100 = lambda i: i + 100

In [53]:
# Equivalent function written with def
def add_100(i):
    v = i + 100
    return v
add_100(20)

120

In [58]:
# Create vectorized function
vectorized_add_100 = np.vectorize(add_100)

In [59]:
# Apply function to all elements in matrix
vectorized_add_100(matrix)

array([[101, 102, 103],
       [104, 105, 106],
       [107, 108, 109]])

NumPy arrays allow us to perform operations between arrays even if their dimensions are not the same (a process called broadcasting). 

For example, we can create a much simpler version of our solution using broadcasting:

In [None]:
# Add 100 to all elements
matrix + 100
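
Broadcasting is not limited to scalars. As a rough sketch, the example below adds a scalar, a row of shape (3,), and a column of shape (3, 1) to the same 3x3 matrix. The `row` and `column` arrays are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Scalar: 100 is stretched to every element
plus_scalar = matrix + 100

# Shape (3,) row: added to each row of the matrix
row = np.array([10, 20, 30])
plus_row = matrix + row

# Shape (3, 1) column: added to each column of the matrix
column = np.array([[100], [200], [300]])
plus_column = matrix + column
```

Here `plus_scalar` matches the result of the vectorized `add_100` function, without writing a Python-level function at all.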

# Finding the Maximum and Minimum Values

/Solution/ - Use NumPy’s max and min

In [60]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

In [61]:
# Return maximum element
np.max(matrix)

9

In [62]:
# Return minimum element
np.min(matrix)

1

Using the axis parameter, we can also apply the operation along a certain axis:

In [63]:
# Find maximum element in each column
np.max(matrix, axis=0)

array([7, 8, 9])

In [64]:
# Find maximum element in each row
np.max(matrix, axis=1)

array([3, 6, 9])

# Calculating the Average, Variance, and Standard Deviation

The mean is the average of a given set of data.

The variance is the average of the squared differences between each value and the mean.

The standard deviation is the square root of the variance.


Suppose the heights of 100 people were measured, giving a mean of 1.7 meters and a standard deviation of 0.2 meters. That gives an idea of the distribution of heights, because the standard deviation is in the same units as the data.
If we were told only that the variance was 0.04 square meters, there is not much we could do with that information (except take the square root to get the standard deviation).

Now suppose we were told that gender explained 0.01 of the variance. Taking the square root to get a standard deviation of 0.1 does not tell us anything. But dividing 0.01 by the total variance of 0.04 tells us that gender explains 25% of the variance in height.


/Solution/ - Use NumPy’s mean, var, and std:

In [65]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

In [66]:
# Return mean 
np.mean(matrix)

5.0

In [67]:
# Return variance
np.var(matrix)

6.666666666666667

In [68]:
# Return standard deviation
np.std(matrix)

2.5819888974716112

In [69]:
# Find the mean value in each column
np.mean(matrix, axis=0)

array([ 4.,  5.,  6.])

In [70]:
# Find the mean value in each row
np.mean(matrix, axis=1)

array([ 2.,  5.,  8.])
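
The definitions at the start of this section can be checked by hand. The sketch below recomputes the mean, variance, and standard deviation of the values 1 through 9 step by step. It also shows the `ddof=1` option, which divides by n - 1 to give the sample variance; NumPy's default (`ddof=0`) is the population variance used above. The variable names are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Mean: sum of the values divided by how many there are
mean = values.sum() / values.size

# Population variance: average of the squared differences from the mean
variance = ((values - mean) ** 2).sum() / values.size

# Standard deviation: square root of the variance
std = np.sqrt(variance)

# Sample variance divides by n - 1 instead of n (ddof=1 in NumPy)
sample_variance = np.var(values, ddof=1)
```

The hand-computed `variance` (60 / 9) agrees with `np.var`, while `sample_variance` is 60 / 8 = 7.5.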

# Reshaping Arrays

/Problem/ - You want to change the shape (number of rows and columns) of an array without changing the element values.

/Solution/ - Use NumPy’s reshape

In [71]:
# Load library
import numpy as np

# Create 4x3 matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12]])

In [72]:
# Reshape matrix into 2x6 matrix
matrix.reshape(2, 6)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [73]:
matrix.size

12

In [80]:
# One useful argument in reshape is -1, which means "as many as needed", so reshape(1, -1) means one row and as many columns as needed
matrix.reshape(1, -1)

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

In [78]:
#Finally, if we provide one integer, reshape will return a 1D array of that length:
matrix.reshape(12)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
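
As a small sketch of how -1 is resolved, the example below lets NumPy infer one dimension from the total size and shows that a shape with the wrong number of elements is rejected. The 5x5 target shape is just an arbitrary mismatch chosen for illustration.

```python
import numpy as np

# Same 4x3 matrix as above, holding the values 1 to 12
matrix = np.arange(1, 13).reshape(4, 3)

# -1 asks NumPy to infer that dimension from the total size (12)
one_column = matrix.reshape(-1, 1)   # shape (12, 1)
three_rows = matrix.reshape(3, -1)   # shape (3, 4)

# The new shape must hold exactly the same number of elements
try:
    matrix.reshape(5, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```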

# Transposing a Vector or Matrix

Transposing is a common operation in linear algebra where the column and row indices of each element are swapped.

/Solution/ - Use the T attribute

In [81]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Transpose matrix
matrix.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

Technically, a vector cannot be transposed because it is just a one-dimensional collection of values:

In [82]:
# Transpose vector
np.array([1, 2, 3, 4, 5, 6]).T

array([1, 2, 3, 4, 5, 6])

However, it is common to refer to transposing a vector as converting a row vector to a column vector (notice the second pair of brackets) or vice versa:

In [83]:
# Transpose row vector
np.array([[1, 2, 3, 4, 5, 6]]).T

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
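
Looking at shapes makes the difference between the two cases explicit. This is a minimal sketch; using `np.newaxis` and `reshape(-1, 1)` to build the row and column vectors is one option among several.

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5, 6])

# A 1D array has a single axis, so .T leaves its shape unchanged
shape_1d = vector.T.shape                # (6,)

# Adding a second axis makes an explicit row vector that can be transposed
row_vector = vector[np.newaxis, :]       # shape (1, 6)
column_vector = row_vector.T             # shape (6, 1)

# reshape(-1, 1) reaches the same column vector directly
same_column = vector.reshape(-1, 1)
```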

# Flattening a Matrix (transform a matrix into a one-dimensional array)


/Solution/ - Use flatten


In [84]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Flatten matrix
matrix.flatten()

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [91]:
#Alternatively, we can use reshape to create a row vector
matrix.reshape(1, -1)

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
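
A detail that is easy to miss: `flatten` always returns a copy, while the related `ravel` method returns a view of the original data when it can. The sketch below checks this with `np.shares_memory`; the variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

flat_copy = matrix.flatten()   # always a new array
flat_view = matrix.ravel()     # a view of the same data when possible

# Changing the copy leaves the original matrix untouched
flat_copy[0] = 100

copy_shares_memory = np.shares_memory(matrix, flat_copy)
view_shares_memory = np.shares_memory(matrix, flat_view)
```

For large arrays, `ravel` can avoid an extra allocation, but writing to its result may modify the original matrix.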

# Finding the Rank of a Matrix

The rank of a matrix is the dimension of the vector space spanned by its columns or rows.

Equivalently, the rank is the maximum number of linearly independent row vectors (or column vectors) in the matrix.

A set of vectors is linearly dependent if at least one of the vectors can be expressed as a linear combination of the remaining vectors in the set.

/Solution/ - Use NumPy’s linear algebra method matrix_rank

In [92]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 1, 1],
                   [1, 1, 10],
                   [1, 1, 15]])

# Return matrix rank
np.linalg.matrix_rank(matrix)

2
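
To see where the rank of 2 comes from, note that the first two columns of the matrix above are identical, so only two columns are linearly independent. The sketch below checks this and compares against the identity matrix, which has full rank. The variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 1, 1],
                   [1, 1, 10],
                   [1, 1, 15]])

# The first two columns are identical, so they are linearly dependent
columns_identical = np.array_equal(matrix[:, 0], matrix[:, 1])

# Only two columns are linearly independent, hence rank 2
rank = np.linalg.matrix_rank(matrix)

# Row rank equals column rank, so the transpose has the same rank
rank_transposed = np.linalg.matrix_rank(matrix.T)

# The 3x3 identity matrix has three independent columns: full rank
rank_identity = np.linalg.matrix_rank(np.eye(3))
```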

# Calculating the Determinant


Determinants are mathematical objects that are very useful in the analysis and solution of systems of linear equations.
The determinant of a linear transformation measures how much areas/volumes change during the transformation.
Determinants are defined only for square matrices.

If the determinant of a matrix is 0, the matrix is said to be singular, and if the determinant is 1, the matrix is said to be unimodular.



/Solution/ - Use NumPy’s linear algebra method det

In [5]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# det(matrix) = 1*(4*9 - 6*8) - 2*(2*9 - 6*3) + 3*(2*8 - 4*3) = -12 - 0 + 12 = 0

# Return determinant of matrix
np.linalg.det(matrix)

0.0

In [3]:
# Create a 4x4 matrix
arr = np.arange(100,116).reshape(4,4)
arr

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [112, 113, 114, 115]])

In [4]:
# Find the determinant (the matrix is singular; the tiny nonzero result is floating-point round-off)
np.linalg.det(arr)

-2.9582283945788078e-31
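
Two points from this section can be sketched in code: the determinant as an area scaling factor, and the need to compare floating-point determinants against zero with a tolerance. The `scaling` matrix is an assumed example, and `np.isclose` with its default tolerance is one reasonable choice rather than a universal rule.

```python
import numpy as np

# Scaling x by 2 and y by 3 turns a unit square into a 2 x 3 rectangle
scaling = np.array([[2, 0],
                    [0, 3]])
area_factor = np.linalg.det(scaling)   # approximately 6.0

# A matrix whose rows are evenly spaced is singular
arr = np.arange(100, 116).reshape(4, 4)
det_arr = np.linalg.det(arr)

# Compare against zero with a tolerance instead of ==
looks_singular = np.isclose(det_arr, 0.0)
```

The exact value of `det_arr` can differ between machines and library versions, which is why an equality test against 0 is unreliable here.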

# Getting the Diagonal of a Matrix

/Solution/ - Use diagonal

In [6]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# Return diagonal elements
matrix.diagonal()

array([1, 4, 9])

It is also possible to get a diagonal offset from the main diagonal by using the offset parameter:

In [7]:
# Return diagonal one above the main diagonal
matrix.diagonal(offset=1)

array([2, 6])

In [8]:
# Return diagonal one below the main diagonal
matrix.diagonal(offset=-1)

array([2, 8])

# Calculating the Trace of a Matrix

The trace of a matrix is the sum of the diagonal elements and is often used under the hood in machine learning methods. Given a NumPy multidimensional array, we can calculate the trace using trace. We can also return the diagonal of a matrix and calculate its sum:

/Solution/ - Use trace

In [9]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 2, 3],
                   [2, 4, 6],
                   [3, 8, 9]])

# Return trace
matrix.trace()

14

In [10]:
# Return diagonal and sum elements
sum(matrix.diagonal())

14

# Finding Eigenvalues and Eigenvectors of a Square Matrix

Eigenvectors are widely used in machine learning libraries. Intuitively, given a linear transformation represented by a matrix, A, eigenvectors are vectors that, when that transformation is applied, change only in scale (not direction). More formally:

Av=λv
where A is a square matrix, λ contains the eigenvalues and v contains the eigenvectors. 

In NumPy’s linear algebra toolset, eig lets us calculate the eigenvalues and eigenvectors of any square matrix.

/Solution/ - Use NumPy’s linalg.eig

In [12]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, -1, 3],
                   [1, 1, 6],
                   [3, 8, 9]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)

In [13]:
# View eigenvalues
eigenvalues

array([ 13.55075847,   0.74003145,  -3.29078992])

In [14]:
# View eigenvectors
eigenvectors

array([[-0.17622017, -0.96677403, -0.53373322],
       [-0.435951  ,  0.2053623 , -0.64324848],
       [-0.88254925,  0.15223105,  0.54896288]])
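
The defining relation Av = λv can be verified numerically. In the sketch below, each column of `eigenvectors` is treated as one eigenvector paired with the eigenvalue at the same index, and the comparison uses `np.allclose` to allow for floating-point error. The loop and variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, -1, 3],
                   [1, 1, 6],
                   [3, 8, 9]])

eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index
checks = []
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    # A v should equal lambda v up to floating-point error
    checks.append(np.allclose(matrix @ v, lam * v))

# Eigenvectors returned by eig are normalized to unit length
norms = np.linalg.norm(eigenvectors, axis=0)
```

Reading the eigenvectors row by row instead of column by column is a common mistake, and this check would fail in that case.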

# Calculating Dot Products

/Solution/ - Use NumPy’s dot

In [17]:
# Load library
import numpy as np

# Create two vectors
vector_a = np.array([1,2,3])
vector_b = np.array([4,5,6])

# 1*4 + 2*5 + 3*6 = 32

# Calculate dot product
np.dot(vector_a, vector_b)

32

# Adding and Subtracting Matrices

/Solution/ - Use NumPy’s add and subtract:

In [18]:
# Load library
import numpy as np

# Create matrix
matrix_a = np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 2]])

# Create matrix
matrix_b = np.array([[1, 3, 1],
                     [1, 3, 1],
                     [1, 3, 8]])

# Add two matrices
np.add(matrix_a, matrix_b)

array([[ 2,  4,  2],
       [ 2,  4,  2],
       [ 2,  4, 10]])

In [19]:
# Subtract two matrices
np.subtract(matrix_a, matrix_b)

array([[ 0, -2,  0],
       [ 0, -2,  0],
       [ 0, -2, -6]])

In [20]:
#Alternatively, we can simply use the + and - operators

# Add two matrices
matrix_a + matrix_b

array([[ 2,  4,  2],
       [ 2,  4,  2],
       [ 2,  4, 10]])

In [21]:
# Subtract two matrices
matrix_a - matrix_b

array([[ 0, -2,  0],
       [ 0, -2,  0],
       [ 0, -2, -6]])

# Multiplying Matrices

/Solution/ - Use NumPy’s dot

In [22]:
# Load library
import numpy as np

# Create matrix
matrix_a = np.array([[1, 1],
                     [1, 2]])

# Create matrix
matrix_b = np.array([[1, 3],
                     [1, 2]])

# Multiply two matrices
np.dot(matrix_a, matrix_b)

array([[2, 5],
       [3, 7]])
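
For two-dimensional arrays, the `@` operator (Python 3.5 and later) gives the same result as `np.dot`, while the `*` operator multiplies element by element. A short sketch of the difference, using the same two matrices:

```python
import numpy as np

matrix_a = np.array([[1, 1],
                     [1, 2]])
matrix_b = np.array([[1, 3],
                     [1, 2]])

# Matrix product: rows of matrix_a combined with columns of matrix_b
product = matrix_a @ matrix_b

# Element-wise product: each entry multiplied by the entry in the same position
elementwise = matrix_a * matrix_b
```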

# Inverting a Matrix

The inverse of a square matrix, A, is a second matrix A^-1, such that:

AA^-1 = I

where I is the identity matrix.

/Solution/ - Use NumPy’s linear algebra inv method

In [23]:
# Load library
import numpy as np

# Create matrix
matrix = np.array([[1, 4],
                   [2, 5]])

# Calculate inverse of matrix
np.linalg.inv(matrix)

array([[-1.66666667,  1.33333333],
       [ 0.66666667, -0.33333333]])

In [24]:
# Multiply matrix and its inverse
matrix @ np.linalg.inv(matrix)

array([[ 1.,  0.],
       [ 0.,  1.]])
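
Because the inverse is computed in floating point, the product of a matrix and its inverse may differ from the identity by tiny amounts, so a tolerance-based comparison is safer than exact equality. The sketch below also shows what happens with a singular matrix, which has no inverse. The `singular` matrix is an assumed example.

```python
import numpy as np

matrix = np.array([[1, 4],
                   [2, 5]])
inverse = np.linalg.inv(matrix)

# Compare with the identity using a tolerance
is_identity = np.allclose(matrix @ inverse, np.eye(2))

# A singular matrix (determinant 0) has no inverse
singular = np.array([[1, 2],
                     [2, 4]])
try:
    np.linalg.inv(singular)
    inversion_failed = False
except np.linalg.LinAlgError:
    inversion_failed = True
```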

# Generating Random Values

/Solution/ - Use NumPy’s random

In [25]:
# Load library
import numpy as np

# Set seed
np.random.seed(0)

# Generate three random floats between 0.0 and 1.0
np.random.random(3)

array([ 0.5488135 ,  0.71518937,  0.60276338])

In [26]:
# Generate three random integers between 0 and 10 (the upper bound, 11, is excluded)
np.random.randint(0, 11, 3)

array([3, 7, 9])

In [29]:
# Draw three numbers from a normal distribution with mean 0.0 and standard deviation of 1.0
np.random.normal(0.0, 1.0, 3)

array([-0.13309028, -2.42595457, -0.4530558 ])

In [30]:
# Draw three numbers from a logistic distribution with mean 0.0 and scale of 1.0
np.random.logistic(0.0, 1.0, 3)

array([ 1.62933678, -0.67491949,  0.61101311])

In [31]:
# Draw three numbers greater than or equal to 1.0 and less than 2.0
np.random.uniform(1.0, 2.0, 3)

array([ 1.36824154,  1.95715516,  1.14035078])
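
Newer NumPy versions (1.17 and later) also provide a `Generator` interface through `np.random.default_rng`. The sketch below mirrors a few of the draws above using that interface. It is an alternative to the `np.random.seed` approach used in these notes, not a replacement for it, and it produces different numbers from the legacy functions even with the same seed.

```python
import numpy as np

# default_rng returns a Generator; seeding it makes the draws repeatable
rng = np.random.default_rng(0)

floats = rng.random(3)                 # floats in [0.0, 1.0)
integers = rng.integers(0, 11, 3)      # integers from 0 to 10 (11 is excluded)
normals = rng.normal(0.0, 1.0, 3)      # mean 0.0, standard deviation 1.0

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(0)
floats_again = rng_again.random(3)
```

Keeping the generator in a variable, rather than relying on global state, makes it easier to control randomness separately in different parts of a workflow.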