<div class="alert alert-block alert-info">
Author:<br>Felix Gonzalez, P.E. <br> Adjunct Instructor, <br> Division of Professional Studies <br> Computer Science and Electrical Engineering <br> University of Maryland Baltimore County <br> fgonzale@umbc.edu
</div>

In [1]:
# In Jupyter Notebooks it is best practice to load packages near the beginning of the notebook.
import numpy as np # Numpy library
import matplotlib.pyplot as plt # Matplotlib plotting library

This notebook provides an overview of the Numpy package. NumPy provides Python with scientific computing capabilities, including but not limited to math, shape manipulation, linear algebra, basic statistics, and matrix operations. Numpy is included as a package in the Anaconda Distribution. The notebook highlights the principal functions that can be used for general mathematical calculations and data analysis.

The full list of Numpy functionality can be found on the routines documentation page (https://numpy.org/doc/stable/reference/routines.html). It covers array creation and manipulation, binary operations, string operations, datetime support functions, linear algebra, logic functions, mathematical operations, matrix operations, polynomials, sorting, searching and counting, and statistics. Numpy also defines constants such as infinity, nan (Not a Number, often used to mark missing values) and pi (https://numpy.org/doc/stable/reference/constants.html).

Other mathematical and scientific computing Python packages that complement or may serve as alternatives to Numpy include Pandas, SciPy, TensorFlow, and PyTorch. Other programming languages with similar capabilities include R and MATLAB.

Documentation Reference:
- https://numpy.org/doc/stable/
- https://numpy.org/doc/stable/reference/index.html

# Table of Contents
[Numpy Arrays](#Numpy-Arrays)

[Multidimensional Arrays](#Multidimensional-Arrays)

[Array Creation Functions](#Array-Creation-Functions)

[Math Functions](#Math-Functions)

[Searching](#Searching)

[Vectorization](#Vectorization)

[Distributions](#Distributions)

[Linear Regression](#Linear-Regression)

# Numpy Arrays
[Return to Table of Contents](#Table-of-Contents)

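Arrays are the base object and data collection within Numpy. All data in a Numpy array must be of a single data type (dtype). Numpy has a large number of possible data types, including but not limited to counterparts of the base Python types:
- np.str_ ==> string
- np.bool_ ==> boolean (i.e., True|False)
- np.int_ ==> integer
- np.float64 ==> floating point
- np.complex128 ==> complex (e.g., 1+1j)

Older tutorials use the bare aliases np.str, np.bool, np.int, np.float and np.complex. These aliases were deprecated in NumPy 1.20 and most no longer exist in current releases, so use the forms above (or pass the Python built-ins str, bool, int, float, complex as the dtype).

Other data types include datetime64 and timedelta64, which we will discuss in a future notebook.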

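One advantage of arrays is speed: because every element shares one dtype and is stored in a contiguous block of memory, calculations on arrays tend to be much faster than equivalent operations on Python lists, which can hold mixed types.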
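A short sketch of how the single-dtype rule plays out in practice, assuming a recent NumPy release: mixed Python values are upcast to one common dtype when the array is built, and `astype()` converts an existing array into another dtype.

```python
import numpy as np

# Mixed Python values are upcast to a single common dtype when the array is built.
mixed = np.array([1, 2.5, True])
print(mixed.dtype)  # float64 (True becomes 1.0)

# The dtype can also be requested explicitly at construction time.
ints = np.array([1, 2, 3], dtype=np.int64)
print(ints.dtype)   # int64

# astype() returns a converted copy; the original array keeps its dtype.
as_float = ints.astype(np.float64)
print(as_float.dtype, ints.dtype)  # float64 int64
```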

In [2]:
# Construction of an array.
array1 = np.array(object = [1,2,3,7,8,9]) # The dtype can be set with dtype=...; otherwise Numpy infers it from the values.
# The shorter, more common form is np.array([1,2,3,7,8,9]); note the parentheses around the square brackets.
array1

array([1, 2, 3, 7, 8, 9])

In [3]:
# Compare the notebook's automatic display of array1 (previous cell) with the output of the print function.
print(array1)

[1 2 3 7 8 9]


In [4]:
type(array1)

numpy.ndarray

In [5]:
# You can convert lists to NP arrays
# and believe me, you will convert lots of lists to NP arrays
list1 = [1.0, 2.0, 3.0]
array2 = np.array(list1)

In [6]:
type(list1)

list

In [7]:
type(array2)

numpy.ndarray

In [8]:
# Currently, array1 is an integer array.
# To convert the integer array into a float array, use the astype() method.
# (Older code uses np.float_(array1); that alias was removed in NumPy 2.0.)
array3 = array1.astype(np.float64)

In [9]:
# By now you may have noticed that to show the output of two objects in one cell I need to use the print function.
# This uses the space within the notebook a little more efficiently.
print(array1)
print(array3)

[1 2 3 7 8 9]
[1. 2. 3. 7. 8. 9.]


In [10]:
# If the objects are only listed (without print), the notebook displays only the last one.
array1
array3

array([1., 2., 3., 7., 8., 9.])

In [12]:
# Data selection works like list indexing and slicing.
# Indexing returns a single value; slicing returns a smaller array.
array1[1]

2

In [13]:
array1[2:]

array([3, 7, 8, 9])
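A few more slicing patterns that carry over from Python lists, sketched on the same values as array1: negative indexes count from the end, and the full `start:stop:step` form can skip elements or walk backwards.

```python
import numpy as np

array1 = np.array([1, 2, 3, 7, 8, 9])

last = array1[-1]          # negative indexes count from the end: 9
every_other = array1[::2]  # start:stop:step with a step of 2 -> 1, 3, 8
reversed_ = array1[::-1]   # a negative step walks backwards -> 9, 8, 7, 3, 2, 1
middle = array1[1:4]       # the stop index is excluded -> 2, 3, 7
print(last, every_other, reversed_, middle)
```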

In [14]:
# Copying arrays.

array1 = np.array([1, 2, 3]) 
array2 = array1

print('Before______')
print(array1)
print(array2)

array1[0] = 5 # By changing array1 element 0 we also make the change in array2.

print('After_______')
print(array1)
print(array2)

Before______
[1 2 3]
[1 2 3]
After_______
[5 2 3]
[5 2 3]


In [15]:
# Assignment does not copy a Numpy array: array2 = array1 makes both names refer to the same array in memory
# (this avoids duplicating data). That is why changing array1 above also changed array2.
# To make the values independent, use the copy function.

array1 = np.array([1, 2, 3]) 
array2 = np.copy(array1)

print('Before______')
print(array1)
print(array2)

array1[0] = 5 # Because we used the copy function, by changing array1 element 0 we DO NOT make the change in array2.

print('After_______')
print(array1)
print(array2)

Before______
[1 2 3]
[1 2 3]
After_______
[5 2 3]
[1 2 3]
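The same sharing happens with slices, which is easy to miss. A minimal sketch: basic slicing returns a *view* that shares memory with the original array, so changes made through the slice show up in the original unless `.copy()` is used. `np.shares_memory` is used here only to make the relationship visible.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

# Basic slicing returns a view: it shares memory with the original array.
view = original[1:4]
view[0] = 99
print(original)  # the change is visible in original: 1, 99, 3, 4, 5

# Calling .copy() on the slice gives an independent array.
independent = original[1:4].copy()
independent[0] = -1
print(original)  # unchanged by the edit to the copy

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```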


# Multidimensional Arrays
[Return to Table of Contents](#Table-of-Contents)

Numpy arrays can be N-dimensional, which is particularly useful for tables of data (e.g., 2-D arrays). Depending on the dataset, various functions can be used to reshape the data into the form you need. Because array operations run in compiled code, they are generally faster than equivalent loops over nested Python lists, which makes arrays efficient for manipulating data. This section also discusses methods and functions to transform multidimensional arrays.

In [16]:
# Creating a 4x3 Array:
array4 = np.array( [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])
print(array4)

[[1 2 3]
 [3 4 5]
 [5 6 7]
 [7 8 9]]


In [17]:
# The shape attribute gives the number of rows and columns, as (rows, columns).
array4.shape

(4, 3)

In [18]:
# The size attribute gives the total number of elements.
array4.size

12

In [19]:
# Data selection works like list indexing and slicing.
# With a 2-D array, a single index selects a whole row (here, the second row).
# Inside the brackets, a comma separates the row index from the column index.
array4[1]

array([3, 4, 5])

In [20]:
# Selects the data in the second column.
array4[:,1]

array([2, 4, 6, 8])

In [21]:
# Selects all values in the second row (same result as array4[1]).
array4[1,:]

array([3, 4, 5])

In [22]:
# Selects value in the second row and third column.
array4[1,2]

5

In [23]:
# Selects every second row, starting at row 0 (a step of 2).
array4[0::2]

array([[1, 2, 3],
       [5, 6, 7]])

In [24]:
# Converts the multidimensional array into a 1-D array (flatten returns a copy).
array4.flatten()

array([1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 9])

In [25]:
# Reshapes the array into 3 rows by 4 columns; the total number of elements (12) must stay the same.
array4.reshape((3,4))

array([[1, 2, 3, 3],
       [4, 5, 5, 6],
       [7, 7, 8, 9]])

In [26]:
# Reshapes back to the original 4x3 shape; reshape never modifies array4 itself.
array4.reshape((4,3))

array([[1, 2, 3],
       [3, 4, 5],
       [5, 6, 7],
       [7, 8, 9]])

In [27]:
# Reshapes into a single column (12 rows, 1 column); -1 lets Numpy work out the number of rows.
array4.reshape((-1,1))

array([[1],
       [2],
       [3],
       [3],
       [4],
       [5],
       [5],
       [6],
       [7],
       [7],
       [8],
       [9]])

In [28]:
# Here the -1 means "hey Numpy, you determine the length along this axis".
# We specify the number of columns (e.g., 6); Numpy determines the number of rows.
array4.reshape((-1, 6))

array([[1, 2, 3, 3, 4, 5],
       [5, 6, 7, 7, 8, 9]])

In [29]:
array4

array([[1, 2, 3],
       [3, 4, 5],
       [5, 6, 7],
       [7, 8, 9]])

In [30]:
# Transposes the array: rows become columns and columns become rows.
array4.transpose()

array([[1, 3, 5, 7],
       [2, 4, 6, 8],
       [3, 5, 7, 9]])

In [31]:
# Another method of transposing
array4.T

array([[1, 3, 5, 7],
       [2, 4, 6, 8],
       [3, 5, 7, 9]])

In [32]:
# and yet another method of transposing
array4.swapaxes(0, 1)

array([[1, 3, 5, 7],
       [2, 4, 6, 8],
       [3, 5, 7, 9]])
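The examples above are all 2-D, where the three transposing methods agree. A sketch with a 3-D array shows where they differ: `.T` reverses all axes, `np.transpose` with an axes tuple lets you choose the new order, and `swapaxes` exchanges exactly two axes.

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns
print(cube.ndim, cube.shape)             # 3 (2, 3, 4)
print(cube[1, 2, 3])                     # 23: block 1, row 2, column 3

# With more than two dimensions, .T reverses ALL axes.
print(cube.T.shape)                          # (4, 3, 2)
# transpose() with an axes tuple picks the new order explicitly.
print(np.transpose(cube, (1, 0, 2)).shape)   # (3, 2, 4)
# swapaxes() exchanges exactly two axes.
print(cube.swapaxes(0, 2).shape)             # (4, 3, 2)
```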

# Array Creation Functions
[Return to Table of Contents](#Table-of-Contents)

In [33]:
# np.arange is similar to Python's range function: start is included, stop is excluded.
array5 = np.arange(start = 3, stop = 10, step = 2)
array5

array([3, 5, 7, 9])

In [34]:
# Similar to above, but specifies the number of elements instead of the step; stop is included. Note the result is a float array.
array5 = np.linspace(start = 3, stop = 9, num = 4)
array5

array([3., 5., 7., 9.])

In [35]:
array6 = np.arange(0,51,5)
array6

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [36]:
# Array of ones. Like np.zeros in the next cell, np.ones also accepts a shape tuple such as (3,5).
array7 = np.ones(10)
array7

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [37]:
# Array of zeros with shape 3x5.
array8 = np.zeros((3,5))
print(array8)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [38]:
# The identity array is a square array with ones on the main diagonal.
array9 = np.identity(6)
array9

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

In [39]:
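# 12 random integers from 1 to 9 (the upper bound 10 is excluded), reshaped to 4x3; values differ on every run.
array10 = np.random.randint(1,10,12).reshape((4,3))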
array10

array([[7, 4, 3],
       [1, 3, 1],
       [5, 2, 7],
       [6, 3, 7]])
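Because `np.random.randint` gives different values on every run, outputs like the one above cannot be reproduced exactly. A sketch using the Generator API (the same `default_rng` used later in the Distributions section), assuming a fixed seed of 42 chosen only for illustration:

```python
import numpy as np

# A Generator seeded with a fixed value produces the same numbers on every run.
rng = np.random.default_rng(seed=42)
array10 = rng.integers(1, 10, size=(4, 3))  # integers from 1 to 9; high=10 is excluded

# Re-creating the generator with the same seed reproduces the same array.
same = np.random.default_rng(seed=42).integers(1, 10, size=(4, 3))
print(np.array_equal(array10, same))  # True
```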

# Math Functions
[Return to Table of Contents](#Table-of-Contents)

Examples of math functions include mean, sum, product, min, and max. This section also includes linear algebra examples on multidimensional arrays: eigenvalues, inverse, determinant, rank, trace, element-wise multiplication, and the cross product.

Documentation References:
- https://numpy.org/doc/stable/reference/routines.math.html

In [40]:
array11 = np.arange(1, 18, 2).reshape((3,3))
array11

array([[ 1,  3,  5],
       [ 7,  9, 11],
       [13, 15, 17]])

In [41]:
# Min and Max within the array.
array11.min(), array11.max()

(1, 17)

In [42]:
# Mean, sum, product, min and max functions.
array11.mean(), array11.sum(), array11.prod(), array11.min(), array11.max()

(9.0, 81, 34459425, 1, 17)

In [43]:
# Average of each column
array11.mean(axis=0)

array([ 7.,  9., 11.])

In [44]:
# Average of each row
array11.mean(axis=1)

array([ 3.,  9., 15.])

In [45]:
# Average over both rows and columns (equal to the overall mean).
array11.mean(axis=(0,1))

9.0

In [46]:
# Sum of columns
array11.sum(axis=0)

array([21, 27, 33])

In [47]:
# Sum of rows
array11.sum(axis=1)

array([ 9, 27, 45])

In [48]:
array11

array([[ 1,  3,  5],
       [ 7,  9, 11],
       [13, 15, 17]])

In [49]:
array11 + 5 # Adds 5 to all numbers in the array.

array([[ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22]])

In [50]:
array11 * 2 # Multiplies each number in the array by 2.

array([[ 2,  6, 10],
       [14, 18, 22],
       [26, 30, 34]])

In [51]:
array11 ** 2 # Raises each number in the array to the power of 2.

array([[  1,   9,  25],
       [ 49,  81, 121],
       [169, 225, 289]], dtype=int32)

In [52]:
np.log10(array11) # Base-10 logarithm of each value. A 0 in the array would give -inf plus a divide-by-zero RuntimeWarning.

array([[0.        , 0.47712125, 0.69897   ],
       [0.84509804, 0.95424251, 1.04139269],
       [1.11394335, 1.17609126, 1.23044892]])

In [53]:
np.exp(array11) # e raised to the power of each value.

array([[2.71828183e+00, 2.00855369e+01, 1.48413159e+02],
       [1.09663316e+03, 8.10308393e+03, 5.98741417e+04],
       [4.42413392e+05, 3.26901737e+06, 2.41549528e+07]])

In [54]:
np.sin(array11) # Sine of each value (interpreted as radians).

array([[ 0.84147098,  0.14112001, -0.95892427],
       [ 0.6569866 ,  0.41211849, -0.99999021],
       [ 0.42016704,  0.65028784, -0.96139749]])
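Numpy's trigonometric functions expect radians, so angles in degrees need converting first. A small sketch with `np.deg2rad`, plus an identity check that uses a tolerance (`np.allclose`) instead of `==`, since floating-point results are rarely exact:

```python
import numpy as np

angles_deg = np.array([0, 30, 45, 60, 90])
angles_rad = np.deg2rad(angles_deg)   # np.sin and np.cos expect radians

sines = np.sin(angles_rad)
cosines = np.cos(angles_rad)
print(np.round(sines, 4))   # sin(30 degrees) is 0.5, sin(90 degrees) is 1.0

# sin^2 + cos^2 = 1, checked with a floating-point tolerance.
print(np.allclose(sines**2 + cosines**2, 1.0))  # True
```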

In [55]:
# Eigenvalues and eigenvectors
np.linalg.eig(array11)

(array([ 2.94452187e+01, -2.44521872e+00,  1.22040551e-15]),
 array([[ 0.20079137,  0.79227344,  0.40824829],
        [ 0.5165778 ,  0.09475476, -0.81649658],
        [ 0.83236422, -0.60276391,  0.40824829]]))

In [56]:
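# Calculating inverses. Caution: array11 is singular (its determinant is 0 and its rank is 2),
# so these huge values are floating-point artefacts, not a meaningful inverse.
# For an exactly singular matrix, np.linalg.inv raises LinAlgError instead.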
np.linalg.inv(array11)

array([[-3.75299969e+14,  7.50599938e+14, -3.75299969e+14],
       [ 7.50599938e+14, -1.50119988e+15,  7.50599938e+14],
       [-3.75299969e+14,  7.50599938e+14, -3.75299969e+14]])

In [57]:
# Determinant. The exact value is 0; the tiny nonzero result is floating-point round-off.
np.linalg.det(array11)

3.1974423109204565e-14

In [58]:
# Rank of a matrix
np.linalg.matrix_rank(array11)

2

In [59]:
# Trace (i.e., sum of diagonal elements in the matrix)
np.trace(array11)

27
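Since array11 is singular, its inverse is not meaningful. For contrast, here is a sketch with a small invertible matrix `M` (values chosen only for illustration): the determinant is non-zero, `M @ inv(M)` recovers the identity, and `np.linalg.solve` handles a system `M x = b` directly, which is generally preferred over computing `inv(M) @ b`.

```python
import numpy as np

# Hypothetical 2x2 example matrix; its determinant is 2*3 - 1*1 = 5.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.linalg.matrix_rank(M), np.linalg.det(M))  # 2 and approximately 5.0

M_inv = np.linalg.inv(M)
print(np.allclose(M @ M_inv, np.eye(2)))  # True: M times its inverse is the identity

# Solve M x = b without forming the inverse explicitly.
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)
print(x)  # approximately [0.8 1.4]
```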

In [60]:
# Recall array2 (a 1-D array with shape (3,)) and array11 (shape 3x3)
print(array2)
print('\n')
print(array11)

[1 2 3]


[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]]


In [61]:
array11 * array2 # Element-wise multiplication with broadcasting: each row of array11 is multiplied by array2. This is not matrix multiplication (use @ or np.dot for that).

array([[ 1,  6, 15],
       [ 7, 18, 33],
       [13, 30, 51]])

In [62]:
np.cross(array11, array2) # Cross product of each row of array11 (a 3-component vector) with array2.

array([[ -1,   2,  -1],
       [  5, -10,   5],
       [ 11, -22,  11]])
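The `*` operator is element-wise, so it is worth contrasting it with true matrix multiplication. A sketch using the same array11 and array2 values: `@` (or `np.dot`) computes the matrix-vector product, and each entry of that product equals a row sum of the element-wise result. For two 1-D arrays, `np.dot` gives the scalar dot product.

```python
import numpy as np

array11 = np.arange(1, 18, 2).reshape((3, 3))
array2 = np.array([1, 2, 3])

elementwise = array11 * array2      # broadcasting: column j is scaled by array2[j]
matrix_vector = array11 @ array2    # matrix-vector product
dot_form = np.dot(array11, array2)  # same result as @ for a 2-D by 1-D product
print(matrix_vector)                # [22 58 94]

# Each matrix-vector entry is the row sum of the element-wise product.
print(np.array_equal(matrix_vector, elementwise.sum(axis=1)))  # True

# For two 1-D arrays, np.dot is the scalar dot product: 1*1 + 2*2 + 3*3.
print(np.dot(array2, array2))       # 14
```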

# Searching
[Return to Table of Contents](#Table-of-Contents)

In [63]:
print(array11)

[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]]


In [64]:
print('Indices of elements > 11')
indices_arr = np.where(array11 > 11)
indices_arr
# Returns a tuple of two arrays: the first holds the row indexes and the second the column indexes of the matching elements.

Indices of elements > 11


(array([2, 2, 2], dtype=int64), array([0, 1, 2], dtype=int64))

In [65]:
indices_arr[0][2]

2

In [66]:
indices_arr[1][2]

2

In [67]:
array11[2,2]

17

In [68]:
# Each array in the returned tuple can be selected by index: [0] gives the row indexes.
np.where(array11 > 5)[0]

array([1, 1, 1, 2, 2, 2], dtype=int64)

In [69]:
np.where(array11 > 5)[1]

array([0, 1, 2, 0, 1, 2], dtype=int64)

In [70]:
array11[1, 2] # A specific value can be selected by giving its (row, column) coordinates.

11

In [71]:
# Using the indexes returned by np.where, we can retrieve a matching element (here the third match).
element_index = 2
array11[indices_arr[0][element_index], indices_arr[1][element_index]]

17

In [72]:
# You can use more than one criterion, combining them with & (and) or | (or).
# Each condition needs its own parentheses because & and | bind more tightly than comparisons.
np.where((array11 > 5) & (array11 < 15))

(array([1, 1, 1, 2], dtype=int64), array([0, 1, 2, 0], dtype=int64))
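Often the matching *values* are more useful than their indexes. A sketch of a few related tools on the same array11: a boolean mask used directly as an index, `np.argwhere` for one (row, column) pair per match, the three-argument form of `np.where` that chooses values instead of returning indexes, and `np.argmax` with `np.unravel_index` to locate the largest value.

```python
import numpy as np

array11 = np.arange(1, 18, 2).reshape((3, 3))
mask = (array11 > 5) & (array11 < 15)   # boolean array with the same shape as array11

values = array11[mask]       # boolean indexing returns the matching values: 7, 9, 11, 13
coords = np.argwhere(mask)   # one (row, column) pair per match
replaced = np.where(array11 > 11, 0, array11)  # 3-argument form: 0 where True, else keep

# Location of the largest value as (row, column).
row, col = np.unravel_index(np.argmax(array11), array11.shape)
print(values, coords.tolist(), replaced, int(row), int(col))
```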

# Vectorization
[Return to Table of Contents](#Table-of-Contents)

Sometimes, you’ll want to write complex functions that don’t automatically work with Numpy arrays. One solution is vectorization with `np.vectorize`, which wraps a function written for single values so that it accepts arrays. Let's see how this works with an example.

In [73]:
def funct1(val): # funct1 is written for a single (scalar) value.
    if val < np.pi/2: # With an array, this comparison gives many booleans, so the `if` fails
        x = np.sin(val)
    else:
        x = np.cos(val)
    return x

In [74]:
# This will work
funct1(3.14/4)

0.706825181105366

In [75]:
# Let's create an array with 2 million values.
z = np.linspace(start = 0, stop = np.pi, num = 2000000)

In [76]:
# Avoid printing all 2,000,000 values (for example with z.tolist()); it can freeze or crash the Notebook.
print(f'Number of elements in z {type(z)} is: {len(z):,}')
z # For large arrays, Numpy's display shows only the first three and last three values.

Number of elements in z <class 'numpy.ndarray'> is: 2,000,000


array([0.00000000e+00, 1.57079711e-06, 3.14159422e-06, ...,
       3.14158951e+00, 3.14159108e+00, 3.14159265e+00])

In [77]:
# Calling funct1 with the "z" array will fail (uncomment to see the error).
# The ValueError states that the truth value of an array with more than one element is ambiguous.
#funct1(z)

In [78]:
# Wrapping funct1 with np.vectorize produces a version that works on arrays.
vfunct1 = np.vectorize(funct1)

In [80]:
%%time
# Now that we have the vectorized function we can proceed to use it with z array.
vfunct1(z)

Wall time: 1.46 s


array([ 0.00000000e+00,  1.57079711e-06,  3.14159422e-06, ...,
       -1.00000000e+00, -1.00000000e+00, -1.00000000e+00])

In [81]:
%%time
# Option 2 would be to iterate through each element in the array and use the non-vectorized function.
# Note: this loop overwrites the values of z in place.
for element in range(len(z)):
    z[element] = funct1(z[element])
z[:4]

Wall time: 2.63 s


array([0.00000000e+00, 1.57079711e-06, 3.14159422e-06, 4.71239134e-06])

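`np.vectorize` is provided mainly for convenience, not performance: internally it is essentially a Python `for` loop. That is fine for functions where speed is not critical, but for large arrays consider rewriting the function with Numpy array operations.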

Using the `%%time` cell magic in the above cells, we can benchmark the two code cells. In this run, the vectorized function took about 1.46 s of wall time and the explicit loop about 2.63 s; exact timings depend on the machine.
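One way to rewrite funct1 with Numpy array operations is `np.where`, which picks element-wise between two arrays. The sketch below uses a hypothetical name, `funct1_fast`; it computes both `np.sin` and `np.cos` for every element and then selects, which trades some redundant work for running entirely in compiled code. Timings are not claimed here; try it with `%%time` on your own machine.

```python
import numpy as np

def funct1_fast(val):
    """Hypothetical array-friendly rewrite of funct1: sin below pi/2, cos otherwise."""
    val = np.asarray(val)
    # Both branches are evaluated for every element; np.where then selects per element.
    return np.where(val < np.pi / 2, np.sin(val), np.cos(val))

z = np.linspace(start=0, stop=np.pi, num=2_000_000)
result = funct1_fast(z)
print(result[:3], result[-1])  # starts at 0.0 and ends at cos(pi), which is -1.0
```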

# Distributions
[Return to Table of Contents](#Table-of-Contents)

Numpy can draw random samples from many distributions, including normal, binomial, lognormal, gamma, Poisson, and power. Below is an example using the normal distribution.

Documentation References:
- https://numpy.org/doc/stable/reference/random/generator.html#distributions
- https://www.w3schools.com/python/numpy/numpy_random_binomial.asp

In [None]:
mu, sigma = 0, 0.1 # Parameters for the normal distribution: Mean and Standard deviation
s = np.random.default_rng().normal(mu, sigma, 1000) # Draws 1,000 samples from the normal distribution

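In [None]:
# Histogram of the samples, normalized (density=True) so it is comparable to a probability density.
count, bins, ignored = plt.hist(s, 30, density=True)

# Overlay the normal probability density function for the same mu and sigma.
# With the default inline backend, both plots must be in the same cell to share one figure.
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
               np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
         linewidth=2, color='r')
plt.show()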
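The red curve is the normal probability density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. A quick numerical check, sketched with a fixed seed (an assumption, for reproducibility) and a larger sample: the sample mean and standard deviation land close to `mu` and `sigma`. The same Generator also provides `binomial` and `poisson`, two of the distributions listed above.

```python
import numpy as np

mu, sigma = 0, 0.1
rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible

s = rng.normal(mu, sigma, 100_000)
# With many samples, the sample mean and standard deviation approach mu and sigma.
print(round(s.mean(), 3), round(s.std(), 3))

# Other distributions follow the same Generator pattern.
coin_flips = rng.binomial(n=10, p=0.5, size=5)  # heads out of 10 tosses, 5 experiments
arrivals = rng.poisson(lam=3.0, size=5)         # event counts with mean 3
print(coin_flips, arrivals)
```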

# Linear Regression
[Return to Table of Contents](#Table-of-Contents)

`numpy.linalg.lstsq()`: Returns the least-squares solution to a linear matrix equation. It solves the equation ${\bf A x} = {\bf b}$ by computing a vector ${\bf x}$ that minimizes the squared Euclidean 2-norm $\|{\bf b} - {\bf A x}\|^2$.

The equation may be under-, well-, or over-determined (i.e., the number of linearly independent rows of ${\bf A}$ can be less than, equal to, or greater than its number of linearly independent columns). If ${\bf A}$ is square and of full rank, then ${\bf x}$ (up to round-off error) is the exact solution of the equation.
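A minimal sketch of the over-determined case, using hypothetical data that lie exactly on $y = 2x + 1$ so the expected answer is known in advance. The design matrix ${\bf A}$ has one column of $x$ values (for the slope) and one column of ones (for the intercept); for a full-column-rank ${\bf A}$, the least-squares solution also satisfies the normal equations ${\bf A}^T{\bf A}{\bf x} = {\bf A}^T{\bf b}$.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1.
x = np.arange(5, dtype=float)
b = 2 * x + 1

# Design matrix: a column of x values (slope) and a column of ones (intercept).
A = np.column_stack([x, np.ones_like(x)])   # shape (5, 2): more equations than unknowns

coef, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(coef)   # approximately [2. 1.]: slope 2, intercept 1
print(rank)   # 2: the two columns are linearly independent

# Same answer from the normal equations A^T A x = A^T b.
print(np.allclose(coef, np.linalg.solve(A.T @ A, A.T @ b)))  # True
```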

In [None]:
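x = np.arange(0, 9) # x coordinates 0 to 8
A = np.array([x, np.ones(9)]) # rows: the x values and a row of ones (for the intercept)

# Observed values that roughly follow a straight line
y = [19, 21, 20.5, 21.5, 21.2, 23, 23, 25.5, 24]
# Least-squares fit: A.T has a column of x values and a column of ones,
# so w[0] is the slope and w[1] is the intercept of the regression line
w = np.linalg.lstsq(A.T, y, rcond = -1)[0] 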

# plotting the line
line = w[0]*x + w[1] # regression line
plt.plot(x, line, 'r-')
plt.plot(x, y, 'o')
plt.show()

In [None]:
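# Solve A @ solution = y for a random 5x5 matrix A
A = np.random.rand(5,5)
y = [1,2,3,4,5]
solution, residuals, rank, singular = np.linalg.lstsq(A, y, rcond = -1)
print(solution)
print(residuals)  # empty array here, because A has no more rows than columns
print(rank)
print(singular)   # singular values of A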

In [None]:
# Check: A @ solution should reproduce y, up to floating-point round-off
np.dot(A,solution)
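Comparing printed values by eye is error-prone, so a tolerance-based check is handier. A sketch assuming a seeded Generator (seed 1, chosen only for reproducibility) in place of `np.random.rand`: for a square, full-rank `A`, `lstsq` agrees with the direct solver `np.linalg.solve`, and `residuals` comes back empty.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
A = rng.random((5, 5))                      # square random matrix; almost surely full rank
y = np.array([1, 2, 3, 4, 5], dtype=float)

solution, residuals, rank, singular = np.linalg.lstsq(A, y, rcond=None)

# Compare with a tolerance instead of eyeballing the printed values.
print(np.allclose(A @ solution, y))                  # True
# For a square, full-rank A, lstsq matches the direct solver.
print(np.allclose(solution, np.linalg.solve(A, y)))  # True
print(rank, residuals.size)  # 5 0: residuals is empty when A has no more rows than columns
```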

# NOTEBOOK END