<a href="https://colab.research.google.com/github/sayyed-uoft/fullstackai/blob/main/03_Numerical_Python_(NumPy).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Strata.ai - Artificial Intelligence Certificate 

# Module 1: Data Science for AI

# Numerical Python (NumPy)

## Learning Outcome

- Learn the fundamental numerical package in Python (NumPy)
- Work with multi-dimensional arrays (ndarrays)
- Write efficient algorithms through vectorization

## Topics
- [Numerical Python (NumPy) - Features](#numpy)
- [ndarray: A Multidimensional Array Object](#ndarray)
- [Data Types](#dtype)
- [Creation Functions](#creation)
- [Element-wise Operations](#operations)
- [Broadcasting](#broadcasting)
- [Element-wise (Universal) Functions](#functions)
- [Aggregate and Statistical Functions](#stat_funcs)
- [Set Operations](#set_ops)
- [Indexing and Slicing](#indexing)
- [Sorting & Searching](#sort_search)
- [Manipulating Arrays](#manipulating)
- [Linear Algebra](#linear_algebra)
- [Random Generation](#random)
- [Vectorization](#vectorization)
- [Storing and Retrieving Arrays](#storing) 

<a id="numpy"></a>
## Numerical Python (NumPy) - Features

- One of the most important foundational packages for numerical computing in Python.
- An efficient multidimensional array providing fast array-oriented arithmetic operations.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra and random number generation capabilities.




<a id="ndarray"></a>
## ndarray: A Multidimensional Array Object

- A central data structure of the **numpy** library.
- A fast and flexible container for large datasets in Python.
- Can represent multidimensional arrays (matrices) of the same type.
- All the elements are the same data type (**dtype**), typically numbers.
- Supports views, indexing, and slicing. 

In [None]:
# Import the Python package - it is common to use "np" as an alias
import numpy as np 
print(np.__version__) # get the version

# Create a 2x3 array (matrix) from a nested list
m = np.array([
        [1, 2, 3], 
        [4, 5, 6]
    ]) 

print(type(m))
print(m)
print(m.shape, m.ndim) # get the shape & the number of dimensions
print(m.dtype) # get the data type of the elements
print(m.size) # get the number of elements 
print(m.data) # the memory buffer holding the array's elements

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/nd_vector.png" width="100%">
</center>

<a id="dtype"></a>
## Data Types

| Type | Description | 
| :-- | :-- | 
| (u)int\[8, 16, 32, 64\] | Signed (unsigned) integers 8, 16, 32, 64-bit |
| float\[16, 32, 64, 128\] | Floating point numbers half, single, double, and extended precision |
| complex\[64, 128, 256\] | Complex numbers represented by 2x32, 2x64, or 2x128 floats |
| bool | Boolean values (True and False) | 
| object | Any Python object type |
| bytes_ (string_ before NumPy 2.0) | Byte string type (1 byte per character) |
| str_ (unicode_ before NumPy 2.0) | Unicode string type (4 bytes per character) |

In [None]:
# Data types
data = np.array([
        [1.5, 2.5], 
        [3.5, 4.5],
        [5.0, 6.5]
    ]) 

print(data.dtype, data.shape)

data = np.array([0, 1], dtype=np.int32) # an integer vector 
print(data.dtype, data.shape)

# Converting data types
d2 = data.astype(np.float64)
print(d2.dtype, d2)
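
The rule that all elements share one dtype has two practical consequences worth seeing once: NumPy chooses a common type when the inputs are mixed, and converting floats to integers drops the fractional part. The following is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

# Mixing integers and floats: NumPy picks one dtype that can hold every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# astype returns a new array; float -> int drops the fractional part
floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int32)
print(ints)               # [ 1 -1  2]

# itemsize reports the number of bytes used per element
print(floats.itemsize, ints.itemsize)   # 8 4
```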

<a id="creation"></a>
## Creation Functions

| Function | Description | 
| :-- | :-- | 
| array | Creates an array from any object exposing the array interface (e.g. list, tuple, array). "dtype" can be set explicitly or inferred automatically  |
| asarray | Same as "array" but returns the same object if the input is already an "ndarray" |
| arange, linspace | Returns evenly spaced values within a given interval as an ndarray. "arange" takes the step size, whereas "linspace" takes the number of samples |
| ones, ones_like | Returns a new array of given shape and type, filled with ones. "_like" takes another array as input and uses the same shape and dtype |
| zeros, zeros_like | Returns a new array of given shape and type, filled with zeros | 
| empty, empty_like | Returns a new array of given shape and type, without initializing entries |
| full, full_like | Returns a new array of given shape and type, filled with "fill_value" |
| eye, identity | Return a 2-D array with ones on the diagonal and zeros elsewhere |
| indices | Returns an array representing the indices of a grid | 
| random.random, random.randn | Returns random floats: "random.random" draws uniformly from the half-open interval \[0.0, 1.0), "random.randn" draws from the standard normal distribution |

In [None]:
# Creating special arrays

a1 = np.ones((2,2)) # an array of ones - default type is float 
print(a1.dtype, a1) 

a2 = np.zeros((2,2,3), dtype='int') # an array of zeros (integer)
print(a2.dtype, a2)

a3 = np.empty((1,5)) # an empty array - without initialization
print(a3.dtype, a3)

a4 = np.full((2,2), 10) # a full array with the specified value
print(a4.dtype, a4)

In [None]:
# Other ways to create an array

a1 = np.random.random((2,2)) # an array with random values (0-1, uniform distribution)
print(a1.dtype, a1)

a2 = np.random.randn(1,2) # an array with random values (standard normal distribution: mean 0, std 1)
print(a2.dtype, a2)

a3 = np.arange(10, 25, 5) # a vector created from a range
print(a3.dtype, a3.shape, a3)

a4 = np.linspace(0, 2, 9) # a vector of evenly-spaced values
print(a4.dtype, a4.shape, a4)

a5 = np.indices((2, 2) ) # a matrix of indices
print(a5.dtype, a5)

<a id="operations"></a>
## Element-wise Operations

- Any arithmetic operation between equal-size arrays, or between an array and a single value, is applied element-wise
- These operations are done in **batch** without the need for an inefficient **for** loop. This is called **vectorization**
- Comparison/logical operators can also be used between the elements of two arrays of the same shape

In [None]:
# Arithmetic operations (with a single value)
d1 = np.array([[1, 2], [3, 4]], dtype=np.float32)

print(d1 + 1) # add one value to all elements
print(d1 * 2) # multiply all elements by a single value
print(d1 / 2) # divide all elements by a single value
print(d1 % 2) # remainder of all elements divided by a single value
print(1 / d1) # divide a single value by all elements

In [None]:
# Arithmetic operations (between two arrays with the same shape)
d1 = np.array([[1, 2], [3, 4]], dtype=np.float32)
d2 = np.array([[5, 6], [7, 8]], dtype=np.float32)

print(d1 + d2) # adding two arrays
print(d2 - d1) # subtracting one array from another one
print(d1 * d2) # multiplying two arrays
print(d1 / d2) # dividing two arrays
print(d1 % d2) # remainders 
print(d1 ** d2) # power

In [None]:
# Comparing arrays (element-wise)
print(d1 == d2)
print(d1 < d2)
print((d1 < d2) & (d1 == d2)) # using logical operators &, |, ^

# Comparing the speed of operations in Numpy with "for" loop
a_list = range(1000000)
a_arr = np.array(a_list)

%time res = [x ** 2 for x in a_list]  # using the %time magic command. For all magic commands visit
                                      # https://ipython.readthedocs.io/en/stable/interactive/magics.html
%time res = a_arr ** 2

del a_list, a_arr # release memory

<a id="broadcasting"></a>
## Broadcasting

- **Broadcasting** describes how arithmetic works between arrays of different shapes. 
- Two arrays can be combined element-wise only when their shapes are compatible: for each dimension, the sizes must be equal or one of them must be 1. Dimensions are compared in reverse order, starting with the trailing dimension; in the two-dimensional case, columns are compared before rows. An array with fewer dimensions is treated as if its shape were padded with leading 1s.
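
The compatibility rule can be checked without building any arrays. The sketch below uses `np.broadcast_shapes`, which is assumed to be available (it was added in NumPy 1.20); the shapes are made up for illustration.

```python
import numpy as np

# Shapes are aligned from the trailing dimension; sizes must match or be 1
print(np.broadcast_shapes((3, 2), (2,)))       # (3, 2)
print(np.broadcast_shapes((3, 2), (3, 1)))     # (3, 2)
print(np.broadcast_shapes((2, 3, 2), (3, 2)))  # (2, 3, 2)

# (3, 2) and (3,) are incompatible: the trailing sizes are 2 and 3, and neither is 1
try:
    np.broadcast_shapes((3, 2), (3,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)                              # False
```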

In [None]:
# 2-D & 1-D Example
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([3, 2])
c = np.array([3, 5, 4]).reshape((3, 1))
print(a.shape)
print(a)
print(b.shape, b)
print(c.shape, c)
print(a - b)
print(a - c)

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/broadcast.png" width="100%">
</center>

In [None]:
# 3-D vs 2-D
a = np.arange(12).reshape((2, 3, 2))
b = np.arange(6).reshape((3, 2))

print(a)
print(b)
print(a - b)

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/broadcast2.png" width="70%">
</center>

<a id="functions"></a>
## Element-wise (Universal) Functions

- NumPy provides a rich set of element-wise array functions
- There are two types of functions: **unary** (work on a single array) and **binary** (take two arrays)
- See the complete list [here](https://numpy.org/doc/stable/reference/routines.math.html)

In [None]:
# Sample of unary functions
a = np.arange(0, 5).astype(np.float32)

print(np.sin(a)) # Sine function
print(np.exp(a)) # Exponential function
b = np.zeros_like(a)
np.sqrt(a , b) # optional output array: the result is written into b
print(b) 
np.sqrt(a - 3, a) # invalid values become 'nan' (not a number)
print(a, np.isnan(a)) # isnan is to check if the elements are "nan"
a = 1 / np.array([1 , 0]); print(a , np.isinf(a)) # Numpy supports "inf" (infinity)

import warnings
warnings.filterwarnings('ignore') # controls whether warnings are ignored, displayed, 
                                  # or turned into errors (raising an exception)
print(np.sqrt(-1))
warnings.filterwarnings('default') # for more options: https://docs.python.org/3/library/warnings.html

In [None]:
# Sample of binary functions
b = np.arange(0, 5)
a = b * 2

print(a, b)
print(np.subtract(a, b), np.add(a, b), np.multiply(a, b))
print(np.maximum(a, b), np.minimum(a, b))
print(np.less_equal(a, b), np.greater(a, b)) # element-wise comparison 
print(np.logical_xor(np.less(a, b), np.equal(a, b))) # logical operations

### ufunc Methods

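Universal functions provide the following methods. **reduce**, **accumulate**, **reduceat**, and **outer** are only defined for ufuncs that take two inputs (e.g. **add**); **at** also works with unary ufuncs: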

| Method | Description | 
| :-- | :-- | 
| reduce() | Reduces array’s dimension by one, by applying ufunc along one axis | 
| accumulate() | Accumulates the result of applying the operator to all elements | 
| reduceat() | Performs a (local) reduce with specified slices over a single axis. | 
| outer() | Applies the ufunc op to all pairs (a, b) with a in A and b in B | 
| at() | Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’ | 

In [None]:
# reduce
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(np.add.reduce(a, axis=0))

# accumulate
a = np.array([10, 2, 3, 1, 5, 4])
print(np.minimum.accumulate(a))
print(np.multiply.accumulate(a))

# outer
print(np.multiply.outer([1, 2, 3], [4, 5, 6]))

# at
a = np.array([1, 2, 3, 4])
np.negative.at(a, [0, 1])
print(a)
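
The cell above does not exercise **reduceat**. As a small sketch (the input values are illustrative), each index in the list starts a slice that runs up to the next index, and the ufunc is reduced over each slice:

```python
import numpy as np

a = np.arange(8)                          # [0 1 2 3 4 5 6 7]

# Sum the slices a[0:2], a[2:6] and a[6:]
sums = np.add.reduceat(a, [0, 2, 6])
print(sums)                               # [ 1 14 13]

# The same slices reduced with maximum instead of add
maxima = np.maximum.reduceat(a, [0, 2, 6])
print(maxima)                             # [1 5 7]
```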

<a id="stat_funcs"></a>
## Aggregate and Statistical Functions
NumPy provides a set of functions that compute statistics about an entire array or about the data along an axis. These functions are also accessible as methods of the array class. 

| Function/Method | Description | Function/Method | Description |
| :-- | :-- | :-- | :-- | 
| mean() | Computes the arithmetic mean along the specified axis | min() | Returns the minimum of an array or minimum along an axis |
| std() | Computes the standard deviation along the specified axis | max() | Returns the maximum of an array or maximum along an axis |
| var() | Computes the variance along the specified axis | argmin(), argmax() | Returns the indices of the minimum / maximum values along an axis |
| sum() | Computes the sum of array elements over a given axis | all() | Tests whether all array elements along a given axis evaluate to True |
| prod() | Returns the product of array elements over a given axis | any() | Tests whether any array element along a given axis evaluates to True |
| cumsum() | Returns the cumulative sum of the elements along a given axis | apply_along_axis() | Applies a function to 1-D slices along the given axis |
| cumprod() | Returns the cumulative product of elements along a given axis | apply_over_axes() | Applies a function repeatedly over multiple axes |

In [None]:
# Statistical functions & methods
a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(a), a.mean()) # average of the entire array (both function and method)
print(a.mean(axis = 0), a.mean(axis = 1)) # average along a specified axis
print(a.std(), a.var(), a.sum(), a.prod())
print(np.sum(a, axis=0), np.prod(a, axis=1))
print(np.cumsum(a , axis=0)) 
print(a.cumprod(axis=1)) 
print(a.min(), a.max(), a.argmin(), a.argmax()) 
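
A common use of per-axis statistics is to combine them with broadcasting, for example to center the columns or rows of a matrix. This is a minimal sketch of that idea; `keepdims=True` is a standard argument of the aggregate methods that keeps the reduced axis with size 1.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

col_means = a.mean(axis=0)                  # one mean per column, shape (3,)
centered_cols = a - col_means               # (2, 3) - (3,) broadcasts over rows
print(centered_cols)

row_means = a.mean(axis=1, keepdims=True)   # keepdims keeps the shape (2, 1)
print(row_means.shape)                      # (2, 1)
centered_rows = a - row_means               # (2, 3) - (2, 1) broadcasts over columns
print(centered_rows)
```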

In [None]:
# Stat functions/methods on boolean arrays
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([True, False, True])
c = np.array([0, 1, 2, 3])

print(np.any(a % 9 == 7), np.all(a < 10))
print(b.any(), b.all())
print((a == 4).any(), (a > 0).all())
print((a == 4).any(axis=0), (a > 3).all(axis=1))
print(np.any(c), c.all()) # 0 = False, anything else = True
print(np.any([np.nan, 0, 0])) # np.nan is considered True

In [None]:
# Apply along an axis
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
def weighted_mean(arr):
    return(arr[0]*.25+arr[1]*.25+arr[2]*.5)

print(np.apply_along_axis(np.sum, 0, a)) # apply sum along axis = 0
print(np.apply_along_axis(weighted_mean, 1, a)) # apply weighted mean along axis = 1

# Apply over multiple axes
a = np.arange(12).reshape((2, 2, 3))
print(a)
print(np.apply_over_axes(np.sum, a, [0, 2]))

<a id="set_ops"></a>
## Set Operations
NumPy provides some basic set operations for ndarrays, some only for one-dimensional arrays:

| Function/Attribute | Description | 
| :-- | :-- | 
| unique() | Returns the sorted unique elements of an array |
| intersect1d() | Returns the sorted, unique values that are in both of the input arrays |
| union1d() | Returns the unique, sorted array of values that are in either of the two input arrays |
| in1d() | Tests whether each element of a 1-D array is also present in a second array (**isin()** is recommended for new code) |
| setdiff1d() | Finds the set difference of two arrays | 
| setxor1d() | Finds the set exclusive-or of two arrays |

In [None]:
# Set operations
a = np.array([1, 3, 5, 7, 7])
b = np.array([2, 3, 4, 4, 5])
c = np.array([[1, 2], [1, 2], [3, 4]])

print(np.unique(a))
print(np.unique(c, axis=0))
print(np.intersect1d(a, b), np.union1d(a, b))
print(np.setdiff1d(a, b), np.setxor1d(a, b))
print(np.in1d(2, b), np.in1d([2, 3], b), np.in1d(a, b))
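
The NumPy documentation recommends `np.isin` over `in1d` for new code. The sketch below repeats the membership test with `isin`, which also preserves the shape of multi-dimensional inputs; the `grid` array is a made-up example.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 7])
b = np.array([2, 3, 4, 4, 5])

mask = np.isin(a, b)     # True where an element of a also appears in b
print(mask)              # [False  True  True False False]
print(a[mask])           # [3 5]

grid = np.array([[1, 2], [3, 8]])
grid_mask = np.isin(grid, b)
print(grid_mask)         # the result keeps the 2-D shape of grid
```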

<a id="indexing"></a>
## Indexing and Slicing

- NumPy indexing is very powerful and provides many ways of selecting a subset of data.
- NumPy slicing can create a **view** to a subset of an array (instead of copying data). That makes it very memory efficient.
- Any modifications to the view will be reflected in the source array.
- A scalar value can be assigned to all the elements of a slice (**broadcasting**)
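
Whether two arrays refer to the same data can be checked directly. The following sketch uses `np.shares_memory` and the `base` attribute to tell a view from a copy; the variable names are illustrative.

```python
import numpy as np

a = np.arange(8)
view = a[2:6]            # basic slicing returns a view
copy = a[2:6].copy()     # an explicit, independent copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
print(view.base is a)              # True: the view's data is owned by a

view[:] = 0              # writes through to a
print(a)                 # [0 1 0 0 0 0 6 7]
print(copy)              # [2 3 4 5]
```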

In [None]:
# Basic indexing
a_1d = np.arange(8) # a one-dimensional array
a_2d = np.array([[1, 2], [3, 4]])

print(a_1d)
print(a_1d[3]) # the fourth element in the array
print(a_2d)
print(a_2d[0]) # the first row of the array 
print(a_2d[1][1]) # the element in the index (1, 1)
print(a_2d[1, 1]) # the alternative way

a_2d[0, 0] = 100
print(a_2d)

In [None]:
# Basic slicing
a_1d = np.arange(8) # a one-dimensional array
a_2d = np.array([[1, 2], [3, 4]])
print(a_1d[2:6]) # 3rd to 6th elements
print(a_1d[:]) # all the elements
print(a_2d[:,:]) # all the elements
print(a_2d[0,:], a_2d[0]) # the first row
print(a_2d[:,1], a_2d[:,1].shape) # the second column

# slices are views
a_slice = a_1d[2:6]; print(a_slice)
a_slice_copy = a_1d[2:6].copy() # get a copy not a view
a_slice[0] = 100
print(a_slice, a_1d) # both arrays are changed
a_1d[3:5] = 50; print(a_1d) # broadcasting a value to a slice
a_slice[:] = 44; print(a_1d, a_slice_copy) 

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/indexing.png" width="80%">
</center>

In [None]:
# More on slicing 
a_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a_2d[0], a_2d[0,:]) # the first row
print(a_2d[:,0]) # the first column
print(a_2d[:,1:3]) # the last two columns
print(a_2d[:,1:]) # the last two columns
print(a_2d[1:,1:]) # a slice
print(a_2d[-1,:]) # the last row

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/indexing2.png" width="100%">
</center>

In [None]:
# Boolean indexing - slicing arrays by boolean indexes
people = np.array(["Jack", 'Joe', 'Sara', 'Susan'])
data = np.random.randn(4, 4) # random data. 4x4 matrix
np.set_printoptions(precision=3) # make the prints look nicer - 3 decimal numbers

print(data)
print(people == 'Joe', (people == 'Sara') | (people == 'Susan'))
print(data[people == 'Joe']) # selecting the row assuming every row belongs to a person
print(data[:, people == 'Joe']) # selecting the column assuming every column belongs to a person
print(data[np.char.startswith(people, 'S')]) # Selecting all rows assigned to people whose names start with 'S'
print(data[~np.char.startswith(people, 'S')]) # Selecting the reverse

In [None]:
# Assigning to slices by boolean indexing
people = np.array(["Jack", 'Joe', 'Sara', 'Susan'])
data = np.random.randn(4, 4) # random data. 4x4 matrix

print(data)
data[data < 0] = 0; print(data) # making all negative number zero
data[~(people == 'Joe')] = 1.0 # setting every row to 1 except Joe's
print(data)

In [None]:
# Fancy indexing - indexing using integer array of indices 
a = np.array([[(i)*10+j for j in range(5)] for i in range(5)]) # creating a special matrix 
print(a)
print(a[[0, 2, 4]]) # selecting only even rows
print(a[:, range(1, 5, 2)]) # selecting odd columns
print(a[[-1, -3]]) # negative indices
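
Unlike basic slicing, selecting with an integer array or a boolean mask returns a copy, so changing the result leaves the source untouched. Assigning through such an index still modifies the source. A short sketch with made-up values:

```python
import numpy as np

a = np.arange(10, 60, 10)        # [10 20 30 40 50]

picked = a[[0, 2, 4]]            # integer-array indexing copies the selected elements
picked[:] = -1
print(a)                         # [10 20 30 40 50], unchanged

masked = a[a > 25]               # boolean indexing copies as well
print(np.shares_memory(a, picked), np.shares_memory(a, masked))   # False False

a[[0, 2, 4]] = 0                 # assigning through the index does modify a
print(a)                         # [ 0 20  0 40  0]
```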

<a id="sort_search"></a>
## Sorting & Searching

| Function/Attribute | Description | 
| :-- | :-- | 
| sort() | Returns a sorted copy of an array |
| &lt;array&gt;.sort() | Sorts an array in-place |
| argsort() | Returns the indices that would sort an array |
| searchsorted() | Finds indices where elements should be inserted to maintain order |
| where() | Returns elements chosen from array1 or array2 depending on condition |
| choose() | Constructs an array from an index array and a set of arrays to choose from | 
| extract() | Returns the elements of an array that satisfy some condition |


In [None]:
# Sort function
a = np.array([[3, 2, 5, 0], [2, 5, 0, 4]])

print(np.sort(a)) # sort along last axis (-1)
print(np.sort(a, axis=0, kind='mergesort')) # kind: {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}
print(np.argsort(a , axis=1)) # sorted indices
print(np.partition(a, 1))

# In-place sorting
a.sort(axis=0)
print(a)

In [None]:
a = np.array([[3, 2, 5, 0], [2, 5, 0, 4]])
b = np.array([[5, 2, 1, 2], [0, 6, 2, 1]])

# Searching functions
print(np.nonzero(a)) # returns non-zero indices
print(np.where(a < 4)) # returns indices where the condition is satisfied
print(np.where(a < b, a, b)) # comparing two arrays 
print(np.choose([0, 1, 1, 0], a)) # choosing elements from different rows 
print(np.extract(np.mod(a, 2) == 0, a)) # returns the elements satisfying a condition

# Search sorted
print(np.searchsorted([1, 2, 4, 5], 3))
print(np.searchsorted([1, 2, 4, 5], [3, 6]))
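
When the searched value already occurs in the sorted array, the `side` argument decides whether the insertion point falls before (`'left'`, the default) or after (`'right'`) the existing entries. The sketch below also shows one possible use, assigning values to bins; the `edges` and `scores` arrays are invented for illustration.

```python
import numpy as np

sorted_vals = np.array([1, 2, 2, 4])

left = np.searchsorted(sorted_vals, 2)                  # before the existing 2s
right = np.searchsorted(sorted_vals, 2, side='right')   # after the existing 2s
print(left, right)                                      # 1 3

# Binning: which interval defined by the edges does each score fall into?
edges = np.array([50, 70, 90])
scores = np.array([45, 50, 69, 95])
bins = np.searchsorted(edges, scores, side='right')
print(bins)                                             # [0 1 1 3]
```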

<a id="manipulating"></a>
## Manipulating Arrays

NumPy provides a rich set of functions to manipulate arrays:

- Reshaping arrays
- Transposing and swapping axes
- Changing the number of dimensions
- Joining arrays
- Splitting arrays
- Repeating arrays
- Adding and removing elements
- Rearranging elements

### Reshaping Arrays

- You can reshape any NumPy array to another shape (without changing its data) as long as the size is the same. 
- By default the elements are put into the reshaped array using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. The reverse order option is available too.  

<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/reshaping.png" width="80%">
</center>

| Function/Attribute | Description | 
| :-- | :-- | 
| reshape(array, newshape, order='C') | Gives a new shape to an array without changing its data |
| &lt;ndarray&gt;.reshape(shape, order='C') | Returns an array containing the same data with a new shape |
| ravel(array, order='C') | Returns a contiguous flattened array |
| &lt;ndarray&gt;.flat | A 1-D iterator over the array |
| &lt;ndarray&gt;.flatten(order='C') | Return a copy of the array collapsed into one dimension | 

In [None]:
# Reshaping
a = np.arange(8)
print(a)
print(a.reshape(2, 4))
print(a.reshape(2, -1)) # -1: automatically calculates the dimension based on the size
print(a.reshape(2, 4, order='F')) # Fortran-like order 
print(a.reshape(2, 2, 2))

In [None]:
# Flattening
a = np.arange(8).reshape(4, 2) * 2

print(a)
print(a.flatten()) # returns a copy
print(np.ravel(a)) # returns a view when possible (otherwise a copy)

for index, element in enumerate(a.flat):
    print(index, element)
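
The difference between **flatten** and **ravel** can be verified with `np.shares_memory`. In this sketch, **ravel** returns a view for a contiguous array but has to copy when the elements are not laid out in the requested order, as with a transposed array.

```python
import numpy as np

a = np.arange(8).reshape(4, 2)

flat_copy = a.flatten()            # flatten always copies
flat_view = np.ravel(a)            # a is contiguous, so ravel can return a view
flat_t = np.ravel(a.T)             # the transposed view needs a copy

print(np.shares_memory(a, flat_copy))   # False
print(np.shares_memory(a, flat_view))   # True
print(np.shares_memory(a, flat_t))      # False
```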

### Transposing and Swapping Axes

| Function/Attribute | Description | 
| :-- | :-- | 
| moveaxis(array, source, destination) | Moves axes of an array to new positions |
| rollaxis(array, axis, start=0) | Rolls the specified axis backwards, until it lies in a given position |
| swapaxes(array, axis1, axis2) | Interchanges two axes of an array |
| &lt;ndarray&gt;.T | The transposed array (the array itself is returned if ndim < 2) | 
| transpose(array, axes=None) | Returns an array with its axes permuted (reversed by default) |

In [None]:
# Transposing and swapping
a = np.arange(8).reshape(2, 4)
print(a) 
print(a.T) # transposing a 2D matrix - returns a view, not a copy
print(a.swapaxes(0, 1)) # swapping the two axes (alternative to transpose)
a_3d = a.reshape(2, 2, 2)
print(a_3d.T)
print(a_3d.transpose(2, 0, 1)) # the order of axes to transpose

### Changing Number of Dimensions

| Function/Attribute | Description | 
| :-- | :-- | 
| atleast_1d(\*arrays) | Converts inputs to arrays with at least one dimension |
| atleast_2d(\*arrays) | Views inputs as arrays with at least two dimensions |
| atleast_3d(\*arrays) | Views inputs as arrays with at least three dimensions |
| expand_dims(array, axis) | Expands the shape of an array |
| squeeze(array, axis=None) | Removes single-dimensional entries from the shape of an array |

In [None]:
# Changing the number of dimensions
a = np.arange(2)

def print_with_shape(arr):
    print(arr.shape, arr)
    
print_with_shape(np.atleast_1d(a))
print_with_shape(np.atleast_2d(a))
print_with_shape(np.atleast_3d(a))
print_with_shape(np.expand_dims(a, axis=0)) # expand dimension at axis 0
b = np.expand_dims(a, axis=1)
print_with_shape(b) # expand dimension at axis 1
print_with_shape(np.squeeze(b, axis=1))

### Joining Arrays

| Function/Attribute | Description | 
| :-- | :-- | 
| concatenate(arrays, axis=0, out=None) | Joins a sequence of arrays along an existing axis |
| stack(arrays, axis=0, out=None) | Joins a sequence of arrays along a new axis |
| dstack(arrays) | Stacks arrays in sequence depth wise (along third axis) |
| hstack(arrays) | Stacks arrays in sequence horizontally (column wise) |
| vstack(arrays) | Stacks arrays in sequence vertically (row wise) |
| block(arrays) | Assembles an nd-array from nested lists of blocks |

In [None]:
# Joining arrays
a = np.array([[1, 2]])
b = np.array([[3, 4]])

def print_with_shape(arr):
    print(arr.shape, arr)

print_with_shape(a)
print_with_shape(b)
print_with_shape(np.concatenate((a, b)))
print_with_shape(np.concatenate((a, b), axis=1))
print_with_shape(np.stack((a, b)))
print_with_shape(np.hstack((a, b)))
print_with_shape(np.vstack((a, b)))

In [None]:
# Building a block
a = np.eye(2)
b = np.eye(3) * 2
c = np.zeros((2, 3))
d = np.ones((3, 2))

print(np.block([[a , c], [d, b]]))

### Splitting Arrays

| Function/Attribute | Description | 
| :-- | :-- | 
| split(array, indices_or_sections, axis=0) | Splits an array into multiple sub-arrays |
| array_split(array, indices_or_sections, axis=0) | Splits an array into multiple sub-arrays |
| dsplit(array, indices_or_sections) | Splits array into multiple sub-arrays along the 3rd axis (depth) |
| hsplit(array, indices_or_sections) | Splits an array into multiple sub-arrays horizontally (column-wise) |
| vsplit(array, indices_or_sections) | Splits an array into multiple sub-arrays vertically (row-wise) |

In [None]:
# Splitting arrays
a = np.arange(6)
a_2d = a.reshape((2, 3))

print(np.split(a, 3)) # doesn't allow an integer that does not equally divide the axis (e.g. 4)
print(np.split(a_2d, 3, axis=-1))
print(np.array_split(a, 4))
print(np.hsplit(a_2d, 3))
print(np.vsplit(a_2d, 2))

### Repeating Arrays


| Function/Attribute | Description | 
| :-- | :-- | 
| tile(array, repeats) | Constructs an array by repeating the input array the number of times given by "repeats" |
| repeat(array, repeats, axis=None) | Repeats each element of an array (the flattened array is used when axis=None) |

In [None]:
# Repeating arrays
a = np.array([1, 2])
b = np.array([[1, 2], [3, 4]])

print(np.tile(a, 3))
print(np.tile(a, (2, 2)))
print(np.tile(b, 2))
print(np.repeat(a, 3))
print(np.repeat(b, 2, axis=0))
print(np.repeat(b, 2, axis=1))

### Adding and Removing Elements

| Function/Attribute | Description | 
| :-- | :-- | 
| delete(array, indices, axis=None) | Returns a new array with sub-arrays along an axis deleted |
| insert(array, indices, values, axis=None) | Inserts values along the given axis before the given indices |
| append(array, values, axis=None) | Appends values to the end of an array |
| resize(array, new_shape) | Returns a new array with the specified shape |
| &lt;array&gt;.resize(new_shape) | Resizes an array in-place |
| trim_zeros(array, trim='fb') | Trims the leading and/or trailing zeros from a 1-D array or sequence |
| unique(array, return_index=False, return_inverse=False, return_counts=False, axis=None) | Finds the unique elements of an array |

In [None]:
# Deleting & inserting elements
a = np.eye(3)

print(np.delete(a, 1)) # axis=None applies to the flattened array
print(np.delete(a, 1, axis=0)) # deleting the second row
print(np.delete(a, [0, 2], axis=1)) # deleting the first and third columns
print(np.insert(a, -1, 5, axis=1)) # inserting a column of 5s before the last column
print(np.insert(a, 3, [1, 2, 3], axis=1)) # values as a list

In [None]:
# appending elements
a = np.eye(3)

print(np.append(a, [[1, 2, 3]], axis=0)) # values should have same number of dimensions as array 
print(np.append(a, [[1], [2], [3]], axis=1)) # values should have same number of dimensions as array 

# Resizing
a = np.ones((3, 3))
print(np.resize(a, (2, 6))) # If the new size is larger than the original, the new array is filled with repeated copies of a
a.resize((2, 6)); print(a) # in-place: extra elements are filled with zeros

In [None]:
# Trimming zeros
a = np.array([0, 0, 1, 2, 3, 0])
b = np.array([[1, 2], [1, 2], [3, 4]])

print(np.trim_zeros(a)) # trimming from both front and back
print(np.trim_zeros(a, trim='f')) # trimming from front
print(np.trim_zeros(a, trim='b')) # trimming from back

print(np.unique(a))
print(np.unique(b, axis=0))

### Rearranging Elements

| Function/Attribute | Description | 
| :-- | :-- | 
| flip(array, axis=None) | Reverses the order of elements in an array along the given axis |
| fliplr(array) | Flips array in the left/right direction |
| flipud(array) | Flips array in the up/down direction |
| roll(array, shift, axis=None) | Rolls array elements along a given axis |
| rot90(array, k=1, axes=(0, 1)) | Rotates an array by 90 degrees in the plane specified by axes |

In [None]:
# Rearranging elements
a = np.array([[1, 2], [3, 4]])
b = np.arange(8)

print(np.flip(a, axis=0)) # flip around the first axis
print(np.fliplr(a)) # flip left right
print(np.flipud(a)) # flip up down

print(np.roll(b, 2)) # shift by 2 to right
print(np.roll(b, -2)) # shift by 2 to left
print(np.rot90(a)) # rotate 90 degrees counter clockwise

<a id="linear_algebra"></a>
## Linear Algebra

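Like other array-processing libraries, NumPy provides the most common linear algebra operations. The decomposition and solver functions live in the **numpy.linalg** module; **dot**, **matmul**, **diag**, and **trace** are available directly from the top-level **numpy** namespace.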


| Function/Method | Description | Function/Attribute | Description |
| :-- | :-- | :-- | :-- | 
| dot() | Dot product of two arrays (matrix multiplication) | linalg.inv() | Computes the (multiplicative) inverse of a matrix |
| matmul() | Matrix product of two arrays | linalg.pinv() | Computes the (Moore-Penrose) pseudo-inverse of a matrix |
| diag() | Extracts a diagonal or constructs a diagonal array | linalg.qr() | Computes the QR factorization of a matrix |
| trace() | Returns the sum along diagonals of the array | linalg.svd() | Computes the Singular Value Decomposition |
| linalg.det() | Computes the determinant of an array | linalg.solve() | Solves a linear matrix equation, or system of linear scalar equations |
| linalg.eig() | Computes the eigenvalues and right eigenvectors of a square array | linalg.lstsq() | Returns the least-squares solution to a linear matrix equation |

In [None]:
# Basic linear algebra operations
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

print(np.dot(v1, v2), v1.dot(v2)) # inner product 
print(np.dot(m1, m2)) # Matrix multiplication
print(m2.dot(m1))
print(np.matmul(m1, m2)) # Matrix multiplication (preferred)
print(m1 @ m2) # Matrix multiplication (preferred)
print(np.diag(m1)) # Diagonal array
print(np.trace(m2)) # Trace of the matrix

In [None]:
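# numpy.linalg module
from numpy.linalg import det, inv, pinv, eig # import only the functions used below
m = np.array([[1, 2, 3],[3, 2, 1],[1, 0, -1]])


print(m)
print(det(m)) # m is singular (row 3 = (row 2 - row 1) / 2), so this is 0 up to rounding error
print(inv(m.dot(m.T))) # m.dot(m.T) is singular too: inv may raise LinAlgError or return meaningless values
print(m.dot(pinv(m))) # the pseudo-inverse is defined even for singular matrices
e_vals, e_vects = eig(m)
print(e_vals) # eigenvalues
print(e_vects) # eigenvectors (stored as columns)
print(np.dot(m, e_vects[:, 1]), e_vals[1] * e_vects[:, 1]) # checking matrix * eigenvector = eigenvalue * eigenvector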
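
The table also lists **linalg.solve()** and **linalg.lstsq()**, which the cells above do not exercise. A minimal sketch with a made-up 2x2 system and a made-up line fit:

```python
import numpy as np

# Solve 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)                          # approximately [2. 3.]
print(np.allclose(A @ x, b))      # True

# Least squares: fit y = slope * t + intercept to points lying on y = 2t + 1
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * t + 1
design = np.column_stack([t, np.ones_like(t)])   # one column per unknown
coeffs, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)
print(coeffs)                     # approximately [2. 1.]
```

`solve` requires a square, non-singular matrix; `lstsq` also handles systems with more equations than unknowns.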

<a id="random"></a>
## Random Generation

- The **numpy.random** module provides functions to efficiently generate arrays of pseudorandom numbers drawn from different probability distributions (e.g. uniform, normal)
- **Pseudorandom** numbers are generated by a deterministic algorithm based on a **seed** (global state)
- You can also use a local state using **RandomState**

| Function/Class | Description | Function/Attribute | Description |
| :-- | :-- | :-- | :-- | 
| seed() | Seeds the pseudorandom generator | randn() | Returns samples from the “standard normal” distribution |
| RandomState | Container for the Mersenne Twister pseudo-random number generator | binomial() | Draws samples from a binomial distribution |
| permutation() | Randomly permutes a sequence, or return a permuted range | normal() | Draws samples from a normal distribution |
| shuffle() | Modifies a sequence in-place by shuffling its contents | beta() | Draws samples from a beta distribution |
| choice() | Generates a random sample from a given 1-D array | chisquare() | Draws samples from a chi-square distribution |
| rand() | Returns samples from the uniform distribution over \[0, 1) | gamma() | Draws samples from a gamma distribution |
| randint() | Returns random integers from low (inclusive) to high (exclusive) | uniform() | Draws samples from a uniform distribution |

#### Uniform Distribution vs. Normal Distribution:


<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/uni_norm_dist.png" width="80%">
</center>

In [None]:
# Random generation
print(np.random.rand(2, 10)) # from uniform distribution 
print(np.random.randn(2, 10)) # from standard normal distribution
print(np.random.randint(2, size=10)) # integers in [0, 2)
print(np.random.randint(2, 5, size=(2, 5))) # integers in [2, 5)
print(np.random.normal(5, 2, size=(2,10))) # mean = 5, std = 2
print(np.random.uniform(-2, 2, size=(2,10))) # from -2 to 2
print(np.random.choice([1, 10,15], size=10)) # randomly choose 10 numbers from (1, 10, 15)

In [None]:
# Seeding and random state
np.random.seed(2020)
print(np.random.rand(2, 10))

np.random.seed(2020)  # repeat
print(np.random.rand(2, 10))

rg = np.random.RandomState(2021) # using Random State
print(rg.rand(2, 10))
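
Recent NumPy versions also offer the `Generator` interface, created with `np.random.default_rng`, as an alternative to the global seed and **RandomState** used above. This is a sketch of that alternative, not the interface the rest of this notebook relies on:

```python
import numpy as np

rng = np.random.default_rng(2020)        # a local generator with its own state
first = rng.random((2, 3))               # uniform floats in [0, 1)

rng_again = np.random.default_rng(2020)  # same seed, same sequence
second = rng_again.random((2, 3))
print(np.array_equal(first, second))     # True

normals = rng.standard_normal(3)         # samples from the standard normal distribution
ints = rng.integers(2, 5, size=5)        # integers in [2, 5)
print(normals, ints)
```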

In [None]:
# Permutation and shuffle
print(np.random.permutation(10)) # permuted list of 0 to 9 
print(np.random.permutation([1, 3, 5, 7, 9])) # permutation of an input sequence or array
a = np.arange(9).reshape((3, 3))
print(np.random.permutation(a))

print(a)
np.random.shuffle(a)
print(a)

<a id="vectorization"></a>
## Vectorization

- NumPy arrays can make many data processing tasks that would otherwise require writing loops faster and easier. This is called **vectorization**
- Vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, and the code is shorter

In [None]:
# Vectorization example
ra = np.random.randn(1000, 1000) # 1 million (1000x1000) random numbers from standard normal distribution

# Counting positive elements using loops
def count_positive(a2d):
    lx, ly = a2d.shape
    count = 0
    for i in range(lx):
        for j in range(ly):
            if a2d[i, j] > 0:  # index the function argument, not the global array
                count += 1
    return count

%time print(count_positive(ra))

# counting using vectorization
%time print(np.count_nonzero(ra > 0))

# alternative function (NumPy)
%time print(np.sum(ra > 0))



### Another Example: Buy Low Sell High 

- **Goal:** find the best times to buy and sell to maximize profit
- **Assumption:** only one buy and one sell are allowed
- **Data:** the stock’s price history as a sequence
- **Algorithm:** compute the difference between each price and the running minimum up to that point; the largest difference marks the best time to sell
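
The running-minimum idea can be traced by hand on a tiny price series. The prices below are made up for illustration, and using `np.argmin` on the slice before the sell day is one possible way to recover the buy day.

```python
import numpy as np

prices = np.array([7.0, 3.0, 8.0, 2.0, 6.0, 9.0, 1.0, 4.0])  # hypothetical prices

running_min = np.minimum.accumulate(prices)  # lowest price seen so far
profit = prices - running_min                # best profit if we sell on each day

sell_idx = int(np.argmax(profit))                  # day with the largest profit
buy_idx = int(np.argmin(prices[:sell_idx + 1]))    # cheapest day up to the sell day

print(running_min)  # [7. 3. 3. 2. 2. 2. 1. 1.]
print(profit)       # [0. 0. 5. 0. 4. 7. 0. 3.]
print(buy_idx, sell_idx, profit[sell_idx])  # 3 5 7.0
```

Note that the overall minimum (price 1.0 at index 6) comes after the best sell day, so the buy day has to be searched only among the prices up to and including the sell day.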

In [None]:
# Let's generate some sample data
rg = np.random.RandomState(2020)
x = np.arange(1, 10, 0.1)
stock_prices = 10 + 4 * np.sin(1.5 * x) + 3 * rg.randn(x.shape[0]) # 10 + 4sin(1.5x) + 3*noise

# plotting the data (we will learn this in later sessions)
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(x, stock_prices)
plt.show()

In [None]:
def vectorized_method(prices):
    running_min = np.minimum.accumulate(prices)
    sell_idx = np.argmax(prices - running_min)
    buy_idx = np.where(prices == running_min[sell_idx])[0][0]  # first day the price hit the running minimum
    print(buy_idx, sell_idx)

def regular_method(prices):
    running_min_tpl = (0, -1) # (min_val, min_idx)
    max_tpl = (0, -1, -1) # (max_diff, buy_idx, sell_idx)
    for idx, price in enumerate(prices):
        if idx == 0:
            running_min_tpl = (price, idx)
        else:
            if running_min_tpl[0] > price:
                running_min_tpl = (price, idx)
            diff = price - running_min_tpl[0]
            if max_tpl[0] < diff:
                # remember the buy index that produced this profit;
                # the running minimum may move again after the best sell day
                max_tpl = (diff, running_min_tpl[1], idx)
    print(max_tpl[1], max_tpl[2])

In [None]:
%time vectorized_method(stock_prices)

%time regular_method(list(stock_prices))

### Another Example: Monty Hall Problem 

- **Problem:** Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
- **Empirical Method:** Let's play the game a large number of times (with random door assignment and random picking) and estimate the probability of winning both if you switch and if you don't.


<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/monty_hall.png" width="50%"><br>
    Source: https://en.wikipedia.org/wiki/Monty_Hall_problem
</center>

In [None]:
%%time
# Monty Hall Solution
num_trial = 100000
doors = np.array([True, False, False]) # True = Car, False = Goat
win_count = 0 # number of wins without switching
for _ in range(num_trial):
    np.random.shuffle(doors) # shuffling door assignments
    win_count += doors[np.random.randint(3)] # choose a door randomly and increase the count if the right door
print('Prob win without switching:', win_count / num_trial)
print('Prob win with switching:', (num_trial-win_count) / num_trial)

In [None]:
%%time
# Monty Hall Solution (Vectorized)
num_trial = 100000
rg = np.random.randint
p = np.mean(rg(3, size=num_trial) == rg(3, size=num_trial)) # car door numbers vs selected door numbers
print('Prob win without switching:', p)
print('Prob win with switching:', 1-p)
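
Both solutions above use the shortcut that switching wins exactly when the first pick was wrong. As a cross-check, the sketch below simulates the host's move explicitly. The door-index arithmetic (`3 - pick - car`), the seed, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rs = np.random.RandomState(2020)  # arbitrary seed
num_trial = 100_000

car = rs.randint(3, size=num_trial)   # door hiding the car (0, 1 or 2)
pick = rs.randint(3, size=num_trial)  # contestant's first choice

# The host opens a goat door that is not the contestant's pick.
# Door indices sum to 0 + 1 + 2 = 3, so when pick != car the only
# legal door is 3 - pick - car. When pick == car, the host chooses
# one of the two other doors at random.
offset = rs.randint(1, 3, size=num_trial)  # 1 or 2
opened = np.where(pick != car, 3 - pick - car, (pick + offset) % 3)

switched = 3 - pick - opened  # the one door that is neither picked nor opened

p_stay = np.mean(pick == car)
p_switch = np.mean(switched == car)
print('Prob win without switching:', p_stay)
print('Prob win with switching:', p_switch)
```

The two estimates should come out near 1/3 and 2/3, matching the shortcut.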

<a id="storing"></a>
## Storing and Retrieving Arrays

Functions to save and load data to and from disk either in text or binary format:

| Function/Method | Description | 
| :-- | :-- | 
| save() | Saves an array to a binary file in NumPy .npy format |
| savez() | Saves several arrays into a single file in uncompressed .npz format |
| savez_compressed() | Saves several arrays into a single file in compressed .npz format |
| savetxt() | Saves an array to a text file |
| &lt;ndarray&gt;.tofile() | Writes array to a file as text or binary (default) |
| load() | Loads arrays or pickled objects from .npy, .npz or pickled files |
| loadtxt() | Loads data from a text file |
| genfromtxt() | Loads data from a text file, with missing values handled as specified |
| fromregex() | Constructs an array from a text file, using regular expression parsing |

In [None]:
# Saving one object in a file 
a = np.random.randn(3, 20)
print(a)
np.save('temp', a) # save

!ls *.npy # list the .npy files in the current directory

print(np.load('temp.npy')) # load 

In [None]:
# Saving more than one object in a file
a = np.random.randn(2, 10)
b = np.random.randn(1, 10)

with open('temp2.npy', 'wb') as file: # saving
    np.save(file, a)
    np.save(file, b)
    
!ls *.npy

with open('temp2.npy', 'rb') as file: # loading in the same order as saved
    a2 = np.load(file)
    b2 = np.load(file)

print(a)
print(a2)
print(b)
print(b2)

In [None]:
# Save several arrays in NPZ format
a = np.random.randn(2, 10)
b = np.random.randn(1, 10)

np.savez('arrays', a, b) # save arrays (stored as arr_0, arr_1)

!ls *.npz

npz = np.load('arrays.npz') # load files
print(npz.files)
print(npz['arr_1'])
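
Positional arguments to `savez()` are stored under the automatic names `arr_0`, `arr_1`, and so on. A minimal sketch of the keyword form, which stores each array under a chosen key, is shown below. The file name, the key names, and the use of a temporary directory are assumptions for this example.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 5)

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'named_arrays.npz')  # hypothetical file name
    np.savez(path, first=a, second=b)  # keyword names become the keys in the archive

    with np.load(path) as npz:  # the context manager closes the file on exit
        keys = sorted(npz.files)
        a_loaded = npz['first']
        b_loaded = npz['second']

print(keys)
```

Named keys tend to be easier to read back than `arr_0` and `arr_1` when an archive holds many arrays.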

In [None]:
# Save into and load from a text file
a = np.random.randn(5, 10)

np.savetxt('array2.txt', a, fmt='%0.3f')  # default delimiter is space

!cat array2.txt

b = np.loadtxt('array2.txt')
print(b.shape)
print(b)
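
Text output is lossy when a format such as `'%0.3f'` is used, because the values are rounded before they are written. The sketch below writes a comma-separated file and measures the round-trip error. The file name, seed, and tolerance are illustrative assumptions.

```python
import os
import tempfile
import numpy as np

rs = np.random.RandomState(0)  # arbitrary seed
a = rs.randn(5, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'array.csv')  # hypothetical file name
    np.savetxt(path, a, fmt='%0.3f', delimiter=',')  # comma instead of the default space
    b = np.loadtxt(path, delimiter=',')              # the delimiter must match on load

same_shape = b.shape == a.shape
exact = np.array_equal(a, b)                  # rounding to 3 decimals loses precision
close = np.allclose(a, b, rtol=0, atol=1e-3)  # but values agree to about 3 decimals

print(same_shape, exact, close)
```

When exact round-trips matter, the binary formats (`save()` and `savez()`) are the safer choice.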

### Example: Nearest Neighbors (NN)

- **Goal:** Finding similar images using nearest neighbors method
- **Data:** MNIST, a large database of handwritten digits that is commonly used for training various image processing systems. Every row is a vector of size 784 that is the flattened version of a 28x28 grayscale image. 
- **Similarity Algorithm**: Euclidean distance between the image vectors
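
Before applying the method to images, the distance computation can be tried on a handful of 2-D points. The points and the query below are made up for illustration.

```python
import numpy as np

points = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 2.0],
                   [3.0, 3.0],
                   [0.5, 0.5]])  # hypothetical data set, one point per row
query = np.array([0.0, 0.0])

diff = points - query                     # broadcasting: (5, 2) - (2,) -> (5, 2)
sq_distances = np.sum(diff ** 2, axis=1)  # squared Euclidean distance per row
nearest = np.argsort(sq_distances)[:3]    # indices of the 3 closest points

print(sq_distances)  # [ 0.   1.   4.  18.   0.5]
print(nearest)       # [0 4 1]
```

Because the square root is monotonic, ranking by squared distance gives the same neighbors as ranking by the Euclidean distance itself, so the `sqrt` can be skipped.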


<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/5nn.png" width="45%"><br>
    5-NN for 2-D Vectors
</center>



<center>
    <img src="https://github.com/sayyed-uoft/images/raw/main/mnist.png" width="70%"><br>
    MNIST Sample Images
</center>

In [None]:
data = np.loadtxt('mnist.csv', delimiter=',') # loading MNIST images
digits = data[:, 1:] # the first column is the label
print(digits.shape)
print(digits[0])

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

samples = [0, 200, 300, 400, 1000] # picking some sample digits
fig, axes = plt.subplots(5, 5, figsize=(7.5,10)) # creating a plot grid
for i in range(5):
    sample_digit = digits[samples[i]] # get the corresponding vector for the sample digit
    sq_distances = np.sum((digits-sample_digit)**2, axis=1) # calculate the squared Euclidean distance, using broadcasting
    top_5 = np.argsort(sq_distances)[:5] # sort and get the 5 nearest (lowest distance); the first is the sample itself
    for j in range(5):
        ax = axes[i, j]
        ax.imshow(digits[top_5[j]].reshape((28,28)), cmap='gray') # show the image
plt.tight_layout()
plt.show()