# Module 6: Numerical Python (NumPy)

## What is NumPy?

- A library of pre-built functions + data objects


- Underlying code has been optimized for speed of computation


- NumPy eliminates the need to write explicit loops (iterative control flow statements) for arithmetic operations on array elements, so code is easier to read + maintain.


- Mostly used for processing arrays of numerical data


- Most scientific + data analytics software packages available for use within Python were built using components of NumPy (e.g., Pandas, scikit-learn, etc.)
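
To illustrate the point above about avoiding explicit loops, here is a minimal sketch that doubles a handful of numbers twice: once with a plain Python loop and once with a single NumPy expression. The sample values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]   # hypothetical sample data

# plain Python: an explicit loop visits each element in turn
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# NumPy: the same arithmetic is written once for the whole array
doubled_np = np.array(values) * 2

print(doubled_loop)
print(doubled_np)
```

Both versions produce the same numbers; the NumPy version simply states the arithmetic without any loop bookkeeping.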


## ndarray: A Multi-dimensional Array Object

- NumPy's most important data object/type


- 'n dimensional array' = ndarray


- ndarrays can be of any dimensionality, e.g., uni-dimensional, bi-dimensional, tri-dimensional, etc.


- Unlike Python lists, elements of NumPy arrays are __always__ of the same data type.


Some simple examples with a uni-dimensional array:


In [None]:
# load the numpy library into your Python environment
import numpy as np

# create a 1-D array: pass a list of items to np.array()
simple = np.array([1, 2, 3, 4, 5])
simple

array([1, 2, 3, 4, 5])

In [None]:
# check type of new object
type(simple)

numpy.ndarray

In [None]:
# indices in a 1D array work the same way as a list
simple[2]

3

In [None]:
# update a value within a 1D array
simple[4] = 101
simple

array([  1,   2,   3,   4, 101])

In [None]:
# elements must be same data type
# e.g., assigning a float to an element of an int array truncates it to an int
simple[4] = 5.2
simple

array([1, 2, 3, 4, 5])
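
If the fractional part needs to be kept, one option is to convert the array to a float data type first. The sketch below uses the ndarray `astype()` method for that; the variable names `ints` and `floats` are hypothetical.

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5])

# assigning a float into an int array drops the fractional part
ints[4] = 5.2
print(ints)

# astype() returns a NEW array with the requested data type
floats = ints.astype(float)
floats[4] = 5.2
print(floats)
print(floats.dtype)
```

Note that `astype()` leaves the original integer array unchanged, so both versions remain available.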

In [None]:
# use slicing to get a subset of a 1D array
simple[2:5]

array([3, 4, 5])

In [None]:
# automatically create a 1D array containing a sequence of values
# note that 'arange' creates an ndarray, NOT a list
np.arange(0, 16)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

### Multi-dimensional NumPy Arrays

The most common form of a multi-dimensional array is a "2D array" having m number of rows and n number of columns (i.e., "__m x n__")

In [None]:
# Create a 4x5 array filled with random data sampled from a standard normal distribution
data = np.random.randn(4, 5)
data

array([[-0.78837496,  0.4196049 ,  0.02168693,  0.41523792,  1.2478209 ],
       [ 0.2897301 , -0.42972632,  1.95534125, -1.24928316, -0.44094488],
       [ 0.53868387,  0.07463815, -1.61606097,  0.06512262, -0.49991499],
       [-0.05861105,  0.52470005, -0.79160738,  1.03017106, -0.66064693]])

In [None]:
# scale the entire array by a factor of 10
data * 10

array([[ -7.88374959,   4.19604899,   0.21686929,   4.15237917,
         12.47820901],
       [  2.89730095,  -4.29726319,  19.55341253, -12.49283159,
         -4.40944885],
       [  5.38683871,   0.74638148, -16.1606097 ,   0.65122622,
         -4.99914994],
       [ -0.58611047,   5.24700052,  -7.9160738 ,  10.30171055,
         -6.60646932]])

In [None]:
# add 2 arrays of same dimensions
data + data

array([[-1.57674992,  0.8392098 ,  0.04337386,  0.83047583,  2.4956418 ],
       [ 0.57946019, -0.85945264,  3.91068251, -2.49856632, -0.88188977],
       [ 1.07736774,  0.1492763 , -3.23212194,  0.13024524, -0.99982999],
       [-0.11722209,  1.0494001 , -1.58321476,  2.06034211, -1.32129386]])

In [None]:
# check the dimensionality of an ndarray object
print(data.ndim)

# check the shape of an ndarray object
print(data.shape)

# check the data type of the elements of the ndarray
print(data.dtype)

2
(4, 5)
float64


In [None]:
# init an array to all zeroes
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [None]:
# init an array to all ones
np.ones((3,6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

In [None]:
# create a square identity matrix (linear algebra)
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [None]:
# allocate an uninitialized array made up of two 3x2 arrays
# NOTE: np.empty() does NOT set the values to zero; the contents are arbitrary
np.empty((2, 3, 2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])
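
Because `np.empty()` only reserves memory, the values it shows (zeroes in the output above) should not be relied on. A minimal sketch of two safer patterns follows: fill the array before reading it, or use `np.full()` to allocate and fill in one step. The fill value 7.0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# allocate without initializing, then assign every element before use
scratch = np.empty((2, 3, 2))
scratch[:] = 7.0

# allocate and fill in a single call
filled = np.full((2, 3, 2), 7.0)

print(scratch.shape)
print(np.array_equal(scratch, filled))
```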

In [None]:
# use 'arange' to generate a 1D array of 12 elements + 
# reshape that array into a 3x4 2D array
np.arange(12).reshape(3,4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
# use 'nditer' to automatically iterate through all elements of an array

tdata = np.arange(12).reshape(3,4)
print(tdata)

for x in np.nditer(tdata):
    print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
0
1
2
3
4
5
6
7
8
9
10
11


## NumPy matrix Objects: Strictly 2-dimensional !!

- A "matrix" is a separate type of data object in NumPy.


- Unlike arrays, matrices __must__ be 2 dimensions __only__.


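- The matrix class overloads some standard operators (e.g., \*) so that they perform __matrix mathematics__. The same calculations can be done on ndarrays with the @ operator or functions such as np.dot(), and current NumPy documentation recommends regular arrays over the matrix class.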


- NumPy __matrix__ objects behave differently than ndarrays when standard arithmetic operations are applied to them.


- DO NOT CONFUSE THE TERM __MATRIX MATHEMATICS__ WITH __ARRAY MATHEMATICS__!! They are not the same thing in NumPy!!

In [None]:
# define a matrix using numpy
x = np.matrix( ((2,3), (3, 5)) )
x

matrix([[2, 3],
        [3, 5]])

## Array Mathematics

- In array mathematics, standard arithmetic operators (+, -, \*, /, \*\*, %) are applied __to elements whose positions correspond to one another__ within the two arrays (analogous to __vector mathematics__).


- If the shapes of the two arrays are neither identical nor compatible under NumPy's broadcasting rules (e.g., an array and a scalar), the arithmetic operation raises an error.
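
Scalars and some differently shaped arrays can still be combined, because NumPy "broadcasts" (stretches) the smaller operand to match the larger one. A minimal sketch, using hypothetical arrays named `grid`, `row`, and `col`:

```python
import numpy as np

grid = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                 # shape (3,)
col = np.array([[100.], [200.]])                # shape (2, 1)

# 'row' is added to every row of 'grid'
plus_row = grid + row
print(plus_row)

# 'col' is added to every column of 'grid'
plus_col = grid + col
print(plus_col)
```

Shapes are compared from the last dimension backwards, and each pair of sizes must either match or include a 1. That is why (2, 3) works with (3,) and (2, 1) but not with (3, 2).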

### Uni-dimensional Array Mathematics (aka "Vector Mathematics")

In [None]:
# 1D math = vector mathematics
print(simple)

# vector addition
simple + np.array([5, 6, 7, 8, 9])

[1 2 3 4 5]


array([ 6,  8, 10, 12, 14])

In [None]:
# vector subtraction
np.array([5, 6, 7, 8, 9]) - simple

array([4, 4, 4, 4, 4])

In [None]:
# vector multiplication
np.array([5, 6, 7, 8, 9]) * simple

array([ 5, 12, 21, 32, 45])

In [None]:
# vector division
np.array([5, 6, 7, 8, 9]) / simple

array([5.        , 3.        , 2.33333333, 2.        , 1.8       ])

In [None]:
# apply a scalar to a 1D array
simple * 6

array([ 6, 12, 18, 24, 30])

### Multi-dimensional Array Mathematics

In [None]:
# define a 2x3 array
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [None]:
# multiply each element in the array BY ITS OWN VALUE
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

As we can see above, applying the \* operator to two arrays having the same dimensions results in the elements of the second array being multiplied by their positional counterparts within the first array.

In [None]:
# create a new array by adding the original array to itself
arr1 = arr + arr
arr1

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [None]:
# subtract the new array from the original
arr - arr1

array([[-1., -2., -3.],
       [-4., -5., -6.]])

In [None]:
# now create a 3x2 array
arr2 = np.array([[1., 2.], [3., 4.], [5., 6.]])
arr2

array([[1., 2.],
       [3., 4.],
       [5., 6.]])

In [None]:
# applying an operator to two arrays whose shapes are NOT compatible
# yields an error message
arr + arr2

ValueError: operands could not be broadcast together with shapes (2,3) (3,2) 

In [None]:
# boolean logic tests can be applied to arrays that are the same size
arr3 = np.array([[0., 4., 1.], [7., 2., 12.]])
print(arr3)
arr3 > arr

[[ 0.  4.  1.]
 [ 7.  2. 12.]]


array([[False,  True, False],
       [ True, False,  True]])

In [None]:
# divide a constant by each element of an array
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [None]:
# multiply each element of an array by a constant
5 * arr

array([[ 5., 10., 15.],
       [20., 25., 30.]])

## Matrix Mathematics

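Unlike array mathematics, if we apply the \* operator to two NumPy __matrix__ objects we produce the __matrix product__ (often called the __dot product__) of the two matrices, __NOT__ the simple positional multiplication we saw above. The number of columns in the first matrix must equal the number of rows in the second.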

(see https://www.mathsisfun.com/algebra/matrix-multiplying.html for an explanation of how to compute the dot product)

In [None]:
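# the @ operator computes the matrix product of two ndarrays
# (plain Python lists do not support @, so convert them with np.array() first)
a = np.array([[1,2],[4,5]])
b = np.array([[6,8],[3,2]])
print(a@b)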

In [None]:
# apply the * operator to two 2x2 ndarrays
# the result is analogous to vector mathematics
x = np.array( ((2,3), (3, 5)) )
print(x)
y = np.array( ((1,2), (5, -1)) )
print(y)
x * y

[[2 3]
 [3 5]]
[[ 1  2]
 [ 5 -1]]


array([[ 2,  6],
       [15, -5]])

In [None]:
# now apply the * operator to two 2x2 MATRICES
# the result is the dot product of the matrices
x = np.matrix( ((2,3), (3, 5)) )
print(x)
y = np.matrix( ((1,2), (5, -1)) )
print(y)
x * y

[[2 3]
 [3 5]]
[[ 1  2]
 [ 5 -1]]


matrix([[17,  1],
        [28,  1]])
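
The same contrast can be reproduced with ordinary ndarrays, which avoids the separate matrix type altogether. The sketch below reuses the values from the two cells above under the hypothetical names `x_arr` and `y_arr`: `*` stays element-wise, while `@` gives the matrix product.

```python
import numpy as np

x_arr = np.array([[2, 3], [3, 5]])
y_arr = np.array([[1, 2], [5, -1]])

# element-by-element multiplication
elementwise = x_arr * y_arr
print(elementwise)

# matrix product; np.dot() gives the same result for 2D arrays
matrix_prod = x_arr @ y_arr
print(matrix_prod)
print(np.array_equal(matrix_prod, np.dot(x_arr, y_arr)))
```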

## Indexing & Slicing of ndarray Objects

- Similar to how it's done with Python list objects


- Indices are __zero based__, meaning they start at 0 and end at one less than the size of the corresponding array dimension


Some Examples:


In [None]:
print(arr3)

[[ 0.  4.  1.]
 [ 7.  2. 12.]]


In [None]:
# get the second row of arr3
arr3[1]

array([ 7.,  2., 12.])

In [None]:
# get the first column of arr3
arr3[:, 0]

array([0., 7.])

In [None]:
# equivalent ways of accessing data via an index
# get the content of the third column in the second row of the array
print(arr3[1][2])
print(arr3[1, 2])

12.0
12.0


In [None]:
# use slicing to subset a 2D array
# get values of first two columns of first row
print(data)
print(' ')
print(data[:1, :2])

[[-0.78837496  0.4196049   0.02168693  0.41523792  1.2478209 ]
 [ 0.2897301  -0.42972632  1.95534125 -1.24928316 -0.44094488]
 [ 0.53868387  0.07463815 -1.61606097  0.06512262 -0.49991499]
 [-0.05861105  0.52470005 -0.79160738  1.03017106 -0.66064693]]
 
[[-0.78837496  0.4196049 ]]


In [None]:
# get 2nd - nth col of all rows except the first row
# 'data' is a 4x5 array, so we get cols 2-5 of rows 2-4
print(data[1:, 1:])

[[-0.42972632  1.95534125 -1.24928316 -0.44094488]
 [ 0.07463815 -1.61606097  0.06512262 -0.49991499]
 [ 0.52470005 -0.79160738  1.03017106 -0.66064693]]


In [None]:
# safely 'bound' the upper ends of a slice by using the row + column dimensions
# this subsets everything EXCEPT the first row and first column
# data.shape[0] is the number of rows in the array
# data.shape[1] is the number of columns in the array

print(data[1:(data.shape[0]), 1:(data.shape[1])])

[[-0.42972632  1.95534125 -1.24928316 -0.44094488]
 [ 0.07463815 -1.61606097  0.06512262 -0.49991499]
 [ 0.52470005 -0.79160738  1.03017106 -0.66064693]]
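
The last two cells return the same subset, since leaving the upper end of a slice open means "through the end of that dimension". A small sketch on a hypothetical 4x5 array of integers (easier to read than random values) confirms this and also shows a negative index, which counts from the end:

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)

# open-ended slice: everything except the first row and first column
open_ended = grid[1:, 1:]

# the same slice with explicit upper bounds taken from the shape
bounded = grid[1:grid.shape[0], 1:grid.shape[1]]

print(np.array_equal(open_ended, bounded))

# negative indices count from the end: -1 is the last column
last_col = grid[:, -1]
print(last_col)
```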


## Transposing an ndarray Object

NumPy arrays have a built-in attribute, __T__, for transposing multi-dimensional arrays

In [None]:
# use the 'T' attribute to transpose an array (note: no parentheses)
print(arr)
arr.T

[[1. 2. 3.]
 [4. 5. 6.]]


array([[1., 4.],
       [2., 5.],
       [3., 6.]])

In [None]:
# another example of transposing an array
print(data)
data.T

[[-0.78837496  0.4196049   0.02168693  0.41523792  1.2478209 ]
 [ 0.2897301  -0.42972632  1.95534125 -1.24928316 -0.44094488]
 [ 0.53868387  0.07463815 -1.61606097  0.06512262 -0.49991499]
 [-0.05861105  0.52470005 -0.79160738  1.03017106 -0.66064693]]


array([[-0.78837496,  0.2897301 ,  0.53868387, -0.05861105],
       [ 0.4196049 , -0.42972632,  0.07463815,  0.52470005],
       [ 0.02168693,  1.95534125, -1.61606097, -0.79160738],
       [ 0.41523792, -1.24928316,  0.06512262,  1.03017106],
       [ 1.2478209 , -0.44094488, -0.49991499, -0.66064693]])

## Built-In NumPy Universal Functions ('ufuncs')

- Fast, "vectorized" functions that eliminate the need for an explicit iterator with a NumPy array object


- ufuncs __automatically apply their programming logic to each element of a NumPy array__

In [None]:
# get the square root of each element in an array
print(arr)
np.sqrt(arr)

[[1. 2. 3.]
 [4. 5. 6.]]


array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [None]:
# square each element in an array
np.square(arr)

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [None]:
# get maximum value of 2 corresponding array elements
print(simple)
np.maximum(simple, np.array([0,4,1,3,6]))

[1 2 3 4 5]


array([1, 4, 3, 4, 6])

In [None]:
# add the corresponding elements of 2 same-sized arrays
np.add(simple, np.array([0,4,1,3,6]))

array([ 1,  6,  4,  7, 11])

In [None]:
# sum the values within each row of a 2D array
print(arr)
np.sum(arr, axis = 1)

[[1. 2. 3.]
 [4. 5. 6.]]


array([ 6., 15.])

In [None]:
# sum the values within each column of a 2D array
np.sum(arr, axis = 0)

array([5., 7., 9.])

#### NOTE:

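As shown in the two examples above, the "axis" argument is not very intuitive: it names the dimension that is __collapsed__ by the calculation, not the dimension that is kept:

- __axis = 1__ runs across the columns, so the sum is calculated within each __row__ (one result per row)


- __axis = 0__ runs down the rows, so the sum is calculated within each __column__ (one result per column).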
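
One way to keep the axis numbers straight is to compare the shape of the result with the shape of the input: the axis passed to the function is the one that disappears. A minimal sketch, re-creating the 2x3 array from earlier so the block runs on its own:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)

# axis 0 (length 2) is collapsed, leaving one value per column
col_sums = arr.sum(axis=0)
print(col_sums, col_sums.shape)

# axis 1 (length 3) is collapsed, leaving one value per row
row_sums = arr.sum(axis=1)
print(row_sums, row_sums.shape)
```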

## Using Conditional Logic in Array Operations

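NumPy's __where__ function works like a vectorized if/else: for each element it returns one value where a condition is true and another value where it is false (loosely similar to filtering with a 'where' clause in SQL)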

In [None]:
# generate a 4x4 array filled with values from a standard normal distribution
warr = np.random.randn(4, 4)
print(warr)
print(' ')
print(warr > 0)

# replace all positive values with '2', replace all other values with '-2'
np.where(warr > 0, 2, -2)

[[ 1.59517436  0.31152247 -0.20846376  0.04970647]
 [ 1.37948183 -1.57865163 -0.04513732 -0.07928546]
 [-0.61998553  0.07429133 -1.27574604 -0.24140723]
 [ 0.70759127  0.03734384 -0.47018534 -1.18079426]]
 
[[ True  True False  True]
 [ True False False False]
 [False  True False False]
 [ True  True False False]]


array([[ 2,  2, -2,  2],
       [ 2, -2, -2, -2],
       [-2,  2, -2, -2],
       [ 2,  2, -2, -2]])
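
The replacement values passed to `np.where()` do not have to be constants; they can also be arrays, in which case each element is picked from the matching position. A minimal sketch with a small hypothetical array named `vals`:

```python
import numpy as np

vals = np.array([[1.5, -0.3], [-2.0, 0.7]])

# keep the positive values as they are, replace everything else with 0.0
clipped = np.where(vals > 0, vals, 0.0)
print(clipped)

# constants work the same way as in the cell above
signs = np.where(vals > 0, 2, -2)
print(signs)
```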

## Math + Stats Methods

- Built-in NumPy methods for performing calculations on arrays


- Fast and simple to use

In [None]:
# calculate the mean of all values in an array
print(data)
data.mean()

[[-0.78837496  0.4196049   0.02168693  0.41523792  1.2478209 ]
 [ 0.2897301  -0.42972632  1.95534125 -1.24928316 -0.44094488]
 [ 0.53868387  0.07463815 -1.61606097  0.06512262 -0.49991499]
 [-0.05861105  0.52470005 -0.79160738  1.03017106 -0.66064693]]


0.002378354856203391

In [None]:
# calculate the sum of all values in an array
print(arr3)
arr3.sum()

[[ 0.  4.  1.]
 [ 7.  2. 12.]]


26.0

In [None]:
# sort an array 'in place': Note that you will lose the original order of
# the array UNLESS YOU COPY IT TO A NEW ARRAY
# (for a 2D array, sort() orders the values within each row by default)

# copy array to a new array to preserve original ordering
arrsort = np.array(arr3)

# sort the new copy of the array
arrsort.sort()

# display the original array
print(arr3)

# display the sorted copy
print(arrsort)


[[ 0.  4.  1.]
 [ 7.  2. 12.]]
[[ 0.  1.  4.]
 [ 2.  7. 12.]]
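
As an alternative to copying and then sorting in place, the `np.sort()` function returns a sorted copy and leaves its input untouched. The sketch below re-creates `arr3` so it runs on its own and also shows the effect of the `axis` argument; the result names are hypothetical.

```python
import numpy as np

arr3 = np.array([[0., 4., 1.], [7., 2., 12.]])

# sorted copy, values ordered within each row (the default, last axis)
by_row = np.sort(arr3)

# values ordered within each column
by_col = np.sort(arr3, axis=0)

# axis=None flattens the array before sorting
flat = np.sort(arr3, axis=None)

print(by_row)
print(by_col)
print(flat)
print(arr3)   # the original array is unchanged
```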


## "Hands On" Exercises

#### 1. Use Python + NumPy to do the following:


- Print all numbers between 0 and 99 that are multiples of either 3 or 5.


- Calculate and print the sum of all numbers between 0 and 99 that are multiples of either 3 or 5.




In [None]:
result = [i for i in np.arange(0,100) if i%3 ==0 or i%5 ==0 ]
print(result)

[0, 3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24, 25, 27, 30, 33, 35, 36, 39, 40, 42, 45, 48, 50, 51, 54, 55, 57, 60, 63, 65, 66, 69, 70, 72, 75, 78, 80, 81, 84, 85, 87, 90, 93, 95, 96, 99]


In [None]:
np.sum(result)

2318

In [None]:
x = np.arange(0,100)

n=x[(x%3==0)|(x%5==0)]

print(n)

[ 0  3  5  6  9 10 12 15 18 20 21 24 25 27 30 33 35 36 39 40 42 45 48 50
 51 54 55 57 60 63 65 66 69 70 72 75 78 80 81 84 85 87 90 93 95 96 99]


In [None]:
np.sum(n)

2318
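
The total can be cross-checked without listing the numbers at all. The sketch below adds the multiples of 3 and the multiples of 5 using the arithmetic-series formula, then subtracts the multiples of 15, which would otherwise be counted twice. The helper `sum_multiples` is a hypothetical function written for this check, not part of NumPy.

```python
import numpy as np

def sum_multiples(k, limit):
    """Sum of the multiples of k in the range 0..limit-1 (arithmetic series)."""
    count = (limit - 1) // k            # how many positive multiples fit below limit
    return k * count * (count + 1) // 2

# inclusion-exclusion: multiples of 15 appear in both the 3s and the 5s
closed_form = sum_multiples(3, 100) + sum_multiples(5, 100) - sum_multiples(15, 100)

# the boolean-mask approach from the cell above, for comparison
x = np.arange(0, 100)
masked_sum = int(x[(x % 3 == 0) | (x % 5 == 0)].sum())

print(closed_form, masked_sum)
```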

#### 2. Normalizing Elements of an Array Using "Standardization"


- Start with this 4x5 array:

In [None]:
# create 4x5 array
hands_data = np.array([ [31, 43, 73, 79, 11], [54, 69, 56, 7, 32],
                      [56, 22, 41, 70, 42], [56, 94, 22, 80, 31] ])
hands_data

array([[31, 43, 73, 79, 11],
       [54, 69, 56,  7, 32],
       [56, 22, 41, 70, 42],
       [56, 94, 22, 80, 31]])

- For each column of the array calculate the mean and standard deviation of all elements in that column


- For each item in the array, calculate the following and store the results in a new array having the same size/shape as the original:

    Normalized value = (x - mean(column) ) / standard deviation(column)
    
    
- Verify your result by checking the sum of each column in your new array: all columns should sum to zero (apart from tiny floating-point rounding errors)


### A viable solution is provided below

In [None]:
# get means of all columns
colmeans = np.mean(hands_data, axis = 0)
print(colmeans)

# get all col stddevs
colstddev = np.std(hands_data, axis = 0)
print(colstddev)


[49.25 57.   48.   59.   29.  ]
[10.56823069 27.08320513 18.80159568 30.27375101 11.24722188]


In [None]:
# create a new 4x5 array of zeroes to house the normalized array
normed_arr = np.zeros((4,5))
print(normed_arr)

# normalize all items in original array
# x iterates over the columns; y iterates over the rows
for x in range(0, hands_data.shape[1]):
    for y in range(0, hands_data.shape[0]):
        normed_arr[y][x] = (hands_data[y][x] - colmeans[x]) / colstddev[x]
        
print(normed_arr)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[-1.72687373 -0.51692552  1.32967438  0.66063832 -1.60039521]
 [ 0.44946029  0.44307902  0.4254958  -1.71765963  0.26673253]
 [ 0.63870672 -1.29231381 -0.37230883  0.36335108  1.15584098]
 [ 0.63870672  1.36616031 -1.38286135  0.69367024  0.17782169]]


In [None]:
# VALIDATION OF YOUR CALCULATIONS:
# sum the columns of normed_arr to check whether they sum to zero
# if they sum to zero, you have successfully normalized the values of the
# original array
np.round(np.sum(normed_arr, axis = 0))

array([ 0.,  0.,  0.,  0., -0.])
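
The nested loops above can also be replaced by a single array expression, since the column means and standard deviations (each of shape (5,)) are broadcast across the four rows automatically. This is offered as one possible alternative rather than the required solution; the names `col_means`, `col_stds`, and `normed` are hypothetical.

```python
import numpy as np

hands_data = np.array([[31, 43, 73, 79, 11], [54, 69, 56, 7, 32],
                       [56, 22, 41, 70, 42], [56, 94, 22, 80, 31]])

# per-column statistics
col_means = hands_data.mean(axis=0)
col_stds = hands_data.std(axis=0)

# subtract and divide column by column in one step, with no explicit loops
normed = (hands_data - col_means) / col_stds
print(normed)

# validation: each column should sum to ~0 and have a standard deviation of ~1
print(np.allclose(normed.sum(axis=0), 0.0))
print(np.allclose(normed.std(axis=0), 1.0))
```

Using `np.allclose()` for the check avoids the need to round the column sums by hand.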