<a href="https://colab.research.google.com/github/jiangenhe/insc-486-fall-2021/blob/main/week2/week2_lecture_numpy_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numerical Python (NumPy)

In [None]:
import numpy as np

## Creating Arrays

Create a list and convert it to a NumPy array.

In [None]:
mylist = [1, 2, 3]
x = np.array(mylist)
x

array([1, 2, 3])


Or just pass in a list directly

In [None]:
y = np.array([4, 5, 6])
y

array([4, 5, 6])


Pass in a list of lists to create a multidimensional array.

In [None]:
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])


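Use the `shape` attribute to find the dimensions of the array, given as (rows, columns). Note that `shape` is an attribute, not a method, so it is accessed without parentheses.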

In [None]:
m.shape

(2, 3)
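
As a small sketch of how `shape` relates to other array attributes: the array `m` is the one from the cell above, while `ndim` and `size` are standard `ndarray` attributes that this lecture has not introduced, so treat them as an aside.

```python
import numpy as np

m = np.array([[7, 8, 9], [10, 11, 12]])

rows, cols = m.shape   # shape is a tuple, so it can be unpacked
print(rows, cols)      # 2 3
print(m.ndim)          # number of dimensions: 2
print(m.size)          # total number of elements: 6
```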


`arange` returns evenly spaced values within a given interval.

In [None]:
n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n


`reshape` returns an array with the same data with a new shape.

In [None]:
n = n.reshape(3, 5) # reshape array to be 3x5
n
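
A minimal sketch of two details of `reshape` that the cell above relies on: the new shape must hold exactly the same number of elements, and one dimension may be given as `-1` so that NumPy infers it. The name `grid` is introduced here only for illustration.

```python
import numpy as np

n = np.arange(0, 30, 2)   # 15 values: 0, 2, ..., 28
grid = n.reshape(3, -1)   # -1 lets NumPy infer the missing dimension (5 here)
print(grid.shape)         # (3, 5)

try:
    n.reshape(4, 4)       # 16 slots cannot hold 15 values
except ValueError as err:
    print("reshape failed:", err)
```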


`linspace` returns evenly spaced numbers over a specified interval.

In [None]:
o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4
o
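
To make the difference between `arange` and `linspace` concrete, here is a hedged sketch: `linspace` takes the number of points and includes both endpoints, whereas `arange` takes a step and excludes the stop value. The names `step` and `same` are illustrative only.

```python
import numpy as np

o = np.linspace(0, 4, 9)   # 9 points, both endpoints included
step = o[1] - o[0]         # spacing is (4 - 0) / (9 - 1)
print(step)                # 0.5

same = np.arange(0, 4.5, 0.5)  # arange needs a stop just past 4 to include it
print(np.allclose(o, same))    # True
```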


`resize` changes the shape and size of an array in place.

In [None]:
o.resize(3, 3)
o
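
The contrast with `reshape` can be sketched as follows: `reshape` hands back a new array object and leaves the original alone, while the `resize` method changes the array itself and returns `None`. The name `result` is introduced here only to show that return value.

```python
import numpy as np

o = np.linspace(0, 4, 9)

print(o.reshape(3, 3).shape)  # (3, 3): reshape returns a new array object
print(o.shape)                # (9,): o itself is unchanged

result = o.resize(3, 3)       # resize modifies o in place
print(result)                 # None
print(o.shape)                # (3, 3)
```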


`ones` returns a new array of given shape and type, filled with ones.

In [None]:
np.ones((3, 2))

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])


`zeros` returns a new array of given shape and type, filled with zeros.

In [None]:
np.zeros((2, 3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])


`eye` returns a 2-D array with ones on the diagonal and zeros elsewhere.

In [None]:
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])


`diag` extracts a diagonal or constructs a diagonal array.

In [None]:
np.diag(y)

array([[4, 0, 0],
       [0, 5, 0],
       [0, 0, 6]])


Create an array by repeating a list (see also `np.tile`).

In [None]:
np.array([1, 2, 3] * 3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])


Repeat elements of an array using `repeat`.

In [None]:
np.repeat([1, 2, 3], 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])
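
For comparison, a short sketch of `np.tile` next to `np.repeat`: `tile` repeats the whole sequence, which matches the list-multiplication trick above, while `repeat` repeats each element in turn. The variable names are illustrative.

```python
import numpy as np

base = [1, 2, 3]
tiled = np.tile(base, 3)       # repeats the whole sequence
repeated = np.repeat(base, 3)  # repeats each element in turn
print(tiled)     # [1 2 3 1 2 3 1 2 3]
print(repeated)  # [1 1 1 2 2 2 3 3 3]
```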

## Combining Arrays

In [None]:
p = np.ones([2, 3], int)
p

array([[1, 1, 1],
       [1, 1, 1]])


Use `vstack` to stack arrays in sequence vertically (row wise).

In [None]:
np.vstack([p, 2*p])

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])


Use `hstack` to stack arrays in sequence horizontally (column wise).

In [None]:
np.hstack([p, 2*p])
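
One way to see the difference between the two stacking functions is to compare the resulting shapes. This is a minimal sketch that rebuilds `p` from the cell above; `v` and `h` are names chosen for illustration.

```python
import numpy as np

p = np.ones([2, 3], int)
v = np.vstack([p, 2 * p])  # rows are appended: (2, 3) and (2, 3) give (4, 3)
h = np.hstack([p, 2 * p])  # columns are appended: (2, 3) and (2, 3) give (2, 6)
print(v.shape, h.shape)
print(h)
```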

## Operations

Use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction, multiplication, division and exponentiation.

In [None]:
print(x + y) # elementwise addition     [1 2 3] + [4 5 6] = [5  7  9]
print(x - y) # elementwise subtraction  [1 2 3] - [4 5 6] = [-3 -3 -3]

[5 7 9]
[-3 -3 -3]


In [None]:
print(x * y) # elementwise multiplication  [1 2 3] * [4 5 6] = [4  10  18]
x[2] = 3     # assign by index; a 3-element array only has indices 0, 1 and 2 (x[3] raises IndexError)
print(x)
print(x / y) # elementwise division        [1 2 3] / [4 5 6] = [0.25  0.4  0.5]

In [None]:
print(x**2) # elementwise power  [1 2 3] ^2 =  [1 4 9]

[1 4 9]


**Dot Product:**  

$ \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$

In [None]:
x.dot(y) # dot product  1*4 + 2*5 + 3*6
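
The formula above can be checked numerically. This sketch recreates `x` and `y` from earlier cells and computes the dot product three ways; the `@` operator and the names `by_method`, `by_operator` and `by_hand` are additions for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

by_method = x.dot(y)      # 1*4 + 2*5 + 3*6
by_operator = x @ y       # the @ operator computes the same product
by_hand = (x * y).sum()   # element-wise multiply, then add up
print(by_method, by_operator, by_hand)  # 32 32 32
```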

Let's look at transposing arrays. Transposing permutes the dimensions of the array.

In [None]:
z = np.array([y, y**2])
z

array([[ 4,  5,  6],
       [16, 25, 36]])

The shape of array `z` is `(2,3)` before transposing.

In [None]:
z.shape

(2, 3)


Use `.T` to get the transpose.

In [None]:
z.T

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])


The number of rows has swapped with the number of columns.

In [None]:
z.T.shape

(3, 2)


Use `.dtype` to see the data type of the elements in the array.

In [None]:
z.dtype

dtype('int64')


Use `.astype` to cast to a specific type.

In [None]:
z = z.astype('f')
z.dtype

dtype('float32')
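
Casting in the other direction deserves some care. As a hedged sketch with made-up values: converting floats to integers with `astype` drops the fractional part rather than rounding, and the original array keeps its type because `astype` returns a new array.

```python
import numpy as np

values = np.array([1.7, -1.7, 2.2])   # illustrative values, not from the lecture
as_int = values.astype(int)           # fractional part is dropped (truncation toward zero)
print(as_int.tolist())                # [1, -1, 2]
print(values.dtype)                   # float64: the original is unchanged
```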

## Math Functions

NumPy has many built-in math functions that can be performed on arrays.

In [None]:
a = np.array([-4, -2, 1, 3, 5])

In [None]:
a.sum()

3

In [None]:
a.max()

5

In [None]:
a.min()

-4

In [None]:
a.mean()

0.60

In [None]:
a.std()

3.26


`argmax` and `argmin` return the index of the maximum and minimum values in the array.

In [None]:
a.argmax()

4

In [None]:
a.argmin()

0
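
To show what `std` computes, here is a sketch that writes out the population standard deviation by hand (NumPy's default, with `ddof=0`) and compares it with `a.std()`. It also uses the positions returned by `argmax` and `argmin` to index back into the array. The name `manual_std` is illustrative.

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

# Population standard deviation, written out step by step
manual_std = np.sqrt(np.mean((a - a.mean()) ** 2))
print(np.isclose(manual_std, a.std()))   # True

# argmax/argmin give positions, which can be used to index back into the array
print(a[a.argmax()], a[a.argmin()])      # 5 -4
```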

## Indexing / Slicing

In [None]:
s = np.arange(13)**2
s

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])


Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

In [None]:
s[0], s[4], s[-1]

(0, 16, 144)


Use `:` to indicate a range. `array[start:stop]`


Leaving `start` or `stop` empty will default to the beginning/end of the array.

In [None]:
s[1:5]

array([ 1,  4,  9, 16])

Use negatives to count from the back.

In [None]:
s[-4:]

array([ 81, 100, 121, 144])

A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`

Here we start at the 5th element from the end and count backwards by 2 until the beginning of the array is reached.

In [None]:
s[-5::-2]

array([64, 36, 16,  4,  0])
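
A few more step-size slices on the same array, as a small sketch. The names `every_third` and `reversed_s` are introduced here for illustration.

```python
import numpy as np

s = np.arange(13) ** 2

every_third = s[::3]   # indices 0, 3, 6, 9, 12
reversed_s = s[::-1]   # a negative step walks the array backwards
print(every_third.tolist())   # [0, 9, 36, 81, 144]
print(reversed_s[0])          # 144, the last element of s
```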


Let's look at a multidimensional array.

In [None]:
r = np.arange(36)
r.resize((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

Use bracket notation to slice: **`array[row, column]`**

In [None]:
r[2, 2]

14


Use `:` to select a range of rows or columns.

In [None]:
r[3, 3:6]

array([21, 22, 23])


Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

In [None]:
r[:2, :-1]

array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])


This is a slice of the last row, and only every other element.

In [None]:
r[-1, ::2]

array([30, 32, 34])


We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)

In [None]:
r[r > 30]

array([31, 32, 33, 34, 35])


Here we are assigning all values in the array that are greater than 30 to the value of 30.

In [None]:
r[r > 30] = 30
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
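
The `np.where` function mentioned above offers a way to get the same capped values without modifying the original array. This is a minimal sketch on a fresh copy of the 6x6 array; `mask` and `capped` are illustrative names.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

mask = r > 30                    # boolean array with the same shape as r
capped = np.where(mask, 30, r)   # 30 where mask is True, r's value elsewhere

print(mask.sum())                # number of True entries: 5
print(capped[-1])                # [30 30 30 30 30 30]
print(r[-1])                     # r itself is unchanged: [30 31 32 33 34 35]
```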

## Copying Data

Be careful with copying and modifying arrays in NumPy!


`r2` is a slice of `r`

In [None]:
r2 = r[:3,:3]
r2

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])


Set this slice's values to zero (`[:]` selects the entire array).

In [None]:
r2[:] = 0
r2

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])


`r` has also been changed!

In [None]:
r

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])


To avoid this, use `r.copy()` to create a copy whose modifications will not affect the original array.

In [None]:
r_copy = r.copy()
r_copy

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])


Now when `r_copy` is modified, `r` will not be changed.

In [None]:
r_copy[:] = 10
print(r_copy, '\n')
print(r)

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]] 

[[ 0  0  0  3  4  5]
 [ 0  0  0  9 10 11]
 [ 0  0  0 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 30 30 30 30 30]]
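
One way to check whether two arrays share data is `np.shares_memory`. The sketch below is an aside rather than part of the original lecture: it shows that a basic slice is a view onto the same memory, while `.copy()` produces independent data. The names `view` and `copy` are illustrative.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]           # basic slicing returns a view
copy = r[:3, :3].copy()    # an explicit, independent copy

print(np.shares_memory(r, view))  # True
print(np.shares_memory(r, copy))  # False

copy[:] = -1               # does not touch r
print(r[0, 0])             # 0
```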



## Iterating Over Arrays

Let's create a new 4 by 3 array of random integers from 0 to 9 (the upper bound of 10 passed to `randint` is excluded).

In [None]:
test = np.random.randint(0, 10, (4,3))
test

array([[1, 8, 2],
       [4, 4, 6],
       [4, 4, 9],
       [5, 5, 1]])


Iterate by row:

In [None]:
for row in test:
    print(row)

[1 8 2]
[4 4 6]
[4 4 9]
[5 5 1]



Iterate by index:

In [None]:
for i in range(len(test)):
    print(test[i])

[1 8 2]
[4 4 6]
[4 4 9]
[5 5 1]



Iterate by row and index:

In [None]:
for i, row in enumerate(test):
    print('row', i, 'is', row)

row 0 is [1 8 2]
row 1 is [4 4 6]
row 2 is [4 4 9]
row 3 is [5 5 1]



Use `zip` to iterate over multiple iterables.

In [None]:
test2 = test**2
test2

array([[ 1, 64,  4],
       [16, 16, 36],
       [16, 16, 81],
       [25, 25,  1]])

In [None]:
for i, j in zip(test, test2):
    print(i,'+',j,'=',i+j)

[1 8 2] + [ 1 64  4] = [ 2 72  6]
[4 4 6] + [16 16 36] = [20 20 42]
[4 4 9] + [16 16 81] = [20 20 90]
[5 5 1] + [25 25  1] = [30 30  2]
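
The loop above is useful for seeing how `zip` pairs up rows, but the same sums can be obtained with a single array operation. This sketch assumes a seeded generator from `np.random.default_rng` instead of `np.random.randint`, purely so that the run is repeatable; `looped` and `vectorized` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded generator (an assumption, for repeatability)
test = rng.integers(0, 10, (4, 3))   # 4x3 integers in [0, 10)
test2 = test ** 2

looped = np.array([i + j for i, j in zip(test, test2)])  # row-by-row sums
vectorized = test + test2                                # one whole-array operation
print(np.array_equal(looped, vectorized))                # True
```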


## Reading .csv Files with NumPy

In [None]:
admission = np.genfromtxt("Admission_Predict.csv", delimiter=",", names=True)
admission

In [None]:
admission.dtype.names

('Serial_No',
 'GRE_Score',
 'TOEFL_Score',
 'University_Rating',
 'SOP',
 'LOR',
 'CGPA',
 'Research',
 'Chance_of_Admit')

In [None]:
admission["CGPA"][:5]

array([9.65, 8.87, 8.  , 8.67, 8.21])

In [None]:
admission['CGPA'] = admission['CGPA']/10 * 4

In [None]:
admission["CGPA"][:5]

In [None]:
# How many students have research experience?
len(admission[admission['Research'] == 1])

In [None]:
admission[admission['Chance_of_Admit']>0.8]['GRE_Score'].mean()
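
Since `Admission_Predict.csv` is not reproduced here, the following sketch runs the same steps on a hypothetical three-row sample held in memory. The sample values and the names `csv_text`, `data`, `gpa_4` and `n_research` are assumptions for illustration. It shows how `names=True` turns the header row into field names (spaces become underscores), how a column is rescaled (the `/10 * 4` step above appears to map a 10-point CGPA onto a 4-point scale), and how a boolean mask filters rows.

```python
import io
import numpy as np

# Hypothetical three-row sample; the real Admission_Predict.csv is not reproduced here
csv_text = """GRE Score,CGPA,Research
337,9.65,1
324,8.87,1
316,8.00,0
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)
print(data.dtype.names)   # ('GRE_Score', 'CGPA', 'Research')

gpa_4 = data["CGPA"] / 10 * 4                    # rescale CGPA to a 4-point scale
n_research = len(data[data["Research"] == 1])    # rows where Research equals 1
print(gpa_4, n_research)
```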