# Introduction to NumPy

 
## Basic Data Types - Lists, Dicts, Sets and Tuples  
First, let us do the usual library imports.

In [None]:
import pandas as pd
import os
import random
import numpy as np
import scipy
import math
import joblib

Lists are represented as ```[]```<br>
Dicts are ```{}``` <br>
Sets are also ```{}```, except they don't have the colons separating the key:value pairs. An empty ```{}``` is a dict, so an empty set is written ```set()```<br>
Tuples are in ```()```

In [None]:
empty_dict = {}
dict1 = {'first': ['John', 'Jane'], 2: (1,2,3)}
dict1

In [None]:
type(dict1)

In [None]:
empty_list = []
list1 = ['a', 2,4, 'python']
list1

In [None]:
type(list1)

In [None]:
set1 = {1,2,4,5} # Sets can do intersect, union and difference
tuple1 = 1, 3, 4 # or
tuple1 = (1, 3, 4)
tuple1
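The comment above mentions that sets can do intersect, union and difference. A minimal sketch of those operations follows; the set names and values are made up for illustration.

```python
# Hypothetical example sets, chosen only for illustration
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

common = evens & small        # intersection: elements in both sets
combined = evens | small      # union: elements in either set
only_evens = evens - small    # difference: in evens but not in small

print(common, combined, only_evens)

# {} creates an empty dict, so an empty set needs set()
empty_set = set()
print(type({}), type(empty_set))
```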

***
## Numpy arrays  

Almost everything NumPy returns is an array, just as almost every pandas function returns a dataframe.  The easiest way to generate a random array is `np.random.randn(2,3)`, which gives an array with dimensions 2,3.  You can pick any other dimensions too.  `randn` draws its numbers from the standard normal distribution (mean 0, standard deviation 1).

In [None]:
np.random.randn(2,3)

In [None]:
np.random.randn(4)

In [None]:
data = np.random.randn(2, 3, 4)
print('The shape of the array is:', data.shape)
data

> The number of opening `[` at the start of the output gives the number of dimensions in the array.  
The last two dimensions, here 3 and 4, are the rows and columns shown on screen.  The first dimension, the 2, means there are two 
sets of these rows and columns in the array.

In [None]:
# Now let us add another dimension, this time with random integers rather than random normal numbers.
# randint requires low and high for the uniform distribution; low is included, high is excluded.
data = np.random.randint(low = 1, high = 100, size = (2,3,2,4))
data

So there will be a collection of 2 rows x 4 columns matrices, repeated 3 times, and that entire set another 2 times. <br><br>
And the 4 occurrences of `[[[[` means there are 4 dimensions to the array.
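Rather than counting brackets, the number of dimensions can also be read off the array itself. A small sketch, assuming an array built the same way as the one above:

```python
import numpy as np

# Same shape as the 4-dimensional array above
data = np.random.randint(low=1, high=100, size=(2, 3, 2, 4))

print(data.ndim)    # number of dimensions
print(data.shape)   # length of each dimension
print(data.size)    # total number of elements (2 * 3 * 2 * 4)
```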

In [None]:
type(data)

In [None]:
# Converting a list to an array
list1 = list(range(12))
list1

In [None]:
array1 = np.array(list1)
array1

In [None]:
# This array1 is one dimensional, let us convert to a 3x4 array.
array1.shape = (3,4)
array1

### Create arrays

In [None]:
array1 = np.zeros((2,3)) # The dimensions must be a tuple inside the brackets
array1

In [None]:
array1 = np.arange(12)
array1

In [None]:
array1.reshape(3,4) # reshape returns a new array with the given dimensions; array1 itself is unchanged

In [None]:
array1.reshape(3,2,2)
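One thing worth checking is that `reshape` does not change the array it is called on. The sketch below shows this, and also the `-1` shorthand that lets NumPy work out one dimension itself; the variable names `reshaped` and `inferred` are just for illustration.

```python
import numpy as np

array1 = np.arange(12)
reshaped = array1.reshape(3, 4)   # returns a new array object; array1 keeps its shape

print(array1.shape, reshaped.shape)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = array1.reshape(2, -1)
print(inferred.shape)
```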

In [None]:
array1 = np.ones((3,5))
array1

In [None]:
array1 = np.eye(4) #Creates the identity matrix 
array1

**All math on arrays is element-wise, and a scalar is added to or multiplied with each element.**

In [None]:
array1 + 4

In [None]:
array1 > np.random.randint(0, 2, (4,4))

In [None]:
array1 + 2

In [None]:
np.sum(array1) # adds all the elements of an array

In [None]:
np.sum(array1, axis = 0) # adds all elements of the array along a particular axis

### Subsetting arrays ('slices')
Indexing is zero-based: the first element of every dimension has index 0, which can be confusing at first.  The portion of the dimension you wish to select is given in the form `start:finish` where the `start` element is included, but the `finish` is excluded.  So `1:3` means include the elements at index 1 and 2 but not 3.

In [None]:
array1 = np.random.randint(0, 100, (3,5))
array1

In [None]:
array1[0:2, 0:2]

In [None]:
array1[:,0:2] # ':' means include everything

In [None]:


array1[0:2]

In [None]:
# Slices are views onto the original array, so changing a slice changes the original.  If you need an independent copy, use the below:
array1[0:2].copy()
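The difference between a slice and a copy is easiest to see by modifying each one. A minimal sketch, using a small made-up array rather than the random one above:

```python
import numpy as np

original = np.arange(10)

view = original[0:3]                 # a slice is a view onto the same data
independent = original[0:3].copy()   # a copy has its own data

view[0] = 99                         # also changes original[0]
independent[1] = -1                  # does not touch original

print(original[:3])
print(np.shares_memory(original, view), np.shares_memory(original, independent))
```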

Generally, use the above 'long form' way of slicing, where you specify the indices for each dimension, and use `:` for any dimension where everything is to be included.  There are other shortcut methods of slicing, but they are not needed here.

Imagine an array a1 with dimensions (3, 5, 2, 4).  This means:
 - This array has 3 arrays in it that have the dimensions (5, 2, 4)
 - Each of these 3 arrays has 5 additional arrays in it of the dimension (2,4).  (So there are 3*5=15 of these 2x4 arrays)
 - Each of these (2,4) arrays has 2 one-dimensional arrays with 4 elements each.
 
If the slice notation specifies only some of the dimensions, eg `a1[0]`, the indices apply from the left of (3, 5, 2, 4) and every remaining dimension is included in full.  So `a1[0]` means give me the first of the 3 arrays with shape (5,2,4).  

If the slice notation says `a1[0,1]`, then it means element 0 of the first dim and element 1 of the second dim, which leaves an array of shape (2,4).

Check it out using the following code:

In [None]:
a1 = np.random.randint(0, 100, (3,5,2,4)) # same shape as the example described above
a1

In [None]:
a1[0].shape

In [None]:
a1[0]

In [None]:
a1[0,1]

### Picking selected rows or columns

In [None]:
a1 = np.random.randint(0, 100, (8,9))
a1

In [None]:
a1[[0,3]] #pick the first and the fourth rows

In [None]:
a1[[0, 3]][:,[0, 1]] # Rows 0 and 3, then columns 0 and 1 of those rows.
# Note that a1[[0, 3],[0, 1]] does not give this 2x2 block.  NumPy pairs the two index lists element by element,
# so it selects the two points (0,0) and (3,1).
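If chaining two selections feels clumsy, NumPy's `np.ix_` helper builds the row-by-column selection in one step. The sketch below compares the three forms on a deterministic stand-in array (an assumption made so the output is predictable).

```python
import numpy as np

a1 = np.arange(72).reshape(8, 9)   # deterministic stand-in for the random array

rows, cols = [0, 3], [0, 1]

chained = a1[rows][:, cols]        # two-step selection, as above
block = a1[np.ix_(rows, cols)]     # same 2x2 block in one indexing step
points = a1[rows, cols]            # pairs the lists up: elements (0,0) and (3,1)

print(block)
print(points)
```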

***
## Understanding numpy axes

In [None]:
x = np.random.randint(low = 1, high = 5, size = (2,3,4))
print('Shape: ', x.shape)
x

  
  
Shape is (2, 3, 4)  
  
axis = 0 means : (**2**, 3, 4)  
axis = 1 means : (2, **3**, 4)  
axis = 2 means : (2, 3, **4**)  

  
NumPy axis numbers run from left to right, starting with the index 0.  So `x.shape` gives (2, 3, 4), which means the 2 blocks are axis 0, the 3 rows are axis 1 and the 4 columns are axis 2.  

Putting the `axis = n` argument in an aggregation function makes axis n disappear, leaving only the rest of the dimensions.  So `np.sum(array_name, axis = n)`, and similarly `mean()`, `min()`, `median()`, `std()` etc, computes the aggregate along axis n, collapsing all the elements along that axis into one.  See below using the sum function.  
  

In [None]:
# So with axis = 0, the very first dimension, ie the 2, will collapse leaving an array of shape (3,4)
x.sum(axis = 0) # (3,4) will remain

In [None]:
x.sum(axis = 1) # (2,4) will remain

In [None]:
x.sum(axis = 2) # (2,3) will remain
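To convince yourself of the 'axis disappears' rule, the shapes can be printed in a loop and one case checked by hand. This is only a sketch, using a deterministic array of the same shape as `x`:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

for axis in range(x.ndim):
    print(axis, x.sum(axis=axis).shape, x.mean(axis=axis).shape)

# Collapsing axis 0 adds the two (3, 4) blocks together element by element
manual = x[0] + x[1]
print(np.array_equal(x.sum(axis=0), manual))
```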

# -------You can stop here--------

***

## Matrix math

NumPy has arrays as well as matrices.  Matrices are always 2D, arrays can have any number of dimensions. The main practical difference between a matrix (type = `numpy.matrix`) and an array (type = `numpy.ndarray`) is that `*` and `**` on arrays are element-wise, ie the special R x C matrix multiplication does not apply to them.  However, for any 2D array you can use the `@` operator to do matrix math.  Note that the NumPy documentation no longer recommends `numpy.matrix`, even for linear algebra, so prefer arrays in new code.

So that leaves matrices and arrays interchangeable in a practical sense, except that an array has no `.I` attribute for the inverse, which a matrix does.  For an array, use `np.linalg.inv` instead.

In [None]:
# Create a matrix 'm' and an array 'a' that are identical
m = np.matrix(np.random.randint(0,10,(3,3)))
a = np.array(m)

In [None]:
m

In [None]:
a

### Transpose with a `.T`

In [None]:
m.T

In [None]:
a.T

### Inverse with a `.I` 
**Does not work for arrays; use `np.linalg.inv(a)` for those.**

In [None]:
m.I
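For plain arrays, the inverse is available through `np.linalg.inv`. A minimal sketch follows, using a fixed invertible array rather than the random matrix above (a random integer matrix can occasionally be singular, in which case no inverse exists).

```python
import numpy as np

# A fixed invertible array; a random integer matrix can turn out singular
a = np.array([[4.0, 3.0], [2.0, 1.0]])

a_inv = np.linalg.inv(a)           # works on plain ndarrays
print(a_inv)

# The product with the original should be the identity, up to rounding error
print(np.allclose(a @ a_inv, np.eye(2)))
```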

### Matrix multiplication
For matrices, just a `*` suffices for matrix multiplication.  If using arrays, use `@` for matrix multiplication, which also works for matrices.  So just to be safe, just use `@`.

**Dot-product** is the same as row-by-column matrix multiplication, and is not elementwise.

In [None]:
a=np.matrix([[4, 3], [2, 1]])
b=np.asmatrix([[1, 2], [3, 4]]) # np.mat was removed in NumPy 2.0; np.asmatrix is the replacement

In [None]:
a

In [None]:
b

In [None]:
a*b

In [None]:
a@b

In [None]:
# Now check with arrays
a=np.array([[4, 3], [2, 1]])
b=np.array([[1, 2], [3, 4]])

In [None]:
a@b # does matrix multiplication.  

In [None]:
a

In [None]:
b

In [None]:
a*b # element-wise multiplication as a and b are arrays

For 2D arrays, `@` gives the same result as `np.dot(a, b)`, which is just a longer, fully spelled out function.

In [None]:
np.dot(a,b)
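The two only agree up to two dimensions. The sketch below, with made-up arrays `s` and `t` of ones, shows how the result shapes differ once the inputs are stacks of matrices.

```python
import numpy as np

a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])
print(np.array_equal(a @ b, np.dot(a, b)))   # identical for 2-D arrays

# Stacks of matrices: @ multiplies matching pairs, np.dot combines every pair
s = np.ones((2, 3, 4))
t = np.ones((2, 4, 5))
print((s @ t).shape)        # (2, 3, 5)
print(np.dot(s, t).shape)   # (2, 3, 2, 5)
```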

### Exponents with matrices and arrays `**`.

In [None]:
a = np.array([[4, 3], [2, 1]])
m = np.matrix(a)
m

In [None]:
a**2 # Because a is an array, this will square each element of a.

In [None]:
m**2 # Because m is a matrix, this will be read as m*m, and dot product of the matrix with itself will result.

which is the same as `a@a`

In [None]:
a@a

### Modulus of a vector, matrix or an array
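The modulus (also called the norm or length) is just sqrt(a^2 + b^2 + ... + n^2), where a, b, ..., n are the elements of the vector, matrix or array.  It can be calculated using `np.linalg.norm(a)`.  For a matrix or 2D array this default is known as the Frobenius norm.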

In [None]:
a = np.array([4,3,2,1])
np.linalg.norm(a)

In [None]:
# Same as calculating manually
(4**2 + 3**2 + 2**2 + 1**2) ** 0.5

In [None]:
b


In [None]:
np.linalg.norm(b)

In [None]:
m

In [None]:
np.linalg.norm(m)

In [None]:
m = np.matrix(np.random.randint(0,10,(3,3)))
m

In [None]:
np.linalg.norm(m)

In [None]:
np.sqrt(np.square(m).sum()) # Same as calculating manually: square every element of m, add them up, take the root

### Determinant of a matrix `np.linalg.det(a)`
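The absolute value of the determinant tells you by how much the matrix expands or shrinks space (area in 2D, volume in 3D) when it is used as a transformation.  A determinant of 0 means the matrix is singular and has no inverse.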

In [None]:
np.linalg.det(m)
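Two hand-picked matrices make the 'expanding or shrinking space' idea concrete. This is a sketch only; the names `stretch` and `flatten` are invented for the example.

```python
import numpy as np

stretch = np.array([[2.0, 0.0], [0.0, 3.0]])   # scales x by 2 and y by 3
flatten = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row is twice the first

print(np.linalg.det(stretch))   # area scale factor of the transformation
print(np.linalg.det(flatten))   # zero (up to rounding): the matrix is singular
```

A unit square transformed by `stretch` becomes a 2 x 3 rectangle, so its area grows by the determinant, while `flatten` squashes the whole plane onto a line.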

### Converting from matrix to array and vice-versa
`np.asmatrix` and `np.asarray` allow you to convert one to the other without copying the data.  Above we have just used `np.array` and `np.matrix`, which also work but make a copy by default.

The above references: https://stackoverflow.com/questions/4151128/what-are-the-differences-between-numpy-arrays-and-matrices-which-one-should-i-u
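A quick way to see the difference between converting and copying is `np.shares_memory`. The following is a small sketch with an arbitrary 2x2 array:

```python
import numpy as np

a = np.array([[4, 3], [2, 1]])

m = np.asmatrix(a)        # matrix view of the same data, no copy
back = np.asarray(m)      # plain ndarray view of the matrix
copied = np.array(a)      # np.array copies by default

print(type(m).__name__, type(back).__name__)
print(np.shares_memory(a, m), np.shares_memory(a, copied))
```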


## `argsort` 
`argsort` sorts the array, but returns the index numbers of the sorted elements instead of the values.

In [None]:
a = np.array([20,10,30,0])
print("a = np.array([20,10,30,0])")
print('\n')
print('Regular ascending argsort')
print("np.argsort(a)")
print(np.argsort(a))
print('\n')
print('Descending argsort')
print("b = np.argsort(a)[::-1]")
b = np.argsort(a)[::-1]
print(b)
print("\n")
print('How to use the argsort to actually perform a sort')
print(a[b])
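A common reason to want the indices rather than the sorted values is to reorder a second array in the same way. A sketch with hypothetical names and scores:

```python
import numpy as np

# Hypothetical data: names and scores stored in two parallel arrays
names = np.array(['ann', 'bob', 'cy', 'dee'])
scores = np.array([20, 10, 30, 0])

order = np.argsort(scores)        # indices that would sort scores ascending
print(order)
print(names[order])               # names reordered from lowest to highest score
print(names[order[::-1]])         # highest score first
```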

### Dot product
**Size of a vector, angle between vectors, distance between vectors**

In [None]:
a = np.array([1,2,3]); b = np.array([5,4,3])

In [None]:
np.linalg.norm(a) # Size of the vector, computed as the square root of the sum of the squares of the elements

In [None]:
np.linalg.norm(a - b) # Distance between two vectors

In [None]:
np.arccos(np.dot(a,b) / (np.linalg.norm(a) * np.linalg.norm(b))) 

# Angle in radians between two vectors. To get the
# answer in degrees, multiply by 180/pi, or 180/math.pi (after import math).  Also there is a function in math called
# math.radians to get radians from degrees, or math.degrees(x) to convert angle x from radians to degrees.

In [None]:
math.acos(np.dot(a,b) / (np.linalg.norm(a) * np.linalg.norm(b))) # Same as above using math.acos instead of np.arccos
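The angle calculation can be wrapped in a small function that returns degrees directly. This is a sketch: `angle_degrees` is a made-up helper name, and the clipping step is an added safeguard against rounding error, not part of the formula above.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two vectors in degrees (hypothetical helper name)."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Rounding error can push the ratio slightly outside [-1, 1]
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

print(angle_degrees(np.array([1, 0]), np.array([0, 1])))   # perpendicular vectors
print(angle_degrees(np.array([1, 2, 3]), np.array([5, 4, 3])))
```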

  
That's about it!