# Notes for getting started with Python.  

 
## Lists, Dicts, Sets and Tuples

In [1]:
import pandas as pd
import os
import random
import numpy as np
import scipy
import math
import joblib

Lists are expressed as ```[]```<br>
Dicts are ```{}``` <br>
Sets also use ```{}```, but hold single values rather than key:value pairs separated by colons. An empty ```{}``` is a dict, so use ```set()``` for an empty set<br>
Tuples are in ```()```

In [2]:
empty_dict = {}
dict1 = {'first': ['John', 'Jane'], 2: (1,2,3)}
dict1

{'first': ['John', 'Jane'], 2: (1, 2, 3)}

In [3]:
type(dict1)

dict

In [4]:
empty_list = []
list1 = ['a', 2,4, 'python']
list1

['a', 2, 4, 'python']

In [5]:
type(list1)

list

In [6]:
set1 = {1,2,4,5} # Sets can do intersect, union and difference
tuple1 = 1, 3, 4 # or
tuple1 = (1, 3, 4)
tuple1

(1, 3, 4)
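
The comment in the cell above mentions that sets can do intersect, union and difference. A minimal sketch of those three operations follows; the names `evens` and `small` and their values are made up for illustration and are not from the notes above.

```python
# Hypothetical example values, chosen only for illustration
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

common = evens & small        # intersection, same as evens.intersection(small)
either = evens | small        # union, same as evens.union(small)
only_evens = evens - small    # difference, same as evens.difference(small)

# Sets are unordered, so sort them before printing for a stable display
print(sorted(common))      # [2, 4]
print(sorted(either))      # [1, 2, 3, 4, 6, 8]
print(sorted(only_evens))  # [6, 8]

# {} creates an empty dict, so an empty set needs set()
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```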

***
## Numpy arrays

Everything that Numpy touches ends up as an array, much as most pandas functions return a dataframe.  The easiest way to generate a random array is `np.random.randn(2,3)`, which gives an array of shape (2, 3).  You can pick any other dimensions too.  `randn` draws from the standard normal distribution (mean 0, standard deviation 1).

In [7]:
data = np.random.randn(2, 3, 4)
data
# The number of leading `[` gives the number of dimensions in the array.
# The last two dimensions, here 3 and 4, are shown on screen as rows and columns.
# The first dimension, 2, means there are two sets of these rows and columns in the array.

array([[[ 0.01859175,  0.91285871, -0.27777784,  1.1778876 ],
        [-1.3581386 , -0.22362897, -1.40181201, -0.42405101],
        [ 0.93268656, -0.56882653, -1.00474979, -0.73055752]],

       [[-0.73352888, -0.38972479, -2.01955289,  0.67861684],
        [ 1.08576053,  0.15419767, -0.85740564, -0.90513423],
        [ 1.48783125, -0.48552499, -1.29105235,  0.88736602]]])

In [8]:
# Now let us add another dimension, this time with random integers rather than random normal values.
# randint draws uniformly from low (inclusive) to high (exclusive).
data = np.random.randint(low = 1, high = 100, size = (2,3,2,4))
data

array([[[[66, 64, 23, 74],
         [29, 70, 85, 75]],

        [[52, 23, 23, 75],
         [27, 36, 34, 26]],

        [[37, 60, 62, 95],
         [42, 47,  7, 30]]],


       [[[79, 41, 91, 68],
         [86, 50, 60, 85]],

        [[54,  3, 46, 17],
         [ 2, 82, 29, 82]],

        [[33,  7, 90, 71],
         [67,  5, 10, 31]]]])

So there will be a collection of 2 rows x 4 columns matrices, repeated 3 times, and that entire set another 2 times. <br><br>
And the 4 occurrences of `[[[[` means there are 4 dimensions to the array.
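
Counting brackets works, but the array can also report its own dimensions. The sketch below rebuilds an array of the same shape (the random values do not matter here) and reads the `ndim`, `shape` and `size` attributes.

```python
import numpy as np

# Same shape as the example above; the actual random values are irrelevant
data = np.random.randint(low=1, high=100, size=(2, 3, 2, 4))

print(data.ndim)   # 4, the number of dimensions
print(data.shape)  # (2, 3, 2, 4)
print(data.size)   # 48, i.e. 2 * 3 * 2 * 4 elements in total
```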

In [9]:
type(data)

numpy.ndarray

In [10]:
# Converting a list to an array
list1 = list(range(12))
list1

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [11]:
array1 = np.array(list1)
array1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [12]:
# This array1 is one dimensional, let us convert to a 3x4 array.
array1.shape = (3,4)
array1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

### Create arrays

In [13]:
array1 = np.zeros((2,3)) # The dimensions must be a tuple inside the brackets
array1

array([[0., 0., 0.],
       [0., 0., 0.]])

In [14]:
array1 = np.arange(12)
array1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [15]:
array1.reshape(3,4) #You can reshape the dimensions of an array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [16]:
array1.reshape(3,2,2)

array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])

In [17]:
array1 = np.ones((3,5))
array1

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [18]:
array1 = np.eye(4) #Creates the identity matrix 
array1

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

 **All math on arrays is element wise, and scalars are multiplied/added with each element.**

In [19]:
array1 + 4

array([[5., 4., 4., 4.],
       [4., 5., 4., 4.],
       [4., 4., 5., 4.],
       [4., 4., 4., 5.]])

In [21]:
array1 > np.random.randint(0, 2, (4,4))

array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False,  True]])

In [22]:
array1 + 2

array([[3., 2., 2., 2.],
       [2., 3., 2., 2.],
       [2., 2., 3., 2.],
       [2., 2., 2., 3.]])

In [23]:
np.sum(array1) # adds all the elements of an array

4.0

In [24]:
np.sum(array1, axis = 0) # adds all elements of the array along a particular axis

array([1., 1., 1., 1.])
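
The cells above combine an array with a scalar. As a small sketch of the same element-wise rule applied between two arrays, the example below uses made-up arrays `a`, `b` and `row` (not from the notes). The last line relies on NumPy broadcasting, where a 1-D array whose length matches the number of columns is applied to every row.

```python
import numpy as np

# Hypothetical example arrays
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

print(a + b)   # element-wise sum
print(a * b)   # element-wise product, not matrix multiplication
print(a * 2)   # the scalar is applied to every element

row = np.array([100, 200, 300])
print(a + row)  # the 1-D row is added to each row of a (broadcasting)
```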

### Subsetting arrays ('slices')
The confusing thing is that indexing in every dimension starts at 0.  The portion of the dimension you wish to select is given in the form `start:finish`, where the `start` element is included but the `finish` element is excluded.  So `1:3` means include 1 and 2 but not 3.

In [25]:
array1 = np.random.randint(0, 100, (3,5))
array1

array([[57, 21, 33, 90, 24],
       [95, 63, 95, 10, 13],
       [74, 12, 40, 90, 17]])

In [26]:
array1[0:2, 0:2]

array([[57, 21],
       [95, 63]])

In [27]:
array1[:,0:2] # ':' means include everything

array([[57, 21],
       [95, 63],
       [74, 12]])

In [28]:
array1[0:2]

array([[57, 21, 33, 90, 24],
       [95, 63, 95, 10, 13]])

In [29]:
# Slices are views of the original array, not copies, so changing a slice changes the original.  If you need an independent copy, use the below:
array1[0:2].copy()

array([[57, 21, 33, 90, 24],
       [95, 63, 95, 10, 13]])
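
To see what "slices are references" means in practice, here is a minimal sketch using a made-up array called `original`. Writing into the slice changes the source array, while writing into the copy does not.

```python
import numpy as np

original = np.arange(12).reshape(3, 4)

view = original[0:2]              # a slice: shares memory with original
independent = original[0:2].copy()  # an independent copy

view[0, 0] = 99         # also changes original[0, 0]
independent[0, 1] = -1  # original is untouched

print(original[0, 0])  # 99
print(original[0, 1])  # 1
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```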

Generally, use the 'Long Form' way of slicing shown above, where you specify the indices for each dimension, using `:` where everything is to be included.  There are other shortcut methods of slicing, but they can be left aside for now.

Imagine an array a1 with dimensions (3, 4, 2, 5).  This means:
 - This array has 3 arrays in it that have the dimensions (4, 2, 5)
 - Each of these 3 arrays has 4 additional arrays in it of the dimension (2, 5).  (So there are 3*4=12 of these 2x5 arrays)
 - Each of these (2, 5) arrays has 2 one-dimensional arrays with 5 elements each.
 
If the slice notation specifies only some of the dimensions, eg a1[0], the indices are matched to the dimensions from the left of (3, 4, 2, 5).  So a1[0] means give me the first of the 3 arrays with size (4, 2, 5).  

If the slice notation says a1[0,1], then it means the 0th element of the first dim, and the 1st element of the second dim, ie an array of size (2, 5).

Check it out using the following code:

In [30]:
a1 = np.random.randint(0, 100, (3,4,2,5))
a1

array([[[[23, 63, 63,  2, 38],
         [23, 16, 63, 97, 69]],

        [[ 8, 96, 96,  2, 10],
         [11, 86, 80,  0, 98]],

        [[52, 51,  6, 34, 77],
         [24, 32, 26, 36, 32]],

        [[45, 99, 37, 33, 23],
         [94, 26, 10, 94, 99]]],


       [[[92, 52, 96, 47,  3],
         [52, 69, 23, 66, 71]],

        [[39,  8, 43, 62, 91],
         [35, 77, 33, 25, 10]],

        [[42, 93, 97, 12, 44],
         [ 3, 61, 82, 38,  0]],

        [[89, 19, 85, 54, 81],
         [64,  6, 52, 20, 30]]],


       [[[27, 33, 88, 22, 51],
         [62, 53,  4, 43, 64]],

        [[85,  5, 22, 75,  0],
         [19, 41,  4, 64, 46]],

        [[16, 60, 71,  6,  4],
         [ 1, 46, 95, 83, 21]],

        [[28, 64, 93, 33,  6],
         [51, 96, 47, 98, 28]]]])

In [31]:
a1[0].shape

(4, 2, 5)

In [32]:
a1[0]

array([[[23, 63, 63,  2, 38],
        [23, 16, 63, 97, 69]],

       [[ 8, 96, 96,  2, 10],
        [11, 86, 80,  0, 98]],

       [[52, 51,  6, 34, 77],
        [24, 32, 26, 36, 32]],

       [[45, 99, 37, 33, 23],
        [94, 26, 10, 94, 99]]])

In [33]:
a1[0,1]

array([[ 8, 96, 96,  2, 10],
       [11, 86, 80,  0, 98]])
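
The same left-to-right rule can be checked on the shapes alone. The sketch below fills an array of the same shape with `np.arange` instead of random numbers (an assumption made here so the values are predictable) and shows how each extra index removes one leading dimension.

```python
import numpy as np

# Same shape as above, but filled with 0..119 so results are reproducible
a1 = np.arange(3 * 4 * 2 * 5).reshape(3, 4, 2, 5)

print(a1[0].shape)        # (4, 2, 5)
print(a1[0, 1].shape)     # (2, 5)
print(a1[0, 1, 1].shape)  # (5,)
print(a1[0, 1, 1, 4])     # 19, a single element

# Omitted trailing dimensions behave as if ':' had been written for them
print(np.array_equal(a1[0, 1], a1[0, 1, :, :]))  # True
```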

### Picking selected rows or columns

In [34]:
a1 = np.random.randint(0, 100, (8,9))
a1

array([[17, 89, 41, 63, 93, 89, 84, 29, 14],
       [45, 76, 37, 60, 76, 15, 88, 19, 89],
       [34, 34, 53, 22, 65,  4, 38, 31, 86],
       [74, 40, 66, 99, 94, 97, 29, 90, 31],
       [91, 68, 48, 25,  9, 44, 35, 18, 12],
       [45, 59,  5, 92, 17, 93, 69, 80,  2],
       [ 3, 54, 29, 14, 30, 29, 38, 40, 86],
       [53, 79, 89, 34, 65, 69, 96, 59,  6]])

In [35]:
a1[[0,3]] #pick the first and the fourth rows

array([[17, 89, 41, 63, 93, 89, 84, 29, 14],
       [74, 40, 66, 99, 94, 97, 29, 90, 31]])

In [36]:
a1[[0, 3]][:,[0, 1]] # Named rows and columns.  
# Note that a1[[0, 3],[0, 1]] does not select the same block: the two index lists are paired up,
# so it picks the two points (0,0) and (3,1).

array([[17, 89],
       [74, 40]])
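
An alternative to chaining two selections is `np.ix_`, which builds the row-by-column grid from the two index lists. The sketch below uses an `np.arange` array of the same shape (an assumption, so the values are predictable) and compares it with the paired-up behaviour described in the comment above.

```python
import numpy as np

a1 = np.arange(72).reshape(8, 9)  # predictable stand-in for the random array
rows = [0, 3]
cols = [0, 1]

block = a1[np.ix_(rows, cols)]  # every combination of the rows and columns
points = a1[rows, cols]         # paired indices: elements (0,0) and (3,1)

print(block)   # [[ 0  1]
               #  [27 28]]
print(points)  # [ 0 28]

# np.ix_ gives the same block as the chained selection used above
print(np.array_equal(block, a1[rows][:, cols]))  # True
```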

### `argsort` 
`argsort` sorts the array but returns the index positions of the sorted elements instead of the values.

In [37]:
a = np.array([20,10,30,0])
print("a = np.array([20,10,30,0])")
print('\n')
print('Regular ascending argsort')
print("np.argsort(a)")
print(np.argsort(a))
print('\n')
print('Descending argsort')
print("b = np.argsort(a)[::-1]")
b = np.argsort(a)[::-1]
print(b)
print("\n")
print('How to use the argsort to actually perform a sort')
print(a[b])

a = np.array([20,10,30,0])


Regular ascending argsort
np.argsort(a)
[3 1 0 2]


Descending argsort
b = np.argsort(a)[::-1]
[2 0 1 3]


How to use the argsort to actually perform a sort
[30 20 10  0]
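
One common use of these index numbers is to reorder a second, parallel array. A small sketch, assuming made-up `names` and `scores` arrays that line up position by position:

```python
import numpy as np

# Hypothetical parallel arrays: names[i] belongs to scores[i]
names = np.array(['Ann', 'Bob', 'Cy', 'Dee'])
scores = np.array([20, 10, 30, 0])

order = np.argsort(scores)[::-1]  # indices from highest score to lowest

print(order)          # [2 0 1 3]
print(names[order])   # ['Cy' 'Ann' 'Bob' 'Dee']
print(scores[order])  # [30 20 10  0]
```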


### Dot product
**Size of a vector, angle between vectors, distance between vectors**

In [38]:
a = np.array([1,2,3]); b = np.array([5,4,3])

In [39]:
np.linalg.norm(a) # Size of the vector, computed as the square root of the sum of the squares of the elements

3.7416573867739413

In [40]:
np.linalg.norm(a - b) # Distance between two vectors

4.47213595499958

In [41]:
np.arccos(np.dot(a,b) / (np.linalg.norm(a) * np.linalg.norm(b))) 

# Angle in radians between two vectors. To get the
# answer in degrees, multiply by 180/pi, or 180/math.pi (after import math).  The math module also has
# math.radians(x) to convert angle x from degrees to radians, and math.degrees(x) to convert from radians to degrees.

0.5889546074455115

In [42]:
math.acos(np.dot(a,b) / (np.linalg.norm(a) * np.linalg.norm(b))) # Same as above using math.acos instead of np.arccos

0.5889546074455115
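
As a sketch of the degree conversion mentioned in the comments above, the example below repeats the angle calculation and converts it with `np.degrees` and `math.degrees`. The perpendicular vectors `x` and `y` are added here only as a sanity check and are not part of the original notes.

```python
import math
import numpy as np

a = np.array([1, 2, 3])
b = np.array([5, 4, 3])

cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)  # angle in radians

print(np.degrees(theta))    # roughly 33.7 degrees
print(math.degrees(theta))  # same conversion using the math module

# Sanity check with two perpendicular vectors: the angle should be 90 degrees
x = np.array([1, 0])
y = np.array([0, 1])
right_angle = np.degrees(np.arccos(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))))
print(right_angle)
```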

***

## Matrix math

Numpy has arrays as well as matrices.  Matrices are always 2D, arrays can have any number of dimensions. The main practical difference between a matrix (type = `numpy.matrix`) and an array (type = `numpy.ndarray`) is that arithmetic operators on arrays are element wise, ie `*` on arrays does not do the special R x C matrix multiplication.  However, for two-dimensional arrays of any compatible shape (not just 2 x 2) you can use the `@` operator to do matrix math.

So that leaves matrices and arrays interchangeable in a practical sense, except that you can't get the inverse of an array using `.I` as you can for a matrix.  For arrays, use `np.linalg.inv` instead.

In [43]:
# Create a matrix 'm' and an array 'a' that are identical
m = np.matrix(np.random.randint(0,10,(3,3)))
a = np.array(m)

In [44]:
m

matrix([[0, 4, 0],
        [0, 8, 3],
        [6, 6, 7]])

In [45]:
a

array([[0, 4, 0],
       [0, 8, 3],
       [6, 6, 7]])

### Transpose with a `.T`

In [46]:
m.T

matrix([[0, 0, 6],
        [4, 8, 6],
        [0, 3, 7]])

In [47]:
a.T

array([[0, 0, 6],
       [4, 8, 6],
       [0, 3, 7]])

### Inverse with `.I`
**Does not work for arrays**

In [48]:
m.I

matrix([[ 0.52777778, -0.38888889,  0.16666667],
        [ 0.25      ,  0.        ,  0.        ],
        [-0.66666667,  0.33333333, -0.        ]])
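
Since `.I` is not available on arrays, a minimal sketch of the array route uses `np.linalg.inv`. The values below are the ones shown for `m` and `a` above, typed in by hand so the block runs on its own.

```python
import numpy as np

# Same values as the 3x3 example above, as a plain array
a = np.array([[0, 4, 0],
              [0, 8, 3],
              [6, 6, 7]])

a_inv = np.linalg.inv(a)
print(a_inv)

# Multiplying a matrix by its inverse should give the identity matrix
print(np.allclose(a @ a_inv, np.eye(3)))  # True
```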

### Matrix multiplication
For matrices, just a `*` suffices for matrix multiplication.  If using arrays, use `@` for matrix multiplication, which also works for matrices.  So just to be safe, just use `@`.

**Dot-product** is the same as row-by-column matrix multiplication, and is not elementwise.

In [49]:
a=np.matrix([[4, 3], [2, 1]])
b=np.matrix([[1, 2], [3, 4]]) # np.mat is an older alias that was removed in NumPy 2.0

In [50]:
a

matrix([[4, 3],
        [2, 1]])

In [51]:
b

matrix([[1, 2],
        [3, 4]])

In [52]:
a*b

matrix([[13, 20],
        [ 5,  8]])

In [53]:
a@b

matrix([[13, 20],
        [ 5,  8]])

In [54]:
# Now check with arrays
a=np.array([[4, 3], [2, 1]])
b=np.array([[1, 2], [3, 4]])

In [55]:
a@b # does matrix multiplication.  

array([[13, 20],
       [ 5,  8]])

In [56]:
a

array([[4, 3],
       [2, 1]])

In [57]:
b

array([[1, 2],
       [3, 4]])

In [58]:
a*b # element-wise multiplication as a and b are arrays

array([[4, 6],
       [6, 4]])

For 2-D arrays, `@` gives the same result as `np.dot(a, b)`, which is just the longer, fully spelled-out function.

In [59]:
np.dot(a,b)

array([[13, 20],
       [ 5,  8]])

### Time your code with `%time`

In [60]:
%time np.dot(array1, array1)

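ValueError: shapes (3,5) and (3,5) not aligned: 5 (dim 1) != 3 (dim 0)

The call fails because `array1` is still the (3, 5) array from In [25], and a (3, 5) by (3, 5) matrix product is not defined.  `%time np.dot(array1, array1.T)` multiplies (3, 5) by (5, 3) and runs without error.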
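
`%time` is an IPython magic, so it only works in a notebook or an IPython shell. In a plain Python script, a rough equivalent can be sketched with the standard library's `time.perf_counter`. The example transposes the second operand so that the inner dimensions match; the variable names `start`, `result` and `elapsed` are made up for illustration.

```python
import time
import numpy as np

array1 = np.random.randint(0, 100, (3, 5))

start = time.perf_counter()
result = np.dot(array1, array1.T)  # (3, 5) times (5, 3) gives (3, 3)
elapsed = time.perf_counter() - start

print(result.shape)  # (3, 3)
print(f"{elapsed:.6f} seconds")
```

A single run like this is noisy; for anything more careful, the `timeit` module repeats the call many times.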

### Exponents with matrices and arrays `**`.

In [61]:
a = np.array([[4, 3], [2, 1]])
m = np.matrix(a)
m

matrix([[4, 3],
        [2, 1]])

In [62]:
a**2 # Because a is an array, this will square each element of a.

array([[16,  9],
       [ 4,  1]], dtype=int32)

In [63]:
m**2 # Because m is a matrix, this will be read as m*m, and dot product of the matrix with itself will result.

matrix([[22, 15],
        [10,  7]])

This is the same as `a@a`:

In [64]:
a@a

array([[22, 15],
       [10,  7]])

### Modulus of a vector, matrix or an array
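The modulus (also called the norm) is just sqrt(a^2 + b^2 + ... + n^2), where a, b, ..., n are the elements of the vector, matrix or array.  It can be calculated using `np.linalg.norm(a)`.  For a matrix or 2-D array, this default is known as the Frobenius norm.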

In [65]:
a = np.array([4,3,2,1])
np.linalg.norm(a)

5.477225575051661

In [66]:
# Same as calculating manually
(4**2 + 3**2 + 2**2 + 1**2) ** 0.5

5.477225575051661

In [67]:
b


array([[1, 2],
       [3, 4]])

In [68]:
np.linalg.norm(b)

5.477225575051661

In [69]:
m

matrix([[4, 3],
        [2, 1]])

In [70]:
np.linalg.norm(m)

5.477225575051661

In [71]:
m = np.matrix(np.random.randint(0,10,(3,3)))
m

matrix([[3, 6, 0],
        [1, 9, 5],
        [9, 1, 7]])

In [72]:
np.linalg.norm(m)

16.822603841260722

In [73]:
# Same as calculating manually from the nine elements of m
(3**2 + 6**2 + 0**2 + 1**2 + 9**2 + 5**2 + 9**2 + 1**2 + 7**2) ** 0.5

16.822603841260722
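
Typing out every element by hand is error-prone, so the manual check can also be sketched with array operations. The values below are those of the 3x3 `m` shown above, held in a plain array named `values` (a name chosen here). A plain array is used on purpose: on a `numpy.matrix`, `**2` is a matrix product rather than an element-wise square.

```python
import numpy as np

# Elements of the 3x3 example above, as a plain array
values = np.array([[3, 6, 0],
                   [1, 9, 5],
                   [9, 1, 7]])

manual = np.sqrt(np.sum(values ** 2))  # square each element, add up, take the root

print(np.sum(values ** 2))                          # 283
print(np.isclose(manual, np.linalg.norm(values)))   # True
```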

### Determinant of a matrix `np.linalg.det(a)`
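The absolute value of the determinant is the factor by which the matrix scales areas (or volumes): above 1 it expands space, below 1 it shrinks it, and a determinant of 0 means the matrix collapses space into a lower dimension and has no inverse.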

In [74]:
np.linalg.det(m)

401.99999999999966
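
To make the expanding and shrinking idea concrete, here is a small sketch with three made-up 2x2 matrices (`stretch`, `shrink` and `flat` are names chosen for illustration). For a 2x2 matrix, the absolute value of the determinant is the factor by which areas are scaled.

```python
import numpy as np

stretch = np.array([[2, 0],
                    [0, 3]])      # doubles x and triples y
shrink = np.array([[0.5, 0],
                   [0, 0.5]])     # halves both directions
flat = np.array([[1, 2],
                 [2, 4]])         # second row is twice the first

print(np.linalg.det(stretch))  # about 6: areas grow six-fold
print(np.linalg.det(shrink))   # about 0.25: areas shrink to a quarter
print(np.linalg.det(flat))     # about 0: the plane is squashed onto a line
```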

### Converting from matrix to array and vice-versa
`np.asmatrix` and `np.asarray` allow you to convert one to the other. Though above we have just used np.array and np.matrix without any issue.

The above references: https://stackoverflow.com/questions/4151128/what-are-the-differences-between-numpy-arrays-and-matrices-which-one-should-i-u


***
## Understanding numpy axes

In [75]:
x = np.random.randint(low = 1, high = 5, size = (2,3,4))
print('Shape: ', x.shape)
x

Shape:  (2, 3, 4)


array([[[1, 1, 2, 4],
        [3, 3, 3, 2],
        [3, 2, 3, 1]],

       [[2, 1, 2, 4],
        [4, 3, 1, 4],
        [3, 2, 4, 4]]])

  
  
Shape is (2, 3, 4)  
  
axis = 0 means : (**2**, 3, 4)  
axis = 1 means : (2, **3**, 4)  
axis = 2 means : (2, 3, **4**)  

  
Numpy axis numbers run from left to right, starting with index 0.  So `x.shape` gives (2, 3, 4), which means the 2 is axis 0, the 3 rows are axis 1 and the 4 columns are axis 2.  

Passing `axis = n` to an aggregation function makes axis n disappear, leaving only the rest of the dimensions.  So `np.sum(array_name, axis = n)`, and similarly `mean()`, `min()`, `median()`, `std()` etc, apply the aggregation along axis n, collapsing all the elements along that axis into a single value.  See below using the sum function.  
  

In [76]:
# So with axis = 0, the very first dimension, ie the 2, will collapse leaving an array of shape (3,4)
x.sum(axis = 0) # (3,4) will remain

array([[3, 2, 4, 8],
       [7, 6, 4, 6],
       [6, 4, 7, 5]])

In [77]:
x.sum(axis = 1) # (2,4) will remain

array([[ 7,  6,  8,  7],
       [ 9,  6,  7, 12]])

In [78]:
x.sum(axis = 2) # (2,3) will remain

array([[ 8, 11,  9],
       [ 9, 12, 13]])
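
The "axis n disappears" rule can be checked by looking only at the resulting shapes. The sketch below uses a fresh random array of the same shape and also shows two optional extras of `sum`: a tuple of axes, and `keepdims=True`, which keeps the collapsed axis with length 1.

```python
import numpy as np

x = np.random.randint(low=1, high=5, size=(2, 3, 4))

# Summing over one axis removes that axis from the shape
for axis in range(x.ndim):
    print(axis, x.sum(axis=axis).shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)

print(x.sum(axis=(0, 2)).shape)            # (3,): axes 0 and 2 both collapse
print(x.sum(axis=1, keepdims=True).shape)  # (2, 1, 4): axis 1 kept with length 1
```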

  
That's about it!