# Python Crash Course (NumPy for data analysis)

Audience: M2 Data & IA.
Chapter 1 - Introduction to Machine learning.
Section 2: Numpy for data analysis.

This notebook will just go through the basic topics in order:

* Array indexing and slicing.
* Array shapes and operations.

## What is NumPy?

NumPy is a Python library used for working with arrays.

It also has functions for working in the domain of <b>linear algebra</b>, Fourier transforms, and matrices.

NumPy stands for Numerical Python.

1. NumPy is fast: its core is implemented in C, so array operations can be up to 50x faster than the same work on traditional Python lists.
2. The array object in NumPy is called ndarray.
3. The code base can be found at https://github.com/numpy/numpy
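
To make the speed claim concrete, here is a minimal sketch that runs the same computation (doubling every element) on a Python list and on an ndarray. The array size and the number of repetitions are arbitrary choices for illustration, and the measured ratio depends on the machine, so no particular speedup should be expected.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for this illustration
py_list = list(range(n))
np_arr = np.arange(n)

# Same computation twice: double every element.
doubled_list = [x * 2 for x in py_list]  # pure-Python loop
doubled_arr = np_arr * 2                 # vectorized NumPy operation

# Time both versions; the exact ratio varies by machine and array size.
t_list = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
t_arr = timeit.timeit(lambda: np_arr * 2, number=20)
print(f"list: {t_list:.4f}s  ndarray: {t_arr:.4f}s")
```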

## Installation

1. Using the Anaconda prompt or a terminal: pip install numpy or conda install numpy
2. Using the %pip or %conda magic inside a notebook: %pip install numpy
3. In Google Colab: !pip install numpy

## Importing

Use the following import convention:

In [4]:
import numpy as np

## Arrays


NumPy is used to work with arrays. The array object in NumPy is called ndarray.

We can create a NumPy ndarray object by using the array() function.

NumPy arrays can have any number of dimensions, but the two most common flavors are <b>vectors and matrices</b>.

1. Vectors are strictly 1-D arrays.
2. Matrices are 2-D (note that a matrix can still have only one row or one column).
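
The difference between a vector and a one-row or one-column matrix is easiest to see in the shape of each array. The sketch below uses illustrative variable names that do not appear elsewhere in this notebook.

```python
import numpy as np

vector = np.array([1, 2, 3])            # 1-D: shape (3,)
row_matrix = np.array([[1, 2, 3]])      # 2-D with a single row: shape (1, 3)
col_matrix = np.array([[1], [2], [3]])  # 2-D with a single column: shape (3, 1)

print(vector.shape, row_matrix.shape, col_matrix.shape)
print(vector.ndim, row_matrix.ndim, col_matrix.ndim)
```

All three hold the same values, but only the first one is a vector in the sense used here.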


In [1]:
from IPython.display import Image
Image(url="https://miro.medium.com/max/817/0*y04Nh3L0aSwyGaby.png")

In [None]:
# Create a NumPy ndarray Object using np.array.

numpy_array = np.array([1, 2, 3, 4, 5])

In [None]:
print(numpy_array)

In [None]:
# From a Python List we can create an array by directly converting a list or list of lists:
py_list = [1, 2, 3, 4, 5]
print(py_list)

In [None]:
np_list_from_py_list = np.array(py_list)
print(np_list_from_py_list)

1. Each value in an array is a 0-D array, or scalar.
2. An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

### 2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd-order tensors.

In [None]:
# Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6

two_dim_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(two_dim_arr)

### 3-D Arrays

An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
These are often used to represent 3rd-order tensors.

In [None]:
# Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6

three_dim_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(three_dim_arr)

### Checking the Number of Dimensions

NumPy arrays provide the <b>ndim</b> attribute, which returns an integer telling us how many dimensions the array has.

In [1]:
# How many dimensions does this array have?
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(a.ndim)

In [None]:
e = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

In [2]:
print(e.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

### Array Slicing.

Slicing in Python means taking elements from one given index to another given index.

We pass a slice instead of an index, like this: [start:end].

We can also define the step, like this: [start:end:step].

1. If we don't pass start, it is taken to be 0.

2. If we don't pass end, it is taken to be the length of the array in that dimension.

3. If we don't pass step, it is taken to be 1.

4. The step sets the stride of the slice: a step of 2 takes every other element, and a negative step walks the array backwards.
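
The defaults listed above can be checked directly by comparing a slice with omitted parts against its fully spelled-out form. This is a small sketch on an illustrative array; the variable names are not part of the notebook.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

# Omitting start, end, or step falls back to the defaults.
same_start = np.array_equal(arr[:4], arr[0:4])
same_end = np.array_equal(arr[4:], arr[4:len(arr)])
same_step = np.array_equal(arr[1:5], arr[1:5:1])
print(same_start, same_end, same_step)

# The end index is excluded, so arr[1:5] holds indices 1, 2, 3 and 4.
middle = arr[1:5]
print(middle)

# Negative indices count from the end of the array.
last_three = arr[-3:]
print(last_three)
```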

In [None]:
slice_arr = np.array([1, 2, 3, 4, 5, 6, 7])

In [None]:
# Slice elements from index 1 up to, but not including, index 5.
# Tip: indexing starts at 0.
print(slice_arr[1:5])

In [None]:
# Slice elements from index 4 to the end of the array:

In [None]:
print(slice_arr[4:])

In [None]:
# Slice elements from the beginning to index 4 (not included).

In [None]:
print(slice_arr[:4])

In [None]:
# Return every other element from index 1 to index 5 (not included)

In [None]:
print(slice_arr[1:5:2])

In [None]:
# Return every other element from the entire array

In [None]:
print(slice_arr[::2])

In [None]:
#Reversed array
slice_arr[::-1]

#### Slicing 2-D Arrays

In [None]:
two_dim_arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

In [None]:
# From the second row, slice elements from index 1 to index 4 (not included)
# Remember that the second row has index 1.

In [None]:
print(two_dim_arr[1, 1:4])

In [None]:
#From both rows, return the element at index 2

In [None]:
print(two_dim_arr[:, 2]) #or two_dim_arr[0:2, 2]

In [None]:
#From both rows, slice index 1 to index 4 (not included); this returns a 2-D array

In [None]:
print(two_dim_arr[:, 1:4]) #or two_dim_arr[0:2, 1:4]

#### Shapes

The shape of an array is the number of elements in each dimension.

NumPy arrays have an attribute called shape that returns a tuple in which each entry gives the number of elements along the corresponding dimension.

In [None]:
two_dim_arr

In [None]:
two_dim_arr.shape

The example above returns (2, 5), which means that the array has 2 dimensions, where the first dimension has 2 elements and the second has 5.
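
As a quick check of how shape relates to the other attributes, the sketch below prints shape, ndim and size for arrays of 0 to 3 dimensions. The arrays are made up for illustration.

```python
import numpy as np

scalar = np.array(42)
vec = np.arange(5)
mat = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
cube = np.zeros((2, 3, 4))

for name, arr in [("scalar", scalar), ("vec", vec), ("mat", mat), ("cube", cube)]:
    # len(shape) always equals ndim, and size is the product of the shape entries.
    print(name, arr.shape, arr.ndim, arr.size)
```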

In [None]:
from IPython.display import Image
Image(url="https://miro.medium.com/max/817/0*y04Nh3L0aSwyGaby.png")

#### Reshaping arrays

Reshaping means changing the shape of an array.

The shape of an array is the number of elements in each dimension.

By reshaping we can add or remove dimensions, or change the number of elements in each dimension.

In [None]:
# Convert the following 1-D array with 12 elements into a 2-D array.

# The outermost dimension will have 4 arrays, each with 3 elements.

In [None]:
test_reshape = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

In [None]:
new_reshape = test_reshape.reshape(4, 3)
print(new_reshape)

In [None]:
# Convert the following 1-D array with 12 elements into a 3-D array.

# The outermost dimension will have 2 arrays that each contain 3 arrays, each with 2 elements.

In [None]:
new_reshape = test_reshape.reshape(2, 3, 2)
print(new_reshape)

#### Can We Reshape Into Any Shape?

Yes, as long as both shapes contain the same total number of elements.

We can reshape a 1-D array of 8 elements into a 2-D array with 2 rows of 4 elements (2 x 4 = 8 elements).

We cannot reshape it into a 2-D array with 3 rows of 3 elements, as that would require 3 x 3 = 9 elements.
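
The rule can be tried out on an 8-element array. In this sketch the names are illustrative; the use of -1 to let NumPy infer one dimension is a standard reshape feature that the text has not yet introduced at this point.

```python
import numpy as np

eight = np.arange(8)

ok = eight.reshape(2, 4)         # 2 * 4 = 8 elements: works
inferred = eight.reshape(4, -1)  # -1 lets NumPy infer the missing dimension (2 here)

try:
    eight.reshape(3, 3)          # 3 * 3 = 9 elements: does not match 8
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```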

#### Flattening the arrays

Flattening an array means converting a multidimensional array into a 1D array.

We can use reshape(-1) to do this.

In [None]:
# new_reshape is the 3-D array created above; reshape(-1) turns it back into a 1-D array.
flat_arr = new_reshape.reshape(-1)
print(flat_arr)
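
For comparison, here is a sketch of flattening a genuinely 3-D array. Besides reshape(-1), it uses the ndarray methods flatten() and ravel(), which this notebook does not cover; they are shown only as alternatives.

```python
import numpy as np

cube = np.arange(12).reshape(2, 3, 2)

flat_reshape = cube.reshape(-1)  # 1-D result with all 12 elements
flat_copy = cube.flatten()       # always returns a copy
flat_ravel = cube.ravel()        # returns a view when possible

# Changing the copy leaves the original array untouched.
flat_copy[0] = 99
print(cube[0, 0, 0], flat_copy[0])
```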

## Built-in Methods

There are lots of built-in ways to generate Arrays.

#### arange(): Create an array of evenly spaced values (step value)

In [None]:
np.arange(0,10) # arguments: start (included), end (not included), optional step

In [None]:
np.arange(0,11,2)

#### zeros() and ones() : Generate arrays of zeros or ones

In [None]:
np.zeros(3) #Create an array of zeros 1D

In [None]:
np.zeros((3,4)) #Create an array of zeros 2D

In [None]:
np.ones((2,3,4)) #Create a 3D array of ones

In [None]:
np.ones((2,3,4), dtype="i") #Create a 3D array of ones (integer)

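Each NumPy data type belongs to a kind, identified by a one-character code (available as dtype.kind):

* i - integer
* b - boolean
* u - unsigned integer
* f - float
* c - complex float
* m - timedelta
* M - datetime
* O - object
* S - byte string
* U - unicode string
* V - fixed chunk of memory for other type (void)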
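
The codes above match what NumPy reports in the dtype.kind attribute of an array. The sketch below builds one small array per kind (the sample values are arbitrary, and V is left out) and prints its dtype and kind.

```python
import numpy as np

examples = {
    "i": np.array([1, 2, 3]),                              # integer
    "b": np.array([True, False]),                          # boolean
    "u": np.array([1, 2, 3], dtype=np.uint8),              # unsigned integer
    "f": np.array([1.0, 2.5]),                             # float
    "c": np.array([1 + 2j]),                               # complex float
    "m": np.array([5], dtype="timedelta64[D]"),            # timedelta
    "M": np.array(["2024-01-01"], dtype="datetime64[D]"),  # datetime
    "O": np.array([{"a": 1}, None], dtype=object),         # object
    "S": np.array([b"abc"]),                               # byte string
    "U": np.array(["abc"]),                                # unicode string
}

for code, arr in examples.items():
    print(code, arr.dtype, arr.dtype.kind)

# Caveat: as a dtype argument, "b" means a signed 8-bit integer, not boolean.
byte_dtype = np.dtype("b")
print(byte_dtype, byte_dtype.kind)
```

One practical consequence: to request booleans explicitly, passing dtype=bool is safer than passing a one-letter code.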

#### linspace() : Returns evenly spaced numbers over a specified interval.

In [None]:
np.linspace(0,10,3) #Create an array of evenly spaced values (start = 0 included, end = 10 included, 3 = number of samples)

In [None]:
np.linspace(0,10,50) #Create an array of evenly spaced values (50 = number of samples)
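
The main difference from arange() is that linspace() takes a number of samples instead of a step, and includes the end value by default. A minimal sketch, using the endpoint parameter of linspace to switch that off:

```python
import numpy as np

with_end = np.linspace(0, 10, 3)                     # 3 samples, 10 included
without_end = np.linspace(0, 10, 5, endpoint=False)  # 5 samples, 10 excluded
by_step = np.arange(0, 10, 2)                        # fixed step of 2, 10 excluded

print(with_end)
print(without_end)
print(by_step)
```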

#### eye() : Creates an identity matrix

In linear algebra, the identity matrix of size n is the n x n square matrix
with <b>ones on the main diagonal</b> and zeros elsewhere.

In [None]:
np.eye(2) #Create a 2X2 identity matrix

In [None]:
np.eye(4) #Create a 4X4 identity matrix
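
The defining property of the identity matrix is that multiplying by it leaves a matrix unchanged. The sketch below checks this with the @ matrix-multiplication operator, which is not otherwise used in this notebook; the matrix a is an arbitrary example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
identity = np.eye(2)  # note: eye() returns floats by default

# Multiplying by the identity matrix leaves a matrix unchanged.
left = identity @ a
right = a @ identity
print(left)
print(right)
```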

#### Random

NumPy also has lots of ways to create arrays of random numbers.

#### rand

Create an array of the given shape and populate it with random samples from a <b>uniform distribution</b> over the half-open interval ``[0, 1)`` (0 included, 1 excluded).

In [None]:
np.random.rand(2) # 1D array of 2 random numbers.

In [None]:
np.random.rand(5,5) #2D array of shape 5x5 (5 rows of 5 random numbers).

#### randn

Return a sample (or samples) from the <b>standard normal distribution</b> (mean 0, standard deviation 1), unlike rand, which samples from a uniform distribution.

Tip: With a normal distribution, values near the mean are more likely than values far from it.

A uniform distribution assigns the same probability to every value in the interval.

In [None]:
np.random.randn(2) # 1D array of 2 random numbers.

In [None]:
np.random.randn(5,5) #2D array of shape 5x5 (5 rows of 5 random numbers).
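
The difference between the two distributions shows up clearly in summary statistics of a large sample. This is a sketch with an arbitrary sample size and a fixed seed (both chosen here for repeatability, not taken from the notebook).

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable

uniform = np.random.rand(100_000)
normal = np.random.randn(100_000)

print("uniform: min", uniform.min(), "max", uniform.max(), "mean", uniform.mean())
print("normal:  mean", normal.mean(), "std", normal.std())

# Fraction of normal samples within one standard deviation of the mean (roughly 68%).
within_one_std = np.mean(np.abs(normal) < 1)
print("within one std:", within_one_std)
```

The uniform samples stay inside [0, 1) with a mean near 0.5, while the normal samples cluster around 0 and can fall outside any fixed interval.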

#### randint

Return random integers from low (included) to high (excluded).

In [4]:
np.random.randint(1,100)

54

In [5]:
np.random.randint(1,100,10)

array([65, 15, 55,  9, 11, 44, 92, 33, 42, 43])
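
Because high is excluded, simulating a six-sided die needs high=7. A small sketch, with an arbitrary seed and sample size:

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only for repeatability
rolls = np.random.randint(1, 7, 1000)  # integers from 1 to 6, like a die

print(rolls.min(), rolls.max())
```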

### Array Attributes and Methods

Let's discuss some useful attributes and methods of an array.

In [6]:
attr_arr_1 = np.random.randint(0,50,10)
attr_arr_2 = np.random.randint(0,50,10)

#### max,min,argmax,argmin

These are useful methods for finding the max or min values, or their index locations using argmax or argmin.

In [None]:
attr_arr_1.max() # finds max element.

In [None]:
attr_arr_1.argmax() # finds index of max element.

In [None]:
attr_arr_1.min()

In [None]:
attr_arr_1.argmin()
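
Since attr_arr_1 is random, the results above change on every run. The sketch below repeats the same calls on fixed, made-up data, and also shows the axis argument on a 2-D array.

```python
import numpy as np

scores = np.array([12, 47, 3, 47, 30])

# argmax returns the index of the first occurrence of the maximum.
print(scores.max(), scores.argmax())
print(scores.min(), scores.argmin())

grid = np.array([[1, 9, 4], [7, 2, 8]])
col_max = grid.max(axis=0)        # maximum of each column
row_argmax = grid.argmax(axis=1)  # column index of the maximum in each row
print(col_max, row_argmax)
```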

#### Selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [7]:
attr_arr_1

array([35, 39, 30, 33,  9,  1, 19, 36, 49, 39])

In [None]:
attr_arr_1 == attr_arr_1 #Elementwise comparison

In [8]:
attr_arr_1 > 8

array([ True,  True,  True,  True,  True, False,  True,  True,  True,
        True])

In [None]:
attr_arr_1[attr_arr_1 > 10] # Keep elements bigger than 10.

In [None]:
attr_arr_1
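
Selection with a comparison works in two steps: the comparison builds a boolean array, and indexing with that array keeps the entries where it is True. A sketch on fixed values, which also combines two conditions with & (each condition needs its own parentheses):

```python
import numpy as np

values = np.array([35, 39, 30, 33, 9, 1, 19, 36, 49, 39])

mask = values > 10       # boolean array, one entry per element
selected = values[mask]  # keeps only entries where mask is True

# Combine conditions with & and wrap each one in parentheses.
between = values[(values > 10) & (values < 36)]

print(mask)
print(selected)
print(between)
```

The original array is not modified; selection returns a new array.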

#### Arithmetic

You can easily perform array with array arithmetic, or scalar with array arithmetic.

In [None]:
attr_arr_1 = np.random.randint(0,50,10)
attr_arr_2 = np.random.randint(0,50,10)

In [None]:
attr_arr_1

In [None]:
attr_arr_2

In [None]:
attr_arr_1 + attr_arr_2 # Element-wise sum.

In [None]:
attr_arr_1 * attr_arr_2 # Element-wise multiplication.

In [None]:
attr_arr_1**3 # Raise every element to the power of 3.
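
With small fixed arrays the element-wise behaviour is easy to verify by hand. The names below are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = a + b      # element-wise sum
product = a * b    # element-wise product, not a matrix product
shifted = a + 100  # the scalar is applied to every element
cubed = a ** 3

print(total, product, shifted, cubed)
```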

NumPy comes with many universal array functions (ufuncs), which are mathematical operations applied element-wise across the whole array.

In [9]:
#Taking Square Roots
np.sqrt(attr_arr_1)

array([5.91607978, 6.244998  , 5.47722558, 5.74456265, 3.        ,
       1.        , 4.35889894, 6.        , 7.        , 6.244998  ])

In [None]:
#Calculating the exponential (e^x) of each element
np.exp(attr_arr_1)

In [None]:
#Calculating the sine of each element
np.sin(attr_arr_1) 

In [None]:
#Elementwise cosine
np.cos(attr_arr_1) 

In [None]:
#Elementwise natural logarithm (a 0 in the array gives -inf and a RuntimeWarning)
np.log(attr_arr_1)

In [None]:
#Dot product 
attr_arr_1.dot(attr_arr_1)
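
Two points are worth checking on fixed data: the dot product of a 1-D array with itself equals the sum of its element-wise squares, and np.log of 0 yields -inf rather than raising an error. The sketch uses np.errstate to silence the warning for that one call; the values are made up.

```python
import numpy as np

v = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(v)          # applied to each element in turn
dot_method = v.dot(v)       # sum of element-wise products
dot_manual = np.sum(v * v)  # the same value, computed by hand
print(roots, dot_method, dot_manual)

# log(0) is -inf; errstate suppresses the divide-by-zero warning here.
with np.errstate(divide="ignore"):
    log_zero = np.log(np.array([0.0, 1.0]))
print(log_zero)
```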

### Joining NumPy Arrays

Joining means putting the contents of two or more arrays into a single array.

In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.

We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. 

If axis is not explicitly passed, it is taken as 0.

In [None]:
attr_arr_1 = np.random.randint(0,50,10)
attr_arr_2 = np.random.randint(0,50,10)

In [None]:
attr_arr_1

In [None]:
attr_arr_2

In [None]:
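np.concatenate((attr_arr_1, attr_arr_2)) # 1-D arrays only have axis 0: the result has 20 elements

In [None]:
# Define two 2-D arrays so that we can join along either axis.
attr_arr_1 = np.array([[1, 2], [3, 4]])
attr_arr_2 = np.array([[5, 6], [7, 8]])

In [None]:
# Join two 2-D arrays along axis 0 (rows stacked on top of each other): shape (4, 2)
np.concatenate((attr_arr_1, attr_arr_2), axis=0)

In [None]:
# Join two 2-D arrays along axis 1 (side by side): shape (2, 4)
np.concatenate((attr_arr_1, attr_arr_2), axis=1)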
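
To see how the axis argument behaves, here is a self-contained sketch with illustrative names. It joins 1-D arrays, joins 2-D arrays along both axes, and shows that asking for axis=1 on 1-D arrays fails, because a 1-D array has no second axis.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
joined_1d = np.concatenate((x, y))  # 1-D arrays only have axis 0

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
stacked_rows = np.concatenate((m1, m2), axis=0)  # shape (4, 2)
side_by_side = np.concatenate((m1, m2), axis=1)  # shape (2, 4)

try:
    np.concatenate((x, y), axis=1)  # a 1-D array has no axis 1
    raised = False
except ValueError as err:           # NumPy's AxisError is a ValueError subclass
    raised = True
    print("concatenate failed:", err)
```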

## Well done!

Now it's time to test your skills.

#### Import NumPy as np

In [None]:
import numpy as np

#### Create an array of 10 zeros 

In [None]:
np.zeros(10)

#### Create an array of 10 ones

In [None]:
np.ones(10)

#### Create an array of 10 fives

In [None]:
np.ones(10) * 5

#### Create an array of the integers from 10 to 50

In [None]:
np.arange(10,51)

#### Create an array of all the even integers from 10 to 50

In [None]:
np.arange(10,51,2)

#### Create a 3x3 matrix with values ranging from 0 to 8

In [None]:
np.arange(9).reshape(3,3)

#### Create a 3x3 identity matrix

In [None]:
np.eye(3)

#### Use NumPy to generate a random number between 0 and 1

In [None]:
np.random.rand(1)

#### Use NumPy to generate an array of 10 random numbers sampled from a standard normal distribution

In [None]:
np.random.randn(10)

#### Create a 10X10 matrix with values ranging from 0.01 to 1.

In [None]:
np.arange(1,101).reshape(10,10) / 100

#### Create an array of 10 linearly spaced points between 0 and 1

In [None]:
np.linspace(0,1,10)

## Numpy Indexing and Selection

Given a matrix, write the code that reproduces each of the outputs below.

In [5]:
test_mat = np.arange(1,26).reshape(5,5)
test_mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [2]:
# WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
test_mat[2:,1:]

In [None]:
test_mat[3,4]

In [None]:
test_mat[:3,1:2]

In [None]:
test_mat[4,:]

In [None]:
test_mat[3:5,:]

In [None]:
test_mat[:, ::-1]

#### Get the sum of all the values in test_mat

In [None]:
test_mat.sum()

#### Get the standard deviation of the values in test_mat

In [None]:
test_mat.std()

#### Get the sum of each column in test_mat

In [None]:
test_mat.sum(axis=0)
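
The axis argument names the dimension that gets collapsed: axis=0 sums down the rows and leaves one value per column, while axis=1 leaves one value per row. A sketch that rebuilds the same 5x5 matrix so it can run on its own:

```python
import numpy as np

test_mat = np.arange(1, 26).reshape(5, 5)

total = test_mat.sum()           # all 25 values
col_sums = test_mat.sum(axis=0)  # collapse the rows: one sum per column
row_sums = test_mat.sum(axis=1)  # collapse the columns: one sum per row

print(total)
print(col_sums)
print(row_sums)
```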

## That's It!