<h1 style="text-align: center;"> An Introduction to Numpy </h1>

## Why do we need Numpy?

<br />

* __Vectors, Matrices and linear algebra operations__ are fundamental to Machine Learning and Data Science

<br />

* Vectors can be represented as lists `[2, 3, 2]` and matrices can be represented as a __list of lists__ `[[1, 1, 1], [2, 2, 2]]`

<br />

* Python lists seem like a reasonable data structure for this use case, so let's see how they perform

<br />

* Let's perform the basic operation of a vector addition and see the __performance difference between lists and numpy arrays__

<br />

* Before we proceed, let's import the numpy library; it is an __accepted convention to use the `np` alias for `numpy`__

In [3]:
import numpy as np

* Let us create __two lists and two arrays__ for our experiment

In [9]:
# The * operator on a list repeats its items n times: some_list * n

example_list_one = [1, 2, 3, 4] * 1000

example_list_two = [4, 3, 2, 1] * 1000

# We'll look at the code given here later, but for now understand that numpy arrays can be created from python lists

example_array_one = np.array(example_list_one)

example_array_two = np.array(example_list_two)

In [15]:
np.array([1, 1, np.nan, 0], dtype=np.float64)

array([ 1.,  1., nan,  0.])

* Below is a function that performs vector addition on two lists of equal length

In [10]:
def vector_addition(vector_one, vector_two):
    
    new_vec = []
    
    for i in range(len(vector_one)):
        
        new_vec.append(vector_one[i] + vector_two[i])
        
    return new_vec

* The `%%timeit` IPython magic command gives an approximate runtime of the code in a cell

In [11]:
%%timeit

vector_addition(example_list_one, example_list_two)


489 µs ± 6.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
%%timeit

example_array_one + example_array_two


2.2 µs ± 29.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


* From the above experiment we can see that __using numpy arrays__ for even a simple vector addition is __more than 200 times faster than using a list__ (about 489 µs versus 2.2 µs in this run)

<br />

* We need Numpy arrays for:

<br />

    * Representing vectors and matrices as data structures ready for manipulation in python
    
<br />
    
    * Access to fast vectorised operations (will be elaborated later)
    
<br />
    
    * Access to a wide range of linear algebra operations
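The timing experiment above relies on the `%%timeit` magic, which only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run from a plain Python script with the standard library `timeit` module. The variable names and the `number=1000` repetition count are illustrative choices, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Same data as in the experiment above
list_one = [1, 2, 3, 4] * 1000
list_two = [4, 3, 2, 1] * 1000
array_one = np.array(list_one)
array_two = np.array(list_two)


def vector_addition(vector_one, vector_two):
    # Elementwise addition of two equal-length lists
    return [a + b for a, b in zip(vector_one, vector_two)]


# Total time in seconds for 1000 repetitions of each approach
list_time = timeit.timeit(lambda: vector_addition(list_one, list_two), number=1000)
array_time = timeit.timeit(lambda: array_one + array_two, number=1000)

print(f"lists : {list_time:.4f} s for 1000 runs")
print(f"arrays: {array_time:.4f} s for 1000 runs")
```

Both approaches produce the same values; only the time taken differs.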

## So, what is a numpy array anyway?

<br />

* A numpy array is a highly efficient implementation of the n-dimensional array and, unlike a python list, contains only elements of a single data type

<br />

* All the elements in a numpy array are of the __same data type__, which allows the values to be stored efficiently as a __contiguous block in memory__ for fast access. A Python list instead stores pointers to where each element lives, which increases the time needed to access an element

<br />

* Numpy also relies on low-level routines written in C/C++/Fortran, which makes linear algebra computations on numpy arrays fast

<img src='img/numpy_arrays.png' width='600px'/>
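The storage layout described above can be inspected directly on an array. The snippet below is a small sketch that uses an explicitly chosen `int64` dtype (an assumption made here so that the element size is predictable) and prints how many bytes each element and the whole array occupy.

```python
import numpy as np

# 1000 integers stored with an explicit 64-bit integer dtype
arr = np.array(range(1000), dtype=np.int64)

print(arr.dtype)                   # the single dtype shared by all elements
print(arr.itemsize)                # bytes used by one element
print(arr.nbytes)                  # itemsize * number of elements
print(arr.flags['C_CONTIGUOUS'])   # whether the data is one contiguous block
```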

## Data types supported by Numpy

* Numpy supports data types such as int, float, string, etc.

![](img/numpy_dtypes.jpg)



In [11]:
np.int32

numpy.int32

In [12]:
np.float64

numpy.float64
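Because an array holds a single data type, numpy picks one dtype that can represent every value when the input is mixed. The sketch below shows this with three small hypothetical arrays. The exact integer and string dtypes depend on the platform, so only the kind of dtype is checked.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # int and float -> everything becomes float
strings = np.array([1, 'a'])      # number and string -> everything becomes a string

print(ints.dtype.kind)     # 'i' for signed integer
print(mixed.dtype)         # float64
print(strings.dtype.kind)  # 'U' for unicode string
```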

### Missing and Special Data Types

* Numpy also provides special values such as `np.nan` (not a number), `np.inf` (infinity), etc.

In [6]:
np.isnan(np.nan)

True

* `np.nan` is used to encode missing values in both numpy as well as pandas

<br />

* It is important to remember that `np.nan` can only exist in floating point arrays and cannot exist in integer or boolean arrays, because the implementation type of `np.nan` is `float`; `pandas 2.0` will alleviate this issue

In [7]:
type(np.nan)

float
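The following sketch illustrates the consequence of `np.nan` being a float: an array containing it becomes a float array, and forcing an integer dtype fails. The array name `with_nan` and the `raised` flag are made up for this example.

```python
import numpy as np

# A list with a missing value is stored as a float array
with_nan = np.array([1, 2, np.nan])
print(with_nan.dtype)

# Asking for an integer dtype fails because NaN has no integer representation
raised = False
try:
    np.array([1, 2, np.nan], dtype=np.int64)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# NaN never compares equal to anything, so use np.isnan to detect it
print(np.nan == np.nan)
print(np.isnan(with_nan))

# NaN-aware reductions skip the missing values
print(np.nansum(with_nan))
```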

## Creating Numpy Arrays

<img src='img/numpy_array_creation.jpg' width='650px'/>

* `np.array()`

* `np.zeros()`

* `np.arange()`

* Using the `np.array()` function let us create a numpy array from a python list

In [14]:
firstArray = np.array([12, 200, 82])

firstArray

array([ 12, 200,  82])

* Using the `np.arange()` function let us create a numpy array containing elements from 0 to 15

In [16]:
np.arange(16)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

* Most functions that prepopulate values in a numpy array also let us define the __shape__ of the array
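As a brief sketch of prepopulating functions that accept a shape, the example below uses `np.zeros`, `np.ones` and `np.full`, and also shows the `start, stop, step` form of `np.arange`. The particular shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))                 # 2 rows, 3 columns of 0.0
ones = np.ones((3, 2), dtype=np.int32)   # 3 rows, 2 columns of integer 1
sevens = np.full((2, 2), 7)              # 2 x 2 array filled with 7
evens = np.arange(0, 10, 2)              # start at 0, stop before 10, step 2

print(zeros.shape, ones.shape, sevens.shape)
print(evens)
```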

## Attributes of a Numpy Array

### Data Type of the numpy array

* As we saw above, numpy has special datatypes implemented in its library

<br />

* We can use those data types to create numpy arrays, with the only limitation being that the array has to be homogeneous

<br />

* Let us see a few examples below

In [5]:
# We can convert any python list into a numpy array

first_array = np.array([1, 2, 3], dtype = 'int32')

first_array.dtype

dtype('int32')
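An existing array can also be converted to another data type with the `astype` method, which returns a new array. This is a minimal sketch; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

first_array = np.array([1, 2, 3], dtype='int32')

# Convert integers to floats (returns a new array, the original is unchanged)
as_float = first_array.astype('float64')
print(as_float.dtype)

# Converting floats to integers truncates towards zero
truncated = np.array([1.7, 2.2, -0.5]).astype('int32')
print(truncated)
```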

## The shape of a numpy array

* __This is a very important concept__

* Every numpy array has an attribute called shape

* For a python list, that is converted into a numpy array, the shape is __(number_of_elements,)__

* For the numpy array values stored inside a pandas data frame, the shape is __(nrows, ncols)__

* The shape of a numpy array is stored as a tuple


In [10]:
# Shape of a single element

numpy_element = np.array(1)

print(numpy_element.shape)

numpy_element

()


array(1)

__For the one dimensional array the returned shape is (number_of_elements,)__

In [11]:
# Shape of a one dimensional array

numpy_one_dim = np.array([1, 2, 3, 4, 5, 6])

print(numpy_one_dim.shape)

numpy_one_dim

(6,)


array([1, 2, 3, 4, 5, 6])

__For the two dimensional array (like a matrix) the returned shape is (nrows, ncols)__

In [15]:
# Shape of a two dimensional array

numpy_two_dim = np.array([
                            [1, 2, 3, 4, 5, 6],
                            [11, 22, 33, 44, 55, 66]
                         ])

print(numpy_two_dim.shape)

numpy_two_dim

(2, 6)


array([[ 1,  2,  3,  4,  5,  6],
       [11, 22, 33, 44, 55, 66]])

## Let's go deeper and generalize it

Although it is not a rigorous definition, you can in general think of the shape of an array as

* __(num_arrays_outermost, num_arrays_next, ......, num_elements_in_the_last_array)__

### Let's break this down for a three dimensional array

In [16]:
# Shape of a three dimensional array

numpy_three_dim = np.array([
                            [
                                [1, 2, 3, 4, 5, 6],
                                [11, 22, 33, 44, 55, 66],
                                [111, 222, 333, 444, 555, 666]
                            ],
                            [
                                [21, 22, 23, 24, 25, 26],
                                [221, 222, 233, 244, 255, 266],
                                [2211, 2222, 2333, 2444, 2555, 2666]
                            ]
                         ])

print(numpy_three_dim.shape)

numpy_three_dim

(2, 3, 6)


array([[[   1,    2,    3,    4,    5,    6],
        [  11,   22,   33,   44,   55,   66],
        [ 111,  222,  333,  444,  555,  666]],

       [[  21,   22,   23,   24,   25,   26],
        [ 221,  222,  233,  244,  255,  266],
        [2211, 2222, 2333, 2444, 2555, 2666]]])

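### Why is the shape of the array (2, 3, 6)?

* The outermost array contains __2__ arrays, each of those contains __3__ arrays, and each of the innermost arrays contains __6__ elements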
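One way to check where each number in the shape comes from is to count the items at every nesting level with `len`. The sketch below rebuilds the same three dimensional array so that it can run on its own.

```python
import numpy as np

numpy_three_dim = np.array([
    [
        [1, 2, 3, 4, 5, 6],
        [11, 22, 33, 44, 55, 66],
        [111, 222, 333, 444, 555, 666],
    ],
    [
        [21, 22, 23, 24, 25, 26],
        [221, 222, 233, 244, 255, 266],
        [2211, 2222, 2333, 2444, 2555, 2666],
    ],
])

print(len(numpy_three_dim))        # arrays in the outermost level
print(len(numpy_three_dim[0]))     # arrays inside each of those
print(len(numpy_three_dim[0][0]))  # elements in each innermost array

print(numpy_three_dim.ndim)        # number of dimensions
print(numpy_three_dim.size)        # total number of elements
```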

## Indexing elements from a numpy array

* --> Elements have to be indexed using the position of an element

* --> __Indexing starts from 0__

* --> We navigate through the numpy array using the __shape of the array__

In [25]:
## Indexing a one dimensional array works a lot like python lists

print(
    
    "\n",

    "The array has the following elements",
    
    " ---> ",
    
    numpy_one_dim

)


print(
    
    "\n",
    
    "Indexing Starts from 0",
    
    " --> numpy_one_dim[0] --> ",
    
    numpy_one_dim[0]

)

print(
    
    "\n",
    
    "Access any element using the position",
    
    " --> numpy_one_dim[3] --> ",
    
    numpy_one_dim[3]

)

print(
    
    "\n",
        
    "Access the last element using -1",
    
    " --> numpy_one_dim[-1] --> ",
    
    numpy_one_dim[-1]

)

print(
    
    "\n",
    
    "Slice an array using start_pos:end_pos (end_pos is excluded)",
    
    " --> numpy_one_dim[2:-1] --> ",
    
    numpy_one_dim[2:-1],
    
    "\n"
)


 The array has the following elements  --->  [1 2 3 4 5 6]

 Indexing Starts from 0  --> numpy_one_dim[0] -->  1

 Access any element using the position  --> numpy_one_dim[3] -->  4

 Access the last element using -1  --> numpy_one_dim[-1] -->  6

 Slice an array using start_pos:end_pos (end_pos is excluded)  --> numpy_one_dim[2:-1] -->  [3 4 5] 



In [26]:
# Indexing a 2d array

print(numpy_two_dim)

[[ 1  2  3  4  5  6]
 [11 22 33 44 55 66]]


### Extracting Multiple Elements

In [39]:
## We can extract multiple elements from an n dimensional array by passing in a list

print(numpy_two_dim[ 1, [1, 2, 3]])

# We can extract all elements along a particular dimension of the shape by using :

print(numpy_two_dim[ 1, :])

[22 33 44]
[11 22 33 44 55 66]
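For a two dimensional array, the index takes one entry per dimension of the shape, written as `[row, column]`. The sketch below rebuilds `numpy_two_dim` and shows a few common patterns; the variable names for the results are illustrative.

```python
import numpy as np

numpy_two_dim = np.array([
    [1, 2, 3, 4, 5, 6],
    [11, 22, 33, 44, 55, 66],
])

single = numpy_two_dim[1, 2]           # row 1, column 2
first_column = numpy_two_dim[:, 0]     # every row, column 0
row_slice = numpy_two_dim[0, 1:4]      # row 0, columns 1 to 3
corners = numpy_two_dim[:, [0, -1]]    # every row, first and last column

print(single)
print(first_column)
print(row_slice)
print(corners)
```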


## Reshaping numpy arrays


In [43]:
numpy_three_dim

array([[[   1,    2,    3,    4,    5,    6],
        [  11,   22,   33,   44,   55,   66],
        [ 111,  222,  333,  444,  555,  666]],

       [[  21,   22,   23,   24,   25,   26],
        [ 221,  222,  233,  244,  255,  266],
        [2211, 2222, 2333, 2444, 2555, 2666]]])

In [44]:
## Let's reshape numpy_three_dim to (9, 4); this works because 2 * 3 * 6 == 9 * 4 == 36

numpy_three_dim.shape

(2, 3, 6)

In [42]:
numpy_three_dim.reshape(9, 4)

array([[   1,    2,    3,    4],
       [   5,    6,   11,   22],
       [  33,   44,   55,   66],
       [ 111,  222,  333,  444],
       [ 555,  666,   21,   22],
       [  23,   24,   25,   26],
       [ 221,  222,  233,  244],
       [ 255,  266, 2211, 2222],
       [2333, 2444, 2555, 2666]])

In [48]:
## If you don't know the size of one dimension, you can use -1 to tell numpy to figure it out

numpy_three_dim.reshape(9, -1)

array([[   1,    2,    3,    4],
       [   5,    6,   11,   22],
       [  33,   44,   55,   66],
       [ 111,  222,  333,  444],
       [ 555,  666,   21,   22],
       [  23,   24,   25,   26],
       [ 221,  222,  233,  244],
       [ 255,  266, 2211, 2222],
       [2333, 2444, 2555, 2666]])

In [47]:
## The flatten method helps us get to a one dimensional numpy array

numpy_three_dim.flatten()

array([   1,    2,    3,    4,    5,    6,   11,   22,   33,   44,   55,
         66,  111,  222,  333,  444,  555,  666,   21,   22,   23,   24,
         25,   26,  221,  222,  233,  244,  255,  266, 2211, 2222, 2333,
       2444, 2555, 2666])
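Two details about reshaping are worth a quick sketch. First, a reshape only succeeds when the total number of elements stays the same. Second, `flatten` always returns a copy, whereas the related `ravel` method returns a view of the original data when it can. The small `arr` array below is a made-up example.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# flatten returns a copy: changing it leaves arr untouched
flat_copy = arr.flatten()
flat_copy[0] = 99
print(arr[0, 0])

# ravel returns a view here: changing it also changes arr
flat_view = arr.ravel()
flat_view[0] = -1
print(arr[0, 0])

# 12 elements cannot be arranged as 5 x 3, so numpy raises an error
reshape_failed = False
try:
    arr.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```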

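## Row array vs Column array

* __Important reshape note:__ scikit-learn, a library we will use for machine learning in python, generally expects its input features as a two dimensional array, so a single feature often has to be passed as a column vector. Knowing how to reshape is therefore important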

In [49]:
'''

A one dimensional numpy array has only one number 
in its shape, so it is neither a row vector nor a column vector

'''

numpy_one_dim.shape

(6,)

In [50]:
# To explicitly make it a row vector / array

numpy_one_dim.reshape(1, 6)

array([[1, 2, 3, 4, 5, 6]])

In [51]:
# To explicitly make it a column vector / array

numpy_one_dim.reshape(6, 1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [52]:
# We can also use -1 if we do not know the number of elements in the array to create a column vector

numpy_one_dim.reshape(-1, 1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
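As an alternative to `reshape`, a new axis of length one can be added while indexing with `np.newaxis`. The sketch below produces the same row and column shapes as the cells above and then goes back to one dimension with `ravel`.

```python
import numpy as np

numpy_one_dim = np.array([1, 2, 3, 4, 5, 6])

column = numpy_one_dim[:, np.newaxis]   # shape (6, 1), a column vector
row = numpy_one_dim[np.newaxis, :]      # shape (1, 6), a row vector
back_to_1d = column.ravel()             # shape (6,) again

print(column.shape, row.shape, back_to_1d.shape)
```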

### Understanding Vectorized Operations in numpy through the demonstration of Min-Max scaling

In [1]:
# Create a numpy array that starts at 0 and ends at 99

A = np.arange(100)

print(A)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


In [2]:
Amax, Amin = A.max(), A.min()

print(Amax, Amin)

99 0


In [4]:
Ascaled = (A - Amin)/(Amax - Amin)

print(Ascaled)

[0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
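 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]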

__Why does this work?__

<br />

* Numpy broadcasts the scalars `Amin` and `Amax - Amin` across every element of `A` so that it can perform vectorized operations

<br />

* Once broadcast, the subtraction and the division are applied elementwise

<br />

<img src='img/numpy_vectorized_computation.jpg' width='550px'/>
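Broadcasting is not limited to scalars. As a sketch, the same min-max scaling can be applied to each column of a two dimensional array: the per-column minimum and maximum have shape `(2,)` and are broadcast across the rows of a `(3, 2)` array. The matrix `X` and its values are invented for this example.

```python
import numpy as np

# 3 samples (rows) and 2 features (columns) on very different scales
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 40.0],
])

# axis=0 reduces over the rows, giving one value per column
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# The (2,) arrays are broadcast against every row of the (3, 2) array
X_scaled = (X - col_min) / (col_max - col_min)

print(col_min, col_max)
print(X_scaled)
```

After scaling, every column runs from 0 to 1 regardless of its original range.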

## Vectorized Mathematical Operations on Numpy Arrays

<br />

* Numpy also has a lot of convenience functions that help in performing many __mathematical operations on vectors__ such as dot product, exponentiation, etc.

<br />

* Let us look at a brief example below where we implement the sigmoid function using the `np.exp()` function

<br />

$$ h(W) =  \frac{\mathrm{1} }{\mathrm{1} + e^{-W}}  $$ 

In [4]:
def sigmoid(W):
    
    return 1 / (1 + np.exp(-W))

In [10]:
sigmoid(np.array([24, 12, 18, 1, -24]))

array([1.00000000e+00, 9.99993856e-01, 9.99999985e-01, 7.31058579e-01,
       3.77513454e-11])
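The dot product and exponentiation mentioned above can be sketched in a few lines. The vectors `a` and `b` and the matrix `M` are small made-up examples; `@` is the matrix multiplication operator and gives the same result as `np.dot` for these inputs.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot_product = np.dot(a, b)   # 1*4 + 2*5 + 3*6
same_dot = a @ b             # operator form of the same computation
squares = a ** 2             # elementwise exponentiation

# Matrix-vector product: each row of M is dotted with the vector
M = np.array([[1, 2], [3, 4]])
matvec = M @ np.array([1, 1])

print(dot_product, same_dot)
print(squares)
print(matvec)
```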

## Numpy cheat sheet

<img src='img/numpy_cheat_sheet.png' />