# Machine Learning Master Notes 8 - Multiple Linear Regression and Vectorization

### Prepare Environment

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib import cm

## Linear Regression and Gradient Descent Formula Summary

Hypothesis: $$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
Cost Function: $$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2$$ 

Gradient Descent Algorithm: $$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$


Partial Derivatives: $$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \\
\end{align}
$$

Full Implementation of Gradient Descent:
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{1}{m} \sum\limits_{i = 0}^{m-1} ((wx^{(i)} + b) - y^{(i)})x^{(i)}  \; \newline 
 b &= b -  \alpha \frac{1}{m} \sum\limits_{i = 0}^{m-1} ((wx^{(i)} + b) - y^{(i)})  \newline \rbrace
\end{align*}$$
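
The update rule above can be sketched in plain Python as follows. This is a minimal illustration rather than a definitive implementation: the function names `compute_gradient` and `gradient_descent`, the toy data, the learning rate, and the fixed iteration count (used here in place of a convergence test) are all assumptions made for this example.

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Return dJ/dw and dJ/db for single-feature linear regression."""
    m = x.shape[0]
    dj_dw = 0.0
    dj_db = 0.0
    for i in range(m):
        error = (w * x[i] + b) - y[i]   # f_wb(x_i) - y_i
        dj_dw += error * x[i]
        dj_db += error
    return dj_dw / m, dj_db / m

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Run a fixed number of gradient descent steps (assumed stopping rule)."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # Update w and b simultaneously from the same gradients
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Hypothetical toy data that lies exactly on the line y = 2x + 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

w_fit, b_fit = gradient_descent(x_train, y_train, w=0.0, b=0.0,
                                alpha=0.05, num_iters=5000)
print(w_fit, b_fit)
```

With this toy data, the fitted values should approach $w = 2$ and $b = 1$.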

## Multiple Linear Regression (Multiple Features) Formulation

**In the previous notes, we used housing size as the main factor in estimating housing price using linear regression. However, in practice, housing size is not the only factor affecting housing prices. The number of bedrooms, number of bathrooms, number of floors and the age of the house are also considered. In the following notes, we will apply multiple features to our linear regression.**

In summary, for **single feature** linear regression, we have:

Hypothesis: $$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$

For a dataset with 3 training examples, we have:

$$f_{w,b}(x^{(1)})=b + wx^{(1)}$$
$$f_{w,b}(x^{(2)})=b + wx^{(2)}$$
$$f_{w,b}(x^{(3)})=b + wx^{(3)}$$


For **multiple features** linear regression, assuming we have **4 features**, then the formula for computing **one training example** will be:

Linear Function: $$f_{w,b}(x)=b + w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + w_{4}x_{4}$$

- In this case there is more than one `x` and more than one `w`.
- We use a subscript to identify each feature and its weight.


For a dataset with 3 training examples, we have:

$$f_{w,b}(x^{(1)})=b + w_{1}x_{1}^{(1)} + w_{2}x_{2}^{(1)} + w_{3}x_{3}^{(1)} + w_{4}x_{4}^{(1)}$$
$$f_{w,b}(x^{(2)})=b + w_{1}x_{1}^{(2)} + w_{2}x_{2}^{(2)} + w_{3}x_{3}^{(2)} + w_{4}x_{4}^{(2)}$$
$$f_{w,b}(x^{(3)})=b + w_{1}x_{1}^{(3)} + w_{2}x_{2}^{(3)} + w_{3}x_{3}^{(3)} + w_{4}x_{4}^{(3)}$$

To convert to short form:

- we still use $i$ to index a single training example
- we use $m$ to represent the total number of training examples

$$f_{w,b}(x^{(i)})=\left(b + w_{1}x_{1}^{(i)} + w_{2}x_{2}^{(i)} + w_{3}x_{3}^{(i)} + w_{4}x_{4}^{(i)}\right)$$

Then
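- we use $n$ to represent the total number of features used
- we use $j$ to index each feature; to match the 0-based sums used in these notes, $j$ runs from $0$ to $n-1$, so $w_{1},\dots,w_{4}$ in the expanded form above correspond to $j = 0,\dots,3$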

In short: $$f_{w,b}(x^{(i)})=b + \sum\limits_{j=0}^{n-1} w_{j}x_{j}^{(i)}$$

- This means that for every training example, we multiply each feature by its weight, sum the products, and add $b$ before we proceed to the next training example.

We denote a vector (a list of numbers) as $\vec{x}$; please note that some formulas conveniently omit the arrow. In addition, $w$ will no longer be a single number; instead, $w$ will be a vector denoted as $\vec w$.

In short: $$f_{\vec{w},b}(\vec{x}^{(i)})=b + \sum\limits_{j=0}^{n-1} \vec{w}_{j}\vec{x}_{j}^{(i)}$$
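
The summation above can be sketched for one training example as a loop over the features. This is only an illustration: the function name `predict_single_loop` and the values of `w_vec` and `b` are assumptions chosen for the example, and `np.dot` is used purely as a cross-check of the loop.

```python
import numpy as np

def predict_single_loop(x, w, b):
    """Compute b + sum_j w[j] * x[j] for one training example."""
    n = x.shape[0]
    total = 0.0
    for j in range(n):
        total += w[j] * x[j]
    return total + b

# One training example with 4 features (size, bedrooms, floors, age)
x_vec = np.array([2104.0, 5.0, 1.0, 45.0])
# Hypothetical parameters, not fitted values
w_vec = np.array([0.1, 4.0, 10.0, -2.0])
b = 80.0

prediction = predict_single_loop(x_vec, w_vec, b)
prediction_dot = np.dot(w_vec, x_vec) + b   # same sum, computed by NumPy
print(prediction, prediction_dot)
```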

## Multiple Linear Regression (Multiple Features) Cost Function

Hypothesis: $$f_{\vec{w},b}(\vec{x}^{(i)})=b + \sum\limits_{j=0}^{n-1} \vec{w}_{j}\vec{x}_{j}^{(i)}$$
Cost Function:	$$J(\vec w, b) = \frac{1}{2m}   \sum\limits_{i=0}^{m-1} (f_{\vec w,b}(\vec{x}^{(i)})-\vec{y}^{(i)})^{2}$$

$\therefore$ 
$$J(\vec w, b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left(\left(b + \sum\limits_{j=0}^{n-1} \vec w_{j} \vec x_{j}^{(i)} \right)-\vec{y}^{(i)}\right)^{2}$$
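
The double summation in the cost function can be sketched with two nested loops, one over the training examples and one over the features. The function name `compute_cost` and the toy data are assumptions made for this illustration.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Cost J(w, b) for multiple linear regression, written with explicit loops."""
    m, n = X.shape
    total = 0.0
    for i in range(m):
        f_wb_i = b
        for j in range(n):
            f_wb_i += w[j] * X[i, j]      # inner sum over the features
        total += (f_wb_i - y[i]) ** 2     # squared error for example i
    return total / (2 * m)

# Hypothetical toy data: 3 training examples, 2 features
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_toy = np.array([3.0, 7.0, 13.0])
w_toy = np.array([1.0, 1.0])
b_toy = 0.0

# Predictions are 3, 7 and 11, so only the last example has an error (of -2)
cost = compute_cost(X_toy, y_toy, w_toy, b_toy)
print(cost)
```

Here the squared errors sum to 4, so the cost is $4 / (2 \times 3)$.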

## Vector and Matrix

<div class="alert alert-block alert-info">

**Introducing Vector and Matrix**

Now we are going to review the mathematics of vectors and matrices. We will also review how Python supports vector and matrix computation. This subject will help us deal with more complex structures and big data. This is a very brief introduction to the basics of matrix manipulation; a more detailed explanation of this topic is given in separate supplemental notes.

**Why Do We Need Vector and Matrix**

When we are dealing with a large dataset, especially one with millions of rows and thousands of features, matrix computation makes the computation much easier to handle. In addition, computer hardware can carry out this type of computation in parallel, which increases the speed of computation.

**Advantages of Matrix Computation**

- **Less code and no explicit loops are required for matrix computation.**
- **Computer hardware can handle such massive computation in parallel, resulting in faster computing speed.**
</div>
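
As a small illustration of the "less code, no loops" point, the sketch below computes the prediction for every training example twice: once with nested loops and once with a single element wise expression. The parameter values are hypothetical, and no timing is shown, so this only demonstrates that the two forms agree.

```python
import numpy as np

# 3 training examples, 4 features each
X = np.array([[2104.0, 5.0, 1.0, 45.0],
              [1416.0, 3.0, 2.0, 40.0],
              [852.0, 2.0, 1.0, 35.0]])
w = np.array([0.1, 4.0, 10.0, -2.0])   # hypothetical weights
b = 80.0                               # hypothetical bias

# Loop version: one pass per example, one pass per feature
m, n = X.shape
f_loop = np.zeros(m)
for i in range(m):
    for j in range(n):
        f_loop[i] += w[j] * X[i, j]
    f_loop[i] += b

# Vectorized version: multiply every row by w, then sum across each row
f_vec = (X * w).sum(axis=1) + b

print(f_loop)
print(f_vec)
```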

### What is Vector?

A **vector** is a mathematical object that has both magnitude and direction. In machine learning applications, we treat a vector as a list of numbers. Vectors are frequently denoted as $\vec x$.

E.g.
Below is a **column vector**.
$$\vec x = \begin{bmatrix}
           1 \\
           2 \\
           \vdots \\
           n
         \end{bmatrix}$$
         

For **row vector**:

$$\vec y = \begin{bmatrix}
           1 & 2 & \dots & n
         \end{bmatrix}$$


<div class="alert alert-block alert-info">

**Programmers Note 1**

Although a vector looks like a Python list, a list behaves differently from a vector data structure. In Python, vectors are represented using the NumPy library: we use NumPy to convert a list into a NumPy array.

NumPy arrays allow us to perform computation more efficiently.

In mathematics, we refer to the first item as item 1. In computing, we usually refer to the first item as item 0.

$$\vec x = \begin{bmatrix}
           1 \\
           2 \\
           3 \\
           4
         \end{bmatrix}$$

Mathematics
$$x_{1} = 1$$
$$x_{3} = 3$$

Computing
$$x_{0} = 1$$
$$x_{3} = 4$$

</div>
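
The short sketch below illustrates both points of the note: 0-based indexing, and the fact that a list and a NumPy array respond differently to the same operator. The variable names are chosen for this example only.

```python
import numpy as np

numbers = [1, 2, 3, 4]
x = np.array(numbers)        # convert the list to a NumPy array

first_item = x[0]            # mathematics calls this x_1
fourth_item = x[3]           # mathematics calls this x_4
last_item = x[-1]            # negative indices count from the end

# The same operator behaves differently on a list and on an array
list_times_two = numbers * 2     # repeats the list
array_times_two = x * 2          # multiplies every element

print(first_item, fourth_item, last_item)
print(list_times_two)
print(array_times_two)
```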

<div class="alert alert-block alert-info">

**Programmers Note 2: 1D Numpy vs 2D Numpy**

NumPy can form 1D or 2D arrays (and arrays with more dimensions). A 1D array is like a list of numbers. The shape of the array is indicated as (n,), where n is the number of elements in that array.

A 2D array consists of rows and columns. In these notes, we represent a vector as a 2D array. A column vector is an n rows by 1 column array, so its shape is (n,1), where n is the number of rows and 1 is the number of columns. Likewise, a row vector has the shape (1,n), where n is the number of columns.

**For beginners in machine learning, it is best to use 2D arrays, especially when we write our own functions in Python. This helps new learners understand matrix computation.**

**However, please be aware that some library functions (from scikit-learn, PyTorch) accept only 1D arrays, whereas some accept only 2D arrays. Some functions accept either.**

</div>



<div class="alert alert-block alert-warning">

**Important:**
- **Once we have created a vector object, we can manipulate it as required. However, assigning the vector to another variable with `=` does not copy it: the new variable is just another name for the same array object.**
- **A view is slightly different: it is a new array object that shares its data with the original array, so a change made through either one is visible in both.**
- **Please also note that some NumPy operations return a view, and some operations return a copy.**
- **Some operations, such as `reshape`, return a view when the memory layout allows it and a copy otherwise.**
- **To create an independent copy, use the method `.copy()`.**
- **For now, let us remember that simple assignment (`=`) never copies the data, and basic slicing (`[:]`) always returns a view.**

**Reference**
- https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
- https://nedbatchelder.com/text/names1.html
- https://chatgpt.com/share/66f13f1c-8260-8000-b731-5b65f664bc69
- https://stackoverflow.com/questions/65572924/when-should-i-use-copy?rq=3

</div>
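
The points in the warning above can be checked with `np.shares_memory` and the `is` operator. The sketch below is illustrative only; the variable names are assumptions made for this example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

alias = a             # another name for the same array object
view = a[1:4]         # new array object that shares a's data
independent = a.copy()

alias_is_same_object = alias is a                    # True
view_is_same_object = view is a                      # False
view_shares_data = np.shares_memory(a, view)         # True
copy_shares_data = np.shares_memory(a, independent)  # False

a[2] = 300            # visible through alias and view, not through the copy

# reshape returns a view when the memory layout allows it, otherwise a copy
grid = np.arange(6).reshape(2, 3)
flat_view = grid.reshape(6)        # contiguous data: a view
flat_copy = grid.T.reshape(6)      # transposed layout: NumPy has to copy

print(alias[2], view[1], independent[2])
print(np.shares_memory(grid, flat_view), np.shares_memory(grid, flat_copy))
```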

#### Vector Creation and Manipulation

In [2]:
v1 = np.array([[1,2,3,4,5]])
v1

array([[1, 2, 3, 4, 5]])

In [3]:
v1.shape

(1, 5)

In [4]:
v2 = np.array([1,2,3,4,5])
v2

array([1, 2, 3, 4, 5])

In [5]:
v2.shape

(5,)

In [6]:
v2 = v2.reshape(5,1)
v2

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [7]:
v2.shape

(5, 1)

In [8]:
v3 = v2.reshape(1,5)
v3.shape

(1, 5)

In [9]:
v3

array([[1, 2, 3, 4, 5]])

In [10]:
v3[0,3]

4

In [11]:
v3[0,3] = 40

In [12]:
v3

array([[ 1,  2,  3, 40,  5]])

In [13]:
v2

array([[ 1],
       [ 2],
       [ 3],
       [40],
       [ 5]])

**The above shows that `reshape` returned a view: `v3` has a different shape from `v2`, but both share the same underlying data, so a change made through `v3` also appears in `v2`.**

In [14]:
v4 = v3.copy()
v4

array([[ 1,  2,  3, 40,  5]])

In [15]:
v4[0,4] = 1000
v4

array([[   1,    2,    3,   40, 1000]])

In [16]:
v3

array([[ 1,  2,  3, 40,  5]])

In [17]:
v2

array([[ 1],
       [ 2],
       [ 3],
       [40],
       [ 5]])

In [18]:
v2 = v3.copy()
v2

array([[ 1,  2,  3, 40,  5]])

In [19]:
v2 = v2.reshape((5,1))
v2

array([[ 1],
       [ 2],
       [ 3],
       [40],
       [ 5]])

In [20]:
v2[3,0] = 4
v2

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [21]:
v3

array([[ 1,  2,  3, 40,  5]])

In [22]:
v4

array([[   1,    2,    3,   40, 1000]])
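
To tie the shapes above back to Programmers Note 2, the sketch below shows one reason the 1D versus 2D distinction matters: combining a 1D array of shape (n,) with a column vector of shape (n,1) silently produces an n by n result. The variable names are chosen for this example only.

```python
import numpy as np

a1d = np.array([1, 2, 3])      # shape (3,)
col = a1d.reshape(-1, 1)       # shape (3, 1), -1 lets NumPy infer the size
row = a1d.reshape(1, -1)       # shape (1, 3)

# Same shapes: plain element wise subtraction
same_shape_diff = col - col    # shape (3, 1)

# Mixed shapes: (3,) with (3, 1) broadcasts to a full (3, 3) grid
mixed_diff = a1d - col

# ravel() flattens a 2D vector back to 1D when a function expects shape (n,)
flat = col.ravel()

print(same_shape_diff.shape, mixed_diff.shape, flat.shape)
```

Checking `.shape` before combining arrays is a simple way to catch this kind of surprise early.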

### What is Matrix?

A matrix consists of rows and columns, as shown in the example below:

$$\begin{bmatrix}
    x_{11}       & x_{12} & x_{13} & \dots & x_{1n} \\
    x_{21}       & x_{22} & x_{23} & \dots & x_{2n} \\
    \dots        & \dots  &\dots   &\dots  &\dots \\
    x_{m1}       & x_{m2} & x_{m3} & \dots & x_{mn}
\end{bmatrix}
$$

We use the format **row by column** to describe a matrix:

The following is a **2 by 2** matrix:


$$\begin{bmatrix}
    x_{11}       & x_{12} \\
    x_{21}       & x_{22}  
\end{bmatrix}
$$

An n by n matrix, such as a 2 by 2 or 3 by 3 matrix, is also known as a **square matrix**.

A matrix can have different numbers of rows and columns. The following is a **3 by 2 matrix**:
    
$$\begin{bmatrix}
    x_{11}       & x_{12} \\
    x_{21}       & x_{22} \\
    x_{31}       & x_{32} 
\end{bmatrix}
$$    


The following is a **4 by 5 matrix**:


$$\begin{bmatrix}
    x_{11}       & x_{12} & x_{13}& x_{14}& x_{15}\\
    x_{21}       & x_{22} & x_{23}& x_{24}& x_{25}\\
    x_{31}       & x_{32} & x_{33}& x_{34}& x_{35}\\
    x_{41}       & x_{42} & x_{43}& x_{44}& x_{45}
\end{bmatrix}
$$

To refer to an item, we use a subscript such as $x_{12}$, which means the item in the first row and second column. Please note that in computing we start counting rows and columns from 0.

Please also note that a 2D vector is also considered a matrix. In such cases, it is an **n by 1 matrix** or a **1 by n matrix**.
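
As a brief illustration of the index shift, the sketch below uses a small matrix whose values spell out their mathematical position (12 sits in row 1, column 2). The matrix and variable names are made up for this example.

```python
import numpy as np

# Each value encodes its mathematical position: 12 means row 1, column 2
m = np.array([[11, 12, 13],
              [21, 22, 23]])

x_12 = m[0, 1]          # mathematical x_12 is index [0, 1] in NumPy
x_23 = m[1, 2]          # mathematical x_23 is index [1, 2] in NumPy

second_row = m[1, :]    # 1D array of shape (3,)
third_col_1d = m[:, 2]  # 1D array of shape (2,)
third_col_2d = m[:, 2:3]  # slicing keeps it 2D: a (2, 1) column vector

print(x_12, x_23)
print(second_row.shape, third_col_1d.shape, third_col_2d.shape)
```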

<div class="alert alert-block alert-warning">

**Important:**
- **Once we have created a matrix object, we can manipulate it as required. However, assigning the matrix to another variable with `=` does not copy it: the new variable is just another name for the same array object.**
- **A view is slightly different: it is a new array object that shares its data with the original array, so a change made through either one is visible in both.**
- **Please also note that some NumPy operations return a view, and some operations return a copy.**
- **Some operations, such as `reshape`, return a view when the memory layout allows it and a copy otherwise.**
- **To create an independent copy, use the method `.copy()`.**
- **In summary, simple assignment (`=`) never copies the data, and basic slicing (`[:]`) always returns a view.**
    
</div>

#### Matrix Creation and Manipulation

In [23]:
m1 = np.array([[1,2,3,4,5],[2,4,6,8,10],[11,13,17,19,23]])
m1

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [11, 13, 17, 19, 23]])

In [24]:
m1.shape

(3, 5)

In [25]:
a1 = np.arange(1,51)
a1.shape

(50,)

In [26]:
m2 = a1.reshape(5,10)
m2

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [27]:
m3 = m2.reshape(10,5)
m3

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45],
       [46, 47, 48, 49, 50]])

In [28]:
m4 = m3[:5,3:]
m4

array([[ 4,  5],
       [ 9, 10],
       [14, 15],
       [19, 20],
       [24, 25]])

In [29]:
m3[2,3]

14

In [30]:
m3[2,3] = 14000
m3

array([[    1,     2,     3,     4,     5],
       [    6,     7,     8,     9,    10],
       [   11,    12,    13, 14000,    15],
       [   16,    17,    18,    19,    20],
       [   21,    22,    23,    24,    25],
       [   26,    27,    28,    29,    30],
       [   31,    32,    33,    34,    35],
       [   36,    37,    38,    39,    40],
       [   41,    42,    43,    44,    45],
       [   46,    47,    48,    49,    50]])

In [31]:
m4

array([[    4,     5],
       [    9,    10],
       [14000,    15],
       [   19,    20],
       [   24,    25]])

In [32]:
m2

array([[    1,     2,     3,     4,     5,     6,     7,     8,     9,
           10],
       [   11,    12,    13, 14000,    15,    16,    17,    18,    19,
           20],
       [   21,    22,    23,    24,    25,    26,    27,    28,    29,
           30],
       [   31,    32,    33,    34,    35,    36,    37,    38,    39,
           40],
       [   41,    42,    43,    44,    45,    46,    47,    48,    49,
           50]])

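This shows that even though `m4` captures only a slice of the main matrix, `m4` still references the data of `m3`, so any change to `m3` is reflected in `m4`. The change also appears in `m2`, because `m3` was created by reshaping `m2` and shares its data. To get a distinct copy, use the `copy()` method.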

In [33]:
m4 = m3[:5,3:].copy()
m4

array([[    4,     5],
       [    9,    10],
       [14000,    15],
       [   19,    20],
       [   24,    25]])

In [34]:
m4[2,0] = 14
m4

array([[ 4,  5],
       [ 9, 10],
       [14, 15],
       [19, 20],
       [24, 25]])

In [35]:
m3

array([[    1,     2,     3,     4,     5],
       [    6,     7,     8,     9,    10],
       [   11,    12,    13, 14000,    15],
       [   16,    17,    18,    19,    20],
       [   21,    22,    23,    24,    25],
       [   26,    27,    28,    29,    30],
       [   31,    32,    33,    34,    35],
       [   36,    37,    38,    39,    40],
       [   41,    42,    43,    44,    45],
       [   46,    47,    48,    49,    50]])
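
The slicing behaviour above applies to basic slices (`start:stop`). As a hedged aside, indexing with a list of integers or with a boolean mask returns a copy instead of a view. The sketch below checks this with `np.shares_memory`; the array and variable names are assumptions for this example.

```python
import numpy as np

m = np.arange(1, 13).reshape(3, 4)

basic = m[:2, 1:]      # basic slicing: a view
fancy = m[[0, 1]]      # integer-list indexing: a copy of rows 0 and 1
masked = m[m > 6]      # boolean mask: a 1D copy of the selected values

basic_shares = np.shares_memory(m, basic)    # True
fancy_shares = np.shares_memory(m, fancy)    # False
masked_shares = np.shares_memory(m, masked)  # False

m[0, 1] = -1           # shows up in the view, not in the copies

print(basic_shares, fancy_shares, masked_shares)
print(basic[0, 0], fancy[0, 1])
```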

In [36]:
m8 = np.zeros((3,4))
m8

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [37]:
m9 = np.ones((3,1))
m9

array([[1.],
       [1.],
       [1.]])

In [38]:
m10 = np.ones((4,6)) * 8
m10

array([[8., 8., 8., 8., 8., 8.],
       [8., 8., 8., 8., 8., 8.],
       [8., 8., 8., 8., 8., 8.],
       [8., 8., 8., 8., 8., 8.]])

In [39]:
m11 = np.random.sample((3,4))
m11

array([[0.82132512, 0.96440342, 0.29626399, 0.31962458],
       [0.61479227, 0.39536543, 0.08708367, 0.58385816],
       [0.28036288, 0.21096375, 0.92259362, 0.17038499]])

## Matrix and Vector Operations

### Addition and Subtraction of Matrices

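Two matrices can be added or subtracted as long as their **dimensions are the same**. These are **element wise operations**. In NumPy, the mechanism that extends element wise operations to arrays of different but compatible shapes (for example, a matrix and a scalar, or a matrix and a vector) is known as broadcasting.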

**Matrix Addition** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} + \begin{bmatrix}e &f \\g &h\end{bmatrix} = \begin{bmatrix}a+e &b+f \\c+g &d+h\end{bmatrix}$$
         
**Matrix Subtraction** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} - \begin{bmatrix}e &f \\g &h\end{bmatrix} = \begin{bmatrix}a-e &b-f \\c-g &d-h\end{bmatrix}$$      

Element wise operations also apply between a matrix and a scalar.

**Matrix Addition with Scalar** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} + \begin{bmatrix}e\end{bmatrix} = \begin{bmatrix}a+e &b+e \\c+e &d+e\end{bmatrix}$$
         
**Matrix Subtraction with Scalar** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} - \begin{bmatrix}f\end{bmatrix} = \begin{bmatrix}a-f &b-f \\c-f &d-f\end{bmatrix}$$ 

**An expression such as `A + B` or `A - B` always returns a new array. However, if we assign the result back into a slice of a matrix, such as a single column, the existing array is modified in place.**



**Element wise operations also apply between a matrix and a vector. In this case, the length of the vector must match either the number of rows of the matrix (for a column vector) or the number of columns (for a row vector).**

**Matrix Addition with Column Vector** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} + \begin{bmatrix}e \\ f\end{bmatrix} = \begin{bmatrix}a+e &b+e \\c+f &d+f\end{bmatrix}$$
         
**Matrix Subtraction with Row Vector** 


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} - \begin{bmatrix}e&f\end{bmatrix} = \begin{bmatrix}a-e &b-f \\c-e &d-f\end{bmatrix}$$ 


<div class="alert alert-block alert-warning">

**Important:**
- **For element wise matrix to matrix operations, the dimensions of the matrices must be the same.**    
- **For element wise matrix to vector operations, a column vector must have as many rows as the matrix, and a row vector must have as many columns as the matrix.**
- **An expression such as `A + B` or `A - B` always returns a new array. However, if we assign the result back into a slice of a matrix, such as a single column, the existing array is modified in place.**
    
</div>
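
The column vector and row vector formulas above can be reproduced with a small 2 by 2 example. The numbers are arbitrary and chosen only to make the pattern easy to follow.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
col = np.array([[10],
                [20]])     # shape (2, 1): one value per row of A
row = np.array([[10, 20]]) # shape (1, 2): one value per column of A

# Column vector: the same value is added across each row
A_plus_col = A + col

# Row vector: the same value is subtracted down each column
A_minus_row = A - row

# Scalar: the same value is applied to every element
A_plus_scalar = A + 100

print(A_plus_col)
print(A_minus_row)
print(A_plus_scalar)
```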

In [40]:
ao1 = np.arange(1000,1050)
mo1 = ao1.reshape((10,5))
mo1

array([[1000, 1001, 1002, 1003, 1004],
       [1005, 1006, 1007, 1008, 1009],
       [1010, 1011, 1012, 1013, 1014],
       [1015, 1016, 1017, 1018, 1019],
       [1020, 1021, 1022, 1023, 1024],
       [1025, 1026, 1027, 1028, 1029],
       [1030, 1031, 1032, 1033, 1034],
       [1035, 1036, 1037, 1038, 1039],
       [1040, 1041, 1042, 1043, 1044],
       [1045, 1046, 1047, 1048, 1049]])

In [41]:
mo2 = mo1 - 500
mo2

array([[500, 501, 502, 503, 504],
       [505, 506, 507, 508, 509],
       [510, 511, 512, 513, 514],
       [515, 516, 517, 518, 519],
       [520, 521, 522, 523, 524],
       [525, 526, 527, 528, 529],
       [530, 531, 532, 533, 534],
       [535, 536, 537, 538, 539],
       [540, 541, 542, 543, 544],
       [545, 546, 547, 548, 549]])

In [42]:
mo1

array([[1000, 1001, 1002, 1003, 1004],
       [1005, 1006, 1007, 1008, 1009],
       [1010, 1011, 1012, 1013, 1014],
       [1015, 1016, 1017, 1018, 1019],
       [1020, 1021, 1022, 1023, 1024],
       [1025, 1026, 1027, 1028, 1029],
       [1030, 1031, 1032, 1033, 1034],
       [1035, 1036, 1037, 1038, 1039],
       [1040, 1041, 1042, 1043, 1044],
       [1045, 1046, 1047, 1048, 1049]])

In [43]:
mo2

array([[500, 501, 502, 503, 504],
       [505, 506, 507, 508, 509],
       [510, 511, 512, 513, 514],
       [515, 516, 517, 518, 519],
       [520, 521, 522, 523, 524],
       [525, 526, 527, 528, 529],
       [530, 531, 532, 533, 534],
       [535, 536, 537, 538, 539],
       [540, 541, 542, 543, 544],
       [545, 546, 547, 548, 549]])

In [44]:
mo3 = mo1 + 800
mo3

array([[1800, 1801, 1802, 1803, 1804],
       [1805, 1806, 1807, 1808, 1809],
       [1810, 1811, 1812, 1813, 1814],
       [1815, 1816, 1817, 1818, 1819],
       [1820, 1821, 1822, 1823, 1824],
       [1825, 1826, 1827, 1828, 1829],
       [1830, 1831, 1832, 1833, 1834],
       [1835, 1836, 1837, 1838, 1839],
       [1840, 1841, 1842, 1843, 1844],
       [1845, 1846, 1847, 1848, 1849]])

In [45]:
mo1

array([[1000, 1001, 1002, 1003, 1004],
       [1005, 1006, 1007, 1008, 1009],
       [1010, 1011, 1012, 1013, 1014],
       [1015, 1016, 1017, 1018, 1019],
       [1020, 1021, 1022, 1023, 1024],
       [1025, 1026, 1027, 1028, 1029],
       [1030, 1031, 1032, 1033, 1034],
       [1035, 1036, 1037, 1038, 1039],
       [1040, 1041, 1042, 1043, 1044],
       [1045, 1046, 1047, 1048, 1049]])

In [46]:
mo4 = np.arange(50,100).reshape((10,5))
mo4

array([[50, 51, 52, 53, 54],
       [55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64],
       [65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84],
       [85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])

In [47]:
mo5 = mo1 + mo4
mo5

array([[1050, 1052, 1054, 1056, 1058],
       [1060, 1062, 1064, 1066, 1068],
       [1070, 1072, 1074, 1076, 1078],
       [1080, 1082, 1084, 1086, 1088],
       [1090, 1092, 1094, 1096, 1098],
       [1100, 1102, 1104, 1106, 1108],
       [1110, 1112, 1114, 1116, 1118],
       [1120, 1122, 1124, 1126, 1128],
       [1130, 1132, 1134, 1136, 1138],
       [1140, 1142, 1144, 1146, 1148]])

In [48]:
mo6 = (mo3 - 400) + mo2
mo6

array([[1900, 1902, 1904, 1906, 1908],
       [1910, 1912, 1914, 1916, 1918],
       [1920, 1922, 1924, 1926, 1928],
       [1930, 1932, 1934, 1936, 1938],
       [1940, 1942, 1944, 1946, 1948],
       [1950, 1952, 1954, 1956, 1958],
       [1960, 1962, 1964, 1966, 1968],
       [1970, 1972, 1974, 1976, 1978],
       [1980, 1982, 1984, 1986, 1988],
       [1990, 1992, 1994, 1996, 1998]])

In [49]:
mo7 = np.arange(1,51).reshape(5,10)
mo7

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [50]:
try:
    mo7 - mo6
except Exception as e:
    print(e)

operands could not be broadcast together with shapes (5,10) (10,5) 


In [51]:
mo7.shape

(5, 10)

In [52]:
mo6.shape

(10, 5)

<div class="alert alert-block alert-warning">

**Matrices whose shapes are not compatible, such as (5, 10) and (10, 5), cannot be combined element wise.**

</div>
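
One way to check in advance whether two shapes are compatible is `np.broadcast_shapes`, which raises a `ValueError` when they are not. The sketch below wraps it in a hypothetical helper named `can_broadcast`; it assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if NumPy can combine arrays of these two shapes element wise."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False

same_shape = can_broadcast((5, 10), (5, 10))   # identical shapes
with_column = can_broadcast((5, 10), (5, 1))   # column vector, one value per row
with_row = can_broadcast((5, 10), (1, 10))     # row vector, one value per column
swapped = can_broadcast((5, 10), (10, 5))      # the failing case shown above

# If the second matrix is simply stored the other way round,
# transposing it makes the shapes match
a = np.ones((5, 10))
b = np.ones((10, 5))
difference = a - b.T

print(same_shape, with_column, with_row, swapped)
print(difference.shape)
```

Transposing is only appropriate when the rows and columns of the second matrix really correspond to the columns and rows of the first.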

In [53]:
mo7

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [54]:
mo7[:,1] = mo7[:,1] + 20
mo7

array([[ 1, 22,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 32, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 42, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 52, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 62, 43, 44, 45, 46, 47, 48, 49, 50]])

In [55]:
mo8 = mo7
mo8[:,5] = mo8[:,5] - 100
mo8

array([[  1,  22,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  32,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  42,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  52,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  62,  43,  44,  45, -54,  47,  48,  49,  50]])

`mo8 = mo7` does not copy the array, so the change made to `mo7` earlier (column 1) is also visible in `mo8`.

In [56]:
mo7

array([[  1,  22,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  32,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  42,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  52,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  62,  43,  44,  45, -54,  47,  48,  49,  50]])

Likewise, the change made through `mo8` (column 5) is reflected in `mo7`.
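
The difference between creating a new array and modifying an existing one in place can be sketched as follows. The variable names are assumptions for this example; the point is that `a = a + 1` rebinds the name to a new array, while `b += 1` and slice assignment write into the existing array.

```python
import numpy as np

# Case 1: a = a + 1 builds a new array and rebinds the name a
a = np.arange(6).reshape(2, 3)
alias_a = a
a = a + 1                 # alias_a still points at the old, unchanged array

# Case 2: b += 1 and slice assignment modify the existing array in place
b = np.arange(6).reshape(2, 3)
alias_b = b
b += 1                    # in place: alias_b sees the change
b[:, 0] = b[:, 0] * 10    # in place: only column 0 is overwritten

print(alias_a)
print(alias_b)
```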

In [57]:
mo9 = np.arange(1,51).reshape(5,10)
mo9

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [58]:
mo9[:,5] = mo9[:,5] - 100
mo9

array([[  1,   2,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45, -54,  47,  48,  49,  50]])

In [59]:
mo9.shape

(5, 10)

In [60]:
mo9

array([[  1,   2,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45, -54,  47,  48,  49,  50]])

In [61]:
mo10 = np.array([1,2,3,4,5]).reshape((5,1))
mo10

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [62]:
mo10b = mo9 - mo10
mo10b

array([[  0,   1,   2,   3,   4, -95,   6,   7,   8,   9],
       [  9,  10,  11,  12,  13, -86,  15,  16,  17,  18],
       [ 18,  19,  20,  21,  22, -77,  24,  25,  26,  27],
       [ 27,  28,  29,  30,  31, -68,  33,  34,  35,  36],
       [ 36,  37,  38,  39,  40, -59,  42,  43,  44,  45]])

In [63]:
mo9

array([[  1,   2,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45, -54,  47,  48,  49,  50]])

**`mo9` retains its original values, because `mo9 - mo10` returned a new array.**

In [64]:
mo11 = np.arange(10).reshape((1,10))
mo11

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [65]:
mo9

array([[  1,   2,   3,   4,   5, -94,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15, -84,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25, -74,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35, -64,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45, -54,  47,  48,  49,  50]])

In [66]:
mo11b = mo9 + mo11
mo11b

array([[  1,   3,   5,   7,   9, -89,  13,  15,  17,  19],
       [ 11,  13,  15,  17,  19, -79,  23,  25,  27,  29],
       [ 21,  23,  25,  27,  29, -69,  33,  35,  37,  39],
       [ 31,  33,  35,  37,  39, -59,  43,  45,  47,  49],
       [ 41,  43,  45,  47,  49, -49,  53,  55,  57,  59]])

### Scalar Multiplication

To multiply a matrix by a scalar, we multiply each element of the matrix by the scalar:


$$\begin{bmatrix}a &b \\c &d\end{bmatrix} * x = \begin{bmatrix}a*x &b*x \\c*x &d*x\end{bmatrix}$$  

Scalar multiplication returns a new array when the result is assigned to a new variable. However, assigning the result back into a slice, such as a single column, modifies the existing array in place.

In [67]:
mo6

array([[1900, 1902, 1904, 1906, 1908],
       [1910, 1912, 1914, 1916, 1918],
       [1920, 1922, 1924, 1926, 1928],
       [1930, 1932, 1934, 1936, 1938],
       [1940, 1942, 1944, 1946, 1948],
       [1950, 1952, 1954, 1956, 1958],
       [1960, 1962, 1964, 1966, 1968],
       [1970, 1972, 1974, 1976, 1978],
       [1980, 1982, 1984, 1986, 1988],
       [1990, 1992, 1994, 1996, 1998]])

In [68]:
mo6.shape

(10, 5)

In [69]:
mo12 = mo6 / 100
mo12

array([[19.  , 19.02, 19.04, 19.06, 19.08],
       [19.1 , 19.12, 19.14, 19.16, 19.18],
       [19.2 , 19.22, 19.24, 19.26, 19.28],
       [19.3 , 19.32, 19.34, 19.36, 19.38],
       [19.4 , 19.42, 19.44, 19.46, 19.48],
       [19.5 , 19.52, 19.54, 19.56, 19.58],
       [19.6 , 19.62, 19.64, 19.66, 19.68],
       [19.7 , 19.72, 19.74, 19.76, 19.78],
       [19.8 , 19.82, 19.84, 19.86, 19.88],
       [19.9 , 19.92, 19.94, 19.96, 19.98]])

In [70]:
mo6

array([[1900, 1902, 1904, 1906, 1908],
       [1910, 1912, 1914, 1916, 1918],
       [1920, 1922, 1924, 1926, 1928],
       [1930, 1932, 1934, 1936, 1938],
       [1940, 1942, 1944, 1946, 1948],
       [1950, 1952, 1954, 1956, 1958],
       [1960, 1962, 1964, 1966, 1968],
       [1970, 1972, 1974, 1976, 1978],
       [1980, 1982, 1984, 1986, 1988],
       [1990, 1992, 1994, 1996, 1998]])

In [71]:
mo6 * 0.01

array([[19.  , 19.02, 19.04, 19.06, 19.08],
       [19.1 , 19.12, 19.14, 19.16, 19.18],
       [19.2 , 19.22, 19.24, 19.26, 19.28],
       [19.3 , 19.32, 19.34, 19.36, 19.38],
       [19.4 , 19.42, 19.44, 19.46, 19.48],
       [19.5 , 19.52, 19.54, 19.56, 19.58],
       [19.6 , 19.62, 19.64, 19.66, 19.68],
       [19.7 , 19.72, 19.74, 19.76, 19.78],
       [19.8 , 19.82, 19.84, 19.86, 19.88],
       [19.9 , 19.92, 19.94, 19.96, 19.98]])

In [72]:
mo13 = mo6.copy()
mo13

array([[1900, 1902, 1904, 1906, 1908],
       [1910, 1912, 1914, 1916, 1918],
       [1920, 1922, 1924, 1926, 1928],
       [1930, 1932, 1934, 1936, 1938],
       [1940, 1942, 1944, 1946, 1948],
       [1950, 1952, 1954, 1956, 1958],
       [1960, 1962, 1964, 1966, 1968],
       [1970, 1972, 1974, 1976, 1978],
       [1980, 1982, 1984, 1986, 1988],
       [1990, 1992, 1994, 1996, 1998]])

In [73]:
mo14 = mo13 / 100
mo14

array([[19.  , 19.02, 19.04, 19.06, 19.08],
       [19.1 , 19.12, 19.14, 19.16, 19.18],
       [19.2 , 19.22, 19.24, 19.26, 19.28],
       [19.3 , 19.32, 19.34, 19.36, 19.38],
       [19.4 , 19.42, 19.44, 19.46, 19.48],
       [19.5 , 19.52, 19.54, 19.56, 19.58],
       [19.6 , 19.62, 19.64, 19.66, 19.68],
       [19.7 , 19.72, 19.74, 19.76, 19.78],
       [19.8 , 19.82, 19.84, 19.86, 19.88],
       [19.9 , 19.92, 19.94, 19.96, 19.98]])

**The following demonstrates how to manipulate one column at a time.**

In [74]:
mo15 = mo6.copy()
mo15

array([[1900, 1902, 1904, 1906, 1908],
       [1910, 1912, 1914, 1916, 1918],
       [1920, 1922, 1924, 1926, 1928],
       [1930, 1932, 1934, 1936, 1938],
       [1940, 1942, 1944, 1946, 1948],
       [1950, 1952, 1954, 1956, 1958],
       [1960, 1962, 1964, 1966, 1968],
       [1970, 1972, 1974, 1976, 1978],
       [1980, 1982, 1984, 1986, 1988],
       [1990, 1992, 1994, 1996, 1998]])

In [75]:
mo15[:,2] = mo15[:,2]/10.0
mo15

array([[1900, 1902,  190, 1906, 1908],
       [1910, 1912,  191, 1916, 1918],
       [1920, 1922,  192, 1926, 1928],
       [1930, 1932,  193, 1936, 1938],
       [1940, 1942,  194, 1946, 1948],
       [1950, 1952,  195, 1956, 1958],
       [1960, 1962,  196, 1966, 1968],
       [1970, 1972,  197, 1976, 1978],
       [1980, 1982,  198, 1986, 1988],
       [1990, 1992,  199, 1996, 1998]])

In [76]:
mo15.dtype

dtype('int64')

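**Note: The division itself produces floats, but assigning them back into a column of an integer (`int64`) array casts them to integers again, so the fractional part is lost (for example, 190.4 becomes 190). To keep the fractional part, convert the array to float first with `astype`.**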
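
The truncation described in the note can be isolated in a few lines. This sketch uses a small made-up array; it shows that the quotient is a float array, and that the loss happens only when the result is written back into the integer array.

```python
import numpy as np

ints = np.array([[1900, 1902, 1904],
                 [1910, 1912, 1914]])

quotient = ints[:, 2] / 10.0     # a new float array: 190.4 and 191.4
ints[:, 2] = quotient            # cast back to integer on assignment

floats = np.array([[1900, 1902, 1904],
                   [1910, 1912, 1914]]).astype(float)
floats[:, 2] = floats[:, 2] / 10.0   # float array keeps the fractional part

print(quotient.dtype, ints.dtype, floats.dtype)
print(ints[:, 2], floats[:, 2])
```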

In [77]:
mo16 = mo6.copy().astype('float')
mo16

array([[1900., 1902., 1904., 1906., 1908.],
       [1910., 1912., 1914., 1916., 1918.],
       [1920., 1922., 1924., 1926., 1928.],
       [1930., 1932., 1934., 1936., 1938.],
       [1940., 1942., 1944., 1946., 1948.],
       [1950., 1952., 1954., 1956., 1958.],
       [1960., 1962., 1964., 1966., 1968.],
       [1970., 1972., 1974., 1976., 1978.],
       [1980., 1982., 1984., 1986., 1988.],
       [1990., 1992., 1994., 1996., 1998.]])

In [78]:
mo16[:,2] = mo16[:,2]/10.0
mo16

array([[1900. , 1902. ,  190.4, 1906. , 1908. ],
       [1910. , 1912. ,  191.4, 1916. , 1918. ],
       [1920. , 1922. ,  192.4, 1926. , 1928. ],
       [1930. , 1932. ,  193.4, 1936. , 1938. ],
       [1940. , 1942. ,  194.4, 1946. , 1948. ],
       [1950. , 1952. ,  195.4, 1956. , 1958. ],
       [1960. , 1962. ,  196.4, 1966. , 1968. ],
       [1970. , 1972. ,  197.4, 1976. , 1978. ],
       [1980. , 1982. ,  198.4, 1986. , 1988. ],
       [1990. , 1992. ,  199.4, 1996. , 1998. ]])

In [79]:
mo17 = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]]).astype('float')
mo17

array([[2.104e+03, 5.000e+00, 1.000e+00, 4.500e+01],
       [1.416e+03, 3.000e+00, 2.000e+00, 4.000e+01],
       [8.520e+02, 2.000e+00, 1.000e+00, 3.500e+01]])

In [80]:
mo17[:,0] = mo17[:,0]/1000
mo17

array([[ 2.104,  5.   ,  1.   , 45.   ],
       [ 1.416,  3.   ,  2.   , 40.   ],
       [ 0.852,  2.   ,  1.   , 35.   ]])

In [81]:
mo17[:,3] = mo17[:,3]/10
mo17

array([[2.104, 5.   , 1.   , 4.5  ],
       [1.416, 3.   , 2.   , 4.   ],
       [0.852, 2.   , 1.   , 3.5  ]])

### Transpose of Matrix

The transpose of a matrix converts its rows into columns and its columns into rows: the first row becomes the first column, and so on:

$$A = \begin{bmatrix}a &b \\c &d \\e &f\end{bmatrix}$$ 

$$A^{T} = \begin{bmatrix}a &c &e\\b &d &f\end{bmatrix}$$ 

Also

$$A_{ij} = A^{T}_{ji}$$

In [82]:
mo18 = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
mo18

array([[2104,    5,    1,   45],
       [1416,    3,    2,   40],
       [ 852,    2,    1,   35]])

In [83]:
mo19 = np.array([[2104,1416,852],[5,3,2],[1,2,1],[45,40,35]])
mo19

array([[2104, 1416,  852],
       [   5,    3,    2],
       [   1,    2,    1],
       [  45,   40,   35]])

In [84]:
mo20 = np.transpose(mo19)
mo20

array([[2104,    5,    1,   45],
       [1416,    3,    2,   40],
       [ 852,    2,    1,   35]])

### Other Matrix Operation

NumPy also provides built-in matrix methods, such as sum and mean, that we can use.

#### Sum of Matrix

In [85]:
mo18

array([[2104,    5,    1,   45],
       [1416,    3,    2,   40],
       [ 852,    2,    1,   35]])

In [86]:
# Sum all numbers in the matrix
mo18.sum()

4506

In [87]:
# Sum all the columns
mo18.sum(axis = 0)

array([4372,   10,    4,  120])

In [88]:
# Sum each row (across its columns)
# Row sums are only meaningful when all the columns measure the same kind of quantity
mo18.sum(axis = 1)

array([2155, 1461,  890])

#### Average of Matrix

In [89]:
mo18

array([[2104,    5,    1,   45],
       [1416,    3,    2,   40],
       [ 852,    2,    1,   35]])

In [90]:
mo18.mean()

375.5

In [91]:
# Average of the columns
mo18.mean(axis = 0)

array([1.45733333e+03, 3.33333333e+00, 1.33333333e+00, 4.00000000e+01])
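The `axis` argument is easy to mix up, so here is a short sketch of how it behaves on the same housing values. The variable names are illustrative; `keepdims=True` is an optional NumPy argument that keeps the result as a 2D array.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [ 852, 2, 1, 35]])

col_sum = X.sum(axis=0)    # collapse the rows: one value per column, shape (4,)
row_sum = X.sum(axis=1)    # collapse the columns: one value per row, shape (3,)
col_mean = X.mean(axis=0)  # average of each column

# The column mean is the column sum divided by the number of rows.
assert np.allclose(col_mean, col_sum / X.shape[0])

# keepdims=True keeps the result 2D (1 x 4), which is convenient for broadcasting.
col_mean_2d = X.mean(axis=0, keepdims=True)
print(col_sum.shape, row_sum.shape, col_mean_2d.shape)
```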

## Matrix and Vector Multiplication (Broadcast)

<div class="alert alert-block alert-info">
    
**There are 2 forms of matrix multiplication: element-wise multiplication (with broadcasting) and dot product multiplication. This section covers only element-wise multiplication with broadcasting.**
</div>

### Vector and Vector Multiplication (Broadcast)

**For element-wise multiplication, the two vectors should have the same dimensions, so that each element is multiplied by the element in the same position.**
$$\begin{bmatrix}a \\b \\c \end{bmatrix} * \begin{bmatrix}x \\y \\z \end{bmatrix} = \begin{bmatrix}(a*x)\\(b*y)\\(c*z)\end{bmatrix}$$

In [92]:
vo20 = np.arange(1,6).reshape((5,1))
vo20

array([[1],
       [2],
       [3],
       [4],
       [5]])

In [93]:
vo21 = np.arange(3221,3226).reshape((5,1))
vo21

array([[3221],
       [3222],
       [3223],
       [3224],
       [3225]])

In [94]:
vo22 = vo21 * vo20
vo22

array([[ 3221],
       [ 6444],
       [ 9669],
       [12896],
       [16125]])

In [95]:
vo23 = np.arange(1,11).reshape((10,1))
vo23

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

In [96]:
try:
    vo22 * vo23
except Exception as e:
    print('Error:', e)

Error: operands could not be broadcast together with shapes (5,1) (10,1) 
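One caveat worth knowing: NumPy does not raise an error for every shape mismatch. When a dimension has size 1 it is stretched, so a column vector times a row vector silently produces a matrix. The sketch below shows this, together with a hypothetical helper (`elementwise_strict` is not a NumPy function) that refuses to broadcast.

```python
import numpy as np

col = np.arange(1, 6).reshape((5, 1))   # shape (5, 1)
row = np.arange(1, 6).reshape((1, 5))   # shape (1, 5)

same_shape = col * col   # (5,1) * (5,1): element-wise, shape stays (5, 1)
stretched = col * row    # (5,1) * (1,5): broadcast to (5, 5), no error raised

print(same_shape.shape)
print(stretched.shape)


def elementwise_strict(a, b):
    """Hypothetical helper: element-wise product that refuses to broadcast."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return a * b


print(elementwise_strict(col, col).ravel())
```

Checking shapes explicitly like this can catch cases where a row vector was passed where a column vector was intended.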


### Matrix and Matrix Multiplication (Broadcast)

**For element-wise matrix multiplication, the two matrices should have the same dimensions.**
$$\begin{bmatrix}a &b \\c &d \\e &f\end{bmatrix} * \begin{bmatrix}u &v \\w &x \\y &z\end{bmatrix} = \begin{bmatrix}a*u &b*v \\c*w &d*x\\e*y &f*z\end{bmatrix}$$ 

**Important:**

**Commutative** $$A * B = B * A$$
**Associative** $$(A*B)*C = A*(B*C)$$

In [97]:
mo20 = np.arange(1,7).reshape((3,2))
mo20

array([[1, 2],
       [3, 4],
       [5, 6]])

In [98]:
mo21 = np.arange(1001,1007).reshape((3,2))
mo21

array([[1001, 1002],
       [1003, 1004],
       [1005, 1006]])

In [99]:
mo22 = mo20 * mo21
mo22

array([[1001, 2004],
       [3009, 4016],
       [5025, 6036]])

In [100]:
mo23 = np.arange(1,51).reshape((10,5))
mo23

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45],
       [46, 47, 48, 49, 50]])

In [101]:
mo24 = np.arange(1001,1051).reshape((10,5))
mo24

array([[1001, 1002, 1003, 1004, 1005],
       [1006, 1007, 1008, 1009, 1010],
       [1011, 1012, 1013, 1014, 1015],
       [1016, 1017, 1018, 1019, 1020],
       [1021, 1022, 1023, 1024, 1025],
       [1026, 1027, 1028, 1029, 1030],
       [1031, 1032, 1033, 1034, 1035],
       [1036, 1037, 1038, 1039, 1040],
       [1041, 1042, 1043, 1044, 1045],
       [1046, 1047, 1048, 1049, 1050]])

In [102]:
mo25 = mo24 / mo23
mo25

array([[1001.        ,  501.        ,  334.33333333,  251.        ,
         201.        ],
       [ 167.66666667,  143.85714286,  126.        ,  112.11111111,
         101.        ],
       [  91.90909091,   84.33333333,   77.92307692,   72.42857143,
          67.66666667],
       [  63.5       ,   59.82352941,   56.55555556,   53.63157895,
          51.        ],
       [  48.61904762,   46.45454545,   44.47826087,   42.66666667,
          41.        ],
       [  39.46153846,   38.03703704,   36.71428571,   35.48275862,
          34.33333333],
       [  33.25806452,   32.25      ,   31.3030303 ,   30.41176471,
          29.57142857],
       [  28.77777778,   28.02702703,   27.31578947,   26.64102564,
          26.        ],
       [  25.3902439 ,   24.80952381,   24.25581395,   23.72727273,
          23.22222222],
       [  22.73913043,   22.27659574,   21.83333333,   21.40816327,
          21.        ]])

In [103]:
mo26 = np.arange(1001,1051).reshape((5,10))
mo26

array([[1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010],
       [1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020],
       [1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030],
       [1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040],
       [1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050]])

In [104]:
mo24

array([[1001, 1002, 1003, 1004, 1005],
       [1006, 1007, 1008, 1009, 1010],
       [1011, 1012, 1013, 1014, 1015],
       [1016, 1017, 1018, 1019, 1020],
       [1021, 1022, 1023, 1024, 1025],
       [1026, 1027, 1028, 1029, 1030],
       [1031, 1032, 1033, 1034, 1035],
       [1036, 1037, 1038, 1039, 1040],
       [1041, 1042, 1043, 1044, 1045],
       [1046, 1047, 1048, 1049, 1050]])

In [105]:
try:
    mo24 * mo26
except Exception as e:
    print('Error:', e)

Error: operands could not be broadcast together with shapes (10,5) (5,10) 


In [106]:
mo27 =  np.arange(11,23).reshape((4,3))
mo27

array([[11, 12, 13],
       [14, 15, 16],
       [17, 18, 19],
       [20, 21, 22]])

In [107]:
mo28 =  np.random.rand(4,3)
mo28

array([[0.74822919, 0.61049448, 0.27109772],
       [0.9557636 , 0.71159844, 0.59763958],
       [0.76812033, 0.97948658, 0.15510894],
       [0.77264892, 0.76706713, 0.94627878]])

In [108]:
mo27 * mo28

array([[ 8.23052105,  7.3259338 ,  3.52427038],
       [13.38069037, 10.67397666,  9.5622333 ],
       [13.05804558, 17.63075845,  2.9470698 ],
       [15.4529783 , 16.10840983, 20.81813314]])

In [109]:
mo28 * mo27

array([[ 8.23052105,  7.3259338 ,  3.52427038],
       [13.38069037, 10.67397666,  9.5622333 ],
       [13.05804558, 17.63075845,  2.9470698 ],
       [15.4529783 , 16.10840983, 20.81813314]])

**The above demonstrates that** $A * B = B * A$

In [110]:
mo29 = np.arange(67, 67+12).reshape((4,3))
mo29

array([[67, 68, 69],
       [70, 71, 72],
       [73, 74, 75],
       [76, 77, 78]])

In [111]:
mo27 * mo28 * mo29

array([[ 551.44491048,  498.16349819,  243.17465593],
       [ 936.64832607,  757.85234321,  688.48079755],
       [ 953.23732724, 1304.6761255 ,  221.03023495],
       [1174.42635084, 1240.34755663, 1623.81438505]])

In [112]:
tmp1 = mo27 * mo28
tmp1 * mo29

array([[ 551.44491048,  498.16349819,  243.17465593],
       [ 936.64832607,  757.85234321,  688.48079755],
       [ 953.23732724, 1304.6761255 ,  221.03023495],
       [1174.42635084, 1240.34755663, 1623.81438505]])

In [113]:
tmp2 = mo28 * mo29
mo27 * tmp2

array([[ 551.44491048,  498.16349819,  243.17465593],
       [ 936.64832607,  757.85234321,  688.48079755],
       [ 953.23732724, 1304.6761255 ,  221.03023495],
       [1174.42635084, 1240.34755663, 1623.81438505]])

**The above demonstrates that** $(A*B)*C = A*(B*C)$
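Comparing printed arrays by eye gets tedious, so the same two checks can be made programmatically. This is a sketch with illustrative data; the fixed seed is only there to make the run repeatable. `np.allclose` is used for the associative check because floating-point rounding can differ in the last digits when the grouping changes.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
A = np.arange(11, 23).reshape((4, 3)).astype(float)
B = rng.random((4, 3))
C = np.arange(67, 79).reshape((4, 3)).astype(float)

# Element-wise multiplication multiplies matching positions,
# so swapping the operands gives exactly the same numbers.
commutative = np.array_equal(A * B, B * A)

# Regrouping may change the last floating-point digit, hence allclose.
associative = np.allclose((A * B) * C, A * (B * C))

print(commutative, associative)
```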

### Matrix and Vector Multiplication (Broadcast)

**Matrix and vector multiplication works only if the dimensions can be broadcast. For a row vector, its number of columns must match the number of columns of the matrix. For a column vector, its number of rows must match the number of rows of the matrix.**

**Matrix with Column Vector**
$$\begin{bmatrix}a &b &c \\d &e &f \\g &h &i\end{bmatrix} * \begin{bmatrix}x \\y \\z\end{bmatrix} = \begin{bmatrix}a*x &b*x & c*x \\d*y &e*y & f*y\\g*z &h*z &i*z\end{bmatrix}$$ 

**Matrix with Row Vector**
$$\begin{bmatrix}a &b &c \\d &e &f \\g &h &i\end{bmatrix} * \begin{bmatrix}x &y &z\end{bmatrix} = \begin{bmatrix}a*x &b*y & c*z \\d*x &e*y & f*z\\g*x &h*y &i*z\end{bmatrix}$$
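The row-vector case is what allows every column of a feature matrix to be rescaled in one step, and the column-vector case does the same per row. Below is a small sketch using the housing values from earlier; the scale factors and row weights are illustrative assumptions.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [ 852, 2, 1, 35]], dtype=float)

# Row vector (1 x 4): one divisor per column, applied to every row.
col_scale = np.array([1000, 1, 1, 10], dtype=float).reshape((1, 4))
X_scaled = X / col_scale

# Column vector (3 x 1): one multiplier per row, applied to every column.
row_weight = np.array([1, 2, 3], dtype=float).reshape((3, 1))
X_weighted = X * row_weight

print(X_scaled)
print(X_weighted)
```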

In [114]:
mv1 = np.arange(1,13).reshape((3,4))
mv1

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [115]:
mv2 = np.array([10,20,30]).reshape((3,1))
mv2

array([[10],
       [20],
       [30]])

In [116]:
mv3 = mv1 * mv2
mv3

array([[ 10,  20,  30,  40],
       [100, 120, 140, 160],
       [270, 300, 330, 360]])

In [117]:
mv4 = np.array([10,20,30,40]).reshape((4,1))
mv4

array([[10],
       [20],
       [30],
       [40]])

In [118]:
try:
    mv1 * mv4
except Exception as e:
    print('Error:', e)

Error: operands could not be broadcast together with shapes (3,4) (4,1) 


In [119]:
mv5 = np.arange(1001,1101).reshape((20,5))
mv5

array([[1001, 1002, 1003, 1004, 1005],
       [1006, 1007, 1008, 1009, 1010],
       [1011, 1012, 1013, 1014, 1015],
       [1016, 1017, 1018, 1019, 1020],
       [1021, 1022, 1023, 1024, 1025],
       [1026, 1027, 1028, 1029, 1030],
       [1031, 1032, 1033, 1034, 1035],
       [1036, 1037, 1038, 1039, 1040],
       [1041, 1042, 1043, 1044, 1045],
       [1046, 1047, 1048, 1049, 1050],
       [1051, 1052, 1053, 1054, 1055],
       [1056, 1057, 1058, 1059, 1060],
       [1061, 1062, 1063, 1064, 1065],
       [1066, 1067, 1068, 1069, 1070],
       [1071, 1072, 1073, 1074, 1075],
       [1076, 1077, 1078, 1079, 1080],
       [1081, 1082, 1083, 1084, 1085],
       [1086, 1087, 1088, 1089, 1090],
       [1091, 1092, 1093, 1094, 1095],
       [1096, 1097, 1098, 1099, 1100]])

In [120]:
mv6 = np.array([10,10,1,100,100]).reshape((1,5))
mv6

array([[ 10,  10,   1, 100, 100]])

In [121]:
mv7 = mv5 / mv6
mv7

array([[ 100.1 ,  100.2 , 1003.  ,   10.04,   10.05],
       [ 100.6 ,  100.7 , 1008.  ,   10.09,   10.1 ],
       [ 101.1 ,  101.2 , 1013.  ,   10.14,   10.15],
       [ 101.6 ,  101.7 , 1018.  ,   10.19,   10.2 ],
       [ 102.1 ,  102.2 , 1023.  ,   10.24,   10.25],
       [ 102.6 ,  102.7 , 1028.  ,   10.29,   10.3 ],
       [ 103.1 ,  103.2 , 1033.  ,   10.34,   10.35],
       [ 103.6 ,  103.7 , 1038.  ,   10.39,   10.4 ],
       [ 104.1 ,  104.2 , 1043.  ,   10.44,   10.45],
       [ 104.6 ,  104.7 , 1048.  ,   10.49,   10.5 ],
       [ 105.1 ,  105.2 , 1053.  ,   10.54,   10.55],
       [ 105.6 ,  105.7 , 1058.  ,   10.59,   10.6 ],
       [ 106.1 ,  106.2 , 1063.  ,   10.64,   10.65],
       [ 106.6 ,  106.7 , 1068.  ,   10.69,   10.7 ],
       [ 107.1 ,  107.2 , 1073.  ,   10.74,   10.75],
       [ 107.6 ,  107.7 , 1078.  ,   10.79,   10.8 ],
       [ 108.1 ,  108.2 , 1083.  ,   10.84,   10.85],
       [ 108.6 ,  108.7 , 1088.  ,   10.89,   10.9 ],
       [ 109.1 ,  109.2 , 10

## Matrix and Vector Multiplication (Dot Product)

<div class="alert alert-block alert-info">
    
**In dot product multiplication, 2 vectors or matrices can be multiplied together under one condition: the number of columns of the multiplier must match the number of rows of the multiplicand. An m by n matrix/vector multiplied with an n by p matrix/vector results in an m by p matrix/vector.**
</div>

### Vector and Vector Multiplication (Dot Product)

$$\begin{bmatrix}a &b &c \end{bmatrix} * \begin{bmatrix}x \\y \\z \end{bmatrix} = \begin{bmatrix}(a*x)+(b*y)+(c*z)\end{bmatrix}$$ 

Two vectors of different shapes can be multiplied together under one condition: the number of columns of the multiplier **must match** the number of rows of the multiplicand. A **1 x n** vector multiplied with an **n x 1** vector results in a **1 x 1** matrix holding a single number.

**Please note that the vector-to-vector dot product only works if one is a row vector and the other is a column vector.**

**However, NumPy also allows a 1D array to be multiplied by a 1D array. This is how NumPy handles 1D arrays. Still, we recommend converting all arrays to row or column vectors.**

<font color='red'>
    
**There are 2 different functions for computing the dot product:**

- We can use **np.dot()**
- or we can use **np.matmul()**
- np.matmul can also be written with the operator **@**

<font color='blue'>

**Notes**

- For 2D arrays there is no difference between np.dot() and np.matmul(). However, **np.matmul is preferred for 2D arrays**.
- np.dot accepts a scalar operand but np.matmul does not. Both accept 1D arrays.
- **To keep things simple, always use 2D arrays for all operations.**
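Following the last note, a 1D array can be turned into an explicit row or column vector with `reshape`. This is a minimal sketch; passing `-1` asks NumPy to infer that dimension from the array's length.

```python
import numpy as np

v = np.array([1, 2, 3])      # 1D array, shape (3,)

col = v.reshape((-1, 1))     # column vector, shape (3, 1)
row = v.reshape((1, -1))     # row vector, shape (1, 3)

print(v.shape, col.shape, row.shape)

inner = row @ col            # (1,3) @ (3,1) gives a (1,1) matrix
print(inner)
print(inner.item())          # pull the single number out as a Python scalar
```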

In [122]:
vd1 = np.arange(1,11).reshape((10,1))
vd1

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

In [123]:
vd2 = np.arange(1001,1011).reshape((10,1))
vd2

array([[1001],
       [1002],
       [1003],
       [1004],
       [1005],
       [1006],
       [1007],
       [1008],
       [1009],
       [1010]])

In [124]:
try:
    vd3 = np.dot(vd1,vd2)
except Exception as e:
    print('Error:', e)

Error: shapes (10,1) and (10,1) not aligned: 1 (dim 1) != 10 (dim 0)


In [125]:
vd2 = np.transpose(vd2)
vd2

array([[1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010]])

In [126]:
vd3 = np.dot(vd1,vd2)
vd3

array([[ 1001,  1002,  1003,  1004,  1005,  1006,  1007,  1008,  1009,
         1010],
       [ 2002,  2004,  2006,  2008,  2010,  2012,  2014,  2016,  2018,
         2020],
       [ 3003,  3006,  3009,  3012,  3015,  3018,  3021,  3024,  3027,
         3030],
       [ 4004,  4008,  4012,  4016,  4020,  4024,  4028,  4032,  4036,
         4040],
       [ 5005,  5010,  5015,  5020,  5025,  5030,  5035,  5040,  5045,
         5050],
       [ 6006,  6012,  6018,  6024,  6030,  6036,  6042,  6048,  6054,
         6060],
       [ 7007,  7014,  7021,  7028,  7035,  7042,  7049,  7056,  7063,
         7070],
       [ 8008,  8016,  8024,  8032,  8040,  8048,  8056,  8064,  8072,
         8080],
       [ 9009,  9018,  9027,  9036,  9045,  9054,  9063,  9072,  9081,
         9090],
       [10010, 10020, 10030, 10040, 10050, 10060, 10070, 10080, 10090,
        10100]])

<div class="alert alert-block alert-warning">
    
**The above result is not what we want. We want a single number. To track down the problem, we need to check the shapes of the vectors.**
</div>

In [127]:
vd2.shape

(1, 10)

In [128]:
vd1.shape

(10, 1)

<div class="alert alert-block alert-warning">
    
**The shapes show that we are computing the dot product of a 10 by 1 vector with a 1 by 10 vector, which results in a 10 by 10 matrix. We should reverse the operation and compute np.dot(vd2,vd1) instead of np.dot(vd1,vd2).**

**The order of the vector or matrix is important in dot product multiplication.**
</div>
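The two orders correspond to what are usually called the inner product (row times column) and the outer product (column times row). A small sketch with illustrative values; `np.outer` is NumPy's built-in outer product and is used here only as a cross-check.

```python
import numpy as np

col = np.arange(1, 4).reshape((3, 1))   # 3 x 1
row = np.array([[10, 20, 30]])          # 1 x 3

inner = row @ col   # (1,3) @ (3,1) gives (1,1): a single number
outer = col @ row   # (3,1) @ (1,3) gives (3,3): a full matrix

print(inner)        # [[140]]
print(outer.shape)  # (3, 3)

# The column-times-row result matches NumPy's outer product.
same_as_np_outer = np.array_equal(outer, np.outer(col, row))
print(same_as_np_outer)
```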

In [129]:
vd3 = np.dot(vd2,vd1)
vd3

array([[55385]])

In [130]:
vd2@vd1

array([[55385]])

In [131]:
np.matmul(vd2,vd1)

array([[55385]])

### Matrix and Matrix Multiplication (Dot Product)

Two matrices of different sizes can be multiplied together under the same condition: the number of columns of the multiplier **must match** the number of rows of the multiplicand. An **m x n** matrix multiplied with an **n x p** matrix results in an **m x p** matrix.

$$\begin{bmatrix}a &b \\c &d \\e &f\end{bmatrix} * \begin{bmatrix}w &x \\y &z \end{bmatrix} = \begin{bmatrix}a*w+b*y &a*x+b*z \\c*w+d*y &c*x+d*z\\e*w+f*y &e*x+f*z\end{bmatrix}$$ 

The example above is a **3 by 2 matrix** multiplied by a **2 by 2 matrix**. The result is a **3 by 2 matrix**.

The number of rows of the multiplier and the number of columns of the multiplicand need not be the same.

$$\begin{bmatrix}
    1    &2   &3\\
    4    &5   &6  
\end{bmatrix}$$

$$\cdot$$

$$\begin{bmatrix}
    11    &4    &7\\
    21    &1    &9\\ 
    31    &7    &4   
\end{bmatrix}$$  

$$=$$

$$\begin{bmatrix}
    146    &27    &37\\
    335    &63    &97    
\end{bmatrix}$$ 


In the example above, a **2 by 3** matrix multiplied by a **3 by 3** matrix gives a **2 by 3** matrix.

Other examples are:

- **3 by 3**  dot  **3 by 3**  => **3 by 3**
- **3 by 4**  dot  **4 by 5**  => **3 by 5**
- **17 by 2** dot  **2 by 20** => **17 by 20**


**The dot product of two matrices with the same dimensions will also fail if the matrices do not satisfy the rule above (for example, 2 by 3 dot 2 by 3).**
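The shape rule can be written as a tiny function and compared against what NumPy actually returns. `matmul_shape` is a hypothetical helper written for this note, not part of NumPy.

```python
import numpy as np


def matmul_shape(a_shape, b_shape):
    """Hypothetical helper: result shape of a 2D dot product, or raise if invalid."""
    m, n = a_shape
    n2, p = b_shape
    if n != n2:
        raise ValueError(f"columns of first ({n}) != rows of second ({n2})")
    return (m, p)


# The three shape pairs listed above.
for a_shape, b_shape in [((3, 3), (3, 3)), ((3, 4), (4, 5)), ((17, 2), (2, 20))]:
    predicted = matmul_shape(a_shape, b_shape)
    actual = (np.ones(a_shape) @ np.ones(b_shape)).shape
    print(a_shape, "dot", b_shape, "->", predicted, predicted == actual)

# Two matrices of the same non-square shape break the rule.
try:
    matmul_shape((2, 3), (2, 3))
except ValueError as e:
    print("Error:", e)
```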

**Important:**

**Not Commutative** $$A * B \neq B * A$$

**Associative** $$(A*B)*C = A*(B*C)$$

In [132]:
md1 = np.random.rand(4,5)
md1

array([[0.55910652, 0.87515499, 0.02808619, 0.74302485, 0.90642227],
       [0.87662575, 0.71982642, 0.11144095, 0.35934357, 0.32764732],
       [0.83377051, 0.90832151, 0.11173654, 0.0075381 , 0.29560746],
       [0.91747879, 0.83277466, 0.04869898, 0.95937644, 0.69969591]])

In [133]:
md2 = np.arange(1,16).reshape(5,3)
md2

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [134]:
md3 = md1@md2
md3

array([[23.47006791, 26.58186274, 29.69365757],
       [12.38886889, 14.7837529 , 17.17863691],
       [ 9.16749028, 11.3244644 , 13.48143852],
       [23.27928153, 26.73730631, 30.19533108]])

In [135]:
md3.shape

(4, 3)

**It would not work if we reverse the order.**

In [136]:
try:
    md2@md1
except Exception as e:
    print('Error:', e)

Error: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 3)


**This shows that the dot product is Not Commutative** $A * B \neq B * A$
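The shape error above is one way the order matters. Even when both orders are valid, as with two square matrices, the results generally differ. A minimal sketch with illustrative values:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B   # multiplying by B on the right swaps the columns of A
BA = B @ A   # multiplying by B on the left swaps the rows of A

print(AB)    # [[2 1]
             #  [4 3]]
print(BA)    # [[3 4]
             #  [1 2]]
print(np.array_equal(AB, BA))   # False
```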


In [137]:
md4 = np.random.rand(3,7)
md4

array([[0.79240487, 0.67756013, 0.85395575, 0.17329821, 0.54443498,
        0.56848111, 0.43425804],
       [0.61959281, 0.02513875, 0.90040911, 0.95336196, 0.80660893,
        0.93368643, 0.35684056],
       [0.87374233, 0.29934569, 0.52781874, 0.09174014, 0.15170882,
        0.2333759 , 0.74577565]])

In [138]:
md5 = md1@md2@md4
md5

array([[61.01233264, 25.4592854 , 59.64981978, 32.13355785, 38.72388341,
        45.09119884, 41.82235928],
       [33.98660926, 13.90819957, 32.95817814, 17.81720702, 21.27579135,
        24.85530726, 23.46681771],
       [26.06022414, 10.53181927, 25.14103786, 13.62181231, 16.17076957,
        18.93128664, 18.07621316],
       [61.39579768, 25.48409762, 59.89165215, 32.2947125 , 38.8215031 ,
        45.24495444, 42.16911322]])

In [139]:
md5.shape

(4, 7)

In [140]:
tmp1 = md1@md2
tmp1@md4

array([[61.01233264, 25.4592854 , 59.64981978, 32.13355785, 38.72388341,
        45.09119884, 41.82235928],
       [33.98660926, 13.90819957, 32.95817814, 17.81720702, 21.27579135,
        24.85530726, 23.46681771],
       [26.06022414, 10.53181927, 25.14103786, 13.62181231, 16.17076957,
        18.93128664, 18.07621316],
       [61.39579768, 25.48409762, 59.89165215, 32.2947125 , 38.8215031 ,
        45.24495444, 42.16911322]])

In [141]:
tmp2 = md2@md4
md1@tmp2

array([[61.01233264, 25.4592854 , 59.64981978, 32.13355785, 38.72388341,
        45.09119884, 41.82235928],
       [33.98660926, 13.90819957, 32.95817814, 17.81720702, 21.27579135,
        24.85530726, 23.46681771],
       [26.06022414, 10.53181927, 25.14103786, 13.62181231, 16.17076957,
        18.93128664, 18.07621316],
       [61.39579768, 25.48409762, 59.89165215, 32.2947125 , 38.8215031 ,
        45.24495444, 42.16911322]])

**This demonstrates that the dot product is Associative** $(A*B)*C = A*(B*C)$

### Matrix and Vector Multiplication (Dot Product)

A matrix can be multiplied by a vector. The matrix is the multiplier and the vector is the multiplicand. Please also note that the number of columns of the matrix **must match** the number of rows of the vector.

$$\begin{bmatrix}a &b \\c &d \\e &f\end{bmatrix} * \begin{bmatrix}x \\y \end{bmatrix} = \begin{bmatrix}a*x+b*y \\c*x+d*y \\e*x+f*y \end{bmatrix}$$ 

The result is a vector.

An **m x n** matrix multiplied by an **n x 1** vector results in an **m x 1** vector. In the example above, a **3 by 2 matrix** multiplied by a **2 by 1 vector** results in a **3 by 1 vector**.

We can also use a vector as the multiplier, as long as the same rule is satisfied.

In [142]:
mvd1 = np.arange(23,23+6).reshape((3,2))
mvd1

array([[23, 24],
       [25, 26],
       [27, 28]])

In [143]:
mvd2 = np.random.rand(2,1)
mvd2

array([[0.45697073],
       [0.98781713]])

In [144]:
mvd3 = mvd1@mvd2
mvd3

array([[34.21793773],
       [37.10751343],
       [39.99708914]])

In [145]:
mvd4 = np.arange(67,87).reshape((5,4))
mvd4

array([[67, 68, 69, 70],
       [71, 72, 73, 74],
       [75, 76, 77, 78],
       [79, 80, 81, 82],
       [83, 84, 85, 86]])

In [146]:
mvd5 = np.random.rand(1,5)
mvd5

array([[0.12229559, 0.8915123 , 0.35945985, 0.42779507, 0.0852575 ]])

In [147]:
mvd6 = np.matmul(mvd5, mvd4)
mvd6

array([[139.32285013, 141.20917045, 143.09549077, 144.98181108]])

### Additional Example Codes

#### Example 1

$$$$
$a = \begin{bmatrix}1 \\2\\3 \end{bmatrix}$ and $b = \begin{bmatrix}1\\2\\3\end{bmatrix}$

In [148]:
ae1 = np.array([1,2,3]).reshape([3,1])
ae1.shape

(3, 1)

In [149]:
ae1

array([[1],
       [2],
       [3]])

In [150]:
ae2 = np.array([1,2,3]).reshape([3,1])
ae2.shape

(3, 1)

In [151]:
ae2

array([[1],
       [2],
       [3]])

In [152]:
try:
    dp1 = np.dot(ae1,ae2)
    dp1
except Exception as e:
    print('Error:', e)

Error: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)


<div class="alert alert-block alert-warning">

**The error above says we cannot do a dot product of a 3 by 1 with a 3 by 1. We can only do 1 by 3 with 3 by 1.**

</div>

$$$$
So we need to transpose b such that:

$a = \begin{bmatrix}1 \\2\\3 \end{bmatrix}$ and $b^{T} = \begin{bmatrix}1 &2 &3\end{bmatrix}$, which gives $b^{T} \cdot a = \begin{bmatrix}1+4+9\end{bmatrix} = \begin{bmatrix}14\end{bmatrix}$

In [153]:
ae1

array([[1],
       [2],
       [3]])

In [154]:
ae2

array([[1],
       [2],
       [3]])

In [155]:
ae1.shape

(3, 1)

In [156]:
ae2 = np.transpose(ae2)
ae2.shape

(1, 3)

In [157]:
dp1 = np.dot(ae1,ae2)
dp1

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

<div class="alert alert-block alert-warning">

**This is not the answer we want. We are doing a 3 by 1 and 1 by 3 matrix multiplication, resulting in a 3 by 3 product. We should do 1 by 3 with 3 by 1.**

</div>

In [158]:
dp1a = np.dot(ae2,ae1)
dp1a 

array([[14]])

In [159]:
ae2@ae1

array([[14]])

<div class="alert alert-block alert-info">

**When doing dot product multiplication, always check the dimensions and the expected dimensions.**

</div>

#### Example 2

$$\begin{bmatrix}
    1    &2   &3\\
    4    &5   &6  
\end{bmatrix}$$

$$\cdot$$

$$\begin{bmatrix}
    11    &4    &7\\
    21    &1    &9\\ 
    31    &7    &4   
\end{bmatrix}$$  

$$=$$

$$\begin{bmatrix}
    146    &27    &37\\
    335    &63    &97    
\end{bmatrix}$$ 

In [160]:
X =np.array([1,2,3,4,5,6]).reshape([2,3])
X

array([[1, 2, 3],
       [4, 5, 6]])

In [161]:
Y =np.array([11,4,7,21,1,9,31,7,4]).reshape([3,3])
Y

array([[11,  4,  7],
       [21,  1,  9],
       [31,  7,  4]])

In [162]:
dp2 = X@Y
dp2

array([[146,  27,  37],
       [335,  63,  97]])

<details>
<summary>
    <font size='3'><b>Expected Answer</b></font>
</summary>
    <p>
    $$\begin{bmatrix}
    146    &27    &37\\
    335    &63    &97    
\end{bmatrix}$$ 
    </p>
</details>

#### Example 3

In [163]:
a3 = np.array([100,200,300]).reshape([1,3])
b3 = np.array([100,200,300]).reshape([3,1])

In [164]:
a3

array([[100, 200, 300]])

In [165]:
b3

array([[100],
       [200],
       [300]])

In [166]:
dp3 = np.dot(a3,b3)
dp3

array([[140000]])

In [167]:
dp3a = a3@b3
dp3a

array([[140000]])

<details>
<summary>
    <font size='3'><b>Expected Answer</b></font>
</summary>
    <p>
    <ul>
        <li>$140,000$ </li>
    </ul>
    </p>
</details>

#### Example 4: Exception from dot product rules

In [168]:
a4 = np.array([1,2,3])
b4 = np.array([1,2,3])

In [169]:
a4.shape

(3,)

In [170]:
b4.shape

(3,)

In [171]:
dp4 = a4@b4
dp4

14

<div class="alert alert-block alert-info">

**Please note that in the example above, we are able to compute the dot product of two 1D arrays. NumPy allows this. For more information, please refer to the links below.**

**For new learners, we recommend using a 3 by 1 matrix/vector instead.**

- https://numpy.org/doc/stable/reference/generated/numpy.dot.html
- https://numpy.org/doc/stable/reference/generated/numpy.matmul.html


</div>

However, if we reshape both arrays into n by 1 vectors, it no longer works, because the dot product rules apply.

In [172]:
a4i = np.array([1,2,3]).reshape((3,1))
b4i = np.array([1,2,3]).reshape((3,1))

In [173]:
try:
    a4i@b4i
except Exception as e:
    print('Error:', e)

Error: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1)


<div class="alert alert-block alert-info">

**Note**

**In our experience, beginners have many rules to remember, so it is better to always reshape a vector to an n by 1 or 1 by n vector. This helps us understand the dot product rules better and avoids confusion with the 1D exception.**

</div>

#### Example 5: Dot Product and Matmul Differences

In [174]:
special_a = np.array([1,2,3]).reshape((3,1))
num1 = 23

np.dot accepts a single number and treats it like a 1 by 1 matrix here, multiplying every element by that number.

In [175]:
np.dot(special_a,num1)

array([[23],
       [46],
       [69]])

However, we cannot do the same with matmul.

In [176]:
try:
    np.matmul(special_a,num1)
except Exception as e:
    print('Error:', e)

Error: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)


In [177]:
try:
    special_a@num1
except Exception as e:
    print('Error:', e)

Error: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)


In [178]:
special_num = np.array([23]).reshape((1,1))
special_num

array([[23]])

In [179]:
np.matmul(special_a,special_num)

array([[23],
       [46],
       [69]])
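For scaling by a single number, the element-wise `*` operator from the broadcast section is a simpler alternative that avoids the question altogether. The sketch below compares the three routes using the same values as above.

```python
import numpy as np

special_a = np.array([1, 2, 3]).reshape((3, 1))
num1 = 23

via_dot = np.dot(special_a, num1)              # np.dot accepts a scalar
via_star = special_a * num1                    # element-wise broadcast with a scalar
via_matmul = special_a @ np.array([[num1]])    # matmul needs an explicit 1 x 1 matrix

print(np.array_equal(via_dot, via_star))
print(np.array_equal(via_dot, via_matmul))
```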

## Application of Matrix Multiplication (Dot Product) in Linear Regression

### Matrix Dot Product (Multiple Features)

In Python, we use the dot product function to perform matrix and vector multiplication. Consider the following example, using the matrix multiplication rules:

A **m x n** matrix multiplied with **n x p** matrix will result in a **m x p** matrix.

$$\begin{bmatrix}a &b &c \\d &e  &f\\g &h & i \end{bmatrix} * \begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}(a*x)+(b*y)+(c*z) \\(d*x)+(e*y)+(f*z) \\(g*x)+(h*y)+(i*z) \end{bmatrix}$$ 

The example above is a **3 by 3 matrix** multiplied by a **3 by 1 matrix**. The result is a **3 by 1 matrix**.


Now let us use a more concrete example:

$$X = \begin{bmatrix}\text{sqft} &\text{room} \\100 &3 \\200 &4\end{bmatrix}$$ 

- where $X$ has 2 features and 2 training examples (the first row only labels the columns).
- let us assume $w_1 = 10, w_2 = 3$, we have

$$w = \begin{bmatrix}w_1 =10\\w_2=3\end{bmatrix}$$ 
$$$$
We should have a **2 by 2 matrix** multiplied by a **2 by 1 vector**, resulting in a **2 by 1 vector**.

$$\begin{bmatrix}100 &3 \\200 &4\end{bmatrix} * \begin{bmatrix}10\\3\end{bmatrix} = \begin{bmatrix}(100*10)+(3*3) \\(200*10)+(4*3) \end{bmatrix}= \begin{bmatrix}1009 \\2012\end{bmatrix}$$ 

$$$$
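The dot product implies the following formula for **each training example**:

$$X^{(i)} \cdot w = (X^{(i)}_1 \cdot w_1) + (X^{(i)}_2 \cdot w_2) =  \sum\limits_{j=0}^{n-1}X^{(i)}_j \cdot w_j$$

- where $n$ is the total number of features
- where $j$ is the feature index for $w_j$ and $x_j$; the sum counts from 0 as in the code, so its terms $j = 0, 1$ are the $w_1$ and $w_2$ terms written out above
- $w$ carries no example index $(i)$ because the same weights are applied to every training example

As we can see, the dot product matches the summation in our linear function for multiple features; only the bias $b$ still has to be added.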

$$f_{\vec{w},b}(\vec{x}^{(i)})=b + \sum\limits_{j=0}^{n-1} \vec{w}_{j}\vec{x}_{j}^{(i)}$$
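To make the correspondence concrete, the sketch below computes the predictions twice: once with explicit loops that follow the summation, and once with a single dot product. The function names are illustrative, and the data are the two-example values used above with $b = 0$.

```python
import numpy as np


def predict_loop(X, w, b):
    """Follow the summation literally: loop over examples i and features j."""
    m, n = X.shape
    out = np.zeros((m, 1))
    for i in range(m):
        total = b
        for j in range(n):
            total += w[j, 0] * X[i, j]
        out[i, 0] = total
    return out


def predict_vectorized(X, w, b):
    """Same computation as one (m x n) @ (n x 1) dot product plus the bias."""
    return X @ w + b


X = np.array([[100, 3],
              [200, 4]], dtype=float)
w = np.array([[10], [3]], dtype=float)
b = 0.0

print(predict_loop(X, w, b).ravel())
print(predict_vectorized(X, w, b).ravel())
```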

#### Example A

$$\begin{bmatrix}100 &3 \\200 &4\end{bmatrix} * \begin{bmatrix}10\\3\end{bmatrix} = \begin{bmatrix}(100*10)+(3*3) \\(200*10)+(4*3) \end{bmatrix}= \begin{bmatrix}1009 \\2012\end{bmatrix}$$ 

$$\sum\limits_{j=0}^{n-1}X^{(i)}_j \cdot w_j$$

In [180]:
X = np.array([100,3,200,4]).reshape((2,2))
X

array([[100,   3],
       [200,   4]])

In [181]:
w = np.array([10,3]).reshape((2,1))
w

array([[10],
       [ 3]])

In [182]:
expected_answer_5_first_row = (100 * 10 + 3 * 3)
expected_answer_5_first_row

1009

In [183]:
expected_answer_5_2nd_row = (200 * 10 + 4 * 3)
expected_answer_5_2nd_row

2012

In [184]:
np.dot(X,w)

array([[1009],
       [2012]])

In [185]:
np.matmul(X,w)

array([[1009],
       [2012]])

In [186]:
X@w

array([[1009],
       [2012]])

**Do note that the dot product computes the weighted sum over all features separately for each training example, giving one row per example. We still need to sum over all the training examples when computing the cost function.**

### Matrix Dot Product (Single Feature)

Consider the following example, using the matrix multiplication rules:

$$\begin{bmatrix}a \\b\\c \end{bmatrix} \cdot \begin{bmatrix}w\end{bmatrix} = \begin{bmatrix}a*w \\b*w\\c*w\end{bmatrix}$$
A **3 by 1 matrix** multiplied by a **1 by 1 vector** results in a **3 by 1 matrix**.
$$$$
Therefore, 
if $X = \begin{bmatrix}a \\b\\c \end{bmatrix}$ and $w = \begin{bmatrix}w\end{bmatrix}$, then row $i$ of the product is
$$(X \cdot w)^{(i)} = X^{(i)} \cdot w$$

If $X = \begin{bmatrix}1 \\2\\3 \end{bmatrix}$ and $w = \begin{bmatrix}5\end{bmatrix}$ Then

$$X \cdot w = \begin{bmatrix}1*5 \\2*5\\3*5\end{bmatrix}= \begin{bmatrix}5 \\10\\15\end{bmatrix}$$

The next step is to sum up the resulting array, which can be done with the sum method of the vector or matrix.

#### Example B

$$X \cdot w =\begin{bmatrix}1 \\2\\3\end{bmatrix} \cdot \begin{bmatrix}5\end{bmatrix} =\begin{bmatrix}1*5 \\2*5\\3*5\end{bmatrix}= \begin{bmatrix}5 \\10\\15\end{bmatrix}$$

$$(X \cdot w)^{(i)} = X_1^{(i)} \cdot w_1$$

In [187]:
X = np.array([1,2,3]).reshape((3,1))
w = np.array([5]).reshape((1,1))

In [188]:
X

array([[1],
       [2],
       [3]])

In [189]:
w

array([[5]])

In [190]:
expected_answer_1 = (1*5)
expected_answer_1

5

In [191]:
expected_answer_2 = (2*5)
expected_answer_2

10

In [192]:
expected_answer_3 = (3*5)
expected_answer_3

15

In [193]:
expected_answer_total = (1*5)+(2*5)+(3*5)
expected_answer_total

30

In [194]:
dp2a = np.dot(X,w)
dp2a

array([[ 5],
       [10],
       [15]])

In [195]:
np.matmul(X,w)

array([[ 5],
       [10],
       [15]])

In [196]:
try:
    np.matmul(X,5)
except Exception as e:
    print('Error:', e)

Error: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)


## Application of Matrix Operation in Cost Function

### Single Feature Cost Function

Hypothesis: $$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
Cost Function: $$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1}\left(( wx^{(i)} + b) - y^{(i)}\right)^2$$ 

#### Example C

If $w = 5$ and $b=0$ and we have three training examples:

$$J(w,b) = \frac{1}{2m} \left( \left((100 \times 5) - 100 \right)^2 + \left((200 \times 5) - 200 \right)^2 + \left((300 \times 5) - 300 \right)^2 \right)$$

In [197]:
x = np.array([100.,200.,300.],dtype=np.float64).reshape([3,1])
y = np.array([100.,200.,300.],dtype=np.float64).reshape([3,1])
w = np.array([5]).reshape([1,1])
b = 0

In [198]:
def cost_function_l(x,y,b,w):
    m = x.shape[0]
    sumRow = 0.
    for i in range(m):
        fx = ((w * x[i]) + b)
        costEachEx = (fx - y[i]) ** 2
        sumRow += costEachEx

    return (1 / (2 * m)) * sumRow

In [199]:
cost_function_l(x,y,b=0,w=5)

array([373333.33333333])

In [200]:
# No loops required
def cost_function_beta1(x,y,b,w):
    m = x.shape[0]
    fx = (np.dot(x,w)+b)
    costEachEx = (fx - y)**2
    sumRow = costEachEx.sum()
    return (1 / (2 * m)) * sumRow

In [201]:
cost_function_beta1(x,y,b=0,w=5)

373333.3333333333

<details>
<summary>
    <font size='3'><b>Expected Answer</b></font>
</summary>
    <p>
$$X \cdot w = \begin{bmatrix}
500 \\
1000 \\
1500 \\    
\end{bmatrix}$$
    
$$(X \cdot w) - y = \begin{bmatrix} 500 \\ 1000 \\ 1500 \end{bmatrix} - \begin{bmatrix} 100 \\ 200 \\ 300 \end{bmatrix} = \begin{bmatrix} 400 \\ 800 \\ 1200 \end{bmatrix}$$

$$ ((X \cdot w) - y) ^ 2 = \begin{bmatrix}
160000 \\
640000 \\
1440000 \\    
\end{bmatrix}$$

$$Cost = sum\left(\begin{bmatrix}160000 \\640000 \\1440000 \end{bmatrix}\right)/(2*m) = 2240000/(2 * m) =  2240000/6 = 373,333.3333$$


</p>
</details>

### Multiple Feature Cost Function

**Multiple Feature**:

Hypothesis: $$f_{\vec{w},b}(\vec{x}^{(i)})=b + w_{1}x_{1}^{(i)} + w_{2}x_{2}^{(i)} + w_{3}x_{3}^{(i)} + ..... + w_{j}x_{j}^{(i)} + ..... + w_{n}x_{n}^{(i)}$$
$$f_{\vec{w},b}(\vec{x}^{(i)})=b + \sum\limits_{j=0}^{n-1} \vec{w}_{j}\vec{x}_{j}^{(i)}$$


Cost Function:	$$J(\vec w, b) = \frac{1}{2m}   \sum\limits_{i=0}^{m-1} (f_{\vec w,b}(\vec{x}^{(i)})-\vec y^{(i)})^{2}$$

$\therefore$ 
$$J(\vec w, b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left(\left(b + \sum\limits_{j=0}^{n-1} \vec w_{j} \vec x_{j}^{(i)} \right)-\vec y^{(i)}\right)^{2}$$
$$J(\vec w, b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left(\left(b + \vec X^{(i)} \cdot \vec w \right)-\vec y^{(i)}\right)^{2}$$
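
The per-example form above can also be written for all $m$ examples at once, which is the form the vectorized code in this section computes. In this sketch the residual vector $\vec e$ and the all-ones vector $\vec 1$ are symbols introduced here for illustration; $X$ is the $m \times n$ matrix whose rows are the training examples, $\vec w$ is $n \times 1$ and $\vec y$ is $m \times 1$:

$$\vec e = X \vec w + b\,\vec 1 - \vec y$$
$$J(\vec w, b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} e_i^{2} = \frac{1}{2m}\, \vec e^{\,T} \vec e$$

Each entry $e_i$ equals $\left(b + \vec X^{(i)} \cdot \vec w\right) - y^{(i)}$, so summing the squared entries reproduces the cost function term by term.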

#### Example D

$$\begin{bmatrix}100 &3 \\200 &4\end{bmatrix} * \begin{bmatrix}10\\3\end{bmatrix} = \begin{bmatrix}(100*10)+(3*3) \\(200*10)+(4*3) \end{bmatrix}= \begin{bmatrix}1009 \\2012\end{bmatrix}$$ 

In [202]:
X_train = np.array([100,3,200,4]).reshape([2,2])
X_train

array([[100,   3],
       [200,   4]])

In [203]:
y_train = np.array([100,200]).reshape([2,1])
y_train

array([[100],
       [200]])

In [204]:
w = np.array([10,3]).reshape([2,1])
b = 0
w

array([[10],
       [ 3]])

In [205]:
def cost_function_loopv2(x,y,b,w):
    m = x.shape[0]
    n = w.shape[0]
    sumAll = 0.

    # for each training example
    for i in range(m):

        # Multiply and sum all the features for example i
        sumFeatures = 0
        for j in range(n):
            sumFeatures += w[j] * x[i][j]
        
        fx = sumFeatures + b
        costEachTrainingExample = (fx - y[i]) ** 2
        sumAll += costEachTrainingExample

    return (1 / (2 * m)) * sumAll

In [206]:
cost = cost_function_loopv2(X_train,y_train,b,w)
cost

array([1027406.25])

In [207]:
# No loops required
def cost_function_beta2(x,y,b,w):
    m = x.shape[0]
    fx = (x@w)+b
    costEachTrainingExample = (fx - y) ** 2
    sumAll = costEachTrainingExample.sum()
    return (1 / (2 * m)) * sumAll

In [208]:
cost = cost_function_beta2(X_train,y_train,b,w)
cost

1027406.25

<details>
<summary>
    <font size='3'><b>Expected Answer</b></font>
</summary>
    <p>
$$X \cdot w = \begin{bmatrix}
1009 \\
2012 \\    
\end{bmatrix}$$
    

$$(X \cdot w) - y = \begin{bmatrix} 1009 \\ 2012 \end{bmatrix} - \begin{bmatrix} 100 \\ 200 \end{bmatrix} = \begin{bmatrix} 909 \\ 1812 \end{bmatrix}$$

$$ ((X \cdot w) - y) ^ 2 = \begin{bmatrix}
826,281 \\
3,283,344 \\   
\end{bmatrix}$$

$$Cost = sum\left(\begin{bmatrix}826,281 \\3,283,344 \end{bmatrix}\right)/(2*m) = 4,109,625/(2 * m) =  4,109,625/4 = 1,027,406.25$$



</p>
</details>

#### Example E

In [209]:
dfx = pd.DataFrame({'sqft': [1000, 1500, 2000], 'rooms': [3, 4,5], 'bathrooms':[2,3,3], 'age':[20,25,30]})
dfx

| | sqft | rooms | bathrooms | age |
|---|---|---|---|---|
| 0 | 1000 | 3 | 2 | 20 |
| 1 | 1500 | 4 | 3 | 25 |
| 2 | 2000 | 5 | 3 | 30 |


In [210]:
dfy = pd.DataFrame({'price in thousands': [250, 300, 350]})
dfy

| | price in thousands |
|---|---|
| 0 | 250 |
| 1 | 300 |
| 2 | 350 |


$$f_{w,b}(x^{(1)})=b + w_{1}x_{1}^{(1)} + w_{2}x_{2}^{(1)} + w_{3}x_{3}^{(1)} + w_{4}x_{4}^{(1)}$$
$$f_{w,b}(x^{(2)})=b + w_{1}x_{1}^{(2)} + w_{2}x_{2}^{(2)} + w_{3}x_{3}^{(2)} + w_{4}x_{4}^{(2)}$$
$$f_{w,b}(x^{(3)})=b + w_{1}x_{1}^{(3)} + w_{2}x_{2}^{(3)} + w_{3}x_{3}^{(3)} + w_{4}x_{4}^{(3)}$$
$$$$
$$f_{w,b}(x^{(1)})=b + w_{1}1000 + w_{2}3 + w_{3}2 + w_{4}20$$
$$f_{w,b}(x^{(2)})=b + w_{1}1500 + w_{2}4 + w_{3}3 + w_{4}25$$
$$f_{w,b}(x^{(3)})=b + w_{1}2000 + w_{2}5 + w_{3}3 + w_{4}30$$
$$$$
Assuming: $$w_1 = 100, w_2 = 4, w_3 = 3, w_4=20, b = 50000$$
$$$$
$$f_{w,b}(x^{(1)})=50000 + 100 \times 1000 + 4 \times 3 + 3 \times 2 + 20 \times 20$$
$$f_{w,b}(x^{(2)})=50000 + 100 \times 1500 + 4 \times 4 + 3 \times 3 + 20 \times 25$$
$$f_{w,b}(x^{(3)})=50000 + 100 \times 2000 + 4 \times 5 + 3 \times 3 + 20 \times 30$$
$$$$
$$J(\vec w, b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left(\left(b + \sum\limits_{j=0}^{n-1} \vec w_{j} \vec x_{j}^{(i)} \right)-y^{(i)}\right)^{2}$$
$$$$
$$\begin{align*}
J(\vec w, b) = \frac{1}{2m} \Big( &\left(\left(50000 + 100 \times 1000 + 4 \times 3 + 3 \times 2 + 20 \times 20 \right) - 250\right)^2 \\
+ &\left(\left(50000 + 100 \times 1500 + 4 \times 4 + 3 \times 3 + 20 \times 25 \right) - 300\right)^2 \\
+ &\left(\left(50000 + 100 \times 2000 + 4 \times 5 + 3 \times 3 + 20 \times 30 \right) - 350\right)^2 \Big)
\end{align*}$$



In [211]:
expected_ans = (((100*1000)+(4*3)+(3*2)+(20*20)+50000) - 250) **2 +(((100*1500)+(4*4)+(3*3)+(20*25)+50000)-300) ** 2+(((100*2000)+(4*5)+(3*3)+(20*30)+50000)-350)**2
expected_ans = expected_ans * (1/(2*3))
expected_ans 

20880009448.333332

In [212]:
X_train = dfx.to_numpy()
X_train

array([[1000,    3,    2,   20],
       [1500,    4,    3,   25],
       [2000,    5,    3,   30]])

In [213]:
y_train = dfy.to_numpy()
y_train

array([[250],
       [300],
       [350]])

In [214]:
b = 50000
w = np.array([100,4,3,20]).reshape([4,1])
w.shape

(4, 1)

In [215]:
w

array([[100],
       [  4],
       [  3],
       [ 20]])

In [216]:
cost = cost_function_beta2(X_train,y_train,b,w)
cost

20880009448.333332

**Expected Answer:** 20880009448.333332

#### Example F: Use the Multiple-Feature Function on a Single Feature

In [217]:
x = np.array([100.,200.,300.],dtype=np.float64).reshape([3,1])
y = np.array([100.,200.,300.],dtype=np.float64).reshape([3,1])
w = np.array([5]).reshape([1,1])
b = 0

In [218]:
costSingle = cost_function_beta2(x,y,b,w)
costSingle

373333.3333333333

**Thus, we can use the same vectorized function for Single or Multiple Features.**
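
One way to gain more confidence in this claim is to compare a loop-based and a vectorized implementation on randomly generated data with different numbers of features. The following is a minimal sketch, not part of the original notes: the function names `cost_loop` and `cost_vectorized`, the seed and the data sizes are all assumptions made for illustration.

```python
import numpy as np

def cost_loop(X, y, b, w):
    # Explicit double loop over examples (i) and features (j)
    m, n = X.shape
    total = 0.0
    for i in range(m):
        fx = b
        for j in range(n):
            fx += w[j, 0] * X[i, j]
        total += (fx - y[i, 0]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, b, w):
    # Same cost computed with a single matrix product
    m = X.shape[0]
    residual = X @ w + b - y
    return (residual ** 2).sum() / (2 * m)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
results = []
for n_features in (1, 4):       # single-feature and multi-feature cases
    X = rng.normal(size=(50, n_features))
    y = rng.normal(size=(50, 1))
    w = rng.normal(size=(n_features, 1))
    b = 0.5
    results.append(np.isclose(cost_loop(X, y, b, w),
                              cost_vectorized(X, y, b, w)))

print(results)
```

Both entries should report agreement, since the two functions evaluate the same formula and differ only in how the sums are carried out.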

### Final Cost Function

In [219]:
# Cost Function
def cost_function(X,y,b,w):
    '''
    Cost function for both single and multiple features.
    X = m by n matrix of training data, with one training example per row (m) and one feature per column (n).
        Single-feature data must be an m by 1 vector, where m is the total number of training examples.
    y = m by 1 vector, where m is the total number of training examples.
    b = scalar
    w = n by 1 vector, where n is the total number of features.
    '''
    
    m = X.shape[0]
    fx = (X@w)+b
    lossFunction = (fx - y) ** 2
    RSS = lossFunction.sum()
    cost = (1 / (2 * m)) * RSS

    return cost
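
The shape requirements in the docstring matter because NumPy broadcasting can silently produce a wrong result instead of raising an error. The sketch below, which is an illustration rather than part of the original notes, re-declares the same computation and passes `y` once as an m by 1 column and once as a flat 1-D array. The names `y_col` and `y_flat` are hypothetical.

```python
import numpy as np

def cost_function(X, y, b, w):
    # Same computation as the final cost function in these notes
    m = X.shape[0]
    fx = (X @ w) + b
    return (1 / (2 * m)) * ((fx - y) ** 2).sum()

X = np.array([[100.], [200.], [300.]])
w = np.array([[5.]])
y_col = np.array([[100.], [200.], [300.]])  # shape (3, 1), as the docstring requires
y_flat = np.array([100., 200., 300.])       # shape (3,), a common mistake

cost_ok = cost_function(X, y_col, 0, w)
cost_bad = cost_function(X, y_flat, 0, w)   # (3, 1) - (3,) broadcasts to (3, 3)

shape_bad = ((X @ w) - y_flat).shape
print(cost_ok, cost_bad, shape_bad)
```

With the flat `y`, the subtraction yields a 3 by 3 matrix of all pairwise differences, so the returned cost no longer matches the formula. Calling `y.reshape(-1, 1)` before passing a 1-D target array avoids this.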

In [220]:
X_train = np.array([100,3,200,4]).reshape([2,2])
y_train = np.array([100,200]).reshape([2,1])
w = np.array([10,3]).reshape([2,1])
b = 0
w

array([[10],
       [ 3]])

In [221]:
%%timeit -r 1 -n 1
costLoop = cost_function_loopv2(X_train,y_train,b,w)
print(costLoop)

[1027406.25]
1.11 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [222]:
%%timeit -r 1 -n 1
costFunction = cost_function(X_train,y_train,b,w)
print(costFunction)

1027406.25
1.43 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


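The timings above do not yet show a speed-up: with only two training examples and a single timed run, the vectorized version (1.43 ms) was slightly slower than the loop version (1.11 ms), because fixed call overhead dominates at this size. The advantage of matrix computation appears as the number of training examples and features grows, since the loop version performs $m \times n$ Python-level iterations.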
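
A 2 by 2 dataset timed once is too small for a reliable comparison, so a larger synthetic dataset gives a clearer picture. The sketch below is one possible way to run it with the standard `timeit` module. The dataset size (2000 examples, 10 features), the seed and the function names `cost_loop` and `cost_vectorized` are assumptions chosen for illustration, and the measured times will vary by machine.

```python
import timeit
import numpy as np

def cost_loop(X, y, b, w):
    # Loop-based cost: m * n Python-level iterations
    m, n = X.shape
    total = 0.0
    for i in range(m):
        fx = b
        for j in range(n):
            fx += w[j, 0] * X[i, j]
        total += (fx - y[i, 0]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, b, w):
    # Vectorized cost: one matrix product and one reduction
    m = X.shape[0]
    residual = X @ w + b - y
    return (residual ** 2).sum() / (2 * m)

rng = np.random.default_rng(42)  # arbitrary seed
m, n = 2000, 10                  # hypothetical dataset size
X_big = rng.normal(size=(m, n))
w_big = rng.normal(size=(n, 1))
y_big = rng.normal(size=(m, 1))
b_big = 1.0

# Take the best of three single runs for each implementation
t_loop = min(timeit.repeat(lambda: cost_loop(X_big, y_big, b_big, w_big),
                           number=1, repeat=3))
t_vec = min(timeit.repeat(lambda: cost_vectorized(X_big, y_big, b_big, w_big),
                          number=1, repeat=3))

print(f"loop: {t_loop * 1e3:.3f} ms, vectorized: {t_vec * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces the influence of one-off delays, which a single `-r 1 -n 1` run cannot filter out.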

## End of Note 8