# Multivariate Linear Regression Tutorial
<br>
In the previous tutorial, we predicted restaurant profits (our **label**) using linear regression on a single **feature**: the population of the city each restaurant was located in. Often, however, more than one feature affects the label, for example the size of the restaurant in square feet or the number of competitors in its vicinity. In this situation we use **Multivariate Linear Regression**, which is linear regression on a data set with more than one feature.

Note that since the dataset is multivariate, we are dealing with **multi-dimensional** data, which cannot be represented on a 2D graph. The best we can do for visualization is a 3D graph, but that only works in the special case of two features and one label (e.g. predicting restaurant profits from only two factors, such as city population and restaurant size). In practice, there are often more than two features that determine a label, which makes it difficult to depict the data in a graph.

For this tutorial, we will use the dataset "ex1data2.txt" found in this directory to predict housing prices. The dataset comes with three columns: Size of house (in square feet), number of bedrooms, and price. 

In [4]:
# Once again, the first step is to import libraries and load our DataFrame. 
import pandas as pd
import numpy as np

# Although our data is 3D, we are not going to visualize our dataset/prediction
import matplotlib.pyplot as plt 
%matplotlib inline

df = pd.read_csv('ex1data2.txt', names = ['Size of house (sq ft)',
                                          'Number of bedrooms',
                                          'Price of house'])
print(df.head(5))
print('\n')
print(df.tail(5))


   Size of house (sq ft)  Number of bedrooms  Price of house
0                   2104                   3          399900
1                   1600                   3          329900
2                   2400                   3          369000
3                   1416                   2          232000
4                   3000                   4          539900


    Size of house (sq ft)  Number of bedrooms  Price of house
42                   2567                   4          314000
43                   1200                   3          299000
44                    852                   2          179900
45                   1852                   4          299900
46                   1203                   3          239500


With multivariate linear regression, we need to introduce some new notation:
- $m$ is the number of data points (47 in this example, shown as rows 0 to 46 above)
- $n$ is the number of features (2 in this example)
- $x^{(i)}_j$ is the value of the $j$-th feature in the $i$-th training example ($x^{(44)}_1 = 1200$: the 1st feature of the 44th data point, shown at row index 43 above)
- $y^{(i)}$ is the $i$-th label ($y^{(5)} = 539900$, shown at row index 4 above)
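
To make the notation concrete, here is a minimal sketch that reads $m$, $n$, and individual entries off a DataFrame. It uses only the five rows printed by `df.head(5)` above rather than the full file, and the names `sample`, `X`, and `y` are illustrative. It also assumes training examples are counted from 1 in the math (as in the sums $\sum_{i=1}^{m}$), while pandas and NumPy count from 0, so $x^{(i)}_j$ is found at `X[i - 1, j - 1]`.

```python
import numpy as np
import pandas as pd

# The first five rows of the dataset, copied from the df.head(5) output above.
sample = pd.DataFrame({
    'Size of house (sq ft)': [2104, 1600, 2400, 1416, 3000],
    'Number of bedrooms': [3, 3, 3, 2, 4],
    'Price of house': [399900, 329900, 369000, 232000, 539900],
})

# Features go into a matrix X (one row per house), labels into a vector y.
X = sample[['Size of house (sq ft)', 'Number of bedrooms']].to_numpy()
y = sample['Price of house'].to_numpy()

m, n = X.shape  # m data points, n features

# x^(i)_j with 1-based i and j maps to X[i - 1, j - 1] (assumed convention).
i, j = 4, 1
x_ij = X[i - 1, j - 1]  # size of the 4th house in this sample
y_5 = y[5 - 1]          # price of the 5th house in this sample

print(m, n, x_ij, y_5)
```

On the full dataset the same code would apply with `df` in place of `sample`.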

<br>
Recall the equation of a line for the simple linear regression model (SLRM):
$$ y = \theta_0x_0 + \theta_1x_1 = \theta_0 + \theta_1x_1 $$

For multivariate linear regression, our new equation (which we will call the hypothesis function) is: 
<br>
<br>
\begin{align*}
h(x) & = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3 + ... + \theta_nx_n &&& \text{note: }x_0 = 1 \\\\
    & = \left [  \theta_0~\theta_1~...~\theta_n\right ]\begin{bmatrix}
x_0\\ 
x_1\\ 
...\\
x_n 
\end{bmatrix} \\
 & = \theta^Tx
\end{align*}
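
The product $\theta^Tx$ maps directly onto a matrix-vector product in NumPy. The sketch below is illustrative only: the two houses are taken from the `df.head(5)` output above, but the values in `theta` are hypothetical and have not been fitted to the data.

```python
import numpy as np

# Two houses from the df.head(5) output above: [size in sq ft, bedrooms].
features = np.array([[2104.0, 3.0],
                     [1600.0, 3.0]])

# Prepend x_0 = 1 to every example so that theta_0 acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Hypothetical parameters [theta_0, theta_1, theta_2], chosen for illustration.
theta = np.array([50000.0, 100.0, 10000.0])

# h(x) = theta^T x, evaluated for every row of X at once.
predictions = X @ theta

# The same value for the first house, written out term by term.
first_by_hand = theta[0] * 1 + theta[1] * 2104 + theta[2] * 3

print(predictions, first_by_hand)
```

Stacking the examples as rows of `X` means a single `X @ theta` evaluates the hypothesis for the whole dataset, which is why the extra column of ones is worth adding.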
<br>
<br>
Similarly, recall that in the gradient descent algorithm for SLRM, each iteration sets:
$$ \theta_0 = \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1x^{(i)} - y^{(i)}) \\ \theta_1 = \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1x^{(i)} - y^{(i)})~ x^{(i)} $$

For multiple variables, each iteration of gradient descent instead sets (updating all $\theta_j$ simultaneously):
\begin{align*}
\theta_0 &= \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta_0 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + ... + \theta_nx^{(i)}_n - y^{(i)}) \\
        &= \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m} (\theta^Tx^{(i)} - y^{(i)}) \\
\theta_1 &= \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta^Tx^{(i)} - y^{(i)})~ x^{(i)}_1 \\
\theta_2 &= \theta_2 - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta^Tx^{(i)} - y^{(i)})~ x^{(i)}_2 \\ 
&\;\;\vdots \\
\theta_n &= \theta_n - \alpha\frac{1}{m}\sum_{i=1}^{m}(\theta^Tx^{(i)} - y^{(i)})~ x^{(i)}_n \\
\end{align*}
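
The update rules above can be sketched as a single vectorized step. This is one possible implementation, not necessarily the one used later in this tutorial: the function name `gradient_step`, the tiny dataset, and the values of `theta` and `alpha` are all made up for illustration. The loop at the end recomputes the same update one parameter at a time, mirroring the sums in the equations.

```python
import numpy as np

def gradient_step(X, y, theta, alpha):
    """One gradient descent iteration; X must already include the x_0 = 1 column."""
    m = len(y)
    errors = X @ theta - y           # theta^T x^(i) - y^(i) for every i
    gradient = (X.T @ errors) / m    # entry j is (1/m) * sum_i errors[i] * x^(i)_j
    return theta - alpha * gradient  # every theta_j is updated from the old theta

# Tiny made-up dataset (not from ex1data2.txt); the first column is x_0 = 1.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])
theta = np.zeros(3)
alpha = 0.1

new_theta = gradient_step(X, y, theta, alpha)

# The same update written as the per-parameter sums from the equations.
loop_theta = theta.copy()
for j in range(len(theta)):
    total = sum((X[i] @ theta - y[i]) * X[i, j] for i in range(len(y)))
    loop_theta[j] = theta[j] - alpha * total / len(y)

print(new_theta, loop_theta)
```

Computing `errors` once from the old `theta` before changing any parameter is what makes the update simultaneous; updating $\theta_0$ first and then reusing it for $\theta_1$ would be a different algorithm.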
