# Multiple Variable Linear Regression

## Goal

We are given a dataset with several input features x and one target value y. For example:

|  x1  | x2 | x3 | x4 |  y  |
|------|----|----|----|-----|
| 2104 | 5  | 1  | 45 | 460 |
| 1416 | 3  | 2  | 40 | 232 |
| 852  | 2  | 1  | 35 | 178 |


Now we want to fit the best multiple-variable linear regression model to the given data. The model (here n = 4) is:

$$ f_{w,b}(x^{(i)}) = w_1 x_1^{(i)} + w_2 x_2^{(i)} + ... + w_n x_n^{(i)} + b $$


## Definitions

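Suppose we have **m** training examples, each with **n** features. In the matrix and vector notation below, indices are zero-based to match the code, so examples run from 0 to m-1 and features from 0 to n-1.
To hold all of the x training data we use a bold capital **X**, a matrix with one row per example and one column per feature: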

$$\mathbf{X} = 
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \vdots & \vdots & \ddots & \vdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$

We also define the bold vector $\mathbf{x}^{(i)}$, which holds all the feature values of the i-th training example (one row of $\mathbf{X}$), and the bold vector **w**, which holds one weight per feature:

$$\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots,x^{(i)}_{n-1})$$

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\vdots \\
w_{n-1}
\end{pmatrix}
$$
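
In NumPy these definitions might map as follows. This is only a sketch: it uses the three example rows from the table in the Goal section, and the zero-valued `w` is a placeholder rather than a fitted result.

```python
import numpy as np

# Rows are training examples, columns are features.
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [852, 2, 1, 35]], dtype=float)

m, n = X.shape      # m training examples, n features
x_1 = X[1]          # the example with index i = 1: a vector of length n
w = np.zeros(n)     # one weight per feature (placeholder values)

print(m, n)         # 3 4
print(x_1.shape)    # (4,)
```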


Recall that the dot product of two vectors is:

$$ \mathbf{w} \cdot \mathbf{x^{(i)}} = \sum_{j=0}^{n-1} (w_j x_j^{(i)}) $$

And we can write our model as:

$$ f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) = \mathbf{w} \cdot \mathbf{x^{(i)}} + b$$
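
As a quick check that the dot-product form matches the term-by-term sum, here is a small sketch. The values of `w` and `b` are hypothetical, chosen only for illustration; `x_i` is the first row of the example table.

```python
import numpy as np

x_i = np.array([2104.0, 5.0, 1.0, 45.0])  # first row of the example table
w = np.array([0.1, 2.0, -1.0, 0.5])       # hypothetical weights
b = 10.0                                  # hypothetical bias

# Explicit sum over the n features, then add the bias.
f_sum = sum(w[j] * x_i[j] for j in range(len(w))) + b

# Same prediction written as a dot product.
f_dot = np.dot(w, x_i) + b

print(np.isclose(f_sum, f_dot))  # True
```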

## Solution

Our cost function is half the mean of the squared prediction errors over the training set:

$$ J_{\mathbf{w}, b} = \frac{1}{2m} \sum_{i=0}^{m-1}(f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) - y^{(i)})^2$$
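
To make the formula concrete, the sketch below evaluates the cost on the three example rows with all parameters set to zero (an arbitrary starting point, not a fitted model). Every prediction is then 0, so the cost reduces to $(460^2 + 232^2 + 178^2) / (2 \times 3)$.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [852, 2, 1, 35]], dtype=float)
y = np.array([460.0, 232.0, 178.0])

w = np.zeros(X.shape[1])  # all weights zero (arbitrary starting point)
b = 0.0

errors = X @ w + b - y                     # f(x^(i)) - y^(i) for every i
cost = np.sum(errors ** 2) / (2 * len(y))  # J = 1/(2m) * sum of squared errors

print(cost)  # 49518.0
```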

To minimize the cost we need its partial derivative with respect to every parameter: $\frac{\partial J_{\mathbf{w}, b}}{\partial w_1}$, $\frac{\partial J_{\mathbf{w}, b}}{\partial w_2}$, ..., $\frac{\partial J_{\mathbf{w}, b}}{\partial w_n}$ and $\frac{\partial J_{\mathbf{w}, b}}{\partial b}$. Starting with $w_1$:

$$ \frac {\partial J_{\mathbf{w}, b}} {\partial w_1} = \frac{\partial}{\partial w_1} ( \frac{1}{2m} \sum_{i=0}^{m-1}(f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) - y^{(i)})^2)$$

$$ \frac {\partial J_{\mathbf{w}, b}} {\partial w_1} = \frac{\partial}{\partial w_1} ( \frac{1}{2m} \sum_{i=0}^{m-1}(w_1 x_1 ^{(i)} + w_2 x_2^{(i)} + ... + w_n x_n^{(i)} + b - y^{(i)})^2)$$

By the chain rule, the exponent 2 cancels the factor $\frac{1}{2}$, and the derivative of the inner expression with respect to $w_1$ is $x_1^{(i)}$:

$$ \frac {\partial J_{\mathbf{w}, b}} {\partial w_1} = \frac{1}{m} \sum_{i=0}^{m-1}(w_1 x_1^{(i)} + w_2 x_2^{(i)} + ... + w_n x_n^{(i)} + b - y^{(i)}) \times x_1 ^ {(i)}$$

Substituting $f_{\mathbf{w}, b}$ back in, we get:

$$\frac {\partial J_{\mathbf{w}, b}} {\partial w_1} = \frac{1}{m} \sum_{i=0}^{m-1}(f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) - y^{(i)}) \times x_1 ^ {(i)}$$

The same steps apply to any weight $w_j$, so in general:


$$\Rightarrow \frac {\partial J_{\mathbf{w}, b}} {\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1}(f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) - y^{(i)}) \times x_j ^ {(i)}$$

Repeating the same steps for b, where the derivative of the inner expression is 1, we get:


$$\Rightarrow \frac {\partial J_{\mathbf{w}, b}} {\partial b} = \frac{1}{m} \sum_{i=0}^{m-1}(f_{\mathbf{w}, b}(\mathbf{x^{(i)}}) - y^{(i)})$$
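
One way to sanity-check these derivatives is to compare them with central finite differences of the cost. The sketch below does that on the example rows; the function names and the test point (`w`, `b`) are hypothetical and chosen only for this check.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [852, 2, 1, 35]], dtype=float)
y = np.array([460.0, 232.0, 178.0])

def cost(w, b):
    errors = X @ w + b - y
    return np.sum(errors ** 2) / (2 * len(y))

def analytic_gradient(w, b):
    # The formulas derived above, written with matrix operations.
    errors = X @ w + b - y
    return X.T @ errors / len(y), np.mean(errors)

def numerical_gradient(w, b, eps=1e-4):
    # Central differences: (J(p + eps) - J(p - eps)) / (2 * eps) per parameter.
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(w + step, b) - cost(w - step, b)) / (2 * eps)
    dj_db = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

w = np.array([0.1, 1.0, -2.0, 0.5])  # hypothetical test point
b = 3.0

a_dw, a_db = analytic_gradient(w, b)
n_dw, n_db = numerical_gradient(w, b)

# Both comparisons are expected to succeed if the derivation is right.
print(np.allclose(a_dw, n_dw, rtol=1e-5, atol=1e-4))
print(np.isclose(a_db, n_db, rtol=1e-5, atol=1e-4))
```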

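As in one-variable linear regression, gradient descent repeatedly applies the following updates, where $\alpha$ is the learning rate. All partial derivatives are computed from the current values of $\mathbf{w}$ and $b$ before any parameter is changed:

$$ w_j = w_j - \alpha \frac{\partial J_{\mathbf{w}, b}}{\partial w_j} $$
$$ b = b - \alpha \frac{\partial J_{\mathbf{w}, b}}{\partial b} $$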
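
A single update step can be sketched as below, starting from zero parameters on the example rows. The learning rate is the one used later in the Code section; the helper `cost` is defined here only for this illustration.

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [852, 2, 1, 35]], dtype=float)
y = np.array([460.0, 232.0, 178.0])
alpha = 5.0e-7

def cost(w, b):
    errors = X @ w + b - y
    return np.sum(errors ** 2) / (2 * len(y))

w = np.zeros(X.shape[1])
b = 0.0

# Compute both gradients from the current (w, b) ...
errors = X @ w + b - y
dj_dw = X.T @ errors / len(y)
dj_db = np.mean(errors)

# ... and only then update the parameters.
w_new = w - alpha * dj_dw
b_new = b - alpha * dj_db

print(cost(w, b))                        # 49518.0
print(cost(w_new, b_new) < cost(w, b))   # True
```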


## Code
### Initial Code

In [3]:
import numpy as np
import matplotlib.pyplot as plt

In [30]:
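def compute_cost(X_train, y_train, w, b):
    """Return the cost J(w, b): half the mean squared prediction error."""
    m = X_train.shape[0]  # number of training examples
    cost = 0.0
    for i in range(m):
        # Prediction error for example i: f(x^(i)) - y^(i)
        error = np.dot(w, X_train[i]) + b - y_train[i]
        cost += error ** 2
    return cost / (2 * m)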

In [43]:
def compute_gradient(X_train, y_train, w, b):
    """Return (dJ/dw, dJ/db) evaluated at the current w and b."""
    m, n = X_train.shape  # m examples, n features
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        error = np.dot(w, X_train[i]) + b - y_train[i]
        # error is a scalar, so this scales the feature vector x^(i)
        dj_dw += error * X_train[i]
        dj_db += error
    return dj_dw / m, dj_db / m
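
The loop over examples can also be written with matrix operations. The following is a sketch of an equivalent vectorized version; the name `compute_gradient_vectorized` is hypothetical and not part of the original notebook.

```python
import numpy as np

def compute_gradient_vectorized(X_train, y_train, w, b):
    """Same gradients as the loop version, computed for all examples at once."""
    m = X_train.shape[0]
    errors = X_train @ w + b - y_train   # shape (m,): one error per example
    dj_dw = X_train.T @ errors / m       # shape (n,): sum_i error_i * x^(i), divided by m
    dj_db = np.sum(errors) / m
    return dj_dw, dj_db
```

For small datasets the two versions should return the same values up to floating-point rounding; the vectorized form mainly avoids the Python-level loop.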

In [53]:
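def fit_regression(X_train, y_train, initial_w, initial_b, alpha, iteration):
    """Run `iteration` steps of gradient descent with learning rate alpha."""
    w = initial_w
    b = initial_b
    for _ in range(iteration):
        # Compute both gradients first so w and b are updated simultaneously
        dj_dw, dj_db = compute_gradient(X_train, y_train, w, b)
        w = w - alpha * dj_dw  # builds a new array; initial_w is not modified
        b = b - alpha * dj_db
    return w, b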

In [57]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

b_initial = 0.
w_initial = np.zeros(X_train.shape[1])

alpha = 5.0e-7
iteration = 1_000
w, b = fit_regression(X_train, y_train, w_initial, b_initial, alpha, iteration)

cost = compute_cost(X_train, y_train, w, b)
print(f"cost: {cost}")

cost: 686.7034116665205
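
To see how the cost evolves during training, one option is to record it after every update. The sketch below is a self-contained variant of the training loop; `fit_with_history` is a hypothetical helper, not part of the notebook, and it uses matrix operations instead of the explicit loops above.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]], dtype=float)
y_train = np.array([460.0, 232.0, 178.0])

def fit_with_history(X, y, alpha, iterations):
    """Gradient descent from zero parameters, recording the cost after each step."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    history = []
    for _ in range(iterations):
        errors = X @ w + b - y
        # Both updates use the errors computed before either parameter changes.
        w = w - alpha * (X.T @ errors) / m
        b = b - alpha * np.mean(errors)
        new_errors = X @ w + b - y
        history.append(np.sum(new_errors ** 2) / (2 * m))
    return w, b, history

w, b, history = fit_with_history(X_train, y_train, alpha=5.0e-7, iterations=1_000)
for k in (0, 9, 99, 999):
    print(f"iteration {k + 1:4d}: cost {history[k]:.2f}")
```

If the cost is still falling slowly after many iterations, one plausible cause is that the features have very different ranges (here roughly 1 to 2000), which forces a small learning rate.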


### Applying Scaling