# Linear Regression

## Problem Formulation

* $m$ is the number of training examples 
* $nx$ is the number of input features
* $n = nx+1$ is the actual number of features (obtained by adding an additional feature $x_0 = 1$)
* $(x^{(i)},y^{(i)})$ is the <i>i-th</i> training example
* $x^{(i)}_{j}$ is the <i>j-th</i> feature of the i-th training example

$$ x^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ \vdots \\ x^{(i)}_{nx} \end{bmatrix} \rightarrow x^{(i)} = \begin{bmatrix} x^{(i)}_0 \\ x^{(i)}_1 \\ \vdots \\ x^{(i)}_{nx} \end{bmatrix} \ \ , \ \ \text{where} \ \ x^{(i)}_0 = 1 \ \ \rightarrow \ \ x^{(i)} = \begin{bmatrix} 1 \\ x^{(i)}_1 \\ \vdots \\ x^{(i)}_{nx} \end{bmatrix} \ \ , \ \ \ \ \ \ \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{nx} \end{bmatrix} $$

$$ x^{(i)} \in \mathbb{R}^n \ \ \ \ , \ \ \ \ \theta \in \mathbb{R}^n \ \ \ \ , \ \ \ \ y^{(i)} \in \mathbb{R} $$
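The step of adding the constant feature $x_0 = 1$ can be sketched in NumPy as follows. This is only an illustration: the helper name `add_bias_feature` and the small array `X_raw` are made up for the example and are not part of the original notebook.

```python
import numpy as np

def add_bias_feature(X):
    """Prepend a column of ones (x_0 = 1) to an (m, nx) feature matrix."""
    m = X.shape[0]
    ones = np.ones((m, 1), dtype=X.dtype)
    # Result has shape (m, nx + 1) = (m, n)
    return np.hstack([ones, X])

# Hypothetical toy data: m = 3 examples, nx = 2 input features
X_raw = np.array([[2.0, 3.0],
                  [4.0, 5.0],
                  [6.0, 7.0]])
X_aug = add_bias_feature(X_raw)
print(X_aug.shape)  # (3, 3)
```

With the extra column in place, the intercept $\theta_0$ is handled like any other weight, so no separate bias term is needed.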

In [1]:
import numpy as np

def init_weight(n, init_type="ZERO"):
    # `init_type` is "RANDOM" or "ZERO"; any other value falls back to zeros
    if init_type == "RANDOM":
        # Initialize weights `theta` with random values drawn uniformly from [0, 1)
        theta = np.random.rand(n, 1)
    else:
        # Default initialization is zero
        theta = np.zeros((n, 1), dtype=float)
    return theta

## Hypothesis
The hypothesis outputs a linear function of the input features:

$$h_\theta(x^{(i)}) = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \dots + \theta_{nx} x_{nx}^{(i)} $$

$$h_\theta(x^{(i)}) = \sum_{j = 0}^{nx}\theta_j x_j^{(i)} = \theta^{T}x^{(i)} $$

In [None]:
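def compute_hypothesis(X, theta):
    return X @ theta  # assumed signature: X is (m, n) incl. the x_0 = 1 column, theta is (n, 1)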

In [1]:
from sklearn import datasets

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

In [2]:
diabetes_X.shape

(442, 10)

In [3]:
diabetes_y.shape

(442,)

In [10]:
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

## Cost Function
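The cost function measures how far the hypothesis is from the targets, as half the sum of squared errors over the training set: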
$$ J(\theta) = \frac{1}{2}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$$
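The sum in $J(\theta)$ can be written out directly as a loop over the training examples. The sketch below assumes `X` already contains the $x_0 = 1$ column and `theta` has shape `(n, 1)`; the function name `compute_cost` and the toy data are illustrative and not taken from the notebook.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2, written as an explicit loop."""
    m = X.shape[0]
    y = np.asarray(y, dtype=float).ravel()   # accept shape (m,) or (m, 1)
    weights = theta.ravel()                  # shape (n,)
    total = 0.0
    for i in range(m):
        prediction = weights @ X[i]          # theta^T x^(i)
        total += (prediction - y[i]) ** 2
    return 0.5 * float(total)

# Hypothetical toy data lying exactly on the line y = x_1
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

print(compute_cost(X, y, np.zeros((2, 1))))            # 7.0
print(compute_cost(X, y, np.array([[0.0], [1.0]])))    # 0.0
```

With all weights at zero the errors are $-1, -2, -3$, so the cost is $\frac{1}{2}(1 + 4 + 9) = 7$; with $\theta = (0, 1)$ every prediction matches its target and the cost is zero.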

## Update Rule 

$$ \theta_j \mathrel{\mathop:}= \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta) $$

### Gradient

$$ \frac{\partial}{\partial\theta_j} J(\theta) = \frac{\partial}{\partial\theta_j} \frac{1}{2}\sum_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2  $$

For a single training example $(x, y)$ the sum has only one term, and the derivative is:

$$ \begin{align} \frac{\partial}{\partial\theta_j} J(\theta) & = \frac{\partial}{\partial\theta_j} \frac{1}{2}(h_\theta(x) - y)^2  \\ & = 2 \cdot \frac{1}{2} (h_\theta(x) - y) \cdot \frac{\partial}{\partial\theta_j} (h_\theta(x) - y) \\ & = (h_\theta(x) - y) \cdot \frac{\partial}{\partial\theta_j} \left(\sum_{k = 0}^{nx}\theta_k x_k - y\right)  \\ & = (h_\theta(x) - y)\ x_j \end{align} $$

This gives the update rule for a single training example $(x^{(i)}, y^{(i)})$:

$$ \theta_j \mathrel{\mathop:}= \theta_j - \alpha \ (h_\theta(x^{(i)}) - y^{(i)})\ x_j^{(i)} $$

Since the derivative of a sum is the sum of the derivatives, the update rule for the whole training set is:

$$ \theta_j \mathrel{\mathop:}= \theta_j - \alpha \ \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\ x_j^{(i)} $$
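As a rough illustration of the summed update rule, the sketch below performs one update of every $\theta_j$ using explicit loops that mirror the formula. The name `batch_update`, the toy data, and the value of the learning rate `alpha` are assumptions made for the example. All errors are computed with the old weights before any weight changes, so the $\theta_j$ are updated simultaneously.

```python
import numpy as np

def batch_update(X, y, theta, alpha):
    """One pass of theta_j := theta_j - alpha * sum_i (h_theta(x_i) - y_i) * x_ij."""
    m, n = X.shape
    y = np.asarray(y, dtype=float).ravel()
    old = theta.ravel()
    # Errors h_theta(x^(i)) - y^(i), all computed with the old weights
    errors = np.array([old @ X[i] - y[i] for i in range(m)])
    new_theta = theta.astype(float).copy()
    for j in range(n):
        grad_j = sum(errors[i] * X[i, j] for i in range(m))
        new_theta[j, 0] = theta[j, 0] - alpha * grad_j
    return new_theta

# Hypothetical toy data: first column is x_0 = 1
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

theta = np.zeros((2, 1))
theta = batch_update(X, y, theta, alpha=0.01)
print(theta.ravel())
```

Starting from zero weights the errors are $-1, -2, -3$, so the two gradient components are $-6$ and $-14$, and the weights move to roughly $0.06$ and $0.14$ for $\alpha = 0.01$.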

## Vectorization

### Input Representation

$$ \newcommand*{\horzbar}{\rule[.5ex]{5ex}{0.5pt}}
X = \begin{bmatrix}
\horzbar \ \ (x^{(1)})^T \ \ \horzbar \\
\horzbar \ \ (x^{(2)})^T \ \ \horzbar \\
\vdots  \\
\horzbar \ \ (x^{(m)})^T \ \ \horzbar \\
\end{bmatrix} \ \ \ , \ \ \ Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \ \ \ , \ \ \ \theta = \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{nx} \end{bmatrix}$$


### Compute Hypothesis

$$ \newcommand*{\horzbar}{\rule[.5ex]{5ex}{0.5pt}}
\begin{align} h_\theta(X) = X \theta  =  \ & \begin{bmatrix}
\horzbar \ \ (x^{(1)})^T \ \ \horzbar \\
\horzbar \ \ (x^{(2)})^T \ \ \horzbar \\
\vdots  \\
\horzbar \ \ (x^{(m)})^T \ \ \horzbar \\
\end{bmatrix} \cdot \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{nx} \end{bmatrix} = \begin{bmatrix} h_{\theta}(x^{(1)}) \\ h_{\theta}(x^{(2)}) \\ \vdots \\ h_{\theta}(x^{(m)}) \end{bmatrix}  \\ \\ &
 \ \ (m,n) \ \ \cdot  \ \ (n,1) \ \ = \ \  (m,1) \end{align} $$

$$ \begin{align} J(\theta) & = \frac{1}{2} \ (h_{\theta}(X) - Y)^T \ (h_{\theta}(X) - Y) \\  & = \frac{1}{2} \ (X\theta - Y)^T \ (X\theta - Y) \end{align} $$ 

$$ \begin{align} (X\theta - Y) =  & \begin{bmatrix} h_{\theta}(x^{(1)}) - y^{(1)} \\ h_{\theta}(x^{(2)}) - y^{(2)} \\ \vdots \\ h_{\theta}(x^{(m)}) - y^{(m)} \end{bmatrix} \\ \\  & \ \ \ \ \ \ \ \ \ \ \ (m,1) \end{align}$$

A row vector times a column vector gives a real number:
$$ \begin{align} (h_{\theta}(X) - Y)^T \ (h_{\theta}(X) - Y) & =  \begin{bmatrix} h_{\theta}(x^{(1)}) - y^{(1)} \ \ \ \ \ h_{\theta}(x^{(2)}) - y^{(2)} \ \ \ \ \ \cdots \ \ \ \ \ h_{\theta}(x^{(m)}) - y^{(m)} \end{bmatrix} \cdot \begin{bmatrix} h_{\theta}(x^{(1)}) - y^{(1)} \\ h_{\theta}(x^{(2)}) - y^{(2)} \\ \vdots \\ h_{\theta}(x^{(m)}) - y^{(m)} \end{bmatrix} \\ & = \sum_{i = 1}^{m} \ (h_{\theta}(x^{(i)}) - y^{(i)})^2 \end{align} $$
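The identity above can be checked numerically. The sketch below compares the vectorized cost $\frac{1}{2}(X\theta - Y)^T(X\theta - Y)$ with the explicit sum on random data; the function name `compute_cost_vectorized`, the array sizes, and the random seed are arbitrary choices for the illustration.

```python
import numpy as np

def compute_cost_vectorized(X, Y, theta):
    """J(theta) = 1/2 * (X theta - Y)^T (X theta - Y), with Y of shape (m, 1)."""
    residual = X @ theta - Y                       # shape (m, 1)
    return 0.5 * (residual.T @ residual).item()    # (1, m) @ (m, 1) -> scalar

rng = np.random.default_rng(0)
m, nx = 5, 2
# First column is the constant feature x_0 = 1
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, nx))])
Y = rng.normal(size=(m, 1))
theta = rng.normal(size=(nx + 1, 1))

# Same quantity as an explicit sum over the training examples
loop_cost = 0.5 * sum(((X[i] @ theta).item() - Y[i, 0]) ** 2 for i in range(m))
vec_cost = compute_cost_vectorized(X, Y, theta)
print(np.isclose(loop_cost, vec_cost))
```

One thing to watch for: a target array of shape `(m,)`, such as the one returned by `load_diabetes(return_X_y=True)`, needs to be reshaped with `reshape(-1, 1)` before the subtraction. Otherwise NumPy broadcasts `(m, 1) - (m,)` to an `(m, m)` matrix instead of a column of residuals.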
