### Normal Equation

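Normal equation: a method to solve for $\theta$ analytically.  
Intuition: for a single parameter, $J(\theta) = a\theta^2 + b\theta + c$ is minimized where its derivative is zero.  
In general, set  
$\frac{\partial}{\partial \theta_j}J(\theta_0, \theta_1, \dots, \theta_n) = 0$ for every $j = 0, \dots, n$ and solve the resulting system for $\theta_0, \dots, \theta_n$. No learning rate $\alpha$ is involved.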
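The one-parameter intuition can be checked numerically. The sketch below uses made-up coefficients `a`, `b`, `c` (not from the notes) and finds the minimum by setting $dJ/d\theta = 2a\theta + b = 0$:

```python
# Illustrative coefficients (hypothetical); a > 0 so the parabola has a minimum.
a, b, c = 2.0, -8.0, 3.0

def J(theta):
    """Quadratic cost J(theta) = a*theta^2 + b*theta + c."""
    return a * theta**2 + b * theta + c

# dJ/dtheta = 2*a*theta + b = 0  =>  theta = -b / (2*a)
theta_star = -b / (2 * a)
print(theta_star, J(theta_star))  # 2.0 -5.0

# Moving away from theta_star in either direction increases J.
print(J(theta_star) < min(J(theta_star - 0.1), J(theta_star + 0.1)))  # True
```

The normal equation does the same thing for $n + 1$ parameters at once: all partial derivatives are set to zero simultaneously.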

$\theta = (X^TX)^{-1}X^Ty$
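Where the formula comes from, assuming the usual squared-error cost for linear regression (not restated in this section) and the design matrix $X$ defined further down:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2 = \frac{1}{2m}(X\theta - y)^T(X\theta - y)
$$

$$
\nabla_\theta J(\theta) = \frac{1}{m}X^T(X\theta - y) = 0
\quad\Longrightarrow\quad X^TX\theta = X^Ty
\quad\Longrightarrow\quad \theta = (X^TX)^{-1}X^Ty
$$

The last step requires $X^TX$ to be invertible.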

#### Example

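```python
import pandas as pd
from numpy import dot
from numpy.linalg import inv


m = 4  # number of training examples

data = pd.DataFrame({
    'x0': [1, 1, 1, 1],  # x_0 = 1 for every example
    'size': [2014, 1416, 1534, 852],
    'num_bedrooms': [5, 3, 3, 2],
    'num_floors': [1, 2, 2, 1],
    'age_home': [45, 40, 30, 36],
    'price': [460,  232, 315, 178],
})
features = ['x0', 'size', 'num_bedrooms', 'num_floors', 'age_home']
# DataFrame.as_matrix() was removed in pandas 1.0; select columns and use to_numpy()
X = data[features].to_numpy()
y = data[['price']].to_numpy()
print("### X ###\n", X)
print("### y ###\n", y)
# Normal equation: theta = (X^T X)^{-1} X^T y
# NOTE: m = 4 < n + 1 = 5, so X^T X is singular. inv() may raise LinAlgError or,
# as here, return numerically unreliable values: the printed theta does not
# reproduce the training prices (see Noninvertibility below).
theta = inv(dot(X.T, X)).dot(X.T).dot(y)
print("### theta ###\n", theta)
```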

```
### X ###
 [[   1 2014    5    1   45]
 [   1 1416    3    2   40]
 [   1 1534    3    2   30]
 [   1  852    2    1   36]]
### y ###
 [[460]
 [232]
 [315]
 [178]]
### theta ###
 [[ 2.99636719e+02]
 [-2.73181915e-01]
 [ 1.44888672e+02]
 [-1.14130859e+01]
 [-7.47497559e+00]]
```
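This example has $m = 4$ examples but $n + 1 = 5$ parameters, so it is exactly the "too many features" case discussed under noninvertibility: $X^TX$ is $5 \times 5$ with rank at most 4. A more robust sketch, assuming NumPy, uses `np.linalg.lstsq`, which solves the least-squares problem via SVD and, when several solutions exist, returns the one with the smallest norm:

```python
import numpy as np

# Same data as the example above: x0, size, bedrooms, floors, age
X = np.array([
    [1, 2014, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)
y = np.array([[460], [232], [315], [178]], dtype=float)

# lstsq minimizes ||X theta - y|| without forming or inverting X^T X;
# for rank-deficient X it returns the minimum-norm solution.
theta, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

print("rank of X:", rank)                                # 4, fewer than the 5 columns
print("fits training data:", np.allclose(X @ theta, y))  # True
```

With more parameters than examples the training prices are matched exactly, which says nothing about how well this $\theta$ generalizes.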


$m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features   

$ x^{(i)} = \begin{bmatrix}x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)}\end{bmatrix} \in \mathbb{R}^{n+1}$



$ X = \begin{bmatrix}(x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T\end{bmatrix} \in \mathbb{R}^{m \times (n+1)}$ &nbsp;&nbsp;&nbsp;&nbsp; (the first column is filled with ones, since $x_0^{(i)} = 1$)
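Building $X$ in code amounts to prepending a column of ones to the raw feature matrix. A minimal NumPy sketch, using two of the features from the example (size and number of bedrooms) as illustrative input:

```python
import numpy as np

# Raw feature values: one row per training example, n = 2 features (size, bedrooms).
features = np.array([[2014, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)
m, n = features.shape

# Prepend the x_0 = 1 column to get the m x (n + 1) design matrix X.
X = np.column_stack([np.ones(m), features])
print(X.shape)  # (4, 3)
```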


$\theta = (X^TX)^{-1}X^Ty$

Feature scaling is not needed with the normal equation.
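One way to see why: with an intercept column present, rescaling a feature only changes the units of $\theta$, not the fitted values. The NumPy sketch below uses size and number of bedrooms from the example (so $m = 4$ examples and 3 parameters, and $X^TX$ is invertible) and compares raw features with mean-normalized ones. It uses `np.linalg.solve` on $X^TX\theta = X^Ty$ instead of forming the inverse explicitly, which gives the same $\theta$ when $X^TX$ is invertible:

```python
import numpy as np

# Two features from the example (size, bedrooms) and the prices.
features = np.array([[2014, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

def normal_equation(F, y):
    """Build X = [1, F] and solve X^T X theta = X^T y."""
    X = np.column_stack([np.ones(len(F)), F])
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    return X, theta

X_raw, theta_raw = normal_equation(features, y)

# Mean normalization, as one would do before gradient descent.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
X_scaled, theta_scaled = normal_equation(scaled, y)

# theta differs (different units), but the fitted values are the same.
print(np.allclose(X_raw @ theta_raw, X_scaled @ theta_scaled))  # True
```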

$m$ training examples, $n$ features

Gradient descent
* needs a choice of $\alpha$
* needs many iterations
* works well even when $n$ is large

Normal Equation
* no need to choose $\alpha$
* no need to iterate
* needs to compute $(X^TX)^{-1}$; $X^TX$ is $(n+1) \times (n+1)$, and inverting it costs roughly $O(n^3)$
* slow if $n$ is very large (from about $n = 10000$ upwards, choose gradient descent)
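A small side-by-side sketch of these trade-offs, on a hypothetical one-feature data set where $y = 1 + 2x$ exactly. The normal equation is a single linear solve; batch gradient descent on $J(\theta) = \frac{1}{2m}\lVert X\theta - y \rVert^2$ needs a learning rate and many iterations to reach the same $\theta$. The values of `alpha` and `num_iters` are illustrative choices, not recommendations:

```python
import numpy as np

# Hypothetical data with one small-scale feature; y = 1 + 2x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])
m = len(y)

# Normal equation: one linear solve, no learning rate, no iterations.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: gradient of J is (1/m) * X^T (X theta - y).
alpha, num_iters = 0.1, 1000
theta_gd = np.zeros(2)
for _ in range(num_iters):
    theta_gd -= alpha * (X.T @ (X @ theta_gd - y)) / m

print(theta_ne)  # [1. 2.]
print(theta_gd)  # approximately [1. 2.]
```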

### Normal Equation Noninvertibility

$\theta = (X^TX)^{-1}X^Ty$

* What if $X^TX$ is non-invertible (singular/degenerate)?
* Octave: `pinv(X'*X)*X'*y` still computes $\theta$ when $X^TX$ is not invertible (`pinv` is the pseudoinverse, unlike `inv`)
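A NumPy counterpart of the Octave one-liner is `np.linalg.pinv(X.T @ X) @ X.T @ y`. The sketch below applies it to a small hypothetical data set in which one feature is exactly twice another, so $X^TX$ is singular. Among the infinitely many $\theta$ that fit equally well, the pseudoinverse returns the one with the smallest norm. The explicit `rcond` cutoff is an assumption, chosen so that the numerically-zero singular value of $X^TX$ is reliably discarded:

```python
import numpy as np

# Hypothetical data: x2 = 2 * x1 exactly, and y = 1 + 2 * x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x1, 2 * x1])
y = np.array([3.0, 5.0, 7.0, 9.0])

# NumPy version of Octave's pinv(X'*X)*X'*y; singular values below
# rcond * (largest singular value) are treated as zero.
theta = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y
print(theta)  # approximately [1.  0.4 0.8]
```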

Common causes:
* Redundant features (linearly dependent)
* Too many features (e.g. $m \leq n$): delete some features or use regularization
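Both remedies can be sketched in NumPy on a small hypothetical data set with a redundant feature ($x_2 = 2x_1$). The explicit rank tolerance `tol` and the regularization strength `lam` are illustrative assumptions. The regularized form $\theta = (X^TX + \lambda L)^{-1}X^Ty$, where $L$ is the identity except that $L_{00} = 0$ so $\theta_0$ is not penalized, is the standard ridge variant of the normal equation; for $\lambda > 0$ that matrix is invertible:

```python
import numpy as np

# Hypothetical data: x2 = 2 * x1 exactly (redundant), y = 1 + 2 * x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x1, 2 * x1])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Detect the problem: rank(X) below the number of columns means X^T X is singular.
rank = np.linalg.matrix_rank(X, tol=1e-8)
print("rank", rank, "of", X.shape[1], "columns")  # rank 2 of 3 columns

# Remedy 1: delete the redundant feature; X^T X becomes invertible.
X_reduced = X[:, :2]
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)
print(theta_reduced)  # approximately [1. 2.]

# Remedy 2: regularization (ridge), leaving theta_0 unpenalized.
lam = 0.1
L = np.eye(X.shape[1])
L[0, 0] = 0.0
theta_ridge = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta_ridge)
```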