# Linear Regression with Multiple Variables

Linear regression with multiple variables is also known as "multivariate linear regression". The following notation lets us write equations with any number of input variables:

* $x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
* $x^{(i)}$ = the column vector of all the feature inputs of the $i^{th}$ training example
* $m$ = the number of training examples
* $n = \left| x^{(i)} \right|$ = the number of features (not counting the constant $x_0$ introduced below)

We now define the multivariable form of the hypothesis function, which accommodates these multiple features:

$$ h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n $$

In order to develop intuition about this function, we can think about $\theta_0$ as the basic price of a house, $\theta_1$ as the price per square meter, $\theta_2$ as the price per floor, etc. $x_1$ will be the number of square meters in the house, $x_2$ the number of floors, etc.
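As a quick check on this intuition, the sketch below evaluates the hypothesis for one house. The parameter values (a base price of 50, 2.5 per square meter and 10 per floor, all in thousands) and the helper name `hypothesis` are hypothetical, chosen only so the arithmetic is easy to follow: $50 + 2.5 \cdot 120 + 10 \cdot 2 = 370$.

```python
def hypothesis(theta, features):
    """h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    # theta[0] is the base price; theta[1:] pair up one-to-one with the features.
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))

# Hypothetical parameters (in thousands): base price, price per m^2, price per floor.
theta = [50.0, 2.5, 10.0]
# Hypothetical house: x_1 = 120 square meters, x_2 = 2 floors.
features = [120.0, 2.0]

price = hypothesis(theta, features)
print(price)  # 370.0
```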

Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as:

\begin{equation}
h_{\theta}(x) =
\begin{bmatrix}
\theta_0 & \theta_1 & \cdots & \theta_n
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n
\end{bmatrix}
= \theta^T x
\end{equation}

This is a vectorization of our hypothesis function for one training example; see the lessons on vectorization to learn more.

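Remark: for convenience, this course (following Mr. Ng) assumes $x_0^{(i)} = 1$ for every training example $i$, so the term $\theta_0 x_0$ reduces to $\theta_0$ and $\theta^T x$ equals the hypothesis defined above.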
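With $x_0 = 1$ prepended to the feature vector, the hypothesis becomes a single inner product $\theta^T x$. A minimal sketch in plain Python, using hypothetical parameters $\theta = (50, 2.5, 10)$ and a hypothetical 120 square meter, two-floor house; `dot` and `hypothesis_vectorized` are illustrative names, not course code:

```python
def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def hypothesis_vectorized(theta, features):
    """h_theta(x) = theta^T x, where x = [x_0, x_1, ..., x_n] and x_0 = 1."""
    x = [1.0] + list(features)  # prepend the constant feature x_0 = 1
    return dot(theta, x)

theta = [50.0, 2.5, 10.0]  # hypothetical theta_0, theta_1, theta_2
print(hypothesis_vectorized(theta, [120.0, 2.0]))  # 370.0
```

Prepending $x_0$ is what lets $\theta_0$ ride along in the same product instead of being added separately.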

The training examples are stored row-wise in $X$, one example per row. For instance, with $m = 3$ examples and a single feature ($n = 1$):

\begin{equation}
X = 
\begin{bmatrix} 
x^{(1)}_0 & x^{(1)}_1 \\
x^{(2)}_0 & x^{(2)}_1 \\
x^{(3)}_0 & x^{(3)}_1
\end{bmatrix},
\theta =
\begin{bmatrix}
\theta_0 \\
\theta_1
\end{bmatrix}
\end{equation}

You can calculate the hypothesis for all $m$ training examples at once, as an $m \times 1$ column vector, with:
\begin{equation}
h_\theta(X) = X \theta
\end{equation}
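Each entry of $X\theta$ is the inner product of one row of $X$ (one training example) with $\theta$. The sketch below computes it in plain Python for a small hypothetical $3 \times 2$ design matrix whose first column is $x_0 = 1$; `predict_all` and the numbers are illustrative only.

```python
def predict_all(X, theta):
    """Return h_theta(X) = X theta as a list of length m (one prediction per row of X)."""
    return [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]

# Hypothetical design matrix: m = 3 examples, n = 1 feature; column 0 is x_0 = 1.
X = [
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
theta = [1.0, 2.0]  # hypothetical theta_0, theta_1

print(predict_all(X, theta))  # [5.0, 7.0, 11.0]
```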

## Cost function

For the parameter vector $\theta \in \mathbb{R}^{n+1}$, the cost function is:

\begin{equation}
J(\theta) = \frac {1}{2m}  \sum_{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2
\end{equation}
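A direct, loop-based reading of this formula, assuming each row of $X$ already includes $x_0 = 1$. The data and the name `cost` are hypothetical; with predictions $(5, 7, 11)$ against targets $(5, 8, 11)$ the only error is $-1$, so $J = \frac{1}{2 \cdot 3} \cdot 1 = \frac{1}{6}$.

```python
def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2, using an explicit loop."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sum(a * b for a, b in zip(x_i, theta))  # h_theta(x^(i)) = theta^T x^(i)
        total += (h - y_i) ** 2                     # squared error for example i
    return total / (2 * m)

# Hypothetical data: 3 examples, first column is x_0 = 1.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
y = [5.0, 8.0, 11.0]
theta = [1.0, 2.0]

print(cost(X, y, theta))  # 0.16666666666666666
```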

The vectorized version is:

\begin{equation}
J(\theta) = \frac {1}{2m} (X\theta - \vec{y})^{T} (X\theta - \vec{y})
\end{equation}

where $\vec{y}$ denotes the $m \times 1$ column vector of all target values $y^{(i)}$.
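The vectorized formula can be mirrored step by step: form the residual vector $X\theta - \vec{y}$, then take its inner product with itself and divide by $2m$. Plain Python has no vectorized arithmetic, so this sketch only illustrates the structure of the formula; with NumPy arrays the same quantity could be written `(X @ theta - y) @ (X @ theta - y) / (2 * m)`. The helper names `matvec` and `cost_vectorized` and the small dataset are hypothetical.

```python
def matvec(X, v):
    """Matrix-vector product X v, with X given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def cost_vectorized(X, y, theta):
    """J(theta) = 1/(2m) * (X theta - y)^T (X theta - y)."""
    m = len(y)
    residual = [h - y_i for h, y_i in zip(matvec(X, theta), y)]  # X theta - y
    return sum(r * r for r in residual) / (2 * m)                # r^T r / (2m)

# Hypothetical data: 3 examples, first column is x_0 = 1.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
y = [5.0, 8.0, 11.0]
theta = [1.0, 2.0]

print(cost_vectorized(X, y, theta))  # 0.16666666666666666
```

For the same data this gives the same value as the summation form, since $(X\theta - \vec{y})^T (X\theta - \vec{y})$ is exactly the sum of squared errors.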

## Gradient Descent for Multiple Variables

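The gradient descent update has the same general form as in the single-variable case; we repeat it until convergence for each of the $n + 1$ parameters $\theta_0, \dots, \theta_n$, updating all of them simultaneously: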

\begin{equation}
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)} \qquad \text{for } j := 0, \dots, n
\end{equation}
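A minimal sketch of batch gradient descent built directly from this update rule, in plain Python. It assumes each row of $X$ starts with $x_0 = 1$; the function name `gradient_descent`, the learning rate $\alpha = 0.1$, the iteration count, and the toy data (generated exactly by $y = 1 + 2x_1$) are all hypothetical choices for illustration. All partial derivatives are computed from the same errors before any $\theta_j$ changes, which is what makes the update simultaneous.

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression.

    X: m rows, each [x_0 = 1, x_1, ..., x_n]; y: m targets;
    theta: n + 1 initial parameters. Returns the updated parameters.
    """
    m = len(y)
    theta = list(theta)  # copy so the caller's list is not modified
    for _ in range(num_iters):
        # errors_i = h_theta(x^(i)) - y^(i), all computed with the current theta
        errors = [sum(a * b for a, b in zip(row, theta)) - y_i for row, y_i in zip(X, y)]
        # grad_j = (1/m) * sum_i errors_i * x_j^(i), for j = 0..n
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(len(theta))]
        # simultaneous update of every theta_j
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical data generated exactly by y = 1 + 2 * x_1 (first column is x_0 = 1).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, num_iters=2000)
print([round(t, 4) for t in theta])  # [1.0, 2.0]
```

With this data the parameters converge to approximately $\theta = (1, 2)$; a learning rate that is too large would make the iterates diverge instead.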