In [1]:
%run Latex_macros.ipynb
%run beautify_plots.py

# Regression

Given examples $\langle \X, \y \rangle$, a *regression task* is to predict
- a continuous $\y$
- from a vector of features $\x$

This differs from a *Classification* task (e.g., predicting the digit represented by an image)
- where the $\y$ are *discrete* values

To be concrete: imagine we need to predict the Price $\hat{\y}$ of a house given only its Size $\x$.

We could imagine an approach similar to the KNN algorithm used for classification
- compare $\x$ to each $\x^\ip$ in the training set $\X$
    - measure the "distance" from $\x$ to $\x^\ip$ to come up with a weight
- predict $\hat{\y}$ as the weighted average of the $\y^\ip$
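
The weighted-average idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `weighted_neighbor_predict`, the choice of inverse-distance weights, and the Size/Price numbers are all made up for the example and are not part of the lecture.

```python
import numpy as np

def weighted_neighbor_predict(x, X_train, y_train, eps=1e-8):
    """Predict y for one query x as a distance-weighted average of training labels."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from the query to each training example
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Inverse-distance weights (an assumed choice): closer examples count more
    weights = 1.0 / (dists + eps)
    return np.sum(weights * y_train) / np.sum(weights)

# Made-up training set: one feature (Size) per example, with its Price
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([100.0, 200.0, 300.0])

y_hat = weighted_neighbor_predict([2.0], X_train, y_train)
print(y_hat)
```

Note that the function needs the whole training set at prediction time, which is the criticism raised next.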

A strong criticism of KNN is that its parameters $\Theta$ comprise all $m$ training examples
- $\Theta$ is large: it grows with the size of the training set
- the model memorizes the training examples rather than generalizing from them

The fact that $\y$ is *continuous* rather than discrete
- opens the possibility of a *numerical* relationship
between features $\x$ and labels $\y$.

We will take advantage of this in our first Regression model.

# Linear Regression

Our first predictor/estimator/model is called Linear Regression.

*Linear Regression* restricts the form of the relationship between $\y$ and $\x$ to
$$
\hat{\y} = \Theta^T \cdot \x
$$

That is: the predicted $\hat{\y}$ is a linearly-weighted (with weights from vector $\Theta$) sum of features $\x$.

Anyone who has fit a straight line to a cloud of points has performed Linear Regression.

A straight line has intercept $\Theta_0$ and slope $\Theta_1$
$$
\hat{\y} = \Theta_0 + \Theta_1 \cdot \x_1
$$


<table>
    <tr>
        <th><center>Fitting a model</center></th>
    </tr>
    <tr>
        <td><img src="images/W2_L0_S4_Terminology_training.png" width="60%"></td>
    </tr>
</table>

In our example
- we expect the Price to increase with Size $\x_1$
    - $\Theta_1$ tells us how much each extra unit of Size increases the Price

Rather than writing the intercept $\Theta_0$ as a separate term we can modify $\x$ and $\Theta$

$$
\begin{array}{lll}
\Theta^T & = & (\Theta_0, \Theta_1) \\
\x'^T     & = & (1, \x_1) \\
\end{array}
$$

so that the straight line may be written as
$$
\hat{\y} = \Theta^T \cdot \x'
$$

Because the sizes of $\Theta^T$ and the feature vector must match
- we augment $\x$ with a "constant" feature 1, giving $\x'$
    - that corresponds to the intercept

<table>
    <tr>
        <th><center>Fitting a Linear Regression model</center></th>
    </tr>
    <tr>
        <td><img src="images/W1_L4_S11_Terminology_training_linear_regr.png"  width="60%"></td>
    </tr>
</table>

The real power of Linear Regression can be seen when there is more than one non-constant feature.
- Predict Price given features Size, Number of bedrooms, Number of bathrooms, Proximity to transportation
- $\Theta_j$ tells us how much each unit increase in feature $\x_j$ affects Price.

The prediction $\hat{\y}$ is linear in each feature $\x_j$, hence the name *linear* regression.

Anyone recognize this expression: $\Theta^T \cdot \x$ ?

It's our friend the dot product, as promised in the introductory lecture.

Watch out, this will be a regularly recurring character in our series.


## Linear Regression in matrix form

We will typically augment $\x$ with the leading "constant feature 1" to capture the intercept.

$$
\begin{array}{lll}
\Theta^T & = & (\Theta_0, \Theta_1, \ldots, \Theta_n) \\
\x'^T     & = & (1, \x_1, \ldots, \x_n) \\
\end{array}
$$



We do this for each example in $\X$ so that $\X$ becomes

$$
\X' =
\begin{pmatrix}
   1 &\x^{(1)}_1  &\ldots  &\x^{(1)}_n \\
   1 &\x^{(2)}_1  &\ldots  &\x^{(2)}_n \\
   \vdots & \vdots & \ddots &  \vdots \\
   1 &\x^{(m)}_1  &\ldots  &\x^{(m)}_n \\
\end{pmatrix}
$$

We sometimes refer to $\X'$ as the *design matrix*.

So we could simultaneously obtain our prediction for *all* training examples by the matrix product

$$
\hat{\y} = \X' \Theta
$$

Using matrix notation 
- mimics an implementation using a library (such as `NumPy`) with matrix arithmetic
- allows us to evaluate examples in parallel
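
The matrix form might look like this in NumPy. It is a sketch only: the feature values and $\Theta$ are made up, and the variable names `X_aug` and `theta` are illustrative.

```python
import numpy as np

# Made-up examples: m = 3 houses, n = 2 features (Size, Number of bedrooms)
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 4.0]])

# Hypothetical parameters (Theta_0, Theta_1, Theta_2)
theta = np.array([10.0, 100.0, 5.0])

# Build X' by prepending a column of ones (the constant feature) to X
m = X.shape[0]
X_aug = np.hstack([np.ones((m, 1)), X])

# One matrix product yields the predictions for all m examples at once
y_hat = X_aug @ theta
print(y_hat)
```

Each entry of `y_hat` equals the dot product of $\Theta$ with the corresponding row of $\X'$, so no explicit loop over examples is needed.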

## Examples

Some examples
- Predict the Price of a stock given Earnings (one feature: $n = 1$)
- Predict the Price of a stock given Earnings, Dividend, and Sales (three features: $n = 3$)


In [2]:
print("Done")

Done
