# Simple Linear Regression: Neural Net Style

We'll begin our journey into neural networks with simple linear regression, but with a little twist: we'll fit the regression and learn its coefficients as if it were a neural network, using gradient descent. We'll use both forward propagation and backward propagation (these terms should become clearer soon), which should (hopefully) make the transition to neural networks more seamless.

## Standard Simple Linear Regression

Recall that simple linear regression can be denoted with the following formula: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/equations/simp_linear.png?raw=true" width=110 \>

where 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/x1.png?raw=true" width=20 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" \> is a column vector of individual observations. Before solving for 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta0.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta1.png?raw=true" width=20 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" \>, we'll need to define an error metric. We'll use **squared error**, as is common in linear regression. Thus, the error for an *individual observation* is given by the following formula: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/equations/ind_squared_error.png?raw=true" width=115 \>

where 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/yi.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" width=17 \> 
is the true value, and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/yhati.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" width=16 \> 
is our predicted value. Given this error for an individual observation, the average error (**mean squared error**) across *all observations* is: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/equations/agg_squared_error.png?raw=true" width=150 \>
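As a minimal sketch of the two error formulas above, the snippet below computes the per-observation squared error and its mean with `numpy`. The function names and toy values are hypothetical, and it assumes the plain form of squared error (no extra scaling factor such as 1/2).

```python
import numpy as np


def squared_errors(y_true, y_pred):
    """Per-observation squared error: (y_i - y_hat_i)^2 (assumes no 1/2 factor)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (y_true - y_pred) ** 2


def mean_squared_error(y_true, y_pred):
    """Average of the per-observation squared errors across all observations."""
    return squared_errors(y_true, y_pred).mean()


# Toy values, made up purely for illustration.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = squared_errors(y_true, y_pred)    # one error per observation
mse = mean_squared_error(y_true, y_pred)   # a single number
print(errors, mse)
```

Keeping the per-observation errors separate from their mean mirrors the distinction the text draws between an *individual observation* and *all observations*.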

Typically, we'll [solve simple linear regression](https://isites.harvard.edu/fs/docs/icb.topic515975.files/OLSDerivation.pdf) (without regularization) by minimizing this mean squared error, which yields the closed-form solution for
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta0.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta1.png?raw=true" width=22 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" \>: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/equations/beta1_soln_simp_linear.png?raw=true" width=160 \>  
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/equations/beta0_soln_simp_linear.png?raw=true" width=120 \>
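For reference, here is a rough sketch of that closed-form solution in `numpy`. It assumes the standard ordinary least squares expressions (slope as the sum of cross-deviations over the sum of squared deviations of x, intercept from the means); the function name `fit_closed_form` and the toy data are hypothetical.

```python
import numpy as np


def fit_closed_form(x, y):
    """Closed-form OLS estimates for y = beta0 + beta1 * x (standard formulas assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the fitted line through the point (x_mean, y_mean).
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Noise-free toy data generated from y = 1 + 2x, so the fit should recover those values.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

beta0, beta1 = fit_closed_form(x, y)
print(beta0, beta1)
```

Having these values on hand is useful later as a check: a gradient descent fit on the same data should approach the same coefficients.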

It turns out that we could also solve our linear regression problem using gradient descent...

## Simple Linear Regression using Gradient Descent

### Simple Linear Regression in a Computational Graph

Before solving our simple linear regression problem using gradient descent, it'll be useful to view it as a computational graph. This will be our first step towards talking about neural networks, as we'll visualize those in a similar manner. Using a computational graph, we can visualize our simple linear regression problem as follows: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/custom/simp_linear_comp_graph.png?raw=true" align="center" width="400"\>

Here, we're using blocks (**nodes**) to denote each of the parts of our simple linear regression equation, and lines to denote **connections** between nodes. We can begin reading our computational graph (read left to right) by starting with 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta0.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>, 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=22 \> , and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/x1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>,
each of which gets its own node. Next, we have a multiplication node to denote that 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/x1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
are multiplied together. Finally, the summation node denotes that the product of 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=22 \>  and 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/x1.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>  is summed with 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/beta0.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> to obtain 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/y.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=14 \> . 
Next, we'll use this graph to frame a solution via gradient descent. 

### Solving Simple Linear Regression via Gradient Descent 

We'll formalize the steps of gradient descent in the next notebook; for now, we'll focus on two of them: the **forward propagation** step and the **backward propagation** step. These steps will come up often when working with neural networks. 

### Forward Propagation

In forward propagation, we are simply going to read the computational graph from **left to right** (as we did before), using the calculations it gives to compute our output (which in this case is 
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/variables/yhat.png?raw=true" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=13 \>).

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/custom/simp_linear_comp_graph_forprop.png?raw=true" width=400\>
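The left-to-right reading of the graph can be sketched as one line of code per node. This is only an illustration: the function name `forward` and the coefficient values are made up.

```python
import numpy as np


def forward(x, beta0, beta1):
    """Read the computational graph left to right, one node at a time."""
    product = beta1 * x        # multiplication node: beta1 times x
    y_hat = product + beta0    # summation node: add beta0 to the product
    return y_hat


# Hypothetical inputs and coefficients, chosen only to show the flow.
x = np.array([0.0, 1.0, 2.0])
y_hat = forward(x, beta0=0.5, beta1=2.0)
print(y_hat)
```

Because `x` holds all observations at once, a single forward pass produces a prediction for every observation.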

### Backward Propagation

In backward propagation, we'll read our computational graph from **right to left**. First, between the forward and backward passes, we'll calculate the error for each observation. Next, we'll calculate the gradient of each of those errors with respect to each coefficient. Finally, we'll use the mean of those gradients to update the coefficients. To compute that mean, we'll need the following two quantities for each observation: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/derivatives/ei_beta0.png?raw=true" width=30\>
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/derivatives/ei_beta1.png?raw=true" width=30\>

To obtain these quantities, we'll use the chain rule: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/derivatives/ei_beta0_chain.png?raw=true" width=120\>
<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/derivatives/ei_beta1_chain.png?raw=true" width=110\>

Visually, this step would look something like the following: 

<img src="https://github.com/sallamander/neural-networks-intro/blob/master/mini-books/shallow-neural-networks/imgs/custom/simp_linear_comp_graph_backprop.png?raw=true" width=400\>
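As a rough preview, a single backward pass might be sketched as follows. This is not the formal procedure: it assumes the plain squared error (y_i - y_hat_i)^2, and the learning rate, starting coefficients, and function name are hypothetical choices made only for illustration.

```python
import numpy as np


def backward(x, y, y_hat):
    """Chain rule per observation, then average over observations."""
    # d(error_i)/d(y_hat_i), assuming error_i = (y_i - y_hat_i)^2.
    d_error_d_yhat = -2.0 * (y - y_hat)
    # d(y_hat_i)/d(beta0) = 1 and d(y_hat_i)/d(beta1) = x_i.
    d_yhat_d_beta0 = np.ones_like(x)
    d_yhat_d_beta1 = x
    # Multiply along each path in the graph, then take the mean gradient.
    grad_beta0 = np.mean(d_error_d_yhat * d_yhat_d_beta0)
    grad_beta1 = np.mean(d_error_d_yhat * d_yhat_d_beta1)
    return grad_beta0, grad_beta1


# Toy data from y = 1 + 2x; the starting values and step size are assumptions.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
beta0, beta1 = 0.0, 0.0
learning_rate = 0.05

y_hat = beta0 + beta1 * x                      # forward propagation
mse_before = np.mean((y - y_hat) ** 2)

grad_beta0, grad_beta1 = backward(x, y, y_hat)  # backward propagation
beta0 -= learning_rate * grad_beta0             # step against the gradient
beta1 -= learning_rate * grad_beta1

mse_after = np.mean((y - (beta0 + beta1 * x)) ** 2)
print(mse_before, mse_after)
```

With these toy values, one update moves both coefficients toward the values that generated the data and lowers the mean squared error; repeating the forward and backward passes is what gradient descent does.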

In the next notebook, we'll formalize our gradient descent procedure, actually walk through calculating those derivatives, and then code it up using `numpy`. 