# Simple Linear Regression: Neural Net Style

We're going to begin our journey into neural networks with simple linear regression, but with a little twist: we'll fit the model and learn its coefficients as if it were a neural network, using gradient descent. Along the way we'll use both forward propagation and backward propagation (these terms should become clearer soon), which will (hopefully) make the transition to talking about neural networks more seamless. 

## Standard Simple Linear Regression

Recall that simple linear regression can be denoted with the following formula: 

<img src="imgs/equations/simp_linear.png" width=110 \>

where 
<img src="imgs/variables/x1.png" width=20 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" \> is a column vector of individual observations. Before solving for our 
<img src="imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
and 
<img src="imgs/variables/beta1.png" width=20 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" \>, we'll need to define an error metric. We'll use **squared error**, as is common in linear regression. Our error for an *individual observation* is then given by the following formula: 

<img src="imgs/equations/ind_squared_error.png" width=115 \>

where 
<img src="imgs/variables/yi.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" width=14 \> 
is our true value, and 
<img src="imgs/variables/yhati.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0" width=13 \> 
is our predicted value. Given this error for an individual observation, the average error (**mean squared error**) across *all observations* is: 

<img src="imgs/equations/agg_squared_error.png" width=150 \>
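
To make these two formulas concrete, here is a minimal NumPy sketch. The helper names `squared_errors` and `mse` and the toy arrays are our own illustrative choices, not part of any library API.

```python
import numpy as np

def squared_errors(y, y_hat):
    """Squared error for each individual observation: (y_i - y_hat_i)^2."""
    return (y - y_hat) ** 2

def mse(y, y_hat):
    """Mean squared error: the average of the individual squared errors."""
    return np.mean(squared_errors(y, y_hat))

y = np.array([3.0, 5.0, 7.0])       # true values (toy data)
y_hat = np.array([2.5, 5.0, 8.0])   # predicted values (toy data)
print(squared_errors(y, y_hat))
print(mse(y, y_hat))
```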

Typically, we solve linear regression (without regularization) by minimizing this mean squared error analytically, which gives the following closed-form solution for our coefficients: 

<img src="imgs/equations/beta1_soln_simp_linear.png" width=160 \>  
<img src="imgs/equations/beta0_soln_simp_linear.png" width=120 \>
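
These estimates can be computed directly. The sketch below assumes the usual least-squares form of the two formulas above, `beta1 = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar)**2)` and `beta0 = y_bar - beta1 * x_bar`; the helper name `closed_form_fit` and the toy data are hypothetical.

```python
import numpy as np

def closed_form_fit(x, y):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Noiseless toy data generated from y = 1 + 2x, so the fit should recover (1, 2)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
beta0, beta1 = closed_form_fit(x, y)
print(beta0, beta1)
```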

It turns out that we can also solve our linear regression problem iteratively, using gradient descent...

## Simple Linear Regression using Gradient Descent

### Simple Linear Regression in a Computational Graph

Before solving our linear regression problem with gradient descent, it'll be useful to view it as a computational graph. This will be our first step towards talking about neural networks, since we'll visualize those in a similar manner. As a computational graph, our simple linear regression problem looks like this: 

<img src="imgs/custom/simp_linear_comp_graph.png" align="center" width="400"\>

Here, we use blocks (**nodes**) to represent each part of our linear regression equation in the computational graph, which is read from left to right. 
<img src="imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>, 
<img src="imgs/variables/beta1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>, and 
<img src="imgs/variables/x1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
each get their own node. Then, a multiplication node denotes that 
<img src="imgs/variables/beta1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> and 
<img src="imgs/variables/x1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> 
are multiplied together. Finally, the summation node denotes that the product of 
<img src="imgs/variables/beta1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> and 
<img src="imgs/variables/x1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> is summed with 
<img src="imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> to obtain 
<img src="imgs/variables/y.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=14 \>. 
Next, we'll use this graph as the basis for a gradient descent solution. 
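
One way to make the graph concrete is to store it as data and evaluate its operation nodes in left-to-right order. The dictionary encoding and the `evaluate` helper below are a hypothetical sketch for a single observation, not a standard library structure.

```python
# Hypothetical encoding of the simple linear regression graph:
# each operation node maps to (operation, left input node, right input node).
graph = {
    "mul": ("*", "beta1", "x1"),   # multiplication node: beta1 * x1
    "add": ("+", "beta0", "mul"),  # summation node: beta0 + beta1 * x1 -> y
}
order = ["mul", "add"]  # operation nodes in left-to-right order

def evaluate(graph, order, inputs):
    """Evaluate each operation node in order and return every node's value."""
    values = dict(inputs)  # start from the input nodes beta0, beta1, x1
    for name in order:
        op, left, right = graph[name]
        a, b = values[left], values[right]
        # Only the two operations this graph needs are supported
        values[name] = a * b if op == "*" else a + b
    return values

values = evaluate(graph, order, {"beta0": 1.0, "beta1": 2.0, "x1": 3.0})
print(values["mul"], values["add"])  # 6.0 7.0
```

Keeping every intermediate value (here, the output of the multiplication node) is what makes it possible to walk back through the graph afterwards.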

### Solving Simple Linear Regression via Gradient Descent 

Solving our linear regression problem using gradient descent will involve several steps: 

1. Randomly initialize values for our coefficients, 
<img src="imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \> and
<img src="imgs/variables/beta1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>    
2. While we haven't met a stopping condition (for example, a maximum number of iterations):  
 A. Calculate our predicted outcomes, 
<img src="imgs/variables/yhat.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=13 \>. 
<img src="imgs/variables/yhat.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=13 \> 
will be the result of plugging in our current values for 
<img src="imgs/variables/beta0.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>
and 
<img src="imgs/variables/beta1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>, as well as our values for 
<img src="imgs/variables/x1.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=20 \>, into our simple linear regression equation (<img src="imgs/equations/simp_linear.png" width=100 style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" \>). 
This is the **forward propagation** step.    
 B. For each of our observations, calculate our error using the true values
<img src="imgs/variables/yi.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=14 \>, our predicted values 
<img src="imgs/variables/yhati.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=13 \>, and our error formula: 
<img src="imgs/equations/ind_squared_error.png" width=110 \>  
 C. For each observation, calculate the gradient of its error with respect to each of our coefficients, then update each coefficient using the mean of these gradients (**backward propagation**).  
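
Put together, the steps above can be sketched as a short NumPy loop. This is a minimal sketch under our own assumptions: a fixed number of iterations as the stopping condition, an illustrative learning rate of 0.05, standard-normal random initialization, and the hypothetical function name `fit_gradient_descent`. The gradient expressions in step 2C come from the chain rule, assuming the individual error is written as (y_i - ŷ_i)².

```python
import numpy as np

def fit_gradient_descent(x, y, learning_rate=0.05, n_iters=2000, seed=0):
    """Sketch of the procedure above; hyperparameter defaults are illustrative."""
    rng = np.random.default_rng(seed)
    beta0, beta1 = rng.normal(size=2)        # Step 1: random initialization
    mse_history = []
    for _ in range(n_iters):                 # Stopping condition: fixed iteration count
        y_hat = beta0 + beta1 * x            # Step 2A: forward propagation
        errors = (y - y_hat) ** 2            # Step 2B: error for each observation
        mse_history.append(errors.mean())
        # Step 2C: per-observation gradients via the chain rule...
        d_y_hat = -2.0 * (y - y_hat)         # d e_i / d y_hat_i
        grad_beta0 = d_y_hat                 # times d y_hat_i / d beta0 = 1
        grad_beta1 = d_y_hat * x             # times d y_hat_i / d beta1 = x_i
        # ...then step each coefficient against its mean gradient
        beta0 -= learning_rate * grad_beta0.mean()
        beta1 -= learning_rate * grad_beta1.mean()
    return beta0, beta1, mse_history

# Noiseless toy data generated from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
beta0, beta1, mse_history = fit_gradient_descent(x, y)
print(beta0, beta1)  # should be very close to 1.0 and 2.0
```

On noiseless toy data like this, the coefficients should end up very close to the closed-form answer. The learning rate has to be small enough that the updates don't overshoot and diverge.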
 
We'll formalize this more in the next notebook, but for now let's walk through 2A and 2C in more detail, since **forward propagation** and **backward propagation** are key terms when working with neural networks. 

### Forward Propagation

In forward propagation, we read our computational graph from left to right, performing each node's calculation to compute our output (which for us is 
<img src="imgs/variables/yhat.png" style="vertical-align: text-middle; display: inline-block; padding-top:0; margin-top:0;" width=13 \>). That is, we'll **propagate our inputs forward** through our computational graph in order to obtain our output. 

<img src="imgs/custom/simp_linear_comp_graph_forprop.png" width=400\>
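
In code, forward propagation simply evaluates the nodes in the same left-to-right order. This NumPy sketch handles all observations at once; the function name `forward` and the returned `cache` dictionary are our own assumptions, with `x` kept because the multiplication node's gradient with respect to the slope depends on it.

```python
import numpy as np

def forward(beta0, beta1, x):
    """Propagate inputs left to right through the graph for all observations."""
    product = beta1 * x                   # multiplication node: beta1 * x1
    y_hat = beta0 + product               # summation node: beta0 + beta1 * x1
    cache = {"x": x, "product": product}  # intermediate values kept for later
    return y_hat, cache

x = np.array([0.0, 1.0, 2.0])
y_hat, cache = forward(beta0=1.0, beta1=2.0, x=x)
print(y_hat)  # [1. 3. 5.]
```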

### Backward Propagation

In backward propagation, we read our computational graph from right to left, starting from the derivative of our error rather than from our inputs. In between forward and backward propagation, we calculate our error for each observation (step 2B above). Then, for each coefficient, we compute the gradient of each observation's error with respect to that coefficient and use the mean of those gradients to update it. In the end, we'll want the following two quantities for each observation: 

<img src="imgs/derivatives/ei_beta0.png" width=30\>
<img src="imgs/derivatives/ei_beta1.png" width=30\>

To obtain these quantities, we'll use the chain rule: 

<img src="imgs/derivatives/ei_beta0_chain.png" width=120\>
<img src="imgs/derivatives/ei_beta1_chain.png" width=110\>
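
As a preview, assume the individual error is written as $e_i = (y_i - \hat{y}_i)^2$ and the prediction for observation $i$ as $\hat{y}_i = \beta_0 + \beta_1 x_{1,i}$, where $x_{1,i}$ denotes the $i$-th entry of $x_1$, $n$ is the number of observations, and MSE is the mean squared error. Under those assumptions, each factor in the chain-rule products is simple, and averaging over observations gives the gradient used for the update:

$$
\begin{aligned}
\frac{\partial e_i}{\partial \hat{y}_i} &= -2\,(y_i - \hat{y}_i), \qquad
\frac{\partial \hat{y}_i}{\partial \beta_0} = 1, \qquad
\frac{\partial \hat{y}_i}{\partial \beta_1} = x_{1,i} \\
\frac{\partial e_i}{\partial \beta_0} &= -2\,(y_i - \hat{y}_i), \qquad
\frac{\partial e_i}{\partial \beta_1} = -2\,(y_i - \hat{y}_i)\,x_{1,i} \\
\frac{\partial\,\text{MSE}}{\partial \beta_j} &= \frac{1}{n}\sum_{i=1}^{n} \frac{\partial e_i}{\partial \beta_j}, \qquad j \in \{0, 1\}
\end{aligned}
$$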

Visually, this step would look something like the following: 

<img src="imgs/custom/simp_linear_comp_graph_backprop.png" width=400\>
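
The right-to-left pass can also be written node by node. This NumPy sketch, with the hypothetical function name `backward`, assumes the individual error (y_i - ŷ_i)² and returns the mean gradients used for the update: the summation node passes the incoming gradient through unchanged, and the multiplication node scales it by x. The usage below compares the result with a central finite difference of the mean squared error as a sanity check.

```python
import numpy as np

def backward(y, y_hat, x):
    """Read the graph right to left and return mean gradients for beta0, beta1."""
    # Start at the error: d e_i / d y_hat_i for e_i = (y_i - y_hat_i)^2
    d_y_hat = -2.0 * (y - y_hat)
    # Summation node (y_hat = beta0 + product): gradient passes through unchanged
    d_beta0 = d_y_hat            # d y_hat_i / d beta0 = 1
    d_product = d_y_hat          # d y_hat_i / d product_i = 1
    # Multiplication node (product = beta1 * x): gradient is scaled by x
    d_beta1 = d_product * x      # d product_i / d beta1 = x_i
    # Average the per-observation gradients
    return d_beta0.mean(), d_beta1.mean()

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
beta0, beta1 = 0.0, 0.0
g0, g1 = backward(y, beta0 + beta1 * x, x)

def mse_at(b0, b1):
    """Mean squared error at the given coefficients (for the check below)."""
    return np.mean((y - (b0 + b1 * x)) ** 2)

# Central finite differences of the mean squared error
h = 1e-6
fd0 = (mse_at(beta0 + h, beta1) - mse_at(beta0 - h, beta1)) / (2 * h)
fd1 = (mse_at(beta0, beta1 + h) - mse_at(beta0, beta1 - h)) / (2 * h)
print(g0, fd0)
print(g1, fd1)
```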

In the next notebook, we'll formalize our gradient descent procedure, actually walk through calculating those derivatives above, and then code it up using `numpy`. 