**An Introduction to Gradient Descent and Linear Regression**

---

This example project demonstrates how the gradient descent algorithm may be used to solve a linear regression problem.




In [0]:
import pandas as pd
import numpy as np
from numpy import *

In [0]:
datapoints = pd.read_csv('https://raw.githubusercontent.com/llSourcell/linear_regression_live/master/data.csv',delimiter = ',',names=['x','y'])

In [3]:
datapoints.head()

Unnamed: 0,x,y
0,32.502345,31.707006
1,53.426804,68.777596
2,61.530358,62.562382
3,47.47564,71.546632
4,59.813208,87.230925


In [0]:
learning_rate = 0.0001 # hyperparameter---- how fast our model learns 

In [0]:
## Y = mx + b
initial_b = 0
initial_m =0 

In [0]:
num_of_iteration =1000

**Mean Squared Error Definition**



```
The mean squared error tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the “errors”) and squaring them. The squaring is necessary to remove any negative signs. It also gives more weight to larger differences. It’s called the mean squared error as you’re finding the average of a set of errors.
```
The smaller the mean squared error, the closer you are to finding the line of best fit. Depending on your data, it may be impossible to get a very small value for the mean squared error.

[MSE](https://www.statisticshowto.datasciencecentral.com/mean-squared-error/)

![alt text](https://spin.atomicobject.com/wp-content/uploads/linear_regression_error1.png)


In [0]:
#MSE
def compute_error_for_points(b,m,points):
    total_error =0
    for i in range(0,len(points)):
        x = points[i,0]
        y = points[0,1]
        total_error +=(y-(m *x + b ))**2
    return total_error/ float(len(points))

**Gradient Descent**

---
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.

To run gradient descent on  error function, we first need to compute its gradient. The gradient will act like a compass and always point us downhill. To compute it, we will need to differentiate our error function. Since our function is defined by two parameters (m and  b), we will need to compute a partial derivative for each. These derivatives work out to be:

![alt text](https://spin.atomicobject.com/wp-content/uploads/linear_regression_gradient1.png)


![alt text](https://raw.githubusercontent.com/mattnedrich/GradientDescentExample/master/gradient_descent_example.gif)

In [0]:
def step_gradient(b,m,datapoints,learning_rate):
    ### gradient descent
    b_gradient = 0
    m_gradient = 0
    N = float(len(datapoints))
    for i in range(0,len(datapoints)):
        x = datapoints[i,0]
        y = datapoints[i,1]
        b_gradient += -(2/N)* (y - (m * x )+ b)
        m_gradient += -(2/N)* x * (y - (m * x )+ b)
    new_b = b -(learning_rate * b_gradient)
    new_m = m -(learning_rate * m_gradient)
    return [new_b , new_m]
    

In [0]:
def gradient_descent_runner(datapoints , initial_b , initial_m , learning_rate ,num_of_iteration ):
    b=initial_b
    m=initial_m
    
    for i in range(0 ,num_of_iteration):
        b,m = step_gradient(b,m,array(datapoints),learning_rate)
    return [b,m] 
    

In [0]:
[b, m ] = gradient_descent_runner(datapoints , initial_b , initial_m , learning_rate ,num_of_iteration )

In [11]:
print(b)
print(m)

0.08989889221785102
1.4812542263671995
