# Gradient Descent and Cost Function for Linear Regression

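In mathematics, **gradient descent** (also often called **steepest descent**) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. It repeatedly takes a small step in the direction of the negative gradient, which is the direction in which the function decreases fastest.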
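In one dimension, each step is $w \leftarrow w - \alpha f'(w)$, where $\alpha$ is the learning rate. Here is a minimal sketch on a toy function. It assumes $f(w) = (w - 3)^2$, a learning rate of 0.1, and 100 iterations. These values and the helper name `gradient_descent_1d` are illustrative and do not come from the original notebook.

```python
def gradient_descent_1d(grad, w0, learning_rate=0.1, iterations=100):
    """Minimize a 1-D differentiable function, given its derivative `grad`."""
    w = w0
    for _ in range(iterations):
        # Move against the derivative: if f'(w) > 0 the function rises to the right, so step left.
        w = w - learning_rate * grad(w)
    return w

# Illustrative assumption: f(w) = (w - 3)**2 has derivative 2*(w - 3) and its minimum at w = 3.
w_min = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # very close to 3.0
```

Each step here shrinks the distance to the minimum by a factor of 0.8. A larger learning rate moves faster but can overshoot.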

The objective of gradient descent is to minimize the cost function.

A **cost function** (also called a **loss function**) measures how far a model's predictions are from the actual values. The smaller the cost, the better the model fits the data.

One common cost function is the **mean squared error (MSE)**, which is widely used for linear regression.
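For $n$ data points with predictions $\hat{y}_i = m x_i + b$, the MSE is $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The sketch below computes it with NumPy on the data set this notebook uses later. The helper name `mse` is a hypothetical choice.

```python
import numpy as np

def mse(y, y_predicted):
    """Mean squared error: the average of the squared residuals."""
    y = np.asarray(y, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y - y_predicted) ** 2)

y = np.array([5, 7, 9, 11, 13])
print(mse(y, np.zeros_like(y)))          # 89.0, the cost when m = 0 and b = 0
print(mse(y, 2 * np.arange(1, 6) + 3))   # 0.0, predictions from y = 2x + 3 fit exactly
```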

![MSE formula](attachment:f06af8e2-e9ce-4375-80f8-aaaba717fbf8.png)

Ref: https://statisticsbyjim.com/regression/mean-squared-error-mse/

* The first formula is the partial derivative of the cost function with respect to m.
* The second formula is the partial derivative of the cost function with respect to b.


![Screenshot 2022-02-06 at 10.34.21 PM.png](attachment:657ffb29-ec77-4dc5-b943-3d22d4e02b02.png)

Ref: https://www.skillbasics.com/courses/machine-learning-for-beginners/lecture/37
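Write the cost as a function of the two parameters, $J(m, b)$. Applying the chain rule to each squared residual then gives the two derivatives that the code below computes as `md` and `bd`. It also gives the update step, where $\alpha$ corresponds to `learning_rate`:

$$
J(m, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
\frac{\partial J}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)
$$

$$
m \leftarrow m - \alpha\,\frac{\partial J}{\partial m},
\qquad
b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}
$$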

```python
import numpy as np
```


```python
# y = mx + b
# m is the slope (coefficient)
# b is the y-intercept
def gradient_descent(x, y):
    m_curr = b_curr = 0  # start with m = 0 and b = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.08  # step size: how far m and b move on each update
    
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr  # y = m * x + b
        cost = (1/n) * sum([w**2 for w in (y - y_predicted)])  # MSE, computed only for display
        md = -(2/n) * sum(x * (y - y_predicted))  # partial derivative of the cost with respect to m
        bd = -(2/n) * sum(y - y_predicted)  # partial derivative of the cost with respect to b
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        # m and b are the updated values; cost was computed before this update
        print("m {},b {},cost {},iteration {}".format(m_curr, b_curr, cost, i))
```
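Below is a sketch of a variant of the same function, built on a few assumptions that are not in the original. It takes `learning_rate` and `iterations` as parameters. It returns the fitted `m` and `b` along with the cost history. It prints only every `print_every` iterations. The name `gradient_descent_fit` and its parameters are hypothetical. It uses NumPy's `np.sum` and `np.mean` in place of Python's built-in `sum`.

```python
import numpy as np

# Hypothetical variant of gradient_descent: learning_rate, iterations and
# print_every are parameters, and the fitted values are returned.
def gradient_descent_fit(x, y, learning_rate=0.08, iterations=10000, print_every=1000):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m_curr = b_curr = 0.0
    cost_history = []
    for i in range(iterations):
        residual = y - (m_curr * x + b_curr)      # y - y_predicted
        cost = np.mean(residual ** 2)             # MSE before this update
        cost_history.append(cost)
        md = -(2 / n) * np.sum(x * residual)      # dJ/dm
        bd = -(2 / n) * np.sum(residual)          # dJ/db
        m_curr -= learning_rate * md              # both steps use the same residual
        b_curr -= learning_rate * bd
        if print_every and i % print_every == 0:
            print(f"m {m_curr}, b {b_curr}, cost {cost}, iteration {i}")
    return m_curr, b_curr, cost_history

m, b, history = gradient_descent_fit([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
print(m, b)  # approaches 2 and 3 for this data
```

Returning the values instead of only printing them makes convergence easy to check. For example, `history[-1]` should be close to 0 for this data.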
    

In the function above, tune the **learning_rate** and the number of **iterations**.

Start with a learning rate of 0.001, then adjust it together with the number of iterations until the cost stops decreasing (that is, until it reaches its minimum).
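To see why tuning matters, the following sketch runs the same updates on the notebook's data for several learning rates. It reports the MSE after a fixed 1000 iterations. The helper `final_cost`, the iteration count, and the particular rates are illustrative assumptions.

```python
import numpy as np

def final_cost(x, y, learning_rate, iterations):
    """Run plain gradient descent on the MSE and return the cost after the last update."""
    n = len(x)
    m = b = 0.0
    for _ in range(iterations):
        residual = y - (m * x + b)
        # Both updates use the residual from before this step.
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return np.mean((y - (m * x + b)) ** 2)

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)

for lr in (0.001, 0.01, 0.08, 0.09):
    print(f"learning_rate={lr}: cost after 1000 iterations = {final_cost(x, y, lr, 1000):.6g}")
```

With 0.001, the cost is still well above zero after 1000 iterations. With 0.08, it is essentially zero. With 0.09, it blows up. For this particular data set, the updates are stable only for learning rates below about 0.085. That threshold is 2 divided by the largest eigenvalue of the Hessian of the MSE with respect to m and b. It is also why, with 0.08, m and b jump back and forth in the output below while the cost keeps falling.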

```python
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
```
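Every point in this data set lies exactly on $y = 2x + 3$, so the MSE can reach 0 and gradient descent should head toward $m = 2$, $b = 3$. As a cross-check that is not part of the original notebook, NumPy's closed-form least-squares fit `np.polyfit` gives those target values directly:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# Least-squares fit of a degree-1 polynomial; coefficients come back highest degree first: [slope, intercept].
m_exact, b_exact = np.polyfit(x, y, 1)
print(m_exact, b_exact)  # approximately 2.0 and 3.0
```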

```python
gradient_descent(x, y)
```

m 4.96,b 1.44,cost 89.0,iteration 0
m 0.4991999999999983,b 0.26879999999999993,cost 71.10560000000002,iteration 1
m 4.451584000000002,b 1.426176000000001,cost 56.8297702400001,iteration 2
m 0.892231679999997,b 0.5012275199999995,cost 45.43965675929613,iteration 3
m 4.041314713600002,b 1.432759910400001,cost 36.35088701894832,iteration 4
m 1.2008760606719973,b 0.7036872622079998,cost 29.097483330142282,iteration 5
m 3.7095643080294423,b 1.4546767911321612,cost 23.307872849944438,iteration 6
m 1.4424862661541864,b 0.881337636696883,cost 18.685758762535738,iteration 7
m 3.4406683721083144,b 1.4879302070713722,cost 14.994867596913156,iteration 8
m 1.6308855378034224,b 1.0383405553279617,cost 12.046787238456794,iteration 9
m 3.2221235247119777,b 1.5293810083298451,cost 9.691269350698109,iteration 10
m 1.7770832372205707,b 1.1780607551353204,cost 7.8084968312098315,iteration 11
m 3.0439475772474127,b 1.5765710804477953,cost 6.302918117062937,iteration 12
m 1.8898457226770244,b 1.303224870497