**Gradient Descent:**

Gradient descent is a calculus-based optimization method, used here to find the least-squares fit for a given set of data. It starts with an arbitrarily chosen weight (theta) for each variable, usually all zeros (for example [0, 0] if there is only one feature: one weight for the bias and one for the feature), and then adjusts the weights iteratively in an effort to reach the global minimum of the cost.

![](https://miro.medium.com/max/1593/1*WGHn1L4NveQ85nn3o7Dd2g.png)

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np
import sys

In [None]:
MaxIterations = 2000 # Number of times the thetas will be updated
alpha = 0.01 # Learning rate (step size of each update)
CostArray = [] # Stores the cost after each update of the thetas

**Data:** 
     
The data I will use to write the gradient descent algorithm comes from Andrew Ng's famous machine learning course on Coursera. It is a housing dataset in which the first column gives the area of the house and the second column gives the number of bedrooms. The last column gives the price of the house, which we will treat as the dependent variable.

In [None]:
data = pd.read_csv('../input/ex1data2.csv', sep=",", header=None)
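# NOTE: despite its name, this holds the number of rows (training examples, m)
numberOfColumns = data.shape[0]
# One theta per column: the price column is later swapped for a bias column of ones
thetas = [0]*len(data.columns)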

**Hypothesis:**

The hypothesis is the line or plane proposed by the current thetas (for a single variable it is the line y = mx + c in the Cartesian plane).

In [None]:
    
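def hypothesis(theta, Xaxis):
    # theta as a 1 x n row matrix, Xaxis as an m x n matrix (bias column included)
    thetaArray = np.matrix(np.array(theta)) 
    Xaxis = np.matrix(Xaxis)
    xtrans = np.transpose(Xaxis) 
    # (1 x n) times (n x m) gives a 1 x m row of predictions, one per training example
    mat =  np.matmul(thetaArray, xtrans)
    return mat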
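
To make the matrix product concrete, here is a minimal sketch of the same idea with plain NumPy arrays. The function name `predict` and the toy numbers are illustrative assumptions, not part of the notebook above.

```python
import numpy as np

def predict(theta, X):
    """Return the hypothesis value for every row of X (bias column already included)."""
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    return X @ theta  # one prediction per row

# Two toy rows in the form [bias, feature]
X_demo = np.array([[1.0, 2.0],
                   [1.0, 3.0]])
theta_demo = [1.0, 2.0]  # corresponds to the line y = 1 + 2x

predictions = predict(theta_demo, X_demo)
print(predictions)  # [5. 7.]
```

With a single feature, the first theta plays the role of c and the second the role of m in y = mx + c.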

**Cost Function:**

The cost function tells us how far our current solution is from the data points we have. The formula for the cost is:

![](http://s0.wp.com/latex.php?zoom=1.5&latex=J%28%5Ctheta%29+%3D+%5Cfrac%7B1%7D%7B2m%7D%5Csum%7B%28h_%7B%5Ctheta%7D%28x%5E%7B%28i%29%7D%29+-+y%5E%7B%28i%29%7D%29%5E2%7D&bg=ffffff&fg=000&s=0)


In [None]:
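def costFunction(thetas, Xaxis, Yaxis):
    # Residuals between predictions and observed prices (1 x m row matrix)
    resultingMatrix = hypothesis(thetas, Xaxis) - np.matrix(Yaxis)
    totalSum = np.sum(np.square(resultingMatrix))
    # numberOfColumns holds the number of training examples (m)
    totalCost = totalSum / (2*(numberOfColumns))
    # The caller records the cost in CostArray; appending here too would store every value twice
    return totalCost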
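
As a small worked example of the formula, the sketch below computes the cost by hand-checkable arithmetic on three toy points. The helper name `squared_error_cost` and the data are assumptions made for illustration only.

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """Half the mean squared residual: J = sum((X @ theta - y)^2) / (2m)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = X @ np.asarray(theta, dtype=float) - y
    m = len(y)
    return float(residuals @ residuals) / (2 * m)

# Three toy points lying exactly on y = x, with a leading bias column of ones
X_demo = np.array([[1.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 3.0]])
y_demo = np.array([1.0, 2.0, 3.0])

# All-zero thetas predict 0 everywhere: (1 + 4 + 9) / (2 * 3) = 14/6
cost_zero = squared_error_cost([0.0, 0.0], X_demo, y_demo)
# The exact line y = 0 + 1*x leaves no residual, so the cost is 0
cost_perfect = squared_error_cost([0.0, 1.0], X_demo, y_demo)
print(cost_zero, cost_perfect)
```

A cost of zero means the hypothesis passes through every data point; real data will normally leave a positive cost even at the minimum.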


**Gradient Descent**

Gradient descent uses the derivative of the cost function: we repeatedly step against the gradient, looping many times, until we reach the point where the cost function is minimized. Ideally the derivative of the cost function reaches 0, which means that we have reached a minimum. In general, a potential problem with this approach is that we could end up in a local minimum rather than the global minimum. For linear regression with the squared-error cost above, the cost is convex, so any minimum found is the global one.

![](https://miro.medium.com/max/765/1*QKHtyn4Rr-0R-s0an1eSsA.png)
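
For reference, the update used by the code in the next cell can be written out as follows. This is a sketch of the standard derivation, assuming the cost function defined earlier and a linear hypothesis, with $m$ the number of training examples and $\alpha$ the learning rate.

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
$$

$$
\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
$$

The factor of 2 produced by differentiating the square cancels the 2 in the denominator of the cost, which is why the cost is defined with $\frac{1}{2m}$. All thetas are updated simultaneously from the same residuals.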

In [None]:
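def updateThetas(theta):
    # Relies on the globals Xaxis, Yaxis, alpha and numberOfColumns defined in other cells
    global thetas
    # Residuals of the current hypothesis, shape 1 x m
    resultingMatrix = hypothesis(theta, Xaxis) - np.matrix(Yaxis)
    X2 = np.matrix(Xaxis)
    # Sum of residual * feature value for each theta, shape 1 x n
    multiplier = np.matmul(resultingMatrix, X2)
    # Step against the gradient, scaled by the learning rate
    thetas = np.matrix(np.array(theta)) - ((alpha/(numberOfColumns))* multiplier)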

In [None]:
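ones = pd.Series([1]*(data.shape[0]))  # bias column

# Standardize each feature column to zero mean and unit variance.
# Column-wise .mean() / .std(ddof=0) avoids depending on how np.mean/np.std treat a DataFrame.
features = data.iloc[:, :-1]
Xaxis = (features - features.mean()) / features.std(ddof=0)
Xaxis = pd.concat([ones, Xaxis], axis=1)
Yaxis = data[data.columns[-1]]

for j in range(MaxIterations):
    updateThetas(thetas)
    CostArray.append(costFunction(thetas, Xaxis, Yaxis))
print(CostArray[-1])  # cost after the final iteration
print(thetas)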

As the results above show, the bias term is about 340412. Because the features were standardized, this is the predicted price of a house whose features are all at their average values, rather than a minimum price. The second variable also has a strong positive impact (109447), whereas the last variable tends to lower the predicted price of a house.
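
One way to sanity-check coefficients like these is to compare them with a closed-form least-squares solve. The sketch below does this on synthetic data, since it makes no assumption about the CSV file. The data, the learning rate of 0.1 and the use of `np.linalg.lstsq` are all choices made for this illustration and are not part of the notebook above.

```python
import numpy as np

# Synthetic data: 50 examples, 2 standardized features, known coefficients plus noise
rng = np.random.default_rng(0)
m = 50
features = rng.normal(size=(m, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0)
X = np.column_stack([np.ones(m), features])  # prepend the bias column
true_theta = np.array([3.0, 2.0, -1.0])
y = X @ true_theta + rng.normal(scale=0.1, size=m)

# Batch gradient descent with the same update rule as updateThetas
theta = np.zeros(3)
alpha = 0.1  # assumed learning rate for this toy problem
for _ in range(2000):
    gradient = X.T @ (X @ theta - y) / m
    theta -= alpha * gradient

# Closed-form least-squares solution for comparison
theta_closed_form, *_ = np.linalg.lstsq(X, y, rcond=None)
max_gap = np.max(np.abs(theta - theta_closed_form))
print(theta, theta_closed_form, max_gap)
```

If gradient descent has converged, the two sets of coefficients should agree closely; a large gap usually points to too few iterations or a poorly chosen learning rate.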

In [None]:
plt.plot(CostArray)
plt.show()

Here we can observe that the cost levels off after about 500 iterations.
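
Since the cost flattens out well before the last iteration, the loop could stop early once the improvement becomes negligible. The following is a minimal sketch of that idea on toy data. The function name `run_until_converged`, the tolerance value and the data are hypothetical and not part of the notebook above.

```python
import numpy as np

def run_until_converged(X, y, alpha=0.1, max_iterations=2000, tolerance=1e-12):
    """Batch gradient descent that stops once the cost decrease falls below tolerance."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(max_iterations):
        residuals = X @ theta - y
        costs.append(float(residuals @ residuals) / (2 * m))
        # Stop when the last step improved the cost by less than the tolerance
        if len(costs) > 1 and costs[-2] - costs[-1] < tolerance:
            break
        theta = theta - alpha * (X.T @ residuals) / m
    return theta, costs

# Toy data lying exactly on y = 3 + 2x, with a bias column of ones
X_demo = np.array([[1.0, -1.0],
                   [1.0,  0.0],
                   [1.0,  1.0]])
y_demo = np.array([1.0, 3.0, 5.0])

theta_demo, costs_demo = run_until_converged(X_demo, y_demo)
print(len(costs_demo), theta_demo)
```

The tolerance is an absolute threshold on the cost decrease, so it should be chosen relative to the scale of the cost; for house prices in the hundreds of thousands a much larger value would be appropriate.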