In [1]:
import numpy as np
import pandas as pd

**For multivariate data, the hypothesis includes only a linear term for each feature.** 

*Let $x_1, x_2, \dots, x_n$ be the features on which the target outcome depends.* 

Then, the hypothesis for multivariate linear regression is:

$$h(x_1, x_2, \dots, x_n) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

where $\theta_0, \theta_1, \dots, \theta_n$ are the parameters, and $h(x_1, x_2, \dots, x_n)$, written $h(x)$ for short, is called the **hypothesis function**.
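As a minimal sketch, the hypothesis can be evaluated term by term for a single example. The function name `hypothesis_scalar` and the sample numbers are illustrative assumptions, not part of the original notebook.

```python
def hypothesis_scalar(theta, x):
    """Evaluate h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n for one example.

    theta: sequence of n+1 parameters [theta_0, ..., theta_n]
    x:     sequence of n feature values [x_1, ..., x_n]
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have exactly one more entry than x")
    h = theta[0]  # intercept term theta_0
    for theta_j, x_j in zip(theta[1:], x):
        h += theta_j * x_j  # one linear term per feature
    return h


# Hypothetical example: theta = [1, 2, 3], features x_1 = 4, x_2 = 5
print(hypothesis_scalar([1.0, 2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```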

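In vector form, with a constant feature $x_0 = 1$ prepended so that $\theta_0$ acts as the intercept, $h(x)$ can be written as:

$$h(x) = \theta^T x, \qquad \theta = [\theta_0, \theta_1, \dots, \theta_n]^T, \qquad x = [1, x_1, \dots, x_n]^T$$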
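The vector form lets NumPy evaluate the hypothesis for every training example at once: stacking the examples as rows of a matrix and prepending a column of ones turns $h(x) = \theta^T x$ into a single matrix-vector product. This is a sketch under assumptions; `add_bias_column`, `hypothesis_vectorized`, and the data are hypothetical names and values chosen for illustration.

```python
import numpy as np


def add_bias_column(X):
    """Prepend x_0 = 1 to every row so theta_0 multiplies a constant feature."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])


def hypothesis_vectorized(theta, X):
    """Return h(x) = theta^T x for every row x of X (X given without the bias column)."""
    return add_bias_column(X) @ np.asarray(theta, dtype=float)


# Hypothetical data: 3 examples, 2 features each
X = np.array([[4.0, 5.0],
              [1.0, 0.0],
              [0.0, 2.0]])
theta = np.array([1.0, 2.0, 3.0])

print(hypothesis_vectorized(theta, X).tolist())  # [24.0, 3.0, 7.0]
```

The product `Xb @ theta` computes exactly the same sums as looping over rows, but in one call, which is why the training code can avoid a Python loop over examples.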

**The cost function depends on the hypothesis function.**

We need to find the minimum value of this cost function.

So the parameters $\theta_0, \theta_1, \dots, \theta_n$ must take the values at which the cost function (or simply the cost) reaches its minimum.

In other words, we have to find the minimum of the cost function.

In [2]:
class LRGD:
    """Multivariate linear regression trained with batch gradient descent."""

    def __init__(self):
        self.theta = None  # parameter vector [theta_0, ..., theta_n]
        self.cost = None   # cost J(theta) recorded after every iteration

    # hypothesis function: h(x) = theta^T x for every row of X
    def hypothesis(self, X):
        # X must already contain the leading column of ones, shape (m, n+1)
        return X @ self.theta

    # batch gradient descent
    def BGD(self, alpha, num_iters, X, y):    # alpha is the learning rate and num_iters is the no. of iterations.
        m = X.shape[0]
        self.cost = np.ones(num_iters)
        h = self.hypothesis(X)
        for i in range(num_iters):
            # simultaneous update of theta_0, ..., theta_n using the whole batch
            self.theta = self.theta - (alpha / m) * (X.T @ (h - y))
            # recompute the hypothesis and record the cost (1/2m) * sum of squared errors
            h = self.hypothesis(X)
            self.cost[i] = (1 / (2 * m)) * np.sum(np.square(h - y))
        return self

    # the main function, which uses the functions defined above
    def fit(self, X, y, alpha=0.0001, num_iters=10000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1)  # flatten so h - y does not broadcast to (m, m)
        n = X.shape[1]
        # prepend a column of ones so that theta_0 acts as the intercept
        one_column = np.ones((X.shape[0], 1))
        X = np.concatenate((one_column, X), axis=1)
        # the parameter vector, initialised to zeros
        self.theta = np.zeros(n + 1)
        # optimise the parameters with batch gradient descent (updates self.theta and self.cost)
        self.BGD(alpha, num_iters, X, y)
        return self

    # predictions
    def predict(self, X):
        X = np.asarray(X, dtype=float)
        X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
        return self.hypothesis(X)

*We will apply this optimization technique to the linear regression problem.*

**In linear regression, the cost function is defined as:**

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$

*Here, m is the number of training examples, and $x^{(i)}$, $y^{(i)}$ are the features and the target of the i-th example.*
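The cost function translates almost directly into NumPy. In this minimal sketch, the name `compute_cost` and the two-example dataset are illustrative assumptions; `X` is assumed to already include the leading column of ones.

```python
import numpy as np


def compute_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2.

    X is assumed to already contain the leading column of ones, shape (m, n+1).
    """
    m = X.shape[0]              # number of training examples
    errors = X @ theta - y      # h(x^(i)) - y^(i) for every example
    return float(errors @ errors) / (2 * m)


# Hypothetical data: 2 examples, 1 feature, bias column already prepended
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 5.0])
theta = np.array([0.0, 2.0])   # predictions: 2 and 4

print(compute_cost(theta, X, y))  # errors 0 and -1 -> (0 + 1) / (2*2) = 0.25
```

The factor $\frac{1}{2}$ only scales the cost; it is kept because it cancels the 2 produced when the square is differentiated, which simplifies the gradient.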

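The batch gradient descent algorithm is defined as follows. Repeat until convergence, updating all parameters simultaneously:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \dots, n$$

where $\alpha$ is the learning rate and $x_0^{(i)} = 1$ for every example.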
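A minimal, self-contained sketch of this update rule, vectorized with NumPy. The function name `batch_gradient_descent`, its default `alpha`, and the synthetic data are illustrative assumptions. Every $\theta_j$ is updated from the same error vector computed over the whole batch, which is what makes the update simultaneous.

```python
import numpy as np


def batch_gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Fit theta by batch gradient descent on J(theta) = 1/(2m) * sum (h - y)^2.

    X: (m, n) feature matrix WITHOUT the bias column; y: (m,) targets.
    Returns the fitted theta (length n+1) and the cost after each iteration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # x_0 = 1 for every example
    theta = np.zeros(Xb.shape[1])
    costs = np.empty(num_iters)
    for it in range(num_iters):
        errors = Xb @ theta - y              # h(x^(i)) - y^(i) over the whole batch
        gradient = (Xb.T @ errors) / m       # dJ/dtheta_j for j = 0..n
        theta = theta - alpha * gradient     # simultaneous update of all theta_j
        residual = Xb @ theta - y
        costs[it] = (residual @ residual) / (2 * m)
    return theta, costs


# Hypothetical noise-free data generated from y = 1 + 2*x_1 - 3*x_2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

theta, costs = batch_gradient_descent(X, y, alpha=0.5, num_iters=2000)
print(np.round(theta, 3))  # should be close to [1, 2, -3]
```

With a learning rate small enough for the data, the recorded cost should decrease at every iteration; if it grows instead, `alpha` is too large.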

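*Batch gradient descent (BGD) is also known as vanilla gradient descent, where "vanilla" means plain, without any modification. Its defining feature is that each step moves the parameters a small distance in the direction of the negative gradient of the cost function, computed over the whole training set.*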