### **Gradient Descent**

---

Gradient descent minimises a function by taking repeated steps in the direction opposite to the gradient (or an approximate gradient) of the function at the current point, because this is the direction of steepest descent.

#### **Intuition**

The intuition is to compute the slope of the function being minimised at the current point, then move in the direction opposite to that slope, repeating until we reach a point where the slope is close to or equal to **zero**.

If the slope is positive, the function is increasing at that point, so we move backwards (towards smaller values of the variable).

If the slope is negative, the function is decreasing at that point, so we move forwards (towards larger values of the variable).

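Let the loss depend on just one variable, say $b$:

### $$ \text{loss} = f(b) $$

The simplest update subtracts the slope from the current value:

### $$ b_{new} = b_{old} - f^\prime(b_{old}) $$

In practice the step is scaled by a learning rate $\eta$, so that a steep slope does not make the update overshoot the minimum:

### $$ b_{new} = b_{old} - \eta \cdot f^\prime(b_{old}) $$

where,

### $$\eta = \text{learning rate (a hyperparameter)} $$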
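The update rule can be tried on a small example. The sketch below is illustrative only: the loss $f(b) = (b - 3)^2$, the function name `gradient_descent_1d`, the starting point and the step count are all assumptions chosen for the demonstration, not part of the original notes.

```python
def gradient_descent_1d(grad, b_start, learning_rate=0.1, steps=100):
    """Repeatedly apply b_new = b_old - eta * f'(b_old)."""
    b = b_start
    for _ in range(steps):
        b = b - learning_rate * grad(b)
    return b


# Hypothetical loss f(b) = (b - 3)^2, whose derivative is f'(b) = 2 * (b - 3).
def grad_f(b):
    return 2 * (b - 3)


# Starting to the right of the minimum: the slope is positive, so b decreases.
b_from_right = gradient_descent_1d(grad_f, b_start=10.0)

# Starting to the left of the minimum: the slope is negative, so b increases.
b_from_left = gradient_descent_1d(grad_f, b_start=-5.0)

print(b_from_right, b_from_left)  # both end up very close to 3
```

Both runs approach $b = 3$, where the slope is zero, regardless of which side they start from.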

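For a convex function, every local minimum is also a global minimum, so gradient descent cannot get stuck at a worse point.

Non-convex functions can have many local minima, so gradient descent may converge to a local minimum instead of the global minimum, depending on where it starts.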
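The dependence on the starting point can be seen on a small non-convex example. This is a sketch under assumptions: the function $f(x) = x^4 - 3x^2 + x$, the learning rate and the two starting points are chosen purely for illustration.

```python
# Hypothetical non-convex function with two minima.
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=2000):
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x = x - learning_rate * grad_f(x)
    return x


x_left = descend(-2.0)   # settles near x = -1.30 (the global minimum)
x_right = descend(2.0)   # settles near x = 1.13 (a local minimum)

print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the slope is approximately zero, but the run started at $x = 2$ ends at a point with a higher loss than the run started at $x = -2$.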

Properly scaled data (features on comparable scales) leads to faster convergence.

<img src="../../assets/convex_concave.png"/>

---

### **Gradient descent for linear regression**

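### $$ \hat{y}_i = \sum_{j=0}^m \beta_j X_{ij} $$

where $X$ holds $n$ samples (rows) and $m$ features, with a leading column of ones so that $\beta_0$ acts as the intercept:

$$X = \begin{bmatrix}
        1 & x_{11} & \cdots & x_{1m}\\
        1 & x_{21} & \cdots & x_{2m}\\
        \vdots & \vdots & \ddots & \vdots\\
        1 & x_{n1} & \cdots & x_{nm}
      \end{bmatrix} $$ 

With the mean squared error as the loss, $L = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$, the partial derivative with respect to each coefficient $\beta_j$ is

### **$$ {{\partial L}\over{\partial \beta_j}} = {{-2}\over{n}}\sum_{i=1}^n (y_i - \hat{y_i})X_{ij}$$**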
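One way to sanity-check a gradient formula like this is to compare it with a finite-difference approximation. The sketch below assumes the loss is the mean squared error and uses randomly generated data; the variable names and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3

# Design matrix with a leading column of ones (the intercept column).
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])
y = rng.normal(size=n)
beta = rng.normal(size=m + 1)


def mse(b):
    """Mean squared error of the linear model X @ b."""
    return np.mean((y - X @ b) ** 2)


# Analytic gradient: (-2 / n) * sum_i (y_i - y_hat_i) * X_ij for every j.
analytic = (-2 / n) * X.T @ (y - X @ beta)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (mse(beta + eps * e) - mse(beta - eps * e)) / (2 * eps)
    for e in np.eye(m + 1)
])

print(np.max(np.abs(analytic - numeric)))  # should be very small
```

If the two gradients disagree by more than rounding error, either the formula or its implementation is wrong.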

### Types of Gradient descent

1. Batch gradient descent: each update uses the gradient computed over the entire training set.

In [None]:
import numpy as np

class GDRegressor:
    
    def __init__(self,learning_rate=0.01,epochs=100):
        
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        
    def fit(self,X_train,y_train):
        # init your coefs

        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])
        
        for i in range(self.epochs):
            # update all the coef and the intercept
            y_hat = np.dot(X_train,self.coef_) + self.intercept_
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)
            
            coef_der = -2 * np.dot((y_train - y_hat),X_train)/X_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)
            
    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_

2. Stochastic gradient descent: each update uses the gradient of a single randomly chosen sample.

In [None]:
import numpy as np

class SGDRegressor:
    
    def __init__(self,learning_rate=0.01,epochs=100):
        
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        
    def fit(self,X_train,y_train):
        # init your coefs
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])
        
        for i in range(self.epochs):
            for j in range(X_train.shape[0]):
                idx = np.random.randint(0,X_train.shape[0])
                
                y_hat = np.dot(X_train[idx],self.coef_) + self.intercept_
                
                intercept_der = -2 * (y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)
                
                coef_der = -2 * np.dot((y_train[idx] - y_hat),X_train[idx])
                self.coef_ = self.coef_ - (self.lr * coef_der)
            
    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_

3. Mini-batch gradient descent: each update uses the gradient averaged over a small random batch of samples.

In [None]:
import random

import numpy as np

class MBGDRegressor:
    
    def __init__(self,batch_size,learning_rate=0.01,epochs=100):
        
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size
        
    def fit(self,X_train,y_train):
        # init your coefs
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])
        
        for i in range(self.epochs):
            
            for j in range(int(X_train.shape[0]/self.batch_size)):
                
                idx = random.sample(range(X_train.shape[0]),self.batch_size)
                
                y_hat = np.dot(X_train[idx],self.coef_) + self.intercept_
                intercept_der = -2 * np.mean(y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                # average over the batch, matching the mean used for the intercept
                coef_der = -2 * np.dot((y_train[idx] - y_hat),X_train[idx])/self.batch_size
                self.coef_ = self.coef_ - (self.lr * coef_der)
            
    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_
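The three variants differ only in how many samples feed each update, which a single function with a `batch_size` parameter can show. This is a minimal sketch under assumptions, not the classes above: the function `minibatch_gd`, the synthetic noise-free data and the hyperparameters are hypothetical. It also shuffles once per epoch and walks through the data in order, where the classes above draw random indices.

```python
import numpy as np


def minibatch_gd(X, y, batch_size, learning_rate=0.02, epochs=500, seed=0):
    """Linear regression by gradient descent on the mean squared error.

    batch_size = len(X) gives batch gradient descent,
    batch_size = 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    coef, intercept = np.zeros(m), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = y[idx] - (X[idx] @ coef + intercept)
            # Gradients of the batch MSE with respect to intercept and coef.
            intercept -= learning_rate * (-2 * residual.mean())
            coef -= learning_rate * (-2 * X[idx].T @ residual / len(idx))
    return coef, intercept


# Synthetic, noise-free data: y = 2 * x1 - 1 * x2 + 0.5
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

results = {}
for name, bs in [("batch", 200), ("mini-batch", 20), ("stochastic", 1)]:
    results[name] = minibatch_gd(X, y, batch_size=bs)
    print(name, results[name])
```

On this noise-free data all three settings recover coefficients close to `[2, -1]` and an intercept close to `0.5`. They differ in how many updates they make per epoch and how noisy each update is.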

<img src="../../assets/diff_gd.png"/>
<img src="../../assets/diff_gd_cost.png"/>