### Mini Batch Gradient Descent (MBGD)

`Mini-batch gradient descent (MBGD) combines batch gradient descent (BGD) and stochastic gradient descent (SGD). BGD uses all samples for each update, which makes it stable but slow. SGD uses one sample per update, which makes it fast but noisy. MBGD uses a small batch of samples per update, which makes it more stable than SGD and faster than BGD.`
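As a rough illustration of the comparison above, the three variants can be written as the same update applied to a different number of rows. This is a minimal sketch for linear regression with a mean squared error loss; the function name `gradient_step` and the toy data are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def gradient_step(X, y, coeff, intercept, batch_idx, learning_rate=0.005):
    """One gradient-descent update using only the rows listed in batch_idx."""
    X_b, y_b = X[batch_idx], y[batch_idx]
    residual = y_b - (X_b @ coeff + intercept)
    grad_intercept = -2 * residual.mean()
    grad_coeff = -2 * (X_b.T @ residual) / len(batch_idx)
    return coeff - learning_rate * grad_coeff, intercept - learning_rate * grad_intercept

# Hypothetical toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 3.0]) + rng.normal(size=100)

coeff, intercept = np.ones(5), 0.0
n = len(X)

bgd_idx = np.arange(n)                            # BGD: all samples
sgd_idx = rng.choice(n, size=1, replace=False)    # SGD: one sample
mbgd_idx = rng.choice(n, size=10, replace=False)  # MBGD: a small batch

for name, idx in [("BGD", bgd_idx), ("SGD", sgd_idx), ("MBGD", mbgd_idx)]:
    new_coeff, new_intercept = gradient_step(X, y, coeff, intercept, idx)
    print(name, "rows used:", len(idx), "new intercept:", round(float(new_intercept), 4))
```

Only the index array changes between the three calls, which is why the batch size is the single knob that moves MBGD toward SGD or toward BGD.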

#### Advantage
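> **1. Faster than BGD:** <br>`Parameters are updated after every batch instead of after a full pass over the data.` <br>
> **2. More stable than SGD:** <br>`Averaging the gradient over a batch reduces the noise of single-sample updates.` <br>
> **3. Works well with large datasets:** <br>`Each update needs only one batch, so the full dataset never has to be processed at once.`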

#### Disadvantage
> **1. Slower convergence than SGD:** <br>`Each update has to process a whole batch, so a single update costs more than in SGD.` <br>
> **2. Choosing batch size is crucial:** <br>`A small batch leads to noisy updates like SGD, while a large batch behaves like BGD, which makes each update slower.`
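
The batch-size trade-off can be made visible by measuring how much the mini-batch gradient varies from one random batch to the next. The sketch below does this for the intercept gradient only; the helper `intercept_gradient_spread`, the batch sizes tried, and the toy data are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data, for illustration only
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 3.0]) + rng.normal(size=100)

# Residuals at the starting point (all coefficients 1, intercept 0)
coeff, intercept = np.ones(5), 0.0
residual = y - (X @ coeff + intercept)

def intercept_gradient_spread(batch_size, n_draws=2000):
    """Standard deviation of the mini-batch intercept gradient over many random batches."""
    grads = [
        -2 * residual[rng.choice(len(y), size=batch_size, replace=False)].mean()
        for _ in range(n_draws)
    ]
    return float(np.std(grads))

spread = {size: intercept_gradient_spread(size) for size in (1, 10, 50, 100)}
for size, value in spread.items():
    print(f"batch size {size:>3}: gradient spread = {value:.4f}")
```

A batch of 100 here is the whole toy dataset, so every draw gives the same gradient and the spread drops to essentially zero, which is the BGD end of the range.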

In [1]:
# Import Libraries

from sklearn.datasets import make_regression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.image

from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor, LinearRegression
from sklearn.metrics import r2_score

In [2]:
# Create data for regression

X,y = make_regression(n_samples= 100, n_features= 5, n_informative= 3, n_targets= 1, noise= 50, random_state= 1)

In [3]:
# Split data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3, random_state= 0)

In [4]:
X_train

array([[ 9.20017933e-01, -1.95057341e-01, -7.01344426e-01,
         8.05393424e-01,  1.01012718e+00],
       [ 5.28879746e-01, -2.23708651e+00, -1.77183179e-02,
        -1.10771250e+00, -8.28627979e-01],
       [-3.86955093e-02, -1.61577235e+00,  4.08900538e-01,
         1.12141771e+00, -1.31228341e+00],
       [-7.75161619e-01,  1.27375593e+00, -1.85798186e+00,
         1.96710175e+00, -2.46169559e-02],
       [-5.04465863e-01,  1.60037069e-01,  3.15634947e-01,
         8.76168921e-01, -1.44411381e+00],
       [ 1.61336137e+00, -3.74804687e-01,  2.05462410e+00,
        -7.49969617e-01, -2.28765829e-01],
       [-3.06204013e-01,  8.27974643e-01,  7.62011180e-01,
         2.30094735e-01, -2.02220122e+00],
       [ 5.62761097e-01,  2.40737092e-01, -7.31127037e-02,
         2.80665077e-01, -6.17362064e-01],
       [-2.06014071e+00, -3.22417204e-01,  1.13376944e+00,
        -3.84054355e-01,  1.46210794e+00],
       [ 1.54335911e+00,  7.58805660e-01, -8.77281519e-01,
         8.84908814e-01

### Mini Batch Gradient Descent from scratch

In [8]:
# write a class for mini batch gradient descent

class MBGD:
    '''
    Linear regression trained with mini batch gradient descent.
    training: method for learning the intercept and coefficients,
    testing: method for predicting the target from the learned parameters.
    '''
    def __init__(self, learning_rate=0.005, epochs=100, batch_size=10):
        '''
        Input: learning rate, number of epochs and batch size
        '''
        self.intercept_ = None
        self.coeff_ = None
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def training(self, X_train, y_train):
        '''
        Input: independent (X_train) and dependent (y_train) variables
        Output: returns the learned intercept and coefficients
        '''

        # Convert to a dataframe with a 0..n-1 index so sampled labels are valid positions
        X_train = pd.DataFrame(X_train).reset_index(drop=True)
        y_train = np.asarray(y_train)

        # Initialize weights and bias
        self.intercept_ = 0
        self.coeff_ = np.ones(X_train.shape[1])

        n_batches = X_train.shape[0] // self.batch_size

        for epoch in range(self.epochs):
            for j in range(n_batches):

                # Choose a random batch (no repeated rows within a batch)
                index_ = X_train.sample(self.batch_size, replace=False).index
                X_batch = X_train.iloc[index_, :].to_numpy()
                y_batch = y_train[index_]

                # Compute predictions
                y_pred = np.dot(X_batch, self.coeff_) + self.intercept_ # X*coeff + intercept

                # Compute gradients of the mean squared error, both averaged over the batch
                error = y_batch - y_pred
                derivative_intercept = -2 * np.mean(error)
                derivative_slope = -2 * np.dot(error, X_batch) / self.batch_size

                # Update parameters
                self.intercept_ = self.intercept_ - (self.learning_rate * derivative_intercept)
                self.coeff_ = self.coeff_ - (self.learning_rate * derivative_slope)

        return self.intercept_, self.coeff_

    def testing(self, X):
        '''
        Input: independent variable (X)
        Output: predictions
        '''
        y_test_pred = np.dot(X, self.coeff_) + self.intercept_

        return y_test_pred
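
The training loop above draws every batch independently, so within one epoch some rows can land in several batches while others are skipped. A common alternative is to shuffle the row indices once per epoch and cut them into consecutive batches. The sketch below shows only the index handling; `iterate_minibatches` is a hypothetical helper, not part of the class, and the sizes are picked to match a 70-row training set.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that together cover every row exactly once."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        # The last batch is smaller when n_samples is not a multiple of batch_size
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(n_samples=70, batch_size=10, rng=rng))
print(len(batches), "batches of size", len(batches[0]))
```

Calling the generator once per epoch gives a fresh shuffle each time, so the batches differ between epochs while every row is still used once per pass.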

In [9]:
# create instance/object

mbgd = MBGD()
mbgd

<__main__.MBGD at 0x258d886d220>

In [10]:
mbgd.training(X_train, y_train)

(np.float64(-0.09647283424041245),
 array([12.74744395, 59.44129636,  8.84717471, -5.27963939, 30.01157535]))

`Each batch is sampled at random, so the learned intercept and coefficients differ from run to run.`
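
Because the result changes between runs, it can help to fix the random seed and compare the learned values with the closed-form least-squares solution. The sketch below is a compact NumPy variant of the same idea, not the class above. The function `fit_minibatch`, its `seed` argument, the larger learning rate, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_minibatch(X, y, learning_rate=0.005, epochs=100, batch_size=10, seed=0):
    """Mini-batch gradient descent for linear regression with a seeded generator."""
    rng = np.random.default_rng(seed)
    coeff, intercept = np.ones(X.shape[1]), 0.0
    for _ in range(epochs):
        for _ in range(len(X) // batch_size):
            idx = rng.choice(len(X), size=batch_size, replace=False)
            residual = y[idx] - (X[idx] @ coeff + intercept)
            # Gradients of the mean squared error, averaged over the batch
            intercept -= learning_rate * (-2 * residual.mean())
            coeff -= learning_rate * (-2 * (X[idx].T @ residual) / batch_size)
    return intercept, coeff

# Hypothetical toy data, for illustration only
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(100, 5))
true_coeff = np.array([1.0, 2.0, 0.0, 0.0, 3.0])
y = X @ true_coeff + 2.0 + data_rng.normal(scale=0.1, size=100)

intercept_gd, coeff_gd = fit_minibatch(X, y, learning_rate=0.05, epochs=100)

# Closed-form least squares: prepend a column of ones for the intercept
X_design = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept_ls, coeff_ls = solution[0], solution[1:]

print("max coefficient gap:", np.abs(coeff_gd - coeff_ls).max())
print("intercept gap:", abs(intercept_gd - intercept_ls))
```

With the same `seed`, repeated calls return identical values; changing the seed brings the run-to-run variation back.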