<h1>Linear Regression 1D: Training Two Parameters with Stochastic Gradient Descent (SGD)</h1>

<h2>Objective</h2><ul><li>How to use Stochastic Gradient Descent (SGD) to train the model.</li></ul>

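Batch Gradient Descent (BGD) vs. SGD:
The main difference between BGD and SGD is how often the parameters are updated. BGD computes the gradient over the entire dataset before each parameter update. SGD updates the parameters after each single data point (or, in the mini-batch variant, after each small batch of points), so it makes many more updates per pass over the data.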

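SGD is often preferred over BGD, especially for large datasets, for three reasons:

- It updates the parameters many times per pass over the data, so it can reduce the loss sooner.
- The noise in its single-sample gradient estimates can help it escape shallow local minima in non-convex problems. (The MSE loss in this lab is convex in w and b, so it has no local minima to escape.)
- Each update only needs one sample (or mini-batch) in memory rather than the whole dataset.

The price is noisier updates. The loss tends to fluctuate from step to step instead of decreasing smoothly, and a fixed learning rate can leave the parameters bouncing around the minimum.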
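The following sketch makes the difference in update frequency concrete. It is plain Python with no PyTorch. The helper names `mse_grad`, `bgd_epoch`, and `sgd_epoch` and the four-point toy dataset are illustrative assumptions, not part of the lab. One BGD epoch performs a single update from the full-dataset gradient, while one SGD epoch performs one update per sample.

```python
# Plain-Python sketch (no PyTorch): one epoch of BGD vs. one epoch of SGD
# for the line y = w * x + b with the MSE loss.

def mse_grad(w, b, xs, ys):
    """Return (dL/dw, dL/db) for the mean squared error over the given samples."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def bgd_epoch(w, b, xs, ys, lr):
    """One BGD epoch: a single update using the gradient over all samples."""
    dw, db = mse_grad(w, b, xs, ys)
    return w - lr * dw, b - lr * db, 1  # (new w, new b, number of updates)


def sgd_epoch(w, b, xs, ys, lr):
    """One SGD epoch: one update per sample, each using only that sample."""
    updates = 0
    for x, y in zip(xs, ys):
        dw, db = mse_grad(w, b, [x], [y])
        w, b = w - lr * dw, b - lr * db
        updates += 1
    return w, b, updates


# Hypothetical toy data on the noise-free line y = x - 1
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [x - 1 for x in xs]

print(bgd_epoch(0.0, 0.0, xs, ys, lr=0.1))  # 1 update
print(sgd_epoch(0.0, 0.0, xs, ys, lr=0.1))  # len(xs) updates
```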


<h2>Table of Contents</h2>
<p>Practice training a model using Stochastic Gradient Descent.</p>

<ul>
    <li><a href="#Makeup_Data">Make Some Data</a></li>
    <li><a href="#Model_Cost">Create the Model and Cost Function (Total Loss)</a></li>
    <li><a href="#BGD">Train the Model: Batch Gradient Descent</a></li>
    <li><a href="#SGD">Train the Model: Stochastic Gradient Descent</a></li>
    <li><a href="#SGD_Loader">Train the Model: Stochastic Gradient Descent with DataLoader</a></li>
</ul>



<h2>Preparation</h2>

In [None]:
# These are the libraries we are going to use in the lab.
import torch
import numpy as np

<h2 id="Makeup_Data">Make Some Data</h2>

In [None]:
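# Set up the noise-free line f and the noisy training targets Y
torch.manual_seed(1) # Set random seed for reproducibility

X = torch.arange(-3, 3, 0.1).view(-1, 1)  # inputs from -3.0 up to (not including) 3.0, as a column vector
f = 1 * X - 1                             # true line: slope 1, bias -1
Y = f + 0.1 * torch.randn(X.size())       # add Gaussian noise with standard deviation 0.1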

<h2 id="Model_Cost">Create the Model and Cost Function (Total Loss)</h2>

In [None]:
# Step 1: define the forward function (it reads the global parameters w and b, which are created in the training cells below)
def forward(x):
    return w * x + b

In [None]:
# Step 2: define the mean squared error (MSE) loss
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)
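For reference, these are the gradients that `loss.backward()` computes for this criterion when the model is `forward(x) = w * x + b`. Here η stands for `lr`. In BGD the sums run over all N samples; in SGD with one sample per step, N = 1.

$$\ell(w, b) = \frac{1}{N}\sum_{n=1}^{N}\left(w x_n + b - y_n\right)^2$$

$$\frac{\partial \ell}{\partial w} = \frac{2}{N}\sum_{n=1}^{N}\left(w x_n + b - y_n\right)x_n, \qquad \frac{\partial \ell}{\partial b} = \frac{2}{N}\sum_{n=1}^{N}\left(w x_n + b - y_n\right)$$

$$w \leftarrow w - \eta\,\frac{\partial \ell}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial \ell}{\partial b}$$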

<h2 id="BGD">Train the Model: Batch Gradient Descent</h2>

In [None]:
# Define the parameters w, b for y = wx + b
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

# Define the learning rate and create an empty list to store the loss at each iteration.
lr = 0.1
LOSS_BGD = []

# The function for training the model

def train_model(iterations):
    
    # Loop: one full-batch update per iteration
    for epoch in range(iterations):
        
        # make a prediction
        Yhat = forward(X)
        
        # calculate the loss 
        loss = criterion(Yhat, Y)

        # Optional plotting: requires a get_surface plotting object that is not defined in this notebook
        # get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        # get_surface.plot_ps()
            
        # store the loss in LOSS_BGD as a Python float (detached from the autograd graph)
        LOSS_BGD.append(loss.item())
        
        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        loss.backward()
        
        # update parameters slope and bias
        w.data = w.data - lr * w.grad.data
        b.data = b.data - lr * b.grad.data
        
        # zero the gradients so they do not accumulate into the next backward pass
        w.grad.data.zero_()
        b.grad.data.zero_()

In [None]:
# Train the model with 10 iterations
train_model(10)
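For comparison, this is a plain-Python sketch of what `train_model` does, with the gradients from `loss.backward()` replaced by the hand-derived MSE gradients. It assumes noise-free targets `y = x - 1` so the result is deterministic; the notebook adds `0.1 * torch.randn` noise. The names `make_data`, `mse`, and `train_bgd` are hypothetical.

```python
# Plain-Python sketch of train_model (batch gradient descent), assuming
# noise-free targets y = x - 1 instead of the notebook's noisy Y.

def make_data():
    """Inputs -3.0, -2.9, ..., 2.9 (like torch.arange(-3, 3, 0.1)) and y = x - 1."""
    xs = [-3 + 0.1 * k for k in range(60)]
    ys = [x - 1 for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b over the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_bgd(xs, ys, w=-15.0, b=-10.0, lr=0.1, iterations=10):
    """Full-batch gradient descent; returns the final (w, b) and the loss history."""
    losses = []
    n = len(xs)
    for _ in range(iterations):
        losses.append(mse(w, b, xs, ys))        # loss before this update
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        dw = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        db = 2 * sum(residuals) / n
        w, b = w - lr * dw, b - lr * db         # one update per pass over the data
    return w, b, losses


xs, ys = make_data()
w, b, losses = train_bgd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, first loss = {losses[0]:.2f}, last loss = {losses[-1]:.4f}")
```

With `lr = 0.1` the recorded loss decreases at every iteration on this data. Running more iterations, for example `train_bgd(xs, ys, iterations=100)`, brings `w` and `b` very close to the true values 1 and -1.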

<h2 id="SGD">Train the Model: Stochastic Gradient Descent</h2>

In [None]:
# Reset the parameters to the same starting point and define the SGD training function

LOSS_SGD = []
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

def train_model_SGD(epochs):
    
    # Loop over the whole dataset `epochs` times
    for epoch in range(epochs):
        
        # SGD updates on one sample at a time, so its per-step loss only approximates the total loss;
        # here we compute the true total loss over the whole dataset once per epoch and store it
        Yhat = forward(X)

        # store the loss 
        LOSS_SGD.append(criterion(Yhat, Y).tolist())
        
        for x, y in zip(X, Y):   # main difference from BGD: one parameter update per sample
            
            # make a prediction
            yhat = forward(x)
        
            # calculate the loss 
            loss = criterion(yhat, y)

            # Section for plotting
            # get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        
            # backward pass: compute gradient of the loss with respect to all the learnable parameters
            loss.backward()
        
            # update parameters slope and bias
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data

            # zero the gradients so they do not accumulate into the next backward pass
            w.grad.data.zero_()
            b.grad.data.zero_()
            
        # plot surface and data space after each epoch
        # get_surface.plot_ps()

In [None]:
# Train the model with 10 epochs
train_model_SGD(10)
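The same per-sample loop can be written without PyTorch. This plain-Python version of `train_model_SGD` is an illustration that makes two assumptions:

- The targets are noise-free (`y = x - 1`).
- The gradient of the single-sample loss `r**2` is written by hand (`2 * r * x` for `w`, `2 * r` for `b`) instead of calling `loss.backward()`.

`train_sgd` and the other helpers are hypothetical names.

```python
# Plain-Python sketch of train_model_SGD, assuming noise-free targets y = x - 1.

def make_data():
    """Inputs -3.0, -2.9, ..., 2.9 and noise-free targets y = x - 1."""
    xs = [-3 + 0.1 * k for k in range(60)]
    ys = [x - 1 for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b over the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_sgd(xs, ys, w=-15.0, b=-10.0, lr=0.1, epochs=10):
    """Per-sample SGD; returns final (w, b), per-epoch full loss, and update count."""
    true_losses = []
    updates = 0
    for _ in range(epochs):
        # Like train_model_SGD: record the full-dataset loss once per epoch
        true_losses.append(mse(w, b, xs, ys))
        for x, y in zip(xs, ys):
            r = w * x + b - y                          # residual of this one sample
            w, b = w - lr * 2 * r * x, b - lr * 2 * r  # gradient of r**2 w.r.t. w and b
            updates += 1
    return w, b, true_losses, updates


xs, ys = make_data()
w, b, true_losses, updates = train_sgd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, updates = {updates}")
```

Each epoch performs 60 updates here, versus one update per epoch in the BGD sketch.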

<h2 id="SGD_Loader">Train the Model: Stochastic Gradient Descent with DataLoader</h2>

In [None]:
# Import the library for DataLoader
from torch.utils.data import Dataset, DataLoader

In [None]:
# Dataset Class

class Data(Dataset):
    
    # Constructor: same inputs as X, with noise-free targets y = x - 1 (no added noise, unlike Y)
    def __init__(self):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1
        self.len = self.x.shape[0]
        
    # Return the (x, y) pair at the given index
    def __getitem__(self, index):    
        return self.x[index], self.y[index]
    
    # Return the number of samples
    def __len__(self):
        return self.len

In [None]:
# Create the dataset and check the length
dataset = Data()
print("The length of dataset: ", len(dataset))

In [None]:
# Create the DataLoader; batch_size = 1 yields one (x, y) pair per iteration, i.e. plain SGD
trainloader = DataLoader(dataset = dataset, batch_size = 1)
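`DataLoader` does much more than this (tensor collation, worker processes, samplers), but its basic iteration order can be approximated in plain Python. The sketch below is a simplified stand-in; `ToyDataset` and `simple_loader` are hypothetical names, not PyTorch APIs. It walks the indices in order, or shuffled (loosely analogous to `DataLoader(..., shuffle=True)`), and yields batches of `batch_size` pairs. Like `DataLoader`'s default, it keeps a final short batch.

```python
import random


class ToyDataset:
    """Hypothetical plain-Python analogue of the Data class: indexable (x, y) pairs."""

    def __init__(self):
        self.x = [-3 + 0.1 * k for k in range(60)]
        self.y = [xi - 1 for xi in self.x]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)


def simple_loader(dataset, batch_size=1, shuffle=False, seed=None):
    """Yield (xs, ys) lists of up to batch_size samples.

    Simplified sketch, not PyTorch's implementation: no tensors, no workers.
    The last batch is kept even if it is short (like DataLoader's default drop_last=False).
    """
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded shuffle for a reproducible order
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        yield [x for x, _ in batch], [y for _, y in batch]


dataset = ToyDataset()
batches = list(simple_loader(dataset, batch_size=1))
print(len(batches), batches[0])
```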

In [None]:
# Reset the parameters and define the training function that draws samples from the DataLoader

w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
LOSS_Loader = []

def train_model_DataLoader(epochs):
    
    # Loop
    for epoch in range(epochs):
        
        # SGD updates on one sample at a time; here we compute and store the true total loss over the whole dataset once per epoch
        Yhat = forward(X)
        
        # store the loss 
        LOSS_Loader.append(criterion(Yhat, Y).tolist())
        
        for x, y in trainloader:
            
            # make a prediction
            yhat = forward(x)
            
            # calculate the loss
            loss = criterion(yhat, y)
            
            # Section for plotting
            # get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            
            # Backward pass: compute gradient of the loss with respect to all the learnable parameters
            loss.backward()
            
            # Update parameters slope and bias
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            
            # Clear gradients 
            w.grad.data.zero_()
            b.grad.data.zero_()
            
        #plot surface and data space after each epoch    
        # get_surface.plot_ps()

In [None]:
# Train the model for 10 epochs
train_model_DataLoader(10)
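Setting `batch_size` larger than 1 turns the same loop into mini-batch gradient descent, the "small batch" variant mentioned in the introduction. In the notebook the equivalent change would be passing a different `batch_size` to `DataLoader`. The plain-Python sketch below is an illustration under assumptions: noise-free data, in-order batches, and hypothetical helper names `batches` and `train_minibatch`.

```python
# Plain-Python sketch: the DataLoader training loop with batch_size > 1
# (mini-batch gradient descent), assuming noise-free targets y = x - 1.

def make_data():
    """Inputs -3.0, -2.9, ..., 2.9 and noise-free targets y = x - 1."""
    xs = [-3 + 0.1 * k for k in range(60)]
    ys = [x - 1 for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b over the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def batches(xs, ys, batch_size):
    """Yield consecutive (xs, ys) slices, like an unshuffled DataLoader."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]


def train_minibatch(xs, ys, batch_size=10, w=-15.0, b=-10.0, lr=0.1, epochs=10):
    """One update per mini-batch, using the batch-averaged MSE gradient."""
    updates = 0
    for _ in range(epochs):
        for bx, by in batches(xs, ys, batch_size):
            n = len(bx)
            residuals = [w * x + b - y for x, y in zip(bx, by)]
            dw = 2 * sum(r * x for r, x in zip(residuals, bx)) / n
            db = 2 * sum(residuals) / n
            w, b = w - lr * dw, b - lr * db
            updates += 1
    return w, b, updates


xs, ys = make_data()
w, b, updates = train_minibatch(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}, updates = {updates}, loss = {mse(w, b, xs, ys):.6f}")
```

The two extremes recover the earlier methods. `batch_size=1` gives per-sample SGD (60 updates per epoch), and `batch_size=60` (the whole dataset) gives a single BGD update per epoch.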