# Importing PyTorch

In [5]:
import torch
import torch.nn as nn # It has almost all Layers and Activation functions.
import torch.optim as optim # It has a lot of Optimizers.
import torch.nn.functional as F # It has some Activation Functions as well.
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms

In [6]:
# Importing Other Important Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

- Now let us look at how Gradient Descent works by implementing it from scratch.


# What is Gradient Descent?

- Let us suppose we have **X=5, Y_true = 10**.
- The objective is to find the value of **'W'** such that **W*X=Y_true**.
- Here that is easy: dividing Y_true by X gives 10/5=2 as the value of **W**.
- In the real world, however, equations are rarely this easy, and an **Exact Solution** is often impractical or impossible to compute.
- Mathematics offers many **Numerical Methods** that approximate the solution iteratively instead; one of them is the **Gradient Descent** algorithm.
- **Gradient Descent:** Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In Deep learning, we use gradient descent to update the parameters of our model.
-- <img src='https://lucidar.me/en/neural-networks/files/gradient-overview.png' width=400 />
- Here the target minimum is known as the **Global Minimum**, which is the lowest possible error between the true and predicted values.
- The weight that reaches this global minimum is considered the optimum weight for prediction.
- We will first use *NumPy to understand it in depth*.
- Later we will use **AUTOGRAD** to simplify it further.
- As we are going to look at two implementations, let us move forward with the first one.
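
Before moving to arrays, the toy problem above (X=5, Y_true=10) can be sketched in plain Python. This is only an illustration: the starting guess, the learning rate, the number of steps, and the helper names `loss` and `grad` are assumptions chosen for this sketch, not part of the implementations that follow.

```python
# Toy problem from the text: find W such that W * X == Y_true.
X, Y_true = 5.0, 10.0

def loss(w):
    # Squared error between the prediction w * X and the target.
    return (w * X - Y_true) ** 2

def grad(w):
    # Derivative of the squared error with respect to w.
    return 2 * X * (w * X - Y_true)

w = 0.0               # arbitrary starting guess (assumption)
learning_rate = 0.01  # illustrative step size (assumption)
for step in range(50):
    # Move against the gradient, i.e. in the direction that lowers the loss.
    w -= learning_rate * grad(w)

print(f'W={w:.4f}, loss={loss(w):.6f}')
```

With these settings the weight should settle very close to 2, the value obtained earlier by dividing Y_true by X.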
 

# Gradient Descent using Numpy

## Initializing the Data.

In [19]:
# Firstly let us decide our Predictor->X and Target->Y values respectively.
X = np.array([1,2,3,4])
Y = np.array([3,6,9,12])

- The equation of our forward step is **Y=W*X**, and clearly the value of **W** is **3**, since multiplying **X** by 3 gives **Y**.
- Let us initialize **W** to zero and see if we can iteratively reach the true solution by updating the weight.

In [20]:
# Let us firstly set weight to zero
W=0

- Let us define our forward function as follows.
- We will also define a loss function, which in our case is the **Mean Squared Error**.

## Creating Forward function and Loss function

In [10]:
def forward_step(X):
  predicted_value = W*X
  return predicted_value

def loss_function(y_true, y_pred):
  # Mean Squared Error.
  loss = ((y_true-y_pred)**2).mean()
  return loss

## Gradient Function:
- Our Gradient Function always depends on the Loss function that we choose.
- In fact, it is the **First Derivative** of our loss function with respect to the weight.
- **Loss function:**
- <img src='https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTFC9BlOat6q3hSstsBxmrCweT6NDdfj6mB206wzGPAyM-FwlE7TsoU_Kb-qwyrKhHS5cs&usqp=CAU' width=400 height=100/>
- Differentiating the **Loss function** (our objective function) once with respect to the weight gives the **Gradient function**, which looks as follows:
- <img src='https://hackernoon.com/hn-images/0*XFK9C3go0VaWR_f4.png' width=400 />
- The gradient function is then used to update the weights, with a formula that looks as follows.
- <img src='https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSKG9gm8Q1Xwaz-L-W_U0qNY02X7nENMLRJBUtY1eVOyZryQgYqcE_gnQpWhCiAPVmYKQ&usqp=CAU' width=400 />

- where *Alpha* is the **Learning Rate**.
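
For the model used in this notebook, the derivation can be written out explicitly. This is a sketch assuming the forward step $\hat{y}_i = W x_i$ and $N$ samples; the notation in the images above may differ slightly.

$$
L(W) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - W x_i\right)^2
$$

$$
\frac{dL}{dW} = \frac{1}{N}\sum_{i=1}^{N} 2\,x_i\left(W x_i - y_i\right)
$$

$$
W \leftarrow W - \alpha\,\frac{dL}{dW}
$$

Note the factor $\frac{1}{N}$: the gradient of a *mean* squared error is itself an average over the samples.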

## Applying Gradient Descent using Numpy

In [11]:
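# The gradient function after differentiation.
# Note: np.dot already sums over the samples and returns a scalar, so the trailing
# .mean() has no effect; this is the summed gradient, N times the gradient of the mean loss.
def Gradient_function(X,y_true, y_pred):
  return np.dot(2*X , y_pred-y_true).mean()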
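
As a quick check on what `Gradient_function` returns, the sketch below compares the `np.dot` expression with an element-wise mean and with a finite-difference estimate of the MSE gradient. The helper names `grad_dot`, `grad_mean`, and `grad_numeric` are hypothetical and introduced only for this comparison.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=float)
Y = np.array([3, 6, 9, 12], dtype=float)

def mse(w):
    # Mean squared error of the model y = w * x.
    return ((Y - w * X) ** 2).mean()

def grad_dot(w):
    # Same expression as Gradient_function: np.dot sums over the samples,
    # so the trailing .mean() acts on a single scalar and changes nothing.
    return np.dot(2 * X, w * X - Y).mean()

def grad_mean(w):
    # Hypothetical variant: element-wise product, then average over samples.
    return (2 * X * (w * X - Y)).mean()

def grad_numeric(w, eps=1e-6):
    # Central finite difference of the MSE, used only as a reference value.
    return (mse(w + eps) - mse(w - eps)) / (2 * eps)

w = 0.0
print(grad_dot(w), grad_mean(w), grad_numeric(w))
```

At `w = 0` the `np.dot` version is four times the mean-based one (four samples), and the finite-difference estimate agrees with the mean-based one. With the same learning rate, a loop using the summed gradient therefore takes larger steps than one using the gradient of the mean loss.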

In [30]:
# Let us initialize some parameters and start finding the solution.
W=0
print(f'Initialized Weight :{W}')
print(f'Predictions with this weight:{forward_step(X)}')
learning_rate = 0.01
n_iters = 11
for i in range(n_iters):
  print(f'Weight before forward:{W}')
  y_preds = forward_step(X)
  loss = loss_function(Y, y_preds)
  gradient = Gradient_function(X, Y, y_preds)
  W = W - gradient*learning_rate
  print(f'Parameters at {i} steps: \nPredictions={y_preds}, loss={loss:.3f}, updated weight={W:.3f}')
print('\n')
y_preds = [round(i,3) for i in y_preds]
print(f'Comparing real and predicted values\n Y_true={Y}\n Y_pred={y_preds}')

Initialized Weight :0
Predictions with this weight:[0 0 0 0]
Weight before forward:0
Parameters at 0 steps: 
Predictions=[0 0 0 0], loss=67.500, updated weight=1.800
Weight before forward:1.8
Parameters at 1 steps: 
Predictions=[1.8 3.6 5.4 7.2], loss=10.800, updated weight=2.520
Weight before forward:2.52
Parameters at 2 steps: 
Predictions=[ 2.52  5.04  7.56 10.08], loss=1.728, updated weight=2.808
Weight before forward:2.808
Parameters at 3 steps: 
Predictions=[ 2.808  5.616  8.424 11.232], loss=0.276, updated weight=2.923
Weight before forward:2.9232
Parameters at 4 steps: 
Predictions=[ 2.9232  5.8464  8.7696 11.6928], loss=0.044, updated weight=2.969
Weight before forward:2.96928
Parameters at 5 steps: 
Predictions=[ 2.96928  5.93856  8.90784 11.87712], loss=0.007, updated weight=2.988
Weight before forward:2.987712
Parameters at 6 steps: 
Predictions=[ 2.987712  5.975424  8.963136 11.950848], loss=0.001, updated weight=2.995
Weight before forward:2.9950848
Parameters at 7 steps:

- Looks Pretty Cool!
- Now let us try the same using **PyTorch's AUTOGRAD** and see how simple it is to perform.

# Gradient Descent using PyTorch

## Initializing data as PyTorch Tensors and using them

In [41]:
# First we convert our variables to PyTorch tensors, because autograd only tracks operations on tensors.
# NumPy arrays carry no gradient information, so they cannot be used for the gradient computation.

X = torch.tensor([1,2,3,4], dtype=torch.float32) # Always specify the Data Type if possible.
Y = torch.tensor([3,6,9,12], dtype=torch.float32)

In [76]:
# Also the weight should be a Torch tensor.

W = torch.tensor([0.0], dtype=torch.float32, requires_grad=True) # requires_grad=True because we are interested in its gradient.

- We reuse the same loss and forward functions; they rely only on arithmetic operators and `.mean()`, which work on tensors as well.
- Let us move forward to the training part.
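
Before the full loop, a minimal sketch of what a single `backward()` call does may help. It assumes the same data and a weight initialized to zero; `mse_loss` is a hypothetical helper that folds the forward step and the loss into one function for brevity.

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([3, 6, 9, 12], dtype=torch.float32)
W = torch.tensor([0.0], dtype=torch.float32, requires_grad=True)

def mse_loss(w):
    # Forward step (w * X) followed by the mean squared error.
    return ((Y - w * X) ** 2).mean()

mse_loss(W).backward()
first = W.grad.clone()        # gradient after one backward pass

mse_loss(W).backward()        # second pass without zeroing W.grad
accumulated = W.grad.clone()  # the new gradient was added to the old one

W.grad.zero_()                # reset, then compute again
mse_loss(W).backward()
after_reset = W.grad.clone()

# Analytic gradient of the mean loss, computed without autograd.
analytic = (2 * X * (W.detach() * X - Y)).mean()
print(first, accumulated, after_reset, analytic)
```

The first and the post-reset gradients should match the analytic value, while the middle one is doubled. This is why the training loop calls `W.grad.zero_()` after every update.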

## Applying Gradient Descent using PyTorch's AutoGrad

In [73]:
# Let us initialize some parameters and start finding the solution.
learning_rate = 0.01
n_iters = 11
for i in range(n_iters):
  print(f'Weight before forward:{W}')
  y_preds = forward_step(X)
  loss = loss_function(Y, y_preds)

  # gradient = Gradient_function(X, Y, y_preds)  # No longer needed; autograd computes the gradient below.

  loss.backward()  # Computes d(loss)/dW and stores it in W.grad.
  print(W.grad)
  # Update the weight inside torch.no_grad() so that the update itself
  # is not recorded in the autograd graph.
  with torch.no_grad():
    W -= (learning_rate * W.grad) # Gradient descent step using the gradient of W.
  
  print(W)
  # backward() accumulates into W.grad, so reset it to zero before the next iteration.
  W.grad.zero_()
  if i%2==0:
    print(f'Parameters at {i} steps: \nPredictions={y_preds}, loss={loss}, updated weight={W}')
print('\n')
print(f'Comparing real and predicted values\n Y_true={Y}\n Y_pred={y_preds}')

Weight before forward:tensor([0.], requires_grad=True)
tensor([-45.])
tensor([0.4500], requires_grad=True)
Parameters at 0 steps: 
Predictions=tensor([0., 0., 0., 0.], grad_fn=<MulBackward0>), loss=67.5, updated weight=tensor([0.4500], requires_grad=True)
Weight before forward:tensor([0.4500], requires_grad=True)
tensor([-38.2500])
tensor([0.8325], requires_grad=True)
Weight before forward:tensor([0.8325], requires_grad=True)
tensor([-32.5125])
tensor([1.1576], requires_grad=True)
Parameters at 2 steps: 
Predictions=tensor([0.8325, 1.6650, 2.4975, 3.3300], grad_fn=<MulBackward0>), loss=35.23542022705078, updated weight=tensor([1.1576], requires_grad=True)
Weight before forward:tensor([1.1576], requires_grad=True)
tensor([-27.6356])
tensor([1.4340], requires_grad=True)
Weight before forward:tensor([1.4340], requires_grad=True)
tensor([-23.4903])
tensor([1.6689], requires_grad=True)
Parameters at 4 steps: 
Predictions=tensor([1.4340, 2.8680, 4.3019, 5.7359], grad_fn=<MulBackward0>), loss

- **YAY!!!** The weight is moving toward the same solution, but more slowly than in the NumPy run.
- Let us **increase the Epochs** and see the difference.

In [77]:
# Let us initialize some parameters and start finding the solution.
learning_rate = 0.01
n_iters = 25
for i in range(n_iters):
  print(f'Weight before forward:{W}')
  y_preds = forward_step(X)
  loss = loss_function(Y, y_preds)

  # gradient = Gradient_function(X, Y, y_preds)  # No longer needed; autograd computes the gradient below.

  loss.backward()  # Computes d(loss)/dW and stores it in W.grad.
  print(W.grad)
  # Update the weight inside torch.no_grad() so that the update itself
  # is not recorded in the autograd graph.
  with torch.no_grad():
    W -= (learning_rate * W.grad) # Gradient descent step using the gradient of W.
  
  print(W)
  # backward() accumulates into W.grad, so reset it to zero before the next iteration.
  W.grad.zero_()
  if i%5==0:
    print(f'Parameters at {i} steps: \nPredictions={y_preds}, loss={loss}, updated weight={W}')
print('\n')
print(f'Comparing real and predicted values\n Y_true={Y}\n Y_pred={y_preds}')

Weight before forward:tensor([0.], requires_grad=True)
tensor([-45.])
tensor([0.4500], requires_grad=True)
Parameters at 0 steps: 
Predictions=tensor([0., 0., 0., 0.], grad_fn=<MulBackward0>), loss=67.5, updated weight=tensor([0.4500], requires_grad=True)
Weight before forward:tensor([0.4500], requires_grad=True)
tensor([-38.2500])
tensor([0.8325], requires_grad=True)
Weight before forward:tensor([0.8325], requires_grad=True)
tensor([-32.5125])
tensor([1.1576], requires_grad=True)
Weight before forward:tensor([1.1576], requires_grad=True)
tensor([-27.6356])
tensor([1.4340], requires_grad=True)
Weight before forward:tensor([1.4340], requires_grad=True)
tensor([-23.4903])
tensor([1.6689], requires_grad=True)
Weight before forward:tensor([1.6689], requires_grad=True)
tensor([-19.9667])
tensor([1.8686], requires_grad=True)
Parameters at 5 steps: 
Predictions=tensor([1.6689, 3.3378, 5.0067, 6.6755], grad_fn=<MulBackward0>), loss=13.289022445678711, updated weight=tensor([1.8686], requires_g

- Increasing the number of Epochs brings the weight closer to the true value. Now let us quickly jump to the Conclusions.

# Conclusions:
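- The *Custom* one reached the real value **faster** than the *Torch* one because its `Gradient_function` sums the gradient over the samples (`np.dot`), while autograd differentiates the mean loss, so each custom step is 4 times larger on this data. Increasing the Epochs gives the same result with the *Torch* one as well.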
- **AutoGrad reduces the headache of keeping track of gradients.**
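
One way to see the difference in speed between the two runs is to look at how much each update shrinks the error `W - 3`. The sketch below assumes the same data (`Y = 3*X`) and learning rate as above; `weight_after` and `steps_to_reach` are hypothetical helpers written for this illustration only.

```python
X = [1.0, 2.0, 3.0, 4.0]
learning_rate = 0.01
target_w = 3.0

sum_sq = sum(x * x for x in X)  # sum of x_i^2
mean_sq = sum_sq / len(X)       # mean of x_i^2

# For Y = 3 * X, every update multiplies the error (W - 3) by a constant factor.
factor_sum = 1 - learning_rate * 2 * sum_sq    # summed gradient (NumPy loop)
factor_mean = 1 - learning_rate * 2 * mean_sq  # averaged gradient (autograd on the mean loss)

def weight_after(steps, factor, w0=0.0):
    # Closed-form weight after a number of gradient descent updates.
    return target_w + (w0 - target_w) * factor ** steps

def steps_to_reach(tolerance, factor, w0=0.0):
    # Count updates until |W - 3| drops to the tolerance or below.
    steps, err = 0, abs(w0 - target_w)
    while err > tolerance:
        err *= abs(factor)
        steps += 1
    return steps

print(factor_sum, factor_mean)
print(weight_after(1, factor_sum), weight_after(1, factor_mean))
print(steps_to_reach(0.01, factor_sum), steps_to_reach(0.01, factor_mean))
```

The first-step weights reproduce the values printed by the two loops (about 1.8 and 0.45), and the step counts show why the run that averages the gradient needs more Epochs at the same learning rate.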