<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Data-and-Model" data-toc-modified-id="Data-and-Model-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Data and Model</a></span></li><li><span><a href="#The-backward-method" data-toc-modified-id="The-backward-method-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>The <code>backward</code> method</a></span><ul class="toc-item"><li><span><a href="#The-zero_-method" data-toc-modified-id="The-zero_-method-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>The <code>zero_</code> method</a></span></li></ul></li><li><span><a href="#Updating-Parameters" data-toc-modified-id="Updating-Parameters-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Updating Parameters</a></span><ul class="toc-item"><li><span><a href="#First-Approach" data-toc-modified-id="First-Approach-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>First Approach</a></span></li><li><span><a href="#Second-Approach" data-toc-modified-id="Second-Approach-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Second Approach</a></span></li><li><span><a href="#Third-Approach" data-toc-modified-id="Third-Approach-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Third Approach</a></span></li></ul></li></ul></div>

In [1]:
import torch
import numpy as np
from tqdm import tqdm



## Data and Model

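`Let's take the true linear regression function to be:`
- `y = b + w * x`, where `w * x` is the dot product of the weight vector with the two features of each data point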
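
As a quick sanity check of this model, the sketch below evaluates `y = b + w * x` for one hand-picked point and for a small batch. The point `[1, 1]` and the batch values are illustrative assumptions, not data from this notebook; `w = [2, 5]` and `b = 1` match the true parameters defined in the cells that follow.

```python
import torch

# True parameters, matching the values used in this notebook
w = torch.tensor([2.0, 5.0])
b = torch.tensor(1.0)

# A single hand-picked point (illustrative assumption)
x_single = torch.tensor([1.0, 1.0])
y_single = b + torch.dot(w, x_single)  # 1 + 2*1 + 5*1

# A small batch of shape (3, 2); each row is one data point
x_batch = torch.tensor([[1.0, 1.0], [0.0, 0.0], [2.0, -1.0]])
y_batch = b + x_batch @ w  # result has shape (3,)

print(y_single, y_batch)
```

The batch form `x_batch @ w` yields one label per row, whereas the notebook's `torch.matmul(w.reshape(1, 2), x.t())` yields the same values arranged as a single row of shape `(1, N)`.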

In [2]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")

Device: cpu


In [3]:
w = torch.tensor([2, 5], dtype=torch.float, device=device)
b = torch.scalar_tensor(1, dtype=torch.float, device=device)
print(f"w: {w} \nb: {b}")

w: tensor([2., 5.]) 
b: 1.0


In [4]:
# Creating data points and their labels from the true function
torch.manual_seed(42)

x = torch.randn(size=(100, 2), dtype=torch.float, device=device)
y = torch.matmul(torch.reshape(w, (1, 2)), torch.t(x)) + b

In [5]:
y.shape

torch.Size([1, 100])

## The `backward` method

In [6]:
# Creating Parameters
pt_w = torch.randn((1, 2), dtype=torch.float, requires_grad=True, device=device)
pt_b = torch.randn((1), dtype=torch.float, requires_grad=True, device=device)

In [7]:
y_hat = torch.matmul(pt_w, torch.t(x)) + pt_b

error = (y_hat - y)
loss = (error**2).mean()

In [8]:
loss.backward()

In [9]:
print(error.requires_grad, y_hat.requires_grad, \
      pt_w.requires_grad, pt_b.requires_grad)
print(y.requires_grad, x.requires_grad)

True True True True
False False


In [10]:
print(pt_w.grad, pt_b.grad)

tensor([[ -7.3571, -10.8245]]) tensor([-4.8556])
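
The values autograd reports can be cross-checked by hand. For the mean squared error over N points, the gradient with respect to the weights is `(2/N) * sum(error_i * x_i)` and with respect to the bias is `(2/N) * sum(error_i)`. The sketch below rebuilds a comparable setup and compares both; the names `manual_w` and `manual_b` are hypothetical, and the random draws are not guaranteed to reproduce the exact numbers printed above.

```python
import torch

torch.manual_seed(42)

# Data from the true function y = 1 + 2*x1 + 5*x2
x = torch.randn(100, 2)
y = torch.matmul(torch.tensor([[2.0, 5.0]]), x.t()) + 1.0  # shape (1, 100)

# Trainable parameters
pt_w = torch.randn((1, 2), requires_grad=True)
pt_b = torch.randn(1, requires_grad=True)

# Forward pass and autograd gradients
y_hat = torch.matmul(pt_w, x.t()) + pt_b
error = y_hat - y
loss = (error ** 2).mean()
loss.backward()

# Closed-form gradients of the mean squared error
with torch.no_grad():
    n = error.numel()
    manual_w = (2.0 / n) * torch.matmul(error, x)  # (1, 100) @ (100, 2) -> (1, 2)
    manual_b = (2.0 / n) * error.sum().reshape(1)

print(torch.allclose(pt_w.grad, manual_w, atol=1e-5))
print(torch.allclose(pt_b.grad, manual_b, atol=1e-5))
```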


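`Computing y_hat, error and loss, then calling loss.backward(), stores the gradient of each parameter in its .grad attribute. Each further call to backward() adds to the stored gradient instead of replacing it.`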
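
A minimal sketch of that accumulation, assuming a small stand-alone setup (the seed, the 1-D weight vector `w`, and the helper `mse` are illustrative and not part of this notebook): running the forward and backward pass twice without resetting leaves twice the single-pass gradient in `.grad`.

```python
import torch

torch.manual_seed(0)
x = torch.randn(100, 2)
y = x @ torch.tensor([2.0, 5.0]) + 1.0

w = torch.randn(2, requires_grad=True)
b = torch.randn(1, requires_grad=True)

def mse(w, b):
    # Mean squared error of the linear model on the fixed data
    return ((x @ w + b - y) ** 2).mean()

# First backward pass: .grad holds the gradient once
mse(w, b).backward()
first = w.grad.clone()

# Second backward pass with unchanged parameters and no reset:
# the same gradient is added on top of the stored one
mse(w, b).backward()
second = w.grad.clone()

print(first, second)
```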

### The `zero_` method

`Because backward() accumulates gradients, we need to reset them to zero every time we have used them to update the parameters. That is what the in-place zero_() method is for.`

In [11]:
print(pt_w.grad.zero_(), pt_b.grad.zero_())

tensor([[0., 0.]]) tensor([0.])
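
To illustrate the effect of the reset, the sketch below (a stand-alone setup with illustrative names, not the notebook's tensors) zeroes the gradient between two backward passes. The trailing underscore marks `zero_()` as an in-place operation: it overwrites the existing `.grad` tensor rather than allocating a new one.

```python
import torch

torch.manual_seed(0)
x = torch.randn(100, 2)
y = x @ torch.tensor([2.0, 5.0]) + 1.0
w = torch.randn(2, requires_grad=True)

def loss_fn():
    # Mean squared error without a bias term, to keep the sketch short
    return ((x @ w - y) ** 2).mean()

loss_fn().backward()
single_pass = w.grad.clone()

# In-place reset: the returned tensor uses the same memory as w.grad
returned = w.grad.zero_()
shares_storage = returned.data_ptr() == w.grad.data_ptr()

# After the reset, a fresh backward pass gives the single-pass gradient again
loss_fn().backward()
after_reset = w.grad.clone()

print(shares_storage, torch.allclose(after_reset, single_pass))
```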


## Updating Parameters

`Three approaches to updating the parameters:`
- `Reassigning each trainable parameter with the assignment operator "=" and zeroing the gradients in each iteration`
- `Updating each trainable parameter with Python's in-place operator "-="`
- `Updating each trainable parameter in place inside the torch.no_grad() context manager`

In [12]:
n_epochs = 1000
device = "cuda" if torch.cuda.is_available() else "cpu"
lr = 0.001

### First Approach

In [13]:
# Initializing the weights
pt_w = torch.randn((1, 2), dtype=torch.float, requires_grad=True, device=device)
pt_b = torch.randn((1), dtype=torch.float, requires_grad=True, device=device)

In [14]:
for epoch in range(n_epochs):
    print(epoch)
    # Forward propagation
    y_hat = torch.add(pt_b, torch.matmul(pt_w, torch.t(x)))

    error = (y_hat - y)
    loss = (error**2).mean()

    # Backward Propagation
    loss.backward()

    pt_w = pt_w - lr*pt_w.grad
    pt_b = pt_b - lr*pt_b.grad

    # Resetting the values of gradients
    pt_w.grad.zero_()
    pt_b.grad.zero_()
    
print(pt_b, pt_w)

0


  return self._grad


AttributeError: 'NoneType' object has no attribute 'zero_'
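
The error can be traced to how reassignment interacts with autograd. The sketch below, using small illustrative tensors rather than the notebook's data, shows that `w - lr * w.grad` produces a new non-leaf tensor: it has a `grad_fn`, and its `.grad` is `None`, so there is nothing for `zero_()` to act on. The names `new_w` and `new_grad` are hypothetical.

```python
import warnings

import torch

torch.manual_seed(0)
x = torch.randn(4, 2)
w = torch.randn((1, 2), requires_grad=True)

loss = (torch.matmul(w, x.t()) ** 2).mean()
loss.backward()

# Reassignment-style update: the result is part of the computation graph
new_w = w - 0.001 * w.grad

print(w.is_leaf, new_w.is_leaf)   # the original is a leaf, the result is not
print(new_w.grad_fn is not None)  # the result remembers the subtraction

# Reading .grad on a non-leaf tensor emits a warning and returns None
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    new_grad = new_w.grad

print(new_grad)
```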

### Second Approach

In [15]:
# Initializing the weights
pt_w = torch.randn((1, 2), dtype=torch.float, requires_grad=True, device=device)
pt_b = torch.randn((1), dtype=torch.float, requires_grad=True, device=device)

In [16]:
for epoch in range(n_epochs):
    print(epoch)
    # Forward propagation
    y_hat = torch.add(pt_b, torch.matmul(pt_w, torch.t(x)))

    error = (y_hat - y)
    loss = (error**2).mean()

    # Backward Propagation
    loss.backward()

    pt_w -= lr*pt_w.grad
    pt_b -= lr*pt_b.grad

    # Resetting the values of gradients
    pt_w.grad.zero_()
    pt_b.grad.zero_()
    
print(pt_b, pt_w)

0


RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
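
This error can be reproduced in isolation. The sketch below, with an illustrative tensor `w` and a constant step of 0.1 (both assumptions), catches the `RuntimeError` raised by a plain in-place update and then repeats the same update inside `torch.no_grad()`, where it is allowed and leaves `w` a leaf tensor that still requires grad.

```python
import torch

torch.manual_seed(0)
w = torch.randn((1, 2), requires_grad=True)
before = w.detach().clone()

# Plain in-place update on a leaf tensor that requires grad: autograd refuses
try:
    w -= 0.1 * torch.ones_like(w)
    raised = False
except RuntimeError as exc:
    raised = True
    print(f"RuntimeError: {exc}")

# The same update with gradient tracking disabled succeeds
with torch.no_grad():
    w -= 0.1 * torch.ones_like(w)

print(w.is_leaf, w.requires_grad)
```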

### Third Approach

In [18]:
# Initializing the weights
pt_w = torch.randn((1, 2), dtype=torch.float, requires_grad=True, device=device)
pt_b = torch.randn((1), dtype=torch.float, requires_grad=True, device=device)

In [20]:
for epoch in tqdm(range(n_epochs)):
    # Forward propagation
    y_hat = torch.add(pt_b, torch.matmul(pt_w, torch.t(x)))

    error = (y_hat - y)
    loss = (error**2).mean()

    # Backward Propagation
    loss.backward()
    
    with torch.no_grad():
        pt_w -= lr*pt_w.grad
        pt_b -= lr*pt_b.grad

    # Resetting the values of gradients
    pt_w.grad.zero_()
    pt_b.grad.zero_()
    
print(pt_b, pt_w)

100%|██████████| 1000/1000 [00:00<00:00, 10642.50it/s]

tensor([1.0050], requires_grad=True) tensor([[1.7841, 4.4107]], requires_grad=True)





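`The first two approaches fail because of how PyTorch tracks operations in its dynamic computation graph. In the first, reassigning pt_w and pt_b creates new non-leaf tensors whose .grad is None, so zero_() has nothing to act on. In the second, autograd refuses an in-place update of a leaf tensor that requires grad. Wrapping the update in torch.no_grad() keeps it out of the graph, so the parameters can be updated in place.`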
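
The printed result of the third approach, roughly `[1.78, 4.41]` for the weights, is still some way from the true `[2, 5]`, which is what a small step size over 1000 epochs would produce. As a sketch of the same `no_grad` update run closer to convergence, the helper below wraps the loop in a function; the name `fit_linear` and the larger learning rate `lr=0.1` are assumptions chosen for illustration, not settings from this notebook.

```python
import torch

def fit_linear(x, y, lr=0.1, n_epochs=1000):
    """Hypothetical helper: gradient descent with manual updates under no_grad."""
    w = torch.randn((1, 2), requires_grad=True)
    b = torch.randn(1, requires_grad=True)
    for _ in range(n_epochs):
        # Forward pass and mean squared error
        y_hat = torch.matmul(w, x.t()) + b
        loss = ((y_hat - y) ** 2).mean()

        # Backward pass fills w.grad and b.grad
        loss.backward()

        # In-place update, hidden from autograd
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad

        # Reset accumulated gradients before the next iteration
        w.grad.zero_()
        b.grad.zero_()
    return w.detach(), b.detach()

torch.manual_seed(42)
x = torch.randn(100, 2)
y = torch.matmul(torch.tensor([[2.0, 5.0]]), x.t()) + 1.0

w_fit, b_fit = fit_linear(x, y)
print(w_fit, b_fit)
```

Because the labels here are noise-free, the fitted values should land very close to the true weights and bias once the loop has converged.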