# Tackling a problem statement from scratch (simple models to neural nets in PyTorch)

"We just got back from a trip to some obscure location, and we brought back a fancy, wall-mounted analog thermometer. It looks great, and it’s a perfect fit for our living room. Its only flaw is that it doesn’t show units. Not to worry, we’ve got a plan: we’ll build a dataset of readings and corresponding temperature values in our favorite units, choose a model, adjust its weights iteratively until a measure of the error is low enough, and finally be able to interpret the new readings in units we understand."

In [7]:
# First we'll get the data

t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]

In [8]:
import torch

t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

We'll start with a linear model. The two sets of measurements, t_c and t_u, might be linearly related to each other, so we can write:

t_c = w * t_u + b

w and b are the weight and the bias, respectively. The weight tells us how much a given input influences the output. The bias is what the output would be if all inputs were zero.

We need to estimate these unknown parameters so that the error between the predicted output and the actual output is as low as possible. A loss function measures that error: it is high when the predictions are far from the targets and ideally as low as possible for a perfect match. Training is therefore an optimization process in which we search for the values of w and b that minimize the loss.





The loss function here is the mean of the squared differences between the predicted temperatures t_p and the actual temperatures t_c.

loss_func = mean((t_p - t_c)**2) [squaring keeps every term non-negative, so errors of opposite sign cannot cancel]
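
To see why the differences are squared before averaging, here is a minimal sketch with made-up numbers. The two four-element tensors are hypothetical and not part of the thermometer dataset. Every prediction is off by 5 degrees, yet the signed errors average to zero:

```python
import torch

# Hypothetical targets and predictions, chosen so the errors alternate in sign.
targets = torch.tensor([0.0, 10.0, 20.0, 30.0])
predictions = targets + torch.tensor([5.0, -5.0, 5.0, -5.0])

signed_mean = (predictions - targets).mean()           # opposite errors cancel
squared_mean = ((predictions - targets) ** 2).mean()   # squared errors add up

print(signed_mean.item(), squared_mean.item())
```

The signed mean would report a perfect fit, while the squared version still registers the error.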



In [9]:
def model(t_u,w,b):
  return w*t_u + b

In [10]:
def loss_function(t_p,t_c):
  squared_diff = (t_p - t_c)**2
  return squared_diff.mean()

In [11]:
w = torch.ones(())
b = torch.zeros(())

t_p = model(t_u,w,b)
t_p

tensor([35.7000, 55.9000, 58.2000, 81.9000, 56.3000, 48.9000, 33.9000, 21.8000,
        48.4000, 60.4000, 68.4000])

In [12]:
loss = loss_function(t_p,t_c)
loss

tensor(1763.8848)

We'll minimize the loss using the gradient descent algorithm. Gradient descent computes the rate of change of the loss with respect to each parameter and modifies each parameter in the direction of decreasing loss.

The idea behind gradient descent is to nudge the parameters a little at a time in the direction that lowers the loss. To find that direction, we can probe the loss with a small change in a parameter (say delta = 0.1). If increasing w (or b) by delta makes the loss go down, we should increase w (or b). If it makes the loss go up, we should decrease w (or b).
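
The probing described above can be sketched as a finite-difference estimate. This is only an illustration: the names `rate_w`, `rate_b`, `new_w` and `new_b` and the learning rate value are assumptions, and the data and functions are repeated so the snippet runs on its own.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_function(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

w, b = torch.ones(()), torch.zeros(())
delta = 0.1

# Central difference: evaluate the loss slightly above and below the current value.
rate_w = (loss_function(model(t_u, w + delta, b), t_c)
          - loss_function(model(t_u, w - delta, b), t_c)) / (2.0 * delta)
rate_b = (loss_function(model(t_u, w, b + delta), t_c)
          - loss_function(model(t_u, w, b - delta), t_c)) / (2.0 * delta)

# A positive rate means the loss grows with the parameter, so we step the other way.
learning_rate = 1e-2  # illustrative value, not a recommendation
new_w = w - learning_rate * rate_w
new_b = b - learning_rate * rate_b

print(rate_w.item(), rate_b.item())
```

Both rates come out positive at w = 1, b = 0, so both parameters are moved downwards.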

The learning rate scales each update, so it determines how fast the parameters move towards their optimal values. It's like driving a car: if we go too fast, we might overshoot the destination. It is therefore safer to take small steps and keep the learning rate small.
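
To make the effect of the step size concrete, here is a toy sketch on f(w) = w**2 rather than on the thermometer data. The function `descend` and its arguments are hypothetical names used only for this illustration.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2, whose derivative is 2 * w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

small_step = descend(learning_rate=0.1)  # each step multiplies w by 0.8
large_step = descend(learning_rate=1.1)  # each step multiplies w by -1.2

print(abs(small_step), abs(large_step))
```

With the small rate, w shrinks towards the minimum at 0. With the large rate, every step jumps past the minimum and lands further away than before, so the iterates grow without bound.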

Estimating the rate of change by repeatedly evaluating the model and the loss around the current parameters doesn't scale well to models with many parameters, and it isn't always clear how large delta should be.

Instead, we will take the derivative of the loss with respect to each parameter analytically. The derivative tells us how the loss responds to that parameter. If the derivative is positive, the loss increases when the parameter increases. If it's negative, the loss decreases when the parameter increases.

The gradient is the vector of these derivatives, one per parameter.
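
For the linear model and the mean squared loss used here, the derivatives can be written out with the chain rule. In the sketch below, N (the number of samples), L (the loss) and the index i are notation introduced for illustration, and t_p = w * t_u + b as before.

```latex
L(w, b) = \frac{1}{N} \sum_{i=1}^{N} \left( t_{p,i} - t_{c,i} \right)^2,
\qquad t_{p,i} = w \, t_{u,i} + b

\frac{\partial L}{\partial w}
  = \frac{2}{N} \sum_{i=1}^{N} \left( t_{p,i} - t_{c,i} \right) t_{u,i},
\qquad
\frac{\partial L}{\partial b}
  = \frac{2}{N} \sum_{i=1}^{N} \left( t_{p,i} - t_{c,i} \right)
```

Each component is the derivative of the loss with respect to the prediction, 2 (t_p - t_c) / N, multiplied by the derivative of the model with respect to that parameter (t_u for w, 1 for b) and summed over the samples.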


In [13]:
def dloss_fn(t_p,t_c):
  dsq_diffs = 2 * (t_p - t_c ) / t_p.size(0)
  return dsq_diffs

In [14]:
# Applying derivatives to the model

def dmodel_dw(t_u,w,b):
  return t_u

def dmodel_db(t_u,w,b):
  return 1.0

In [15]:
def grad_fn(t_u,t_c,t_p,w,b):
  dloss_dtp = dloss_fn(t_p, t_c)
  dloss_dw = dloss_dtp * dmodel_dw(t_u, w, b)
  dloss_db = dloss_dtp * dmodel_db(t_u, w, b)
  return torch.stack([dloss_dw.sum(), dloss_db.sum()])

Now we have everything in place to optimize our parameters. We have to choose the number of iterations over which we update the parameters. Each training iteration (here, one pass over the full dataset) is called an epoch.

In [16]:
'''def training_loop(n_epochs,learning_rate,params,t_u,t_c):
  for epoch in range(1,n_epochs+1):
    w,b = params
    t_p = model(t_u,w,b)
    loss = loss_function(t_p,t_c)
    grad = grad_fn(t_u,t_c,t_p,w,b)

    params = params - learning_rate*grad

    print('Epoch %d, Loss %f' % (epoch,float(loss)))
  return params'''

In [19]:
def training_loop(n_epochs, learning_rate, params, t_u, t_c,
                  print_params=True):
    for epoch in range(1, n_epochs + 1):
        w, b = params

        t_p = model(t_u, w, b)  # forward pass
        loss = loss_function(t_p, t_c)
        grad = grad_fn(t_u, t_c, t_p, w, b)  # analytic gradient of the loss

        params = params - learning_rate * grad

        if epoch in {1, 2, 3, 10, 11, 99, 100, 4000, 5000}:  # log only selected epochs
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
            if print_params:
                print('    Params:', params)
                print('    Grad:  ', grad)
        if epoch in {4, 12, 101}:
            print('...')

        if not torch.isfinite(loss).all():
            break  # stop early once the loss has diverged to inf or nan

    return params

In [20]:
training_loop(n_epochs=100,
              learning_rate = 1e-2,
              params = torch.tensor([1.0,0.0]),
              t_u = t_u,
              t_c = t_c)

Epoch 1, Loss 1763.884766
    Params: tensor([-44.1730,  -0.8260])
    Grad:   tensor([4517.2964,   82.6000])
Epoch 2, Loss 5802484.500000
    Params: tensor([2568.4011,   45.1637])
    Grad:   tensor([-261257.4062,   -4598.9702])
Epoch 3, Loss 19408029696.000000
    Params: tensor([-148527.7344,   -2616.3931])
    Grad:   tensor([15109614.0000,   266155.6875])
...
Epoch 10, Loss 90901105189019073810297959556841472.000000
    Params: tensor([3.2144e+17, 5.6621e+15])
    Grad:   tensor([-3.2700e+19, -5.7600e+17])
Epoch 11, Loss inf
    Params: tensor([-1.8590e+19, -3.2746e+17])
    Grad:   tensor([1.8912e+21, 3.3313e+19])


tensor([-1.8590e+19, -3.2746e+17])

The loss blows up to inf within a few epochs, which means the optimization is unstable: each update overshoots the minimum, and the parameters swing back and forth with growing magnitude. Let us choose a smaller learning rate.

In [21]:
training_loop(n_epochs = 100,
              learning_rate = 1e-4,
              params = torch.tensor([1.0,0.0]),
              t_u = t_u,
              t_c = t_c)

Epoch 1, Loss 1763.884766
    Params: tensor([ 0.5483, -0.0083])
    Grad:   tensor([4517.2964,   82.6000])
Epoch 2, Loss 323.090515
    Params: tensor([ 0.3623, -0.0118])
    Grad:   tensor([1859.5493,   35.7843])
Epoch 3, Loss 78.929634
    Params: tensor([ 0.2858, -0.0135])
    Grad:   tensor([765.4666,  16.5122])
...
Epoch 10, Loss 29.105247
    Params: tensor([ 0.2324, -0.0166])
    Grad:   tensor([1.4803, 3.0544])
Epoch 11, Loss 29.104168
    Params: tensor([ 0.2323, -0.0169])
    Grad:   tensor([0.5781, 3.0384])
...
Epoch 99, Loss 29.023582
    Params: tensor([ 0.2327, -0.0435])
    Grad:   tensor([-0.0533,  3.0226])
Epoch 100, Loss 29.022667
    Params: tensor([ 0.2327, -0.0438])
    Grad:   tensor([-0.0532,  3.0226])


tensor([ 0.2327, -0.0438])

There's another problem: the updates to the parameters are very small, so the loss decreases very slowly and eventually stalls.

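We can also normalize the inputs so that the gradients for the two parameters aren't so different in scale. In the first epoch above, the gradient for w is roughly 55 times larger than the gradient for b, so a single learning rate cannot suit both. In this case, a simple fix is to multiply t_u by 0.1.

Normalizing the inputs in this way usually helps the model converge.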
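
The effect of the rescaling on the gradient can be checked directly. The sketch below is an illustration only: `analytic_grad`, `ratio_raw` and `ratio_scaled` are hypothetical names, and the helper simply restates the hand-written derivatives from earlier in one function so the snippet runs on its own.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def analytic_grad(t_in, t_c, w, b):
    """Gradient of the mean squared error of w * t_in + b with respect to (w, b)."""
    t_p = w * t_in + b
    dloss_dtp = 2.0 * (t_p - t_c) / t_p.size(0)
    return torch.stack([(dloss_dtp * t_in).sum(), dloss_dtp.sum()])

w, b = torch.tensor(1.0), torch.tensor(0.0)
grad_raw = analytic_grad(t_u, t_c, w, b)            # original readings
grad_scaled = analytic_grad(0.1 * t_u, t_c, w, b)   # readings multiplied by 0.1

# How many times larger the w component is than the b component.
ratio_raw = (grad_raw[0] / grad_raw[1]).abs().item()
ratio_scaled = (grad_scaled[0] / grad_scaled[1]).abs().item()

print(ratio_raw, ratio_scaled)
```

At the starting point w = 1, b = 0, the two gradient components are much closer in magnitude after rescaling, which is why one learning rate can then work for both parameters.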

In [22]:
t_un = t_u * 0.1

In [24]:
training_loop(n_epochs = 5000,
              learning_rate = 1e-2,
              params=torch.tensor([1.0,0.0]),
              t_u = t_un,
              t_c = t_c)

Epoch 1, Loss 80.364342
    Params: tensor([1.7761, 0.1064])
    Grad:   tensor([-77.6140, -10.6400])
Epoch 2, Loss 37.574913
    Params: tensor([2.0848, 0.1303])
    Grad:   tensor([-30.8623,  -2.3864])
Epoch 3, Loss 30.871077
    Params: tensor([2.2094, 0.1217])
    Grad:   tensor([-12.4631,   0.8587])
...
Epoch 10, Loss 29.030489
    Params: tensor([ 2.3232, -0.0710])
    Grad:   tensor([-0.5355,  2.9295])
Epoch 11, Loss 28.941877
    Params: tensor([ 2.3284, -0.1003])
    Grad:   tensor([-0.5240,  2.9264])
...
Epoch 99, Loss 22.214186
    Params: tensor([ 2.7508, -2.4910])
    Grad:   tensor([-0.4453,  2.5208])
Epoch 100, Loss 22.148710
    Params: tensor([ 2.7553, -2.5162])
    Grad:   tensor([-0.4446,  2.5165])
...
Epoch 4000, Loss 2.927680
    Params: tensor([  5.3643, -17.2853])
    Grad:   tensor([-0.0006,  0.0033])
Epoch 5000, Loss 2.927648
    Params: tensor([  5.3671, -17.3012])
    Grad:   tensor([-0.0001,  0.0006])


tensor([  5.3671, -17.3012])

Let us rewrite some of our code to use autograd, PyTorch's automatic differentiation engine, instead of the hand-written derivatives.

In [27]:
# requires_grad=True tells PyTorch to track the entire family tree of tensors
# resulting from operations on params. In other words, any tensor that has
# params as an ancestor will have access to the chain of functions that were
# called to get from params to that tensor.
params = torch.tensor([1.0, 0.0], requires_grad=True)

In [28]:
loss = loss_function(model(t_u, *params), t_c)
loss.backward()

In [29]:
params.grad

tensor([4517.2969,   82.6000])
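
One detail worth checking before writing a training loop around `backward()`: each call adds to the gradient already stored in `params.grad` rather than overwriting it. The following sketch illustrates this; the names `first` and `accumulated` are hypothetical, and the data and functions are repeated so the snippet runs on its own.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_function(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)

loss_function(model(t_u, *params), t_c).backward()
first = params.grad.clone()

# A second backward pass adds to the stored gradient instead of replacing it.
loss_function(model(t_u, *params), t_c).backward()
accumulated = params.grad.clone()

params.grad.zero_()  # reset in place before the next iteration

print(first, accumulated, params.grad)
```

This is why a training loop needs to zero the gradient explicitly at every iteration.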

In [33]:
# Autograd-enabled training loop

def training_loop_grad(n_epochs, learning_rate, params, t_u, t_c):
  for epoch in range(1, n_epochs + 1):
    if params.grad is not None:
      params.grad.zero_()

    t_p = model(t_u, *params)
    loss = loss_function(t_p, t_c)
    loss.backward()

    with torch.no_grad():
      params -= learning_rate * params.grad

    if epoch % 500 == 0:
      print('Epoch %d, Loss %f' % (epoch, float(loss)))
  return params

In [34]:
training_loop_grad(n_epochs=5000,
                   learning_rate = 1e-2,
                   params=torch.tensor([1.0,0.0], requires_grad=True),
                   t_u = t_un,
                   t_c = t_c)

Epoch 500, Loss 7.860115
Epoch 1000, Loss 3.828538
Epoch 1500, Loss 3.092191
Epoch 2000, Loss 2.957698
Epoch 2500, Loss 2.933134
Epoch 3000, Loss 2.928648
Epoch 3500, Loss 2.927830
Epoch 4000, Loss 2.927679
Epoch 4500, Loss 2.927652
Epoch 5000, Loss 2.927647


tensor([  5.3671, -17.3012], requires_grad=True)

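The parameters converge to about 5.3671 and -17.3012. Because the model was trained on t_un = 0.1 * t_u, the weight on the original readings is about 0.537. That is close to the exact conversion from Fahrenheit to Celsius, t_c = (t_u - 32) * 5/9, which corresponds to w ≈ 0.556 and b ≈ -17.78.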


We were actually looking at temperatures in Fahrenheit all along.
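
As a rough sanity check, the fitted parameters can be compared with the exact conversion on a few readings. This is a sketch in plain Python: the fitted values are copied from the training output above, and the function names and the three sample readings are chosen only for illustration.

```python
# Fitted parameters copied from the training output; they apply to t_un = 0.1 * t_u.
w_scaled, b_fit = 5.3671, -17.3012
w_fit = w_scaled * 0.1  # weight per unit of the original reading

# Exact conversion: celsius = (fahrenheit - 32) * 5 / 9
w_exact = 5.0 / 9.0
b_exact = -32.0 * 5.0 / 9.0

def fitted_celsius(reading):
    return w_fit * reading + b_fit

def exact_celsius(fahrenheit):
    return w_exact * fahrenheit + b_exact

for reading in (32.0, 68.0, 100.0):
    print(reading, round(fitted_celsius(reading), 2), round(exact_celsius(reading), 2))
```

The fitted line stays within about a degree and a half of the exact conversion on these readings. The remaining gap is consistent with the non-zero final loss (about 2.93): the measurements do not lie exactly on a line.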