In [1]:
import torch


In [2]:
import numpy as np

## Linear Regression with PyTorch

Linear regression learns one weight (parameter) per predictor variable for each target, plus one bias per target. Training the model means finding the weights that minimize the loss, i.e. the difference between the model's predictions and the ground truth.
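
As a sketch in symbols (the names $X$, $W$, $b$, $Y$, $n$, $d$ and $m$ are introduced here for illustration and do not appear in the notebook): with $n$ instances, $d$ predictor variables and $m$ target variables, $X$ is the $n \times d$ input matrix, $W$ the $m \times d$ weight matrix, $b$ the length-$m$ bias vector and $Y$ the $n \times m$ target matrix. The model and the mean squared error loss used later in this notebook can then be written as:

$$
\hat{Y} = X W^{\top} + b, \qquad
\mathrm{MSE}(W, b) = \frac{1}{n\,m} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \hat{Y}_{ij} - Y_{ij} \right)^{2}
$$

For the data below, $n = 5$, $d = 3$ and $m = 2$.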

### Training Data 

There are 5 instances of the input variables (temperature, rainfall, humidity).

In [3]:
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

There are two target variables: the yields of apples and oranges.

In [4]:
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

Next, we convert the input and target arrays to tensors.

In [5]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
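
One detail worth knowing about `torch.from_numpy` is that the resulting tensor shares memory with the NumPy array instead of copying it. The sketch below uses a small made-up array (`arr`, `t` and `t_copy` are illustrative names, not part of the notebook) to show the difference from `torch.tensor`, which copies the data.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32")

t = torch.from_numpy(arr)        # shares memory with arr, no copy is made
arr[0, 0] = 10.0                 # modify the NumPy array in place
shared_value = t[0, 0].item()    # the tensor sees the new value

t_copy = torch.tensor(arr)       # torch.tensor copies the data instead
arr[0, 0] = -1.0                 # a later change to arr...
copied_value = t_copy[0, 0].item()  # ...does not reach the copy

print(t.dtype, shared_value, copied_value)
```

The dtype is carried over as well, which is why the arrays above are created with `dtype='float32'`.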


## Step 1. Random Initialization of weights


We initialize w and b with elements drawn at random from a Gaussian distribution with mean 0 and standard deviation 1, using:
#### torch.randn()

In [6]:
w = torch.randn(2, 3, requires_grad=True)  # weights: one row per target variable, one column per predictor variable (2 x 3)
b = torch.randn(2, requires_grad=True)  # biases: one per target variable


In [7]:
print(w)
print(b)

tensor([[ 0.6969, -0.2638, -0.1734],
        [ 1.6521,  0.6281, -0.0261]], requires_grad=True)
tensor([1.3213, 0.6752], requires_grad=True)
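
Because the initial values are random, rerunning the notebook will give different weights and therefore different losses from the ones printed here. A minimal sketch of one way to make the initialization repeatable, assuming you want reproducible runs, is to seed the generator with `torch.manual_seed` (the seed value 0 and the names `w1`, `w2`, `b1`, `b2` are arbitrary choices for this example):

```python
import torch

torch.manual_seed(0)                          # fix the random number generator state
w1 = torch.randn(2, 3, requires_grad=True)
b1 = torch.randn(2, requires_grad=True)

torch.manual_seed(0)                          # same seed, so the same values are drawn
w2 = torch.randn(2, 3, requires_grad=True)
b2 = torch.randn(2, requires_grad=True)

print(w1.shape, b1.shape)
print(torch.equal(w1, w2), torch.equal(b1, b2))
```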


## Step 2. Defining the model

In [8]:
def model(x):
    return x@w.t() + b  


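#### x @ w.t() :
@ performs matrix multiplication. Here the (5, 3) input matrix is multiplied by the transposed (3, 2) weight matrix, giving a (5, 2) matrix of predictions, to which the bias is added row by row.

#### .t() transposes w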
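
To make the shapes concrete, the sketch below rebuilds the same expression on random stand-in tensors (not the notebook's data) and compares it with `torch.nn.functional.linear`, which computes the same affine map for a weight of shape (targets, predictors).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 3)   # 5 instances, 3 predictor variables
w = torch.randn(2, 3)   # 2 target variables, 3 predictor variables
b = torch.randn(2)      # one bias per target variable

manual = x @ w.t() + b          # (5, 3) @ (3, 2) -> (5, 2); b is broadcast over the rows
builtin = F.linear(x, w, b)     # same computation via the built-in helper

print(manual.shape)
print(torch.allclose(manual, builtin, atol=1e-5))
```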

## Step 3. Predicting the outcomes

In [9]:
prediction = model(inputs)


## Step 4. Evaluating performance of model using loss function





First, let's compare our targets with our predictions.

In [10]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [11]:
print(prediction)

tensor([[ 27.0649, 162.2357],
        [ 30.4277, 204.6144],
        [ 16.5464, 227.0556],
        [ 54.6472, 195.2283],
        [ 11.9444, 173.1367]], grad_fn=<AddBackward0>)


Our predictions are obviously way off from the ground truth. We have to minimize the difference between the two, a.k.a. the loss or error.

In linear regression, the loss is usually measured as the mean squared error (MSE).

MSE is the average of the squared differences between corresponding elements of the prediction tensor and the target tensor.

In [12]:
#Let's define a function that calculates MSE. 
def mse(t1, t2):
    diff = t1-t2 
    return torch.sum(diff*diff)/diff.numel()

#### torch.sum(...) sums up all elements in a tensor 
#### .numel() calculates the number of elements in a tensor
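
As a quick sanity check, the function can be evaluated on a tiny example that is easy to verify by hand and compared with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages over all elements by default. The tensors `pred` and `target` below are made up for this illustration.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.zeros(2, 2)

hand = mse(pred, target)             # (1 + 4 + 9 + 16) / 4 = 7.5
builtin = F.mse_loss(pred, target)   # default reduction is the mean over all elements

print(hand, builtin)
```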

In [13]:
#Let's calculate the loss of our model. 
loss = mse(prediction, targets)

The loss is not printed in this cell, but the same value is printed in cell [18] below: about 7930.5, a large deviation from the ground truth. We need a way to minimize this loss, which is where gradients come into play. 

## Step 5. Computing gradients of the loss

#### .backward() 
computes the gradient of the loss w.r.t. every tensor 
that was used to compute it and has requires_grad set to True.

#### .grad

holds the gradient of the loss w.r.t. that tensor once .backward() has been called.

In [14]:
loss.backward() #computing the derivative of the loss function w.r.t b and w

In [15]:
#display gradients 
print('dloss/dw:', w.grad) 
print('dloss/db:', b.grad)

dloss/dw: tensor([[-3716.1250, -5491.0605, -3117.8179],
        [ 8843.9365,  7980.4521,  5139.3408]])
dloss/db: tensor([-48.0739, 100.4542])
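
For this particular loss the gradients can also be written in closed form, which gives a way to check what autograd returns. Differentiating the MSE gives `2/N * diff.T @ x` for the weights and `2/N * diff.sum(dim=0)` for the biases, where `N` is the number of elements in `diff`. The sketch below verifies this on random stand-in data (not the notebook's tensors); the names `grad_w_manual` and `grad_b_manual` are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)
y = torch.randn(5, 2)
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

pred = x @ w.t() + b
diff = pred - y
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()                                # autograd fills w.grad and b.grad

with torch.no_grad():
    scale = 2.0 / diff.numel()
    grad_w_manual = scale * (diff.t() @ x)     # shape (2, 3), same as w
    grad_b_manual = scale * diff.sum(dim=0)    # shape (2,), same as b

print(torch.allclose(w.grad, grad_w_manual, atol=1e-5))
print(torch.allclose(b.grad, grad_b_manual, atol=1e-5))
```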


#### Our objective is to find the set of weights where the loss is the lowest.

If a grad element is positive (the slope is positive), increasing the element's value by a small amount will increase the loss, and decreasing it will decrease the loss. 

If a grad element is negative (the slope is negative), increasing the element's value by a small amount will decrease the loss, and decreasing it will increase the loss.

By subtracting from each weight a small amount proportional to its gradient,
we can reduce the loss. 
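
The same idea can be seen on a one-variable toy problem. The sketch below is not part of the notebook: it minimizes the made-up function f(w) = (w - 3)^2, whose slope is 2 * (w - 3), by repeatedly stepping against the slope with an arbitrarily chosen learning rate of 0.1.

```python
def grad(w):
    # derivative of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0     # starting guess, left of the minimum, so the slope is negative
lr = 0.1    # learning rate (step size)

for _ in range(100):
    w -= lr * grad(w)   # negative slope -> w increases; positive slope -> w decreases

print(w)    # approaches the minimizer w = 3
```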

#### .grad.zero_()

Every time we run .backward(), the new gradient values are added to the existing ones.
We need to reset the gradients to zero in each training epoch before computing new gradients for the updated loss. 
We can do that with the .zero_() method.


In [16]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
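
The accumulation behaviour is easy to see on a small made-up tensor. In this sketch (the names `first`, `accumulated` and `after_reset` are illustrative), calling `.backward()` twice without resetting doubles the stored gradient, and zeroing it restores the expected value.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()          # gradient of sum(w^2) is 2 * w -> [2., 4.]

(w ** 2).sum().backward()       # no reset in between
accumulated = w.grad.clone()    # new gradient was added on top -> [4., 8.]

w.grad.zero_()                  # reset in place
(w ** 2).sum().backward()
after_reset = w.grad.clone()    # back to [2., 4.]

print(first, accumulated, after_reset)
```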


## Step 6. Updating weights and biases

### We have to repeat steps 3 through 5 to update w and b. 

In [17]:
prediction = model(inputs) #step 3 
loss = mse(prediction, targets)#step 4
loss.backward() #step 5
print(w.grad)
print(b.grad)

tensor([[-3716.1250, -5491.0605, -3117.8179],
        [ 8843.9365,  7980.4521,  5139.3408]])
tensor([-48.0739, 100.4542])


In [18]:
print(loss) #prints loss before adjusting weights 

tensor(7930.5322, grad_fn=<DivBackward0>)


#### torch.no_grad() 
temporarily disables gradient tracking inside the `with` block, which also reduces memory usage and speeds up computation. We use it to tell PyTorch that the weight update itself should not be tracked as part of the computation graph. 
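
The sketch below, on a small made-up tensor, shows why the `with torch.no_grad():` block is needed: PyTorch raises a `RuntimeError` when a leaf tensor that requires gradients is modified in place while tracking is enabled, whereas the same update inside the block succeeds. The flag `raised` is only there for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
(w ** 2).sum().backward()        # w.grad is now [2., 4.]

try:
    w -= 0.1 * w.grad            # in-place update of a leaf tensor with tracking on
    raised = False
except RuntimeError:
    raised = True                # PyTorch refuses the update

with torch.no_grad():
    w -= 0.1 * w.grad            # allowed: the update is not tracked

print(raised, w)
```

After the block, `w` still has `requires_grad=True`, so later forward passes are tracked as usual.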


In [19]:
# Adjusting weights and reset gradients. 
#learning rate has been set to 10^-5
with torch.no_grad(): 
    w-= w.grad*1e-5 
    b-= b.grad*1e-5
#resetting grads to 0
w.grad.zero_()
b.grad.zero_()

tensor([0., 0.])

In [20]:
#updating loss
prediction = model(inputs) #step 3 
loss = mse(prediction, targets)#step 4
print(loss) #new loss

tensor(5907.3530, grad_fn=<DivBackward0>)


The loss has dropped from about 7930.5 to about 5907.4 after a single update, meaning our model is improving. 

#### We are going to train the model for multiple epochs. 

In [21]:
epochs = 500
for i in range(0,epochs): 
    prediction = model(inputs) #step 3 
    loss = mse(prediction, targets)#step 4
    loss.backward() #step 5
    with torch.no_grad(): 
        w-= w.grad*1e-5 
        b-= b.grad*1e-5
    #resetting gradients to 0
    w.grad.zero_()
    b.grad.zero_()
    print(loss) 

    

tensor(5907.3530, grad_fn=<DivBackward0>)
tensor(4537.1553, grad_fn=<DivBackward0>)
tensor(3607.0786, grad_fn=<DivBackward0>)
tensor(2973.6763, grad_fn=<DivBackward0>)
tensor(2540.2820, grad_fn=<DivBackward0>)
tensor(2241.7527, grad_fn=<DivBackward0>)
tensor(2034.1881, grad_fn=<DivBackward0>)
tensor(1888.0032, grad_fn=<DivBackward0>)
tensor(1783.2605, grad_fn=<DivBackward0>)
tensor(1706.5232, grad_fn=<DivBackward0>)
tensor(1648.7357, grad_fn=<DivBackward0>)
tensor(1603.7944, grad_fn=<DivBackward0>)
tensor(1567.5848, grad_fn=<DivBackward0>)
tensor(1537.3339, grad_fn=<DivBackward0>)
tensor(1511.1716, grad_fn=<DivBackward0>)
tensor(1487.8372, grad_fn=<DivBackward0>)
tensor(1466.4800, grad_fn=<DivBackward0>)
tensor(1446.5251, grad_fn=<DivBackward0>)
tensor(1427.5856, grad_fn=<DivBackward0>)
tensor(1409.3990, grad_fn=<DivBackward0>)
tensor(1391.7874, grad_fn=<DivBackward0>)
tensor(1374.6309, grad_fn=<DivBackward0>)
tensor(1357.8469, grad_fn=<DivBackward0>)
tensor(1341.3795, grad_fn=<DivBack

tensor(180.5499, grad_fn=<DivBackward0>)
tensor(178.9080, grad_fn=<DivBackward0>)
tensor(177.2857, grad_fn=<DivBackward0>)
tensor(175.6826, grad_fn=<DivBackward0>)
tensor(174.0987, grad_fn=<DivBackward0>)
tensor(172.5336, grad_fn=<DivBackward0>)
tensor(170.9871, grad_fn=<DivBackward0>)
tensor(169.4589, grad_fn=<DivBackward0>)
tensor(167.9489, grad_fn=<DivBackward0>)
tensor(166.4567, grad_fn=<DivBackward0>)
tensor(164.9823, grad_fn=<DivBackward0>)
tensor(163.5253, grad_fn=<DivBackward0>)
tensor(162.0856, grad_fn=<DivBackward0>)
tensor(160.6629, grad_fn=<DivBackward0>)
tensor(159.2570, grad_fn=<DivBackward0>)
tensor(157.8676, grad_fn=<DivBackward0>)
tensor(156.4947, grad_fn=<DivBackward0>)
tensor(155.1380, grad_fn=<DivBackward0>)
tensor(153.7973, grad_fn=<DivBackward0>)
tensor(152.4723, grad_fn=<DivBackward0>)
tensor(151.1631, grad_fn=<DivBackward0>)
tensor(149.8691, grad_fn=<DivBackward0>)
tensor(148.5904, grad_fn=<DivBackward0>)
tensor(147.3266, grad_fn=<DivBackward0>)
tensor(146.0778,

tensor(45.8663, grad_fn=<DivBackward0>)
tensor(45.7078, grad_fn=<DivBackward0>)
tensor(45.5506, grad_fn=<DivBackward0>)
tensor(45.3946, grad_fn=<DivBackward0>)
tensor(45.2398, grad_fn=<DivBackward0>)
tensor(45.0861, grad_fn=<DivBackward0>)
tensor(44.9337, grad_fn=<DivBackward0>)
tensor(44.7824, grad_fn=<DivBackward0>)
tensor(44.6323, grad_fn=<DivBackward0>)
tensor(44.4833, grad_fn=<DivBackward0>)
tensor(44.3354, grad_fn=<DivBackward0>)
tensor(44.1886, grad_fn=<DivBackward0>)
tensor(44.0429, grad_fn=<DivBackward0>)
tensor(43.8983, grad_fn=<DivBackward0>)
tensor(43.7548, grad_fn=<DivBackward0>)
tensor(43.6123, grad_fn=<DivBackward0>)
tensor(43.4708, grad_fn=<DivBackward0>)
tensor(43.3305, grad_fn=<DivBackward0>)
tensor(43.1912, grad_fn=<DivBackward0>)
tensor(43.0528, grad_fn=<DivBackward0>)
tensor(42.9154, grad_fn=<DivBackward0>)
tensor(42.7790, grad_fn=<DivBackward0>)
tensor(42.6437, grad_fn=<DivBackward0>)
tensor(42.5092, grad_fn=<DivBackward0>)
tensor(42.3757, grad_fn=<DivBackward0>)


After 500 epochs, our loss is about 37.5, which is far lower than the 7930.5 we started with.  
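
For comparison, here is a minimal sketch of the same training procedure written with PyTorch's built-in pieces: `nn.Linear` for the weights and bias, `nn.MSELoss` for the loss, and `torch.optim.SGD` for the update and gradient reset. This is an alternative formulation and not the notebook's own code. `nn.Linear` uses its own initialization scheme, so the loss values will differ from the ones printed above; the seed and the `losses` list are choices made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

model = nn.Linear(3, 2)                                   # 3 predictors -> 2 targets
loss_fn = nn.MSELoss()                                    # mean over all elements
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # same learning rate as above

losses = []
for epoch in range(500):
    prediction = model(inputs)            # step 3
    loss = loss_fn(prediction, targets)   # step 4
    optimizer.zero_grad()                 # reset gradients from the previous epoch
    loss.backward()                       # step 5
    optimizer.step()                      # update weights and biases
    losses.append(loss.item())

print(losses[0], losses[-1])
```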
