In [3]:
import torch


In [4]:
import numpy as np

## Linear Regression with PyTorch

Linear regression learns a weight (parameter) for each predictor variable, plus a bias for each target variable. Training the model means finding the weights that minimize the loss, i.e. the difference between the model's predictions and the ground truth.

### Training Data 

There are 5 instances (rows) of the input variables (temp, rainfall, humidity).

In [5]:
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

There are two target variables: the yields of apples and oranges.

In [6]:
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

Next, we convert the input and target arrays to tensors.

In [7]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
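As a small illustration of what `torch.from_numpy` does, the sketch below uses a made-up 2 x 2 array (`arr` and `t` are hypothetical names, not part of the notebook above). It shows that the tensor keeps the array's dtype and shape, and that the two share the same memory, so a change to the array is visible through the tensor.

```python
import numpy as np
import torch

# Hypothetical toy array, only for illustration
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32")
t = torch.from_numpy(arr)

# The tensor keeps the NumPy dtype and shape
print(t.dtype, t.shape)

# from_numpy does not copy: the tensor and the array share memory
arr[0, 0] = 10.0
print(t[0, 0].item())
```

If an independent copy is needed, `torch.tensor(arr)` copies the data instead.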


## Step 1. Random Initialization of weights


We initialize w and b with elements drawn at random from a Gaussian distribution with mean 0 and standard deviation 1 using:
#### torch.randn()

In [8]:
# w has shape (2, 3): one row per target variable, one column per predictor variable (2 x 3 = 6 weights)
w = torch.randn(2, 3, requires_grad=True)
# b has shape (2,): one bias per target variable
b = torch.randn(2, requires_grad=True)


In [9]:
print(w)
print(b)

tensor([[-1.0807, -0.9576, -1.2509],
        [ 0.0829, -0.1652, -0.5678]], requires_grad=True)
tensor([-0.3241,  1.1236], requires_grad=True)
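The shapes can be checked with a short sketch like the one below. The names `n_features` and `n_targets` and the call to `torch.manual_seed` are assumptions added here for illustration; the notebook above does not fix a seed, so its random values differ from run to run.

```python
import torch

torch.manual_seed(0)  # assumption: fix the seed so the random values are reproducible

n_features, n_targets = 3, 2  # 3 predictor variables, 2 target variables

w = torch.randn(n_targets, n_features, requires_grad=True)
b = torch.randn(n_targets, requires_grad=True)

print(w.shape, b.shape)          # shapes of the weight matrix and bias vector
print(w.numel() + b.numel())     # total number of learnable parameters: 6 + 2
```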


## Step 2. Defining the model

In [10]:
def model(x):
    return x @ w.t() + b


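#### x @ w.t():
@ performs matrix multiplication between two matrices.

#### .t() transposes w
Transposing w from shape (2, 3) to (3, 2) lets the (5, 3) input matrix be multiplied by it, giving a (5, 2) matrix of predictions. The bias b is then added to every row.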
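To make the shapes concrete, here is a minimal sketch with hand-picked values instead of random ones (the tensors `x`, `w`, `b` below are toy values chosen for this example, not the notebook's data), so each output entry can be verified by hand.

```python
import torch

# Toy values: 2 samples with 3 features each
x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
# 2 targets x 3 features, same layout as w in the notebook
w = torch.tensor([[1., 0., 0.],
                  [0., 1., 1.]])
b = torch.tensor([10., 20.])

# (2, 3) @ (3, 2) -> (2, 2); b is broadcast across the rows
out = x @ w.t() + b
print(out.shape)
print(out)
# Row 0: [1*1 + 10, (2 + 3) + 20] = [11, 25]
# Row 1: [4*1 + 10, (5 + 6) + 20] = [14, 31]
```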

## Step 3. Predicting the outcomes

In [11]:
prediction = model(inputs)


## Step 4. Evaluating performance of model using loss function





First, let's compare our targets with our predictions.

In [12]:
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [13]:
print(prediction)

tensor([[-197.1626,  -28.3126],
        [-262.9934,  -42.2148],
        [-295.2137,  -46.7383],
        [-198.0163,  -18.5376],
        [-254.3834,  -48.7666]], grad_fn=<AddBackward0>)


Our predictions are obviously way off from the ground truth. We have to minimize the difference between the two, a.k.a. the loss or error.

In linear regression, the loss is typically measured as the mean squared error (MSE).

MSE is the average of the squared differences between corresponding elements of the prediction tensor and the target tensor.
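Written as a formula, with notation introduced here for illustration ($\hat{y}_i$ for the $i$-th predicted element, $y_i$ for the matching target element, and $N$ for the total number of elements, which is $5 \times 2 = 10$ for the tensors above):

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
```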

In [14]:
# Let's define a function that calculates MSE.
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

#### torch.sum(...) sums all the elements of a tensor
#### .numel() returns the number of elements in a tensor
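As a sanity check, a hand-written `mse` of this form should agree with PyTorch's built-in `torch.nn.functional.mse_loss`, whose default reduction is the mean. The sketch below uses small made-up tensors (`pred` and `target` are hypothetical names) so the result can be worked out by hand.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Toy tensors for illustration only
pred = torch.tensor([[1., 2.], [3., 4.]])
target = torch.tensor([[0., 2.], [3., 6.]])

manual = mse(pred, target)           # (1 + 0 + 0 + 4) / 4 = 1.25
builtin = F.mse_loss(pred, target)   # built-in equivalent, mean reduction by default

print(manual.item(), builtin.item())
```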

In [18]:
# Let's calculate the loss of our model.
loss = mse(prediction, targets)

With predictions this far from the targets, the loss is very high. We need a way to minimize it, which is where gradients come into play.

## Step 5. Computing gradients of the loss

#### .backward()
computes the gradient of the loss w.r.t. all the tensors that were used to compute it and have requires_grad set to True.

#### .grad

holds the gradient of the loss w.r.t. that particular tensor once .backward() has been called.

In [19]:
loss.backward()  # compute the gradient of the loss w.r.t. w and b

In [21]:
# display gradients
print('dloss/dw:', w.grad) 
print('dloss/db:', b.grad)

dloss/dw: tensor([[-26584.3984, -29301.4922, -18016.6836],
        [-10617.4648, -12353.6982,  -7523.3125]])
dloss/db: tensor([-317.7539, -128.9140])
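To see what `.backward()` and `.grad` compute, it can help to try them on a problem small enough to differentiate by hand. The sketch below is a toy example with a single scalar weight and made-up data, not the notebook's model; it compares the autograd result with the analytic derivative of the mean squared error.

```python
import torch

# Toy setup: one scalar weight, three data points (illustrative values)
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 7.0])

loss = torch.mean((w * x - y) ** 2)
loss.backward()  # fills w.grad with dloss/dw

# Analytic derivative: d/dw mean((w*x - y)^2) = mean(2 * x * (w*x - y))
expected = torch.mean(2 * x * (w.detach() * x - y))

# Here w*x - y = [0, 0, -1], so the gradient is mean([0, 0, -6]) = -2
print(w.grad.item(), expected.item())
```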


#### Our objective is to find the set of weights where the loss is the lowest.

If a gradient element is positive (the slope is positive), slightly increasing the corresponding weight will increase the loss, and slightly decreasing it will decrease the loss.

If a gradient element is negative (the slope is negative), slightly increasing the corresponding weight will decrease the loss, and slightly decreasing it will increase the loss.
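The sign rule can be checked numerically on a toy problem. The sketch below assumes a simple one-parameter loss, `(w - 3)^2`, and a hypothetical nudge size `step`; neither comes from the notebook. It starts at a point where the gradient is negative and confirms that nudging the weight up lowers the loss while nudging it down raises it.

```python
import torch

def loss_fn(w):
    # Toy convex loss with its minimum at w = 3 (assumption for illustration)
    return (w - 3.0) ** 2

w = torch.tensor(1.0, requires_grad=True)
loss = loss_fn(w)
loss.backward()

grad = w.grad.item()  # 2 * (1 - 3) = -4, i.e. a negative slope
step = 0.1            # hypothetical small nudge

with torch.no_grad():
    loss_up = loss_fn(w + step).item()    # weight increased slightly
    loss_down = loss_fn(w - step).item()  # weight decreased slightly

# Negative gradient: increasing w lowers the loss, decreasing w raises it
print(grad, loss.item(), loss_up, loss_down)
```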