# PyTorch Basics: Tensors & Gradients.

## Tensors
- At its core, PyTorch is a library for processing tensors.
- A **tensor** is a number, vector, matrix or any n-dimensional array.

In [1]:
# Uncomment and run the appropriate command for your operating system, if required

# Linux / Binder
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Windows
#!pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# MacOS
# !pip install numpy torch torchvision torchaudio


In [2]:
# Importing PyTorch. 
import torch

In [3]:
torch.__version__

'1.4.0'

In [4]:
# Number
t1=torch.tensor(4.)
t1

tensor(4.)

In [5]:
t1.dtype

torch.float32

In [6]:
# Vector.
t2=torch.tensor([1.,2,3,4])
t2

tensor([1., 2., 3., 4.])

- All elements of a tensor must have the same data type. If the values are mixed, as above, PyTorch automatically converts them all to floating point.
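As a quick check of how mixed values are handled, the sketch below inspects the `dtype` of tensors built from different Python lists. The variable names (`ints`, `mixed`, `explicit`) are illustrative and not part of the notebook.

```python
import torch

# All integers: PyTorch keeps an integer dtype.
ints = torch.tensor([1, 2, 3, 4])

# One float among integers: every element is stored as a float.
mixed = torch.tensor([1., 2, 3, 4])

# The dtype can also be requested explicitly.
explicit = torch.tensor([1, 2, 3, 4], dtype=torch.float64)

print(ints.dtype, mixed.dtype, explicit.dtype)
```

Passing `dtype=` explicitly is a reasonable habit when the data type matters downstream, since it does not depend on how the literals happen to be written.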

In [7]:
# Matrix
t3=torch.tensor([[5.,6],[7,8],[9,10]])
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [8]:
# 3-Dimensional array.
t4=torch.tensor([[[11,12,13],
                 [13,14,15]],
                [[15,16,17],
                [17,18,19.]]])
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

In [9]:
# Tensor can have any number of dimensions.
print(t1.shape)
print(t2.shape)
print(t3.shape)
print(t4.shape)

torch.Size([])
torch.Size([4])
torch.Size([3, 2])
torch.Size([2, 2, 3])


In [10]:
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

##  Tensor operations and gradients

In [11]:
# Creating a tensor.
x=torch.tensor(3.)
w=torch.tensor(4., requires_grad=True)
b=torch.tensor(5., requires_grad=True)

In [12]:
# Arithmetic operation
y=w*x+b
y

tensor(17., grad_fn=<AddBackward0>)

- From above, **y** is a tensor with the value 3 * 4 + 5 = 17. We can automatically compute the derivative of **y** w.r.t. the tensors that have **requires_grad** set to **True**, i.e. w and b. This feature of PyTorch is called **autograd** (automatic gradients).

- To compute the derivatives, we can invoke the **.backward** method on our result **y**.

In [13]:
# To Compute Derivatives we use .backward method. 
y.backward()

In [14]:
# Derivatives are important for optimization algorithms such as gradient descent.
# The derivative w.r.t. each tensor is stored in that tensor's '.grad' property.

# Display gradients.
print('dy/dx:', x.grad) # None, because x was created without requires_grad=True.
print('dy/dw:',w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)


In [15]:
# 'Gradient' is the usual term when differentiating w.r.t. a vector or matrix.
# 'Derivative' is the usual term when differentiating w.r.t. a single number.
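The autograd result can be cross-checked by hand. For `y = w*x + b`, calculus gives dy/dw = x and dy/db = 1. The sketch below repeats the computation and compares it with a central finite-difference estimate. The helper `f` and the step `eps` are illustrative choices, not part of the notebook.

```python
import torch

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

y = w * x + b
y.backward()

# Analytically, dy/dw = x and dy/db = 1 for y = w*x + b.
print(w.grad, b.grad)

# A central finite difference gives a numerical estimate of dy/dw.
eps = 1e-3

def f(w_val):
    # Same formula as above, evaluated with plain Python floats.
    return w_val * x.item() + b.item()

numeric_dw = (f(4. + eps) - f(4. - eps)) / (2 * eps)
print(numeric_dw)
```

Finite differences are far too slow for real models, but they are a handy sanity check when a gradient looks suspicious.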

## Tensor functions
- Apart from arithmetic operations, torch module also contains many functions for creating and manipulating tensors.

In [16]:
# Creating a tensor with a fixed value for every element.
t6=torch.full((3,2),42.) # (3,2) is the shape; 42. is the fill value (written as a float so the result has a float dtype)
t6

tensor([[42., 42.],
        [42., 42.],
        [42., 42.]])

In [17]:
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [18]:
# Concatenating two tensors with compatible shapes.
t7=torch.cat((t3,t6))
t7

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.],
        [42., 42.],
        [42., 42.],
        [42., 42.]])

In [19]:
# Computing sine of every element.
t8=torch.sin(t7)
t8

tensor([[-0.9589, -0.2794],
        [ 0.6570,  0.9894],
        [ 0.4121, -0.5440],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165]])

In [20]:
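# Changing shape of tensor
t9= t8.view(3,2,2)
t9
# or
# t9= t8.reshape(3,2,2)

# Both give the same result here. In general, view never copies the data and needs a
# compatible (e.g. contiguous) memory layout, while reshape copies when it has to.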

tensor([[[-0.9589, -0.2794],
         [ 0.6570,  0.9894]],

        [[ 0.4121, -0.5440],
         [-0.9165, -0.9165]],

        [[-0.9165, -0.9165],
         [-0.9165, -0.9165]]])
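The difference between `view` and `reshape` only shows up on tensors whose memory layout is not contiguous, for example after a transpose. The sketch below uses a small illustrative tensor (the names `a`, `a_t`, `v`, `r` are not from the notebook) to show both behaviours.

```python
import torch

a = torch.arange(6).reshape(2, 3)   # contiguous, shape (2, 3)
a_t = a.t()                         # transposed view, no longer contiguous

# view shares memory with the original tensor ...
v = a.view(3, 2)
v[0, 0] = 100
print(a[0, 0])                      # the change is visible through `a`

# ... but it refuses to work on a non-contiguous tensor.
try:
    a_t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape falls back to copying the data when a view is impossible.
r = a_t.reshape(6)
print(view_failed, r)
```

When in doubt, `reshape` is the safer default. `view` is useful when you want a guarantee that no copy is made.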

## Interoperability with Numpy
- **Numpy** is a popular open-source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays and has a vast ecosystem of supporting libraries, including:
  - **Pandas** for file I/O and data analysis.
  - **Matplotlib** for plotting and visualization.
  - **OpenCV** for image and video processing.

In [21]:
import numpy as np

In [22]:
lst=[3,4,5,6]
arr=np.array(lst)

In [23]:
# Converting a numpy array to a PyTorch tensor.
tensor=torch.from_numpy(arr)
tensor

tensor([3, 4, 5, 6], dtype=torch.int32)
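One detail worth knowing about this conversion: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` makes a copy. The sketch below illustrates the difference. The names `shared`, `copied`, and `back` are illustrative.

```python
import numpy as np
import torch

arr = np.array([3, 4, 5, 6])

shared = torch.from_numpy(arr)   # shares memory with `arr`
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 99                      # modify the NumPy array in place

print(shared)                    # reflects the change
print(copied)                    # keeps the original value

back = shared.numpy()            # tensor -> NumPy array, again without copying
print(type(back), back)
```

Sharing memory avoids copying large arrays, but it also means an in-place change on one side silently affects the other.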

In [24]:
x=torch.Tensor([2,3])
y=torch.Tensor([3,4])
print(x*y)

tensor([ 6., 12.])


In [25]:
torch.ones([2,3])

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [26]:
y=torch.zeros([3,5])
y

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

### Reshape

In [27]:
s=torch.rand([5,6])
s

tensor([[0.1584, 0.6196, 0.2311, 0.2726, 0.8147, 0.6848],
        [0.5885, 0.7764, 0.6482, 0.4638, 0.2022, 0.9913],
        [0.0875, 0.0905, 0.2957, 0.2882, 0.0979, 0.1427],
        [0.5846, 0.3798, 0.7731, 0.1545, 0.2389, 0.0415],
        [0.1900, 0.6544, 0.0590, 0.7609, 0.7562, 0.6265]])

In [28]:
# Reshaping an array
s.view([3,10])

tensor([[0.1584, 0.6196, 0.2311, 0.2726, 0.8147, 0.6848, 0.5885, 0.7764, 0.6482,
         0.4638],
        [0.2022, 0.9913, 0.0875, 0.0905, 0.2957, 0.2882, 0.0979, 0.1427, 0.5846,
         0.3798],
        [0.7731, 0.1545, 0.2389, 0.0415, 0.1900, 0.6544, 0.0590, 0.7609, 0.7562,
         0.6265]])

### Major Steps in Linear Regression.
- Design the model (input size, output size, forward pass)
- Construct the loss and optimizer
- Training loop
  - Forward pass: compute predictions and loss
  - Backward pass: compute gradients
  - Update weights

# Introduction to Linear Regression

In this tutorial, we'll discuss one of the foundational algorithms in machine learning: *Linear regression*. We'll create a model that predicts crop yields for apples and oranges (*target variables*) by looking at the average temperature, rainfall, and humidity (*input variables or features*) in a region. Here's the training data:

![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)

In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Visually, it means that the yield of apples is a linear or planar function of temperature, rainfall and humidity:

![linear-regression-graph](https://i.imgur.com/4DJ9f8X.png)

- The learning part of linear regression is to figure out a set of weights **w11, w12... w23, b1, b2** using the training data to make accurate predictions for new data. The **learned** weights will be used to predict the yields for apples and oranges in a new region using the average temperature, rainfall and humidity for that region.


- We train the model by adjusting the weights slightly many times to make better predictions, using an optimization technique called **gradient descent**.

## Method 1:

In [29]:
import numpy as np
import torch

### Training  data
- We can represent the training data using two matrices: **inputs** and **targets**, each with one row per observation and one column per variable.

In [30]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73,67,43],
                [91,88,64],
                [87,134,58],
                [102,43,37],
                [69,96,70]], dtype='float32')

In [31]:
# Target Variables.(apples,oranges)
targets = np.array([[56,70],
                 [81,110],
                 [119,133],
                 [22,37],
                 [103,119]],dtype='float32')

- We've separated the input and target variables because we'll operate on them separately. We've also created numpy arrays, because this is typically how training data is handled in practice:
- read some CSV files as numpy arrays, do some processing, and then convert them to PyTorch tensors.

In [32]:
# Converting the numpy arrays to PyTorch tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [33]:
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 110.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


### Linear Regression Model from scratch.
- The weights and biases (**w11, w12,... w23, b1 & b2**) can also be represented as matrices, initialized as random values. The first row of **w** and the first element of **b** are used to predict the first target variable, i.e., yield of apples, and similarly, the second for oranges.

In [34]:
# Initializing the weights and biases randomly.

w=torch.randn(2,3, requires_grad=True)
b=torch.randn(2, requires_grad=True)

print(w)
print(b)

tensor([[ 0.6787,  0.4869, -1.0340],
        [ 0.6957,  0.1356,  0.5034]], requires_grad=True)
tensor([-0.8766,  0.8002], requires_grad=True)


- **torch.randn** creates a tensor with the given shape, with elements picked randomly from a normal distribution with mean 0 and standard deviation 1.
- Our model simply performs a matrix multiplication between the **inputs** and the **weights** (transposed) and adds the bias **b**.

### Defining the model.

In [35]:
def model(x):
    # x will be inputs
    # @ is matrix multiplication
    # .t() is transpose
    # b is biases
    return x @ w.t() + b

- **@** is matrix multiplication in PyTorch and **.t()** returns the transpose of a tensor.
- The matrix obtained by passing the input data into the model is a set of predictions for the target variables.
- In math notation: $y = x w^{T} + b$
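To connect the matrix form back to the per-target equations, the sketch below checks on two rows of the training data that one entry of `x @ w.t() + b` equals the weighted sum written out by hand. The seed and the name `apple_0` are illustrative assumptions. The notebook itself does not fix a seed.

```python
import torch

torch.manual_seed(0)                       # only to make the sketch repeatable
x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])        # 2 regions, 3 features
w = torch.randn(2, 3)                      # 2 targets, 3 features
b = torch.randn(2)                         # one bias per target

preds = x @ w.t() + b                      # (2, 3) @ (3, 2) + (2,) -> (2, 2)

# The same number, written out as the weighted sum for apples in region 0:
# w11 * temp + w12 * rainfall + w13 * humidity + b1
apple_0 = w[0, 0] * x[0, 0] + w[0, 1] * x[0, 1] + w[0, 2] * x[0, 2] + b[0]

print(preds.shape)
print(torch.isclose(preds[0, 0], apple_0))
```

The bias vector of length 2 is broadcast across the rows, so every region gets the same `b1` and `b2` added to its two predictions.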

### General Steps.
- Generate Predictions
- Calculate the loss
- Compute gradients w.r.t the weights and biases
- Adjust the weights by subtracting a small quantity proportional to the gradient
- Reset the gradients to zero

In [36]:
# Generate Predictions.
preds=model(inputs)
print(preds)

tensor([[ 36.8331,  82.3233],
        [ 37.5616, 108.2666],
        [ 63.4505, 108.7023],
        [ 51.0338,  96.2239],
        [ 20.3208,  97.0659]], grad_fn=<AddBackward0>)


In [37]:
# Comparing our predictions with targets.
print(targets)

tensor([[ 56.,  70.],
        [ 81., 110.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


- The predictions differ considerably from the actual targets, because we initialized our model with random weights and biases.

### Loss Function.
- We evaluate our model by comparing its predictions with the actual targets (preds - targets), in three steps:
- Calculate the difference between the two matrices (preds & targets).
- Square all elements of the difference matrix **to remove negative values**.
- Calculate the average of the elements in the resulting matrix. The result is a single number known as the **mean squared error** (MSE).

In [38]:
# MSE loss.
def MSE(x,y): # Mean squared error of predictions x against targets y.
    diff= x-y
    return torch.sum(diff*diff)/diff.numel()

In [39]:
# Compute loss
loss=MSE(preds,targets)
print(loss)

tensor(1775.2646, grad_fn=<DivBackward0>)
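The arithmetic behind the mean squared error is easy to follow on a tiny example. The sketch below uses made-up 2x2 matrices (not the crop data) and compares the hand-written formula with PyTorch's built-in `torch.nn.functional.mse_loss`.

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[1., 2.], [3., 4.]])
targets = torch.tensor([[0., 2.], [3., 6.]])

diff = preds - targets                           # [[1, 0], [0, -2]]
manual = torch.sum(diff * diff) / diff.numel()   # (1 + 0 + 0 + 4) / 4

builtin = F.mse_loss(preds, targets)             # default reduction is the mean

print(manual, builtin)                           # both are 1.25
```

Squaring makes the errors of +1 and -2 both count as positive, so they cannot cancel each other out in the average.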


### Compute gradients.
- In PyTorch, we can compute the gradient (derivative) of the loss w.r.t. the weights and biases, because both were created with **requires_grad=True**.

In [40]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-2585.6597, -3847.8286, -2307.8794],
        [  631.0690,  -428.3635,   -66.8811]])
tensor([-34.3600,   4.7164])


- These gradients are stored in the **.grad** property of the respective tensors.

In [41]:
# Gradients for weights.
print(w)
# Since w is a matrix, w.grad is a matrix of the same shape.
print(w.grad)

tensor([[ 0.6787,  0.4869, -1.0340],
        [ 0.6957,  0.1356,  0.5034]], requires_grad=True)
tensor([[-2585.6597, -3847.8286, -2307.8794],
        [  631.0690,  -428.3635,   -66.8811]])


In [42]:
print(b)
print(b.grad)

tensor([-0.8766,  0.8002], requires_grad=True)
tensor([-34.3600,   4.7164])


## Adjust weights and biases to reduce the loss
- The loss (MSE) is a quadratic function of our weights and biases, and our main aim is **to find the set of weights where the loss is the lowest**. If we plot the loss w.r.t. any individual weight or bias element, it looks like the figures below.
- The gradient indicates the rate of change of the loss, i.e. the slope of the loss function w.r.t. the weights and biases.



If a gradient element is **positive**:

* **increasing** the weight element's value slightly will **increase** the loss
* **decreasing** the weight element's value slightly will **decrease** the loss

![postive-gradient](https://i.imgur.com/WLzJ4xP.png)

If a gradient element is **negative**:

* **increasing** the weight element's value slightly will **decrease** the loss
* **decreasing** the weight element's value slightly will **increase** the loss

![negative=gradient](https://i.imgur.com/dvG2fxU.png)

The increase or decrease in the loss by changing a weight element is proportional to the gradient of the loss w.r.t. that element. This observation forms the basis of _the gradient descent_ optimization algorithm that we'll use to improve our model (by _descending_ along the _gradient_).

We can subtract from each weight element a small quantity proportional to the derivative of the loss w.r.t. that element to reduce the loss slightly.
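The sign argument can be tried out on a one-dimensional toy problem. The sketch below assumes a made-up loss `(w - 3)^2` with its minimum at `w = 3` and an illustrative step size of 0.1. Neither comes from the notebook. It takes one step from each side of the minimum.

```python
import torch

def step(w_start, lr=0.1):
    """One gradient-descent step on the toy loss (w - 3)^2."""
    w = torch.tensor(w_start, requires_grad=True)
    loss = (w - 3) ** 2
    loss.backward()
    # Subtract a small quantity proportional to the gradient.
    w_new = w.item() - lr * w.grad.item()
    return w.grad.item(), w_new

grad_left, w_left = step(0.0)     # left of the minimum: negative gradient
grad_right, w_right = step(10.0)  # right of the minimum: positive gradient

print(grad_left, w_left)
print(grad_right, w_right)
```

In both cases the same rule, subtracting `lr * grad`, moves `w` toward 3: a negative gradient increases the weight and a positive gradient decreases it.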

- Before continuing, we reset the gradients to zero by calling the **.zero_()** method. PyTorch accumulates gradients: every time we call **.backward**, the new gradient values are added to whatever is already stored in **.grad**. So once we are done with the current gradients, we need to **clear them out** by setting them back to zero.

In [43]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
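The accumulation behaviour is easy to observe on a single scalar. In the sketch below (a toy example, not the regression model) the function is `3 * w`, whose derivative is always 3.

```python
import torch

w = torch.tensor(4., requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # d(3w)/dw = 3

(3 * w).backward()         # second call adds to the stored gradient
second = w.grad.item()     # 3 + 3 = 6

w.grad.zero_()             # reset in place
after_reset = w.grad.item()

print(first, second, after_reset)
```

Without the reset, every later update would use the sum of all gradients computed so far rather than the current one.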


In [44]:
# Generate predictions.
preds=model(inputs)
print(preds)

tensor([[ 36.8331,  82.3233],
        [ 37.5616, 108.2666],
        [ 63.4505, 108.7023],
        [ 51.0338,  96.2239],
        [ 20.3208,  97.0659]], grad_fn=<AddBackward0>)


In [45]:
# Calculate loss.
loss=MSE(preds,targets)
print(loss)

tensor(1775.2646, grad_fn=<DivBackward0>)


In [46]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-2585.6597, -3847.8286, -2307.8794],
        [  631.0690,  -428.3635,   -66.8811]])
tensor([-34.3600,   4.7164])


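- To reduce the loss, we subtract from each weight and bias a small quantity proportional to its gradient. This works for either sign: a positive gradient decreases the element, and a negative gradient increases it.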

In [47]:
# Adjusting the weights by subtracting a small quantity proportional to the gradient,
# then resetting the gradients to zero.
with torch.no_grad():
    # The subtraction is element-wise: each weight moves by 1e-3 times its own gradient.
    w-= w.grad*1e-3
    b-= b.grad*1e-3
    w.grad.zero_()
    b.grad.zero_()

- **torch.no_grad()** tells PyTorch that it shouldn't track, calculate or modify gradients while we update the weights and biases.
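To see why the `torch.no_grad()` block is needed, the sketch below tries the same in-place update both ways on a small toy tensor. The values and the flag `update_rejected` are illustrative.

```python
import torch

w = torch.tensor([1., 2.], requires_grad=True)
loss = (w * w).sum()
loss.backward()                 # w.grad is 2 * w = [2., 4.]

# Outside no_grad, an in-place update of a leaf tensor that requires
# gradients is rejected by autograd.
try:
    w -= 0.1 * w.grad
    update_rejected = False
except RuntimeError:
    update_rejected = True

# Inside no_grad, the same update goes through and is not recorded.
with torch.no_grad():
    w -= 0.1 * w.grad

print(update_rejected, w)
```

After the update, `w` still has `requires_grad=True`, so the next forward pass is tracked as usual.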

In [48]:
# New Weights and biases.
print(w)
print(b)

tensor([[3.2644, 4.3348, 1.2738],
        [0.0647, 0.5640, 0.5703]], requires_grad=True)
tensor([-0.8423,  0.7955], requires_grad=True)


- With these new weights and biases, we recompute the predictions and the loss.

In [49]:
# Calculate loss
preds=model(inputs)
print(preds)

tensor([[582.6640,  67.8268],
        [759.2042,  92.8110],
        [937.9033, 115.0744],
        [565.6537,  52.7444],
        [729.7089,  99.3220]], grad_fn=<AddBackward0>)


In [50]:
loss=MSE(preds,targets)
print(loss)

tensor(209751.8438, grad_fn=<DivBackward0>)
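The effect of the step size can be reproduced in isolation. The sketch below takes a single gradient step on the five training rows with two different step sizes. As an assumption made only for reproducibility, it starts from all-zero weights instead of random ones, and the helper `loss_after_one_step` is hypothetical.

```python
import torch

inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 110.], [119., 133.],
                        [22., 37.], [103., 119.]])

def loss_after_one_step(lr):
    """Start from zero weights, take one gradient step, return (before, after)."""
    w = torch.zeros(2, 3, requires_grad=True)
    b = torch.zeros(2, requires_grad=True)
    loss = ((inputs @ w.t() + b - targets) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        new_loss = ((inputs @ w.t() + b - targets) ** 2).mean()
    return loss.item(), new_loss.item()

before, after_large = loss_after_one_step(1e-3)
_, after_small = loss_after_one_step(1e-5)

print(before, after_large, after_small)
```

With inputs in the range of roughly 40 to 130, the gradients are in the thousands, so a step of 1e-3 overshoots the minimum and raises the loss, while 1e-5 lowers it.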


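### Reducing the loss:
- The loss above went up rather than down (from 1775.2646 to 209751.8438), which indicates that the step size of 1e-3 was too large for inputs of this scale. To reduce the loss, we repeat the process of adjusting the weights and biases using the gradients multiple times, now with a smaller step size of 1e-5. Each iteration is called an **epoch**.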

In [51]:
for i in range(100):
    preds=model(inputs)
    loss=MSE(preds,targets)
    loss.backward()
    with torch.no_grad():
        w-=w.grad*1e-5
        b-=b.grad*1e-5
        w.grad.zero_()
        b.grad.zero_()

Once again, let's verify that the loss is now lower.

In [52]:
# Calculate loss
preds=model(inputs)
loss=MSE(preds,targets)
print(loss)

tensor(136.7434, grad_fn=<DivBackward0>)


In [53]:
# Predictions
preds

tensor([[ 59.9405,  74.1622],
        [ 74.2641, 102.3236],
        [132.2929, 132.4305],
        [ 37.0775,  51.0612],
        [ 78.5946, 113.2918]], grad_fn=<AddBackward0>)

In [54]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 110.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Method 2:

# Linear Regression using PyTorch built-ins
- The model and training process above were implemented using basic matrix operations. Since this is a common pattern, PyTorch has several built-in functions and classes that make it easy to create and train models.

In [55]:
import torch.nn as nn

In [56]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

In [57]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [58]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])

We are using 15 training examples this time, to illustrate how to work with large datasets in small batches.

## Dataset and DataLoader
- We create a **TensorDataset**, which allows access to rows from **inputs** and **targets** as tuples and provides standard APIs for working with many different types of datasets in PyTorch.

In [59]:
from torch.utils.data import TensorDataset

In [60]:
# Defining Dataset
train_ds=TensorDataset(inputs, targets)
print(train_ds[0:3]) # Picks first 3 rows of both input and output data.


(tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.]]))


All kinds of data, whether images, text, or tabular data, are typically wrapped in PyTorch datasets.

- This **TensorDataset** allows us to access a small section of the training data using array indexing notation ([0:3] in the above code). It returns a tuple with two elements. The first element contains the input variables for the selected rows, and the second contains the targets.
- We also create a **DataLoader**, which can **split the data into batches of a predefined size** while training. It also provides other utilities like shuffling and random sampling of the data.

In [61]:
from torch.utils.data import DataLoader

In [62]:
# Define data loader

batch_size=5 # process the data in batches of 5
train_dl=DataLoader(train_ds, batch_size, shuffle=True)
print(train_dl)

<torch.utils.data.dataloader.DataLoader object at 0x000001FE5FACD0B8>


In [63]:
for xb, yb in train_dl:
    print(xb) # Looking for one batch of data after shuffling.
    print(yb)
    break

tensor([[101.,  44.,  37.],
        [ 73.,  67.,  43.],
        [ 87., 135.,  57.],
        [ 91.,  88.,  64.],
        [103.,  43.,  36.]])
tensor([[ 21.,  38.],
        [ 56.,  70.],
        [118., 134.],
        [ 81., 101.],
        [ 20.,  38.]])


- In each iteration, the data loader returns one batch of data with the given batch size. If **shuffle** is set to **True**, it shuffles the training data before creating batches. Shuffling helps randomize the input to the optimization algorithm, which can lead to a faster reduction in the loss.
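The batching behaviour can be checked on a toy dataset. The sketch below assumes 15 made-up rows in which each target is 10 times its input, so it is easy to verify that shuffling keeps inputs and targets paired. The names `features`, `labels`, `ds`, and `dl` are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 15 toy rows: feature i paired with target 10 * i.
features = torch.arange(15, dtype=torch.float32).unsqueeze(1)
labels = 10 * features
ds = TensorDataset(features, labels)

dl = DataLoader(ds, batch_size=5, shuffle=True)

seen = []
for xb, yb in dl:
    # Shuffling reorders rows but keeps each input with its own target.
    assert torch.equal(yb, 10 * xb)
    seen.extend(xb.flatten().tolist())

print(len(ds), len(dl))          # 15 samples, 3 batches per epoch
print(sorted(seen) == list(range(15)))
```

One full pass over the loader visits every row exactly once, which is what the training loop later treats as one epoch.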

### nn.Linear
- Instead of initializing the weights and biases manually, we can define the model using the **nn.Linear** class from PyTorch, which does it automatically.

In [64]:
# Define Model
model=nn.Linear(3,2) # 3 input features and 2 outputs
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.5020,  0.1231, -0.3146],
        [ 0.4185, -0.4518, -0.2985]], requires_grad=True)
Parameter containing:
tensor([ 0.5296, -0.0403], requires_grad=True)


- PyTorch models have a helpful **.parameters** method, which returns an iterator over all the weight and bias matrices present in the model. For our linear regression model, we have one weight matrix and one bias vector.

In [65]:
# Parameters i.e list of weights and biases.
list(model.parameters())

[Parameter containing:
 tensor([[ 0.5020,  0.1231, -0.3146],
         [ 0.4185, -0.4518, -0.2985]], requires_grad=True),
 Parameter containing:
 tensor([ 0.5296, -0.0403], requires_grad=True)]
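As a sanity check on what `nn.Linear` stores, the sketch below inspects the parameter shapes and confirms that the layer computes the same affine map as the hand-written model from Method 1. The names `layer`, `manual`, and `n_params` are illustrative.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)                    # 3 input features, 2 outputs
x = torch.tensor([[73., 67., 43.]])

# The layer stores a (2, 3) weight matrix and a length-2 bias vector.
n_params = sum(p.numel() for p in layer.parameters())

# Its forward pass is the same affine map as the hand-written model.
manual = x @ layer.weight.t() + layer.bias

print(layer.weight.shape, layer.bias.shape, n_params)
print(torch.allclose(layer(x), manual))
```

The 8 parameters correspond to the six weights `w11 ... w23` and the two biases `b1, b2` from the equations at the start of the linear regression section.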

In [66]:
# Generate Predictions
preds=model(inputs)
preds

tensor([[ 31.8923, -12.5903],
        [ 36.9057, -20.8121],
        [ 42.4495, -41.4766],
        [ 45.3820,  12.1810],
        [ 24.9597, -35.4254],
        [ 32.2712, -11.7200],
        [ 36.4679, -20.6589],
        [ 42.6368, -41.3565],
        [ 45.0032,  11.3107],
        [ 24.1431, -36.1424],
        [ 31.4546, -12.4371],
        [ 37.2845, -19.9418],
        [ 42.8873, -41.6299],
        [ 46.1986,  12.8981],
        [ 24.5809, -36.2957]], grad_fn=<AddmmBackward>)

### Loss Function
- Instead of defining the loss function manually, we can use the built-in loss function **mse_loss**.

In [67]:
# Importing nn.functional. This package contains many useful loss functions.
import torch.nn.functional as F

In [68]:
loss_fn = F.mse_loss

In [69]:
# In Jupyter/IPython, a leading ? shows the documentation for an object, which helps to check its syntax.
?nn.Linear

In [70]:
loss=loss_fn(model(inputs), targets)
print(loss)

tensor(9156.9521, grad_fn=<MseLossBackward>)


### Optimizer
- Instead of manually manipulating the model's weights and biases using gradients, we can use the optimizer **optim.SGD**. SGD stands for 'Stochastic Gradient Descent'. The term stochastic indicates that samples are selected in random batches instead of as a single group.

In [71]:
# Defining Optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

- **model.parameters()** is passed as an argument to **optim.SGD**, so that the optimizer knows which matrices should be modified during the update step. We can also specify a **learning rate, which controls the amount by which the parameters are modified**.
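For plain SGD (no momentum or weight decay), the optimizer's update should match the manual rule used in Method 1. The sketch below checks this on a single training row. The names `expected_w` and `expected_b` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(3, 2)
lr = 1e-5
opt = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.tensor([[73., 67., 43.]])
y = torch.tensor([[56., 70.]])

loss = F.mse_loss(model(x), y)
loss.backward()

# What the manual update rule would produce: parameter - lr * gradient.
expected_w = model.weight.detach() - lr * model.weight.grad
expected_b = model.bias.detach() - lr * model.bias.grad

opt.step()          # the optimizer applies the update in place
opt.zero_grad()     # and clears the gradients for the next iteration

print(torch.allclose(model.weight, expected_w))
print(torch.allclose(model.bias, expected_b))
```

The benefit of the optimizer object is that the same two calls work unchanged for models with many layers, and for other update rules offered in `torch.optim`.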

### Training the model
Steps involved to implement gradient descent are:
- Generate Predictions
- Calculate the loss
- Compute gradients w.r.t the weights and biases
- Adjust the weights by subtracting a small quantity proportional to the gradient

The only change is that we'll work with batches of the data instead of processing the entire training data in every iteration.

In [72]:
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb, yb in train_dl:
            
            # 1. Generate predictions
            pred=model(xb)
            
            # 2. Calculate loss
            loss=loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step() # This updates weights, bias everything
            
            # 5. Reset the gradients to zero
            opt.zero_grad() # Resets all the gradients back to zero
            
        # Print the progress
        if (epoch+1) %10==0:
            print('Epoch [{}/{}], loss:{:.4f}'.format(epoch+1, num_epochs, loss.item()))

# Every 10 epochs, the loss of the last batch in that epoch is formatted and printed.

In [73]:
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], loss:1453.8254
Epoch [20/100], loss:200.2012
Epoch [30/100], loss:146.2127
Epoch [40/100], loss:429.0978
Epoch [50/100], loss:212.6598
Epoch [60/100], loss:132.7685
Epoch [70/100], loss:77.0179
Epoch [80/100], loss:88.4021
Epoch [90/100], loss:72.7909
Epoch [100/100], loss:35.9717


In [74]:
# Generate Predictions
preds=model(inputs)
preds

tensor([[ 59.0136,  72.5083],
        [ 78.8558,  97.9103],
        [122.6967, 135.7258],
        [ 31.3636,  49.3879],
        [ 90.2411, 107.1066],
        [ 57.9182,  71.6338],
        [ 78.0105,  97.3925],
        [122.6663, 136.0674],
        [ 32.4590,  50.2624],
        [ 90.4912, 107.4633],
        [ 58.1683,  71.9905],
        [ 77.7604,  97.0358],
        [123.5420, 136.2436],
        [ 31.1136,  49.0312],
        [ 91.3366, 107.9811]], grad_fn=<AddmmBackward>)

In [75]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

In [76]:
# Predicting on new data.
model(torch.tensor([[75,63,44.]]))

tensor([[55.0223, 69.3745]], grad_fn=<AddmmBackward>)

- The predicted yield of apples is 55.02 tons per hectare and that of oranges is 69.37 tons per hectare.