<a href="https://colab.research.google.com/github/sachinkun21/PyTorch/blob/master/PyTorch_Neural_Network_from_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **What is PyTorch?**
Let’s understand what PyTorch is and why it has become so popular, before diving into its implementation.

- PyTorch is a Python-based scientific computing package that is similar to NumPy, but with the added power of GPUs.
- It is also a deep learning framework designed for flexibility and speed when implementing and building deep neural network architectures.


PyTorch 1.0, released in December 2018, aimed to assist researchers by addressing four major challenges:

1. Extensive reworking
2. Time-consuming training
3. Python programming language inflexibility
4. Slow scale-up


Intrinsically, there are two main characteristics of PyTorch that distinguish it from other deep learning frameworks:

1. Imperative Programming
2. Dynamic Computation Graphing

**Imperative Programming:** PyTorch performs computations as it goes through each line of the written code. This is quite similar to how a Python program is executed. This concept is called imperative programming. The biggest advantage of this feature is that your code and programming logic can be debugged on the fly.
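
As a small illustration (a sketch with arbitrary values), each line below runs as soon as it is reached, so intermediate results can be printed or inspected, and ordinary Python control flow can branch on them:

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])
y = x * 2                 # computed immediately: y already holds concrete values
print(y)                  # inspect the intermediate result like any Python value

# ordinary Python control flow can branch on a tensor value
if y.sum() > 0:
    z = torch.relu(y)
else:
    z = -y
print(z)
```

Because nothing is deferred, a debugger breakpoint or a `print` placed between these lines sees real numbers.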

**Dynamic Computation Graphing:** PyTorch is referred to as a “define-by-run” framework, which means that the computational graph structure (of a neural network architecture) is generated at run time. The main advantage of this property is a flexible, programmatic runtime interface: models are built and modified simply by connecting operations in ordinary Python code. In PyTorch, a new computational graph is defined at each forward pass. This is in stark contrast to TensorFlow 1.x, which by default used a static graph representation (TensorFlow 2 later made eager execution the default).
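
A minimal sketch of what “a new graph at each forward pass” means in practice: the hypothetical `forward` function below applies a number of operations that is only known at run time, and autograd differentiates whichever graph was actually built on that call.

```python
import torch

def forward(x, n_steps):
    # hypothetical helper: the number of operations, and therefore
    # the graph, depends on a runtime value
    out = x
    for _ in range(n_steps):
        out = out * 2
    return out

x = torch.tensor(1.5, requires_grad=True)

forward(x, 3).backward()      # graph with 3 multiplications: out = 8 * x
grad_3 = x.grad.clone()       # d(8x)/dx = 8

x.grad = None                 # reset the accumulated gradient
forward(x, 5).backward()      # a different graph on the next pass: out = 32 * x
grad_5 = x.grad.clone()       # d(32x)/dx = 32

print(grad_3, grad_5)
```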

PyTorch 1.0 also introduced `torch.jit` (TorchScript), a compiler that converts PyTorch code into a serializable, optimizable representation, so that a model can be saved and run separately from the Python code that defined it. It is also designed to support model optimization for hardware accelerators such as GPUs or TPUs.
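
A minimal sketch of scripting a function with `torch.jit.script` (`scaled_relu` is a made-up example, not part of this notebook). Scripting keeps the `if` branch in the compiled representation, and `.code` prints the TorchScript that gets compiled. Newer PyTorch releases steer users toward `torch.compile` and `torch.export` instead, so treat this as an illustration of the 1.0-era API.

```python
import torch

@torch.jit.script
def scaled_relu(x: torch.Tensor, scale: float) -> torch.Tensor:
    # hypothetical example: the if/else is preserved by scripting
    # (tracing would only record the branch taken for one input)
    if scale > 1.0:
        return torch.relu(x) * scale
    return torch.relu(x)

t = torch.tensor([-1.0, 0.5, 2.0])
print(scaled_relu(t, 2.0))
print(scaled_relu.code)   # the TorchScript source that was compiled
```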

 

## **Building Neural Nets using PyTorch**
Let’s understand PyTorch through a more practical lens. Learning theory is good, but it isn’t much use if you don’t put it into practice!



A PyTorch implementation of a neural network looks very similar to a NumPy implementation. The goal of this section is to show how closely PyTorch and NumPy code correspond. For this purpose, let’s create a simple three-layer network with 5 nodes in the input layer, 3 in the hidden layer, and 1 in the output layer. We will use a single training example: one row with five features and one target.
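
To make the NumPy comparison concrete, here is a sketch (arbitrary random values, seeded with NumPy's `default_rng`) that computes the hidden-layer activation of the 5-3 layer both ways. `torch.from_numpy` shares memory with the NumPy arrays, so the PyTorch tensors here are float64.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
X_np = rng.standard_normal((1, 5))
w1_np = rng.standard_normal((5, 3))
b1_np = rng.standard_normal((1, 3))

# NumPy forward pass for the hidden layer
a1_np = 1 / (1 + np.exp(-(X_np @ w1_np + b1_np)))

# the same computation in PyTorch, on tensors that share the NumPy buffers
X_t, w1_t, b1_t = (torch.from_numpy(a) for a in (X_np, w1_np, b1_np))
a1_t = 1 / (1 + torch.exp(-(torch.mm(X_t, w1_t) + b1_t)))

print(np.allclose(a1_np, a1_t.numpy()))
```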


```python
import torch

# layer sizes: 5 input features, 3 hidden units, 1 output
n_input, n_hidden, n_output = 5, 3, 1
```

The first step is parameter initialization. Here, the weights and bias parameters for each layer are initialized as tensors filled with random values drawn from a standard normal distribution (`torch.randn`).

Tensors are the base data structure of PyTorch and are used to build every kind of neural network. They can be thought of as a generalization of vectors and matrices; in other words, tensors are N-dimensional arrays.
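
A short sketch of tensors of increasing rank; the shapes are arbitrary, except that the 2-D case mirrors the `(1, n_input)`-style matrices used below.

```python
import torch

scalar = torch.tensor(3.0)              # 0-D tensor
vector = torch.tensor([1.0, 2.0, 3.0])  # 1-D tensor
matrix = torch.zeros(2, 3)              # 2-D tensor, like X or w1 below
batch = torch.zeros(4, 2, 3)            # 3-D tensor, e.g. a stack of 4 matrices

for t in (scalar, vector, matrix, batch):
    print(t.dim(), tuple(t.shape))      # number of dimensions and shape
```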

```python
# initialising tensors for the input features and the target
X = torch.randn((1, n_input))
y = torch.randn((1, n_output))
X, y
```

```
(tensor([[-1.4169, -0.1208,  2.9017, -0.5320,  1.1673]]), tensor([[0.9833]]))
```

```python
# initialising tensors for the weights
w1 = torch.randn(n_input, n_hidden)
w2 = torch.randn(n_hidden, n_output)
print(w1, "\n", w2)
```

```
tensor([[-0.1227, -1.7698, -0.4778],
        [-0.1151, -0.4091, -1.4808],
        [ 0.3592, -1.2880, -0.1151],
        [-1.5897, -0.7054,  1.2197],
        [ 1.2678,  0.9360, -1.2345]]) 
 tensor([[-0.7256],
        [ 0.2154],
        [ 0.0544]])
```


```python
# initialising tensors for the bias terms
b1 = torch.randn(1, n_hidden)
b2 = torch.randn(1, n_output)
b1, b2
```

```
(tensor([[-0.0885,  1.1882,  0.7300]]), tensor([[2.3224]]))
```

After the parameter initialization step, a neural network can be defined and trained in four key steps:

- Forward Propagation
- Loss computation
- Backpropagation
- Updating the parameters

Let’s see each of these steps in a bit more detail.

 

**Forward Propagation**: In this step, activations are calculated at every layer using the two steps shown below. These activations flow in the forward direction from the input layer to the output layer in order to generate the final output.

1. z = input @ weight + bias (a matrix product followed by adding the bias)
2. a = activation_function(z)
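
The two steps can be sketched for both layers at once. The `forward` helper below is hypothetical (the notebook computes each layer in its own cell), uses the built-in `torch.sigmoid` as the activation, and annotates the tensor shape produced at each step for this 5-3-1 network.

```python
import torch

def forward(X, w1, b1, w2, b2):
    # hidden layer: (1, 5) @ (5, 3) + (1, 3) -> (1, 3)
    a1 = torch.sigmoid(torch.mm(X, w1) + b1)
    # output layer: (1, 3) @ (3, 1) + (1, 1) -> (1, 1)
    output = torch.sigmoid(torch.mm(a1, w2) + b2)
    return a1, output

torch.manual_seed(0)  # assumption: fixed seed for reproducibility
X = torch.randn(1, 5)
w1, b1 = torch.randn(5, 3), torch.randn(1, 3)
w2, b2 = torch.randn(3, 1), torch.randn(1, 1)

a1, output = forward(X, w1, b1, w2, b2)
print(a1.shape, output.shape)
```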

The following code blocks show how we can write these steps in PyTorch. Notice that most of the functions, such as exponential and matrix multiplication, are similar to the ones in NumPy.

```python
# sigmoid activation using PyTorch
def sigmoid_activation(z):
    return 1 / (1 + torch.exp(-z))
```

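The hand-written `sigmoid_activation` computes the same function as PyTorch's built-in `torch.sigmoid`. A quick check over a range of inputs (the range is an arbitrary choice) could look like this:

```python
import torch

def sigmoid_activation(z):
    return 1 / (1 + torch.exp(-z))

z = torch.linspace(-6.0, 6.0, steps=13)
# compare the manual formula with the built-in implementation
print(torch.allclose(sigmoid_activation(z), torch.sigmoid(z), atol=1e-6))
```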

```python
# activation of the hidden layer
z1 = torch.mm(X, w1) + b1
a1 = sigmoid_activation(z1)
a1
```

```
tensor([[0.9697, 0.8139, 0.3020]])
```

```python
# activation of the output layer
z2 = torch.mm(a1, w2) + b2
output = sigmoid_activation(z2)
output
```

```
tensor([[0.8594]])
```

**Loss Computation**: In this step, the error (also called the loss) is calculated at the output layer. Here we use the simplest measure: the difference between the actual value and the predicted value. Later, we will look at the different loss functions available in PyTorch.

```python
# signed error between the target and the prediction
loss = y - output
```
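
The signed error above is not yet a single number to minimize. One common choice, assumed here purely for illustration, is the mean squared error; computed by hand it matches the built-in `torch.nn.functional.mse_loss`. The two values are the `y` and `output` printed earlier.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[0.9833]])        # target printed earlier
output = torch.tensor([[0.8594]])   # prediction printed earlier

error = y - output                  # the notebook's `loss` variable
mse_manual = (error ** 2).mean()    # mean squared error by hand
mse_builtin = F.mse_loss(output, y) # built-in equivalent (default reduction="mean")
print(mse_manual.item(), mse_builtin.item())
```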

**Backpropagation**: The aim of this step is to work out small changes to the weights and biases that reduce the error at the output layer. These changes are computed from the derivatives of the error with respect to each parameter.

Using the chain rule of calculus, these deltas are passed back to the hidden layer, where they determine the corresponding changes to its weights and bias. Repeating this adjustment over many iterations drives the error toward a minimum.
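
To make the chain rule concrete for this network, assume (the notebook does not state it explicitly) that the update rule below minimizes $L = \tfrac{1}{2}(y - \hat{y})^2$, with $z_1 = XW_1 + b_1$, $a_1 = \sigma(z_1)$, $z_2 = a_1 W_2 + b_2$, $\hat{y} = \sigma(z_2)$ and $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. The deltas computed in the next cells are then the negative gradients of $L$ ($\odot$ is element-wise multiplication):

$$
\begin{aligned}
\delta_{\text{out}} &= (y - \hat{y}) \odot \hat{y}\,(1 - \hat{y}) = -\frac{\partial L}{\partial z_2} \\
\delta_{\text{hid}} &= \left(\delta_{\text{out}} W_2^{\top}\right) \odot a_1 (1 - a_1) = -\frac{\partial L}{\partial z_1} \\
-\frac{\partial L}{\partial W_2} &= a_1^{\top} \delta_{\text{out}}, \qquad -\frac{\partial L}{\partial W_1} = X^{\top} \delta_{\text{hid}}
\end{aligned}
$$

Here $\delta_{\text{out}}$ corresponds to `d_outp` and $\delta_{\text{hid}}$ to `d_hidn`, so adding these terms (scaled by the learning rate) to the weights is a gradient-descent step.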

```python
# derivative of the sigmoid, written in terms of its output a = sigmoid(z):
# sigmoid'(z) = a * (1 - a)
def sigmoid_delta(a):
    return a * (1 - a)
```


```python
# sigmoid derivatives at the output and hidden layers
delta_output = sigmoid_delta(output)
delta_hidden = sigmoid_delta(a1)
```


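```python
# pass the changes back to the previous layers
d_outp = loss * delta_output          # delta at the output layer
loss_h = torch.mm(d_outp, w2.t())     # error propagated back to the hidden layer
d_hidn = loss_h * delta_hidden        # delta at the hidden layer
```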

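The hand-derived deltas can be cross-checked against PyTorch's autograd. This sketch assumes the loss $\tfrac{1}{2}(y - \text{output})^2$ (an assumption, since the notebook only uses the raw error), re-creates the parameters with `requires_grad=True` and a fixed seed, and confirms that each autograd gradient is the negative of the manual update direction, including a separate gradient for every hidden bias unit.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for reproducibility
X, y = torch.randn(1, 5), torch.randn(1, 1)
w1 = torch.randn(5, 3, requires_grad=True)
b1 = torch.randn(1, 3, requires_grad=True)
w2 = torch.randn(3, 1, requires_grad=True)
b2 = torch.randn(1, 1, requires_grad=True)

# forward pass; torch.sigmoid matches the hand-written sigmoid_activation
a1 = torch.sigmoid(torch.mm(X, w1) + b1)
output = torch.sigmoid(torch.mm(a1, w2) + b2)

# assumed loss whose negative gradient the manual deltas represent
loss = 0.5 * ((y - output) ** 2).sum()
loss.backward()

# manual deltas, following the cells above
with torch.no_grad():
    d_outp = (y - output) * output * (1 - output)
    d_hidn = torch.mm(d_outp, w2.t()) * a1 * (1 - a1)

# autograd gradients are the negatives of the manual update directions
print(torch.allclose(w2.grad, -torch.mm(a1.detach().t(), d_outp), atol=1e-6))
print(torch.allclose(w1.grad, -torch.mm(X.t(), d_hidn), atol=1e-6))
print(torch.allclose(b1.grad, -d_hidn.sum(dim=0, keepdim=True), atol=1e-6))
```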

**Updating the Parameters**: Finally, the weights and bias are updated using the delta changes received from the above backpropagation step.

```python
learning_rate = 0.1
```

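```python
# since loss = y - output, adding these terms is a gradient-descent step
# for the squared error 0.5 * (y - output)**2
w2 += torch.mm(a1.t(), d_outp) * learning_rate
w1 += torch.mm(X.t(), d_hidn) * learning_rate
```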

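```python
# sum the deltas over the batch dimension (dim=0) so that each bias unit
# receives its own update instead of one shared scalar
b2 += d_outp.sum(dim=0, keepdim=True) * learning_rate
b1 += d_hidn.sum(dim=0, keepdim=True) * learning_rate
b1, b2
```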

```
(tensor([[-0.0885,  1.1883,  0.7301]]), tensor([[2.3239]]))
```
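
In everyday PyTorch code, the manual gradient and update steps are usually delegated to autograd and an optimizer. A sketch of one such step, assuming the same $\tfrac{1}{2}(y - \text{output})^2$ loss and the built-in `torch.optim.SGD` (plain SGD, no momentum), which should reproduce the manual `w2` update:

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for reproducibility
X, y = torch.randn(1, 5), torch.randn(1, 1)
w1 = torch.randn(5, 3, requires_grad=True)
b1 = torch.randn(1, 3, requires_grad=True)
w2 = torch.randn(3, 1, requires_grad=True)
b2 = torch.randn(1, 1, requires_grad=True)

learning_rate = 0.1
optimizer = torch.optim.SGD([w1, b1, w2, b2], lr=learning_rate)
w2_before = w2.detach().clone()   # kept only to compare with the manual rule

a1 = torch.sigmoid(torch.mm(X, w1) + b1)
output = torch.sigmoid(torch.mm(a1, w2) + b2)
loss = 0.5 * ((y - output) ** 2).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()                  # each param <- param - learning_rate * param.grad

# the notebook's manual rule for w2, computed from the same forward pass
with torch.no_grad():
    d_outp = (y - output) * output * (1 - output)
    w2_manual = w2_before + torch.mm(a1.t(), d_outp) * learning_rate

print(torch.allclose(w2.detach(), w2_manual, atol=1e-6))
```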

Finally, when these steps are repeated for a number of epochs over a large number of training examples, the loss typically decreases toward a minimum. The final weight and bias values can then be used to make predictions on unseen data.

In the next cell, write code to perform the above 4 steps 500 times and print the final loss, along with the final weights and bias values.
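
One possible sketch of this exercise, not the author's solution: it re-creates the tensors with a fixed seed (an assumption, for reproducibility), records the mean squared error before each update, and uses the per-unit bias update.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is reproducible

n_input, n_hidden, n_output = 5, 3, 1
X = torch.randn(1, n_input)
y = torch.randn(1, n_output)
w1, b1 = torch.randn(n_input, n_hidden), torch.randn(1, n_hidden)
w2, b2 = torch.randn(n_hidden, n_output), torch.randn(1, n_output)

def sigmoid_activation(z):
    return 1 / (1 + torch.exp(-z))

def sigmoid_delta(a):
    # sigmoid derivative expressed through its output a = sigmoid(z)
    return a * (1 - a)

learning_rate = 0.1
epochs = 500
losses = []  # mean squared error recorded before each update

for _ in range(epochs):
    # 1. forward propagation
    a1 = sigmoid_activation(torch.mm(X, w1) + b1)
    output = sigmoid_activation(torch.mm(a1, w2) + b2)

    # 2. loss computation
    error = y - output
    losses.append((error ** 2).mean().item())

    # 3. backpropagation
    d_outp = error * sigmoid_delta(output)
    d_hidn = torch.mm(d_outp, w2.t()) * sigmoid_delta(a1)

    # 4. parameter update (gradient descent on 0.5 * error**2)
    w2 += torch.mm(a1.t(), d_outp) * learning_rate
    w1 += torch.mm(X.t(), d_hidn) * learning_rate
    b2 += d_outp.sum(dim=0, keepdim=True) * learning_rate
    b1 += d_hidn.sum(dim=0, keepdim=True) * learning_rate

print("final loss:", losses[-1])
print("w1:", w1, "\nw2:", w2)
print("b1:", b1, "\nb2:", b2)
```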

T.B.C.