In [1]:
import torch

In [2]:
n_input,n_hidden,n_output = 5,3,1

In [3]:
## initialize tensors for the input and the target output
x = torch.randn((1, n_input))
y = torch.randn((1, n_output))

In [4]:
## initialize tensor variables for weights 
w1 = torch.randn(n_input, n_hidden) # weight for hidden layer
w2 = torch.randn(n_hidden, n_output) # weight for output layer

In [5]:
## initialize tensor variables for bias terms 
b1 = torch.randn((1, n_hidden)) # bias for hidden layer
b2 = torch.randn((1, n_output)) # bias for output layer

After the parameter initialization step, a neural network can be defined and trained in four key steps:

1. Forward propagation
2. Loss computation
3. Backpropagation
4. Updating the parameters

# Forward Propagation: 
In this step, activations are calculated at every layer using the two steps shown below. These activations flow in the forward direction from the input layer to the output layer in order to generate the final output.

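$$z = x\,W + b$$
$$a = f(z)$$

where $x$ is the layer input, $W$ the weight matrix, $b$ the bias, and $f$ the activation function (the sigmoid in this notebook).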

In [6]:
## sigmoid activation function using pytorch
def sigmoid_activation(z):
    return 1 / (1 + torch.exp(-z))
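
As a quick sanity check (an illustrative addition, not part of the original notebook), the hand-written sigmoid can be compared against PyTorch's built-in `torch.sigmoid`. The test inputs are arbitrary.

```python
import torch

def sigmoid_activation(z):
    # same definition as in the cell above
    return 1 / (1 + torch.exp(-z))

z = torch.linspace(-5.0, 5.0, steps=11)  # -5, -4, ..., 5
manual = sigmoid_activation(z)
builtin = torch.sigmoid(z)

# the two agree up to floating-point rounding
print(torch.allclose(manual, builtin))  # True
print(manual[5].item())  # sigmoid(0) = 0.5
```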

In [7]:
## activation of hidden layer 
z1 = torch.mm(x, w1) + b1
a1 = sigmoid_activation(z1)

In [8]:
## activation (output) of final layer 
z2 = torch.mm(a1, w2) + b2
output = sigmoid_activation(z2)
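
The two cells above can be wrapped into a single function. The sketch below uses a hypothetical `forward` helper and swaps in `torch.sigmoid` for `sigmoid_activation`; it also shows that the same code handles a batch of several inputs, because `torch.mm` multiplies every row of `x` by the weights and the `(1, n_hidden)` bias broadcasts across the rows. The seed and batch size are arbitrary.

```python
import torch

torch.manual_seed(0)
n_input, n_hidden, n_output = 5, 3, 1

w1 = torch.randn(n_input, n_hidden)
w2 = torch.randn(n_hidden, n_output)
b1 = torch.randn((1, n_hidden))
b2 = torch.randn((1, n_output))

def forward(x):
    # hidden layer: (batch, n_input) @ (n_input, n_hidden) + (1, n_hidden) -> (batch, n_hidden)
    a1 = torch.sigmoid(torch.mm(x, w1) + b1)
    # output layer: (batch, n_hidden) @ (n_hidden, n_output) + (1, n_output) -> (batch, n_output)
    output = torch.sigmoid(torch.mm(a1, w2) + b2)
    return a1, output

x_batch = torch.randn((4, n_input))  # 4 examples instead of 1
a1, output = forward(x_batch)
print(a1.shape, output.shape)  # torch.Size([4, 3]) torch.Size([4, 1])
```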

# Loss Computation: 
In this step, the error (also called the loss) is calculated at the output layer. Here we use the simplest possible measure: the difference between the actual value and the predicted value. Later, we will look at different loss functions available in PyTorch.

In [9]:
loss = y - output
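
The difference `y - output` is an error signal rather than a single loss number. One way to connect it to a standard loss (an assumption, not stated in the original) is the half squared error $L = \tfrac{1}{2}(y - \text{output})^2$, whose derivative with respect to `output` is `-(y - output)`. The sketch below checks this with autograd using `torch.nn.functional.mse_loss`; the prediction is a random stand-in.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
y = torch.randn((1, 1))
output = torch.rand((1, 1), requires_grad=True)  # stand-in prediction in (0, 1)

# half squared error; for a single element, mse_loss is just (output - y) ** 2
loss = 0.5 * F.mse_loss(output, y)
loss.backward()

error = (y - output).detach()  # the notebook's "loss = y - output"
print(torch.allclose(output.grad, -error))  # True: dL/d(output) = -(y - output)
```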

# Backpropagation: 
The aim of this step is to work out how the weights and biases should change to reduce the error at the output layer. These changes are computed from the derivatives of the error with respect to each parameter.

Using the chain rule from calculus, the deltas computed at the output layer are passed back to the hidden layer, where the corresponding changes to its weights and bias are computed. Applying these adjustments repeatedly reduces the error step by step.
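
Written out for this network, and assuming the half squared error $L = \tfrac{1}{2}(y - \hat{y})^2$ with $\hat{y}$ = `output`, $z_1 = x W_1 + b_1$, $a_1 = \sigma(z_1)$ and $\hat{y} = \sigma(a_1 W_2 + b_2)$, the chain rule gives the quantities computed in the following cells. Here $\delta_{\text{out}}$ corresponds to `d_outp`, $\delta_{\text{hid}}$ to `d_hidn`, $\odot$ is element-wise multiplication, and $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:

$$
\begin{aligned}
\delta_{\text{out}} &= (y - \hat{y}) \odot \hat{y}\,(1 - \hat{y}) \\
\frac{\partial L}{\partial W_2} &= -\,a_1^{\top} \delta_{\text{out}}, \qquad \frac{\partial L}{\partial b_2} = -\sum_{\text{rows}} \delta_{\text{out}} \\
\delta_{\text{hid}} &= \left(\delta_{\text{out}} W_2^{\top}\right) \odot a_1 (1 - a_1) \\
\frac{\partial L}{\partial W_1} &= -\,x^{\top} \delta_{\text{hid}}, \qquad \frac{\partial L}{\partial b_1} = -\sum_{\text{rows}} \delta_{\text{hid}}
\end{aligned}
$$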

In [10]:
## derivative of the sigmoid, written in terms of its output s = sigmoid(z): s * (1 - s)
def sigmoid_delta(x):
    return x * (1 - x)
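
The helper takes the sigmoid's *output* `s`, not its input `z`, because $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. A short check against autograd (an illustrative addition; the test points are arbitrary):

```python
import torch

def sigmoid_delta(x):
    # x is the sigmoid OUTPUT s = sigmoid(z); returns d sigmoid / dz = s * (1 - s)
    return x * (1 - x)

z = torch.linspace(-4.0, 4.0, steps=9, requires_grad=True)
s = torch.sigmoid(z)
s.sum().backward()  # z.grad now holds d sigmoid(z_i) / d z_i for each element

print(torch.allclose(z.grad, sigmoid_delta(s.detach())))  # True
print(sigmoid_delta(torch.tensor(0.5)).item())  # 0.25, the maximum slope, reached at z = 0
```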

In [11]:
## slope of the sigmoid at the output and hidden layers
delta_output = sigmoid_delta(output)
delta_hidden = sigmoid_delta(a1)

In [12]:
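## propagate the error back through the network
d_outp = loss * delta_output       # delta at the output layer
loss_h = torch.mm(d_outp, w2.t())  # error passed back to the hidden layer through w2
d_hidn = loss_h * delta_hidden     # delta at the hidden layer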
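
To confirm that the manual deltas produce the (negated) gradients of the half squared error, the sketch below repeats the forward and backward pass on fresh random tensors and compares against `backward()`. The seed, the loss choice, and the `grad_*` names are assumptions made for the check.

```python
import torch

torch.manual_seed(0)
n_input, n_hidden, n_output = 5, 3, 1
x = torch.randn((1, n_input))
y = torch.randn((1, n_output))
w1 = torch.randn(n_input, n_hidden, requires_grad=True)
w2 = torch.randn(n_hidden, n_output, requires_grad=True)
b1 = torch.randn((1, n_hidden), requires_grad=True)
b2 = torch.randn((1, n_output), requires_grad=True)

# forward pass, same equations as the notebook
a1 = torch.sigmoid(torch.mm(x, w1) + b1)
output = torch.sigmoid(torch.mm(a1, w2) + b2)

# autograd reference: half squared error
(0.5 * ((y - output) ** 2).sum()).backward()

# manual backprop as in the notebook, without autograd tracking
with torch.no_grad():
    loss = y - output
    d_outp = loss * output * (1 - output)
    d_hidn = torch.mm(d_outp, w2.t()) * a1 * (1 - a1)
    grad_w2 = -torch.mm(a1.t(), d_outp)
    grad_w1 = -torch.mm(x.t(), d_hidn)
    grad_b2 = -d_outp.sum(dim=0, keepdim=True)
    grad_b1 = -d_hidn.sum(dim=0, keepdim=True)

print(torch.allclose(grad_w2, w2.grad, atol=1e-6), torch.allclose(grad_w1, w1.grad, atol=1e-6))
```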

# Updating the Parameters: 
Finally, the weights and bias are updated using the delta changes received from the above backpropagation step.
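
Assuming the half squared error $L = \tfrac{1}{2}(y - \hat{y})^2$ as the underlying loss, the deltas `d_outp` ($\delta_{\text{out}}$) and `d_hidn` ($\delta_{\text{hid}}$) already carry the minus sign of the gradient, so *adding* them, scaled by the learning rate $\eta$, is an ordinary gradient-descent step:

$$
\begin{aligned}
W_2 &\leftarrow W_2 - \eta \frac{\partial L}{\partial W_2} = W_2 + \eta\, a_1^{\top} \delta_{\text{out}}, &\qquad b_2 &\leftarrow b_2 + \eta \sum_{\text{rows}} \delta_{\text{out}} \\
W_1 &\leftarrow W_1 - \eta \frac{\partial L}{\partial W_1} = W_1 + \eta\, x^{\top} \delta_{\text{hid}}, &\qquad b_1 &\leftarrow b_1 + \eta \sum_{\text{rows}} \delta_{\text{hid}}
\end{aligned}
$$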

In [13]:
learning_rate = 0.1

In [14]:
w2 += torch.mm(a1.t(), d_outp) * learning_rate

In [15]:
w1 += torch.mm(x.t(), d_hidn) * learning_rate

In [16]:
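b2 += d_outp.sum(dim=0, keepdim=True) * learning_rate  # sum over examples only: one update per output unit
b1 += d_hidn.sum(dim=0, keepdim=True) * learning_rate  # sum over examples only: one update per hidden unit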

When these steps are repeated for many epochs over a large set of training examples, the loss typically decreases toward a minimum. The resulting weights and biases can then be used to make predictions on unseen data.
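
Putting the four steps together, the sketch below repeats them for a number of epochs on a small synthetic batch. The dataset (`x_train`, `y_train`), the seed, the epoch count, the `mse` helper, and averaging the updates over the batch are illustrative assumptions; the updates are the notebook's manual ones rather than `torch.optim`.

```python
import torch

torch.manual_seed(0)
n_input, n_hidden, n_output = 5, 3, 1
n_samples, epochs, learning_rate = 32, 500, 0.1

# hypothetical synthetic data; targets in (0, 1) so a sigmoid output can fit them
x_train = torch.randn((n_samples, n_input))
y_train = torch.rand((n_samples, n_output))

w1 = torch.randn(n_input, n_hidden)
w2 = torch.randn(n_hidden, n_output)
b1 = torch.randn((1, n_hidden))
b2 = torch.randn((1, n_output))

def mse():
    # mean squared error of the current network on the training batch
    a1 = torch.sigmoid(torch.mm(x_train, w1) + b1)
    output = torch.sigmoid(torch.mm(a1, w2) + b2)
    return ((y_train - output) ** 2).mean().item()

initial_loss = mse()
for epoch in range(epochs):
    # 1. forward propagation
    a1 = torch.sigmoid(torch.mm(x_train, w1) + b1)
    output = torch.sigmoid(torch.mm(a1, w2) + b2)
    # 2. error at the output layer
    loss = y_train - output
    # 3. backpropagation (d_hidn uses w2 before it is updated)
    d_outp = loss * output * (1 - output)
    d_hidn = torch.mm(d_outp, w2.t()) * a1 * (1 - a1)
    # 4. parameter updates, averaged over the batch
    w2 += torch.mm(a1.t(), d_outp) * learning_rate / n_samples
    w1 += torch.mm(x_train.t(), d_hidn) * learning_rate / n_samples
    b2 += d_outp.sum(dim=0, keepdim=True) * learning_rate / n_samples
    b1 += d_hidn.sum(dim=0, keepdim=True) * learning_rate / n_samples

final_loss = mse()
print(f"MSE before: {initial_loss:.4f}  after: {final_loss:.4f}")
```

Dividing by `n_samples` keeps the step size independent of the batch size; without it, a larger batch would take proportionally larger steps for the same learning rate.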