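A neural network has three components: an input layer, one or more hidden layers, and an output layer. In a feedforward network, information flows in one direction, from the input layer through the hidden layers to the output layer (left to right in the usual diagrams).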

The input layer is not made of neurons. It only holds the raw data, and its number of nodes equals the number of features in the data.
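To make "number of nodes = number of features" concrete, here is a minimal sketch with a small hypothetical dataset (the house features and their values are invented for illustration). Each row is one sample and each column one feature, so the input layer needs one node per column.

```python
import torch

# Hypothetical data: 4 houses, 3 features each (area, bedrooms, age).
data = torch.tensor([[120.0, 3.0, 10.0],
                     [ 80.0, 2.0, 25.0],
                     [200.0, 5.0,  2.0],
                     [ 95.0, 2.0, 40.0]])

n_samples, n_features = data.shape
print(n_features)  # 3 -> the input layer has 3 nodes, one per feature
```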

Hidden layers are the brain of a neural network. The width of a hidden layer is the number of neurons in it, and the depth of the network is the number of layers. In a fully connected (dense) layer, each neuron is connected to every output of the previous layer. Stacked hidden layers can detect increasingly complex patterns: in an image model, for example, layer 1 may detect simple shapes and layer 2 may combine those shapes into objects.
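A minimal sketch of width and depth, assuming arbitrary sizes (4 input features, hidden layers of width 16 and 8, one output) chosen only for illustration. It uses PyTorch's built-in `nn.Linear` and `nn.Sequential`; the weight shape of each `nn.Linear` shows that every neuron is connected to all outputs of the previous layer.

```python
import torch
import torch.nn as nn

# Hypothetical architecture: 4 inputs -> 16 -> 8 -> 1 output.
model = nn.Sequential(
    nn.Linear(4, 16),  # hidden layer 1: width 16, each neuron sees all 4 inputs
    nn.ReLU(),
    nn.Linear(16, 8),  # hidden layer 2: width 8, each neuron sees all 16 outputs of layer 1
    nn.ReLU(),
    nn.Linear(8, 1),   # output layer
)

# Every nn.Linear except the last one is a hidden layer.
hidden = [layer for layer in model if isinstance(layer, nn.Linear)][:-1]
widths = [layer.out_features for layer in hidden]

print("hidden depth:", len(hidden), "widths:", widths)  # hidden depth: 2 widths: [16, 8]
# nn.Linear stores one row of weights per neuron: (out_features, in_features).
print(hidden[0].weight.shape)  # torch.Size([16, 4])
```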

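The output layer has as many neurons as the number of outputs we want. Its activation function depends on the task: for example, sigmoid for binary classification, softmax for multi-class classification, and no activation for regression.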
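The sketch below applies those three common output activations to hypothetical pre-activation values (the numbers are made up). Sigmoid turns one value into a probability, softmax turns a row of class scores into probabilities that sum to 1, and the identity leaves a regression output unbounded.

```python
import torch

# Hypothetical pre-activations of an output layer for 2 samples each.
z_binary = torch.tensor([[0.0], [2.0]])            # 1 neuron: binary classification
z_multi = torch.tensor([[1.0, 2.0, 3.0],
                        [0.0, 0.0, 0.0]])          # 3 neurons: 3-class classification
z_regression = torch.tensor([[-3.5], [120.0]])     # 1 neuron: regression

p_binary = torch.sigmoid(z_binary)         # each value between 0 and 1: P(class 1)
p_multi = torch.softmax(z_multi, dim=1)    # each row sums to 1 over the 3 classes
y_regression = z_regression                # identity: any real number is allowed

print(p_binary)
print(p_multi)
print(y_regression)
```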

# **NEURON TO LAYER**

Say we have n neurons, each with m weights (one per input). We can stack them into a weight matrix W of shape m × n, where column j is the weight vector of neuron j. Similarly, the biases form a vector b of length n, where element j is the bias of neuron j.
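A minimal sketch of that stacking, assuming hypothetical sizes m = 4 and n = 3 and random per-neuron parameters: `torch.stack(..., dim=1)` places each neuron's weight vector in its own column.

```python
import torch

m, n = 4, 3  # m inputs per neuron, n neurons (hypothetical sizes)

# One weight vector and one scalar bias per neuron.
neuron_weights = [torch.randn(m) for _ in range(n)]
neuron_biases = [torch.randn(()) for _ in range(n)]

# Weight vectors become the columns of W (shape m x n); biases form b (length n).
W = torch.stack(neuron_weights, dim=1)
b = torch.stack(neuron_biases)

print(W.shape, b.shape)  # torch.Size([4, 3]) torch.Size([3])
```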

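For a single neuron, the linear step was $z = w^T x + b$, where $w$ is the neuron's weight vector. For a whole layer it becomes the vector-matrix product $z = xW + b$ (in PyTorch, `x @ W + b`), where $x$ is a row vector of length m. The result is a vector of length n holding the pre-activations of all neurons, with $z_j = w_j^T x + b_j$ for neuron j. For a batch of inputs stacked as the rows of a matrix $X$ of shape (batch_size, m), the same expression $XW + b$ gives a (batch_size, n) matrix, with $b$ broadcast across the rows.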
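To check that the layer formula really is "one dot product per neuron", this sketch (with hypothetical sizes and random values) computes the pre-activations neuron by neuron and with a single `x @ W + b`, then repeats the product for a batch.

```python
import torch

torch.manual_seed(0)
m, n, batch = 5, 3, 2        # hypothetical sizes
W = torch.randn(m, n)        # column j = weights of neuron j
b = torch.randn(n)           # b[j] = bias of neuron j
x = torch.randn(m)           # one sample with m features

# Neuron by neuron: z_j = w_j^T x + b_j
z_loop = torch.stack([W[:, j] @ x + b[j] for j in range(n)])

# Whole layer in one vector-matrix product
z_layer = x @ W + b

# A batch: X has shape (batch, m); b is broadcast across the rows
X = torch.randn(batch, m)
Z = X @ W + b                # shape (batch, n)

print(torch.allclose(z_loop, z_layer, atol=1e-6), Z.shape)
```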

In [10]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Layer(nn.Module):
  def __init__(self, n_input, n_neurons, activation = None):
    super().__init__()
    # nn.Parameter registers the tensor as a learnable parameter of the module, so it
    # appears in self.parameters() and is updated by a torch.optim optimizer.
    # Column j of the (n_input, n_neurons) matrix holds the weights of neuron j.
    self.weights = nn.Parameter(torch.randn(n_input, n_neurons))
    # A plain tensor with requires_grad=True still gets gradients, but it is not
    # registered, so it would have to be passed to the optimizer by hand every time.
    self.bias = nn.Parameter(torch.zeros(n_neurons))
    self.activation = activation

  def forward(self, x):
    z = x @ self.weights + self.bias  # pre-activations, shape (..., n_neurons)
    if self.activation is not None:
      return self.activation(z)
    return z

  # nn.Module runs forward() when the layer is called as my_layer(x);
  # forward_pass is kept as an alias so the calls below still work.
  forward_pass = forward



In [11]:
n_neurons = 3
n_inputs = 5
batch_size = 2

my_layer = Layer(n_inputs, n_neurons, activation = F.relu)
x = torch.randn(batch_size, n_inputs)
output = my_layer.forward_pass(x)
print(output)

tensor([[1.5587, 4.1114, 5.3775],
        [1.3606, 0.0000, 0.0000]], grad_fn=<ReluBackward0>)
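The comments in the `Layer` class claim that only registered parameters are updated by `torch.optim`. A minimal sketch to check that claim, using a hypothetical module `TwoTensors` with one `nn.Parameter` and one plain tensor that has `requires_grad=True`:

```python
import torch
import torch.nn as nn

class TwoTensors(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.registered = nn.Parameter(torch.ones(3))   # registered automatically
        self.plain = torch.ones(3, requires_grad=True)  # tracked by autograd, NOT registered

module = TwoTensors()
names = [name for name, _ in module.named_parameters()]
print(names)  # ['registered']

opt = torch.optim.SGD(module.parameters(), lr=0.1)
loss = (module.registered + module.plain).sum()
loss.backward()   # both tensors receive a gradient of 1 per entry
opt.step()        # but only the registered parameter is updated

print(module.registered)  # each entry moved from 1.0 to 0.9
print(module.plain)       # unchanged: the optimizer never saw it, although .grad is set
```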


**Practice 1 -** Create an output layer for a binary classification problem. It should take 16 inputs from a previous hidden layer and use sigmoid as the activation function.

In [12]:
n_inputs = 16
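n_neurons = 1 # One neuron: its sigmoid output is the probability of the positive ("yes") class.

my_layer = Layer(n_inputs, n_neurons, activation = F.sigmoid)
x = torch.randn(n_inputs) # A single sample without a batch dimension, so the output has shape (1,).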
output = my_layer.forward_pass(x)
print(output)


tensor([0.4462], grad_fn=<SigmoidBackward0>)
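The sigmoid output is read as the probability of the positive class. A minimal sketch of turning it into a yes/no prediction for a batch, assuming random hypothetical weights (instead of the `Layer` class, so the snippet stands alone) and the common 0.5 threshold:

```python
import torch

torch.manual_seed(0)
n_inputs, batch_size = 16, 4
W = torch.randn(n_inputs, 1)  # hypothetical weights of the single output neuron
b = torch.zeros(1)

x = torch.randn(batch_size, n_inputs)   # 4 samples coming from the previous hidden layer
probs = torch.sigmoid(x @ W + b)        # shape (4, 1), probability of class 1
preds = (probs >= 0.5).long()           # threshold at 0.5 -> label 0 or 1

print(probs.squeeze(1))
print(preds.squeeze(1))
```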


**Practice 2 -** Create a Layer instance that would be suitable as the output layer for a regression problem (like predicting a house price). It should take 8 inputs.

In [13]:
n_inputs = 8
n_neurons = 1 #We are just interested in the price so one output - that single number.
batch_size = 5

my_layer = Layer(n_inputs, n_neurons) # No activation: the output stays a raw linear value, so it can be any real number. A bounded activation such as sigmoid would squash it into a fixed range.

x = torch.randn(batch_size, n_inputs)

output = my_layer.forward_pass(x)
print(output)

tensor([[ 0.4546],
        [-1.0087],
        [-2.7327],
        [ 1.9517],
        [ 1.0942]], grad_fn=<AddBackward0>)
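Why leave out the activation for regression? A minimal sketch with hypothetical pre-activation values: the identity keeps a price-sized number like 350000, while sigmoid squashes every value into the range 0 to 1, so it could never output such a price.

```python
import torch

# Hypothetical pre-activations of a single output neuron for three houses.
z = torch.tensor([[-2.0], [0.5], [350000.0]])

identity_out = z                 # no activation: any real value is possible
sigmoid_out = torch.sigmoid(z)   # squashed into the range 0 to 1

print(identity_out.squeeze(1))
print(sigmoid_out.squeeze(1))
```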


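**PyTorch built-in Layer Class -** `nn.Linear` does the same job as our `Layer` without an activation: it computes an affine transformation of its input. Two differences: it stores its weight as an (out_features, in_features) matrix and computes `x @ weight.T + bias`, and it initializes its weights and bias from a small uniform distribution scaled by 1/sqrt(in_features), instead of a standard normal for the weights and zeros for the bias.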

In [15]:
layer1 = nn.Linear(in_features=8, out_features=1)
x = torch.randn(batch_size, n_inputs)
output = layer1(x)
print(output)

tensor([[ 0.6360],
        [-1.1589],
        [-0.9103],
        [-0.1377],
        [-1.2064]], grad_fn=<AddmmBackward0>)
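To confirm that `nn.Linear` matches the `x @ W + b` formula used by our layer, this sketch takes its parameters, transposes the weight into our (in, out) layout, and compares the results. The sizes mirror Practice 2; the seed is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_inputs, n_neurons, batch_size = 8, 1, 5

linear = nn.Linear(in_features=n_inputs, out_features=n_neurons)
print(linear.weight.shape)  # torch.Size([1, 8]): (out_features, in_features)

x = torch.randn(batch_size, n_inputs)

# Our layer uses W of shape (in, out), which is nn.Linear's weight transposed.
W = linear.weight.detach().T   # shape (8, 1)
b = linear.bias.detach()       # shape (1,)
manual = x @ W + b

print(torch.allclose(manual, linear(x).detach(), atol=1e-6))  # True
```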
