The idea behind this notebook series is to go slowly and build a solid understanding of the code and maths involved in deep learning. A few points to note:
1. For simplicity of calculation, I have used small integer values in the examples.
2. This code is only for understanding the concepts, so it is missing a couple of things, such as type checking and error handling.

This is the 1st notebook in the series. Here we start with a single neuron (scalar input and output) and build up to the idea of a network layer.

[Colab link](https://colab.research.google.com/github/shwetaAgrawal/deeplearning_tutorials/blob/main/notebooks/1_Intro_to_perceptron.ipynb)


In [1]:
# Prerequisite for running this notebook: numpy. If you come across the error "No module named 'numpy'", uncomment the line below and run this cell
#!pip install numpy

# Introduction to Neural Networks

Neural networks are composed of multiple nodes known as neurons. Like any network node, a neuron has incoming and outgoing connections. All the connections in a neural network are weighted. During training, we try to find the optimal weights for the given data (input features and target).

![Neural Network](assets/images/NeuralNetworkIllustration.png)

## Neuron 
Let's start by understanding how a neuron works.

How does an artificial neuron work?
It takes an **input**, applies a **weight** and an **intercept** (bias), and then applies an **activation** function:
$$
\hat y = f(W^T \cdot x + b)
$$
- where ${f}$ is the activation function
- ${W}$ is the weight matrix, made up of the individual neurons' weight vectors. A single neuron's weight vector is represented as a column vector with number of rows = number of input dimensions
- ${x}$ is the input vector, usually depicted as a column vector
- ${b}$ is the bias
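
As a quick worked example (the numbers are made up for illustration and are not taken from the linked worksheet), take a neuron with two inputs, weight vector ${W = [1, 2]^T}$, input ${x = [3, 4]^T}$, bias ${b = 1}$ and a linear activation ${f(z) = z}$:
$$
\hat y = f\left(\begin{bmatrix}1 & 2\end{bmatrix}\begin{bmatrix}3 \\ 4\end{bmatrix} + 1\right) = f(1 \cdot 3 + 2 \cdot 4 + 1) = f(12) = 12
$$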

![Single Neuron](assets/images/SingleNeuron.png)

In this notebook, we will compute ${\hat y}$ for a given input and weight. We will increase the complexity slowly until we can build a full network that consists of multiple layers and supports multi-dimensional inputs and outputs.

This notebook follows the concepts covered in [Linear Layer Worksheet by Tom Yeh](https://aibyhand.substack.com/p/w3-linear-layer). If you are like me, you might want to grab a pen and notebook and do the exercises in the linked workbook before going through the code here.

In [2]:
# numpy is used for efficient numerical operations - it supports vectorized operations on n-dimensional arrays
# random is used to generate random numbers for generating input data
import numpy as np
import random

### Neuron with 1-D input and 1-D output 

1-D input => input vector of dimension 1 x 1

Since the input is 1-D, our weight matrix is also 1 x 1, and the transpose of a 1 x 1 matrix is the matrix itself, so ${W^T = W}$

To keep things simple, we will represent these 1 x 1 matrices as scalars.
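
To see why the scalar shortcut is safe, here is a minimal sketch comparing the 1 x 1 matrix form with the scalar form. The values of `W`, `x` and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values chosen only for illustration
W = np.array([[3]])   # 1 x 1 weight matrix
x = np.array([[2]])   # 1 x 1 input
b = 1

# The transpose of a 1 x 1 matrix is the matrix itself
transpose_is_same = np.array_equal(W.T, W)

# Matrix form and scalar form give the same number
y_matrix = np.matmul(W.T, x) + b   # shape (1, 1)
y_scalar = 3 * 2 + b               # plain Python scalar

print(transpose_is_same, y_matrix, y_scalar)
```

The only difference is the container: the matrix form returns a 1 x 1 array, while the scalar form returns a plain number.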

In [3]:
# Single neuron with 1-D input and 1-D output. As mentioned above, to keep it simple, we are representing the 1x1 matrix as a scalar value
x = random.randint(1, 10)
print("Input, x =", x)
weight = 3
bias = 1

# let's say the activation function is linear, f(x) = x for all values of x; the transpose of a scalar is the scalar itself
y_pred = (weight * x + bias)
print("Prediction, y_pred =", y_pred)

Input, x = 1
Prediction, y_pred = 4


In [4]:
#Lets define the neuron class 
class ArtificalNeuron:
    """
    A simple artificial neuron implementation.
    """

    def __init__(self, weight: float, bias: float) -> None:
        """
        Initialize the neuron with the given weight and bias.
        
        Args:
            weight (float/int) : The weight of the neuron.
            bias (float/int) : The bias of the neuron.
        """
        self.weight = weight
        self.bias = bias

    def forward(self, x: np.ndarray) -> np.ndarray:
        """
        Calculate the output of the neuron for the given input.
        
        Args:
            x (float/int/np.ndarray): The input to the neuron.
    
        Returns:
            float/int/np.ndarray: The output of the neuron.
        """
        return self.weight * x + self.bias
    
# Lets create an object of the neuron class and test it
neuron = ArtificalNeuron(weight, bias)

# Lets test using the same input x that we used in earlier cell
print("Input, x =", x)
y_pred = neuron.forward(x)
print("Prediction, y_pred =", y_pred)

Input, x = 1
Prediction, y_pred = 4


### Activation Functions

Let's define some common activation functions. Activation functions are used for **introducing non-linearity** into neural networks. Looking carefully at the equation for the artificial neuron, we can see that ${W^T \cdot x + b}$ is a linear (affine) function of ${x}$, and stacking such functions still gives a linear function. We therefore need a non-linear activation to model deep learning tasks whose decision boundaries can be non-linear as well as linear.
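
The claim that stacked linear functions stay linear can be checked numerically. The sketch below is only an illustration: the layer sizes, the seed and the names `W1`, `b1`, `W2`, `b2` are assumptions, not part of this notebook's classes. It shows that two layers without a non-linear activation collapse into a single layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical layers with no activation (i.e. linear activation)
W1, b1 = rng.integers(-2, 3, size=(3, 4)), rng.integers(-2, 3, size=(4, 1))
W2, b2 = rng.integers(-2, 3, size=(4, 2)), rng.integers(-2, 3, size=(2, 1))
x = rng.integers(0, 5, size=(3, 5))   # batch of 5 inputs, one per column

# Passing x through both layers ...
two_layers = W2.T @ (W1.T @ x + b1) + b2

# ... equals a single layer with a combined weight matrix and bias
W_combined = W1 @ W2                  # so that W_combined.T == W2.T @ W1.T
b_combined = W2.T @ b1 + b2
one_layer = W_combined.T @ x + b_combined

print(np.array_equal(two_layers, one_layer))
```

Because the two results match, extra layers add no modelling power unless a non-linear activation sits between them.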

In [5]:
"""Implementing common activation functions. Expected input to all the functions are numpy arrays"""
class ActivationFunctions:
  @staticmethod
  def linear(x):
    """Linear activation function: f(x) = x
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    return x

  @staticmethod
  def sigmoid(x):
    """Sigmoid activation function: f(x) = 1 / (1 + exp(-x))
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    return 1 / (1 + np.exp(-x))

  @staticmethod
  def relu(x):
    """ReLU activation function: f(x) = max(0, x)
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    # return max(0, x) this won't work for an array
    return np.maximum(0, x)

  @staticmethod
  def tanh(x):
    """Tanh activation function: f(x) = (exp(2x) - 1) / (exp(2x) + 1)
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    # np.tanh computes the same function but avoids the overflow (inf / inf = nan)
    # that np.exp(2*x) causes for large positive x
    return np.tanh(x)
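
As a quick sanity check of the formulas above, the sketch below evaluates each activation on a small hand-picked array. It re-implements the formulas inline so that it runs on its own; the input values are hypothetical.

```python
import numpy as np

# Hypothetical inputs: one negative, zero, one positive
z = np.array([-2.0, 0.0, 3.0])

sigmoid_out = 1 / (1 + np.exp(-z))   # squashes values into (0, 1); sigmoid(0) = 0.5
relu_out = np.maximum(0, z)          # zeroes out negatives, keeps positives unchanged
tanh_out = np.tanh(z)                # squashes values into (-1, 1); tanh(0) = 0

print("sigmoid:", sigmoid_out)
print("relu   :", relu_out)
print("tanh   :", tanh_out)
```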

### Batch input processing with a neuron

Let's now run our existing code on a batch (a set of inputs). We will create a batch of 5 scalar inputs to run through our neuron.

Since each input is represented as a column vector, each input becomes a new column in the input matrix, so the input matrix has dimensions 1 x 5. (The code below uses a flat array of 5 values, which behaves the same way for scalar inputs.)
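
The sketch below illustrates the two ways of holding such a batch in numpy: a flat array of shape `(5,)` and an explicit 1 x 5 matrix. The input values are hypothetical; the weight and bias are the ones used for the scalar neuron earlier.

```python
import numpy as np

weight, bias = 3, 1                      # same values as the scalar neuron above
x_flat = np.array([1, 2, 3, 4, 5])       # shape (5,): flat array of 5 scalar inputs
x_matrix = x_flat.reshape(1, -1)         # shape (1, 5): one column per input

# Element-wise multiplication applies the neuron to every input at once
y_flat = weight * x_flat + bias
y_matrix = weight * x_matrix + bias

print(x_flat.shape, y_flat)
print(x_matrix.shape, y_matrix)
```

Both forms give the same numbers; only the shape of the result differs.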

In [6]:
# running the above calculations for a batch of scalar inputs. We will draw a sample of 5 inputs with values between 1 and 9 (np.random.randint excludes the upper bound)
batch_size = 5

x = np.random.randint(1, 10, batch_size)
print("x =", x)
print("Weight =", neuron.weight)
print("Bias =", neuron.bias)

# we apply the activation function here for now; later we will move it into the neuron class definition
y_pred = ActivationFunctions.linear(neuron.forward(x))
print("y_pred =", y_pred)

x = [2 2 2 6 2]
Weight = 3
Bias = 1
y_pred = [ 7  7  7 19  7]


**Our existing class can handle both a batch of inputs and a single input**

### Neuron with Multi-dimensional inputs

Let's increase the complexity by adding more dimensions to the input and see how our code changes.

We will start by understanding what representation changes, if any, are required for multi-dimensional input. We know that ${x}$ is represented as a column vector, so an n-dimensional input is represented as an n x 1 matrix.
```
Ex 1 - one input sample of 2 dimensions
2
1
Ex 2 - three input samples of 2 dimensions are represented as
2 3 1
1 0 3
```
The weight matrix is also represented as a column vector, and the number of weights for a neuron == the number of incoming input connections, so the weight matrix also has dimensions n x 1. Because of the increase in dimensions, ${W^T \neq W}$, so we need to add the transpose to our original neuron class.

As there is still only 1 neuron, we can continue to represent the bias as a 1 x 1 matrix or a scalar. In general, the bias is a vector with dimensions **num_neurons x 1**
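
The shapes described above can be traced with a small sketch. It reuses the three 2-dimensional samples from the example; the weight vector `W` and bias `b` are hypothetical values chosen only for illustration.

```python
import numpy as np

# The 3 two-dimensional samples from the example above, one sample per column
x = np.array([[2, 3, 1],
              [1, 0, 3]])          # shape (2, 3)

# Hypothetical weights and bias, chosen only for illustration
W = np.array([[1],
              [2]])                # shape (2, 1): one weight per input dimension
b = 1

y = np.matmul(W.T, x) + b          # (1, 2) @ (2, 3) -> (1, 3)
print(y.shape)
print(y)
```

Each column of `y` is the neuron's output for the matching column of `x`, for example 1 * 2 + 2 * 1 + 1 = 5 for the first sample.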

In [7]:
# Now let's move to multi-dimensional input
# let's say there are 3 input features, so the input has 3 rows and 1 column, and the weight also has 3 rows and 1 column
# the bias continues to be a scalar
input_dimensions = 3 # 3-dimensional input, as described in the comment above
output_dimensions = 1 # since we have only 1 neuron so only 1 output

# we are specifying 2 dimensions for input because we want a column vector, other way to achieve same is by creating row vectors and transposing them
x = np.random.randint(0, 5, size=(input_dimensions, 1))
print("x =\n", x)

# create weight vectors with dimensions as shared above
weight = np.random.randint(0, 2, size=(input_dimensions, output_dimensions))
print("\nweights =\n", weight)
# we can have bias as a 1 x 1 matrix or scalar both works
bias = 1

# Note that we switched to np.matmul (matrix multiplication) instead of * (element-wise multiplication)
# We also transposed the weight matrix to match our original definition
y_pred = ActivationFunctions.linear(np.matmul(weight.T, x) + bias)
print("\ny_pred =\n", y_pred)

x =
 [[2]
 [2]
 [1]]

weights =
 [[0]
 [1]
 [0]]

y_pred =
 [[3]]


In [8]:
# lets vectorize the ArtificialNeuron class and add activation function to it
class Neuron:
    """
    A simple neuron implementation.
    """

    def __init__(self, weight, bias, activation_function):
        """
        Initialize the neuron with the given weight and bias.
        
        Args:
            weight (np.ndarray) : The weight of the neuron.
            bias (float/int/np.ndarray) : The bias of the neuron.
            activation_function (function) : The activation function to use.
        """
        self.weight = weight
        self.bias = bias
        self.activation_function = activation_function

    def forward(self, x):
        """
        Calculate the output of the neuron for the given input.
        
        Args:
            x (np.ndarray): The input to the neuron.
    
        Returns:
            np.ndarray: The output of the neuron.
        """
        # Note that we switched to np.matmul (matrix multiplication) instead of * (element-wise multiplication)
        # Note that we take the transpose of the weight matrix to support multi-dimensional input
        return self.activation_function(np.matmul(self.weight.T, x) + self.bias)
    
# lets create an object of the neuron class and test it using the same set of inputs, weight and bias as above
multi_input_neuron = Neuron(weight, bias, ActivationFunctions.linear)
print("x =\n", x)
print("weights =\n", weight)
print("bias =", bias)
y_pred = multi_input_neuron.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[2]
 [2]
 [1]]
weights =
 [[0]
 [1]
 [0]]
bias = 1

y_pred =
 [[3]]


In [9]:
# lets try the above Neuron class for a batch of inputs
batch_size = 5

x = np.random.randint(0, 5, size=(input_dimensions, batch_size))
print("x =\n", x)
print("weights =\n", weight)
print("bias =", bias)

# we are reusing the same object of the Neuron class that we created above => no change in weight and bias
y_pred = multi_input_neuron.forward(x)
print("\ny_pred =\n", y_pred)


x =
 [[1 0 0 3 0]
 [2 1 2 3 0]
 [3 0 2 2 1]]
weights =
 [[0]
 [1]
 [0]]
bias = 1

y_pred =
 [[3 2 3 4 1]]


## Neural Network Layer - Multiple Neurons with same input

A neural network layer consists of neurons that share the same set of inputs. So if we add one more neuron with the same inputs to our earlier setup, we get a neural network layer. Since we now have multiple neurons, we get multiple outputs for each input record. Our output is also represented as a column vector with dimensions = **num_neurons x 1**

Let's first understand what having multiple output neurons implies. So far we have seen that a single neuron has:
1. a set of weights
2. a bias term
3. an activation function
4. an output

So adding a new neuron means adding weights, a bias and an activation function for it, which leads to
* one more column in the weight matrix, representing the new neuron's weights
* one more row in the bias matrix, representing the new neuron's bias
* the output matrix dimensions changing to **num_neurons x 1**
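
The bullet points above can be sketched as follows. The two neurons, their weights and their biases are hypothetical; the point is only that placing the weight vectors side by side gives the same outputs as evaluating each neuron separately.

```python
import numpy as np

x = np.array([[1], [2], [3]])            # one 3-dimensional input, shape (3, 1)

# Two hypothetical neurons, each with its own weight column vector and bias
w1, b1 = np.array([[1], [0], [1]]), 1
w2, b2 = np.array([[0], [2], [1]]), -1

# Outputs computed one neuron at a time
y1 = np.matmul(w1.T, x) + b1             # shape (1, 1)
y2 = np.matmul(w2.T, x) + b2             # shape (1, 1)

# Layer form: weights side by side as columns, biases stacked as rows
W = np.hstack([w1, w2])                  # shape (3, 2)
b = np.array([[b1], [b2]])               # shape (2, 1)
y_layer = np.matmul(W.T, x) + b          # shape (2, 1)

print(y_layer)
```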

In [10]:
# Now let's move to multi-dimensional output, i.e. add 1 more neuron to our setup => 2 outputs
# this new neuron needs to be initialized with its own set of weights, bias and activation function
# continuing our earlier example, we now have 3 input dimensions and 2 output neurons

input_dimensions = 3 # same 3-dimensional input as before
output_dimensions = 2 # we now have 2 neurons, so a 2-dimensional output is expected

# we are specifying 2 dimensions for input because we want a column vector, other way to achieve same is by creating row vectors and transposing them
x = np.random.randint(0, 5, size=(input_dimensions, 1))
print("x =\n", x)

# create weight vectors with dimensions as shared above
weight = np.random.randint(0, 2, size=(input_dimensions, output_dimensions))
print("\nweights =\n", weight)

# here we can't continue with scalar bias because now we have 2 neurons with their independent bias terms
# so we need a column vector with dimensions as output_dimensions X 1
bias = np.random.randint(-1, 1, size=(output_dimensions, 1))
print("\nbias =\n", bias)

# Note that we don't need to change the Neuron class for this change in output dimensions
# it is already designed to handle any number of output dimensions
network_layer = Neuron(weight, bias, ActivationFunctions.linear)
y_pred = network_layer.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[0]
 [2]
 [4]]

weights =
 [[0 1]
 [1 1]
 [0 0]]

bias =
 [[-1]
 [-1]]

y_pred =
 [[1]
 [1]]


### Neural Network Layer with batch of inputs

Now we are going to write code that runs multiple input samples through our neural network layer in one go. This time the network is slightly more complex, with multiple neurons and multi-dimensional inputs.

In the cell above, we laid the foundations for a neural network layer, so let's start from there and identify whether any changes are required for batch inputs.
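
One detail worth checking before moving to batches is how a **num_neurons x 1** bias combines with a **num_neurons x batch_size** matrix. The sketch below, with hypothetical values, shows numpy broadcasting adding the same bias column to every sample.

```python
import numpy as np

# Hypothetical pre-activation values W^T x for 2 neurons and 5 samples
z = np.arange(10).reshape(2, 5)          # shape (2, 5)
bias = np.array([[0], [-1]])             # shape (2, 1): one bias per neuron

# Broadcasting stretches the single bias column across all 5 sample columns
y = z + bias

# The same result with the bias copied explicitly for every sample
y_explicit = z + np.tile(bias, (1, 5))

print(y)
```

This is why the forward computation needs no change for batch inputs: the bias never has to be resized by hand.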

In [11]:
# continuing our earlier example where we had 3 input neurons, 2 output neurons
# and now adding 5 input samples instead of 1

batch_size = 5 # number of input samples we want to process
input_dimensions = 3 # since we decided for a 3 dimensional input above
output_dimensions = 2 # since we now have 2 neuron so 2-dimensional output is expected now

# we are specifying 2 dimensions for input because we want a column vector, other way to achieve same is by creating row vectors and transposing them
# Note the change in dimension of input
x = np.random.randint(0, 5, size=(input_dimensions, batch_size))
print("x =\n", x)

# create weight vectors with dimensions as shared above
# Note: no change in the shapes of weights and bias, as the underlying neural network structure is unchanged
weight = np.random.randint(0, 2, size=(input_dimensions, output_dimensions))
print("\nweights =\n", weight)

# here we can't continue with scalar bias because now we have 2 neurons with their independent bias terms
# so we need a column vector with dimensions as output_dimensions X 1
bias = np.random.randint(-1, 1, size=(output_dimensions, 1))
print("\nbias =\n", bias)

# Note that we don't need to change the Neuron class for this either
network_layer = Neuron(weight, bias, ActivationFunctions.linear)
y_pred = network_layer.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[2 3 0 1 2]
 [4 4 3 4 2]
 [0 0 1 2 3]]

weights =
 [[0 1]
 [0 1]
 [1 1]]

bias =
 [[ 0]
 [-1]]

y_pred =
 [[0 0 1 2 3]
 [5 6 3 6 6]]


### Neural Network Layer

Now that we have understood the computations, let's tweak the input parameters to match the real-world representation of artificial neurons and neural network layers. In practice, weights and biases are (mostly) initialized randomly, and during training they are updated to minimize the difference between ${y}$ and ${\hat y}$.

![Neural Network Layer](assets/images/NeuralNetworkLayer.png)

In [12]:
class NeuralNetworkLayer:
  """
  A simple neural network layer implementation.
  """

  def __init__(self, input_dim: int, output_dim: int, act_function: callable, is_bias: bool = True) -> None:
    """
    Initialize the layer with input and output dimensions, activation function
    Weights will be initialized randomly for now (binary values)
    Bias will be initialized randomly for now (-1, 0, 1)

    Args:
      input_dim (int) : Number of input dimensions
      output_dim (int) : Number of output dimensions
      act_function (function) : Activation function to use
      is_bias (bool) : Whether to use bias or not
    """
    self.input_dimensions = input_dim
    self.output_dimensions = output_dim
    self.activation_function = act_function

    # For now we can initialize weights randomly to start with
    # we will deep dive later on how to set initial weights while covering the model training
    self.weights = np.random.randint(0, 2, size=(self.input_dimensions, self.output_dimensions))
    if is_bias:
      self.bias = np.random.randint(-1, 2, size=(self.output_dimensions, 1))
    else:
      self.bias = np.zeros((self.output_dimensions, 1))


  def forward(self, input:np.ndarray) -> np.ndarray:
    """
    Calculate the output of the layer for the given input.

    Args:
      input (np.ndarray): The input to the layer.

    Returns:
      np.ndarray: The output of the layer.
    """
    return self.activation_function(np.matmul(self.weights.T, input) + self.bias)

The PyTorch implementation of the linear layer is available [here](https://github.com/pytorch/pytorch/blob/3a185778edb18abfbad155a87ff3b2d716e4c220/torch/nn/modules/linear.py#L93)
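
When comparing with PyTorch, note that `torch.nn.Linear` is documented as computing `y = x A^T + b`, with one sample per row and a weight of shape `(out_features, in_features)`, whereas this notebook stores one sample per column. The sketch below uses numpy only (no PyTorch call is made) to show that the two conventions are transposes of each other; all values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, batch = 3, 2, 5

W = rng.integers(0, 2, size=(in_dim, out_dim))     # this notebook: (inputs, neurons)
b = rng.integers(-1, 2, size=(out_dim, 1))         # this notebook: column vector
x_cols = rng.integers(0, 5, size=(in_dim, batch))  # one sample per column

# Convention used in this notebook
y_cols = W.T @ x_cols + b                          # shape (out_dim, batch)

# Row-per-sample convention, y = x A^T + b, with A of shape (out_dim, in_dim)
A = W.T
x_rows = x_cols.T                                  # shape (batch, in_dim)
y_rows = x_rows @ A.T + b.ravel()                  # shape (batch, out_dim)

print(np.array_equal(y_rows, y_cols.T))
```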

In [13]:
class InputUtils:
  def __init__(self, input_dimensions):
    self.input_dimensions = input_dimensions

  def getInput(self):
    return self.getInputBatch(1)

  def getInputBatch(self, batch_size):
    x = np.random.randint(0, 5, size=(self.input_dimensions, batch_size))
    return x

Let's use the code that we created above to build a linear neural network layer with 3 inputs and 2 outputs.

A linear layer is simply a layer in which every neuron uses the linear activation function.


In [14]:
input_dimensions = 3
output_dimensions = 2

input_sampler = InputUtils(input_dimensions)
layer = NeuralNetworkLayer(input_dimensions, output_dimensions, ActivationFunctions.linear, is_bias=True)

x_input = input_sampler.getInputBatch(5)
print("x_input = \n", x_input)

print("\nweights = \n", layer.weights)

print("\nbias = \n", layer.bias)

print("\ny_pred = \n", layer.forward(x_input))

x_input = 
 [[2 4 1 4 0]
 [3 1 0 0 0]
 [1 1 0 1 2]]

weights = 
 [[1 0]
 [0 0]
 [0 1]]

bias = 
 [[-1]
 [ 1]]

y_pred = 
 [[ 1  3  0  3 -1]
 [ 2  2  1  2  3]]


While different neurons in the same layer can have different activation functions, the usual practice is to use the same activation function for all the neurons in a layer. See [this discussion](https://datascience.stackexchange.com/questions/72559/different-activation-function-in-same-layer-of-a-neural-network) for more.

## Neural Network

A neural network consists of one or more neural network layers stacked to predict the final output.

To define a neural network, we need to know the following:
1.  number of input neurons
2.  number of output neurons
3.  number of neurons and activation function for each hidden layer

The input and output neuron counts of each layer determine the dimensions of its weight and bias matrices. We'll first build a network out of network layers and then write the end-to-end code for the neural network calculation.
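
To make the link between neuron counts and matrix dimensions concrete, here is a minimal sketch. `layer_shapes` is a hypothetical helper written for this illustration and is not part of the notebook's classes; it assumes the same conventions as above (weights are inputs x neurons, bias is neurons x 1).

```python
# Hypothetical helper, not part of the notebook's classes
def layer_shapes(input_dim, hidden_counts, output_dim):
    """Return (weight_shape, bias_shape) for each layer, in order."""
    sizes = [input_dim, *hidden_counts, output_dim]
    # Each layer maps n_in inputs to n_out neurons
    return [((n_in, n_out), (n_out, 1)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Example: 3 inputs, hidden layers of 4 and 2 neurons, 2 outputs
shapes = layer_shapes(3, [4, 2], 2)
for i, (w_shape, b_shape) in enumerate(shapes, start=1):
    print(f"layer {i}: weights {w_shape}, bias {b_shape}")
```

The output dimension of one layer is always the input dimension of the next, which is why a single list of sizes is enough to describe the whole network.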

In [15]:
# Now lets move to multi-layer neural network
# we will start with a simple 2 layer neural network
# we will have 3 input neurons, 2 hidden neurons and 2 output neurons
# we will use ReLU activation function for hidden layer and linear activation function for output layer

input_dimensions = 3
output_dimensions = 2
num_hidden_neurons = 2

input_sampler = InputUtils(input_dimensions)

#create layer1, note output dimensions of layer1 will be input dimensions of layer2 and is equal to number of nodes in hidden layer
hidden_layer = NeuralNetworkLayer(input_dimensions, num_hidden_neurons, ActivationFunctions.relu, is_bias=True)
final_layer = NeuralNetworkLayer(num_hidden_neurons, output_dimensions, ActivationFunctions.linear, is_bias=True)

x_input = input_sampler.getInputBatch(5)
print("x_input = \n", x_input)

print("\nweights = \n", hidden_layer.weights)

print("\nbias = \n", hidden_layer.bias)

print("\nhidden_layer_output = \n", hidden_layer.forward(x_input))

print("\nweights = \n", final_layer.weights)
print("\nbias = \n", final_layer.bias)
print("\ny_pred = \n", final_layer.forward(hidden_layer.forward(x_input)))

x_input = 
 [[4 2 2 0 1]
 [1 4 4 1 4]
 [1 4 4 2 4]]

weights = 
 [[0 0]
 [0 0]
 [1 1]]

bias = 
 [[1]
 [0]]

hidden_layer_output = 
 [[2 5 5 3 5]
 [1 4 4 2 4]]

weights = 
 [[1 0]
 [1 1]]

bias = 
 [[-1]
 [ 0]]

y_pred = 
 [[2 8 8 4 8]
 [1 4 4 2 4]]


In [16]:
# Now we can define a simple neural network class which will have multiple layers

class NeuralNetwork:
    def __init__(self, input_dimensions: int, output_dimensions: int, hidden_layer_neuron_count: list[int]) -> None:
        """
        Initialize the neural network with input and output dimensions and number of neurons in each hidden layer
        Weights will be initialized randomly for now (binary values)
        Bias will be initialized randomly for now (-1, 0, 1)
        Lets assume that we are using only ReLU activation function for now for all layers

        Args:
            input_dimensions (int) : Number of input dimensions
            output_dimensions (int) : Number of output dimensions
            hidden_layer_neuron_count (list[int]) : Number of neurons in each hidden layer
        """
        self.input_dimensions = input_dimensions
        self.output_dimensions = output_dimensions
        self.hidden_layer_neuron_count = hidden_layer_neuron_count
        self.layers = []
        
        tmp_input_dimensions = input_dimensions
        for num_neuron in hidden_layer_neuron_count:
            self.add_layer(tmp_input_dimensions, num_neuron)
            tmp_input_dimensions = num_neuron
        self.add_layer(tmp_input_dimensions, output_dimensions)

    def add_layer(self, input_dimensions: int, output_dimensions: int) -> None:
        """
        Add a layer to the neural network.

        Args:
            input_dimensions (int) : Number of input dimensions
            output_dimensions (int) : Number of output dimensions
        """
        if input_dimensions <= 0 or output_dimensions <= 0:
            raise ValueError("Input and output dimensions of a layer should be greater than 0")
        self.layers.append(NeuralNetworkLayer(input_dimensions, output_dimensions, ActivationFunctions.relu, is_bias=True))

    def forward(self, input: np.ndarray) -> np.ndarray:
        """
        Calculate the output of the neural network for the given input.

        Args:
            input (np.ndarray): The input to the neural network.

        Returns:
            np.ndarray: The output of the neural network.
        """
        for layer in self.layers:
            input = layer.forward(input)
        return input

In [17]:
input_dimensions = 3
output_dimensions = 2
num_hidden_neurons = 2

input_sampler = InputUtils(input_dimensions)
x_input = input_sampler.getInputBatch(5)
print("x_input = \n", x_input)

nn = NeuralNetwork(input_dimensions, output_dimensions, [num_hidden_neurons])

print("\nweights = \n", nn.layers[0].weights)
print("\nbias = \n", nn.layers[0].bias)

print("\nhidden_layer_output = \n", nn.layers[0].forward(x_input))

print("\nweights = \n", nn.layers[1].weights)
print("\nbias = \n", nn.layers[1].bias)

print("\ny_pred = \n", nn.forward(x_input))


x_input = 
 [[1 4 0 2 0]
 [0 4 0 3 0]
 [1 4 4 4 3]]

weights = 
 [[0 1]
 [0 1]
 [0 0]]

bias = 
 [[0]
 [0]]

hidden_layer_output = 
 [[0 0 0 0 0]
 [1 8 0 5 0]]

weights = 
 [[0 0]
 [1 1]]

bias = 
 [[ 0]
 [-1]]

y_pred = 
 [[1 8 0 5 0]
 [0 7 0 4 0]]
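
The printed result above can be reproduced by hand. The sketch below copies the input, weights and biases from that output and applies the two ReLU layers directly with numpy, without using the `NeuralNetwork` class.

```python
import numpy as np

# Values copied from the printed output above
x_input = np.array([[1, 4, 0, 2, 0],
                    [0, 4, 0, 3, 0],
                    [1, 4, 4, 4, 3]])
W1, b1 = np.array([[0, 1], [0, 1], [0, 0]]), np.array([[0], [0]])
W2, b2 = np.array([[0, 0], [1, 1]]), np.array([[0], [-1]])

hidden = np.maximum(0, W1.T @ x_input + b1)   # ReLU hidden layer
y_pred = np.maximum(0, W2.T @ hidden + b2)    # ReLU output layer

print(hidden)
print(y_pred)
```

In the second output row, the samples whose pre-activation value is -1 are clipped to 0 by the ReLU, which is why those entries print as 0.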


# Recap

This wraps up this notebook. To recap, we learned:

1.   Artificial Neuron and its mathematical representation
2.   How to code a
     *   Neuron processing a single 1-D input
     *   Neuron processing multiple 1-D inputs (batch inputs)
     *   Neuron processing multiple n-D inputs
     *   NeuralNetworkLayer consisting of multiple neurons
     *   NeuralNetwork consisting of multiple hidden layers

# References

1.   https://aibyhand.substack.com/