The idea behind this notebook series is to go slowly and build a solid understanding of the code and maths involved in deep learning. A few points to note:
1. For simplicity of calculation, I have used small integer values in the examples.
2. This code is only meant for understanding the concepts, so it is missing a couple of things like type checking and error handling.

This is the 1st notebook in the series. Here we start with a single neuron (scalar input and output) and build up to the idea of a network layer.

[Colab link](https://colab.research.google.com/github/shwetaAgrawal/deeplearning_tutorials/blob/main/notebooks/1_Intro_to_perceptron.ipynb)


In [1]:
# Prerequisite for running this notebook: numpy. If you see the error "No module named 'numpy'", uncomment the line below and run this cell
#!pip install numpy

# Introduction to perceptron

**Let's start with a simple perceptron**

How does a perceptron work?
It takes an **input**, applies a **weight** and an **intercept** (bias), and then passes the result through an **activation** function.

> *y_pred = activation_function(w * x_input + b)*

This notebook follows the concepts covered in the [Linear Layer Worksheet by Tom Yeh](https://aibyhand.substack.com/p/w3-linear-layer). If you are like me, you might want to grab a pen and notebook and do the exercises in the linked workbook before going through the code here.

In [2]:
# numpy is used for efficient numerical operations - it supports vectorized operations on n-dimensional arrays
# random is used to generate random numbers for generating input data
import numpy as np
import random

## Scalar input
A perceptron with a simple scalar input, weight and bias.

For a perceptron, a scalar input is nothing but a 1x1 matrix. We will cover this later in the notebook.
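
As a small preview of that claim, the sketch below computes the same prediction twice: once with plain scalars and once with the numbers wrapped in 1x1 matrices. The variable names and the values (4, 3, 1) are assumptions picked for this illustration only.

```python
import numpy as np

# Hypothetical values, small integers for easy checking
x_scalar, w_scalar, b_scalar = 4, 3, 1
y_scalar = w_scalar * x_scalar + b_scalar

# The same numbers wrapped as 1x1 matrices
x_mat = np.array([[x_scalar]])
w_mat = np.array([[w_scalar]])
b_mat = np.array([[b_scalar]])
y_mat = np.matmul(w_mat, x_mat) + b_mat

print("scalar result:", y_scalar)
print("matrix result:", y_mat, "shape:", y_mat.shape)
```

Both routes give 13; the matrix version simply carries the value inside a (1, 1) array.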

In [3]:
# Perceptron example with scalar input. We are starting with barebones code to understand the concept
x = random.randint(1, 10)
print("Input, x =", x)
weight = 3
bias = 1

# let's say the activation function is linear: f(x) = x for all values of x
y_pred = (weight * x + bias)
print("Prediction, y_pred =", y_pred)

Input, x = 4
Prediction, y_pred = 13


In [4]:
# Let's define the Perceptron class
class Perceptron:
    """
    A simple perceptron implementation.
    """

    def __init__(self, weight, bias):
        """
        Initialize the perceptron with the given weight and bias.
        
        Args:
            weight (float/int) : The weight of the perceptron.
            bias (float/int) : The bias of the perceptron.
        """
        self.weight = weight
        self.bias = bias

    def forward(self, x):
        """
        Calculate the output of the perceptron for the given input.
        
        Args:
            x (float/int/np.ndarray): The input to the perceptron.
    
        Returns:
            float/int/np.ndarray: The output of the perceptron.
        """
        return self.weight * x + self.bias
    
# Let's create an object of the Perceptron class and test it
perceptron1 = Perceptron(weight, bias)

# Let's test using the same input x that we used in the earlier cell
print("Input, x =", x)
y_pred = perceptron1.forward(x)
print("Prediction, y_pred =", y_pred)

Input, x = 4
Prediction, y_pred = 13


## Activation Functions

Let's define some common activation functions used in deep learning. Activation functions are required for **introducing non-linearity** into neural networks.

Looking carefully at the perceptron equation, we can see that **w*x + b** is a linear equation.

Why do we need non-linearity? To model non-linear decision boundaries. We will cover this in detail later.
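
One quick way to see the limitation of staying linear: chaining two linear perceptrons collapses into a single linear perceptron, so stacking them adds no modelling power. The sketch below checks this numerically and then puts a ReLU, f(x) = max(0, x), between the two steps for contrast. All weights, biases and inputs here are hypothetical values chosen for the illustration.

```python
import numpy as np

# Hypothetical weights and biases for two perceptrons chained one after the other
w1, b1 = 3, 1
w2, b2 = 2, -4

x = np.array([-2, -1, 0, 1, 2])

# Two linear steps in sequence
stacked = w2 * (w1 * x + b1) + b2

# A single linear step with the combined weight and bias
w_combined = w2 * w1
b_combined = w2 * b1 + b2
single = w_combined * x + b_combined

print("stacked:", stacked)
print("single: ", single)
print("identical:", np.array_equal(stacked, single))

# With a ReLU in between, the output no longer lies on one straight line
stacked_relu = w2 * np.maximum(0, w1 * x + b1) + b2
print("with ReLU:", stacked_relu)
```

With the ReLU in place the first two outputs are flattened to the same value, which a single `w * x + b` cannot reproduce.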

In [5]:
"""Implementing common activation functions. The expected input to all the functions is a numpy array"""
class ActivationFunctions:
  @staticmethod
  def linear(x):
    """Linear activation function: f(x) = x
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    return x

  @staticmethod
  def sigmoid(x):
    """Sigmoid activation function: f(x) = 1 / (1 + exp(-x))
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    return 1 / (1 + np.exp(-x))

  @staticmethod
  def relu(x):
    """ReLU activation function: f(x) = max(0, x)
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    # return max(0, x) this won't work for an array
    return np.maximum(0, x)

  @staticmethod
  def tanh(x):
    """Tanh activation function: f(x) = (exp(2x) - 1) / (exp(2x) + 1)
    
    Args:
      x: np.ndarray : input to the activation function
    
    Returns:
      np.ndarray
    """
    # np.tanh avoids the overflow (inf / inf = nan) that the exponential form hits for large x
    return np.tanh(x)
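
A note on numerical behaviour: the exponential form of tanh given in the docstring, (exp(2x) - 1) / (exp(2x) + 1), is mathematically correct, but evaluating it directly can overflow for large inputs. The sketch below compares it with NumPy's built-in `np.tanh`. The helper name `tanh_exp_form` and the sample inputs are assumptions made for this illustration.

```python
import numpy as np

def tanh_exp_form(x):
    """tanh written with exponentials: (exp(2x) - 1) / (exp(2x) + 1)."""
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])

# Silence the overflow/invalid warnings so the comparison prints cleanly
with np.errstate(over="ignore", invalid="ignore"):
    from_formula = tanh_exp_form(x)

from_numpy = np.tanh(x)

print("exponential form:", from_formula)
print("np.tanh:         ", from_numpy)
```

For moderate inputs the two agree; at x = 1000 the exponential form yields `nan` (inf divided by inf) while `np.tanh` returns 1.0.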

## Batch inputs

Let's now run this for a batch (a set of inputs).
We will create a batch of 5 scalar inputs to run through our neuron.

Since each input is represented as a column vector, each input becomes a new column in the input matrix.

In [6]:
# Run the above calculation for a batch of scalar inputs. We draw a sample of 5 inputs with values from 1 to 9 (np.random.randint excludes the upper bound)
batch_size = 5

x = np.random.randint(1, 10, batch_size)
print("x =", x)

# we added activation function here for now. We will move it to the Perceptron class definition later
y_pred = ActivationFunctions.linear(perceptron1.forward(x))
print("y_pred =", y_pred)

x = [7 8 1 1 4]
y_pred = [22 25  4  4 13]
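
To double-check the batch result by hand, the sketch below reuses the sample values printed above (with weight 3 and bias 1 from the earlier cell) and compares the single vectorized call with an explicit per-input loop. The names `y_batch` and `y_loop` are introduced here for illustration.

```python
import numpy as np

# Values copied from the sample output above
x = np.array([7, 8, 1, 1, 4])
weight, bias = 3, 1

# One call on the whole batch: NumPy broadcasts the scalars over every element
y_batch = weight * x + bias

# The same computation, one input at a time
y_loop = np.array([weight * xi + bias for xi in x])

print("y_batch =", y_batch)
print("y_loop  =", y_loop)
```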


## Multi-dimensional inputs

Let's add more dimensions to the input. As we increase the dimensions of either the input or the output, we switch to the vector form of the perceptron equation.

* x_input is represented as a column vector
```
Ex 1 - a single two-dimensional input sample
2
1
Ex 2 - 3 input samples of 2 dimensions each are represented as
2 3 1
1 0 3
```

* weight is represented as an **output_neurons x input_neurons** matrix
* bias is a column vector of shape output_neurons x 1
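
The shape rules in the list above can be sanity-checked with a few lines of NumPy. The input matrix below is the 3-sample example from the list; the weight and bias values are hypothetical, chosen only to keep the arithmetic easy.

```python
import numpy as np

# The 3 two-dimensional samples from the example above, one sample per column
x = np.array([[2, 3, 1],
              [1, 0, 3]])          # shape: (input_neurons=2, batch=3)

# Hypothetical weights and bias for 2 output neurons, for illustration only
weight = np.array([[1, 0],
                   [1, 1]])        # shape: (output_neurons=2, input_neurons=2)
bias = np.array([[1],
                 [0]])             # shape: (output_neurons=2, 1)

# The bias column is broadcast across the batch columns
y_pred = np.matmul(weight, x) + bias

print("x shape:     ", x.shape)
print("weight shape:", weight.shape)
print("bias shape:  ", bias.shape)
print("y_pred shape:", y_pred.shape)
print("y_pred =\n", y_pred)
```

The output has one row per output neuron and one column per input sample.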


In [7]:
# Now let's move to multi-dimensional input
# let's say there are 3 input features, so the input has 3 rows and 1 column, and the weight has 1 row and 3 columns
# we start with a single perceptron calculation

input_dimensions = 3 # since we decided for a 3 dimensional input above
output_dimensions = 1 # since we have only 1 neuron so only 1 output

# we specify a 2-D size because we want a column vector; another way is to create a row vector of shape (1, n) and transpose it
x = np.random.randint(0, 5, size=(input_dimensions, 1))
print("x =\n", x)

# create weight vectors with dimensions as shared above
weight = np.random.randint(0, 2, size=(output_dimensions, input_dimensions))
print("\nweights =\n", weight)
# the bias can be a 1x1 matrix or a scalar; both work
bias = 1

# Note that we switched to np.matmul (matrix product) instead of * (element-wise multiplication)
y_pred = ActivationFunctions.linear(np.matmul(weight, x) + bias)
print("\ny_pred =\n", y_pred)

x =
 [[2]
 [2]
 [0]]

weights =
 [[0 0 0]]

y_pred =
 [[1]]
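
Why `np.matmul` rather than `*` here? With array operands, `*` multiplies element by element (with broadcasting), which is not the weighted sum a neuron needs. The sketch below shows the difference on arrays with the same shapes as in the cell above; the weight values are hypothetical.

```python
import numpy as np

# Hypothetical values with the same shapes as in the cell above
x = np.array([[2], [2], [0]])      # shape (3, 1)
weight = np.array([[1, 0, 1]])     # shape (1, 3)

# Element-wise product: broadcasts to a (3, 3) table of pairwise products
elementwise = weight * x

# Matrix product: the weighted sum of the inputs, shape (1, 1)
matrix_product = np.matmul(weight, x)

print("weight * x =\n", elementwise)
print("\nnp.matmul(weight, x) =\n", matrix_product)
```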


In [8]:
# let's vectorize the Perceptron class and add an activation function to it
class Perceptron:
    """
    A simple perceptron implementation.
    """

    def __init__(self, weight, bias, activation_function):
        """
        Initialize the perceptron with the given weight and bias.
        
        Args:
            weight (np.ndarray) : The weight of the perceptron.
            bias (float/int/np.ndarray) : The bias of the perceptron.
            activation_function (function) : The activation function to use.
        """
        self.weight = weight
        self.bias = bias
        self.activation_function = activation_function

    def forward(self, x):
        """
        Calculate the output of the perceptron for the given input.
        
        Args:
            x (np.ndarray): The input to the perceptron.
    
        Returns:
            np.ndarray: The output of the perceptron.
        """
        return self.activation_function(np.matmul(self.weight, x) + self.bias)
    
# let's create an object of the Perceptron class and test it using the same input, weight and bias as above
perceptron1 = Perceptron(weight, bias, ActivationFunctions.linear)
print("x =\n", x)
y_pred = perceptron1.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[2]
 [2]
 [0]]

y_pred =
 [[1]]


In [9]:
# let's test the above Perceptron class on a batch of inputs
batch_size = 5

x = np.random.randint(0, 5, size=(input_dimensions, batch_size))
print("x =\n", x)

# we are reusing the same object of perceptron class that we created above => no change in weight and bias
y_pred = perceptron1.forward(x)
print("\ny_pred =\n", y_pred)


x =
 [[0 2 0 2 1]
 [3 0 4 4 2]
 [3 4 1 4 4]]

y_pred =
 [[1 1 1 1 1]]


## Multi-dimensional output

So far we have looked at the single-neuron case. What if we add more neurons to this setup? With multiple output neurons, we get multiple outputs for each input sample.

Let's first understand what having multiple output neurons implies. So far we have seen that a single neuron has:
1. a set of weights
2. a bias term
3. an activation function
4. an output

So adding a new neuron means adding weights, a bias and an activation function for that neuron, and this adds one more dimension to the output.
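
In matrix terms, adding a neuron amounts to adding one row to the weight matrix and one entry to the bias column. The sketch below evaluates two neurons separately and then stacked together, to show the results match. The two neurons and their values are hypothetical, and the linear activation is assumed for both.

```python
import numpy as np

x = np.array([[2], [1], [0]])          # one 3-dimensional input, as a column

# Two hypothetical neurons, each with its own weights and bias
w_neuron1, b_neuron1 = np.array([[1, 1, 1]]), 0
w_neuron2, b_neuron2 = np.array([[0, 1, 1]]), 0

# Evaluate each neuron separately (linear activation)
y1 = np.matmul(w_neuron1, x) + b_neuron1
y2 = np.matmul(w_neuron2, x) + b_neuron2

# Stack the weights row-wise and the biases into a column, then evaluate once
weight = np.vstack([w_neuron1, w_neuron2])      # shape (2, 3)
bias = np.array([[b_neuron1], [b_neuron2]])     # shape (2, 1)
y_both = np.matmul(weight, x) + bias

print("separate:", y1.ravel(), y2.ravel())
print("stacked:\n", y_both)
```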

In [10]:
# Now let's move to multi-dimensional output, i.e. add 1 more neuron to our setup => 2 outputs
# this new neuron needs to be initialized with its own set of weights, bias and activation function
# continuing our earlier example where we had 3 input neurons, and now 2 output neurons

input_dimensions = 3 # since we decided on a 3-dimensional input above
output_dimensions = 2 # since we now have 2 neurons, a 2-dimensional output is expected

# we specify a 2-D size because we want a column vector; another way is to create a row vector of shape (1, n) and transpose it
x = np.random.randint(0, 5, size=(input_dimensions, 1))
print("x =\n", x)

# create weight vectors with dimensions as shared above
weight = np.random.randint(0, 2, size=(output_dimensions, input_dimensions))
print("\nweights =\n", weight)

# here we can't continue with scalar bias because now we have 2 neurons with their independent bias terms
# so we need a column vector with dimensions as output_dimensions X 1
bias = np.random.randint(-1, 1, size=(output_dimensions, 1))
print("\nbias =\n", bias)

# Note that we don't need to change the Perceptron class for this change in output dimensions
# it's already designed to handle any number of output dimensions
perceptron2 = Perceptron(weight, bias, ActivationFunctions.linear)
y_pred = perceptron2.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[2]
 [1]
 [0]]

weights =
 [[1 1 1]
 [0 1 1]]

bias =
 [[0]
 [0]]

y_pred =
 [[3]
 [1]]


## Batch Input + Multi-dimensional input and output

Now we want to write code that can run multiple input samples through our neural network in one go. This time we also have a slightly more complex network consisting of multiple neurons and multi-dimensional inputs.

In the cell above, we laid the foundation for a multi-neuron network, so let's start with that and check whether any changes are required for batch inputs.

In [11]:
# continuing our earlier example where we had 3 input neurons, 2 output neurons
# and now adding 5 input samples instead of 1

batch_size = 5 # number of input samples we want to process
input_dimensions = 3 # since we decided for a 3 dimensional input above
output_dimensions = 2 # since we now have 2 neurons, a 2-dimensional output is expected

# each input sample is a column, so a batch is a matrix with one column per sample
# Note the change in the dimensions of the input: (input_dimensions, batch_size) instead of (input_dimensions, 1)
x = np.random.randint(0, 5, size=(input_dimensions, batch_size))
print("x =\n", x)

# create weight vectors with dimensions as shared above
# Note that the shapes of weight and bias don't change, because the network structure is unchanged (only the values are re-sampled here)
weight = np.random.randint(0, 2, size=(output_dimensions, input_dimensions))
print("\nweights =\n", weight)

# here we can't continue with scalar bias because now we have 2 neurons with their independent bias terms
# so we need a column vector with dimensions as output_dimensions X 1
bias = np.random.randint(-1, 1, size=(output_dimensions, 1))
print("\nbias =\n", bias)

# Note that we don't need to change the perceptron class for this as well
perceptron2 = Perceptron(weight, bias, ActivationFunctions.linear)
y_pred = perceptron2.forward(x)
print("\ny_pred =\n", y_pred)

x =
 [[3 0 0 0 2]
 [0 2 1 0 4]
 [0 2 3 3 3]]

weights =
 [[0 1 1]
 [0 0 1]]

bias =
 [[ 0]
 [-1]]

y_pred =
 [[ 0  4  4  3  7]
 [-1  1  2  2  2]]
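
Each column of the batch is processed independently of the others, which is why no change to the class was needed. The sketch below reuses the sample values printed above and checks that the single batched product equals running the samples one column at a time. The names `y_batch` and `y_columns` are introduced here for illustration.

```python
import numpy as np

# Values copied from the sample output above
x = np.array([[3, 0, 0, 0, 2],
              [0, 2, 1, 0, 4],
              [0, 2, 3, 3, 3]])
weight = np.array([[0, 1, 1],
                   [0, 0, 1]])
bias = np.array([[0],
                 [-1]])

# Whole batch in one matrix product
y_batch = np.matmul(weight, x) + bias

# One sample (column) at a time, then glued back together side by side
columns = [np.matmul(weight, x[:, [i]]) + bias for i in range(x.shape[1])]
y_columns = np.hstack(columns)

print("y_batch =\n", y_batch)
print("same as column-by-column:", np.array_equal(y_batch, y_columns))
```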


## Neural Network Layer

A neural network layer consists of one or more neurons that share the same set of inputs, with each neuron contributing one output.

If you look back, whenever we coded a single neuron or multiple neurons above, we were effectively building a layer in the network.

To define a neural network layer, we need to know the following:
1.  number of input neurons
2.  number of output neurons
3.  activation function for the neurons in this layer

The input and output neuron counts determine the dimensions of the weight matrix and the bias vector. The code below just tweaks the Perceptron class to work with a different set of input parameters.

In [12]:
from typing import Callable

class NeuralNetworkLayer:
  """
  A simple neural network layer implementation.
  """

  def __init__(self, input_dim: int, output_dim: int, act_function: Callable[[np.ndarray], np.ndarray], is_bias: bool = True) -> None:
    """
    Initialize the layer with input and output dimensions, activation function
    Weights will be initialized randomly for now (binary values)
    Bias will be initialized randomly for now (-1, 0, 1)

    Args:
      input_dim (int) : Number of input dimensions
      output_dim (int) : Number of output dimensions
      act_function (function) : Activation function to use
      is_bias (bool) : Whether to use bias or not
    """
    self.input_dimensions = input_dim
    self.output_dimensions = output_dim
    self.activation_function = act_function

    # For now we can initialize weights randomly to start with
    # we will deep dive later on how to set initial weights while covering the model training
    self.weights = np.random.randint(0, 2, size=(self.output_dimensions, self.input_dimensions))
    if is_bias:
      self.bias = np.random.randint(-1, 2, size=(self.output_dimensions, 1))
    else:
      self.bias = np.zeros((self.output_dimensions, 1))


  def predict(self, x: np.ndarray) -> np.ndarray:
    """
    Calculate the output of the layer for the given input.

    Args:
      x (np.ndarray): The input to the layer, shape (input_dim, batch_size).

    Returns:
      np.ndarray: The output of the layer, shape (output_dim, batch_size).
    """
    return self.activation_function(np.matmul(self.weights, x) + self.bias)

PyTorch's implementation of the linear layer is available [here](https://github.com/pytorch/pytorch/blob/3a185778edb18abfbad155a87ff3b2d716e4c220/torch/nn/modules/linear.py#L93)
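
One difference worth knowing when reading that code: PyTorch's linear layer treats each input sample as a row and computes y = x Wᵀ + b, whereas this notebook treats each sample as a column and computes y = W x + b. The two conventions are transposes of each other. The NumPy-only sketch below illustrates this; it does not call PyTorch, and all values are hypothetical.

```python
import numpy as np

# Hypothetical small layer: 3 inputs, 2 outputs, batch of 4 samples
weight = np.array([[0, 1, 1],
                   [1, 1, 1]])               # shape (output_dim, input_dim)
bias_column = np.array([[0], [-1]])          # shape (output_dim, 1)
x_columns = np.array([[2, 1, 0, 1],
                      [3, 1, 1, 3],
                      [2, 0, 3, 2]])         # shape (input_dim, batch)

# Convention used in this notebook: samples are columns
y_columns = np.matmul(weight, x_columns) + bias_column      # (output_dim, batch)

# Row convention: samples are rows, so y = x W^T + b
x_rows = x_columns.T                                         # (batch, input_dim)
y_rows = np.matmul(x_rows, weight.T) + bias_column.ravel()   # (batch, output_dim)

print("y_columns =\n", y_columns)
print("y_rows =\n", y_rows)
```

Note that the weight matrix has the same (output, input) shape in both conventions; only the orientation of the data changes.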

In [13]:
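class InputUtils:
  """Helper for sampling random integer inputs (values 0 to 4) for a layer."""

  def __init__(self, input_dimensions):
    self.input_dimensions = input_dimensions

  def getInput(self):
    """Return a single input sample as a column vector of shape (input_dimensions, 1)."""
    return self.getInputBatch(1)

  def getInputBatch(self, batch_size):
    """Return batch_size samples as a matrix of shape (input_dimensions, batch_size)."""
    x = np.random.randint(0, 5, size=(self.input_dimensions, batch_size))
    return x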

Let's use the code we created above to build a linear neural network layer with 3 inputs and 2 outputs.

A linear layer is simply one that maps inputs (x) to outputs (y) using the following relationship:

y = w.x + b

A linear layer implicitly uses the linear (identity) activation function.


In [14]:
input_dimensions = 3
output_dimensions = 2

input_sampler = InputUtils(input_dimensions)
layer = NeuralNetworkLayer(input_dimensions, output_dimensions, ActivationFunctions.linear, is_bias=True)

x_input = input_sampler.getInputBatch(5)
print("x_input = \n", x_input)

print("\nweights = \n", layer.weights)

print("\nbias = \n", layer.bias)

print("\ny_pred = \n", layer.predict(x_input))

x_input = 
 [[2 1 0 1 0]
 [3 1 1 3 0]
 [2 0 3 2 2]]

weights = 
 [[0 1 1]
 [1 1 1]]

bias = 
 [[ 0]
 [-1]]

y_pred = 
 [[5 1 4 5 2]
 [6 1 3 5 1]]


While different neurons in the same layer can have different activation functions, the usual practice is to use the same activation function for all the neurons in a layer. See [this discussion](https://datascience.stackexchange.com/questions/72559/different-activation-function-in-same-layer-of-a-neural-network) to learn more.
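
For the curious, here is a rough sketch of what per-neuron activation functions could look like: each row of the pre-activation matrix (one row per neuron) is passed through its own function. This is an illustration under assumed names and values, not how `NeuralNetworkLayer` above is implemented.

```python
import numpy as np

def relu(v):
    """ReLU activation: max(0, v), applied element-wise."""
    return np.maximum(0, v)

# Hypothetical pre-activations (w.x + b): 2 neurons x 3 samples
z = np.array([[-2.0, 0.0, 3.0],
              [-2.0, 0.0, 3.0]])

# Hypothetical choice: ReLU for the first neuron, tanh for the second
activations = [relu, np.tanh]

# Apply each neuron's own activation to its row, then stack the rows back
y = np.vstack([f(row) for f, row in zip(activations, z)])

print("y =\n", y)
```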

# Recap

This wraps up this notebook. To recap, we learned:


1.   The artificial neuron and its mathematical representation
2.   How to code a
     *   Neuron processing a single 1-D input
     *   Neuron processing multiple 1-D inputs (batch inputs)
     *   Neuron processing multiple n-D inputs
     *   NeuralNetworkLayer consisting of multiple neurons

# References

1.   https://aibyhand.substack.com/