## Introduction

Historically, the main shortcoming of the perceptron was that it cannot learn even modestly nontrivial patterns in data. The classic example is XOR: the two classes cannot be separated by a single straight line, so the perceptron fails to learn a correct decision boundary.

![Figure 4.1](../images/figure_4_1.png)
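The claim that no single straight line separates XOR can be checked by brute force. The sketch below is only an illustration: the search grid (weights and bias from -2.0 to 2.0 in steps of 0.5) and the helper name `accuracy` are assumptions made for this example, not part of the original notes.

```python
from itertools import product

# The four XOR inputs and their labels.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]


def accuracy(w1, w2, b):
    """Fraction of XOR points that one linear threshold unit classifies correctly."""
    correct = 0
    for (x1, x2), label in zip(xor_points, xor_labels):
        prediction = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        correct += int(prediction == label)
    return correct / len(xor_points)


# Hypothetical search grid: every weight and bias in {-2.0, -1.5, ..., 2.0}.
grid = [step * 0.5 for step in range(-4, 5)]
best_accuracy = max(accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best_accuracy)  # 0.75
```

The best any line on this grid can do is three of the four points, which is what "not linearly separable" means in practice.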

A **feed-forward** network is any neural network in which data flows in one direction (i.e., from input to output). By this definition, the perceptron is also a _feed-forward_ model, but the term is usually reserved for more complicated models with multiple units.

Two types of _feed-forward neural networks_:

- **Multilayer Perceptron (MLP)**
    - The MLP structurally extends the simpler perceptron by grouping many perceptrons in a single layer and stacking multiple layers together.
- **Convolutional Neural Network (CNN)**
    - CNNs can learn localized patterns in the inputs using a windowing property, which is inspired by the windowed filters used in digital signal processing.
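To make the windowing idea concrete, here is a minimal sketch of a windowed filter sliding over a one-dimensional signal. The function name `slide_window`, the toy signal, and the hand-picked kernel are all assumptions for illustration; in a CNN the kernel values would be learned rather than fixed.

```python
def slide_window(signal, kernel):
    """Apply `kernel` to every contiguous window of `signal` (no padding)."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[start:start + width]))
        for start in range(len(signal) - width + 1)
    ]


signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # hand-picked: responds where the signal steps up or down
response = slide_window(signal, edge_kernel)
print(response)  # [0, 1, 0, 0, -1, 0]
```

The same two weights are reused at every position, so the filter reacts to a local pattern wherever it occurs in the input.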

## The Multilayer Perceptron

The perceptron takes a data vector as input and computes a single output value. In an MLP, many perceptrons are grouped so that the output of a single layer is a new vector instead of a single value. Additionally, an MLP combines multiple layers with a nonlinearity between each pair of layers.
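The nonlinearity between layers matters because two linear layers stacked directly on top of each other are equivalent to a single linear layer. The sketch below checks this with small, hypothetical integer weights (`W1`, `b1`, `W2`, `b2` are made up for the example) and plain Python lists, so the arithmetic is exact.

```python
def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix-matrix product on nested lists."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def add(u, v):
    return [x + y for x, y in zip(u, v)]


def relu(vector):
    return [max(0, value) for value in vector]


# Hypothetical weights: layer 1 maps 2 -> 3 values, layer 2 maps 3 -> 2 values.
W1, b1 = [[1, -2], [0, 3], [2, 1]], [1, 0, -1]
W2, b2 = [[1, 1, 0], [-1, 2, 1]], [0, 2]
x = [3, -1]

# Two linear layers applied one after the other, with no nonlinearity.
stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The single linear layer they collapse into: W = W2 W1, b = W2 b1 + b2.
collapsed = add(matvec(matmul(W2, W1), x), add(matvec(W2, b1), b2))

# With a ReLU in between, the result no longer matches the collapsed layer.
with_relu = add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

print(stacked, collapsed, with_relu)  # [3, -6] [3, -6] [6, 0]
```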

The simplest MLP is composed of three stages of representation and two linear layers. The first stage is the _input vector_. Given this input vector, the _first linear layer_ computes a _hidden vector_, which is the _second stage of representation_. Using the hidden vector, the _second linear layer_ computes an _output vector_, the third stage.

![Figure 4.2](../images/figure_4_2.png)

The power of MLPs comes from adding the second linear layer and allowing the model to learn an intermediate representation that is _linearly separable_: a property of representations in which a single straight line (or, more generally, a hyperplane) can be used to distinguish the data points by which side of the line they fall on.

### A Simple Example: XOR

In Figure 4.3, we can see that the perceptron has difficulty learning a decision boundary that separates the stars from the circles, whereas the MLP learns a decision boundary that classifies the stars and the circles far more accurately.

![Figure 4.3](../images/figure_4_3.png)

It may appear that the MLP has two decision boundaries, but it is really just one. The boundary is constructed in the intermediate representation, which has morphed the space so that a single hyperplane appears in both of these positions. This can be visualised in Figures 4.4 and 4.5.

![Figure 4.4](../images/figure_4_4.png)

![Figure 4.5](../images/figure_4_5.png)
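The idea of one hyperplane in a morphed space can be reproduced by hand on the four XOR corner points. The weights below are hand-picked for illustration, not learned, and the names `hidden` and `output` are assumptions for this sketch. The two ReLU units map both positive examples onto the same hidden point, after which a single line in hidden space separates the classes.

```python
def relu(value):
    return max(0.0, value)


def hidden(x1, x2):
    """Hand-picked first layer: two ReLU units with different biases."""
    return relu(x1 + x2), relu(x1 + x2 - 1)


def output(h1, h2):
    """Second layer: one straight line, h1 - 2*h2 = 0.5, in hidden space."""
    return 1 if h1 - 2 * h2 > 0.5 else 0


xor_data = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

hidden_points = {point: hidden(*point) for point in xor_data}
predictions = {point: output(*hidden(*point)) for point in xor_data}

for point, label in xor_data.items():
    print(point, "->", hidden_points[point], "->", predictions[point], "(label", label, ")")
```

In the input space the positive points (0, 1) and (1, 0) sit on opposite corners. In the hidden space they coincide at (1, 0), while (0, 0) and (1, 1) land at (0, 0) and (2, 1), on the other side of the line.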

### Implementing MLPs in PyTorch

```python
# Multilayer perceptron using PyTorch

import torch.nn as nn
import torch.nn.functional as F


class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(MultilayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """
        The forward pass of the MLP

        Args:
            x_in (torch.Tensor): an input data tensor; x_in.shape
                should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation;
                should be False if used with the cross-entropy losses
        Returns:
            the resulting tensor; tensor.shape should be (batch, output_dim)
        """
        # First linear layer followed by the ReLU nonlinearity
        intermediate = F.relu(self.fc1(x_in))
        # Second linear layer produces the raw output scores
        output = self.fc2(intermediate)

        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output
```
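The docstring says `apply_softmax` should be `False` when the output feeds a cross-entropy loss. The reason is that PyTorch's `F.cross_entropy` applies log-softmax internally, so it expects raw scores. The sketch below illustrates this with hypothetical values: `logits` stands in for the MLP output, and `targets` are made-up class indices.

```python
import torch
import torch.nn.functional as F

# Stand-in for the raw MLP output (batch of 2, 4 classes); values are made up.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0],
                       [0.1, 0.2, 0.3, 3.0]])
targets = torch.tensor([0, 3])  # hypothetical class indices

# cross_entropy on raw scores equals log_softmax followed by nll_loss.
loss_from_logits = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing probabilities instead applies softmax twice and distorts the loss.
loss_double_softmax = F.cross_entropy(F.softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_manual.item(), loss_double_softmax.item())
```

The first two values agree, while the third is noticeably larger even though both predictions are correct, because the second softmax flattens the distribution.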

```python
batch_size, input_dim, hidden_dim, output_dim = 2, 3, 100, 4

mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)
```

```
MultilayerPerceptron(
  (fc1): Linear(in_features=3, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=4, bias=True)
)
```
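The printed layer sizes also determine how many learnable parameters the model has. As a rough check, the sketch below builds a stand-in with the same sizes (3 to 100 to 4) using `nn.Sequential`, which is an assumption made to keep the example self-contained, and counts the weights and biases.

```python
import torch.nn as nn

# Stand-in with the same layer sizes as the printed model (3 -> 100 -> 4).
model = nn.Sequential(
    nn.Linear(3, 100),
    nn.ReLU(),
    nn.Linear(100, 4),
)

# Each Linear layer stores a weight of shape (out_features, in_features) and a bias.
for name, parameter in model.named_parameters():
    print(name, tuple(parameter.shape))

total = sum(parameter.numel() for parameter in model.parameters())
print(total)  # 3*100 + 100 + 100*4 + 4 = 804
```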


```python
import torch


def describe(x):
    print(f"Type: {x.type()}")
    print(f"Shape Size: {x.shape}")
    print(f"Values: {x}")


x_input = torch.rand(batch_size, input_dim)
describe(x_input)

# Raw output scores
y_output = mlp(x_input, apply_softmax=False)
describe(y_output)

# Output scores converted to probabilities
y_output = mlp(x_input, apply_softmax=True)
describe(y_output)
```

```
Type: torch.FloatTensor
Shape Size: torch.Size([2, 3])
Values: tensor([[0.0073, 0.3744, 0.2184],
        [0.1998, 0.2203, 0.0871]])
Type: torch.FloatTensor
Shape Size: torch.Size([2, 4])
Values: tensor([[-0.0095,  0.0164, -0.0362,  0.0886],
        [ 0.0059,  0.0262, -0.0443,  0.1194]], grad_fn=<AddmmBackward>)
Type: torch.FloatTensor
Shape Size: torch.Size([2, 4])
Values: tensor([[0.2437, 0.2501, 0.2373, 0.2689],
        [0.2444, 0.2494, 0.2324, 0.2738]], grad_fn=<SoftmaxBackward>)
```
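The two outputs differ only by the softmax step. As a small check, the sketch below copies the rounded raw scores from the printed output into a tensor and confirms two properties of softmax: each row sums to 1, and the highest-scoring class stays the same. The variable names are assumptions for this example.

```python
import torch
import torch.nn.functional as F

# Raw scores copied (rounded) from the printed output above.
logits = torch.tensor([[-0.0095, 0.0164, -0.0362, 0.0886],
                       [0.0059, 0.0262, -0.0443, 0.1194]])

probabilities = F.softmax(logits, dim=1)

# Each row of probabilities sums to 1.
row_sums = probabilities.sum(dim=1)

# Softmax preserves the ordering of the scores, so the argmax is unchanged.
same_argmax = torch.equal(logits.argmax(dim=1), probabilities.argmax(dim=1))

print(probabilities)
print(row_sums, same_argmax)
```

Because the argmax is unchanged, the softmax is only needed when the values must be read as probabilities; the predicted class can be taken directly from the raw scores.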
