
## Multilayer perceptron using PyTorch

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
#https://pytorch.org/docs/stable/nn.functional.html

In [0]:
class MultilayerPerceptron(nn.Module):
  def __init__(self, input_dim, hidden_dim, output_dim):
    """Initialize the model.

    Args:
        input_dim (int): the size of the input vectors
        hidden_dim (int): the output size of the first Linear layer
        output_dim (int): the output size of the second Linear layer
    """

    super(MultilayerPerceptron, self).__init__()  # initialize the parent nn.Module class
    # define the two Linear layers: input -> hidden and hidden -> output
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  # define the forward pass
  def forward(self, x_in, apply_softmax=False):
    """The forward pass of the MLP.

    Args:
        x_in (torch.Tensor): an input data tensor;
            x_in.shape should be (batch, input_dim)
        apply_softmax (bool): a flag for the softmax activation;
            should be False if used with the cross-entropy losses
    Returns:
        the resulting tensor; tensor.shape should be (batch, output_dim)
    """
    # apply a nonlinearity (ReLU) between the two Linear layers
    intermediate = F.relu(self.fc1(x_in))
    output = self.fc2(intermediate)

    if apply_softmax:
      output = F.softmax(output, dim=1)
    return output
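
The ReLU between `fc1` and `fc2` matters because two Linear layers applied back to back are equivalent to a single Linear layer. The following is a minimal sketch of that point, not part of the original notebook; the layer sizes and variable names are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes chosen only for this illustration
fc1 = nn.Linear(3, 5)
fc2 = nn.Linear(5, 4)
x = torch.rand(2, 3)

# Two Linear layers applied back to back, with no activation in between
stacked = fc2(fc1(x))

# The same mapping written as one affine transform:
# W = W2 @ W1 and b = W2 @ b1 + b2
W = fc2.weight @ fc1.weight
b = fc2.weight @ fc1.bias + fc2.bias
single = x @ W.T + b

# True when the stacked layers and the single affine map agree
collapses = torch.allclose(stacked, single, atol=1e-5)
print(collapses)
```

Without a nonlinearity, adding more Linear layers would not add any representational power, which is why `forward` applies `F.relu` to the output of `fc1`.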

In [11]:
#instantiation of an MLP
batch_size = 2  # number of samples passed in at once
input_dim = 3  # number of input features
hidden_dim = 100  # number of units in the hidden layer
output_dim = 4  # number of output units

#Initialize model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)

MultilayerPerceptron(
  (fc1): Linear(in_features=3, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=4, bias=True)
)


In [14]:
#Testing the MLP with random inputs

def describe(x):
  print("type : {}".format(x.type()))
  print('shape/size: {}'.format(x.shape))
  print('values: \n{}'.format(x))

x_input = torch.rand(batch_size,input_dim)
describe(x_input)

type : torch.FloatTensor
shape/size: torch.Size([2, 3])
values: 
tensor([[0.8113, 0.3775, 0.1514],
        [0.5811, 0.9488, 0.4711]])


In [16]:
# calling mlp(...) runs the forward pass (no training happens here), so the arguments must match those of forward()
# the input shape should be (batch, input_dim)
y_output = mlp(x_input, apply_softmax = False)

# the output shape should be (batch, output_dim)
# 2 is the number of data points in the minibatch
# y_output holds the final feature vector (in some settings, the prediction vector) for each data point
describe(y_output)

type : torch.FloatTensor
shape/size: torch.Size([2, 4])
values: 
tensor([[-0.1795,  0.1133,  0.4556, -0.0532],
        [-0.1123,  0.1245,  0.4992, -0.0041]], grad_fn=<AddmmBackward>)


However, if you want to turn the prediction vector into probabilities, an extra step is required. Specifically, you require the softmax activation function, which is used to transform a vector of values into probabilities. The softmax function has many roots. In physics, it is known as the Boltzmann or Gibbs distribution; in statistics, it’s multinomial logistic regression; and in the natural language processing (NLP) community it’s known as the maximum entropy (MaxEnt) classifier.
<br>
Whatever the name, the intuition underlying the function is that large positive values will result in higher probabilities, and more negative values will result in smaller probabilities.
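
As a rough illustration of that intuition, softmax can be written by hand as exponentiation followed by normalization. The sketch below is not from the original notebook: it reuses the first row of the logits printed earlier, the variable names are hypothetical, and the max subtraction is a common numerical-stability trick rather than anything the notebook itself does.

```python
import torch
import torch.nn.functional as F

# First row of the MLP output printed earlier, used here as example logits
logits = torch.tensor([[-0.1795, 0.1133, 0.4556, -0.0532]])

# Subtracting the row maximum leaves the result unchanged but avoids overflow in exp
shifted = logits - logits.max(dim=1, keepdim=True).values
exp = torch.exp(shifted)

# Normalize so that each row sums to 1
manual = exp / exp.sum(dim=1, keepdim=True)

# The built-in version used inside forward()
builtin = F.softmax(logits, dim=1)

print(manual)
print(builtin)
```

The largest logit (index 2) receives the largest probability, and each row of the result sums to 1.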


In [17]:
#Producing probabilistic outputs with a MLP classifier
y_output = mlp(x_input, apply_softmax = True)
describe(y_output)

type : torch.FloatTensor
shape/size: torch.Size([2, 4])
values: 
tensor([[0.1865, 0.2499, 0.3520, 0.2116],
        [0.1914, 0.2425, 0.3528, 0.2133]], grad_fn=<SoftmaxBackward>)


To conclude, MLPs are stacked Linear layers that map tensors to other tensors. 
<br>
* Nonlinearities are used between each pair of Linear layers to break the linear
relationship and allow for the model to twist the vector space around. 
* In a classification setting, this twisting should result in linear separability between classes. 
* Additionally, you can use the softmax function to interpret MLP outputs as probabilities, but you should not apply softmax before cross-entropy losses such as `nn.CrossEntropyLoss`: these expect raw outputs and apply log-softmax internally, which lets the underlying implementation use more stable and efficient mathematical/computational shortcuts.
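
The last point can be sketched as follows. This is an illustrative example under stated assumptions, not part of the original notebook: the logits and target class indices are hypothetical values, and `nn.CrossEntropyLoss` is used as one example of a cross-entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw outputs with shape (batch, output_dim) and target class indices
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0],
                       [0.1, 1.5, 0.3, -0.7]])
targets = torch.tensor([0, 1])

loss_fn = nn.CrossEntropyLoss()

# Intended usage: pass raw outputs (apply_softmax=False in the MLP above)
loss_from_logits = loss_fn(logits, targets)

# Equivalent two-step form: log-softmax followed by negative log-likelihood
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing softmax outputs applies softmax twice and yields a different loss
loss_double = loss_fn(F.softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_two_step.item(), loss_double.item())
```

The first two values agree, while the third does not, which is why `apply_softmax` should stay `False` when the model output is fed to a cross-entropy loss.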