# Activation Function

In [1]:
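# an activation function applies a non-linear transformation to a layer's output and decides whether a neuron should be activated or not
# without it, stacked linear layers collapse into a single linear map; the non-linearity lets the network learn more complex functions
# applied after each layer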

In [3]:
'''
Most popular activation functions

1. Step function
    - outputs 1 if x > 0
    - outputs 0 otherwise
    - not used in practice (its gradient is 0 wherever it is defined, so gradient descent cannot learn through it)
2. Sigmoid
    - f(x) = 1/(1+e^(-x))
    - outputs between 0 and 1
    - typically used in the last layer of a binary classification problem
3. TanH
    - basically a scaled and shifted sigmoid function
    - f(x) = 2/(1+e^(-2x)) - 1
    - outputs between -1 and 1
    - used in hidden layers
4. ReLU
    - the most popular choice in most networks
    - f(x) = max(0,x)
    - outputs 0 for values < 0
    - outputs x for all other x
    - nonlinear, even though it is linear on each side of 0
    - a very good default: if unsure which activation function to use, just use ReLU for hidden layers
5. Leaky ReLU
    - slightly modified, slightly improved version of ReLU
    - still outputs x for all values > 0
    - outputs a*x for values < 0
    - a is a very small positive slope (e.g. 0.01), applied only to negative inputs
    - tries to solve the dying ReLU problem: plain ReLU has zero gradient for negative inputs, so a neuron stuck there stops updating (dead neuron)
    - if weights don't update during training, try using Leaky ReLU
6. Softmax
    - squashes a vector of scores into values between 0 and 1 that sum to 1
    - the outputs can be read as class probabilities
    - good in the last layer of multi-class classification problems
'''

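The notes above can be checked numerically. The following is a minimal sketch, assuming PyTorch is installed; the sample inputs and the slope `0.01` for Leaky ReLU are illustrative choices, not values taken from the notes.

```python
import torch
import torch.nn.functional as F

# Illustrative inputs (an assumption, chosen to cover negative, zero, and positive values).
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

step = (x > 0).float()                        # 1 if x > 0, else 0
sigmoid = torch.sigmoid(x)                    # 1 / (1 + e^(-x))
tanh = torch.tanh(x)                          # 2 / (1 + e^(-2x)) - 1
relu = torch.relu(x)                          # max(0, x)
leaky = F.leaky_relu(x, negative_slope=0.01)  # x if x > 0, else 0.01 * x
softmax = torch.softmax(x, dim=0)             # values in (0, 1) that sum to 1

for name, y in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("leaky_relu", leaky), ("softmax", softmax)]:
    print(f"{name:>10}: {y}")

# TanH really is a scaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(torch.allclose(tanh, 2 * torch.sigmoid(2 * x) - 1, atol=1e-6))
```

Note that softmax, unlike the others, acts on the whole vector at once, which is why it needs a `dim` argument.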
In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [5]:
# option 1 (create nn modules)
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()        # hidden-layer activation
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()  # squashes the single output into (0, 1) for binary classification

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        out = self.sigmoid(out)
        return out
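For a quick look at what this layer stack produces, the same sequence of modules can be written with `nn.Sequential` and run on a random batch. This is only a sketch: `nn.Sequential` is a compact stand-in for the class above, and the sizes (`input_size=4`, `hidden_size=8`, `batch_size=3`) are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, batch_size = 4, 8, 3

# Same order as option 1: linear -> ReLU -> linear -> sigmoid.
model = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, 1),
    nn.Sigmoid(),
)

x = torch.randn(batch_size, input_size)  # random batch of 3 samples
with torch.no_grad():                    # inference only, no gradients needed
    y = model(x)

print(y.shape)  # torch.Size([3, 1]): one value per sample
print(y)        # each value lies between 0 and 1 because of the final sigmoid
```

Because the last layer is a sigmoid, each output can be read as the predicted probability of the positive class.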

In [6]:
# option 2 (use activation functions directly in forward pass)
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # only the layers with learnable parameters are stored as attributes
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out = torch.relu(self.linear1(x))       # hidden-layer activation
        out = torch.sigmoid(self.linear2(out))  # output in (0, 1)
        return out
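The two options should compute the same function, since ReLU and sigmoid have no learnable parameters. The sketch below checks this by copying the weights from one style into the other. The class names `ModuleStyleNet` and `FunctionalStyleNet` and the sizes are hypothetical, introduced here only so both versions can coexist in one cell.

```python
import torch
import torch.nn as nn

class ModuleStyleNet(nn.Module):
    """Option 1: activations stored as nn modules."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear2(self.relu(self.linear1(x))))

class FunctionalStyleNet(nn.Module):
    """Option 2: activations called directly in the forward pass."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear2(torch.relu(self.linear1(x))))

torch.manual_seed(0)
net_a = ModuleStyleNet(input_size=4, hidden_size=8)
net_b = FunctionalStyleNet(input_size=4, hidden_size=8)

# The activation modules add no entries to the state dict,
# so the weights transfer directly between the two styles.
net_b.load_state_dict(net_a.state_dict())

x = torch.randn(5, 4)
with torch.no_grad():
    same = torch.allclose(net_a(x), net_b(x))
print(same)  # True
```

Which style to use is mostly a matter of taste; the module style shows the activations when the model is printed, while the functional style keeps `__init__` shorter.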