# Activation Functions

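- An activation function applies a non-linear transformation to a layer's output
- We do this because a model that contains only linear transformations collapses into a single linear transformation, no matter how many layers it has
- Non-linear transformations allow the model to learn more complex phenomena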
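
The point about purely linear models can be checked numerically. The sketch below is only an illustration: the layer sizes, the seed, and the variable names are arbitrary choices, not part of the original notes. It shows that two stacked `nn.Linear` layers compute the same thing as one affine map, and that adding a ReLU in between breaks that equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between (sizes are arbitrary)
linear1 = nn.Linear(4, 8)
linear2 = nn.Linear(8, 3)

x = torch.randn(5, 4)
stacked = linear2(linear1(x))

# The same mapping written as a single affine transformation:
# W = W2 @ W1 and b = W2 @ b1 + b2
W = linear2.weight @ linear1.weight
b = linear2.weight @ linear1.bias + linear2.bias
collapsed = x @ W.T + b

# Compare the two-layer output with the single affine map
print(torch.allclose(stacked, collapsed, atol=1e-5))

# With a ReLU in between, the network is no longer one affine map
with_relu = linear2(torch.relu(linear1(x)))
print(torch.allclose(with_relu, collapsed, atol=1e-5))
```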

- Most popular activation functions
    - Step function (not used in practice)
    - Sigmoid function (typically used in the last layer of binary classification problems)
    - TanH function (between -1 and +1)
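    - ReLU (most popular choice) (can cause dead neurons: a unit that outputs 0 for every input also gets zero gradient and stops learning)
    - Leaky ReLU (improved ReLU, uses a small slope for negative inputs so the gradient never becomes exactly zero, which attempts to solve the dead neuron problem)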
    - Softmax (good choice for the last layer of multi-class classification problems)
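
To make the list concrete, here is a small sketch that applies each function to the same sample tensor. The input values and variable names are made up for illustration, the step function is thresholded at 0 (one possible convention), and `negative_slope=0.01` is just the usual default for leaky ReLU.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Step function: 1 for positive inputs, 0 otherwise (threshold at 0 is an assumption)
step = (x > 0).float()

# Sigmoid squashes every value into (0, 1)
sigmoid = torch.sigmoid(x)

# TanH squashes every value into (-1, +1)
tanh = torch.tanh(x)

# ReLU zeroes out negative inputs and keeps positive ones unchanged
relu = torch.relu(x)

# Leaky ReLU scales negative inputs by a small slope instead of zeroing them
leaky = F.leaky_relu(x, negative_slope=0.01)

# Softmax turns the whole vector into probabilities that sum to 1
softmax = torch.softmax(x, dim=0)

for name, value in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                    ("relu", relu), ("leaky_relu", leaky), ("softmax", softmax)]:
    print(f"{name:>10}: {value}")
```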
    
Rule of thumb: if you don't know which to use, use ReLU for the hidden layers

In [None]:
import numpy as np
import torch
import torch.nn as nn
# some activation functions live in torch.nn.functional (e.g. F.leaky_relu)
import torch.nn.functional as F

# option 1: create the activation functions as nn.Module layers in __init__
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)  
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        out = self.sigmoid(out)
        # sigmoid already yields a probability for the binary case, so no softmax at the end
        return out

# option 2: apply the activation functions directly in the forward pass
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)  
    
    def forward(self, x):
        out = torch.relu(self.linear1(x))
        out = torch.sigmoid(self.linear2(out))
        # sigmoid already yields a probability for the binary case, so no softmax at the end
        return out
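
The cell above imports `torch.nn.functional as F` but never uses it. As a hedged illustration of that import, the sketch below follows the option 2 pattern with `F.leaky_relu` in the hidden layer. The class name `LeakyNet`, the layer sizes, and the batch of random inputs are hypothetical and chosen only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical variant of option 2 that uses leaky ReLU from torch.nn.functional
class LeakyNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # leaky ReLU keeps a small gradient for negative pre-activations
        out = F.leaky_relu(self.linear1(x), negative_slope=0.01)
        # sigmoid maps the single output to a value in (0, 1)
        out = torch.sigmoid(self.linear2(out))
        return out

# Forward pass on a random batch (sizes are arbitrary)
model = LeakyNet(input_size=10, hidden_size=5)
batch = torch.randn(4, 10)
probs = model(batch)
print(probs.shape)
```

Each row of `probs` can be read as the predicted probability of the positive class for one sample.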