# II. Activation Functions

## Learning Objectives
- Understand the role of activation functions in a neural network
- Compare different types of activation functions
- Understand the pros and cons of each function
- Understand real-world examples where different activation functions might be used


## Introduction to Activation Functions

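An activation function in a neural network is a mathematical 'gate' between the input feeding the current neuron and the output it sends to the next layer. The neuron first computes a weighted sum of its inputs plus a bias, and the activation function then maps that value to the neuron's output. In the simplest case, a step function, this works like an on/off decision, much like a manager deciding whether to greenlight a project based on certain criteria: if the weighted sum is above a threshold, the neuron is activated and sends a signal onward; if not, it stays inactive. Most activation functions used in practice are smooth or piecewise linear rather than a hard threshold, so they report a degree of activation instead of a yes/no answer. Just as importantly, they are nonlinear, which is what allows a stack of layers to represent more than a single linear transformation.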
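
To make the 'gate' idea concrete, here is a minimal sketch of a single neuron, assuming NumPy. The helper `neuron` and the weights, bias, and input values are hypothetical and chosen only for illustration. It passes the same weighted sum through a hard threshold (a step function, the literal on/off gate) and through a smooth sigmoid.

```python
import numpy as np

def step(z):
    # Hard threshold at 0: the neuron is either "on" (1.0) or "off" (0.0)
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):
    # Smoothly squashes z into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b, activation):
    # Hypothetical helper: weighted sum of inputs plus bias, then the activation
    z = np.dot(w, x) + b
    return activation(z)

# Hypothetical weights, bias, and input (weighted sum plus bias = 0.25)
w = np.array([0.5, -0.25, 0.75])
b = -0.5
x = np.array([1.0, 2.0, 1.0])

print(neuron(x, w, b, step))     # 1.0
print(neuron(x, w, b, sigmoid))  # about 0.562
```

Only the step version matches the literal "activated or not" picture. Its derivative is 0 everywhere except at the threshold, so it gives gradient descent nothing to work with, which is one reason smooth or piecewise-linear functions are used instead.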

## Different Types of Activation Functions

There are several types of activation functions used in neural networks, each with its own advantages and disadvantages. The most common are:

- Linear or Identity function
- Sigmoid function
- Hyperbolic Tangent (tanh) function
- Rectified Linear Unit (ReLU) function
- Softmax function

```python
import numpy as np
import matplotlib.pyplot as plt

# Define the functions
def linear(x):
    return x

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    # Shift by the max for numerical stability; operates on a whole vector
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# Visualize the element-wise functions (softmax acts on whole vectors, so it is not plotted)
x = np.linspace(-10, 10, 1000)

plt.figure(figsize=(10, 7))
plt.plot(x, linear(x), label='Linear')
plt.plot(x, sigmoid(x), label='Sigmoid')
plt.plot(x, tanh(x), label='Tanh')
plt.plot(x, relu(x), label='ReLU')
plt.title('Activation Functions')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend()
plt.grid(True)
plt.show()
```
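
The plot above leaves out softmax because, unlike the other four functions, it does not act on one number at a time: it maps a whole vector of scores (logits) to a probability distribution. The short sketch below, assuming NumPy and hypothetical logits, reuses the same `softmax` definition to show that the outputs sum to 1, and compares it with a hypothetical `naive_softmax` (no max shift) to show why subtracting `np.max(x)` before exponentiating matters.

```python
import numpy as np

def softmax(x):
    # Subtracting the max leaves the result unchanged but keeps np.exp from overflowing
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def naive_softmax(x):
    # Hypothetical unshifted version of the same formula, for comparison only
    e_x = np.exp(x)
    return e_x / e_x.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(logits)
print(probs)        # roughly [0.659 0.242 0.099]
print(probs.sum())  # 1.0, up to floating-point rounding

big = np.array([1000.0, 999.0, 998.0])
print(softmax(big))  # well-defined: roughly [0.665 0.245 0.090]
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(big))  # [nan nan nan]: exp(1000) overflows to inf, and inf / inf is nan
```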

## Comparing Activation Functions

Each activation function has its pros and cons, and each is used in different scenarios based on these properties:

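- **Linear or Identity function**: The linear function is a straight line, so the output is directly proportional to the input (for the identity, it simply equals the input). Its derivative is constant, so the gradient carries no information about the input x. More importantly, a network whose layers all use linear activations collapses into a single linear transformation no matter how many layers it has, so it cannot model nonlinear relationships. It is usually used in the output layer for regression problems.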

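- **Sigmoid function**: The sigmoid function, σ(x) = 1 / (1 + e^(-x)), is smooth and bounded between 0 and 1, which lets us interpret its output as a probability. However, it has two main disadvantages. First, its output isn't zero-centered: because every output is positive, the gradients for all the weights feeding the next neuron share the same sign for a given example, so weight updates tend to zig-zag instead of moving directly toward a good solution. Second, it suffers from the vanishing gradient problem: its derivative, σ(x)(1 - σ(x)), is at most 0.25 and approaches 0 when |x| is large (the function saturates), so gradients shrink as they are multiplied back through many layers.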

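- **Hyperbolic Tangent function (tanh)**: The tanh function is a scaled and shifted version of the sigmoid, tanh(x) = 2σ(2x) - 1, and its output is zero-centered because its range is -1 to 1. Its derivative, 1 - tanh²(x), peaks at 1 rather than 0.25, but it still approaches 0 for large |x|, so tanh also suffers from the vanishing gradient problem.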

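- **Rectified Linear Unit function (ReLU)**: ReLU, max(0, x), is one of the most widely used activation functions in deep learning, especially in hidden layers. It is computationally cheap, and because its gradient is exactly 1 for every positive input, it does not saturate on that side, which helps mitigate the vanishing gradient problem. However, it suffers from the dying ReLU problem: a large gradient update can push a neuron's weights and bias to values where its input is negative for every example, after which its output and its gradient are both 0 and it stops learning entirely.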

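- **Softmax function**: The softmax function is generally used in the output layer for multi-class classification problems. Unlike the functions above, it acts on a whole vector of scores, turning it into a probability distribution over the classes (non-negative values that sum to 1). Like sigmoid, it can saturate when one score dominates the others. In practice it is paired with a cross-entropy loss, and the gradient of that combination with respect to the scores is simply the predicted probabilities minus the one-hot target, which stays large whenever the prediction is wrong.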
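
Several claims in the list above can be checked numerically. The sketch below assumes NumPy and uses hypothetical random matrices and a hypothetical bias value. It (1) shows that two stacked linear layers collapse into one linear layer, (2) compares the peak derivatives of sigmoid (0.25) and tanh (1), shows both saturating at large inputs, and verifies tanh(x) = 2σ(2x) - 1, (3) gives a deliberately simplified picture of vanishing gradients that ignores the weights and just multiplies the best-case sigmoid derivative across layers, and (4) builds a "dead" ReLU neuron whose output and gradient are 0 for every input.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) Two linear layers with identity activation collapse into one
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))  # True

# (2) Derivatives of each activation
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)            # peaks at 0.25 when z = 0

def d_tanh(z):
    return 1 - np.tanh(z) ** 2    # peaks at 1.0 when z = 0

def d_relu(z):
    return (z > 0).astype(float)  # 1 for z > 0, else 0 (0 chosen at z = 0)

z = np.linspace(-10, 10, 2001)
print(d_sigmoid(0.0), d_tanh(0.0))   # 0.25 1.0
print(d_sigmoid(10.0), d_tanh(10.0)) # both tiny: the functions saturate
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))  # True

# (3) Simplified vanishing-gradient picture: ignore the weights and multiply
# the best-case sigmoid derivative across 10 layers
print(0.25 ** 10)  # about 9.5e-07

# (4) A "dead" ReLU: a hypothetical large negative bias makes the
# pre-activation negative for every input, so output and gradient are both 0
w, b = np.array([0.5, -0.3]), -100.0
inputs = rng.normal(size=(1000, 2))
pre = inputs @ w + b
print(np.maximum(0, pre).max(), d_relu(pre).sum())  # 0.0 0.0
```

Part (3) exaggerates the effect for clarity: in a real network the weights also enter the product and can partly offset it, but the bounded sigmoid derivative is what makes the shrinkage so easy to trigger.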

## Real-world Examples

Different activation functions are used in various types of neural networks, and the choice depends on the network's architecture and the problem that's being solved:

- The **linear function** is often used in the output layer of a regression neural network where the output is a real value, such as predicting house prices.

- The **sigmoid function** is often used in the output layer of a binary classification neural network, where the output is the probability that the input belongs to the positive class (for example, spam vs. not spam).

- The **tanh function** can be used in the hidden layers of a neural network; because its output is zero-centered, it often makes training easier than sigmoid hidden units. It is also used inside recurrent cells such as LSTMs.

- The **ReLU function** is often used in the hidden layers of deep neural networks because it is cheap to compute and its gradient does not shrink for positive inputs, which lets gradients propagate well through many layers.

- The **softmax function** is used in the output layer of a multi-class classification neural network, where the outputs form a probability distribution over the classes.
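
The choices above can be sketched as a tiny forward pass, assuming NumPy: a ReLU hidden layer feeds three alternative output heads (identity for regression, sigmoid for binary classification, softmax for multi-class classification). The layer sizes, random weights, input, and label are all hypothetical; the point is the shape and range of each output, not the values. The final part checks with central finite differences that, when softmax is paired with a cross-entropy loss, the gradient with respect to the logits equals `probs - y`.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Shift by the max so np.exp cannot overflow
    e_z = np.exp(z - np.max(z))
    return e_z / e_z.sum()

# Hypothetical sizes: 5 input features, 8 hidden units, 3 classes
x = rng.normal(size=5)
W_h, b_h = rng.normal(size=(8, 5)), np.zeros(8)
hidden = relu(W_h @ x + b_h)  # ReLU in the hidden layer

# Regression head (e.g. a house price): identity output, any real value
w_reg = rng.normal(size=8)
price = w_reg @ hidden

# Binary head: sigmoid squashes one score into P(positive class)
w_bin = rng.normal(size=8)
p_positive = sigmoid(w_bin @ hidden)

# Multi-class head: softmax turns 3 scores into a distribution over 3 classes
W_cls = rng.normal(size=(3, 8))
logits = W_cls @ hidden
probs = softmax(logits)
print(price, p_positive, probs, probs.sum())

# Softmax + cross-entropy: the gradient w.r.t. the logits should be probs - y
y = np.array([0.0, 1.0, 0.0])  # hypothetical one-hot label: class 1 is correct

def cross_entropy(z):
    # Stable log-softmax, then the negative log-probability of the true class
    shifted = z - np.max(z)
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -np.sum(y * log_probs)

analytic = softmax(logits) - y

# Central finite differences as an independent check
eps = 1e-6
numeric = np.zeros(3)
for i in range(3):
    step = np.zeros(3)
    step[i] = eps
    numeric[i] = (cross_entropy(logits + step) - cross_entropy(logits - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Because that gradient is the gap between the predicted distribution and the target, its entry for the true class is close to -1 when the model is confidently wrong and only shrinks as the prediction becomes correct.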