# Activation Functions

Activation functions are a fundamental component of neural networks. They introduce non-linearity into the model, which lets a network learn complex relationships rather than only linear ones. This section covers the activation functions most commonly used in neural networks, with code examples in PyTorch.
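
To make the role of non-linearity concrete, here is a minimal sketch. The layer sizes, the seed, and the variable names are illustrative assumptions rather than part of any particular model. It checks that two stacked linear layers with no activation between them behave like a single linear layer, and that placing a ReLU between them changes the function.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Two bias-free linear layers; the sizes are arbitrary choices for illustration.
lin1 = nn.Linear(3, 4, bias=False)
lin2 = nn.Linear(4, 2, bias=False)
x = torch.randn(5, 3)

with torch.no_grad():
    # Stacking the layers without an activation...
    stacked = lin2(lin1(x))
    # ...matches one linear map whose weight is the product of both weights.
    collapsed = x @ (lin2.weight @ lin1.weight).T
    # Inserting a ReLU between the layers gives a different function.
    with_relu = lin2(torch.relu(lin1(x)))

print(torch.allclose(stacked, collapsed, atol=1e-5))    # linear stack vs. single layer
print(torch.allclose(with_relu, collapsed, atol=1e-5))  # ReLU stack vs. single layer
```

The first comparison should report a match, since composing linear maps yields another linear map. However deep such a stack is, it cannot represent more than one linear layer can.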

### Neural Networks Basics: Activation Functions

Activation functions are applied to the output of each neuron (or node) in a neural network layer, that is, to the weighted sum of its inputs plus a bias. They determine how strongly a neuron "fires" and passes its signal to the next layer. Here are some common activation functions:

### 1. Sigmoid Activation Function:

* Range: (0, 1)
* Often used in the output layer for binary classification problems, where the output can be read as a probability.
* Squashes any real input into (0, 1) along an S-shaped curve.

```python
import torch
import torch.nn as nn

# Mathematically, sigmoid is defined as:
# f(x) = 1 / (1 + exp(-x))

sigmoid = nn.Sigmoid()
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = sigmoid(x)
```
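
The example above creates `x` with `requires_grad=True` but never uses the gradient. As a small sketch (the variable names `y` and `analytic_grad` are illustrative), the gradient computed by autograd can be compared with the closed-form derivative of the sigmoid, `f(x) * (1 - f(x))`:

```python
import torch

x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
y = torch.sigmoid(x)

# Backpropagate through the sum so that x.grad holds dy_i/dx_i for each element.
y.sum().backward()

# Closed-form derivative: sigmoid(x) * (1 - sigmoid(x)).
analytic_grad = (y * (1 - y)).detach()

print(y)       # sigmoid(0) is 0.5
print(x.grad)  # the gradient is largest at x = 0, where it equals 0.25
print(torch.allclose(x.grad, analytic_grad))
```

Because the derivative never exceeds 0.25 and shrinks as `|x|` grows, gradients passed back through many sigmoid layers can become small.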

### 2. Hyperbolic Tangent (Tanh) Activation Function:

* Range: (-1, 1)
* Similar to the sigmoid but centered around zero.
* Often used in hidden layers of neural networks.

```python
# Mathematically, tanh is defined as:
# f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

tanh = nn.Tanh()
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = tanh(x)
```
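
The similarity to the sigmoid can be stated exactly: tanh is a rescaled and shifted sigmoid, `tanh(x) = 2 * sigmoid(2x) - 1`. The following sketch checks this identity numerically on a few sample points; the sample range is an arbitrary choice.

```python
import torch

# A handful of evenly spaced sample points, chosen only for illustration.
x = torch.linspace(-3.0, 3.0, steps=7)

tanh_direct = torch.tanh(x)
tanh_from_sigmoid = 2 * torch.sigmoid(2 * x) - 1

print(tanh_direct)
print(torch.allclose(tanh_direct, tanh_from_sigmoid, atol=1e-6))
```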

### 3. Rectified Linear Unit (ReLU) Activation Function:

* Range: [0, ∞)
* One of the most widely used activation functions, particularly in the hidden layers of deep networks.
* Simple and computationally efficient.
* Introduces non-linearity by thresholding negative values to zero.

```python
# Mathematically, ReLU is defined as:
# f(x) = max(0, x)

relu = nn.ReLU()
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = relu(x)
```
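
The thresholding also shapes the gradient. In the sketch below (the input values are arbitrary and deliberately avoid `x = 0`), autograd reports a gradient of 1 for positive inputs and 0 for negative inputs:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
y = torch.relu(x)
y.sum().backward()

print(y)       # negative inputs are clamped to 0
print(x.grad)  # gradient is 0 where x < 0 and 1 where x > 0
```

A neuron whose inputs are always negative therefore receives no gradient at all, which is the "dead neuron" problem that the variants in the following sections try to address.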

### 4. Leaky ReLU Activation Function:

* Variation of ReLU.
* Allows a small gradient for negative values to prevent dead neurons.
* Typically, α (the leaky slope) is set to a small positive value, such as 0.01.

```python
# Mathematically, Leaky ReLU is defined as:
# f(x) = x if x > 0 else α * x

leaky_relu = nn.LeakyReLU(0.01)  # negative_slope (α); 0.01 is also the default
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = leaky_relu(x)
```
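
To illustrate the point about dead neurons, this sketch compares the gradients of ReLU and Leaky ReLU on the same inputs. The input values and the names `x_relu` and `x_leaky` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

slope = 0.01  # same negative slope as in the example above

# Two copies of the same inputs so that each function gets its own gradient.
x_relu = torch.tensor([-3.0, -1.0, 2.0], requires_grad=True)
x_leaky = torch.tensor([-3.0, -1.0, 2.0], requires_grad=True)

F.relu(x_relu).sum().backward()
F.leaky_relu(x_leaky, negative_slope=slope).sum().backward()

print(x_relu.grad)   # zero gradient for the negative inputs
print(x_leaky.grad)  # gradient equal to the slope for the negative inputs
```

With Leaky ReLU, negative inputs still pass a small gradient back, so the weights feeding such a neuron can continue to update.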

### 5. Parametric ReLU (PReLU) Activation Function:

* Similar to Leaky ReLU, but the leaky slope is learned during training.
* Allows the network to adapt the slope to the data, either as a single shared value or as one value per channel.

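```python
# num_parameters=1 gives a single learnable slope shared by all channels;
# set it to the number of channels to learn one slope per channel.
prelu = nn.PReLU(num_parameters=1)
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = prelu(x)
```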
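
Since the slope is a learned parameter, it appears in the module's `weight` and receives a gradient like any other weight. The sketch below inspects both; the per-channel part uses a hypothetical batch of 4 samples with 3 channels, chosen only for illustration.

```python
import torch
import torch.nn as nn

prelu = nn.PReLU(num_parameters=1)  # one shared slope, initialized to 0.25 by default
x = torch.tensor([0.0, 1.0, -1.0])

output = prelu(x)
output.sum().backward()

print(prelu.weight)       # the learnable slope
print(output)             # the negative input is scaled by the slope
print(prelu.weight.grad)  # the gradient an optimizer would use to update the slope

# With num_parameters equal to the channel count, each channel gets its own slope.
# PyTorch treats dimension 1 of the input as the channel dimension.
per_channel = nn.PReLU(num_parameters=3)
batch = torch.randn(4, 3)  # hypothetical batch: 4 samples, 3 channels
print(per_channel(batch).shape, per_channel.weight.shape)
```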

### 6. Exponential Linear Unit (ELU) Activation Function:

* Range: (-α, ∞) where α is a positive constant.
* Similar to ReLU but smooth for negative values.
* Keeps a non-zero gradient for negative inputs, which helps avoid dead neurons and can lead to faster convergence.

```python
# Mathematically, ELU is defined as:
# f(x) = x if x > 0 else α * (exp(x) - 1)

elu = nn.ELU(alpha=1.0)  # You can adjust the α parameter
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = elu(x)
```
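
As a quick check of the stated range, the sketch below compares PyTorch's ELU with a hand-written version of its piecewise definition and confirms that the outputs stay above -α. The input values and the names `builtin` and `manual` are illustrative.

```python
import torch
import torch.nn.functional as F

alpha = 1.0
x = torch.tensor([-5.0, -3.0, -1.0, 0.0, 2.0])

builtin = F.elu(x, alpha=alpha)
# Piecewise definition: x for x > 0, alpha * (exp(x) - 1) otherwise.
manual = torch.where(x > 0, x, alpha * (torch.exp(x) - 1))

print(builtin)
print(torch.allclose(builtin, manual, atol=1e-6))
print(bool((builtin > -alpha).all()))  # outputs stay above -alpha
```

For increasingly negative inputs the output approaches -α smoothly instead of being cut off at zero as with ReLU.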

### 7. Swish Activation Function:

* Proposed by Google researchers as an alternative to ReLU.
* Defined as f(x) = x · sigmoid(x): close to ReLU for large positive inputs, but smooth and slightly negative for small negative inputs.

```python
def swish(x):
    return x * torch.sigmoid(x)

x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = swish(x)
```
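
PyTorch also ships this function as `nn.SiLU` (Sigmoid Linear Unit), so a hand-written `swish` is not strictly needed. The sketch below checks that the two agree on a few arbitrary sample points and shows that the function is not monotonic for negative inputs.

```python
import torch
import torch.nn as nn

def swish(x):
    return x * torch.sigmoid(x)

x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])
silu = nn.SiLU()

print(torch.allclose(swish(x), silu(x), atol=1e-6))
print(swish(x))  # swish(-5) is closer to zero than swish(-1): the function is not monotonic
```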

### 8. Softmax Activation Function:

* The softmax activation is commonly used in the output layer for multiclass classification problems. 
* It converts a vector of real numbers into a probability distribution over multiple classes.

```python
# Mathematically, softmax is defined as:
# softmax(x)_i = exp(x_i) / sum(exp(x_j) for j in range(N))

softmax = nn.Softmax(dim=0)  # dim=0 normalizes over the only dimension of this 1-D tensor
x = torch.tensor([0.0, 1.0, -1.0], requires_grad=True)
output = softmax(x)
```
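
In practice, softmax is usually applied to a batch of scores, so the choice of `dim` matters. The following sketch uses a hypothetical batch of logits and targets (the values are made up for illustration). It shows that each row becomes a probability distribution, that shifting the logits by a constant leaves the result unchanged, and that `F.cross_entropy` works directly on raw logits.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of raw scores (logits): 2 samples, 3 classes.
logits = torch.tensor([[0.0, 1.0, -1.0],
                       [2.0, 0.5, 0.5]])
targets = torch.tensor([1, 0])  # hypothetical true class per sample

# dim=1 normalizes across classes, so each row becomes a probability distribution.
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))     # each row sums to 1
print(probs.argmax(dim=1))  # predicted class per sample

# Subtracting a per-row constant leaves softmax unchanged; shifting by the row
# maximum is a common way to avoid overflow in exp for large logits.
shifted = logits - logits.max(dim=1, keepdim=True).values
print(torch.allclose(F.softmax(shifted, dim=1), probs))

# cross_entropy applies log-softmax internally, so it takes the raw logits.
loss_from_logits = F.cross_entropy(logits, targets)
loss_from_probs = F.nll_loss(torch.log(probs), targets)
print(torch.allclose(loss_from_logits, loss_from_probs))
```

One consequence is that a model trained with `nn.CrossEntropyLoss` normally outputs logits and applies softmax only when probabilities are needed, for example at inference time.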