📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
  - [`torch`](#toc1_1_)    
  - [`torch.nn`](#toc1_2_)    
  - [`torch.nn.functional`](#toc1_3_)    
- [Activation Functions](#toc2_)    
  - [Linear](#toc2_1_)    
  - [Sigmoid](#toc2_2_)    
  - [Hyperbolic Tangent (Tanh)](#toc2_3_)    
  - [Softplus](#toc2_4_)    
  - [LogSigmoid](#toc2_5_)    
  - [Rectified Linear Unit (ReLU)](#toc2_6_)    
  - [LeakyReLU](#toc2_7_)    
  - [Exponential Linear Unit (ELU)](#toc2_8_)    
  - [Sigmoid Linear Unit (SiLU)](#toc2_9_)    
  - [Mish](#toc2_10_)    
  - [Softmax](#toc2_11_)    
  - [LogSoftmax](#toc2_12_)    
  - [Gaussian Error Linear Units (GeLU)](#toc2_13_)    
  - [Plot Activation Functions](#toc2_14_)    
- [Threshold Functions](#toc3_)    
  - [Step](#toc3_1_)    
  - [Sign](#toc3_2_)    
  - [Plot Threshold Functions](#toc3_3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)

In [None]:
import matplotlib.pyplot as plt
import torch

## <a id='toc1_1_'></a>[`torch`](#toc0_)
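   - Exposes some activations as top-level tensor functions (e.g. `torch.relu`), but is less commonly used for activations in user code than `torch.nn` or `torch.nn.functional`.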

In [None]:
from torch import relu, sigmoid, sign, softmax, tanh

## <a id='toc1_2_'></a>[`torch.nn`](#toc0_)
   - Creates a module.
   - Can be used as a layer in a neural network.
   - Suitable for building complex models.
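
As a minimal sketch of the points above, activation modules can be placed between layers like any other module. The layer sizes and the names `model` and `batch` are arbitrary choices for illustration and are not part of this notebook.

```python
import torch
from torch import nn

# A small, hypothetical model: activations are modules placed between layers.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
    nn.Sigmoid(),
)

batch = torch.randn(3, 4)  # 3 samples with 4 features each
out = model(batch)

print(out.shape)  # torch.Size([3, 2])
```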

In [None]:
from torch.nn import ELU, GELU, LeakyReLU, LogSigmoid, LogSoftmax, Mish, ReLU, Sigmoid, SiLU, Softmax, Softplus, Tanh

## <a id='toc1_3_'></a>[`torch.nn.functional`](#toc0_)
   - Functional API for applying activation functions.
   - More flexible than `torch.nn` for custom operations.
   - Often used directly in model forward passes.
   - Provides more control over the computation graph.
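
A minimal sketch of the functional style, assuming a hypothetical two-layer model called `TinyNet` (the name and layer sizes are illustrative only). The activations are plain function calls inside `forward`, so no activation modules are created.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.fc1(x))  # stateless function call
        return F.softmax(self.fc2(x), dim=-1)  # arguments such as dim are passed per call


net = TinyNet()
probs = net(torch.randn(3, 4))

print(probs.sum(dim=-1))  # each row sums to (approximately) 1
```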

In [None]:
from torch.nn.functional import (
    elu,
    gelu,
    leaky_relu,
    log_softmax,
    logsigmoid,
    mish,
    relu,
    sigmoid,
    silu,
    softmax,
    softplus,
    tanh,
)

# <a id='toc2_'></a>[Activation Functions](#toc0_)
   - Activation functions are used to introduce non-linearity into the neural network.
   - Without an activation function, a neural network collapses into a single linear (affine) transformation, just like a linear regression model, no matter how many layers it has!

<figure style="text-align: center;">
    <img src="../../assets/images/original/mlp/no-activation-network.svg" alt="no-activation-network.svg" style="width: 100%;">
    <figcaption style="text-align: center;">A neural network without any activation function is just a linear transformation of the input to the output</figcaption>
</figure>
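
The collapse can be checked numerically. The sketch below stacks two linear layers with no activation in between and compares them with one equivalent layer; the shapes and random weights are arbitrary assumptions for illustration.

```python
import torch

torch.manual_seed(0)

# two stacked linear layers (no activation in between)
W1 = torch.randn(5, 3, dtype=torch.float64)
b1 = torch.randn(5, dtype=torch.float64)
W2 = torch.randn(2, 5, dtype=torch.float64)
b2 = torch.randn(2, dtype=torch.float64)

inputs = torch.randn(4, 3, dtype=torch.float64)
two_layers = (inputs @ W1.T + b1) @ W2.T + b2

# the same mapping expressed as a single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = inputs @ W.T + b

collapsed = torch.allclose(two_layers, one_layer)
print(collapsed)  # True
```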

📝 Docs:
   - Non-linear Activations (weighted sum, nonlinearity): [pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)
   - Non-linear Activations (other): [pytorch.org/docs/stable/nn.html#non-linear-activations-other](https://pytorch.org/docs/stable/nn.html#non-linear-activations-other)
   - Non-linear activation functions: [pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions)

✍️ **Notes**:
   - The plain Python functions in this notebook are for illustration only; they are not the proper way to implement a custom activation function in PyTorch.
   - The proper implementation is covered in later notebooks.

In [None]:
# domain [-10, +10]
x = torch.linspace(-10, +10, 1001)

# log
print(x)

## <a id='toc2_1_'></a>[Linear](#toc0_)

In [6]:
def linear_func(x: torch.Tensor) -> torch.Tensor:
    return x

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, linear_func(x))
plt.title("Linear")
plt.grid(True)
plt.show()

## <a id='toc2_2_'></a>[Sigmoid](#toc0_)
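   - Historically a common hidden-layer activation and still the usual output activation for `binary classification`, but less common in hidden layers now because it saturates and causes [vanishing gradient](https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484) issues.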
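
The saturation behind the vanishing gradient issue can be seen with autograd. This is a small sketch using the built-in `torch.sigmoid`; the sample points in `z` are arbitrary.

```python
import torch

z = torch.tensor([0.0, 2.0, 10.0], requires_grad=True)
torch.sigmoid(z).sum().backward()

# d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
# the gradient is 0.25 at z = 0 and shrinks toward 0 as |z| grows
print(z.grad)
```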

In [8]:
def sigmoid_func(x: torch.Tensor) -> torch.Tensor:
    return 1 / (1 + torch.exp(-x))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, sigmoid_func(x))
plt.title("Sigmoid")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-4, 4)
plt.show()

## <a id='toc2_3_'></a>[Hyperbolic Tangent (Tanh)](#toc0_)
   - Similar to `sigmoid` but centered around 0, used in [recurrent neural networks (RNNs)](https://karpathy.github.io/2015/05/21/rnn-effectiveness/) and older architectures.
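
The similarity to `sigmoid` is exact up to scaling and shifting: `tanh(x) = 2 * sigmoid(2x) - 1`. A short numerical check, with arbitrary sample points `t`:

```python
import torch

t = torch.linspace(-5, 5, 11)

rescaled_sigmoid = 2 * torch.sigmoid(2 * t) - 1
same = torch.allclose(torch.tanh(t), rescaled_sigmoid, atol=1e-5)
print(same)  # True

# tanh is centered around 0, sigmoid around 0.5
zero = torch.tensor(0.0)
print(torch.tanh(zero).item(), torch.sigmoid(zero).item())  # 0.0 0.5
```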

In [10]:
def tanh_func(x: torch.Tensor) -> torch.Tensor:
    exp_x = torch.exp(x)
    exp_neg_x = torch.exp(-x)
    return (exp_x - exp_neg_x) / (exp_x + exp_neg_x)

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, tanh_func(x))
plt.title("Hyperbolic Tangent (Tanh)")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-4, 4)
plt.show()

## <a id='toc2_4_'></a>[Softplus](#toc0_)
   - Smooth approximation of `ReLU`.
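
How close the approximation is can be measured directly. In this sketch the gap between `softplus` and `relu` is largest at 0, where it equals `log(2)`, and it shrinks as `|x|` grows; the grid `t` is an arbitrary choice.

```python
import math

import torch
import torch.nn.functional as F

t = torch.linspace(-10, 10, 1001)
gap = F.softplus(t) - F.relu(t)

# the gap peaks at t = 0 with value log(2) and decays on both sides
print(gap.max().item(), math.log(2))
print(gap[0].item(), gap[-1].item())  # both close to 0
```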

In [12]:
def softplus_func(x: torch.Tensor) -> torch.Tensor:
    return torch.log(1 + torch.exp(x))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, softplus_func(x))
plt.title("Softplus")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_5_'></a>[LogSigmoid](#toc0_)
   - Logarithm of `sigmoid`, less common but used in specific applications.

In [14]:
def logsigmoid_func(x: torch.Tensor) -> torch.Tensor:
    return torch.log(1 / (1 + torch.exp(-x)))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, logsigmoid_func(x))
plt.title("LogSigmoid")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_6_'></a>[Rectified Linear Unit (ReLU)](#toc0_)
   - Most commonly used and computationally efficient, but can suffer from the [dying ReLU](https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks) problem: a unit whose input stays negative outputs zero and receives zero gradient, which is a form of the [vanishing gradient](https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484) problem.
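
A small autograd sketch of the zero gradient on negative inputs, using the built-in `torch.relu` and arbitrary sample values:

```python
import torch

z = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
torch.relu(z).sum().backward()

# negative inputs get zero gradient, positive inputs get gradient 1
print(z.grad)  # tensor([0., 0., 1., 1.])
```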

In [16]:
def relu_func(x: torch.Tensor) -> torch.Tensor:
    return torch.max(x, torch.tensor(0.0))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, relu_func(x))
plt.title("Rectified Linear Unit (ReLU)")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_7_'></a>[LeakyReLU](#toc0_)
   - Addresses the `dying ReLU` problem by allowing a small, non-zero gradient for negative inputs.
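
The non-zero gradient can be verified with autograd. This sketch uses `torch.nn.functional.leaky_relu` with `negative_slope=0.2`, an illustrative value (the built-in default is `0.01`).

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
F.leaky_relu(z, negative_slope=0.2).sum().backward()

# negative inputs now receive gradient equal to negative_slope instead of 0
print(z.grad)
```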

In [18]:
def leaky_relu_func(x: torch.Tensor, negative_slope: float = 0.2) -> torch.Tensor:
    return torch.max(x, negative_slope * x)

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, leaky_relu_func(x))
plt.title("LeakyReLU")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_8_'></a>[Exponential Linear Unit (ELU)](#toc0_)
   - Similar to `LeakyReLU` but uses an exponential function for negative inputs, often providing better performance than `ReLU`.

In [20]:
def elu_func(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    return torch.where(x > 0, x, alpha * (torch.exp(x) - 1))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, elu_func(x))
plt.title("Exponential Linear Unit (ELU)")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_9_'></a>[Sigmoid Linear Unit (SiLU)](#toc0_)
   - Combines ReLU-like behavior with a smooth curve, often yielding better results than `ReLU`.

In [22]:
def silu_func(x: torch.Tensor) -> torch.Tensor:
    return x * torch.sigmoid(x)

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, silu_func(x))
plt.title("Sigmoid Linear Unit (SiLU)")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_10_'></a>[Mish](#toc0_)
   - Self-regularized, non-monotonic activation function, reported to perform better than `ReLU` and its variants on several benchmarks.

In [24]:
def mish_func(x: torch.Tensor) -> torch.Tensor:
    return x * torch.tanh(torch.nn.functional.softplus(x))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, mish_func(x))
plt.title("Mish")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_11_'></a>[Softmax](#toc0_)
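   - Used for `multi-class classification`; outputs one probability for each of the [mutually exclusive](https://en.wikipedia.org/wiki/Softmax_function) classes, and the probabilities sum to 1. `CrossEntropyLoss` applies it `internally` (in its log form), so it is usually not added before that loss.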
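
A short sketch of both points, using made-up `logits` and `targets`: the softmax rows sum to 1, and `cross_entropy` on raw logits matches `nll_loss` applied to `log_softmax` of the same logits.

```python
import torch
import torch.nn.functional as F

# hypothetical scores for 2 samples and 3 classes
logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])

probs = F.softmax(logits, dim=1)
row_sums = probs.sum(dim=1)
print(row_sums)  # each row sums to (approximately) 1

# cross_entropy takes raw logits and applies log-softmax itself
ce = F.cross_entropy(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True
```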

In [26]:
def softmax_func(x: torch.Tensor, dim=None) -> torch.Tensor:
    if dim is None:
        dim = len(x.shape) - 1
    exp_x = torch.exp(x - x.max(dim=dim, keepdim=True).values)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, softmax_func(x))
plt.title("Softmax")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-0.05, 0.05)
plt.show()

## <a id='toc2_12_'></a>[LogSoftmax](#toc0_)
   - Logarithm of `Softmax`, often paired with `NLLLoss`.
   - Computing it in one step reduces the risk of numerical issues and gives more reliable results than applying a logarithm to the output of `Softmax`.
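
The numerical issue can be reproduced with a deliberately extreme input. In this sketch the values in `logits` are chosen so that the smaller probability underflows to 0 in `float32`.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([0.0, 200.0])

# softmax underflows to exactly 0 for the first entry, so its log is -inf
naive = torch.log(F.softmax(logits, dim=0))

# log_softmax works in log space and keeps the finite value
stable = F.log_softmax(logits, dim=0)

print(naive)   # first entry is -inf
print(stable)  # first entry is -200
```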

In [28]:
def logsoftmax_func(x: torch.Tensor, dim=None) -> torch.Tensor:
    if dim is None:
        dim = len(x.shape) - 1
    # subtract log-sum-exp instead of computing log(softmax(x)), which can underflow to -inf
    return x - torch.logsumexp(x, dim=dim, keepdim=True)

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, logsoftmax_func(x))
plt.title("LogSoftmax")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-25, 0)
plt.show()

## <a id='toc2_13_'></a>[Gaussian Error Linear Units (GeLU)](#toc0_)
   - Weights the input by the standard Gaussian CDF, `x * Φ(x)`, which gives a smooth `ReLU`-like curve; often used in `transformer-based` models.
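
As a sketch of the definition `GELU(x) = x * Φ(x)`, where `Φ` is the standard normal CDF, the snippet below builds `Φ` from `torch.distributions.Normal` and compares the result with the built-in `gelu`. The sample points `t` are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

t = torch.linspace(-3, 3, 7)

# standard normal CDF evaluated at each point
phi = Normal(0.0, 1.0).cdf(t)
manual = t * phi

matches = torch.allclose(manual, F.gelu(t), atol=1e-5)
print(matches)  # True
```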

In [30]:
def gelu_func(x: torch.Tensor) -> torch.Tensor:
    return x * 0.5 * (1.0 + torch.erf(x / 2.0**0.5))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, gelu_func(x))
plt.title("Gaussian Error Linear Units (GeLU)")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()

## <a id='toc2_14_'></a>[Plot Activation Functions](#toc0_)

In [None]:
fig, axs = plt.subplots(nrows=3, ncols=4, figsize=(12, 8), layout="compressed")
fig.suptitle("Activation Functions")
axs[0, 0].plot(x, relu_func(x))
axs[0, 0].set(title="Rectified Linear Unit (ReLU)", xlim=[-10, 10], ylim=[-10, 10])
axs[0, 1].plot(x, leaky_relu_func(x))
axs[0, 1].set(title="LeakyReLU", xlim=[-10, 10], ylim=[-10, 10])
axs[0, 2].plot(x, elu_func(x))
axs[0, 2].set(title="Exponential Linear Unit (ELU)", xlim=[-10, 10], ylim=[-10, 10])
axs[0, 3].plot(x, silu_func(x))
axs[0, 3].set(title="Sigmoid Linear Unit (SiLU)", xlim=[-10, 10], ylim=[-10, 10])
axs[1, 0].plot(x, mish_func(x))
axs[1, 0].set(title="Mish", xlim=[-10, 10], ylim=[-10, 10])
axs[1, 1].plot(x, sigmoid_func(x))
axs[1, 1].set(title="Sigmoid", xlim=[-10, 10], ylim=[-4, 4])
axs[1, 2].plot(x, tanh_func(x))
axs[1, 2].set(title="Hyperbolic Tangent (Tanh)", xlim=[-10, 10], ylim=[-4, 4])
axs[1, 3].plot(x, softplus_func(x))
axs[1, 3].set(title="Softplus", xlim=[-10, 10], ylim=[-10, 10])
axs[2, 0].plot(x, logsigmoid_func(x))
axs[2, 0].set(title="LogSigmoid", xlim=[-10, 10], ylim=[-10, 10])
axs[2, 1].plot(x, softmax_func(x))
axs[2, 1].set(title="Softmax", xlim=[-10, 10], ylim=[-0.05, 0.05])
axs[2, 2].plot(x, logsoftmax_func(x))
axs[2, 2].set(title="LogSoftmax", xlim=[-10, 10], ylim=[-25, 0])
axs[2, 3].plot(x, gelu_func(x))
axs[2, 3].set(title="Gaussian Error Linear Units (GeLU)", xlim=[-10, 10], ylim=[-10, 10])
for ax in fig.axes:
    ax.grid(True)
plt.show()

# <a id='toc3_'></a>[Threshold Functions](#toc0_)
   - Threshold functions are a simpler type of activation function, primarily used in the early development of neural networks.
   - These functions decide whether a neuron should be activated based on whether its input surpasses a certain threshold.
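
A minimal sketch of this idea is a single perceptron-style neuron that fires only when its weighted input reaches the threshold. The hand-picked `weights` and `bias` below are hypothetical values chosen so that the neuron computes a logical AND.

```python
import torch


def step(z: torch.Tensor) -> torch.Tensor:
    # 1 where the input reaches the threshold (0), else 0
    return (z >= 0).to(z.dtype)


inputs = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
weights = torch.tensor([1.0, 1.0])  # hypothetical, hand-picked
bias = -1.5  # hypothetical, hand-picked

outputs = step(inputs @ weights + bias)
print(outputs)  # tensor([0., 0., 0., 1.])
```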

## <a id='toc3_1_'></a>[Step](#toc0_)

In [33]:
def step_func(x: torch.Tensor) -> torch.Tensor:
    return torch.where(x >= 0, torch.ones_like(x), torch.zeros_like(x))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, step_func(x))
plt.title("Step")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-2, 2)
plt.show()

## <a id='toc3_2_'></a>[Sign](#toc0_)

In [35]:
def sign_func(x: torch.Tensor) -> torch.Tensor:
    return torch.where(x > 0, torch.ones_like(x), torch.where(x < 0, torch.ones_like(x) * -1, torch.zeros_like(x)))

In [None]:
plt.figure(figsize=(4, 4))
plt.plot(x, sign_func(x))
plt.title("Sign")
plt.grid(True)
plt.xlim(-10, 10)
plt.ylim(-2, 2)
plt.show()

## <a id='toc3_3_'></a>[Plot Threshold Functions](#toc0_)

In [None]:
# plot
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(8, 4), layout="compressed")
fig.suptitle("Threshold Functions")
axs[0].plot(x, step_func(x))
axs[0].grid(True)
axs[0].set(title="Step", xlim=[-10, 10], ylim=[-2, 2])
axs[1].plot(x, sign_func(x))
axs[1].grid(True)
axs[1].set(title="Sign", xlim=[-10, 10], ylim=[-2, 2])
plt.show()