```{contents}
```

## Activation Function

An **activation function** is a mathematical function applied to a neuron's weighted input sum (its pre-activation) to produce the neuron's output. Its role is to determine how strongly a neuron should “activate” (i.e., how much of its value is passed forward).

Without activation functions, a neural network reduces to a single **linear model**, no matter how many layers it has, because a composition of linear maps is itself linear. Such a network cannot learn complex, non-linear patterns.
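The collapse of stacked linear layers into one can be checked numerically. The sketch below is illustrative only: the weight matrices `W1` and `W2` and the input `x` are made-up values, biases are omitted, and the helpers `matvec` and `matmul` are hypothetical names written for this example.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Made-up weights for two layers with no activation function (biases omitted).
W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[3.0, 1.0], [2.0, 0.5]]
x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))  # layer 2 applied to the output of layer 1
one_layer = matvec(matmul(W2, W1), x)   # a single layer whose weights are W2 * W1

print(two_layers, one_layer)  # both give the same vector
```

Because the two results match for any input, the second layer adds no expressive power until a non-linear function is placed between the layers.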

---

## Why Do We Need Activation Functions?

### 🔹 1. Introduce Non-Linearity

Most real-world problems (such as vision, speech, and language) involve **non-linear** relationships between inputs and outputs. Activation functions allow neural networks to **model these complex relationships**.

### 🔹 2. Help with Hierarchical Learning

Activation functions allow deeper layers to progressively learn higher-level features. In an image model, for example:

* Early layers → edges, curves
* Middle layers → shapes, patterns
* Final layers → objects, decisions

### 🔹 3. Control the Flow of Information

They decide:

* How much signal should pass forward (forward propagation)
* How much gradient should pass backward (backpropagation)
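Both roles can be seen in a single neuron. The sketch below uses ReLU as the example activation; `neuron_step` is a hypothetical helper name, and `upstream_grad` stands for whatever gradient arrives from the next layer during backpropagation.

```python
def relu(z):
    """ReLU activation: passes positive values, blocks the rest."""
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU (taken as 0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0

def neuron_step(z, upstream_grad):
    """Return (forward output, gradient passed back to the pre-activation z)."""
    forward = relu(z)                        # how much signal goes forward
    backward = upstream_grad * relu_grad(z)  # chain rule: how much gradient goes back
    return forward, backward

print(neuron_step(2.0, 0.5))   # positive input: signal and gradient both pass
print(neuron_step(-2.0, 0.5))  # negative input: both are blocked
```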

---

## Types of Activation Functions

### 1. **Linear Activation**

➤ *Rarely used in hidden layers*; it mainly appears in regression output layers.

`f(x) = x` → Adds no non-linearity, so stacking such layers gives no extra learning capability.

---

### 2. **Sigmoid (Logistic)**

**Formula:**
$$
\sigma(z)= \frac{1}{1+e^{-z}}
$$

✔ Output range: (0, 1)
✔ Good for **binary classification** (output layer)

❌ Main problems:

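* Saturates for large positive or negative inputs, where the output flattens near 1 or 0
* **Vanishing gradient**: the derivative is at most 0.25 and approaches 0 in the saturated regions
* Not zero-centered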
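A minimal sketch of the sigmoid and its derivative makes the saturation visible. It uses the standard identity that the derivative equals `s * (1 - s)` where `s` is the sigmoid output; the branch on the sign of `z` is an implementation choice made here to avoid overflow in `math.exp`, not something the text above prescribes.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) <= 1
    return ez / (1.0 + ez)

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_grad(z))
```

The gradient peaks at 0.25 for `z = 0` and is already below `1e-4` at `z = ±10`, which is why saturated sigmoid units learn slowly.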

---

### 3. **Tanh (Hyperbolic Tangent)**

**Formula:**
$$
\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}
$$

✔ Output range: (−1, 1)
✔ Zero-centered

❌ Still suffers from **vanishing gradients**, because it also saturates for large positive or negative inputs
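The properties above can be checked with the standard library's `math.tanh`. The sketch below relies on two well-known identities: the derivative of tanh is `1 - tanh(z)^2`, and tanh is a rescaled, shifted sigmoid, `tanh(z) = 2 * sigmoid(2z) - 1`. The helper names are chosen for this example.

```python
import math

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t

def sigmoid(z):
    """Plain logistic sigmoid (fine for the moderate inputs used here)."""
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7
print(math.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)  # same value, two routes
print(math.tanh(-z) + math.tanh(z))                # 0.0: outputs are symmetric around zero
print(tanh_grad(0.0), tanh_grad(5.0))              # gradient shrinks as tanh saturates
```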

---

### 4. **ReLU (Rectified Linear Unit)**

**Formula:**
$$
f(x) = \max(0, x)
$$

✔ Fast and simple
✔ Works well for **hidden layers**
✔ Doesn’t saturate for positive values
✔ Helps mitigate vanishing gradients (the gradient is exactly 1 for positive inputs)

❌ Can cause the **“dead neuron” problem**: a neuron whose input stays negative always outputs 0 and receives zero gradient, so it stops learning
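A dead neuron can be reproduced with a toy example. The sketch below assumes a hypothetical one-input neuron `a = relu(w * x + b)` with a large negative bias and a made-up set of inputs; the values are chosen only so that every pre-activation is negative.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron with a large negative bias, and made-up inputs.
w, b = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 3.0]

outputs = [relu(w * x + b) for x in inputs]

# Gradient of the output with respect to w is relu'(z) * x; with respect to b it is relu'(z).
grad_w = sum(relu_grad(w * x + b) * x for x in inputs)
grad_b = sum(relu_grad(w * x + b) for x in inputs)

print(outputs)         # every output is 0.0
print(grad_w, grad_b)  # both gradients are 0.0, so gradient descent cannot update w or b
```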

---

### 5. **Leaky ReLU**

Mitigates the “dead ReLU” issue by keeping a small, non-zero slope (commonly 0.01) for negative inputs:

$$
f(x) =
\begin{cases}
x & x > 0 \\
0.01x & x \le 0
\end{cases}
$$
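A minimal sketch of Leaky ReLU and its derivative follows. The parameter name `alpha` for the negative-side slope is an assumption made for this example; its default matches the 0.01 used in the formula.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for positive x, small slope alpha otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU: never exactly zero, so updates keep flowing."""
    return 1.0 if x > 0 else alpha

for x in (-10.0, -1.0, 0.0, 2.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Because the gradient for negative inputs is `alpha` rather than 0, a neuron pushed into the negative region can still recover during training.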

---

### 6. **Softmax**

Used in **multi-class classification** (output layer)

Converts raw values into **probabilities** that sum to 1.
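For reference, softmax maps each raw score to `softmax(z)_i = exp(z_i) / sum_j exp(z_j)`. The sketch below is one common way to compute it: subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged. The input scores are made-up values.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(logits)                            # shift by the max to avoid overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # largest score gets the largest probability
print(sum(probs))                    # approximately 1.0
print(softmax([1000.0, 1000.0]))     # large scores do not overflow
```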

---

## Where to Use Which Activation Function?

| Layer Type           | Recommended Activation   |
| -------------------- | ------------------------ |
| Hidden Layers        | ReLU / Leaky ReLU / Tanh |
| Output (Binary)      | Sigmoid                  |
| Output (Multi-class) | Softmax                  |
| Regression Output    | Linear                   |

---

## Key Role in Deep Learning

Activation functions impact:

* **Training speed**
* **Gradient flow**
* **Accuracy**
* **Stability**

Choosing an appropriate activation function is **critical** for mitigating vanishing-gradient, exploding-gradient, and dead-neuron problems.
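The effect on gradient flow can be illustrated with a deliberately simplified calculation. During backpropagation, the gradient reaching an early layer includes one activation derivative per layer it passes through. The sketch below multiplies only those derivatives and ignores the weights entirely, which is an assumption made to isolate the activation's contribution; `chained_gradient` is a hypothetical helper name.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_grad, z, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad(z)
    return g

sigmoid_best = chained_gradient(sigmoid_grad, 0.0, 10)  # best case for sigmoid: 0.25 per layer
relu_active = chained_gradient(relu_grad, 1.0, 10)      # active ReLU units: 1.0 per layer
relu_dead = chained_gradient(relu_grad, -1.0, 10)       # inactive ReLU units: 0.0 per layer

print(sigmoid_best, relu_active, relu_dead)
```

Even in its best case, ten sigmoid layers scale the gradient by less than one millionth, while active ReLU units leave it unchanged and inactive ones block it completely.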

````{dropdown} Click here for Sections
```{tableofcontents}
```
````
