```{contents}
```

## ELU (Exponential Linear Unit)

* ELU is designed to **address limitations of ReLU, Leaky ReLU, and Parametric ReLU**.

* **Formula:**
  
  $$
  f(x) =
  \begin{cases}
  x & \text{if } x > 0 \\
  \alpha (e^x - 1) & \text{if } x \leq 0
  \end{cases}
  $$
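  Here, $\alpha > 0$ is a hyperparameter that sets the value the function saturates to ($-\alpha$) for large negative inputs. It is usually set to 1.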

* **Key features:**

  1. **No dead ReLU problem** – because negative inputs are not zeroed out completely; they are mapped to negative values.
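  2. **Zero-centered (approximately)** – because negative inputs produce negative outputs, the mean of the outputs is closer to zero than with ReLU, which improves weight updates during backpropagation.
  3. **Smooth for negative values** – unlike Leaky ReLU, which is linear for negatives, ELU follows an exponential curve that saturates at $-\alpha$. With $\alpha = 1$ the derivative is also continuous at $x = 0$. The saturation can make the unit less sensitive to large negative inputs.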
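
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `elu`, the default `alpha=1.0`, and the sample inputs are choices made here for the example.

```python
import numpy as np


def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # Clamp to <= 0 before exponentiating so large positive inputs cannot
    # overflow in the branch that np.where discards anyway.
    negative_branch = alpha * np.expm1(np.minimum(x, 0.0))
    return np.where(x > 0, x, negative_branch)


# Hypothetical sample inputs covering both branches.
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(elu(x))
```

Positive inputs pass through unchanged, while the negative inputs -5 and -1 map to roughly -0.993 and -0.632, already close to the asymptote $-\alpha = -1$. `np.expm1` is used instead of `np.exp(x) - 1` because it is more accurate for inputs near zero.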

---

![](../images/elu.png)

### Context: Activation Functions and ReLU Problems

* **Activation functions** introduce non-linearity in neural networks. Without them, a stack of layers collapses into a single linear transformation, so the network would behave like a linear regression model no matter how deep it is.

* **ReLU (Rectified Linear Unit):**

  * Formula: $f(x) = \max(0, x)$
  * Helps avoid vanishing gradients, because the gradient is exactly 1 for every positive input.
  * **Problem:** Dead ReLU neurons – if the input to a neuron is always negative, the gradient is 0, so that neuron stops learning.

* **Leaky ReLU and Parametric ReLU:**

  * They fix the dead neuron problem by allowing a small slope for negative values:

    * Leaky ReLU: $f(x) = x$ if $x > 0$, else $f(x) = \alpha x$ (α is small, like 0.01)
    * Parametric ReLU: similar, but α is learned during training.
  * **Limitation:** Their outputs are still **not zero-centered**. With a small α, negative outputs are tiny, so the mean of the outputs stays clearly positive. This can make weight updates less efficient.
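
To make the zero-centering point concrete, the sketch below compares the mean output of ReLU, Leaky ReLU, and ELU on the same inputs. The standard normal input distribution, the sample size, and the seed are assumptions made for this illustration. Real pre-activations depend on the network and data.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def elu(x, alpha=1.0):
    # Clamp before expm1 so the unused branch never overflows.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
x = rng.standard_normal(1_000_000)  # assumed input distribution

means = {
    "ReLU": relu(x).mean(),
    "Leaky ReLU": leaky_relu(x).mean(),
    "ELU": elu(x).mean(),
}
for name, value in means.items():
    print(f"{name:>10}: mean output = {value:.3f}")
```

Under these assumptions the means come out at roughly 0.40 for ReLU, 0.39 for Leaky ReLU, and 0.16 for ELU. ELU is therefore not exactly zero-centered, but its mean output sits noticeably closer to zero than the other two.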

---


### Forward and Backward Propagation

* **Forward pass:**

  * Positive values behave like ReLU (output = input).
  * Negative values are mapped to $\alpha (e^x - 1)$, which approaches the asymptote $-\alpha$ as $x \to -\infty$.
* **Backward pass (gradients):**

  * Positive: gradient = 1
  * Negative: gradient = $\alpha e^x$ (equivalently $f(x) + \alpha$), which shrinks toward zero as $x$ becomes more negative but never reaches exactly zero.
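
The backward pass can be checked numerically. The sketch below writes the ELU derivative as 1 for positive inputs and $\alpha e^x$ otherwise, then compares it against a central finite difference. The helper name `elu_grad`, the sample points, and the step size `eps` are choices made for this example.

```python
import numpy as np


def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))


# Sample points on both sides of zero (zero itself is skipped on purpose).
x = np.array([-10.0, -3.0, -0.5, 0.5, 3.0])
eps = 1e-6
numeric = (elu(x + eps) - elu(x - eps)) / (2 * eps)  # central difference

print(elu_grad(x))
print(np.allclose(elu_grad(x), numeric, atol=1e-6))
```

The gradient at -10 is tiny but still positive, so the unit keeps receiving a (very small) learning signal. A ReLU would give exactly 0 at the same point.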

---

### Advantages

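* Avoids the dead ReLU issue: the gradient for negative inputs is small but never exactly zero.
* Mean outputs closer to zero, which tends to make learning faster.
* Negative inputs produce bounded, smoothly saturating outputs instead of being zeroed out (ReLU) or scaled linearly without bound (Leaky ReLU).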

### Disadvantages

* Slightly more **computationally intensive** than ReLU or Leaky ReLU because of the exponential calculation.
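
A rough way to see the extra cost is to time simple NumPy versions of the three functions. This is only a sketch: the array size, repeat counts, and these particular implementations are assumptions, and absolute numbers vary with hardware, library versions, and framework (fused GPU kernels may show a much smaller gap).

```python
import timeit

import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


x = np.random.default_rng(0).standard_normal(100_000)

timings = {}
for fn in (relu, leaky_relu, elu):
    # Best of 5 repeats of 20 calls each, reported per call in microseconds.
    best = min(timeit.repeat(lambda: fn(x), number=20, repeat=5))
    timings[fn.__name__] = best / 20 * 1e6

for name, micros in timings.items():
    print(f"{name:>10}: {micros:8.1f} us per call")
```

No expected timings are listed because they are machine-dependent. The point of the comparison is only that the ELU version does the same selection work as Leaky ReLU plus an exponential.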

---

**Summary**

| Activation | Dead Neuron Problem | Zero-Centered | Complexity |
| ---------- | ------------------- | ------------- | ---------- |
| ReLU       | Yes                 | No            | Low        |
| Leaky ReLU | No                  | No            | Low        |
| PReLU      | No                  | No            | Low-Med    |
| ELU        | No                  | Yes           | Medium     |

**Intuition:** ELU is like ReLU but smarter – it keeps neurons alive in the negative region and pushes mean outputs closer to zero, which can help the network learn faster and more efficiently.