# ReLU (Rectified Linear Unit) — Key Notes

## Definition
The **Rectified Linear Unit (ReLU)** is one of the most widely used activation functions in modern neural networks.  
It is defined as:

$$f(x) = \max(0, x)$$

---

## Intuition
- If the input is **positive**, ReLU passes it through unchanged (it acts like the identity).
- If the input is **negative or zero**, ReLU outputs **0** (it "rectifies" the signal by clipping the negative part).
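
A minimal sketch of the two cases above in plain Python. The name `relu` and the sample `inputs` are chosen here for illustration and are not tied to any particular library.

```python
def relu(x: float) -> float:
    """Return max(0, x): positive inputs pass through, the rest become 0."""
    return max(0.0, x)


# Hypothetical sample values covering negative, zero, and positive inputs.
inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
```

In practice, frameworks apply the same rule elementwise over whole tensors.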

---

## Derivative (for backpropagation)
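ReLU is simple to differentiate. Strictly speaking, it is not differentiable at $x = 0$; by convention, the derivative there is usually taken to be 0: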

$$f'(x) = 
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x \leq 0
\end{cases}$$

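- For positive values → gradient = 1, so the upstream gradient passes through unscaled (the sigmoid derivative is at most 0.25, and both sigmoid and tanh derivatives shrink toward 0 as the unit saturates).
- For negative values → gradient = 0, so no gradient flows back through the neuron for that input (the neuron is inactive).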
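
The piecewise derivative can be sketched as follows. `relu_grad` and `relu_backward` are hypothetical helper names, and the value 0 at $x = 0$ is the convention assumed here, not the only possible choice.

```python
def relu_grad(x: float) -> float:
    """Derivative of ReLU; uses 0 at x == 0 by convention (an assumption)."""
    return 1.0 if x > 0 else 0.0


def relu_backward(x: float, upstream: float) -> float:
    """Chain rule for one unit: scale the upstream gradient by f'(x)."""
    return upstream * relu_grad(x)


print(relu_backward(3.0, 0.7))   # positive input: upstream gradient passes through
print(relu_backward(-1.0, 0.7))  # negative input: gradient is blocked
```

During backpropagation, ReLU therefore acts as a gate: it either forwards the incoming gradient unchanged or zeroes it.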

---

## Why ReLU is Popular
1. **Computationally cheap**  
   - Just compares with 0, no exponentials or divisions.
   
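2. **Mitigates vanishing gradients**  
   - Unlike sigmoid/tanh, the gradient for positive inputs is exactly 1, so gradients are not repeatedly shrunk by the activation as they propagate back through deep networks. Units with negative inputs still pass no gradient.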

3. **Sparsity**  
   - Many neurons output exactly 0 for a given input → activations become sparse, which can help generalization and efficiency.

4. **Linear behavior for positive inputs**  
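   - ReLU does not saturate for large positive inputs, which tends to make gradient-based optimization easier than with bounded activations such as sigmoid or tanh.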
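
To make the gradient argument in point 2 concrete, the sketch below multiplies the local activation derivative across several layers. It is a simplification: it ignores the weight matrices entirely and assumes the same positive pre-activation at every layer. `depth` and `pre_activation` are hypothetical values chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x == 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


depth = 10            # hypothetical number of stacked layers
pre_activation = 1.0  # hypothetical: same positive input to every activation

# Product of the activation derivatives along the backward path.
sigmoid_factor = sigmoid_grad(pre_activation) ** depth
relu_factor = relu_grad(pre_activation) ** depth

print(f"sigmoid factor after {depth} layers: {sigmoid_factor:.2e}")
print(f"relu factor after {depth} layers:    {relu_factor:.2e}")
```

Under these assumptions the sigmoid factor shrinks by several orders of magnitude while the ReLU factor stays at 1. Real networks also multiply by weights, so this shows only the activation's contribution.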

---

## Drawbacks
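- **Dying ReLU problem**: If a neuron's pre-activation is negative for every training input, it always outputs 0 and its gradient is always 0, so its weights stop updating and it may never recover.
- **Common mitigations**: Variants that keep a nonzero gradient for negative inputs, such as Leaky ReLU, Parametric ReLU (PReLU), ELU, and GELU.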
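
As a rough illustration of why such variants help, here is a minimal Leaky ReLU sketch compared against plain ReLU on a negative input. The function names and the default `negative_slope=0.01` are assumptions made for this example; the slope is a tunable hyperparameter.

```python
def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Like ReLU, but negative inputs are scaled by a small slope, not zeroed."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Gradient is 1 for positive inputs and negative_slope otherwise."""
    return 1.0 if x > 0 else negative_slope


x = -2.0  # hypothetical pre-activation of a neuron stuck on the negative side
print(relu_grad(x))        # 0.0  -> no learning signal
print(leaky_relu_grad(x))  # 0.01 -> small but nonzero learning signal
```

Because the gradient never reaches exactly 0, a neuron on the negative side can still receive updates and move back into the active region.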

---

## Summary
- **Formula**: $f(x) = \max(0, x)$  
- **Derivative**: $f'(x) = 1$ if $x > 0$, else $0$ (the value at $x = 0$ is a convention).  
- **Good for**: Deep networks, faster training, mitigating vanishing gradients.  
- **Risk**: Some neurons can "die" (output 0 for every input and stop learning).  
