```{contents}
```

# Tanh Activation Function

The **tanh (hyperbolic tangent)** activation function is a non-linear function used in neural networks, especially in hidden layers.

It transforms input values into a range between **-1 and +1**.

### ✅ Formula:

$$
\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}
$$

Where:
$$
z = \sum_{i=1}^{n}(w_i \cdot x_i) + b
$$
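
The two formulas above can be evaluated directly. Below is a minimal sketch using only Python's standard library; the function names and the example weights, inputs, and bias are illustrative assumptions, not values from any particular model.

```python
import math

def pre_activation(weights, inputs, bias):
    """Weighted sum z = sum(w_i * x_i) + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def tanh_from_definition(z):
    """tanh(z) = (e^z - e^-z) / (e^z + e^-z), evaluated literally."""
    ez, enz = math.exp(z), math.exp(-z)
    return (ez - enz) / (ez + enz)

# Hypothetical example values, chosen only for illustration
weights = [0.5, -0.25]
inputs = [1.0, 2.0]
bias = 0.1

z = pre_activation(weights, inputs, bias)   # 0.5 - 0.5 + 0.1
activation = tanh_from_definition(z)

print(f"z = {z:.4f}, tanh(z) = {activation:.4f}")
print(f"math.tanh(z) = {math.tanh(z):.4f}")  # library version for comparison
```

The literal formula is fine for small `z`, but `math.exp` overflows for very large inputs, so in practice a library routine such as `math.tanh` is the safer choice.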

---

![](../images/tanh.png)

## How It Works (Intuition)

Think of tanh as a "scaled and shifted" version of the sigmoid function $\sigma$: $\tanh(z) = 2\sigma(2z) - 1$.

* **Sigmoid** squashes values between **0 and 1**
* **Tanh** squashes values between **-1 and +1**

So:

* Negative inputs become **negative outputs**
* Positive inputs become **positive outputs**
* Outputs are more centered around **zero**

In many models this makes training **faster and more stable** than with sigmoid.
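
The "scaled and shifted" relationship to sigmoid can be checked numerically. The sketch below assumes the standard identity tanh(z) = 2·sigmoid(2z) − 1; the helper names and sample inputs are illustrative.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^-z), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_via_sigmoid(z):
    """Stretch sigmoid's (0, 1) range to (-1, 1): 2 * sigmoid(2z) - 1."""
    return 2.0 * sigmoid(2.0 * z) - 1.0

sample_inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]

# Largest disagreement between the identity and the library tanh
max_gap = max(abs(tanh_via_sigmoid(z) - math.tanh(z)) for z in sample_inputs)
print(f"largest difference: {max_gap:.2e}")
```

The difference should be at the level of floating-point rounding, which shows that the two functions share the same shape and differ only in scale and offset.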

---

## Output Range:

$$
-1 \;<\; \tanh(z) \;<\; +1
$$

Example behavior:

* Large positive z → output ≈ +1
* Large negative z → output ≈ -1
* z near 0 → output ≈ 0
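
The saturation behavior listed above is easy to see by evaluating tanh at a few points. This is a small sketch; the sample inputs are arbitrary.

```python
import math

# Arbitrary sample inputs spanning the saturated and near-linear regions
sample_inputs = [-10.0, -2.0, 0.0, 2.0, 10.0]

# Map each input to its tanh output
outputs = {z: math.tanh(z) for z in sample_inputs}

for z, value in outputs.items():
    print(f"z = {z:6.1f}  ->  tanh(z) = {value: .6f}")
```

Mathematically the output never reaches exactly ±1, although for large enough |z| floating-point rounding can return exactly ±1.0.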

---

## ✅ Derivative of Tanh (Used in Backpropagation)

$$
\frac{d}{dz}\tanh(z) = 1 - \tanh^2(z)
$$

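The derivative of tanh lies between **0 and 1**: it peaks at 1 when $z = 0$ and approaches 0 as $|z|$ grows.

A maximum slope of 1 helps with learning, but the small slopes at large $|z|$ still cause problems (see the disadvantages section).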
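
The derivative formula can be sanity-checked against a numerical approximation. The sketch below compares `1 - tanh(z)**2` with a central finite difference; the helper names, step size, and sample points are assumptions made for illustration.

```python
import math

def tanh_derivative(z):
    """Analytic derivative: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t

def central_difference(f, z, h=1e-6):
    """Numerical slope estimate: (f(z + h) - f(z - h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in [-2.0, 0.0, 0.5, 3.0]:
    analytic = tanh_derivative(z)
    numeric = central_difference(math.tanh, z)
    print(f"z = {z:5.1f}  analytic = {analytic:.6f}  numeric = {numeric:.6f}")
```

Because the derivative only needs the already-computed output `tanh(z)`, backpropagation can reuse the forward-pass activation instead of recomputing exponentials.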

---

## Why Tanh is Better Than Sigmoid

| Feature           | Sigmoid   | Tanh     |
| ----------------- | --------- | -------- |
| Output Range      | 0 to 1    | -1 to +1 |
| Zero-centered     | ❌ No      | ✅ Yes    |
| Derivative Range  | 0 to 0.25 | 0 to 1   |
| Speed of learning | Slower    | Faster   |

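Sigmoid outputs are always positive, so the gradients for all incoming weights of a next-layer neuron share the same sign, which forces zig-zagging updates. Zero-centered tanh outputs can be positive or negative, so weight updates are more balanced.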

---

## Advantages of Tanh

* ✔️ **Zero-centered** → Better weight updates
* ✔️ **Larger derivative range** than sigmoid → Less vanishing gradient (for shallow/medium networks)
* ✔️ Works well in **hidden layers**, especially for classification problems

---

## Disadvantages of Tanh

* ❌ **Still suffers from vanishing gradient** in deep neural networks
  * Derivatives shrink layer by layer
* ❌ **Computationally expensive**
  * Uses exponential operations
* ❌ Not ideal for **very deep networks**

That’s why ReLU and its variants became more popular.
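
To illustrate how derivatives shrink layer by layer, the sketch below multiplies the tanh derivative across a stack of layers. It is a deliberate simplification: it ignores the weight matrices and assumes every layer sees the same hypothetical pre-activation of 1.5, so it shows the trend rather than the gradient of any real network.

```python
import math

def tanh_derivative(z):
    """Analytic derivative: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

def gradient_scale(pre_activations):
    """Product of tanh derivatives along a chain of layers (weights ignored)."""
    scale = 1.0
    for z in pre_activations:
        scale *= tanh_derivative(z)
    return scale

# Hypothetical: every layer has pre-activation z = 1.5
shallow = gradient_scale([1.5] * 3)    # 3 layers
deep = gradient_scale([1.5] * 20)      # 20 layers

print(f"per-layer factor:  {tanh_derivative(1.5):.4f}")
print(f"after 3 layers:    {shallow:.3e}")
print(f"after 20 layers:   {deep:.3e}")
```

Each layer multiplies the backpropagated signal by a factor below 1, so the product decays roughly geometrically with depth.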

---

## When is Tanh Used?

✅ Good for:

* Hidden layers in small or medium neural networks
* Situations where negative values help (zero-centered outputs)
* Recurrent Neural Networks (older architectures like vanilla RNN)
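
As an illustration of the recurrent case, here is a minimal sketch of one vanilla RNN update, assuming the common form h_t = tanh(W_x·x_t + W_h·h_{t-1} + b). The function name, the layer sizes, and all weight values are hypothetical and chosen only for the example.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN update: h_t = tanh(W_x @ x_t + W_h @ h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(z))  # tanh keeps each hidden unit inside (-1, 1)
    return h_t

# Hypothetical parameters: 2 input features, 2 hidden units
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [-0.4, 0.6]]
b = [0.0, 0.1]

h = [0.0, 0.0]  # initial hidden state
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_x, W_h, b)
    print([round(v, 4) for v in h])
```

Because the hidden state is fed back at every step, a bounded, zero-centered activation keeps it from growing without limit over long sequences.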

Not preferred in very deep networks, where ReLU usually performs better.

---

**Summary**

**Tanh is a zero-centered, non-linear activation function that maps inputs to the range $(-1, +1)$. It is often a better hidden-layer choice than sigmoid but is still limited by the vanishing gradient problem in deeper networks.**
