# Tanh Activation Function - From Scratch

## Question 14: Compare Activation Functions Mathematically and Visually

---

### 🧩 Problem Statement

**What problem is being solved?**
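- Sigmoid outputs are always positive, so they are not zero-centered
- All-positive inputs force every weight gradient of a neuron to share one sign, which produces zigzag optimization paths
- Tanh avoids this with a zero-centered output range of (-1, 1)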
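The sign argument can be made concrete with a single neuron. For z = w · x + b, the gradient with respect to each weight is the upstream error times the matching input, so all-positive inputs give every weight gradient the same sign. The following is a minimal sketch; the pre-activation values and the upstream gradient `delta` are made-up numbers chosen for illustration.

```python
import numpy as np

# Hypothetical previous-layer pre-activations and upstream gradient dL/dz
pre_activations = np.array([-2.0, -0.5, 0.3, 1.5])
delta = -0.7

sigmoid_inputs = 1.0 / (1.0 + np.exp(-pre_activations))  # all in (0, 1)
tanh_inputs = np.tanh(pre_activations)                   # in (-1, 1), mixed signs

# For z = w . x + b, the weight gradient is dL/dw_i = delta * x_i
grad_w_sigmoid = delta * sigmoid_inputs
grad_w_tanh = delta * tanh_inputs

print("sigmoid-fed gradient signs:", np.sign(grad_w_sigmoid))
print("tanh-fed gradient signs:   ", np.sign(grad_w_tanh))
```

With sigmoid inputs, every component of the weight gradient carries the sign of `delta`, so one update can only move all weights in the same direction. With tanh inputs the signs can differ per weight.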

**Key Formula:** tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

**Output Range:** (-1, 1) - Zero-centered!

**Maximum Gradient:** 1.0 at z=0 (4x sigmoid's maximum of 0.25)
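The 4x figure can be checked numerically. This sketch uses the standard sigmoid derivative, sigmoid(z) * (1 - sigmoid(z)), and NumPy's built-in `np.tanh` as a reference; the grid size is an arbitrary choice that happens to include z = 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 49)                       # step 0.25, includes z = 0
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))   # peaks at z = 0
tanh_grad = 1.0 - np.tanh(z) ** 2                # peaks at z = 0

print("max sigmoid gradient:", sigmoid_grad.max())
print("max tanh gradient:   ", tanh_grad.max())
print("ratio:", tanh_grad.max() / sigmoid_grad.max())
```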

---

## Step 1: Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt

---

## Step 2: Implement Tanh Function

### 🔹 Line Explanation

#### 2.1 What the line does
Computes tanh using exponentials: (e^z - e^(-z)) / (e^z + e^(-z))

#### 2.2 Why it is used
Zero-centered output helps optimization. Max gradient is 1.0 (4x sigmoid's 0.25).

#### 2.3 When to use it
RNNs, LSTMs, when zero-centered output is important.

#### 2.4 Where to use it
NLP models, time series, sequence-to-sequence models.

#### 2.5-2.7 Usage and Output
```python
tanh(0)   # Returns 0 (center point)
tanh(2)   # Returns 0.964
tanh(-2)  # Returns -0.964
```

In [None]:
def tanh(z):
    """
    Tanh activation function from scratch.
    Formula: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
    Output: Always between -1 and 1
    """
    exp_z = np.exp(z)
    exp_neg_z = np.exp(-z)
    return (exp_z - exp_neg_z) / (exp_z + exp_neg_z)

In [None]:
# Test tanh
print("tanh(0) =", tanh(0))      # Expected: 0
print("tanh(2) =", tanh(2))      # Expected: ~0.964
print("tanh(-2) =", tanh(-2))    # Expected: ~-0.964
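One caveat with the direct formula: `np.exp` overflows to `inf` for float64 inputs beyond roughly 709, so `tanh(1000)` evaluates `inf / inf` and returns `nan` with a runtime warning. A possible workaround, offered as a sketch rather than this notebook's method, rewrites the formula in terms of exp(-2|z|) and restores the sign afterwards. The name `tanh_stable` is hypothetical.

```python
import numpy as np

def tanh_stable(z):
    """Tanh via exp(-2|z|), which never exceeds 1 and so cannot overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-2.0 * np.abs(z))               # at most 1; underflows to 0 for large |z|
    return np.sign(z) * (1.0 - e) / (1.0 + e)  # tanh is odd, so restore the sign

print(tanh_stable(np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])))
```

The result matches the direct formula for moderate inputs and saturates cleanly at -1 and 1 for extreme ones.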

---

## Step 3: Implement Tanh Derivative

### 🔹 Formula: tanh'(z) = 1 - tanh²(z)

Maximum gradient is 1.0 at z=0 (4x better than sigmoid's 0.25!)

In [None]:
def tanh_derivative(z):
    """
    Derivative of tanh function.
    Formula: tanh'(z) = 1 - tanh^2(z)
    Maximum: 1.0 at z=0
    """
    t = tanh(z)
    return 1 - t ** 2

In [None]:
# Test derivative
print("tanh_derivative(0) =", tanh_derivative(0))   # Expected: 1.0
print("tanh_derivative(2) =", tanh_derivative(2))   # Expected: ~0.07
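A quick way to gain confidence in an analytic derivative is to compare it against a central finite difference. The sketch below redefines `tanh` and `tanh_derivative` so it runs on its own; the step size `h` and the test grid are arbitrary choices.

```python
import numpy as np

def tanh(z):
    exp_z, exp_neg_z = np.exp(z), np.exp(-z)
    return (exp_z - exp_neg_z) / (exp_z + exp_neg_z)

def tanh_derivative(z):
    return 1 - tanh(z) ** 2

z = np.linspace(-3, 3, 13)
h = 1e-5
numeric = (tanh(z + h) - tanh(z - h)) / (2 * h)   # central difference
max_error = np.max(np.abs(numeric - tanh_derivative(z)))
print("max abs error:", max_error)
```

A small maximum error indicates the analytic formula and the numerical slope agree across the grid.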

---

## Step 4: Visualization

In [None]:
z_range = np.linspace(-6, 6, 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Tanh function
ax1.plot(z_range, tanh(z_range), 'g-', linewidth=2, label='Tanh')
ax1.axhline(y=0, color='red', linestyle=':', alpha=0.7)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax1.set_xlabel('Input (z)')
ax1.set_ylabel('Output')
ax1.set_title('Tanh Function (Zero-Centered!)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Derivative
ax2.plot(z_range, tanh_derivative(z_range), 'purple', linewidth=2, label='Derivative')
ax2.axhline(y=1.0, color='green', linestyle=':', alpha=0.7, label='Max=1.0')
ax2.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax2.set_xlabel('Input (z)')
ax2.set_ylabel('Gradient')
ax2.set_title('Tanh Derivative (Max=1.0, 4x Sigmoid!)')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
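import os
os.makedirs('outputs', exist_ok=True)  # savefig raises FileNotFoundError if the directory is missing
plt.savefig('outputs/tanh_combined.png', dpi=150)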
plt.show()

---

## Step 5: Numerical Analysis

In [None]:
test_inputs = np.array([-5, -2, -0.5, 0, 0.5, 2, 5])

print("TANH NUMERICAL ANALYSIS")
print("=" * 50)
print(f"{'Input':<10} {'Tanh':<15} {'Derivative':<15}")
print("-" * 40)

for z in test_inputs:
    print(f"{z:<10.1f} {tanh(z):<15.6f} {tanh_derivative(z):<15.6f}")

---

## Comparison: Tanh vs Sigmoid

| Property | Sigmoid | Tanh |
|----------|---------|------|
| Output Range | (0, 1) | (-1, 1) |
| Zero-centered | No | **Yes** |
| Max Gradient | 0.25 | **1.0** |
| Relationship | -- | tanh(z) = 2*sigmoid(2z) - 1 |
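The relationship in the last row can be verified numerically, and differentiating it gives tanh'(z) = 4 * sigmoid'(2z), which is where the 4x gradient ratio at z = 0 comes from. A minimal check, using `np.tanh` as the reference and an arbitrary input grid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 101)

# Identity: tanh(z) = 2*sigmoid(2z) - 1
lhs = np.tanh(z)
rhs = 2.0 * sigmoid(2.0 * z) - 1.0
print("identity holds:", np.allclose(lhs, rhs))

# Chain rule on the identity: tanh'(z) = 4 * sigmoid'(2z)
tanh_grad = 1.0 - np.tanh(z) ** 2
scaled_sigmoid_grad = 4.0 * sigmoid(2.0 * z) * (1.0 - sigmoid(2.0 * z))
print("derivative identity holds:", np.allclose(tanh_grad, scaled_sigmoid_grad))
```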

---

## 💼 Interview Key Points

1. **Tanh is zero-centered** - outputs span (-1, 1), which helps optimization
2. **Max gradient is 1.0** - 4x sigmoid's maximum of 0.25
3. **Still has vanishing gradient** - tanh'(z) drops below 0.01 for |z| > 3
4. **Use for RNNs/LSTMs** - modern CNNs usually use ReLU-family activations instead
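The vanishing-gradient point can be illustrated by evaluating the derivative at growing |z| and by multiplying the per-layer factor across a stack of layers. This is a rough sketch: it ignores the weight matrices that also enter backpropagation, and the depth of 10 is an arbitrary assumption.

```python
import numpy as np

z = np.array([0.0, 1.0, 2.0, 3.0, 5.0])
grads = 1.0 - np.tanh(z) ** 2
for zi, gi in zip(z, grads):
    print(f"|z| = {zi:.0f}: tanh'(z) = {gi:.6f}")

# Backprop multiplies one such factor per layer (weights ignored for simplicity)
depth = 10
per_layer = 1.0 - np.tanh(2.0) ** 2
print(f"product over {depth} layers at z = 2:", per_layer ** depth)
```

Even at a moderate pre-activation of z = 2, ten stacked factors shrink the gradient by many orders of magnitude.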