# Activation Functions Comparison

## Question 14: Compare Activation Functions Mathematically and Visually

---

### 🧩 Problem Statement

Compare Sigmoid, Tanh, and ReLU side-by-side to understand:
- When to use each
- Vanishing gradient problem
- Trade-offs between them

### Summary Table

| Function | Formula | Max Gradient | Use Case |
|----------|---------|--------------|----------|
| Sigmoid | 1/(1+e^-z) | 0.25 | Binary output |
| Tanh | (e^z-e^-z)/(e^z+e^-z) | 1.0 | RNNs |
| ReLU | max(0, z) | 1.0 for z > 0 (0 for z < 0) | Hidden layers |
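The Max Gradient column follows from the derivative formulas implemented in Step 1. Written out:

```latex
\begin{aligned}
\sigma'(z) &= \sigma(z)\,\bigl(1-\sigma(z)\bigr) \le \tfrac{1}{4}, && \text{equality at } z = 0 \text{ since } \sigma(0) = \tfrac{1}{2} \\
\tanh'(z) &= 1-\tanh^{2}(z) \le 1, && \text{equality at } z = 0 \text{ since } \tanh(0) = 0 \\
\operatorname{ReLU}'(z) &= \begin{cases} 1, & z > 0 \\ 0, & z < 0 \end{cases} && \text{undefined at } z = 0
\end{aligned}
```

The sigmoid bound holds because the product s(1 − s) of two numbers that sum to 1 is largest when both equal 1/2. Step 1's `relu_derivative` assigns 0 at z = 0 by convention.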

---

## Step 1: Import and Define All Functions

In [None]:
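import os

import numpy as np
import matplotlib.pyplot as plt

# Make sure the directory used by plt.savefig() in Steps 2-3 exists
os.makedirs('outputs', exist_ok=True)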

# Sigmoid
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

# Tanh: (e^z - e^-z) / (e^z + e^-z)
def tanh(z):
    # np.tanh evaluates the same formula but stays finite for large |z|,
    # where computing exp(z) explicitly overflows to inf / inf = nan
    return np.tanh(z)

def tanh_derivative(z):
    t = tanh(z)
    return 1 - t ** 2

# ReLU
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    # ReLU is not differentiable at z = 0; by convention this returns 0 there
    return np.where(z > 0, 1, 0).astype(float)
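A quick way to sanity-check the derivative functions above is to compare them with a central finite difference, (f(z + h) − f(z − h)) / 2h. The sketch below re-declares the Step 1 functions so it runs on its own (using `np.tanh` for tanh); the helper `numerical_derivative` and the step size h = 1e-5 are illustrative choices, not part of the original notebook. The test points avoid z = 0, where ReLU has no derivative.

```python
import numpy as np

# Re-declared from Step 1 so this cell is self-contained
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return np.where(z > 0, 1, 0).astype(float)

def numerical_derivative(f, z, h=1e-5):
    """Hypothetical helper: central finite difference (f(z+h) - f(z-h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2 * h)

# Skip z = 0, where ReLU is not differentiable
z = np.array([-4.0, -1.5, -0.3, 0.3, 1.5, 4.0])

checks = {
    "sigmoid": (sigmoid, sigmoid_derivative),
    "tanh": (np.tanh, tanh_derivative),
    "relu": (relu, relu_derivative),
}

max_errors = {}
for name, (f, df) in checks.items():
    max_errors[name] = np.max(np.abs(numerical_derivative(f, z) - df(z)))
    print(f"{name:<8} max |numerical - analytic| = {max_errors[name]:.2e}")
```

Errors far below the derivative values themselves indicate the analytic formulas are implemented correctly.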

---

## Step 2: Plot All Functions on Same Graph

In [None]:
z_range = np.linspace(-6, 6, 200)

plt.figure(figsize=(12, 6))
plt.plot(z_range, sigmoid(z_range), 'b-', linewidth=2, label='Sigmoid')
plt.plot(z_range, tanh(z_range), 'g-', linewidth=2, label='Tanh')
plt.plot(z_range, relu(z_range), 'r-', linewidth=2, label='ReLU')

plt.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
plt.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

plt.xlabel('Input (z)', fontsize=12)
plt.ylabel('Output', fontsize=12)
plt.title('Comparison of Activation Functions', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.ylim(-1.5, 6)

plt.savefig('outputs/all_activations.png', dpi=150)
plt.show()
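In this plot, tanh looks like a stretched and shifted sigmoid, and that is exact: tanh(z) = 2σ(2z) − 1. This identity explains why tanh is zero-centered (range (−1, 1) instead of (0, 1)) and why its slope at 0 is 4 × 0.25 = 1. A short numerical check, re-declaring `sigmoid` so the cell runs on its own:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-6, 6, 200)

# tanh(z) = 2 * sigmoid(2z) - 1: tanh is a sigmoid rescaled to (-1, 1)
max_gap = np.max(np.abs(np.tanh(z) - (2 * sigmoid(2 * z) - 1)))
print(f"max |tanh(z) - (2*sigmoid(2z) - 1)| = {max_gap:.2e}")

# Zero-centering: over a symmetric input range, tanh outputs average to ~0,
# while sigmoid outputs average to ~0.5
print(f"mean sigmoid output: {sigmoid(z).mean():.3f}")
print(f"mean tanh output:    {np.tanh(z).mean():.3f}")
```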

---

## Step 3: Plot All Derivatives (Gradients)

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(z_range, sigmoid_derivative(z_range), 'b-', linewidth=2, label="Sigmoid' (max=0.25)")
plt.plot(z_range, tanh_derivative(z_range), 'g-', linewidth=2, label="Tanh' (max=1.0)")
plt.plot(z_range, relu_derivative(z_range), 'r-', linewidth=2, label="ReLU' (0 or 1)")

plt.axhline(y=0.25, color='blue', linestyle=':', alpha=0.5)
plt.axhline(y=1.0, color='gray', linestyle=':', alpha=0.5)
plt.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

plt.xlabel('Input (z)', fontsize=12)
plt.ylabel('Gradient', fontsize=12)
plt.title('Derivatives Comparison - Shows Vanishing Gradient!', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)

plt.savefig('outputs/all_derivatives.png', dpi=150)
plt.show()
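The derivative plot matters because of depth: by the chain rule, backpropagating through n layers multiplies n activation derivatives together (along with the weights). The sketch below isolates the activation part under a deliberately simplified, hypothetical setup: every layer sees the same pre-activation z and all weights equal 1. Even tanh, whose peak derivative is 1, shrinks quickly once z moves away from 0.

```python
import numpy as np

def sigmoid_derivative(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

def chain_factor(derivative, z, depth):
    """Activation part of a gradient backpropagated through `depth` layers.

    Simplifying assumption: every layer sees the same pre-activation z and
    all weights equal 1, so the factor is derivative(z) ** depth.
    """
    return derivative(z) ** depth

depths = [1, 5, 10, 20]
for z in [0.5, 2.0]:
    print(f"Every layer at z = {z}:")
    for name, d in [("sigmoid", sigmoid_derivative),
                    ("tanh", tanh_derivative),
                    ("relu", relu_derivative)]:
        row = "  ".join(f"{chain_factor(d, z, n):.1e}" for n in depths)
        print(f"  {name:<8} depth {depths}: {row}")
```

Real networks also multiply by weight matrices, so this is an illustration of the trend, not a bound on actual gradients.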

---

## Step 4: Complete Numerical Comparison

In [None]:
test_inputs = np.array([-5, -2, -0.5, 0, 0.5, 2, 5])

print("COMPLETE NUMERICAL COMPARISON")
print("=" * 80)
print(f"{'Input':<8} {'Sigmoid':<10} {'Sig.Grad':<10} {'Tanh':<10} {'Tanh.Grad':<10} {'ReLU':<8} {'ReLU.Grad':<10}")
print("-" * 78)

for z in test_inputs:
    print(f"{z:<8.1f} {sigmoid(z):<10.4f} {sigmoid_derivative(z):<10.4f} "
          f"{tanh(z):<10.4f} {tanh_derivative(z):<10.4f} {relu(z):<8.1f} {relu_derivative(z):<10.1f}")

---

## Step 5: Gradient Analysis at Key Points

In [None]:
print("GRADIENT COMPARISON AT KEY POINTS")
print("=" * 60)

for z in [-2, 0, 2, 5]:
    print(f"\nAt z = {z}:")
    print(f"  Sigmoid: {sigmoid_derivative(z):.6f}")
    print(f"  Tanh:    {tanh_derivative(z):.6f}")
    print(f"  ReLU:    {relu_derivative(z):.1f}")

---

## Key Insights

### At z = 5 (far from center):
- Sigmoid gradient: **0.0066** (vanishing)
- Tanh gradient: **0.0002** (vanishing, even faster than sigmoid here)
- ReLU gradient: **1.0** (does not saturate for positive inputs)

### This non-saturating gradient is a key reason ReLU made deep networks much easier to train

---

## Decision Guide

| Need | Use |
|------|-----|
| Binary classification output | Sigmoid |
| Multi-class output | Softmax |
| Hidden layers (default) | ReLU |
| RNNs/LSTMs | Tanh |
| Dead neurons problem | LeakyReLU |
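LeakyReLU replaces ReLU's flat zero for negative inputs with a small slope α, so negative inputs still pass a gradient back. A sketch, assuming α = 0.01 (a common choice and the default `negative_slope` of PyTorch's `torch.nn.LeakyReLU`; other values are possible):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # alpha: assumed slope for non-positive inputs
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    # 1 for positive inputs, alpha otherwise (z = 0 assigned alpha by convention)
    return np.where(z > 0, 1.0, alpha)

def relu_derivative(z):
    return np.where(z > 0, 1, 0).astype(float)

z = np.array([-5.0, -2.0, -0.5, 0.5, 2.0, 5.0])
print("ReLU'      :", relu_derivative(z))
print("LeakyReLU' :", leaky_relu_derivative(z))
```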

---

## 💼 Interview Summary

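1. **Sigmoid**: max grad 0.25 (at z = 0); mainly used for binary classification outputs
2. **Tanh**: max grad 1.0 (at z = 0), zero-centered; common in RNNs/LSTMs
3. **ReLU**: grad = 1 for every positive input (0 for negative inputs); this non-saturating gradient made training deep networks much more practical
4. **Trade-off**: Sigmoid/Tanh saturate (vanishing gradients); ReLU can produce dead neurons
5. **Default choice**: ReLU for hidden layers (in practice, vanishing gradients hurt deep networks more than dead neurons do, and LeakyReLU can mitigate dead neurons)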
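The dead-neuron side of the trade-off can be sketched with a single ReLU unit. In this hypothetical setup the unit's bias has been pushed far negative (for example by a large update), so its pre-activation is negative for every input in the batch. Its gradient is then exactly zero and gradient descent cannot revive it, while LeakyReLU still produces a small gradient. The batch, weights, bias, and the upstream gradient of 1 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))    # hypothetical batch: 256 samples, 3 features
w = np.array([0.5, -0.3, 0.8])   # hypothetical weights
b = -10.0                        # hypothetical, strongly negative bias

z = x @ w + b                    # pre-activations of one unit

def relu_derivative(z):
    return np.where(z > 0, 1, 0).astype(float)

def leaky_relu_derivative(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

# dLoss/db for this unit, assuming dLoss/d(output) = 1 for every sample
grad_b_relu = relu_derivative(z).sum()
grad_b_leaky = leaky_relu_derivative(z).sum()

print(f"fraction of inputs with z > 0: {np.mean(z > 0):.2f}")
print(f"bias gradient, ReLU:      {grad_b_relu}")
print(f"bias gradient, LeakyReLU: {grad_b_leaky:.2f}")
```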