# Lab 3 - Module 1: Activation Functions – Bending Space

**Learning Objectives:**
- Describe what activation functions do to numbers ("squash", "clip", "zero out")
- See how applying activation functions to 2D coordinates warps/bends space
- Understand that activation functions are the "nonlinear ingredient" for flexible models
- Recognize that straight rules after warping can look curved in original space

**Time:** ~15-20 minutes

---

**From Module 0:** You saw that some patterns (XOR, circles) can't be separated by a single straight line.

**Today's Big Idea:** Activation functions **warp or reshape space** itself. This warping is the key ingredient that will later let us use simple straight-line rules to solve complex problems!

## Connection to Module 0

In **Module 0**, you discovered:
- A straight line works well for some datasets (like two separated clouds)
- But no single straight line can separate XOR or circular patterns

**Today:** We'll see how activation functions bend space so that simple rules become much more powerful.

**Important:** Activation functions alone don't "solve" these problems. But they provide the **nonlinear transformation** that makes solutions possible.

## 1. Setup: The Four Activation Functions

Before we see how activation functions bend 2D space, let's recall what they do to single numbers.

### The Four Functions:

1. **Step**: Jumps at 0 → outputs 0 or 1
2. **Sigmoid**: Smooth S-curve → outputs between 0 and 1
3. **Tanh**: Smooth S-curve → outputs between -1 and 1
4. **ReLU** (Rectified Linear Unit): Outputs 0 for negative inputs, keeps positive inputs as-is

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import Dropdown, interact, FloatSlider
from IPython.display import display

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # Clip to avoid overflow

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def step(x):
    # np.asarray lets this accept plain Python numbers as well as arrays; step(0) is 0
    return (np.asarray(x) > 0).astype(float)

print("✓ Activation functions loaded!")
print("\nKey behaviors:")
print("  • Sigmoid/Tanh: Compress large values into a narrow range")
print("  • ReLU: Zeros out negatives, keeps positives")
print("  • Step: Hard jump at zero")
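
As a quick sanity check on the behaviors listed above, the sketch below evaluates all four functions at a few inputs. It re-defines the functions so that it runs on its own; the sample inputs and the name `samples` are illustrative choices, not part of the lab.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def relu(x):
    return np.maximum(0, x)

def step(x):
    return (np.asarray(x) > 0).astype(float)

# Illustrative inputs: two large-magnitude values, two small ones, and zero
samples = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

for name, f in [("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu), ("step", step)]:
    print(f"{name:8s}", np.round(f(samples), 4))
```

The large-magnitude inputs are where sigmoid and tanh flatten out, while ReLU passes the positive value through unchanged.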

## 2. Explore Individual Activation Functions

Select an activation function to see:
- Its graph
- Example input → output pairs
- How it behaves at large positive/negative values

In [None]:
def show_1d_activation(activation_name):
    """
    Display 1D activation function with example values.
    """
    # Select function
    funcs = {
        'Sigmoid': sigmoid,
        'Tanh': tanh,
        'ReLU': relu,
        'Step': step
    }
    func = funcs[activation_name]
    
    # Create plot
    x = np.linspace(-5, 5, 200)
    y = func(x)
    
    fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
    ax.plot(x, y, 'purple', linewidth=3, label=f'{activation_name}(x)')
    ax.axhline(0, color='black', linewidth=0.8, alpha=0.3)
    ax.axvline(0, color='black', linewidth=0.8, alpha=0.3)
    ax.grid(True, alpha=0.3)
    ax.set_xlabel('Input (x)', fontsize=13)
    ax.set_ylabel('Output', fontsize=13)
    ax.set_title(f'{activation_name} Activation Function', fontsize=14, fontweight='bold')
    ax.legend(fontsize=12)
    plt.tight_layout()
    plt.show()
    
    # Example values
    test_inputs = [-3, -1, 0, 1, 3]
    print(f"\nExample Input → Output for {activation_name}:")
    print("="*50)
    for inp in test_inputs:
        out = func(np.array([inp]))[0]
        print(f"  {inp:4.1f} → {out:6.3f}")
    
    # Behavior description
    print(f"\nBehavior:")
    if activation_name == 'Sigmoid':
        print("  • Large negative → close to 0")
        print("  • Large positive → close to 1")
        print("  • Smooth S-curve")
    elif activation_name == 'Tanh':
        print("  • Large negative → close to -1")
        print("  • Large positive → close to +1")
        print("  • Smooth S-curve, centered at 0")
    elif activation_name == 'ReLU':
        print("  • Negative → 0")
        print("  • Positive → keeps value")
        print("  • Sharp corner at 0")
    else:  # Step
        print("  • Negative → 0")
        print("  • Positive → 1")
        print("  • Hard jump at 0")

# Interactive widget
interact(
    show_1d_activation,
    activation_name=Dropdown(
        options=['Sigmoid', 'Tanh', 'ReLU', 'Step'],
        value='Sigmoid',
        description='Activation:'
    )
);

## 3. Warping a 2D Grid – The Rubber Sheet

Now here's where it gets interesting!

### The Concept:

Imagine drawing a square grid on a **rubber sheet** (the x₁-x₂ plane).

When we apply an activation function to **both coordinates** (x₁ and x₂), it's like **stretching or compressing that rubber sheet** in different ways.

**What happens:**
- **Sigmoid**: Pulls the entire grid into (0,1) × (0,1); grid lines stay straight but bunch together near the edges
- **Tanh**: Pulls the grid into (-1,1) × (-1,1), with the same bunching near the edges
- **ReLU**: Negative half-planes collapse onto the axes
- **Step**: Every point collapses onto one of four corner points
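
To make the collapsing behavior concrete, here is a minimal sketch that applies ReLU and Step to one point from each quadrant. The helper `apply_elementwise` and the four sample points are assumptions made for illustration.

```python
import numpy as np

def apply_elementwise(points, f):
    """Apply f to each coordinate of an (n, 2) array of points."""
    return np.column_stack([f(points[:, 0]), f(points[:, 1])])

def relu(x):
    return np.maximum(0, x)

def step(x):
    return (x > 0).astype(float)

# One illustrative point per quadrant
points = np.array([[ 2.0,  1.0],   # both coordinates positive
                   [-2.0,  1.0],   # x1 negative
                   [ 2.0, -1.0],   # x2 negative
                   [-2.0, -1.0]])  # both negative

relu_out = apply_elementwise(points, relu)
step_out = apply_elementwise(points, step)

print("ReLU:\n", relu_out)
print("Step:\n", step_out)
```

Under ReLU, only the first point keeps both coordinates; the others land on an axis or at the origin. Under Step, each point lands on a corner of the unit square.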

In [None]:
def create_grid(n_lines=15, grid_range=3):
    """
    Create a regular 2D grid of lines.
    """
    lines_h = []  # Horizontal lines
    lines_v = []  # Vertical lines
    
    grid_points = np.linspace(-grid_range, grid_range, n_lines)
    n_points_per_line = 100
    
    for pos in grid_points:
        # Horizontal line at x2 = pos
        x1_h = np.linspace(-grid_range, grid_range, n_points_per_line)
        x2_h = np.full(n_points_per_line, pos)
        lines_h.append(np.column_stack([x1_h, x2_h]))
        
        # Vertical line at x1 = pos
        x1_v = np.full(n_points_per_line, pos)
        x2_v = np.linspace(-grid_range, grid_range, n_points_per_line)
        lines_v.append(np.column_stack([x1_v, x2_v]))
    
    return lines_h, lines_v

def apply_activation_2d(points, activation_func):
    """
    Apply activation function to both coordinates.
    """
    return np.column_stack([
        activation_func(points[:, 0]),
        activation_func(points[:, 1])
    ])

print("✓ Grid functions ready!")
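
Because the activation is applied to each coordinate separately, horizontal and vertical grid lines remain straight after warping; only their spacing changes. Lines that are not axis-aligned generally do bend. The sketch below checks this numerically for sigmoid. The helper `max_deviation_from_line` and the two test lines are illustrative assumptions, not part of the lab code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def warp(points):
    """Apply sigmoid to both coordinates of an (n, 2) array."""
    return np.column_stack([sigmoid(points[:, 0]), sigmoid(points[:, 1])])

def max_deviation_from_line(points):
    """Largest perpendicular distance from the chord joining the first and last point."""
    p0, p1 = points[0], points[-1]
    d = (p1 - p0) / np.linalg.norm(p1 - p0)
    rel = points - p0
    # The 2D cross product with the unit direction gives the perpendicular distance
    return np.max(np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0]))

x1 = np.linspace(-3, 3, 100)
horizontal = np.column_stack([x1, np.full_like(x1, 1.0)])  # x2 = 1 (a grid line)
slanted = np.column_stack([x1, 0.5 * x1 + 1.0])            # x2 = 0.5*x1 + 1

dev_h = max_deviation_from_line(warp(horizontal))
dev_s = max_deviation_from_line(warp(slanted))

print(f"horizontal line deviation after warping: {dev_h:.4f}")
print(f"slanted line deviation after warping:    {dev_s:.4f}")
```

A deviation of zero means the warped line is still straight; a clearly positive value means it has become a curve.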

## 4. Interactive: Watch the Grid Warp

**Instructions:**
1. Select an activation function from the dropdown
2. Compare the LEFT plot (original) to the RIGHT plot (warped)
3. Notice what happens to:
   - Grid lines (do they stay straight? does the spacing between them stay even?)
   - The corners of the grid
   - Points that were far from the origin

In [None]:
def visualize_grid_warping(activation_name):
    """
    Show original grid and warped grid side-by-side.
    """
    # Select function
    funcs = {
        'Sigmoid': sigmoid,
        'Tanh': tanh,
        'ReLU': relu,
        'Step': step
    }
    func = funcs[activation_name]
    
    # Create grid
    lines_h, lines_v = create_grid(n_lines=15, grid_range=3)
    
    # Create figure
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7), dpi=100)
    
    # LEFT: Original grid
    for line in lines_h:
        ax1.plot(line[:, 0], line[:, 1], 'blue', alpha=0.5, linewidth=1)
    for line in lines_v:
        ax1.plot(line[:, 0], line[:, 1], 'blue', alpha=0.5, linewidth=1)
    
    ax1.set_xlabel('x₁', fontsize=13)
    ax1.set_ylabel('x₂', fontsize=13)
    ax1.set_title('ORIGINAL Space (Before Activation)', fontsize=13, fontweight='bold')
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal')
    ax1.set_xlim(-3.5, 3.5)
    ax1.set_ylim(-3.5, 3.5)
    ax1.axhline(0, color='black', linewidth=1, alpha=0.3)
    ax1.axvline(0, color='black', linewidth=1, alpha=0.3)
    
    # RIGHT: Warped grid
    for line in lines_h:
        line_warped = apply_activation_2d(line, func)
        ax2.plot(line_warped[:, 0], line_warped[:, 1], 'red', alpha=0.6, linewidth=1.5)
    for line in lines_v:
        line_warped = apply_activation_2d(line, func)
        ax2.plot(line_warped[:, 0], line_warped[:, 1], 'red', alpha=0.6, linewidth=1.5)
    
    ax2.set_xlabel(f'{activation_name}(x₁)', fontsize=13)
    ax2.set_ylabel(f'{activation_name}(x₂)', fontsize=13)
    ax2.set_title(f'WARPED Space (After {activation_name})', fontsize=13, fontweight='bold')
    ax2.grid(True, alpha=0.3)
    ax2.set_aspect('equal')
    
    # Set appropriate limits based on activation
    if activation_name == 'Sigmoid':
        ax2.set_xlim(-0.1, 1.1)
        ax2.set_ylim(-0.1, 1.1)
    elif activation_name == 'Tanh':
        ax2.set_xlim(-1.2, 1.2)
        ax2.set_ylim(-1.2, 1.2)
    elif activation_name == 'Step':
        ax2.set_xlim(-0.2, 1.2)
        ax2.set_ylim(-0.2, 1.2)
    else:  # ReLU
        ax2.set_xlim(-0.5, 3.5)
        ax2.set_ylim(-0.5, 3.5)
    
    ax2.axhline(0, color='black', linewidth=1, alpha=0.3)
    ax2.axvline(0, color='black', linewidth=1, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Observations
    print(f"\nKey Observations for {activation_name}:")
    print("="*70)
    if activation_name == 'Sigmoid':
        print("  • Entire grid compressed into (0,1) × (0,1)")
        print("  • Grid lines stay straight, but bunch together near the edges")
        print("  • Far-away points all squeezed near edges (0 or 1)")
    elif activation_name == 'Tanh':
        print("  • Grid compressed into (-1,1) × (-1,1)")
        print("  • Similar to sigmoid but centered at zero")
        print("  • Grid lines stay straight, but bunch together near the edges")
    elif activation_name == 'ReLU':
        print("  • Three quadrants collapse onto the axes")
        print("  • Only positive quadrant (x₁>0, x₂>0) stays spread out")
        print("  • Sharp corner at origin")
    else:  # Step
        print("  • Every point collapses to one of 4 corners: (0,0), (0,1), (1,0), (1,1)")
        print("  • The red segments only connect those corners; no points lie between them")
        print("  • Extreme compression!")

# Interactive widget
interact(
    visualize_grid_warping,
    activation_name=Dropdown(
        options=['Sigmoid', 'Tanh', 'ReLU', 'Step'],
        value='Sigmoid',
        description='Activation:'
    )
);
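
The amount of collapsing can also be counted directly. The sketch below is a rough check on a 60 × 60 grid of points (an arbitrary choice with no point exactly at zero): it counts how many distinct outputs Step produces and what fraction of points ReLU sends onto an axis. The variable names are illustrative.

```python
import numpy as np

# 60 points per axis, symmetric around zero, so none is exactly 0
x = np.linspace(-3, 3, 60)
X1, X2 = np.meshgrid(x, x)
grid = np.column_stack([X1.ravel(), X2.ravel()])  # 3600 points

step_pts = (grid > 0).astype(float)
relu_pts = np.maximum(0, grid)

# Number of distinct locations after Step
n_step_unique = len(np.unique(step_pts, axis=0))

# Fraction of points with at least one coordinate zeroed out by ReLU
on_axis = np.mean((relu_pts[:, 0] == 0) | (relu_pts[:, 1] == 0))

print("distinct points after Step:", n_step_unique)
print("fraction on an axis after ReLU:", on_axis)
```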

## 5. Straight Rules After Warping

Now for the **powerful insight**:

Suppose after applying an activation, we classify points using a very simple rule:

**"activated_x₁ + activated_x₂ > threshold"**

- That rule is **linear** (straight line) in the activated space
- But because the activation **bent the coordinates**, the same boundary looks **curved** in the original space (or, for ReLU, bent at sharp corners)!

This is the key: **simple rules after warping = flexible boundaries before warping**.
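
For sigmoid, the boundary can be written out explicitly. Setting sigmoid(x₁) + sigmoid(x₂) = threshold and solving for x₂ gives x₂ = logit(threshold − sigmoid(x₁)), where logit is the inverse of sigmoid. The sketch below traces that curve for a threshold of 0.5 and checks that its slope is not constant. The helper `logit` and the chosen range of x₁ are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    """Inverse of sigmoid, defined for 0 < p < 1."""
    return np.log(p / (1 - p))

threshold = 0.5

# Keep x1 where threshold - sigmoid(x1) stays strictly between 0 and 1
x1 = np.linspace(-3.0, -0.5, 6)
x2 = logit(threshold - sigmoid(x1))

# Slope between consecutive boundary points; a straight line would give equal slopes
slopes = np.diff(x2) / np.diff(x1)

for a, b in zip(x1, x2):
    print(f"x1 = {a:5.2f}  ->  x2 = {b:6.3f}")
print("slopes:", np.round(slopes, 3))
```

One special case is worth knowing: with a threshold of exactly 1.0, the sigmoid boundary reduces to x₂ = −x₁, which is a straight line in the original space too.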

In [None]:
def linear_rule_after_activation(points, threshold):
    """
    Apply rule: activated_x1 + activated_x2 > threshold
    """
    return points[:, 0] + points[:, 1] > threshold

def visualize_curved_boundary(activation_name, threshold):
    """
    Show how a linear rule in activated space looks curved in original space.
    """
    # Select function
    funcs = {
        'Sigmoid': sigmoid,
        'Tanh': tanh,
        'ReLU': relu
    }
    
    if activation_name not in funcs:
        print("Step creates too much compression - try Sigmoid, Tanh, or ReLU!")
        return
    
    func = funcs[activation_name]
    
    # Create a grid of points
    x1 = np.linspace(-3, 3, 60)
    x2 = np.linspace(-3, 3, 60)
    X1, X2 = np.meshgrid(x1, x2)
    points_original = np.column_stack([X1.ravel(), X2.ravel()])
    
    # Apply activation
    points_activated = apply_activation_2d(points_original, func)
    
    # Apply linear rule in activated space
    classification = linear_rule_after_activation(points_activated, threshold)
    
    # Create figure
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7), dpi=100)
    
    # LEFT: Original space
    ax1.scatter(points_original[classification, 0], 
               points_original[classification, 1],
               c='red', s=10, alpha=0.3, label='Class A')
    ax1.scatter(points_original[~classification, 0], 
               points_original[~classification, 1],
               c='blue', s=10, alpha=0.3, label='Class B')
    
    ax1.set_xlabel('x₁', fontsize=13)
    ax1.set_ylabel('x₂', fontsize=13)
    ax1.set_title('ORIGINAL Space: Boundary Looks Curved!', fontsize=13, fontweight='bold')
    ax1.legend(fontsize=11, markerscale=3)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal')
    ax1.set_xlim(-3, 3)
    ax1.set_ylim(-3, 3)
    
    # RIGHT: Activated space
    ax2.scatter(points_activated[classification, 0], 
               points_activated[classification, 1],
               c='red', s=10, alpha=0.3, label='Class A')
    ax2.scatter(points_activated[~classification, 0], 
               points_activated[~classification, 1],
               c='blue', s=10, alpha=0.3, label='Class B')
    
    # Draw the decision boundary line
    if activation_name == 'Sigmoid':
        y1_line = np.linspace(0, 1, 100)
        xlim, ylim = (0, 1), (0, 1)
    elif activation_name == 'Tanh':
        y1_line = np.linspace(-1, 1, 100)
        xlim, ylim = (-1, 1), (-1, 1)
    else:  # ReLU
        y1_line = np.linspace(0, 3, 100)
        xlim, ylim = (0, 3), (0, 3)
    
    y2_line = threshold - y1_line
    ax2.plot(y1_line, y2_line, 'green', linewidth=3, label=f'Rule: y₁+y₂={threshold:.1f}')
    
    ax2.set_xlabel(f'{activation_name}(x₁)', fontsize=13)
    ax2.set_ylabel(f'{activation_name}(x₂)', fontsize=13)
    ax2.set_title('ACTIVATED Space: Boundary is Straight!', fontsize=13, fontweight='bold')
    ax2.legend(fontsize=11, markerscale=3)
    ax2.grid(True, alpha=0.3)
    ax2.set_aspect('equal')
    ax2.set_xlim(xlim)
    ax2.set_ylim(ylim)
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nKey Insight:")
    print("="*70)
    print(f"  RIGHT: Boundary is STRAIGHT in activated space")
    print(f"  LEFT: Same boundary looks CURVED in original space")
    print(f"\n  Because {activation_name} warped the space!")
    print(f"  Simple linear rule after activation = flexible boundary before")

# Interactive widget
interact(
    visualize_curved_boundary,
    activation_name=Dropdown(
        options=['Sigmoid', 'Tanh', 'ReLU'],
        value='Sigmoid',
        description='Activation:'
    ),
    threshold=FloatSlider(
        min=-0.5, max=2.0, step=0.1, value=0.5,
        description='Threshold:',
        continuous_update=False
    )
);
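
With ReLU the boundary in the original space is made of straight pieces joined at corners rather than a smooth curve. The sketch below probes the rule on both sides of each piece for a threshold of 1.0. The function `relu_rule` and the probe points are hypothetical names and values chosen for this illustration.

```python
import numpy as np

def relu_rule(x1, x2, threshold):
    """Linear rule applied after ReLU: relu(x1) + relu(x2) > threshold."""
    return np.maximum(0, x1) + np.maximum(0, x2) > threshold

threshold = 1.0

# Where x1 < 0, only x2 matters: the boundary is the horizontal line x2 = threshold
left = relu_rule(np.array([-2.0, -2.0]), np.array([0.9, 1.1]), threshold)

# Where x2 < 0, only x1 matters: the boundary is the vertical line x1 = threshold
below = relu_rule(np.array([0.9, 1.1]), np.array([-2.0, -2.0]), threshold)

# Where both are positive, the boundary is the diagonal x1 + x2 = threshold
quadrant = relu_rule(np.array([0.4, 0.6]), np.array([0.5, 0.5]), threshold)

print(left, below, quadrant)
```

Each pair of probes straddles one piece of the boundary, so each result flips from `False` to `True`.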

## Questions for Your Answer Sheet

**Q4.** In your own words, what happens to very large positive and very large negative inputs for sigmoid and tanh?

**Q5.** Which activation function changes most rapidly near x = 0? How can you tell from the graph?

**Q6.** For the sigmoid activation, what happens to points that were far away from the origin (large |x₁| or |x₂|)? Where do they end up in the activated space?

**Q7.** For ReLU, what happens to points where x₁ or x₂ is negative? What shape do you see in the activated space?

**Q8.** Which activation warps the grid the most (makes it look least like a square), and how can you tell?

**Q9.** When you look at the original space (left plot in Section 5), does the boundary between the two colors look straight or curved?

**Q10.** In the activated space (right plot in Section 5), what does the boundary look like?

**Q11.** Explain, in one or two sentences, how activation functions help us build more flexible decision rules even if the rule itself is linear after activation.

## Next Steps

1. **Answer Q4-Q11** on your answer sheet
2. **Return to the LMS** and continue to Module 2
3. In Module 2, you'll learn more details about each activation function and their properties!