# Lab 3 - Module 1: How Activation Functions Separate Data

**Learning Objectives:**
- See how activation functions transform mixed data into separated classes
- Understand that activation functions "squash" or "compress" values into specific ranges
- Build intuition for why activation functions help with classification

**Time:** ~15 minutes

---

**From Module 0:** You discovered that some patterns cannot be separated by straight lines.

**Today's Big Idea:** Activation functions can **transform** numbers so that values pile up near the ends of a fixed range, which makes mixed-up classes easier to tell apart!

## 1. Setup: A Simple Example

Let's start with a simple scenario:
- You have a bunch of numbers (they could be distances, scores, measurements, etc.)
- Some numbers should be classified as **Class 0** (blue)
- Other numbers should be classified as **Class 1** (red)
- But the values overlap - they're mixed together!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import Dropdown, interact

# Set random seed
np.random.seed(42)

# Create some mixed data
# Class 0: centered around -2
# Class 1: centered around +2
# But they overlap!
n_per_class = 40

values_class0 = np.random.randn(n_per_class) * 1.2 - 2.0  # Mean -2, some spread
values_class1 = np.random.randn(n_per_class) * 1.2 + 2.0  # Mean +2, some spread

# Combine
values = np.concatenate([values_class0, values_class1])
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

print("✓ Data created!")
print(f"\n{n_per_class} values for Class 0 (blue) - centered around -2")
print(f"{n_per_class} values for Class 1 (red) - centered around +2")
print("\nBut they OVERLAP in the middle - some blue and red values are mixed!")

## 2. The Problem: Overlapping Values

Let's visualize these values on a number line.

In [None]:
# Plot the values on a number line
fig, ax = plt.subplots(figsize=(14, 4), dpi=100)

# Plot Class 0 (blue)
ax.scatter(values_class0, np.zeros(n_per_class), c='blue', s=100, alpha=0.6, 
          label='Class 0 (Blue)', edgecolors='k', linewidths=1.5)

# Plot Class 1 (red)
ax.scatter(values_class1, np.zeros(n_per_class), c='red', s=100, alpha=0.6, 
          label='Class 1 (Red)', edgecolors='k', linewidths=1.5)

ax.axhline(0, color='black', linewidth=1, alpha=0.3)
ax.axvline(0, color='green', linewidth=2, linestyle='--', alpha=0.5, label='Threshold at 0')

ax.set_xlabel('Value', fontsize=13)
ax.set_ylabel('')
ax.set_title('Original Values: Classes Overlap!', fontsize=14, fontweight='bold')
ax.set_yticks([])
ax.legend(fontsize=11, loc='upper right')
ax.grid(True, alpha=0.3, axis='x')
ax.set_xlim(-6, 6)

plt.tight_layout()
plt.show()

# Calculate how many are on the "wrong" side of zero
wrong_class0 = np.sum(values_class0 > 0)  # Blue points that are positive
wrong_class1 = np.sum(values_class1 < 0)  # Red points that are negative
total_wrong = wrong_class0 + wrong_class1
accuracy = (len(values) - total_wrong) / len(values) * 100

print("\nIf we use a simple threshold at 0:")
print(f"  • {wrong_class0} blue points are on the wrong side (positive)")
print(f"  • {wrong_class1} red points are on the wrong side (negative)")
print(f"  • Accuracy: {accuracy:.1f}%")
print("\n⚠️ The classes overlap - we can't perfectly separate them with just a threshold!")

## 3. Activation Functions: The Transformation Tools

Now let's see what happens when we apply **activation functions** to these values.

We'll use a **sigmoid function** - one of the most common activation functions:
- It takes any number (negative, positive, huge, tiny)
- It **squashes** it into the range 0 to 1 (never exactly 0 or 1)
- Negative numbers → below 0.5, approaching 0 as they get more negative
- Positive numbers → above 0.5, approaching 1 as they get larger
- Zero → exactly 0.5
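
To make these properties concrete, here is a minimal sketch that evaluates sigmoid at a few inputs. The sample inputs are hypothetical values picked only for illustration, and the names `sample_inputs` and `sample_outputs` are not part of the lab code.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: maps any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-x))

# Hypothetical sample inputs, chosen only for illustration
sample_inputs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
sample_outputs = sigmoid(sample_inputs)

for x, y in zip(sample_inputs, sample_outputs):
    print(f"sigmoid({x:+.1f}) = {y:.4f}")
```

The outputs rise from roughly 0.0025 at -6 to roughly 0.9975 at +6, passing through exactly 0.5 at zero. Notice that an input of -2 gives about 0.12, not 0: moderate inputs land in between, and only large ones get pushed to the ends.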

In [None]:
def sigmoid(x):
    """Sigmoid activation function: squashes values to range (0, 1)"""
    return 1 / (1 + np.exp(-x))

def step(x):
    """Step function: hard threshold at 0"""
    return (x > 0).astype(float)

def tanh_activation(x):
    """Tanh activation: squashes to range (-1, 1)"""
    return np.tanh(x)

def relu(x):
    """ReLU: keeps positive, zeros negative"""
    return np.maximum(0, x)

# Show what sigmoid looks like
x_plot = np.linspace(-6, 6, 200)
y_sigmoid = sigmoid(x_plot)

fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
ax.plot(x_plot, y_sigmoid, 'purple', linewidth=3, label='Sigmoid(x)')
ax.axhline(0, color='black', linewidth=1, alpha=0.3)
ax.axhline(1, color='black', linewidth=1, alpha=0.3, linestyle='--')
ax.axhline(0.5, color='green', linewidth=2, linestyle='--', alpha=0.5, label='Middle (0.5)')
ax.axvline(0, color='black', linewidth=1, alpha=0.3)

ax.set_xlabel('Input Value (x)', fontsize=13)
ax.set_ylabel('Sigmoid Output', fontsize=13)
ax.set_title('Sigmoid Function: Squashes Everything to 0-1 Range', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
ax.set_xlim(-6, 6)
ax.set_ylim(-0.1, 1.1)

plt.tight_layout()
plt.show()

print("\nSigmoid Key Properties:")
print("  • Large negative inputs → Output ≈ 0")
print("  • Large positive inputs → Output ≈ 1")
print("  • Input = 0 → Output = 0.5 (middle)")
print("  • Smooth S-curve shape")

## 4. Apply Sigmoid: Watch the Separation Happen!

Now let's apply sigmoid to our mixed-up values and see what happens.
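
One thing to watch for when you compare the before and after accuracy: sigmoid is strictly increasing, so it never changes the order of the values. The small check below is a sketch on hypothetical values and labels (`demo_values` and `demo_labels` are made up, not the lab's data). It suggests that a threshold at 0.5 after sigmoid picks out the same points as a threshold at 0 before it.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: maps any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-x))

# Hypothetical values and labels for illustration (not the lab's data)
demo_values = np.array([-3.0, -0.5, 0.4, -1.2, 2.5, 0.8])
demo_labels = np.array([0, 0, 0, 1, 1, 1])

# Threshold the raw values at 0, and the sigmoid outputs at 0.5
pred_raw = (demo_values > 0).astype(int)
pred_sig = (sigmoid(demo_values) > 0.5).astype(int)

acc_raw = np.mean(pred_raw == demo_labels) * 100
acc_sig = np.mean(pred_sig == demo_labels) * 100

print("Same predictions:", np.array_equal(pred_raw, pred_sig))
print(f"Accuracy on raw values:     {acc_raw:.1f}%")
print(f"Accuracy on sigmoid output: {acc_sig:.1f}%")
```

So what changes is the spacing of the points, not which side of the threshold they fall on: values far from zero get packed tightly near 0 or 1.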

In [None]:
# Apply sigmoid to all values
values_transformed = sigmoid(values)
values_class0_transformed = sigmoid(values_class0)
values_class1_transformed = sigmoid(values_class1)

# Create side-by-side comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5), dpi=100)

# LEFT: Before (original values)
ax1.scatter(values_class0, np.zeros(n_per_class), c='blue', s=100, alpha=0.6, 
           label='Class 0 (Blue)', edgecolors='k', linewidths=1.5)
ax1.scatter(values_class1, np.zeros(n_per_class), c='red', s=100, alpha=0.6, 
           label='Class 1 (Red)', edgecolors='k', linewidths=1.5)
ax1.axvline(0, color='green', linewidth=2, linestyle='--', alpha=0.5, label='Threshold')
ax1.axhline(0, color='black', linewidth=1, alpha=0.3)

ax1.set_xlabel('Original Value', fontsize=13)
ax1.set_title('BEFORE: Original Values (Overlapping)', fontsize=13, fontweight='bold')
ax1.set_yticks([])
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3, axis='x')
ax1.set_xlim(-6, 6)

# RIGHT: After sigmoid
ax2.scatter(values_class0_transformed, np.zeros(n_per_class), c='blue', s=100, alpha=0.6, 
           label='Class 0 (Blue)', edgecolors='k', linewidths=1.5)
ax2.scatter(values_class1_transformed, np.zeros(n_per_class), c='red', s=100, alpha=0.6, 
           label='Class 1 (Red)', edgecolors='k', linewidths=1.5)
ax2.axvline(0.5, color='green', linewidth=2, linestyle='--', alpha=0.5, label='Threshold at 0.5')
ax2.axhline(0, color='black', linewidth=1, alpha=0.3)

ax2.set_xlabel('After Sigmoid', fontsize=13)
ax2.set_title('AFTER: Sigmoid Applied (Better Separated!)', fontsize=13, fontweight='bold')
ax2.set_yticks([])
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3, axis='x')
ax2.set_xlim(-0.1, 1.1)

plt.tight_layout()
plt.show()

# Calculate new accuracy
predicted = (values_transformed > 0.5).astype(int)
accuracy_after = np.mean(predicted == labels) * 100

print("\nSeparation Results:")
print(f"  BEFORE sigmoid: {accuracy:.1f}% accuracy")
print(f"  AFTER sigmoid:  {accuracy_after:.1f}% accuracy")
print("\n✓ The sigmoid function compressed the values toward the two ends of the 0-1 range!")
print("  • Blue points → mostly below 0.5")
print("  • Red points → mostly above 0.5")
print("  • Accuracy is unchanged: sigmoid keeps the order of the values,")
print("    so a threshold at 0.5 after sigmoid matches a threshold at 0 before")

## 5. Interactive: Try Different Activation Functions

Let's compare how different activation functions separate the same data.

**Try these:**
- **Sigmoid**: Smooth squashing to 0-1
- **Step**: Hard cut at 0 (binary 0 or 1)
- **Tanh**: Smooth squashing to -1 to +1
- **ReLU**: Keeps positive values, zeros out negative
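
Before using the dropdown, it may help to see the four functions side by side on the same handful of numbers. This is a small sketch, and the `sample` array is a hypothetical set of inputs chosen for illustration rather than the lab's data.

```python
import numpy as np

# Hypothetical inputs for illustration
sample = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

# Each activation applied to the same inputs
outputs = {
    "Sigmoid": 1 / (1 + np.exp(-sample)),   # smooth, stays inside (0, 1)
    "Step": (sample > 0).astype(float),     # hard cut: 0 or 1 only
    "Tanh": np.tanh(sample),                # smooth, stays inside (-1, 1)
    "ReLU": np.maximum(0, sample),          # negatives become 0, positives unchanged
}

print("Input   ", sample)
for name, out in outputs.items():
    print(f"{name:<8}", np.round(out, 3))
```

Look at the two middle inputs, -0.5 and 0.5: Step sends them straight to 0 and 1, while Sigmoid and Tanh keep them close to the middle of their ranges.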

In [None]:
def compare_activations(activation_name):
    """
    Show before/after for different activation functions.
    """
    # Select activation function
    if activation_name == 'Sigmoid':
        activation = sigmoid
        threshold = 0.5
        y_range = (-0.1, 1.1)
    elif activation_name == 'Step':
        activation = step
        threshold = 0.5
        y_range = (-0.1, 1.1)
    elif activation_name == 'Tanh':
        activation = tanh_activation
        threshold = 0.0
        y_range = (-1.1, 1.1)
    else:  # ReLU
        activation = relu
        # ReLU has no natural midpoint, so 1.0 is an arbitrary cutoff chosen for this demo
        threshold = 1.0
        y_range = (-0.5, 6)
    
    # Apply activation
    values_transformed = activation(values)
    values_class0_t = activation(values_class0)
    values_class1_t = activation(values_class1)
    
    # Create visualization
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 5), dpi=100)
    
    # PLOT 1: Activation function curve
    x_plot = np.linspace(-6, 6, 200)
    y_plot = activation(x_plot)
    ax1.plot(x_plot, y_plot, 'purple', linewidth=3, label=f'{activation_name}(x)')
    ax1.axhline(threshold, color='green', linewidth=2, linestyle='--', 
               alpha=0.5, label=f'Threshold = {threshold}')
    ax1.axvline(0, color='black', linewidth=1, alpha=0.3)
    ax1.axhline(0, color='black', linewidth=1, alpha=0.3)
    ax1.set_xlabel('Input', fontsize=12)
    ax1.set_ylabel('Output', fontsize=12)
    ax1.set_title(f'{activation_name} Function', fontsize=13, fontweight='bold')
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    ax1.set_xlim(-6, 6)
    
    # PLOT 2: Before
    ax2.scatter(values_class0, np.zeros(n_per_class), c='blue', s=80, alpha=0.6, 
               label='Class 0', edgecolors='k', linewidths=1.5)
    ax2.scatter(values_class1, np.zeros(n_per_class), c='red', s=80, alpha=0.6, 
               label='Class 1', edgecolors='k', linewidths=1.5)
    ax2.axhline(0, color='black', linewidth=1, alpha=0.3)
    ax2.set_xlabel('Original Value', fontsize=12)
    ax2.set_title('BEFORE: Original Data', fontsize=13, fontweight='bold')
    ax2.set_yticks([])
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3, axis='x')
    ax2.set_xlim(-6, 6)
    
    # PLOT 3: After
    ax3.scatter(values_class0_t, np.zeros(n_per_class), c='blue', s=80, alpha=0.6, 
               label='Class 0', edgecolors='k', linewidths=1.5)
    ax3.scatter(values_class1_t, np.zeros(n_per_class), c='red', s=80, alpha=0.6, 
               label='Class 1', edgecolors='k', linewidths=1.5)
    ax3.axvline(threshold, color='green', linewidth=2, linestyle='--', 
               alpha=0.5, label=f'Threshold = {threshold}')
    ax3.axhline(0, color='black', linewidth=1, alpha=0.3)
    ax3.set_xlabel(f'After {activation_name}', fontsize=12)
    ax3.set_title(f'AFTER: {activation_name} Applied', fontsize=13, fontweight='bold')
    ax3.set_yticks([])
    ax3.legend(fontsize=10)
    ax3.grid(True, alpha=0.3, axis='x')
    ax3.set_xlim(y_range[0], y_range[1])
    
    plt.tight_layout()
    plt.show()
    
    # Calculate accuracy
    predicted = (values_transformed > threshold).astype(int)
    accuracy_after = np.mean(predicted == labels) * 100
    
    print(f"\n{activation_name} Results:")
    print(f"  Accuracy: {accuracy_after:.1f}%")
    
    if accuracy_after > 90:
        print("  ✓ Excellent separation!")
    elif accuracy_after > 75:
        print("  👍 Good separation")
    else:
        print("  😐 Some overlap remains")

# Interactive widget
print("Compare Different Activation Functions")
print("="*70)
print("Select an activation function to see how it transforms the data:\n")

interact(
    compare_activations,
    activation_name=Dropdown(
        options=['Sigmoid', 'Step', 'Tanh', 'ReLU'],
        value='Sigmoid',
        description='Activation:'
    )
);

## 6. Key Observations

After trying different activation functions, you should notice:

### What All Activation Functions Do:
- Transform input values with a nonlinear rule
- Push values into specific output ranges (ReLU only limits them from below)
- Can make classes easier to tell apart by pushing values toward the ends of the range

### Differences Between Them:
- **Sigmoid**: Smooth compression to 0-1, good for probabilities
- **Step**: Hard binary cutoff (0 or 1), no middle ground
- **Tanh**: Smooth compression to -1 to +1, centered at zero
- **ReLU**: Keeps positive values, zeros negatives, no upper limit

---

**The Big Takeaway:**
Activation functions take messy, overlapping data and **compress/transform** it into ranges where classes become more separated!

## 7. Connection to Module 0

Remember the circular pattern from Module 0 that couldn't be separated by a line?

Here's what activation functions can do:
1. Take the distance from the center for each point: `r = √(x₁² + x₂²)`
2. Apply an activation function to that distance
3. Now inner points (small r) and outer points (large r) transform differently
4. The transformation can help separate them!

**You'll see this in action in Module 3 when we build perceptrons!**
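
The steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the two rings of points are synthetic (not the Module 0 dataset), the helper `ring` and the seed are hypothetical, and `boundary_radius = 1.5` is a hand-picked shift. The shift is needed because distances are never negative, so sigmoid applied directly to `r` would put every point at or above 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for repeatability

def ring(radius_low, radius_high, n):
    """Sample n points with random angle and radius in [radius_low, radius_high)."""
    radius = rng.uniform(radius_low, radius_high, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

# Synthetic stand-in for a circular pattern
inner = ring(0.0, 1.0, 50)   # Class 0: close to the center
outer = ring(2.0, 3.0, 50)   # Class 1: farther out
points = np.vstack([inner, outer])
ring_labels = np.concatenate([np.zeros(50), np.ones(50)])

# Step 1: distance from the center for each point
r = np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)

# Step 2: sigmoid of the shifted distance (boundary_radius is a hand-picked assumption)
boundary_radius = 1.5
activated = 1 / (1 + np.exp(-(r - boundary_radius)))

# Steps 3-4: inner points land below 0.5, outer points above
predicted = (activated > 0.5).astype(int)
ring_accuracy = np.mean(predicted == ring_labels) * 100
print(f"Accuracy using distance + sigmoid: {ring_accuracy:.1f}%")
```

In this sketch, computing the distance `r` is what turns the circle into a one-dimensional problem; the sigmoid then maps that distance to a score between 0 and 1.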

## Questions for Your Answer Sheet

**Q4.** How did the sigmoid function change the overlapping values? Describe what you observed.

**Q5.** Which activation function created the clearest separation between classes? Why do you think that is?

**Q6.** In your own words, what do activation functions do to data? Why might this help with classification?

## Next Steps

1. **Answer Q4, Q5, Q6** on your answer sheet
2. **Return to the LMS** and continue to Module 2
3. In Module 2, you'll learn more details about the four activation functions and their properties!