# Interactive Backpropagation Simulator
## For AP Calculus Students Learning Neural Networks

This notebook demonstrates how the chain rule powers neural network learning through interactive visualizations of backpropagation.

**Learning Objectives:**
- Understand how neural networks use the chain rule to learn
- See backpropagation in action with real calculations
- Connect your calculus knowledge to artificial intelligence

---

## Setup: Import Libraries

Run this cell first to load all the tools we need.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

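# Set style for better-looking plots (keep the default style if this
# Matplotlib version does not ship the seaborn-v0_8 style sheets)
if 'seaborn-v0_8-darkgrid' in plt.style.available:
    plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries loaded successfully!")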
print("\nReady to explore backpropagation!")

---
## Part 1: The Simple Network from Your Worksheet

Let's recreate the exact network from your discovery worksheet:
- **Hidden layer:** h = 3x + w₁
- **Output layer:** y = 2h + w₂
- **Error:** E = ½(y - target)²
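
As a reference for the code that follows, here is a short derivation of the two gradients from the three definitions above. The numbers assume the worksheet values used later in this notebook (x = 2, w₁ = 1, w₂ = 3, target = 20), which give h = 7 and y = 17.

$$\frac{dE}{dy} = y - \text{target}, \qquad \frac{dy}{dw_2} = 1, \qquad \frac{dy}{dh} = 2, \qquad \frac{dh}{dw_1} = 1$$

$$\frac{dE}{dw_2} = \frac{dE}{dy} \cdot \frac{dy}{dw_2} = (y - \text{target}) \cdot 1 = 17 - 20 = -3$$

$$\frac{dE}{dw_1} = \frac{dE}{dy} \cdot \frac{dy}{dh} \cdot \frac{dh}{dw_1} = (y - \text{target}) \cdot 2 \cdot 1 = -6$$

Both gradients are negative, so increasing either weight lowers the error at this starting point.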

In [None]:
class SimpleNetwork:
    """
    A simple 2-layer network matching the worksheet.
    """
    
    def __init__(self, w1=1.0, w2=3.0):
        self.w1 = w1
        self.w2 = w2
        self.history = {'w1': [w1], 'w2': [w2], 'error': []}
        
    def forward(self, x, target):
        """Forward pass through the network"""
        self.x = x
        self.h = 3 * x + self.w1
        self.y = 2 * self.h + self.w2
        self.target = target
        self.error = 0.5 * (self.y - target) ** 2
        return self.y, self.error
    
    def backward(self):
        """
        Backward pass: Calculate gradients using the chain rule.
        This is where the magic happens!
        """
        # Gradient of error with respect to output
        dE_dy = (self.y - self.target)
        
        # Gradient for w2: dE/dw2 = dE/dy * dy/dw2
        dy_dw2 = 1
        self.dE_dw2 = dE_dy * dy_dw2
        
        # Gradient for w1: dE/dw1 = dE/dy * dy/dh * dh/dw1 (chain rule!)
        dy_dh = 2
        dh_dw1 = 1
        self.dE_dw1 = dE_dy * dy_dh * dh_dw1
        
        return self.dE_dw1, self.dE_dw2
    
    def update_weights(self, learning_rate=0.1):
        """Update weights using gradient descent"""
        self.w1 = self.w1 - learning_rate * self.dE_dw1
        self.w2 = self.w2 - learning_rate * self.dE_dw2
        self.history['w1'].append(self.w1)
        self.history['w2'].append(self.w2)
        self.history['error'].append(self.error)
    
    def train_step(self, x, target, learning_rate=0.1):
        """One complete training step"""
        self.forward(x, target)
        self.backward()
        self.update_weights(learning_rate)
        return self.error

print("✓ Network class defined!")

### Let's verify your worksheet calculations!

In [None]:
# Create the network with worksheet values
net = SimpleNetwork(w1=1.0, w2=3.0)

# Run forward pass
x_input = 2
target_output = 20
y_pred, initial_error = net.forward(x_input, target_output)

print("="*60)
print("INITIAL NETWORK STATE")
print("="*60)
print(f"Input: x = {x_input}")
print(f"Target: {target_output}")
print(f"Weights: w1 = {net.w1}, w2 = {net.w2}")
print("\nFORWARD PASS:")
print(f"  h = 3({x_input}) + {net.w1} = {net.h}")
print(f"  y = 2({net.h}) + {net.w2} = {y_pred}")
print(f"  Error = 0.5({y_pred} - {target_output})² = {initial_error}")

# Calculate gradients
dE_dw1, dE_dw2 = net.backward()

print("\nBACKWARD PASS (Chain Rule):")
print(f"  dE/dw2 = {dE_dw2:.2f}")
print(f"  dE/dw1 = {dE_dw1:.2f}")

print("\nDo these match your worksheet? ✓")
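
One way to double-check a chain-rule gradient is to compare it with a finite-difference estimate: nudge a weight slightly and see how much the error changes. The sketch below is an independent check, not part of the worksheet. The helper names `network_error` and `numerical_gradients` and the step size `eps` are assumptions introduced here for illustration.

```python
def network_error(w1, w2, x=2.0, target=20.0):
    """Error of the worksheet network: h = 3x + w1, y = 2h + w2."""
    h = 3 * x + w1
    y = 2 * h + w2
    return 0.5 * (y - target) ** 2


def numerical_gradients(w1, w2, eps=1e-6):
    """Central-difference estimates of dE/dw1 and dE/dw2."""
    dE_dw1 = (network_error(w1 + eps, w2) - network_error(w1 - eps, w2)) / (2 * eps)
    dE_dw2 = (network_error(w1, w2 + eps) - network_error(w1, w2 - eps)) / (2 * eps)
    return dE_dw1, dE_dw2


# Worksheet starting weights (an assumption carried over from the cell above)
num_dw1, num_dw2 = numerical_gradients(1.0, 3.0)
print(f"Numerical dE/dw1 = {num_dw1:.4f}")
print(f"Numerical dE/dw2 = {num_dw2:.4f}")
```

If the chain-rule results are right, these estimates should agree with the backward-pass values to several decimal places.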

---
## Part 2: Watch the Network Learn!

Now let's train the network for multiple steps and watch it improve.

In [None]:
# Reset network
net = SimpleNetwork(w1=1.0, w2=3.0)

# Train for 20 steps
print("Training Progress:")
print(f"{'Step':<6} {'w1':<10} {'w2':<10} {'Output':<10} {'Error':<10}")
print("-" * 50)

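for step in range(20):
    # train_step returns the error measured before this step's weight update
    net.train_step(x_input, target_output, learning_rate=0.1)
    # Re-run the forward pass so the output and error match the updated weights
    y_current, error_after = net.forward(x_input, target_output)
    
    if step % 5 == 0 or step == 19:
        print(f"{step:<6} {net.w1:<10.4f} {net.w2:<10.4f} {y_current:<10.4f} {error_after:<10.6f}")

print(f"\n✓ Network output {y_current:.4f} has converged toward the target value {target_output}!")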
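
Why does the error shrink so steadily? For this linear network with x = 2, one update changes the output by -5η(y - target): w₁ moves by -2η(y - target) and counts double in y, and w₂ moves by -η(y - target). So the gap (y - target) is multiplied by (1 - 5η) on every step, which is 0.5 when η = 0.1. The sketch below checks this with plain variables. It is our own derivation rather than something from the worksheet, and the names `residuals` and `ratios` are introduced only for illustration.

```python
x, target, learning_rate = 2.0, 20.0, 0.1
w1, w2 = 1.0, 3.0

residuals = []
for _ in range(5):
    y = 2 * (3 * x + w1) + w2
    residual = y - target                # this is dE/dy
    residuals.append(residual)
    w1 -= learning_rate * 2 * residual   # dE/dw1 = 2 * (y - target)
    w2 -= learning_rate * residual       # dE/dw2 = (y - target)

# Ratio of each gap to the previous one; each should be close to 1 - 5 * 0.1
ratios = [residuals[i + 1] / residuals[i] for i in range(len(residuals) - 1)]
print("Gaps (y - target):", [round(r, 4) for r in residuals])
print("Step-to-step ratios:", [round(r, 4) for r in ratios])
```

The same formula suggests that a learning rate above 0.4 would make |1 - 5η| greater than 1, so the gap would grow instead of shrink for this particular network and input.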

### Visualize the Training Process

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Weight evolution
ax1 = axes[0]
iterations = range(len(net.history['w1']))
ax1.plot(iterations, net.history['w1'], 'b-', linewidth=2, label='w1', marker='o')
ax1.plot(iterations, net.history['w2'], 'r-', linewidth=2, label='w2', marker='s')
ax1.set_xlabel('Training Step', fontsize=12)
ax1.set_ylabel('Weight Value', fontsize=12)
ax1.set_title('Weight Changes During Training', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Error reduction
ax2 = axes[1]
ax2.plot(range(1, len(net.history['error']) + 1), net.history['error'], 
         'g-', linewidth=2, marker='o')
ax2.set_xlabel('Training Step', fontsize=12)
ax2.set_ylabel('Error', fontsize=12)
ax2.set_title('Error Decreases Over Time', fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.set_yscale('log')

plt.tight_layout()
plt.show()

print("✓ The chain rule guided the network to the solution!")

---
## Part 3: Interactive Exploration

**Your Turn!** Try changing the weights below and see how your position on the error landscape, and the gradients at that position, change.

In [None]:
# Change these values and re-run the cell!
test_w1 = 1.0  # Try different values
test_w2 = 3.0  # Try different values

net_test = SimpleNetwork(w1=test_w1, w2=test_w2)
y_pred, error = net_test.forward(x_input, target_output)
dE_dw1, dE_dw2 = net_test.backward()

print(f"Current State:")
print(f"  w1={test_w1}, w2={test_w2}")
print(f"  Output: {y_pred:.2f} (target: {target_output})")
print(f"  Error: {error:.4f}")
print(f"\nGradients:")
print(f"  dE/dw1 = {dE_dw1:.4f} → {'decrease w1' if dE_dw1 > 0 else 'increase w1'}")
print(f"  dE/dw2 = {dE_dw2:.4f} → {'decrease w2' if dE_dw2 > 0 else 'increase w2'}")

# Visualize error landscape
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# w2 error surface
w2_range = np.linspace(test_w2 - 5, test_w2 + 5, 100)
errors_w2 = []
for w2 in w2_range:
    net_tmp = SimpleNetwork(w1=test_w1, w2=w2)
    _, err = net_tmp.forward(x_input, target_output)
    errors_w2.append(err)

axes[0].plot(w2_range, errors_w2, 'r-', linewidth=2)
axes[0].plot(test_w2, error, 'ro', markersize=12, label='Your position')
axes[0].set_xlabel('w2 value')
axes[0].set_ylabel('Error')
axes[0].set_title(f'Error vs w2 (gradient = {dE_dw2:.2f})')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# w1 error surface
w1_range = np.linspace(test_w1 - 5, test_w1 + 5, 100)
errors_w1 = []
for w1 in w1_range:
    net_tmp = SimpleNetwork(w1=w1, w2=test_w2)
    _, err = net_tmp.forward(x_input, target_output)
    errors_w1.append(err)

axes[1].plot(w1_range, errors_w1, 'b-', linewidth=2)
axes[1].plot(test_w1, error, 'bo', markersize=12, label='Your position')
axes[1].set_xlabel('w1 value')
axes[1].set_ylabel('Error')
axes[1].set_title(f'Error vs w1 (gradient = {dE_dw1:.2f})')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
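
The two curves above are one-dimensional slices of a single error surface over (w₁, w₂). With x = 2 and target = 20, the output is y = 12 + 2w₁ + w₂, so the error should be zero for every pair of weights on the line 2w₁ + w₂ = 8: the surface has a whole valley of solutions, not a single best point. A minimal sketch that checks this, assuming the Part 1 network definitions (the helper `network_error` is a name introduced here):

```python
import numpy as np


def network_error(w1, w2, x=2.0, target=20.0):
    """Error of the Part 1 network; works on scalars or NumPy arrays."""
    h = 3 * x + w1
    y = 2 * h + w2
    return 0.5 * (y - target) ** 2


# Pick several w1 values, then choose w2 so that 2*w1 + w2 = 8
w1_values = np.linspace(-4, 6, 11)
w2_on_line = 8 - 2 * w1_values
errors_on_line = network_error(w1_values, w2_on_line)

print("Largest error on the line 2*w1 + w2 = 8:", errors_on_line.max())
print("Error at the worksheet start (w1=1, w2=3):", network_error(1.0, 3.0))
```

This is why training from different starting weights can end at different (w₁, w₂) pairs that all produce the target output.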

---
## Part 4: Extension - Nonlinear Activation

Real neural networks use **nonlinear activation functions**. The most common is the **sigmoid function**:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Its derivative is: $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$

This adds one more link to the chain!
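
Before using the derivative formula, it can be reassuring to check it numerically. The sketch below compares σ(z)(1 - σ(z)) with a central-difference estimate at a few sample points. The sample values in `z_values` and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)


z_values = np.array([-2.0, 0.0, 2.0, 7.0])
eps = 1e-6

# Central difference: (f(z + eps) - f(z - eps)) / (2 * eps)
numerical = (sigmoid(z_values + eps) - sigmoid(z_values - eps)) / (2 * eps)
analytic = sigmoid_derivative(z_values)

for z, num, ana in zip(z_values, numerical, analytic):
    print(f"z = {z:5.1f}: numerical = {num:.6f}, formula = {ana:.6f}")
```

The two columns should agree closely. The derivative is largest at z = 0, where it equals 0.25, and gets small as |z| grows.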

In [None]:
class NonlinearNetwork:
    """
    Network with sigmoid activation:
    h = sigmoid(3*x + w1)
    y = 2*h + w2
    """
    
    def __init__(self, w1=1.0, w2=3.0):
        self.w1 = w1
        self.w2 = w2
        self.history = {'w1': [w1], 'w2': [w2], 'error': []}
    
    @staticmethod
    def sigmoid(z):
        return 1 / (1 + np.exp(-z))
    
    @staticmethod
    def sigmoid_derivative(z):
        s = NonlinearNetwork.sigmoid(z)
        return s * (1 - s)
    
    def forward(self, x, target):
        self.x = x
        self.z = 3 * x + self.w1
        self.h = self.sigmoid(self.z)
        self.y = 2 * self.h + self.w2
        self.target = target
        self.error = 0.5 * (self.y - target) ** 2
        return self.y, self.error
    
    def backward(self):
        dE_dy = (self.y - self.target)
        
        # For w2 (same as before)
        dy_dw2 = 1
        self.dE_dw2 = dE_dy * dy_dw2
        
        # For w1 (now with sigmoid derivative!)
        dy_dh = 2
        dh_dz = self.sigmoid_derivative(self.z)  # New!
        dz_dw1 = 1
        self.dE_dw1 = dE_dy * dy_dh * dh_dz * dz_dw1
        
        return self.dE_dw1, self.dE_dw2
    
    def update_weights(self, learning_rate=0.1):
        self.w1 = self.w1 - learning_rate * self.dE_dw1
        self.w2 = self.w2 - learning_rate * self.dE_dw2
        self.history['w1'].append(self.w1)
        self.history['w2'].append(self.w2)
        self.history['error'].append(self.error)
    
    def train_step(self, x, target, learning_rate=0.1):
        self.forward(x, target)
        self.backward()
        self.update_weights(learning_rate)
        return self.error

print("✓ Nonlinear network ready!")
print("\nNotice: The chain for w1 now has 4 terms instead of 3:")
print("  dE/dw1 = dE/dy × dy/dh × dh/dz × dz/dw1")
print("                             ↑ sigmoid derivative added here!")

In [None]:
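# Compare linear vs nonlinear
net_linear = SimpleNetwork(w1=1.0, w2=3.0)
net_nonlinear = NonlinearNetwork(w1=1.0, w2=3.0)

# Use learning_rate=0.1 for both networks. With x = 2, each update of the
# linear network multiplies (y - target) by (1 - 5 * learning_rate), so a
# rate of 0.5 would make its error grow instead of shrink.
for _ in range(30):
    net_linear.train_step(x_input, target_output, learning_rate=0.1)
    net_nonlinear.train_step(x_input, target_output, learning_rate=0.1)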

# Visualize
plt.figure(figsize=(12, 5))
iterations = range(len(net_linear.history['error']))
plt.plot(iterations, net_linear.history['error'], 'b-', 
         linewidth=2, label='Linear', marker='o', markersize=4)
plt.plot(iterations, net_nonlinear.history['error'], 'r-',
         linewidth=2, label='Nonlinear (Sigmoid)', marker='s', markersize=4)
plt.xlabel('Training Step', fontsize=12)
plt.ylabel('Error', fontsize=12)
plt.title('Linear vs Nonlinear: Both Use the Chain Rule!', fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.yscale('log')
plt.show()

print("✓ Both networks learn, but the chain is longer for the nonlinear one!")
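
One detail worth checking in the nonlinear network: at the starting values (x = 2, w₁ = 1), the sigmoid input is z = 7, where the sigmoid curve is nearly flat. The extra link dh/dz is therefore very small, and the gradient for w₁ should be much smaller than the gradient for w₂. The sketch below recomputes the first backward pass by hand under those starting values. It is an illustration of this one step, not a general statement about sigmoid networks.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


x, w1, w2, target = 2.0, 1.0, 3.0, 20.0

# Forward pass
z = 3 * x + w1
h = sigmoid(z)
y = 2 * h + w2

# Backward pass, one chain-rule link at a time
dE_dy = y - target
dh_dz = h * (1 - h)           # sigmoid derivative at z
dE_dw1 = dE_dy * 2 * dh_dz    # dE/dy * dy/dh * dh/dz * dz/dw1
dE_dw2 = dE_dy * 1            # dE/dy * dy/dw2

print(f"z = {z}, sigmoid'(z) = {dh_dz:.6f}")
print(f"dE/dw1 = {dE_dw1:.4f}")
print(f"dE/dw2 = {dE_dw2:.4f}")
```

With a gradient this lopsided, most of the early learning in this example happens through w₂, while w₁ barely moves.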

---
## Summary: What You've Learned

### 🎯 Key Takeaways

1. **Backpropagation = Chain Rule**
   - Every weight update uses the chain rule
   - Deeper networks = longer chains of derivatives
   - Same principle scales from 2 weights to billions!

2. **Gradients Show the Direction**
   - Negative gradient → increase weight
   - Positive gradient → decrease weight
   - Learning rate controls step size

3. **Your Calculus Matters**
   - Every chain rule problem builds AI intuition
   - Modern AI wouldn't exist without this math
   - You're learning the foundation of machine learning!

### 💡 The Big Picture

When you compute $\frac{d}{dx}[f(g(x))]$, you're practicing the same operation that backpropagation repeats, layer by layer, when it teaches:
- ChatGPT to write
- Self-driving cars to navigate
- Medical AI to diagnose diseases
- Image generators to create art

**The chain rule isn't just homework. It's the engine of artificial intelligence!**