# Lab 4 - Module 1: Anatomy of a Tiny Neural Network

**Learning Objectives:**
- Understand the architecture of a simple 2-2-1 neural network
- Count and identify all network parameters
- See how hidden neurons transform data into a new representation
- **Discover that XOR becomes linearly separable in hidden space**
- Manually tune parameters to solve XOR (appreciate why automatic training is needed!)

**Time:** ~15 minutes

---

**Remember from Module 0:** You manually added a third dimension (x₃ = x₁ × x₂) that made XOR separable. A flat plane in 3D became a curved boundary in 2D.

**Today's Big Idea:** Hidden layers do this automatically! Each hidden neuron creates a new "dimension" - and the network learns which transformations help solve the problem.
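
As a quick refresher, the Module 0 trick can be sketched in a few lines. The corner coordinates match the ones this lab uses later; the threshold at zero and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Four XOR corners (same coordinates this lab uses later)
X = np.array([[-1.5, -1.5], [1.5, 1.5], [-1.5, 1.5], [1.5, -1.5]])
y = np.array([0, 0, 1, 1])

# Hand-made third feature from Module 0: x3 = x1 * x2
x3 = X[:, 0] * X[:, 1]

# A flat threshold on x3 (the plane x3 = 0) separates the two classes
pred = (x3 < 0).astype(int)

print(x3)    # positive for class 0, negative for class 1
print(pred)  # matches y
```

In this module the extra coordinates are no longer hand-made: the hidden neurons compute them.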

## 1. Network Architecture: The 2-2-1 Network

We'll build a tiny neural network with:
- **Input layer:** 2 neurons (x₁, x₂)
- **Hidden layer:** 2 neurons (h₁, h₂) with sigmoid activation
- **Output layer:** 1 neuron (output) with sigmoid activation

```
    x₁ ──┬──→ h₁ ──┐
         │          ├──→ output
    x₂ ──┴──→ h₂ ──┘

    (each input connects to both h₁ and h₂)
```

### Connection to What You Know:

**From Lab 3:** Each hidden neuron (h₁ and h₂) is a **perceptron** with sigmoid activation!
- h₁ has its own weights and bias: w₁₁, w₁₂, b₁
- h₂ has its own weights and bias: w₂₁, w₂₂, b₂

**From Module 0:** Instead of manually adding x₃ = x₁ × x₂, the network creates h₁ and h₂ automatically!
- h₁ and h₂ are like new "features" computed from x₁ and x₂
- The output layer makes decisions in this new (h₁, h₂) space
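
To make "a perceptron with sigmoid activation" concrete, here is a minimal sketch of one hidden neuron evaluated at one input point. The weight, bias, and input values are made up for illustration and are not a solution to XOR.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only (assumed for this sketch)
w11, w12, b1 = 2.0, -1.0, 0.5
x1, x2 = 1.5, -1.5

z1 = w11 * x1 + w12 * x2 + b1   # weighted sum, exactly as in a perceptron
h1 = sigmoid(z1)                # the new "feature" value for this point

print(z1, round(h1, 4))  # 5.0 0.9933
```

The second hidden neuron works the same way with its own three parameters, so every input point ends up with a pair of coordinates (h₁, h₂).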

## 2. Setup: Load XOR Data and Define Network

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import ipywidgets as widgets
from ipywidgets import interact, FloatSlider, Button, VBox, HBox, HTML, Output, Layout, Dropdown
from IPython.display import display, clear_output

# Set random seed for reproducibility
np.random.seed(42)

# Function to create different XOR datasets
def create_xor_dataset(dataset_type='perfect'):
    """
    Create different versions of XOR data.
    
    Args:
        dataset_type: 'perfect' (4 perfect points), 'clean' (tight clusters), 
                      or 'noisy' (Gaussian clouds)
    """
    if dataset_type == 'perfect':
        # Perfect XOR - just 4 points
        X = np.array([
            [-1.5, -1.5],  # Class 0
            [1.5, 1.5],    # Class 0
            [-1.5, 1.5],   # Class 1
            [1.5, -1.5]    # Class 1
        ])
        y = np.array([0, 0, 1, 1])
        
        # Replicate points to make visualization clearer
        X = np.repeat(X, 25, axis=0)
        y = np.repeat(y, 25)
        # Add tiny noise for visual variety
        X = X + np.random.randn(len(X), 2) * 0.05
        
    elif dataset_type == 'clean':
        # Clean XOR - tight clusters with minimal noise
        n_per_corner = 25
        corners_class0 = [[-1.5, -1.5], [1.5, 1.5]]
        corners_class1 = [[-1.5, 1.5], [1.5, -1.5]]
        
        X_class0 = []
        X_class1 = []
        
        for corner in corners_class0:
            points = np.random.randn(n_per_corner, 2) * 0.1 + corner  # Very tight
            X_class0.append(points)
        
        for corner in corners_class1:
            points = np.random.randn(n_per_corner, 2) * 0.1 + corner  # Very tight
            X_class1.append(points)
        
        X_class0 = np.vstack(X_class0)
        X_class1 = np.vstack(X_class1)
        
        X = np.vstack([X_class0, X_class1])
        y = np.hstack([np.zeros(len(X_class0)), np.ones(len(X_class1))])
        
    else:  # 'noisy'
        # Noisy XOR - Gaussian clouds (original)
        n_per_corner = 25
        corners_class0 = [[-1.5, -1.5], [1.5, 1.5]]
        corners_class1 = [[-1.5, 1.5], [1.5, -1.5]]
        
        X_class0 = []
        X_class1 = []
        
        for corner in corners_class0:
            points = np.random.randn(n_per_corner, 2) * 0.3 + corner  # Noisy
            X_class0.append(points)
        
        for corner in corners_class1:
            points = np.random.randn(n_per_corner, 2) * 0.3 + corner  # Noisy
            X_class1.append(points)
        
        X_class0 = np.vstack(X_class0)
        X_class1 = np.vstack(X_class1)
        
        X = np.vstack([X_class0, X_class1])
        y = np.hstack([np.zeros(len(X_class0)), np.ones(len(X_class1))])
    
    return X, y

# Create initial dataset (start with clean version)
X_xor, y_xor = create_xor_dataset('clean')
current_dataset_type = 'clean'

print("✓ XOR dataset created!")
print(f"  Total points: {len(X_xor)}")
print(f"  Class 0 (blue): {np.sum(y_xor==0)} points")
print(f"  Class 1 (red): {np.sum(y_xor==1)} points")
print("\nYou can switch between dataset types in the interactive section below!")

In [None]:
# Define sigmoid activation function
def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))  # Clip to prevent overflow

# Define the 2-2-1 network
class TinyNetwork:
    """A tiny 2-2-1 neural network for XOR."""
    
    def __init__(self):
        # Hidden neuron 1 (h1) parameters
        self.w11 = 0.0
        self.w12 = 0.0
        self.b1 = 0.0
        
        # Hidden neuron 2 (h2) parameters
        self.w21 = 0.0
        self.w22 = 0.0
        self.b2 = 0.0
        
        # Output layer parameters
        self.w_out1 = 0.0
        self.w_out2 = 0.0
        self.b_out = 0.0
    
    def forward(self, x1, x2):
        """Forward pass through the network.
        
        Returns:
            output: final prediction (0 to 1)
            h1: hidden neuron 1 activation (0 to 1)
            h2: hidden neuron 2 activation (0 to 1)
        """
        # Hidden layer activations
        z1 = self.w11 * x1 + self.w12 * x2 + self.b1
        h1 = sigmoid(z1)
        
        z2 = self.w21 * x1 + self.w22 * x2 + self.b2
        h2 = sigmoid(z2)
        
        # Output layer activation
        z_out = self.w_out1 * h1 + self.w_out2 * h2 + self.b_out
        output = sigmoid(z_out)
        
        return output, h1, h2
    
    def predict_batch(self, X):
        """Make predictions for a batch of inputs (array of shape [n, 2]).
        
        Returns three arrays of length n: outputs, h1 values, h2 values.
        """
        X = np.asarray(X, dtype=float)
        # forward() uses only elementwise NumPy operations, so it can process
        # whole columns at once instead of looping over points one by one
        predictions, h1_vals, h2_vals = self.forward(X[:, 0], X[:, 1])
        return predictions, h1_vals, h2_vals

# Create network instance
network = TinyNetwork()

print("\n✓ Network architecture defined!")
print("\nNetwork has:")
print("  Input layer: 2 neurons (x₁, x₂)")
print("  Hidden layer: 2 neurons (h₁, h₂) with sigmoid activation")
print("  Output layer: 1 neuron with sigmoid activation")

## 3. Parameter Counting Exercise

Before we start tuning, let's count how many parameters this network has!

**Hidden Neuron 1 (h₁):**
- w₁₁ (weight from x₁ to h₁)
- w₁₂ (weight from x₂ to h₁)
- b₁ (bias for h₁)
- **Total: 3 parameters**

**Hidden Neuron 2 (h₂):**
- w₂₁ (weight from x₁ to h₂)
- w₂₂ (weight from x₂ to h₂)
- b₂ (bias for h₂)
- **Total: 3 parameters**

**Output Neuron:**
- w_out1 (weight from h₁ to output)
- w_out2 (weight from h₂ to output)
- b_out (bias for output)
- **Total: 3 parameters**

**Grand Total: 9 parameters**

You're about to manually tune all 9 of these to solve XOR. This will help you appreciate why automatic training is so valuable!
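
The same count can be reproduced for any fully connected network. This is a minimal sketch; `count_parameters` is a hypothetical helper, not part of the lab code.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the neurons per layer, e.g. [2, 2, 1] for a 2-2-1 network.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per receiving neuron
    return total

print(count_parameters([2, 2, 1]))  # 9
```

Each layer contributes (inputs × neurons) weights plus one bias per neuron, so the count grows quickly as layers get wider.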

## 4. Interactive Network Builder: Solve XOR by Hand!

### 🎯 Your Goal

**IMPORTANT:** With only 2 hidden neurons, a 2-2-1 network **cannot** wrap a neat, closed boundary around each cluster in the original (x₁, x₂) space. That would require more neurons!

Instead, your network solves XOR by:
1. **Transforming** the data from (x₁, x₂) space into a new **(h₁, h₂) hidden space**
2. **Separating** the transformed data with a simple straight line in hidden space
3. This simple separation in hidden space **looks complex** when projected back to input space
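
The steps above can be traced numerically. The sketch below uses the parameter values behind the "Load Example" button later in this notebook, which the notebook itself describes as not solving XOR. The matrix layout and variable names are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four ideal XOR corners and labels (noise-free, for illustration)
X = np.array([[-1.5, -1.5], [1.5, 1.5], [-1.5, 1.5], [1.5, -1.5]])
y = np.array([0, 0, 1, 1])

# Hidden-layer weights: row i holds (w_i1, w_i2) for hidden neuron i
W_hidden = np.array([[5.0, 0.0],   # h1 looks only at x1
                     [0.0, 5.0]])  # h2 looks only at x2
b_hidden = np.array([0.0, 0.0])
w_out = np.array([5.0, 5.0])
b_out = -7.0

# Step 1: transform (x1, x2) -> (h1, h2)
H = sigmoid(X @ W_hidden.T + b_hidden)

# Step 2: one straight line in hidden space, w_out . h + b_out = 0
pred = (sigmoid(H @ w_out + b_out) > 0.5).astype(int)

print(np.round(H, 3))                   # hidden coordinates of the four corners
print("accuracy:", np.mean(pred == y))  # accuracy: 0.25
```

With these values the corners land near the four corners of the unit square, so the two classes still sit on opposite diagonals and no single line separates them. A successful setting has to move both class-1 corners onto the same side of the line.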

### What Success Looks Like:

Adjust the 9 sliders until you see:
- **Bottom panels:** Each hidden neuron (H1, H2) creates a meaningful split
- **Top-right panel (MOST IMPORTANT):** XOR becomes linearly separable in (h₁, h₂) space! ⭐
- **Top-left panel:** The resulting boundary in input space (complex is normal and expected!)

### Understanding Dataset Types:

**📊 You can choose from three datasets:**

1. **Clean XOR (Recommended Start):** Tight clusters with minimal noise
   - **Goal:** ~99-100% accuracy achievable
   - Perfect for learning the concept!
   
2. **Noisy XOR (Realistic):** Gaussian clouds with overlap
   - **Goal:** ~85% accuracy maximum
   - Shows how real data behaves - perfect separation is impossible!
   - The hidden space will show approximate separation
   
3. **Perfect XOR:** Just the 4 corner points (each repeated 25 times with tiny jitter so they stay visible)
   - **Goal:** 100% accuracy
   - Easiest to understand, but unrealistic

### Target Pattern for Hidden Space:

You want the **top-right panel** to show something like this:

```
    h₂
    ↑
  1 │ x red  /       . blue
    │       /
    │     /
0.5 │   /   (green line)
    │
    │
  0 │ . blue
    └────────────────────→ h₁
      0       0.5       1

    x = Class 1 (red), . = Class 0 (blue)
```

When XOR forms linearly separable clusters in (h₁, h₂) space, a simple straight line (green) can separate them!
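
The green line is the set of hidden-space points where the output neuron's weighted sum is exactly zero, which is where its sigmoid output equals 0.5. A minimal sketch of that relationship follows; `hidden_decision_line` is a hypothetical helper, and the weights passed to it are illustrative values rather than a recommended setting.

```python
def hidden_decision_line(w_out1, w_out2, b_out, h1):
    """Return the h2 value on the line w_out1*h1 + w_out2*h2 + b_out = 0."""
    if w_out2 == 0:
        # With no weight on h2 the boundary is vertical and has no h2 formula
        raise ValueError("vertical line: h1 is fixed at -b_out / w_out1")
    return -(w_out1 * h1 + b_out) / w_out2

# Illustrative output-layer values (assumed for this sketch)
print(hidden_decision_line(w_out1=-4.0, w_out2=4.0, b_out=-2.0, h1=0.0))  # 0.5
print(hidden_decision_line(w_out1=-4.0, w_out2=4.0, b_out=-2.0, h1=0.5))  # 1.0
```

Changing `b_out` shifts the line without rotating it, while the ratio of `w_out1` to `w_out2` sets its slope.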

**💡 Key Insight:** With noisy data, the clusters may overlap slightly in hidden space - this is normal and realistic!

In [None]:
# Visualization function for four panels
def plot_network_state(network, X, y):
    """Plot the four-panel visualization of network state."""
    
    # Get predictions and hidden activations
    predictions, h1_vals, h2_vals = network.predict_batch(X)
    accuracy = np.mean((predictions > 0.5).astype(int) == y) * 100
    
    # Create figure with constrained_layout to avoid tight_layout warnings
    fig = plt.figure(figsize=(16, 14), dpi=90, constrained_layout=True)
    gs = GridSpec(2, 2, figure=fig)
    
    # Create mesh for decision boundaries
    x_min, x_max = -2.5, 2.5
    y_min, y_max = -2.5, 2.5
    h = 0.05
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    
    # Get network outputs for mesh
    Z_out, Z_h1, Z_h2 = network.predict_batch(mesh_points)
    Z_out = Z_out.reshape(xx.shape)
    Z_h1 = Z_h1.reshape(xx.shape)
    Z_h2 = Z_h2.reshape(xx.shape)
    
    # --- TOP-LEFT: Input Space Decision Boundary ---
    ax1 = fig.add_subplot(gs[0, 0])
    ax1.contourf(xx, yy, Z_out, levels=20, alpha=0.3, cmap='RdBu_r')
    ax1.contour(xx, yy, Z_out, levels=[0.5], colors='green', linewidths=3)
    ax1.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=80, alpha=0.7,
               edgecolors='k', linewidths=1.5, label='Class 0')
    ax1.scatter(X[y==1, 0], X[y==1, 1], c='red', s=80, alpha=0.7,
               edgecolors='k', linewidths=1.5, label='Class 1')
    ax1.set_xlim(x_min, x_max)
    ax1.set_ylim(y_min, y_max)
    ax1.set_xlabel('x₁', fontsize=12, fontweight='bold')
    ax1.set_ylabel('x₂', fontsize=12, fontweight='bold')
    ax1.set_title('Input Space (x₁, x₂)\nFinal Decision Boundary\n(Complex is OK!)',
                 fontsize=11, fontweight='bold')
    ax1.legend(loc='upper right', fontsize=9)
    ax1.grid(True, alpha=0.3)
    ax1.set_aspect('equal')
    
    # --- TOP-RIGHT: Hidden Space (THE KEY PLOT!) ---
    ax2 = fig.add_subplot(gs[0, 1])
    # Plot transformed data points
    ax2.scatter(h1_vals[y==0], h2_vals[y==0], c='blue', s=100, alpha=0.8,
               edgecolors='k', linewidths=2, label='Class 0', zorder=3)
    ax2.scatter(h1_vals[y==1], h2_vals[y==1], c='red', s=100, alpha=0.8,
               edgecolors='k', linewidths=2, label='Class 1', zorder=3)
    
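    # Draw output layer's decision boundary in hidden space
    if abs(network.w_out2) > 0.01:  # Avoid division by zero
        h1_line = np.linspace(-0.1, 1.1, 100)
        # Decision boundary: w_out1*h1 + w_out2*h2 + b_out = 0
        h2_line = -(network.w_out1 * h1_line + network.b_out) / network.w_out2
        # Only plot where h2 is in reasonable range
        valid_mask = (h2_line >= -0.1) & (h2_line <= 1.1)
        ax2.plot(h1_line[valid_mask], h2_line[valid_mask], 'g-', linewidth=4,
                label='Output Decision Line', zorder=2, alpha=0.8)
    elif abs(network.w_out1) > 0.01:
        # w_out2 is ~0, so the boundary is the vertical line h1 = -b_out / w_out1
        h1_boundary = -network.b_out / network.w_out1
        if -0.1 <= h1_boundary <= 1.1:
            ax2.axvline(h1_boundary, color='g', linewidth=4,
                        label='Output Decision Line', zorder=2, alpha=0.8)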
    
    ax2.set_xlim(-0.1, 1.1)
    ax2.set_ylim(-0.1, 1.1)
    ax2.set_xlabel('h₁ (Hidden Neuron 1)', fontsize=12, fontweight='bold')
    ax2.set_ylabel('h₂ (Hidden Neuron 2)', fontsize=12, fontweight='bold')
    ax2.set_title('⭐ HIDDEN SPACE (h₁, h₂) ⭐\nYour Main Goal: Linear Separation Here!\n(This is where the magic happens!)',
                 fontsize=11, fontweight='bold', color='darkgreen')
    ax2.legend(loc='upper right', fontsize=9)
    ax2.grid(True, alpha=0.3)
    ax2.set_aspect('equal')
    # Add border to emphasize importance
    for spine in ax2.spines.values():
        spine.set_edgecolor('darkgreen')
        spine.set_linewidth(3)
    
    # --- BOTTOM-LEFT: Hidden Neuron 1 Boundary ---
    ax3 = fig.add_subplot(gs[1, 0])
    ax3.contourf(xx, yy, Z_h1, levels=20, alpha=0.4, cmap='Purples')
    ax3.contour(xx, yy, Z_h1, levels=[0.5], colors='purple', linewidths=3)
    ax3.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=60, alpha=0.6,
               edgecolors='k', linewidths=1)
    ax3.scatter(X[y==1, 0], X[y==1, 1], c='red', s=60, alpha=0.6,
               edgecolors='k', linewidths=1)
    ax3.set_xlim(x_min, x_max)
    ax3.set_ylim(y_min, y_max)
    ax3.set_xlabel('x₁', fontsize=12, fontweight='bold')
    ax3.set_ylabel('x₂', fontsize=12, fontweight='bold')
    ax3.set_title('Hidden Neuron 1 (h₁)\nWhat does H1 separate?',
                 fontsize=11, fontweight='bold')
    ax3.grid(True, alpha=0.3)
    ax3.set_aspect('equal')
    
    # --- BOTTOM-RIGHT: Hidden Neuron 2 Boundary ---
    ax4 = fig.add_subplot(gs[1, 1])
    ax4.contourf(xx, yy, Z_h2, levels=20, alpha=0.4, cmap='Oranges')
    ax4.contour(xx, yy, Z_h2, levels=[0.5], colors='orange', linewidths=3)
    ax4.scatter(X[y==0, 0], X[y==0, 1], c='blue', s=60, alpha=0.6,
               edgecolors='k', linewidths=1)
    ax4.scatter(X[y==1, 0], X[y==1, 1], c='red', s=60, alpha=0.6,
               edgecolors='k', linewidths=1)
    ax4.set_xlim(x_min, x_max)
    ax4.set_ylim(y_min, y_max)
    ax4.set_xlabel('x₁', fontsize=12, fontweight='bold')
    ax4.set_ylabel('x₂', fontsize=12, fontweight='bold')
    ax4.set_title('Hidden Neuron 2 (h₂)\nWhat does H2 separate?',
                 fontsize=11, fontweight='bold')
    ax4.grid(True, alpha=0.3)
    ax4.set_aspect('equal')
    
    plt.show()
    
    return accuracy

print("✓ Visualization functions ready!")

In [None]:
# Create the interactive network tuning interface

# Global variable to track current dataset
current_dataset_type = 'clean'

# Accuracy display and guidance
accuracy_html = HTML(value="<h3 style='text-align:center;'>Current Accuracy: ---%</h3>")
guidance_html = HTML(value="<div style='background:#e8f0fe; padding:15px; border-radius:8px; margin:10px 0;'><p style='margin:0;'>Adjust the sliders below to tune your network!</p></div>")

# Dataset selector
dataset_dropdown = Dropdown(
    options=[
        ('Clean XOR (easy - near-perfect separation possible)', 'clean'),
        ('Noisy XOR (realistic - approximate separation)', 'noisy'),
        ('Perfect XOR (4 points - 100% possible)', 'perfect')
    ],
    value='clean',
    description='Dataset:',
    style={'description_width': '80px'},
    layout=Layout(width='600px')
)

# Create sliders for all 9 parameters
slider_layout = Layout(width='400px')
slider_style = {'description_width': '120px'}

# Hidden Neuron 1 sliders
w11_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                         description='H1: w₁₁', layout=slider_layout, style=slider_style)
w12_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                         description='H1: w₁₂', layout=slider_layout, style=slider_style)
b1_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                        description='H1: b₁', layout=slider_layout, style=slider_style)

# Hidden Neuron 2 sliders
w21_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                         description='H2: w₂₁', layout=slider_layout, style=slider_style)
w22_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                         description='H2: w₂₂', layout=slider_layout, style=slider_style)
b2_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                        description='H2: b₂', layout=slider_layout, style=slider_style)

# Output layer sliders
w_out1_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                            description='Out: w_out1', layout=slider_layout, style=slider_style)
w_out2_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                            description='Out: w_out2', layout=slider_layout, style=slider_style)
b_out_slider = FloatSlider(value=0, min=-10, max=10, step=0.5,
                           description='Out: b_out', layout=slider_layout, style=slider_style)

# Buttons
update_btn = Button(
    description='🔄 Update Network',
    button_style='primary',
    layout=Layout(width='220px', height='50px'),
    tooltip='Click to see your network\'s performance'
)

example_btn = Button(
    description='📖 Load Example',
    button_style='success',
    layout=Layout(width='220px', height='50px'),
    tooltip='Load a simple, interpretable example (it does not fully solve XOR)'
)

perfect_solution_btn = Button(
    description='üí° Perfect Solution',
    button_style='danger',
    layout=Layout(width='220px', height='50px'),
    tooltip='Load the hard-to-find perfect XOR solution (try to find it yourself first!)'
)

reset_btn = Button(
    description='↺ Reset Parameters',
    button_style='warning',
    layout=Layout(width='220px', height='50px'),
    tooltip='Reset all parameters to zero'
)

reset_sim_btn = Button(
    description='🔄 Reset Simulation',
    button_style='',
    layout=Layout(width='220px', height='50px'),
    tooltip='Clear output and start fresh'
)

# Output area for plots
plot_output = Output()

def update_network(btn):
    """Update network parameters and redraw visualization."""
    global X_xor, y_xor, current_dataset_type
    
    # Update network parameters from sliders
    network.w11 = w11_slider.value
    network.w12 = w12_slider.value
    network.b1 = b1_slider.value
    
    network.w21 = w21_slider.value
    network.w22 = w22_slider.value
    network.b2 = b2_slider.value
    
    network.w_out1 = w_out1_slider.value
    network.w_out2 = w_out2_slider.value
    network.b_out = b_out_slider.value
    
    # Redraw plot
    with plot_output:
        clear_output(wait=True)
        accuracy = plot_network_state(network, X_xor, y_xor)
    
    # Update accuracy display
    accuracy_html.value = f"<h3 style='text-align:center; color:#1967d2;'>Current Accuracy: {accuracy:.1f}%</h3>"
    
    # Update guidance based on accuracy AND dataset type
    if current_dataset_type == 'noisy':
        # Noisy dataset - different expectations
        if accuracy >= 85:
            guidance_html.value = """
            <div style='background:#d4edda; border-left:5px solid #28a745; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#155724; margin-top:0;'>🎉 Excellent! You found a good solution!</h4>
                <p style='color:#155724; margin:5px 0;'>
                    <b>Look at the top-right panel (h₁, h₂):</b><br/>
                    • The noisy XOR data has been transformed into hidden space<br/>
                    • The clusters overlap slightly due to noise - perfect separation is impossible!<br/>
                    • The green line finds the best approximate boundary<br/>
                    • This is realistic - real data is always noisy!
                </p>
                <p style='color:#155724; margin:5px 0;'>
                    💡 Try switching to "Clean XOR" or "Perfect XOR" dataset to see perfect separation!
                </p>
            </div>
            """
        elif accuracy >= 70:
            guidance_html.value = """
            <div style='background:#fff3cd; border-left:5px solid #ffc107; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#856404; margin-top:0;'>📈 Good progress!</h4>
                <p style='color:#856404; margin:5px 0;'>
                    <b>Note:</b> This noisy dataset cannot be perfectly separated!<br/>
                    • Goal: ~85% accuracy (not 100%!)<br/>
                    • The Gaussian noise creates overlapping regions<br/>
                    • Focus on getting clean separation in the top-right hidden space panel
                </p>
            </div>
            """
        else:
            guidance_html.value = """
            <div style='background:#f8d7da; border-left:5px solid #dc3545; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#721c24; margin-top:0;'>💡 Strategy Guide</h4>
                <p style='color:#721c24; margin:5px 0;'>
                    <b>With noisy data, aim for ~85% accuracy (not 100%!)</b>
                </p>
                <p style='color:#721c24; margin:5px 0;'>
                    Try exploring different parameter combinations, or switch to a cleaner dataset!
                </p>
            </div>
            """
    else:
        # Clean or perfect dataset - can achieve high accuracy
        if accuracy >= 99:
            guidance_html.value = """
            <div style='background:#d4edda; border-left:5px solid #28a745; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#155724; margin-top:0;'>🎉 AMAZING! You solved XOR!</h4>
                <p style='color:#155724; margin:5px 0;'>
                    <b>Look at the top-right panel (h₁, h₂):</b><br/>
                    • XOR has been transformed into two linearly separable clusters<br/>
                    • The green line is a simple straight boundary in hidden space<br/>
                    • This is exactly what hidden layers do - create new representations!
                </p>
                <p style='color:#155724; margin:5px 0;'>
                    Try tweaking parameters slightly to see how robust your solution is!<br/>
                    Or switch to "Noisy XOR" to see how real data behaves!
                </p>
            </div>
            """
        elif accuracy >= 90:
            guidance_html.value = """
            <div style='background:#fff3cd; border-left:5px solid #ffc107; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#856404; margin-top:0;'>🎯 Very Close!</h4>
                <p style='color:#856404; margin:5px 0;'>
                    <b>Strategy:</b><br/>
                    • Your hidden neurons are working well!<br/>
                    • Fine-tune the parameters to get that last bit of accuracy<br/>
                    • Watch the green line in the top-right panel - it should cleanly separate the clusters
                </p>
            </div>
            """
        elif accuracy >= 75:
            guidance_html.value = """
            <div style='background:#cce5ff; border-left:5px solid #004085; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#004085; margin-top:0;'>📈 Good Progress!</h4>
                <p style='color:#004085; margin:5px 0;'>
                    <b>Strategy:</b><br/>
                    • Check the bottom panels: Are H1 and H2 creating useful splits?<br/>
                    • Experiment with different parameter combinations<br/>
                    • Try making the weights larger or smaller<br/>
                    • Perfect separation IS possible - keep exploring!
                </p>
            </div>
            """
        else:
            guidance_html.value = """
            <div style='background:#f8d7da; border-left:5px solid #dc3545; padding:15px; border-radius:8px; margin:10px 0;'>
                <h4 style='color:#721c24; margin-top:0;'>💡 Keep Exploring!</h4>
                <p style='color:#721c24; margin:5px 0;'>
                    XOR is a challenging problem to solve manually with 9 parameters.<br/>
                    • Try different combinations of weights and biases<br/>
                    • Watch how the hidden space (top-right panel) changes<br/>
                    • Perfect separation IS achievable, but hard to find!<br/>
                    • This is why automatic training is so valuable!
                </p>
                <p style='color:#721c24; margin:5px 0; font-style:italic;'>
                    Feeling stuck? Click "Load Example" for a simple (non-perfect) solution,<br/>
                    or "Perfect Solution" to see the hard-to-find configuration that achieves 100%!
                </p>
            </div>
            """

def change_dataset(change):
    """Handle dataset change from dropdown."""
    global X_xor, y_xor, current_dataset_type
    
    current_dataset_type = change['new']
    X_xor, y_xor = create_xor_dataset(current_dataset_type)
    
    # Update network visualization
    update_network(None)
    
    # Show info about the dataset
    if current_dataset_type == 'perfect':
        info_msg = "✓ Switched to Perfect XOR (4 points) - 100% accuracy is achievable!"
    elif current_dataset_type == 'clean':
        info_msg = "✓ Switched to Clean XOR (tight clusters) - near-perfect separation possible!"
    else:
        info_msg = "✓ Switched to Noisy XOR (Gaussian clouds) - expect ~85% accuracy max!"
    
    print(info_msg)

def load_example(btn):
    """Load simple example solution (not perfect, but understandable)."""
    # Set sliders to simple, interpretable values
    w11_slider.value = 5
    w12_slider.value = 0
    b1_slider.value = 0
    
    w21_slider.value = 0
    w22_slider.value = 5
    b2_slider.value = 0
    
    w_out1_slider.value = 5
    w_out2_slider.value = 5
    b_out_slider.value = -7
    
    # Update network
    update_network(None)
    
    # Show explanation
    guidance_html.value = """
    <div style='background:#e7f3ff; border-left:5px solid #2196F3; padding:15px; border-radius:8px; margin:10px 0;'>
        <h4 style='color:#0d47a1; margin-top:0;'>📖 Simple Example Solution Loaded!</h4>
        <p style='color:#0d47a1; margin:5px 0;'>
            <b>Hidden Neuron 1 (H1):</b> w₁₁=5, w₁₂=0, b₁=0<br/>
            • Creates vertical boundary at x₁=0<br/>
            • Separates left points from right points
        </p>
        <p style='color:#0d47a1; margin:5px 0;'>
            <b>Hidden Neuron 2 (H2):</b> w₂₁=0, w₂₂=5, b₂=0<br/>
            • Creates horizontal boundary at x₂=0<br/>
            • Separates bottom points from top points
        </p>
        <p style='color:#0d47a1; margin:5px 0;'>
            <b>Output Layer:</b> w_out1=5, w_out2=5, b_out=-7<br/>
            • Outputs high only when BOTH h₁ and h₂ are high (a logical AND)
        </p>
        <p style='color:#0d47a1; margin:5px 0; font-weight:bold;'>
            This is an INTUITIVE solution, but it doesn't achieve 100% on XOR!<br/>
            It shows a simple, understandable approach.
        </p>
        <p style='color:#0d47a1; margin:5px 0; font-style:italic;'>
            A perfect XOR solution exists, but uses non-obvious parameters.<br/>
            Click "Perfect Solution" to see it (after exploring on your own!).
        </p>
    </div>
    """

def load_perfect_solution(btn):
    """Load the hard-to-find perfect XOR solution."""
    # Show warning first
    guidance_html.value = """
    <div style='background:#fff3cd; border-left:5px solid #ffc107; padding:15px; border-radius:8px; margin:10px 0;'>
        <h4 style='color:#856404; margin-top:0;'>⚠️ Are you sure?</h4>
        <p style='color:#856404; margin:5px 0;'>
            This will load a perfect XOR solution that achieves 100% accuracy.<br/>
            <b>It's MUCH more valuable to struggle and explore first!</b>
        </p>
        <p style='color:#856404; margin:5px 0;'>
            The struggle helps you understand:<br/>
            • How hard it is to find good parameters manually<br/>
            • Why the parameter space is so complex<br/>
            • Why we need automatic training (Module 2!)
        </p>
        <p style='color:#856404; margin:5px 0; font-weight:bold;'>
            Click "Perfect Solution" again to confirm you want to see it.
        </p>
    </div>
    """
    
    # Change button behavior to actually load on second click.
    # The button style tracks the state: 'danger' = idle, 'warning' = awaiting confirmation.
    if perfect_solution_btn.button_style == 'danger':
        perfect_solution_btn.description = '⚠️ Confirm Load'
        perfect_solution_btn.button_style = 'warning'
    else:
        # Actually load the perfect solution
        w11_slider.value = -10
        w12_slider.value = -10
        b1_slider.value = -10
        
        w21_slider.value = -10
        w22_slider.value = -10
        b2_slider.value = 5
        
        w_out1_slider.value = -10
        w_out2_slider.value = 10
        b_out_slider.value = -5
        
        # Update network
        update_network(None)
        
        # Reset button
        perfect_solution_btn.description = 'üí° Perfect Solution'
        perfect_solution_btn.button_style = 'danger'
        
        # Show explanation
        guidance_html.value = """
        <div style='background:#f3e5f5; border-left:5px solid #9c27b0; padding:15px; border-radius:8px; margin:10px 0;'>
            <h4 style='color:#6a1b9a; margin-top:0;'>💡 Perfect XOR Solution Loaded!</h4>
            <p style='color:#6a1b9a; margin:5px 0;'>
                <b>This solution achieves 100% accuracy on perfect XOR!</b>
            </p>
            <p style='color:#6a1b9a; margin:5px 0;'>
                <b>Hidden Neuron 1 (H1):</b> w₁₁=-10, w₁₂=-10, b₁=-10<br/>
                • Detects when BOTH x₁ AND x₂ are strongly negative<br/>
                • Activates high (≈1) only for bottom-left corner
            </p>
            <p style='color:#6a1b9a; margin:5px 0;'>
                <b>Hidden Neuron 2 (H2):</b> w₂₁=-10, w₂₂=-10, b₂=+5<br/>
                • Same weights as H1 but with a positive bias, so its boundary sits at x₁ + x₂ = 0.5<br/>
                • Activates high (≈1) everywhere except the top-right corner
            </p>
            <p style='color:#6a1b9a; margin:5px 0;'>
                <b>Output Layer:</b> w_out1=-10, w_out2=+10, b_out=-5<br/>
                • Uses NEGATIVE weight on H1, POSITIVE on H2<br/>
                • Output is high only when H2 is on and H1 is off: this creates the XOR logic in hidden space!
            </p>
            <p style='color:#6a1b9a; margin:5px 0; font-weight:bold;'>
                Look at the top-right panel: Perfect linear separation in hidden space!
            </p>
            <p style='color:#6a1b9a; margin:5px 0;'>
                <b>Key Insight:</b> This solution is NON-OBVIOUS!<br/>
                • Uses all negative weights in hidden layer<br/>
                • Uses opposite signs in output layer<br/>
                • You would never find this by intuition alone!<br/>
                <br/>
                <b>This is why we need automatic training with gradient descent!</b>
            </p>
        </div>
        """

def reset_parameters(btn):
    """Reset all parameters to zero."""
    w11_slider.value = 0
    w12_slider.value = 0
    b1_slider.value = 0
    w21_slider.value = 0
    w22_slider.value = 0
    b2_slider.value = 0
    w_out1_slider.value = 0
    w_out2_slider.value = 0
    b_out_slider.value = 0
    
    # Reset perfect solution button
    perfect_solution_btn.description = '💡 Perfect Solution'
    perfect_solution_btn.button_style = 'danger'
    
    update_network(None)

def reset_simulation(btn):
    """Reset the entire simulation - clear output and parameters."""
    # Reset all sliders to zero
    w11_slider.value = 0
    w12_slider.value = 0
    b1_slider.value = 0
    w21_slider.value = 0
    w22_slider.value = 0
    b2_slider.value = 0
    w_out1_slider.value = 0
    w_out2_slider.value = 0
    b_out_slider.value = 0
    
    # Reset network
    network.w11 = 0
    network.w12 = 0
    network.b1 = 0
    network.w21 = 0
    network.w22 = 0
    network.b2 = 0
    network.w_out1 = 0
    network.w_out2 = 0
    network.b_out = 0
    
    # Reset perfect solution button
    perfect_solution_btn.description = '💡 Perfect Solution'
    perfect_solution_btn.button_style = 'danger'
    
    # Clear output
    with plot_output:
        clear_output(wait=True)
    
    # Reset displays
    accuracy_html.value = "<h3 style='text-align:center;'>Current Accuracy: ---%</h3>"
    guidance_html.value = "<div style='background:#e8f0fe; padding:15px; border-radius:8px; margin:10px 0;'><p style='margin:0;'>Adjust the sliders below to tune your network, then click 'Update Network'!</p></div>"

# Connect buttons to functions
update_btn.on_click(update_network)
example_btn.on_click(load_example)
perfect_solution_btn.on_click(load_perfect_solution)
reset_btn.on_click(reset_parameters)
reset_sim_btn.on_click(reset_simulation)
dataset_dropdown.observe(change_dataset, names='value')

# Layout the interface
print("\n" + "="*80)
print("INTERACTIVE NETWORK BUILDER")
print("="*80)
print("\nInstructions:")
print("1. Choose a dataset type from the dropdown")
print("2. Adjust the 9 sliders below to tune your network")
print("3. Click 'Update Network' to see the results")
print("4. Focus on the TOP-RIGHT panel - that's where XOR should become separable!")
print("5. Try to find a solution yourself before using the buttons!")
print("6. 'Load Example' shows a simple (imperfect) solution")
print("7. 'Perfect Solution' shows the hard-to-find 100% solution (try last!)")
print("\n" + "="*80)

display(HTML("<h4 style='margin-top:10px; color:#1967d2;'>📊 Choose Your Dataset:</h4>"))
display(dataset_dropdown)
display(HTML("<p style='color:#5f6368; font-size:13px; margin:5px 0 15px 0;'><b>Tip:</b> Start with 'Clean XOR' to learn the concept!</p>"))

display(accuracy_html)
display(guidance_html)

display(HTML("<h4 style='margin-top:20px; color:#1967d2;'>⚙️ Hidden Neuron 1 Parameters:</h4>"))
display(VBox([w11_slider, w12_slider, b1_slider]))

display(HTML("<h4 style='margin-top:20px; color:#1967d2;'>⚙️ Hidden Neuron 2 Parameters:</h4>"))
display(VBox([w21_slider, w22_slider, b2_slider]))

display(HTML("<h4 style='margin-top:20px; color:#1967d2;'>⚙️ Output Layer Parameters:</h4>"))
display(VBox([w_out1_slider, w_out2_slider, b_out_slider]))

display(HTML("<div style='margin:20px 0;'></div>"))
# Three rows of buttons
display(HBox([update_btn, example_btn]))
display(HTML("<div style='margin:10px 0;'></div>"))
display(HBox([perfect_solution_btn]))
display(HTML("<div style='margin:10px 0;'></div>"))
display(HBox([reset_btn, reset_sim_btn]))
display(HTML("<div style='margin:20px 0;'></div>"))

display(plot_output)

# Show initial state
update_network(None)

## 5. Understanding What Happened

### What Did You Just Do?

**The Manual Approach (You):**
1. Adjusted 9 sliders by hand
2. Tried to make hidden space linearly separable
3. Saw how two perceptrons can combine to solve XOR

**Why This Was Hard:**
- 9-dimensional parameter space to search
- Non-obvious which direction to adjust
- Trial and error with only visual feedback
- Imagine doing this with 1000 parameters... or 1 million!

### The Key Insight: Hidden Space Transformation

**From Module 0, you learned:**
- Manually adding x₃ = x₁ × x₂ made XOR separable
- A flat plane in 3D became a curved boundary in 2D

**Now in Module 1, you discovered:**
- Hidden neurons create h₁ and h₂ automatically (no manual feature engineering!)
- The network transforms (x₁, x₂) → (h₁, h₂)
- **XOR becomes linearly separable in this new (h₁, h₂) space**
- A simple straight line in hidden space = complex boundary in input space
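
To make the transformation concrete, here is a minimal sketch that maps the four XOR corners into (h₁, h₂) space. It assumes cluster centres at ±1.5 and uses the hidden-layer weights shown in the 'Perfect Solution' message; the names `corners` and `to_hidden` are illustrative and not part of the notebook's own code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed cluster centres (+/-1.5) and labels for the clean XOR data.
corners = {
    "bottom-left":  ((-1.5, -1.5), 0),
    "top-right":    (( 1.5,  1.5), 0),
    "top-left":     ((-1.5,  1.5), 1),
    "bottom-right": (( 1.5, -1.5), 1),
}

# Hidden-layer weights taken from the 'Perfect Solution' message.
W11, W12, B1 = -10, -10, -10
W21, W22, B2 = -10, -10, 5

def to_hidden(x1, x2):
    """Map an input point to its hidden-space coordinates (h1, h2)."""
    h1 = sigmoid(W11 * x1 + W12 * x2 + B1)
    h2 = sigmoid(W21 * x1 + W22 * x2 + B2)
    return h1, h2

hidden = {name: (to_hidden(*xy), label) for name, (xy, label) in corners.items()}
for name, ((h1, h2), label) in hidden.items():
    # The output weights (-10, +10, -5) correspond to the line h2 - h1 = 0.5.
    side = int(h2 - h1 > 0.5)
    print(f"{name:12s} h1={h1:.3f} h2={h2:.3f} label={label} line_side={side}")
```

Both class-0 corners end up with h₂ - h₁ close to 0, and both class-1 corners with h₂ - h₁ close to 1, so a single straight line in hidden space separates them.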

### Connection to Lab 3:

Remember perceptrons from Lab 3? Each hidden neuron **IS** a perceptron!
- H1 = perceptron with weights (w₁₁, w₁₂) and bias b₁
- H2 = perceptron with weights (w₂₁, w₂₂) and bias b₂
- Output = perceptron that takes (h₁, h₂) as input

**Two perceptrons + one output perceptron = solves XOR!**

This is why a single perceptron couldn't solve XOR in Lab 3, but a network can.
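
One way to see the "three perceptrons" structure is to write a single perceptron function and call it three times. This is only a sketch: `perceptron`, `network`, and the `params` dictionary keys are hypothetical names, not the notebook's actual class.

```python
import math

def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, squashed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def network(x1, x2, params):
    """2-2-1 forward pass built from three perceptron calls."""
    h1 = perceptron((x1, x2), params["h1_w"], params["h1_b"])
    h2 = perceptron((x1, x2), params["h2_w"], params["h2_b"])
    return perceptron((h1, h2), params["out_w"], params["out_b"])

# All-zero parameters, like the sliders after a reset.
zeros = {"h1_w": (0, 0), "h1_b": 0, "h2_w": (0, 0), "h2_b": 0,
         "out_w": (0, 0), "out_b": 0}
print(network(1.5, -1.5, zeros))
```

With every parameter at zero, each sigmoid receives 0, so the network outputs 0.5 for any input and carries no information about the class.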

### Coming Next (Module 2):

**Automatic training does this for you!**
- Gradient descent finds good parameters automatically
- Scales to millions of parameters
- You just saw **WHY** hidden layers work
- Next you'll see **HOW** they learn!

## 6. Example Solution Explained

If you loaded the example solution (or found your own!), let's understand what each part does:

### Hidden Neuron 1 (H1):
**Parameters:** w₁₁=5, w₁₂=0, b₁=0

**What it computes:**
```
z₁ = 5·x₁ + 0·x₂ + 0 = 5·x₁
h₁ = sigmoid(5·x₁)
```

**What it does:**
- Creates a **vertical boundary** at x₁ = 0
- When x₁ < 0 (left side): h₁ ≈ 0
- When x₁ > 0 (right side): h₁ ≈ 1
- **Effect:** Separates left points from right points

### Hidden Neuron 2 (H2):
**Parameters:** w₂₁=0, w₂₂=5, b₂=0

**What it computes:**
```
z₂ = 0·x₁ + 5·x₂ + 0 = 5·x₂
h₂ = sigmoid(5·x₂)
```

**What it does:**
- Creates a **horizontal boundary** at x₂ = 0
- When x₂ < 0 (bottom): h₂ ≈ 0
- When x₂ > 0 (top): h₂ ≈ 1
- **Effect:** Separates bottom points from top points
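
The two hidden neurons described above can be evaluated directly. The sketch below uses the example weights and a few sample coordinates chosen for illustration; the function names `h1` and `h2` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h1(x1, x2):
    return sigmoid(5 * x1 + 0 * x2 + 0)   # depends on x1 only

def h2(x1, x2):
    return sigmoid(0 * x1 + 5 * x2 + 0)   # depends on x2 only

for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"x1={x:+.1f}: h1={h1(x, 0.0):.3f}   x2={x:+.1f}: h2={h2(0.0, x):.3f}")
```

The "≈ 0" and "≈ 1" approximations hold well at the cluster centres (±1.5), but close to the boundary the sigmoid is still in transition, and exactly on the boundary it returns 0.5.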

### Output Neuron:
**Parameters:** w_out1=5, w_out2=5, b_out=-7

**What it computes:**
```
z_out = 5·h₁ + 5·h₂ - 7
output = sigmoid(5·h₁ + 5·h₂ - 7)
```

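**What it does (AND, not XOR):**

Evaluating the four cluster corners shows that this output neuron fires only when both hidden neurons are on:

| Corner | x₁ | x₂ | h₁ | h₂ | z_out = 5h₁+5h₂-7 | output | True Label |
|--------|----|----|----|----|-------------------|--------|------------|
| Bottom-left | -1.5 | -1.5 | ~0 | ~0 | -7 | ~0 | 0 ✓ |
| Top-right | +1.5 | +1.5 | ~1 | ~1 | +3 | ~1 | 0 ✗ |
| Top-left | -1.5 | +1.5 | ~0 | ~1 | -2 | ~0.1 | 1 ✗ |
| Bottom-right | +1.5 | -1.5 | ~1 | ~0 | -2 | ~0.1 | 1 ✗ |

Only one of the four corners is classified correctly. In (h₁, h₂) space the corners land near (0, 0), (1, 1), (0, 1), and (1, 0), which is still an XOR pattern, so no straight line in hidden space can separate the classes. Vertical and horizontal boundaries are the wrong transformation for this problem, whatever the output weights are.

### Better Example Solution:

The values loaded by the 'Perfect Solution' button use two parallel diagonal boundaries instead:

**H1:** w₁₁=-10, w₁₂=-10, b₁=-10 (creates diagonal boundary: x₁+x₂=-1)
**H2:** w₂₁=-10, w₂₂=-10, b₂=+5 (creates diagonal boundary: x₁+x₂=0.5)
**Output:** w_out1=-10, w_out2=+10, b_out=-5

| Corner | x₁ | x₂ | h₁ | h₂ | z_out = -10h₁+10h₂-5 | output | True Label |
|--------|----|----|----|----|----------------------|--------|------------|
| Bottom-left | -1.5 | -1.5 | ~1 | ~1 | -5 | ~0 | 0 ✓ |
| Top-right | +1.5 | +1.5 | ~0 | ~0 | -5 | ~0 | 0 ✓ |
| Top-left | -1.5 | +1.5 | ~0 | ~1 | +5 | ~1 | 1 ✓ |
| Bottom-right | +1.5 | -1.5 | ~0 | ~1 | +5 | ~1 | 1 ✓ |

The key insight is that **many solutions exist!** The network can discover different transformations into hidden space that make XOR separable.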
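
The corner-by-corner arithmetic can be checked with a few lines of code. This sketch assumes cluster centres at ±1.5 and a 0.5 decision threshold; `forward`, `accuracy`, and `CORNERS` are illustrative names, and the two parameter tuples are the first example from this section and the values from the 'Perfect Solution' message.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, p):
    """Forward pass; p = (w11, w12, b1, w21, w22, b2, w_out1, w_out2, b_out)."""
    w11, w12, b1, w21, w22, b2, wo1, wo2, bo = p
    h1 = sigmoid(w11 * x1 + w12 * x2 + b1)
    h2 = sigmoid(w21 * x1 + w22 * x2 + b2)
    return sigmoid(wo1 * h1 + wo2 * h2 + bo)

# (x1, x2, label) for the four assumed cluster centres.
CORNERS = [(-1.5, -1.5, 0), (1.5, 1.5, 0), (-1.5, 1.5, 1), (1.5, -1.5, 1)]

def accuracy(p):
    """Fraction of corners whose thresholded output matches the label."""
    hits = sum(int(forward(x1, x2, p) > 0.5) == label for x1, x2, label in CORNERS)
    return hits / len(CORNERS)

example = (5, 0, 0, 0, 5, 0, 5, 5, -7)
perfect = (-10, -10, -10, -10, -10, 5, -10, 10, -5)
print("example:", accuracy(example))
print("perfect:", accuracy(perfect))
```

This only tests the four centres, not the noisy points around them, so it is a sanity check on the logic rather than a substitute for the accuracy shown in the interactive display.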

### The Hidden Space Magic:

No matter which solution you found, the pattern is the same:
1. **Hidden neurons create new dimensions** (h₁, h₂)
2. **XOR data transforms** from messy in (x₁, x₂) to separable in (h₁, h₂)
3. **Output layer draws simple line** in hidden space
4. **This maps back** to complex boundary in input space

**This is the fundamental reason why neural networks work!**

## 7. Key Takeaways from Module 1

### 1. Hidden Layers Create New Dimensions
- Just like you manually added x₃=x₁×x₂ in Module 0
- Hidden neurons create h₁, h₂ automatically during training
- These new dimensions make the problem solvable
- **The transformation is learned, not hand-designed!**

### 2. Separation Happens in Hidden Space
- **Not in input space!** The input boundary will be complex
- The key transformation is (x₁, x₂) → (h₁, h₂)
- Linear separation in hidden space = complex boundary in input space
- **Focus on the hidden representation, not just the final output!**

### 3. Each Hidden Neuron is a Perceptron (from Lab 3)
- H1 and H2 are both perceptrons with sigmoid activation
- Each creates one boundary/transformation
- The output layer combines their outputs
- **Two perceptrons together can solve what one cannot!**

### 4. Manual Tuning Doesn't Scale
- 9 parameters was already hard to tune by hand
- Modern networks have millions or billions of parameters
- Gradient descent does this automatically in Module 2
- **Automatic training is not just convenient - it's essential!**
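
To get a feel for how quickly the parameter count grows, here is a small sketch that counts weights and biases for a fully connected network. The function name `count_parameters` and the larger layer sizes are hypothetical, chosen only for illustration.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_parameters([2, 2, 1]))        # 9
print(count_parameters([784, 128, 10]))   # 101770
```

Each layer contributes one weight per input-output pair plus one bias per output neuron, which is why even a modest second network is far beyond what sliders could handle.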

### 5. Multiple Solutions Exist
- There's no single "correct" solution to XOR
- Different weight configurations can achieve high accuracy
- The network can discover various transformations of hidden space
- **This flexibility is a strength of neural networks!**
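
As an illustration of the point about multiple solutions, the sketch below compares the values from the 'Perfect Solution' message with a second, hypothetical weight set that was worked out by hand for this example (it is not taken from the notebook). Cluster centres at ±1.5 and a 0.5 threshold are assumed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_and_output(x1, x2, p):
    """Return (h1, h2, output) for p = (w11, w12, b1, w21, w22, b2, w_out1, w_out2, b_out)."""
    w11, w12, b1, w21, w22, b2, wo1, wo2, bo = p
    h1 = sigmoid(w11 * x1 + w12 * x2 + b1)
    h2 = sigmoid(w21 * x1 + w22 * x2 + b2)
    return h1, h2, sigmoid(wo1 * h1 + wo2 * h2 + bo)

CORNERS = [(-1.5, -1.5, 0), (1.5, 1.5, 0), (-1.5, 1.5, 1), (1.5, -1.5, 1)]

solutions = {
    "perfect-solution button": (-10, -10, -10, -10, -10, 5, -10, 10, -5),
    "hypothetical alternative": (5, 5, 5, 5, 5, -5, 10, -10, -5),
}

def solves_xor(p):
    """True if all four corners are classified correctly."""
    return all(int(hidden_and_output(x1, x2, p)[2] > 0.5) == label
               for x1, x2, label in CORNERS)

for name, p in solutions.items():
    h1, h2, _ = hidden_and_output(-1.5, 1.5, p)   # where the top-left corner lands
    print(f"{name}: solves XOR = {solves_xor(p)}, top-left maps to h=({h1:.2f}, {h2:.2f})")
```

Both weight sets classify the four corners correctly, yet they send the same input point to opposite corners of hidden space, so the hidden representation itself is not unique.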

---

**Next:** In Module 2, you'll see gradient descent automatically find these parameters through training. The network will learn a hidden-space transformation like the one you just discovered - but completely on its own!

## Questions for Your Answer Sheet

**Q5.** How many total parameters does the 2-2-1 network have? Break down the count by neuron (hidden neuron 1, hidden neuron 2, output neuron).

**Q6.** Describe what each hidden neuron (H1 and H2) separated in your solution. What patterns did they detect in the input space? (Refer to the bottom-left and bottom-right panels)

**Q7.** Look at the hidden space plot (top-right panel with h₁ and h₂ axes). Explain how the XOR data was transformed in (h₁, h₂) space and why this transformation made it easier to separate the two classes. How does this relate to what you did manually in Module 0?

## Next Steps

1. **Answer Q5-Q7** on your answer sheet
2. **Experiment** with different parameter values - try to find alternative solutions!
3. **Return to the LMS** and continue to Module 2
4. In Module 2, you'll see how gradient descent trains this same 2-2-1 network automatically!