# Visual Weight Tuning with Interactive Sliders

This notebook demonstrates interactive neural network weight tuning using sliders. You can adjust individual weights in a neural network and immediately see how they affect the model's predictions and decision boundaries.

This approach is inspired by research on sparse autoencoders and mechanistic interpretability. It lets us explore how individual parameters contribute to network behavior.

## Key Features

- Interactive sliders for every weight in the neural network
- Real-time visualization of decision boundaries
- Activation statistics and weight heatmaps
- Multiple dataset options (XOR, circles, moons)


In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import ipywidgets as widgets
from IPython.display import display

# Import from our custom modules
from nn_slider_core import SimpleMLPNetwork, create_demo_dataset, compute_activation_statistics
from sliders_interface import NetworkSliderInterface, CompactSliderInterface, create_weight_heatmap


## Step 1: Create a Demo Dataset

We'll start with the classic XOR problem: a dataset that is not linearly separable, so a network needs at least one hidden layer to solve it.


In [None]:
# Create XOR dataset
X_train, y_train = create_demo_dataset(task='xor', n_samples=200, noise=0.1)

# Visualize the dataset
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='RdYlBu', s=50, alpha=0.7, edgecolors='k')
plt.colorbar(scatter, label='Class')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('XOR Dataset')
plt.grid(True, alpha=0.3)
plt.show()
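
The internals of `create_demo_dataset` are not shown in this notebook. As a rough illustration of what an XOR dataset looks like, here is a minimal sketch that assumes points are sampled around the four corners of the unit square. The function name `make_xor_sketch` and its `seed` argument are hypothetical, and the real helper may scale or center the data differently.

```python
import numpy as np

def make_xor_sketch(n_samples=200, noise=0.1, seed=0):
    """Hypothetical stand-in for create_demo_dataset(task='xor')."""
    rng = np.random.default_rng(seed)
    # Pick one of the four XOR corners (0/1 in each coordinate) per sample.
    corners = rng.integers(0, 2, size=(n_samples, 2))
    # The label is 1 when exactly one coordinate is 1.
    y = np.logical_xor(corners[:, 0], corners[:, 1]).astype(float)
    # Jitter the corners with Gaussian noise so each cluster has some spread.
    X = corners + rng.normal(scale=noise, size=(n_samples, 2))
    return X, y

X_demo, y_demo = make_xor_sketch()
print(X_demo.shape, y_demo.shape)
```

No single straight line separates the two classes here, because each class occupies a pair of diagonally opposite corners.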


## Step 2: Initialize Neural Network

We'll create a simple MLP with one hidden layer. The network architecture is: 2 inputs → 4 hidden units → 1 output.


In [None]:
# Create a simple neural network
network = SimpleMLPNetwork(input_size=2, hidden_sizes=[4], output_size=1)

# Display network information
param_info = network.get_parameter_info()
print(f"Total parameters: {network.get_num_parameters()}")
print("\nParameter breakdown:")
for info in param_info:
    print(f"  {info['name']}: shape {info['shape']}, {info['size']} parameters")
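
To sanity-check the reported total, the parameter count of a fully connected network can be worked out by hand. The sketch below assumes every layer has a full weight matrix plus one bias per unit, which matches the `weights` and `biases` attributes used later in this notebook. `count_mlp_parameters` is an illustrative helper, not part of `nn_slider_core`.

```python
def count_mlp_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # entries in the weight matrix
        total += fan_out           # one bias per unit in the layer
    return total

print(count_mlp_parameters([2, 4, 1]))  # 2*4 + 4 + 4*1 + 1 = 17
```

Under that assumption, the 2 → 4 → 1 network has 17 adjustable values, which is also the number of sliders to expect in the full interface.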


## Step 3: Visualize Initial Weights

Let's visualize the initial random weights as a heatmap to understand the starting configuration.


In [None]:
# Create weight heatmap for the first layer
fig = create_weight_heatmap(network, layer_idx=0)
plt.tight_layout()
plt.show()


## Step 4: Helper Function for Decision Boundary Visualization

This function creates a mesh grid and visualizes how the network classifies different regions of the input space.


In [None]:
def plot_decision_boundary(network, X, y, resolution=0.02):
    """Plot the decision boundary of the neural network."""
    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, resolution),
                         np.arange(y_min, y_max, resolution))
    
    # Predict on mesh grid
    Z = network.forward(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(10, 8))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu', levels=20)
    plt.colorbar(label='Network Output')
    scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu', s=50, alpha=0.8, edgecolors='k', linewidth=1.5)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Decision Boundary Visualization')
    plt.grid(True, alpha=0.3)
    
    # Compute and display loss
    loss = network.compute_loss(X, y)
    plt.text(0.02, 0.98, f'Loss: {loss:.4f}', transform=plt.gca().transAxes,
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8),
             verticalalignment='top', fontsize=12)
    
    plt.show()


## Step 5: Initial Decision Boundary (Random Weights)

Let's see how the network performs with random initialization before any tuning.


In [None]:
# Plot initial decision boundary
plot_decision_boundary(network, X_train, y_train)


## Step 6: Interactive Weight Tuning with Sliders

Now comes the exciting part! Use the sliders below to adjust individual weights and biases. Watch how each parameter affects the decision boundary and network output.

Tips for exploration:

- Try adjusting weights in the first layer to see how they affect feature detection
- Modify biases to shift decision boundaries
- Observe how hidden layer activations change with different weight configurations
- Try to manually solve the XOR problem by tuning weights!


In [None]:
# Create interactive slider interface
slider_interface = NetworkSliderInterface(network, X_train, y_train)
slider_interface.display()
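
If you get stuck on the last tip, it can help to know that a solution exists with only two hidden units. The sketch below is a standalone NumPy example, not the notebook's `SimpleMLPNetwork`. It assumes ReLU hidden units, a linear output, and inputs near the corners of the unit square. The weight values are hand-picked for illustration.

```python
import numpy as np

# Hand-picked weights for a 2 -> 2 -> 1 ReLU network (illustrative values).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # both hidden units sum the two inputs
b1 = np.array([0.0, -1.0])    # the second unit fires only when both inputs are on
W2 = np.array([[1.0],
               [-2.0]])       # subtract the "both on" detector twice
b2 = np.array([0.0])

def forward(X):
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2                # linear output layer

corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(corners).ravel())
```

The four corners map to 0, 1, 1, 0. In a network with four hidden units, the two extra units could be given zero output weights. If the dataset is scaled or the output layer uses a different activation, the exact slider values will differ, but the same idea applies: one unit detects "at least one input on" and another cancels the "both on" case.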


## Step 7: Compact Slider Interface (Alternative)

For larger networks, we can use a more compact interface that groups parameters by layer.


In [None]:
# Create compact slider interface
compact_interface = CompactSliderInterface(network, X_train, y_train)
compact_interface.display()
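
Grouping sliders by layer requires a mapping from a flat list of slider values back to per-layer weight matrices and bias vectors. The sketch below shows one way that mapping could work. The ordering (each layer's weights followed by its biases) is an assumption, and `split_by_layer` is a hypothetical helper; the actual layout used by `sliders_interface` may differ.

```python
import numpy as np

def split_by_layer(flat, layer_sizes):
    """Slice a flat parameter vector into (W, b) pairs, one pair per layer."""
    layers, start = [], 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w_end = start + fan_in * fan_out
        W = flat[start:w_end].reshape(fan_in, fan_out)  # weight matrix
        b = flat[w_end:w_end + fan_out]                 # bias vector
        layers.append((W, b))
        start = w_end + fan_out
    return layers

flat = np.arange(17, dtype=float)  # one value per slider in a 2 -> 4 -> 1 network
for i, (W, b) in enumerate(split_by_layer(flat, [2, 4, 1])):
    print(f"layer {i}: W {W.shape}, b {b.shape}")
```

Each `(W, b)` pair could then back one collapsible group of sliders, which keeps the interface manageable as the layer count grows.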


## Step 8: Analyze Activation Statistics

Let's examine the activation patterns in the hidden layer to see which features the current, hand-tuned weights produce.


In [None]:
# Get current network parameters after tuning
current_params = slider_interface.get_current_parameters()
network.unvectorize_parameters(current_params)

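# Manual forward pass that records the output of every layer.
# Note: this assumes ReLU hidden units and no activation on the output layer;
# adjust it if SimpleMLPNetwork uses different activations.
activations = []
x = X_train
for i, (W, b) in enumerate(zip(network.weights, network.biases)):
    x = x @ W + b
    if i < len(network.weights) - 1:  # Hidden layers only
        x = np.maximum(0, x)  # ReLU
    activations.append(x)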

# Compute statistics
stats = compute_activation_statistics(activations)

# Visualize activation statistics
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

layers = list(range(len(stats['mean'])))

axes[0].plot(layers, stats['mean'], marker='o', linewidth=2, markersize=8)
axes[0].set_xlabel('Layer')
axes[0].set_ylabel('Mean Activation')
axes[0].set_title('Mean Activation by Layer')
axes[0].grid(True, alpha=0.3)

axes[1].plot(layers, stats['std'], marker='s', linewidth=2, markersize=8, color='orange')
axes[1].set_xlabel('Layer')
axes[1].set_ylabel('Std Deviation')
axes[1].set_title('Activation Std Dev by Layer')
axes[1].grid(True, alpha=0.3)

axes[2].plot(layers, stats['sparsity'], marker='^', linewidth=2, markersize=8, color='green')
axes[2].set_xlabel('Layer')
axes[2].set_ylabel('Sparsity (% zeros)')
axes[2].set_title('Activation Sparsity by Layer')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
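
The plots above rely on `compute_activation_statistics`, whose implementation lives in `nn_slider_core` and is not shown here. As a rough guide to what the three curves mean, the sketch below computes comparable per-layer quantities. The function name is hypothetical, and it reports sparsity as a fraction of exact zeros, whereas the real helper may report a percentage.

```python
import numpy as np

def activation_stats_sketch(activations):
    """Per-layer mean, standard deviation, and fraction of exact zeros."""
    return {
        "mean": [float(a.mean()) for a in activations],
        "std": [float(a.std()) for a in activations],
        "sparsity": [float(np.mean(a == 0)) for a in activations],
    }

# One layer, two samples, two units: the first unit is always off.
demo = [np.array([[0.0, 2.0],
                  [0.0, 4.0]])]
print(activation_stats_sketch(demo))
```

In the demo, half of the activations are exactly zero because one ReLU unit never fires. A unit like that contributes nothing to the output until its weights or bias are moved far enough to activate it.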


## Step 9: Experiment with Different Datasets

Try the same interactive tuning approach with different datasets to see how network behavior changes.


In [None]:
# Create circles dataset
X_circles, y_circles = create_demo_dataset(task='circles', n_samples=200, noise=0.05)

# Visualize
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_circles[:, 0], X_circles[:, 1], c=y_circles, cmap='RdYlBu', s=50, alpha=0.7, edgecolors='k')
plt.colorbar(scatter, label='Class')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Circles Dataset')
plt.grid(True, alpha=0.3)
plt.show()

# Create new network for circles
network_circles = SimpleMLPNetwork(input_size=2, hidden_sizes=[6], output_size=1)

# Interactive interface for circles
slider_circles = NetworkSliderInterface(network_circles, X_circles, y_circles)
slider_circles.display()


## Conclusion and Key Insights

This notebook demonstrated interactive weight tuning for neural networks using sliders. Key takeaways:

1. Individual weights differ in how strongly they affect the decision boundary; some are far more influential than others
2. Hidden layer weights determine feature detection, while output weights combine these features
3. Biases shift decision boundaries and activation thresholds
4. Activation sparsity can indicate which neurons are actively contributing to predictions
5. Manual weight tuning provides intuition for how gradient descent optimizes networks
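
One way to make the first takeaway concrete is to nudge each parameter in turn and record how much the loss changes, which is roughly what moving a single slider does. The sketch below uses a standalone 2 → 2 → 1 ReLU network with mean squared error. The network layout, the loss, and the step size are all assumptions chosen for illustration, not the notebook's `SimpleMLPNetwork`.

```python
import numpy as np

def mse_loss(params, X, y):
    """Loss of a tiny 2 -> 2 -> 1 ReLU network stored as a flat vector."""
    W1, b1 = params[0:4].reshape(2, 2), params[4:6]
    W2, b2 = params[6:8].reshape(2, 1), params[8:9]
    hidden = np.maximum(0.0, X @ W1 + b1)
    pred = (hidden @ W2 + b2).ravel()
    return float(np.mean((pred - y) ** 2))

def sensitivity(params, X, y, step=0.1):
    """Change in loss when each parameter is nudged by `step`, one at a time."""
    base = mse_loss(params, X, y)
    deltas = np.zeros_like(params)
    for i in range(params.size):
        nudged = params.copy()
        nudged[i] += step  # move one "slider" and leave the rest alone
        deltas[i] = mse_loss(nudged, X, y) - base
    return deltas

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])
params = np.random.default_rng(0).normal(size=9)
print(np.round(sensitivity(params, X, y), 3))
```

Entries with a large magnitude mark parameters whose sliders move the loss the most at the current setting. Entries near zero often belong to units that are inactive on the data. Dividing each entry by `step` gives a crude finite-difference estimate of the gradient that gradient descent would follow.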

This approach connects to research on sparse autoencoders and mechanistic interpretability by letting us examine the role of individual parameters. For larger networks, techniques like sparse autoencoders can help identify monosemantic features, that is, individual neurons or directions in activation space that correspond to interpretable concepts.

### Further Exploration

- Try deeper networks with multiple hidden layers
- Experiment with different activation functions
- Compare manual tuning vs gradient descent optimization
- Investigate weight pruning and sparsity constraints
- Apply to real-world datasets and observe interpretability challenges
