<a href="https://colab.research.google.com/github/robbybrodie/time_as_computation_cost/blob/main/notebooks/bandgaps_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bandgaps Experiment: DoF Law Fitting

This experiment fits microphysical degrees of freedom (DoF) laws and beta parameters from synthetic data, comparing our computational-capacity model against baseline models.

## Theory

- **DoF Law**: `DoF(N) = exp[-a(1-N)]` where N is computational capacity
- **Beta Parameter**: `β(ΔF) = β₀ + β₁*ΔF` relating to micro-physical processes  
- **Model Selection**: AIC/BIC comparison against polynomial and power-law baselines
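
The information criteria named in the last bullet can be computed directly from fit residuals. The sketch below assumes i.i.d. Gaussian noise with the variance set to its maximum-likelihood estimate (the mean squared residual); the helper name `gaussian_aic_bic` and the synthetic residuals are illustrative and not part of the `tacc` package.

```python
import numpy as np

def gaussian_aic_bic(residuals, n_params):
    """Return (AIC, BIC) for a least-squares fit under Gaussian noise.

    Assumes the noise variance equals the mean squared residual (its MLE),
    so the maximized log-likelihood is -n/2 * (log(2*pi*mse) + 1).
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    mse = np.mean(residuals ** 2)
    log_likelihood = -0.5 * n * (np.log(2 * np.pi * mse) + 1.0)
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * np.log(n) - 2 * log_likelihood
    return aic, bic

# Illustrative residuals: two hypothetical models that fit equally well
rng = np.random.default_rng(0)
res = rng.normal(0.0, 0.05, 50)
aic1, bic1 = gaussian_aic_bic(res, n_params=1)  # e.g. one-parameter exponential
aic4, bic4 = gaussian_aic_bic(res, n_params=4)  # e.g. cubic polynomial
print(f"1 parameter:  AIC={aic1:.2f}, BIC={bic1:.2f}")
print(f"4 parameters: AIC={aic4:.2f}, BIC={bic4:.2f}")
```

With identical residuals, the only difference between the two scores is the complexity penalty: 2 per extra parameter for AIC and log(n) per extra parameter for BIC.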

**Physical Interpretation**: The degrees of freedom available for physical processes depend exponentially on the computational capacity N. For a > 0, DoF increases with N and is normalized so that DoF = 1 at full capacity (N = 1).
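
As a quick numerical check of the two laws in this section, the sketch below evaluates them with plain NumPy. The function names `dof_law` and `beta_law` and the parameter values are illustrative assumptions, not the `tacc` API.

```python
import numpy as np

def dof_law(N, a):
    """DoF(N) = exp[-a(1 - N)]; equals 1 at N = 1 for any a."""
    return np.exp(-a * (1.0 - np.asarray(N, dtype=float)))

def beta_law(delta_F, beta0, beta1):
    """beta(dF) = beta0 + beta1 * dF, linear in dF."""
    return beta0 + beta1 * np.asarray(delta_F, dtype=float)

# Illustrative parameter values (assumed for this sketch)
N = np.array([0.5, 0.75, 1.0])
dof = dof_law(N, a=2.0)
beta = beta_law([-0.5, 0.0, 0.5], beta0=1.5, beta1=0.3)

print("DoF: ", dof)
print("beta:", beta)
```

For positive `a` the DoF values rise monotonically toward 1 as N approaches 1, and `beta` at ΔF = 0 is just β₀.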

## Setup: Clone Repository and Install Dependencies

In [None]:
# Clone and setup (idempotent)
import os, sys, subprocess, shutil, pathlib
REPO_URL = "https://github.com/robbybrodie/time_as_computation_cost.git"
REPO_NAME = "time_as_computation_cost"

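# Skip the clone and cd when this cell is re-run from inside the repository
if pathlib.Path.cwd().name != REPO_NAME:
    if not pathlib.Path(REPO_NAME).exists():
        !git clone $REPO_URL
    %cd $REPO_NAME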

# Install package
if (pathlib.Path("pyproject.toml")).exists():
    !pip install -e .

# Set random seed for reproducibility
import numpy as np, random
np.random.seed(42)
random.seed(42)

## Run Bandgaps Experiment

In [None]:
from experiments.run_bandgaps import main
main()

## Display Results

In [None]:
from IPython.display import Image, display
import glob
from pathlib import Path

# Display generated plots
output_dir = Path("experiments/out/bandgaps")

print("DoF and Beta Parameter Fits:")
display(Image(str(output_dir / "bandgaps_fits.png")))

print("\nModel Comparison with Baselines:")
display(Image(str(output_dir / "baseline_comparison.png")))

# Display numerical results
print("\nNumerical Results:")
with open(output_dir / "results.txt", 'r') as f:
    print(f.read())

## Interactive Setup

In [None]:
# Setup for interactive exploration
import numpy as np, matplotlib.pyplot as plt
from pathlib import Path
import sys

# Ensure we can import the modules
repo_root = Path().resolve()
sys.path.insert(0, str(repo_root / "src"))

from tacc.micro.bandgaps import fit_dof_law, fit_beta
from tacc.baselines import compare_models
from experiments.run_bandgaps import main

print("Interactive modules loaded successfully!")

## Interactive DoF Law Exploration

In [None]:
# Generate synthetic data with different parameters
def explore_dof_law(true_a=2.0, noise_level=0.05, n_points=50):
    """Interactive exploration of DoF law fitting"""
    
    # Generate synthetic data
    N_values = np.linspace(0.5, 1.5, n_points)
    DoF_true = np.exp(-true_a * (1 - N_values))
    DoF_noisy = DoF_true + np.random.normal(0, noise_level, len(DoF_true))
    
    # Fit our exponential model
    fitted_a, fit_info = fit_dof_law(N_values, DoF_noisy)
    
    # Fit baseline models for comparison
    from scipy.optimize import curve_fit
    
    # Polynomial model
    def poly_model(N, a, b, c):
        return a + b*N + c*N**2
    poly_params, _ = curve_fit(poly_model, N_values, DoF_noisy, maxfev=5000)
    
    # Power law model  
    def power_model(N, a, b):
        return a * np.power(N, b)
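    try:
        power_params, _ = curve_fit(power_model, N_values, DoF_noisy, maxfev=5000)
    except (RuntimeError, ValueError) as exc:
        # curve_fit raises RuntimeError when the optimizer fails to converge
        print(f"Power-law fit failed ({exc}); using fallback parameters")
        power_params = [1.0, 1.0]  # fallback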
    
    # Create comparison plot
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    # Main fit comparison
    N_fine = np.linspace(0.5, 1.5, 200)
    DoF_fit = np.exp(-fitted_a * (1 - N_fine))
    poly_fit = poly_model(N_fine, *poly_params)
    power_fit = power_model(N_fine, *power_params)
    
    ax1.scatter(N_values, DoF_noisy, alpha=0.6, color='gray', label='Synthetic data')
    ax1.plot(N_fine, np.exp(-true_a * (1 - N_fine)), 'k--', linewidth=2, label=f'True (a={true_a})')
    ax1.plot(N_fine, DoF_fit, 'r-', linewidth=2, label=f'Exponential fit (a={fitted_a:.3f})')
    ax1.plot(N_fine, poly_fit, 'b-', linewidth=2, label='Polynomial')
    ax1.plot(N_fine, power_fit, 'g-', linewidth=2, label='Power law')
    ax1.set_xlabel('N (Computational Capacity)')
    ax1.set_ylabel('DoF')
    ax1.set_title('DoF Law Fitting Comparison')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Residuals for our model
    DoF_pred = np.exp(-fitted_a * (1 - N_values))
    residuals = DoF_noisy - DoF_pred
    ax2.scatter(N_values, residuals, alpha=0.7, color='red')
    ax2.axhline(y=0, color='k', linestyle='--', alpha=0.7)
    ax2.set_xlabel('N')
    ax2.set_ylabel('Residuals')
    ax2.set_title('Exponential Model Residuals')
    ax2.grid(True, alpha=0.3)
    
    # Parameter sensitivity (evaluated away from N=1, where DoF = 1 for every a)
    N_ref = 0.5
    a_range = np.linspace(0.5, 4.0, 100)
    DoF_sensitivity = np.exp(-a_range * (1 - N_ref))
    ax3.plot(a_range, DoF_sensitivity, 'purple', linewidth=2)
    ax3.axvline(x=true_a, color='k', linestyle='--', label=f'True a={true_a}')
    ax3.axvline(x=fitted_a, color='r', linestyle=':', label=f'Fitted a={fitted_a:.3f}')
    ax3.set_xlabel('Parameter a')
    ax3.set_ylabel(f'DoF(N={N_ref})')
    ax3.set_title('Parameter Sensitivity')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Error vs noise level
    noise_levels = np.logspace(-3, -1, 20)
    errors = []
    for noise in noise_levels:
        DoF_test = DoF_true + np.random.normal(0, noise, len(DoF_true))
        fitted_a_test, _ = fit_dof_law(N_values, DoF_test)
        errors.append(abs(fitted_a_test - true_a))
    
    ax4.loglog(noise_levels, errors, 'o-', color='orange', linewidth=2)
    ax4.axhline(y=0.1 * true_a, color='r', linestyle='--', alpha=0.7, label='10% of true a')
    ax4.set_xlabel('Noise Level')
    ax4.set_ylabel('Parameter Error |â - a|')
    ax4.set_title('Noise Sensitivity')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print comparison metrics
    print(f"Parameter Recovery:")
    print(f"  True a = {true_a:.3f}")
    print(f"  Fitted a = {fitted_a:.3f}")
    print(f"  Error = {abs(fitted_a - true_a):.3f}")
    print(f"\nModel Performance:")
    print(f"  RMSE = {np.sqrt(np.mean(residuals**2)):.6f}")
    print(f"  Max residual = {np.max(np.abs(residuals)):.6f}")

# Explore with different parameters
print("Exploring DoF law with default parameters:")
np.random.seed(42)
explore_dof_law()

print("\n" + "="*60 + "\n")

print("Exploring with higher noise:")
np.random.seed(42)
explore_dof_law(true_a=1.5, noise_level=0.1)

## Basic Parameter Exploration

In [None]:
# Custom parameters for exploration
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from tacc.micro.bandgaps import fit_dof_law, fit_beta

def dof_model(N, a):
    """DoF law: exp[-a(1 - N)]"""
    return np.exp(-a * (1 - N))

# Try different noise levels
noise_levels = [0.01, 0.05, 0.1, 0.2]
fitted_params = []
true_a = 2.0
N_values = np.linspace(0.5, 1.5, 50)

for noise in noise_levels:
    np.random.seed(42)  # same noise realization at every level, scaled by `noise`
    DoF_values = np.exp(-true_a * (1 - N_values))
    DoF_values += np.random.normal(0, noise, len(DoF_values))
    
    # Fit the one-parameter exponential model directly with curve_fit
    popt, _ = curve_fit(dof_model, N_values, DoF_values)
    fitted_params.append(popt[0])
    print(f"Noise level {noise:.2f}: fitted a = {popt[0]:.3f} (error: {abs(popt[0] - true_a):.3f})")

# Plot noise sensitivity
plt.figure(figsize=(8, 6))
plt.plot(noise_levels, fitted_params, 'bo-', linewidth=2, markersize=8)
plt.axhline(y=2.0, color='r', linestyle='--', label='True value (a=2.0)')
plt.xlabel('Noise Level')
plt.ylabel('Fitted Parameter a')
plt.title('Noise Sensitivity Analysis')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Beta Parameter Analysis

In [None]:
def explore_beta_parameter(true_beta0=1.5, true_beta1=0.3, noise_level=0.02):
    """Interactive exploration of beta parameter fitting"""
    
    # Generate synthetic beta data
    Delta_F_values = np.linspace(-0.5, 0.5, 40)
    beta_true = true_beta0 + true_beta1 * Delta_F_values
    beta_noisy = beta_true + np.random.normal(0, noise_level, len(beta_true))
    
    # Fit beta parameters
    fitted_beta0, fitted_beta1, fit_info = fit_beta(Delta_F_values, beta_noisy)
    
    # Create visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    # Main fit
    Delta_F_fine = np.linspace(-0.5, 0.5, 200)
    beta_fit = fitted_beta0 + fitted_beta1 * Delta_F_fine
    
    ax1.scatter(Delta_F_values, beta_noisy, alpha=0.6, color='gray', label='Synthetic data')
    ax1.plot(Delta_F_fine, true_beta0 + true_beta1 * Delta_F_fine, 'k--', linewidth=2, 
             label=f'True (β₀={true_beta0}, β₁={true_beta1})')
    ax1.plot(Delta_F_fine, beta_fit, 'r-', linewidth=2,
             label=f'Fit (β₀={fitted_beta0:.3f}, β₁={fitted_beta1:.3f})')
    ax1.set_xlabel('ΔF')
    ax1.set_ylabel('β')
    ax1.set_title('Beta Parameter Fitting')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Residuals
    beta_pred = fitted_beta0 + fitted_beta1 * Delta_F_values
    residuals = beta_noisy - beta_pred
    ax2.scatter(Delta_F_values, residuals, alpha=0.7, color='red')
    ax2.axhline(y=0, color='k', linestyle='--', alpha=0.7)
    ax2.set_xlabel('ΔF')
    ax2.set_ylabel('Residuals')
    ax2.set_title('Beta Model Residuals')
    ax2.grid(True, alpha=0.3)
    
    # Parameter correlation
    ax3.scatter([true_beta0], [true_beta1], s=100, color='black', marker='x', 
               label='True parameters', zorder=10)
    ax3.scatter([fitted_beta0], [fitted_beta1], s=100, color='red', marker='o',
               label='Fitted parameters', zorder=10)
    ax3.set_xlabel('β₀')
    ax3.set_ylabel('β₁')
    ax3.set_title('Parameter Space')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Confidence intervals (approximate)
    # Bootstrap sampling
    n_bootstrap = 100
    beta0_boots, beta1_boots = [], []
    for _ in range(n_bootstrap):
        indices = np.random.choice(len(Delta_F_values), len(Delta_F_values), replace=True)
        boot_DeltaF = Delta_F_values[indices]
        boot_beta = beta_noisy[indices]
        boot_beta0, boot_beta1, _ = fit_beta(boot_DeltaF, boot_beta)
        beta0_boots.append(boot_beta0)
        beta1_boots.append(boot_beta1)
    
    ax4.scatter(beta0_boots, beta1_boots, alpha=0.3, color='blue', s=10)
    ax4.scatter([fitted_beta0], [fitted_beta1], s=100, color='red', marker='o', zorder=10)
    ax4.set_xlabel('β₀')  
    ax4.set_ylabel('β₁')
    ax4.set_title('Bootstrap Parameter Distribution')
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Beta Parameter Recovery:")
    print(f"  True β₀ = {true_beta0:.3f}, β₁ = {true_beta1:.3f}")
    print(f"  Fitted β₀ = {fitted_beta0:.3f}, β₁ = {fitted_beta1:.3f}")
    print(f"  Errors: Δβ₀ = {abs(fitted_beta0-true_beta0):.3f}, Δβ₁ = {abs(fitted_beta1-true_beta1):.3f}")

print("Exploring beta parameter fitting:")
np.random.seed(42)
explore_beta_parameter()

## Comprehensive Model Comparison

In [None]:
def comprehensive_model_comparison():
    """Compare exponential model against multiple baselines"""
    
    # Generate test data
    np.random.seed(42)
    N_values = np.linspace(0.5, 1.5, 50)
    true_a = 2.0
    DoF_true = np.exp(-true_a * (1 - N_values))
    DoF_noisy = DoF_true + np.random.normal(0, 0.05, len(DoF_true))
    
    # Define models
    def exponential_model(N, a):
        return np.exp(-a * (1 - N))
    
    def polynomial2_model(N, a, b, c):
        return a + b*N + c*N**2
        
    def polynomial3_model(N, a, b, c, d):
        return a + b*N + c*N**2 + d*N**3
    
    def power_model(N, a, b):
        return a * np.power(np.maximum(N, 1e-6), b)  # Avoid zero
    
    def rational_model(N, a, b, c):
        return (a + b*N) / (1 + c*N)
    
    models = {
        'Exponential': (exponential_model, [2.0]),
        'Polynomial-2': (polynomial2_model, [1.0, 0.0, 0.0]),
        'Polynomial-3': (polynomial3_model, [1.0, 0.0, 0.0, 0.0]), 
        'Power Law': (power_model, [1.0, 1.0]),
        'Rational': (rational_model, [1.0, 0.0, 0.0])
    }
    
    # Fit all models and calculate metrics
    results = {}
    from scipy.optimize import curve_fit
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    colors = ['red', 'blue', 'green', 'orange', 'purple']
    N_fine = np.linspace(0.5, 1.5, 200)
    
    aic_values, bic_values, rmse_values = [], [], []
    model_names = []
    
    for i, (name, (model_func, initial_params)) in enumerate(models.items()):
        try:
            # Fit model
            params, pcov = curve_fit(model_func, N_values, DoF_noisy, 
                                   p0=initial_params, maxfev=10000)
            
            # Calculate metrics
            DoF_pred = model_func(N_values, *params)
            n_params = len(params)
            n_data = len(N_values)
            
            # Log-likelihood (assuming Gaussian noise)
            mse = np.mean((DoF_noisy - DoF_pred)**2)
            rmse = np.sqrt(mse)
            log_likelihood = -0.5 * n_data * np.log(2 * np.pi * mse) - 0.5 * np.sum((DoF_noisy - DoF_pred)**2) / mse
            
            # Information criteria
            aic = 2 * n_params - 2 * log_likelihood
            bic = n_params * np.log(n_data) - 2 * log_likelihood
            
            results[name] = {
                'params': params,
                'AIC': aic,
                'BIC': bic,
                'RMSE': rmse,
                'predictions': model_func(N_fine, *params)
            }
            
            aic_values.append(aic)
            bic_values.append(bic)  
            rmse_values.append(rmse)
            model_names.append(name)
            
            # Plot fit
            ax1.plot(N_fine, model_func(N_fine, *params), color=colors[i], 
                    linewidth=2, label=f'{name} (AIC={aic:.1f})')
            
        except Exception as e:
            print(f"Failed to fit {name}: {e}")
    
    # Plot data and true function
    ax1.scatter(N_values, DoF_noisy, alpha=0.6, color='gray', label='Data', zorder=10)
    ax1.plot(N_fine, exponential_model(N_fine, true_a), 'k--', linewidth=3, 
            label='True function', zorder=5)
    ax1.set_xlabel('N (Computational Capacity)')
    ax1.set_ylabel('DoF') 
    ax1.set_title('Model Comparison')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # AIC comparison
    ax2.bar(range(len(model_names)), aic_values, color=colors[:len(model_names)])
    ax2.set_xlabel('Model')
    ax2.set_ylabel('AIC (lower is better)')
    ax2.set_title('Akaike Information Criterion')
    ax2.set_xticks(range(len(model_names)))
    ax2.set_xticklabels(model_names, rotation=45)
    ax2.grid(True, alpha=0.3)
    
    # BIC comparison  
    ax3.bar(range(len(model_names)), bic_values, color=colors[:len(model_names)])
    ax3.set_xlabel('Model')
    ax3.set_ylabel('BIC (lower is better)')
    ax3.set_title('Bayesian Information Criterion')
    ax3.set_xticks(range(len(model_names)))
    ax3.set_xticklabels(model_names, rotation=45)
    ax3.grid(True, alpha=0.3)
    
    # RMSE comparison
    ax4.bar(range(len(model_names)), rmse_values, color=colors[:len(model_names)])
    ax4.set_xlabel('Model')
    ax4.set_ylabel('RMSE (lower is better)')
    ax4.set_title('Root Mean Square Error')
    ax4.set_xticks(range(len(model_names)))
    ax4.set_xticklabels(model_names, rotation=45)
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print ranking
    print("Model Ranking by AIC (lower is better):")
    aic_ranking = sorted(zip(model_names, aic_values), key=lambda x: x[1])
    for i, (name, aic) in enumerate(aic_ranking):
        print(f"  {i+1}. {name}: AIC = {aic:.2f}")
    
    print("\nModel Ranking by BIC (lower is better):")  
    bic_ranking = sorted(zip(model_names, bic_values), key=lambda x: x[1])
    for i, (name, bic) in enumerate(bic_ranking):
        print(f"  {i+1}. {name}: BIC = {bic:.2f}")

print("Running comprehensive model comparison:")
comprehensive_model_comparison()

## Key Insights

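1. **Parameter Recovery**: The exponential model recovers the true parameter `a` from noisy synthetic data; each exploration cell prints the absolute error |â - a| for its run

2. **Model Selection**: AIC and BIC trade goodness of fit against parameter count, which helps distinguish between competing functional forms

3. **Noise Sensitivity**: Parameter error grows as the noise level increases, as the log-log noise-sensitivity panel shows

4. **Beta Parameters**: The linear law `β(ΔF) = β₀ + β₁*ΔF` is recovered from noisy data, and the bootstrap scatter shows the spread of the fitted (β₀, β₁)

5. **Baseline Comparison**: Because the synthetic data are generated from the exponential law, the one-parameter exponential form is expected to score well on AIC/BIC against polynomial baselines that need more parameters for a similar fit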
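
The noise-sensitivity claim can be checked by averaging over many noise realizations rather than a single draw. The sketch below is a dependency-light illustration: it swaps in a brute-force grid search over `a` in place of `curve_fit`, and the names `fit_a_grid` and `mean_recovery_error`, the grid range, and the trial count are all assumptions made for this example.

```python
import numpy as np

N = np.linspace(0.5, 1.5, 50)
A_GRID = np.linspace(0.0, 4.0, 4001)  # candidate values of a, step 0.001
# Model predictions for every candidate a: shape (4001, 50)
PREDICTIONS = np.exp(-A_GRID[:, None] * (1.0 - N)[None, :])

def fit_a_grid(dof_observed):
    """Least-squares estimate of a by exhaustive search over A_GRID."""
    sse = np.sum((PREDICTIONS - dof_observed) ** 2, axis=1)
    return A_GRID[np.argmin(sse)]

def mean_recovery_error(noise, true_a=2.0, n_trials=100, seed=0):
    """Mean |a_hat - a| over repeated noisy draws at one noise level."""
    rng = np.random.default_rng(seed)
    clean = np.exp(-true_a * (1.0 - N))
    errors = [
        abs(fit_a_grid(clean + rng.normal(0.0, noise, N.size)) - true_a)
        for _ in range(n_trials)
    ]
    return float(np.mean(errors))

noise_levels = [0.01, 0.05, 0.2]
mean_errors = [mean_recovery_error(s) for s in noise_levels]
for s, e in zip(noise_levels, mean_errors):
    print(f"noise={s:.2f}: mean |a_hat - a| = {e:.4f}")
```

Averaging over trials smooths out the run-to-run scatter that a single fit per noise level would show, so the trend with noise is easier to read.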

## Physical Interpretation

- **DoF(N)**: Represents the available degrees of freedom as a function of computational capacity
- **Beta Parameters**: Characterize micro-physical response to external perturbations  
- **Model Selection**: Determines which mathematical form best captures underlying physics

**Note**: This is synthetic data analysis for a conceptual framework - not established physics!

## Troubleshooting

**Common Issues:**
- If plots don't display, ensure matplotlib is properly installed
- If imports fail, check that the repository was cloned correctly
- For fitting errors, check that the synthetic data contain no NaN or infinite values and that the noise level is not so large that the fit fails to converge
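
To narrow down import problems, a small check like the one below reports which top-level modules Python can locate. This is a hedged sketch using only the standard library; the helper name `check_modules` is hypothetical, and the module list simply mirrors the imports used in this notebook.

```python
import importlib.util

def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Modules imported elsewhere in this notebook
status = check_modules(["numpy", "scipy", "matplotlib", "tacc"])
for name, found in status.items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

If `tacc` is reported as missing, re-running the setup cell (clone plus `pip install -e .`) is the first thing to try.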

**Expected Output:**
- Four 4-panel figures (two DoF law explorations, one beta analysis, one model comparison) and one single-panel noise-sensitivity plot
- Parameter recovery demonstrations with noise sensitivity analysis
- Model comparison rankings using AIC/BIC criteria
- Bootstrap scatter of the fitted (β₀, β₁) showing parameter uncertainty