# RIS Auto-Research Engine - Run Search Campaigns

This notebook demonstrates how to execute comprehensive search campaigns using predefined YAML configurations. You'll learn to:
- List and select available search space configurations
- Execute multi-experiment campaigns
- Monitor progress and collect results
- Generate publication-ready visualizations
- Export data for further analysis

**Available Campaigns:**
- `quick_test` - Fast validation (2 experiments, ~2 min)
- `probe_comparison` - Compare 6 probe types (18 experiments, ~15 min)
- `model_comparison` - Compare 3 architectures (9 experiments, ~10 min)
- `sparsity_sweep` - Parameter optimization (45 experiments, ~30 min)
- `cross_fidelity_validation` - Multi-fidelity analysis (30 experiments, ~25 min)
- `full_search` - Comprehensive search (200+ experiments, several hours)

## 1. Setup and Initialization

Import necessary modules and initialize the RIS Engine with result tracking.

In [None]:
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
from datetime import datetime
import yaml

# RIS Engine imports
from ris_research_engine.ui import RISEngine
from ris_research_engine.engine import ResultAnalyzer, ReportGenerator

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'default')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# Initialize engine
engine = RISEngine(
    db_path="results.db",
    output_dir="outputs"
)

print("✓ RIS Engine initialized")
print(f"  Database: {engine.db_path}")
print(f"  Output directory: {engine.output_dir}")

## 2. List Available Configurations

Discover all available search space configurations with descriptions and metadata.
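
The experiment estimate in the listing cell depends on only two config fields, `probes.type` and `random_seeds`. As a minimal sketch, the same logic is shown here on a hypothetical in-memory config. The dict stands in for a parsed YAML file and the probe names are placeholders; the shipped configs may sweep further dimensions that this rough estimate ignores.

```python
def estimate_experiments(config: dict) -> int:
    """Rough experiment count: number of probe types times number of seeds."""
    probe_types = (config.get('probes') or {}).get('type')
    num_probes = len(probe_types) if isinstance(probe_types, list) else 1
    num_seeds = len(config.get('random_seeds', [1]))
    return num_probes * num_seeds


# Hypothetical config, shaped like the fields the listing cell reads
example_config = {
    'name': 'probe_comparison',
    'description': 'Compare probe designs',
    'probes': {'type': ['dft_beams', 'hadamard', 'sobol',
                        'halton', 'random_uniform', 'random_binary']},
    'random_seeds': [0, 1, 2],
}

print(estimate_experiments(example_config))  # 6 probes x 3 seeds = 18
```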

In [None]:
# Path to search space configs
config_dir = Path("../configs/search_spaces")

# List all YAML configs
configs = list(config_dir.glob("*.yaml"))

print("\n" + "="*80)
print("AVAILABLE SEARCH SPACE CONFIGURATIONS")
print("="*80 + "\n")

config_info = []
for config_path in sorted(configs):
    with open(config_path, 'r') as f:
        config_data = yaml.safe_load(f)
    
    # Extract key information
    name = config_data.get('name', 'N/A')
    description = config_data.get('description', 'No description')
    
    # Count expected experiments (rough estimate)
    # This is a simplified calculation
    num_probes = 1
    if 'probes' in config_data and 'type' in config_data['probes']:
        probe_types = config_data['probes']['type']
        if isinstance(probe_types, list):
            num_probes = len(probe_types)
    
    num_seeds = len(config_data.get('random_seeds', [1]))
    estimated_experiments = num_probes * num_seeds
    
    config_info.append({
        'Config File': config_path.name,
        'Name': name,
        'Description': description[:60] + '...' if len(description) > 60 else description,
        'Est. Experiments': estimated_experiments
    })
    
    print(f"📋 {config_path.name}")
    print(f"   Name: {name}")
    print(f"   Description: {description}")
    print(f"   Estimated Experiments: ~{estimated_experiments}")
    print()

# Display as table
config_df = pd.DataFrame(config_info)
print("\n" + "="*80)
print(config_df.to_string(index=False))
print("="*80)

## 3. Run Probe Comparison Campaign

Execute the `probe_comparison.yaml` campaign which systematically compares 6 different probe designs:
- DFT beams (structured)
- Hadamard (orthogonal)
- Sobol (quasi-random)
- Halton (quasi-random)
- Random uniform (baseline)
- Random binary (baseline)

This campaign helps identify the most effective probe design for your system configuration.
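
To make the "orthogonal" label concrete, here is a minimal sketch of a Sylvester-construction Hadamard matrix, whose rows are mutually orthogonal ±1 patterns. It is only an illustration: how the engine generates its Hadamard probes, and how it maps the entries to RIS phases, is not specified in this notebook, and the function name is hypothetical.

```python
def sylvester_hadamard(order: int) -> list[list[int]]:
    """Build a Hadamard matrix of size order x order (order must be a power of 2)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    matrix = [[1]]
    while len(matrix) < order:
        # Sylvester step: [[H, H], [H, -H]]
        top = [row + row for row in matrix]
        bottom = [row + [-v for v in row] for row in matrix]
        matrix = top + bottom
    return matrix


H = sylvester_hadamard(8)

# Distinct rows have zero inner product; each row has squared norm 8
for i, row_i in enumerate(H):
    for j, row_j in enumerate(H):
        dot = sum(a * b for a, b in zip(row_i, row_j))
        assert dot == (8 if i == j else 0)

print(H[1])  # second row alternates +1 / -1
```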

In [None]:
# Run probe comparison campaign
print("\n" + "="*80)
print("STARTING PROBE COMPARISON CAMPAIGN")
print("="*80)
print("\nThis will run ~18 experiments (6 probes × 3 seeds)")
print("Expected runtime: ~15 minutes on CPU\n")

try:
    # Run the campaign
    probe_campaign = engine.search("../configs/search_spaces/probe_comparison.yaml")
    
    print("\n" + "="*80)
    print("✓ PROBE COMPARISON CAMPAIGN COMPLETED")
    print("="*80)
    print(f"  Total Experiments: {len(probe_campaign['experiments'])}")
    print(f"  Successful: {sum(1 for e in probe_campaign['experiments'] if e['status'] == 'completed')}")
    print(f"  Failed: {sum(1 for e in probe_campaign['experiments'] if e['status'] == 'failed')}")
    print(f"  Total Runtime: {probe_campaign.get('total_time_seconds', 0):.2f}s")
    
except Exception as e:
    print(f"\n✗ Campaign failed: {e}")
    print("  Check logs for details")
    raise

## 4. Campaign Summary Statistics

Display summary statistics for the completed campaign: the mean, standard deviation, minimum, and maximum top-1 accuracy of each probe type across random seeds.
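
With only three seeds per probe, the choice of standard-deviation estimator matters: `np.std` defaults to the population formula (`ddof=0`), which gives a smaller value than the sample formula (`ddof=1`). A minimal sketch with made-up accuracies, using the standard library so it runs without the engine:

```python
import statistics

# Hypothetical top-1 accuracies for one probe across three seeds
accuracies = [0.80, 0.82, 0.84]

mean_acc = statistics.fmean(accuracies)
pop_std = statistics.pstdev(accuracies)    # population std, matches np.std(x)
sample_std = statistics.stdev(accuracies)  # sample std, matches np.std(x, ddof=1)

print(f"mean={mean_acc:.4f}  population std={pop_std:.4f}  sample std={sample_std:.4f}")
```

The summary cell reports the population value. If you quote these numbers as an estimate of seed-to-seed spread, passing `ddof=1` to `np.std` is the more conventional choice.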

In [None]:
# Analyze campaign results
experiments = probe_campaign['experiments']
completed = [e for e in experiments if e['status'] == 'completed']

# Group by probe type
probe_results = {}
for exp in completed:
    probe = exp['config']['probe_type']
    if probe not in probe_results:
        probe_results[probe] = []
    probe_results[probe].append(exp['metrics']['top_1_accuracy'])

# Calculate statistics
summary_data = []
for probe, accuracies in probe_results.items():
    summary_data.append({
        'Probe Type': probe,
        'Mean Accuracy': np.mean(accuracies),
        'Std Dev': np.std(accuracies),
        'Min': np.min(accuracies),
        'Max': np.max(accuracies),
        'Runs': len(accuracies)
    })

# Sort by mean accuracy (in place, so later cells see the same order)
summary_data.sort(key=lambda d: d['Mean Accuracy'], reverse=True)
summary_df = pd.DataFrame(summary_data)

print("\n" + "="*80)
print("PROBE COMPARISON SUMMARY")
print("="*80)

# Format a display copy so summary_df keeps its numeric values
display_df = summary_df.copy()
for col in ['Mean Accuracy', 'Std Dev', 'Min', 'Max']:
    display_df[col] = display_df[col].apply(lambda x: f"{x:.4f}")

print(display_df.to_string(index=False))
print("="*80)

# Identify winner (summary_data is sorted, so the first entry is the best)
best_probe = summary_data[0]['Probe Type']
best_acc = summary_data[0]['Mean Accuracy']
print(f"\n🏆 Best Probe: {best_probe}")
print(f"   Mean Top-1 Accuracy: {best_acc:.4f}")

## 5. Generate Probe Comparison Plots

Create publication-quality visualizations comparing probe performance with error bars showing variance across random seeds.

In [None]:
# Bar chart with error bars
fig, ax = plt.subplots(figsize=(12, 6))

probes = [d['Probe Type'] for d in summary_data]
means = [d['Mean Accuracy'] for d in summary_data]
stds = [d['Std Dev'] for d in summary_data]

# Create bar chart
x_pos = np.arange(len(probes))
bars = ax.bar(x_pos, means, yerr=stds, capsize=8, 
              alpha=0.8, color='steelblue', edgecolor='black', linewidth=1.5)

# Highlight best probe (the bar with the highest mean accuracy)
best_idx = int(np.argmax(means))
bars[best_idx].set_color('darkgreen')
bars[best_idx].set_alpha(0.9)

ax.set_xlabel('Probe Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Top-1 Accuracy', fontsize=12, fontweight='bold')
ax.set_title('Probe Design Comparison (Mean ± Std Dev over 3 seeds)', 
             fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(probes, rotation=45, ha='right')
ax.set_ylim([0, 1.0])
ax.grid(True, axis='y', alpha=0.3)

# Add value labels on bars
for i, (mean, std) in enumerate(zip(means, stds)):
    ax.text(i, mean + std + 0.02, f"{mean:.3f}", 
           ha='center', va='bottom', fontsize=9, fontweight='bold')

plt.tight_layout()
# Wrap in Path so this works whether output_dir is a str or a Path
bar_path = Path(engine.output_dir) / 'probe_comparison_bar.png'
plt.savefig(bar_path, dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✓ Plot saved to: {bar_path}")

## 6. Training Curves Comparison

Visualize convergence behavior by plotting training curves for all probe types.
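
Runs may stop at different epochs (for example, if early stopping is enabled), so their validation histories can differ in length. The plotting cell handles this by truncating every curve to the shortest one before averaging. A minimal sketch of that step on hypothetical curves:

```python
def mean_curve(curves: list[list[float]]) -> list[float]:
    """Average curves epoch by epoch, truncated to the shortest curve."""
    if not curves:
        return []
    min_len = min(len(c) for c in curves)
    return [sum(c[epoch] for c in curves) / len(curves) for epoch in range(min_len)]


# Hypothetical validation-accuracy histories for three seeds
curves = [
    [0.2, 0.4, 0.6, 0.7],
    [0.3, 0.5, 0.6],
    [0.1, 0.3, 0.6, 0.8, 0.9],
]

averaged = mean_curve(curves)
print(averaged)  # three epochs, the length of the shortest run
```

Truncation discards the later epochs of longer runs, so the mean curve ends where the shortest run ended.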

In [None]:
# Plot training curves for each probe
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

# Group experiments by probe
for idx, (probe, _) in enumerate(probe_results.items()):
    if idx >= 6:
        break
        
    ax = axes[idx]
    
    # Get all experiments for this probe
    probe_experiments = [e for e in completed if e['config']['probe_type'] == probe]
    
    # Plot each seed
    for exp in probe_experiments:
        if 'training_history' in exp:
            history = exp['training_history']
            if 'val_top_1_accuracy' in history:
                epochs = range(1, len(history['val_top_1_accuracy']) + 1)
                ax.plot(epochs, history['val_top_1_accuracy'], 
                       alpha=0.6, linewidth=2)
    
    # Calculate mean curve
    all_curves = []
    for exp in probe_experiments:
        if 'training_history' in exp and 'val_top_1_accuracy' in exp['training_history']:
            all_curves.append(exp['training_history']['val_top_1_accuracy'])
    
    if all_curves:
        # Find minimum length
        min_len = min(len(c) for c in all_curves)
        truncated = [c[:min_len] for c in all_curves]
        mean_curve = np.mean(truncated, axis=0)
        epochs = range(1, len(mean_curve) + 1)
        ax.plot(epochs, mean_curve, 'r-', linewidth=3, label='Mean')
    
    ax.set_xlabel('Epoch', fontsize=10)
    ax.set_ylabel('Validation Accuracy', fontsize=10)
    ax.set_title(f'{probe}', fontsize=11, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.set_ylim([0, 1])
    if all_curves:
        ax.legend(loc='lower right')

fig.suptitle('Training Curves by Probe Type', fontsize=16, fontweight='bold')
plt.tight_layout()
# Wrap in Path so this works whether output_dir is a str or a Path
curves_path = Path(engine.output_dir) / 'probe_training_curves.png'
plt.savefig(curves_path, dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✓ Plot saved to: {curves_path}")

## 7. Optional: Run Sparsity Sweep Campaign

The sparsity sweep explores how varying the measurement budget (M) affects performance. This campaign runs 45 experiments testing M values from 4 to 32.

**Warning:** This campaign takes ~30 minutes. The cell below is commented out by default; uncomment it to run the sweep.

In [None]:
# Uncomment to run sparsity sweep
# print("\n" + "="*80)
# print("STARTING SPARSITY SWEEP CAMPAIGN")
# print("="*80)
# print("\nThis will run ~45 experiments (5 M values × 3 probes × 3 seeds)")
# print("Expected runtime: ~30 minutes on CPU\n")
#
# try:
#     sparsity_campaign = engine.search("../configs/search_spaces/sparsity_sweep.yaml")
#     print("\n✓ Sparsity sweep campaign completed!")
# except Exception as e:
#     print(f"\n✗ Campaign failed: {e}")

print("\nℹ️  Sparsity sweep is commented out by default.")
print("   Uncomment the code above to run this campaign.")

## 8. Sparsity Analysis Plots

If you ran the sparsity sweep, visualize how accuracy varies with measurement budget M. This shows the trade-off between sensing overhead and performance.
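
One way to read the resulting plot is to ask for the smallest M that gets within a chosen tolerance of the best accuracy. The sketch below does this for a single probe. The numbers and the `smallest_sufficient_m` helper are hypothetical and not part of the engine.

```python
def smallest_sufficient_m(mean_acc_by_m: dict[int, float], tolerance: float = 0.02) -> int:
    """Return the smallest M whose mean accuracy is within `tolerance` of the best."""
    best = max(mean_acc_by_m.values())
    return min(m for m, acc in mean_acc_by_m.items() if acc >= best - tolerance)


# Hypothetical mean top-1 accuracy per measurement budget for one probe
mean_acc_by_m = {4: 0.52, 8: 0.71, 16: 0.84, 24: 0.87, 32: 0.88}

print(smallest_sufficient_m(mean_acc_by_m))        # 24
print(smallest_sufficient_m(mean_acc_by_m, 0.05))  # 16
```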

In [None]:
# Check if sparsity campaign was run
if 'sparsity_campaign' in locals():
    # Analyze sparsity results
    sparsity_experiments = sparsity_campaign['experiments']
    sparsity_completed = [e for e in sparsity_experiments if e['status'] == 'completed']
    
    # Group by M value and probe
    sparsity_data = {}
    for exp in sparsity_completed:
        M = exp['config']['system']['M']
        probe = exp['config']['probe_type']
        key = (M, probe)
        
        if key not in sparsity_data:
            sparsity_data[key] = []
        sparsity_data[key].append(exp['metrics']['top_1_accuracy'])
    
    # Plot M vs accuracy for each probe
    fig, ax = plt.subplots(figsize=(12, 7))
    
    # Get unique probes
    probes = sorted(set(key[1] for key in sparsity_data.keys()))
    markers = ['o', 's', '^', 'D', 'v', 'p']
    
    for idx, probe in enumerate(probes):
        # Get data for this probe
        M_values = []
        mean_accs = []
        std_accs = []
        
        for M in sorted(set(key[0] for key in sparsity_data.keys())):
            if (M, probe) in sparsity_data:
                accs = sparsity_data[(M, probe)]
                M_values.append(M)
                mean_accs.append(np.mean(accs))
                std_accs.append(np.std(accs))
        
        # Plot with error bars
        ax.errorbar(M_values, mean_accs, yerr=std_accs, 
                   marker=markers[idx % len(markers)], markersize=8,
                   linewidth=2.5, capsize=5, capthick=2,
                   label=probe, alpha=0.8)
    
    ax.set_xlabel('Measurement Budget (M)', fontsize=12, fontweight='bold')
    ax.set_ylabel('Top-1 Accuracy', fontsize=12, fontweight='bold')
    ax.set_title('Sparsity Analysis: Accuracy vs. Measurement Budget', 
                fontsize=14, fontweight='bold')
    ax.legend(loc='best', fontsize=10)
    ax.grid(True, alpha=0.3)
    ax.set_ylim([0, 1])
    
    plt.tight_layout()
    # Wrap in Path so this works whether output_dir is a str or a Path
    sparsity_path = Path(engine.output_dir) / 'sparsity_analysis.png'
    plt.savefig(sparsity_path, dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"\n✓ Plot saved to: {sparsity_path}")
else:
    print("\nℹ️  Sparsity campaign not run. Skipping analysis.")

## 9. Export Results to CSV

Export all experiment results to CSV format for further analysis in Excel, R, or other tools.
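
Each experiment record is nested (`config`, `config.system`, `metrics`), while CSV is flat, so the export cell picks fields out by hand. As a minimal sketch of the same flattening on a hypothetical record, written with the standard-library `csv` module instead of pandas:

```python
import csv
import io

# Hypothetical experiment record, shaped like the fields the export cell reads
experiment = {
    'status': 'completed',
    'config': {'probe_type': 'hadamard', 'system': {'N': 64, 'K': 64, 'M': 8}},
    'metrics': {'top_1_accuracy': 0.81},
}


def flatten(exp: dict) -> dict:
    """Pull nested fields into one flat row; missing optional metrics stay empty."""
    return {
        'probe_type': exp['config']['probe_type'],
        'M': exp['config']['system']['M'],
        'top_1_accuracy': exp['metrics']['top_1_accuracy'],
        'top_5_accuracy': exp['metrics'].get('top_5_accuracy', ''),
        'status': exp['status'],
    }


buffer = io.StringIO()
row = flatten(experiment)
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Leaving a missing optional metric empty, rather than writing 0, keeps it distinguishable from a genuine score of zero in downstream tools.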

In [None]:
# Export probe comparison results
export_data = []
for exp in completed:
    export_data.append({
        'experiment_id': exp.get('experiment_id', 'N/A'),
        'name': exp['config']['name'],
        'probe_type': exp['config']['probe_type'],
        'model_type': exp['config']['model_type'],
        'N': exp['config']['system']['N'],
        'K': exp['config']['system']['K'],
        'M': exp['config']['system']['M'],
        'top_1_accuracy': exp['metrics']['top_1_accuracy'],
        # NaN marks a missing metric; a default of 0 would look like a real score
        'top_5_accuracy': exp['metrics'].get('top_5_accuracy', np.nan),
        'val_loss': exp['metrics'].get('val_loss', np.nan),
        'training_time_seconds': exp['training_time_seconds'],
        'total_epochs': exp['total_epochs'],
        'best_epoch': exp['best_epoch'],
        'model_parameters': exp['model_parameters'],
        'status': exp['status'],
        'timestamp': exp.get('timestamp', 'N/A')
    })

export_df = pd.DataFrame(export_data)

# Save to CSV (Path() wrap works whether output_dir is a str or a Path)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
csv_path = Path(engine.output_dir) / f'probe_comparison_{timestamp}.csv'
export_df.to_csv(csv_path, index=False)

print(f"\n✓ Results exported to: {csv_path}")
print(f"  Total experiments: {len(export_df)}")
print(f"  Columns: {', '.join(export_df.columns)}")

# Display first few rows
print("\nPreview:")
print(export_df[['name', 'probe_type', 'top_1_accuracy', 'training_time_seconds']].head())

## 10. Cross-Fidelity Validation (Optional)

The cross-fidelity validation campaign tests how results transfer across different system scales. This is useful for:
- Validating small-scale prototypes before full deployment
- Understanding scaling behavior
- Reducing computational costs during initial exploration

**Setup Instructions:**
```python
# First, run low-fidelity experiments (fast)
low_fidelity = engine.search("../configs/search_spaces/cross_fidelity_validation.yaml")

# Then analyze fidelity gap
analyzer = ResultAnalyzer("results.db")
fidelity_gap = analyzer.compute_fidelity_gap(
    low_fidelity_campaign='quick_test',
    high_fidelity_campaign='full_search'
)

# Plot correlation
engine.plot_fidelity_correlation(fidelity_gap)
```

This analysis shows whether rankings from small-scale tests match full-scale results, allowing you to predict performance without expensive computation.
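
Whether rankings transfer can be quantified with a rank correlation. The sketch below computes Spearman's rho between hypothetical low- and high-fidelity accuracies. It is not the implementation of `compute_fidelity_gap`, whose internals are not shown in this notebook, and it assumes no tied scores.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest score. Assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman_rho(low: dict[str, float], high: dict[str, float]) -> float:
    """Spearman rank correlation for two score dicts with the same keys."""
    if low.keys() != high.keys():
        raise ValueError("both campaigns must cover the same configurations")
    n = len(low)
    if n < 2:
        raise ValueError("need at least two configurations")
    low_ranks, high_ranks = ranks(low), ranks(high)
    d_squared = sum((low_ranks[k] - high_ranks[k]) ** 2 for k in low)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical mean accuracies per probe at two fidelities
low_fidelity = {'dft': 0.70, 'hadamard': 0.65, 'sobol': 0.60, 'random': 0.50}
high_fidelity = {'dft': 0.90, 'hadamard': 0.80, 'sobol': 0.85, 'random': 0.60}

print(f"Spearman rho = {spearman_rho(low_fidelity, high_fidelity):.2f}")  # 0.80
```

A rho near 1 means the cheap campaign orders configurations the same way as the expensive one, even if the absolute accuracies differ.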

In [None]:
# Placeholder for cross-fidelity analysis
print("\nℹ️  Cross-fidelity validation requires running both low and high fidelity campaigns.")
print("   See the markdown cell above for setup instructions.")
print("   This is an advanced feature typically used for large-scale studies.")

## Summary

In this notebook, you've learned to:
- ✓ List and explore available search space configurations
- ✓ Execute multi-experiment campaigns
- ✓ Compute summary statistics across random seeds
- ✓ Generate publication-quality comparison plots
- ✓ Visualize training dynamics and convergence
- ✓ Export results to CSV for external analysis

### Key Findings (Probe Comparison):
Based on typical results, you should observe:
1. **Structured probes** (DFT, Hadamard) outperform random baselines
2. **Quasi-random probes** (Sobol, Halton) offer a good middle ground
3. **Variance** across seeds is typically 1-3% for stable probes
4. **Training speed** is similar across probe types

### Next Steps:
- **Model Comparison**: Run `model_comparison.yaml` to test different architectures
- **Hyperparameter Tuning**: Modify configs to explore learning rates, batch sizes
- **Deep Analysis**: Use `04_analyze_results.ipynb` for statistical tests
- **Production**: Deploy best configuration for your specific use case

### Available for Exploration:
```python
# Quick experiments
engine.search("../configs/search_spaces/quick_test.yaml")  # 2 min

# Comprehensive studies
engine.search("../configs/search_spaces/model_comparison.yaml")  # 10 min
engine.search("../configs/search_spaces/sparsity_sweep.yaml")  # 30 min
engine.search("../configs/search_spaces/full_search.yaml")  # hours
```

All results are automatically saved to `results.db` and can be analyzed anytime! 🚀