# Capacity Ablation Analysis with Novel Diversity Metrics
## Research Framework: Understanding Diversity Mechanisms in Multi-Objective GFlowNets

This notebook analyzes **RQ1: What drives diversity in MOGFNs?** - specifically testing **H1a: Hypernetwork capacity is the primary driver**.

### Novel Metrics Implemented:
- **Spatial Diversity**: MCE (Mode Coverage Entropy), PMD (Pairwise Minimum Distance)
- **Objective Space**: PAS (Preference-Aligned Spread), PFS (Pareto Front Smoothness)
- **Trajectory**: TDS (Trajectory Diversity Score), MPD (Multi-Path Diversity)
- **Flow**: FCI (Flow Concentration Index)
- **Dynamics**: RBD (Replay Buffer Diversity)

### Analysis Goals:
1. Compare capacity configurations using novel diversity metrics
2. Identify which metrics capture unique diversity aspects
3. Select best capacity for sampling ablation (Week 4)
4. Validate that novel metrics provide insights beyond traditional metrics

In [None]:
# Install local project package (helps resolve local `src` imports if the project is a package)
%pip install -e /Users/katherinedemers/Documents/GitHub/diversity-mogfn

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Ensure project root is on sys.path so `src` can be imported
import sys
project_root = Path('/Users/katherinedemers/Documents/GitHub/diversity-mogfn').resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Fallback path (keeps previous behavior)
if '/mnt/user-data/uploads' not in sys.path:
    sys.path.append('/mnt/user-data/uploads')

# Import novel diversity metrics (raise informative error if unavailable)
try:
    from src.metrics.spatial import mode_coverage_entropy, pairwise_minimum_distance
    from src.metrics.objective import preference_aligned_spread, pareto_front_smoothness
    from src.metrics.trajectory import trajectory_diversity_score, multi_path_diversity
    from src.metrics.flow import flow_concentration_index
    from src.metrics.dynamics import replay_buffer_diversity
    print("✓ All imports successful")
    print("✓ Novel diversity metrics loaded")
except ModuleNotFoundError as e:
    # Chain the original exception so the full traceback is preserved
    raise ModuleNotFoundError(
        f"{e}. Ensure the `src` package exists under {project_root} or /mnt/user-data/uploads, "
        "or install it as a package (e.g. add a pyproject.toml/setup.py and run `%pip install -e .`)."
    ) from e

ModuleNotFoundError: No module named 'src'
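
The `ModuleNotFoundError` above is what you would expect when the hardcoded `project_root` does not exist on the machine running the notebook. One way to avoid absolute paths is to walk upward from the working directory until a folder containing `src` is found. The sketch below is a minimal illustration; `find_project_root` and its `marker` parameter are hypothetical names, not part of the project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    Hypothetical helper: assumes the project root is identified by a
    top-level `src` directory. Returns None if no ancestor qualifies.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None
```

A call such as `find_project_root(Path.cwd())` could replace the hardcoded path; if it returns `None`, the notebook can fall back to the existing behavior.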

## 1. Load Capacity Ablation Results

In [None]:
# Load results by scanning the capacity folders and reading each `metrics.json`
import json
from pathlib import Path

# Define results root path
results_root = Path('/Users/katherinedemers/Documents/GitHub/diversity-mogfn/results/ablations/capacity')

print(f'Using results root: {results_root}')

# Find all metrics.json files under the results root (recursive)
metrics_files = list(results_root.rglob('metrics.json'))
print(f'Found {len(metrics_files)} metrics.json files')

records = []
for mf in metrics_files:
    try:
        with open(mf, 'r') as fh:
            data = json.load(fh)
    except Exception as e:
        print(f'Warning: failed to load {mf}: {e}')
        continue

    # Attach provenance
    data['_metrics_path'] = str(mf)
    data['_result_dir'] = str(mf.parent)
    records.append(data)

# Convert to DataFrame
if len(records) == 0:
    print('No metrics.json records found under results root.')
    # Create example structure for demonstration
    df = pd.DataFrame({
        'capacity': ['small', 'medium', 'large', 'xlarge'] * 5,
        'conditioning': ['hypernet'] * 20,
        'seed': [0, 1, 2, 3, 4] * 4,
        'hypervolume': np.random.rand(20) * 0.8 + 0.2,
        'avg_pairwise_distance': np.random.rand(20) * 2 + 1,
        'training_time': np.random.rand(20) * 1000 + 500,
        'num_parameters': [100000, 250000, 500000, 1000000] * 5
    })
else:
    df = pd.json_normalize(records)
    # Coerce seed column to integer where possible
    if 'seed' in df.columns:
        df['seed'] = pd.to_numeric(df['seed'], errors='coerce').astype('Int64')

print('\nLoaded results dataframe with columns:')
print(df.columns.tolist())
print(f'Number of experiments loaded: {len(df)}')

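if len(df) > 0:
    print()
    # Guard each lookup in case a column is absent from the loaded records
    labels = {'capacity': 'Capacity levels', 'conditioning': 'Conditioning types', 'seed': 'Seeds'}
    for col, label in labels.items():
        if col in df.columns:
            print(f"{label}: {df[col].unique()}")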

# Determine available novel metrics (keeps downstream cells compatible)
novel_metric_columns = ['mce', 'num_modes', 'pmd', 'pas', 'pfs', 'tds', 'mpd', 'fci', 'rbd']
available_novel_metrics = [col for col in novel_metric_columns if col in df.columns]
missing_novel_metrics = [col for col in novel_metric_columns if col not in df.columns]

print(f'\nAvailable novel metrics: {available_novel_metrics}')
print(f'Missing novel metrics: {missing_novel_metrics}')

# Provide a convenience variable for capacity order used in downstream plotting/analysis
capacity_order = ['small', 'medium', 'large', 'xlarge']
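
`pd.json_normalize` flattens nested dictionaries into dotted column names, so if a `metrics.json` file stores its values under nested keys, the `col in df.columns` checks above would report those metrics as missing. The sketch below illustrates the effect. The record layout (the `config` and `metrics` keys) is an assumption made for illustration; the real schema of `metrics.json` is not shown in this notebook.

```python
import pandas as pd

# Hypothetical record layout; the real metrics.json schema may differ.
records = [
    {"seed": 0, "config": {"capacity": "small"}, "metrics": {"mce": 0.42, "pmd": 1.3}},
    {"seed": 1, "config": {"capacity": "large"}, "metrics": {"mce": 0.57, "pmd": 1.1}},
]

flat = pd.json_normalize(records)
print(flat.columns.tolist())
# Nested keys become dotted names such as 'config.capacity' and 'metrics.mce'.

# Strip the prefixes so lookups like `'mce' in df.columns` succeed.
flat = flat.rename(columns=lambda c: c.split(".")[-1])
print(flat.columns.tolist())
```

Renaming by the last path component is only safe when leaf names are unique; if two nested sections share a key, keep the dotted names and adjust `novel_metric_columns` instead.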

## 2. Compute Novel Diversity Metrics

**CRITICAL**: Each run's saved results must contain the raw data needed to compute these metrics:
- `trajectories`: List of trajectory objects for each experiment
- `objectives`: Objective values array (N, num_objectives)
- `state_visits`: State visitation counts (for FCI)
- `replay_buffer`: Replay buffer trajectories (for RBD)
- `gflownet`: Trained GFlowNet model (for PAS)

**If you don't have these yet**, you'll need to either:
1. Modify your experimental code to save this data, or
2. Compute the metrics during training and save them directly.

The function below shows how to compute them once the data is available:

In [None]:
def compute_all_novel_metrics(row):
    """
    Compute all novel diversity metrics for one experimental run.
    
    This function assumes each row of the results DataFrame carries the raw data:
    - 'objectives': numpy array of shape (N, num_objectives)
    - 'trajectories': list of trajectory objects
    - 'state_visits': dict or array of state visitation counts
    - 'replay_buffer': list of trajectory objects
    - 'gflownet': trained GFlowNet model (for PAS)
    
    Returns dict of computed metrics.
    """
    metrics = {}
    
    # === SPATIAL DIVERSITY ===
    try:
        objectives = row['objectives']  # Should be numpy array
        mce, num_modes = mode_coverage_entropy(objectives, eps='auto', min_samples=5)
        metrics['mce'] = mce
        metrics['num_modes'] = num_modes
        
        pmd = pairwise_minimum_distance(objectives, top_k=100)
        metrics['pmd'] = pmd
    except Exception as e:
        print(f"Warning: Could not compute spatial metrics - {e}")
        metrics['mce'] = np.nan
        metrics['num_modes'] = np.nan
        metrics['pmd'] = np.nan
    
    # === OBJECTIVE SPACE ===
    # PAS and PFS are computed independently so a failure in one does not discard the other
    metrics['pas'] = np.nan
    metrics['pas_std'] = np.nan
    metrics['pfs'] = np.nan
    try:
        # PAS requires trained GFlowNet model
        if 'gflownet' in row:
            pas, spreads = preference_aligned_spread(
                row['gflownet'],
                num_preferences=20,
                samples_per_pref=50
            )
            metrics['pas'] = pas
            metrics['pas_std'] = np.std(spreads)
    except Exception as e:
        print(f"Warning: Could not compute PAS - {e}")

    try:
        # Read objectives from the row directly; the spatial block above may have failed
        pfs = pareto_front_smoothness(row['objectives'], method='curve_fitting')
        metrics['pfs'] = pfs
    except Exception as e:
        print(f"Warning: Could not compute PFS - {e}")
    
    # === TRAJECTORY DIVERSITY ===
    try:
        trajectories = row['trajectories']  # Should be list of trajectory objects
        
        tds = trajectory_diversity_score(trajectories)
        metrics['tds'] = tds
        
        mpd = multi_path_diversity(trajectories, method='entropy')
        metrics['mpd'] = mpd
    except Exception as e:
        print(f"Warning: Could not compute trajectory metrics - {e}")
        metrics['tds'] = np.nan
        metrics['mpd'] = np.nan
    
    # === FLOW CONCENTRATION ===
    try:
        state_visits = row['state_visits']  # Dict or array
        fci = flow_concentration_index(state_visits, method='gini')
        metrics['fci'] = fci
    except Exception as e:
        print(f"Warning: Could not compute flow metrics - {e}")
        metrics['fci'] = np.nan
    
    # === DYNAMICS ===
    try:
        replay_buffer = row['replay_buffer']  # List of trajectories
        rbd = replay_buffer_diversity(
            replay_buffer, 
            metric='trajectory_distance',
            sample_size=500
        )
        metrics['rbd'] = rbd
    except Exception as e:
        print(f"Warning: Could not compute dynamics metrics - {e}")
        metrics['rbd'] = np.nan
    
    return metrics

# Apply to all rows (if raw data is available)
print("Computing novel diversity metrics...")
print("⚠️ NOTE: This requires raw data (trajectories, objectives, etc.) in each loaded results record")
print("If you don't have this data yet, skip to the analysis section using traditional metrics")

# Uncomment when you have raw data:
# novel_metrics = df.apply(compute_all_novel_metrics, axis=1, result_type='expand')
# df = pd.concat([df, novel_metrics], axis=1)
# print("✓ Novel metrics computed for all experiments")
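
The call `flow_concentration_index(state_visits, method='gini')` suggests a Gini-style concentration measure over state visitation counts. The project's implementation is not shown in this notebook, so the sketch below only illustrates the standard Gini coefficient and may differ from what `src.metrics.flow` actually computes. The name `gini_concentration` is hypothetical.

```python
import numpy as np


def gini_concentration(visit_counts) -> float:
    """Gini coefficient of non-negative counts: 0 = uniform, near 1 = concentrated."""
    x = np.sort(np.asarray(visit_counts, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return float("nan")
    ranks = np.arange(1, n + 1)
    # Standard closed form for sorted data: G = sum((2i - n - 1) * x_i) / (n * sum(x))
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))
```

If `state_visits` is a dict, passing `list(state_visits.values())` gives the counts. Under this definition a lower value means flow is spread more evenly across states, which matches the "lower FCI is better" convention used in the scoring section.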

### Alternative: Load Pre-Computed Novel Metrics

If you computed metrics during training and saved them:

In [None]:
# If the loaded results already contain novel metrics as columns:
novel_metric_columns = ['mce', 'num_modes', 'pmd', 'pas', 'pfs', 'tds', 'mpd', 'fci', 'rbd']

# Check which metrics are available
available_novel_metrics = [col for col in novel_metric_columns if col in df.columns]
missing_novel_metrics = [col for col in novel_metric_columns if col not in df.columns]

print(f"Available novel metrics: {available_novel_metrics}")
print(f"Missing novel metrics: {missing_novel_metrics}")

if len(available_novel_metrics) == 0:
    print("\n⚠️ No novel metrics found in the loaded results. Using traditional metrics only for now.")
    print("You should compute novel metrics during your experiments and save them.")

## 3. Descriptive Statistics by Capacity

In [None]:
# Group by capacity and compute statistics
capacity_stats = df.groupby('capacity').agg({
    'hypervolume': ['mean', 'std'],
    'avg_pairwise_distance': ['mean', 'std'],
    'training_time': ['mean', 'std'],
    'num_parameters': 'first',
}).round(4)

# Add novel metrics if available
for metric in available_novel_metrics:
    if metric in df.columns:
        capacity_stats[(metric, 'mean')] = df.groupby('capacity')[metric].mean()
        capacity_stats[(metric, 'std')] = df.groupby('capacity')[metric].std()

print("="*80)
print("CAPACITY ABLATION: DESCRIPTIVE STATISTICS")
print("="*80)
print(capacity_stats)
print("\n" + "="*80)

## 4. Visualization: Novel Metrics vs Capacity

### Figure 1: Comprehensive Diversity Metrics Comparison

In [None]:
# Determine which metrics to plot
metrics_to_plot = {
    'Traditional': ['hypervolume', 'avg_pairwise_distance', 'spacing', 'spread'],
    'Spatial': ['mce', 'pmd'],
    'Objective': ['pas', 'pfs'],
    'Trajectory': ['tds', 'mpd'],
    'Flow/Dynamics': ['fci', 'rbd']
}

# Filter to only available metrics
available_metrics_by_category = {}
for category, metrics in metrics_to_plot.items():
    available = [m for m in metrics if m in df.columns]
    if available:
        available_metrics_by_category[category] = available

# Create comprehensive figure
n_categories = len(available_metrics_by_category)
fig, axes = plt.subplots(n_categories, 1, figsize=(12, 5 * n_categories))
if n_categories == 1:
    axes = [axes]

capacity_order = ['small', 'medium', 'large', 'xlarge']

for idx, (category, metrics) in enumerate(available_metrics_by_category.items()):
    ax = axes[idx]
    
    # Plot all metrics in this category
    for metric in metrics:
        # Compute mean and std for each capacity
        means = []
        stds = []
        for cap in capacity_order:
            data = df[df['capacity'] == cap][metric].dropna()
            means.append(data.mean())
            stds.append(data.std())
        
        # Plot with error bars
        ax.errorbar(capacity_order, means, yerr=stds, 
                   marker='o', linewidth=2, markersize=8, 
                   label=metric.upper(), capsize=5)
    
    ax.set_xlabel('Capacity', fontsize=12, fontweight='bold')
    ax.set_ylabel('Metric Value', fontsize=12, fontweight='bold')
    ax.set_title(f'{category} Diversity Metrics vs Capacity', 
                fontsize=14, fontweight='bold')
    ax.legend(loc='best')
    ax.grid(True, alpha=0.3)

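plt.tight_layout()
Path('/mnt/user-data/outputs').mkdir(parents=True, exist_ok=True)  # ensure the output directory exists
plt.savefig('/mnt/user-data/outputs/capacity_diversity_metrics.pdf', dpi=300, bbox_inches='tight')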
print("✓ Saved: capacity_diversity_metrics.pdf")
plt.show()

### Figure 2: Metric Correlation Heatmap

**Research Question**: Do novel metrics capture distinct aspects of diversity?

In [None]:
# Select all diversity metrics (traditional + novel)
all_diversity_metrics = ['avg_pairwise_distance', 'spacing', 'spread'] + available_novel_metrics
available_for_corr = [m for m in all_diversity_metrics if m in df.columns]

if len(available_for_corr) >= 2:
    # Compute correlation matrix
    corr_matrix = df[available_for_corr].corr()
    
    # Plot heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='coolwarm', 
                center=0, vmin=-1, vmax=1, square=True,
                cbar_kws={'label': 'Correlation'})
    plt.title('Diversity Metrics Correlation Matrix', fontsize=16, fontweight='bold', pad=20)
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/metric_correlation_heatmap.pdf', dpi=300, bbox_inches='tight')
    print("✓ Saved: metric_correlation_heatmap.pdf")
    plt.show()
    
    # Find metrics with low correlation (novel information)
    print("\n=== METRIC INDEPENDENCE ANALYSIS ===")
    print("Metric pairs with low absolute correlation (|r| < 0.5) capture distinct diversity aspects:\n")
    
    for i, metric1 in enumerate(available_for_corr):
        for metric2 in available_for_corr[i+1:]:
            corr = corr_matrix.loc[metric1, metric2]
            if abs(corr) < 0.5:
                print(f"  • {metric1.upper()} ↔ {metric2.upper()}: r={corr:.3f}")
else:
    print("⚠️ Not enough metrics available for correlation analysis")

### Figure 3: Capacity vs Parameters (Efficiency Analysis)

In [None]:
# Diversity-Efficiency Ratio (DER) calculation
if 'avg_pairwise_distance' in df.columns:
    df['der'] = df['avg_pairwise_distance'] / (df['training_time'] * df['num_parameters'] / 1e9)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # (a) Diversity vs Parameters
    for capacity in capacity_order:
        subset = df[df['capacity'] == capacity]
        axes[0].scatter(subset['num_parameters'], subset['avg_pairwise_distance'],
                       label=capacity, s=100, alpha=0.6)
    
    axes[0].set_xlabel('Number of Parameters', fontsize=12, fontweight='bold')
    axes[0].set_ylabel('Diversity (Avg Pairwise Distance)', fontsize=12, fontweight='bold')
    axes[0].set_title('(a) Diversity vs Model Size', fontsize=14, fontweight='bold')
    axes[0].set_xscale('log')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # (b) Diversity-Efficiency Ratio
    der_by_capacity = df.groupby('capacity')['der'].agg(['mean', 'std'])
    axes[1].bar(capacity_order, 
               [der_by_capacity.loc[c, 'mean'] if c in der_by_capacity.index else 0 for c in capacity_order],
               yerr=[der_by_capacity.loc[c, 'std'] if c in der_by_capacity.index else 0 for c in capacity_order],
               capsize=5, alpha=0.7, color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])
    
    axes[1].set_xlabel('Capacity', fontsize=12, fontweight='bold')
    axes[1].set_ylabel('Diversity-Efficiency Ratio (DER)', fontsize=12, fontweight='bold')
    axes[1].set_title('(b) Efficiency Comparison', fontsize=14, fontweight='bold')
    axes[1].grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.savefig('/mnt/user-data/outputs/capacity_efficiency_analysis.pdf', dpi=300, bbox_inches='tight')
    print("✓ Saved: capacity_efficiency_analysis.pdf")
    plt.show()
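
The DER computed above divides diversity by a cost term, training time multiplied by parameter count and scaled by 1e9. A small worked example shows the scale of the resulting values. The numbers are made up for illustration and are not experimental results; `diversity_efficiency_ratio` is a hypothetical helper that mirrors the `df['der']` expression.

```python
def diversity_efficiency_ratio(avg_pairwise_distance, training_time_s, num_parameters):
    """DER = diversity / (training time in seconds * parameter count / 1e9)."""
    cost = training_time_s * num_parameters / 1e9
    return avg_pairwise_distance / cost


# Illustrative values only, not measured results.
small = diversity_efficiency_ratio(2.0, training_time_s=500.0, num_parameters=100_000)
xlarge = diversity_efficiency_ratio(2.5, training_time_s=1500.0, num_parameters=1_000_000)
print(small, xlarge)
```

In this example the cost term grows 30-fold while diversity grows by a quarter, so DER drops sharply for the larger model. When reading panel (b), keep in mind that the ratio will tend to favor small models unless diversity scales roughly in proportion to cost.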

## 5. Statistical Analysis

### Hypothesis Testing: Does capacity significantly affect diversity?

In [None]:
from scipy import stats

print("="*80)
print("STATISTICAL SIGNIFICANCE TESTING")
print("="*80)
print("\nTesting H1a: Hypernetwork capacity significantly affects diversity\n")

# Select primary diversity metric
primary_diversity_metric = 'avg_pairwise_distance'
if 'mce' in df.columns:
    primary_diversity_metric = 'mce'  # Prefer novel metric if available

# Prepare data by capacity
groups = []
for cap in capacity_order:
    data = df[df['capacity'] == cap][primary_diversity_metric].dropna()
    groups.append(data)

# One-way ANOVA (each group needs at least 2 runs for the variances, t-tests, and Cohen's d)
if all(len(g) >= 2 for g in groups):
    F_stat, p_value = stats.f_oneway(*groups)
    
    print(f"Metric: {primary_diversity_metric.upper()}")
    print(f"F-statistic: {F_stat:.4f}")
    print(f"p-value: {p_value:.4e}")
    
    if p_value < 0.05:
        print("✓ SIGNIFICANT: Capacity has significant effect on diversity (p < 0.05)")
    else:
        print("✗ NOT SIGNIFICANT: Capacity effect not detected (p >= 0.05)")
    
    # Post-hoc pairwise comparisons (Bonferroni corrected)
    print("\n--- Post-hoc Pairwise Comparisons (Bonferroni corrected) ---")
    # All unordered pairs of capacity levels, as index pairs into `groups`
    pairs = [(i, j) for i in range(len(capacity_order)) for j in range(i + 1, len(capacity_order))]
    n_comparisons = len(pairs)
    
    for i, j in pairs:
        t_stat, p = stats.ttest_ind(groups[i], groups[j])
        p_corrected = min(p * n_comparisons, 1.0)
        sig = "***" if p_corrected < 0.001 else "**" if p_corrected < 0.01 else "*" if p_corrected < 0.05 else "ns"
        print(f"  {capacity_order[i]:8s} vs {capacity_order[j]:8s}: p={p_corrected:.4f} {sig}")
    
    
    # Effect sizes (Cohen's d)
    print("\n--- Effect Sizes (Cohen's d) ---")
    def cohens_d(group1, group2):
        n1, n2 = len(group1), len(group2)
        var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
        pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
        return (np.mean(group1) - np.mean(group2)) / pooled_std
    
    for i, cap1 in enumerate(capacity_order):
        for cap2 in capacity_order[i+1:]:
            d = cohens_d(groups[capacity_order.index(cap1)], 
                        groups[capacity_order.index(cap2)])
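            # Conventional Cohen thresholds: 0.2 small, 0.5 medium, 0.8 large
            magnitude = "large" if abs(d) >= 0.8 else "medium" if abs(d) >= 0.5 else "small" if abs(d) >= 0.2 else "negligible"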
            print(f"  {cap1:8s} vs {cap2:8s}: d={d:.3f} ({magnitude})")

print("\n" + "="*80)
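
The effect-size helper above implements the pooled-standard-deviation form of Cohen's d, and the post-hoc step applies a Bonferroni correction. Written out, with $n_i$, $\bar{x}_i$, and $s_i^2$ the sample size, mean, and unbiased variance of group $i$, and $m$ the number of pairwise comparisons among the four capacity levels:

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}, \qquad
p_{\text{corrected}} = \min(m \cdot p,\ 1), \quad m = \binom{4}{2} = 6
```

With six comparisons, an uncorrected p-value must fall below roughly 0.0083 to be reported as significant at the 0.05 level.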

## 6. Scenario-Based Configuration Ranking

Following the research framework's recommendation for **multi-scenario analysis**:

In [None]:
def normalize_metric(series, higher_is_better=True):
    """Normalize metric to [0, 1] range"""
    min_val = series.min()
    max_val = series.max()
    if max_val == min_val:
        return pd.Series([0.5] * len(series), index=series.index)
    
    normalized = (series - min_val) / (max_val - min_val)
    if not higher_is_better:
        normalized = 1 - normalized
    return normalized

# Prepare scoring dataframe
scoring_df = df.groupby(['capacity', 'conditioning']).agg({
    'hypervolume': 'mean',
    'avg_pairwise_distance': 'mean',
    'training_time': 'mean',
    'num_parameters': 'first'
}).reset_index()

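# Add novel metrics if available (merge on the group keys so rows stay aligned)
for metric in available_novel_metrics:
    if metric in df.columns:
        metric_means = df.groupby(['capacity', 'conditioning'])[metric].mean().reset_index()
        scoring_df = scoring_df.merge(metric_means, on=['capacity', 'conditioning'], how='left')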

# Normalize metrics
scoring_df['quality_norm'] = normalize_metric(scoring_df['hypervolume'], True)
scoring_df['diversity_norm'] = normalize_metric(scoring_df['avg_pairwise_distance'], True)
scoring_df['efficiency_norm'] = normalize_metric(
    1 / (scoring_df['training_time'] * scoring_df['num_parameters']), True
)

# Add normalized novel metrics
if 'mce' in scoring_df.columns:
    scoring_df['mce_norm'] = normalize_metric(scoring_df['mce'], True)
if 'tds' in scoring_df.columns:
    scoring_df['tds_norm'] = normalize_metric(scoring_df['tds'], True)
if 'fci' in scoring_df.columns:
    scoring_df['fci_norm'] = normalize_metric(scoring_df['fci'], False)  # Lower FCI is better

print("="*80)
print("SCENARIO-BASED CONFIGURATION RANKING")
print("="*80)

# === SCENARIO 1: Maximum Diversity (PRIMARY FOR RESEARCH) ===
print("\n📊 SCENARIO 1: Maximum Diversity (Research Focus)")
print("-" * 80)

# Build diversity score using multiple metrics if available
diversity_components = {'diversity_norm': 0.30}
remaining_weight = 0.50

if 'mce_norm' in scoring_df.columns:
    diversity_components['mce_norm'] = 0.25
    remaining_weight -= 0.25
if 'tds_norm' in scoring_df.columns:
    diversity_components['tds_norm'] = 0.15
    remaining_weight -= 0.15
if 'fci_norm' in scoring_df.columns:
    diversity_components['fci_norm'] = 0.10
    remaining_weight -= 0.10

# Distribute remaining weight
if remaining_weight > 0:
    diversity_components['diversity_norm'] += remaining_weight

# Compute diversity-focused score
scoring_df['diversity_focused_score'] = sum(
    weight * scoring_df[metric] for metric, weight in diversity_components.items()
)
scoring_df['diversity_focused_score'] += 0.15 * scoring_df['quality_norm']  # Quality weight (soft preference, not a hard threshold)
scoring_df['diversity_focused_score'] += 0.05 * scoring_df['efficiency_norm']  # Minor efficiency

top_3_diversity = scoring_df.nlargest(3, 'diversity_focused_score')
print("\nTop 3 Configurations:")
print(top_3_diversity[['capacity', 'conditioning', 'diversity_focused_score', 
                       'avg_pairwise_distance', 'hypervolume']].to_string(index=False))

# === SCENARIO 2: Balanced Quality-Diversity ===
print("\n📊 SCENARIO 2: Balanced Quality-Diversity")
print("-" * 80)

scoring_df['balanced_score'] = (
    0.40 * scoring_df['quality_norm'] +
    0.40 * scoring_df['diversity_norm'] +
    0.20 * scoring_df['efficiency_norm']
)

top_3_balanced = scoring_df.nlargest(3, 'balanced_score')
print("\nTop 3 Configurations:")
print(top_3_balanced[['capacity', 'conditioning', 'balanced_score',
                      'avg_pairwise_distance', 'hypervolume']].to_string(index=False))

# === SCENARIO 3: Quality Priority (Sanity Check) ===
print("\n📊 SCENARIO 3: Quality Priority (Sanity Check)")
print("-" * 80)

scoring_df['quality_priority_score'] = (
    0.60 * scoring_df['quality_norm'] +
    0.30 * scoring_df['diversity_norm'] +
    0.10 * scoring_df['efficiency_norm']
)

top_3_quality = scoring_df.nlargest(3, 'quality_priority_score')
print("\nTop 3 Configurations:")
print(top_3_quality[['capacity', 'conditioning', 'quality_priority_score',
                     'hypervolume', 'avg_pairwise_distance']].to_string(index=False))

print("\n" + "="*80)
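
The Scenario 1 weights are assembled conditionally, so it is worth checking that they sum to 1 whichever novel metrics happen to be available. The sketch below reproduces the weighting logic in plain Python and checks every combination; `scenario1_weights` is a hypothetical helper written for this check, not part of the notebook's pipeline.

```python
from itertools import combinations


def scenario1_weights(available):
    """Rebuild the Scenario 1 weights for a given set of available *_norm columns."""
    novel = {"mce_norm": 0.25, "tds_norm": 0.15, "fci_norm": 0.10}
    weights = {"diversity_norm": 0.30}
    remaining = 0.50
    for name, w in novel.items():
        if name in available:
            weights[name] = w
            remaining -= w
    if remaining > 0:
        # Weight of any missing novel metric falls back to avg pairwise distance.
        weights["diversity_norm"] += remaining
    weights["quality_norm"] = 0.15
    weights["efficiency_norm"] = 0.05
    return weights


names = ["mce_norm", "tds_norm", "fci_norm"]
for k in range(len(names) + 1):
    for subset in combinations(names, k):
        total = sum(scenario1_weights(set(subset)).values())
        assert abs(total - 1.0) < 1e-9, (subset, total)
```

With no novel metrics available, `diversity_norm` carries a weight of 0.80, so Scenario 1 then ranks configurations almost entirely by average pairwise distance.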

## 7. Final Recommendation for Sampling Ablation

In [None]:
# Select best configuration for next phase (sampling ablation)
best_config = scoring_df.nlargest(1, 'diversity_focused_score').iloc[0]

print("="*80)
print("🎯 RECOMMENDED CONFIGURATION FOR SAMPLING ABLATION (Week 4)")
print("="*80)
print(f"\nCapacity: {best_config['capacity'].upper()}")
print(f"Conditioning: {best_config['conditioning'].upper()}")
print(f"\nRationale: Maximizes diversity while maintaining quality")
print(f"\nPerformance Metrics:")
print(f"  • Diversity Score: {best_config['diversity_focused_score']:.4f}")
print(f"  • Avg Pairwise Distance: {best_config['avg_pairwise_distance']:.4f}")
print(f"  • Hypervolume: {best_config['hypervolume']:.4f}")
print(f"  • Parameters: {int(best_config['num_parameters']):,}")
print(f"  • Training Time: {best_config['training_time']:.2f}s")

if 'mce' in best_config:
    print(f"  • Mode Coverage Entropy (MCE): {best_config['mce']:.4f}")
if 'tds' in best_config:
    print(f"  • Trajectory Diversity Score (TDS): {best_config['tds']:.4f}")
if 'fci' in best_config:
    print(f"  • Flow Concentration Index (FCI): {best_config['fci']:.4f}")

print("\n" + "="*80)
print("\n✅ Phase 1 (Capacity Ablation) Complete")
print("➡️  Next: Run sampling ablation experiments with this configuration")
print("="*80)

## 8. Export Results

In [None]:
# Save processed results
scoring_df.to_csv('/mnt/user-data/outputs/capacity_ablation_scores.csv', index=False)
print("✓ Saved: capacity_ablation_scores.csv")

# Save recommendation
with open('/mnt/user-data/outputs/recommended_config.txt', 'w') as f:
    f.write("RECOMMENDED CONFIGURATION FOR SAMPLING ABLATION\n")
    f.write("="*60 + "\n\n")
    f.write(f"Capacity: {best_config['capacity']}\n")
    f.write(f"Conditioning: {best_config['conditioning']}\n")
    f.write(f"\nDiversity Score: {best_config['diversity_focused_score']:.4f}\n")
    f.write(f"Hypervolume: {best_config['hypervolume']:.4f}\n")
    f.write(f"Avg Pairwise Distance: {best_config['avg_pairwise_distance']:.4f}\n")
print("✓ Saved: recommended_config.txt")

print("\n" + "="*80)
print("ALL RESULTS SAVED TO /mnt/user-data/outputs/")
print("="*80)