# Baseline Comparison Analysis

This notebook compares carbon-efficient Kubernetes scheduling algorithms against established baselines to quantify their relative performance and improvements.

## Baseline Categories

We'll analyze several baseline categories:

1. **Default Kubernetes Scheduler**: Standard Kubernetes scheduling
2. **Carbon-Aware Schedulers**: Schedulers that consider carbon footprint
3. **Energy-Efficient Schedulers**: Schedulers optimized for energy consumption
4. **Performance-Optimized Schedulers**: Schedulers focused on performance metrics
5. **Hybrid Approaches**: Schedulers balancing multiple objectives

## Analysis Objectives

- **Performance Comparison**: How do different approaches compare?
- **Trade-off Analysis**: What are the trade-offs between different objectives?
- **Statistical Validation**: Are the differences statistically significant?
- **Scenario Analysis**: How do baselines perform under different conditions?
- **Recommendation Generation**: Which baseline is best for specific use cases?

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
from datetime import datetime
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")
%matplotlib inline

print("✅ Libraries imported successfully")

## 1. Load Baseline Datasets

Let's load all available baseline datasets for comparison.
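
The analysis cells further down select a fixed set of metric columns from every dataset, so a file with a missing column only fails later with a `KeyError`. A minimal sketch of a header check that could run before loading, using only the standard library. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is an assumption inferred from the columns this notebook references.

```python
import csv
import io

# Columns the analysis cells reference; treat this list as an assumption.
REQUIRED_COLUMNS = [
    "carbon_efficiency", "energy_consumption", "performance_score",
    "response_time", "throughput", "resource_utilization",
]

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# In-memory stand-in for a file such as baseline_kubernetes_default.csv (toy values).
sample = io.StringIO(
    "carbon_efficiency,energy_consumption,performance_score\n0.8,120,0.9\n"
)
print(missing_columns(sample))
```

With a real file, the same function can be called on an open file handle, for example `with open(path, newline='') as f: missing_columns(f)`.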

In [None]:
# Load baseline datasets
data_path = Path('../data')
baseline_datasets = {}

# Define baseline scenarios
baseline_scenarios = [
    'kubernetes_default',
    'carbon_aware_v1',
    'carbon_aware_v2',
    'energy_efficient',
    'performance_optimized',
    'hybrid_balanced',
    'cost_optimized'
]

for scenario in baseline_scenarios:
    try:
        path = data_path / 'synthetic' / f'baseline_{scenario}.csv'
        baseline_datasets[scenario] = pd.read_csv(path)
        print(f"✅ Loaded {scenario}: {len(baseline_datasets[scenario])} samples")
    except FileNotFoundError:
        print(f"❌ Could not load {scenario}")

print(f"\n📊 Loaded {len(baseline_datasets)} baseline datasets")

# Also load benchmark datasets
benchmark_datasets = {}
benchmark_scenarios = ['industry_standard', 'research_baseline', 'best_practice']

for scenario in benchmark_scenarios:
    try:
        path = data_path / 'benchmarks' / f'{scenario}.csv'
        benchmark_datasets[scenario] = pd.read_csv(path)
        print(f"✅ Loaded benchmark {scenario}: {len(benchmark_datasets[scenario])} samples")
    except FileNotFoundError:
        print(f"❌ Could not load benchmark {scenario}")

In [None]:
# Combine all datasets for comprehensive analysis
frames = []

# Tag each dataset with its scenario name and category
for category, datasets in (('baseline', baseline_datasets), ('benchmark', benchmark_datasets)):
    for scenario, df in datasets.items():
        frames.append(df.assign(baseline_type=scenario, dataset_category=category))

# Concatenate once; pd.concat raises on an empty list, so fall back to an empty frame
combined_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

if not combined_data.empty:
    print(f"📈 Combined dataset: {len(combined_data)} samples")
    print(f"🏷️ Baseline types: {combined_data['baseline_type'].unique()}")
    print(f"📂 Dataset categories: {combined_data['dataset_category'].unique()}")
    
    # Display sample data
    print("\n📋 Sample Data:")
    display(combined_data.head())
else:
    print("❌ No data loaded for analysis")

## 2. Baseline Performance Overview

Let's get an overview of how different baselines perform across key metrics.

In [None]:
# Calculate performance summary by baseline
if not combined_data.empty:
    performance_summary = combined_data.groupby('baseline_type')[[
        'carbon_efficiency', 'energy_consumption', 'performance_score',
        'response_time', 'throughput', 'resource_utilization'
    ]].agg(['mean', 'std', 'min', 'max', 'count']).round(3)
    
    print("📊 Baseline Performance Summary:")
    display(performance_summary)

In [None]:
# Create performance comparison visualization
if not combined_data.empty:
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    metrics = ['carbon_efficiency', 'energy_consumption', 'performance_score',
               'response_time', 'throughput', 'resource_utilization']
    titles = ['Carbon Efficiency', 'Energy Consumption (W)', 'Performance Score',
              'Response Time (ms)', 'Throughput (req/s)', 'Resource Utilization (%)']
    
    for i, (metric, title) in enumerate(zip(metrics, titles)):
        row, col = i // 3, i % 3
        
        # Box plot by baseline type
        combined_data.boxplot(column=metric, by='baseline_type', ax=axes[row, col])
        axes[row, col].set_title(f'{title} by Baseline')
        axes[row, col].set_xlabel('Baseline Type')
        axes[row, col].set_ylabel(title)
        axes[row, col].tick_params(axis='x', rotation=45)
    
    plt.suptitle('')  # Remove automatic title
    plt.tight_layout()
    plt.show()

## 3. Detailed Baseline Comparison

Let's perform detailed comparisons between specific baselines.
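
The comparison cell ranks each baseline per metric and averages the ranks into an overall score. The arithmetic can be sketched on toy numbers without pandas. The scheduler names `A`, `B`, `C` and their values are illustrative assumptions, not results, and the hypothetical `rank` helper ignores ties (pandas' `rank` assigns tied entries their average rank by default).

```python
def rank(values, higher_is_better=True):
    """Rank names 1..n (1 = best); assumes no ties for simplicity."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Toy per-baseline means for three hypothetical schedulers.
carbon = {"A": 0.90, "B": 0.70, "C": 0.80}      # higher is better
energy = {"A": 150.0, "B": 100.0, "C": 120.0}   # lower is better
perf = {"A": 0.60, "B": 0.80, "C": 0.90}        # higher is better

ranks = [rank(carbon), rank(energy, higher_is_better=False), rank(perf)]

# Overall rank is the unweighted mean of the per-metric ranks.
overall = {name: sum(r[name] for r in ranks) / len(ranks) for name in carbon}
best = min(overall, key=overall.get)
print(overall, best)
```

Averaging ranks weights every metric equally and discards the size of the gaps between baselines, so two schedulers with nearly identical means can still end up one full rank apart.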

In [None]:
# Create baseline comparison matrix
if not combined_data.empty:
    # Calculate mean values for each baseline
    baseline_means = combined_data.groupby('baseline_type')[[
        'carbon_efficiency', 'energy_consumption', 'performance_score'
    ]].mean()
    
    print("🏆 Baseline Comparison Matrix (Mean Values):")
    display(baseline_means.round(3))
    
    # Rank baselines by each metric
    rankings = pd.DataFrame()
    rankings['Carbon Efficiency Rank'] = baseline_means['carbon_efficiency'].rank(ascending=False)
    rankings['Energy Efficiency Rank'] = baseline_means['energy_consumption'].rank(ascending=True)  # Lower is better
    rankings['Performance Rank'] = baseline_means['performance_score'].rank(ascending=False)
    rankings['Overall Rank'] = (rankings['Carbon Efficiency Rank'] + 
                               rankings['Energy Efficiency Rank'] + 
                               rankings['Performance Rank']) / 3
    
    rankings = rankings.sort_values('Overall Rank')
    
    print("\n🥇 Baseline Rankings (1 = Best):")
    display(rankings.round(1))

In [None]:
# Visualize baseline comparison
if not combined_data.empty:
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Carbon efficiency comparison
    baseline_means['carbon_efficiency'].plot(kind='bar', ax=axes[0, 0], color='green', alpha=0.7)
    axes[0, 0].set_title('Carbon Efficiency by Baseline')
    axes[0, 0].set_ylabel('Carbon Efficiency')
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Energy consumption comparison
    baseline_means['energy_consumption'].plot(kind='bar', ax=axes[0, 1], color='red', alpha=0.7)
    axes[0, 1].set_title('Energy Consumption by Baseline')
    axes[0, 1].set_ylabel('Energy Consumption (W)')
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Performance score comparison
    baseline_means['performance_score'].plot(kind='bar', ax=axes[1, 0], color='blue', alpha=0.7)
    axes[1, 0].set_title('Performance Score by Baseline')
    axes[1, 0].set_ylabel('Performance Score')
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Overall ranking
    rankings['Overall Rank'].plot(kind='bar', ax=axes[1, 1], color='orange', alpha=0.7)
    axes[1, 1].set_title('Overall Ranking (Lower = Better)')
    axes[1, 1].set_ylabel('Average Rank')
    axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()

## 4. Statistical Significance Testing

Let's test if the differences between baselines are statistically significant.
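
Alongside the p-values, the test cell reports Cohen's d (the mean difference divided by the pooled sample standard deviation) and the percentage change in the mean relative to the reference. A worked sketch on toy samples, using only the standard library; the helper names and the data are illustrative assumptions.

```python
import math
import statistics

def cohens_d(reference, candidate):
    """Cohen's d with a pooled sample standard deviation (candidate minus reference)."""
    n_ref, n_cand = len(reference), len(candidate)
    pooled_var = (
        (n_ref - 1) * statistics.variance(reference)
        + (n_cand - 1) * statistics.variance(candidate)
    ) / (n_ref + n_cand - 2)
    return (statistics.mean(candidate) - statistics.mean(reference)) / math.sqrt(pooled_var)

def percent_change(reference, candidate):
    """Signed change in the mean, as a percentage of the reference mean."""
    ref_mean = statistics.mean(reference)
    return (statistics.mean(candidate) - ref_mean) / ref_mean * 100

# Toy samples: both have sample variance 1.0, and the means differ by 1.0.
reference = [1.0, 2.0, 3.0]
candidate = [2.0, 3.0, 4.0]

d = cohens_d(reference, candidate)
change = percent_change(reference, candidate)
print(d, change)
```

`statistics.variance` uses the n - 1 denominator, which matches the default of pandas' `Series.var()`. The sign of both quantities follows the candidate minus the reference, so for a lower-is-better metric a negative value is the favourable direction.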

In [None]:
def perform_pairwise_tests(df, metric, baseline_col='baseline_type', reference_baseline='kubernetes_default'):
    """Compare every baseline against a reference baseline on one metric.

    'Improvement (%)' is the signed change in the mean relative to the reference,
    so a positive value is an improvement only for higher-is-better metrics.
    """
    if reference_baseline not in df[baseline_col].values:
        print(f"❌ Reference baseline '{reference_baseline}' not found")
        return None
    
    # Drop missing values so the tests do not return NaN
    reference_data = df.loc[df[baseline_col] == reference_baseline, metric].dropna()
    results = []
    
    for baseline in df[baseline_col].unique():
        if baseline == reference_baseline:
            continue
            
        baseline_data = df.loc[df[baseline_col] == baseline, metric].dropna()
        
        # Independent two-sample t-test (assumes equal variances)
        t_stat, t_pval = stats.ttest_ind(reference_data, baseline_data)
        
        # Mann-Whitney U test (non-parametric alternative)
        u_stat, u_pval = stats.mannwhitneyu(reference_data, baseline_data, alternative='two-sided')
        
        # Effect size (Cohen's d) using the pooled sample standard deviation
        pooled_std = np.sqrt(((len(reference_data) - 1) * reference_data.var() + 
                             (len(baseline_data) - 1) * baseline_data.var()) / 
                            (len(reference_data) + len(baseline_data) - 2))
        cohens_d = (baseline_data.mean() - reference_data.mean()) / pooled_std
        
        # Conventional thresholds: |d| < 0.2 negligible, < 0.5 small, < 0.8 medium, else large
        abs_d = abs(cohens_d)
        if abs_d < 0.2:
            effect_category = 'Negligible'
        elif abs_d < 0.5:
            effect_category = 'Small'
        elif abs_d < 0.8:
            effect_category = 'Medium'
        else:
            effect_category = 'Large'
        
        # Signed percentage change of the mean relative to the reference
        improvement = ((baseline_data.mean() - reference_data.mean()) / reference_data.mean()) * 100
        
        results.append({
            'Baseline': baseline,
            'Reference Mean': reference_data.mean(),
            'Baseline Mean': baseline_data.mean(),
            'Improvement (%)': improvement,
            'T-test p-value': t_pval,
            'Mann-Whitney p-value': u_pval,
            'Effect Size (Cohen\'s d)': cohens_d,
            'Significant (p<0.05)': t_pval < 0.05,
            'Effect Size Category': effect_category
        })
    
    return pd.DataFrame(results)

if not combined_data.empty:
    # Test carbon efficiency improvements
    carbon_tests = perform_pairwise_tests(combined_data, 'carbon_efficiency')
    
    if carbon_tests is not None:
        print("🧪 Statistical Tests - Carbon Efficiency vs Kubernetes Default:")
        display(carbon_tests.round(4))

In [None]:
# Test energy consumption changes
# Lower is better for this metric, so a negative 'Improvement (%)' means less energy used
if not combined_data.empty:
    energy_tests = perform_pairwise_tests(combined_data, 'energy_consumption')
    
    if energy_tests is not None:
        print("🧪 Statistical Tests - Energy Consumption vs Kubernetes Default:")
        display(energy_tests.round(4))

In [None]:
# Test performance score improvements
if not combined_data.empty:
    performance_tests = perform_pairwise_tests(combined_data, 'performance_score')
    
    if performance_tests is not None:
        print("🧪 Statistical Tests - Performance Score vs Kubernetes Default:")
        display(performance_tests.round(4))

## 5. Trade-off Analysis

Let's analyze the trade-offs between different objectives (carbon efficiency, energy consumption, performance).
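
One way to make the trade-offs concrete is to ask which baselines are Pareto-optimal, meaning no other baseline is at least as good on all three objectives and strictly better on one. This is a sketch of that idea on toy means, not a step the notebook itself performs. The helper names, the scheduler labels, and the numbers are illustrative assumptions.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one.

    Each point is (carbon_efficiency, energy_consumption, performance_score):
    carbon efficiency and performance are maximized, energy consumption is minimized.
    """
    at_least_as_good = a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better

def pareto_front(points):
    """Return the names of all points that no other point dominates."""
    return sorted(
        name
        for name, point in points.items()
        if not any(dominates(other, point)
                   for other_name, other in points.items() if other_name != name)
    )

# Toy per-baseline means for four hypothetical schedulers.
means = {
    "A": (0.90, 150.0, 0.60),
    "B": (0.70, 100.0, 0.80),
    "C": (0.80, 120.0, 0.90),
    "D": (0.65, 130.0, 0.70),
}

front = pareto_front(means)
print(front)
```

Here `D` drops out because `B` and `C` both beat it on every objective, while `A`, `B`, and `C` each win on at least one objective and so represent genuine trade-offs.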

In [None]:
# Create trade-off analysis
if not combined_data.empty:
    # Calculate correlation between metrics
    correlation_matrix = combined_data[[
        'carbon_efficiency', 'energy_consumption', 'performance_score',
        'response_time', 'throughput', 'resource_utilization'
    ]].corr()
    
    print("🔄 Metric Correlation Analysis:")
    display(correlation_matrix.round(3))
    
    # Visualize correlations
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='RdBu_r', center=0, 
                square=True, linewidths=0.5)
    plt.title('Metric Correlation Matrix')
    plt.tight_layout()
    plt.show()

In [None]:
# Create scatter plots for trade-off analysis
if not combined_data.empty:
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Carbon efficiency vs Energy consumption
    for baseline in combined_data['baseline_type'].unique():
        data = combined_data[combined_data['baseline_type'] == baseline]
        axes[0, 0].scatter(data['carbon_efficiency'], data['energy_consumption'], 
                          label=baseline, alpha=0.6, s=50)
    axes[0, 0].set_xlabel('Carbon Efficiency')
    axes[0, 0].set_ylabel('Energy Consumption (W)')
    axes[0, 0].set_title('Carbon Efficiency vs Energy Consumption')
    axes[0, 0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Carbon efficiency vs Performance
    for baseline in combined_data['baseline_type'].unique():
        data = combined_data[combined_data['baseline_type'] == baseline]
        axes[0, 1].scatter(data['carbon_efficiency'], data['performance_score'], 
                          label=baseline, alpha=0.6, s=50)
    axes[0, 1].set_xlabel('Carbon Efficiency')
    axes[0, 1].set_ylabel('Performance Score')
    axes[0, 1].set_title('Carbon Efficiency vs Performance')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Energy consumption vs Performance
    for baseline in combined_data['baseline_type'].unique():
        data = combined_data[combined_data['baseline_type'] == baseline]
        axes[1, 0].scatter(data['energy_consumption'], data['performance_score'], 
                          label=baseline, alpha=0.6, s=50)
    axes[1, 0].set_xlabel('Energy Consumption (W)')
    axes[1, 0].set_ylabel('Performance Score')
    axes[1, 0].set_title('Energy Consumption vs Performance')
    axes[1, 0].grid(True, alpha=0.3)
    
    # 3D-like view: Response time vs Throughput colored by Carbon efficiency
    scatter = axes[1, 1].scatter(combined_data['response_time'], combined_data['throughput'], 
                                c=combined_data['carbon_efficiency'], cmap='viridis', 
                                alpha=0.6, s=50)
    axes[1, 1].set_xlabel('Response Time (ms)')
    axes[1, 1].set_ylabel('Throughput (req/s)')
    axes[1, 1].set_title('Response Time vs Throughput (colored by Carbon Efficiency)')
    plt.colorbar(scatter, ax=axes[1, 1], label='Carbon Efficiency')
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 6. Scenario-Based Analysis

Let's analyze how different baselines perform under various scenarios (workload types, node types, etc.).

In [None]:
# Analyze performance by workload type
if not combined_data.empty and 'workload_type' in combined_data.columns:
    workload_analysis = combined_data.groupby(['baseline_type', 'workload_type'])[[
        'carbon_efficiency', 'energy_consumption', 'performance_score'
    ]].mean().round(3)
    
    print("📋 Performance by Workload Type:")
    display(workload_analysis)
    
    # Visualize workload-specific performance
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    
    metrics = ['carbon_efficiency', 'energy_consumption', 'performance_score']
    titles = ['Carbon Efficiency', 'Energy Consumption', 'Performance Score']
    
    for i, (metric, title) in enumerate(zip(metrics, titles)):
        # Create pivot table for heatmap
        pivot_data = combined_data.pivot_table(
            values=metric, 
            index='baseline_type', 
            columns='workload_type', 
            aggfunc='mean'
        )
        
        sns.heatmap(pivot_data, annot=True, fmt='.3f', cmap='RdYlBu_r', ax=axes[i])
        axes[i].set_title(f'{title} by Baseline and Workload')
        axes[i].set_xlabel('Workload Type')
        axes[i].set_ylabel('Baseline Type')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Analyze performance by node type
if not combined_data.empty and 'node_type' in combined_data.columns:
    node_analysis = combined_data.groupby(['baseline_type', 'node_type'])[[
        'carbon_efficiency', 'energy_consumption', 'performance_score'
    ]].mean().round(3)
    
    print("🖥️ Performance by Node Type:")
    display(node_analysis)
    
    # Find best baseline for each node type
    best_by_node = {}
    for node_type in combined_data['node_type'].unique():
        node_data = combined_data[combined_data['node_type'] == node_type]
        best_carbon = node_data.groupby('baseline_type')['carbon_efficiency'].mean().idxmax()
        best_energy = node_data.groupby('baseline_type')['energy_consumption'].mean().idxmin()
        best_perf = node_data.groupby('baseline_type')['performance_score'].mean().idxmax()
        
        best_by_node[node_type] = {
            'best_carbon': best_carbon,
            'best_energy': best_energy,
            'best_performance': best_perf
        }
    
    print("\n🏆 Best Baseline by Node Type:")
    for node_type, bests in best_by_node.items():
        print(f"  {node_type}:")
        print(f"    Carbon: {bests['best_carbon']}")
        print(f"    Energy: {bests['best_energy']}")
        print(f"    Performance: {bests['best_performance']}")

## 7. Principal Component Analysis

Let's use PCA to understand the main dimensions of variation between baselines.

In [None]:
# Perform PCA analysis
if not combined_data.empty:
    # Select numeric features for PCA
    numeric_features = [
        'carbon_efficiency', 'energy_consumption', 'performance_score',
        'cpu_utilization', 'memory_utilization', 'response_time', 'throughput'
    ]
    
    # Filter available features
    available_features = [f for f in numeric_features if f in combined_data.columns]
    
    if len(available_features) >= 3:
        # Prepare data
        pca_data = combined_data[available_features].dropna()
        baseline_labels = combined_data.loc[pca_data.index, 'baseline_type']
        
        # Standardize features
        scaler = StandardScaler()
        pca_data_scaled = scaler.fit_transform(pca_data)
        
        # Perform PCA
        pca = PCA(n_components=min(len(available_features), 4))
        pca_result = pca.fit_transform(pca_data_scaled)
        
        # Print explained variance
        print("📊 PCA Analysis Results:")
        print(f"Explained variance ratio: {pca.explained_variance_ratio_.round(3)}")
        print(f"Cumulative explained variance: {pca.explained_variance_ratio_.cumsum().round(3)}")
        
        # Create PCA visualization
        fig, axes = plt.subplots(1, 2, figsize=(15, 6))
        
        # PCA scatter plot
        unique_baselines = baseline_labels.unique()
        colors = plt.cm.Set3(np.linspace(0, 1, len(unique_baselines)))
        
        for i, baseline in enumerate(unique_baselines):
            mask = baseline_labels == baseline
            axes[0].scatter(pca_result[mask, 0], pca_result[mask, 1], 
                           c=[colors[i]], label=baseline, alpha=0.7, s=50)
        
        axes[0].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
        axes[0].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
        axes[0].set_title('PCA: Baseline Clustering')
        axes[0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
        axes[0].grid(True, alpha=0.3)
        
        # Feature importance in PCA
        feature_importance = pd.DataFrame(
            pca.components_[:2].T,
            columns=['PC1', 'PC2'],
            index=available_features
        )
        
        feature_importance.plot(kind='bar', ax=axes[1])
        axes[1].set_title('Feature Contributions to Principal Components')
        axes[1].set_ylabel('Component Loading')
        axes[1].tick_params(axis='x', rotation=45)
        axes[1].legend()
        
        plt.tight_layout()
        plt.show()
        
        print("\n🎯 Feature Contributions to PC1 and PC2:")
        display(feature_importance.round(3))
    else:
        print("❌ Insufficient numeric features for PCA analysis")

## 8. Recommendations and Best Practices

Based on our comprehensive baseline analysis, let's generate recommendations.

In [None]:
def generate_baseline_recommendations(combined_data, rankings, statistical_tests):
    """Generate comprehensive recommendations based on baseline analysis"""
    recommendations = []
    
    if not combined_data.empty and rankings is not None:
        # Overall best performer
        best_overall = rankings.index[0]
        recommendations.append(
            f"🏆 Overall Best Performer: '{best_overall}' - Best balanced performance across all metrics"
        )
        
        # Best for specific objectives
        best_carbon = rankings['Carbon Efficiency Rank'].idxmin()
        best_energy = rankings['Energy Efficiency Rank'].idxmin()
        best_performance = rankings['Performance Rank'].idxmin()
        
        recommendations.extend([
            f"🌱 Best for Carbon Efficiency: '{best_carbon}' - Prioritize for environmental goals",
            f"⚡ Best for Energy Efficiency: '{best_energy}' - Prioritize for cost reduction",
            f"🚀 Best for Performance: '{best_performance}' - Prioritize for application responsiveness"
        ])
    
    # Statistical significance insights
    if statistical_tests is not None and not statistical_tests.empty:
        significant_improvements = statistical_tests[
            statistical_tests['Significant (p<0.05)'] & 
            (statistical_tests['Improvement (%)'] > 0)
        ]
        
        if not significant_improvements.empty:
            best_improvement = significant_improvements.loc[significant_improvements['Improvement (%)'].idxmax()]
            recommendations.append(
                f"📈 Largest Significant Improvement: '{best_improvement['Baseline']}' "
                f"({best_improvement['Improvement (%)']:.1f}% improvement)"
            )
    
    # Workload-specific recommendations
    if 'workload_type' in combined_data.columns:
        for workload in combined_data['workload_type'].unique():
            workload_data = combined_data[combined_data['workload_type'] == workload]
            best_for_workload = workload_data.groupby('baseline_type')['carbon_efficiency'].mean().idxmax()
            recommendations.append(
                f"📋 Best for '{workload}' workloads: '{best_for_workload}'"
            )
    
    # General best practices
    recommendations.extend([
        "🎯 Consider your primary objective when selecting a baseline",
        "⚖️ Evaluate trade-offs between carbon efficiency, energy consumption, and performance",
        "🔄 Test baselines under your specific workload conditions",
        "📊 Monitor baseline performance continuously in production",
        "🧪 Conduct A/B tests when switching between baselines",
        "📈 Consider hybrid approaches for complex multi-objective scenarios",
        "🔍 Validate statistical significance before making production changes"
    ])
    
    return recommendations

# Generate recommendations
recommendations = generate_baseline_recommendations(
    combined_data,
    rankings if 'rankings' in locals() else None,
    carbon_tests if 'carbon_tests' in locals() else None
)

print("💡 Baseline Comparison Recommendations:")
print("=" * 60)
for i, rec in enumerate(recommendations, 1):
    print(f"{i}. {rec}")

## 9. Export Baseline Comparison Results

Let's save our comprehensive baseline analysis results.
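
One thing to watch when exporting: `DataFrame.to_dict()` on a frame with a MultiIndex (for example a `groupby(...).agg([...])` summary) produces tuple keys, and `json.dump` rejects tuple keys. Its `default=` hook is applied to values, not keys. The sketch below reproduces the failure on a toy dictionary and shows one possible workaround. `stringify_keys` is a hypothetical helper, not part of pandas or the `json` module.

```python
import json

def stringify_keys(obj, sep="_"):
    """Recursively join tuple keys into strings so the structure is JSON-serializable."""
    if isinstance(obj, dict):
        return {
            (sep.join(map(str, key)) if isinstance(key, tuple) else key):
                stringify_keys(value, sep)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [stringify_keys(item, sep) for item in obj]
    return obj

# Same shape as to_dict() output for MultiIndex columns (toy value).
summary = {("carbon_efficiency", "mean"): {"kubernetes_default": 0.5}}

try:
    json.dumps(summary, default=str)
except TypeError as exc:
    print(f"Tuple keys are rejected: {exc}")

encoded = json.dumps(stringify_keys(summary))
print(encoded)
```

Joining with an underscore is a simple choice but is ambiguous if a level name already contains underscores; a different separator can be passed through `sep`.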

In [None]:
# Create comprehensive results summary
def frame_to_records(df):
    """Flatten MultiIndex rows/columns into records; tuple keys would break json.dump."""
    flat = df.copy()
    if isinstance(flat.columns, pd.MultiIndex):
        flat.columns = ['_'.join(map(str, col)) for col in flat.columns]
    return flat.reset_index().to_dict('records')

baseline_results = {
    'analysis_info': {
        'analysis_date': datetime.now().isoformat(),
        'total_samples': len(combined_data) if not combined_data.empty else 0,
        'baselines_tested': combined_data['baseline_type'].unique().tolist() if not combined_data.empty else [],
        'reference_baseline': 'kubernetes_default'
    },
    # performance_summary has MultiIndex columns, so it is flattened before export
    'performance_summary': frame_to_records(performance_summary) if 'performance_summary' in locals() else [],
    'baseline_rankings': rankings.to_dict() if 'rankings' in locals() else {},
    'statistical_tests': {
        'carbon_efficiency': carbon_tests.to_dict('records') if 'carbon_tests' in locals() and carbon_tests is not None else [],
        'energy_consumption': energy_tests.to_dict('records') if 'energy_tests' in locals() and energy_tests is not None else [],
        'performance_score': performance_tests.to_dict('records') if 'performance_tests' in locals() and performance_tests is not None else []
    },
    'correlation_analysis': correlation_matrix.to_dict() if 'correlation_matrix' in locals() else {},
    # workload_analysis and node_analysis have a MultiIndex on the rows
    'workload_analysis': frame_to_records(workload_analysis) if 'workload_analysis' in locals() else [],
    'node_analysis': frame_to_records(node_analysis) if 'node_analysis' in locals() else [],
    'pca_analysis': {
        'explained_variance_ratio': pca.explained_variance_ratio_.tolist() if 'pca' in locals() else [],
        'feature_contributions': feature_importance.to_dict() if 'feature_importance' in locals() else {}
    },
    'recommendations': recommendations
}

# Save results
results_path = Path('../results')
results_path.mkdir(parents=True, exist_ok=True)

with open(results_path / 'baseline_comparison_results.json', 'w') as f:
    json.dump(baseline_results, f, indent=2, default=str)

print("💾 Baseline comparison results saved to: evaluation/results/baseline_comparison_results.json")

# Save rankings as CSV
if 'rankings' in locals():
    rankings.to_csv(results_path / 'baseline_rankings.csv')
    print("🏆 Baseline rankings saved to: evaluation/results/baseline_rankings.csv")

# Save statistical test results
if 'carbon_tests' in locals() and carbon_tests is not None:
    carbon_tests.to_csv(results_path / 'carbon_efficiency_tests.csv', index=False)
    print("🧪 Statistical test results saved to: evaluation/results/carbon_efficiency_tests.csv")

print("\n✅ Baseline comparison analysis complete!")

## Summary

This comprehensive baseline comparison analysis has provided insights into:

### Key Findings:
1. **Performance Rankings**: Identified the best-performing baselines across different metrics
2. **Statistical Significance**: Determined which improvements are statistically reliable
3. **Trade-off Analysis**: Understood the relationships between different performance objectives
4. **Scenario-Specific Performance**: Found optimal baselines for different workload and node types
5. **Principal Components**: Identified the main dimensions of variation between baselines

### Decision Framework:
- **Carbon-First**: Choose baselines optimized for carbon efficiency
- **Cost-First**: Choose baselines optimized for energy consumption
- **Performance-First**: Choose baselines optimized for application performance
- **Balanced**: Choose baselines with the best overall ranking

### Next Steps:
1. **Production Testing**: Validate findings in your specific environment
2. **Monitoring**: Implement continuous monitoring of baseline performance
3. **Optimization**: Fine-tune selected baselines for your use case
4. **Regular Review**: Periodically re-evaluate baseline performance

### Related Notebooks:
- **01_Getting_Started.ipynb**: Basic framework introduction
- **02_Ablation_Studies.ipynb**: Feature importance analysis
- **04_Statistical_Analysis.ipynb**: Advanced statistical methods

Use these insights to make informed decisions about which baseline scheduler to deploy in your Kubernetes environment! 🚀