# Carbon-Kube Evaluation Framework - Getting Started

This notebook introduces the Carbon-Kube evaluation framework, which supports the scientific evaluation of carbon-efficient Kubernetes scheduling algorithms.

## Overview

The evaluation framework provides:
- **Baseline Management**: Compare against established baselines
- **Statistical Analysis**: Rigorous statistical testing and analysis
- **Ablation Studies**: Understand feature importance
- **Reproducibility**: Ensure experiments can be reproduced
- **Artifact Management**: Store and version experimental artifacts

## Prerequisites

Make sure you have:
1. Run the setup script: `./evaluation/setup.sh`
2. Activated the Python environment: `source venv/bin/activate`
3. Set environment variables: `export $(cat evaluation/.env | xargs)`
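
The `export $(cat ... | xargs)` one-liner in step 3 can trip over comment lines or values that contain spaces. As an alternative, here is a minimal sketch of loading the same file from inside the notebook. It assumes the file holds simple `KEY=VALUE` lines; the helper name `load_env_file` is hypothetical and not part of the framework.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines into os.environ and return them as a dict.

    Assumptions: one assignment per line, optional surrounding quotes,
    and '#' starting a comment line. This is not a full dotenv parser.
    """
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ[key] = value
        loaded[key] = value
    return loaded
```

Call it as `load_env_file('evaluation/.env')`, adjusting the path to the notebook's working directory.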

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import yaml
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

print("✅ Libraries imported successfully")

## 1. Load and Explore Datasets

Let's start by loading the available datasets and exploring their structure.

In [None]:
# Load dataset configuration
data_path = Path('../data')
config_path = data_path / 'dataset_config.yaml'

with open(config_path, 'r') as f:
    dataset_config = yaml.safe_load(f)

print("📊 Available Dataset Categories:")
for category in dataset_config['datasets'].keys():
    datasets = dataset_config['datasets'][category]
    print(f"  {category}: {len(datasets)} datasets")
    for name in list(datasets.keys())[:3]:  # Show first 3
        print(f"    - {name}")
    if len(datasets) > 3:
        print(f"    ... and {len(datasets) - 3} more")

In [None]:
# Load the main synthetic dataset
main_dataset_path = data_path / 'synthetic' / 'carbon_efficiency_main.csv'
df_main = pd.read_csv(main_dataset_path)

print(f"📈 Main Dataset Shape: {df_main.shape}")
# Parse timestamps so min/max compare dates rather than raw strings
timestamps = pd.to_datetime(df_main['timestamp'])
print(f"📅 Date Range: {timestamps.min()} to {timestamps.max()}")
print("\n🔍 Dataset Info:")
df_main.info()

In [None]:
# Display sample data
print("📋 Sample Data (first 5 rows):")
display(df_main.head())

print("\n📊 Statistical Summary:")
display(df_main.describe())
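
The later cells read a fixed set of columns from the main dataset, so it can help to check for them up front instead of hitting a `KeyError` halfway through. The sketch below is one way to do that. The helper name `find_missing_columns` is hypothetical, and the column list is taken from the cells in this notebook.

```python
def find_missing_columns(available, required):
    """Return the required column names absent from `available`, in order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Columns that later cells in this notebook read from the main dataset
REQUIRED_COLUMNS = [
    'timestamp', 'scheduler', 'node_type', 'workload_type',
    'carbon_efficiency', 'energy_consumption', 'performance_score',
    'resource_utilization', 'cpu_utilization', 'memory_utilization',
    'response_time', 'throughput',
]
```

In the notebook, `find_missing_columns(df_main.columns, REQUIRED_COLUMNS)` returns an empty list when the dataset has everything the analysis needs.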

## 2. Data Exploration and Visualization

Let's explore the key metrics and their relationships.

In [None]:
# Distributions of schedulers, node types, workload types, and carbon efficiency
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Scheduler distribution
scheduler_counts = df_main['scheduler'].value_counts()
axes[0, 0].pie(scheduler_counts.values, labels=scheduler_counts.index, autopct='%1.1f%%')
axes[0, 0].set_title('Scheduler Distribution')

# Node type distribution
node_counts = df_main['node_type'].value_counts()
axes[0, 1].bar(node_counts.index, node_counts.values)
axes[0, 1].set_title('Node Type Distribution')
axes[0, 1].tick_params(axis='x', rotation=45)

# Workload type distribution
workload_counts = df_main['workload_type'].value_counts()
axes[1, 0].bar(workload_counts.index, workload_counts.values, color='orange')
axes[1, 0].set_title('Workload Type Distribution')
axes[1, 0].tick_params(axis='x', rotation=45)

# Carbon efficiency distribution
axes[1, 1].hist(df_main['carbon_efficiency'], bins=30, alpha=0.7, color='green')
axes[1, 1].set_title('Carbon Efficiency Distribution')
axes[1, 1].set_xlabel('Carbon Efficiency')
axes[1, 1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

In [None]:
# Key metrics comparison by scheduler
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

metrics = ['carbon_efficiency', 'energy_consumption', 'performance_score', 'resource_utilization']
titles = ['Carbon Efficiency', 'Energy Consumption (W)', 'Performance Score', 'Resource Utilization (%)']

for i, (metric, title) in enumerate(zip(metrics, titles)):
    row, col = i // 2, i % 2
    
    # Box plot by scheduler
    df_main.boxplot(column=metric, by='scheduler', ax=axes[row, col])
    axes[row, col].set_title(f'{title} by Scheduler')
    axes[row, col].set_xlabel('Scheduler')
    axes[row, col].set_ylabel(title)
    
plt.suptitle('')  # Remove automatic title
plt.tight_layout()
plt.show()

## 3. Basic Statistical Analysis

Let's perform some basic statistical analysis to understand scheduler performance.

In [None]:
# Calculate summary statistics by scheduler
scheduler_stats = df_main.groupby('scheduler')[['carbon_efficiency', 'energy_consumption', 'performance_score']].agg([
    'mean', 'std', 'min', 'max', 'count'
]).round(3)

print("📊 Scheduler Performance Summary:")
display(scheduler_stats)
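
The summary reports `mean`, `std`, and `count` for each scheduler, which is enough to attach an approximate confidence interval to each mean. The sketch below uses the normal approximation. The helper name `mean_confidence_interval` is hypothetical, and for small groups a t-distribution quantile would be a better choice than the fixed `z`.

```python
import math


def mean_confidence_interval(mean, std, count, z=1.96):
    """Approximate confidence interval for a mean (normal approximation).

    z=1.96 corresponds to roughly 95% coverage.
    """
    if count < 2:
        raise ValueError('need at least two samples')
    # Standard error of the mean, scaled by the chosen quantile
    half_width = z * std / math.sqrt(count)
    return mean - half_width, mean + half_width
```

Two schedulers whose intervals overlap heavily are unlikely to differ meaningfully on that metric, which is a useful sanity check before formal testing.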

In [None]:
# Correlation analysis
numeric_cols = ['carbon_efficiency', 'energy_consumption', 'performance_score', 
                'cpu_utilization', 'memory_utilization', 'response_time', 'throughput']

correlation_matrix = df_main[numeric_cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=0.5)
plt.title('Correlation Matrix of Key Metrics')
plt.tight_layout()
plt.show()

## 4. Baseline Comparison

Let's compare different schedulers against the default Kubernetes scheduler as a baseline.

In [None]:
# Load baseline datasets
baseline_datasets = {}
baseline_names = ['kubernetes_default', 'carbon_aware_v1', 'energy_efficient', 'performance_optimized']

for name in baseline_names:
    try:
        path = data_path / 'synthetic' / f'baseline_{name}.csv'
        baseline_datasets[name] = pd.read_csv(path)
        print(f"✅ Loaded {name}: {len(baseline_datasets[name])} samples")
    except FileNotFoundError:
        print(f"❌ Could not load {name}")

print(f"\n📊 Loaded {len(baseline_datasets)} baseline datasets")

In [None]:
# Compare baselines
if baseline_datasets:
    # Mean of each metric per baseline: one row per baseline, one column per metric.
    # Built in one step to avoid rebinding the name `stats`, which a later cell
    # uses for the scipy.stats module.
    baseline_comparison = pd.DataFrame({
        name: df[['carbon_efficiency', 'energy_consumption', 'performance_score']].mean()
        for name, df in baseline_datasets.items()
    }).T
    
    print("🏆 Baseline Comparison (Mean Values):")
    display(baseline_comparison.round(3))
    
    # Visualize baseline comparison
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    metrics = ['carbon_efficiency', 'energy_consumption', 'performance_score']
    titles = ['Carbon Efficiency', 'Energy Consumption', 'Performance Score']
    
    for i, (metric, title) in enumerate(zip(metrics, titles)):
        baseline_comparison[metric].plot(kind='bar', ax=axes[i], color='skyblue')
        axes[i].set_title(f'{title} Comparison')
        axes[i].set_ylabel(title)
        axes[i].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
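
Raw means are easier to interpret when expressed relative to the default Kubernetes scheduler. The sketch below computes the percent change of each metric against a chosen baseline. The helper name `percent_change_vs_baseline` is hypothetical, and the numbers are illustrative only, not results from the framework's datasets.

```python
def percent_change_vs_baseline(means, baseline_name):
    """Return {name: {metric: percent change relative to the baseline}}.

    `means` maps a scheduler name to a {metric: mean value} dict.
    """
    baseline = means[baseline_name]
    changes = {}
    for name, metrics in means.items():
        if name == baseline_name:
            continue
        changes[name] = {
            metric: 100.0 * (value - baseline[metric]) / baseline[metric]
            for metric, value in metrics.items()
        }
    return changes


# Illustrative numbers only, not results from the framework's datasets
example_means = {
    'kubernetes_default': {'carbon_efficiency': 0.50, 'energy_consumption': 200.0},
    'carbon_aware_v1': {'carbon_efficiency': 0.60, 'energy_consumption': 180.0},
}
print(percent_change_vs_baseline(example_means, 'kubernetes_default'))
```

With the notebook's data, `baseline_comparison.to_dict('index')` produces a nested dict of the same shape, provided the `kubernetes_default` baseline was loaded.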

## 5. Statistical Significance Testing

Let's perform statistical tests to determine if differences between schedulers are significant.

In [None]:
from scipy import stats

def perform_statistical_tests(df, metric, group_col='scheduler'):
    """Run a t-test and a Mann-Whitney U test, plus Cohen's d, for every pair of groups."""
    groups = df[group_col].unique()
    results = []
    
    for i, group1 in enumerate(groups):
        for group2 in groups[i+1:]:
            data1 = df[df[group_col] == group1][metric]
            data2 = df[df[group_col] == group2][metric]
            
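            # Independent two-sample t-test. SciPy's default assumes equal variances;
            # pass equal_var=False for Welch's t-test if group variances differ.
            t_stat, t_pval = stats.ttest_ind(data1, data2)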
            
            # Mann-Whitney U test (non-parametric)
            u_stat, u_pval = stats.mannwhitneyu(data1, data2, alternative='two-sided')
            
            # Effect size (Cohen's d)
            pooled_std = np.sqrt(((len(data1) - 1) * data1.var() + (len(data2) - 1) * data2.var()) / 
                                (len(data1) + len(data2) - 2))
            cohens_d = (data1.mean() - data2.mean()) / pooled_std
            
            results.append({
                'Group 1': group1,
                'Group 2': group2,
                'Mean 1': data1.mean(),
                'Mean 2': data2.mean(),
                'T-test p-value': t_pval,
                'Mann-Whitney p-value': u_pval,
                "Effect Size (Cohen's d)": cohens_d,
                # Based on the uncorrected t-test p-value only
                'Significant (p<0.05)': t_pval < 0.05
            })
    
    return pd.DataFrame(results)

# Test carbon efficiency differences
carbon_tests = perform_statistical_tests(df_main, 'carbon_efficiency')
print("🧪 Statistical Tests for Carbon Efficiency:")
display(carbon_tests.round(4))

In [None]:
# Test energy consumption differences
energy_tests = perform_statistical_tests(df_main, 'energy_consumption')
print("🧪 Statistical Tests for Energy Consumption:")
display(energy_tests.round(4))
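
Testing every pair of schedulers means running several tests on the same data, so some p-values can fall below 0.05 by chance alone. One common remedy is the Holm-Bonferroni adjustment, sketched below in plain Python. The helper name `holm_adjust` is hypothetical; this is an illustration of the technique, not part of the framework.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank k is scaled by (m - k), capped at 1
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted
```

In the notebook this could be applied as `carbon_tests['Holm p-value'] = holm_adjust(carbon_tests['T-test p-value'].tolist())`, with significance then judged on the adjusted column.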

## 6. Time Series Analysis

Let's analyze temporal patterns in the data.

In [None]:
# Load time series dataset
timeseries_path = data_path / 'synthetic' / 'timeseries_14days.csv'
df_ts = pd.read_csv(timeseries_path)
df_ts['timestamp'] = pd.to_datetime(df_ts['timestamp'])
df_ts = df_ts.sort_values('timestamp')

print(f"📈 Time Series Dataset: {len(df_ts)} samples")
print(f"📅 Date Range: {df_ts['timestamp'].min()} to {df_ts['timestamp'].max()}")

# Plot time series
fig, axes = plt.subplots(3, 1, figsize=(15, 12))

# Carbon efficiency over time
axes[0].plot(df_ts['timestamp'], df_ts['carbon_efficiency'], alpha=0.7, color='green')
axes[0].set_title('Carbon Efficiency Over Time')
axes[0].set_ylabel('Carbon Efficiency')
axes[0].grid(True, alpha=0.3)

# Energy consumption over time
axes[1].plot(df_ts['timestamp'], df_ts['energy_consumption'], alpha=0.7, color='red')
axes[1].set_title('Energy Consumption Over Time')
axes[1].set_ylabel('Energy Consumption (W)')
axes[1].grid(True, alpha=0.3)

# Load factor over time
axes[2].plot(df_ts['timestamp'], df_ts['load_factor'], alpha=0.7, color='blue')
axes[2].set_title('System Load Factor Over Time')
axes[2].set_ylabel('Load Factor')
axes[2].set_xlabel('Time')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
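
Raw time series like these are often noisy, and a moving average makes daily or weekly patterns easier to see. The sketch below shows the idea in plain Python. The helper name `trailing_moving_average` is hypothetical, and the right window size depends on the dataset's sampling interval, which is not stated here.

```python
from collections import deque


def trailing_moving_average(values, window):
    """Return the mean of the last `window` values at each position.

    Early positions average over however many values are available so far.
    """
    if window < 1:
        raise ValueError('window must be at least 1')
    recent = deque(maxlen=window)
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to be evicted
        recent.append(value)
        running_sum += value
        smoothed.append(running_sum / len(recent))
    return smoothed
```

With pandas, the equivalent is `df_ts['carbon_efficiency'].rolling(window=24, min_periods=1).mean()`, where the window of 24 is only a placeholder.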

## 7. Performance Recommendations

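Based on our analysis, let's generate some recommendations. The function below picks, for each metric, the scheduler with the best mean value. These are descriptive rankings and do not take the significance tests from Section 5 into account.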

In [None]:
def generate_recommendations(df):
    """Generate performance recommendations based on data analysis"""
    recommendations = []
    
    # Analyze scheduler performance
    scheduler_perf = df.groupby('scheduler')[['carbon_efficiency', 'energy_consumption', 'performance_score']].mean()
    
    # Best carbon efficiency
    best_carbon = scheduler_perf['carbon_efficiency'].idxmax()
    best_carbon_score = scheduler_perf.loc[best_carbon, 'carbon_efficiency']
    recommendations.append(f"🌱 For best carbon efficiency, use '{best_carbon}' scheduler (score: {best_carbon_score:.3f})")
    
    # Lowest energy consumption
    lowest_energy = scheduler_perf['energy_consumption'].idxmin()
    lowest_energy_score = scheduler_perf.loc[lowest_energy, 'energy_consumption']
    recommendations.append(f"⚡ For lowest energy consumption, use '{lowest_energy}' scheduler ({lowest_energy_score:.1f}W)")
    
    # Best performance
    best_perf = scheduler_perf['performance_score'].idxmax()
    best_perf_score = scheduler_perf.loc[best_perf, 'performance_score']
    recommendations.append(f"🚀 For best performance, use '{best_perf}' scheduler (score: {best_perf_score:.3f})")
    
    # Workload-specific recommendations
    workload_perf = df.groupby(['workload_type', 'scheduler'])['carbon_efficiency'].mean().unstack()
    for workload in workload_perf.index:
        best_scheduler = workload_perf.loc[workload].idxmax()
        best_score = workload_perf.loc[workload, best_scheduler]
        recommendations.append(f"📋 For '{workload}' workloads, use '{best_scheduler}' scheduler (carbon efficiency: {best_score:.3f})")
    
    return recommendations

recommendations = generate_recommendations(df_main)

print("💡 Performance Recommendations:")
print("=" * 50)
for i, rec in enumerate(recommendations, 1):
    print(f"{i}. {rec}")

## 8. Export Results

Let's save our analysis results for future reference.

In [None]:
# Create results summary
results_summary = {
    'analysis_date': datetime.now().isoformat(),
    'dataset_info': {
        'main_dataset_samples': len(df_main),
        'schedulers_tested': df_main['scheduler'].unique().tolist(),
        'node_types': df_main['node_type'].unique().tolist(),
        'workload_types': df_main['workload_type'].unique().tolist()
    },
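    # scheduler_stats has (metric, statistic) column tuples; JSON object keys
    # cannot be tuples, so flatten them into strings such as 'carbon_efficiency_mean'
    'scheduler_performance': {
        f'{metric}_{stat}': values
        for (metric, stat), values in scheduler_stats.to_dict().items()
    },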
    'statistical_tests': {
        'carbon_efficiency': carbon_tests.to_dict('records'),
        'energy_consumption': energy_tests.to_dict('records')
    },
    'recommendations': recommendations
}

# Save results
results_path = Path('../results')
results_path.mkdir(exist_ok=True)

with open(results_path / 'getting_started_analysis.json', 'w') as f:
    json.dump(results_summary, f, indent=2, default=str)

print("💾 Results saved to: evaluation/results/getting_started_analysis.json")
print("\n✅ Analysis complete! Check the results directory for detailed outputs.")

## Next Steps

Now that you've completed the getting started tutorial, you can:

1. **Explore Advanced Analysis**: Check out the other notebooks for more detailed analysis
2. **Run Ablation Studies**: Use `02_Ablation_Studies.ipynb` to understand feature importance
3. **Baseline Comparisons**: Use `03_Baseline_Comparison.ipynb` for detailed baseline analysis
4. **Custom Experiments**: Create your own experiments using the framework
5. **Integration**: Integrate the framework with your own Kubernetes cluster

## Resources

- **Documentation**: `evaluation/docs/`
- **Configuration**: `evaluation/configs/`
- **Example Data**: `evaluation/data/`
- **Results**: `evaluation/results/`

Happy evaluating! 🚀