# Runtime-Optimized VM Allocation for Benchmarks

This notebook implements bin packing heuristics that allocate benchmarks to VMs based on measured HiGHS v1.10 runtimes, aiming to minimize the variance of total runtime across VMs.

## Important: HiGHS Variant Multiplier

**We run 5 HiGHS solver variants for each benchmark:**
1. `highs` (v1.10.0 standard)
2. `highs-hipo-ipm` (v1.11.0 with IPM solver)
3. `highs-hipo-128` (v1.11.0 with HiPO solver + 128 block size)
4. `highs-hipo-32` (v1.11.0 with HiPO + 32 block size)
5. `highs-hipo-64` (v1.11.0 with HiPO + 64 block size)

Therefore, **estimated VM runtime ≈ base HiGHS runtime × 5**

This multiplier is applied to all runtime calculations and VM allocations. It assumes each variant takes about as long as the standard v1.10.0 run, so the result is an estimate of the wall-clock time each VM needs to complete all benchmark variants, not a measurement.
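
As a rough illustration of the multiplier, the sketch below scales a single-variant runtime by the number of variants. The helper name `estimate_vm_runtime` is hypothetical and not part of the notebook; it assumes the variants run one after another and each takes about as long as the standard `highs` run.

```python
def estimate_vm_runtime(base_runtime_s: float, num_variants: int = 5) -> float:
    """Estimate wall-clock seconds for running all variants sequentially.

    Hypothetical helper: assumes every variant takes as long as the base run.
    """
    return base_runtime_s * num_variants


# A benchmark that hit a 3600 s timeout with standard HiGHS
estimated_s = estimate_vm_runtime(3600.0)
print(estimated_s / 3600)  # 5.0 (hours)
```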

In [175]:
import heapq
from pathlib import Path

import numpy as np
import pandas as pd
import yaml

# CONFIGURATION
MAX_RUNTIME_PER_VM_SECONDS = (
    4 * 24 * 3600
)  # 4 days = 345,600 seconds (max wall-clock time per VM)
# MAX_RUNTIME_PER_VM_SECONDS = None  # Set to None for no limit

hipo_variants = ["highs-hipo-ipm", "highs-hipo-128", "highs-hipo-32", "highs-hipo-64"]
NUM_HIGHS_VARIANTS = len(hipo_variants) + 1  # +1 for standard 'highs'

print("Configuration:")
print(
    f"  Max runtime per VM: {MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f} hours ({MAX_RUNTIME_PER_VM_SECONDS / (24 * 3600):.1f} days)"
    if MAX_RUNTIME_PER_VM_SECONDS
    else "  No runtime cap"
)
print(f"  HiGHS variants to run: {NUM_HIGHS_VARIANTS}")

Configuration:
  Max runtime per VM: 96.0 hours (4.0 days)
  HiGHS variants to run: 5


## Load and Process Runtime Data

In [176]:
# Load HiGHS runtime data (including HiGHS-hipo if available)
highs_data = pd.read_csv(
    "../main_results.csv",
    header=None,
    names=[
        "Benchmark",
        "Size",
        "Solver",
        "Solver Version",
        "Solver Release Year",
        "Status",
        "Termination Condition",
        "Runtime (s)",
        "Memory Usage (MB)",
        "Objective Value",
        "Max Integrality Violation",
        "Duality Gap",
        "Reported Runtime (s)",
        "Timeout",
        "Hostname",
        "Run ID",
        "Timestamp",
    ],
)

# Keep only standard HiGHS v1.10.0 runs. No status filter is applied here,
# so timed-out runs (Status == "TO") are kept as well.
highs_v110 = highs_data[
    (highs_data["Solver Version"] == "1.10.0") & (highs_data["Solver"] == "highs")
]

print(f"Found {len(highs_v110)} successful HiGHS v1.10/hipo benchmark runs")
print(f"Solvers included: {highs_v110['Solver'].unique()}")
highs_v110.head()

Found 120 successful HiGHS v1.10/hipo benchmark runs
Solvers included: ['highs']


Unnamed: 0,Benchmark,Size,Solver,Solver Version,Solver Release Year,Status,Termination Condition,Runtime (s),Memory Usage (MB),Objective Value,Max Integrality Violation,Duality Gap,Reported Runtime (s),Timeout,Hostname,Run ID,Timestamp
45,genx-4_three_zones_w_policies_slack,3-1h,highs,1.10.0,2025,TO,Timeout,3600.0,3812.844,,,,3600.0,3600.0,benchmark-instance-z-m37,20250429_090606_benchmark-instance-z-m37,2025-04-29 22:40:55.143181
47,genx-6_three_zones_w_multistage-no_uc,3-1h,highs,1.10.0,2025,ok,optimal,170.6828514480003,524.544,16995.001810747806,,,169.56626176834106,3600.0,benchmark-instance-z-m37,20250429_090606_benchmark-instance-z-m37,2025-04-30 00:48:26.168300
49,pypsa-eur-elec-trex,6-12h,highs,1.10.0,2025,ok,optimal,226.40647792699747,692.944,7243765719.563413,,,224.6772541999817,3600.0,benchmark-instance-z-m37,20250429_090606_benchmark-instance-z-m37,2025-04-30 01:05:00.905452
51,times-etimeseu-france-elec+heat-co2-multi_stage,1-64ts,highs,1.10.0,2025,ok,optimal,22.21635568299825,365.092,427842.1201974797,,,21.157464027404785,3600.0,benchmark-instance-z-m37,20250429_090606_benchmark-instance-z-m37,2025-04-30 01:13:25.552901
86,DCOPF-Carolinas_uc_2M,1-997,highs,1.10.0,2025,ok,optimal,2080.905538397994,2878.12,4463695.700557045,3.502851030676985e-11,9.478984509261144e-05,2079.601967334748,3600.0,benchmark-instance-z2-m41,20250503_040156_benchmark-instance-z2-m41,2025-05-03 21:11:07.318952


In [177]:
# Create benchmark runtime mapping
# IMPORTANT: We multiply runtime by NUM_HIGHS_VARIANTS since each benchmark
# is run with 5 different HiGHS variants (highs, highs-hipo-128, highs-hipo-ipm, highs-hipo-32, highs-hipo-64)
benchmark_runtimes = {}
for _, row in highs_v110.iterrows():
    benchmark_key = f"{row['Benchmark']}-{row['Size']}"
    try:
        base_runtime = float(row["Runtime (s)"])
        # Fall back to the timeout value when no runtime was recorded
        if pd.isna(base_runtime):
            base_runtime = float(row["Timeout"])
        # Multiply by number of variants since we run all variants for each benchmark
        actual_vm_runtime = base_runtime * NUM_HIGHS_VARIANTS
        benchmark_runtimes[benchmark_key] = actual_vm_runtime
    except Exception as e:
        print(f"Error processing row: {row}")
        print(f"Error: {e}")

print(f"Runtime data available for {len(benchmark_runtimes)} benchmarks")
print(
    f"Base total runtime (single variant): {sum(benchmark_runtimes.values()) / NUM_HIGHS_VARIANTS} seconds ({sum(benchmark_runtimes.values()) / NUM_HIGHS_VARIANTS / 3600:.1f} hours)"
)
print(
    f"Actual total runtime ({NUM_HIGHS_VARIANTS} variants): {sum(benchmark_runtimes.values())} seconds ({sum(benchmark_runtimes.values()) / 3600:.1f} hours)"
)
print(
    f"Multiplier applied: {NUM_HIGHS_VARIANTS}x (running {NUM_HIGHS_VARIANTS} HiGHS variants per benchmark)"
)

Runtime data available for 120 benchmarks
Base total runtime (single variant): 525601.61061178 seconds (146.0 hours)
Actual total runtime (5 variants): 2628008.0530589 seconds (730.0 hours)
Multiplier applied: 5x (running 5 HiGHS variants per benchmark)


## Load Benchmark Metadata

In [178]:
# Load benchmark metadata to get size categories and URLs
with open("../results/metadata.yaml") as f:
    meta = yaml.safe_load(f)

# Create a lookup for metadata
metadata_lookup = {}
for name, benchmark in meta["benchmarks"].items():
    for size_info in benchmark["Sizes"]:
        instance_key = f"{name}-{size_info['Name']}"
        metadata_lookup[instance_key] = size_info

benchmarks_by_size = {"S": [], "M": [], "L": []}
all_benchmark_instances = []

for _, row in highs_v110.iterrows():
    instance_key = f"{row['Benchmark']}-{row['Size']}"

    # Get metadata for this instance
    size_info = metadata_lookup.get(instance_key)
    if size_info is None:
        print(f"Warning: No metadata found for {instance_key}")
        continue

    # Apply variant multiplier to runtime
    if pd.isna(row["Runtime (s)"]):
        base_runtime = float(row["Timeout"])
    else:
        base_runtime = float(row["Runtime (s)"])

    actual_runtime: float = base_runtime * NUM_HIGHS_VARIANTS

    instance = {
        "name": row["Benchmark"],
        "size_name": row["Size"],
        "size_category": size_info["Size"],
        "instance_key": instance_key,
        "runtime": actual_runtime,  # Runtime multiplied by number of variants
        "base_runtime": base_runtime,  # Store original single-variant runtime for reference
        "num_variables": size_info.get("Num. variables", 0),
        "num_constraints": size_info.get("Num. constraints", 0),
        "url": size_info["URL"],
    }

    benchmarks_by_size[size_info["Size"]].append(instance)
    all_benchmark_instances.append(instance)

print(
    f"Total benchmark instances (from filtered dataset): {len(all_benchmark_instances)}"
)
for size, instances in benchmarks_by_size.items():
    print(f"  {size}: {len(instances)}")
print("All instances have runtime data from highs_v110")
print(
    f"Runtime multiplier: {NUM_HIGHS_VARIANTS}x (to account for running all HiGHS variants)"
)

Total benchmark instances (from filtered dataset): 120
  S: 18
  M: 87
  L: 15
All instances have runtime data from highs_v110
Runtime multiplier: 5x (to account for running all HiGHS variants)


## Bin Packing Algorithms

In [179]:
class VMAllocation:
    def __init__(self, vm_id: int):
        self.vm_id = vm_id
        self.benchmarks = []
        self.total_runtime = 0.0

    def add_benchmark(self, benchmark: dict):
        """Add benchmark with real runtime data only"""
        if benchmark["runtime"] is None:
            raise ValueError(
                f"Benchmark {benchmark['instance_key']} has no runtime data!"
            )

        self.benchmarks.append(benchmark)
        self.total_runtime += benchmark["runtime"]

    def get_total_runtime(self):
        return self.total_runtime

    def __lt__(self, other):
        # For heap operations - compare by total runtime
        return self.total_runtime < other.total_runtime

In [180]:
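def first_fit_decreasing(benchmarks: list[dict], num_vms: int) -> list[VMAllocation]:
    """
    Greedy longest-first assignment to the least-loaded VM.

    Despite the name, this is not classic First Fit Decreasing (there is no
    bin capacity). It applies the same greedy rule as
    longest_processing_time_first, with a linear scan instead of a heap.
    Uses ONLY benchmarks with real runtime data.
    """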
    # Filter to only benchmarks with real runtime data
    runtime_benchmarks = [b for b in benchmarks if b["runtime"] is not None]
    print(
        f"Using {len(runtime_benchmarks)} benchmarks with real runtime data (filtered from {len(benchmarks)} total)"
    )

    # Create VMs
    vms = [VMAllocation(i) for i in range(num_vms)]

    # Sort benchmarks by runtime (descending)
    sorted_benchmarks = sorted(
        runtime_benchmarks, key=lambda x: x["runtime"], reverse=True
    )

    # Assign benchmarks to VMs
    for benchmark in sorted_benchmarks:
        # Find VM with minimum current runtime
        min_vm = min(vms, key=lambda vm: vm.total_runtime)
        min_vm.add_benchmark(benchmark)

    return vms

In [181]:
def longest_processing_time_first(
    benchmarks: list[dict], num_vms: int
) -> list[VMAllocation]:
    """
    Longest Processing Time First algorithm using a min-heap.
    Uses ONLY benchmarks with real runtime data.
    """
    # Filter to only benchmarks with real runtime data
    runtime_benchmarks = [b for b in benchmarks if b["runtime"] is not None]
    print(
        f"Using {len(runtime_benchmarks)} benchmarks with real runtime data (filtered from {len(benchmarks)} total)"
    )

    # Create VMs and initialize heap
    vms = [VMAllocation(i) for i in range(num_vms)]
    vm_heap = list(vms)  # Min-heap based on total runtime
    heapq.heapify(vm_heap)

    # Sort benchmarks by runtime (descending)
    sorted_benchmarks = sorted(
        runtime_benchmarks, key=lambda x: x["runtime"], reverse=True
    )

    # Assign benchmarks
    for benchmark in sorted_benchmarks:
        # Get VM with minimum load
        min_vm = heapq.heappop(vm_heap)
        min_vm.add_benchmark(benchmark)
        # Re-insert VM into heap
        heapq.heappush(vm_heap, min_vm)

    return vms
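
The two functions above apply the same greedy rule: take benchmarks longest first and give each one to the VM with the smallest load so far. The sketch below shows that rule on toy numbers using plain tuples in a heap. The name `lpt_assign` and the toy runtimes are illustrative assumptions, not part of the notebook.

```python
import heapq


def lpt_assign(runtimes: list[float], num_vms: int) -> list[list[float]]:
    """Greedy longest-first assignment to the least-loaded VM (toy sketch)."""
    # Heap entries are (current_load, vm_index); ties break on the index.
    heap = [(0.0, i) for i in range(num_vms)]
    heapq.heapify(heap)
    assignment: list[list[float]] = [[] for _ in range(num_vms)]

    for runtime in sorted(runtimes, reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded VM
        assignment[idx].append(runtime)
        heapq.heappush(heap, (load + runtime, idx))
    return assignment


toy_runtimes = [7.0, 5.0, 4.0, 3.0, 1.0]
loads = [sum(vm) for vm in lpt_assign(toy_runtimes, 2)]
print(loads)  # [10.0, 10.0]
```

Storing `(load, index)` tuples avoids relying on a custom `__lt__` and makes tie-breaking explicit; the notebook's `VMAllocation.__lt__` serves the same purpose for objects.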

In [182]:
def balanced_partition(
    benchmarks: list[dict], num_vms: int, max_runtime_per_vm: float = None
) -> list[VMAllocation]:
    """
    Balanced partition algorithm that tries to achieve equal total runtime per VM.
    Uses ONLY benchmarks with real runtime data.

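    If max_runtime_per_vm is set:
    - Automatically creates additional VMs if needed to respect the cap
    - No VM will exceed max_runtime_per_vm, with one exception: a benchmark whose
      own runtime exceeds the cap is still placed alone on a new VM (an error is printed)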

    Args:
        benchmarks: List of benchmark dictionaries with runtime data
        num_vms: Initial number of VMs to create
        max_runtime_per_vm: Maximum runtime allowed per VM (in seconds). If None, no limit.
    """
    # Filter to only benchmarks with real runtime data
    runtime_benchmarks = [
        b for b in benchmarks if b["runtime"] is not None and not pd.isna(b["runtime"])
    ]
    print(
        f"Using {len(runtime_benchmarks)} benchmarks with real runtime data (filtered from {len(benchmarks)} total)"
    )

    if len(runtime_benchmarks) == 0:
        return []

    # Calculate total runtime and target per VM
    total_runtime = sum(b["runtime"] for b in runtime_benchmarks)

    # If max_runtime_per_vm is set, ensure we have enough VMs
    if max_runtime_per_vm is not None:
        min_vms_needed = int(np.ceil(total_runtime / max_runtime_per_vm))
        if min_vms_needed > num_vms:
            print(
                f"⚠️  WARNING: Initial {num_vms} VMs cannot fit all benchmarks within {max_runtime_per_vm / 3600:.1f}h limit"
            )
            print(
                f"   Automatically increasing to {min_vms_needed} VMs to respect runtime cap"
            )
            num_vms = min_vms_needed

    target_runtime_per_vm = total_runtime / num_vms

    print(
        f"Target runtime per VM: {target_runtime_per_vm:.1f} seconds ({target_runtime_per_vm / 3600:.1f} hours)"
    )
    if max_runtime_per_vm is not None:
        print(
            f"Max runtime per VM (hard cap): {max_runtime_per_vm:.1f} seconds ({max_runtime_per_vm / 3600:.1f} hours)"
        )

    # Create initial VMs
    vms = [VMAllocation(i) for i in range(num_vms)]

    # Sort benchmarks by runtime (descending) - largest first for better bin packing
    sorted_benchmarks = sorted(
        runtime_benchmarks, key=lambda x: x["runtime"], reverse=True
    )

    # Assign benchmarks with balance consideration
    for benchmark in sorted_benchmarks:
        benchmark_runtime = benchmark["runtime"]

        # Find VM that would be closest to target after adding this benchmark
        best_vm = None
        best_score = float("inf")

        for vm in vms:
            current_runtime = vm.total_runtime
            after_runtime = current_runtime + benchmark_runtime

            # HARD CAP: Skip if this would exceed max runtime
            if max_runtime_per_vm is not None and after_runtime > max_runtime_per_vm:
                continue  # This VM cannot take this benchmark

            # Score based on deviation from target
            score = abs(after_runtime - target_runtime_per_vm)

            # Prefer VMs that are under-loaded
            if current_runtime < target_runtime_per_vm:
                score *= 0.8  # Bonus for under-loaded VMs

            if score < best_score:
                best_score = score
                best_vm = vm

        # If no VM can take this benchmark, create a new one
        if best_vm is None:
            if (
                max_runtime_per_vm is not None
                and benchmark_runtime > max_runtime_per_vm
            ):
                print(
                    f"❌ ERROR: Benchmark {benchmark['instance_key']} runtime ({benchmark_runtime / 3600:.1f}h) exceeds max VM runtime ({max_runtime_per_vm / 3600:.1f}h)!"
                )
                print(
                    "   This benchmark CANNOT fit in any VM. Consider increasing MAX_RUNTIME_PER_VM_SECONDS."
                )
                # Still add it to a new VM, but warn the user

            print(
                f"⚠️  Creating additional VM #{len(vms)} for benchmark {benchmark['instance_key']} ({benchmark_runtime / 3600:.1f}h)"
            )
            best_vm = VMAllocation(len(vms))
            vms.append(best_vm)

        best_vm.add_benchmark(benchmark)

    # Report on VMs created
    if len(vms) > num_vms:
        print(
            f"✓ Created {len(vms) - num_vms} additional VMs to respect runtime cap (total: {len(vms)} VMs)"
        )

    return vms
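
The per-benchmark choice inside `balanced_partition` can be isolated as a small scoring function: skip VMs that would exceed the cap, score the rest by distance from the target, and discount VMs that are still below the target. The sketch below mirrors that rule on plain numbers; `pick_vm` and the toy loads are hypothetical names and values chosen for illustration.

```python
from typing import Optional


def pick_vm(
    loads: list[float], runtime: float, target: float, cap: Optional[float] = None
) -> Optional[int]:
    """Return the index of the best VM for this benchmark, or None if none fits."""
    best_idx, best_score = None, float("inf")
    for idx, load in enumerate(loads):
        after = load + runtime
        if cap is not None and after > cap:
            continue  # hard cap: this VM cannot take the benchmark
        score = abs(after - target)  # deviation from the target load
        if load < target:
            score *= 0.8  # bonus for VMs that are still under-loaded
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx


# Loads in hours; target 70 h, cap 96 h, new benchmark 30 h
print(pick_vm([60.0, 30.0, 90.0], runtime=30.0, target=70.0, cap=96.0))  # 1
# No VM can take it without exceeding the cap, so a new VM would be needed
print(pick_vm([90.0], runtime=30.0, target=70.0, cap=96.0))  # None
```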

## Algorithm Comparison

In [183]:
def analyze_allocation(vms: list[VMAllocation], algorithm_name: str):
    """
    Analyze and print statistics for a VM allocation.
    """
    runtimes = [vm.total_runtime for vm in vms]

    # Filter out empty VMs (should not happen with real runtime data only)
    active_vms = [vm for vm in vms if vm.total_runtime > 0]
    active_runtimes = [vm.total_runtime for vm in active_vms]

    print(f"\n=== {algorithm_name} ===")
    print(f"Total VMs created: {len(vms)}")
    print(f"Active VMs (with benchmarks): {len(active_vms)}")
    print(f"Empty VMs: {len(vms) - len(active_vms)}")

    if len(active_vms) > 0:
        print(
            f"Total runtime: {sum(active_runtimes):.1f} seconds ({sum(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Average runtime per active VM: {np.mean(active_runtimes):.1f} seconds ({np.mean(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Runtime standard deviation: {np.std(active_runtimes):.1f} seconds ({np.std(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Min runtime: {min(active_runtimes):.1f} seconds ({min(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Max runtime: {max(active_runtimes):.1f} seconds ({max(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Runtime ratio (max/min): {max(active_runtimes) / min(active_runtimes):.2f}"
        )

        # Efficiency (how balanced the allocation is)
        efficiency = 1 - (np.std(active_runtimes) / np.mean(active_runtimes))
        print(f"Load balance efficiency: {efficiency:.3f} (1.0 = perfect balance)")
    else:
        print("No active VMs found!")
        efficiency = 0

    return {
        "algorithm": algorithm_name,
        "total_runtime": sum(active_runtimes) if active_vms else 0,
        "std_runtime": np.std(active_runtimes) if active_vms else 0,
        "max_runtime": max(active_runtimes) if active_vms else 0,
        "min_runtime": min(active_runtimes) if active_vms else 0,
        "efficiency": efficiency,
        "runtimes": runtimes,
        "active_vms": len(active_vms),
        "num_vms": len(vms),
    }
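
The "load balance efficiency" reported above is one minus the coefficient of variation of the per-VM runtimes, using the population standard deviation (the default for `np.std`). A standard-library sketch of the same quantity follows; the function name `load_balance_efficiency` is an assumption for illustration. Note that the value drops below zero whenever the standard deviation exceeds the mean.

```python
from statistics import mean, pstdev


def load_balance_efficiency(runtimes: list[float]) -> float:
    """1 - (population std / mean); 1.0 means all VMs have equal load."""
    return 1 - pstdev(runtimes) / mean(runtimes)


print(round(load_balance_efficiency([10.0, 10.0, 10.0]), 3))  # 1.0
print(round(load_balance_efficiency([5.0, 15.0]), 3))  # 0.5

# Cross-check against the std and mean printed for the final L-size allocation
print(f"{1 - 62713.1 / 228564.8:.3f}")  # 0.726
```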

In [184]:
# Use ONLY benchmarks that have real runtime data - no estimation!
benchmarks_with_runtime = [
    b for b in all_benchmark_instances if b["runtime"] is not None
]
print(f"Using {len(benchmarks_with_runtime)} benchmarks with real HiGHS runtime data")
print(
    f"Excluded {len(all_benchmark_instances) - len(benchmarks_with_runtime)} benchmarks without runtime data"
)
print(
    f"Base total runtime (single HiGHS variant): {sum(b['base_runtime'] for b in benchmarks_with_runtime) / 3600:.1f} hours"
)
print(
    f"Actual total runtime ({NUM_HIGHS_VARIANTS} HiGHS variants): {sum(b['runtime'] for b in benchmarks_with_runtime) / 3600:.1f} hours"
)
print(f"Runtime multiplier: {NUM_HIGHS_VARIANTS}x")

if MAX_RUNTIME_PER_VM_SECONDS is not None:
    print(
        f"\n⚙️  Runtime cap enabled: {MAX_RUNTIME_PER_VM_SECONDS} seconds ({MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f} hours) per VM"
    )
else:
    print("\n⚙️  No runtime cap configured (unlimited)")

# Separate L-size benchmarks for highmem machines
l_size_benchmarks = [b for b in benchmarks_with_runtime if b["size_category"] == "L"]
non_l_benchmarks = [b for b in benchmarks_with_runtime if b["size_category"] != "L"]

print("\nBenchmark separation by size category:")
print(
    f"  L-size (highmem): {len(l_size_benchmarks)} benchmarks, {sum(b['runtime'] for b in l_size_benchmarks) / 3600:.1f} hours (with {NUM_HIGHS_VARIANTS} variants)"
)
print(
    f"  S/M-size (standard): {len(non_l_benchmarks)} benchmarks, {sum(b['runtime'] for b in non_l_benchmarks) / 3600:.1f} hours (with {NUM_HIGHS_VARIANTS} variants)"
)

# Test different numbers of VMs for each category
results = []

# L-size benchmarks (fewer VMs since they need highmem)
print(f"\n{'=' * 50}")
print("TESTING L-SIZE BENCHMARKS (HIGHMEM MACHINES)")
print(f"{'=' * 50}")

l_vm_options = [2, 3, 4, 5] if len(l_size_benchmarks) > 0 else [1]
for num_vms in l_vm_options:
    if len(l_size_benchmarks) == 0:
        print("No L-size benchmarks with runtime data")
        break

    print(f"\nTesting {num_vms} highmem VMs for L-size benchmarks:")

    bp_vms = balanced_partition(l_size_benchmarks, num_vms, MAX_RUNTIME_PER_VM_SECONDS)
    bp_result = analyze_allocation(bp_vms, f"L-size Balanced Partition ({num_vms} VMs)")
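    # NOTE: this records the REQUESTED VM count. analyze_allocation already stored
    # the actual count (len(bp_vms)), which is larger when the cap forces extra VMs.
    bp_result["num_vms"] = num_vms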
    bp_result["size_category"] = "L"
    results.append(bp_result)

# S/M-size benchmarks (more VMs with standard machines)
print(f"\n{'=' * 50}")
print("TESTING S/M-SIZE BENCHMARKS (STANDARD MACHINES)")
print(f"{'=' * 50}")

sm_vm_options = [8, 10, 12, 15]
for num_vms in sm_vm_options:
    if len(non_l_benchmarks) == 0:
        print("No S/M-size benchmarks with runtime data")
        break

    print(f"\nTesting {num_vms} standard VMs for S/M-size benchmarks:")

    bp_vms = balanced_partition(non_l_benchmarks, num_vms, MAX_RUNTIME_PER_VM_SECONDS)
    bp_result = analyze_allocation(
        bp_vms, f"S/M-size Balanced Partition ({num_vms} VMs)"
    )
    # NOTE: this records the REQUESTED VM count, not len(bp_vms)
    bp_result["num_vms"] = num_vms
    bp_result["size_category"] = "S/M"
    results.append(bp_result)

Using 120 benchmarks with real HiGHS runtime data
Excluded 0 benchmarks without runtime data
Base total runtime (single HiGHS variant): 146.0 hours
Actual total runtime (5 HiGHS variants): 730.0 hours
Runtime multiplier: 5x

⚙️  Runtime cap enabled: 345600 seconds (96.0 hours) per VM

Benchmark separation by size category:
  L-size (highmem): 15 benchmarks, 507.9 hours (with 5 variants)
  S/M-size (standard): 105 benchmarks, 222.1 hours (with 5 variants)

TESTING L-SIZE BENCHMARKS (HIGHMEM MACHINES)

Testing 2 highmem VMs for L-size benchmarks:
Using 15 benchmarks with real runtime data (filtered from 15 total)
   Automatically increasing to 6 VMs to respect runtime cap
Target runtime per VM: 304753.1 seconds (84.7 hours)
Max runtime per VM (hard cap): 345600.0 seconds (96.0 hours)
⚠️  Creating additional VM #6 for benchmark times-etimeseu-europe-elec+heat-co2-multi_stage-29-64ts (50.0h)
⚠️  Creating additional VM #7 for benchmark genx-elec_trex-15-168h (50.0h)
✓ Created 2 additional V
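
The jump from 2 requested VMs to 6 in the output above follows from the lower bound `ceil(total_runtime / cap)`. The sketch below reproduces that arithmetic with the L-size total and the 4-day cap reported by the notebook; the variable names are illustrative.

```python
import math

total_runtime_s = 1_828_518.8  # L-size total from the notebook output (507.9 h)
cap_s = 4 * 24 * 3600  # 345,600 s runtime cap per VM

min_vms = math.ceil(total_runtime_s / cap_s)
target_s = total_runtime_s / min_vms

print(min_vms)  # 6
print(round(target_s, 1))  # 304753.1
```

This bound treats runtime as freely divisible. Benchmarks are not, which is consistent with the log: two 50.0 h benchmarks did not fit into any of the six VMs under the 96 h cap, so two more VMs were created.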

## Results Summary

In [185]:
# Print summary comparison table
print("\n" + "=" * 80)
print("ALGORITHM COMPARISON SUMMARY")
print("=" * 80)

df_results = pd.DataFrame(results)

# Separate results by size category
l_results = (
    df_results[df_results["size_category"] == "L"]
    if "size_category" in df_results.columns
    else pd.DataFrame()
)
sm_results = (
    df_results[df_results["size_category"] == "S/M"]
    if "size_category" in df_results.columns
    else df_results
)

print(
    f"\n{'Size':<6} {'VM Count':<9} {'Algorithm':<28} {'Efficiency':<12} {'Max Runtime (h)':<16} {'Std Dev (h)':<12}"
)
print("-" * 88)

for _, row in df_results.iterrows():
    size_cat = row.get("size_category", "Mixed")
    alg_name = row["algorithm"].split("(")[0].strip()
    # Pad every value to its column width so rows line up with the header
    print(
        f"{size_cat:<6} {row['num_vms']:<9} {alg_name:<28} "
        f"{row['efficiency']:<12.3f} {row['max_runtime'] / 3600:<16.1f} "
        f"{row['std_runtime'] / 3600:<12.1f}"
    )

# Find best configurations for each size category
print(f"\n{'=' * 80}")
print("BEST CONFIGURATIONS:")
print(f"{'=' * 80}")

if len(l_results) > 0:
    best_l = l_results.loc[l_results["efficiency"].idxmax()]
    print(
        f"Best L-size (highmem): {best_l['num_vms']} VMs (efficiency: {best_l['efficiency']:.3f})"
    )

if len(sm_results) > 0:
    best_sm = sm_results.loc[sm_results["efficiency"].idxmax()]
    print(
        f"Best S/M-size (standard): {best_sm['num_vms']} VMs (efficiency: {best_sm['efficiency']:.3f})"
    )

# Calculate total deployment
if len(l_results) > 0 and len(sm_results) > 0:
    total_vms = best_l["num_vms"] + best_sm["num_vms"]
    total_efficiency = (best_l["efficiency"] + best_sm["efficiency"]) / 2
    print(
        f"\nTotal deployment: {total_vms} VMs ({best_l['num_vms']} highmem + {best_sm['num_vms']} standard)"
    )
    print(f"Average efficiency: {total_efficiency:.3f}")
elif len(sm_results) > 0:
    print(f"\nTotal deployment: {best_sm['num_vms']} standard VMs only")
    print(f"Efficiency: {best_sm['efficiency']:.3f}")

ALGORITHM COMPARISON SUMMARY
\nSize   VM Count  Algorithm                 Efficiency   Max Runtime (h) Std Dev (h) 
-------------------------------------------------------------------------------------
L      2         L-size Balanced Partition 0.726         86.9             17.4
L      3         L-size Balanced Partition 0.726         86.9             17.4
L      4         L-size Balanced Partition 0.726         86.9             17.4
L      5         L-size Balanced Partition 0.726         86.9             17.4
S/M    8         S/M-size Balanced Partition 0.982         45.0             0.8
S/M    10        S/M-size Balanced Partition 1.000         37.0             0.0
S/M    12        S/M-size Balanced Partition 1.000         31.7             0.0
S/M    15        S/M-size Balanced Partition 0.987         23.1             0.3
BEST CONFIGURATIONS:
Best L-size (highmem): 2 VMs (efficiency: 0.726)
Best S/M-size (standard): 12 VMs (efficiency: 1.000)
\nTotal deployment: 14 VMs (2 highmem +
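
In the table above, all four L-size rows are identical because each requested count (2 to 5) was raised to the same cap-driven VM count before allocation. When several rows share the maximum efficiency, `idxmax` returns the first one, which is why "2 VMs" is reported as best. Python's built-in `max` has the same first-wins behaviour, so it is used below as a stand-in; the `rows` list is a toy copy of the L-size table with an assumed key name `requested_vms`.

```python
# Toy copy of the L-size rows: requested VM count and reported efficiency
rows = [
    {"requested_vms": 2, "efficiency": 0.726},
    {"requested_vms": 3, "efficiency": 0.726},
    {"requested_vms": 4, "efficiency": 0.726},
    {"requested_vms": 5, "efficiency": 0.726},
]

# max() returns the first element that attains the maximum
best = max(rows, key=lambda r: r["efficiency"])
print(best["requested_vms"])  # 2
```

The reported "best" count is therefore the requested value of the first tied row, not the number of VMs actually created.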

## Generate Optimal Allocation

In [186]:
# Generate optimal allocations for both size categories
print("\n\nGenerating optimal allocations with size-based machine separation...")

if MAX_RUNTIME_PER_VM_SECONDS is not None:
    print(
        f"Runtime cap: {MAX_RUNTIME_PER_VM_SECONDS}s ({MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f}h) per VM"
    )

optimal_l_vms = []
optimal_sm_vms = []
best_l_result = None
best_sm_result = None

# Generate L-size allocation (highmem machines)
if len(l_results) > 0:
    best_l_result = l_results.loc[l_results["efficiency"].idxmax()]
    optimal_l_num_vms = best_l_result["num_vms"]

    print(f"\nL-size benchmarks: {optimal_l_num_vms} highmem VMs")
    print(f"Efficiency: {best_l_result['efficiency']:.3f}")
    print(f"Max VM runtime: {best_l_result['max_runtime'] / 3600:.1f} hours")

    optimal_l_vms = balanced_partition(
        l_size_benchmarks, optimal_l_num_vms, MAX_RUNTIME_PER_VM_SECONDS
    )
    l_final_result = analyze_allocation(
        optimal_l_vms, "Final L-size Allocation - Highmem"
    )

# Generate S/M-size allocation (standard machines)
if len(sm_results) > 0:
    best_sm_result = sm_results.loc[sm_results["efficiency"].idxmax()]
    optimal_sm_num_vms = best_sm_result["num_vms"]

    print(f"\nS/M-size benchmarks: {optimal_sm_num_vms} standard VMs")
    print(f"Efficiency: {best_sm_result['efficiency']:.3f}")
    print(f"Max VM runtime: {best_sm_result['max_runtime'] / 3600:.1f} hours")

    optimal_sm_vms = balanced_partition(
        non_l_benchmarks, optimal_sm_num_vms, MAX_RUNTIME_PER_VM_SECONDS
    )
    sm_final_result = analyze_allocation(
        optimal_sm_vms, "Final S/M-size Allocation - Standard"
    )

# Combined summary
total_vms = len(optimal_l_vms) + len(optimal_sm_vms)
total_runtime = sum(vm.total_runtime for vm in optimal_l_vms + optimal_sm_vms)

print(f"\n{'=' * 60}")
print("FINAL ALLOCATION SUMMARY")
print(f"{'=' * 60}")
print(f"Total VMs: {total_vms}")
print(f"  - Highmem VMs (L-size): {len(optimal_l_vms)}")
print(f"  - Standard VMs (S/M-size): {len(optimal_sm_vms)}")
print(f"Total allocated runtime: {total_runtime / 3600:.1f} hours")
print("Machine separation ensures optimal resource utilization")



Generating optimal allocations with size-based machine separation...
Runtime cap: 345600s (96.0h) per VM

L-size benchmarks: 2 highmem VMs
Efficiency: 0.726
Max VM runtime: 86.9 hours
Using 15 benchmarks with real runtime data (filtered from 15 total)
   Automatically increasing to 6 VMs to respect runtime cap
Target runtime per VM: 304753.1 seconds (84.7 hours)
Max runtime per VM (hard cap): 345600.0 seconds (96.0 hours)
⚠️  Creating additional VM #6 for benchmark times-etimeseu-europe-elec+heat-co2-multi_stage-29-64ts (50.0h)
⚠️  Creating additional VM #7 for benchmark genx-elec_trex-15-168h (50.0h)
✓ Created 2 additional VMs to respect runtime cap (total: 8 VMs)

=== Final L-size Allocation - Highmem ===
Total VMs created: 8
Active VMs (with benchmarks): 8
Empty VMs: 0
Total runtime: 1828518.8 seconds (507.9 hours)
Average runtime per active VM: 228564.8 seconds (63.5 hours)
Runtime standard deviation: 62713.1 seconds (17.4 hours)
Min runtime: 180000.0 seconds (50.0 hours)
Max run

## Export Configuration

In [187]:
# Export the allocation to YAML files for infrastructure
# NOTE: This exports ONLY benchmarks with real runtime data, separated by size category
# ONLY exports VMs that have benchmarks assigned (skips empty VMs)
# Benchmarks are sorted SMALLEST FIRST so they run in that order on each VM
output_dir = Path("../infrastructure/benchmarks/runtime_optimized")
output_dir.mkdir(exist_ok=True, parents=True)

# Clear existing files
for file in output_dir.glob("*.yaml"):
    file.unlink()

total_benchmarks_exported = 0

# Filter to only VMs with benchmarks
active_l_vms = [vm for vm in optimal_l_vms if vm.benchmarks]
active_sm_vms = [vm for vm in optimal_sm_vms if vm.benchmarks]

print(
    f"Exporting {len(active_l_vms)} highmem VMs and {len(active_sm_vms)} standard VMs (skipping empty VMs)\n"
)

# Export L-size VMs (highmem machines)
for vm_idx, vm in enumerate(active_l_vms):
    # L-size benchmarks always get highmem machines
    machine_type = "c4-highmem-8"
    years = [2025]  # Include highs-hipo for L benchmarks

    # Sort benchmarks by runtime (SMALLEST FIRST) so they run in order
    sorted_benchmarks = sorted(vm.benchmarks, key=lambda b: b["runtime"])

    # Create benchmark structure with runtime metadata
    benchmarks_dict = {}
    # Total runtime per benchmark name (named so it does not overwrite the
    # notebook-level benchmark_runtimes mapping built earlier)
    runtime_by_name = {}

    for benchmark in sorted_benchmarks:  # Use sorted list
        benchmark_name = benchmark["name"]
        if benchmark_name not in benchmarks_dict:
            benchmarks_dict[benchmark_name] = {"Sizes": []}
            runtime_by_name[benchmark_name] = 0.0

        size_entry = {
            "Name": benchmark["size_name"],
            "Size": benchmark["size_category"],
            "URL": benchmark["url"],
            "_runtime_s": round(benchmark["runtime"], 2),
            "_base_runtime_s": round(benchmark["base_runtime"], 2),
        }
        benchmarks_dict[benchmark_name]["Sizes"].append(size_entry)
        runtime_by_name[benchmark_name] += benchmark["runtime"]

    # Add total runtime for each benchmark
    for benchmark_name in benchmarks_dict:
        benchmarks_dict[benchmark_name]["_runtime_s"] = round(
            runtime_by_name[benchmark_name], 2
        )

    # Create YAML content with total runtime metadata
    yaml_content = {
        "machine-type": machine_type,
        "years": years,
        "_total_runtime_s": round(vm.total_runtime, 2),
        "_total_runtime_h": round(vm.total_runtime / 3600, 2),
        "_max_runtime_cap_h": MAX_RUNTIME_PER_VM_SECONDS / 3600
        if MAX_RUNTIME_PER_VM_SECONDS
        else None,
        "_num_benchmarks": len(vm.benchmarks),
        "_note": "Benchmarks sorted by runtime (smallest first). YAML dict order is preserved in Python 3.7+.",
        "benchmarks": benchmarks_dict,
    }

    # Write to file
    filename = f"highmem-vm-{vm_idx:02d}.yaml"
    with open(output_dir / filename, "w") as f:
        yaml.safe_dump(yaml_content, f, default_flow_style=False, sort_keys=False)

    print(
        f"Exported {filename}: {len(vm.benchmarks)} L-size benchmarks, "
        f"{vm.total_runtime / 3600:.1f}h runtime"
    )

    total_benchmarks_exported += len(vm.benchmarks)

# Export S/M-size VMs (standard machines)
for vm_idx, vm in enumerate(active_sm_vms):
    # S/M-size benchmarks get standard machines
    machine_type = "c4-standard-2"
    years = [2025]

    # Sort benchmarks by runtime (SMALLEST FIRST) so they run in order
    sorted_benchmarks = sorted(vm.benchmarks, key=lambda b: b["runtime"])

    # Create benchmark structure with runtime metadata
    benchmarks_dict = {}
    # Total runtime per benchmark name (named so it does not overwrite the
    # notebook-level benchmark_runtimes mapping built earlier)
    runtime_by_name = {}

    for benchmark in sorted_benchmarks:  # Use sorted list
        benchmark_name = benchmark["name"]
        if benchmark_name not in benchmarks_dict:
            benchmarks_dict[benchmark_name] = {"Sizes": []}
            runtime_by_name[benchmark_name] = 0.0

        size_entry = {
            "Name": benchmark["size_name"],
            "Size": benchmark["size_category"],
            "URL": benchmark["url"],
            "_runtime_s": round(benchmark["runtime"], 2),
            "_base_runtime_s": round(benchmark["base_runtime"], 2),
        }
        benchmarks_dict[benchmark_name]["Sizes"].append(size_entry)
        runtime_by_name[benchmark_name] += benchmark["runtime"]

    # Add total runtime for each benchmark
    for benchmark_name in benchmarks_dict:
        benchmarks_dict[benchmark_name]["_runtime_s"] = round(
            runtime_by_name[benchmark_name], 2
        )

    # Create YAML content with total runtime metadata
    yaml_content = {
        "machine-type": machine_type,
        "years": years,
        "_total_runtime_s": round(vm.total_runtime, 2),
        "_total_runtime_h": round(vm.total_runtime / 3600, 2),
        "_max_runtime_cap_h": MAX_RUNTIME_PER_VM_SECONDS / 3600
        if MAX_RUNTIME_PER_VM_SECONDS
        else None,
        "_num_benchmarks": len(vm.benchmarks),
        "_note": (
            "Benchmarks sorted by runtime (smallest first). Order is preserved because "
            "Python 3.7+ dicts keep insertion order and the dump uses sort_keys=False."
        ),
        "benchmarks": benchmarks_dict,
    }

    # Write to file
    filename = f"standard-{vm_idx:02d}.yaml"
    with open(output_dir / filename, "w") as f:
        yaml.safe_dump(yaml_content, f, default_flow_style=False, sort_keys=False)

    print(
        f"Exported {filename}: {len(vm.benchmarks)} S/M-size benchmarks, "
        f"{vm.total_runtime / 3600:.1f}h runtime"
    )

    total_benchmarks_exported += len(vm.benchmarks)

total_exported_vms = len(active_l_vms) + len(active_sm_vms)

print(f"\n{'=' * 70}")
print(f"Configuration files written to {output_dir}/")
print(
    f"Total VMs exported: {total_exported_vms} (skipped {len(optimal_l_vms) + len(optimal_sm_vms) - total_exported_vms} empty VMs)"
)
print(f"  - Highmem VMs: {len(active_l_vms)}")
print(f"  - Standard VMs: {len(active_sm_vms)}")
print(f"Total benchmarks exported: {total_benchmarks_exported}")
print(
    f"Total runtime allocated: {sum(vm.total_runtime for vm in active_l_vms + active_sm_vms) / 3600:.1f} hours"
)
print("\nMACHINE SEPARATION POLICY:")
print("  - L-size benchmarks → c4-highmem-8 (high memory for large problems)")
print("  - S/M-size benchmarks → c4-standard-2 (cost-effective for smaller problems)")
print(
    f"\nRUNTIME CAP: {MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f}h ({MAX_RUNTIME_PER_VM_SECONDS / (24 * 3600):.1f} days) per VM"
    if MAX_RUNTIME_PER_VM_SECONDS
    else "\nNo runtime cap configured"
)
print("\nBENCHMARK ORDERING: Smallest runtime first (order preserved in YAML)")
print("NOTE: Only benchmarks with real HiGHS runtime data were included.")

Exporting 8 highmem VMs and 7 standard VMs (skipping empty VMs)

Exported highmem-vm-00.yaml: 2 L-size benchmarks, 86.9h runtime
Exported highmem-vm-01.yaml: 4 L-size benchmarks, 85.6h runtime
Exported highmem-vm-02.yaml: 4 L-size benchmarks, 85.5h runtime
Exported highmem-vm-03.yaml: 1 L-size benchmarks, 50.0h runtime
Exported highmem-vm-04.yaml: 1 L-size benchmarks, 50.0h runtime
Exported highmem-vm-05.yaml: 1 L-size benchmarks, 50.0h runtime
Exported highmem-vm-06.yaml: 1 L-size benchmarks, 50.0h runtime
Exported highmem-vm-07.yaml: 1 L-size benchmarks, 50.0h runtime
Exported standard-00.yaml: 15 S/M-size benchmarks, 31.7h runtime
Exported standard-01.yaml: 15 S/M-size benchmarks, 31.7h runtime
Exported standard-02.yaml: 15 S/M-size benchmarks, 31.7h runtime
Exported standard-03.yaml: 14 S/M-size benchmarks, 31.7h runtime
Exported standard-04.yaml: 18 S/M-size benchmarks, 31.7h runtime
Exported standard-05.yaml: 14 S/M-size benchmarks, 31.7h runtime
Exported standard-06.yaml: 14 S/M-size benchmarks, 31.7h runtime

Configuration files written to ../infrastructure/benchmarks/runtime_optimized/
Total VMs exported: 15 (skipped 5 empty VMs)
  - Highmem VMs: 8
  - Standard VMs: 7
Total benchmarks exported: 120
Total runtime allocated: 730.0 hours

MACHINE SEPARATION POLICY:
  - L-size benchmarks → c4-highmem-8 (high memory for large problems)
  - S/M-size benchmarks → c4-standard-2 (cost-effective for smaller problems)

RUNTIME CAP: 96.0h (4.0 days) per VM

BENCHMARK ORDERING: Smallest runtime first (order preserved in YAML)
NOTE: Only benchmarks with real HiGHS runtime data were included.
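
The highmem and standard export loops above build `benchmarks_dict` with the same grouping logic. As a minimal sketch, that shared step could be factored into one helper. The function name `build_benchmarks_dict` and the sample records below are hypothetical and exist only for illustration; the record keys (`name`, `size_name`, `size_category`, `url`, `runtime`, `base_runtime`) are the ones the export loop reads.

```python
def build_benchmarks_dict(vm_benchmarks):
    """Group size entries by benchmark name, smallest runtime first.

    Hypothetical helper: mirrors the grouping done inline in the export loops.
    """
    benchmarks_dict = {}
    totals = {}  # total runtime per benchmark name

    # Sort by runtime so the first-inserted keys are the fastest benchmarks
    for b in sorted(vm_benchmarks, key=lambda b: b["runtime"]):
        name = b["name"]
        entry = benchmarks_dict.setdefault(name, {"Sizes": []})
        entry["Sizes"].append(
            {
                "Name": b["size_name"],
                "Size": b["size_category"],
                "URL": b["url"],
                "_runtime_s": round(b["runtime"], 2),
                "_base_runtime_s": round(b["base_runtime"], 2),
            }
        )
        totals[name] = totals.get(name, 0.0) + b["runtime"]

    # Attach the per-benchmark total after all sizes are collected
    for name, total in totals.items():
        benchmarks_dict[name]["_runtime_s"] = round(total, 2)
    return benchmarks_dict


# Made-up records, not taken from the real runtime data
sample = [
    {"name": "bench-a", "size_name": "a-large", "size_category": "M",
     "url": "https://example.invalid/a-large", "runtime": 50.0, "base_runtime": 10.0},
    {"name": "bench-b", "size_name": "b-small", "size_category": "S",
     "url": "https://example.invalid/b-small", "runtime": 5.0, "base_runtime": 1.0},
    {"name": "bench-a", "size_name": "a-small", "size_category": "S",
     "url": "https://example.invalid/a-small", "runtime": 20.0, "base_runtime": 4.0},
]
result = build_benchmarks_dict(sample)
print(list(result))
```

A benchmark's position in the output is set by its fastest size, because the dict key is inserted the first time the name appears in the sorted list. Each loop body could then reduce to `benchmarks_dict = build_benchmarks_dict(vm.benchmarks)`, leaving only the machine type and filename prefix to differ between the two exports.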
