# Runtime-Optimized VM Allocation for Benchmarks

This notebook implements a greedy balanced-partition (bin-packing) heuristic that allocates benchmarks to VMs using measured solver runtimes from `results/benchmark_results.csv`. The goal is to keep the total runtime per VM as even as possible.

## Key Points

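- **Solver Coverage**: Uses measured runtime data for every solver in the benchmark results (glpk, highs, scip, cbc, gurobi) that has runs for the configured year
- **No Multiplier**: Uses exact runtime values from the results CSV (not scaled); runs without a recorded runtime fall back to their `Timeout` value
- **Algorithm**: Greedy balanced partition heuristic (largest runtime first) for even load distribution; the result is not guaranteed to be optimal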
- **Machine Separation**: L-size benchmarks on highmem VMs, S/M-size on standard VMs
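
As a point of reference for the bullets above, the classic largest-first greedy heuristic can be sketched in a few lines. This is a simplified stand-in, not the scoring rule used by `balanced_partition` later in the notebook; the function name `greedy_partition` and the toy runtimes are assumptions made for illustration.

```python
import heapq


def greedy_partition(runtimes: list[float], num_vms: int) -> list[list[float]]:
    """Assign each runtime to the currently least-loaded VM, largest first."""
    # Heap entries are (current_load, vm_index) so the lightest VM pops first.
    heap = [(0.0, i) for i in range(num_vms)]
    heapq.heapify(heap)
    assignments: list[list[float]] = [[] for _ in range(num_vms)]

    for runtime in sorted(runtimes, reverse=True):
        load, idx = heapq.heappop(heap)
        assignments[idx].append(runtime)
        heapq.heappush(heap, (load + runtime, idx))

    return assignments


# Toy runtimes in seconds (made up for illustration)
toy = [3600, 1800, 1800, 900, 900, 600]
vm_loads = [sum(vm) for vm in greedy_partition(toy, num_vms=3)]
print(vm_loads)  # [3600, 3300, 2700]
```

The heap keeps the least-loaded VM at the front, so each placement costs O(log n) instead of a scan over all VMs.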

In [14]:
from pathlib import Path

import numpy as np
import pandas as pd
import yaml

# CONFIGURATION
# Maximum total runtime per VM in seconds; set to None for no limit
MAX_RUNTIME_PER_VM_SECONDS = 3600
YEAR_FILTER = 2025  # Only use benchmark results from this year

print("Configuration:")
print(
    f"  Max runtime per VM: {MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f} hours ({MAX_RUNTIME_PER_VM_SECONDS / (24 * 3600):.1f} days)"
    if MAX_RUNTIME_PER_VM_SECONDS
    else "  No runtime cap"
)
print(f"  Year filter: {YEAR_FILTER}")

Configuration:
  Max runtime per VM: 1.0 hours (0.0 days)
  Year filter: 2025


## Load and Process Runtime Data

In [15]:
# Load benchmark results for all solvers
benchmark_data = pd.read_csv("../results/benchmark_results.csv")

# Convert Runtime column to numeric
benchmark_data["Runtime (s)"] = pd.to_numeric(
    benchmark_data["Runtime (s)"], errors="coerce"
)

# Filter to only runs from the configured year
benchmark_data = benchmark_data[benchmark_data["Solver Release Year"] == YEAR_FILTER]

print(f"Total benchmark runs loaded: {len(benchmark_data)}")
print(f"Solvers included: {sorted(benchmark_data['Solver'].unique().tolist())}")
print(
    f"Total combined runtime: {benchmark_data['Runtime (s)'].sum():.0f} seconds ({benchmark_data['Runtime (s)'].sum() / 3600:.1f} hours)"
)

Total benchmark runs loaded: 240
Solvers included: ['highs', 'scip']
Total combined runtime: 1030947 seconds (286.4 hours)


## Load Benchmark Metadata

In [16]:
# Load benchmark metadata to get size categories and URLs
with open("../results/metadata.yaml") as f:
    meta = yaml.safe_load(f)

# Create a lookup for metadata
metadata_lookup = {}
for name, benchmark in meta["benchmarks"].items():
    for size_info in benchmark["Sizes"]:
        instance_key = f"{name}-{size_info['Name']}"
        metadata_lookup[instance_key] = size_info

benchmarks_by_size = {"S": [], "M": [], "L": []}
all_benchmark_instances = []

# Group by (Benchmark, Size) and sum runtimes across all solvers
grouped_data = {}
for _, row in benchmark_data.iterrows():
    benchmark_name = row["Benchmark"]
    size_name = row["Size"]
    solver_name = row["Solver"]
    instance_key = f"{benchmark_name}-{size_name}"

    # Get metadata for this benchmark
    size_info = metadata_lookup.get(instance_key)
    if size_info is None:
        continue

    # Get runtime: use actual runtime if available, otherwise use Timeout value
    runtime = row["Runtime (s)"]
    if pd.isna(runtime):
        # For failed/timed out runs, use the Timeout value as estimate
        runtime = row["Timeout"]

    # Skip if still NaN
    if pd.isna(runtime):
        continue

    runtime = float(runtime)

    # Group by benchmark-size combination
    if instance_key not in grouped_data:
        grouped_data[instance_key] = {
            "name": benchmark_name,
            "size_name": size_name,
            "size_category": size_info["Size"],
            "instance_key": instance_key,
            "solvers": [],
            "solver_runtimes": {},
            "solver_status": {},  # Track success/failure status
            "runtime": 0.0,  # Sum of all solver runtimes for this benchmark-size
            "num_variables": size_info.get("Num. variables", 0),
            "num_constraints": size_info.get("Num. constraints", 0),
            "url": size_info["URL"],
        }

    # Add this solver's runtime and status
    if solver_name not in grouped_data[instance_key]["solvers"]:
        grouped_data[instance_key]["solvers"].append(solver_name)
    grouped_data[instance_key]["solver_runtimes"][solver_name] = runtime
    grouped_data[instance_key]["solver_status"][solver_name] = row["Status"]
    grouped_data[instance_key]["runtime"] += runtime

# Convert grouped data to list and categorize by size
for instance_key, instance_data in grouped_data.items():
    all_benchmark_instances.append(instance_data)
    size_cat = instance_data["size_category"]
    benchmarks_by_size[size_cat].append(instance_data)

print(f"Total benchmark-size combinations: {len(all_benchmark_instances)}")
for size, instances in benchmarks_by_size.items():
    print(f"  {size}: {len(instances)}")
print(
    f"\nTotal runtime (all solvers combined): {sum(i['runtime'] for i in all_benchmark_instances) / 3600:.1f} hours"
)
print("\nSample benchmark-size with all solvers:")
sample = all_benchmark_instances[0]
print(
    f"  {sample['instance_key']}: solvers={sorted(sample['solvers'])}, total_runtime={sample['runtime']:.1f}s"
)
for solver in sorted(sample["solvers"]):
    status = sample["solver_status"][solver]
    status_str = "✓" if status == "ok" else f"✗({status})"
    print(f"    - {solver}: {sample['solver_runtimes'][solver]:.1f}s {status_str}")

Total benchmark-size combinations: 120
  S: 18
  M: 87
  L: 15

Total runtime (all solvers combined): 328.4 hours

Sample benchmark-size with all solvers:
  genx-4_three_zones_w_policies_slack-3-1h: solvers=['highs', 'scip'], total_runtime=7200.0s
    - highs: 3600.0s ✗(TO)
    - scip: 3600.0s ✗(TO)
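
The cell above counts a run at its `Timeout` value whenever no runtime was recorded, then sums the per-solver values for each benchmark-size key. A minimal standalone sketch of that rule follows; the helper name `effective_runtime` and the rows are hypothetical, and plain tuples stand in for the DataFrame rows.

```python
import math
from typing import Optional


def effective_runtime(runtime_s: float, timeout_s: float) -> Optional[float]:
    """Return the measured runtime, or the timeout when no runtime was recorded."""
    if not math.isnan(runtime_s):
        return runtime_s
    if not math.isnan(timeout_s):
        return timeout_s
    return None  # nothing usable; the notebook skips such rows


# Hypothetical rows: (benchmark-size key, solver, runtime, timeout)
rows = [
    ("demo-1h", "highs", 120.5, 3600.0),
    ("demo-1h", "scip", float("nan"), 3600.0),  # no runtime recorded
    ("demo-24h", "highs", float("nan"), float("nan")),  # skipped entirely
]

totals: dict[str, float] = {}
for key, _solver, runtime_s, timeout_s in rows:
    value = effective_runtime(runtime_s, timeout_s)
    if value is not None:
        totals[key] = totals.get(key, 0.0) + value

print(totals)  # {'demo-1h': 3720.5}
```

Because the fallback uses the full timeout, a benchmark-size unit where every solver timed out is budgeted at the number of solvers times the timeout.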


In [17]:
class VMAllocation:
    def __init__(self, vm_id: int):
        self.vm_id = vm_id
        self.benchmarks = []
        self.total_runtime = 0.0

    def add_benchmark(self, benchmark: dict):
        """Add a benchmark and accumulate its runtime; raise if the runtime is missing."""
        if benchmark["runtime"] is None:
            raise ValueError(
                f"Benchmark {benchmark['instance_key']} has no runtime data!"
            )

        self.benchmarks.append(benchmark)
        self.total_runtime += benchmark["runtime"]

    def get_total_runtime(self):
        return self.total_runtime

    def __lt__(self, other):
        # For heap operations - compare by total runtime
        return self.total_runtime < other.total_runtime

In [18]:
def analyze_allocation(vms: list[VMAllocation], algorithm_name: str):
    """
    Analyze and print statistics for a VM allocation.
    """
    runtimes = [vm.total_runtime for vm in vms]

    # Filter out empty VMs (e.g. when more VMs were requested than benchmarks placed)
    active_vms = [vm for vm in vms if vm.total_runtime > 0]
    active_runtimes = [vm.total_runtime for vm in active_vms]

    print(f"\n=== {algorithm_name} ===")
    print(f"Total VMs created: {len(vms)}")
    print(f"Active VMs (with benchmarks): {len(active_vms)}")
    print(f"Empty VMs: {len(vms) - len(active_vms)}")

    if len(active_vms) > 0:
        print(
            f"Total runtime: {sum(active_runtimes):.1f} seconds ({sum(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Average runtime per active VM: {np.mean(active_runtimes):.1f} seconds ({np.mean(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Runtime standard deviation: {np.std(active_runtimes):.1f} seconds ({np.std(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Min runtime: {min(active_runtimes):.1f} seconds ({min(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Max runtime: {max(active_runtimes):.1f} seconds ({max(active_runtimes) / 3600:.1f} hours)"
        )
        print(
            f"Runtime ratio (max/min): {max(active_runtimes) / min(active_runtimes):.2f}"
        )

        # Efficiency (how balanced the allocation is)
        efficiency = 1 - (np.std(active_runtimes) / np.mean(active_runtimes))
        print(f"Load balance efficiency: {efficiency:.3f} (1.0 = perfect balance)")
    else:
        print("No active VMs found!")
        efficiency = 0

    return {
        "algorithm": algorithm_name,
        "total_runtime": sum(active_runtimes) if active_vms else 0,
        "std_runtime": np.std(active_runtimes) if active_vms else 0,
        "max_runtime": max(active_runtimes) if active_vms else 0,
        "min_runtime": min(active_runtimes) if active_vms else 0,
        "efficiency": efficiency,
        "runtimes": runtimes,
        "active_vms": len(active_vms),
        "num_vms": len(vms),
    }
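
The "load balance efficiency" reported by `analyze_allocation` is one minus the coefficient of variation of the active VM runtimes. The sketch below reproduces that formula with the standard library only; `statistics.pstdev` is the population standard deviation, which matches the default of `np.std`. The function name is an assumption for illustration.

```python
import statistics


def load_balance_efficiency(vm_runtimes: list[float]) -> float:
    """1 - (population std / mean), ignoring VMs with zero runtime."""
    active = [r for r in vm_runtimes if r > 0]
    if not active:
        return 0.0
    return 1 - statistics.pstdev(active) / statistics.fmean(active)


print(round(load_balance_efficiency([3600, 3600, 3600, 0]), 3))  # 1.0
print(round(load_balance_efficiency([1800, 5400]), 3))  # 0.5
```

The metric has no lower bound: it drops below zero whenever the standard deviation exceeds the mean, for example with runtimes `[100, 100, 100, 10000]`.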

In [19]:
def balanced_partition(
    benchmarks: list[dict], num_vms: int, max_runtime_per_vm: float = None
) -> list[VMAllocation]:
    """
    Balanced partition algorithm that tries to achieve equal total runtime per VM.
    Uses ONLY benchmarks with real runtime data.

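    If max_runtime_per_vm is set:
    - Automatically creates additional VMs if needed to respect the cap
    - No VM exceeds max_runtime_per_vm, with one exception: a benchmark whose
      runtime alone exceeds the cap is still placed on its own new VM (an error is printed)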

    Args:
        benchmarks: List of benchmark dictionaries with runtime data
        num_vms: Initial number of VMs to create
        max_runtime_per_vm: Maximum runtime allowed per VM (in seconds). If None, no limit.
    """
    # Filter to only benchmarks with real runtime data
    runtime_benchmarks = [
        b for b in benchmarks if b["runtime"] is not None and not pd.isna(b["runtime"])
    ]
    print(
        f"Using {len(runtime_benchmarks)} benchmarks with real runtime data (filtered from {len(benchmarks)} total)"
    )

    if len(runtime_benchmarks) == 0:
        return []

    # Calculate total runtime and target per VM
    total_runtime = sum(b["runtime"] for b in runtime_benchmarks)

    # If max_runtime_per_vm is set, ensure we have enough VMs
    if max_runtime_per_vm is not None:
        min_vms_needed = int(np.ceil(total_runtime / max_runtime_per_vm))
        if min_vms_needed > num_vms:
            print(
                f"⚠️  WARNING: Initial {num_vms} VMs cannot fit all benchmarks within {max_runtime_per_vm / 3600:.1f}h limit"
            )
            print(
                f"   Automatically increasing to {min_vms_needed} VMs to respect runtime cap"
            )
            num_vms = min_vms_needed

    target_runtime_per_vm = total_runtime / num_vms

    print(
        f"Target runtime per VM: {target_runtime_per_vm:.1f} seconds ({target_runtime_per_vm / 3600:.1f} hours)"
    )
    if max_runtime_per_vm is not None:
        print(
            f"Max runtime per VM (hard cap): {max_runtime_per_vm:.1f} seconds ({max_runtime_per_vm / 3600:.1f} hours)"
        )

    # Create initial VMs
    vms = [VMAllocation(i) for i in range(num_vms)]

    # Sort benchmarks by runtime (descending) - largest first for better bin packing
    sorted_benchmarks = sorted(
        runtime_benchmarks, key=lambda x: x["runtime"], reverse=True
    )

    # Assign benchmarks with balance consideration
    for benchmark in sorted_benchmarks:
        benchmark_runtime = benchmark["runtime"]

        # Find VM that would be closest to target after adding this benchmark
        best_vm = None
        best_score = float("inf")

        for vm in vms:
            current_runtime = vm.total_runtime
            after_runtime = current_runtime + benchmark_runtime

            # HARD CAP: Skip if this would exceed max runtime
            if max_runtime_per_vm is not None and after_runtime > max_runtime_per_vm:
                continue  # This VM cannot take this benchmark

            # Score based on deviation from target
            score = abs(after_runtime - target_runtime_per_vm)

            # Prefer VMs that are under-loaded
            if current_runtime < target_runtime_per_vm:
                score *= 0.8  # Bonus for under-loaded VMs

            if score < best_score:
                best_score = score
                best_vm = vm

        # If no VM can take this benchmark, create a new one
        if best_vm is None:
            if (
                max_runtime_per_vm is not None
                and benchmark_runtime > max_runtime_per_vm
            ):
                print(
                    f"❌ ERROR: Benchmark {benchmark['instance_key']} runtime ({benchmark_runtime / 3600:.1f}h) exceeds max VM runtime ({max_runtime_per_vm / 3600:.1f}h)!"
                )
                print(
                    "   This benchmark CANNOT fit in any VM. Consider increasing MAX_RUNTIME_PER_VM_SECONDS."
                )
                # Still add it to a new VM, but warn the user

            print(
                f"⚠️  Creating additional VM #{len(vms)} for benchmark {benchmark['instance_key']} ({benchmark_runtime / 3600:.1f}h)"
            )
            best_vm = VMAllocation(len(vms))
            vms.append(best_vm)

        best_vm.add_benchmark(benchmark)

    # Report on VMs created
    if len(vms) > num_vms:
        print(
            f"✓ Created {len(vms) - num_vms} additional VMs to respect runtime cap (total: {len(vms)} VMs)"
        )

    return vms
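
To make the selection step of `balanced_partition` easier to follow, the sketch below isolates it as a standalone function operating on plain load values. The name `pick_vm` and the numbers are hypothetical; the scoring (absolute deviation from the target, multiplied by 0.8 for VMs still under the target, with the cap as a hard filter) follows the cell above.

```python
from typing import Optional


def pick_vm(
    loads: list[float],
    runtime: float,
    target: float,
    cap: Optional[float] = None,
) -> Optional[int]:
    """Return the index of the VM chosen for `runtime`, or None if none fits."""
    best_idx, best_score = None, float("inf")
    for idx, load in enumerate(loads):
        after = load + runtime
        if cap is not None and after > cap:
            continue  # hard cap: this VM cannot take the benchmark
        score = abs(after - target)
        if load < target:
            score *= 0.8  # bonus for VMs still below the target
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx


loads = [3000.0, 1000.0, 0.0]
print(pick_vm(loads, runtime=900.0, target=2000.0, cap=3600.0))   # 1
print(pick_vm(loads, runtime=4000.0, target=2000.0, cap=3600.0))  # None
```

In the first call VM 0 is excluded by the cap, VM 1 scores 100 × 0.8 = 80 and VM 2 scores 1100 × 0.8 = 880, so VM 1 wins. A `None` result corresponds to the branch where the notebook appends a new VM.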

## Bin Packing with Balanced Partition

In [20]:
# Use all benchmark instances with their exact runtime values
benchmarks_with_runtime = [
    b for b in all_benchmark_instances if b["runtime"] is not None
]
print(
    f"Using {len(benchmarks_with_runtime)} benchmark-solver instances with runtime data"
)
print(
    f"Total combined runtime: {sum(b['runtime'] for b in benchmarks_with_runtime) / 3600:.1f} hours"
)

if MAX_RUNTIME_PER_VM_SECONDS is not None:
    print(
        f"\n⚙️  Runtime cap enabled: {MAX_RUNTIME_PER_VM_SECONDS} seconds ({MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f} hours) per VM"
    )
else:
    print("\n⚙️  No runtime cap configured (unlimited)")

# Separate L-size benchmarks for highmem machines
l_size_benchmarks = [b for b in benchmarks_with_runtime if b["size_category"] == "L"]
non_l_benchmarks = [b for b in benchmarks_with_runtime if b["size_category"] != "L"]

print("\nBenchmark separation by size category:")
print(
    f"  L-size (highmem): {len(l_size_benchmarks)} instances, {sum(b['runtime'] for b in l_size_benchmarks) / 3600:.1f} hours"
)
print(
    f"  S/M-size (standard): {len(non_l_benchmarks)} instances, {sum(b['runtime'] for b in non_l_benchmarks) / 3600:.1f} hours"
)

# Calculate VM options dynamically based on data
results = []

# L-size benchmarks (fewer VMs since they need highmem)
print(f"\n{'=' * 50}")
print("TESTING L-SIZE BENCHMARKS (HIGHMEM MACHINES)")
print(f"{'=' * 50}")

if len(l_size_benchmarks) > 0:
    l_total_runtime = sum(b["runtime"] for b in l_size_benchmarks)
    if MAX_RUNTIME_PER_VM_SECONDS:
        l_min_vms = int(np.ceil(l_total_runtime / MAX_RUNTIME_PER_VM_SECONDS))
    else:
        l_min_vms = 1
    # Test from min_vms-2 up to min_vms+2 (in steps of 1), never below 1
    l_vm_options = list(range(max(1, l_min_vms - 2), l_min_vms + 3))
    print(f"Calculated VM options for L-size: {l_vm_options}")
else:
    l_vm_options = []

for num_vms in l_vm_options:
    if len(l_size_benchmarks) == 0:
        print("No L-size benchmarks with runtime data")
        break

    print(f"\nTesting {num_vms} highmem VMs for L-size benchmarks:")

    bp_vms = balanced_partition(l_size_benchmarks, num_vms, MAX_RUNTIME_PER_VM_SECONDS)
    bp_result = analyze_allocation(bp_vms, f"L-size Balanced Partition ({num_vms} VMs)")
    # Record the requested VM count; balanced_partition may create more (see len(bp_vms))
    bp_result["num_vms"] = num_vms
    bp_result["size_category"] = "L"
    results.append(bp_result)

# S/M-size benchmarks (more VMs with standard machines)
print(f"\n{'=' * 50}")
print("TESTING S/M-SIZE BENCHMARKS (STANDARD MACHINES)")
print(f"{'=' * 50}")

if len(non_l_benchmarks) > 0:
    sm_total_runtime = sum(b["runtime"] for b in non_l_benchmarks)
    if MAX_RUNTIME_PER_VM_SECONDS:
        sm_min_vms = int(np.ceil(sm_total_runtime / MAX_RUNTIME_PER_VM_SECONDS))
    else:
        sm_min_vms = 1
    # Test from min_vms up to min_vms+6 (in steps of 2)
    sm_vm_options = list(range(max(1, sm_min_vms), sm_min_vms + 7, 2))
    print(f"Calculated VM options for S/M-size: {sm_vm_options}")
else:
    sm_vm_options = []

for num_vms in sm_vm_options:
    if len(non_l_benchmarks) == 0:
        print("No S/M-size benchmarks with runtime data")
        break

    print(f"\nTesting {num_vms} standard VMs for S/M-size benchmarks:")

    bp_vms = balanced_partition(non_l_benchmarks, num_vms, MAX_RUNTIME_PER_VM_SECONDS)
    bp_result = analyze_allocation(
        bp_vms, f"S/M-size Balanced Partition ({num_vms} VMs)"
    )
    # Record the requested VM count; balanced_partition may create more (see len(bp_vms))
    bp_result["num_vms"] = num_vms
    bp_result["size_category"] = "S/M"
    results.append(bp_result)

Using 120 benchmark-solver instances with runtime data
Total combined runtime: 328.4 hours

⚙️  Runtime cap enabled: 3600 seconds (1.0 hours) per VM

Benchmark separation by size category:
  L-size (highmem): 15 instances, 243.2 hours
  S/M-size (standard): 105 instances, 85.2 hours

TESTING L-SIZE BENCHMARKS (HIGHMEM MACHINES)
Calculated VM options for L-size: [242, 243, 244, 245, 246]

Testing 242 highmem VMs for L-size benchmarks:
Using 15 benchmarks with real runtime data (filtered from 15 total)
   Automatically increasing to 244 VMs to respect runtime cap
Target runtime per VM: 3587.8 seconds (1.0 hours)
Max runtime per VM (hard cap): 3600.0 seconds (1.0 hours)
❌ ERROR: Benchmark genx-elec_co2-15-168h runtime (20.0h) exceeds max VM runtime (1.0h)!
   This benchmark CANNOT fit in any VM. Consider increasing MAX_RUNTIME_PER_VM_SECONDS.
⚠️  Creating additional VM #244 for benchmark genx-elec_co2-15-168h (20.0h)
❌ ERROR: Benchmark genx-elec_trex_co2-15-168h runtime (20.0h) exceeds ma
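
The log above reports 20.0h benchmarks against a 1.0h cap. A pre-check along the following lines could surface that before any allocation is attempted. It is a sketch under assumptions: `check_cap` and the demo keys are hypothetical, and the volume bound is the same `ceil(total / cap)` estimate the notebook uses.

```python
import math


def check_cap(runtimes_s: dict[str, float], cap_s: float) -> tuple[list[str], int]:
    """Return (keys whose runtime alone exceeds the cap, ceil(total / cap))."""
    oversize = [key for key, runtime in runtimes_s.items() if runtime > cap_s]
    volume_bound = math.ceil(sum(runtimes_s.values()) / cap_s)
    return oversize, volume_bound


# Hypothetical benchmark-size keys and runtimes (seconds)
demo = {"big-168h": 72000.0, "mid-24h": 3000.0, "small-1h": 500.0}
oversize, volume_bound = check_cap(demo, cap_s=3600.0)
print(oversize)      # ['big-168h']
print(volume_bound)  # 21
```

In this toy case the volume estimate is 21 VMs, yet two VMs hold everything once the oversize item sits alone (3000 + 500 = 3500 s fits under the cap). That appears to mirror the gap between the 242 to 246 highmem VMs tested here and the 15 highmem VMs exported at the end of the notebook.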

## Generate Optimal Allocation

## Results Summary

In [21]:
# Print summary comparison table
print("\n" + "=" * 80)
print("BALANCED PARTITION ALGORITHM RESULTS")
print("=" * 80)

df_results = pd.DataFrame(results)

# Separate results by size category
l_results = (
    df_results[df_results["size_category"] == "L"]
    if "size_category" in df_results.columns and len(df_results) > 0
    else pd.DataFrame()
)
sm_results = (
    df_results[df_results["size_category"] == "S/M"]
    if "size_category" in df_results.columns and len(df_results) > 0
    else pd.DataFrame()
)

if len(df_results) > 0:
    print(
        f"\n{'Size':<6} {'VM Count':<9} {'Efficiency':<12} {'Max Runtime (h)':<15} {'Std Dev (h)':<12}"
    )
    print("-" * 60)

    for _, row in df_results.iterrows():
        size_cat = row.get("size_category", "Mixed")
        print(
            f"{size_cat:<6} {int(row['num_vms']):<9} "
            f"{row['efficiency']:.3f}{'':8} {row['max_runtime'] / 3600:.1f}{'':12} "
            f"{row['std_runtime'] / 3600:.1f}"
        )

# Find best configurations for each size category
print(f"\n{'=' * 80}")
print("BEST CONFIGURATIONS:")
print(f"{'=' * 80}")

if len(l_results) > 0:
    best_l = l_results.loc[l_results["efficiency"].idxmax()]
    print(
        f"Best L-size (highmem): {int(best_l['num_vms'])} VMs (efficiency: {best_l['efficiency']:.3f})"
    )
else:
    best_l = None

if len(sm_results) > 0:
    best_sm = sm_results.loc[sm_results["efficiency"].idxmax()]
    print(
        f"Best S/M-size (standard): {int(best_sm['num_vms'])} VMs (efficiency: {best_sm['efficiency']:.3f})"
    )
else:
    best_sm = None

# Calculate total deployment
if best_l is not None and best_sm is not None:
    total_vms = int(best_l["num_vms"]) + int(best_sm["num_vms"])
    total_efficiency = (best_l["efficiency"] + best_sm["efficiency"]) / 2
    print(
        f"\nTotal deployment: {total_vms} VMs ({int(best_l['num_vms'])} highmem + {int(best_sm['num_vms'])} standard)"
    )
    print(f"Average efficiency: {total_efficiency:.3f}")
elif best_l is not None:
    print(f"\nTotal deployment: {int(best_l['num_vms'])} highmem VMs only")
    print(f"Efficiency: {best_l['efficiency']:.3f}")
elif best_sm is not None:
    print(f"\nTotal deployment: {int(best_sm['num_vms'])} standard VMs only")
    print(f"Efficiency: {best_sm['efficiency']:.3f}")


BALANCED PARTITION ALGORITHM RESULTS

Size   VM Count  Efficiency   Max Runtime (h) Std Dev (h) 
------------------------------------------------------------
L      242       0.702         20.0             4.8
L      243       0.702         20.0             4.8
L      244       0.702         20.0             4.8
L      245       0.702         20.0             4.8
L      246       0.702         20.0             4.8
S/M    86        0.709         2.0             0.5
S/M    88        0.709         2.0             0.5
S/M    90        0.709         2.0             0.5
S/M    92        0.709         2.0             0.5

BEST CONFIGURATIONS:
Best L-size (highmem): 242 VMs (efficiency: 0.702)
Best S/M-size (standard): 92 VMs (efficiency: 0.709)

Total deployment: 334 VMs (242 highmem + 92 standard)
Average efficiency: 0.706


In [22]:
# Generate optimal allocations for both size categories
print("\n\nGenerating optimal allocations with size-based machine separation...")

if MAX_RUNTIME_PER_VM_SECONDS is not None:
    print(
        f"Runtime cap: {MAX_RUNTIME_PER_VM_SECONDS}s ({MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f}h) per VM"
    )

optimal_l_vms = []
optimal_sm_vms = []

# Generate L-size allocation (highmem machines)
if best_l is not None:
    optimal_l_num_vms = int(best_l["num_vms"])

    print(f"\nL-size benchmarks: {optimal_l_num_vms} highmem VMs")
    print(f"Efficiency: {best_l['efficiency']:.3f}")
    print(f"Max VM runtime: {best_l['max_runtime'] / 3600:.1f} hours")

    optimal_l_vms = balanced_partition(
        l_size_benchmarks, optimal_l_num_vms, MAX_RUNTIME_PER_VM_SECONDS
    )
    l_final_result = analyze_allocation(
        optimal_l_vms, "Final L-size Allocation - Highmem"
    )

# Generate S/M-size allocation (standard machines)
if best_sm is not None:
    optimal_sm_num_vms = int(best_sm["num_vms"])

    print(f"\nS/M-size benchmarks: {optimal_sm_num_vms} standard VMs")
    print(f"Efficiency: {best_sm['efficiency']:.3f}")
    print(f"Max VM runtime: {best_sm['max_runtime'] / 3600:.1f} hours")

    optimal_sm_vms = balanced_partition(
        non_l_benchmarks, optimal_sm_num_vms, MAX_RUNTIME_PER_VM_SECONDS
    )
    sm_final_result = analyze_allocation(
        optimal_sm_vms, "Final S/M-size Allocation - Standard"
    )

# Combined summary
total_vms = len(optimal_l_vms) + len(optimal_sm_vms)
total_runtime = sum(vm.total_runtime for vm in optimal_l_vms + optimal_sm_vms)

print(f"\n{'=' * 60}")
print("FINAL ALLOCATION SUMMARY")
print(f"{'=' * 60}")
print(f"Total VMs: {total_vms}")
print(f"  - Highmem VMs (L-size): {len(optimal_l_vms)}")
print(f"  - Standard VMs (S/M-size): {len(optimal_sm_vms)}")
print(f"Total allocated runtime: {total_runtime / 3600:.1f} hours")
print("Machine separation ensures optimal resource utilization")



Generating optimal allocations with size-based machine separation...
Runtime cap: 3600s (1.0h) per VM

L-size benchmarks: 242 highmem VMs
Efficiency: 0.702
Max VM runtime: 20.0 hours
Using 15 benchmarks with real runtime data (filtered from 15 total)
   Automatically increasing to 244 VMs to respect runtime cap
Target runtime per VM: 3587.8 seconds (1.0 hours)
Max runtime per VM (hard cap): 3600.0 seconds (1.0 hours)
❌ ERROR: Benchmark genx-elec_co2-15-168h runtime (20.0h) exceeds max VM runtime (1.0h)!
   This benchmark CANNOT fit in any VM. Consider increasing MAX_RUNTIME_PER_VM_SECONDS.
⚠️  Creating additional VM #244 for benchmark genx-elec_co2-15-168h (20.0h)
❌ ERROR: Benchmark genx-elec_trex_co2-15-168h runtime (20.0h) exceeds max VM runtime (1.0h)!
   This benchmark CANNOT fit in any VM. Consider increasing MAX_RUNTIME_PER_VM_SECONDS.
⚠️  Creating additional VM #245 for benchmark genx-elec_trex_co2-15-168h (20.0h)
❌ ERROR: Benchmark genx-elec_trex_uc-15-24h runtime (20.0h) exc

## Export Configuration

In [23]:
from datetime import datetime, timezone

now_utc = datetime.now(timezone.utc)
formatted_time = now_utc.strftime("%Y%m%d_%H%M%S")

# run_id is the identifier for a benchmark campaign
run_id = f"{formatted_time}_batch"

# will contain all the config for a run
output_dir = Path("../infrastructure/benchmarks/") / run_id
output_dir.mkdir(exist_ok=False, parents=True)

project_id = "compute-app-427709"
zone = "europe-west4-a"  # This will be overridden if a value is specified in the input metadata file
enable_gcs_upload = True
auto_destroy_vm = False
benchmarks_dir = output_dir.resolve()
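
The `run_id` format and the `exist_ok=False` guard can be checked in isolation. The sketch below uses a fixed timestamp (taken from the output path shown later in this notebook) and a temporary directory instead of `../infrastructure/benchmarks/`, so it has no side effects on the repository.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Fixed timestamp instead of datetime.now() so the result is reproducible
fixed = datetime(2025, 11, 6, 14, 56, 47, tzinfo=timezone.utc)
run_id = f"{fixed.strftime('%Y%m%d_%H%M%S')}_batch"
print(run_id)  # 20251106_145647_batch

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / run_id
    run_dir.mkdir(exist_ok=False, parents=True)
    try:
        run_dir.mkdir(exist_ok=False, parents=True)
        collided = False
    except FileExistsError:
        collided = True  # a second run with the same id is rejected

print(collided)  # True
```

With `exist_ok=False`, re-running the export cell within the same second fails instead of silently overwriting an earlier campaign's files.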

In [24]:
def generate_run_tfvars(output_path: str, **vars):
    """Generate a config file from a simple inline template."""
    import textwrap

    template = textwrap.dedent("""\
        project_id = "{project_id}"
        # This will be overridden if a value is specified in the input metadata file
        zone = "{zone}"
        # Optional
        enable_gcs_upload = {enable_gcs_upload}
        auto_destroy_vm = {auto_destroy_vm}
        run_id = "{run_id}"
        benchmarks_dir = "{benchmarks_dir}"
        """)

    # Convert booleans to lowercase true/false strings
    for key, value in vars.items():
        if isinstance(value, bool):
            vars[key] = str(value).lower()

    rendered = template.format(**vars)

    with open(output_path, "w") as f:
        f.write(rendered)

    print(f"Config written to {output_path}")
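
Python renders booleans as `True`/`False`, while tfvars files expect lowercase `true`/`false`, which is why the function above lowercases them before formatting. The sketch below shows the same conversion as a small per-value helper; `to_tfvars_literal` is a hypothetical name, and it quotes every non-boolean value as a string without escaping embedded quotes.

```python
def to_tfvars_literal(value) -> str:
    """Render a Python value for the right-hand side of a tfvars assignment."""
    if isinstance(value, bool):
        return str(value).lower()  # True -> true, False -> false
    return f'"{value}"'  # everything else is quoted as a string in this sketch


settings = {"run_id": "demo_batch", "enable_gcs_upload": True, "auto_destroy_vm": False}
rendered = "\n".join(
    f"{key} = {to_tfvars_literal(value)}" for key, value in settings.items()
)
print(rendered)
# run_id = "demo_batch"
# enable_gcs_upload = true
# auto_destroy_vm = false
```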

In [25]:
# Export the allocation to YAML files for infrastructure
generate_run_tfvars(
    output_dir / "run.tfvars",
    project_id=project_id,
    zone=zone,
    enable_gcs_upload=enable_gcs_upload,
    auto_destroy_vm=auto_destroy_vm,
    run_id=run_id,
    benchmarks_dir=benchmarks_dir,
)

# Get unique solvers from filtered benchmark data
available_solvers = sorted(benchmark_data["Solver"].unique().tolist())
solvers_str = " ".join(available_solvers)
print(f"Solvers available for year {YEAR_FILTER}: {solvers_str}\n")

# Filter to only VMs with benchmarks
active_l_vms = [vm for vm in optimal_l_vms if vm.benchmarks]
active_sm_vms = [vm for vm in optimal_sm_vms if vm.benchmarks]

print(
    f"Exporting {len(active_l_vms)} highmem VMs and {len(active_sm_vms)} standard VMs (skipping empty VMs)\n"
)

# Export L-size VMs (highmem machines)
for vm_idx, vm in enumerate(active_l_vms):
    machine_type = "c4-highmem-8"
    years = [YEAR_FILTER]

    # Sort benchmarks by runtime (SMALLEST FIRST) so they run in order
    sorted_benchmarks = sorted(vm.benchmarks, key=lambda b: b["runtime"])

    # Create benchmark structure with runtime metadata
    benchmarks_dict = {}
    benchmark_runtimes = {}

    for benchmark in sorted_benchmarks:
        benchmark_name = benchmark["name"]
        if benchmark_name not in benchmarks_dict:
            benchmarks_dict[benchmark_name] = {"Sizes": []}
            benchmark_runtimes[benchmark_name] = 0.0

        # Include all solvers and their individual runtimes
        size_entry = {
            "Name": benchmark["size_name"],
            "Size": benchmark["size_category"],
            "_solvers": sorted(
                benchmark["solvers"]
            ),  # List of all solvers (metadata only)
            "URL": benchmark["url"],
            "_runtime_s": round(
                benchmark["runtime"], 2
            ),  # Total runtime (sum of all solvers)
            "_solver_runtimes_s": {
                solver: round(benchmark["solver_runtimes"][solver], 2)
                for solver in sorted(benchmark["solvers"])
            },
            "_solver_status": {
                solver: benchmark["solver_status"][solver]
                for solver in sorted(benchmark["solvers"])
            },
        }
        benchmarks_dict[benchmark_name]["Sizes"].append(size_entry)
        benchmark_runtimes[benchmark_name] += benchmark["runtime"]

    # Add total runtime for each benchmark
    for benchmark_name in benchmarks_dict:
        benchmarks_dict[benchmark_name]["_runtime_s"] = round(
            benchmark_runtimes[benchmark_name], 2
        )

    # Create YAML content with total runtime metadata
    yaml_content = {
        "machine-type": machine_type,
        "years": years,
        "solver": solvers_str,  # Space-separated list of solvers used for this year
        "_total_runtime_s": round(vm.total_runtime, 2),
        "_total_runtime_h": round(vm.total_runtime / 3600, 2),
        "_max_runtime_cap_h": MAX_RUNTIME_PER_VM_SECONDS / 3600
        if MAX_RUNTIME_PER_VM_SECONDS
        else None,
        "_num_benchmarks": len(vm.benchmarks),
        "_note": "Each benchmark-size runs all solvers together. For failed/timed-out runs, Timeout value is used as runtime estimate.",
        "benchmarks": benchmarks_dict,
    }

    # Write to file
    filename = f"highmem-vm-{vm_idx:02d}.yaml"
    with open(output_dir / filename, "w") as f:
        yaml.safe_dump(yaml_content, f, default_flow_style=False, sort_keys=False)

    print(
        f"Exported {filename}: {len(vm.benchmarks)} L-size benchmark-size units, "
        f"{vm.total_runtime / 3600:.1f}h runtime"
    )

# Export S/M-size VMs (standard machines)
for vm_idx, vm in enumerate(active_sm_vms):
    machine_type = "c4-standard-2"
    years = [YEAR_FILTER]

    # Sort benchmarks by runtime (SMALLEST FIRST) so they run in order
    sorted_benchmarks = sorted(vm.benchmarks, key=lambda b: b["runtime"])

    # Create benchmark structure with runtime metadata
    benchmarks_dict = {}
    benchmark_runtimes = {}

    for benchmark in sorted_benchmarks:
        benchmark_name = benchmark["name"]
        if benchmark_name not in benchmarks_dict:
            benchmarks_dict[benchmark_name] = {"Sizes": []}
            benchmark_runtimes[benchmark_name] = 0.0

        # Include all solvers and their individual runtimes
        size_entry = {
            "Name": benchmark["size_name"],
            "Size": benchmark["size_category"],
            "_solvers": sorted(
                benchmark["solvers"]
            ),  # List of all solvers (metadata only)
            "URL": benchmark["url"],
            "_runtime_s": round(
                benchmark["runtime"], 2
            ),  # Total runtime (sum of all solvers)
            "_solver_runtimes_s": {
                solver: round(benchmark["solver_runtimes"][solver], 2)
                for solver in sorted(benchmark["solvers"])
            },
            "_solver_status": {
                solver: benchmark["solver_status"][solver]
                for solver in sorted(benchmark["solvers"])
            },
        }
        benchmarks_dict[benchmark_name]["Sizes"].append(size_entry)
        benchmark_runtimes[benchmark_name] += benchmark["runtime"]

    # Add total runtime for each benchmark
    for benchmark_name in benchmarks_dict:
        benchmarks_dict[benchmark_name]["_runtime_s"] = round(
            benchmark_runtimes[benchmark_name], 2
        )

    # Create YAML content with total runtime metadata
    yaml_content = {
        "machine-type": machine_type,
        "years": years,
        "solver": solvers_str,  # Space-separated list of solvers used for this year
        "_total_runtime_s": round(vm.total_runtime, 2),
        "_total_runtime_h": round(vm.total_runtime / 3600, 2),
        "_max_runtime_cap_h": MAX_RUNTIME_PER_VM_SECONDS / 3600
        if MAX_RUNTIME_PER_VM_SECONDS
        else None,
        "_num_benchmarks": len(vm.benchmarks),
        "_note": "Each benchmark-size runs all solvers together. For failed/timed-out runs, Timeout value is used as runtime estimate.",
        "benchmarks": benchmarks_dict,
    }

    # Write to file
    filename = f"standard-{vm_idx:02d}.yaml"
    with open(output_dir / filename, "w") as f:
        yaml.safe_dump(yaml_content, f, default_flow_style=False, sort_keys=False)

    print(
        f"Exported {filename}: {len(vm.benchmarks)} S/M-size benchmark-size units, "
        f"{vm.total_runtime / 3600:.1f}h runtime"
    )

total_exported_vms = len(active_l_vms) + len(active_sm_vms)
total_benchmarks = sum(len(vm.benchmarks) for vm in active_l_vms + active_sm_vms)

print(f"\n{'=' * 70}")
print(f"Configuration files written to {output_dir}/")
print(f"Total VMs exported: {total_exported_vms}")
print(f"  - Highmem VMs: {len(active_l_vms)}")
print(f"  - Standard VMs: {len(active_sm_vms)}")
print(f"Total benchmark-size combinations exported: {total_benchmarks}")
print(
    f"Total runtime allocated: {sum(vm.total_runtime for vm in active_l_vms + active_sm_vms) / 3600:.1f} hours"
)
print("\nMACHINE SEPARATION POLICY:")
print("  - L-size benchmarks → c4-highmem-8 (high memory for large problems)")
print("  - S/M-size benchmarks → c4-standard-2 (cost-effective for smaller problems)")
print(
    f"\nRUNTIME CAP: {MAX_RUNTIME_PER_VM_SECONDS / 3600:.1f}h ({MAX_RUNTIME_PER_VM_SECONDS / (24 * 3600):.1f} days) per VM"
    if MAX_RUNTIME_PER_VM_SECONDS
    else "\nNo runtime cap configured"
)
print("\nALLOCATION STRATEGY:")
print("  - Each benchmark-size combination runs all solvers together")
print("  - Runtime per unit = sum of individual solver runtimes")
print("  - For failed/timed-out solvers: uses Timeout value as runtime estimate")
print("  - _solver_status indicates 'ok' vs 'ER'/'TO'/'OOM' etc for each solver")
print("NOTE: Using exact runtime values from benchmark_results.csv")

Config written to ../infrastructure/benchmarks/20251106_145647_batch/run.tfvars
Solvers available for year 2025: highs scip

Exporting 15 highmem VMs and 54 standard VMs (skipping empty VMs)

Exported highmem-vm-00.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-01.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-02.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-03.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-04.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-05.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-06.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-07.yaml: 1 L-size benchmark-size units, 20.0h runtime
Exported highmem-vm-08.yaml: 1 L-size benchmark-size units, 17.4h runtime
Exported highmem-vm-09.yaml: 1 L-size benchmark-size units, 15.9h runtime
Exported highmem-vm-10.yaml: 1 L-size benchmark-size units, 13.6h ru
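
The highmem and standard export loops above build the same structure and differ only in machine type and filename. One possible way to share that logic is a small builder function, sketched below. `build_vm_config` is a hypothetical helper, and the per-solver fields (`_solvers`, `_solver_runtimes_s`, `_solver_status`) are left out to keep the example short.

```python
def build_vm_config(
    benchmarks: list[dict], machine_type: str, year: int, solvers: str
) -> dict:
    """Build the per-VM config dict (trimmed version of the export cell)."""
    grouped: dict[str, dict] = {}
    totals: dict[str, float] = {}
    # Smallest runtime first, as in the export loops
    for b in sorted(benchmarks, key=lambda b: b["runtime"]):
        grouped.setdefault(b["name"], {"Sizes": []})["Sizes"].append(
            {
                "Name": b["size_name"],
                "Size": b["size_category"],
                "URL": b["url"],
                "_runtime_s": round(b["runtime"], 2),
            }
        )
        totals[b["name"]] = totals.get(b["name"], 0.0) + b["runtime"]
    for name, total in totals.items():
        grouped[name]["_runtime_s"] = round(total, 2)

    return {
        "machine-type": machine_type,
        "years": [year],
        "solver": solvers,
        "_total_runtime_s": round(sum(b["runtime"] for b in benchmarks), 2),
        "_num_benchmarks": len(benchmarks),
        "benchmarks": grouped,
    }


# Made-up benchmark entries for illustration
demo = [
    {"name": "demo", "size_name": "1-24h", "size_category": "S",
     "url": "https://example.com/a.lp", "runtime": 30.0},
    {"name": "demo", "size_name": "1-1h", "size_category": "S",
     "url": "https://example.com/b.lp", "runtime": 10.0},
]
config = build_vm_config(demo, "c4-standard-2", 2025, "highs scip")
print([s["Name"] for s in config["benchmarks"]["demo"]["Sizes"]])  # ['1-1h', '1-24h']
print(config["_total_runtime_s"])  # 40.0
```

Each loop would then reduce to calling the builder with its machine type and writing the result with `yaml.safe_dump`.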