# Run and process benchmarks
This notebook demonstrates how to run a set of benchmarks, validate the results, and add them to the website.

In [1]:
import sys

sys.path.insert(0, "..")
from runner.utils import (
    allocate_benchmarks,
    check_uptimes,
    create_benchmark_campaign,
    fetch_all_partial_results,
    load_benchmark_metadata,
)

In [None]:
# If a util function was modified, use this cell to reload it without having to restart the kernel
%run ../runner/utils.py

## Create benchmark campaign(s)
First, ensure that the desired benchmarks have been processed and categorized (following our PR template) and that their metadata is present and up to date in `results/metadata.yaml`. Then adapt the following cells to run the benchmarks of your choice. This example runs benchmarks from [PR #314](https://github.com/open-energy-transition/solver-benchmark/pull/314).

### 20260225 Run IESA-Opt-NL benchmarks

In [None]:
to_run = {"IESA-Opt-NL": ["1-3h", "1-1h", "10-3h"]}

# Build the "<benchmark>-<size>" names used as the index of the metadata table
bench_sizes_to_run = {f"{b}-{s}" for b, sizes in to_run.items() for s in sizes}

benchmarks_df = load_benchmark_metadata()
benchs_to_run = benchmarks_df[benchmarks_df.index.isin(bench_sizes_to_run)].copy()
benchs_to_run

| | Benchmark | Instance | Modelling framework | Model name | Version | Contributor(s)/Source | License | Problem class | Application | Sectoral focus | ... | Temporal resolution | Spatial resolution | Realistic | Num. constraints | Num. variables | Skip because | Num. nonzeros | Num. continuous variables | Num. integer variables | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IESA-Opt-NL-1-3h | IESA-Opt-NL | 1-3h | IESA-Opt | IESA-Opt-NL | 20251218 | TNO, Utrecht University | CC BY 4.0 | LP | Infrastructure & Capacity Expansion | Sector-coupled | ... | 2920 time slices | 1 node | True | 793714 | 493813 | | | | | |
| IESA-Opt-NL-1-1h | IESA-Opt-NL | 1-1h | IESA-Opt | IESA-Opt-NL | 20251218 | TNO, Utrecht University | CC BY 4.0 | LP | Infrastructure & Capacity Expansion | Sector-coupled | ... | 8760 time slices | 1 node | True | 1967554 | 1107013 | | | | | |
| IESA-Opt-NL-10-3h | IESA-Opt-NL | 10-3h | IESA-Opt | IESA-Opt-NL | 20251218 | TNO, Utrecht University | CC BY 4.0 | LP | Infrastructure & Capacity Expansion | Sector-coupled | ... | 2920 time slices | 10 nodes | True | 7603326 | 5672804 | | | | | |


In [None]:
# Create the campaign

# Allocate the large (L) benchmarks
l_to_run = benchs_to_run.query('Size == "L"')
vm_yamls = allocate_benchmarks(
    l_to_run,
    "Num. variables",
    len(l_to_run),  # 1 instance per VM since we're running only 3 benchmarks
    machine_type="c4-highmem-16",
    timeout_seconds=24 * 60 * 60,
    years=[2024, 2025],  # latest solvers only, so skip creating older conda envs
    # NOTE: this function also lets you choose the zone and solvers to run. For the website, we use the defaults
)
# Allocate the small (S) and medium (M) benchmarks, using the default machine type and timeout
vm_yamls += allocate_benchmarks(
    benchs_to_run.query('Size != "L"'),
    "Num. variables",
    len(benchs_to_run) - len(l_to_run),
)

campaign_name = "iesa-opt-nl"
create_benchmark_campaign(
    f"20260225-{campaign_name}",
    campaign_name,
    vm_yamls,
)

Allocated. Estimated runtime: 1575.8h
  VM 00: 1 instances, 1575.8h
  VM 01: 1 instances, 307.5h
Allocated. Estimated runtime: 137.2h
  VM 00: 1 instances, 137.2h
Created directory and files in ../infrastructure/benchmarks/20260225-iesa-opt-nl
Run this campaign from the infrastructure/ directory using the command:
tofu apply -var-file benchmarks/20260225-iesa-opt-nl/run.tfvars -state=states/20260225-iesa-opt-nl.tfstate
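
To build intuition for what spreading benchmarks across VMs by `"Num. variables"` can look like, here is a minimal sketch of a greedy "heaviest first, into the least-loaded VM" heuristic. It is an illustration only and is not taken from `runner/utils.py`; the function name `greedy_allocate` is hypothetical, and the real `allocate_benchmarks` may work differently. The weights are the `Num. variables` values from the table above.

```python
import heapq


def greedy_allocate(weights, num_vms):
    """Assign named items to `num_vms` VMs, heaviest first, always into the lightest VM.

    Hypothetical helper for illustration; not the implementation in runner/utils.py.
    Returns {vm_index: (total_weight, [item names])}.
    """
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    # Heap entries are (current load, VM index, assigned names); the VM index is
    # unique, so the list of names is never compared when ordering entries.
    heap = [(0, vm, []) for vm in range(num_vms)]
    heapq.heapify(heap)
    for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        load, vm, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + weight, vm, names))
    return {vm: (load, names) for load, vm, names in sorted(heap, key=lambda t: t[1])}


num_variables = {
    "IESA-Opt-NL-1-3h": 493813,
    "IESA-Opt-NL-1-1h": 1107013,
    "IESA-Opt-NL-10-3h": 5672804,
}
allocation = greedy_allocate(num_variables, num_vms=2)
for vm, (load, names) in allocation.items():
    print(f"VM {vm:02d}: {len(names)} instances, {load} variables: {names}")
```

With as many VMs as benchmarks, as in the cell above, any such heuristic reduces to one instance per VM.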


**Before launching** the run using the `tofu apply` command above, take a moment to:
1. Estimate the cost. It is easy to launch a run that costs thousands of dollars!
1. Estimate the runtime, and make sure that as much as possible runs in parallel.
1. Check whether anyone else is already running benchmarks, and whether we have enough Gurobi licenses for the number of VMs you want to launch (our current limit is 40).
1. Inspect the YAML files in the `infrastructure/benchmarks/<run-id>/` directory to check that you are launching the benchmarks you expect.
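
The first three checks can be roughed out in a few lines. The sketch below is a back-of-the-envelope helper, not part of `runner/utils.py`: the function name, the machine labels, and above all the hourly prices are placeholder assumptions that you should replace with current prices for your machine types and zone. The per-VM hours are the estimates printed by the allocation cell above, and the limit of 40 is the Gurobi license limit mentioned in the checklist.

```python
# Placeholder prices in USD per VM-hour (assumed values, NOT real cloud prices).
HOURLY_RATE_USD = {"large-vm": 1.00, "default-vm": 0.25}


def estimate_campaign(vms, vms_already_running=0, max_vms=40):
    """Return (total cost in USD, wall-clock hours) for a list of (label, hours) VMs.

    Hypothetical helper: cost is the sum over VMs of hours times hourly price,
    while wall-clock time is the longest single VM, since VMs run in parallel.
    """
    if len(vms) + vms_already_running > max_vms:
        raise ValueError(
            f"{len(vms)} new + {vms_already_running} running VMs exceeds the limit of {max_vms}"
        )
    total_cost = sum(HOURLY_RATE_USD[label] * hours for label, hours in vms)
    wall_clock_hours = max((hours for _, hours in vms), default=0.0)
    return total_cost, wall_clock_hours


# Estimated runtimes taken from the allocation output above.
vms = [("large-vm", 1575.8), ("large-vm", 307.5), ("default-vm", 137.2)]
cost, wall_clock = estimate_campaign(vms)
print(f"Estimated cost: ${cost:,.2f}; wall-clock time: {wall_clock:.1f}h")
```

Treat the result as an upper-level sanity check rather than a quote: the printed runtime estimates and the placeholder prices both carry a lot of uncertainty.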

## Monitor runs

I recommend:
1. About 10 minutes after launching the run with the `tofu apply` command above, run the following command to list the VMs and count the running benchmark instances, so you can confirm that all VMs started successfully:
    ```
    gcloud compute instances list | sort | tee /dev/tty | grep benchmark-instance | grep -i running | wc -l
    ```
1. A few hours after launching the run, use the `check_uptimes()` command in the cell below to check that the VMs have a load average of at least 1, which means they haven't hung and are crunching away happily.
1. If running large instances with a 24h timeout, use the `fetch_all_partial_results()` command in the cell below to fetch partial results after a day and check that they look as you expect. For example, if all solvers are timing out or erroring, it may be worth killing the run to save costs.

In case of errors, use these commands to SSH into a running VM and see what's happening:
```
gcloud compute ssh projects/compute-app-427709/zones/us-central1-a/instances/benchmark-instance-more-pypsa-de-sizes-04
tail -f /var/log/startup-script.log
cat /solver-benchmark/results/benchmark_results.csv
```

In [None]:
check_uptimes()

There are 1 running instances
benchmark-instance-00-campaign: 06:30:23 up 2 days, 15:32,  1 user,  load average: 1.09, 1.04, 1.08

0 potentially hung instances:
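
For reference, the load-average check can be sketched as a small parser over `uptime`-style lines like the one printed above. This is an assumption about how such a check could work, not the code behind `check_uptimes()`; the helper names are hypothetical, and using the 1-minute value with a threshold of 1 is one possible choice.

```python
import re

# Matches the three load averages at the end of an `uptime` line.
LOAD_RE = re.compile(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)")


def parse_load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages as floats, or None if absent."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def looks_hung(uptime_line, threshold=1.0):
    """Flag a VM whose 1-minute load average is below `threshold`.

    Lines that cannot be parsed are also flagged, so they get a manual look.
    """
    loads = parse_load_averages(uptime_line)
    return loads is None or loads[0] < threshold


sample = (
    "benchmark-instance-00-campaign: 06:30:23 up 2 days, 15:32,  1 user,  "
    "load average: 1.09, 1.04, 1.08"
)
print(parse_load_averages(sample), looks_hung(sample))
```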



In [None]:
fetch_all_partial_results()

Cleared ../results/partial-results
There are 1 running VMs. Fetching results from: benchmark-instance-00-campaign	us-central1-a Done.
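
Once partial results are fetched, a quick tally of outcomes helps decide whether the run is worth continuing. The sketch below assumes the fetched files are CSVs with a status column. The column name `Status`, the status values `ok` and `TO`, and the sample rows are all hypothetical, so adapt them to the actual layout of your results files.

```python
import csv
import io
from collections import Counter


def count_statuses(csv_file, status_column="Status"):
    """Count how often each value of `status_column` occurs in an open CSV file."""
    return Counter(row[status_column] for row in csv.DictReader(csv_file))


def nothing_succeeded(counts, ok_status="ok"):
    """True if there is at least one result and none of them has the OK status."""
    return sum(counts.values()) > 0 and counts[ok_status] == 0


# Made-up sample data standing in for a fetched results file.
sample = io.StringIO(
    "Benchmark,Solver,Status\n"
    "example-bench,solver-a,TO\n"
    "example-bench,solver-b,TO\n"
    "example-bench,solver-c,ok\n"
)
counts = count_statuses(sample)
print(dict(counts), "-> consider stopping the run:", nothing_succeeded(counts))
```

For a real file, open it with `open(path, newline="")` and pass the file object to `count_statuses`.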
