# Run and Download Benchmark Sessions

This notebook helps you:
1. **Query running VMs** - List active benchmark VMs and their status
2. **Download results** - Fetch benchmark results, runner logs, and startup-script logs over SCP from VMs that are still running, whether their benchmarks are in progress or finished
3. **Download from GCS** - Retrieve completed sessions from Google Cloud Storage
4. **Combine results** - Merge results from GCS and SCP sources

## Workflow
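1. Run `tofu apply -var-file benchmarks/{timestamp}_batch/run.tfvars` from `infrastructure/` to start the benchmarks, then set `SESSION_ID` in the configuration cell to the name of that benchmarks directory
2. Use **Query Running VMs** to check VM status while the benchmarks run
3. Use **Download Results from Running VMs** to fetch results and logs over SCP
4. Use **Download from Google Cloud Storage** to fetch results uploaded to GCS
5. Use the remaining cells to combine and analyze the results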

In [12]:
from pathlib import Path

# Configuration
PROJECT_ROOT = Path.cwd().parent
BASE_RESULTS_DIR = PROJECT_ROOT / "results"
INFRASTRUCTURE_DIR = PROJECT_ROOT / "infrastructure"
BASE_BENCHMARKS_DIR = INFRASTRUCTURE_DIR / "benchmarks"
SESSION_ID = "sample_run"

SESSION_RESULTS_DIR = BASE_RESULTS_DIR / SESSION_ID
SESSION_BENCHMARKS_DIR = BASE_BENCHMARKS_DIR / SESSION_ID
VM_SCRIPT = PROJECT_ROOT / "vms_gcloud.py"
VM_NAME_PREFIX = "benchmark-instance"

print(f"Benchmarks Dir: {SESSION_BENCHMARKS_DIR}")
!ls {SESSION_BENCHMARKS_DIR}

print(f"\nResults Dir: {SESSION_RESULTS_DIR}")

print(f"\nSession config:\n{'='*60}\n")
!cat $SESSION_BENCHMARKS_DIR/run.tfvars

Benchmarks Dir: /home/madhukar/oet/solver-benchmark/infrastructure/benchmarks/sample_run
run.tfvars  standard-01.yaml  standard-02.yaml

Results Dir: /home/madhukar/oet/solver-benchmark/results/sample_run

Session config:

project_id = "compute-app-427709"
# This will be overridden if a value is specified in the input metadata file
zone = "europe-west4-a"
# Optional
enable_gcs_upload = true
auto_destroy_vm = false
benchmarks_dir = "benchmarks/sample_run"
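The configuration cell only displays `run.tfvars`. If later cells should react to its settings (for example, skipping the GCS download when `enable_gcs_upload = false`, or noting that SCP is presumably unavailable once `auto_destroy_vm = true` has removed the VMs), the file can be read into a dict. The sketch below uses a hypothetical `parse_simple_tfvars` helper that understands only the flat `key = value` lines seen here; it is not a general HCL parser.

```python
import re

# Matches `key = value` where key is a Terraform-style identifier
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_-]*)\s*=\s*(.+?)\s*$")


def parse_simple_tfvars(text: str) -> dict:
    """Parse flat `key = value` lines from a .tfvars file (hypothetical helper).

    Only single-line quoted strings, true/false, and numbers are understood.
    Lists, maps, heredocs, and inline comments are not handled.
    """
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and full-line comments (# or //)
        if not stripped or stripped.startswith(("#", "//")):
            continue
        match = _ASSIGNMENT.match(stripped)
        if not match:
            continue
        key, raw = match.groups()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            values[key] = raw[1:-1]  # quoted string
        elif raw in ("true", "false"):
            values[key] = raw == "true"  # HCL boolean
        else:
            try:
                values[key] = int(raw)
            except ValueError:
                try:
                    values[key] = float(raw)
                except ValueError:
                    values[key] = raw  # anything else is kept as raw text
    return values


# Sample mirroring the run.tfvars printed above
sample_tfvars = '''project_id = "compute-app-427709"
# This will be overridden if a value is specified in the input metadata file
zone = "europe-west4-a"
# Optional
enable_gcs_upload = true
auto_destroy_vm = false
benchmarks_dir = "benchmarks/sample_run"
'''
config = parse_simple_tfvars(sample_tfvars)
print(config["enable_gcs_upload"], config["auto_destroy_vm"])  # True False
```

In the notebook itself, the input would be `(SESSION_BENCHMARKS_DIR / "run.tfvars").read_text()`.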

## Query Running VMs

In [13]:
# Query standard VMs
!python $VM_SCRIPT $VM_NAME_PREFIX --output table

Using current project: compute-app-427709
Discovering zones across 1 projects...
Querying VMs across 127 zones...
Progress: 50/127 zones checked
Progress: 100/127 zones checked

Found 2 matching VMs

Project                        Zone                 Name                           Status       Machine Type         Internal IP     External IP    
------------------------------------------------------------------------------------------------------------------------------------------------------------------
compute-app-427709             europe-west4-a       benchmark-instance-standard-01 RUNNING      c4-standard-2        10.164.0.2      34.90.240.138  
compute-app-427709             europe-west4-a       benchmark-instance-standard-02 RUNNING      c4-standard-2        10.164.0.3      34.90.225.213  
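The `--output table` listing is meant for reading, but the same text can be turned into records, for example to collect the names of VMs that are still `RUNNING`. This sketch assumes the layout shown above: a header, a dashed separator, then one whitespace-separated row per VM. `parse_vm_table` is a hypothetical helper, not part of `vms_gcloud.py`, and an empty cell (such as a VM without an external IP) would break the whitespace split.

```python
from typing import Dict, List

# Column order as printed by `vms_gcloud.py --output table` in the cell above
COLUMNS = ["project", "zone", "name", "status", "machine_type", "internal_ip", "external_ip"]


def parse_vm_table(output: str) -> List[Dict[str, str]]:
    """Turn the printed VM table into one dict per VM (hypothetical helper)."""
    rows = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            in_table = True  # data rows start after the dashed separator
            continue
        if not in_table:
            continue  # skip progress messages and the header line
        if not stripped:
            break  # a blank line ends the table
        fields = stripped.split()
        # Assumes every cell is a single token and none is empty
        if len(fields) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


# The output above, with the column padding shortened
sample_output = """Found 2 matching VMs

Project    Zone    Name    Status    Machine Type    Internal IP    External IP
------------------------------------------------------------------------------
compute-app-427709  europe-west4-a  benchmark-instance-standard-01 RUNNING  c4-standard-2  10.164.0.2  34.90.240.138
compute-app-427709  europe-west4-a  benchmark-instance-standard-02 RUNNING  c4-standard-2  10.164.0.3  34.90.225.213
"""

running = [row["name"] for row in parse_vm_table(sample_output) if row["status"] == "RUNNING"]
print(running)
```

To capture live output instead of a pasted sample, something like `subprocess.run([sys.executable, str(VM_SCRIPT), VM_NAME_PREFIX, "--output", "table"], capture_output=True, text=True).stdout` should work, assuming the script writes the table to stdout.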


In [14]:
# Run commands over ssh for running VMs
!python $VM_SCRIPT $VM_NAME_PREFIX --ssh "uptime"

Using current project: compute-app-427709
Discovering zones across 1 projects...
Querying VMs across 127 zones...
Progress: 50/127 zones checked
Progress: 100/127 zones checked

Found 2 matching VMs
Executing command on 2 VMs: uptime

✓ benchmark-instance-standard-01: Success
STDOUT:
 14:17:53 up  1:14,  1 user,  load average: 0.00, 0.00, 0.00


✓ benchmark-instance-standard-02: Success
STDOUT:
 14:17:53 up  1:14,  1 user,  load average: 0.00, 0.00, 0.00


Completed: 2/2 successful
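The most useful part of the `uptime` output is the load average. On Linux, each process that is actively using a CPU adds roughly 1 to the load, so a solver that is still working on a 2-vCPU `c4-standard-2` VM should keep it well above zero. Readings of 0.00, as above, suggest that nothing is running. The sketch below extracts the three load averages. The `looks_idle` threshold of 0.1 on the 5-minute value is an arbitrary assumption, not a property of the benchmark runner.

```python
import re
from typing import Optional, Tuple

# Linux `uptime` prints "load average:"; some BSD/macOS builds print "load averages:"
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(uptime_line: str) -> Optional[Tuple[float, float, float]]:
    """Return the 1-, 5- and 15-minute load averages, or None if not found."""
    match = _LOAD_RE.search(uptime_line)
    if match is None:
        return None
    one, five, fifteen = (float(x) for x in match.groups())
    return one, five, fifteen


def looks_idle(uptime_line: str, threshold: float = 0.1) -> bool:
    """Heuristic: treat a VM as idle if its 5-minute load is below `threshold`."""
    loads = parse_load_average(uptime_line)
    return loads is not None and loads[1] < threshold


line = " 14:17:53 up  1:14,  1 user,  load average: 0.00, 0.00, 0.00"
print(parse_load_average(line), looks_idle(line))  # (0.0, 0.0, 0.0) True
```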


## Download Results from Running VMs

In [15]:
# Download benchmark_results.csv
!python $VM_SCRIPT $VM_NAME_PREFIX '--scp-source' 'vm:/solver-benchmark/results/benchmark_results.csv' '--scp-dest' $SESSION_RESULTS_DIR/scp/{{vm_name}}/benchmark_results.csv

Using current project: compute-app-427709
Discovering zones across 1 projects...
Querying VMs across 127 zones...
Progress: 50/127 zones checked
Progress: 100/127 zones checked

Found 2 matching VMs
Copying from 2 VMs...
✓ benchmark-instance-standard-01: Success
✓ benchmark-instance-standard-02: Success

Completed: 2/2 successful
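In these download commands, IPython expands `$SESSION_RESULTS_DIR` from the Python variable. The doubled braces in `{{vm_name}}` are an escape that passes a literal `{vm_name}` to `vms_gcloud.py`, which then fills in each VM's name, as the per-VM folders in the directory listing further down confirm. The following is a minimal sketch of that per-VM expansion, assuming plain string substitution. `expand_dest` is hypothetical and may differ from what the script actually does.

```python
from pathlib import Path
from typing import Dict, List

# IPython's `!` expansion uses str.format-style rules: {{ and }} become
# literal braces, so the script receives "{vm_name}" verbatim.
assert "{{vm_name}}".format() == "{vm_name}"


def expand_dest(template: str, vm_names: List[str]) -> Dict[str, Path]:
    """Hypothetical: map each VM name to the local path for its copy."""
    # str.replace (not str.format) leaves any other braces in the path alone
    return {vm: Path(template.replace("{vm_name}", vm)) for vm in vm_names}


dests = expand_dest(
    "results/sample_run/scp/{vm_name}/benchmark_results.csv",
    ["benchmark-instance-standard-01", "benchmark-instance-standard-02"],
)
for vm, path in dests.items():
    print(vm, "->", path)
```

Giving each VM its own folder is what keeps the identically named `benchmark_results.csv` files from overwriting each other.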


In [16]:
# Download startup-script.log
!python $VM_SCRIPT $VM_NAME_PREFIX '--scp-source' 'vm:/var/log/startup-script.log' '--scp-dest' $SESSION_RESULTS_DIR/scp/{{vm_name}}/startup-script.log

Using current project: compute-app-427709
Discovering zones across 1 projects...
Querying VMs across 127 zones...
Progress: 50/127 zones checked
Progress: 100/127 zones checked

Found 2 matching VMs
Copying from 2 VMs...
✓ benchmark-instance-standard-01: Success
✓ benchmark-instance-standard-02: Success

Completed: 2/2 successful


In [17]:
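# Download runner logs (recursive)
# Note: the remote logs/ directory is copied *into* the destination, so files end up under logs/logs/ (see the tree listing below)
!python $VM_SCRIPT $VM_NAME_PREFIX '--scp-source' 'vm:/solver-benchmark/runner/logs/' '--scp-dest' $SESSION_RESULTS_DIR/scp/{{vm_name}}/logs/ --recursive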

Using current project: compute-app-427709
Discovering zones across 1 projects...
Querying VMs across 127 zones...
Progress: 50/127 zones checked
Progress: 100/127 zones checked

Found 2 matching VMs
Copying from 2 VMs...
✓ benchmark-instance-standard-01: Success
✓ benchmark-instance-standard-02: Success

Completed: 2/2 successful
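The recursive copy nests the runner logs one level deeper than intended (`<vm>/logs/logs/<file>.log`, visible in the directory listing further down). The sketch below tidies this up after the download. `flatten_nested_logs` is a hypothetical helper: it moves entries up one level, never overwrites an existing file, and removes the inner directory only once it is empty. Another option is to pass `$SESSION_RESULTS_DIR/scp/{{vm_name}}/` as `--scp-dest`, which may avoid the nesting depending on how `vms_gcloud.py` calls scp. That variant is untested here.

```python
import shutil
from pathlib import Path


def flatten_nested_logs(vm_dir: Path) -> int:
    """Move entries from <vm_dir>/logs/logs/ up into <vm_dir>/logs/.

    Existing entries in <vm_dir>/logs/ are never overwritten; the nested
    directory is removed only once it is empty. Returns the number moved.
    """
    nested = Path(vm_dir) / "logs" / "logs"
    if not nested.is_dir():
        return 0
    moved = 0
    # Materialize the listing first so moving entries doesn't disturb iteration
    for item in list(nested.iterdir()):
        target = nested.parent / item.name
        if target.exists():
            continue  # leave name clashes for manual inspection
        shutil.move(str(item), str(target))
        moved += 1
    if not any(nested.iterdir()):
        nested.rmdir()
    return moved
```

To apply it to the session, run `for vm_dir in (SESSION_RESULTS_DIR / "scp").iterdir(): flatten_nested_logs(vm_dir)`. The CSV loader further down searches with `**/benchmark_results.csv`, so it is unaffected either way.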


## Download from Google Cloud Storage

In [18]:
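# Download results and logs from GCS (stderr is discarded, so any failure, including an authentication error, is reported as "not found")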
!gsutil -m cp -r gs://solver-benchmarks/results/$SESSION_ID'*' $SESSION_RESULTS_DIR/gcs/results 2>/dev/null || echo "No GCS results found"
!gsutil -m cp -r gs://solver-benchmarks/logs/$SESSION_ID'*' $SESSION_RESULTS_DIR/gcs/logs 2>/dev/null || echo "No GCS logs found"

No GCS results found
No GCS logs found


In [19]:
# List downloaded results
!tree $SESSION_RESULTS_DIR/ 2>/dev/null || find $SESSION_RESULTS_DIR -type f | head -20

/home/madhukar/oet/solver-benchmark/results/sample_run/
└── scp
    ├── benchmark-instance-standard-01
    │   ├── benchmark_results.csv
    │   ├── logs
    │   │   └── logs
    │   │       └── pypsa-eur-elec-op-ucconv-2-3h-highs-1.10.0.log
    │   └── startup-script.log
    └── benchmark-instance-standard-02
        ├── benchmark_results.csv
        ├── logs
        │   └── logs
        │       └── Sienna_modified_RTS_GMLC_DA_sys_NetDC_Horizon24_Day332-1-1h-highs-1.10.0.log
        └── startup-script.log

8 directories, 6 files
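Here `tree` wrote colour escape codes into the captured output, and `find` is not available everywhere (on Windows, for example). A pure-Python listing avoids both problems. `list_files` is a hypothetical helper that behaves roughly like `find ... -type f | sort | head -20`: it returns at most `limit` file paths, relative to the given directory and sorted.

```python
from pathlib import Path
from typing import List, Union


def list_files(root: Union[str, Path], limit: int = 20) -> List[str]:
    """Return up to `limit` file paths under `root`, relative to it and sorted."""
    root = Path(root)
    if not root.is_dir():
        return []  # missing directory: nothing downloaded yet
    relative = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return relative[:limit]


# Usage in this notebook:
# for rel in list_files(SESSION_RESULTS_DIR):
#     print(rel)
```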


In [24]:
import pandas as pd

def load_and_combine_results(session_dir):
    """
    Load benchmark results from GCS and SCP, preferring GCS over SCP.

    Returns:
        pd.DataFrame: Combined results from all sources
    """
    session_dir = Path(session_dir)
    all_dfs = []

    # Load GCS results first (preferred)
    gcs_dir = session_dir / "gcs"
    gcs_count = 0
    if gcs_dir.exists():
        for csv_file in sorted(gcs_dir.glob("**/benchmark_results.csv")):
            try:
                df = pd.read_csv(csv_file)
                df['_source'] = 'gcs'
                all_dfs.append(df)
                gcs_count += 1
                print(f"  Loaded {len(df)} rows from {csv_file.parent.name}")
            except Exception as e:
                print(f"  Error loading {csv_file}: {e}")

    print(f"Found {gcs_count} GCS CSV files" if gcs_count else "No GCS results found")

    # Load SCP results
    scp_dir = session_dir / "scp"
    scp_count = 0
    if scp_dir.exists():
        for csv_file in sorted(scp_dir.glob("**/benchmark_results.csv")):
            try:
                df = pd.read_csv(csv_file)
                df['_source'] = 'scp'
                all_dfs.append(df)
                scp_count += 1
                print(f"  Loaded {len(df)} rows from {csv_file.parent.name}")
            except Exception as e:
                print(f"  Error loading {csv_file}: {e}")

    print(f"Found {scp_count} SCP CSV files" if scp_count else "No SCP results found")

    # Combine all dataframes and deduplicate, preferring GCS
    if all_dfs:
        combined_df = pd.concat(all_dfs, ignore_index=True)

        # Deduplicate: keep GCS over SCP for same benchmark runs
        # Sort so GCS comes first (a stable sort keeps the original order within
        # each source), then drop duplicates keeping the first occurrence
        combined_df = combined_df.sort_values('_source', key=lambda x: (x != 'gcs'), kind='stable').reset_index(drop=True)

        # Identify duplicates by benchmark data (all columns except _source)
        cols_to_check = [c for c in combined_df.columns if c != '_source']
        combined_df = combined_df.drop_duplicates(subset=cols_to_check, keep='first')

        print(f"\nSuccessfully combined {len(combined_df)} total rows (after deduplication)")
        return combined_df
    else:
        print("\nNo CSV files found")
        return None

# Load results
results = load_and_combine_results(SESSION_RESULTS_DIR)

No GCS results found
  Loaded 1 rows from benchmark-instance-standard-01
  Loaded 1 rows from benchmark-instance-standard-02
Found 2 SCP CSV files

Successfully combined 2 total rows (after deduplication)
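The preference for GCS over SCP works by sorting GCS rows first and then keeping the first row of each duplicate set. The toy example below uses made-up `Run ID` and runtime values to show the effect: a row that exists in both sources survives only as its GCS copy.

```python
import pandas as pd

# Toy data: made-up run IDs and runtimes, not real benchmark results
gcs = pd.DataFrame({"Run ID": ["r1", "r2"], "Runtime (s)": [10.0, 20.0], "_source": "gcs"})
scp = pd.DataFrame({"Run ID": ["r2", "r3"], "Runtime (s)": [20.0, 30.0], "_source": "scp"})

# SCP rows deliberately come first, so the sort has to move GCS ahead
combined = pd.concat([scp, gcs], ignore_index=True)
# False (GCS) sorts before True (SCP); "stable" keeps order within each source
combined = combined.sort_values("_source", key=lambda s: s != "gcs", kind="stable")

# Duplicates are matched on every column except the provenance tag
cols = [c for c in combined.columns if c != "_source"]
deduped = combined.drop_duplicates(subset=cols, keep="first").reset_index(drop=True)
print(deduped)
```

Because duplicates are matched on every column except `_source`, an SCP copy taken mid-run and a later GCS copy collapse into one row only if all their values agree. Matching on a narrower key, for example `["Benchmark", "Size", "Solver", "Run ID"]`, would prefer the GCS row even when values differ. Which columns actually identify a run is an assumption to check against the runner.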


In [25]:
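def get_timeout_by_machine(runs: pd.DataFrame) -> dict:
    """Infer timeout values by machine type from actual results.

    Machine types are guessed from hostname patterns ("highmem" maps to
    c4-highmem-8, "standard" to c4-standard-2). If several rows map to the
    same machine type, the last timeout seen wins.
    """
    timeout_map = {}

    # Find VM hostname column; names are compared after lowercasing and
    # removing spaces, underscores, and hyphens
    vm_col = None
    vm_candidates = {"hostname", "host", "vmhostname", "vm", "instancename", "instance"}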
    for c in runs.columns:
        normalized = c.lower().replace(" ", "").replace("_", "").replace("-", "")
        if normalized in vm_candidates:
            vm_col = c
            break

    if vm_col and "Timeout" in runs.columns:
        # Map hostname patterns to machine types
        for _, row in runs.iterrows():
            hostname = str(row.get(vm_col, ""))
            timeout = row.get("Timeout")

            if pd.notna(timeout):
                if "highmem" in hostname.lower():
                    timeout_map["c4-highmem-8"] = timeout
                elif "standard" in hostname.lower():
                    timeout_map["c4-standard-2"] = timeout

    return timeout_map

if results is not None:
    timeout_by_machine = get_timeout_by_machine(results)
    print("\nTimeout by machine type:")
    for machine_type, timeout in timeout_by_machine.items():
        print(f"  {machine_type}: {timeout}s")


Timeout by machine type:
  c4-standard-2: 600s
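`get_timeout_by_machine` keeps only the last timeout seen for each machine type, so a session with mixed timeouts would be reported with a single value. A `groupby` variant that collects every distinct timeout makes such conflicts visible. The pattern table copies the notebook's hard-coded guesses (`highmem` to `c4-highmem-8`, `standard` to `c4-standard-2`). `timeouts_by_machine` and the `highmem` sample hostname are hypothetical.

```python
from typing import Dict, List, Optional

import pandas as pd

# Hypothetical pattern table; order matters, as in the if/elif chain above
HOSTNAME_PATTERNS = {"highmem": "c4-highmem-8", "standard": "c4-standard-2"}


def machine_type_for(hostname: str) -> Optional[str]:
    """Return the machine type for the first pattern found in the hostname."""
    lowered = hostname.lower()
    for pattern, machine_type in HOSTNAME_PATTERNS.items():
        if pattern in lowered:
            return machine_type
    return None


def timeouts_by_machine(runs: pd.DataFrame, host_col: str = "Hostname") -> Dict[str, List]:
    """Map each machine type to the sorted list of distinct timeouts seen."""
    frame = pd.DataFrame({
        "machine": runs[host_col].astype(str).map(machine_type_for),
        "timeout": runs["Timeout"],
    }).dropna()  # drops unrecognized hostnames and missing timeouts
    grouped = frame.groupby("machine")["timeout"].unique()
    return {machine: sorted(values.tolist()) for machine, values in grouped.items()}


sample_runs = pd.DataFrame({
    "Hostname": ["benchmark-instance-standard-01", "benchmark-instance-standard-02",
                 "benchmark-instance-highmem-01"],
    "Timeout": [600, 600, 3600],
})
print(timeouts_by_machine(sample_runs))
```

Any machine type whose list has more than one entry ran with inconsistent timeouts. Downstream code that expects a single number should probably flag such a session rather than silently picking one value.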


In [27]:
if results is not None:
    print("\n" + "=" * 100)
    print("COMBINED RESULTS SUMMARY")
    print("=" * 100)

    print(f"\nTotal rows: {len(results)}")
    print(f"Columns: {list(results.columns)}")

    if 'Hostname' in results.columns:
        print("\nVM breakdown:")
        print(results['Hostname'].value_counts().to_string())

    print("\nSource breakdown:")
    print(results['_source'].value_counts().to_string())

    if 'Solver' in results.columns:
        print("\nSolver breakdown:")
        print(results['Solver'].value_counts().to_string())

    if 'Status' in results.columns:
        print("\nStatus breakdown:")
        print(results['Status'].value_counts().to_string())

    if 'Benchmark' in results.columns and 'Size' in results.columns:
        print(f"\nUnique benchmarks: {results['Benchmark'].nunique()}")
        print(f"Unique sizes: {results['Size'].nunique()}")


COMBINED RESULTS SUMMARY

Total rows: 2
Columns: ['Benchmark', 'Size', 'Solver', 'Solver Version', 'Solver Release Year', 'Status', 'Termination Condition', 'Runtime (s)', 'Memory Usage (MB)', 'Objective Value', 'Max Integrality Violation', 'Duality Gap', 'Reported Runtime (s)', 'Timeout', 'Hostname', 'Run ID', 'Timestamp', '_source']

VM breakdown:
Hostname
benchmark-instance-standard-01    1
benchmark-instance-standard-02    1

Source breakdown:
_source
scp    2

Solver breakdown:
Solver
highs    2

Status breakdown:
Status
TO    2

Unique benchmarks: 2
Unique sizes: 2
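Each breakdown above counts one column on its own. A cross-tabulation shows which VM produced which status, which makes it easy to spot a single VM whose runs all ended the same way once a session has more VMs and statuses. The sketch below uses a hypothetical `status_by_vm` helper built on `pd.crosstab`, with row and column totals. The sample rows reproduce the two results loaded above.

```python
import pandas as pd


def status_by_vm(runs: pd.DataFrame) -> pd.DataFrame:
    """Count result statuses per VM (rows: Hostname, columns: Status, plus totals)."""
    return pd.crosstab(runs["Hostname"], runs["Status"], margins=True, margins_name="Total")


sample_results = pd.DataFrame({
    "Hostname": ["benchmark-instance-standard-01", "benchmark-instance-standard-02"],
    "Status": ["TO", "TO"],
})
print(status_by_vm(sample_results))
```

In the notebook, `status_by_vm(results)` would apply the same view to the combined DataFrame.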
