# Investigating variance in runtime / memory consumption

I ran all solvers on the two smallest benchmarks 10 times each to see how much the runtime and memory metrics vary between runs.

In [1]:
import pandas as pd
import plotly.express as px

In [14]:
# NOTE this used the results from commit `9e6d959`
data_file = "./benchmark_results.csv"  # NOTE relative path!
df = pd.read_csv(data_file)
df

Unnamed: 0,Benchmark,Solver,Status,Termination Condition,Objective Value,Runtime (s),Memory Usage (MB)
0,pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,3.139636e+11,5.985078,408.528
1,pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,3.139636e+11,5.521643,396.212
2,pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,3.139636e+11,5.608232,396.228
3,pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,3.139636e+11,5.632786,401.760
4,pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,3.139636e+11,5.560092,405.600
...,...,...,...,...,...,...,...
75,pypsa-wind+sol+ely-1h,scip,ok,optimal,8.466753e+10,209.339396,719.224
76,pypsa-wind+sol+ely-1h,scip,ok,optimal,8.466753e+10,209.296401,717.900
77,pypsa-wind+sol+ely-1h,scip,ok,optimal,8.466753e+10,209.478274,721.768
78,pypsa-wind+sol+ely-1h,scip,ok,optimal,8.466753e+10,210.108862,719.096


In [15]:
df["benchmark_solver"] = df["Benchmark"] + " - " + df["Solver"]

fig = px.violin(
    df,
    x="benchmark_solver",
    y="Runtime (s)",
    box=True,  # Adds a box plot inside the violin for additional stats
    points="all",  # Shows all individual data points
    title="Runtime Distribution per Benchmark-Solver Combination",
    labels={"benchmark_solver": "Benchmark - Solver"},
)
fig.show()

In [16]:
print("Variance in runtime")
stats = (
    df.groupby(["Benchmark", "Solver"])["Runtime (s)"]
    .agg(["mean", "std"])
    .reset_index()
)
# Calculate the Coefficient of Variation (CV) as (stddev / mean) * 100
stats["CV"] = (stats["std"] / stats["mean"]) * 100
stats.round(2)

Variance in runtime


Unnamed: 0,Benchmark,Solver,mean,std,CV
0,pypsa-wind+sol+ely-1h,glpk,240.09,2.96,1.23
1,pypsa-wind+sol+ely-1h,gurobi,23.65,0.22,0.91
2,pypsa-wind+sol+ely-1h,highs,108.58,1.1,1.01
3,pypsa-wind+sol+ely-1h,scip,210.23,1.37,0.65
4,pypsa-wind+sol+ely-1h-ucwind,glpk,55.31,0.79,1.43
5,pypsa-wind+sol+ely-1h-ucwind,gurobi,5.64,0.13,2.39
6,pypsa-wind+sol+ely-1h-ucwind,highs,18.7,0.12,0.63
7,pypsa-wind+sol+ely-1h-ucwind,scip,119.88,0.69,0.58


In [17]:
print("Variance in memory usage")
stats = (
    df.groupby(["Benchmark", "Solver"])["Memory Usage (MB)"]
    .agg(["mean", "std"])
    .reset_index()
)
# Calculate the Coefficient of Variation (CV) as (stddev / mean) * 100
stats["CV"] = (stats["std"] / stats["mean"]) * 100
stats.round(2)

Variance in memory usage


Unnamed: 0,Benchmark,Solver,mean,std,CV
0,pypsa-wind+sol+ely-1h,glpk,346.79,1.87,0.54
1,pypsa-wind+sol+ely-1h,gurobi,383.12,1.56,0.41
2,pypsa-wind+sol+ely-1h,highs,515.69,1.88,0.36
3,pypsa-wind+sol+ely-1h,scip,718.53,2.96,0.41
4,pypsa-wind+sol+ely-1h-ucwind,glpk,408.12,1.28,0.31
5,pypsa-wind+sol+ely-1h-ucwind,gurobi,401.5,3.78,0.94
6,pypsa-wind+sol+ely-1h-ucwind,highs,490.33,1.67,0.34
7,pypsa-wind+sol+ely-1h-ucwind,scip,1104.39,1.8,0.16


## Results

There is little variance in either metric. The Coefficient of Variation (CV) is below 2.5% for runtime (the highest is 2.39%, for gurobi on pypsa-wind+sol+ely-1h-ucwind) and below 1% for memory consumption.
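As a sanity check on the metric itself, here is a minimal sketch of the CV calculation using only the standard library. It uses just the five Gurobi runtimes visible in the truncated table at the top, so it is not expected to reproduce the 2.39% reported for the full set of runs. The helper name `coefficient_of_variation` is made up for illustration. `statistics.stdev` computes the sample standard deviation, which is also what pandas' `std` does by default.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# The five pypsa-wind+sol+ely-1h-ucwind / gurobi runtimes (s) shown in the
# truncated output above; the full CSV has more runs for this pair.
gurobi_ucwind_runtimes = [5.985078, 5.521643, 5.608232, 5.632786, 5.560092]

cv = coefficient_of_variation(gurobi_ucwind_runtimes)
print(f"CV over the visible runs: {cv:.2f}%")
```

Because the CV is normalized by the mean, it allows a 5-second solve and a 200-second solve to be compared on the same scale.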

# First vs Second Benchmark Runs

When I did a second benchmark run with the timeout T increased from 5 min to 15 min, the diff showed large changes in runtime, even for benchmarks that had finished well within the original timeout. So let's look into that:
https://github.com/open-energy-transition/solver-benchmark/pull/27/files#diff-bd83e19dfe54f3c90d4f126de87f2b220aed062b851761720081d0ace78db25c

In [18]:
# Compare results of 2 benchmarking runs to see how much runtimes varied

from io import StringIO

results_before = pd.read_csv(
    StringIO("""
Benchmark,Solver,Status,Termination Condition,Objective Value,Runtime (s),Memory Usage (MB)
pypsa-eur-sec-2-lv1-3h,gurobi,TO,Timeout,,300,1787.848
pypsa-eur-sec-2-lv1-3h,highs,TO,Timeout,,300,1855.136
pypsa-eur-sec-2-lv1-3h,glpk,TO,Timeout,,300,566.772
pypsa-eur-sec-2-lv1-3h,scip,TO,Timeout,,300,4208.88
pypsa-eur-elec-10-lvopt-3h,gurobi,ok,optimal,8338089380.280747,108.03125429153442,2684.152
pypsa-eur-elec-10-lvopt-3h,highs,TO,Timeout,,300,2414.744
pypsa-eur-elec-10-lvopt-3h,glpk,TO,Timeout,,300,566.764
pypsa-eur-elec-10-lvopt-3h,scip,TO,Timeout,,300,5756.92
pypsa-eur-elec-20-lv1-3h-op,gurobi,ok,optimal,7070825187.397594,26.436074018478394,1017.264
pypsa-eur-elec-20-lv1-3h-op,highs,TO,Timeout,,300,853.76
pypsa-eur-elec-20-lv1-3h-op,glpk,TO,Timeout,,300,319.676
pypsa-eur-elec-20-lv1-3h-op,scip,TO,Timeout,,300,1668.304
pypsa-eur-elec-20-lv1-3h-op-ucconv,gurobi,ok,optimal,10504487082.690851,44.94551229476929,1140.84
pypsa-eur-elec-20-lv1-3h-op-ucconv,highs,TO,Timeout,,300,1047.224
pypsa-eur-elec-20-lv1-3h-op-ucconv,glpk,TO,Timeout,,300,326.776
pypsa-eur-elec-20-lv1-3h-op-ucconv,scip,TO,Timeout,,300,1970.784
pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,313963605214.4336,5.462620973587036,451.772
pypsa-wind+sol+ely-1h-ucwind,highs,ok,optimal,313963605214.43445,19.595200538635254,536.044
pypsa-wind+sol+ely-1h-ucwind,glpk,ok,optimal,313963605200.0,52.29001998901367,408.768
pypsa-wind+sol+ely-1h-ucwind,scip,ok,optimal,313963605214.43396,111.93103623390198,1103.992
pypsa-wind+sol+ely-1h,gurobi,ok,optimal,84667526618.31015,5.820308685302734,454.436
pypsa-wind+sol+ely-1h,highs,ok,optimal,84667526618.31026,102.11832928657532,519.06
pypsa-wind+sol+ely-1h,glpk,ok,optimal,84667526620.0,228.0925920009613,347.772
pypsa-wind+sol+ely-1h,scip,ok,optimal,84667526618.308,196.46365356445312,714.24
""")
)

results_after = pd.read_csv(
    StringIO("""
Benchmark,Solver,Status,Termination Condition,Objective Value,Runtime (s),Memory Usage (MB)
pypsa-eur-sec-2-lv1-3h,gurobi,ok,optimal,46838337007.19579,566.878669500351,1799.212
pypsa-eur-sec-2-lv1-3h,highs,TO,Timeout,,900,2024.428
pypsa-eur-sec-2-lv1-3h,glpk,TO,Timeout,,900,550.112
pypsa-eur-sec-2-lv1-3h,scip,TO,Timeout,,900,4318.716
pypsa-eur-elec-10-lvopt-3h,gurobi,ok,optimal,8338089380.280747,73.17695474624634,2679.74
pypsa-eur-elec-10-lvopt-3h,highs,TO,Timeout,,900,2539.628
pypsa-eur-elec-10-lvopt-3h,glpk,TO,Timeout,,900,589.356
pypsa-eur-elec-10-lvopt-3h,scip,TO,Timeout,,900,5972.672
pypsa-eur-elec-20-lv1-3h-op,gurobi,ok,optimal,7070825187.397594,18.829575777053833,923.92
pypsa-eur-elec-20-lv1-3h-op,highs,TO,Timeout,,900,877.132
pypsa-eur-elec-20-lv1-3h-op,glpk,TO,Timeout,,900,319.84
pypsa-eur-elec-20-lv1-3h-op,scip,TO,Timeout,,900,1704.356
pypsa-eur-elec-20-lv1-3h-op-ucconv,gurobi,ok,optimal,10504487082.690851,31.80315923690796,1151.14
pypsa-eur-elec-20-lv1-3h-op-ucconv,highs,TO,Timeout,,900,1045.824
pypsa-eur-elec-20-lv1-3h-op-ucconv,glpk,TO,Timeout,,900,323.68
pypsa-eur-elec-20-lv1-3h-op-ucconv,scip,TO,Timeout,,900,2000.844
pypsa-wind+sol+ely-1h-ucwind,gurobi,ok,optimal,313963605214.4336,3.9050073623657227,457.06
pypsa-wind+sol+ely-1h-ucwind,highs,ok,optimal,313963605214.43445,13.578095197677612,541.208
pypsa-wind+sol+ely-1h-ucwind,glpk,ok,optimal,313963605200.0,33.07230806350708,407.68
pypsa-wind+sol+ely-1h-ucwind,scip,ok,optimal,313963605214.43396,77.28720355033875,1105.62
pypsa-wind+sol+ely-1h,gurobi,ok,optimal,84667526618.31015,3.800584077835083,454.536
pypsa-wind+sol+ely-1h,highs,ok,optimal,84667526618.31026,71.30572986602783,516.756
pypsa-wind+sol+ely-1h,glpk,ok,optimal,84667526620.0,153.40457582473755,347.612
pypsa-wind+sol+ely-1h,scip,ok,optimal,84667526618.308,134.33128952980042,717.12
""")
)

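# Keep only benchmark-solver pairs that finished ("ok") in both runs.
# After the merge, "_x" columns come from the first run and "_y" from the second.
df_2_runs = results_before[results_before["Status"] == "ok"].merge(
    results_after[results_after["Status"] == "ok"], on=["Benchmark", "Solver"]
)
# Negative values mean the second run was faster.
df_2_runs["Runtime Diff"] = df_2_runs["Runtime (s)_y"] - df_2_runs["Runtime (s)_x"]
df_2_runs["Runtime Diff %"] = (
    df_2_runs["Runtime Diff"] * 100 / df_2_runs["Runtime (s)_x"]
)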
df_2_runs[
    ["Benchmark", "Solver", "Runtime (s)_x", "Runtime Diff", "Runtime Diff %"]
].sort_values(by="Runtime Diff %")

Unnamed: 0,Benchmark,Solver,Runtime (s)_x,Runtime Diff,Runtime Diff %
5,pypsa-wind+sol+ely-1h-ucwind,glpk,52.29002,-19.217712,-36.75216
7,pypsa-wind+sol+ely-1h,gurobi,5.820309,-2.019725,-34.701331
9,pypsa-wind+sol+ely-1h,glpk,228.092592,-74.688016,-32.744604
0,pypsa-eur-elec-10-lvopt-3h,gurobi,108.031254,-34.8543,-32.263163
10,pypsa-wind+sol+ely-1h,scip,196.463654,-62.132364,-31.625373
6,pypsa-wind+sol+ely-1h-ucwind,scip,111.931036,-34.643833,-30.951052
4,pypsa-wind+sol+ely-1h-ucwind,highs,19.595201,-6.017105,-30.707036
8,pypsa-wind+sol+ely-1h,highs,102.118329,-30.812599,-30.173427
2,pypsa-eur-elec-20-lv1-3h-op-ucconv,gurobi,44.945512,-13.142353,-29.240635
1,pypsa-eur-elec-20-lv1-3h-op,gurobi,26.436074,-7.606498,-28.773177


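Something seems to have changed between the two runs: every benchmark-solver pair that finished in both runs was roughly 29% to 37% faster in the second run. A change this uniform across solvers and benchmarks points to the environment rather than the solvers. Perhaps I used a different machine configuration, or had something running in the background during the first run.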
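To put a number on how consistent the change is, the sketch below summarizes the `Runtime Diff %` column from the table above using only the standard library. The values are copied from that output, and the variable names are illustrative.

```python
import statistics

# "Runtime Diff %" values copied from the comparison table above, one per
# benchmark-solver pair that finished in both runs.
runtime_diff_pct = [
    -36.75216, -34.701331, -32.744604, -32.263163, -31.625373,
    -30.951052, -30.707036, -30.173427, -29.240635, -28.773177,
]

mean_diff = statistics.mean(runtime_diff_pct)
spread = statistics.stdev(runtime_diff_pct)  # sample standard deviation

print(f"mean change: {mean_diff:.1f}%")
print(f"range: {min(runtime_diff_pct):.1f}% to {max(runtime_diff_pct):.1f}%")
print(f"std dev of the change: {spread:.1f} percentage points")
```

A small spread around a common mean across four different solvers is what one would expect from a machine-level effect, although this alone does not identify the cause.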

# Third Experiment

To double-check that the variance is also low on the larger benchmarks, I ran Gurobi on all benchmarks 10 times each:

In [13]:
df_3 = pd.read_csv("./benchmark_results_gurobi_variance.csv")  # NOTE: relative path!
stats = df_3.groupby(["Benchmark", "Solver"])[["Runtime (s)", "Memory Usage (MB)"]].agg(
    ["mean", "std"]
)
# Calculate the Coefficient of Variation (CV) as (stddev / mean) * 100
stats[("Runtime (s)", "CV")] = (
    stats[("Runtime (s)", "std")] / stats[("Runtime (s)", "mean")]
) * 100
stats[("Memory Usage (MB)", "CV")] = (
    stats[("Memory Usage (MB)", "std")] / stats[("Memory Usage (MB)", "mean")]
) * 100
stats = stats.sort_index(axis=1)
stats.round(2)

Benchmark,Solver,Memory Usage (MB) CV,Memory Usage (MB) mean,Memory Usage (MB) std,Runtime (s) CV,Runtime (s) mean,Runtime (s) std
pypsa-eur-elec-10-lvopt-3h,gurobi,0.42,2673.03,11.28,4.26,78.01,3.33
pypsa-eur-elec-20-lv1-3h-op,gurobi,0.21,1014.37,2.12,2.56,19.25,0.49
pypsa-eur-elec-20-lv1-3h-op-ucconv,gurobi,2.18,1133.43,24.66,4.37,32.81,1.44
pypsa-wind+sol+ely-1h,gurobi,0.32,457.16,1.48,0.69,3.75,0.03
pypsa-wind+sol+ely-1h-ucwind,gurobi,0.36,429.01,1.53,1.54,3.65,0.06


The runtime CV here is a bit higher (up to 4.37%), but still far below 30%, so the roughly 30% difference between the first two runs was probably an outlier rather than normal run-to-run variance.
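One way to check which of the two earlier runs is the unusual one is to express each Gurobi runtime as a number of standard deviations away from the 10-run mean measured here. The sketch below does that with values copied from the tables above. Because the means and standard deviations are rounded to two decimals, the resulting z-scores are only approximate, and the dictionary layout is purely illustrative.

```python
# benchmark: (first-run runtime, second-run runtime, 10-run mean, 10-run std)
# Runtimes are from the first-vs-second comparison; mean and std are the
# rounded Gurobi values from the third experiment.
gurobi_runs = {
    "pypsa-eur-elec-10-lvopt-3h": (108.031254, 73.176955, 78.01, 3.33),
    "pypsa-eur-elec-20-lv1-3h-op": (26.436074, 18.829576, 19.25, 0.49),
    "pypsa-eur-elec-20-lv1-3h-op-ucconv": (44.945512, 31.803159, 32.81, 1.44),
    "pypsa-wind+sol+ely-1h": (5.820309, 3.800584, 3.75, 0.03),
    "pypsa-wind+sol+ely-1h-ucwind": (5.462621, 3.905007, 3.65, 0.06),
}

z_scores = {}
for benchmark, (first, second, mean, std) in gurobi_runs.items():
    # Distance from the 10-run mean, in units of the 10-run std.
    z_scores[benchmark] = ((first - mean) / std, (second - mean) / std)

for benchmark, (z_first, z_second) in z_scores.items():
    print(
        f"{benchmark}: first run {z_first:+.1f} std, "
        f"second run {z_second:+.1f} std"
    )
```

With these inputs, the first run sits more than 8 standard deviations above the mean for every benchmark, while the second run stays within about 5, which suggests that the first run was the slow one. This is only a rough check, since it compares single runs against rounded summary statistics.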