# CLP 2020-1 dataset

Experiments with the COIN-OR Linear Programming solver (CLP) with different parameter settings.

## Computational environment 

A time limit of 4000 seconds for each execution was used. 

Experiments were executed on a computer with an Intel® Core™ i9-7900X processor with 10 processing cores and 94 GB of RAM.

CLP binaries from bintray were used. The binaries were compiled with GCC 9.3.0 and coinbrew on Ubuntu 12.04. 

GNU Parallel was used to keep 12 parallel executions running.

## Instance set

[This repository](https://github.com/h-g-s/cbc-test-set.git) contains all instances used in the experiments, with the exception of 
[nucorsav](http://miplib2017.zib.de/instance_details_nucorsav.html) and [ivu59](https://miplib.zib.de/instance_details_ivu59.html), whose compressed size is more than 100 MB.

## Results checking

Executions that timed out, crashed, or returned an objective function value whose distance from the optimal solution cost exceeded both the absolute tolerance (1e-4) and the relative tolerance (1e-3) were penalized with an execution time of 8000 seconds (twice the time limit). We refer to these executions as *failures*.
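The penalty rule above can be sketched as a small function. This is a minimal sketch, not the checking script actually used: the name `penalized_time`, its arguments, the timeout test (`seconds >= TIME_LIMIT`), and the exact form of the relative distance (scaled by the magnitude of the optimal cost) are assumptions. Only the tolerances, the time limit, and the penalty value come from the text.

```python
TIME_LIMIT = 4000.0        # seconds, from the text
PENALTY = 2 * TIME_LIMIT   # 8000, assigned to every failure
ABS_TOL = 1e-4
REL_TOL = 1e-3


def penalized_time(seconds, objective, optimal, crashed=False):
    """Return the execution time, or PENALTY if the run counts as a failure.

    Hypothetical helper: `objective` is None when no solution was returned.
    """
    # Assumed timeout test: a run that used the whole time limit timed out.
    if crashed or objective is None or seconds >= TIME_LIMIT:
        return PENALTY
    abs_dist = abs(objective - optimal)
    # Assumed definition of relative distance; guard against a zero optimum.
    rel_dist = abs_dist / max(abs(optimal), 1e-10)
    # A run fails only if it exceeds BOTH tolerances.
    if abs_dist > ABS_TOL and rel_dist > REL_TOL:
        return PENALTY
    return seconds
```

Under these assumptions, a run with objective 100.05 against an optimum of 100.0 is accepted (the relative distance is below 1e-3), while an objective of 101.0 is counted as a failure.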

In [2]:
# load results
import pandas as pd
dfr = pd.read_csv('results-full.csv')
insts = dfr.instance.unique()
algs = dfr.algsetting.unique()
dfr.set_index(['instance', 'algsetting'], inplace=True)

# checking instance results, separating too easy and too hard
inst_summ = []

hard_inst = []
easy_inst = []

too_easy = set()
too_hard = set()

inst_std = []
inst_std_nt = []

for inst in insts:
    resi = dfr.loc[inst]
    avg = resi['result'].mean()
    minr = resi['result'].min()
    maxr = resi['result'].max()
    std = resi['result'].std()
    
    stdnt = resi[resi['result'] < 7999.9]['result'].std()
    
    if avg >= 7999.99:
        hard_inst.append( (inst, minr, maxr, avg, std) )
        too_hard.add(inst)
    else:
        if maxr <= 1:
            easy_inst.append( (inst, minr, maxr, avg, std) )
            too_easy.add(inst)
        else:
            inst_std.append((inst, std))
            inst_std_nt.append((inst, stdnt))
            #print((inst, stdnt))
    
    
    inst_summ.append( (inst, minr, maxr, avg, std) )
    

from IPython.display import display_markdown
from IPython.display import display    

dfi_hard = pd.DataFrame(hard_inst, columns=['instance', 'min', 'max', 'average', 'std'])
dfi_hard.set_index(['instance'], inplace=True)

# gurobi results
dfgrb = pd.read_csv('relax-grb-results.csv')
dfgrb.set_index(['instance'], inplace=True)

hard_res = []
for inst in dfi_hard.index.array:
    hard_res.append( (inst, dfgrb.loc[inst]['seconds'] ) )



display_markdown("""### Instances that are too hard

For the instances below, CLP failed within the time limit with all algorithm settings.
The time that the parallel **Gurobi** LP solver (with concurrent LP solvers) took to solve
these instances on the same 10-core machine is included.
""", raw=True)

dfhard_res = pd.DataFrame(hard_res, columns=['instance', 'parallel gurobi time'])
dfhard_res.set_index('instance', inplace=True)

display(dfhard_res)

display_markdown("""As can be seen, these instances are hard even for Gurobi, which uses a large part of the
time limit to solve them with the concurrent parallel solvers.""", raw=True)


display_markdown("""### Instances that are too easy

Instances where all executions took at most one second.""", raw=True)
    
dfeasy = pd.DataFrame(easy_inst, columns=['instance', 'min', 'max', 'average', 'std'])
dfeasy.set_index('instance', inplace=True)


display(dfeasy)

# write the filtered dataset; the context manager closes the file even on errors
with open('results.csv', 'w') as fr:
    fr.write('instance,algsetting,result\n')
    for inst in insts:
        if inst in too_easy or inst in too_hard:
            continue
        for alg in algs:
            res = dfr.loc[(inst, alg)]['result']
            fr.write('{},{},{}\n'.format(inst, alg, res))



## Resulting dataset
#display_markdown("""### Resulting dataset
#""", raw=True)

dfr = pd.read_csv('results.csv')
#display(dfr)

## Best on average parameter settings
display_markdown("""### Best on average parameter settings""", raw=True)

fa = open('algs.csv', 'w')
fa.write('algsetting,avg,nfails,min,max,stddev\n')

alg_res = []
dfr.set_index(['algsetting', 'instance'], inplace=True)
for alg in algs:
    resa = dfr.loc[alg]
    avg = resa['result'].mean()
    minr = resa['result'].min()
    maxr = resa['result'].max()
    std = resa['result'].std()
    nfail = len(resa[resa['result'] == 8000.0])
    alg_res.append((alg, avg, std, nfail))
    # one value per header column: algsetting,avg,nfails,min,max,stddev
    fa.write('%s,%g,%d,%g,%g,%g\n' % (alg, avg, nfail, minr, maxr, std))
    
    
    
fa.close()
    
dfas = pd.DataFrame(alg_res, columns=['algsetting', 'average', 'std.dev.', 'n. failures'])
dfas.set_index(['algsetting'], inplace=True)
dfas.sort_values(by=['average'], inplace=True)
display(dfas.head())

display_markdown("""### Worst on average parameter settings""", raw=True)
display(dfas.tail())

## Instances with largest variability

display_markdown("""### Instances with largest standard deviation""", raw=True)

dfstd = pd.DataFrame(inst_std, columns=['instance', 'std.dev.'])
dfstd.set_index(['instance'], inplace=True)
dfstd.sort_values(by=['std.dev.'], inplace=True, ascending=False)
display(dfstd.head())

display_markdown("""### Instances with largest standard deviation not considering failures""", raw=True)

dfstd2 = pd.DataFrame(inst_std_nt, columns=['instance', 'std.dev.'])
dfstd2.set_index(['instance'], inplace=True)
dfstd2.sort_values(by=['std.dev.'], inplace=True, ascending=False)
display(dfstd2.head())




### Instances that are too hard

For the instances below, CLP failed within the time limit with all algorithm settings.
The time that the parallel **Gurobi** LP solver (with concurrent LP solvers) took to solve
these instances on the same 10-core machine is included.


| instance | parallel gurobi time (s) |
|---|---|
| kottenpark09 | 3002.262 |
| in | 3446.266 |
| rmine21 | 1652.008 |
| rwth-timetable | 1905.262 |
| fhnw-binschedule1 | 1709.141 |


As can be seen, these instances are hard even for Gurobi, which uses a large part of the
time limit to solve them with the concurrent parallel solvers.

### Instances that are too easy

Instances where all executions took at most one second.

| instance | min | max | average | std |
|---|---|---|---|---|
| gen_ip_071 | 0.00 | 0.04 | 0.010000 | 0.006554 |
| timtab2 | 0.00 | 0.12 | 0.013600 | 0.011249 |
| gsvm2rl3 | 0.00 | 0.16 | 0.023733 | 0.021032 |
| gen_ip_025 | 0.00 | 0.03 | 0.008667 | 0.005754 |
| neos-1442119 | 0.01 | 0.24 | 0.043400 | 0.046830 |
| ... | ... | ... | ... | ... |
| cl_08_020_07 | 0.03 | 0.56 | 0.057800 | 0.071989 |
| neos-3660371-kurow | 0.03 | 0.84 | 0.113067 | 0.154359 |
| gen_ip_068 | 0.00 | 0.04 | 0.010600 | 0.006679 |
| go19 | 0.02 | 0.10 | 0.044667 | 0.013495 |
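The split into too-hard, too-easy, and kept instances can be summarized as a standalone function. This is a minimal sketch using only the standard library; the name `classify_instance` and the returned labels are hypothetical, while the thresholds mirror the ones in the notebook code (average at the 8000 penalty, maximum of at most one second).

```python
from statistics import mean

PENALTY = 8000.0  # execution time assigned to failures


def classify_instance(times):
    """Classify one instance from its execution times over all settings."""
    if mean(times) >= PENALTY - 0.01:  # every setting failed
        return "too hard"
    if max(times) <= 1.0:              # every setting finished within 1 s
        return "too easy"
    return "kept"
```

Only instances labeled `"kept"` would end up in the filtered dataset; an instance with a single failure among otherwise fast runs is still kept.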


### Best on average parameter settings

| algsetting | average | std.dev. | n. failures |
|---|---|---|---|
| idiot-50-primals | 264.541185 | 1182.07401 | 15 |
| idiot-80-primals | 266.022594 | 1161.380814 | 14 |
| idiot-100-primals | 268.43696 | 1160.63149 | 14 |
| idiot-60-primals | 268.978801 | 1183.449382 | 15 |
| crash-idiot5-dualp-pesteep-pertv-70-duals | 293.426932 | 1241.835538 | 16 |


### Worst on average parameter settings

| algsetting | average | std.dev. | n. failures |
|---|---|---|---|
| subs-132-cholesky-univ-barrier | 3012.059958 | 3644.765574 | 244 |
| scal-geo-cholesky-univ-pertv--208-passp-83-subs-132-barrier | 3015.474714 | 3649.60811 | 245 |
| cholesky-univ-barrier | 3016.680181 | 3649.946757 | 245 |
| passp-83-cholesky-univ-barrier | 3016.95463 | 3648.657135 | 245 |
| pertv-208-cholesky-univ-barrier | 3019.703598 | 3648.570912 | 245 |
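The ranking behind the best and worst tables is an average of penalized times per parameter setting. A minimal sketch with the standard library follows, as an alternative to the pandas loop in the notebook; the function name `rank_settings` and the toy rows are made up for illustration and are not part of the dataset.

```python
from collections import defaultdict
from statistics import mean

PENALTY = 8000.0  # execution time assigned to failures


def rank_settings(rows):
    """rows: iterable of (instance, algsetting, result) tuples.

    Returns (algsetting, average, n_failures) tuples, best average first.
    """
    by_setting = defaultdict(list)
    for _instance, setting, result in rows:
        by_setting[setting].append(result)
    summary = [
        (setting, mean(times), sum(t == PENALTY for t in times))
        for setting, times in by_setting.items()
    ]
    return sorted(summary, key=lambda item: item[1])


# Toy data, made up for illustration only.
rows = [
    ("inst1", "setting-a", 10.0), ("inst2", "setting-a", 30.0),
    ("inst1", "setting-b", 5.0), ("inst2", "setting-b", 8000.0),
]
ranking = rank_settings(rows)
```

In the toy data, `setting-b` is faster on `inst1` but its single failure pushes its average far above that of `setting-a`, which is why the number of failures largely determines the ordering when failures carry the 8000 penalty.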


### Instances with largest standard deviation

| instance | std.dev. |
|---|---|
| neos-5138690-middle | 3925.751553 |
| adult-max5features | 3844.451407 |
| neos-5273874-yomtsa | 3824.726013 |
| neos-5116085-kenana | 3813.186312 |
| neos-5106984-jizera | 3791.770758 |


### Instances with largest standard deviation not considering failures

| instance | std.dev. |
|---|---|
| scpn2 | 1493.21821 |
| graph40-40-1rand | 1314.56894 |
| neos-5223573-tarwin | 1249.122964 |
| hgms62 | 1215.360012 |
| savsched1 | 1202.796668 |
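The difference between the two standard-deviation tables comes from dropping penalized runs before computing the deviation. A minimal sketch, assuming the same cut-off as the notebook (results below 7999.9 count as non-failures); the name `std_without_failures` is hypothetical. `statistics.stdev` computes the sample standard deviation, which matches the default of pandas `Series.std`.

```python
from statistics import stdev

PENALTY = 8000.0  # execution time assigned to failures


def std_without_failures(times):
    """Sample standard deviation of the non-failed runs.

    Returns None when fewer than two non-failed runs remain.
    """
    ok = [t for t in times if t < PENALTY - 0.1]
    return stdev(ok) if len(ok) >= 2 else None


# Toy data, made up for illustration only.
times = [10.0, 20.0, 30.0, 8000.0]
with_failures = stdev(times)
without_failures = std_without_failures(times)
```

With the toy data, a single failure inflates the deviation to several thousand seconds, whereas the non-failed runs alone vary by only ten seconds. This is consistent with the first table being dominated by instances where some settings fail and others do not.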
