# Script Demo
This notebook shows how to use the provided scripts that help with creating experiment designs and processing the results.

## Creating Experiment Design

### Cross Product Experiment Design (from table format)
Converts an experiment config in the concise `table` form into the experiment design form required by the experiment suite.
The function builds the Cartesian product of the levels of all configuration options marked as `$FACTOR$`.
The experiment in table form is a YAML file named `<exp_name>.yml` in `experiments/table`.
Each factor of the experiment is a YAML object with a single entry: the key is `$FACTOR$` and the value is the list of levels.

See the example `experiments/table/demo.yml` and the resulting `experiments/designs/demo.yml`.
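
To make the transformation concrete, here is a minimal sketch of how such a cross product could be built from an already-parsed table config. It is not the code in `scripts/expdesign.py`: the helper names `split_factors`, `set_nested`, and `cross_product_design` are hypothetical, and the input is a plain dict rather than a YAML file.

```python
import itertools

FACTOR = "$FACTOR$"


def split_factors(table, path=()):
    """Return (base, factors) for a nested table config.

    base mirrors the table, with every factor replaced by the marker string;
    factors is a list of (key_path, levels) in the order the keys appear.
    """
    base, factors = {}, []
    for key, value in table.items():
        if isinstance(value, dict) and set(value) == {FACTOR}:
            # A factor: a dict whose only key is the marker.
            base[key] = FACTOR
            factors.append((path + (key,), value[FACTOR]))
        elif isinstance(value, dict):
            # A regular nested section: recurse and collect its factors.
            base[key], nested = split_factors(value, path + (key,))
            factors.extend(nested)
        else:
            base[key] = value
    return base, factors


def set_nested(target, key_path, value):
    """Set target[k0][k1]...[kn] = value, creating intermediate dicts."""
    for key in key_path[:-1]:
        target = target.setdefault(key, {})
    target[key_path[-1]] = value


def cross_product_design(table, n_repetitions):
    """Build the design dict (hypothetical stand-in for build_config_product)."""
    base, factors = split_factors(table)
    paths = [p for p, _ in factors]
    level_lists = [levels for _, levels in factors]
    factor_levels = []
    # itertools.product varies the last factor fastest.
    for combo in itertools.product(*level_lists):
        entry = {}
        for key_path, level in zip(paths, combo):
            set_nested(entry, key_path, level)
        factor_levels.append(entry)
    return {
        "n_repetitions": n_repetitions,
        "base_experiment": base,
        "factor_levels": factor_levels,
    }


# Toy table config (made up for illustration).
table = {
    "a": 1,
    "c": {FACTOR: [1, 2]},
    "client": {"e": {FACTOR: ["V1", "V2"]}, "test": {"f": "cinfo"}},
}
design = cross_product_design(table, n_repetitions=3)
```

With two factors of two levels each, `design["factor_levels"]` holds four entries, and the first factor changes slowest, which is the same ordering as in the `demo` output.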

In [8]:
#%pycat scripts/expdesign.py # uncomment to see the code
%run scripts/expdesign.py

experiment_design = build_config_product(exp_name="demo", n_repetitions=3)

display(experiment_design)

Writing experiment design to: experiments/designs/demo.yml



{'n_repetitions': 3,
 'base_experiment': {'a': 1,
  'b': 'info',
  'c': '$FACTOR$',
  'd': '$FACTOR$',
  'client': {'e': '$FACTOR$', 'test': {'f': 'cinfo'}},
  'g': 'ginfo'},
 'factor_levels': [{'c': 1, 'd': 4, 'client': {'e': 'V1'}},
  {'c': 1, 'd': 4, 'client': {'e': 'V2'}},
  {'c': 1, 'd': 5, 'client': {'e': 'V1'}},
  {'c': 1, 'd': 5, 'client': {'e': 'V2'}},
  {'c': 1, 'd': 6, 'client': {'e': 'V1'}},
  {'c': 1, 'd': 6, 'client': {'e': 'V2'}},
  {'c': 1, 'd': 7, 'client': {'e': 'V1'}},
  {'c': 1, 'd': 7, 'client': {'e': 'V2'}},
  {'c': 2, 'd': 4, 'client': {'e': 'V1'}},
  {'c': 2, 'd': 4, 'client': {'e': 'V2'}},
  {'c': 2, 'd': 5, 'client': {'e': 'V1'}},
  {'c': 2, 'd': 5, 'client': {'e': 'V2'}},
  {'c': 2, 'd': 6, 'client': {'e': 'V1'}},
  {'c': 2, 'd': 6, 'client': {'e': 'V2'}},
  {'c': 2, 'd': 7, 'client': {'e': 'V1'}},
  {'c': 2, 'd': 7, 'client': {'e': 'V2'}},
  {'c': 3, 'd': 4, 'client': {'e': 'V1'}},
  {'c': 3, 'd': 4, 'client': {'e': 'V2'}},
  {'c': 3, 'd': 5, 'client': {'e':

## Reading and Processing Results

In [1]:
import pandas as pd

results_dir = "results"

In [3]:
# Maps each experiment name to the list of experiment ids that should be included in the dataframe.
# See `scripts/results.py` to adjust how different output files are used.
exp = {
    "example" : ["1626083535"]
}

#%pycat scripts/results.py # uncomment to see the code
%run scripts/results.py

df = read_df(results_dir, exp)
display(df)

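| | exp_name | exp_id | run | rep | host | info | client.arg1 | client.arg2 | server.arg3 | server.arg4 | a0 | a1 | a2 | a3 | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | example | 1626083535 | run_1 | rep_1 | server_0 | c2 | 10 | 2 | 2 | test1 | -a3 | 2 | -a4 | test1 | 14 |
| 1 | example | 1626083535 | run_1 | rep_1 | client_0 | c2 | 10 | 2 | 2 | test1 | -a1 | 10 | -a2 | 2 | 15 |
| 2 | example | 1626083535 | run_1 | rep_2 | server_0 | c2 | 10 | 2 | 2 | test1 | -a3 | 2 | -a4 | test1 | 2 |
| 3 | example | 1626083535 | run_1 | rep_2 | client_0 | c2 | 10 | 2 | 2 | test1 | -a1 | 10 | -a2 | 2 | 12 |
| 4 | example | 1626083535 | run_1 | rep_0 | server_0 | c2 | 10 | 2 | 2 | test1 | -a3 | 2 | -a4 | test1 | 11 |
| 5 | example | 1626083535 | run_1 | rep_0 | client_0 | c2 | 10 | 2 | 2 | test1 | -a1 | 10 | -a2 | 2 | 5 |
| 6 | example | 1626083535 | run_0 | rep_1 | server_0 | c1 | 10 | 1 | 1 | test1 | -a3 | 1 | -a4 | test1 | 5 |
| 7 | example | 1626083535 | run_0 | rep_1 | client_0 | c1 | 10 | 1 | 1 | test1 | -a1 | 10 | -a2 | 1 | 4 |
| 8 | example | 1626083535 | run_0 | rep_2 | server_0 | c1 | 10 | 1 | 1 | test1 | -a3 | 1 | -a4 | test1 | 3 |
| 9 | example | 1626083535 | run_0 | rep_2 | client_0 | c1 | 10 | 1 | 1 | test1 | -a1 | 10 | -a2 | 1 | 8 |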


### Group By (Aggregate Repetitions)
The experiment repeats each run `r` times.
Here we show how to apply a group-by aggregation to the dataframe to compute the mean, min, max, and standard deviation of `tp` across the repetitions of the same run.
For clarity, Example 1 excludes the other config options, but they could be used as further group-by columns, as in Example 2.

In [4]:
# Example 1: Only keep the main info
df1 = df.groupby(['exp_name', 'exp_id', 'run', 'host']).agg({'tp': ['mean', 'min', 'max', 'std']}).reset_index()

df1.columns = ["_".join(v) if v[1] else v[0] for v in df1.columns.values]

display(df1)

| | exp_name | exp_id | run | host | tp_mean | tp_min | tp_max | tp_std |
|---|---|---|---|---|---|---|---|---|
| 0 | example | 1626083535 | run_0 | client_0 | 10.333333 | 4 | 19 | 7.767453 |
| 1 | example | 1626083535 | run_0 | server_0 | 1773.666667 | 21 | 5 | 9.865766 |
| 2 | example | 1626083535 | run_1 | client_0 | 10.666667 | 5 | 15 | 5.131601 |
| 3 | example | 1626083535 | run_1 | server_0 | 4737.0 | 11 | 2 | 6.244998 |
| 4 | example | 1626083535 | run_2 | client_0 | 16.0 | 8 | 21 | 7.0 |
| 5 | example | 1626083535 | run_2 | server_0 | 84.333333 | 2 | 5 | 1.527525 |
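
The line that rewrites `df1.columns` in Example 1 can look cryptic. After `.agg({...})` with a list of functions followed by `.reset_index()`, pandas labels each column with a two-level tuple: group-by keys get an empty second level, and aggregates get `(column, function)`. The sketch below applies the same list comprehension to plain tuples, so it runs without pandas; the tuple list is a hand-written stand-in for `df1.columns.values`.

```python
# Hand-written stand-in for df1.columns.values after agg + reset_index.
columns = [("exp_name", ""), ("run", ""), ("host", ""), ("tp", "mean"), ("tp", "std")]

# Join both levels with "_" only when the second level is non-empty.
flat = ["_".join(v) if v[1] else v[0] for v in columns]
print(flat)  # ['exp_name', 'run', 'host', 'tp_mean', 'tp_std']

# Without the `if v[1]` guard, key columns would get a trailing underscore.
unguarded = "_".join(("run", ""))
print(unguarded)  # run_
```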


In [5]:
# Example 2: Keep all config info
cols = df.columns.tolist()
cols.remove('rep') # remove rep because we want to aggregate over the reps
cols.remove('tp') # remove the `value` column `tp` (throughput)

df1 = df.groupby(cols).agg({'tp': ['mean', 'min', 'max', 'std']}).reset_index()
df1.columns = ["_".join(v) if v[1] else v[0] for v in df1.columns.values]
display(df1)

| | exp_name | exp_id | run | host | info | client.arg1 | client.arg2 | server.arg3 | server.arg4 | a0 | a1 | a2 | a3 | tp_mean | tp_min | tp_max | tp_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | example | 1626083535 | run_0 | client_0 | c1 | 10 | 1 | 1 | test1 | -a1 | 10 | -a2 | 1 | 10.333333 | 4 | 19 | 7.767453 |
| 1 | example | 1626083535 | run_0 | server_0 | c1 | 10 | 1 | 1 | test1 | -a3 | 1 | -a4 | test1 | 1773.666667 | 21 | 5 | 9.865766 |
| 2 | example | 1626083535 | run_1 | client_0 | c2 | 10 | 2 | 2 | test1 | -a1 | 10 | -a2 | 2 | 10.666667 | 5 | 15 | 5.131601 |
| 3 | example | 1626083535 | run_1 | server_0 | c2 | 10 | 2 | 2 | test1 | -a3 | 2 | -a4 | test1 | 4737.0 | 11 | 2 | 6.244998 |
| 4 | example | 1626083535 | run_2 | client_0 | c3 | 10 | 3 | 3 | test1 | -a1 | 10 | -a2 | 3 | 16.0 | 8 | 21 | 7.0 |
| 5 | example | 1626083535 | run_2 | server_0 | c3 | 10 | 3 | 3 | test1 | -a3 | 3 | -a4 | test1 | 84.333333 | 2 | 5 | 1.527525 |
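
As an alternative to flattening the column labels afterwards, pandas also supports named aggregation (available since pandas 0.25), which produces flat column names directly. The sketch below uses a small made-up dataframe, not the results shown above, and derives the group-by columns the same way as Example 2.

```python
import pandas as pd

# Toy data (made up for illustration): two runs with three repetitions each.
df = pd.DataFrame({
    "run": ["run_0", "run_0", "run_0", "run_1", "run_1", "run_1"],
    "rep": ["rep_0", "rep_1", "rep_2", "rep_0", "rep_1", "rep_2"],
    "host": ["client_0"] * 6,
    "tp": [2, 4, 6, 10, 10, 10],
})

# Group by every column except the repetition id and the value column.
group_cols = [c for c in df.columns if c not in ("rep", "tp")]

# Named aggregation: keyword = (input column, aggregation function).
summary = (
    df.groupby(group_cols)
    .agg(
        tp_mean=("tp", "mean"),
        tp_min=("tp", "min"),
        tp_max=("tp", "max"),
        tp_std=("tp", "std"),
    )
    .reset_index()
)
print(summary)
```

Here `summary` has the columns `run`, `host`, `tp_mean`, `tp_min`, `tp_max`, and `tp_std`, so no renaming step is needed. Note that `std` is the sample standard deviation (`ddof=1`), the pandas default.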
