# _TriScale_ - Experiment Sizing

> This notebook is intended for **self-study** of _TriScale._  
Here is the [version for live sessions](live_exp-sizing.ipynb).

This notebook contains tutorial materials for _TriScale_. 

More specifically, this notebook presents _TriScale_'s `experiment_sizing` function,  
which implements a methodology to define the minimal number of runs required to estimate  
a certain performance objective with a given level of confidence. 

> If you don't know about Jupyter Notebooks and how to interact with them,  
fear not! We compiled everything that you need to know here: [Notebook Basics](tutorial_notebook-basics.ipynb) :-) 


For more details about _TriScale,_ you may refer to [the paper](https://doi.org/10.5281/zenodo.3464273).

---

- [Banana: a black-box example](#Banana:-a-black-box-example)
- [Opening the box](#Opening-the-box)
    - [Basic computation](#Basic-computation)
    - [What about upper bounds instead?](#What-about-upper-bounds-instead?)
    - [What about better bounds?](#What-about-better-bounds?)
- [Your turn: time to practice](#Your-turn:-time-to-practice)

---

To get started, we need to import a few Python modules.  
All the _TriScale_-specific functions are part of one module called `triscale`.

In [None]:
import os
from pathlib import Path

import pandas as pd
import numpy as np

import triscale

Alright, we are ready to size some experiments!

## Banana: a black-box example

Throughout the [tutorial presentation](link-to-slides), we used the Banana communication protocol as an example.  
Before having a closer look at what _TriScale_ offers, let us simply use the tool to see how many runs we need.

**Evaluation objective**  
Let us say we want to measure the overall energy consumption achieved by the protocol.  
For this purpose, we can use a simple metric: the sum of the energy consumed by all nodes in the network. 

> Note: We could pick _any metric_; the choice of the metric is independent of TriScale's methodology.

**Performance indicator**  
TriScale uses percentiles of the metric values as performance indicators.  
The goal of the experiments is to obtain an estimate of such a percentile for a given level of confidence. 
> These estimates are referred to as KPIs, or Key Performance Indicators.

For our performance metric (energy consumption), the lower the value, the better.  
Thus, we want to derive an upper bound for our chosen percentile and (hopefully) show that this bound is small. 

So let us go ahead and define the percentile we want to estimate and the confidence level for that estimation:

In [None]:
# Definition of Banana's KPI
percentile = 50 # the median
confidence = 95 # the confidence level, in %

These two values are sufficient to define the minimal number of runs required to compute this KPI.  
The computation is implemented in _TriScale_'s `experiment_sizing()` function: 

In [None]:
triscale.experiment_sizing(
    percentile, 
    confidence,
    verbose=True); 

> We need a **minimum of 5 runs.**

We can now do the same thing to estimate the long-term variability with the variability score.

In [None]:
# Definition of Banana's variability score
percentile = 25 # the 25th percentile (not the median)
confidence = 95 # the confidence level, in %

In [None]:
triscale.experiment_sizing(
    percentile, 
    confidence,
    verbose=True); 

> We need a **minimum of 11 series.**

Hence, with only these four parameters, we can connect the total number of runs one needs  
to perform (a minimum of 11 series of 5 runs) with the corresponding performance claims that one can make:
> **KPI**: In a series of runs, the median of the runs' metric values is lower than or equal  
to the KPI with a confidence of 95%.

> **Variability score**: The range of KPI values of the middle 50% of series is less than or equal  
to the variability score, with a confidence of 95%.

## Opening the box

In the previous section, we saw the basic usage of _TriScale_'s `experiment_sizing()` function through an example.  
Let us now open the box a bit and explain how things work underneath.

<!--During the design phase of an experiment, one important question to answer is:  
> __How many times should the experiment be performed?__  

This question directly relates to the definition of _TriScale_ KPIs and variability scores. -->

### Basic computation

_TriScale_ implements a statistical method that allows one to estimate, based on a data sample,  
any percentile of the underlying distribution with any level of confidence. Importantly,  
the estimation does not rely on any assumption about the nature of the underlying distribution  
(e.g., normal, or Poisson). The estimate is valid as long as the sample is independent and  
identically distributed (or _iid_).

Intuitively, it is "easier" to estimate the median (50th percentile) than the 99th percentile;  
the more extreme the percentile, the more samples are required to provide an estimate for a  
given level of confidence. 

Let us assume the samples $x$ are ordered such that $x_1 \leq x_2 \leq \ldots \leq x_N$. One can derive  
the minimal number of samples $N$ such that $x_1$ is a lower bound for any percentile $0<p<1$  
with a level of confidence $0<C<1$. The bound $x_1$ fails only if all $N$ samples lie above the percentile,  
which happens with probability $(1-p)^N$; requiring $(1-p)^N \leq 1-C$ gives the following equation:

$$N \;\geq\; \frac{\log(1-C)}{\log(1-p)}$$

_TriScale_'s `experiment_sizing()` function implements this computation and returns the  
minimal number of samples $N$, as illustrated below.

In [None]:
# Select the percentile we want to estimate 
percentile = 10

# Select the desired level of confidence for the estimation
confidence = 99 # in %

# Compute the minimal number of samples N required
triscale.experiment_sizing(
    percentile, 
    confidence,
    verbose=True); 

The previous result indicates that for $N = 44$ samples and above, $x_1$ is a lower bound  
for the 10th percentile with a probability larger than 99%. 
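
The equation above is easy to check by hand. The following is a minimal sketch in plain Python, not _TriScale_'s actual implementation: the helper name `min_samples_lower_bound` is hypothetical, and $p$ and $C$ are passed as fractions rather than in %.

```python
import math

def min_samples_lower_bound(p, C):
    """Smallest N such that x_1 is a lower bound for the p-th percentile
    with confidence C (p and C given as fractions in (0, 1))."""
    # x_1 fails as a lower bound only if all N samples exceed the
    # percentile, which happens with probability (1 - p)**N.
    return math.ceil(math.log(1 - C) / math.log(1 - p))

print(min_samples_lower_bound(0.10, 0.99))  # 10th percentile, 99% confidence
print(min_samples_lower_bound(0.50, 0.95))  # median, 95% confidence
```

The two calls should reproduce the 44 samples obtained above and the 5 runs of the Banana KPI.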

### What about upper bounds instead?

The computation is symmetric: it takes the same number of samples to compute  
a lower bound for the $p$-th percentile as to compute an upper bound for the $(1-p)$-th percentile.
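
To make the symmetry concrete, here is a small sketch with hypothetical helper names (not part of _TriScale_), using fractions instead of %. It assumes the largest sample $x_N$ serves as the upper bound: $x_N$ fails only if all $N$ samples fall below the percentile, which happens with probability $p^N$.

```python
import math

def min_samples_lower(p, C):
    # x_1 is a valid lower bound unless all N samples exceed the
    # percentile: require (1 - p)**N <= 1 - C
    return math.ceil(math.log(1 - C) / math.log(1 - p))

def min_samples_upper(p, C):
    # x_N is a valid upper bound unless all N samples fall below the
    # percentile: require p**N <= 1 - C
    return math.ceil(math.log(1 - C) / math.log(p))

# Lower bound on the 10th percentile vs. upper bound on the 90th
print(min_samples_lower(0.10, 0.99), min_samples_upper(0.90, 0.99))
```

Both values should be identical, since the two conditions differ only by swapping $p$ and $1-p$.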

`triscale.experiment_sizing` returns the required number of samples to estimate 
- a lower bound for percentiles $p \leq 0.5$
- an upper bound for percentiles $p>0.5$

Hence, the following cell returns the same number of samples required as previously:

In [None]:
percentile = 90
confidence = 99 # in %

triscale.experiment_sizing(
    percentile, 
    confidence,
    verbose=True); 

To get a better feeling of how this minimal number of samples evolves with increasing confidence  
and more extreme percentiles, let us compute a range of minimal numbers of samples and display  
the results in a table (where the columns are the percentiles to estimate).

> You don't need to understand the code in the following cell. It simply computes the required  
number of samples for a list of percentiles and confidence levels, and stores everything in a  
Pandas DataFrame for a nicer display in tabular format.

In [None]:
# Sets of percentiles and confidence levels to try
percentiles = [0.1, 1, 5, 10, 25, 50, 75, 90, 95, 99, 99.9]
confidences = [75, 90, 95, 99, 99.9, 99.99]

# Computing the minimum number of runs for each (perc., conf.) pair
min_number_samples = []
for c in confidences:
    tmp = []
    for p in percentiles:
        N = triscale.experiment_sizing(p,c)
        tmp.append(N[0])
    min_number_samples.append(tmp)
    
# Put the results in a DataFrame for a convenient display of the results
df = pd.DataFrame(columns=percentiles, data=min_number_samples)
df['Confidence level'] = confidences
df.set_index('Confidence level', inplace=True)

display(df)

### What about better bounds?

So far, we have seen how to compute the minimal number of samples such that $x_1$ is a valid lower bound.  
This implies that the estimate is then equal to the __smallest value__ obtained in your series of runs. 

If you work in a domain where outliers are common, you will want to get better bounds, which should be  
less affected by outliers. Good news, this is simple: you just need to run more experiments! 

The `experiment_sizing()` function takes an optional `robustness` argument that defines how many  
outliers you want your bound to exclude. In other words, for a `robustness` of $r$, the function returns  
the minimal number of samples required such that $x_{r+1}$ is a valid lower bound.  
This is illustrated below.

In [None]:
percentile = 10
confidence = 99
triscale.experiment_sizing(
    percentile, 
    confidence,
    robustness=3,
    verbose=True); 

We obtain that a minimum of $N = 97$ samples is required for $x_4$ to be a lower bound for the  
10th percentile with a confidence level of 99%.

> Naturally, this is (much) more than the 44 samples we got before, where $x_1$ was the  
lower bound. There is no free lunch! Better bounds demand more experiments.  
But at least, now you know how many you need :-)
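
For the curious, the sketch below shows one way such a number can be derived. It is an illustration under stated assumptions, not _TriScale_'s actual code: the helper names are hypothetical, $p$ and $C$ are fractions, and the bound is assumed to be one-sided. The idea is that $x_{r+1}$ fails as a lower bound when at most $r$ samples fall below the percentile, an event whose probability is given by the binomial distribution.

```python
import math

def binom_cdf(r, n, p):
    """P(X <= r) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def min_samples_robust(p, C, r=0):
    """Smallest N such that x_{r+1} is a lower bound for the p-th
    percentile with confidence C (p and C given as fractions)."""
    n = r + 1  # we need at least r + 1 samples for x_{r+1} to exist
    # Increase N until the failure probability drops to 1 - C or below
    while binom_cdf(r, n, p) > 1 - C:
        n += 1
    return n

print(min_samples_robust(0.10, 0.99, r=0))  # no outlier excluded
print(min_samples_robust(0.10, 0.99, r=3))  # three outliers excluded
```

With `r=0` the condition reduces to $(1-p)^N \leq 1-C$, so the first call should match the 44 samples of the basic computation, and the second the 97 samples obtained above.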

## Your turn: time to practice

Based on the explanations above, use _TriScale_'s `experiment_sizing` function to answer  
the following questions:
- What is the minimal number of runs required to estimate the
    - **90th** percentile with **90%** confidence?
    - **90th** percentile with **95%** confidence?
    - **95th** percentile with **90%** confidence?
- Based on the answers to the previous questions, is it harder (i.e., does it require more runs)  
to increase the confidence level, or to estimate a more extreme percentile? 

_Optional (and harder) question:_ 
- For $N = 50$ samples, what is the index $m$ of the best possible (i.e., the largest) lower bound  
for the 25th percentile, estimated with a 95% confidence level? 

In [None]:
########## YOUR CODE HERE ###########
# ...
#####################################

#### Solutions

<details>
  <summary><br/>Click here to show the solutions</summary>
  
```python
>>> print(triscale.experiment_sizing(90,90)[0])
22
>>> print(triscale.experiment_sizing(90,95)[0])
29
>>> print(triscale.experiment_sizing(95,90)[0])
45
```
We observe that it "costs" many more runs to estimate a more extreme percentile  
    (95th instead of 90th) than to increase the confidence level (90% to 95%).  
    This observation holds true in general. The number of runs required increases   
    exponentially when the percentiles get more extreme (close to $0$ or to $1$).
    
For the last question, we must play with the `robustness` parameter. We can 
    write a simple loop to increase its value until the number of runs required  
    reaches 50.
    
```python
>>> r = 0
>>> while triscale.experiment_sizing(25, 95, r)[0] < 50:
...     r += 1
...
>>> print(r)
7
```        
Hence, we can exclude the 7 "worst" samples from the confidence interval.  
    With $N=50$ samples, the best lower bound for the 25th percentile with 95% confidence  
    is $x_8$     (assuming the first sample is $x_1$).
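
As a cross-check that does not depend on `experiment_sizing()`, the index can also be computed directly from the binomial distribution. This is only a sketch: the helper name is hypothetical, $p$ and $C$ are fractions, and a one-sided bound is assumed.

```python
import math

def best_lower_bound_index(n, p, C):
    """Largest m such that x_m is a lower bound for the p-th percentile
    with confidence C, given n samples (0 if no sample qualifies)."""
    m, cdf = 0, 0.0
    for k in range(n):
        # cdf is now P(at most k samples fall below the percentile),
        # i.e., the probability that x_{k+1} fails as a lower bound
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        if cdf > 1 - C:
            break
        m = k + 1  # x_{k+1} is still a valid bound
    return m

print(best_lower_bound_index(50, 0.25, 0.95))
```

The printed index should agree with the $x_8$ found with the loop above.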
</details>

---
Next step: [Data Analysis](tutorial_data-analysis.ipynb)  
[Back to repo](.)