In [None]:
import json
import pathlib
import pandas as pd

In [None]:
%cd ..

Ensure the DVC pipeline has been run. From the terminal, run:

```
dvc repro
```

In [None]:
results_file = pathlib.Path("data/scores/results.csv")
results_agg_file = pathlib.Path("data/scores/results_agg.json")

In [None]:
df = pd.read_csv(results_file)
df.describe()

In [None]:
df.head()

In [None]:
metrics_cols = ["auroc", "lift_at_100", "lift_at_num_errors", "auprc", "ap_at_100", "ap_at_num_errors"]
df_agg = df.groupby(["aggregator", "aggregator_kwargs"])[metrics_cols].agg(["mean"])
df_agg

# What do the aggregators do?

Each example has a vector of class label quality scores $\textbf{x} = (x_0, \dots, x_{K-1})$, where $K$ is the number of class labels. An aggregator reduces these $K$ scores to a single score per example. The most common methods are:

## `amax`
This selects the class label quality score with the highest value for each example.
This is an optimistic measure of the quality of the model's predictions.

## `amin`
This selects the class label quality score with the lowest value for each example.
This is a pessimistic measure of the quality of the model's predictions.

## `mean`
This takes the mean of the class label quality scores for each example.

## `median`
This takes the median of the class label quality scores for each example.
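
As a rough illustration of the four simple aggregators above, the sketch below applies the corresponding NumPy reductions along the class axis. The `scores` array is made-up data, and this is not necessarily how the pipeline implements them.

```python
import numpy as np

# Hypothetical class label quality scores: 2 examples, K = 4 classes.
scores = np.array([
    [0.9, 0.2, 0.7, 0.6],
    [0.5, 0.5, 0.1, 0.3],
])

# Each aggregator reduces the class axis (axis=1) to one score per example.
agg_max = np.amax(scores, axis=1)        # optimistic: highest class score
agg_min = np.amin(scores, axis=1)        # pessimistic: lowest class score
agg_mean = np.mean(scores, axis=1)       # average over all classes
agg_median = np.median(scores, axis=1)   # middle value over all classes

print(agg_max, agg_min, agg_mean, agg_median)
```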

## `softmin_pooling`
Applies a softmin kernel to the class label quality scores for each example.

The softmin-pooled score is given by:

$$
s = \frac{\sum_{i=0}^{K-1} \exp(-x_i/\tau) \, x_i}{\sum_{j=0}^{K-1} \exp(-x_j/\tau)}
$$

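where $\tau$ is a temperature parameter. As $\tau \to 0$ the score approaches the lowest class score, and as $\tau \to \infty$ it approaches the mean.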
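
A minimal NumPy sketch of softmin pooling, read as a softmin-weighted average of the class scores. The function signature and the default `temperature=0.1` are assumptions made for illustration, not values taken from the pipeline.

```python
import numpy as np

def softmin_pooling(scores, temperature=0.1):
    """Softmin-weighted average of the class scores for each example.

    `temperature=0.1` is an assumed default, chosen only for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    # Subtracting the row minimum leaves the weights unchanged after
    # normalization but keeps the exponentials from overflowing.
    shifted = -(scores - scores.min(axis=1, keepdims=True)) / temperature
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * scores).sum(axis=1)

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(softmin_pooling(x, temperature=0.1))
```

With a very small temperature the result is close to `amin`, and with a very large one it is close to `mean`, so the temperature moves the aggregator between those two extremes.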

## `log_transform_pooling`
Takes the log of each class label quality score, scales them by a weight, displaces them by a bias, and then takes the mean of the resulting values.

The log-pooled score is given by:

$$
s = \frac{1}{K} \sum_{i=0}^{K-1} \left( w_i \log(x_i + \epsilon) + b_i \right)
$$

where $\textbf{w} = (w_0, \dots, w_{K-1})$ are the weights, $\textbf{b} = (b_0, \dots, b_{K-1})$ are the biases, and $\epsilon$ is a small constant (machine epsilon) used to prevent taking the log of zero.
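
The formula can be sketched in NumPy as follows. The function signature and the defaults (unit weights, zero biases) are assumptions for illustration, since the values used by the pipeline are not stated here.

```python
import numpy as np

def log_transform_pooling(scores, weights=None, biases=None):
    """Mean over classes of w_i * log(x_i + eps) + b_i for each example."""
    scores = np.asarray(scores, dtype=float)
    n_classes = scores.shape[1]
    # Assumed defaults: unit weights and zero biases.
    w = np.ones(n_classes) if weights is None else np.asarray(weights, dtype=float)
    b = np.zeros(n_classes) if biases is None else np.asarray(biases, dtype=float)
    eps = np.finfo(float).eps  # machine epsilon, avoids log(0)
    return np.mean(w * np.log(scores + eps) + b, axis=1)

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(log_transform_pooling(x))
```

Because the log is steep near zero, a single very low class score pulls the pooled value down much more than it would under a plain mean.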

## `cumulative_average_ks`
This computes the cumulative average of the bottom $k$ class label quality scores for each example, i.e. the average of the $k$ worst scores.
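
A minimal sketch of this idea, assuming a single value of $k$; the function name `cumulative_average_k` and the default `k=2` are hypothetical.

```python
import numpy as np

def cumulative_average_k(scores, k=2):
    """Average of the k lowest class scores for each example."""
    scores = np.asarray(scores, dtype=float)
    # Sort each row in ascending order and keep the k lowest scores.
    lowest_k = np.sort(scores, axis=1)[:, :k]
    return lowest_k.mean(axis=1)

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(cumulative_average_k(x, k=2))
```

Setting `k=1` reproduces `amin`, and setting `k` to the number of classes reproduces `mean`.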

## `simple_moving_average_ks`
This takes a _simple_ moving average (SMA) with window size $k$ of the sorted class label quality scores for each example.
The final score for each example is the mean of the moving averages.
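
One way to sketch this in NumPy is with prefix sums, so that each window sum is a difference of two cumulative sums. The function name and the default `k=2` are hypothetical, and the sort is assumed to be ascending; the mean of the window averages comes out the same for either sort direction.

```python
import numpy as np

def simple_moving_average_k(scores, k=2):
    """Mean of the window-k moving averages over the sorted class scores.

    Requires 1 <= k <= number of classes.
    """
    ordered = np.sort(np.asarray(scores, dtype=float), axis=1)
    # Prefix sums with a leading zero column: prefix[:, j] is the sum of
    # the first j sorted scores.
    prefix = np.cumsum(np.pad(ordered, ((0, 0), (1, 0))), axis=1)
    # Sum of each length-k window, divided by k.
    window_means = (prefix[:, k:] - prefix[:, :-k]) / k
    return window_means.mean(axis=1)

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(simple_moving_average_k(x, k=2))
```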

## `exponential_moving_average`
This takes an _exponential_ moving average (EMA) of the sorted class label quality scores for each example with forgetting factor $\alpha$.

The EMA is calculated with:

$$
s_i = \begin{cases}
x_i & i = 0 \\
\alpha x_i + (1 - \alpha) s_{i-1} & i > 0
\end{cases}
$$

The final score for each example is $s_{K-1}$, i.e. the last value in the EMA sequence.
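
The recursion can be sketched as a short loop over the sorted scores. The sort direction is not stated above; this sketch assumes descending order, so that the lowest score is folded in last and receives the largest weight. The default `alpha=0.5` is likewise an assumption.

```python
import numpy as np

def exponential_moving_average(scores, alpha=0.5):
    """Last value of the EMA sequence over the sorted class scores."""
    scores = np.asarray(scores, dtype=float)
    # Assumption: sort descending, so the lowest scores come last.
    ordered = np.sort(scores, axis=1)[:, ::-1]
    ema = ordered[:, 0]  # s_0 = x_0
    for i in range(1, ordered.shape[1]):
        # s_i = alpha * x_i + (1 - alpha) * s_{i-1}
        ema = alpha * ordered[:, i] + (1 - alpha) * ema
    return ema

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(exponential_moving_average(x, alpha=0.5))
```

Under this assumed ordering, `alpha=1` returns the lowest class score and `alpha=0` returns the highest, with values in between blending the two ends.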

## `weighted_cumulative_average`
This computes the cumulative averages $(s_1, s_2, \dots, s_k)$, with $k \leq K$, of the class label quality scores sorted in ascending order for each example, where $s_i$ is the average of the $i$ lowest scores, and then takes the weighted average of those values.

The final score for each example is given by:

$$
s = \sum_{i=1}^{k} f(i) s_i
$$

where $f(i)$ is a scalar weighting function.

### Possible weighting functions
The weighting function $f(i)$ can be one of the following:

#### Simple mean
This gives the unweighted mean of the cumulative averages:
$$
f(i) = \frac{1}{k}
$$

#### Exponential decay
Each rank is decayed with the exponential function:
$$
f(i) = \exp(-i)
$$
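
Putting the pieces together, the sketch below computes the cumulative averages of the ascending-sorted scores and combines them with either weighting function. The function name, the `weighting` string values, and the default of using all classes for `k` are assumptions for illustration. It follows the formula as written, so the exponential-decay weights are not normalized to sum to one.

```python
import numpy as np

def weighted_cumulative_average(scores, k=None, weighting="simple_mean"):
    """Weighted sum of the cumulative averages s_1..s_k of the sorted scores."""
    ordered = np.sort(np.asarray(scores, dtype=float), axis=1)  # ascending
    n_classes = ordered.shape[1]
    k = n_classes if k is None else k  # assumed default: use all classes
    ranks = np.arange(1, k + 1)
    # s_i = average of the i lowest scores, for i = 1..k
    cumulative_averages = np.cumsum(ordered[:, :k], axis=1) / ranks
    if weighting == "simple_mean":
        f = np.full(k, 1.0 / k)   # f(i) = 1 / k
    elif weighting == "exponential_decay":
        f = np.exp(-ranks)        # f(i) = exp(-i), not normalized
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    return cumulative_averages @ f

x = np.array([[0.9, 0.2, 0.7, 0.6]])  # hypothetical scores
print(weighted_cumulative_average(x, weighting="simple_mean"))
print(weighted_cumulative_average(x, k=2, weighting="exponential_decay"))
```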