# Group Metrics

The `fairlearn` package contains algorithms which enable machine learning models to minimise disparity between groups. The `metrics` portion of the package provides the means required to verify that the mitigation algorithms are succeeding.

In [None]:
import numpy as np
import pandas as pd
import sklearn.metrics as skm

## Ungrouped Metrics

At their simplest, metrics take a set of 'true' values $Y_{true}$ (from the input data) and predicted values $Y_{pred}$ (by applying the model to the input data), and use these to compute a measure. For example, the _recall_ or _true positive rate_ is given by
\begin{equation}
P( Y_{pred}=1 \mid Y_{true}=1 )
\end{equation}
That is, a measure of whether the model finds all the positive cases in the input data. The `scikit-learn` package implements this in [sklearn.metrics.recall_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html).

Suppose we have the following data:

In [None]:
Y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
Y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

We can see that the prediction is 1 in five of the ten cases where the true value is 1, so we expect the recall to be 0.5:

In [None]:
skm.recall_score(Y_true, Y_pred)
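
As a cross-check on the definition, the conditional probability can also be estimated directly by counting rows. The following is a minimal sketch in plain Python; the helper name `recall_by_counting` is illustrative only and is not part of `scikit-learn` or `fairlearn`. It assumes binary labels and at least one row with a true value of 1 (otherwise the division fails).

```python
def recall_by_counting(y_true, y_pred):
    """Estimate P(Y_pred = 1 | Y_true = 1) by counting rows (illustrative helper)."""
    # Keep only the predictions made on rows whose true label is 1
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    # The fraction of those rows which were predicted as 1
    return sum(preds_on_positives) / len(preds_on_positives)


Y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
Y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]

manual_recall = recall_by_counting(Y_true, Y_pred)
print(manual_recall)  # 0.5
```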

## Metrics with Grouping

When considering fairness, each row of input data will have an associated group label $g \in G$, and we will want to know how the metric behaves for each $g$. To help with this, `fairlearn` provides wrappers which take an existing (ungrouped) metric function and apply it to each group within a set of data.

Suppose in addition to the $Y_{true}$ and $Y_{pred}$ above, we had the following set of labels:

In [None]:
group_data = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c', 'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']

df = pd.DataFrame({ 'Y_true': Y_true, 'Y_pred': Y_pred, 'group_data': group_data})
df

We can see that for the groups 'a' and 'd' the recall is 0 (none of the true positives were identified), for 'b' the recall is 0.5 and for 'c' the recall is 0.75.
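
The inspection above can be reproduced with a short loop. This is a sketch in plain Python which restricts attention to the true positives in each group and averages the predictions made on them; the variable names `preds_on_positives` and `recall_by_hand` are illustrative.

```python
from collections import defaultdict

Y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
Y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
group_data = ['d', 'a', 'c', 'b', 'b', 'c', 'c', 'c', 'b', 'd', 'c', 'a', 'b', 'd', 'c', 'c']

# Collect the predictions made on the true positives within each group
preds_on_positives = defaultdict(list)
for t, p, g in zip(Y_true, Y_pred, group_data):
    if t == 1:
        preds_on_positives[g].append(p)

# Recall for a group is the fraction of its true positives predicted as 1
recall_by_hand = {g: sum(ps) / len(ps) for g, ps in sorted(preds_on_positives.items())}
print(recall_by_hand)  # {'a': 0.0, 'b': 0.5, 'c': 0.75, 'd': 0.0}
```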

The `fairlearn.metrics.metric_by_group` routine can calculate all of these for us. It takes as arguments an _ungrouped_ metric (such as `sklearn.metrics.recall_score`), the arrays $Y_{true}$ and $Y_{pred}$, and an array of group labels (and optionally, if the ungrouped metric supports it, an array of sample weights), and produces a `GroupMetricResult` object:

In [None]:
import fairlearn.metrics as flm

group_metrics = flm.metric_by_group(skm.recall_score, Y_true, Y_pred, group_data, sample_weight=None)

print("Overall recall = ", group_metrics.overall)
print("recall by groups = ", group_metrics.by_group)

Note that the overall recall is the same as that calculated above in the Ungrouped Metrics section, while the `by_group` dictionary matches the values we calculated by inspection from the table above.

In addition to these basic scores, `metric_by_group` also records the maximum and minimum values of the metric, the groups for which these occurred, and also the difference and ratio between the maximum and minimum:

In [None]:
print("min recall over groups = ", group_metrics.min_over_groups)
print("occurred for groups: ", group_metrics.argmin_groups)
print()
print("max recall over groups = ", group_metrics.max_over_groups)
print("occurred for groups: ", group_metrics.argmax_groups)
print()
print("range in recall = ", group_metrics.range_over_groups)
print("ratio in recall = ", group_metrics.range_ratio_over_groups)
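
To make the relationship between these summary values and the per-group scores concrete, the sketch below derives them from a plain dictionary holding the recalls found by inspection earlier. It is not the `fairlearn` implementation. In particular, taking the ratio as minimum divided by maximum, and returning NaN when the maximum is zero, are assumptions made for illustration.

```python
# Per-group recall values, as found by inspection of the table
by_group = {'a': 0.0, 'b': 0.5, 'c': 0.75, 'd': 0.0}

min_value = min(by_group.values())
max_value = max(by_group.values())

# Every group attaining an extreme value is kept, so ties are reported together
argmin_groups = {g for g, v in by_group.items() if v == min_value}
argmax_groups = {g for g, v in by_group.items() if v == max_value}

value_range = max_value - min_value
# Assumption: the ratio is min / max, guarded against a zero maximum
value_ratio = min_value / max_value if max_value != 0 else float("nan")

print(min_value, sorted(argmin_groups))  # 0.0 ['a', 'd']
print(max_value, sorted(argmax_groups))  # 0.75 ['c']
print(value_range, value_ratio)          # 0.75 0.0
```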

## Supported Ungrouped Metrics

To be used by `metric_by_group`, the supplied Python function must accept `y_true` and `y_pred` as its first two arguments:
```python
my_metric_func(y_true, y_pred)
```
An additional argument of `sample_weight` is also supported:
```python
my_metric_with_weight(y_true, y_pred, sample_weight)
```
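The `sample_weight` argument is always passed by name (that is, as a keyword argument), and _only_ if the user supplies a `sample_weight` argument. A metric function which does not accept `sample_weight` can therefore still be used, provided no weights are supplied.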
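
The calling convention described above can be sketched as follows. Both `weighted_accuracy` and `call_metric` are hypothetical names invented for this illustration; `call_metric` only mimics the convention and is not how `fairlearn` is implemented internally.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical ungrouped metric: (optionally weighted) fraction of correct predictions."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


def call_metric(metric_function, y_true, y_pred, sample_weight=None):
    """Hypothetical dispatcher illustrating the calling convention."""
    if sample_weight is None:
        # No weights supplied: call with the two positional arguments only
        return metric_function(y_true, y_pred)
    # Weights supplied: pass them by keyword, never positionally
    return metric_function(y_true, y_pred, sample_weight=sample_weight)


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

unweighted = call_metric(weighted_accuracy, y_true, y_pred)
weighted = call_metric(weighted_accuracy, y_true, y_pred, sample_weight=[1, 1, 2, 1])
print(unweighted, weighted)  # 0.75 0.6
```

Because the weights are only forwarded when present, a two-argument function such as `lambda y_true, y_pred: ...` works with `call_metric` as long as `sample_weight` is left unset.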

## Convenience Wrapper

Rather than require a call to `metric_by_group` each time, we also provide a function which turns an ungrouped metric into a grouped one. This is called `make_group_metric`:

In [None]:
group_recall_score = flm.make_group_metric(skm.recall_score)

results = group_recall_score(Y_true, Y_pred, group_data)

print("Overall recall = ", results.overall)
print("recall by groups = ", results.by_group)
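
The idea behind such a wrapper is that of a closure: the ungrouped metric is bound once, and the returned function then needs only the data and the group labels. The sketch below illustrates the pattern in plain Python. The names `metric_by_group_sketch`, `make_group_metric_sketch` and `accuracy` are hypothetical, and the stand-in returns a simple dictionary rather than a `GroupMetricResult`.

```python
def metric_by_group_sketch(metric_function, y_true, y_pred, group_membership):
    """Hypothetical stand-in: apply metric_function to the rows of each group."""
    result = {}
    for g in sorted(set(group_membership)):
        # Indices of the rows belonging to group g
        rows = [i for i, label in enumerate(group_membership) if label == g]
        result[g] = metric_function([y_true[i] for i in rows],
                                    [y_pred[i] for i in rows])
    return result


def make_group_metric_sketch(metric_function):
    """Hypothetical factory: bind the ungrouped metric and return a grouped one."""
    def grouped_metric(y_true, y_pred, group_membership):
        return metric_by_group_sketch(metric_function, y_true, y_pred, group_membership)
    return grouped_metric


def accuracy(y_true, y_pred):
    """Simple ungrouped metric used for the demonstration."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


group_accuracy = make_group_metric_sketch(accuracy)
scores = group_accuracy([1, 0, 1, 1], [1, 0, 0, 1], ['x', 'x', 'y', 'y'])
print(scores)  # {'x': 1.0, 'y': 0.5}
```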