diff --git a/api/METRICS.md b/api/METRICS.md
new file mode 100644
index 0000000..98a2be2
--- /dev/null
+++ b/api/METRICS.md
@@ -0,0 +1,106 @@
+# API proposal for metrics
+
+## Example
+
+```python
+# For most sklearn metrics we will have a group version that returns a summary
+# of the metric's performance across groups as well as its overall performance,
+# represented as a Bunch object with fields
+# * overall: overall metric value
+# * by_group: a dictionary that maps sensitive feature values to metric values
+
+summary = accuracy_score_group_summary(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+
+# Exporting into pd.Series or pd.DataFrame is not too complicated
+
+series = pd.Series({**summary.by_group, 'overall': summary.overall})
+df = pd.DataFrame({"model accuracy": {**summary.by_group, 'overall': summary.overall}})
+
+# Several types of scalar metrics for group fairness can be obtained from the
+# group summary via transformation functions
+
+acc_difference = difference_from_summary(summary)
+acc_ratio = ratio_from_summary(summary)
+acc_group_min = group_min_from_summary(summary)
+
+# Most common disparity metrics should be predefined
+
+demo_parity_difference = demographic_parity_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+demo_parity_ratio = demographic_parity_ratio(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+eq_odds_difference = equalized_odds_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+
+# For predefined disparities based on sklearn metrics, we adopt a consistent naming convention
+
+acc_difference = accuracy_score_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+acc_ratio = accuracy_score_ratio(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+acc_group_min = accuracy_score_group_min(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+```
+
+## Proposal
+
+*Function signatures*
+
+```python
+group_summary(metric, y_true, y_pred, *, sensitive_features, **other_kwargs)
+# return the group summary for the provided `metric`, where `metric` has the signature
+# metric(y_true, y_pred, **other_kwargs)
+
+make_metric_group_summary(metric)
+# return a callable object <metric>_group_summary:
+# <metric>_group_summary(...) = group_summary(<metric>, ...)
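+
+# Illustrative sketch (an assumption, not part of the proposed surface): group_summary
+# could evaluate `metric` once per sensitive feature value and wrap the results in an
+# sklearn Bunch, with make_metric_group_summary closing over the chosen metric, e.g.
+#
+#     import functools
+#     import numpy as np
+#     from sklearn.utils import Bunch
+#
+#     def group_summary(metric, y_true, y_pred, *, sensitive_features, **other_kwargs):
+#         y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
+#         groups = np.asarray(sensitive_features)
+#         by_group = {g: metric(y_true[groups == g], y_pred[groups == g], **other_kwargs)
+#                     for g in np.unique(groups)}
+#         return Bunch(overall=metric(y_true, y_pred, **other_kwargs), by_group=by_group)
+#
+#     def make_metric_group_summary(metric):
+#         return functools.partial(group_summary, metric)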
+
+# Transformation functions returning scalars
+difference_from_summary(summary)
+ratio_from_summary(summary)
+group_min_from_summary(summary)
+group_max_from_summary(summary)
+
+# Metric-specific functions returning group summary and scalars
+<metric>_group_summary(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_difference(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_ratio(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_group_min(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_group_max(y_true, y_pred, *, sensitive_features, **other_kwargs)
+```
+
+*Transformations and transformation codes*
+
+|transformation function|output|metric-specific function|code|aif360|
+|-----------------------|------|------------------------|----|------|
+|`difference_from_summary`|max - min|`<metric>_difference`|D|unprivileged - privileged|
+|`ratio_from_summary`|min / max|`<metric>_ratio`|R|unprivileged / privileged|
+|`group_min_from_summary`|min|`<metric>_group_min`|Min|N/A|
+|`group_max_from_summary`|max|`<metric>_group_max`|Max|N/A|
+
+*Tasks and task codes*
+
+|task|definition|code|
+|----|----------|----|
+|binary classification|labels and predictions are in {0,1}|class|
+|probabilistic binary classification|labels are in {0,1}, predictions are in [0,1] and correspond to estimates of P(y\|x)|prob|
+|randomized binary classification|labels are in {0,1}, predictions are in [0,1] and represent the probability of drawing y=1 in a randomized decision|class-rand|
+|regression|labels and predictions are real-valued|reg|
+
+*Predefined metric-specific functions*
+
+* variants: D, R, Min, Max refer to the transformations from the table above; G refers to `<metric>_group_summary`.
+
+|metric|variants|task|notes|aif360|
+|------|--------|----|-----|------|
+|`selection_rate`| G,D,R,Min | class | | ✓ |
+|`demographic_parity`| D,R | class | same as `selection_rate_difference`, `selection_rate_ratio` | `statistical_parity_difference`, `disparate_impact` |
+|`accuracy_score`| G,D,R,Min | class | sklearn | `accuracy` |
+|`balanced_accuracy_score`| G | class | sklearn | - |
+|`mean_absolute_error`| G,D,R,Max | class, reg | sklearn | class only: `error_rate` |
+|`confusion_matrix`| G | class | sklearn | `binary_confusion_matrix` |
+|`false_positive_rate`| G,D,R | class | | ✓ |
+|`false_negative_rate`| G | class | | ✓ |
+|`true_positive_rate`| G,D,R | class | | ✓ |
+|`true_negative_rate`| G | class | | ✓ |
+|`equalized_odds`| D,R | class | max of the difference (or ratio) for `true_positive_rate` and `false_positive_rate` | - |
+|`precision_score`| G | class | sklearn | ✓ |
+|`recall_score`| G | class | sklearn | ✓ |
+|`f1_score`| G | class | sklearn | - |
+|`roc_auc_score`| G | prob | sklearn | - |
+|`log_loss`| G | prob | sklearn | - |
+|`mean_squared_error`| G | prob, reg | sklearn | - |
+|`r2_score`| G | reg | sklearn | - |
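+
+For concreteness, the following minimal sketch shows how the transformation functions could be
+implemented from a group summary's `by_group` dictionary; it is illustrative only (an assumption,
+not part of the proposal) and matches the outputs listed in the transformations table above.
+
+```python
+def difference_from_summary(summary):
+    # code D: max - min across groups
+    values = list(summary.by_group.values())
+    return max(values) - min(values)
+
+def ratio_from_summary(summary):
+    # code R: min / max across groups
+    values = list(summary.by_group.values())
+    return min(values) / max(values)
+
+def group_min_from_summary(summary):
+    # code Min: smallest value across groups
+    return min(summary.by_group.values())
+
+def group_max_from_summary(summary):
+    # code Max: largest value across groups
+    return max(summary.by_group.values())
+```
+
+Under the naming convention above, a predefined function such as `accuracy_score_difference` would
+then simply compose `accuracy_score_group_summary` with `difference_from_summary`.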