From 794a07d00c1b6ca0ddfce37abd41e00ce9153051 Mon Sep 17 00:00:00 2001
From: Miro Dudik
Date: Fri, 28 Feb 2020 09:47:04 -0500
Subject: [PATCH 1/5] add metrics API proposal

Signed-off-by: Miro Dudik
---
 api/METRICS.md | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 api/METRICS.md

diff --git a/api/METRICS.md b/api/METRICS.md
new file mode 100644
index 0000000..f5c77b0
--- /dev/null
+++ b/api/METRICS.md
@@ -0,0 +1,91 @@
+# API proposal for metrics
+
+## Example
+
+```python
+# For most sklearn metrics, we would have their group version that returns a Bunch with fields
+# * overall: overall metric value
+# * by_group: a dictionary that maps sensitive feature values to metric values
+
+summary = accuracy_score_by_group(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+
+# Exporting into pd.Series or pd.DataFrame is not too complicated
+
+series = pd.Series({**summary.by_group, 'overall': summary.overall})
+df = pd.DataFrame({"model accuracy": {**summary.by_group, 'overall': summary.overall}})
+
+# Several types of scalar metrics for group fairness can be obtained from `summary` via transformation functions
+
+acc_difference = difference_from_summary(summary)
+acc_ratio = ratio_from_summary(summary)
+acc_group_min = group_min_from_summary(summary)
+
+# Most common disparity metrics should be predefined
+
+demo_parity_difference = demographic_parity_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+demo_parity_ratio = demographic_parity_ratio(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+eq_odds_difference = equalized_odds_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+
+# For predefined disparities based on sklearn metrics, we adopt a consistent naming convention
+
+acc_difference = accuracy_score_difference(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+acc_ratio = accuracy_score_ratio(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+acc_group_min = accuracy_score_group_min(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+```
+
+## Functions
+
+*Function signatures*
+
+```python
+metric_by_group(metric, y_true, y_pred, *, sensitive_features, **other_kwargs)
+# return the summary for the provided metrics
+
+make_metric_by_group(metric)
+# return a callable object <metric>_by_group:
+# <metric>_by_group(...) = metric_by_group(<metric>, ...)
+
+# Transformation functions returning scalars
+difference_from_summary(summary)
+ratio_from_summary(summary)
+group_min_from_summary(summary)
+group_max_from_summary(summary)
+
+# Metric-specific functions returing summary and scalars
+<metric>_by_group(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_difference(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_ratio(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_group_min(y_true, y_pred, *, sensitive_features, **other_kwargs)
+<metric>_group_max(y_true, y_pred, *, sensitive_features, **other_kwargs)
+```
+
+*Summary of transformations*
+
+|transformation function|output|metric-specific function|code|aif360|
+|-----------------------|------|------------------------|----|------|
+|`difference_from_summary`|max - min|`<metric>_difference`|D|unprivileged - privileged|
+|`ratio_from_summary`|min / max|`<metric>_ratio`|R| unprivileged / privileged|
+|`group_min_from_summary`|min|`<metric>_group_min`|Min| N/A |
+|`group_max_from_summary`|max|`<metric>_group_max`|Max| N/A |
+
+*Supported metric-specific functions*
+
+|metric|variants|task|notes|aif360|
+|------|--------|-----|----|------|
+|`selection_rate`| G,D,R,Min | class | | ✓ |
+|`demographic_parity`| D,R | class | `selection_rate_difference`, `selection_rate_ratio` | `statistical_parity_difference`, `disparate_impact`|
+|`accuracy_score`| G,D,R,Min | class | sklearn | `accuracy` |
+|`balanced_accuracy_score` | G | class | sklearn | - |
+|`mean_absolute_error` | G,D,R,Max | class,reg | sklearn | class only: `error_rate`
+|`false_positive_rate` | G,D,R | class | | ✓ |
+|`false_negative_rate` | G | class | | ✓ |
+|`true_positive_rate` | G,D,R | class | | ✓ |
+|`true_negative_rate` | G | class | | ✓ |
+|`equalized_odds` | D,R | class | max of difference or ratio under `true_positive_rate`, `false_positive_rate` | - |
+|`precision_score`| G | class | sklearn | ✓ |
+|`recall_score`| G | class | sklearn | ✓ |
+|`f1_score`| G | class | sklearn | - |
+|`roc_auc_score`| G | prob | sklearn | - |
+|`log_loss`| G | prob | sklearn | - |
+|`mean_squared_error`| G | prob,reg | sklearn | - |
+|`r2_score`| G | reg | sklearn | - |
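
The signatures in the patch above only fix names and return values. As a minimal illustrative sketch (not part of any patch in this series), the group summary and the transformation functions could behave as follows; it assumes `sklearn.utils.Bunch` as the summary container used in the example.

```python
# Minimal sketch of the semantics described in PATCH 1/5; illustrative only.
# Assumes sklearn.utils.Bunch as the container for the summary.
import numpy as np
from sklearn.utils import Bunch


def metric_by_group(metric, y_true, y_pred, *, sensitive_features, **other_kwargs):
    """Evaluate `metric` overall and separately for each sensitive-feature value."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitive_features = np.asarray(sensitive_features)
    by_group = {}
    for group in np.unique(sensitive_features):
        mask = sensitive_features == group
        by_group[group] = metric(y_true[mask], y_pred[mask], **other_kwargs)
    return Bunch(overall=metric(y_true, y_pred, **other_kwargs), by_group=by_group)


def group_min_from_summary(summary):
    return min(summary.by_group.values())


def group_max_from_summary(summary):
    return max(summary.by_group.values())


def difference_from_summary(summary):
    # "D" transformation: max - min across groups
    return group_max_from_summary(summary) - group_min_from_summary(summary)


def ratio_from_summary(summary):
    # "R" transformation: min / max across groups (nan if the max is zero)
    group_max = group_max_from_summary(summary)
    return group_min_from_summary(summary) / group_max if group_max != 0 else float("nan")
```

Under this reading, `accuracy_score_difference(y_true, y_pred, sensitive_features=sf)` is shorthand for `difference_from_summary(accuracy_score_by_group(y_true, y_pred, sensitive_features=sf))`.
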
From 3b93629f7088fe0f2f7be254411a908a1c8b72ad Mon Sep 17 00:00:00 2001
From: Miro Dudik
Date: Tue, 3 Mar 2020 18:14:58 -0500
Subject: [PATCH 2/5] add clarifications and confusion_matrix

Signed-off-by: Miro Dudik
---
 api/METRICS.md | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/api/METRICS.md b/api/METRICS.md
index f5c77b0..391a660 100644
--- a/api/METRICS.md
+++ b/api/METRICS.md
@@ -33,13 +33,14 @@ acc_ratio = accuracy_score_ratio(y_true, y_pred, sensitive_features=sf, **other_
 acc_group_min = accuracy_score_group_min(y_true, y_pred, sensitive_features=sf, **other_kwargs)
 ```
 
-## Functions
+## Proposal
 
 *Function signatures*
 
 ```python
 metric_by_group(metric, y_true, y_pred, *, sensitive_features, **other_kwargs)
-# return the summary for the provided metrics
+# return the summary for the provided `metric`, where `metric` has the signature
+# metric(y_true, y_pred, **other_kwargs)
 
 make_metric_by_group(metric)
 # return a callable object <metric>_by_group:
@@ -51,7 +52,7 @@ ratio_from_summary(summary)
 group_min_from_summary(summary)
 group_max_from_summary(summary)
 
-# Metric-specific functions returing summary and scalars
+# Metric-specific functions returning summary and scalars
 <metric>_by_group(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_difference(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_ratio(y_true, y_pred, *, sensitive_features, **other_kwargs)
@@ -59,7 +60,7 @@ group_max_from_summary(summary)
 <metric>_group_max(y_true, y_pred, *, sensitive_features, **other_kwargs)
 ```
 
-*Summary of transformations*
+*Summary of transformations and transformation codes*
 
 |transformation function|output|metric-specific function|code|aif360|
 |-----------------------|------|------------------------|----|------|
@@ -68,7 +69,18 @@ group_max_from_summary(summary)
 |`group_min_from_summary`|min|`<metric>_group_min`|Min| N/A |
 |`group_max_from_summary`|max|`<metric>_group_max`|Max| N/A |
 
-*Supported metric-specific functions*
+*Summary of tasks and task codes*
+
+|task|definition|code|
+|----|----------|----|
+|binary classification|labels and predictions are in {0,1}|class|
+|probabilistic binary classification|labels are in {0,1}, predictions are in [0,1] and correspond to estimates of P(y\|x)|prob|
+|randomized binary classification|labels are in {0,1}, predictions are in [0,1] and represent the probability of drawing y=1 in a randomized decision|class-rand|
+|regression|labels and predictions are real-valued|reg|
+
+*Predefined metric-specific functions*
+
+* variants: D, R, Min, Max refer to the transformations from the table above; G refers to `<metric>_by_group`.
 
 |metric|variants|task|notes|aif360|
 |------|--------|-----|----|------|
@@ -76,7 +88,8 @@ group_max_from_summary(summary)
 |`demographic_parity`| D,R | class | `selection_rate_difference`, `selection_rate_ratio` | `statistical_parity_difference`, `disparate_impact`|
 |`accuracy_score`| G,D,R,Min | class | sklearn | `accuracy` |
 |`balanced_accuracy_score` | G | class | sklearn | - |
-|`mean_absolute_error` | G,D,R,Max | class,reg | sklearn | class only: `error_rate`
+|`mean_absolute_error` | G,D,R,Max | class, reg | sklearn | class only: `error_rate` |
+|`confusion_matrix` | G | class | sklearn | `binary_confusion_matrix` |
 |`false_positive_rate` | G,D,R | class | | ✓ |
 |`false_negative_rate` | G | class | | ✓ |
 |`true_positive_rate` | G,D,R | class | | ✓ |
@@ -87,5 +100,13 @@ group_max_from_summary(summary)
 |`f1_score`| G | class | sklearn | - |
 |`roc_auc_score`| G | prob | sklearn | - |
 |`log_loss`| G | prob | sklearn | - |
-|`mean_squared_error`| G | prob,reg | sklearn | - |
+|`mean_squared_error`| G | prob, reg | sklearn | - |
 |`r2_score`| G | reg | sklearn | - |
+
+## Dashboard questions
+
+1. Should we enable regression metrics for probabilistic classification?
+ * `mean_absolute_error`, `mean_squared_error`, `mean_squared_error(...,squared=False)`
+1. Should we introduce balanced error metrics for probabilistic classification?
+ * `balanced_mean_{squared,absolute}_error`, `balanced_log_loss`
+1. Do we keep `mean_prediction` and `mean_{over,under}prediction`?
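
The notes column above defines the two disparity metrics that are not plain transformations of a single sklearn metric: `demographic_parity_difference` is the difference transformation applied to `selection_rate`, and `equalized_odds_difference` takes the larger of the `true_positive_rate` and `false_positive_rate` differences. A self-contained illustrative sketch follows (not part of the patches; it assumes 0/1 labels and predictions and that every group contains both label values):

```python
# Illustrative sketch of the two derived disparity metrics described above.
import numpy as np


def demographic_parity_difference(y_true, y_pred, *, sensitive_features):
    """Largest gap in selection rate between any two sensitive-feature groups."""
    # y_true is unused, but kept to match the common metric signature
    y_pred = np.asarray(y_pred)
    sensitive_features = np.asarray(sensitive_features)
    rates = [np.mean(y_pred[sensitive_features == group] == 1)
             for group in np.unique(sensitive_features)]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, *, sensitive_features):
    """Larger of the across-group gaps in true-positive rate and false-positive rate."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitive_features = np.asarray(sensitive_features)
    gaps = []
    for label in (1, 0):  # label == 1 gives the TPR gap, label == 0 gives the FPR gap
        rates = [np.mean(y_pred[(sensitive_features == group) & (y_true == label)] == 1)
                 for group in np.unique(sensitive_features)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)
```
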
From 9359f135a372c90d093f36bc8a7c76ca144ddfe8 Mon Sep 17 00:00:00 2001
From: Miro Dudik
Date: Tue, 3 Mar 2020 18:21:43 -0500
Subject: [PATCH 3/5] fix list markdown

Signed-off-by: Miro Dudik
---
 api/METRICS.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/api/METRICS.md b/api/METRICS.md
index 391a660..ec11657 100644
--- a/api/METRICS.md
+++ b/api/METRICS.md
@@ -106,7 +106,7 @@ group_max_from_summary(summary)
 ## Dashboard questions
 
 1. Should we enable regression metrics for probabilistic classification?
- * `mean_absolute_error`, `mean_squared_error`, `mean_squared_error(...,squared=False)`
+   * `mean_absolute_error`, `mean_squared_error`, `mean_squared_error(...,squared=False)`
 1. Should we introduce balanced error metrics for probabilistic classification?
- * `balanced_mean_{squared,absolute}_error`, `balanced_log_loss`
+   * `balanced_mean_{squared,absolute}_error`, `balanced_log_loss`
 1. Do we keep `mean_prediction` and `mean_{over,under}prediction`?
From ddde2ff751d2a9aee17ba42d0090a9e4c182283e Mon Sep 17 00:00:00 2001
From: Miro Dudik
Date: Thu, 12 Mar 2020 11:48:43 -0400
Subject: [PATCH 4/5] rename _by_group to _group_summary for consistency

Signed-off-by: Miro Dudik
---
 api/METRICS.md | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/api/METRICS.md b/api/METRICS.md
index ec11657..e497e1d 100644
--- a/api/METRICS.md
+++ b/api/METRICS.md
@@ -3,18 +3,20 @@
 ## Example
 
 ```python
-# For most sklearn metrics, we would have their group version that returns a Bunch with fields
+# For most sklearn metrics, we will have their group version that returns
+# the summary of its performance across groups as well as the overall
+# performance, represented as a Bunch object with fields
 # * overall: overall metric value
 # * by_group: a dictionary that maps sensitive feature values to metric values
 
-summary = accuracy_score_by_group(y_true, y_pred, sensitive_features=sf, **other_kwargs)
+summary = accuracy_score_group_summary(y_true, y_pred, sensitive_features=sf, **other_kwargs)
 
 # Exporting into pd.Series or pd.DataFrame is not too complicated
 
 series = pd.Series({**summary.by_group, 'overall': summary.overall})
 df = pd.DataFrame({"model accuracy": {**summary.by_group, 'overall': summary.overall}})
 
-# Several types of scalar metrics for group fairness can be obtained from `summary` via transformation functions
+# Several types of scalar metrics for group fairness can be obtained from the group summary via transformation functions
 
 acc_difference = difference_from_summary(summary)
 acc_ratio = ratio_from_summary(summary)
@@ -38,13 +40,13 @@ acc_group_min = accuracy_score_group_min(y_true, y_pred, sensitive_features=sf,
 *Function signatures*
 
 ```python
-metric_by_group(metric, y_true, y_pred, *, sensitive_features, **other_kwargs)
-# return the summary for the provided `metric`, where `metric` has the signature
+group_summary(metric, y_true, y_pred, *, sensitive_features, **other_kwargs)
+# return the group summary for the provided `metric`, where `metric` has the signature
 # metric(y_true, y_pred, **other_kwargs)
 
-make_metric_by_group(metric)
-# return a callable object <metric>_by_group:
-# <metric>_by_group(...) = metric_by_group(<metric>, ...)
+make_metric_group_summary(metric)
+# return a callable object <metric>_group_summary:
+# <metric>_group_summary(...) = group_summary(<metric>, ...)
 
 # Transformation functions returning scalars
 difference_from_summary(summary)
@@ -52,15 +54,15 @@ ratio_from_summary(summary)
 group_min_from_summary(summary)
 group_max_from_summary(summary)
 
-# Metric-specific functions returning summary and scalars
-<metric>_by_group(y_true, y_pred, *, sensitive_features, **other_kwargs)
+# Metric-specific functions returning group summary and scalars
+<metric>_group_summary(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_difference(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_ratio(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_group_min(y_true, y_pred, *, sensitive_features, **other_kwargs)
 <metric>_group_max(y_true, y_pred, *, sensitive_features, **other_kwargs)
 ```
 
-*Summary of transformations and transformation codes*
+*Transformations and transformation codes*
 
 |transformation function|output|metric-specific function|code|aif360|
 |-----------------------|------|------------------------|----|------|
@@ -69,7 +71,7 @@ group_max_from_summary(summary)
 |`group_min_from_summary`|min|`<metric>_group_min`|Min| N/A |
 |`group_max_from_summary`|max|`<metric>_group_max`|Max| N/A |
 
-*Summary of tasks and task codes*
+*Tasks and task codes*
 
 |task|definition|code|
 |----|----------|----|
@@ -80,7 +82,7 @@ group_max_from_summary(summary)
 
 *Predefined metric-specific functions*
 
-* variants: D, R, Min, Max refer to the transformations from the table above; G refers to `<metric>_by_group`.
+* variants: D, R, Min, Max refer to the transformations from the table above; G refers to `<metric>_group_summary`.
 
 |metric|variants|task|notes|aif360|
 |------|--------|-----|----|------|
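
With the rename in place, the factory behaviour and the naming convention can be sketched end to end. The sketch below is illustrative only: `group_summary` and `difference_from_summary` are re-implemented compactly here, and the helper `_derive_scalar_metric` is an assumption about how the predefined `<metric>_difference`-style functions could be generated, not part of the patch.

```python
# Illustrative end-to-end sketch of the renamed API; not part of the patches.
import functools

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.utils import Bunch


def group_summary(metric, y_true, y_pred, *, sensitive_features, **other_kwargs):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitive_features = np.asarray(sensitive_features)
    by_group = {}
    for group in np.unique(sensitive_features):
        mask = sensitive_features == group
        by_group[group] = metric(y_true[mask], y_pred[mask], **other_kwargs)
    return Bunch(overall=metric(y_true, y_pred, **other_kwargs), by_group=by_group)


def difference_from_summary(summary):
    return max(summary.by_group.values()) - min(summary.by_group.values())


def make_metric_group_summary(metric):
    # <metric>_group_summary(...) = group_summary(<metric>, ...)
    return functools.partial(group_summary, metric)


def _derive_scalar_metric(metric, transform):
    # hypothetical helper: <metric>_difference(...) = transform(group_summary(<metric>, ...))
    def derived(y_true, y_pred, *, sensitive_features, **other_kwargs):
        summary = group_summary(metric, y_true, y_pred,
                                sensitive_features=sensitive_features, **other_kwargs)
        return transform(summary)
    return derived


accuracy_score_group_summary = make_metric_group_summary(accuracy_score)
accuracy_score_difference = _derive_scalar_metric(accuracy_score, difference_from_summary)

# Usage on toy data, mirroring the Example section
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
sf = ["a", "a", "a", "b", "b", "b"]

summary = accuracy_score_group_summary(y_true, y_pred, sensitive_features=sf)
print(pd.Series({**summary.by_group, 'overall': summary.overall}))
print(accuracy_score_difference(y_true, y_pred, sensitive_features=sf))
```
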
From 0b86e6d333bcc76d04c6cdce51c3256684e47944 Mon Sep 17 00:00:00 2001
From: Miro Dudik
Date: Mon, 16 Mar 2020 11:36:06 -0400
Subject: [PATCH 5/5] remove dashboard questions

Signed-off-by: Miro Dudik
---
 api/METRICS.md | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/api/METRICS.md b/api/METRICS.md
index e497e1d..98a2be2 100644
--- a/api/METRICS.md
+++ b/api/METRICS.md
@@ -104,11 +104,3 @@ group_max_from_summary(summary)
 |`log_loss`| G | prob | sklearn | - |
 |`mean_squared_error`| G | prob, reg | sklearn | - |
 |`r2_score`| G | reg | sklearn | - |
-
-## Dashboard questions
-
-1. Should we enable regression metrics for probabilistic classification?
-   * `mean_absolute_error`, `mean_squared_error`, `mean_squared_error(...,squared=False)`
-1. Should we introduce balanced error metrics for probabilistic classification?
-   * `balanced_mean_{squared,absolute}_error`, `balanced_log_loss`
-1. Do we keep `mean_prediction` and `mean_{over,under}prediction`?