Merge pull request #24 from fidelity/probabilistic_membership
Probabilistic membership
skadio committed Sep 7, 2023
2 parents 47848d3 + 4536a33 commit 4e59100
Showing 26 changed files with 2,966 additions and 625 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.txt
@@ -2,12 +2,20 @@
CHANGELOG
=========

-------------------------------------------------------------------------------
Sep 09, 2023 2.0.0
-------------------------------------------------------------------------------

- Probabilistic fairness metrics are added based on membership likelihoods and surrogates -- thanks to @mthielbar
- Algorithm is based on Surrogate Membership for Inferred Metrics in Fairness Evaluation (LION 2023)

-------------------------------------------------------------------------------
August 1, 2023 1.3.4
-------------------------------------------------------------------------------

- Added False Omission Rate Difference to Binary Fairness Metrics.


-------------------------------------------------------------------------------
April 21, 2023 1.3.3
-------------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1,2 +1,2 @@
# These owners will be the default owners for everything in the repo.
* @bkleyn @skadio
* @bkleyn @skadio @mthielbar
71 changes: 65 additions & 6 deletions README.md
@@ -3,9 +3,12 @@

# Jurity: Fairness & Evaluation Library

Jurity is a research library that provides fairness metrics, recommender system evaluations, classification metrics and bias mitigation techniques. The library adheres to PEP-8 standards and is tested heavily.
Jurity ([LION'23](), [ICMLA'21](https://ieeexplore.ieee.org/document/9680169)) is a research library
that provides fairness metrics, recommender system evaluations, classification metrics and bias mitigation techniques.
The library adheres to PEP-8 standards and is tested heavily.

Jurity is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments. Documentation is available at [fidelity.github.io/jurity](https://fidelity.github.io/jurity).
Jurity is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments.
Documentation is available at [fidelity.github.io/jurity](https://fidelity.github.io/jurity).

## Fairness Metrics
* [Average Odds](https://fidelity.github.io/jurity/about_fairness.html#average-odds)
@@ -51,7 +54,7 @@ from jurity.fairness import BinaryFairnessMetrics, MultiClassFairnessMetrics
binary_predictions = [1, 1, 0, 1, 0, 0]
multi_class_predictions = ["a", "b", "c", "b", "a", "a"]
multi_class_multi_label_predictions = [["a", "b"], ["b", "c"], ["b"], ["a", "b"], ["c", "a"], ["c"]]
is_member = [0, 0, 0, 1, 1, 1]
memberships = [0, 0, 0, 1, 1, 1]
classes = ["a", "b", "c"]

# Metrics (see also other available metrics)
@@ -63,11 +66,41 @@ print("Metric:", metric.description)
print("Lower Bound: ", metric.lower_bound)
print("Upper Bound: ", metric.upper_bound)
print("Ideal Value: ", metric.ideal_value)
print("Binary Fairness score: ", metric.get_score(binary_predictions, is_member))
print("Multi-class Fairness scores: ", multi_metric.get_scores(multi_class_predictions, is_member))
print("Multi-class multi-label Fairness scores: ", multi_metric.get_scores(multi_class_multi_label_predictions, is_member))
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships))
print("Multi-class Fairness scores: ", multi_metric.get_scores(multi_class_predictions, memberships))
print("Multi-class multi-label Fairness scores: ", multi_metric.get_scores(multi_class_multi_label_predictions, memberships))
```

## Quick Start: Probabilistic Fairness Evaluation

What if we do not know the protected membership attribute of each sample? This is a practical scenario that we refer to as _probabilistic_ fairness evaluation.

At a high level, instead of a strict 0/1 deterministic membership at the individual level, consider the probability of membership to protected classes for each sample.

An easy baseline is to convert these probabilities back to the deterministic setting by taking the maximum likelihood as the protected membership. This is problematic, as the goal is not to predict membership but to evaluate fairness.
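
As a rough sketch of this argmax baseline (plain NumPy for illustration; when likelihoods are passed without surrogates, `get_all_scores` applies the equivalent conversion internally via `get_argmax_memberships` from `jurity.utils_proba`):

```python
import numpy as np

# Illustrative only: collapse membership likelihoods into a deterministic
# membership by picking the most likely class for each sample
memberships = np.array([[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]])
argmax_memberships = np.argmax(memberships, axis=1)  # [1, 1, 1, 0]
```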

Taking this a step further, while we do not have membership information at the individual level, consider access to _surrogate membership_ at the _group level_. We can then infer the fairness metrics directly.

Jurity offers both options to address the case where membership data is missing. We provide an in-depth study and formal treatment in [Surrogate Membership for Inferred Metrics in Fairness Evaluation (LION 2023)]().

```python
from jurity.fairness import BinaryFairnessMetrics

# Instead of 0/1 deterministic membership at individual level
# consider likelihoods of membership to protected classes for each sample
binary_predictions = [1, 1, 0, 1]
memberships = [[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]]

# Metric
metric = BinaryFairnessMetrics.StatisticalParity()
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships))

# Surrogate membership: consider access to surrogate membership at the group level.
surrogates = [0, 2, 0, 1]
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships, surrogates))
```
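
Beyond a single metric, `get_all_scores` tabulates every implemented binary fairness metric in one call and accepts the same probabilistic inputs. A minimal sketch following the signature in this release (the `labels` below are made-up ground truth for illustration):

```python
from jurity.fairness import BinaryFairnessMetrics

labels = [1, 0, 0, 1]  # illustrative ground-truth labels
binary_predictions = [1, 1, 0, 1]
memberships = [[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]]
surrogates = [0, 2, 0, 1]

# With probabilistic memberships and surrogates, inferred (bootstrap-based)
# scores are reported for the metrics that support them
scores_df = BinaryFairnessMetrics.get_all_scores(labels, binary_predictions,
                                                 memberships, surrogates,
                                                 membership_labels=[1])
print(scores_df)
```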


## Quick Start: Bias Mitigation

```python
@@ -154,6 +187,32 @@ print('F1 score is', f1_score.get_score(predictions, labels))

Jurity requires **Python 3.7+** and can be installed from PyPI using ``pip install jurity`` or by building from source as shown in [installation instructions](https://fidelity.github.io/jurity/install.html).

## Citation

If you use Jurity in a publication, please cite it as:

```bibtex
@inproceedings{DBLP:conf/lion/Melinda23,
author = {Melinda Thielbar and Serdar Kadioglu and Chenhui Zhang and Rick Pack and Lukas Dannull},
title = {Surrogate Membership for Inferred Metrics in Fairness Evaluation},
booktitle = {The 17th Learning and Intelligent Optimization Conference (LION)},
publisher = {{LION}},
year = {2023}
}
@inproceedings{DBLP:conf/icmla/MichalskyK21,
author = {Filip Michalsk{\'{y}} and Serdar Kadioglu},
title = {Surrogate Ground Truth Generation to Enhance Binary Fairness Evaluation in Uplift Modeling},
booktitle = {20th {IEEE} International Conference on Machine Learning and Applications,
{ICMLA} 2021, Pasadena, CA, USA, December 13-16, 2021},
pages = {1654--1659},
publisher = {{IEEE}},
year = {2021},
url = {https://doi.org/10.1109/ICMLA52953.2021.00264},
doi = {10.1109/ICMLA52953.2021.00264},
}
```

## Support
Please submit bug reports and feature requests as [Issues](https://github.com/fidelity/jurity/issues).

2 changes: 1 addition & 1 deletion jurity/_version.py
@@ -2,4 +2,4 @@
# Copyright FMR LLC <opensource@fidelity.com>
# SPDX-License-Identifier: Apache-2.0

__version__ = "1.3.4"
__version__ = "2.0.0"
41 changes: 41 additions & 0 deletions jurity/constants.py
@@ -0,0 +1,41 @@
from typing import NamedTuple
import numpy as np


class Constants(NamedTuple):
"""
Constant values used by the modules.
"""

default_seed = 1
float_null = np.float64(0.0)
bootstrap_trials = 100

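# String constants for confusion-matrix based rates and the prediction rate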
TPR = "TPR"
TNR = "TNR"
FPR = "FPR"
FNR = "FNR"
PPV = "PPV"
NPV = "NPV"
FDR = "FDR"
FOR = "FOR"
ACC = "ACC"
PRED_RATE = "Prediction Rate"

user_id = "user_id"
item_id = "item_id"
estimate = "estimate"
inverse_propensity = "inverse_propensity"
ips_correction = "ips_correction"
propensity = "propensity"

true_positive_ratio = "true_positive_ratio"
true_negative_ratio = "true_negative_ratio"
false_positive_ratio = "false_positive_ratio"
false_negative_ratio = "false_negative_ratio"
prediction_ratio = "prediction_ratio"
class_col_name = "class"
weight_col_name = "count"
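# Metric groupings: metrics that need no ground-truth labels, and metrics with probabilistic (surrogate-based) support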
no_label_metrics = ["StatisticalParity", "DisparateImpact"]
probabilistic_metrics = ["AverageOdds", "EqualOpportunity",
"FNRDifference", "StatisticalParity", "PredictiveEquality"]
128 changes: 97 additions & 31 deletions jurity/fairness/__init__.py
@@ -3,15 +3,17 @@
# SPDX-License-Identifier: Apache-2.0

import inspect
from typing import List, Union
from typing import List, Union, Optional
from typing import NamedTuple

import numpy as np
import pandas as pd

from jurity.fairness.base import _BaseBinaryFairness
from jurity.fairness.base import _BaseMultiClassMetric
from jurity.utils import check_inputs_validity
from jurity.utils import is_one_dimensional
from jurity.utils_proba import get_argmax_memberships
from jurity.utils_proba import get_bootstrap_results
from .average_odds import AverageOdds
from .disparate_impact import BinaryDisparateImpact, MultiDisparateImpact
from .equal_opportunity import EqualOpportunity
@@ -41,57 +43,121 @@ class BinaryFairnessMetrics(NamedTuple):
@staticmethod
def get_all_scores(labels: Union[List, np.ndarray, pd.Series],
predictions: Union[List, np.ndarray, pd.Series],
is_member: Union[List, np.ndarray, pd.Series],
membership_label: Union[str, float, int] = 1) -> pd.DataFrame:
memberships: Union[List, np.ndarray, pd.Series],
surrogates: Union[List, np.ndarray, pd.Series] = None,
membership_labels: Union[str, float, int, List, np.array] = 1,
bootstrap_results: Optional[pd.DataFrame] = None) -> pd.DataFrame:
"""
Calculates and tabulates all of the fairness metric scores.
Calculates and tabulates all fairness metric scores.
Parameters
----------
labels: Union[List, np.ndarray, pd.Series]
Binary ground truth labels for the provided dataset (0/1).
Binary ground truth labels for each sample.
predictions: Union[List, np.ndarray, pd.Series]
Binary predictions from some black-box classifier (0/1).
is_member: Union[List, np.ndarray, pd.Series]
Binary membership labels (0/1).
membership_label: Union[str, float, int]
Value indicating group membership.
Default value is 1.
Binary prediction for each sample from a black-box classifier (0/1).
memberships: Union[List, np.ndarray, pd.Series, List[List], pd.DataFrame]
Membership attribute for each sample.
If deterministic, it is the binary label for each sample, e.g., [0, 1, 0, ..., 1].
If probabilistic, it is the array of membership likelihoods for each sample,
i.e., a two-dimensional array such as [[0.6, 0.2, 0.2], ..., [..]].
surrogates: Union[List, np.ndarray, pd.Series]
Surrogate class attribute for each sample.
If the membership is deterministic, surrogates are not needed.
If the membership is probabilistic,
- if surrogates are given, inferred metrics are used
to calculate the fairness metric as proposed in [1]_.
- when surrogates are not given, the arg max likelihood is used as the membership for each sample.
Default is None.
membership_labels: Union[int, float, str, List[int], np.array[int]]
Labels indicating group membership.
If the membership is deterministic, a single str/int is expected, e.g., 1.
If the membership is probabilistic, a list or np.array of int is expected,
holding the indexes of the protected groups in the memberships array,
e.g., [1, 2, 3] if indexes 1, 2, and 3 are protected.
Default value is 1 for the deterministic case or [1] for the probabilistic case.
bootstrap_results: Optional[pd.DataFrame]
A Pandas dataframe with inferred scores based on surrogate class memberships.
Default value is None.
When given, other parameters will be discarded and bootstrap results will be used.
Returns
----------
Pandas data frame with all implemented binary fairness metrics.
"""
# Logic to check input types
check_inputs_validity(labels=labels, predictions=predictions, is_member=is_member, optional_labels=False)

fairness_funcs = inspect.getmembers(BinaryFairnessMetrics, predicate=inspect.isclass)[:-1]

# If memberships are given as likelihoods WITHOUT any surrogates, revise to the deterministic case
is_memberships_1d = is_one_dimensional(memberships)
if not is_memberships_1d and surrogates is None and bootstrap_results is None:
# Subtle point: membership_labels needs to be an array when memberships are 2d.
# If the user didn't specify it (default is 1), convert 1 -> [1] automatically
# BUT do not overwrite membership_labels; we are still in "deterministic" mode via argmax,
# and in deterministic mode we need a single primitive label like 1.
memberships = get_argmax_memberships(memberships, [1] if membership_labels == 1 else membership_labels)
# We have now converted 2d likelihood memberships into a deterministic 1d membership; set the flag to true
is_memberships_1d = True

# Probabilistic version
if not is_memberships_1d or bootstrap_results is not None:
if membership_labels == 1:
membership_labels = [1]

if bootstrap_results is None:
bootstrap_results = get_bootstrap_results(predictions, memberships, surrogates,
membership_labels, labels)

# Output df
df = pd.DataFrame(columns=["Metric", "Value", "Ideal Value", "Lower Bound", "Upper Bound"])

fairness_funcs = inspect.getmembers(BinaryFairnessMetrics, predicate=inspect.isclass)[:-1]
for fairness_func in fairness_funcs:

# Get metric
name = fairness_func[0]
class_ = getattr(BinaryFairnessMetrics, name) # grab a class which is a property of BinaryFairnessMetrics
instance = class_() # dynamically instantiate such class
metric = class_() # dynamically instantiate such class

if name in ["DisparateImpact", "StatisticalParity"]:
score = instance.get_score(predictions, is_member, membership_label)
elif name in ["GeneralizedEntropyIndex", "TheilIndex"]:
score = instance.get_score(labels, predictions)
else:
score = instance.get_score(labels, predictions, is_member, membership_label)
# Get score
score = BinaryFairnessMetrics._get_score_logic(metric, name,
labels, predictions, memberships, surrogates,
membership_labels, bootstrap_results)

if score is None:
score = np.nan
score = np.round(score, 3)
df = pd.concat([df, pd.DataFrame(
[[instance.name, score, instance.ideal_value, instance.lower_bound, instance.upper_bound]],
columns=df.columns)], axis=0, ignore_index=True)
# Add score
df = pd.concat([df,
pd.DataFrame([[metric.name, score, metric.ideal_value,
metric.lower_bound, metric.upper_bound]], columns=df.columns)],
axis=0, ignore_index=True)

df = df.set_index("Metric")

return df

@staticmethod
def _get_score_logic(metric, name,
labels, predictions,
memberships, surrogates,
membership_labels, bootstrap_results):

# Standard deterministic calculation
if bootstrap_results is None:
if name in ["DisparateImpact", "StatisticalParity"]:
score = metric.get_score(predictions, memberships, membership_labels)
elif name in ["GeneralizedEntropyIndex", "TheilIndex"]:
score = metric.get_score(labels, predictions)
else:
score = metric.get_score(labels, predictions, memberships, membership_labels)
else:
if name == "StatisticalParity":
score = metric.get_score(predictions, memberships, surrogates, membership_labels, bootstrap_results)
elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality"]:
score = metric.get_score(labels, predictions, memberships, surrogates,
membership_labels, bootstrap_results)
else:
score = None

# Prettify the score: NaN if not supported, rounded to 3 decimals otherwise
score = np.nan if score is None else np.round(score, 3)

return score


class MultiClassFairnessMetrics(NamedTuple):
"""