Merge pull request #24 from fidelity/probabilistic_membership
Probabilistic membership
skadio committed Sep 7, 2023
2 parents 47848d3 + 4536a33 commit 4e59100
Showing 26 changed files with 2,966 additions and 625 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.txt
@@ -2,12 +2,20 @@
CHANGELOG
=========

-------------------------------------------------------------------------------
Sep 09, 2023 2.0.0
-------------------------------------------------------------------------------

- Probabilistic fairness metrics are added based on membership likelihoods and surrogates -- thanks to @mthielbar
- Algorithm is based on Surrogate Membership for Inferred Metrics in Fairness Evaluation (LION 2023)

-------------------------------------------------------------------------------
August 1, 2023 1.3.4
-------------------------------------------------------------------------------

- Added False Omission Rate Difference to Binary Fairness Metrics.


-------------------------------------------------------------------------------
April 21, 2023 1.3.3
-------------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1,2 +1,2 @@
# These owners will be the default owners for everything in the repo.
* @bkleyn @skadio
* @bkleyn @skadio @mthielbar
71 changes: 65 additions & 6 deletions README.md
@@ -3,9 +3,12 @@

# Jurity: Fairness & Evaluation Library

Jurity is a research library that provides fairness metrics, recommender system evaluations, classification metrics and bias mitigation techniques. The library adheres to PEP-8 standards and is tested heavily.
Jurity ([LION'23](), [ICMLA'21](https://ieeexplore.ieee.org/document/9680169)) is a research library
that provides fairness metrics, recommender system evaluations, classification metrics and bias mitigation techniques.
The library adheres to PEP-8 standards and is tested heavily.

Jurity is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments. Documentation is available at [fidelity.github.io/jurity](https://fidelity.github.io/jurity).
Jurity is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments.
Documentation is available at [fidelity.github.io/jurity](https://fidelity.github.io/jurity).

## Fairness Metrics
* [Average Odds](https://fidelity.github.io/jurity/about_fairness.html#average-odds)
@@ -51,7 +54,7 @@ from jurity.fairness import BinaryFairnessMetrics, MultiClassFairnessMetrics
binary_predictions = [1, 1, 0, 1, 0, 0]
multi_class_predictions = ["a", "b", "c", "b", "a", "a"]
multi_class_multi_label_predictions = [["a", "b"], ["b", "c"], ["b"], ["a", "b"], ["c", "a"], ["c"]]
is_member = [0, 0, 0, 1, 1, 1]
memberships = [0, 0, 0, 1, 1, 1]
classes = ["a", "b", "c"]

# Metrics (see also other available metrics)
@@ -63,11 +66,41 @@ print("Metric:", metric.description)
print("Lower Bound: ", metric.lower_bound)
print("Upper Bound: ", metric.upper_bound)
print("Ideal Value: ", metric.ideal_value)
print("Binary Fairness score: ", metric.get_score(binary_predictions, is_member))
print("Multi-class Fairness scores: ", multi_metric.get_scores(multi_class_predictions, is_member))
print("Multi-class multi-label Fairness scores: ", multi_metric.get_scores(multi_class_multi_label_predictions, is_member))
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships))
print("Multi-class Fairness scores: ", multi_metric.get_scores(multi_class_predictions, memberships))
print("Multi-class multi-label Fairness scores: ", multi_metric.get_scores(multi_class_multi_label_predictions, memberships))
```

## Quick Start: Probabilistic Fairness Evaluation

What if we do not know the protected membership attribute of each sample? This is a practical scenario that we refer to as _probabilistic_ fairness evaluation.

At a high level, instead of a strict 0/1 deterministic membership at the individual level, consider the probability of membership to protected classes for each sample.

An easy baseline is to convert these probabilities back to the deterministic setting by taking the maximum likelihood as the protected membership. This is problematic, as the goal is not to predict membership but to evaluate fairness.
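
As a rough sketch of this argmax baseline (plain NumPy for illustration; when likelihoods are passed without surrogates, `get_all_scores` applies the equivalent conversion internally via `get_argmax_memberships` from `jurity.utils_proba`):

```python
import numpy as np

# Illustrative only: collapse membership likelihoods into a deterministic
# membership by picking the most likely class for each sample
memberships = np.array([[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]])
argmax_memberships = np.argmax(memberships, axis=1)  # [1, 1, 1, 0]
```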

Taking this a step further, while we do not have membership information at the individual level, consider access to _surrogate membership_ at the _group level_. We can then infer the fairness metrics directly.

Jurity offers both options to address the case where membership data is missing. We provide an in-depth study and formal treatment in [Surrogate Membership for Inferred Metrics in Fairness Evaluation (LION 2023)]().

```python
from jurity.fairness import BinaryFairnessMetrics

# Instead of 0/1 deterministic membership at individual level
# consider likelihoods of membership to protected classes for each sample
binary_predictions = [1, 1, 0, 1]
memberships = [[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]]

# Metric
metric = BinaryFairnessMetrics.StatisticalParity()
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships))

# Surrogate membership: consider access to surrogate membership at the group level.
surrogates = [0, 2, 0, 1]
print("Binary Fairness score: ", metric.get_score(binary_predictions, memberships, surrogates))
```
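
Beyond a single metric, `get_all_scores` tabulates every implemented binary fairness metric in one call and accepts the same probabilistic inputs. A minimal sketch following the signature in this release (the `labels` below are made-up ground truth for illustration):

```python
from jurity.fairness import BinaryFairnessMetrics

labels = [1, 0, 0, 1]  # illustrative ground-truth labels
binary_predictions = [1, 1, 0, 1]
memberships = [[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.9, 0.1]]
surrogates = [0, 2, 0, 1]

# With probabilistic memberships and surrogates, inferred (bootstrap-based)
# scores are reported for the metrics that support them
scores_df = BinaryFairnessMetrics.get_all_scores(labels, binary_predictions,
                                                 memberships, surrogates,
                                                 membership_labels=[1])
print(scores_df)
```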


## Quick Start: Bias Mitigation

```python
@@ -154,6 +187,32 @@ print('F1 score is', f1_score.get_score(predictions, labels))

Jurity requires **Python 3.7+** and can be installed from PyPI using ``pip install jurity`` or by building from source as shown in [installation instructions](https://fidelity.github.io/jurity/install.html).

## Citation

If you use Jurity in a publication, please cite it as:

```bibtex
@inproceedings{DBLP:conf/lion/Melinda23,
author = {Melinda Thielbar and Serdar Kadioglu and Chenhui Zhang and Rick Pack and Lukas Dannull},
title = {Surrogate Membership for Inferred Metrics in Fairness Evaluation},
booktitle = {The 17th Learning and Intelligent Optimization Conference (LION)},
publisher = {{LION}},
year = {2023}
}
@inproceedings{DBLP:conf/icmla/MichalskyK21,
author = {Filip Michalsk{\'{y}} and Serdar Kadioglu},
title = {Surrogate Ground Truth Generation to Enhance Binary Fairness Evaluation in Uplift Modeling},
booktitle = {20th {IEEE} International Conference on Machine Learning and Applications,
{ICMLA} 2021, Pasadena, CA, USA, December 13-16, 2021},
pages = {1654--1659},
publisher = {{IEEE}},
year = {2021},
url = {https://doi.org/10.1109/ICMLA52953.2021.00264},
doi = {10.1109/ICMLA52953.2021.00264},
}
```

## Support
Please submit bug reports and feature requests as [Issues](https://github.com/fidelity/jurity/issues).

2 changes: 1 addition & 1 deletion jurity/_version.py
@@ -2,4 +2,4 @@
# Copyright FMR LLC <opensource@fidelity.com>
# SPDX-License-Identifier: Apache-2.0

__version__ = "1.3.4"
__version__ = "2.0.0"
41 changes: 41 additions & 0 deletions jurity/constants.py
@@ -0,0 +1,41 @@
from typing import NamedTuple
import numpy as np


class Constants(NamedTuple):
"""
Constant values used by the modules.
"""

default_seed = 1
float_null = np.float64(0.0)
bootstrap_trials = 100

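# String constants for confusion-matrix based rates and the prediction rate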
TPR = "TPR"
TNR = "TNR"
FPR = "FPR"
FNR = "FNR"
PPV = "PPV"
NPV = "NPV"
FDR = "FDR"
FOR = "FOR"
ACC = "ACC"
PRED_RATE = "Prediction Rate"

user_id = "user_id"
item_id = "item_id"
estimate = "estimate"
inverse_propensity = "inverse_propensity"
ips_correction = "ips_correction"
propensity = "propensity"

true_positive_ratio = "true_positive_ratio"
true_negative_ratio = "true_negative_ratio"
false_positive_ratio = "false_positive_ratio"
false_negative_ratio = "false_negative_ratio"
prediction_ratio = "prediction_ratio"
class_col_name = "class"
weight_col_name = "count"
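# Metric groupings: metrics that need no ground-truth labels, and metrics with probabilistic (surrogate-based) support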
no_label_metrics = ["StatisticalParity", "DisparateImpact"]
probabilistic_metrics = ["AverageOdds", "EqualOpportunity",
"FNRDifference", "StatisticalParity", "PredictiveEquality"]
128 changes: 97 additions & 31 deletions jurity/fairness/__init__.py
@@ -3,15 +3,17 @@
# SPDX-License-Identifier: Apache-2.0

import inspect
from typing import List, Union
from typing import List, Union, Optional
from typing import NamedTuple

import numpy as np
import pandas as pd

from jurity.fairness.base import _BaseBinaryFairness
from jurity.fairness.base import _BaseMultiClassMetric
from jurity.utils import check_inputs_validity
from jurity.utils import is_one_dimensional
from jurity.utils_proba import get_argmax_memberships
from jurity.utils_proba import get_bootstrap_results
from .average_odds import AverageOdds
from .disparate_impact import BinaryDisparateImpact, MultiDisparateImpact
from .equal_opportunity import EqualOpportunity
@@ -41,57 +43,121 @@ class BinaryFairnessMetrics(NamedTuple):
@staticmethod
def get_all_scores(labels: Union[List, np.ndarray, pd.Series],
predictions: Union[List, np.ndarray, pd.Series],
is_member: Union[List, np.ndarray, pd.Series],
membership_label: Union[str, float, int] = 1) -> pd.DataFrame:
memberships: Union[List, np.ndarray, pd.Series],
surrogates: Union[List, np.ndarray, pd.Series] = None,
membership_labels: Union[str, float, int, List, np.array] = 1,
bootstrap_results: Optional[pd.DataFrame] = None) -> pd.DataFrame:
"""
Calculates and tabulates all of the fairness metric scores.
Calculates and tabulates all fairness metric scores.
Parameters
----------
labels: Union[List, np.ndarray, pd.Series]
Binary ground truth labels for the provided dataset (0/1).
Binary ground truth labels for each sample.
predictions: Union[List, np.ndarray, pd.Series]
Binary predictions from some black-box classifier (0/1).
is_member: Union[List, np.ndarray, pd.Series]
Binary membership labels (0/1).
membership_label: Union[str, float, int]
Value indicating group membership.
Default value is 1.
Binary prediction for each sample from a black-box classifier (0/1).
memberships: Union[List, np.ndarray, pd.Series, List[List], pd.DataFrame]
Membership attribute for each sample.
If deterministic, it is the binary label for each sample, e.g., [0, 1, 0, ..., 1].
If probabilistic, it is the array of membership likelihoods for each sample,
i.e., a two-dimensional array such as [[0.6, 0.2, 0.2], ..., [..]].
surrogates: Union[List, np.ndarray, pd.Series]
Surrogate class attribute for each sample.
If the membership is deterministic, surrogates are not needed.
If the membership is probabilistic,
- if surrogates are given, inferred metrics are used
to calculate the fairness metric as proposed in [1]_.
- when surrogates are not given, the arg max likelihood is used as the membership for each sample.
Default is None.
membership_labels: Union[int, float, str, List[int], np.array[int]]
Labels indicating group membership.
If the membership is deterministic, a single str/int is expected, e.g., 1.
If the membership is probabilistic, a list or np.array of int is expected,
holding the indexes of the protected groups in the memberships array,
e.g., [1, 2, 3] if indexes 1, 2, and 3 are protected.
Default value is 1 for the deterministic case or [1] for the probabilistic case.
bootstrap_results: Optional[pd.DataFrame]
A Pandas dataframe with inferred scores based on surrogate class memberships.
Default value is None.
When given, other parameters will be discarded and bootstrap results will be used.
Returns
----------
Pandas data frame with all implemented binary fairness metrics.
"""
# Logic to check input types
check_inputs_validity(labels=labels, predictions=predictions, is_member=is_member, optional_labels=False)

fairness_funcs = inspect.getmembers(BinaryFairnessMetrics, predicate=inspect.isclass)[:-1]

# If memberships are given as likelihoods WITHOUT any surrogates, revise to the deterministic case
is_memberships_1d = is_one_dimensional(memberships)
if not is_memberships_1d and surrogates is None and bootstrap_results is None:
# Subtle point: membership_labels needs to be an array when memberships are 2d.
# If the user didn't specify it (default is 1), convert 1 -> [1] automatically
# BUT do not overwrite membership_labels; we are still in "deterministic" mode via argmax,
# and in deterministic mode we need a single primitive label like 1.
memberships = get_argmax_memberships(memberships, [1] if membership_labels == 1 else membership_labels)
# We have now converted 2d likelihood memberships into a deterministic 1d membership; set the flag to true
is_memberships_1d = True

# Probabilistic version
if not is_memberships_1d or bootstrap_results is not None:
if membership_labels == 1:
membership_labels = [1]

if bootstrap_results is None:
bootstrap_results = get_bootstrap_results(predictions, memberships, surrogates,
membership_labels, labels)

# Output df
df = pd.DataFrame(columns=["Metric", "Value", "Ideal Value", "Lower Bound", "Upper Bound"])

fairness_funcs = inspect.getmembers(BinaryFairnessMetrics, predicate=inspect.isclass)[:-1]
for fairness_func in fairness_funcs:

# Get metric
name = fairness_func[0]
class_ = getattr(BinaryFairnessMetrics, name) # grab a class which is a property of BinaryFairnessMetrics
instance = class_() # dynamically instantiate such class
metric = class_() # dynamically instantiate such class

if name in ["DisparateImpact", "StatisticalParity"]:
score = instance.get_score(predictions, is_member, membership_label)
elif name in ["GeneralizedEntropyIndex", "TheilIndex"]:
score = instance.get_score(labels, predictions)
else:
score = instance.get_score(labels, predictions, is_member, membership_label)
# Get score
score = BinaryFairnessMetrics._get_score_logic(metric, name,
labels, predictions, memberships, surrogates,
membership_labels, bootstrap_results)

if score is None:
score = np.nan
score = np.round(score, 3)
df = pd.concat([df, pd.DataFrame(
[[instance.name, score, instance.ideal_value, instance.lower_bound, instance.upper_bound]],
columns=df.columns)], axis=0, ignore_index=True)
# Add score
df = pd.concat([df,
pd.DataFrame([[metric.name, score, metric.ideal_value,
metric.lower_bound, metric.upper_bound]], columns=df.columns)],
axis=0, ignore_index=True)

df = df.set_index("Metric")

return df

@staticmethod
def _get_score_logic(metric, name,
labels, predictions,
memberships, surrogates,
membership_labels, bootstrap_results):

# Standard deterministic calculation
if bootstrap_results is None:
if name in ["DisparateImpact", "StatisticalParity"]:
score = metric.get_score(predictions, memberships, membership_labels)
elif name in ["GeneralizedEntropyIndex", "TheilIndex"]:
score = metric.get_score(labels, predictions)
else:
score = metric.get_score(labels, predictions, memberships, membership_labels)
else:
if name == "StatisticalParity":
score = metric.get_score(predictions, memberships, surrogates, membership_labels, bootstrap_results)
elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality"]:
score = metric.get_score(labels, predictions, memberships, surrogates,
membership_labels, bootstrap_results)
else:
score = None

# Prettify the score: NaN if not supported, rounded to 3 decimals otherwise
score = np.nan if score is None else np.round(score, 3)

return score


class MultiClassFairnessMetrics(NamedTuple):
"""