Improvements and Bug Fixes for Probabilistic Fairness #27
Conversation
…ogate column not named 'surrogate'. Update unit tests to fix error.
Merge update that fixes bug in summarizer that inserts missing values when membership dataframe surrogate column is not named surrogates.
Keep update_simulation up-to-date with proba_membership_updates. It should include all of that branch's fixes.
…imensional numpy array in EqualizedOdds bias mitigation. Convert to float when necessary.
…s will still print during unit tests because higher-level API will not have the option to turn warnings off. This keeps the API cleaner.
… experiment with min counts per surrogate.
… unit tests do not fail.
…put charts from results.
Review comments added
tests/test_mitigation_binary.py
Outdated
n2p_prob_0 = n2p_prob_0.item()
p2p_prob_1 = p2p_prob_1.item()
n2p_prob_1 = n2p_prob_1.item()
p2p_prob_0 = p2p_prob_0
are these lines needed?
This method returns None if the values come back empty. Otherwise, you get an error when you try to access them.
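For context, a minimal NumPy sketch (not the test code itself, just an illustration of the behavior being discussed): `.item()` unwraps a single-element array into a plain Python scalar, and raises if the selection comes back empty.

```python
import numpy as np

probs = np.array([0.25])   # e.g., a one-element selection from a results array
value = probs.item()       # -> 0.25 as a plain Python float

empty = np.array([])       # an empty selection
# empty.item()             # raises ValueError: can only convert an array of size 1 to a Python scalar
```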
hmm.. I guess I am not seeing something. Let's take an example:
p2p_prob_0 = p2p_prob_0
What does this line do? Setting x = x seems redundant to me, but maybe there is something subtle I am missing.
removing these
tests/test_mitigation_binary.py
Outdated
n2p_prob_0 = n2p_prob_0.item()
p2p_prob_1 = p2p_prob_1.item()
n2p_prob_1 = n2p_prob_1.item()
p2p_prob_0 = p2p_prob_0
are these lines needed?
Same as above.
removing these
jurity/utils_proba.py
Outdated
@@ -992,12 +992,13 @@ def make_summary_data(self, perf_df: pd.DataFrame, surrogate_df: pd.DataFrame =
         self.check_surrogate_data(surrogate_df)
         merged_data = perf_df.merge(surrogate_df, left_on=self.surrogate_perf_col_name(),
                                     right_on=self.surrogate_surrogate_col_name())
-        self.check_merged_data(merged_data, perf_df)
+        self.check_merged_data(merged_data, perf_df,warnings)
Add a space after the comma ",".
fixed
jurity/fairness/__init__.py
Outdated
@@ -147,7 +147,7 @@ def _get_score_logic(metric, name,
     else:
         if name == "StatisticalParity":
             score = metric.get_score(predictions, memberships, surrogates, membership_labels, bootstrap_results)
-        elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality"]:
+        elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality","EqualOpportunity"]:
this list already has EqualOpportunity, no?
fixed
Quick question on the "data": where is this data coming from? Is there an original version that we borrow from somewhere else (hence, copyright?), or is this data generated by us?
@@ -0,0 +1,59 @@
# Probabilistic Fairness Demonstration
I am still not sure about adding data/code from the paper into the library code here.
Particularly because a general user of the library, when they do a pip install to use Jurity, is not interested in downloading/copying these. It also bloats the library size.
Here are two other options we can consider:
- Merge the PR here, without the examples folder. Create a feature branch, named something like probabilistic_fairness_data, and add this "examples" folder in that data branch. We don't have to merge the data branch; it can always stay there. That is, anyone can access it, download, repeat, and re-run the experiments. But we don't merge it into the main library.
- Upload the paper data/code to HuggingFace. See here https://huggingface.co/datasets/skadio/optimized_item_selection
In that case, we added the optimized item selection data from a published paper as an HF dataset. It uses the Seq2Pat library (akin to Jurity) here. I can upload there if you want, OR you can do the same under your account.
Thoughts?
Probabilistic fairness, its accuracy, and the simulation method used in
these demonstrations are detailed in
<a href="https://doi.org/10.1007/978-3-031-44505-7_29">"Surrogate Membership for Inferred Metrics in Fairness Evaluation"</a>
Can we add a word about which table in the paper comes from which steps below?
It would be good to "align" the tables from the paper with the code here, so that when one looks at the paper vs. the folder here, the connection between the two reveals itself.
jurity/fairness/__init__.py
Outdated
@@ -147,7 +147,7 @@ def _get_score_logic(metric, name,
     else:
         if name == "StatisticalParity":
             score = metric.get_score(predictions, memberships, surrogates, membership_labels, bootstrap_results)
-        elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality"]:
+        elif name in ["AverageOdds", "EqualOpportunity", "FNRDifference", "PredictiveEquality","EqualOpportunity"]:
fixed
jurity/utils_proba.py
Outdated
@@ -791,7 +791,7 @@ def summarize(cls,
         likes_df.columns = membership_names
         likes_df = likes_df.reset_index()
         summarizer = cls("surrogates", "surrogates", "predictions", true_name=label_name, test_names=test_names)
-        return summarizer.make_summary_data(perf_df=df, surrogate_df=likes_df)
+        return summarizer.make_summary_data(perf_df=df, surrogate_df=likes_df,warnings=warnings)
fixed
jurity/utils_proba.py
Outdated
@@ -981,7 +981,7 @@ def check_surrogate_confusion_matrix(self, confusion_df, merged_df):
         # return False
         return True

-    def make_summary_data(self, perf_df: pd.DataFrame, surrogate_df: pd.DataFrame = None):
+    def make_summary_data(self, perf_df: pd.DataFrame, surrogate_df: pd.DataFrame = None,warnings=True):
fixed
jurity/utils_proba.py
Outdated
@@ -992,12 +992,13 @@ def make_summary_data(self, perf_df: pd.DataFrame, surrogate_df: pd.DataFrame =
         self.check_surrogate_data(surrogate_df)
         merged_data = perf_df.merge(surrogate_df, left_on=self.surrogate_perf_col_name(),
                                     right_on=self.surrogate_surrogate_col_name())
-        self.check_merged_data(merged_data, perf_df)
+        self.check_merged_data(merged_data, perf_df,warnings)
fixed
tests/test_mitigation_binary.py
Outdated
n2p_prob_0 = n2p_prob_0.item()
p2p_prob_1 = p2p_prob_1.item()
n2p_prob_1 = n2p_prob_1.item()
p2p_prob_0 = p2p_prob_0
removing these
What
The primary purpose of this PR is to add a path for probabilistic fairness to BinaryFairnessMetrics.get_all_scores(). Other small updates include changing the default minimum weight value from 30 to 5 (this should improve results for small samples, based on results from simulations).
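As an illustration only, here is a hedged sketch of how the new path might be invoked. The argument names and ordering are assumptions inferred from the parameters visible in the _get_score_logic diff above (predictions, memberships, surrogates, membership_labels), not a confirmed signature; check the released API before use.

```python
from jurity.fairness import BinaryFairnessMetrics

# Hypothetical toy inputs; names and ordering are assumptions.
labels = [1, 0, 1, 0, 1, 0]                    # ground-truth outcomes
predictions = [1, 0, 0, 0, 1, 1]               # model predictions
memberships = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4],
               [0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]  # probabilistic membership per row
surrogates = [10, 10, 11, 11, 12, 12]          # surrogate class per row
membership_labels = [1]                        # protected class label(s)

# With probabilistic memberships and surrogates supplied, get_all_scores
# should route to the probabilistic (inferred) fairness path added in this PR.
scores_df = BinaryFairnessMetrics.get_all_scores(labels, predictions, memberships,
                                                 surrogates, membership_labels)
print(scores_df)
```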
Other Features
A new class called UtilsProbaSimulator has been added to test_utils_proba.py. This simplifies unit tests for probabilistic fairness accuracy and enables other users to reproduce results from simulations in academic papers related to probabilistic fairness.
Why
BinaryFairnessMetrics.get_all_scores is the primary path that users will use to access fairness metrics. This makes probabilistic fairness more accessible to Jurity's audience. Adding easy simulation capabilities to test procedures will enable other users to contribute more easily.
How
get_all_scores follows the same rules as Metric.get_scores() for deciding whether the user is asking for probabilistic fairness or deterministic fairness. The rest is already handled by downstream processes implemented in Jurity 2.0.0.
tests/test_utils_proba.py has a new class called UtilsProbaSimulator, which takes a dictionary of input unfairness characteristics and contains classes for generating simulated data from an input surrogate class.
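Purely as a hypothetical illustration of that workflow: the constructor argument layout and the method name below are assumptions, not the actual interface; see tests/test_utils_proba.py for the real UtilsProbaSimulator API.

```python
# Hypothetical sketch only: exact names are assumptions, not the real interface.
from tests.test_utils_proba import UtilsProbaSimulator

# A dictionary of unfairness characteristics, e.g. per-group rates that the
# simulator could use to generate outcomes for each surrogate class.
unfairness_spec = {
    "group_a": {"positive_rate": 0.6, "fpr": 0.10, "fnr": 0.05},
    "group_b": {"positive_rate": 0.4, "fpr": 0.20, "fnr": 0.15},
}

simulator = UtilsProbaSimulator(unfairness_spec)
# simulated_df = simulator.simulate(surrogate_df)  # hypothetical method name
```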