
Plots: Add plot_disparities_in_performance and plot_disparities_in_metric #561

Closed

Conversation

@kevinrobinson (Contributor) commented Aug 14, 2020

Rationale

This work is intended to help along the suggestion in fairlearn/fairlearn-proposals#14 (comment), with the assumption that the UI and Azure-specific bits of the project will be removed and placed in a separate repository with a different governance structure.

To illustrate that this is both viable and plays to the strengths of folks currently staffing the project, here is an example function that could be used in the quick start guide, removing the need for any Azure or UI code. This is based on some key assumptions, including:

  • to date, there has been no user research or feedback validating that users can use these visualizations to do meaningful work that assesses or reduces real-world harm
  • the current visualizations provide little value in making meaningful decisions about how to interpret the data, or how to take action on it
  • the interactivity of the visualizations is low-value based on the team's current understanding of practitioner needs (e.g., Holstein et al. 2019; Madaio et al. 2020)
  • the overhead of making widgets and HTML/CSS/JS is a barrier for team members in maintaining, supporting, and growing the project
  • the current visualizations do not address any of the core challenges involved in fairness as sociotechnical work (e.g., Selbst et al. 2019), so reducing maintenance burden can free folks up to engage in higher-value work

If these assumptions are off, please comment and help validate them! 😄

What this adds

This pull request adds the plot_disparities_in_performance and plot_disparities_in_selection_rate functions, intended to replace those two visualizations built as widgets in HTML/CSS/JS.

Disparity in selection rate

Note the change of bar color to grey, since blue was used to communicate something specific in the chart above and the meaning of the color is not the same in this chart.

now: +3 clicks in Azure UI

Screen Shot 2020-08-14 at 4 31 52 PM

added: plot_disparities_in_selection_rate

Screen Shot 2020-08-14 at 4 34 20 PM

Disparity in accuracy

Note that the overprediction and underprediction values within the charts are quite different in the new plot. They reflect the values that the false_positive_rate and false_negative_rate functions return, but those don't really make sense to me. I'm probably mixing up something important here, but I'm not sure what. I'll take another look next week, but if other folks can spot what the issue is, that'd be awesome and super helpful! 👍

EDIT: copy-paste typo in underprediction/overprediction is fixed now.

now: +3 clicks in Azure UI

Screen Shot 2020-08-14 at 4 21 11 PM

added: plot_disparities_in_performance

Screen Shot 2020-08-14 at 4 03 45 PM

EDIT: as @romanlutz helpfully pointed out, this also supports multiple values for a single sensitive attribute. This example uses data.data['race'] in place of data.data['sex'] below (these examples are all narrowly technical without sociotechnical context):
Screen Shot 2020-08-17 at 10 17 44 AM

Other notes

Code to test this in Jupyter on 0.4.6:

# copied verbatim from quickstart
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
data = fetch_openml(data_id=1590, as_frame=True)
X = pd.get_dummies(data.data)
y_true = (data.target == '>50K') * 1
sex = data.data['sex']

from fairlearn.metrics import group_summary
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X, y_true)
y_pred = classifier.predict(X)

# to use new functions
from fairlearn.plots import plot_disparities_in_performance, plot_disparities_in_selection_rate
plot_disparities_in_performance(y_true, y_pred, sex)
plot_disparities_in_selection_rate(y_true, y_pred, sex)

@riedgar-ms (Member)

As a warning, matplotlib only gets installed as part of [customplots].

I think @romanlutz came up with a way of testing plotting methods too?

@romanlutz added the "enhancement" (New feature or request) label Aug 15, 2020
@romanlutz (Member)

As a warning, matplotlib only gets installed as part of [customplots].

I think @romanlutz came up with a way of testing plotting methods too?

That's correct! I'm not too fond of that name customplots, so if people want to make that shorter (plots) I'm all in favor.

We'd also have to provide proper error messages telling users that they need matplotlib if it's not installed, just like with the postprocessing plots.
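
For illustration, a minimal sketch of what such a guard could look like (the helper name and message text are placeholders modeled on the existing postprocessing plots, not code from this PR):

# Hypothetical lazy-import guard for a plotting module.
_MATPLOTLIB_IMPORT_ERROR_MESSAGE = ("Please make sure to install "
                                    "fairlearn[customplots] to use the plotting functions.")

def _check_matplotlib_available():
    try:
        import matplotlib.pyplot  # noqa: F401
    except ImportError:
        raise RuntimeError(_MATPLOTLIB_IMPORT_ERROR_MESSAGE)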

I don't quite agree with everything written in the description but that's immaterial.

The more important question to me is what this means to UI development for the community and the proposal fairlearn/fairlearn-proposals#14 . Does this mean you'd prefer not to have a separate repo elsewhere? On Thursday people were quite insistent that there should be a way for the community to write UI, so I'd rather get that settled before inventing new UIs.

Regarding testing: there's a pytest plugin for testing plots but it didn't work across platforms that we test on, so I never turned it on. It does work locally, though. See https://github.com/fairlearn/fairlearn/blob/master/test/unit/postprocessing/test_plots.py for more information.
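
For illustration, this is roughly what plot tests look like with the pytest-mpl plugin (assuming that is the plugin meant here; tests are run with pytest --mpl after generating baseline images with --mpl-generate-path):

import pytest
import matplotlib.pyplot as plt

@pytest.mark.mpl_image_compare
def test_selection_rate_bar_chart():
    # pytest-mpl compares the returned figure against a stored baseline image
    fig, ax = plt.subplots()
    ax.bar(['Female', 'Male'], [0.09, 0.27], color='grey')
    ax.set_title('Disparity in selection rate')
    return fig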

Finally, this kind of change would need an update to CHANGES.md, API reference (docs folder), and user guide.

@kevinrobinson (Contributor, Author)

@riedgar-ms @romanlutz Thanks for the comments! 👍

This gets started with the first step outlined in fairlearn/fairlearn-proposals#14 (comment), and I'm hoping to help move things forward by showing rather than telling :) It also improves the UX for prospective users currently facing issues like #484 #501 and #558.

If folks want to merge this, I can work myself or with anyone else to finish this off.

@romanlutz (Member)

@riedgar-ms @romanlutz Thanks for the comments! 👍

This gets started with the first step outlined in fairlearn/fairlearn-proposals#14 (comment), and I'm hoping to help move things forward by showing rather than telling :) It also improves the UX for prospective users currently facing issues like #484 #501 and #558.

If folks want to merge this, I can work myself or with anyone else to finish this off.

Right. I think this is a fine first step. We sort of agreed to try and add documentation (API reference for new modules, CHANGES.md, user guide) whenever adding the code, although that's obviously not as strict as long as we have severe gaps around many parts of the codebase. Starting a new module strikes me as a great place to make sure we have some of these in place. I certainly won't insist on a comprehensive user guide in this PR, but the other parts are all fixable in a single line each. Let me know if you have questions about it.

The matplotlib concern should probably be addressed as well.

Other than that this looks great. Thanks for getting this started @kevinrobinson .

Of course, this assumes that people in the community want this solution for the UI. It would certainly be nice to hear people's opinions, although in the current form matplotlib is optional and therefore this isn't that different of a setup from before. I know @adrinjalali talked about matplotlib plots before so this should be in line with his opinions hopefully (?)

@kevinrobinson (Contributor, Author) commented Aug 16, 2020

@romanlutz Thanks! 👍

re: community consensus, my interpretation of fairlearn/fairlearn-proposals#14 (comment) and fairlearn/fairlearn-proposals#14 (comment) was that other folks have previously suggested this approach, which is part of why I'm trying to build on that consensus here.

But to shift gears, I think it is far more important whether users want this :) I think all user-facing changes should primarily be evaluated on the value they create for users, and be informed by user research or other informal methods of validation and co-design. So even though the concerns in fairlearn/fairlearn-proposals#14 are a primary consideration, hopefully on balance the discussion on this pull request also considers whether this change would serve users. In that spirit, this PR improves the UX for people trying to visualize fairness metrics but facing issues like #484, #501, and #558.


re: documentation, I added API docs and a section to the user guide, mirroring what is already there. I didn't make any edits to improve quality (eg, adding sociotechnical context, showing how these charts are relevant for assessing or reducing real harms, adding guidelines on how to interpret visuals, etc.)

re: code, I added commits removing customplots everywhere, that's ready for review. I read #289, and so only added smoke tests for now. The code in plot_disparities_in_performance using metrics functions is the main place where code review would be really helpful! 👍

EDIT: removing the test checking for matplotlib led to CI failures for limited env. I looked at the job steps, and it seems that job only exists for that one test. I removed it, and references in other config to the test/install folder that no longer exists without that test. Hopefully doing that helps us cut through the incidental complexity without getting too derailed :)

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
@romanlutz (Member)

Leaving matplotlib out of the main/default/base (not sure about terminology) package was a very conscious choice. If we want to undo that we should at least hear what @adrinjalali @MiroDudik @riedgar-ms think. That in itself will hold up this PR a bit because @MiroDudik isn't available for a few days.

I vaguely remember lots of issues testing this on different platforms. The separation into a base package and a plotting extension made that slightly harder, so the simplicity of having just one is something I appreciate. Perhaps the issues with including matplotlib go further than that, though? I'm sure @adrinjalali will know.

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
underpredictions = []
for sensitive_value in sensitive_values:
    overpredictions.append(fp_summary['by_group'][sensitive_value])
    underpredictions.append(fn_summary['by_group'][sensitive_value])
Contributor Author

See the issue description for more, but a review of the code in this area that determines overpredictions and underpredictions would be super helpful! 👍

Member

This feels like something that might belong in the metrics code (although that is in the process of being rewritten).

Member

I pointed this out in the other comment (today) but after consulting with @MiroDudik overprediction != FPR and underprediction != FNR, so we should change it so that it doesn't say over and underprediction anymore, but rather FPR and FNR.
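
For concreteness, a small illustration of the denominator difference (my own sketch; it assumes "overprediction" here means the mean of max(y_pred - y_true, 0) over all samples, as in the dashboard-style metrics):

import numpy as np

y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 0, 1])

fp = np.sum((y_pred == 1) & (y_true == 0))  # 1 false positive
fn = np.sum((y_pred == 0) & (y_true == 1))  # 1 false negative

# over-/underprediction: denominator is the total number of samples
overprediction = fp / len(y_true)     # 1/5 = 0.2
underprediction = fn / len(y_true)    # 1/5 = 0.2

# FPR/FNR: denominators are the actual negatives and positives, respectively
fpr = fp / np.sum(y_true == 0)        # 1/3 ≈ 0.33
fnr = fn / np.sum(y_true == 1)        # 1/2 = 0.5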

Contributor Author

@romanlutz Thanks! This is super helpful 👍

As the current plan for 0.5 is to ship the metrics changes, switch to Flask for the widget, and make documentation updates all around (eg, so the quickstart works), I'll leave this work alone so as to not slow down those other things from shipping first.

@romanlutz (Member) left a comment

All the comments are minor and easy to address. No objections from me, looks great. Definitely need to wait for @MiroDudik 's thoughts which will take a week I'm afraid. Since this adds a new module and dependency to the base package it's not something I feel comfortable deciding without his input.

Besides, I'm super curious to hear from others in the community. If this is in line with people's ideas about a place to create visualizations then that's fantastic.

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
@adrinjalali (Member)

matplotlib can certainly be a soft dependency for users who need the plot functionality (like pip install fairlearn[plotting] and/or pip install fairlearn[extras]).

Otherwise I'm all in favor of having the basic plots using matplotlib which is kinda the main household plotting tool.
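
For illustration, a packaging sketch of that kind of soft dependency (the extras name below is just the one floated above, not what fairlearn actually ships):

# setup.py (excerpt, hypothetical)
extras_require = {
    "plotting": ["matplotlib>=3.0"],
}
# users would then opt in via: pip install fairlearn[plotting]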

selection_rates = []
for sensitive_value in sensitive_values:
    selection_rates.append(selection_rate_summary['by_group'][sensitive_value])
disparity = abs(selection_rates[0] - selection_rates[1])
Contributor Author

If we move forward, fix this.

Member

You probably mean that it should be max and min as opposed to first and second element?
@riedgar-ms 's metrics API changes will allow for that to be done easily, although we don't have to wait for that necessarily.
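
A one-line sketch of the suggested fix (illustrative only), so the disparity is the range across all groups rather than assuming exactly two:

disparity = max(selection_rates) - min(selection_rates)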

@adrinjalali (Member)

Should we move this forward?

@@ -39,4 +39,3 @@ jobs:

- template: templates/build-widget-job-template.yml

- template: templates/limited-installation-job-template.yml
Member

Why is this being removed?

Member

Because "limited" = no matplotlib, so since he added it as a "core" dependency it's not needed anymore.

Member

undoing this now and therefore resolving this comment

Roman Lutz added 2 commits December 18, 2020 22:27
@romanlutz (Member)

As discussed with @kevinrobinson I took over this PR with the following changes:

  • move to new MetricFrame since this PR predates v0.5.0
  • generalize the plot_disparities_in_selection_rate function to work with any metric (including binary classification & regression), hence renamed to plot_disparities_in_metric
  • fix labels since we're showing FPR/FNR and the labels said over-/underprediction which isn't quite the same (denominator is different)

The reason there are two plots right now is that @kevinrobinson did an awesome job replicating the original plots from the Fairlearn dashboard. In reality, the result from plot_disparities_in_performance is just a combined FPR/FNR plot that shows accuracy through text. I'm fine with adding that for the sake of consistency with the existing dashboard, but theoretically it's possible to get the same information by calling plot_disparities_in_metric three times with the three individual metrics.
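
For example, something along these lines (illustrative usage only; the module name fairlearn.metrics_plots and the exact signature are as proposed in this PR and may still change):

from sklearn.metrics import accuracy_score
from fairlearn.metrics import false_positive_rate, false_negative_rate
from fairlearn.metrics_plots import plot_disparities_in_metric

for metric in (accuracy_score, false_positive_rate, false_negative_rate):
    plot_disparities_in_metric(metric, y_true=y_true, y_pred=y_pred,
                               sensitive_features=sex)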

We can probably find better names (?) Suggestions welcome! Big ones:

  • Should the module be metrics_plots or more generically plots, or plotting? We have plotting functionality for postprocessing currently half-hidden away in the postprocessing module. Perhaps that makes sense, but I'd like to hear other people's opinions.
  • The function names are quite long.

@hildeweerts @adrinjalali @MiroDudik @riedgar-ms

I will also open a few issues with "help-wanted" to add plots for

  • model comparison similar to what we have in the Fairlearn dashboard
  • passing in multiple metrics rather than just one, and plotting all of them as subplots

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
@adrinjalali (Member)

I'll check the PR soon, in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user-guide?

@romanlutz (Member) left a comment

I'll check the PR soon, in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user-guide?

Which .png files are you referring to? Perhaps you're looking at an earlier version? The original PR had png files, but I replaced them with .. plot:: routines provided by matplotlib to generate the plot when creating the webpage. IMO that's much preferable since we don't have to update the png whenever we make code changes.


- accuracy_summary['by_group'][sensitive_values[1]])

# chart text for localization
title_text = 'Disparity in performance'
Member

My assumption is still that we won't do localization for these, so I won't consider this blocking unless anyone jumps in stating otherwise.

@@ -9,8 +9,6 @@

OUTPUT_SEPARATOR = "-"*65

_MATPLOTLIB_IMPORT_ERROR_MESSAGE = "Please make sure to install fairlearn[customplots] to use " \
Member

Reverting to soft dependency...

The :py:mod:`fairlearn.metrics_plots` module visualizes fairness metrics from
different perspectives.

The examples below illustrate a scenario where *binary gender* is
Member

should we use a different example dataset for new examples we introduce?

Member

I'm the first to criticize usage of the adult dataset since it's absolutely nonsensical and doesn't show a proper realistic task. After all, how is it useful to predict if someone earns more than the arbitrary cutoff amount of $50k?

That means we have to have something better, though. The argument for adult so far was that it's widely used in the fairness literature when comparing results of mitigation techniques. When using another dataset we've always run into the problem that you'd need to contextualize this first. For example, it's questionable to use COMPAS or Boston housing datasets without at least pointing at the complicated and controversial background. We haven't written a section in the documentation about that yet, though.

I really want us to replace adult in every example eventually. I do think that it's a separate task from this PR, though. Perhaps the credit card default dataset would work, but again we should describe the context somewhere to avoid having people think that it's fine to just use such a dataset.

Wdyt?

Member

I agree that we should contextualize the data we use in our docs, but this requirement is making us add more documentation with a dataset which we know has these clear issues. I would be happier (I think) if we used another dataset which doesn't have these issues, and left the contextualization as future work, instead of adding more examples with this dataset and then having to change all of them later.

That said, I wouldn't veto this PR because of the dataset; it'd just be really nice if we moved away from it, and not using it anew is a good place to start phasing it out, I think.

Member

I very much share your perspective. I'm not 100% happy with any of the three datasets currently in fairlearn.datasets, but I think the credit card default dataset from UCI could be great. We use it in this notebook:
https://github.com/fairlearn/fairlearn/blob/master/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb
For that, I'd create a separate PR to add it to the datasets module first, and then replace it for this example in a follow-up PR. We can discuss whether we want it elsewhere, too, for example in the quickstart. That dataset has its own issues (which were alluded to in #418 ), of course, but I think for the purpose of demonstrating the functionality of these plotting functions it'll be fine.

This would mean this stays as is in the current PR, though, because expanding the scope to adding another dataset, documenting that dataset, etc. doesn't seem right.

Member

Sure, I thought we'd just use fetch_openml and not add it necessarily to the datasets. If you think that's worse than having this dataset, then sure.

Member

UCI credit data set is actually not ideal, because it doesn't exhibit equalized-odds disparity, so we have to partly synthesize it (which makes for a slightly odd notebook, but it would look even weirder here I think). But I don't remember whether it shows other kinds of disparity.

Two options:

  • We use the original (untweaked) version of the UCI data set even if we don't have much disparity.
  • We use a different data set that is known to exhibit disparity, say when using linear models.

That said, I don't think this should be a blocker on this PR. However, if we are okay just using the untweaked UCI credit data set, we can put fetch_openml in this sample code for now, and replace it with the fairlearn.datasets equivalent once/if it's available.

@LeJit, @hildeweerts: do you have any thoughts on this?

Contributor

With regards to other data sets in a similar domain that we could use for examples, we could potentially use the German Credit dataset or the Lending Club dataset. Both are commonly used benchmark datasets in fairness papers, and German Credit is available through OpenML (Lending Club isn't though). We should eventually add the German Credit dataset to fairlearn.datasets anyways because it is a commonly used benchmark.

I have already written the fairlearn.datasets function for the UCI credit card data set as part of a sub-task to redo the associated Jupyter notebook. If we want to use the untweaked UCI credit card dataset for now, I can create a small pull request to get that code merged in.

Contributor

Whether we have to add a particular data set to fairlearn.datasets depends quite a bit on the purpose of that part of the repo. We've actually had discussions about this in the past and I'm not sure whether a consensus was reached (see e.g., #583).

On a more practical note, I'm not familiar with the Lending Club data but the fact it isn't on OpenML at the moment shouldn't be a problem, because we can just add it ourselves! Is this the one you refer to: https://www.kaggle.com/wordsforthewise/lending-club? From the description it doesn't seem like this data was collected with permission from Lending Club, though...

Contributor

I was just wondering, perhaps concerns re. proper contextualization can be partly solved by taking the effort to create data sheets (see https://arxiv.org/abs/1803.09010) for the data sets in fairlearn.datasets. We probably won't be able to fill out everything, but it seems to me that if we want to practice what we preach, this would be a minimal suggestion.

@@ -0,0 +1,86 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
Member

remove Microsoft from the new files?

sensitive_features=sensitive_features)

# chart text for localization
metric_text = metric.__name__.replace('_', ' ')
Member

we could accept metric_name or x_label, y_label optionally if the __name__ is not properly set or the user wants an alternative text.

Member

Agreed! I added an optional argument metric_name and documented that we'll use __name__ if no metric name is passed.
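
Roughly like this (an illustrative sketch of the fallback, not the exact PR code):

def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features,
                               metric_name=None, show_plot=True):
    if metric_name is None:
        # fall back to the metric function's __name__, e.g. "selection rate"
        metric_name = metric.__name__.replace('_', ' ')
    ...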

from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features, show_plot=True):
Member

The plot functions should probably accept and return an ax parameter to receive and return the matplotlib's axis object so that users can further customize the plots.

Member

Sounds good. I compared with some of the functions in scikit-learn. They don't have a show_plot arg, but rather the caller is expected to call plt.show() themselves. Is that preferable? I find it kind of odd to call a function called plot_* and then there's no plot shown unless I do something more, but perhaps that's just me.

Member

In the context of matplotlib, I find it nice if the caller has the capability of putting each plot in a subplot (hence passing ax), and it makes things more configurable when the method returns an object which the user can change. In terms of the user having to call plt.show(), I'm agnostic. I don't mind having the show_plot as you have it here, but it's basically replacing a single line of code. I also don't mind having it True by default, which is kinda nice to the user I guess.
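
In code, the convention being discussed looks roughly like this (illustrative sketch only, not the PR implementation):

import matplotlib.pyplot as plt

def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features,
                               show_plot=True, ax=None):
    if ax is None:
        _, ax = plt.subplots()   # create a fresh axis if the caller didn't pass one
    # ... draw the bars onto ax ...
    if show_plot:
        plt.show()
    return ax                    # returning ax lets callers customize further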

Member

I'll leave it in there for now then, unless someone has stronger feelings about this :-)

@@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
Member

I'd rather have these bits in the test files themselves.

Member

But that would mean we need to duplicate it if there are multiple files. Or did you mean "make it all a single file"? I'll go with that for now, but lmk if you meant something else.

Member

I meant duplicating it in the file. It makes understanding a test file as a standalone file much easier. I'm not sure how common it is to load these datasets in the conftest, but I'm not used to it. And for a third person adding a test in a test file, they'd be a bit confused or not see that there are datasets available and they'd load their own in the tests.

Member

That's fair. We've been doing it like that throughout the test/unit directory so far, so that's probably why Kevin originally copied this behavior. I assume the current structure is in line with your expectations. If not, please let me know!

Roman Lutz added 2 commits December 31, 2020 12:42
… empty line at end of file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Roman Lutz added 2 commits January 4, 2021 12:51
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
@adrinjalali (Member) left a comment

Let me know when you want another round of review :)


from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, *, y_true, y_pred, sensitive_features, show_plot=True,
@MiroDudik (Member) commented Jan 6, 2021

Three points re. API here:

  • I think that * in the signature should be after y_pred as we have in MetricFrame
  • I think this should be called plot_metric_by_group
  • instead of metric_name, we could just allow the format where metric is a single item dictionary? this would be more in line with how you would get this done with MetricFrame.

With the above three modifications, the signature becomes:

def plot_metric_by_group(metric, y_true, y_pred, *, sensitive_features, show_plot=True, ax=None):

Another idea (for another PR) is to have a method MetricFrame.plot(show_plot=True, ax=None) and this would be just a wrapper for that? This would be analogous to pd.DataFrame.plot. Although there's an obvious wrinkle re. what to do if you have multiple metrics and/or control features, which we probably don't need to resolve in this PR :-)
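
A hypothetical sketch of that wrapper idea (not part of this PR; it leans on MetricFrame.by_group being a pandas object with its own .plot, as noted later in this thread):

import matplotlib.pyplot as plt

def metric_frame_plot(mf, show_plot=True, ax=None):
    """Standalone stand-in for a possible MetricFrame.plot method."""
    ax = mf.by_group.plot.bar(ax=ax)
    if show_plot:
        plt.show()
    return ax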

@hildeweerts (Contributor) commented Jan 7, 2021

I agree with renaming to make it more consistent & I like the idea for MetricFrame.plot().

Signed-off-by: MiroDudik <mdudik@gmail.com>
@hildeweerts (Contributor)

I know I'm a bit late to the party and I haven't run the latest code myself, but I have to admit that I personally don't find the plot_disparities_in_performance() plot very intuitive - although displaying FPR/FNR already makes more sense to me than underprediction/overprediction.

I'm probably biased though because I've mostly worked with imbalanced datasets in the past (fraud detection, predictive maintenance, etc.), in which case accuracy doesn't really make sense as a performance metric, nor does plotting FPR/FNR on the same scale.

@romanlutz (Member)

I know I'm a bit late to the party and I haven't run the latest code myself, but I have to admit that I personally don't find the plot_disparities_in_performance() plot very intuitive - although displaying FPR/FNR already makes more sense to me than underprediction/overprediction.

I'm probably biased though because I've mostly worked with imbalanced datasets in the past (fraud detection, predictive maintenance, etc.), in which case accuracy doesn't really make sense as a performance metric, nor does plotting FPR/FNR on the same scale.

@MiroDudik mentioned something similar and I totally agree. IMO the ideal state would be to cut that function and if someone wants to see both they can use the other function to plot FPR and FNR. The only reason it's there is that Kevin replicated the existing FairlearnDashboard in this PR, and I felt bad about just cutting it. If we all agree that it's best not to have it let's do that!

Note that there are a few follow-up tasks that I created already ( #668 #667 #666 ) so those sorts of things are beyond the scope of this PR :-)

Base automatically changed from master to main February 6, 2021 06:05
@MiroDudik (Member) commented Apr 23, 2021

I was working on a tutorial and I realized that our MetricFrame already supports plotting functionality to an extent. This is a small example demonstrating it:

# Imports
import pandas as pd
import numpy as np
import fairlearn.metrics as flm
import sklearn.metrics as skm
from fairlearn.metrics import MetricFrame

# Create a toy data set
df = pd.DataFrame(
    {'sex':  ['F', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'M'],
     'race': ['White', 'White', 'White', 'Black', 'Black', 'Black', 'Hispanic', 'Hispanic', 'White', 'Black'],
     'y_true': [0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
     'y_pred': [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]})

# Define metrics
metrics = {
    'accuracy': skm.accuracy_score,
    'precision': skm.precision_score,
    'recall': skm.recall_score,
    'false_positive_rate': flm.false_positive_rate,
    'true_positive_rate': flm.true_positive_rate,
    'avg_true': lambda y_true, y_pred: np.mean(y_true),
    'avg_pred': lambda y_true, y_pred: np.mean(y_pred),
    'count': lambda y_true, y_pred: y_true.shape[0],
}

# Set up metric frame
mf = MetricFrame(metrics, df['y_true'], df['y_pred'], sensitive_features=df[['sex','race']])

# Plots
mf.by_group.plot.bar(y=['false_positive_rate','true_positive_rate'],
                     title='Compare multiple disaggregated metrics in a single plot');

mf.by_group.plot.bar(subplots=True, layout=[3,3], legend=False, figsize=[12,8],
                     title='Show all metrics');

[image: screenshot of the resulting plots]

[tagging @adrinjalali , @hildeweerts , @romanlutz , @LeJit ]

@adrinjalali (Member) commented Apr 25, 2021

I was working on a tutorial and I realized that our MetricFrame already supports plotting functionality to an extent.

It'd be nice to have some of these examples in our documentation.

@romanlutz (Member)

@MiroDudik this is awesome! I assume this is because MetricFrame's by_group is returned as a DataFrame which has plot: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

@rishabhsamb FYI this could be useful!

@romanlutz (Member)

Closing this PR since #766 covers this with much less custom code. Thanks @kevinrobinson for getting the ball rolling on this! I strongly doubt we'd be this far in the transition away from the dashboard if not for this work.

@romanlutz closed this May 3, 2021
riedgar-ms pushed a commit that referenced this pull request May 19, 2021
…766)

With a slight delay (originally targeted for April) I'm finally removing the `FairlearnDashboard` since a newer version already exists in `raiwidgets`. The documentation is updated to instead use the plots @MiroDudik created with a single line directly from the `MetricFrame`. In the future we want to add more kinds of plots as already mentioned in #758 #666 and #668 . Specifically, the model comparison plots do not yet have a replacement.

Note that the "example" added to the `examples` directory is not shown under "Example notebooks" on the webpage, which is intentional since it's technically not a notebook.

This also makes #561 mostly redundant, which I'll close shortly. 
#667 is also directly addressed with this PR as the examples illustrate.

Signed-off-by: Roman Lutz <rolutz@microsoft.com>