
Plots: Add plot_disparities_in_performance and plot_disparities_in_metric #561

Closed

Conversation

@kevinrobinson (Contributor) commented Aug 14, 2020

Rationale

This work is intended to help along the suggestion in fairlearn/fairlearn-proposals#14 (comment), with the assumption that the UI and Azure-specific bits of the project will be removed and placed in a separate repository with a different governance structure.

To illustrate that this is both viable and plays to the strengths of folks currently staffing the project, here is an example function that could be used in the quick start guide, removing the need for any Azure or UI code. This is based on some key assumptions, including:

  • to date, there has been no user research or feedback validating that users can use these visualizations to do meaningful work that assesses or reduces real-world harm
  • the current visualizations provide little value in making meaningful decisions about how to interpret the data, or how to take action on it
  • the interactivity of the visualizations is low-value based on the team's current understanding of practitioner needs (e.g., Holstein et al. 2019; Madaio et al. 2020)
  • the overhead of making widgets and HTML/CSS/JS is a barrier for team members in maintaining, supporting, and growing the project
  • the current visualizations do not address any of the core challenges involved in fairness as sociotechnical work (e.g., Selbst et al. 2019), so reducing maintenance burden can free folks up to engage in higher-value work

If these assumptions are off, please comment and help validate them! 😄

What this adds

This pull request adds the plot_disparities_in_performance and plot_disparities_in_selection_rate functions, intended to replace those two visualizations built as widgets in HTML/CSS/JS.

Disparity in selection rate

Note the change of bar color to grey, since blue was used to communicate something specific in the chart above and the meaning of the color is not the same in this chart.

now: +3 clicks in Azure UI

Screen Shot 2020-08-14 at 4 31 52 PM

added: plot_disparities_in_selection_rate

Screen Shot 2020-08-14 at 4 34 20 PM

Disparity in accuracy

Note that the overprediction and underprediction values within the charts are quite different in the new plot. They reflect the values that the false_positive_rate and false_negative_rate functions return, but those don't really make sense to me. I'm probably mixing up something important here, but I'm not sure what. I'll take another look next week, but if other folks can spot what the issue is, that'd be awesome and super helpful! 👍

EDIT: copy-paste typo in underprediction/overprediction is fixed now.

now: +3 clicks in Azure UI

Screen Shot 2020-08-14 at 4 21 11 PM

added: plot_disparities_in_performance

Screen Shot 2020-08-14 at 4 03 45 PM

EDIT: as @romanlutz helpfully pointed out, this also supports multiple values for a single sensitive attribute. This example uses data.data['race'] in place of data.data['sex'] below (these examples are all narrowly technical without sociotechnical context):
Screen Shot 2020-08-17 at 10 17 44 AM

Other notes

Code to test this in Jupyter on 0.4.6:

# copied verbatim from quickstart
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
data = fetch_openml(data_id=1590, as_frame=True)
X = pd.get_dummies(data.data)
y_true = (data.target == '>50K') * 1
sex = data.data['sex']

from fairlearn.metrics import group_summary
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X, y_true)
y_pred = classifier.predict(X)

# to use new functions
from fairlearn.plots import plot_disparities_in_performance, plot_disparities_in_selection_rate
plot_disparities_in_performance(y_true, y_pred, sex)
plot_disparities_in_selection_rate(y_true, y_pred, sex)

@riedgar-ms (Member)

As a warning, matplotlib only gets installed as part of [customplots].

I think @romanlutz came up with a way of testing plotting methods too?

@romanlutz added the "enhancement" (New feature or request) label Aug 15, 2020
@romanlutz (Member)

As a warning, matplotlib only gets installed as part of [customplots].

I think @romanlutz came up with a way of testing plotting methods too?

That's correct! I'm not too fond of that name customplots, so if people want to make that shorter (plots) I'm all in favor.

We'd also have to provide proper error messages telling users that they need matplotlib if it's not installed, just like with the postprocessing plots.
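
For illustration, a minimal sketch of what such a guard could look like (the helper name and message text are placeholders modeled on the existing postprocessing plots, not code from this PR):

# Hypothetical lazy-import guard for a plotting module.
_MATPLOTLIB_IMPORT_ERROR_MESSAGE = ("Please make sure to install "
                                    "fairlearn[customplots] to use the plotting functions.")

def _check_matplotlib_available():
    try:
        import matplotlib.pyplot  # noqa: F401
    except ImportError:
        raise RuntimeError(_MATPLOTLIB_IMPORT_ERROR_MESSAGE)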

I don't quite agree with everything written in the description but that's immaterial.

The more important question to me is what this means to UI development for the community and the proposal fairlearn/fairlearn-proposals#14 . Does this mean you'd prefer not to have a separate repo elsewhere? On Thursday people were quite insistent that there should be a way for the community to write UI, so I'd rather get that settled before inventing new UIs.

Regarding testing: there's a pytest plugin for testing plots but it didn't work across platforms that we test on, so I never turned it on. It does work locally, though. See https://github.com/fairlearn/fairlearn/blob/master/test/unit/postprocessing/test_plots.py for more information.
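
For illustration, this is roughly what plot tests look like with the pytest-mpl plugin (assuming that is the plugin meant here; tests are run with pytest --mpl after generating baseline images with --mpl-generate-path):

import pytest
import matplotlib.pyplot as plt

@pytest.mark.mpl_image_compare
def test_selection_rate_bar_chart():
    # pytest-mpl compares the returned figure against a stored baseline image
    fig, ax = plt.subplots()
    ax.bar(['Female', 'Male'], [0.09, 0.27], color='grey')
    ax.set_title('Disparity in selection rate')
    return fig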

Finally, this kind of change would need an update to CHANGES.md, API reference (docs folder), and user guide.

@kevinrobinson (Contributor, Author)

@riedgar-ms @romanlutz Thanks for the comments! 👍

This gets started with the first step outlined in fairlearn/fairlearn-proposals#14 (comment), and I'm hoping to help move things forward by showing rather than telling :) It also improves the UX for prospective users currently facing issues like #484 #501 and #558.

If folks want to merge this, I can work myself or with anyone else to finish this off.

@romanlutz (Member)

@riedgar-ms @romanlutz Thanks for the comments! 👍

This gets started with the first step outlined in fairlearn/fairlearn-proposals#14 (comment), and I'm hoping to help move things forward by showing rather than telling :) It also improves the UX for prospective users currently facing issues like #484 #501 and #558.

If folks want to merge this, I can work myself or with anyone else to finish this off.

Right. I think this is a fine first step. We sort of agreed to try and add documentation (API reference for new modules, CHANGES.md, user guide) whenever adding the code, although that's obviously not as strict as long as we have severe gaps around many parts of the codebase. Starting a new module strikes me as a great place to make sure we have some of these in place. I certainly won't insist on a comprehensive user guide in this PR, but the other parts are all fixable in a single line each. Let me know if you have questions about it.

The matplotlib concern should probably be addressed as well.

Other than that this looks great. Thanks for getting this started @kevinrobinson .

Of course, this assumes that people in the community want this solution for the UI. It would certainly be nice to hear people's opinions, although in the current form matplotlib is optional and therefore this isn't that different of a setup from before. I know @adrinjalali talked about matplotlib plots before so this should be in line with his opinions hopefully (?)

@kevinrobinson (Contributor, Author) commented Aug 16, 2020

@romanlutz Thanks! 👍

re: community consensus, my interpretation of fairlearn/fairlearn-proposals#14 (comment) and fairlearn/fairlearn-proposals#14 (comment) was that other folks have previously suggested this approach, which is part of why I'm trying to build on that consensus here.

But to shift gears, I think it is far more important whether users want this :) I think all user-facing changes should primarily be evaluated on the value they create for users, and be informed by user research or other informal methods of validation and co-design. So even though the concerns in fairlearn/fairlearn-proposals#14 are a primary consideration, hopefully on balance the discussion on this pull request also considers whether this change would serve users. In that spirit, this PR improves the UX for people trying to visualize fairness metrics but facing issues like #484, #501, and #558.


re: documentation, I added API docs and a section to the user guide, mirroring what is already there. I didn't make any edits to improve quality (eg, adding sociotechnical context, showing how these charts are relevant for assessing or reducing real harms, adding guidelines on how to interpret visuals, etc.)

re: code, I added commits removing customplots everywhere, that's ready for review. I read #289, and so only added smoke tests for now. The code in plot_disparities_in_performance using metrics functions is the main place where code review would be really helpful! 👍

EDIT: removing the test checking for matplotlib led to CI failures for limited env. I looked at the job steps, and it seems that job only exists for that one test. I removed it, and references in other config to the test/install folder that no longer exists without that test. Hopefully doing that helps us cut through the incidental complexity without getting too derailed :)

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
@romanlutz (Member)

Leaving matplotlib out of the main/default/base (not sure about terminology) package was a very conscious choice. If we want to undo that we should at least hear what @adrinjalali @MiroDudik @riedgar-ms think. That in itself will hold up this PR a bit because @MiroDudik isn't available for a few days.

I vaguely remember lots of issues testing this on different platforms. The separation into a base package and a plotting extension made that slightly harder, so the simplicity of having just one is something I appreciate. Perhaps the issues with including matplotlib go further than that, though? I'm sure @adrinjalali will know.

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
underpredictions = []
for sensitive_value in sensitive_values:
    overpredictions.append(fp_summary['by_group'][sensitive_value])
    underpredictions.append(fn_summary['by_group'][sensitive_value])
Contributor Author

See the issue description for more, but a review of the code in this area that determines overpredictions and underpredictions would be super helpful! 👍

Member

This feels like something that might belong in the metrics code (although that is in the process of being rewritten).

Member

I pointed this out in the other comment (today) but after consulting with @MiroDudik overprediction != FPR and underprediction != FNR, so we should change it so that it doesn't say over and underprediction anymore, but rather FPR and FNR.
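
For concreteness, a small illustration of the denominator difference (my own sketch; it assumes "overprediction" here means the mean of max(y_pred - y_true, 0) over all samples, as in the dashboard-style metrics):

import numpy as np

y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 0, 1])

fp = np.sum((y_pred == 1) & (y_true == 0))  # 1 false positive
fn = np.sum((y_pred == 0) & (y_true == 1))  # 1 false negative

# over-/underprediction: denominator is the total number of samples
overprediction = fp / len(y_true)     # 1/5 = 0.2
underprediction = fn / len(y_true)    # 1/5 = 0.2

# FPR/FNR: denominators are the actual negatives and positives, respectively
fpr = fp / np.sum(y_true == 0)        # 1/3 ≈ 0.33
fnr = fn / np.sum(y_true == 1)        # 1/2 = 0.5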

Contributor Author

@romanlutz Thanks! This is super helpful 👍

As the current plan for 0.5 is to ship the metrics changes, switch to Flask for the widget, and make documentation updates all around (eg, so the quickstart works), I'll leave this work alone so as to not slow down those other things from shipping first.

@romanlutz (Member) left a comment

All the comments are minor and easy to address. No objections from me, looks great. Definitely need to wait for @MiroDudik 's thoughts which will take a week I'm afraid. Since this adds a new module and dependency to the base package it's not something I feel comfortable deciding without his input.

Besides, I'm super curious to hear from others in the community. If this is in line with people's ideas about a place to create visualizations then that's fantastic.

Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
@adrinjalali (Member)

matplotlib can certainly be a soft dependency for users who need the plot functionality (like pip install fairlearn[plotting] and/or pip install fairlearn[extras]).

Otherwise I'm all in favor of having the basic plots using matplotlib which is kinda the main household plotting tool.
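
For illustration, a packaging sketch of that kind of soft dependency (the extras name below is just the one floated above, not what fairlearn actually ships):

# setup.py (excerpt, hypothetical)
extras_require = {
    "plotting": ["matplotlib>=3.0"],
}
# users would then opt in via: pip install fairlearn[plotting]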

selection_rates = []
for sensitive_value in sensitive_values:
    selection_rates.append(selection_rate_summary['by_group'][sensitive_value])
disparity = abs(selection_rates[0] - selection_rates[1])
Contributor Author

If we move forward, fix this.

Member

You probably mean that it should be max and min as opposed to first and second element?
@riedgar-ms 's metrics API changes will allow for that to be done easily, although we don't have to wait for that necessarily.
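
A one-line sketch of the suggested fix (illustrative only), so the disparity is the range across all groups rather than assuming exactly two:

disparity = max(selection_rates) - min(selection_rates)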

@adrinjalali (Member)

Should we move this forward?

@@ -39,4 +39,3 @@ jobs:

- template: templates/build-widget-job-template.yml

- template: templates/limited-installation-job-template.yml
Member

Why is this being removed?

Member

Because "limited" = no matplotlib, so since he added it as a "core" dependency it's not needed anymore.

Member

undoing this now and therefore resolving this comment

Roman Lutz added 2 commits December 18, 2020 22:27
@romanlutz (Member)

As discussed with @kevinrobinson I took over this PR with the following changes:

  • move to new MetricFrame since this PR predates v0.5.0
  • generalize the plot_disparities_in_selection_rate function to work with any metric (including binary classification & regression), hence renamed to plot_disparities_in_metric
  • fix labels since we're showing FPR/FNR and the labels said over-/underprediction which isn't quite the same (denominator is different)

The reason there are two plots right now is that @kevinrobinson did an awesome job replicating the original plots from the Fairlearn dashboard. In reality, the result from plot_disparities_in_performance is just a combined FPR/FNR plot that shows accuracy through text. I'm fine with adding that for the sake of consistency with the existing dashboard, but theoretically it's possible to get the same information by calling plot_disparities_in_metric three times with the three individual metrics.
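
For example, something along these lines (illustrative usage only; the module name fairlearn.metrics_plots and the exact signature are as proposed in this PR and may still change):

from sklearn.metrics import accuracy_score
from fairlearn.metrics import false_positive_rate, false_negative_rate
from fairlearn.metrics_plots import plot_disparities_in_metric

for metric in (accuracy_score, false_positive_rate, false_negative_rate):
    plot_disparities_in_metric(metric, y_true=y_true, y_pred=y_pred,
                               sensitive_features=sex)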

We can probably find better names (?) Suggestions welcome! Big ones:

  • Should the module be metrics_plots or more generically plots, or plotting? We have plotting functionality for postprocessing currently half-hidden away in the postprocessing module. Perhaps that makes sense, but I'd like to hear other people's opinions.
  • The function names are quite long.

@hildeweerts @adrinjalali @MiroDudik @riedgar-ms

I will also open a few issues with "help-wanted" to add plots for

  • model comparison similar to what we have in the Fairlearn dashboard
  • passing in multiple metrics rather than just one, and plotting all of them as subplots

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
@adrinjalali (Member)

I'll check the PR soon, in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user-guide?

@romanlutz (Member) left a comment

I'll check the PR soon, in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user-guide?

Which .png files are you referring to? Perhaps you're looking at an earlier version? The original PR had png files, but I replaced them with .. plot:: routines provided by matplotlib to generate the plot when creating the webpage. IMO that's much preferable since we don't have to update the png whenever we make code changes.


- accuracy_summary['by_group'][sensitive_values[1]])

# chart text for localization
title_text = 'Disparity in performance'
Member

My assumption is still that we won't do localization for these, so I won't consider this blocking unless anyone jumps in stating otherwise.

@@ -9,8 +9,6 @@

OUTPUT_SEPARATOR = "-"*65

_MATPLOTLIB_IMPORT_ERROR_MESSAGE = "Please make sure to install fairlearn[customplots] to use " \
Member

Reverting to soft dependency...

The :py:mod:`fairlearn.metrics_plots` module visualizes fairness metrics from
different perspectives.

The examples below illustrate a scenario where *binary gender* is
Member

should we use a different example dataset for new examples we introduce?

Member

I'm the first to criticize usage of the adult dataset since it's absolutely nonsensical and doesn't show a proper realistic task. After all, how is it useful to predict if someone earns more than the arbitrary cutoff amount of $50k?

That means we have to have something better, though. The argument for adult so far was that it's widely used in the fairness literature when comparing results of mitigation techniques. When using another dataset we've always run into the problem that you'd need to contextualize this first. For example, it's questionable to use COMPAS or Boston housing datasets without at least pointing at the complicated and controversial background. We haven't written a section in the documentation about that yet, though.

I really want us to replace adult in every example eventually. I do think that it's a separate task from this PR, though. Perhaps the credit card default dataset would work, but again we should describe the context somewhere to avoid having people think that it's fine to just use such a dataset.

Wdyt?

Member

I agree that we should contextualize the data we use in our docs, but this requirement is making us add more documentation with a dataset which we know has these clear issues. I would be happier (I think) if we used another dataset which doesn't have these issues, and left the contextualization as future work, instead of adding more examples with this dataset and then having to change all of them later.

That said, I wouldn't veto this PR because of the dataset; it'd just be really nice if we moved away from it, and not using it anew is a good place to start phasing it out, I think.

Member

I very much share your perspective. I'm not 100% happy with any of the three datasets currently in fairlearn.datasets, but I think the credit card default dataset from UCI could be great. We use it in this notebook:
https://github.com/fairlearn/fairlearn/blob/master/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb
For that, I'd create a separate PR to add it to the datasets module first, and then replace it for this example in a follow-up PR. We can discuss whether we want it elsewhere, too, for example in the quickstart. That dataset has its own issues (which were alluded to in #418 ), of course, but I think for the purpose of demonstrating the functionality of these plotting functions it'll be fine.

This would mean this stays as is in the current PR, though, because expanding the scope to adding another dataset, documenting that dataset, etc. doesn't seem right.

Member

Sure, I thought we'd just use fetch_openml and not add it necessarily to the datasets. If you think that's worse than having this dataset, then sure.

Member

UCI credit data set is actually not ideal, because it doesn't exhibit equalized-odds disparity, so we have to partly synthesize it (which makes for a slightly odd notebook, but it would look even weirder here I think). But I don't remember whether it shows other kinds of disparity.

Two options:

  • We use the original (untweaked) version of the UCI data set even if we don't have much disparity.
  • We use a different data set that is known to exhibit disparity, say when using linear models.

That said, I don't think this should be a blocker on this PR. However, if we are okay just using the untweaked UCI credit data set, we can put fetch_openml in this sample code for now, and replace it with the fairlearn.datasets equivalent once/if it's available.

@LeJit, @hildeweerts: do you have any thoughts on this?

Contributor

With regards to other data sets in a similar domain that we could use for examples, we could potentially use the German Credit dataset or the Lending Club dataset. Both are commonly used benchmark datasets in fairness papers, and German Credit is available through OpenML (Lending Club isn't though). We should eventually add the German Credit dataset to fairlearn.datasets anyways because it is a commonly used benchmark.

I have already written the fairlearn.datasets function for the UCI credit card data set as part of a sub-task to redo the associated Jupyter notebook. If we want to use the untweaked UCI credit card dataset for now, I can create a small pull request to get that code merged in.

Contributor

Whether we have to add a particular data set to fairlearn.datasets depends quite a bit on the purpose of that part of the repo. We've actually had discussions about this in the past and I'm not sure whether a consensus was reached (see e.g., #583).

On a more practical note, I'm not familiar with the Lending Club data but the fact it isn't on OpenML at the moment shouldn't be a problem, because we can just add it ourselves! Is this the one you refer to: https://www.kaggle.com/wordsforthewise/lending-club? From the description it doesn't seem like this data was collected with permission from Lending Club, though...

Contributor

I was just wondering, perhaps concerns re. proper contextualization can be partly solved by taking the effort to create data sheets (see https://arxiv.org/abs/1803.09010) for the data sets in fairlearn.datasets. We probably won't be able to fill out everything, but it seems to me that if we want to practice what we preach, this would be a minimal suggestion.

@@ -0,0 +1,86 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
Member

remove Microsoft from the new files?

sensitive_features=sensitive_features)

# chart text for localization
metric_text = metric.__name__.replace('_', ' ')
Member

we could accept metric_name or x_label, y_label optionally if the __name__ is not properly set or the user wants an alternative text.

Member

Agreed! I added an optional argument metric_name and documented that we'll use __name__ if no metric name is passed.
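
Roughly like this (an illustrative sketch of the fallback, not the exact PR code):

def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features,
                               metric_name=None, show_plot=True):
    if metric_name is None:
        # fall back to the metric function's __name__, e.g. "selection rate"
        metric_name = metric.__name__.replace('_', ' ')
    ...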

from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features, show_plot=True):
Member

The plot functions should probably accept and return an ax parameter to receive and return the matplotlib's axis object so that users can further customize the plots.

Member

Sounds good. I compared with some of the functions in scikit-learn. They don't have a show_plot arg, but rather the caller is expected to call plt.show() themselves. Is that preferable? I find it kind of odd to call a function called plot_* and then there's no plot shown unless I do something more, but perhaps that's just me.

Member

In the context of matplotlib, I find it nice if the caller has the capability of putting each plot in a subplot (hence passing ax), and it makes things more configurable when the method returns an object which the user can change. In terms of the user having to call plt.show(), I'm agnostic. I don't mind having the show_plot as you have it here, but it's basically replacing a single line of code. I also don't mind having it True by default, which is kinda nice to the user I guess.
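
In code, the convention being discussed looks roughly like this (illustrative sketch only, not the PR implementation):

import matplotlib.pyplot as plt

def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features,
                               show_plot=True, ax=None):
    if ax is None:
        _, ax = plt.subplots()   # create a fresh axis if the caller didn't pass one
    # ... draw the bars onto ax ...
    if show_plot:
        plt.show()
    return ax                    # returning ax lets callers customize further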

Member

I'll leave it in there for now then, unless someone has stronger feelings about this :-)

@@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
Member

I'd rather have these bits in the test files themselves.

Member

But that would mean we need to duplicate it if there are multiple files. Or did you mean "make it all a single file"? I'll go with that for now, but lmk if you meant something else.

Member

I meant duplicating it in the file. It makes understanding a test file as a standalone file much easier. I'm not sure how common it is to load these datasets in the conftest, but I'm not used to it. And for a third person adding a test in a test file, they'd be a bit confused or not see that there are datasets available and they'd load their own in the tests.

Member

That's fair. We've been doing it like that throughout the test/unit directory so far, so that's probably why Kevin originally copied this behavior. I assume the current structure is in line with your expectations. If not, please let me know!

Roman Lutz added 2 commits December 31, 2020 12:42
… empty line at end of file

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Roman Lutz added 2 commits January 4, 2021 12:51
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
@adrinjalali (Member) left a comment

Let me know when you want another round of review :)


from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, *, y_true, y_pred, sensitive_features, show_plot=True,
@MiroDudik (Member) commented Jan 6, 2021

Three points re. API here:

  • I think that * in the signature should be after y_pred as we have in MetricFrame
  • I think this should be called plot_metric_by_group
  • instead of metric_name, we could just allow the format where metric is a single item dictionary? this would be more in line with how you would get this done with MetricFrame.

With the above three modifications, the signature becomes:

def plot_metric_by_group(metric, y_true, y_pred, *, sensitive_features, show_plot=True, ax=None):

Another idea (for another PR) is to have a method MetricFrame.plot(show_plot=True, ax=None) and this would be just a wrapper for that? This would be analogous to pd.DataFrame.plot. Although there's an obvious wrinkle re. what to do if you have multiple metrics and/or control features, which we probably don't need to resolve in this PR :-)
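
A hypothetical sketch of that wrapper idea (not part of this PR; it leans on MetricFrame.by_group being a pandas object with its own .plot, as noted later in this thread):

import matplotlib.pyplot as plt

def metric_frame_plot(mf, show_plot=True, ax=None):
    """Standalone stand-in for a possible MetricFrame.plot method."""
    ax = mf.by_group.plot.bar(ax=ax)
    if show_plot:
        plt.show()
    return ax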

@hildeweerts (Contributor) commented Jan 7, 2021

I agree with renaming to make it more consistent & I like the idea for MetricFrame.plot().

Signed-off-by: MiroDudik <mdudik@gmail.com>
@hildeweerts (Contributor)

I know I'm a bit late to the party and I haven't run the latest code myself, but I have to admit that I personally don't find the plot_disparities_in_performance() plot very intuitive - although displaying FPR/FNR already makes more sense to me than underprediction/overprediction.

I'm probably biased though because I've mostly worked with imbalanced datasets in the past (fraud detection, predictive maintenance, etc.), in which case accuracy doesn't really make sense as a performance metric, nor does plotting FPR/FNR on the same scale.

@romanlutz (Member)

I know I'm a bit late to the party and I haven't run the latest code myself, but I have to admit that I personally don't find the plot_disparities_in_performance() plot very intuitive - although displaying FPR/FNR already makes more sense to me than underprediction/overprediction.

I'm probably biased though because I've mostly worked with imbalanced datasets in the past (fraud detection, predictive maintenance, etc.), in which case accuracy doesn't really make sense as a performance metric, nor does plotting FPR/FNR on the same scale.

@MiroDudik mentioned something similar and I totally agree. IMO the ideal state would be to cut that function and if someone wants to see both they can use the other function to plot FPR and FNR. The only reason it's there is that Kevin replicated the existing FairlearnDashboard in this PR, and I felt bad about just cutting it. If we all agree that it's best not to have it let's do that!

Note that there are a few follow-up tasks that I created already ( #668 #667 #666 ) so those sorts of things are beyond the scope of this PR :-)

Base automatically changed from master to main February 6, 2021 06:05
@MiroDudik (Member) commented Apr 23, 2021

I was working on a tutorial and I realized that our MetricFrame already supports plotting functionality to an extent. This is a small example demonstrating it:

# Imports
import pandas as pd
import numpy as np
import fairlearn.metrics as flm
import sklearn.metrics as skm
from fairlearn.metrics import MetricFrame

# Create a toy data set
df = pd.DataFrame(
    {'sex':  ['F', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'M'],
     'race': ['White', 'White', 'White', 'Black', 'Black', 'Black', 'Hispanic', 'Hispanic', 'White', 'Black'],
     'y_true': [0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
     'y_pred': [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]})

# Define metrics
metrics = {
    'accuracy': skm.accuracy_score,
    'precision': skm.precision_score,
    'recall': skm.recall_score,
    'false_positive_rate': flm.false_positive_rate,
    'true_positive_rate': flm.true_positive_rate,
    'avg_true': lambda y_true, y_pred: np.mean(y_true),
    'avg_pred': lambda y_true, y_pred: np.mean(y_pred),
    'count': lambda y_true, y_pred: y_true.shape[0],
}

# Set up metric frame
mf = MetricFrame(metrics, df['y_true'], df['y_pred'], sensitive_features=df[['sex','race']])

# Plots
mf.by_group.plot.bar(y=['false_positive_rate','true_positive_rate'],
                     title='Compare multiple disaggregated metrics in a single plot');

mf.by_group.plot.bar(subplots=True, layout=[3,3], legend=False, figsize=[12,8],
                     title='Show all metrics');

[image: screenshot of the resulting plots]

[tagging @adrinjalali , @hildeweerts , @romanlutz , @LeJit ]

@adrinjalali (Member) commented Apr 25, 2021

I was working on a tutorial and I realized that our MetricFrame already supports plotting functionality to an extent.

It'd be nice to have some of these examples in our documentation.

@romanlutz (Member)

@MiroDudik this is awesome! I assume this is because MetricFrame's by_group is returned as a DataFrame which has plot: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

@rishabhsamb FYI this could be useful!

@romanlutz (Member)

Closing this PR since #766 covers this with much less custom code. Thanks @kevinrobinson for getting the ball rolling on this! I strongly doubt we'd be this far in the transition away from the dashboard if not for this work.

@romanlutz closed this May 3, 2021
riedgar-ms pushed a commit that referenced this pull request May 19, 2021
…766)

With a slight delay (originally targeted for April) I'm finally removing the `FairlearnDashboard` since a newer version already exists in `raiwidgets`. The documentation is updated to instead use the plots @MiroDudik created with a single line directly from the `MetricFrame`. In the future we want to add more kinds of plots as already mentioned in #758 #666 and #668 . Specifically, the model comparison plots do not yet have a replacement.

Note that the "example" added to the `examples` directory is not shown under "Example notebooks" on the webpage, which is intentional since it's technically not a notebook.

This also makes #561 mostly redundant, which I'll close shortly. 
#667 is also directly addressed with this PR as the examples illustrate.

Signed-off-by: Roman Lutz <rolutz@microsoft.com>