Plots: Add plot_disparities_in_performance and plot_disparities_in_metric #561
Conversation
As a warning, I think @romanlutz came up with a way of testing plotting methods too?
That's correct! I'm not too fond of that name. We'd also have to provide proper error messages telling users that they need matplotlib if it's not installed, just like with the postprocessing plots. I don't quite agree with everything written in the description, but that's immaterial. The more important question to me is what this means for UI development by the community and the proposal fairlearn/fairlearn-proposals#14. Does this mean you'd prefer not to have a separate repo elsewhere? On Thursday people were quite insistent that there should be a way for the community to write UI, so I'd rather get that settled before inventing new UIs. Regarding testing: there's a pytest plugin for testing plots, but it didn't work across the platforms we test on, so I never turned it on. It does work locally, though. See https://github.com/fairlearn/fairlearn/blob/master/test/unit/postprocessing/test_plots.py for more information. Finally, this kind of change would need an update to CHANGES.md, the API reference (docs folder), and the user guide.
@riedgar-ms @romanlutz Thanks for the comments! 👍 This gets started with the first step outlined in fairlearn/fairlearn-proposals#14 (comment), and I'm hoping to help move things forward by showing rather than telling :) It also improves the UX for prospective users currently facing issues like #484, #501, and #558. If folks want to merge this, I can work myself or with anyone else to finish this off.
Right. I think this is a fine first step. We sort of agreed to try and add documentation (API reference for new modules, CHANGES.md, user guide) whenever adding code, although that's obviously not as strict as long as we have severe gaps around many parts of the codebase. Starting a new module strikes me as a great place to make sure we have some of these in place. I certainly won't insist on a comprehensive user guide in this PR, but the other parts are all fixable in a single line each. Let me know if you have questions about it. The matplotlib concern should probably be addressed as well. Other than that this looks great. Thanks for getting this started @kevinrobinson. Of course, this assumes that people in the community want this solution for the UI. It would certainly be nice to hear people's opinions, although in the current form matplotlib is optional and therefore this isn't that different a setup from before. I know @adrinjalali talked about matplotlib plots before, so this should be in line with his opinions hopefully (?)
@romanlutz Thanks! 👍 re: community consensus, my interpretation of fairlearn/fairlearn-proposals#14 (comment) and fairlearn/fairlearn-proposals#14 (comment) was that other folks have previously suggested this approach, which is part of why I'm trying to build on that consensus here. But to shift gears, I think it is far more important whether users want this :) I think all user-facing changes should primarily be evaluated on the value they create for users, and be informed by user research or other informal methods of validation and co-design. So even as the concerns in fairlearn/fairlearn-proposals#14 are a primary consideration, hopefully on balance the discussion on this pull request also considers whether this change would serve users. In that spirit, this PR improves the UX for people trying to visualize fairness metrics but facing issues like #484, #501, and #558. re: documentation, I added API docs and a section to the user guide, mirroring what is already there. I didn't make any edits to improve quality (eg, adding sociotechnical context, showing how these charts are relevant for assessing or reducing real harms, adding guidelines on how to interpret visuals, etc.) re: code, I added commits removing […] EDIT: removing the test checking for matplotlib led to CI failures for the limited env. I looked at the job steps, and it seems that job only exists for that one test. I removed it, along with references in other config to the […]
Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
Leaving matplotlib out of the main/default/base (not sure about terminology) package was a very conscious choice. If we want to undo that we should at least hear what @adrinjalali @MiroDudik @riedgar-ms think. That in itself will hold up this PR a bit because @MiroDudik isn't available for a few days. I vaguely remember lots of issues testing this on different platforms. The separation into a base package and a plotting extension made that slightly harder, so the simplicity of having just one is something I appreciate. Perhaps the issues with including matplotlib go further than that, though? I'm sure @adrinjalali will know.
underpredictions = []
for sensitive_value in sensitive_values:
    overpredictions.append(fp_summary['by_group'][sensitive_value])
    underpredictions.append(fn_summary['by_group'][sensitive_value])
See the issue description for more, but a review of the code in this area that determines overpredictions and underpredictions would be super helpful! 👍
This feels like something that might belong in the metrics code (although that is in the process of being rewritten).
I pointed this out in the other comment (today) but after consulting with @MiroDudik overprediction != FPR and underprediction != FNR, so we should change it so that it doesn't say over and underprediction anymore, but rather FPR and FNR.
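To make the distinction concrete, here is a minimal sketch (not Fairlearn code; the function name and dictionary layout are illustrative) of computing FPR and FNR per sensitive-feature value from raw binary labels, which is the renaming being suggested:

```python
def fpr_fnr_by_group(y_true, y_pred, sensitive):
    """Illustrative helper: false positive rate and false negative rate
    per sensitive-feature value, computed from binary labels."""
    counts = {}
    for yt, yp, g in zip(y_true, y_pred, sensitive):
        c = counts.setdefault(g, {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
        if yt == 0:
            c["neg"] += 1            # actual negatives in this group
            c["fp"] += int(yp == 1)  # predicted positive on an actual negative
        else:
            c["pos"] += 1            # actual positives in this group
            c["fn"] += int(yp == 0)  # predicted negative on an actual positive
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else 0.0,
            "fnr": c["fn"] / c["pos"] if c["pos"] else 0.0,
        }
        for g, c in counts.items()
    }
```

The point is that each rate is normalized by the group's actual negatives or positives, not by the group size, which is why FPR/FNR and over/underprediction are not interchangeable.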
@romanlutz Thanks! This is super helpful 👍
As the current plan for 0.5 is to ship the metrics changes, switch to Flask for the widget, and make documentation updates all around (eg, so the quickstart works), I'll leave this work alone so as to not slow down those other things from shipping first.
All the comments are minor and easy to address. No objections from me, looks great. We definitely need to wait for @MiroDudik's thoughts, which will take a week, I'm afraid. Since this adds a new module and dependency to the base package, it's not something I feel comfortable deciding without his input.
Besides, I'm super curious to hear from others in the community. If this is in line with people's ideas about a place to create visualizations then that's fantastic.
Signed-off-by: kevinrobinson <kevin.robinson.0@gmail.com>
Otherwise I'm all in favor of having the basic plots using […]
selection_rates = []
for sensitive_value in sensitive_values:
    selection_rates.append(selection_rate_summary['by_group'][sensitive_value])
disparity = abs(selection_rates[0] - selection_rates[1])
If we move forward, fix this.
You probably mean that it should be max and min as opposed to the first and second element?
@riedgar-ms 's metrics API changes will allow for that to be done easily, although we don't have to wait for that necessarily.
Should we move this forward?
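A sketch of the max-minus-min version being suggested (illustrative, not the PR's code; the dictionary input is an assumption):

```python
def selection_rate_disparity(rates_by_group):
    # Illustrative: difference between the highest and lowest group
    # selection rates, instead of assuming exactly two groups in a
    # fixed order as selection_rates[0] - selection_rates[1] does.
    rates = list(rates_by_group.values())
    return max(rates) - min(rates)
```

This naturally handles sensitive features with more than two values, which the two-element subtraction does not.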
devops/nightly.yml (outdated)
@@ -39,4 +39,3 @@ jobs:
  - template: templates/build-widget-job-template.yml
  - template: templates/limited-installation-job-template.yml
Why is this being removed?
Because "limited" = no matplotlib, so since he added it as a "core" dependency it's not needed anymore.
undoing this now and therefore resolving this comment
…quick-start-remove-plots
Signed-off-by: Roman Lutz <rolutz@microsoft.com>
As discussed with @kevinrobinson I took over this PR with the following changes: […]
The reason there are two plots right now is that @kevinrobinson did an awesome job replicating the original plots from the Fairlearn dashboard. In reality, the result from […] We can probably find better names (?) Suggestions welcome! Big ones: […]
@hildeweerts @adrinjalali @MiroDudik @riedgar-ms I will also open a few issues with "help-wanted" to add plots for […]
I'll check the PR soon; in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user guide?
> I'll check the PR soon, in the meantime, could you please remind me why we have all those .png files in this PR? As in, why wouldn't they be generated in .py example files and then used in the user-guide?

Which .png files are you referring to? Perhaps you're looking at an earlier version? The original PR had png files, but I replaced them with .. plot:: routines provided by matplotlib to generate the plot when creating the webpage. IMO that's much preferable since we don't have to update the png whenever we make code changes.
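For reference, the `.. plot::` directive comes from matplotlib's Sphinx extension (matplotlib.sphinxext.plot_directive): the code in the directive body runs at docs build time and the resulting figure is embedded in the page. A minimal illustrative fragment (the plotted values are made up):

```rst
.. plot::

   import matplotlib.pyplot as plt

   # The figure is generated when the docs are built, so it always
   # reflects the current code rather than a checked-in png.
   plt.bar(["Group A", "Group B"], [0.4, 0.6], color="grey")
   plt.title("Selection rate by group")
```

This is what keeps the user guide's images in sync with the plotting code without committing image files.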
- accuracy_summary['by_group'][sensitive_values[1]])

# chart text for localization
title_text = 'Disparity in performance'
My assumption is still that we won't do localization for these, so I won't consider this blocking unless anyone jumps in stating otherwise.
@@ -9,8 +9,6 @@
OUTPUT_SEPARATOR = "-"*65

_MATPLOTLIB_IMPORT_ERROR_MESSAGE = "Please make sure to install fairlearn[customplots] to use " \
Reverting to soft dependency...
The :py:mod:`fairlearn.metrics_plots` module visualizes fairness metrics from
different perspectives.

The examples below illustrate a scenario where *binary gender* is
Should we use a different example dataset for new examples we introduce?
I'm the first to criticize usage of the adult dataset since it's absolutely nonsensical and doesn't show a proper realistic task. After all, how is it useful to predict if someone earns more than the arbitrary cutoff amount of $50k?
That means we have to have something better, though. The argument for adult so far was that it's widely used in the fairness literature when comparing results of mitigation techniques. When using another dataset we've always run into the problem that you'd need to contextualize this first. For example, it's questionable to use COMPAS or Boston housing datasets without at least pointing at the complicated and controversial background. We haven't written a section in the documentation about that yet, though.
I really want us to replace adult in every example eventually. I do think that it's a separate task from this PR, though. Perhaps the credit card default dataset would work, but again we should describe the context somewhere to avoid having people think that it's fine to just use such a dataset.
Wdyt?
I agree that we should contextualize the data we use in our docs, but this requirement means adding more documentation for a dataset which we know has these clear issues. I would be happier (I think) if we used another dataset which doesn't have these issues, and left the contextualization as future work, instead of adding more examples with this dataset and having to change all of them later.
That said, I wouldn't veto this PR because of the dataset, it'd be just really nice if we moved away from this dataset, and not using it anew is a good place to start phasing it out I think.
I very much share your perspective. I'm not 100% happy with any of the three datasets currently in fairlearn.datasets, but I think the credit card default dataset from UCI could be great. We use it in this notebook:
https://github.com/fairlearn/fairlearn/blob/master/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb
For that, I'd create a separate PR to add it to the datasets module first, and then replace it for this example in a follow-up PR. We can discuss whether we want it elsewhere, too, for example in the quickstart. That dataset has its own issues (which were alluded to in #418), of course, but I think for the purpose of demonstrating the functionality of these plotting functions it'll be fine.
This would mean this stays as is in the current PR, though, because expanding the scope to adding another dataset, documenting that dataset, etc. doesn't seem right.
Sure, I thought we'd just use fetch_openml and not necessarily add it to the datasets module. If you think that's worse than having this dataset, then sure.
UCI credit data set is actually not ideal, because it doesn't exhibit equalized-odds disparity, so we have to partly synthesize it (which makes for a slightly odd notebook, but it would look even weirder here I think). But I don't remember whether it shows other kinds of disparity.
Two options:
- We use the original (untweaked) version of the UCI data set even if we don't have much disparity.
- We use a different data set that is known to exhibit disparity, say when using linear models.
That said, I don't think this should be a blocker on this PR. However, if we are okay just using the untweaked UCI credit data set, we can put fetch_openml in this sample code for now, and replace it with the fairlearn.datasets equivalent once/if it's available.
@LeJit, @hildeweerts: do you have any thoughts on this?
With regards to other data sets in a similar domain that we could use for examples, we could potentially use the German Credit dataset or the Lending Club dataset. Both are commonly used benchmark datasets in fairness papers, and German Credit is available through OpenML (Lending Club isn't, though). We should eventually add the German Credit dataset to fairlearn.datasets anyway because it is a commonly used benchmark.
I have already written the fairlearn.datasets function for the UCI credit card data set as part of a sub-task to redo the associated Jupyter notebook. If we want to use the untweaked UCI credit card dataset for now, I can create a small pull request to get that code merged in.
Whether we have to add a particular data set to fairlearn.datasets depends quite a bit on the purpose of that part of the repo. We've actually had discussions about this in the past and I'm not sure whether a consensus was reached (see e.g., #583).
On a more practical note, I'm not familiar with the Lending Club data, but the fact it isn't on OpenML at the moment shouldn't be a problem, because we can just add it ourselves! Is this the one you refer to: https://www.kaggle.com/wordsforthewise/lending-club? From the description it doesn't seem like this data was collected with permission from Lending Club, though...
I was just wondering, perhaps concerns re. proper contextualization can be partly solved by taking the effort to create data sheets (see https://arxiv.org/abs/1803.09010) for the data sets in fairlearn.datasets. We probably won't be able to fill out everything, but it seems to me that if we want to practice what we preach, this would be a minimal suggestion.
@@ -0,0 +1,86 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
Remove Microsoft from the new files?
sensitive_features=sensitive_features)

# chart text for localization
metric_text = metric.__name__.replace('_', ' ')
We could accept metric_name, or x_label / y_label, optionally, if the __name__ is not properly set or the user wants an alternative text.
Agreed! I added an optional argument metric_name and documented that we'll use __name__ if no metric name is passed.
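The fallback described here can be sketched as follows (illustrative names; resolve_metric_name and the stand-in metric are not the PR's actual code):

```python
def resolve_metric_name(metric, metric_name=None):
    # Prefer the explicit metric_name argument; otherwise fall back
    # to the metric function's __name__, with underscores replaced
    # by spaces for display in chart text.
    if metric_name is not None:
        return metric_name
    return getattr(metric, "__name__", "metric").replace("_", " ")


def false_positive_rate(y_true, y_pred):
    """Stand-in metric used only to demonstrate the fallback."""


print(resolve_metric_name(false_positive_rate))         # false positive rate
print(resolve_metric_name(false_positive_rate, "FPR"))  # FPR
```

The getattr default also covers callables (e.g. functools.partial objects) that have no __name__ attribute at all.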
from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, y_true, y_pred, sensitive_features, show_plot=True):
The plot functions should probably accept and return an ax parameter to receive and return matplotlib's Axes object so that users can further customize the plots.
Sounds good. I compared with some of the functions in scikit-learn. They don't have a show_plot arg; rather, the caller is expected to call plt.show() themselves. Is that preferable? I find it kind of odd to call a function called plot_* and then have no plot shown unless I do something more, but perhaps that's just me.
In the context of matplotlib, I find it nice if the caller has the capability of putting each plot in a subplot (hence passing ax), and the plot is more configurable when the method returns an object which the user can change. In terms of the user having to call plt.show(), I'm agnostic. I don't mind having show_plot as you have here, but it's basically replacing a single line of code. I also don't mind having it True by default, which is kinda nice to the user I guess.
I'll leave it in there for now then, unless someone has stronger feelings about this :-)
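The ax-in/Axes-out pattern under discussion can be sketched like this (illustrative function name, data, and behavior, not the PR's exact API; assumes matplotlib is installed):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_metric_by_group(values_by_group, metric_name="metric",
                         show_plot=True, ax=None):
    # Draw on a caller-supplied Axes if given (e.g. one cell of a
    # subplot grid); otherwise create a fresh figure. Returning the
    # Axes lets callers customize titles, limits, etc. afterwards.
    if ax is None:
        _, ax = plt.subplots()
    groups = list(values_by_group)
    ax.bar(groups, [values_by_group[g] for g in groups], color="grey")
    ax.set_ylabel(metric_name)
    if show_plot:
        plt.show()  # a no-op under the Agg backend
    return ax


ax = plot_metric_by_group({"F": 0.4, "M": 0.6}, "selection rate",
                          show_plot=False)
```

With this shape, a caller can pass axes from plt.subplots(1, 2) to place two metric plots side by side, while show_plot=True keeps the one-call convenience discussed above.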
test/unit/metrics_plots/conftest.py (outdated)
@@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
I'd rather have these bits in the test files themselves.
But that would mean we need to duplicate it if there are multiple files. Or did you mean "make it all a single file"? I'll go with that for now, but lmk if you meant something else.
I meant duplicating it in the file. It makes understanding a test file as a standalone file much easier. I'm not sure how common it is to load these datasets in the conftest, but I'm not used to it. And for a third person adding a test in a test file, they'd be a bit confused or not see that there are datasets available and they'd load their own in the tests.
That's fair. We've been doing it like that throughout the test/unit directory so far, so that's probably why Kevin originally copied this behavior. I assume the current structure is in line with your expectations. If not, please let me know!
… empty line at end of file
Let me know when you want another round of review :)
from fairlearn.metrics import MetricFrame


def plot_disparities_in_metric(metric, *, y_true, y_pred, sensitive_features, show_plot=True,
Three points re. API here:

- I think the * in the signature should be after y_pred, as we have in MetricFrame
- I think this should be called plot_metric_by_group
- instead of metric_name, we could just allow the format where metric is a single-item dictionary? This would be more in line with how you would get this done with MetricFrame.

With the above three modifications, the signature becomes:

def plot_metric_by_group(metric, y_true, y_pred, *, sensitive_features, show_plot=True, ax=None):
Another idea (for another PR) is to have a method MetricFrame.plot(show_plot=True, ax=None)
and this would be just a wrapper for that? This would be analogous to pd.DataFrame.plot. Although there's an obvious wrinkle re. what to do if you have multiple metrics and/or control features, which we probably don't need to resolve in this PR :-)
I agree with renaming to make it more consistent & I like the idea for MetricFrame.plot().
Signed-off-by: MiroDudik <mdudik@gmail.com>
I know I'm a bit late to the party and I haven't run the latest code myself, but I have to admit that I personally don't find the […] I'm probably biased though, because I've mostly worked with imbalanced datasets in the past (fraud detection, predictive maintenance, etc.), in which case accuracy doesn't really make sense as a performance metric, nor does plotting FPR/FNR on the same scale.
@MiroDudik mentioned something similar and I totally agree. IMO the ideal state would be to cut that function, and if someone wants to see both they can use the other function to plot FPR and FNR. The only reason it's there is that Kevin replicated the existing […] Note that there are a few follow-up tasks that I created already (#668, #667, #666), so those sorts of things are beyond the scope of this PR :-)
I was working on a tutorial and I realized that our […]

# Imports
import pandas as pd
import numpy as np
import fairlearn.metrics as flm
import sklearn.metrics as skm
from fairlearn.metrics import MetricFrame

# Create a toy data set
df = pd.DataFrame(
    {'sex': ['F', 'M', 'F', 'M', 'M', 'F', 'M', 'F', 'M', 'M'],
     'race': ['White', 'White', 'White', 'Black', 'Black', 'Black', 'Hispanic', 'Hispanic', 'White', 'Black'],
     'y_true': [0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
     'y_pred': [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]})

# Define metrics
metrics = {
    'accuracy': skm.accuracy_score,
    'precision': skm.precision_score,
    'recall': skm.recall_score,
    'false_positive_rate': flm.false_positive_rate,
    'true_positive_rate': flm.true_positive_rate,
    'avg_true': lambda y_true, y_pred: np.mean(y_true),
    'avg_pred': lambda y_true, y_pred: np.mean(y_pred),
    'count': lambda y_true, y_pred: y_true.shape[0],
}

# Set up metric frame
mf = MetricFrame(metrics, df['y_true'], df['y_pred'], sensitive_features=df[['sex', 'race']])

# Plots
mf.by_group.plot.bar(y=['false_positive_rate', 'true_positive_rate'],
                     title='Compare multiple disaggregated metrics in a single plot');
mf.by_group.plot.bar(subplots=True, layout=[3, 3], legend=False, figsize=[12, 8],
                     title='Show all metrics');

[tagging @adrinjalali, @hildeweerts, @romanlutz, @LeJit]
It'd be nice to have some of these examples in our documentation.
@MiroDudik this is awesome! I assume this is because […] @rishabhsamb FYI, this could be useful!
Closing this PR since #766 covers this with much less custom code. Thanks @kevinrobinson for getting the ball rolling on this! I strongly doubt we'd be this far in the transition away from the dashboard if not for this work.
…766) With a slight delay (originally targeted for April) I'm finally removing the `FairlearnDashboard` since a newer version already exists in `raiwidgets`. The documentation is updated to instead use the plots @MiroDudik created with a single line directly from the `MetricFrame`. In the future we want to add more kinds of plots as already mentioned in #758, #666, and #668. Specifically, the model comparison plots do not have a replacement yet. Note that the "example" added to the `examples` directory is not shown under "Example notebooks" on the webpage, which is intentional since it's technically not a notebook. This also makes #561 mostly redundant, which I'll close shortly. #667 is also directly addressed with this PR, as the examples illustrate. Signed-off-by: Roman Lutz <rolutz@microsoft.com>
Rationale
This work is intended to help along the suggestion in fairlearn/fairlearn-proposals#14 (comment), with the assumption that the UI and Azure-specific bits of the project will be removed and placed in a separate repository with a different governance structure.
To illustrate that this is both viable and plays to the strengths of folks currently staffing the project, here is an example function that could be used in the quick start guide, removing the need for any Azure or UI code. This is based on some key assumptions, including:
If these assumptions are off, please comment and help validate them! 😄
What this adds
This pull request adds the plot_disparities_in_performance and plot_disparities_in_selection_rate functions, intended to replace those two visualizations built as widgets in HTML/CSS/JS.

Disparity in selection rate
Note the change of bar color to grey, since blue was used to communicate something specific in the chart above and the meaning of the color is not the same in this chart.
now: +3 clicks in Azure UI
added: plot_disparity_in_selection_rate
Disparity in accuracy
Note the overprediction and underprediction values within the charts are quite different in the new plot. They reflect the values that the false_positive_rate and false_negative_rate functions return, but these don't really make sense to me. I am mixing up something important here, but not sure what. I'll take another look next week, but if other folks can spot what the issue is that'd be awesome and super helpful! 👍

EDIT: copy-paste typo in underprediction/overprediction is fixed now.
now: +3 clicks in Azure UI
added: plot_disparity_in_accuracy
EDIT: as @romanlutz helpfully pointed out, this also supports multiple values for a single sensitive attribute. This example uses data.data['race'] in place of data.data['sex'] below (these examples are all narrowly technical without sociotechnical context):

![Screen Shot 2020-08-17 at 10 17 44 AM](https://user-images.githubusercontent.com/1056957/90407070-06046180-e074-11ea-8d5e-01af3df121e3.png)

Other notes
Code to test this in Jupyter on 0.4.6: