Add model comparison plot #666
Comments
The function should either accept a metric or the user should compute the metrics outside the function and pass the scores to the plotting method. |
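To make the two options concrete, here is a minimal sketch; `plot_model_comparison` and its parameters are placeholders for the function being discussed, not an agreed-upon API:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1])
y_preds = {"model 1": np.array([0, 1, 0, 0, 1]),
           "model 2": np.array([1, 1, 1, 0, 1])}

# Option A: the plotting function accepts a metric and computes the scores itself.
# plot_model_comparison(x_axis_metric=accuracy_score, y_true=y_true, y_preds=y_preds)

# Option B: the user computes the scores outside and only passes plain numbers.
scores = {name: accuracy_score(y_true, pred) for name, pred in y_preds.items()}
# plot_model_comparison(x_axis_values=scores)
```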
You're right, of course! I missed the metric function. Thanks! |
@rishabhsamb expressed some interest in this, so I wanted to make sure we agree on all the basics. In my description above it looks like a standalone function. I think that's also our only option since
But there are still a few different things to choose here such as:
After thinking through these questions I'm thinking of the following
which would then proceed to plot overall accuracy on the x-axis (the value you get from

@fairlearn/fairlearn-maintainers wdyt? Also, @rishabhsamb please chime in with your own thoughts. I realize I may be missing some pieces or skipping over important details. |
I'm not sure. I guess ideally the user would have a few different models, and maybe one or two grid search CVs done on a few hyperparameter sets, and would like to visualize those results. Have you looked into the outputs
I would discourage us from incorporating these concepts into the signature of such a function. I would rather have simple metric functions as X and Y, and whatever logic needs to be there is in that function. We have a bunch of |
Regarding the module, it seemed intuitive to have it with metrics since this is basically a metrics visualization. That's not a terribly important point for me, though, so whatever works for people is fine by me. Note that this isn't just for

I understand the concern with exposing so many knobs on the metric aggregation. However, I'm not sure how else you would achieve the same outcome. This isn't just the simple metric, but aggregations of that metric between the groups. If I need to plot the overall accuracy score vs. the selection rate difference, there isn't really another way to express that (as far as I can tell) other than
If I've missed another option please point it out. |
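One way to read "whatever logic needs to be there is in that function": the group aggregation can be folded into the metric callable itself, so the plotting API only ever sees a function that returns a single number. A sketch of such a wrapper (not code from this thread), using fairlearn's `MetricFrame`:

```python
from fairlearn.metrics import MetricFrame, selection_rate

def selection_rate_max_diff(y_true, y_pred, sensitive_features):
    """Max-min difference of selection rate across sensitive groups."""
    mf = MetricFrame(
        metrics=selection_rate,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features,
    )
    return mf.difference()

# This callable could then be passed as an axis metric, just like accuracy_score.
```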
So for me to understand, if I have 3 models and a sensitive attribute which has 2 groups in it, how many points do I want to see in that plot? I was assuming 3. |
That matches my expectation exactly. But for each of those 3 models I have many choices for what the axes should be. Let's say I want to plot accuracy vs. selection rate. I'll just go with my earlier API proposal which would result in
How does this function internally know whether I want
I see no way of expressing that without args, hence:
|
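The proposal referred to above is not preserved in this excerpt. Purely for illustration, and not necessarily what was proposed, an args-based signature could look something like this:

```python
# Hypothetical only: extra arguments spelling out how the per-group metric is
# aggregated, instead of baking that into the metric callable.
# plot_model_comparison(
#     x_axis_metric=accuracy_score,       # plotted as the overall value
#     y_axis_metric=selection_rate,       # computed per sensitive group...
#     y_axis_aggregation="difference",    # ...then aggregated across groups
#     y_true=y_true,
#     y_preds=y_preds,
#     sensitive_features=sensitive_features,
# )
```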
The
Internally we can call the first one; in case it raises something, we call the second.

```python
from functools import partial

from sklearn.metrics import accuracy_score
from fairlearn.metrics import equalized_odds_ratio

plot_model_comparison(
    x_metric=accuracy_score,
    y_metric=partial(equalized_odds_ratio, method="max"),
    y_true=y_true,
    y_preds={"model 1": y_pred_1, "model 2": y_pred_2, ...},
    sensitive_features=sensitive_features,
    show_plot=True,
)
```
|
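For context, a sketch of how the `y_preds` dictionary in a call like the one above might be assembled; the dataset and models here are arbitrary placeholders, and `sensitive_features` would come from the data itself rather than being synthesized:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

y_preds = {
    "model 1": LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test),
    "model 2": DecisionTreeClassifier(random_state=0).fit(X_train, y_train).predict(X_test),
}
y_true = y_test  # ground-truth labels for the held-out split
```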
I had thought about a similar option, but I still think it would cause some extra trouble for users. That leaves users with something like this:
It's more flexible than what I proposed in that my proposal can't accommodate the

The middle path would be to still go with your suggestion, but create "shortcuts" like

I certainly like having fewer arguments, and for me that pretty much overrules any concerns I've mentioned. I'm just wondering whether there's another way to avoid creating so many custom functions. |
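The concrete "shortcuts" referred to above are not shown in this excerpt. As an illustration of the idea, a shortcut is just a pre-named axis metric shipped with the library so users don't have to write the wrapper themselves; for example, a worst-case group accuracy (akin to the accuracy_score_group_min mentioned below) amounts to:

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

def accuracy_score_group_min(y_true, y_pred, *, sensitive_features):
    """Shortcut: worst-case (minimum) accuracy over the sensitive groups."""
    mf = MetricFrame(
        metrics=accuracy_score,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features,
    )
    return mf.group_min()
```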
I very much prefer to make the user write 3 lines of code rather than having an API which is really hard to understand and remember. Ideally for more complicated cases, |
I like @adrinjalali's proposal above. Just a request to make all the arguments keyword only! |
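In Python, keyword-only arguments are enforced with a bare `*` in the signature; everything after it must be passed by name. The parameter list below is only illustrative, not the final one:

```python
def plot_model_comparison(*, x_axis_metric, y_axis_metric, y_true, y_preds,
                          sensitive_features=None, ax=None, show_plot=False,
                          **kwargs):
    """All arguments must be given as keywords, e.g. y_true=y_true."""
    ...
```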
[@romanlutz: for the specific use case that you're mentioning, we already support accuracy_score_group_min, but of course, I don't want to add too many entries to that table.] |
…766) With a slight delay (originally targeted for April) I'm finally removing the `FairlearnDashboard` since a newer version already exists in `raiwidgets`. The documentation is updated to instead use the plots @MiroDudik created with a single line directly from the `MetricFrame`. In the future we want to add more kinds of plots as already mentioned in #758, #666, and #668. Specifically, the model comparison plots do not yet have a replacement. Note that the "example" added to the `examples` directory is not shown under "Example notebooks" on the webpage, which is intentional since it's technically not a notebook. This also makes #561 mostly redundant, which I'll close shortly. #667 is also directly addressed with this PR as the examples illustrate. Signed-off-by: Roman Lutz <rolutz@microsoft.com>
This issue is still up for grabs, but whoever wants to do it can start from where @rishabhsamb and I left off which is captured in this branch: https://github.com/rishabhsamb/fairlearn/tree/refactored-input There's quite a bit of input validation we needed to refactor first because the validation is almost identical to what we're doing for mitigation techniques, but doesn't require certain parts of it. I suppose that could be its own PR to start with, and maybe I should get that out of the way before someone does this... |
Hi! I worked on this :) I'm still a bit new and I appreciate feedback/help. I have a few questions also. You can check out my progress on https://github.com/SeanMcCarren/fairlearn/tree/model_comparison_plot
|
For (1), have you done a

For (2), definitely go ahead and make the PR. You'll only trigger builds when you do a push, not on every commit. And CPU time isn't that precious.

For (3), I don't think we have a good solution for plotting tests. You can write an example notebook to show their use, but that would end up being a manual check. |
I agree with everything @riedgar-ms wrote, and I'm SUPER excited you're interested in this issue!!! I would completely ignore the matplotlib tests since I could never get them to work in CI. That perhaps deserves a separate issue at some point, but the main thing is that it renders correctly on the webpage, which we need to check every time we make a change to the corresponding page. So, in summary, @riedgar-ms is right in saying that we should have documentation with an example. However, I would recommend adding a user guide page as well, ideally based on the example notebook (if you search for |
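Since the thread notes there is no good solution for plotting tests, a common fallback is a smoke test that skips when matplotlib is unavailable and only checks that the call runs on a headless backend. This is a sketch under assumptions (the keyword names follow the parameter names discussed in this thread and its follow-up PR), not the tests that eventually landed:

```python
import numpy as np
import pytest

def test_plot_model_comparison_smoke():
    pytest.importorskip("matplotlib")
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for CI

    from sklearn.metrics import accuracy_score, recall_score
    from fairlearn.metrics import plot_model_comparison

    y_true = np.array([0, 1, 1, 0, 1, 0])
    y_preds = {"a": np.array([0, 1, 0, 0, 1, 0]),
               "b": np.array([1, 1, 1, 0, 1, 0])}
    sf = np.array(["g1", "g1", "g2", "g2", "g1", "g2"])

    # Passes as long as no exception is raised; rendering itself is not checked.
    plot_model_comparison(x_axis_metric=accuracy_score,
                          y_axis_metric=recall_score,
                          y_true=y_true, y_preds=y_preds,
                          sensitive_features=sf, show_plot=False)
```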
* refactored input
  Signed-off-by: Rishabh Sambare <rishabhsamb@gmail.com>
* plot_model_comparison and example in a notebook
  The (initial attempt at) the body of plot_model_comparison (probably still missing sanity w.r.t. the x_axis_metric and y_axis_metric... can I just pass sensitive_features always? nothing else required?) To make it work, added the import in __init__.py. Also created an example in one of the existing notebooks as I believe it is a fitting context!
* Implemented plot_model_comparison as specified in #666 with numpydoc
  - also support extra kwargs which are passed to ax.scatter
  - only if no ax is provided, automatically infer axis labels
* Fixed small bugs
* Start documentation: Example notebook section, User guide section (with literalinclude)
* rewording
* Update docs/user_guide/assessment.rst
  Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
* Update fairlearn/metrics/_multi_curve_plot.py
  Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
* Update fairlearn/metrics/_multi_curve_plot.py
  Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
* Added :end-before: for safety with literalinclude
* Typechecking y_preds and not showing plot
* Testcase for comparison plot when there is no matplotlib installed
* Removed show_plot in User Guide example
* remove show_plot here also
* impl roman's suggest on 15 sept to replace "_" with " "
* add point color and label as a baseline
* flake8
* Update _multi_curve_plot.py
* Enumerate over dictionary keys to get indices
* group_by_name kw: plot models with same prefix same color
* Also change point label text
* changes as described in PR
* fixed point_labels
* rename groups to model_group
* model_kwargs instead of group_kwargs
* Combine calls to scatter if kwargs are equal
* lint
* Delete ._plotting.py.swp
* legend_kwargs
* Improve plot_model_comparison example
* Got rid of redundant preprocessing changes
* flake8
* fix test
* implement changes suggested by @MiroDudik
* variable name change
* fix param name
* unused import
* Add point_labels_position suggestion from @MiroDudik
* Suggestion to not use global variable
* Bug fix
* Basic testing
* flake8
* small rewrite
* test coverage
* fix tests with pytest.raises
* more coverage
* More consistent handling of sf and cf
* Additional test coverage
* Fixing faulty test cases
* flake8
* Apply suggestions from code review
  Co-authored-by: MiroDudik <mdudik@gmail.com>
* simplified selection rate difference
* ask for list instead of tuple
* make backwards compatible
* Update fairlearn/metrics/_multi_curve_plot.py
  Co-authored-by: MiroDudik <mdudik@gmail.com>
* Remove legend kwargs
* Update fairlearn/metrics/_multi_curve_plot.py
  Co-authored-by: Richard Edgar <riedgar@microsoft.com>
* refactor renaming
* Testing errors
* Remove test for legend_kwargs as we removed legend_kwargs as arg
* Remove unnecessary comment & some final test coverage

Signed-off-by: Rishabh Sambare <rishabhsamb@gmail.com>
Co-authored-by: Rishabh Sambare <rishabhsamb@gmail.com>
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
Co-authored-by: MiroDudik <mdudik@gmail.com>
Co-authored-by: Hilde Weerts <24417440+hildeweerts@users.noreply.github.com>
Co-authored-by: Richard Edgar <riedgar@microsoft.com>
#947 addressed this! Closing 🙂 |
Is your feature request related to a problem? Please describe.
With the introduction of `matplotlib`-based plotting functions in #561 we are adding ways to plot metrics for all groups as defined by `sensitive_features`. To get to parity with the `FairlearnDashboard` we should also add model comparison visualizations. This would be pretty much identical to the following existing view: [screenshot of the FairlearnDashboard model comparison view, not reproduced here]. Just like the functions in #561 this should live in the `metrics` module.

Describe the solution you'd like