MetricFrame should support metrics that don't require y_true and y_pred #756
Comments
Resolving this issue would also enable #676.
We definitely should be figuring out how to offer metrics for other scenarios. However, I'm a little concerned about a few points. Doing (1) is going to break all existing code. Now, while we hardly have the usage of [...]. For (2), we could add a [...]. For the rest, would it be wiser to say [...]?
If we want to implement (1), we would not break everything right away. We would start off by issuing deprecation warnings. Would that work, @riedgar-ms @romanlutz @hildeweerts @adrinjalali? I just think that our current API is really limiting us, so I'd rather fix this sooner rather than commit to a future of docs that describe all kinds of workarounds. I want to hold off on discussing (2), because I acknowledge that there are a variety of considerations... I think that we have several options there. It would be a non-breaking change, so there's less urgency than with (1), which, if we decide to go that route, would be nice to accomplish before the SciPy tutorial (at least the version where we have a new format and issue warnings if somebody is using the old format).
Is there a way of detecting whether an argument was passed by position rather than by keyword?
I am generally not opposed to Step 1 (although I don't really see the need for changing metric to metrics). When we get to Step 2, things get a bit tricky IMO. I am afraid that trying to fit all different kinds of learning tasks into one MetricFrame class is going to be difficult.
Re. [...]:

# Define metrics
metrics = {
    'accuracy': skm.accuracy_score,
    'precision': skm.precision_score,
    'recall': skm.recall_score,
    'false_positive_rate': flm.false_positive_rate,
}

mf = MetricFrame(metrics, ...)

So it seems that, @riedgar-ms, the transitional API (i.e., for the purposes of deprecating the old API) would be something like:

MetricFrame(*args, metrics=None, y_true=None, y_pred=None,
            sensitive_features=None, control_features=None, sample_params=None)
I'm not terribly opinionated on this issue, but if we move to something like [...], I would also suggest providing a few examples of what that would look like if we have different kinds of metrics (e.g., accuracy needs y_true and y_pred, [...]). Finally, I'm wondering whether the right thing to do here would be to add a warning that [...].
@romanlutz -- I agree that step (2) requires some care, but what do you think about carrying out step (1), with the intermediate API of the form:

MetricFrame(*args, metrics=None, y_true=None, y_pred=None,
            sensitive_features, control_features=None, sample_params=None)

This would support the current behavior when 1-3 positional arguments are provided, but in those cases it would give a warning that this calling format is being deprecated.
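For concreteness, a minimal sketch of how such a transitional constructor could detect the deprecated calling format (this also answers the earlier question about detecting positional arguments: anything captured by *args must have been passed by position). The body shown is illustrative, not Fairlearn's actual implementation:

import warnings

class MetricFrame:
    def __init__(self, *args, metrics=None, y_true=None, y_pred=None,
                 sensitive_features=None, control_features=None,
                 sample_params=None):
        if args:
            warnings.warn(
                "Passing metric, y_true and y_pred positionally is "
                "deprecated; please use keyword arguments instead.",
                DeprecationWarning)
            # Map the old positional layout onto the new keyword names.
            if len(args) >= 1:
                metrics = args[0]
            if len(args) >= 2:
                y_true = args[1]
            if len(args) >= 3:
                y_pred = args[2]
        # ... remainder of the existing construction logic ...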
I think that there are a few different things going on here, and I'd like to separate the different threads. Firstly, I think that the idea behind [...]. Secondly, I certainly don't think that [...]. I can see the logic in renaming the metric argument to metrics.
@riedgar-ms : one of the reasons to have keyword-only parameters is so that you can skip them when they're not needed. For example, if we went with the signature:

MetricFrame(*, metrics, y_true=None, y_pred=None, sensitive_features, control_features=None, sample_params=None)

Then it would be okay to drop y_true and y_pred for metrics that don't need them.
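For instance, a metric that only looks at predictions could then be computed without labels at all. A hypothetical call under that signature (selection_rate here stands in for any function of y_pred alone; the variable names are illustrative):

mf = MetricFrame(
    metrics={'selection_rate': selection_rate},
    y_pred=y_pred,
    sensitive_features=sensitive_features)
# y_true is simply omitted, since the metric never looks at it.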
Re. [...]: [...]. But I also don't mind if [...].
If the use case is "we can drop the current parameters, and have a different set of parameters instead" then we should probably be creating a new class to handle the second set of parameters. Right now, I feel that [...]. It may be that there is a more general API hiding within, and that we end up changing MetricFrame to build on it.
I feel like this discussion could benefit from some concrete examples. Like @riedgar-ms (and @hildeweerts, I think) I'm currently struggling to see the benefit of complicating the currently very intuitive API, but perhaps with 3-5 examples of what this would look like for actual metrics (say, one of the existing ones we support, and then whichever new ones we might be able to accommodate through this) we might see the benefits (?)
I was thinking the same! I think the majority of users will be looking for classification/regression metrics and may not even be familiar with reinforcement learning. Having to look at examples to understand how to use an API even in the 'simplest' scenario signals that it is not intuitive. I like @romanlutz's idea of writing out some examples!
Is there resistance even to Step 1, which would just modify the current API to:

MetricFrame(*, metrics, y_true=None, y_pred=None, sensitive_features, control_features=None, sample_params=None)

I don't think that it would make the API more confusing, would it? I think that there are good use cases that have nothing to do with reinforcement learning and that we have already run into (dataset-only metrics like demographic parity, streaming metrics which are currently being implemented, and word-error-rate-like settings which have been requested as well). But let me create some short examples to demonstrate.
It would break existing code, and require us to bump the version to v0.7.0. If we do that, I think we should release v0.7.0 as soon as possible after making the change. The long gap between v0.4.6 and v0.5.0 (the latter introducing the MetricFrame API) [...]. I would not be overjoyed at defaulting y_true and y_pred to None [...]. At the same time, you could also add the [...].
Based on the discussion in the community call, I'm happy with the following being done: [...]
My only concern is orchestrating a swift release, since these are breaking changes. When it comes to metrics for other problem types, I would like to know more before deciding to support those by extending MetricFrame.
The reasoning behind [...]. It seems to me that, generally speaking, the sample_params will be similar for different metrics of the same scenario (e.g., either classification or reinforcement learning). And in the niche cases where they're not, you might as well just create two MetricFrames. So for me at least some examples would still be appreciated, @MiroDudik :D
I know... examples... soonish? @riedgar-ms : re. swift release, my suggestion re. staging is:

1. backward compatible implementation with deprecation warning:
MetricFrame(*args, metrics, y_true=None, y_pred=None, sensitive_features, control_features=None, sample_params=None)

2. move to keyword-only arguments:
MetricFrame(*, metrics, y_true=None, y_pred=None, sensitive_features, control_features=None, sample_params=None)

3. addition of any other arguments like shared_sample_params.
I'm not keen on allowing *args in the signature.
@riedgar-ms: I believe that (1) must have |
Hmmm.... good point.
Finally, some examples... in those examples I'm considering two alternative proposals for the API (they slightly differ from the original proposal in my issue at the top):

Alternative A: Optional y_true and y_pred. [...]

Alternative B: [...] (extends Alternative A).
@romanlutz , @riedgar-ms , @hildeweerts : please take a look at the examples above, and let me know what you think about Alternative A vs Alternative B (which just extends Alternative A). In particular, do any of you see issues with Alternative A?
I'm still not enamoured of accepting [...]. I still feel that we should leave MetricFrame as it is. I would be happy to start up a fresh discussion group about what a more general disaggregated metric should look like (and as I mentioned above, it's entirely possible that the existing MetricFrame could end up being built on top of it).
Thanks a lot for these examples - this makes things a lot clearer for me! Given these examples, I find Alternative A pretty intuitive. If I understand correctly, the most basic scenario with just y_true and y_pred would work the same as it does now? At first glance, I am a bit concerned that Alternative B contains too much "magic" behind the scenes, which makes it more difficult to read (and debug) code. I do understand that writing all the dictionaries can be tedious. Adding subclasses for specific types of problems (e.g., RL) would already go a long way I think (and is much easier to properly document without overwhelming novice users).
@riedgar-ms : what do you think about Alternative A? I think it would go a long way towards addressing the main issues and it doesn't introduce [...]. @hildeweerts : correct, our current functionality is supported by both Alternative A and Alternative B by just requiring y_true and y_pred to be passed by name.
Re. [...]. Re. the function factory suggested by @riedgar-ms : this is for a different problem, with varying columns [...].
The factory function would be something like:

import functools

import numpy as np

def expand_y_pred(y_true, y_pred, f):
    # Adapter: accept the standard (y_true, y_pred) signature,
    # but evaluate the wrapped function on y_pred alone.
    assert len(y_true) == len(y_pred)
    return f(y_pred)

def make_metric_from_y_pred_function(function):
    return functools.partial(expand_y_pred, f=function)

in which case we could define:

selection_rate = make_metric_from_y_pred_function(np.mean)

(note I have only written the code above, not actually tried it). There could be a similar factory for metrics that use only y_true. Does that make more sense, @hildeweerts?
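That analogous factory for y_true-only metrics would simply mirror the pattern (again a sketch, untested; the names are illustrative):

import functools

import numpy as np

def expand_y_true(y_true, y_pred, f):
    # Adapter: conform to the metric(y_true, y_pred) convention
    # while evaluating the wrapped function on y_true alone.
    assert len(y_true) == len(y_pred)
    return f(y_true)

def make_metric_from_y_true_function(function):
    return functools.partial(expand_y_true, f=function)

# e.g., the base rate of positive labels, disaggregated by group:
base_rate = make_metric_from_y_true_function(np.mean)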
Ah, yes, I see. Thanks for the explanation! I can see the added value of the factory function, but I wonder how easy it would be for (novice) Fairlearn users to identify the need for it - particularly for those who skip the user guide. I still think there's added value in having specific data structures for different types of ML tasks, rather than having to wrap functions to fit into the supervised learning paradigm - if that makes sense? E.g., with named arguments it seems like we should be able to handle [...]. But I am not sure what's the best way forward here... looking forward to hearing other people's thoughts :)
Can you expand on [...]? If we did start allowing different names, how would we handle routing arguments? I'm concerned that if we try looking at our list of arguments, and using reflection on the supplied metric functions, we'll end up creating a source of subtle bugs. Think [...]
That is correct!
I was thinking we could have a separate [...].
My issue with MetricFrame was that it computed the metric on instantiation, which meant that the other methods could not be accessed unless I passed a metric. So you'll see in my attempt at a roc_curves class that there is a pseudo metric (thanks @MiroDudik for that solution) to enable that class to take advantage of MetricFrame's [...]. What do you think of introducing static methods within the MetricFrame class for helper functions, like splitting the data by sensitive feature?
@kstohr I'm a bit confused by what you mean by: [...]
The argument is obviously called y_true [...].
@hildeweerts that sort of leads to the question "Then what is y_true?"
I'm not sure if I understand your question correctly, so sorry in advance if this is all obvious to you. But generally [...]
@hildeweerts it might be that I'm just not fully used to the conventions - based on your description, it sounds like [...]
To make things more confusing for you: generally speaking, [...]
@hildeweerts coming back to the API, though, are there metrics which would accept three parameters - y_true, y_pred, and [...]?
You're right, there aren't. So that's a similar issue as metrics that don't use y_true.
Yes. It's a nomenclature issue. When I see [...]. And, in fact, all I needed to do was split the data by sensitive feature. I strongly feel that that should be a class/module on its own. Then you can use it in combination with built-in or with custom classes to compute metrics. I totally agree with @riedgar-ms but would frame it a little differently: [...]
Another way to think about it is that we have two tasks to perform: splitting the data by sensitive (and control) features, and applying metrics to the resulting splits. I think Fairlearn will be easier to maintain if we don't have to worry about how the data should be split and how it should be passed to a given metric in a single module/class. I suggest we just offer a class/module for splitting the data, and then worry about applying metrics to the split data in separate modules, which may import/depend on the module that splits the data, so that we always handle that consistently. I feel the current module is trying to do too much in one class.
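To make that concrete, here is a minimal sketch of what such a standalone splitter might look like, built on pandas groupby. The class name and API are hypothetical, not an actual Fairlearn proposal:

import pandas as pd

class GroupedData:
    # Split arbitrary aligned columns by a sensitive feature,
    # without computing any metric.
    def __init__(self, sensitive_features, **columns):
        self._df = pd.DataFrame(columns)
        self._df['_sf'] = list(sensitive_features)

    def groups(self):
        # Yield (group name, sub-frame of the remaining columns) pairs.
        for name, sub in self._df.groupby('_sf'):
            yield name, sub.drop(columns='_sf')

Metric application then becomes a separate concern, e.g.:

for group, data in GroupedData(sensitive_features=s_f, y_true=y_t, y_pred=y_p).groups():
    print(group, skm.recall_score(data['y_true'], data['y_pred']))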
I'm not really sure about how best to cope with that. One solution (which would of course break everything) would be to have our own names for the inputs - say [...]
I think this is an important discussion and I'd love to hear thoughts from other @fairlearn/fairlearn-maintainers as well here!
It's quite a long conversation here; could you maybe summarize what we've got so far, for the rest of us to be able to understand the conversation?
Summary

Per @adrinjalali 's request :-) I think that we've got a few things going on here: [...]
@kstohr has also suggested that the 'split up by sensitive and conditional features' part of MetricFrame be made available on its own.
Pinging..... @adrinjalali @hildeweerts @MiroDudik @michaelamoako @kstohr
@riedgar-ms Yep. That sums it up. My reason for splitting out the [...]. Note: It is ok, even preferred, if it is still a method in the class. My issue is that when you instantiate the class, it currently forces you to run a metric. It would be great to be able to perform the split without computing a metric.
Summary

This is a second attempt to summarise the discussion to date, with the goal of reaching a conclusion. The previous summary was overly summarised.

Current Status

As proposed at the beginning of the thread, we are in the process of moving to requiring named arguments for MetricFrame:

metrics = {
    'recall': skm.recall_score,
    'accuracy': skm.accuracy_score
}

s_p = {
    'recall': {'sample_weight': s_w}
    # No sample params for accuracy
}

mf = MetricFrame(metrics=metrics, y_true=y_t, y_pred=y_p,
                 sensitive_features=s_f,
                 control_features=c_f,
                 sample_params=s_p)

Note that the metric functions themselves are dispatched using positional arguments - the basic signature is assumed to be metric_fn(y_true, y_pred).

Functionality Gaps

The feedback above (and from other sources) has highlighted several gaps in the functionality of MetricFrame: [...]
There are actually workarounds for all of these using the existing MetricFrame [...].

A note on precomputed metrics

In the above, one proposed solution for the precomputed metric problem was to have a static factory method [...].

Proposed solutions

There are two basic ways to provide other columns - as [...]. Providing [...]
I personally don't mind this inconsistency. As for the rest of the proposal, I find it way too complicated, not very user friendly, and hard to explain to users. I would rather settle for a middle ground where we leave MetricFrame as is. We should of course accept extra input required by the metric. For example, in the case of [...]:
And with SLEP006, sklearn could accept something like [...]
to get a scorer. Note that this has removed the need for us to do any introspection, but we have a scorer object which is a callable but also has state. An alternative here, since we don't actually do slicing to re-call a scorer, is to have an object which only accepts precomputed scores, like [...]
Or something like that.
First, my apologies @adrinjalali for not responding sooner. For [...], I'm not quite sure what you're describing with:

my_metric_fn = make_scorer(inverse_propensity_score, score_params=["rewards", "propensities"])

and then pass:

func_dict = { 'recall' : recall_score,
              'inv_propensity': my_metric_fn }

flex_mf = FlexibleMetricFrame(metrics=func_dict,
                              y_true=y_t, y_pred=y_p, rewards=rewards, propensities=propensities,
                              sensitive_features=sf)

Since [...]. Presumably, [...]
Pinging @adrinjalali .... did I get what you meant by make_scorer?
So what I was trying to say is that handling the metadata routing is really not that easy, and most APIs we've tried to design are messy. The API in sklearn applies to scorers, i.e. [...]. Alternatively, we could let [...]. I guess for now, we could have something like:

metrics = AlternativeMetricFrame(
    metrics={'roc': roc_func, 'inverse_propensity': inverse_propensity_score},
    y_true=y_true,
    y_pred=y_pred,
    rewards=rewards,
    propensities=propensities,
    routing={'inverse_propensity': ['rewards', 'propensities']}
)

We could also accept a scorer instead of the scoring method, but the user would also need to pass the estimator. I kinda would prefer that if we're gonna do routing.
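To make the routing semantics concrete, here is a rough sketch of how a frame could dispatch extra columns according to such a routing dict; all names are illustrative, and it assumes each metric accepts its routed columns as keyword arguments:

def evaluate_metrics(metrics, routing, y_true, y_pred, **extra_columns):
    # Every metric receives y_true and y_pred positionally; metrics listed
    # in `routing` also receive the named extra columns as kwargs.
    results = {}
    for name, fn in metrics.items():
        kwargs = {col: extra_columns[col] for col in routing.get(name, [])}
        results[name] = fn(y_true, y_pred, **kwargs)
    return results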
I don't have a lot to add to this discussion, but I agree with leaving MetricFrame as is.

pinging @fairlearn/fairlearn-maintainers to move the discussion forward
Right now MetricFrame only works with metrics with the signature metric(y_true, y_pred), but its disaggregation functionality should be much more broadly applicable. Two use cases of interest:

1. metric(y_true) and metric(y_pred). While in principle these could be handled by the current API, it's a bit confusing.
2. metric(actions, rewards, propensities). This use case comes up in settings with partial observations, e.g., in lending: we only observe whether the person repays a loan (reward) for the specific loan type we provide (action), including "no loan".

I think regardless of how the API is tweaked, there's an obvious first step. And then hopefully a not-too-controversial second step.

Step 1: Make MetricFrame arguments keyword-only

The suggestion is to change the current API, which is this:

MetricFrame(metric, y_true, y_pred, *, sensitive_features, control_features=None, sample_params=None)

To something like this:

MetricFrame(*, metrics, y_true=None, y_pred=None, sensitive_features, control_features=None, sample_params=None)

So all arguments would be keyword-only and metric would become metrics (for consistency with other arguments and with columns in pandas). The functionality wouldn't change.

Step 2: Allow flexible names of shared sample parameters

Change the signature to:

MetricFrame(*, metrics, sensitive_features, control_features=None, sample_params=None, **shared_sample_params)

The idea is that any of the shared_sample_params are passed on to all metrics, whereas sample_params is (as before) a dictionary. With the new signature, it would still be possible to write things like

MetricFrame(metrics=metrics, y_true=y_true, y_pred=y_pred, sensitive_features=sf)

But it would be simple to use other kinds of metrics that do not work with y_true and y_pred.
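For instance, under the Step 2 signature, the lending scenario above might look like the following sketch (inverse_propensity_score and the column names are purely illustrative):

mf = MetricFrame(
    metrics={'inverse_propensity': inverse_propensity_score},
    sensitive_features=s_f,
    # shared sample parameters with flexible names, forwarded to every metric:
    actions=actions,
    rewards=rewards,
    propensities=propensities)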