Tuning the probability threshold for classification #856
Comments
You actually can tune the threshold in nested cross-validation: you just need to set tune.threshold = TRUE in the makeTuneControl* object that you pass to makeTuneWrapper.
That's something that's still missing unfortunately.
Thanks a lot! I will try it out.
Here is a script that shows this.
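The script itself is missing from this page; below is a minimal reconstruction sketch of what such a script could look like. The learner (classif.rpart), the cp parameter range, pid.task and random search are my placeholder choices, not taken from the original comment.

```r
library(mlr)

# Placeholder learner and parameter; the key part is tune.threshold = TRUE,
# which makes the tuner also optimise the probability threshold.
lrn = makeLearner("classif.rpart", predict.type = "prob")
ps = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ctrl = makeTuneControlRandom(maxit = 10L, tune.threshold = TRUE)

# Inner resampling: hyperparameter + threshold tuning
inner = makeResampleDesc("CV", iters = 3)
lrn.tuned = makeTuneWrapper(lrn, resampling = inner, par.set = ps,
                            control = ctrl, measures = list(mmce))

# Outer resampling: unbiased performance estimate of the whole procedure
outer = makeResampleDesc("CV", iters = 5)
r = resample(lrn.tuned, pid.task, resampling = outer,
             measures = list(mmce), extract = getTuneResult)
r$aggr
```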
Further comments from my side, and questions for Erich:
Do we need to improve docs here? I guess my text above should be in there for clarification.
I had that ages ago in mlr. It was changed, and is now handled "specially" because we want to be efficient. If you treated it as a normal tuning parameter, the model would be refit for every candidate threshold, even though the predicted probabilities do not change; the special handling instead evaluates many thresholds on the same predictions.
Regarding 2)?
Yes, I think this would be helpful. As a workaround for only tuning the threshold and not other parameters at the same time, I can specify an integer tuning parameter whose upper and lower bounds are equal to the default value.
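As an illustration of that workaround, here is a sketch of my own (not code from the thread); classif.rpart and its minsplit parameter with its default of 20 are arbitrary choices.

```r
library(mlr)

# Fix an integer parameter at its default so the "tuning" only ever sees one
# configuration; with tune.threshold = TRUE the threshold is still optimised.
lrn = makeLearner("classif.rpart", predict.type = "prob")
ps = makeParamSet(makeIntegerParam("minsplit", lower = 20, upper = 20))
ctrl = makeTuneControlGrid(tune.threshold = TRUE)
lrn.thresh = makeTuneWrapper(lrn, resampling = cv3, par.set = ps,
                             control = ctrl, measures = list(mmce))
```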
Hello, I see how setting tune.threshold = TRUE in the makeTuneControl* function above can be used to tune the probability threshold that converts probabilities into class-label predictions. However, I don't see what is being optimised in order to find this threshold, and I don't understand how to change whatever is being optimised to what I want.

What I want to do is tune the probability threshold so that I get a specific TPR, and I want both the threshold and the FPR at that point. I actually want to do this for a small range of TPRs (60, 70, 80%). I'm training a glmnet model on moderately unbalanced data and I'm currently using AUC as the optimisation objective, but this is only a surrogate objective; my metrics of interest are really the FPRs at specific TPRs.

How can I control how the probability threshold is optimised in nested CV? Thanks!
Hi Andrew, you can control the measure that is optimised by setting it via the measures argument of makeTuneWrapper. Regarding your second paragraph: the example below contains a step that extracts the different tpr and fpr combinations.
Tuning fpr or tpr directly does not make sense to me, as you then just have to predict all FALSE or all TRUE. If you want to set the threshold from the beginning, you can set it already when creating the learner via the predict.threshold argument of makeLearner. The modified example from above for your use case (first makeTuneWrapper with auc, then get the tpr and fpr combinations):
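The code for that example did not survive here; the following is my hedged reading of the suggested workflow, where the learner, the alpha parameter range, pid.task and the use of generateThreshVsPerfData are my assumptions: tune with auc, then read the tpr/fpr combinations over all thresholds from the resampled predictions.

```r
library(mlr)

# Tune glmnet with auc as the tuning measure.
lrn = makeLearner("classif.glmnet", predict.type = "prob")
ps = makeParamSet(makeNumericParam("alpha", lower = 0, upper = 1))
ctrl = makeTuneControlRandom(maxit = 10L)
lrn.tuned = makeTuneWrapper(lrn, resampling = cv3, par.set = ps,
                            control = ctrl, measures = list(auc))
r = resample(lrn.tuned, pid.task, resampling = cv10, measures = list(auc))

# tpr and fpr over a grid of thresholds, computed from the test-set predictions
d = generateThreshVsPerfData(r$pred, measures = list(fpr, tpr))
head(d$data)   # columns: fpr, tpr, threshold
```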
Hope I could help you.
Hi, it wasn't clear to me from the example above that the measure that is optimised to find the probability threshold is the same measure that is selected for tuning.
Hi Andrew, maybe partial AUC (e.g. in package pROC) is an alternative for you. As far as I know you can restrict tpr or fpr to be in some interval, but I've never tried how it works if the interval is very small like in your case. Cheers,
You could, in your custom measure, check whether tpr is the desired value and if not return a really bad score. Not sure how well that would work in practice though.
Julia: Using pROC to get hold of the FPR at a specified TPR and minimise that FPR seems to work. It took a bit of debugging, but my code for my custom measure looks like this:
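The measure code is missing from this page; below is a sketch of how such a custom measure could look in mlr with pROC. The positive-class/direction handling and the exact coords() call are my assumptions, not the poster's original code.

```r
library(mlr)
library(pROC)

# Custom measure: FPR at the point on the ROC curve where TPR (sensitivity)
# reaches 0.6; lower is better.
fpr.at.tpr60 = makeMeasure(
  id = "fpr.at.tpr60", minimize = TRUE, best = 0, worst = 1,
  properties = c("classif", "req.pred", "req.truth", "req.prob"),
  fun = function(task, model, pred, feats, extra.args) {
    truth = getPredictionTruth(pred)
    prob = getPredictionProbabilities(pred)  # probability of the positive class
    roc.obj = pROC::roc(response = truth, predictor = prob, quiet = TRUE)
    spec = pROC::coords(roc.obj, x = 0.6, input = "sensitivity",
                        ret = "specificity", transpose = FALSE)
    1 - spec[["specificity"]]                # FPR = 1 - specificity
  }
)
```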
Lars: That was my first idea; I mentioned this above. This also seems to work, although I don't like having to specify a tuning parameter to decide how close to 60% I must be to return the FPR without a big penalty for being outside the accepted range. I also wonder how efficiently this measure can be minimised. I don't know anything about how the optimisation engine in mlr works, but I imagine that if it's using something like gradient descent, a measure that has zero gradient except for one small region with a Dirac delta-type spike might be tricky to minimise. This is just a wild guess. This method seems to work, but I have a nagging doubt that the results I get may be sub-optimal without my knowing it.

Provided it's fast enough, Julia's method seems to be the simplest and most direct way to get the measure I want, so I'll go with that for the moment.

As I understand it, nested CV gives me an estimate of the performance I can expect on new data from the model selection procedure used in the inner loop; to build the final model I should then repeat exactly that inner-loop procedure on the entire dataset. That is, using all the data, I find the optimal hyperparameters (the alpha and lambda of my glmnet model) using CV and then use these values to build the final model. Please scream if I have this wrong! :-) Thanks!
mlr has different optimisation methods and they are all able to deal with these spikes. If you're using grid or random search, they're completely unaffected by them.
hi, let me clear this up.
this is correct. tuneThreshold optimizes the measure you selected for tuning. it can basically be any measure.
I will post more later.
Hi Bernd, Thanks for your reply. I look forward to hearing more later. In the meantime, it might help to know the context for the model I'm building. I'm a particle physicist working on one of the large experiments at the Large Hadron Collider at CERN, and I want to use mlr to build a classifier that can distinguish between two different classes of subatomic particle, based on their decay properties. I will be citing the mlr package in my paper, and you'll be able to add my paper to your list of works that use mlr. I'm trying to improve on this work: https://arxiv.org/abs/1405.6583 Quick comments in reply to your comments:
Thanks!
the reason i said "subject to TPR >= 0.6" as the constraint for "FPR = min!", instead of "subject to TPR = 0.6", is that at the optimum both versions are mathematically equivalent anyway, right? if you somehow have a "baggage" of earlier work that you need to compare to, it's hard for me to factor that in....
modelling it like in the measure formula above is what i would do as well. simply do that. it would look very similar for a constraint a la TPR >= 0.6.
i simply meant you cannot "act on a plot" in nested resampling / your model selection. with the ROC plot i meant: for a "static" model like a simple logistic regression, you could look at the plot, select the point, read off the FPR at TPR = 0.6 and you are done. during nested resampling with tuning and model selection, which is what you seem to be doing, you need a numerical criterion (the measure we just discussed or something similar, but not a ggplot object....). "You are saying that the method Julia proposed is wrong for nested CV?"
like i said, creating a custom measure that does this here ("FPR = min! subject to TPR >= 0.6") is doable in mlr and does exactly what you want. also the mlr threshold optimizer is basically an interval search and has no real problem if your objective function is "unsmooth" or looks weird. so also no problem. in general: i could probably "teach" mlr so that a user can create general measures CONDITIONED on general constraints like TPR > 0.6. then you can tune for that or tuneThreshold for that. that would be very cool. it's not that hard i think but i need some time. All ok?
Any news regarding this problem?
You can certainly just use a tuneWrapper with a custom grid with just one parameter setting. Not so beautiful, but it should do the trick.
So, for learners that don't have a tuning parameter, the trick would be to wrap something around them (e.g. makeWeightedClassesWrapper) so that the original learner gets a tuning parameter. We can then tune the threshold together with the new tuning parameter but set the new tuning parameter to a constant value that does not change the original model. Here's my example for tuning the threshold this way:
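The example code is missing here; the following is my reconstruction sketch of the idea, assuming classif.logreg as the parameter-free learner and mmce/auc as the measures (the thread does not say which learner or measures were used; it only refers to lrn2 and meas later on).

```r
library(mlr)

# Wrap a learner without tuning parameters so it gains one (wcw.weight),
# fix that parameter at 1 (no reweighting, so the model is unchanged),
# and let the tuner optimise only the threshold.
lrn = makeWeightedClassesWrapper(makeLearner("classif.logreg", predict.type = "prob"))
ps = makeParamSet(makeDiscreteParam("wcw.weight", values = 1))
ctrl = makeTuneControlGrid(tune.threshold = TRUE)
meas = list(mmce, auc)   # first measure drives the tuning (my assumption)
lrn2 = makeTuneWrapper(lrn, resampling = cv3, measures = meas,
                       par.set = ps, control = ctrl)
```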
Certainly you don't need the ... To get the result you can use the following code:

    res = resample(learner = lrn2, resampling = cv10, task = pid.task, measures = meas, extract = getTuneResult)
    extractSubList(res$extract, "threshold")
Thanks, I thought that resolution would have an influence on the number of thresholds being evaluated. Does that mean that the number of thresholds cannot be set manually? How many values are evaluated? 100?
The threshold tuning is done in a more exhaustive and accurate way, since it is not so expensive. See tuneThreshold.
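For completeness, a small sketch of calling the threshold optimiser directly on a prediction object; the learner and task are placeholders, and the th/perf element names are from my reading of the tuneThreshold docs, so treat them as assumptions.

```r
library(mlr)

lrn = makeLearner("classif.logreg", predict.type = "prob")
r = resample(lrn, pid.task, resampling = cv3, measures = list(mmce))

# Optimise the threshold for a threshold-dependent measure
# on the resampled predictions
res = tuneThreshold(pred = r$pred, measure = mmce)
res$th    # tuned threshold
res$perf  # mmce at that threshold
```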
I have a trivial question about the tune.threshold = TRUE cutoff. For this parameter, will the threshold be different for each repeat? If the cutoff is different, does the result return the mean threshold across repeats?
Yes, they will be different. Look at the example above from Bernd: extractSubList(res$extract, "threshold") lists the tuned thresholds, and there you can see the different thresholds for each cross-validation iteration.
After you get a different optimal threshold (or tuning parameter) in each of the Cross Validation steps (see comment by PhilippPro), how do you decide which threshold to use if you want to make a prediction model on the entire dataset? Is there a function to do this in mlr?
@mbbrigitte The resampling of the tuned learner (the result of makeTuneWrapper) is meant to give you an idea of whether your tuning strategy works when compared to other strategies.
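For the practical question of which threshold to use for a final model, a hedged sketch (assuming the lrn2 tune wrapper and pid.task from the earlier example): repeat the inner tuning procedure once on the full data set and read the tuned threshold off the resulting model.

```r
# Train the tune wrapper on all the data; the tuning (including the
# threshold search) is repeated once on the full training set.
mod = train(lrn2, pid.task)
tr = getTuneResult(mod)
tr$threshold   # threshold selected on the full data
tr$x           # tuned hyperparameter values
```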
@schiffner can you share with me what you have gathered regarding how the generalization to the multi-class case works and what options there are for tuning? thank you in advance
Is it possible with mlr to tune the probability threshold for classification using nested cross-validation? In the tutorial on cost-sensitive classification the function tuneThreshold is briefly explained, but if I understand it correctly, it can only be used for unnested resampling. I think it is important to be able to tune the threshold in nested cross-validation because searching for an optimal threshold can lead to strong overoptimism. Hence, if we want to properly estimate the predictive performance in the same sample, we have to strictly separate testing from learning.

Why don't we treat the probability threshold as a regular tuning parameter? This would not only allow nested cross-validation but also allow tuning this parameter together with other tuning parameters at the same time.
I also don't understand why changing the threshold is not discussed as potential remedy for class imbalance in the tutorial on imbalanced classification problems.