
Should sklearn include the Equal Error Rate metric? #15247

Open
vnherdeiro opened this issue Oct 14, 2019 · 2 comments

Comments

@vnherdeiro
Contributor

Hi there,

I have recently been looking into using sklearn for biometric security models. In this field, one of the most common metrics is the equal error rate (EER): the operating point at which the false positive rate equals the false negative rate. Surprisingly, it is not available out of the box in sklearn, despite the long list of scores and metrics shipped in https://scikit-learn.org/stable/modules/classes.html

I have written code to compute the EER in different contexts, such as minimizing it during hyperparameter and model selection. Is it worth cleaning it up and submitting it as a pull request?

@jaganadhg

There is an SO discussion [1] and a blog post [2] on this:

[1] https://stackoverflow.com/questions/28339746/equal-error-rate-in-python
[2] https://yangcha.github.io/EER-ROC/

@vnherdeiro
Contributor Author

vnherdeiro commented Oct 15, 2019

Yeah, I saw those examples and refactored them into this code:

from sklearn.metrics import make_scorer, roc_curve
from scipy.optimize import brentq
from scipy.interpolate import interp1d

def calculate_eer(y_true, y_score):
    '''
    Return the equal error rate (EER) for a binary classifier's scores:
    the point on the ROC curve where the false positive rate equals the
    false negative rate (1 - TPR).
    '''
    fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label=1)
    # Solve 1 - x - tpr(x) = 0 on the linearly interpolated ROC curve;
    # at the root, FPR (= x) equals FNR (= 1 - tpr(x)).
    eer = brentq(lambda x: 1. - x - interp1d(fpr, tpr)(x), 0., 1.)
    return eer
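As a cross-check, the same quantity can be approximated directly from the `roc_curve` arrays without scipy's interpolation, by taking the ROC point where FNR and FPR are closest (a sketch; the helper name `eer_no_interp` and the toy data below are made up for illustration):

```python
# Sanity check: an interpolation-free EER estimate taken straight from
# the roc_curve arrays, at the threshold where FNR and FPR are closest.
import numpy as np
from sklearn.metrics import roc_curve

def eer_no_interp(y_true, y_score):
    # drop_intermediate=False keeps every ROC point, so the discrete
    # FNR/FPR crossing is not coarsened by sklearn's point pruning.
    fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=1,
                            drop_intermediate=False)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    # Average the two rates at the crossing to reduce discretization error.
    return (fpr[idx] + fnr[idx]) / 2.0

# Toy data (made up): four negatives, four positives.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.6, 0.4, 0.35, 0.8, 0.9, 0.7]
print(eer_no_interp(y_true, y_score))
```

On well-sampled score distributions the two estimates should agree closely; the interpolation-free version just avoids the root-finding machinery.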

You can then turn it into a sklearn scorer (greater_is_better=False negates the score, since a lower EER is better):

make_scorer(calculate_eer, greater_is_better=False, needs_proba=True)

For instance, to minimize the EER in a grid search:

GridSearchCV(
    LogisticRegression(),
    param_grid=param_grid,
    cv=n_folds,
    scoring=make_scorer(calculate_eer, greater_is_better=False, needs_proba=True),
)
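Putting the pieces together, here is a hypothetical end-to-end sketch (synthetic data and a made-up `param_grid`; the `inspect` branch is an assumption to cope with newer sklearn releases, where `needs_proba` was replaced by `response_method`):

```python
# Hypothetical end-to-end sketch: tuning LogisticRegression's C to
# minimize the EER with GridSearchCV on synthetic data.
import inspect
from scipy.interpolate import interp1d
from scipy.optimize import brentq
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV

def calculate_eer(y_true, y_score):
    fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=1)
    return brentq(lambda x: 1.0 - x - interp1d(fpr, tpr)(x), 0.0, 1.0)

# Newer sklearn releases replaced needs_proba with response_method,
# so pick whichever keyword this installation supports.
if "response_method" in inspect.signature(make_scorer).parameters:
    eer_scorer = make_scorer(calculate_eer, greater_is_better=False,
                             response_method="predict_proba")
else:
    eer_scorer = make_scorer(calculate_eer, greater_is_better=False,
                             needs_proba=True)

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # made-up grid
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid=param_grid, cv=5, scoring=eer_scorer)
search.fit(X, y)
# best_score_ is the negated EER (greater_is_better=False flips the sign).
print(search.best_params_, -search.best_score_)
```

Since the scorer negates the metric, `search.best_score_` is the negative of the best cross-validated EER.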
