[MRG] Enhancement: Add MAPE as an evaluation metric #10711
Changes from 31 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -85,6 +85,7 @@ Scoring Function

    **Regression**
    'explained_variance'          :func:`metrics.explained_variance_score`
    'neg_mean_absolute_error'     :func:`metrics.mean_absolute_error`
    'neg_mape'                    :func:`metrics.mean_absolute_percentage_error`
    'neg_mean_squared_error'      :func:`metrics.mean_squared_error`
    'neg_mean_squared_log_error'  :func:`metrics.mean_squared_log_error`
    'neg_median_absolute_error'   :func:`metrics.median_absolute_error`
@@ -104,7 +105,7 @@ Usage examples:

    >>> model = svm.SVC()
    >>> cross_val_score(model, X, y, scoring='wrong_choice')
    Traceback (most recent call last):
-   ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
+   ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mape', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

.. note::
@@ -1551,6 +1552,32 @@ Here is a small example of usage of the :func:`mean_absolute_error` function::

    ... # doctest: +ELLIPSIS
    0.849...

.. _mean_absolute_percentage_error:

Mean absolute percentage error
------------------------------

The :func:`mean_absolute_percentage_error` function computes the `mean absolute
percentage error <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`_
(MAPE), a risk metric corresponding to the expected value of the absolute
percentage error loss, i.e. the :math:`l1`-norm of the relative errors.

If :math:`\hat{y}_i` is the predicted value of the :math:`i`-th sample,
and :math:`y_i` is the corresponding true value, then the MAPE estimated
over :math:`n_{\text{samples}}` is defined as

> Review: I would put the (MAPE) at the first mention of mean absolute percentage error above. Also maybe add at least one sentence of explanation.
> Author: done

.. math::

   \text{MAPE}(y, \hat{y}) = \frac{100}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.

> Review: I haven't reviewed how the rest of the document does it, but it seems excessively pedantic to say the sum starts at 0 and ends at n_samples-1: it makes the formula a little harder to read.
> Author: I was trying to follow the formula from Wikipedia. We can make the change in another PR if enough people agree with it :)
> Reviewer: @jnothman, I think it's better to keep it as is, to remain consistent with the other metrics definitions. We can create an issue to apply the change to all metrics separately.

Here is a small example of usage of the :func:`mean_absolute_percentage_error` function::

    >>> from sklearn.metrics import mean_absolute_percentage_error
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> mean_absolute_percentage_error(y_true, y_pred)
    32.738...

> Review: maybe add an example of it not being shift-invariant, i.e. add 10 to y_true and y_pred and show that the error is much smaller, and add a sentence to explain.
> Author: done
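The reviewer's shift-invariance point can be reproduced with a few lines of plain Python. The `mape` helper below is a local re-implementation of the formula above (not the sklearn function from this PR):

```python
def mape(y_true, y_pred):
    # MAPE as defined above: mean of |(y - yhat) / y|, expressed in percent.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(round(mape(y_true, y_pred), 3))              # 32.738

# Shifting both arrays by +10 leaves the absolute errors unchanged,
# but the relative errors (and hence MAPE) shrink considerably.
shifted_true = [t + 10 for t in y_true]
shifted_pred = [p + 10 for p in y_pred]
print(round(mape(shifted_true, shifted_pred), 3))  # 3.748
```

This is why MAPE is sensitive to the scale of the targets: the same absolute error counts for much less when the true values are far from zero.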

.. _mean_squared_error:

Mean squared error
@@ -88,6 +88,8 @@ Model evaluation

- Added the :func:`metrics.balanced_accuracy_score` metric and a corresponding
  ``'balanced_accuracy'`` scorer for binary classification.
  :issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`.
- Added the :func:`metrics.mean_absolute_percentage_error` metric and the associated
  scorer for regression problems. :issue:`10711` by :user:`Mohamed Ali Jamaoui <mohamed-ali>`

> Review: "Model evaluation" is not the best section for this I would say; in doc/whats_new/v0.19.0.rst there is a "Metrics" section. I think you can do the same here.
> Author: Nice catch, I will do that.

Decomposition, manifold learning and clustering
@@ -19,6 +19,7 @@
# Manoj Kumar <manojkumarsivaraj334@gmail.com>
# Michael Eickenberg <michael.eickenberg@gmail.com>
# Konstantin Shmelkov <konstantin.shmelkov@polytechnique.edu>
# Mohamed Ali Jamaoui <m.ali.jamaoui@gmail.com>
# License: BSD 3 clause

from __future__ import division

@@ -32,6 +33,7 @@
__ALL__ = [
    "mean_absolute_error",
    "mean_absolute_percentage_error",
    "mean_squared_error",
    "mean_squared_log_error",
    "median_absolute_error",

@@ -181,6 +183,46 @@ def mean_absolute_error(y_true, y_pred,
    return np.average(output_errors, weights=multioutput)


def mean_absolute_percentage_error(y_true, y_pred):
    """Mean absolute percentage error regression loss

    Read more in the :ref:`User Guide <mean_absolute_percentage_error>`.

    Parameters
    ----------
    y_true : array-like of shape = (n_samples)
        Ground truth (correct) target values.

> Review: nitpick: there should be a comma after n_samples.

    y_pred : array-like of shape = (n_samples)
        Estimated target values.

    Returns
    -------
    loss : float
        A positive floating point value (the best value is 0.0).

> Review: between 0 and 100?

    Examples
    --------
    >>> from sklearn.metrics import mean_absolute_percentage_error
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> mean_absolute_percentage_error(y_true, y_pred)
    32.738...
    """
    y_type, y_true, y_pred, _ = _check_reg_targets(y_true, y_pred,
                                                   'uniform_average')

    if y_type == 'continuous-multioutput':
        raise ValueError("Multioutput not supported "
                         "in mean_absolute_percentage_error")

> Review: It's fine not to support it for now, but I think there is little doubt that multi-output MAPE would be the same on the flattened input: this would give an identical measure to macro-averaging. If we supported the variant mentioned in Wikipedia where you divide by the mean y_true, that is a different matter, because the mean across all columns may be inappropriate.
> Author: I guess there is a possibility to add the two kinds of implementations of MAPE and allow the user to change between them. We can also do that when users request to have it :)

    if (y_true == 0).any():
        raise ValueError("mean_absolute_percentage_error requires"
                         " y_true to not include zeros")

> Review: This is not currently executed in any tests. It should be tested.
> Author: I added a test case called …

    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
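The review thread asks for the zero-`y_true` guard to be exercised. A standalone sketch of such a check, re-implementing the guard with plain numpy rather than importing the PR's sklearn code (which also skips sklearn's `_check_reg_targets` validation):

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    # Simplified stand-in for the PR's function: same zero check and formula,
    # without sklearn's _check_reg_targets input validation.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if (y_true == 0).any():
        raise ValueError("mean_absolute_percentage_error requires"
                         " y_true to not include zeros")
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# The guard raises as soon as any true target is exactly zero,
# since the percentage error is undefined there.
try:
    mean_absolute_percentage_error([3, 0.0, 2], [2.5, 0.1, 2])
except ValueError as exc:
    print(exc)
```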


def mean_squared_error(y_true, y_pred,
                       sample_weight=None,
                       multioutput='uniform_average'):
@@ -24,9 +24,11 @@
import numpy as np

 from . import (r2_score, median_absolute_error, mean_absolute_error,
-               mean_squared_error, mean_squared_log_error, accuracy_score,
+               mean_squared_error, mean_absolute_percentage_error,
+               mean_squared_log_error, accuracy_score,
                f1_score, roc_auc_score, average_precision_score,
-               precision_score, recall_score, log_loss, balanced_accuracy_score,
+               precision_score, recall_score, log_loss,
+               balanced_accuracy_score,
                explained_variance_score, brier_score_loss)

from .cluster import adjusted_rand_score

@@ -487,6 +489,9 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
mean_absolute_error_scorer = make_scorer(mean_absolute_error,
                                         greater_is_better=False)
mean_absolute_error_scorer._deprecation_msg = deprecation_msg
neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
                              greater_is_better=False)

> Review: Maybe remove the new line here. When there is no clear rule, my advice would be to follow the same implicit convention as the code you are changing.

neg_median_absolute_error_scorer = make_scorer(median_absolute_error,
                                               greater_is_better=False)
deprecation_msg = ('Scoring method median_absolute_error was renamed to '

@@ -536,6 +541,7 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,

SCORERS = dict(explained_variance=explained_variance_scorer,
               r2=r2_scorer,
               neg_mape=neg_mape_scorer,
               neg_median_absolute_error=neg_median_absolute_error_scorer,
               neg_mean_absolute_error=neg_mean_absolute_error_scorer,
               neg_mean_squared_error=neg_mean_squared_error_scorer,
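The sign convention behind `greater_is_better=False` can be sketched without sklearn: the scorer simply negates the loss so that every scorer follows "higher is better". The `make_scorer` below is a toy stand-in for illustration only (sklearn's real scorers wrap an estimator and are called as `scorer(estimator, X, y)`, not on raw predictions):

```python
def mape(y_true, y_pred):
    # Mean absolute percentage error, matching the formula in the PR.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def make_scorer(score_func, greater_is_better=True):
    # Toy imitation of sklearn's sign convention: losses are negated
    # so that a larger score is always better (as model selection assumes).
    sign = 1 if greater_is_better else -1
    def scorer(y_true, y_pred):
        return sign * score_func(y_true, y_pred)
    return scorer

neg_mape_scorer = make_scorer(mape, greater_is_better=False)
print(neg_mape_scorer([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # about -32.738
```

This is why the registered name carries the `neg_` prefix: a perfect model scores 0, and worse models score increasingly negative values.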
@@ -42,6 +42,7 @@
from sklearn.metrics import log_loss
from sklearn.metrics import matthews_corrcoef
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import precision_score

@@ -93,6 +94,7 @@

REGRESSION_METRICS = {
    "mean_absolute_error": mean_absolute_error,
    "mean_absolute_percentage_error": mean_absolute_percentage_error,
    "mean_squared_error": mean_squared_error,
    "median_absolute_error": median_absolute_error,
    "explained_variance_score": explained_variance_score,

@@ -366,7 +368,9 @@
    "weighted_precision_score",

    "macro_f0.5_score", "macro_f2_score", "macro_precision_score",
-   "macro_recall_score", "log_loss", "hinge_loss"
+   "macro_recall_score", "log_loss", "hinge_loss",
+
+   "mean_absolute_percentage_error"
]
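The hunk above appears to add MAPE to the list of metrics that are not symmetric in their arguments, alongside `log_loss` and `hinge_loss`. The asymmetry follows directly from the denominator being `y_true`; a quick standalone check (plain Python, not the sklearn implementation):

```python
def mape(y_true, y_pred):
    # Denominator is y_true, so swapping the arguments changes the result.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true, y_pred = [1, 2, 3], [2, 3, 4]
print(round(mape(y_true, y_pred), 2))  # 61.11
print(round(mape(y_pred, y_true), 2))  # 36.11 -- not the same
```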

@@ -378,15 +382,21 @@
    # confusion_matrix with sample_weight is in
    # test_classification.py
    "median_absolute_error",
    "mean_absolute_percentage_error"
]

# Metrics that only support non-zero y
METRICS_WITH_NON_ZERO_Y = [
    "mean_absolute_percentage_error"
]

@ignore_warnings
def test_symmetry():
    # Test the symmetry of score and loss functions
    random_state = check_random_state(0)
-   y_true = random_state.randint(0, 2, size=(20, ))
-   y_pred = random_state.randint(0, 2, size=(20, ))
+   y_true = random_state.randint(1, 3, size=(20, ))
+   y_pred = random_state.randint(1, 3, size=(20, ))

    # We shouldn't forget any metrics
    assert_equal(set(SYMMETRIC_METRICS).union(

@@ -415,8 +425,8 @@ def test_symmetry():
@ignore_warnings
def test_sample_order_invariance():
    random_state = check_random_state(0)
-   y_true = random_state.randint(0, 2, size=(20, ))
-   y_pred = random_state.randint(0, 2, size=(20, ))
+   y_true = random_state.randint(1, 3, size=(20, ))
+   y_pred = random_state.randint(1, 3, size=(20, ))
    y_true_shuffle, y_pred_shuffle = shuffle(y_true, y_pred, random_state=0)

    for name, metric in ALL_METRICS.items():

@@ -432,8 +442,6 @@ def test_sample_order_invariance():
@ignore_warnings
def test_sample_order_invariance_multilabel_and_multioutput():
    random_state = check_random_state(0)

    # Generate some data
    y_true = random_state.randint(0, 2, size=(20, 25))
    y_pred = random_state.randint(0, 2, size=(20, 25))
    y_score = random_state.normal(size=y_true.shape)

@@ -472,8 +480,8 @@ def test_sample_order_invariance_multilabel_and_multioutput():
@ignore_warnings
def test_format_invariance_with_1d_vectors():
    random_state = check_random_state(0)
-   y1 = random_state.randint(0, 2, size=(20, ))
-   y2 = random_state.randint(0, 2, size=(20, ))
+   y1 = random_state.randint(1, 3, size=(20, ))
+   y2 = random_state.randint(1, 3, size=(20, ))

    y1_list = list(y1)
    y2_list = list(y2)

@@ -653,8 +661,8 @@ def check_single_sample(name):
    metric = ALL_METRICS[name]

    # assert that no exception is thrown
-   for i, j in product([0, 1], repeat=2):
-       metric([i], [j])
+   for i, j in product([1, 2], repeat=2):
+       metric([i], [j])

> Review: You've left this excessively indented.

@ignore_warnings
@@ -41,8 +41,8 @@
from sklearn.externals import joblib


-REGRESSION_SCORERS = ['explained_variance', 'r2',
-                      'neg_mean_absolute_error', 'neg_mean_squared_error',
+REGRESSION_SCORERS = ['explained_variance', 'r2', 'neg_mean_absolute_error',
+                      'neg_mape', 'neg_mean_squared_error',
                       'neg_mean_squared_log_error',
                       'neg_median_absolute_error', 'mean_absolute_error',
                       'mean_squared_error', 'median_absolute_error']

@@ -66,6 +66,7 @@

MULTILABEL_ONLY_SCORERS = ['precision_samples', 'recall_samples', 'f1_samples']

NONZERO_Y_SCORERS = ['neg_mape']


def _make_estimators(X_train, y_train, y_ml_train):
    # Make estimators that make sense to test various scoring methods

@@ -486,6 +487,8 @@ def check_scorer_memmap(scorer_name):
    scorer, estimator = SCORERS[scorer_name], ESTIMATORS[scorer_name]
    if scorer_name in MULTILABEL_ONLY_SCORERS:
        score = scorer(estimator, X_mm, y_ml_mm)
    elif scorer_name in NONZERO_Y_SCORERS:
        score = scorer(estimator, X_mm, y_mm + 1)
    else:
        score = scorer(estimator, X_mm, y_mm)
    assert isinstance(score, numbers.Number), scorer_name

> Review: can all use y_mm + 1, actually? Do we even need NONZERO_Y_SCORERS?
> Author: When trying to use y_mm + 1 for all, 5 test cases fail.
> Review: Also, the type of y_mm + 1 is no longer a memmap.
> Review: Indeed. Well, it's not a useful test if it's not a memmap. Make sure to store it with +1 ...
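The last review point, that `y_mm + 1` is a plain in-memory array rather than a memmap, can be checked directly. A small sketch with numpy, independent of the sklearn test suite (on recent numpy versions, arithmetic on a memmap returns an ndarray, which is what the reviewer is relying on):

```python
import os
import tempfile

import numpy as np

# Arithmetic on a memmap yields an in-memory ndarray, so a test meant to
# exercise memmapped targets must write the shifted values to disk first.
path = os.path.join(tempfile.mkdtemp(), "y.dat")
y = np.arange(5, dtype=np.float64)

y_mm = np.memmap(path, dtype=np.float64, mode="w+", shape=y.shape)
y_mm[:] = y + 1          # store the +1 shift inside the memmap itself
y_mm.flush()

print(isinstance(y_mm + 1, np.memmap))  # the computed array is not a memmap
print(isinstance(y_mm, np.memmap))      # the stored array still is
print(bool((y_mm == 0).any()))          # shifted targets contain no zeros
```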

> Review: Why not spell it out fully here, like all the other metrics? i.e. neg_mean_absolute_percentage_error
> Author: @lesteve I clarified in the PR description above that the name has to be chosen/voted by all of us. Initially I used neg_mean_absolute_percentage_error, but since mape is already a famous acronym, which also makes the metric name cleaner, I chose to switch to neg_mape. However, we can change back to the long version if most of us think that's the right thing to do.
> Review: I would be in favour of the neg_mean_absolute_percentage_error version personally. It is more consistent with neg_mean_absolute_error and more consistent with the metric name (metrics.mean_absolute_percentage_error). Happy to hear what others think.
> Review: I would also be in favor of using the explicit expanded name by default and introducing neg_mape as an alias, as we do for neg_mse.
> Review: Actually we do not have neg_mse. I thought we had.