[MRG] Enhancement: Add MAPE as an evaluation metric #10711

Closed

wants to merge 55 commits

Changes from 31 commits

Commits
55 commits
d45fb57
adding MAPE as a new regression loss
mohamed-ali Feb 26, 2018
762fa19
adding MAPE api reference
mohamed-ali Feb 26, 2018
9a6b8fd
adding MAPE scorer
mohamed-ali Feb 26, 2018
ea62611
adding MAPE to metrics/__init__.py
mohamed-ali Feb 26, 2018
70ab1e0
configuring tests for MAPE scorer
mohamed-ali Feb 26, 2018
962a8e2
configuring common tests for MAPE metric
mohamed-ali Feb 26, 2018
4f16a45
correcting import in MAPE example
mohamed-ali Feb 26, 2018
e2e93b0
adding documentation under model_evaluation
mohamed-ali Feb 26, 2018
6df2446
fixing pep8
mohamed-ali Feb 26, 2018
7c516ef
adding more MAPE regression tests
mohamed-ali Feb 26, 2018
cb7d875
add whitespace around operators
mohamed-ali Feb 26, 2018
8194eb0
fix bug: missing comma
mohamed-ali Feb 26, 2018
fd28645
avoiding division by zero in test scenario
mohamed-ali Feb 27, 2018
d10af3c
adding check to validate y_true contains no zeros
mohamed-ali Feb 27, 2018
61ea463
documenting the new feature in whats_new
mohamed-ali Feb 27, 2018
088467f
precising that MAPE is a non symetric metric
mohamed-ali Feb 27, 2018
f9ac4e4
change scorer to neg_mape and fix pep8
mohamed-ali Feb 27, 2018
595bcf6
adding a metrics category for non-zero y
mohamed-ali Feb 28, 2018
7575d0e
fix pep8 issues
mohamed-ali Feb 28, 2018
a228116
undo unrelated pep8 changes
mohamed-ali Feb 28, 2018
199f038
fix conflict
mohamed-ali Feb 28, 2018
732d4dc
Merge branch 'master' into Add-MAPE-as-evaluation-metric
mohamed-ali Feb 28, 2018
7a78396
fixing scorers tests in test_score_objects.py
mohamed-ali Feb 28, 2018
0ffc824
undo autopep8 irrelevant changes
mohamed-ali Feb 28, 2018
953a6e0
undo autopep8 changes
mohamed-ali Feb 28, 2018
064833b
undo autopep8 change
mohamed-ali Feb 28, 2018
e707fc8
unduplicating tests by using randint(1,3)
mohamed-ali Feb 28, 2018
cc8cd10
use elif instead
mohamed-ali Feb 28, 2018
c780845
remove uncessary comments and add space to operator +
mohamed-ali Mar 1, 2018
9dba988
using [1, 2] as a single sample test
mohamed-ali Mar 2, 2018
ff43f92
Merge branch 'Add-MAPE-as-evaluation-metric' of https://github.com/mo…
mohamed-ali Mar 2, 2018
a48f9d7
remove excessive indentation
mohamed-ali Mar 2, 2018
a0f2cd0
test raise error when y contains zeros
mohamed-ali Mar 2, 2018
bfe2143
use memmap type for test scenario
mohamed-ali Mar 4, 2018
e4cb140
tear down nonzero_y_mm
mohamed-ali Mar 4, 2018
7312998
keep naming consistency
mohamed-ali Mar 4, 2018
9c3a776
Trigger travis after timeout
mohamed-ali Mar 4, 2018
553f1ed
Merge branch 'master' into Add-MAPE-as-evaluation-metric
mohamed-ali Mar 26, 2018
9ea3975
add comma aster n_samples
mohamed-ali Mar 26, 2018
407332a
specify that mape is between 0 and 100
mohamed-ali Mar 26, 2018
c74eabb
update MAPE description in model_evalution
mohamed-ali Mar 26, 2018
60f370d
updated mape description and add not shift-invariant demo
mohamed-ali Mar 26, 2018
e04fe37
fix travis error
mohamed-ali Mar 26, 2018
2e20a48
fix failing doctests
mohamed-ali Mar 26, 2018
203aed1
fix travis error
mohamed-ali Mar 26, 2018
ec7de1e
Merge branch 'master' into Add-MAPE-as-evaluation-metric
mohamed-ali Mar 27, 2018
96d6d4a
Merge branch 'Add-MAPE-as-evaluation-metric' of https://github.com/mo…
mohamed-ali Mar 27, 2018
7aac4c0
remove line to keep code consistent
mohamed-ali Mar 28, 2018
6f1ab55
put mape in metrics section
mohamed-ali Mar 28, 2018
0ac8bc4
resolve conflicts
mohamed-ali Aug 16, 2019
e2184f3
fix syntax error
mohamed-ali Aug 18, 2019
3dbe763
fix merge conflicts in metrics.scorer
mohamed-ali Aug 18, 2019
83c4aa5
fix merge conflicts
mohamed-ali Aug 18, 2019
6b350a5
fix merge conflicts
mohamed-ali Aug 18, 2019
bb36550
add what's new
mohamed-ali Aug 20, 2019
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -836,6 +836,7 @@ details.

metrics.explained_variance_score
metrics.mean_absolute_error
metrics.mean_absolute_percentage_error
metrics.mean_squared_error
metrics.mean_squared_log_error
metrics.median_absolute_error
29 changes: 28 additions & 1 deletion doc/modules/model_evaluation.rst
@@ -85,6 +85,7 @@ Scoring Function
**Regression**
'explained_variance' :func:`metrics.explained_variance_score`
'neg_mean_absolute_error' :func:`metrics.mean_absolute_error`
'neg_mape' :func:`metrics.mean_absolute_percentage_error`
Member: Why not spell it out fully here, like all the other metrics? i.e. neg_mean_absolute_percentage_error

Contributor Author: @lesteve I clarified in the PR description above that the name has to be chosen/voted on by all of us. Initially I used neg_mean_absolute_percentage_error, but since MAPE is already a well-known acronym, which also makes the scorer name cleaner, I switched to neg_mape. However, we can change back to the long version if most of us think that's the right thing to do.

Member: I would be in favour of the neg_mean_absolute_percentage_error version personally. It is more consistent with neg_mean_absolute_error and more consistent with the metric name (metrics.mean_absolute_percentage_error). Happy to hear what others think.

Member: I would also be in favor of using the explicit expanded name by default and introducing neg_mape as an alias, as we do for neg_mse.

Member: Actually we do not have neg_mse. I thought we did.

'neg_mean_squared_error' :func:`metrics.mean_squared_error`
'neg_mean_squared_log_error' :func:`metrics.mean_squared_log_error`
'neg_median_absolute_error' :func:`metrics.median_absolute_error`
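If the alias approach suggested in the thread above were adopted (the explicit name as the primary scorer key, with neg_mape as a shorthand), the registration could look roughly like the sketch below. This is only an illustration of the suggestion, not code from this diff, and it assumes the metric lands as metrics.mean_absolute_percentage_error:

    from sklearn.metrics import make_scorer, mean_absolute_percentage_error

    # One scorer object, exposed under both the explicit name and the short alias.
    neg_mean_absolute_percentage_error_scorer = make_scorer(
        mean_absolute_percentage_error, greater_is_better=False)

    SCORERS = dict(
        neg_mean_absolute_percentage_error=neg_mean_absolute_percentage_error_scorer,
        neg_mape=neg_mean_absolute_percentage_error_scorer,
    )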
@@ -104,7 +105,7 @@ Usage examples:
>>> model = svm.SVC()
>>> cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mape', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']

.. note::

@@ -1551,6 +1552,32 @@ Here is a small example of usage of the :func:`mean_absolute_error` function::
... # doctest: +ELLIPSIS
0.849...

.. _mean_absolute_percentage_error:

Mean absolute percentage error
------------------------------

The :func:`mean_absolute_percentage_error` function computes `mean absolute
percentage error <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`_, a risk
metric corresponding to the expected value of the absolute percentage error loss or
:math:`l1`-norm of percentage loss.

If :math:`\hat{y}_i` is the predicted value of the :math:`i`-th sample,
and :math:`y_i` is the corresponding true value, then the mean absolute percentage error
(MAPE) estimated over :math:`n_{\text{samples}}` is defined as
Member: I would put the "(MAPE)" at the first mention of mean absolute percentage error above.

Also maybe add at least one sentence of explanation, say: "MAPE computes the error relative to the true value. Therefore the same absolute distance between prediction and ground truth will lead to a smaller error if the true value is larger. In particular, the metric is not shift-invariant."

Contributor Author: done


.. math::

\text{MAPE}(y, \hat{y}) = \frac{100}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.
Member: I haven't reviewed how the rest of the document does it, but it seems excessively pedantic to say the sum starts at 0 and ends at n_samples - 1: it makes the formula a little harder to read, where \sum_i would suffice.

Contributor Author: I was trying to follow the formula from Wikipedia, but your \sum_i is clearer.
EDIT: in fact, it's inspired by the other metrics' formulas, like accuracy score, MAE, MSE; see for instance http://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error.

We can make the change in another PR if enough people agree with it :)

Contributor Author: @jnothman, I think it's better to keep it as is, to remain consistent with the other metric definitions. We can create an issue to apply the change to all metrics separately.


Here is a small example of usage of the :func:`mean_absolute_percentage_error` function::

>>> from sklearn.metrics import mean_absolute_percentage_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_percentage_error(y_true, y_pred)
Member: Maybe add an example of it not being shift-invariant, i.e. add 10 to y_true and y_pred and show that the error is much smaller, and add a sentence to explain.

Contributor Author: done

32.738...

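As requested in the thread above, here is a short standalone demonstration (plain NumPy, mirroring the doctest values; the helper name mape is only for illustration) that the metric is not shift-invariant: adding the same constant to y_true and y_pred leaves the absolute errors unchanged but shrinks the percentage errors.

    import numpy as np

    def mape(y_true, y_pred):
        # Reference implementation of the formula above.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    y_true = np.array([3, -0.5, 2, 7])
    y_pred = np.array([2.5, 0.0, 2, 8])
    print(mape(y_true, y_pred))            # ~32.74
    # Same absolute errors, larger true values -> much smaller MAPE.
    print(mape(y_true + 10, y_pred + 10))  # ~3.75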
.. _mean_squared_error:

Mean squared error
2 changes: 2 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -88,6 +88,8 @@ Model evaluation
- Added the :func:`metrics.balanced_accuracy_score` metric and a corresponding
``'balanced_accuracy'`` scorer for binary classification.
:issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`.
- Added the :func:`metrics.mean_absolute_percentage_error` metric and the associated
Member: "Model evaluation" is not the best section for this, I would say; in doc/whats_new/v0.19.0.rst there is a "Metrics" section. I think you can do the same here.

Contributor Author: Nice catch, I will do that.

scorer for regression problems. :issue:`10711` by :user:`Mohamed Ali Jamaoui <mohamed-ali>`

Decomposition, manifold learning and clustering

2 changes: 2 additions & 0 deletions sklearn/metrics/__init__.py
@@ -54,6 +54,7 @@

from .regression import explained_variance_score
from .regression import mean_absolute_error
from .regression import mean_absolute_percentage_error
from .regression import mean_squared_error
from .regression import mean_squared_log_error
from .regression import median_absolute_error
@@ -97,6 +98,7 @@
'make_scorer',
'matthews_corrcoef',
'mean_absolute_error',
'mean_absolute_percentage_error',
'mean_squared_error',
'mean_squared_log_error',
'median_absolute_error',
42 changes: 42 additions & 0 deletions sklearn/metrics/regression.py
@@ -19,6 +19,7 @@
# Manoj Kumar <manojkumarsivaraj334@gmail.com>
# Michael Eickenberg <michael.eickenberg@gmail.com>
# Konstantin Shmelkov <konstantin.shmelkov@polytechnique.edu>
# Mohamed Ali Jamaoui <m.ali.jamaoui@gmail.com>
# License: BSD 3 clause

from __future__ import division
@@ -32,6 +33,7 @@

__ALL__ = [
"mean_absolute_error",
"mean_absolute_percentage_error",
"mean_squared_error",
"mean_squared_log_error",
"median_absolute_error",
@@ -181,6 +183,46 @@ def mean_absolute_error(y_true, y_pred,
return np.average(output_errors, weights=multioutput)


def mean_absolute_percentage_error(y_true, y_pred):
"""Mean absolute percentage error regression loss

Read more in the :ref:`User Guide <mean_absolute_percentage_error>`.

Parameters
----------
y_true : array-like of shape = (n_samples)
Member: nitpick: there should be a comma after n_samples.

Ground truth (correct) target values.

y_pred : array-like of shape = (n_samples)
Estimated target values.

Returns
-------
loss : float
A positive floating point value (the best value is 0.0).
Member: between 0 and 100?


Examples
--------
>>> from sklearn.metrics import mean_absolute_percentage_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_percentage_error(y_true, y_pred)
32.738...
"""
y_type, y_true, y_pred, _ = _check_reg_targets(y_true, y_pred,
'uniform_average')

if y_type == 'continuous-multioutput':
raise ValueError("Multioutput not supported "
Member: It's fine not to support it for now, but I think there is little doubt that multi-output MAPE would be the same on the flattened input: this would give an identical measure to macro-averaging. If we supported the variant mentioned on Wikipedia where you divide by the mean y_true, that is a different matter, because the mean across all columns may be inappropriate.

Contributor Author: I guess there is a possibility to add both kinds of MAPE implementations and allow the user to switch between them. We can also do that when users request it :)

"in mean_absolute_percentage_error")

if (y_true == 0).any():
raise ValueError("mean_absolute_percentage_error requires"
Member: This is not currently executed in any tests. It should be tested.

Contributor Author: I added a test case called test_raise_value_error_y_with_zeros for this particular error.

" y_true to not include zeros")

return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


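As a side note on the multioutput discussion above, the following sketch (plain NumPy, not part of this diff) illustrates the reviewer's point that MAPE computed on flattened multioutput arrays equals the macro-average of the per-column MAPEs, because every column contributes the same number of samples:

    import numpy as np

    def mape(y_true, y_pred):
        # Same formula as above, applied to 1d arrays.
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    rng = np.random.RandomState(0)
    y_true = rng.uniform(1, 10, size=(50, 3))   # strictly positive: no division by zero
    y_pred = y_true + rng.normal(scale=0.5, size=y_true.shape)

    flattened = mape(y_true.ravel(), y_pred.ravel())
    per_column = [mape(y_true[:, j], y_pred[:, j]) for j in range(y_true.shape[1])]
    assert np.isclose(flattened, np.mean(per_column))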
def mean_squared_error(y_true, y_pred,
sample_weight=None,
multioutput='uniform_average'):
10 changes: 8 additions & 2 deletions sklearn/metrics/scorer.py
@@ -24,9 +24,11 @@
import numpy as np

from . import (r2_score, median_absolute_error, mean_absolute_error,
mean_squared_error, mean_squared_log_error, accuracy_score,
mean_squared_error, mean_absolute_percentage_error,
mean_squared_log_error, accuracy_score,
f1_score, roc_auc_score, average_precision_score,
precision_score, recall_score, log_loss, balanced_accuracy_score,
precision_score, recall_score, log_loss,
balanced_accuracy_score,
explained_variance_score, brier_score_loss)

from .cluster import adjusted_rand_score
@@ -487,6 +489,9 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
mean_absolute_error_scorer = make_scorer(mean_absolute_error,
greater_is_better=False)
mean_absolute_error_scorer._deprecation_msg = deprecation_msg
neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
greater_is_better=False)

Member: Maybe remove the newline here. When there is no clear rule, my advice would be to follow the same implicit convention as the code you are changing.

neg_median_absolute_error_scorer = make_scorer(median_absolute_error,
greater_is_better=False)
deprecation_msg = ('Scoring method median_absolute_error was renamed to '
@@ -536,6 +541,7 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,

SCORERS = dict(explained_variance=explained_variance_scorer,
r2=r2_scorer,
neg_mape=neg_mape_scorer,
neg_median_absolute_error=neg_median_absolute_error_scorer,
neg_mean_absolute_error=neg_mean_absolute_error_scorer,
neg_mean_squared_error=neg_mean_squared_error_scorer,
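For context, this is roughly how the new scorer would be used once registered; the sketch assumes the 'neg_mape' key from this diff is kept (the naming discussion above is still open) and uses strictly positive targets so that MAPE is defined:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.RandomState(0)
    X = rng.uniform(size=(100, 4))
    y = rng.uniform(1, 10, size=100)   # non-zero targets

    # Scores are negated percentages, so values closer to 0 are better.
    scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mape', cv=5)
    print(scores.mean())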
30 changes: 19 additions & 11 deletions sklearn/metrics/tests/test_common.py
@@ -42,6 +42,7 @@
from sklearn.metrics import log_loss
from sklearn.metrics import matthews_corrcoef
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import precision_score
@@ -93,6 +94,7 @@

REGRESSION_METRICS = {
"mean_absolute_error": mean_absolute_error,
"mean_absolute_percentage_error": mean_absolute_percentage_error,
"mean_squared_error": mean_squared_error,
"median_absolute_error": median_absolute_error,
"explained_variance_score": explained_variance_score,
@@ -366,7 +368,9 @@
"weighted_precision_score",

"macro_f0.5_score", "macro_f2_score", "macro_precision_score",
"macro_recall_score", "log_loss", "hinge_loss"
"macro_recall_score", "log_loss", "hinge_loss",

"mean_absolute_percentage_error"
]


@@ -378,15 +382,21 @@
# confusion_matrix with sample_weight is in
# test_classification.py
"median_absolute_error",
"mean_absolute_percentage_error"
]

# Metrics that only support non-zero y
METRICS_WITH_NON_ZERO_Y = [
"mean_absolute_percentage_error"
]


@ignore_warnings
def test_symmetry():
# Test the symmetry of score and loss functions
random_state = check_random_state(0)
y_true = random_state.randint(0, 2, size=(20, ))
y_pred = random_state.randint(0, 2, size=(20, ))
y_true = random_state.randint(1, 3, size=(20, ))
y_pred = random_state.randint(1, 3, size=(20, ))

# We shouldn't forget any metrics
assert_equal(set(SYMMETRIC_METRICS).union(
@@ -415,8 +425,8 @@ def test_symmetry():
@ignore_warnings
def test_sample_order_invariance():
random_state = check_random_state(0)
y_true = random_state.randint(0, 2, size=(20, ))
y_pred = random_state.randint(0, 2, size=(20, ))
y_true = random_state.randint(1, 3, size=(20, ))
y_pred = random_state.randint(1, 3, size=(20, ))
y_true_shuffle, y_pred_shuffle = shuffle(y_true, y_pred, random_state=0)

for name, metric in ALL_METRICS.items():
@@ -432,8 +442,6 @@ def test_sample_order_invariance():
@ignore_warnings
def test_sample_order_invariance_multilabel_and_multioutput():
random_state = check_random_state(0)

# Generate some data
y_true = random_state.randint(0, 2, size=(20, 25))
y_pred = random_state.randint(0, 2, size=(20, 25))
y_score = random_state.normal(size=y_true.shape)
@@ -472,8 +480,8 @@ def test_sample_order_invariance_multilabel_and_multioutput():
@ignore_warnings
def test_format_invariance_with_1d_vectors():
random_state = check_random_state(0)
y1 = random_state.randint(0, 2, size=(20, ))
y2 = random_state.randint(0, 2, size=(20, ))
y1 = random_state.randint(1, 3, size=(20, ))
y2 = random_state.randint(1, 3, size=(20, ))

y1_list = list(y1)
y2_list = list(y2)
@@ -653,8 +661,8 @@ def check_single_sample(name):
metric = ALL_METRICS[name]

# assert that no exception is thrown
for i, j in product([0, 1], repeat=2):
metric([i], [j])
for i, j in product([1, 2], repeat=2):
metric([i], [j])
Member: You've left this excessively indented.



@ignore_warnings
7 changes: 6 additions & 1 deletion sklearn/metrics/tests/test_regression.py
@@ -11,6 +11,7 @@

from sklearn.metrics import explained_variance_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import median_absolute_error
@@ -28,6 +29,11 @@ def test_regression_metrics(n_samples=50):
mean_squared_error(np.log(1 + y_true),
np.log(1 + y_pred)))
assert_almost_equal(mean_absolute_error(y_true, y_pred), 1.)
# comparing (y_true + 1) and (y_pred + 1) instead of
# y_true and y_pred to avoid division by zero
assert_almost_equal(mean_absolute_percentage_error(1 + y_true,
1 + y_pred),
8.998, 2)
assert_almost_equal(median_absolute_error(y_true, y_pred), 1.)
assert_almost_equal(r2_score(y_true, y_pred), 0.995, 2)
assert_almost_equal(explained_variance_score(y_true, y_pred), 1.)
@@ -72,7 +78,6 @@ def test_regression_metrics_at_limits():
mean_squared_log_error, [1., -2., 3.], [1., 2., 3.])



def test__check_reg_targets():
# All of length 3
EXAMPLES = [
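The test_raise_value_error_y_with_zeros case mentioned in the review thread is not visible in the hunks shown here; a minimal sketch of what it could look like follows (only the test name comes from the author's reply, the body is an assumption):

    import pytest
    from sklearn.metrics import mean_absolute_percentage_error

    def test_raise_value_error_y_with_zeros():
        # A zero in y_true would mean dividing by zero in the percentage error,
        # so the metric is expected to refuse such input.
        y_true = [0.0, 1.0, 2.0]
        y_pred = [0.5, 1.5, 2.5]
        with pytest.raises(ValueError, match="y_true to not include zeros"):
            mean_absolute_percentage_error(y_true, y_pred)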
7 changes: 5 additions & 2 deletions sklearn/metrics/tests/test_score_objects.py
@@ -41,8 +41,8 @@
from sklearn.externals import joblib


REGRESSION_SCORERS = ['explained_variance', 'r2',
'neg_mean_absolute_error', 'neg_mean_squared_error',
REGRESSION_SCORERS = ['explained_variance', 'r2', 'neg_mean_absolute_error',
'neg_mape', 'neg_mean_squared_error',
'neg_mean_squared_log_error',
'neg_median_absolute_error', 'mean_absolute_error',
'mean_squared_error', 'median_absolute_error']
@@ -66,6 +66,7 @@

MULTILABEL_ONLY_SCORERS = ['precision_samples', 'recall_samples', 'f1_samples']

NONZERO_Y_SCORERS = ['neg_mape']

def _make_estimators(X_train, y_train, y_ml_train):
# Make estimators that make sense to test various scoring methods
@@ -486,6 +487,8 @@ def check_scorer_memmap(scorer_name):
scorer, estimator = SCORERS[scorer_name], ESTIMATORS[scorer_name]
if scorer_name in MULTILABEL_ONLY_SCORERS:
score = scorer(estimator, X_mm, y_ml_mm)
elif scorer_name in NONZERO_Y_SCORERS:
score = scorer(estimator, X_mm, y_mm + 1)
Member: Can they all use y_mm + 1, actually? Do we even need NONZERO_Y_SCORERS?

Contributor Author: When trying y_mm + 1 for all scorers, 5 test cases fail:

======================================================================
ERROR: sklearn.metrics.tests.test_score_objects.test_scorer_memmap_input('balanced_accuracy',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/utils/testing.py", line 326, in wrapper
    return fn(*args, **kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/tests/test_score_objects.py", line 493, in check_scorer_memmap
    score = scorer(estimator, X_mm, y_mm + 1)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/scorer.py", line 110, in __call__
    **self._kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1420, in balanced_accuracy_score
    raise ValueError('Balanced accuracy is only meaningful '
ValueError: Balanced accuracy is only meaningful for binary classification problems.

======================================================================
ERROR: sklearn.metrics.tests.test_score_objects.test_scorer_memmap_input('precision',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/utils/testing.py", line 326, in wrapper
    return fn(*args, **kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/tests/test_score_objects.py", line 493, in check_scorer_memmap
    score = scorer(estimator, X_mm, y_mm + 1)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/scorer.py", line 110, in __call__
    **self._kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1266, in precision_score
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1044, in precision_recall_fscore_support
    "choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.

======================================================================
ERROR: sklearn.metrics.tests.test_score_objects.test_scorer_memmap_input('average_precision',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/utils/testing.py", line 326, in wrapper
    return fn(*args, **kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/tests/test_score_objects.py", line 493, in check_scorer_memmap
    score = scorer(estimator, X_mm, y_mm + 1)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/scorer.py", line 211, in __call__
    return self._sign * self._score_func(y, y_pred, **self._kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/ranking.py", line 217, in average_precision_score
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/base.py", line 75, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/ranking.py", line 209, in _binary_uninterpolated_average_precision
    y_true, y_score, sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/ranking.py", line 470, in precision_recall_curve
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/ranking.py", line 364, in _binary_clf_curve
    raise ValueError("Data is not binary and pos_label is not specified")
ValueError: Data is not binary and pos_label is not specified

======================================================================
ERROR: sklearn.metrics.tests.test_score_objects.test_scorer_memmap_input('recall',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/utils/testing.py", line 326, in wrapper
    return fn(*args, **kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/tests/test_score_objects.py", line 493, in check_scorer_memmap
    score = scorer(estimator, X_mm, y_mm + 1)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/scorer.py", line 110, in __call__
    **self._kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1364, in recall_score
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1044, in precision_recall_fscore_support
    "choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.

======================================================================
ERROR: sklearn.metrics.tests.test_score_objects.test_scorer_memmap_input('f1',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/utils/testing.py", line 326, in wrapper
    return fn(*args, **kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/tests/test_score_objects.py", line 493, in check_scorer_memmap
    score = scorer(estimator, X_mm, y_mm + 1)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/scorer.py", line 110, in __call__
    **self._kwargs)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 717, in f1_score
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 831, in fbeta_score
    sample_weight=sample_weight)
  File "/home/dali/Desktop/workspace/scikit-learn/sklearn/metrics/classification.py", line 1044, in precision_recall_fscore_support
    "choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.

----------------------------------------------------------------------
Ran 53 tests in 0.244s

FAILED (errors=5)

Contributor Author: Also, the type of y_mm is <class 'numpy.core.memmap.memmap'> whereas y_mm + 1 is of type <type 'numpy.ndarray'>.

Member: Indeed. Well, it's not a useful test if it's not a memmap. Make sure to store it with +1 ...

else:
score = scorer(estimator, X_mm, y_mm)
assert isinstance(score, numbers.Number), scorer_name
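Following up on the last point about y_mm + 1 no longer being a memmap, the commit list above shows a nonzero_y_mm fixture being added and torn down; a self-contained sketch of that idea (file names and shapes are placeholders) would be to store the shifted copy as its own memmap:

    import os
    from tempfile import mkdtemp

    import numpy as np

    tmpdir = mkdtemp()

    # Stand-in for the existing y_mm fixture: memmap-backed targets that may contain zeros.
    y_mm = np.memmap(os.path.join(tmpdir, 'y.dat'), dtype=np.float64,
                     mode='w+', shape=(20,))
    y_mm[:] = np.random.RandomState(0).randint(0, 2, size=20)

    # Store the shifted copy as its own memmap so scorers that forbid zeros
    # (here, MAPE) still receive a genuine numpy.memmap rather than an ndarray.
    nonzero_y_mm = np.memmap(os.path.join(tmpdir, 'nonzero_y.dat'), dtype=np.float64,
                             mode='w+', shape=y_mm.shape)
    nonzero_y_mm[:] = y_mm + 1
    nonzero_y_mm.flush()

    assert isinstance(nonzero_y_mm, np.memmap)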