
[MRG] Enhancement: Add MAPE as an evaluation metric #10711

Closed · wants to merge 55 commits

Conversation

@mohamed-ali (Contributor) commented Feb 26, 2018

Reference Issues/PRs

Fixes #10708
Closes #6605

What does this implement/fix? Explain your changes.

  • Implements sklearn.metrics.mean_absolute_percentage_error (see the usage sketch below).
  • Implements the associated neg_mape scorer for regression problems.
  • Includes test configurations in tests/test_common.py and tests/test_score_objects.py.
  • Adds specific tests with y_true that does not include zeros in tests/test_common.py and tests/test_score_objects.py.
  • Adds a docstring and an example.
  • Updates the documentation in doc/modules/model_evaluation.rst and doc/modules/classes.rst.

Any other comments?

  • We have to reach a consensus on the name of the scorer: either neg_mean_absolute_percentage_error_scorer or neg_mape_scorer
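
As a quick orientation for reviewers, here is a minimal sketch of calling the proposed metric; the import only exists on this PR's branch, so treat it as illustrative rather than released API:

    import numpy as np

    # Only available on this PR's branch, not in released scikit-learn.
    from sklearn.metrics import mean_absolute_percentage_error

    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 3.0, 8.0])

    # Mean of |(y_true - y_pred) / y_true|, reported as a percentage.
    print(mean_absolute_percentage_error(y_true, y_pred))  # ~12.74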

@jnothman (Member) left a comment

I think you should note the sensitivity to small y_true, and perhaps we need to validate that y_true is non-zero.

Thanks for the thoroughness!
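
To make the sensitivity concrete, here is a small NumPy sketch (not part of the PR) of the same formula, showing how a single near-zero target dominates the average and how an exact zero makes it undefined:

    import numpy as np

    def mape(y_true, y_pred):
        # Same formula as in this PR: mean of |relative error|, times 100.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    # Well-behaved targets: small absolute errors stay small percentages.
    print(mape([100.0, 200.0, 300.0], [101.0, 199.0, 303.0]))  # ~0.83

    # One target near zero dominates the score entirely.
    print(mape([0.001, 200.0, 300.0], [1.0, 199.0, 303.0]))    # ~33300

    # An exact zero in y_true divides by zero, hence the suggestion
    # to validate that y_true is non-zero.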

@@ -85,6 +85,7 @@ Scoring Function
**Regression**
'explained_variance' :func:`metrics.explained_variance_score`
'neg_mean_absolute_error' :func:`metrics.mean_absolute_error`
'neg_mean_absolute_percentage_error' :func:`metrics.mean_absolute_percentage_error`
Member:

@amueller suggested neg_mape here. I'm not entirely sure which is better. But you'll need to widen the table if we go with this name!

Contributor Author:

Yes, I am open to changing the name or keeping it as is, but I'll make the necessary changes once we agree on a name ;)

Contributor Author:

@jnothman @amueller I am going to switch the scorer name from neg_mean_absolute_percentage_error to neg_mape, since MAPE is already a well-known acronym in the community and also because it's the only way to avoid PEP 8 problems.

@@ -104,7 +105,7 @@ Usage examples:
>>> model = svm.SVC()
>>> cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
Member:

You have inspired #10712. Thanks!

if y_type == 'continuous-multioutput':
    raise ValueError("Multioutput not supported "
                     "in mean_absolute_percentage_error")
return np.average(np.abs((y_true - y_pred)/y_true))*100
Member:

space around / and * please

Member:

Call mean rather than average just to match the name
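
Applying both review suggestions above, the return line would presumably read as follows (np.average without weights is equivalent to np.mean, so this is purely a readability change):

    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100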

@sklearn-lgtm

This pull request introduces 2 alerts when merging 7c516ef into 2e30df3 - view on lgtm.com

new alerts:

  • 1 for Explicit export is not defined
  • 1 for Implicit string concatenation in a list

Comment posted by lgtm.com

@mohamed-ali (Contributor Author)

@jnothman, @amueller, should I raise an error when y_true contains zeros, or do you suggest handling it differently? If so, how?

Also, if you have a convention for the division error message, please suggest it so I can use it. Otherwise, I can define a generic message.

@jnothman (Member) commented Feb 27, 2018 via email

@mohamed-ali (Contributor Author) commented Feb 27, 2018

@amueller, @jnothman, @qinhanmin2014

The test scenarios in:

  • sklearn/metrics/tests/test_score_objects.py
  • sklearn/metrics/tests/test_common.py

generate a y_true sample with zeros, which fails the following check in metrics.mean_absolute_percentage_error:

    if (y_true == 0).any():
        raise ValueError("mean_absolute_percentage_error requires"
                         " y_true to never be zero")

This explains the failures on Travis CI. Could you please suggest what to do?

@jnothman (Member)

PEP 8 consistency is a very poor excuse for naming inconsistency. I'm unsure what to do with the name, but PEP 8 is the last of our concerns (stick # noqa at the end of the line and Travis won't fail).

@jnothman (Member)

You might need to add a new category in the common tests for regression metrics that require non-zero y, and update y in the tests accordingly.
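
A rough sketch of what that could look like; the set name and data layout below are hypothetical, not taken from the PR or from the actual test files:

    import numpy as np

    # Hypothetical category for the common tests: metrics that require
    # y_true to contain no zeros.
    METRICS_REQUIRING_NONZERO_Y = {"mean_absolute_percentage_error"}

    # For these metrics, draw regression targets away from zero, e.g.
    # strictly positive integers instead of the usual random values.
    rng = np.random.RandomState(0)
    y_true = rng.randint(1, 10, size=20).astype(float)   # never zero
    y_pred = y_true + rng.uniform(-0.5, 0.5, size=20)    # perturbed predictions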

@mohamed-ali (Contributor Author) commented Feb 27, 2018

@jnothman thanks for the tip about Travis (# noqa); that's helpful.
Regarding the name, I think it's easy to change once we agree on one, so I'll prioritize fixing the tests.

# matrix instead of a number. Testing of
# confusion_matrix with sample_weight is in
# test_classification.py
"confusion_matrix", # Left this one here because the tests in this file do
Member:

Please do not change unrelated things. It makes your contribution harder to review and may introduce merge conflicts to other pull requests.

Contributor Author:

Ok, my bad. I used autopep8 on the file.

Member:

See #10645!

@mohamed-ali (Contributor Author)

@lesteve the Travis build finished successfully, so there is no regression with the new changes that you pushed. Thanks!

@lesteve (Member) left a comment

A few comments.

Just curious, maybe for @amueller who opened the associated issue: the Wikipedia page does not seem very kind to MAPE, e.g. "Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application". Is it still used despite its drawbacks?

@@ -85,6 +85,7 @@ Scoring Function
**Regression**
'explained_variance' :func:`metrics.explained_variance_score`
'neg_mean_absolute_error' :func:`metrics.mean_absolute_error`
'neg_mape' :func:`metrics.mean_absolute_percentage_error`
Member:

Why not spell it out fully here, like all the other metrics? i.e. neg_mean_absolute_percentage_error

Contributor Author:

@lesteve I clarified in the PR description above that the name has to be chosen/voted on by all of us. Initially I used neg_mean_absolute_percentage_error, but since MAPE is already a well-known acronym, which also makes the name cleaner, I chose to switch to neg_mape. However, we can change back to the long version if most of us think that's the right thing to do.

Member:

I would be in favour of the neg_mean_absolute_percentage_error version personally. It is more consistent with neg_mean_absolute_error and more consistent with the metric name (metrics.mean_absolute_percentage_error). Happy to hear what others think.

Member:

I would also be in favor of using the explicit expanded name by default and introducing neg_mape as an alias, as we do for neg_mse.

Member:

Actually we do not have neg_mse. I thought we did.
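
If an alias were added, both scoring strings could presumably point at the same scorer object. A self-contained sketch follows; the inline metric and the plain dict stand in for the PR's function and the module-level SCORERS registry:

    import numpy as np
    from sklearn.metrics import make_scorer

    def mean_absolute_percentage_error(y_true, y_pred):
        # Stand-in for the metric added in this PR.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
                                  greater_is_better=False)

    # The long name and the short alias map to the very same object.
    SCORERS = {
        'neg_mean_absolute_percentage_error': neg_mape_scorer,
        'neg_mape': neg_mape_scorer,
    }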

@@ -91,6 +91,8 @@ Model evaluation
- Added the :func:`metrics.balanced_accuracy_score` metric and a corresponding
``'balanced_accuracy'`` scorer for binary classification.
:issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`.
- Added the :func:`metrics.mean_absolute_percentage_error` metric and the associated
Member:

"Model evaluation" is not the best section for this I would say, in doc/whats_new/v0.19.0.rst there is a "Metrics" section. I think you can do the same here.

Contributor Author:

Nice catch, I will do that.

@@ -487,6 +489,9 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
mean_absolute_error_scorer = make_scorer(mean_absolute_error,
                                         greater_is_better=False)
mean_absolute_error_scorer._deprecation_msg = deprecation_msg
neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
                              greater_is_better=False)

Member:

Maybe remove the new line here. When there is no clear rule, my advice would be to follow the same implicit convention as the code you are changing.
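
For context on how the new scorer from the diff above would be consumed, here is a sketch assuming the PR branch is installed and the 'neg_mape' scoring string is registered (it is not available in released scikit-learn):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X = np.arange(1, 41, dtype=float).reshape(-1, 1)
    y = 3.0 * X.ravel() + 5.0   # strictly positive targets, so MAPE is defined

    # 'neg_mape' is the scoring string proposed in this PR; greater is better,
    # so values closer to 0 indicate a lower percentage error.
    scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mape', cv=5)
    print(scores)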

@mohamed-ali (Contributor Author)

@amueller can you approve if all the changes that you requested have been addressed? Thanks!

@mohamed-ali (Contributor Author) commented Mar 29, 2018

@jnothman, @amueller, could you please cast your vote on whether to name the scorer neg_mean_absolute_percentage_error_scorer or neg_mape_scorer? So far @lesteve is for the long version, whereas I am slightly inclined towards the shorter one. What do you think?

@jnothman (Member) commented Mar 29, 2018 via email

@mohamed-ali (Contributor Author)

@lesteve @amueller, after @jnothman's comment I assume there is no need to change the name, so I guess my work is done on this issue. @amueller @lesteve, could you approve the PR?
Thanks.

@mohamed-ali (Contributor Author) commented May 16, 2019

@jnothman @amueller Is this PR still being considered for merging? If so, let me know so I can fix the conflicts. Thanks.

@jnothman (Member)

I don't see any reason this shouldn't be considered for merge, @mohamed-ali.

I think the docs should emphasise a little more that the metric is not reported as a percentage here.

@amueller (Member)

would you mind fixing the conflicts please?

@mohamed-ali (Contributor Author)

would you mind fixing the conflicts please?

Yes sure, I'll spend some time in the next two days to fix them. Thanks for letting me know.

@amueller (Member)

thanks and sorry about the delay in reviewing

@amueller (Member)

hm looks like there's a bunch of errors now :-/

@mohamed-ali (Contributor Author)

hm looks like there's a bunch of errors now :-/

Yes, many things have changed since this PR was created. I'll spend more time debugging; if that doesn't work, I might recreate the PR with a new branch from master.

@ogrisel (Member) commented Mar 4, 2020

Closing in favor of #15007.
