New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Add new regression metric - Mean Squared Log Error #7655

Merged
merged 4 commits into from Nov 30, 2016

Conversation

Projects
None yet
5 participants
@kdexd
Contributor

kdexd commented Oct 12, 2016

Reference Issue: None

What does this implement/fix? Explain your changes.

  • This PR implements a new metric - "Mean Squared Logarithmic Error" (name truncated to mean_squared_log_error). I have added the method alongwith other regression metrics in sklearn.metrics.regression module.
  • Accompanying the implementation, this PR is complete with User Guide Documentation and API docstring.

Any other comments?

  • The metric is similar to mean_squared_error and MSE method can be used to calculate MSLE by cleverly passing arguments, but it always required external manual work.
  • I felt that it would be a nice to have metric due to its frequent requirement.
  • A lot of regression problems in various competitions, especially Kaggle, evaluate submissions based on this error metric or its square root. A Kaggle wiki page can be found here.
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 13, 2016

Member

Please fix test failures

Member

jnothman commented Oct 13, 2016

Please fix test failures

@kdexd kdexd changed the title from [MRG] Add new regression metric - Mean Squared Log Error to [WIP] Add new regression metric - Mean Squared Log Error Oct 13, 2016

@kdexd kdexd changed the title from [WIP] Add new regression metric - Mean Squared Log Error to [MRG] Add new regression metric - Mean Squared Log Error Oct 13, 2016

y_type, y_true, y_pred, multioutput = _check_reg_targets(
y_true, y_pred, multioutput)
if not (y_true >= 0).all() and not (y_pred >= 0).all():

This comment has been minimized.

@amueller

amueller Oct 14, 2016

Member

It can be used with anything > -1, right?

@amueller

amueller Oct 14, 2016

Member

It can be used with anything > -1, right?

This comment has been minimized.

@kdexd

kdexd Oct 15, 2016

Contributor

@amueller It can be, but (1 + log(x)) will give huge negative values which change erratically on little change of x between (-1, 0). This will not make the score look sensible. Looking mathematically it is possible, but in practical usages this metric is used for non negative targets. Although if you suggest I'd change it.

@kdexd

kdexd Oct 15, 2016

Contributor

@amueller It can be, but (1 + log(x)) will give huge negative values which change erratically on little change of x between (-1, 0). This will not make the score look sensible. Looking mathematically it is possible, but in practical usages this metric is used for non negative targets. Although if you suggest I'd change it.

This comment has been minimized.

@kdexd

kdexd Oct 15, 2016

Contributor

Additionally I just recalled that, I read somewhere - this metric is used for positive values only, still there is log(1 + x) to make everything inside log greater than one, and finally outside the log positive, which would be greater than zero. Making it allowable till -1 will nullify this 😄

@kdexd

kdexd Oct 15, 2016

Contributor

Additionally I just recalled that, I read somewhere - this metric is used for positive values only, still there is log(1 + x) to make everything inside log greater than one, and finally outside the log positive, which would be greater than zero. Making it allowable till -1 will nullify this 😄

This comment has been minimized.

@amueller

amueller Oct 17, 2016

Member

alright.

@amueller

amueller Oct 17, 2016

Member

alright.

This comment has been minimized.

@jnothman

jnothman Nov 6, 2016

Member

Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.

@jnothman

jnothman Nov 6, 2016

Member

Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.

@RPGOne

RPGOne approved these changes Oct 15, 2016

@RPGOne

RPGOne approved these changes Oct 17, 2016

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Oct 18, 2016

Contributor

Hi @amueller and @jnothman, what more shall I do in this PR ? Also, is @RPGOne is a bot or a service ?

Contributor

kdexd commented Oct 18, 2016

Hi @amueller and @jnothman, what more shall I do in this PR ? Also, is @RPGOne is a bot or a service ?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 18, 2016

Member

RPGOne is spam, as far as I know

On 18 October 2016 at 20:49, Karan Desai notifications@github.com wrote:

Hi @amueller https://github.com/amueller and @jnothman
https://github.com/jnothman, what more shall I do in this PR ? Also, is
@RPGOne https://github.com/RPGOne is a bot or a service ?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7655 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz64Yn9iA-2Ob_9lyUJmobNRC007N4ks5q1JYNgaJpZM4KVLS7
.

Member

jnothman commented Oct 18, 2016

RPGOne is spam, as far as I know

On 18 October 2016 at 20:49, Karan Desai notifications@github.com wrote:

Hi @amueller https://github.com/amueller and @jnothman
https://github.com/jnothman, what more shall I do in this PR ? Also, is
@RPGOne https://github.com/RPGOne is a bot or a service ?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7655 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz64Yn9iA-2Ob_9lyUJmobNRC007N4ks5q1JYNgaJpZM4KVLS7
.

@raghavrv

The code is cleanly written. Thanks!

Show outdated Hide outdated sklearn/metrics/regression.py
Show outdated Hide outdated sklearn/metrics/regression.py
Show outdated Hide outdated sklearn/metrics/regression.py

@raghavrv raghavrv removed the Needs Review label Oct 30, 2016

@raghavrv raghavrv added this to the 0.19 milestone Oct 30, 2016

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Oct 31, 2016

Contributor

Documentation of mean_squared_error in current master renders like this:

image

There are some inconsistencies, my build after the changes you suggested looks like this ( mean_squared_log_error ):

image

To keep the diffs in this PR specific to only one metric, I am leaving other docstrings untouched for a while, I'll be taking them up in a separate documentation cleanup issue. I have rephrased the line you reviewed and reused mean_squared_error as well. Thanks !

Contributor

kdexd commented Oct 31, 2016

Documentation of mean_squared_error in current master renders like this:

image

There are some inconsistencies, my build after the changes you suggested looks like this ( mean_squared_log_error ):

image

To keep the diffs in this PR specific to only one metric, I am leaving other docstrings untouched for a while, I'll be taking them up in a separate documentation cleanup issue. I have rephrased the line you reviewed and reused mean_squared_error as well. Thanks !

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Nov 1, 2016

Member

Thanks for the screenshot of the doc!

I'll be taking them up in a separate documentation cleanup issue.

Much appreciated.

Member

raghavrv commented Nov 1, 2016

Thanks for the screenshot of the doc!

I'll be taking them up in a separate documentation cleanup issue.

Much appreciated.

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Nov 1, 2016

Member

I think it should also be added to the scorer so users can readily refer to it by neg_mean_squared_log_error...

Member

raghavrv commented Nov 1, 2016

I think it should also be added to the scorer so users can readily refer to it by neg_mean_squared_log_error...

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 2, 2016

Contributor

@raghavrv It looks like there is a renaming scheduled for regression metrics similar to this one. For the sake of uniformity, I have added a deprecation message to mean_squared_log_error_scorer just like mean_squared_error_scorer and others. Let me know if I should not include that, and I will amend the commit accordingly, thanks !

Contributor

kdexd commented Nov 2, 2016

@raghavrv It looks like there is a renaming scheduled for regression metrics similar to this one. For the sake of uniformity, I have added a deprecation message to mean_squared_log_error_scorer just like mean_squared_error_scorer and others. Let me know if I should not include that, and I will amend the commit accordingly, thanks !

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 2, 2016

Member

No, don't add a deprecated version. That's only there for people using
features in older versions.

On 2 November 2016 at 13:36, Karan Desai notifications@github.com wrote:

@raghavrv https://github.com/raghavrv It looks like there is a renaming
scheduled for regression metrics similar to this one. For the sake of
uniformity, I have added a deprecation message to
mean_squared_log_error_scorer just like mean_squared_error_scorer and
others. Let me know if I should not include that, and I will amend the
commit accordingly, thanks !


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7655 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz6-edpn5mxkHZ8249LbmS10gyGymAks5q5_cxgaJpZM4KVLS7
.

Member

jnothman commented Nov 2, 2016

No, don't add a deprecated version. That's only there for people using
features in older versions.

On 2 November 2016 at 13:36, Karan Desai notifications@github.com wrote:

@raghavrv https://github.com/raghavrv It looks like there is a renaming
scheduled for regression metrics similar to this one. For the sake of
uniformity, I have added a deprecation message to
mean_squared_log_error_scorer just like mean_squared_error_scorer and
others. Let me know if I should not include that, and I will amend the
commit accordingly, thanks !


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7655 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz6-edpn5mxkHZ8249LbmS10gyGymAks5q5_cxgaJpZM4KVLS7
.

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 2, 2016

Contributor

Hello @jnothman, @raghavrv:
I have made the needful additions / modifications in the PR. I'm up for anything else which is suitable to go in here, please have a look 😄

Contributor

kdexd commented Nov 2, 2016

Hello @jnothman, @raghavrv:
I have made the needful additions / modifications in the PR. I'm up for anything else which is suitable to go in here, please have a look 😄

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 6, 2016

Member

Kaggle calls this "[root] mean squared logarithmic error", not "[root] mean squared log error" which sounds like it's a function of the log of the error. I think this is an important distinction. I'm not sure if you need to rename the function and scorer to reflect this, but at least the documentation needs to be absolutely clear.

Member

jnothman commented Nov 6, 2016

Kaggle calls this "[root] mean squared logarithmic error", not "[root] mean squared log error" which sounds like it's a function of the log of the error. I think this is an important distinction. I'm not sure if you need to rename the function and scorer to reflect this, but at least the documentation needs to be absolutely clear.

Show outdated Hide outdated doc/modules/model_evaluation.rst
Show outdated Hide outdated doc/modules/model_evaluation.rst
Show outdated Hide outdated doc/modules/model_evaluation.rst
y_type, y_true, y_pred, multioutput = _check_reg_targets(
y_true, y_pred, multioutput)
if not (y_true >= 0).all() and not (y_pred >= 0).all():

This comment has been minimized.

@jnothman

jnothman Nov 6, 2016

Member

Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.

@jnothman

jnothman Nov 6, 2016

Member

Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 6, 2016

Contributor

@jnothman Logarithmic made the name too long, but if needed, I'll change the names. But yes atleast I should be clear about it in the docstrings and User Guide. I'll push the required changes soon. Also, MSE and MAE have their square roots used quite frequently, but they are not included in scorer so I dropped [root]. Is it a good choice to provide RMSLE in scorer or it is fine this way ?

Contributor

kdexd commented Nov 6, 2016

@jnothman Logarithmic made the name too long, but if needed, I'll change the names. But yes atleast I should be clear about it in the docstrings and User Guide. I'll push the required changes soon. Also, MSE and MAE have their square roots used quite frequently, but they are not included in scorer so I dropped [root]. Is it a good choice to provide RMSLE in scorer or it is fine this way ?

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 28, 2016

Contributor

Hi @raghavrv, @jnothman:

I have addressed all of your review comments and cleaned up my commit history to reduce down the whole work into isolated sequential commits containing the implementation, tests and documentation one in each ! Please let me know if there's anything else I should do..

Contributor

kdexd commented Nov 28, 2016

Hi @raghavrv, @jnothman:

I have addressed all of your review comments and cleaned up my commit history to reduce down the whole work into isolated sequential commits containing the implementation, tests and documentation one in each ! Please let me know if there's anything else I should do..

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 29, 2016

Member

FWIW, cleaning up commit history is superfluous.

Member

jnothman commented Nov 29, 2016

FWIW, cleaning up commit history is superfluous.

@jnothman

Otherwise LGTM

Show outdated Hide outdated sklearn/metrics/regression.py
Show outdated Hide outdated sklearn/metrics/tests/test_regression.py

kdexd added some commits Nov 9, 2016

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 29, 2016

Member
Member

jnothman commented Nov 29, 2016

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 29, 2016

Contributor

Oh yes, if mean_squared_error would have broken that its test itself would fail !

Contributor

kdexd commented Nov 29, 2016

Oh yes, if mean_squared_error would have broken that its test itself would fail !

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 29, 2016

Member

LGTM

Member

jnothman commented Nov 29, 2016

LGTM

@jnothman jnothman changed the title from [MRG] Add new regression metric - Mean Squared Log Error to [MRG+1] Add new regression metric - Mean Squared Log Error Nov 29, 2016

@amueller

LGTM apart from nitpick. Would you mind fixing that?

Parameters
----------
y_true : array-like of shape = (n_samples) or (n_samples, n_outputs)

This comment has been minimized.

@amueller

amueller Nov 29, 2016

Member

nitpick: you should write (n_samples,) because it's a tuple. (also everywhere below where there's a tuple with one element)

@amueller

amueller Nov 29, 2016

Member

nitpick: you should write (n_samples,) because it's a tuple. (also everywhere below where there's a tuple with one element)

This comment has been minimized.

@kdexd

kdexd Nov 30, 2016

Contributor

@amueller i read your comment after the PR was merged, although I think I will handle this is in a larger routine of documentation consistency, @raghavrv already directed me to a matching issue for the same. I will work on one more PR which is already open before starting that.

@kdexd

kdexd Nov 30, 2016

Contributor

@amueller i read your comment after the PR was merged, although I think I will handle this is in a larger routine of documentation consistency, @raghavrv already directed me to a matching issue for the same. I will work on one more PR which is already open before starting that.

@jnothman jnothman merged commit cb6a366 into scikit-learn:master Nov 30, 2016

2 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 30, 2016

Member

Thanks, @karandesai-96

Member

jnothman commented Nov 30, 2016

Thanks, @karandesai-96

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 30, 2016

Member
Member

jnothman commented Nov 30, 2016

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Nov 30, 2016

Contributor

Feels good to contribute to the community, thanks @jnothman @amueller @raghavrv for the review ! 😄

Contributor

kdexd commented Nov 30, 2016

Feels good to contribute to the community, thanks @jnothman @amueller @raghavrv for the review ! 😄

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Nov 30, 2016

Member

@jnothman no worries, it was a nitpick of the highest order ;)

Member

amueller commented Nov 30, 2016

@jnothman no worries, it was a nitpick of the highest order ;)

@kdexd kdexd deleted the kdexd:msle-metric branch Dec 2, 2016

adis300 added a commit to adis300/scikit-learn that referenced this pull request Dec 13, 2016

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer
@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Dec 22, 2016

Contributor

Hi, I was wondering whether this should go in CHANGELOG for next release.

Contributor

kdexd commented Dec 22, 2016

Hi, I was wondering whether this should go in CHANGELOG for next release.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Dec 22, 2016

Member

With apologies, we forgot to ask you to add a changelog entry here. Please submit a new PR with it. THanks.

Member

jnothman commented Dec 22, 2016

With apologies, we forgot to ask you to add a changelog entry here. Please submit a new PR with it. THanks.

@kdexd

This comment has been minimized.

Show comment
Hide comment
@kdexd

kdexd Dec 22, 2016

Contributor

@jnothman: Sure, I'll do that in a moment, thanks for the headsup.

Contributor

kdexd commented Dec 22, 2016

@jnothman: Sure, I'll do that in a moment, thanks for the headsup.

kdexd added a commit to kdexd/scikit-learn that referenced this pull request Dec 22, 2016

kdexd added a commit to kdexd/scikit-learn that referenced this pull request Dec 22, 2016

jnothman added a commit that referenced this pull request Dec 23, 2016

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

@Przemo10 Przemo10 referenced this pull request Mar 17, 2017

Closed

update fork (#1) #8606

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] Add new regression metric - Mean Squared Log Error (#7655)
* ENH Implement mean squared log error in sklearn.metrics.regression

* TST Add tests for mean squared log error.

* DOC Write user guide and docstring about mean squared log error.

* ENH Add neg_mean_squared_log_error in metrics.scorer

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment