[MRG+1] Add new regression metric - Mean Squared Log Error #7655

Merged
merged 4 commits into from Nov 30, 2016
+26 −0

ENH Implement mean squared log error in sklearn.metrics.regression

kdexd committed Nov 9, 2016
commit e00d9b3cddda9ab31d9977fd9ae9fb0e59a7efed
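For reference, this is the quantity that the new `mean_squared_log_error` below computes for each output column. It is read off the `np.average(..., axis=0, weights=sample_weight)` call in the diff rather than taken from a docstring, with $w_i = 1$ for every sample when `sample_weight` is `None`:

```math
\mathrm{MSLE}(y, \hat{y}) = \frac{\sum_{i=1}^{n} w_i \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2}{\sum_{i=1}^{n} w_i}
```

The per-output values are then returned unchanged (`'raw_values'`), averaged uniformly (`'uniform_average'`), or averaged using the weights passed as `multioutput`.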
@@ -54,6 +54,7 @@
from .regression import explained_variance_score
from .regression import mean_absolute_error
from .regression import mean_squared_error
from .regression import mean_squared_log_error
from .regression import median_absolute_error
from .regression import r2_score
@@ -90,6 +91,7 @@
    'matthews_corrcoef',
    'mean_absolute_error',
    'mean_squared_error',
    'mean_squared_log_error',
    'median_absolute_error',
    'mutual_info_score',
    'normalized_mutual_info_score',
@@ -14,6 +14,7 @@
# Jochen Wersdorfer <jochen@wersdoerfer.de>
# Lars Buitinck
# Joel Nothman <joel.nothman@gmail.com>
# Karan Desai <karandesai281196@gmail.com>
# Noel Dawe <noel@dawe.me>
# Manoj Kumar <manojkumarsivaraj334@gmail.com>
# Michael Eickenberg <michael.eickenberg@gmail.com>
@@ -33,6 +34,7 @@
__ALL__ = [
    "mean_absolute_error",
    "mean_squared_error",
    "mean_squared_log_error",
    "median_absolute_error",
    "r2_score",
    "explained_variance_score"
@@ -241,6 +243,28 @@ def mean_squared_error(y_true, y_pred,
    return np.average(output_errors, weights=multioutput)


def mean_squared_log_error(y_true, y_pred,
                           sample_weight=None,
                           multioutput='uniform_average'):
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
        y_true, y_pred, multioutput)
    if not (y_true >= 0).all() and not (y_pred >= 0).all():

amueller Oct 14, 2016

Member

It can be used with anything > -1, right?

kdexd Oct 15, 2016

Contributor

@amueller It can be, but log(1 + x) takes huge negative values that change sharply with small changes of x in (-1, 0), which would not make the score look sensible. Mathematically it is possible, but in practice this metric is used for non-negative targets. I'd change it if you suggest so, though.
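A small numeric illustration of this point (a sketch, not part of the PR): `squared_log_error` is a hypothetical helper that computes one per-sample term of the metric with the standard library's `math.log1p`. Two predictions that are each off by the same absolute amount, 0.009, are penalized very differently when the targets sit near -1 than when they sit near 0.

```python
import math


def squared_log_error(y_true, y_pred):
    """Hypothetical helper: one per-sample MSLE term, (log(1 + t) - log(1 + p)) ** 2."""
    return (math.log1p(y_true) - math.log1p(y_pred)) ** 2


# Same absolute error (0.009) in both cases.
near_minus_one = squared_log_error(-0.99, -0.999)  # log(0.01) - log(0.001) = log(10)
near_zero = squared_log_error(0.0, 0.009)

print(near_minus_one)  # roughly 5.3, i.e. log(10) ** 2
print(near_zero)       # roughly 8e-05
```

The closer the targets get to -1, the larger this gap becomes, because log(1 + x) diverges to minus infinity there.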

kdexd Oct 15, 2016

Contributor

Additionally, I just recalled reading somewhere that this metric is used for positive values only; the log(1 + x) form then keeps the argument of the log greater than one, and hence the log itself positive. Allowing values down to -1 would nullify this 😄
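The sign argument can be checked directly. This is a minimal standard-library sketch with arbitrary sample values: for any x >= 0, 1 + x >= 1, so log(1 + x) >= 0, while any x in (-1, 0) makes the term negative.

```python
import math

# For x >= 0 the argument 1 + x is at least 1, so log(1 + x) is never negative.
non_negative_targets = [0.0, 0.5, 3.0, 1e6]
log_terms = [math.log1p(x) for x in non_negative_targets]
print(all(term >= 0 for term in log_terms))  # True

# Inside (-1, 0) the argument drops below 1 and the log turns negative.
print(math.log1p(-0.5) < 0)  # True
```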

amueller Oct 17, 2016

Member

alright.

jnothman Nov 6, 2016

Member

Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.
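One reading of the "exponential trend" remark, offered as an interpretation rather than something stated in the thread: because the metric compares log(1 + y), it behaves roughly like a relative error once targets are large. In this sketch, `squared_log_error` is a hypothetical helper, repeated here so that the snippet runs on its own. A 10% over-prediction costs about the same at y = 100 as at y = 100000, even though the plain squared errors differ by six orders of magnitude.

```python
import math


def squared_log_error(y_true, y_pred):
    """Hypothetical helper: one per-sample MSLE term."""
    return (math.log1p(y_true) - math.log1p(y_pred)) ** 2


# Predict 10% too high at two very different scales.
small = squared_log_error(100.0, 110.0)
large = squared_log_error(100000.0, 110000.0)

print(round(small, 4), round(large, 4))  # both close to log(1.1) ** 2, about 0.009
print((110.0 - 100.0) ** 2, (110000.0 - 100000.0) ** 2)  # 100.0 100000000.0
```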

        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                         "targets contain negative values.")
    output_errors = np.average((np.log(y_true + 1) - np.log(y_pred + 1)) ** 2,
                               axis=0, weights=sample_weight)
    if isinstance(multioutput, string_types):
        if multioutput == 'raw_values':
            return output_errors
        elif multioutput == 'uniform_average':
            # pass None as weights to np.average: uniform mean
            multioutput = None

    return np.average(output_errors, weights=multioutput)


def median_absolute_error(y_true, y_pred):
    """Median absolute error regression loss
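To experiment with the metric outside scikit-learn, here is a minimal NumPy sketch that mirrors the function added in this diff. It is a hypothetical stand-alone version: `msle_sketch` is not a scikit-learn name. It makes several assumptions. It skips `_check_reg_targets` and only reshapes 1-D inputs to a single column and checks shapes. It uses `np.log1p`, which is mathematically the same as `np.log(x + 1)` but more accurate for small x. It also uses `str` in place of `string_types`. Unlike the diff, it rejects negative values in either array: the diff's condition, `not (y_true >= 0).all() and not (y_pred >= 0).all()`, only raises when both `y_true` and `y_pred` contain a negative value.

```python
import numpy as np


def msle_sketch(y_true, y_pred, sample_weight=None,
                multioutput="uniform_average"):
    """Hypothetical stand-alone mean squared log error (not scikit-learn's API)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Treat 1-D targets as a single output column, as _check_reg_targets does.
    if y_true.ndim == 1:
        y_true = y_true.reshape(-1, 1)
    if y_pred.ndim == 1:
        y_pred = y_pred.reshape(-1, 1)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred have different shapes.")
    # Reject negatives in *either* array.
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                         "targets contain negative values.")
    # Per-output (column-wise) weighted mean of squared differences of log(1 + y).
    output_errors = np.average((np.log1p(y_true) - np.log1p(y_pred)) ** 2,
                               axis=0, weights=sample_weight)
    if isinstance(multioutput, str):
        if multioutput == "raw_values":
            return output_errors
        if multioutput != "uniform_average":
            raise ValueError("Unknown multioutput: %r" % multioutput)
        multioutput = None  # weights=None makes np.average a plain mean
    return np.average(output_errors, weights=multioutput)


print(msle_sketch([3, 5, 2.5, 7], [2.5, 5, 4, 8]))  # about 0.0397
print(msle_sketch([[0.5, 1], [1, 2], [7, 6]],
                  [[0.5, 2], [1, 2.5], [8, 8]],
                  multioutput="raw_values"))  # about [0.0046, 0.0838]
```

Passing `sample_weight` weights each row's squared log difference before averaging, just as the diff's `np.average(..., axis=0, weights=sample_weight)` does.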