
Fix huber loss #178

Merged: 6 commits merged into microsoft:master from henry0312:fix_huber_loss on Jan 9, 2017
Conversation

@henry0312 (Contributor) commented Jan 9, 2017

TODO:

@msftclas commented Jan 9, 2017

Hi @henry0312, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

    return new RegressionL2loss(config);
  } else if (type == std::string("regression_l2") || type == std::string("mean_squared_error") || type == std::string("mse")) {
    return new RegressionL2loss(config);
Collaborator:

merge these two ifs?

Contributor Author:

I'm sorry, I don't understand what you mean 😢
This PR will be merged after #175

Collaborator:

change

  if ()
    ...
  else if ()
    ...

to

  if ()
    ...

They are all regression objectives, right?

Contributor Author:

yes, they are.

you mean,

  if (type == std::string("regression") || type == std::string("regression_l2") || type == std::string("mean_squared_error") || type == std::string("mse")) {
    return new RegressionL2loss(config);
  }

right?

Collaborator:

yes. And you can break the if() across two lines:

  if (type == std::string("regression") || type == std::string("regression_l2") 
      || type == std::string("mean_squared_error") || type == std::string("mse")) {
    return new RegressionL2loss(config);
  }

const double a = 2.0 * weights_[i]; // difference of the two first derivatives, (zero to inf) and (zero to -inf)
const double b = 0.0;
const double c = (std::fabs(score[i]) + std::fabs(label_[i])) / 1.0e3;
// x is the residual, score[i] - label_[i]
hessians[i] = std::exp(-(x - b) * (x - b) / (2.0 * c * c)) / a;
Collaborator:

I think you can use an inline function to calculate the hessian and avoid this repeated code.

const double a = 2.0; // difference of the two first derivatives, (zero to inf) and (zero to -inf)
const double b = 0.0;
const double c = (std::fabs(score[i]) + std::fabs(label_[i])) / 1.0e3;
// x is the residual, score[i] - label_[i]
hessians[i] = std::exp(-(x - b) * (x - b) / (2.0 * c * c)) / a;
Collaborator:

Same as above, and you can put this function into utils/common.h.
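
A minimal sketch of what that helper could look like (illustrative only, reusing the a, b, c names from the snippets above; not necessarily the code that was merged):

  #include <cmath>

  // e.g. in utils/common.h: Gaussian kernel shared by the weighted and
  // unweighted paths, with b the center, c the width, and a the scale.
  inline static double GaussianKernel(double x, double b, double c, double a) {
    return std::exp(-(x - b) * (x - b) / (2.0 * c * c)) / a;
  }

Both call sites would then reduce to hessians[i] = GaussianKernel(x, b, c, a);.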

Contributor Author:

ok!

@henry0312 (Contributor Author) commented:

rebased master.
However, there may be something wrong with ApproximateHessianWithGaussian.
I'll check.

@Laurae2 (Contributor) commented Jan 9, 2017

@henry0312 This topic has a bunch of gradient/hessian functions that help convergence for an L1 loss when a hessian is mandatory for convergence.

They were all tested in xgboost and all converge fast. Without a hessian, convergence is slow (which seems to be the case in LightGBM too, as @guolinke mentioned in the previous PR for L1 loss). It would be good to have an example to check the convergence speed of the L1 loss.

The short "explanation" is that when a hessian is mandatory for good convergence and we want an L1 loss function, we should avoid a constant hessian at all costs. The fix is to smooth the linear function with a curve so the hessian never zeroes out (better and faster convergence). For instance, the Fair objective (from Microsoft research) is used to approximate such an L1 function when a hessian is required.

The R versions with their gradient/hessian formulas are below:

# log-cosh loss: grad = tanh(x), hess = sech^2(x) = 1 - tanh^2(x)
ln_cosh_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- tanh(preds - labels)
  hess <- 1 - grad * grad
  return(list(grad = grad, hess = hess))
}

# the same loss written with exponentials: grad = tanh(x), hess = sech^2(x)
ln_expexp_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  x <- preds - labels
  grad <- (exp(2 * x) - 1) / (exp(2 * x) + 1)
  hess <- (4 * exp(2 * x)) / (exp(2 * x) + 1)^2
  return(list(grad = grad, hess = hess))
}

# Cauchy loss (note: hess goes negative when |x| > c)
cauchyobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  c <- 2  # the lower, the "slower/smoother" the loss
  x <- preds - labels
  grad <- x / (x^2 / c^2 + 1)
  hess <- -c^2 * (x^2 - c^2) / (x^2 + c^2)^2
  return(list(grad = grad, hess = hess))
}

# Fair loss
fairobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  c <- 2  # the lower, the "slower/smoother" the loss
  x <- preds - labels
  grad <- c * x / (abs(x) + c)
  hess <- c^2 / (abs(x) + c)^2
  return(list(grad = grad, hess = hess))
}

@henry0312 (Contributor Author) commented:

@Laurae2 The hessian of MAE is a delta function; therefore, if we can approximate it with something appropriate (e.g. a Gaussian function), convergence and performance will be good, I guess.
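
In symbols, the identity behind this (a standard fact, independent of this PR; c is the width of the approximating Gaussian):

  \frac{d^2}{dx^2}\,|x| = 2\,\delta(x) \approx \frac{2}{c\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2c^2}\right)

The smaller c is, the closer the Gaussian is to the delta function, but the sooner the hessian underflows far from the minimum.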

@henry0312 changed the title from "Fix huber loss" to "[WIP] Fix huber loss" on Jan 9, 2017
  * w means weights.
  */
 -inline static double ApproximateHessianWithGaussian(double y, double t, double w=1.0f) {
 +inline static double ApproximateHessianWithGaussian(const double y, const double t, const double g, const double w=1.0f) {
    const double diff = y - t;
    const double pi = M_PI;
Collaborator:
Where is M_PI defined? I cannot compile this on Windows.

Contributor Author:

Could you add #include <cmath>?
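
For reference, <cmath> alone may not be enough on Windows: M_PI is a POSIX extension, not standard C++, and MSVC only exposes it behind a macro. A sketch of the usual fix (not necessarily what the commit below does):

  // MSVC defines M_PI only if _USE_MATH_DEFINES appears before the
  // first include of <cmath> (or <math.h>).
  #define _USE_MATH_DEFINES
  #include <cmath>

  // A portable alternative is to define the constant directly:
  // const double kPi = 3.14159265358979323846;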

Contributor Author:

probably fixed with 05e97f1

@henry0312 (Contributor Author) commented:

I finished fixing ApproximateHessianWithGaussian (and the Huber loss).
The L1 loss is approximated with a Gaussian function, as in http://www.wolframalpha.com/input/?i=Integrate%5BErf%5Bx%5D%5D,+Integrate%5BErf%5Bx%2F10%5D%5D.
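
A sketch of that idea (illustrative names and a hypothetical smoothing width c, not the merged implementation): the integral of erf is a smooth |x|, so its first derivative is a smooth sign and its second derivative is a Gaussian that never vanishes.

  #include <cmath>

  // Smoothed L1 built from Integrate[Erf[x / (c * sqrt(2))]]:
  // gradient = erf(diff / (c * sqrt(2)))                 (smooth sign)
  // hessian  = sqrt(2 / pi) / c * exp(-diff^2 / (2 c^2)) (Gaussian, never zero)
  inline double SmoothL1Gradient(double diff, double c) {
    return std::erf(diff / (c * std::sqrt(2.0)));
  }

  inline double SmoothL1Hessian(double diff, double c) {
    const double pi = 3.14159265358979323846;  // avoids the M_PI portability issue above
    return std::sqrt(2.0 / pi) / c * std::exp(-diff * diff / (2.0 * c * c));
  }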

@henry0312 changed the title from "[WIP] Fix huber loss" to "Fix huber loss" on Jan 9, 2017
@henry0312 (Contributor Author) commented Jan 9, 2017

Next, I will add the Fair loss in another PR, which @Laurae2 told me about.

@henry0312 mentioned this pull request on Jan 9, 2017
@guolinke merged commit 27d3eb3 into microsoft:master on Jan 9, 2017
@henry0312 deleted the fix_huber_loss branch on January 9, 2017 16:15
@Laurae2 mentioned this pull request on Apr 6, 2017
@lock locked this conversation as resolved and limited it to collaborators on Mar 17, 2020