Fix Weibull loglikelihood issues #59

Open · wants to merge 1 commit into master
Conversation

michael-tsel

This is a fix for both the continuous and the discrete Weibull log-likelihood issues #58 and #56.

@ragulpr ragulpr (Owner) left a comment

Ping #56 #58

Comment on lines 168 to 170

loglikelihoods = u * \
    K.log(K.exp(hazard1 - hazard0) - (1.0 - epsilon)) - hazard1
    K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
return loglikelihoods
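
For context (not part of the PR): the hunk above collapses two alternatives onto consecutive lines; the first K.log(...) line is the expression currently in master and the second is the replacement this PR proposes, as the discussion below makes clear. A minimal self-contained sketch of both forms, using plain numpy instead of the Keras backend K, and assuming hazard0 and hazard1 are the cumulative hazard Λ of a Weibull with scale a and shape b evaluated at y and y+1 (an assumption about the surrounding code, not taken from this diff):

    import numpy as np

    def loglik_discrete_current(y, u, a, b, epsilon=1e-10):
        # Current form: u * log(exp(Λ(y+1) - Λ(y)) - (1 - eps)) - Λ(y+1)
        hazard0 = ((y + epsilon) / a) ** b   # Λ(y),   assumed definition
        hazard1 = ((y + 1.0) / a) ** b       # Λ(y+1), assumed definition
        return u * np.log(np.exp(hazard1 - hazard0) - (1.0 - epsilon)) - hazard1

    def loglik_discrete_proposed(y, u, a, b, epsilon=1e-10):
        # Form proposed in this PR: u * log((1 + eps) - exp(Λ(y) - Λ(y+1))) - Λ(y)
        hazard0 = ((y + epsilon) / a) ** b
        hazard1 = ((y + 1.0) / a) ** b
        return u * np.log((1.0 + epsilon) - np.exp(hazard0 - hazard1)) - hazard0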
ragulpr (Owner)

In the discrete case it’s equivalent. It’s just a matter of where we put epsilon:

 K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
=K.log((1.0 + epsilon) - K.exp(hazard0)/K.exp(hazard1)) - hazard0
=K.log(K.exp(hazard0)[(1.0 + epsilon)/K.exp(hazard0) - 1/K.exp(hazard1)]) - hazard0
=K.log([(1.0 + epsilon)*K.exp(-hazard0) - K.exp(-hazard1)])
=K.log(K.exp(-hazard1)[(1.0 + epsilon)*K.exp(hazard1-hazard0) - 1])
=K.log([(1.0 + epsilon)*K.exp(hazard1-hazard0) - 1])-hazard1
~K.log([(1.0)*K.exp(hazard1-hazard0) - 1])-hazard1
=K.log([K.exp(hazard1-hazard0) - 1])-hazard1

Which form is right is just a matter of style. My reason is that I like to emphasize that for positive distributions the discretized log-likelihood can always be written in the form u*log[exp(Λ(y+1)-Λ(y))-1]-Λ(y+1), a form I chose because empirically it seemed the most numerically stable/efficient and automatic differentiation seemed to unfold it efficiently. I may be wrong about this however, so please provide me with counterexamples if you have them.
See proposition 2.26.
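
A quick numerical sanity check of the rewrites above (not from the PR; the hazard0 and hazard1 values are made up), again with plain numpy in place of K:

    import numpy as np

    hazard0, hazard1, epsilon = 0.7, 1.3, 1e-10  # arbitrary example values

    start  = np.log((1.0 + epsilon) - np.exp(hazard0 - hazard1)) - hazard0
    step   = np.log((1.0 + epsilon) * np.exp(-hazard0) - np.exp(-hazard1))
    final  = np.log((1.0 + epsilon) * np.exp(hazard1 - hazard0) - 1.0) - hazard1
    approx = np.log(np.exp(hazard1 - hazard0) - 1.0) - hazard1  # epsilon dropped

    # start, step and final are algebraically identical; approx differs from
    # them only by a term of order epsilon.
    print(start, step, final, approx)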

michael-tsel (Author)

Yeah, the only true difference was in the case of u=0.

loglikelihoods = u * (K.log(b) + b * K.log(ya)) - K.pow(ya, b)
loglikelihoods = u * (K.log(b) - K.log(a) + (b-1) * K.log(ya)) - K.pow(ya, b)
@ragulpr ragulpr (Owner) commented on Oct 13, 2019

Considering the only terms that differ:

b * K.log(ya)
vs
- K.log(a) + (b-1) * K.log(ya)

Where the latter can be written

- K.log(a) + b*K.log(ya) - 1*K.log(ya)
= [b*K.log(ya)] - K.log(a) - 1*K.log(ya)
= [b*K.log(ya)] - K.log(a) - K.log(y+eps) + K.log(a)
= [b*K.log(ya)] - K.log(y+eps)

With regards to the parameters, this is proportional to

∝b * K.log(ya)

Since K.log(y+eps) has zero gradient with respect to the parameters, it's unnecessary to compute it. The standard in almost every statistical/ML package I've seen is, for computational reasons, to implement loss functions using only the terms that are proportional to, rather than equal to, the log-likelihood.

The upside is a very marginal computational benefit. The downside is that it can be confusing if one expects exp(-loss) = "probability of seeing this data with these parameters". Log-likelihoods are hence rarely interpretable or directly comparable across distributions and implementations except up to proportionality.

I'm leaning towards the upside not being worth the downside and would be open to a loss function that is equal rather than proportional. I'm a bit worried about touching these equations too much; they are battle-tested in numerous fights against NaN and have proven themselves very stable and performant. A form like

     loglikelihoods = u * (K.log(b) + b * K.log(ya) - K.log(y + eps)) - K.pow(ya, b)

seems like a pretty safe and cheap alternative, however.

What do you think?
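
For illustration only (not part of the PR; example values assumed), a plain-numpy sketch showing that the "proportional" and "equal" continuous forms differ exactly by u * K.log(y + eps), a term with zero gradient with respect to a and b:

    import numpy as np

    y, u, eps = 3.0, 1.0, 1e-10   # arbitrary example values
    a, b = 2.0, 1.5               # Weibull scale and shape
    ya = (y + eps) / a

    proportional = u * (np.log(b) + b * np.log(ya)) - ya ** b
    equal = u * (np.log(b) - np.log(a) + (b - 1) * np.log(ya)) - ya ** b

    print(equal - proportional)   # == -u * log(y + eps): constant in a and b
    print(-u * np.log(y + eps))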

michael-tsel (Author)

Now that I come to think of it again, you're right! There is no need to add a redundant constant term to the log-likelihood in the continuous case.
