Join GitHub today
Lognormal could be more general #864
In http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.lognorm.html the LogNormal takes 3 parameters, loc, scale and s.
In pymc3 it only takes mu and tau, which are transformations of loc and s.
Would you like a pull request introducing the other parameter? What should it be called? scale or alpha?
I'll admit I'm not a huge fan of Scipy's parameterization conventions. Their Gamma distribution also has 3 parameters, and it makes using the distribution more confusing more often than it aids modeling. Is there a specific use case for a 3-parameter lognormal? Otherwise I would err on the side of simplicity. Others are welcome to chime in.
The scipy lognormal is a little strange. The
so it matches the typical definition.
I would leave it as is, just on principle of least surprise.
See here for more info.
@twiecki Actually both SciPy's loc and scale are missing for PyMC3's Lognormal. The allow you to fit to non-normalized data. But I can just shift and scale my data and then use PyMC3's Lognormal.
Just as context, here are plots of my data and fitted lognormal curves.
@twiecki: As far as i understand the parameters, tau in PyMC3's Lognormal is 1/(s**2) for s from SciPy's lognorm. The parameter mu in PyMC3 seems to be missing in SciPy. I could't derive it from loc and scale in SciPy.
Loc and scale from SciPy on the other hand seem to be missing in PyMC3. But they are not necessarily needed, because i could transform my data.
The parameterizations for PyMC 3 should be consistent with those from PyMC 2, which in turn were generally parameterized to be consistent with WinBUGS, which was (and probably still is) the de facto standard for Bayesian modeling, since thats where many of our users likely came from. Hence, we still use precision instead of variance for normal distributions, and our gamma distribution is parameterized differently from that of SciPy. The Wikipedia entry for lognormal also specifies
I would be open to having alternative parameterizations via keyword parameters, like we do for
@dan-tee, PyMC3 does have a mu parameter, and it is in the usual parameterization.
What scipy's lognormal allows you to do is fit a shifted lognormal distribution. That is one that has a has support from (c,+infinity). For scipy loc=c, scale=exp(mu), shape=sd=1/sqrt(tau).
Here is a notebook demonstrating the differences. It might be possible to write a shifted lognormal distribution for PyMC3, I took a quick crack at it, at the end, it seems to sample ok, but I got some weird theano warnings when I sampled.
Also why does
@bjedwards Interesting, thanks for putting that together. The shift parameter does make sense and I wonder if we shouldn't just include it but default to it being
The warning is odd but reminiscent of the recent theano ufunc problem, but I suppose you're up-to-date on theano master?
I'm also surprised that
@twiecki, I would be somewhat reluctant to change Lognormal to scipy's convention, simply because scipy is the only place I've seen a lognormal parameterized that way. I could be wrong about this, maybe scipy's way is more standard than I think. I think part of the question is whether you want one distribution with default parameters that make it simple, e.g. shifted lognormal with c=0 is lognormal or just for another example skewnormal with alpha=0 is normal, or different distributions for each. As a side note I have a
I'd advocate for including a separate distribution with pointers in the documentation indicating that it matches scipy's and providing the conversion from the scipy parameterization. The lognormal docs could point to the shiftedlognormal docs.
I wouldn't know how to generate random numbers though, maybe just from a lognormal then add c?
I am not on the latest Theano, I'll update and try again. EDIT: Updating to the latest theano seems to have eliminated the errors.
Sample provides a trace with both the variable and it's transformed version, so I don't know where
I am new to both, Python and MCMC techniques and am working my way into PyMC3. As an exercise to familiarize myself with PyMC3 I would like to fit a mixture model of two shifted gamma distributions to generated data. As a next step I would then like to extend this with a stick-breaking process to an "arbitrary" number of shifted gammas, but one step at a time.
The complete code can be found in the following gist: https://gist.github.com/cs224/36482b4f52885310cec6ef975fd20184
Here is the model that I try to implement:
with pm.Model() as model: p = pm.Beta( "p", 1., 1., testval=0.5) #this is the fraction that come from mean1 vs mean2 alpha = pm.Uniform('alpha', 0, 10) # == k beta = pm.Uniform('beta' , 0, 1) # == theta shifts = pm.Normal('shifts', mu=20, sd=50, shape=2) gamma = ShiftedGamma.dist(alpha, beta, c=shifts) process = pm.Mixture('obs', tt.stack([p, 1-p]), gamma, observed=data) with model: step = pm.Metropolis() trace = pm.sample(10000, step=step)
And here is the implementation of the ShiftedGamma that I came up with:
class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform): name = "log" def __init__(self, c=0.0): self.c = c def backward(self, x): return tt.exp(x + self.c) def forward(self, x): return tt.log(x - self.c) class ShiftedGamma(pm.distributions.continuous.Continuous): def __init__(self, alpha=None, beta=None, mu=None, sd=None, c=0.0, *args, **kwargs): super(ShiftedGamma, self).__init__(transform=ShiftedLog(c), *args, **kwargs) alpha, beta = self.get_alpha_beta(alpha, beta, mu, sd) self.alpha = alpha self.beta = beta self.mean = alpha / beta + c self.mode = tt.maximum((alpha - 1) / beta, 0) + c self.variance = alpha / beta**2 self.c = c # self.testval = kwargs.pop('testval', None) # if not self.testval: # self.testval = c + 10.0 pm.distributions.continuous.assert_negative_support(alpha, 'alpha', 'Gamma') pm.distributions.continuous.assert_negative_support(beta, 'beta', 'Gamma') def get_alpha_beta(self, alpha=None, beta=None, mu=None, sd=None): if (alpha is not None) and (beta is not None): pass elif (mu is not None) and (sd is not None): alpha = mu**2 / sd**2 beta = mu / sd**2 else: raise ValueError('Incompatible parameterization. Either use ' 'alpha and beta, or mu and sd to specify ' 'distribution.') return alpha, beta def random(self, point=None, size=None, repeat=None): alpha, beta, c = draw_values([self.alpha, self.beta, self.c], point=point) return generate_samples(stats.gamma.rvs, alpha, scale=1. / beta, dist_shape=self.shape, size=size, loc=c) def logp(self, value): alpha = self.alpha beta = self.beta c = self.c x = value - c return bound(-gammaln(alpha) + logpow(beta, alpha) - beta * x + logpow(x, alpha - 1), x >= 0, alpha > 0, beta > 0)
I got the "inspiration" for this implementation from this gist here: https://gist.github.com/bjedwards/b9f052966705de613637
I have basically 2 questions:
Point 2) might be a consequence of having made a mistake in point 1) or it might be any of the magic reasons as explained by Michael Betancourt in his talk at StanCon: https://www.youtube.com/watch?v=DJ0c7Bm5Djk&feature=youtu.be&t=4h40m9s.
Around my question 1) I would also like to understand if there are better or easier ways to achieve the same effect of fitting a model with shifted gammas to data, e.g. is it really necessary to implement the ShiftedGamma or could I achieve the same effect via pm.Deterministic() or similar?
The main reason why I comment this question in this issue is that if my approach of using a ShiftedGamma as a new class is the "correct" one then I would answer the question of @fonnesbeck, that yes, this issue is still of interest. If there are alternative and better ways to achieve my goal of fitting such a model of shifted gammas then I'm happy to switch to a different approach.
In case this question is wrong here I will also post it on StackOverflow.
Thanks a lot for your help and insight!
If you follow the link from above to the gist you can see that I just added a PyStan version of the model.
If I use NUTS and help this Stan model with the correct init values it works. Otherwise I get "wrong answers".
If I use HMC then I get the error message: "Exception thrown at line 16: beta_log: Random variable is nan, but must not be nan! ... if this warning occurs often then your model may be either severely ill-conditioned or misspecified." How would I do this model of two mixed shifted gamma distributions "right"? Either in PyMC3 or in Stan?
I've never really worked with mixture models, so take this with a grain of salt.
You seem to have a typo in the transformation: shouldn't
I think you could simplify the other code to something like this: (please double check, I did not test this properly...)
class ShiftedGamma(pm.Gamma): def __init__(self, alpha, beta, shift, *args, **kwargs): transform = pm.distributions.transforms.lowerbound(shift) super().__init__(alpha=alpha, beta=beta, *args, **kwargs, transform=transform) self.shift = shift self.mean += shift self.mode += shift def random(self): return super().random() + self.shift def logp(self, x): return super().logp(x - self.shift) with pm.Model() as model: p = pm.Beta( "p", 1., 1., testval=0.5) alpha = pm.Uniform('alpha', 0, 10) # == k beta = pm.Uniform('beta' , 0, 1) # == theta shifts = pm.Normal('shifts', mu=20, sd=50, shape=2, testval=floatX([0., 0.])) gamma = ShiftedGamma.dist(alpha=alpha, beta=beta, shift=shifts) pm.Mixture('y', tt.stack([p, 1-p]), gamma, observed=data))
About the difficulties with initialization and sampling:
shift1_raw = pm.HalfCauchy('shift1_raw', beta=1) shift2_raw = pm.HalfCauchy('shift2_raw', beta=1) shift1 = data.min() - shift1_raw shift2 = shift1 + shift2_raw shift = tt.stack([shift1, shift2]) shift = pm.Deterministic('shift', shift)
But this definitely requires more thought; this version changes the prior, and this new prior depends on the dataset, and shift1_raw and shift2_raw will probably be highly correlated (which makes the sampler unhappy).
Also, keep in mind that mixture models typically end up being multimodal, and NUTS does not usually work well for those.
@aseyboldt: Thanks a lot for your answer and hints! I'll try and see how far this will get me :) I'll let you know.
I've posted the question also to stackoverflow here: http://stackoverflow.com/questions/42735489/fit-a-mixture-model-of-two-shifted-gamma-distributions-in-pymc3