# pymc-devs/pymc3

Closed
opened this issue Mar 15, 2017 · 6 comments

3 participants
Member

### aseyboldt commented Mar 15, 2017

 Right now we allow discrete parameters in models when sampling with NUTS, and assign a Metropolis step to those parameters. I can't see why this would work well even for moderate model sizes (every change in one of the discrete parameters will probably send us out of the typical set for the continuous ones, right?). Or do you know models where this works? If not, I would propose to at least print a warning if a user tries to do this. For an example where I think this led to a problem, see this SO question.
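To make the concern concrete: in such a compound step, the discrete parameter $\alpha$ is updated by Metropolis while the continuous parameters $\theta$ stay at the value NUTS just produced. Assuming a symmetric proposal $\alpha \to \alpha'$ (for instance, flipping a binary value) and writing $p$ for the unnormalized posterior density, the acceptance probability is the standard Metropolis ratio:

$$
a(\alpha \to \alpha') = \min\left(1,\ \frac{p(\alpha', \theta)}{p(\alpha, \theta)}\right)
= \min\left(1,\ \frac{p(\alpha')\, p(\theta \mid \alpha')}{p(\alpha)\, p(\theta \mid \alpha)}\right)
$$

Because $\theta$ was just moved within the typical set of $\theta \mid \alpha$, this ratio is tiny whenever that region carries little mass under $\theta \mid \alpha'$, so the proposal is almost always rejected.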
Member

### fonnesbeck commented Mar 15, 2017

 This generally works for me, so I do use it with NUTS. By "works" I mean that I get the same answer as when I switch over to a pure Metropolis sampling scheme. I typically only use it when there are only one or two discrete parameters. As to when it stops working, I have no idea; it would be an interesting research question. I don't think it would even run at all with ADVI, would it? It certainly would not happen automatically, since ADVI does not use `sample`.
Member Author

### aseyboldt commented Mar 15, 2017

 No, initialization using ADVI seems to be silently disabled.
Member Author

### aseyboldt commented Mar 15, 2017

 The problem I see with this is that it is very easy for a new user to put a discrete parameter in a model, which may really mess up convergence. And if you don't know much about how the samplers work, it is not easy to see why this is a problem. It also doesn't make things easier that a lot of valid models contain observed discrete RVs.
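A warning along these lines would have to fire only for *free* discrete variables, since observed discrete RVs are never sampled and are harmless for NUTS. Below is a library-agnostic sketch of that check; `VarInfo` and `warn_discrete_free_vars` are hypothetical names made up for illustration, not part of the PyMC3 API.

```python
import warnings
from dataclasses import dataclass


@dataclass
class VarInfo:
    """Hypothetical summary of one random variable in a model."""
    name: str
    discrete: bool
    observed: bool


def warn_discrete_free_vars(variables, sampler="NUTS"):
    """Warn about free (unobserved) discrete variables.

    Observed discrete RVs (e.g. a Poisson likelihood) are skipped because
    they are data, not parameters the sampler has to move.
    Returns the names that triggered the warning.
    """
    flagged = [v.name for v in variables if v.discrete and not v.observed]
    if flagged:
        warnings.warn(
            f"{sampler} cannot sample discrete parameters {flagged}; they will "
            "be updated with a Metropolis-type step, which may mix poorly.",
            UserWarning,
        )
    return flagged


model_vars = [
    VarInfo("alpha", discrete=True, observed=False),   # free discrete parameter
    VarInfo("theta", discrete=False, observed=False),  # continuous parameter
    VarInfo("counts", discrete=True, observed=True),   # observed discrete data
]
print(warn_discrete_free_vars(model_vars))
```

Only `alpha` is flagged here; the observed `counts` variable passes silently, which is the distinction the comment above points out.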
Member

### twiecki commented Mar 15, 2017

 @aseyboldt What exactly would be the problem with being moved outside the typical set? Wouldn't that also happen if you mix other samplers?
Member

### fonnesbeck commented Mar 15, 2017

 I think it's an open question. I mentioned it to the Stan guys once and they did not seem to have an intuition about it either.
Member Author

### aseyboldt commented Mar 30, 2017

 Sorry, I forgot about this issue. I certainly can't claim to have a good understanding of this, but here is what I had in mind when I mentioned the typical sets.

Let's say we have a distribution with one binary parameter $\alpha$ and a couple of continuous parameters $\theta$. Also, to keep it simple, $P(\alpha=0) = P(\alpha=1)$. If the typical sets of $\theta \mid \alpha=0$ and $\theta \mid \alpha=1$ don't overlap much, then the sampler can't switch between the states of $\alpha$ and gets stuck at one of the values. My understanding is that this could happen pretty quickly if a number of values in $\theta$ depend on $\alpha$, even if all the typical sets of $\theta_i \mid \alpha=0, \theta_{\ne i}$ and $\theta_i \mid \alpha=1, \theta_{\ne i}$ overlap. For example:

```python
import pymc3 as pm

N = 100
with pm.Model() as model:
    alpha = pm.Bernoulli('alpha', p=0.5)
    # Every theta_i is centred on alpha, so flipping alpha shifts all N means at once
    pm.Normal('theta', mu=alpha, sd=1, shape=N)
    # NUTS for theta, a Metropolis-type step for the discrete alpha
    trace = pm.sample(1000)
```

If N is 1, we can sample from this easily. But for N = 100, alpha is stuck at 0 for the whole trace. (Of course we could reparameterize theta as N(0, 1) + alpha and avoid that problem, but still...)

I don't know how much of a problem this is in real-world applications. I just remember running into some trouble like this some time ago, but can't even find the model anymore.

Edit: Ahhhh, no LaTeX on GitHub...
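For this toy model the flip ratio has a closed form: the prior on $\alpha$ cancels and $\log N(x \mid 1, 1) - \log N(x \mid 0, 1) = x - 1/2$, so the log ratio for $\alpha: 0 \to 1$ is $\sum_i (\theta_i - 1/2)$, which averages $-N/2$ when $\theta \sim N(0, 1)$. The pure-Python sketch below checks the N = 1 vs N = 100 behaviour without PyMC3. As a simplifying assumption, the continuous step draws $\theta$ exactly from $\theta \mid \alpha$ as a stand-in for NUTS; `flip_log_ratio` and `run_compound_chain` are hypothetical helpers, not PyMC3 code.

```python
import math
import random


def flip_log_ratio(theta):
    """Log Metropolis ratio for proposing alpha: 0 -> 1 with theta held fixed.

    Toy model: alpha ~ Bernoulli(0.5), theta_i ~ Normal(alpha, 1).
    Each component contributes log N(t | 1, 1) - log N(t | 0, 1) = t - 1/2.
    """
    return sum(t - 0.5 for t in theta)


def run_compound_chain(n, steps=2000, seed=0):
    """Alternate a theta update with a Metropolis flip of alpha; count flips.

    The theta update samples exactly from theta | alpha, a stand-in for the
    NUTS step (an assumption made to keep the sketch dependency-free).
    """
    rng = random.Random(seed)
    alpha = 0
    flips = 0
    for _ in range(steps):
        # Continuous step: a fresh draw from theta | alpha.
        theta = [rng.gauss(alpha, 1.0) for _ in range(n)]
        # Discrete step: propose the other value of alpha, theta fixed.
        log_r = flip_log_ratio(theta)
        if alpha == 1:
            log_r = -log_r  # the 1 -> 0 ratio is the reciprocal
        if rng.random() < math.exp(min(0.0, log_r)):
            alpha = 1 - alpha
            flips += 1
    return flips


for n in (1, 100):
    print(n, run_compound_chain(n))
```

With N = 1 the flip is accepted on a large fraction of steps; with N = 100 the log ratio sits around -50, so the chain essentially never leaves its starting value of alpha, matching the stuck trace described above.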
