Beta_mean(theta | mu, phi) = Beta(theta | mu * phi, (1 - mu) * phi)
We could add this first, without any custom gradients.
Dirichlet_mean(theta | mu, phi) = Dirichlet(theta | mu * phi)
Dirichlet_sym(theta | alpha) = Dirichlet(theta | rep(alpha, size(theta)))
Beta_sym(theta | alpha) = Beta(theta | alpha, alpha)
The latter two (Dirichlet_sym and Beta_sym) will save memory by allowing simpler vari implementations.
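
To make the four definitions above concrete, here is a rough sketch of them as thin wrappers over the standard Beta and Dirichlet log densities. It is plain Python using only math.lgamma, with no autodiff and no custom gradients. The function names (beta_lpdf, beta_mean_lpdf, and so on) are made up for illustration and are not an existing API.

```python
import math


def beta_lpdf(theta, a, b):
    """Log density of Beta(theta | a, b) for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log1p(-theta)


def dirichlet_lpdf(theta, alpha):
    """Log density of Dirichlet(theta | alpha) for theta on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))


# Hypothetical wrappers mirroring the proposed parameterizations.

def beta_mean_lpdf(theta, mu, phi):
    # Beta_mean(theta | mu, phi) = Beta(theta | mu * phi, (1 - mu) * phi)
    return beta_lpdf(theta, mu * phi, (1 - mu) * phi)


def beta_sym_lpdf(theta, alpha):
    # Beta_sym(theta | alpha) = Beta(theta | alpha, alpha)
    return beta_lpdf(theta, alpha, alpha)


def dirichlet_mean_lpdf(theta, mu, phi):
    # Dirichlet_mean(theta | mu, phi) = Dirichlet(theta | mu * phi)
    return dirichlet_lpdf(theta, [m * phi for m in mu])


def dirichlet_sym_lpdf(theta, alpha):
    # Dirichlet_sym(theta | alpha) = Dirichlet(theta | rep(alpha, size(theta)))
    return dirichlet_lpdf(theta, [alpha] * len(theta))
```

Note that dirichlet_sym_lpdf here still builds the full repeated vector, so it only illustrates the definition. A dedicated implementation would presumably work from the single scalar alpha directly.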