
deprecation warnings in LDA example code #125

Open
cartazio opened this issue Nov 12, 2017 · 9 comments
@cartazio

hey @bob-carpenter, @lizzyagibson and I have been looking at the LDA example code (it's nice how closely it maps to the generative description in the LDA journal paper), and there are a few deprecation warnings related to the `<-` assignment operator and the `increment_log_prob` target-update expression. You may want to update them :)

thanks for the lovely examples!

@cartazio (Author)

Relatedly, what's the correct/recommended way to rewrite the sum over the gammas?

As written, it's `increment_log_prob(log_sum_exp(gamma))`.

Should it be

a) `target += gamma`
b) `target += something something gamma`
c) something else?

@bob-carpenter (Contributor)

The code is way out of date. It's in

https://github.com/stan-dev/example-models/blob/ec6d329bb5a88fa53e44c28fa01287701660933c/misc/cluster/lda/lda.stan

The current marginalization over the topic (k) for a given word (n) is this:

  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K) 
      gamma[k] <- log(theta[doc[n],k]) + log(phi[k,w[n]]);
    increment_log_prob(log_sum_exp(gamma));  // likelihood
  }

That can be reduced to

  for (n in 1:N)
    target += log_sum_exp(log(theta[doc[n]]) + to_vector(log(phi[ , w[n]])));
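The two forms agree because log Σ_k θ_k φ_k = log_sum_exp_k(log θ_k + log φ_k). A quick numerical sanity check in Python (the values for `theta_d` and `phi_w` below are made up for one document/word pair, not from the model):

```python
import numpy as np

# Hypothetical topic probabilities for one document: theta[doc[n]]
theta_d = np.array([0.5, 0.3, 0.2])
# Hypothetical per-topic probabilities of one word: phi[, w[n]]
phi_w = np.array([0.01, 0.04, 0.02])

# Direct marginalization: log(sum_k theta[k] * phi[k])
direct = np.log(np.sum(theta_d * phi_w))

# Stan-style: log_sum_exp(log(theta) + log(phi)),
# computed stably by subtracting the max before exponentiating
gamma = np.log(theta_d) + np.log(phi_w)
lse = gamma.max() + np.log(np.sum(np.exp(gamma - gamma.max())))

assert np.isclose(direct, lse)
```

Working on the log scale with log_sum_exp avoids underflow when the per-topic probabilities are tiny, which is the point of the Stan idiom.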

It'd be even better to define log_phi in vector form and reuse it for each n. It would also be worth doing this for log_theta if the number of words per document is greater than the total number of topics.
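A sketch of that caching refactor, assuming the linked model's declarations (M documents, K topics, V vocabulary words, theta a per-document simplex over topics, phi a per-topic simplex over words); only the likelihood portion of the model block is shown:

```stan
model {
  // cache the logs once per log-density evaluation, not once per word
  vector[K] log_theta[M];
  vector[V] log_phi[K];
  for (m in 1:M)
    log_theta[m] = log(theta[m]);
  for (k in 1:K)
    log_phi[k] = log(phi[k]);
  for (n in 1:N) {
    vector[K] gamma;
    for (k in 1:K)
      gamma[k] = log_theta[doc[n], k] + log_phi[k, w[n]];
    target += log_sum_exp(gamma);  // marginalize the topic for word n
  }
}
```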

@bob-carpenter (Contributor)

@cartazio: Feel free to submit a pull request.

And a warning---you can't really do Bayesian inference for LDA because of the multimodality. You'll see that you won't satisfy convergence diagnostics running in multiple chains, and not just because of label switching.

@cartazio (Author) commented Nov 12, 2017

@bob-carpenter thanks! That's super helpful.

By multimodal you mean there are different local optima when the posterior is viewed as an optimization target, i.e. things are nonconvex (vary the priors and there will be different local optima in the posterior)? I had to google around to figure out what you meant; https://scholar.harvard.edu/files/dtingley/files/multimod.pdf seemed the most clearly expositional, despite the double-spaced formatting :)

Is there any good reading/reference on how the "variational" formulations such as Mallet/Vowpal Wabbit etc. deal with that issue? Or is it just one of those things that tends to stay hidden as folklore common knowledge?

@bob-carpenter (Contributor) commented Nov 12, 2017 via email

@cartazio (Author) commented Nov 13, 2017 via email

@bob-carpenter (Contributor) commented Nov 13, 2017 via email

@cartazio (Author) commented Nov 13, 2017 via email

@bob-carpenter (Contributor) commented Nov 13, 2017 via email
