Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise LdaModel.show_topics performance to only call get_lambda once #1006

Closed
nicolaslecrique opened this issue Nov 11, 2016 · 5 comments
Closed
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix

Comments

@nicolaslecrique
Copy link

This function calls show_topic for each topic, which calls get_topic_terms which calls self.state.get_lambda().

self.state.get_lambda() call is very expensive, it is not dependent on a topic and can be called once for all.
On my setup (500 topics, 100k words, learned with 500k documents), calling show_topics takes 30sec, with 28sec inside the 500 get_lambda() calls

@tmylk tmylk added bug Issue described a bug difficulty easy Easy issue: required small fix labels Nov 11, 2016
@tmylk tmylk changed the title fix LdaModel.show_topics performance Optimise LdaModel.show_topics performance to only call get_lambda once Nov 11, 2016
@souravsingh
Copy link
Contributor

I would like to work on the issue. How do I start?

@tmylk
Copy link
Contributor

tmylk commented Nov 13, 2016

Let's finish your other PRs first @souravsingh :)

@bhargavvader
Copy link
Contributor

@tmylk I tried a fair amount to try and fix this but the only solutions which I could think of are not very pretty.

We could maybe have an option to pass the lambda values, so instead of calling it within the function each time, the matrix is just passed around and we can pick up the row we need.

The other option is to completely avoid calling show_topic in show_topics, and just write up code to do what is happening and keep self.state.get_lambda() only once in the beginning of the function. This also works, but is not really pretty, and there is code duplication.

Both the methods I tried work and pass tests without having to call get_lambda multiple times, but don't seem like the best solution, so I did not open a PR. If one of them sounds okay, I'll open a PR for it.

Any other suggestions?

@tmylk
Copy link
Contributor

tmylk commented Nov 17, 2016

The optimised show_topics should not call the show_topic. That is a good solution

@nicolaslecrique
Copy link
Author

Thanks a lot ! I'll check this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix
Projects
None yet
Development

No branches or pull requests

4 participants