
Does GPU help? #31

Closed
lppier opened this issue Dec 28, 2020 · 3 comments
lppier commented Dec 28, 2020

Hi, firstly, thank you so much for this library. I've tried it, and it does take some time to get the topics.
Just wondering, will having a GPU help speed-wise? Is the speed bottlenecked at the sentence-transformers embedding portion?

lppier closed this as completed Dec 28, 2020

lppier (Author) commented Dec 28, 2020

Apologies, I managed to try it on a GPU-enabled cloud server and it was significantly faster.

MaartenGr (Owner) commented Dec 28, 2020

Yes! Using a GPU is highly recommended to speed up inference at the sentence-transformers stage.

However, if you do not have a GPU available, you can use TF-IDF instead, since BERTopic allows custom embeddings to be passed in:

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# Create a TF-IDF sparse matrix
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
vectorizer = TfidfVectorizer(min_df=5)
embeddings = vectorizer.fit_transform(docs)

# Run BERTopic with the precomputed embeddings
model = BERTopic(allow_st_model=True)
topics, probabilities = model.fit_transform(docs, embeddings)
```

Note that I used the parameter allow_st_model, which uses a sentence-transformer model to fine-tune the topic representation. This should be efficient even without a GPU, since you would only need to embed a few hundred words. However, you can set it to False if you do not want to use a sentence-transformer model at all.
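To make the "any embeddings work" point concrete, here is a small self-contained sketch (the toy corpus is illustrative) showing that what gets passed to fit_transform is just a document-by-feature matrix, which TF-IDF produces without any GPU or transformer:

```python
# Sketch: the "embeddings" handed to BERTopic can be any matrix with one row
# per document, e.g. a sparse TF-IDF matrix from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the gpu speeds up the embedding step",
    "tf-idf needs no gpu at all",
    "topic models group similar documents together",
    "sparse matrices keep memory usage low",
]
vectorizer = TfidfVectorizer()
embeddings = vectorizer.fit_transform(docs)

# One row per document, one column per vocabulary term.
print(embeddings.shape)
```

Because the matrix is sparse, this scales to large corpora on a CPU, at the cost of losing the semantic similarity that transformer embeddings capture.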

EDIT: I did not see your response before posting, but I will leave this up for those interested in other embedding methods.

lppier (Author) commented Dec 29, 2020

Thanks @MaartenGr ! This was very useful.
