Make the algorithm less memory intensive #43
Comments
I am having issues with big data as well; `model.fit_transform` is very resource intensive and takes hours. I am curious:
1. Is there a way to turn on `progress_bar=True` while fit-transforming the model, like when using the transformers' encode to get embeddings? I don't see a way to tell whether the model is still running or hanging.
2. Are there plans to add an option to offload the model to the GPU via torch? I have VRAM to spare.
Any suggestions and feedback would be greatly appreciated!
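Until a built-in progress bar exists, a minimal workaround sketch is to compute the embeddings yourself in batches with a simple progress printout, and then hand them to the model. The `encode_batch` function below is a stand-in for whatever embedding call you use (e.g. a sentence-transformers model's encode method); it is not part of BERTopic's API.

```python
# Sketch: embed documents in batches with a progress printout, so a long
# fit is observable. encode_batch is a stand-in for whatever embedding
# call you actually use (e.g. a sentence-transformers encode method).
def embed_with_progress(docs, encode_batch, batch_size=64):
    embeddings = []
    total = len(docs)
    for start in range(0, total, batch_size):
        batch = docs[start:start + batch_size]
        embeddings.extend(encode_batch(batch))
        done = min(start + batch_size, total)
        print(f"embedded {done}/{total} documents ({100 * done // total}%)")
    return embeddings
```

This only makes the embedding stage observable; the dimensionality-reduction and clustering stages afterwards still run without feedback.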
@stolam There is also the option to set @Kingstonshaw
@MaartenGr Thank you for your reply; it is good to know about that option. Just to put things into perspective: my dataset is 20 GB and my RAM is 128 GB, so I thought I would be OK. The memory consumption grew slowly from 20 to 40 GB and then exploded quickly with UMAP.
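For perspective, a quick back-of-envelope estimate of how large the dense embedding matrix alone can get (the 768 dimensions and float32 dtype here are assumptions, typical for BERT-style encoders; adjust to your model):

```python
# Rough memory estimate for a dense float32 embedding matrix.
# dim=768 is an assumption (typical for BERT-style encoders).
def embedding_matrix_bytes(n_docs, dim=768, bytes_per_value=4):
    return n_docs * dim * bytes_per_value

# e.g. 10 million documents at 768 float32 dimensions:
gib = embedding_matrix_bytes(10_000_000) / 2**30
```

Intermediate copies made during dimensionality reduction can multiply that baseline several times over, which would be consistent with the sudden growth you describe.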
@stolam I just released a new version of BERTopic (v0.5) that has a
When using big data, it becomes infeasible to hold everything in memory at once.
Would it be possible to iterate over the data rather than hold it in memory?
It might also help to expose the `n_jobs` parameter for UMAP, so that the user has some control over the number of cores and therefore over memory consumption.
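Both requests can be sketched together: stream documents in fixed-size chunks instead of materializing the whole corpus, and collect user-controlled settings (such as `n_jobs` and umap-learn's `low_memory` flag) to forward to the reducer. The forwarding target below is hypothetical; it stands in for however the library constructs its UMAP instance, and is not BERTopic's actual API.

```python
# Sketch: iterate over a corpus in chunks rather than holding it all in
# memory, and collect user-supplied keyword arguments (e.g. n_jobs) to
# forward to the dimensionality-reduction step. The forwarding target is
# hypothetical; it stands in for however the library builds its reducer.
def iter_chunks(docs, chunk_size=10_000):
    """Yield successive fixed-size chunks of a document iterable."""
    chunk = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def reducer_kwargs(n_jobs=1, low_memory=True):
    """Collect user-controlled settings to pass through to the reducer."""
    return {"n_jobs": n_jobs, "low_memory": low_memory}
```

Chunked iteration alone does not make UMAP incremental, but it keeps the raw documents and embeddings from all being resident at once, and exposing the kwargs gives users the core/memory trade-off control asked for above.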