umap crashes on my computer with 900,000 points #500
Comments
The most likely reason for a silent crash, with the system killing the job, is a memory issue. UMAP can be pretty memory hungry (newer development versions are working to fix this). At least one option is to try the option
Thanks, I will try pynndescent. I also suspect it is overflowing memory: I tried to compute an exact nearest-neighbor matrix for this data and saw the same crash. The following overflows memory:
But this one works just fine:
So I guess processing the data piecewise in parallel will help a lot.
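The piecewise idea above can be sketched roughly as follows. A full pairwise distance matrix for 900,000 points would need about 900,000² × 8 bytes ≈ 6.5 TB, far beyond 128 GB, whereas querying a fitted index in chunks only ever holds one slab of distances in memory. (The chunk size and the use of scikit-learn's `NearestNeighbors` here are my assumptions, not the original code.)

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_in_chunks(data, n_neighbors=15, chunk_size=10_000):
    """Query a k-NN index chunk by chunk so that only one slab of
    distances/indices is held in memory at a time."""
    index = NearestNeighbors(n_neighbors=n_neighbors).fit(data)
    all_dists, all_inds = [], []
    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size]
        dists, inds = index.kneighbors(chunk)
        all_dists.append(dists)
        all_inds.append(inds)
    return np.vstack(all_dists), np.vstack(all_inds)

# Small demonstration; the issue's data was ~900,000 points.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)).astype(np.float32)
dists, inds = knn_in_chunks(X, n_neighbors=5, chunk_size=200)
print(dists.shape, inds.shape)  # (1000, 5) (1000, 5)
```

The chunked result is identical to a single full query; only the peak memory changes.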
Hello :) I have a similar, although more mysterious, issue. UMAP gracefully embeds the representation generated by one of the layers (a matrix). Precisely, my Python program is killed at this stage:
What I've done so far:
I am running this on:
It may just be a memory issue -- as in, not enough of it. UMAP can be pretty memory hungry when doing nearest-neighbor computations, and depending on the dataset that can get very expensive. I would definitely try `low_memory=True`, as that will likely help a little.
Thank you very much for the swift response!
Apologies for the silly questions, but I am trying to understand how to handle this type of situation.
I'm not sure if this is correct, but if you are running two UMAP instances one after another, some memory from the previous instance may still be occupied. The chance of this happening may be low, though.
Also, you may check whether the issue is happening in UMAP or in pynndescent; pynndescent's behavior is sometimes dataset dependent. I vaguely remember a Twitter thread about this.
Thank you for your answers!
Hi, I have been trying to embed 900,000 points using UMAP on my computer.
The program eventually gets killed by the system. I tried running in both Jupyter and in terminal.
My system: 16Core/32Thread AMD CPU, 128GB RAM (Terminal reports 125GB). Ubuntu 18.04.3 LTS.
I was wondering if it is a system requirement issue or an issue in how UMAP handles this many points. (In the paper, it seems UMAP can handle millions of points, as there is a visualization of 3 million points.)
Here is a code that reproduces the error in my computer: