Memory Error while reading large files #20

Original issue: I have a file with nearly 70M edges, and it fails to load into memory on a machine with 32 GB of RAM.

I have a graph with 4.5M nodes and 80M edges. What is your rough estimate of the running time on large graphs like this?
Hi @KarthikRevanuru, from my rough estimate, it should take no more than 20 GB of memory to fully load and convert your graph into the CSR format, which is used as the final graph data structure. I have run some tests with a couple of large biological networks (see the bench repo). For example, SSN has roughly 72M edges and 800k nodes, and it uses ~10 GB of memory throughout the execution of the program (see line 65 or line 71 in this benchmarking result table). May I ask what mode of execution you are using (i.e., did you explicitly set …)?
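(For readers following along, here is a back-of-envelope sketch of the CSR footprint for a graph of this size. The dtypes below are assumptions, not necessarily what PecanPy uses, and the 20 GB figure above presumably also covers the intermediate adjacency structures built during loading, not just the final arrays:)

```python
# Rough CSR memory estimate for an undirected graph with 4.5M nodes
# and 80M edges (each undirected edge stored twice in CSR).
# Assumed dtypes: int64 row pointers, uint32 indices, float32 weights.
n_nodes = 4_500_000
n_entries = 80_000_000 * 2

indptr  = (n_nodes + 1) * 8   # int64 row pointers
indices = n_entries * 4       # uint32 column indices
weights = n_entries * 4       # float32 edge weights

total_gib = (indptr + indices + weights) / 1024**3
print(f"~{total_gib:.1f} GiB for the CSR arrays alone")  # ~1.2 GiB
```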
Thanks for the quick reply, @RemyLau. Do you have any estimate of the running time? Also, I've enabled verbose mode and it didn't print anything regarding walks.
Hmm.. what error are you seeing, @KarthikRevanuru? It would be helpful if you could share the error log here. From the log message you shared, it only took 6 minutes to load the graph, which is quite reasonable for a network of this size. And since you're using … (see lines 202 to 233 in 6a0a733).
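(For context: a minimal sketch of driving PecanPy through its Python API with verbose logging turned on, following the README's SparseOTF example. The exact signatures may differ across versions, and `graph.edg` is a placeholder path:)

```python
# Minimal sketch based on the README's Python API example; exact
# signatures may differ across PecanPy versions. verbose=True makes
# the loading and walk-generation progress visible.
from pecanpy import pecanpy as node2vec

# SparseOTF keeps the graph in CSR form and generates walks on the fly
g = node2vec.SparseOTF(p=1, q=1, workers=4, verbose=True)
g.read_edg("graph.edg", weighted=True, directed=False)

# 10 walks of length 80 per node -- the corpus discussed below
walks = g.simulate_walks(num_walks=10, walk_length=80)
```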
The error message is just "Killed" (i.e., the process was killed, presumably by the OOM killer). After I moved to a machine with 64 GB of RAM, it's fixed.

What's the expected time to generate the walks and train the embeddings?
@KarthikRevanuru It honestly depends on a lot of factors, e.g., the number of processors, CPU clock, memory clock, etc. But in your case, I'd say about 6 hours for the random walk generation process to finish. So given these clues, I think the issue might be caused by the large number of random walks generated. Previously, in my case, although the SSN network has roughly the same number of edges as yours, it has an order of magnitude fewer nodes (800k compared to 4.5M). The number of nodes does not affect the size of the sparse graph structure much, but it does affect the size of the corpus generated (i.e., the random walks). In particular, your … (see PecanPy/src/pecanpy/node2vec.py, line 149 in 6a0a733).
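(A rough illustration of why the node count dominates the corpus size; the `num_walks=10` / `walk_length=80` defaults and the list-of-strings representation are assumptions here:)

```python
# Back-of-envelope corpus size, assuming node2vec-style defaults of
# num_walks=10 and walk_length=80 (assumptions, not confirmed above).
n_nodes     = 4_500_000
num_walks   = 10
walk_length = 80

tokens = n_nodes * num_walks * (walk_length + 1)  # ~3.6e9 node IDs
# If each walk is kept as a Python list of node-ID strings, then even
# when the string objects themselves are shared, the list slots alone
# cost 8 bytes per token (~27 GiB here), before per-list overhead --
# an order of magnitude more than the ~1 GiB CSR graph.
print(f"{tokens:,} tokens in the walk corpus")
```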
This was originally done for convenience when calling …. In the meantime, if possible, try increasing your memory allocation further, to, say, 128 GB, and see if that resolves the issue.
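(One generic way around materializing the whole corpus, not something PecanPy did at the time of this thread: stream walks to gensim's Word2Vec through a restartable iterable. The sketch below uses plain first-order walks for brevity, not node2vec's p/q-biased second-order walks, and the toy adjacency dict is a placeholder for the real graph structure; gensim 4.x parameter names are assumed:)

```python
# Hypothetical sketch (not PecanPy's implementation): regenerate walks
# on each pass instead of holding the full corpus in memory. Word2Vec
# re-iterates the corpus once per epoch, so the iterable must be
# restartable -- a class with __iter__, not a one-shot generator.
import random
from gensim.models import Word2Vec

class WalkCorpus:
    """Yields first-order random walks, regenerated on every pass."""

    def __init__(self, adj, num_walks=10, walk_length=80):
        self.adj = adj                  # dict: node -> list of neighbors
        self.num_walks = num_walks
        self.walk_length = walk_length

    def __iter__(self):
        for _ in range(self.num_walks):
            for start in self.adj:
                walk = [start]
                for _ in range(self.walk_length):
                    nbrs = self.adj[walk[-1]]
                    if not nbrs:
                        break
                    walk.append(random.choice(nbrs))
                yield [str(v) for v in walk]

# Toy graph; real usage would plug in the CSR structure instead
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
model = Word2Vec(WalkCorpus(adj), vector_size=32, window=5,
                 min_count=0, sg=1, workers=1, epochs=2)
```

The trade-off is that the walks are recomputed once per training epoch, exchanging memory for CPU time.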
Ok, thanks!