High-Throughput Screening Reliability #10
Comments
fba3cb4 got that
Still 2,030 valgrind errors. Although
95d0096 isolates the RASPA C code into its own process. The process is constructed and terminated on every RASPA run, which lets the OS come in and free the leaked memory. Not closing this issue because the right way to do this is to track down all the memory leaks, fix them, and then undo 95d0096. We're sacrificing speed until this happens.
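Here's a basic test script for process buildup / shutdown time:

    from multiprocessing import Process, Pipe
    from time import sleep


    def f(conn):
        # Stand-in for one simulation: wait a second, send back a result.
        sleep(1)
        conn.send(5 * 5 * 5 * 5 * 5)
        conn.close()


    # The guard keeps the script working where child processes are spawned
    # (re-importing this module) rather than forked.
    if __name__ == "__main__":
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        output = parent_conn.recv()
        p.join()
        p.terminate()  # no-op after join(); kept as a safety net
        print(output)

Run with: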
The overhead seems to be about 50 ms per process. That gives a low-end estimate of 0.05 s * 1,000,000 / 3600 ≈ 13.9 hours of added computation in a high-throughput screen. Construction/destruction time probably grows with the number of processes, so I'd expect the real number to be more like ~100 hours. Still much better than the previous 3-5 sec OS shutdown approach (up to ~58 days of added computation!)
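The estimates above are easy to check in a few lines. This is a minimal sketch: the function name `added_hours` is made up for illustration, and the 1,000,000-run screen size is the figure used in the estimate above.

```python
def added_hours(per_run_overhead_s, n_runs=1_000_000):
    """Extra wall-clock hours spent purely on per-run process overhead."""
    return per_run_overhead_s * n_runs / 3600


# ~50 ms per process (the current per-run process approach)
print(round(added_hours(0.05), 1))      # 13.9 hours

# 3-5 s per run (the previous OS shutdown approach), expressed in days
print(round(added_hours(3) / 24, 1))    # 34.7 days
print(round(added_hours(5) / 24, 1))    # 57.9 days
```

These numbers assume the runs execute one after another; spread across several cores, the added wall-clock time would shrink accordingly.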
This is a set of issues around running 100k+ simulations with RASPA, stemming mainly from its memory leaks. RASPA currently leaks a lot of memory, which builds up over many simulations. The "solution" currently used is to run every simulation in its own process, i.e.
This logic really slows down high-throughput screening, causes a lot of unexplained segmentation faults, and generally does not play well with other programs. Instead, screening should use one process per core, and each core should be capable of running an unlimited number of simulations serially.
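For illustration, the one-process-per-core layout could look roughly like the sketch below, using Python's standard `multiprocessing.Pool`. `run_simulation` is a hypothetical stand-in and not the actual RASPA binding; it only squares its input so the example runs anywhere.

```python
from multiprocessing import Pool, cpu_count


def run_simulation(params):
    """Hypothetical placeholder for a single RASPA run."""
    return params * params


def screen(all_params, workers=None):
    """Run every simulation on a fixed pool of long-lived workers."""
    # One worker per core by default. maxtasksperchild is left unset, so
    # workers are never recycled and each one runs its share serially.
    with Pool(processes=workers or cpu_count()) as pool:
        return pool.map(run_simulation, all_params)


if __name__ == "__main__":
    print(screen(range(8), workers=2))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

This only works once the leaks are fixed, since leaked memory accumulates inside each long-lived worker. As a possible middle ground until then, `Pool` accepts a `maxtasksperchild` argument that replaces a worker after a set number of tasks, which would bound the leak per worker while paying the process startup cost far less often than once per run.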
History of debugging this:
movies.c. 72002b9 fixed this.
Currently, libraspa shuts down after ~5000 runs. I think this is because of the remaining 7 MB of memory leaked per run: 7 MB * 5000 = 35 GB, and my computer has 32 GB of memory.
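The same estimate can be turned around to ask how many runs fit in memory before the leak exhausts it. A rough sketch, assuming a constant leak per run and decimal units (1 GB = 1000 MB); the helper name is invented for this example.

```python
def runs_until_exhausted(leak_mb_per_run, ram_gb):
    """Whole number of runs before the accumulated leak exceeds RAM."""
    return (ram_gb * 1000) // leak_mb_per_run


print(runs_until_exhausted(7, 32))  # 4571
```

That lands close to the observed ~5000 runs, which fits the memory-leak explanation, though it ignores swap and memory used by everything else on the machine.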
For context, the valgrind output as of today from:
is 14,378 lines long and ends in
To fix, I'll have to go through this list.