
ENH: Add instrumentation to monitor resources #984

Merged: 15 commits, Apr 27, 2022

Conversation

oesteban
Member

Adds an instrumentation module, which could eventually be packaged with nipype or released standalone, to keep track of MRIQC's resource utilization.

For now, I'm testing the pattern and will try to write up some code to generate nice plots with the data.
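
For reference, a minimal sketch of the sampling pattern I have in mind, assuming a psutil-based poller running on a background thread. The CSV columns, file name, and interval here are illustrative, not the module's actual interface:

```python
import csv
import threading
import time

import psutil


def sample_to_csv(path, interval=0.2, stop_event=None):
    """Append one row per (sample, process) for this process and its children."""
    parent = psutil.Process()
    with open(path, "a", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["timestamp", "pid", "name", "rss_mb", "vms_mb"])
        while stop_event is None or not stop_event.is_set():
            for proc in [parent] + parent.children(recursive=True):
                try:
                    name = proc.name()
                    mem = proc.memory_info()
                except (psutil.NoSuchProcess, psutil.AccessDenied):
                    continue  # the process finished between sampling and querying
                writer.writerow(
                    [time.time(), proc.pid, name, mem.rss / 1e6, mem.vms / 1e6]
                )
            csvfile.flush()
            time.sleep(interval)


# Run the sampler on a daemon thread alongside the workflow:
stop = threading.Event()
threading.Thread(
    target=sample_to_csv,
    args=("resources.csv",),
    kwargs={"stop_event": stop},
    daemon=True,
).start()
# ... run the workflow here ...
stop.set()
```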

cc/ @effigies @mgxd

@oesteban
Member Author

I'm still investigating why the recording pauses for about 100s just while the FSL fast process is kicked off. I suspect it has to do with threading, forking and locks.

example_output.csv

@oesteban
Member Author

Two assumptions I should've been more explicit about:

  • NiPype's resource monitor does not work at the workflow level. It can be useful to track isolated interfaces, but the approach is expensive, inaccurate, inefficient, and unreliable for workflows (each interface triggers its own monitoring thread, with great potential for problems around process forking).
  • This effort is aimed at investigating Too much vmem? #824, where FSL fast seems to behave as a memory hog.

If the new approach to monitoring works out, then we will see how to make it available to other users.

@oesteban
Member Author

These are the kinds of plots we can generate with the new files (see the code under the new viz submodule).

Using spawn, plotting RSS (--nprocs 8):
[figure: mriqc-rss-spawn]

Using spawn, plotting VM (--nprocs 8):
[figure: mriqc-vms-spawn]

I've just plotted it with forkserver and the picture is exactly the same, so I am afraid either I have not managed to configure spawn correctly, or the process pool just initiates all the workers either way.

The huge light blue area corresponds to Python processes, which all get reported under the name "python3.8". The main process' baseline footprint is also substantial, although it is nothing compared to the multiprocessing workers.
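
For completeness, this is roughly how such a plot could be produced from the sampled CSV; the column names ("timestamp", "name", "rss_mb") are assumptions about the file layout, not the actual viz submodule API:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("resources.csv")
# One column per process name, memory summed across PIDs sharing that name
# (e.g., all the "python3.8" workers collapse into a single series).
rss = (
    df.pivot_table(index="timestamp", columns="name", values="rss_mb", aggfunc="sum")
    .fillna(0.0)
)
rss.plot.area(figsize=(12, 5), linewidth=0)
plt.ylabel("RSS (MB)")
plt.xlabel("time (s)")
plt.tight_layout()
plt.savefig("mriqc-rss.png")
```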

That said, it feels like we should find a solution for nipype 1 execution plugins:

  • Instead of multiprocessing, use multithreading. I don't know right this minute if this has ever been attempted.
  • Creating a plugin based on asyncio (which seems the best option in theory, as this is eminently an I/O-bound problem); see the toy sketch right below.
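
To make the asyncio bullet concrete, a toy illustration of the scheduling idea only (not a nipype plugin), under the assumption that interfaces boil down to external command lines:

```python
import asyncio


async def run_interface(cmdline, semaphore):
    """Launch one command-line interface; the semaphore caps concurrency."""
    async with semaphore:
        proc = await asyncio.create_subprocess_exec(*cmdline)
        return await proc.wait()


async def run_all(cmdlines, nprocs=8):
    semaphore = asyncio.Semaphore(nprocs)
    return await asyncio.gather(
        *(run_interface(cmd, semaphore) for cmd in cmdlines)
    )


# Three dummy POSIX `sleep` jobs, never more than two in flight at once.
exit_codes = asyncio.run(run_all([["sleep", "1"]] * 3, nprocs=2))
```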

wdyt @satra @effigies @mgxd ?

@oesteban force-pushed the enh/resource-monitor branch 2 times, most recently from 2ad11d9 to 0537657, on April 25, 2022 16:23
@oesteban marked this pull request as ready for review on April 25, 2022 16:26
@effigies
Member

That said, it feels like we should find a solution for nipype 1 execution plugins:

  • Instead of multiprocessing, use multithreading. I don't know right this minute if this has ever been attempted.

Assuming all of your pure Python interfaces are trivial, this might work. Otherwise it's going to be hard to avoid bottlenecks related to the GIL.

  • Creating a plugin based on asyncio (which seems the best option in theory, as this is eminently an I/O-bound problem).

Same issue here. Asyncio doesn't replace multiprocessing, it just allows us to skip writing our own callback queue.

@oesteban
Member Author

That said, it feels like we should find a solution for nipype 1 execution plugins:

  • Instead of multiprocessing, use multithreading. I don't know right this minute if this has ever been attempted.

Assuming all of your pure Python interfaces are trivial, this might work. Otherwise it's going to be hard to avoid bottlenecks related to the GIL.

  • Creating a plugin based on asyncio (which seems the best option in theory, as this is eminently an I/O-bound problem).

Same issue here. Asyncio doesn't replace multiprocessing, it just allows us to skip writing our own callback queue.

Python interfaces could also spawn their own process to overcome the GIL. I believe that having the main thread control a pool of threads instead of processes should, at the very least, ease the multiplicative memory-allocation effect of multiprocessing; a rough sketch is below.
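
A rough sketch of what I mean, assuming the scheduled work is mostly external command lines (names are illustrative, not nipype code): the scheduler keeps a pool of threads, and each thread blocks on a child process rather than on Python code, so the GIL is released while the work runs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_node(cmdline):
    """The worker thread releases the GIL while it waits on the child process."""
    return subprocess.run(cmdline, check=True).returncode


cmdlines = [["sleep", "1"]] * 3  # stand-ins for command-line interfaces
with ThreadPoolExecutor(max_workers=8) as pool:
    exit_codes = list(pool.map(run_node, cmdlines))
```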

@oesteban force-pushed the enh/resource-monitor branch 2 times, most recently from a6f7eea to a9176ee, on April 26, 2022 06:42
@oesteban force-pushed the enh/resource-monitor branch 2 times, most recently from 41d7f6b to 4259bab, on April 26, 2022 08:05
@oesteban
Member Author

Alright - it seems like a good practice with apparently little cost.

In e4bdd4e, I set OMP_NUM_THREADS=1 early in the config file and make sure that the workers of the process pool reset it to the proper value (passed with --omp-nthreads on the command line, or the total number of CPUs if unspecified). That keeps the mother process under 600 MB of VMS (i.e., less than one tenth of the original size).

Then, we SHOULD NOT use the forkserver. It seems the forkserver does not kill its workers (at least prior to Python 3.11), and the default fork context works well now that the mother process' VMS is so much smaller. Both settings are reflected in the sketch below.
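
Putting the two points above together, roughly (function and variable names are illustrative, not MRIQC's actual config code):

```python
import os

os.environ["OMP_NUM_THREADS"] = "1"  # as early as possible, before heavy imports load

import multiprocessing as mp  # noqa: E402
from concurrent.futures import ProcessPoolExecutor  # noqa: E402


def _init_worker(omp_nthreads):
    """Executed once inside every worker: restore the requested OMP setting."""
    os.environ["OMP_NUM_THREADS"] = str(omp_nthreads)


def make_pool(nprocs, omp_nthreads):
    """Process pool using the default fork context (POSIX only), not forkserver."""
    return ProcessPoolExecutor(
        max_workers=nprocs,
        mp_context=mp.get_context("fork"),
        initializer=_init_worker,
        initargs=(omp_nthreads,),
    )
```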

Finally, as a note for @satra, @mgxd, @effigies and other nipypers: we probably want to avoid the ProcessPool, which keeps --nprocs workers alive when nipype typically doesn't need that many. Instead, creating ad-hoc processes only when necessary (which, at peak times, may still reach the maximum number of parallel processes) seems like a sure way of further reducing the memory footprint; a sketch follows below.
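
A sketch of that ad-hoc-process alternative, again with illustrative names only: a short-lived Process per ready task, with a semaphore capping how many run at once.

```python
import multiprocessing as mp
import time


def _run_task(task, semaphore):
    """Child process: run one node/interface, then free a concurrency slot."""
    try:
        task()
    finally:
        semaphore.release()


def submit_all(tasks, nprocs=8):
    """Start one short-lived process per task, never more than nprocs at once."""
    ctx = mp.get_context("fork")  # POSIX only; args are inherited, not pickled
    semaphore = ctx.Semaphore(nprocs)
    procs = []
    for task in tasks:
        semaphore.acquire()  # blocks while nprocs tasks are already running
        proc = ctx.Process(target=_run_task, args=(task, semaphore))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()


if __name__ == "__main__":
    # Three dummy one-second jobs, at most two running at any time.
    submit_all([lambda: time.sleep(1)] * 3, nprocs=2)
```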

With the changes in this PR, and after the latest OMP=1 change, the RSS picture remains very similar (as expected):

[figure: mriqc-rss-omp-1]

But the VMS picture has changed dramatically:

[figure: mriqc-vms-omp-1]

(don't be deceived by the weird lines; I removed them from the plot in da20809)

With this commit, I believe the VMem problems have been addressed.

Resolves: #824.
Related: #536.
@oesteban
Member Author

This all comes about because of nipy/nipype#3456
