[MRG] Enable MPI parallelism with ParallelContext #79
Conversation
Benchmarks:
(mne) Blakes-MacBook-Pro:hnn-core blake$ time mpiexec -np 4 nrniv -python -mpi examples/parallel_simulate_evoked.py
numprocs=4
NEURON -- VERSION 7.6.7 HEAD (603da174) 2019-04-19
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2018
See http://neuron.yale.edu/neuron/credits
Additional mechanisms from files
mod/ar.mod mod/ca.mod mod/cad.mod mod/cat.mod mod/dipole.mod mod/dipole_pp.mod mod/hh2.mod mod/kca.mod mod/km.mod mod/vecevent.mod
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
real 0m47.036s
user 3m5.664s
sys 0m1.402s
(mne) Blakes-MacBook-Pro:hnn-core blake$ time mpiexec -np 4 nrniv -python -mpi examples/parallel_simulate_evoked.py
numprocs=4
NEURON -- VERSION 7.6.7 HEAD (603da174) 2019-04-19
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2018
See http://neuron.yale.edu/neuron/credits
Additional mechanisms from files
mod/ar.mod mod/ca.mod mod/cad.mod mod/cat.mod mod/dipole.mod mod/dipole_pp.mod mod/hh2.mod mod/kca.mod mod/km.mod mod/vecevent.mod
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
real 0m46.142s
user 3m2.367s
sys 0m1.281s
(mne) Blakes-MacBook-Pro:hnn-core blake$ time python examples/embarassingly_parallel_simulate_evoked.py
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
real 0m58.095s
user 3m48.320s
sys 0m1.245s
Force-pushed from 9e16101 to a896872.
Tests added. @jasmainak could you please review?
Force-pushed from b3ab76c to bf2b2bb.
@blakecaldwell I'm a bit lost: what does the user have to do to switch between the two backends?
It depends on the arguments to simulate_dipole. Passing n_jobs is only relevant to joblib and n_cores is only relevant to MPI. A combination with each > 1 is not allowed (it will fail with a warning).
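To illustrate the behaviour described here, a rough sketch (argument names are taken from this comment; at this stage of the PR the backend choice still lives on `simulate_dipole`, and later commits move it onto backend objects, so the exact signature may differ):

```python
from hnn_core import Network, simulate_dipole

net = Network(params)  # params loaded as in the examples

# joblib: trials run as embarrassingly parallel jobs
dpls = simulate_dipole(net, n_jobs=4, n_trials=4)

# MPI: each trial is split across cores via NEURON's ParallelContext
dpls = simulate_dipole(net, n_cores=4, n_trials=4)

# asking for both (each > 1) is not allowed and fails with a warning
# dpls = simulate_dipole(net, n_jobs=4, n_cores=4)
```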
Force-pushed from aa9a802 to e213049.
Sorry, of course. I realized that using 2 cores is hard-coded in the example. If you don't provide n_procs when creating the backend, it should use all available cores (all 4). Even if there were 4 cores with hyperthreading, the code should have used all 4 when n_procs is not given. Sound okay?
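A minimal sketch of that default behaviour (assuming the backend simply falls back to the logical core count reported by `multiprocessing`, which this module already imports; `_default_n_procs` is a hypothetical helper name and the committed code may differ):

```python
import multiprocessing


def _default_n_procs(n_procs=None):
    """Return the number of MPI processes to launch."""
    if n_procs is None:
        # no explicit request: use every logical core the machine reports
        n_procs = multiprocessing.cpu_count()
    return n_procs
```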
hnn_core/parallel_backends.py
Outdated
Parameters
----------
n_procs : int | None
    The number of processes processes MPI will use (spread over cores)
processes processes -> processes
should we raise an issue to track these things for another PR?
You're right, I have 2 cores but htop is showing 4 so probably hyperthreading
humm ... why is that? That means
I seem to recall from @agramfort that turning on all cores by default is a bad idea because it doesn't leave any cores available for other programs like a web browser. I am not exactly sure if that was the reasoning. Perhaps the default should be 1, but in the GUI it can be set to use however many you'd like?
n_jobs : int
    The number of jobs to start in parallel (NOT SUPPORTED)
n_procs : int
Suggested change: `n_procs : int` -> `n_procs : int | None`
also say what happens when `n_procs` is `None`
n_procs : int
    The number of processes MPI will use (spread over cores)
mpi_cmd_str : str
    The string of the mpi command with number of procs and options
If the user provides `n_procs` and then it's also provided here, what will happen? Should we raise an error, or will MPI take care of it?
I cleaned up the docstring to explain that one was user-specified, and the other is how many MPI will actually use. NEURON will print how many it is running on, so I think that is sufficient. If the user is seeing a discrepancy, they would be able to refer to the docstring. I don't think it needs to be explained in the output or with a warning.
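Roughly along these lines (a hypothetical docstring fragment; the attribute names are illustrative, not the exact ones committed):

```python
"""
Attributes
----------
n_procs : int
    The number of MPI processes requested by the user.
expected_n_procs : int
    The number of processes MPI will actually use; this can differ from
    ``n_procs``, and NEURON prints the number it ends up running on.
"""
```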
hnn_core/parallel_backends.py
Outdated
import os
import sys
import multiprocessing
python builtins can be global imports
if backend == 'mpi':
    with MPIBackend(n_jobs=n_jobs, n_procs=2, mpi_cmd='mpiexec'):
        dpl = simulate_dipole(net)[0]
Just a thought: since you are doing only a smoke test, we could simulate on a smaller network so that the test is fast.
I would also add a test to check that the output of both backends is the same. Again, a smaller network should be fine.
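Something like the following could work (a sketch only: the test name, the param file path, and the dipole attribute used in the comparison are assumptions, and the network-reduction step is left out):

```python
from numpy.testing import assert_allclose

from hnn_core import read_params, Network, simulate_dipole
from hnn_core import MPIBackend, JoblibBackend


def test_backends_agree():
    """Check that the MPI and joblib backends give the same dipole."""
    params = read_params('default.json')  # placeholder: point at a (reduced) param file
    net = Network(params)

    with MPIBackend(n_procs=2, mpi_cmd='mpiexec'):
        dpl_mpi = simulate_dipole(net)[0]

    with JoblibBackend(n_jobs=1):
        dpl_joblib = simulate_dipole(net)[0]

    # attribute name assumed here; compare whatever the Dipole class exposes
    assert_allclose(dpl_mpi.dpl['agg'], dpl_joblib.dpl['agg'])
```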
The intention of this is to actually check the dipole and spiking data with MPI to make sure that it matches the sequential version. Each of the three tests has value in confirming correct results. This will need some adjustment when testing multiple trials, but getting consistent results with MPI vs. joblib will be important. Perhaps in that PR `test_mpi` and `test_joblib` will run 3 trials and `test_hnn_core` can still run one trial.
I'm a bit afraid that we'll start hitting the CI time limits if we don't optimize tests from the beginning. If we check the outputs against each other on a smaller network, it's legit, no?
Again, this could be another PR since this one is already too big. Let's just keep track of the issues somewhere.
The limit for Travis is 40 minutes, but starting small makes sense. Can we delay making reduced tests until we can verify equal results with HNN once this PR and another for seeding of trials have been implemented?
sure! let's just open issues. It will also be helpful to have some of these issues for new contributors at coding sprints, for onboarding them into the development cycle.
No, it is set in
I think it depends on the use case. MPI is different from embarrassingly parallel execution, in that if one proc slows down, it necessarily slows the whole job down because of synchronization barriers. For this reason, MPI is very careful about how processes are assigned to cores, and will even change the processor scheduling if you assign more procs than there are cores, for example. Similarly, when some cores are unused, it can create an asymmetric distribution that may be significantly slower than the balanced case. It's usually best to just let it run on all CPU cores at once. Linux/Mac is relatively forgiving if processors are oversubscribed, and the user should expect a slowdown if they start a simulation -- this isn't a background process.
Codecov Report
@@ Coverage Diff @@
## master #79 +/- ##
==========================================
+ Coverage 89.92% 91.04% +1.11%
==========================================
Files 17 18 +1
Lines 1708 1976 +268
==========================================
+ Hits 1536 1799 +263
- Misses 172 177 +5
Continue to review full report at Codecov.
@jasmainak Please see the latest 2 commits which address the global variable naming and the API for using backends. Now n_jobs is not specified with The case when
Only use 'n_jobs' when instantiating JoblibBackend. This also removes the need to test for trying to run MPIBackend with n_jobs > 1. Update plot_simulate_evoked.py example to use JoblibBackend as well as MPIBackend.
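In other words, after these commits the example reads roughly like this (a sketch based on the diff hunks quoted below; exact keyword names and defaults may differ):

```python
from hnn_core import Network, simulate_dipole, JoblibBackend, MPIBackend

net = Network(params)  # params loaded as in plot_simulate_evoked.py

# joblib backend: the two trials run as separate parallel jobs
with JoblibBackend(n_jobs=2):
    dpls = simulate_dipole(net, n_trials=2)

# MPI backend: each trial is split across n_procs cores using NEURON's
# ParallelContext; the 'mpiexec' launcher (OpenMPI) must be installed
with MPIBackend(n_procs=2, mpi_cmd='mpiexec'):
    dpls = simulate_dipole(net, n_trials=2)
```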
Force-pushed from e9bc8e3 to b3aaae2.
examples/plot_simulate_evoked.py
Outdated
net = Network(params)
dpls = simulate_dipole(net, n_jobs=1, n_trials=2)
with JoblibBackend():
Suggested change: `with JoblibBackend():` -> `with JoblibBackend(n_procs=1):`
? is that the API?
No, I don't think that makes sense. I just made a comment on how `n_procs` is a configuration for MPIBackend and `n_jobs` is for JoblibBackend.
yeah okay, they are the same thing, no? You want the arguments to reflect what the original software packages call them? In any case, can you just show the user how to use it for the `JoblibBackend` so that it's documented?
Suggested change: `with JoblibBackend():` -> `with JoblibBackend(n_jobs=1):`
if that's the right way to do it
# start the simulation trial across the number of processors (cores)
# specified by n_procs using MPI. The 'mpiexec' launcher is for
# openmpi, which must be installed on the system
from hnn_core import MPIBackend
I think it's better to do:
from hnn_core.parallel import MPIBackend
but I'm fine pushing this to another PR
Then the user has to know about `.parallel` and the structuring of the files.
no strong feeling. But we shouldn't put everything in the global namespace. We can document some of the stuff in the API docs and through examples, so it's just copy-paste for users.
Okay, I've just been following your lead on what to name things.
Attributes
----------
n_jobs : int
    The number of jobs to start in parallel
type is missing
Not sure what you are suggesting. Do you mean casting `n_jobs` before assigning to `self.n_jobs`?
umm I am just saying you need to document `self.type` in the attributes
It's an int, right?
_BACKEND = self._old_backend

def _clone_and_simulate(self, net, trial_idx):
    # avoid relative lookups after being forked by joblib
why is that?
`_clone_and_simulate` would fail. I just ran it again with a relative lookup:
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― test_joblib ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
r = call_item()
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
return self.func(*args, **kwargs)
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/Users/blake/repos/hnn-core-new/hnn_core/parallel_backends.py", line 88, in _clone_and_simulate
from .neuron import NeuronNetwork, _simulate_single_trial
KeyError: "'__name__' not in globals"
"""
okay fair enough :-)
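For context, the fix the comment above refers to is simply to use an absolute import inside the worker function, roughly (a sketch; the rest of the body is abbreviated):

```python
    def _clone_and_simulate(self, net, trial_idx):
        # use an absolute import: after joblib forks/spawns the worker,
        # the relative form `from .neuron import ...` fails with
        # KeyError: "'__name__' not in globals"
        from hnn_core.neuron import NeuronNetwork, _simulate_single_trial
        ...  # build the NeuronNetwork from `net` and run the trial as before
```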
class NeuronNetwork(object):
    """The NeuronNetwork class.
so I'm just trying to wrap my head around this class. The idea is that this class contains everything in the old Network class that pertains to Neuron software? The advantage is that the new Network class is picklable?
One disadvantage I see with this approach is that it disallows users to pass any NEURON objects that they created. I can see that becoming useful if we wanted to allow more detailed cell morphologies or dynamics, for instance; see my proposal for incorporating the calcium dynamics. Not sure if that's the best approach, but I can see those kinds of APIs being useful in similar scenarios that involve customization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and:
1) network.py is simpler; no reason for a context manager there
2) `NeuronNetwork` can be rebuilt. For running batches of multiple trials on the same network.
3) [...] Then parallel simulation would break.
Edit to 2) above. Really for running multiple trials without respawning nrniv. Running multiple trials on the same network is a different idea that I don't think is implemented yet.
humm okay, can we document this somewhere? Maybe as a new section called Notes in the docstring? See how we do it in MNE
Yeah, this would be helpful to document. Done. There will be more nuances to document as we wrestle with running batch simulations.
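Put differently, the documented split looks roughly like this (an illustrative skeleton only, not the actual class bodies):

```python
class Network(object):
    """Model description only: parameters and network structure, but no
    NEURON objects, so instances stay picklable and can be handed to
    joblib workers or an MPI child process."""


class NeuronNetwork(object):
    """NEURON-side realization of a Network: builds the cells and the
    ParallelContext state. It is rebuilt from a Network inside each
    simulation process and can be reused across trials without
    respawning nrniv."""
```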
hnn_core/parallel_backends.py
Outdated
        The number of jobs to start in parallel
    """
    def __init__(self, n_jobs=1):
        self.type = 'joblib'
This attribute needs to be documented @blakecaldwell with reference to my comment https://github.com/jonescompneurolab/hnn-core/pull/79/files#r454614746. Sorry if that was too ambiguous
Ah! I didn't see it. In fact, that attribute wasn't even necessary. Removed.
I'm going to merge this PR so we can move on with life :-) Thanks heaps @blakecaldwell for the massive effort! 🍻 I'll make a couple of follow-up PRs to tweak a few things.
Use ParallelContext to split computation for each simulation among
processors on a system. This mode is not compatible with joblib's
embarrassingly parallel execution and requires MPI.
Addresses #78