[MRG] Enable MPI parallelism with ParallelContext #79
Conversation
Benchmarks:
(mne) Blakes-MacBook-Pro:hnn-core blake$ time mpiexec -np 4 nrniv -python -mpi examples/parallel_simulate_evoked.py
numprocs=4
NEURON -- VERSION 7.6.7 HEAD (603da174) 2019-04-19
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2018
See http://neuron.yale.edu/neuron/credits
Additional mechanisms from files
mod/ar.mod mod/ca.mod mod/cad.mod mod/cat.mod mod/dipole.mod mod/dipole_pp.mod mod/hh2.mod mod/kca.mod mod/km.mod mod/vecevent.mod
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
real 0m47.036s
user 3m5.664s
sys 0m1.402s
(mne) Blakes-MacBook-Pro:hnn-core blake$ time mpiexec -np 4 nrniv -python -mpi examples/parallel_simulate_evoked.py
numprocs=4
NEURON -- VERSION 7.6.7 HEAD (603da174) 2019-04-19
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2018
See http://neuron.yale.edu/neuron/credits
Additional mechanisms from files
mod/ar.mod mod/ca.mod mod/cad.mod mod/cat.mod mod/dipole.mod mod/dipole_pp.mod mod/hh2.mod mod/kca.mod mod/km.mod mod/vecevent.mod
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
running on 4 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
real 0m46.142s
user 3m2.367s
sys 0m1.281s
(mne) Blakes-MacBook-Pro:hnn-core blake$ time python examples/embarassingly_parallel_simulate_evoked.py
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
running on 1 cores
Simulation time: 0.03 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 10.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 20.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 30.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 40.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 50.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 60.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 70.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 80.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 90.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 100.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 110.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 120.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 130.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 140.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 150.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
Simulation time: 160.0 ms...
real 0m58.095s
user 3m48.320s
sys 0m1.245s
Force-pushed from 9e16101 to a896872.
Tests added. @jasmainak could you please review?
Force-pushed from b3ab76c to bf2b2bb.
@blakecaldwell I'm a bit lost: what does the user have to do to switch between the two backends?
It depends on the arguments to simulate_dipole. Passing n_jobs is only relevant to joblib and n_cores is only relevant to MPI. A combination with each > 1 is not allowed (it will fail with a warning).
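To illustrate the behaviour described here, a rough sketch (argument names are taken from this comment; at this stage of the PR the backend choice still lives on `simulate_dipole`, and later commits move it onto backend objects, so the exact signature may differ):

```python
from hnn_core import Network, simulate_dipole

net = Network(params)  # params loaded as in the examples

# joblib: trials run as embarrassingly parallel jobs
dpls = simulate_dipole(net, n_jobs=4, n_trials=4)

# MPI: each trial is split across cores via NEURON's ParallelContext
dpls = simulate_dipole(net, n_cores=4, n_trials=4)

# asking for both (each > 1) is not allowed and fails with a warning
# dpls = simulate_dipole(net, n_jobs=4, n_cores=4)
```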
Force-pushed from aa9a802 to e213049.
Sorry, of course. I realized that using 2 cores is hard-coded in the example. If you don't provide n_procs when creating the backend, it should use all available cores (all 4). Even if there were 4 cores with hyperthreading, the code should have used all 4 when n_procs is not given. Sound okay?
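A minimal sketch of that default behaviour (assuming the backend simply falls back to the logical core count reported by `multiprocessing`, which this module already imports; `_default_n_procs` is a hypothetical helper name and the committed code may differ):

```python
import multiprocessing


def _default_n_procs(n_procs=None):
    """Return the number of MPI processes to launch."""
    if n_procs is None:
        # no explicit request: use every logical core the machine reports
        n_procs = multiprocessing.cpu_count()
    return n_procs
```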
hnn_core/parallel_backends.py
Outdated
Parameters
----------
n_procs : int | None
    The number of processes processes MPI will use (spread over cores)
processes processes -> processes
should we raise an issue to track these things for another PR?
You're right, I have 2 cores but htop is showing 4 so probably hyperthreading
humm ... why is that? That means
I seem to recall from @agramfort that turning on all cores by default is a bad idea because it doesn't leave any cores available for other programs like a web browser. I am not exactly sure if that was the reasoning. Perhaps the default should be 1, but in the GUI it can be set to use however many you'd like?
n_jobs : int
    The number of jobs to start in parallel (NOT SUPPORTED)
n_procs : int
Suggested change: `n_procs : int` -> `n_procs : int | None`
also say what happens when `n_procs` is `None`
n_procs : int
    The number of processes MPI will use (spread over cores)
mpi_cmd_str : str
    The string of the mpi command with number of procs and options
If the user provides `n_procs` and then it's also provided here, what will happen? Should we raise an error, or will MPI take care of it?
I cleaned up the docstring to explain that one was user-specified, and the other is how many MPI will actually use. NEURON will print how many it is running on, so I think that is sufficient. If the user is seeing a discrepancy, they would be able to refer to the docstring. I don't think it needs to be explained in the output or with a warning.
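Roughly along these lines (a hypothetical docstring fragment; the attribute names are illustrative, not the exact ones committed):

```python
"""
Attributes
----------
n_procs : int
    The number of MPI processes requested by the user.
expected_n_procs : int
    The number of processes MPI will actually use; this can differ from
    ``n_procs``, and NEURON prints the number it ends up running on.
"""
```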
hnn_core/parallel_backends.py
Outdated
import os
import sys
import multiprocessing
python builtins can be global imports
if backend == 'mpi':
    with MPIBackend(n_jobs=n_jobs, n_procs=2, mpi_cmd='mpiexec'):
        dpl = simulate_dipole(net)[0]
Just a thought: since you are doing only a smoke test, we could simulate on a smaller network so that the test is fast.
I would also add a test to check that the output of both backends is the same. Again, a smaller network should be fine.
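Something like the following could work (a sketch only: the test name, the param file path, and the dipole attribute used in the comparison are assumptions, and the network-reduction step is left out):

```python
from numpy.testing import assert_allclose

from hnn_core import read_params, Network, simulate_dipole
from hnn_core import MPIBackend, JoblibBackend


def test_backends_agree():
    """Check that the MPI and joblib backends give the same dipole."""
    params = read_params('default.json')  # placeholder: point at a (reduced) param file
    net = Network(params)

    with MPIBackend(n_procs=2, mpi_cmd='mpiexec'):
        dpl_mpi = simulate_dipole(net)[0]

    with JoblibBackend(n_jobs=1):
        dpl_joblib = simulate_dipole(net)[0]

    # attribute name assumed here; compare whatever the Dipole class exposes
    assert_allclose(dpl_mpi.dpl['agg'], dpl_joblib.dpl['agg'])
```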
The intention of this is to actually check the dipole and spiking data with MPI to make sure that it matches the sequential version. Each of the three tests has value in confirming correct results. This will need some adjustment when testing multiple trials, but getting consistent results with MPI vs. joblib will be important. Perhaps in that PR `test_mpi` and `test_joblib` will run 3 trials and `test_hnn_core` can still run one trial.
I'm a bit afraid that we'll start hitting the CI time limits if we don't optimize tests from the beginning. If we check the outputs against each other on a smaller network, it's legit, no?
Again, this could be another PR since this one is already too big. Let's just keep track of the issues somewhere.
The limit for Travis is 40 minutes, but starting small makes sense. Can we delay making reduced tests until we can verify equal results with HNN once this PR and another for seeding of trials have been implemented?
sure! let's just open issues. It will also be helpful to have some of these issues for new contributors at coding sprints, for onboarding them into the development cycle.
No, it is set in
I think it depends on the use case. MPI is different from embarrassingly parallel execution, in that if one proc slows down, it necessarily slows the whole job down because of synchronization barriers. For this reason, MPI is very careful about how processes are assigned to cores, and will even change the processor scheduling if you assign more procs than there are cores, for example. Similarly, when some cores are unused, it can create an asymmetric distribution that may be significantly slower than the balanced case. It's usually best to just let it run on all CPU cores at once. Linux/Mac is relatively forgiving if processors are oversubscribed, and the user should expect a slowdown if they start a simulation -- this isn't a background process.
Codecov Report
@@ Coverage Diff @@
## master #79 +/- ##
==========================================
+ Coverage 89.92% 91.04% +1.11%
==========================================
Files 17 18 +1
Lines 1708 1976 +268
==========================================
+ Hits 1536 1799 +263
- Misses 172 177 +5
Continue to review full report at Codecov.
@jasmainak Please see the latest 2 commits which address the global variable naming and the API for using backends. Now n_jobs is not specified with The case when
Only use 'n_jobs' when instantiating JoblibBackend. This also removes the need to test for trying to run MPIBackend with n_jobs > 1. Update plot_simulate_evoked.py example to use JoblibBackend as well as MPIBackend.
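In other words, after these commits the example reads roughly like this (a sketch based on the diff hunks quoted below; exact keyword names and defaults may differ):

```python
from hnn_core import Network, simulate_dipole, JoblibBackend, MPIBackend

net = Network(params)  # params loaded as in plot_simulate_evoked.py

# joblib backend: the two trials run as separate parallel jobs
with JoblibBackend(n_jobs=2):
    dpls = simulate_dipole(net, n_trials=2)

# MPI backend: each trial is split across n_procs cores using NEURON's
# ParallelContext; the 'mpiexec' launcher (OpenMPI) must be installed
with MPIBackend(n_procs=2, mpi_cmd='mpiexec'):
    dpls = simulate_dipole(net, n_trials=2)
```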
Force-pushed from e9bc8e3 to b3aaae2.
examples/plot_simulate_evoked.py
Outdated
net = Network(params)
dpls = simulate_dipole(net, n_jobs=1, n_trials=2)
with JoblibBackend():
Suggested change: `with JoblibBackend():` -> `with JoblibBackend(n_procs=1):`
? is that the API?
No, I don't think that makes sense. I just made a comment on how `n_procs` is a configuration for MPIBackend and `n_jobs` is for JoblibBackend.
yeah okay, they are the same thing, no? You want the arguments to reflect what the original software packages call them? In any case, can you just show the user how to use it for the `JoblibBackend` so that it's documented?
Suggested change: `with JoblibBackend():` -> `with JoblibBackend(n_jobs=1):`
if that's the right way to do it
# start the simulation trial across the number of processors (cores)
# specified by n_procs using MPI. The 'mpiexec' launcher is for
# openmpi, which must be installed on the system
from hnn_core import MPIBackend
I think it's better to do:
from hnn_core.parallel import MPIBackend
but I'm fine pushing this to another PR
Then the user has to know about `.parallel` and the structuring of the files.
no strong feeling. But we shouldn't put everything in the global namespace. We can document some of the stuff in the API docs and through examples, so it's just copy-paste for users.
Okay, I've just been following your lead on what to name things.
Attributes
----------
n_jobs : int
    The number of jobs to start in parallel
type is missing
Not sure what you are suggesting. Do you mean casting `n_jobs` before assigning to `self.n_jobs`?
umm I am just saying you need to document `self.type` in the attributes
It's an int, right?
_BACKEND = self._old_backend

def _clone_and_simulate(self, net, trial_idx):
    # avoid relative lookups after being forked by joblib
why is that?
`_clone_and_simulate` would fail. I just ran it again with a relative lookup:
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― test_joblib ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
r = call_item()
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
return self.fn(*self.args, **self.kwargs)
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
return self.func(*args, **kwargs)
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/Users/blake/miniconda3/envs/mne/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/Users/blake/repos/hnn-core-new/hnn_core/parallel_backends.py", line 88, in _clone_and_simulate
from .neuron import NeuronNetwork, _simulate_single_trial
KeyError: "'__name__' not in globals"
"""
okay fair enough :-)
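For context, the fix the comment above refers to is simply to use an absolute import inside the worker function, roughly (a sketch; the rest of the body is abbreviated):

```python
    def _clone_and_simulate(self, net, trial_idx):
        # use an absolute import: after joblib forks/spawns the worker,
        # the relative form `from .neuron import ...` fails with
        # KeyError: "'__name__' not in globals"
        from hnn_core.neuron import NeuronNetwork, _simulate_single_trial
        ...  # build the NeuronNetwork from `net` and run the trial as before
```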
class NeuronNetwork(object):
    """The NeuronNetwork class.
so I'm just trying to wrap my head around this class. The idea is that this class contains everything in the old Network class that pertains to Neuron software? The advantage is that the new Network class is picklable?
One disadvantage I see with this approach is that it disallows users to pass any NEURON objects that they created. I can see that becoming useful if we wanted to allow more detailed cell morphologies or dynamics, for instance; see my proposal for incorporating the calcium dynamics. Not sure if that's the best approach, but I can see those kinds of APIs being useful in similar scenarios that involve customization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and:
1) network.py is simpler; no reason for a context manager there
2) `NeuronNetwork` can be rebuilt. For running batches of multiple trials on the same network.
3) [...] Then parallel simulation would break.
Edit to 2) above. Really for running multiple trials without respawning nrniv. Running multiple trials on the same network is a different idea that I don't think is implemented yet.
humm okay, can we document this somewhere? Maybe as a new section called Notes in the docstring? See how we do it in MNE
Yeah, this would be helpful to document. Done. There will be more nuances to document as we wrestle with running batch simulations.
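Put differently, the documented split looks roughly like this (an illustrative skeleton only, not the actual class bodies):

```python
class Network(object):
    """Model description only: parameters and network structure, but no
    NEURON objects, so instances stay picklable and can be handed to
    joblib workers or an MPI child process."""


class NeuronNetwork(object):
    """NEURON-side realization of a Network: builds the cells and the
    ParallelContext state. It is rebuilt from a Network inside each
    simulation process and can be reused across trials without
    respawning nrniv."""
```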
hnn_core/parallel_backends.py
Outdated
        The number of jobs to start in parallel
    """
    def __init__(self, n_jobs=1):
        self.type = 'joblib'
This attribute needs to be documented @blakecaldwell with reference to my comment https://github.com/jonescompneurolab/hnn-core/pull/79/files#r454614746. Sorry if that was too ambiguous
Ah! I didn't see it. In fact, that attribute wasn't even necessary. Removed.
I'm going to merge this PR so we can move on with life :-) Thanks heaps @blakecaldwell for the massive effort! 🍻 I'll make a couple of follow-up PRs to tweak a few things.
Use ParallelContext to split computation for each simulation among
processors on a system. This mode is not compatible with joblib's
embarrassingly parallel execution and requires MPI.
Addresses #78