Inquiry on implementation of parallelism on the cluster #208

Closed
hainingpan opened this issue Aug 23, 2019 · 3 comments

@hainingpan

I intend to use multiple cores on the cluster so I read the documentation https://adaptive.readthedocs.io/en/latest/tutorial/tutorial.parallelism.html#mpi4py-futures-mpipoolexecutor
What confuses me is the number of nodes to specify in mpiexec -n. In the documentation, it says mpiexec -n 16 python -m mpi4py.futures run_learner.py. I am just wondering that why it is 16 instead of 1. (As it says 1 in 'On your laptop/desktop you can run this script like: mpiexec -n 1 python run_learner.py ', is it just assuming laptop/desktop computer only has one core? ) By specifying 16, will this become an issue that multiple instances are created on different cores, so that the adaptive.Runner() on each core is actually being executed simultaneously?(if so, it doesn't seem to benefit from parallelism since every core is doing the redundant same thing.) But I thought the expected behavior would be: there are multiple processes but only one is the master process. The other process will take jobs from the process pool. So I guess the code will be like:

if rank == 0:  # assume the master process has rank == 0
    learner = adaptive.Learner1D(...)
    runner = adaptive.Runner(...)
else:
    # the workers that take jobs from the parallel pool,
    # but I am not sure how to implement this part
    ...

So how do I know in which mode the Runner behaves, and whether it actually benefits from parallelism?

@hainingpan
Author

To make my question clearer, please look at this example code in the documentation:

import adaptive
from mpi4py.futures import MPIPoolExecutor

# `f` and `fname` are defined earlier in the tutorial
learner = adaptive.Learner1D(f, bounds=(-1, 1))

# load the data
learner.load(fname)

# run until `goal` is reached with an `MPIPoolExecutor`
runner = adaptive.Runner(
    learner,
    executor=MPIPoolExecutor(),
    shutdown_executor=True,
    goal=lambda l: l.loss() < 0.01,
)

# periodically save the data (in case the job dies)
runner.start_periodic_saving(dict(fname=fname), interval=600)

# block until runner goal reached
runner.ioloop.run_until_complete(runner.task)

This code does not designate any process as master or worker. When all the parallel processes reach the line runner.start_periodic_saving(dict(fname=fname), interval=600), will each process just try to save the data it has? If so, won't that become a mess? On a distributed cluster the cores usually do not share memory, so how can one process know what the others hold in their memory? The process that saves last would simply overwrite the earlier saves with whatever data it happens to possess. I would expect the ideal scenario to be that only one process does the saving, even though all of them are involved in the Runner. But since no process here is assigned to be the master (the one that does the saving), how is that possible?

@basnijholt
Member

basnijholt commented Aug 26, 2019

When you start your process like: mpiexec -n 16 python -m mpi4py.futures run_learner.py it means that 15 cores will go to the MPIPoolExecutor() and 1 core will be for the local process, the one in which the learner and runner communicate.

I am not entirely sure why it works with -n 1 on your laptop; @dalcinl (the author of mpi4py) wrote about that here. Edit: you can actually use export MPI4PY_MAX_WORKERS=15 or -n 16!

More is explained here.

Note that this MPIPoolExecutor is only useful when your code doesn't use MPI already!
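
For concreteness, here is a small sanity-check sketch (not from the adaptive docs; the script name check_pool.py and the whoami helper are made up for illustration) that shows this split when launched with mpiexec -n 16 python -m mpi4py.futures check_pool.py: rank 0 of COMM_WORLD runs the script below, and the remaining 15 ranks only serve the pool.

from mpi4py import MPI
from mpi4py.futures import MPIPoolExecutor


def whoami(_):
    # runs on a pool worker: report its rank in COMM_WORLD and its host name
    return MPI.COMM_WORLD.Get_rank(), MPI.Get_processor_name()


if __name__ == "__main__":
    # under `-m mpi4py.futures` only the master (rank 0) executes this block
    print("master rank:", MPI.COMM_WORLD.Get_rank())
    with MPIPoolExecutor() as executor:
        ranks = sorted({rank for rank, _host in executor.map(whoami, range(100))})
        print("worker ranks:", ranks)  # expected: 1 through 15 for `-n 16`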

@dalcinl

dalcinl commented Aug 26, 2019

Just to clarify it again: when you run your code this way on your laptop:

mpiexec -n 1 python run_learner.py # note there is no '-m mpi4py.futures' arg

then mpi4py.futures will spawn new workers at runtime using an MPI feature called dynamic process management, which has been part of the MPI standard since 1998. You can use export MPI4PY_MAX_WORKERS=15 to control the number of workers, or you can request a specific number of workers in your code with MPIPoolExecutor(num_workers). If you create many executors, each will have its own pool of workers, with each pool isolated from the others.
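
For example (a minimal, hypothetical sketch of this spawn-based mode, independent of adaptive; max_workers is the keyword argument corresponding to the worker count):

from mpi4py.futures import MPIPoolExecutor

if __name__ == "__main__":
    # spawn exactly 15 workers at runtime; equivalent to exporting MPI4PY_MAX_WORKERS=15
    with MPIPoolExecutor(max_workers=15) as executor:
        squares = list(executor.map(pow, range(10), [2] * 10))
        print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Launched as mpiexec -n 1 python example.py (no -m mpi4py.futures), this starts one master process plus the dynamically spawned workers.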

Unfortunately, we are in 2019, and clusters/supercomputers still fail to provide support for dynamic process management, or they make it quite cumbersome to set up. So you cannot (or cannot easily) spawn new workers once the MPI execution has started. To alleviate this mess, mpi4py.futures provides an alternative execution mechanism: you start all P processes as part of the MPI execution, and mpi4py.futures then splits COMM_WORLD into one master and P-1 workers. To use this alternative execution mechanism you have to run this way (BTW, it also works on your laptop):

mpiexec -n $P python -m mpi4py.futures run_learner.py # note the `-m mpi4py.futures` flag

A minor annoyance of this execution mode is that if your code creates many executors, all of them will share the same pool of P-1 workers.
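
To illustrate that last point, a hedged sketch (hypothetical script, not from the docs): under mpiexec -n $P python -m mpi4py.futures shared_pool.py, both executors below draw from the same pool of P-1 workers rather than getting separate processes.

from mpi4py.futures import MPIPoolExecutor

if __name__ == "__main__":
    ex1 = MPIPoolExecutor()
    ex2 = MPIPoolExecutor()
    # tasks submitted to either executor are served by the single shared pool
    print(ex1.submit(pow, 2, 10).result())  # 1024
    print(ex2.submit(pow, 3, 3).result())   # 27
    ex1.shutdown()
    ex2.shutdown()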
