MPI parallel %px bug #1803

Closed · twmr opened this issue May 31, 2012 · 2 comments

twmr commented May 31, 2012

When trying to run the psum.py example on an ipcluster with 4 nodes/engines with the following input (in contrast to the documented example, the array a is different on each engine in my case):

view.scatter('a', np.linspace(0.0, 15.0, 16))
%px s = psum(a)

view['s'] returns [6.0, 22.0, 38.0, 54.0] instead of the expected output [120,120,120,120]
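
For reference, the full client-side session looks roughly like this (the Client/DirectView setup is sketched from memory; names like rc are only illustrative):

from IPython.parallel import Client    # IPython 0.12/0.13-era import path
import numpy as np

rc = Client(profile='mpi')             # connect to the running ipcluster
view = rc[:]                           # DirectView over all 4 engines
view.block = True                      # keep the sketch synchronous for simplicity
view.execute('from psum import psum')  # make psum available on every engine
view.scatter('a', np.linspace(0.0, 15.0, 16))  # each engine gets a 4-element slice
view.execute('s = psum(a)')            # programmatic equivalent of the %px line above
print view['s']   # expect [120.0]*4 with a shared MPI world; I get [6.0, 22.0, 38.0, 54.0]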

However, if I run the psum example manually from the shell (mpirun -np 4 python psum.py), the output is correct. The psum.py file used for this test contains:

from mpi4py import MPI
import numpy as np

def psum(a):
    # local partial sum, then an Allreduce so every rank ends up with the total
    s = np.sum(a)
    rcvBuf = np.array(0.0, 'd')
    MPI.COMM_WORLD.Allreduce([s, MPI.DOUBLE],
                             [rcvBuf, MPI.DOUBLE],
                             op=MPI.SUM)
    return rcvBuf

if __name__ == '__main__':
    # each rank builds its own 4-element slice of 0..15
    rank = MPI.COMM_WORLD.Get_rank()
    a = np.linspace(rank*4.0, (rank+1)*4.0-1, 4)
    print "a:", a
    result = psum(a)
    print "result:", result

When run with mpirun, it outputs:

a: [ 4.  5.  6.  7.]
a: [  8.   9.  10.  11.]
a: [ 12.  13.  14.  15.]
a: [ 0.  1.  2.  3.]
result: 120.0
result: 120.0
result: 120.0
result: 120.0

Any ideas?

twmr commented May 31, 2012

I think the problem is that MPI.COMM_WORLD.Get_size() returns 1 inside the notebook, i.e., the MPI rank of every engine is 0:

view.execute('import os; from mpi4py import MPI')
ar = view.execute('print "CHECK",os.getpid(), MPI.COMM_WORLD.Get_rank()')
for i in range(len(view)):
    print ar.metadata[i]['stdout'].strip()

CHECK 10692 0
CHECK 10693 0
CHECK 10698 0
CHECK 10704 0
CHECK 10709 0
CHECK 10719 0
CHECK 10729 0
CHECK 10739 0
CHECK 10748 0
CHECK 10753 0

For this test I started the cluster with the command ipcluster start --profile=mpi -n 10
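
A quick way to double-check this (just a sketch along the same lines as above):

view.execute('size = MPI.COMM_WORLD.Get_size()')
print view['size']   # gives [1]*10 here; with a real MPI launch it should be [10]*10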

BTW, the parallel machinery, and especially the extended AsyncResult interface, is awesome!! Thumbs up!

twmr commented May 31, 2012

Damn, I should have read the docs more carefully:

and edit the file IPYTHONDIR/profile_mpi/ipcluster_config.py.

There, instruct ipcluster to use the MPI launchers by adding the lines:

c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'
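
For completeness, the relevant part of my profile_mpi/ipcluster_config.py now looks roughly like this (the get_config() line is the standard config-file boilerplate, not quoted from the docs):

c = get_config()

# Launch the engine set via MPI (mpiexec) so all engines share one MPI_COMM_WORLD
c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'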

This fixed the issue!
