MPI parallel %px bug #1803

Closed · twmr opened this issue May 31, 2012 · 2 comments

twmr commented May 31, 2012

When trying to run the psum.py example on an ipcluster with 4 nodes/engines with the following input (in contrast to the documented example, the array a is different on each engine in my case):

view.scatter('a', np.linspace(0.0, 15.0, 16))
%px s = psum(a)

view['s'] returns [6.0, 22.0, 38.0, 54.0] instead of the expected output [120,120,120,120]
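
For reference, the full client-side session looks roughly like this (the Client/DirectView setup is sketched from memory; names like rc are only illustrative):

from IPython.parallel import Client    # IPython 0.12/0.13-era import path
import numpy as np

rc = Client(profile='mpi')             # connect to the running ipcluster
view = rc[:]                           # DirectView over all 4 engines
view.block = True                      # keep the sketch synchronous for simplicity
view.execute('from psum import psum')  # make psum available on every engine
view.scatter('a', np.linspace(0.0, 15.0, 16))  # each engine gets a 4-element slice
view.execute('s = psum(a)')            # programmatic equivalent of the %px line above
print view['s']   # expect [120.0]*4 with a shared MPI world; I get [6.0, 22.0, 38.0, 54.0]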

However, if I run the psum example manually from the shell (mpirun -np 4 python psum.py), the output is correct. The psum.py file used for this test contains:

from mpi4py import MPI
import numpy as np

def psum(a):
    # local partial sum, then an Allreduce so every rank ends up with the total
    s = np.sum(a)
    rcvBuf = np.array(0.0, 'd')
    MPI.COMM_WORLD.Allreduce([s, MPI.DOUBLE],
                             [rcvBuf, MPI.DOUBLE],
                             op=MPI.SUM)
    return rcvBuf

if __name__ == '__main__':
    # each rank builds its own 4-element slice of 0..15
    rank = MPI.COMM_WORLD.Get_rank()
    a = np.linspace(rank*4.0, (rank+1)*4.0-1, 4)
    print "a:", a
    result = psum(a)
    print "result:", result

When run with mpirun, it outputs:

a: [ 4.  5.  6.  7.]
a: [  8.   9.  10.  11.]
a: [ 12.  13.  14.  15.]
a: [ 0.  1.  2.  3.]
result: 120.0
result: 120.0
result: 120.0
result: 120.0

Any ideas?

twmr commented May 31, 2012

I think the problem is that MPI.COMM_WORLD.Get_size() returns 1 inside the notebook, i.e., the MPI rank of every engine is 0:

view.execute('import os; from mpi4py import MPI')
ar = view.execute('print "CHECK",os.getpid(), MPI.COMM_WORLD.Get_rank()')
for i in range(len(view)):
    print ar.metadata[i]['stdout'].strip()

CHECK 10692 0
CHECK 10693 0
CHECK 10698 0
CHECK 10704 0
CHECK 10709 0
CHECK 10719 0
CHECK 10729 0
CHECK 10739 0
CHECK 10748 0
CHECK 10753 0

For this test I started the cluster with the command ipcluster start --profile=mpi -n 10
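
A quick way to double-check this (just a sketch along the same lines as above):

view.execute('size = MPI.COMM_WORLD.Get_size()')
print view['size']   # gives [1]*10 here; with a real MPI launch it should be [10]*10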

BTW, the parallel machinery, and especially the extended AsyncResult interface, is awesome!! Thumbs up!

twmr commented May 31, 2012

Damn, I should have read the docs more carefully:

and edit the file IPYTHONDIR/profile_mpi/ipcluster_config.py.

There, instruct ipcluster to use the MPI launchers by adding the lines:

c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'
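
For completeness, the relevant part of my profile_mpi/ipcluster_config.py now looks roughly like this (the get_config() line is the standard config-file boilerplate, not quoted from the docs):

c = get_config()

# Launch the engine set via MPI (mpiexec) so all engines share one MPI_COMM_WORLD
c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'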

This fixed the issue!
