Closed
@bosilca In testing the btl_tcp_progress_thread=1 functionality on master and v2.x today, runs usually (but not always) hang when the progress thread is enabled, even with a simple MPI ring program.
On master, when I run the example ring_c program across 2 servers with np=2, it runs fine 10 times out of 10. With np=4, it hangs 10 times out of 10 (I restricted the run to a single IP interface to keep the test case simple):
$ mpirun -np 4 --map-by node --mca btl_tcp_if_include 10.3.0.1/16 --mca btl_tcp_progress_thread 1 --mca btl tcp,sm,self ring_c
Process 0 sending 10 to 1, tag 201 (4 processes in ring)
Process 0 sent to 1
...hang...
I see the same behavior on the head of v2.x.
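For anyone who wants to repeat the "10 times out of 10" count without watching each run, here is a rough sketch of a helper that reruns a command under a time limit and counts how many runs had to be killed. The function name count_hangs and the time limit are my own assumptions, not anything from the Open MPI tree; it relies on GNU coreutils timeout, which exits with status 124 when the limit is hit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): run a command repeatedly
# under a time limit and print how many runs timed out.
# Usage: count_hangs RUNS LIMIT_SECONDS COMMAND [ARGS...]
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # GNU timeout exits with 124 when it had to kill the command.
        timeout "$limit" "$@" >/dev/null 2>&1
        if [ "$?" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}
```

With this defined, something like count_hangs 10 60 mpirun -np 4 --map-by node --mca btl_tcp_if_include 10.3.0.1/16 --mca btl_tcp_progress_thread 1 --mca btl tcp,sm,self ring_c would print the number of runs that exceeded the limit. The 60-second limit is a guess; a run that is merely slow would be counted as a hang, so pick a limit well above the normal runtime.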
@hppritcha It looks like we neglected to mention this feature in v2.0.0 NEWS, so we're probably not in much danger for v2.0.0.