Closed
open-mpi/ompi-release
#1249
Description
With the latest v2.x branch, the collcomp test case from the ANL thread-tests hangs when run across two Power 8 nodes with 16 threads. Here's the stack:
Stack trace(s) for thread: 13
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: -----------------
2: Branch 1 of 2, 1 process, 50% of total [0]
2: -----------------
2: runfunc() at collcomp.c:140
2: -----------------
2: Branch 2 of 2, 1 process, 50% of total [1]
2: -----------------
2: runfunc() at collcomp.c:135
Stack trace(s) for thread: 14
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: -----------------
2: Branch 1 of 2, 1 process, 50% of total [0]
2: -----------------
2: runfunc() at collcomp.c:135
2: -----------------
2: Branch 2 of 2, 1 process, 50% of total [1]
2: -----------------
2: runfunc() at collcomp.c:147
Stack trace(s) for thread: 15
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: -----------------
2: Branch 1 of 2, 1 process, 50% of total [0]
2: -----------------
2: runfunc() at collcomp.c:140
2: -----------------
2: Branch 2 of 2, 1 process, 50% of total [1]
2: -----------------
2: runfunc() at collcomp.c:136
Stack trace(s) for thread: 16
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: runfunc() at collcomp.c:136
Stack trace(s) for thread: 17
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: runfunc() at collcomp.c:98
3: PMPI_Allreduce() at pallreduce.c:107
4: ompi_coll_tuned_allreduce_intra_dec_fixed() at /u/jnysal/src/mirror-ompi-release/ompi/mca/coll/tuned/coll_tuned_$
ecision_fixed.c:77
5: -----------------
5: Branch 1 of 2, 1 process, 50% of total [0]
5: -----------------
5: ompi_coll_base_allreduce_intra_ring_segmented() at /u/jnysal/src/mirror-ompi-release/ompi/mca/coll/base/coll_ba$
e_allreduce.c:795
6: ompi_request_default_wait() at /u/jnysal/src/mirror-ompi-release/ompi/request/req_wait.c:41
7: ompi_request_wait_completion() at /u/jnysal/src/mirror-ompi-release/ompi/request/request.h:389
5: -----------------
5: Branch 2 of 2, 1 process, 50% of total [1]
5: -----------------
5: ompi_coll_base_allreduce_intra_ring_segmented() at /u/jnysal/src/mirror-ompi-release/ompi/mca/coll/base/coll_ba$
e_allreduce.c:749
6: mca_pml_ob1_send() at /u/jnysal/src/mirror-ompi-release/ompi/mca/pml/ob1/pml_ob1_isend.c:265
7: ompi_request_wait_completion() at /u/jnysal/src/mirror-ompi-release/ompi/request/request.h:385
8: sync_wait_mt() at /u/jnysal/src/mirror-ompi-release/opal/threads/wait_sync.c:72
9: opal_progress() at /u/jnysal/src/mirror-ompi-release/opal/runtime/opal_progress.c:216
Stack trace(s) for thread: 18
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: progress_engine() at /u/jnysal/src/mirror-ompi-release/opal/runtime/opal_progress_threads.c:105
3: opal_libevent2022_event_base_loop() at /u/jnysal/src/mirror-ompi-release/opal/mca/event/libevent2022/libevent/event.c:1630
4: poll_dispatch() at /u/jnysal/src/mirror-ompi-release/opal/mca/event/libevent2022/libevent/poll.c:165
5: poll() at ?:?
Stack trace(s) for thread: 19
1: -----------------
1: Branch 1 of 1, 2 processes, 100% of total [0-1]
1: -----------------
1: start_thread() at ?:?
2: progress_engine() at /u/jnysal/src/mirror-ompi-release/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:49
3: opal_libevent2022_event_base_loop() at /u/jnysal/src/mirror-ompi-release/opal/mca/event/libevent2022/libevent/event.c:1630
4: epoll_dispatch() at /u/jnysal/src/mirror-ompi-release/opal/mca/event/libevent2022/libevent/epoll.c:407
5: epoll_wait() at ?:?
I was able to reproduce it with the TCP BTL, but it doesn't seem transport-specific. The complete stacks from all threads are not pasted, as the remaining threads are just in the compute loop. Thread 17's stack seems to be the only relevant one.