piotrc@machine:~/ws/oneCCL/build$ CCL_LOG_LEVEL=debug CCL_WORKER_COUNT=2 mpirun -np 2 examples/cpu/cpu_allreduce_test
2024:01:16-14:54:14:(580975) |CCL_INFO| process launcher: hydra, local_proc_idx: 1, local_proc_count: 2
2024:01:16-14:54:14:(580975) |CCL_DEBUG| ofi_api_wrapper.cpp:55 get_ofi_lib_path: OFI lib path (MPI_ROOT/opt/mpi/libfabric/lib/): /localdata/piotrc/oneCCL/build/_install/opt/mpi/libfabric/lib/libfabric.so
2024:01:16-14:54:14:(580974) |CCL_INFO| process launcher: hydra, local_proc_idx: 0, local_proc_count: 2
2024:01:16-14:54:14:(580974) |CCL_DEBUG| ofi_api_wrapper.cpp:55 get_ofi_lib_path: OFI lib path (MPI_ROOT/opt/mpi/libfabric/lib/): /localdata/piotrc/oneCCL/build/_install/opt/mpi/libfabric/lib/libfabric.so
2024:01:16-14:54:14:(580975) |CCL_DEBUG| mpi_api_wrapper.cpp:40 mpi_api_init: MPI lib path: libmpi.so.12
2024:01:16-14:54:14:(580974) |CCL_DEBUG| mpi_api_wrapper.cpp:40 mpi_api_init: MPI lib path: libmpi.so.12
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:174 get_lib_attr: MPI version: Intel(R) MPI Library 2021.11 for Linux* OS library kind: release
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:201 get_lib_attr: version_substr: 2021.11 for Linux* OS library kind: release
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:214 get_lib_attr: MPI numerical version: 2021
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:238 get_lib_attr: kind_substr: release
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:275 get_lib_attr: set lib_attr.type = impi, version 2021, minimal expected version 2019
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:283 get_lib_attr: set lib_attr.hmem = 1, version 2021, minimal expected hmem version 2021
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:295 get_lib_attr: MPI library type: impi
2024:01:16-14:54:14:(580975) |CCL_DEBUG| atl_mpi_ctx.cpp:526 set_env: set CCL-MPI specific environment
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:174 get_lib_attr: MPI version: Intel(R) MPI Library 2021.11 for Linux* OS library kind: release
2024:01:16-14:54:14:(580975) |CCL_INFO| OS info: { Linux machine 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 }
2024:01:16-14:54:14:(580975) |CCL_DEBUG| datatype.cpp:69 ccl_datatype_storage: create datatype_storage
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:201 get_lib_attr: version_substr: 2021.11 for Linux* OS library kind: release
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:214 get_lib_attr: MPI numerical version: 2021
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:238 get_lib_attr: kind_substr: release
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:275 get_lib_attr: set lib_attr.type = impi, version 2021, minimal expected version 2019
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:283 get_lib_attr: set lib_attr.hmem = 1, version 2021, minimal expected hmem version 2021
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:295 get_lib_attr: MPI library type: impi
2024:01:16-14:54:14:(580974) |CCL_DEBUG| atl_mpi_ctx.cpp:526 set_env: set CCL-MPI specific environment
2024:01:16-14:54:14:(580974) |CCL_INFO| OS info: { Linux machine 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 }
2024:01:16-14:54:14:(580974) |CCL_DEBUG| datatype.cpp:69 ccl_datatype_storage: create datatype_storage
2024:01:16-14:54:14:(580974) |CCL_DEBUG| hwloc_wrapper.cpp:66 ccl_hwloc_wrapper: hwloc root object: type: Machine attr: total=527872264KB :: DMIProductName="PowerEdge R6525" :: DMIProductVersion= :: DMIBoardVendor="Dell Inc." :: DMIBoardName=0DMD2T :: DMIBoardVersion=A02 :: DMIChassisVendor="Dell Inc." :: DMIChassisType=23 :: DMIChassisVersion= :: DMIChassisAssetTag= :: DMIBIOSVendor="Dell Inc." :: DMIBIOSVersion=2.8.4 :: DMIBIOSDate=06/23/2022 :: DMISysVendor="Dell Inc." :: Backend=Linux :: LinuxCgroup=/ :: OSName=Linux :: OSRelease=5.15.0-79-generic :: OSVersion="#86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023" :: HostName=machine :: Architecture=x86_64 :: hwlocVersion=2.9.3rc2-git :: ProcessName=cpu_allreduce_test cpuset: 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff nodeset: 0xff
2024:01:16-14:54:14:(580975) |CCL_DEBUG| hwloc_wrapper.cpp:66 ccl_hwloc_wrapper: hwloc root object: type: Machine attr: total=527872264KB :: DMIProductName="PowerEdge R6525" :: DMIProductVersion= :: DMIBoardVendor="Dell Inc." :: DMIBoardName=0DMD2T :: DMIBoardVersion=A02 :: DMIChassisVendor="Dell Inc." :: DMIChassisType=23 :: DMIChassisVersion= :: DMIChassisAssetTag= :: DMIBIOSVendor="Dell Inc." :: DMIBIOSVersion=2.8.4 :: DMIBIOSDate=06/23/2022 :: DMISysVendor="Dell Inc." :: Backend=Linux :: LinuxCgroup=/ :: OSName=Linux :: OSRelease=5.15.0-79-generic :: OSVersion="#86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023" :: HostName=machine :: Architecture=x86_64 :: hwlocVersion=2.9.3rc2-git :: ProcessName=cpu_allreduce_test cpuset: 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff nodeset: 0xff
[0] MPI startup(): Intel(R) MPI Library, Version 2021.11 Build 20231005 (id: 74c4a23)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.18.1-impi
[0] MPI startup(): libfabric provider: psm3
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/localdata/piotrc/oneCCL/build/_install/opt/mpi/etc/tuning_generic_ofi.dat"
[0] MPI startup(): ===== Nic pinning on machine =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0 rocep161s0f1
[0] MPI startup(): 1 rocep161s0f1
[0] MPI startup(): THREAD_SPLIT mode is switched on, 2 endpoints in use
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 580974 machine {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239}
[0] MPI startup(): 1 580975 machine {48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255}
2024:01:16-14:54:15:(580974) |CCL_DEBUG| internal_kvs.cpp:325 fill_local_host_ip: use ipv4: 10.1.3.101
2024:01:16-14:54:15:(580974) |CCL_DEBUG| communicator_impl.hpp:115 create_communicator: size 2, rank 0
2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2
2024:01:16-14:54:15:(580975) |CCL_DEBUG| internal_kvs.cpp:325 fill_local_host_ip: use ipv4: 10.1.3.101
2024:01:16-14:54:15:(580975) |CCL_DEBUG| communicator_impl.hpp:115 create_communicator: size 2, rank 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:66 init: MPI was initialized externaly 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:66 init: MPI was initialized externaly 2024:01:16-14:54:15:(580974) |CCL_INFO| atl-mpi: { is_external_init: 1 mpi_lib_attr.type: impi mpi_lib_attr.hmem: 1 extra_ep: 0 mnic_type: none progress_mode: 1 sync_coll: 0 } 2024:01:16-14:54:15:(580974) |CCL_INFO| atl attrs: { in: { shm: 0, hmem: 0, sync_coll: 0, extra_ep: 0, ep_count: 2, mnic_type: none, mnic_count: 2, mnic_offset: none } out: { shm: 0, hmem: 0, mnic_type: none, mnic_count: 1, tag_bits: 32, max_tag: 1073741823 } } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 0 12024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 0 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 0 2024:01:16-14:54:15:(580974) |CCL_INFO| start workers for local process [0:2] 2024:01:16-14:54:15:(580975) |CCL_DEBUG| 
atl_base_comm.cpp:132 create_comm_id: new_comm_id 0 2024:01:16-14:54:15:(580975) |CCL_INFO| start workers for local process [1:2] 2024:01:16-14:54:15:(580974) |CCL_DEBUG| env.cpp:939 env_2_worker_affinity_auto: available_cores x0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239 2024:01:16-14:54:15:(580975) |CCL_DEBUG| env.cpp:939 env_2_worker_affinity_auto: available_cores x48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255 2024:01:16-14:54:15:(580974) |CCL_DEBUG| queue.cpp:61 ccl_sched_queue: created sched_queue, idx 0, atl_eps count 1, atl_eps[0] 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| base_thread.cpp:21 start: worker 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| queue.cpp:61 ccl_sched_queue: created sched_queue, idx 0, atl_eps count 1, atl_eps[0] 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| base_thread.cpp:21 start: worker 0 2024:01:16-14:54:15:(580981) |CCL_DEBUG| worker.cpp:322 ccl_worker_func: worker: idx: 0, cpu: 254, numa: {os_idx: 7, memory: 64491 MB, cores: 16, cpus: 32, membind: 1} 2024:01:16-14:54:15:(580981) |CCL_DEBUG| hwloc_wrapper.cpp:202 membind_thread: bound thread to NUMA node 7 2024:01:16-14:54:15:(580975) |CCL_DEBUG| exec.cpp:139 start_workers: 
started worker: local_proc_idx 1, worker_idx 0, cpu: 254, numa: 7 2024:01:16-14:54:15:(580975) |CCL_DEBUG| queue.cpp:61 ccl_sched_queue: created sched_queue, idx 1, atl_eps count 1, atl_eps[0] 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| base_thread.cpp:21 start: worker 1 2024:01:16-14:54:15:(580982) |CCL_DEBUG| worker.cpp:322 ccl_worker_func: worker: idx: 0, cpu: 238, numa: {os_idx: 6, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} 2024:01:16-14:54:15:(580982) |CCL_DEBUG| hwloc_wrapper.cpp:202 membind_thread: bound thread to NUMA node 6 2024:01:16-14:54:15:(580974) |CCL_DEBUG| exec.cpp:139 start_workers: started worker: local_proc_idx 0, worker_idx 0, cpu: 238, numa: 6 2024:01:16-14:54:15:(580974) |CCL_DEBUG| queue.cpp:61 ccl_sched_queue: created sched_queue, idx 1, atl_eps count 1, atl_eps[0] 1 2024:01:16-14:54:15:(580974) |CCL_DEBUG| base_thread.cpp:21 start: worker 1 2024:01:16-14:54:15:(580983) |CCL_DEBUG| worker.cpp:322 ccl_worker_func: worker: idx: 1, cpu: 255, numa: {os_idx: 7, memory: 64491 MB, cores: 16, cpus: 32, membind: 1} 2024:01:16-14:54:15:(580983) |CCL_DEBUG| hwloc_wrapper.cpp:202 membind_thread: bound thread to NUMA node 7 2024:01:16-14:54:15:(580975) |CCL_DEBUG| exec.cpp:139 start_workers: started worker: local_proc_idx 1, worker_idx 1, cpu: 255, numa: 7 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 0 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580984) |CCL_DEBUG| worker.cpp:322 ccl_worker_func: worker: idx: 1, cpu: 239, numa: {os_idx: 6, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} 2024:01:16-14:54:15:(580984) |CCL_DEBUG| hwloc_wrapper.cpp:202 membind_thread: bound thread to NUMA node 6 2024:01:16-14:54:15:(580974) |CCL_DEBUG| exec.cpp:139 start_workers: started worker: local_proc_idx 0, worker_idx 1, cpu: 239, numa: 6 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 0 1 
2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 1 2024:01:16-14:54:15:(580974) |CCL_INFO| library version: Gold-2021.11.2 2024-01-16T 11:37:55Z (master/8d18c7b) 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 1 2024:01:16-14:54:15:(580975) |CCL_INFO| local process [1:2]: worker: 0, cpu: 254, numa: 7 2024:01:16-14:54:15:(580975) |CCL_INFO| local process [1:2]: worker: 1, cpu: 255, numa: 7 2024:01:16-14:54:15:(580974) |CCL_INFO| specification version: 1.0 2024:01:16-14:54:15:(580974) |CCL_INFO| build mode: release 2024:01:16-14:54:15:(580974) |CCL_INFO| C compiler: GNU 11.4.0 2024:01:16-14:54:15:(580974) |CCL_INFO| C++ compiler: GNU 11.4.0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| topo_manager.cpp:1214 build_host_info: rank: 1, size: 2, host: machine 2024:01:16-14:54:15:(580974) |CCL_INFO| hwloc initialized: 1 { membind_thread_supported: 1 numa: {os_idx: 0, memory: 64049 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 1, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 2, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 3, memory: 64491 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 4, memory: 64455 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 5, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 6, memory: 64503 MB, cores: 16, cpus: 32, membind: 1} numa: {os_idx: 7, memory: 64491 MB, cores: 16, cpus: 32, membind: 1} } 2024:01:16-14:54:15:(580974) |CCL_INFO| local process [0:2]: worker: 0, cpu: 238, numa: 6 2024:01:16-14:54:15:(580974) |CCL_INFO| local process [0:2]: worker: 1, cpu: 239, numa: 6 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_WORKER_COUNT: 2 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_WORKER_OFFLOAD: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_WORKER_WAIT: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_LOG_LEVEL: 
debug 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ABORT_ON_THROW: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_QUEUE_DUMP: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_SCHED_DUMP: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_SCHED_PROFILE: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ENTRY_MAX_UPDATE_TIME_SEC: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FRAMEWORK: none 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_TRANSPORT: mpi 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_SHM: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_RMA: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_HMEM: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_SEND_PROXY: none 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ATL_CACHE: 1 2024:01:16-14:54:15:(580974) |CCL_DEBUG| env.cpp:664 print: CCL_ATL_SYNC_COLL: 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| env.cpp:665 print: CCL_ATL_EXTRA_EP: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MNIC: none 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MNIC_NAME: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MNIC_COUNT: 2 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MNIC_OFFSET: none 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALGO_FALLBACK: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLGATHERV: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALL: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALLV: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BARRIER: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BCAST: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_RECV: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_REDUCE: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_REDUCE_SCATTER: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_SEND: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLGATHERV: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALL: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALLV: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_REDUCE: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLGATHERV_SCALEOUT: 
2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_SCALEOUT: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALL_SCALEOUT: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLTOALLV_SCALEOUT: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_REDUCE_SCALEOUT: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_UNORDERED_COLL: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FUSION: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FUSION_BYTES_THRESHOLD: 16384 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FUSION_COUNT_THRESHOLD: 256 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FUSION_CHECK_URGENT: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FUSION_CYCLE_MS: 0.2 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_PRIORITY: none 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_SPIN_COUNT: 1000 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_YIELD: pause 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MAX_SHORT_SIZE: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BCAST_PART_COUNT: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_CACHE_KEY: match_id 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_CACHE_FLUSH: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BUFFER_CACHE: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_STRICT_ORDER: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_STAGING_BUFFER: regular 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_OP_SYNC: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_CHUNK_COUNT: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MIN_CHUNK_SIZE: 65536 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_RS_CHUNK_COUNT: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_RS_MIN_CHUNK_SIZE: 65536 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_NREDUCE_BUFFERING: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_NREDUCE_SEGMENT_SIZE: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_2D_CHUNK_COUNT: 1 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_2D_MIN_CHUNK_SIZE: 65536 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ALLREDUCE_2D_SWITCH_DIMS: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| 
CCL_ALLTOALL_SCATTER_MAX_OPS: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BACKEND: native 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_LOCAL_RANK: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_LOCAL_SIZE: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_PROCESS_LAUNCHER: hydra 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_MPI_LIBRARY_PATH: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_OFI_LIBRARY_PATH: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_PMIX_LIBRARY_PATH: 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ITT_LEVEL: 0 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_BF16: scalar 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_FP16: f16c 2024:01:16-14:54:15:(580974) |CCL_INFO| CCL_ROOT: /localdata/piotrc/oneCCL/build/_install 2024:01:16-14:54:15:(580974) |CCL_INFO| I_MPI_ROOT: /localdata/piotrc/oneCCL/build/_install 2024:01:16-14:54:15:(580974) |CCL_INFO| FI_PROVIDER_PATH: /localdata/piotrc/oneCCL/build/_install/opt/mpi/libfabric/lib/prov:/usr/lib64/libfabric 2024:01:16-14:54:15:(580974) |CCL_INFO| FI_PROVIDER: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: allgatherv selection main table [0 - max]: direct fallback table [0 - max]: flat scaleout table [0 - max]: ring 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: allreduce selection main table [0 - max]: direct fallback table [0 - 8192]: recursive_doubling [8193 - max]: ring scaleout table [0 - max]: ring 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: alltoall selection main table [0 - 1048576]: direct [1048577 - max]: scatter fallback table [0 - max]: scatter scaleout table [0 - max]: scatter 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: alltoallv selection main table [0 - 1048576]: direct [1048577 - max]: scatter fallback table [0 - max]: scatter scaleout table [0 - max]: scatter 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: barrier selection main table [0 - max]: direct fallback table [0 - max]: ring scaleout table 
[0 - max]: direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: bcast selection main table [0 - max]: direct fallback table [0 - max]: naive scaleout table [0 - max]: direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: recv selection main table [0 - max]: direct fallback table [0 - max]: direct scaleout table [0 - max]: direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: reduce selection main table [0 - max]: direct fallback table [0 - max]: tree scaleout table [0 - max]: double_tree 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: reduce_scatter selection main table [0 - max]: direct fallback table [0 - max]: ring scaleout table [0 - max]: ring 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:251 print: send selection main table [0 - max]: direct fallback table [0 - max]: direct scaleout table [0 - max]: direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| topo_manager.cpp:1214 build_host_info: rank: 0, size: 2, host: machine 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { 
bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 1, id: 2 } r2r_comm: {} node_comm: {} even_comm: {} pair_comm: {} env: {} } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color 0, { rank: 0, size: 1, id: 2 } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 1, id: 2 } r2r_comm: {} node_comm: {} even_comm: {} pair_comm: {} env: {} } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color 1, { rank: 0, size: 1, id: 2 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 } 
2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 3 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 3 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 2, id: 3 } r2r_comm: {} node_comm: {} even_comm: {} pair_comm: {} env: {} }2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color 0, { rank: 1, size: 2, id: 3 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color 0, { rank: 0, size: 2, id: 3 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| 
atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 4 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 4 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 4 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 4 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 2, id: 4 } r2r_comm: {} node_comm: {} even_comm: {} pair_comm: {} env: {} } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color -1, { rank: 1, size: 2, id: 4 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color -1, { rank: 0, size: 2, id: 4 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 0, ep_idx 0, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi.cpp:660 comm_split: atl-mpi-ep: 1, ep_idx 1, nic_idx 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init atl, requested ep_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:168 init_tag: atl tag: { bits: 32, max: 1073741823, mask: 1073741823, pof2: 536870912 }2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_mpi_comm.cpp:82 init_transport: init 
atl, requested ep_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:90 create_comm_id: rank2rank_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:91 create_comm_id: rank2proc_map: 2024:01:16-14:54:15:(580974) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 5 2024:01:16-14:54:15:(580975) |CCL_DEBUG| atl_base_comm.cpp:132 create_comm_id: new_comm_id 5 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 5 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:97 ccl_internal_comm: comm.id == explicit_id, reset comm.id 5 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color -1, { rank: 1, size: 2, id: 5 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 2, id: 5 } r2r_comm: {} node_comm: {} even_comm: {} pair_comm: {} env: {} } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:233 create_subcomm: new subcomm: color -1, { rank: 0, size: 2, id: 5 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:147 init: { { rank: 0, size: 2, id: 1 } r2r_comm: { rank: 0, size: 1, id: 2 } node_comm: { rank: 0, size: 2, id: 3 } even_comm: { rank: 0, size: 2, id: 4 } pair_comm: { rank: 0, size: 2, id: 5 } env: {} } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| coll.cpp:115 ccl_coll_create: { param: { coll: allreduce, sb: 0x7ffddc2c04b0, sc: 409600, rb: 0x7ffddc4504b0, rc: 409600, dt: int32, rt: sum, comm: { rank: 0, size: 2 } } attr: { priority: 0, sync: 0, to_cache: 0, match_id: } } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 0, comm_id 1, next sched_id 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| coll.cpp:115 ccl_coll_create: { param: { coll: allreduce, sb: 0x7ffff6f7ae60, sc: 409600, rb: 0x7ffff710ae60, rc: 409600, 
dt: int32, rt: sum, comm: { rank: 1, size: 2 } } attr: { priority: 0, sync: 0, to_cache: 0, match_id: } } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 0, comm_id 1, next sched_id 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| sched.cpp:352 create: didn't find sched, create new one 0x5610c1fa3080, coll allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| sched.cpp:352 create: didn't find sched, create new one 0x5599f38f2f80, coll allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 1, comm_id 1, next sched_id 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 1, comm_id 1, next sched_id 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| parallelizer.cpp:328 process_base: sched 0x5610c1fa3080, coll allreduce, part_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| parallelizer.cpp:328 process_base: sched 0x5599f38f2f80, coll allreduce, part_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 2, comm_id 1, next sched_id 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 3, comm_id 1, next sched_id 4 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 2, comm_id 1, next sched_id 3 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 3, comm_id 1, next sched_id 4 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 1, size: 2 }, buf: 0x7ffff6f7ae60 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 0, size: 2 }, buf: 0x7ffddc2c04b0 } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 1, size: 2 }, buf: 0x7ffff6f7ae60 } 
2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 0, size: 2 }, buf: 0x7ffddc2c04b0 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580975) |CCL_DEBUG| allreduce.cpp:40 ccl_coll_build_direct_allreduce: build direct allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| allreduce.cpp:40 ccl_coll_build_direct_allreduce: build direct allreduce 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:69 create: creating: ALLREDUCE entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:72 create: created: ALLREDUCE, entry: 0x5610c1fa4240, sched: 0x5610c1fa34802024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:69 create: creating: ALLREDUCE entry 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:72 create: created: ALLREDUCE, entry: 0x5599f38c5e00, sched: 0x5599f38c5740 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 1, size: 2 }, buf: 0x7ffff7042e60 } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 0, size: 2 }, buf: 0x7ffddc3884b0 } 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 0, size: 2 }, buf: 0x7ffddc3884b0 } 
2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:261 get: param: { coll: allreduce, count: 204800, dt: int32, comm: { rank: 1, size: 2 }, buf: 0x7ffff7042e60 } 2024:01:16-14:54:15:(580975) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580975) |CCL_DEBUG| allreduce.cpp:40 ccl_coll_build_direct_allreduce: build direct allreduce 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:69 create: creating: ALLREDUCE entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:72 create: created: ALLREDUCE, entry: 0x5610c1fa4480, sched: 0x5610c1fa3a40 2024:01:16-14:54:15:(580974) |CCL_DEBUG| selector_impl.hpp:333 get: selected algo: coll allreduce, count 204800, algo direct 2024:01:16-14:54:15:(580974) |CCL_DEBUG| allreduce.cpp:40 ccl_coll_build_direct_allreduce: build direct allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:69 create: creating: ALLREDUCE entry 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:72 create: created: ALLREDUCE, entry: 0x5599f38c6040, sched: 0x5599f38c3fc02024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:69 create: creating: SYNC entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:72 create: created: SYNC, entry: 0x5610c1fa46c0, sched: 0x5610c1fa3480 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:69 create: creating: SYNC entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:72 create: created: SYNC, entry: 0x5610c1fa4800, sched: 0x5610c1fa3a40 2024:01:16-14:54:15:(580975) |CCL_DEBUG| entry_factory.hpp:69 create: creating: DEPS entry 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:69 create: creating: SYNC entry 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:72 create: created: SYNC, entry: 0x5599f38c6280, sched: 0x5599f38c5740 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:69 create: creating: SYNC entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| 
entry_factory.hpp:72 create: created: DEPS, entry: 0x5610c1fa4b80, sched: 0x5610c1fa3480 2024:01:16-14:54:15:(580975) |CCL_DEBUG| sched.cpp:123 commit: sched 0x5610c1fa3080, sched_id 1, req 0x5610c1fa0c40, subscheds_count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| sched.cpp:377 set_submitted_to_gpu: sched 0x5610c1fa3080 parent_sched 0 set_submitted_to_gpu(0) 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:72 create: created: SYNC, entry: 0x5599f38c63c0, sched: 0x5599f38c3fc0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:69 create: creating: DEPS entry 2024:01:16-14:54:15:(580975) |CCL_DEBUG| sched.cpp:184 start: starting schedule 0x5610c1fa3080, type allreduce 2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 4, comm_id 1, next sched_id 5 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5610c1fa2280, set count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| entry_factory.hpp:72 create: created: DEPS, entry: 0x5599f38c6740, sched: 0x5599f38c5740 2024:01:16-14:54:15:(580974) |CCL_DEBUG| sched.cpp:123 commit: sched 0x5599f38f2f80, sched_id 1, req 0x5599f38c9700, subscheds_count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| sched.cpp:377 set_submitted_to_gpu: sched 0x5599f38f2f80 parent_sched 0 set_submitted_to_gpu(0)2024:01:16-14:54:15:(580975) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 5, comm_id 1, next sched_id 6 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5610c1fa2340, set count 2 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5610c1fa0c40, set count 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| exec.cpp:263 start: worker idx: 0, coll: allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| sched.cpp:184 start: starting schedule 0x5599f38f2f80, type allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 4, comm_id 1, next sched_id 5 2024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:39 add: add sched 
0x5610c1fa3480, coll partial bin: 0 2024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:246 update_wait_condition: type 0, delta 1 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5599f38c5480, set count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| comm.cpp:363 get_sched_id: sched_id 5, comm_id 1, next sched_id 6 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5599f38c5d00, set count 2 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:84 set_counter: req: 0x5599f38c9700, set count 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:264 update_wait_condition: type 0, delta 1, new value 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:91 increase_counter: req: 0x5610c1fa2280, increment 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:94 increase_counter: req 0x5610c1fa2280, counter 3 2024:01:16-14:54:15:(580975) |CCL_DEBUG| exec.cpp:263 start: worker idx: 1, coll: allreduce 2024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:39 add: add sched 0x5610c1fa3a40, coll partial bin: 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| exec.cpp:263 start: worker idx: 0, coll: allreduce 2024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:39 add: add sched 0x5599f38c5740, coll partial bin: 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:246 update_wait_condition: type 0, delta 12024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:246 update_wait_condition: type 0, delta 1 2024:01:16-14:54:15:(580981) |CCL_DEBUG| worker.cpp:109 process_strict_sched_queue: add sched 0x5610c1fa3480 from strict_queue to exec_queue, req 0x5610c1fa2280 2024:01:16-14:54:15:(580975) |CCL_DEBUG| worker.cpp:264 update_wait_condition: type 0, delta 1, new value 12024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:264 update_wait_condition: type 0, delta 1, new value 1 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:91 increase_counter: req: 0x5599f38c5480, increment 1 2024:01:16-14:54:15:(580981) |CCL_DEBUG| sched_base.cpp:132 
get_priority: sched, 0x5610c1fa3480, priority 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:94 increase_counter: req 0x5599f38c5480, counter 3 2024:01:16-14:54:15:(580974) |CCL_DEBUG| exec.cpp:263 start: worker idx: 1, coll: allreduce 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:91 increase_counter: req: 0x5610c1fa2340, increment 1 2024:01:16-14:54:15:(580975) |CCL_DEBUG| request.cpp:94 increase_counter: req 0x5610c1fa2340, counter 3 2024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:39 add: add sched 0x5599f38c3fc0, coll partial bin: 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:246 update_wait_condition: type 0, delta 12024:01:16-14:54:15:(580981) |CCL_DEBUG| queue.cpp:114 add: add to bin: sched 0x5610c1fa3480, priority 0 2024:01:16-14:54:15:(580983) |CCL_DEBUG| worker.cpp:109 process_strict_sched_queue: add sched 0x5610c1fa3a40 from strict_queue to exec_queue, req 0x5610c1fa2340 2024:01:16-14:54:15:(580983) |CCL_DEBUG| sched_base.cpp:132 get_priority: sched, 0x5610c1fa3a40, priority 0 2024:01:16-14:54:15:(580982) |CCL_DEBUG| worker.cpp:109 process_strict_sched_queue: add sched 0x5599f38c5740 from strict_queue to exec_queue, req 0x5599f38c5480 2024:01:16-14:54:15:(580983) |CCL_DEBUG| queue.cpp:114 add: add to bin: sched 0x5610c1fa3a40, priority 0 2024:01:16-14:54:15:(580981) |CCL_DEBUG| queue.cpp:148 add: didn't find bin, emplaced new one 0x7fcf10000b80 , max_priority 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| worker.cpp:264 update_wait_condition: type 0, delta 1, new value 1 2024:01:16-14:54:15:(580983) |CCL_DEBUG| queue.cpp:148 add: didn't find bin, emplaced new one 0x7fcf14000b80 , max_priority 0 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched_base.cpp:132 get_priority: sched, 0x5599f38c5740, priority 0 2024:01:16-14:54:15:(580981) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5610c1fa4b80, name: DEPS [0/3] 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:91 increase_counter: req: 0x5599f38c5d00, increment 1 
2024:01:16-14:54:15:(580983) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5610c1fa4800, name: SYNC [0/2] 2024:01:16-14:54:15:(580975) |CCL_DEBUG| coll.cpp:783 ccl_allreduce_impl: coll allreduce created, req 0x5610c1fa0c40 count 409600 2024:01:16-14:54:15:(580974) |CCL_DEBUG| request.cpp:94 increase_counter: req 0x5599f38c5d00, counter 3 2024:01:16-14:54:15:(580982) |CCL_DEBUG| queue.cpp:114 add: add to bin: sched 0x5599f38c5740, priority 0 2024:01:16-14:54:15:(580981) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5610c1fa4b80, name: DEPS barrier entry [0/3], shift start_idx to 1, sched 0x5610c1fa3480 2024:01:16-14:54:15:(580981) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5610c1fa46c0, name: SYNC [1/3] 2024:01:16-14:54:15:(580984) |CCL_DEBUG| worker.cpp:109 process_strict_sched_queue: add sched 0x5599f38c3fc0 from strict_queue to exec_queue, req 0x5599f38c5d00 2024:01:16-14:54:15:(580982) |CCL_DEBUG| queue.cpp:148 add: didn't find bin, emplaced new one 0x7fc658000b80 , max_priority 0 2024:01:16-14:54:15:(580981) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5610c1fa46c0, name: SYNC barrier entry [1/3], shift start_idx to 2, sched 0x5610c1fa3480 2024:01:16-14:54:15:(580983) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5610c1fa4800, name: SYNC barrier entry [0/2], shift start_idx to 1, sched 0x5610c1fa3a40 2024:01:16-14:54:15:(580984) |CCL_DEBUG| sched_base.cpp:132 get_priority: sched, 0x5599f38c3fc0, priority 0 2024:01:16-14:54:15:(580984) |CCL_DEBUG| queue.cpp:114 add: add to bin: sched 0x5599f38c3fc0, priority 02024:01:16-14:54:15:(580981) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5610c1fa4240, name: ALLREDUCE [2/3] 2024:01:16-14:54:15:(580983) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5610c1fa4480, name: ALLREDUCE [1/2] 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5599f38c6740, name: DEPS [0/3] 2024:01:16-14:54:15:(580981) 
|CCL_DEBUG| allreduce_entry.hpp:44 start: ALLREDUCE entry req: req: { completed: 0, ptr: 0x5610c1fa4340 }, cnt: 204800, bytes: 819200 2024:01:16-14:54:15:(580983) |CCL_DEBUG| allreduce_entry.hpp:44 start: ALLREDUCE entry req: req: { completed: 0, ptr: 0x5610c1fa4580 }, cnt: 204800, bytes: 819200 2024:01:16-14:54:15:(580984) |CCL_DEBUG| queue.cpp:148 add: didn't find bin, emplaced new one 0x7fc65c000b80 , max_priority 0 2024:01:16-14:54:15:(580974) |CCL_DEBUG| coll.cpp:783 ccl_allreduce_impl: coll allreduce created, req 0x5599f38c9700 count 409600 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5599f38c6740, name: DEPS barrier entry [0/3], shift start_idx to 1, sched 0x5599f38c5740 2024:01:16-14:54:15:(580984) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5599f38c63c0, name: SYNC [0/2] 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5599f38c6280, name: SYNC [1/3] 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5599f38c6280, name: SYNC barrier entry [1/3], shift start_idx to 2, sched 0x5599f38c5740 2024:01:16-14:54:15:(580984) |CCL_DEBUG| sched.cpp:422 do_progress: completed entry: 0x5599f38c63c0, name: SYNC barrier entry [0/2], shift start_idx to 1, sched 0x5599f38c3fc0 2024:01:16-14:54:15:(580982) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5599f38c5e00, name: ALLREDUCE [2/3] 2024:01:16-14:54:15:(580984) |CCL_DEBUG| sched.cpp:395 do_progress: starting entry: 0x5599f38c6040, name: ALLREDUCE [1/2] 2024:01:16-14:54:15:(580982) |CCL_DEBUG| allreduce_entry.hpp:44 start: ALLREDUCE entry req: req: { completed: 0, ptr: 0x5599f38c5f00 }, cnt: 204800, bytes: 819200 2024:01:16-14:54:15:(580984) |CCL_DEBUG| allreduce_entry.hpp:44 start: ALLREDUCE entry req: req: { completed: 0, ptr: 0x5599f38c6140 }, cnt: 204800, bytes: 819200 2024:01:16-14:54:15:(580981) |CCL_DEBUG| request.cpp:49 complete_counter: req 0x5610c1fa2280, counter 2 
2024:01:16-14:54:15:(580983) |CCL_DEBUG| request.cpp:49 complete_counter: req 0x5610c1fa2340, counter 2
[1705416855.752995356] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
[1705416855.752995466] machine:rank1.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
[1705416855.753000085] machine:rank0.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
[1705416855.753004673] machine:rank0.cpu_allreduce_test: Reading from remote process' memory failed. Disabling CMA support
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
machine:rank1: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
machine:rank0: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
machine:rank0: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 580974 RUNNING AT machine
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 580975 RUNNING AT machine
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================