Environment variables
        STARPU_HOSTNAME=zhores.ais-gpu.starpu-1.4.4

StarPU has found :
56 CPU workers:
        CPU 0
        CPU 1
        CPU 2
        CPU 3
        CPU 4
        CPU 5
        CPU 6
        CPU 7
        CPU 8
        CPU 9
        CPU 10
        CPU 11
        CPU 12
        CPU 13
        CPU 14
        CPU 15
        CPU 16
        CPU 17
        CPU 18
        CPU 19
        CPU 20
        CPU 21
        CPU 22
        CPU 23
        CPU 24
        CPU 25
        CPU 26
        CPU 27
        CPU 28
        CPU 29
        CPU 30
        CPU 31
        CPU 32
        CPU 33
        CPU 34
        CPU 35
        CPU 36
        CPU 37
        CPU 38
        CPU 39
        CPU 40
        CPU 41
        CPU 42
        CPU 43
        CPU 44
        CPU 45
        CPU 46
        CPU 47
        CPU 48
        CPU 49
        CPU 50
        CPU 51
        CPU 52
        CPU 53
        CPU 54
        CPU 55
8 CUDA workers:
        CUDA 0.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 27:00.0)
        CUDA 1.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 2a:00.0)
        CUDA 2.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 51:00.0)
        CUDA 3.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 57:00.0)
        CUDA 4.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 9e:00.0)
        CUDA 5.0 (NVIDIA A100-SXM4-80GB 71.2 GiB a4:00.0)
        CUDA 6.0 (NVIDIA A100-SXM4-80GB 71.2 GiB c7:00.0)
        CUDA 7.0 (NVIDIA A100-SXM4-80GB 71.2 GiB ca:00.0)
No OpenCL worker
No FPGA worker
No MPI_MS worker
No TCPIP_MS worker
No HIP worker

topology ... (hwloc logical indexes)
numa 0  pack 0  core 0   PU 0   CUDA 0.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 27:00.0)
                core 1   PU 1   CUDA 1.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 2a:00.0)
                core 2   PU 2   CUDA 2.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 51:00.0)
                core 3   PU 3   CUDA 3.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 57:00.0)
                core 4   PU 4   CPU 0
                core 5   PU 5   CPU 1
                core 6   PU 6   CPU 2
                core 7   PU 7   CPU 3
                core 8   PU 8   CPU 4
                core 9   PU 9   CPU 5
                core 10  PU 10  CPU 6
                core 11  PU 11  CPU 7
                core 12  PU 12  CPU 8
                core 13  PU 13  CPU 9
                core 14  PU 14  CPU 10
                core 15  PU 15  CPU 11
                core 16  PU 16  CPU 12
                core 17  PU 17  CPU 13
                core 18  PU 18  CPU 14
                core 19  PU 19  CPU 15
                core 20  PU 20  CPU 16
                core 21  PU 21  CPU 17
                core 22  PU 22  CPU 18
                core 23  PU 23  CPU 19
                core 24  PU 24  CPU 20
                core 25  PU 25  CPU 21
                core 26  PU 26  CPU 22
                core 27  PU 27  CPU 23
                core 28  PU 28  CPU 24
                core 29  PU 29  CPU 25
                core 30  PU 30  CPU 26
                core 31  PU 31  CPU 27
numa 1  pack 1  core 32  PU 32  CUDA 4.0 (NVIDIA A100-SXM4-80GB 71.2 GiB 9e:00.0)
                core 33  PU 33  CUDA 5.0 (NVIDIA A100-SXM4-80GB 71.2 GiB a4:00.0)
                core 34  PU 34  CUDA 6.0 (NVIDIA A100-SXM4-80GB 71.2 GiB c7:00.0)
                core 35  PU 35  CUDA 7.0 (NVIDIA A100-SXM4-80GB 71.2 GiB ca:00.0)
                core 36  PU 36  CPU 28
                core 37  PU 37  CPU 29
                core 38  PU 38  CPU 30
                core 39  PU 39  CPU 31
                core 40  PU 40  CPU 32
                core 41  PU 41  CPU 33
                core 42  PU 42  CPU 34
                core 43  PU 43  CPU 35
                core 44  PU 44  CPU 36
                core 45  PU 45  CPU 37
                core 46  PU 46  CPU 38
                core 47  PU 47  CPU 39
                core 48  PU 48  CPU 40
                core 49  PU 49  CPU 41
                core 50  PU 50  CPU 42
                core 51  PU 51  CPU 43
                core 52  PU 52  CPU 44
                core 53  PU 53  CPU 45
                core 54  PU 54  CPU 46
                core 55  PU 55  CPU 47
                core 56  PU 56  CPU 48
                core 57  PU 57  CPU 49
                core 58  PU 58  CPU 50
                core 59  PU 59  CPU 51
                core 60  PU 60  CPU 52
                core 61  PU 61  CPU 53
                core 62  PU 62  CPU 54
                core 63  PU 63  CPU 55

bandwidth (MB/s) and latency (us)...
from/to   NUMA 0  CUDA 0  CUDA 1  CUDA 2  CUDA 3  CUDA 4  CUDA 5  CUDA 6  CUDA 7
NUMA 0         0   25216   25313   25295   25323   25261   25244   25234   25219
CUDA 0     23989       0  236705  240752  240891  243120  244119  244185  244216
CUDA 1     23989  243550       0  240821  241415  244132  244254  244659  243946
CUDA 2     23988  241378  247379       0  243438  244059  244077  243927  243917
CUDA 3     21584  240818  241081  245175       0  242926  243386  243393  243930
CUDA 4     23982  242025  241854  242921  247913       0  244573  244965  244823
CUDA 5     23984  241592  242090  244283  244436  247700       0  244744  244515
CUDA 6     23981  242195  243131  243624  243710  245009  247946       0  243943
CUDA 7     23983  242802  242333  245071  244461  244875  244517  244834       0

NUMA 0         0      11      11      11      11      11      11      11      11
CUDA 0        11       0      14      14      14      13      13      13      13
CUDA 1        11      14       0      14      14      13      13      13      13
CUDA 2        11      14      13       0      13      13      13      13      13
CUDA 3        11      14      14      13       0      13      13      13      13
CUDA 4        11      13      13      13      13       0      12      12      12
CUDA 5        11      13      13      12      12      12       0      12      12
CUDA 6        11      13      13      13      13      12      12       0      12
CUDA 7        11      13      13      12      13      12      12      12       0

GPU     NUMA in preference order (logical index), host-to-device, device-to-host
CUDA_0  0 25216 23989   1 25248 25463
CUDA_1  0 25313 23989   1 25214 26147
CUDA_2  0 25295 23988   1 25232 26140
CUDA_3  0 25323 21584   1 25234 26143
CUDA_4  0 25261 23982   1 25255 26149
CUDA_5  0 25244 23984   1 25249 26156
CUDA_6  0 25234 23981   1 25243 26156
CUDA_7  0 25219 23983   1 25235 26156