Commit df96551

jhuber6 authored and jhuber-ornl committed
[OpenMP] Add an information flag for device data transfers
This patch adds an information flag that indicates when data is being copied to and from the device. This will be helpful for finding redundant or unnecessary data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
1 parent f9649d1 commit df96551

4 files changed: 81 additions & 40 deletions

openmp/docs/design/Runtimes.rst

Lines changed: 36 additions & 26 deletions
@@ -89,6 +89,7 @@ with `-g` for full debug information. A full list of flags supported by
 * Dump the contents of the device pointer map at kernel exit: ``0x04``
 * Indicate when an entry is changed in the device mapping table: ``0x08``
 * Print OpenMP kernel information from device plugins: ``0x10``
+* Indicate when data is copied to and from the device: ``0x20``

 Any combination of these flags can be used by setting the appropriate bits. For
 example, to enable printing all data active in an OpenMP target region along
@@ -137,44 +138,53 @@ provide the following output from the runtime library.

 .. code-block:: text

-Info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32
 Info: Entering OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
-Info: to(X[0:N])[16384]
-Info: tofrom(Y[0:N])[16384]
-Info: Creating new map entry with HstPtrBegin=0x00007fff963f4000,
-TgtPtrBegin=0x00007fff963f4000, Size=16384, Name=X[0:N]
-Info: Creating new map entry with HstPtrBegin=0x00007fff963f8000,
-TgtPtrBegin=0x00007fff963f00000, Size=16384, Name=Y[0:N]
+Info: to(X[0:N])[16384]
+Info: tofrom(Y[0:N])[16384]
+Info: Creating new map entry with HstPtrBegin=0x00007ffde9e99000,
+TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]
+Info: Copying data from host to device, HstPtr=0x00007ffde9e99000,
+TgtPtr=0x00007f15dc600000, Size=16384, Name=X[0:N]
+Info: Creating new map entry with HstPtrBegin=0x00007ffde9e95000,
+TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
+Info: Copying data from host to device, HstPtr=0x00007ffde9e95000,
+TgtPtr=0x00007f15dc604000, Size=16384, Name=Y[0:N]
 Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:14:1:
 Info: Host Ptr Target Ptr Size (B) RefCount Declaration
-Info: 0x00007fff963f4000 0x00007fd225004000 16384 1 Y[0:N] at zaxpy.cpp:13:17
-Info: 0x00007fff963f8000 0x00007fd225000000 16384 1 X[0:N] at zaxpy.cpp:13:11
+Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
+Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
 Info: Entering OpenMP kernel at zaxpy.cpp:6:1 with 4 arguments:
 Info: firstprivate(N)[8] (implicit)
 Info: use_address(Y)[0] (implicit)
 Info: tofrom(D)[16] (implicit)
 Info: use_address(X)[0] (implicit)
-Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8be80,
-TgtPtrBegin=0x00007f90ff004000, Size=0, updated RefCount=2, Name=Y
-Info: Creating new map entry with HstPtrBegin=0x00007fff963f33ff0,
-TgtPtrBegin=0x00007fd225003ff0, Size=16, Name=D
-Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8fe80,
-TgtPtrBegin=0x00007f90ff000000, Size=0, updated RefCount=2, Name=X
-Info: Launching kernel __omp_offloading_fd02_c2c4ac1a__Z5daxpyPNSt3__17complexIdEES2_S1_m_l6
+Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e95000,
+TgtPtrBegin=0x00007f15dc604000, Size=0, updated RefCount=2, Name=Y
+Info: Creating new map entry with HstPtrBegin=0x00007ffde9e94fb0,
+TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
+Info: Copying data from host to device, HstPtr=0x00007ffde9e94fb0,
+TgtPtr=0x00007f15dc608000, Size=16, Name=D
+Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e99000,
+TgtPtrBegin=0x00007f15dc600000, Size=0, updated RefCount=2, Name=X
+Info: Launching kernel __omp_offloading_fd02_e25f6e76__Z5zaxpyPSt7complexIdES1_S0_m_l6
 with 8 blocks and 128 threads in SPMD mode
-Info: Removing map entry with HstPtrBegin=0x00007fff963f33ff0,
-TgtPtrBegin=0x00007fd225003ff0, Size=16, Name=D
+Info: Copying data from device to host, TgtPtr=0x00007f15dc608000,
+HstPtr=0x00007ffde9e94fb0, Size=16, Name=D
+Info: Removing map entry with HstPtrBegin=0x00007ffde9e94fb0,
+TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
 Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:6:1:
 Info: Host Ptr Target Ptr Size (B) RefCount Declaration
-Info: 0x00007fff963f4000 0x00007fd225004000 16384 1 Y[0:N] at zaxpy.cpp:13:17
-Info: 0x00007fff963f8000 0x00007fd225000000 16384 1 X[0:N] at zaxpy.cpp:13:11
+Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
+Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
 Info: Exiting OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
-Info: to(X[0:N])[16384]
-Info: tofrom(Y[0:N])[16384]
-Info: Removing map entry with HstPtrBegin=0x00007fff963f4000,
-TgtPtrBegin=0x00007fff963f4000, Size=16384, Name=X[0:N]
-Info: Removing map entry with HstPtrBegin=0x00007fff963f8000,
-TgtPtrBegin=0x00007fff963f00000, Size=16384, Name=Y[0:N]
+Info: to(X[0:N])[16384]
+Info: tofrom(Y[0:N])[16384]
+Info: Copying data from device to host, TgtPtr=0x00007f15dc604000,
+HstPtr=0x00007ffde9e95000, Size=16384, Name=Y[0:N]
+Info: Removing map entry with HstPtrBegin=0x00007ffde9e95000,
+TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
+Info: Removing map entry with HstPtrBegin=0x00007ffde9e99000,
+TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]

 From this information, we can see the OpenMP kernel being launched on the CUDA
 device with enough threads and blocks for all ``1024`` iterations of the loop in

openmp/libomptarget/include/Debug.h

Lines changed: 2 additions & 0 deletions
@@ -52,6 +52,8 @@ enum OpenMPInfoType : uint32_t {
   OMP_INFOTYPE_MAPPING_CHANGED = 0x0008,
   // Print kernel information from target device plugins.
   OMP_INFOTYPE_PLUGIN_KERNEL = 0x0010,
+  // Print whenever data is transferred to the device
+  OMP_INFOTYPE_DATA_TRANSFER = 0x0020,
   // Enable every flag.
   OMP_INFOTYPE_ALL = 0xffffffff,
 };

openmp/libomptarget/src/device.cpp

Lines changed: 23 additions & 0 deletions
@@ -420,6 +420,18 @@ int32_t DeviceTy::deleteData(void *TgtPtrBegin) {
 // Submit data to device
 int32_t DeviceTy::submitData(void *TgtPtrBegin, void *HstPtrBegin, int64_t Size,
                              AsyncInfoTy &AsyncInfo) {
+  if (getInfoLevel() & OMP_INFOTYPE_DATA_TRANSFER) {
+    LookupResult LR = lookupMapping(HstPtrBegin, Size);
+    auto *HT = &*LR.Entry;
+
+    INFO(OMP_INFOTYPE_DATA_TRANSFER, DeviceID,
+         "Copying data from host to device, HstPtr=" DPxMOD ", TgtPtr=" DPxMOD
+         ", Size=%" PRId64 ", Name=%s\n",
+         DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin), Size,
+         (HT && HT->HstPtrName) ? getNameFromMapping(HT->HstPtrName).c_str()
+                                : "unknown");
+  }
+
   if (!AsyncInfo || !RTL->data_submit_async || !RTL->synchronize)
     return RTL->data_submit(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size);
   else
@@ -430,6 +442,17 @@ int32_t DeviceTy::submitData(void *TgtPtrBegin, void *HstPtrBegin, int64_t Size,
 // Retrieve data from device
 int32_t DeviceTy::retrieveData(void *HstPtrBegin, void *TgtPtrBegin,
                                int64_t Size, AsyncInfoTy &AsyncInfo) {
+  if (getInfoLevel() & OMP_INFOTYPE_DATA_TRANSFER) {
+    LookupResult LR = lookupMapping(HstPtrBegin, Size);
+    auto *HT = &*LR.Entry;
+    INFO(OMP_INFOTYPE_DATA_TRANSFER, DeviceID,
+         "Copying data from device to host, TgtPtr=" DPxMOD ", HstPtr=" DPxMOD
+         ", Size=%" PRId64 ", Name=%s\n",
+         DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin), Size,
+         (HT && HT->HstPtrName) ? getNameFromMapping(HT->HstPtrName).c_str()
+                                : "unknown");
+  }
+
   if (!RTL->data_retrieve_async || !RTL->synchronize)
     return RTL->data_retrieve(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size);
   else

openmp/libomptarget/test/offloading/info.c

Lines changed: 20 additions & 14 deletions
@@ -1,4 +1,4 @@
-// RUN: %libomptarget-compile-nvptx64-nvidia-cuda -gline-tables-only && env LIBOMPTARGET_INFO=31 %libomptarget-run-nvptx64-nvidia-cuda 2>&1 | %fcheck-nvptx64-nvidia-cuda -allow-empty -check-prefix=INFO
+// RUN: %libomptarget-compile-nvptx64-nvidia-cuda -gline-tables-only && env LIBOMPTARGET_INFO=63 %libomptarget-run-nvptx64-nvidia-cuda 2>&1 | %fcheck-nvptx64-nvidia-cuda -allow-empty -check-prefix=INFO
 // REQUIRES: nvptx64-nvidia-cuda

 #include <stdio.h>
@@ -14,28 +14,34 @@ int main() {
   int C[N];
   int val = 1;

-  // INFO: CUDA device 0 info: Device supports up to {{.*}} CUDA blocks and {{.*}} threads with a warp size of {{.*}}
-  // INFO: Libomptarget device 0 info: Entering OpenMP data region at info.c:{{[0-9]+}}:1 with 3 arguments:
+  // INFO: CUDA device 0 info: Device supports up to {{[0-9]+}} CUDA blocks and {{[0-9]+}} threads with a warp size of {{[0-9]+}}
+  // INFO: Libomptarget device 0 info: Entering OpenMP data region at info.c:{{[0-9]+}}:{{[0-9]+}} with 3 arguments:
   // INFO: Libomptarget device 0 info: alloc(A[0:64])[256]
   // INFO: Libomptarget device 0 info: tofrom(B[0:64])[256]
   // INFO: Libomptarget device 0 info: to(C[0:64])[256]
   // INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]
   // INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
+  // INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=B[0:64]
   // INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
-  // INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:1:
+  // INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=C[0:64]
+  // INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}:
   // INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
-  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: Entering OpenMP kernel at info.c:{{[0-9]+}}:1 with 1 arguments:
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: Entering OpenMP kernel at info.c:{{[0-9]+}}:{{[0-9]+}} with 1 arguments:
   // INFO: Libomptarget device 0 info: firstprivate(val)[4]
-  // INFO: CUDA device 0 info: Launching kernel {{.*}} with {{.*}} and {{.*}} threads in {{.*}} mode
-  // INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:1:
+  // INFO: CUDA device 0 info: Launching kernel __omp_offloading_{{.*}}main{{.*}} with {{[0-9]+}} blocks and {{[0-9]+}} threads in Generic mode
+  // INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}:
   // INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
-  // INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:7
-  // INFO: Libomptarget device 0 info: Exiting OpenMP data region at info.c:{{[0-9]+}}:1
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
+  // INFO: Libomptarget device 0 info: Exiting OpenMP data region at info.c:{{[0-9]+}}:{{[0-9]+}} with 3 arguments:
+  // INFO: Libomptarget device 0 info: alloc(A[0:64])[256]
+  // INFO: Libomptarget device 0 info: tofrom(B[0:64])[256]
+  // INFO: Libomptarget device 0 info: to(C[0:64])[256]
+  // INFO: Libomptarget device 0 info: Copying data from device to host, TgtPtr={{.*}}, HstPtr={{.*}}, Size=256, Name=B[0:64]
   // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
   // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
   // INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]
