@@ -16,6 +16,8 @@ the LLVM/OpenMP host runtime, aka. `libomp.so`, is available as a `pdf
1616LLVM/OpenMP Target Host Runtime (``libomptarget ``)
1717--------------------------------------------------
1818
19+ .. _libopenmptarget_environment_vars :
20+
1921Environment Variables
2022^^^^^^^^^^^^^^^^^^^^^
2123
@@ -171,6 +173,95 @@ shows that ``D`` will be copied back from the device once the OpenMP device
171173kernel region ends even though it isn't written to. Finally, at the end of the
172174OpenMP data region the entries for ``X `` and ``Y `` are removed from the table.
173175
176+ .. _libopenmptarget_errors :
177+
178+ Errors:
179+ ^^^^^^^
180+
181+ ``libomptarget `` provides error messages when the program fails inside the
182+ OpenMP target region. Common causes of failure could be an invalid pointer
183+ access, running out of device memory, or trying to offload when the device is
184+ busy. If the application was built with debugging symbols the error messages
185+ will additionally provide the source location of the OpenMP target region.
186+
187+ For example, consider the following code that implements a simple parallel
188+ reduction on the GPU. This code has a bug that causes it to fail in the
189+ offloading region.
190+
191+ .. code-block :: c++
192+
193+ #include <cstdio>
194+
195+ double sum(double *A, std::size_t N) {
196+ double sum = 0.0;
197+ #pragma omp target teams distribute parallel for reduction(+:sum)
198+ for (int i = 0; i < N; ++i)
199+ sum += A[i];
200+
201+ return sum;
202+ }
203+
204+ int main() {
205+ const int N = 1024;
206+ double A[N];
207+ sum(A, N);
208+ }
209+
210+ If this code is compiled and run, there will be an error message indicating what is
211+ going wrong.
212+
213+ .. code-block :: console
214+
215+ $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum
216+ $ ./sum
217+
218+ .. code-block :: text
219+
220+ CUDA error: Error when copying data from device to host.
221+ CUDA error: an illegal memory access was encountered
222+ Libomptarget error: Copying data from device failed.
223+ Libomptarget error: Call to targetDataEnd failed, abort target.
224+ Libomptarget error: Failed to process data after launching the kernel.
225+ Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
226+ sum.cpp:5:1: Libomptarget error 1: failure of target construct while offloading is mandatory
227+
228+ This shows that there is an illegal memory access occuring inside the OpenMP
229+ target region once execution has moved to the CUDA device, suggesting a
230+ segmentation fault. This then causes a chain reaction of failures in
231+ ``libomptarget ``. Another message suggests using the ``LIBOMPTARGET_INFO ``
232+ environment variable as described in :ref: `libopenmptarget_environment_vars `. If
233+ we do this it will print the sate of the host-target pointer mappings at the
234+ time of failure.
235+
236+ .. code-block :: console
237+
238+ $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum
239+ $ env LIBOMPTARGET_INFO=4 ./sum
240+
241+ .. code-block :: text
242+
243+ info: OpenMP Host-Device pointer mappings after block at sum.cpp:5:1:
244+ info: Host Ptr Target Ptr Size (B) RefCount Declaration
245+ info: 0x00007ffc058280f8 0x00007f4186600000 8 1 sum at sum.cpp:4:10
246+
247+ This tells us that the only data mapped between the host and the device is the
248+ ``sum `` variable that will be copied back from the device once the reduction has
249+ ended. There is no entry mapping the host array ``A `` to the device. In this
250+ situation, the compiler cannot determine the size of the array at compile time
251+ so it will simply assume that the pointer is mapped on the device already by
252+ default. The solution is to add an explicit map clause in the target region.
253+
254+ .. code-block :: c++
255+
256+ double sum(double *A, std::size_t N) {
257+ double sum = 0.0;
258+ #pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N])
259+ for (int i = 0; i < N; ++i)
260+ sum += A[i];
261+
262+ return sum;
263+ }
264+
174265.. toctree ::
175266 :hidden:
176267 :maxdepth: 1
0 commit comments