[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
This commit adds an OpenMP offloading-specific command-line reference. The OpenMP FAQ links to the new .rst file.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D156387
AntonRydahl committed Jul 30, 2023
1 parent 239777c commit 5c0f98c
Showing 3 changed files with 246 additions and 9 deletions.
187 changes: 187 additions & 0 deletions openmp/docs/CommandLineArgumentReference.rst
@@ -0,0 +1,187 @@
OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command-line argument reference. The content is
not a complete list of arguments but includes the essential command-line
arguments you may need when compiling and linking OpenMP programs.
Section :ref:`general_command_line_arguments` lists OpenMP command-line options
for multicore programming, while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.
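
As a minimal illustration (a sketch, not part of the upstream documentation; the function name ``parallel_sum`` is hypothetical), the loop below is divided among threads when the file is compiled with ``-fopenmp``. The ``reduction(+ : sum)`` clause gives each thread a private partial sum and combines the partial sums when the loop ends. Without ``-fopenmp`` the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Hypothetical helper: sums an array. With -fopenmp the iterations are
   split across threads; without it the pragma is ignored and the loop
   runs serially. */
double parallel_sum(const double *values, size_t n) {
  double sum = 0.0;
  /* Each thread accumulates into a private copy of sum; the private
     copies are added together when the loop finishes. */
#pragma omp parallel for reduction(+ : sum)
  for (size_t i = 0; i < n; ++i)
    sum += values[i];
  return sum;
}
```

For example, ``clang -O2 -fopenmp sum.c -o sum`` (the file name is hypothetical) builds it with OpenMP enabled; the standard ``OMP_NUM_THREADS`` environment variable then controls the number of threads.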

``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
This option enables OpenMP only for single instruction, multiple data
(SIMD) constructs.
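
A minimal sketch of a loop that ``-fopenmp-simd`` acts on (the function name ``scale_add`` is hypothetical): the ``simd`` directive asks the compiler to vectorize the loop. Because only a SIMD construct is involved, no threads are created and no OpenMP runtime routines are called.

```c
#include <stddef.h>

/* Hypothetical helper computing y[i] = a * x[i] + y[i].
   With -fopenmp-simd (or -fopenmp) the simd directive requests
   vectorization of the loop; with neither flag the pragma is ignored. */
void scale_add(float a, const float *x, float *y, size_t n) {
#pragma omp simd
  for (size_t i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

It could be compiled with, for instance, ``clang -O2 -fopenmp-simd -c scale_add.c``.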

``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Select version ``<arg>`` of the OpenMP standard.
For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
the OpenMP standard. The default value is ``-fopenmp-version=50`` for ``Clang``
and ``-fopenmp-version=11`` for ``flang-new``.
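
The selected version is visible in C and C++ sources through the standard ``_OPENMP`` macro, which expands to the release date (``yyyymm``) of the chosen specification, for example ``201511`` for OpenMP 4.5 and ``201811`` for OpenMP 5.0. The sketch below maps that value back to a version number in the ``-fopenmp-version`` spelling; the function names are hypothetical and the table covers only a few releases.

```c
/* Hypothetical helper: converts an _OPENMP value (yyyymm release date of
   the specification) into a version number such as 45 or 50, matching
   the -fopenmp-version=<arg> spelling. Returns 0 for unknown dates. */
int openmp_version_from_date(long date) {
  switch (date) {
  case 201107: return 31; /* OpenMP 3.1 */
  case 201307: return 40; /* OpenMP 4.0 */
  case 201511: return 45; /* OpenMP 4.5 */
  case 201811: return 50; /* OpenMP 5.0 */
  case 202011: return 51; /* OpenMP 5.1 */
  case 202111: return 52; /* OpenMP 5.2 */
  default: return 0;
  }
}

/* Version selected at compile time, or 0 when OpenMP is disabled. */
int compiled_openmp_version(void) {
#ifdef _OPENMP
  return openmp_version_from_date(_OPENMP);
#else
  return 0;
#endif
}
```

With ``clang -fopenmp -fopenmp-version=45`` we would expect ``compiled_openmp_version()`` to return ``45``.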

.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
  often optional when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

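
The offloading options in this section apply to target regions such as the one sketched below (the function name ``vec_add`` is hypothetical). When compiled with, for instance, ``clang -O2 -fopenmp -fopenmp-targets=nvptx64 vec_add.c``, the loop is intended to run on the device: ``map(to : ...)`` copies the inputs to the device and ``map(from : ...)`` copies the result back when the region ends. If no device is available at runtime, the host fallback version runs instead, and without ``-fopenmp`` the pragma is simply ignored.

```c
#include <stddef.h>

/* Hypothetical example: c[i] = a[i] + b[i] computed in a target region.
   map(to: ...) copies the inputs to the device; map(from: ...) copies the
   result back to the host when the region ends. */
void vec_add(const double *a, const double *b, double *c, size_t n) {
#pragma omp target teams distribute parallel for \
    map(to : a[0:n], b[0:n]) map(from : c[0:n])
  for (size_t i = 0; i < n; ++i)
    c[i] = a[i] + b[i];
}
```
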
.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance, use
  ``--offload-arch=sm_80`` to target an Nvidia A100, ``--offload-arch=gfx90a``
  to target an AMD Instinct MI250X, or ``--offload-arch=sm_80,gfx90a`` to
  target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the compiler driver runs the
  ``amdgpu-arch`` or ``nvptx-arch`` executable to detect the device
  architecture automatically.
| Finally, the device architecture is also inferred automatically with
  ``--offload-arch=native``.

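
After building with ``--offload-arch``, one way to check that the OpenMP runtime actually sees a device is the standard ``omp_get_num_devices()`` routine. The sketch below (the function name ``offload_device_count`` is hypothetical) guards the call with the ``_OPENMP`` macro so that it also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of non-host devices visible to the OpenMP
   runtime. Returns 0 when compiled without OpenMP support. */
int offload_device_count(void) {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}
```

A result of ``0`` on a machine with a GPU suggests that the offloading runtime did not detect the device.
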
``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly intended
for debugging, in particular for inspecting the intermediate representation
(IR) generated for the device. It may also be used when building device-only
runtimes.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. That is the
default option.

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. This is especially useful when an argument must differ between
triples. For instance, ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` specifies a different device
architecture for each target. Alternatively, :ref:`Xarch_host` and
:ref:`Xarch_device` can pass an argument to the host and device compilation
toolchains, respectively.

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains. If no argument is set,
this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Do not generate the host fallback code that is executed when offloading to
  the device fails. This is helpful when a target region contains code that
  cannot be compiled for the host, for instance, unguarded device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used to verify that the code is being offloaded to
  the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being
  offloaded to the device.

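
To check at runtime where a target region actually executed, a region can query the standard ``omp_is_initial_device()`` routine, which returns true on the host. The sketch below is an assumption-labeled illustration: the function name ``target_ran_on_host`` is hypothetical, and the ``_OPENMP`` guard lets it build without OpenMP (in which case it always reports the host).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region ran on the host
   (for example, through host fallback) and 0 if it ran on a device.
   map(from: ...) copies the value computed in the region back to the host. */
int target_ran_on_host(void) {
  int on_host = 1;
#pragma omp target map(from : on_host)
  {
#ifdef _OPENMP
    on_host = omp_is_initial_device() ? 1 : 0;
#endif
  }
  return on_host;
}
```

Run with ``OMP_TARGET_OFFLOAD=MANDATORY``, a program that cannot offload reports an error instead of silently falling back to the host, so the two checks complement each other.
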
``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Debugging must be
configured in the device runtime at compile time with
``-fopenmp-target-debug=<arg>`` and also enabled at runtime with the
environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. As of July 2023,
this is only supported for Nvidia targets. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_,
which list the supported debugging arguments.

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. The
  device code is embedded in the object files as LLVM-IR rather than as binary
  code for the respective target. At runtime, the LLVM-IR is optimized again and
  compiled for the target device. The optimization level can be set at runtime
  with ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponds to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after JIT
  compilation, among other details.
| JIT compilation for OpenMP offloading is also useful for debugging, as the
  device IR can be extracted, modified, and injected at runtime.

``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, this option is
necessary for experimental linking with CUDA or HIP files.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option
can be used to enable the new offloading linker in toolchains that do not use it
automatically. It is necessary to enable this option when linking with CUDA or HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.
53 changes: 44 additions & 9 deletions openmp/docs/SupportAndFAQ.rst
@@ -52,13 +52,15 @@ All patches go through the regular `LLVM review process
Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To build an *effective* OpenMP offload capable compiler, only one extra CMake
option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (generic
information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
are targeted by OpenMP are enabled. This can be done with the CMake option
``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
Clang will be built with all backends enabled. When building with
``LLVM_ENABLE_RUNTIMES="openmp"``, OpenMP should not be added to
``LLVM_ENABLE_PROJECTS`` because ``LLVM_ENABLE_RUNTIMES`` already enables it.

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
@@ -72,14 +74,17 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP Nvidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_<xy>,...`` where ``<xy>`` is the numeric
  compute capability of your GPU. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_70,sm_80`` to target the Nvidia Volta
  and Ampere architectures.


.. _build_amdgpu_offload_capable_compiler:
@@ -133,6 +138,14 @@ With those libraries installed, then LLVM build and installed, try:
clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx<xyz>,...`` where ``<xyz>`` is the
  shader core instruction set architecture. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx906,gfx90a`` to target AMD GCN5
  and CDNA2 devices.

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LD_LIBRARY_PATH or rpath/runpath are required to find libomp.so and libomptarget.so
@@ -349,7 +362,7 @@ create generic libraries.
The architecture can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present,
then the targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch``, then the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.

@@ -451,3 +464,25 @@ with OpenMP.
For more information on how this is implemented in LLVM/OpenMP's offloading
runtime, refer to the `runtime documentation <libomptarget_libc>`_.

Q: What command line options can I use for OpenMP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We recommend taking a look at the OpenMP
:doc:`command line argument reference <CommandLineArgumentReference>` page.

Q: Why is my build taking a long time?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When installing OpenMP and other LLVM components, the build time on multicore
systems can be significantly reduced with parallel build jobs. As suggested in
*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
generator, which is selected by passing ``-G Ninja`` to ``cmake``. Afterward,
use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
time can also be reduced by setting the build type to ``Release`` with the
``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
compilations. Consider enabling ``Ccache`` with
``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.

Q: Did this FAQ not answer your question?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Feel free to post questions or browse old threads at
`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.
15 changes: 15 additions & 0 deletions openmp/docs/index.rst
@@ -91,6 +91,21 @@ please refer to :doc:`remarks/OptimizationRemarks`.

remarks/OptimizationRemarks

OpenMP Command-Line Argument Reference
======================================
In addition to the
`Clang command-line argument reference <https://clang.llvm.org/docs/ClangCommandLineReference.html>`_,
we also recommend the OpenMP
:doc:`command-line argument reference <CommandLineArgumentReference>`
page, which offers a detailed overview of options specific to OpenMP. It also
contains a list of OpenMP offloading related command-line arguments.


.. toctree::
   :hidden:
   :maxdepth: 1

   CommandLineArgumentReference

Support, Getting Involved, and Frequently Asked Questions (FAQ)
===============================================================