diff --git a/libc/docs/gpu/building.rst b/libc/docs/gpu/building.rst
index dab21e1324d281..6d94134a407d34 100644
--- a/libc/docs/gpu/building.rst
+++ b/libc/docs/gpu/building.rst
@@ -220,11 +220,15 @@ targets. This section will briefly describe their purpose.
   be used to enable host services for anyone looking to interface with the
   :ref:`RPC client`.

+.. _gpu_cmake_options:
+
 CMake options
 =============

 This section briefly lists a few of the CMake variables that specifically
-control the GPU build of the C library.
+control the GPU build of the C library. These options can be passed individually
+to each target using ``-DRUNTIMES_<target>_<variable>=<value>`` when using a
+standard runtime build.

 **LLVM_LIBC_FULL_BUILD**:BOOL
   This flag controls whether or not the libc build will generate its own
diff --git a/libc/docs/gpu/testing.rst b/libc/docs/gpu/testing.rst
index 9842a675283619..9f17159fb6d5ee 100644
--- a/libc/docs/gpu/testing.rst
+++ b/libc/docs/gpu/testing.rst
@@ -1,9 +1,9 @@
 .. _libc_gpu_testing:

-============================
-Testing the GPU libc library
-============================
+=========================
+Testing the GPU C library
+=========================

 .. note::
   Running GPU tests with high parallelism is likely to cause spurious failures,
@@ -14,24 +14,134 @@ Testing the GPU libc library
   :depth: 4
   :local:

-Testing Infrastructure
+Testing infrastructure
 ======================

-The testing support in LLVM's libc implementation for GPUs is designed to mimic
-the standard unit tests as much as possible. We use the :ref:`libc_gpu_rpc`
-support to provide the necessary utilities like printing from the GPU. Execution
-is performed by emitting a ``_start`` kernel from the GPU
-that is then called by an external loader utility. This is an example of how
-this can be done manually:
+The LLVM C library supports different kinds of :ref:`tests `
+depending on the build configuration. The GPU target is considered a full build
+and therefore provides all of its own utilities to build and run the generated
+tests. Currently the GPU supports two kinds of tests; a brief sketch of each
+follows the list below.
+
+#. **Hermetic tests** - These are unit tests built with a test suite similar to
+   Google's ``gtest`` infrastructure. These use the same infrastructure as unit
+   tests except that the entire environment is self-hosted. This allows us to
+   run them on the GPU using our custom utilities. These are used to test the
+   majority of functional implementations.
+
+#. **Integration tests** - These are lightweight tests that simply call a
+   ``main`` function and check whether it returns non-zero. These are primarily
+   used to test interfaces that are sensitive to threading.
+
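+For illustration, the rough shape of each kind of test is sketched below. The
+header paths and macro names (``TEST``, ``EXPECT_EQ``, ``TEST_MAIN``) are taken
+from the LLVM libc test framework and are only illustrative here; real tests
+live under ``libc/test/``.
+
+.. code-block:: c++
+
+  // Hermetic test (its own file): uses the gtest-like unit test macros and
+  // runs entirely within the self-hosted GPU environment.
+  #include "test/UnitTest/Test.h"
+
+  TEST(LlvmLibcExampleTest, Smoke) {
+    EXPECT_EQ(1 + 1, 2);
+  }
+
+  // Integration test (its own file): a bare entry point whose return value
+  // is checked by the loader.
+  #include "test/IntegrationTest/test.h"
+
+  TEST_MAIN(int argc, char **argv, char **envp) {
+    return 0;
+  }
+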
+The GPU uses the same testing infrastructure as the other supported ``libc``
+targets. We do this by treating the GPU as a standard hosted environment capable
+of launching a ``main`` function. Effectively, this means building our own
+startup libraries and loader.
+
+Testing utilities
+=================
+
+We provide two utilities to execute arbitrary programs on the GPU: the
+``loader`` and the ``start`` object.
+
+Startup object
+--------------
+
+This object mimics the standard object used by existing C library
+implementations. Its job is to perform the necessary setup prior to calling the
+``main`` function. In the GPU case, this means exporting GPU kernels that
+perform this setup. Here we use ``_begin`` and ``_end`` to handle calling global
+constructors and destructors, while ``_start`` begins the standard execution.
+The following code block shows the implementation for AMDGPU architectures.
+
+.. code-block:: c++
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _begin(int argc, char **argv, char **env) {
+    LIBC_NAMESPACE::atexit(&LIBC_NAMESPACE::call_fini_array_callbacks);
+    LIBC_NAMESPACE::call_init_array_callbacks(argc, argv, env);
+  }
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _start(int argc, char **argv, char **envp, int *ret) {
+    __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
+  }
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _end(int retval) {
+    LIBC_NAMESPACE::exit(retval);
+  }
+
+Loader runtime
+--------------
+
+The startup object provides a GPU executable with callable kernels for the
+respective runtime. We can then define a minimal runtime that will launch these
+kernels on the given device. Currently we provide the ``amdhsa-loader`` and
+``nvptx-loader``, targeting the AMD HSA runtime and the CUDA driver runtime,
+respectively. By default these will launch with a single thread on the GPU.

 .. code-block:: sh

-   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
-   $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
+   $> amdhsa_loader --threads 1 --blocks 1 ./a.out
   Test Passed!

-Unlike the exported ``libcgpu.a``, the testing architecture can only support a
-single architecture at a time. This is either detected automatically, or set
-manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
-in cases where the user does not build LLVM's libc on machine with the GPU to
-use for testing.
+The loader utility will forward any arguments passed after the executable image
+to the program running on the GPU, as well as any environment variables that are
+set. The number of threads and blocks can be controlled with ``--threads`` and
+``--blocks``. These also accept ``x``, ``y``, and ``z`` variants for
+multidimensional grids.
+
+Running tests
+=============
+
+Tests will only be built and run if a GPU target architecture is set and the
+corresponding loader utility was built. These defaults can be overridden with
+the ``LIBC_GPU_TEST_ARCHITECTURE`` and ``LIBC_GPU_LOADER_EXECUTABLE``
+:ref:`CMake options <gpu_cmake_options>`. Once built, they can be run like any
+other tests. The CMake target to use depends on how the library was built.
+
+#. **Cross build** - If the C library was built using ``LLVM_ENABLE_PROJECTS``
+   or a runtimes cross build, then the standard targets will be present in the
+   base CMake build directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc
+
+   #. Hermetic tests - You can run the hermetic tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-hermetic-tests
+
+   #. Integration tests - You can run the integration tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-integration-tests
+
+#. **Runtimes build** - If the library was built using ``LLVM_ENABLE_RUNTIMES``,
+   then the actual ``libc`` build will be in a separate directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc-amdgcn-amd-amdhsa
+        $> ninja check-libc-nvptx64-nvidia-cuda
+
+   #. Specific tests - You can use the same targets as above by entering the
+      runtimes build directory.
+
+      .. code-block:: sh
+
+        $> ninja -C runtimes/runtimes-amdgcn-amd-amdhsa-bins check-libc
+        $> ninja -C runtimes/runtimes-nvptx64-nvidia-cuda-bins check-libc
+        $> cd runtimes/runtimes-amdgcn-amd-amdhsa-bins && ninja check-libc
+        $> cd runtimes/runtimes-nvptx64-nvidia-cuda-bins && ninja check-libc
+
+Tests can also be built and run manually using the respective loader utility.
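+
+For example, a built binary can be launched directly with a custom launch grid
+and forwarded arguments. The invocation below is only a sketch: the executable
+and arguments are placeholders, and the multidimensional options are assumed to
+be spelled ``--threads-x``, ``--blocks-x``, and so on.
+
+.. code-block:: sh
+
+  $> amdhsa_loader --threads-x 64 --blocks-x 8 ./a.out arg1 arg2
+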
diff --git a/libc/docs/gpu/using.rst b/libc/docs/gpu/using.rst
index 11a00cd620d866..1a9446eeb1130a 100644
--- a/libc/docs/gpu/using.rst
+++ b/libc/docs/gpu/using.rst
@@ -159,17 +159,21 @@ GPUs.
   }

 We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
-following commands.
+following commands. This will yield valid LLVM-IR for the given target, just as
+if we were using CUDA, OpenCL, or OpenMP.

 .. code-block:: sh

   $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
   $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c

-We use this support to treat the GPU as a hosted environment by providing a C
-library and startup object just like a standard C library running on the host
-machine. Then, in order to execute these programs, we provide a loader utility
-to launch the executable on the GPU similar to a cross-compiling emulator.
+We can also use this support to treat the GPU as a hosted environment by
+providing a C library and startup object just like a standard C library running
+on the host machine. Then, in order to execute these programs, we provide a
+loader utility to launch the executable on the GPU similar to a cross-compiling
+emulator. This is how we run :ref:`unit tests <libc_gpu_testing>` targeting the
+GPU. This is clearly not the most efficient way to use a GPU, but it provides a
+simple way to test execution for debugging or development.

 Building for AMDGPU targets
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
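+
+As a rough sketch of this hosted flow, a small program can be linked against the
+installed GPU C library and its ``crt1.o`` startup object, then run through the
+loader. The install path and link line below are illustrative and depend on how
+and where the library was installed.
+
+.. code-block:: sh
+
+  $> clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto \
+       <install>/lib/amdgcn-amd-amdhsa/crt1.o -lc
+  $> amdhsa_loader ./a.out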