| 1 | +========== |
| 2 | +KernelInfo |
| 3 | +========== |
| 4 | + |
| 5 | +.. contents:: |
| 6 | +   :local: |
| 7 | + |
| 8 | +Introduction |
| 9 | +============ |
| 10 | + |
| 11 | +This LLVM IR pass reports various statistics for code compiled for GPUs. The |
| 12 | +goal of these statistics is to help identify bad code patterns and ways to |
| 13 | +mitigate them. The pass operates at the LLVM IR level so that it can, in |
| 14 | +theory, support any LLVM-based compiler for programming languages supporting |
| 15 | +GPUs. |
| 16 | + |
| 17 | +By default, the pass runs at the end of LTO, and options like |
| 18 | +``-Rpass=kernel-info`` enable its remarks. Example ``opt`` and ``clang`` |
| 19 | +command lines appear in the next section. |
| 20 | + |
| 21 | +Remarks include summary statistics (e.g., total size of static allocas) and |
| 22 | +individual occurrences (e.g., source location of each alloca). Examples of the |
| 23 | +output appear in tests in ``llvm/test/Analysis/KernelInfo``. |
| 24 | + |
| 25 | +Example Command Lines |
| 26 | +===================== |
| 27 | + |
| 28 | +To analyze a C program as it appears to an LLVM GPU backend at the end of LTO: |
| 29 | + |
| 30 | +.. code-block:: shell |
| 31 | + |
| 32 | +  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \ |
| 33 | +      -Rpass=kernel-info |
| 34 | + |
| 35 | +To analyze specified LLVM IR, perhaps previously generated by something like |
| 36 | +``clang -save-temps -g -fopenmp --offload-arch=native test.c``: |
| 37 | + |
| 38 | +.. code-block:: shell |
| 39 | + |
| 40 | +  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \ |
| 41 | +      -pass-remarks=kernel-info -passes=kernel-info |
| 42 | + |
| 43 | +When specifying an LLVM pass pipeline on the command line, ``kernel-info`` still |
| 44 | +runs at the end of LTO by default. ``-no-kernel-info-end-lto`` disables that |
| 45 | +behavior so you can position ``kernel-info`` explicitly: |
| 46 | + |
| 47 | +.. code-block:: shell |
| 48 | + |
| 49 | +  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \ |
| 50 | +      -Rpass=kernel-info \ |
| 51 | +      -Xoffload-linker --lto-newpm-passes='lto<O2>' |
| 52 | + |
| 53 | +  $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \ |
| 54 | +      -Rpass=kernel-info -mllvm -no-kernel-info-end-lto \ |
| 55 | +      -Xoffload-linker --lto-newpm-passes='module(kernel-info),lto<O2>' |
| 56 | + |
| 57 | +  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \ |
| 58 | +      -pass-remarks=kernel-info \ |
| 59 | +      -passes='lto<O2>' |
| 60 | + |
| 61 | +  $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \ |
| 62 | +      -pass-remarks=kernel-info -no-kernel-info-end-lto \ |
| 63 | +      -passes='module(kernel-info),lto<O2>' |