Add AMDPAL Code Conventions section to AMD docs
Summary:
This is a first version of the AMDPAL code conventions.
Further updates will undoubtedly be required to fully
document AMDPAL.

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D45246

llvm-svn: 329188
Tim Corringham committed Apr 4, 2018
1 parent a0b8cd0 commit af2dfc6
Showing 1 changed file with 119 additions and 0 deletions.
119 changes: 119 additions & 0 deletions llvm/docs/AMDGPUUsage.rst
@@ -3781,6 +3781,125 @@ the ``s_trap`` instruction with the following usage:
debugger ``s_trap 0xff`` Reserved for debugger.
=================== =============== =============== =======================

AMDPAL
------

This section provides code conventions used when the target triple OS is
``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters
from the application/runtime to each invocation of a hardware shader. These
parameters include both generic, application-controlled parameters called
*user data* as well as system-generated parameters that are a product of the
draw or dispatch execution.

User Data
~~~~~~~~~

Each hardware stage has a set of 32-bit *user data registers* which can be
written from a command buffer and then loaded into SGPRs when waves are launched
via a subsequent dispatch or draw operation. This is the way most arguments are
passed from the application/runtime to a hardware shader.

Compute User Data
~~~~~~~~~~~~~~~~~

The user data mapping for compute shaders is simpler than that for graphics
shaders, and it is fixed.

Note that exactly 10 *user data entries* are always available in registers;
entries beyond that limit must be fetched from memory (via the spill table
pointer) by the shader.

.. table:: PAL Compute Shader User Data Registers
   :name: pal-compute-user-data-registers

   ============= ===================================================
   User Register Description
   ============= ===================================================
   0             Global Internal Table (32-bit pointer)
   1             Per-Shader Internal Table (32-bit pointer)
   2 - 11        Application-Controlled User Data (10 32-bit values)
   12            Spill Table (32-bit pointer)
   13 - 14       Thread Group Count (64-bit pointer)
   15            GDS Range
   ============= ===================================================
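
As an illustration of the fixed compute mapping in the table above, the following minimal sketch resolves a user data entry to either a register or the spill table. The helper name ``locate_compute_entry`` and the zero-based entry numbering are assumptions made for this example; they are not part of the AMDPAL conventions.

```python
# Register numbers taken from the PAL compute user data table above.
FIRST_APP_REGISTER = 2      # first application-controlled user data register
NUM_REGISTER_ENTRIES = 10   # assumed zero-based entries 0..9 map to registers 2..11
SPILL_TABLE_REGISTER = 12   # holds the 32-bit pointer to the spill table


def locate_compute_entry(entry):
    """Return where a compute shader finds a user data entry.

    Hypothetical helper: returns ("register", n) for an entry held in user
    data register n, or ("spill", SPILL_TABLE_REGISTER) for an entry the
    shader must load from memory through the spill table pointer.
    """
    if entry < 0:
        raise ValueError("user data entry index must be non-negative")
    if entry < NUM_REGISTER_ENTRIES:
        return ("register", FIRST_APP_REGISTER + entry)
    return ("spill", SPILL_TABLE_REGISTER)
```

Under these assumptions, the last register-resident entry resolves to user data register 11, and the next entry is reached through the spill table pointer in register 12.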

Graphics User Data
~~~~~~~~~~~~~~~~~~

Graphics pipelines support a much more flexible user data mapping:

.. table:: PAL Graphics Shader User Data Registers
   :name: pal-graphics-user-data-registers

   ============= ============================================
   User Register Description
   ============= ============================================
   0             Global Internal Table (32-bit pointer)
   +             Per-Shader Internal Table (32-bit pointer)
   + 1-15        Application Controlled User Data
                 (1-15 Contiguous 32-bit Values in Registers)
   +             Spill Table (32-bit pointer)
   +             Draw Index (First Stage Only)
   +             Vertex Offset (First Stage Only)
   +             Instance Offset (First Stage Only)
   ============= ============================================

The placement of the global internal table remains fixed in the first *user
data SGPR register*. Otherwise all parameters are optional, and can be mapped
to any desired *user data SGPR register*, with the following restrictions:

* Draw Index, Vertex Offset, and Instance Offset can only be used by the first
  active hardware stage in a graphics pipeline (i.e. where the API vertex
  shader runs).

* Application-controlled user data must be mapped into a contiguous range of
  user data registers.

* The application-controlled user data range supports compaction remapping, so
  only *entries* that are actually consumed by the shader must be assigned to
  corresponding *registers*. Note that in order to support an efficient runtime
  implementation, the remapping must pack *registers* in the same order as
  *entries*, with unused *entries* removed.
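
The compaction remapping restriction can be sketched as follows. This is only an illustration: the function name ``compact_user_data`` and its parameters are hypothetical, and the starting register is whatever the pipeline chooses for the application-controlled range.

```python
def compact_user_data(used_entries, first_register):
    """Map consumed user data entries to a contiguous register range.

    Hypothetical helper: `used_entries` is any iterable of entry indices the
    shader actually reads; `first_register` is the user data register chosen
    as the start of the application-controlled range.
    """
    mapping = {}
    # Sorting keeps registers in the same order as entries. The set drops
    # duplicates, and entries that are never used get no register at all.
    for slot, entry in enumerate(sorted(set(used_entries))):
        mapping[entry] = first_register + slot
    return mapping
```

For example, a shader that reads only entries 0, 3, and 7, with the range starting at register 2, would see them packed into registers 2, 3, and 4 respectively.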

.. _pal_global_internal_table:

Global Internal Table
~~~~~~~~~~~~~~~~~~~~~

The global internal table is a table of *shader resource descriptors* (SRDs) that
define how certain engine-wide, runtime-managed resources should be accessed
from a shader. The majority of these resources have HW-defined formats, and it
is up to the compiler to write/read data as required by the target hardware.

The following table illustrates the required format:

.. table:: PAL Global Internal Table
   :name: pal-git-table

   ============= ========================================
   Offset        Description
   ============= ========================================
   0-3           Graphics Scratch SRD
   4-7           Compute Scratch SRD
   8-11          ES/GS Ring Output SRD
   12-15         ES/GS Ring Input SRD
   16-19         GS/VS Ring Output #0
   20-23         GS/VS Ring Output #1
   24-27         GS/VS Ring Output #2
   28-31         GS/VS Ring Output #3
   32-35         GS/VS Ring Input SRD
   36-39         Tessellation Factor Buffer SRD
   40-43         Off-Chip LDS Buffer SRD
   44-47         Off-Chip Param Cache Buffer SRD
   48-51         Sample Position Buffer SRD
   52            vaRange::ShadowDescriptorTable High Bits
   ============= ========================================

The global internal table pointer is passed to the shader as user data in the
form of a 32-bit pointer. The top 32 bits of the full address should be assumed
to be the same as the top 32 bits of the pipeline's address, so the shader may
use the top 32 bits of the program counter.
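
The pointer extension described above amounts to simple bit arithmetic. The sketch below models it on plain integers; the function name ``extend_pointer`` is hypothetical, and a real shader would perform the equivalent operation on SGPRs rather than in Python.

```python
MASK_32 = 0xFFFFFFFF


def extend_pointer(ptr32, program_counter):
    """Combine a 32-bit table pointer with the program counter's top 32 bits.

    Hypothetical helper: `ptr32` is the user data value, `program_counter`
    is a 64-bit address within the pipeline's code.
    """
    if not 0 <= ptr32 <= MASK_32:
        raise ValueError("ptr32 must fit in 32 bits")
    # Keep only bits 63..32 of the program counter.
    high_bits = program_counter & (MASK_32 << 32)
    return high_bits | ptr32
```

With a program counter of ``0x00007FFF12345678`` and a 32-bit pointer of ``0x1000``, this yields the 64-bit address ``0x00007FFF00001000``.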

Unspecified OS
--------------
