=============================
DirectX Intermediate Language
=============================

.. contents::
   :local:
   :depth: 2

Introduction
============

This document presents the design of the DirectX Intermediate Language (DXIL) for GPU shaders. DXIL is intended to support a direct mapping of the HLSL programming language into Low-Level Virtual Machine Intermediate Representation (LLVM IR), suitable for consumption in GPU drivers. This version of the specification is based on LLVM 3.7 in the use of metadata syntax.

We distinguish between DXIL, which is a low-level IR for GPU driver compilers, and DXIR, which is a high-level IR, more suitable for emission by IR producers, such as Clang. DXIR is transformed to DXIL by the optimizer. DXIR accepts high-level constructs, such as user-defined types, multi-dimensional arrays, matrices, and vectors. These, however, are not suitable for fast JIT-ing in the driver compilers, and so are lowered by the optimizer, such that DXIL works on simpler abstractions. Both DXIL and DXIR are derived from LLVM IR. This document does not describe DXIR.

LLVM is quickly becoming a de facto standard in modern compilation technology. The LLVM framework offers several distinct features, such as a vibrant ecosystem, a complete compilation framework, a modular design, and reasonable documentation. We can leverage these to achieve two important objectives.

First, unification of the shader compilation tool chain. DXIL is a contract between IR producers, such as compilers for HLSL and other domain-specific languages, and IR consumers, such as IHV driver JIT compilers or the offline XBOX shader compiler. In addition, the design provides for conversion between the current HLSL IL, called DXBC IL in this document, and DXIL.

Second, leveraging the LLVM ecosystem. Microsoft will publicly document DXIL and DXIR to attract domain language implementers and spur innovation. Using an LLVM-based IR offers reduced entry costs for small teams, simply because small teams are likely to use LLVM and Clang as their main compilation framework. We will provide a DXIL verifier to check the consistency of generated DXIL.

The following diagram shows how some of these components tie together::

  HLSL   Other shading langs      DSL          DXBC IL
    +             +                +              +
    |             |                |              |
    v             v                v              v
  Clang         Clang         Other Tools     dxbc2dxil
    +             +                +              +
    |             |                |              |
    v             v                v              |
  +-+-------------+----------------+------+       |
  |         High level IR (DXIR)          |       |
  +---------------------------------------+       |
                       |                          |
                       |                          |
                       v                          |
                      Optimizer <-----+ Linker    |
                       +      ^             +     |
                       |      |             |     |
                       |      |             |     |
          +------------v------+-------------v-----v-------+
          |              Low level IR (DXIL)              |
          +------------+----------------------+-----------+
                       |                      |
                       v                      v
                Driver Compiler           Verifier

The *dxbc2dxil* element in the diagram is a component that converts existing DXBC shader byte code into DXIL. The *Optimizer* element is a component that consumes DXIR, verifies it is valid, optimizes it, and produces a valid DXIL form. The *Verifier* element is a public component that verifies DXIL. The *Linker* is a component that combines precompiled DXIL libraries with the entry function to produce a valid shader.

DXIL does not support the following HLSL features that were present in prior implementations.

* Shader models 9 and below. Microsoft may implement 10level9 shader models via DXIL capability tiers.
* Effects.
* HLSL interfaces.
* Shader compression/decompression.
* Partial precision. The half data type should be used instead.
* The min10float type. The half data type should be used instead.
* The HLSL *uniform* parameter qualifier.
* The current fxc legacy compatibility mode for old shader models (e.g., c-register binding).
* PDB. Debug information annotations are used instead.
* Compute shader model cs_4_0.
* DXBC label, call, and fcall constructs.

The following principles are used to ease reuse with LLVM components and aid extensibility.

* DXIL uses a subset of LLVM IR constructs that makes sense for HLSL.
* No modifications to the core LLVM IR; i.e., no new instructions or fundamental types.
* Additional information is conveyed via metadata, LLVM intrinsics, or external functions.
* Name prefixes: 'llvm.dx.', 'llvm.dxil.', 'llvm.dxir.', 'dx.', 'dxil.', and 'dxir.' are reserved.

LLVM IR has three equivalent forms: human-readable, binary (bitcode), and in-memory. DXIL is a binary format and is based on a subset of the LLVM IR bitcode format. This document uses only the human-readable form to describe DXIL.

Versioning
==========

There are three versioning mechanisms in DXIL shaders: shader model, DXIL version, and LLVM bitcode version.

At a high level, the shader model describes the target execution model and environment; DXIL provides a mechanism to express programs (including rules around expressing data types and operations); and LLVM bitcode provides a way to encode a DXIL program.

Shader Model
------------

The shader model in DXIL is similar to the DXBC shader model. A shader model specifies the execution model, the set of capabilities that shader instructions can use, and the constraints that a shader program must adhere to.

The shader model is specified as named metadata in DXIL::

  !dx.shaderModel = !{ !0 }
  !0 = !{ !"<shaderModelName>", i32 <major>, i32 <minor> }

The following values of <shaderModelName>_<major>_<minor> are supported:

==================== ===================================== ===========
Target               Legacy Models                         DXIL Models
==================== ===================================== ===========
Vertex shader (VS)   vs_4_0, vs_4_1, vs_5_0, vs_5_1        vs_6_0
Hull shader (HS)     hs_5_0, hs_5_1                        hs_6_0
Domain shader (DS)   ds_5_0, ds_5_1                        ds_6_0
Geometry shader (GS) gs_4_0, gs_4_1, gs_5_0, gs_5_1        gs_6_0
Pixel shader (PS)    ps_4_0, ps_4_1, ps_5_0, ps_5_1        ps_6_0
Compute shader (CS)  cs_5_0 (cs_4_0 is mapped onto cs_5_0) cs_6_0
Shader library       no support                            no support
==================== ===================================== ===========

The DXIL verifier ensures that DXIL conforms to the specified shader model.

For shader models prior to 6.0, only the rules applicable to the DXIL representation are valid. For example, the limit on the maximum number of resources is honored, but the limits on registers are not, because DXIL does not have a representation for registers.

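As a rough illustration of the record layout above, the following Python sketch renders the two metadata lines for a target string such as ``ps_6_0``. The helper name ``shader_model_metadata`` is hypothetical, and using the two-letter stage prefix as the model name is an assumption made for this example; the sketch is not part of any DXIL tool.

```python
def shader_model_metadata(target):
    """Render dx.shaderModel metadata text for a target such as 'ps_6_0'.

    Hypothetical helper, for illustration only.
    """
    name, major, minor = target.split("_")
    # Only the DXIL models listed in the table above are accepted here.
    if name not in ("vs", "hs", "ds", "gs", "ps", "cs") or (major, minor) != ("6", "0"):
        raise ValueError("not a DXIL shader model: " + target)
    return [
        "!dx.shaderModel = !{ !0 }",
        '!0 = !{ !"%s", i32 %d, i32 %d }' % (name, int(major), int(minor)),
    ]


for line in shader_model_metadata("ps_6_0"):
    print(line)
```

Legacy targets such as ``ps_5_0`` are rejected by this sketch, since only the 6.0 models appear in the DXIL column of the table.
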
DXIL version
------------

The primary mechanism to evolve HLSL capabilities is through shader models. However, the DXIL version is reserved for additional flexibility of future extensions. The only currently defined version is 1.0.

The DXIL version has major and minor components that are specified as named metadata::

  !dx.version = !{ !0 }
  !0 = !{ i32 <major>, i32 <minor> }

The DXIL version must be declared exactly once per LLVM module (translation unit) and is valid for the entire module.

DXIL will evolve in a manner that retains backward compatibility.

LLVM Bitcode version
--------------------

The current version of DXIL is based on LLVM bitcode v3.7. This encoding is necessarily implied by something outside the DXIL module.

General Issues
==============

An important goal is to enable HLSL to be closer to a strict subset of C/C++. This has implications for DXIL design and future hardware feature requests outlined below.

Terminology
-----------
Resource refers to one of the following:

* SRV - shader resource view (read-only)
* UAV - unordered access view (read-write)
* CBV - constant buffer view (read-only)
* Sampler

Intrinsics typically refer to operations missing in the core LLVM IR. DXIL represents HLSL built-in functions (also called intrinsics) not as LLVM intrinsics, but rather as external function calls.


DXIL abstraction level
----------------------

DXIL has a level of abstraction similar to that of a 'scalarized' DXBC. DXIL is a lower-level IR than the DXIR emitted by the front end, so that it is amenable to fast and robust JIT-ing in driver compilers.

In particular, the following passes are performed to lower the HLSL/DXIR abstractions down to DXIL:

* optimize function parameter copies
* inline functions
* allocate and transform shader signatures
* lower matrices, optimizing intermediate storage
* linearize multi-dimensional arrays and user-defined type accesses
* scalarize vectors

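To make the array linearization step in the list above concrete, here is a minimal Python sketch that maps a multi-dimensional index onto a 1D index. Row-major order and the function name ``linear_index`` are assumptions of this example; the text above does not prescribe a particular flattening order.

```python
def linear_index(dims, indices):
    """Flatten a multi-dimensional index into a 1D index (row-major order assumed)."""
    if len(dims) != len(indices):
        raise ValueError("rank mismatch")
    flat = 0
    for extent, i in zip(dims, indices):
        if not 0 <= i < extent:
            raise IndexError("index out of range")
        # Horner-style accumulation: earlier dimensions vary slowest.
        flat = flat * extent + i
    return flat


# For a 3x4 array, element [2][1] maps to element 2*4 + 1 of a 12-element 1D array.
print(linear_index((3, 4), (2, 1)))
```
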
Scalar IR
---------
DXIL operations work with scalar quantities. Several scalar quantities may be grouped together in a struct to represent several return values, which is used for memory operations, e.g., load/store, sample, etc., that benefit from access coalescing.

Metadata, resource declarations, and debugging info may contain vectors to more closely convey source code shape to tools and debuggers.

Future versions of the IR may contain vectors or grouping hints for less-than-32-bit quantities, such as half and i16.

Memory accesses
---------------

DXIL conceptually aligns with DXBC in how different memory types are accessed. Out-of-bounds behavior and various restrictions are preserved.

Indexable thread-local and groupshared variables are represented as variables and accessed via LLVM C-like pointers.

Swizzled resources, such as textures, have opaque memory layouts from a DXIL point of view. Accesses to these resources are done via intrinsics.

There are two layouts for constant buffer memory: (1) the legacy layout, matching DXBC's layout, and (2) the linear layout. SM6 DXIL uses intrinsics to read cbuffers for either layout.

Shader signatures require packing and are located in a special type of memory that cannot be viewed as linear. Accesses to signature values are done via special intrinsics in DXIL. If a signature parameter needs to be passed to a function, a copy is created first in thread-local memory and the copy is passed to the function.

Typed buffers represent memory with in-flight data conversion. Typed buffer load/store/atomics are done via special functions in DXIL with element-granularity indexing.

The following pointer types are supported:

* Non-indexable thread-local variables.
* Indexable thread-local variables (DXBC x-registers).
* Groupshared variables (DXBC g-registers).
* Device memory pointer.
* Constant-buffer-like memory pointer.

The type of a DXIL pointer is differentiated by the LLVM addrspace construct. The HLSL compiler will make the best effort to infer the exact pointer addrspace such that a driver compiler can issue the most efficient instruction.

A pointer can come into being in a number of ways:

* Global Variables.
* AllocaInst.
* Synthesized as a result of some pointer arithmetic.

DXIL uses 32-bit pointers in its representation.

Out-of-bounds behavior
----------------------

Indexable thread-local accesses are done via LLVM pointers and have C-like OOB semantics.
Groupshared accesses are done via LLVM pointers too. The origin of a groupshared pointer must be a single TGSM allocation.
If a groupshared pointer is produced by an inbounds GEP instruction, it must not go out of bounds. The behavior of an OOB access through an inbounds pointer is undefined.
For a groupshared pointer produced by a regular GEP, OOB accesses behave as in DXBC: loads return 0 for OOB accesses; OOB stores are silently dropped.

Resource accesses keep the same out-of-bounds behavior as DXBC. Loads return 0 for OOB accesses; OOB stores are silently dropped.

OOB pointer accesses in SM6.0 and later have undefined (C-like) behavior. LLVM memory optimization passes can be used to optimize such accesses. Where defined out-of-bounds behavior is desired, intrinsic functions are used to access memory.

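The DXBC-style rule for resource accesses described above (OOB loads return 0, OOB stores are dropped) can be modeled in a few lines of Python. This is only a toy model of the observable semantics; the class name ``ResourceModel`` and the element-indexed interface are assumptions of the example and do not correspond to any DXIL intrinsic.

```python
class ResourceModel:
    """Toy model of a bounds-checked resource holding integer elements."""

    def __init__(self, size):
        self.data = [0] * size

    def load(self, index):
        # Out-of-bounds loads return 0.
        if 0 <= index < len(self.data):
            return self.data[index]
        return 0

    def store(self, index, value):
        # Out-of-bounds stores are silently dropped.
        if 0 <= index < len(self.data):
            self.data[index] = value


buf = ResourceModel(4)
buf.store(1, 42)
buf.store(9, 7)  # out of bounds: no effect
print(buf.load(1), buf.load(9))
```

Plain pointer accesses in SM6.0 do not get this treatment, which is why the model applies only to accesses that go through intrinsics.
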
Memory access granularity
-------------------------

Intrinsic and resource accesses may imply a wider access than requested by an instruction. DXIL defines memory accesses for i1, i16, i32, i64, f16, f32, f64 on thread-local memory, and i32, f32, f64 for memory I/O (that is, groupshared memory and memory accessed via resources such as CBs, UAVs and SRVs).


Number of virtual values
------------------------

There is no limit on the number of virtual values in DXIL. The IR is guaranteed to be in SSA form. For optimized shaders, the optimizer will run the LLVM -mem2reg pass as well as perform other memory-to-register promotions if profitable.

Control-flow restrictions
-------------------------

The DXIL control-flow graph must be reducible, as checked by the T1-T2 test. DXIL does not preserve the structured control flow of DXBC. Preserving the structured control-flow property would impose a significant burden on third-party tools optimizing to DXIL via LLVM, reducing the appeal of DXIL.

DXIL allows fall-through for switch label blocks. This is a difference from DXBC, in which fall-through is prohibited.

DXIL will not support the DXBC label and call instructions; LLVM functions can be used instead (see below). The primary uses for these are (1) HLSL interfaces, which are not supported, and (2) outlining of case-bodies in a switch statement annotated with [call], which is not a scenario of interest.

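The reducibility requirement above refers to the classic T1-T2 test: repeatedly remove self-loops (T1) and merge any non-entry node that has a single predecessor into that predecessor (T2); the graph is reducible if it collapses to a single node. The Python sketch below is one straightforward way to run that test. The graph encoding (a dict from node to successor list) and the function name ``is_reducible`` are assumptions of the example, not a description of how the DXIL validator is implemented.

```python
def is_reducible(cfg, entry):
    """Return True if the CFG collapses to a single node under T1/T2."""
    # Work on a copy restricted to nodes reachable from the entry.
    reachable, stack = {entry}, [entry]
    while stack:
        for s in cfg.get(stack.pop(), ()):
            if s not in reachable:
                reachable.add(s)
                stack.append(s)
    succs = {n: set(cfg.get(n, ())) for n in reachable}

    changed = True
    while changed:
        changed = False
        # T1: remove self-loops.
        for n in succs:
            if n in succs[n]:
                succs[n].discard(n)
                changed = True
        # T2: merge a node with a unique predecessor into that predecessor.
        for n in list(succs):
            if n == entry:
                continue
            preds = [p for p in succs if n in succs[p]]
            if len(preds) == 1:
                p = preds[0]
                succs[p].discard(n)
                succs[p] |= succs.pop(n)
                changed = True
                break
    return len(succs) == 1


loop = {"A": ["B"], "B": ["C", "D"], "C": ["B"], "D": []}
two_entry_loop = {"A": ["B", "C"], "B": ["C"], "C": ["B"]}
print(is_reducible(loop, "A"), is_reducible(two_entry_loop, "A"))
```

The second graph is the textbook irreducible case: the cycle between ``B`` and ``C`` can be entered at either node, so neither transformation applies.
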
Functions
---------

Instead of DXBC labels/calls, DXIL supports functions and call instructions. Recursion is not allowed; the DXIL validator enforces this.

The functions are regular LLVM functions. Parameters can be passed by value or by reference. Functions facilitate separate compilation of big, complex shaders. However, driver compilers are free to inline functions as they see fit.

Identifiers
-----------

DXIL identifiers must conform to LLVM IR identifier rules.

Identifier mangling rules are the ones used by Clang 3.7 with the HLSL target.

The following identifier prefixes are reserved:

* dx.*, dxil.*, dxir.*
* llvm.dx.*, llvm.dxil.*, llvm.dxir.*

Address Width
-------------

DXIL will use only 32-bit addresses for pointers. Byte offsets are also 32-bit.

Shader restrictions
-------------------

There is no support for the following in DXIL:

* recursion
* exceptions
* indirect function calls and dynamic dispatch

Entry points
------------

The dx.entryPoints metadata specifies a list of entry point records, one for each entry point. Libraries could specify more than one entry point per module, but they currently exist outside the DXIL specification; the other shader models must specify exactly one entry point.

For example::

  define void @"\01?myfunc1@@YAXXZ"() #0 { ... }
  define float @"\01?myfunc2@@YAMXZ"() #0 { ... }

  !dx.entryPoints = !{ !1, !2 }

  !1 = !{ void ()* @"\01?myfunc1@@YAXXZ", !"myfunc1", !3, null, null }
  !2 = !{ float ()* @"\01?myfunc2@@YAMXZ", !"myfunc2", !5, !6, !7 }

Each entry point metadata record specifies:

* reference to the entry point function global symbol
* unmangled name
* list of signatures
* list of resources
* list of tag-value pairs of shader capabilities and other properties

A 'null' value specifies the absence of a particular node.

Shader capabilities are properties that are additional to the properties dictated by the shader model. The list is organized as pairs of an i32 tag, followed immediately by the value itself.

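The flat tag-value organization described above can be sketched in Python as follows. The function name ``parse_tag_value_list`` is hypothetical, and rejecting duplicate tags is an assumption of this example rather than a rule stated in the specification. The sample input uses tag 0 (shader flags) and tag 4 (number of threads), both of which are defined later in this document.

```python
def parse_tag_value_list(items):
    """Split a flat [tag0, value0, tag1, value1, ...] list into a dict."""
    if len(items) % 2 != 0:
        raise ValueError("tag-value list must have an even number of elements")
    props = {}
    for tag, value in zip(items[0::2], items[1::2]):
        if not isinstance(tag, int):
            raise TypeError("tags are i32 constants")
        if tag in props:
            # Assumption of this sketch: each tag appears at most once.
            raise ValueError("duplicate tag %d" % tag)
        props[tag] = value
    return props


# Tag 0 carries the i64 flags bitmask; tag 4 carries the (X, Y, Z) thread counts.
print(parse_tag_value_list([0, 0x14, 4, (8, 8, 1)]))
```
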
Hull shader representation
--------------------------

The hull shader is represented as two functions, related via metadata: (1) the control point phase function, which is the entry point of the hull shader, and (2) the patch constant phase function.

For example::

  !dx.entryPoints = !{ !1 }
  !1 = !{ void ()* @"ControlPointFunc", ..., !2 }  ; shader entry record
  !2 = !{ !"HS", !3 }
  !3 = !{ void ()* @"PatchConstFunc", ... }        ; additional hull shader state

The patch constant function represents the original HLSL computation, and is not separated into fork and join phases, as is the case in DXBC. The driver compiler may perform such separation if this is profitable for the target GPU.

The original patch constant function cannot be recovered during DXBC-to-DXIL conversion. Instead, the instructions of each fork and join phase are 'wrapped' by a loop that iterates the corresponding number of phase-instance-count iterations. Thus, the fork/join instance ID becomes the loop induction variable. The LoadPatchConstant intrinsic (see below) represents a load from the DXBC vpc register.

The following table summarizes the names of intrinsic functions to load inputs and store outputs of hull and domain shaders. CP stands for Control Point and PC for Patch Constant.

=================== ==================== ====================== ======================
Operation           Control Point (Hull) Patch Constant         Domain
=================== ==================== ====================== ======================
Store Input CP
Load Input CP       LoadInput            LoadInput
Store Output CP     StoreOutput
Load Output CP                           LoadOutputControlPoint LoadOutputControlPoint
Store PC                                 StorePatchConstant
Load PC                                  LoadPatchConstant      LoadPatchConstant
Store Output Vertex                                             StoreOutput
=================== ==================== ====================== ======================

The LoadPatchConstant function in the PC stage is generated only by the DXBC-to-DXIL converter, to access DXBC vpc registers. The HLSL compiler produces IR that references LLVM IR values directly.

Type System
===========

Most LLVM type system constructs are legal in DXIL.

Primitive Types
---------------

The following types are supported:

* void
* metadata
* i1, i8, i16, i32, i64
* half, float, double

SM6.0 assumes native hardware support for i32 and float types.

i8 is supported only in a few intrinsics to signify masks, enumeration constant values, or in metadata. It is not supported for memory access or computation by the shader.

HLSL min12int, min16int and min16uint data types are mapped to i16.

half and i16 are treated as the corresponding DXBC min-precision types (min16float, min16int/min16uint) in SM6.0.

The HLSL compiler optimizer treats half, i16 and i8 data as data types natively supported by the hardware; i.e., saturation, range clipping, and INF/NaN are handled according to the IEEE standard. Such semantics allow the optimizer to reuse LLVM optimization passes.

Hardware support for doubles is optional and is guarded by the RequiresHardwareDouble CAP bit.

Hardware support for i64 is optional and is guarded by a CAP bit.

Vectors
-------

HLSL vectors are scalarized. They do not participate in computation; however, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Future DXIL may add support for <2 x half> and <2 x i16> vectors or hints for packing related half and i16 quantities.

Matrices
--------

Matrices are lowered to vectors, and are not referenced by instructions. They may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Arrays
------

Instructions may reference only 1D arrays of primitive types. However, complex arrays, e.g., multidimensional arrays or arrays of user-defined types, may be present to convey original variable layout to tools, debuggers, and reflection.

User-defined types
------------------

Original HLSL UDTs are lowered and are not referenced by instructions. However, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection. Some resource operations return 'grouping' UDTs that group several return values; such UDTs are immediately 'decomposed' into components that are then consumed by other instructions.

Type conversions
----------------

Explicit conversions between types are supported via LLVM instructions.

Precise qualifier
-----------------

The HLSL precise type qualifier requires that all operations contributing to the value be IEEE compliant with respect to optimizations.

Each relevant instruction that contributes to such a value is annotated with dx.precise metadata, which indicates that it is illegal for the driver compiler to perform IEEE-unsafe optimizations.

The default mode for DXIL is that operations are not precise; i.e., each operation is 'fast' (this is the reverse of the LLVM IR default mode). The default behavior can be changed for the entire shader via the AllOperationsPrecise shader property.

Type annotations
----------------

User-defined types are annotated in DXIL to 'attach' additional properties to structure fields. For example, DXIL may contain type annotations for reflection purposes::

  ; namespace MyNamespace1
  ; {
  ;   struct MyType1
  ;   {
  ;     float field1;
  ;     int2 field2;
  ;   };
  ; }

  %struct.MyNamespace1.MyType1 = type { float, <2 x i32> }
  !struct.MyNamespace1.MyType1 = !{ !1, !2 }
  !1 = !{ !"field1", null }
  !2 = !{ !"field2", null }

  ; struct MyType2
  ; {
  ;   MyNamespace1::MyType1 array_field[2];
  ;   float4 float4_field;
  ; };

  %struct.MyType2 = type { [2 x %struct.MyNamespace1.MyType1], <4 x float> }
  !struct.MyType2 = !{ !3, !4 }
  !3 = !{ !"array_field", null }
  !4 = !{ !"float4_field", null }

The type/field annotation metadata hierarchy recursively mimics the LLVM type hierarchy.

Each field-annotation record has an optional named-value pair list for infrequent annotations and for future extensions. The lists are null in the example above.

Note that Clang emits '::' to separate namespaces, if any, in type names. We modify Clang to use '.' instead, because it is illegal to use ':' in metadata names.

Shader Properties and Capabilities
==================================

Additional shader properties are specified via a tag-value pair list, which is the last element in the entry function description record.

Shader Flags
------------

Shaders have additional flags that convey their capabilities via a tag-value pair with tag kDxilShaderFlagsTag (0), followed by an i64 bitmask integer. The bits have the following meaning:

=== =====================================================================
Bit Description
=== =====================================================================
0   Disable shader optimizations
1   Disable math refactoring
2   Shader uses doubles
3   Force early depth stencil
4   Enable raw and structured buffers
5   Shader uses min-precision, expressed as half and i16
6   Shader uses double extension intrinsics
7   Shader uses MSAD
8   All resources must be bound for the duration of shader execution
9   Enable view port and RT array index from any stage feeding rasterizer
10  Shader uses inner coverage
11  Shader uses stencil
12  Shader uses intrinsics that access tiled resources
13  Shader uses relaxed typed UAV load formats
14  Shader uses Level9 comparison filtering
15  Shader uses up to 64 UAVs
16  Shader uses UAVs
17  Shader uses CS4 raw and structured buffers
18  Shader uses Rasterizer Ordered Views
19  Shader uses wave intrinsics
20  Shader uses int64 instructions
=== =====================================================================

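As an illustration of how the i64 bitmask above is interpreted, the following Python sketch decodes a mask into the descriptions from the table. The names ``FLAG_DESCRIPTIONS`` and ``decode_shader_flags`` are hypothetical, and treating bits above 20 as an error is an assumption of this example.

```python
# Descriptions copied from the table above; the list index is the bit number.
FLAG_DESCRIPTIONS = [
    "Disable shader optimizations",
    "Disable math refactoring",
    "Shader uses doubles",
    "Force early depth stencil",
    "Enable raw and structured buffers",
    "Shader uses min-precision, expressed as half and i16",
    "Shader uses double extension intrinsics",
    "Shader uses MSAD",
    "All resources must be bound for the duration of shader execution",
    "Enable view port and RT array index from any stage feeding rasterizer",
    "Shader uses inner coverage",
    "Shader uses stencil",
    "Shader uses intrinsics that access tiled resources",
    "Shader uses relaxed typed UAV load formats",
    "Shader uses Level9 comparison filtering",
    "Shader uses up to 64 UAVs",
    "Shader uses UAVs",
    "Shader uses CS4 raw and structured buffers",
    "Shader uses Rasterizer Ordered Views",
    "Shader uses wave intrinsics",
    "Shader uses int64 instructions",
]


def decode_shader_flags(mask):
    """Return the descriptions of all bits set in a shader flags bitmask."""
    if mask < 0 or mask >> len(FLAG_DESCRIPTIONS):
        # Assumption of this sketch: bits not listed in the table are rejected.
        raise ValueError("mask uses bits that are not defined in the table")
    return [d for bit, d in enumerate(FLAG_DESCRIPTIONS) if mask & (1 << bit)]


# Bits 2 and 19: a shader that uses doubles and wave intrinsics.
for description in decode_shader_flags((1 << 2) | (1 << 19)):
    print(description)
```
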
Geometry Shader
---------------

Geometry shader properties are specified via a tag-value pair with tag kDxilGSStateTag (1), followed by a list of GS properties. The format of this list is the following.

=== ==== ===============================================================
Idx Type Description
=== ==== ===============================================================
0   i32  Input primitive (InputPrimitive enum value).
1   i32  Max vertex count.
2   i32  Primitive topology for stream 0 (PrimitiveTopology enum value).
3   i32  Primitive topology for stream 1 (PrimitiveTopology enum value).
4   i32  Primitive topology for stream 2 (PrimitiveTopology enum value).
5   i32  Primitive topology for stream 3 (PrimitiveTopology enum value).
=== ==== ===============================================================

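The positional layout of the GS property list can be sketched in Python as a small decoder. The ``GSState`` record and ``decode_gs_state`` function are hypothetical names, and the integers in the usage line are placeholders, not actual InputPrimitive or PrimitiveTopology enum values.

```python
from collections import namedtuple

# Hypothetical record mirroring the six-entry list in the table above.
GSState = namedtuple(
    "GSState", ["input_primitive", "max_vertex_count", "stream_topologies"]
)


def decode_gs_state(values):
    """Decode the GS property list: [primitive, max vertices, 4 stream topologies]."""
    if len(values) != 6:
        raise ValueError("GS state list must have exactly 6 entries")
    return GSState(values[0], values[1], tuple(values[2:6]))


# Placeholder integers; real values come from the corresponding enums.
state = decode_gs_state([1, 3, 5, 0, 0, 0])
print(state.max_vertex_count, state.stream_topologies)
```
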
Domain Shader
-------------

Domain shader properties are specified via a tag-value pair with tag kDxilDSStateTag (2), followed by a list of DS properties. The format of this list is the following.

=== ==== ===============================================================
Idx Type Description
=== ==== ===============================================================
0   i32  Tessellator domain (TessellatorDomain enum value).
1   i32  Input control point count.
=== ==== ===============================================================

Hull Shader
-----------

Hull shader properties are specified via a tag-value pair with tag kDxilHSStateTag (3), followed by a list of HS properties. The format of this list is the following.

=== ======= =====================================================================
Idx Type    Description
=== ======= =====================================================================
0   MDValue Patch constant function (global symbol).
1   i32     Input control point count.
2   i32     Output control point count.
3   i32     Tessellator domain (TessellatorDomain enum value).
4   i32     Tessellator partitioning (TessellatorPartitioning enum value).
5   i32     Tessellator output primitive (TessellatorOutputPrimitive enum value).
6   float   Max tessellation factor.
=== ======= =====================================================================

Compute Shader
--------------

A compute shader has the following tag-value properties.

===================== ======================== =============================================
Tag                   Value                    Description
===================== ======================== =============================================
kDxilNumThreadsTag(4) MD list: (i32, i32, i32) Number of threads (X,Y,Z) for compute shader.
===================== ======================== =============================================

Shader Parameters and Signatures
================================

This section formalizes how HLSL shader input and output parameters are expressed in DXIL.

HLSL signatures and semantics
-----------------------------

Formal parameters of a shader entry function in HLSL specify how the shader interacts with the graphics pipeline. Input parameters, referred to as an input signature, specify values received by the shader. Output parameters, referred to as an output signature, specify values produced by the shader. The shader compiler maps HLSL input and output signatures into DXIL specifications that conform to hardware constraints outlined in the Direct3D Functional Specification. DXIL specifications are also called signatures.

Signature mapping is a complex process, as there are many constraints. All signature parameters must fit into a finite space of N 4x32-bit registers. For efficiency reasons, parameters are packed together in a way that does not violate specification constraints. The process is called signature packing. Most signatures are tightly packed; however, the VS input signature is not packed, as the values are coming from the Input Assembler (IA) stage rather than the graphics pipeline. In addition, the PS output signature is allocated so that the SV_Target semantic index aligns with the output register index.

Each HLSL signature parameter is defined via a C-like type, an interpolation mode, and a semantic name and index. The type defines the parameter shape, which may be quite complex. The interpolation mode adds to the packing constraints, namely that parameters packed together must have compatible interpolation modes. Semantics are extra names associated with parameters for the following purposes: (1) to specify whether a parameter is a special System Value (SV) or not, (2) to link parameters to IA or StreamOut API streams, and (3) to aid debugging. The semantic index is used to disambiguate parameters that use the same semantic name, or that span multiple rows of the register space.

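To give a feel for what signature packing involves, the sketch below places elements into rows of four components using a simple first-fit strategy, keeping only elements with the same interpolation mode in one row. It is a deliberately simplified illustration: the function name ``pack_signature``, the element tuples, the first-fit strategy, and the row limit of 32 are all assumptions of this example, and the real packing rules involve many more constraints than "same interpolation mode".

```python
def pack_signature(elements, max_rows):
    """First-fit packing of (name, components, interp_mode) into 4-wide rows.

    Returns a dict mapping each element name to (row, start_column).
    """
    rows = []       # one dict per allocated row: {"interp": mode, "used": count}
    placement = {}
    for name, comps, interp in elements:
        if not 1 <= comps <= 4:
            raise ValueError("an element must have 1 to 4 components")
        for idx, row in enumerate(rows):
            # Reuse a row only if the mode matches and the element still fits.
            if row["interp"] == interp and row["used"] + comps <= 4:
                break
        else:
            if len(rows) == max_rows:
                raise ValueError("signature does not fit in the register space")
            rows.append({"interp": interp, "used": 0})
            idx, row = len(rows) - 1, rows[-1]
        placement[name] = (idx, row["used"])
        row["used"] += comps
    return placement


# Hypothetical elements; 32 is an arbitrary stand-in for the limit N.
elements = [("A", 3, "linear"), ("B", 1, "linear"), ("C", 2, "constant"), ("D", 2, "linear")]
print(pack_signature(elements, max_rows=32))
```

In this run ``A`` and ``B`` share row 0, ``C`` needs its own row because its interpolation mode differs, and ``D`` opens a third row because row 0 is already full.
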
SV semantics add specific meanings and constraints to associated parameters. A parameter may be supplied by the hardware, and is then known as a System Generated Value (SGV). Alternatively, a parameter may be interpreted by the hardware and is then known as a System Interpreted Value (SIV). SGVs and SIVs are pipeline-stage dependent; moreover, some participate in signature packing and some do not. Non-SV semantics always participate in signature packing.

Most System Generated Values (SGVs) are loaded using special DXIL intrinsic functions, rather than by loading the input from a signature. These values usually are not present in the signature at all; their presence may be detected by the declaration and use of the special intrinsic function itself. There are two notable exceptions. In the first case, the values are present in the signature and are loaded from it instead of through a special intrinsic, because they must be part of the packed signature potentially passed from the prior stage, which allows the prior stage to override them. Examples are SV_PrimitiveID and SV_IsFrontFace, which may be written in the Geometry Shader. In the second case, the values identify signature elements that still contribute to the DXBC signature for informational purposes, but are read only through the special intrinsic function. Examples are SV_PrimitiveID for GS input and SV_SampleIndex for PS input.

The classification of behavior for various system values in various signature locations is described in a table organized by SemanticKind and SigPointKind. The SigPointKind is a new classification that uniquely identifies each set of parameters that may be input or output for each entry point. For each combination of SemanticKind and SigPointKind, there is a SemanticInterpretationKind that defines the class of treatment for that location.

Each SigPointKind also has a corresponding element allocation (or packing) behavior called PackingKind. Some SigPointKinds do not result in a signature at all, which corresponds to the packing kind of PackingKind::None.

Signature Points are enumerated as follows in the SigPointKind enum:

.. <py>import hctdb_instrhelp</py>
.. <py::lines('SIGPOINT-RST')>hctdb_instrhelp.get_sigpoint_rst()</py>
.. SIGPOINT-RST:BEGIN

== ======== ======= ========== ============== ============= ============================================================================
ID SigPoint Related ShaderKind PackingKind    SignatureKind Description
== ======== ======= ========== ============== ============= ============================================================================
0  VSIn     Invalid Vertex     InputAssembler Input         Ordinary Vertex Shader input from Input Assembler
1  VSOut    Invalid Vertex     Vertex         Output        Ordinary Vertex Shader output that may feed Rasterizer
2  PCIn     HSCPIn  Hull       None           Invalid       Patch Constant function non-patch inputs
3  HSIn     HSCPIn  Hull       None           Invalid       Hull Shader function non-patch inputs
4  HSCPIn   Invalid Hull       Vertex         Input         Hull Shader patch inputs - Control Points
5  HSCPOut  Invalid Hull       Vertex         Output        Hull Shader function output - Control Point
6  PCOut    Invalid Hull       PatchConstant  PatchConstant Patch Constant function output - Patch Constant data passed to Domain Shader
7  DSIn     Invalid Domain     PatchConstant  PatchConstant Domain Shader regular input - Patch Constant data plus system values
8  DSCPIn   Invalid Domain     Vertex         Input         Domain Shader patch input - Control Points
9  DSOut    Invalid Domain     Vertex         Output        Domain Shader output - vertex data that may feed Rasterizer
10 GSVIn    Invalid Geometry   Vertex         Input         Geometry Shader vertex input - qualified with primitive type
11 GSIn     GSVIn   Geometry   None           Invalid       Geometry Shader non-vertex inputs (system values)
12 GSOut    Invalid Geometry   Vertex         Output        Geometry Shader output - vertex data that may feed Rasterizer
13 PSIn     Invalid Pixel      Vertex         Input         Pixel Shader input
14 PSOut    Invalid Pixel      Target         Output        Pixel Shader output
15 CSIn     Invalid Compute    None           Invalid       Compute Shader input
== ======== ======= ========== ============== ============= ============================================================================

.. SIGPOINT-RST:END

Semantic Interpretations are as follows (SemanticInterpretationKind):

.. <py>import hctdb_instrhelp</py>
.. <py::lines('SEMINT-RST')>hctdb_instrhelp.get_sem_interpretation_enum_rst()</py>
.. SEMINT-RST:BEGIN

== ========== =============================================================
ID Name       Description
== ========== =============================================================
0  NA         Not Available
1  SV         Normal System Value
2  SGV        System Generated Value (sorted last)
3  Arb        Treated as Arbitrary
4  NotInSig   Not included in signature (intrinsic access)
5  NotPacked  Included in signature, but does not contribute to packing
6  Target     Special handling for SV_Target
7  TessFactor Special handling for tessellation factors
8  Shadow     Shadow element must be added to a signature for compatibility
== ========== =============================================================

.. SEMINT-RST:END

Semantic Interpretations for each SemanticKind at each SigPointKind are as follows:

.. <py>import hctdb_instrhelp</py>
.. <py::lines('SEMINT-TABLE-RST')>hctdb_instrhelp.get_sem_interpretation_table_rst()</py>
.. SEMINT-TABLE-RST:BEGIN

====================== ==== ===== ======== ======== ====== ======= ========== ========== ====== ===== ===== ======== ===== ============ ============= ========
Semantic               VSIn VSOut PCIn     HSIn     HSCPIn HSCPOut PCOut      DSIn       DSCPIn DSOut GSVIn GSIn     GSOut PSIn         PSOut         CSIn
====================== ==== ===== ======== ======== ====== ======= ========== ========== ====== ===== ===== ======== ===== ============ ============= ========
Arbitrary              Arb  Arb   NA       NA       Arb    Arb     Arb        Arb        Arb    Arb   Arb   NA       Arb   Arb          NA            NA
VertexID               SV   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NA
InstanceID             SV   Arb   NA       NA       Arb    Arb     NA         NA         Arb    Arb   Arb   NA       Arb   Arb          NA            NA
Position               Arb  SV    NA       NA       SV     SV      Arb        Arb        SV     SV    SV    NA       SV    SV           NA            NA
RenderTargetArrayIndex Arb  SV    NA       NA       SV     SV      Arb        Arb        SV     SV    SV    NA       SV    SV           NA            NA
ViewPortArrayIndex     Arb  SV    NA       NA       SV     SV      Arb        Arb        SV     SV    SV    NA       SV    SV           NA            NA
ClipDistance           Arb  SV    NA       NA       SV     SV      Arb        Arb        SV     SV    SV    NA       SV    SV           NA            NA
CullDistance           Arb  SV    NA       NA       SV     SV      Arb        Arb        SV     SV    SV    NA       SV    SV           NA            NA
OutputControlPointID   NA   NA    NA       NotInSig NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NA
DomainLocation         NA   NA    NA       NA       NA     NA      NA         NotInSig   NA     NA    NA    NA       NA    NA           NA            NA
PrimitiveID            NA   NA    NotInSig NotInSig NA     NA      NA         NotInSig   NA     NA    NA    Shadow   SGV   SGV          NA            NA
GSInstanceID           NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NotInSig NA    NA           NA            NA
SampleIndex            NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    Shadow _41   NA            NA
IsFrontFace            NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       SGV   SGV          NA            NA
Coverage               NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NotInSig _50 NotPacked _41 NA
InnerCoverage          NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NotInSig _50 NA            NA
Target                 NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           Target        NA
Depth                  NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NotPacked     NA
DepthLessEqual         NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NotPacked _50 NA
DepthGreaterEqual      NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NotPacked _50 NA
StencilRef             NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NotPacked _50 NA
DispatchThreadID       NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NotInSig
GroupID                NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NotInSig
GroupIndex             NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NotInSig
GroupThreadID          NA   NA    NA       NA       NA     NA      NA         NA         NA     NA    NA    NA       NA    NA           NA            NotInSig
TessFactor             NA   NA    NA       NA       NA     NA      TessFactor TessFactor NA     NA    NA    NA       NA    NA           NA            NA
InsideTessFactor       NA   NA    NA       NA       NA     NA      TessFactor TessFactor NA     NA    NA    NA       NA    NA           NA            NA
====================== ==== ===== ======== ======== ====== ======= ========== ========== ====== ===== ===== ======== ===== ============ ============= ========

.. SEMINT-TABLE-RST:END

Below is a vertex shader example that is used for illustration throughout this section::

  struct Foo {
    float a;
    float b[2];
  };

  struct VSIn {
    uint vid : SV_VertexID;
    float3 pos : Position;
    Foo foo[3] : SemIn1;
    float f : SemIn10;
  };

  struct VSOut
  {
    float f : SemOut1;
    Foo foo[3] : SemOut2;
    float4 pos : SV_Position;
  };

  void main(in VSIn In,    // input signature
            out VSOut Out) // output signature
  {
    ...
  }

Signature packing must be efficient. It should use as few registers as possible, and the packing algorithm should run in reasonable time. The complication is that the problem is NP-complete, so the algorithm must resort to a heuristic.

While the details of the packing algorithm are not important at the moment, it is important to outline some concepts related to how a packed signature is represented in DXIL. Packing is further complicated by the complexity of parameter shapes induced by the C/C++ type system. In the example above, each field of the Out.foo array is effectively an array itself, strided in memory. Allocating such strided shapes efficiently is hard. To simplify packing, the first step is to break user-defined (struct) parameters into their constituent components and to make strided arrays contiguous. This preparation step enables the algorithm to operate on dense rectangular shapes, which we call signature elements. The output signature in the example above has the following elements: float Out_f, float Out_foo_a[3], float Out_foo_b[2][3], and float4 pos. Each element is characterized by its number of rows and columns. These are 1x1, 3x1, 6x1, and 1x4, respectively. The packing algorithm reduces to fitting these elements into an Nx4 register space while satisfying all packing-compatibility constraints.

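The preparation step described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the compiler's implementation: the tuple-based type encoding, the ``flatten`` function, and the generated element names (including ``Out_pos``) are assumptions made for this example.

```python
# Hypothetical type model used only for this sketch:
#   ("scalar", cols)                 a scalar or vector with `cols` components
#   ("array", element_type, count)   a fixed-size array
#   ("struct", [(field_name, type)]) a user-defined type
def flatten(name, ty, dims=()):
    """Yield (element_name, rows, cols) for each dense signature element."""
    kind = ty[0]
    if kind == "scalar":
        rows = 1
        for d in dims:          # every enclosing array dimension adds rows
            rows *= d
        yield (name, rows, ty[1])
    elif kind == "array":
        yield from flatten(name, ty[1], dims + (ty[2],))
    elif kind == "struct":
        for field, field_type in ty[1]:
            yield from flatten(name + "_" + field, field_type, dims)
    else:
        raise ValueError("unknown type kind: %r" % (kind,))

FLOAT = ("scalar", 1)
FLOAT4 = ("scalar", 4)
FOO = ("struct", [("a", FLOAT), ("b", ("array", FLOAT, 2))])
VS_OUT = ("struct", [("f", FLOAT), ("foo", ("array", FOO, 3)), ("pos", FLOAT4)])

elements = list(flatten("Out", VS_OUT))
for element in elements:
    print(element)
```

For the VSOut structure this yields elements of shape 1x1, 3x1, 6x1, and 1x4, matching the shapes listed above.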
Signature element record
------------------------
Each signature element is represented in DXIL as a metadata record.

For the example output signature above, the element records are as follows::

  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !20 = !{i32 6, !"SemOut", i8 0, i8 0, !40, i8 2, i32 1, i8 1, i32 1, i8 2, null}
  !21 = !{i32 7, !"SemOut", i8 0, i8 0, !41, i8 2, i32 3, i8 1, i32 1, i8 1, null}
  !22 = !{i32 8, !"SemOut", i8 0, i8 0, !42, i8 2, i32 6, i8 1, i32 1, i8 0, null}
  !23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43, i8 2, i32 1, i8 4, i32 0, i8 0, null}

A record contains the following fields.

=== =============== ===============================================================================
Idx Type            Description
=== =============== ===============================================================================
0   i32             Unique signature element record ID, used to identify the element in operations.
1   String metadata Semantic name.
2   i8              ComponentType (enum value).
3   i8              SemanticKind (enum value).
4   Metadata        Metadata list that enumerates all semantic indexes of the flattened parameter.
5   i8              InterpolationMode (enum value).
6   i32             Number of element rows.
7   i8              Number of element columns.
8   i32             Starting row of element packing location.
9   i8              Starting column of element packing location.
10  Metadata        Metadata list of additional tag-value pairs; can be 'null' or empty.
=== =============== ===============================================================================

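As a reading aid, the field layout in the table above can be mirrored by a small Python structure. This is a sketch only: ``SignatureElement`` and ``parse_element`` are hypothetical names, and the record is assumed to have already been read into a plain tuple with the semantic index list expanded in place.

```python
from collections import namedtuple

# Field order follows the Idx column of the table above.
SignatureElement = namedtuple("SignatureElement", [
    "element_id", "semantic_name", "component_type", "semantic_kind",
    "semantic_indexes", "interpolation_mode", "rows", "cols",
    "start_row", "start_col", "extended",
])

def parse_element(record):
    """Turn an 11-field record tuple into a named structure."""
    if len(record) != 11:
        raise ValueError("expected 11 fields, got %d" % len(record))
    return SignatureElement(*record)

# The SV_Position record (!23) from the example, with !43 expanded to [0].
pos = parse_element((9, "SV_Position", 0, 3, [0], 2, 1, 4, 0, 0, None))
print(pos.semantic_name, pos.rows, pos.cols, pos.start_row, pos.start_col)
```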
System value semantic names always start with the prefix 'SV_', and it is illegal to start a user semantic with this prefix. Non-SV semantic names can be ignored by drivers. Debug layers may use these names to help validate signature compatibility between stages.

The last metadata list is used to specify additional properties and future extensions.

Signature record metadata
-------------------------

A shader typically has two signatures: input and output. Hull and domain shaders have an additional patch constant signature. The signatures are composed of signature element records and are attached to the shader entry metadata. The examples below clarify metadata details.

Vertex shader HLSL
~~~~~~~~~~~~~~~~~~

Here is the HLSL of the above vertex shader. The semantic index assignment is explained in the section below::

  struct Foo
  {
    float a;
    float b[2];
  };

  struct VSIn
  {
    uint vid : SV_VertexID;
    float3 pos : Position;
    Foo foo[3] : SemIn1;
      // semantic index assignment:
      // foo[0].a    : SemIn1
      // foo[0].b[0] : SemIn2
      // foo[0].b[1] : SemIn3
      // foo[1].a    : SemIn4
      // foo[1].b[0] : SemIn5
      // foo[1].b[1] : SemIn6
      // foo[2].a    : SemIn7
      // foo[2].b[0] : SemIn8
      // foo[2].b[1] : SemIn9
    float f : SemIn10;
  };

  struct VSOut
  {
    float f : SemOut1;
    Foo foo[3] : SemOut2;
      // semantic index assignment:
      // foo[0].a    : SemOut2
      // foo[0].b[0] : SemOut3
      // foo[0].b[1] : SemOut4
      // foo[1].a    : SemOut5
      // foo[1].b[0] : SemOut6
      // foo[1].b[1] : SemOut7
      // foo[2].a    : SemOut8
      // foo[2].b[0] : SemOut9
      // foo[2].b[1] : SemOut10
    float4 pos : SV_Position;
  };

  void main(in VSIn In,    // input signature
            out VSOut Out) // output signature
  {
    ...
  }

The input signature is packed to be compatible with the IA stage. A packing algorithm must assign the following starting positions to the input signature elements:

=================== ==== ======= ========= ===========
Input element       Rows Columns Start row Start column
=================== ==== ======= ========= ===========
uint VSIn.vid       1    1       0         0
float3 VSIn.pos     1    3       1         0
float VSIn.foo.a[3] 3    1       2         0
float VSIn.foo.b[6] 6    1       5         0
float VSIn.f        1    1       11        0
=================== ==== ======= ========= ===========

A reasonable packing algorithm would assign the following starting positions to the output signature elements:

==================== ==== ======= ========= ===========
Output element       Rows Columns Start row Start column
==================== ==== ======= ========= ===========
float VSOut.f        1    1       1         2
float VSOut.foo.a[3] 3    1       1         1
float VSOut.foo.b[6] 6    1       1         0
float4 VSOut.pos     1    4       0         0
==================== ==== ======= ========= ===========

Semantic index assignment
~~~~~~~~~~~~~~~~~~~~~~~~~
Semantic index assignment in DXIL is exactly the same as for DXBC. Semantic index assignment, abbreviated s.idx above, is a consecutive enumeration of all fields under the same semantic name, as if the signature were packed for the IA stage. That is, given a complex signature element, e.g., VSOut's foo[3] with semantic name SemOut and starting index 2, the element is flattened into individual fields: foo[0].a, foo[0].b[0], ..., foo[2].b[1], and the fields receive consecutive semantic indexes 2, 3, ..., 10, respectively. Pairs of semantic name and semantic index are used to set up the IA stage and to capture values of individual signature registers via the StreamOut API.

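The enumeration for the Foo foo[3] parameter of the example shader can be reproduced with a short Python sketch. The function name and its return shape are assumptions made for illustration; only the numbering rule comes from the text above.

```python
def foo_semantic_indexes(array_len, start_index):
    """Enumerate the fields of `Foo foo[array_len]` in declaration order.

    Foo is { float a; float b[2]; } as in the example shader.
    Returns (index per flattened field, indexes of element foo.a,
    indexes of element foo.b).
    """
    per_field = {}
    a_indexes, b_indexes = [], []
    index = start_index
    for i in range(array_len):
        per_field["foo[%d].a" % i] = index
        a_indexes.append(index)
        index += 1
        for j in range(2):
            per_field["foo[%d].b[%d]" % (i, j)] = index
            b_indexes.append(index)
            index += 1
    return per_field, a_indexes, b_indexes

# VSOut.foo starts at semantic index 2 (SemOut2).
per_field, a_idx, b_idx = foo_semantic_indexes(3, 2)
print(a_idx)
print(b_idx)
```

Because foo.a and foo.b become separate signature elements, each element carries a non-contiguous list of semantic indexes, which is why the s.idx field of a record is a metadata list rather than a single value.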
DXIL for VS signatures
~~~~~~~~~~~~~~~~~~~~~~

The corresponding DXIL metadata is presented below::

  !dx.entryPoints = !{ !1 }
  !1 = !{ void @main(), !"main", !2, null, null }
  ; Signatures: In, Out, Patch Constant (optional)
  !2 = !{ !3, !4, null }

  ; Input signature (packed according to IA rules)
  !3 = !{ !10, !11, !12, !13, !14 }
  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !10 = !{i32 1, !"SV_VertexID", i8 0, i8 1, !30, i8 0, i32 1, i8 1, i32 0, i8 0, null}
  !11 = !{i32 2, !"Position", i8 0, i8 0, !30, i8 0, i32 1, i8 3, i32 1, i8 0, null}
  !12 = !{i32 3, !"SemIn", i8 0, i8 0, !32, i8 0, i32 3, i8 1, i32 2, i8 0, null}
  !13 = !{i32 4, !"SemIn", i8 0, i8 0, !33, i8 0, i32 6, i8 1, i32 5, i8 0, null}
  !14 = !{i32 5, !"SemIn", i8 0, i8 0, !34, i8 0, i32 1, i8 1, i32 11, i8 0, null}
  ; semantic index assignment:
  !30 = !{ i32 0 }
  !32 = !{ i32 1, i32 4, i32 7 }
  !33 = !{ i32 2, i32 3, i32 5, i32 6, i32 8, i32 9 }
  !34 = !{ i32 10 }

  ; Output signature (tightly packed according to pipeline stage packing rules)
  !4 = !{ !20, !21, !22, !23 }
  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !20 = !{i32 6, !"SemOut", i8 0, i8 0, !40, i8 2, i32 1, i8 1, i32 1, i8 2, null}
  !21 = !{i32 7, !"SemOut", i8 0, i8 0, !41, i8 2, i32 3, i8 1, i32 1, i8 1, null}
  !22 = !{i32 8, !"SemOut", i8 0, i8 0, !42, i8 2, i32 6, i8 1, i32 1, i8 0, null}
  !23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43, i8 2, i32 1, i8 4, i32 0, i8 0, null}
  ; semantic index assignment:
  !40 = !{ i32 1 }
  !41 = !{ i32 2, i32 5, i32 8 }
  !42 = !{ i32 3, i32 4, i32 6, i32 7, i32 9, i32 10 }
  !43 = !{ i32 0 }

Hull shader example
~~~~~~~~~~~~~~~~~~~
A hull shader (HS) is defined by two entry point functions: a control point (CP) function that computes control points, and a patch constant (PC) function that computes patch constant data, including the tessellation factors. The inputs to both functions are the input control points for an entire patch; therefore each element may be indexed by row and is, in addition, indexed by vertex.

Here is an example of HS entry point metadata and the signature list::

  ; !105 is extended parameter list containing reference to HS State:
  !101 = !{ void @HSMain(), !"HSMain", !102, null, !105 }
  ; Signatures: In, Out, Patch Constant
  !102 = !{ !103, !104, !204 }

The entry point record specifies: (1) the CP function HSMain as the main symbol, and (2) the PC function via the optional metadata node !105.

CP-input signature describing one input control point::

  !103 = !{ !110, !111 }
  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !110 = !{i32 1, !"SV_Position", i8 0, i8 3, !130, i8 0, i32 1, i8 4, i32 0, i8 0, null}
  !111 = !{i32 2, !"array", i8 0, i8 0, !131, i8 0, i32 4, i8 3, i32 1, i8 0, null}
  ; semantic indexing for flattened elements:
  !130 = !{ i32 0 }
  !131 = !{ i32 0, i32 1, i32 2, i32 3 }

Note that SV_OutputControlPointID and SV_PrimitiveID input elements are SGVs loaded through special DXIL intrinsics, and are not present in the signature at all. These have a semantic interpretation of SemanticInterpretationKind::NotInSig.

CP-output signature describing one output control point::

  !104 = !{ !120, !121 }
  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !120 = !{i32 3, !"SV_Position", i8 0, i8 3, !130, i8 0, i32 1, i8 4, i32 0, i8 0, null}
  !121 = !{i32 4, !"array", i8 0, i8 0, !131, i8 0, i32 4, i8 3, i32 1, i8 0, null}

Hull shaders require an extended parameter that defines extra state::

  ; extended parameter HS State
  !105 = !{ i32 3, !201 }

  ; HS State record defines patch constant function and other properties
  ; Patch Constant Function, in CP count, out CP count, tess domain, tess part, out prim, max tess factor
  !201 = !{ void @PCMain(), 4, 4, 3, 1, 3, 16.0 }

PC-output signature::

  !204 = !{ !220, !221, !222 }
  ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  !220 = !{i32 3, !"SV_TessFactor", i8 0, i8 25, !130, i8 0, i32 4, i8 1, i32 0, i8 3, null}
  !221 = !{i32 4, !"SV_InsideTessFactor", i8 0, i8 26, !231, i8 0, i32 2, i8 1, i32 4, i8 3, null}
  !222 = !{i32 5, !"array", i8 0, i8 0, !131, i8 0, i32 4, i8 3, i32 0, i8 0, null}
  ; semantic indexing for flattened elements:
  !231 = !{ i32 0, i32 1 }

Accessing signature value in operations
---------------------------------------

There are no function parameters or variables that correspond to signature elements. Instead, the loadInput and storeOutput functions are used to access signature element values in operations. The accesses are scalar.

These are the operation signatures::

  ; overloads: SM5.1: f16|f32|i16|i32, SM6.0: f16|f32|f64|i8|i16|i32|i64
  declare float @dx.op.loadInput.f32(
      i32,   ; opcode
      i32,   ; input ID
      i32,   ; row (relative to start row of input ID)
      i8,    ; column (relative to start column of input ID), constant in [0,3]
      i32)   ; vertex index

  ; overloads: SM5.1: f16|f32|i16|i32, SM6.0: f16|f32|f64|i8|i16|i32|i64
  declare void @dx.op.storeOutput.f32(
      i32,   ; opcode
      i32,   ; output ID
      i32,   ; row (relative to start row of output ID)
      i8,    ; column (relative to start column of output ID), constant in [0,3]
      float) ; value to store

LoadInput and storeOutput take an input or output element ID, which is the unique ID of a signature element metadata record. The row parameter is the array element row index relative to the start of the element; the register index is obtained by adding the start row of the element and the row parameter value. Similarly, the column parameter is a relative column index; the packed register component is obtained by adding the start column of the element and the column value. Several overloads exist to access elements of different primitive types. LoadInput takes an additional vertex index parameter that represents the vertex index for DS CP-inputs and GS inputs; the vertex index must be undef in other cases.

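The address arithmetic described above is simple enough to state as code. The following Python sketch uses a hypothetical helper, ``resolve_location``, that maps the relative row and column operands of loadInput/storeOutput to an absolute register and component, given the packing location stored in the element record.

```python
def resolve_location(start_row, start_col, rows, cols, row, col):
    """Map element-relative (row, col) to an absolute (register, component)."""
    if not 0 <= row < rows:
        raise IndexError("row %d outside element with %d rows" % (row, rows))
    if not 0 <= col < cols:
        raise IndexError("col %d outside element with %d cols" % (col, cols))
    register = start_row + row
    component = start_col + col
    if component > 3:
        raise ValueError("element does not fit in a 4-component register")
    return register, component

# VSOut.foo.b from the example: 6 rows, 1 column, packed at row 1, column 0.
foo_b_row4 = resolve_location(1, 0, 6, 1, row=4, col=0)
# VSOut.f from the example: 1x1, packed at row 1, column 2.
f_loc = resolve_location(1, 2, 1, 1, row=0, col=0)
# If a driver relocates VSOut.f to row 7, column 0, the operands stay (0, 0).
f_relocated = resolve_location(7, 0, 1, 1, row=0, col=0)
print(foo_b_row4, f_loc, f_relocated)
```

Only the start row and start column in the element record change when an element is moved; the operands of the operation itself are untouched.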
Signature packing
-----------------

Signature elements must be packed into a space of N registers, each holding four 32-bit components, according to runtime constraints. DXIL contains packed signatures. The packing algorithm is more aggressive than that for DX11. However, DXIL packing is only a suggestion to the driver implementation. Driver compilers can rearrange signature elements as they see fit, while preserving compatibility of connected pipeline stages. DXIL is designed in such a way that it is easy to 'relocate' signature elements: loadInput/storeOutput row and column indices do not need to change, since they are relative to the start row/column of each element.

Signature packing types
~~~~~~~~~~~~~~~~~~~~~~~

Two pipeline stages can connect in four different ways, resulting in four packing types.

1. Input Assembly: VS input only

   * Each element maps to a unique register; elements may not be packed together.
   * Interpolation mode is not used.

2. Connects to Rasterizer: VS output, HS CP-input/output and PC-input, DS CP-input/output, GS input/output, PS input

   * Elements can be packed according to constraints.
   * Interpolation mode is used and must be consistent between connecting signatures.
   * While HS CP-output and DS CP-input signatures do not go through the rasterizer, they are still treated as such. The reason is the pass-through HS case, in which HS CP-input and HS CP-output must have identical packing for efficiency.

3. Patch Constant: HS PC-output, DS PC-input

   * SV_TessFactor and SV_InsideTessFactor are the only SVs relevant here, and this is the only location where they are legal. These have special packing considerations.
   * Interpolation mode is not used.

4. Pixel Shader Output: PS output only

   * Only SV_Target maps to output register space.
   * No packing is performed; the semantic index corresponds to the render target index.

Packing constraints
~~~~~~~~~~~~~~~~~~~

The packing algorithm is stricter and more aggressive in DXIL than in DXBC, although still compatible. In particular, array signature elements are not broken up into scalars, even if each array access can be disambiguated to a literal index. DXIL and DXBC signature packing are not identical, so linking them together into a single pipeline is not supported across compiler generations.

The row dimension of a signature element represents an index range. If constraints permit, two adjacent or overlapping index ranges are coalesced into a single index range.

Packing constraints are as follows:

1. A register must have only one interpolation mode for all 4 components.
2. Register components containing SVs must be to the right of components containing non-SVs.
3. SV_ClipDistance and SV_CullDistance have additional constraints:

   a. May be packed together
   b. Must occupy a maximum of 2 registers (8 components)
   c. SV_ClipDistance must have linear interpolation mode

4. Registers containing SVs may not be within an index range, with the exception of Tessellation Factors (TessFactors).
5. If an index range R1 overlaps with a TessFactor index range R2, R1 must be contained within R2. As a consequence, outside and inside TessFactors occupy disjoint index ranges when packed.
6. Non-TessFactor index ranges are combined into a larger range, if they overlap.
7. SGVs must be packed after all non-SGVs have been packed. If there are several SGVs, they are packed in the order of HLSL declaration.

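To make the first two constraints concrete, here is a minimal Python sketch of a layout checker. It is not the validator's algorithm: the dictionary-based element model and the name ``check_packing`` are assumptions for illustration, and only constraints 1 and 2 plus a basic overlap test are covered.

```python
def check_packing(elements):
    """Return a list of violations of constraints 1 and 2 (plus overlaps)."""
    registers = {}  # row -> {component: (interp, is_sv, element name)}
    problems = []
    for e in elements:
        for r in range(e["start_row"], e["start_row"] + e["rows"]):
            comps = registers.setdefault(r, {})
            for c in range(e["start_col"], e["start_col"] + e["cols"]):
                if c > 3:
                    problems.append("%s exceeds 4 components" % e["name"])
                elif c in comps:
                    problems.append("%s overlaps %s" % (e["name"], comps[c][2]))
                else:
                    comps[c] = (e["interp"], e["is_sv"], e["name"])
    for r, comps in sorted(registers.items()):
        # Constraint 1: a single interpolation mode per register.
        if len({v[0] for v in comps.values()}) > 1:
            problems.append("register %d mixes interpolation modes" % r)
        # Constraint 2: SV components must be to the right of non-SV ones.
        sv = [c for c, v in comps.items() if v[1]]
        non_sv = [c for c, v in comps.items() if not v[1]]
        if sv and non_sv and min(sv) < max(non_sv):
            problems.append("register %d has an SV left of a non-SV" % r)
    return problems

def elem(name, start_row, start_col, rows, cols, interp, is_sv):
    return dict(name=name, start_row=start_row, start_col=start_col,
                rows=rows, cols=cols, interp=interp, is_sv=is_sv)

# The VS output layout from the example earlier in this section.
vs_out = [
    elem("f", 1, 2, 1, 1, 2, False),
    elem("foo.a", 1, 1, 3, 1, 2, False),
    elem("foo.b", 1, 0, 6, 1, 2, False),
    elem("pos", 0, 0, 1, 4, 2, True),
]
# A deliberately bad layout: an SV placed to the left of a user value.
bad = [elem("sv", 0, 0, 1, 1, 2, True), elem("user", 0, 1, 1, 1, 2, False)]

print(check_packing(vs_out))
print(check_packing(bad))
```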
Packing for SGVs
~~~~~~~~~~~~~~~~

Non-SGV portions of two connecting signatures must match; however, SGV portions don't have to. An example would be a PS declaring SV_PrimitiveID as an input. If a VS connects to the PS, the PS's SV_PrimitiveID value is synthesized by hardware; moreover, it is illegal to output SV_PrimitiveID from a VS. If a GS connects to the PS, the GS may declare SV_PrimitiveID as its output.

Unfortunately, SGV specification creates a complication for separate compilation of connecting shaders. For example, suppose a GS outputs SV_PrimitiveID, and a PS inputs SV_IsFrontFace and SV_PrimitiveID, in this order. The positions of SV_PrimitiveID are then incompatible in the GS and PS signatures. Not much can be done about this ambiguity in SM5.0 and earlier; programmers have to rely on SDKLayers to catch a potential mismatch.

SM5.1 and later shaders work on the D3D12+ runtime, which uses PSO objects to describe pipeline state. Therefore, a driver compiler has access to both connecting shaders during compilation, even though the HLSL compiler does not. The driver compiler can resolve SGV ambiguity in signatures easily. For SM5.1 and later, the HLSL compiler will ensure that declared SGVs fit into the packed signature; however, it will set each SGV's start row-column location to (-1, 0), such that the driver compiler must resolve SGV placement during PSO compilation.

Shader Resources
================

All global resources referenced by entry points of an LLVM module are described via named metadata dx.resources, which consists of four metadata lists of resource records::

  !dx.resources = !{ !1, !2, !3, !4 }

Resource lists are as follows.

=== ======== ==============================
Idx Type     Description
=== ======== ==============================
0   Metadata SRVs - shader resource views.
1   Metadata UAVs - unordered access views.
2   Metadata CBVs - constant buffer views.
3   Metadata Samplers.
=== ======== ==============================

Metadata resource records
-------------------------

Each resource list contains resource records. Each resource record contains fields that are common for each resource type, followed by fields specific to each resource type, followed by a metadata list of tag/value pairs, which can be used to specify additional properties or future extensions and may be null or empty.

Common fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
0   i32             Unique resource record ID, used to identify the resource record in createHandle operation.
1   Pointer         Pointer to a global constant symbol with the original shape of resource and element type.
2   Metadata string Name of resource variable.
3   i32             Bind space ID of the root signature range that corresponds to this resource.
4   i32             Bind lower bound of the root signature range that corresponds to this resource.
5   i32             Range size of the root signature range that corresponds to this resource.
=== =============== ==========================================================================================

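Fields 3 to 5 together describe a contiguous range of bindings. The Python sketch below shows one way to interpret them; ``binding_range`` is a hypothetical helper, and treating a range size of -1 as an unbounded range is an assumption inferred from the unbounded-array example later in this section rather than something stated by the table.

```python
def binding_range(space, lower_bound, range_size):
    """Return (space, first_register, last_register).

    last_register is None for an unbounded range (range_size == -1,
    an assumption made for this sketch).
    """
    if range_size == -1:
        return (space, lower_bound, None)
    if range_size < 1:
        raise ValueError("range size must be positive or -1")
    return (space, lower_bound, lower_bound + range_size - 1)

# StructuredBuffer MyBuffer[2][3] : register(t1, space0) occupies 6 registers.
my_buffer = binding_range(0, 1, 6)
# ConstantBuffer MyCBuffer1[][3] : register(b5, space7) is unbounded.
my_cbuffer = binding_range(7, 5, -1)
print(my_buffer, my_cbuffer)
```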
When the shader has reflection information, the name is the original, unmangled HLSL name. If reflection is stripped, the name is an empty string.

SRV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             SRV resource shape (enum value).
7   i32             SRV sample count.
8   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

SRV-specific tag/value pairs:

=== === ==== =================================================== ============================================
Idx Tag Type Resource Type                                       Description
=== === ==== =================================================== ============================================
0   0   i32  Any resource, except RawBuffer and StructuredBuffer Element type.
1   1   i32  StructuredBuffer                                    Element stride of StructuredBuffer, in bytes.
=== === ==== =================================================== ============================================

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).

UAV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             UAV resource shape (enum value).
7   i1              1 - globally-coherent UAV; 0 - otherwise.
8   i1              1 - UAV has counter; 0 - otherwise.
9   i1              1 - UAV is ROV (rasterizer ordered view); 0 - otherwise.
10  Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

UAV-specific tag/value pairs:

=== === ==== ====================================================== ============================================
Idx Tag Type Resource Type                                          Description
=== === ==== ====================================================== ============================================
0   0   i32  RW resource, except RWRawBuffer and RWStructuredBuffer Element type.
1   1   i32  RWStructuredBuffer                                     Element stride of RWStructuredBuffer, in bytes.
=== === ==== ====================================================== ============================================

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).

CBV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             Constant buffer size in bytes.
7   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

Sampler-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             Sampler type (enum value).
7   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

The following example demonstrates SRV metadata::

  ; Original HLSL
  ; Texture2D<float4> MyTexture2D : register(t0, space0);
  ; StructuredBuffer<NS1::MyType1> MyBuffer[2][3] : register(t1, space0);

  !1 = !{ !2, !3 }

  ; Scalar resource: Texture2D<float4> MyTexture2D.
  %dx.types.ResElem.v4f32 = type { <4 x float> }
  @MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
  !2 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
          i32 0, i32 0, i32 1, i32 2, i32 0, null }

  ; Array resource: StructuredBuffer<MyType1> MyBuffer[2][3].
  %struct.NS1.MyType1 = type { float, <2 x i32> }
  %dx.types.ResElem.NS1.MyType1 = type { %struct.NS1.MyType1 }
  @MyBuffer = external addrspace(1) constant [2 x [3 x %dx.types.ResElem.NS1.MyType1]], align 16
  !3 = !{ i32 1, [2 x [3 x %dx.types.ResElem.NS1.MyType1]] addrspace(1)* @MyBuffer, !"MyBuffer",
          i32 0, i32 1, i32 6, i32 11, i32 0, null }

The type name of the variable is constructed by appending the element name (primitive, vector, or UDT name) to the dx.types.ResElem prefix. The type configuration of the resource range variable conveys (1) the resource range shape and (2) the resource element type.


Reflection information
----------------------

Resource reflection data is conveyed via the resource's metadata record and its global, external variable. The metadata record contains the original HLSL name, root signature range information, and the reference to the global resource variable declaration. The resource variable declaration conveys the resource range shape, resource type, and resource element type.

The following disassembly provides an example::

  ; Scalar resource: Texture2D<float4> MyTexture2D.
  %dx.types.ResElem.v4f32 = type { <4 x float> }
  @MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
  !0 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
          i32 0, i32 3, i32 1, i32 2, i32 0, null }

  ; struct MyType2 { float4 field1; int2 field2; };
  ; Constant buffer: ConstantBuffer<MyType2> MyCBuffer1[][3] : register(b5, space7)
  %struct.MyType2 = type { <4 x float>, <2 x i32> }
  ; Type reflection information (optional)
  !struct.MyType2 = !{ !1, !2 }
  !1 = !{ !"field1", null }
  !2 = !{ !"field2", null }

  %dx.types.ResElem.MyType2 = type { %struct.MyType2 }

  @MyCBuffer1 = external addrspace(1) constant [0 x [3 x %dx.types.ResElem.MyType2]], align 16

  !3 = !{ i32 0, [0 x [3 x %dx.types.ResElem.MyType2]] addrspace(1)* @MyCBuffer1, !"MyCBuffer1",
          i32 7, i32 5, i32 -1, null }

The reflection information can be removed from DXIL by obfuscating the resource HLSL name and the resource variable name, as well as by removing reflection type annotations, if any.

1099
Structure of resource operation
1100
-------------------------------
1101
1102
Operations involving shader resources and samplers are expressed via external function calls.
1103
1104
Below is an example for the sample method::
1105
1106
%dx.types.ResRet.f32 = type { float, float, float, float, i32 }
1107
1108
declare %dx.types.ResRet.f32 @dx.op.sample.f32(
1109
i32, ; opcode
1110
%dx.types.ResHandle, ; texture handle
1111
%dx.types.SamplerHandle, ; sampler handle
1112
float, ; coordinate c0
1113
float, ; coordinate c1
1114
float, ; coordinate c2
1115
float, ; coordinate c3
1116
i32, ; offset o0
1117
i32, ; offset o1
1118
i32, ; offset o2
1119
float) ; clamp
1120
1121
The method always returns five scalar values that are aggregated in dx.types.ResRet.f32 type and extracted into scalars via LLVM's extractelement right after the call. The first four elements are sample values and the last field is the status of operation for tiled resources. Some return values may be unused, which is easily determined from the SSA form. The driver compiler is free to specialize the sample instruction to the most efficient form depending on which return values are used in computation.

If applicable, each intrinsic is overloaded on return type, e.g.::

  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
  %dx.types.ResRet.f16 = type { half, half, half, half, i32 }

  declare %dx.types.ResRet.f32 @dx.op.sample.f32(...)
  declare %dx.types.ResRet.f16 @dx.op.sample.f16(...)

Wherever applicable, the return type indicates the "precision" at which the operation is executed. For example, a sample intrinsic that returns half data may be executed at half precision, assuming the hardware supports this; however, if the return type is float, the sample operation must be executed in float precision. If the hardware does not support the lower precision, it may execute a higher-precision variant of the operation.

The opcode parameter uniquely identifies the sample operation. More details can be found in the Instructions section. The value of opcode is the same for all overloads of an operation.

Some resource operations are "polymorphic" with respect to resource types, e.g., dx.op.sample.f32 operates on several resource types: Texture1D[Array], Texture2D[Array], Texture3D, TextureCUBE[Array].

Each resource/sampler is represented by a pair of i32 values. The first value is a unique (virtual) resource range ID, which corresponds to the HLSL declaration of a resource/sampler. The range ID must be a constant for SM5.1 and below. The second integer is a 0-based index within the range. The index must be constant for SM5.0 and below.

Both indices can be dynamic for SM6 and later to provide flexibility in usage of resources/samplers in control flow, e.g.::

  Texture2D<float4> a[8], b[8];
  ...
  Texture2D<float4> c;
  if(cond) // arbitrary expression
    c = a[idx1];
  else
    c = b[idx2];
  ... = c.Sample(...);

Resources/samplers used in such a way must reside in descriptor tables (cannot be root descriptors); this will be validated during shader and root signature setup.

The DXIL verifier will ensure that all leaf-ranges (a and b above) of such a resource/sampler live-range have the same resource/sampler type and element type. If applicable, this constraint may be relaxed in the future. In particular, it is logical from an HLSL programmer's point of view to issue loads on compatible resource types, e.g., Texture2D, RWTexture2D, ROVTexture2D::

  Texture2D<float4> a[8];
  RWTexture2D<float4> b[6];
  ...
  Texture2D<float4> c;
  if(cond) // arbitrary expression
    c = a[idx1];
  else
    c = b[idx2];
  ... = c.Load(...);

LLVM's undef value is used for unused input parameters. For example, coordinates c2 and c3 in a dx.op.sample.f32 call for Texture2D are undef, as only two coordinates c0 and c1 are required.

If the clamp parameter is unused, its default value is 0.0f.

Resource operations are not overloaded on input parameter types. For example, dx.op.sample.f32 operation does not have an overload where coordinates have half, rather than float, data type. Instead, the precision of input arguments can be inferred from the IR via a straightforward lookup along an SSA edge, e.g.::

  %c0 = fpext half %0 to float
  %res = call %dx.types.ResRet.f32 @dx.op.sample.f32(..., %c0, ...)

SSA form makes it easy to infer that value %0 of type half got promoted to float. The driver compiler can tailor the instruction to the most efficient form for the target hardware.

Resource operations
-------------------

This section lists resource access operations. The specification is given for float return type, if applicable. The list of all overloads can be found in the appendix on intrinsic operations.

Some general rules to interpret resource operations:

* The number of active (meaningful) return components is determined by resource element type. Other return values must be unused; the validator ensures this.
* A GPU instruction needs status only if the status return value is used in the program, which is determined through SSA.
* Overload suffixes are specified for each resource operation.
* The type of resource determines which inputs must be defined. Unused inputs are passed typed LLVM 'undef' values. This is checked by the DXIL validator.
* Offset input parameters are i8 constants in [-8,+7] range; default offset is 0.

Resource operation return types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many resource operations return several scalar values as well as status for tiled resource access. The return values are grouped into a helper structure type, as this is LLVM's way to return several values from the operation. After an operation, helper types are immediately decomposed into scalars, which are used in further computation.

The defined helper types are listed below::

  %dx.types.ResRet.i8  = type { i8, i8, i8, i8, i32 }
  %dx.types.ResRet.i16 = type { i16, i16, i16, i16, i32 }
  %dx.types.ResRet.i32 = type { i32, i32, i32, i32, i32 }
  %dx.types.ResRet.i64 = type { i64, i64, i64, i64, i32 }
  %dx.types.ResRet.f16 = type { half, half, half, half, i32 }
  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
  %dx.types.ResRet.f64 = type { double, double, double, double, i32 }

  %dx.types.Dimensions = type { i32, i32, i32, i32 }
  %dx.types.SamplePos  = type { float, float }
1206
Resource handles
1207
~~~~~~~~~~~~~~~~
1208
1209
Resources are identified via handles passed to resource operations. Handles are represented via opaque type::
1210
1211
%dx.types.Handle = type { i8 * }
1212
1213
The handles are created out of resource range ID and index into the range::
1214
1215
declare %dx.types.Handle @dx.op.createHandle(
1216
i32, ; opcode
1217
i8, ; resource class: SRV=0, UAV=1, CBV=2, Sampler=3
1218
i32, ; resource range ID (constant)
1219
i32, ; index into the range
1220
i1) ; non-uniform resource index: false or true
1221
1222
Resource class is a constant that indicates which metadata list (SRV, UAV, CBV, Sampler) to use for property queries.
1223
1224
Resource range ID is an i32 constant, which is the position of the metadata record in the corresponding metadata list. Range IDs start with 0 and are contiguous within each list.
1225
1226
Index is an i32 value that may be a constant or a value computed by the shader.
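
As an illustration only, the following sketch creates one handle for element %idx of SRV range 0 and one for element 0 of sampler range 0. The opcode constant 57 is a placeholder assumed for this example (the actual value comes from the operations table), and the function name @make_handles is hypothetical::

  %dx.types.Handle = type { i8 * }

  declare %dx.types.Handle @dx.op.createHandle(i32, i8, i32, i32, i1)

  define void @make_handles(i32 %idx) {
    ; resource class 0 (SRV), range ID 0, shader-computed index, not marked non-uniform
    %tex = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 0, i32 0, i32 %idx, i1 false)
    ; resource class 3 (Sampler), range ID 0, constant index 0
    %smp = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 3, i32 0, i32 0, i1 false)
    ret void
  }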

CBufferLoadLegacy
~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32|f64, future SM: possibly deprecated
  %dx.types.CBufRet.f32 = type { float, float, float, float }
  declare %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32)                  ; 0-based row index (row = 16-byte DXBC register)

Valid resource types: ConstantBuffer. Valid shader model: SM5.1 and earlier.

The operation loads four 32-bit values from a constant buffer, which has legacy, 16-byte layout. Values are extracted via "extractvalue" instruction; unused values may be optimized away by the driver compiler. The operation respects SM5.1 and earlier OOB behavior for cbuffers.
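
As a sketch of how a producer might read a single scalar through this operation, assume a float stored at byte offset 20 of the buffer: it lives in row 20 / 16 = 1, component (20 mod 16) / 4 = 1. The opcode constant 59 and the function name @load_float_at_20 are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }
  %dx.types.CBufRet.f32 = type { float, float, float, float }

  declare %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32, %dx.types.Handle, i32)

  define float @load_float_at_20(%dx.types.Handle %cb) {
    ; load row 1 (bytes 16..31), then pick component 1 (bytes 20..23)
    %row = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %cb, i32 1)
    %val = extractvalue %dx.types.CBufRet.f32 %row, 1
    ret float %val
  }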

CBufferLoad
~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32|f64, SM6.0: f16|f32|f64|i16|i32|i64
  declare float @dx.op.cbufferLoad.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32,                  ; byte offset from the start of the buffer memory
      i32)                  ; read alignment

Valid resource types: ConstantBuffer.

The operation loads a value from a constant buffer, which has linear layout, using 1D index: byte offset from the beginning of the buffer memory. The operation respects SM5.1 and earlier OOB behavior for cbuffers.

Read alignment is a constant value identifying what the byte offset alignment is. If the actual byte offset does not have this alignment, the results of this operation are undefined.
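
A minimal sketch of a linear-layout load of a float at byte offset 20, declared with a 4-byte read alignment (20 is a multiple of 4, so the alignment promise holds). The opcode constant 58 and the function name @load_float_linear are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }

  declare float @dx.op.cbufferLoad.f32(i32, %dx.types.Handle, i32, i32)

  define float @load_float_linear(%dx.types.Handle %cb) {
    ; byte offset 20, read alignment 4
    %val = call float @dx.op.cbufferLoad.f32(i32 58, %dx.types.Handle %cb, i32 20, i32 4)
    ret float %val
  }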

GetDimensions
~~~~~~~~~~~~~

The following signature shows the operation syntax::

  declare %dx.types.Dimensions @dx.op.getDimensions(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32)                  ; MIP level

This table describes the return component meanings for each resource type { c0, c1, c2, c3 }.

==================== ===== ========== ========== ==========
Valid resource types c0    c1         c2         c3
==================== ===== ========== ========== ==========
[RW]Texture1D        width undef      undef      MIP levels
[RW]Texture1DArray   width array size undef      MIP levels
[RW]Texture2D        width height     undef      MIP levels
[RW]Texture2DArray   width height     array size MIP levels
[RW]Texture3D        width height     depth      MIP levels
[RW]Texture2DMS      width height     undef      samples
[RW]Texture2DMSArray width height     array size samples
TextureCUBE          width height     undef      MIP levels
TextureCUBEArray     width height     array size MIP levels
[RW]TypedBuffer      width undef      undef      undef
[RW]RawBuffer        width undef      undef      undef
[RW]StructuredBuffer width undef      undef      undef
==================== ===== ========== ========== ==========

MIP levels is always undef for RW resources. Undef means the component will not be used. The validator will verify this.
There is no GetDimensions that returns float values.
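
A minimal sketch of querying a Texture2D: per the table, c0 is the width and c1 the height, so only those two components are extracted and the others stay unused. The opcode constant 72 and the function name @tex2d_texel_count are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }
  %dx.types.Dimensions = type { i32, i32, i32, i32 }

  declare %dx.types.Dimensions @dx.op.getDimensions(i32, %dx.types.Handle, i32)

  define i32 @tex2d_texel_count(%dx.types.Handle %tex) {
    ; query the dimensions of MIP level 0
    %dim = call %dx.types.Dimensions @dx.op.getDimensions(i32 72, %dx.types.Handle %tex, i32 0)
    %width = extractvalue %dx.types.Dimensions %dim, 0
    %height = extractvalue %dx.types.Dimensions %dim, 1
    %count = mul i32 %width, %height
    ret i32 %count
  }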

Sample
~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sample.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float)                ; clamp

=================== ================================ ===================
Valid resource type # of active coordinates          # of active offsets
=================== ================================ ===================
Texture1D           1 (c0)                           1 (o0)
Texture1DArray      2 (c0, c1 = array slice)         1 (o0)
Texture2D           2 (c0, c1)                       2 (o0, o1)
Texture2DArray      3 (c0, c1, c2 = array slice)     2 (o0, o1)
Texture3D           3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBE         3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBEArray    4 (c0, c1, c2, c3 = array slice) 3 (o0, o1, o2)
=================== ================================ ===================
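
To make the table concrete, here is a hedged sketch of a Texture2D sample: c0 and c1 carry the coordinates, o0 and o1 are zero offsets, the inactive inputs (c2, c3, o2) are typed undef, and the clamp is 0.0. Only the first return component is extracted. The opcode constant 60 and the function name @sample_red are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }
  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }

  declare %dx.types.ResRet.f32 @dx.op.sample.f32(
      i32, %dx.types.Handle, %dx.types.Handle,
      float, float, float, float, i32, i32, i32, float)

  define float @sample_red(%dx.types.Handle %tex, %dx.types.Handle %smp, float %u, float %v) {
    %r = call %dx.types.ResRet.f32 @dx.op.sample.f32(
        i32 60, %dx.types.Handle %tex, %dx.types.Handle %smp,
        float %u, float %v, float undef, float undef,
        i32 0, i32 0, i32 undef,
        float 0.0)
    ; component 0 is the first sample value; the status field (index 4) is unused
    %red = extractvalue %dx.types.ResRet.f32 %r, 0
    ret float %red
  }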

SampleBias
~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sampleBias.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float,                ; bias: in [-16.f,15.99f]
      float)                ; clamp

Valid resource types and active components/offsets are the same as for the sample operation.

SampleLevel
~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sampleLevel.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float)                ; LOD

Valid resource types and active components/offsets are the same as for the sample operation.

SampleGrad
~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sampleGrad.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float,                ; ddx0
      float,                ; ddx1
      float,                ; ddx2
      float,                ; ddy0
      float,                ; ddy1
      float,                ; ddy2
      float)                ; clamp

Valid resource types and active components and offsets are the same as for the sample operation. Valid active ddx and ddy are the same as offsets.

SampleCmp
~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sampleCmp.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float,                ; compare value
      float)                ; clamp

=================== ================================ ===================
Valid resource type # of active coordinates          # of active offsets
=================== ================================ ===================
Texture1D           1 (c0)                           1 (o0)
Texture1DArray      2 (c0, c1 = array slice)         1 (o0)
Texture2D           2 (c0, c1)                       2 (o0, o1)
Texture2DArray      3 (c0, c1, c2 = array slice)     2 (o0, o1)
TextureCUBE         3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBEArray    4 (c0, c1, c2, c3 = array slice) 3 (o0, o1, o2)
=================== ================================ ===================

SampleCmpLevelZero
~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32, SM6.0: f16|f32
  declare %dx.types.ResRet.f32 @dx.op.sampleCmpLevelZero.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; offset o2
      float)                ; compare value

Valid resource types and active components/offsets are the same as for the sampleCmp operation.

TextureLoad
~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  declare %dx.types.ResRet.f32 @dx.op.textureLoad.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      i32,                  ; MIP level; sample for Texture2DMS
      i32,                  ; coordinate c0
      i32,                  ; coordinate c1
      i32,                  ; coordinate c2
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32)                  ; offset o2

=================== ========= ============================ ===================
Valid resource type MIP level # of active coordinates      # of active offsets
=================== ========= ============================ ===================
Texture1D           yes       1 (c0)                       1 (o0)
RWTexture1D         undef     1 (c0)                       undef
Texture1DArray      yes       2 (c0, c1 = array slice)     1 (o0)
RWTexture1DArray    undef     2 (c0, c1 = array slice)     undef
Texture2D           yes       2 (c0, c1)                   2 (o0, o1)
RWTexture2D         undef     2 (c0, c1)                   undef
Texture2DArray      yes       3 (c0, c1, c2 = array slice) 2 (o0, o1)
RWTexture2DArray    undef     3 (c0, c1, c2 = array slice) undef
Texture3D           yes       3 (c0, c1, c2)               3 (o0, o1, o2)
RWTexture3D         undef     3 (c0, c1, c2)               undef
=================== ========= ============================ ===================

For Texture2DMS:

=================== ============ =================================
Valid resource type Sample index # of active coordinate components
=================== ============ =================================
Texture2DMS         yes          2 (c0, c1)
Texture2DMSArray    yes          3 (c0, c1, c2 = array slice)
=================== ============ =================================
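
As a hedged sketch of the RW rows in the first table, a load from an RWTexture2D passes undef for the MIP level, for the unused coordinate c2 and for all three offsets. The opcode constant 66 and the function name @load_rw_texel_x are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }
  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }

  declare %dx.types.ResRet.f32 @dx.op.textureLoad.f32(
      i32, %dx.types.Handle, i32, i32, i32, i32, i32, i32, i32)

  define float @load_rw_texel_x(%dx.types.Handle %uav, i32 %x, i32 %y) {
    ; argument order: opcode, handle, MIP level, c0, c1, c2, o0, o1, o2
    %r = call %dx.types.ResRet.f32 @dx.op.textureLoad.f32(
        i32 66, %dx.types.Handle %uav,
        i32 undef,
        i32 %x, i32 %y, i32 undef,
        i32 undef, i32 undef, i32 undef)
    %val = extractvalue %dx.types.ResRet.f32 %r, 0
    ret float %val
  }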

TextureStore
~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  ; returns: status
  declare void @dx.op.textureStore.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      i32,                  ; coordinate c0
      i32,                  ; coordinate c1
      i32,                  ; coordinate c2
      float,                ; value v0
      float,                ; value v1
      float,                ; value v2
      float,                ; value v3
      i8)                   ; write mask

The write mask indicates which components are written (x - 1, y - 2, z - 4, w - 8), similar to DXBC. The mask must cover all resource components.

=================== =================================
Valid resource type # of active coordinate components
=================== =================================
RWTexture1D         1 (c0)
RWTexture1DArray    2 (c0, c1 = array slice)
RWTexture2D         2 (c0, c1)
RWTexture2DArray    3 (c0, c1, c2 = array slice)
RWTexture3D         3 (c0, c1, c2)
=================== =================================

CalculateLOD
~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; returns: LOD
  declare float @dx.op.calculateLOD.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0, [0.0, 1.0]
      float,                ; coordinate c1, [0.0, 1.0]
      float,                ; coordinate c2, [0.0, 1.0]
      i1)                   ; true - clamped; false - unclamped

============================= =======================
Valid resource type           # of active coordinates
============================= =======================
Texture1D, Texture1DArray     1 (c0)
Texture2D, Texture2DArray     2 (c0, c1)
Texture3D                     3 (c0, c1, c2)
TextureCUBE, TextureCUBEArray 3 (c0, c1, c2)
============================= =======================

TextureGather
~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  declare %dx.types.ResRet.f32 @dx.op.textureGather.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32)                  ; channel, constant in {0=red,1=green,2=blue,3=alpha}

=================== ================================ ===================
Valid resource type # of active coordinates          # of active offsets
=================== ================================ ===================
Texture2D           2 (c0, c1)                       2 (o0, o1)
Texture2DArray      3 (c0, c1, c2 = array slice)     2 (o0, o1)
TextureCUBE         3 (c0, c1, c2)                   0
TextureCUBEArray    4 (c0, c1, c2, c3 = array slice) 0
=================== ================================ ===================

TextureGatherCmp
~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  declare %dx.types.ResRet.f32 @dx.op.textureGatherCmp.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      %dx.types.Handle,     ; sampler handle
      float,                ; coordinate c0
      float,                ; coordinate c1
      float,                ; coordinate c2
      float,                ; coordinate c3
      i32,                  ; offset o0
      i32,                  ; offset o1
      i32,                  ; channel, constant in {0=red,1=green,2=blue,3=alpha}
      float)                ; compare value

Valid resource types and active components/offsets are the same as for the textureGather operation.

Texture2DMSGetSamplePosition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  declare %dx.types.SamplePos @dx.op.texture2DMSGetSamplePosition(
      i32,                  ; opcode
      %dx.types.Handle,     ; texture handle
      i32)                  ; sample ID

Returns sample position of a texture.

RenderTargetGetSamplePosition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  declare %dx.types.SamplePos @dx.op.renderTargetGetSamplePosition(
      i32,                  ; opcode
      i32)                  ; sample ID

Returns sample position of a render target.

RenderTargetGetSampleCount
~~~~~~~~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  declare i32 @dx.op.renderTargetGetSampleCount(
      i32)                  ; opcode

Returns sample count of a render target.

BufferLoad
~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f32|i32
  declare %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32,                  ; coordinate c0
      i32)                  ; coordinate c1

The call respects SM5.1 OOB and alignment rules.

==================== =====================================================
Valid resource type  # of active coordinates
==================== =====================================================
[RW]TypedBuffer      1 (c0 in elements)
[RW]RawBuffer        1 (c0 in bytes)
[RW]StructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
==================== =====================================================
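
A minimal sketch for the raw buffer row: c0 is a byte offset and c1 is inactive, so it is passed as undef. The opcode constant 68 and the function name @load_raw_float are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }
  %dx.types.ResRet.f32 = type { float, float, float, float, i32 }

  declare %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32, %dx.types.Handle, i32, i32)

  define float @load_raw_float(%dx.types.Handle %buf, i32 %byteOffset) {
    ; c0 = byte offset into the raw buffer, c1 = undef (inactive)
    %r = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(i32 68, %dx.types.Handle %buf, i32 %byteOffset, i32 undef)
    %val = extractvalue %dx.types.ResRet.f32 %r, 0
    ret float %val
  }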

BufferStore
~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: f32|i32, SM6.0: f32|i32
  ; returns: status
  declare void @dx.op.bufferStore.f32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32,                  ; coordinate c0
      i32,                  ; coordinate c1
      float,                ; value v0
      float,                ; value v1
      float,                ; value v2
      float,                ; value v3
      i8)                   ; write mask

The call respects SM5.1 OOB and alignment rules.

The write mask indicates which components are written (x - 1, y - 2, z - 4, w - 8), similar to DXBC. For RWTypedBuffer, the mask must cover all resource components. For RWRawBuffer and RWStructuredBuffer, valid masks are: x, xy, xyz, xyzw.

=================== =====================================================
Valid resource type # of active coordinates
=================== =====================================================
RWTypedBuffer       1 (c0 in elements)
RWRawBuffer         1 (c0 in bytes)
RWStructuredBuffer  2 (c0 in elements, c1 = byte offset into the element)
=================== =====================================================
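
As a sketch of the write mask encoding, the following stores two floats at the start of element %elem of an RWStructuredBuffer. The xy mask is 1 + 2 = 3, and the unwritten values v2 and v3 are undef. The opcode constant 69 and the function name @store_xy are placeholders assumed for illustration::

  %dx.types.Handle = type { i8 * }

  declare void @dx.op.bufferStore.f32(
      i32, %dx.types.Handle, i32, i32, float, float, float, float, i8)

  define void @store_xy(%dx.types.Handle %buf, i32 %elem, float %a, float %b) {
    ; c0 = element index, c1 = byte offset 0 within the element, mask = x|y = 3
    call void @dx.op.bufferStore.f32(
        i32 69, %dx.types.Handle %buf, i32 %elem, i32 0,
        float %a, float %b, float undef, float undef, i8 3)
    ret void
  }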

BufferUpdateCounter
~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; opcodes: bufferUpdateCounter
  declare void @dx.op.bufferUpdateCounter(
      i32,                  ; opcode
      %dx.types.ResHandle,  ; buffer handle
      i8)                   ; 1 - increment, -1 - decrement

Valid resource type: RWRawBuffer.

AtomicBinOp
~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: i32, SM6.0: i32
  ; returns: original value in memory before the operation
  declare i32 @dx.op.atomicBinOp.i32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32,                  ; binary operation code: EXCHANGE, IADD, AND, OR, XOR, IMIN, IMAX, UMIN, UMAX
      i32,                  ; coordinate c0
      i32,                  ; coordinate c1
      i32,                  ; coordinate c2
      i32)                  ; new value

The call respects SM5.1 OOB and alignment rules.

=================== =====================================================
Valid resource type # of active coordinates
=================== =====================================================
RWTexture1D         1 (c0)
RWTexture1DArray    2 (c0, c1 = array slice)
RWTexture2D         2 (c0, c1)
RWTexture2DArray    3 (c0, c1, c2 = array slice)
RWTexture3D         3 (c0, c1, c2)
RWTypedBuffer       1 (c0 in elements)
RWRawBuffer         1 (c0 in bytes)
RWStructuredBuffer  2 (c0 in elements, c1 - byte offset into the element)
=================== =====================================================

AtomicBinOp subsumes corresponding DXBC atomic operations that do not return the old value in memory. The driver compiler is free to specialize the corresponding GPU instruction if the return value is unused.

AtomicCompareExchange
~~~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; overloads: SM5.1: i32, SM6.0: i32
  ; returns: original value in memory before the operation
  declare i32 @dx.op.atomicCompareExchange.i32(
      i32,                  ; opcode
      %dx.types.Handle,     ; resource handle
      i32,                  ; coordinate c0
      i32,                  ; coordinate c1
      i32,                  ; coordinate c2
      i32,                  ; comparison value
      i32)                  ; new value

The call respects SM5.1 OOB and alignment rules.

=================== =====================================================
Valid resource type # of active coordinates
=================== =====================================================
RWTexture1D         1 (c0)
RWTexture1DArray    2 (c0, c1 = array slice)
RWTexture2D         2 (c0, c1)
RWTexture2DArray    3 (c0, c1, c2 = array slice)
RWTexture3D         3 (c0, c1, c2)
RWTypedBuffer       1 (c0 in elements)
RWRawBuffer         1 (c0 in bytes)
RWStructuredBuffer  2 (c0 in elements, c1 - byte offset into the element)
=================== =====================================================

AtomicCompareExchange subsumes DXBC's atomic compare store. The driver compiler is free to specialize the corresponding GPU instruction if the return value is unused.

GetBufferBasePtr (SM6.0)
~~~~~~~~~~~~~~~~~~~~~~~~

The following signature shows the operation syntax::

  ; Returns i8* pointer to the base of [RW]RawBuffer instance.
  declare i8 addrspace(ASmemory) * @dx.op.getBufferBasePtr.pASmemory (
      i32,                  ; opcode
      %dx.types.Handle)     ; resource handle
  ; Returns i8* pointer to the base of ConstantBuffer instance.
  declare i8 addrspace(AScbuffer) * @dx.op.getBufferBasePtr.pAScbuffer(
      i32,                  ; opcode
      %dx.types.Handle)     ; resource handle

Given an SM5.1 resource handle, the operation returns a base pointer that can be used to perform pointer-based accesses to the resource memory.

Note: the functionality is requested for SM6.0 to support pointer-based accesses to SM5.1 resources with raw linear memory (raw buffer and cbuffer) in HLSL next. This would be one of the ways a valid pointer is produced in the shader, and would let new-style, pointer-based code access SM5.1 resources with a linear memory view.

Atomic operations via pointer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Groupshared memory atomic operations are done via LLVM atomic instructions atomicrmw and cmpxchg. The instructions accept only i32 addrspace(ASgs) * pointers, where ASgs is the addrspace number of groupshared variables. The atomicrmw instruction does not support 'sub' and 'nand' operations. These constraints may be revisited in the future. OOB behavior is undefined.
SM6.0 will enable a similar mechanism for atomic operations performed on device memory (raw buffer).
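
A minimal sketch of both instructions on a groupshared counter, written in LLVM 3.7 syntax. It assumes ASgs = 3 (the groupshared address space number listed later in this document) and seq_cst ordering, which this section does not specify; the names @counter and @bump are hypothetical::

  @counter = addrspace(3) global i32 undef, align 4

  define i32 @bump(i32 %expected, i32 %desired) {
    ; atomic add of 1; yields the value held before the addition
    %old = atomicrmw add i32 addrspace(3)* @counter, i32 1 seq_cst
    ; compare-and-swap; yields { previous value, success flag }
    %pair = cmpxchg i32 addrspace(3)* @counter, i32 %expected, i32 %desired seq_cst seq_cst
    %prev = extractvalue { i32, i1 } %pair, 0
    %sum = add i32 %old, %prev
    ret i32 %sum
  }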

Samplers
--------

There are no intrinsics for samplers. Sampler reflection data is represented similarly to other resources.

Immediate Constant Buffer
-------------------------
There is no immediate constant buffer in DXIL. Instead, indexable constants are represented via LLVM global initialized constants in address space ASicb.

Texture Buffers
---------------
A texture buffer is mapped to RawBuffer. Texture buffer variable declarations are present for reflection purposes only.

Groupshared memory
------------------
Groupshared memory (DXBC g-registers) is linear in DXIL. Groupshared variables are declared via global variables in addrspace(ASgs). The optimizer will not group variables; the driver compiler can do this if desired. Accesses to groupshared variables occur via pointer load/store instructions (see below).
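
As an illustration, a groupshared array and a load/store pair through a pointer into it could look as follows in LLVM 3.7 syntax. The sketch assumes ASgs = 3, as in the address space table later in this document; the names @tile and @swap_slot are hypothetical::

  @tile = addrspace(3) global [64 x float] undef, align 4

  define float @swap_slot(i32 %i, float %v) {
    ; address of element %i of the groupshared array
    %p = getelementptr [64 x float], [64 x float] addrspace(3)* @tile, i32 0, i32 %i
    %old = load float, float addrspace(3)* %p, align 4
    store float %v, float addrspace(3)* %p, align 4
    ret float %old
  }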
1795
1796
Indexable threadlocal memory
1797
----------------------------
1798
Indexable threadlocal memory (DXBC x-registers) is linear in DXIL. Threadlocal variables are "declared" via alloca instructions. Threadlocal variables are assumed to reside in addrspace(0). The variables are not allocated into some memory pool; the driver compiler can do this, if desired. Accesses to threadlocal variables occur via pointer load/store instructions (see below).
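
A minimal sketch of an indexable threadlocal array in LLVM 3.7 syntax: the alloca "declares" a two-element array in addrspace(0), and a dynamically indexed load reads it back. The function name @pick is hypothetical::

  define float @pick(i32 %i, float %a, float %b) {
    ; threadlocal array of two floats
    %arr = alloca [2 x float], align 4
    %p0 = getelementptr [2 x float], [2 x float]* %arr, i32 0, i32 0
    %p1 = getelementptr [2 x float], [2 x float]* %arr, i32 0, i32 1
    store float %a, float* %p0, align 4
    store float %b, float* %p1, align 4
    ; dynamic index into the array
    %pi = getelementptr [2 x float], [2 x float]* %arr, i32 0, i32 %i
    %val = load float, float* %pi, align 4
    ret float %val
  }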

Load/Store/Atomics via pointer in future SM
-------------------------------------------
HLSL offers several abstractions with linear memory: buffers, cbuffers, groupshared and indexable threadlocal memory, that are conceptually similar, but have different HLSL syntax and some differences in behavior, which are exposed to HLSL developers. The plan is to introduce pointers into HLSL to unify access syntax to such linear-memory resources such that they appear conceptually the same to HLSL programmers.

Each resource memory type is expressed by a unique LLVM address space. The following table shows memory types and their address spaces:

========================================= =====================================
Memory type                               Address space number n - addrspace(n)
========================================= =====================================
code, local, indexable threadlocal memory AS_default = 0
device memory ([RW]RawBuffer)             AS_memory = 1
cbuffer-like memory (ConstantBuffer)      AS_cbuffer = 2
groupshared memory                        AS_groupshared = 3
========================================= =====================================

Pointers can be produced in the shader in a variety of ways (see the Memory accesses section). Note that if GetBufferBasePtr was used on [RW]RawBuffer or ConstantBuffer to produce a pointer, the base pointer is stateless; i.e., it "loses its connection" to the underlying resource and is treated as a stateless pointer into a particular memory type.

Additional resource properties
------------------------------
TODO: enumerate all additional resource range properties, e.g., ROV, Texture2DMS, globally coherent, UAV counter, sampler mode, CB: immediate/dynamic indexed.

Operations
==========
DXIL operations are represented in two ways: using LLVM instructions and using LLVM external functions. The reference list of operations as well as their overloads can be found in the attached Excel spreadsheet "DXIL Operations".

Operations via instructions
---------------------------

DXIL uses a subset of core LLVM IR instructions that make sense for HLSL, where the meaning of the LLVM IR operation matches the meaning of the HLSL operation.

The following LLVM instructions are valid in a DXIL program, with the specified operand types where applicable. The legend for overload types is: (v)oid, (h)alf, (f)loat, (d)ouble, (1)-bit, (8)-bit, (w)ord, (i)nt, (l)ong.

.. <py>import hctdb_instrhelp</py>
.. <py::lines('INSTR-RST')>hctdb_instrhelp.get_instrs_rst()</py>
.. INSTR-RST:BEGIN

============= ======================================================================= =================
Instruction   Action                                                                  Operand overloads
============= ======================================================================= =================
Ret           returns a value (possibly void), from a function.                       vhfd1wil
Br            branches (conditional or unconditional)
Switch        performs a multiway switch
Add           returns the sum of its two operands                                     wil
FAdd          returns the sum of its two operands                                     hfd
Sub           returns the difference of its two operands                              wil
FSub          returns the difference of its two operands                              hfd
Mul           returns the product of its two operands                                 wil
FMul          returns the product of its two operands                                 hfd
UDiv          returns the quotient of its two unsigned operands                       wil
SDiv          returns the quotient of its two signed operands                         wil
FDiv          returns the quotient of its two operands                                hfd
URem          returns the remainder from the unsigned division of its two operands    wil
SRem          returns the remainder from the signed division of its two operands      wil
FRem          returns the remainder from the division of its two operands             hfd
Shl           shifts left (logical)                                                   wil
LShr          shifts right (logical), with zero bit fill                              wil
AShr          shifts right (arithmetic), with 'a' operand sign bit fill               wil
And           returns a bitwise logical and of its two operands                       1wil
Or            returns a bitwise logical or of its two operands                        1wil
Xor           returns a bitwise logical xor of its two operands                       1wil
Alloca        allocates memory on the stack frame of the currently executing function
Load          reads from memory
Store         writes to memory
GetElementPtr gets the address of a subelement of an aggregate value
AtomicCmpXchg atomically modifies memory
AtomicRMW     atomically modifies memory
Trunc         truncates an integer                                                    1wil
ZExt          zero extends an integer                                                 1wil
SExt          sign extends an integer                                                 1wil
FPToUI        converts a floating point to UInt                                       hfd1wil
FPToSI        converts a floating point to SInt                                       hfd1wil
UIToFP        converts a UInt to floating point                                       hfd1wil
SIToFP        converts a SInt to floating point                                       hfd1wil
FPTrunc       truncates a floating point                                              hfd
FPExt         extends a floating point                                                hfd
BitCast       performs a bit-preserving type cast                                     hfd1wil
AddrSpaceCast casts a pointer value to a different address space
ICmp          compares integers                                                       1wil
FCmp          compares floating points                                                hfd
PHI           is a PHI node instruction
Call          calls a function
Select        selects one of two values based on a condition
ExtractValue  extracts from aggregate
============= ======================================================================= =================


.. INSTR-RST:END
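
The overload strings in the table above can be decoded mechanically. The following is a minimal sketch, not part of any DXIL tool: ``OVERLOAD_TYPES``, ``INSTRUCTION_OVERLOADS``, ``decode_overloads`` and ``accepts`` are hypothetical names, only two table rows are copied in, and reading (w)ord as ``i16``, (i)nt as ``i32`` and (l)ong as ``i64`` is an assumption of the sketch.

.. code-block:: python

    # Legend letters mapped to LLVM IR type names (the bit widths chosen for
    # 'w', 'i' and 'l' are assumptions of this sketch).
    OVERLOAD_TYPES = {
        "v": "void",
        "h": "half",
        "f": "float",
        "d": "double",
        "1": "i1",
        "8": "i8",
        "w": "i16",
        "i": "i32",
        "l": "i64",
    }

    # Two rows copied from the instruction table: instruction -> overloads.
    INSTRUCTION_OVERLOADS = {"FAdd": "hfd", "And": "1wil"}


    def decode_overloads(code: str) -> list:
        """Expand an overload string such as 'hfd' into LLVM type names."""
        return [OVERLOAD_TYPES[letter] for letter in code]


    def accepts(instruction: str, llvm_type: str) -> bool:
        """Report whether the instruction lists the given operand type."""
        return llvm_type in decode_overloads(INSTRUCTION_OVERLOADS[instruction])

Under these assumptions, ``accepts("And", "i1")`` is true, while ``accepts("FAdd", "i32")`` is false.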

Operations via external functions
---------------------------------
Operations missing in core LLVM IR, such as abs, fma, discard, etc., are represented by external functions whose names are prefixed with dx.op.

The very first parameter of each such external function is the opcode of the operation, which is an i32 constant. For example, dx.op.unary computes a unary function T res = opcode(T input). The opcode selects which unary function to perform.

Opcodes are defined on a dense range and will be provided as an enum in a header file. The opcode parameter is introduced for efficiency reasons: grouping operations reduces the total number of overloads, and property lookup becomes more efficient, e.g., via an array of operation properties rather than a hash table.
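
The array-based lookup mentioned above can be sketched as follows. This is a minimal illustration, not the planned header file: ``OpProps``, ``OP_PROPS`` and ``lookup`` are hypothetical names, only four opcodes are filled in (with the IDs listed in the opcode table of this document), and mapping all four to ``dx.op.unary`` is an assumption of the sketch.

.. code-block:: python

    from enum import IntEnum
    from typing import NamedTuple


    class OpCode(IntEnum):
        """A few opcodes, numbered as in the opcode table of this document."""
        FAbs = 6
        Saturate = 7
        Cos = 12
        Sin = 13


    class OpProps(NamedTuple):
        name: str
        function: str  # external function assumed to carry this opcode


    # Dense table indexed directly by opcode value. None marks the opcodes
    # that this abbreviated sketch does not fill in.
    OP_PROPS = [None] * (max(OpCode) + 1)
    for op in OpCode:
        OP_PROPS[op] = OpProps(op.name, "dx.op.unary")


    def lookup(opcode: int) -> OpProps:
        """Constant-time property lookup by array index, with no hashing."""
        props = OP_PROPS[opcode] if 0 <= opcode < len(OP_PROPS) else None
        if props is None:
            raise ValueError(f"unknown opcode {opcode}")
        return props

Because the opcode range is dense, the table wastes little space and a lookup is a single bounds check plus an index operation.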

.. <py::lines('OPCODES-RST')>hctdb_instrhelp.get_opcodes_rst()</py>
.. OPCODES-RST:BEGIN

=== ============================= ================================================================================================================
ID  Name                          Description
=== ============================= ================================================================================================================
0   TempRegLoad                   helper load operation
1   TempRegStore                  helper store operation
2   MinPrecXRegLoad               helper load operation for minprecision
3   MinPrecXRegStore              helper store operation for minprecision
4   LoadInput                     loads the value from shader input
5   StoreOutput                   stores the value to shader output
6   FAbs                          returns the absolute value of the input value.
7   Saturate                      clamps the result of a single or double precision floating point value to [0.0f...1.0f]
8   IsNaN                         returns the IsNaN
9   IsInf                         returns the IsInf
10  IsFinite                      returns the IsFinite
11  IsNormal                      returns the IsNormal
12  Cos                           returns cosine(theta) for theta in radians.
13  Sin                           returns the Sin
14  Tan                           returns the Tan
15  Acos                          returns the Acos
16  Asin                          returns the Asin
17  Atan                          returns the Atan
18  Hcos                          returns the Hcos
19  Hsin                          returns the Hsin
20  Htan                          returns the Htan
21  Exp                           returns the Exp
22  Frc                           returns the Frc
23  Log                           returns the Log
24  Sqrt                          returns the Sqrt
25  Rsqrt                         returns the Rsqrt
26  Round_ne                      returns the Round_ne
27  Round_ni                      returns the Round_ni
28  Round_pi                      returns the Round_pi
29  Round_z                       returns the Round_z
30  Bfrev                         returns the reverse bit pattern of the input value
31  Countbits                     returns the Countbits
32  FirstbitLo                    returns the FirstbitLo
33  FirstbitHi                    returns src != 0? (BitWidth-1 - FirstbitHi) : -1
34  FirstbitSHi                   returns src != 0? (BitWidth-1 - FirstbitSHi) : -1
35  FMax                          returns a if a >= b, else b
36  FMin                          returns a if a < b, else b
37  IMax                          returns the IMax of the input values
38  IMin                          returns the IMin of the input values
39  UMax                          returns the UMax of the input values
40  UMin                          returns the UMin of the input values
41  IMul                          returns the IMul of the input values
42  UMul                          returns the UMul of the input values
43  UDiv                          returns the UDiv of the input values
44  UAddc                         returns the UAddc of the input values
45  USubb                         returns the USubb of the input values
46  FMad                          performs a fused multiply add (FMA) of the form a * b + c
47  Fma                           performs a fused multiply add (FMA) of the form a * b + c
48  IMad                          performs an integral IMad
49  UMad                          performs an integral UMad
50  Msad                          performs an integral Msad
51  Ibfe                          performs an integral Ibfe
52  Ubfe                          performs an integral Ubfe
53  Bfi                           given a bit range from the LSB of a number, places that number of bits in another number at any offset
54  Dot2                          two-dimensional vector dot-product
55  Dot3                          three-dimensional vector dot-product
56  Dot4                          four-dimensional vector dot-product
57  CreateHandle                  creates the handle to a resource
58  CBufferLoad                   loads a value from a constant buffer resource
59  CBufferLoadLegacy             loads a value from a constant buffer resource
60  Sample                        samples a texture
61  SampleBias                    samples a texture after applying the input bias to the mipmap level
62  SampleLevel                   samples a texture using a mipmap-level offset
63  SampleGrad                    samples a texture using a gradient to influence the way the sample location is calculated
64  SampleCmp                     samples a texture and compares a single component against the specified comparison value
65  SampleCmpLevelZero            samples a texture and compares a single component against the specified comparison value
66  TextureLoad                   reads texel data without any filtering or sampling
67  TextureStore                  writes texel data without any filtering or sampling
68  BufferLoad                    reads from a TypedBuffer
69  BufferStore                   writes to a RWTypedBuffer
70  BufferUpdateCounter           atomically increments/decrements the hidden 32-bit counter stored with a Count or Append UAV
71  CheckAccessFullyMapped        determines whether all values from a Sample, Gather, or Load operation accessed mapped tiles in a tiled resource
72  GetDimensions                 gets texture size information
73  TextureGather                 gathers the four texels that would be used in a bi-linear filtering operation
74  TextureGatherCmp              same as TextureGather, except this instruction performs comparison on texels, similar to SampleCmp
75  Texture2DMSGetSamplePosition  gets the position of the specified sample
76  RenderTargetGetSamplePosition gets the position of the specified sample
77  RenderTargetGetSampleCount    gets the number of samples for a render target
78  AtomicBinOp                   performs an atomic operation on two operands
79  AtomicCompareExchange         atomic compare and exchange to memory
80  Barrier                       inserts a memory barrier in the shader
81  CalculateLOD                  calculates the level of detail
82  Discard                       discards the current pixel
83  DerivCoarseX                  computes the rate of change of components per stamp
84  DerivCoarseY                  computes the rate of change of components per stamp
85  DerivFineX                    computes the rate of change of components per pixel
86  DerivFineY                    computes the rate of change of components per pixel
87  EvalSnapped                   evaluates an input attribute at pixel center with an offset
88  EvalSampleIndex               evaluates an input attribute at a sample location
89  EvalCentroid                  evaluates an input attribute at pixel center
90  SampleIndex                   returns the sample index in a sample-frequency pixel shader
91  Coverage                      returns the coverage mask input in a pixel shader
92  InnerCoverage                 returns underestimated coverage input from conservative rasterization in a pixel shader
93  ThreadId                      reads the thread ID
94  GroupId                       reads the group ID (SV_GroupID)
95  ThreadIdInGroup               reads the thread ID within the group (SV_GroupThreadID)
96  FlattenedThreadIdInGroup      provides a flattened index for a given thread within a given group (SV_GroupIndex)
97  EmitStream                    emits a vertex to a given stream
98  CutStream                     completes the current primitive topology at the specified stream
99  EmitThenCutStream             equivalent to an EmitStream followed by a CutStream
100 GSInstanceID                  GSInstanceID
101 MakeDouble                    creates a double value
102 SplitDouble                   splits a double into low and high parts
103 LoadOutputControlPoint        LoadOutputControlPoint
104 LoadPatchConstant             LoadPatchConstant
105 DomainLocation                DomainLocation
106 StorePatchConstant            StorePatchConstant
107 OutputControlPointID          OutputControlPointID
108 PrimitiveID                   PrimitiveID
109 CycleCounterLegacy            CycleCounterLegacy
110 WaveIsFirstLane               returns 1 for the first lane in the wave
111 WaveGetLaneIndex              returns the index of the current lane in the wave
112 WaveGetLaneCount              returns the number of lanes in the wave
113 WaveAnyTrue                   returns 1 if any of the lanes evaluates the value to true
114 WaveAllTrue                   returns 1 if all the lanes evaluate the value to true
115 WaveActiveAllEqual            returns 1 if all the lanes have the same value
116 WaveActiveBallot              returns a struct with a bit set for each lane where the condition is true
117 WaveReadLaneAt                returns the value from the specified lane
118 WaveReadLaneFirst             returns the value from the first lane
119 WaveActiveOp                  returns the result of the operation across waves
120 WaveActiveBit                 returns the result of the operation across all lanes
121 WavePrefixOp                  returns the result of the operation on prior lanes
122 QuadReadLaneAt                reads from a lane in the quad
123 QuadOp                        returns the result of a quad-level operation
124 BitcastI16toF16               bitcast between different sizes
125 BitcastF16toI16               bitcast between different sizes
126 BitcastI32toF32               bitcast between different sizes
127 BitcastF32toI32               bitcast between different sizes
128 BitcastI64toF64               bitcast between different sizes
129 BitcastF64toI64               bitcast between different sizes
130 LegacyF32ToF16                legacy function to convert float (f32) to half (f16) (this is not related to min-precision)
131 LegacyF16ToF32                legacy function to convert half (f16) to float (f32) (this is not related to min-precision)
132 LegacyDoubleToFloat           legacy function to convert double to float
133 LegacyDoubleToSInt32          legacy function to convert double to int32
134 LegacyDoubleToUInt32          legacy function to convert double to uint32
135 WaveAllBitCount               returns the count of bits set to 1 across the wave
136 WavePrefixBitCount            returns the count of bits set to 1 on prior lanes
=== ============================= ================================================================================================================


Cos
~~~

Theta values can be any IEEE 32-bit floating point values.

The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.


+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| cos(src) | NaN  | [-1 to +1] | +1      | +1 | +1 | +1      | [-1 to +1] | NaN  | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
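
A reference model for the infinite and NaN inputs, together with a check of the error bound, can be sketched in a few lines. This is an illustration under stated assumptions rather than a normative definition: ``cos_reference`` and ``within_tolerance`` are hypothetical helper names, and Python's double-precision ``math.cos`` stands in for an exact cosine.

.. code-block:: python

    import math

    MAX_ABS_ERROR = 0.0008       # bound quoted in the text above
    RANGE_LIMIT = 100 * math.pi  # the bound applies within [-100*Pi, +100*Pi]


    def cos_reference(src: float) -> float:
        """Model Cos on special inputs; finite inputs defer to math.cos."""
        if math.isnan(src) or math.isinf(src):
            return math.nan      # -inf, +inf and NaN all yield NaN
        return math.cos(src)     # finite inputs yield a value in [-1, +1]


    def within_tolerance(src: float, result: float) -> bool:
        """Check a candidate result against the documented error bound."""
        if abs(src) > RANGE_LIMIT:
            raise ValueError("bound is only specified within [-100*Pi, +100*Pi]")
        return abs(result - math.cos(src)) <= MAX_ABS_ERROR

A driver test could feed the hardware result for each sampled input to ``within_tolerance`` and treat infinite or NaN inputs separately through ``cos_reference``.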

FAbs
~~~~

The FAbs instruction simply forces the sign of the number(s) in the source operand positive, including on INF values.
Applying FAbs on NaN preserves NaN, although the particular NaN bit pattern that results is not defined.
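
One way to picture the operation is as clearing the sign bit of the 32-bit encoding. The sketch below assumes IEEE 754 binary32 values and uses a hypothetical helper name, ``fabs_bits``; it leaves the NaN payload bits unchanged, which is only one of the outcomes that the text above allows.

.. code-block:: python

    import struct


    def fabs_bits(value: float) -> float:
        """Clear the sign bit of a 32-bit float, leaving all other bits as is."""
        # Reinterpret the binary32 encoding as an unsigned 32-bit integer.
        (bits,) = struct.unpack("<I", struct.pack("<f", value))
        # Mask off bit 31 (the sign) and reinterpret the result as a float.
        (result,) = struct.unpack("<f", struct.pack("<I", bits & 0x7FFFFFFF))
        return result

With this model, negative infinity maps to positive infinity, negative zero maps to positive zero, and a NaN input remains a NaN.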

FMax
~~~~

>= is used instead of > so that if min(x,y) = x then max(x,y) = y.

NaN has special handling: If one source operand is NaN, then the other source operand is returned.
If both are NaN, any NaN representation is returned.
This conforms to new IEEE 754R rules.

Denorms are flushed (sign preserved) before comparison; however, the result written to dest may or may not be denorm flushed.

+------+-----------------------------+
| a    | b                           |
|      +------+--------+------+------+
|      | -inf | F      | +inf | NaN  |
+------+------+--------+------+------+
| -inf | -inf | b      | +inf | -inf |
+------+------+--------+------+------+
| F    | a    | a or b | +inf | a    |
+------+------+--------+------+------+
| +inf | +inf | +inf   | +inf | +inf |
+------+------+--------+------+------+
| NaN  | -inf | b      | +inf | NaN  |
+------+------+--------+------+------+
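
The NaN rules and the choice of comparison operators can be summarised in a short reference model. This is a sketch under stated assumptions: ``fmax`` and ``fmin`` are hypothetical Python helpers, ``fmin`` uses the ``a < b`` comparison given for FMin in the opcode table, and denorm flushing is not modelled.

.. code-block:: python

    import math


    def fmax(a: float, b: float) -> float:
        """FMax: return a if a >= b, else b; a NaN operand yields the other."""
        if math.isnan(a):
            return b  # if b is also NaN, the result is NaN
        if math.isnan(b):
            return a
        return a if a >= b else b


    def fmin(a: float, b: float) -> float:
        """FMin: return a if a < b, else b; a NaN operand yields the other."""
        if math.isnan(a):
            return b  # if b is also NaN, the result is NaN
        if math.isnan(b):
            return a
        return a if a < b else b

Because ``fmax`` tests ``a >= b`` and ``fmin`` tests ``a < b``, exactly one of them returns ``a`` for any pair of non-NaN inputs, so the two results always partition the operands between them.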

FMin
~~~~

NaN has special handling: If one source operand is NaN, then the other source operand is returned.
If both are NaN, any NaN representation is returned.
This conforms to new IEEE 754R rules.

Denorms are flushed (sign preserved) before comparison; however, the result written to dest may or may not be denorm flushed.

+------+-----------------------------+
| a    | b                           |
|      +------+--------+------+------+
|      | -inf | F      | +inf | NaN  |
+------+------+--------+------+------+
| -inf | -inf | -inf   | -inf | -inf |
+------+------+--------+------+------+
| F    | -inf | a or b | a    | a    |
+------+------+--------+------+------+
| +inf | -inf | b      | +inf | +inf |
+------+------+--------+------+------+
| NaN  | -inf | b      | +inf | NaN  |
+------+------+--------+------+------+

Saturate
~~~~~~~~

The Saturate instruction performs the following operation on its input value:

min(1.0f, max(0.0f, value))

where min() and max() in the above expression behave in the way FMin and FMax behave.

Saturate(NaN) returns 0, by the rules for min and max.
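
The NaN result follows directly from the rule that a NaN operand yields the other operand. The sketch below spells this out; ``saturate`` is a hypothetical helper, and denorm flushing is not modelled.

.. code-block:: python

    import math


    def saturate(value: float) -> float:
        """Compute min(1.0f, max(0.0f, value)) with NaN-aware min and max."""
        # max(0.0, NaN) returns the non-NaN operand, which is 0.0.
        inner = 0.0 if math.isnan(value) else max(0.0, value)
        # inner is never NaN at this point, so an ordinary min() is enough.
        return min(1.0, inner)

For a NaN input the inner max produces 0.0, and the outer min then keeps 0.0 because it is already below 1.0.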

Sin
~~~

Theta values can be any IEEE 32-bit floating point values.

The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| sin(src) | NaN  | [-1 to +1] | -0      | -0 | +0 | +0      | [-1 to +1] | NaN  | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

.. OPCODES-RST:END


Custom instructions
-------------------
Instructions for third-party extensions will be specially prefixed external function calls, identified by a declared extension-set-prefix. Additional metadata will be included to provide hints about uniformity, pure or const guarantees, alignment, etc.

Validation Rules
================

The following rules are verified by the *Validator* component and thus can be relied upon by downstream consumers.

The set of validation rules that are known to hold for a DXIL program is identified by the 'dx.valver' named metadata node, which consists of a two-element tuple of constant int values, a major and minor version. Minor version numbers are incremented as rules are added to a prior table or as the implementation fixes issues.
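
A consumer might read the version pair and compare it against the version it relies on. The sketch below is illustrative only: the textual form ``!{i32 1, i32 0}`` for the metadata tuple, the helper names ``parse_valver`` and ``satisfies``, and the policy of requiring an equal major version with an equal or greater minor version are all assumptions, not rules stated by this document.

.. code-block:: python

    import re
    from typing import Tuple

    # Assumed textual form of the two-element tuple, e.g. "!{i32 1, i32 0}".
    _VALVER_PATTERN = re.compile(r"\s*!\{\s*i32\s+(\d+)\s*,\s*i32\s+(\d+)\s*\}\s*")


    def parse_valver(node: str) -> Tuple[int, int]:
        """Extract (major, minor) from the assumed metadata tuple text."""
        match = _VALVER_PATTERN.fullmatch(node)
        if match is None:
            raise ValueError(f"not a two-element i32 tuple: {node!r}")
        return int(match.group(1)), int(match.group(2))


    def satisfies(actual: Tuple[int, int], required: Tuple[int, int]) -> bool:
        """Assumed policy: same major version, minor at least as new."""
        return actual[0] == required[0] and actual[1] >= required[1]

Under these assumptions, a module validated as ``(1, 2)`` satisfies a consumer that requires ``(1, 1)``, while a module validated as ``(2, 0)`` does not.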
2147
2148
.. <py::lines('VALRULES-RST')>hctdb_instrhelp.get_valrules_rst()</py>
2149
.. VALRULES-RST:BEGIN
2150
2151
===================================== =======================================================================================================================================================================================================================================================================================================
2152
Rule Code Description
2153
===================================== =======================================================================================================================================================================================================================================================================================================
2154
BITCODE.VALID TODO - Module must be bitcode-valid
2155
CONTAINER.PARTINVALID DXIL Container must not contain unknown parts
2156
CONTAINER.PARTMATCHES DXIL Container Parts must match Module
2157
CONTAINER.PARTMISSING DXIL Container requires certain parts, corresponding to module
2158
CONTAINER.PARTREPEATED DXIL Container must have only one of each part type
2159
CONTAINER.ROOTSIGNATUREINCOMPATIBLE Root Signature in DXIL Container must be compatible with shader
2160
DECL.DXILFNEXTERN External function must be a DXIL function
2161
DECL.DXILNSRESERVED The DXIL reserved prefixes must only be used by built-in functions and types
2162
DECL.FNFLATTENPARAM Function parameters must not use struct types
2163
DECL.FNISCALLED Functions can only be used by call instructions
2164
DECL.NOTUSEDEXTERNAL External declaration should not be used
2165
DECL.USEDEXTERNALFUNCTION External function must be used
2166
DECL.USEDINTERNAL Internal declaration must be used
2167
FLOW.DEADLOOP Loop must have break
2168
FLOW.FUNCTIONCALL Function with parameter is not permitted
2169
FLOW.NORECUSION Recursion is not permitted
2170
FLOW.REDUCIBLE Execution flow must be reducible
2171
INSTR.ALLOWED Instructions must be of an allowed type
2172
INSTR.BARRIERMODEFORNONCS sync in a non-Compute Shader must only sync UAV (sync_uglobal)
2173
INSTR.BARRIERMODENOMEMORY sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory). Only _t (thread group sync) is optional.
2174
INSTR.BARRIERMODEUSELESSUGROUP sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.
2175
INSTR.BUFFERUPDATECOUNTERONUAV BufferUpdateCounter valid only on UAV
2176
INSTR.CALLOLOAD Call to DXIL intrinsic must match overload signature
2177
INSTR.CANNOTPULLPOSITION pull-model evaluation of position disallowed
2178
INSTR.CBUFFERCLASSFORCBUFFERHANDLE Expect Cbuffer for CBufferLoad handle
2179
INSTR.CBUFFEROUTOFBOUND Cbuffer access out of bound
2180
INSTR.COORDINATECOUNTFORRAWTYPEDBUF raw/typed buffer don't need 2 coordinates
2181
INSTR.COORDINATECOUNTFORSTRUCTBUF structured buffer require 2 coordinates
2182
INSTR.DXILSTRUCTUSER Dxil struct types should only used by ExtractValue
2183
INSTR.DXILSTRUCTUSEROUTOFBOUND Index out of bound when extract value from dxil struct types
2184
INSTR.EVALINTERPOLATIONMODE Interpolation mode on %0 used with eval_* instruction must be linear, linear_centroid, linear_noperspective, linear_noperspective_centroid, linear_sample or linear_noperspective_sample
2185
INSTR.EXTRACTVALUE ExtractValue should only be used on dxil struct types and cmpxchg
2186
INSTR.FAILTORESLOVETGSMPOINTER TGSM pointers must originate from an unambiguous TGSM global variable.
2187
INSTR.HANDLENOTFROMCREATEHANDLE Resource handle should returned by createHandle
2188
INSTR.IMMBIASFORSAMPLEB bias amount for sample_b must be in the range [%0,%1], but %2 was specified as an immediate
2189
INSTR.INBOUNDSACCESS Access to out-of-bounds memory is disallowed
2190
INSTR.MINPRECISIONNOTPRECISE Instructions marked precise may not refer to minprecision values
2191
INSTR.MINPRECISONBITCAST Bitcast on minprecison types is not allowed
2192
INSTR.MIPLEVELFORGETDIMENSION Use mip level on buffer when GetDimensions
2193
INSTR.MIPONUAVLOAD uav load don't support mipLevel/sampleIndex
2194
INSTR.NOGENERICPTRADDRSPACECAST Address space cast between pointer types must have one part to be generic address space
2195
INSTR.NOIDIVBYZERO No signed integer division by zero
2196
INSTR.NOINDEFINITEACOS No indefinite arccosine
2197
INSTR.NOINDEFINITEASIN No indefinite arcsine
2198
INSTR.NOINDEFINITEDSXY No indefinite derivative calculation
2199
INSTR.NOINDEFINITELOG No indefinite logarithm
2200
INSTR.NOREADINGUNINITIALIZED Instructions should not read uninitialized value
2201
INSTR.NOUDIVBYZERO No unsigned integer division by zero
2202
INSTR.OFFSETONUAVLOAD uav load don't support offset
2203
INSTR.OLOAD DXIL intrinsic overload must be valid
2204
INSTR.ONLYONEALLOCCONSUME RWStructuredBuffers may increment or decrement their counters, but not both.
2205
INSTR.OPCODERESERVED Instructions must not reference reserved opcodes
2206
INSTR.OPCONST DXIL intrinsic requires an immediate constant operand
2207
INSTR.OPCONSTRANGE Constant values must be in-range for operation
2208
INSTR.OPERANDRANGE DXIL intrinsic operand must be within defined range
2209
INSTR.PTRBITCAST Pointer type bitcast must be have same size
2210
INSTR.RESOURCECLASSFORLOAD load can only run on UAV/SRV resource
2211
INSTR.RESOURCECLASSFORSAMPLERGATHER sample, lod and gather should on srv resource.
2212
INSTR.RESOURCECLASSFORUAVSTORE store should on uav resource.
2213
INSTR.RESOURCECOORDINATEMISS coord uninitialized
2214
INSTR.RESOURCECOORDINATETOOMANY out of bound coord must be undef
2215
INSTR.RESOURCEKINDFORBUFFERLOADSTORE buffer load/store only works on Raw/Typed/StructuredBuffer
2216
INSTR.RESOURCEKINDFORCALCLOD lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray
2217
INSTR.RESOURCEKINDFORGATHER gather requires resource declared as texture/2D/Cube/2DArray/CubeArray
2218
INSTR.RESOURCEKINDFORGETDIM Invalid resource kind on GetDimensions
2219
INSTR.RESOURCEKINDFORSAMPLE sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray
2220
INSTR.RESOURCEKINDFORSAMPLEC samplec requires resource declared as texture1D/2D/Cube/1DArray/2DArray/CubeArray
2221
INSTR.RESOURCEKINDFORTEXTURELOAD texture load only works on Texture1D/1DArray/2D/2DArray/3D/MS2D/MS2DArray
2222
INSTR.RESOURCEKINDFORTEXTURESTORE texture store only works on Texture1D/1DArray/2D/2DArray/3D
2223
INSTR.RESOURCEOFFSETMISS offset uninitialized
2224
INSTR.RESOURCEOFFSETTOOMANY out of bound offset must be undef
2225
INSTR.SAMPLECOMPTYPE sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.
2226
INSTR.SAMPLEINDEXFORLOAD2DMS load on Texture2DMS/2DMSArray require sampleIndex
2227
INSTR.SAMPLERMODEFORLOD lod instruction requires sampler declared in default mode
2228
INSTR.SAMPLERMODEFORSAMPLE sample/_l/_d/_cl_s/gather instruction requires sampler declared in default mode
2229
INSTR.SAMPLERMODEFORSAMPLEC sample_c_*/gather_c instructions require sampler declared in comparison mode
2230
INSTR.STRUCTBITCAST Bitcast on struct types is not allowed
2231
INSTR.TEXTUREOFFSET offset texture instructions must take offset which can resolve to integer literal in the range -8 to 7
2232
INSTR.TGSMRACECOND Race condition writing to shared memory detected, consider making this write conditional
2233
INSTR.UNDEFRESULTFORGETDIMENSION GetDimensions used undef dimension %0 on %1
2234
INSTR.WRITEMASKFORTYPEDUAVSTORE store on typed uav must write to all four components of the UAV
2235
INSTR.WRITEMASKMATCHVALUEFORUAVSTORE uav store write mask must match store value mask, write mask is %0 and store value mask is %1
2236
META.BRANCHFLATTEN Can't use branch and flatten attributes together
2237
META.CLIPCULLMAXCOMPONENTS Combined elements of SV_ClipDistance and SV_CullDistance must fit in 8 components
2238
META.CLIPCULLMAXROWS Combined elements of SV_ClipDistance and SV_CullDistance must fit in two rows.
2239
META.CONTROLFLOWHINTNOTONCONTROLFLOW Control flow hint only works on control flow inst
2240
META.DENSERESIDS Resource identifiers must be zero-based and dense
2241
META.DUPLICATESYSVALUE System value may only appear once in signature
2242
META.ENTRYFUNCTION entrypoint not found
2243
META.FLAGSUSAGE Flags must match usage
2244
META.FORCECASEONSWITCH Attribute forcecase only works for switch
2245
META.FUNCTIONANNOTATION Cannot find function annotation for %0
2246
META.GLCNOTONAPPENDCONSUME globallycoherent cannot be used with append/consume buffers
2247
META.INTEGERINTERPMODE Interpolation mode on integer must be Constant
2248
META.INTERPMODEINONEROW Interpolation mode must be identical for all elements packed into the same row.
2249
META.INTERPMODEVALID Interpolation mode must be valid
2250
META.INVALIDCONTROLFLOWHINT Invalid control flow hint
2251
META.KNOWN Named metadata should be known
2252
META.MAXTESSFACTOR Hull Shader MaxTessFactor must be [%0..%1]. %2 specified
2253
META.NOSEMANTICOVERLAP Semantics must not overlap
2254
META.REQUIRED TODO - Required metadata missing
2255
META.SEMAKINDMATCHESNAME Semantic name must match system value, when defined.
2256
META.SEMAKINDVALID Semantic kind must be valid
2257
META.SEMANTICCOMPTYPE %0 must be %1
2258
META.SEMANTICINDEXMAX System value semantics have a maximum valid semantic index
2259
META.SEMANTICLEN Semantic length must be at least 1 and at most 64
2260
META.SEMANTICSHOULDBEALLOCATED Semantic should have a valid packing location
2261
META.SEMANTICSHOULDNOTBEALLOCATED Semantic should have a packing location of -1
2262
META.SIGNATURECOMPTYPE signature %0 specifies unrecognized or invalid component type
2263
META.SIGNATUREILLEGALCOMPONENTORDER Component ordering for packed elements must be: arbitrary < system value < system generated value
2264
META.SIGNATUREINDEXCONFLICT Only elements with compatible indexing rules may be packed together
2265
META.SIGNATUREOUTOFRANGE Signature elements must fit within maximum signature size
2266
META.SIGNATUREOVERLAP Signature elements may not overlap in packing location.
2267
META.STRUCTBUFALIGNMENT StructuredBuffer stride not aligned
2268
META.STRUCTBUFALIGNMENTOUTOFBOUND StructuredBuffer stride out of bounds
2269
META.SYSTEMVALUEROWS System value may only have 1 row
2270
META.TARGET Target triple must be 'dxil-ms-dx'
2271
META.TESSELLATOROUTPUTPRIMITIVE Invalid Tessellator Output Primitive specified. Must be point, line, triangleCW or triangleCCW.
2272
META.TESSELLATORPARTITION Invalid Tessellator Partitioning specified. Must be integer, pow2, fractional_odd or fractional_even.
2273
META.TEXTURETYPE elements of typed buffers and textures must fit in four 32-bit quantities
2274
META.USED All metadata must be used by dxil
2275
META.VALIDSAMPLERMODE Invalid sampler mode on sampler
2276
META.VALUERANGE Metadata value must be within range
2277
META.WELLFORMED TODO - Metadata must be well-formed in operand count and types
2278
SM.APPENDANDCONSUMEONSAMEUAV BufferUpdateCounter inc and dec on a given UAV (%d) cannot both be in the same shader for shader model less than 5.1.
2279
SM.CBUFFERELEMENTOVERFLOW CBuffer elements must not overflow
2280
SM.CBUFFEROFFSETOVERLAP CBuffer offsets must not overlap
2281
SM.CBUFFERTEMPLATETYPEMUSTBESTRUCT D3D12 constant/texture buffer template element can only be a struct
2282
SM.COMPLETEPOSITION Not all elements of SV_Position were written
2283
SM.COUNTERONLYONSTRUCTBUF BufferUpdateCounter valid only on structured buffers
2284
SM.CSNORETURN Compute shaders can't return values, outputs must be written in writable resources (UAVs).
2285
SM.DOMAINLOCATIONIDXOOB DomainLocation component index out of bounds for the domain.
2286
SM.DSINPUTCONTROLPOINTCOUNTRANGE DS input control point count must be [0..%0]. %1 specified
2287
SM.GSINSTANCECOUNTRANGE GS instance count must be [1..%0]. %1 specified
2288
SM.GSOUTPUTVERTEXCOUNTRANGE GS output vertex count must be [0..%0]. %1 specified
2289
SM.GSTOTALOUTPUTVERTEXDATARANGE Declared output vertex count (%0) multiplied by the total number of declared scalar components of output data (%1) equals %2. This value cannot be greater than %3
2290
SM.GSVALIDINPUTPRIMITIVE GS input primitive unrecognized
2291
SM.GSVALIDOUTPUTPRIMITIVETOPOLOGY GS output primitive topology unrecognized
2292
SM.HSINPUTCONTROLPOINTCOUNTRANGE HS input control point count must be [0..%0]. %1 specified
2293
SM.HULLPASSTHRUCONTROLPOINTCOUNTMATCH For pass thru hull shader, input control point count must match output control point count
2294
SM.INSIDETESSFACTORSIZEMATCHDOMAIN InsideTessFactor rows, columns (%0, %1) invalid for domain %2. Expected %3 rows and 1 column.
2295
SM.INVALIDRESOURCECOMPTYPE Invalid resource return type
2296
SM.INVALIDRESOURCEKIND Invalid resources kind
2297
SM.INVALIDTEXTUREKINDONUAV Texture2DMS[Array] or TextureCube[Array] resources are not supported with UAVs
2298
SM.ISOLINEOUTPUTPRIMITIVEMISMATCH     Hull Shader declared with IsoLine Domain must specify output primitive point or line. Triangle_cw or triangle_ccw output are not compatible with the IsoLine Domain.
SM.MAXTGSMSIZE                        Total Thread Group Shared Memory storage is %0, exceeded %1
SM.MAXTHEADGROUP                      Declared Thread Group Count %0 (X*Y*Z) is beyond the valid maximum of %1
SM.MULTISTREAMMUSTBEPOINT             When multiple GS output streams are used they must be pointlists
SM.NAME                               Target shader model name must be known
SM.NOINTERPMODE                       Interpolation mode must be undefined for VS input/PS output/patch constant.
SM.NOPSOUTPUTIDX                      Pixel shader output registers are not indexable.
SM.OPCODE                             Opcode must be defined in target shader model
SM.OPCODEININVALIDFUNCTION            Invalid DXIL opcode usage like StorePatchConstant in patch constant function
SM.OPERAND                            Operand must be defined in target shader model
SM.OUTPUTCONTROLPOINTCOUNTRANGE       output control point count must be [0..%0]. %1 specified
SM.OUTPUTCONTROLPOINTSTOTALSCALARS    Total number of scalars across all HS output control points must not exceed
SM.PATCHCONSTANTONLYFORHSDS           patch constant signature only valid in HS and DS
SM.PSCONSISTENTINTERP                 Interpolation mode for PS input position must be linear_noperspective_centroid or linear_noperspective_sample when outputting oDepthGE or oDepthLE and not running at sample frequency (which is forced by inputting SV_SampleIndex or declaring an input linear_sample or linear_noperspective_sample)
SM.PSCOVERAGEANDINNERCOVERAGE         InnerCoverage and Coverage are mutually exclusive.
SM.PSMULTIPLEDEPTHSEMANTIC            Pixel Shader only allows one type of depth semantic to be declared
SM.PSOUTPUTSEMANTIC                   Pixel Shader allows output semantics to be SV_Target, SV_Depth, SV_DepthGreaterEqual, SV_DepthLessEqual, SV_Coverage or SV_StencilRef, %0 found
SM.PSTARGETCOL0                       SV_Target packed location must start at column 0
SM.PSTARGETINDEXMATCHESROW            SV_Target semantic index must match packed row location
SM.RESOURCERANGEOVERLAP               Resource ranges must not overlap
SM.ROVONLYINPS                        RasterizerOrdered objects are only allowed in 5.0+ pixel shaders
SM.SAMPLECOUNTONLYON2DMS              Only Texture2DMS/2DMSArray could has sample count
SM.SEMANTIC                           Semantic must be defined in target shader model
SM.STREAMINDEXRANGE                   Stream index (%0) must between 0 and %1
SM.TESSFACTORFORDOMAIN                Required TessFactor for domain not found declared anywhere in Patch Constant data
SM.TESSFACTORSIZEMATCHDOMAIN          TessFactor rows, columns (%0, %1) invalid for domain %2. Expected %3 rows and 1 column.
SM.THREADGROUPCHANNELRANGE            Declared Thread Group %0 size %1 outside valid range [%2..%3]
SM.TRIOUTPUTPRIMITIVEMISMATCH         Hull Shader declared with Tri Domain must specify output primitive point, triangle_cw or triangle_ccw. Line output is not compatible with the Tri domain
SM.UNDEFINEDOUTPUT                    Not all elements of output %0 were written
SM.VALIDDOMAIN                        Invalid Tessellator Domain specified. Must be isoline, tri or quad
SM.ZEROHSINPUTCONTROLPOINTWITHINPUT   When HS input control point count is 0, no input signature should exist
TYPES.DEFINED                         Type must be defined based on DXIL primitives
TYPES.I8                              I8 can only used as immediate value for intrinsic
TYPES.INTWIDTH                        Int type must be of valid width
TYPES.NOMULTIDIM                      Only one dimension allowed for array type
TYPES.NOVECTOR                        Vector types must not be present
UNI.NOWAVESENSITIVEGRADIENT           Gradient operations are not affected by wave-sensitive data or control flow.
===================================== =======================================================================================================================================================================================================================================================================================================

.. VALRULES-RST:END
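As an illustration of how two of the shader model rules listed above relate to each other, the following C sketch checks a declared thread group size against SM.THREADGROUPCHANNELRANGE and SM.MAXTHEADGROUP. It is not the validator's implementation: the ``ThreadGroupLimits`` structure, the flag names, and the function ``check_thread_group`` are hypothetical, and the actual limits (the ``%1``, ``%2`` and ``%3`` values in the messages) depend on the target shader model and are supplied by the caller here.

```c
#include <stdint.h>

/* Bit flags naming the two rules this sketch checks (hypothetical names). */
enum {
    RULE_THREADGROUPCHANNELRANGE = 1 << 0,
    RULE_MAXTHEADGROUP           = 1 << 1
};

/* Hypothetical limits; a real validator takes them from the shader model. */
typedef struct {
    uint32_t min_dim;      /* lower bound for every channel           */
    uint32_t max_dim[3];   /* upper bound for the X, Y and Z channels */
    uint32_t max_threads;  /* upper bound for X*Y*Z                   */
} ThreadGroupLimits;

/* Multiply without wrapping, so an absurd declaration cannot
   overflow back into the valid range. */
static uint64_t mul_saturating(uint64_t a, uint64_t b)
{
    if (b != 0 && a > UINT64_MAX / b)
        return UINT64_MAX;
    return a * b;
}

/* Returns a bit mask of the violated rules, or 0 if both rules hold. */
unsigned check_thread_group(const uint32_t dims[3],
                            const ThreadGroupLimits *limits)
{
    unsigned violations = 0;
    uint64_t total = 1;

    for (int i = 0; i < 3; ++i) {
        if (dims[i] < limits->min_dim || dims[i] > limits->max_dim[i])
            violations |= RULE_THREADGROUPCHANNELRANGE;
        total = mul_saturating(total, dims[i]);
    }
    if (total > limits->max_threads)
        violations |= RULE_MAXTHEADGROUP;
    return violations;
}
```

The two rules are independent: a declaration can keep every channel within its own range and still exceed the total thread count, so the sketch reports each rule separately rather than stopping at the first failure.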

Modules and Linking
===================

HLSL has linking capabilities to enable third-party libraries. The linking step happens before shader DXIL is given to the driver compilers.

Additional Notes
================

These additional notes are not normative for DXIL, and are included for the convenience of implementers.

Other Versioned Components
--------------------------

In addition to the shader model, DXIL, and bitcode representation versions, two other versioned components are worth discussing: the supporting operating system and runtime, and the HLSL language.

Support is provided in the Microsoft Windows family of operating systems, when running on the D3D12 runtime.

The HLSL language is versioned independently of DXIL, and currently follows an 'HLSL <year>' naming scheme. HLSL 2015 is the dialect supported by the d3dcompiler_47 library; a limited form of support is provided in the open source HLSL on LLVM project. HLSL 2016 is the version supported by the current HLSL on LLVM project, which removes some features (primarily the effect framework syntax and the backquote operator) and adds new ones (wave intrinsics and basic i64 support).

DXIL Container Format
---------------------

DXIL is typically encapsulated in a DXIL container. A DXIL container is composed of a header, a sequence of part offsets, and a sequence of parts.

The following C declaration describes this structure::

  struct DxilContainerHeader {
    uint32_t HeaderFourCC;
    uint8_t  Digest[DxilContainerHashSize];
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t ContainerSizeInBytes; // From start of this header
    uint32_t PartCount;
    // Structure is followed by uint32_t PartOffset[PartCount];
    // The offset is to a DxilPartHeader.
  };

Each part has a standard header, followed by a part-specific body::

  struct DxilPartHeader {
    uint32_t PartFourCC; // Four char code for part type.
    uint32_t PartSize;   // Byte count for PartData.
    // Structure is followed by uint8_t PartData[PartSize].
  };

The DXIL program is found in a part with the following body::

  struct DxilProgramHeader {
    uint32_t ProgramVersion; // Major and minor version of shader, including type.
    uint32_t SizeInUint32;   // Size in uint32_t units including this header.
    uint32_t DxilMagic;      // 0x4C495844, ASCII "DXIL".
    uint32_t DxilVersion;    // DXIL version.
    uint32_t BitcodeOffset;  // Offset to LLVM bitcode (from DxilMagic).
    uint32_t BitcodeSize;    // Size of LLVM bitcode.
    // Followed by uint8_t[BitcodeSize] after possible gap from BitcodeOffset
  };

The bitcode payload follows the LLVM bitcode encoding.
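The three declarations above can be tied together with a short C sketch that walks a container, finds a part by its four-character code, and locates the bitcode inside a program part. This is a reading aid under stated assumptions, not a reference parser: it assumes a little-endian layout with no padding between fields, assumes ``DxilContainerHashSize`` is 16, and the helper names ``read_u32_le``, ``fourcc``, ``find_part`` and ``find_bitcode`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of DxilContainerHashSize; not stated in the text above. */
enum { DXIL_HASH_SIZE = 16 };
/* Byte sizes implied by the declarations, assuming no padding. */
enum { CONTAINER_HEADER_SIZE = 4 + DXIL_HASH_SIZE + 2 + 2 + 4 + 4 };
enum { PART_HEADER_SIZE = 8, PROGRAM_HEADER_SIZE = 24 };

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Pack four characters so that they appear in order in memory. */
uint32_t fourcc(char a, char b, char c, char d)
{
    return (uint32_t)(uint8_t)a | ((uint32_t)(uint8_t)b << 8) |
           ((uint32_t)(uint8_t)c << 16) | ((uint32_t)(uint8_t)d << 24);
}

/* Returns a pointer to PartData of the first part whose PartFourCC matches,
   or NULL if the container is malformed or has no such part. */
const uint8_t *find_part(const uint8_t *buf, size_t len,
                         uint32_t part_fourcc, uint32_t *part_size)
{
    if (len < CONTAINER_HEADER_SIZE)
        return NULL;
    uint32_t container_size = read_u32_le(buf + 4 + DXIL_HASH_SIZE + 4);
    uint32_t part_count = read_u32_le(buf + 4 + DXIL_HASH_SIZE + 8);
    if (container_size < CONTAINER_HEADER_SIZE || container_size > len)
        return NULL;
    /* The PartOffset table must fit inside the container. */
    if (part_count > (container_size - CONTAINER_HEADER_SIZE) / 4)
        return NULL;

    for (uint32_t i = 0; i < part_count; ++i) {
        uint32_t offset = read_u32_le(buf + CONTAINER_HEADER_SIZE + 4 * i);
        if (offset > container_size - PART_HEADER_SIZE)
            return NULL;
        uint32_t size = read_u32_le(buf + offset + 4);
        if (size > container_size - offset - PART_HEADER_SIZE)
            return NULL;
        if (read_u32_le(buf + offset) == part_fourcc) {
            *part_size = size;
            return buf + offset + PART_HEADER_SIZE;
        }
    }
    return NULL;
}

/* Given the body of a program part, returns a pointer to the bitcode.
   BitcodeOffset is measured from DxilMagic, which sits at byte 8. */
const uint8_t *find_bitcode(const uint8_t *part, uint32_t part_size,
                            uint32_t *bitcode_size)
{
    if (part_size < PROGRAM_HEADER_SIZE)
        return NULL;
    if (read_u32_le(part + 8) != 0x4C495844u) /* "DXIL" */
        return NULL;
    uint32_t offset = read_u32_le(part + 16);
    uint32_t size = read_u32_le(part + 20);
    uint32_t available = part_size - 8;
    if (offset > available || size > available - offset)
        return NULL;
    *bitcode_size = size;
    return part + 8 + offset;
}
```

A caller would pass the four-character code of the part that holds the program to ``find_part`` and hand the result to ``find_bitcode``. Fields are read byte by byte rather than by casting the buffer to the structs, which avoids alignment and padding assumptions beyond the ones listed above.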

Future Directions
-----------------

This section provides background on future directions for DXIL that may or may not materialize. Any of them would imply a new version of DXIL.

It is desirable to support generic pointers, each of which can point to any one of the other kinds of pointers. If the compiler fails to disambiguate, memory access is done via a generic pointer; the HLSL compiler will warn the user about each access that it cannot disambiguate. Generic pointers are not supported for SM6.

HLSL will eventually support more primitive types such as i8, i16, i32, i64, half, float, and double, as well as declspec(align(n)) and #pragma pack(n) directives. SM6.0 will eventually require byte-granularity access support in hardware, especially for writes. This is not supported for SM6.

There will be a Requires32BitAlignedAccesses CAP flag. Its absence would indicate that the shader requires writes that (1) do not write a full four bytes, or (2) are not aligned on a four-byte boundary. If the hardware does not natively support such writes, the shader is rejected. Programmers can work around this hardware limitation by manually aligning smaller data on four-byte boundaries in HLSL.
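The two conditions attached to the Requires32BitAlignedAccesses flag can be expressed as a small C predicate. This is only a sketch of one reading of the text: it treats "a full four bytes" as "a non-zero whole multiple of four bytes", and the function names ``write_is_32bit_aligned`` and ``align_up_4`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a write touches only whole, four-byte-aligned dwords, i.e. it
   would not need byte-granularity support in hardware. */
bool write_is_32bit_aligned(uint32_t byte_offset, uint32_t byte_size)
{
    return byte_size != 0 && byte_size % 4 == 0 && byte_offset % 4 == 0;
}

/* Rounds an offset up to the next four-byte boundary, mirroring the manual
   alignment workaround. Offsets within 3 of UINT32_MAX wrap to 0. */
uint32_t align_up_4(uint32_t byte_offset)
{
    return (byte_offset + 3u) & ~3u;
}
```

A shader whose writes all satisfy the predicate could carry the flag. Placing each smaller field at an offset produced by ``align_up_4`` is one way to apply the manual workaround, at the cost of padding.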

When libraries are supported as first-class DXIL constructs, "lib_*" shader models can specify more than one entry point per module; the other shader models must specify exactly one entry point.

The target machine specification for HLSL might specify a 64-bit pointer size with 64-bit offsets.

Hardware support for generic pointers is essential for HLSL next as a fallback mechanism for cases when the compiler cannot disambiguate a pointer's address space.

Future DXIL will change how half and i16 are treated:

* i16 will have to be supported natively, either in hardware or via emulation,
* half's behavior will depend on the value of the RequiresHardwareHalf CAP; if it is not set, half can be treated as a min-precision type (min16float), i.e., computation may be done with values implicitly promoted to floats; if it is set and the hardware does not support the half type natively, the driver compiler can either emulate exact IEEE half behavior or fail shader creation.

Pending Specification Work
==========================

The following work on this specification is still pending:

* Consider moving some additional tables and lists into hctdb and cross-referencing them.
* Complete the extended documentation for instructions.