@@ -660,19 +660,60 @@ Non-Integral Pointer Type
Note: non-integral pointer types are a work in progress, and they should be
considered experimental at this time.

- LLVM IR optionally allows the frontend to denote pointers in certain address
- spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
- Non-integral pointer types represent pointers that have an *unspecified* bitwise
- representation; that is, the integral representation may be target dependent or
- unstable (not backed by a fixed integer).
+ For most targets, the pointer representation is a direct mapping from the
+ bitwise representation to the address of the underlying memory location.
+ Such pointers are considered "integral", and any pointers where the
+ representation is not just an integer address are called "non-integral".
+
+ Non-integral pointers have at least one of the following three properties:
+
+ * the pointer representation contains non-address bits
+ * the pointer representation is unstable (may change at any time in a
+ target-specific way)
+ * the pointer representation has external state
+
+ These properties (or combinations thereof) can be applied to pointers via the
+ :ref:`datalayout string<langref_datalayout>`.
+
+ The exact implications of these properties are target-specific. The following
+ subsections describe the IR semantics and the restrictions on optimization
+ passes for each of these properties.
+
682
+ Pointers with non-address bits
683
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
684
+
685
+ Pointers in this address space have a bitwise representation that not only
686
+ has address bits, but also some other target-specific metadata.
687
+ In most cases pointers with non-address bits behave exactly the same as
688
+ integral pointers, the only difference is that it is not possible to create a
689
+ pointer just from an address unless all the non-address bits are also recreated
690
+ correctly in a target-specific way.
691
+
692
+ An example of pointers with non-address bits are the AMDGPU buffer descriptors
693
+ which are 160 bits: a 128-bit fat pointer and a 32-bit offset.
694
+ Similarly, CHERI capabilities contain a 32 or 64 bit address as well as the
695
+ same number of metadata bits, but unlike the AMDGPU buffer descriptors they have
696
+ external state in addition to non-address bits.
697
+
698
+
+ Unstable pointer representation
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Pointers in this address space have an *unspecified* bitwise representation
+ (i.e. not backed by a fixed integer). The bitwise pattern of such pointers is
+ allowed to change in a target-specific way. For example, this could be a pointer
+ type used with copying garbage collection where the garbage collector could
+ update the pointer at any time during a collection sweep.

``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e., normal) pointers in that they convert integers to and from
- corresponding pointer types, but there are additional implications to be
- aware of. Because the bit-representation of a non-integral pointer may
- not be stable, two identical casts of the same operand may or may not
+ corresponding pointer types, but there are additional implications to be aware
+ of.
+
+ For "unstable" pointer representations, the bit-representation of the pointer
+ may not be stable, so two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
- non-integral type depends on environmental state in an implementation
+ "unstable" pointer type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
@@ -681,21 +722,72 @@ defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)

From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
- non-integral types are analogous to ones on integral types with one
+ "unstable" pointer types are analogous to ones on integral types with one
key exception: the optimizer may not, in general, insert new dynamic
occurrences of such casts. If a new cast is inserted, the optimizer would
need to either ensure that a) all possible values are valid, or b)
appropriate fencing is inserted. Since the appropriate fencing is
implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
- ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
+ ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for "unstable" pointer types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
such as :ref:`llvm.ptrmask<int_ptrmask>`.

- The alignment information provided by the frontend for a non-integral pointer
+ The alignment information provided by the frontend for an "unstable" pointer
(typically using attributes or metadata) must be valid for every possible
representation of the pointer.

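+
+ As a minimal sketch of the cast behavior described above (address space ``1``
+ and the default 64-bit pointer size are assumptions chosen only for
+ illustration), consider two identical casts of the same operand:
+
+ .. code-block:: llvm
+
+     ; Hypothetical layout: address space 1 has an unstable representation.
+     target datalayout = "ni:1"
+
+     define i64 @cast_difference(ptr addrspace(1) %v) {
+       ; Each cast may observe a different bit pattern for %v.
+       %a = ptrtoint ptr addrspace(1) %v to i64
+       %b = ptrtoint ptr addrspace(1) %v to i64
+       ; This difference is not guaranteed to be zero.
+       %d = sub i64 %a, %b
+       ret i64 %d
+     }
+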
+ Pointers with external state
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ A further special case of non-integral pointers are those that include external
+ state (such as bounds information or a type tag) with a target-defined size.
+ An example of such a type is a CHERI capability, where there is an additional
+ validity bit that is part of all pointer-typed registers, but is located in
+ memory at an implementation-defined address separate from the pointer itself.
+ Another example would be a fat-pointer scheme where pointers remain plain
+ integers, but the associated bounds are stored in an out-of-band table.
+
+ Unless also marked as "unstable", the bitwise representation of pointers with
+ external state is stable and ``ptrtoint(x)`` always yields a deterministic
+ value. This means transformation passes are still permitted to insert new
+ ``ptrtoint`` instructions.
+
+ The following restrictions apply to IR-level optimization passes:
+
+ The ``inttoptr`` instruction does not recreate the external state and therefore
+ it is target dependent whether it can be used to create a dereferenceable
+ pointer. In general, passes should assume that the result of such an
+ ``inttoptr`` is not dereferenceable. For example, on CHERI targets an
+ ``inttoptr`` will yield a capability with the external state (the validity tag
+ bit) set to zero, which will cause any dereference to trap.
+ The ``ptrtoint`` instruction also only returns the "in-band" state and omits
+ all external state.
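+
+ A minimal sketch of this restriction, assuming a hypothetical address space
+ ``2`` whose 64-bit pointers are marked as having external state via the ``e``
+ flag of the ``p`` datalayout specifier:
+
+ .. code-block:: llvm
+
+     ; Hypothetical layout: address space 2 has external state.
+     target datalayout = "pe2:64:64"
+
+     define i32 @roundtrip(ptr addrspace(2) %p) {
+       ; Only the in-band bits of %p are returned.
+       %i = ptrtoint ptr addrspace(2) %p to i64
+       ; The external state of %p is not recreated for %q.
+       %q = inttoptr i64 %i to ptr addrspace(2)
+       ; Passes should assume that %q is not dereferenceable; whether this
+       ; load traps is target dependent.
+       %v = load i32, ptr addrspace(2) %q
+       ret i32 %v
+     }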
+
+ When a ``store ptr addrspace(N) %p, ptr @dst`` of such a non-integral pointer
+ is performed, the external metadata is also stored to an implementation-defined
+ location. Similarly, a ``%val = load ptr addrspace(N), ptr @dst`` will fetch the
+ external metadata and make it available for all uses of ``%val``.
+ The ``llvm.memcpy`` and ``llvm.memmove`` intrinsics also transfer the
+ external state. This is essential to allow frontends to efficiently emit copies
+ of structures containing such pointers, since expanding all these copies as
+ individual loads and stores would slow down compilation and inhibit
+ optimizations.
+
+ Notionally, these external bits are part of the pointer, but since
+ ``inttoptr`` / ``ptrtoint`` only operate on the "in-band" bits of the pointer
+ and the external bits are not explicitly exposed, they are not included in the
+ size specified in the :ref:`datalayout string<langref_datalayout>`.
+
+ When a pointer type has external state, all roundtrips via memory must
+ be performed as loads and stores of the correct type since stores of other
+ types may not propagate the external data.
+ Therefore it is not legal to convert an existing load/store (or a
+ ``llvm.memcpy`` / ``llvm.memmove`` intrinsic) of pointer types with external
+ state to a load/store of an integer type with the same bitwidth, as that may
+ drop the external state.
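+
+ As a sketch of this rule (assuming a hypothetical address space ``2`` with
+ 64-bit pointers that have external state; the function names are
+ illustrative), the first function below copies a pointer together with its
+ external state, while the second is not an equivalent replacement:
+
+ .. code-block:: llvm
+
+     ; Hypothetical layout: address space 2 has external state.
+     target datalayout = "pe2:64:64"
+
+     define void @copy_as_pointer(ptr %dst, ptr %src) {
+       ; The pointer-typed load and store transfer the external state.
+       %p = load ptr addrspace(2), ptr %src
+       store ptr addrspace(2) %p, ptr %dst
+       ret void
+     }
+
+     define void @copy_as_integer(ptr %dst, ptr %src) {
+       ; The integer-typed load and store may drop the external state.
+       %i = load i64, ptr %src
+       store i64 %i, ptr %dst
+       ret void
+     }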
+
+
.. _globalvars:

Global Variables
@@ -3179,8 +3271,8 @@ as follows:
``A<address space>``
Specifies the address space of objects created by '``alloca``'.
Defaults to the default address space of 0.
- ``p[n]:<size>:<abi>[:<pref>[:<idx>]]``
- This specifies the properties of a pointer in address space ``n``.
+ ``p[<flags>][<as>]:<size>:<abi>[:<pref>[:<idx>]]``
+ This specifies the properties of a pointer in address space ``<as>``.
The ``<size>`` parameter specifies the size of the bitwise representation.
For :ref:`non-integral pointers <nointptrtype>` the representation size may
be larger than the address width of the underlying address space (e.g. to
@@ -3193,9 +3285,13 @@ as follows:
default index size is equal to the pointer size.
The index size also specifies the width of addresses in this address space.
All sizes are in bits.
- The address space, ``n``, is optional, and if not specified,
- denotes the default address space 0. The value of ``n`` must be
- in the range [1,2^24).
+ The address space, ``<as>``, is optional, and if not specified, denotes the
+ default address space 0. The value of ``<as>`` must be in the range [1,2^24).
+ The optional ``<flags>`` are used to specify properties of pointers in this
+ address space: the character ``u`` marks pointers as having an unstable
+ representation, and ``e`` marks pointers as having external state. See
+ :ref:`Non-Integral Pointer Types <nointptrtype>`.
+
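+ For illustration, a data layout fragment using these flags might look as
+ follows; the address space numbers and sizes are assumptions and are not
+ taken from any real target:
+
+ .. code-block:: llvm
+
+     ; Address space 1: 64-bit pointers with an unstable representation.
+     ; Address space 2: 128-bit pointers with external state and a 64-bit
+     ; index (address) width.
+     target datalayout = "pu1:64:64-pe2:128:128:128:64"
+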
``i<size>:<abi>[:<pref>]``
This specifies the alignment for an integer type of a given bit
``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -3248,9 +3344,11 @@ as follows:
this set are considered to support most general arithmetic operations
efficiently.
``ni:<address space0>:<address space1>:<address space2>...``
- This specifies pointer types with the specified address spaces
- as :ref:`Non-Integral Pointer Type <nointptrtype>` s. The ``0``
- address space cannot be specified as non-integral.
+ This marks pointer types with the specified address spaces
+ as :ref:`unstable <nointptrtype>`.
+ The ``0`` address space cannot be specified as non-integral.
+ The ``ni`` specifier is only supported for backwards compatibility; new code
+ should use the flags of the ``p`` specifier instead.

``<abi>`` is a lower bound on what is required for a type to be considered
aligned. This is used in various places, such as:
@@ -31402,4 +31500,3 @@ Semantics:

The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
-