diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 33ff3a8e85dbec..738d20018c2caa 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -1199,7 +1199,7 @@ Currently, only the following parameter attributes are defined:
 
     The ``elementtype`` argument attribute can be used to specify a pointer
     element type in a way that is compatible with `opaque pointers
-    `.
+    `_.
 
     The ``elementtype`` attribute by itself does not carry any specific
     semantics. However, certain intrinsics may require this attribute to be
diff --git a/llvm/docs/OpaquePointers.rst b/llvm/docs/OpaquePointers.rst
index 3c179a8272d27a..8528ba4673f26a 100644
--- a/llvm/docs/OpaquePointers.rst
+++ b/llvm/docs/OpaquePointers.rst
@@ -6,7 +6,7 @@ The Opaque Pointer Type
 =======================
 
 Traditionally, LLVM IR pointer types have contained a pointee type. For example,
-``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
+``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
 due to a lack of pointee type semantics and various issues with having pointee
 types, there is a desire to remove pointee types from pointers.
 
@@ -29,7 +29,7 @@ actual underlying type in memory. In other words, the pointee type contains no
 real semantics.
 
 Lots of operations do not actually care about the underlying type. These
-operations, typically intrinsics, usually end up taking an ``i8 *``. This causes
+operations, typically intrinsics, usually end up taking an ``i8*``. This causes
 lots of redundant no-op bitcasts in the IR to and from a pointer with a
 different pointee type. The extra bitcasts take up space and require extra work
 to look through in optimizations. And more bitcasts increases the chances of
@@ -57,6 +57,35 @@ LLVM IR distinguished between unsigned and signed integer types. The transition
 from manifesting signedness in types to instructions happened early on in LLVM's
 life to the betterment of LLVM IR.
 
+Opaque Pointers Mode
+====================
+
+During the transition phase, LLVM can be used in two modes: In typed pointer
+mode (currently still the default) all pointer types have a pointee type and
+opaque pointers cannot be used. In opaque pointers mode, all pointers are
+opaque. The opaque pointer mode can be enabled using ``-opaque-pointers`` in
+LLVM tools like ``opt``, or ``-mllvm -opaque-pointers`` in clang.
+
+In opaque pointer mode, all typed pointers used in IR, bitcode, or created
+using ``PointerType::get()`` and similar APIs are automatically converted into
+opaque pointers. This simplifies migration and allows testing existing IR with
+opaque pointers.
+
+.. code-block:: llvm
+
+    define i8* @test(i8* %p) {
+      %p2 = getelementptr i8, i8* %p, i64 1
+      ret i8* %p2
+    }
+
+    ; Is automatically converted into the following if -opaque-pointers
+    ; is enabled:
+
+    define ptr @test(ptr %p) {
+      %p2 = getelementptr i8, ptr %p, i64 1
+      ret ptr %p2
+    }
+
 I Still Need Pointee Types!
 ===========================
 
@@ -87,73 +116,92 @@ indirectly.
 If you have use cases that this sort of fix doesn't cover, please email
 llvm-dev.
 
-Transition Plan
-===============
-
-LLVM currently has many places that depend on pointee types. Each dependency on
-pointee types needs to be resolved in some way or another. This essentially
-translates to figuring out how to remove all calls to
-``PointerType::getElementType`` and ``Type::getPointerElementType()``.
-
-Making everything use opaque pointers in one huge commit is infeasible. This
-needs to be done incrementally.
The following steps need to be done, in no
-particular order:
-
-* Introduce the opaque pointer type
-
-  * Already done
-
-* Remove remaining in-tree users of pointee types
-
-  * There are many miscellaneous uses that should be cleaned up individually
+Migration Instructions
+======================
 
-  * Some of the larger use cases are mentioned below
+In order to support opaque pointers, two types of changes tend to be necessary.
+The first is the removal of all calls to ``PointerType::getElementType()`` and
+``Type::getPointerElementType()``.
 
-* Various ABI attributes and instructions that rely on pointee types need to be
-  modified to specify the type separately
+In the LLVM middle-end and backend, this is usually accomplished by inspecting
+the type of relevant operations instead. For example, memory access related
+analyses and optimizations should use the types encoded in the load and store
+instructions instead of querying the pointer type.
 
-  * This has already happened for all instructions like loads, stores, GEPs,
-    and various attributes like ``byval``
+Frontends need to be adjusted to track pointee types independently of LLVM,
+insofar as they are necessary for lowering. For example, clang now tracks the
+pointee type in the ``Address`` structure.
 
-  * More cases may be found as work continues
+While direct usage of pointer element types is immediately apparent in code,
+there is a more subtle issue that opaque pointers need to contend with: A lot
+of code assumes that pointer equality also implies that the used load/store
+type is the same.
Consider the following examples with typed and opaque pointers:
 
-* Remove calls to and deprecate ``IRBuilder`` methods that rely on pointee types
-
-  * For example, some of the ``IRBuilder::CreateGEP()`` methods use the pointer
-    operand's pointee type to determine the GEP operand type
-
-  * Some methods are already deprecated with ``LLVM_ATTRIBUTE_DEPRECATED``, such
-    as some overloads of ``IRBuilder::CreateLoad()``
-
-* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
-  type (not to be turned on until ready)
-
-  * To support legacy bitcode, such as legacy stores/loads, we need to track
-    pointee types for all values since legacy instructions may infer the types
-    from a pointer operand's pointee type
-
-* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
-  pointee types
-
-  * This is mostly Clang, see ``clang::CodeGen::Address::getElementType()``
-
-* Add option to internally treat all pointer types opaque pointers and see what
-  breaks, starting with LLVM tests, then run Clang over large codebases
-
-  * We don't want to start mass-updating tests until we're fairly confident that opaque pointers won't cause major issues
-
-* Replace legacy pointer types in LLVM tests with opaque pointer types
-
-Frontend Migration Steps
-========================
-
-If you have your own frontend, there are a couple of things to do after opaque
-pointer types fully work.
-
-* Don't rely on LLVM pointee types to keep track of frontend pointee types
-
-* Migrate away from LLVM IR instruction builders that rely on pointee types
+..
code-block:: llvm
 
-  * For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
-    use one where the source element type is explicitly passed in, not inferred
-    from the pointer operand pointee type
+    define i64 @test(i32* %p) {
+      store i32 0, i32* %p
+      %bc = bitcast i32* %p to i64*
+      %v = load i64, i64* %bc
+      ret i64 %v
+    }
+
+    define i64 @test(ptr %p) {
+      store i32 0, ptr %p
+      %v = load i64, ptr %p
+      ret i64 %v
+    }
+
+Without opaque pointers, a check that the pointer operand of the load and
+store are the same also ensures that the accessed type is the same. Using a
+different type requires a bitcast, which will result in distinct pointer
+operands.
+
+With opaque pointers, the bitcast is not present, and this check is no longer
+sufficient. In the above example, it could result in store to load forwarding
+of an incorrect type. Code making such assumptions needs to be adjusted to
+check the accessed type explicitly:
+``LI->getType() == SI->getValueOperand()->getType()``.
+
+Frontends using the C API through an FFI interface should be aware that a
+number of C API functions are deprecated and will be removed as part of the
+opaque pointer transition::
+
+    LLVMBuildLoad -> LLVMBuildLoad2
+    LLVMBuildCall -> LLVMBuildCall2
+    LLVMBuildInvoke -> LLVMBuildInvoke2
+    LLVMBuildGEP -> LLVMBuildGEP2
+    LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
+    LLVMBuildStructGEP -> LLVMBuildStructGEP2
+    LLVMConstGEP -> LLVMConstGEP2
+    LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
+    LLVMAddAlias -> LLVMAddAlias2
+
+Additionally, it will no longer be possible to call ``LLVMGetElementType()``
+on a pointer type.
+
+Transition State
+================
+
+As of January 2022 large parts of LLVM support opaque pointers, but there are
+still some major open problems:
+
+* Bitcode already fully supports opaque pointers, and reading up-to-date
+  typed pointer bitcode in opaque pointers mode also works.
However, we
+  currently do not support pointee type based auto-upgrade of old bitcode in
+  opaque pointer mode.
+
+* While clang has limited support for opaque pointers (sufficient to compile
+  CTMark on Linux), a major effort will be needed to systematically remove all
+  uses of ``getPointerElementType()`` and the deprecated ``Address()``
+  constructor.
+
+* We do not yet have a testing strategy for how we can test both typed and
+  opaque pointers during the migration. Currently, individual tests for
+  opaque pointers are being added, but the bulk of tests still uses typed
+  pointers.
+
+* Loop access analysis does not support opaque pointers yet, and is currently
+  the main source of assertion failures in optimized builds.
+
+* Miscellaneous uses of pointer element types remain everywhere.
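
The store-to-load forwarding pitfall described under "Migration Instructions" can be sketched without LLVM itself. The snippet below is a minimal illustration only: ``Value``, ``Store``, ``Load``, ``sameLocationOnly`` and ``canForward`` are hypothetical stand-ins invented for this example, not LLVM classes or APIs. It shows why comparing pointer operands alone stops being sufficient once the bitcast disappears, and how an explicit comparison of the accessed types (the analogue of ``LI->getType() == SI->getValueOperand()->getType()``) restores the guarantee.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT LLVM classes.
struct Value { int id; };  // an SSA value used as a pointer operand

struct Store {
  const Value *ptr;        // pointer operand of the store
  std::string valueType;   // type of the stored value, e.g. "i32"
};

struct Load {
  const Value *ptr;        // pointer operand of the load
  std::string type;        // type produced by the load, e.g. "i64"
};

// Typed-pointer era assumption: identical pointer operands imply the same
// accessed type. With opaque pointers this no longer holds.
bool sameLocationOnly(const Load &l, const Store &s) {
  return l.ptr == s.ptr;
}

// Adjusted check: additionally compare the accessed types explicitly.
bool canForward(const Load &l, const Store &s) {
  return l.ptr == s.ptr && l.type == s.valueType;
}
```

With a store of ``i32`` and a load of ``i64`` through the same pointer value, ``sameLocationOnly`` accepts the pair while ``canForward`` rejects it, which mirrors the opaque pointer example in the patch.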