Skip to content

Commit

Permalink
[mlir] partially update LLVM dialect documentation
Browse files Browse the repository at this point in the history
Rewrite the parts of the documentation that became stale: context/module
handling and type system. Expand the type system description.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D93315
  • Loading branch information
ftynse committed Dec 17, 2020
1 parent 6d2a789 commit c275125
Showing 1 changed file with 276 additions and 29 deletions.
305 changes: 276 additions & 29 deletions mlir/docs/Dialects/LLVM.md
@@ -1,49 +1,296 @@
# 'llvm' Dialect

This dialect wraps the LLVM IR types and instructions into MLIR types and
operations. It provides several additional operations that are necessary to
cover for the differences in the IR structure (e.g., MLIR does not have `phi`
operations and LLVM IR does not have a `constant` operation).
This dialect maps [LLVM IR](https://llvm.org/docs/LangRef.html) into MLIR by
defining the corresponding operations and types. LLVM IR metadata is usually
represented as MLIR attributes, which offer additional structure verification.

In this document, we use "LLVM IR" to designate the
We use "LLVM IR" to designate the
[intermediate representation of LLVM](https://llvm.org/docs/LangRef.html) and
"LLVM IR _dialect_" to refer to the MLIR dialect reflecting LLVM instructions
and types.
"LLVM _dialect_" or "LLVM IR _dialect_" to refer to this MLIR dialect.

Unless explicitly stated otherwise, the semantics of the LLVM dialect operations
must correspond to the semantics of LLVM IR instructions and any divergence is
considered a bug. The dialect also contains auxiliary operations that smoothen
the differences in the IR structure, e.g., MLIR does not have `phi` operations
and LLVM IR does not have a `constant` operation. These auxiliary operations are
systematically prefixed with `mlir`, e.g. `llvm.mlir.constant` where `llvm.` is
the dialect namespace prefix.

[TOC]

## Context and Module Association
## Dependency on LLVM IR

LLVM dialect is not expected to depend on any object that requires an
`LLVMContext`, such as an LLVM IR instruction or type. Instead, MLIR provides
thread-safe alternatives compatible with the rest of the infrastructure. The
dialect is allowed to depend on the LLVM IR objects that don't require a
context, such as data layout and triple description.

## Module Structure

IR modules use the built-in MLIR `ModuleOp` and support all its features. In
particular, modules can be named, nested and are subject to symbol visibility.
Modules can contain any operations, including LLVM functions and globals.

The LLVM IR dialect object _contains_ an LLVM Context and an LLVM Module that it
uses to define, print, parse and manage LLVM IR types. These objects can be
obtained from the dialect object using `.getLLVMContext()` and
`getLLVMModule()`. All LLVM IR objects that interact with the LLVM IR dialect
must exist in the dialect's context.
### Data Layout and Triple

An IR module may have an optional data layout and triple information attached
using MLIR attributes `llvm.data_layout` and `llvm.triple`, respectively. Both
are string attributes with the
[same syntax](https://llvm.org/docs/LangRef.html#data-layout) as in LLVM IR and
are verified to be correct. They can be defined as follows.

```mlir
module attributes {llvm.data_layout = "e",
llvm.target_triple = "aarch64-linux-android"} {
// module contents
}
```

## Types

The LLVM IR dialect defines a single MLIR type, `LLVM::LLVMType`, that can wrap
any existing LLVM IR type. Its syntax is as follows
LLVM dialect defines a set of types that correspond to LLVM IR types. The
dialect type system is _closed_: types from other dialects are not allowed
within LLVM dialect aggregate types. This property allows for more concise
custom syntax and ensures easy translation to LLVM IR.

Similarly to other MLIR context-owned objects, the creation and manipulation of
LLVM dialect types is thread-safe.

MLIR does not support module-scoped named type declarations, e.g. `%s = type
{i32, i32}` in LLVM IR. Instead, types must be fully specified at each use,
except for recursive types where only the first reference to a named type needs
to be fully specified. MLIR type aliases are supported for top-level types, i.e.
they cannot be used inside the type due to type system closedness.

The general syntax of LLVM dialect types is `!llvm.`, followed by a type kind
identifier (e.g., `ptr` for pointer or `struct` for structure) and by an
optional list of type parameters in angle brackets. The dialect follows MLIR
style for types with nested angle brackets and keyword specifiers rather than
using different bracket styles to differentiate types. Inside angle brackets,
the `!llvm` prefix is omitted for brevity; thanks to closedness of the type
system, all types are assumed to be defined in the LLVM dialect. For example,
`!llvm.ptr<struct<packed, (i8, i32)>>` is a pointer to a packed structure type
containing an 8-bit and a 32-bit integer.

### Simple Types

The following non-parametric types are supported.

- `!llvm.bfloat` (`LLVMBFloatType`) - 16-bit “brain” floating-point value
(7-bit significand).
- `!llvm.half` (`LLVMHalfType`) - 16-bit floating-point value as per
IEEE-754-2008.
- `!llvm.float` (`LLVMFloatType`) - 32-bit floating-point value as per
IEEE-754-2008.
- `!llvm.double` (`LLVMDoubleType`) - 64-bit floating-point value as per
IEEE-754-2008.
- `!llvm.fp128` (`LLVMFP128Type`) - 128-bit floating-point value as per
IEEE-754-2008.
- `!llvm.x86_fp80` (`LLVMX86FP80Type`) - 80-bit floating-point value (x87).
- `!llvm.x86_mmx` (`LLVMX86MMXType`) - value held in an MMX register on x86
machine.
- `!llvm.ppc_fp128` (`LLVMPPCFP128Type`) - 128-bit floating-point value (two
64 bits).
- `!llvm.token` (`LLVMTokenType`) - a non-inspectable value associated with an
operation.
- `!llvm.metadata` (`LLVMMetadataType`) - LLVM IR metadata, to be used only if
the metadata cannot be represented as structured MLIR attributes.
- `!llvm.void` (`LLVMVoidType`) - does not represent any value; can only
appear in function results.

These types represent a single value (or an absence thereof in case of `void`)
and correspond to their LLVM IR counterparts.

### Parametric Types

#### Integer Types

Integer types are parametric in MLIR terminology, with their bitwidth being a
type parameter. They are expressed as follows:

```
llvm-int-type ::= `!llvm.i` integer-literal
```

and represented internally as `LLVMIntegerType`. For example, `i1` is a 1-bit
integer type (bool) and `i32` as a 32-bit integer type.

#### Pointer Types

Pointer types specify an address in memory.

Pointer types are parametric types parameterized by the element type and the
address space. The address space is an integer, but this choice may be
reconsidered if MLIR implements named address spaces. Their syntax is as
follows:

```
llvm-ptr-type ::= `!llvm.ptr<` llvm-type (`,` integer-literal)? `>`
```

where the optional integer literal corresponds to the memory space. Both cases
are represented by `LLVMPointerType` internally.

#### Vector Types

Vector types represent sequences of elements, typically when multiple data
elements are processed by a single instruction (SIMD). Vectors are thought of as
stored in registers and therefore vector elements can only be addressed through
constant indices.

Vector types are parameterized by the size, which may be either _fixed_ or a
multiple of some fixed size in case of _scalable_ vectors, and the element type.
Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are
still considered 1D. Their syntax is as follows:

```
llvm-vec-type ::= `!llvm.vec<` (`?` `x`)? integer-literal `x` llvm-type `>`
```

Internally, fixed vector types are represented as `LLVMFixedVectorType` and
scalable vector types are represented as `LLVMScalableVectorType`. Both classes
derive`LLVMVectorType`.

#### Array Types

Array types represent sequences of elements in memory. Unlike vectors, array
elements can be addressed with a value unknown at compile time, and can be
nested. Only 1D arrays are allowed though.

Array types are parameterized by the fixed size and the element type.
Syntactically, their representation is close to vectors:

```
llvm-array-type ::= `!llvm.array<` integer-literal `x` llvm-type `>`
```

and are internally represented as `LLVMArrayType`.

#### Function Types

Function types represent the type of a function, i.e. its signature.

Function types are parameterized by the result type, the list of argument types
and by an optional "variadic" flag. Unlike built-in `FunctionType`, LLVM dialect
functions (`LLVMFunctionType`) always have single result, which may be
`!llvm.void` if the function does not return anything. The syntax is as follows:

```
llvm-func-type ::= `!llvm.func<` llvm-type `(` llvm-type-list (`,` `...`)?
`)` `>`
```

For example,

```mlir
!llvm.func<void ()> // a function with no arguments;
!llvm.func<i32 (float, i32)> // a function with two arguments and a result;
!llvm.func<void (i32, ...)> // a variadic function with at least one argument.
```

In the LLVM dialect, functions are not first-class objects and one cannot have a
value of function type. Instead, one can take the address of a function and
operate on pointers to functions.

### Structure Types

The structure type is used to represent a collection of data members together in
memory. The elements of a structure may be any type that has a size.

Structure types are represented in a single dedicated class
mlir::LLVM::LLVMStructType. Internally, the struct type stores a (potentially
empty) name, a (potentially empty) list of contained types and a bitmask
indicating whether the struct is named, opaque, packed or uninitialized.
Structure types that don't have a name are referred to as _literal_ structs.
Such structures are uniquely identified by their contents. _Identified_ structs
on the other hand are uniquely identified by the name.

#### Identified Structure Types

Identified structure types are uniqued using their name in a given context.
Attempting to construct an identified structure with the same name a structure
that already exists in the context *will result in the existing structure being
returned*. **MLIR does not auto-rename identified structs in case of name
conflicts** because there is no naming scope equivalent to a module in LLVM IR
since MLIR modules can be arbitrarily nested.

Programmatically, identified structures can be constructed in an _uninitialized_
state. In this case, they are given a name but the body must be set up by a
later call, using MLIR's type mutation mechanism. Such uninitialized types can
be used in type construction, but must be eventually initialized for IR to be
valid. This mechanism allows for constructing _recursive_ or mutually referring
structure types: an uninitialized type can be used in its own initialization.

Once the type is initialized, its body cannot be changed anymore. Any further
attempts to modify the body will fail and return failure to the caller _unless
the type is initialized with the exact same body_. Type initialization is
thread-safe; however, if a concurrent thread initializes the type before the
current thread, the initialization may return failure.

The syntax for identified structure types is as follows.

```
llvm-ident-struct-type ::= `!llvm.struct<` string-literal, `opaque` `>`
| `!llvm.struct<` string-literal, `packed`?
`(` llvm-type-or-ref-list `)` `>`
llvm-type-or-ref-list ::= <maybe empty comma-separated list of llvm-type-or-ref>
llvm-type-or-ref ::= <any llvm type>
| `!llvm.struct<` string-literal >
```

The body of the identified struct is printed in full unless the it is
transitively contained in the same struct. In the latter case, only the
identifier is printed. For example, the structure containing the pointer to
itself is represented as `!llvm.struct<"A", (ptr<"A">)>`, and the structure `A`
containing two pointers to the structure `B` containing a pointer to the
structure `A` is represented as `!llvm.struct<"A", (ptr<"B", (ptr<"A">)>,
ptr<"B", (ptr<"A">))>`. Note that the structure `B` is "unrolled" for both
elements. _A structure with the same name but different body is a syntax error._
**The user must ensure structure name uniqueness across all modules processed in
a given MLIR context.** Stucture names are arbitrary string literals and may
include, e.g., spaces and keywords.

Identified structs may be _opaque_. In this case, the body is unknown but the
structure type is considered _initialized_ and is valid in the IR.

#### Literal Structure Types

Literal structures are uniqued according to the list of elements they contain,
and can optionally be packed. The syntax for such structs is as follows.

```
type ::= `!llvm<"` llvm-canonical-type `">
llvm-canonical-type ::= <canonical textual representation defined by LLVM>
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` llvm-type-list `)`
`>`
llvm-type-list ::= <maybe empty comma-separated list of llvm types w/o `!llvm`>
```

For example, one can use primitive types `!llvm.i32`, pointer types
`!llvm<"i8*">`, vector types `!llvm<"<4 x float>">` or structure types
`!llvm<"{i32, float}">`. The parsing and printing of the canonical form are
delegated to the LLVM assembly parser and printer.
Literal structs cannot be recursive, but can contain other structs. Therefore,
they must be constructed in a single step with the entire list of contained
elements provided.

#### Examples of Structure Types

```mlir
!llvm.struct<> // NOT allowed
!llvm.struct<()> // empty, literal
!llvm.struct<(i32)> // literal
!llvm.struct<(struct<(i32)>)> // struct containing a struct
!llvm.struct<packed (i8, i32)> // packed struct
!llvm.struct<"a"> // recursive reference, only allowed within
// another struct, NOT allowed at top level
!llvm.struct<"a", ptr<struct<"a">>> // supported example of recursive reference
!llvm.struct<"a", ()> // empty, named (necessary to differentiate from
// recursive reference)
!llvm.struct<"a", opaque> // opaque, named
!llvm.struct<"a", (i32)> // named
!llvm.struct<"a", packed (i8, i32)> // named, packed
```

LLVM IR dialect types contain an `llvm::Type*` object that can be obtained by
calling `.getUnderlyingType()` and used in LLVM API calls directly. These
objects are allocated within the LLVM context associated with the LLVM IR
dialect and may be linked to the properties of the associated LLVM module.
### Unsupported Types

LLVM IR dialect type can be constructed from any `llvm::Type*` that is
associated with the LLVM context of the dialect. In this document, we use the
term "wrapped LLVM IR type" to refer to the LLVM IR dialect type containing a
specific LLVM IR type.
LLVM IR `label` type does not have a counterpart in the LLVM dialect since, in
MLIR, blocks are not values and don't need a type.

## Operations

Expand Down

0 comments on commit c275125

Please sign in to comment.