Sketching out a "types and shapes" developer document. #7108

Closed · wants to merge 3 commits
136 changes: 136 additions & 0 deletions docs/developers/design_docs/types_and_shapes.md
@@ -0,0 +1,136 @@
# Types and Shapes

IREE supports compiling programs from a variety of frontend frameworks to a
number of backends and uses a collection of MLIR dialects and passes to connect
between each slice through the system. Each layer of the stack has its own
views on data types and shapes.

* Data _type_ here refers to an attribute of data which describes its meaning,
defines operations that can be performed on it, and gives information about
how it can be stored. Examples of data types are `integer`, `float`, and
`string`. See [the Wikipedia page on data types](https://en.wikipedia.org/wiki/Data_type)
for more background.
* Data _shape_ here refers to an attribute of multidimensional data (scalars,
matrices, tensors) which describes the number of elements in each axis of the
data. A shape is composed of a rank (the number of axes, if defined) and a
list of dimensions, one element per axis. Some example shapes are `[3, 4]`,
`[*]` (unranked), and `[?, 2]` (ranked with one unknown dimension), as
illustrated below. See the
[MLIR 'shape' Dialect documentation](https://mlir.llvm.org/docs/Dialects/ShapeDialect/)
for more background.
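
These shapes show up directly in the syntax of MLIR's builtin tensor types,
for example (standard MLIR syntax, not specific to any one IREE dialect):

```mlir
tensor<3x4xf32>   // fully static: rank 2, dimensions [3, 4]
tensor<?x2xf32>   // ranked, with one unknown dimension: [?, 2]
tensor<*xf32>     // unranked: [*]; neither rank nor dimensions known statically
```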

Frontend references:

* TensorFlow: [Introduction to Tensors](https://www.tensorflow.org/guide/tensor)
* PyTorch: [`torch.Tensor` documentation](https://pytorch.org/docs/stable/tensors.html)
* NumPy: [Data types documentation](https://numpy.org/doc/stable/user/basics.types.html)

Backend references:

* Vulkan: [buffer and image formats](https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#formats)
* SPIR-V: [types](https://www.khronos.org/registry/SPIR-V/specs/1.0/SPIRV.html#_types) and [capabilities](https://www.khronos.org/registry/SPIR-V/specs/1.0/SPIRV.html#_a_id_capability_a_capability)
Comment on lines +31 to +34
Contributor

Do we want to split this into codegen backends and drivers?

| Backend  | Driver     |
| -------- | ---------- |
| llvm-aot | dylib      |
| llvm-aot | dylib-sync |
| spir-v   | vulkan     |
| cuda     | cuda       |

Collaborator Author

I'd prefer for the start of the document to give enough general context: we target different hardware devices and APIs, and some of those have very strict requirements, so the layers can be pretty opinionated.

The HAL section down below can go into specifics for each compiler target / device configuration.


## Types

Types can roughly be grouped in a few different ways:

* Primitive (`char`, `int`) vs composite (`string`, `array<int>`)
* Signed (`int`, `int32_t`) vs unsigned (`unsigned`, `uint32_t`) vs signless
* Fixed width (`int32_t`) vs variable width (`int`, `index`, `uintptr_t`)
* Real (`float32`) vs complex (`tf.complex64`)
* Concrete vs opaque (`void*`, API internal structs, hardware image formats)
* Quantized data types (`bfloat16`)
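
In MLIR these groupings map onto concrete types. A few builtin examples for
orientation (not an exhaustive list, and not specific to any one IREE dialect):

```mlir
i32            // fixed width, signless integer (the default convention in MLIR)
si32, ui32     // explicitly signed / unsigned integers
index          // variable width integer sized for the target, like uintptr_t
f32, f64       // real floating point
f16, bf16      // reduced precision floating point
complex<f32>   // complex value built from two f32 components
```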

Types are least constrained in user code within high level frameworks, where
composite types such as Python classes, media files, Protocol Buffers, JSON
objects, and other data structures can be freely created and transformed.
Meanwhile, types are most constrained by hardware and device APIs, where only
specific low level primitives are defined or where certain operations are
supported by efficient hardware implementations.

### Conversion process

IREE lowers programs from representations produced by high level frontends down
Contributor

It's probably worth mentioning that we don't directly ingest TensorFlow; we ingest TOSA instead. Our TOSA path does not require fully defined shapes, and we run shape inference after import. It will infer shapes as far as it can, however not all cases on all ops are possible. E.g. transpose conv needs to know its shape at construction.

Collaborator Author

Mentioned that a bit in the shapes section, thanks. Haven't yet gone into detail on each item down in this section of the doc.

Contributor

Note: +1 on the TF side. We will infer shapes as far as we can after importing the model into MLIR.

This is the pass doing shape inference: https://github.com/google/iree/blob/94a2168c63ac5e075be132fe1b97dd8427c03733/integrations/tensorflow/iree_tf_compiler/TF/Passes.cpp#L46

to low level host code with scheduling logic and device code containing fused
kernels of dense computation. The phases of compilation can be segmented by
which MLIR dialects are primarily being transformed:

```
Frontends (PyTorch, JAX, TensorFlow, TOSA, etc.)
  * Includes user code, serialized ML models / programs, and other libraries
        ↓
Import dialects (`iree`, `tensor`, `linalg`, etc.)
        ↓
`flow` dialect (tensor program modeling and compute workload partitioning)
        ↓
`stream` dialect                                  |   code generation
  (device placement and asynchronous scheduling)  |   (SPIR-V, LLVM, etc.)
        ↓
`hal` dialect (Hardware Abstraction Layer for buffer and execution management)
        ↓
`vm` dialect (Virtual Machine for setting up and dispatching workloads)
```

See also https://google.github.io/iree/#project-architecture.
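
For orientation, here is a rough sketch of what a small program can look like
near the top of this flow, using the upstream `linalg` and `tensor` dialects
(the function and value names are made up for illustration; the exact IR IREE
ingests depends on the frontend and import path):

```mlir
// Matrix multiplication on tensors; the leading dimension is dynamic (`?`).
func @matmul_dynamic(%lhs: tensor<?x8xf32>, %rhs: tensor<8x16xf32>,
                     %init: tensor<?x16xf32>) -> tensor<?x16xf32> {
  %result = linalg.matmul
      ins(%lhs, %rhs : tensor<?x8xf32>, tensor<8x16xf32>)
      outs(%init : tensor<?x16xf32>) -> tensor<?x16xf32>
  return %result : tensor<?x16xf32>
}
```

Each subsequent layer in the diagram above strips away some of this
tensor-level information, until the `hal` and `vm` layers only see buffers and
primitive values.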

#### Requirements for import dialects

#### Requirements for `flow` dialect

#### Requirements for `stream` dialect

#### Requirements for code generation

TODO: LLVM / SPIR-V emulation of types?

#### Requirements for `hal` dialect

The Hardware Abstraction Layer maps nearly directly to underlying hardware APIs
such as Vulkan, Metal, and CUDA.

* No tensor types. Buffers contain primitives or explicitly supported opaque
  data types.
* Supported primitives vary per target backend and may be optionally available.
  Generally expect int32 and float32 to be well supported on mobile- to
  desktop-scale devices and lower or higher bit depth types (e.g. float16,
  int64) to be optionally available. On embedded systems or certain
  accelerators there may be no floating point support at all.
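
As a rough sketch of what that means in the IR (signatures are illustrative,
not verbatim compiler output), a function that operates on tensors above the
HAL ends up taking and returning opaque buffer views at the HAL level:

```mlir
// Above the HAL: element types and shapes are part of the IR type.
func @predict(%image: tensor<1x224x224x3xf32>) -> tensor<1x1000xf32>

// At the HAL: opaque views over device-visible buffers; the element type and
// shape travel alongside the buffer as runtime metadata.
func @predict(%image: !hal.buffer_view) -> !hal.buffer_view
```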

#### Requirements for `vm` dialect

IREE's Virtual Machine aims to be maximally portable, so it implements support
for i64, f32, and f64 behind extensions. See
[iree/base/config.h](https://github.com/google/iree/blob/main/iree/base/config.h)
for the specifics of each extension.
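
For example (a sketch; the exact op spellings live in the `vm` dialect
definition), 32-bit integer arithmetic is always available, while i64 and
floating point arithmetic only become legal when the matching extension is
compiled in:

```mlir
// Core: always available.
%sum  = vm.add.i32 %a, %b : i32

// Only legal when the corresponding extension is enabled in the build
// (see the extension flags in iree/base/config.h).
%wide = vm.add.i64 %c, %d : i64
%real = vm.add.f32 %e, %f : f32
```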

### Strategies for converting between types

#### Emulating

#### Truncating / Demotion

#### Extending / Promotion

#### Packing

TODO: pack i1 into i8/i32 (vectorization)

## Shapes

TODO: static vs dynamic
TODO: ranked vs unranked
TODO: shape inference, https://mlir.llvm.org/docs/ShapeInference/

## Layouts and tiling

TODO: dense vs sparse
TODO: dispatch grids