Skip to content

Commit

Permalink
explain the MIR const vs TY const situation
Browse files Browse the repository at this point in the history
  • Loading branch information
RalfJung committed Sep 16, 2023
1 parent 6b347d2 commit 0ddb68d
Show file tree
Hide file tree
Showing 2 changed files with 64 additions and 45 deletions.
46 changes: 2 additions & 44 deletions src/const-eval.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,51 +41,9 @@ and a [`GlobalId`]. The `GlobalId` is made up of an `Instance` referring to a co
or static or of an `Instance` of a function and an index into the function's `Promoted` table.

Constant evaluation returns an [`EvalToValTreeResult`] for type system constants or
[`EvalToConstValueResult`] with either the error, or a representation of the constant.

Constants for the type system are encoded in "valtree representation". The `ValTree` datastructure
allows us to represent

* arrays,
* many structs,
* tuples,
* enums and,
* most primitives.

The basic rule for
being permitted in the type system is that every value must be uniquely represented. In other
words: a specific value must only be representable in one specific way. For example: there is only
one way to represent an array of two integers as a `ValTree`:
`ValTree::Branch(&[ValTree::Leaf(first_int), ValTree::Leaf(second_int)])`.
Even though theoretically a `[u32; 2]` could be encoded in a `u64` and thus just be a
`ValTree::Leaf(bits_of_two_u32)`, that is not a legal construction of `ValTree`
(and is very complex to do, so it is unlikely anyone is tempted to do so).

These rules also mean that some values are not representable. There can be no `union`s in type
level constants, as it is not clear how they should be represented, because their active variant
is unknown. Similarly there is no way to represent raw pointers, as addresses are unknown at
compile-time and thus we cannot make any assumptions about them. References on the other hand
*can* be represented, as equality for references is defined as equality on their value, so we
ignore their address and just look at the backing value. We must make sure that the pointer values
of the references are not observable at compile time. We thus encode `&42` exactly like `42`.
Any conversion from
valtree back to codegen constants must reintroduce an actual indirection. At codegen time the
addresses may be deduplicated between multiple uses or not, entirely depending on arbitrary
optimization choices.

As a consequence, all decoding of `ValTree` must happen by matching on the type first and making
decisions depending on that. The value itself gives no useful information without the type that
belongs to it.

Other constants get represented as [`ConstValue::Scalar`] or
[`ConstValue::Slice`] if possible. These values are only useful outside the
compile-time interpreter. If you need the value of a constant during
interpretation, you need to directly work with [`const_to_op`].
[`EvalToConstValueResult`] with either the error, or a representation of the evaluated constant:
a [valtree](mir/index.md#valtrees) or a [MIR constant value](mir/index.md#mir-constant-values), respectively.

[`GlobalId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.GlobalId.html
[`ConstValue::Scalar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/value/enum.ConstValue.html#variant.Scalar
[`ConstValue::Slice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/value/enum.ConstValue.html#variant.Slice
[`ConstValue::ByRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/value/enum.ConstValue.html#variant.ByRef
[`EvalToConstValueResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/error/type.EvalToConstValueResult.html
[`EvalToValTreeResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/error/type.EvalToValTreeResult.html
[`const_to_op`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/interpret/struct.InterpCx.html#method.const_to_op
63 changes: 62 additions & 1 deletion src/mir/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,66 @@ but [you can read about those below](#promoted)).

## Representing constants

*to be written*
When code has reached the MIR stage, constants can generally come in two forms: *MIR constants* ([`mir::Constant`]) and *type system constants* ([`ty::Const`]).
MIR constants are used as operands: in `x + CONST`, `CONST` is a MIR constant; similarly, in `x + 2`, `2` is a MIR constant.
Type system constants are used in the type system, in particular for array lengths but also for const generics.

Generally, both kinds of constants can be "unevaluated" or "already evaluated".
And unevaluated constant simply stores the `DefId` of what needs to be evaluated to compute this result.
An evaluated constant (a "value") has already been computed; their representation differs between type system constants and MIR constants:
MIR constants evaluate to a `mir::ConstValue`; type system constants evaluate to a `ty::ValTree`.

Type system constants have some more variants to support const generics: they can refer to local const generic parameters, and they are subject to inference.
Furthermore, the `mir::Constant::Ty` variant lets us use an arbitrary type system constant as a MIR constant; this happens whenever a const generic parameter is used as an operand.

### MIR constant values

In general, a MIR constant value (`mir::ConstValue`) was computed by evaluating some constant the user wrote.
This [const evaluation](../const-eval.md) produces a very low-level representation of the result in terms of individual bytes.
We call this an "indirect" constant (`mir::ConstValue::Indirect`) since the value is stored in-memory.

However, storing everything in-memory would be awfully inefficient. Hence there
are some other variants in `mir::ConstValue` that can represent certain simple
and common values more efficiently. In particular, everything that can be
directly written as a literal in Rust (integers, floats, chars, bools, but also
`"string literals"` and `b"byte string literals"`) has an optimized variant that
avoids the full overhead of the in-memory representation.

### ValTrees

An evaluated type system constant is a "valtree". The `ty::ValTree` datastructure
allows us to represent

* arrays,
* many structs,
* tuples,
* enums and,
* most primitives.

The most important rule for
this representation is that every value must be uniquely represented. In other
words: a specific value must only be representable in one specific way. For example: there is only
one way to represent an array of two integers as a `ValTree`:
`ValTree::Branch(&[ValTree::Leaf(first_int), ValTree::Leaf(second_int)])`.
Even though theoretically a `[u32; 2]` could be encoded in a `u64` and thus just be a
`ValTree::Leaf(bits_of_two_u32)`, that is not a legal construction of `ValTree`
(and is very complex to do, so it is unlikely anyone is tempted to do so).

These rules also mean that some values are not representable. There can be no `union`s in type
level constants, as it is not clear how they should be represented, because their active variant
is unknown. Similarly there is no way to represent raw pointers, as addresses are unknown at
compile-time and thus we cannot make any assumptions about them. References on the other hand
*can* be represented, as equality for references is defined as equality on their value, so we
ignore their address and just look at the backing value. We must make sure that the pointer values
of the references are not observable at compile time. We thus encode `&42` exactly like `42`.
Any conversion from
valtree back a to MIR constant value must reintroduce an actual indirection. At codegen time the
addresses may be deduplicated between multiple uses or not, entirely depending on arbitrary
optimization choices.

As a consequence, all decoding of `ValTree` must happen by matching on the type first and making
decisions depending on that. The value itself gives no useful information without the type that
belongs to it.

<a name="promoted"></a>

Expand Down Expand Up @@ -283,3 +342,5 @@ See the const-eval WG's [docs on promotion](https://github.com/rust-lang/const-e
[`ProjectionElem::Deref`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.ProjectionElem.html#variant.Deref
[`Rvalue`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Rvalue.html
[`Operand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Operand.html
[`mir::Constant`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Constant.html
[`ty::Const`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Const.html

0 comments on commit 0ddb68d

Please sign in to comment.