split off type folder and generics subchapters

mark-i-m · Feb 18, 2020 · 6322f58
1 parent d2e17eb

Showing 4 changed files with 251 additions and 248 deletions.
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -53,6 +53,8 @@
         - [Debugging](./hir-debugging.md)
     - [Closure expansion](./closure.md)
     - [The `ty` module: representing types](./ty.md)
+        - [Generics and substitutions](./generics.md)
+        - [`TypeFolder` and `TypeFoldable`](./ty-fold.md)
     - [Generic arguments](./generic_arguments.md)
     - [Type inference](./type-inference.md)
     - [Trait solving (old-style)](./traits/resolution.md)

diff --git a/src/generics.md b/src/generics.md
@@ -0,0 +1,144 @@
+# Generics and substitutions
+
+Given a generic type `MyType<A, B, …>`, we may want to swap out the generics `A, B, …` for some
+other types (possibly other generics or concrete types). We do this a lot while doing type
+inference, type checking, and trait solving. Conceptually, during these routines, we may find out
+that one type is equal to another type and want to swap one out for the other and then swap that out
+for another type and so on until we eventually get some concrete types (or an error).
+
+In rustc this is done using `SubstsRef` (“substs” = “substitutions”), which was introduced in the
+chapter on the `ty` module. Conceptually, you can think of a `SubstsRef` as a list of types that are
+to be substituted for the generic type parameters of the ADT.
+
+`SubstsRef` is a type alias for `&'tcx List<GenericArg<'tcx>>` (see [`List` rustdocs][list]).
+[`GenericArg`] is essentially a space-efficient wrapper around [`GenericArgKind`], which is an enum
+indicating what kind of generic the type parameter is (type, lifetime, or const). Thus, `SubstsRef`
+is conceptually like a `&'tcx [GenericArgKind<'tcx>]` slice (but it is actually a `List`).
+
+[list]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.List.html
+[`GenericArg`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/subst/struct.GenericArg.html
+[`GenericArgKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/subst/enum.GenericArgKind.html
+
+So why do we use this `List` type instead of making it really a slice? It stores its length
+"inline", in the same allocation as the elements, so `&List` is a thin pointer (a single machine
+word) rather than a fat pointer that carries the length alongside the address. As a consequence, it
+cannot be "subsliced" (that only works if the length is out of line).
+
+This also means that two `List`s can be checked for equality with `==` just by comparing their
+addresses, which would not be correct for ordinary slices. This works precisely because a `List`
+never represents a "sub-list", only a complete list that has been hashed and interned, so equal
+contents always share the same allocation.
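To make the interning argument concrete, here is a minimal sketch in plain Rust. It is not rustc's `List`: it assumes a hypothetical `Interner` that leaks one `Box<Vec<u32>>` per distinct list instead of using an arena, and the `Vec` behind the reference stands in for the inline length. It shows the two properties described above: a reference to an interned list is one machine word, and equal contents always come back as the same allocation, so comparing addresses is enough.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Stand-in for rustc's `List<u32>`: the length lives behind the pointer
/// (inside the `Vec`), so `&List` is a single machine word.
type List = Vec<u32>;

/// Hypothetical interner: each distinct list is allocated once and leaked so
/// that `&'static List` references can be handed out (rustc uses an arena).
struct Interner {
    map: HashMap<Vec<u32>, &'static List>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new() }
    }

    /// Return the unique interned copy of `items`, allocating it on first use.
    fn intern(&mut self, items: &[u32]) -> &'static List {
        if let Some(&existing) = self.map.get(items) {
            return existing;
        }
        let leaked: &'static List = Box::leak(Box::new(items.to_vec()));
        self.map.insert(items.to_vec(), leaked);
        leaked
    }
}

/// Equality by address: correct only because every list came from the same
/// interner, so equal contents imply the same allocation.
fn same_list(a: &List, b: &List) -> bool {
    std::ptr::eq(a, b)
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern(&[1, 2, 3]);
    let b = interner.intern(&[1, 2, 3]);
    let c = interner.intern(&[1, 2]);
    assert!(same_list(a, b)); // equal contents, one allocation
    assert!(!same_list(a, c)); // different contents, different allocations
    // A reference to the list is one word; a slice reference is two.
    assert_eq!(size_of::<&List>(), size_of::<usize>());
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());
    println!("interning checks passed");
}
```

Note that `same_list` is only sound because all lists go through one interner; comparing lists from two different interners by address would give wrong answers.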
+
+Pulling it all together, let’s look at an example:
+
+```rust,ignore
+struct MyStruct<T> { x: u32, y: T }
+```
+
+- There would be an `AdtDef` (and corresponding `DefId`) for `MyStruct`.
+- There would be a `TyKind::Param` (and corresponding `DefId`) for `T` (more later).
+- There would be a `SubstsRef` containing the list `[GenericArgKind::Type(Ty(T))]`.
+    - The `Ty(T)` here is my shorthand for an entire other `ty::Ty` that has `TyKind::Param`, which
+      we mentioned in the previous point.
+- There would be one `TyKind::Adt` containing the `AdtDef` of `MyStruct` with the `SubstsRef` above.
+
+Finally, we will quickly mention the
+[`Generics`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.Generics.html) type. It
+is used to give information about the type parameters of a type.
+
+## Unsubstituted Generics
+
+Recall that in our example above, `MyStruct` has a generic type parameter `T`. When we are (for
+example) type checking functions that use `MyStruct`, we will need to be able to refer to this type
+`T` without actually knowing what it is. In general, this is true inside all generic definitions: we
+need to be able to work with unknown types. This is done via `TyKind::Param` (which we mentioned in
+the example above).
+
+Each `TyKind::Param` contains two things: the name and the index. In general, the index fully
+defines the parameter and is used by most of the code; the name is included only for debug
+print-outs. There are two reasons for relying on the index. First, it is convenient: it lets you
+index directly into the list of generic arguments when substituting. Second, it is more robust. For
+example, you could in principle have two distinct type parameters that use the same name, e.g.
+`impl<A> Foo<A> { fn bar<A>() { .. } }`, although the language's rules against shadowing currently
+reject this (but those rules could change in the future).
+
+The index of the type parameter is an integer indicating its order in the list of the type
+parameters. Moreover, we consider the list to include all of the type parameters from outer scopes.
+Consider the following example:
+
+```rust,ignore
+struct Foo<A, B> {
+  // A would have index 0
+  // B would have index 1
+
+  // .. some fields
+}
+impl<X, Y> Foo<X, Y> {
+  fn method<Z>() {
+    // inside here, X, Y and Z are all in scope
+    // X has index 0
+    // Y has index 1
+    // Z has index 2
+  }
+}
+```
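The numbering rule can be written down as a small recursive computation. The sketch below is a simplified, hypothetical model: the `Generics` struct here and its `parent_count`, `count`, and `index_of` methods are illustrations only, not the fields and methods of rustc's `Generics` type. Parameters from outer scopes come first, and an item's own parameters are numbered after them.

```rust
/// Hypothetical, simplified model of an item's generics: an optional parent
/// scope (e.g. the enclosing impl) plus the item's own parameter names.
struct Generics<'a> {
    parent: Option<&'a Generics<'a>>,
    own_params: Vec<&'static str>,
}

impl<'a> Generics<'a> {
    /// Number of parameters contributed by all enclosing scopes.
    fn parent_count(&self) -> usize {
        self.parent.map_or(0, |p| p.count())
    }

    /// Total number of parameters in scope: parents first, then our own.
    fn count(&self) -> usize {
        self.parent_count() + self.own_params.len()
    }

    /// Index of the parameter named `name` as seen from this scope. Our own
    /// parameters are searched first, so an inner name shadows an outer one.
    fn index_of(&self, name: &str) -> Option<usize> {
        if let Some(pos) = self.own_params.iter().position(|p| *p == name) {
            return Some(self.parent_count() + pos);
        }
        self.parent.and_then(|p| p.index_of(name))
    }
}

fn main() {
    // impl<X, Y> Foo<X, Y> { fn method<Z>() { .. } }
    let impl_generics = Generics { parent: None, own_params: vec!["X", "Y"] };
    let method_generics = Generics { parent: Some(&impl_generics), own_params: vec!["Z"] };
    assert_eq!(method_generics.index_of("X"), Some(0));
    assert_eq!(method_generics.index_of("Y"), Some(1));
    assert_eq!(method_generics.index_of("Z"), Some(2));
    assert_eq!(method_generics.count(), 3);

    // impl<A> Foo<A> { fn bar<A>() { .. } }: same name, different indices.
    let outer = Generics { parent: None, own_params: vec!["A"] };
    let inner = Generics { parent: Some(&outer), own_params: vec!["A"] };
    assert_eq!(inner.index_of("A"), Some(1));
}
```

The last case shows why the index is the more robust identifier: the two `A`s share a name but get distinct indices.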
+
+When we are working inside the generic definition, we will use `TyKind::Param` just like any other
+`TyKind`; it is just a type after all. However, if we want to use the generic type somewhere, then
+we will need to do substitutions.
+
+For example, suppose that the `Foo<A, B>` type from the previous example has a field that is a
+`Vec<A>`. Observe that `Vec` is also a generic type. We want to tell the compiler that the type
+parameter of `Vec` should be replaced with the `A` type parameter of `Foo<A, B>`. We do that with
+substitutions:
+
+```rust,ignore
+struct Foo<A, B> { // Adt(Foo, &[Param(0), Param(1)])
+  x: Vec<A>, // Adt(Vec, &[Param(0)])
+  // .. other fields
+}
+
+fn bar(foo: Foo<u32, f32>) { // Adt(Foo, &[u32, f32])
+  let y = foo.x; // Vec<Param(0)> => Vec<u32>
+}
+```
+
+This example has a few different substitutions:
+
+- In the definition of `Foo`, in the type of the field `x`, we replace `Vec`'s type parameter with
+  `Param(0)`, the first parameter of `Foo<A, B>`, so that the type of `x` is `Vec<A>`.
+- In the function `bar`, we specify that we want a `Foo<u32, f32>`. This means that we will
+  substitute `Param(0)` and `Param(1)` with `u32` and `f32`.
+- In the body of `bar`, we access `foo.x`, which has type `Vec<Param(0)>`, but `Param(0)` has been
+  replaced by `u32`, so `foo.x` has type `Vec<u32>`.
+
+Let’s look a bit more closely at that last substitution to see why we use indices. If we want to
+find the type of `foo.x`, we can get the generic type of `x`, which is `Vec<Param(0)>`. Now we can
+take the index `0` and use it to find the right type substitution: looking at `Foo`'s `SubstsRef`,
+we have the list `[u32, f32]`; since we want to replace index `0`, we take the 0-th element of this
+list, which is `u32`. Voilà!
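The lookup just described fits in a few lines. This sketch uses a hypothetical toy `Ty` enum in place of `ty::Ty` and a plain slice in place of `SubstsRef`; the `subst` function here is illustrative only and is not rustc's `Subst` trait. It replaces `Param(i)` with the `i`-th entry of the list and recurses into ADT arguments, reproducing the `Vec<Param(0)>` to `Vec<u32>` step for `Foo<u32, f32>`.

```rust
/// Hypothetical, heavily simplified stand-in for `ty::Ty`.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    U32,
    F32,
    /// A generic parameter, identified by its index.
    Param(usize),
    /// An ADT applied to its generic arguments, e.g. `Vec<u32>`.
    Adt(&'static str, Vec<Ty>),
}

/// Replace every `Param(i)` in `ty` with `substs[i]`, recursing into ADT arguments.
fn subst(ty: &Ty, substs: &[Ty]) -> Ty {
    match ty {
        Ty::Param(i) => substs[*i].clone(),
        Ty::Adt(name, args) => Ty::Adt(*name, args.iter().map(|a| subst(a, substs)).collect()),
        other => other.clone(),
    }
}

fn main() {
    // Generic type of the field `x` in `struct Foo<A, B> { x: Vec<A>, .. }`.
    let field_x = Ty::Adt("Vec", vec![Ty::Param(0)]);
    // Substitutions for `Foo<u32, f32>`, as used in `bar`.
    let substs = [Ty::U32, Ty::F32];
    let ty = subst(&field_x, &substs);
    assert_eq!(ty, Ty::Adt("Vec", vec![Ty::U32]));
    println!("{:?}", ty);
}
```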
+
+You may have a couple of follow-up questions…
+
+**`type_of`** How do we get the “generic type of `x`”? You can get the type of pretty much anything
+with the `tcx.type_of(def_id)` query. In this case, we would pass the `DefId` of the field `x`.
+The `type_of` query always returns the definition with the generics that are in scope of the
+definition. For example, `tcx.type_of(def_id_of_foo)` would return the “self-view” of
+`Foo`: `Adt(Foo, &[Param(0), Param(1)])`.
+
+**`subst`** How do we actually do the substitutions? There is a function for that too! You use
+[`subst`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/subst/trait.Subst.html) to
+replace the generic parameters in a type (or other value) with the corresponding entries of a
+`SubstsRef`.
+
+[Here is an example of actually using `subst` in the compiler][substex]. The exact details are not
+too important, but in this piece of code, we happen to be converting from the `rustc_hir::Ty` to
+a real `ty::Ty`. You can see that we first get some substitutions (`substs`). Then we call
+`type_of` to get a type and call `ty.subst(tcx, substs)` to get a new version of `ty` with
+the substitutions made.
+
+[substex]: https://github.com/rust-lang/rust/blob/597f432489f12a3f33419daa039ccef11a12c4fd/src/librustc_typeck/astconv.rs#L942-L953
+
+**Note on indices:** It is possible for the indices in `Param` to not match with what we expect. For
+example, the index could be out of bounds or it could be the index of a lifetime when we were
+expecting a type. These sorts of errors would be caught earlier in the compiler when translating
+from a `rustc_hir::Ty` to a `ty::Ty`. If they occur later, that is a compiler bug.
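One way to picture this is a lookup that treats both failure modes as internal errors. The sketch below is hypothetical: `GenericArgKind` is a toy enum mirroring the three kinds listed earlier, and `lookup_type_arg` is an illustrative helper, not a rustc function. Because ill-formed indices should already have been rejected during lowering, it panics instead of returning a recoverable error.

```rust
/// Hypothetical toy version of `GenericArgKind`: a substitution entry is a
/// type, a lifetime, or a const.
#[derive(Clone, Debug)]
enum GenericArgKind {
    Type(&'static str),
    Lifetime(&'static str),
    Const(u64),
}

/// Look up the type substituted for `Param(index)`. Ill-formed indices should
/// have been rejected while lowering `rustc_hir::Ty` to `ty::Ty`, so either
/// failure here indicates a compiler bug and we panic instead of recovering.
fn lookup_type_arg(substs: &[GenericArgKind], index: usize) -> &'static str {
    match substs.get(index) {
        Some(GenericArgKind::Type(ty)) => *ty,
        Some(other) => panic!("compiler bug: param {} is not a type: {:?}", index, other),
        None => panic!("compiler bug: param {} out of bounds ({} substs)", index, substs.len()),
    }
}

fn main() {
    // Substitutions for something like `Foo<'a, u32, 3>`.
    let substs = [
        GenericArgKind::Lifetime("'a"),
        GenericArgKind::Type("u32"),
        GenericArgKind::Const(3),
    ];
    assert_eq!(lookup_type_arg(&substs, 1), "u32");
    // Index 0 holds a lifetime, and index 5 does not exist: both are bugs.
    assert!(std::panic::catch_unwind(|| lookup_type_arg(&substs, 0)).is_err());
    assert!(std::panic::catch_unwind(|| lookup_type_arg(&substs, 5)).is_err());
}
```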
+
+
diff --git a/src/ty-fold.md b/src/ty-fold.md
@@ -0,0 +1,105 @@
+# `TypeFoldable` and `TypeFolder`
+
+How is the `subst` method from the previous chapter actually implemented? As you can imagine, we
+might want to do substitutions on a lot of different things. For example, we might want to do a
+substitution directly on a type like we did with `Vec` in the previous chapter. But we might also
+have a more complex type with other types nested inside that also need substitutions.
+
+The answer is a couple of traits:
+[`TypeFoldable`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/fold/trait.TypeFoldable.html)
+and
+[`TypeFolder`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/fold/trait.TypeFolder.html).
+
+- `TypeFoldable` is implemented by types that embed type information. It allows you to recursively
+  process the contents of the `TypeFoldable` and do stuff to them.
+- `TypeFolder` defines what you want to do with the types you encounter while processing the
+  `TypeFoldable`.
+
+For example, the `TypeFolder` trait has a method
+[`fold_ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/fold/trait.TypeFolder.html#method.fold_ty)
+that takes a type as input and returns a new type as a result. `TypeFoldable` invokes the
+`TypeFolder` `fold_foo` methods on itself, giving the `TypeFolder` access to its contents (the
+types, regions, etc. that are contained within).
+
+You can think of it with this analogy to the iterator combinators we have come to love in Rust:
+
+```rust,ignore
+vec.iter().map(|e| foo(e)).collect()
+//             ^^^^^^^^^^ analogous to `TypeFolder`
+//         ^^^ analogous to `TypeFoldable`
+```
+
+So to reiterate:
+
+- `TypeFolder` is a trait that defines a “map” operation.
+- `TypeFoldable` is a trait that is implemented by things that embed types.
+
+In the case of `subst`, we can see that it is implemented as a `TypeFolder`:
+[`SubstFolder`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/subst/struct.SubstFolder.html).
+Looking at its implementation, we see where the actual substitutions are happening.
+
+However, you might also notice that the implementation calls this `super_fold_with` method. What is
+that? It is a method of `TypeFoldable`. Consider the following `TypeFoldable` type `MyFoldable`:
+
+```rust,ignore
+struct MyFoldable<'tcx> {
+  def_id: DefId,
+  ty: Ty<'tcx>,
+}
+```
+
+The `TypeFolder` can call `super_fold_with` on `MyFoldable` if it just wants to replace some of the
+fields of `MyFoldable` with new values. If it instead wants to replace the whole `MyFoldable` with a
+different one, it would call `fold_with` instead (a different method on `TypeFoldable`).
+
+In almost all cases, we don’t want to replace the whole struct; we only want to replace `ty::Ty`s in
+the struct, so usually we call `super_fold_with`. A typical implementation that `MyFoldable` could
+have might do something like this:
+
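+```rust,ignore
+// Given some `my_foldable: MyFoldable<'tcx>`, a call such as
+// `my_foldable.subst(tcx, substs)` ends up folding it with the impl below.
+
+impl<'tcx> TypeFoldable<'tcx> for MyFoldable<'tcx> {
+  fn super_fold_with<F: TypeFolder<'tcx>>(&self, folder: &mut F) -> Self {
+    // Fold each field and rebuild the struct: the folder can change the
+    // fields, but it never replaces `MyFoldable` as a whole here.
+    MyFoldable {
+      def_id: self.def_id.fold_with(folder),
+      ty: self.ty.fold_with(folder),
+    }
+  }
+
+  fn super_visit_with<V: TypeVisitor<'tcx>>(&self, visitor: &mut V) -> bool {
+    // Visit each field; stop early once the visitor returns `true`.
+    self.def_id.visit_with(visitor) || self.ty.visit_with(visitor)
+  }
+}
+```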
+
+Notice that here, we implement `super_fold_with` to go over the fields of `MyFoldable` and call
+`fold_with` on *them*. That is, a folder may replace `def_id` and `ty`, but not the whole
+`MyFoldable` struct.
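Here is a runnable toy version of that pattern. Everything in it is a hypothetical simplification: `Ty`, `DefId`, `TypeFolder`, and `TypeFoldable` are stand-ins without the `'tcx` lifetime or the visitor methods of rustc's versions, and `BoolToInt` is an illustrative folder. As in the text, `super_fold_with` on `MyFoldable` folds each field and rebuilds the struct, so the folder changes `ty` while `def_id` passes through untouched.

```rust
/// Hypothetical toy stand-ins; rustc's real types carry a `'tcx` lifetime.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId(u32);

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Bool,
    Int,
}

trait TypeFolder: Sized {
    /// Called on every type encountered. By default, keep the type and fold
    /// whatever is nested inside it.
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        ty.super_fold_with(self)
    }
}

trait TypeFoldable: Sized {
    /// Fold the contents of `self`, keeping its outer shape.
    fn super_fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self;

    /// Entry point: by default, just fold the contents.
    fn fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
        self.super_fold_with(folder)
    }
}

// A `DefId` contains no types, so folding returns it unchanged.
impl TypeFoldable for DefId {
    fn super_fold_with<F: TypeFolder>(&self, _folder: &mut F) -> Self {
        *self
    }
}

impl TypeFoldable for Ty {
    // This toy `Ty` has no nested types, so there is nothing inside to fold.
    fn super_fold_with<F: TypeFolder>(&self, _folder: &mut F) -> Self {
        self.clone()
    }

    // For a type, `fold_with` hands the whole type to the folder.
    fn fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
        folder.fold_ty(self)
    }
}

struct MyFoldable {
    def_id: DefId,
    ty: Ty,
}

impl TypeFoldable for MyFoldable {
    // Rebuild the struct from its folded fields.
    fn super_fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
        MyFoldable {
            def_id: self.def_id.fold_with(folder),
            ty: self.ty.fold_with(folder),
        }
    }
}

/// Illustrative folder that rewrites `Bool` into `Int`.
struct BoolToInt;

impl TypeFolder for BoolToInt {
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        match ty {
            Ty::Bool => Ty::Int,
            other => other.super_fold_with(self),
        }
    }
}

fn main() {
    let before = MyFoldable { def_id: DefId(7), ty: Ty::Bool };
    let after = before.fold_with(&mut BoolToInt);
    assert_eq!(after.def_id, DefId(7)); // passed through unchanged
    assert_eq!(after.ty, Ty::Int); // replaced by the folder
}
```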
+
+Here is another example to put things together: suppose we have a type like `Vec<Vec<X>>`. The
+`ty::Ty` would look like: `Adt(Vec, &[Adt(Vec, &[Param(X)])])`. If we want to do `subst(X => u32)`,
+then we would first look at the overall type. We would see that there are no substitutions to be
+made at the outer level, so we would descend one level and look at `Adt(Vec, &[Param(X)])`. There
+are still no substitutions to be made here, so we would descend again. Now we are looking at
+`Param(X)`, which can be substituted, so we replace it with `u32`. We can’t descend any more, so we
+are done, and the overall result is `Adt(Vec, &[Adt(Vec, &[u32])])`.
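The same walk can be traced in code. This sketch is illustrative only: the toy `Ty` enum names parameters by string to match the `Param(X)` notation above (rustc uses indices), and `subst_traced` is a hypothetical helper that records the depth of each type it visits before deciding whether to replace it or descend.

```rust
/// Hypothetical toy type; parameters are named by string to match the
/// `Param(X)` notation in the text (rustc itself uses indices).
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    U32,
    Param(&'static str),
    Adt(&'static str, Vec<Ty>),
}

/// Substitute `name => replacement` in `ty`, recording `(depth, type)` for
/// every type visited so the top-down descent is visible.
fn subst_traced(
    ty: &Ty,
    name: &str,
    replacement: &Ty,
    depth: usize,
    trace: &mut Vec<(usize, Ty)>,
) -> Ty {
    trace.push((depth, ty.clone()));
    match ty {
        // The parameter we are looking for: replace it. Nothing lies below it.
        Ty::Param(p) if *p == name => replacement.clone(),
        // Nothing to replace at this level, so descend into the arguments.
        Ty::Adt(adt, args) => Ty::Adt(
            *adt,
            args.iter()
                .map(|arg| subst_traced(arg, name, replacement, depth + 1, trace))
                .collect(),
        ),
        // Leaf types (and other parameters) are kept as they are.
        other => other.clone(),
    }
}

fn main() {
    // Vec<Vec<X>>, i.e. Adt(Vec, &[Adt(Vec, &[Param(X)])])
    let ty = Ty::Adt("Vec", vec![Ty::Adt("Vec", vec![Ty::Param("X")])]);
    let mut trace = Vec::new();
    let result = subst_traced(&ty, "X", &Ty::U32, 0, &mut trace);
    assert_eq!(result, Ty::Adt("Vec", vec![Ty::Adt("Vec", vec![Ty::U32])]));
    for (depth, visited) in &trace {
        println!("{}{:?}", "  ".repeat(*depth), visited);
    }
}
```

The printed trace visits the outer `Vec`, then the inner `Vec`, then `Param(X)`, one level deeper each time, matching the three steps of the walk-through.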
+
+One last thing to mention: often when folding over a `TypeFoldable`, we don’t want to change most
+things. We only want to do something when we reach a type. That means there may be a lot of
+`TypeFoldable` types whose implementations basically just forward to their fields’ `TypeFoldable`
+implementations. Such implementations of `TypeFoldable` tend to be pretty tedious to write by hand.
+For this reason, there is a `derive` macro that allows you to `#[derive(TypeFoldable)]`. It is
+defined
+[here](https://github.com/rust-lang/rust/blob/master/src/librustc_macros/src/type_foldable.rs).
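To see what such a derive saves you from writing, here is a hedged sketch that uses a `macro_rules!` macro as a stand-in for the real procedural derive. The macro name `forward_type_foldable`, the toy traits, and the `Pair` struct are all hypothetical; the generated `super_fold_with` simply folds every listed field and rebuilds the struct, which is the forwarding boilerplate described above. The real derive is more general than this sketch.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Bool,
    Int,
}

trait TypeFolder: Sized {
    /// Called on every type; by default the type is kept as it is.
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        ty.clone()
    }
}

trait TypeFoldable: Sized {
    fn super_fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self;
    fn fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
        self.super_fold_with(folder)
    }
}

impl TypeFoldable for Ty {
    // The toy `Ty` has no nested types.
    fn super_fold_with<F: TypeFolder>(&self, _folder: &mut F) -> Self {
        self.clone()
    }
    // Types are handed to the folder as a whole.
    fn fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
        folder.fold_ty(self)
    }
}

// Plain data with no types inside folds to itself.
impl TypeFoldable for u32 {
    fn super_fold_with<F: TypeFolder>(&self, _folder: &mut F) -> Self {
        *self
    }
}

/// Stand-in for `#[derive(TypeFoldable)]`: generates a `super_fold_with`
/// that folds each listed field and rebuilds the struct.
macro_rules! forward_type_foldable {
    ($name:ident { $($field:ident),* $(,)? }) => {
        impl TypeFoldable for $name {
            fn super_fold_with<F: TypeFolder>(&self, folder: &mut F) -> Self {
                $name { $($field: self.$field.fold_with(folder)),* }
            }
        }
    };
}

struct Pair {
    id: u32,
    a: Ty,
    b: Ty,
}

// One line instead of a hand-written impl.
forward_type_foldable!(Pair { id, a, b });

/// Illustrative folder that rewrites `Bool` into `Int`.
struct BoolToInt;

impl TypeFolder for BoolToInt {
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        if *ty == Ty::Bool { Ty::Int } else { ty.clone() }
    }
}

fn main() {
    let p = Pair { id: 3, a: Ty::Bool, b: Ty::Int };
    let q = p.fold_with(&mut BoolToInt);
    assert_eq!((q.id, q.a, q.b), (3, Ty::Int, Ty::Int));
}
```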
+
+**`subst`** In the case of substitutions, the [actual
+folder](https://github.com/rust-lang/rust/blob/04e69e4f4234beb4f12cc76dcc53e2cc4247a9be/src/librustc/ty/subst.rs#L467-L482)
+is going to be doing the indexing we’ve already mentioned. There we define a `SubstFolder` and call
+`fold_with` on the `TypeFoldable` to process it. Then
+[`fold_ty`](https://github.com/rust-lang/rust/blob/04e69e4f4234beb4f12cc76dcc53e2cc4247a9be/src/librustc/ty/subst.rs#L545-L573),
+the method that processes each type, looks for a `ty::Param`: each one it finds is replaced with
+something from the list of substitutions, and any other type is processed recursively. To do the
+replacement, it calls
+[`ty_for_param`](https://github.com/rust-lang/rust/blob/04e69e4f4234beb4f12cc76dcc53e2cc4247a9be/src/librustc/ty/subst.rs#L589-L624),
+and all that does is index into the list of substitutions with the index of the `Param`.
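Putting those pieces together, here is a toy model of the substitution folder. It is a sketch, not rustc's code: `Ty`, `TypeFolder`, and `SubstFolder` are simplified stand-ins, the `Pair` ADT in `main` is made up, and the `ty_for_param` method only borrows its name from the function linked above. `fold_ty` replaces a `Param` by indexing into the substitution list and otherwise calls `super_fold_with` to recurse into nested types.

```rust
/// Hypothetical toy type; rustc's `ty::TyKind` has many more variants.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    U32,
    F32,
    Param(usize),
    Adt(&'static str, Vec<Ty>),
}

trait TypeFolder: Sized {
    /// Default: keep the type itself but fold the types nested inside it.
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        ty.super_fold_with(self)
    }
}

impl Ty {
    /// Hand this type to the folder, which may replace it outright.
    fn fold_with<F: TypeFolder>(&self, folder: &mut F) -> Ty {
        folder.fold_ty(self)
    }

    /// Fold only the types nested inside `self`, keeping its outer shape.
    fn super_fold_with<F: TypeFolder>(&self, folder: &mut F) -> Ty {
        match self {
            Ty::Adt(name, args) => {
                Ty::Adt(*name, args.iter().map(|arg| arg.fold_with(folder)).collect())
            }
            other => other.clone(),
        }
    }
}

/// Toy substitution folder holding the list of substitutions.
struct SubstFolder<'a> {
    substs: &'a [Ty],
}

impl<'a> SubstFolder<'a> {
    /// Index into the substitutions with the parameter's index.
    fn ty_for_param(&self, index: usize) -> Ty {
        self.substs[index].clone()
    }
}

impl<'a> TypeFolder for SubstFolder<'a> {
    fn fold_ty(&mut self, ty: &Ty) -> Ty {
        match ty {
            // A parameter: replace it with the corresponding substitution.
            Ty::Param(index) => self.ty_for_param(*index),
            // Anything else: recursively process the types inside it.
            _ => ty.super_fold_with(self),
        }
    }
}

fn main() {
    // Pair<Param(0), Vec<Param(1)>> with substitutions [u32, f32].
    let ty = Ty::Adt("Pair", vec![Ty::Param(0), Ty::Adt("Vec", vec![Ty::Param(1)])]);
    let substs = [Ty::U32, Ty::F32];
    let mut folder = SubstFolder { substs: &substs };
    let result = ty.fold_with(&mut folder);
    assert_eq!(result, Ty::Adt("Pair", vec![Ty::U32, Ty::Adt("Vec", vec![Ty::F32])]));
}
```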
+