
Allow rewriting on types #4940

Closed
odersky opened this issue Aug 13, 2018 · 10 comments

@odersky
Contributor

odersky commented Aug 13, 2018

We had a discussion recently about the problem of representing the type of vector concat. If we assume a type Vector[Elem, Len <: Nat], it is straightforward to implement a typelevel

rewrite def concat[A, M, N](xs: Vec[A, M], ys: Vec[A, N]): Vec[A, _]

such that the result of expanding concat for concrete vectors is a vector whose length is known to be the sum of the lengths of the argument vectors. But, annoyingly, there's no simple way to declare this fact in the result type of concat. One can fall back to implicits, i.e. something like

rewrite def concat[A, M, N](xs: Vec[A, M], ys: Vec[A, N])(implicit ev: Add[M, N]): Vec[A, ev.Result]

But that's a rather roundabout way to achieve something that should be simple.
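
For concreteness, here is a sketch of what that implicit evidence could look like (hypothetical, shapeless-style; Z and S are the usual Peano constructors):

sealed trait Nat
sealed trait Z extends Nat
sealed trait S[N <: Nat] extends Nat

// Hypothetical evidence class for the implicit-based fallback
trait Add[M <: Nat, N <: Nat] { type Result <: Nat }

object Add {
  implicit def addZ[N <: Nat]: Add[Z, N] { type Result = N } =
    new Add[Z, N] { type Result = N }
  implicit def addS[M1 <: Nat, N <: Nat](implicit ev: Add[M1, N])
      : Add[S[M1], N] { type Result = S[ev.Result] } =
    new Add[S[M1], N] { type Result = S[ev.Result] }
}

Every concat call site then needs the compiler to derive this evidence step by step, which is exactly the boilerplate the proposal wants to avoid.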

The proposal here is to allow rewrite not only on terms but also on types. It's sort of the dual of #4939 in that it caters for computations that produce types whereas #4939 is a way to consume types directly in typelevel computations.

To solve the example above, one would write

rewrite type Add[M <: Nat, N <: Nat] <: Nat = type M match {
  case Z => N
  case S[type M1] => S[Add[M1, N]]
}

This assumes type matches as proposed by #4939, but of course one could also use their expansion:

rewrite type Add[M <: Nat, N <: Nat] <: Nat = rewrite erasedValue[M] match {
  case _: Z => N
  case _: S[type M1] => S[Add[M1, N]]
}
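
As an aside, this erasedValue-based matching style did ship for terms in Scala 3, as scala.compiletime.erasedValue with inline matches. A minimal runnable sketch (Nat encoded as traits here):

import scala.compiletime.erasedValue

sealed trait Nat
sealed trait Z extends Nat
sealed trait S[N <: Nat] extends Nat

// Inline match on a type, via a phantom value of that type
inline def toInt[N <: Nat]: Int = inline erasedValue[N] match {
  case _: Z    => 0
  case _: S[n] => 1 + toInt[n]
}

val two: Int = toInt[S[S[Z]]] // expands to 2 at compile time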

Unlike #4939, this feature is not simply encodable in the base language, so one has to explain its properties in detail:

Question: What is the syntax of a rewrite type definition?

Answer: The same as a rewrite def, except that it's type instead of def, the (optional) declared result type is prefixed by <: instead of : and the right hand side is a type expression. Type expressions are formed from

  • blocks { ... }, ending in a type expression
  • rewrite, type, and implicit matches, with type expressions in each case
  • rewrite conditionals, with type expressions in each branch
  • types

If a declared result type is missing, Any is assumed.

Question: When is a rewrite type application expanded?

Answer: Analogous to a rewrite term application, it is expanded on each use outside a rewrite method or type. In particular this means that in

rewrite def concat[A, M, N](xs: Vec[A, M], ys: Vec[A, N]): Vec[A, Add[M, N]]

the Add[M, N] is expanded only when concat is applied itself. Trying to expand it at the definition site would not work anyway, as the type match could not be reduced.

Question: How are applications of rewrite types handled when seen from inside a rewrite definition? In this case they cannot be expanded, so we have to have a fallback to represent them directly.

Answer: A rewrite type application T[S1, ..., Sn] in a rewrite definition is represented as an abstract type with the declared return type of T[S1, ..., Sn] as upper bound and Nothing as lower bound.
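
For illustration (names invented here, not actual compiler output), inside a rewrite definition an application Add[M, N] would behave like an abstract type member whose only known bound is Add's declared bound:

sealed trait Nat

trait RewriteScope[M <: Nat, N <: Nat] {
  // Hypothetical stand-in for Add[M, N]: nothing is known about it
  // except the declared upper bound of Add
  type AddMN >: Nothing <: Nat
}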

Question: How is the conformance of the RHS to the return type checked?

Answer: When checking conformance of a RHS to the declared return type, we do expand rewrite types according to the information found in the branch. E.g. if we implement concat like this:

rewrite def concat[A, M, N](xs: Vec[A, M], ys: Vec[A, N]): Vec[A, Add[M, N]] = {
  xs match {
    case x :: xs1 => x :: concat(xs1, ys)
    case Nil => ys
  }
}

we should be able to reason as follows:

  • In the case x :: xs1 we have xs1: Vec[A, M1] such that M = S[M1]
  • By reading off the declared result type, concat(xs1, ys): Vec[A, Add[M1, N]]
  • By interpreting ::, x :: concat(xs1, ys): Vec[A, S[Add[M1, N]]]
  • By rewriting Add[M, N] = Add[S[M1], N] = S[Add[M1, N]]

The Nil case is similar but simpler.

This part looks contentious. One can hope that we will usually have sufficient reasoning power to establish that a RHS conforms to its declared type (of course casts are available as a last resort if this fails). As an alternative we could also give up, and prove the correspondence of result to result type only at expansions. But this feels a bit like cheating...

[EDIT] I think it's likely we'll need to cheat. Trying to do the inductive reasoning outlined above would mean that rewrite types should be simplified whenever possible. And that's a whole different game. If we stick to the rule that, like rewrite terms, rewrite types are expanded only outside rewrite definitions, the logical consequence is that checking rewrite types is also only done when they are expanded.

@odersky
Contributor Author

odersky commented Aug 16, 2018

In light of #4938 (comment) one should also think of a more restricted but faster alternative: Allow simple type matches without guards (but no other matches or blocks) as a form of type. This would make it easy to compute these types as types instead of having to do term rewriting. The representation of such a type could be

case class TypeMatch(scrutinee: Type, cases: List[Type]) extends CachedGroundType

A type representing a case without type variables would be a function type with the pattern type as argument. A type representing a case with type variables would be a HkLambdaType over a function type.
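
To illustrate the encoding (a simplified model in surface syntax, not dotc internals):

sealed trait Nat
sealed trait Z extends Nat
sealed trait S[N <: Nat] extends Nat

// "case Z => N": no type variables, modelled as a function type
// from the pattern type to the body type
type CaseZ[N <: Nat] = Z => N

// "case S[m1] => Body[m1]": one type variable, modelled as a
// type lambda over such a function type
type CaseS[Body[_ <: Nat]] = [M1 <: Nat] =>> (S[M1] => Body[M1])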

There are two possible variants of this:

  1. Have rewrite type definitions and allow type matches only in these definitions. Then matching on a type proceeds as in #4939, and a failed choice is a reduction error.

  2. Don't introduce rewrite type definitions and allow type matches everywhere as a regular form of type. Such type matches would be reduced wherever possible, and if a reduction is not possible the original type would be kept. Reduction has to be more careful for negative results. I.e. before discarding a type pattern TP for scrutinee S we have to prove that S can never be a subtype of TP in all possible instantiations of type parameters and abstract types. One possible way to achieve this could be to add a special mode to the type comparer that swaps bounds. I.e. if A is an abstract type then normally A <: B holds provided A's upper bound is a subtype of B; under the special mode we'd compare with the lower bound instead. Similarly for TypeParams. (See the sketch below for the intended reduce-where-possible behavior.)
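
This reduce-where-possible behavior is what the match types that eventually shipped in Scala 3 implement; a minimal sketch using the Elem example from the Dotty docs:

type Elem[X] = X match {
  case String   => Char
  case Array[t] => t
}

val c: Elem[String] = 'a'            // scrutinee is concrete: reduces to Char
def keep[X](e: Elem[X]): Elem[X] = e // scrutinee is abstract: the match type is kept unreduced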

Since TypeMatch types are cached, we can then apply an optimization: leave the result of a reduction attempt in a field of the type, so reductions have to be tried only once per TypeMatch type.

If we go for (2), that would also force our hand towards "no cheating": since there is no mode that determines when a type match should be reduced or not, we can't base a typechecking distinction on such a mode.

@odersky
Contributor Author

odersky commented Aug 16, 2018

Regarding the caching optimizations, it's not so simple unfortunately. Conceptually, reduction of a type match is a function from a type and a context to a type. Different contexts can give different results. So match reduction is logically a part of subtyping, which is also context-dependent: In a given context a type match is equal to its reduction (i.e. the two are mutual subtypes).

For caching this means it has to be made dependent on the context, which would ruin its effectiveness. Alternatively one could play with keeping a trace of all abstract types and type parameters that were involved in the subtype checks that decided that a reduction could be, or could not be, performed. If all of these have the same bounds the next time a reduction is attempted, we can keep the previous result.

@Blaisorblade
Contributor

Reduction has to be more careful for negative results. I.e. before discarding a type pattern TP for scrutinee S we have to prove that S can never be a subtype of TP in all possible instantiations of type parameters and abstract types.

We have/need similar logic for pattern matching / exhaustivity checking — oversimplifying, we insert type variables and use <:< to solve for them, just like SAT solvers show impossibility. In your example, you'd show that x.A <:< B is impossible by trying to satisfy newTypeVar(bounds = x.A.bounds) <:< B and failing.

But per #4029, this seems to need backtracking, intrinsically. For instance, A & B <:< C is satisfiable if either A <:< C or B <:< C is, and unsatisfiable only if both are. To improve this without backtracking, <:< could report that it discarded search branches, in which case we don't know whether the branch can be discarded.

Interestingly, @liufengyun also does prefix inference sometimes: to figure out if SomeType[ ... this[...] ...] <:< B, RemoveThisMap (in Space.scala) replaces this[...] (under some conditions) with a type variable. That's similar to the use of unification/subtype constraint solving in GADT matching, but I had never seen it for prefixes.

@liufengyun
Contributor

liufengyun commented Aug 17, 2018

I was thinking about a related practical application of type-level rewriting: shape checking in deep learning (https://github.com/tensorflow/swift/blob/master/docs/DesignOverview.md):

Shape Checking: Shape errors are an ongoing problem for machine learning research and productivity. We already have some basic shape checks from the TensorFlow graph building APIs, but we need to invest in first class support for this to produce better diagnostics and diagnose more errors at high level API boundaries instead of inside the implementation of those APIs. We have initial ideas of how this could work, but need to explore this much further.

In DL, the dimensions can be large, so encoding numbers at the type level would be inefficient and cumbersome. I am wondering if we can do something like the following:

// library
type Const[T] = T @const
class Vector[N <: Const[Int]] {
  def padding(x: Const[Int]): Vector[N + x.type] = ???

  def concat[M <: Const[Int]](vec: Vector[M]): Vector[M + N] = ???
}

// app code
val vec: Vector[500] = ...
val vec2 : Vector[1000] = vec.padding(500)

In the above, + is an intrinsic rewrite type operator that reduces two constant integer types to their sum whenever possible. The type operator + only accepts a ConstantType or a const TermRef.
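
For reference, something close to this intrinsic + later shipped in Scala 3 as scala.compiletime.ops, using singleton Int types rather than the Const encoding above. A minimal sketch:

import scala.compiletime.ops.int.{+, Max}

val sum: 500 + 500 = 1000 // `+` reduces the two constant types to the singleton type 1000
val maxOf: Max[3, 5] = 5  // `Max` reduces to the singleton type 5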


(Added)

A problem: how to make the following method maxPad type check? In fact, in padding, we don't need x to be a constant:

class Vector[N <: Const[Int]] {
  def padding(x: Int): Vector[N + x.type] = ???

  def maxPad(a: Int, b: Int): Vector[N + Max[a.type, b.type]] =
    padding(Max(a, b))    // `Max` is the term level op for the type-level `Max`
}

In the above, Max is the term-level op for the type-level Max. For intrinsic rewrite type operators like Max and +, we also define a corresponding term-level operation that produces a term of the corresponding type.
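
With today's scala.compiletime.ops one can sketch such a term-level companion; the cast is an assumption, needed because the compiler cannot check the body against the unreduced Max (a hypothetical helper, not a library function):

import scala.compiletime.ops.int.Max

// Hypothetical term-level op mirroring the type-level Max
def maxOp[A <: Int & Singleton, B <: Int & Singleton](a: A, b: B): Max[A, B] =
  (if (a > b) a else b).asInstanceOf[Max[A, B]]

val m: Max[3, 5] = maxOp(3, 5) // Max[3, 5] reduces to the singleton type 5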

@liufengyun
Contributor

Another motivation from Spatial for FPGA programming (https://spatial-lang.readthedocs.io/en/latest/faq.html):

Why does Spatial include the VectorN type?

Unlike C++, Scala does not support template classes which are parameterized by integer values. This means that, for methods where the number of elements in the resulting Vector depends on an argument to that method, the resulting type of the method cannot be written statically in Scala. However, this width is still statically analyzable. As a workaround, Spatial uses the VectorN type to denote a Vector which needs static analysis of its width.

They use a complex mechanism to ensure shape safety of the slice operation on vectors: source link.

@Blaisorblade
Contributor

A problem: how to make the following method maxPad type check?

If one goes down that road, and you want to typecheck definitions rather than just their uses, sooner or later you'll need the typechecker to understand 1 + (x - 1) = x, etc. There are known techniques that aren't worth reinventing. Raw dependent types require you to prove that yourself, but systems like Dependent ML, LiquidHaskell, Dafny, etc. use domain-specific solvers — for constraints in Presburger arithmetic (that is, without products of variables), there are complete decision procedures. But we likely don't want to reimplement any of that; we should use @gsps's work instead (assuming it can connect to an SMT solver with support for these techniques, which I guess they have).

Peano numbers are indeed inefficient once they get big enough; they're a common test case because they let you test support for more useful datatypes.

@liufengyun
Contributor

If one goes down that road, and you want to typecheck definitions rather than just their uses, sooner or later you'll need the typechecker to understand 1 + (x - 1) = x, etc.

In the context of verification or proof assistants, I think you are right. But for real-world applications, or at least the two applications above, I doubt that. An advantage of this approach is that Dotty already has all the infrastructure to support it; no extension of core types is required, if I'm correct.

@Blaisorblade
Contributor

Blaisorblade commented Aug 19, 2018

In the context of verification or proof assistants, I think you are right. But for real-world applications, or at least the two applications above, I doubt that.

But whenever you use dependent types, checking that two types are compatible can require checking/proving equality of values; even the concat in the OP qualifies as verification, it's just a lucky example where all the proof obligations follow immediately by normalization. That fails in this example:

xs match { // xs: Vec[A, 1 + n]
  case x :: xs1 => xs1 ++ List(x) // Vec[A, n + 1]
}

There you already need 1 + n = n + 1, which doesn't follow by normalization.
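
The same limitation is visible in today's scala.compiletime.ops, which reduce by normalization only; a minimal sketch:

import scala.compiletime.ops.int.+

val ok = summon[(1 + 2) =:= (2 + 1)] // both sides normalize to 3

// For an abstract N neither side reduces, so this line would not compile:
// def bad[N <: Int] = summon[(1 + N) =:= (N + 1)]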

That's not very compelling, so let's look at a recent example I saw on Twitter: it seems somebody tried to turn a rectangular (sized) vector into a vector of vectors. As best as I can recover the code from the error message (or make it up) and translate it into Scala, the code involved seems to be:

def chunks(xs: Vec[A, o * p]): Vec[Vec[A, p], o] = {
  // ...
  for {
    i <- (0 until o).toVec
  } yield {
    for {
      j <- (0 until p).toVec
    } yield xs(i * p + j)
  }
}

Here's the resulting compiler error in Haskell: https://twitter.com/lexi_lambda/status/1029918793034805248

For extra fun, ensuring the above is correct goes even beyond Presburger arithmetic, so I'm not sure how to verify it.

EDIT: yet another example in the wild comes from writing mergesort: https://stackoverflow.com/q/51919602/53974. (Wasn't even looking for either example BTW, just ran into both in the last couple of days 😉.)

@milessabin
Contributor

You can probably find a few interesting examples among the things that people have been doing with shapeless's Sized.

@liufengyun
Contributor

This is already implemented as match types.

http://dotty.epfl.ch/docs/reference/new-types/match-types.html
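
For completeness, here is roughly how the Add from the original post looks as a shipped match type (a runnable sketch):

sealed trait Nat
sealed trait Z extends Nat
sealed trait S[N <: Nat] extends Nat

type Add[M <: Nat, N <: Nat] <: Nat = M match {
  case Z    => N
  case S[m] => S[Add[m, N]]
}

val ev = summon[Add[S[Z], S[Z]] =:= S[S[Z]]] // reduces: 1 + 1 = 2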
