Skolem based gadt constraints #5736

abgruszecki · 2019-01-17T16:00:43Z

Fixes #5667.
Fixes #2985.

abgruszecki · 2019-01-17T19:20:43Z

@odersky this looks ready to go! Commit messages contain basically all the information. Typeclass derivation needed to be altered, since from:

(t: T) match {
  case _: U => ...
}

we're not allowed to deduce that U <: T (consider that if U = Any, the pattern always matches).

I realize that approximating Skolem bounds in the way I'm doing it is unsound, but for now it prevents an even worse unsoundness. Exact problem is that Sko(U) | Sko(U) is simplified to U for two different Skolems, which is not what we want when considering the constraint like T >: Sko(U) | Sko(U). I'm noting for posterity that Paolo suggested we could have a separate mode where we maintain the "necessary" conditions while simplifying unions (and intersections?) - i.e. do not simplify such unions. That approach would solve the problem I've encountered and we might want to implement it later.

smarter · 2019-01-17T19:36:11Z

Exact problem is that Sko(U) | Sko(U) is simplified to U for two different Skolems,

Wait, if the constraint solver does this kind of stuff then surely it's already unsound and we should fix it ?

abgruszecki · 2019-01-17T19:48:08Z

@smarter if I understood @Blaisorblade, that's the difference between "necessary" and "sufficient". What the current constraint solver does is "sufficient", and that is what is needed for type inference. What GADT constraints need is "necessary", and, well, they don't have it (yet). For now, if constraint already contains T >: S1 <: S1 and we're asked to record that T >: Sko(U), we ensure that S1 | Sko(U) <:< S2. We do not record that T >: Sko(U). This is quite likely unsound, but the only case I can think of where it makes a difference is if we need somehow to check that T is inhabited (consider that all we do not record is a lower bound of a Skolem). I think for now it's fine to just merge this PR, since it gets rid of type unsoundness of this kind:

final case class Inv[T](t: T)
def foo[T](t: T): T = Inv(t) match { case Inv(_: Int) => 0 }
val notFive: 5 = foo[5](5) // actually zero

odersky · 2019-01-18T13:21:31Z

I fail to see why simplifying T >: Sko(U) | Sko(U) to T >: U is a problem. That's precisely as specified as far as I am concerned. We do check that the resulting constraint is satisfiable. If it is not, the subtyping derivation fails.

odersky · 2019-01-18T13:23:51Z

@smarter if I understood @Blaisorblade, that's the difference between "necessary" and "sufficient". What the current constraint solver does is "sufficient", and that is what is needed for type inference. What GADT constraints need is "necessary", and, well, they don't have it (yet).

That looks like a good characterization of the problem, yes. It also looks hard to achieve, no? Maybe we do need two different data structures and schemes for type inference and GADTs after all?

odersky

I have the impression we need a deeper analysis where the gap between allowed normalizations in inference and GADT computations makes a difference.

In inference: Unforced narrowing of bounds is sound, but loses solutions (which is acceptable).
In GADT bounds: Unforced narrowing of bounds is unsound, hence unacceptable.

Vice versa, in GADT computations we are allowed to widen bounds (which loses precision, but is sound) whereas in inference such widening is unsound.

So it seems to me a systematic solution would entail replacing lubs with glbs and vice versa in all constraint solving code. EDIT: No that would not work either. The issue is in what direction are we allowed to lose precision. So, let's say we have the situation

a1.type | a2.type  <:  B

and suppose we want to avoid unions of singleton types. In constraint solving for inference, it's OK to approximate this with A <: B where A is the common type of a1, and a2. In GADT computation it is not, but we could approximate by Nothing <: B instead. Correct?

odersky · 2019-01-18T13:30:31Z

compiler/src/dotty/tools/dotc/core/Contexts.scala

@@ -779,6 +779,21 @@ object Contexts {
        }, gadts)
      }

+      // avoid recording skolems in lower bounds
+      // recording two skolem bounds results in an union, which is then simplified
+      // T >: Sko(U) | Sko(U) is simplified to T >: U, which is simply wrong


Where is this simplification of T >: Sko(U) | Sko(U) to T >: U done? If this done only for Skolemtypes? I suspect the answer is that it happens for all singleton types. But then the fix should also be generalized to all singleton types, right?

abgruszecki · 2019-01-18T15:34:51Z

test performance please

compiler/src/dotty/tools/dotc/core/ConstraintHandling.scala

dottybot · 2019-01-18T15:36:44Z

performance test scheduled: 1 job(s) in queue, 0 running.

dottybot · 2019-01-18T17:23:02Z

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/5736/ to see the changes.

Benchmarks is based on merging with master (da1e196)

abgruszecki · 2019-01-20T22:03:17Z

Rebased on top of master. More changes to typeclass derivation tests were needed - I'll open a separate PR for replacing erasedValue with typeOf (see #5564, last 3 comments specifically).

@smarter I stared very hard at OrderingConstraint and Constraint, and I believe that current implementation is sound. Specifically: I think the calls to TypeComparer#lub in fullXBounds only differ from OrType when HK types are involved (see TypeComparer#liftIfHK), and the ones in replace only rely on Any | T being simplified to T. I added a test for some more creative usages of GADT bounds in the last commit. I think this PR is fine to merge as-is, though we may want to re-think Constraint calling lub/glb later on.

EDIT: to clarify, the above assertions about TypeComparer#lub invocations only apply to the way SmartGADTMap uses (Ordering)Constraint.

@odersky I believe GADT bounds cannot ever be approximated, only simplified. I think it'd be ~~sound to simplify T >: Sko(U) | Sko(U) to T >: Sko(U)~~, but I also think we should wait until we know there's an actual performance problem. Normal compilation times seem to be unaffected.

EDIT: see conversation below reg. simplifying skolem unions.

Blaisorblade · 2019-01-21T18:45:05Z

I think it'd be sound to simplify T >: Sko(U) | Sko(U) to T >: Sko(U)

If those are the same Sko(U) (by reference equality), yes, otherwise it's not clear. I'll write Sko(i1, U) and Sko(i2, U) for two distinct skolems (that is, they're not reference-equal; think of i1, i2 as the respective object identity).

Now, T >: Sko(i1, U) admits T = Sko(i1, U) as solution, while T >: Sko(i1, U) | Sko(i2, U) doesn't, so simplifying the latter constraint to the former would add invalid solution. Following @odersky's summary, that's not okay for inference, but okay for GADTs (that is, for proving some constraints are unsatisfiable).

BTW, unsound inference results are usually caught later, but only if the violation is visible in the trees produced by inference. Conversely, constrainPatternType(SkolemType(subtp), tp) invents a Skolem and uses it in a constraint, but it's not obvious this constraint is added to to the actual tree.

abgruszecki · 2019-01-21T21:02:28Z

@Blaisorblade I'm less and less clear on whether it's actually sound or not to simplify the bounds in such a manner. For typechecking, intuitively it should be acceptable to know less bounds, because then we will simply not accept some "morally valid" code. Note that this'd mean that it'd be sound to drop both skolems from the lower bound without losing soundness. For exhaustivity and useless pattern checks (which you may have meant by "some constraints are unsatisfiable"), we actually would not want to simplify the constraints in such a way (since T = Sko(i1, U) might be a constraint required by a subpattern, and knowing that T >: Sko(i1, U) | Sko(i2, U) lets us decide that a pattern is useless).

Blaisorblade · 2019-02-05T23:09:27Z

I'm less and less clear on whether it's actually sound or not to simplify the bounds in such a manner. For typechecking, intuitively it should be acceptable to know less bounds, because then we will simply not accept some "morally valid" code.
For exhaustivity and useless pattern checks (which you may have meant by "some constraints are unsatisfiable"), we actually would not want to simplify the constraints in such a way (since T = Sko(i1, U) might be a constraint required by a subpattern, and knowing that T >: Sko(i1, U) | Sko(i2, U) lets us decide that a pattern is useless).

Not approximating is better than approximating if we can afford it, of course. So if you can avoid simplifying those skolems without performance loss, IMHO go ahead.

But you talk about soundness, and I disagree there. And I think the two scenarios you describe should work the same way.

Overall, we seem to have three scenarios:

type inference, where the solver must search for sufficient constraints
GADT typechecking, where the solver must search for necessary constraints; this isn't implemented yet, but let me assume it'll be.
warnings about missing branches (exhaustivity) or useless branches (redundant patterns.

It seems the solver can only be in sufficient or (if implemented) necessary mode.
For scenario 3, we can pick which — since we must always approximate in one or the other direction. Neither direction breaks type soundness, but missing branches cause MatchError while extra branches are just extra code (which can be assert(false) if the user is sure).
But the warning says "match may not be exhaustive" — that is, the compiler couldn't prove the missing cases are impossible, so it approximate to "please add them".
That is, a pattern is requested unless we can soundly prove it's impossible. That's what I argued in #4029.

If we agree on this behavior for scenario 3, then the solver must search for necessary constraints, like scenario 2 — and if the resulting constraint is unsatisfiable, the original ones were as well and the considered branch is impossible.

The above implies the compiler might require unused cases. You seem to disagree.
But we can't avoid errors in both directions, so do you want to err in the other direction?

We should pick somehow and stick to that.

abgruszecki · 2019-02-06T10:38:49Z

@Blaisorblade that comment was from two weeks ago, before we discussed everything thoroughly and I came to the conclusions from my presentation/report. The situation, as I understand it now, is as follows:

Type inference is concerned with finding a solution satisfying constraint C. We can instead solve a constraint C' (found, for example, by simplifying T >: U(x) | U(y)* to T >: U) iff a solution to C' is guaranteed to be a solution for C. This is what I understand to be the sufficient case - solving C' is sufficient for solving C.

* - allow me to use the syntax U(x) for denoting some singleton of type U, since singleton types other than skolems concern us as well.

GADT constraint inference is concerned with finding the strongest possible constraint D entailed by C, G, x: P, x.type <: S where C is the current constraint, G is the current environment, P, S are respectively types of the pattern and the scrutinee, x is an existential term of types both P and S. If D is entailed by the above, then we can return D' instead as long as any solution to D is also a solution to D'. This is the necessary case - solving D' is necessary for solving D. We can also say that D' is no stronger that D, which in a roundabout way brings us to the "less constraints is always sound for typechecking" argument.

To emit safe GADT exhaustivity warnings, we must decide if the constraint C entailed by a particular missing pattern is unsatisfiable. Yes, we want/should err on the side of caution and emit a warning even if we are uncertain that the pattern is necessary. Therefore, instead of solving UNSAT(C), we may solve UNSAT(C') iff any solution to C is guaranteed to be a solution to C'. Again, this is the necessary case.

What this means is that for GADT constraints, we can simplify T >: U(x) | U(y) to T >: U(x) or T >: U(y) or T >: Nothing. However! For constraint inference, this would mean that we don't accept code that is morally valid. For exhaustivity checking, this would mean that we emit spurious warnings. Both of those cases are bad from user perspective - we should have a good reason to not accept code that should compile or to bother users with useless warnings. Since there's so far no observable impact on performance, I'd just not simplify the constraint for the time being. There's some subtlety to that argument, but I'd very much like to first finish fixing the errors in this PR that I discovered and then to present everything wrapped up together.

If we do not insert TypeVars into the bounds every time, then the only time we need to remove them is when taking the full bounds of some type. Since that logic now resides in ConstraintHandling and replaces all TypeParamRefs internal to SmartGADTMap, we have no need to perform expensive type traversals. This removes the only reason for caching bounds. The addition of HK parameter variance adaptation was necessary to make tests/pos/i6014-gadt.scala pass.

gadtSyms/gadtContext became redundant, so they were removed. The logic in typedDefDef was adjusted to only create a fresh context when necessary.

The rationale for using a Skolem here is: we want to record that there is at least one value that is both of the pattern type and the scrutinee type. All symbols are now considered valid for adding GADT constraints - the rationale is that set of constrainable symbols should be either selected on a per-(sub)pattern basis, or be the same for all matches. Previously, symbols which were only appearing variantly in a scrutinee type could be considered constrainable anyway because of an outer pattern match.

Also rename the classes to better reflect their role, and document and reorder definitions to make more sense.

The added symbols can have inter-dependencies in their bounds.

constrainPatternType is specific to term patterns, whereas in match types there is a simple subtyping relationship between the pattern and the scrutinee. In the future, simply calling isSubType in GADTFlexible context would likely be sufficient.

abgruszecki · 2019-05-07T07:54:56Z

I added recaps of what we talked about in person to all the discussions. Compared to when we talked, the only non-trivial change not related to any discussion is adding GadtConstraint.addToConstraint(_: List[Symbol]), which mirrors ConstraintHandling#addToConstraint and correctly allows registering symbols with interdependencies. This removed some code duplication between all the places that needed to register multiple symbols in a loop.

odersky

Otherwise LGTM. Can be merged after final fixes.

odersky · 2019-05-07T14:57:44Z

compiler/src/dotty/tools/dotc/core/TypeComparer.scala

-      true
-    } || op2
+
+    if (ctx.mode.is(Mode.GADTflexible)) {


This code should be moved out to a separate method (e.g. gadtEither), and there should be a doc comment explaining what it does, analogous to the original either.

odersky · 2019-05-07T15:00:12Z

compiler/src/dotty/tools/dotc/printing/PlainPrinter.scala

@@ -208,7 +208,10 @@ class PlainPrinter(_ctx: Context) extends Printer {
        else {
          val constr = ctx.typerState.constraint
          val bounds =
-            if (constr.contains(tp)) constr.fullBounds(tp.origin)(ctx.addMode(Mode.Printing))
+            if (constr.contains(tp)) {
+              val ctx0 = ctx.addMode(Mode.Printing)


Why not leave it inline? ctx.addMode(Mode.Printing).typeComparer.fullBounds(tp.origin)

abgruszecki force-pushed the skolem-based-gadt-constraints branch 2 times, most recently from ea52ba8 to a0dbebb Compare January 17, 2019 16:44

abgruszecki requested a review from odersky January 17, 2019 19:13

abgruszecki assigned abgruszecki and odersky and unassigned abgruszecki Jan 17, 2019

abgruszecki mentioned this pull request Jan 17, 2019

[CI only] Singleton constraint test #5704

Closed

odersky requested changes Jan 18, 2019

View reviewed changes

odersky assigned abgruszecki and unassigned odersky Jan 18, 2019

smarter reviewed Jan 18, 2019

View reviewed changes

compiler/src/dotty/tools/dotc/core/ConstraintHandling.scala Outdated Show resolved Hide resolved

abgruszecki force-pushed the skolem-based-gadt-constraints branch from f43c6f5 to d30c925 Compare January 20, 2019 21:24

abgruszecki force-pushed the skolem-based-gadt-constraints branch from e96e094 to 1ef5cfd Compare February 5, 2019 19:25

abgruszecki force-pushed the skolem-based-gadt-constraints branch 4 times, most recently from 1763131 to d0f48b8 Compare February 13, 2019 09:59

abgruszecki added 13 commits May 3, 2019 16:46

Allow constraining type parameters of all enclosing functions

2ab8a6d

gadtSyms/gadtContext became redundant, so they were removed. The logic in typedDefDef was adjusted to only create a fresh context when necessary.

Do not constrain types under either approx

eac0e53

Adjust behaviour of TypeComparer.either for GADTflexible

3df2c0f

Move GADT constraint code to a separate file

e1df3ae

Also rename the classes to better reflect their role, and document and reorder definitions to make more sense.

Document suspicious code in ProperGadtConstraint#tvar

824108c

Improve GadtConstraint comment

ea1815d

Simplify ConstraintHandling#fullBounds

3cf8530

Document Constraint methods

1171db4

Allow adding multiple symbols to GadtConstraint simultaneously

177bd36

The added symbols can have inter-dependencies in their bounds.

Restore old constrainPatternType doc

e32c2d6

abgruszecki force-pushed the skolem-based-gadt-constraints branch from 0b2fa94 to cb15213 Compare May 3, 2019 15:11

abgruszecki mentioned this pull request May 6, 2019

Take HKT injectivity into account when inferring constraints #6461

Closed

abgruszecki added 4 commits May 6, 2019 17:36

Further improve GadtConstraint comment

af5010d

Comment out accidentally committed sanity check

65e2aa2

Simplify NoMatchingImplicit#clarify

c9b093b

Further improve comments around GadtConstraint

9b96317

abgruszecki requested a review from odersky May 7, 2019 07:55

abgruszecki assigned odersky and unassigned abgruszecki May 7, 2019

odersky approved these changes May 7, 2019

View reviewed changes

odersky assigned abgruszecki and unassigned odersky May 7, 2019

abgruszecki added 2 commits May 13, 2019 14:55

Inline a variable

12b8999

Split TypeComparer#either and document both cases

3c3b3db

abgruszecki merged commit 3ef991d into scala:master May 13, 2019

abgruszecki deleted the skolem-based-gadt-constraints branch May 13, 2019 13:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skolem based gadt constraints #5736

Skolem based gadt constraints #5736

abgruszecki commented Jan 17, 2019 •

edited

Loading

abgruszecki commented Jan 17, 2019

smarter commented Jan 17, 2019

abgruszecki commented Jan 17, 2019 •

edited

Loading

odersky commented Jan 18, 2019

odersky commented Jan 18, 2019

odersky left a comment •

edited

Loading

odersky Jan 18, 2019

abgruszecki commented Jan 18, 2019

dottybot commented Jan 18, 2019

dottybot commented Jan 18, 2019

abgruszecki commented Jan 20, 2019 •

edited

Loading

Blaisorblade commented Jan 21, 2019 •

edited

Loading

abgruszecki commented Jan 21, 2019

Blaisorblade commented Feb 5, 2019

abgruszecki commented Feb 6, 2019 •

edited

Loading

abgruszecki commented May 7, 2019

odersky left a comment

odersky May 7, 2019

odersky May 7, 2019

Skolem based gadt constraints #5736

Skolem based gadt constraints #5736

Conversation

abgruszecki commented Jan 17, 2019 • edited Loading

abgruszecki commented Jan 17, 2019

smarter commented Jan 17, 2019

abgruszecki commented Jan 17, 2019 • edited Loading

odersky commented Jan 18, 2019

odersky commented Jan 18, 2019

odersky left a comment • edited Loading

Choose a reason for hiding this comment

odersky Jan 18, 2019

Choose a reason for hiding this comment

abgruszecki commented Jan 18, 2019

dottybot commented Jan 18, 2019

dottybot commented Jan 18, 2019

abgruszecki commented Jan 20, 2019 • edited Loading

Blaisorblade commented Jan 21, 2019 • edited Loading

abgruszecki commented Jan 21, 2019

Blaisorblade commented Feb 5, 2019

abgruszecki commented Feb 6, 2019 • edited Loading

abgruszecki commented May 7, 2019

odersky left a comment

Choose a reason for hiding this comment

odersky May 7, 2019

Choose a reason for hiding this comment

odersky May 7, 2019

Choose a reason for hiding this comment

abgruszecki commented Jan 17, 2019 •

edited

Loading

abgruszecki commented Jan 17, 2019 •

edited

Loading

odersky left a comment •

edited

Loading

abgruszecki commented Jan 20, 2019 •

edited

Loading

Blaisorblade commented Jan 21, 2019 •

edited

Loading

abgruszecki commented Feb 6, 2019 •

edited

Loading