Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skolem based gadt constraints #5736

Merged
merged 20 commits into from
May 13, 2019

Conversation

abgruszecki
Copy link
Contributor

@abgruszecki abgruszecki commented Jan 17, 2019

Fixes #5667.
Fixes #2985.

@abgruszecki abgruszecki force-pushed the skolem-based-gadt-constraints branch 2 times, most recently from ea52ba8 to a0dbebb Compare January 17, 2019 16:44
@abgruszecki
Copy link
Contributor Author

@odersky this looks ready to go! Commit messages contain basically all the information. Typeclass derivation needed to be altered, since from:

(t: T) match {
  case _: U => ...
}

we're not allowed to deduce that U <: T (consider that if U = Any, the pattern always matches).

I realize that approximating Skolem bounds in the way I'm doing it is unsound, but for now it prevents an even worse unsoundness. Exact problem is that Sko(U) | Sko(U) is simplified to U for two different Skolems, which is not what we want when considering the constraint like T >: Sko(U) | Sko(U). I'm noting for posterity that Paolo suggested we could have a separate mode where we maintain the "necessary" conditions while simplifying unions (and intersections?) - i.e. do not simplify such unions. That approach would solve the problem I've encountered and we might want to implement it later.

@smarter
Copy link
Member

smarter commented Jan 17, 2019

Exact problem is that Sko(U) | Sko(U) is simplified to U for two different Skolems,

Wait, if the constraint solver does this kind of stuff then surely it's already unsound and we should fix it ?

@abgruszecki
Copy link
Contributor Author

abgruszecki commented Jan 17, 2019

@smarter if I understood @Blaisorblade, that's the difference between "necessary" and "sufficient". What the current constraint solver does is "sufficient", and that is what is needed for type inference. What GADT constraints need is "necessary", and, well, they don't have it (yet). For now, if constraint already contains T >: S1 <: S1 and we're asked to record that T >: Sko(U), we ensure that S1 | Sko(U) <:< S2. We do not record that T >: Sko(U). This is quite likely unsound, but the only case I can think of where it makes a difference is if we need somehow to check that T is inhabited (consider that all we do not record is a lower bound of a Skolem). I think for now it's fine to just merge this PR, since it gets rid of type unsoundness of this kind:

final case class Inv[T](t: T)
def foo[T](t: T): T = Inv(t) match { case Inv(_: Int) => 0 }
val notFive: 5 = foo[5](5) // actually zero

@odersky
Copy link
Contributor

odersky commented Jan 18, 2019

I fail to see why simplifying T >: Sko(U) | Sko(U) to T >: U is a problem. That's precisely as specified as far as I am concerned. We do check that the resulting constraint is satisfiable. If it is not, the subtyping derivation fails.

@odersky
Copy link
Contributor

odersky commented Jan 18, 2019

@smarter if I understood @Blaisorblade, that's the difference between "necessary" and "sufficient". What the current constraint solver does is "sufficient", and that is what is needed for type inference. What GADT constraints need is "necessary", and, well, they don't have it (yet).

That looks like a good characterization of the problem, yes. It also looks hard to achieve, no? Maybe we do need two different data structures and schemes for type inference and GADTs after all?

Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression we need a deeper analysis where the gap between allowed normalizations in inference and GADT computations makes a difference.

In inference: Unforced narrowing of bounds is sound, but loses solutions (which is acceptable).
In GADT bounds: Unforced narrowing of bounds is unsound, hence unacceptable.

Vice versa, in GADT computations we are allowed to widen bounds (which loses precision, but is sound) whereas in inference such widening is unsound.

So it seems to me a systematic solution would entail replacing lubs with glbs and vice versa in all constraint solving code. EDIT: No that would not work either. The issue is in what direction are we allowed to lose precision. So, let's say we have the situation

a1.type | a2.type  <:  B

and suppose we want to avoid unions of singleton types. In constraint solving for inference, it's OK to approximate this with A <: B where A is the common type of a1, and a2. In GADT computation it is not, but we could approximate by Nothing <: B instead. Correct?

@@ -779,6 +779,21 @@ object Contexts {
}, gadts)
}

// avoid recording skolems in lower bounds
// recording two skolem bounds results in an union, which is then simplified
// T >: Sko(U) | Sko(U) is simplified to T >: U, which is simply wrong
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this simplification of T >: Sko(U) | Sko(U) to T >: U done? If this done only for Skolemtypes? I suspect the answer is that it happens for all singleton types. But then the fix should also be generalized to all singleton types, right?

@odersky odersky assigned abgruszecki and unassigned odersky Jan 18, 2019
@abgruszecki
Copy link
Contributor Author

test performance please

@dottybot
Copy link
Member

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot
Copy link
Member

Performance test finished successfully:

Visit http://dotty-bench.epfl.ch/5736/ to see the changes.

Benchmarks is based on merging with master (da1e196)

@abgruszecki
Copy link
Contributor Author

abgruszecki commented Jan 20, 2019

Rebased on top of master. More changes to typeclass derivation tests were needed - I'll open a separate PR for replacing erasedValue with typeOf (see #5564, last 3 comments specifically).

@smarter I stared very hard at OrderingConstraint and Constraint, and I believe that current implementation is sound. Specifically: I think the calls to TypeComparer#lub in fullXBounds only differ from OrType when HK types are involved (see TypeComparer#liftIfHK), and the ones in replace only rely on Any | T being simplified to T. I added a test for some more creative usages of GADT bounds in the last commit. I think this PR is fine to merge as-is, though we may want to re-think Constraint calling lub/glb later on.

EDIT: to clarify, the above assertions about TypeComparer#lub invocations only apply to the way SmartGADTMap uses (Ordering)Constraint.

@odersky I believe GADT bounds cannot ever be approximated, only simplified. I think it'd be sound to simplify T >: Sko(U) | Sko(U) to T >: Sko(U), but I also think we should wait until we know there's an actual performance problem. Normal compilation times seem to be unaffected.

EDIT: see conversation below reg. simplifying skolem unions.

@Blaisorblade
Copy link
Contributor

Blaisorblade commented Jan 21, 2019

I think it'd be sound to simplify T >: Sko(U) | Sko(U) to T >: Sko(U)

If those are the same Sko(U) (by reference equality), yes, otherwise it's not clear. I'll write Sko(i1, U) and Sko(i2, U) for two distinct skolems (that is, they're not reference-equal; think of i1, i2 as the respective object identity).

Now, T >: Sko(i1, U) admits T = Sko(i1, U) as solution, while T >: Sko(i1, U) | Sko(i2, U) doesn't, so simplifying the latter constraint to the former would add invalid solution. Following @odersky's summary, that's not okay for inference, but okay for GADTs (that is, for proving some constraints are unsatisfiable).

BTW, unsound inference results are usually caught later, but only if the violation is visible in the trees produced by inference. Conversely, constrainPatternType(SkolemType(subtp), tp) invents a Skolem and uses it in a constraint, but it's not obvious this constraint is added to to the actual tree.

@abgruszecki
Copy link
Contributor Author

@Blaisorblade I'm less and less clear on whether it's actually sound or not to simplify the bounds in such a manner. For typechecking, intuitively it should be acceptable to know less bounds, because then we will simply not accept some "morally valid" code. Note that this'd mean that it'd be sound to drop both skolems from the lower bound without losing soundness. For exhaustivity and useless pattern checks (which you may have meant by "some constraints are unsatisfiable"), we actually would not want to simplify the constraints in such a way (since T = Sko(i1, U) might be a constraint required by a subpattern, and knowing that T >: Sko(i1, U) | Sko(i2, U) lets us decide that a pattern is useless).

@Blaisorblade
Copy link
Contributor

I'm less and less clear on whether it's actually sound or not to simplify the bounds in such a manner. For typechecking, intuitively it should be acceptable to know less bounds, because then we will simply not accept some "morally valid" code.
For exhaustivity and useless pattern checks (which you may have meant by "some constraints are unsatisfiable"), we actually would not want to simplify the constraints in such a way (since T = Sko(i1, U) might be a constraint required by a subpattern, and knowing that T >: Sko(i1, U) | Sko(i2, U) lets us decide that a pattern is useless).

Not approximating is better than approximating if we can afford it, of course. So if you can avoid simplifying those skolems without performance loss, IMHO go ahead.

But you talk about soundness, and I disagree there. And I think the two scenarios you describe should work the same way.

Overall, we seem to have three scenarios:

  1. type inference, where the solver must search for sufficient constraints
  2. GADT typechecking, where the solver must search for necessary constraints; this isn't implemented yet, but let me assume it'll be.
  3. warnings about missing branches (exhaustivity) or useless branches (redundant patterns.

It seems the solver can only be in sufficient or (if implemented) necessary mode.
For scenario 3, we can pick which — since we must always approximate in one or the other direction. Neither direction breaks type soundness, but missing branches cause MatchError while extra branches are just extra code (which can be assert(false) if the user is sure).
But the warning says "match may not be exhaustive" — that is, the compiler couldn't prove the missing cases are impossible, so it approximate to "please add them".
That is, a pattern is requested unless we can soundly prove it's impossible. That's what I argued in #4029.

If we agree on this behavior for scenario 3, then the solver must search for necessary constraints, like scenario 2 — and if the resulting constraint is unsatisfiable, the original ones were as well and the considered branch is impossible.

The above implies the compiler might require unused cases. You seem to disagree.
But we can't avoid errors in both directions, so do you want to err in the other direction?

We should pick somehow and stick to that.

@abgruszecki
Copy link
Contributor Author

abgruszecki commented Feb 6, 2019

@Blaisorblade that comment was from two weeks ago, before we discussed everything thoroughly and I came to the conclusions from my presentation/report. The situation, as I understand it now, is as follows:

Type inference is concerned with finding a solution satisfying constraint C. We can instead solve a constraint C' (found, for example, by simplifying T >: U(x) | U(y)* to T >: U) iff a solution to C' is guaranteed to be a solution for C. This is what I understand to be the sufficient case - solving C' is sufficient for solving C.

* - allow me to use the syntax U(x) for denoting some singleton of type U, since singleton types other than skolems concern us as well.

GADT constraint inference is concerned with finding the strongest possible constraint D entailed by C, G, x: P, x.type <: S where C is the current constraint, G is the current environment, P, S are respectively types of the pattern and the scrutinee, x is an existential term of types both P and S. If D is entailed by the above, then we can return D' instead as long as any solution to D is also a solution to D'. This is the necessary case - solving D' is necessary for solving D. We can also say that D' is no stronger that D, which in a roundabout way brings us to the "less constraints is always sound for typechecking" argument.

To emit safe GADT exhaustivity warnings, we must decide if the constraint C entailed by a particular missing pattern is unsatisfiable. Yes, we want/should err on the side of caution and emit a warning even if we are uncertain that the pattern is necessary. Therefore, instead of solving UNSAT(C), we may solve UNSAT(C') iff any solution to C is guaranteed to be a solution to C'. Again, this is the necessary case.

What this means is that for GADT constraints, we can simplify T >: U(x) | U(y) to T >: U(x) or T >: U(y) or T >: Nothing. However! For constraint inference, this would mean that we don't accept code that is morally valid. For exhaustivity checking, this would mean that we emit spurious warnings. Both of those cases are bad from user perspective - we should have a good reason to not accept code that should compile or to bother users with useless warnings. Since there's so far no observable impact on performance, I'd just not simplify the constraint for the time being. There's some subtlety to that argument, but I'd very much like to first finish fixing the errors in this PR that I discovered and then to present everything wrapped up together.

@abgruszecki abgruszecki force-pushed the skolem-based-gadt-constraints branch 4 times, most recently from 1763131 to d0f48b8 Compare February 13, 2019 09:59
If we do not insert TypeVars into the bounds every time, then the only
time we need to remove them is when taking the full bounds of some type.

Since that logic now resides in ConstraintHandling and replaces all
TypeParamRefs internal to SmartGADTMap, we have no need to perform
expensive type traversals.

This removes the only reason for caching bounds.

The addition of HK parameter variance adaptation was necessary to make
tests/pos/i6014-gadt.scala pass.
gadtSyms/gadtContext became redundant, so they were removed.

The logic in typedDefDef was adjusted to only create a fresh context
when necessary.
The rationale for using a Skolem here is: we want to record that there
is at least one value that is both of the pattern type and the scrutinee
type.

All symbols are now considered valid for adding GADT constraints -
the rationale is that set of constrainable symbols should be either
selected on a per-(sub)pattern basis, or be the same for all matches.
Previously, symbols which were only appearing variantly in a scrutinee
type could be considered constrainable anyway because of an outer
pattern match.
Also rename the classes to better reflect their role,
and document and reorder definitions to make more sense.
The added symbols can have inter-dependencies in their bounds.
constrainPatternType is specific to term patterns, whereas in match
types there is a simple subtyping relationship between the pattern and
the scrutinee.

In the future, simply calling isSubType in GADTFlexible context would
likely be sufficient.
@abgruszecki
Copy link
Contributor Author

I added recaps of what we talked about in person to all the discussions. Compared to when we talked, the only non-trivial change not related to any discussion is adding GadtConstraint.addToConstraint(_: List[Symbol]), which mirrors ConstraintHandling#addToConstraint and correctly allows registering symbols with interdependencies. This removed some code duplication between all the places that needed to register multiple symbols in a loop.

@abgruszecki abgruszecki requested a review from odersky May 7, 2019 07:55
@abgruszecki abgruszecki assigned odersky and unassigned abgruszecki May 7, 2019
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM. Can be merged after final fixes.

true
} || op2

if (ctx.mode.is(Mode.GADTflexible)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should be moved out to a separate method (e.g. gadtEither), and there should be a doc comment explaining what it does, analogous to the original either.

@@ -208,7 +208,10 @@ class PlainPrinter(_ctx: Context) extends Printer {
else {
val constr = ctx.typerState.constraint
val bounds =
if (constr.contains(tp)) constr.fullBounds(tp.origin)(ctx.addMode(Mode.Printing))
if (constr.contains(tp)) {
val ctx0 = ctx.addMode(Mode.Printing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not leave it inline? ctx.addMode(Mode.Printing).typeComparer.fullBounds(tp.origin)

@odersky odersky assigned abgruszecki and unassigned odersky May 7, 2019
@abgruszecki abgruszecki merged commit 3ef991d into scala:master May 13, 2019
@abgruszecki abgruszecki deleted the skolem-based-gadt-constraints branch May 13, 2019 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants