Solving local constraints at each function call to a polymorphic function #553
Hi, @xxdavid! Interesting findings! I haven't read Pierce and Turner's Local Type Inference completely, so I have only a superficial understanding of the paper (mostly what they achieved, not how exactly). However, your reasoning seems correct and I'm sure there's room to improve the constraint solver.

What I know for certain is that the constraint solver code merged to the master branch was code that I salvaged from an old PR by @josefs which, according to him, never got completely finished. I rebased it onto the then-current master, tried running it on some examples, and it turned out it does report some errors which were previously false negatives, so I cleaned it up a bit, added some tests, and we merged it. Otherwise, it would probably have bitrotted even more, whereas now it provides at least a little more feedback. It could definitely be improved, though.

On top of what you said, I think there's at least one more place where solving constraints could help make the message more precise: https://github.com/josefs/Gradualizer/blob/master/src/typechecker.erl#L3349-L3352. There might be more, but I don't have examples at hand.

In any case, good thinking!
Hi @erszcz! Thank you for your perspective, and sorry for taking so long to respond. I think it was good that you saved @josefs' solver code, cleaned it up, and merged it to master, even though it wasn't complete. Without that, we probably would not have a solver at all.

In the previous weeks I was trying to get a more complete picture of how Gradualizer works with polymorphism (I also came to an interesting realisation, which I will write about later). And I still think the Local Type Inference approach is doable in our case. I also agree that the LTI approach should fix the case you linked, thanks for pointing it out!
Hi @xxdavid!
Sounds great! I'm pretty sure such an addition will be welcomed by everyone 👍
I'm quite intrigued now! I'm looking forward to learning about your findings. I have a few thoughts to share, too; I just have to find some time to write them down.
Hi! So I've been researching Local Type Inference (LTI) for the past months, and here's what I found out.

In essence, the LTI algorithm goes like this. When inferring the type of an application of a polymorphic function, you first generate constraints on the involved type variables by declaring that the actual arguments should be subtypes of the function parameters. For each type variable, you maintain a single lower and a single upper bound (these are the constraints). You start with a lower bound equal to the bottom type and an upper bound equal to the top type, and tighten them as constraints are generated. Once all constraints for the call are collected, you pick the substitution that satisfies the bounds and preserves the most information, and apply it right away.
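To give a tiny, hypothetical illustration of how such bounds arise (my own sketch, not an example from the thread), consider an identity function:

```erlang
%% A polymorphic identity function.
-spec id(A) -> A.
id(X) -> X.
```

For the call `id(42)`, the argument generates the constraint that the type of `42` be a subtype of `A`, raising the lower bound of `A` to (roughly) `integer()`, while the upper bound stays at the top type. The substitution preserving the most information is then `integer()/A`, so the call gets the type `integer()`.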
The LTI mainly comprises these components: the subtyping relation, the join (least upper bound), the meet (greatest lower bound), and constraint generation, together with constraint solving and the application of the resulting substitution.
The first four depend on the set of types, whereas the last two (mostly) do not. This means that if you can provide the first four relations for your set of types, you can probably have LTI. Therefore unions, tuples, maps, and things like that should pose no problem in implementing the LTI, as we already have three of the four implemented in Gradualizer. In our case, the subtyping and constraint generation relations are implemented by the same code.

A thing that is much harder to add is gradual typing. I haven't found any paper combining LTI with gradual typing. I've been playing with it quite a lot myself, and in the end I came up with a trivial extension of the original LTI algorithm that works nicely in most cases and, as it turned out, is also used in practice.

A simple approach to adding gradual typing to the LTI would be (as also suggested by @zuiderkwast here) to just ignore `any()` when generating constraints. Another possible and trivial change to the LTI is to use consistent subtyping in all places where subtyping was originally used, and to treat `any()` as consistent with every type.
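To make the single-lower/single-upper bound bookkeeping concrete, here is a toy, self-contained sketch of mine (not Gradualizer code) over a tiny hypothetical lattice with bottom `none`, top `term`, and incomparable `boolean` and `integer`:

```erlang
-module(bounds_sketch).
-export([solve/1]).

%% Toy join (least upper bound) and meet (greatest lower bound).
join(none, T) -> T;
join(T, none) -> T;
join(T, T) -> T;
join(_, _) -> term.   %% incomparable types join to the top

meet(term, T) -> T;
meet(T, term) -> T;
meet(T, T) -> T;
meet(_, _) -> none.   %% incomparable types meet to the bottom

%% Fold {lower, T} / {upper, T} constraints into a final pair of bounds,
%% starting from the trivial bounds {none, term}.
solve(Constraints) ->
    lists:foldl(
      fun({lower, T}, {L, U}) -> {join(L, T), U};
         ({upper, T}, {L, U}) -> {L, meet(U, T)}
      end,
      {none, term},
      Constraints).
```

For instance, `bounds_sketch:solve([{lower, boolean}, {upper, integer}])` returns `{boolean, integer}`, and a checker would then reject the instantiation because `boolean` is not below `integer` in this lattice.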
This approach is also taken by eqWAlizer, as we can see in the docs (1, 2) and in the code.

A problem with this approach is that it lacks theoretical foundations; that is, we don't know (formally) how good the modified algorithm exactly is. The property of always inferring the most informative type, proved in the paper, no longer makes sense, as we are not dealing with a partial order here. For example, several different types could all be valid outputs according to the original property when inferring the type of one and the same expression.

It should also be noted that this method is a bit weaker than carrying all the constraints throughout a function and solving them only at the end, so in some cases we would infer a less precise type.

However, I think that having the LTI is still worth it. First and foremost, it enables us to provide better error messages which are local (each variable has a concrete type) and which can pinpoint the problematic line (as shown in the first post). Second, I think it would greatly simplify our codebase, as constraints would probably be needed in only one place.

What do you think? If you agree, I will start with the implementation. I originally wanted to do the theoretical part first (and prove how good/strong the gradual LTI is) and then implement it. But I feel a bit stuck now, so I would rather reverse the order, given that eqWAlizer has already successfully implemented it in a similar way.
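As a hypothetical illustration of the consistent-subtyping variant (names and code are mine, not from the thread):

```erlang
-spec id(A) -> A.
id(X) -> X.

-spec g(any()) -> integer().
g(Dyn) ->
    %% Under consistent subtyping, any() is consistent with A, so the
    %% bound recorded for A is any(); id(Dyn) therefore gets the type
    %% any(), which is then (consistently) accepted where integer()
    %% is expected.
    id(Dyn) + 1.
```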
Hi! I have some progress on the LTI implementation to share. I've implemented the core of LTI in Gradualizer, and while it still needs quite a lot of refactoring and tuning for edge cases, it already behaves well in some cases. I am very happy with how accurate the error messages it now produces are. Take a look:

```erlang
1 -module(test_lti).
2
3 -spec id(A) -> A.
4 id(X) -> X.
5
6 f() ->
7     Answer = id(42),
8     atom_to_list(Answer).
```

Old:
New:
Notice that there are no type variables involved and that the precise location of the error is printed. That's because at line 8 we already know that `Answer` has the concrete type `integer()`, not a type variable.

The second kind of polymorphism-related error may be in the call itself. Here I came up with a new error message (not fully polished yet), and I really like how informative it is:

```erlang
1 -module(test_lti).
2
3 -spec map([A], fun((A) -> B)) -> [B].
4 map([], _F) -> [];
5 map([X | Xs], F) -> [F(X) | map(Xs, F)].
6
7 g() ->
8     map([1, 2, 3], fun atom_to_binary/1),
9     ok.
```

Old:
New:
So I think we are headed in the right direction. I'll continue working on it, and I expect that in a few weeks I'll have it ready for code review. 🙂
Wow! These results are awesome 🎉 Keep up the great work, @xxdavid! |
Hi!
I've read the Local Type Inference paper by Pierce and Turner, and I came to an interesting realisation I would like to share.
In the paper (mainly in chapter three), they discuss a way to infer missing type application annotations / type arguments (so that instead of `id [Int] (3)` we can write just `id (3)`, provided that `id` is of type `∀X. X -> X`). Their approach is local in the sense that they only introduce a new inference rule for function applications without the type application annotation; the rule computes the best substitution (intuitively, the one which preserves the most information, based on locally gathered constraints that are immediately solved) and immediately uses that substitution to substitute type variables in the type of the function call expression.

This is in contrast with what we do in Gradualizer, where we gather all the constraints while checking a function, and only then, after we have checked the function, do we solve them. If the constraints cannot be solved, we conclude that somewhere in the function there was a polymorphic type error (usually that the lower bound of some type variable is not a subtype of some other type). If they can be solved, we are happy and discard the found substitution.
Let me illustrate it on a simple example:
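A self-contained sketch consistent with the types used in the walkthrough (my reconstruction, not the original snippet):

```erlang
-module(example).
-export([f/0]).

%% first/1 is polymorphic: ∀A. {A, any()} -> A.
-spec first({A, any()}) -> A.
first({X, _}) -> X.

%% add_one/1 takes only integers.
-spec add_one(integer()) -> integer().
add_one(N) -> N + 1.

-spec f() -> ok.
f() ->
    Tuple = {true, #{}},   %% Tuple :: {boolean(), map()}
    X = first(Tuple),      %% lower bound on A: boolean()
    add_one(X),            %% upper bound on A: integer()
    ok.
```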
Gradualizer (with `--solve_constraints`) works like this. The type variable `A_typechecker_3742_1` is the instantiation of `A` in `first/1`. When checking the call `first(Tuple)`, we record that `A_typechecker_3742_1` must have the lower bound `boolean()` (because we pass in `Tuple` of type `{boolean(), map()}`). We then set `X` to have the type `A_typechecker_3742_1`. And when checking `add_one(X)`, we also record that `A_typechecker_3742_1` must have the upper bound `integer()` (because `add_one/1` takes only integers). After checking that `ok <: ok`, we have checked the whole function and move on to solving the constraints. Only now do we discover that the constraints cannot be solved (as `boolean()` is not a subtype of `integer()`), and we raise a type error.

On the other hand, with the approach described in the paper, it would go something like this.
When checking the call `first(Tuple)`, we find out that `first/1` is polymorphic (it is of type `∀A. {A, any()} -> A`) and thus we want to find a suitable substitution. (I'll skip how the substitution is found.) We compute that the best substitution is `boolean()/A`, as it preserves the most information (i.e., the resulting type is the smallest possible; for instance, `atom()/A` would also do, but it would preserve less information). We immediately apply the substitution and declare the `first(Tuple)` expression to be of type `[boolean()/A]A`, which is `boolean()`. The variable `X` then also gets the type `boolean()`. When we get to check the call `add_one(X)`, we check directly whether `boolean()` is a subtype of `integer()`, find that it is not, and throw a type error right away! The resulting message is much more informative than saying that somewhere in the function there is a polymorphic type error of some kind (particularly when the function has tens of lines and multiple heads).
So my question is: why don't we do it like this?

@josefs states in #122 that the constraints have to be solved after we've traversed the whole function, but he doesn't say why. Do you remember why, @josefs? Is it related to the issue with `any()` mentioned in that comment?

And what do you think, @erszcz @zuiderkwast? Am I missing something?