Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling short-circuit operators #23

Open
gomoripeti opened this issue Jul 16, 2018 · 40 comments
Open

Handling short-circuit operators #23

gomoripeti opened this issue Jul 16, 2018 · 40 comments

Comments

@gomoripeti
Copy link
Collaborator

gomoripeti commented Jul 16, 2018

let's see the following correct function with spec

-spec f(atom() | integer()) -> true | integer().
f(AI) ->
    is_atom(AI)     %% Expr1
      orelse
        AI + 1.     %% Expr2
  • the type of variable AI in Exrp2 must be the subtype of number() :: integer() | float() as the first argument of +
  • the type of variable AI in Exrp2 is atom() | integer() based on the spec which is not a subtype of integer() | float()

How could Gradualizer realise that Expr2 is only executed when type of AI is integer()?

  • if type checking was supersmart it would realise that if NOT is_atom(AI) => typeof(AI) == (atom() | integer()) & (not atom()) == integer() - however this is not always possible

Let's see a more generic example

-spec f(input()) -> any().
f(A) ->
  cond(A)
    orelse
      body(A).

-spec cond(input()) -> boolean().
-spec body(required()) -> any().

In this case what the type checking could do is realise that body(A) is only executed in a subset of cases therefor type of A in body(A) is just a subset or subtype of input(). In case input() & required() == none() (the intersection of the two types is empty) then we can say that body(A) will always fail, but otherwise we cannot know (and have to leave type checking at runtime). This method requires to be able to calculate the intersection of two types.

What is the right way to handle these kind of constructs? (if, case, andalso, orelse, multiple function clause)

@zuiderkwast
Copy link
Collaborator

zuiderkwast commented Jul 18, 2018

Just a comment on orelse and andalso. @josefs regards them as boolean operators where both operands should be booleans. Otherwise it's a type error. (@josefs: Correct me if I'm wrong!) I regard them as conditional constructs where only the first operand must be boolean. I think the latter has become common in Erlang code during the last years. There is even a line in this project where it is used like this:

 File =/= undefined andalso io:format(...),

Perhaps we should decide about which semantics are desirable.

@gomoripeti
Copy link
Collaborator Author

I also regard them as conditional constructs, as that is how the Erlang language (compiler/VM) handles them. This is also how I implemented checking them in #24, exactly to address the self-gradualizing failure on the line that you highlighted.

Nevertheless the problem domain in this ticket is more generic, how to handle that the type set of a variable is possibly reduced within a conditional clause body relative to its global type.

@josefs
Copy link
Owner

josefs commented Jul 19, 2018

Type refinement would be a very powerful feature to have. But it's also non-trivial to implement and it's unclear to me how important it is. So I think the right way forward is to first try and get the gradualizer into a state where people can use it. Perhaps a version 1.0. Then we can more easily see what features people need and miss and then start to tackle them in the right order.

As for my views on the boolean operators, @zuiderkwast , you are correct. They are boolean operators and both arguments should be boolean. This has to do with the underlying philosophy that I have of the gradualizer and how it's distinct from the dialyzer. The dialyzer tries to only find errors when they are guaranteed to happen. I consider this akin to how you put it @gomoripeti :

as that is how the Erlang language (compiler/VM) handles them

With the gradualizer I hope to provide a different experience, a typing discipline, which not only thinks about how the compiler and VM works, but is based more on the programmers intent, much like a traditional type system. One of my main motivations for starting to work on the gradualizer was to provide a typesystem-like experience for Erlang. I've seen many people being disappointed when the dialyzer didn't complain about things that a traditional typesystem would catch. Such as passing something non-boolean to a boolean operator.
As I pointed out in #24, if people want to pass something other than a boolean as the second argument then we have the type any() for that. So we have a story for allowing to use lazy boolean operators as conditional expressions, and it's even the default! But once the programmer starts to add types, the operators ought to be treated like a conventional typesystem.

@gomoripeti
Copy link
Collaborator Author

thank you @josefs, enlightening the philosophy and attitude of Gradualizer, it helps me when contributing.

But I dont want to derail this discussion with lazy ops. Keeping with real conditional constructs:

I'd like to argue that it is important: I actually hit this problem in practice (even with self-checking Gradualizer), let's say its second priority. (After handling all language constructs in the simplest way so that there is no crash any more). The practical problem here is that Gradualizer would emit a false warning (which is fine) but Gradualizer stops at the first warning per function so it might not be able to detect other more important issues. (It must stop of course)

This is also an interesting problem, so without aiming to implement something immediately I'd be happy to discuss this non-trivial problem domain (as a naive/newbie typer). Maybe some papers around the topic. (So by the time its time to implement it there would be clear discussion and understanding around it)

@josefs
Copy link
Owner

josefs commented Jul 19, 2018

Ok, I certainly don't mind discussing these topics in general.

I haven't spent very much time thinking about type refinement but if you want to dig into that subject I think that you should at the very least read Sums of Uncertainty: Refinements Go Gradual, a paper from the POPL conference last year. I haven't yet read the whole paper but I assume we will be able to borrow quite a lot from that paper.

@zuiderkwast
Copy link
Collaborator

The general refinement is topic is interesting and I want to look at the paper when I have time. Thanks for the link!

For the other topic, the lazy boolean ops, I agree with @gomoripeti that this is an interesting topic (although not first priority) as well and I like @josefs's points about the "programmer's intent" as a philosophy for the type system. In this case though, I suspect that what we can expect about the intent is not that clear. There is a trend in using them with non-booleans. I looked up the Erlang reference manual, where these operators are found not under boolean expressions but just under Short-Circuit Expressions. There is even an example with non-booleans. Quotes:

  • "From Erlang/OTP R13A, Expr2 is no longer required to evaluate to a Boolean value. As a consequence, andalso and orelse are now tail-recursive."

  • "Example 1:

    case A >= -1.0 andalso math:sqrt(A+1) > B of

    This works even if A is less than -1.0, since in that case, math:sqrt/1 is never evaluated.

After reading this, I think that this is a matter of style and it would be controversial to regard it as a type error, even in a spec'ed function. I doubt that we have the power to dictate what is good style.

We could start by suggesting changes to the reference manual, or by discussing it in the erlang-questions mailing list. It fits perhaps as a rule for the style checker Elvis.

@josefs
Copy link
Owner

josefs commented Jul 21, 2018

Thanks @zuiderkwast for the reference to the reference manual. I read that section very differently from you though. Here's my understanding: they've made the short-circuiting boolean operators tail-recursive in order to make them faster. In order to do that, they eliminated the checking of the type. In the second half of Example 2 they are just showing what is possible now, not what is good style.

I'm not sure why you're including Example 1 here as the second argument of andalso is indeed boolean.

@zuiderkwast
Copy link
Collaborator

Oops! I missed the > B part. :-) So, I guess you're right. This was slightly embarrassing, hehe.

@gomoripeti gomoripeti changed the title Type refinement in conditional expressions Handling sort-circuit operators Jul 22, 2018
@gomoripeti
Copy link
Collaborator Author

I repurposed this ticket to host discussion about handling sort-circuit operators (as topic is focusing on that anyway :) )

@gomoripeti
Copy link
Collaborator Author

I now also read the short-circuit chapter and it really seems just an optimisation, and not encouraging the use of any type as second arg. (There is no such example in the doc indeed).

However it is documented that the second arg "is no longer required to evaluate to a Boolean value" and there are certainly programs out in the wild that utilise this (I realise I was the one who added such use to gradualizer - it is a natural use for me) so Gradualizer should support such programs in some way. @josefs, you already described what is the way: the programmer should explicitly add type annotation showing such an intent.
However I'm not sure what should count as annotation expressing such an intent, could you give 1-2 examples?
I could more easily imagine the opposite behaviour: by default Gradualizer allows any type as 2nd arg, but the programmer can type annotate the result of the andalso/orelse expression to always expect boolean(). This fits better into gradual typing: by default any() is the default type and programmer can explicitly add types if wants to restrict that. It sounds to me less intuitive to have a default type, and with explicit annotation one can lift such restriction. (I understand though that this does not reflect the intent that boolean arg should be encouraged)

@josefs
Copy link
Owner

josefs commented Jul 22, 2018

I'm confused by what you write. Non-boolean values as second argument are already allowed as the default behavior in Gradualizer! Just don't write any type specs! We seem to agree that this should be the default behavior.

However, if there are type specs and the programmer still wants to use non-boolean values then these values must be returned from a function which has return type any().

Here are three examples which pass the gradualizer, despite using non-boolean values as second argument to andalso:

-spec f1() -> boolean().
f1() ->
  true andalso g1().

-spec g1() -> any().
g1() -> 3.

-spec f2() -> boolean().
f2() ->
  true andalso g2().

g2() -> 5.

f3() ->
  true andalso g3().

-spec g3() -> any().
g3() -> apa.

Note that the second argument to andalso is a function which has any() as return type.

@zuiderkwast zuiderkwast changed the title Handling sort-circuit operators Handling short-circuit operators Jul 22, 2018
@gomoripeti
Copy link
Collaborator Author

Thanks for the examples. I was confused by the inferred type of literals, but that is clarified in #26.
So basically if one would like to use any non-boolean expression (to be precise any expression that has non-boolean and not-any() inferred type) as 2nd arg, it should be wrapped in a function.

@zuiderkwast, thanks for starting a little poll on the erlang mailing list :)

@gomoripeti
Copy link
Collaborator Author

gomoripeti commented Jul 23, 2018

just an interesting note: I sometimes use andalso/orelse in the body of match-specs where conditional expressions are not allowed eg:

dbg:fun2ms(fun(_) ->
  message( element(1,caller()) =/= mymod andalso caller() )
end)

this would be equivalent with the below (but that is an illegal match-spec fun)

dbg:fun2ms(fun(_) ->
  case caller() of
    {mymod, _, _} -> message(false);
    C -> message(C)
  end
end)

this is the only (very exotic) scenario I can imagine when the non-boolean return value of andalso is actually used, and not just for the side-effect.

And actually this is converted to a match-spec term by a parse-transform without any op or fun objects so in reality this is not real code.

@Zalastax
Copy link
Collaborator

Non-boolean values as second argument are already allowed as the default behavior in Gradualizer! Just don't write any type specs!

I find this statement controversial. To me, the static types are the truth even if the types are less permissive than possible. Dynamic code that calls a function with values outside its domain is wrong. It's of course not possible to capture every dynamic pattern in static types but the ambition has to be that idiomatic code should be typeable. Deciding on andalso only accepting booleans is a statement (that's fine!) and we need to argue why we take that stand. Not writing types is also going to get more difficult as the type checker gets smarter and can infer more types.

I think it would be a good idea to go for the simpler alternative if this is not yet a common Erlang pattern and there exists a good alternative, otherwise I would want to follow Flow and TypeScript where && and || basically have the type (a: boolean, b: T) => boolean | T.

@josefs
Copy link
Owner

josefs commented Sep 29, 2018

@Zalastax, I'm sorry but I don't quite understand what you're arguing for. Would you care to elaborate on your position. What behaviour would you like Gradualizer to have in this case?

As for myself, I'm starting to come around in the issue. I now have a bit of experience with running Gradualizer on existing code bases. Based on that experience I think the right thing for Gradualizer is to be very lenient and allow for common Erlang idioms. I haven't yet bumped against code which uses non-boolean values as the second argument to andalso, but if there are code bases out there which use this idiom heavily then Gradualizer might report so many errors as to be unusable for them. That would be a real shame.
I may not like this idiom but there is no point of depriving people who use the idiom from using Gradualize. I'm not going to rush to change this behaviour though.

@Zalastax
Copy link
Collaborator

@josefs I was not really arguing for any side. Instead I was stating that I don't think we should lean on the fact that it's possible to opt out from type checking when deciding on the semantics.

I want to capture idiomatic Erlang code as well but if a pattern is troublesome and not well established I think we should allow ourselves to make a statement. This particular idiom seems fairly harmless and it has a following so I'm leaning towards changing our behaviour but I could go any way.

@UlfNorell
Copy link
Collaborator

In addressing #72 (PR: #83), I changed the type of andalso and orelse to precisely what @Zalastax suggested: (a: boolean, b: T) => boolean | T.

@gomoripeti
Copy link
Collaborator Author

(just for the record, sorry I did not add it earlier)

@zuiderkwast run the question through the erlang mailing list, which became a quite lengthy discussion (thread starts here: http://erlang.org/pipermail/erlang-questions/2018-July/096024.html)

The short outcome was that andalso/orelse should work on bools and it is mostly when prototyping/temporary quick fixing/debug logging when it is used otherwise.

Also historically the runtime check if second arg is boolean was only removed for optimisation reasons, and not in order to allow this broader usage.
(eg this function became tail recursive f([H|T]) -> is_integer(H) andalso f(T); f([]) -> true,)

summary from Viktor from the mailing list:

Especially, I liked this point from Dmitry Kolesnikov:

Long time ago, I’ve used andalso syntax but it never stays longer at
code repository due to readability. It has been refactored to
function(s) with guards or pattern match.

I think it is the process from prototyping to maintained code.
Refactoring involves gradually adding names to things by introducing
functions, adding specs and rewriting hacks into proper solutions. A
spec with a return type like false | ok | {error, Reason} would make me
reconsider the andalso trick. I guess this process gradually makes the
code change style from dynamically typed style (LISP, Perl) to
statically typed style (ML, Haskell). I find this interesting.

Although I was initially supporting the broader usage I was convinced that by default Gradualizer could check for the stricter variant (and adding an option to switch to relax the spec).
Now that the check is relaxed already, I don't have strong feelings about changing the behaviour again though. :)

@gomoripeti
Copy link
Collaborator Author

I am afraid as of master @ d74675a we have a bit inconsistent implementation. While type checking accepts any type as second argument (only needs to be subtype of the result type)

{VarBinds2, Cs2} = type_check_expr_in(EnvArg2, ResTy, Arg2),

type inference still expects the second argument to be boolean

{Ty2, VB2, Cs3} = type_check_expr(Env#env{ venv = UnionVarBindsSecondArg(Env#env.venv,VB1 )}, Arg2),
case subtype(Ty2, {type, P, bool, []}, Env#env.tenv) of

Moreover I think we can restrict the type from boolean() | T to

-spec andalso(boolean(), T) -> false | T.
-spec orelse(boolean(), T) -> true | T.

@UlfNorell
Copy link
Collaborator

It would be

-spec andalso(boolean(), false | T) -> false | T.
-spec orelse(boolean(), true | T) -> true | T.

but yes, this is a straightforward change.

@zuiderkwast
Copy link
Collaborator

So the return type should be the same as or a subtype to the 2nd operand? Is the point of this that we need to be able to push the return type into the 2nd operand when checking or is it some other theoretical benefit?

@UlfNorell
Copy link
Collaborator

Well if T = true (i.e. checking against boolean()) it would be strange to require the second argument to andalso to always be true... But also we're pushing types in and we don't have a subtraction operation on types.

@zuiderkwast
Copy link
Collaborator

We will need subtraction to implement type refinement, don't we? At least a simplified one, the simple case of union of single atoms.

@UlfNorell
Copy link
Collaborator

Sure. The first point still stands though: we shouldn't give a type error on B andalso false :: boolean().

@zuiderkwast
Copy link
Collaborator

I still don't think I understand completely. Isn't it enough to require that the 2nd operand is a subtype of the result?

I though A andalso B is syntactic sugar for case A of false -> false; _ -> B end and that we could type check it in the same way. (At least that it is possible. Not saying we should.)

@UlfNorell
Copy link
Collaborator

The second operand is an expression that should be checked against some type, not a type that should be a subtype of some other type.

We can check A andalso B the same way as you would the case (although the actual semantics has a true in the second case), namely check that false :: R and B :: R for the expected type R.

Another way of writing the type

-spec andalso(boolean(), false | T) -> false | T.

is (in pseudo-erlang)

-spec andalso(boolean(), R) -> R when false :: R.

@Zalastax
Copy link
Collaborator

Zalastax commented Nov 19, 2018

Don't we get B andalso false :: boolean() automatically via widening? The type of that expression is false but that's a subtype of boolean() so no cow on the ice.
I argue that the spec could actually be [edit: corrected syntax to what I think you mean / Viktor]

-spec andalso(true, T) -> T;
             (false, T) -> false.

This spec leaks implementation details but gives the most exact static type possible.

@UlfNorell
Copy link
Collaborator

This is indeed the most precise type, but I don't think we have the ability to check it at the moment.

@Zalastax
Copy link
Collaborator

Might be so. What about widening then? Don't we get away with B andalso false :: false ∧ false <: boolean()? If not, let's go for your suggested type until we can check what I suggested.

@UlfNorell
Copy link
Collaborator

I am little confused about what we're talking about here to be honest. The way we would actually check andalso is:

subtype(false, R)
check(A, boolean())
check(B, R)
--------
check(A andalso B, R)

which corresponds to (boolean(), false | T) -> false | T.

What is the widening alternative?

@Zalastax
Copy link
Collaborator

After pondering this a bit more I believe that it makes no difference to what programs type check. The only difference is whether we let widening happen to the type of the result or the type of the right argument.

My intuition would have it as

subtype(false, R)
subtype(T, R)
check(A, boolean())
check(B, T)
--------
check(A andalso B, R)

or, assuming that all checks are performed with "is compatible" semantics

check(A, boolean())
check(B, T)
--------
check(A andalso B, false | T)

@UlfNorell
Copy link
Collaborator

In the first case how would you pick T? It really needs to be R since B might have precisely the type R.

The second case is just wrong. If T is true (i.e. we're checking against boolean()), this is saying that we should require check(B, true).

@Zalastax
Copy link
Collaborator

I think there's some theory / smart technique that I don't know about here and it's unfortunate that I'm not able to communicate it better. Thanks for explaining the formalities! As you say, if we're checking against boolean(), T would indeed have to be true. In my mind I considered boolean() to still be a valid type for T since false | boolean() :: boolean() but I guess that makes type checking impossible to implement.

To elaborate on what I actually was going for: it's completely fine that andalso(true, false) :: false since it will be lifted to boolean() if used in a context where that's needed. I've made an example in TypeScript:

function andalso<T>(A: boolean, B: T): false | T {
    if (A) {
        return B
    } else {
        return false
    }
}

const X: false = andalso(true, false)
const Y: boolean = X

@UlfNorell
Copy link
Collaborator

UlfNorell commented Nov 20, 2018

it's completely fine that andalso(true, false) :: false

The typing rule I suggested delivers this. We would check (since A = true, B = false, and R = false)

subtype(false, false)
check(true, boolean())
check(false, false)

which are all clearly satisfied.

The only thing we don't get is andalso(false, X) :: false for an X that doesn't have type false.

@Zalastax
Copy link
Collaborator

Good! I'm still confused why the false is needed in the right argument but I trust you.

@UlfNorell
Copy link
Collaborator

You are not confused about the fact that we should allow the right argument to be false, I hope.

I guess what's causing the confusion is writing the type as

-spec andalso(boolean(), false | T) -> false | T.

writing it (as I did above) as

-spec andalso(boolean(), R) -> R when false :: R.

is closer to what the type checker would actually do, and is maybe less confusing, but equivalent.

The key to understanding the former type is to note that the T is universally quantified, i.e. andalso has this type for any value of T, not just some particular value of T that makes sense in a given situation. For instance by taking T = true we get

-spec andalso(boolean(), boolean()) -> boolean().

If we take T = none() we get (since false | none() is the same as false)

-spec andalso(boolean(), false) -> false.

We can also take T = error and get

-spec andalso(boolean(), false | error) -> false | error.

In each of these cases it would be wrong to drop false from the type we require of the second argument. If B :: boolean(), B andalso false should check against boolean(), false, and false | error.

@RumataEstor
Copy link

Does it mean that andalso(true, number()) will fail to typecheck as false: number() does not hold?

@UlfNorell
Copy link
Collaborator

It fails to type check as a number: true andalso 15 does not have type number() since we don't consider the value of the first argument. It does type check against the type number() | false though.

@RumataEstor
Copy link

@UlfNorell So if I have

-spec get_value() -> number().
get_value() -> 42.

-spec test(boolean()) -> false | number().
test(Pred) -> Pred andalso get_value().

will it also fail?

@UlfNorell
Copy link
Collaborator

No this is accepted. We will check (R = false | number() in the rule above)

subtype(false, false | number())
Pred :: boolean()
get_value() :: false | number()

all of which hold.

UlfNorell added a commit that referenced this issue Nov 22, 2018
Essentially
```erlang
-spec andalso(boolean(), false | T) -> false | T.
```
and same but with `true` for `orelse`.

cc #23
UlfNorell added a commit that referenced this issue Nov 23, 2018
Essentially
```erlang
-spec andalso(boolean(), false | T) -> false | T.
```
and same but with `true` for `orelse`.

cc #23
berbiche pushed a commit to berbiche/Gradualizer that referenced this issue Feb 9, 2021
Essentially
```erlang
-spec andalso(boolean(), false | T) -> false | T.
```
and same but with `true` for `orelse`.

cc josefs#23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants