Syntactic function arity #12236
Conversation
In thinking about this PR, I'm unsure how best to help with review. I'm quite happy to review this -- but I imagine someone with more experience within the type checker should also review... but maybe I can do a higher-level pass? In the end, we know review is a bottleneck and wish to lower the burden of dealing with large patches like these. So: how can I best help here? (Full disclosure: I helped choose this as a task for @ncik-roberts to work on, and I offered feedback on aspects of the implementation. I have not done any detailed code review here, though, and I have not contributed any of the code.)
I don't follow the motivation for the backward-compatibility-breaking example:

```ocaml
# type (_, _) eq = Eq : ('a, 'a) eq
# let bad : type a. ?opt:(a, int -> int) eq -> unit -> a =
    fun ?opt:(Eq = assert false) () x -> x + 1
# let x : 'a. 'a = bad ()
```

Here the syntactic arity of …
"Syntactic arity" as I describe in this PR is a property of the expression language, not the type language. So the syntactic arity of The benefit of basing arity on the expression language is that effect ordering and runtime arity is completely predictable. You're right that the current version of OCaml avoids unsoundness by computing an arity of 2 here, but that arity is not syntactically apparent. The number of arrows in the type can't be used to figure out the arity in general (this statement is true for both trunk and for this PR): (* this has arity 2 in both trunk and this PR *)
let arity2 : type a. a -> a lazy_t -> bool -> a =
fun x (lazy y) ->
(fun b -> if b then x else y) That's a clarification of what syntactic arity means but doesn't respond to your point. I'll mention that it's a small change to rewrite # let good : type a. ?opt:(a, int -> int) eq -> unit -> a =
fun ?opt:(Eq = assert false) () -> (fun x -> x + 1)
val good : ?opt:('a, int -> int) eq -> unit -> 'a = <fun> Requiring this rewrite in these cases is certainly a trade-off against the benefits of arity. But "these cases" are limited to ones where the user is hiding an arrow in the type of the function being defined using a GADT pattern bound by that same function's parameters. Your judgement is helpful in gauging the impact of this. My feeling is that this would be very rare. |
@Octachron asked me for my opinion on this specific matter (the unification with a function type to guard against unsoundness). Here is my understanding/opinion right now. The unification can be understood as the fact that we are in fact type-checking an eta-expanded version of the program, that first takes several arguments (according to the syntactic arity of the function) and then applies the patterns to them. (This expansion is explained in the RFC document.)

```ocaml
(* before *)
let f ?opt:(p1 = def1) p2 p3 = body

(* after expansion *)
let f ?opt:arg1 arg2 arg3 =
  let p1 = match arg1 with
    | None -> def1
    | Some x -> x
  in
  let p2 = arg2 in
  let p3 = arg3 in
  body
```

Another way to prevent unsoundness would be to perform the expansion and type-check the result normally. Is it equivalent to the approach that was implemented, with a unification after the fact? (This is not obvious to me in presence of Pexp_newtype abstractions etc.) Otherwise, would it maybe be possible to formulate the check in a way that would more clearly relate to the obviously-sound expansion? Then there is the question of what we think of the expansion and the restriction it imposes on the typability of OCaml programs -- it rejects more programs. My current thoughts: …
An example of (2) would be:

```ocaml
type (_, _) eqtest = Equal : ('a, 'a) eqtest | Nope : ('a, 'b) eqtest
let ex2 : type a . (a, unit -> unit) eqtest -> a = fun Equal () -> ()
```

An example of (3) was also given:

```ocaml
type (_, _) eq = Equal : ('a, 'a) eq
let ex3 : type a . (a, unit -> unit) eq -> a = fun Equal () -> ()
```

Personally I certainly agree that rejecting examples in category (1) is fair game, and I would agree that rejecting (2) is also reasonable, but I do find rejecting (3) more frustrating (I suspect this is the category that @Octachron finds problematic). I can see arguments going both ways: …

A weakness of the "accepting (3)" position is that the "clear criterion" very much depends on typing: we cannot tell whether a pattern match is exhaustive just from the definition. The RFC as proposed is purely syntactic, which has the force of simplicity.
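For concreteness, a category-(3) definition like `ex3` can be rewritten to typecheck without splitting the hidden arrow across the GADT pattern. This is a hypothetical sketch (the name `ex3'` is invented here, not taken from the thread):

```ocaml
type (_, _) eq = Equal : ('a, 'a) eq

(* The GADT equation [a = unit -> unit] is in scope in the body, so the
   returned closure can be given type [a]. The function has syntactic
   arity 1: no arrow needs to be visible before the match happens. *)
let ex3' : type a. (a, unit -> unit) eq -> a = fun Equal -> (fun () -> ())

let () = assert (ex3' Equal () = ())
```

This mirrors the `bad`-to-`good` rewrite discussed earlier in the thread: the extra parameter moves into an explicit inner function.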
I would say that the implementation is extremely close to just doing the expansion, but to be exhaustive, I would add two caveats:
The caveats seem minor to me; I discuss them below. My response here doesn't help decide when we should do this expansion, but it does support a notion that the restriction on typability is a question about the expansion as opposed to a question about the current implementation.
I think this design point is a good one to debate. While I tend to lean toward simplicity as an important factor, it is annoying that simplicity and backward compatibility pull in opposite directions, and others are better suited than I in judging the weight of this backward incompatibility. It strikes me, though -- and in conversation with @ncik-roberts -- that review on the correctness of this patch can be done even without a decision on this particular design point. Nick estimates that changing the behavior in @gasche's case (3) would just add complexity, meaning that time spent reviewing the patch as-is would all be well spent. I am happy to be this reviewer (as I offered above), but I wanted to check that my review would be seen as sufficiently independent here before putting the time in. What do you think?
I think that applying the expansion (or an equivalent type-checking transformation) unconditionally, at the cost of breaking typability in case (3) (as currently proposed in this PR) is the better choice here. Indeed, I find the degree of sophistication involved in accepting (3) unreasonable; my understanding is that we would have to type-check each pattern in
I find this too complex, and it has a backtracking flavor that is unpleasant. (Also, checking whether a pattern is exhaustive is complex and we occasionally get it wrong. Having this complex check determine the desugaring / operational behavior of the code does not feel great.) Unfortunately, I also have the impression that detecting this situation -- a function constructor is ill-typed because the current type is not known to be a function type but would be if we had not delayed the introduction of some GADT equations -- is also difficult, so I am pessimistic about our ability to write a targeted error message in this situation.
This is a partial review -- I stopped in `type_function`. More next week.
@gasche The current approach proceeds as follows for an n-ary function: …

So, if I am understanding you correctly, it seems to me that we can detect the situation you describe and write a targeted error message in the case that the unification in Step 2 fails.
A little bit more review complete. More in an in-person meeting tomorrow.
Getting closer!
OK. I have reviewed everything here save the files in `lambda` and in `ocamldoc`, saving those for someone more expert in those areas of the compiler.
@lpw25 volunteered to read the …
Approving the parts in `lambda/`, which look good to me.
This looks like it's in good shape, modulo the little concern about Merlin (which I believe @ncik-roberts is looking into). All the files have been reviewed for correctness. What's left for getting this merged?
I agree that it is probably time to merge this PR. Two questions:
For the record's sake, my final opinion on the backward compatibility issue is that the change will be fine with a specialized and understandable error message for this exotic case. Moreover I agree with @ncik-roberts that this can be done in a straightforward way on top of this PR. For the remaining review, I don't know if @gasche has reviewed the …
I've just added the Changes entry. I will now look at these things:
parsing/parsetree.mli

```
[C] represents a type constraint or coercion placed immediately
before the arrow, e.g. [fun P1 ... Pn : t1 :> t2 -> ...]
```
OCaml doesn't accept this syntax at present. Rather inconsistently, while we currently allow simple constraints on both `let` and `fun` forms:

```ocaml
let f x : t = e         (* valid *)
let f = fun x : t -> e  (* valid *)
```

we allow the `let` form with a simple or full subtyping constraint:

```ocaml
let f x :> s = e        (* valid *)
let f x : t :> s = e    (* valid *)
```

but don't allow the corresponding constrained `fun` forms:

```ocaml
let f = fun x :> s -> e       (* invalid *)
let f = fun x : t :> s -> e   (* invalid *)
```
It would be good to eventually allow these last two forms, too, but for the moment the comment needs to be updated.
Thanks for catching this. I've updated the example to use a simple constraint with `fun`.
Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
…en-typing-constraints Restore a call to `instance` that was dropped in #12236
I don't think it was mentioned in the comments, why is the …
Yes, this is a …

```ocaml
fun (type a) -> (module struct
  type t = a
end : S with type t = a)
```

There are many such examples; the RHS of the … I suppose it could be reasonable to parse these as …
* Newtypes
* Constraint/coercion
* Add map_half_typed_cases
* Implement type-checking/translation. This also promotes tests whose output changes.
* Add upstream tests. Tests from: - ocaml/ocaml#12236 (and the corresponding updates to outputs found in ocaml/ocaml#12386 and ocaml/ocaml#12391) - ocaml/ocaml#12496 (not merged)
* Fix ocamldoc
* Update chamelon minimizer
* Respond to requested changes to minimizer
* update new test brought in from rebase
* Fix bug in chunking code
* `make bootstrap`
* Add Ast_invariant check
* Fix type-directed disambiguation of optional arg defaults
* Minor comments from review
* Run syntactic-arity test, update output, and fix printing bug
* Remove unnecessary call to escape
* Backport changes from upstream to comparative alloc tests
* Avoid the confusing [Split_function_ty] module
* Comment [split_function_ty] better
* [contains_gadt] as variant instead of bool
* Calculate is_final_val_param on the fly rather than precomputing indexes
* Note suboptimality
* Get typecore typechecking
* Finish resolving merge conflicts and run tests
* make bootstrap
* Add iteration on / mapping over locations and attributes
* Reduce diff and fix typo in comment
* promote change to zero-alloc arg structure
* Undo unintentional formatting changes to chamelon
* Fix minimizer
* Minimize diff
* Fix bug with local-returning method
* Fix regression where polymorphic parameters weren't allowed to be used in same parameter list as GADTs
* Fix merge conflicts and make bootstrap
* Apply expected diff to zero-alloc test changed in this PR
This PR implements syntactic function arity as described in this RFC.
This is a big PR! To help everyone understand what's going on, I've written this (longish) introductory post. It starts by summarizing the related RFC, offers some changes to design that came up during implementation, includes details to help a reviewer understand the changes, and ends with a few warts in the current implementation that seem unnecessary to solve now. Readers interested in language design but not implementation details can read just through the design details. The rest of this post is relevant only to someone providing a detailed review and may make the most sense read in parallel with the review.
The motivation is twofold:
The RFC has more detail, but I'll summarize briefly. A function has a notion of "arity", which is how many arguments it will accept before running effects for optional argument defaults and pattern matching. (Pattern matching can have effects, e.g. lazy patterns.) Currently the parser treats all functions as unary and the arity is only determined later based on the typedtree. Effectful patterns interrupt arity, so

```ocaml
let f (lazy x) y = x + y
```

is treated as a unary function returning a unary function. However, this changes with this PR. After this PR, arity is a syntactic notion, so

```ocaml
let f (lazy x) y = x + y
```

is a 2-ary function. `f (lazy (assert false))` won't raise, but `f (lazy (assert false)) 3` will.

There is a related, but subtly different, notion of "runtime arity" of a function. If a function application is fully saturated (i.e. the function has runtime arity n and is applied to n arguments), there is an optimization to make application fast in native code. This PR makes syntactic arity almost always the same notion as runtime arity. The one exception is when the syntactic arity is greater than the max allowed runtime arity (127), in which case the application optimization stops being useful.
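The behavioral difference can be sketched as follows (a hypothetical example in the spirit of the summary above, not code from the PR):

```ocaml
(* With syntactic arity, [f] is 2-ary: the lazy pattern is forced only
   once both arguments have been supplied. *)
let f (lazy x) y = x + y

let () =
  let g = f (lazy (print_endline "forced"; 1)) in
  (* Under this PR, nothing has been forced at this point; the pattern
     match runs when the final argument arrives. *)
  assert (g 2 = 3)
```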
Design Decisions and Clarifications to RFC
There are a few details implicit in or missing from the RFC that are worth spelling out here.
Optional argument defaults
The RFC describes how pattern matching works with syntactic arity, but doesn’t give a precise spec for optional argument defaults. The intuition is the same. Pattern matches and optional argument defaults are evaluated from left-to-right, with the relative order preserved from the parameter order.
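That ordering can be sketched concretely (a hypothetical example; the `note` helper is invented here for illustration):

```ocaml
(* Record the order in which parameter effects run. *)
let order = ref []
let note tag x = order := tag :: !order; x

(* The default for [a] and the lazy pattern for [b] are both evaluated
   left to right once the function is saturated. *)
let f ?(a = note "default-a" 0) (lazy b) c = a + b + c

let () =
  assert (f (lazy (note "force-b" 1)) 2 = 3);
  assert (List.rev !order = ["default-a"; "force-b"])
```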
The more general characterization is as follows:
Any active function parameter in the list of parameters between `fun` and `->` (or between `let function_name` and `=`) is evaluated in left-to-right order before entering the body of a function (or before performing the match that is part of a function body). An active function parameter is any of the following: … `[| 1.0, x |]`. Matching against such a pattern can involve allocation, because `x` must refer to a boxed float but the array is flat. Allocation can run the GC. Running the GC can run finalizers, which can run arbitrary code.

Coercions and constraints
The parsetree spec given in the RFC mentions constraints, but not coercions. This should be a 2-ary function:
stedolan tells me this was an unintentional omission. Coercions should be treated just like normal type constraints, not interrupting the arity of the declared function.
Class functions
Class functions still aren’t n-ary:
I wanted to get input on this design before doing the same thing for class functions.
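For reference, this is the kind of class function meant here (a hypothetical sketch; the class name is invented for illustration):

```ocaml
(* A class-level function: [fun x y -> object ... end]. Per the note
   above, these parameters are still treated as nested unary functions
   rather than as one 2-ary function. *)
class pair_sum = fun x y -> object
  method sum = x + y
end

let () = assert ((new pair_sum 1 2)#sum = 3)
```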
Avoiding unsoundness
This definition would be unsound:
In the current world, unsoundness is avoided by not pushing the evaluation of `assert false` past any GADT patterns. (So `bad ()` would raise.) But this "pushing the evaluation" is the whole point of syntactic arity. So the approach I take is to stipulate that an n-ary function definition must unify with an n-ary arrow type:

This approach has two warts. One, it rules out some previously-accepted (if weird) definitions:
Two, it interacts with locally abstract types in a non-obvious way. The unification happens after unification variables have been substituted for any locally abstract type. So a definition might typecheck, but with `'a -> 'b` substituted for what was written as a locally-abstract type:

(This definition is ok because it will assert false if `opt` is omitted and it is applied up to the `'a` argument.)

This type-checking behavior is consistent with how locally-abstract types behave in other situations, like recursion and the value restriction, and so is perhaps acceptable.
If this is a sticking point, I’m happy to discuss alternatives.
Question
This is a substantial change to a widely-used AST construct. How does the OCaml community deal with such changes that warrant (e.g.) updating all PPXes?
A guide to review
It’s likely best to review directory-by-directory. Here’s a quick summary of the per-directory changes:
- `parsing`: Add the new function construct, replacing the two old constructs. The new construct includes multiple function parameters, `newtype` parameters, coercions/constraints on the function body, and the function body itself (whether that's function cases or a standard expression body).
- `typing`: Though the diff is large, this is semantically a refactor of existing functionality. The new function construct involves several language features (`newtypes`, coercions, constraints, function cases), and so this patch factors out pieces of type-checking code for those language features to be reused in the typing of the new construct.
- `lambda`: Some tricky code is deleted. I get rid of `push_defaults` and the currying transformation, instead preserving the syntactic arity from type-checking. I add the translation of optional default arguments -- this used to be done in type-checking, but this interacts poorly with syntactic arity.
- `utils`: I add a helper function, `List.chunks_of`, that I found useful in enforcing the max runtime arity of a function.
- `tools`: Changes to ocamlprof implied by the change in parsetree.
- `ocamldoc`: Changes to ocamldoc implied by the change in typedtree.

There is more detail below.
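The `List.chunks_of` helper mentioned for `utils` might look roughly like this (a hypothetical sketch under that name, not the PR's actual code):

```ocaml
(* Split a list into consecutive chunks of at most [n] elements;
   useful for capping a function's runtime arity at the maximum. *)
let chunks_of n l =
  if n <= 0 then invalid_arg "chunks_of";
  let rec go acc chunk len = function
    | [] -> List.rev (if chunk = [] then acc else List.rev chunk :: acc)
    | x :: rest ->
        if len = n then go (List.rev chunk :: acc) [x] 1 rest
        else go acc (x :: chunk) (len + 1) rest
  in
  go [] [] 0 l

let () =
  assert (chunks_of 2 [1; 2; 3; 4; 5] = [[1; 2]; [3; 4]; [5]]);
  assert (chunks_of 3 [] = [])
```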
The parsetree/typedtree changes imply some changes outside of the parsing/typing directories. The first few commits split up the parsing and typechecking changes, if you’re interested in narrowing in on the changes implied by one or the other.
Parsing
This is mostly a straightforward implementation of the RFC, with a few things to note:

- `fun (type a b c) -> e` is still interpreted as nested `Pexp_newtype`'s, and not as a zero-argument `Pexp_function`.
- The parameter representation remains `arg_label * expression option * pattern`, to keep the diff smaller. This is instead of adopting the `Arg_simple of pattern | Arg_labelled of string * pattern | Arg_optional of string * pattern * expression option` distinction described in the RFC. This could easily be changed later.
- Coercions/constraints are represented with a `Pconstraint`/`Pcoerce` variant instead of as a pair of optional types. I found this type to be advantageous in type-checking, as it allows us to keep the code that handles coercions/constraints separate from the code that handles the absence of a coercion/constraint. It's parallel to `Pexp_constraint`/`Pexp_coerce`, which is a nice bonus.
- In `fun x y -> function p1 -> e1 | p2 -> e2`, the function should be parsed as a `Tfunction_cases` and not as a `Tfunction_body $expr` in order to get the arity right. The PR attempts to avoid shift-reduce conflicts here by creating a new class of expression, `fun_expr`, which can appear on the RHS of a lambda/function definition, and which excludes function constructs. To force the interpretation of the RHS as `Tfunction_body $expr`, the programmer can wrap the function in parentheses.

Typing
Typing code for three constructs is significantly refactored, in increasing order of invasiveness:

- newtypes (`(type a) …`)
- constraints/coercions
- function cases (`fun`/`function` constructs)

A source of complexity common to all of these is the fact that typing the "argument" is no longer uniform. I address this by parameterizing the typing of these constructs over a function that takes care of typing the argument.

Let me make that clearer with an example. Take constraints. Previously the only constraint-like construct was `Pexp_constraint (e, t)`, which represents things like `(e : t)`. Constructs like `let f x : t = function <cases>` were desugared as `let f x = (function <cases> : t)`. This desugaring is no longer possible, as the function body is parsed as `Pfunction_cases`, i.e. not an expression. So, I took the code that handled `Pexp_constraint`, bubbled it up to a top-level function, and parameterized it over the argument to the constraint; that allows the argument to be an expression (as in `(e : t)`) or "function cases" (as in `let f x : t = function <cases>`).

It is relatively straightforward to see how this technique was applied with `Pexp_newtype` and `Pexp_constraint`/`Pexp_coerce`. It is somewhat more difficult to see how this worked with `type_cases`, so I have a dedicated section on that below.

map_half_typed_cases and type_cases

This section describes the refactoring I did in the typing of function cases. It's ultimately a no-op refactor, but the approach is more complex than the other pieces of this PR.
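As a concrete instance of the construct being typed here (a hypothetical example, not code from the PR): a definition with `fun` parameters followed by `function` cases, which under this PR forms a single 3-ary function:

```ocaml
(* [x], [y], and the [function] cases all count toward the arity. *)
let describe x y = function
  | Some z -> x + y + z
  | None -> x + y

let () =
  assert (describe 1 2 (Some 3) = 6);
  assert (describe 1 2 None = 3)
```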
There used to be a single function, `type_cases`, which handled the typing for fun parameters, match cases, function cases, letop cases, and try cases. In this PR, I pull out a large amount of that code into `map_half_typed_cases` -- the typing of fun parameters now calls that instead of `type_cases`. The "half" in "half typed cases" refers to the fact that the pattern, but not the body, is type-checked. (This terminology is used already in `typecore`.)

Take an example like `fun x y -> function <cases>`. You can think of `map_half_typed_cases` as the code in common to the typing of `x`, `y`, and `<cases>`. It's the common prelude and postlude that we always want to run when typing a parameter, including (say) the typing of the pattern. But it's now parameterized over what it means to type the body of the parameter. For `x` and `y`, that's a recursive call to `type_function` to check the rest of the parameters and the body; for `<cases>`, that's checking the expression bodies of each of the cases.

GitHub's diff is actually very helpful here. It shows how
`type_cases` was refactored into `map_half_typed_cases`. The big deleted hunk of code was just moved into the new `type_cases`. `type_cases` is now only called on things that are syntactically cases, so it does some extra things:

Typing a function

This is now structured as a fold over the parameters. `type_function` does much of the same work as the old `type_function`, except now it also calls the parameterized versions of typing constraints, newtypes, and cases.

A guide to review: better diffs
Refactoring newtypes/coercions involved moving a fair amount of code. This bash function helps you see that, with color:

For `type_newtype`:

```
my_diff '/Pexp_newtype/,/Pexp_pack/p' '/and type_newtype/,/and type_ident/p'
```

For `type_coerce`:

```
my_diff '/Pexp_coerce(sarg/,/Pexp_send/p' '/and type_coerce/,/and type_constraint/p'
```
Warts in the implementation
Methods
There is a small ugly function in translation to give methods the right arity.
Take the method `m` of this class as an example:

This method is parsed roughly as:

```
Pexp_function ([ "self" ], Pexp_poly (Pexp_function ([ "x", "y" ], … (* x + y *))))
```

But we'd like for `m` to have an arity of 3 (not because this can change the semantics of the program, but so that we get a runtime arity of 3 and can benefit from the native code optimization). The approach I take is to recover the arity in translation to lambda, by fusing together function bodies when the inner one is a `Texp_poly`. This is possible because methods are the only places that `Pexp_poly` is introduced.

It seems more principled to have an AST node that encodes this more directly. (Perhaps `Pexp_poly` should take an extra argument, `self`?) But, I didn't do this as part of this PR to avoid yet another invasive change.

Attributes
Some attributes continue to be dropped. E.g.:
Here, the inline attribute is dropped in translation as it's attached to the `Pfunction_cases` piece of an n-ary function, and not a proper expression. (This function can't be inlined!) This doesn't change anything. Previously this attribute was silently dropped in the currying translation.

Attributes relevant to type-checking continue to be respected, e.g. this is OK: