diff --git a/rfcs/tuple_projections.md b/rfcs/tuple_projections.md new file mode 100644 index 0000000..28a74ea --- /dev/null +++ b/rfcs/tuple_projections.md @@ -0,0 +1,251 @@ +# Tuple Projections + +## Overview + +This RFC proposes a quality-of-life improvement to OCaml's tuples, adding +support for labeled and unlabeled tuple projections. + +## Proposed change + +The idea is to allow users to directly project elements from a tuple using +labels or indicies, as opposed to patterns: + +```ocaml +# let x = (~tuple:42, ~proj:1337, "is", 'c', 00, 1);; +val x : (tuple: int * proj:int * string * char * int * int) = + (~tuple:42, ~proj:1337, "is", 'c', 0, 1) +# x.tuple;; +- : int = 42 +# x.3;; +- : char = 'c' +``` + +Here, we're able to project out of a 6-tuple (containing both labeled and +unlabeled components) simply by writing `x.j` (for an index `j`) or `x.l` (for +a label `l`). Tuple indices are 0-indexed, to match the existing indexing +convention for arrays, lists, etc. + +This is useful for a couple reasons: +- Conciseness: avoids unnecessary boilerplate pattern matches / projection + functions e.g. `Tuple2.fst, Tuple3.fst, ...`. Additionally, with the advent + of labeled tuples, such functions are less useful[^1]. +- Clarity: `x.1` is more readable than `let _, x1, _ = x in ...` +- Parity with records: complements record field projection and aligns tuple + uses with other ML-like languages. + +Occurrences of explicit pattern matching for tuple projection are +reasonably frequent where tuple projections could otherwise be used +(found by doing a quick Sherlocode[^2]): +- ~6k uses of projections on pairs, +- ~1.15k for triples, +- ~210 for quadruples. + + +## Previous work + +Many other strongly-typed languages support built-in tuple projections. + +### SML + +SML models tuples as records with integer field names (1-indexed), so projection uses +record selection syntax: +```sml +> val x = (1, "hi", 42);; +val x = (1, "hi", 42): int * string * int +> val y = #1 x;; +val y = 1: int +> val z = #3 x +val z = 42: int +``` + +### Rust + +Rust supports tuple projections (0-indexed) for ordinary tuples (and tuple structs): +```rust +let x = (42, "is", 'c'); +let y = x.0; // 42 +let z = x.1; // "is" + +struct Point(i32, i32); +let p = Point(3, 4); +let x_coord = p.0; // 3 +``` + +Record structs also use the same syntax for projection: +```rust +struct Point { x : i32, y : i32 }; +let p = Point { x = 3, y = 4 }; +let x_coord = p.x; // 3 +``` + + +### Swift + +Swift supports tuple projections via both positional indicies and labels (as in this proposal): +```swift +import Foundation + +let x = (tuple: 42, proj: 1337, "is", 'c', 00, 1); + +print(x.tuple) // 42 +print(x.5) // 1 +``` + +## Implementation + +An experimental implementation is available at [PR 14257](https://github.com/ocaml/ocaml/pull/14257). + +### Parsetree changes + +Given the syntax for labeled tuple projection is overloaded with record field +projection, i.e. there is no syntactic distinction between the projections in: +```ocaml +let x = { foo = 1; bar = 2 } in x.foo;; +``` +and +```ocaml +let x = ~foo:1, ~bar:2 in x.foo;; +``` + + +The proposed parsetree additions will represent all projections using `Pexp_field`: +```ocaml +type field = + | Pfield_record_or_tuple_label of Longident.t loc + | Pfield_tuple_index of int loc + +and expression_desc = + ... + | Pexp_field of expression * field + ... +``` + + +### Typechecking + +While typechecking, when encountering a field projection in expressions, + +- If the field is a tuple index `j`, type as a unlabeled tuple projection. + + Check to see whether the expected type is known (principally known, if in `-principal` mode). + Then: + * If the type is not known: raise an error stating that the projection is ambiguous. + * If the type is known to be `(?l0:ty0 * ... * tyj * ... * ?ln:tyn)`: type the projection as `tyj` + +- If the field is a record or tuple label `l`. + + Check to see whether the expected type is known: + - If the type is not known: typecheck the projection `e.l` as a record projection + - If the type is known to be `(ty0, ..., tyn) t`: ditto + - If the type is known to be `(?l0:ty0 * ... * l:tyl * ... * ?ln:tyn)`: type the projection + as `tyl`. + +## Considerations + +### Limitations of type-based disambiguation + +OCaml's current type-based disambiguation mechanism is relatively weak. As a result, +many of the patterns that tuple projections are intended to replace would be ill-typed under +today's implementation. For instance: +```ocaml +# List.map (fun x -> x.1) [42, "Hello"; 1337, "World"];; +Error: The type of the tuple expression is ambiguous. + Could not determine the type of the tuple projection. +``` + +That said, this limitation does not arise from the feature itself, but from the +weaknesses in OCaml's type propagation. Improving type propagation (separately) +would benefit not only tuple projections, but other features that rely on +type-based disambiguation (e.g. constructors and record fields). As such, we +argue that tuple projections should not be rejected on this point alone, and +that the broader issues of type propagation and disambiguation be addressed +separately. + +### Syntactic overloading + +This proposal reuses the existing projection syntax `e.l` for both record +fields and labeled tuples. The primary motivator behind this is to avoid +introducing new operators and keeps projection syntax uniform. + +The downside is that it increases reliance on type-based disambiguation. + +### Diagnostic quality of error messages + +Type errors surrounding unknown fields will need to be refined. +In particular, when the compiler defaults a labeled projection to a record +field (even though it might also have been a labeled tuple projection), +the diagnostic report ought to make this clear. + +Otherwise, programs like the following may yield cryptic messages: +```ocaml +# let is_ill_typed_due_to_defaults x = + let y = x.tuple_label_a in + ignore (x : (tuple_label_a:int * string * bool)); + (y, x.2) +Error: Unbound record field `tuple_label_a` +``` + +A clearer diagnostic could be: +``` +Error: The field `tuple_label_a` is unknown. + The projection `x.tuple_label_a` was interpreted as a record field, + but no such record field exists. + +Hint: Did you mean to project from a labeled tuple instead? + If so, add an annotation to disambiguate the projection. +``` + +Other problematic examples include conflicts with existing records: +```ocaml +# type discombobulating_record = { tuple_label_a : int };; +type discombobulating_record = { tuple_label_a : int } +# let is_ill_typed_due_to_defaults x = + let y = x.tuple_label_a in + ignore (x : (tuple_label_a:int * string * bool)); + (y, x.2) +Error: The value `x` has type `discombobulating_record` but an expression was + expected of type `tuple_label_a:int * string * bool` +``` +Here the error conflates record and tuple typing, which is misleading. +A more informative report could combine a warning with the final error: +```ocaml +Warning: The projection `x.tuple_label_a` could refer either to a record field + or a labeled tuple component. It was resolved as a record field of + `discombobulating_record`. + Please disambiguate if this is wrong. +Error: The value `x` has type `discombobulating_record` but an expression + was expected of type + `tuple_label_a:int * string * bool` +``` + + +### Row-polymorphic tuples + +Unlabeled tuple projections can naturally (and efficiently) be typed using row polymorphism, in +the same way object fields are typed today: +```ocaml +# let snd x = x.2;; +val snd : 'a * 'b * .. -> 'b +``` + +This generalises to tuples of arbitrary arity (and is strictly more powerful than this proposal). +However, extending the same mechanism to labeled tuples is significantly more difficult (without incurring +runtime overhead or using monomorphisation). + +From a language design perspective, though, we would ideally want the typing of projections +to behave uniformly across both labeled and unlabeled tuples. Moreover, the typing behaviour +discussed in this RFC is compatible with this use of row polymorphism. + + +### Non-principal defaults + +For type-based disambiguation, OCaml usually implements a 'default' behaviour when +the type is unknown. For instance, in record field / variant overloading, if the type is +not known to be a nominal type `(ty0, ..., tyn) t`, the lexically-closed matching record field / variant +is used. + +We could have a similar default rule for tuple projections `e.j`, if the type is unknown: type `e` as `min 2 (j + 1)`-ary tuple. + +[^1]: Without relying on coercions between labeled and unlabeled tuples, as in [PR 14180](https://github.com/ocaml/ocaml/pull/14180). And even then, it would be shorter to write a pattern than the types for `:>`. + +[^2]: The following family of patterns were used to derive estimates: [`fun (\w\+, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_)), [`fun (\w\+, _, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_%2C%20_)), [`fun (\w\+, _, _, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_%2C%20_%2C%20_)), [`fun (_, \w\+)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B)), [`fun (_, \w\+, _)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B%2C%20_)), [`fun (_, \w\+, _, _)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B%2C%20_%2C%20_)), etc.