Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
251 changes: 251 additions & 0 deletions rfcs/tuple_projections.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,251 @@
# Tuple Projections

## Overview

This RFC proposes a quality-of-life improvement to OCaml's tuples, adding
support for labeled and unlabeled tuple projections.

## Proposed change

The idea is to allow users to directly project elements from a tuple using
labels or indicies, as opposed to patterns:

```ocaml
# let x = (~tuple:42, ~proj:1337, "is", 'c', 00, 1);;
val x : (tuple: int * proj:int * string * char * int * int) =
(~tuple:42, ~proj:1337, "is", 'c', 0, 1)
# x.tuple;;
- : int = 42
# x.3;;
- : char = 'c'
```

Here, we're able to project out of a 6-tuple (containing both labeled and
unlabeled components) simply by writing `x.j` (for an index `j`) or `x.l` (for
a label `l`). Tuple indices are 0-indexed, to match the existing indexing
convention for arrays, lists, etc.

This is useful for a couple reasons:
- Conciseness: avoids unnecessary boilerplate pattern matches / projection
functions e.g. `Tuple2.fst, Tuple3.fst, ...`. Additionally, with the advent
of labeled tuples, such functions are less useful[^1].
- Clarity: `x.1` is more readable than `let _, x1, _ = x in ...`
- Parity with records: complements record field projection and aligns tuple
uses with other ML-like languages.

Occurrences of explicit pattern matching for tuple projection are
reasonably frequent where tuple projections could otherwise be used
(found by doing a quick Sherlocode[^2]):
- ~6k uses of projections on pairs,
- ~1.15k for triples,
- ~210 for quadruples.


## Previous work

Many other strongly-typed languages support built-in tuple projections.

### SML

SML models tuples as records with integer field names (1-indexed), so projection uses
record selection syntax:
```sml
> val x = (1, "hi", 42);;
val x = (1, "hi", 42): int * string * int
> val y = #1 x;;
val y = 1: int
> val z = #3 x
val z = 42: int
```

### Rust

Rust supports tuple projections (0-indexed) for ordinary tuples (and tuple structs):
```rust
let x = (42, "is", 'c');
let y = x.0; // 42
let z = x.1; // "is"

struct Point(i32, i32);
let p = Point(3, 4);
let x_coord = p.0; // 3
```

Record structs also use the same syntax for projection:
```rust
struct Point { x : i32, y : i32 };
let p = Point { x = 3, y = 4 };
let x_coord = p.x; // 3
```


### Swift

Swift supports tuple projections via both positional indicies and labels (as in this proposal):
```swift
import Foundation

let x = (tuple: 42, proj: 1337, "is", 'c', 00, 1);

print(x.tuple) // 42
print(x.5) // 1
```

## Implementation

An experimental implementation is available at [PR 14257](https://github.com/ocaml/ocaml/pull/14257).

### Parsetree changes

Given the syntax for labeled tuple projection is overloaded with record field
projection, i.e. there is no syntactic distinction between the projections in:
```ocaml
let x = { foo = 1; bar = 2 } in x.foo;;
```
and
```ocaml
let x = ~foo:1, ~bar:2 in x.foo;;
```


The proposed parsetree additions will represent all projections using `Pexp_field`:
```ocaml
type field =
| Pfield_record_or_tuple_label of Longident.t loc
| Pfield_tuple_index of int loc

and expression_desc =
...
| Pexp_field of expression * field
...
```


### Typechecking

While typechecking, when encountering a field projection in expressions,

- If the field is a tuple index `j`, type as a unlabeled tuple projection.

Check to see whether the expected type is known (principally known, if in `-principal` mode).
Then:
* If the type is not known: raise an error stating that the projection is ambiguous.
* If the type is known to be `(?l0:ty0 * ... * tyj * ... * ?ln:tyn)`: type the projection as `tyj`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the case :

let x = ~a:1, ~b:2 in x.1

Is the example rejected because only x.a and x.b are accepted and not x.0/x.1 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be rejected, as you correctly surmise, the only well-typed projections for x would be x.a, x.b.

This is consistent with the current pattern matching semantics i.e. the following is ill-typed:

# let x = ~a:1, ~b:2 in 
  let _, x1 = x in 
  x1;;
Error: The value x has type a:int * b:int
       but an expression was expected of type 'a * 'b
       The first tuple element is labeled a,
       but an unlabeled element was expected


- If the field is a record or tuple label `l`.

Check to see whether the expected type is known:
- If the type is not known: typecheck the projection `e.l` as a record projection
- If the type is known to be `(ty0, ..., tyn) t`: ditto
- If the type is known to be `(?l0:ty0 * ... * l:tyl * ... * ?ln:tyn)`: type the projection
as `tyl`.

## Considerations

### Limitations of type-based disambiguation

OCaml's current type-based disambiguation mechanism is relatively weak. As a result,
many of the patterns that tuple projections are intended to replace would be ill-typed under
today's implementation. For instance:
```ocaml
# List.map (fun x -> x.1) [42, "Hello"; 1337, "World"];;
Error: The type of the tuple expression is ambiguous.
Could not determine the type of the tuple projection.
```

That said, this limitation does not arise from the feature itself, but from the
weaknesses in OCaml's type propagation. Improving type propagation (separately)
would benefit not only tuple projections, but other features that rely on
type-based disambiguation (e.g. constructors and record fields). As such, we
argue that tuple projections should not be rejected on this point alone, and
that the broader issues of type propagation and disambiguation be addressed
separately.

### Syntactic overloading

This proposal reuses the existing projection syntax `e.l` for both record
fields and labeled tuples. The primary motivator behind this is to avoid
introducing new operators and keeps projection syntax uniform.

The downside is that it increases reliance on type-based disambiguation.

### Diagnostic quality of error messages

Type errors surrounding unknown fields will need to be refined.
In particular, when the compiler defaults a labeled projection to a record
field (even though it might also have been a labeled tuple projection),
the diagnostic report ought to make this clear.

Otherwise, programs like the following may yield cryptic messages:
```ocaml
# let is_ill_typed_due_to_defaults x =
let y = x.tuple_label_a in
ignore (x : (tuple_label_a:int * string * bool));
(y, x.2)
Error: Unbound record field `tuple_label_a`
```

A clearer diagnostic could be:
```
Error: The field `tuple_label_a` is unknown.
The projection `x.tuple_label_a` was interpreted as a record field,
but no such record field exists.

Hint: Did you mean to project from a labeled tuple instead?
If so, add an annotation to disambiguate the projection.
```

Other problematic examples include conflicts with existing records:
```ocaml
# type discombobulating_record = { tuple_label_a : int };;
type discombobulating_record = { tuple_label_a : int }
# let is_ill_typed_due_to_defaults x =
let y = x.tuple_label_a in
ignore (x : (tuple_label_a:int * string * bool));
(y, x.2)
Error: The value `x` has type `discombobulating_record` but an expression was
expected of type `tuple_label_a:int * string * bool`
```
Here the error conflates record and tuple typing, which is misleading.
A more informative report could combine a warning with the final error:
```ocaml
Warning: The projection `x.tuple_label_a` could refer either to a record field
or a labeled tuple component. It was resolved as a record field of
`discombobulating_record`.
Please disambiguate if this is wrong.
Error: The value `x` has type `discombobulating_record` but an expression
was expected of type
`tuple_label_a:int * string * bool`
```


### Row-polymorphic tuples

Unlabeled tuple projections can naturally (and efficiently) be typed using row polymorphism, in
the same way object fields are typed today:
```ocaml
# let snd x = x.2;;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we might want to have the same semantic as with

let get_y t =
  let ~y, .. = t in
  y

which is rejected if I remember correctly.

We would also need to check that the back-end supports such polymorphism (I remember talking about a usefull feature similar to this one for modules with @lthls)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is rejected currently since the expected type of the pattern (i.e. the type of t) is not known

val snd : 'a * 'b * .. -> 'b
```

This generalises to tuples of arbitrary arity (and is strictly more powerful than this proposal).
However, extending the same mechanism to labeled tuples is significantly more difficult (without incurring
runtime overhead or using monomorphisation).

From a language design perspective, though, we would ideally want the typing of projections
to behave uniformly across both labeled and unlabeled tuples. Moreover, the typing behaviour
discussed in this RFC is compatible with this use of row polymorphism.


### Non-principal defaults

For type-based disambiguation, OCaml usually implements a 'default' behaviour when
the type is unknown. For instance, in record field / variant overloading, if the type is
not known to be a nominal type `(ty0, ..., tyn) t`, the lexically-closed matching record field / variant
is used.

We could have a similar default rule for tuple projections `e.j`, if the type is unknown: type `e` as `min 2 (j + 1)`-ary tuple.

[^1]: Without relying on coercions between labeled and unlabeled tuples, as in [PR 14180](https://github.com/ocaml/ocaml/pull/14180). And even then, it would be shorter to write a pattern than the types for `:>`.

[^2]: The following family of patterns were used to derive estimates: [`fun (\w\+, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_)), [`fun (\w\+, _, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_%2C%20_)), [`fun (\w\+, _, _, _)`](https://sherlocode.com/?q=fun%20(%5Cw%5C%2B%2C%20_%2C%20_%2C%20_)), [`fun (_, \w\+)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B)), [`fun (_, \w\+, _)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B%2C%20_)), [`fun (_, \w\+, _, _)`](https://sherlocode.com/?q=fun%20(_%2C%20%5Cw%5C%2B%2C%20_%2C%20_)), etc.