Revised proposal: Tuples #46

WardBrian · 2022-04-22T18:07:51Z

This design doc takes the information from Bob's original proposal here: #24

It reformats it to the standard RFC format and adds additional information on the currently proposed implementation.
In particular, it details IO formats and some of the implementation challenges.

Rendered version:
https://github.com/WardBrian/design-docs/blob/tuples/designs/0018-tuples.md

Signed-off-by: Brian Ward <bward@flatironinstitute.org>

betanalpha · 2022-05-03T20:29:15Z

This all looks great. I am sure that these have already been discussed repeatedly and I don't want to bike shed, but I have a few comments about potential user experience.

What is the motivation for having ({ a, b, c }) and ([a, b, c]) be equivalent to (typeof(a), typeof(b), typeof(c))? If typeof(a) = typeof(b) = typeof(c), won't there be a conflict between (typeof(a), typeof(b), typeof(c)) and the array singleton (array[3] typeof(a, b, c)), or, if typeof(a) = typeof(b) = typeof(c) = real, a conflict between (typeof(a), typeof(b), typeof(c)) and the vector singleton (vector[3])? This seems like an unnecessary source of confusion when the tuple declarations can just be restricted to (a, b, c).

Why the new accessor for individual elements? Because both arrays and linear algebra use [] to access individual elements the language already has a convention for using [] to access elements of a container. Given that tuples are just heterogeneous containers it seems like it would be natural to continue this convention and not confuse users with another notation.

I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3). The extra verbosity is minimal and it emphasizes to users that this is a new container type. This would also be consistent with the recent move to the more verbose array syntax. An additional benefit is that this would provide an explicit differentiation between declarations and unpacking,

real a, b, c;
tuple(real, real, real) x = (2, 2, 2);
(a, b, c) = x; // LHS is not the declaration of a variable but temporarily packing existing variables together

versus the more ambiguous

real a, b, c;
(real, real, real) x = (2, 2, 2);
(a, b, c) = x;

Yes, the difference between variable names and type names should be enough to separate the two operations, but I think that the added redundancy would be more accessible to the majority of the user base (again similar to the array syntax).

EDIT: (Steve) just fixed a missing backtick

WardBrian · 2022-05-03T20:46:21Z

> What is the motivation for having ({ a, b, c }) and ([a, b, c]) be equivalent to (typeof(a), typeof(b), typeof(c))

I’m not sure I follow your question; the snippets you’re referencing are establishing what already exists for arrays and row vectors, not what would be used for a tuple. Tuple values would only use {} or [] if they contain arrays or linear algebra types.

> Why the new accessor for individual elements?

Besides being heterogeneous, the key distinction between a tuple and an array is that the tuple has a static, fixed size. The use of . instead of [] is useful precisely because it is different: one should consider tuples separately from arrays or linear algebra types, since they cannot be iterated over dynamically and you cannot access them with a variable index, only a raw integer. This style is also how struct access would most likely look if we’re going off of structs in other languages.
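
As a rough illustration of this distinction, here is a sketch that uses C++ `std::tuple` and `std::vector` as stand-ins. This is an analogy only: the Stan accessor under discussion is written with `.`, and the names `Record`, `second_slot`, and `sum_all` are made up for the example. A tuple slot is selected by an index fixed at compile time, while an array element can be selected by a value computed at run time.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical example type; not part of the proposal.
using Record = std::tuple<double, int, std::string>;

// Tuple access: the slot index is a template argument, so it must be a
// compile-time constant, and each slot has its own static type.
inline int second_slot(const Record& r) {
  return std::get<1>(r);
}

// Array access: the index is an ordinary runtime value and every element
// has the same type, so looping with a variable index is possible.
inline double sum_all(const std::vector<double>& xs) {
  double total = 0.0;
  for (std::size_t i = 0; i < xs.size(); ++i) {
    total += xs[i];
  }
  return total;
}

// Note: std::get<i>(r) with a runtime variable i would not compile,
// because the result type would depend on the value of i.
```

The different access syntax mirrors that restriction: the element type of a tuple depends on which slot is named, so the slot has to be known statically.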

> I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3).

I generally prefer the more succinct syntax, since I think of the fact that tuple types and values are written down very similarly as a win for understandability, but I take your point with the array syntax as well. I do feel quite strongly about not going so far as to make tuple values look like tuple(1.2, 4), though.

I’m not sure that the unpacking/assignment syntax has any ambiguity if we require parentheses around the LHS of an unpacking. This is not needed in languages like Python, but it is a decent choice in a statically typed language. That way, if you leave the parentheses off, it will simply be a type error, e.g.,
real a, b = (2.5, 3.5);
will fail, since it will parse as trying to assign a tuple of reals to the single real variable b.

betanalpha · 2022-05-04T20:22:40Z

> I’m not sure I follow your question; the snippets you’re referencing are establishing what already exists for arrays and row vectors, not what would be used for a tuple. Tuple values would only use {} or [] if they contain arrays or linear algebra types.

In that case I think the sentence `The expression for creating tuples can follow the array notation ({ a, b, c }) and vector notation ([a, b, c]) and use (a, b, c). This will create a tuple of type (typeof(a), typeof(b), typeof(c)), each subject to the normal promotion rules.` is too confusing, as it seems to imply that `({ a, b, c })`, `([a, b, c])` and `(a, b, c)` all construct a tuple of type `(typeof(a), typeof(b), typeof(c))`.

> Besides being heterogeneous, the key distinction between a tuple and an array is that the tuple has a static, fixed size. The use of . instead of [] is useful precisely because it is different: one should consider tuples separately from arrays or linear algebra types, since they cannot be iterated over dynamically and you cannot access them with a variable index, only a raw integer. This style is also how struct access would most likely look if we’re going off of structs in other languages.

I appreciate this motivation for the different accessor methods, although I think that the initial documentation will have to be _very_ clear and redundant about the differences between tuple containers and array containers. I think that there will be many “how do I loop over tuples” questions from users who pattern match tuples to how they already use arrays.

> > I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3).
>
> I generally prefer the more succinct syntax, since I think of the fact that tuple types and values are written down very similarly as a win for understandability, but I take your point with the array syntax as well. I do feel quite strongly about not going so far as to make tuple values look like tuple(1.2, 4), though.

I definitely agree that `(1.2, 4)` should be sufficient for defining inline tuple values. In my experience, one of the biggest hurdles to learning the Stan language for new users who are used to less strongly/statically typed languages like R and Python is the type system. That’s why I think that being as explicit as possible about the type declarations, even at the expense of some mild additional verbosity, will be beneficial overall.

> I’m not sure that the unpacking/assignment syntax has any ambiguity if we require parentheses around the LHS of an unpacking. This is not needed in languages like Python, but it is a decent choice in a statically typed language. That way, if you leave the parentheses off, it will simply be a type error, e.g., real a, b = (2.5, 3.5); will fail, since it will parse as trying to assign a tuple of reals to the single real variable b.

Sorry, I wasn’t referring to formal ambiguity for the compiler but rather cognitive ambiguity for users. This new type introduces a few new concepts that I think can trip up many users, and I think that language design that explicitly discriminates between type declarations and assignments where appropriate would help ease that burden.

WardBrian · 2022-05-04T20:29:20Z

> is too confusing, as it seems to imply that ({ a, b, c }), ([a, b, c]) and (a, b, c) all construct a tuple of type (typeof(a), typeof(b), typeof(c)).

I will reword that part, good catch.

Agreed on the point of documentation. In addition to being very clear about the issues you describe, I think it will be important to stress what benefits this provides; e.g., it is possible right now to define a function which returns two real values by setting the return type to array[] real and making a contract with yourself that this will always be length 2. By contrast, if the return type is (real, real) the compiler will make that guarantee for you.
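
The same contrast can be sketched in C++, with `std::vector<double>` playing the role of `array[] real` and `std::tuple<double, double>` the role of `(real, real)`. This is an analogy rather than the Stan implementation, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Array-style return: the "always length 2" rule is only a convention,
// so nothing in the type stops a different length from being returned.
inline std::vector<double> min_max_array(double a, double b) {
  return a < b ? std::vector<double>{a, b} : std::vector<double>{b, a};
}

// Tuple-style return: "exactly two reals" is part of the return type.
inline std::tuple<double, double> min_max_tuple(double a, double b) {
  return a < b ? std::make_tuple(a, b) : std::make_tuple(b, a);
}

inline double range_from_array(double a, double b) {
  const std::vector<double> r = min_max_array(a, b);
  assert(r.size() == 2);  // the "contract with yourself", checked at run time
  return r[1] - r[0];
}

inline double range_from_tuple(double a, double b) {
  const auto [lo, hi] = min_max_tuple(a, b);  // C++17 structured binding
  return hi - lo;
}
```

With the array version, the length check has to be written and maintained by hand; with the tuple version, a mismatch in the number of values is rejected at compile time.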

I see your point on the extra verbosity of type declarations, and would like to hear the thoughts of some others (@rybern, @SteveBronder, @bob-carpenter). To be clear, the proposed syntax would look like tuple(int, real, int) x = (1, 2.5, 3); in full. A more complicated type could look like array[3] tuple(int, array[2] real).

betanalpha · 2022-05-04T20:35:56Z

> Agreed on the point of documentation. In addition to being very clear about the issues you describe, I think it will be important to stress what benefits this provides; e.g., it is possible right now to define a function which returns two real values by setting the return type to array[] real and making a contract with yourself that this will always be length 2. By contrast, if the return type is (real, real) the compiler will make that guarantee for you.

Yeah, especially since this (very good) motivation is not the one that has long been thrown around for tuples, namely ragged arrays. Users coming to this tuple design thinking that it will allow them to loop over ragged arrays will be frustrated, so the sooner the documentation can communicate the scope of this tuple type the better.

bob-carpenter · 2022-05-04T20:48:45Z

> (@betanalpha) not the motivation that has long been thrown around for tuples which are ragged arrays

I don't know who was thinking that was the motivation, but that was never my intention. I've always just wanted standard product types in the type theoretic sense. What tuples give you is like a fixed-size, fixed-type version of a list in R.

> (@WardBrian) would like to hear the thoughts of some others

I'm OK with either tuple(T1, ..., TN) or (T1, ..., TN) for the type syntax. So that'd be either,

tuple(real, int, real) x = (2.7, 12, -1e-6);

or

(real, int, real) x = (2.7, 12, -1e-6);

I very much like the idea of being able to follow this with:

real a, c;
int b;
a, b, c = x;

which follows Python convention, or, if necessary for the parser,

(a, b, c) = x;

andrewgelman · 2022-05-04T21:30:40Z

so cool!

WardBrian · 2022-05-04T22:37:28Z

The initial tuples PR we have for stanc doesn't support unpacking, but it is something I want to add soon thereafter. I think it will be possible to translate it directly to a std::tie if I read the C++ docs correctly, or we could generate the necessary sequential assignments ourselves.
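
For reference, the two code-generation options mentioned here could look roughly like the following in C++. This is a hand-written sketch, not output from stanc; `make_x` is a hypothetical stand-in for the tuple `x` from the earlier examples.

```cpp
#include <tuple>

// Hypothetical stand-in for the tuple value being unpacked.
inline std::tuple<double, int, double> make_x() {
  return std::make_tuple(2.7, 12, -1e-6);
}

// Option 1: std::tie builds a tuple of references to the existing
// variables, and tuple assignment then assigns slot by slot.
inline void unpack_with_tie(double& a, int& b, double& c) {
  std::tie(a, b, c) = make_x();
}

// Option 2: the equivalent sequence of per-slot assignments.
inline void unpack_sequentially(double& a, int& b, double& c) {
  const auto x = make_x();  // copy first, then assign each slot in order
  a = std::get<0>(x);
  b = std::get<1>(x);
  c = std::get<2>(x);
}
```

Both forms assign one slot at a time, so if the right-hand side refers to the variables being assigned (a swap, for example), either approach would need the right-hand side copied into a temporary first, as `unpack_sequentially` does.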

WardBrian · 2022-10-13T15:51:06Z

During today's monthly language meeting, @spinkney and @avehtari both expressed a preference for the syntax tuple(T1, ..., TN) for tuple types, as proposed by @betanalpha.

Aki made the particular point that the more explicit form makes it much easier to search a large program (either with your eyes or with a tool like grep) for tuple types, which has a lot of utility.

Tuple expressions are still made with parentheses, so a full example is

tuple(real, int, real) x = (2.7, 12, -1e-6);

bob-carpenter · 2022-10-14T14:01:03Z

I'm totally OK with the tuple(...) syntax.

WardBrian · 2022-10-14T14:02:52Z

After updating the implementation to match, it has grown on me more than I thought it would. It makes things like nested tuples much more apparent to the eye: (real, real, (real, real)) vs. tuple(real, real, tuple(real, real)) is a big difference.

mitzimorris

A few suggestions, but overall, the spec is very clear.

designs/0018-tuples.md

mitzimorris · 2023-03-06T16:22:52Z

Regarding "static": I was confused by the language; perhaps "slots" makes more sense?
Arrays have a fixed number of dimensions, but the size of each can be specified at runtime.
Tuples have a fixed number of slots, known at compile time, analogous to the number of array dimensions.
I think we need to be explicit here to avoid confusion between "size" and "dimension".

WardBrian · 2023-03-06T16:31:16Z

Agreed, especially in the final user-facing documentation. The analogy to arrays is a good one, but it also has its own potential to be confusing, since the relationship between different dimensions of an array and different elements in a tuple is not the same.
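
A small C++ sketch may help separate the two notions. It is again an analogy with hypothetical names: `IntAndArray` stands in for a tuple with one int slot and one array slot. The number of slots belongs to the type and is known at compile time, while the size of an array held in a slot is only known at run time.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical analogue of a tuple with an int slot and an array slot.
using IntAndArray = std::tuple<int, std::vector<double>>;

// The number of slots is a property of the type, fixed at compile time.
static_assert(std::tuple_size_v<IntAndArray> == 2, "always two slots");

// The size of the array stored in the second slot is a runtime quantity.
inline std::size_t array_slot_size(const IntAndArray& t) {
  return std::get<1>(t).size();
}
```

Two values of this type always have two slots, yet the arrays they carry may have different lengths.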

mitzimorris

Per discussion, we can be clearer w/r/t tuple structure, but otherwise this is good to go.

designs/0018-tuples.md

mitzimorris

very clear! complete approval!

Adapt and extend Bob's original proposal

22a1444

Signed-off-by: Brian Ward <bward@flatironinstitute.org>

WardBrian requested review from bob-carpenter, mitzimorris, rok-cesnovar and SteveBronder April 27, 2022 13:20

Clarify expression wording

7adc8e2

WardBrian requested review from SteveBronder and removed request for bob-carpenter, mitzimorris, SteveBronder and rok-cesnovar May 25, 2022 15:31

Change to use tuple keyword on type declarations

98b5d28

WardBrian mentioned this pull request Nov 14, 2022

[WIP] tuples and structs rough proposal #24

Closed

mitzimorris mentioned this pull request Jan 25, 2023

tuples: parse json to var_context stan-dev/stan#3161

Closed

WardBrian added 2 commits February 27, 2023 11:38

Specify that nested tuple-arrays are in-order in vars_context

cc9509e

Add unresolved question of get_dims

04dddcb

mitzimorris previously approved these changes Mar 3, 2023

Mension dimensionality in var_context spec, accessors in summary

c92cea5

WardBrian dismissed mitzimorris’s stale review via c92cea5 March 6, 2023 15:16

WardBrian requested a review from mitzimorris March 6, 2023 15:29

Size -> number of slots

853d87e

mitzimorris previously approved these changes Mar 6, 2023

Final cleanup

1251d2c

WardBrian dismissed mitzimorris’s stale review via 1251d2c March 6, 2023 20:03

WardBrian requested a review from mitzimorris March 6, 2023 20:04

mitzimorris approved these changes Mar 7, 2023

mitzimorris merged commit 159d0ab into stan-dev:master Mar 7, 2023

WardBrian deleted the tuples branch March 7, 2023 14:26
