Revised proposal: Tuples #46

Merged
merged 8 commits into from
Mar 7, 2023

Conversation

WardBrian
Member

This design doc takes the information from Bob's original proposal here: #24

It reformats it to the standard RFC format and adds additional information on the currently proposed implementation.
In particular, it details IO formats and some of the implementation challenges.

Rendered version:
https://github.com/WardBrian/design-docs/blob/tuples/designs/0018-tuples.md

Signed-off-by: Brian Ward <bward@flatironinstitute.org>
@betanalpha

betanalpha commented May 3, 2022

This all looks great. I am sure that these have already been discussed repeatedly and I don't want to bike shed, but I have a few comments about potential user experience.

What is the motivation for having ({ a, b, c }) and ([a, b, c]) be equivalent to (typeof(a), typeof(b), typeof(c))? If typeof(a) = typeof(b) = typeof(c), won't there be a conflict between (typeof(a), typeof(b), typeof(c)) and the array singleton (array[3] typeof(a, b, c)), or, if typeof(a) = typeof(b) = typeof(c) = real, a conflict between (typeof(a), typeof(b), typeof(c)) and the vector singleton (vector[3])? This seems like an unnecessary source of confusion when the tuple declarations can just be restricted to (a, b, c).

Why the new accessor for individual elements? Because both arrays and linear algebra use [] to access individual elements the language already has a convention for using [] to access elements of a container. Given that tuples are just heterogeneous containers it seems like it would be natural to continue this convention and not confuse users with another notation.

I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3). The extra verbosity is minimal and it emphasizes to users that this is a new container type. This would also be consistent with the recent move to the more verbose array syntax. An additional feature is that this would provide an explicit differentiation between declarations and unpacking:

real a, b, c;
tuple(real, real, real) x = (2, 2, 2);
(a, b, c) = x; // LHS is not the declaration of a variable but temporarily packing existing variables together

versus the more ambiguous

real a, b, c;
(real, real, real) x = (2, 2, 2);
(a, b, c) = x;

Yes the difference between variable names and type names should be enough to separate the two operations but I think that the added redundancy would be more accessible to the majority of the user base (again similar to the array syntax).

EDIT: (Steve) just fixed a missing backtick

@WardBrian
Member Author

What is the motivation for having ({ a, b, c }) and ([a, b, c]) be equivalent to (typeof(a), typeof(b), typeof(c))

I’m not sure I follow your question. The snippets you’re referencing establish what already exists for arrays and row vectors, not what would be used for a tuple. Tuple values would only use {} or [] if they contain arrays or linear algebra types.

Why the new accessor for individual elements?

Besides being heterogeneous, the key distinction between a tuple and an array is that the tuple has a static, fixed size. The use of . instead of [] is useful precisely because it is different: one should consider tuples separately from arrays or linear algebra types, since they cannot be iterated over dynamically and you cannot access them with a variable index, only a raw integer. This style is also how struct access would most likely look if we’re going off of structs in other languages.
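
As a rough analogy only (this is C++, not Stan, and not what stanc generates), the same split between runtime-indexed containers and statically indexed tuples shows up with `std::vector` and `std::tuple`. The names `pick`, `Record`, and `second_slot` below are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Homogeneous, runtime-sized container: any integer expression can index it,
// because every element has the same type.
double pick(const std::vector<double>& xs, std::size_t i) {
    return xs[i];
}

// Hypothetical heterogeneous record, used only for illustration.
using Record = std::tuple<double, int, std::string>;

// Fixed-size, heterogeneous container: the index is a template argument,
// so it must be a compile-time constant. The result type depends on it,
// which is why a variable index cannot be allowed.
int second_slot(const Record& r) {
    return std::get<1>(r);
}

// The number of slots is part of the type and is known at compile time.
static_assert(std::tuple_size<Record>::value == 3,
              "Record always has exactly three slots");
```

Writing `std::get<i>(r)` with a runtime variable `i` does not compile, which mirrors the restriction described above that a tuple element can only be selected with a literal integer.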

I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3).

I generally prefer the more succinct syntax, since I think of the fact that tuple types and values are written down very similarly as a win for understandability, but I take your point with the array syntax as well. I do feel quite strongly about not going so far as to make tuple values look like tuple(1.2, 4), though.

I’m not sure that the unpacking/assignment syntax has any ambiguity if we require parentheses around the LHS of an unpacking. This is not needed in languages like Python, but it is a decent choice in a statically typed language. That way, if you leave the parentheses off, it will simply be a type error, e.g.
real a, b = (2.5, 3.5)
will fail, since it will parse as trying to assign a tuple of reals to the singular real variable b.

@betanalpha

betanalpha commented May 4, 2022 via email

@WardBrian
Member Author

is too confusing as it seems to imply that ({ a, b, c }), ([a, b, c]) and (a, b, c) all construct a tuple of type (typeof(a), typeof(b), typeof(c)).

I will reword that part, good catch.

Agreed on the point of documentation. In addition to being very clear about the issues you describe, I think it will be important to stress what benefits this provides; e.g., it is possible right now to define a function which returns two real values by setting the return type to array[] real and making a contract with yourself that this will always be length 2. By contrast, if the return type is (real, real) the compiler will make that guarantee for you.
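
The contrast between a length-2 array by convention and a two-element tuple by type can be sketched in C++ as a loose analogy. This is not Stan code, and the function names `min_max_array` and `min_max_tuple` are hypothetical.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Convention only: nothing in the return type says the result has length 2,
// so every caller has to trust (or check) that contract at runtime.
std::vector<double> min_max_array(double a, double b) {
    return a < b ? std::vector<double>{a, b} : std::vector<double>{b, a};
}

// Guaranteed by the type: the result always has exactly two slots.
std::tuple<double, double> min_max_tuple(double a, double b) {
    return a < b ? std::make_tuple(a, b) : std::make_tuple(b, a);
}

// The compiler can verify the slot count; no runtime check is needed.
static_assert(
    std::tuple_size<decltype(min_max_tuple(0.0, 0.0))>::value == 2,
    "min_max_tuple always returns exactly two values");
```

With the array version, a length check such as `result.size() == 2` can only happen while the program runs; with the tuple version, the equivalent check is settled at compile time.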

I see your point on the extra verbosity of type declarations, and would like to hear the thoughts of some others (@rybern, @SteveBronder, @bob-carpenter). To be clear, the proposed syntax would look like tuple(int, real, int) x = (1, 2.5, 3); in full. A more complicated type could look like array[3] tuple(int, array[2] real).

@betanalpha

betanalpha commented May 4, 2022 via email

@bob-carpenter
Collaborator

bob-carpenter commented May 4, 2022

(@betanalpha) not the motivation that has long been thrown around for tuples which are ragged arrays

I don't know who was thinking that was the motivation, but that was never my intention. I've always just wanted standard product types in the type theoretic sense. What tuples give you is like a fixed-size, fixed-type version of a list in R.

(@WardBrian) would like to hear the thoughts of some others

I'm OK with either tuple(T1, ..., TN) or (T1, ..., TN) for the type syntax. So that'd be either,

tuple(real, int, real) x = (2.7, 12, -1e-6);

or

(real, int, real) x = (2.7, 12, -1e-6);

I very much like the idea of being able to follow this with:

real a, c;
int b;
a, b, c = x;

which follows Python convention, or, if necessary for the parser,

(a, b, c) = x;

@andrewgelman

andrewgelman commented May 4, 2022 via email

@WardBrian
Member Author

The initial tuples PR we have for stanc doesn't support unpacking, but it is something I want to add soon thereafter. I think it will be possible to translate it directly to a std::tie if I read the C++ docs correctly, or we could generate the necessary sequential assignments ourselves.
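
The two options mentioned above can be sketched in plain C++. This is an illustration under stated assumptions, not the code stanc emits: the `Unpacked` struct and both function names are hypothetical, and the tuple values are borrowed from the examples earlier in this thread.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical holder for the three unpacked targets (a, b, c).
struct Unpacked {
    double a;
    int b;
    double c;
};

// Option 1: std::tie builds a tuple of references to the targets,
// and assigning to it writes each slot into the matching target.
Unpacked unpack_with_tie(const std::tuple<double, int, double>& x) {
    Unpacked u{};
    std::tie(u.a, u.b, u.c) = x;
    return u;
}

// Option 2: generate one assignment per slot, in order.
Unpacked unpack_sequentially(const std::tuple<double, int, double>& x) {
    Unpacked u{};
    u.a = std::get<0>(x);
    u.b = std::get<1>(x);
    u.c = std::get<2>(x);
    return u;
}
```

Both functions produce the same result for a plain tuple value; which form is preferable for generated code is a design decision this sketch does not settle.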

@WardBrian
Member Author

During today's monthly language meeting, @spinkney and @avehtari both expressed a preference for the syntax tuple(T1, ..., TN) for tuple types, as proposed by @betanalpha.

Aki made the particular point that the more explicit form makes it much easier to search a large program (either with your eyes or with a tool like grep) for tuple types, which has a lot of utility.

Tuple expressions are still made with parentheses, so a full example is

tuple(real, int, real) x = (2.7, 12, -1e-6);

@bob-carpenter
Collaborator

I'm totally OK with the tuple(...) syntax.

@WardBrian
Member Author

After updating the implementation to match, it has grown on me more than I thought it would. It makes things like nested tuples much more apparent to the eye: (real, real, (real, real)) vs. tuple(real, real, tuple(real, real)) is a big difference.

mitzimorris
mitzimorris previously approved these changes Mar 3, 2023
Member

@mitzimorris mitzimorris left a comment

A few suggestions, but overall, the spec is very clear.

@mitzimorris
Member

Regarding "static": I was confused by the language; perhaps "slots" makes more sense?
Arrays have a fixed number of dimensions, but the size of each can be specified at runtime.
Tuples have a fixed number of slots, known at compile time, analogous to the number of array dimensions.
I think we need to be explicit here to avoid confusion between "size" and "dimension".

@WardBrian
Member Author

Agreed, especially in the final user-facing documentation. The analogy to arrays is a good one, but it also has its own potential to be confusing, since the relationship between different dimensions of an array and different elements in a tuple is not the same.

mitzimorris
mitzimorris previously approved these changes Mar 6, 2023
Member

@mitzimorris mitzimorris left a comment

per discussion, we can be clearer w/r/t tuples structure, but otherwise, this is g2g.

Member

@mitzimorris mitzimorris left a comment

very clear! complete approval!

@mitzimorris mitzimorris merged commit 159d0ab into stan-dev:master Mar 7, 2023
@WardBrian WardBrian deleted the tuples branch March 7, 2023 14:26