Revised proposal: Tuples #46
Signed-off-by: Brian Ward <bward@flatironinstitute.org>
This all looks great. I am sure that these have already been discussed repeatedly and I don't want to bike shed, but I have a few comments about potential user experience. What is the motivation for having Why the new accessor for individual elements? Because both arrays and linear algebra use I have to say that I'm a huge fan of
versus the more ambiguous
Yes the difference between variable names and type names should be enough to separate the two operations but I think that the added redundancy would be more accessible to the majority of the user base (again similar to the EDIT: (Steve) just fixed a missing backtick
I’m not sure I follow your question, the snippets you’re referencing are establishing what already exists for arrays and row vectors, not what would be used for a tuple. Tuple values would only use {} or [] if they contain arrays or linear algebra types.
Besides being heterogeneous, the key distinction between a tuple and an array is that the tuple has a static, fixed size. The use of
I generally prefer the more succinct syntax, since I think of the fact that tuple types and values are written down very similarly as a win for understandability, but I take your point with the array syntax as well. I do feel quite strongly about not going so far as to make tuple values look like I’m not sure that the unpacking/assignment syntax has any ambiguity if we require parentheses around the LHS of an unpacking. This is not needed in languages like Python, but it is a decent choice in a statically typed language. That way, if you leave the parentheses off, it will simply be a type error, e.g.
I’m not sure I follow your question, the snippets you’re referencing are establishing what already exists for arrays and row vectors, not what would be used for a tuple. Tuple values would only use {} or [] if they contain arrays or linear algebra types.
In that case I think the sentence
`The expression for creating tuples can follow the array notation ({ a, b, c }) and vector notation ([a, b, c]) and use (a, b, c). This will create a tuple of type (typeof(a), typeof(b), typeof(c)), each subject to the normal promotion rules.`
is too confusing as it seems to imply that `({ a, b, c })`, `([a, b, c])` and `(a, b, c)` all construct a tuple of type `(typeof(a), typeof(b), typeof(c))`.
Besides being heterogeneous, the key distinction between a tuple and an array is that the tuple has a static, fixed size. The use of . instead of [] is useful precisely because it is different - one should consider tuples separately from arrays or linear algebra types, since they cannot be iterated over dynamically and you cannot access them with a variable index, only a raw integer. This style is also how struct access would most likely look if we’re going off of structs in other languages.
I appreciate this motivation for the different accessor methods, although I think that the initial documentation will have to be _very_ clear and redundant about the differences between tuple containers and array containers. I think that there will be many “how do I loop over tuples” questions from users who pattern match tuples to how they already use arrays.
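To make the accessor distinction concrete, here is a minimal sketch contrasting array indexing with tuple element access. It assumes the `tuple(...)` type spelling that is preferred later in this thread, and the variable names (`xs`, `t`, `first`, `second`) are hypothetical.

```stan
transformed data {
  // Array: homogeneous elements, indexed with [], and the index may be a variable.
  array[3] real xs = {1.5, 2.5, 3.5};
  real total = 0;

  // Tuple: heterogeneous, fixed number of slots, accessed with a literal position.
  tuple(real, int) t = (1.5, 2);
  real first = t.1;
  int second = t.2;

  // Looping works for the array because xs[i] accepts a variable index.
  for (i in 1:3) {
    total += xs[i];
  }
  // There is no t[i] or t.i for a variable i, so a tuple cannot be looped over.
}
```

The position after the dot is part of the program text, not a runtime value, which is what lets the compiler know the type of each element.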
I have to say that I'm a huge fan of tuple(T1, T2, T3) instead of (T1, T2, T3).
I generally prefer the more succinct syntax, since I think of the fact that tuple types and values are written down very similarly as a win for understandability, but I take your point with the array syntax as well. I do feel quite strongly about not going so far as to make tuple values look like tuple(1.2, 4), though.
I definitely agree that `(1.2, 4)` should be sufficient for defining inline tuple values.
In my experience one of the biggest hurdles to learning the Stan language for new users who are used to less strongly/statically typed languages like R and Python is the type system. That’s why I think that being as explicit as possible about the type declarations, even at the expense of some mild additional verbosity, will be beneficial overall.
I’m not sure that the unpacking/assignment syntax has any ambiguity if we require parentheses around the LHS of an unpacking. This is not needed in languages like Python, but it is a decent choice in a statically typed language. That way, if you leave the parentheses off, it will simply be a type error, e.g.
real a, b = (2.5, 3.5)
Will fail, since it will parse as trying to assign a tuple of reals to the singular real variable b.
Sorry I wasn’t referring to formal ambiguity of the compiler but rather cognitive ambiguity of users. This new type introduces a few new concepts that I think can trip up many users, and I think that language design that explicitly discriminates between type declarations and assignments where appropriate would help ease that burden.
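As an illustration of the declaration-versus-unpacking point discussed in this comment, the sketch below shows element access that needs no unpacking syntax, with the parenthesized unpacking form shown only in comments because it is a proposal in this thread rather than a confirmed feature. The names `pair`, `a`, `b`, `c`, and `d` are hypothetical, and the `tuple(...)` type spelling is the one preferred later in the discussion.

```stan
transformed data {
  tuple(real, real) pair = (2.5, 3.5);

  // Element-by-element access works without any unpacking syntax.
  real a = pair.1;
  real b = pair.2;

  // Under the parenthesized-LHS rule described above, unpacking would read:
  //   real c;
  //   real d;
  //   (c, d) = pair;
  // By contrast, `real c, d = (2.5, 3.5);` declares c and then tries to
  // initialize the single real d with a tuple, which is a type error.
}
```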
I will reword that part, good catch. Agreed on the point of documentation. In addition to being very clear about the issues you describe, I think it will be important to stress what benefits this provides; e.g., it is possible right now to define a function which returns two real values by setting the return type to I see your point on the extra verbosity of type declarations, and would like to hear the thoughts of some others (@rybern, @SteveBronder, @bob-carpenter). To be clear, the proposed syntax would look like
Agreed on the point of documentation. In addition to being very clear about the issues you describe, I think it will be important to stress what benefits this provides; e.g., it is possible right now to define a function which returns two real values by setting the return type to array[] real and making a contract with yourself that this will always be length 2. By contrast, if the return type is (real, real) the compiler will make that guarantee for you.
Yeah, especially since this (very good) motivation is not the one that has long been thrown around for tuples, namely ragged arrays. Users coming to this tuple design thinking that it will allow them to loop over ragged arrays will be frustrated, so the sooner the documentation can communicate the scope of this tuple type the better.
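The return-type benefit quoted above can be sketched as follows. The function names `mean_sd_array` and `mean_sd_tuple` are hypothetical, and the tuple type is written in the `tuple(real, real)` form preferred later in this thread rather than the `(real, real)` form used at this point in the discussion.

```stan
functions {
  // Array version: nothing in the type says the result has exactly two
  // elements; the length-2 contract lives only in the author's head.
  array[] real mean_sd_array(vector y) {
    return {mean(y), sd(y)};
  }

  // Tuple version: the return type itself fixes the result at two reals,
  // so the compiler can reject any caller that expects a different shape.
  tuple(real, real) mean_sd_tuple(vector y) {
    return (mean(y), sd(y));
  }
}
```

A caller of the tuple version would read the results as `.1` and `.2`, and each element could have a different type if the function needed to return, say, a real and an int.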
I don't know who was thinking that was the motivation, but that was never my intention. I've always just wanted standard product types in the type theoretic sense. What tuples give you is like a fixed-size, fixed-type version of a list in R.
I'm OK with either
tuple(real, int, real) x = (2.7, 12, -1e-6);
or
(real, int, real) x = (2.7, 12, -1e-6);
I very much like the idea of being able to follow this with:
real a, c;
int b;
a, b, c = x;
which follows Python convention, or, if necessary for the parser,
a, b, c = x;
The initial tuples PR we have for stanc doesn't support unpacking, but it is something I want to add soon thereafter. I think it will be possible to translate it directly to a
During today's monthly language meeting, @spinkney and @avehtari both expressed a preference for the syntax Aki made the particular point that the more explicit form makes it much easier to search a large program (either with your eyes or with a tool like Tuple expressions are still made with parentheses, so a full example is tuple(real, int, real) x = (2.7, 12, -1e-6);
I'm totally OK with the
After updating the implementation to match it has grown on me more than I thought. It makes things like nested tuples much more apparent to the eye,
A few suggestions, but overall, the spec is very clear.
regarding static - was confused by the language - perhaps "slots" makes more sense?
Agreed, especially in the final user-facing documentation. The analogy to arrays is a good one, but also has its own potential to be confusing since the relationship between different dimensions of an array and different elements in a tuple is not the same.
per discussion, we can be clearer w/r/t tuple structure, but otherwise, this is g2g.
very clear! complete approval!
This design doc takes the information from Bob's original proposal here: #24
It reformats it to the standard RFC format and adds additional information on the currently proposed implementation.
In particular, it details IO formats and some of the implementation challenges.
Rendered version:
https://github.com/WardBrian/design-docs/blob/tuples/designs/0018-tuples.md