New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow array sizes after type in declarations #560
Conversation
@nhuurre would you be comfortable reviewing this? |
Sure, I'll give it a try. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change I'm requesting is that a user who types
real[2] y[2];
real[2][2] z;
needs a meaningful error that explains what they did wrong.
I also have a question about the syntax. The declaration has the matrix dimensions before array dimensions. This is backwards compared to accessing the elements.
matrix[K,L][N,M] mat;
print(mat[n,m,k,l]);
I find that a bit counter-intuitive. Some new users have mixed up the order even with the current syntax where the dimensions are clearly separated. Has anyone considered unified size declaration like
matrix[N,M|K,L] mat;
where |
separates the matrix dimensions from the array-like dimensions and the dimensions are in the same order as they will be when indexing the object.
vector[7] c1[8]; | ||
vector[7] c2[8, 9]; | ||
vector[7] c3[8]; | ||
vector[7] c4[8, 9]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the pretty printer reverts code to the old syntax. That seems odd; isn't the new syntax meant to be a better replacement for the old? If so, the printer should output should reflect that.
Or maybe the AST should remember which syntax was used and pretty printer preserves it. I'd like to keep syntax information in the AST. The canonicalizer can change it to the "preferred" form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. My plan was to make that change in a future PR, but I could easily add it if that's better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pretty printer can wait. I think the AST change would belong to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the AST currently sensitive to style anywhere? It currently throws out e.g. whitespace and comments. IMO, preserving code style is a pretty significant change in the goal of the AST, and this would be the only instance AFAIK. Do you have an important use case in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're saying; you're considering it a bug that e.g. comments are thrown out, and this would make that bug worse by not including this element of style in the AST.
But I was misleading, this isn't a feature of style like whitespace or comments; we're planning on depreciating the old syntax, so really we want to always print out the new syntax. When we do that we won't need this information in the AST, it would become obsolete once the pretty printer changes are made, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with omitting comments is that --auto-format
doesn't improve the readbility of a complex model--without comments it's even worse. I don't care about other syntax changes.
Note that GetLP
and GetTarget
are exactly the same except that one is deprecated. The current AST considers them separate anyway. The pretty printer prints the AST faithfully; deprecated constructs are cleaned up by the canonicalizer. This division of labor makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the precedent with GetLP
and GetTarget
, especially because it was necessary to parse them differently in order to emit a depreciation warning. I agree it's necessary to put them in the AST for that reason at minimum. I'll add that!
Thanks @nhuurre!
Good catch, I'll see what I can do. It's sometimes quite difficult to differentiate erroneous parser states from each other. I would personally prefer if this syntax were allowed but that's a bigger change.
I agree that
I personally don't find that syntax easier, but I'd understand if it were easier for the new users you mentioned. |
On the parser message, I can only add one message for all unexpected cases. I could change the message for e.g. Edit: I should mention that I could add a specialized error message using the same trick I used for #457, but it adds some complexity. |
Can't you just make the parser expect Regarding the syntax options, I didn't see a stanc3 issue for this. Where's the design discussion? Is it part of the ragged arrays design-doc? Somewhere on Discourse? |
Yep, that's essentially the trick I mentioned. I try to avoid it because it mangles the parse tree a bit. I can add it here if you think that's going to be a sufficiently common point of confusion.
It's a necessary step preceding tuples; @bob-carpenter may know about a design discussion |
I definitely agree on meaningful errors for doubling up array syntax. The order of stacked operators is an evergreen debate topic in algebraic circles because no ordering is optimal for everyting. I like
As @nhuurre points out, this produces a reversed index ordering of
and their constrained equivalent, because those won't raise compiler errors if the indexes are reversed. I tend to lean toward the solution that's easiest to specify and hence document. That usually leans toward generality. But that's how we wound up with the general array issue in the first place where we have |
You said you'd prefer to allow |
I don't know about extending the AST to include something that we don't support just to reject it, I think that's higher cost than explicitly rejecting it with the parser trick. I'll make the changes you suggested @nhuurre - better error message and parse each syntax separately. |
This is off topic, but are omments not in the AST? If so, should I create an issue to add them? That way, we can use the AST as the basis of a proper pretty printer for Stan programs. I'd like to autoformat, say, all the programs in our example repos. |
There's already an issue for it, #93 , but it's buried on page 5 (!!) of our issue tracker far behind many issues fixed or forgotten long ago. I looked into it last week but despite being marked as "good first issue" it's seems rather difficult. |
Agreed, we need to sweep this issue tracker a bit. |
I opened a poll on discourse to get more feedback on this |
This is superseded by #669, which implements the |
Closing in favor of #669 |
Declarations of the form:
are now parsed, and have the same semantics as if the
[n,..]
were after the identifier. For example:is now equivalent to:
Let me know if you come up with more test cases.
Paging @bob-carpenter