Allow array sizes after type in declarations #560

rybern · 2020-06-02T01:26:57Z

Declarations of the form:

type[n,..] identifier;
type[n,..] identifier = value;

are now parsed, and have the same semantics as if the [n,..] were after the identifier. For example:

int[2] a;
vector[3][3,2] b = ..;
matrix[3,3][5,4] c = ..;

is now equivalent to:

int a[2];
vector[3] b[3,2] = ..;
matrix[3,3] c[5,4] = ..;
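
As a rough illustration of the equivalence above (not the actual change to parser.mly), the rewrite from the new form to the old one can be sketched in Python. The helper name `to_old_syntax`, the table `TYPE_SIZE_BRACKETS`, and the regex are all hypothetical, and only a handful of base types with simple, non-nested sizes are assumed:

```python
import re

# Assumption: how many bracket groups belong to the type itself.
# Only a small subset of Stan types is modelled here.
TYPE_SIZE_BRACKETS = {"int": 0, "real": 0, "vector": 1, "row_vector": 1, "matrix": 1}

DECL = re.compile(
    r"^\s*(?P<type>[a-z_]+)"          # base type name
    r"(?P<brackets>(?:\[[^\]]*\])*)"  # zero or more [..] groups after the type
    r"\s+(?P<name>\w+)"               # identifier
    r"(?P<rest>.*)$"                  # anything that follows, e.g. "= ..;"
)


def to_old_syntax(decl: str) -> str:
    """Move array sizes written after the type to after the identifier."""
    m = DECL.match(decl)
    if m is None or m.group("type") not in TYPE_SIZE_BRACKETS:
        raise ValueError(f"unsupported declaration: {decl!r}")
    groups = re.findall(r"\[[^\]]*\]", m.group("brackets"))
    n_own = TYPE_SIZE_BRACKETS[m.group("type")]
    own, array = groups[:n_own], groups[n_own:]
    if len(array) > 1:
        raise ValueError("more than one array size group after the type")
    return (
        f"{m.group('type')}{''.join(own)} "
        f"{m.group('name')}{''.join(array)}{m.group('rest')}"
    )


for new_form in ["int[2] a;", "vector[3][3,2] b = ..;", "matrix[3,3][5,4] c = ..;"]:
    print(new_form, "->", to_old_syntax(new_form))
```

A declaration already in the old form passes through unchanged, because no bracket group is left over after the type's own sizes.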

Let me know if you come up with more test cases.

Paging @bob-carpenter

rok-cesnovar · 2020-06-17T09:55:39Z

@nhuurre would you be comfortable reviewing this?

nhuurre · 2020-06-17T10:05:22Z

Sure, I'll give it a try.

nhuurre

The change I'm requesting is that a user who types

real[2] y[2];
real[2][2] z;

needs a meaningful error that explains what they did wrong.

I also have a question about the syntax. The declaration has the matrix dimensions before array dimensions. This is backwards compared to accessing the elements.

matrix[K,L][N,M] mat;
print(mat[n,m,k,l]);

I find that a bit counter-intuitive. Some new users have mixed up the order even with the current syntax where the dimensions are clearly separated. Has anyone considered unified size declaration like

matrix[N,M|K,L] mat;

where | separates the matrix dimensions from the array-like dimensions and the dimensions are in the same order as they will be when indexing the object.

src/frontend/parser.mly

nhuurre · 2020-06-17T12:42:39Z

test/integration/good/pretty.expected

  vector[7] c1[8];
  vector[7] c2[8, 9];
+  vector[7] c3[8];
+  vector[7] c4[8, 9];


So the pretty printer reverts code to the old syntax. That seems odd; isn't the new syntax meant to be a better replacement for the old? If so, the printer's output should reflect that.

Or maybe the AST should remember which syntax was used and pretty printer preserves it. I'd like to keep syntax information in the AST. The canonicalizer can change it to the "preferred" form.

Agreed. My plan was to make that change in a future PR, but I could easily add it if that's better

The pretty printer can wait. I think the AST change would belong to this PR.

Is the AST currently sensitive to style anywhere? It currently throws out e.g. whitespace and comments. IMO, preserving code style is a pretty significant change in the goal of the AST, and this would be the only instance AFAIK. Do you have an important use case in mind?

I don't know but here Matthis says the current AST is "full fidelity" and people discuss why that is a good thing. Throwing out comments is a mistake. I consider it a critical bug that precludes the use of --auto-format.

I see what you're saying; you're considering it a bug that e.g. comments are thrown out, and this would make that bug worse by not including this element of style in the AST.

But I was misleading: this isn't a feature of style like whitespace or comments. We're planning on deprecating the old syntax, so really we want to always print out the new syntax. When we do that we won't need this information in the AST; it would become obsolete once the pretty printer changes are made, right?

The problem with omitting comments is that --auto-format doesn't improve the readability of a complex model; without comments it's even worse. I don't care about other syntax changes.

Note that GetLP and GetTarget are exactly the same except that one is deprecated. The current AST considers them separate anyway. The pretty printer prints the AST faithfully; deprecated constructs are cleaned up by the canonicalizer. This division of labor makes sense to me.
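
To make that division of labor concrete, here is a minimal Python sketch. It is not the stanc3 OCaml code: the dataclasses only borrow the names GetLP and GetTarget from the AST, and `pretty_print` and `canonicalize` are hypothetical stand-ins for the two passes. The surface spellings `get_lp()` (deprecated) and `target()` are the ones Stan uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetLP:
    """Deprecated spelling, written get_lp() in a Stan program."""


@dataclass(frozen=True)
class GetTarget:
    """Current spelling, written target() in a Stan program."""


def pretty_print(node) -> str:
    # Faithful printing: the output uses whichever spelling the AST recorded.
    if isinstance(node, GetLP):
        return "get_lp()"
    if isinstance(node, GetTarget):
        return "target()"
    raise TypeError(f"unknown node: {node!r}")


def canonicalize(node):
    # Cleanup pass: replace the deprecated node by its preferred form.
    return GetTarget() if isinstance(node, GetLP) else node


old = GetLP()
print(pretty_print(old))                # faithful
print(pretty_print(canonicalize(old)))  # canonical
```

Because the two spellings stay distinct in the tree, a deprecation warning can be attached to GetLP alone, and the same pattern would apply to the two array declaration forms.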

Good point about the precedent with GetLP and GetTarget, especially because it was necessary to parse them differently in order to emit a deprecation warning. I agree it's necessary to put them in the AST for that reason at minimum. I'll add that!

rybern · 2020-06-17T14:07:24Z

Thanks @nhuurre!

The change I'm requesting is that a user who types
real[2] y[2];
real[2][2] z;
needs a meaningful error that explains what they did wrong.

Good catch, I'll see what I can do. It's sometimes quite difficult to differentiate erroneous parser states from each other. I would personally prefer if this syntax were allowed but that's a bigger change.

I also have a question about the syntax. The declaration has the matrix dimensions before array dimensions. This is backwards compared to accessing the elements.
matrix[K,L][N,M] mat;
print(mat[n,m,k,l]);
I find that a bit counter-intuitive. Some new users have mixed up the order even with the current syntax where the dimensions are clearly separated.

I agree that mat[n, m, k, l] is counter-intuitive in this case, but I think that mat[n, m][k, l] is correct, and I think it's reasonable (and possibly necessary) for the former syntax to match the latter. (I say the latter is correct because for any value a of type T[], I expect a[n] to be of type T, so mat[n, m] should be of type matrix.)

Has anyone considered unified size declaration like
matrix[N,M|K,L] mat;
where | separates the matrix dimensions from the array-like dimensions and the dimensions are in the same order as they will be when indexing the object.

I personally don't find that syntax easier, but I'd understand if it were easier for the new users you mentioned.

rybern · 2020-06-17T14:22:07Z

On the parser message, I can only add one message for all unexpected cases. I could change the message for e.g. real [2] y[2]; to something like "Expected assignment or semicolon after identifier in declaration (a second array index is not allowed).", but that message would be more confusing when triggered by e.g. real [2] z with a missing semicolon.

Edit: I should mention that I could add a specialized error message using the same trick I used for #457, but it adds some complexity.

nhuurre · 2020-06-17T14:40:44Z

I can only add one message for all unexpected cases. I could change the message for e.g. real [2] y[2]; to something like "Expected assignment or semicolon after identifier in declaration (a second array index is not allowed)."

Can't you just make the parser expect real[2] y[2] but then panic when it sees it?

Regarding the syntax options, I didn't see a stanc3 issue for this. Where's the design discussion? Is it part of the ragged arrays design-doc? Somewhere on Discourse?

rybern · 2020-06-17T14:46:45Z

Can't you just make the parser expect real[2] y[2] but then panic when it sees it?

Yep, that's essentially the trick I mentioned. I try to avoid it because it mangles the parse tree a bit. I can add it here if you think that's going to be a sufficiently common point of confusion.
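
For readers unfamiliar with the trick, the idea of accepting the doubled form and then rejecting it with a targeted message can be sketched in Python as follows. This is only an illustration under simplifying assumptions (scalar base types, non-nested sizes); the names `DeclError` and `check_scalar_array_decl` and the message texts are hypothetical and are not what the parser emits.

```python
import re

# Assumption: only int/real declarations with simple sizes are handled.
SCALAR_DECL = re.compile(
    r"^\s*(int|real)"      # scalar base type
    r"((?:\[[^\]]*\])*)"   # size groups written after the type
    r"\s+(\w+)"            # identifier
    r"((?:\[[^\]]*\])*)"   # size groups written after the identifier
    r"\s*(=.*)?;\s*$"      # optional definition, then the semicolon
)


class DeclError(Exception):
    pass


def check_scalar_array_decl(decl: str):
    """Return (identifier, sizes) or raise a DeclError naming the mistake."""
    m = SCALAR_DECL.match(decl)
    if m is None:
        raise DeclError("Expected a declaration ending in ';'.")
    _base, before, name, after, _value = m.groups()
    if before.count("[") > 1:
        raise DeclError("Array sizes were given twice after the type.")
    if before and after:
        raise DeclError(
            "Array sizes were given both after the type and after the identifier."
        )
    return name, before or after


for decl in ["real[2] y;", "real y[2];", "real[2] y[2];", "real[2][2] z;"]:
    try:
        print(decl, "->", check_scalar_array_decl(decl))
    except DeclError as err:
        print(decl, "-> error:", err)
```

The cost mentioned above shows up here as well: the grammar has to recognize shapes it never intends to accept.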

Regarding the syntax options, I didn't see a stanc3 issue for this. Where's the design discussion? Is it part of the ragged arrays design-doc? Somewhere on Discourse?

It's a necessary step preceding tuples; @bob-carpenter may know about a design discussion

bob-carpenter · 2020-06-17T15:08:19Z

I definitely agree on meaningful errors for doubling up array syntax.

The order of stacked operators is an evergreen debate topic in algebraic circles because no ordering is optimal for everything.

I like matrix[M, N][K1, ..., KJ] because

it preserves the simple declaration of array syntax as T[K1, ..., KJ], where here T = matrix[M, N], and
it preserves straightforward array indexing semantics, so that if x is an array of type T[K1, ..., KJ], then x[k1] is of type T[K2, ..., KJ] if k1 in 1:K1.
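
The two properties can be checked with a tiny type model. The Python below is a sketch only: `Matrix`, `Array` and `index_once` are hypothetical names, sizes are kept as symbolic strings, and only array indexing is modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Matrix:
    rows: str
    cols: str


@dataclass(frozen=True)
class Array:
    elem: object           # element type T
    dims: Tuple[str, ...]  # K1, ..., KJ


def index_once(t):
    """Type of x[k] for an array x of type t: drop the leading array size."""
    if not isinstance(t, Array):
        raise TypeError("only array indexing is modelled in this sketch")
    rest = t.dims[1:]
    return Array(t.elem, rest) if rest else t.elem


# matrix[M, N][K1, K2] is T[K1, K2] with T = matrix[M, N]
x = Array(Matrix("M", "N"), ("K1", "K2"))
print(index_once(x))              # T[K2]
print(index_once(index_once(x)))  # T, i.e. matrix[M, N]
```

Indexing consumes K1 and K2 before it ever reaches M and N, which is exactly the reversed ordering discussed next.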

As @nhuurre points out, this produces a reversed index ordering of K1, ..., KJ, M, N for matrix[M, N][K1, ..., KJ], which will definitely confuse some users. The particular cases for concern are:

matrix[M, N][I, J],
vector[M][I],

and their constrained equivalents, because those won't raise compiler errors if the indexes are reversed.

I tend to lean toward the solution that's easiest to specify and hence document. That usually leans toward generality. But that's how we wound up with the general array issue in the first place where we have real[] because I just wanted to say that whenever we have a type T, we can have an array type T[] without having to say things like "except for T = real".

nhuurre · 2020-06-17T15:17:34Z

Yep, that's essentially the trick I mentioned. I try to avoid it because it mangles the parse tree a bit. I can add it here if you think that's going to be a sufficiently common point of confusion.

You said you'd prefer to allow real[N][K] x[M];. I'd like a design where the parser accepts it and the semantic check rejects it. Of course that requires keeping at least some formatting info in the AST.
The canonicalizer could even fix such "unambiguous but nonconforming" syntax.

rybern · 2020-06-17T15:43:56Z

I don't know about extending the AST to include something that we don't support just to reject it, I think that's higher cost than explicitly rejecting it with the parser trick.

I'll make the changes you suggested @nhuurre - better error message and parse each syntax separately.

bob-carpenter · 2020-06-17T17:25:31Z

This is off topic, but are comments not in the AST? If so, should I create an issue to add them? That way, we can use the AST as the basis of a proper pretty printer for Stan programs. I'd like to autoformat, say, all the programs in our example repos.

nhuurre · 2020-06-17T17:33:58Z

This is off topic, but are comments not in the AST? If so, should I create an issue to add them?

There's already an issue for it, #93, but it's buried on page 5 (!!) of our issue tracker, far behind many issues fixed or forgotten long ago. I looked into it last week, but despite being marked as "good first issue" it seems rather difficult.

rok-cesnovar · 2020-06-17T17:39:12Z

Agreed, we need to sweep this issue tracker a bit.

nhuurre · 2020-06-17T18:47:50Z

The order of stacked operators is an evergreen debate topic in algebraic circles because no ordering is optimal for everything.

I opened a poll on discourse to get more feedback on this
https://discourse.mc-stan.org/t/new-array-declaration-syntax/16011

rybern · 2020-08-10T00:47:08Z

This is superseded by #669, which implements the array[] syntax

rybern · 2020-08-12T02:42:05Z

Closing in favor of #669

rybern added 3 commits June 1, 2020 17:59

allowing array sizes to be after types, in addition to after identifiers

85e960e

Added test. Also need to refresh parser errors (473); rerun tests.

78400ac

updated parser error messages; added tests

14d91b6

rybern mentioned this pull request Jun 2, 2020

Parse multiple-identifier declarations and declare-defines; refactor declaration parsing #561

Closed

nhuurre requested changes Jun 17, 2020


rybern mentioned this pull request Aug 9, 2020

New array type syntax #669

Closed

rybern closed this Aug 12, 2020
