Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretty printer preserves comments #894

Merged
merged 18 commits into from Sep 7, 2021

Conversation

nhuurre
Copy link
Collaborator

@nhuurre nhuurre commented May 3, 2021

Another shot at #93 . This is less hacky than #603 and keeps all comments (the comments with "unexpect" placement get pushed a few lines forward).

The strategy here is not to attach the comments to AST nodes but to dump them all into a single list with location metadata for each. Since AST nodes already have location metadata (for reporting semantic errors) the pretty printer can use that to interweave the comments with the code it outputs.

This is draft because unfortunately something goes wrong with #include and I can't figure out why.

Release notes

Correct canonicalizing/auto-formatting of commented Stan programs; comments no longer discarded.

Copyright and Licensing

Copyright holder: Niko Huurre
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

@nhuurre
Copy link
Collaborator Author

nhuurre commented Jun 12, 2021

Ok, I figured it out. The location metadata coming out the parser is a bit off, the begin_loc of a statement is actually at the end of the previous statement and that means some hacks are required to correctly format the whitespace in between.

Anyway, this is ready for review.
I don't understand what's causing the Jenkins failure but it seems unrelated.

@nhuurre nhuurre marked this pull request as ready for review June 12, 2021 20:00
@rok-cesnovar
Copy link
Member

Sorry Niko, just saw that this has been waiting for 2 weeks+.

@rybern would you be able to have a look at this, given that you already discussed solutions/reviewed in #603? Else I can take a look.

@rybern
Copy link
Collaborator

rybern commented Jun 28, 2021

Sorry, probably not going to be able to help with this in a timely manner @rok-cesnovar

Copy link
Member

@rok-cesnovar rok-cesnovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I do have a question about all the new newlines (see pretty.expected changes). Are they intentional and could we avoid adding them?

@WardBrian
Copy link
Member

Is any help needed on this issue? I'd love to see it merged so I can start really using the auto formatting ability on my own programs without worrying about comments

@nhuurre
Copy link
Collaborator Author

nhuurre commented Aug 27, 2021

I think I resolved the problem with extra lines around local blocks.

Is any help needed on this issue?

Just waiting for a review.

@WardBrian
Copy link
Member

Do you still want to review this @rok-cesnovar? Otherwise I can take a look

@rok-cesnovar
Copy link
Member

Go ahead Brian, thanks! I forgot about this one :/ sorry

Copy link
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- just a few minor comments and a question. Also, looks like you need to make format to make Jenkins happy

src/frontend/lexer.mll Show resolved Hide resolved
let trim_tail col_num lines =
match lines with [] -> [] | hd :: tl -> hd :: trim (col_num - 2) tl
in
Fmt.pf ppf "@[<v>/*%a*/@]" Fmt.(list string) (trim_tail col_num lines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How difficult would it be to preserve single-line comments as // style? I think this would be nice to have, but if it is too difficult I am OK with this behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Bob pointed out to me that we have to preserve single line comments.

Consider the valid stan program

// this is a comment */

Converting this from a line comment to a block comment is invalid, and running this branch on it leads to:

Warning: Empty file 'test.stan' detected; this is a valid stan model but likely unintended!
Uncaught exception:
  
  "Option.value_exn None"

Raised at file "src/error.ml" (inlined), line 9, characters 14-30
Called from file "src/option.ml", line 69, characters 4-21
Called from file "src/frontend/Pretty_printing.ml", line 598, characters 33-74
Called from file "src/frontend/Pretty_printing.ml", line 606, characters 2-28
Called from file "src/stanc/stanc.ml", line 190, characters 18-60
Called from file "src/stanc/stanc.ml", line 268, characters 9-16

This needs to be addressed, and we should add said file to the test suite

src/frontend/Pretty_printing.ml Show resolved Hide resolved
let trim_tail col_num lines =
match lines with [] -> [] | hd :: tl -> hd :: trim (col_num - 2) tl
in
Fmt.pf ppf "@[<v>/*%a*/@]" Fmt.(list string) (trim_tail col_num lines)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Bob pointed out to me that we have to preserve single line comments.

Consider the valid stan program

// this is a comment */

Converting this from a line comment to a block comment is invalid, and running this branch on it leads to:

Warning: Empty file 'test.stan' detected; this is a valid stan model but likely unintended!
Uncaught exception:
  
  "Option.value_exn None"

Raised at file "src/error.ml" (inlined), line 9, characters 14-30
Called from file "src/option.ml", line 69, characters 4-21
Called from file "src/frontend/Pretty_printing.ml", line 598, characters 33-74
Called from file "src/frontend/Pretty_printing.ml", line 606, characters 2-28
Called from file "src/stanc/stanc.ml", line 190, characters 18-60
Called from file "src/stanc/stanc.ml", line 268, characters 9-16

This needs to be addressed, and we should add said file to the test suite

@WardBrian
Copy link
Member

@nhuurre - would you mind if I take a crack at adapting what you've done to line comments?

@nhuurre
Copy link
Collaborator Author

nhuurre commented Sep 3, 2021

@WardBrian
Sorry, I should have gotten back to this earlier. Feel free try if you want. Pushed a commit with partial progress right now.
I think you need to use Format.pp_force_newline to guarantee a line break after the comment without messing up the formatting but when I tried I got an empty line after every line comment.

Copy link
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nevermind then, that is more or less exactly how I would have implemented it!

It seems like there is a merge issue for the AST with the addition of complex numbers, but that should be a simple fix.

Additionally, can you add the following as test/integration/good/comments.stan, and update the expected test outputs? We should make sure they preserve line comments and that the following program is preserved (except for #)

// this is a line comment */ should stay // /**/

# another one using # - should become //

// how ugly # can */ we make /* this one // # 

/* one line block */

/* a weird 
      ly formated
          block 
                */
                
/**
 * Doc comment
 * @return nothing
 */

 data {
   //nothing here either 
 }

Fmt.pf ppf "@[<v>/*%a*/@]" Fmt.(list string) (trim_tail col_num lines)
if is_block then
Fmt.pf ppf "@[<v>/*%a*/@]" Fmt.(list string) (trim_tail col_num lines)
else Fmt.pf ppf "@[<v>/*%a */@]" Fmt.(list string) lines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this line should be

  else Fmt.pf ppf "//%a" Fmt.string (List.hd_exn lines)

or similar

Copy link
Collaborator Author

@nhuurre nhuurre Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this just produces a block comment at the moment.
A // comment needs a new line at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like newlines are preserved in most cases, it is only a few edge cases where weird things happen, like in test/integration/good/ode_good.stan we have this section:

functions {
  real[] harm_osc_ode(real t,
                      real[] y,         // state
                      real[] theta,     // parameters
                      real[] x,         // data
                      int[] x_int) {    // integer data

which fails in the naive solution I suggested above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. That's where you need Format.pp_force_newline. But I'm afraid that knowing if you're in that context requires a large refactoring of the printer code.

Copy link
Member

@WardBrian WardBrian Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the tests which work for now, leaving out the exceptions so the tests will still fail in CI. This will require some thought, but I think we could try a special case if a comma and a line comment are on the same line?

Edit: Hmm, I suppose you will have a problem if the comment appears inside any set of parenthesis, even if there isn't a comma, like

real f(int x //some comment
) {
...

Maybe the only state you need to track is 'are we inside parenthesis?', but that's not exactly simple either. Another alternative way is to have the lexer preserve newlines/eols in the string it uses to create a LineComment, and then try our best to remove unnecessary newlines later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is pp_list_of (inside parens) vs pp_list_of_statements (at toplevel). Those don't share as much code as I thought so maybe it's not so bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully it is simpler then. Feel free to ping me if you want help/someone else to look at the problem

@WardBrian WardBrian linked an issue Sep 7, 2021 that may be closed by this pull request
Copy link
Member

@WardBrian WardBrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is a really great feature, thanks for all your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store comments on statements in AST for pretty printer
4 participants