Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: Free-floating comments are single-biggest issue when manipulating the AST #20744

Open
griesemer opened this issue Jun 21, 2017 · 21 comments

Comments

@griesemer
Copy link
Contributor

commented Jun 21, 2017

Reminder issue.

The original design of the go/ast makes it very difficult to update the AST and retain correct comment placement for printing. The culprit is that the comments are "free-floating" and printed based on their positions relative to the AST nodes. (The go/ast package is one of the earliest Go packages in existence, and this is clearly a major design flaw.)

Instead, a newer/updated ast should associate all comments (not just Doc and Comment comments) with nodes. That would make it much easier to manipulate a syntax tree and have comments correctly attached. The main complexity with that approach is the heuristic that decides which comments are attached to which nodes.

@griesemer griesemer added this to the Unreleased milestone Jun 21, 2017

@griesemer griesemer self-assigned this Jun 21, 2017

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jun 21, 2017

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2017

@jimmyfrasche Thanks for your comment. I do have a design and prototype that is close, but not good enough for gofmt purposes. I'd be interested in how you integrate your comment nodes in the rest of a syntax tree.

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jun 21, 2017

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jun 21, 2017

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented Jun 21, 2017

@jimmyfrasche Thanks for the clarification. Because /* */ style comment can appear anywhere, they do add complexity as it's not always clear if they belong to the token before or after.

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jun 21, 2017

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Jun 22, 2017

I suppose one way around that would be, in expressions, to always have comments come after (where, for binops "after" is defined as after the binary operator so that

a /* 1 */ + /* 2 */ b /* 3 */

stores 1 on a, 2 on +, and 3 on b.

Stuff like CallExpr would need to store multiple comments for code like

f /* 1 */ () /* 2 */

and statements would need before and after comments for

/* before */ return /* after - but if expression is returned comment is attached to that */

With that and the "comments are nodes" approach plus the extra CommentGroup on the File, I think that would get everything.

Verbose and filled with cases but explicit and easy to manipulate.

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented Jun 23, 2017

@jimmyfrasche My approach is similar. On the one extreme, each token could have a potential comment attached; and on the other, there's one comment for each node. The former is too expensive in terms of representation but would probably preserve comment positioning perfectly; and the latter is relatively cheap but may not preserve all comments in place (depending on heuristics). The stored position is helpful, but relying on it introduces the same problems we have already. Thus, any solution will have to work based on the graph alone, and ignore positions (they are used only in the beginning for the heuristics to determine how to associate a node).

@quasilyte

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2017

Are there any advices in how this can be done today?

In my experience, trailing comments are particularly hard to handle.
Suppose a tool that sorts constants in alphabetical order:

const (
  C // C comment
  A
  B // B comment
)

With manual Pos update and format.Node, the best result I managed to achieve is:

const (
  A
  // C comment
  C 
  // B comment
  B
)

I am sorry if this is just me being lame.

#17108 had some progress, but it is unclear if it has any relation to this.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2017

@quasilyte This issue is about fixing the problem. Let's not complicate the issue with a discussion of how to deal with the problem before it is fixed. Please take that discussion to a forum; see https://golang.org/wiki/Questions. Thanks.

@matthewmueller

This comment has been minimized.

Copy link

commented Jul 20, 2018

Hey guys, I'm wondering if I can help push this one along in some way or if there are any workarounds to this issue? I keep finding reasons to modify, append and delete pieces of the AST, but adding and positioning comments ends up biting pretty hard.

Right now I use text/template to generate Go code, but that doesn't work well if you want to make adjustments to existing code. My current use-case is that I want to automatically adjust function signatures and add new functions based on schema changes, without breaking or overriding these function's handwritten bodies. This kind of code surgery is really difficult to pull off in Go right now.

I'm sure you guys have thought about this a ton already, so I'm not sure how helpful this is, but I think the AST changes would have the following characteristics:

  • comment nodes implement ast.Decl and ast.Stmt, so they can be added anywhere
  • free-floating comments (e.g. comment nodes whose next sibling isn't an ast.GenDecl) would be scattered throughout the AST, in the same way declarations and statements are.
  • attached comments (on top of a declaration or to the side of a field), would be considered owned by the node and live in the Doc field.
  • ast/printer uses gofmt's spacing rules, rather than token position

And then, any changes to these comment nodes would be reflected when ast/printer traverses the AST and prints the code out in a consistent format. That would be amazing

@vektah vektah referenced this issue Jul 23, 2018
@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2018

@matthewmueller This sounds about right to me, but the devil is in the details.

I think we would want each AST node to have an additional comment field (that is usually a nil pointer) and that field would then be used to attach comments that "belong" (are associated) with that node. There may be additional information about the comment's position needed (as in "above", "before", "after", or "below" the node's line) for good positioning. Such associated comments would be always attached with the relevant nodes. (It's not clear to me that we can just use the existing Doc fields and expand the concept of Doc fields to all nodes given that those have specific meaning at the moment.) Additionally, as you have observed, "free-floating" (non-associated) comments would need to act like Decls and Stmts (and presumably those would always have an empty line before and after them as otherwise they would be associated with a node.

It should be possible to do this is an backward-compatible way: Add a new *Comment (or similar) field to all nodes, and define the respective data structure. Then, a new function could take the existing list of comments from the parser and associate them with the nodes. This is where a good heuristic is needed (I've had a prototype for this a couple of years back but I didn't get to make it complete). Finally, go/printer would need to be adjusted to position comments based on node association (I'd copy go/printer and create a 2nd version that is appropriately modified - this would make sure the existing code will continue to run as always).

The hard part is getting the comment association heuristic work well.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 7, 2018

@matthewmueller some concrete examples of cases that are currently mishandled: #21755, #22371.

@dave

This comment has been minimized.

Copy link
Contributor

commented Sep 15, 2018

I'm currently working on a project that attempts to address this problem.

github.com/dave/dst is a fork of the go/ast package with a few changes. The decorator package converts from *ast.File + *token.FileSet to *dst.File and back again.

All the position fields have been removed from dst so it's just the location in the tree that determines the position of the tokens. Decorations (e.g. comments and newlines) are stored along with each node, and attached to the node at various points. The intention is that any place gofmt will allow a comment / new-line to be attached, dst will allow this.

I've finished a very rough prototype that works pretty well. (Take a look at restorer_test.go - all the tests pass apart from FuncDecl now).

There's several special cases that it doesn't currently handle. Right now I'm generating much of the code, so the special cases are non-trivial to implement. (e.g. Look at FuncDecl - the func token from the Type field is rendered before Recv and Name). Over the next few weeks I'll refactor and handle the special cases.

As @griesemer points out a big problem is where to attach the decorations so as you manipulate the tree they remain attached to the node you were expecting. My algorithm probably needs improvement here too (see decorator_test.go), but I think it currently works well enough to be useful.

This issue is concerned with a replacement for the go/ast package - dst isn't necessarily aiming to be this, so I'll invite further conversation in the dst repo issues or in the #dst Gophers Slack channel.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 24, 2018

Change https://golang.org/cl/137076 mentions this issue: cmd/goforward: add a tool for moving packages

@dave

This comment has been minimized.

Copy link
Contributor

commented Oct 20, 2018

I'm pretty happy with how development of https://github.com/dave/dst has come on in the past few weeks.

My main yardstick is a test that converts the entire Go standard library from ast to dst and back to ast. It prints the resultant ast and compares with the original source. Using this I've found a host of edge cases which are now all fixed. It's possible there's more bugs but I think the standard library is a pretty good corpus of code, and it's now working with 100% fidelity.

I'm also a lot happier with the algorithm that attaches comments to nodes. It gives acceptable results in the vast majority of cases.

Over the next few weeks I'll be dog-fooding the system in a couple of projects, so I'll get a better idea of how it works in the real-world.

I would invite you all to take a look. I may make some tweaks to the API that controls adding / removing comments from the attachment points, and I'd very much appreciate feedback on this.

@stapelberg

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

I would invite you all to take a look. I may make some tweaks to the API that controls adding / removing comments from the attachment points, and I'd very much appreciate feedback on this.

I have just gotten a chance to test the dst package, and it indeed deals with newlines/whitespace much better than the go/ast package in that dst allows you to insert line breaks/blank lines before/after inserted AST nodes.

I haven’t had the need to touch comments for my use-case (yet?), but the whitespace controls are very valuable, and I did not run into any correctness issues with dst yet.

Thanks for the helpful package!

@dave

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

Thanks @stapelberg! I worked really hard on getting the output perfect. Let me know if you find anything unexpected and I'll look into it.

@matttproud

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

@ZhengHe-MD

This comment has been minimized.

Copy link

commented Jun 11, 2019

@dave the dst package just saved my life. it's very helpful! and the dstutil module makes the migration seamless.

@Zemnmez

This comment has been minimized.

Copy link

commented Jul 30, 2019

@dave really appreciate dst. this was such an issue for generating good code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.