Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: provide AST Rewrite functionality (ast.Walk is not good enough) #17108

Closed
griesemer opened this issue Sep 14, 2016 · 41 comments
Closed

go/ast: provide AST Rewrite functionality (ast.Walk is not good enough) #17108

griesemer opened this issue Sep 14, 2016 · 41 comments

Comments

@griesemer
Copy link
Contributor

@griesemer griesemer commented Sep 14, 2016

This has come up again and again. While it's easy to traverse the AST (ast.Walk, ast.Inspect), it's not easily possible to use those for general tree rewrites w/o significant amount of work.

For instance, if we wanted to rewrite all expressions (ast.Expr) of a certain kind into something else, ast.Walk and ast.Inspect would need to consider every node that contains ast.Expr fields separately, rather than just look at nodes that satisfy ast.Expr.

AST rewriters exist in other packages; most notably perhaps in gofmt (for gofmt -r); that one is reflection-based. Reflection-based approaches tend to be general, but also hard to understand, and slower than necessary.

API starting point:

// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite the node. The node's parent is
// the node containing the *addr. If the node is part of a list, index
// identifies its position in that list. Index is < 0 otherwise.
type ApplyFunc func(parent Node, index int, addr interface{}) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, index, and addr. See Apply for the meaning
// of these arguments.
// 
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
func Apply(parent Node, index int, addr interface{}, pre, post ApplyFunc) {
@griesemer griesemer added this to the Go1.8Maybe milestone Sep 14, 2016
@griesemer griesemer self-assigned this Sep 14, 2016
@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 14, 2016

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 14, 2016

This seems like a good addition. The reflect-based implementations in gofmt (and eg) is fearsomely complex, and there are other times I have wanted to use this function.

The index is not enough to uniquely identify a subtree of a node. For example, ast.CaseClause contains two slices of Nodes, and ast.BinOp contains two Exprs. I think we need to identify the field too. The most obvious way to do that is by its name. Field numbers, or field offsets, might be marginally more efficient, but would certainly be harder to read, and I'm users would thank us for choosing strings when they're debugging.

I'm not sure you need the index (or index + field name) in the Apply function, only in ApplyFunc.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Sep 14, 2016

I've also wanted this.

Index is sufficient I think if it counts across all children. E.g., in AssignStmt, it counts from 0 to len(Lhs)+len(Rhs)-1.

I've had cases where I want to be able to walk up the AST more than one Node, so I maintain a slice of (Parent, Index) tuples.

Aside: Index technically isn't necessary, and Java's Compiler API omits it from com.sun.source.util.TreePath. But I think it's handy to have.

I'm not sure I understand what addr's dynamic type will be from the description. I'm guessing though that if I'm visiting a *ast.Ident, then it could be either a **ast.Ident (e.g., the name for a ValueSpec) or *ast.Expr (e.g., an identifier in an expression context)?

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 14, 2016

The dynamic type would be a pointer to the variable (field or slice element), whatever that may be: Node, Expr, *Ident.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 14, 2016

Here's a slightly more thought-through API that provides explicit names for fields. Nice, but definitively somewhat costly:

// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite (replace) the node. The node
// is referred to from its parent which contains a field with the given
// name. If the field is a list, index identifies the node's position in
// that list; index is < 0 otherwise. Or roughly speaking:
//
//   addr == &parent.name         if index < 0
//   addr == &parent.name[index]  if index >= 0
//
// Exception: If the parent is an *ast.Package, and Apply is iterating
// through the Files map, name is the filename, and index is incremented
// by 1 with each file, starting at 0. Assigning to *addr will change the
// corresponding Files map entry but only upon return from pre and post.
type ApplyFunc func(parent Node, name string, index int, addr interface{}) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and addr. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children;
// specifically, token.Pos, Scopes, Objects, and fields of basic types
// (strings, etc.) are ignored. Children are traversed in the order in
// which they appear in the respective node's struct definition.
func Apply(parent Node, name string, index int, addr interface{}, pre, post ApplyFunc) {
@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 14, 2016

@mdempsky points out that alternatively, index could simply be monotonically increasing. The index of a list element would be the provided index minus the index of the list. A utility function could provide the name of a field for a given index if necessary. That function could be reflection-based since it's probably not speed-critical.

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 14, 2016

We probably still want the field name though, so that you can locate the subtree easily in cases like SliceExpr without having to do four separate comparisons---and arguably more importantly, without having to remember to do four separate comparisons. I have found several bugs caused by assuming that a given node was the "correct" subtree of the parent node, when in fact it was another one. Passing the field name will help to make it obvious that this is something you need to think about.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Sep 14, 2016

I think the easier solution is to still reserve indices for nil fields. So for SliceExpr you'd always have 0==X, 1==Low, 2 == High, 3==Max.

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 15, 2016

You can't use the same number to indicate both a field index and a slice index.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 15, 2016

I played with this a bit and I have a prototype that could work. Observations: Passing an address to a field is problematic: Many fields that we are interested in are (ast.)Expr, Stmt, and Decl fields, which are interfaces. To "unpack" them we have to type-switch on the address, then deref the address and then type-switch again on the contents; thus requiring two type switches. Here's a better approach: If we have the parent, field name (or field index), plus slice index if needed, we have all that is necessary because we can use reflect to set a field with this information. Changing/rewriting fields is (probably) much less common then reading (traversing) the tree, thus the more costly set field operation is ok. In turn, the API can be closer to Walk and thus easier to use:

// An ApplyFunc is invoked by Apply for each node n, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The parent, name, and index field are used to identify the parent
// node's field containing n. If the field is a list, index identifies
// the node's position in that list; index is < 0 otherwise. Roughly:
//
//   n == parent.name         if index < 0
//   n == parent.name[index]  if index >= 0
//
// This information can be used to update/rewrite a node with SetField.
//
// Exception: If the parent is a *Package, and Apply is iterating
// through the Files map, name is the filename, and index is -1.
type ApplyFunc func(parent Node, name string, index int, n Node) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children.
// Children are traversed in the order in which they appear in the
// respective node's struct definition.
func Apply(parent Node, name string, index int, n Node, pre, post ApplyFunc)

// SetField sets the named field in the parent node to n. If the field
// is a slice, index is the slice index. The named field must exist in
// the parent, n must be assignable to that field, and the field must be
// indexable if index >= 0. Roughly:
//
//   parent.name        = n  if index < 0
//   parent.name[index] = n  if index >= 0
//
// The parent node may be a pointer to the struct containing the named
// field, or it may be the struct itself.
//
// Exception: If the parent is a Package, n must be a *File and name is
// interpreted as the filename in the Package.Files map.
func SetField(parent Node, name string, index int, n Node)

Open questions:

  1. Should Apply require parent, name, and index (which often will be nil, "", -1) so that even the top-level node can be rewritten, or should that require a special case and in return a simpler Apply that just requires, the starting node n, pre, and post?

  2. Should field addressing happen via a name (string), which is nice to read but more expensive than say just the field index as provided by reflect)?

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 15, 2016

On 14 September 2016 at 23:52, Robert Griesemer notifications@github.com
wrote:

If we have the parent, field name (or field index), plus slice index if
needed, we have all that is necessary because we can use reflect to set a
field with this information. Changing/rewriting fields is (probably) much
less common then reading (traversing) the tree, thus the more costly set
field operation is ok. In turn, the API can be closer to Walk and thus
easier to use:

Sounds good. It's interesting that Apply is now a primitive interface for
traversing AST tree nodes and edges, whether or not you wish to update the
tree. In other words, you could define the existing ast.Walk and
ast.Inspect functions in terms of Apply. That might be a good exercise to
validate the API.

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning

typo: "See ApplyFunc"

  1. Should Apply require parent, name, and index (which often will be nil,

"", -1) so that even the top-level node can be rewritten, or should that
require a special case and in return a simpler Apply that just requires,
the starting node n, pre, and post?

I don't have a good sense of how important this case will be. You could
provide a wrapper for the root case with the three zero arguments.

  1. Should field addressing happen via a name (string), which is nice to
    read but more expensive than say just the field index as provided by
    reflect)?

I see three reasons to prefer field names over field indices:
(1) Client code that uses strings will be easier to read. Unnamed integer
constants are inscrutable and error-prone.
(2) Strings are invaluable during debugging. You can print them out to
document the descent, something that is hard to do with the existing (read
only) traversal API.
(3) The AST is bound to change, if only slightly, in future versions of Go,
and we should not assume that all new fields will be added at the end of
their struct, thus the field numbering may not be durable in the long term.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 15, 2016

@alandonovan ast.Walk was written w/o a very good understanding of use cases; in fact, I introduced ast.Inspect later because it was much easier to use in many situations. We cannot remove these functions but we could mark them as deprecated. ast.Apply should cover both use cases nicely.

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 15, 2016

I didn't mean you should remove or deprecate them, only that you should go through the exercise of implementing the old functions in terms of the new, since it might be revealing.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Sep 15, 2016

@alandonovan I understand. I think it can be done but it will be difficult to guarantee 100% semantic equivalence w/o manually checking each case, or an extensive test suite testing each node type and exit scenario. For one, Walk doesn't invoke the visitor if a node is nil, while Apply probably should (and my prototype currently does) so that the pre/post closures have a chance to rewrite/set the node (that feature will make it possible to implement Walk, while Walk could not be used to implement Apply since it misses nodes).

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Sep 15, 2016

You can't use the same number to indicate both a field index and a slice index.

I think we're talking past each other. Examples of how I use a single index to identify child nodes:

  • To find the i'th child of a CallExpr: i==0 means Fun, and otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i], and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

@alandonovan
Copy link
Contributor

@alandonovan alandonovan commented Sep 15, 2016

On 15 September 2016 at 12:48, Matthew Dempsky notifications@github.com
wrote:

You can't use the same number to indicate both a field index and a slice
index.

I think we're talking past each other. Examples of how I use a single
index to identify child nodes:

  • To find the i'th child Node of a CallExpr: i==0 means Fun, and
    otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i],
    and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

Ah, so you would flatten out the scalar fields and the slices into one
sequence. Makes sense, though the bookkeeping seems onerous.

griesemer added a commit to griesemer/dotGo2016 that referenced this issue Sep 22, 2016
@quentinmit quentinmit added the NeedsFix label Oct 10, 2016
@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Oct 19, 2016

For a concrete implementation, see:
griesemer/dotGo2016@f0f16c2

@rsc rsc modified the milestones: Unplanned, Go1.8Maybe Oct 20, 2016
@fatih
Copy link
Member

@fatih fatih commented Jan 10, 2017

I recently started on a tool for vim-go and wanted to easily rewrite AST as well. I've seen this issue and really looking forward what we get out from this. Meanwhile, I've copied ast.Walk() and implemented a straight forward rewriter by changing the signature of the walk function we pass. Here is a working repo: https://github.com/fatih/astrewrite I'm going to use it for my next project. It might be not perfect or might have errors (which are unknown to me know), but I thought I'll share to show a different API usage.

Also I think, we can introduce a ast.Rewrite function without breaking the current API or add it to the golang.org/x/tools/go/ast/astutil.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Jan 10, 2017

@fatih Any reason the version I mentioned above ( griesemer/dotGo2016@f0f16c2 ) wouldn't have worked for you?

@fatih
Copy link
Member

@fatih fatih commented Jan 11, 2017

@griesemer at first the API was a little bit confusing for me. So I didn't tried to investigate in depth. But I've tried to give it a look now that you asked for it.

There are more steps involved changing a node with Apply(). Using the example src below

src := `package main

type Foo struct{}`

fset := token.NewFileSet()
file, _ := parser.ParseFile(fset, "foo.go", src, parser.ParseComments) 

I've tried to rename the type name from Foo to Bar. here is how we can do it with Apply() and with Walk() from the astrewrite package"

1. Apply:

applyFunc := func(parent ast.Node, name string, index int, n ast.Node) bool {
	spec, ok := parent.(*ast.GenDecl)
	if !ok {
		return true
	}

	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return true
	}

	x.Name.Name = "Bar"
	spec.Specs[index] = x
	return true
}

rewritten := ast.Apply(file, applyFunc, nil)

2. Walk() from astrewrite:

rewriteFunc := func(n ast.Node) (ast.Node, bool) {
	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return n, true
	}

	x.Name.Name = "Bar"
	return x, true
}

rewritten := astrewrite.Walk(file, rewriteFunc)

With Apply() I have to know the parent nodes type as well, so I can replace the item via the index. However in Walk() from astrewrite I don't need to know the parent node at all. I only change the node I'm interested in and then return it back.

--

Note that my current use case is very simple. There might be other use cases that Apply() covers that I'm not aware of. I'm just trying to give my feedback as a regular Go user that occasionally uses the go/ast package for writing or improving existing tools.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Jan 11, 2017

@fatih Thanks for looking into this. A few observations:

  1. When you use Apply, you don't need to type-assert on the parent - you can simply use the new SetField function (also documented in the API): ast.SetField(parent, name, index, <newnode>). With that, both versions will be approximately the same amount of code to write.
  2. In this specific case where you simply change a string, there's no need to use a complicated rewrite in the first place: You could just walk the tree and change that string directly.
  3. Your implementation of astrewrite is easier to use in simple cases, but it is almost always more costly: The implementation of astrewrite always needs to write back the result returned by rewriteFunc, and because it's always an ast.Node there's always a type assertion needed for the assignment to work (in the astrewrite implementation) - that's relatively costly. You have to always do this for every single node even though most nodes are not rewritten. On the other hand, with ast.Apply, the ast.SetField operation is relatively expensive but you only pay when you actually rewrite something.
  4. Finally, with ast.Apply you have the choice to do the rewriting in pre- or post-order when traversing the tree, which is often important to be able to control - this is not possibly with astrewrite.

To summarize: If you only want to change the name of things, you may not need a rewrite at all. Secondly, while astrewrite may be fine for your (more general) use case, it's probably not sufficiently powerful as a general rewrite mechanism, and likely quite a bit slower than the suggested ast.Apply. Hope that helps.

Anyway, thanks again for investigating!

@fatih
Copy link
Member

@fatih fatih commented Jan 11, 2017

@griesemer Thanks a lot for the detailed explanation. I've didn't know that I could just use SetField, missed that totally. My example was simple by purpose to show the API. You're right that it's costly and I totally agree, however the API is also simple. It feels intuitive. But this library is not the place to make things feel good in expense of performance :)

I'll try to use Apply() instead of astrewrite for my next project. If there is anything I think it can be improved I'll comment here. Thanks again 👍

@josharian
Copy link
Contributor

@josharian josharian commented Jan 13, 2017

@griesemer it'd be handy if that code was go-gettable, so it is easier to use and evolve.

Since you wrote it, perhaps you want to put it up on github under your name? Or I could put it up somewhere for you; let me know.

I've pulled it out of package ast for you: https://gist.github.com/josharian/ff74a8451d7d4e7f062d7b9b04c87eac. Not tested yet, but it compiles, and the only modification was adding ast. in a bunch of places. :)

@griesemer griesemer added this to the Go1.9 milestone Jan 13, 2017
@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Feb 24, 2017

@josharian I like your API but the one thing that concerns me is that an ApplyCursor needs to be set up for each node even if it's not needed (which I suspect is 99% of the time because most often an Apply is looking for one specific node only). I wonder if this approach can be made slightly more light-weight w/o losing too much of it's power.

@josharian
Copy link
Contributor

@josharian josharian commented Feb 24, 2017

Indeed. Although maybe if we documented that the lifetime of an ApplyCursor is a single call to pre/post, then we could allocate a single ApplyCursor and reuse it. The non-allocation part of the setup of the ApplyCursor should be pretty cheap, I think.

One other thing I learned while using the API above is that frequently I don't want to walk the node inserted by InsertAfter. We might want to change the behavior to non-walk (since the user can manually walk it if desired) or add a bool parameter to control the behavior.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Feb 24, 2017

@josharian Yes, if you pass in the ApplyCursor info down it's like passing the apply parameters via the ApplyCursor struct, and it's only allocated once. I like that. Care making this an official CL as solution for this issue (assign to me for review)?

@josharian
Copy link
Contributor

@josharian josharian commented Feb 24, 2017

Will do; expect it to take a few days.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Feb 24, 2017

Thanks - no rush.

@josharian
Copy link
Contributor

@josharian josharian commented Mar 7, 2017

(still working on this, not forgotten)

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 15, 2017

Change https://golang.org/cl/55790 mentions this issue: go/ast: add Apply, for rewriting ASTs

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Oct 25, 2017

We don't use the issue tracker for questions; in the future please ask one the mailing lists or chat boards.

  • An AssignmentStmt is a statement and can be used where a statement would go.
  • A variable declaration can go into a DeclStmt which can also go where a statement would go.

The go/ast documentation groups the nodes in Decl, Expr, and Stmt, and the grouping is pretty clearly telling you which node fits where. It's also trivial to find out empirically: if you can assign a node x to an ast.Stmt, then x is a statement, etc.

@sentient
Copy link

@sentient sentient commented Oct 25, 2017

@griesemer sorry. I deleted the question. will try to find help on the boards

@smasher164
Copy link
Member

@smasher164 smasher164 commented Oct 31, 2017

Is there a reason why the proposed ast.Apply returns a (possibly) modified ast.Node? This seems like a departure from ast.Walk, and I don't understand why it is necessary to return a Node instead of modifying it in place.

@griesemer
Copy link
Contributor Author

@griesemer griesemer commented Oct 31, 2017

@smasher164 Because ast.Apply may change the type of the node which cannot be done by modifying a node in place. For instance, an ast.Apply working on an constant expression tree may replace that tree with a single node which is the constant result.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 1, 2017

Change https://golang.org/cl/74930 mentions this issue: go/ast: add Apply, for rewriting ASTs

josharian added a commit to josharian/go that referenced this issue Nov 13, 2017
Based on work by Robert Griesemer.

Fixes golang#17108

DO NOT REVIEW

Work in progress. Notes:

Needs more tests, particularly of interactions between modifications,
like Delete then InsertAfter, Delete then InsertBefore, and so on.
Even just the few tests I've added so far have helped to clarify the API.

I've changed Apply to not walk the node inserted by InsertAfter.

Optimize: Need to allocate a single ApplyCursor and reuse it.

TODO: Review API, consider shrinking it a bit. Do we need ApplyCursor.IsFile
to be exported? ApplyCursor.Name? And so on.

Should Apply do anything with comments and/or CommentMaps?

Should Apply.Name be renamed Apply.FieldName?

Change-Id: I291bb3f8aba85abdeb728714c08702c082617f54
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 15, 2017

Change https://golang.org/cl/77811 mentions this issue: go/ast/astutil: add Apply, for rewriting Go ASTs

@golang golang locked and limited conversation to collaborators Nov 16, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.