New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/ast: provide AST Rewrite functionality (ast.Walk is not good enough) #17108

Closed
griesemer opened this Issue Sep 14, 2016 · 41 comments

Comments

Projects
None yet
@griesemer
Contributor

griesemer commented Sep 14, 2016

This has come up again and again. While it's easy to traverse the AST (ast.Walk, ast.Inspect), it's not easily possible to use those for general tree rewrites w/o significant amount of work.

For instance, if we wanted to rewrite all expressions (ast.Expr) of a certain kind into something else, ast.Walk and ast.Inspect would need to consider every node that contains ast.Expr fields separately, rather than just look at nodes that satisfy ast.Expr.

AST rewriters exist in other packages; most notably perhaps in gofmt (for gofmt -r); that one is reflection-based. Reflection-based approaches tend to be general, but also hard to understand, and slower than necessary.

API starting point:

// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite the node. The node's parent is
// the node containing the *addr. If the node is part of a list, index
// identifies its position in that list. Index is < 0 otherwise.
type ApplyFunc func(parent Node, index int, addr interface{}) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, index, and addr. See Apply for the meaning
// of these arguments.
// 
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
func Apply(parent Node, index int, addr interface{}, pre, post ApplyFunc) {

@griesemer griesemer added this to the Go1.8Maybe milestone Sep 14, 2016

@griesemer griesemer self-assigned this Sep 14, 2016

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer
Contributor

griesemer commented Sep 14, 2016

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 14, 2016

Contributor

This seems like a good addition. The reflect-based implementations in gofmt (and eg) is fearsomely complex, and there are other times I have wanted to use this function.

The index is not enough to uniquely identify a subtree of a node. For example, ast.CaseClause contains two slices of Nodes, and ast.BinOp contains two Exprs. I think we need to identify the field too. The most obvious way to do that is by its name. Field numbers, or field offsets, might be marginally more efficient, but would certainly be harder to read, and I'm users would thank us for choosing strings when they're debugging.

I'm not sure you need the index (or index + field name) in the Apply function, only in ApplyFunc.

Contributor

alandonovan commented Sep 14, 2016

This seems like a good addition. The reflect-based implementations in gofmt (and eg) is fearsomely complex, and there are other times I have wanted to use this function.

The index is not enough to uniquely identify a subtree of a node. For example, ast.CaseClause contains two slices of Nodes, and ast.BinOp contains two Exprs. I think we need to identify the field too. The most obvious way to do that is by its name. Field numbers, or field offsets, might be marginally more efficient, but would certainly be harder to read, and I'm users would thank us for choosing strings when they're debugging.

I'm not sure you need the index (or index + field name) in the Apply function, only in ApplyFunc.

@mdempsky

This comment has been minimized.

Show comment
Hide comment
@mdempsky

mdempsky Sep 14, 2016

Member

I've also wanted this.

Index is sufficient I think if it counts across all children. E.g., in AssignStmt, it counts from 0 to len(Lhs)+len(Rhs)-1.

I've had cases where I want to be able to walk up the AST more than one Node, so I maintain a slice of (Parent, Index) tuples.

Aside: Index technically isn't necessary, and Java's Compiler API omits it from com.sun.source.util.TreePath. But I think it's handy to have.

I'm not sure I understand what addr's dynamic type will be from the description. I'm guessing though that if I'm visiting a *ast.Ident, then it could be either a **ast.Ident (e.g., the name for a ValueSpec) or *ast.Expr (e.g., an identifier in an expression context)?

Member

mdempsky commented Sep 14, 2016

I've also wanted this.

Index is sufficient I think if it counts across all children. E.g., in AssignStmt, it counts from 0 to len(Lhs)+len(Rhs)-1.

I've had cases where I want to be able to walk up the AST more than one Node, so I maintain a slice of (Parent, Index) tuples.

Aside: Index technically isn't necessary, and Java's Compiler API omits it from com.sun.source.util.TreePath. But I think it's handy to have.

I'm not sure I understand what addr's dynamic type will be from the description. I'm guessing though that if I'm visiting a *ast.Ident, then it could be either a **ast.Ident (e.g., the name for a ValueSpec) or *ast.Expr (e.g., an identifier in an expression context)?

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 14, 2016

Contributor

The dynamic type would be a pointer to the variable (field or slice element), whatever that may be: Node, Expr, *Ident.

Contributor

alandonovan commented Sep 14, 2016

The dynamic type would be a pointer to the variable (field or slice element), whatever that may be: Node, Expr, *Ident.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Sep 14, 2016

Contributor

Here's a slightly more thought-through API that provides explicit names for fields. Nice, but definitively somewhat costly:

// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite (replace) the node. The node
// is referred to from its parent which contains a field with the given
// name. If the field is a list, index identifies the node's position in
// that list; index is < 0 otherwise. Or roughly speaking:
//
//   addr == &parent.name         if index < 0
//   addr == &parent.name[index]  if index >= 0
//
// Exception: If the parent is an *ast.Package, and Apply is iterating
// through the Files map, name is the filename, and index is incremented
// by 1 with each file, starting at 0. Assigning to *addr will change the
// corresponding Files map entry but only upon return from pre and post.
type ApplyFunc func(parent Node, name string, index int, addr interface{}) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and addr. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children;
// specifically, token.Pos, Scopes, Objects, and fields of basic types
// (strings, etc.) are ignored. Children are traversed in the order in
// which they appear in the respective node's struct definition.
func Apply(parent Node, name string, index int, addr interface{}, pre, post ApplyFunc) {
Contributor

griesemer commented Sep 14, 2016

Here's a slightly more thought-through API that provides explicit names for fields. Nice, but definitively somewhat costly:

// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite (replace) the node. The node
// is referred to from its parent which contains a field with the given
// name. If the field is a list, index identifies the node's position in
// that list; index is < 0 otherwise. Or roughly speaking:
//
//   addr == &parent.name         if index < 0
//   addr == &parent.name[index]  if index >= 0
//
// Exception: If the parent is an *ast.Package, and Apply is iterating
// through the Files map, name is the filename, and index is incremented
// by 1 with each file, starting at 0. Assigning to *addr will change the
// corresponding Files map entry but only upon return from pre and post.
type ApplyFunc func(parent Node, name string, index int, addr interface{}) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and addr. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children;
// specifically, token.Pos, Scopes, Objects, and fields of basic types
// (strings, etc.) are ignored. Children are traversed in the order in
// which they appear in the respective node's struct definition.
func Apply(parent Node, name string, index int, addr interface{}, pre, post ApplyFunc) {
@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Sep 14, 2016

Contributor

@mdempsky points out that alternatively, index could simply be monotonically increasing. The index of a list element would be the provided index minus the index of the list. A utility function could provide the name of a field for a given index if necessary. That function could be reflection-based since it's probably not speed-critical.

Contributor

griesemer commented Sep 14, 2016

@mdempsky points out that alternatively, index could simply be monotonically increasing. The index of a list element would be the provided index minus the index of the list. A utility function could provide the name of a field for a given index if necessary. That function could be reflection-based since it's probably not speed-critical.

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 14, 2016

Contributor

We probably still want the field name though, so that you can locate the subtree easily in cases like SliceExpr without having to do four separate comparisons---and arguably more importantly, without having to remember to do four separate comparisons. I have found several bugs caused by assuming that a given node was the "correct" subtree of the parent node, when in fact it was another one. Passing the field name will help to make it obvious that this is something you need to think about.

Contributor

alandonovan commented Sep 14, 2016

We probably still want the field name though, so that you can locate the subtree easily in cases like SliceExpr without having to do four separate comparisons---and arguably more importantly, without having to remember to do four separate comparisons. I have found several bugs caused by assuming that a given node was the "correct" subtree of the parent node, when in fact it was another one. Passing the field name will help to make it obvious that this is something you need to think about.

@mdempsky

This comment has been minimized.

Show comment
Hide comment
@mdempsky

mdempsky Sep 14, 2016

Member

I think the easier solution is to still reserve indices for nil fields. So for SliceExpr you'd always have 0==X, 1==Low, 2 == High, 3==Max.

Member

mdempsky commented Sep 14, 2016

I think the easier solution is to still reserve indices for nil fields. So for SliceExpr you'd always have 0==X, 1==Low, 2 == High, 3==Max.

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 15, 2016

Contributor

You can't use the same number to indicate both a field index and a slice index.

Contributor

alandonovan commented Sep 15, 2016

You can't use the same number to indicate both a field index and a slice index.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Sep 15, 2016

Contributor

I played with this a bit and I have a prototype that could work. Observations: Passing an address to a field is problematic: Many fields that we are interested in are (ast.)Expr, Stmt, and Decl fields, which are interfaces. To "unpack" them we have to type-switch on the address, then deref the address and then type-switch again on the contents; thus requiring two type switches. Here's a better approach: If we have the parent, field name (or field index), plus slice index if needed, we have all that is necessary because we can use reflect to set a field with this information. Changing/rewriting fields is (probably) much less common then reading (traversing) the tree, thus the more costly set field operation is ok. In turn, the API can be closer to Walk and thus easier to use:

// An ApplyFunc is invoked by Apply for each node n, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The parent, name, and index field are used to identify the parent
// node's field containing n. If the field is a list, index identifies
// the node's position in that list; index is < 0 otherwise. Roughly:
//
//   n == parent.name         if index < 0
//   n == parent.name[index]  if index >= 0
//
// This information can be used to update/rewrite a node with SetField.
//
// Exception: If the parent is a *Package, and Apply is iterating
// through the Files map, name is the filename, and index is -1.
type ApplyFunc func(parent Node, name string, index int, n Node) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children.
// Children are traversed in the order in which they appear in the
// respective node's struct definition.
func Apply(parent Node, name string, index int, n Node, pre, post ApplyFunc)

// SetField sets the named field in the parent node to n. If the field
// is a slice, index is the slice index. The named field must exist in
// the parent, n must be assignable to that field, and the field must be
// indexable if index >= 0. Roughly:
//
//   parent.name        = n  if index < 0
//   parent.name[index] = n  if index >= 0
//
// The parent node may be a pointer to the struct containing the named
// field, or it may be the struct itself.
//
// Exception: If the parent is a Package, n must be a *File and name is
// interpreted as the filename in the Package.Files map.
func SetField(parent Node, name string, index int, n Node)

Open questions:

  1. Should Apply require parent, name, and index (which often will be nil, "", -1) so that even the top-level node can be rewritten, or should that require a special case and in return a simpler Apply that just requires, the starting node n, pre, and post?

  2. Should field addressing happen via a name (string), which is nice to read but more expensive than say just the field index as provided by reflect)?

Contributor

griesemer commented Sep 15, 2016

I played with this a bit and I have a prototype that could work. Observations: Passing an address to a field is problematic: Many fields that we are interested in are (ast.)Expr, Stmt, and Decl fields, which are interfaces. To "unpack" them we have to type-switch on the address, then deref the address and then type-switch again on the contents; thus requiring two type switches. Here's a better approach: If we have the parent, field name (or field index), plus slice index if needed, we have all that is necessary because we can use reflect to set a field with this information. Changing/rewriting fields is (probably) much less common then reading (traversing) the tree, thus the more costly set field operation is ok. In turn, the API can be closer to Walk and thus easier to use:

// An ApplyFunc is invoked by Apply for each node n, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The parent, name, and index field are used to identify the parent
// node's field containing n. If the field is a list, index identifies
// the node's position in that list; index is < 0 otherwise. Roughly:
//
//   n == parent.name         if index < 0
//   n == parent.name[index]  if index >= 0
//
// This information can be used to update/rewrite a node with SetField.
//
// Exception: If the parent is a *Package, and Apply is iterating
// through the Files map, name is the filename, and index is -1.
type ApplyFunc func(parent Node, name string, index int, n Node) bool

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children.
// Children are traversed in the order in which they appear in the
// respective node's struct definition.
func Apply(parent Node, name string, index int, n Node, pre, post ApplyFunc)

// SetField sets the named field in the parent node to n. If the field
// is a slice, index is the slice index. The named field must exist in
// the parent, n must be assignable to that field, and the field must be
// indexable if index >= 0. Roughly:
//
//   parent.name        = n  if index < 0
//   parent.name[index] = n  if index >= 0
//
// The parent node may be a pointer to the struct containing the named
// field, or it may be the struct itself.
//
// Exception: If the parent is a Package, n must be a *File and name is
// interpreted as the filename in the Package.Files map.
func SetField(parent Node, name string, index int, n Node)

Open questions:

  1. Should Apply require parent, name, and index (which often will be nil, "", -1) so that even the top-level node can be rewritten, or should that require a special case and in return a simpler Apply that just requires, the starting node n, pre, and post?

  2. Should field addressing happen via a name (string), which is nice to read but more expensive than say just the field index as provided by reflect)?

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 15, 2016

Contributor

On 14 September 2016 at 23:52, Robert Griesemer notifications@github.com
wrote:

If we have the parent, field name (or field index), plus slice index if
needed, we have all that is necessary because we can use reflect to set a
field with this information. Changing/rewriting fields is (probably) much
less common then reading (traversing) the tree, thus the more costly set
field operation is ok. In turn, the API can be closer to Walk and thus
easier to use:

Sounds good. It's interesting that Apply is now a primitive interface for
traversing AST tree nodes and edges, whether or not you wish to update the
tree. In other words, you could define the existing ast.Walk and
ast.Inspect functions in terms of Apply. That might be a good exercise to
validate the API.

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning

typo: "See ApplyFunc"

  1. Should Apply require parent, name, and index (which often will be nil,

"", -1) so that even the top-level node can be rewritten, or should that
require a special case and in return a simpler Apply that just requires,
the starting node n, pre, and post?

I don't have a good sense of how important this case will be. You could
provide a wrapper for the root case with the three zero arguments.

  1. Should field addressing happen via a name (string), which is nice to
    read but more expensive than say just the field index as provided by
    reflect)?

I see three reasons to prefer field names over field indices:
(1) Client code that uses strings will be easier to read. Unnamed integer
constants are inscrutable and error-prone.
(2) Strings are invaluable during debugging. You can print them out to
document the descent, something that is hard to do with the existing (read
only) traversal API.
(3) The AST is bound to change, if only slightly, in future versions of Go,
and we should not assume that all new fields will be added at the end of
their struct, thus the field numbering may not be durable in the long term.

Contributor

alandonovan commented Sep 15, 2016

On 14 September 2016 at 23:52, Robert Griesemer notifications@github.com
wrote:

If we have the parent, field name (or field index), plus slice index if
needed, we have all that is necessary because we can use reflect to set a
field with this information. Changing/rewriting fields is (probably) much
less common then reading (traversing) the tree, thus the more costly set
field operation is ok. In turn, the API can be closer to Walk and thus
easier to use:

Sounds good. It's interesting that Apply is now a primitive interface for
traversing AST tree nodes and edges, whether or not you wish to update the
tree. In other words, you could define the existing ast.Walk and
ast.Inspect functions in terms of Apply. That might be a good exercise to
validate the API.

// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning

typo: "See ApplyFunc"

  1. Should Apply require parent, name, and index (which often will be nil,

"", -1) so that even the top-level node can be rewritten, or should that
require a special case and in return a simpler Apply that just requires,
the starting node n, pre, and post?

I don't have a good sense of how important this case will be. You could
provide a wrapper for the root case with the three zero arguments.

  1. Should field addressing happen via a name (string), which is nice to
    read but more expensive than say just the field index as provided by
    reflect)?

I see three reasons to prefer field names over field indices:
(1) Client code that uses strings will be easier to read. Unnamed integer
constants are inscrutable and error-prone.
(2) Strings are invaluable during debugging. You can print them out to
document the descent, something that is hard to do with the existing (read
only) traversal API.
(3) The AST is bound to change, if only slightly, in future versions of Go,
and we should not assume that all new fields will be added at the end of
their struct, thus the field numbering may not be durable in the long term.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Sep 15, 2016

Contributor

@alandonovan ast.Walk was written w/o a very good understanding of use cases; in fact, I introduced ast.Inspect later because it was much easier to use in many situations. We cannot remove these functions but we could mark them as deprecated. ast.Apply should cover both use cases nicely.

Contributor

griesemer commented Sep 15, 2016

@alandonovan ast.Walk was written w/o a very good understanding of use cases; in fact, I introduced ast.Inspect later because it was much easier to use in many situations. We cannot remove these functions but we could mark them as deprecated. ast.Apply should cover both use cases nicely.

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 15, 2016

Contributor

I didn't mean you should remove or deprecate them, only that you should go through the exercise of implementing the old functions in terms of the new, since it might be revealing.

Contributor

alandonovan commented Sep 15, 2016

I didn't mean you should remove or deprecate them, only that you should go through the exercise of implementing the old functions in terms of the new, since it might be revealing.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Sep 15, 2016

Contributor

@alandonovan I understand. I think it can be done but it will be difficult to guarantee 100% semantic equivalence w/o manually checking each case, or an extensive test suite testing each node type and exit scenario. For one, Walk doesn't invoke the visitor if a node is nil, while Apply probably should (and my prototype currently does) so that the pre/post closures have a chance to rewrite/set the node (that feature will make it possible to implement Walk, while Walk could not be used to implement Apply since it misses nodes).

Contributor

griesemer commented Sep 15, 2016

@alandonovan I understand. I think it can be done but it will be difficult to guarantee 100% semantic equivalence w/o manually checking each case, or an extensive test suite testing each node type and exit scenario. For one, Walk doesn't invoke the visitor if a node is nil, while Apply probably should (and my prototype currently does) so that the pre/post closures have a chance to rewrite/set the node (that feature will make it possible to implement Walk, while Walk could not be used to implement Apply since it misses nodes).

@mdempsky

This comment has been minimized.

Show comment
Hide comment
@mdempsky

mdempsky Sep 15, 2016

Member

You can't use the same number to indicate both a field index and a slice index.

I think we're talking past each other. Examples of how I use a single index to identify child nodes:

  • To find the i'th child of a CallExpr: i==0 means Fun, and otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i], and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

Member

mdempsky commented Sep 15, 2016

You can't use the same number to indicate both a field index and a slice index.

I think we're talking past each other. Examples of how I use a single index to identify child nodes:

  • To find the i'th child of a CallExpr: i==0 means Fun, and otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i], and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

@alandonovan

This comment has been minimized.

Show comment
Hide comment
@alandonovan

alandonovan Sep 15, 2016

Contributor

On 15 September 2016 at 12:48, Matthew Dempsky notifications@github.com
wrote:

You can't use the same number to indicate both a field index and a slice
index.

I think we're talking past each other. Examples of how I use a single
index to identify child nodes:

  • To find the i'th child Node of a CallExpr: i==0 means Fun, and
    otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i],
    and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

Ah, so you would flatten out the scalar fields and the slices into one
sequence. Makes sense, though the bookkeeping seems onerous.

Contributor

alandonovan commented Sep 15, 2016

On 15 September 2016 at 12:48, Matthew Dempsky notifications@github.com
wrote:

You can't use the same number to indicate both a field index and a slice
index.

I think we're talking past each other. Examples of how I use a single
index to identify child nodes:

  • To find the i'th child Node of a CallExpr: i==0 means Fun, and
    otherwise you want Args[i-1].
  • To find the i'th child of an AssignStmt, i<len(Lhs) means Lhs[i],
    and otherwise you want Rhs[i-len(Lhs)].

That said, I'm not opposed to field names.

Ah, so you would flatten out the scalar fields and the slices into one
sequence. Makes sense, though the bookkeeping seems onerous.

griesemer added a commit to griesemer/dotGo2016 that referenced this issue Sep 22, 2016

@quentinmit quentinmit added the NeedsFix label Oct 10, 2016

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Oct 19, 2016

Contributor

For a concrete implementation, see:
griesemer/dotGo2016@f0f16c2

Contributor

griesemer commented Oct 19, 2016

For a concrete implementation, see:
griesemer/dotGo2016@f0f16c2

@rsc rsc modified the milestones: Unplanned, Go1.8Maybe Oct 20, 2016

@fatih

This comment has been minimized.

Show comment
Hide comment
@fatih

fatih Jan 10, 2017

Member

I recently started on a tool for vim-go and wanted to easily rewrite AST as well. I've seen this issue and really looking forward what we get out from this. Meanwhile, I've copied ast.Walk() and implemented a straight forward rewriter by changing the signature of the walk function we pass. Here is a working repo: https://github.com/fatih/astrewrite I'm going to use it for my next project. It might be not perfect or might have errors (which are unknown to me know), but I thought I'll share to show a different API usage.

Also I think, we can introduce a ast.Rewrite function without breaking the current API or add it to the golang.org/x/tools/go/ast/astutil.

Member

fatih commented Jan 10, 2017

I recently started on a tool for vim-go and wanted to easily rewrite AST as well. I've seen this issue and really looking forward what we get out from this. Meanwhile, I've copied ast.Walk() and implemented a straight forward rewriter by changing the signature of the walk function we pass. Here is a working repo: https://github.com/fatih/astrewrite I'm going to use it for my next project. It might be not perfect or might have errors (which are unknown to me know), but I thought I'll share to show a different API usage.

Also I think, we can introduce a ast.Rewrite function without breaking the current API or add it to the golang.org/x/tools/go/ast/astutil.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Jan 10, 2017

Contributor

@fatih Any reason the version I mentioned above ( griesemer/dotGo2016@f0f16c2 ) wouldn't have worked for you?

Contributor

griesemer commented Jan 10, 2017

@fatih Any reason the version I mentioned above ( griesemer/dotGo2016@f0f16c2 ) wouldn't have worked for you?

@fatih

This comment has been minimized.

Show comment
Hide comment
@fatih

fatih Jan 11, 2017

Member

@griesemer at first the API was a little bit confusing for me. So I didn't tried to investigate in depth. But I've tried to give it a look now that you asked for it.

There are more steps involved changing a node with Apply(). Using the example src below

src := `package main

type Foo struct{}`

fset := token.NewFileSet()
file, _ := parser.ParseFile(fset, "foo.go", src, parser.ParseComments) 

I've tried to rename the type name from Foo to Bar. here is how we can do it with Apply() and with Walk() from the astrewrite package"

1. Apply:

applyFunc := func(parent ast.Node, name string, index int, n ast.Node) bool {
	spec, ok := parent.(*ast.GenDecl)
	if !ok {
		return true
	}

	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return true
	}

	x.Name.Name = "Bar"
	spec.Specs[index] = x
	return true
}

rewritten := ast.Apply(file, applyFunc, nil)

2. Walk() from astrewrite:

rewriteFunc := func(n ast.Node) (ast.Node, bool) {
	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return n, true
	}

	x.Name.Name = "Bar"
	return x, true
}

rewritten := astrewrite.Walk(file, rewriteFunc)

With Apply() I have to know the parent nodes type as well, so I can replace the item via the index. However in Walk() from astrewrite I don't need to know the parent node at all. I only change the node I'm interested in and then return it back.

--

Note that my current use case is very simple. There might be other use cases that Apply() covers that I'm not aware of. I'm just trying to give my feedback as a regular Go user that occasionally uses the go/ast package for writing or improving existing tools.

Member

fatih commented Jan 11, 2017

@griesemer at first the API was a little bit confusing for me. So I didn't tried to investigate in depth. But I've tried to give it a look now that you asked for it.

There are more steps involved changing a node with Apply(). Using the example src below

src := `package main

type Foo struct{}`

fset := token.NewFileSet()
file, _ := parser.ParseFile(fset, "foo.go", src, parser.ParseComments) 

I've tried to rename the type name from Foo to Bar. here is how we can do it with Apply() and with Walk() from the astrewrite package"

1. Apply:

applyFunc := func(parent ast.Node, name string, index int, n ast.Node) bool {
	spec, ok := parent.(*ast.GenDecl)
	if !ok {
		return true
	}

	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return true
	}

	x.Name.Name = "Bar"
	spec.Specs[index] = x
	return true
}

rewritten := ast.Apply(file, applyFunc, nil)

2. Walk() from astrewrite:

rewriteFunc := func(n ast.Node) (ast.Node, bool) {
	x, ok := n.(*ast.TypeSpec)
	if !ok {
		return n, true
	}

	x.Name.Name = "Bar"
	return x, true
}

rewritten := astrewrite.Walk(file, rewriteFunc)

With Apply() I have to know the parent nodes type as well, so I can replace the item via the index. However in Walk() from astrewrite I don't need to know the parent node at all. I only change the node I'm interested in and then return it back.

--

Note that my current use case is very simple. There might be other use cases that Apply() covers that I'm not aware of. I'm just trying to give my feedback as a regular Go user that occasionally uses the go/ast package for writing or improving existing tools.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Jan 11, 2017

Contributor

@fatih Thanks for looking into this. A few observations:

  1. When you use Apply, you don't need to type-assert on the parent - you can simply use the new SetField function (also documented in the API): ast.SetField(parent, name, index, <newnode>). With that, both versions will be approximately the same amount of code to write.
  2. In this specific case where you simply change a string, there's no need to use a complicated rewrite in the first place: You could just walk the tree and change that string directly.
  3. Your implementation of astrewrite is easier to use in simple cases, but it is almost always more costly: The implementation of astrewrite always needs to write back the result returned by rewriteFunc, and because it's always an ast.Node there's always a type assertion needed for the assignment to work (in the astrewrite implementation) - that's relatively costly. You have to always do this for every single node even though most nodes are not rewritten. On the other hand, with ast.Apply, the ast.SetField operation is relatively expensive but you only pay when you actually rewrite something.
  4. Finally, with ast.Apply you have the choice to do the rewriting in pre- or post-order when traversing the tree, which is often important to be able to control - this is not possibly with astrewrite.

To summarize: If you only want to change the name of things, you may not need a rewrite at all. Secondly, while astrewrite may be fine for your (more general) use case, it's probably not sufficiently powerful as a general rewrite mechanism, and likely quite a bit slower than the suggested ast.Apply. Hope that helps.

Anyway, thanks again for investigating!

Contributor

griesemer commented Jan 11, 2017

@fatih Thanks for looking into this. A few observations:

  1. When you use Apply, you don't need to type-assert on the parent - you can simply use the new SetField function (also documented in the API): ast.SetField(parent, name, index, <newnode>). With that, both versions will be approximately the same amount of code to write.
  2. In this specific case where you simply change a string, there's no need to use a complicated rewrite in the first place: You could just walk the tree and change that string directly.
  3. Your implementation of astrewrite is easier to use in simple cases, but it is almost always more costly: The implementation of astrewrite always needs to write back the result returned by rewriteFunc, and because it's always an ast.Node there's always a type assertion needed for the assignment to work (in the astrewrite implementation) - that's relatively costly. You have to always do this for every single node even though most nodes are not rewritten. On the other hand, with ast.Apply, the ast.SetField operation is relatively expensive but you only pay when you actually rewrite something.
  4. Finally, with ast.Apply you have the choice to do the rewriting in pre- or post-order when traversing the tree, which is often important to be able to control - this is not possibly with astrewrite.

To summarize: If you only want to change the name of things, you may not need a rewrite at all. Secondly, while astrewrite may be fine for your (more general) use case, it's probably not sufficiently powerful as a general rewrite mechanism, and likely quite a bit slower than the suggested ast.Apply. Hope that helps.

Anyway, thanks again for investigating!

@fatih

This comment has been minimized.

Show comment
Hide comment
@fatih

fatih Jan 11, 2017

Member

@griesemer Thanks a lot for the detailed explanation. I've didn't know that I could just use SetField, missed that totally. My example was simple by purpose to show the API. You're right that it's costly and I totally agree, however the API is also simple. It feels intuitive. But this library is not the place to make things feel good in expense of performance :)

I'll try to use Apply() instead of astrewrite for my next project. If there is anything I think it can be improved I'll comment here. Thanks again 👍

Member

fatih commented Jan 11, 2017

@griesemer Thanks a lot for the detailed explanation. I've didn't know that I could just use SetField, missed that totally. My example was simple by purpose to show the API. You're right that it's costly and I totally agree, however the API is also simple. It feels intuitive. But this library is not the place to make things feel good in expense of performance :)

I'll try to use Apply() instead of astrewrite for my next project. If there is anything I think it can be improved I'll comment here. Thanks again 👍

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jan 13, 2017

Contributor

@griesemer it'd be handy if that code was go-gettable, so it is easier to use and evolve.

Since you wrote it, perhaps you want to put it up on github under your name? Or I could put it up somewhere for you; let me know.

I've pulled it out of package ast for you: https://gist.github.com/josharian/ff74a8451d7d4e7f062d7b9b04c87eac. Not tested yet, but it compiles, and the only modification was adding ast. in a bunch of places. :)

Contributor

josharian commented Jan 13, 2017

@griesemer it'd be handy if that code was go-gettable, so it is easier to use and evolve.

Since you wrote it, perhaps you want to put it up on github under your name? Or I could put it up somewhere for you; let me know.

I've pulled it out of package ast for you: https://gist.github.com/josharian/ff74a8451d7d4e7f062d7b9b04c87eac. Not tested yet, but it compiles, and the only modification was adding ast. in a bunch of places. :)

@griesemer griesemer added this to the Go1.9 milestone Jan 13, 2017

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jan 14, 2017

Contributor

I prototyped the "use an opaque cursor for everything" approach. I'll keep using it for my work and see whether further API issues arise. Input welcome.

Here's the godoc output for that gist:

PACKAGE DOCUMENTATION

package apply
    import "."


FUNCTIONS

func Apply(root ast.Node, pre, post ApplyFunc) ast.Node
    Apply traverses a syntax tree recursively, starting with root, and
    calling pre and post for each node as described below. The result is the
    (possibly modified) syntax tree.

    If pre is not nil, it is called for each node before its children are
    traversed (pre-order). If the result of calling pre is false, no
    children are traversed, and post is not called for that node.

    If post is not nil, it is called for each node after its children were
    traversed (post-order). If the result of calling post is false,
    traversal is terminated and Apply returns immediately.

    Only fields that refer to AST nodes are considered children. Children
    are traversed in the order in which they appear in the respective node's
    struct definition.

TYPES

type ApplyCursor struct {
    // contains filtered or unexported fields
}
    An ApplyCursor describes a node encountered during Apply. Information
    about the node and its parent is available via the Node, Parent, Name,
    and Index methods.

    Roughly speaking, the following invariants hold:

	Parent().Name()          == Node()  if !HasIndex()
	Parent().Name()[Index()] == Node()  if HasIndex()

    The methods Replace, Delete, InsertBefore, and InsertAfter can be used
    to change the AST without disrupting Apply.

func (c ApplyCursor) Delete()
    Delete deletes the current Node from its containing slice. If the
    current Node is not part of a slice, Delete panics.

func (c ApplyCursor) HasIndex() bool
    HasIndex reports whether the current Node is part of a slice of Nodes.

func (c ApplyCursor) Index() int
    Index reports the index of the current Node in the slice of Nodes that
    contains it. Index panics if the current Node is not part of a slice.

func (c ApplyCursor) InsertAfter(n ast.Node)
    InsertAfter inserts n after the current Node in its containing slice. If
    the current Node is not part of a slice, InsertAfter panics. Apply will
    walk n.

func (c ApplyCursor) InsertBefore(n ast.Node)
    InsertBefore inserts n before the current Node in its containing slice.
    If the current Node is not part of a slice, InsertBefore panics. Apply
    will not walk n.

func (c ApplyCursor) IsFile() bool
    IsFile reports whether the current Node is a *File that is part of a
    *Package map of *Files.

func (c ApplyCursor) Name() string
    Name returns the name of the parent Node field that contains the current
    Node. If the parent is a Package and the current Node is a File, it
    returns the filename for the current Node.

func (c ApplyCursor) Node() ast.Node
    Node returns the current Node.

func (c ApplyCursor) Parent() ast.Node
    Parent returns the parent of the current Node.

func (c ApplyCursor) Replace(n ast.Node)
    Replace replaces the current Node with n. The replacement node is not
    walked by Apply.

type ApplyFunc func(cursor ApplyCursor) bool
    An ApplyFunc is invoked by Apply for each node n, even if n is nil,
    before and/or after the node's children.

    The return value of ApplyFunc controls the syntax tree traversal. See
    Apply for details.
Contributor

josharian commented Jan 14, 2017

I prototyped the "use an opaque cursor for everything" approach. I'll keep using it for my work and see whether further API issues arise. Input welcome.

Here's the godoc output for that gist:

PACKAGE DOCUMENTATION

package apply
    import "."


FUNCTIONS

func Apply(root ast.Node, pre, post ApplyFunc) ast.Node
    Apply traverses a syntax tree recursively, starting with root, and
    calling pre and post for each node as described below. The result is the
    (possibly modified) syntax tree.

    If pre is not nil, it is called for each node before its children are
    traversed (pre-order). If the result of calling pre is false, no
    children are traversed, and post is not called for that node.

    If post is not nil, it is called for each node after its children were
    traversed (post-order). If the result of calling post is false,
    traversal is terminated and Apply returns immediately.

    Only fields that refer to AST nodes are considered children. Children
    are traversed in the order in which they appear in the respective node's
    struct definition.

TYPES

type ApplyCursor struct {
    // contains filtered or unexported fields
}
    An ApplyCursor describes a node encountered during Apply. Information
    about the node and its parent is available via the Node, Parent, Name,
    and Index methods.

    Roughly speaking, the following invariants hold:

	Parent().Name()          == Node()  if !HasIndex()
	Parent().Name()[Index()] == Node()  if HasIndex()

    The methods Replace, Delete, InsertBefore, and InsertAfter can be used
    to change the AST without disrupting Apply.

func (c ApplyCursor) Delete()
    Delete deletes the current Node from its containing slice. If the
    current Node is not part of a slice, Delete panics.

func (c ApplyCursor) HasIndex() bool
    HasIndex reports whether the current Node is part of a slice of Nodes.

func (c ApplyCursor) Index() int
    Index reports the index of the current Node in the slice of Nodes that
    contains it. Index panics if the current Node is not part of a slice.

func (c ApplyCursor) InsertAfter(n ast.Node)
    InsertAfter inserts n after the current Node in its containing slice. If
    the current Node is not part of a slice, InsertAfter panics. Apply will
    walk n.

func (c ApplyCursor) InsertBefore(n ast.Node)
    InsertBefore inserts n before the current Node in its containing slice.
    If the current Node is not part of a slice, InsertBefore panics. Apply
    will not walk n.

func (c ApplyCursor) IsFile() bool
    IsFile reports whether the current Node is a *File that is part of a
    *Package map of *Files.

func (c ApplyCursor) Name() string
    Name returns the name of the parent Node field that contains the current
    Node. If the parent is a Package and the current Node is a File, it
    returns the filename for the current Node.

func (c ApplyCursor) Node() ast.Node
    Node returns the current Node.

func (c ApplyCursor) Parent() ast.Node
    Parent returns the parent of the current Node.

func (c ApplyCursor) Replace(n ast.Node)
    Replace replaces the current Node with n. The replacement node is not
    walked by Apply.

type ApplyFunc func(cursor ApplyCursor) bool
    An ApplyFunc is invoked by Apply for each node n, even if n is nil,
    before and/or after the node's children.

    The return value of ApplyFunc controls the syntax tree traversal. See
    Apply for details.
@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Feb 24, 2017

Contributor

@josharian I like your API but the one thing that concerns me is that an ApplyCursor needs to be set up for each node even if it's not needed (which I suspect is 99% of the time because most often an Apply is looking for one specific node only). I wonder if this approach can be made slightly more light-weight w/o losing too much of it's power.

Contributor

griesemer commented Feb 24, 2017

@josharian I like your API but the one thing that concerns me is that an ApplyCursor needs to be set up for each node even if it's not needed (which I suspect is 99% of the time because most often an Apply is looking for one specific node only). I wonder if this approach can be made slightly more light-weight w/o losing too much of it's power.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Feb 24, 2017

Contributor

Indeed. Although maybe if we documented that the lifetime of an ApplyCursor is a single call to pre/post, then we could allocate a single ApplyCursor and reuse it. The non-allocation part of the setup of the ApplyCursor should be pretty cheap, I think.

One other thing I learned while using the API above is that frequently I don't want to walk the node inserted by InsertAfter. We might want to change the behavior to non-walk (since the user can manually walk it if desired) or add a bool parameter to control the behavior.

Contributor

josharian commented Feb 24, 2017

Indeed. Although maybe if we documented that the lifetime of an ApplyCursor is a single call to pre/post, then we could allocate a single ApplyCursor and reuse it. The non-allocation part of the setup of the ApplyCursor should be pretty cheap, I think.

One other thing I learned while using the API above is that frequently I don't want to walk the node inserted by InsertAfter. We might want to change the behavior to non-walk (since the user can manually walk it if desired) or add a bool parameter to control the behavior.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Feb 24, 2017

Contributor

@josharian Yes, if you pass in the ApplyCursor info down it's like passing the apply parameters via the ApplyCursor struct, and it's only allocated once. I like that. Care making this an official CL as solution for this issue (assign to me for review)?

Contributor

griesemer commented Feb 24, 2017

@josharian Yes, if you pass in the ApplyCursor info down it's like passing the apply parameters via the ApplyCursor struct, and it's only allocated once. I like that. Care making this an official CL as solution for this issue (assign to me for review)?

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Feb 24, 2017

Contributor

Will do; expect it to take a few days.

Contributor

josharian commented Feb 24, 2017

Will do; expect it to take a few days.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Feb 24, 2017

Contributor

Thanks - no rush.

Contributor

griesemer commented Feb 24, 2017

Thanks - no rush.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Mar 7, 2017

Contributor

(still working on this, not forgotten)

Contributor

josharian commented Mar 7, 2017

(still working on this, not forgotten)

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Aug 15, 2017

Change https://golang.org/cl/55790 mentions this issue: go/ast: add Apply, for rewriting ASTs

gopherbot commented Aug 15, 2017

Change https://golang.org/cl/55790 mentions this issue: go/ast: add Apply, for rewriting ASTs

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Oct 25, 2017

Contributor

We don't use the issue tracker for questions; in the future please ask one the mailing lists or chat boards.

  • An AssignmentStmt is a statement and can be used where a statement would go.
  • A variable declaration can go into a DeclStmt which can also go where a statement would go.

The go/ast documentation groups the nodes in Decl, Expr, and Stmt, and the grouping is pretty clearly telling you which node fits where. It's also trivial to find out empirically: if you can assign a node x to an ast.Stmt, then x is a statement, etc.

Contributor

griesemer commented Oct 25, 2017

We don't use the issue tracker for questions; in the future please ask one the mailing lists or chat boards.

  • An AssignmentStmt is a statement and can be used where a statement would go.
  • A variable declaration can go into a DeclStmt which can also go where a statement would go.

The go/ast documentation groups the nodes in Decl, Expr, and Stmt, and the grouping is pretty clearly telling you which node fits where. It's also trivial to find out empirically: if you can assign a node x to an ast.Stmt, then x is a statement, etc.

@sentient

This comment has been minimized.

Show comment
Hide comment
@sentient

sentient Oct 25, 2017

@griesemer sorry. I deleted the question. will try to find help on the boards

sentient commented Oct 25, 2017

@griesemer sorry. I deleted the question. will try to find help on the boards

@smasher164

This comment has been minimized.

Show comment
Hide comment
@smasher164

smasher164 Oct 31, 2017

Contributor

Is there a reason why the proposed ast.Apply returns a (possibly) modified ast.Node? This seems like a departure from ast.Walk, and I don't understand why it is necessary to return a Node instead of modifying it in place.

Contributor

smasher164 commented Oct 31, 2017

Is there a reason why the proposed ast.Apply returns a (possibly) modified ast.Node? This seems like a departure from ast.Walk, and I don't understand why it is necessary to return a Node instead of modifying it in place.

@griesemer

This comment has been minimized.

Show comment
Hide comment
@griesemer

griesemer Oct 31, 2017

Contributor

@smasher164 Because ast.Apply may change the type of the node which cannot be done by modifying a node in place. For instance, an ast.Apply working on an constant expression tree may replace that tree with a single node which is the constant result.

Contributor

griesemer commented Oct 31, 2017

@smasher164 Because ast.Apply may change the type of the node which cannot be done by modifying a node in place. For instance, an ast.Apply working on an constant expression tree may replace that tree with a single node which is the constant result.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Nov 1, 2017

Change https://golang.org/cl/74930 mentions this issue: go/ast: add Apply, for rewriting ASTs

gopherbot commented Nov 1, 2017

Change https://golang.org/cl/74930 mentions this issue: go/ast: add Apply, for rewriting ASTs

josharian added a commit to josharian/go that referenced this issue Nov 13, 2017

go/ast: add Apply, for rewriting ASTs
Based on work by Robert Griesemer.

Fixes #17108

DO NOT REVIEW

Work in progress. Notes:

Needs more tests, particularly of interactions between modifications,
like Delete then InsertAfter, Delete then InsertBefore, and so on.
Even just the few tests I've added so far have helped to clarify the API.

I've changed Apply to not walk the node inserted by InsertAfter.

Optimize: Need to allocate a single ApplyCursor and reuse it.

TODO: Review API, consider shrinking it a bit. Do we need ApplyCursor.IsFile
to be exported? ApplyCursor.Name? And so on.

Should Apply do anything with comments and/or CommentMaps?

Should Apply.Name be renamed Apply.FieldName?

Change-Id: I291bb3f8aba85abdeb728714c08702c082617f54
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Nov 15, 2017

Change https://golang.org/cl/77811 mentions this issue: go/ast/astutil: add Apply, for rewriting Go ASTs

gopherbot commented Nov 15, 2017

Change https://golang.org/cl/77811 mentions this issue: go/ast/astutil: add Apply, for rewriting Go ASTs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment