proposal: x/tools/go/generated: parser for https://golang.org/s/generatedcode format #28089

Open
dmitshur opened this Issue Oct 9, 2018 · 7 comments

Member

dmitshur commented Oct 9, 2018

Given that #13560 has been accepted, resolved, and by now widely adopted by the Go community, I think it would be helpful to have a Go parser for it that tools written in Go could use (if desired).

It's relatively easy to write an ad hoc parser using the regexp package, but it's also possible to write a more specialized one that has less overhead.
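For illustration, the ad hoc regexp approach mentioned above might look like the following minimal sketch. The function name `isGeneratedRegexp` is hypothetical and is not the API of the package being proposed; the pattern is the one from https://golang.org/s/generatedcode, with the `(?m)` flag added so that `^` and `$` match at line boundaries.

```go
package main

import (
	"fmt"
	"regexp"
)

// generatedRE matches a "Code generated ... DO NOT EDIT." line.
// The (?m) flag makes ^ and $ match at the start and end of each line
// instead of only at the start and end of the whole input.
var generatedRE = regexp.MustCompile(`(?m)^// Code generated .* DO NOT EDIT\.$`)

// isGeneratedRegexp reports whether src contains a matching line.
// The name is hypothetical and used only for this sketch.
func isGeneratedRegexp(src []byte) bool {
	return generatedRE.Match(src)
}

func main() {
	header := []byte("// Code generated by stringer; DO NOT EDIT.\n\npackage p\n")
	fmt.Println(isGeneratedRegexp(header))                // true
	fmt.Println(isGeneratedRegexp([]byte("package p\n"))) // false
}
```

Note that `$` does not match before a carriage return, so a file with CRLF line endings would need the `\r` stripped first.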

I already wrote one a while ago, and it currently lives at github.com/shurcooL/go/generated.

I want to move it out of the repository it's currently in, which contains many miscellaneous Go packages of lower utility and quality. I was originally planning to move it into a standalone repository on my personal site, but then I thought it might be a good fit under the x/tools subrepo, specifically in the x/tools/go directory, since it deals with Go code. The proposed import path would be:

import "golang.org/x/tools/go/generated"

Hence this proposal. If accepted, I'm happy to maintain it and be its owner. The scope is very narrow, so it should be a very low volume of work.

Not sure how this intersects with #17244.

If not accepted, I would likely move it here instead:

import "dmitri.shuralyov.com/go/generated"

(The code is currently MIT licensed, but in either case, I'd relicense it under the Go license.)

/cc @andybons @bradfitz @alandonovan @matloob @ianthehat

@gopherbot gopherbot added this to the Proposal milestone Oct 9, 2018

mvdan Oct 9, 2018

Member

Is the only concern about the regular expression its speed? If so, I'd rather improve the speed of regexp instead. There are some performance issues for the package already, so one of them may already cover this particular expression: #11646 #26623 #21463

I realise that the regexp package may take a while to be fast even for this one case, but I don't think there's a sense of urgency to have a faster implementation of it in x/tools.

alandonovan Oct 9, 2018

Contributor

Can the predicate be expressed as a function of the AST? Very often the tools that need this predicate have already parsed the file.

package astutil

// Generated reports whether the file was generated by a program,
// not handwritten, following the conventions described in #13560.
// The syntax tree must have been parsed with the ParseComments flag.
// Example:
//   f, err := parser.ParseFile(fset, filename, nil, parser.ParseComments|parser.ImportsOnly)
//   if err != nil { ... }
//   g := Generated(f)
func Generated(f *ast.File) bool { ... }

It seems like a good fit for the x/tools/go/astutil package since it doesn't add any dependencies.
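As a rough sketch of what such an AST-based predicate could look like, the code below walks the file's comments and checks each `//` comment against the pattern. It makes two assumptions that are not part of the suggestion above: the function takes a `*token.FileSet` so it can confirm that the comment starts in column 1, and the names `generatedByComment` and `checkSource` are hypothetical. It only looks at `//` comments, so block comments and string literals are not covered.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const (
	prefix = "// Code generated "
	suffix = " DO NOT EDIT."
)

// generatedByComment reports whether f contains a //-style comment that
// starts in column 1 and has the "Code generated ... DO NOT EDIT." form.
// f must have been parsed with parser.ParseComments, otherwise
// f.Comments is empty and the result is always false.
func generatedByComment(fset *token.FileSet, f *ast.File) bool {
	for _, group := range f.Comments {
		for _, c := range group.List {
			if fset.Position(c.Pos()).Column != 1 {
				continue // indented or trailing comment, not a full line
			}
			if len(c.Text) >= len(prefix)+len(suffix) &&
				strings.HasPrefix(c.Text, prefix) &&
				strings.HasSuffix(c.Text, suffix) {
				return true
			}
		}
	}
	return false
}

// checkSource parses src and applies generatedByComment.
// It is a hypothetical helper used only to exercise the sketch.
func checkSource(src string) (bool, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, parser.ParseComments)
	if err != nil {
		return false, err
	}
	return generatedByComment(fset, f), nil
}

func main() {
	ok, err := checkSource("// Code generated by tool. DO NOT EDIT.\n\npackage p\n")
	fmt.Println(ok, err)
}
```

A tool that has already parsed the file with comments could call `generatedByComment` directly and skip the second read of the source.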

alandonovan Oct 9, 2018

Contributor

Regular expressions are great for experimenting with complicated patterns, but you don't need the regexp package for an expression this trivial:

strings.HasPrefix(line, "// Code generated ") && strings.HasSuffix(line, " DO NOT EDIT.")

mark-rushakoff Oct 9, 2018

Contributor

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

grep -Exq '^// Code generated .* DO NOT EDIT\.$' "$file"
# Exits 0 if $file matches.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

dmitshur Oct 9, 2018

Member

> Is the only concern about the regular expression its speed?

No. I wanted to avoid the regexp dependency because it was easy to implement the matching behavior myself.

The regexp package can still be optimized independently of this code.

> Can the predicate be expressed as a function of the AST?

Good question. I'd be curious to try. I expect it should be possible, but I'm not sure if it'd be faster.

Note that not only comments would have to be checked in the AST, but raw string literals as well. A .go file with the following code needs to be reported as a positive match for the pattern:

package p

import "fmt"

func Foo() {
	const s = `this is a raw string literal. it happens to contain
some text in column 1, including a string that matches
the "Code generated ... DO NOT EDIT." comment, like so:
// Code generated  DO NOT EDIT.
to produce a correct result, it needs to be detected`
	fmt.Println(len(s))
}

This is because the file does in fact contain a line of text that matches the regular expression ^// Code generated .* DO NOT EDIT\.$.
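To make the raw string case concrete, here is a minimal sketch of how an AST-based check might cover it. It assumes the file was parsed in full (not with ImportsOnly), and the names `rawStringHasGeneratedLine` and `checkRawStrings` are hypothetical. Because a raw string literal's value keeps its backticks, the first line of the literal starts with a backtick and the last line ends with one, so only lines that really begin in column 1 and end at a newline can match.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const (
	prefix = "// Code generated "
	suffix = " DO NOT EDIT."
)

// matchesLine reports whether line has the generated-code form.
func matchesLine(line string) bool {
	return len(line) >= len(prefix)+len(suffix) &&
		strings.HasPrefix(line, prefix) &&
		strings.HasSuffix(line, suffix)
}

// rawStringHasGeneratedLine reports whether any raw string literal in f
// spans a source line that matches the generated-code pattern.
func rawStringHasGeneratedLine(f *ast.File) bool {
	found := false
	ast.Inspect(f, func(n ast.Node) bool {
		if found {
			return false // already matched, stop descending
		}
		lit, ok := n.(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING || !strings.HasPrefix(lit.Value, "`") {
			return true
		}
		for _, line := range strings.Split(lit.Value, "\n") {
			if matchesLine(line) {
				found = true
				return false
			}
		}
		return true
	})
	return found
}

// checkRawStrings parses src in full and applies rawStringHasGeneratedLine.
// It is a hypothetical helper used only to exercise the sketch.
func checkRawStrings(src string) (bool, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return false, err
	}
	return rawStringHasGeneratedLine(f), nil
}

func main() {
	src := "package p\n\nconst s = `first line\n// Code generated  DO NOT EDIT.\nlast line`\n"
	ok, err := checkRawStrings(src)
	fmt.Println(ok, err)
}
```

A complete AST-based predicate would combine this with a check of the file's comments, which is part of why a plain line scan over the source text may end up simpler.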

> This POSIX-compliant grep command has worked fine for my needs in shell scripts:

Indeed. It's very quick and easy to implement a parser for the https://golang.org/s/generatedcode format in any language that has support for regular expressions. This package is meant for tools that are written in Go and prefer to avoid the cost of spawning a process or importing the regexp package.

> In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

Those are not false positives. According to the spec, a file is considered generated if a matching line of text appears anywhere in the file, which can include block comments and raw string literals.

mark-rushakoff Oct 9, 2018

Contributor

> Those are not false positives.

Good to know that grepping this way shouldn't hit any false positives then :)

> This package is meant to be available for use by tools that are written in Go

Anecdotally: my most frequent case for needing to check whether a file is auto-generated has been to manually filter through go list output in a shell script, to decide whether a file should be run through go fmt. So for me, it would be more helpful if I could use {{if .Generated}} as part of a template passed to go list -f. But changing go list is not this issue.

rsc Oct 10, 2018

Contributor

This would be a 1-declaration 1-line package. We try to avoid those. Please propose a new method or top-level function in an existing package, like probably go/ast.
