Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/generated: parser for https://golang.org/s/generatedcode format #28089

Open
dmitshur opened this issue Oct 9, 2018 · 11 comments

Comments

@dmitshur
Copy link
Member

commented Oct 9, 2018

Given that #13560 has been accepted, resolved, and by now, widely accepted by the Go community, I think it can be helpful to have a Go parser for it that tools written in Go could use (if desired).

It's relatively easy to write an ad hoc parser using the regexp package, but it's also possible to write a more specialized one that has less overhead.

I already wrote one a while ago, and it currently lives at github.com/shurcooL/go/generated.

I want to move it out of the repository it's currently in, which contains many miscellaneous Go packages of lower utility and quality. I was originally planning to move it out into a standalone repository on my personal site, but then I thought it might be a good fit under x/tools subrepo, specifically, in the x/tools/go directory, since it deals with Go code. The proposed import path would be:

import "golang.org/x/tools/go/generated"

Hence this proposal. If accepted, I'm happy to maintain it/be the owner. The scope is very narrow, so it should be very low volume of work.

Not sure how this intersects with #17244.

If not accepted, I would likely move it here instead:

import "dmitri.shuralyov.com/go/generated"

(The code is currently MIT licensed, but in either case, I'd relicense it under the Go license.)

/cc @andybons @bradfitz @alandonovan @matloob @ianthehat

@gopherbot gopherbot added this to the Proposal milestone Oct 9, 2018

@mvdan

This comment has been minimized.

Copy link
Member

commented Oct 9, 2018

Is the only concern about the regular expression its speed? If so, I'd rather improve the speed of regexp instead. There are some performance issues for the package already, so one of them may already cover this particular expression: #11646 #26623 #21463

I realise that the regex package may take a while to be fast even for this one case, but I don't think there's a sense of urgency to have a faster implementation of it in x/tools.

@alandonovan

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

Can the predicate be expressed as a function of the AST? Very often the tools that need this predicate have already parsed the file.

package astutil

// Generated reports whether the file was generated by a program,
// not handwritten, following the conventions described in #13560.
// The syntax tree must have been invoked with the ParseComment flag.
// Example:
//   f, err := parser.ParseFile(fset, filename, parser.ParseComment|parser.ImportOnly)
//   if err != nil { ... }
//   g := Generated(f)
func Generated(f *ast.File) bool { ... }

It seems like a good fit for the x/tools/go/astutil package since it doesn't add any dependencies.

@alandonovan

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

Regular expressions are great for experimenting with complicated patterns, but you don't need the regexp package for an expression this trivial:

strings.HasPrefix(line, "// Code generated ") && strings.HasSuffix(line, " DO NOT EDIT.")
@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

grep -Exq '^// Code generated .* DO NOT EDIT\.$' "$file"
# Exits 0 if $file matches.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

@dmitshur

This comment has been minimized.

Copy link
Member Author

commented Oct 9, 2018

Is the only concern about the regular expression its speed?

No. I wanted to avoid the regexp dependency because it was easy to implement the matching behavior myself.

The regexp package can still be optimized independently of this code.

Can the predicate be expressed as a function of the AST?

Good question. I'd be curious to try. I expect it should be possible, but I'm not sure if it'd be faster.

Note that not only comments would have to be checked in the AST, but raw string literals as well. A .go file with the following code needs to be reported as a positive match for the pattern:

package p

import "fmt"

func Foo() {
	const s = `this is a raw string literal. it happens to contain
some text in column 1, including a string that matches
the "Code generated ... DO NOT EDIT." comment, like so:
// Code generated  DO NOT EDIT.
to product a correct result, it needs to be detected`
	fmt.Println(len(s))
}

Since it does in fact contain a line of text that matches the regular expression ^// Code generated .* DO NOT EDIT\.$.

This POSIX-compliant grep command has worked fine for my needs in shell scripts:

Indeed. It's very quick and easy to implement a parser for https://golang.org/s/generatedcode format in any language that has support for regular expressions. This package is meant to be available for use by tools that are written in Go, and prefer to avoid incurring the cost of spawning a process or importing the regexp package.

In my case, I'm not concerned about a false positive of that line occurring in a block comment or in a raw string literal.

Those are not false positives. According to the spec, a file is considered generated if a matching line of text appears anywhere in the file, which can include block comments and raw string literals.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

Those are not false positives.

Good to know that grepping this way shouldn't hit any false positives then :)

This package is meant to be available for use by tools that are written in Go

Anecdotally: my most frequent case for needing to check whether a file is auto-generated has been to manually filter through go list output in a shell script, to decide whether a file should be run through go fmt. So for me, it would be more helpful if I could use {{if .Generated}} as part of a template passed to go list -f. But changing go list is not this issue.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2018

This would be a 1-declaration 1-line package. We try to avoid those. Please propose a new method or top-level function in an existing package, like probably go/ast.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 17, 2018

Ping @dmitshur; see previous comment.

@rsc rsc added the WaitingForInfo label Oct 17, 2018

@dmitshur

This comment has been minimized.

Copy link
Member Author

commented Oct 18, 2018

I agree with your comment @rsc that this would be a small package, and it's better to avoid that in x/tools.

I've been thinking about where this functionality could fit into existing packages, and I have some initial thoughts I can share. Please note this isn't a final proposal; just an iteration I want to share. I'm not completely happy with it and wouldn't want it implemented as is.

WIP Counter-Proposal Revision 1

Out of the existing packages, this functionality seemed to belong most in go/parser. The API surface of the change would be somewhat large. It would involve a new parser.Mode:

// A Mode value is a set of flags (or 0).
// They control the amount of source code parsed and other optional
// parser functionality.
//
type Mode uint

const (
	PackageClauseOnly Mode             = 1 << iota // stop parsing after package clause
	ImportsOnly                                    // stop parsing after import declarations
	ParseComments                                  // parse comments and add them to AST
+	GeneratedComment                               // determine whether the file contains a "// Code generated … DO NOT EDIT." line comment
	Trace                                          // print a trace of parsed productions
	DeclarationErrors                              // report declaration errors
	SpuriousErrors                                 // same as AllErrors, for backward-compatibility
	AllErrors         = SpuriousErrors             // report all errors (not just the first 10 on different lines)
)

When the mode is turned on, parser.ParseFile would perform additional work to determine if the .go file has a "// Code generated … DO NOT EDIT." line comment.

(There is some precedent to this in the go/build package, specifically the ImportComment import mode.)

In order for parser.ParseFile to report the result, ast.File in go/ast package would also need to be modified:

// A File node represents a Go source file.
//
// [...]
type File struct {
	Doc        *CommentGroup   // associated documentation; or nil
	Package    token.Pos       // position of "package" keyword
	Name       *Ident          // package name
	Decls      []Decl          // top-level declarations; or nil
	Scope      *Scope          // package scope (this file only)
	Imports    []*ImportSpec   // imports in this file
	Unresolved []*Ident        // unresolved identifiers in this file
	Comments   []*CommentGroup // list of all comments in the source file
+	Generated  *Comment        // a "// Code generated … DO NOT EDIT." line comment; or nil
}

At first, this seemed like the most appropriate place in existing packages to add this functionality. However, after considering the details more closely, it doesn't seem like a good fit.

The "// Code generated … DO NOT EDIT." comment lives in parallel to the actual AST structure of a file. According to the spec, such a comment can appear inside a block comment, and it would still count. It can also appear within a raw string literal, and match. The work that must be done to determine if a .go file has such a comment has very little to do with parsing the actual AST of a .go file.

WIP Counter-Proposal Revision 2

An alternative proposal to consider is to add this functionality to go/build package in the following way. Add a GeneratedComment import mode, similar to the existing build.ImportComment:

 // If ImportComment is set, parse import comments on package statements.
 // Import returns an error if it finds a comment it cannot understand
 // or finds conflicting comments in multiple source files.
 // See golang.org/s/go14customimport for more information.
 ImportComment
+
+// If GeneratedComment is set, determine which .go files contain
+// a "// Code generated … DO NOT EDIT." line comment,
+// and populate GeneratedGoFiles with the results.
+// See golang.org/s/generatedcode for more information.
+GeneratedComment

The result would be populated in a new build.Package field:

type Package struct {
	// ...

	// Source files
	GoFiles          []string // .go source files (excluding CgoFiles, TestGoFiles, XTestGoFiles)
	CgoFiles         []string // .go source files that import "C"
	IgnoredGoFiles   []string // .go source files ignored for this build
	InvalidGoFiles   []string // .go source files with detected problems (parse error, wrong package name, and so on)
+	GeneratedGoFiles []string // .go source files containing a contains a "// Code generated … DO NOT EDIT." line comment; populated only if GeneratedComment mode is set
	CFiles           []string // .c source files
	// ...

This approach seems to be cleaner than adding something to go/parser and ast.File, but I'm not sure if it's good either. The information about whether or not a file is generated feels like it should belong at the file level, not the package level.

There's also the factor that golang.org/x/tools/go/packages is meant to be used instead of go/build by now, so adding something new to go/build doesn't seem like a good idea.

A "WIP Counter-Proposal Revision 3" section can/should be added where a similar approach as in revision 2 can be evaluated for the golang.org/x/tools/go/packages package.

Conclusion

I would lean towards either putting this proposal on hold (or waiting to iterate on it further), or outright declining it. I made the proposal because I was planning to move the existing package, and wanted to check if there'd be a more fitting place for it in x/tools. It seems the existing package doesn't meet the criteria for inclusion.

Additionally, given how easy it is to parse the generated comment with just regexp, there isn't a high demand for this functionality as far as I can tell (e.g., github.com/shurcooL/go/generated currently has only 1 known public importer), so it's unclear if the additional work of integrating it into an existing package is justified at this time.

@griesemer griesemer self-assigned this Oct 31, 2018

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Oct 31, 2018

@dmitshur Let's discuss this in person. Per @golang/proposal-review we're ok with adding a utility function iff we can decide on a good place for it and a good API.

@griesemer griesemer changed the title proposal: x/tools/go/generated: parser for https://golang.org/s/generatedcode format x/tools/go/generated: parser for https://golang.org/s/generatedcode format Oct 31, 2018

@griesemer griesemer modified the milestones: Proposal, Unplanned Oct 31, 2018

@gopherbot

This comment has been minimized.

Copy link

commented Jan 21, 2019

Change https://golang.org/cl/158817 mentions this issue: cmd/go: copy missing bit of documentation about code generated comment

gopherbot pushed a commit that referenced this issue Jan 22, 2019
cmd/go: copy missing bit of documentation about code generated comment
This CL attempts to restore the clarity of the original specification
at https://golang.org/s/generatedcode that the line may appear
anywhere. It is preferable (for human readability), and most common
for it to be early in the file, but that is merely a convention, not
a strict well-specified requirement. Document it as so.

Background

Issue #13560 was a proposal define a standard for marking files as
generated, one that is suitable to be recognized both by humans
and machine tools. It was accepted, and the final specification
was documented at https://golang.org/s/generatedcode. Its text,
copied exactly:

	Generated files are marked by a line of text that matches
	the regular expression, in Go syntax:

		^// Code generated .* DO NOT EDIT\.$

	The .* means the tool can put whatever folderol it wants in there,
	but the comment must be a single line and must start with Code generated
	and end with DO NOT EDIT., with a period.

	The text may appear anywhere in the file.

The https://golang.org/s/generatedcode link points to a comment
in a very large GitHub issue. That makes it harder to find.
Issue #25433 was opened about moving that information somewhere else.
It was resolved via CL 118756, which added text to cmd/go documentation
at https://golang.org/cmd/go/#hdr-Generate_Go_files_by_processing_source:

	To convey to humans and machine tools that code is generated,
	generated source should have a line early in the file that
	matches the following regular expression (in Go syntax):

		^// Code generated .* DO NOT EDIT\.$

The CL description noted that "This change merely moves that
information to a more visible place." The intention was to preserve
the specification unmodified.

The original specification was very clear that "The text may appear
anywhere in the file." The new text in cmd/go documentation wasn't
very clear. "A line early in the file" is not a precise enough criteria
to be recognized by a machine tool, because there isn't a precise
definition of what lines are "early in the file".

Updates #13560
Updates #25433
Updates #28089

Change-Id: I4e374163b16c3f972f9591ec2647fd3d5a2dd5ae
Reviewed-on: https://go-review.googlesource.com/c/158817
Reviewed-by: Rob Pike <r@golang.org>
dmitshur added a commit to shurcooL/go that referenced this issue Jul 4, 2019
generated: deprecate package in favor of moved copy
This package has been moved to a new home. Its new import path
is dmitri.shuralyov.com/go/generated. Deprecate this old copy
in favor of the new one.

Replace the implementation with a forwarding stub.

Updates golang/go#28089
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.