proposal: build: define standard way to recognize machine-generated files #13560

Closed
dmitshur opened this Issue Dec 10, 2015 · 103 comments

@dmitshur
Member

dmitshur commented Dec 10, 2015

Latest Status

Edit in 2018: This is a large issue with many comments (so many, that GitHub hides most of them by default). Here is the summary of the latest status.

This proposal has been accepted and implemented by Rob Pike. The final format can be seen here (it links to a comment in this thread by Rob Pike, with the final format that was chosen):

https://golang.org/s/generatedcode

By now, most of the generated Go code uses a comment that matches the format that's described there.
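For reference, the convention documented at that link is that a generated file contains a line matching the regular expression `^// Code generated .* DO NOT EDIT\.$`, and that this line appears before the first non-comment, non-blank text in the file. Below is a minimal sketch of a detector in Go. The function name isGenerated is made up for illustration, and stopping at the first line that begins with "package " is a simplifying assumption (block comments are not handled specially), so treat it as an approximation rather than a reference implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// generatedRx is the marker pattern described at https://golang.org/s/generatedcode.
var generatedRx = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated (hypothetical helper) reports whether src contains the marker
// line before the package clause. Assumption: the first line starting with
// "package " is the package clause; /* */ comments get no special handling.
func isGenerated(src string) bool {
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSuffix(line, "\r") // tolerate CRLF line endings
		if strings.HasPrefix(line, "package ") {
			return false // reached the package clause without seeing the marker
		}
		if generatedRx.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isGenerated("// Code generated by vfsgen; DO NOT EDIT.\n\npackage assets\n")) // true
	fmt.Println(isGenerated("package main\n"))                                              // false
}
```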

The original proposal is below.


Abstract

I propose Go creates a standardized format, which would enable code-generating tools to reliably communicate to humans and other machine tools that the output is in fact a generated file. Additionally, Go should add a recommended style for a simple code generated disclaimer (which satisfies the above criteria).

Proposed Definition

A file is considered to be "generated" if and only if the maintainer(s) of the project consider it a non-canonical source. In order to make long-term changes to such files, another source must be modified, and the file in question is then fully (re)generated by a reproducible machine tool.

A distinguishable property of generated files is that they can be deleted and re-generated with a zero diff.
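The zero-diff property can be checked mechanically. The sketch below assumes the generator can be invoked as a Go function returning the file contents; the name regeneratesCleanly and the generate callback are hypothetical, introduced only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// regeneratesCleanly (hypothetical helper) reports whether running generate
// reproduces the committed file byte for byte, i.e. with a zero diff.
func regeneratesCleanly(committed []byte, generate func() ([]byte, error)) (bool, error) {
	out, err := generate()
	if err != nil {
		return false, err
	}
	return bytes.Equal(committed, out), nil
}

func main() {
	// Stand-in generator; a real one would run the actual tool.
	gen := func() ([]byte, error) {
		return []byte("package p\n\nconst Answer = 42\n"), nil
	}

	ok, err := regeneratesCleanly([]byte("package p\n\nconst Answer = 42\n"), gen)
	fmt.Println(ok, err) // true <nil>

	// A hand-edited copy no longer matches the generator's output.
	ok, err = regeneratesCleanly([]byte("package p\n\nconst Answer = 43\n"), gen)
	fmt.Println(ok, err) // false <nil>
}
```

A file that fails this check has either been edited by hand or was produced by a non-reproducible tool, and in both cases it falls outside the definition above.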

Background

One of the strong values that Go brings is its set of conventions and best practices, which reduce bikeshedding and increase consistency and readability across diverse teams of Go programmers. Having a well-defined convention, format, or standard for things that are unimportant to the key task, but that still need to have some value, saves time.

During Gopherfest SV 2015, Rob Pike gave a talk Go Proverbs (video, bullet-point summary) that mentioned:

Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.

To expand on that, there are many examples of things that have a recommended format/style in Go, which let you simply reuse it rather than forcing you (and other people) to invent your own style:

Description

There is one type of comment which is commonly used, but has no existing well-defined officially suggested style recommended by Go.

It is a comment that most tools that generate Go code tend to write somewhere at the top of the code.

There are currently many variations of such disclaimer headers in the wild, and they often vary insignificantly (in spacing, punctuation, etc.). New variations come to be when authors look at how other tools do this, see a large variance, end up picking their favorite and tweaking it.

Consider the following examples in the wild:

// generated by stringer -type Pill pill.go; DO NOT EDIT

// Code generated by "stringer -type Pill pill.go"; DO NOT EDIT

// Code generated by vfsgen; DO NOT EDIT

// Created by cgo -godefs - DO NOT EDIT

/* Created by cgo - DO NOT EDIT. */

// Generated by stringer -i a.out.go -o anames.go -p ppc64
// Do not edit.

// DO NOT EDIT
// generated by: x86map -fmt=decoder ../x86.csv

// DO NOT EDIT.
// Generate with: go run gen.go -full -output md5block.go

// generated by "go run gen.go". DO NOT EDIT.

// DO NOT EDIT. This file is generated by mksyntaxgo from the RE2 distribution.

// GENERATED BY make_perl_groups.pl; DO NOT EDIT.

// generated by mknacl.sh - do not edit

// DO NOT EDIT ** This file was generated with the bake tool ** DO NOT EDIT //

// Generated by running
//  maketables --tables=all --data=http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt --casefolding=http://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt
// DO NOT EDIT

/*
* CODE GENERATED AUTOMATICALLY WITH github.com/ernesto-jimenez/gogen/unmarshalmap
* THIS FILE SHOULD NOT BE EDITED BY HAND
*/

This creates 2 problems.

  1. It's a problem for authors of code generator tools. Such authors need to spend time figuring out what format of a disclaimer header they want to use; there is no canonical standardized format.
  2. It's a problem for authors of tools that want to be able to use the information (in a helpful way) whether certain files are generated or not. There is no simple implementation that will catch all of the variations above, and there is no standardized machine-readable format that they can detect.
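To illustrate the second problem, here is a deliberately naive heuristic in Go, tried against a few of the headers listed above. The function looksGenerated is a made-up example, not the code of any existing tool; it only shows how easily a simple rule misses real-world variations.

```go
package main

import (
	"fmt"
	"strings"
)

// looksGenerated (hypothetical) is a naive heuristic: it reports whether the
// header text mentions both "generated" and "do not edit", ignoring case.
func looksGenerated(header string) bool {
	h := strings.ToLower(header)
	return strings.Contains(h, "generated") && strings.Contains(h, "do not edit")
}

func main() {
	headers := []string{
		"// generated by stringer -type Pill pill.go; DO NOT EDIT",
		"// Created by cgo -godefs - DO NOT EDIT",
		"// DO NOT EDIT.\n// Generate with: go run gen.go -full -output md5block.go",
		"/*\n* CODE GENERATED AUTOMATICALLY WITH github.com/ernesto-jimenez/gogen/unmarshalmap\n* THIS FILE SHOULD NOT BE EDITED BY HAND\n*/",
	}
	for _, h := range headers {
		fmt.Println(looksGenerated(h))
	}
	// Prints true, false, false, false: only the first header is caught,
	// although all four mark generated files.
}
```

Loosening the rule to catch the misses (for example, matching "DO NOT EDIT" alone) starts to risk false positives on hand-written files, which is why a single agreed-upon marker is easier to detect than any heuristic.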

This leads to circular arguments and PRs/CLs. For example, see the discussion and the change itself at https://go-review.googlesource.com/#/c/15073/. It started from https://github.com/github/linguist/blob/473282d/lib/linguist/generated.rb#L241, which led to CL 15073. That led to shurcooL/vfsgen@b2aab1c and shurcooL/go@43b2166. But the initial GitHub behavior came from protobuf disclaimers.

I've created the following func to try to answer the question of whether a file is generated. At this time, it uses heuristics and best effort to tell if a file is generated. https://github.com/shurcooL/go/blob/master/analysis/generated_detection.go If there were a well-defined requirement for tools to follow, this code could be made simpler and more reliable. Ideally, that helper should be moved into an external library for people to reuse, and for generator tools that wish to be compliant to be able to use it for verification.

Goals

The goals of this proposal are twofold.

Primarily, to resolve the current impossibility of reliable communication between code generator tool output, and tools that try to determine if a file is code generated.

There should be a way for code generator tool authors to be able to express in their generated output that the file is generated, such that it's possible to reliably detect if a file is generated by other tools.

Secondarily, for code generator authors that simply don't care about what their disclaimer header looks like, provide a recommended template (one that satisfies the first condition) to use.

The implementation details should be defined in a design doc.

Non-goal

It is a non-goal to figure out how existing tools should choose to use or not use the fact whether a given file is generated.

There is some fear that if it's possible to determine if a file is generated reliably, then tools that display code differences will hide generated code differences. That is absolutely the choice of the tool, and in my opinion it should not enforce any behavior that users are unhappy with.

Having additional information (whether a file is generated or not) should enable tools to offer better user experiences - it should not cause tools to offer worse experiences than currently.

This proposal focuses solely on enabling code generator tool authors that wish to use a standard disclaimer header to do so without being forced to invent their own format, and on enabling tool authors that wish to make use of information about whether a file is generated to use that information as they wish. Details of how they do that are outside the scope.

Conclusion

Not standardizing a way for those two types of tools to communicate leads to sub-optimal ad-hoc solutions emerging, as can be seen above. Go has an opportunity to de-fragment this space and create a recommended standard format that will resolve the needs above, and allow people to migrate existing tools to use the specified format.

Once there's a standard, it's easy to begin updating existing tools towards it over time, and new generator/other tools can start relying on it.

Meta-disclaimer

I expect that coming up with a recommended style will likely cause a lot of bikeshedding. However, I think it's a cost that's worth incurring, to go through this process once, so that we can avoid having to continuously suffer it while there's no standard at all. I personally don't care too much about what the actual format is (as much as I do about resolving the higher level problems described); I'm okay with whatever Go authors come up with. Any standard is better than no standard at all.

@dmitshur

Member

dmitshur commented Dec 10, 2015

This proposal comes from the result of discussion in CL 15073. /cc @bradfitz @dominikh @minux

@ianlancetaylor ianlancetaylor changed the title from proposal: Standardize format for simple code generator disclaimer headers; enable a reliable machine-readable way to determine if a file is generated. to proposal: standardize format for simple code generator disclaimer headers; enable a reliable machine-readable way to determine if a file is generated. Dec 10, 2015

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 10, 2015

@zellyn

zellyn commented Dec 10, 2015

How about // DO NOT EDIT. Generated by $GENERATOR_NAME_OR_COMMANDLINE on the first line of the file?

@bradfitz

Member

bradfitz commented Dec 10, 2015

@zellyn, I think we need to agree that this is a good idea to standardize before we pick paint colors. :)

@zellyn

zellyn commented Dec 10, 2015

@bradfitz Agreed. Just curious why the proposal didn't include a strawman proposal for format. :-)

@dmitshur

Member

dmitshur commented Dec 10, 2015

I think a good and simple way to approach it would be to add somewhere (Go style guide, or a more internal document) a section that says something like this:

If a code generator tool wishes to mark a file as "generated" in a way so it can be reliably understood by other tools to be generated, then it must satisfy these criteria:

  1. {{Simple rule to follow.}}
  2. {{Another simple rule that must be followed.}}

As an example, the following disclaimer will satisfy both the criteria above:

{{Example disclaimer template.}}

That way, people who don't want to be creative can just copy/paste that example disclaimer template and use it.

However, if someone wants to add more details or make their disclaimer different, they can still do that. They just need to follow rules 1 and 2 so that it can still be recognized by tools as a generated disclaimer header.

To detect if a given file is generated, you'd only need to check if those described rules are satisfied - no need for any other heuristics. (If there are some false negatives, then the generator can be updated to satisfy the rules, instead of muddying the detection algorithm.)

I think putting it that way makes this workable. I don't think it's possible to enforce a single header format that will satisfy everyone's needs/wants and have people be okay with it. But providing simple rules that the header will follow and giving an example would work splendidly.

@rakyll

Member

rakyll commented Dec 11, 2015

/cc @hyangah for gobind generated files.

@adg

Contributor

adg commented Aug 15, 2016

This proposal sounds like a good idea. We should use the same string that is recognized by GitHub.

@robpike

Contributor

robpike commented Aug 15, 2016

Seems like a reasonable way to address the problem that IDEs that depend on tools are too fussy about generated code.

@robpike robpike self-assigned this Aug 15, 2016

@Kroc

Kroc commented Sep 6, 2016

And what about people writing in languages that are not, er, English? :P
There could be syntax for "this is machine readable". Off the top of my head, a special form of comment; Rust does this for embedding tests within doc-blocks.

/// ... the triple slash
//! ... double-slash-bang

What actual machine-readable text should be used to denote a generated file is a wider issue. This quickly explodes in scope and potential bloat in the future -- e.g. Rust's function decorations.

@dmitshur

Member

dmitshur commented Sep 8, 2016

And what about people writing in languages that are not, er, English? :P

That's a valid point, and I think it should be considered.

We should use the same string that is recognized by GitHub.

The outcome may happen to be a string that is recognized by GitHub with their current code in linguist, but I respectfully disagree with this being a high priority criterion (if that's what was implied, I'm not sure).

As I mentioned in the proposal, what GitHub came to recognize as a Go generated file had a high degree of luck and variance in it (and circular arguments):

This leads to circular arguments and PRs/CLs. For example, see the discussion and the change itself at https://go-review.googlesource.com/#/c/15073/. It started from https://github.com/github/linguist/blob/473282d/lib/linguist/generated.rb#L241, which led to CL 15073. That led to shurcooL/vfsgen@b2aab1c and shurcooL/go@43b2166. But the initial GitHub behavior came from protobuf disclaimers.

Specifically, see these two PRs that have shaped what GitHub recognizes today:

  1. https://github.com/github/linguist/pull/2152/files - Detect Go files generated by Protocol Buffers
  2. https://github.com/github/linguist/pull/2426/files - Detect Go files generated by go-bindata

Notice the motivation of those PRs (first one was to detect protobuf generated files, and the second was modeled after the first to detect go-bindata output). Also notice how easy it was to get them merged in.

Now, maybe we got lucky with the above sequence of events and what github recognizes today is a great format. But if not, I think it's a relatively low effort followup to submit a PR to GitHub's linguist – similar to those 2 PRs above – to make any necessary corrections after this proposal is resolved and Go has an officially recommended and recognized way to indicate that a file is generated.

Compare that with the alternative of Go using a potentially suboptimal mechanism for many, many years to come.

Seems like a reasonable way to address the problem that IDEs that depend on tools are too fussy about generated code.

robpike self-assigned this on Aug 15

I am very happy with that outcome and I trust whatever Rob Pike comes up with is going to be a great resolution for this issue.

@uluyol

Contributor

uluyol commented Sep 8, 2016

And what about people writing in languages that are not, er, English? :P

I don't see this as a big issue. Autogenerated files are usually not meant for being read by people, and I don't know how someone would write Go code without being able to read English anyway.

@robpike

Contributor

robpike commented Oct 16, 2016

Not happening in 1.8

@robpike robpike modified the milestones: Go1.9, Go1.8 Oct 16, 2016

@rakyll

Member

rakyll commented Oct 16, 2016

Autogenerated files are usually not meant for being read by people

This is not always true, e.g. protoc output makes up the majority of the generated code. Type definitions are very central and easily one of the most-read parts of a code base.

@stanim

stanim commented Oct 23, 2016

How about using the file extension ".gen.go" (or the suffix _gen.go) for generated files, instead of standardising comments? This has two advantages:

  1. The format of the comment is irrelevant (and as such language independent)
  2. There is no need even to open the file to check whether it is generated, for humans or for tools. This will also make tools such as golint faster, since they can skip the file immediately based on its filename instead of having to open it and inspect the source code for comments.

@dominikh

Member

dominikh commented Oct 23, 2016

The benefit of standardizing on a comment is that fewer tools would need to be changed, as most tools already generate comments that can be matched by a common phrase. Virtually no tool, however, uses a gen suffix.

@robpike

Contributor

robpike commented Oct 23, 2016

@stanim - That's a non-starter. Many generator tools, including multilingual ones, already make decisions about how to name their files. One I know personally is that the protocol buffer compiler uses .pb.go.

So a comment it will be.

@minux

Member

minux commented Oct 25, 2016

@Kroc

Kroc commented Oct 25, 2016

Actually, I was just thinking today of suggesting that generated files could be zipped; i.e. ".gogz"; go doc could still read these, but they wouldn't be readable in a normal text-editor, a good way of hinting that you shouldn't be touching the file and that the contents are not important outside of the API (go doc, if used)

tgulacsi added a commit to tgulacsi/oracall that referenced this issue Jun 5, 2018

boromisp added a commit to boromisp/enumer that referenced this issue Jun 10, 2018

jpeirson pushed a commit to jpeirson/ardielle-tools that referenced this issue Jun 13, 2018

Jeff Peirson
Added the "DO NOT EDIT" autogenerated code header
as suggested by golang/go#13560 (comment) to prevent go tools from modifying autogenerated code

SOF3 pushed a commit to SOF3/go-stringer-inverse that referenced this issue Aug 23, 2018

cmd/stringer: tweak "Code generated by" comment to match new standard
See https://golang.org/issue/13560 for the full discussion.

The actual change is just the addition of a final period.

Update golang/go#13560

Change-Id: Icc2f52b67181de344aa5107f94faa0e739ff993c
Reviewed-on: https://go-review.googlesource.com/38415
Reviewed-by: Ian Lance Taylor <iant@golang.org>
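
The effect of that final period can be seen by checking both forms of the stringer header against the regular expression documented at https://golang.org/s/generatedcode. This is an illustrative sketch only: the header helper below is hypothetical and is not stringer's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// generatedRx is the marker pattern described at https://golang.org/s/generatedcode.
var generatedRx = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// header (hypothetical helper) builds a marker line of the form
// "// Code generated by <tool>; DO NOT EDIT." for a generator to emit.
func header(tool string) string {
	return fmt.Sprintf("// Code generated by %s; DO NOT EDIT.", tool)
}

func main() {
	// Older form, without the trailing period.
	old := `// Code generated by "stringer -type Pill pill.go"; DO NOT EDIT`
	fmt.Println(generatedRx.MatchString(old)) // false

	// Form with the trailing period.
	fmt.Println(generatedRx.MatchString(header(`"stringer -type Pill pill.go"`))) // true
}
```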

gannett-dcorzine added a commit to GannettDigital/jstransform that referenced this issue Sep 10, 2018

PENG-6044: Use standard "Code generated" format.
Change code generated comment to follow the new Go 1.9 standard format.

Reference: golang/go#13560

gannett-dcorzine added a commit to GannettDigital/jstransform that referenced this issue Sep 14, 2018

PENG-6044: Standardize "Code generated" comment.
Change code generated comment to follow the new Go 1.9 standard format.

Reference: golang/go#13560

grailbot added a commit to grailbio/base that referenced this issue Nov 7, 2018

gtl: improve generate.py
Summary:
- Fix phrasing: "Generated from" -> "Generated by".  The latter is the standard
  phrasing for generated files. golang/go#13560

- Pass the output through goimports.

Reviewers: marius

Reviewed By: marius

Differential Revision: https://phabricator.grailbio.com/D20546

fbshipit-source-id: c140e7e

jterry75 added a commit to Microsoft/hcsshim that referenced this issue Nov 30, 2018

Update generated file headers for mksyscall_windows
Update to comply with the go standard for generated files found here:
golang/go#13560 (comment)

Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>

@pongad pongad referenced this issue Dec 3, 2018

Merged

add gencli plugin #53
