proposal: build: define standard way to recognize machine-generated files #13560
Edit in 2018: This is a large issue with many comments (so many, that GitHub hides most of them by default). Here is the summary of the latest status.
This proposal has been accepted and implemented by Rob Pike. The final format can be seen here (it links to a comment in this thread by Rob Pike, with the final format that was chosen):
By now, most of the generated Go code uses a comment that matches the format that's described there.
The original proposal is below.
I propose Go creates a standardized format, which would enable code-generating tools to reliably communicate to humans and other machine tools that the output is in fact a generated file. Additionally, Go should add a recommended style for a simple code generated disclaimer (which satisfies the above criteria).
A file is considered to be "generated" if and only if the maintainer(s) of the project consider it a non-canonical source. In order to make long-term changes to such files, another source must be modified, and the file in question is then fully (re)generated by a reproducible machine tool.
A distinguishable property of generated files is that they can be deleted and re-generated with a zero diff.
One of the strong values that Go brings are conventions and best practices that reduce bikeshedding, increase consistency and readability across diverse teams of Go programmers. Having a well defined convention, format, or standard for things that are unimportant to the key task, but need to have some value saves time.
To expand on that, there are many examples for things that have have a recommend format/style in Go that let you simply reuse that and not force you (and other people) to invent your own style:
There is one type of comment which is commonly used, but has no existing well-defined officially suggested style recommended by Go.
It is a comment that most tools that generate Go code tend to write somewhere at the top of the code.
There are currently many variations of such disclaimer headers in the wild, and they often vary insignificantly (in spacing, punctuation, etc.). New variations come to be when authors look at how other tools do this, see a large variance, end up picking their favorite and tweaking it.
Consider the following examples in the wild:
// generated by stringer -type Pill pill.go; DO NOT EDIT // Code generated by "stringer -type Pill pill.go"; DO NOT EDIT // Code generated by vfsgen; DO NOT EDIT // Created by cgo -godefs - DO NOT EDIT /* Created by cgo - DO NOT EDIT. */ // Generated by stringer -i a.out.go -o anames.go -p ppc64 // Do not edit. // DO NOT EDIT // generated by: x86map -fmt=decoder ../x86.csv // DO NOT EDIT. // Generate with: go run gen.go -full -output md5block.go // generated by "go run gen.go". DO NOT EDIT. // DO NOT EDIT. This file is generated by mksyntaxgo from the RE2 distribution. // GENERATED BY make_perl_groups.pl; DO NOT EDIT. // generated by mknacl.sh - do not edit // DO NOT EDIT ** This file was generated with the bake tool ** DO NOT EDIT // // Generated by running // maketables --tables=all --data=http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt --casefolding=http://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt // DO NOT EDIT /* * CODE GENERATED AUTOMATICALLY WITH github.com/ernesto-jimenez/gogen/unmarshalmap * THIS FILE SHOULD NOT BE EDITED BY HAND */
This creates 2 problems.
This leads to circular arguments and PRs/CLs. For example, see the discussion and the change itself at https://go-review.googlesource.com/#/c/15073/. It started from https://github.com/github/linguist/blob/473282d/lib/linguist/generated.rb#L241, which led to CL 15073. That lead to shurcooL/vfsgen@b2aab1c and shurcooL/go@43b2166. But the initial GitHub behavior came from protobuf disclaimers.
I've created the following func to try to answer the question if a file is generated. At this time, it uses heuristics and best-effort to tell if a file is generated. https://github.com/shurcooL/go/blob/master/analysis/generated_detection.go If there was a well defined requirement for tools to follow, this code can be made simpler and more reliable. Ideally, that helper should be moved into external library for people to reuse, and for generator tools that wish to be compliant to be able to use it for verification.
The goals of this proposal are twofold.
Primarily, to resolve the current impossibility of reliable communication between code generator tool output, and tools that try to determine if a file is code generated.
There should be a way for code generator tool authors to be able to express in their generated output that the file is generated, such that it's possible to reliably detect if a file is generated by other tools.
Secondarily, for code generator authors that simply don't care about what their disclaimer header looks like, provide a recommended style (that satisfies the first condition) template to use.
The implementation details should be defined in a design doc.
It is a non-goal to figure out how existing tools should choose to use or not use the fact whether a given file is generated.
There is some fear that if it's possible to determine if a file is generated reliably, then tools that display code differences will hide generated code differences. That is absolutely the choice of the tool, and in my opinion it should not enforce any behavior that users are unhappy with.
Having additional information (whether a file is generated or not) should enable tools to offer better user experiences - it should not cause tools offer worse experiences than currently.
This proposal focuses solely on enabling code generator tool authors that wish to use a standard disclaimer header to do without forcing them to invent their own format, and for tool authors that wish to make use of information whether a file is generated or not to be able to use that information as they wish. Details of how they do that is outside the scope.
By not standardizing a way for those two types of tools to communicate, it leads to ad-hoc solutions that are sub-optimal emerging, as can be seen above. Go has an opportunity to de-fragment this space and create a recommended standard format that will resolve the needs above, and allow people to migrate existing tools to use the specified format.
Once there's a standard, it's easy to begin updating existing tools towards it over time, and new generator/other tools can start relying on it.
I expect coming up with a recommended style may likely cause a lot of bikeshedding. However, I think it's a cost that's worth incurring, to go through this process once, so that we can avoid having to continuously suffer it while there's no standard at all. I personally don't care too much about what the actual format is (as much as I do about resolving the higher level problems described); I'm okay with whatever Go authors come up with. Any standard is better than no standard at all.
The text was updated successfully, but these errors were encountered:
I think a good and simple way to approach it would be to add somewhere (Go style guide, or a more internal document) a section that says something like this:
That way, people who don't want to be creative can just copy/paste that example disclaimer template and use it.
However, if someone wants to add more details or make their disclaimer different, they can still do that. They just need to follow rules 1 and 2 so that it can still be recognized by tools as a generated disclaimer header.
To detect if a given file is generated, you'd only need to check if those described rules are satisfied - no need for any other heuristics. (If there are some false negatives, then the generator can be updated to satisfy the rules, instead of muddying the detection algorithm.)
I think putting it that way makes this workable. I don't think it's possible to enforce a single header format that will satisfy everyone's needs/wants and have people be okay with it. But providing simple rules that the header will follow and giving an example would work splendidly.
And what about people writing in languages that are not, er, English? :P
What actual machine-readable text should be used to denote a generated file is wider issue. This quickly explodes in scope and potential bloat in the future -- e.g. Rusts function decorations.
That's a valid point, and I think it should be considered.
The outcome may happen to be a string that is recognized by GitHub with their current code in linguist, but I respectfully disagree with this being a high priority criteria (if that's what was implied, I'm not sure).
As I mentioned in the proposal, what GitHub came to recognize as a Go generated file had a high degree of luck and variance in it (and circular arguments):
Specifically, see these two PRs that have shaped what GitHub recognizes today:
Notice the motivation of those PRs (first one was to detect protobuf generated files, and the second was modeled after the first to detect go-bindata output). Also notice how easy it was to get them merged in.
Now, maybe we got lucky with the above sequence of events and what github recognizes today is a great format. But if not, I think it's a relatively low effort followup to submit a PR to GitHub's linguist – similar to those 2 PRs above – to make any necessary corrections after this proposal is resolved and Go has an officially recommended and recognized way to indicate that a file is generated.
Compare that with the alternative of Go using a potentially suboptimal mechanism for many, many years to come.
I am very happy with that outcome and I trust whatever Rob Pike comes up with is going to be a great resolution for this issue.
I don't see this as a big issue. Autogenerated files are usually not meant for being read by people, and I don't know how someone would write Go code without being able to read English anyway.
How about using the file extension ".gen.go" (or suffix _gen.go) for generated files, instead of standardising comments? This had two advantages:
Having standard suffixes makes attaching build tags (GOOS and GOARCH) to file names impossible. Also, it will make generating test files impossible. If we must use file names to label generated files, I prefer the standard library's way of prefix the filename with "z", e.g. used in syscall and runtime packages.
Actually, I was just thinking today of suggesting that generated files could be zipped; i.e. ".gogz";
* room: delete unused code and simplify * alarm: move alarm code to its own package * gmlgo: add SetSize to RoomInstanceIndex, add IsRunning to alarm package, update Camera to update after user code * gmlgo: remove unused layer index code and add code to marshal binary data for Object and SpriteState * gmlgo: update to use internal struct * camera: add CameraSetUpdateFunction * instance: start on InstanceRestore functionality (to bring old instances back) * gmlgo: update serialization logic, change InstanceExists to not require the room to exist * gmlgo: update camera - update camera to render with the last created room if no instance following - update camera code to not call follow logic if an update function is set - fix state to ignore instances that were destroyed * alarm: add gob serialization of alarm * gmlgo: add "gmlgo build" command, remove unused "gmlgo fix" command code and start moving away from cobra package * gmlgo: remove cobra and add simple tests * gmlgo: change marshalling of structs to just use binary encoding (less payload size) * gmlgo: add manual serialization to various object related structs * gmlgo: more work on serialization * gmlgo: start on generating serialization code * gmlgo: more work on generating serialization code * gmlgo: basic serialization code working * gmlgo: add tests for build * gmlgo: add test for serialization error case * gmlgo: adjust code generation to only import certain packages if their needed * gmlgo: fix generating serialization code for int type * gmlgo: remove debug log functions * gmlgo: fix DrawSpriteRotated in tests (headless) and fix tags being passed into "go build" * gmlgo: update sprite updating to work like Game Maker Studio 2 * gmlgo: update asset system to be more flexible (allow nesting assets into sub-folders) * gmlgo: fix config loading not working * gmlgo: update code generation of asset IDs to generate into the "asset" folder * gmlgo: fix code generation tests * gmlgo: update assets to no longer auto-prefix, add error handling to stop duplicate asset names, add a test and rename assets in the example games * gmlgo: update DrawText to take color parameter, fix backspace key and various other changes * gmlgo: add support for "custom" asset folder * gmlgo: fix example games * gmlgo: add json tagging and add error-check to using UnsafeObjectTypeList * gmlgo: minor fixes * gmlgo: add MouseScreenPosition(), expose time functions and update text to draw relative to camera * gmlgo: add basic support for custom assets in code generation * go: add support for Go Modules and fix broken tests * generate: fix go generate logic and paths provided to engine * keyboard: fix backspace speed behaviour not resetting when backspace key is released * modules: run "go mod tidy" * draw: update draw text code so that smaller characters like "." and "_" are offset correctly and not drawn at the top * travis: fix build by updating to latest go version (1.12.5) * travis: remove sonarqube and remove manually downloading dependencies * travis: add dependencies back * generate: fix gmlgo_gen code to work use "packages" module for Go Modules support * travis: attempt to fix build * generate: update comments around generate * file: change from deprecated gopherjs/gopherwasm/js package to syscall/js * instance: fix deletion of entities to occur after GamePostUpdate and prevent crash if room was deleted before entity deletion is handled * vet: resolve go vet warnings * vet: resolve go vet warnings #2 * asset: add basic asset manager system * asset: make asset loading use go routines / be async this was done so that WebAssembly builds would download multiple assets at once, as loading is tied to the remote downloading of asset files * asset: fix race condition that could occur if asset count was below 4 for sprites and don't initialize audio context if no audio is in the game * time: rename timeprec to monotime * debug: remove code to open now deprecated room editor * asset: rename LoadAll to UnsafeLoadAll, to deter users from calling this in their own code * window: add WindowSetSize, WindowSetScale - remove WindowWidth, WindowHeight - replace calls to WindowWidth with WindowSize().X - replace calls to WindowHeight with WindowSize().Y * room: improve error message * draw: add DrawRectangleAlpha function * gml: add GameEnd method * instance: add InstancePauseAll method so that users can pause the game * window: add WindowCursorVisible and WindowSetCursorVisible * gmlgo: update gmlgo to use Short descriptions if you run "gmlgo" by itself * gmlgo: start on publish command * travis: fix xvfb * travis: fix xenial * travis: fix xvfb #2 * publish: more work on publish command * publish: fix publishing relative directories * travis: run gmlgo publish on examples/spaceship * manifest: add generation of manifest.json file in debug mode * manifest: update index.html to preload all asset files * manifest: fix preload warnings in Chrome * window: add IsBrowser will return true if the game is playing in a browser * worm: fix sound not playing and add asset/manifest.json to .gitignore * chore: update documentation, rename example/ to example/ and various other changes * gmlgo: improve error messages with gmlgo publish * gmlgo: improve error messages with gmlgo publish #2 * gmlgo: only publish for the current platform and web * uid: add snowflake uid generation * instance: Add InstanceFree function for freeing an instance without triggering its Delete method * gmlgo: update "gmlgo generate" to follow standard generated code conventions Standard generated code conventions are here golang/go#13560 (comment) * test: add code to ensure asset dir exists for tests * test: add code so that tests can run headed * test: update documentation around testing * publish: improve error message * camera: fix camera rendering jankily * steam: add paniccatch to log out crashes to files (for debugging steam / production crashes) * draw: add DrawSpriteCutMask method to allow for masking * instance: simplify InstanceCreate code * draw: remove DrawTextColor as it duplicated DrawText * git-actions: add config * mac: fix various bugs with Mac build * git-actions: fix PATH to access /usr/local/go/bin * git-actions: fix gl * git-actions: fix xvfb * git-actions: add libglfw3-dev * git-actions: xvfb * git-actions: fix actions * git-actions: fix publish * git-actions: try improve error messages * git-actions: try improve error messages 2 * git-actions: disable publish as it fails * git-actions: try setup windows * git-actions: try fix windows * git-actions: try fix windows * git-actions: try fix windows 3 * git-actions: try fix windows 4 * git-actions: try fix windows 5 * git-actions: fix windows code coverage * headless: fix headless * git-actions: add macos * git-actions: try fix macos * git-actions: add gfx for macos * git-actions: remvoe gfx for macos and add a note * remove travis badge
As described in https://golang.org/s/generatedcode, Go has a formalized format that should be used to indicate that a file is generated. Matching that format helps linters to skip generated files; From https://golang.org/s/generatedcode (golang/go#13560 (comment)); > Generated files are marked by a line of text that matches the regular expression, in Go syntax: > > ^// Code generated .* DO NOT EDIT\.$ > > The `.*` means the tool can put whatever folderol it wants in there, but the comment > must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`, > with a period. > > The text may appear anywhere in the file. This patch updates the template used for our generated types to match that format. Signed-off-by: Sebastiaan van Stijn <firstname.lastname@example.org>
…rmat As described in https://golang.org/s/generatedcode, Go has a formalized format that should be used to indicate that a file is generated. Matching that format helps linters to skip generated files; From https://golang.org/s/generatedcode (golang/go#13560 (comment)); > Generated files are marked by a line of text that matches the regular expression, in Go syntax: > > ^// Code generated .* DO NOT EDIT\.$ > > The `.*` means the tool can put whatever folderol it wants in there, but the comment > must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`, > with a period. > > The text may appear anywhere in the file. This patch updates the autogenerated code to match that format. Signed-off-by: Sebastiaan van Stijn <email@example.com>
As described in https://golang.org/s/generatedcode, Go has a formalized format that should be used to indicate that a file is generated. Matching that format helps linters to skip generated files; From https://golang.org/s/generatedcode (golang/go#13560 (comment)); > Generated files are marked by a line of text that matches the regular expression, in Go syntax: > > ^// Code generated .* DO NOT EDIT\.$ > > The `.*` means the tool can put whatever folderol it wants in there, but the comment > must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`, > with a period. > > The text may appear anywhere in the file. This patch updates the template used for our generated types to match that format. Signed-off-by: Sebastiaan van Stijn <firstname.lastname@example.org> Upstream-commit: 3df4f86f21fbcae3535e2231828dce16a1940dbb Component: engine
…rmat As described in https://golang.org/s/generatedcode, Go has a formalized format that should be used to indicate that a file is generated. Matching that format helps linters to skip generated files; From https://golang.org/s/generatedcode (golang/go#13560 (comment)); > Generated files are marked by a line of text that matches the regular expression, in Go syntax: > > ^// Code generated .* DO NOT EDIT\.$ > > The `.*` means the tool can put whatever folderol it wants in there, but the comment > must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`, > with a period. > > The text may appear anywhere in the file. This patch updates the autogenerated code to match that format. Signed-off-by: Sebastiaan van Stijn <email@example.com> Upstream-commit: 6186e9fe8794660d669f19a2e2ba7127321b817f Component: engine