
proposal: cmd/link: Include build meta information #35667

Open
michael-obermueller opened this issue Nov 18, 2019 · 5 comments

@michael-obermueller michael-obermueller commented Nov 18, 2019

This is a proposal to add extensive build meta information to Go binaries for various use cases like:

  • Stability: maturity analysis
  • Security: vulnerability detection
  • Technology detection, the process of identifying whether an application's underlying technology is Go

Currently it is hard to retrieve meta information from Go binaries: either the information is missing entirely, or extracting it requires extensive parsing of the binary file. The following table lists the existing metadata entities and the mechanism required to extract each one.

| Meta information | Extraction |
| --- | --- |
| Go build version | Symbol table lookup to access the global variable runtime.buildVersion (type string) |
| Build information (modules and versions) | Symbol table lookup to access the global variable runtime/debug.modinfo (type string) |
| Compiler options, e.g. build mode, compiler, gcflags, ldflags | Currently not present in the executable |
| User-defined custom data, e.g. application version, vendor name | Currently only possible by setting global string variables at compile time. The downside of this approach is that reading them requires the symbol table and knowledge of the data type. |
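For comparison, a program can already read the first two rows about itself without any symbol-table lookup. The minimal sketch below uses runtime.Version and runtime/debug.ReadBuildInfo (available since Go 1.12). It only works from inside the running binary, which is exactly the limitation the table describes for external tools.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// describeSelf reports the toolchain version and module information that the
// running binary carries about itself.
func describeSelf() (goVersion, mainPath string, deps int) {
	// runtime.Version returns the value stored in runtime.buildVersion.
	goVersion = runtime.Version()
	// ReadBuildInfo decodes the embedded module information; ok is false
	// when the binary was built without module support.
	if bi, ok := debug.ReadBuildInfo(); ok {
		mainPath = bi.Main.Path
		deps = len(bi.Deps)
	}
	return goVersion, mainPath, deps
}

func main() {
	v, p, n := describeSelf()
	fmt.Printf("go version: %s\nmain module: %q\ndependencies: %d\n", v, p, n)
}
```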

This proposal is to add extended build-time meta information to Go binaries. Reading that information from a binary should be trivial.

Go already embeds the go.buildid hash string in Go binaries and provides tools to read it back.

go.buildid is stored in a PT_NOTE segment on ELF-based systems (see note sections (2-4)). For executable formats that define no suitable mechanism for embedding meta information (e.g. Windows PE), go.buildid is added as non-instruction bytes at the very beginning of the .text segment.

Thus, a portable mechanism for embedding meta information is already in place and can be reused for build meta information. The proposed name for the build meta information entry is go.metadata, and it should be added after the existing go.buildid entry.

go.metadata format

The proposed format for go.metadata is JSON: it is extensible, and Go has first-class JSON parsing support. The following sample shows what the meta information of a simple Go binary might look like:

{
    "version": "go1.13.4",
    "compileropts": {
        "compiler": "gc",
        "mode": "pie",
        "os": "linux",
        "arch": "amd64",
        "libcvendor": "GLIBC",
        "cgoenabled": true
    },
    "buildinfo": {
        "path": "HelloWorld",
        "main": {
            "path": "HelloWorld",
            "version": "(devel)",
            "sum": ""
        },
        "deps": [
            {
                "path": "github.com/pkg/errors",
                "version": "v0.8.1",
                "sum": "h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I="
            }
        ]
    },
    "user": {
        "customkey": "customval",
        "version": "1.0",
        "vendor": "my company name"
    }
}
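Since Go has first-class JSON support, a consumer could decode the sample above into plain structs. The type names below are hypothetical and simply mirror the sample's keys; the user object is modeled as a string map, assuming its free-form values are strings as in the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Module mirrors one entry of "buildinfo.main" or "buildinfo.deps".
type Module struct {
	Path    string `json:"path"`
	Version string `json:"version"`
	Sum     string `json:"sum"`
}

// CompilerOpts mirrors the proposed "compileropts" object.
type CompilerOpts struct {
	Compiler   string `json:"compiler"`
	Mode       string `json:"mode"`
	OS         string `json:"os"`
	Arch       string `json:"arch"`
	LibcVendor string `json:"libcvendor"`
	CgoEnabled bool   `json:"cgoenabled"`
}

// Metadata mirrors the proposed top-level go.metadata document.
type Metadata struct {
	Version      string       `json:"version"`
	CompilerOpts CompilerOpts `json:"compileropts"`
	BuildInfo    struct {
		Path string   `json:"path"`
		Main Module   `json:"main"`
		Deps []Module `json:"deps"`
	} `json:"buildinfo"`
	User map[string]string `json:"user"` // free-form, user-controlled keys
}

const sample = `{
  "version": "go1.13.4",
  "compileropts": {"compiler": "gc", "mode": "pie", "os": "linux", "arch": "amd64",
    "libcvendor": "GLIBC", "cgoenabled": true},
  "buildinfo": {
    "path": "HelloWorld",
    "main": {"path": "HelloWorld", "version": "(devel)", "sum": ""},
    "deps": [{"path": "github.com/pkg/errors", "version": "v0.8.1",
      "sum": "h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I="}]
  },
  "user": {"customkey": "customval", "version": "1.0", "vendor": "my company name"}
}`

// parseMetadata decodes a go.metadata JSON document.
func parseMetadata(data []byte) (*Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

func main() {
	m, err := parseMetadata([]byte(sample))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m.Version, m.CompilerOpts.OS, m.CompilerOpts.Arch)
	for _, d := range m.BuildInfo.Deps {
		fmt.Println(d.Path, d.Version)
	}
}
```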

go.metadata shall validate against the JSON schema attached to this issue.

@michael-obermueller michael-obermueller changed the title cmd/link - Include build meta information cmd/link: Include build meta information Nov 18, 2019
@michael-obermueller michael-obermueller changed the title cmd/link: Include build meta information proposal: cmd/link: Include build meta information Nov 18, 2019
@gopherbot gopherbot added this to the Proposal milestone Nov 18, 2019
@gopherbot gopherbot added the Proposal label Nov 18, 2019

@rsc rsc commented Nov 27, 2019

This could get arbitrarily complex. We already have the first two rows in the table, accessible using go version <binary>, even for stripped binaries.

Generalizing to JSON will just make the binaries bigger and create more work for existing parsers, for very little benefit.

Generalizing to arbitrary metadata similarly adds complexity with not much benefit.

I think we should probably stop where we have stopped.


@networkimprov networkimprov commented Nov 27, 2019

Is there a way that apps could opt in to this scheme? Perhaps by defining a const string (in an arbitrary format) and passing its name to a build flag, so that it is placed at a known or locatable offset?
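One way to read this opt-in idea is a marker-delimited string that a scanner can locate without a symbol table. In the sketch below, the marker \xffGOMETA: and the NUL terminator are hypothetical conventions, not anything the Go toolchain defines. The application would embed something like var appMeta = "\xffGOMETA:{...}\x00" and reference it, since the linker may drop unreferenced variables.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// metaMagic is a HYPOTHETICAL marker an application would place in front of
// its own metadata string; the Go toolchain defines nothing like it.
var metaMagic = []byte("\xffGOMETA:")

// findMarked returns the bytes between the first occurrence of magic and the
// next NUL byte, or nil if the marker is absent or unterminated.
func findMarked(data, magic []byte) []byte {
	i := bytes.Index(data, magic)
	if i < 0 {
		return nil
	}
	rest := data[i+len(magic):]
	j := bytes.IndexByte(rest, 0)
	if j < 0 {
		return nil
	}
	return rest[:j]
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: findmeta <binary>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if meta := findMarked(data, metaMagic); meta != nil {
		fmt.Printf("%s\n", meta)
	} else {
		fmt.Println("no marker found")
	}
}
```

Locating the marker this way still means scanning the file, so it trades the symbol-table dependency for a linear search.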


@michael-obermueller michael-obermueller commented Nov 29, 2019

@rsc, as you outlined, some of the data is already available with go version <binary>. The issue is that go version <binary> carries two implicit assumptions: the Go toolchain is installed, and the binary was built with Go.
Without these two assumptions, reading the metadata becomes a lot more tedious.

Tools (not necessarily implemented in Go) that operate on application meta information have to deal with all sorts of technologies. Performance monitoring tools and vulnerability scanners supervise production systems, where a Go toolchain is typically not installed.
Thus, go version is not an option. Another, very different use case is extending the file command to recognize Go applications. Go is a great technology and already embeds a solid foundation of information into application binaries. Sure, reading .go.buildinfo and parsing runtime.modinfo are no major technical obstacles. Nonetheless, doing so adds maintenance burden and the risk of breakage once these internal formats change.
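As a rough illustration of file-style technology detection, the sketch below looks for the magic bytes that begin the Go build-information blob ("\xff Go buildinf:", written to the .go.buildinfo section since Go 1.13 and also checked by the later debug/buildinfo package). It is a heuristic that depends on an internal format, which is exactly the fragility described in this comment; the program name is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// buildInfoMagic starts the build-information blob written by the Go linker.
// It is an internal format and could change between releases.
var buildInfoMagic = []byte("\xff Go buildinf:")

// looksLikeGo reports whether data contains the Go build-info marker.
// This is a cheap heuristic, not proof: any file could contain these bytes.
func looksLikeGo(data []byte) bool {
	return bytes.Contains(data, buildInfoMagic)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: isgo <file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(os.Args[1], "looks like a Go binary:", looksLikeGo(data))
}
```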

This reasoning led us to propose JSON-formatted build meta information. In our view, the very minor increase in binary size is outweighed by JSON's extensibility, standardization, and the availability of proven parsers. That said, JSON is by no means a mandatory requirement: if size is a roadblock, it can be replaced with another, more lightweight format.
Apart from the inherent overhead of JSON, all proposed data is either

  • already included in application binary today (version, module info),
  • limited in size (likely less than 1 KB; toolchain options), or
  • under the user's control (custom data)

Thus, we think the proposal adds significant benefits to Go by bringing it on par with other technologies.


@ianlancetaylor ianlancetaylor commented Dec 2, 2019

When you say "on par with other technologies", which technologies are you thinking of? If other languages provide this kind of information, we should consider doing what they do rather than inventing something new.

Note that for specific purposes the linker's -X option can be used to set run-time information based on build-time data.
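A minimal sketch of the -X approach: package-level string variables with defaults that the linker overrides at build time. The variable names and values are hypothetical. As the table in the proposal notes, a tool outside the process would still need the symbol table to find these values.

```go
package main

import "fmt"

// Defaults used when no -X flag is given. Override them at link time, e.g.:
//
//	go build -ldflags "-X main.version=1.0 -X main.vendor=ACME"
//
// -X only applies to package-level string variables (not constants).
var (
	version = "dev"
	vendor  = "unknown"
)

func main() {
	fmt.Printf("version=%s vendor=%s\n", version, vendor)
}
```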


@Hollerberg Hollerberg commented Dec 4, 2019

@ianlancetaylor [disclaimer: I am a co-author of the proposal]

To my knowledge, no standardized deployment mechanism or defined set of meta information exists that Go could directly reuse. We tried to compile a set of properties that seemed reasonable for, hopefully, many applications (adding custom information, for example, has no importance for our own use cases).

In the absence of a better mechanism, we proposed following the go.buildid embedding scheme, although it requires searching the .text segment in the PE format, which is a real performance hit for technology detection.
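To make that cost concrete, the sketch below scans a bounded prefix of a file for the delimiters that cmd/go/internal/buildid currently looks for on non-ELF targets, namely \xff Go build ID: "<id>"\n \xff. Both the delimiters and the 64 KiB bound are assumptions about internal details, and the program name is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strconv"
)

// Delimiters assumed to surround the build ID at the start of .text on
// non-ELF targets (internal toolchain detail; may change).
var (
	buildIDPrefix = []byte("\xff Go build ID: \"")
	buildIDEnd    = []byte("\"\n \xff")
)

// prefixSize bounds the scan (an assumption: the ID is expected at the start
// of .text, which follows the file headers).
const prefixSize = 64 << 10

// findBuildID returns the unquoted build ID between the delimiters.
func findBuildID(data []byte) (string, bool) {
	i := bytes.Index(data, buildIDPrefix)
	if i < 0 {
		return "", false
	}
	rest := data[i+len(buildIDPrefix):]
	j := bytes.Index(rest, buildIDEnd)
	if j < 0 {
		return "", false
	}
	// The ID was written in Go-quoted form, so unquote it.
	id, err := strconv.Unquote(`"` + string(rest[:j]) + `"`)
	if err != nil {
		return "", false
	}
	return id, true
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: pebuildid <binary>")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	buf := make([]byte, prefixSize)
	n, err := io.ReadFull(f, buf)
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if id, ok := findBuildID(buf[:n]); ok {
		fmt.Println(id)
	} else {
		fmt.Println("no Go build ID found in the first", prefixSize, "bytes")
	}
}
```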

Different technologies use very divergent schemes and formats. Many technologies simply benefit from not being bound to a specific file format (like ELF or PE); others benefit from being backed by organizations able to extend file format definitions to meet their requirements.

Node.js has the package.json file, which defines a rich set of information (name, version, license, runtime version constraints, dependent packages, etc.).

.NET manifest files contain a fairly rich set of meta information (referenced assemblies, version, vendor, etc.; standardized in ECMA-335).

Java has a subset of the proposed information in its manifest file.

@rsc rsc added this to Incoming in Proposals Dec 4, 2019