Go support on Reproducible Builds #57120

Open
rsc opened this issue Dec 6, 2022 · 21 comments

@rsc
Contributor

rsc commented Dec 6, 2022

Over at #57001 (comment), @Foxboron wrote:

Quick point while I contemplate if it's worth engaging on this topic as the Arch maintainer for the go package.

One option is to trust in Go's high-fidelity, reproducible builds and let the go command fetch the dependencies directly. I would hope that systems that take this approach are also comfortable letting the go command fetch any toolchain dependency as well, since the toolchain fetches have the same high-fidelity, reproducible behavior as module dependency fetches.

There's a very big difference between downloading source files defined in the go.mod files and fetching binary files from some remote location. We are all very aware of the trusting trust attack, and moving the reproducible builds requirements from the downstream distributor (Linux distributions) to the upstream (Google) is not trivial.

So how is Google going to provide Reproducible Builds for the downloaded toolchains?


Then I wrote:

@Foxboron, regarding "Reproducible Builds", by that do you mean https://reproducible-builds.org/? And if so what is involved in "providing" one? As of Go 1.21 we expect our toolchains will be fully reproducible even when cross-compiling. (That is, if you build a Mac toolchain on Windows, Linux, and Mac, you get the same bits out in all cases.) I would be delighted to have a non-Google project reproducing our builds in some way.


Then @Foxboron replied:

regarding "Reproducible Builds", by that do you mean https://reproducible-builds.org/?

Yes. I have been working on this project since 2017 for Arch Linux.

And if so what is involved in "providing" one?

If this gets implemented we would be downloading binary toolchains, right? I want to reproduce the binaries distributed by Google.

Just checking out the source and building versions won't necessarily be enough, so there needs to be some attestation or SBOMs published to support the distribution of the binaries.

I'm not saying this can't be done. I'm just trying to point out that the bar for the "reproducible builds" Go already facilitates with source code is very different from what you would need to ensure for binary builds.

I would be delighted to have a non-Google project reproducing our builds in some way.

I'm not sure if "our builds" is the distributed binaries from Google? But Arch has been publishing verifiable builds of the Go compiler for 2 or 3 years now.


Moving this conversation to a new issue.

rsc added the NeedsInvestigation label (someone must examine and confirm this is a valid issue and not a duplicate of an existing one) Dec 6, 2022
rsc added this to the Unreleased milestone Dec 6, 2022
rsc self-assigned this Dec 6, 2022

@Foxboron
Contributor

Foxboron commented Dec 6, 2022

Let's be clear that I don't actually know how reproducible any binary artifacts from Go actually are at the moment. I'm just expressing that things need to be more rigorous when dealing with binary artifacts.

@seankhliao
Member

see also #24904

@rsc
Contributor Author

rsc commented Dec 6, 2022

Just checking out the source and building versions won't necessarily be enough, so there needs to be some attestation or SBOMs published to support the distribution of the binaries.

When you say "won't necessarily be enough", I assume you mean it won't produce the same bits, and that the extra attestation or SBOMs would be needed because they contain the extra build configuration information to get the same bits out. That's absolutely true today, and I don't think there's too much benefit to chasing a reproduction of releases before Go 1.21. Right now the directory where the build is run leaks into the binaries, and we ship pre-built compiled archives that have other leakage, and we ship two binaries that are built in part using the host C compiler, which has its own leakage. Starting in Go 1.20 we are dropping the pre-compiled archives, and in Go 1.21 we will drop the use of the host C compiler, at which point Go controls all the bits that are generated, and there are just a few steps left to make them truly reproducible, namely cutting out the build directory root and avoiding backslashes in partial file paths on Windows.

All that work is pending to land once Go 1.21 development begins. At that point the distributions will be really, truly, reproducible from only the source commit. The bootstrap toolchain doesn't leak into the distribution and as of Go 1.21 neither will the directory where the build happens, nor which operating system ran the build. At that point, anyone should be able to check out the go1.21 tag in the repo, grab a new enough bootstrap toolchain (Go 1.21 will require Go 1.17 or later, same as Go 1.20 does), run the build, and get bit-for-bit identical results.

If there are attestations or SBOMs required to support some kind of process, I'd be happy to look into that, but it won't be necessary to reproduce the bits.

Go binaries have always been highly reproducible on a single machine environment (fixed build directory, architectures, host C compiler), because we use build input content hashes to identify up-to-date-ness. If the build is not reproducible locally, the hashes don't converge. The most common way this would happen is if some detail of the bootstrap toolchain leaked into the compiler binary, so that building itself once and building itself twice produce different results. That convergence is tested in every toolchain build, so we shake those out as soon as they creep in. It's been quite a while since the last one. What will be new in Go 1.21 is removing the "single machine environment" limitation.
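
The convergence idea in the comment above can be pictured with a toy model. This is only a sketch under stated assumptions: `build`, `clean_build`, and `leaky_build` are hypothetical stand-ins for a compiler run, not anything from the Go source tree, and the sketch hashes whole outputs, whereas the comment describes content hashes of build inputs.

```python
import hashlib
from typing import Callable

# Hypothetical model: a "compiler" is just bytes, and build(compiler, source)
# returns the bytes of the compiler produced from `source`.
Build = Callable[[bytes, bytes], bytes]


def self_build_converges(build: Build, bootstrap: bytes, source: bytes) -> bool:
    """Return True if building itself once and twice yields the same bits."""
    stage1 = build(bootstrap, source)  # built by the bootstrap toolchain
    stage2 = build(stage1, source)     # the compiler building itself once
    stage3 = build(stage2, source)     # the compiler building itself twice
    return hashlib.sha256(stage2).digest() == hashlib.sha256(stage3).digest()


def clean_build(compiler: bytes, source: bytes) -> bytes:
    # Toy compiler whose output depends only on the source.
    return b"compiled:" + source


def leaky_build(compiler: bytes, source: bytes) -> bytes:
    # Toy compiler that embeds a fingerprint of whichever compiler ran it.
    return hashlib.sha256(compiler).digest() + source
```

In this model `clean_build` converges and `leaky_build` does not, which is the kind of leak the comment says gets shaken out as soon as it creeps in.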

@rsc
Contributor Author

rsc commented Dec 6, 2022

Thanks for the pointer @seankhliao. Added that issue number to my pending CL and also marked that issue for Go 1.21.

@Foxboron
Contributor

Foxboron commented Dec 6, 2022

If there are attestations or SBOMs required to support some kind of process, I'd be happy to look into that, but it won't be necessary to reproduce the bits.

I'll be happy to test and validate any reproducibility claims the Go binary distribution is making. I have spent quite a bit of time working on these sorts of issues.

Obviously cgo and the external linker are a harder target for reproducibility

#53528

@rsc
Contributor Author

rsc commented Dec 6, 2022

Indeed, cgo and the external linker are a much harder target.
The plan is to stop using cgo to build the Go distribution itself (#57007).

@rsc
Contributor Author

rsc commented Dec 6, 2022

I'll be happy to test and validate any reproducibility claims the Go binary distribution is making. I have spent quite a bit of time working on these sorts of issues.

Thanks very much. We will open Go 1.21 development in February. I'll ping this issue once there is something to try.

@cherrymui
Member

Sorry if this has been discussed elsewhere.

When we talk about "reproducible build", do we want to specify exactly what is considered an input and what is not? For example, the program's source code is an input, and so are the source code version, the toolchain version, and some configuration like the target GOOS and GOARCH. The current date and time are non-inputs. What about the host GOOS and GOARCH, the source code location and toolchain location, some other environment variables, etc.?

Under a narrow definition, "reproducible build" could be interpreted as: building the same program twice with exactly the same configuration produces exactly the same output. In that sense the last category could be considered input. Under a stronger definition, the last category is probably not input. This issue may be about eliminating the last category as input. It may be good to specify that more clearly.

@dolmen
Contributor

dolmen commented Feb 7, 2023

@cherrymui
I think that a good target for reproducible builds is to be able to fully reproduce (bit-for-bit) a binary just from the output of go version -m applied to it.

@rsc
Contributor Author

rsc commented Feb 7, 2023

This issue is specifically about reproducible builds for the Go toolchain distributed on https://go.dev/dl, not for arbitrary Go binaries. For that context, the relevant inputs are a Go source tree with a VERSION file, a GOOS, and a GOARCH. The host GOOS/GOARCH does not matter - the goal is to be able to reproduce builds no matter which OS compiled them. Other environment variables like CGO_ENABLED, CC, and so on matter but are left unset by our toolchain generation, so we can ignore them for reproducing the official downloads.
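
As a rough illustration of the inputs listed above, a reproduction script might prepare the environment for the build like this. It is a minimal sketch, not the official release tooling: the function name and the constant are made up for the example, and the list of variables to unset contains only the two named in the comment, which says "and so on", so it is incomplete by assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Variables the comment names as left unset by the official toolchain
# generation. The comment says "and so on", so this list is an assumption
# and is probably incomplete.
UNSET_FOR_OFFICIAL_BUILD = ("CGO_ENABLED", "CC")


def toolchain_build_env(goos: str, goarch: str,
                        base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) set up for a target build."""
    env = dict(os.environ if base is None else base)
    for name in UNSET_FOR_OFFICIAL_BUILD:
        env.pop(name, None)  # leave unset, as the official builds do
    env["GOOS"] = goos       # target operating system (an input)
    env["GOARCH"] = goarch   # target architecture (an input)
    return env
```

The result could be passed as `env=` to `subprocess.run` when invoking the build script from the `src` directory of the Go source tree; the host GOOS/GOARCH deliberately does not appear anywhere.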

@rsc
Contributor Author

rsc commented Feb 7, 2023

@dolmen For arbitrary binaries, (1) you need to compile them with -trimpath or else put the source in the same directory as it was built with, and (2) you need to compile with CGO_ENABLED=0 or else arrange to have exactly the same C compiler and C libraries. If you can satisfy those two conditions, and then you use the go version output to get the right toolchain and Go source files, then you should get a reproducible build. That's not what this issue is about though. (For the Go toolchain itself we compile the commands with -trimpath and CGO_ENABLED=0.)

@rsc
Contributor Author

rsc commented Mar 6, 2023

@Foxboron, I posted https://swtch.com/tmp/go1.21repro4.src.tar.gz with a source tree containing the changes for reproducible builds for the upcoming Go 1.21 release (still in development). If you build it using the standard process (./make.bash) you should get the same binaries that are in https://swtch.com/tmp/go1.21repro4.linux-amd64.tar.gz or substitute a different GOOS-GOARCH in that URL. If you use ./make.bash -distpack you should get in ../pkg/distpack the exact archive at that URL. Like any Go toolchain build, the process requires a sufficiently new Go bootstrap toolchain (Go 1.17.13 or later) in $GOROOT_BOOTSTRAP (default $HOME/sdk/go1.17.13 or $HOME/go1.17.13 or $HOME/go1.4, whichever exists). There are no other requirements of the host system.

You said earlier that you'd be happy to test and validate any reproducibility claims. Can you check that you can reproduce that build? And assuming you can reproduce this specific distribution, what is the process for adding Go to Reproducible Builds once the official Go 1.21 is released?
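
Checking the request above comes down to comparing the locally built archive with the downloaded one, byte for byte. A minimal sketch of that comparison follows; the function names are made up for the example, and the paths passed in are whatever the reader's build and download produced.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def archives_match(built: Path, reference: Path) -> bool:
    """True if the locally built archive has the same digest as the reference."""
    return sha256_file(built) == sha256_file(reference)
```

Comparing digests rather than reading both files side by side keeps the check the same shape as a published checksum list, so the reference side can later be a digest string instead of a file.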

@Foxboron
Contributor

Foxboron commented Mar 6, 2023

@rsc, it will be a couple of days before I look at this. Currently recovering from a fever.

Adding the Go project to reproducible-builds.org is just a matter of adding it to the homepage. https://salsa.debian.org/reproducible-builds/reproducible-website

@Foxboron
Contributor

Foxboron commented Mar 10, 2023

Building the above source with ./make.bash using the Go compiler shipped with Arch (2:1.20.1-1) produces the same checksum as the binaries from the go1.21repro4.linux-amd64.tar.gz archive.

λ bin » sha256sum *
c400a53988aaf4dbbf31cb2f1adef839457e996209b3ce86d239c12acf72d270  go
2767fa3b986d6a1799ee8d6340790595a2bb88d77670ceeba0abbd348826f124  gofmt

What's the plan to ensure there are no regressions between releases?

@Foxboron
Contributor

I can probably also run this through a few toolbox images if you want me to check multiple distributions.

@rsc
Contributor Author

rsc commented Mar 13, 2023

@Foxboron, thanks for confirming that you can reproduce the build. That's great. I'm not too worried about testing lots of other distributions, especially since we can reproduce that go1.21repro4.linux-amd64.tar.gz from Windows and macOS too.

Our current thinking for avoiding regressions in releases is to build releases on two fairly different machines (e.g., a Linux machine and a Windows machine) and confirm that they match before issuing a release.

When I look at https://reproducible-builds.org/citests/, it appears that the top bunch are running regular tests on infrastructure run by the Reproducible Builds project. Once Go 1.21 is released (or at least go1.21rc1 is out), would it make sense for us to prepare a small repo containing a script that could be run on that infrastructure to reproduce the archives posted on https://go.dev/dl/? We could run it ourselves and be listed under "External tests" of course, but it seems like running on non-Google-owned infrastructure would be a stronger statement. What do you think?
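
The "build on two different machines and confirm that they match" check could be sketched as a comparison of two checksum listings in the format shown earlier in the thread. This is an illustrative sketch only: the function names are made up, and it assumes plain text-mode `sha256sum` output (digest, whitespace, file name).

```python
from typing import Dict, List


def parse_sha256sum(text: str) -> Dict[str, str]:
    """Parse text-mode `sha256sum` output into a {file name: hex digest} dict."""
    digests = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        digests[name] = digest
    return digests


def mismatches(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return sorted names whose digest differs or that exist on only one side."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `mismatches` means the two builders agree on every file; a release gate would refuse to publish otherwise.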

@h01ger

h01ger commented Mar 13, 2023

really great to read up on this issue and see the progress! kudos & thank you.

one tiny comment from my side:

@rsc, it will be a couple of days before I look at this. Currently recovering from a fever.

Adding the Go project to reproducible-builds.org is just a matter of adding it to the homepage. https://salsa.debian.org/reproducible-builds/reproducible-website

and in there one file needs to be edited: _data/projects.yml, where it just needs a YAML entry like, e.g., this one for F-Droid.

I'd either happily merge a MR or take the data from this issue ;)

@h01ger

h01ger commented Mar 13, 2023

oh, and for testing on https://reproducible-builds.org/citests/ it's automated, and the easiest way is to do a release which then gets updated in Debian or Arch Linux or openSUSE.

@Foxboron
Contributor

Once Go 1.21 is released (or at least go1.21rc1 is out), would it make sense for us to prepare a small repo containing a script that could be run on that infrastructure to reproduce the archives posted on https://go.dev/dl/? We could run it ourselves and be listed under "External tests" of course, but it seems like running on non-Google-owned infrastructure would be a stronger statement. What do you think?

You could run it on the GitHub CI/CD infra on each release? That + the Google infra would be a nice statement to begin with.

I'll probably write my own monitor for this, and then it might be worth trying to host something on reproducible-builds.org in the future.

@rsc
Contributor Author

rsc commented Mar 14, 2023

I like the cron-based GitHub Actions idea. Thanks.

@quite

quite commented Apr 11, 2023

@dolmen For arbitrary binaries, (1) you need to compile them with -trimpath or else put the source in the same directory as it was built with, and (2) you need to compile with CGO_ENABLED=0 or else arrange to have exactly the same C compiler and C libraries. If you can satisfy those two conditions, and then you use the go version output to get the right toolchain and Go source files, then you should get a reproducible build. That's not what this issue is about though. (For the Go toolchain itself we compile the commands with -trimpath and CGO_ENABLED=0.)

I'd like to add that in addition to this, -buildvcs=false seems to be needed (or else some info from any VCS gets opportunistically baked into the binary, right?).
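
Putting the conditions from this exchange together, a build wrapper for arbitrary binaries might assemble its `go build` invocation roughly as follows. This is a hedged sketch: the function name is made up, it only constructs the command and environment rather than running them, and including `-buildvcs=false` follows the suggestion in the comment above rather than anything confirmed elsewhere in the thread.

```python
from typing import Dict, List, Mapping, Tuple


def reproducible_go_build(package: str, output: str,
                          base_env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a `go build` following the conditions in this thread."""
    argv = [
        "go", "build",
        "-trimpath",        # condition (1): keep the build directory out of the binary
        "-buildvcs=false",  # suggested above: do not stamp VCS info into the binary
        "-o", output,
        package,
    ]
    env = dict(base_env)
    env["CGO_ENABLED"] = "0"  # condition (2): no host C compiler or C libraries
    return argv, env
```

A caller would pass the pair to `subprocess.run(argv, env=env, check=True)`, using the toolchain version reported by `go version` for the binary being reproduced.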
