Go support on Reproducible Builds #57120
|
Let's be clear that I don't actually know how reproducible any binary artifacts from Go are at the moment. I'm just expressing that things need to be more rigorous when dealing with binary artifacts. |
|
see also #24904 |
When you say "won't necessarily be enough", I assume you mean it won't produce the same bits, and that the extra attestations or SBOMs would be needed because they contain the extra build configuration information needed to get the same bits out. That's absolutely true today, and I don't think there's much benefit to chasing a reproduction of releases before Go 1.21. Right now the directory where the build is run leaks into the binaries, we ship pre-built compiled archives that have other leakage, and we ship two binaries that are built in part using the host C compiler, which has its own leakage.

Starting in Go 1.20 we are dropping the pre-compiled archives, and in Go 1.21 we will drop the use of the host C compiler, at which point Go controls all the bits that are generated. After that there are just a few steps to make them truly reproducible, namely trimming the build directory root and avoiding backslashes in partial file paths on Windows. All that work is pending and will land once Go 1.21 development begins. At that point the distributions will be really, truly reproducible from only the source commit. The bootstrap toolchain doesn't leak into the distribution, and as of Go 1.21 neither will the directory where the build happens, nor which operating system ran the build. Anyone should then be able to check out the go1.21 tag in the repo, grab a new enough bootstrap toolchain (Go 1.21 will require Go 1.17 or later, same as Go 1.20 does), run the build, and get bit-for-bit identical results. If attestations or SBOMs are required to support some kind of process, I'd be happy to look into that, but they won't be necessary to reproduce the bits.

Go binaries have always been highly reproducible in a single-machine environment (fixed build directory, architecture, and host C compiler), because we use content hashes of the build inputs to decide whether outputs are up to date. If the build is not reproducible locally, the hashes don't converge.
The most common way this would happen is if some detail of the bootstrap toolchain leaked into the compiler binary, so that building itself once and building itself twice produce different results. That convergence is tested in every toolchain build, so we shake those out as soon as they creep in. It's been quite a while since the last one. What will be new in Go 1.21 is removing the "single machine environment" limitation. |
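For readers unfamiliar with the two path fixes mentioned above, here is a minimal sketch of the idea, assuming a hypothetical `trimBuildPath` helper (not the toolchain's actual code). It drops the machine-specific build root from a recorded file path and rewrites backslashes as forward slashes, so the same source file is recorded identically whether the build ran in /home/alice/go on Linux or C:\work\go on Windows. The `$GOROOT/` placeholder is an illustrative choice, not necessarily the exact form the Go toolchain records.

```go
package main

import (
	"fmt"
	"strings"
)

// trimBuildPath is a hypothetical helper: it removes the build directory
// root from p and normalizes separators, so the result no longer depends
// on where, or on which OS, the build ran.
func trimBuildPath(root, p string) string {
	// Normalize separators first so a Windows root such as `C:\work\go`
	// matches a Windows path such as `C:\work\go\src\fmt\print.go`.
	root = strings.ReplaceAll(root, `\`, "/")
	p = strings.ReplaceAll(p, `\`, "/")
	root = strings.TrimSuffix(root, "/") + "/"
	if strings.HasPrefix(p, root) {
		// Replace the machine-specific root with a fixed placeholder.
		return "$GOROOT/" + strings.TrimPrefix(p, root)
	}
	// Paths outside the build root are left as they are (after normalization).
	return p
}

func main() {
	fmt.Println(trimBuildPath("/home/alice/go", "/home/alice/go/src/fmt/print.go"))
	fmt.Println(trimBuildPath(`C:\work\go`, `C:\work\go\src\fmt\print.go`))
}
```

Both calls in `main` print the same `$GOROOT/src/fmt/print.go`, which is the property that matters: the recorded path is a function of the source tree, not of the host.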
|
Thanks for the pointer @seankhliao. Added that issue number to my pending CL and also marked that issue for Go 1.21. |
I'll be happy to test and validate any reproducibility claims the Go binary distribution makes. I have spent quite a bit of time working on these sorts of issues. Obviously cgo and the external linker are a harder target for reproducibility.
|
Indeed, cgo and the external linker are a much harder target.
Thanks very much. We will open Go 1.21 development in February. I'll ping this issue once there is something to try. |
|
Sorry if this has been discussed elsewhere. When we talk about "reproducible build", do we want to specify exactly what is considered an input and what is not? For example, the program's source code is an input, as are the source code version, the toolchain version, and configuration such as the target GOOS and GOARCH. The current date and time are non-inputs. What about the host GOOS and GOARCH, the source code location and toolchain location, other environment variables, etc.? Under a narrow definition, "reproducible build" could mean that building the same program twice with exactly the same configuration gives exactly the same output. In that sense the last category could be considered input. Under a stronger definition, it probably would not. Perhaps this issue is about eliminating the last category as input. It may be good to specify this more clearly. |
|
@cherrymui |
|
This issue is specifically about reproducible builds for the Go toolchain distributed on https://go.dev/dl, not for arbitrary Go binaries. For that context, the relevant inputs are a Go source tree with a VERSION file, a GOOS, and a GOARCH. The host GOOS/GOARCH does not matter: the goal is to be able to reproduce builds no matter which OS compiled them. Other environment variables like CGO_ENABLED, CC, and so on matter but are left unset by our toolchain generation, so we can ignore them for reproducing the official downloads.
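To make that input set concrete, the following sketch (all names hypothetical) derives the expected archive name from just those three inputs, following the naming pattern of the files on go.dev/dl (for example go1.21.0.linux-amd64.tar.gz, with .zip for Windows). Nothing about the host appears anywhere in it. The sample VERSION contents in `main` are made up; the sketch reads only the first line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// distInputs holds the only inputs named above for an official toolchain
// build. Host OS and architecture are deliberately absent.
type distInputs struct {
	Version string // first line of the source tree's VERSION file
	GOOS    string // target OS
	GOARCH  string // target architecture
}

// parseVersion returns the first line of a VERSION file's contents.
func parseVersion(contents string) string {
	sc := bufio.NewScanner(strings.NewReader(contents))
	if sc.Scan() {
		return strings.TrimSpace(sc.Text())
	}
	return ""
}

// archiveName follows the go.dev/dl naming pattern:
// .zip for Windows targets, .tar.gz for everything else.
func (in distInputs) archiveName() string {
	ext := ".tar.gz"
	if in.GOOS == "windows" {
		ext = ".zip"
	}
	return fmt.Sprintf("%s.%s-%s%s", in.Version, in.GOOS, in.GOARCH, ext)
}

func main() {
	in := distInputs{
		Version: parseVersion("go1.21.0\ntime 2023-08-08T15:00:00Z\n"), // illustrative contents
		GOOS:    "linux",
		GOARCH:  "amd64",
	}
	fmt.Println(in.archiveName()) // go1.21.0.linux-amd64.tar.gz
}
```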
|
@dolmen For arbitrary binaries, (1) you need to compile them with -trimpath or else put the source in the same directory as it was built with, and (2) you need to compile with CGO_ENABLED=0 or else arrange to have exactly the same C compiler and C libraries. If you can satisfy those two conditions, and then you use the go version output to get the right toolchain and Go source files, then you should get a reproducible build. That's not what this issue is about though. (For the Go toolchain itself we compile the commands with -trimpath and CGO_ENABLED=0.) |
|
@Foxboron, I posted https://swtch.com/tmp/go1.21repro4.src.tar.gz with a source tree containing the changes for reproducible builds for the upcoming Go 1.21 release (still in development). If you build it using the standard process ( You said earlier that you'd be happy to test and validate any reproducibility claims. Can you check that you can reproduce that build? And assuming you can reproduce this specific distribution, what is the process for adding Go to Reproducible Builds once the official Go 1.21 is released? |
|
@rsc, it will be a couple of days before I can look at this. Currently recovering from a fever. Adding the Go project to reproducible-builds.org is just a matter of adding it to the homepage. https://salsa.debian.org/reproducible-builds/reproducible-website |
|
Building the above source with
What's the plan to ensure there are no regressions between releases? |
|
I can probably also run this through a few |
|
@Foxboron, thanks for confirming that you can reproduce the build. That's great. I'm not too worried about testing lots of other distributions, especially since we can reproduce that go1.21repro4.linux-amd64.tar.gz from Windows and macOS too. Our current thinking for avoiding regressions in releases is to build releases on two fairly different machines (e.g., a Linux machine and a Windows machine) and confirm that they match before issuing a release. When I look at https://reproducible-builds.org/citests/, it appears that the top bunch are running regular tests on infrastructure run by the Reproducible Builds project. Once Go 1.21 is released (or at least go1.21rc1 is out), would it make sense for us to prepare a small repo containing a script that could be run on that infrastructure to reproduce the archives posted on https://go.dev/dl/? We could run it ourselves and be listed under "External tests" of course, but it seems like running on non-Google-owned infrastructure would be a stronger statement. What do you think? |
|
Really great to read up on this issue and see the progress! Kudos & thank you. One tiny comment from my side:
and in there one file needs to be edited: _data/projects.yml, where it just needs a YAML entry like
I'd happily either merge an MR or take the data from this issue ;) |
|
Oh, and testing on https://reproducible-builds.org/citests/ is automated, and it's easiest if you do a release which then gets updated into Debian, Arch Linux, or openSUSE. |
You could run it on the GitHub CI/CD infra on each release? That plus the Google infra would be a nice statement to begin with. I'll probably write my own monitor for this, and then it might be worth trying to host something on reproducible-builds.org in the future.
|
I like the cron-based GitHub Actions idea. Thanks. |
I'd like to add that in addition to this, |
Over at #57001 (comment), @Foxboron wrote:
Quick point while I contemplate if it's worth engaging on this topic as the Arch maintainer for the go package. It's a very big difference between downloading source files defined in the go.mod files and fetching binary files from some remote location. We are all very aware of the trusting trust attack, and moving the reproducible builds requirements from the downstream distributor (Linux distributions) to the upstream (Google) is not trivial. So how is Google going to provide Reproducible Builds for the downloaded toolchains?
Then I wrote:
@Foxboron, regarding "Reproducible Builds", by that do you mean https://reproducible-builds.org/? And if so what is involved in "providing" one? As of Go 1.21 we expect our toolchains will be fully reproducible even when cross-compiling. (That is, if you build a Mac toolchain on Windows, Linux, and Mac, you get the same bits out in all cases.) I would be delighted to have a non-Google project reproducing our builds in some way.
Then @Foxboron replied:
Yes. I have been working on this project since 2017 for Arch Linux.
If this gets implemented we would be downloading binary toolchains, right? I want to reproduce the binaries distributed by Google.
Just checking out the source and building versions won't necessarily be enough, so there needs to be some attestation or SBOMs published to support the distribution of the binaries.
I'm not saying this can't be done. I'm just trying to point out that the bar for the "reproducible builds" Go already facilitates with source code is very different from what you would need to ensure for binary builds.
I'm not sure whether "our builds" refers to the distributed binaries from Google. But Arch has been publishing verifiable builds of the Go compiler for 2 or 3 years now.
Moving this conversation to a new issue.