New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/build: update C compiler on Windows builders to a currently-maintained version #35006
Comments
No worries. I will try to have a go at this myself. When I have time. Alex |
I'd recommend using the ones from https://musl.cc/ . They're up to date and work very well: http://musl.cc/x86_64-w64-mingw32-native.zip I mirror the snapshot that I use for WireGuard here, if you'd prefer to grab a build that's been "battle-tested": https://download.wireguard.com/windows-toolchain/distfiles/x86_64-w64-mingw32-native-20190903.zip |
I tried running all.bat on the current tip against
and it all passes. While, if I do the same against
it fails in debug/pe with error similar to #23649 (comment) Why is that? And, while I, probably, prefer your version of gcc, our builders should test against what most users use. Alex |
I'll CC @zv-io to talk about why the musl.cc compilers are better. Usually those compilers are up to date and use recent MingW that fixes bugs.
We should direct users to using actually maintained compilers. Windows toolchains are a moving target, and those old ones have many problems. |
I am all for making better tools available to Go users. But we still need to fix bugs that Go users experience while using whatever tools they use. How do you propose we handle bugs like #23649 if we don't use same compilers as #23649 users? Alex |
I have no interest adding hacks to Golang to work around known bugs in ancient compilers that aren't supported by Microsoft in the first place. It would be one thing if we supported MSVC and there was a bug there we have to deal with. But here we're talking about some random compilers people download from Sourceforge. I won't waste my time fighting with that junk. Rather, most of these things have been fixed upstream already, and @zv-io is nice enough to make builds available at musl.cc. |
Hi, I maintain these.
I can't explain why the other toolchain causes the error, and the build log referenced by that comment is nonexistent. I'm having a hard time finding information about how those toolchains are built. I use the latest MinGW-W64 when I prepare each release, and provide recipes and build scripts for all releases.
There's #23649 (comment) which maybe someone else can advise on, but I'm going to agree with @zx2c4 that it's entirely a waste of time to figure out how to hack around bugs in broken toolchains. Further, not encouraging their use might reduce your maintenance burden, not add to it. I'm guessing that "most users" won't complain about using up-to-date, reproducible toolchains that don't require JavaScript or advertisements to download. They even work on Windows XP. |
This is what all Go users use. Including our builders (see https://github.com/golang/go/wiki/WindowsBuild ). If we are to change this, then we should all agree what Mingw we use. Perhaps we should even provide the url in https://golang.org/doc/install - if there are many Mingw versions and they work differently, then we should be explicit there. And we should update our builders to run the same software. I am fine using http://musl.cc if everyone agrees. @rsc and @ianlancetaylor ? And we won't be fixing issues like #23649, because they are not reproducible with http://musl.cc . @mattn can you, please, verify? I used http://musl.cc/x86_64-w64-mingw32-native.zip
@zv-io thank you for taking time to comment here. The fix for #23649 is here https://go-review.googlesource.com/c/go/+/197977/ And the gist of the fix is, it works around commit b295099 from git://git.code.sf.net/p/mingw-w64/mingw-w64 Does your Mingw build includes this commit? The commit (as I understand it) adds Alex |
Change https://golang.org/cl/203603 mentions this issue: |
@rsc and @ianlancetaylor The MINGW gcc our builders use is very very old. We need to decide on the replacement - see my comment #35006 (comment) and whole thread in general. What do you think? Thank you. Alex |
I checked this compiler.
All works fine. And -race too. |
I recently had to set up a Windows machine to compile Go (with cgo) from source and it was a truly awful experience. I somewhat managed to hit every single issue and incompatibility between windows, cgo and several versions of gcc/cygwin/mingw/mingw-w64/msys/TDM-GCC that was ever reported on the issue tracker. At some point I had probably half a dozen GNU toolchains installed, all giving me a different combination of errors, crashes, and gcc warnings when running At the end I was able to find some installer that downloaded and unpacked a mingw-w64 toolchain that was miraculously able to make a cgo-enabled So yes, let's settle on some maintained, up-to-date GNU toolchain distribution for Windows, bless that one, and document it in
So that people using cgo or building Go from source on Windows know what GNU toolchain they are supposed to put on their machine. Anyway after finding this thread I nuked all the GNU toolchains I had and tried the musl.cc linked above and it worked just fine. I like that
|
Per #50824:
According to the GCC release history, 5.1.0 was released April 22, 2015, nearly seven(!) years ago. There have been subsequent maintenance releases up to GCC 5.5 (Oct. 10, 2017), and no apparent active maintenance since then. The GCC flags required for reproducible builds are only available as of GCC 8 or higher, so having such an antiquated compiler on the Windows builders means that we cannot test cgo-enabled builds for reproducibility on the builders. The builder image needs a few other updates for adequate test coverage as well (see #46693), so we should probably refresh Going forward, I would like to see us have a consistent plan for keeping builder images up to date. While it may be important to test against very old compilers, in general we expect users to work around platform bugs by upgrading their platform to fix those bugs — so it is even more important that we test against up-to-date platforms, especially on the |
I would like to make build reproducibility a priority for 2022. Marking as release-blocker for Go 1.19, and I will have a separate conversation with the release team to make a plan to resolve this. |
Alex and I had a productive session this morning, built a couple of new windows images and did some testing. The testing did unfortunately flush out a new problem on 32-bit:
I'll need to track that down before moving ahead with the builder image updates. |
Another addition of "--network=vpc" later in build.bash in the image capture phase. Updates golang/go#35006. Change-Id: I5e23a693a6042f19a309d3fe7ddedba30e95cc1f Reviewed-on: https://go-review.googlesource.com/c/build/+/406858 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Alex Rakoczy <alex@golang.org>
An update on my work to resolve this issue. There are two remaining problems at this point, one for 32-bit and the other for 64-bit. I am currently trying to figure out the 32-bit problem (TestInternalLinkerDWARF as described previously). These problems are only happening when I use the updated windows GCP images in addition to the new compilers. When I use the new compilers with the old builder (e.g. older Windows image), the tests run successfully. [Note: the CL that selects the new base GCP windows images is CL 406859.] For TestInternalLinkerDWARF, when I run the debug/pe test on the old image, at the point where we would ordinarily load up host archives (e.g. read libgcc.a or equivalent), e.g. this code, we don't have any unresolved symbols, so the linker completely bypasses reading of the host archives. This is mysterious to me, but so be it. [I might add that when the same test runs in 64-bit mode, there's a whole collection of unresolved symbols at this point]. On the new image, we do have at least one unresolved symbol ("__acrt_iob_func"), so it proceeds with reading the host archives. Where things seem to be going wrong is during host archive reading; at a certain point when this code runs, we wind up with an unresolved reference to the symbol "main" (which should not be happening-- we already have a definition of main). More debugging needed to figure out why this is taking place. The second problem I'm seeing is that for 64-bit, the race tests run as part of all.bash are failing. Example:
This looks exactly like the failure mode in #46099. So it appears that this is also a potential blocker for rolling out the new compilers. I haven't tried to dig into this problem at all yet. |
@thanm I'm going to label this issue as okay-after-beta1, but I still feel like we need to make a decision on this (or a fix) before the RC. Let me know if you disagree. |
I'm not as blocked on this as I expected to be at this point when I added the |
I'm fine with removing release blocker and/or moving to 1.20. Of course the longer we delay, the more we risk having other bugs like #46099 "pile up", but we'll just have to figure out how to work around that. |
Another update on the debug/pe test: I still have no clue why the new compilers behave differently on the new windows images vs the older/existing images, but that seems to be where we are, so I am just going to have to cope with it. I added a big chunk of debugging trace code into the linker to trace the host object loading process, since this appears to be the place where things are going off the rails for this bug. Just to recap how the host object loading process works: the general idea is that if we're doing CGO and internal linking, we stop at the point right after loading up Go libraries and processing CGO directives (this turns out to be important) and scan through all the loader symbols looking for undefined references (relocations that target a symbol of type SXREF). The undefined symbols we are looking for are reference to runtime routines from the compiled C code, references emitted by the C compiler (example, things like Just before host object loading kicks in, we have these symbols defined. First we have the actual main symbol from the Go runtime package (from rt0_windows_386.s):
Note that it's named "_main", not "main", which if you squint seems ok. That is if you compile the C code "int main() { return 0; }" on a windows 386 box, the compiler will put out a symbol "_main" for the function, not "main" (this is the naming convention). Also of interest is that we have this symbol as well:
This doesn't correspond to any real Go or C symbol, it is a byproduct of processing CGO directives; it gets created here: https://go.googlesource.com/go/+/8a56c7742d96c8ef8e8dcecaf3d1c0e9f022f708/src/cmd/link/internal/ld/go.go#203 . Note the name here though: it's "main" and not "_main" (not consistent with the runtime package). Host object loading kicks in and we start pulling in objects. At a certain point we have an outstanding reference to "_atexit", which causes "crt2.o" to be read in. The text section in that object actually has a reference to the main symbol:
When the PE host object loader reads in crt2.o, it first removes the "_" prefix on the symbol (code here: https://go.googlesource.com/go/+/8a56c7742d96c8ef8e8dcecaf3d1c0e9f022f708/src/cmd/link/internal/loadpe/ldpe.go#555) and then later in the same function this code kicks in: which takes the existing Sxxx "main" symbol (the one created when we processed CGO directives and converts it to SXREF). Now we're out of the frying pan and into the fire: we have a symbol named "main" with type SXREF, and since we're still doing host object loading, we go looking for it in the host archives (even though we already have a symbol defining main, just that it's named "_main"). Turns out that there is actually a definition of "main" in one of the entries in some object file in Looking at the definition of main in
it seems there is a disconnect between this function (which has the leading underscore) and the code in the linker here https://go.googlesource.com/go/+/8a56c7742d96c8ef8e8dcecaf3d1c0e9f022f708/src/cmd/link/internal/loadpe/ldpe.go#554 , which is immediately stripping off the leading "_". Lo and behold, if I go into "rt0_windows_386.s" and change "TEXT _main" to "TEXT main", the I don't think this will work as a permanent fix, however, mainly since this won't work with external linking (the external linker will expect the real definition of main to be named "_main" and not "main"). Still need to work out a more general solution. |
The And: Unfortunately the commentary on those is a bit thin as to the “why”. |
Yup. I am leaning towards adding yet another "special sauce" fix (like the ones you cite) but to the CGO directive processing code in the linker. |
Change https://go.dev/cl/410124 mentions this issue: |
With https://go.dev/cl/410124 all.bat makes it through the std pkg tests ok, but then there is this new problem (again 386):
Fun! |
Change https://go.dev/cl/410241 mentions this issue: |
Pick windows-server-2008-r2-dc-v20200114 for the 2008 version of server-2008r2-v8 (the one we selected before doesn't exist). Updates golang/go#35006 Change-Id: I02e5e4fb5085173f926af5caa45518f333ab5614 Reviewed-on: https://go-review.googlesource.com/c/build/+/410241 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Than McIntosh <thanm@google.com> Run-TryBot: Than McIntosh <thanm@google.com>
When building CGO internal linking on windows 386, make sure to avoid rewriting references to "_main" to "main" when reading symbols during host object loading; the main routine defined by the Go runtime is still named "_main" (not "main"). If we don't do this, we wind up with an SXREF symbol named "main", which can then cause the loader to pull an actual "main" symbol out of a host archive, which is undesirable. Updates #35006. Change-Id: I3768e3617b560552f4522e9e72af879c6adf7705 Reviewed-on: https://go-review.googlesource.com/c/go/+/410124 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Auto-Submit: Than McIntosh <thanm@google.com> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/413414 mentions this issue: |
Change https://go.dev/cl/413415 mentions this issue: |
Fix typo in URL for Go download. Add dummy VERSION to Go repo before uploading to windows builder. Updates golang/go#35006 Change-Id: I832aa781884b4da99053bd793909a693900fa937 Reviewed-on: https://go-review.googlesource.com/c/build/+/413414 Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Alex Rakoczy <alex@golang.org> Run-TryBot: Than McIntosh <thanm@google.com>
Add a set of new "*-newcc" windows 386/amd64 builders that use new Windows VMs that incorporate updated GCP base images and updated C compilers. These builders are tagged with a known issue (35006) and are not included in any trybot sets at the moment. This change is intended to make it easier to work on the remaining issues blocking the adoption of the new compilers; once we have fixes checked in we can get rid of these builders and roll the default windows builders to the "*v8" images. Updates golang/go#35006. Change-Id: I1a7fb5699c20428a846b932e395aeb10ad7adfa9 Reviewed-on: https://go-review.googlesource.com/c/build/+/413415 Reviewed-by: Carlos Amedee <carlos@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Than McIntosh <thanm@google.com>
Change https://go.dev/cl/415674 mentions this issue: |
Change https://go.dev/cl/415675 mentions this issue: |
Change https://go.dev/cl/414475 mentions this issue: |
Builders and try-bots have gcc 5.1 installed on them. According to https://sourceforge.net/projects/mingw-w64/files/ we could get gcc 8.1.
I discovered problem with our builders while fixing #23649. For example, current debug/pe tests fail, if gcc 8.1 is used, but pass fine on our builders. CL 197977 fixes broken debug/pe tests, but I cannot even verify it on builders / try-bots.
I propose we install gcc 8.1 so we can test #23649 fix, and keep it fixed. gcc 5.1 is many years old, we should use recent gcc on our builders.
I used these files from https://sourceforge.net/projects/mingw-w64/files/
amd64 and 386 versions. I am not sure, if I used correct versions, but they seems to work with my change. I am happy to use any alternative location / version - as long as it is recent. @mattn maybe you have suggestions which mingw version to use.
I also am not sure what to do about old gcc 5.1 installed on our builders. Perhaps we should have both old and new gcc versions installed onto our disk images. And then have different builders use different gcc versions. For better coverage. I noticed that
golang.org/x/build/cmd/buildlet.windowsPath
adds gcc bin directory to the path, perhaps we could adjust that code.@bradfitz can you help?
Thank you.
Alex
The text was updated successfully, but these errors were encountered: