x/build: "fatal error: out of memory" on windows-arm64-11 #51019

Open
bcmills opened this issue Feb 4, 2022 · 27 comments
Labels
arch-arm64; Builders (x/build issues: builders, bots, dashboards); NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one); OS-Windows
Milestone

Comments

@bcmills
Member

bcmills commented Feb 4, 2022

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: out of memory' --since=2021-01-01

2022-02-04T14:02:15-25d2ab2-4afcc9f/windows-arm64-11
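For readers unfamiliar with the pattern in the greplogs command above, here is a minimal Go sketch of what it matches. It assumes greplogs evaluates the expression with Go's `regexp` syntax and that each log begins with the builder name; the helper `isOOMLog` and the sample logs are invented for illustration and are not part of greplogs.

```go
package main

import (
	"fmt"
	"regexp"
)

// oomPattern is the expression from the greplogs command above.
// (?m) lets ^ match at the start of any line, (?s) lets . match newlines,
// and \A anchors the builder name to the very start of the log.
var oomPattern = regexp.MustCompile(`(?ms)\Awindows-arm64.*^fatal error: out of memory`)

// isOOMLog is a hypothetical helper: it reports whether a log starts with a
// windows-arm64 builder name and contains a line that begins with
// "fatal error: out of memory".
func isOOMLog(log string) bool {
	return oomPattern.MatchString(log)
}

func main() {
	// Invented log excerpts, not real builder output.
	failing := "windows-arm64-11 at 0123abc\n\nok  \texample/pkg\t0.1s\nfatal error: out of memory\n"
	otherBuilder := "linux-amd64 at 0123abc\nfatal error: out of memory\n"
	midLine := "windows-arm64-11 at 0123abc\nnote: fatal error: out of memory\n"

	fmt.Println(isOOMLog(failing))      // builder prefix and a matching line
	fmt.Println(isOOMLog(otherBuilder)) // wrong builder at the start of the log
	fmt.Println(isOOMLog(midLine))      // message does not start its line
}
```

Only the first sample satisfies both the `\A` anchor and the line-start `^` anchor.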

We may need to reconfigure the builder to either turn down the build/test parallelism or have more RAM available.

There is only one of these failures in the logs, but OTOH this builder has only ever run x/tools 12 times — so that's an 8% failure rate for this repo so far. 😅

(attn @golang/release)

@bcmills bcmills added the arch-arm64, Builders, OS-Windows, and NeedsInvestigation labels Feb 4, 2022
@gopherbot gopherbot added this to the Unreleased milestone Feb 4, 2022
@heschi heschi added the WaitingForInfo label Feb 10, 2022
@heschi
Contributor

heschi commented Feb 10, 2022

So far I'm not seeing any recurrences on what I assume is a much higher number of runs. We can keep an eye on it but right now I'm inclined to leave it alone.

@bcmills
Member Author

bcmills commented Feb 11, 2022

> So far I'm not seeing any recurrences

Here's one running x/crypto:

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: out of memory' --since=2022-02-05

2022-02-10T15:16:21-f4118a5-656d3f4/windows-arm64-11

@bcmills bcmills changed the title from: x/build: "fatal error: out of memory" building x/tools on windows-arm64-11, to: x/build: "fatal error: out of memory" building tests on windows-arm64-11 Feb 11, 2022
@bcmills bcmills changed the title from: x/build: "fatal error: out of memory" building tests on windows-arm64-11, to: x/build: "fatal error: out of memory" building on windows-arm64-11 Feb 11, 2022
@bcmills bcmills changed the title from: x/build: "fatal error: out of memory" building on windows-arm64-11, to: x/build: "fatal error: out of memory" on windows-arm64-11 Feb 11, 2022
@bcmills bcmills removed the WaitingForInfo label Feb 11, 2022
@bcmills
Member Author

bcmills commented Feb 11, 2022

From the sheer number of packages that failed in each of those logs, I suspect that the parallelism is being set too high. What's the CPU-to-RAM ratio for this builder? (Maybe we could scale down GOMAXPROCS?)

@heschi
Contributor

heschi commented Feb 11, 2022

About 12G RAM for 8 cores, which seems pretty plausible to me? There isn't much precedent for tweaking GOMAXPROCS but I guess it's worth a try.

...are the crypto tests really that memory hungry though? Smells weird.
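To put the CPU-to-RAM ratio in concrete terms, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that memory is shared evenly among concurrent compile or test processes and that one process runs per core; real usage is uneven, and `memPerWorkerGiB` is a made-up helper.

```go
package main

import "fmt"

// memPerWorkerGiB returns a rough per-worker memory budget, assuming RAM is
// split evenly among `workers` concurrent processes (an idealization).
func memPerWorkerGiB(totalGiB float64, workers int) float64 {
	if workers <= 0 {
		return 0
	}
	return totalGiB / float64(workers)
}

func main() {
	const totalGiB = 12.0 // "about 12G" from the comment above
	// 8 matches the core count; 4 is an example of a halved setting.
	for _, workers := range []int{8, 4} {
		fmt.Printf("workers=%d: %.1f GiB each\n", workers, memPerWorkerGiB(totalGiB, workers))
	}
}
```

Under these assumptions each of 8 workers gets about 1.5 GiB, and halving the parallelism doubles that budget.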

@dmitshur
Contributor

dmitshur commented Feb 11, 2022

CL 381514 for #50084 is some recent precedent.

@gopherbot

gopherbot commented Feb 11, 2022

Change https://go.dev/cl/385182 mentions this issue: dashboard: reduce GOMAXPROCS on Windows 11 ARM64

@bcmills
Member Author

bcmills commented Feb 11, 2022

> are the crypto tests really that memory hungry though? Smells weird.

I agree. It looks like the actual OOM happened while recompiling packages in std, so probably more about compiler memory usage than test memory usage per se — but it's not clear to me why those packages would have been stale in the first place. 🤔

gopherbot pushed a commit to golang/build that referenced this issue Feb 11, 2022
The Windows 11 ARM64 builder is experiencing occasional OOMs while
building tests. Reducing GOMAXPROCS will reduce the go command's
parallelism and hopefully prevent them.

For golang/go#51019.

Change-Id: Ia4bfdddaca178c130b9b57087a66a54cff903a05
Reviewed-on: https://go-review.googlesource.com/c/build/+/385182
Trust: Heschi Kreinick <heschi@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
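As background on why GOMAXPROCS affects the go command: its `-p` flag (the number of build commands or test binaries run in parallel) defaults to GOMAXPROCS. The sketch below shows one way to launch a build with a reduced value. It is not the code from the CL above; the helper `withGOMAXPROCS`, the value 4, and the `go build ./...` invocation are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// withGOMAXPROCS is a hypothetical helper that returns a copy of env with
// GOMAXPROCS set to n. os/exec uses the last value for a duplicated key,
// so appending is enough to override an inherited setting.
func withGOMAXPROCS(env []string, n int) []string {
	out := append([]string(nil), env...)
	return append(out, fmt.Sprintf("GOMAXPROCS=%d", n))
}

func main() {
	cmd := exec.Command("go", "build", "./...")
	cmd.Env = withGOMAXPROCS(os.Environ(), 4)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// Print instead of running, so the sketch has no side effects.
	// Calling cmd.Run() here would start the build with the reduced setting.
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-1])
}
```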
@bcmills
Member Author

bcmills commented Feb 15, 2022

Unfortunately still OOMing even with GOMAXPROCS=4.

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: out of memory' --since=2022-02-12

2022-02-15T14:54:27-76bd8ea/windows-arm64-11

It's not at all clear to me why this is happening for the -11 builder but not the -10 builder — are they running on different host configurations?

@bcmills
Member Author

bcmills commented Feb 15, 2022

Oh, hrm. The failure condition in that last one is a bit different — it OOMed during bootstrapping. 🤔

@bcmills
Member Author

bcmills commented Feb 22, 2022

Three more OOMs over the weekend: one during bootstrapping in the main repo, and two during x/tools builds.

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: .*(?:out of memory|cannot allocate memory)' --since=2022-02-16

2022-02-20T20:58:11-851ecea/windows-arm64-11
2022-02-17T17:37:24-1f3875c-eaf0405/windows-arm64-11
2022-02-17T17:37:03-fd59bdf-eaf0405/windows-arm64-11

@bcmills
Member Author

bcmills commented Mar 17, 2022

Still ongoing:

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: .*(?:out of memory|cannot allocate memory)' --since=2022-02-23

2022-03-15T13:54:34-6799a7a-e475cf2/windows-arm64-11
2022-03-14T09:19:01-b769efc-ab0f761/windows-arm64-11
2022-03-12T23:32:36-842d37e/windows-arm64-11

@heschi
Contributor

heschi commented Mar 17, 2022

We also got bitten twice during bootstrap while running the release. But I have no idea what to do about it.

@bcmills
Member Author

bcmills commented Apr 12, 2022

Still happening quite frequently, but only on the -11 builder. Does it have the same hardware configuration as the -10 builder?

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: .*(?:out of memory|cannot allocate memory)' --since=2022-03-17

2022-04-11T15:41:56-32de2b0/windows-arm64-11
2022-04-11T02:55:52-a6f6932/windows-arm64-11
2022-04-11T01:24:31-b6fb3af/windows-arm64-11
2022-04-02T14:28:33-8a816d5/windows-arm64-11
2022-03-21T13:26:21-86b02b3-7eaad60/windows-arm64-11
2022-03-21T13:26:21-86b02b3-4aa1efe/windows-arm64-11
2022-03-19T23:49:55-fa8efc1/windows-arm64-11

@heschi
Contributor

heschi commented Apr 12, 2022

Yep, same qemu script. My best guess is some kind of OS issue/conflict with the emulator, but I have no idea how to prove or disprove that belief.

@bcmills
Member Author

bcmills commented Apr 18, 2022

I wonder if this is somehow related to #49564, in that they both involve unexpected OOM failures on Windows.

@bcmills
Member Author

bcmills commented Apr 18, 2022

@golang/release, is there a way to get the runtime to dump the current heap size when it fails with cannot allocate memory? It would probably be useful to know whether these OOMs are occurring due to wildly oversized heaps like the one in #49564 (comment).
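This does not answer the question above, but for reference, heap figures are available from inside a running Go program via `runtime.ReadMemStats`. A minimal sketch follows; `heapSummary` is a made-up helper. Note the limitation: as far as I know, a runtime fatal error cannot be recovered by user code, so a snapshot like this can only be logged before the failure (for example, periodically), not at the moment of the crash.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary is a hypothetical helper that returns a one-line snapshot of
// heap statistics as reported by the Go runtime for the current process.
func heapSummary() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("HeapAlloc=%d HeapInuse=%d HeapSys=%d Sys=%d",
		m.HeapAlloc, m.HeapInuse, m.HeapSys, m.Sys)
}

func main() {
	// All values are in bytes; Sys is the total obtained from the OS.
	fmt.Println(heapSummary())
}
```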

@bcmills
Member Author

bcmills commented May 3, 2022

Here's a new (but likely related) failure mode:

windows-arm64-11 at a41e37f56a4fc2523ac88a76bf54ba3e45dcf533
…
Building Go cmd/dist using C:\workdir\go1.4
go tool compile: fork/exec C:\workdir\go1.4\pkg\tool\windows_arm64\compile.exe: The paging file is too small for this operation to complete.

greplogs -l -e 'The paging file is too small' --since=2022-01-01
2022-05-03T12:34:17-a41e37f/windows-arm64-11
2022-04-24T01:22:21-86c51ed-96c8cc7/windows-arm64-11
2022-04-18T12:04:50-8db23f8-91b9915/windows-arm64-11
2022-04-15T15:57:52-2c73f5f/windows-arm64-11
2022-03-21T13:26:21-86b02b3-7eaad60/windows-arm64-11
2022-03-21T13:26:21-86b02b3-4aa1efe/windows-arm64-11
2022-03-14T09:19:01-b769efc-ab0f761/windows-arm64-11
2022-02-17T17:37:03-fd59bdf-eaf0405/windows-arm64-11
2022-02-15T14:54:27-76bd8ea/windows-arm64-11
2022-02-10T15:16:21-f4118a5-656d3f4/windows-arm64-11
2022-02-04T14:02:15-25d2ab2-4afcc9f/windows-arm64-11
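Several entries in this list also appear in the earlier out-of-memory lists (for example, both 2022-03-21 logs), so a single log can carry more than one failure signature. A minimal sketch for tagging a log with every signature it contains, using the three strings searched for in this thread; the labels, the `failureModes` helper, and the sample log are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// signatures pairs a made-up short label with the log text that identifies
// it. The three texts are the ones used in the greplogs queries in this thread.
var signatures = []struct{ label, text string }{
	{"runtime-oom", "fatal error: out of memory"},
	{"cannot-allocate", "cannot allocate memory"},
	{"paging-file", "The paging file is too small"},
}

// failureModes returns the label of every signature found in log, in the
// order the signatures are declared. It returns nil when none match.
func failureModes(log string) []string {
	var modes []string
	for _, sig := range signatures {
		if strings.Contains(log, sig.text) {
			modes = append(modes, sig.label)
		}
	}
	return modes
}

func main() {
	// Invented excerpt combining two signatures; not a real builder log.
	log := "go tool compile: fork/exec compile.exe: The paging file is too small for this operation to complete.\n" +
		"fatal error: out of memory\n"
	fmt.Println(failureModes(log))
}
```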

@bcmills
Member Author

bcmills commented May 3, 2022

The above failure mode suggests that there is a problem with the builder itself, not (just) #52433, since that failure occurred during bootstrapping using the old and venerable go1.4 toolchain.

@bcmills
Member Author

bcmills commented May 16, 2022

greplogs -l -e '(?ms)\Awindows-arm64.*^fatal error: .*(?:out of memory|cannot allocate memory)' --since=2022-05-10
2022-05-16T06:55:54-86c51ed-9956996/windows-arm64-11
2022-05-14T23:57:43-335569b/windows-arm64-11
2022-05-14T23:57:43-0976fa6-335569b/windows-arm64-11
2022-05-14T15:26:27-19156a5/windows-arm64-11

@gopherbot

gopherbot commented May 26, 2022

Change https://go.dev/cl/408702 mentions this issue: dashboard: add known issue for windows-arm64-11

gopherbot pushed a commit to golang/build that referenced this issue May 26, 2022
For golang/go#52653.
Updates golang/go#51019.

Change-Id: Ie57f7b2c2b6d4c3cc4b5f5f886773dff2a36a61e
Reviewed-on: https://go-review.googlesource.com/c/build/+/408702
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Alex Rakoczy <alex@golang.org>
@bcmills
Member Author

bcmills commented Jun 23, 2022

@qmuntal, this is one of the issues I think should block promoting windows/arm64 to a first class port.

@bcmills
Member Author

bcmills commented Jun 23, 2022

(You can find other windows/arm64 bugs here:
https://github.com/golang/go/issues?q=is%3Aissue+is%3Aopen++in%3Atitle+label%3Aarch-arm64+label%3Aos-windows

However, it's not clear to me which of those should be blockers to making windows/arm64 a first-class port; most of them have been mitigated by adding skips to the affected tests.)

@qmuntal
Contributor

qmuntal commented Jun 27, 2022

Tracking the new Microsoft-provided windows/arm64 builder in #53541, which will hopefully fix this issue.

@dmitshur
Contributor

dmitshur commented Jul 21, 2022

It seems the "fatal error: out of memory" problem hasn't been reported here for a while. One possible explanation is that we've updated the host OS version where the qemu emulator is running (one of the -11 builders was on a particularly old macOS 11.0, which lines up with #51019 (comment)).

I'll close this issue optimistically; please feel free to reopen otherwise.

@gopherbot

gopherbot commented Jul 21, 2022

Change https://go.dev/cl/418940 mentions this issue: dashboard: remove known issue for windows-arm64-11 builder

@dmitshur
Contributor

dmitshur commented Jul 21, 2022

The first page of the build dashboard shows 4 windows-arm64-11 failures that are all memory-related, so I'm reopening.

greplogs since 2022-06-08

greplogs --dashboard -md -l -e '(?ms)\Awindows-arm64.*^fatal error: .*(?:out of memory|cannot allocate memory)' --since=2022-06-08
2022-07-20T23:32:27-244c8b0/windows-arm64-11
2022-07-20T23:29:03-df38614/windows-arm64-11
2022-07-20T16:45:46-bb1749b/windows-arm64-11
2022-07-19T14:08:07-c8730f7-ae7340a/windows-arm64-11
2022-07-18T15:58:37-c0c1bbd/windows-arm64-11
2022-07-18T12:16:49-f839522-88a06f4/windows-arm64-11
2022-07-18T12:16:49-f839522-2aa473c/windows-arm64-11
2022-07-13T15:21:36-c27b92c-88a06f4/windows-arm64-11
2022-07-05T12:57:42-84e091e/windows-arm64-11
2022-07-04T14:31:39-ceda93e/windows-arm64-11
2022-07-01T14:13:36-93bf1fc-c847a2c/windows-arm64-11
2022-07-01T13:38:07-79fefdf-c847a2c/windows-arm64-11
2022-07-01T13:37:03-fa4babc-c847a2c/windows-arm64-11
2022-06-29T15:00:52-bd1783e/windows-arm64-11
2022-06-28T13:00:32-751cae8/windows-arm64-11
2022-06-25T19:07:01-4f45ec5/windows-arm64-11
2022-06-24T13:47:25-2994e99-d38f1d1/windows-arm64-11
2022-06-17T15:02:55-694bf12-6c25ba6/windows-arm64-11
2022-06-16T15:42:24-041035c-ecc268a/windows-arm64-11
2022-06-16T14:42:12-d097bc9-ecc268a/windows-arm64-11
2022-06-13T15:38:06-d27128b/windows-arm64-11
2022-06-09T15:15:48-840e99e/windows-arm64-11
2022-06-09T15:15:48-0fc6e7c-840e99e/windows-arm64-11
2022-06-09T15:02:08-1d19788-1a2ca95/windows-arm64-11
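To see how the failures are distributed over time, the entries above can be parsed and tallied. The sketch below assumes every entry has the shape timestamp, one or two hyphen-separated hashes, a slash, and the builder name, as seen in the list; it does not interpret what the hashes refer to. The `logName` type and `parseLogName` helper are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// logName is a hypothetical parsed form of one greplogs result line.
type logName struct {
	When    time.Time
	Hashes  []string // one hash, or two for entries like c8730f7-ae7340a
	Builder string
}

const stampLayout = "2006-01-02T15:04:05"

// parseLogName splits an entry such as
// "2022-07-19T14:08:07-c8730f7-ae7340a/windows-arm64-11".
func parseLogName(s string) (logName, error) {
	rev, builder, ok := strings.Cut(s, "/")
	if !ok {
		return logName{}, fmt.Errorf("no builder in %q", s)
	}
	n := len(stampLayout)
	if len(rev) < n+2 || rev[n] != '-' {
		return logName{}, fmt.Errorf("malformed revision part in %q", s)
	}
	when, err := time.Parse(stampLayout, rev[:n])
	if err != nil {
		return logName{}, err
	}
	return logName{When: when, Hashes: strings.Split(rev[n+1:], "-"), Builder: builder}, nil
}

func main() {
	// A few entries copied from the list above.
	names := []string{
		"2022-07-20T23:32:27-244c8b0/windows-arm64-11",
		"2022-07-20T23:29:03-df38614/windows-arm64-11",
		"2022-07-19T14:08:07-c8730f7-ae7340a/windows-arm64-11",
	}
	perDay := map[string]int{}
	for _, name := range names {
		ln, err := parseLogName(name)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		perDay[ln.When.Format("2006-01-02")]++
	}
	fmt.Println(perDay) // failures per calendar day
}
```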

@dmitshur dmitshur reopened this Jul 21, 2022

5 participants