runtime,cmd/compile: exit status 0xc0000374 (STATUS_HEAP_CORRUPTION) on windows-amd64-longtest #52647
Comments
Re-running the scan due to the possibility of failures masked by #52591:
That's two matching failures within the Go 1.19 cycle (and even within the past couple weeks!) on (CC @golang/windows)
One more:
Three days of continuous testing on 25 windows gomotes has gotten me zero of these failures, so I suspect I am missing some required component of the failure.
None on the dashboard for the past week or so, although that's somewhat to be expected with the CL rate decreasing from the freeze.
(0 matching logs) Note that this has only been observed in the Maybe it has something to do with the shape of the machine? IIRC the (Or maybe it's some sort of crosstalk between tests and builds somehow? But that seems even more weird.)
Still no new cases since 2022-05-10. I ran another set of 25 builders over the weekend, this time creating a new VM for each test run. Nothing. I am inclined to close this and reopen if anyone discovers new cases.
Since the freeze started 2022-05-07 and the rate of CLs (and thus dashboard test runs) is much lower during the freeze, it's not surprising and not necessarily meaningful to have fewer (or no) failures during that interval, and looking at the Running I would be more comfortable closing out this issue if we have a plausible (even if unconfirmed) theory for how it could have been fixed by a code or configuration change since the last failure.
Fair enough, reopened. However, beyond simply waiting for builders, I'm out of ideas for trying to reproduce this. Perhaps someone on @golang/windows has more context about this error and what may trigger it (I've been assuming memory corruption in the C allocator)?
Looking for common factors in the
That suggests that the
The staleness check for I believe that that function runs once per
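As a point of reference (this is an illustrative sketch, not the actual cmd/go code in (*Builder).toolID), the once-per-invocation pattern being described might look roughly like this: run the tool with -V=full the first time its ID is needed and cache the answer for the rest of the process. The names toolID and toolIDCache below are assumptions made for the example.

```go
// Illustrative sketch only; the real logic lives in cmd/go and differs in detail.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"sync"
)

var (
	toolIDMu    sync.Mutex
	toolIDCache = map[string]string{} // tool name -> reported version string
)

// toolID runs "go tool <name> -V=full" at most once per process and caches
// the trimmed output for subsequent callers.
func toolID(name string) (string, error) {
	toolIDMu.Lock()
	defer toolIDMu.Unlock()
	if id, ok := toolIDCache[name]; ok {
		return id, nil
	}
	out, err := exec.Command("go", "tool", name, "-V=full").Output()
	if err != nil {
		return "", fmt.Errorf("go tool %s -V=full: %v", name, err)
	}
	id := strings.TrimSpace(string(out))
	toolIDCache[name] = id
	return id, nil
}

func main() {
	id, err := toolID("compile")
	fmt.Println(id, err)
}
```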
The But I don't know how that line could possibly be executed as part of
One thing I haven't tried is testing at exactly one of the commits that previously failed. To that end, I'll test at f0c0e0f (commit from the 2022-04-27 failure). I've instrumented
I still can't figure out how the The only (https://cs.opensource.google/search?q=%22%5C%22tool%5C%22,%20%5C%22compile%5C%22%22)
I modified
Change https://go.dev/cl/412774 mentions this issue: |
To match the dashboard logs, I'm looking for an unindented
So far I'm not able to reproduce any exact match running any
Ok, this is very weird. When
That leaves open several possibilities, but all of them are weird. Some that I can think of:
Note that in the most recent failure, https://build.golang.org/log/7f17e01b2f0dad08c7da217e4bbe56dd2cf35a6b, the failure is
This differs from the first two in that it is checking the staleness of
Ah, sorry, we also call the
OK, and there is a case where we can see
OK, I see where the This is invoked while running the The
It's perhaps worth noting that it appears that although there is a cache to avoid running
Change https://go.dev/cl/412954 mentions this issue: |
…uilder).toolID

For #52647.

Change-Id: Ic12123769d339c2df677500ed59f15a4ee5037d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/412954
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
I wrote a simple program that ran four instances of No idea what is happening here.
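For reference, a stress harness along the lines described above might look like the sketch below. The command being run, the instance count, and the stop-on-first-failure policy are placeholders, since the comment doesn't spell them out.

```go
// Rough sketch of a stress harness: run N instances of a command
// concurrently, repeatedly, and stop on the first failure.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

func main() {
	const instances = 4
	for iter := 0; ; iter++ {
		var wg sync.WaitGroup
		errs := make(chan error, instances)
		for i := 0; i < instances; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				// Placeholder command, not the one from the original experiment.
				cmd := exec.Command("go", "tool", "compile", "-V=full")
				if out, err := cmd.CombinedOutput(); err != nil {
					errs <- fmt.Errorf("%v\n%s", err, out)
				}
			}()
		}
		wg.Wait()
		close(errs)
		for err := range errs {
			fmt.Fprintf(os.Stderr, "iteration %d: %v\n", iter, err)
			os.Exit(1)
		}
	}
}
```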
I still suspect some interaction with test sharding in Maybe some kind of race involving how the
I gave this some more thought this evening.
That suggests to me that the failure mode has something to do with the way we distribute the built Given that, I think it would be ok to slip this issue to Go 1.20 to collect more information about the failure rate during open development and to see whether we see the same failure mode under conditions that aren't so closely tied to buildlet sharding.
I agree with @bcmills that we have enough evidence at this point to drop release-blocker from this. Given that we're pretty sure we know what command is failing, is there anything we can do to gather more data from future failures? I'm worried we're just going to see logs like the ones we have, which I think we've tapped dry at this point.
I tweaked the error message in CL 412954 to hopefully confirm Ian's diagnosis of which command is failing, and audited that log line to make sure we're also printing the command's

Beyond that, I think it would help to get a core dump on failure, but I don't know of a way to dump core on the builders without overwhelming the Coordinator's output limits (compare #49165).

Coming from the opposite direction, would it make sense to have

Perhaps we should also audit the

Finally, I notice that the buildlet's
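To illustrate the kind of logging being discussed (a sketch under assumptions, not the actual CL 412954 change), a failure report that captures the full command line, the wrapped error with its exit status, and the tool's stderr could look like:

```go
// Sketch of logging a failed child command with enough context to attribute
// the failure later: the command line, the error (which on Windows carries
// the exit status, e.g. 0xc0000374), and whatever the tool wrote to stderr.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

func runLogged(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("%s %v: %v\n%s", name, args, err, stderr.Bytes())
	}
	return nil
}

func main() {
	// Placeholder invocation for illustration only.
	if err := runLogged("go", "tool", "compile", "-V=full"); err != nil {
		fmt.Println(err)
	}
}
```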
Confirmed that the failure is indeed during
An intriguing clue (2022-08-23T03:09:07-0a52d80/windows-amd64-longtest):
No failures since the last one @bcmills reported.
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
According to https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55, this exit code means STATUS_HEAP_CORRUPTION: "A heap has been corrupted."
greplogs --dashboard -md -l -e \(\?ms\)\\Awindows-.\*0xc0000374
2022-04-27T14:23:28-f0c0e0f/windows-amd64-longtest
Since this has only been seen once, leaving on the backlog to see whether this is a recurring pattern or a one-off fluke.
(CC @golang/runtime)
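As a side note, here is a minimal sketch of how a parent Go process could recognize this status when a child exits with it on Windows. The command is a placeholder, and the constant is just the NTSTATUS value cited above.

```go
// Sketch: detect STATUS_HEAP_CORRUPTION (0xc0000374) in a child's exit status.
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

const statusHeapCorruption = 0xc0000374

func main() {
	// Placeholder command for illustration only.
	err := exec.Command("go", "tool", "compile", "-V=full").Run()
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		// On Windows the exit code of a crashed process is the raw NTSTATUS.
		if uint32(ee.ExitCode()) == statusHeapCorruption {
			fmt.Println("child exited with STATUS_HEAP_CORRUPTION")
		}
	}
}
```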