cmd/compile: "exec format error" when running newly-built binary on linux-loong64-3a5000 #53116

Closed
bcmills opened this issue May 27, 2022 · 11 comments
Labels
arch-loong64: Issues solely affecting the loongson architecture.
compiler/runtime: Issues related to the Go compiler and/or runtime.
FrozenDueToAge
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@bcmills
Contributor

bcmills commented May 27, 2022

#!watchflakes
post <- builder ~ `^linux-loong64` && `exec format error`
##### ../test
# go run run.go -- linkname.go
fork/exec /tmp/workdir-host-linux-loong64-3a5000/tmp/13900000/a.exe: exec format error

FAIL	linkname.go	0.262s

greplogs -l -e '(?ms)\Alinux-loong64.*exec format error'
2022-05-26T20:17:08-ec92580/linux-loong64-3a5000
2022-05-26T18:56:07-e6d8b05/linux-loong64-3a5000
2022-05-20T23:05:38-bc2c85a-c3470ca/linux-loong64-3a5000
2022-05-18T15:15:29-5e4e11f-bc2e961/linux-loong64-3a5000
2022-05-02T18:38:26-af99c20/linux-loong64-3a5000
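
For readers unfamiliar with the greplogs expression above, here is a minimal Go sketch of what the pattern matches. The sample log text and the `matchesFlake` helper are made up for illustration; only the regular expression itself is taken from the command above.

```go
package main

import (
	"fmt"
	"regexp"
)

// flakeRE is the expression passed to greplogs above. (?m) makes ^ and $
// match at line boundaries, (?s) lets . match newlines, and \A anchors the
// match at the very start of the log text.
var flakeRE = regexp.MustCompile(`(?ms)\Alinux-loong64.*exec format error`)

// matchesFlake is a hypothetical helper: it reports whether a log begins
// with "linux-loong64" and mentions "exec format error" anywhere after that.
func matchesFlake(log string) bool {
	return flakeRE.MatchString(log)
}

func main() {
	// Made-up log excerpts, not real builder output.
	loong := "linux-loong64-3a5000 at ec92580\n\nfork/exec a.exe: exec format error\n"
	other := "linux-amd64 at ec92580\n\nfork/exec a.exe: exec format error\n"
	fmt.Println(matchesFlake(loong), matchesFlake(other))
}
```

Because of the `\A` anchor, a log from another builder that happens to contain the same error text is not matched.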

(attn @golang/loong64)

bcmills added the NeedsInvestigation and arch-loong64 labels on May 27, 2022
@gopherbot

Change https://go.dev/cl/408938 mentions this issue: dashboard: add known issues for linux/loong64

gopherbot pushed a commit to golang/build that referenced this issue May 27, 2022
(This will cause 'greplogs --triage' to filter out this builder by
default.)

Updates golang/go#53116.

Change-Id: Ib238c641b83f6aec3d1fd75933b7d8593313da21
Reviewed-on: https://go-review.googlesource.com/c/build/+/408938
Reviewed-by: Alex Rakoczy <alex@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@xen0n
Member

xen0n commented May 28, 2022

I can't reproduce this locally either.

@golang/loong64, could you try reproducing it in a fresh environment (for example, a Gentoo chroot) with the latest kernel?

@limeidan
Contributor

This is an occasional problem; running all.bash can sometimes reproduce it locally, but the cause has not been found yet and needs further investigation.

@xen0n
Member

xen0n commented May 30, 2022

This is an occasional problem; running all.bash can sometimes reproduce it locally, but the cause has not been found yet and needs further investigation.

Can you try to somehow save a copy of the faulty executable next time this happens? Might be helpful for debugging.

@limeidan
Contributor

limeidan commented Jun 2, 2022

Can you try to somehow save a copy of the faulty executable next time this happens? Might be helpful for debugging.

FYI: we found that the file was malformed, having lost 16 KB of data at its beginning.

bad-bin-compile.tar.gz

@xen0n
Member

xen0n commented Jun 15, 2022

Update: the problem may have been pinpointed. It is likely due to some interesting race involving fallocate and/or mmap. The Loongson team should be working on this shortly, and I'll be helping.

@abner-chenc
Contributor

On some linux/loong64 machines, an "exec format error" occasionally occurs when running make.bash or all.bash. Inspecting the failing executables shows that the first 16K of data (exactly one page) is missing from the beginning of the file. The hexdump output is as follows:

0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0004000 0020 4c00 3d24 1c01 d084 02fc 2064 29c0
0004010 9004 02c0 4064 29c0 4800 576a 0000 0014
0004020 62d3 28c0 8e73 0012 1260 5c00 0025 0015

This problem is caused by the occasional loss of the first 16K of data in the file after the link process calls the syscall.Fallocate function for the second time. However, not every machine can reproduce this error (the file systems I tested include xfs, ext4, and tmpfs).

gopherbot added the compiler/runtime label on Jul 13, 2022
seankhliao added this to the Unplanned milestone on Aug 20, 2022
@gopherbot

Change https://go.dev/cl/445835 mentions this issue: cmd/link: preallocate new space at the end of the file on Linux
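
The thread does not show the contents of that change. Going only by its title, one way to read "preallocate new space at the end of the file" is that a file being grown has only the region past its current end preallocated, so the call never covers the already written first page. The sketch below illustrates that arithmetic; `tailRange` is a hypothetical helper and is not taken from cmd/link.

```go
package main

import "fmt"

// tailRange is a hypothetical helper. Given the current file size and the
// desired size, it returns the offset and length of the region that lies
// past the current end of the file. ok is false when no growth is needed.
func tailRange(curSize, newSize int64) (off, length int64, ok bool) {
	if newSize <= curSize {
		return 0, 0, false
	}
	return curSize, newSize - curSize, true
}

func main() {
	// Growing a 64 KiB file to 96 KiB: only bytes [65536, 98304) are new.
	// On Linux, a call such as syscall.Fallocate(fd, 0, off, length) with
	// these values would not cover the first page of the file.
	off, length, ok := tailRange(64<<10, 96<<10)
	fmt.Println(off, length, ok)
}
```

Whether the actual change computes the range this way is an assumption; the sketch only shows why limiting the call to the tail would avoid touching the start of the file.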

@bcmills
Contributor Author

bcmills commented Feb 3, 2023

No recent failures reported by watchflakes. @golang/loong64 — was this fixed on the builder, or by some mitigation in the compiler or linker, or is there more to do here?

@gopherbot

Change https://go.dev/cl/465156 mentions this issue: dashboard: unmark known-issues with low failure rates

@xen0n
Member

xen0n commented Feb 4, 2023

No recent failures reported by watchflakes. @golang/loong64 — was this fixed on the builder, or by some mitigation in the compiler or linker, or is there more to do here?

Hi, I've revisited this, and it does seem to have been fixed by the kernel developers in v6.1 (it was a folio regression across the board from 5.17 to 6.0, discovered by our investigation), so IMO this issue can be closed as completed. Thanks for the triage.

gopherbot pushed a commit to golang/build that referenced this issue Feb 4, 2023
I had initially added known issues fairly aggressively in order to use
them to reduce noise in 'greplogs -triage'. Now that we are using
'watchflakes' for triage, that noise reduction is no longer important
(the failures are already clustered to their respective known issues),
and having greyed-out cells on the dashboard makes new regressions too
easy to miss.

Concretely:

- golang/go#42212 is mostly specific to x/net at this point (as
  golang/go#57841)

- There have been no failures matching golang/go#51001 since October.

- golang/go#52724 has been so rare lately that we hadn't yet added a
  'watchflakes' pattern for it.

- There have been no failures matching golang/go#51443 since May.

- There have been no failures matching golang/go#53116 or
  golang/go#53093 since I enabled 'watchflakes' for the builder in
  December.

- The linux-amd64-perf builder seems to be passing consistently for
  x/benchmarks and x/tools, so there is no need to refer to
  golang/go#53538 to explain failures on it.

Change-Id: Ia16db2a23e5fa037a299f1f56fb26f1cf84521e1
Reviewed-on: https://go-review.googlesource.com/c/build/+/465156
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
golang locked and limited conversation to collaborators on Feb 4, 2024