os/exec: fatal error - unlock of unlocked lock - on arm5 platform #13615
Comments
Hm, my first guess would be an unaligned 64-bit atomic access, but sync.Once (as used by os/exec and syscall.ForkLock) seems to contain only 32-bit values. WaitGroup does 64-bit loads (as seen in the systemstack trace), but I don't see any use of WaitGroup in StartProcess or os/exec. What hardware is this? The distro and kernel version might also be relevant. |
Thank you for your prompt reply. Per your query: /proc/cpuinfo
/proc/version
|
Please set the environment variable GOTRACEBACK=2, run the program, and attach the resulting stack dump. Thanks. |
Can you check /proc/cpu/alignment (sorry, from memory) and make sure there
|
After setting GOTRACEBACK=2, several test runs were made and all failed consistently with one of two error messages: the "unlock of unlocked lock" error that I initially reported, and a less common "unexpected signal during runtime execution".
|
After a clean boot of the system, the below cpu/alignment report was printed. After several test runs that all resulted in failure, the alignment report still reported zero errors (no change). I modified the User fault setting per your suggestion to "5" (report bus error) and ran the above reported tests under that setting.
cat /proc/cpu/alignment
|
Followup: After several dozen test runs that all ended in panic, one test run was encountered that resulted in an alignment error.
cat /proc/cpu/alignment
|
Thanks for confirming, it's good to know that alignment errors haven't crept back into code generation.
|
It still sounds like memory corruption bugs.
The address given to atomicload64 is aligned correctly.
|
The test code was run on another hardware platform to confirm that it is not a hardware problem. The code fails in the same manner on alternate hardware. |
Do you have gdb (or gdbserver) on the platform?
I'd like to catch the SIGBUS under gdb and see
exactly which instruction triggers it and the address
that triggers it. The address in the provided stack
trace is perfectly aligned, and the kernel complains
that the faulting address is 0, which doesn't make
sense. Either the runtime systemstack mechanism
is subtly wrong or something else is happening
here.
If you don't have gdb, at least run the program with
GC disabled and see if that helps. (set GOGC=off)
Also note that the kernel version (2.6.33) is a little
old. If the unlock of unlocked mutex panic happens
very frequently, it might well be a kernel futex bug.
|
Testing with gdb apparently causes enough of an environment change that the “unexpected signal” error does not occur easily or at all. It is easy to reproduce the “unlock of unlocked lock” error, but gdb does not provide any additional debugging information about the failure (unless I am missing something). Looking into getting a more current version of embedded Linux to test if that might be the source of the problem. However the platform has considerable code written and tested to 2.6.33 and upgrading may not be an option regardless. Will advise shortly. |
Am unable to upgrade the Linux version (2.6.33-rc4) on the target platform due to legacy issues. The same code when run on other Linux versions (different platform) works flawlessly. |
A fatal error is consistently being generated by the exec.Output() function on the arm platform. The function is being used to call a python script that returns a single line of text.
The function will intermittently fail after 5 to 200 calls with an "unlock of unlocked lock" error.
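The reporter's reduced code is not reproduced in this copy of the thread. As an illustration only of the pattern described (calling `Output()` on an `exec.Cmd` in a loop and reading one line of text), a hypothetical sketch might look like the following. The helper names, the iteration count, and the use of `echo` as a stand-in for the Python script are all assumptions, not the reporter's code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runOnce runs the command and returns its trimmed standard output.
func runOnce(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

// runRepeatedly calls runOnce up to n times and returns how many calls
// completed before the first error.
func runRepeatedly(n int, name string, args ...string) (int, error) {
	for i := 0; i < n; i++ {
		if _, err := runOnce(name, args...); err != nil {
			return i, err
		}
	}
	return n, nil
}

func main() {
	// "echo" stands in for the Python script, which is not shown here.
	done, err := runRepeatedly(300, "echo", "hello")
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed after %d calls: %v\n", done, err)
		os.Exit(1)
	}
	fmt.Printf("completed %d calls\n", done)
}
```

Note that a runtime fatal error such as "unlock of unlocked lock" is not returned as an `error`; it terminates the process, so a loop like this would simply crash partway through. For an ARMv5 target, a cross-build would typically set `GOOS=linux GOARCH=arm GOARM=5` before `go build`.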
Using golang version 1.5.2.
The executable was cross-compiled on a Linux 32-bit VM for arm5. I have reduced the failure to the segment of code below.
Any suggestions would be much appreciated.
The error output: