runtime: SIGILL on aix-ppc64 #44706

Closed
gbbr opened this issue Mar 1, 2021 · 14 comments

Comments

@gbbr
Member

@gbbr gbbr commented Mar 1, 2021

We are getting a SIGILL at startup in one of the Datadog Agent binaries, when goenvs() runs in the runtime:

$ gdb ./datadog-agent/trace-agent core.20971810.25160320 
GNU gdb (GDB) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-ibm-aix6.1.0.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./datadog-agent/trace-agent...done.

warning: core file may not match specified executable file.
Core was generated by `trace-agent'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x000000010004d034 in runtime.goenvs_unix () at /opt/freeware/lib/golang/src/runtime/runtime1.go:85
85              envs = make([]string, n)
(gdb) where
#0  0x000000010004d034 in runtime.goenvs_unix () at /opt/freeware/lib/golang/src/runtime/runtime1.go:85
#1  0x000000010006dacc in runtime.ensureSigM.func1 () at /opt/freeware/lib/golang/src/runtime/signal_unix.go:875
#2  0x0000d431e0ddf00d in ?? ()
(gdb) quit

There is a TODO there which is concerning:
https://github.com/golang/go/blob/go1.14/src/runtime/runtime1.go#L85. Could it be related?

Is there any more information I could provide? Would appreciate some help debugging this. Apologies for the lack of information, but I do not have access to the machine which reproduces the problem.

@gbbr gbbr added the OS-AIX label Mar 1, 2021
@ALTree
Member

@ALTree ALTree commented Mar 1, 2021

> Is there any more information I could provide?

Please answer a couple of the issue template questions that you deleted:

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Clear steps to reproduce the issue could be useful, too.

@gbbr
Member Author

@gbbr gbbr commented Mar 1, 2021

Thanks for the prompt reply. Please bear with me while I try to obtain this information.

@laboger
Contributor

@laboger laboger commented Mar 1, 2021

Also, when you are in gdb and stopped at the SIGILL, can you do:

x/i $pc
info reg $lr
info reg $ctr
@laboger
Contributor

@laboger laboger commented Mar 2, 2021

@Helflym Are you seeing this problem too?

@gbbr
Member Author

@gbbr gbbr commented Mar 2, 2021

I've lost access to my AIX VM and am trying to get it back to see if I can reproduce this with the latest Go. The above problem was from a binary built with go1.14. I'll be back with more details ASAP.

Thanks a lot for jumping in 🙏

@Helflym
Contributor

@Helflym Helflym commented Mar 3, 2021

No, I don't remember having already seen anything like this.
I'll take a deeper look tomorrow (I can't today).

@gbbr
Member Author

@gbbr gbbr commented Mar 3, 2021

I have now managed to reproduce this issue on my own, and I have the binary and the core dump available. It happens with go1.16 too.

@laboger here is the response to the commands you've requested (core dump from binary compiled with go1.16):

Program terminated with signal SIGILL, Illegal instruction.
#0  0x000000010004e904 in runtime.check () at /usr/local/go/src/runtime/runtime1.go:239
239     /usr/local/go/src/runtime/runtime1.go: A file or directory in the path name does not exist..
(gdb) where
#0  0x000000010004e904 in runtime.check () at /usr/local/go/src/runtime/runtime1.go:239
#1  0x0000000100072dbc in runtime.rt0_go () at /usr/local/go/src/runtime/asm_ppc64x.s:82
#2  0xbadc0ffee0ddf00d in ?? ()
(gdb) x/i $pc
=> 0x10004e904 <runtime.check+468>:     lbarx   r31,0,r6
(gdb) info reg $lr
lr             0x100072dbc      0x100072dbc <runtime.rt0_go+172>
(gdb) info reg $ctr
ctr            0x100072d10      4295437584
(gdb) 

Note that this time it breaks in a different place.

Any ideas for next steps?

@randall77
Contributor

@randall77 randall77 commented Mar 3, 2021

That is an atomic byte instruction from POWER8.
Are you sure your chip supports it? https://github.com/golang/go/wiki/MinimumRequirements#ppc64-big-endian

@gbbr
Member Author

@gbbr gbbr commented Mar 3, 2021

> That is an atomic byte instruction from POWER8.
> Are you sure your chip supports it?

My bad. The machine I logged into said "POWER8" in the title, but checking prtconf right now I see it's POWER7. I wonder if that's also the cause of the problem at the start of this thread. I'm going to look into it and, if that's the case, close the issue.

@gbbr
Member Author

@gbbr gbbr commented Mar 3, 2021

It turns out the processor was Power7. Closing. Sorry for the misunderstanding.

@gbbr gbbr closed this Mar 3, 2021
@gbbr
Member Author

@gbbr gbbr commented Mar 9, 2021

@randall77 I hope it's OK to ping you here. I have a question and I don't want to clutter the tracker with yet another open issue; perhaps you have a quick answer for me. Would machines like these be supported:

Server 1:
Processor Implementation Mode: POWER 7
Processor Type: PowerPC_POWER8

Server 2:
Processor Implementation Mode: POWER 7
Processor Type: PowerPC_POWER9

These are two example machines whose processor types are POWER8 and POWER9 but whose implementation mode is POWER7.

@Helflym
Contributor

@Helflym Helflym commented Mar 9, 2021

@gbbr if I remember correctly, in that processor mode the machine behaves exactly as if the underlying CPU were a POWER7. Thus, it won't be supported.

@randall77
Contributor

@randall77 randall77 commented Mar 9, 2021

I have no idea. Wouldn't be hard for you to test it, I suppose.

@laboger
Contributor

@laboger laboger commented Mar 24, 2021

@gbbr Sorry for the late response. lbarx is new in POWER8, so it won't run on POWER7. Is there a reason these machines are running in POWER7 mode on POWER8 and POWER9 hardware? It seems like that could be easily fixed with the appropriate setting.
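
For anyone landing here with the same symptom: the thread suggests that the processor implementation mode, not the processor type, decides whether POWER8 instructions such as lbarx are available. Below is a minimal sketch of a check against captured prtconf text. It assumes the relevant line looks like `Processor Implementation Mode: POWER 7`, as quoted above; the names `implementationMode` and `minPowerGen` are hypothetical and not part of Go or AIX. Since a Go binary may itself hit the SIGILL during runtime startup on such a machine, this would likely have to run on a supported host against saved prtconf output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minPowerGen is an assumption taken from this thread: lbarx needs POWER8.
const minPowerGen = 8

// implementationMode extracts the POWER generation from the
// "Processor Implementation Mode" line of prtconf-style text.
// It returns false if the line is missing or has no generation number.
func implementationMode(prtconf string) (int, bool) {
	const key = "Processor Implementation Mode:"
	for _, line := range strings.Split(prtconf, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, key) {
			continue
		}
		val := strings.TrimSpace(strings.TrimPrefix(line, key))
		// Accept both "POWER 7" and "POWER7".
		val = strings.ReplaceAll(strings.ToUpper(val), " ", "")
		val = strings.TrimPrefix(val, "POWER")
		// Keep only the leading digits, ignoring any trailing text.
		i := 0
		for i < len(val) && val[i] >= '0' && val[i] <= '9' {
			i++
		}
		gen, err := strconv.Atoi(val[:i])
		if err != nil {
			return 0, false
		}
		return gen, true
	}
	return 0, false
}

func main() {
	// Sample text modelled on "Server 1" from the comment above.
	sample := "Processor Type: PowerPC_POWER8\n" +
		"Processor Implementation Mode: POWER 7\n"

	gen, ok := implementationMode(sample)
	switch {
	case !ok:
		fmt.Println("implementation mode not found")
	case gen < minPowerGen:
		fmt.Printf("mode POWER%d is below POWER%d\n", gen, minPowerGen)
	default:
		fmt.Printf("mode POWER%d meets the POWER%d minimum\n", gen, minPowerGen)
	}
}
```

The check deliberately ignores the `Processor Type` line, because in the case reported here the type said POWER8 while the mode was POWER7.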
