
cmd/compile: illegal Instruction on POWER8 with go 1.13 #38909

Closed
trdyer opened this issue May 6, 2020 · 13 comments

@trdyer commented May 6, 2020

What version of Go are you using (go version)?

$ go version
1.13

Does this issue reproduce with the latest release?

Yes, it also occurs when built with Go 1.14.

What operating system and processor architecture are you using (go env)?

go env Output
[user@redacted ~]# uname -a
Linux redacted 3.10.0-327.4.4.el7.ppc64 #1 SMP Thu Dec 17 15:52:21 EST 2015 ppc64 ppc64 ppc64 GNU/Linux
[user@redacted ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[user@redacted ~]# lscpu
Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          1
Model:                 IBM,8231-E1C
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
[root@pbul-rhel7-ppc64-01 ~]#

What did you do?

I built a minimal Go binary on macOS Catalina, cross-compiling for linux/ppc64, using https://github.com/trdyer/go-test-ppc64

My test platforms are: linux/amd64 solaris/amd64 darwin/amd64 linux/s390x aix/ppc64 linux/ppc64 linux/ppc64le

I copied the binary to the remote RHEL 7/ppc64 server with scp and ran it.

What did you expect to see?

The program should have printed "whats going on!".

What did you see instead?

Illegal Instruction

As far as I can tell, this is a POWER8 CPU, which should be supported.

@trdyer (author) commented May 6, 2020

I also ran it on a POWER8 server with a slightly different CPU, and there it ran correctly.

0 user@redacted /tmp # ./bt-test_linux_ppc64
whats going on!
0 user@redacted /tmp # uname -a
Linux redacted 3.10.0-327.10.1.el7.ppc64 #1 SMP Sat Jan 23 04:57:27 EST 2016 ppc64 ppc64 ppc64 GNU/Linux
0 user@redacted /tmp # cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
0 user@redacted /tmp # lscpu
Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          2
Model:                 IBM,8284-22A
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node2 CPU(s):
0 user@redacted /tmp #

@randall77 (contributor) commented May 6, 2020

@laboger
If you try it with tip, it should print the bytes for the instruction that caused the SIGILL. That might help.

@trdyer (author) commented May 6, 2020

@randall77 @laboger In my reproductions it seems to be related to the CPU model, but possibly also to the kernel version.

For instance, this RHEL 6 box, which has the same CPU model as my working RHEL 7 example, does not work:

[user@redacted ~]# lscpu
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 IBM,8284-22A
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-3
[user@redacted ~]# uname -a
Linux redacted 2.6.32-573.18.1.el6.ppc64 #1 SMP Wed Jan 6 11:15:06 EST 2016 ppc64 ppc64 ppc64 GNU/Linux

@laboger (contributor) commented May 7, 2020

This does not fail for me on the linux/ppc64 systems I have. I even tried cross-compiling with the latest go1.13 and moving the binary to a POWER8, and it worked fine. I am not familiar with the model you are using and don't have access to such a system.

Can you use gdb to find the bad instruction?
gdb ./bt-test_linux_ppc64
run
When it hits the SIGILL do:
x/i $pc

@laboger (contributor) commented May 7, 2020

The model shown in your go env output is:
Model: IBM,8231-E1C
which is a POWER7 system.

@laboger (contributor) commented May 7, 2020

Which RHEL 6 release do you have on your 8284-22A? According to https://www.ibm.com/support/knowledgecenter/linuxonibm/liaam/liaamdistros.html#liaamdistros__supportedpower8, it should be at least RHEL 6.5, or RHEL 7.

@trdyer (author) commented May 7, 2020

@laboger
I have RHEL 6.7 on the 8284-22A.

[user@rhel6-ppc64-01 tmp]# lscpu
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 IBM,8284-22A
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-3
[user@rhel6-ppc64-01 tmp]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)

With Go 1.13:

(gdb) run
Starting program: /tmp/bt-test_linux_ppc64

Program received signal SIGILL, Illegal instruction.
0x000000000004995c in runtime.check () at /usr/local/go/src/runtime/runtime1.go:238
238	/usr/local/go/src/runtime/runtime1.go: No such file or directory.
	in /usr/local/go/src/runtime/runtime1.go
(gdb) x/i $pc
=> 0x4995c :	lbarx   r31,0,r5
(gdb)

And with Go 1.14:

(gdb) run
Starting program: /tmp/bt-test_linux_ppc64

Program received signal SIGILL, Illegal instruction.
0x000000000005007c in runtime.check () at /usr/local/go/src/runtime/runtime1.go:238
238	/usr/local/go/src/runtime/runtime1.go: No such file or directory.
	in /usr/local/go/src/runtime/runtime1.go
(gdb) x/i $pc
=> 0x5007c :	lbarx   r31,0,r5
(gdb)
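
For reference: since Go 1.9, GOARCH=ppc64 requires at least POWER8, so the runtime may use load-and-reserve forms such as `lbarx` that, as this thread shows, fault when the CPU runs as a POWER7. `lbarx` is an X-form instruction with primary opcode 31 and extended opcode 52. The sketch below decodes a raw big-endian instruction word (for example four bytes read with `x/4xb $pc` in gdb) and names the load-and-reserve variants. `decodeLarx` is a hypothetical helper, not part of Go or gdb, and it only recognizes these four opcodes. The word `0x7fe02868` used here was derived by hand from `lbarx r31,0,r5` with the EH hint bit clear; it was not read from the reporter's binary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// larxNames maps the X-form extended opcode (bits 21-30) of primary
// opcode 31 to the load-and-reserve mnemonic it encodes.
var larxNames = map[uint32]string{
	20:  "lwarx",
	52:  "lbarx",
	84:  "ldarx",
	116: "lharx",
}

// decodeLarx (hypothetical helper) decodes a 32-bit Power ISA instruction
// word and returns a gdb-style disassembly if it is one of the
// load-and-reserve instructions above. The EH hint bit (bit 31) is ignored.
func decodeLarx(word uint32) (string, bool) {
	if word>>26 != 31 { // primary opcode, bits 0-5
		return "", false
	}
	name, ok := larxNames[(word>>1)&0x3ff] // extended opcode, bits 21-30
	if !ok {
		return "", false
	}
	rt := (word >> 21) & 0x1f // target register, bits 6-10
	ra := (word >> 16) & 0x1f // base register, bits 11-15
	rb := (word >> 11) & 0x1f // index register, bits 16-20
	raStr := "0"              // RA=0 means "no base register"; gdb prints it as 0
	if ra != 0 {
		raStr = fmt.Sprintf("r%d", ra)
	}
	return fmt.Sprintf("%s r%d,%s,r%d", name, rt, raStr, rb), true
}

func main() {
	// ppc64 is big-endian: the most significant byte of the word comes first.
	raw := []byte{0x7f, 0xe0, 0x28, 0x68}
	if asm, ok := decodeLarx(binary.BigEndian.Uint32(raw)); ok {
		fmt.Println(asm) // lbarx r31,0,r5
	}
}
```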

@laboger (contributor) commented May 7, 2020

Can you provide the full output when it gets the SIGILL (not using gdb)?

@laboger (contributor) commented May 7, 2020

Or in gdb after you hit the SIGILL:
p $_siginfo.si_code
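
On Linux, the `si_code` of a SIGILL tells you why the trap happened. An instruction the CPU does not implement would typically be reported as `ILL_ILLOPC` (illegal opcode), while a privileged instruction would show up as `ILL_PRVOPC`. The small table below lists the Linux values from `<asm-generic/siginfo.h>`; `illCodeName` is a hypothetical helper for reading the number gdb prints, not a standard-library function.

```go
package main

import "fmt"

// illCodes lists the SIGILL si_code values defined for Linux in
// <asm-generic/siginfo.h>.
var illCodes = map[int32]string{
	1: "ILL_ILLOPC", // illegal opcode
	2: "ILL_ILLOPN", // illegal operand
	3: "ILL_ILLADR", // illegal addressing mode
	4: "ILL_ILLTRP", // illegal trap
	5: "ILL_PRVOPC", // privileged opcode
	6: "ILL_PRVREG", // privileged register
	7: "ILL_COPROC", // coprocessor error
	8: "ILL_BADSTK", // internal stack error
}

// illCodeName (hypothetical helper) turns the value printed by
// `p $_siginfo.si_code` into its symbolic name.
func illCodeName(code int32) string {
	if name, ok := illCodes[code]; ok {
		return name
	}
	return fmt.Sprintf("unknown SIGILL si_code %d", code)
}

func main() {
	fmt.Println(illCodeName(1)) // ILL_ILLOPC
}
```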

@trdyer (author) commented May 7, 2020

[user@rhel6-ppc64-01 tmp]# ./bt-test_linux_ppc64
Illegal instruction
[user@rhel6-ppc64-01 tmp]# gdb ./bt-test_linux_ppc64
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /tmp/bt-test_linux_ppc64...done.
warning: Missing auto-load scripts referenced in section .debug_gdb_scripts
of file /tmp/bt-test_linux_ppc64
Use `info auto-load python [REGEXP]' to list them.
(gdb) run
Starting program: /tmp/bt-test_linux_ppc64

Program received signal SIGILL, Illegal instruction.
0x000000000004995c in runtime.check () at /usr/local/go/src/runtime/runtime1.go:238
238	/usr/local/go/src/runtime/runtime1.go: No such file or directory.
	in /usr/local/go/src/runtime/runtime1.go
(gdb) p $_siginfo.si_code
Attempt to extract a component of a value that is not a structure.
(gdb)

@laboger (contributor) commented May 7, 2020

Can you show the contents of /proc/cpuinfo on your RHEL 6 system?
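
When the kernel runs the processor in a compatibility mode, the `cpu` field of /proc/cpuinfo names the compatibility level with an `(architected)` suffix, while the `revision` line still carries the physical processor version register (PVR). In the output later in this thread, `pvr 004b` is, as far as I know, a POWER8 PVR. Because any Go linux/ppc64 binary faults in `runtime.check` during startup on such a host, a check like this has to run elsewhere on captured text. The sketch below assumes the `key<TAB>: value` layout shown in this thread; `inspectCPUInfo` and `cpuSummary` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cpuSummary holds the fields of interest from the first processor entry.
type cpuSummary struct {
	CPU         string // e.g. "POWER7 (architected), altivec supported"
	Revision    string // e.g. "2.1 (pvr 004b 0201)"
	Architected bool   // true if the cpu field carries "(architected)"
}

// inspectCPUInfo (hypothetical helper) parses /proc/cpuinfo text and
// returns the first cpu and revision values it finds.
func inspectCPUInfo(text string) cpuSummary {
	var s cpuSummary
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, ":")
		if i < 0 {
			continue // blank separator lines between processors
		}
		key := strings.TrimSpace(line[:i])
		val := strings.TrimSpace(line[i+1:])
		switch {
		case key == "cpu" && s.CPU == "":
			s.CPU = val
			s.Architected = strings.Contains(val, "(architected)")
		case key == "revision" && s.Revision == "":
			s.Revision = val
		}
	}
	return s
}

func main() {
	sample := "processor\t: 0\n" +
		"cpu\t\t: POWER7 (architected), altivec supported\n" +
		"clock\t\t: 3891.000000MHz\n" +
		"revision\t: 2.1 (pvr 004b 0201)\n"
	summary := inspectCPUInfo(sample)
	fmt.Println(summary.Architected, summary.Revision) // true 2.1 (pvr 004b 0201)
}
```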

@laboger (contributor) commented May 7, 2020

My suspicion is that your RHEL 6.7 kernel is set up to run in POWER7 compatibility mode. Please also run:
LD_SHOW_AUXV=1 /bin/true | grep _PLATFORM
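
`LD_SHOW_AUXV=1` makes the dynamic loader print the process's auxiliary vector. On PowerPC, `AT_PLATFORM` is the platform the kernel presents to user space and `AT_BASE_PLATFORM` identifies the real processor, so a mismatch (such as `power7` versus `power8`, as in the output later in this thread) indicates compatibility mode. The sketch below parses that text. `parseAuxvPlatforms` and `inCompatMode` are hypothetical helpers, and the `AT_NAME: value` line format is assumed from the output shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAuxvPlatforms (hypothetical helper) extracts AT_PLATFORM and
// AT_BASE_PLATFORM from LD_SHOW_AUXV=1 output.
func parseAuxvPlatforms(text string) (platform, base string) {
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, ":")
		if i < 0 {
			continue
		}
		val := strings.TrimSpace(line[i+1:])
		switch strings.TrimSpace(line[:i]) {
		case "AT_PLATFORM":
			platform = val
		case "AT_BASE_PLATFORM":
			base = val
		}
	}
	return platform, base
}

// inCompatMode reports whether the kernel presents a different platform
// than the physical one. An empty base means the entry was not present.
func inCompatMode(platform, base string) bool {
	return base != "" && platform != base
}

func main() {
	out := "AT_PLATFORM:     power7\nAT_BASE_PLATFORM:power8\n"
	p, b := parseAuxvPlatforms(out)
	fmt.Println(p, b, inCompatMode(p, b)) // power7 power8 true
}
```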

@ALTree changed the title from "Illegal Instruction on POWER8 with go 1.13" to "cmd/compile: illegal Instruction on POWER8 with go 1.13" May 8, 2020
@ALTree added this to the Go1.15 milestone May 8, 2020

@trdyer (author) commented May 8, 2020

Yup, that looks like it.

Sorry for wasting your time, but thank you for your help!

[user@rhel6-ppc64-01 ~]# cat /proc/cpuinfo
processor	: 0
cpu		: POWER7 (architected), altivec supported
clock		: 3891.000000MHz
revision	: 2.1 (pvr 004b 0201)

processor	: 1
cpu		: POWER7 (architected), altivec supported
clock		: 3891.000000MHz
revision	: 2.1 (pvr 004b 0201)

processor	: 2
cpu		: POWER7 (architected), altivec supported
clock		: 3891.000000MHz
revision	: 2.1 (pvr 004b 0201)

processor	: 3
cpu		: POWER7 (architected), altivec supported
clock		: 3891.000000MHz
revision	: 2.1 (pvr 004b 0201)

timebase	: 512000000
platform	: pSeries
model		: IBM,8284-22A
machine		: CHRP IBM,8284-22A
[user@rhel6-ppc64-01 ~]# LD_SHOW_AUXV=1 /bin/true | grep _PLATFORM
AT_PLATFORM:     power7
AT_BASE_PLATFORM:power8
@trdyer closed this May 8, 2020
4 participants