Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal instruction (core dumped) when running coqtop on s390x #7405

Closed
vicuna opened this issue Nov 9, 2016 · 14 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link

commented Nov 9, 2016

Original bug ID: 7405
Reporter: Richard Jones
Status: resolved (set by @xavierleroy on 2016-11-11T18:47:57Z)
Resolution: fixed
Priority: normal
Severity: crash
Platform: s390x
OS: Fedora
OS Version: 26
Version: 4.04.0
Fixed in version: 4.05.0 +dev/beta1/beta2/beta3/rc1
Category: back end (clambda to assembly)
Monitored by: @gasche

Bug description

bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R theories Coq
Makefile.build:590: recipe for target 'theories/Init/Prelude.vo' failed
make[1]: *** [theories/Init/Prelude.vo] Segmentation fault (core dumped)

I'm afraid I don't have very much detail, but I am able to get
a core file.

Here is the stack trace according to gdb:

Core was generated by `bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R t'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000020080995e22 in ?? ()
[Current thread is 1 (Thread 0x2003db32870 (LWP 4231))]
(gdb) bt
#0 0x0000020080995e22 in ?? ()
#1 0x000002005b1ccfdc in camlG_extraction__entry ()
at plugins/extraction/g_extraction.ml4:22
#2 0x000003ffd81f91c0 in ?? ()
PC not saved
(gdb) frame 1
#1 0x000002005b1ccfdc in camlG_extraction__entry ()
at plugins/extraction/g_extraction.ml4:22
22 ARGUMENT EXTEND mlname

The disassembly is below. Current program counter is marked with "=>"

Dump of assembler code for function camlG_extraction__entry:
0x000002005b1ccf30 <+0>: lay %r15,-40(%r15)
0x000002005b1ccf36 <+6>: stg %r14,32(%r15)
0x000002005b1ccf3c <+12>: lgrl %r3,0x2005b214fa0
0x000002005b1ccf42 <+18>: lgrl %r4,0x2005b2159a8
0x000002005b1ccf48 <+24>: stg %r3,0(%r4)
0x000002005b1ccf4e <+30>: brasl %r14,0x2005b1ccf7a <camlG_extraction__entry+74>
0x000002005b1ccf54 <+36>: lgrl %r9,0x2005b2152d0
0x000002005b1ccf5a <+42>: lg %r12,16(%r9)
0x000002005b1ccf60 <+48>: cgr %r2,%r12
0x000002005b1ccf64 <+52>: jgne 0x2005b1ccf74 <camlG_extraction__entry+68>
0x000002005b1ccf6a <+58>: lghi %r6,1
0x000002005b1ccf6e <+62>: jg 0x2005b1cd00a <camlG_extraction__entry+218>
0x000002005b1ccf74 <+68>: brasl %r14,0x20080995e22
0x000002005b1ccf7a <+74>: lay %r15,-16(%r15)
0x000002005b1ccf80 <+80>: stg %r14,0(%r15)
0x000002005b1ccf86 <+86>: stg %r13,8(%r15)
0x000002005b1ccf8c <+92>: lgr %r13,%r15
0x000002005b1ccf90 <+96>: lgrl %r5,0x2005b215140
0x000002005b1ccf96 <+102>: lg %r2,32(%r5)
0x000002005b1ccf9c <+108>: brasl %r14,0x2005b182198 camlGenarg__default_empty_value_2063@plt
0x000002005b1ccfa2 <+114>: cgfi %r2,1
0x000002005b1ccfa8 <+120>: jge 0x2005b1ccfba <camlG_extraction__entry+138>
0x000002005b1ccfae <+126>: lg %r8,0(%r2)
0x000002005b1ccfb4 <+132>: jg 0x2005b1ccfdc <camlG_extraction__entry+172>
0x000002005b1ccfba <+138>: lgrl %r9,0x2005b215458
0x000002005b1ccfc0 <+144>: lghi %r12,0
0x000002005b1ccfc4 <+148>: sty %r12,0(%r9)
0x000002005b1ccfca <+154>: lgrl %r2,0x2005b2152d0
0x000002005b1ccfd0 <+160>: lg %r2,16(%r2)
0x000002005b1ccfd6 <+166>: brasl %r14,0x20080995e22
=> 0x000002005b1ccfdc <+172>: lay %r11,-16(%r11)
0x000002005b1ccfe2 <+178>: clgr %r11,%r10
0x000002005b1ccfe6 <+182>: jgl 0x2005b1d09a2 <camlG_extraction__entry+14962>

Steps to reproduce

Compile coq from source on s390x with OCaml 4.04.0

Additional information

Fedora has some patches on top of OCaml 4.04.0, but not many and
I don't believe any of them are relevant to this bug. However you
can find the patches here in case:

https://git.fedorahosted.org/cgit/fedora-ocaml.git/log/?h=fedora-26-4.04.0

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

Apparently brasl %r14,0x20080995e22 is a branch to a subroutine. However gdb thinks that is an unmapped address:

(gdb) disassemble 0x20080995e22
No function contains specified address.
(gdb) disassemble *0x20080995e22
Cannot access memory at address 0x20080995e22

That explains why it crashes, although not why it's jumping to a random place.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

The function it is trying to call is probably caml_raise_exn, given other code in the main coqtop program like this:

80283d2c:   c0 e5 00 38 90 7b       brasl   %r14,80995e22 <caml_raise_exn>

g_extraction.ml4 seems like it is part of a native dynlink library (confirmed by examining the strace output, showing that it is loading theories/extraction_plugin.cmxs). My suspicion would be that relocation of symbols in that library is going wrong somehow.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

open("/home/rjones/d/fedora/coq/master/coq-8.5pl3/plugins/extraction/extraction_plugin.cmxs", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\10$\330"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0755, st_size=1443584, ...}) = 0
mmap(NULL, 1238464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x3ff6ca80000
mmap(0x3ff6cb8f000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x10e000) = 0x3ff6cb8f000
close(7) = 0
mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_WRITE) = 0
mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_EXEC) = 0
mprotect(0x3ff6cb8f000, 4096, PROT_READ) = 0
brk(NULL) = 0x8d7ac000
brk(0x8d894000) = 0x8d894000
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x3ff80995e22} ---
+++ killed by SIGILL (core dumped) +++

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

A simple reproducer is:

ocamlfind opt -g -package dynlink -linkpkg main.ml -o test
ocamlfind opt -g -shared -o lib.cmxs lib.ml
./test

--------------- main.c ----
let () =
Dynlink.init ();
(try Dynlink.loadfile "lib.cmxs"
with
| Dynlink.Error err ->
print_endline ("dynlink error: " ^ Dynlink.error_message err)
| exn ->
failwith ("unknown error loading plugin: " ^ Printexc.to_string exn)
)

--------------- lib.c ----
let () =
print_endline "loading the plugin ...";
raise Not_found

Note that it's debugging (-g) which causes the problem. If that is
disabled then the test case works fine for me.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

This patch seems to fix it:

diff --git a/asmcomp/s390x/emit.mlp b/asmcomp/s390x/emit.mlp
index 5d233a3..a099bdb 100644
--- a/asmcomp/s390x/emit.mlp
+++ b/asmcomp/s390x/emit.mlp
@@ -611,7 +611,7 @@ let emit_instr i =
| Lraise k ->
begin match k with
| Cmm.Raise_withtrace ->

  •      `    brasl   %r14, {emit_symbol "caml_raise_exn"}\n`;
    
  •      `    brasl   %r14, {emit_symbol "caml_raise_exn"}@PLT\n`;
         let lbl = record_frame Reg.Set.empty true i.dbg in
         `{emit_label lbl}:\n`
       | Cmm.Raise_notrace ->
    
@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

Applied to Fedora, and this fixes the Coq build.

https://git.fedorahosted.org/cgit/fedora-ocaml.git/commit/?h=fedora-26-4.04.0&id=e732c39340e86939530a087744caa8d8f1247878

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: @gasche

Very nice! Would you be willing to submit a PR on github? If not I can take care of it -- reusing your commit.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: Richard Jones

Please submit one, I find the github workflow very tedious.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 9, 2016

Comment author: @gasche

Done, thanks: #903

(Note: there are command-line tools to submit github pull-requests, see https://github.com/github/hub )

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 10, 2016

Comment author: @gasche

Xavier Leroy suggested a change to the patch, which is to use emit_call "caml_raise_exn"; instead. I updated my upstream patch proposal, the new version is available at:

https://patch-diff.githubusercontent.com/raw/ocaml/ocaml/pull/903.patch

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 11, 2016

Comment author: @xavierleroy

Seems to be fixed on trunk, see GPR 903. Should it be backported to 4.04.1?

Tentatively marking this MPR resolved. Yell if you disagree.

@vicuna vicuna closed this Nov 11, 2016

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 11, 2016

Comment author: Richard Jones

I do believe this should be backported to the 4.04 branch, for the benefit of Debian and other distros.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 11, 2016

Comment author: @gasche

Fine with me (also un-invasive for other arches), I'll backport to 4.04.

@vicuna

This comment has been minimized.

Copy link
Author

commented Nov 12, 2016

Comment author: @gasche

4.04 commit: 7020a0b

7020a0b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.