Segment violation since 2.10.0 #405

Closed
amongonz opened this issue Nov 7, 2022 · 24 comments

@amongonz

amongonz commented Nov 7, 2022

Having installed utop 2.10.0 through opam on a fresh opam switch, trying to run utop or dune utop crashes immediately with a segment violation. Downgrading the package to 2.9.2 fixes the issue. This is with OCaml 4.14.0 and opam 2.1.2 on WSL2 Ubuntu 22.04.1 LTS, x86-64.

@nilsbecker

nilsbecker commented Nov 23, 2022

I experience a reproducible hang of utop on startup, also with 2.10.0 on OCaml 4.14.0, on macOS x86_64. However, downgrading to 2.9.2 did not fix it, so it's likely a different issue. Sorry for the noise here.

@emillon
Collaborator

emillon commented Nov 24, 2022

Do you have any of these config files:
~/.config/utop/init.ml
~/.ocamlinit
~/.utoprc
~/.config/lambda-term-inputrc

@nilsbecker

I saw it with or without .config/utop/init.ml, and the others are absent. However, for me the issue went away after a restart of the system?! Restarting the terminal app did not help. I don't understand why, but in any case utop is starting normally for me now.

@amongonz
Author

Do you have any of these config files:
~/.config/utop/init.ml
~/.ocamlinit
~/.utoprc
~/.config/lambda-term-inputrc

I do not have any of those files. The only utop-related file I see in those directories is ~/.utop-history.

@emillon
Collaborator

emillon commented Nov 25, 2022

Let's eliminate file-related problems: does the same happen if you rename ~/.utop-history to ~/.utop-history.bak?

@amongonz
Author

Still happens; I'm afraid it won't be that simple. I upgraded the opam package again, tried without ~/.utop-history and reproduced the segment violation.

@emillon
Collaborator

emillon commented Nov 26, 2022

Thanks. At least that's ruled out.
What is the exact output when you run utop?
Is a log file created in ~/Library/Logs/DiagnosticReports/?

@amongonz
Author

I think I've solved it: I can only reproduce the issue when the opam switch is configured with no-naked-pointer, which, I didn't remember, was the case in mine.

The output itself was not very informative:

$ utop
Violación de segmento (`core' generado)

(Spanish for "segmentation violation (core dumped)".) When I submitted this issue it didn't even generate a core dump; I've only now figured out that the default ulimit -c on this system was 0. So now I have more information:

$ gdb ocamlrun core
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0  caml_darken (v=0, p=0x55e49f218240 <caml_global_data>) at major_gc.c:285

That would be a SIGSEGV dereferencing a block header in this function from the 4.14.0 runtime source. That code reminded me that my opam switch was configured with no-naked-pointer, and indeed, installing a new opam switch with the same options except for nnp fixed it.

Has utop started using naked pointers between 2.9.2 and 2.10? If I understand correctly, naked pointers won't be supported in OCaml 5 at all, so this seems like a regression. Strangely enough, if I configure a switch with nnpchecker and run utop 2.10, I don't see any warnings at all. Maybe the interactive screen swallows them in some way, though.

Is a log file created in ~/Library/Logs/DiagnosticReports/

That's a macOS path, isn't it? This is an Ubuntu 22.04.1 running over WSL2 (Windows 10).

@emillon
Collaborator

emillon commented Nov 28, 2022

Ah, good debugging, thanks. I think I can take it from here. The main thing that happened between 2.9.2 and 2.10 is a switch of the library we use to handle unicode. I agree that it should be made nnp-clean.

(FYI, as a tip, you can get English error messages by setting LC_ALL=C, e.g. LC_ALL=C utop. I can read Spanish, but sometimes it's not that easy!)

That's a macOS path, isn't it? This is an Ubuntu 22.04.1 running over WSL2 (Windows 10).

Hmm, sorry I mixed up the 2 reports.

@emillon
Collaborator

emillon commented Nov 28, 2022

OK I managed to repro with ocaml-option-nnp (for some reason ocaml-option-nnpchecker does nothing).

@emillon
Collaborator

emillon commented Nov 28, 2022

This is an interesting thing to debug:

  • I bisected that to the switch to zed, as expected
  • there do not seem to be any new C stubs (though I need to check more)
  • nnpchecker does not work in bytecode mode!
  • nnp does work but it just errors out.
  • ocamldebug just jumps straight into the segfault without catching it - need to check how to go backwards from there
  • I tried to strip most of the toploop code from utop in order to build a native version of utop, but it does not fail

@emillon
Collaborator

emillon commented Nov 28, 2022

I reduced that to lambda-term. Dropping the following dune file in a new dir within its sources triggers the segfault (but replacing (libraries) by lambda-term's dependencies does not):

(test
 (name link)
 (modes byte)
 (libraries lambda-term)
 (link_flags :standard -linkall))

(rule
 (write-file link.ml ""))

@amongonz
Author

So... this is getting complicated. I'm starting to think this is actually on OCaml and isn't worth investigating further.

ocamldebug just jumps straight into the segfault without catching it

Right, because it's crashing before it's finished loading the executable, on a major collection while initialising the bytecode runtime's caml_global_data table. OCaml debugging hasn't even started at this point. I consistently get this backtrace from ocamlrun utop 2.10 on OCaml 4.14.0 with nnp:

(gdb) backtrace
#0  caml_darken (v=0, p=0x5555555a7240 <caml_global_data>) at major_gc.c:285
#1  0x0000555555592288 in caml_do_roots (f=0x555555590000 <caml_darken>, do_globals=<optimized out>) at roots_byt.c:91
#2  0x0000555555592305 in caml_darken_all_roots_start () at roots_byt.c:75
#3  0x0000555555590920 in start_cycle () at major_gc.c:407
#4  caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1089
#5  0x0000555555591e01 in caml_gc_dispatch () at minor_gc.c:500
#6  0x0000555555591f11 in caml_check_urgent_gc (extra_root=<optimized out>, extra_root@entry=1) at minor_gc.c:575
#7  0x000055555556c96c in caml_do_pending_actions_exn () at signals.c:267
#8  0x000055555556cd00 in process_pending_actions_with_root_exn (extra_root=<optimized out>) at signals.c:299
#9  caml_process_pending_actions () at signals.c:331
#10 0x0000555555577ce2 in intern_end (res=<optimized out>, whsize=<optimized out>) at intern.c:709
#11 0x0000555555578a63 in caml_input_val (chan=chan@entry=0x5555555de740) at intern.c:808
#12 0x000055555558cb4f in caml_main (argv=0x7fffffffde48) at startup_byt.c:570
#13 0x0000555555569f12 in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

caml_darken v=0 thus implies a naked null pointer has sneaked into the global data. So I tried to dump caml_global_data with the inspect library within utop, expecting it to crash only on 2.10, and discovered that it triggers SIGSEGV on utop 2.10, 2.9.2 and even on raw ocaml!

#use "topfind";;
#require "inspect";;
external get_global_data : unit -> Obj.t = "caml_get_global_data";;
get_global_data () |> Inspect.Sexpr.dump;; (* SIGSEGV *)

Those libraries barely have dependencies, so now I think the naked null pointer is already there from the start, but it doesn't cause issues on nnp until the global data grows large enough that it triggers a major collection before completing initialisation of the bytecode runtime; lambda-term may just contribute a lot to it. Any non-determinism during initialisation could explain @nilsbecker's report as well, if they were running nnp too.
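As a rough aside, the link between allocation volume and collections can be observed from ordinary OCaml code with the standard Gc module. The sketch below is only an illustration under stated assumptions: it allocates from a running program, which is not the same path as the runtime loading global data at startup, and allocate_strings is a hypothetical helper, not something from utop or lambda-term.

```ocaml
(* Sketch: watch the GC counters while allocating a volume of string data.
   [allocate_strings] is an illustrative helper, not part of any library. *)
let allocate_strings ~count ~size =
  (* Keep every string reachable so it survives minor collections. *)
  List.init count (fun _ -> String.make size ' ')

let () =
  let before = Gc.quick_stat () in
  let strings = allocate_strings ~count:10_000 ~size:1_000 in
  let after = Gc.quick_stat () in
  Printf.printf "strings kept alive: %d\n" (List.length strings);
  Printf.printf "minor collections:  %d\n"
    (after.Gc.minor_collections - before.Gc.minor_collections);
  Printf.printf "major collections:  %d\n"
    (after.Gc.major_collections - before.Gc.major_collections)
```

The exact counts depend on the runtime version and heap settings, so none are quoted here.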

Since OCaml itself is being largely reworked for 5.x, I think we should just close this issue, assume that 4.14.0's bytecode runtime is unsafe to run with nnp and watch out for future issues in the 5.x runtime. Sorry for the headache!

FYI as a tip, you can get English error messages by setting LC_ALL=C

True, so much software ignores C locales nowadays (for good reasons) that I forget about it! I've also forgotten my French.

@Octachron
Member

@debugnik did you test if you observe the same segfault with one of the OCaml 5.0.0 beta releases?

@emillon emillon reopened this Nov 28, 2022
@emillon
Collaborator

emillon commented Nov 28, 2022

Thanks for the repro, I think it's very much worth trying to fix. On my side I'm trying to make a useful bytecode program that can make this crash.

@emillon
Collaborator

emillon commented Nov 28, 2022

@Octachron the "inspect" code sample still segfaults on ocaml.5.0.0~beta2. Do we expect caml_get_global_data to return something inspectable?

@amongonz
Author

@Octachron I haven't tried out 5.0 yet, as I prefer waiting for stable first releases, but I was just reading your announcement of beta2 so I'll give it a try.

gives it a try (5.0 switches build faster, don't they?)

No segfault running utop 2.10 normally on 5.0.0~beta2, but the snippet poking into caml_global_data does trigger SIGSEGV from utop and ocaml. I suppose this is the same situation as before, then: a naked null pointer is lingering but major collection isn't triggering before the globals are rooted.

@emillon
Collaborator

emillon commented Nov 28, 2022

I've written a program that tries to print every global value, and I manually noted which indices cause a segfault. Those are instead printed with a simpler algorithm that just displays tag/size:

#use "topfind";;
#require "inspect";;

(* Indices whose values segfault when dumped, found by hand. *)
let do_print = function
  | 12 | 45 | 46 | 50 | 61 | 62 | 64 | 65 | 71 | 73
  | 76 | 81 | 89 | 101 | 117 | 119 | 143 | 154 | 156 | 161 -> false
  | _ -> true

external get_global_data : unit -> Obj.t array = "caml_get_global_data";;
let g = get_global_data () in
Array.iteri (fun i d ->
  if do_print i then
    Format.printf "@[%d %a@.@]" i (Inspect.Sexpr.dump_with_formatter ?context:None) d
  else
    Format.printf "@[%d skipped %s@.@]" i (Inspect.Value.description d)
) g

0 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 13 "Out_of_memory") -1))
1 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 9 "Sys_error") -2))
2 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 7 "Failure") -3))
3 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 16 "Invalid_argument") -4))
4 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 11 "End_of_file") -5))
5 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 16 "Division_by_zero") -6))
6 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 9 "Not_found") -7))
7 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 13 "Match_failure") -8))
8 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 14 "Stack_overflow") -9))
9 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 14 "Sys_blocked_io") -10))
10 (DUMP (OBJ/0 :TAG 248 :VALUES (STR/1 :LEN 14 "Assert_failure") -11))
11 (DUMP
      (OBJ/0
         :TAG 248
         :VALUES
         (STR/1 :LEN 26 "Undefined_recursive_module")
         -12))
12 skipped Block(0): #3
13 (DUMP (STR/0 :LEN 2 "%,"))
14 (DUMP (STR/0 :LEN 12 "really_input"))
15 (DUMP (STR/0 :LEN 5 "input"))
16 (DUMP (BLK/0 :TAG 0 :VALUES 0 (BLK/1 :TAG 0 :VALUES 6 0)))
17 (DUMP (BLK/0 :TAG 0 :VALUES 0 (BLK/1 :TAG 0 :VALUES 7 0)))
18 (DUMP (STR/0 :LEN 16 "output_substring"))
19 (DUMP (STR/0 :LEN 6 "output"))
20 (DUMP
      (BLK/0
         :TAG 0
         :VALUES
         1
         (BLK/1
            :TAG 0
            :VALUES
            3
            (BLK/2 :TAG 0 :VALUES 4 (BLK/3 :TAG 0 :VALUES 6 0)))))
21 (DUMP
      (BLK/0
         :TAG 0
         :VALUES
         1
         (BLK/1
            :TAG 0
            :VALUES
            3
            (BLK/2 :TAG 0 :VALUES 4 (BLK/3 :TAG 0 :VALUES 7 0)))))
22 (DUMP (STR/0 :LEN 5 "%.12g"))
23 (DUMP (STR/0 :LEN 1 "."))
24 (DUMP (STR/0 :LEN 2 "%d"))
25 (DUMP (STR/0 :LEN 5 "false"))
26 (DUMP (STR/0 :LEN 4 "true"))
27 (DUMP (BLK/0 :TAG 0 :VALUES 1))
28 (DUMP (BLK/0 :TAG 0 :VALUES 0))
29 (DUMP (STR/0 :LEN 5 "false"))
30 (DUMP (STR/0 :LEN 4 "true"))
31 (DUMP (STR/0 :LEN 14 "bool_of_string"))
32 (DUMP (STR/0 :LEN 4 "true"))
33 (DUMP (STR/0 :LEN 5 "false"))
34 (DUMP (STR/0 :LEN 11 "char_of_int"))
35 (DUMP (STR/0 :LEN 19 "index out of bounds"))
36 (DUMP (STR/0 :LEN 28 "Pervasives.array_bound_error"))
37 (DUMP (STR/0 :LEN 11 "Stdlib.Exit"))
38 (DUMP 9218868437227405312L)
39 (DUMP -4503599627370496L)
40 (DUMP 9218868437227405313L)
41 (DUMP 9218868437227405311L)
42 (DUMP 4503599627370496L)
43 (DUMP 4372995238176751616L)
44 (DUMP (STR/0 :LEN 21 "Pervasives.do_at_exit"))
45 skipped Block(0): #104
46 skipped Block(0): #14
47 (DUMP (STR/0 :LEN 16 "Stdlib.Sys.Break"))
48 (DUMP (STR/0 :LEN 11 "5.0.0~beta2"))
49 (DUMP
      (BLK/0
         :TAG 0
         :VALUES
         5
         0
         0
         (BLK/1
            :TAG 0
            :VALUES
            (BLK/2 :TAG 0 :VALUES 1 (STR/3 :LEN 5 "beta2")))))
50 skipped Block(0): #51
51 (DUMP (STR/0 :LEN 22 "Obj.Ephemeron.blit_key"))
52 (DUMP (STR/0 :LEN 23 "Obj.Ephemeron.check_key"))
53 (DUMP (STR/0 :LEN 23 "Obj.Ephemeron.unset_key"))
54 (DUMP (STR/0 :LEN 21 "Obj.Ephemeron.set_key"))
55 (DUMP (STR/0 :LEN 26 "Obj.Ephemeron.get_key_copy"))
56 (DUMP (STR/0 :LEN 21 "Obj.Ephemeron.get_key"))
57 (DUMP (STR/0 :LEN 20 "Obj.Ephemeron.create"))
58 (DUMP (STR/0 :LEN 25 "Obj.extension_constructor"))
59 (DUMP (STR/0 :LEN 25 "Obj.extension_constructor"))
60 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 6 "obj.ml") 97 4))
61 skipped Block(0): #24
62 skipped Block(0): #8
63 (DUMP (STR/0 :LEN 26 "CamlinternalLazy.Undefined"))
64 skipped Block(0): #3
65 skipped Block(0): #7
66 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 6 "seq.ml") 596 4))
67 (DUMP (STR/0 :LEN 8 "Seq.drop"))
68 (DUMP (STR/0 :LEN 8 "Seq.take"))
69 (DUMP (STR/0 :LEN 8 "Seq.init"))
70 (DUMP (STR/0 :LEN 23 "Stdlib.Seq.Forced_twice"))
71 skipped Block(0): #57
72 (DUMP (STR/0 :LEN 14 "option is None"))
73 skipped Block(0): #16
74 (DUMP (STR/0 :LEN 14 "result is Ok _"))
75 (DUMP (STR/0 :LEN 17 "result is Error _"))
76 skipped Block(0): #19
77 (DUMP (STR/0 :LEN 4 "true"))
78 (DUMP (STR/0 :LEN 5 "false"))
79 (DUMP 1.000000)
80 (DUMP 0.000000)
81 skipped Block(0): #6
82 (DUMP (STR/0 :LEN 2 "\\\\"))
83 (DUMP (STR/0 :LEN 2 "\\'"))
84 (DUMP (STR/0 :LEN 2 "\\b"))
85 (DUMP (STR/0 :LEN 2 "\\t"))
86 (DUMP (STR/0 :LEN 2 "\\n"))
87 (DUMP (STR/0 :LEN 2 "\\r"))
88 (DUMP (STR/0 :LEN 8 "Char.chr"))
89 skipped Block(0): #6
90 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "uchar.ml") 88 18))
91 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "uchar.ml") 91 7))
92 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "uchar.ml") 80 18))
93 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "uchar.ml") 85 7))
94 (DUMP (STR/0 :LEN 26 " is not a latin1 character"))
95 (DUMP (STR/0 :LEN 4 "%04X"))
96 (DUMP (STR/0 :LEN 2 "U+"))
97 (DUMP (STR/0 :LEN 31 " is not an Unicode scalar value"))
98 (DUMP (STR/0 :LEN 2 "%X"))
99 (DUMP (STR/0 :LEN 25 "U+0000 has no predecessor"))
100 (DUMP (STR/0 :LEN 25 "U+10FFFF has no successor"))
101 skipped Block(0): #24
102 (DUMP (STR/0 :LEN 9 "List.map2"))
103 (DUMP (STR/0 :LEN 10 "List.iter2"))
104 (DUMP (STR/0 :LEN 15 "List.fold_left2"))
105 (DUMP (STR/0 :LEN 16 "List.fold_right2"))
106 (DUMP (STR/0 :LEN 13 "List.for_all2"))
107 (DUMP (STR/0 :LEN 12 "List.exists2"))
108 (DUMP (BLK/0 :TAG 0 :VALUES 0 0))
109 (DUMP (STR/0 :LEN 12 "List.combine"))
110 (DUMP (STR/0 :LEN 13 "List.rev_map2"))
111 (DUMP (STR/0 :LEN 9 "List.init"))
112 (DUMP (STR/0 :LEN 8 "List.nth"))
113 (DUMP (STR/0 :LEN 3 "nth"))
114 (DUMP (STR/0 :LEN 8 "List.nth"))
115 (DUMP (STR/0 :LEN 2 "tl"))
116 (DUMP (STR/0 :LEN 2 "hd"))
117 skipped Block(0): #62
118 (DUMP (STR/0 :LEN 2 "%d"))
119 skipped Block(0): #12
120 (DUMP (STR/0 :LEN 19 "index out of bounds"))
121 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 820 20))
122 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 831 9))
123 (DUMP (STR/0 :LEN 19 "index out of bounds"))
124 (DUMP (STR/0 :LEN 19 "index out of bounds"))
125 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 766 20))
126 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 777 9))
127 (DUMP (STR/0 :LEN 19 "index out of bounds"))
128 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 654 20))
129 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "bytes.ml") 679 9))
130 (DUMP (STR/0 :LEN 31 "Bytes.of_seq: cannot grow bytes"))
131 (DUMP (STR/0 :LEN 44 "String.rcontains_from / Bytes.rcontains_from"))
132 (DUMP (STR/0 :LEN 42 "String.contains_from / Bytes.contains_from"))
133 (DUMP (STR/0 :LEN 46 "String.rindex_from_opt / Bytes.rindex_from_opt"))
134 (DUMP (STR/0 :LEN 38 "String.rindex_from / Bytes.rindex_from"))
135 (DUMP (STR/0 :LEN 44 "String.index_from_opt / Bytes.index_from_opt"))
136 (DUMP (STR/0 :LEN 36 "String.index_from / Bytes.index_from"))
137 (DUMP (STR/0 :LEN 12 "Bytes.concat"))
138 (DUMP (STR/0 :LEN 31 "String.blit / Bytes.blit_string"))
139 (DUMP (STR/0 :LEN 10 "Bytes.blit"))
140 (DUMP (STR/0 :LEN 24 "String.fill / Bytes.fill"))
141 (DUMP (STR/0 :LEN 12 "Bytes.extend"))
142 (DUMP (STR/0 :LEN 22 "String.sub / Bytes.sub"))
143 skipped Block(0): #87
144 (DUMP (STR/0 :LEN 44 "String.rcontains_from / Bytes.rcontains_from"))
145 (DUMP (STR/0 :LEN 42 "String.contains_from / Bytes.contains_from"))
146 (DUMP (STR/0 :LEN 46 "String.rindex_from_opt / Bytes.rindex_from_opt"))
147 (DUMP (STR/0 :LEN 38 "String.rindex_from / Bytes.rindex_from"))
148 (DUMP (STR/0 :LEN 44 "String.index_from_opt / Bytes.index_from_opt"))
149 (DUMP (STR/0 :LEN 36 "String.index_from / Bytes.index_from"))
150 (DUMP (STR/0 :LEN 0 ""))
151 (DUMP (STR/0 :LEN 0 ""))
152 (DUMP (STR/0 :LEN 13 "String.concat"))
153 (DUMP (STR/0 :LEN 0 ""))
154 skipped Block(0): #64
155 (DUMP (STR/0 :LEN 2 "()"))
156 skipped Block(0): #3
157 (DUMP (STR/0 :LEN 18 "Marshal.from_bytes"))
158 (DUMP (STR/0 :LEN 18 "Marshal.from_bytes"))
159 (DUMP (STR/0 :LEN 17 "Marshal.data_size"))
160 (DUMP (STR/0 :LEN 42 "Marshal.to_buffer: substring out of bounds"))
161 skipped Block(0): #8
162 (DUMP (BLK/0 :TAG 0 :VALUES (STR/1 :LEN 8 "array.ml") 319 4))
163 (DUMP (STR/0 :LEN 13 "Array.combine"))
164 (DUMP (STR/0 :LEN 13 "Array.exists2"))
165 (DUMP (STR/0 :LEN 14 "Array.for_all2"))
166 (DUMP (STR/0 :LEN 44 "Array.map2: arrays must have the same length"))
167 (DUMP (STR/0 :LEN 45 "Array.iter2: arrays must have the same length"))
168 (DUMP (STR/0 :LEN 10 "Array.blit"))
169 (DUMP (STR/0 :LEN 10 "Array.fill"))
170 (DUMP (STR/0 :LEN 9 "Array.sub"))
171 (DUMP (STR/0 :LEN 10 "Array.init"))
172 (DUMP (STR/0 :LEN 19 "Stdlib.Array.Bottom"))

(the "skipped" ones are the ones that otherwise cause a segfault)
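For readers without the inspect library, a shallow tag/size description can be approximated with the standard Obj module alone. This is a minimal sketch, not what Inspect.Value.description does internally: describe is a hypothetical name, and it assumes every non-immediate value it is given is a well-formed block with a header.

```ocaml
(* Describe a value by looking only at its own header, never at what its
   fields point to. [describe] is an illustrative name. *)
let describe (v : Obj.t) : string =
  if Obj.is_int v then Printf.sprintf "Int(%d)" (Obj.obj v : int)
  else
    let tag = Obj.tag v in
    if tag = Obj.closure_tag then "Closure"
    else if tag = Obj.string_tag then
      Printf.sprintf "String(len=%d)" (String.length (Obj.obj v : string))
    else if tag = Obj.double_tag then "Float"
    else Printf.sprintf "Block(tag=%d, size=%d)" tag (Obj.size v)

let () =
  List.iter
    (fun v -> print_endline (describe v))
    [ Obj.repr 42; Obj.repr "Not_found"; Obj.repr (fun x -> x + 1);
      Obj.repr (1, 2) ]
```

Because it never follows a field, this kind of printer cannot trip over a code pointer stored inside a closure.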

@Octachron
Member

Octachron commented Nov 28, 2022

Trying to dump any functional value is a segfault within the no-naked-pointer mode since Inspect reads the code pointer:

# Inspect.Sexpr.dump (fun x -> x);;
Segmentation fault (core dumped)
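A traversal that wants to avoid this can refuse to descend into closures. The following is a hedged sketch using only the standard Obj module: count_blocks is a hypothetical helper, the depth limit is an arbitrary choice to keep cyclic data from looping, and shared blocks are counted once per path.

```ocaml
(* Count the blocks reachable from [v] without descending into closures
   (whose fields include a code pointer) or into blocks the GC does not
   scan (strings, floats, custom and abstract blocks).
   [count_blocks] is an illustrative name. *)
let count_blocks ?(depth = 8) (v : Obj.t) : int =
  let rec go depth v =
    if Obj.is_int v then 0
    else
      let tag = Obj.tag v in
      if depth = 0
         || tag >= Obj.no_scan_tag
         || tag = Obj.closure_tag
         || tag = Obj.infix_tag
      then 1 (* count the block itself, but do not read its fields *)
      else begin
        let total = ref 1 in
        for i = 0 to Obj.size v - 1 do
          total := !total + go (depth - 1) (Obj.field v i)
        done;
        !total
      end
  in
  go depth v

let () =
  Printf.printf "%d\n" (count_blocks (Obj.repr [ 1; 2; 3 ]));
  Printf.printf "%d\n" (count_blocks (Obj.repr (fun x -> x + 1)))
```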

@emillon
Collaborator

emillon commented Nov 28, 2022

Ah, I see. So yes, that's expected in any case where there's a closure. So that inspect thing is a red herring too.
I can confirm that the original issue (linking against lambda-term in nnp mode for example) does not appear on 5.0.0~beta2 anymore.

@emillon
Collaborator

emillon commented Nov 28, 2022

Finally, I can reproduce the original issue with some generated code (and no utop/lambdaterm):

; dune
(executable
 (name gen)
 (modules gen))

(rule
 (with-stdout-to generated.ml
  (run ./gen.exe)))

(test
 (name generated)
 (modules generated)
 (modes byte))

(* gen.ml *)
let blank = String.make 1000 ' '

let () =
  for _ = 1 to 10000 do
    Printf.printf "let _ = \"%s\"\n" blank
  done

When running dune runtest, the compiler does not fail, but it produces a bytecode program that fails when started. This only happens when naked pointers are disabled on 4.14.0.

This also happens with a single large string:

let blank = String.make 10_000_000 ' '
let () = Printf.printf "let _ = \"%s\"\n" blank

(In contrast, generating a larger number of smaller strings crashes the compiler with a stack overflow. I know that the compiler team is not interested in fixing this.)
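To explore where the threshold lies, the two generators above could be folded into one parameterised function. This is a sketch under assumptions: generate and its ~count and ~size labels are hypothetical names, and the output shape simply mirrors what gen.ml prints.

```ocaml
(* Build the source text of [count] top-level string literals, each made of
   [size] spaces, in the same shape as the output of gen.ml.
   [generate] is an illustrative name. *)
let generate ~count ~size : string =
  let blank = String.make size ' ' in
  let buf = Buffer.create (count * (size + 16)) in
  for _ = 1 to count do
    Buffer.add_string buf (Printf.sprintf "let _ = \"%s\"\n" blank)
  done;
  Buffer.contents buf

let () =
  (* Small values keep the example cheap to run; the thread used
     10_000 literals of 1_000 bytes, and one literal of 10_000_000 bytes. *)
  print_string (generate ~count:2 ~size:3)
```

Wiring this into the dune rule would mean replacing the body of gen.ml with a call such as print_string (generate ~count:10_000 ~size:1_000).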

@Octachron is that something that should get fixed on the ocaml/ocaml side? I can open a report with that information.

@Octachron
Member

This is indeed an interesting bug report for 4.14 (to keep track of the issue at the very least).

@amongonz
Author

Trying to dump any functional value is a segfault within the no-naked-pointer mode since Inspect reads the code pointer:

# Inspect.Sexpr.dump (fun x -> x);;
Segmentation fault (core dumped)

Good catch! If caml_hash can't distinguish code pointers when configured with nnp, that quick inspect will segfault on something other than a null pointer outside of non-nnp 4.14.0; shame. At least we have a proper repro now, this was interesting.

@emillon
Collaborator

emillon commented Dec 9, 2022

This has been closed in ocaml/ocaml#11788. I'm closing this since this is an ocaml bug (fixed by disabling nnp, or updating to 4.14.1 or 5.0.0~rc1). Thanks a lot for the debugging help.

@emillon emillon closed this as completed Dec 9, 2022
4 participants