Race condition in caml_get_raw_backtrace #6554

Closed
vicuna opened this issue Sep 11, 2014 · 6 comments

2 participants
@vicuna commented Sep 11, 2014

Original bug ID: 6554
Reporter: @diml
Assigned to: @mshinwell
Status: closed (set by @xavierleroy on 2016-12-07T10:34:42Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.02.0
Target version: 4.02.1+dev
Fixed in version: 4.02.1+dev
Category: runtime system and C interface
Monitored by: @gasche @yakobowski

Bug description

We were getting random segfaults in one of our systems. After some investigation, it turned out to be due to a race condition in caml_get_raw_backtrace:

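```c
res = caml_alloc(caml_backtrace_pos, 0);  /* may run a minor collection */
if (caml_backtrace_buffer != NULL) {
  intnat i;
  /* caml_backtrace_pos is read again here, after the allocation */
  for (i = 0; i < caml_backtrace_pos; i++)
    Field(res, i) = Val_Codet(caml_backtrace_buffer[i]);
}
```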

caml_alloc might run a minor collection. The minor collection might run finalisers, which might raise and catch exceptions, modifying the current backtrace. If [caml_backtrace_pos] ends up smaller because of this, the copy loop stops early and the end of [res] is left as garbage.
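To make the interleaving concrete, here is a minimal, self-contained C simulation of the pattern above. It is not the OCaml runtime: backtrace_buffer, backtrace_pos, sim_alloc and finaliser_raises_and_catches are hypothetical stand-ins for caml_backtrace_buffer, caml_backtrace_pos, caml_alloc and a finaliser. Frames are modelled as long integers, and fresh slots are zero-filled so that slots the loop never wrote are easy to count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for runtime state; NOT the real
   OCaml runtime. Frames are modelled as long integers. */
#define BUF_SIZE 16
#define UNFILLED 0L /* fresh slots are zero-filled (calloc) */

static long backtrace_buffer[BUF_SIZE];
static int backtrace_pos = 0;

/* Record a fake backtrace of n frames; frame i is stored as 100 + i. */
void record_backtrace(int n) {
  backtrace_pos = 0;
  for (int i = 0; i < n && i < BUF_SIZE; i++)
    backtrace_buffer[backtrace_pos++] = 100 + i;
}

/* Models a finaliser doing `try raise Exit with _ -> ()`: the raise
   replaces the current backtrace with a one-frame backtrace (999). */
void finaliser_raises_and_catches(void) {
  backtrace_pos = 0;
  backtrace_buffer[backtrace_pos++] = 999;
}

/* Models caml_alloc: returns n zero-filled slots and, if gc_runs is
   nonzero, runs a "minor collection" that calls the finaliser. */
long *sim_alloc(int n, int gc_runs) {
  long *block = calloc(n > 0 ? n : 1, sizeof *block);
  assert(block != NULL); /* sketch: no real error handling */
  if (gc_runs)
    finaliser_raises_and_catches();
  return block;
}

/* The racy pattern: the size is read before the allocation, but the
   loop bound is read again from the global after it. */
long *get_raw_backtrace_racy(int *size_out, int gc_runs) {
  int size = backtrace_pos;
  long *res = sim_alloc(size, gc_runs);
  for (int i = 0; i < backtrace_pos; i++) /* re-read: may now be smaller */
    res[i] = backtrace_buffer[i];
  *size_out = size;
  return res;
}

/* Count the slots that the copy loop never wrote. */
int count_unfilled(const long *res, int size) {
  int unfilled = 0;
  for (int i = 0; i < size; i++)
    if (res[i] == UNFILLED)
      unfilled++;
  return unfilled;
}
```

With a 5-frame backtrace and a simulated collection during the allocation, the returned 5-slot block holds the finaliser's frame (999) in slot 0, and the other four slots are never written. Without the collection, all five original frames are copied.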

We'll push a fix today to at least avoid the segfault. This is still not completely satisfactory, as it shows once again that when you get a backtrace, you might get a completely random one.

Additional information

Here is a program that reproduces the bug; compile it with 'ocamlopt -g -inline 0':

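```ocaml
(* Record backtraces so that Printexc.get_backtrace has something to return. *)
let () = Printexc.record_backtrace true

(* A finaliser that raises and catches an exception, overwriting the
   current backtrace when it runs. *)
let finaliser _ = try raise Exit with _ -> ()

let create () =
  let x = ref () in
  Gc.finalise finaliser x;
  x

let f () = raise Exit

let () =
  let minor_size = (Gc.get ()).Gc.minor_heap_size in
  while true do
    (* Start each iteration with an empty minor heap. *)
    Gc.minor ();
    try
      (* The finalisable value is dropped at once, so its finaliser
         can run during a later collection. *)
      ignore (create () : unit ref);
      f ()
    with _ ->
      (* Fill nearly the whole minor heap (each ref takes two words) so
         that the allocation in Printexc.get_backtrace triggers a
         minor collection. *)
      for i = 1 to minor_size / 2 - 1 do
        ignore (ref ())
      done;
      ignore (Printexc.get_backtrace () : string)
  done
```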

@vicuna commented Sep 11, 2014

Comment author: @diml

The code changed in 4.02, but the bug is still there: the rest of the array now contains NULL pointers instead of garbage.

@vicuna commented Sep 11, 2014

Comment author: @mshinwell

Fix committed to the 4.02 branch, rev. 15210; and trunk, rev. 15211.

@vicuna commented Sep 11, 2014

Comment author: @lefessan

I think you were a bit quick to commit the fix. Would it be possible to give us a little time to review the fix before committing it?

In particular, I don't understand why you are not saving the backtrace in a malloc'ed buffer, so that it cannot change during the caml_alloc call?
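A minimal sketch of the malloc-based alternative, using hypothetical stand-ins rather than the real runtime (backtrace_buffer, backtrace_pos, sim_alloc and finaliser_raises_and_catches are illustrative names, not runtime APIs). The backtrace is copied into a heap buffer that no OCaml code can reach, so a collection during the allocation cannot change what ends up in the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for runtime state; NOT the real OCaml runtime. */
#define BUF_SIZE 16

static long backtrace_buffer[BUF_SIZE];
static int backtrace_pos = 0;

/* Record a fake backtrace of n frames; frame i is stored as 100 + i. */
void record_backtrace(int n) {
  backtrace_pos = 0;
  for (int i = 0; i < n && i < BUF_SIZE; i++)
    backtrace_buffer[backtrace_pos++] = 100 + i;
}

/* Models a finaliser that raises and catches, overwriting the backtrace. */
void finaliser_raises_and_catches(void) {
  backtrace_pos = 0;
  backtrace_buffer[backtrace_pos++] = 999;
}

/* Models caml_alloc, optionally running a "minor collection". */
long *sim_alloc(int n, int gc_runs) {
  long *block = calloc(n > 0 ? n : 1, sizeof *block);
  assert(block != NULL); /* sketch: no real error handling */
  if (gc_runs)
    finaliser_raises_and_catches();
  return block;
}

/* Copy the backtrace into a malloc'ed buffer first, then allocate the
   result and fill it only from that private copy. */
long *get_raw_backtrace_malloc_copy(int *size_out, int gc_runs) {
  int size = backtrace_pos;
  long *copy = malloc((size > 0 ? size : 1) * sizeof *copy);
  assert(copy != NULL);
  memcpy(copy, backtrace_buffer, size * sizeof *copy);
  long *res = sim_alloc(size, gc_runs); /* may overwrite the globals */
  for (int i = 0; i < size; i++)
    res[i] = copy[i];
  free(copy);
  *size_out = size;
  return res;
}
```

The cost of this design is a malloc/free pair on every call, which a bounded copy on the C stack avoids.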

@vicuna commented Sep 11, 2014

Comment author: @mshinwell

Well, two people have carefully reviewed the fix, and we used a test case to prove with reasonable certainty that it is fixed...

I don't understand your comment about the malloced space. The backtrace is saved on the stack across the allocation.
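A minimal sketch of saving the backtrace on the stack across the allocation, again with hypothetical stand-ins rather than the real runtime code. Both the position and the frames are copied into locals before the allocating call, and only the locals are read afterwards. BUF_SIZE is an assumed fixed capacity; a fixed capacity is what keeps the stack copy bounded in size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for runtime state; NOT the real OCaml runtime. */
#define BUF_SIZE 16 /* assumed fixed capacity of the backtrace buffer */

static long backtrace_buffer[BUF_SIZE];
static int backtrace_pos = 0;

/* Record a fake backtrace of n frames; frame i is stored as 100 + i. */
void record_backtrace(int n) {
  backtrace_pos = 0;
  for (int i = 0; i < n && i < BUF_SIZE; i++)
    backtrace_buffer[backtrace_pos++] = 100 + i;
}

/* Models a finaliser that raises and catches, overwriting the backtrace. */
void finaliser_raises_and_catches(void) {
  backtrace_pos = 0;
  backtrace_buffer[backtrace_pos++] = 999;
}

/* Models caml_alloc, optionally running a "minor collection". */
long *sim_alloc(int n, int gc_runs) {
  long *block = calloc(n > 0 ? n : 1, sizeof *block);
  assert(block != NULL); /* sketch: no real error handling */
  if (gc_runs)
    finaliser_raises_and_catches();
  return block;
}

/* Snapshot the backtrace into locals, then allocate and fill the
   result from the snapshot only. */
long *get_raw_backtrace_stack_copy(int *size_out, int gc_runs) {
  long saved_buffer[BUF_SIZE];   /* bounded copy on the C stack */
  int saved_pos = backtrace_pos; /* read the global exactly once */
  assert(saved_pos >= 0 && saved_pos <= BUF_SIZE);
  memcpy(saved_buffer, backtrace_buffer, saved_pos * sizeof saved_buffer[0]);
  long *res = sim_alloc(saved_pos, gc_runs); /* may overwrite the globals */
  for (int i = 0; i < saved_pos; i++)
    res[i] = saved_buffer[i];
  *size_out = saved_pos;
  return res;
}
```

Because only the saved copies are read after sim_alloc returns, the result holds exactly the frames present on entry, even though the simulated finaliser has replaced the global backtrace by then.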

@vicuna commented Sep 11, 2014

Comment author: @lefessan

Sorry, I read your comment and Jeremy's in the wrong order (i.e. I understood that your commit didn't fix the problem), and I only looked at the last sentence of your bug report, not at the code itself.

Since you are saving the data on the stack, I assume that this function is not called at the top of the stack when a Stack_overflow is raised, but only after the stack has already been unwound? Just to be sure.

@vicuna commented Sep 11, 2014

Comment author: @mshinwell

I believe we only end up in this function if the user asks for the backtrace; in particular, this isn't the function that stashes the backtrace. As such, I think we should be OK even if stack space is tight when the exception actually occurs.

vicuna closed this Dec 7, 2016

vicuna added the stdlib label Mar 14, 2019

vicuna added this to the 4.02.1 milestone Mar 14, 2019

vicuna added the bug label Mar 20, 2019
