Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom blocks are finalized more than once if stored in OCaml's weak sets (probably related to Weak.get_copy) #7279

Closed
vicuna opened this issue Jun 27, 2016 · 16 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link

commented Jun 27, 2016

Original bug ID: 7279
Reporter: sawfish
Status: resolved (set by @damiendoligez on 2017-02-23T15:56:30Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.03.0
Target version: 4.05.0 +dev/beta1/beta2/beta3/rc1
Fixed in version: 4.05.0 +dev/beta1/beta2/beta3/rc1
Category: standard library
Monitored by: braibant @gasche

Bug description

If custom blocks are stored in weak sets obtained by Weak.Make, their finalizers are sometimes called more than once.

There has been a discussion on the OCaml mailing list regarding this issue:
https://sympa.inria.fr/sympa/arc/caml-list/2016-06/msg00070.html

As a bottom line, the assumption was that the behavior may be related to shallow copies that are created by Weak.get_copy during lookup operations.

The bug is reproducible since OCaml 4.02.0 but does not occur in earlier versions. As an exception, the first beta release of OCaml 4.03.0 does not exhibit the bug, whereas it reproduces in the final 4.03.0 release.

Steps to reproduce

A github repository with a minimal example that reproduces the error is available at https://github.com/martin-neuhaeusser/ocaml_bug/

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 27, 2016

Comment author: bobot

The fact that Weak.Make uses Weak.get_copy is certainly the culprit.

For the fact that 0Caml 4.03.0+beta1 didn't have the bug but 4.03.0 does, is not really determining since 4.03.0+beta1 add a big bug during weak hashset resizing 8204f5d.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 13, 2016

Comment author: @alainfrisch

Copying custom blocks sounds rather scary indeed. What about changing Weak.get_copy implementation (and documentation) so that it returns the original value if it is a custom block. Is there a better solution?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 18, 2016

Comment author: sawfish

If Weak.get_copy just returned the original value if it is a custom block would make it rather useless in my opinion, as this makes the value live.

I am working on some OCaml bindings for an SMT solver and use a weak hashtable to lookup the solver's native terms which are stored in a custom block. This implements a restricted hash-consing that avoids having multiple alive custom blocks around that refer to the same native term.

My experiments suggest that Weak.get_copy works correctly (also when storing custom blocks) in OCaml versions prior to 4.02.0. So I still hope that there is a bug in handling the weak copies that could be fixed.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 18, 2016

Comment author: @alainfrisch

My experiments suggest that Weak.get_copy works correctly (also when storing custom blocks) in OCaml versions prior to 4.02.0.

I think it is just that until recently, custom blocks with finalizers where normally always allocated directly in the major heap, and (to be confirmed), get_copy would instead allocate them in the minor heap, and their finalizer would not be run. Could that be the case? Of course, this is not a proper solution (and it's actually very unsafe).

would make it rather useless in my opinion, as this makes the value live.

It makes the values that fall in the same slot live for a bit longer. Whether it leads to increased memory use or not depends on the allocation profile of your application (and the size of the weak table, and the hashing function, etc), so unless you have tried it, I would be cautious before saying it renders the Weak table "useless".

This implements a restricted hash-consing that avoids having multiple alive custom blocks around that refer to the same native term.

This property would not be lost, of course.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 19, 2016

Comment author: bobot

I think it is just that until recently, custom blocks with finalizers where normally always allocated directly in the major heap, and (to be confirmed), get_copy would instead allocate them in the minor heap, and their finalizer would not be run. Could that be the case? Of course, this is not a proper solution (and it's actually very unsafe).

By reading the code, it seems that only custom block allocated with caml_alloc_custom are finalized when in the minor heap. And get_copy allocates with caml_alloc. So I think nothing change on that matter.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 20, 2016

Comment author: @alainfrisch

So I think nothing change on that matter.

Ok, do we all agree that this behavior was already wrong? get_copy can happily create a fresh copy of a custom block, which the client code can keep around while the original custom block is finalized perhaps releasing associated native resources. But the client code can still use the copy later and try to access those resources.

get_copy on custom block is really too risky: it breaks the invariant that custom blocks are only allocated explicitly by the user (or through the unmarshaling function using a function provided by the user).

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 20, 2016

Comment author: @alainfrisch

POC:

open Bigarray
open Bigarray.Array1

let () =
  let a = create float64 c_layout 10 in
  Gc.compact ();
  set a 0 42.;

  let w = Weak.create 1 in
  Weak.set w 0 (Some a);

  let b =
    match Weak.get_copy w 0 with
    | None -> assert false
    | Some b -> b
  in
  Printf.printf "a.(0) = %f\n" (get a 0);
  Printf.printf "b.(0) = %f\n" (get b 0);
  Gc.compact ();
  assert(Weak.get_copy w 0 = None);


  let c = create float64 c_layout 10 in
  Printf.printf "b.(0) = %f\n" (get b 0);
  set c 0 123.;
  Printf.printf "b.(0) = %f\n" (get b 0);

prints on my machine:

a.(0) = 42.000000
b.(0) = 42.000000
b.(0) = 0.000000
b.(0) = 123.000000

The array malloced for [c] reuses the space for [a] (the OCaml value has been collected, thus freeing the underlying data). This can easily lead to segfaults.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 20, 2016

Comment author: bobot

get_copy on custom block is really too risky

I agree. I'm going to create a merge request for removing that. But should we deprecate get_copy and use get instead? I will check on Frama-C if that has a noticeable impact. Alain since you are a great user of hash-consing could you also check? I can prepare a second branch with this change.

sawfish: The problem appears already before 4.02.0, take your gcbug.ml and replace

diff --git a/gcbug.ml b/gcbug.ml
index c985f3f..f82e901 100644
--- a/gcbug.ml
+++ b/gcbug.ml
@@ -21,10 +21,7 @@ let create_dummy =
   let dummy_set = WeakDummySet.create 17 in
   fun num ->
     let new_dummy = wrapper_create_dummy num in
-    try WeakDummySet.find dummy_set new_dummy with
-    | Not_found ->
-      WeakDummySet.add dummy_set new_dummy;
-      new_dummy
+    WeakDummySet.merge dummy_set new_dummy
 
 (* Create a list of cnt random dummy instances *)
 let rec create_dummies cnt accu =

which is a more efficient version of the code. It fails with 4.01.0

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 21, 2016

Comment author: @alainfrisch

I'd argue to keep get_copy but change its implementation and doc so that it returns the original value when it is a custom block. This would only affect people doing weak tables on custom blocks, which should be rather rare (and more or less already broken anyway).

As for using get instead of get_copy in Weak.Make, this is another question. As I reported on caml-list, LexiFi now has a local variant of Weak.Make that uses get instead of get_copy; we use it for our core contract algebra system which is performance critical and this seemed to give good results. There can of course be cases where this would increase the memory usage too much; this depends on so many factors that it is difficult to give a general recommendation. For us it worked well.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 21, 2016

Comment author: sawfish

@Frisch: The term "useless" is too harsh, of course.

After spending some more thoughts, I agree with Alain. Two observations:

  1. In the software model checker that we are implementing, we have a stand-alone implementation for hash-consing which works similar to the weak sets implemented by the Weak module: It stores the bucket entries' hashes in a usual array whereas the actual entries are stored in a Weak array. During a search, we first look for identical hashes; once a match is found, we skip the "Weak.get_copy" step and directly obtain the value from the weak array using Weak.get. In our experiments, this performs slightly better than the additional indirection via Weak.get_copy. I guess that if the underlying hash function is good, the probability of a hash collision within a bucket is low enough to take the risk of unneccessarily making an unreachable value alive.
    This observation seems to be similar to @Frisch's last remark.

  2. In case of hash-consing for terms, I think that Weak.get_copy is of limited value: If a shallow copy of an otherwise unreachable term is obtained during a search within a bucket, Weak.get_copy leaves the term itself unreachable but makes all its subterms alive. This has already been observed by Pascal Cuoq and Damien Doligez: "Hashconsing in an incrementally garbage-collected system", Section 7.2 (ML Workshop 2008).

These two observations make me believe that Weak.get_copy could be avoided entirely without impacting performance. But as there is no strong evidence for that claim (yet), I also suggest to keep Weak.get_copy and to modify it such that it returns the actual value in case that value is a custom block.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 21, 2016

Comment author: @alainfrisch

Thanks @sawfish. This also confirms that one should perhaps let users choose whether get or get_copy should be used in Weak.Make.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 22, 2016

Comment author: bobot

#710 for "don't copy custom block". Alain your POC is integrated in the testsuite.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 22, 2016

Comment author: bobot

I created #7303 for discussing a full (utopic?) solution.

@vicuna

This comment has been minimized.

Copy link
Author

commented Aug 30, 2016

Comment author: @alainfrisch

Bumped target version. This is too late for 4.04 (and it's not a regression).

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 22, 2017

Comment author: @xavierleroy

Pinging @bobot to see if #710 is ready for merging.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 23, 2017

Comment author: @damiendoligez

Fixed by #710

trunk: 2b8836c
4.05: 61037b1

@vicuna vicuna closed this Feb 23, 2017

@vicuna vicuna added the stdlib label Mar 14, 2019

@vicuna vicuna added this to the 4.05.0 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.