
GC not aggressive enough, leads to OOM in simple conditions #7228

Closed
vicuna opened this Issue Apr 15, 2016 · 3 comments

vicuna commented Apr 15, 2016

Original bug ID: 7228
Reporter: @alainfrisch
Status: closed (set by @damiendoligez on 2016-04-18T14:49:11Z)
Resolution: fixed
Priority: normal
Severity: major
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: runtime system and C interface
Monitored by: dberthod braibant jmeber @hcarty

Bug description

The following program:

let () =
  let h = Hashtbl.create 1024 in
  for k = 1 to 1000 do
    Printf.printf "=======================\nROUND %d\n" k;
    flush stdout;

    Hashtbl.clear h;
    for i = 1 to 10000000 do
      Hashtbl.add h i (string_of_int i);
    done
  done

eats more and more memory and crashes after a few rounds (the number of rounds is not deterministic).

Tests:

MSVC 32-bit:
trunk : crash after 25 or 19 rounds.
4.03 : crash after 25 or 19 rounds.
4.02 : memory usage oscillating between 150 MB and 350 MB; no crash until round 140.

Linux 64-bit:
trunk : memory usage oscillating between 600 MB and 1 GB, but no failure until round 35.
4.02 : memory usage oscillating between 500 MB and 1 GB, but no failure until round 140.
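To watch the heap grow round by round without an external memory monitor, the first test case can report the major heap size itself. This is a minimal sketch, not part of the original report: `run_hashtbl_rounds` and its `~rounds`/`~size` parameters are hypothetical names, and it relies only on the standard `Gc.quick_stat` fields `heap_words` and `top_heap_words`, converted to megabytes with `Sys.word_size`.

```ocaml
(* Hypothetical diagnostic variant of the first test case: after each
   round, print the current and peak major heap sizes. *)
let words_to_mb words =
  float_of_int (words * (Sys.word_size / 8)) /. (1024.0 *. 1024.0)

let run_hashtbl_rounds ~rounds ~size =
  let h = Hashtbl.create 1024 in
  for k = 1 to rounds do
    Hashtbl.clear h;
    for i = 1 to size do
      Hashtbl.add h i (string_of_int i)
    done;
    (* quick_stat is cheap: unlike Gc.stat, it does not walk the heap *)
    let st = Gc.quick_stat () in
    Printf.printf "ROUND %d: heap = %.1f MB, peak = %.1f MB\n%!" k
      (words_to_mb st.Gc.heap_words)
      (words_to_mb st.Gc.top_heap_words)
  done;
  Hashtbl.length h

(* Full-size run, as in the report:
   let () = ignore (run_hashtbl_rounds ~rounds:1000 ~size:10_000_000) *)
```

Smaller sizes are useful for a quick check that the loop behaves; the growth described above only shows up at sizes comparable to the original 10000000.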

Another example:

let () =
  let l = ref [] in
  for k = 1 to 1000 do
    Printf.printf "=======================\nROUND %d\n%!" k;
    l := [];
    for i = 1 to 10000000 do
      l := Bytes.create 20 :: !l
    done
  done

Tests:

MSVC 32-bit:
trunk : crash after 5 rounds.
4.02 : memory usage oscillating between 200 MB and 550 MB; no crash.

Both native and bytecode are affected.

With the last example, here is the output of the test with OCAMLRUNPARAM=v=1:

$ OCAMLRUNPARAM=v=1 ./eatmem.exe
=======================
ROUND 1
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
Starting new major GC cycle
=======================
ROUND 2
=======================
ROUND 3
=======================
ROUND 4
=======================
ROUND 5
Fatal error: out of memory.

Setting OCAMLRUNPARAM=o=50 only delays the problem (crash at round 26).
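The `o` setting of OCAMLRUNPARAM is the major GC's `space_overhead` parameter (a percentage, 80 by default in the 4.0x runtimes), and it can also be changed from inside the program through the standard `Gc` module. A minimal sketch, where `set_space_overhead` is a hypothetical helper name:

```ocaml
(* Programmatic counterpart of OCAMLRUNPARAM=o=50. space_overhead is a
   percentage of live data; smaller values make the major GC more eager. *)
let set_space_overhead pct =
  Gc.set { (Gc.get ()) with Gc.space_overhead = pct }

let () = set_space_overhead 50
```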

Adding a call to Gc.major before the inner loop avoids the crash.
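Applied to the second test case, the workaround might look like the sketch below. `run_list_rounds` and its `~rounds`/`~size` parameters are hypothetical, added only so the loop can be tried at smaller sizes. `Gc.major ()` performs a minor collection and finishes the current major cycle, so the incremental major GC is brought up to date once per round instead of being left behind.

```ocaml
(* Second test case with the reported workaround: finish the current
   major GC cycle before building the next list. *)
let run_list_rounds ~rounds ~size =
  let l = ref [] in
  for k = 1 to rounds do
    Printf.printf "=======================\nROUND %d\n%!" k;
    l := [];
    Gc.major ();  (* minor collection + finish the current major cycle *)
    for _i = 1 to size do
      l := Bytes.create 20 :: !l
    done
  done;
  List.length !l

(* Full-size run, as in the report:
   let () = ignore (run_list_rounds ~rounds:1000 ~size:10_000_000) *)
```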

With OCAMLRUNPARAM=v=64 and grepping for "computed work", I observe that with 4.02 this amount stays around 1474537 words, while with 4.03 it oscillates a lot. Here is the output of this grep piped to "uniq -c":

      1 computed work = 128574 words
      1 computed work = 284671 words
      1 computed work = 433664 words
      1 computed work = 574463 words
      1 computed work = 550947 words
      1 computed work = 728868 words
      1 computed work = 838521 words
      1 computed work = 1157120 words
      2 computed work = 1330687 words
      1 computed work = 1228898 words
      2 computed work = 1228883 words
      4 computed work = 1474537 words
      3 computed work = 1228848 words
      1 computed work = 1228839 words
      5 computed work = 1474537 words
      1 computed work = 1228831 words
      4 computed work = 1228825 words
      1 computed work = 1228819 words
      8 computed work = 1474537 words
      1 computed work = 1228814 words
      6 computed work = 1228810 words
      2 computed work = 1228806 words
     11 computed work = 1474537 words
      2 computed work = 1228803 words
      9 computed work = 108938 words
     11 computed work = 255035 words
     13 computed work = 382057 words
     10 computed work = 492524 words
     24 computed work = 1474537 words
     15 computed work = 115340 words
     22 computed work = 260583 words
     25 computed work = 386874 words
     28 computed work = 130639 words
     33 computed work = 273877 words
     35 computed work = 121659 words
     65 computed work = 1474537 words
      1 computed work = 4 words
      1 computed work = 182366 words
     27 computed work = 182364 words
      1 computed work = 695 words
      1 computed work = 181672 words
      3 computed work = 182364 words
     58 computed work = 136869 words
     66 computed work = 121051 words
     76 computed work = 127938 words
     88 computed work = 32216 words
     88 computed work = 84244 words
      1 computed work = 321 words
      1 computed work = 83928 words
      1 computed work = 84241 words
     11 computed work = 84244 words
    116 computed work = 52577 words
    134 computed work = 48643 words
    118 computed work = 65748 words
      1 computed work = 250 words
      1 computed work = 65501 words
      1 computed work = 65746 words
     33 computed work = 65748 words
    177 computed work = 38979 words
    169 computed work = 38981 words
      1 computed work = 148 words
      1 computed work = 38834 words
      1 computed work = 38979 words
     32 computed work = 38981 words

It seems that once the heap reaches a certain size, the GC reduces the amount of work done in each slice too much.


vicuna commented Apr 15, 2016

Comment author: @alainfrisch

I think I've found the problem. Computing "caml_stat_heap_wsz * 250" in major_gc.c overflows and returns bogus results. Casting caml_stat_heap_wsz to double seems to fix the problem. But there might be other similar places to fix.

Damien: does casting to double seem like a correct fix?
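To illustrate how a large heap can yield tiny work values, the product can be simulated in OCaml. This sketch assumes the expression is evaluated in 32-bit unsigned arithmetic, as on the MSVC 32-bit build; the exact C types and the surrounding formula in major_gc.c are not reproduced here. `product_u32`, `product_double` and `overflow_threshold` are hypothetical names, and the code needs a 64-bit OCaml so that the exact product fits in a native `int`.

```ocaml
(* Sketch only: models caml_stat_heap_wsz * 250 on a 32-bit build,
   ASSUMING 32-bit unsigned wraparound. Requires a 64-bit OCaml. *)
let mask32 = 0xFFFF_FFFF

(* Product as 32-bit C code would see it: reduced modulo 2^32. *)
let product_u32 heap_wsz = (heap_wsz * 250) land mask32

(* Product after casting the heap size to double, as proposed. *)
let product_double heap_wsz = float_of_int heap_wsz *. 250.0

(* Smallest heap size, in words, whose product no longer fits in 32 bits:
   ceil (2^32 / 250). *)
let overflow_threshold = (mask32 + 1 + 249) / 250

let () =
  List.iter
    (fun w ->
      Printf.printf "heap_wsz = %d: u32 = %d, double = %.0f\n"
        w (product_u32 w) (product_double w))
    [ 10_000_000; 17_179_869; 17_179_870; 20_000_000 ]
```

Under this assumption the threshold is 17179870 words (about 69 MB of heap with 4-byte words): at that size the wrapped product drops to 204, while the double version gives 4294967500. A double represents every integer up to 2^53 exactly, so the cast removes the overflow for any realistic heap size.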


vicuna commented Apr 17, 2016

Comment author: @gasche

Alain proposed a fix on GitHub:

#546


vicuna commented Apr 18, 2016

Comment author: @damiendoligez

Fixed by #546.
