Fix spurious major GC slices. #13086

Merged
kayceesrk merged 2 commits into ocaml:trunk from damiendoligez:fix-spurious-slices
Apr 18, 2024

@damiendoligez
Member

Reported to me by @stedolan.

Because of commit e6370d5 (memory.c) and PR #11750 we have some spurious major GC slices when a minor GC promotes more than 20% of the minor heap and a large allocation comes along before the next scheduled major slice (the one that happens when the minor heap is half full).

This is because the allocated_words counter will keep the amount of memory promoted by the minor GC until the next major slice takes it into account.

The consequence is that the large allocation will trigger a major slice on the spot, then the next scheduled major slice has very little work to do.

The solution is to have a separate count for direct major allocations, and to use it (instead of the count of all major allocations) to trigger unscheduled major slices.
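The accounting change can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the runtime's code: the struct, its field names, and the helper functions are hypothetical, and the `/ 5` threshold is taken from the "20% of the minor heap" figure mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain counters, modeled on the PR description. */
struct gc_counters {
  size_t minor_heap_wsz;         /* minor heap size, in words */
  size_t allocated_words;        /* all major allocations since the last slice */
  size_t allocated_words_direct; /* direct major allocations only */
};

/* A minor GC promotes `words` words: a major allocation, but not a direct one. */
static void record_promotion(struct gc_counters *c, size_t words) {
  c->allocated_words += words;
}

/* A block is allocated directly in the major heap (e.g. a large block). */
static void record_direct_alloc(struct gc_counters *c, size_t words) {
  assert(words > 0);
  c->allocated_words += words;
  c->allocated_words_direct += words;
}

/* Old policy (assumed): every major allocation counts toward the trigger. */
static bool should_request_slice_old(const struct gc_counters *c) {
  return c->allocated_words > c->minor_heap_wsz / 5;
}

/* New policy (assumed): only direct major allocations count. */
static bool should_request_slice_new(const struct gc_counters *c) {
  return c->allocated_words_direct > c->minor_heap_wsz / 5;
}

/* A major slice takes both counters into account and resets them. */
static void major_slice_done(struct gc_counters *c) {
  c->allocated_words = 0;
  c->allocated_words_direct = 0;
}
```

In this model, a promotion of more than 20% of the minor heap followed by a small direct allocation makes the old predicate fire immediately, while the new one stays quiet until the direct allocations alone exceed the threshold.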

The problem is illustrated by the program found here: https://gist.github.com/damiendoligez/4d65d0ade50e6d0b2726e812a0eb7a14

Number of major slices (displayed by OCAMLRUNPARAM=v=0x40 ./a.out 2>&1 | grep '^allocated_words =' | wc):

| version | large_allocs=true | large_allocs=false |
|---------|-------------------|--------------------|
| trunk   | 3638              | 1879               |
| this PR | 1869              | 1879               |

Note that the amount of major GC work is not affected by this problem, but we still incur some overhead for starting the extra slices, and the latency profile is changed (the major slice pauses are closer to the minor GC pauses) so it's still worth fixing.

@gasche
Member

gasche commented Apr 9, 2024

This could/should probably be a new flag for caml_shared_try_alloc, "do I come from the minor GC?", which would reduce the risk of getting this accounting wrong in the future if we call that function from more places

I convinced myself that the code says what it does, but how do we know that what it does is reasonable? (In particular, are there regressions on other workloads?)

@stedolan
Contributor

> This could/should probably be a new flag for caml_shared_try_alloc, "do I come from the minor GC?", which would reduce the risk of getting this accounting wrong in the future if we call that function from more places

I disagree. This is a change to GC policy: caml_alloc_shr should trigger extra major GC eventually while allocations from promotions should not. It belongs where the GC policy is defined (major_gc.c and memory.c), not in the allocator (shared_heap.c).
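The layering argued for here can be sketched as follows. This is a toy model, not the runtime's code: `toy_shared_alloc`, `toy_alloc_shr`, `toy_promote`, and `struct policy_state` are hypothetical stand-ins, and `malloc` replaces the shared heap. The point is only that the allocator takes no "who is calling" flag, and that the callers in the policy layer decide what counts toward an unscheduled slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator layer (stand-in for shared_heap.c): unaware of its caller. */
static void *toy_shared_alloc(size_t words) {
  assert(words > 0);
  return malloc(words * sizeof(void *));
}

/* Policy layer state (stand-in for memory.c / major_gc.c). */
struct policy_state {
  size_t minor_heap_wsz;  /* minor heap size, in words */
  size_t direct_words;    /* direct major allocations since the last slice */
  int slice_requested;    /* set when an unscheduled slice is requested */
};

/* Direct major allocation: accounts for the words and may request a slice. */
static void *toy_alloc_shr(struct policy_state *p, size_t words) {
  void *block = toy_shared_alloc(words);
  if (block == NULL) return NULL;
  p->direct_words += words;
  if (p->direct_words > p->minor_heap_wsz / 5) p->slice_requested = 1;
  return block;
}

/* Promotion by the minor GC: same allocator, no effect on the trigger. */
static void *toy_promote(struct policy_state *p, size_t words) {
  (void)p; /* the policy state is deliberately left untouched */
  return toy_shared_alloc(words);
}
```

With this split, adding a new caller of the allocator cannot silently change the trigger: the caller has to opt in by updating the policy state itself.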

> I convinced myself that the code says what it does, but how do we know that what it does is reasonable? (In particular, are there regressions on other workloads?)

IIRC, this change narrows an earlier fix that made too wide a change: the `/ 5` condition resolved an issue with a program that was doing a lot of direct major allocation, but the change affected all programs, even ones that did very little direct major allocation. This change means that the fix for such programs is more precisely targeted.

@damiendoligez damiendoligez self-assigned this Apr 17, 2024
@kayceesrk kayceesrk merged commit f37847f into ocaml:trunk Apr 18, 2024
gasche added a commit to gasche/ocaml that referenced this pull request Jul 18, 2024
When tracking a Coq performance regression in OCaml 5 (ocaml#13300), we
realized that the GC work-computation heuristics are bad for ramp-up
phases of programs that do a lot of unmarshalling, such as the loading
of Coq .vo files implied by its `Require Import` directives.

The problem is that the GC assumes a steady state, where it should
work to release approximately the same amount of memory that was
allocated since the beginning of the cycle. When allocating a large
amount of long-lived memory, this assumption results in excessive
marking work (traversing the heap to find free memory).

The fix introduces a sub-category of `major_allocated_words` (the number
of words allocated into the major heap), called
`major_allocated_words_longlived`, which counts the major-heap memory
that we expect to live until the end of time, or the next request for
a full major GC, and is not taken into account to decide marking work.

The only place where this new "longlived" category is used in this
commit is in the unmarshalling code, only in the case where the
post-unmarshalling data is larged than Max_young_wosize (cannot be
allocated as a single block in the minor heap), that is currently 2Mio
on a 64bit system.

This heuristic could be a problem for OCaml programs that allocate a
lot of short-lived memory through unmarshalling, by packets of more
than 2Mio each. Those programs could suffer from vastly increased
memory consumption.
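A rough sketch of the "longlived" accounting described above, under stated assumptions: the two counter names come from the commit message, but the struct, the helper functions, and the `max_young_wosize` parameter (standing in for `Max_young_wosize`) are hypothetical, and the real work computation is more involved than a subtraction.

```c
#include <assert.h>
#include <stddef.h>

struct major_counters {
  size_t major_allocated_words;           /* all words allocated in the major heap */
  size_t major_allocated_words_longlived; /* subset expected to stay alive */
};

/* Ordinary major-heap allocation: counted for marking work as usual. */
static void record_major_alloc(struct major_counters *c, size_t words) {
  c->major_allocated_words += words;
}

/* Unmarshalled data of `words` words. Only data too large for a single
   minor-heap block is treated as long-lived; smaller data is assumed to be
   allocated in the minor heap and is not counted here at all. */
static void record_unmarshal(struct major_counters *c, size_t words,
                             size_t max_young_wosize) {
  if (words <= max_young_wosize) return;
  c->major_allocated_words += words;
  c->major_allocated_words_longlived += words;
}

/* Words that feed the marking-work decision: long-lived words are excluded. */
static size_t words_counted_for_marking(const struct major_counters *c) {
  assert(c->major_allocated_words_longlived <= c->major_allocated_words);
  return c->major_allocated_words - c->major_allocated_words_longlived;
}
```

The risk mentioned above shows up directly in this model: if large unmarshalled blocks are in fact short-lived, they never contribute to marking work, so the heap can grow without the GC speeding up to reclaim them.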

Performance numbers for this change, with the Coq default GC settings:

```
Summary
  coqc.5.2+backport+change ran
    1.22 ± 0.02 times faster than coqc.5.2+backport
    1.29 ± 0.10 times faster than coqc.5.2
```

with the OCaml default GC settings:

```
Summary
  coqc.5.2+backport+change ran
    1.30 ± 0.03 times faster than coqc.5.2+backport
    1.41 ± 0.11 times faster than coqc.5.2
```

In these numbers:

- coqc.5.2 is with the stock 5.2 runtime
- coqc.5.2+backport has a backport of ocaml#13086,
  which changed the GC pacing slightly
- coqc.5.2+backport+change is the described
  change, on top of the backport
@damiendoligez damiendoligez deleted the fix-spurious-slices branch September 12, 2024 12:03