Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing else in caml_memprof_renew_minor_sample #10056

Merged
merged 3 commits into from
Dec 4, 2020
Merged

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Nov 27, 2020

This looks like a tiny regression from #8684? Spotted while not managing to fix something else...

cc @jhjourdan, @stedolan

@gadmm
Copy link
Contributor

gadmm commented Nov 30, 2020

I gave it a look since this looked to me like a bug for 4.12 and the others seem busy.

I agree that the else clause looks like an omission in #8684. It seems that the hypothesis that caml_memprof_young_trigger equals [Caml_state->young_alloc_start] if no sampling is planned in the current minor heap (from memprof.c) is not used in the code, and it does not change the behaviour if it is assigned a value strictly less than Caml_state->young_alloc_start, which explains why no bug has been observed. Still it is incorrect to decrement past Caml_state->young_alloc_start which denotes the start of an array (if you are a C-lawyer I would be happy if you could confirm that point to me).

The piece of code puzzles me for another reason, though. The first call to rand_geom() after the minor heap is emptied will not be independent. Essentially, one decides to ignore the value on the dice and re-throw it if it is past a certain threshold, which might affect randomness. This is obvious for very small sampling rates or very small minor heap sizes, so I assume that this was chosen explicitly for simplicity.

Copy link
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I am not sure to understand the description in Changes, though: nothing is skipped.

Copy link
Contributor

@jhjourdan jhjourdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @dra27, for this fix. I agree there was a bug in #8684, and your PR fixes it.

@jhjourdan
Copy link
Contributor

jhjourdan commented Dec 1, 2020

BTW, there is a CI failure with AppVeyor, but this seems to be unrelated with this PR.

@jhjourdan
Copy link
Contributor

The piece of code puzzles me for another reason, though. The first call to rand_geom() after the minor heap is emptied will not be independent. Essentially, one decides to ignore the value on the dice and re-throw it if it is past a certain threshold, which might affect randomness. This is obvious for very small sampling rates or very small minor heap sizes, so I assume that this was chosen explicitly for simplicity.

There actually is no problem with this, thanks to an interesting property of the geometric (and exponential) distribution : its memorylessness. Namely, if you know that a sample of the geometric distribution is larger than a given fixed bound (here, the top of the minor heap), then the conditional distribution is a shifted geometric distribution. What we do here is forgetting the sample which went past the bound of the minor heap, and use a fresh one (with the same distribution because of memorylessness) when the minor heap will be emptied.

@jhjourdan
Copy link
Contributor

@dra27 : I'd like to make sure the testsuite fails with the current code. Could you please wait while I propose a patch for this?

@dra27
Copy link
Member Author

dra27 commented Dec 1, 2020

The AppVeyor is #10055, so indeed unrelated.

@jhjourdan - no problem at all... this PR only sprang out of tracing code for #10055 and spotting that it looked odd!

@jhjourdan
Copy link
Contributor

Here is the patch I am proposing:

diff --git a/runtime/memprof.c b/runtime/memprof.c
index 385dc7b2d..74edb7d79 100644
--- a/runtime/memprof.c
+++ b/runtime/memprof.c
@@ -809,7 +809,8 @@ void caml_memprof_renew_minor_sample(void)
     if (Caml_state->young_ptr - Caml_state->young_alloc_start < geom)
       /* No trigger in the current minor heap. */
       caml_memprof_young_trigger = Caml_state->young_alloc_start;
-    caml_memprof_young_trigger = Caml_state->young_ptr - (geom - 1);
+    else
+      caml_memprof_young_trigger = Caml_state->young_ptr - (geom - 1);
   }
 
   caml_update_young_limit();
diff --git a/runtime/minor_gc.c b/runtime/minor_gc.c
index 4aa192254..85f87683e 100644
--- a/runtime/minor_gc.c
+++ b/runtime/minor_gc.c
@@ -171,7 +171,6 @@ void caml_set_minor_heap_size (asize_t bsz)
     Caml_state->young_alloc_start + Wsize_bsize (bsz) / 2;
   Caml_state->young_alloc_end = Caml_state->young_end;
   Caml_state->young_trigger = Caml_state->young_alloc_start;
-  caml_update_young_limit();
   Caml_state->young_ptr = Caml_state->young_alloc_end;
   Caml_state->minor_heap_wsz = Wsize_bsize (bsz);
   caml_memprof_renew_minor_sample();
diff --git a/runtime/signals.c b/runtime/signals.c
index 92d517e79..b5b11c2bb 100644
--- a/runtime/signals.c
+++ b/runtime/signals.c
@@ -229,6 +229,11 @@ value caml_execute_signal_exn(int signal_number, int in_signal_handler)
 
 void caml_update_young_limit (void)
 {
+  CAMLassert(Caml_state->young_alloc_start <= caml_memprof_young_trigger &&
+             caml_memprof_young_trigger <= Caml_state->young_alloc_end);
+  CAMLassert(Caml_state->young_alloc_start <= Caml_state->young_trigger &&
+             Caml_state->young_trigger < Caml_state->young_alloc_end);
+
   /* The minor heap grows downwards. The first trigger is the largest one. */
   Caml_state->young_limit =
     caml_memprof_young_trigger < Caml_state->young_trigger ?

This essentially adds an assertion in caml_update_young_limit which would fail if the triggers are not within bounds, and removes a call to caml_update_young_limit(); which makes the assertion fail but is redundant with a following call to caml_memprof_renew_minor_sample

@jhjourdan
Copy link
Contributor

This PR is not strictly speaking a bug, since, as @gadmm explained, it is unlikely that this will actually have an observable bad behavior, and this can only affect code which uses statmemprof. However, this is actually a C undefined behavior which could result in a bug if the compiler decides to use aggressive optimization (???) or if the minor heap is given a very low memory address. Therefore, I think we should definitely cherry-pick this to 4.12 and probably backport this to 4.11 if we plan to do another 4.11 release.

@dra27
Copy link
Member Author

dra27 commented Dec 1, 2020

I added commits with those patches - I've also re-worded Changes (I'd simply noted that it was going to ignore the skip instruction, without having put enough brain cycles into noticing that the trigger would virtually always be out of bounds anyway!)

@gadmm
Copy link
Contributor

gadmm commented Dec 1, 2020

There actually is no problem with this, thanks to an interesting property of the geometric (and exponential) distribution : its memorylessness. Namely, if you know that a sample of the geometric distribution is larger than a given fixed bound (here, the top of the minor heap), then the conditional distribution is a shifted geometric distribution. What we do here is forgetting the sample which went past the bound of the minor heap, and use a fresh one (with the same distribution because of memorylessness) when the minor heap will be emptied.

Thanks, of course!

@dra27
Copy link
Member Author

dra27 commented Dec 2, 2020

Rebased to get rid of the Changes conflict. I'll merge and cherry-pick when CI returns a convincing enough answer!

@dra27 dra27 merged commit 254963a into ocaml:trunk Dec 4, 2020
@dra27 dra27 deleted the fix-8684 branch December 4, 2020 13:31
dra27 added a commit that referenced this pull request Dec 4, 2020
Missing else in caml_memprof_renew_minor_sample

(cherry picked from commit 254963a)
dra27 added a commit that referenced this pull request Dec 4, 2020
Missing else in caml_memprof_renew_minor_sample

(cherry picked from commit 254963a)
@dra27
Copy link
Member Author

dra27 commented Dec 4, 2020

A most convincing answer from CI (we were one of the lucky AppVeyor runs!). Merged and cherry-picked to 4.11 and 4.12 (see references above)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants