Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce run-time checks that the domain lock is held #11506

Merged
merged 4 commits into from
Sep 21, 2022

Conversation

gadmm
Copy link
Contributor

@gadmm gadmm commented Aug 23, 2022

Following #11485, this PR makes it simpler to find out why the user's program crashes when the domain lock is incorrectly acquired from C code.

  • Add an assertion that Caml_state is not NULL (in debug mode)
  • Add a check in public entry points of the C API (always)

For instance, when calling CAMLparam() without the domain lock, instead of a segfault, one gets a fatal error with message f: no domain lock held where f is the name of the function.

cc @gasche

@gadmm
Copy link
Contributor Author

gadmm commented Aug 23, 2022

(The changes to the Changes.md file assumes that it is merged in 5.0; indeed this changes the API: Caml_state is no longer synonymous with the new Caml_state_opt.)

@dra27
Copy link
Member

dra27 commented Aug 24, 2022

I'm not (yet) sold on the idea that correct code should be penalised with a mandatory check. Assuming that the segfault is reproducible, isn't it enough then to be able to re-run with the debug runtime?

@gadmm
Copy link
Contributor Author

gadmm commented Aug 24, 2022

What is the reasoning that this is going to cost? If you take one of the perf-sensitive functions (e.g. caml_modify and caml_initialize) and look at the generated code, you'll see that the load of Caml_state gets CSE'd and your are only left with a correctly-predicted branch.

I do not think that recompiling their app with OCaml's debug mode is what users will think about for locating such bugs (and it is clearly not made for debugging user's programming errors).

@gadmm gadmm force-pushed the caml_state_assertion branch 2 times, most recently from 7b13257 to d2df7ef Compare August 29, 2022 13:54
@gadmm
Copy link
Contributor Author

gadmm commented Aug 29, 2022

There is a sandmark benchmark running. I added a commit that removes many of the checks via CAMLparam inside "internal" code. The sandmark micro-benchmarks are very sensitive to code layout changes, so this change will help better interpret the benchmark results.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On re-reading (I think your new documentation helped, thanks!) I understand the idea better now, and I think it makes sense. Let's also wait for those sandmark results.

Changes Outdated Show resolved Hide resolved
manual/src/cmds/intf-c.etex Outdated Show resolved Hide resolved
@gadmm
Copy link
Contributor Author

gadmm commented Aug 30, 2022

Re. benchmarks. Here is a result for the original PR:
turing
The shape suggests that there could be a slight slowdown, yet by running a few individual benchmarks on my machine, the microarchitectural effects (relevant or irrelevant) seemed dominant (e.g. some would speed up on my machine instead of slowing down, etc). Hence I wanted to see the result of the new commit "Do not check Caml_state inside CAMLparam* when internal" below. The new commit reduces the number of check points from ~120 to ~40 in the OCaml runtime.
turing2
This shuffles things around but the overall shape is the similar, however note the change of scale. This is consistent with microarchitectural effects being mostly at play. For instance, I could reproduce the slowdown of quicksort by chance. This benchmark is heavy with caml_modify, so you would think this is the likely culprit. However the slowdown did not go away when removing the check inside caml_modify... but it did so when removing a check never called by the benchmark. I could not observe a cost of having a check inside caml_modify.

We are not going to get definitive answers with the Sandmark benchmarks. It contains many micro-benchmarks and those are usually very difficult to exploit and interpret. As it stands, the suite is not well-suited to evaluate performance impacts at this level.

(In addition, it is good to have in mind that they disable ASLR for reproducibility, as I have just learned, see https://github.com/ocaml-bench/notes/blob/master/apr19.md. This means one always observes the same "random" point in the space of possible code layouts. So when the results are identical from one day to another you cannot conclude that the actual variability between two runs is low. It would be better to make the code layout more random (à la Berger's Stabilizer) rather than reduce randomness.)

Nevertheless this is not incompatible with this PR having a tiny overhead (much smaller than the variations seen on the graphs above, explaining that it is slightly skewed to the right).

Since we cannot count on Sandmark, I propose a more analytical approach: the checks are placed either where the load of Caml_state_opt is redundant (eliminated by CSE), or if the function is obviously not so performance-sensitive. This means it is good to keep inside CAMLparam, but not inside CAMLdrop.

As for caml_modify I am mixed. There is CSE, but this still anticipates a load which might not always occur otherwise, even if the CPU can parallelize this load the rest of the function. On the other hand, if this difference mattered for caml_modify then there are other optimizations which could be done to it (e.g. some redundant loads that the compiler is not eliminating currently). I do not want to spend more time with benchmarks so I propose to err on the cautious side.

@gadmm
Copy link
Contributor Author

gadmm commented Aug 31, 2022

I am happy with this version if you are. With the last commits it "looks" better (with no statistical meaning whatsoever):
turing

@gasche
Copy link
Member

gasche commented Aug 31, 2022

I would like someone with more runtime expertise than myself to give the greenlight to the extra checks. @stedolan, @xavierleroy, @damiendoligez?

@gadmm
Copy link
Contributor Author

gadmm commented Aug 31, 2022

Ok. To sum-up:

  • Checks are placed in such a way that no extra load happens in anything remotely perf-sensitive (confirmed via disassembler).
  • This is best reviewed by looking at the global diff of the PR (for once). I'll clean-up the history later.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation needs to be reworded, but the implementation looks good, and I'm OK with the (lack of) measurable overhead.

Comment on lines 2503 to 2504
The domain state variable "Caml_state" checks that the domain lock is
held, either in debug mode or at key entry points of the C API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads wrong: a variable doesn't do anything. In any case, Caml_state only checks in debug mode, and you have another function to check in non-debug mode.

@gadmm
Copy link
Contributor Author

gadmm commented Sep 7, 2022

Thanks, I have improved the wording, rebased, and cleaned-up the history.

@gadmm
Copy link
Contributor Author

gadmm commented Sep 7, 2022

@Octachron Do you agree to backport to 5.0? Otherwise this is an API breakage for 5.1 and it might be better to call the PR off.

Changes Outdated Show resolved Hide resolved
@Octachron
Copy link
Member

Rereading the history, this a breaking change only compared to the alpha1 release, isn't it? (since previous versions could not use Caml_state to test if the current thread held the domain lock.)

Splitting the invariant (NULL <=> domain lock not held) to the more specific Caml_state_opt seems quite forward-compatible.

In term of API, I think it is reasonable to integrate this change in 5.0 .

@gadmm
Copy link
Contributor Author

gadmm commented Sep 8, 2022

This is correct

Copy link
Contributor Author

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(manipulation error)

Changes Outdated Show resolved Hide resolved
Implement suggestions from Gabriel Scherer and Damien Doligez
@gadmm
Copy link
Contributor Author

gadmm commented Sep 9, 2022

I further cleaned-up history. Let's merge this so that I can make the necessary update to Boxroot in time.

@gasche gasche merged commit f57bbc6 into ocaml:trunk Sep 21, 2022
gasche added a commit that referenced this pull request Sep 21, 2022
Introduce run-time checks that the domain lock is held

(cherry picked from commit f57bbc6)
@gasche
Copy link
Member

gasche commented Sep 21, 2022

Merged, and cherry-picked to 5.0 in 26b9861.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants