Introduce run-time checks that the domain lock is held #11506
(The changes to the |
I'm not (yet) sold on the idea that correct code should be penalised with a mandatory check. Assuming that the segfault is reproducible, isn't it enough then to be able to re-run with the debug runtime? |
What is the reasoning for expecting a cost? If you take one of the perf-sensitive functions (e.g. caml_modify and caml_initialize) and look at the generated code, you'll see that the load of Caml_state gets CSE'd and you are only left with a correctly-predicted branch. I do not think that recompiling their app with OCaml's debug mode is what users will think of for locating such bugs (and it is clearly not made for debugging users' programming errors).
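To make the cost argument concrete, here is a minimal sketch of the pattern, not the runtime's actual code. All names (sketch_state, sketch_current, sketch_store, sketch_bad_state) are hypothetical, and the failure path only counts errors where a real runtime would abort. The point is that the state pointer is loaded once, tested once, and the same loaded value is reused afterwards, so the check adds one well-predicted branch rather than a second load.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-thread domain state. */
typedef struct { long writes; } sketch_state;

/* NULL means "this thread does not hold the domain lock" in this sketch. */
static _Thread_local sketch_state *sketch_current = NULL;

/* Cold path. A real runtime would report a fatal error and abort;
   the sketch only counts failures so that it stays testable. */
static int sketch_failures = 0;
static void sketch_bad_state(void) { sketch_failures++; }

/* A store that needs the state anyway: the check reuses the pointer
   that the body has to load in any case. */
void sketch_store(long *slot, long v)
{
  sketch_state *st = sketch_current;  /* the single load */
  if (st == NULL) {                   /* the only added work: one branch */
    sketch_bad_state();
    return;
  }
  *slot = v;
  st->writes++;                       /* reuses st, no second load */
}
```

Whether a compiler merges repeated thread-local loads on its own depends on the target and optimisation level; the sketch sidesteps that by keeping the pointer in a local variable.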
7b13257 to d2df7ef
There is a Sandmark benchmark running. I added a commit that removes many of the checks via CAMLparam inside "internal" code. The Sandmark micro-benchmarks are very sensitive to code layout changes, so this change will help better interpret the benchmark results.
On re-reading (I think your new documentation helped, thanks!) I understand the idea better now, and I think it makes sense. Let's also wait for those Sandmark results.
Re. benchmarks. Here is a result for the original PR:

We are not going to get definitive answers with the Sandmark benchmarks. The suite contains many micro-benchmarks and those are usually very difficult to exploit and interpret. As it stands, the suite is not well-suited to evaluate performance impacts at this level. (In addition, it is good to keep in mind that they disable ASLR for reproducibility, as I have just learned, see https://github.com/ocaml-bench/notes/blob/master/apr19.md. This means one always observes the same "random" point in the space of possible code layouts. So when the results are identical from one day to another you cannot conclude that the actual variability between two runs is low. It would be better to make the code layout more random (à la Berger's Stabilizer) rather than reduce randomness.) Nevertheless this is not incompatible with this PR having a tiny overhead (much smaller than the variations seen on the graphs above, explaining that it is slightly skewed to the right).

Since we cannot count on Sandmark, I propose a more analytical approach: the checks are placed either where the load of

As for
b0a1974 to 408f5bc
I would like someone with more runtime expertise than myself to give the green light to the extra checks. @stedolan, @xavierleroy, @damiendoligez?
Ok. To sum-up:
|
The documentation needs to be reworded, but the implementation looks good, and I'm OK with the (lack of) measurable overhead.
manual/src/cmds/intf-c.etex
The domain state variable "Caml_state" checks that the domain lock is
held, either in debug mode or at key entry points of the C API.
This reads wrong: a variable doesn't do anything. In any case, Caml_state
only checks in debug mode, and you have another function to check in non-debug mode.
01df151 to a05dc1b
Thanks, I have improved the wording, rebased, and cleaned up the history.
@Octachron Do you agree to backport to 5.0? Otherwise this is an API breakage for 5.1 and it might be better to call the PR off. |
Rereading the history, this is a breaking change only compared to the alpha1 release, isn't it? (since previous versions could not use Splitting the invariant (NULL <=> domain lock not held) to the more specific In terms of API, I think it is reasonable to integrate this change in 5.0.
This is correct |
(manipulation error)
Implement suggestions from Gabriel Scherer and Damien Doligez
30aa645 to 2375b5c
I further cleaned up the history. Let's merge this so that I can make the necessary update to Boxroot in time.
Introduce run-time checks that the domain lock is held (cherry picked from commit f57bbc6)
Merged, and cherry-picked to 5.0 in 26b9861. |
Following #11485, this PR makes it simpler to find out why the user's program crashes when the domain lock is incorrectly acquired from C code.
Caml_state is not NULL (in debug mode)

For instance, when calling CAMLparam() without the domain lock, instead of a segfault, one gets a fatal error with message "f: no domain lock held", where f is the name of the function.

cc @gasche
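For readers who want to see the behaviour described above in isolation, here is a self-contained mock; it is an illustration under assumptions, not the code added by this PR. Every name in it (demo_domain_state, demo_state, demo_check_state, demo_acquire_lock, demo_release_lock, demo_entry_point) is hypothetical. Where the runtime would raise a fatal error, the mock records the message in a buffer so that the outcome can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-thread domain state pointer. */
typedef struct { int domain_id; } demo_domain_state;
static _Thread_local demo_domain_state *demo_state = NULL;

/* The runtime would abort with a fatal error; the mock stores the
   message here instead. */
static char demo_last_error[128] = "";

/* Returns 1 if the calling thread holds the (mock) domain lock.
   Otherwise records "<fn>: no domain lock held" and returns 0. */
static int demo_check_state(const char *fn)
{
  if (demo_state != NULL) return 1;
  snprintf(demo_last_error, sizeof demo_last_error,
           "%s: no domain lock held", fn);
  return 0;
}

/* Mock lock operations: holding the lock is modelled as a non-NULL
   state pointer, releasing it as NULL. */
static demo_domain_state demo_the_domain = { 0 };
void demo_acquire_lock(void) { demo_state = &demo_the_domain; }
void demo_release_lock(void) { demo_state = NULL; }

/* A mock C API entry point that performs the check before touching
   the state, rather than dereferencing NULL and segfaulting. */
int demo_entry_point(void)
{
  if (!demo_check_state(__func__)) return 0;
  return 1;
}
```

Calling demo_entry_point() before demo_acquire_lock() leaves a message naming the function in demo_last_error, which mirrors the diagnostic described in the PR text; after demo_acquire_lock() the call succeeds.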