Assertions cleanups, annotate unreachable code#13242
Conversation
| #elif defined(_MSC_VER) | ||
| #define unreachable() (__assume(0)) | ||
| #else | ||
| #define unreachable() (abort()) |
There was a problem hiding this comment.
I'd rather see this defined as
| #define unreachable() (abort()) | |
| #define unreachable() (assert(0)) |
so that, should it ever fire, this gets recognized as an assertion failure rather than a mysterious SIGABRT.
(And yes, given how eager clang and gcc are at arm-wrestling the C specs in order to win at picobenchmarks, I am expecting such unreachable code paths to be reachable when compiled with these compilers at some of their optimization levels).
There was a problem hiding this comment.
But what if C assertions are disabled with NDEBUG? The code path may then be missing a return statement since there won't be a noreturn function to end it.
Similarly, if we used CAMLassert instead, it would result in an empty statement in non-debug runtimes, triggering the same problem.
If I keep an assertion and add a return statement, I'll probably get warnings about an unreachable return statement after the unreachable annotation…
There was a problem hiding this comment.
There is no simple way to solve this without making every compiler combination happy. You probably should not remove the few return following unreachable then.
There was a problem hiding this comment.
The compilers differ too much in behavior… but in my test they complain about missing returns rather that unreachable code. I've changed my code to use CAMLassert here.
|
I am nervous about In the cases where we know for sure that this place is dead code, |
That is my understanding too.
These are valid concerns that I'll try to address. I've remembered in the meantime that GCC can warn whenever a The diff of |
I'm willing to bet the CI will expose at least one compiler which will nevertheless emit warnings for this (its name probably starts with "MS" and ends with "VC"). |
|
I'm a bit disappointed by my last endeavor. GCC, clang, and MSCV have oh-so-subtly different behaviors regarding unreachable warnings and switch exhaustiveness check. This makes the code somewhat more verbose than I've previously hoped. |
ghost
left a comment
There was a problem hiding this comment.
Aside the previous remark, I'm satisfied with the changes as they are now.
|
Antonin Décimo (2024/06/19 01:58 -0700):
In the meantime I'm tempted to leave this as-is an revisit later, if
that's fine.
I'll trust you on that, @dustanddreams?
|
|
Miod Vallat (2024/06/19 02:02 -0700):
@dustanddreams commented on this pull request.
> @@ -1418,7 +1418,7 @@ do_resume: {
#ifndef THREADED_CODE
default:
#ifdef _MSC_VER
- __assume(0);
+ unreachable();
That's fine.
Ah sorry, didn't see this.
|
|
Ah, this now needs to be rebased to resolve the conflict in Feel free to ping me when done. |
|
Rebase is done. Maybe @gasche still has comments? |
|
My comment remains that if some of those I only looked at the new version briefly but my impression is that it suffers from the same problem. For example:
|
68ce0f6 to
3bbb449
Compare
For debug mode, there's a possible alternative implementation where we trap the program if it reaches an
You're right, but...
I can convince myself that the runtime currently cannot fall in the default cases. It's true that it's true that this doesn't prevent us from future errors. As for user FFI code, to me this example shares a lot in common with converting the sum type ocaml representation to and from a C array of integers (think socket flags, or
On the other hand the error case for the interpreter seems to be only handled for the non-threaded code interpreter (debug mode or MSVC). Am I reading the code wrong? What happens when (threaded-code) Adding |
I see no benefit to making our system less usable/debuggable in this way. (In particular I very much doubt that there is any measurable performance benefit -- in the bigarray stuff.) Why do you want to do this? My impression (but I may be wrong and I'm not judging in any way) is that you think that In terms of gcc intrisics, I think we want
Again, this could be done with a hint that says "this code path is unlikely" and then follows with a fatal error, we don't need to introduce more undefined behaviors in our programs to get code-generation benefits. |
|
Thank you for the pointers, that was really helpful. I've changed the definition of |
| default: | ||
| #ifdef _MSC_VER | ||
| __assume(0); | ||
| #else |
There was a problem hiding this comment.
This change may have a negative performance impact, I'm not sure. Have you tested it?
There was a problem hiding this comment.
(The test that I have in mind: use a dumb microbenchmark that does something compute-heavy for long enough, and compare the performance using ocamlrun before and after the change. If you don't see any noticeable performance difference then we should be fine.)
There was a problem hiding this comment.
I've struggled to show that retaining unreachable() in the interpreter brings more than marginally performance benefits. Maybe a more intensive task, or one that would run even longer would show more impressive results.
I tried with merkletrees/2.ml from the programming language benchmarks, with this patch applied to trunk (disabling unreachable), and the current PR (enabling unreachable) (rebased on the same trunk):
diff --git a/runtime/interp.c b/runtime/interp.c
index 3518d8a4cb..baad591fa5 100644
--- a/runtime/interp.c
+++ b/runtime/interp.c
@@ -1417,13 +1417,9 @@ do_resume: {
#ifndef THREADED_CODE
default:
-#ifdef _MSC_VER
- __assume(0);
-#else
caml_fatal_error("bad opcode (%"
ARCH_INTNAT_PRINTF_FORMAT "x)",
(intnat) *(pc-1));
-#endif
}
}
#endiftrunk>hyperfine "runtime\ocamlrun.exe .\ocaml.exe -I stdlib ..\..\merkletrees-2.ml 20"
Benchmark 1: runtime\ocamlrun.exe .\ocaml.exe -I stdlib ..\..\merkletrees-2.ml 20
Time (mean ± σ): 211.071 s ± 2.631 s [User: 209.878 s, System: 0.744 s]
Range (min … max): 207.575 s … 214.750 s 10 runs
assertions>hyperfine "runtime\ocamlrun.exe .\ocaml.exe -I stdlib ..\..\merkletrees-2.ml 20"
Benchmark 1: runtime\ocamlrun.exe .\ocaml.exe -I stdlib ..\..\merkletrees-2.ml 20
Time (mean ± σ): 207.112 s ± 3.019 s [User: 205.886 s, System: 0.642 s]
Range (min … max): 203.065 s … 211.719 s 10 runs
The benchmarks were done on a x64 4-cores Windows 11 VM using MSVC 19.41.33923.
|
In the latest revision, I've introduced two new macros: |
|
The PR doesn't change the behavior of the interpreter if it encounters an invalid opcode. I think that's better reexamined in another PR or issue. |
| caml_array_bound_error(); | ||
| offset = offset * b->dim[i] + (index[i] - 1); | ||
| } | ||
| break; |
There was a problem hiding this comment.
Shouldn't this be here? (cf. L1140)
| break; | |
| break; | |
| default: CAMLtrap(); |
There was a problem hiding this comment.
I've thought about this a bit more, and the bigarray layout is represented by a single bit (#define CAML_BA_LAYOUT_MASK 0x100). There are no invalid values here, if the compiler warns, we can mark it as unreachable (if ever we start using -Wswitch-default or -Wswitch-enum, but they can make the unnecessarily verbose).
There was a problem hiding this comment.
so I've removed the default case.
There was a problem hiding this comment.
Ah, the default case is needed otherwise the compiler thinks there are code paths leaving some variables uninitialized. I'll add default: unreachable(); for switches on the bigarray layout.
|
I've rebased this PR and added a fix for an incorrect assertion I've found using cppcheck in the meantime. Let me know if there's more you'd like to see or unsee here. |
gasche
left a comment
There was a problem hiding this comment.
I had yet yet yet another look at this PR.
The changes are mostly in the default branches of switches that we believe are exhaustive in practice. (Plus one change on the landing code for unsupported bytecodes.)
In some of those default switches you call CAMLtrap(), which is intended to trap, and on some others you call unreachable(), which is intended to be undefined-behavior. The fact that we do two different things rubs me the wrong way. I would prefer if we used a single macro, which is intended to mean "we believe that this point cannot be reached" and which traps. It would be simpler and safer. Is there a good reason not to do this?
(I think that unreachable() would be a perfect name for this macro, because I think that this is what a macro called unreachable() should do. Obviously the designers of C compilers have favored a different design with __builtin_unreachable as they favor performance over correctness. If using unreachable() to mean trap() is shocking to you, maybe impossible()?)
Thanks a lot for your time and energy!
I'm finally using
This would be fine by me. C23 defines I've also experimented with |
It is the job of the compiler to check that the branch is useless and remove it. If they get it wrong, at worst the program is slightly slower. It should not be the job of the author to decide between "helpful macro that will print a user message if I get it wrong" and "shoot myself in the foot", unless you can demonstrate that (1) compilers do get it wrong, and (2) the performance benefits in this specific case matter and justify the loss of safety. I was okay with if-then-else branches in the bigarray machinery, and I am also okay with switches that are slightly more verbose but also more explicit. But I don't want, as the author/reviewer/maintainer/evolver of this code, to have to think hard about which of the two macros I should use, unless you can demonstrate clear benefits to imposing this complexity on us. |
|
I tried entering the following code in goldbolt: enum caml_ba_layout {
CAML_BA_C_LAYOUT = 0, /* Row major, indices start at 0 */
CAML_BA_FORTRAN_LAYOUT = 0x100, /* Column major, indices start at 1 */
};
#define CAML_BA_LAYOUT_SHIFT 8 /* Bit offset of layout flag */
#define CAML_BA_LAYOUT_MASK 0x100 /* Mask for layout in flags field */
#define layout(v) (enum caml_ba_layout)(v & CAML_BA_LAYOUT_MASK)
extern void f(void);
extern void g(void);
/* Type your code here, or load an example. */
void test(int n) {
switch (layout(n)) {
case CAML_BA_C_LAYOUT: return (f());
case CAML_BA_FORTRAN_LAYOUT: return (g());
default: __builtin_trap();
}
}With GCC 14.1, with |
|
This makes sense. Thanks for your comments. Now |
|
If I understand correctly, in C++ Otherwise the rest of the PR looks fine to me now, thanks! |
|
I agree that |
Rather than using the compiler's unreachable builtin, C23, or C++'s unreachable annotation, we prefer trapping the program execution when an unreachable path is taken. MSVC currently has a bug with noreturn/unreachable code analysis and its __fastfail intrinsic in C, requiring the noreturn wrapper function. https://developercommunity.visualstudio.com/t/C-Code-Analysis-should-understand-that/10665570
We use CAMLunreachable() to trap the program execution if an unreachable code path is taken (in case of a programming error, data corruption, …). The C compiler might be able to optimize the unreachable branch away.
Makes it easier to identify which part of the assertion has failed.
In some cases an compilers, this allows to remove the unreachable annotation, and the compiler checks that all cases are covered. In general, it also makes it easier to reason about the space of values. In some other cases, the unreachable annotation is needed to garantee that some variables won't be left uninitialized.
Reported by cppcheck.
|
Approved again. Thanks for the masterful history rewriting. |
|
Thanks for your insights and your perseverance ;) |
|
@MisterDA Unfortunately, this PR breaks compilation on Ubuntu 20.04: The compiler is: |
This PR sports small cleanups around assertions in the runtime.
unreachable()to mark unreachable code where it makes sense, and allow C compilers to optimize the code.CAMLassert(A && B);intoCAMLassert(A); CAMLassert(B)to better identify which part of the assertion fails.CAMLassertcan be replaced withstatic_assert.(using the unreachable annotation will later be useful to optimize
ocamlctoo!)No change entry needed, I think.cc @NickBarnes