Signed integer overflow -- undefined behavior #32
Comments
A possible solution is to "define" the behavior of integer overflow by
Another solution is to make signed integer overflow also undefined in SSM, and reflect that in the interpreter. This would require some modification in the trace comparator to account for what that means. But I'm not sure undefined behavior is such a great idea in a language spec, since it means that the compiler is allowed to be less predictable. |
I didn't really know what to do here either :/ I always opt for whatever is simplest and works lol, even if it's a hack. Hacks are indeed fragile though. Perhaps reflecting in the trace somehow that it's undefined is best, but I am not sure how best to do that. |
@sedwards-lab suggested I look at what other languages do.
Java silently underflows or overflows, though Java 8 also provides the
Haskell leaves over/underflow behavior undefined. It seems to exhibit wraparound behavior in the REPL and at least in some instances of the compiled code, but this leaves the crash encountered in MultOverflow unexplained. For the purposes of our interpreter, overflow and underflow should be checked for and handled explicitly.
LLVM uses wraparound behavior by default for both signed and unsigned arithmetic, but allows this to be disabled using the
The matter of whether LLVM compiles to machine code with efficient overflow checks was studied and written about by John Regehr. This suggests to me that naively implementing the overflow checks ourselves in C is not necessarily the best solution, since the efficiency of our code will be subject to the whims of the C compiler and optimizer.
Golang explicitly specifies wraparound behavior, which precludes some compiler optimizations. In particular, one cannot assume
The solution used by both Rust and Zig is to use different behavior depending on the compilation mode: for debug builds, overflows are checked and cause a panic, while for release builds, overflows go unchecked. A discussion about integer overflow in Rust can be found here. |
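To make "checked for and handled explicitly" concrete, here is a minimal sketch of an addition check in portable C. It tests the operands before adding, so the overflowing addition is never executed. The name checked_add_i32 and its out-parameter convention are hypothetical and not taken from the SSM code base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
 * fits in int32_t; returns false and leaves *out untouched otherwise. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    /* Compare against the limits first, so the signed addition below
     * is only evaluated when it cannot overflow. */
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Whether a compiler turns this into an efficient overflow-flag test is exactly the kind of thing that depends on the optimizer, as noted above.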
I added the
We can do this by adding an additional event condition, |
When would we emit such an event? Wouldn't we need to check for the overflows in order to know when to emit it? It's true that it's not urgent. I'd be fine with whatever quick hacks would let us progress. |
When evaluating expressions, we could convert all integer values to
`Integer`, and check that the resulting value is within range before
converting back. This obviously kills performance but that's not a big deal
in the interpreter, where correctness is paramount.
|
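The widen-then-check idea from the comment above can be sketched as follows. The proposal concerns Haskell's `Integer` in the interpreter; this sketch uses C with a 64-bit intermediate instead, which is enough for 32-bit operands. The name checked_mul_i32 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: multiplies in a wider type, then checks that the
 * result is within the int32_t range before narrowing it again. */
bool checked_mul_i32(int32_t a, int32_t b, int32_t *out) {
    /* The product of two 32-bit values always fits in 64 bits. */
    int64_t wide = (int64_t)a * (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return false; /* would overflow int32_t */
    }
    *out = (int32_t)wide;
    return true;
}
```

With this check, a case like 46341 * 46341 is reported as an overflow instead of being evaluated.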
@j-hui how did you verify it was the C compiler's optimization that was doing this? I could see the product of two large positive integers producing a negative two's complement integer after overflow. |
I tried playing around with this with Godbolt Compiler Explorer. The simplest example can be found here: https://godbolt.org/z/8xjqhxTjW Basically, we're compiling this:
if (0 <= x * x)
    printf("then");
else
    printf("else");
If With
@sedwards-lab, to answer your question, I put in an example that more closely resembles what is in the test case: https://godbolt.org/z/avrGcxfx3. The difference is that It turns out that, with |
I initially encountered this and asked @svenssonjoel about it, but the code behaved differently on his machine (valid since it's all just undefined). If we do implement the fix where we emit an |
My latest thought: determinism uber alles should be a central goal of all of our work. I'd like integer arithmetic to have completely defined behavior, including overflow. The problem here seems to be that C compilers don't have completely deterministic integer arithmetic semantics, which is very unfortunate. What I'd like to do is to somehow generate deterministic C code for this example. I don't know how |
For my specific version of gcc I noticed that
int a = -some number-;
int b = -some other number-;
return a + b < 0;
would return
#ifdef DEFINED_INTEGER_PLUS
int add(int a, int b) { return a + b; }
#define ADD(a,b) add(a,b)
#else
#define ADD(a,b) ((a) + (b))
#endif
...
...
...
int a = -some number-;
int b = -some other number-;
return ADD(a,b) < 0;
gcc was no longer able to notice the overflow and optimize away the instructions. This does not seem robust, and I am not sure to what extent it works. It's a quick hack. |
@Rewbert this really only works if we can reliably prevent gcc from inlining the
@sedwards-lab +1 for determinism. But would reliably crashing on overflow be a reasonable semantics? I'm interested in this (rather than mandating overflow behavior) because we can do what Rust/Zig do: add in explicit checks for debug builds which crash on overflow, while removing these checks for release builds.
If we want to reliably but efficiently implement overflow behavior, we would probably need to sidestep C. The hard way to do this would be to hard-code the assembly code for each platform, but a somewhat easier way could be to implement each arithmetic operation in LLVM IR (for which overflow is well-defined) to take advantage of LLVM's various backends, and link it in; with a linker that supports link-time optimization these could all be inlined, to avoid all the unnecessary jumps. |
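The debug/release split described above could look roughly like this in C. This is a sketch under stated assumptions: SSM_CHECK_OVERFLOW and ssm_add are hypothetical names, and `__builtin_add_overflow` is a GCC/Clang builtin rather than standard C, so it is not portable to every compiler.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: compile with -DSSM_CHECK_OVERFLOW for a "debug"
 * build that crashes on overflow; leave it undefined for "release". */
static inline int32_t ssm_add(int32_t a, int32_t b) {
#ifdef SSM_CHECK_OVERFLOW
    int32_t result;
    /* GCC/Clang builtin: returns true if the exact sum does not fit. */
    if (__builtin_add_overflow(a, b, &result)) {
        fprintf(stderr, "integer overflow in %d + %d\n", (int)a, (int)b);
        abort();
    }
    return result;
#else
    /* Unchecked: overflow here is still undefined behavior in C. */
    return a + b;
#endif
}
```

Note that the unchecked branch keeps C's undefined behavior, so a release build gives no guarantee about what happens on overflow; only the checked build is deterministic.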
@sedwards-lab and I discussed this issue offline, and came to the following conclusions:
I need to look into what the C standard says about casting between signed and unsigned integer representations, since that is essential to our C implementation strategy being at all reliable. |
Sounds like a nice plan! |
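On the question of casting between signed and unsigned representations: C defines unsigned arithmetic as wrapping modulo 2^N, and converting a signed value to unsigned is likewise well-defined, but converting an out-of-range unsigned value back to a signed type is implementation-defined. The sketch below shows one way to get two's complement wraparound while avoiding that last conversion. All function names are hypothetical, and this is not necessarily the strategy the project settled on.

```c
#include <stdint.h>

/* Hypothetical helper: reinterprets a 32-bit pattern as two's complement
 * without relying on an implementation-defined conversion. */
static int32_t u32_to_i32(uint32_t u) {
    if (u <= (uint32_t)INT32_MAX) {
        return (int32_t)u;
    }
    /* u is in [2^31, 2^32 - 1]: subtract 2^31 in unsigned arithmetic,
     * then shift the small non-negative result down by adding INT32_MIN. */
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Wrapping addition: unsigned addition is defined to wrap modulo 2^32. */
static int32_t wrap_add_i32(int32_t a, int32_t b) {
    return u32_to_i32((uint32_t)a + (uint32_t)b);
}

/* Wrapping multiplication: multiply in 64 bits, keep the low 32 bits. */
static int32_t wrap_mul_i32(int32_t a, int32_t b) {
    uint64_t wide = (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
    return u32_to_i32((uint32_t)wide);
}
```

Under these semantics a comparison such as 0 < wrap_mul_i32(v, v) can no longer be folded to a constant on the assumption that overflow never happens: for v = 46341 the product wraps to a negative value.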
I found two bugs that are related to integer overflow; these will be checked into the low-regression test suite in #30.
I shrank these down into these two programs:
MultOverflowIndirect:
And MultOverflow:
These have the same general cause, but two different symptoms. The cause is that signed integer overflow is undefined in C, allowing the compiler to optimize a statement like `0 < (v * v)` into `1` when the value of `v` is known to be non-zero, so that the conditional and the computation of its condition can be pruned out altogether. Meanwhile, the Haskell interpreter naively evaluates the computation, which overflows.
In MultOverflowIndirect, this seems to cause the interpreter to take the `else` branch instead. Meanwhile, in MultOverflow, where I inlined the v0 value, it crashes for some reason.
I'm not familiar enough with the interpreter to directly diagnose this issue, but I know that @Rewbert came across this exact same issue before with integer arithmetic. His solution was to wrap `+` in a function `_add`, which would spook the optimizer enough that the problem went away. But it doesn't seem like a complete solution, since it doesn't address the root cause, and an aggressive-enough optimizer would probably be able to inline `_add` and prune out the branch anyway.