Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

As possible, eliminate undefined behavior from stdlib #7812

Open
vicuna opened this issue Jun 23, 2018 · 4 comments

Comments

Projects
None yet
1 participant
@vicuna
Copy link

commented Jun 23, 2018

Original bug ID: 7812
Reporter: @whitequark
Status: acknowledged (set by @xavierleroy on 2018-06-23T15:39:30Z)
Resolution: open
Priority: normal
Severity: minor
Category: standard library

Bug description

Unlike the C language, the OCaml language does not have a strict notion of undefined behavior that gives huge leeway to optimizer and is relatively well understood by programmers. In spite of that, the OCaml standard library documents certain behaviors as "undefined".

This is undesirable for several reasons. First, just saying "undefined" does not communicate very well what could happen, which is anything. In C, if an execution ever encounters undefined behavior, the entire execution (before, during, and after) is illegal, and can perform any operation, but the OCaml programmer is not familiar with this.

Second, the standard library leaves as undefined certain behaviors that could trivially be made perfectly well defined, or at least aren't actually undefined and ought to be documented as e.g. raising an exception.

Here is the complete list:

  • Filename.chop_suffix: would raise an exception or chop an unrelated part of filename. Should be fixed to raise Invalid_argument in that case instead, for consistency with chop_extension.

  • Buffer.to_seq, Buffer.to_seqi: raises an exception if a buffer is truncated, or returns no new elements if it's extended. There doesn't appear to be any implementation benefit from leaving it undefined, although I may just be missing it and in that case please correct me.

  • Sort.merge: returns incorrect values. (It's deprecated since 2000 so it can probably be removed already.)

  • Int32.of_float, Int64.of_float, Nativeint.of_float, Targetint.of_float: would actually invoke UB in C on out-of-range conversions. Either the FP environment should be set up appropriately, or the documentation should get a very stern warning that anything can happen, including the OCaml process dying with SIGFPE. This is the only case where it actually makes sense (barely, perhaps) to have UB.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 23, 2018

Comment author: @xavierleroy

Concerning the FP-to-integer conversions, your analysis is overly pessimistic. Of course we assume the C compiler to implement IEEE-754 FP arithmetic as described in Annex F of the ISO C standards. The Annex says:

F.4 Floating to integer conversion

If the floating value is infinite or NaN or if the integral part of
the floating value exceeds the range of the integer type, then the
‘‘invalid’’ floating-point exception is raised and the resulting value
is unspecified.

F.7.3 Execution

At program startup the floating-point environment is initialized as
prescribed by IEC 60559:
[...]
— Trapping or stopping (if supported) is disabled on all
floating-point exceptions.

So, an overflow in FP->integer conversion is not UB and will just return an unspecified integer value.

Additionally, chapter 7 (standard library) specifies no library function to enable trapping or stopping behaviors for FP exceptions. (In glibc, it's a nonstandard function, feenableexcept().) So, you have to go out of your way to get a SIGFPE out of an FP->integer overflow.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 23, 2018

Comment author: @xavierleroy

More generally, I agree that the use of "undefined behavior" in OCaml's library function descriptions is a bad choice of words, given the sinister implications "undefined behavior" has in C. For OCaml functions, it's more like "unspecified results", i.e. the function can return any value of its return type, or raise any exception, but will have no other effect.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 10, 2018

Comment author: @whitequark

I think you're right that this is not UB, but an unspecified bit pattern is still quite bad because upon "use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance".

That is, if you have a function

uint8_t f() { return (uint8_t)1e100; }

then both of these programs may print "taken":

if(f() == 1 && f() == 2) puts("taken");

uint8_t x = f();
if(x ^ x) puts("taken");

(LLVM has a section explicitly calling out this behavior as valid: https://llvm.org/docs/LangRef.html#undefined-values).

In practice, f may be translated to just (on e.g. x86) ret, and the return value will be taken from whatever junk was in rax. You can also stumble upon code that would exhibit apparently absurd yet deterministic behavior when the optimizer chooses to interpret the unspecified bit pattern differently in different instances, or behavior differing depending on optimization level. (There are well-known examples of such code that you will be able to find, though I don't consider them important for discussion because the standard clearly permits this behavior already.)

This is of course less bad than UB but I would argue that this still does not match the mental model an OCaml programmer may have about unspecified bit patterns. In particular, an OCaml programmer (and, really, most C programmers) would expect an unspecified bit pattern to stay the same so long as whatever operation produces it stays the same, but this isn't the case in C.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 10, 2018

Comment author: @whitequark

I should perhaps talk more about real-world impact of this use of unspecified values in OCaml runtime. I don't think any real-world C compiler out there would actually produce code for the OCaml runtime that would lead to algebraically impossible results like f () != f (), although it would certainly be permitted to.

However, real-world C compilers will definitely produce code that in case of out-of-range conversions will have different results on different platforms (off the top of my head I think x86 and ARM64 would have different behavior here, though I'm far from an expert) and with different optimization levels, and in my opinion the OCaml compiler and runtime should either not permit this, or very clearly state that this may happen.

@vicuna vicuna added stdlib bug labels Mar 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.