Use macros from `limits.h` to prevent signed integer wrap-around warnings #13083

MisterDA · 2024-04-08T14:22:21Z

The code is currently correct since we use wrap-around semantics for signed integers (-fwrapv), but:

it's difficult to communicate that fact to static analyzers, which warn when computing the minimum integer with left-shifting 1 to the sign bit position (most-significant bit);
MSVC doesn't support wrap-around semantics, but historically hasn't optimized for this (so no harm), and might innocuously warn.

Using constants from <limits.h> instead allows for self-documenting code and silences these warnings.

Computing the minimum signed integer

From the standard (which I recall doesn't consider wrap-around semantics for signed integers):

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. […] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The problem is that the result of 1 << CHAR_BIT * sizeof(int) - 1, used to compute the minimum int, can't be represented in the result type without wrap-around (for a 64-bit type it's 2^63, but the maximum is 2^63-1).
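To make the point concrete, here is a minimal sketch (not code from this PR) contrasting the shift-based computation with the constant from <limits.h>. The names LONG_BITS, sign_bit_pattern and min_long are hypothetical, and the sketch assumes a two's complement long without padding bits. The shift is performed on an unsigned type, where it is well defined.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in a long, assuming no padding bits. */
#define LONG_BITS (CHAR_BIT * sizeof(long))

/* Assumption of this sketch: two's complement representation. */
static_assert(LONG_MIN + LONG_MAX == -1, "two's complement long expected");

/* Undefined behavior per the C standard unless the compiler is told to
   wrap (e.g. -fwrapv): the value 2^(N-1) exceeds LONG_MAX.
   Shown only as a comment:
     long bad_min = 1L << (LONG_BITS - 1);                             */

/* Well defined: shifting an unsigned value into the top bit. */
static unsigned long sign_bit_pattern(void)
{
  return 1UL << (LONG_BITS - 1);
}

/* Self-documenting alternative: take the constant from <limits.h>. */
static long min_long(void)
{
  return LONG_MIN;
}
```

Converting LONG_MIN to unsigned long is well defined (it is reduced modulo 2^N), so comparing the two values in the unsigned domain avoids any signed overflow.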

Introduce the INTNAT_MIN macro to avoid independent re-definitions of this value.

Is a change entry needed?
This also prevents warnings raised under Windows by clang-cl and improves code quality with MSVC.

(I might have confused undefined behavior with unspecified behavior, oh well)

runtime/bigarray.c

ghost · 2024-04-08T14:44:38Z

runtime/caml/config.h

@@ -140,16 +140,19 @@ typedef unsigned char uint8_t;
 typedef long intnat;
 typedef unsigned long uintnat;
 #define ARCH_INTNAT_PRINTF_FORMAT "l"
+#define INTNAT_MIN LONG_MIN


Have you tried moving these defines to runtime/caml/misc.h? runtime/caml/config.h doesn't include <limits.h> but these new macros depend on it, so it would make more sense to define them in a place where <limits.h> is included.

Hmmm, it's true that config.h is missing limits.h, but adding the following to misc.h also seems like wasted duplication.

#if SIZEOF_PTR == SIZEOF_LONG
/* Standard models: ILP32 or I32LP64 */
#define INTNAT_MIN LONG_MIN
#elif SIZEOF_PTR == SIZEOF_INT
/* Hypothetical IP32L64 model */
#define INTNAT_MIN INT_MIN
#elif SIZEOF_PTR == 8
/* Win64 model: IL32P64 */
#define INTNAT_MIN INT64_MIN
#endif

config.h could include limits.h instead, we've switched to C11, and most of the compatibility code around C99 integer types seems to have been added for old MSVC.

The preprocessor logic duplication is unfortunate but probably acceptable (with a comment telling it must match what's in config.h) if adding <limits.h> to config.h is considered too large a change.

I think a Changes entry will be required if config.h now includes <limits.h>.

I'm opting to add limits.h to config.h. I think a follow-up PR could switch all the macros and defines of config.h entirely to C99 fixed-width integers.

NickBarnes · 2024-04-17T13:24:49Z

I'll review this.

NickBarnes

This is all good, a clear improvement.

MisterDA · 2024-04-24T13:51:31Z

This is all good, a clear improvement.

Thanks, I've rebased on trunk and added you as a reviewer.

xavierleroy · 2024-04-28T16:46:24Z

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

NickBarnes · 2024-04-29T10:49:46Z

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

This makes sense to me, and could remove the test for SIZEOF_PTR == SIZEOF_LONG etc in config.h. It does need <stdint.h>, but although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if <stdint.h> were not available.

While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names). Maybe this PR would be a reasonable time to introduce them?

xavierleroy · 2024-04-29T11:16:52Z

although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if <stdint.h> were not available.

Right. <stdint.h> is standard since C99, and OCaml 5 requires C11, so we should use <stdint.h> unconditionally and remove the configure test for it.

MisterDA · 2024-04-29T17:33:59Z

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names).

Two good suggestions. I've changed the definitions to use the {u,}intptr_t limits, and namespaced the macros with the CAML_ prefix. It's technically a breaking change to move from UINTNAT_MAX to CAML_UINTNAT_MAX, but opam grep UINTNAT_MAX doesn't return anything.
Would you rather use INTPTR_MIN directly and not have CAML_INTNAT_MIN?
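
For readers following along, a rough sketch of what limits defined in terms of intptr_t could look like. The sketch_ and SKETCH_ names are hypothetical stand-ins, not the definitions that ended up in config.h, and the overflow helper is only there to show a typical use of such limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the runtime's intnat and uintnat types. */
typedef intptr_t sketch_intnat;
typedef uintptr_t sketch_uintnat;

/* Limits taken unconditionally from <stdint.h>, no SIZEOF_PTR tests. */
#define SKETCH_INTNAT_MIN INTPTR_MIN
#define SKETCH_INTNAT_MAX INTPTR_MAX
#define SKETCH_UINTNAT_MAX UINTPTR_MAX

/* Assumption of this sketch: two's complement representation. */
static_assert(SKETCH_INTNAT_MIN + SKETCH_INTNAT_MAX == -1,
              "two's complement intptr_t expected");

/* Returns nonzero if a + b would overflow, without ever computing it. */
static int sketch_add_overflows(sketch_intnat a, sketch_intnat b)
{
  if (b > 0) return a > SKETCH_INTNAT_MAX - b;
  return a < SKETCH_INTNAT_MIN - b;
}
```

Both subtractions in the helper stay within range for every input, so the check itself never relies on wrap-around.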

NickBarnes · 2024-05-02T10:35:41Z

What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

MisterDA · 2024-05-13T18:18:14Z

I've rebased this PR.

What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

I've introduced CAML_LONG_{MAX,MIN} macros replacing {Max,Min}_long. I think that LONG instead of INT is more consistent with the current naming. Should I retain the former names for compatibility? Are we convinced that this is a good idea?

NickBarnes · 2024-05-14T08:53:02Z

On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it.
Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance). If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.
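
As an illustration of what these two limits represent: OCaml's int is one bit narrower than the machine word because of the tag bit, so its range is half that of intptr_t. The sketch below derives the bounds by division, which is an assumption made for this example and not how mlvalues.h defines Max_long and Min_long. The SKETCH_ names and fits_in_ocaml_int are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the runtime calls these Max_long and Min_long. */
#define SKETCH_OCAML_INT_MAX (INTPTR_MAX / 2) /*  2^(N-2) - 1 */
#define SKETCH_OCAML_INT_MIN (INTPTR_MIN / 2) /* -2^(N-2)     */

/* The range is asymmetric, like any two's complement range. */
static_assert(SKETCH_OCAML_INT_MIN == -SKETCH_OCAML_INT_MAX - 1,
              "expected a two's complement style range");

/* Returns nonzero if n can be stored in an OCaml int without loss. */
static int fits_in_ocaml_int(intptr_t n)
{
  return n >= SKETCH_OCAML_INT_MIN && n <= SKETCH_OCAML_INT_MAX;
}
```

Division is used instead of a right shift because right-shifting a negative value is implementation-defined in C.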

MisterDA · 2024-05-14T10:07:35Z

On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it. Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance).

My thoughts also, I'll remove that commit.

If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.

But on 64-bit arches, only Val_long maps to a 63-bit integer, right? Not Val_int, which is cast to (int).

MisterDA · 2024-06-03T14:02:11Z

Rebased to fix conflicts, let me know if something can be improved.

MisterDA · 2024-06-10T08:15:16Z

Gentle ping to the reviewers, how can we move forward with this PR?

gasche · 2024-06-10T08:30:21Z

runtime/ints.c

    } else {
-      if (res >  (uint64_t)1 << 63) caml_failwith(INT64_ERRMSG);
+      if (res > (uint64_t)INT64_MAX + UINT64_C(1)) caml_failwith(INT64_ERRMSG);


Are these two changes necessary? They make the code harder to read for me.

(Just to clarify: the github UI shows only one changed line above my comment, but there is a similar change right above in the diff.)

Are these two changes necessary?

No.

They make the code harder to read for me.

I hoped to convey more meaning with a name, and show that there's an error if res is outside the range of int64_t.
The second one can be written as (uint64_t)INT64_MAX + 1 if that's nicer.

There is a comment above that says that we want to accept the range from -2^63 to 2^63-1. It is easy for me to understand why we want that, and to see the connection with the checks sign >= 0 && res >= (uint64_t)1 << 63 and sign < 0 && res > (uint64_t)1 << 63. On the other hand, to my untrained eyes the checks res > (uint64_t)INT64_MAX and res > (uint64_t)INT64_MAX + 1 look confusing and hard to relate to the comment above.

I've reverted this change, and found this one-liner (which I've not included) for the whole block in the meantime:

/* Signed representation expected, allow -2^63 to 2^63 - 1 only */
if (res > (uint64_t)INT64_MAX + (sign < 0)) caml_failwith(INT64_ERRMSG);
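
For anyone wanting to convince themselves that the shift-based checks and the INT64_MAX-based one-liner agree, here is a small standalone sketch. The function names are hypothetical, and the convention that a negative sign means a leading minus is inferred from the checks quoted in this thread. Each function returns nonzero when the magnitude res is out of range for int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Both forms rely on this identity: INT64_MAX + 1 == 2^63. */
static_assert((uint64_t)INT64_MAX + 1 == (uint64_t)1 << 63,
              "INT64_MAX + 1 must equal 2^63");

/* Shift-based form: magnitudes up to 2^63 - 1 are accepted for
   nonnegative numbers, and up to 2^63 for negative ones. */
static int out_of_range_shift(uint64_t res, int sign)
{
  if (sign >= 0) return res >= (uint64_t)1 << 63;
  return res > (uint64_t)1 << 63;
}

/* One-liner form: (sign < 0) evaluates to 0 or 1, which raises the
   bound by one for negative numbers. */
static int out_of_range_limits(uint64_t res, int sign)
{
  return res > (uint64_t)INT64_MAX + (sign < 0);
}
```

The two functions return the same result for every input. Which one reads better is the judgment call discussed above.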

MisterDA changed the title ~~Limits.h min int~~ Use macros from limits.h to prevent signed integer wrap-around warnings Apr 8, 2024

ghost reviewed Apr 8, 2024

MisterDA force-pushed the limits.h-min-int branch 3 times, most recently from 68289f9 to d36498c Compare April 11, 2024 18:53

dra27 assigned NickBarnes Apr 17, 2024

NickBarnes approved these changes Apr 24, 2024

MisterDA force-pushed the limits.h-min-int branch from d36498c to 3b61291 Compare April 24, 2024 13:51

ghost approved these changes Apr 24, 2024

NickBarnes mentioned this pull request Apr 29, 2024

We depend on <stdint.h> absolutely, not conditionally. #13134

Merged

MisterDA force-pushed the limits.h-min-int branch from 3b61291 to 1b08350 Compare April 29, 2024 17:20

MisterDA force-pushed the limits.h-min-int branch from 1b08350 to 3ba9f9b Compare May 13, 2024 18:16

MisterDA force-pushed the limits.h-min-int branch from 3ba9f9b to b350290 Compare May 14, 2024 10:10

MisterDA force-pushed the limits.h-min-int branch 2 times, most recently from 2b71514 to cc99c94 Compare June 3, 2024 14:01

MisterDA force-pushed the limits.h-min-int branch 2 times, most recently from dcd6d26 to c6ce70a Compare June 5, 2024 12:40

gasche reviewed Jun 10, 2024

MisterDA added 3 commits June 10, 2024 14:01

Use macros from limits.h for bounds when checking for overflow

753b89d

Introduce the macro INTNAT_MIN.

Use macros from limits.h when serializing ints and nativeints

b32aa47

This fixes the warning from MSVC raised on -0x80000000.
> warning C4146: unary minus operator applied to unsigned type, result
> still unsigned
The other replacements are made for consistency and, hopefully, legibility.

Add CAML_{U,}INTNAT_{MIN,MAX} macros exposing {u,}intnat limits

7168368
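
Some background on the C4146 warning mentioned in the second commit message, as a minimal sketch that is not taken from the runtime. It assumes int is at most 32 bits wide: the literal 0x80000000 then does not fit in int and gets an unsigned type, so negating it yields a positive unsigned value. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* INT32_MIN is a signed constant, unlike -0x80000000. */
static_assert(INT32_MIN < 0, "INT32_MIN is negative");

/* Assuming int is at most 32 bits: 0x80000000 is unsigned, so the
   negation wraps to 0x80000000 and the comparison is true. This is
   the expression shape that MSVC flags with C4146. */
static int negated_literal_is_positive(void)
{
  return -0x80000000 > 0;
}

/* Range check written with the <stdint.h> limits instead. */
static int fits_in_int32(int64_t n)
{
  return n >= INT32_MIN && n <= INT32_MAX;
}
```

Using the named limits avoids relying on how the compiler types the literal.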

MisterDA force-pushed the limits.h-min-int branch from c6ce70a to 7168368 Compare June 10, 2024 12:02

gasche approved these changes Jun 10, 2024

gasche added the merge-me label Jun 10, 2024

gasche merged commit b77b5c3 into ocaml:trunk Jun 11, 2024
18 checks passed

MisterDA deleted the limits.h-min-int branch June 12, 2024 06:10
