Skip to content

Stricter C99 warnings, generate primitive list with types #11763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

MisterDA
Copy link
Contributor

I was working on this at the end of last week after reading Modernizing Fedora's C code, and over the week-end #11759 was opened which this PR should fix.

This PR:

  • enables new warning for stricter C99 generation;
  • fixes strict-prototypes and old-style-definition warnings;
  • generates the list of primitives with their argument list.

It generates the list of primitives with their arguments by running first the C preprocessor on C files containing primitives, extracting the functions with a sed script. It retains the caching behaviour introduced in #8696. It also introduces PrimitiveX macros for all primitive arities.

Possible follow-ups for this PR include checking support from the compiler for the warnings, defining a better CAMLprim macro declaring the arity of the function and possibly using it at runtime to check the arity of a called function.

@MisterDA
Copy link
Contributor Author

MisterDA commented Nov 28, 2022

@xavierleroy was faster than me with #11762 which is equivalent to my 54210b5.

@xavierleroy
Copy link
Contributor

Could you please submit 54210b5 (the middle commit) as a separate PR, so that it can be merged immediately? (Plus, it subsumes my PR #11762, so I'll be able to close #11762.)

@xavierleroy
Copy link
Contributor

I think your 54210b5 is more comprehensive than my #11762, so let's use your code instead.

@MisterDA
Copy link
Contributor Author

Submitted this commit as #11764. I've used your commit message instead of mine.

The new code uses the preprocessor to expand possible macros in the
generation of the primitive prototype (for instance, some comparison
functions in runtime/floats.c). The preprocessor will however remove
the token CAMLprim, so we redefine in the call CAMLprim to
CAMLprim()=0, as a function-like macro will not be expanded if not
given any arguments.

The list of primitives is kept sorted to benefit from stability if it
doesn't change and functions just moved around.

A sed script was extracted in a separate file to simplify quoting and
escaping in the Makefile.

The caching mechanism of runtime/primitives is preserved: if the list
of primitives hasn't changed, this file is not modified.
@MisterDA MisterDA force-pushed the stricter-c99 branch 2 times, most recently from ac1984f to 4c64e87 Compare November 29, 2022 13:30
Fedora is considering moving all its packages to C99 and enabling
stricter C99 warnings. OCaml has already moved to a C11 compiler,
let's enable these warnings on GCC-compatible compilers.

Some of the warnings on the following article are already covered by
-Wall and -Werror.

Reference: [Modernizing Fedora's C code][1].

[1]: https://lwn.net/Articles/913505/

fixup
sed scripts are extracted in files in order to simplify quoting and
escaping in the Makefile.

It is always possible to cast a function pointer to another function
pointer type, and cast again to the original type before calling the
pointed function. I pick the base

    typedef value (*c_primitive)(void);

to store the functions in the array, and cast them back in the
interpretor according to the arity described by the opcode.

Some reference:
- [Function pointers in C][1]
- [Function pointer casts][2]

[1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html
[2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
> Warn if storage-class specifiers like static are not the first
> things in a declaration.
@dra27
Copy link
Member

dra27 commented Nov 29, 2022

At the moment, this goes to considerable lengths to generate accurate prototypes for the primitives. Those prototypes are then forcibly cast to c_primitive in order to put them all in caml_builtin_cprim. Wouldn't it be easier just to declare them all as caml_prim(void) in prims.c? There's a tiny gotcha because runtime/caml/mlvalues.h contains the actual prototypes for caml_set_oo_name and caml_get_public_method but this can be relatively easier worked around to get this patch which when combined with just runtime/interp.c and runtime/caml/prims.h from 63cc827 also builds without triggering any warnings:

--- a/Makefile
+++ b/Makefile
@@ -771,10 +771,10 @@ runtime/primitives: \
        cp $^ $@

 runtime/prims.c : runtime/primitives
-       (echo '#define CAML_INTERNALS'; \
-         echo '#include "caml/mlvalues.h"'; \
-        echo '#include "caml/prims.h"'; \
-        sed -e 's/.*/extern value &();/' $<; \
+       (echo '#include "caml/config.h"'; \
+        echo 'typedef intnat value;'; \
+        echo 'typedef value (*c_primitive)(void);'; \
+        sed -e 's/.*/extern value &(void);/' $<; \
         echo 'c_primitive caml_builtin_cprim[] = {'; \
         sed -e 's/.*/  &,/' $<; \
         echo '  0 };'; \

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions and remarks. I admire your mastery of sed, but would still prefer a more civilized scripting language.

Comment on lines +780 to +781
$(CPP) -I./runtime -D'CAMLprim()=0' "$${prim}" | \
sed -n -f runtime/gen_primitives.sed; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the preprocessing really needed?

Copy link
Contributor Author

@MisterDA MisterDA Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preprocessor is needed for some symbols in runtime/floats.c:

ocaml/runtime/floats.c

Lines 1127 to 1131 in d0cdf4b

CAMLprim value caml_eq_float DEFINE_NAN_CMP(==)
CAMLprim value caml_le_float DEFINE_NAN_CMP(<=)
CAMLprim value caml_lt_float DEFINE_NAN_CMP(<)
CAMLprim value caml_ge_float DEFINE_NAN_CMP(>=)
CAMLprim value caml_gt_float DEFINE_NAN_CMP(>)

I couldn't extract the arguments without expanding the macros. Using the preprocessor also removes this concern (I forgot to remove the comment):

# If primitives contain duplicated lines (e.g. because the code is defined
# like
# #ifdef X
# CAMLprim value caml_foo() ...
# #else
# CAMLprim value caml_foo() ...
# #endif)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, thanks for the explanations. The DEFINE_NAN_CMP macro in runtime/floats.c is ugly as hell and could be rewritten. For the alternate definitions, we used to rely on sort | uniq to keep only one, but it gets more fragile when the prototypes are taken into account, that much is true.

Comment on lines +16 to +22
/^CAMLprim value/ {
:loop
/)/ ! {
N; b loop
}

s/\n */ /g
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you had conditional jumps in sed... This seems to be in POSIX sed, even.

This said, if you're willing to suppose primitive definitions take at most 2 lines, there's a loop-free approach:
on seeing /^CAMLprim /, fetch next line with N, then remove everything following the first ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but caml_array_fill would disagree!

ocaml/runtime/array.c

Lines 557 to 560 in ac52c3a

CAMLprim value caml_array_fill(value array,
value v_ofs,
value v_len,
value val)

I think there might be fewer surprises (apart from the script itself) if we loop in sed.

@@ -219,7 +219,7 @@ CAMLprim value caml_sys_exit(value retcode)
#endif
#endif

const static int sys_open_flags[] = {
static const int sys_open_flags[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the -Wold-style-declaration warning that I couldn't enable globally, as it's specific to GCC.

echo 'char * caml_names_of_builtin_cprim[] = {'; \
sed -e 's/.*/ "&",/' $<; \
sed -e 's/^\([a-z0-9_][a-z0-9_]*\) *(.*)/ "\1",/' $<; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this sed without setting LC_CTYPE makes me nervous. Experience shows that it fails with bizarre locales. Or, use named character classes, e.g. [[:alnum]_].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above suggested LC_ALL=C instead of LC_CTYPE. I'm not familiar with these, which one should I use? In the meantime, I've using export LC_ALL=C; where needed.

Comment on lines +772 to +777
# To speed up builds, we avoid changing "primitives" and
# "primitive-arguments" when files containing primitives change but
# the primitives table does not
runtime/primitives-arguments.new: \
$(runtime_BUILT_HEADERS) \
$(runtime_PRIMITIVES) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're moving code from a script into the toplevel Makefile. I'm not sure this is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok: we switch from one trick to use a script to conditionally add a prerequisite or not, which felt like advanced magic, to resetting the modified time of a file.
It does break the Dune build, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Dune build could be a concern. But my more general concern is that too much scripting within a Makefile renders the Makefile unreadable; at some point, having separate scripts that do one thing but do it well becomes preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that's why I've externalised the sed scripts.
However if we chose to keep using the preprocessor, I think it'll be difficult to give the $(CPP) content to the script due to usual concerns of spaces and word-splitting.

Using the preprocessor to extract the list of primitives solves the
issue raised by the example.
> Note that ISO C forbids casting a function pointer to type void *,
> but we've been doing this for a very long time, and all C compilers
> that I know of are happy with this.

(comment by Xavier Leroy)

This is documented by C11 Standard J.5.7, Function pointer casts.
@xavierleroy
Copy link
Contributor

Concerning @dra27's suggestion:

At the moment, this goes to considerable lengths to generate accurate prototypes for the primitives. Those prototypes are then forcibly cast to c_primitive in order to put them all in caml_builtin_cprim. Wouldn't it be easier just to declare them all as caml_prim(void) in prims.c? There's a tiny gotcha because runtime/caml/mlvalues.h contains the actual prototypes for caml_set_oo_name and caml_get_public_method but this can be relatively easier worked around […]

On the one hand, it's not legal C11 to declare value f(void) a function f that is defined in another compilation unit as value f(value), say; while I think it is correct to declare it value f(), like we currently do. On the other hand, this is no worse than the many other "not legal C11" things we already do, and it does work around the problem with great economy of means. So maybe we should commit this solution (plus the modified prims.h and interp.c files) first.

@MisterDA
Copy link
Contributor Author

MisterDA commented Jan 2, 2023

Thank you both for these rounds of comprehensive reviews. As it seems that keeping the primitives prototypes is not useful right now (which is on me, I may have dove to deep in sed and Makefile scripting), I'm closing this PR. I'll submit a simpler one soon, without the full prototypes and the preprocessor pass. We can always revisit this idea later.

@MisterDA MisterDA closed this Jan 2, 2023
MisterDA added a commit to MisterDA/ocaml that referenced this pull request Jan 12, 2023
It is always possible to cast a function pointer to another function
pointer type, and cast again to the original type before calling the
pointed function. I pick the base

    typedef value (*c_primitive)(void);

to store the functions in the array, and cast them back in the
interpreter according to the arity described by the opcode.

Some references:
- [Function pointers in C][1]
- [Function pointer casts][2]

[1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html
[2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

The primitives function pointers can be stored as simple void
pointers:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> Note that ISO C forbids casting a function pointer to type void *,
> but we've been doing this for a very long time, and all C compilers
> that I know of are happy with this.

This is documented by C11 Standard J.5.7, Function pointer casts.

The prototypes in `prims.c` and the headers differ, but:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> On the one hand, it's not legal C11 to declare value f(void) a
> function f that is defined in another compilation unit as value
> f(value), say; while I think it is correct to declare it value f(),
> like we currently do. On the other hand, this is no worse than the
> many other "not legal C11" things we already do.
@MisterDA MisterDA deleted the stricter-c99 branch April 6, 2023 13:08
MisterDA added a commit to MisterDA/ocaml that referenced this pull request Aug 25, 2023
It is always possible to cast a function pointer to another function
pointer type, and cast again to the original type before calling the
pointed function. I pick the base

    typedef value (*c_primitive)(void);

to store the functions in the array, and cast them back in the
interpreter according to the arity described by the opcode.

Some references:
- [Function pointers in C][1]
- [Function pointer casts][2]

[1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html
[2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

The primitives function pointers can be stored as simple void
pointers:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> Note that ISO C forbids casting a function pointer to type void *,
> but we've been doing this for a very long time, and all C compilers
> that I know of are happy with this.

This is documented by C11 Standard J.5.7, Function pointer casts.

The prototypes in `prims.c` and the headers differ, but:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> On the one hand, it's not legal C11 to declare value f(void) a
> function f that is defined in another compilation unit as value
> f(value), say; while I think it is correct to declare it value f(),
> like we currently do. On the other hand, this is no worse than the
> many other "not legal C11" things we already do.

(cherry picked from commit 8db1817)

Use strict prototypes on primitives in bytecomp

The bytecode compiler also extracts the list of primitives to a C
file, similarly as the Makefile. The Makefile was fixed in
8db1817 to account for stricter C
function prototypes, but the bytecomp problems were not discovered
back then.

This fixes the C code generation, and uses the same names and produces
the same output for the two generators.

(cherry picked from commit 0c2ba6a)
MisterDA added a commit to MisterDA/ocaml that referenced this pull request Aug 29, 2023
It is always possible to cast a function pointer to another function
pointer type, and cast again to the original type before calling the
pointed function. I pick the base

    typedef value (*c_primitive)(void);

to store the functions in the array, and cast them back in the
interpreter according to the arity described by the opcode.

Some references:
- [Function pointers in C][1]
- [Function pointer casts][2]

[1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html
[2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

The primitives function pointers can be stored as simple void
pointers:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> Note that ISO C forbids casting a function pointer to type void *,
> but we've been doing this for a very long time, and all C compilers
> that I know of are happy with this.

This is documented by C11 Standard J.5.7, Function pointer casts.

The prototypes in `prims.c` and the headers differ, but:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> On the one hand, it's not legal C11 to declare value f(void) a
> function f that is defined in another compilation unit as value
> f(value), say; while I think it is correct to declare it value f(),
> like we currently do. On the other hand, this is no worse than the
> many other "not legal C11" things we already do.

(cherry picked from commit 8db1817)

Use strict prototypes on primitives in bytecomp

The bytecode compiler also extracts the list of primitives to a C
file, similarly as the Makefile. The Makefile was fixed in
8db1817 to account for stricter C
function prototypes, but the bytecomp problems were not discovered
back then.

As the startup code includes `caml/mlvalues.h`, two primitives are
declared twice with conflicting types. We prevent this with
preprocessor tricks before including the header.

The generated C code now also uses the same names as in the Makefile.

(cherry picked from commit e650dd9)
MisterDA added a commit to MisterDA/ocaml that referenced this pull request Sep 18, 2023
It is always possible to cast a function pointer to another function
pointer type, and cast again to the original type before calling the
pointed function. I pick the base

    typedef value (*c_primitive)(void);

to store the functions in the array, and cast them back in the
interpreter according to the arity described by the opcode.

Some references:
- [Function pointers in C][1]
- [Function pointer casts][2]

[1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html
[2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

The primitives function pointers can be stored as simple void
pointers:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> Note that ISO C forbids casting a function pointer to type void *,
> but we've been doing this for a very long time, and all C compilers
> that I know of are happy with this.

This is documented by C11 Standard J.5.7, Function pointer casts.

The prototypes in `prims.c` and the headers differ, but:

(Comment by Xavier Leroy: ocaml#11763 (comment))

> On the one hand, it's not legal C11 to declare value f(void) a
> function f that is defined in another compilation unit as value
> f(value), say; while I think it is correct to declare it value f(),
> like we currently do. On the other hand, this is no worse than the
> many other "not legal C11" things we already do.

(cherry picked from commit 8db1817)

Use strict prototypes on primitives in bytecomp

The bytecode compiler also extracts the list of primitives to a C
file, similarly as the Makefile. The Makefile was fixed in
8db1817 to account for stricter C
function prototypes, but the bytecomp problems were not discovered
back then.

As the startup code includes `caml/mlvalues.h`, two primitives are
declared twice with conflicting types. We prevent this with
preprocessor tricks before including the header.

The generated C code now also uses the same names as in the Makefile.

(cherry picked from commit e650dd9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants