-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Stricter C99 warnings, generate primitive list with types #11763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@xavierleroy was faster than me with #11762 which is equivalent to my 54210b5. |
Submitted this commit as #11764. I've used your commit message instead of mine. |
4e48b31
to
68da857
Compare
68da857
to
4a705b5
Compare
The new code uses the preprocessor to expand possible macros in the generation of the primitive prototype (for instance, some comparison functions in runtime/floats.c). The preprocessor will however remove the token CAMLprim, so we redefine in the call CAMLprim to CAMLprim()=0, as a function-like macro will not be expanded if not given any arguments. The list of primitives is kept sorted to benefit from stability if it doesn't change and functions just moved around. A sed script was extracted in a separate file to simplify quoting and escaping in the Makefile. The caching mechanism of runtime/primitives is preserved: if the list of primitives hasn't changed, this file is not modified.
ac1984f
to
4c64e87
Compare
Fedora is considering moving all its packages to C99 and enabling stricter C99 warnings. OCaml has already moved to a C11 compiler, let's enable these warnings on GCC-compatible compilers. Some of the warnings on the following article are already covered by -Wall and -Werror. Reference: [Modernizing Fedora's C code][1]. [1]: https://lwn.net/Articles/913505/ fixup
sed scripts are extracted in files in order to simplify quoting and escaping in the Makefile. It is always possible to cast a function pointer to another function pointer type, and cast again to the original type before calling the pointed function. I pick the base typedef value (*c_primitive)(void); to store the functions in the array, and cast them back in the interpretor according to the arity described by the opcode. Some reference: - [Function pointers in C][1] - [Function pointer casts][2] [1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html [2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
> Warn if storage-class specifiers like static are not the first > things in a declaration.
4c64e87
to
6433d1f
Compare
At the moment, this goes to considerable lengths to generate accurate prototypes for the primitives. Those prototypes are then forcibly cast to --- a/Makefile
+++ b/Makefile
@@ -771,10 +771,10 @@ runtime/primitives: \
cp $^ $@
runtime/prims.c : runtime/primitives
- (echo '#define CAML_INTERNALS'; \
- echo '#include "caml/mlvalues.h"'; \
- echo '#include "caml/prims.h"'; \
- sed -e 's/.*/extern value &();/' $<; \
+ (echo '#include "caml/config.h"'; \
+ echo 'typedef intnat value;'; \
+ echo 'typedef value (*c_primitive)(void);'; \
+ sed -e 's/.*/extern value &(void);/' $<; \
echo 'c_primitive caml_builtin_cprim[] = {'; \
sed -e 's/.*/ &,/' $<; \
echo ' 0 };'; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions and remarks. I admire your mastery of sed
, but would still prefer a more civilized scripting language.
$(CPP) -I./runtime -D'CAMLprim()=0' "$${prim}" | \ | ||
sed -n -f runtime/gen_primitives.sed; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the preprocessing really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preprocessor is needed for some symbols in runtime/floats.c
:
Lines 1127 to 1131 in d0cdf4b
CAMLprim value caml_eq_float DEFINE_NAN_CMP(==) | |
CAMLprim value caml_le_float DEFINE_NAN_CMP(<=) | |
CAMLprim value caml_lt_float DEFINE_NAN_CMP(<) | |
CAMLprim value caml_ge_float DEFINE_NAN_CMP(>=) | |
CAMLprim value caml_gt_float DEFINE_NAN_CMP(>) |
I couldn't extract the arguments without expanding the macros. Using the preprocessor also removes this concern (I forgot to remove the comment):
# If primitives contain duplicated lines (e.g. because the code is defined
# like
# #ifdef X
# CAMLprim value caml_foo() ...
# #else
# CAMLprim value caml_foo() ...
# #endif)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, thanks for the explanations. The DEFINE_NAN_CMP
macro in runtime/floats.c
is ugly as hell and could be rewritten. For the alternate definitions, we used to rely on sort | uniq
to keep only one, but it gets more fragile when the prototypes are taken into account, that much is true.
/^CAMLprim value/ { | ||
:loop | ||
/)/ ! { | ||
N; b loop | ||
} | ||
|
||
s/\n */ /g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know you had conditional jumps in sed... This seems to be in POSIX sed, even.
This said, if you're willing to suppose primitive definitions take at most 2 lines, there's a loop-free approach:
on seeing /^CAMLprim /
, fetch next line with N
, then remove everything following the first )
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but caml_array_fill
would disagree!
Lines 557 to 560 in ac52c3a
CAMLprim value caml_array_fill(value array, | |
value v_ofs, | |
value v_len, | |
value val) |
I think there might be fewer surprises (apart from the script itself) if we loop in sed.
@@ -219,7 +219,7 @@ CAMLprim value caml_sys_exit(value retcode) | |||
#endif | |||
#endif | |||
|
|||
const static int sys_open_flags[] = { | |||
static const int sys_open_flags[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the -Wold-style-declaration
warning that I couldn't enable globally, as it's specific to GCC.
echo 'char * caml_names_of_builtin_cprim[] = {'; \ | ||
sed -e 's/.*/ "&",/' $<; \ | ||
sed -e 's/^\([a-z0-9_][a-z0-9_]*\) *(.*)/ "\1",/' $<; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this sed
without setting LC_CTYPE
makes me nervous. Experience shows that it fails with bizarre locales. Or, use named character classes, e.g. [[:alnum]_]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above suggested LC_ALL=C
instead of LC_CTYPE
. I'm not familiar with these, which one should I use? In the meantime, I've using export LC_ALL=C;
where needed.
# To speed up builds, we avoid changing "primitives" and | ||
# "primitive-arguments" when files containing primitives change but | ||
# the primitives table does not | ||
runtime/primitives-arguments.new: \ | ||
$(runtime_BUILT_HEADERS) \ | ||
$(runtime_PRIMITIVES) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're moving code from a script into the toplevel Makefile. I'm not sure this is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be ok: we switch from one trick to use a script to conditionally add a prerequisite or not, which felt like advanced magic, to resetting the modified time of a file.
It does break the Dune build, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Dune build could be a concern. But my more general concern is that too much scripting within a Makefile renders the Makefile unreadable; at some point, having separate scripts that do one thing but do it well becomes preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that's why I've externalised the sed scripts.
However if we chose to keep using the preprocessor, I think it'll be difficult to give the $(CPP)
content to the script due to usual concerns of spaces and word-splitting.
Using the preprocessor to extract the list of primitives solves the issue raised by the example.
> Note that ISO C forbids casting a function pointer to type void *, > but we've been doing this for a very long time, and all C compilers > that I know of are happy with this. (comment by Xavier Leroy) This is documented by C11 Standard J.5.7, Function pointer casts.
Concerning @dra27's suggestion:
On the one hand, it's not legal C11 to declare |
Thank you both for these rounds of comprehensive reviews. As it seems that keeping the primitives prototypes is not useful right now (which is on me, I may have dove to deep in sed and Makefile scripting), I'm closing this PR. I'll submit a simpler one soon, without the full prototypes and the preprocessor pass. We can always revisit this idea later. |
It is always possible to cast a function pointer to another function pointer type, and cast again to the original type before calling the pointed function. I pick the base typedef value (*c_primitive)(void); to store the functions in the array, and cast them back in the interpreter according to the arity described by the opcode. Some references: - [Function pointers in C][1] - [Function pointer casts][2] [1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html [2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf The primitives function pointers can be stored as simple void pointers: (Comment by Xavier Leroy: ocaml#11763 (comment)) > Note that ISO C forbids casting a function pointer to type void *, > but we've been doing this for a very long time, and all C compilers > that I know of are happy with this. This is documented by C11 Standard J.5.7, Function pointer casts. The prototypes in `prims.c` and the headers differ, but: (Comment by Xavier Leroy: ocaml#11763 (comment)) > On the one hand, it's not legal C11 to declare value f(void) a > function f that is defined in another compilation unit as value > f(value), say; while I think it is correct to declare it value f(), > like we currently do. On the other hand, this is no worse than the > many other "not legal C11" things we already do.
It is always possible to cast a function pointer to another function pointer type, and cast again to the original type before calling the pointed function. I pick the base typedef value (*c_primitive)(void); to store the functions in the array, and cast them back in the interpreter according to the arity described by the opcode. Some references: - [Function pointers in C][1] - [Function pointer casts][2] [1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html [2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf The primitives function pointers can be stored as simple void pointers: (Comment by Xavier Leroy: ocaml#11763 (comment)) > Note that ISO C forbids casting a function pointer to type void *, > but we've been doing this for a very long time, and all C compilers > that I know of are happy with this. This is documented by C11 Standard J.5.7, Function pointer casts. The prototypes in `prims.c` and the headers differ, but: (Comment by Xavier Leroy: ocaml#11763 (comment)) > On the one hand, it's not legal C11 to declare value f(void) a > function f that is defined in another compilation unit as value > f(value), say; while I think it is correct to declare it value f(), > like we currently do. On the other hand, this is no worse than the > many other "not legal C11" things we already do. (cherry picked from commit 8db1817) Use strict prototypes on primitives in bytecomp The bytecode compiler also extracts the list of primitives to a C file, similarly as the Makefile. The Makefile was fixed in 8db1817 to account for stricter C function prototypes, but the bytecomp problems were not discovered back then. This fixes the C code generation, and uses the same names and produces the same output for the two generators. (cherry picked from commit 0c2ba6a)
It is always possible to cast a function pointer to another function pointer type, and cast again to the original type before calling the pointed function. I pick the base typedef value (*c_primitive)(void); to store the functions in the array, and cast them back in the interpreter according to the arity described by the opcode. Some references: - [Function pointers in C][1] - [Function pointer casts][2] [1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html [2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf The primitives function pointers can be stored as simple void pointers: (Comment by Xavier Leroy: ocaml#11763 (comment)) > Note that ISO C forbids casting a function pointer to type void *, > but we've been doing this for a very long time, and all C compilers > that I know of are happy with this. This is documented by C11 Standard J.5.7, Function pointer casts. The prototypes in `prims.c` and the headers differ, but: (Comment by Xavier Leroy: ocaml#11763 (comment)) > On the one hand, it's not legal C11 to declare value f(void) a > function f that is defined in another compilation unit as value > f(value), say; while I think it is correct to declare it value f(), > like we currently do. On the other hand, this is no worse than the > many other "not legal C11" things we already do. (cherry picked from commit 8db1817) Use strict prototypes on primitives in bytecomp The bytecode compiler also extracts the list of primitives to a C file, similarly as the Makefile. The Makefile was fixed in 8db1817 to account for stricter C function prototypes, but the bytecomp problems were not discovered back then. As the startup code includes `caml/mlvalues.h`, two primitives are declared twice with conflicting types. We prevent this with preprocessor tricks before including the header. The generated C code now also uses the same names as in the Makefile. (cherry picked from commit e650dd9)
It is always possible to cast a function pointer to another function pointer type, and cast again to the original type before calling the pointed function. I pick the base typedef value (*c_primitive)(void); to store the functions in the array, and cast them back in the interpreter according to the arity described by the opcode. Some references: - [Function pointers in C][1] - [Function pointer casts][2] [1]: https://frama-c.com/2013/08/24/Function-pointers-in-C.html [2]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf The primitives function pointers can be stored as simple void pointers: (Comment by Xavier Leroy: ocaml#11763 (comment)) > Note that ISO C forbids casting a function pointer to type void *, > but we've been doing this for a very long time, and all C compilers > that I know of are happy with this. This is documented by C11 Standard J.5.7, Function pointer casts. The prototypes in `prims.c` and the headers differ, but: (Comment by Xavier Leroy: ocaml#11763 (comment)) > On the one hand, it's not legal C11 to declare value f(void) a > function f that is defined in another compilation unit as value > f(value), say; while I think it is correct to declare it value f(), > like we currently do. On the other hand, this is no worse than the > many other "not legal C11" things we already do. (cherry picked from commit 8db1817) Use strict prototypes on primitives in bytecomp The bytecode compiler also extracts the list of primitives to a C file, similarly as the Makefile. The Makefile was fixed in 8db1817 to account for stricter C function prototypes, but the bytecomp problems were not discovered back then. As the startup code includes `caml/mlvalues.h`, two primitives are declared twice with conflicting types. We prevent this with preprocessor tricks before including the header. The generated C code now also uses the same names as in the Makefile. (cherry picked from commit e650dd9)
I was working on this at the end of last week after reading Modernizing Fedora's C code, and over the week-end #11759 was opened which this PR should fix.
This PR:
It generates the list of primitives with their arguments by running first the C preprocessor on C files containing primitives, extracting the functions with a sed script. It retains the caching behaviour introduced in #8696. It also introduces
PrimitiveX
macros for all primitive arities.Possible follow-ups for this PR include checking support from the compiler for the warnings, defining a better
CAMLprim
macro declaring the arity of the function and possibly using it at runtime to check the arity of a called function.