-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
This is a summary document with the issues we have found during C++ port. We intend to use this as a guidance how to resolve compilation issues if we want to have a common code base which compiles with C and C++ compilers
Every problem has been solved, in at least one way, and all problems have working solutions.
("working" meaning many CI lanes are green, just finishing off Windows.)
Also most these proposals are already in master to some extent (except "autocast" and enum operators), but not every instance of every problem has been fixed yet in master.
The github links often do not work particularly well, jumping to a commit or PR but not landing on the correct lines, without waiting and/or scrolling around.
void*-vs-non-void*-function-return [voidp-ret]
This is far and away our biggest problem. While the obvious fix is very safe and mechanical and locally small, and everyone understands it, and it obviously preserves semantics, it is the largest due to number of occurrences.
C allows the following:
void* get_foo();
int *a = get_foo();gcc allows it with -fpermissive but msvc and clang do not.
Proposals
Insert casts as needed [voidp-ret-cast]
This is very simple, works, no #ifdef __cplusplus, very voluminous.
int *a = (int*)get_foo();e.g.
#10045
67906d3#diff-34d834b8d69724f49f21964be1e1258dL85
This seems like the most obvious choice, but there are others.
An example of a PR "with all the casts", without "auto cast" is:
https://github.com/mono/mono/pull/9950/files
"auto cast" does not reduce this to zero, but it does reduce it a lot.
Resolution
TBD
[voidp-ret-autocast]
In many cases, not all, a trick:
#ifdef __cplusplus
struct g_cast { .. something-not-hypothetical .. };
#else
#define g_cast(x) x
#endif
#define get_foo() (g_cast (get_foo())This works with g_malloc, g_malloc0, and many others, but not all.
It does cut down the line damage from other choices.
It would have to be combined with another option.
https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR57
+
https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR1272
Still leaving all these casts:
#10036
Change output signature [voidp-ret-voidpp]
int* a;
get_foo((void**)&a);No examples written but obvious.
Disadvantage: Prefer not to change C signatures.
Sometimes change output signature [voidp-ret-strengthen]
This is probably how those specific functions should be, but does not apply broadly.
Change g_malloc to g_new [voidp-ret-gnew]
It works, but does not apply broadly due to "array at end" and "other allocators".
Templatize [voidp-ret-templatize]
This is probably outside scope at this time.
Resolution
TBD
void*-vs-non-void* in function parameters values [voidp-param]
For example:
void mono_gc_wbarrier_generic_store (..., MonoObject*);
gpointer value;
mono_gc_wbarrier_generic_store (value_addr, value); // error: value must be MonoObject*Proposals
[voidp-param-cast]
mono_gc_wbarrier_generic_store (value_addr, (MonoObject*)value);weaken parameter type [voidp-param-weaken]
void mono_gc_wbarrier_generic_store (..., gpointer);
gpointer value;
mono_gc_wbarrier_generic_store (value_addr, value);If something is "weakly typed" make it a gpointer/void*, not a char* or guint8*:
"Weakly typed" isn't well defined.
Arguably guint8* is a strong type -- "array of bytes".
Arguably MonoObject* is a strong type.
It seems a little unfair to declare the root of a type hierarchy as weak, in the face of multiple such hierarchies.
Disadvantage: Probably better to leave C signatures all alone. Therefore, cast.
Resolution
TBD
void*-vs-non-void* in struct fields [voidp-vs-void-struct-field]
Proposals
Change struct fields from gpointer
For example:
#9888
Cast more
Note that struct that are not declared, are still opaque.
#9761
is a good change. It helps debugging and maintainability. Or cast more.
Resolution
TBD
Function pointers do not silently convert to void* in C++ [functionp-to-voidp-output]
For example, as function return values.
Occasionally strengthen type:
when they are of the same type.
Usually cast:
37c7d3a
Sometimes via a macro:
https://github.com/mono/mono/pull/10036/files#diff-51871b3c9f77c36adec88b19019cb69aR560
gpointer get_foo()
{
#define return return (gpointer)
if (..) return f1;
else if (..) return f2;
else if (..) return f3;
#undef return
}The macros seems a little sleazy, but sometimes it is quite a few lines.
This found a bug (that I wrote, oops):
#10117
Resolution
TBD
Function pointers do not silently convert to void* in C++ [functionp-to-voidp-input]
For example, as function parameters.
void register_jit_icall (gpointer);
void f1();
void f2()
{
register_jit_icall(f1); // error
}Occasionally strengthen type:
Thread creation could be fixed here.
Cast, or a macro:
#define foo(x, y) (foo((gpointer)(x), (y))
Example:
https://github.com/mono/mono/pull/10036/files#diff-b00b862030776d18eacb5c73934413a8R53
https://github.com/mono/mono/pull/9764/files#diff-d3ba4b3a1317ed875ae9f7f56ce199b8R2050
Resolution
TBD
Enum vs. int [enum-vs-int]
Enums cannot be assigned from integers and don't allow for math, or somesuch.
typedef enum Color { red, blue } Color;
Color r = 0;
Color purple = red + blue;
Color b = r + 1;Proposals
Insert casts [enum-vs-int-cast]
67906d3#diff-de21c55465380611eb87eed09b51e13eR18
67906d3#diff-e7e458b6256eaa730c145f14a666652aR776
Generate operator overloads [enum-vs-int-operator-overload]
https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR92
Seems strange, but is idiomiatic. Windows.h offers this.
Limits line damage.
Use ints instead, as that is what C enums are.
This is kinda an obvious safe change, but debugging suffers.
Debugging suffers because debuggers can show enums by their names or even as bitwise-or of names, as long as a variable is actually typed with a typedef.
The retyping of a enum as int could be C++ specific to limit the debugging damage.
[enum-vs-int-just-int-always]
enum { red, blue };
typedef int Color;[enum-vs-int-just-int-cxx]
enum Color { red, blue };
#ifdef __cplusplus
typedef int Color;
#else
typedef enum Color Color;
#endifResolution
TBD
Goto-around-init
C++ does not allow:
int f1();
int f2()
{
if (f1())
goto exit; // error: goto around init of i
int i = f1();
exit:
return 0;
}Even though i is not used after exit:.
Proposals
Split decl from init [goto-around-init-split-decl-from-init]
int f2()
{
if (f1())
goto exit;
int i;
i = f1();
exit:
return 0;
}Jump up [goto-around-init-goto-up]
int f2()
{
if (0) {
exit:
return 0;
}
if (f1())
goto exit;
int i = f1();
goto exit;
}Second is less damage, sometimes -- there needs to be multiple goto to be worthwhile, and have to move the end of the function.
https://github.com/mono/mono/pull/9764/files#diff-9df70935849c9fb54aea9bff7a39533fR1170
Resolution
TBD
Where to put extern "C"?
Proposals
Everywhere
Only where needed
Very initial work put extern "C" around every header (but not #include -- each header owns its own content.)
This allowed a gradual port -- mini, metadata, utils, sgen.
As the port progressed, the need to support a mixed tree both fell away, and in future could become detrimental. In particular, you cannot have extern "C" overloads or templates -- though you can selectively wrap in extern "C++" to nest those within an otherwise extern "C" context.
As well, I later had the epiphany to build extern "C" into MONO_API and ICALL_EXPORT.
That is in master.
Still later (8/14/2018) as most platforms are passing in CI, I come to WebAssembly. I have not fully discovered all of its "bindings" and as such have sprinkled some additional extern "C" around to make it pass.
If mechanisms such as name-overloading and templates become banned by future policy, then extern "C" around .h and even .c becomes a significant enforcement of that policy.
Resolution
TBD
Designated initializers
C++ has no designated initializers.
Current clang, no problem.
Current gcc, some problems.
Centos6/gcc4.4: no support
msvc?
Proposals
There are two kinda disjoint cases here -- locals and globals.
They could be merged, but the tradeoffs are kinda opposite.
locals: memset
memset them and fill in each field by name (or index, hypothetically).
Locals could also be handled like globals, but it is extra unneeded verification on the ordering.
5157357#diff-40253f389761b6525e7f5594c2d94debR1840
globals: ensure correct order.
have the initializer match up with the field order.
They all do already from what I've seen.
include+define: must line up in order
Mostly already fixed in master:
5157357
Residual either I missed or new since then:
#10109
Resolution
TBD
'a': int vs. char [char-literal-is-char-or-int]
In C, the type of 'a' is int (sizeof('a')==4). In C++ it is char (sizeof('a')==1)
This is a semantic difference and a slight risk.
Indexing an array with a character constant, as mono does, gets a warning. Cast to int.
Resolution
TBD
Keywords
Mono has identifiers like new, class, namespace, typename.
Rename to new_, klass, name_space, type_name, etc.
(C++ has other new keywords: template, operator, etc.)
Resolution
TBD
Complex/imaginary numbers
Mono uses complex/imaginary math, that was not valid C++.
While C++ has an ancient, small, inlinable complex number library, libstdc++/libc++ dependency is to be avoided at this stage, inlining is not guaranteed, so this has been ported to just use double.
Resolution
TBD
C++11 user defined literals "foo"bar
In C++11 you can have "user defined literals" which are strings appended with a type-specific user-provided suffix. For example..and I don't exact the syntax, I have never done this but something like:
class distance { distance(inches); distance(feet);. };
class time { time(seconds); time(milliseconds); }
time five_seconds = "5"s;
time half_second = "500"ms;
distance football_field = "300"ft; // or "100"yd
distance just_made_it_by_a_few = "3"inch;This is ambiguous with C++98 and C such as:
#include <inttypes.h>
#define PRI32 "%d"
int32_t i;
printf("i is "PRI32"\n", i);And an error.
To fix, just put spaces before/after quotes:
printf("i is " PRI32 "\n", i);
After suffices, before/after is clearer.
Resolution
TBD
WebAssembly requires types be unique.
https://jenkins.mono-project.com/job/test-mono-pull-request-wasm/3652/parsed_console/log.html
invalid enum type
!19255 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !7355, producer: "clang version 5.0.0 (emscripten 1.37.36 : 1.37.36)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !19256, retainedTypes: !19257, imports: !19263)
!19256 = !{!7363, !6973, !168}
!7363 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !7364, line: 192, size: 128, elements: !7365, identifier: "_ZTS10ParseState")
/mnt/jenkins/workspace/test-mono-pull-request-wasm/sdks/builds/toolchains/emsdk/clang/e1.37.36_64bit/llvm-link: error: linked module is broken!
Rename to make them so, or use #define.
Resolution
TBD
wchar and unsigned short are different types [wchart-is-not-ushort]
Mono does this on Windows:
typedef unsigned short gunichar2;
gunichar2* foo;
wcslen(foo);This is incorrect and generates warnings or errors. wchar_t is not unsigned short. It is a unique type.
Fix is to use whatever the platform defines:
C++ rationale: This is because you can overload.
unsigned short is a "small number".
wchar_t is a "wide character".
"characters" and "numbers" are different.
print(wchar_t);
print(unsigned short);
print(L'a') // "a"
print((unsigned short)65) // "65"Resolution
TBD
Misalligned if-defs
C++ compiler finds random mistakes.
For example, in sgen we have:
#if defined (HOST_WIN32)
...
#elif defined (HAVE_UNISTD_H)
...
#endiffollowed by the opposite:
#if defined (HAVE_UNISTD_H)
...
#elif defined (HOST_WIN32)
...
#endifBut the defines are not mutually exclusive. The two chunks are meant to match up.
The fix is to have a consistent order, or maybe something else, since that is still error-prone.
Resolution
TBD
Incorrect Prototypes
Some functions have deliberately incorrect prototypes.
This is anything involving THREAD_INFO_TYPE and MonoGCDescriptor (mono_gc_alloc_fixed).
These are hacks and they lead to C++ link failures.
Proposals
Sprinkle extern "C" around more.
Typesafe wrappers or overloads.
https://github.com/mono/mono/pull/10087/files#diff-bd3a3d5c2bfd6af46373b3844de9c1d0R444
https://github.com/mono/mono/pull/10087/files#diff-bd3a3d5c2bfd6af46373b3844de9c1d0R163
Resolution
TBD
Copied instead of shared enums
Mono duplicates enums that are meant to be the same.
Either share the types as intended, or cast more.
#6700
or
https://github.com/mono/mono/pull/10087/files#diff-d5708d26cfbc2037ac98c34b2e708387R1807
Resolution
TBD
Incorrect Code
The C++ compiler rejects obviously incorrect code that the C compiler accepts.
Resolution
Fix the code
How much to port?
glib, utils, metadata, sgen, mini, pedump, monodis, monograph, unit-tests, "main", ios, Android, Mac, Linux, WebAssembly, Windows, arm, arm64, amd64, x86 all building and passing in CI (branch configure.ac hacked to always be C++). All fixes are in master or will be shortly.
Windows MSBuild project files were successfully prototyped (/TP) and Visual C++-specific problems fixed.
The more that is ported, the less extern "C" is needed.
libtest, profiler, and MonoPosixHelper were ported, easily, but extern "C" added to workaround instead.