Skip to content

C/C++ incompatibilities #10085

@jaykrell

Description

@jaykrell

This is a summary document with the issues we have found during C++ port. We intend to use this as a guidance how to resolve compilation issues if we want to have a common code base which compiles with C and C++ compilers

Every problem has been solved, in at least one way, and all problems have working solutions.
("working" meaning many CI lanes are green, just finishing off Windows.)

Also most these proposals are already in master to some extent (except "autocast" and enum operators), but not every instance of every problem has been fixed yet in master.

The github links often do not work particularly well, jumping to a commit or PR but not landing on the correct lines, without waiting and/or scrolling around.

void*-vs-non-void*-function-return [voidp-ret]

This is far and away our biggest problem. While the obvious fix is very safe and mechanical and locally small, and everyone understands it, and it obviously preserves semantics, it is the largest due to number of occurrences.

C allows the following:

void* get_foo();
int *a = get_foo();

gcc allows it with -fpermissive but msvc and clang do not.

Proposals

Insert casts as needed [voidp-ret-cast]

This is very simple, works, no #ifdef __cplusplus, very voluminous.

int *a = (int*)get_foo();

e.g.
#10045
67906d3#diff-34d834b8d69724f49f21964be1e1258dL85

This seems like the most obvious choice, but there are others.

An example of a PR "with all the casts", without "auto cast" is:
https://github.com/mono/mono/pull/9950/files
"auto cast" does not reduce this to zero, but it does reduce it a lot.

Resolution

TBD

[voidp-ret-autocast]

In many cases, not all, a trick:

#ifdef __cplusplus
struct g_cast { .. something-not-hypothetical .. };
#else
#define g_cast(x) x
#endif

#define get_foo() (g_cast (get_foo())

This works with g_malloc, g_malloc0, and many others, but not all.
It does cut down the line damage from other choices.
It would have to be combined with another option.

https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR57
+
https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR1272

Still leaving all these casts:
#10036

Change output signature [voidp-ret-voidpp]

int* a;
get_foo((void**)&a);

No examples written but obvious.
Disadvantage: Prefer not to change C signatures.

Sometimes change output signature [voidp-ret-strengthen]

#10042

This is probably how those specific functions should be, but does not apply broadly.

Change g_malloc to g_new [voidp-ret-gnew]

It works, but does not apply broadly due to "array at end" and "other allocators".

Templatize [voidp-ret-templatize]

This is probably outside scope at this time.

Resolution

TBD

void*-vs-non-void* in function parameters values [voidp-param]

For example:

void mono_gc_wbarrier_generic_store (..., MonoObject*);
gpointer value;
mono_gc_wbarrier_generic_store (value_addr, value); // error: value must be MonoObject*

Proposals

[voidp-param-cast]

mono_gc_wbarrier_generic_store (value_addr, (MonoObject*)value);

weaken parameter type [voidp-param-weaken]

void mono_gc_wbarrier_generic_store (..., gpointer);
gpointer value;
mono_gc_wbarrier_generic_store (value_addr, value);

If something is "weakly typed" make it a gpointer/void*, not a char* or guint8*:

#9881

"Weakly typed" isn't well defined.
Arguably guint8* is a strong type -- "array of bytes".
Arguably MonoObject* is a strong type.
It seems a little unfair to declare the root of a type hierarchy as weak, in the face of multiple such hierarchies.

Disadvantage: Probably better to leave C signatures all alone. Therefore, cast.

Resolution

TBD

void*-vs-non-void* in struct fields [voidp-vs-void-struct-field]

Proposals

Change struct fields from gpointer

For example:
#9888

Cast more

Note that struct that are not declared, are still opaque.
#9761

is a good change. It helps debugging and maintainability. Or cast more.

Resolution

TBD

Function pointers do not silently convert to void* in C++ [functionp-to-voidp-output]

For example, as function return values.

Occasionally strengthen type:

#10057

when they are of the same type.

Usually cast:
37c7d3a

Sometimes via a macro:
https://github.com/mono/mono/pull/10036/files#diff-51871b3c9f77c36adec88b19019cb69aR560

gpointer get_foo()
{
#define return return (gpointer)
 if (..) return f1;
 else if (..) return f2;
 else if (..) return f3;
#undef return
}

The macros seems a little sleazy, but sometimes it is quite a few lines.

This found a bug (that I wrote, oops):
#10117

Resolution

TBD

Function pointers do not silently convert to void* in C++ [functionp-to-voidp-input]

For example, as function parameters.

void register_jit_icall (gpointer);

void f1();

void f2()
{
	register_jit_icall(f1); // error
}

Occasionally strengthen type:

Thread creation could be fixed here.

Cast, or a macro:

#9992

#define foo(x, y) (foo((gpointer)(x), (y))

Example:
https://github.com/mono/mono/pull/10036/files#diff-b00b862030776d18eacb5c73934413a8R53
https://github.com/mono/mono/pull/9764/files#diff-d3ba4b3a1317ed875ae9f7f56ce199b8R2050

Resolution

TBD

Enum vs. int [enum-vs-int]

Enums cannot be assigned from integers and don't allow for math, or somesuch.

typedef enum Color { red, blue } Color;

Color r = 0;
Color purple = red + blue;
Color b = r + 1;

Proposals

Insert casts [enum-vs-int-cast]

67906d3#diff-de21c55465380611eb87eed09b51e13eR18

67906d3#diff-e7e458b6256eaa730c145f14a666652aR776

Generate operator overloads [enum-vs-int-operator-overload]

https://github.com/mono/mono/pull/10027/files#diff-b8c6c2ba344450884373fb00f33685adR92
Seems strange, but is idiomiatic. Windows.h offers this.
Limits line damage.

Use ints instead, as that is what C enums are.

This is kinda an obvious safe change, but debugging suffers.
Debugging suffers because debuggers can show enums by their names or even as bitwise-or of names, as long as a variable is actually typed with a typedef.
The retyping of a enum as int could be C++ specific to limit the debugging damage.

[enum-vs-int-just-int-always]

enum { red, blue };
typedef int Color;

[enum-vs-int-just-int-cxx]

enum Color { red, blue };
#ifdef __cplusplus
typedef int Color;
#else
typedef enum Color Color;
#endif

Resolution

TBD

Goto-around-init

C++ does not allow:

int f1();

int f2()
{
  if (f1())
     goto exit; // error: goto around init of i
  int i = f1();
exit:
  return 0;
}

Even though i is not used after exit:.

Proposals

Split decl from init [goto-around-init-split-decl-from-init]

int f2()
{
  if (f1())
    goto exit;
  int i;
  i = f1();
exit:
  return 0;
}

#10132
#9785

Jump up [goto-around-init-goto-up]

int f2()
{
  if (0) {
exit:
    return 0;
  }
  if (f1())
    goto exit;
  int i = f1();
  goto exit;
}

Second is less damage, sometimes -- there needs to be multiple goto to be worthwhile, and have to move the end of the function.

https://github.com/mono/mono/pull/9764/files#diff-9df70935849c9fb54aea9bff7a39533fR1170

Resolution

TBD

Where to put extern "C"?

Proposals

Everywhere

Only where needed

Very initial work put extern "C" around every header (but not #include -- each header owns its own content.)
This allowed a gradual port -- mini, metadata, utils, sgen.
As the port progressed, the need to support a mixed tree both fell away, and in future could become detrimental. In particular, you cannot have extern "C" overloads or templates -- though you can selectively wrap in extern "C++" to nest those within an otherwise extern "C" context.

As well, I later had the epiphany to build extern "C" into MONO_API and ICALL_EXPORT.
That is in master.

Still later (8/14/2018) as most platforms are passing in CI, I come to WebAssembly. I have not fully discovered all of its "bindings" and as such have sprinkled some additional extern "C" around to make it pass.

If mechanisms such as name-overloading and templates become banned by future policy, then extern "C" around .h and even .c becomes a significant enforcement of that policy.

#10141
#9814

Resolution

TBD

Designated initializers

C++ has no designated initializers.

Current clang, no problem.
Current gcc, some problems.
Centos6/gcc4.4: no support
msvc?

Proposals

There are two kinda disjoint cases here -- locals and globals.
They could be merged, but the tradeoffs are kinda opposite.

locals: memset

memset them and fill in each field by name (or index, hypothetically).
Locals could also be handled like globals, but it is extra unneeded verification on the ordering.

5157357#diff-40253f389761b6525e7f5594c2d94debR1840

globals: ensure correct order.

have the initializer match up with the field order.
They all do already from what I've seen.

include+define: must line up in order

Mostly already fixed in master:
5157357

Residual either I missed or new since then:
#10109

Resolution

TBD

'a': int vs. char [char-literal-is-char-or-int]

In C, the type of 'a' is int (sizeof('a')==4). In C++ it is char (sizeof('a')==1)
This is a semantic difference and a slight risk.

Indexing an array with a character constant, as mono does, gets a warning. Cast to int.

#10118

Resolution

TBD

Keywords

Mono has identifiers like new, class, namespace, typename.
Rename to new_, klass, name_space, type_name, etc.
(C++ has other new keywords: template, operator, etc.)

801e3f3

Resolution

TBD

Complex/imaginary numbers

Mono uses complex/imaginary math, that was not valid C++.
While C++ has an ancient, small, inlinable complex number library, libstdc++/libc++ dependency is to be avoided at this stage, inlining is not guaranteed, so this has been ported to just use double.

3695672

Resolution

TBD

C++11 user defined literals "foo"bar

In C++11 you can have "user defined literals" which are strings appended with a type-specific user-provided suffix. For example..and I don't exact the syntax, I have never done this but something like:

class distance { distance(inches); distance(feet);. };
class time { time(seconds); time(milliseconds); }

time five_seconds = "5"s;
time half_second = "500"ms;
distance football_field = "300"ft; // or "100"yd
distance just_made_it_by_a_few = "3"inch;

This is ambiguous with C++98 and C such as:

#include <inttypes.h>
#define PRI32 "%d"

int32_t i;
printf("i is "PRI32"\n", i);

And an error.
To fix, just put spaces before/after quotes:
printf("i is " PRI32 "\n", i);

After suffices, before/after is clearer.

9bafdc2

Resolution

TBD

WebAssembly requires types be unique.

https://jenkins.mono-project.com/job/test-mono-pull-request-wasm/3652/parsed_console/log.html

invalid enum type
!19255 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !7355, producer: "clang version 5.0.0 (emscripten 1.37.36 : 1.37.36)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !19256, retainedTypes: !19257, imports: !19263)
!19256 = !{!7363, !6973, !168}
!7363 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !7364, line: 192, size: 128, elements: !7365, identifier: "_ZTS10ParseState")
/mnt/jenkins/workspace/test-mono-pull-request-wasm/sdks/builds/toolchains/emsdk/clang/e1.37.36_64bit/llvm-link: error: linked module is broken!

Rename to make them so, or use #define.

#10091

Resolution

TBD

wchar and unsigned short are different types [wchart-is-not-ushort]

Mono does this on Windows:

typedef unsigned short gunichar2;
gunichar2* foo;
wcslen(foo);

This is incorrect and generates warnings or errors. wchar_t is not unsigned short. It is a unique type.
Fix is to use whatever the platform defines:

#10056
#10112

C++ rationale: This is because you can overload.
unsigned short is a "small number".
wchar_t is a "wide character".
"characters" and "numbers" are different.

print(wchar_t);
print(unsigned short);
print(L'a') // "a"
print((unsigned short)65) // "65"

Resolution

TBD

Misalligned if-defs

C++ compiler finds random mistakes.
For example, in sgen we have:

#if defined (HOST_WIN32)
...
#elif defined (HAVE_UNISTD_H)
...
#endif

followed by the opposite:

#if defined (HAVE_UNISTD_H)
...
#elif defined (HOST_WIN32)
...
#endif

But the defines are not mutually exclusive. The two chunks are meant to match up.
The fix is to have a consistent order, or maybe something else, since that is still error-prone.

#10077

Resolution

TBD

Incorrect Prototypes

Some functions have deliberately incorrect prototypes.
This is anything involving THREAD_INFO_TYPE and MonoGCDescriptor (mono_gc_alloc_fixed).
These are hacks and they lead to C++ link failures.

Proposals

Sprinkle extern "C" around more.

Typesafe wrappers or overloads.

https://github.com/mono/mono/pull/10087/files#diff-bd3a3d5c2bfd6af46373b3844de9c1d0R444
https://github.com/mono/mono/pull/10087/files#diff-bd3a3d5c2bfd6af46373b3844de9c1d0R163

#10297
#10294
#10293

Resolution

TBD

Copied instead of shared enums

Mono duplicates enums that are meant to be the same.
Either share the types as intended, or cast more.

#6700
or
https://github.com/mono/mono/pull/10087/files#diff-d5708d26cfbc2037ac98c34b2e708387R1807

Resolution

TBD

Incorrect Code

The C++ compiler rejects obviously incorrect code that the C compiler accepts.

#10117

Resolution

Fix the code

How much to port?

glib, utils, metadata, sgen, mini, pedump, monodis, monograph, unit-tests, "main", ios, Android, Mac, Linux, WebAssembly, Windows, arm, arm64, amd64, x86 all building and passing in CI (branch configure.ac hacked to always be C++). All fixes are in master or will be shortly.

Windows MSBuild project files were successfully prototyped (/TP) and Visual C++-specific problems fixed.

The more that is ported, the less extern "C" is needed.
libtest, profiler, and MonoPosixHelper were ported, easily, but extern "C" added to workaround instead.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions