Want a compiler warning if I repr(C) a bool #1982

Closed
federicomenaquintero opened this Issue Apr 26, 2017 · 24 comments

Comments

Projects
None yet
@federicomenaquintero

While #992 gets resolved, I would love to have a compiler warning whenever I naively assume that repr(C) for bool will look like any C type in particular.

Example: In librsvg, I assumed that bool would repr(C) as int, and therefore as gboolean (Glib's name for int-as-boolean values). It worked... passing values from C to Rust, until it didn't, passing values from Rust to C :)

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Apr 26, 2017

Member

Isn't _Bool one byte though? Unlike gboolean, that is. The former is standard.

Member

eddyb commented Apr 26, 2017

Isn't _Bool one byte though? Unlike gboolean, that is. The former is standard.

@le-jzr

This comment has been minimized.

Show comment
Hide comment
@le-jzr

le-jzr Apr 26, 2017

Neither C99, nor C11 standard specifies what _Bool actually is. All it says is that it can represent values 0 and 1, and that assignment of any value gets converted to 1 if it's non-zero.
C11 also includes this cryptic footnote:

While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.

In my non-expert opinion, using the type on any sort of ABI boundary is something to be avoided like a plague.

le-jzr commented Apr 26, 2017

Neither C99, nor C11 standard specifies what _Bool actually is. All it says is that it can represent values 0 and 1, and that assignment of any value gets converted to 1 if it's non-zero.
C11 also includes this cryptic footnote:

While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.

In my non-expert opinion, using the type on any sort of ABI boundary is something to be avoided like a plague.

@federicomenaquintero

This comment has been minimized.

Show comment
Hide comment
@federicomenaquintero

federicomenaquintero Apr 26, 2017

gboolean predates _Bool :)

I had this buggy code:

#[repr(C)]
pub struct Foo {
   field1: u32,
   field2: bool
}

and I assumed that on the C side it would look like

typedef struct {
  guint field1;
  gboolean field2;
} Foo;

That doesn't work; field2 ends up with garbage in the high bits.

However, "importing" C functions for calling from Rust seemed to work fine:

extern "C" {
  fn some_c_func (blargh: u32) -> bool;
}

The C prototype is

gboolean some_c_func (guint blargh);

In the case of the struct, Rust doesn't know what my C struct will look like. Maybe emit a compiler warning until the repr(C) for bool is actually documented (I was running on an assumption like "surely it's just a C int").

In the case of the extern fucntion, declaring a return type which is not something from libc::* is clearly suspect, I think...

gboolean predates _Bool :)

I had this buggy code:

#[repr(C)]
pub struct Foo {
   field1: u32,
   field2: bool
}

and I assumed that on the C side it would look like

typedef struct {
  guint field1;
  gboolean field2;
} Foo;

That doesn't work; field2 ends up with garbage in the high bits.

However, "importing" C functions for calling from Rust seemed to work fine:

extern "C" {
  fn some_c_func (blargh: u32) -> bool;
}

The C prototype is

gboolean some_c_func (guint blargh);

In the case of the struct, Rust doesn't know what my C struct will look like. Maybe emit a compiler warning until the repr(C) for bool is actually documented (I was running on an assumption like "surely it's just a C int").

In the case of the extern fucntion, declaring a return type which is not something from libc::* is clearly suspect, I think...

@le-jzr

This comment has been minimized.

Show comment
Hide comment
@le-jzr

le-jzr Apr 26, 2017

fn some_c_func (blargh: u32) -> bool;

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int. So long as the translation of bool isn't bigger (and why would it be?) it does the right thing. As a struct member, not so much.

le-jzr commented Apr 26, 2017

fn some_c_func (blargh: u32) -> bool;

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int. So long as the translation of bool isn't bigger (and why would it be?) it does the right thing. As a struct member, not so much.

@federicomenaquintero

This comment has been minimized.

Show comment
Hide comment
@federicomenaquintero

federicomenaquintero Apr 26, 2017

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int

Yes, that makes a lot of sense.

Now I'm wondering if it works (and the tests work) on my little-endian x86 when Rust picks up the bool from the low bits of the promoted int - but it wouldn't work on big endian :)

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int

Yes, that makes a lot of sense.

Now I'm wondering if it works (and the tests work) on my little-endian x86 when Rust picks up the bool from the low bits of the promoted int - but it wouldn't work on big endian :)

@comex

This comment has been minimized.

Show comment
Hide comment
@comex

comex Apr 26, 2017

It doesn't seem particularly surprising to me that the C equivalent of Rust bool is bool, not int. bool has been in the C standard for almost 20 years...

(Also known as _Bool, but it's bool if you include <stdbool.h> or are using C++.)

edit: also, repr(C) isn't actually relevant here.

comex commented Apr 26, 2017

It doesn't seem particularly surprising to me that the C equivalent of Rust bool is bool, not int. bool has been in the C standard for almost 20 years...

(Also known as _Bool, but it's bool if you include <stdbool.h> or are using C++.)

edit: also, repr(C) isn't actually relevant here.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Apr 26, 2017

Contributor
#[repr(C)]
pub struct Foo {
  field1: u32,
  field2: bool
}

As far as I understand, #[repr(C)] affects the memory layout of Foo but it can not affect the layout of types used inside Foo like bool. You’d need to add #[repr(C)] on the definition of that type, which doesn’t exist for primitive types and if it did would be in the standard library.

Contributor

SimonSapin commented Apr 26, 2017

#[repr(C)]
pub struct Foo {
  field1: u32,
  field2: bool
}

As far as I understand, #[repr(C)] affects the memory layout of Foo but it can not affect the layout of types used inside Foo like bool. You’d need to add #[repr(C)] on the definition of that type, which doesn’t exist for primitive types and if it did would be in the standard library.

@federicomenaquintero

This comment has been minimized.

Show comment
Hide comment
@federicomenaquintero

federicomenaquintero Apr 26, 2017

It doesn't seem particularly surprising to me that the C equivalent of repr(C) bool is bool, not int. It's been in the C standard for over 10 years...

Sorry, dinosaur here. AFAICT stdbool.h is from c99, and gboolean is from 1996 or thereabouts :)

It doesn't seem particularly surprising to me that the C equivalent of repr(C) bool is bool, not int. It's been in the C standard for over 10 years...

Sorry, dinosaur here. AFAICT stdbool.h is from c99, and gboolean is from 1996 or thereabouts :)

@le-jzr

This comment has been minimized.

Show comment
Hide comment
@le-jzr

le-jzr Apr 27, 2017

_Bool may have been in the standards for a few years, but it isn't integrated in the overall C language in any way. The result type of comparison operators is still int, as is the expected type of conditions in if and while. Most of the old code (which means, most of existing code) never made the transition to _Bool, because it's mostly useless and has unspecified memory layout.

Now, the question is, does Rust guarantee that Rust bool is the same as C _Bool? If so, it's technically correct to use bool in #repr(C), but it's not intuitive what it means, due to C's history (as exhibited by this very thread).

What I would suggest is this:

  • Add a type libc::c_Bool that's guaranteed to be the C _Bool.
  • Lint for bool in FFI context, and suggest in being replaced with either libc::c_int or libc::c_Bool, based on context.
  • Educate people about the pitfalls of bools in C interfaces.

le-jzr commented Apr 27, 2017

_Bool may have been in the standards for a few years, but it isn't integrated in the overall C language in any way. The result type of comparison operators is still int, as is the expected type of conditions in if and while. Most of the old code (which means, most of existing code) never made the transition to _Bool, because it's mostly useless and has unspecified memory layout.

Now, the question is, does Rust guarantee that Rust bool is the same as C _Bool? If so, it's technically correct to use bool in #repr(C), but it's not intuitive what it means, due to C's history (as exhibited by this very thread).

What I would suggest is this:

  • Add a type libc::c_Bool that's guaranteed to be the C _Bool.
  • Lint for bool in FFI context, and suggest in being replaced with either libc::c_int or libc::c_Bool, based on context.
  • Educate people about the pitfalls of bools in C interfaces.
@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Apr 27, 2017

Contributor

@le-jzr C is not the only language that Rust FFIs with. In fact, into the future, I would imagine that C++ is the far more important language for Rust to FFI with, and C++ has fully integrated bool. We shouldn't punish FFI with good languages because of the mistakes of bad languages ;)

(I'd also note that all C types (except the intN_t and char types) have an unspecified (or implementation-defined) layout; the only thing that's guaranteed is that objects are binary (not ternary, for example). The intN_t types are guaranteed to be 2s complement, without padding bits, and the char types are guaranteed to be CHAR_BITS wide, and have no padding bits.)

Contributor

ubsan commented Apr 27, 2017

@le-jzr C is not the only language that Rust FFIs with. In fact, into the future, I would imagine that C++ is the far more important language for Rust to FFI with, and C++ has fully integrated bool. We shouldn't punish FFI with good languages because of the mistakes of bad languages ;)

(I'd also note that all C types (except the intN_t and char types) have an unspecified (or implementation-defined) layout; the only thing that's guaranteed is that objects are binary (not ternary, for example). The intN_t types are guaranteed to be 2s complement, without padding bits, and the char types are guaranteed to be CHAR_BITS wide, and have no padding bits.)

@le-jzr

This comment has been minimized.

Show comment
Hide comment
@le-jzr

le-jzr Apr 27, 2017

@ubsan That's just more reason not to allow plain bool in FFI, since different languages may have very different ideas about how bool is represented. Let a library type for a particular language decide, and provide conversions if necessary. IMO that's the least error-prone solution.

le-jzr commented Apr 27, 2017

@ubsan That's just more reason not to allow plain bool in FFI, since different languages may have very different ideas about how bool is represented. Let a library type for a particular language decide, and provide conversions if necessary. IMO that's the least error-prone solution.

@CryZe

This comment has been minimized.

Show comment
Hide comment
@CryZe

CryZe Apr 27, 2017

I actually hit this issue just yesterday, where both Rust's and C#'s bool are 1 byte, but C# assumes bools are 4 bytes during FFI, so my functions always returned true.

CryZe commented Apr 27, 2017

I actually hit this issue just yesterday, where both Rust's and C#'s bool are 1 byte, but C# assumes bools are 4 bytes during FFI, so my functions always returned true.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Apr 27, 2017

Contributor

@le-jzr I don't like making Rust worse because other languages made the wrong choice. C's _Bool is the correct interop representation for a boolean type, and old C libraries (like WINAPI's) choices should not affect that.

Contributor

ubsan commented Apr 27, 2017

@le-jzr I don't like making Rust worse because other languages made the wrong choice. C's _Bool is the correct interop representation for a boolean type, and old C libraries (like WINAPI's) choices should not affect that.

@le-jzr

This comment has been minimized.

Show comment
Hide comment
@le-jzr

le-jzr Apr 27, 2017

@ubsan I don't understand what you mean by "making Rust worse". Current state is strictly worse -- we can use bool in FFI, but it doesn't mean what anyone would expect of it, because the representation of bool is still unstable (as far as I gathered from the issue OP linked, feel free to correct me). As such, at this moment, using bool in FFI is a bug, and even if it works at all (haven't tested), it can silently break at any time.

le-jzr commented Apr 27, 2017

@ubsan I don't understand what you mean by "making Rust worse". Current state is strictly worse -- we can use bool in FFI, but it doesn't mean what anyone would expect of it, because the representation of bool is still unstable (as far as I gathered from the issue OP linked, feel free to correct me). As such, at this moment, using bool in FFI is a bug, and even if it works at all (haven't tested), it can silently break at any time.

@Nemo157

This comment has been minimized.

Show comment
Hide comment
@Nemo157

Nemo157 Apr 28, 2017

Contributor

It does seem that Rust bool <-> C _Bool should be the conversion. As mentioned earlier, the problem with documenting how many bytes a bool takes up during FFI is that it's implementation defined in the C standard. GCC at least (and I assume all other decent C compilers) then specifies that it's ABI defined. Taking a look at one popular ABI:

Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true). When stored in integer registers (except for passing as arguments), all 8 bytes of the register are significant; any nonzero value is considered true.
-- §3.1.2

When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero. Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit.
-- §3.2.3

System V Application Binary Interface - AMD64 Architecture Processor Supplement - Draft Version 0.99.7

Which appears to me to be compatible with Rust's definition of a bool as long as any return/parameter _Bool gets truncated before using. (I assume this is probably the ABI that was looked at when designing Rust's low level type representation).


What happens if there is an ABI that doesn't have the same guarantees as Rust though? Say, one where it does guarantee to pass a parameter of type _Bool with the size and alignment of an int, but only guarantees that the value is 0 for false and non-zero for true. That would imply an FFI function like extern fn foo(bar: bool) { ... } could have the value 2 passed in. It would be possible to allow this by injecting some runtime conversion code into all extern functions when compiled under that ABI, a transformation logically equivalent to

extern fn foo(bar: c_int) {
    let bar = bar != 0;
    ...
}

which would hopefully get optimised out, but could potentially introduce a slight hidden runtime cost. An alternative could be to not allow bool to be used in FFI and have users write something like

extern fn foo(bar: c_Bool) {
    let bar = bar.to_bool();
    ...
}

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.


For all I know this might already be taken care of by LLVM just by how Rust defines its bool type. Either way I think a big takeaway from this issue is that there needs to be more explicit documentation that Rust bool <-!-> C int on an FFI boundary.

Contributor

Nemo157 commented Apr 28, 2017

It does seem that Rust bool <-> C _Bool should be the conversion. As mentioned earlier, the problem with documenting how many bytes a bool takes up during FFI is that it's implementation defined in the C standard. GCC at least (and I assume all other decent C compilers) then specifies that it's ABI defined. Taking a look at one popular ABI:

Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true). When stored in integer registers (except for passing as arguments), all 8 bytes of the register are significant; any nonzero value is considered true.
-- §3.1.2

When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero. Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit.
-- §3.2.3

System V Application Binary Interface - AMD64 Architecture Processor Supplement - Draft Version 0.99.7

Which appears to me to be compatible with Rust's definition of a bool as long as any return/parameter _Bool gets truncated before using. (I assume this is probably the ABI that was looked at when designing Rust's low level type representation).


What happens if there is an ABI that doesn't have the same guarantees as Rust though? Say, one where it does guarantee to pass a parameter of type _Bool with the size and alignment of an int, but only guarantees that the value is 0 for false and non-zero for true. That would imply an FFI function like extern fn foo(bar: bool) { ... } could have the value 2 passed in. It would be possible to allow this by injecting some runtime conversion code into all extern functions when compiled under that ABI, a transformation logically equivalent to

extern fn foo(bar: c_int) {
    let bar = bar != 0;
    ...
}

which would hopefully get optimised out, but could potentially introduce a slight hidden runtime cost. An alternative could be to not allow bool to be used in FFI and have users write something like

extern fn foo(bar: c_Bool) {
    let bar = bar.to_bool();
    ...
}

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.


For all I know this might already be taken care of by LLVM just by how Rust defines its bool type. Either way I think a big takeaway from this issue is that there needs to be more explicit documentation that Rust bool <-!-> C int on an FFI boundary.

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Apr 28, 2017

Contributor

@Nemo157

A bool is not an int. When defining an extern "C" function, the Rust types should match the C types.

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.

This cost is only taken on extern "C" functions, and there are already several other cases where we have to adapt the ABI.

Contributor

arielb1 commented Apr 28, 2017

@Nemo157

A bool is not an int. When defining an extern "C" function, the Rust types should match the C types.

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.

This cost is only taken on extern "C" functions, and there are already several other cases where we have to adapt the ABI.

@Nemo157

This comment has been minimized.

Show comment
Hide comment
@Nemo157

Nemo157 Apr 28, 2017

Contributor

@arielb1 I've updated the first code block and preceding paragraph to try and remove any implication that I think a bool is an int.

If there are already hidden runtime costs in extern "C" functions depending on the target ABI then I'm definitely in favor of having an implicit rust bool <-> C _Bool conversion, since it's almost certainly free in the majority of ABIs. This just really needs to be documented well somewhere; I just checked the std library docs for bool, the Rustonomicon, the FFI chapter in the book, the Reference and the new book, in none of these could I find any documentation on what C type any primitive Rust type actually corresponds to. For the sized integers this is pretty easy to guess, but as this issue shows people can make different assumptions about what C type bool is equivalent to depending on their C background.

Contributor

Nemo157 commented Apr 28, 2017

@arielb1 I've updated the first code block and preceding paragraph to try and remove any implication that I think a bool is an int.

If there are already hidden runtime costs in extern "C" functions depending on the target ABI then I'm definitely in favor of having an implicit rust bool <-> C _Bool conversion, since it's almost certainly free in the majority of ABIs. This just really needs to be documented well somewhere; I just checked the std library docs for bool, the Rustonomicon, the FFI chapter in the book, the Reference and the new book, in none of these could I find any documentation on what C type any primitive Rust type actually corresponds to. For the sized integers this is pretty easy to guess, but as this issue shows people can make different assumptions about what C type bool is equivalent to depending on their C background.

@erkinalp

This comment has been minimized.

Show comment
Hide comment
@erkinalp

erkinalp Jun 3, 2017

GCC and LLVM bools occupy one byte in memory.

erkinalp commented Jun 3, 2017

GCC and LLVM bools occupy one byte in memory.

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Jun 3, 2017

Contributor

@Nemo157

Sure enough, we need better documentation on how Rust types match ABI types, both in extern "C" function arguments and in memory layouts.

Contributor

arielb1 commented Jun 3, 2017

@Nemo157

Sure enough, we need better documentation on how Rust types match ABI types, both in extern "C" function arguments and in memory layouts.

@Centril Centril added the T-compiler label Dec 6, 2017

@vext01 vext01 referenced this issue in softdevteam/hwtracer Dec 7, 2017

Closed

Use a bool return type for traceme_stop_tracer(). #9

@vext01

This comment has been minimized.

Show comment
Hide comment
@vext01

vext01 Dec 11, 2017

It sounds to me like the Rust FFI should guarantee that it's native bool can safely map to to the C99 _Bool via a libc::c_bool.

For other languages that want to map their Booleans at the ABI level, wouldn't it make sense to have a separate (external) crate defining a specialised bool type?

In other words libc::c_bool is a C99 bool, and mylang::bool is for some other language which wants to pass around bools at the ABI level.

vext01 commented Dec 11, 2017

It sounds to me like the Rust FFI should guarantee that it's native bool can safely map to to the C99 _Bool via a libc::c_bool.

For other languages that want to map their Booleans at the ABI level, wouldn't it make sense to have a separate (external) crate defining a specialised bool type?

In other words libc::c_bool is a C99 bool, and mylang::bool is for some other language which wants to pass around bools at the ABI level.

@nagisa

This comment has been minimized.

Show comment
Hide comment
@nagisa

nagisa Dec 11, 2017

Contributor

Cross-referencing more discussion @ rust-lang/rust#46176

Contributor

nagisa commented Dec 11, 2017

Cross-referencing more discussion @ rust-lang/rust#46176

@matklad

This comment has been minimized.

Show comment
Hide comment
@matklad

matklad Feb 10, 2018

Member

Rust bool now officially matches C _Bool, and this I think means that we should close this issue?

Link to the decision: rust-lang/rust#46176 (comment)

Member

matklad commented Feb 10, 2018

Rust bool now officially matches C _Bool, and this I think means that we should close this issue?

Link to the decision: rust-lang/rust#46176 (comment)

@vext01 vext01 referenced this issue in softdevteam/hwtracer Feb 12, 2018

Closed

Use `bool`s in API #37

@federicomenaquintero

This comment has been minimized.

Show comment
Hide comment
@federicomenaquintero

federicomenaquintero Feb 12, 2018

I'm fine with closing this now that bool matches the platform's C _Bool.

I'm fine with closing this now that bool matches the platform's C _Bool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment