Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Want a compiler warning if I repr(C) a bool #1982

Closed
federicomenaquintero opened this Issue Apr 26, 2017 · 24 comments

Comments

Projects
None yet
@federicomenaquintero
Copy link

federicomenaquintero commented Apr 26, 2017

While #992 gets resolved, I would love to have a compiler warning whenever I naively assume that repr(C) for bool will look like any C type in particular.

Example: In librsvg, I assumed that bool would repr(C) as int, and therefore as gboolean (Glib's name for int-as-boolean values). It worked... passing values from C to Rust, until it didn't, passing values from Rust to C :)

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Apr 26, 2017

Isn't _Bool one byte though? Unlike gboolean, that is. The former is standard.

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 26, 2017

Neither C99, nor C11 standard specifies what _Bool actually is. All it says is that it can represent values 0 and 1, and that assignment of any value gets converted to 1 if it's non-zero.
C11 also includes this cryptic footnote:

While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.

In my non-expert opinion, using the type on any sort of ABI boundary is something to be avoided like a plague.

@federicomenaquintero

This comment has been minimized.

Copy link
Author

federicomenaquintero commented Apr 26, 2017

gboolean predates _Bool :)

I had this buggy code:

#[repr(C)]
pub struct Foo {
   field1: u32,
   field2: bool
}

and I assumed that on the C side it would look like

typedef struct {
  guint field1;
  gboolean field2;
} Foo;

That doesn't work; field2 ends up with garbage in the high bits.

However, "importing" C functions for calling from Rust seemed to work fine:

extern "C" {
  fn some_c_func (blargh: u32) -> bool;
}

The C prototype is

gboolean some_c_func (guint blargh);

In the case of the struct, Rust doesn't know what my C struct will look like. Maybe emit a compiler warning until the repr(C) for bool is actually documented (I was running on an assumption like "surely it's just a C int").

In the case of the extern fucntion, declaring a return type which is not something from libc::* is clearly suspect, I think...

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 26, 2017

fn some_c_func (blargh: u32) -> bool;

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int. So long as the translation of bool isn't bigger (and why would it be?) it does the right thing. As a struct member, not so much.

@federicomenaquintero

This comment has been minimized.

Copy link
Author

federicomenaquintero commented Apr 26, 2017

I'm guessing this works due to the fact that C promotes parameters/retvals smaller than int to a full int

Yes, that makes a lot of sense.

Now I'm wondering if it works (and the tests work) on my little-endian x86 when Rust picks up the bool from the low bits of the promoted int - but it wouldn't work on big endian :)

@comex

This comment has been minimized.

Copy link

comex commented Apr 26, 2017

It doesn't seem particularly surprising to me that the C equivalent of Rust bool is bool, not int. bool has been in the C standard for almost 20 years...

(Also known as _Bool, but it's bool if you include <stdbool.h> or are using C++.)

edit: also, repr(C) isn't actually relevant here.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Apr 26, 2017

#[repr(C)]
pub struct Foo {
  field1: u32,
  field2: bool
}

As far as I understand, #[repr(C)] affects the memory layout of Foo but it can not affect the layout of types used inside Foo like bool. You’d need to add #[repr(C)] on the definition of that type, which doesn’t exist for primitive types and if it did would be in the standard library.

@federicomenaquintero

This comment has been minimized.

Copy link
Author

federicomenaquintero commented Apr 26, 2017

It doesn't seem particularly surprising to me that the C equivalent of repr(C) bool is bool, not int. It's been in the C standard for over 10 years...

Sorry, dinosaur here. AFAICT stdbool.h is from c99, and gboolean is from 1996 or thereabouts :)

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 27, 2017

_Bool may have been in the standards for a few years, but it isn't integrated in the overall C language in any way. The result type of comparison operators is still int, as is the expected type of conditions in if and while. Most of the old code (which means, most of existing code) never made the transition to _Bool, because it's mostly useless and has unspecified memory layout.

Now, the question is, does Rust guarantee that Rust bool is the same as C _Bool? If so, it's technically correct to use bool in #repr(C), but it's not intuitive what it means, due to C's history (as exhibited by this very thread).

What I would suggest is this:

  • Add a type libc::c_Bool that's guaranteed to be the C _Bool.
  • Lint for bool in FFI context, and suggest in being replaced with either libc::c_int or libc::c_Bool, based on context.
  • Educate people about the pitfalls of bools in C interfaces.
@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Apr 27, 2017

@le-jzr C is not the only language that Rust FFIs with. In fact, into the future, I would imagine that C++ is the far more important language for Rust to FFI with, and C++ has fully integrated bool. We shouldn't punish FFI with good languages because of the mistakes of bad languages ;)

(I'd also note that all C types (except the intN_t and char types) have an unspecified (or implementation-defined) layout; the only thing that's guaranteed is that objects are binary (not ternary, for example). The intN_t types are guaranteed to be 2s complement, without padding bits, and the char types are guaranteed to be CHAR_BITS wide, and have no padding bits.)

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 27, 2017

@ubsan That's just more reason not to allow plain bool in FFI, since different languages may have very different ideas about how bool is represented. Let a library type for a particular language decide, and provide conversions if necessary. IMO that's the least error-prone solution.

@CryZe

This comment has been minimized.

Copy link

CryZe commented Apr 27, 2017

I actually hit this issue just yesterday, where both Rust's and C#'s bool are 1 byte, but C# assumes bools are 4 bytes during FFI, so my functions always returned true.

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented Apr 27, 2017

@le-jzr I don't like making Rust worse because other languages made the wrong choice. C's _Bool is the correct interop representation for a boolean type, and old C libraries (like WINAPI's) choices should not affect that.

@le-jzr

This comment has been minimized.

Copy link

le-jzr commented Apr 27, 2017

@ubsan I don't understand what you mean by "making Rust worse". Current state is strictly worse -- we can use bool in FFI, but it doesn't mean what anyone would expect of it, because the representation of bool is still unstable (as far as I gathered from the issue OP linked, feel free to correct me). As such, at this moment, using bool in FFI is a bug, and even if it works at all (haven't tested), it can silently break at any time.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 28, 2017

It does seem that Rust bool <-> C _Bool should be the conversion. As mentioned earlier, the problem with documenting how many bytes a bool takes up during FFI is that it's implementation defined in the C standard. GCC at least (and I assume all other decent C compilers) then specifies that it's ABI defined. Taking a look at one popular ABI:

Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true). When stored in integer registers (except for passing as arguments), all 8 bytes of the register are significant; any nonzero value is considered true.
-- §3.1.2

When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero. Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit.
-- §3.2.3

System V Application Binary Interface - AMD64 Architecture Processor Supplement - Draft Version 0.99.7

Which appears to me to be compatible with Rust's definition of a bool as long as any return/parameter _Bool gets truncated before using. (I assume this is probably the ABI that was looked at when designing Rust's low level type representation).


What happens if there is an ABI that doesn't have the same guarantees as Rust though? Say, one where it does guarantee to pass a parameter of type _Bool with the size and alignment of an int, but only guarantees that the value is 0 for false and non-zero for true. That would imply an FFI function like extern fn foo(bar: bool) { ... } could have the value 2 passed in. It would be possible to allow this by injecting some runtime conversion code into all extern functions when compiled under that ABI, a transformation logically equivalent to

extern fn foo(bar: c_int) {
    let bar = bar != 0;
    ...
}

which would hopefully get optimised out, but could potentially introduce a slight hidden runtime cost. An alternative could be to not allow bool to be used in FFI and have users write something like

extern fn foo(bar: c_Bool) {
    let bar = bar.to_bool();
    ...
}

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.


For all I know this might already be taken care of by LLVM just by how Rust defines its bool type. Either way I think a big takeaway from this issue is that there needs to be more explicit documentation that Rust bool <-!-> C int on an FFI boundary.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Apr 28, 2017

@Nemo157

A bool is not an int. When defining an extern "C" function, the Rust types should match the C types.

Which on x86-64 could be a no-op, and on the hypothetical incompatible ABI could be a slight explicit runtime cost if not optimised out.

This cost is only taken on extern "C" functions, and there are already several other cases where we have to adapt the ABI.

@Nemo157

This comment has been minimized.

Copy link
Contributor

Nemo157 commented Apr 28, 2017

@arielb1 I've updated the first code block and preceding paragraph to try and remove any implication that I think a bool is an int.

If there are already hidden runtime costs in extern "C" functions depending on the target ABI then I'm definitely in favor of having an implicit rust bool <-> C _Bool conversion, since it's almost certainly free in the majority of ABIs. This just really needs to be documented well somewhere; I just checked the std library docs for bool, the Rustonomicon, the FFI chapter in the book, the Reference and the new book, in none of these could I find any documentation on what C type any primitive Rust type actually corresponds to. For the sized integers this is pretty easy to guess, but as this issue shows people can make different assumptions about what C type bool is equivalent to depending on their C background.

@erkinalp

This comment has been minimized.

Copy link

erkinalp commented Jun 3, 2017

GCC and LLVM bools occupy one byte in memory.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Jun 3, 2017

@Nemo157

Sure enough, we need better documentation on how Rust types match ABI types, both in extern "C" function arguments and in memory layouts.

@vext01

This comment has been minimized.

Copy link

vext01 commented Dec 11, 2017

It sounds to me like the Rust FFI should guarantee that it's native bool can safely map to to the C99 _Bool via a libc::c_bool.

For other languages that want to map their Booleans at the ABI level, wouldn't it make sense to have a separate (external) crate defining a specialised bool type?

In other words libc::c_bool is a C99 bool, and mylang::bool is for some other language which wants to pass around bools at the ABI level.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Dec 11, 2017

Cross-referencing more discussion @ rust-lang/rust#46176

@nagisa

This comment has been minimized.

@matklad

This comment has been minimized.

Copy link
Member

matklad commented Feb 10, 2018

Rust bool now officially matches C _Bool, and this I think means that we should close this issue?

Link to the decision: rust-lang/rust#46176 (comment)

@federicomenaquintero

This comment has been minimized.

Copy link
Author

federicomenaquintero commented Feb 12, 2018

I'm fine with closing this now that bool matches the platform's C _Bool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.