Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[repr(C)] enum bit width determination algo does not match that of the C compiler #28925

Open
mzabaluev opened this issue Oct 9, 2015 · 6 comments
Labels
A-ffi Area: Foreign Function Interface (FFI) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@mzabaluev
Copy link
Contributor

In C, support for enum determinants that cannot be represented by an int is implementation-defined. Many compilers choose a different basic type for representation, based on the types or the values of the determinants. This is where the algorithm currently used by Rust for #[repr(C)] enums may differ from the one used by the C compiler, although the difference can be observed in fairly marginal cases.

GCC uses the types of the determinant expressions to find the best fit, so this is a 32-bit type on 64-bit Linux:

typedef enum {
    A = 0x80000001,
    B = -0x80000000,
} C;

Yet this is 64-bit:

typedef enum {
    A = 0x80000001,
    B = -2147483648  /* -0x80000000 */
} C;

This is because the type of integer constant 0x80000000 is determined as unsigned int due to its hexadecimal notation, and the negation preserves the type.

Rust first coerces all discriminants to isize and then apparently works out the best fitting representation type from the value range, where fitting means that negative values are preserved as such. So this ends up being 64-bit:

#[repr(C)]
enum C {
    A = 0x80000001,
    B = -0x80000000,
}

fn main() {
    println!("{}", std::mem::size_of::<C>());
}

I'm uncertain as to which approach is the best for fixing this. Trying to match the behavior of C compilers quirk-for-quirk does not seem to be feasible: there may be more than one compiler per target, and in fact the behavior even changes with compiler options, as e.g. the type of an integer constant is determined differently pre-C99 and post-C99. Perhaps a more conservative solution would be to lint on discriminant values that are out of the libc::c_int domain and suggest using fixed-width representations such as #[repr(u32)].

@mzabaluev
Copy link
Contributor Author

Credit goes out to @retep998 for discovering this.

@mzabaluev mzabaluev changed the title #[repr(C)] enum bit width determination algo does not match that of the C compiler #[repr(C)] enum bit width determination algo does not match that of the C compiler Oct 12, 2015
@briansmith
Copy link
Contributor

See also -fshort-enums. Some ARM ABIs require -fshort-enums and oithers require -fno-short-enums.

@mzabaluev
Copy link
Contributor Author

@briansmith: I think -fshort-enums needs to be represented by #[repr(u16)] and the like. The concern here is about not doing what the C compiler does by default.

@huonw huonw added A-ffi Area: Foreign Function Interface (FFI) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 6, 2016
@briansmith
Copy link
Contributor

@briansmith: I think -fshort-enums needs to be represented by #[repr(u16)] and the like. The concern here is about not doing what the C compiler does by default.

I think what the C compiler does by default is sometimes determined by the ABI. For example, I believe sometimes GCC's default for that flag differs depending on how GCC was configured when it was built.

IMO, instead of matching the default of "the" C compiler (in fact, there is often more than one C compiler, each of which may have a different default), Rust should default to matching the ABI.

@zackw
Copy link
Contributor

zackw commented Jan 6, 2017

I tripped over this while trying to add waitid to libc. This function is specified by POSIX to take one argument of type idtype_t, and idtype_t is specified to be a C enum. (As far as I know, this is the only such case within POSIX.) At present, because of this (and arguably also #34641) there is no way to be sure that a Rust shim for a C function that takes or returns a C enum type is ABI-correct.

@jld
Copy link
Contributor

jld commented Nov 23, 2017

Rust first coerces all discriminants to isize

This by itself is arguably a bug, even on boring platforms like x86 Linux. Simple constructive proof:

#include <stdio.h>

enum E {
  L = 0,
  H = 0x100000000
};

int main() {
	printf("%zu > %zu\n", sizeof(enum E), sizeof(void*));
	return 0;
}
$ cc -m32 -Wall -W -O1 big-enum.c && ./a.out
8 > 4

That isn't portable ISO C, but GCC and Clang accept it without warnings unless -pedantic is used, and the second example in #34641 is basically the same thing but with fewer bits.

The original discriminant sizing (and #[repr(C)]) would coerce values to both u64 and i64 to try out the various int types, because constant evaluation was… different back then, and “what type do #[repr(C)] initializers have?” wasn't really a well-formed question. But it is now.


So, trying to summarize the problems here:

  1. #[repr(C)] on Windows considers 64-bit types and should not; this looks relatively easy to fix.
  2. C numerics don't match Rust (-0x80000000 vs. -2147483648); this is hard. I don't know if there's any solution besides requiring users to constant-fold according to C rules before using the values in Rust, or maybe writing a (procedural?) macro to do that. Is there some way to at least lint for this?
  3. isize may be too small for some C enums; this can be worked around by manually applying a fixed-size repr, assuming there aren't any cases that are larger than some target's isize but small enough for ABI differences to differ.
  4. Somewhere along the line the target architecture changed from an enum to a string, so the i32 lower bound became the default, and it's probably no longer correct for all targets (MSP430?), and then there's -fshort-enums. Fixing this is probably tedious but not conceptually difficult.

As for idtype_t, it's probably not using huge values, so the main problem there would be # 4.

@Enselic Enselic added C-bug Category: This is a bug. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Area: Foreign Function Interface (FFI) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants