Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__darwin_size_t is included even when it's not whitelisted #1215

Open
kornelski opened this issue Jan 5, 2018 · 10 comments
Open

__darwin_size_t is included even when it's not whitelisted #1215

kornelski opened this issue Jan 5, 2018 · 10 comments

Comments

@kornelski
Copy link
Contributor

kornelski commented Jan 5, 2018

Input C/C++ Header

#include <stdlib.h>
struct Foo {
    size_t s;
};

Bindgen Invocation

$ bindgen --whitelist-type Foo test.h

on macOS High Sierra.

Actual Results

pub type __darwin_size_t = :: std :: os :: raw :: c_ulong ;

(other darwin types are filtered out correctly)

Expected Results

No darwin internals at all.

@emilio
Copy link
Contributor

emilio commented Jan 6, 2018

By default a type generates all types that depend on it. In this case I guess in your platform is something like:

typedef unsigned long long __darwing_size_t;
typedef __darwin_size_t size_t;

The special casing of size_t -> usize happens later in the pipeline (on code generation), and by then we think __darwin_size_t needs to be generated.

This one is fixable, not sure how easily in a clean way though, guess I'll need to give it a shot. Easy workaround would be to blacklist_type("__darwin_.*").

@tmfink
Copy link
Contributor

tmfink commented Jun 27, 2020

I'm hitting a similar issue on Linux. The uint8_t type is defined with two levels of typedef.

Input C header

// Emulate: #include <inttypes.h>
typedef unsigned char __uint8_t;
typedef __uint8_t uint8_t;

typedef uint8_t (*my_callback)(const uint8_t *code);

Bindgen Invocation

$ bindgen extra_include.h --whitelist-type my_callback

Actual Results

pub type __uint8_t = ::std::os::raw::c_uchar;
pub type my_callback = ::std::option::Option<unsafe extern "C" fn(code: *const u8) -> u8>;

Expected Results

pub type my_callback = ::std::option::Option<unsafe extern "C" fn(code: *const u8) -> u8>;

Since __uint8_t is never used and not whitelisted, it should not appear in the output.

@kulp
Copy link
Member

kulp commented Jun 28, 2020

I think this is basically the same issue as #1188 -- an intermediate typedef is produced in both cases, because that is what the C headers say to do.

In my opinion, there should not be explicit edge cases in bindgen for this sort of thing.

The best description of rule to implement might be @tmfink's :

Since __uint8_t is never used and not whitelisted, it should not appear in the output.

but I still am not sure why "a typedef is not used" should really mean that the typedef is suppressed and other typedefs are stitched together in its absence.

Since the C headers have this, why should not bindgen ? As @emilio says, blacklist is available.

@tmfink
Copy link
Contributor

tmfink commented Jun 28, 2020

@kulp I see what you are saying about not wanting to special case this (assuming I'm understanding correctly).

One way to solve this would be to do an extra pass to see what items are "reachable" from the whitelisted set. Unreachable types (such as __uint8_t in my earlier example) would not be reachable and therefore removed.
If this incurs too much of a performance penalty, it could be opt-in.

My crate capstone-sys has pre-generate bindings that are used by default (to avoid cost of running bindgen each build).
https://github.com/capstone-rust/capstone-rs/blob/db25baef6381b339241302b3516792280dba5cc3/capstone-sys/pre_generated/capstone.rs#L3-L11

It would be nice if the bindgen output was consistent for each platform (Windows, Mac, Linux, FreeBSD, etc.) and did not rely details of the system headers. The blacklist approach depends on details of system's headers.

@kulp
Copy link
Member

kulp commented Jun 28, 2020

One way to solve this would be to do an extra pass to see what items are "reachable" from the whitelisted set. Unreachable types (such as __uint8_t in my earlier example) would not be reachable and therefore removed.

I think actually the __uint8_t type is reachable from the whitelisted types (uint8_t -> __uint8_t -> unsigned char), but later during code generation the uint8_t type itself is changed to u8 (as pointed out in #1215 (comment)) so final bindings emit __uint8_t unnecessarily.

At the risk of overgeneralizing the problem, maybe @tmfink's request could be expressed like this :

Prune type aliases (typedefs) that are not used by the emitted bindings.

However, this still does not really address what @kornelski wants, because the emitted bindings in that original case do reference the "internals" :

pub type __darwin_size_t = ::std::os::raw::c_ulong;
pub type size_t = __darwin_size_t;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Foo {
    pub s: size_t,
}

So, on my Mac at least, I do not see any "special casing of size_t -> usize" and I think there might be two separate issues here :

  1. ignore "internal" intermediate typedefs (@kornelski) -- unclear how to determine what constitutes "internals"
  2. prune unused typedefs (@tmfink) -- a clear rule can exist here

@emilio
Copy link
Contributor

emilio commented Jun 29, 2020

Yeah, this is related to --size-t-is-usize and co too. The issue is that we do the replacement for types like uint8_t to u8 when we're generating code, which seems generally reasonable, but that means that we don't account for that replacement when computing the whitelist, and thus types that uint8_t uses are implicitly generated.

@emilio
Copy link
Contributor

emilio commented Jun 29, 2020

The right way to fix this is doing those replacements sooner, IMO, but that may or may not be a lot of work for this edge case.

@kornelski
Copy link
Contributor Author

There are more options:

  • Don't include intermediate typedefs or unused types from system headers. This would automatically exclude __darwin and similar OS/compiler implementation details.

  • Eagerly resolve typedefs all the way to a primitive type. That's all what counts for an ABI.

  • Always prune __* types, because that's C's way of saying "this is private".

@emilio
Copy link
Contributor

emilio commented Jun 29, 2020

Fair enough :)

Here some trade-offs with those that come to mind:

* Don't include intermediate typedefs or unused types from system headers. This would automatically exclude __darwin and similar OS/compiler implementation details.

Yeah, the problem is that this requires yet another pass on top of all the existing passes we do just to detect this special case, which is not great. You get this for free if you do the replacement earlier...

* Eagerly resolve typedefs all the way to a primitive type. That's all what counts for an ABI.

That's true, but it seems like the resulting bindings could be less idiomatic unnecessarily. So much stuff would end up being void* or int instead of some better name in the function signatures.

* Always prune `__*` types, because that's C's way of saying "this is private".

This is the simplest, but but if that typedef is used in a "public" struct, like (pseudo-code):

struct something {
    // Public members, then...
    __uint32_t __reserved[10];
};

Then bindgen would generate bindings that don't compile, which is not great. Maybe we could auto-treat them as opaque and it'd "just work", not sure...

@kornelski
Copy link
Contributor Author

kornelski commented Jun 29, 2020

To clarify, by eagerly resolve I've meant the intermediate types, so that you'd still get type typedefed = u8 instead of type typedefed = __internal_byte; type __internal_byte = u8.

And if a name is used outside of a typedef chain, then of course it should not be pruned.

AFAIK "requires yet another pass" is purely bindgen's problem, not something user would be concerned about. C headers and libclang have messy issues, so needing multiple passes to clean that data up seems normal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants