Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnamed types have a name when switching to llvm >= 16.0.0 #2488

Closed
Ilia-Kosenkov opened this issue Apr 9, 2023 · 16 comments
Closed

Unnamed types have a name when switching to llvm >= 16.0.0 #2488

Ilia-Kosenkov opened this issue Apr 9, 2023 · 16 comments

Comments

@Ilia-Kosenkov
Copy link

Sorry for not using the issue template, I have no minimal reproducible example right now.

Hi. We at extendr/libR-sys use bindgen to generate bindings for a set of headers provided by the R programming language. We have a complicated setup which creates bindings for all three mainstream OSes. On Windows, we rely on the MSYS2 and its version of LLVM.

Recently our CI for Windows started failing with the following errors:

  • warning: unused option: --allowlist-function followed by a collection of |-separated entries of our allow list
  • warning: unused option: --allowlist-var
  • warning: unused option: --allowlist-type

The builder configuration is (approximately) the following:

let mut bindgen_builder = bindgen::Builder::default()
        .allowlist_function(&allowlist_pattern)
        .allowlist_var(&allowlist_pattern)
        .allowlist_type(&allowlist_pattern)
        .header("wrapper.h")
        .parse_callbacks(Box::new(bindgen::CargoCallbacks))
        .clang_args(&[
           format!("-I{}", r_paths.include.display()),
           format!("--target={}", target),
        ])            
        .blocklist_item("max_align_t")
        .blocklist_item("__mingw_ldbl_type_t")
        .generate_comments(true)
        .parse_callbacks(Box::new(RCallbacks))
        .clang_arg("-fparse-all-comments");        

While it might be messy, you can see that we have some patterns specified for allow lists, and we have identical allow lists on all three OSes, and the allow list has not been recently changed.

I was able to locally reproduce this error if I install llvm 16.0.0 or higher (this time I am talking about native Windows version), but if I roll back to llvm 15.0.7, it works as before. This, to me, suggests that there is some (breaking?) change in, perhaps, CLI API or elsewhere that affects us.

I tried to scout clang-sys and this repo, but found no relevant issues. I am aware that this could be a problem in clang-sys, but I have a very limited understanding of llvm infrastructure, so I decided to start my investigation from here.

@pvdrz
Copy link
Contributor

pvdrz commented Apr 10, 2023

We don't have any logic conditioned to clang 16 on the bindgen codebase. I suspect this isn't an issue for clang-sys either as they just provide bindings to the libclang library as it is.

I suspect the reason for this breakage is that either mangling or some other naming mechanism changed in clang 16. Sadly without a reproducible example there's not much we can do from the bindgen side. I don't have a windows machine either so I don't have any way to test this.

EDIT: I closed this by accident 😅. I got confused between tabs.

@pvdrz pvdrz closed this as completed Apr 10, 2023
@pvdrz pvdrz reopened this Apr 10, 2023
@Ilia-Kosenkov
Copy link
Author

Hey @pvdrz, thanks for responding. Let us keep this issue open for a couple of days, I will try to come up with a reproducible example (though not sure how that will help if you cannot check this on Windows). All I can say is that, e.g., our MacOs runner uses LLVM 16.0.0 with exactly the same setup, and it does not fail, which to me means that this issue might be Windows-specific.

@CGMossa
Copy link
Contributor

CGMossa commented Apr 10, 2023

We've found one bug that is presumably the source of this issue. Unnamed items.
Using clang-rs, we have something that uses Index, and in there an unnamed enum
gets the name/display name like

name: enum (unnamed at src/unnamed_items.c:9:1)
display_name: enum (unnamed at src/unnamed_items.c:9:1)

I've made a repo that shows this issue. Most crucial details are below, otherwise repo is at https://github.com/CGMossa/clang_bug.

Details

Given a file

// Named enum
typedef enum
{
    STATUS_OK,
    STATUS_ERROR
} Status;

// Unnamed enum
enum
{
    LEVEL_LOW,
    LEVEL_MEDIUM,
    LEVEL_HIGH
};

// Named struct
typedef struct
{
    int x;
    int y;
} Coordinate;

// Unnamed struct
struct
{
    float width;
    float height;
} rectangle;

// Named union
typedef union
{
    int intValue;
    double doubleValue;
    char charValue;
} Data;

// Unnamed union
union
{
    short s;
    long l;
} number;

And

use clang::{Clang, Index};

fn main() {
    // println!("clang version: {}", clang::get_version());
    let clang = Clang::new().unwrap();
    let idx = Index::new(&clang, false, false);
    // let tu = idx.parser(source).parse().unwrap();
    let tu = idx.parser("src/unnamed_items.c").parse().unwrap();
    // dbg!(&tu);
    // dbg!(tu.get_entity());
    dbg!(tu.get_entity().get_children());
    for ele in tu.get_entity().get_children() {
        if let Some(name) = ele.get_name() {
            println!("name: {name}");
        }
        if let Some(display_name) = ele.get_display_name() {
            println!("display_name: {display_name}");
        }
    }
}

We see this issue between different llvm versions.

~\Documents\GitHub\clang_bug> clang --version                                                                                                    
clang version 16.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Users\minin\scoop\apps\llvm\current\bin
~\Documents\GitHub\clang_bug> RUST_BACKTRACE=1 cargo run o+e> build_clang_16.log                                                                 ~\Documents\GitHub\clang_bug> scoop reset llvm@15.0.0                                                               
Resetting llvm (15.0.0).
Linking ~\scoop\apps\llvm\current => ~\scoop\apps\llvm\15.0.0
~\Documents\GitHub\clang_bug> RUST_BACKTRACE=1 cargo run o+e> build_clang_15.log

LLVM 15

name: Status
display_name: Status
name: Coordinate
display_name: Coordinate
name: rectangle
display_name: rectangle
name: Data
display_name: Data
name: number
display_name: number
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `C:/cargo_target_dir\debug\clang_bug.exe`
[src\main.rs:11] tu.get_entity().get_children() = [
    Entity {
        kind: EnumDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 2,
                column: 9,
                offset: 23,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Status",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 6,
                column: 3,
                offset: 68,
            },
        ),
    },
    Entity {
        kind: EnumDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 9,
                column: 1,
                offset: 96,
            },
        ),
    },
    Entity {
        kind: StructDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 17,
                column: 9,
                offset: 187,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Coordinate",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 21,
                column: 3,
                offset: 224,
            },
        ),
    },
    Entity {
        kind: StructDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 24,
                column: 1,
                offset: 258,
            },
        ),
    },
    Entity {
        kind: VarDecl,
        display_name: Some(
            "rectangle",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 28,
                column: 3,
                offset: 308,
            },
        ),
    },
    Entity {
        kind: UnionDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 31,
                column: 9,
                offset: 346,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Data",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 36,
                column: 3,
                offset: 423,
            },
        ),
    },
    Entity {
        kind: UnionDecl,
        display_name: None,
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 39,
                column: 1,
                offset: 450,
            },
        ),
    },
    Entity {
        kind: VarDecl,
        display_name: Some(
            "number",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 43,
                column: 3,
                offset: 489,
            },
        ),
    },
]

LLVM 16

name: Status
display_name: Status
name: Status
display_name: Status
name: enum (unnamed at src/unnamed_items.c:9:1)
display_name: enum (unnamed at src/unnamed_items.c:9:1)
name: Coordinate
display_name: Coordinate
name: Coordinate
display_name: Coordinate
name: struct (unnamed at src/unnamed_items.c:24:1)
display_name: struct (unnamed at src/unnamed_items.c:24:1)
name: rectangle
display_name: rectangle
name: Data
display_name: Data
name: Data
display_name: Data
name: union (unnamed at src/unnamed_items.c:39:1)
display_name: union (unnamed at src/unnamed_items.c:39:1)
name: number
display_name: number
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `C:/cargo_target_dir\debug\clang_bug.exe`
[src\main.rs:11] tu.get_entity().get_children() = [
    Entity {
        kind: EnumDecl,
        display_name: Some(
            "Status",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 2,
                column: 9,
                offset: 23,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Status",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 6,
                column: 3,
                offset: 68,
            },
        ),
    },
    Entity {
        kind: EnumDecl,
        display_name: Some(
            "enum (unnamed at src/unnamed_items.c:9:1)",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 9,
                column: 1,
                offset: 96,
            },
        ),
    },
    Entity {
        kind: StructDecl,
        display_name: Some(
            "Coordinate",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 17,
                column: 9,
                offset: 187,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Coordinate",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 21,
                column: 3,
                offset: 224,
            },
        ),
    },
    Entity {
        kind: StructDecl,
        display_name: Some(
            "struct (unnamed at src/unnamed_items.c:24:1)",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 24,
                column: 1,
                offset: 258,
            },
        ),
    },
    Entity {
        kind: VarDecl,
        display_name: Some(
            "rectangle",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 28,
                column: 3,
                offset: 308,
            },
        ),
    },
    Entity {
        kind: UnionDecl,
        display_name: Some(
            "Data",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 31,
                column: 9,
                offset: 346,
            },
        ),
    },
    Entity {
        kind: TypedefDecl,
        display_name: Some(
            "Data",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 36,
                column: 3,
                offset: 423,
            },
        ),
    },
    Entity {
        kind: UnionDecl,
        display_name: Some(
            "union (unnamed at src/unnamed_items.c:39:1)",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 39,
                column: 1,
                offset: 450,
            },
        ),
    },
    Entity {
        kind: VarDecl,
        display_name: Some(
            "number",
        ),
        location: Some(
            SourceLocation {
                file: Some(
                    File {
                        path: "src/unnamed_items.c",
                    },
                ),
                line: 43,
                column: 3,
                offset: 489,
            },
        ),
    },
]

@pvdrz
Copy link
Contributor

pvdrz commented Apr 10, 2023

so what's happening is that LLVM 16 on windows is adding (unnamed at path:line:column) to all the unnamed items?

Edit: if that's the case, I think we can do a fix from the bindgen side but I'd need someone to be able to run tests for me.

@CGMossa
Copy link
Contributor

CGMossa commented Apr 10, 2023

No problem. If you're on discord https://discord.gg/dKPbdpgN or here is fine.

@Ilia-Kosenkov
Copy link
Author

Just to clarify a bit, when using LLVM 15, .get_name() returns None for unnamed items, but with LLVM 16 we get Some(elem_type (unnamed at path)). Since we relied on this mechanism to form allow list, this led to malformed CLI arguments and the error I posted above (we do not sanitize the names, and the extra space before the path breaks CLI). With this in mind, I guess the name of the issue is now misleading. @pvdrz, can you please rename it to something more appropriate that reflects the potential issue that we are facing?

@pvdrz pvdrz changed the title Broken --allowlist-* on Windows when switching to llvm >= 16.0.0 Unnamed types have a name on Windows when switching to llvm >= 16.0.0 Apr 11, 2023
@decathorpe
Copy link
Contributor

Looks like this is not only affecting Windows - at least on Fedora Linux 38, we also started to get strange build failures for Rust projects that use bindgen since the upgrade to LLVM / Clang 16 ...

For example, the onig_sys crate now fails to build with this error:

  thread 'main' panicked at '"OnigValue_struct_(unnamed_at_/usr/include/oniguruma_h_787_3)" is not a valid Ident', /usr/share/cargo/registry/proc-macro2-1.0.54/src/fallback.rs:811:9

@pvdrz
Copy link
Contributor

pvdrz commented Apr 11, 2023

The plot thickens :p

@pvdrz pvdrz changed the title Unnamed types have a name on Windows when switching to llvm >= 16.0.0 Unnamed types have a name on Windows (?) when switching to llvm >= 16.0.0 Apr 11, 2023
@cuviper
Copy link
Member

cuviper commented Apr 11, 2023

I bisected the new unnamed output to llvm/llvm-project@19e984e.

@pvdrz
Copy link
Contributor

pvdrz commented Apr 11, 2023

cc-ing @KyleMayes. Do you know if clang-sys is doing any special handling of this new case for clang 16?

@KyleMayes
Copy link
Contributor

KyleMayes commented Apr 11, 2023

No, clang-sys just defines and loads the functions.

From my experience attempting to write tests for clang-rs, the libclang API frequently changes behavior from version to version and, as you are seeing, even occasionally between different platforms on the same version of libclang.

@pvdrz pvdrz changed the title Unnamed types have a name on Windows (?) when switching to llvm >= 16.0.0 Unnamed types have a name when switching to llvm >= 16.0.0 Apr 11, 2023
@pvdrz
Copy link
Contributor

pvdrz commented Apr 11, 2023

Ok thank you all, I installed the latest version of LLVM on my machine and can definitely see an issue. I'll try to come up with a fix soon :)

Edit: surprise! so the issues I saw when compiling bindgen with LLVM 17 are related to C++ namespaces and not this issue. So we have to keep looking for it.

@pvdrz
Copy link
Contributor

pvdrz commented Apr 11, 2023

@decathorpe, I have some questions for you given that you're the only person that has reported this issue on Linux .

I took a look at your header file on github and based on the error you showed I narrowed the problematic code to: https://github.com/kkos/oniguruma/blob/078f95efd45ef5e266b67307e6b3c5ffc3de38ab/src/oniguruma.h#L785-L794

So inspired by that, I ran bindgen using clang 17 with the following input:

typedef union {
  long l;
  int c;
  struct {
    char* start;
    char* end;
  } s;
  void* p;
  int tag;
} OnigValue;

And couldn't reproduce the issue.

Could you confirm that this is the typedef that's causing the issue in your build?

@pvdrz
Copy link
Contributor

pvdrz commented Apr 11, 2023

Just to clarify a bit, when using LLVM 15, .get_name() returns None for unnamed items, but with LLVM 16 we get Some(elem_type (unnamed at path)). Since we relied on this mechanism to form allow list, this led to malformed CLI arguments and the error I posted above (we do not sanitize the names, and the extra space before the path breaks CLI). With this in mind, I guess the name of the issue is now misleading. @pvdrz, can you please rename it to something more appropriate that reflects the potential issue that we are facing?

Even though we not use clang-rs but clang-sys directly we do use clang_getCursorSpelling which is the function used by get_name inside clang-rs: https://docs.rs/clang/latest/src/clang/lib.rs.html#2086-2088

We have this function called spelling which is used to obtain the name of types among other things:

/// Get this cursor's referent's spelling.
pub(crate) fn spelling(&self) -> String {
unsafe { cxstring_into_string(clang_getCursorSpelling(self.x)) }
}

The "easy" fix would be to slap a contains("(unnamed at") and return an empty string in that case. However I'm a bit reluctant to do that because we use that function for several things and we would be doing this check in cases where is not necessary.

We would have to look which of the calls to spelling is the one reporting the name of the type first and I suspect I have the right one but I don't have any way to test it (see previous comment)

@cuviper
Copy link
Member

cuviper commented Apr 12, 2023

Hmm, onig_sys is using bindgen 0.59. I can reproduce the problem with that header up to bindgen 0.61, but 0.62 works. That changelog says, "Various issues with upcoming clang/libclang versions have been fixed." Looks like #2319 is right on the nose.

For @Ilia-Kosenkov using the clang crate, you can add a similar filter with Entity::is_anonymous. That needs at least the "clang_3_7" feature enabled.

@pvdrz
Copy link
Contributor

pvdrz commented Apr 12, 2023

OK nice, thanks @cuviper I'm closing this as a duplicated of #2312 then :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants