-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix build for upstream LLVM and clang #210
Conversation
This works for me with an llvm+clang build, and I also tested on llvm 5 from packages, and it still worked. @drzaeus77 @yonghong-song does this work for you too? |
@drzaeus77 @yonghong-song can someone else please confirm this works so we can merge this PR? |
ping @drzaeus77 @yonghong-song can you check if this patch fixed the issue? |
d17248c
to
ad16842
Compare
Rebased and added Travis tests for LLVM 8. Three tests are failing. Apparently anonymous structs are broken with clang 8:
We should probably fix this before landing this PR. |
Apparently |
Yes, Let's use the struct below as an example: struct Foo {
struct {
int x;
} bar;
} If we call Interestingly though, both versions return true for indirect structs (example below). struct Foo {
struct {
int x;
};
} In theory, the new clang behavior is correct (in the first example I'm not familiar with clang API, so I don't know if there's a way to check if a struct has indirect access. A workaround we could implement on bpftrace side is to make our lexer give a name to all anonymous structs with direct access. For example, |
Can you split the source change and
For people who bison generated
An alternative is to check if |
Patch taken from upstream PR: bpftrace/bpftrace#210 Copyright: Sony Interactive Entertainment Inc. Package-Manager: Portage-2.3.62, Repoman-2.3.12 Signed-off-by: Patrick McLean <chutzpah@gentoo.org>
I can, but the failing tests are uncovering a real bug. Not sure if it's a good idea to fix the build if the resulting binary will not be fully functional. |
Unless libclang has a clear, reliable way to determine if a struct is indirect, the code will get pretty messy and fragile. I'm close to make it work, but I'm not liking the result. @ajor we only use libclang for union/struct parsing, right? Do you think it would be feasible to move union/struct parsing to bpftrace and drop libclang as a dependency? This would also solve #266 and #139 (comment). I'm willing to work on this. |
ad16842
to
39d5a02
Compare
I did try to do my own struct parsing at first before turning to libclang. I think it would be a lot of work to do properly:
and probably more |
Yes, it would be laborious. OTOH, we are introducing some ugly hacks and workarounds because of libclang (this PR and #476), and we are unable to build bpftrace statically outside of alpine because of libclang as a dependency. I'm not saying we must remove libclang and go with our own implementation, but using libclang is not pain-free, and having our own implementation could be more maintainable in the long-term. |
39d5a02
to
dc4a4cf
Compare
Updated the message on my last commit to explain the logic behind the changes. This is good from my side, just need some reviews. |
For me, this looks good. Builds and passes all tests on clang8.
Two warnings introduced (below), fixed with:
|
|
LLVM changed the behavior of clang_Cursor_isAnonymous in [1]. The old behavior would returned false for direct-acccess anonymous structs within other structs, whereas LLVM 8 returns true. To circumvent this behavior change among LLVM versions, we keep track of all structs defined within a struct. We don't parse the substruct recursively (if we do it might be parsed multiple times, and since we don't know yet if this is a direct or indirect strucutre, we might parse it incorrectly), instead we keep the cursor saved in a map. If this substruct is later declared as an attribute of the supersctruct, that means we have a direct-accessible struct. We remove it from our map, and we parse recursively (parsing recursively the cursor pointing to the declaration will effectively parse the struct definition). After the first parse, any struct left in our map is an indirect struct. Since we skipped the parsing stage for those, we need to run `clang_visitChildren` again for each struct cursor we have saved. We repeat this until there's no unvisited structs in the map. Keep in mind that while visiting a new struct we might find more indirect structs. Also add Travis jobs to test against LLVM and clang 8 on Ubuntu. [1]: llvm/llvm-project@c05e6f4
dc4a4cf
to
30eaeb7
Compare
if (struct_name == "") | ||
struct_name = ptypestr; | ||
remove_struct_prefix(struct_name); | ||
return CXChildVisit_Continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we returned CXChildVisit_Recurse here can we get rid of the looping logic and recording univisited indirect structs? Sorry, my build is still broken so I can't check myself (LLVM 8 wasn't the only problem on my computer apparently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because at this point we don't know if this struct is just being defined (indirect struct) or if it will be used as the type for a field of the parent struct (direct struct). The cursor is pointing to the struct declaration, not in the field declaration.
Look at the struct below (the current clang cursor is what's inside <
and >
:
struct Foo {
int field1;
<struct {
int field2;
}> field3;
char field4;
}
In this case, we want to be able to access field2
as Foo.field3.field2
. But we don't know if this is a direct or indirect struct, and we'll only know once we move our cursor to forward, in which case we'll be at:
struct Foo {
int field1;
struct {
int field2;
} <field3>;
char field4;
}
Same thing goes to indirect structs, we'll only know how the access will work once we move the cursor.
Older versions of LLVM would differentiate those two access types with clang_Cursor_isAnonymous, but from what I can tell this was a bug which was fixed on LLVM 8. Keeping track of inner structs and deferring their parsing until we have enough information was the only way I could find to keep things working across LLVM versions.
auto type = clang_getCanonicalType(clang_getCursorType(c)); | ||
auto typestr = get_clang_string(clang_getTypeSpelling(type)); | ||
|
||
if (indirect_structs.count(typestr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would we hit this? Could you add a comment as it's not immediately clear to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the example above, would be in this case:
struct Foo {
int field1;
struct {
int field2;
} <field3>;
char field4;
}
We saved the possibly indirect struct in our previous step, and then when we moved the cursor we realized that we were actually declaring a field of the parent struct. In this case we don't want to treat the struct as indirect, so we remove it from our list of indirect structs.
Looking at this again, this seems to be incomplete, as we'll never parse the unnamed struct in that example (unless I'm missing something). I'll investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it works, but I don't know why 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have added waaaay more comments to this code :)
if (struct_name == "") | ||
struct_name = ptypestr; | ||
remove_struct_prefix(struct_name); | ||
return CXChildVisit_Continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because at this point we don't know if this struct is just being defined (indirect struct) or if it will be used as the type for a field of the parent struct (direct struct). The cursor is pointing to the struct declaration, not in the field declaration.
Look at the struct below (the current clang cursor is what's inside <
and >
:
struct Foo {
int field1;
<struct {
int field2;
}> field3;
char field4;
}
In this case, we want to be able to access field2
as Foo.field3.field2
. But we don't know if this is a direct or indirect struct, and we'll only know once we move our cursor to forward, in which case we'll be at:
struct Foo {
int field1;
struct {
int field2;
} <field3>;
char field4;
}
Same thing goes to indirect structs, we'll only know how the access will work once we move the cursor.
Older versions of LLVM would differentiate those two access types with clang_Cursor_isAnonymous, but from what I can tell this was a bug which was fixed on LLVM 8. Keeping track of inner structs and deferring their parsing until we have enough information was the only way I could find to keep things working across LLVM versions.
auto type = clang_getCanonicalType(clang_getCursorType(c)); | ||
auto typestr = get_clang_string(clang_getTypeSpelling(type)); | ||
|
||
if (indirect_structs.count(typestr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the example above, would be in this case:
struct Foo {
int field1;
struct {
int field2;
} <field3>;
char field4;
}
We saved the possibly indirect struct in our previous step, and then when we moved the cursor we realized that we were actually declaring a field of the parent struct. In this case we don't want to treat the struct as indirect, so we remove it from our list of indirect structs.
Looking at this again, this seems to be incomplete, as we'll never parse the unnamed struct in that example (unless I'm missing something). I'll investigate.
Ok, I figured it out. When we move the cursor forward we'll be holding the struct definition and the field declaration. The cursor type will be FieldDecl and when we return CXChildVisit_Recurse we'll parse the inner struct. If we pass So the flow looks like this: Direct struct accessStep 1struct Foo {
int field1;
<struct {
int field2;
}> field3;
char field4;
} In this case, we want to be able to access Step 2struct Foo {
int field1;
<struct {
int field2;
} field3>;
char field4;
} Now we have all the information we need. We know this is a field declaration and we don't want to treat the struct as indirect, so we remove it from the indirect structs list. We'll follow the most common code path here, and at the end we'll return Step 3struct Foo {
int field1;
struct {
<int field2>;
} field3;
char field4;
} We know we are inside a struct used in a field declaration, so we'll assign Indirect struct accessStep 1struct Foo {
int field1;
<struct {
int field2;
}>;
char field4;
} In this case, we want to be able to access Step 2struct Foo {
int field1;
struct {
int field2;
};
<char field4>;
} We'll keep advancing our cursor until we reach the end of our file. At that point we know the fields of our struct will be indirectly accessed, but we havent't parsed the anonymous struct yet. So we start parsing again, passing the cursor we saved on step 1 as starting point. Note that we'll only parse the children of our struct, so we are not parsing the entire file again. Step 3struct Foo {
int field1;
struct {
<int field2>;
};
char field4;
} We know we are inside a struct which was not used in a field declaration, so we'll assign
|
No description provided.