Fuzz llvm-as #25013
Comments
This bug is to track bugs found when fuzzing llvm-as. Dependent bugs are bugs found by afl-fuzz, or a lib/Fuzzer version of llvm-as. |
Committed lib/Fuzzer version of llvm-as (i.e. llvm-as-fuzzer). Revision: http://reviews.llvm.org/rL246458 |
The bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
Currently, it fails instantly due to bug 24640.
The corpus for the bot is https://github.com/kcc/fuzzing-with-sanitizers/tree/master/llvm/llvm-as/C1
Later on we may want to extend the corpus with more valid inputs. |
I've extended the corpus by adding all .ll files from the LLVM test suite. Hitting several asserts so far, e.g.
lib/IR/Globals.cpp:209: void llvm::GlobalVariable::setInitializer(llvm::Constant *): Assertion `InitVal->getType() == getType()->getElementType() && "Initializer type must match GlobalVariable type"' failed.
llvm/include/llvm/Support/Casting.h:237: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::Function, Y = llvm::GlobalValue]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. |
The current llvm-as-fuzzer.cpp leaks a bit of memory every time:
Direct leak of 76 byte(s) in 1 object(s) allocated from:
This is because the code in llvm::report_fatal_error calls this:
void MyFatalErrorHandler(void *user_data, const std::string& reason,
We create a string object and then long-jump over its DTOR, so the string's memory is never freed. This is not a blocker at this point (we can always restart the process after processing e.g. 100M units w/o losing much speed), but ideally we need to fix this. |
With max_len=512 I start seeing more interesting things, like bug 24661 |
After a night of fuzzing I've got ~15 unique assertion failures: llvm/include/llvm/IR/DebugInfoMetadata.h:60: llvm::TypedDINodeRef<llvm::DIType>::TypedDINodeRef(const llvm::Metadata *) [T = llvm::DIType]: Assertion |
Attached are 12 base64-encoded reproducers for 12 different assertion failures. |
I've added a dictionary mode to libFuzzer, similar to that in AFL. |
There is a problem with the current llvm-as-fuzzer -- it accumulates some One part of the problem is that we use the global context. |
parasitic-coverage-repro The more times we parse the same input the more coverage we get. The new coverage comes from e.g. here: So far I failed to understand why.... |
r248556 effectively disables llvm-as-fuzzer on the fuzzer bot because Karl, I wonder if you have time to work on b)? |
I'll look into both, and will focus on the parasitic coverage. |
I believe the problem is due to a FoldingSet used to hold intrinsic functions. A folding set is a hashtable implemented on a SmallVector, using intrusive links to define the list of elements in a bucket. Every time a function (declaration/definition) is parsed, a lookup is done (in the create method). It either returns a pointer (into the hashtable) to the corresponding definition, or it creates a spot for a new one and returns the corresponding pointer so that it can be initialized. When parsing is done, the table is cleared (but not resized). Hence, on the second parse, the table does not need to grow, since the first parse already caused it to grow. This explains the first growth found in comment #11, but not the second. Still looking into this. |
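To make the "cleared but not resized" effect concrete, here is a minimal sketch. It assumes a toy chained hash table (`ToyTable`, `parseOnce`, and the growth policy are all hypothetical stand-ins, not LLVM's FoldingSet code). The grow path runs only during the first "parse", so the two parses of the same input execute different code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a FoldingSet-like hash table (hypothetical, not LLVM code).
class ToyTable {
public:
  ToyTable() : buckets_(4) {}

  // Inserts a key, doubling the bucket array when the load factor exceeds 2.
  void insert(unsigned key) {
    assert(!buckets_.empty());
    if (size_ + 1 > buckets_.size() * 2)
      grow();
    buckets_[key % buckets_.size()].push_back(key);
    ++size_;
  }

  // Drops all elements but keeps the current number of buckets.
  void clear() {
    for (auto &bucket : buckets_)
      bucket.clear();
    size_ = 0;
  }

  std::size_t growCount() const { return growCount_; }

private:
  // Rehashes every key into a bucket array twice as large.
  void grow() {
    std::vector<std::vector<unsigned>> bigger(buckets_.size() * 2);
    for (const auto &bucket : buckets_)
      for (unsigned key : bucket)
        bigger[key % bigger.size()].push_back(key);
    buckets_.swap(bigger);
    ++growCount_;
  }

  std::vector<std::vector<unsigned>> buckets_;
  std::size_t size_ = 0;
  std::size_t growCount_ = 0;
};

// Simulates one parse: insert numFunctions keys, then clear the table.
// Returns how many times the table had to grow during this parse.
std::size_t parseOnce(ToyTable &table, unsigned numFunctions) {
  std::size_t before = table.growCount();
  for (unsigned i = 0; i < numFunctions; ++i)
    table.insert(i);
  table.clear();
  return table.growCount() - before;
}
```

Calling `parseOnce` twice on the same table with the same input reports growth only for the first call. A coverage-guided fuzzer would see this as a difference in covered code between two runs of an identical input.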
I looked at the parasitic growth in comment #11 some more. I'm convinced that the problem is that the "type signature" is encoded as part of the key in the folding set, and the "type signature" is a pointer to a type (which has been uniqued). This uniquifying of types happens on each iteration, using heap-allocated elements. As a result, different iterations will have different hash values, so collisions occur at random and the bucket lists change between iterations. I don't see a way to get rid of this effect. It is ingrained in the guts of LLVM to use dynamically allocated type addresses to uniquely identify types. |
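A small sketch of why address-derived keys reshuffle the bucket lists. The hash function, the bucket count, and the addresses are made up for illustration (plain integers stand in for heap addresses so the example is deterministic). It is not LLVM's actual hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash: derive a bucket index from an object's address.
std::size_t bucketFor(std::uintptr_t address, std::size_t numBuckets) {
  assert(numBuckets > 0);
  return (address >> 4) % numBuckets;
}

// Counts how many of the given addresses land in each bucket.
std::vector<std::size_t> occupancy(const std::vector<std::uintptr_t> &addresses,
                                   std::size_t numBuckets) {
  std::vector<std::size_t> counts(numBuckets, 0);
  for (std::uintptr_t address : addresses)
    ++counts[bucketFor(address, numBuckets)];
  return counts;
}
```

With four buckets, the same three logical types placed at the simulated addresses `0x1000, 0x1010, 0x1020` spread over three buckets, while at `0x1000, 0x1040, 0x1080` they all collide in one. The input is identical, but the collision chains that get walked differ between iterations.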
I am ignorant about this part of LLVM: can you point to the exact code & objects? |
It's in the LLVMContext (I think). When the constructor Function::Function() is called, somehow the field IntID is set, and this value then causes a lookup. It then calls lookupIntrinsicID (in Function.cpp). This lookup then looks in some table that is a folding set for the intrinsic. This table has entries that contain things other than just intrinsics. The elements are VERY generic in FoldingSet, and all I know is that the contents of the other elements vary between iterations. I'm guessing this is some sort of symbol table, and I'm guessing that these other entries have types. However, I haven't convinced myself of that because:
BTW, it is very hard to figure out what is going on because much of the driving code is automatically generated code, and the abstractions are totally gone when you look at the low-level code that is actually being run in the debugger. |
I looked a bit deeper, and I think the cause of the problem is the uniquifying of "attribute sets". The method AttributeSetNode::get(LLVMContext &C, ArrayRef<Attribute> Attrs) looks for the attribute set in C.pImpl to see if the attribute set is already defined. If it is found, it returns the existing node; otherwise it creates a new (heap-allocated) AttributeSetNode. The problem is how nodes are "profiled" (i.e. the Profile method used to compute the hash on the unique bits of the data). The class AttributeSetImpl defines Profile in terms of the method getNode, which returns a pair<unsigned, AttributeSetNode *> for each element. The ID is computed by adding these two values using ID.AddInteger() and ID.AddPointer(). If you look up the definition of the method FoldingSetNodeID::AddPointer(const void* Ptr), it simply adds the pointer to the sequence of bits in the node ID. Hence, depending on dynamic allocation, you will get different hash values. This explains why we get different hash tables on different iterations. |
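The profiling behaviour described above can be sketched as follows. `ToyID`, `ToyNode`, and `profileSlot` are hypothetical names for this illustration; only the idea of appending an integer and then raw pointer bits to the ID is taken from the comment, not the real FoldingSetNodeID implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a node ID: a flat sequence of words to be hashed.
struct ToyID {
  std::vector<std::uint64_t> bits;

  void addInteger(unsigned value) { bits.push_back(value); }

  // Appends the raw address, so the ID depends on where the object lives.
  void addPointer(const void *ptr) {
    bits.push_back(static_cast<std::uint64_t>(
        reinterpret_cast<std::uintptr_t>(ptr)));
  }
};

// Hypothetical stand-in for a uniqued attribute-set node.
struct ToyNode {
  std::vector<int> attributes;
};

// Profiles one (index, node pointer) pair the way the comment describes:
// the index as an integer, followed by the node's address.
ToyID profileSlot(unsigned index, const ToyNode *node) {
  assert(node != nullptr);
  ToyID id;
  id.addInteger(index);
  id.addPointer(node);
  return id;
}
```

Two nodes with identical contents but different addresses produce IDs whose integer parts agree and whose pointer parts differ. Any hash computed over the ID therefore changes whenever the allocator hands out different addresses.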
mentioned in issue #25014 |
mentioned in issue #25018 |
mentioned in issue #25019 |
mentioned in issue #25020 |
mentioned in issue #25031 |
mentioned in issue #25030 |
mentioned in issue #25035 |
mentioned in issue #25036 |