-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for archive and single bc file linking #70
Conversation
@MartinNowack Could you clarify two things
|
|
@MartinNowack I just tried this. There seem to be some performance regressions due to your approach. For example running the following on LLVM2.9
takes ~2.9 seconds. On LLVM3.3 the same command (you need the fix here to run the above in LLVM3.3) takes ~54 seconds on my machine. At a glance running the test manually it seems that a very large amount of time is being spent linking in |
I took another look. I'm not convinced the approach is right. IIRC when linking bitcode archives the constituent modules must be iterated over until a fixed point is reached (when the set of undefined symbols does not change). In your approach we only pass over the modules once. I've posted some questions to the llvm-dev mailing list because there are quite a few things here I don't quite understand. It occurs to me there might be a way to avoid these issues entirely but I do not know if it is possible. It would great if clang was setup to link against uclibc (I presume when compiling at the moment it assumes the use of glibc) and have it link in KLEE's other runtime bits. That way KLEE doesn't have to worry about linking in anything itself. |
Linking bitcode files is just simple kind of concatenating them (plus checking for name clashes or solving them in case different linking schemes were used). At the end, you have everything in one big module. Therefore, passing over the bitcode archive once is enough. You won't be able to solve more (but beside that, linking normal object files works differently - and yes, in that case order of object files plays a role and yes, you've to iterate it to get the fixed point). Currently, with gcc and clang, libc implementations are integrated, one can replace them but either one re-compiles the compiler or changes build flags/system in the target application. I'm not quite sure if it makes it easier to use KLEE by modifying essential parts of the compiler. I would rather like to stick with KLEE being able to do it and using the full power of underlying LLVM tools. |
Ah so the approach is valid then. But clearly we are doing much more work than we did previously (looking at the implementation of
Okay. It was just a thought. It would certainly simplify some parts of KLEE (e.g. no longer necessary to make uclibc's main the entry point) but I guess modifying clang is non-trivial. |
A good workaround for uclibc is ccadar/klee-uclibc#6. Maybe, over time, LLVM infrastructure becomes better to handle that linking problem in a more sophisticated way. Currently, I would go for that patch because the features it provides is en par with the pre-llvm 3.3 version. One guess why it was faster before llvm 3.3 is that the symbol index table of archives was used to speed up the process. |
@MartinNowack Thanks. I'm currently investigating how easy it would be to re-implement the fixed point approach for linking in bitcode archives. If that proves to be too difficult/complicated I'll go for your approach instead. |
@MartinNowack Sorry I've only just got round to looking at this again. I'm going to try to implementing a slightly different approach for bitcode archives today/tomorrow. Just to warn you I just rebased the branch on current master. |
My previous commit improves linking substantially for LLVM3.3 but unfortunately it's still not as fast as it was in LLVM2.9 Before 6f3973a (LLVM3.3)
For 6f3973a (LLVM3.3)
and for LLVM2.9 (unaffected by my commit)
I am also slightly worried with my implementation too because it doesn't consider trying to find a strong definition for a defined weak symbol. The old algorithm in LLVM3.2 didn't seem to do this either, so maybe it's okay?? I'd say there's lots of things about my implementation that could be improved (I quickly tried lazy loading the bitcode modules but it seems the ValueSymbolTable doesn't have the necessary symbols in it when I do this) but this is probably an okay start. @ccadar @MartinNowack Thoughts? |
Sorry I just realise I gave invalid times for KLEE built against LLVM3.3 because I disabled optimisations (i.e. Before 6f3973a
For 6f3973a
So with these new test runs performance is closer to what we have in LLVM2.9 . It's not quite there but I would so this is "good enough" for now. The only thing I'd like to fix is that linker may include in its list of undefined symbols the KLEE special functions (e.g. |
@ccadar recommended that I test @MartinNowack workaround too. The runtime on my machine when using Martin's patch to klee-uclibc with KLEE at 302b6cd
This is certainly better performance than KLEE at 302b6cd with an unpatched version of klee-uclibc but KLEE at 6f3973a with an unpatched version of klee-uclibc looks faster. |
With LLVM 3.3 the linker does not support reading of archive files directly. This brings the support back (based on llvm-mn). Furthermore, linking single bc files or archives with bc and object files mixed is supported as well.
LLVM >= 3.3 by effectively reimplementing the linking algorithm used in LLVM <= 3.2. The LLVM specific bitcode archive format has been removed from LLVM >= 3.3 . Now archives are normal system archives that can contain LLVM bitcode modules as well as regular binary object files. The previous commit implemented an approach where ALL the bitcode modules get linked in which can be terribly slow when klee-uclibc gets linked (~600 LLVM modules). Here are the options that I considered to address this: * Use LD with LLVM gold plug-in and call as an external program. I Don't really want to add another dependency to KLEE. It already has enough! * Use the upcomming LLVM linker (lld). Not really an option because at the time of writing there is no support for linking archives of bitcode modules. * Don't use archives at all and just work with modules (i.e. replace uses of llvm-ar with llvm-link and tinker with the flags a little). This isn't so great because the resulting LLVM bitcode module we execute is bigger than it should be. * Reimpelent bitcode archive linking ourselves in a slightly better way. I've gone for the last option This implementation unfortunately loads all bitcode modules into memory first so we can query the module symbols tables. I would prefer to read the archive's index and link in modules on demand but unfortunately although the new Object::Archive interface in LLVM allows iteration over symbols it doesn't provide a way of knowing if that symbol is defined/undefined. This implementation is far from perfect!
bitcode archive linker.
that clients can access HandlerInfo nicely.
KLEE intrinsics as undefined symbols
Iterators get invalidated after elements of std::vector/set are deleted. Avoid this by remembering which elements need to be deleted and do it after iterating over the data structure.
I went through the code. Major issues were the iterators - strange that the test suite ran through. (By the way using the new llvm-lit is great!!) Unoptimized - LLVM 3
|
@MartinNowack Thanks for taking a look. Looking at my code there is at least one bug (in my implementation) in
This breaks out of the loop over a modules symbols but then the iterator gets incremented again (so we unintentionally skip a module, this shouldn't cause any problems apart from making linking very inefficient. Looking at
was legal.? If you look at this it says...
So I think doing I admit this sort of code is hard to read and a little fragile so it's probably better to go for your approach. |
because "RemovedSymbols" implies that the symbols have already been removed which is misleading because we don't remove until the end.
then clean up is performed.
of modules left because this information is no longer correct (we no longer shrink the vector).
I've cleaned up a few things but it looks good. I'm going to go ahead and merge. |
Add support for archive and single bc file linking
I'd like to bring this up again. I'm working on adding support for FreeBSD libc as KLEE libc and stumbled upon this very deficiency:
Before hacking in I want to make sure we really need current complex linker implementation. From what I read, this "link one by one until a fixed point" approach is used for performance reasons, am I right? |
With LLVM 3.3 the linker does not support reading of
archive files directly. This brings the support back
(based on llvm-mn).
Furthermore, linking single bc files or archives with
bc and object files mixed is supported as well.
@delcypher: we will need that patch to use uclibc for llvm 3.3.