Get rid of libbfd ? #9551
Ha ! Guess who was reading the ELF specification yesterday evening... |
That's really the easy part, and why I didn't reimplement it myself a long time ago when I needed it. |
Note that the code for (Side note in the library linking proposal I'm also actually tempted to use C dll dependencies to solve the problem but I need to look at this a bit closer) |
At present, if the libbfd stuff ends up entirely removed, nothing remains of Mehdi's original, so I'd update the header (and perhaps note the original implementation somewhere). This could potentially be moved into the runtime itself, cross-referencing #9503 (in particular, #9503 (comment) and #9503 (comment)). It doesn't look as though it would take much more code for that to check for other symbol names as well? Could potentially have:
and no need for helper program at all? |
@dbuenzli - I don't know the detail of what you're referring to with C DLL dependencies, but note that one side effect of FlexDLL DLLs is that they cease having dependencies (cf. this OCaml-related Cygwin fix) |
Thanks @dra27 for the nice summary. I added bare-bones support for Mach-O. I will clean up the code (it is horrible right now) when I have the time later.
I haven't reviewed the code carefully, but this sounds extremely encouraging! The bytecode linker ( A couple of remarks on the implementation:
|
Thanks, and please hold off until it is cleaned up : ) I will try to do it in the next couple of days.
OK, I will try to get to this level of functionality.
Yes, this was for the experiment. I was planning to switch to using buffered IO (
Thanks, I will keep this in mind. |
I mean using the C dynamic linker dependencies to load
Same for Mach-O. Also I don't know if the compiler can be made to emit fat binaries, but if that is the case you have to be careful about that. And there's also the 32-bit story (in ELF as well). Here's a blog post that explains how to handle all these things for It seems @xavierleroy found another reason to have that done properly. But just in case this ends up being too much C code for upstream's taste, I'm reading the
37d7590 to ad539cd
After some encouraging experiments in C, I rewrote the code in OCaml. This means that we do not make use of the C headers that describe the different file formats, and instead just hard-code the different offsets, sizes, etc. The code should be both endian- and bitness-agnostic. Currently I expose three functions:
The code is lacking in error handling and other niceties, and needs more testing, but I wanted to gather feedback about the approach. What do you think? Currently there is code for ELF and Mach-O. I need to look how to integrate the FlexDLL backend which has a different model where one can Thanks! |
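For readers following along, here is a minimal sketch of what "endian- and bitness-agnostic" reading with hard-coded offsets can look like for ELF. It is not the code from this PR: the type and function names (`elf_ident`, `read_u32`, `read_word`) are made up for illustration. The only facts it relies on are from the ELF specification: byte 4 of the header (`EI_CLASS`) gives the word size and byte 5 (`EI_DATA`) gives the byte order.

```ocaml
(* Illustrative sketch only: these names are hypothetical, not the PR's API. *)
type endianness = Little | Big
type bitness = B32 | B64

(* Decode e_ident[EI_CLASS] (byte 4) and e_ident[EI_DATA] (byte 5). *)
let elf_ident (header : Bytes.t) : (bitness * endianness) option =
  if Bytes.length header < 6 || Bytes.sub_string header 0 4 <> "\x7fELF"
  then None
  else
    let bitness =
      match Bytes.get_uint8 header 4 with
      | 1 -> Some B32
      | 2 -> Some B64
      | _ -> None
    in
    let endianness =
      match Bytes.get_uint8 header 5 with
      | 1 -> Some Little
      | 2 -> Some Big
      | _ -> None
    in
    match bitness, endianness with
    | Some b, Some e -> Some (b, e)
    | _ -> None

(* Read an unsigned 32-bit field in the file's byte order, widened to int64. *)
let read_u32 (e : endianness) (buf : Bytes.t) (pos : int) : int64 =
  let v =
    match e with
    | Little -> Bytes.get_int32_le buf pos
    | Big -> Bytes.get_int32_be buf pos
  in
  Int64.logand (Int64.of_int32 v) 0xFFFF_FFFFL

(* Read an address-sized field: 4 bytes in ELF32 files, 8 bytes in ELF64. *)
let read_word (b : bitness) (e : endianness) (buf : Bytes.t) (pos : int) : int64 =
  match b, e with
  | B32, _ -> read_u32 e buf pos
  | B64, Little -> Bytes.get_int64_le buf pos
  | B64, Big -> Bytes.get_int64_be buf pos
```

Since every read goes through the byte order and word size found in the file itself, the same code can inspect a big-endian 32-bit object from a little-endian 64-bit host, which is what makes this approach convenient for cross-tools.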
I must say that I find this surprisingly nice. It's roughly 200 lines of code per backend, so not too bad, and we don't expect those formats to change very often (because the rest of the world would break as well). |
I like it too, especially the fact that it's in OCaml, since that makes cross-tools a breeze. |
I added support for PE/FlexDLL. This code is self-contained and does not actually depend on FlexDLL in any way, but it would have to be updated if FlexDLL ever changes the way it encodes the exported symbol table in PE images (the Moreover, now the object file format is auto-detected by looking at the file, independently of the executing architecture. Next I plan to experiment using this code for the link-time C primitive check done by |
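As an illustration of detecting the format "by looking at the file", here is a small sketch based on the well-known magic numbers of each format. This is an assumption about how such detection could be written, not the PR's actual code; `obj_format`, `is_pe` and `detect` are hypothetical names. The fat (universal) Mach-O magic is included because fat binaries came up earlier in the thread.

```ocaml
(* Illustrative sketch only: these names are hypothetical, not the PR's API. *)
type obj_format = Elf | Mach_o | Fat_mach_o | Pe | Unknown

(* A PE image starts with a DOS header ("MZ"); the little-endian 32-bit field
   at offset 0x3c points to the "PE\000\000" signature. *)
let is_pe (buf : Bytes.t) : bool =
  let len = Bytes.length buf in
  len >= 0x40
  && Bytes.sub_string buf 0 2 = "MZ"
  && (let off = Int32.to_int (Bytes.get_int32_le buf 0x3c) in
      off >= 0 && off + 4 <= len && Bytes.sub_string buf off 4 = "PE\000\000")

let detect (buf : Bytes.t) : obj_format =
  if Bytes.length buf < 4 then Unknown
  else
    match Bytes.sub_string buf 0 4 with
    | "\x7fELF" -> Elf
    (* 32-bit Mach-O, as stored by big- and little-endian producers *)
    | "\xfe\xed\xfa\xce" | "\xce\xfa\xed\xfe"
    (* 64-bit Mach-O, as stored by big- and little-endian producers *)
    | "\xfe\xed\xfa\xcf" | "\xcf\xfa\xed\xfe" -> Mach_o
    (* Fat header; note that Java class files share this magic number. *)
    | "\xca\xfe\xba\xbe" -> Fat_mach_o
    | _ -> if is_pe buf then Pe else Unknown
```

Nothing here depends on the host platform, so a Linux build of the tools could in principle classify a Windows or macOS DLL the same way.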
I had a discussion with @nojb this morning on the following points, which I think are worth raising here:
|
@let-def, this is probably the point where your opinion would be welcome :-) |
Just for completeness, I mentioned two other differences between this code and that library:
|
Not sure what I have to say :).

On refactoring: Some features in Owee are not really necessary (marker and traverse stuff), so it would make sense to remove them from the main library. Stuff for symbol indexing (the interval tree structure) could also be put in an auxiliary library, so that the main focus is just ELF and Mach-O parsing... The kind of indexes to build often depends on the task to achieve (debugging or linking).

On input abstraction: Owee doesn't require

On Mach-O: Mach-O support in Owee is really bad :P, and I don't have a macbook to test on.

A single, blessed library for working with binaries: I am happy to see that @nojb is doing good work on similar problems. I have no particular opinion on how things should be done, but it would be nice if we can converge on a single library. Ideally I want a low-level library that just deals efficiently with parsing and generating objects in various formats.
From a quick look, Owee is very much ELF+DWARF and tries to give access to all the information expressed in these formats. The present implementation has a much simpler API, extracting only the info needed by the OCaml compilers and tools, but is cross-platform. I'm afraid that trying to combine both use cases in a single code base will result in a libbfd-style monster.
Yeah, I wasn't thinking of simply a union, but rather a cut-down library for doing the basic executable file format access. As far as I understand it this is roughly what @let-def is proposing. |
bea70fb to 274edde
Thanks for the review! I will look into this shortly. |
We can of course filter out symbols with empty names, but on my machine such symbols also appear in the output of
I believe this issue should be fixed (both for ELF and Mach-O, FlexDLL should not be affected): imported symbols should now be skipped. |
To clarify this point: the first symbol of the ELF symbol table is always an undefined "NULL" symbol. This symbol is now filtered out because only defined symbols are considered (as per your second remark). However, there may be other symbols with empty names apart from this one and those continue to be considered if they are defined in the DLL. |
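To make the rule above concrete, here is a tiny sketch of the filtering, assuming the symbol table has already been decoded into a list of records. The `symbol` record and `defined_symbols` are hypothetical names, not taken from the PR; the only fact borrowed from the ELF specification is that an undefined symbol has section index `SHN_UNDEF` (0).

```ocaml
(* Illustrative sketch only: this record is hypothetical, not the PR's type. *)
type symbol = {
  name : string;   (* symbol name, possibly empty *)
  shndx : int;     (* st_shndx: index of the section defining the symbol *)
  value : int64;   (* st_value *)
}

(* SHN_UNDEF: the section index carried by undefined (imported) symbols. *)
let shn_undef = 0

(* Keep only symbols defined in this object. The initial "NULL" symbol is
   dropped because it is undefined, not because its name is empty. *)
let defined_symbols (syms : symbol list) : symbol list =
  List.filter (fun s -> s.shndx <> shn_undef) syms
```

With this rule, a defined symbol with an empty name is still reported, while imported symbols and the leading "NULL" entry are skipped.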
Following @xavierleroy's lead, I did large-scale testing of the ELF code with the bash script

```bash
#!/bin/bash
# Compare the dynamic symbols found by the OCaml code with those listed by readelf.
DLL="$1"
BASE=$(basename "$DLL")
echo "$BASE"
ocaml binutils.ml "$DLL" > "ocaml.$BASE.out"
# Drop undefined (UND) symbols and strip any @version suffix from the names.
readelf -W --dyn-syms "$DLL" | grep '[0-9]\+:' | grep -v UND | awk '{print $8}' | cut -d@ -f1 > "readelf.$BASE.out"
# Delete the outputs when they agree; keep them for inspection otherwise.
if git diff --exit-code --no-index "ocaml.$BASE.out" "readelf.$BASE.out"
then
  rm "ocaml.$BASE.out" "readelf.$BASE.out"
fi
```

I invoked the script with

(output available at https://gist.github.com/nojb/022a6b66fc962feaf426f1508c0685be). The script compared 1304 DLLs. There was a single diff, where the OCaml version "sees" 4 symbols that

I plan to conduct similar testing for Mach-O shortly.
I recommend two more things to test on:
|
Thanks, I will take a look. |
I did similar testing for Mach-O by comparing the output of the OCaml code with that of So on the whole the test results are quite encouraging. |
0a66f43 to b34c0e4
Rebased on trunk and cleaned up the history. Friendly ping @xavierleroy: can you think of anything else I could do on my side to help push this PR forward? |
I have not yet been able to review this in full detail; I just looked at the general structure, so it would be great if someone else could read this PR carefully.
This said, I very much like what I've seen so far.
The `Dll` module is clumsy in the way it combines the two usage modes `For_loading` and `For_checking`, but it just shows that `Dll` was a bad interface to begin with. A later PR could clean this up by having completely separate code paths for the "checking" use in ocamlc and for the "loading" mode in dynlink and the toplevel.
@xavierleroy do you mean you want a second reviewer? It's easy to miss since it's in the "hidden messages" here, but I made a careful (and long) review, cross-checking everything with the format specs, which resulted in:
|
@dbuenzli: what I wrote was unclear, sorry about that. I know that you carefully reviewed the new library that reads the ELF, Mach-O and PE formats. Combined with the testing done by @nojb, this is plenty; there is no need for additional reviewing. The part that I wanted to read in full detail but haven't been able to yet is the changes to the bytecode compiler to integrate the new library, esp. the Dll module and its various uses. If someone else can look at this too (or already has!), that would be even better.
I read it in more detail and it all looks good to me.
I added a Changes entry directly on @nojb's branch, and will now merge.
Thanks! |
This is a proof of concept of a single-minded reimplementation of the functionality from `libbfd` that is required for `ocamlobjinfo`: to retrieve the file offset of the data corresponding to the symbol `caml_plugin_header`.

This is just a proof of concept: for simplicity, the code only supports ELF, makes assumptions as to which section will contain the symbol in question, `mmap`s the file, dispenses with any error checking, etc.

I wanted to gather feedback to decide if we want to go down this path. I understand that apart from getting rid of the `libbfd` dependency this may be useful to the "library linking" project currently in progress.

Comments welcome!
xref #7001 #9306
cc @dbuenzli @whitequark
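For context, the core computation described above (going from a symbol to the file offset of its data) can be sketched as follows for ELF. This is only an illustration under stated assumptions, not the code in this PR: the `section` record and `file_offset_of_symbol` are hypothetical names, and the caller is assumed to have already located the section containing the symbol. For a shared object, `st_value` is a virtual address, so the offset within the section is `st_value - sh_addr`, and the file offset is that distance added to `sh_offset`.

```ocaml
(* Illustrative sketch only: these names are hypothetical, not the PR's API. *)
type section = {
  sh_addr : int64;    (* virtual address of the section when loaded *)
  sh_offset : int64;  (* offset of the section's data in the file *)
  sh_size : int64;    (* size of the section in bytes *)
}

(* Map a symbol's address (st_value) to a file offset, given the section that
   is assumed to contain it. Returns None if the address falls outside the
   section. Signed comparisons are used, which is fine for a sketch but would
   need care for addresses above 2^63. *)
let file_offset_of_symbol (sec : section) (st_value : int64) : int64 option =
  let rel = Int64.sub st_value sec.sh_addr in
  if Int64.compare rel 0L >= 0 && Int64.compare rel sec.sh_size < 0
  then Some (Int64.add sec.sh_offset rel)
  else None
```

A tool like `ocamlobjinfo` could then seek to the returned offset and read the data stored for `caml_plugin_header` from there.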