Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of libbfd ? #9551

Merged
merged 8 commits into from Jul 14, 2020
Merged

Get rid of libbfd ? #9551

merged 8 commits into from Jul 14, 2020

Conversation

nojb
Copy link
Contributor

@nojb nojb commented May 11, 2020

This is a proof of concept of a single-minded reimplementation of the functionality from libbfd that is required for ocamlobjinfo: to retrieve the file offset of the data corresponding to the symbol caml_plugin_header.

This is just a proof of concept: for simplicity, the code only supports ELF, makes assumption as to which section will contain the symbol in question, uses mmap the file, dispenses of any error checking, etc.

I wanted to gather feedback to decide if we want to go down this path. I understand that apart from getting rid of the libbfd dependency this may be useful to the "library linking" project currently in progress.

Comments welcome!

xref #7001 #9306

cc @dbuenzli @whitequark

@dbuenzli
Copy link
Contributor

Ha ! Guess who was reading the ELF specification yesterday evening...

@whitequark
Copy link
Member

the code only supports ELF,

That's really the easy part, and why I didn't reimplement it myself a long time ago when I needed it.

@dbuenzli
Copy link
Contributor

Note that the code for PE should already be handled by flexdll where you can open a library without running code. So what would remain is mach-o.

(Side note in the library linking proposal I'm also actually tempted to use C dll dependencies to solve the problem but I need to look at this a bit closer)

@dra27
Copy link
Member

dra27 commented May 13, 2020

At present, if the libbfd stuff ends up entirely removed, nothing remains of Mehdi's original, so I'd update the header (and perhaps note somewhere the original implementation).

This could potentially be moved into the runtime itself, cross-referencing #9503 (in particular, #9503 (comment) and #9503 (comment)). It doesn't look as though it would be terribly much more code for that to check for other symbols names as well? Could potentially have:

  • ELF: directly supported
  • Mach-O: directly supported
  • Windows: use flexdll in the appropriate mode (already in the runtime for the byte-linking check)
  • libbfd: if available, for exotic Unix?
  • dlopen: the current behaviour

and no need for helper program at all?

@dra27
Copy link
Member

dra27 commented May 13, 2020

@dbuenzli - I don't know the detail of what you're referring to with C DLL dependencies, but note that one side effect of FlexDLL DLLs is that they cease having dependencies (cf. this OCaml-related Cygwin fix)

@nojb
Copy link
Contributor Author

nojb commented May 13, 2020

Thanks @dra27 for the nice summary. I added bare-bones support for Mach-O. I will clean-up the code (it is horrible right now) when I have the time later.

@xavierleroy
Copy link
Contributor

I haven't reviewed the code carefully, but this sound extremely encouraging!

The bytecode linker (ocamlc) could use that kind of code to check that a set of DLLs provide all the external C functions referenced from the bytecode. Currently this is done by dlopen all the DLLs, but that's dangerous. See https://github.com/ocaml/ocaml/blob/trunk/bytecomp/dll.ml . All we need is a test "does this DLL define this symbol?".

A couple of remarks on the implementation:

  • Be careful with mmap() ed files in C. Every access needs to be bound-checked, otherwise it just segfaults.
  • Planning ahead for cross-compilation, the endianness of the ELF shared object can be different from that of the host.

@nojb
Copy link
Contributor Author

nojb commented May 13, 2020

I haven't reviewed the code carefully, but this sound extremely encouraging!

Thanks, and please hold off until it is cleaned up : ) I will try to do it in the next couple of days.

The bytecode linker (ocamlc) could use that kind of code to check that a set of DLLs provide all the external C functions referenced from the bytecode. Currently this is done by dlopen all the DLLs, but that's dangerous. See https://github.com/ocaml/ocaml/blob/trunk/bytecomp/dll.ml . All we need is a test "does this DLL define this symbol?".

OK, I will try to get to this level of functionality.

A couple of remarks on the implementation:

  • Be careful with mmap() ed files in C. Every access needs to be bound-checked, otherwise it just segfaults.

Yes, this was for the experiment. I was planning to switch to using buffered IO (FILE *) instead.

  • Planning ahead for cross-compilation, the endianness of the ELF shared object can be different from that of the host.

Thanks, I will keep this in mind.

@dbuenzli
Copy link
Contributor

@dbuenzli - I don't know the detail of what you're referring to with C DLL dependencies

I mean using the C dynamic linker dependencies to load cmxs and their dependencies in the right order. This wouldn't solve the problem of accessing caml_plugin for ocamlobjinfo it would solve the problem for the Dynlink API in the library linking proposal.

  • Planning ahead for cross-compilation, the endianness of the ELF shared object can be different from that of the host.

Same for mach-o. Also I don't know if the compiler can be made to emit fat binaries but if that is the case you have to be careful about that. And there's also the 32-bit story (in ELF aswel). Here's a blog post that explains how to handle all these things for mach-o.

It seems @xavierleroy found another reason to have that done properly. But just in case this ends up being too much C code to upstreams' taste, I'm reading the nix phd thesis these days and one thing they do with quite some success is to look for dependencies by searching binaries for the cryptographic hashes that get embedded in them. So I thought maybe a quick and dirty way to locate the data of caml_plugin, or as a fallback for unsupported platforms, would be to prefix the data with a sufficiently long uuid to be searched in the binary (doesn't work though if your executables can get compressed or encrypted); a bit brutal but may work well in practice.

@nojb nojb force-pushed the microbfd branch 4 times, most recently from 37d7590 to ad539cd Compare May 16, 2020 21:57
@nojb
Copy link
Contributor Author

nojb commented May 16, 2020

After some encouraging experiments in C, I rewrote the code in OCaml. This means that we do not make use of the C headers that describe the different file formats, and instead just hard-code the different offsets, sizes, etc. The code should be both endian- and bitness-agnostic.

Currently I expose three functions:

  • retrieve the file offset of the data corresponding to a symbol (this is what is used in ocamlobjinfo)
  • retrieve the list of exported symbols (I understand this is what is needed for the link-time check done by ocamlc to see that all C primitives are available).
  • adapt a symbol (eg add an underscore in the case of Mach-O)

The code is lacking in error handling and other niceties, and needs more testing, but I wanted to gather feedback about the approach. What do you think?

Currently there is code for ELF and Mach-O. I need to look how to integrate the FlexDLL backend which has a different model where one can dlopen a shared object without "loading" it.

Thanks!

@gasche
Copy link
Member

gasche commented May 17, 2020

I must say that I find this surprisingly nice. It's roughly 200 lines of code per backend, so not too bad, and we don't expect those formats to change very often (because the rest of the world would break as well).

@whitequark
Copy link
Member

I like it too, especially the fact that it's in OCaml, since that makes cross-tools a breeze.

@nojb
Copy link
Contributor Author

nojb commented May 17, 2020

I added support for PE/FlexDLL. This code is self-contained and does not actually depend on FlexDLL in any way, but it would have to be updated if FlexDLL ever changes the way it encodes the exported symbol table in PE images (the .exptbl section).

Moreover, now the object file format is auto-detected by looking at the file, independently of the executing architecture.

Next I plan to experiment using this code for the link-time C primitive check done by ocamlc (as suggested by @xavierleroy above).

@mshinwell
Copy link
Contributor

I had a discussion with @nojb this morning on the following points, which I think are worth raising here:

  • There is already an OCaml library, called owee (by @let-def ), which provides functionality for reading executable files.
  • It seems a shame to end up with two implementations of code that is highly platform-dependent and seems reasonably likely to require some level of ongoing maintenance.
  • @nojb tells me there is a problem because owee currently uses mmap and that isn't available in the standard library.
  • I suggested that maybe a new small library could be formed from the code here and in owee. This could then be vendored into the compiler tree, preserving our requirement that the tree builds standalone.
  • Forming the code into a library would hopefully mean that compiler developers benefit from external contributions to such library; and likewise, external tools benefit from compiler developers' contributions. (I have two examples of soon-to-be-announced tools both of which use or will use owee, both from @gretay-js : the FDO feedback-directed optimiser; and an implementation of low-overhead user-space probes, providing functionality similar to USDT probes. There are no doubt others too.)

@gasche
Copy link
Member

gasche commented May 18, 2020

@let-def, this is probably the point where your opinion would be welcome :-)

@nojb
Copy link
Contributor Author

nojb commented May 18, 2020

Just for completeness, I mentioned two other differences between this code and that library:

  • owee uses mmap (as mentioned)
  • if I am not mistaken, owee seems to handle only 64-bit little-endian files
  • this code also supports PE/FlexDLL

@let-def
Copy link
Contributor

let-def commented May 18, 2020

Not sure what I have to say :).

On refactoring: Some features in Owee are not really necessary (marker and traverse stuff), so it would make sense to remove them from the main library. Stuff for symbol indexing (the interval tree structure) could also be put in an auxiliary library, so that the main focus is just ELF and Mach-O parsing... The kind of indexes to build often depends on the task to achieve (debugging or linking).

On input abstraction: Owee doesn't require mmap (one module uses it to map the program binary in the memory space of the process, in a way that is already linux specific). However it does all I/O through a Bigarray (for the lack of a better abstraction to represent array of bytes with random access).

On Mach-O: Mach-O support in Owee is really bad :P, and I don't have a macbook to test on.

A single, blessed library for working with binaries: I am happy to see that @nojb is doing good work on similar problems. I have no particular opinion on how things should be done, but it would be nice if we can converge on a single library. Ideally I want a low-level library that just deals efficiently with parsing and generating objects in various formats.

@xavierleroy
Copy link
Contributor

From a quick look, owee is very much ELF+DWARF and tries to give access to all the information expressed in these formats. The present implementation has a much simpler API, extracting only the info needed by the OCaml compilers and tools, but is cross-platform. I'm afraid that trying to combine both uses cases in a single code base will result in a libbfd-style monster.

@mshinwell
Copy link
Contributor

Yeah, I wasn't thinking of simply a union, but rather a cut-down library for doing the basic executable file format access. As far as I understand it this is roughly what @let-def is proposing.

@nojb nojb force-pushed the microbfd branch 4 times, most recently from bea70fb to 274edde Compare May 19, 2020 11:39
@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

More annoyingly: the Binutils functions make no difference between symbols defined by the DLL and symbols imported by the DLL. Binutils.defines_symbol returns true for imported symbols, and Binutils.symbol_offset x returns Some 0L. If we want to use this code in the ocamlc bytecode linker, to check that C stub functions are defined in the provided DLLs, we need to treat defined and imported symbols differently.

Thanks for the review! I will look into this shortly.

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

One oddity: the empty string is always defined, i.e. Binutils.defines_symbol x ""always returns true, and Binutils.symbol_offset x "" always returns Some 0L.

We can of course filter out symbols with empty names, but in my machine such symbols also appear in the output of readelf -W --dyn-syms.

More annoyingly: the Binutils functions make no difference between symbols defined by the DLL and symbols imported by the DLL. Binutils.defines_symbol returns true for imported symbols, and Binutils.symbol_offset x returns Some 0L. If we want to use this code in the ocamlc bytecode linker, to check that C stub functions are defined in the provided DLLs, we need to treat defined and imported symbols differently.

I believe this issue should be fixed (both for ELF and Mach-O, FlexDLL should not be affected): imported symbols should now be skipped.

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

One oddity: the empty string is always defined, i.e. Binutils.defines_symbol x ""always returns true, and Binutils.symbol_offset x "" always returns Some 0L.

We can of course filter out symbols with empty names, but in my machine such symbols also appear in the output of readelf -W --dyn-syms.

To clarify this point: the first symbol of the ELF symbol table is always an undefined "NULL" symbol. This symbol is now filtered out because only defined symbols are considered (as per your second remark). However, there may be other symbols with empty names apart from this one and those continue to be considered if they are defined in the DLL.

@nojb
Copy link
Contributor Author

nojb commented Jun 11, 2020

Following @xavierleroy's lead, I did large-scale testing of the ELF code with the bash script cmp.sh below. (I used a slightly patched version of binutils.ml with a CLI driver, available at https://gist.github.com/nojb/eb6e2ecfde441a6f98e02b8c87d2c6a4):

#!/bin/bash

DLL="$1"
BASE=$(basename $DLL)

echo $BASE

ocaml binutils.ml $DLL > ocaml.$BASE.out

readelf -W --dyn-syms $DLL | grep '[0-9]\+:' | grep -v UND | awk '{print $8}' | cut -d@ -f1 > readelf.$BASE.out

if git diff --exit-code --no-index ocaml.$BASE.out readelf.$BASE.out
then
    rm ocaml.$BASE.out readelf.$BASE.out
fi

I invoked the script with

find /usr -name '*.so' | xargs -n1 cmp.sh > cmp.out

(output available at https://gist.github.com/nojb/022a6b66fc962feaf426f1508c0685be). The script compared 1304 dlls. There with a single diff, where the OCaml version "sees" 4 symbols that readelf does not (the dll is libjavafx_font_t2k.so if you are curious). I will look into it, but regardless I think the result is quite good.

I plan to conduct similar testing for Mach-O shortly.

@XVilka
Copy link
Contributor

XVilka commented Jun 11, 2020

I recommend two more things to test on:

@nojb
Copy link
Contributor Author

nojb commented Jun 11, 2020

Thanks, I will take a look.

@nojb
Copy link
Contributor Author

nojb commented Jun 11, 2020

I did similar testing for Mach-O by comparing the output of the OCaml code with that of nm (testing script at https://gist.github.com/nojb/c85e51b0b1e468f9cc430b1b3b65937c, test output at https://gist.github.com/nojb/c379384578d8b0c4b06a5fe1534dbb01). Out of 1443 dylibs tested in my system, 69 were FAT binaries (which we don't support and that OCaml does not produce), and the rest did not exhibit any diff.

So on the whole the test results are quite encouraging.

@nojb nojb force-pushed the microbfd branch 2 times, most recently from 0a66f43 to b34c0e4 Compare June 24, 2020 18:38
@nojb
Copy link
Contributor Author

nojb commented Jun 24, 2020

Rebased on trunk and cleaned up the history.

Friendly ping @xavierleroy: can you think of anything else I could do on my side to help push this PR forward?

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able yet to review in full details, just looked at the general structure, so it would be great if someone else could read this PR carefully.

This said, I very much like what I've seen so far.

The Dll module is clumsy in the way it combines the two usage modes For_loading and For_checking, but it just shows that Dll was a bad interface to begin with. A later PR could clean this up by having completely separate code paths for the "checking" use in ocamlc and for the "loading" mode in dynlink and the toplevel.

@dbuenzli
Copy link
Contributor

dbuenzli commented Jul 7, 2020

I was not able yet to review in full details, just looked at the general structure, so it would be great if someone else could read this PR carefully.

@xavierleroy do you mean you want a second reviewer ? It's easy to miss since it's in the "hidden messages" here but I made careful (and long) review, cross checking everything with the formats specs, which resulted in:

So I checked all these field byte offsets and their semantics... Carefully enough to say that bugs here will also be mine.

@xavierleroy
Copy link
Contributor

xavierleroy commented Jul 7, 2020

@dbuenzli: what I wrote was unclear, sorry about that.

I know that you reviewed carefully the new library that reads the ELF, Mach-O and PE formats. Combined with the testing done by @nojb, this is plenty, there is no need for additional reviewing.

The part that I wanted to read in full details but haven't been able to yet is the changes to the bytecode compiler to integrate the new library, esp. the Dll module and its various uses. If someone else can look at this too -- or already has! --, that would be even better.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read in more details and it looks all good to me.

I added a Changes entry directly on @nojb's branch, and will now merge.

@xavierleroy xavierleroy merged commit 0802bac into ocaml:trunk Jul 14, 2020
@nojb
Copy link
Contributor Author

nojb commented Jul 14, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants