|
|
@@ -0,0 +1,309 @@ |
|
|
The ELF and COFF Linkers |
|
|
======================== |
|
|
|
|
|
We started rewriting the ELF (Unix) and COFF (Windows) linkers in May 2015. |
|
|
Since then, we have been making steady progress towards providing |
|
|
drop-in replacements for the system linkers. |
|
|
|
|
|
Currently, the Windows support is mostly complete and is about 2x faster |
|
|
than the linker that ships as part of the Microsoft Visual Studio toolchain. |
|
|
|
|
|
The ELF support is in progress and is able to link large programs |
|
|
such as Clang or LLD itself. Unless your program depends on linker scripts, |
|
|
you can expect it to be linkable with LLD. |
|
|
It is currently about 1.2x to 2x faster than the GNU gold linker. |
|
|
We aim to make it a drop-in replacement for the GNU linker. |
|
|
|
|
|
We expect that FreeBSD is going to be the first large system |
|
|
to adopt LLD as the system linker. |
|
|
We are working on it in collaboration with the FreeBSD project. |
|
|
|
|
|
The linkers are notably small; as of March 2016, |
|
|
the COFF linker is under 7k LOC and the ELF linker is about 10k LOC. |
|
|
|
|
|
The linkers are designed to be as fast and simple as possible. |
|
|
Because they are simple, they are easy to extend to support new features. |
|
|
There are a few key design choices that we made to achieve these goals. |
|
|
We will describe them in this document. |
|
|
|
|
|
The ELF Linker as a Library |
|
|
--------------------------- |
|
|
|
|
|
You can embed LLD in your program by linking against it and calling the linker's |
|
|
entry point function lld::elf::link. |
|
|
|
|
|
The current policy is that it is your responsibility to provide trustworthy object |
|
|
files. The function is guaranteed to return as long as you do not pass corrupted |
|
|
or malicious object files. A corrupted file could cause a fatal error or SEGV. |
|
|
That being said, you don't need to worry too much about it if you create object |
|
|
files in the usual way and give them to the linker. It is naturally expected to |
|
|
work; otherwise, it is a bug in the linker. |
|
|
|
|
|
Design |
|
|
====== |
|
|
|
|
|
We will describe the design of the linkers in the rest of the document. |
|
|
|
|
|
Key Concepts |
|
|
------------ |
|
|
|
|
|
Linkers are fairly large pieces of software. |
|
|
There are many design choices you have to make to create a complete linker. |
|
|
|
|
|
This is a list of design choices we've made for ELF and COFF LLD. |
|
|
We believe that these high-level design choices achieve the right balance |
|
|
between speed, simplicity and extensibility. |
|
|
|
|
|
* Implement as native linkers |
|
|
|
|
|
We implemented the linkers as native linkers for each file format. |
|
|
|
|
|
The two linkers share the same design but do not share code. |
|
|
Sharing code makes sense if the benefit is worth its cost. |
|
|
In our case, ELF and COFF are different enough that we thought the layer to |
|
|
abstract the differences wouldn't be worth its complexity and run-time cost. |
|
|
Eliminating the abstraction layer has greatly simplified the implementation. |
|
|
|
|
|
* Speed by design |
|
|
|
|
|
One of the most important things in achieving high performance is to |
|
|
do less rather than do it efficiently. |
|
|
Therefore, the high-level design matters more than local optimizations. |
|
|
Since we are trying to create a high-performance linker, |
|
|
it is very important to keep the design as efficient as possible. |
|
|
|
|
|
Broadly speaking, we do not do anything until we have to do it. |
|
|
For example, we do not read section contents or relocations |
|
|
until we need them to continue linking. |
|
|
When we need to do some costly operation (such as looking up |
|
|
a hash table for each symbol), we do it only once. |
|
|
We obtain a handle (which is typically just a pointer to actual data) |
|
|
on the first operation and use it throughout the process. |
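
The "look up once, then keep a handle" idea can be sketched as follows. This is a minimal illustration, not LLD's code: `SymbolInterner`, `SymbolData` and `intern` are hypothetical names, and the lookup counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-symbol record; not LLD's actual type.
struct SymbolData {
  std::string name;
  int uses;
};

class SymbolInterner {
public:
  // Performs the hash-table lookup for `name` and returns a stable handle.
  // Callers keep the returned pointer instead of looking the name up again.
  SymbolData *intern(const std::string &name) {
    assert(!name.empty());
    ++lookups_;
    auto it = table_.find(name);
    if (it == table_.end())
      it = table_.emplace(name, SymbolData{name, 0}).first;
    // std::unordered_map never moves its elements, so the address stays valid.
    return &it->second;
  }

  int lookups() const { return lookups_; }

private:
  std::unordered_map<std::string, SymbolData> table_;
  int lookups_ = 0;
};
```

A caller that interns a name once and then works through the returned pointer pays for hashing the string a single time, no matter how often the symbol is touched afterwards.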
|
|
|
|
|
* Efficient archive file handling |
|
|
|
|
|
LLD's handling of archive files (the files with ".a" file extension) is different |
|
|
from the traditional Unix linkers and pretty similar to Windows linkers. |
|
|
We'll describe how the traditional Unix linker handles archive files, |
|
|
what the problem is, and how LLD approached the problem. |
|
|
|
|
|
The traditional Unix linker maintains a set of undefined symbols during linking. |
|
|
The linker visits each file in the order they appear on the command line |
|
|
until the set becomes empty. What the linker does depends on the file type. |
|
|
|
|
|
- If the linker visits an object file, it links that object file into the result, |
|
|
and undefined symbols in the object file are added to the set. |
|
|
|
|
|
- If the linker visits an archive file, it checks the archive file's symbol table |
|
|
and extracts all object files that have definitions for any symbols in the set. |
|
|
|
|
|
This algorithm sometimes leads to counter-intuitive behavior. |
|
|
If you give archive files before object files, nothing will happen |
|
|
because when the linker visits archives, there are no undefined symbols in the set. |
|
|
As a result, no files are extracted from the first archive file, |
|
|
and the link is done at that point because the set is empty after it visits one file. |
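
The order sensitivity described above can be reproduced with a small model. This is a sketch under simplifying assumptions, not the code of any real linker: `Object`, `Input` and `linkTraditional` are hypothetical names, files are reduced to sets of symbol names, and the loop visits every input once rather than stopping early.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of input files (symbol names only).
struct Object {
  std::set<std::string> defined;
  std::set<std::string> undefined;
};
struct Input {
  bool isArchive;
  std::vector<Object> members; // exactly one element for a plain object file
};

// Returns the symbols that are still undefined after one left-to-right pass.
std::set<std::string> linkTraditional(const std::vector<Input> &inputs) {
  std::set<std::string> defined, undefined;
  auto add = [&](const Object &o) {
    for (const auto &s : o.defined) {
      defined.insert(s);
      undefined.erase(s);
    }
    for (const auto &s : o.undefined)
      if (!defined.count(s))
        undefined.insert(s);
  };
  for (const Input &in : inputs) {
    assert(!in.members.empty());
    if (!in.isArchive) {
      add(in.members.front());
      continue;
    }
    // Extract members that define a currently undefined symbol, repeating
    // until this archive yields nothing new. Symbols of members that are
    // not extracted are forgotten once the archive has been visited.
    std::vector<bool> taken(in.members.size(), false);
    for (bool changed = true; changed;) {
      changed = false;
      for (std::size_t i = 0; i < in.members.size(); ++i) {
        if (taken[i])
          continue;
        bool needed = false;
        for (const auto &s : in.members[i].defined)
          if (undefined.count(s))
            needed = true;
        if (!needed)
          continue;
        taken[i] = true;
        add(in.members[i]);
        changed = true;
      }
    }
  }
  return undefined;
}
```

With an object that needs `foo` and an archive that defines it, the order "object, archive" resolves everything, while "archive, object" leaves `foo` undefined because the archive was visited while the set was still empty.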
|
|
|
|
|
You can fix the problem by reordering the files, |
|
|
but that cannot fix the issue of mutually-dependent archive files. |
|
|
|
|
|
Linking mutually-dependent archive files is tricky. |
|
|
You may specify the same archive file multiple times to |
|
|
let the linker visit it more than once. |
|
|
Or, you may use the special command line options, `-(` and `-)`, |
|
|
to let the linker loop over the files between the options until |
|
|
no new symbols are added to the set. |
|
|
|
|
|
Visiting the same archive files multiple times makes the linker slower. |
|
|
|
|
|
Here is how LLD approached the problem. Instead of remembering only undefined symbols, |
|
|
we program LLD so that it remembers all symbols. |
|
|
When it sees an undefined symbol that can be resolved by extracting an object file |
|
|
from an archive file it previously visited, it immediately extracts the file and links it. |
|
|
This is possible because LLD does not forget symbols it has seen in archive files. |
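
A minimal sketch of this approach, assuming the same simplified model of files as sets of symbol names: `LazyLinker`, `Object` and `Input` are hypothetical names, and real archives and symbol tables are far richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of input files (symbol names only).
struct Object {
  std::set<std::string> defined;
  std::set<std::string> undefined;
};
struct Input {
  bool isArchive;
  std::vector<Object> members; // exactly one element for a plain object file
};

class LazyLinker {
public:
  void addInput(const Input &in) {
    assert(!in.members.empty());
    for (const Object &m : in.members) {
      if (!in.isArchive) {
        addObject(m);
        continue;
      }
      // Remember the archive member instead of forgetting it.
      std::size_t id = pool_.size();
      pool_.push_back(m);
      extracted_.push_back(false);
      bool needed = false;
      for (const auto &s : m.defined) {
        if (undefined_.count(s))
          needed = true;
        else if (!defined_.count(s))
          lazy_.insert(std::make_pair(s, id)); // first archive wins
      }
      if (needed)
        extract(id);
    }
  }

  const std::set<std::string> &undefined() const { return undefined_; }

private:
  void extract(std::size_t id) {
    if (extracted_[id])
      return;
    extracted_[id] = true;
    addObject(pool_[id]);
  }

  void addObject(const Object &o) {
    for (const auto &s : o.defined) {
      defined_.insert(s);
      undefined_.erase(s);
      lazy_.erase(s);
    }
    for (const auto &s : o.undefined) {
      if (defined_.count(s))
        continue;
      auto it = lazy_.find(s);
      if (it != lazy_.end())
        extract(it->second); // pull in the remembered member immediately
      else
        undefined_.insert(s);
    }
  }

  std::vector<Object> pool_;     // archive members seen so far
  std::vector<bool> extracted_;  // whether each pooled member was linked
  std::map<std::string, std::size_t> lazy_; // symbol -> pooled member
  std::set<std::string> defined_, undefined_;
};
```

In this model, two archives that depend on each other resolve in a single pass, and placing them before or after the object file that needs them gives the same result.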
|
|
|
|
|
We believe that LLD's approach is efficient and easy to justify. |
|
|
|
|
|
The semantics of LLD's archive handling differ from those of traditional Unix linkers. |
|
|
You can observe it if you carefully craft archive files to exploit it. |
|
|
However, in practice, we do not know of any program that cannot link |
|
|
with our algorithm so far, so we are not too worried about the incompatibility. |
|
|
|
|
|
Important Data Structures |
|
|
------------------------- |
|
|
|
|
|
We will describe the key data structures in LLD in this section. |
|
|
The linker can be understood as the interactions between them. |
|
|
Once you understand their functions, the code of the linker should look obvious to you. |
|
|
|
|
|
* SymbolBody |
|
|
|
|
|
SymbolBody is a class to represent symbols. |
|
|
They are created for symbols in object files or archive files. |
|
|
The linker creates linker-defined symbols as well. |
|
|
|
|
|
There are basically three types of SymbolBodies: Defined, Undefined, or Lazy. |
|
|
|
|
|
- Defined symbols are for all symbols that are considered "resolved", |
|
|
including real defined symbols, COMDAT symbols, common symbols, |
|
|
absolute symbols, linker-created symbols, etc. |
|
|
- Undefined symbols represent undefined symbols, which need to be replaced by |
|
|
Defined symbols by the resolver before the link is complete. |
|
|
- Lazy symbols represent symbols we found in archive file headers |
|
|
which can turn into Defined if we read archive members. |
|
|
|
|
|
* Symbol |
|
|
|
|
|
Symbol is a pointer to a SymbolBody. There's only one Symbol for |
|
|
each unique symbol name (this uniqueness is guaranteed by the symbol table). |
|
|
Because SymbolBodies are created for each file independently, |
|
|
there can be many SymbolBodies for the same name. |
|
|
Thus, the relationship between Symbols and SymbolBodies is 1:N. |
|
|
You can think of Symbols as handles for SymbolBodies. |
|
|
|
|
|
The resolver keeps the Symbol's pointer to always point to the "best" SymbolBody. |
|
|
Pointer mutation is the resolve operation of this linker. |
|
|
|
|
|
SymbolBodies have pointers to their Symbols. |
|
|
That means you can always find the best SymbolBody from |
|
|
any SymbolBody by following pointers twice. |
|
|
This structure makes it very easy and cheap to find replacements for symbols. |
|
|
For example, if you have an Undefined SymbolBody, you can find a Defined |
|
|
SymbolBody for that symbol just by going to its Symbol and then to SymbolBody, |
|
|
assuming the resolver has successfully resolved all undefined symbols. |
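
The two-pointer hop can be sketched like this. The structs below are stripped-down stand-ins for the classes described above; the member names `backref`, `body` and `repl` are assumptions made for illustration and may not match the real declarations.

```cpp
#include <cassert>
#include <string>

struct Symbol;

// Hypothetical, minimal SymbolBody: one per symbol occurrence in a file.
struct SymbolBody {
  enum Kind { Defined, Undefined, Lazy };
  Kind kind;
  std::string name;
  Symbol *backref; // the unique Symbol for this name

  SymbolBody(Kind k, const std::string &n)
      : kind(k), name(n), backref(nullptr) {}

  // Follows two pointers: this body -> its Symbol -> the current best body.
  SymbolBody *repl() const;
};

// Hypothetical, minimal Symbol: one per unique name.
struct Symbol {
  SymbolBody *body; // the "best" SymbolBody seen so far
};

inline SymbolBody *SymbolBody::repl() const {
  assert(backref && "body was never added to the symbol table");
  return backref->body;
}
```

Resolving a name is then a single pointer assignment on the Symbol, and every SymbolBody that shares that Symbol sees the new answer without being touched.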
|
|
|
|
|
* SymbolTable |
|
|
|
|
|
SymbolTable is basically a hash table from strings to Symbols |
|
|
with logic to resolve symbol conflicts. It resolves conflicts by symbol type. |
|
|
|
|
|
- If we add Undefined and Defined symbols, the symbol table will keep the latter. |
|
|
- If we add Defined and Lazy symbols, it will keep the former. |
|
|
- If we add Lazy and Undefined, it will keep the former, |
|
|
but it will also trigger the Lazy symbol to load the archive member |
|
|
to actually resolve the symbol. |
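
The three rules above can be written as a small pure function. This is a sketch, not the real resolver: `Kind`, `Resolution` and `resolve` are hypothetical names, and cases the text does not cover (for example two Defined symbols with the same name) are deliberately left unmodeled and simply keep a Defined symbol.

```cpp
#include <cassert>

enum class Kind { Undefined, Lazy, Defined };

// Outcome of adding a symbol whose name is already in the table.
struct Resolution {
  Kind kept;       // kind of the symbol the table keeps
  bool loadMember; // whether the archive member must be loaded now
};

Resolution resolve(Kind existing, Kind incoming) {
  // Defined wins over both Undefined and Lazy.
  if (existing == Kind::Defined || incoming == Kind::Defined)
    return {Kind::Defined, false};
  // One side is Lazy and the other Undefined: keep Lazy and trigger a load,
  // which is expected to add a Defined symbol shortly afterwards.
  if (existing != incoming)
    return {Kind::Lazy, true};
  // Same kind on both sides: nothing changes.
  return {existing, false};
}
```

The function is symmetric in its arguments, which reflects that the outcome does not depend on which of the two symbols was added first.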
|
|
|
|
|
* Chunk (COFF specific) |
|
|
|
|
|
Chunk represents a chunk of data that will occupy space in an output. |
|
|
Each regular section becomes a chunk. |
|
|
Chunks created for common or BSS symbols are not backed by sections. |
|
|
The linker may create chunks to append additional data to an output as well. |
|
|
|
|
|
Chunks know about their size, how to copy their data to mmap'ed outputs, |
|
|
and how to apply relocations to them. |
|
|
Specifically, section-based chunks know how to read relocation tables |
|
|
and how to apply them. |
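
As a rough illustration of that interface, the sketch below gives a chunk a size, a way to copy itself into the output buffer, and a way to apply relocations. The names `getSize`, `writeTo`, `SectionChunk` and `Reloc` are assumptions for this example, and the single relocation type shown (add a 32-bit target address to the stored addend) is a simplification, not a real COFF relocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical base class: anything that occupies space in the output.
class Chunk {
public:
  virtual ~Chunk() {}
  virtual std::size_t getSize() const = 0;
  // Writes this chunk to buf, which points at the chunk's slot in the output.
  virtual void writeTo(unsigned char *buf) const = 0;
};

// Simplified relocation: add targetAddr to the 32-bit value at offset.
struct Reloc {
  std::size_t offset;
  std::uint32_t targetAddr;
};

// A chunk backed by the bytes of an input section.
class SectionChunk : public Chunk {
public:
  SectionChunk(const std::vector<unsigned char> &data,
               const std::vector<Reloc> &relocs)
      : data_(data), relocs_(relocs) {}

  std::size_t getSize() const override { return data_.size(); }

  void writeTo(unsigned char *buf) const override {
    if (!data_.empty())
      std::memcpy(buf, data_.data(), data_.size());
    for (const Reloc &r : relocs_) {
      assert(r.offset + sizeof(std::uint32_t) <= data_.size());
      std::uint32_t v;
      std::memcpy(&v, buf + r.offset, sizeof v);
      v += r.targetAddr;
      std::memcpy(buf + r.offset, &v, sizeof v);
    }
  }

private:
  std::vector<unsigned char> data_;
  std::vector<Reloc> relocs_;
};
```

A writer only needs the base-class interface: it asks each chunk for its size to lay out the file and then calls `writeTo` with the chunk's position in the mapped output.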
|
|
|
|
|
* InputSection (ELF specific) |
|
|
|
|
|
Since we have less synthesized data for ELF, we don't abstract slices of |
|
|
input files as Chunks for ELF. Instead, we directly use the input section |
|
|
as an internal data type. |
|
|
|
|
|
InputSections know their size and how to copy themselves to |
|
|
mmap'ed outputs, just like COFF Chunks. |
|
|
|
|
|
* OutputSection |
|
|
|
|
|
OutputSection is a container of InputSections (ELF) or Chunks (COFF). |
|
|
An InputSection or Chunk belongs to at most one OutputSection. |
|
|
|
|
|
There are mainly three actors in this linker. |
|
|
|
|
|
* InputFile |
|
|
|
|
|
InputFile is a superclass of file readers. |
|
|
We have a different subclass for each input file type, |
|
|
such as regular object file, archive file, etc. |
|
|
They are responsible for creating and owning SymbolBodies and |
|
|
InputSections/Chunks. |
|
|
|
|
|
* Writer |
|
|
|
|
|
The writer is responsible for writing file headers and InputSections/Chunks to a file. |
|
|
It creates OutputSections, puts all InputSections/Chunks into them, |
|
|
assigns unique, non-overlapping addresses and file offsets to them, |
|
|
and then writes them out to a file. |
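
The address assignment step can be sketched as a single pass that aligns a running cursor and advances it by each section's size. This is a simplified model with hypothetical names (`Section`, `alignTo`, `assignAddresses`); it ignores page alignment, sections that occupy no file space, and everything else a real writer must handle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical output section record.
struct Section {
  std::uint64_t size;
  std::uint64_t align;   // must be a power of two
  std::uint64_t addr;    // assigned virtual address
  std::uint64_t fileOff; // assigned file offset
};

// Rounds v up to the next multiple of align.
inline std::uint64_t alignTo(std::uint64_t v, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (v + align - 1) & ~(align - 1);
}

// Lays sections out one after another, after a header of headerSize bytes.
void assignAddresses(std::vector<Section> &secs, std::uint64_t baseAddr,
                     std::uint64_t headerSize) {
  std::uint64_t addr = baseAddr + headerSize;
  std::uint64_t off = headerSize;
  for (Section &s : secs) {
    addr = alignTo(addr, s.align);
    off = alignTo(off, s.align);
    s.addr = addr;
    s.fileOff = off;
    addr += s.size;
    off += s.size;
  }
}
```

Because the cursor only moves forward, the assigned ranges cannot overlap, which is the property the writer relies on.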
|
|
|
|
|
* Driver |
|
|
|
|
|
The linking process is driven by the driver. The driver |
|
|
|
|
|
- processes command line options, |
|
|
- creates a symbol table, |
|
|
- creates an InputFile for each input file and puts all symbols in it into the symbol table, |
|
|
- checks that there are no remaining undefined symbols, |
|
|
- creates a writer, |
|
|
- and passes the symbol table to the writer to write the result to a file. |
|
|
|
|
|
Link-Time Optimization |
|
|
---------------------- |
|
|
|
|
|
LTO is implemented by handling LLVM bitcode files as object files. |
|
|
The linker resolves symbols in bitcode files normally. If all symbols |
|
|
are successfully resolved, it then calls an LLVM libLTO function |
|
|
with all bitcode files to convert them to one big regular ELF/COFF file. |
|
|
Finally, the linker replaces bitcode symbols with ELF/COFF symbols, |
|
|
so that we link the input files as if they were in the native |
|
|
format from the beginning. |
|
|
|
|
|
The details are described in this document. |
|
|
http://llvm.org/docs/LinkTimeOptimization.html |
|
|
|
|
|
Glossary |
|
|
-------- |
|
|
|
|
|
* RVA (COFF) |
|
|
|
|
|
Short for Relative Virtual Address. |
|
|
|
|
|
Windows executables or DLLs are not position-independent; they are |
|
|
linked against a fixed address called an image base. RVAs are |
|
|
offsets from an image base. |
|
|
|
|
|
Default image bases are 0x140000000 for executables and 0x180000000 |
|
|
for DLLs. For example, when we are creating an executable, we assume |
|
|
that the executable will be loaded at address 0x140000000 by the |
|
|
loader, so we apply relocations accordingly. The resulting text and data |
|
|
will contain raw absolute addresses. |
|
|
|
|
|
* VA |
|
|
|
|
|
Short for Virtual Address. For COFF, it is equivalent to RVA + image base. |
|
|
|
|
|
* Base relocations (COFF) |
|
|
|
|
|
Relocation information for the loader. If the loader decides to map |
|
|
an executable or a DLL to a different address than their image |
|
|
bases, it fixes up binaries using information contained in the base |
|
|
relocation table. A base relocation table consists of a list of |
|
|
locations containing addresses. The loader adds the difference between |
|
|
the actual load address and the image base to all locations listed there. |
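
The loader-side fixup can be sketched as follows. This is an illustration of the arithmetic only: `applyBaseRelocs` is a hypothetical name, the relocation table is reduced to a plain list of RVAs of 64-bit slots, and the on-disk block format of the real base relocation table is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// image: the bytes of the mapped image, indexed by RVA.
// relocRvas: RVAs of 64-bit slots that hold absolute addresses (VAs).
void applyBaseRelocs(std::vector<unsigned char> &image,
                     const std::vector<std::uint32_t> &relocRvas,
                     std::uint64_t imageBase, std::uint64_t loadAddr) {
  // Unsigned wrap-around makes this correct even when loadAddr < imageBase.
  std::uint64_t delta = loadAddr - imageBase;
  for (std::uint32_t rva : relocRvas) {
    assert(rva + sizeof(std::uint64_t) <= image.size());
    std::uint64_t va;
    std::memcpy(&va, &image[rva], sizeof va);
    va += delta; // the stored VA was computed as RVA + imageBase
    std::memcpy(&image[rva], &va, sizeof va);
  }
}
```

For example, a slot that the linker filled with `0x140000000 + 0x1000` ends up holding the actual load address plus `0x1000`, so it still points at RVA `0x1000` in the relocated image.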
|
|
|
|
|
Note that this run-time relocation mechanism is much simpler than ELF's. |
|
|
There's no PLT or GOT. Images are relocated as a whole just |
|
|
by shifting entire images in memory by some offsets. Although doing |
|
|
this breaks text sharing, we think this mechanism is not actually bad |
|
|
on today's computers. |
|
|
|
|
|
* ICF |
|
|
|
|
|
Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). |
|
|
|
|
|
ICF is an optimization to reduce output size by merging read-only sections |
|
|
not only by their names but also by their contents. If two read-only sections |
|
|
happen to have the same metadata, actual contents and relocations, |
|
|
they are merged by ICF. It is known as an effective technique, |
|
|
and it usually reduces C++ program's size by a few percent or more. |
|
|
|
|
|
Note that this is not an entirely sound optimization. C/C++ require |
|
|
different functions to have different addresses. If a program depends on |
|
|
that property, it would fail at runtime. |
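
The core of the idea can be sketched as grouping sections by a key built from their metadata, contents and relocations. This is a single-pass simplification with hypothetical names (`Section`, `foldIdentical`): relocations are reduced to target names, and the sketch does not attempt to discover sections that only become identical after the sections they reference have been folded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified read-only section.
struct Section {
  std::string name;
  unsigned flags;                  // stand-in for section metadata
  std::string contents;            // raw bytes
  std::vector<std::string> relocs; // relocation targets, simplified to names
};

// Returns, for each section, the index of the section it is folded into.
// A section that is kept maps to its own index.
std::vector<std::size_t> foldIdentical(const std::vector<Section> &secs) {
  typedef std::tuple<unsigned, std::string, std::vector<std::string>> Key;
  std::map<Key, std::size_t> leader;
  std::vector<std::size_t> result(secs.size());
  for (std::size_t i = 0; i < secs.size(); ++i) {
    Key k(secs[i].flags, secs[i].contents, secs[i].relocs);
    // insert() keeps the first section seen with this key as the leader.
    result[i] = leader.insert(std::make_pair(k, i)).first->second;
  }
  return result;
}
```

Note that the section name is not part of the key: two functions with different names but the same bytes and relocations collapse into one, which is exactly why their addresses compare equal after folding.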
|
|
|
|
|
On Windows, that's not really an issue because MSVC link.exe enables |
|
|
the optimization by default. As long as your program works |
|
|
with the linker's default settings, your program should be safe with ICF. |
|
|
|
|
|
On Unix, your program is generally not guaranteed to be safe with ICF, |
|
|
although large programs happen to work correctly. |
|
|
For example, LLD itself works fine with ICF. |