Skip to content
This repository was archived by the owner on Mar 31, 2026. It is now read-only.

[rld] Linker Structure

Paul Bowen-Huggett edited this page Jun 22, 2020 · 9 revisions

Table of Contents

Introduction

This page hopes to provide a high level overview of the internals of rld, the Program Repository Linker. Please note that at its current stage of development, some aspects of the design are more mature than others; everything is subject to change.

Temporal View

For performance, rld tries to parallelize the link as much as possible. At startup it creates a collection of worker threads and feeds jobs for them to be processed as necessary.

rld Structure

(Note that the size of the boxes in the diagram above does not reflect the amount of time that their processes consume. This merely shows the order of operations and how they are overlapped.)

Copy/Relocate
Data is copied from the database to the output file. Internal and external relocations are applied. There is one instance of this job for each output section.
Identify
Examine the input file to determine its type and contents. There is an instance of this job for each input file. (Note that it is possible to overlap the Identify and Resolution processes, but this is not currently implemented.)
Layout
As each input file completes Resolution, its contents are appended to the output layout (in the order specified by the user). There is a single instance of this job. See this page for further details.
Metadata
Generation of metadata for the output file format. In the case of ELF, this would be the string table, program header table, section header table, etc. There is a single instance of this job.
Resolution
Symbol resolution following the rules laid out on this page. External fixup resolution: creating symbol table entries for names that are yet to be resolved. Trampoline/PLT/GOT generation in response to fixups. There is an instance of this job for each input file.

Structural View

The structural/memory view of the linker is probably best described with the aid of a small example. Consider a simple pair of source files (below), compiled as compilation1 and compilation2.

compilation1:

void f () { g (); }

compilation2:

void g () { f (); }

The following diagram shows the state of the 3 layer data structures built by pstore and the linker’s Resolution task.

rld Structure

There are 3 layers to consider:

  1. The pstore on-disk representation
  2. The linker’s “shadow” memory
  3. The linker’s heap-allocated symbol table

Taking these one at a time:

1. The pstore On-Disk Representation

Although the way in which a program is represented in pstore is not strictly part of rld, it’s important to see that rld takes advantage of pstore features such as the fact that each string has a unique address. The diagram shows a simplified view of how these compilations might be represented in a pstore repository. The data structures for transactions, indexes, and so on are omitted for clarity.

compilationn
There is a compilation record for each unique translation unit that is compiled. Our example has two (different) source files, so we have two compilations in the database to reflect this.
Defn
A compilation contains an array of definition records. Each definition has a name (the address of an Indirect String), linkage, fragment, and so on.
fragmentn
There is a fragment for each unique function or data-object that is compiled. Fragments may be shared between compilations, but in our example the two functions have different bodies.
Sn
An Indirect String record points to a corresponding string body object Sbn. Strings are added to the database’s string set as they are processed by the compiler back-end. Rather than reference the string body directly, we use a small object which points to the string body. This is intended to improve locality of reference and, by clustering the objects tightly together, to reduce the number of physical memory pages required by the linker’s shadow memory. It’s this small object that the various other structures used as the “string address”.
Sbn
Contains the body of an individual string. Indirect String object Sn points to the corresponding string body Sbn.
Xfn
A fragment consists of one-or-more sections which can each contain zero-or-more external fixups. Each of these fixups references an object in the program by name. The linker’s Resolution operation for each fragment connects the fixup to its corresponding symbol. This enables the Relocate process to quickly find the target address of a fixup in O(1) time.

2. rld’s “Shadow” Memory

rld allocates a block of anonymous memory mapped storage equal to the size of the pstore file as its “shadow” memory. This is used to give very fast O(1) access to data in the linker which corresponds to objects in the database. Note that this memory address space is reserved, but most pages within it are never touched and so should not require physical memory to be consumed.

sn
The shadow memory for an Indirect String instance used by either or both of a definition of an external fixup points to the symbol table record associated with that specific name. A spin-lock is used to synchonize access through this pointer by multiple threads.
xn
Provides O(1) access to the symbol corresponding to each fixup and used during relocation. TODO: provide a deeper explanation.

3. rld’s Symbol Table

Members of the symbol table are allocated as definitions are processed or as external fixups which reference as-yet undefined symbols are encountered. In the latter case, an undefined symbol can become defined later compilation.

Each symbol (defined or undefined) has a name. Defined symbols have linkage and a pointer to the fragment whose contents they represent.

The symbol table uses the chunked_vector<> container to give amortized O(1) insertion time.

Clone this wiki locally