Commit 218ce83

Author: Zachary Turner
[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374
llvm-svn: 286491
1 parent 58ddb8d commit 218ce83

10 files changed

+306
-0
lines changed

llvm/docs/PDB/DbiStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The PDB DBI (Debug Info) Stream
=====================================

llvm/docs/PDB/GlobalStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The PDB Global Symbol Stream
=====================================

llvm/docs/PDB/HashStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The TPI & IPI Hash Streams
=====================================

llvm/docs/PDB/ModiStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The Module Information Stream
=====================================

llvm/docs/PDB/MsfFile.rst

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00``.
- **BlockSize** - The block size of the internal file system. Valid values are
  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
  depending on the block size. LLVM handles only a block size of 4KiB, and all
  further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). This bitfield is spread across
  the MSF file at ``BlockSize`` intervals.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
  is designed to support incremental and atomic updates of the underlying MSF
  file. While writing to an MSF file, if the value of this field is ``1``, you
  can write your new modified bitfield to block 2, and vice versa. Only when
  you commit the file to disk do you need to swap the value in the SuperBlock
  to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies. It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
  an array of ``ulittle32_t`` values listing the blocks that the stream
  directory resides on. For large MSF files, the stream directory (which
  describes the block layout of each stream) may not fit entirely on a single
  block. As a result, this extra layer of indirection is introduced, whereby
  this block contains the list of blocks that the stream directory occupies, and
  the stream directory itself can be stitched together accordingly. The number
  of ``ulittle32_t`` values in this array is given by
  ``ceil(NumDirectoryBytes / BlockSize)``.

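The constraints in the field list above can be sketched as a few small checks.
This is only an illustration: ``SuperBlockFields`` and every function name
below are hypothetical, the fields are assumed to be already decoded into host
byte order, the magic check is omitted, and the ``BlockMapAddr < NumBlocks``
test is an assumption that is not stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-order copy of the SuperBlock fields used below.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t BlockMapAddr;
};

// Integer form of ceil(NumBytes / BlockSize).
inline uint32_t bytesToBlocks(uint64_t NumBytes, uint32_t BlockSize) {
  assert(BlockSize != 0);
  return static_cast<uint32_t>((NumBytes + BlockSize - 1) / BlockSize);
}

inline bool isValidBlockSize(uint32_t BlockSize) {
  return BlockSize == 512 || BlockSize == 1024 || BlockSize == 2048 ||
         BlockSize == 4096;
}

// Checks the constraints listed for BlockSize, FreeBlockMapBlock and NumBlocks.
inline bool looksConsistent(const SuperBlockFields &SB, uint64_t FileSize) {
  if (!isValidBlockSize(SB.BlockSize))
    return false;
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  if (uint64_t(SB.NumBlocks) * SB.BlockSize != FileSize)
    return false;
  // Assumption: a block index must refer to a block inside the file.
  return SB.BlockMapAddr < SB.NumBlocks;
}

// Number of ulittle32_t entries stored in the block at BlockMapAddr.
inline uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
  return bytesToBlocks(SB.NumDirectoryBytes, SB.BlockSize);
}

// While updating, the new free block map is written to whichever of blocks
// 1 and 2 is not currently named by FreeBlockMapBlock.
inline uint32_t alternateFreeBlockMap(uint32_t Current) {
  return Current == 1 ? 2 : 1;
}
```

A writer following this scheme would fill in the block returned by
``alternateFreeBlockMap`` and store that index in the SuperBlock only at commit
time.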

The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged: row ``i`` holds
``ceil(StreamSizes[i] / BlockSize)`` block indices.

**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.

Stream 0: ceil(1000 / 4096) = 1 block

Stream 1: ceil(8000 / 4096) = 2 blocks

Stream 2: ceil(16000 / 4096) = 4 blocks

Stream 3: ceil(9000 / 4096) = 3 blocks

In total, 10 blocks are used. Let's see what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };

In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an
array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.

Note also that the streams are discontiguous, and that part of stream 3 (block
10) sits between blocks 9 and 11 of stream 2. You cannot assume anything about
the layout of the blocks!

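The arithmetic in this example can be reproduced with a short sketch. The
function names are hypothetical and the code only restates the counting rule
used above: one 32-bit word for ``NumStreams``, one per stream size, and one
per block that each stream occupies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Integer form of ceil(NumBytes / BlockSize).
inline uint32_t bytesToBlocks(uint64_t NumBytes, uint32_t BlockSize) {
  assert(BlockSize != 0);
  return static_cast<uint32_t>((NumBytes + BlockSize - 1) / BlockSize);
}

// Size in bytes of a stream directory describing streams of the given sizes.
inline uint64_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                               uint32_t BlockSize) {
  uint64_t Words = 1 + StreamSizes.size(); // NumStreams + StreamSizes[]
  for (uint32_t Size : StreamSizes)
    Words += bytesToBlocks(Size, BlockSize); // one row of StreamBlocks[][]
  return Words * sizeof(uint32_t);
}
```

For the four streams above and a 4096-byte block size, ``directoryBytes``
yields 60, and ``bytesToBlocks(60, 4096)`` yields 1, matching the single
``ulittle32_t`` stored in the block map.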

Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a high
level record, a long string field, or even a single ``uint16``) to begin and
end in separate blocks. For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block. Since blocks are not
necessarily contiguously laid out in the file, this means that both the consumer
and the producer of an MSF file must be prepared to split data apart
accordingly. In the aforementioned example, because fields are little-endian,
the low byte of the ``uint16`` would be written to the last byte of the stream's
current block, and the high byte would be written to the first byte of the
stream's next block, which could be tens of thousands of bytes later
(or even earlier!) in the file, depending on what the stream directory says.
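
A reader can cope with such splits by translating every stream offset to a file
offset individually. The sketch below assumes the whole file is held in memory
and that ``Blocks`` is the stream's row of ``StreamBlocks`` from the stream
directory; the function names are hypothetical and bounds checking is reduced
to a single assertion.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Maps an offset within a stream to an offset within the MSF file.
inline uint64_t streamToFileOffset(const std::vector<uint32_t> &Blocks,
                                   uint32_t BlockSize, uint64_t StreamOffset) {
  uint64_t BlockInStream = StreamOffset / BlockSize;
  assert(BlockInStream < Blocks.size());
  return uint64_t(Blocks[BlockInStream]) * BlockSize + StreamOffset % BlockSize;
}

// Reads a little-endian uint16 one byte at a time, so the two bytes may come
// from different, non-adjacent blocks of the file.
inline uint16_t readU16(const std::vector<uint8_t> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint32_t BlockSize, uint64_t StreamOffset) {
  uint8_t Lo = File[streamToFileOffset(Blocks, BlockSize, StreamOffset)];
  uint8_t Hi = File[streamToFileOffset(Blocks, BlockSize, StreamOffset + 1)];
  return static_cast<uint16_t>(Lo | (Hi << 8));
}
```

With a stream whose blocks are ``{2, 0}``, a value starting at the last byte of
the stream's first block has its low byte at the end of file block 2 and its
high byte at the start of file block 0, that is, earlier in the file.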

llvm/docs/PDB/PdbStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
========================================
The PDB Info Stream (aka the PDB Stream)
========================================

llvm/docs/PDB/PublicStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The PDB Public Symbol Stream
=====================================

llvm/docs/PDB/TpiStream.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
=====================================
The PDB TPI Stream
=====================================

llvm/docs/PDB/index.rst

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information for consumption by debuggers and other tools. Since
officially supported APIs exist on Windows for querying debug information from
PDBs even without the user understanding the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. So it
is necessary for us to understand the PDB file format at the byte-level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records within, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream

.. _msf:

The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is actually a miniature "file system within a file". It contains
multiple streams (aka files) which can represent arbitrary data, and these
streams are divided into blocks which may not necessarily be contiguously
laid out within the file (aka fragmented). Additionally, the MSF contains a
stream directory (analogous to a file system's master file table, or MFT) which
describes how the streams (files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

.. _streams:

Streams
-------
The PDB format contains a number of streams which describe various information
such as the types, symbols, source files, and compilands (e.g. object files)
of a program. Additional streams contain hash tables that debuggers and other
tools use for fast lookup of records and types by name, as well as information
about how the program was compiled, such as the specific toolchain used. A
summary of streams contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

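The fixed stream indices in the table lend themselves to a small enumeration.
The sketch below is only one possible spelling: the enumerator and function
names are hypothetical, and only the numeric values come from the table.

```cpp
#include <cstdint>

// Fixed stream indices from the table above; the names are illustrative.
enum FixedStreamIndex : uint32_t {
  OldDirectoryStream = 0,
  PdbInfoStream = 1,
  TpiStream = 2,
  DbiStream = 3,
  IpiStream = 4,
};

// Streams beyond the fixed ones are located through other streams, for
// example the named stream map in the PDB Stream or fields of the DBI Stream.
constexpr bool isFixedStream(uint32_t Index) { return Index <= IpiStream; }
```

Every other stream in the table has no fixed index, so a reader has to parse
the containing stream first to learn where it lives.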

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the TPI and IPI Hash Streams, and how they can be used to quickly look up
   records by name.

CodeView
========
CodeView is another format which comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
more information about the CodeView format.

llvm/docs/index.rst

Lines changed: 4 additions & 0 deletions
@@ -274,6 +274,7 @@ For API clients and LLVM developers.
    Coroutines
    GlobalISel
    XRay
+   PDB/index
 
 :doc:`WritingAnLLVMPass`
    Information on how to write LLVM transformations and analyses.
@@ -398,6 +399,9 @@ For API clients and LLVM developers.
 :doc:`XRay`
    High-level documentation of how to use XRay in LLVM.
 
+:doc:`The Microsoft PDB File Format <PDB/index>`
+   A detailed description of the Microsoft PDB (Program Database) file format.
+
 Development Process Documentation
 =================================
 