| 1 | +===================================== |
| 2 | +The PDB File Format |
| 3 | +===================================== |
| 4 | + |
| 5 | +.. contents:: |
| 6 | + :local: |
| 7 | + |
| 8 | +.. _pdb_intro: |
| 9 | + |
| 10 | +Introduction |
| 11 | +============ |
| 12 | + |
| 13 | +PDB (Program Database) is a file format invented by Microsoft which contains |
| 14 | +debug information that can be consumed by debuggers and other tools. Since |
| 15 | +officially supported APIs exist on Windows for querying debug information from |
| 16 | +PDBs even without the user understanding the internals of the file format, a |
| 17 | +large ecosystem of tools has been built for Windows to consume this format. In |
| 18 | +order for Clang to be able to generate programs that can interoperate with these |
| 19 | +tools, it is necessary for us to generate PDB files ourselves. |
| 20 | + |
| 21 | +At the same time, LLVM has a long history of being able to cross-compile from |
| 22 | +any platform to any platform, and we wish for the same to be true here. So it |
| 23 | +is necessary for us to understand the PDB file format at the byte level so that |
| 24 | +we can generate PDB files entirely on our own. |
| 25 | + |
| 26 | +This manual describes what we know about the PDB file format today: the layout |
| 27 | +of the file, the various streams contained within, the format of individual |
| 28 | +records within, and more. |
| 29 | + |
| 30 | +We would like to extend our heartfelt gratitude to Microsoft, without whom we |
| 31 | +would not be where we are today. Much of the knowledge contained within this |
| 32 | +manual was learned through reading code published by Microsoft on their `GitHub |
| 33 | +repo <https://github.com/Microsoft/microsoft-pdb>`__. |
| 34 | + |
| 35 | +.. _pdb_layout: |
| 36 | + |
| 37 | +File Layout |
| 38 | +=========== |
| 39 | + |
| 40 | +.. toctree:: |
| 41 | + :hidden: |
| 42 | + |
| 43 | + MsfFile |
| 44 | + PdbStream |
| 45 | + TpiStream |
| 46 | + DbiStream |
| 47 | + ModiStream |
| 48 | + PublicStream |
| 49 | + GlobalStream |
| 50 | + HashStream |
| 51 | + |
| 52 | +.. _msf: |
| 53 | + |
| 54 | +The MSF Container |
| 55 | +----------------- |
| 56 | +A PDB file is really just a special case of an MSF (Multi-Stream Format) file. |
| 57 | +An MSF file is actually a miniature "file system within a file". It contains |
| 58 | +multiple streams (aka files) which can represent arbitrary data, and these |
| 59 | +streams are divided into blocks which are not necessarily laid out |
| 60 | +contiguously within the file (aka fragmented). Additionally, the MSF contains a |
| 61 | +stream directory (analogous to a file system's MFT) which describes how the |
| 62 | +streams (files) are laid out within the MSF. |
| 63 | + |
| 64 | +For more information about the MSF container format, stream directory, and |
| 65 | +block layout, see :doc:`MsfFile`. |
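
The "file system within a file" idea can be illustrated with a small sketch. This is only a toy model, not the real MSF layout: the function name ``read_stream``, its parameters, and the 4-byte block size are all hypothetical, and the block list is passed in by hand rather than read from an actual stream directory. It shows how a stream is reassembled from blocks that are not contiguous in the file.

```python
from typing import Sequence


def read_stream(file_bytes: bytes, block_size: int,
                block_indices: Sequence[int], stream_size: int) -> bytes:
    """Reassemble one stream from its (possibly non-contiguous) blocks.

    Hypothetical helper for illustration; a real reader would obtain
    block_size, block_indices and stream_size from the MSF file itself.
    """
    chunks = []
    for index in block_indices:
        start = index * block_size
        chunks.append(file_bytes[start:start + block_size])
    # The final block may be only partially used, so trim to the stream size.
    return b"".join(chunks)[:stream_size]


# Toy "file" made of four 4-byte blocks. The stream occupies blocks 3, 1, 2
# (in that order); block 0 belongs to something else.
toy_file = b"xxxx" + b"o, P" + b"DB\x00\x00" + b"Hell"
stream = read_stream(toy_file, block_size=4, block_indices=[3, 1, 2], stream_size=10)
```

Here the stream's data is scattered across the file, and only the ordered list of block indices makes it possible to read the bytes back in sequence.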
| 66 | + |
| 67 | +.. _streams: |
| 68 | + |
| 69 | +Streams |
| 70 | +------- |
| 71 | +The PDB format contains a number of streams which describe the types, |
| 72 | +symbols, source files, and compilands (e.g. object files) of a program. |
| 73 | +Additional streams contain hash tables that debuggers and other tools use |
| 74 | +to provide fast lookup of records and types by name, as well as other |
| 75 | +information about how the program was compiled, such as the specific |
| 76 | +toolchain used. A summary of the streams contained in a PDB file is as |
| 77 | +follows: |
| 78 | + |
| 79 | ++--------------------+------------------------------+-------------------------------------------+ |
| 80 | +| Name | Stream Index | Contents | |
| 81 | ++====================+==============================+===========================================+ |
| 82 | +| Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory | |
| 83 | ++--------------------+------------------------------+-------------------------------------------+ |
| 84 | +| PDB Stream | - Fixed Stream Index 1 | - Basic File Information | |
| 85 | +| | | - Fields to match EXE to this PDB | |
| 86 | +| | | - Map of named streams to stream indices | |
| 87 | ++--------------------+------------------------------+-------------------------------------------+ |
| 88 | +| TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records | |
| 89 | +| | | - Index of TPI Hash Stream | |
| 90 | ++--------------------+------------------------------+-------------------------------------------+ |
| 91 | +| DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information | |
| 92 | +| | | - Indices of individual module streams | |
| 93 | +| | | - Indices of public / global streams | |
| 94 | +| | | - Section Contribution Information | |
| 95 | +| | | - Source File Information | |
| 96 | +| | | - FPO / PGO Data | |
| 97 | ++--------------------+------------------------------+-------------------------------------------+ |
| 98 | +| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records | |
| 99 | +| | | - Index of IPI Hash Stream | |
| 100 | ++--------------------+------------------------------+-------------------------------------------+ |
| 101 | +| /LinkInfo | - Contained in PDB Stream | - Unknown | |
| 102 | +| | Named Stream map | | |
| 103 | ++--------------------+------------------------------+-------------------------------------------+ |
| 104 | +| /src/headerblock | - Contained in PDB Stream | - Unknown | |
| 105 | +| | Named Stream map | | |
| 106 | ++--------------------+------------------------------+-------------------------------------------+ |
| 107 | +| /names | - Contained in PDB Stream | - PDB-wide global string table used for | |
| 108 | +| | Named Stream map | string de-duplication | |
| 109 | ++--------------------+------------------------------+-------------------------------------------+ |
| 110 | +| Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module | |
| 111 | +| | - One for each compiland | - Line Number Information | |
| 112 | ++--------------------+------------------------------+-------------------------------------------+ |
| 113 | +| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records | |
| 114 | +| | | - Index of Public Hash Stream | |
| 115 | ++--------------------+------------------------------+-------------------------------------------+ |
| 116 | +| Global Stream | - Contained in DBI Stream | - Global Symbol Records | |
| 117 | +| | | - Index of Global Hash Stream | |
| 118 | ++--------------------+------------------------------+-------------------------------------------+ |
| 119 | +| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records | |
| 120 | +| | | by name | |
| 121 | ++--------------------+------------------------------+-------------------------------------------+ |
| 122 | +| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records | |
| 123 | +| | | by name | |
| 124 | ++--------------------+------------------------------+-------------------------------------------+ |
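
As a rough illustration of the table above, the streams with fixed indices could be modelled as an enumeration. The names ``FixedStream`` and ``is_fixed_stream`` are hypothetical and chosen only for this sketch; the numeric values are the fixed stream indices listed in the table. All other streams are found indirectly, through the PDB Stream's named stream map or through indices stored in the DBI, TPI, and IPI streams.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that always live at the same index (values from the table)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def is_fixed_stream(index: int) -> bool:
    """Return True if the index is one of the fixed stream indices."""
    return index in {stream.value for stream in FixedStream}
```

A reader built along these lines would open stream ``FixedStream.PDB`` first, then follow the indices it finds there and in the DBI stream to locate everything else.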
| 125 | + |
| 126 | +More information about the structure of each of these can be found on the |
| 127 | +following pages: |
| 128 | + |
| 129 | +:doc:`PdbStream` |
| 130 | + Information about the PDB Info Stream and how it is used to match PDBs to EXEs. |
| 131 | + |
| 132 | +:doc:`TpiStream` |
| 133 | + Information about the TPI stream and the CodeView records contained within. |
| 134 | + |
| 135 | +:doc:`DbiStream` |
| 136 | + Information about the DBI stream and relevant substreams including the Module Substreams, |
| 137 | + source file information, and CodeView symbol records contained within. |
| 138 | + |
| 139 | +:doc:`ModiStream` |
| 140 | + Information about the Module Information Stream, of which there is one for each compilation |
| 141 | + unit, and the format of symbols contained within. |
| 142 | + |
| 143 | +:doc:`PublicStream` |
| 144 | + Information about the Public Symbol Stream. |
| 145 | + |
| 146 | +:doc:`GlobalStream` |
| 147 | + Information about the Global Symbol Stream. |
| 148 | + |
| 149 | +:doc:`HashStream` |
| 150 | + Information about the Hash Table stream, and how it can be used to quickly look up records |
| 151 | + by name. |
| 152 | + |
| 153 | +CodeView |
| 154 | +======== |
| 155 | +CodeView is another format which comes into the picture. While MSF defines |
| 156 | +the structure of the overall file, and PDB defines the set of streams that |
| 157 | +appear within the MSF file and the format of those streams, CodeView defines |
| 158 | +the format of **symbol and type records** that appear within specific streams. |
| 159 | +Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for |
| 160 | +more information about the CodeView format. |