This repository has been archived by the owner on Jul 5, 2023. It is now read-only.

PdbFileFormat

Sébastien Marchand edited this page Aug 10, 2016 · 2 revisions

PDB File Format

This document describes the contents of a PDB file. The structure of the PDB container (a Multi-Stream File, or MSF) is not described in detail here; the focus is on the contents of the streams.

Streams with fixed index.

Stream #0 : Root directory

This stream has information on where to find the other streams in the PDB. When present, it seems to be a copy of the previous version of the MSF directory.

Stream #1 : PDB headers

Contains the PDB header and the list of named streams.

Stream #2 : TPI (Type info)

This article discusses the contents of this stream. The document "Microsoft Symbol and Type Information" (see references) is really useful for understanding the contents of this stream.

Stream #3 : DBI (Debug Info)

Structure :

Block : DBI Header
Length : pos(sizeof(DbiHeader))
Details : PdbFileFormat#DbiHeader

Block : Module information
Length : DbiHeader.gp_modi_size
Details : Set of PdbFileFormat#DbiModuleInfo

Block : Section contribs
Length : DbiHeader.section_contribution_size
Details : Signature (32 bits) followed by a set of PdbFileFormat#SectionContrib

Block : Section map
Length : DbiHeader.section_map_size
Details : Seems to contain a table giving the offset to apply to each address in the different sections to get the corresponding RVA.
The structure of this substream is :
- Number of sections (including the reloc section). It is stored on 2 bytes and is repeated twice (if there are 10 sections you'll read 0x0A000A00).
- Set of PdbFileFormat#DbiSectionMapItem structures, one for each section. The structure for the reloc section seems to have the number 0 and its fields are not valid (length = 0xFFFFFFFF).

Block : File info
Length : DbiHeader.file_info_size
Details : This substream contains a list of the files used to generate each obj. Its structure is :
Header | File-blocks table | Offset table | Name table.
- The header contains the size of the File-blocks table (16 bits) followed by the size of the offset table (16 bits). These sizes have to be multiplied by 4 to obtain the sizes in bytes.
- The File-blocks table is divided into 2 parts. The first half contains the starting value of each block (16 bits per value) and the second half contains the length of each block. These values refer to entries of the offset table. There always seems to be a last block, at the end of this table, with a starting value equal to the length of the offset table and a length of 0.
- The offset table contains the offsets of the beginning of the file names in the name table. These offsets are relative to the beginning of the name table.
The example in DbiFileInfoSample illustrates this information.

Block : TS map
Length : DbiHeader.ts_map_size
Details : Size = 0 on testDll, so it can be ignored.

Block : EC info
Length : DbiHeader.ec_info_size
Details : Note that the ec_info_size field appears after the dbg_header_size field in the header of this stream, but the EC info is located before the DbgHeader.
The structure of this substream is :
- Signature (32 bits : 0xFEEFFEEF)
- 32 bits field (value = 1, seems to be a version or an age)
- Offset to the end of the name block (32 bits)
- Filename block + number of entries in the name table (32 bits)
- Name table (set of offsets relative to the beginning of the filename block)
- Value that seems to be the number of filenames in the filename block (32 bits), but this is not always the case (I've observed a difference of +/- 1 in some cases).

Block : DbiDbgHeader
Length : sizeof(DbiDbgHeader)
Details : PdbFileFormat#DbiDbgHeader
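
Since the blocks follow each other, the position of each one can be obtained by accumulating the sizes stored in the DbiHeader. The sketch below illustrates this; the DbiBlockSizes and DbiBlockOffsets types and the ComputeDbiBlockOffsets function are hypothetical names introduced for the example, and it assumes that the blocks are laid out exactly in the order listed above, with no padding between them.

```cpp
#include <cstdint>

// Sizes of the consecutive blocks of the DBI stream. The field names mirror
// DbiHeader; header_size stands for the size of the DbiHeader itself and
// dbg_header_size for the size of the trailing DbiDbgHeader.
struct DbiBlockSizes {
  uint32_t header_size;
  uint32_t gp_modi_size;
  uint32_t section_contribution_size;
  uint32_t section_map_size;
  uint32_t file_info_size;
  uint32_t ts_map_size;
  uint32_t ec_info_size;
  uint32_t dbg_header_size;
};

// Start of each block, relative to the beginning of the DBI stream.
struct DbiBlockOffsets {
  uint32_t module_info;
  uint32_t section_contribs;
  uint32_t section_map;
  uint32_t file_info;
  uint32_t ts_map;
  uint32_t ec_info;
  uint32_t dbg_header;
  uint32_t end;
};

// Hypothetical helper: accumulates the sizes in the order of the blocks.
DbiBlockOffsets ComputeDbiBlockOffsets(const DbiBlockSizes& sizes) {
  DbiBlockOffsets offsets;
  offsets.module_info = sizes.header_size;
  offsets.section_contribs = offsets.module_info + sizes.gp_modi_size;
  offsets.section_map =
      offsets.section_contribs + sizes.section_contribution_size;
  offsets.file_info = offsets.section_map + sizes.section_map_size;
  offsets.ts_map = offsets.file_info + sizes.file_info_size;
  // The EC info comes before the DbgHeader, even though ec_info_size is
  // stored after dbg_header_size in DbiHeader.
  offsets.ec_info = offsets.ts_map + sizes.ts_map_size;
  offsets.dbg_header = offsets.ec_info + sizes.ec_info_size;
  offsets.end = offsets.dbg_header + sizes.dbg_header_size;
  return offsets;
}
```

The value of end can be compared with the length of the stream as a quick sanity check of the header that was read.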

DbiHeader

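struct DbiHeader {
  int32 signature;
  uint32 version;
  uint32 age;
  int16 global_symbol_info_stream;  // ID of the Globals stream.
  uint16 pdb_dll_version;
  int16 public_symbol_info_stream;  // ID of the Public stream.
  uint16 pdb_dll_build_major;
  int16 symbol_record_stream;  // ID of the Symbol record stream.
  uint16 pdb_dll_build_minor;
  uint32 gp_modi_size;  // Size of the module information block.
  uint32 section_contribution_size;  // Size of the section contribs block.
  uint32 section_map_size;  // Size of the section map block.
  uint32 file_info_size;  // Size of the file info block.
  uint32 ts_map_size;  // Size of the TS map block.
  uint32 mfc_index;
  uint32 dbg_header_size;
  uint32 ec_info_size;  // Size of the EC info block.
  uint16 flags;
  uint16 machine;
  uint32 reserved;
};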

DbiModuleInfo

struct DbiModuleInfo {
  uint32 opened;
  SectionContrib section;
  uint16 flags;
  int16 stream;
  uint32 symbol_bytes;
  uint32 old_lines_bytes;
  uint32 lines_bytes;
  int16 num_files;
  uint16 padding;
  uint32 offsets;
  uint32 num_source;
  uint32 num_compiler;
  std::string module_name;
  std::string object_name;
};

SectionContrib

struct SectionContrib {
  int16 section;
  int16 pad1;
  int32 offset;
  int32 size;
  uint32 flags;
  int16 module;
  int16 pad2;
  uint32 data_crc;
  uint32 reloc_crc;
};

DbiDbgHeader

struct DbiDbgHeader {
  int16 fpo;
  int16 exception;
  int16 fixup;
  int16 omap_to_src;
  int16 omap_from_src;
  int16 section_header;
  int16 token_rid_map;
  int16 x_data;
  int16 p_data;
  int16 new_fpo;
  int16 section_header_origin;
};

DbiSectionMapItem

struct DbiSectionMapItem {
  uint8 flags;
  uint8 section_type;
  uint8 unknown_data_1[4];  // Unknown content, always 0x00000000 or 0xFFFFFFFF.
  uint16 section_number;  // Value = 0 for the reloc section.
  uint8 unknown_data_2[4];  // Same as unknown_data_1.
  uint32 rva_offset;  // Value added to the address offset when calculating the RVA.
  uint32 section_length;
};
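
As an illustration of how the section map substream could be read, here is a minimal sketch that checks the repeated section count and then decodes one 20-byte DbiSectionMapItem per section. The SectionMapItem type and the ParseSectionMap, ReadU16 and ReadU32 functions are hypothetical names for this example; it assumes little-endian fields packed exactly as in the structure above, and it simply skips the two unknown fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decoded view of a DbiSectionMapItem (the unknown fields are dropped).
struct SectionMapItem {
  uint8_t flags;
  uint8_t section_type;
  uint16_t section_number;
  uint32_t rva_offset;
  uint32_t section_length;
};

static uint16_t ReadU16(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint16_t>(d[pos] | (d[pos + 1] << 8));
}

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// Returns false if the substream is truncated or if the two copies of the
// section count do not match.
bool ParseSectionMap(const std::vector<uint8_t>& data,
                     std::vector<SectionMapItem>* items) {
  const size_t kItemSize = 20;  // Size of one DbiSectionMapItem.
  if (data.size() < 4) return false;
  const size_t count = ReadU16(data, 0);
  if (ReadU16(data, 2) != count) return false;
  if (data.size() < 4 + count * kItemSize) return false;

  items->clear();
  for (size_t i = 0; i < count; ++i) {
    const size_t pos = 4 + i * kItemSize;
    SectionMapItem item;
    item.flags = data[pos];
    item.section_type = data[pos + 1];
    // Bytes 2..5 and 8..11 hold unknown_data_1 and unknown_data_2.
    item.section_number = ReadU16(data, pos + 6);
    item.rva_offset = ReadU32(data, pos + 12);
    item.section_length = ReadU32(data, pos + 16);
    items->push_back(item);
  }
  return true;
}
```

A caller would probably want to ignore the item whose section_number is 0, since its fields are not valid for the reloc section.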

DbiFileInfoSample

Header : 
03 00 05 00 // The File-blocks table has a length of 3*32 bits and the offset table has a length of 5*32 bits.

File-blocks table : 3*2*16 bits :
00 00 // Block 1 : Starting offset
03 00 // Block 2 : Starting offset
06 00 // Block 3 : Starting offset
03 00 // Block 1 : Length
02 00 // Block 2 : Length
00 00 // Block 3 : Length

Offset table : 5*32 bits :
(0x00000000) ./src_A.cc // Block 1 starts at 0 and has a length of 3, so it is composed of src_A.cc, header_1.h and header_2.h.
(0x5B000000) ./header_1.h
(0xB8000000) ./header_2.h
(0x16010000) ./src_B.cc // Block 2 starts at 3 and has a length of 2, so it is composed of src_B.cc and header_1.h.
(0x5B000000) ./header_1.h
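
The sample above can be decoded with a few lines of code. The following sketch is only an illustration: ParseFileInfo, ReadU16 and ReadU32 are hypothetical names, the fields are assumed to be little-endian, and the function returns the name table offsets of the files of each block without resolving them to strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static uint16_t ReadU16(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint16_t>(d[pos] | (d[pos + 1] << 8));
}

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// For each block, collects the name table offsets of its files.
bool ParseFileInfo(const std::vector<uint8_t>& data,
                   std::vector<std::vector<uint32_t> >* blocks) {
  if (data.size() < 4) return false;
  const size_t num_blocks = ReadU16(data, 0);
  const size_t num_offsets = ReadU16(data, 2);
  // First half of the File-blocks table: starting values.
  const size_t starts_pos = 4;
  // Second half of the File-blocks table: lengths.
  const size_t lengths_pos = starts_pos + 2 * num_blocks;
  const size_t offsets_pos = lengths_pos + 2 * num_blocks;
  if (data.size() < offsets_pos + 4 * num_offsets) return false;

  blocks->clear();
  for (size_t i = 0; i < num_blocks; ++i) {
    const size_t start = ReadU16(data, starts_pos + 2 * i);
    const size_t length = ReadU16(data, lengths_pos + 2 * i);
    // Zero-length blocks (such as the trailing one) are kept as empty lists
    // and their starting value is not range-checked.
    if (length != 0 && start + length > num_offsets) return false;
    std::vector<uint32_t> files;
    for (size_t j = 0; j < length; ++j)
      files.push_back(ReadU32(data, offsets_pos + 4 * (start + j)));
    blocks->push_back(files);
  }
  return true;
}
```

With the bytes of the sample, the first block yields the offsets 0x0, 0x5B and 0xB8, the second one 0x116 and 0x5B, and the third one is empty.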

Streams with variable numbers.

Names


This stream ID can be found in the header of the DBI stream.

This stream contains the names of all the source files used to build the PE file matching the PDB.

The structure of this stream is :

- Signature (32 bits : 0xFEEFFEEF)

- 32 bits field (value = 1, seems to be a version or an age)

- Offset to the end of the name block (32 bits)

- Filename block + number of entries in the name table (32 bits)

- name table (set of offset relative to the beginning of the filename block)

- Value that seems to be the number of filenames in the filename block (32 bits), but this is not always the case (we've observed a difference of +/- 1 in some cases).
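
The layout listed above could be read as in the following sketch. It rests on several assumptions that this page does not confirm: the fields are little-endian, the "offset to the end of the name block" is the length of the filename block counted from the byte that follows this field, and the file names are NUL-terminated. ParseNameStream and ReadU32 are hypothetical names, and the trailing count is not read since its meaning is uncertain.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// Returns the string found at each offset of the name table.
bool ParseNameStream(const std::vector<uint8_t>& data,
                     std::vector<std::string>* names) {
  const uint32_t kSignature = 0xFEEFFEEF;
  if (data.size() < 12 || ReadU32(data, 0) != kSignature) return false;
  // Bytes 4..7 hold the field that seems to be a version or an age.
  const size_t block_pos = 12;
  const size_t block_size = ReadU32(data, 8);
  if (data.size() - block_pos < block_size) return false;

  size_t table_pos = block_pos + block_size;
  if (data.size() - table_pos < 4) return false;
  const size_t num_entries = ReadU32(data, table_pos);
  table_pos += 4;
  if ((data.size() - table_pos) / 4 < num_entries) return false;

  names->clear();
  for (size_t i = 0; i < num_entries; ++i) {
    const size_t offset = ReadU32(data, table_pos + 4 * i);
    if (offset >= block_size) return false;
    // Assumption: each name ends at the next NUL byte of the block.
    size_t end = offset;
    while (end < block_size && data[block_pos + end] != 0) ++end;
    names->push_back(std::string(data.begin() + block_pos + offset,
                                 data.begin() + block_pos + end));
  }
  return true;
}
```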

Globals

This stream ID can be found in the header of the DBI stream.

Public

The public stream contains information about public symbols. Its ID can be found in the header of the DBI stream. The structure of the stream is described below.

Public stream header

struct PublicStreamHeader {
  // The offset of the sorted table of public symbols, in bytes and relative
  // to the |unknown| field of this header.
  uint32 sorted_symbols_offset;

  // The size of the sorted table of public symbols, in bytes.
  // This is equal to 4 times the number of public symbols.
  uint32 sorted_symbols_size;

  // These fields are always equal to zero.
  uint32 zero_0;
  uint32 zero_1;
  uint32 zero_2;
  uint32 zero_3;

  // Padding field, which can have any value.
  uint32 padding;

  // An unknown field that is always equal to -1.
  uint32 unknown;

  // The signature of the stream, which is equal to |kPublicStreamSignature|.
  uint32 signature;

  // The size of the table of public symbol offsets.
  // This is equal to 8 times the number of public symbols.
  uint32 offset_table_size;

  // The size of the hash table of public symbols, in bytes. This includes
  // a 512-byte bitset with a 1 in used buckets followed by an array identifying
  // a representative of each bucket.
  uint32 hash_table_size;
};
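
The comments of this header give enough information to locate the sorted table and to cross-check the number of public symbols. The sketch below shows one way to do it; SortedTablePosition and PublicSymbolCount are hypothetical helpers, the struct is restated with fixed-width standard types, and the header is assumed to sit at the very beginning of the stream.

```cpp
#include <cstddef>
#include <cstdint>

struct PublicStreamHeader {
  uint32_t sorted_symbols_offset;
  uint32_t sorted_symbols_size;
  uint32_t zero_0;
  uint32_t zero_1;
  uint32_t zero_2;
  uint32_t zero_3;
  uint32_t padding;
  uint32_t unknown;
  uint32_t signature;
  uint32_t offset_table_size;
  uint32_t hash_table_size;
};

// Position of the sorted table of symbols, relative to the beginning of the
// stream. sorted_symbols_offset is relative to the |unknown| field.
uint32_t SortedTablePosition(const PublicStreamHeader& header) {
  return static_cast<uint32_t>(offsetof(PublicStreamHeader, unknown)) +
         header.sorted_symbols_offset;
}

// Computes the number of public symbols from both size fields and returns
// false if they disagree.
bool PublicSymbolCount(const PublicStreamHeader& header, uint32_t* count) {
  if (header.sorted_symbols_size % 4 != 0) return false;
  if (header.offset_table_size % 8 != 0) return false;
  if (header.sorted_symbols_size / 4 != header.offset_table_size / 8)
    return false;
  *count = header.sorted_symbols_size / 4;
  return true;
}
```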

Public symbol offsets

For each public symbol, in the same order as in the symbol record stream:

  • uint32: The offset of the symbol in the symbol record stream, incremented by one.

  • uint32: The value 0x1.
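
A minimal sketch of how these pairs could be decoded is given below. DecodePublicSymbolOffsets and ReadU32 are hypothetical names, the values are assumed to be little-endian, and the input is assumed to hold only the offset table (offset_table_size bytes).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// Converts the table into offsets usable in the symbol record stream.
bool DecodePublicSymbolOffsets(const std::vector<uint8_t>& table,
                               std::vector<uint32_t>* offsets) {
  if (table.size() % 8 != 0) return false;
  offsets->clear();
  for (size_t pos = 0; pos < table.size(); pos += 8) {
    const uint32_t stored = ReadU32(table, pos);
    const uint32_t marker = ReadU32(table, pos + 4);
    // The stored offset is incremented by one and the second value is 0x1.
    if (stored == 0 || marker != 1) return false;
    offsets->push_back(stored - 1);
  }
  return true;
}
```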

Public symbol hash table

The public stream then contains the representation of a hash table in which keys are symbol names. This part is omitted when the PDB doesn't contain public symbols.

The hash table representation starts with a 512-byte bit set in which bits corresponding to non-empty buckets are set to one. The bucket corresponding to a symbol name can be computed using the HashString() function of pdb_util.cc.

After the bit set, we find a uint32 with value 0x0.

Then, a representative of each bucket is listed using a uint32 that is the index of a public symbol in the "Public symbol offsets" table multiplied by 12.
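
To make this layout more concrete, here is a sketch that counts the non-empty buckets and converts each representative back to an index in the "Public symbol offsets" table. It assumes little-endian values and exactly one representative per non-empty bucket, stored right after the uint32 of value 0x0; ReadBucketRepresentatives and ReadU32 are hypothetical names. The order of the bits inside a byte does not matter here because the bits are only counted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// Reads the representative of each non-empty bucket and returns it as an
// index in the "Public symbol offsets" table.
bool ReadBucketRepresentatives(const std::vector<uint8_t>& hash_table,
                               std::vector<uint32_t>* indices) {
  const size_t kBitSetSize = 512;  // In bytes.
  if (hash_table.size() < kBitSetSize + 4) return false;

  // Count the bits set to one, i.e. the non-empty buckets.
  size_t used_buckets = 0;
  for (size_t i = 0; i < kBitSetSize; ++i) {
    for (unsigned bits = hash_table[i]; bits != 0; bits &= bits - 1)
      ++used_buckets;
  }

  // The bit set is followed by a uint32 with value 0x0.
  if (ReadU32(hash_table, kBitSetSize) != 0) return false;
  const size_t array_pos = kBitSetSize + 4;
  if ((hash_table.size() - array_pos) / 4 < used_buckets) return false;

  indices->clear();
  for (size_t i = 0; i < used_buckets; ++i) {
    const uint32_t value = ReadU32(hash_table, array_pos + 4 * i);
    if (value % 12 != 0) return false;
    indices->push_back(value / 12);
  }
  return true;
}
```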


Sorted table of symbols

Finally, for each public symbol, ordered by address in the image:

  • uint32: Index of the symbol in the symbol record stream.

Modules

Those stream IDs can be found in the DBI stream.

The first field encountered in this stream is a 4-byte value indicating its type; the expected value is C13 (4).

After this we have a symbol table. The size of this table is located in the information that we got for this stream in the DBI stream. This is the same type of table as the one we find in the symbol record stream.

The line information is arranged as a back-to-back run of {type, len} prefixed chunks. The types are DEBUG_S_FILECHKSMS and DEBUG_S_LINES. The first of these provides file names and a file content checksum, where each record is identified by its index into its chunk (excluding type and len). The other one consists of a bunch of CV_SourceFile structures.
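
A walk over such a run of chunks could look like the sketch below. The widths are assumptions, since this page does not state them: type and len are both read as little-endian 32-bit values, len is taken to count only the bytes that follow the prefix, and no padding is expected between chunks. Chunk, WalkLineInfoChunks and ReadU32 are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Location of one chunk of line information.
struct Chunk {
  uint32_t type;
  size_t data_pos;  // Position of the chunk data, after type and len.
  uint32_t data_size;
};

static uint32_t ReadU32(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint32_t>(d[pos]) |
         (static_cast<uint32_t>(d[pos + 1]) << 8) |
         (static_cast<uint32_t>(d[pos + 2]) << 16) |
         (static_cast<uint32_t>(d[pos + 3]) << 24);
}

// Lists the chunks found in the line information of a module stream.
bool WalkLineInfoChunks(const std::vector<uint8_t>& data,
                        std::vector<Chunk>* chunks) {
  chunks->clear();
  size_t pos = 0;
  while (pos < data.size()) {
    // Assumption: 4 bytes of type followed by 4 bytes of len.
    if (data.size() - pos < 8) return false;
    Chunk chunk;
    chunk.type = ReadU32(data, pos);
    chunk.data_size = ReadU32(data, pos + 4);
    chunk.data_pos = pos + 8;
    if (data.size() - chunk.data_pos < chunk.data_size) return false;
    chunks->push_back(chunk);
    pos = chunk.data_pos + chunk.data_size;
  }
  return true;
}
```

Offsets that identify a record inside a chunk would then be added to data_pos, since they exclude type and len.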

Section header

Not encountered yet.

Section header origin

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

FPO (Frame pointer omission)

This stream ID can be found in the Dbg header of the DBI stream.

New FPO

This stream ID can be found in the Dbg header of the DBI stream.

Exception

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

Fixup

This stream ID can be found in the Dbg header of the DBI stream.

Omap-to-src

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

Omap-from-src

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

Token rid map

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

x-data

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

p-data

This stream ID can be found in the Dbg header of the DBI stream.

Not encountered yet.

Type info hash

This stream ID can be found in the header of the type info stream.

Symbol record

This stream ID can be found in the header of the DBI stream.

This stream contains a set of symbol records. The structure of each record is pretty simple :

- Symbol record length (2 bytes), the length field is not included in the length.

- Symbol record type ID (2 bytes).

- Symbol record data (length - 2 bytes).
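
The record structure above translates into a simple loop. The sketch below only locates the records and does not interpret their data; SymbolRecord, WalkSymbolRecords and ReadU16 are hypothetical names, and the fields are assumed to be little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Location of one symbol record in the stream.
struct SymbolRecord {
  uint16_t type;
  size_t data_pos;   // Position of the record data.
  size_t data_size;  // length - 2, since the length covers the type ID.
};

static uint16_t ReadU16(const std::vector<uint8_t>& d, size_t pos) {
  return static_cast<uint16_t>(d[pos] | (d[pos + 1] << 8));
}

// Lists the records of a symbol record stream.
bool WalkSymbolRecords(const std::vector<uint8_t>& stream,
                       std::vector<SymbolRecord>* records) {
  records->clear();
  size_t pos = 0;
  while (pos < stream.size()) {
    if (stream.size() - pos < 4) return false;
    // The length field does not include itself.
    const size_t length = ReadU16(stream, pos);
    if (length < 2 || stream.size() - pos - 2 < length) return false;
    SymbolRecord record;
    record.type = ReadU16(stream, pos + 2);
    record.data_pos = pos + 4;
    record.data_size = length - 2;
    records->push_back(record);
    pos += 2 + length;
  }
  return true;
}
```

The type of each record can then be used to pick the matching structure to decode the bytes located at data_pos.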


The data content of each kind of symbol can be matched to a structure contained in the cvinfo header file. The different symbol type IDs are also enumerated in this file. The document "Microsoft Symbol and Type Information" (see References) is useful to understand the content of this stream.

References