===================
The MSF File Format
===================

.. contents::
   :local:

.. _msf_layout:

File Layout
===========

The MSF file format consists of the following components:

1. :ref:`msf_superblock`
2. :ref:`msf_freeblockmap` (also known as the Free Page Map, or FPM)
3. Data

Each component is stored in one or more indexed blocks, each of which is
``SuperBlock::BlockSize`` bytes long. The file consists of one or more
repetitions of the following pattern (sometimes referred to as an "interval"):

1. 1 block of data
2. Free Block Map 1 (active when ``SuperBlock::FreeBlockMapBlock`` is ``1``)
3. Free Block Map 2 (active when ``SuperBlock::FreeBlockMapBlock`` is ``2``)
4. ``SuperBlock::BlockSize - 3`` blocks of data

Each interval therefore spans exactly ``SuperBlock::BlockSize`` blocks. In the
first interval, the first data block is used to store the
:ref:`msf_superblock`.

The following diagram demonstrates the general layout of the file (\| denotes
the end of an interval, and is for visualization purposes only):

+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+

The file may end after any block, including immediately after an FPM1 block.

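The interval pattern can be expressed as a small C++ function that maps a block index to its role. This is a sketch only: ``BlockKind`` and ``classifyBlock`` are hypothetical names, not part of LLVM's API, and the function assumes a ``BlockSize`` larger than 3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a block by its position in the file.
enum class BlockKind { SuperBlock, Fpm1, Fpm2, Data };

// Classifies block `Index` using the interval pattern: within each interval
// of `BlockSize` blocks, offset 1 holds FPM1, offset 2 holds FPM2 and every
// other offset holds data, except that block 0 of the file is the superblock.
BlockKind classifyBlock(uint32_t Index, uint32_t BlockSize) {
  assert(BlockSize > 3 && "an interval needs room for both FPMs and data");
  if (Index == 0)
    return BlockKind::SuperBlock;
  switch (Index % BlockSize) {
  case 1:
    return BlockKind::Fpm1;
  case 2:
    return BlockKind::Fpm2;
  default:
    return BlockKind::Data;
  }
}
```

With a block size of 4096, ``classifyBlock(4097, 4096)`` returns ``BlockKind::Fpm1`` and ``classifyBlock(4096, 4096)`` returns ``BlockKind::Data``, matching the diagram above.
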
.. note::
   LLVM only supports 4096-byte blocks (sometimes referred to as the "BigMsf"
   variant), so the rest of this document will assume a block size of 4096.

.. _msf_superblock:

The Superblock
==============

  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
  more information.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  contains the list of blocks that the stream directory occupies, and the stream
  directory itself can be stitched together accordingly. The number of
  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.

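The constraints in the field descriptions above can be checked with a sketch like the following. It assumes the fields have already been decoded from little-endian ``ulittle32_t`` into host-order integers. ``SuperBlockFields``, ``numDirectoryBlocks`` and ``isPlausibleSuperBlock`` are hypothetical names, and the struct deliberately omits the superblock fields not discussed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-order copy of the superblock fields discussed above.
// On disk these are little-endian 32-bit values (ulittle32_t); this sketch
// assumes they were already decoded. Other superblock fields are omitted.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
};

// Number of ulittle32_t entries in the array listing the stream directory's
// blocks: ceil(NumDirectoryBytes / BlockSize), computed in 64 bits.
uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
  assert(SB.BlockSize != 0 && "BlockSize must be nonzero");
  return static_cast<uint32_t>(
      (uint64_t(SB.NumDirectoryBytes) + SB.BlockSize - 1) / SB.BlockSize);
}

// Returns true if the fields satisfy the constraints described above for a
// file that is FileSize bytes long.
bool isPlausibleSuperBlock(const SuperBlockFields &SB, uint64_t FileSize) {
  if (SB.BlockSize == 0)
    return false;
  // Only FPM1 and FPM2 exist, so the active FPM must be 1 or 2.
  if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
    return false;
  // NumBlocks * BlockSize should equal the file size; widen to avoid overflow.
  return uint64_t(SB.NumBlocks) * SB.BlockSize == FileSize;
}
```

For a superblock with ``BlockSize = 4096`` and ``NumDirectoryBytes = 60``, ``numDirectoryBlocks`` returns 1; with ``NumDirectoryBytes = 4097`` it returns 2.
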
.. _msf_freeblockmap:

The Free Block Map
==================

The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
series of blocks that together hold one bit flag for every block in the file.
The flag is set to 0 if the block is in use, and 1 if the block is unused.

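Testing or updating a single block's flag is plain bit manipulation. The sketch below assumes the active FPM has already been read into one contiguous byte buffer and that block ``B`` maps to bit ``B % 8`` of byte ``B / 8``, least-significant bit first. That bit order is an assumption of this example, not something stated above. ``isBlockFree`` and ``setBlockFree`` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `Block` is marked free (bit set to 1) in `FpmBits`, a
// hypothetical contiguous copy of the active FPM. Assumed bit order: block B
// is bit (B % 8) of byte (B / 8), least-significant bit first.
bool isBlockFree(const std::vector<uint8_t> &FpmBits, uint32_t Block) {
  size_t Byte = Block / 8;
  assert(Byte < FpmBits.size() && "block not covered by this FPM copy");
  return (FpmBits[Byte] >> (Block % 8)) & 1u;
}

// Marks `Block` as free (bit set to 1) or in use (bit cleared to 0).
void setBlockFree(std::vector<uint8_t> &FpmBits, uint32_t Block, bool Free) {
  size_t Byte = Block / 8;
  assert(Byte < FpmBits.size() && "block not covered by this FPM copy");
  uint8_t Mask = static_cast<uint8_t>(1u << (Block % 8));
  if (Free)
    FpmBits[Byte] |= Mask;
  else
    FpmBits[Byte] &= static_cast<uint8_t>(~Mask);
}
```

Starting from an all-free byte ``0xFF``, marking block 0 as in use leaves ``0xFE``, because only the lowest bit is cleared under the assumed bit order.
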
Each file contains two FPMs, one of which is active at any given time; the
active one is selected by ``SuperBlock::FreeBlockMapBlock``. This design
supports incremental and atomic updates of the underlying MSF file. While
writing to an MSF file, if the active FPM is FPM1, you can write your new
modified bitfield to FPM2, and vice versa. Only when you commit the file to disk
do you need to update the value in the SuperBlock to point to the new
``FreeBlockMapBlock``.

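The update scheme can be modelled in memory as follows. This is an illustrative sketch, not LLVM's implementation. ``FpmPair``, ``stage``, ``commit`` and ``inactiveFpm`` are hypothetical names. ``commit`` only flips an in-memory copy of ``FreeBlockMapBlock``; a real writer would also have to write the updated superblock back to disk.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the FPM number that is not currently active; new writes go there.
uint32_t inactiveFpm(uint32_t ActiveFpm) {
  assert((ActiveFpm == 1 || ActiveFpm == 2) && "FreeBlockMapBlock is 1 or 2");
  return 3 - ActiveFpm;
}

// Hypothetical in-memory model of the two-FPM scheme. Fpms[0] is FPM1 and
// Fpms[1] is FPM2; ActiveFpm mirrors SuperBlock::FreeBlockMapBlock.
struct FpmPair {
  std::array<std::vector<uint8_t>, 2> Fpms;
  uint32_t ActiveFpm = 1;

  // Writes a modified bitfield into the inactive FPM only, so the active FPM
  // (the last committed state) is never touched.
  void stage(const std::vector<uint8_t> &NewBits) {
    Fpms[inactiveFpm(ActiveFpm) - 1] = NewBits;
  }

  // Commit step: flipping ActiveFpm stands in for rewriting the superblock.
  void commit() { ActiveFpm = inactiveFpm(ActiveFpm); }

  const std::vector<uint8_t> &active() const { return Fpms[ActiveFpm - 1]; }
};
```

Because ``stage`` never modifies the active FPM, the committed free-block state stays intact until ``commit`` switches the superblock field.
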
The Free Block Maps are stored as a series of single blocks throughout the file,
one block of each FPM per interval (that is, every ``BlockSize`` blocks). Because
each FPM block is ``BlockSize`` bytes long, it contains 8 times as many bits as an
interval has blocks. This means that the first block of each FPM refers to the
first 8 intervals of the file (the first 32768 blocks), the second block of each
FPM refers to the next 8 intervals, and so on. Since every interval carries one
block of each FPM but only one FPM block is needed per 8 intervals, far more FPM
blocks are present than are required; however, the format must stay this way to
maintain backwards compatibility.

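The mapping from a block index to the FPM bit that describes it follows from this layout. The sketch below reuses a least-significant-bit-first order within each byte, which is an assumption not stated above. ``FpmBitLocation`` and ``locateFpmBit`` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Where the bit describing one block lives inside a given FPM.
struct FpmBitLocation {
  uint32_t FileBlock;  // index of the FPM block within the file
  uint32_t ByteOffset; // byte within that FPM block
  uint32_t Bit;        // bit within that byte (LSB-first, an assumption)
};

// Locates the bit for `Block` in FPM number `FpmNumber` (1 or 2). Each FPM
// block holds 8 * BlockSize bits, so FPM block k covers blocks
// [k * 8 * BlockSize, (k + 1) * 8 * BlockSize). It is stored at file block
// k * BlockSize + FpmNumber, i.e. offset 1 or 2 of interval k.
FpmBitLocation locateFpmBit(uint32_t Block, uint32_t FpmNumber,
                            uint32_t BlockSize) {
  assert((FpmNumber == 1 || FpmNumber == 2) && "only FPM1 and FPM2 exist");
  const uint64_t BitsPerFpmBlock = uint64_t(BlockSize) * 8;
  const uint64_t K = Block / BitsPerFpmBlock;        // which FPM block
  const uint64_t BitIndex = Block % BitsPerFpmBlock; // bit inside it
  return {static_cast<uint32_t>(K * BlockSize + FpmNumber),
          static_cast<uint32_t>(BitIndex / 8),
          static_cast<uint32_t>(BitIndex % 8)};
}
```

With a 4096-byte block size, block 32767 is described by byte 4095, bit 7 of file block 1 (for FPM1), while block 32768 moves on to file block 4097, the FPM1 block of the second interval. A file of N blocks therefore carries about N / 4096 blocks of each FPM, of which only about N / 32768 are ever addressed.
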
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged.

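Decoding the jagged layout amounts to reading ``NumStreams``, then the sizes, then ``ceil(size / BlockSize)`` block indices per stream. The sketch below assumes the directory has already been stitched together into a vector of host-order 32-bit words. ``StreamInfo`` and ``parseStreamDirectory`` are hypothetical names, and special-case stream size values are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of one stream's directory entry.
struct StreamInfo {
  uint32_t Size = 0;
  std::vector<uint32_t> Blocks; // block indices, in stream order
};

// Decodes the directory layout from `Words`, a host-order copy of the
// stitched-together directory. Returns false instead of reading out of
// bounds if `Words` is too short.
bool parseStreamDirectory(const std::vector<uint32_t> &Words,
                          uint32_t BlockSize, std::vector<StreamInfo> &Out) {
  assert(BlockSize != 0 && "BlockSize must be nonzero");
  Out.clear();
  if (Words.empty())
    return false;
  const size_t NumStreams = Words[0];
  // Room is needed for NumStreams itself plus one size per stream.
  if (NumStreams >= Words.size())
    return false;
  size_t Pos = 1 + NumStreams; // first word of StreamBlocks
  for (size_t I = 0; I < NumStreams; ++I) {
    StreamInfo S;
    S.Size = Words[1 + I];
    // Each stream lists ceil(Size / BlockSize) blocks in the jagged array.
    const size_t Count =
        static_cast<size_t>((uint64_t(S.Size) + BlockSize - 1) / BlockSize);
    if (Words.size() - Pos < Count)
      return false;
    S.Blocks.assign(Words.begin() + Pos, Words.begin() + Pos + Count);
    Pos += Count;
    Out.push_back(S);
  }
  return true;
}
```

The function returns ``false`` rather than reading past the end when the word count does not cover every stream's block list.
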
**Example:** Suppose a hypothetical PDB file with a 4KiB block size and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
      {10, 15, 12}
    };
  };

In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would contain an
array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
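
The arithmetic in this example generalizes to any set of stream sizes: one word for ``NumStreams``, one per ``StreamSizes`` entry, and one per block listed in ``StreamBlocks``, at 4 bytes each. ``directoryBytes`` below is a hypothetical helper that reproduces that calculation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Size in bytes of the stream directory for the given stream sizes:
// 1 word for NumStreams, 1 per StreamSizes entry, and ceil(Size / BlockSize)
// per stream for StreamBlocks, at 4 bytes per ulittle32_t.
uint64_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                        uint32_t BlockSize) {
  assert(BlockSize != 0 && "BlockSize must be nonzero");
  uint64_t Words = 1 + StreamSizes.size();
  for (uint32_t Size : StreamSizes)
    Words += (uint64_t(Size) + BlockSize - 1) / BlockSize; // ceil division
  return Words * 4;
}
```

``directoryBytes({1000, 8000, 16000, 9000}, 4096)`` evaluates to 60, matching the total above, and ``ceil(60 / 4096)`` gives the single block-map entry.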