Commit cb70fe1

author
Zachary Turner
committed
[PDB Docs] Add documentation for the hash table format.
llvm-svn: 357826
1 parent 91d6caf commit cb70fe1

1 file changed

+101
-0
lines changed

llvm/docs/PDB/HashTable.rst

The PDB Serialized Hash Table Format
====================================

.. contents::
   :local:

.. _hash_intro:

Introduction
============

One of the design goals of the PDB format is to provide accelerated access to
debug information. For this reason, hash tables are in several places
serialized and embedded directly in the file, rather than requiring a consumer
to read a list of values and reconstruct the hash table on the fly.

The serialization format supports hash tables of arbitrarily large size and
capacity, as well as arbitrary value types and hash functions. The only
supported key type is a uint32. The only requirement is that the producer and
consumer agree on the hash function. As such, the hash function is not
discussed further in this document; it is assumed that for a particular
instance of a PDB file hash table, the appropriate hash function is being used.

On-Disk Format
==============

.. code-block:: none

   .--------------------.-- +0
   |        Size        |
   .--------------------.-- +4
   |      Capacity      |
   .--------------------.-- +8
   | Present Bit Vector |
   .--------------------.-- +N
   | Deleted Bit Vector |
   .--------------------.-- +M                   ─╮
   |        Key         |                         │
   .--------------------.-- +M+4                  │
   |       Value        |                         │
   .--------------------.-- +M+4+sizeof(Value)    │
            ...                                   ├─ |Capacity| Bucket entries
   .--------------------.                         │
   |        Key         |                         │
   .--------------------.                         │
   |       Value        |                         │
   .--------------------.                        ─╯

- **Size** - The number of values contained in the hash table.

- **Capacity** - The number of buckets in the hash table. Producers should
  keep the table's load low enough that **Size** is no greater than
  ``2/3*Capacity+1``.

- **Present Bit Vector** - A serialized bit vector which records which buckets
  have valid values. If the bucket has a value, the corresponding bit is set.
  If the bucket doesn't have a value (either because the bucket is empty or
  because the value is a tombstone value), the bit is unset.

- **Deleted Bit Vector** - A serialized bit vector which records which buckets
  have tombstone values. If the entry in a bucket is deleted, the
  corresponding bit is set; otherwise it is unset.

- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
  field of each bucket is the key (always a uint32) and the second field is
  the value. The state of each bucket (valid, empty, deleted) can be
  determined by examining the present and deleted bit vectors.

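
The fixed-size header fields and the bucket states described above can be
sketched in Python. This is only an illustration, not LLVM's implementation:
the names ``read_header``, ``max_load`` and ``bucket_state`` are hypothetical,
little-endian byte order is assumed, ``2/3*Capacity+1`` is assumed to use
integer division, and the two bit vectors are assumed to have already been
decoded into sets of bucket indices.

.. code-block:: python

   import struct
   from enum import Enum


   class BucketState(Enum):
       VALID = "valid"
       EMPTY = "empty"
       DELETED = "deleted"


   def read_header(data, offset=0):
       """Return (size, capacity), read as two little-endian uint32 values."""
       size, capacity = struct.unpack_from("<II", data, offset)
       return size, capacity


   def max_load(capacity):
       """Upper bound on Size for a given Capacity (integer division assumed)."""
       return capacity * 2 // 3 + 1


   def bucket_state(present, deleted, index):
       """Classify bucket `index` given the sets of set-bit positions."""
       if index in present:
           return BucketState.VALID
       if index in deleted:
           return BucketState.DELETED
       return BucketState.EMPTY

Under this reading, a table with a capacity of 8 would hold at most
``max_load(8) == 6`` entries.
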
.. _hash_bit_vectors:

Present and Deleted Bit Vectors
===============================

The bit vectors indicating the status of each bucket are serialized as follows:

.. code-block:: none

   .--------------------.-- +0
   |     Word Count     |
   .--------------------.-- +4
   |       Word_0       |        ─╮
   .--------------------.-- +8    │
   |       Word_1       |         │
   .--------------------.-- +12   ├─ |Word Count| values
            ...                   │
   .--------------------.         │
   |       Word_N       |         │
   .--------------------.        ─╯

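
Reading one serialized bit vector could look like the following sketch. The
helper name ``read_bit_vector`` is hypothetical, and the sketch assumes that
the word count and every word are little-endian uint32 values, which the
layout above does not state explicitly.

.. code-block:: python

   import struct


   def read_bit_vector(data, offset=0):
       """Return (words, next_offset) for a bit vector starting at `offset`."""
       # The first uint32 is the number of 32-bit words that follow.
       (word_count,) = struct.unpack_from("<I", data, offset)
       words = struct.unpack_from(f"<{word_count}I", data, offset + 4)
       # The next field in the stream starts right after the last word.
       return list(words), offset + 4 + 4 * word_count

The returned offset lets a caller read the present bit vector and then the
deleted bit vector back to back.
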
The words, when viewed as a contiguous block of bytes, represent a bit vector
with the following layout:

.. code-block:: none

         .------------.         .------------.------------.
         |   Word_N   |   ...   |   Word_1   |   Word_0   |
         .------------.         .------------.------------.
         |            |         |            |            |
     +(N+1)*32     +N*32       +64          +32          +0

where the k'th bit of this bit vector represents the status of the k'th bucket
in the hash table.
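
The bit-to-bucket mapping can be sketched as follows. The names ``bit_is_set``
and ``set_bit_indices`` are hypothetical, ``words`` is assumed to be a list of
already decoded 32-bit integers, and bit 0 of each word is assumed to be its
least significant bit, which matches the layout shown above.

.. code-block:: python

   def bit_is_set(words, k):
       """Return True if bit k (the status of bucket k) is set."""
       word_index, bit_index = divmod(k, 32)
       if word_index >= len(words):
           # Bits beyond the serialized words are treated as unset.
           return False
       return (words[word_index] >> bit_index) & 1 == 1


   def set_bit_indices(words):
       """Return the indices of all set bits in ascending order."""
       return [k for k in range(32 * len(words)) if bit_is_set(words, k)]

Applied to the present bit vector, ``set_bit_indices`` would yield the indices
of the buckets that hold valid values.
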
