
cdbmake 12.1

Manvendra Bhangui edited this page Feb 25, 2024 · 2 revisions

NAME

cdbmake - write cdb by reading series of encoded records on input

cdbmake-12 - write cdb by reading key value pair records on input

cdbmake-sv - write cdb by reading /etc/services records on input

cdbdump - read a cdb from standard input and print the database in cdbmake format

SYNOPSIS

cdbmake f ftmp

cdbdump

DESCRIPTION

cdbmake reads a series of encoded records from its standard input and writes a constant database to f. See the section "CDB ENCODING".

cdbmake ensures that f is updated atomically, so programs reading f never have to wait for cdbmake to finish. It does this by first writing the database to ftmp and then moving ftmp on top of f. If ftmp already exists, it is destroyed. The directories containing ftmp and f must be writable to cdbmake; they must also be on the same filesystem.
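The write-then-rename pattern described above can be sketched in Python as follows. This is an illustration of the technique, not cdbmake's C source, and the function name write_atomically is hypothetical. Flushing and fsync-ing before the rename ensures the new file is complete on disk before it replaces f.

```python
import os


def write_atomically(f: str, ftmp: str, data: bytes) -> None:
    """Hypothetical helper: write data to ftmp, then rename ftmp onto f.

    Readers of f see either the old file or the complete new one,
    never a partially written database.
    """
    # Opening with "wb" truncates any existing ftmp, so a stale file is destroyed.
    with open(ftmp, "wb") as out:
        out.write(data)
        out.flush()
        os.fsync(out.fileno())  # make sure the bytes reach disk before the rename
    # os.replace is an atomic rename when ftmp and f are on the same filesystem.
    os.replace(ftmp, f)
```

If ftmp and f were on different filesystems, the rename would fail rather than fall back to a non-atomic copy, which is why both must live on the same filesystem.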

cdbdump reads a constant database from its standard input and prints the database contents, in cdbmake(1) format, on standard output.
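A minimal Python sketch of what cdbdump does, working on an in-memory cdb image rather than standard input. It assumes the layout given in the format specification below: a 2048-byte header, then the records, then the hash tables, so the position stored in the first header pointer (hash table 0) is also where the records end. Following the original cdb tools, each record is printed with a trailing newline and the output ends with an extra newline. The function name cdbdump_bytes is hypothetical.

```python
import struct


def cdbdump_bytes(db: bytes) -> bytes:
    """Hypothetical sketch: return the records of a cdb image in cdbmake format."""
    # Position of hash table 0, which immediately follows the last record.
    (end_of_records,) = struct.unpack_from("<I", db, 0)
    pos = 2048  # records start right after the 256 header pointers
    out = bytearray()
    while pos < end_of_records:
        klen, dlen = struct.unpack_from("<II", db, pos)
        pos += 8
        key = db[pos:pos + klen]
        data = db[pos + klen:pos + klen + dlen]
        pos += klen + dlen
        out += b"+%d,%d:" % (klen, dlen) + key + b"->" + data + b"\n"
    out += b"\n"  # extra newline marks the end of the cdbmake input
    return bytes(out)
```

Because the output is valid cdbmake input, piping cdbdump into cdbmake should rebuild an equivalent database.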

CDB ENCODING

The input to cdbmake is a series of records, each encoded in the following format and followed by a newline:

+keylen,datalen:key->data

where keylen and datalen are the lengths of key and data in bytes, written in decimal. Because the lengths are explicit, key and data can be arbitrary strings, including binary data. The end of the input is marked by an extra newline. If you have a file with one key/value pair per line, you can convert it into this format with the cdbmake-12 script.
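The encoding above is easy to produce programmatically. The Python sketch below encodes (key, data) pairs in the +keylen,datalen:key->data form, ending each record with a newline and the whole input with an extra newline, as the original cdb tools expect. The function pairs_from_lines is a hypothetical stand-in for cdbmake-12: it assumes each line holds a key and a value separated by the first run of whitespace, which may not match the real script in every detail (for example, comment handling).

```python
def encode_record(key: bytes, data: bytes) -> bytes:
    """Encode one pair as +keylen,datalen:key->data followed by a newline."""
    return b"+%d,%d:" % (len(key), len(data)) + key + b"->" + data + b"\n"


def encode_input(pairs) -> bytes:
    """Encode all pairs, then append the extra newline that ends the input."""
    return b"".join(encode_record(k, d) for k, d in pairs) + b"\n"


def pairs_from_lines(text: bytes):
    """Hypothetical cdbmake-12-style conversion of 'key value' lines.

    Assumption: the key is the first whitespace-separated field and the
    value is the rest of the line.
    """
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        yield parts[0], parts[1] if len(parts) > 1 else b""
```

For example, the line `mail 25` becomes the record `+4,2:mail->25`.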

CDB FORMAT SPECIFICATION

A structure for constant databases 19960914 Copyright 1996 D. J. Bernstein, djb@pobox.com

A cdb is an associative array: it maps strings (``keys'') to strings (``data'').

A cdb contains 256 pointers to linearly probed open hash tables. The hash tables contain pointers to (key,data) pairs. A cdb is stored in a single file on disk:

    +----------------+---------+-------+-------+-----+---------+
    | p0 p1 ... p255 | records | hash0 | hash1 | ... | hash255 |
    +----------------+---------+-------+-------+-----+---------+
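To show how the pieces of the diagram fit together, here is a minimal in-memory cdb builder in Python. It is a sketch, not cdbmake's implementation: make_cdb and cdb_hash are hypothetical names, and sizing each hash table at twice its number of entries is a choice made here so that every table is at most half full and probing always reaches an empty slot.

```python
import struct


def cdb_hash(key: bytes) -> int:
    """h = ((h << 5) + h) ^ c, starting at 5381, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def make_cdb(pairs) -> bytes:
    """Hypothetical sketch: build header, records, then hash0..hash255."""
    records = bytearray()
    buckets = [[] for _ in range(256)]  # (hash, record position) per table
    pos = 2048  # records start after the 256 8-byte header pointers
    for key, data in pairs:
        h = cdb_hash(key)
        buckets[h & 0xFF].append((h, pos))  # table number = hash mod 256
        rec = struct.pack("<II", len(key), len(data)) + key + data
        records += rec
        pos += len(rec)

    header = bytearray()
    tables = bytearray()
    for entries in buckets:
        nslots = 2 * len(entries)  # assumption of this sketch: half-full tables
        slots = [(0, 0)] * nslots
        for h, rpos in entries:
            i = (h >> 8) % nslots  # starting slot = (hash / 256) mod length
            while slots[i][1] != 0:  # linear probing: move to the next slot
                i = (i + 1) % nslots
            slots[i] = (h, rpos)
        header += struct.pack("<II", pos, nslots)  # table position and length
        for h, rpos in slots:
            tables += struct.pack("<II", h, rpos)
        pos += 8 * nslots
    return bytes(header + records + tables)
```

Empty tables still get a header pointer (with length 0), so the first pointer always marks where the records end.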

Each of the 256 initial pointers states a position and a length. The position is the starting byte position of the hash table. The length is the number of slots in the hash table.
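Since each pointer is two 32-bit little-endian integers, the header occupies 256 × 8 = 2048 bytes. A small Python sketch for reading it follows; read_header is a hypothetical name.

```python
import struct


def read_header(db: bytes):
    """Hypothetical sketch: return the 256 (position, length) pointer pairs."""
    if len(db) < 2048:
        raise ValueError("truncated cdb header")
    # "<II" = two little-endian unsigned 32-bit integers per pointer.
    return list(struct.iter_unpack("<II", db[:2048]))
```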

Records are stored sequentially, without special alignment. A record states a key length, a data length, the key, and the data.
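A Python sketch of packing and unpacking a single record, with the two lengths stored as 32-bit little-endian integers and no padding. The names pack_record and unpack_record are hypothetical.

```python
import struct


def pack_record(key: bytes, data: bytes) -> bytes:
    """Key length, data length, key, data: no separators, no alignment."""
    return struct.pack("<II", len(key), len(data)) + key + data


def unpack_record(db: bytes, pos: int):
    """Read the record at byte position pos; return (key, data, next_pos)."""
    klen, dlen = struct.unpack_from("<II", db, pos)
    start = pos + 8  # skip the two 4-byte lengths
    key = db[start:start + klen]
    data = db[start + klen:start + klen + dlen]
    return key, data, start + klen + dlen  # next record begins right here
```

Because there is no alignment, the next record begins at the byte immediately after the current record's data.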

Each hash table slot states a hash value and a byte position. If the byte position is 0, the slot is empty. Otherwise, the slot points to a record whose key has that hash value.
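Reading a slot is a matter of unpacking two 32-bit values at position table_pos + 8 × slot. In this Python sketch, read_slot is a hypothetical name. Position 0 can safely mean "empty" because no record can start inside the 2048-byte header.

```python
import struct


def read_slot(db: bytes, table_pos: int, slot: int):
    """Hypothetical sketch: return (hash, record_pos), or None for an empty slot."""
    h, rpos = struct.unpack_from("<II", db, table_pos + 8 * slot)
    return None if rpos == 0 else (h, rpos)
```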

Positions, lengths, and hash values are 32-bit quantities, stored in little-endian form in 4 bytes. Thus a cdb must fit into 4 gigabytes.
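The 4-gigabyte limit follows from the 4-byte encoding: the largest value is 2^32 - 1, so no byte at or beyond offset 2^32 can be addressed. A Python sketch of the encoding, with an explicit range check (pack_u32 and UINT32_MAX are hypothetical names):

```python
import struct

UINT32_MAX = 0xFFFFFFFF  # 2**32 - 1, the largest value 4 bytes can hold


def pack_u32(n: int) -> bytes:
    """Encode n as 4 little-endian bytes, refusing values that need more room."""
    if not 0 <= n <= UINT32_MAX:
        raise OverflowError(f"{n} does not fit in 32 bits")
    return struct.pack("<I", n)  # least significant byte first
```

For example, position 2048 (0x800) is stored as the bytes 00 08 00 00.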

A record is located as follows. Compute the hash value of the key in the record. The hash value modulo 256 is the number of a hash table. The hash value divided by 256, modulo the length of that table, is a slot number. Probe that slot, the next higher slot, and so on, until you find the record or run into an empty slot.
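The lookup procedure above translates directly into code. This Python sketch operates on a cdb image held in memory. The names cdb_hash and cdb_find are hypothetical, and only the first matching record is returned even though a cdb may hold several records with the same key. Comparing the stored hash first avoids reading records whose keys cannot match.

```python
import struct


def cdb_hash(key: bytes) -> int:
    """h = ((h << 5) + h) ^ c, starting at 5381, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def cdb_find(db: bytes, key: bytes):
    """Hypothetical sketch: return the data of the first record for key, or None."""
    h = cdb_hash(key)
    # Hash value modulo 256 selects the table; read its pointer from the header.
    tpos, tlen = struct.unpack_from("<II", db, (h & 0xFF) * 8)
    if tlen == 0:
        return None
    slot = (h >> 8) % tlen  # hash divided by 256, modulo the table length
    for _ in range(tlen):  # never probe more than one full lap
        shash, rpos = struct.unpack_from("<II", db, tpos + 8 * slot)
        if rpos == 0:
            return None  # empty slot: the key is not present
        if shash == h:
            klen, dlen = struct.unpack_from("<II", db, rpos)
            if db[rpos + 8:rpos + 8 + klen] == key:
                return db[rpos + 8 + klen:rpos + 8 + klen + dlen]
        slot = (slot + 1) % tlen  # try the next higher slot, wrapping around
    return None
```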

The cdb hash function is ``h = ((h << 5) + h) ^ c'', with a starting hash of 5381.
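In C the hash is computed in unsigned 32-bit arithmetic, so it wraps around automatically. Python integers are unbounded, so this sketch masks the value to 32 bits after every step; cdb_hash is a hypothetical name.

```python
def cdb_hash(key: bytes) -> int:
    """Hypothetical sketch of the cdb hash over the bytes of key."""
    h = 5381  # starting hash
    for c in key:  # iterating over bytes yields ints 0..255
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF  # h * 33 XOR c, truncated to 32 bits
    return h
```

As a worked example, the key "a" (byte 97) hashes to (5381 × 33) XOR 97 = 177573 XOR 97 = 177604. Then 177604 mod 256 = 196 selects hash table 196, and 177604 / 256 = 693 is reduced modulo that table's length to give the first slot to probe.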

SEE ALSO

cdbdump(1), cdbget(1), cdbgetm(1), cdbmake-12(1), cdbmake-sv(1)
