RocksDB Bloom Filter

What is a Bloom Filter

A bloom filter is a bit array that can tell whether an arbitrary key may exist in a set of keys.

Given a set of keys, an algorithm can be applied to create a bit array called a bloom filter. Then, given an arbitrary key, this bit array can tell whether the key “may exist” or “definitely does not exist” in the original key set.
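The "may exist" versus "definitely does not exist" behaviour can be illustrated with a small toy filter. This is only a sketch under stated assumptions: the class name ToyBloomFilter, the use of std::hash, and the probing scheme are made up for illustration and are not RocksDB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy bloom filter for illustration only. This is NOT RocksDB's code:
// the class name, std::hash, and the probing scheme are assumptions.
class ToyBloomFilter {
 public:
  ToyBloomFilter(std::size_t num_bits, int num_probes)
      : bits_(num_bits, false), num_probes_(num_probes) {
    assert(num_bits > 0 && num_probes > 0);
  }

  // Sets num_probes_ bits whose positions are derived from the key's hash.
  void Add(const std::string& key) {
    std::uint64_t h = std::hash<std::string>{}(key);
    const std::uint64_t delta = (h >> 17) | (h << 47);  // rotated hash used as probe step
    for (int i = 0; i < num_probes_; ++i) {
      bits_[h % bits_.size()] = true;
      h += delta;
    }
  }

  // false: the key was definitely never added.
  // true:  the key may have been added (false positives are possible).
  bool MayContain(const std::string& key) const {
    std::uint64_t h = std::hash<std::string>{}(key);
    const std::uint64_t delta = (h >> 17) | (h << 47);
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits_[h % bits_.size()]) return false;
      h += delta;
    }
    return true;
  }

 private:
  std::vector<bool> bits_;
  int num_probes_;
};
```

Every key passed to Add makes MayContain return true afterwards, so a false answer is always reliable. A true answer for a key that was never added is possible, which is why the filter can only say "may exist".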

Here is a detailed explanation of how a bloom filter works.

In RocksDB, every SST file contains a bloom filter. It is used to decide if we need to go into the file to find the exact key.

Life Cycle

In RocksDB, each SST file has a bloom filter, which is created as soon as the file is written. The bloom filter is stored as part of the SST file. Bloom filters are built the same way for files at every level.

There is no operation to combine bloom filters. A bloom filter can only be created from a set of keys. When we combine two SST files, a new bloom filter is created from the keys of the new file.
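As a rough sketch of this rule, the helpers below model the merge of two files as a merge of their sorted key lists, followed by building a fresh filter from the merged keys. All names here (BuildFilter, MayContain, FilterForMergedFile), the hash function, and the constants are hypothetical and chosen only for illustration; they are not RocksDB APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not RocksDB APIs.
constexpr int kProbes = 6;

// Builds a filter from a complete set of keys.
std::vector<bool> BuildFilter(const std::vector<std::string>& keys,
                              std::size_t bits_per_key) {
  std::vector<bool> bits(std::max<std::size_t>(64, keys.size() * bits_per_key),
                         false);
  for (const std::string& key : keys) {
    std::uint64_t h = std::hash<std::string>{}(key);
    const std::uint64_t delta = (h >> 17) | (h << 47);
    for (int i = 0; i < kProbes; ++i) {
      bits[h % bits.size()] = true;
      h += delta;
    }
  }
  return bits;
}

bool MayContain(const std::vector<bool>& bits, const std::string& key) {
  std::uint64_t h = std::hash<std::string>{}(key);
  const std::uint64_t delta = (h >> 17) | (h << 47);
  for (int i = 0; i < kProbes; ++i) {
    if (!bits[h % bits.size()]) return false;
    h += delta;
  }
  return true;
}

// Models combining two files: the old filters are ignored, the sorted key
// lists are merged, and a new filter is built from the merged keys.
std::vector<bool> FilterForMergedFile(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b,
                                      std::size_t bits_per_key) {
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));
  std::vector<std::string> merged;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::back_inserter(merged));
  return BuildFilter(merged, bits_per_key);
}
```

The new filter is sized for the merged key count, which is one reason it is rebuilt from keys rather than derived from the two old bit arrays.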

When we open an SST file, the bloom filter is also opened and loaded into memory. If

Options::allow_mmap_reads=true,

the bloom filter is memory mapped.

When the SST file is closed, the bloom filter is removed from memory. But if

BlockBasedTableOptions::cache_index_and_filter_blocks=true,

the bloom filter may be cached in the block cache.

Memory Usage

Building a bloom filter requires all of its keys to fit in memory. At first sight, this seems to make building a bloom filter for a large SST file impossible. But in the block based table this is not a concern, because a bloom filter is created for each data block, and a data block contains only a small range of keys.

Details of the block based table’s filter format can be found here.

The block size is defined in

BlockBasedTableOptions::block_size

The default value is 4K. When building an SST file, key-value pairs are added in sequence. Once enough key-value pairs have accumulated to fill a block (up to about 4K), RocksDB creates a bloom filter for them and writes it to the file. So only a very small fraction of the keys is held in memory while the bloom filter is built.
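The bounded memory use described above can be sketched with a simplified builder. The class PerBlockFilterBuilder, the constants of 10 bits per key and 6 probes, and the use of std::hash are assumptions made for illustration; the real block based table builder differs in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified model of per-block filter building; not RocksDB's code.
class PerBlockFilterBuilder {
 public:
  explicit PerBlockFilterBuilder(std::size_t block_size)
      : block_size_(block_size) {
    assert(block_size > 0);
  }

  // Keys arrive in order. Once about block_size_ bytes have accumulated,
  // a filter is built for the buffered keys and the buffer is cleared.
  void Add(const std::string& key, const std::string& value) {
    pending_keys_.push_back(key);
    pending_bytes_ += key.size() + value.size();
    if (pending_bytes_ >= block_size_) FinishBlock();
  }

  // Builds the filter for the last, possibly partial, block.
  void Finish() {
    if (!pending_keys_.empty()) FinishBlock();
  }

  std::size_t num_filters() const { return filters_.size(); }
  std::size_t max_buffered_keys() const { return max_buffered_keys_; }

 private:
  void FinishBlock() {
    max_buffered_keys_ = std::max(max_buffered_keys_, pending_keys_.size());
    std::vector<bool> bits(pending_keys_.size() * 10, false);  // 10 bits per key
    for (const std::string& key : pending_keys_) {
      std::uint64_t h = std::hash<std::string>{}(key);
      const std::uint64_t delta = (h >> 17) | (h << 47);
      for (int i = 0; i < 6; ++i) {  // 6 probes per key
        bits[h % bits.size()] = true;
        h += delta;
      }
    }
    filters_.push_back(std::move(bits));
    pending_keys_.clear();
    pending_bytes_ = 0;
  }

  std::size_t block_size_;
  std::size_t pending_bytes_ = 0;
  std::size_t max_buffered_keys_ = 0;
  std::vector<std::string> pending_keys_;
  std::vector<std::vector<bool>> filters_;
};
```

With a 4096-byte block size and entries of roughly 100 bytes, the builder never holds more than a few dozen keys at once, no matter how many entries the file contains.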

We are also working on a new bloom filter for the block based table that contains a single filter for all keys in the SST file. It could improve RocksDB read performance, but it requires storing the hashes of all keys while it is being built. Details can be found in the next section.

New bloom filter format

The original bloom filter creates a filter for each "data block", and thus requires little memory to build.

We are working on a new bloom filter (called the full filter) that contains a single filter for all keys in the SST file. This could improve read performance because a lookup avoids traversing the more complicated SST format. But it takes more memory to build, because all keys in the SST file are required. As an optimization, only the hashes of the keys are kept in memory. Users should still think twice before using it for large SST files.
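The hash-storing optimization can be sketched as follows. The builder keeps one 32-bit hash per key instead of the key itself and sizes the bit array only once all keys are known. The class FullFilterSketch, its methods, and the hash function are hypothetical and are not the actual RocksDB builder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Sketch of a whole-file filter builder that buffers hashes, not keys.
// All names here are hypothetical; this is not RocksDB's code.
class FullFilterSketch {
 public:
  FullFilterSketch(std::size_t bits_per_key, int num_probes)
      : bits_per_key_(bits_per_key), num_probes_(num_probes) {
    assert(bits_per_key > 0 && num_probes > 0);
  }

  // Only 4 bytes are kept per key, however long the key is.
  void AddKey(const std::string& key) { hashes_.push_back(Hash(key)); }

  std::size_t buffered_bytes() const {
    return hashes_.size() * sizeof(std::uint32_t);
  }

  // Sizes the bit array from the final key count, then replays the hashes.
  std::vector<bool> Finish() const {
    std::vector<bool> bits(
        std::max<std::size_t>(64, hashes_.size() * bits_per_key_), false);
    for (std::uint32_t h : hashes_) {
      const std::uint32_t delta = (h >> 17) | (h << 15);
      for (int i = 0; i < num_probes_; ++i) {
        bits[h % bits.size()] = true;
        h += delta;
      }
    }
    return bits;
  }

  bool MayContain(const std::vector<bool>& bits, const std::string& key) const {
    std::uint32_t h = Hash(key);
    const std::uint32_t delta = (h >> 17) | (h << 15);
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits[h % bits.size()]) return false;
      h += delta;
    }
    return true;
  }

 private:
  static std::uint32_t Hash(const std::string& key) {
    return static_cast<std::uint32_t>(std::hash<std::string>{}(key));
  }

  std::size_t bits_per_key_;
  int num_probes_;
  std::vector<std::uint32_t> hashes_;
};
```

In this sketch, 1000 keys of about 100 bytes each cost 4000 bytes of buffered hashes rather than about 100 KB of buffered keys. The buffer still grows with the number of keys in the file, which is why large SST files deserve caution.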

Users can specify which kind of bloom filter to use in

tableOptions::FilterBlockType

The original bloom filter is used by default.

The full filter block is formatted as follows:

[filter]
[num probes]        : 1 byte
[num blocks]        : 4 bytes

The filter is a large bit array that can be used to check any key in the SST file. num probes is the number of hash functions used to create the bloom filter. In the original bloom filter format, it is attached at the end of each filter (the full filter does the same). num blocks is a parameter used inside the filter's algorithm. It has no relation to the blocks in the SST file.
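The layout above can be read back with a few lines of code. The sketch below splits a full filter block into the filter bytes and the 5-byte trailer. The struct and function names are hypothetical, and the little-endian encoding of num blocks is an assumption that the layout description does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical view of a full filter block: [filter][num probes][num blocks].
struct FullFilterTrailer {
  std::size_t filter_size = 0;   // length of the leading bit array, in bytes
  std::uint8_t num_probes = 0;   // 1 byte
  std::uint32_t num_blocks = 0;  // 4 bytes, assumed little-endian
};

constexpr std::size_t kTrailerSize = 1 + 4;

// Appends the trailer to an already built filter bit array.
std::string AppendTrailer(std::string filter, std::uint8_t num_probes,
                          std::uint32_t num_blocks) {
  filter.push_back(static_cast<char>(num_probes));
  for (int shift = 0; shift < 32; shift += 8) {
    filter.push_back(static_cast<char>((num_blocks >> shift) & 0xff));
  }
  return filter;
}

// Returns false if the block is too short to hold the trailer.
bool ParseTrailer(const std::string& block, FullFilterTrailer* out) {
  assert(out != nullptr);
  if (block.size() < kTrailerSize) return false;
  const std::size_t filter_size = block.size() - kTrailerSize;
  const unsigned char* p =
      reinterpret_cast<const unsigned char*>(block.data()) + filter_size;
  out->filter_size = filter_size;
  out->num_probes = p[0];
  out->num_blocks = static_cast<std::uint32_t>(p[1]) |
                    (static_cast<std::uint32_t>(p[2]) << 8) |
                    (static_cast<std::uint32_t>(p[3]) << 16) |
                    (static_cast<std::uint32_t>(p[4]) << 24);
  return true;
}
```

Because the two parameters sit at the end of the block, a reader can find them from the block length alone and treat everything before them as the bit array.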
