Skip to content
/ rsqf Public

Go implementation of a Rank and Select Quotient Filter

License

Notifications You must be signed in to change notification settings

nfisher/rsqf

Repository files navigation

Rank and Seek Quotient Filter

Godoc Build Status Go Report Card codecov

A general purpose counting filter: making every bit count by Pandey et al., SIGMOD’17

A Rank and Seek Quotient Filter (RSQF) is an Approximate Membership Query data structure. It is similar to the popular Bloom Filter (BF) however where a BF only provides insert, and lookup.

An RSQF provides:

  • insert
  • delete
  • lookup
  • resize1
  • merge

1 - resize does not permit an increase in p (e.g. unique identity capacity). It does however increase the total number of available slots. This provides two benefits:

  1. Faster queries as the remainders can be aligned closer to their home slot with shorter runs.
  2. Increase in the number of entries allowing for deletion.

Overview

This implementation of RSQF has a fixed error rate of 1/512 or ~0.2%. This was done so that each block is ensured to be contiguous in memory.

r is therefore fixed at 9 (r = log2(1/0.001953)). q will vary relative to p (e.g. for 1m entries p = log2(1,000,000/0.001953)).

A filter that accepts 1 million entries will require ~11.67 bits per entry and require approximately 1.46MB (Si) of memory.

Status

  • Rank
  • Select
  • Hash (fnv, might consider Murmur3, CityHash, or xxHash)
  • FirstAvailableSlot
  • Insert
  • MayContain
  • Resize
  • Merge
  • Count (CQF)

Sizing

The following table shows the approximate sizing for this RSQF implementation at various values of n.

n δ p r q Q size
100,000 1/512 26 9 17 184.32 KB
1,000,000 1/512 29 9 20 1.47 MB
10,000,000 1/512 33 9 24 23.59 MB
100,000,000 1/512 36 9 27 188.74 MB
1,000,000,000 1/512 39 9 30 1.51 GB

Glossary

This glossary summarises the variables specified in the stonybrook paper.

Note: Where integers are specified, fractional values are rounded up to the nearest integer when calculated from floating point values.

n - (integer)
Maximum number of insertions (e.g. 1,000,000).
δ - (fraction)
Error rate or false-positive rate (e.g. 1/512 or 1/100).
p - (integer)
Number of bits required from the hashed input to achieve the target error for the given number of insertions (n). The p-bit hash is split into high bits (quotient) and low bits (remainder).
p = log2(n/δ)
r / remainder - (integer)
The number of remainder bits which are written to `Q.remainders`.
r = log2(1/δ)
q / quotient - (integer)
The number of quotient bits used to indicate the expected `home slot` in the filter.
q = p - r
run
A run is a consecutive group of remainders where the quotient is equal.
h0(a) = h0(b) = h0(c)
occupied - (bit)
A bit that indicates the position of a home slot for a given `run`.
runend - (bit)
A bit that indicates the end of a `run`.
Q - (struct)
The RSQF data structure which contains 2q r-bits of available space allocated by a `block` array. The memory in bits required by the struct can be calculated as follows:
2^q/64 * (8+64(r+2))
Q.occupieds - (bit vector)
Q.runends - (bit vector)
Q.remainders - (bit vector)
block
A block is `64(r + 2) + 8` bit structure. It is composed of the following fields:
  • offset - 1 x 8-bit.
  • runends - 1 x 64-bit.
  • occupieds - 1 x 64-bit.
  • remainders - r x 64-bit.
home slot - (array index)
The home slot is the location where a remainder would be placed if h0(x) is unoccupied by another value.
i = h0(x); Q[i].remainders == h1(x)
slot i - (array index)
i = h0(x)
h(x) - (integer)
A universal hashing function. For this library FNV-1a (64-bit) was employed as it is available in the standard library.
h0(x) / i - (integer)
The masked upper bits of the hash shifted right `r` times.
h0 = (h(x) >> r) & (2^q - 1)
h1(x) - (integer)
The masked lower half of the hash.
h1 = h(x) & (2^r - 1)
B - (bit vector)
Variable representing a bit-vector. Typically one of `Q.occupieds`, `Q.runends`, or `Q.remainders`.
RANK(B, i) - (integer)
Rank returns the number of 1s in B up to position i.
RANK(B, i) =
SELECT(B, i) - (integer)
Select returns the index of the ith 1 in B.
SELECT(B, i) =
Oi - (integer)
Is every 64th slot which is stored in `Q[i].offset` to save space. The offset is calculated with the algorithm that follows.
Oi = SELECT(Q.runends, RANK(Q.occupieds, i)) - i
Oj - (integer)
Oj is a derived intermediate slot value which is discovered using the algorithm that follows.
i = h0(x)
Oi = SELECT(Q.runends, RANK(Q.occupieds, i)) - i
d = RANK(q.occupieds[i + 1, ..., j], j - i - 1)
t = SELECT(Q.runends[i + Oi + 1,...,2^q - 1], d)
Oj = i + Oi + t - j

About

Go implementation of a Rank and Select Quotient Filter

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published