Skip to content
Eyal Rozenberg edited this page May 3, 2017 · 2 revisions

Overview

The Delta compression scheme applies a discrete derivative to the input: Instead of representation data[i], data[i+1], data[i+2] ... it represents data[i+1] - data[i], data[i+2] - data[i+1] etc. Decompression is the inverse operation: Discrete integration. Of course, we are not interested in closed-form symbolic formulae for discrete calculus (an interesting topic in itself) but rather in concrete, evaluted functions. Hence this discrete integration is realized simply a prefix sum of the elementwise differences, starting from some initial baseline.

Like many of the schemes in this library, our implementation of the Delta scheme augments its basic data with additional baseline values: Instead of only using the first value in the entire compressed input - from which one has to compute a very log sum - we keep a baseline value for every segment of s consecutive elements. These segments can now be decompressed independently, as these baseline values are the results of the prefix sum upto the beginning of each segment. These auxiliary sums do not change the semantics of the rest of the compressed d data.

This scheme's shorthand designator is DELTA.

Scheme parameters

Parameter Aliases Symbol used
Domain element type Uncompressed type, uncompressed data type τ_D
Difference type Element difference type τ_δ
Column length Single bitmap length, uncompressed data length n
Segment length Baselining period s

Memory layout

  • The following scalars are used: (Potentially) Overall baseline value; column length
  • An array of length ceil(n/s) holds baseline values for the beginning of each of the floor(n/s) segments of the data, and perhaps a last partial segment. The baseline values may be the original values at positions 0, s, 2*s and so on, but this is not required to be the case
  • The derivatives - differences - constitute one array, differences of element type τ_δ. differences[i] is the difference between the ith and the i-1th original elements - for i which is not at the beginning of a segment; for these elements, the difference from the baseline is stored. This can be the difference from the previous uncompressed value, and 0 at the beginning - but that's not necessarily the case.

Links to the code

Clone this wiki locally