Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Stochastic Database Cracking Build Status

This is the source codes for the experiments in the Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores paper.

To run a particular algorithm on a particular dataset, execute:

./ [data] [algo] [nqueries] [workload] [selectivity] [update] [timelimit]

[data] is one of the following:

  • (it will be downloaded on demand around 2.2GB)

[algo] is one of the following:

  • crack
  • sort
  • scan
  • ddc
  • ddr
  • dd1c
  • dd1r
  • mdd1r
  • mdd1rp1
  • mdd1rp5
  • mdd1rp10
  • mdd1rp50
  • naive_r1th
  • naive_r2th
  • naive_r4th
  • naive_r8th
  • naive_r1x
  • naive_r2x
  • aicc
  • aicc1r
  • aics
  • aics1r
  • aiss

[nqueries] is an integer denoting the number of queries to be executed.

[workload] is one of the following:

  • Random
  • Sequential
  • SeqNoOver
  • SeqRevOver
  • SeqAlternate
  • SeqRand
  • ZoomIn
  • SeqZoomIn
  • ZoomOut
  • SeqZoomOut
  • Skew
  • ConsRandom
  • SkyServer (downloaded on demand)

[selectivity] is a floating point, e.g.:

  • 0.5 (means 50% selectivity)
  • 1e-2 (means 1% selectivity)
  • 1e-7 (means 0.00001% selectivity)

[update] is one of the following:

  • NOUP (means read only queries)
  • LFHV (means low frequency high volume updates)
  • HFLV (means high frequency low volume updates)
  • ROLL (means queue-like update workload)
  • TRASH (means insert 1M tuples at 10, 10^5 th query)
  • DELETE (means delete 1000 tuples every 1000 queries)
  • APPEND (means gradually insert 10M queries every 1000 queries)

[timelimit] is an integer denoting the maximum runtime in seconds before it is prematurely terminated (if exceeded).

Example run:

./ crack 100000 Random 1e-2 NOUP 30

SkyServer dataset and queries

The SkyServer dataset consists of a sequence of 585634221 integers which represents the degree of ascension in the Photoobjall table. Originally the degree is a floating point between 0 to 360, but in this dataset, it has been multiplied by 1 million and converted to integers. You will need to download the SkyServer dataset from scrack-skyserver-dataset repository. Then you will be able to run:

./ dd1r 200000 SkyServer 1e-7 NOUP 60
./ dd1r 200000 SkyServer 1e-7 HFLV 60

The SkyServer queries consist of a sequence of 158325 point queries on the ascension column (similarly formatted by multiplying it by 1 million and converted to integers).


Stochastic Database Cracking






No releases published


No packages published