This is the source codes for the experiments in the Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores paper.
To run a particular algorithm on a particular dataset, execute:
./run.sh [data] [algo] [nqueries] [workload] [selectivity] [update] [timelimit]
[data] is one of the following:
- skyserver.data (it will be downloaded on demand around 2.2GB)
[algo] is one of the following:
[nqueries] is an integer denoting the number of queries to be executed.
[workload] is one of the following:
- SkyServer (downloaded on demand)
[selectivity] is a floating point, e.g.:
- 0.5 (means 50% selectivity)
- 1e-2 (means 1% selectivity)
- 1e-7 (means 0.00001% selectivity)
[update] is one of the following:
- NOUP (means read only queries)
- LFHV (means low frequency high volume updates)
- HFLV (means high frequency low volume updates)
- ROLL (means queue-like update workload)
- TRASH (means insert 1M tuples at 10, 10^5 th query)
- DELETE (means delete 1000 tuples every 1000 queries)
- APPEND (means gradually insert 10M queries every 1000 queries)
[timelimit] is an integer denoting the maximum runtime in seconds before it is prematurely terminated (if exceeded).
./run.sh 100000000.data crack 100000 Random 1e-2 NOUP 30
SkyServer dataset and queries
The SkyServer dataset consists of a sequence of 585634221 integers which represents the degree of ascension in the Photoobjall table. Originally the degree is a floating point between 0 to 360, but in this dataset, it has been multiplied by 1 million and converted to integers. You will need to download the SkyServer dataset from scrack-skyserver-dataset repository. Then you will be able to run:
./run.sh skyserver.data dd1r 200000 SkyServer 1e-7 NOUP 60 ./run.sh skyserver.data dd1r 200000 SkyServer 1e-7 HFLV 60
The SkyServer queries consist of a sequence of 158325 point queries on the ascension column (similarly formatted by multiplying it by 1 million and converted to integers).