This project is a stand-alone implementation of the SPST cracker index. Building it requires CMake and a C++11-compliant compiler. Python 2.7 (with pathlib) and SQLite are necessary to run the automatic setup scripts.
Running the experiments
There are two ways to run the experiments. The automatic way sets up the whole environment, runs all the experiments used in the progressive indexing papers, and stores the results in an SQLite database. The manual way performs the whole process from the command line, from compiling and generating the data and workload to executing the experiments; the results are then printed to the console.
To automatically run all the experiments, execute the scripts/run_experiments.py script.
The output header is: swap_time;index_insert_time;scan_time;lookup_time;prune_time;update_time
- swap_time is the time spent swapping elements (in the first query it also includes the time to copy the column).
- index_insert_time is the time to insert nodes into the cracker index.
- scan_time is the time to scan the column between the indexed offsets.
- lookup_time is the time to perform a lookup on the cracker index.
- prune_time is the time to prune the leaves of the SPST.
- update_time is the time to perform the Merge Ripple Inserts.
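As a sketch of how one of these semicolon-separated result lines could be parsed for analysis (the field names come from the header above; the sample values are invented):

```python
# Parse one semicolon-separated result line into a dict of floats.
# Field names are taken from the documented output header.
HEADER = "swap_time;index_insert_time;scan_time;lookup_time;prune_time;update_time"

def parse_result_line(line):
    names = HEADER.split(";")
    values = [float(v) for v in line.split(";")]
    return dict(zip(names, values))

# Sample line with made-up timings, just to show the shape of the data.
row = parse_result_line("0.12;0.03;0.50;0.001;0.0;0.2")
total = sum(row.values())  # total time spent on this query
```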
First, we compile the code in release (-O3) mode:
cmake -DCMAKE_BUILD_TYPE=Release . && make
The experiments take the following parameters:
- num-queries is an integer representing the total number of queries in the workload.
- column-size is an integer representing the size of the column.
- workload-pattern is the pattern the workload follows.
- update-frequency is the frequency at which batches of updates are executed (one batch every update-frequency queries).
- update-size is an integer representing the number of updates performed per batch.
- query-selectivity is the selectivity of each query.
- algorithm is the id of the algorithm to execute.
- start-updates is an integer representing the number of queries that should run before any updates are issued.
./main --num-queries=(?) --column-size=(?) --workload-pattern=(?) --update-frequency=(?) --update-size=(?) --query-selectivity=(?) --algorithm=(?) --start-updates=(?)
Possible values for algorithm:
- 1 = AVL Tree
- 2 = SPST

Possible values for workload-pattern:
- 1 = Random
- 2 = Sequential
- 3 = Skewed
For example, the following runs 100 random-pattern queries over a 10-million-element column using the AVL tree, with no updates:
./main --num-queries=100 --column-size=10000000 --workload-pattern=1 --update-frequency=0 --update-size=0 --query-selectivity=1 --algorithm=1 --start-updates=0
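When sweeping configurations manually, a small helper like the one below (hypothetical, not part of this repository) can assemble such command lines from a parameter dictionary. The flag names mirror the ones listed above:

```python
# Build a ./main invocation string from a dict of experiment parameters.
# This helper is illustrative only; it is not shipped with the repository.
def build_command(params, binary="./main"):
    flags = " ".join("--{}={}".format(k, v) for k, v in params.items())
    return "{} {}".format(binary, flags)

example = {
    "num-queries": 100,
    "column-size": 10000000,
    "workload-pattern": 1,  # 1 = Random
    "update-frequency": 0,
    "update-size": 0,
    "query-selectivity": 1,
    "algorithm": 1,         # 1 = AVL Tree
    "start-updates": 0,
}
cmd = build_command(example)
```

The resulting string can then be passed to a shell or to subprocess for batch runs.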
Analyzing the results
If the results are generated by the experiments script (scripts/run_experiments.py), they are all stored in an SQLite database (results.db) in the repository root. It can be queried from, or exported to, Python or R for plotting purposes.
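A query against results.db might look like the sketch below. The table and column names here are guesses, not the actual schema; inspect the database (e.g. with .schema in the sqlite3 shell) or the experiments script for the real layout. The sketch builds a tiny in-memory stand-in so the query shape is runnable:

```python
import sqlite3

# Hypothetical schema: an "results" table with per-query timings per algorithm.
# Against the real data, connect to "results.db" instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (algorithm INTEGER, query_number INTEGER, scan_time REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 0, 0.50), (1, 1, 0.30), (2, 0, 0.40), (2, 1, 0.20)],
)
# Average scan time per algorithm, e.g. to compare AVL Tree vs SPST.
avg_scan = dict(
    conn.execute("SELECT algorithm, AVG(scan_time) FROM results GROUP BY algorithm")
)
conn.close()
```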
Third Party Code