SPST

This project is a stand-alone implementation of the SPST cracker index.

Requirements

CMake and a C++11-compliant compiler are required. Python 2.7 (with pathlib) and SQLite are necessary to run the automatic setup scripts.

Running the experiments

There are two ways to run experiments. The automatic way sets up the whole environment, runs all the experiments used in the progressive indexing papers, and stores the results in an SQLite database. The manual way requires doing every step from the command line: compiling, generating the data and workload, and executing the experiments. The results are then printed to the console.

Automatic

To automatically run all the experiments you need to run the following script:

python run.py

Manual

The output header is: swap_time;index_insert_time;scan_time;lookup_time;prune_time;update_time

  • swap_time is the time spent swapping elements (for the first query it also includes the time to copy the column).
  • index_insert_time is the time to insert nodes into the cracker index.
  • scan_time is the time to scan the column between the indexed offsets.
  • lookup_time is the time to perform a lookup on the cracker index.
  • prune_time is the time to prune the leaves of the SPST.
  • update_time is the time to perform the Merge Ripple Inserts.
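The semicolon-separated output described above is easy to post-process. The sketch below is a hypothetical helper (not part of the repository) that parses each output line into a dictionary keyed by the header fields:

```python
# Hypothetical helper: parse the semicolon-separated timing output
# into per-query dictionaries for further analysis.
HEADER = "swap_time;index_insert_time;scan_time;lookup_time;prune_time;update_time"

def parse_timings(lines):
    columns = HEADER.split(";")
    rows = []
    for line in lines:
        values = [float(v) for v in line.strip().split(";")]
        rows.append(dict(zip(columns, values)))
    return rows

# Example with a made-up output line:
rows = parse_timings(["0.01;0.002;0.5;0.0001;0.0;0.0"])
total = sum(rows[0].values())  # total per-query time
```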

Compile

First, compile the code in release (-O3) mode:

cmake -DCMAKE_BUILD_TYPE=Release . && make

Run Experiments

The experiments take the following parameters:

  • num-queries is an integer representing the total number of queries in the workload.
  • column-size is an integer representing the size of the column.
  • workload-pattern is the pattern the workload follows.
  • update-frequency is the frequency at which batches of updates are executed (one batch every update-frequency queries).
  • update-size is an integer representing the number of updates performed per batch.
  • query-selectivity is the selectivity of each query.
  • algorithm is the id of the to-be-executed algorithm.
  • start-updates is an integer representing the number of queries that should run before issuing any updates.
./main --num-queries=(?) --column-size=(?) --workload-pattern=(?) --update-frequency=(?) --update-size=(?) --query-selectivity=(?) --algorithm=(?) --start-updates=(?)

algorithm:

  • 1 = AVL Tree
  • 2 = SPST

workload-pattern:

  • 1 = Random
  • 2 = Sequential
  • 3 = Skewed

Example:

./main --num-queries=100 --column-size=10000000 --workload-pattern=1 --update-frequency=0 --update-size=0 --query-selectivity=1 --algorithm=1 --start-updates=0
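When sweeping many configurations, assembling the flag string by hand is error-prone. A small hypothetical helper (not part of the repository) that builds the ./main invocation from a parameter dictionary might look like:

```python
# Hypothetical helper: build the ./main command line from a parameter dict.
def build_command(params, binary="./main"):
    return [binary] + [f"--{key}={value}" for key, value in params.items()]

# Reproduces the example invocation from the README:
cmd = build_command({
    "num-queries": 100,
    "column-size": 10000000,
    "workload-pattern": 1,   # 1 = Random
    "update-frequency": 0,
    "update-size": 0,
    "query-selectivity": 1,
    "algorithm": 1,          # 1 = AVL Tree
    "start-updates": 0,
})
# cmd can be passed to subprocess.run(cmd, capture_output=True) to collect
# the semicolon-separated timing output.
```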

Analyzing the results

If the results are generated by the experiments script (scripts/run_experiments.py), all the results are stored in an SQLite instance (results.db) in the repo root. It can be queried from, or exported to, Python/R for plotting purposes.
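Querying results.db needs nothing beyond Python's built-in sqlite3 module. The sketch below illustrates the idea against an in-memory database; the table and column names used here ("experiments", "algorithm", "scan_time") are assumptions for illustration, not the actual schema written by the script:

```python
import sqlite3

# Illustration only: the schema below ("experiments", "algorithm",
# "scan_time") is assumed, not the real one produced by run_experiments.py.
# Connect to "results.db" instead of ":memory:" for the real database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE experiments (algorithm INTEGER, scan_time REAL)")
con.executemany("INSERT INTO experiments VALUES (?, ?)",
                [(1, 0.5), (1, 0.3), (2, 0.2)])

# Average scan time per algorithm, e.g. for a comparison plot.
avg = con.execute(
    "SELECT algorithm, AVG(scan_time) FROM experiments GROUP BY algorithm"
).fetchall()
```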

Third Party Code

Papers
