Skip to content


Repository files navigation

Multidimensional Adaptive/Progressive Indexing

This project is a stand-alone implementation of all the current adaptive and progressive multidimensional indexing algorithms.


CMake to be installed and a C++11 compliant compiler. Python 3.7 and Jupyter Notebooks are necessary to run the scripted experiments and their respective plots.

Available Indexing Algorithms

  • Adaptive KD-Tree
  • Progressive KD-Tree
  • Greedy Progressive KD-Tree
  • Quasii
  • Full KD-Tree (Median Pivoting)
  • Full KD-Tree (Average Pivoting)
  • Full Scan (Candidate List)

Available Datasets & Workloads

We use three real datasets, each with their own respective workload, and one synthetic dataset following a uniform random distribution. For the synthetic dataset, we also provide 8 different synthetic workloads


All real datasets and workloads were made publically available here.


The Sloan Digital Sky Survey is a project to map the universe. Their data and queries are publicly available at their website. The data set we use here consists of two columns, era, and dec, from the photoobjall table with approximately 69 million tuples. The workload consists of 100,000 real range queries executed on those two attributes.


The power benchmark consists of sensor data collected from a manufacturing installation, obtained from the DEBS 2012 challenge. The data set has three dimensions and 10 million tuples. The workload consists of random close-range queries on each dimension.


The 1000 Genomes Project collects data regarding human genomes. It consists of 10 million genomes, described in 19 dimensions. The workload consists of queries performed by bio-informaticians.


The synthetic data set follows a uniform data distribution for each attribute in the table, consisting of 4-byte floating-point numbers in the range of [0, 3 * 10^7), where 3 * 10^7 is the number of tuples. We use eight different synthetic workloads in our performance comparison. All workloads consist of queries with filters in all available dimensions, the follow the below form:

SELECT SUM(R.A) FROM R WHERE (R.A BETWEEN a_low AND a_high) and (R.B BETWEEN b_low AND b_high) ...

The values for low and high are chosen based on the workload pattern. The different workload patterns and their mathematical description are depicted below.

Running the experiments


First, we compile the code using release (-O3) mode

cmake -DCMAKE_BUILD_TYPE=Release && make

Download Data

For the real data, you must execute the script in the root directory to automatically download all datasets and workloads. You need approximately 50Gb of free disk space.


Running Experiments

All experiments are inside the experiments folder. There are 3 main sets of experiments. Note that unless stated otherwise, all algorithms run with min_partition_size = 1024, and all progressive indexing algorithms run with delta = 0.2.

  • real-data-workload: Runs all algorithms on the three real datasets and their respective workloads.
  • synthetic_workloads: Runs all algorithms on the uniform random dataset with 7 different synthetic workloads (shifting workload is the first 10 queries of uniform repeated multiple times).
  • uniformdata-delta: Compares how the delta parameter influences the Progressive KD-Tree on a uniform random dataset & workload (delta ranges from 0.1 to 1). Every folder contains three main files,, config.json, and plots.ipynb. If you want to change how certain experiments run, change the config.json file. To run an experiment, you must cd to its folder and execute the file. Example:
cd experiments/real-data-workload

Beforing plotting, make sure you have the necessary packages installed: First make sure you have pip installed:

pip --version

Should return version information about pip.

Next, install the necessary packages to plot the figures and tables:

pip install jupyterlab numpy pandas plotly matplotlib

After running each experiment, each folder contains a python notebook you can use to plot and analyze the results. For that, you must initiate the jupyter notebook server at the desired folder

jupyter notebook



No description, website, or topics provided.






No releases published


No packages published