This repository contains the implementation code for "A Study in Kernel Functions for Sparse Numerical Array-Based Range Filters", a project focused on enhancing the Sparse Numerical Array-Based Range Filters (SNARF) data structure by exploring the integration of various non-linear kernel functions. Our aim is to investigate if non-linear models can provide improved performance, accuracy, and space efficiency over the linear model currently used in SNARF.
The project addresses the limitations of the linear spline model in SNARF, proposing the use of different non-linear kernel functions (quadratic, cubic, logarithmic, and exponential) to potentially reduce the model size and achieve comparable error rates with greater efficiency.
- Implement a configurable SNARF version supporting various kernel functions.
- Theoretically analyze and evaluate the trade-offs between model size, error rates, and performance efficiency.
- Conduct comprehensive performance comparisons with existing data structures (Cuckoo Filter, SuRF, Rosetta) using real-world and synthetic datasets.
- For C++ setup, refer to Makefile and Dockerfile for details.
- Python 3.11 for simulations and benchmarking.
- Environment: Tested on Ubuntu 22.04 LTS with an Intel i9-13900KS processor and 32GB of RAM.
/include
: Header files for the modified SNARF implementation./src
: Source code for the modified SNARF implementation./data
: Scripts and utilities for generating and managing datasets./benchmarks
: Benchmarking scripts and result analysis tools./docs
: Documentation on the algorithms, data structures, and experimental setup./examples
: A simple tutorial on how to run the configurable SNARF./tests
: Unit tests that encompass the entire implementation.
Instructions on building the project, generating datasets, and running benchmarks are provided in the corresponding directories. Ensure to follow the setup procedures outlined in /docs
for compiling the source code.
The project is dockerized, so the following commands will need to be run in order to use it on a local environment.
Build the container:
docker build -t snarfpp .
Run the app with a make
command:
docker run --name snarfpp_container -v $(pwd):/usr/src/snarfpp snarfpp <make_command>
For example, to run tests, use the following command:
docker run --name snarfpp_container -v $(pwd):/usr/src/snarfpp snarfpp tests
Alternatively, you can use the run_docker.sh
script to run the app with the same commands, but includes cleanup tasks. An argument of either tests
, examples
, or all
must be supplied.
./run_docker.sh <make_command>
This work contributes to the field of learned index structures by providing insights into the potential benefits of integrating non-linear kernel functions into SNARF, offering a more adaptable and efficient solution for range filtering tasks.
We acknowledge the foundational work on SNARF and the learned index structures that inspired this research. Special thanks to the authors of the original SNARF paper and the original source code for their invaluable contributions to the domain.
This project is licensed under the MIT License.
For any inquiries or contributions, please open an issue in this repository or contact me via any of my listed socials.