Skip to content
fast kernel evaluation in high dimensions via hashing
C++ C MATLAB Makefile Python CMake Other
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
benchmark Update README.md May 11, 2019
demo Update rehashing.py May 13, 2019
docs Move doxygen May 14, 2019
hbe Update README.md May 14, 2019
plots Add files via upload May 12, 2019
resources askit May 11, 2019
.gitignore
.gitmodules Move submodules May 10, 2019
LICENSE Create LICENSE May 10, 2019
README.md Update README.md May 13, 2019

README.md

Hashing-Based-Estimators (HBE)

HBE is a C++ library for fast kernel evaluation for high-dimensional data that also includes a python implementation for illustration purposes. HBE uses Locality Sensitive Hashing (LSH) to produce provably accurate estimates of the kernel density for a given query point as well as weighted generalizations thereof. HBE is designed for radially decreasing kernel functions (e.g. Gaussian or Exponential kernels) and truly high dimensional data where FIGTree and Dual-Tree algorithms for fast kernel evaluation are slow.

Currently, HBE supports one LSH family: Eucledian LSH, introduced by (Datar, Immorlica, Indyk, Mirrokni SoCG'04) in the context of solving the Nearest Neighbor Search problem for the Euclidean distance metric (see also the E2LSH package by Alex Andoni).

How to use HBE

The first step to use HBE is to consult the python demo that describes the tuning process. Alternatively you can directly consult the C++ documentation. In our Wiki you can also find descriptions of how to construct tunable synthetic benchmarks for kernel evaluation as well as produce dataset visualizations.

How fast is HBE?

The speed of HBE depends on the desired relative error, the kernel and the dataset. For relative error 0.1 under the Gaussian kernel and datasets around 1M points HBE takes less than 25ms per query and very often around ~2ms. For specific datasets and comparison with competing methods (FIGTree, ASKIT, Random Sampling (RS)) see Table 1.

Table 1

If you want to gain intuition about what properties of datasets affect the performance of kernel evaluation algorithms see our Synthetic Benchmarks page.

Authors

HBE is mainly developed by Kexin Rong and Paris Siminelakis. HBE has grown out of research projects with our collaborators Peter Bailis, Moses Charikar and Philip Levis.

If you want to cite HBE in a publication, here is the bibliographic information of our research papers where the algorithms are described and analyzed:

Rehashing Kernel Evaluation in High Dimensions. Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis. ICML 2019

Hashing-Based-Estimators for Kernel Density in High Dimensions. Moses Charikar, Paris Siminelakis, FOCS 2017.

License

HBE is available under the MIT License (see LICENSE.txt)

You can’t perform that action at this time.