This folder contains scripts that run all-nearest-neighbor searches in a number of libraries. For the most part, the scripts are very bare-bones; for example, they don't even output the results.
To run the scripts, you'll first need to install the libraries.
The /install folder in this repo contains scripts for installing all of them.
With all the libraries installed, just call the runtest.sh script with a single parameter: the dataset to test on.
The table below provides a brief description of the libraries compared against.
| Library | Description |
|---|---|
| FLANN | The Fast Library for Approximate Nearest Neighbors. This C++ library is the standard method for nearest neighbor queries in Matlab/Octave and the OpenCV computer vision toolkit. |
| Julia | A popular new language designed from the ground up for fast data processing. Julia supports fast nearest neighbor queries using the KDTrees.jl package. |
| Langford's cover tree | A reference implementation of the cover tree data structure, created by John Langford. The implementation is in C, and the data structure is widely included in C/C++ machine learning libraries. |
| MLPack | A C++ library for machine learning. MLPack was the first library to demonstrate the utility of generic programming in machine learning. The interface for nearest neighbor queries lets you use either a cover tree or kdtree. |
| R | A popular language for statisticians. Nearest neighbor queries are implemented in the FNN package, which provides bindings to the C-based ANN library for kdtrees. |
| scikit-learn | The Python machine learning toolkit. The documentation is very beginner friendly and easy to follow. The interface for nearest neighbor queries lets you use either a ball tree or kdtree to speed up the calculations. Both data structures are written in Cython. |
| Weka | A Java data mining tool with a popular GUI frontend. Nearest neighbor queries in Weka are very, very slow for me and not remotely competitive with any of the libraries above. |
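
To give a feel for what one of these tests involves, here is a minimal sketch of an all-nearest-neighbor query using scikit-learn's `NearestNeighbors` class with both of the tree types mentioned in the scikit-learn entry. It is not one of the scripts in this folder: the random dataset and the helper name `all_nearest_neighbors` are made up for illustration, and a real run would load the dataset passed to runtest.sh instead.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in dataset: 1000 random points in 5 dimensions.
rng = np.random.default_rng(0)
points = rng.random((1000, 5))


def all_nearest_neighbors(data, algorithm):
    """For every row of data, find the index of and distance to its nearest other row."""
    index = NearestNeighbors(n_neighbors=1, algorithm=algorithm).fit(data)
    # Calling kneighbors() without a query set searches the indexed points
    # themselves and does not count a point as its own neighbor.
    distances, indices = index.kneighbors()
    return indices[:, 0], distances[:, 0]


kd_idx, kd_dist = all_nearest_neighbors(points, "kd_tree")
ball_idx, ball_dist = all_nearest_neighbors(points, "ball_tree")

# Both trees perform exact searches, so they should agree on every neighbor.
print(np.array_equal(kd_idx, ball_idx))
```

Wrapping each call to `all_nearest_neighbors` in a timer is one simple way to compare the two data structures on the same data.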