bench/allknn

This folder contains scripts that run all-nearest-neighbor searches (finding the nearest neighbor of every point in a dataset) in a number of libraries. For the most part, the scripts are very bare-bones. For example, they don't even output the results.
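
As a point of reference for what these scripts compute, here is a minimal brute-force sketch in plain Python. It is not taken from any of the benchmarked libraries; the function name `all_nearest_neighbors` and the sample points are made up for illustration, and the Euclidean metric is an assumption.

```python
import math

def all_nearest_neighbors(points):
    """For every point, find the index of and distance to its nearest *other* point.

    Brute force: compares every pair, so the cost is O(n^2) distance evaluations.
    Assumes Euclidean distance and at least two points.
    """
    result = []
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        result.append((best_j, best_d))
    return result

# Hypothetical toy dataset: two well-separated pairs of points.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
neighbors = all_nearest_neighbors(points)
```

The tree structures mentioned in the table further down (kd-trees, ball trees, cover trees) exist to avoid this quadratic number of distance computations.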

To run the scripts, you'll first need to install the libraries. The /install folder in this repo contains scripts for installing all of them. With the libraries installed, call the runtest.sh script with a single parameter: the dataset to test on.

The table below briefly describes each library in the comparison.

| Library | Description |
| --- | --- |
| FLANN | The Fast Library for Approximate Nearest Neighbors. This C++ library is the standard method for nearest neighbor queries in Matlab/Octave and the OpenCV computer vision toolkit. |
| Julia | A popular new language designed from the ground up for fast data processing. Julia supports fast nearest neighbor queries using the KDTrees.jl package. |
| Langford's cover tree | A reference implementation of the cover tree data structure, created by John Langford. The implementation is in C, and the data structure is widely included in C/C++ machine learning libraries. |
| MLPack | A C++ library for machine learning. MLPack was the first library to demonstrate the utility of generic programming in machine learning. The interface for nearest neighbor queries lets you use either a cover tree or a kd-tree. |
| R | A popular language for statisticians. Nearest neighbor queries are implemented in the FNN package, which provides bindings to the C-based ANN library for kd-trees. |
| scikit-learn | The Python machine learning toolkit. The documentation is beginner friendly, which makes the library easy to learn. The interface for nearest neighbor queries lets you use either a ball tree or a kd-tree to speed up the calculations. Both data structures are written in Cython. |
| Weka | A Java data mining tool with a popular GUI frontend. In my experience, nearest neighbor queries in Weka are very slow and not remotely competitive with any of the libraries above. |
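
Several of the libraries above rely on kd-trees to speed up queries. The following is a rough sketch of that idea in plain Python, not the implementation used by any of these libraries; the names `build_kdtree` and `nearest`, the dictionary-based node layout, and the sample points are all hypothetical, and Euclidean distance is assumed.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    # Signed distance from the query to this node's splitting plane.
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the plane is closer than the best match so far;
    # otherwise nothing on that side can be an improvement.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical sample data.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
distance, point = nearest(tree, (9, 2))
```

The pruning test in `nearest` is where the savings over a brute-force scan come from: whole subtrees are skipped when they cannot contain a closer point. An all-nearest-neighbor search would run one such query per data point and skip the point itself.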