Skip to content
Work in progress for eventual contribution to scikit-learn
Python
Find file
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
tests distrubite all the files to the package named pyDistances
.gitignore
Makefile HUGE commit: make dist metrics work for sparse input
README.rst Update README.rst
__init__.py distrubite all the files to the package named pyDistances
_parse_src.py HUGE commit: make dist metrics work for sparse input
ball_tree.pyx distrubite all the files to the package named pyDistances
bench.py distrubite all the files to the package named pyDistances
bench_ball.py
bench_kdt.py fix KD Tree
bench_v_sklearn.py cleanup
brute_neighbors.pyx distrubite all the files to the package named pyDistances
distfuncs.pxi HUGE commit: make dist metrics work for sparse input
distfuncs.pxi_src HUGE commit: make dist metrics work for sparse input
distmetrics.pxd
distmetrics.pyx add KD Tree functionality
setup.py distrubite all the files to the package named pyDistances
slow_ball_tree.py distrubite all the files to the package named pyDistances
utils.py HUGE commit: make dist metrics work for sparse input
visualize_KDT.py

README.rst

Note: this module should be considered unsupported. It probably won't compile, and if it does, it may contain errors in some cases. See a more developed version of this functionality in scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/1732/

pyDistances

Fast cython distance computations in a variety of metrics which expose pointers to C functions.

This is a framework that will allow general distance metrics to be incorporated into tree-based neighbors searches. The idea is that we need a fast way to compute the distance between two points under a given metric. In the basic framework here, this involves creating an object which exposes C-pointers to a function and a parameter structure so that the distance function can be called from python as normal, or alternatively can be called directly from cython without python overhead.

pdist/cdist

The code has functions which duplicate the behavior of scipy.spatial.distances.pdist and scipy.spatial.distances.cdist.

BallTree

The code features a BallTree object which can quickly return nearest neighbors under any of the available metrics.

Benchmarks

run bench.py for a comparison of runtimes between pyDistance and scipy.spatial for the available metrics. Times are comparable in most cases, much faster in a few cases, and slightly slower in a few cases.

TODO

Search TODO within distmetrics.pyx to see a list. One big one (which should be straightforward) is to make the distance metrics work with CSR matrices. This will involve writing an alternate version of each core distance function.

Something went wrong with that request. Please try again.