cdist performance issue with 'sqeuclidean' metric #3251

Closed
emanuele opened this Issue Jan 28, 2014 · 1 comment

2 participants

@emanuele

Unexpectedly, computing the 'sqeuclidean' metric is slower than computing the 'euclidean' metric in scipy.spatial.distance.cdist(). Since 'sqeuclidean' is, in principle, the same as 'euclidean' but without the expensive sqrt(), it should be faster, not slower.
Evidence:
In [24]: X = np.random.random((100,3))
In [25]: Y = np.random.random((80,3))
In [27]: %timeit cdist(X, Y, metric='euclidean')
10000 loops, best of 3: 127 us per loop
In [28]: %timeit cdist(X, Y, metric='sqeuclidean')
10000 loops, best of 3: 152 us per loop

The cause of this issue is the implementation of the 'sqeuclidean' metric, which first computes the 'euclidean' distance and then squares the result (with "**2"!, see https://github.com/scipy/scipy/blob/master/scipy/spatial/distance.py#L1224 ). The underlying reason is that there is no C implementation of 'sqeuclidean' to wrap.
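
To make the redundancy concrete, here is a minimal NumPy-only sketch of the two routes. Both helper names (sqeuclidean_via_sqrt and sqeuclidean_direct) are hypothetical and are not part of SciPy; the first only imitates the "take the square root, then square again" pattern described above, and neither is the actual SciPy code.

```python
import numpy as np


def sqeuclidean_via_sqrt(X, Y):
    # Hypothetical helper: imitates the pattern described above by
    # computing the Euclidean distance (with sqrt) and then squaring it.
    diff = X[:, None, :] - Y[None, :, :]
    euclid = np.sqrt((diff * diff).sum(axis=-1))
    return euclid ** 2


def sqeuclidean_direct(X, Y):
    # Hypothetical helper: sums the squared coordinate differences and
    # never takes a square root.
    diff = X[:, None, :] - Y[None, :, :]
    return (diff * diff).sum(axis=-1)


# Same shapes as in the timings above; the seed is fixed for repeatability.
rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))
Y = rng.random_sample((80, 3))

D_via_sqrt = sqeuclidean_via_sqrt(X, Y)
D_direct = sqeuclidean_direct(X, Y)

# The two results agree up to floating-point rounding.
print(np.allclose(D_via_sqrt, D_direct))
```

The direct route skips both the sqrt() and the extra squaring pass over the output array, which is what a dedicated C implementation of 'sqeuclidean' would be expected to do as well.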

I may provide a pull request in the near future to address this issue.

@argriffing

I think this has been fixed and merged.

@argriffing argriffing closed this Jan 30, 2014