Update as of v0.5.0: Most of pyribs now uses batch inputs, making this PR a bit less relevant since we do not have to worry about adding single entries to the k-D tree. However, this is still useful for functions such as add_single which rely on adding one entry at a time. Furthermore, it seems scipy's k-D tree was faster than sklearn's even with batched inputs.
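To make the single-versus-batch distinction concrete, here is a minimal sketch of the two query styles with `scipy.spatial.cKDTree`. The helpers `get_index` and `get_indices` and the random centroids are hypothetical stand-ins for illustration; they are not the actual pyribs implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Hypothetical stand-in for the CVT centroids: 1000 points in 2-D.
centroids = rng.uniform(-1.0, 1.0, size=(1000, 2))
tree = cKDTree(centroids)


def get_index(point):
    """Single query: index of the centroid nearest to one point."""
    _, index = tree.query(point)  # 1-D input returns a scalar distance and index
    return int(index)


def get_indices(points):
    """Batch query: indices of the nearest centroids for many points."""
    _, indices = tree.query(points)  # 2-D input returns arrays of length len(points)
    return indices


points = rng.uniform(-1.0, 1.0, size=(5, 2))
batch = get_indices(points)
single = np.array([get_index(p) for p in points])
```

Both styles return the same indices; the difference is only in how much per-call overhead is paid, which is what the timings in this issue measure.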
In `CVTArchive`, we need to use a k-D tree for nearest-neighbor searches when there are a lot of centroids. There are many implementations of k-D trees to choose from. Two notable implementations are `scipy.spatial.cKDTree` and `sklearn.neighbors.NearestNeighbors`. Both implementations are optimized for batched nearest-neighbor queries, but in `_get_index`, we query the nearest neighbor of a single point. If we run the following code, we can compare the performance of each implementation on single and batch queries.

I got the following output, which shows that `NearestNeighbors` is ~10x slower on single queries.

    scipy cKDTree (batch)
    0.036180734634399414
    scipy cKDTree (single)
    100%|██████████| 100000/100000 [00:03<00:00, 27718.94it/s]
    3.6090946197509766  # cKDTree is fast
    sklearn NN with kd_tree (batch)
    0.052004098892211914
    sklearn NN with kd_tree (single)
    100%|██████████| 100000/100000 [00:41<00:00, 2397.46it/s]
    41.71117091178894  # NearestNeighbors is slow
In short, we should definitely use `cKDTree`.
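For reference, a comparison like the one described in this issue could be sketched roughly as follows. This is not the original benchmark script: the `benchmark` function, the problem sizes, and the timing approach are assumptions chosen for illustration, and the absolute numbers will differ from those reported above.

```python
import time

import numpy as np
from scipy.spatial import cKDTree
from sklearn.neighbors import NearestNeighbors


def benchmark(n_centroids=1000, n_queries=2000, dim=2, seed=0):
    """Time batch vs. single nearest-neighbor queries (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    centroids = rng.uniform(-1.0, 1.0, size=(n_centroids, dim))
    queries = rng.uniform(-1.0, 1.0, size=(n_queries, dim))

    scipy_tree = cKDTree(centroids)
    sklearn_nn = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(centroids)

    timings = {}

    # scipy: all points in one call.
    start = time.perf_counter()
    _, scipy_batch = scipy_tree.query(queries)
    timings["scipy cKDTree (batch)"] = time.perf_counter() - start

    # scipy: one call per point.
    start = time.perf_counter()
    for q in queries:
        scipy_tree.query(q)
    timings["scipy cKDTree (single)"] = time.perf_counter() - start

    # sklearn: all points in one call; indices have shape (n_queries, 1).
    start = time.perf_counter()
    _, sklearn_batch = sklearn_nn.kneighbors(queries)
    timings["sklearn NN with kd_tree (batch)"] = time.perf_counter() - start

    # sklearn: one call per point; kneighbors expects a 2-D array.
    start = time.perf_counter()
    for q in queries:
        sklearn_nn.kneighbors(q.reshape(1, -1))
    timings["sklearn NN with kd_tree (single)"] = time.perf_counter() - start

    # Sanity check: both libraries should pick the same nearest centroids.
    agree = bool(np.array_equal(scipy_batch, sklearn_batch[:, 0]))
    return timings, agree


if __name__ == "__main__":
    timings, agree = benchmark()
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f} s")
    print("indices agree:", agree)
```

The single-query loops call each library once per point, which mirrors how `_get_index` is used, so the per-call overhead dominates those two timings.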