Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added query_ball_point and related tests to scipy.spatial.cKDTree. #262

Closed
wants to merge 5 commits into from

Conversation

patvarilly
Copy link
Contributor

scipy.spatial.cKDTree implements only part of the more general scipy.spatial.KDTree interface in Cython. This pull requests ports over the method query_ball_point to cKDTree and adds corresponding unit tests to verify correctness.

I have used a slightly modified version of Anne Archibald's benchmarking script to measure the resulting performance gains (see http://mail.scipy.org/pipermail/scipy-dev/2008-October/009854.html, script copied below). The corresponding numbers show a ~ 250-350-fold speedup (this seems relatively insensitive to the query ball radius, see below):

dimension 3, 10000 points
KDTree constructed: 0.0823722
cKDTree constructed: 0.00159502
KDTree 1000 lookups: 0.574862
cKDTree 1000 lookups: 0.00470614
flat cKDTree 1000 lookups: 0.342408
KDTree 1000 ball lookups: 5.54947
cKDTree 1000 ball lookups: 0.0189772
flat cKDTree 1000 ball lookups: 0.343151
Ball lookups agree? True

dimension 8, 10000 points
KDTree constructed: 0.168782
cKDTree constructed: 0.00218606
KDTree 1000 lookups: 10.7511
cKDTree 1000 lookups: 0.091007
flat cKDTree 1000 lookups: 0.675035
KDTree 1000 ball lookups: 6.13426
cKDTree 1000 ball lookups: 0.017458
flat cKDTree 1000 ball lookups: 0.351695
Ball lookups agree? True

dimension 16, 10000 points
KDTree constructed: 0.11167
cKDTree constructed: 0.00220299
KDTree 1000 lookups: 62.2512
cKDTree 1000 lookups: 1.01098
flat cKDTree 1000 lookups: 1.44906
KDTree 1000 ball lookups: 3.89043
cKDTree 1000 ball lookups: 0.015486
flat cKDTree 1000 ball lookups: 0.352298
Ball lookups agree? True

For a probe radius of 0.5 instead of 0.2 (see script for meaning), I get the following results in 3 dimensions:

dimension 3, 10000 points
KDTree constructed: 0.0826931
cKDTree constructed: 0.00157905
KDTree 1000 lookups: 0.556555
cKDTree 1000 lookups: 0.00440192
flat cKDTree 1000 lookups: 0.338442
KDTree 1000 ball lookups: 16.3933
cKDTree 1000 ball lookups: 0.0493851
flat cKDTree 1000 ball lookups: 0.423063
Ball lookups agree? True

As far as I can tell, the issue is that KDTree.query_ball_point is written for readability, but ends up doing large numbers of list operations to build the results.

From here on, it should not be much harder to port over the remaining methods of KDTree to cKDTree, although these are not a current priority for me. As an aside, it seemed at first glance that the public interfaces for KDTree.query() and cKDTree.query() have fallen out of sync, though I didn't check thoroughly.

All the best,
Patrick


import numpy as np
import time

from scipy.spatial import KDTree, cKDTree

m = 3
n = 10000
r = 1000

data = np.concatenate((np.random.randn(n//2,m),
np.random.randn(n-n//2,m)+np.ones(m)))
queries = np.concatenate((np.random.randn(r//2,m),
np.random.randn(r-r//2,m)+np.ones(m)))

print "dimension %d, %d points" % (m,n)

t = time.time()
T1 = KDTree(data)
print "KDTree constructed:\t%g" % (time.time()-t)
t = time.time()
T2 = cKDTree(data)
print "cKDTree constructed:\t%g" % (time.time()-t)

t = time.time()
w = T1.query(queries)
print "KDTree %d lookups:\t%g" % (r, time.time()-t)
del w

t = time.time()
w = T2.query(queries)
print "cKDTree %d lookups:\t%g" % (r, time.time()-t)
del w

T3 = cKDTree(data,leafsize=n)
t = time.time()
w = T3.query(queries)
print "flat cKDTree %d lookups:\t%g" % (r, time.time()-t)
del w

t = time.time()
w1 = T1.query_ball_point(queries, 0.2)
print "KDTree %d ball lookups:\t%g" % (r, time.time()-t)

t = time.time()
w2 = T2.query_ball_point(queries, 0.2)
print "cKDTree %d ball lookups:\t%g" % (r, time.time()-t)

t = time.time()
w3 = T3.query_ball_point(queries, 0.2)
print "flat cKDTree %d ball lookups:\t%g" % (r, time.time()-t)

all_good = True
for a, b in zip(w1, w2):
if sorted(a) != sorted(b):
all_good = False
for a, b in zip(w1, w3):
if sorted(a) != sorted(b):
all_good = False

print "Ball lookups agree? %s" % str(all_good)

@patvarilly
Copy link
Contributor Author

A quick benchmark of query_ball_tree in cKDTree vs KDTree (code below). The speedup here is also ~300-fold:

dimension 3, 1000 points in tree 1, 1000 points in tree 2
KDTrees constructed: 0.0186899
cKDTrees constructed: 0.000445127
KDTree query_ball_tree(r = 0.5): 2.69068
cKDTree query_ball_tree(r = 0.5): 0.00940299


import numpy as np
import time

from scipy.spatial import KDTree, cKDTree

m = 3
n1 = 1000
n2 = 1000

data1 = np.concatenate((np.random.randn(n1//2,m),
np.random.randn(n1-n1//2,m)+np.ones(m)))
data2 = np.concatenate((np.random.randn(n2//2,m),
np.random.randn(n2-n2//2,m)+np.ones(m)))

print "dimension %d, %d points in tree 1, %d points in tree 2" % (m,n1,n2)

t = time.time()
T1 = KDTree(data1)
T2 = KDTree(data2)
print "KDTrees constructed:\t%g" % (time.time()-t)
t = time.time()
cT1 = cKDTree(data1)
cT2 = cKDTree(data1)
print "cKDTrees constructed:\t%g" % (time.time()-t)

t = time.time()
T1.query_ball_tree(T2, 0.5)
print "KDTree query_ball_tree(r = 0.5):\t%g" % (time.time()-t)

t = time.time()
cT1.query_ball_tree(cT2, 0.5)
print "cKDTree query_ball_tree(r = 0.5):\t%g" % (time.time()-t)

@patvarilly
Copy link
Contributor Author

I've simplified the logic in KDTree.query_pairs, then ported the simplified logic to cKDTree (basically, being careful about not visiting the same pair of nodes twice, instead of keeping track of visited pairs in a set and testing against the set every time a new node pair is visited). Here is a quick benchmark (cKDTree version is ~750-fold faster with respect to old query_pairs, ~ 370-fold faster than the revised version):

dimension 3, 1000 points in tree
KDTree constructed: 0.00983906
cKDTree constructed: 0.000268936
KDTree old query_pairs(r = 0.5): 2.66407
KDTree query_pairs(r = 0.5): 1.31732
cKDTree query_pairs(r = 0.5): 0.00354481


import numpy as np
import time

from scipy.spatial import KDTree, cKDTree

m = 3
n = 1000

data = np.concatenate((np.random.randn(n//2,m),
np.random.randn(n-n//2,m)+np.ones(m)))

print "dimension %d, %d points in tree" % (m,n)

t = time.time()
T = KDTree(data)
print "KDTree constructed:\t%g" % (time.time()-t)
t = time.time()
cT = cKDTree(data)
print "cKDTree constructed:\t%g" % (time.time()-t)

t = time.time()
T.old_query_pairs(0.5) # Manually reinserted old query_pairs() back into KDTree for this benchmark
print "KDTree old query_pairs(r = 0.5):\t%g" % (time.time()-t)

t = time.time()
T.query_pairs(0.5)
print "KDTree query_pairs(r = 0.5):\t%g" % (time.time()-t)

t = time.time()
cT.query_pairs(0.5)
print "cKDTree query_pairs(r = 0.5):\t%g" % (time.time()-t)

@patvarilly
Copy link
Contributor Author

The benchmark for sparse_distance_matrix:

dimension 3, 1000 points in tree 1, 1000 points in tree 2
KDTrees constructed: 0.0191472
cKDTrees constructed: 0.000463009
KDTree sparse_distance_matrix(r = 0.2): 4.14861
cKDTree sparse_distance_matrix(r = 0.2): 0.0185668
KDTree sparse_distance_matrix(r = 0.5): 7.69248
cKDTree sparse_distance_matrix(r = 0.5): 0.037811


import numpy as np
import time

from scipy.spatial import KDTree, cKDTree

m = 3
n1 = 1000
n2 = 1000

data1 = np.concatenate((np.random.randn(n1//2,m),
np.random.randn(n1-n1//2,m)+np.ones(m)))
data2 = np.concatenate((np.random.randn(n2//2,m),
np.random.randn(n2-n2//2,m)+np.ones(m)))

print "dimension %d, %d points in tree 1, %d points in tree 2" % (m,n1,n2)

t = time.time()
T1 = KDTree(data1)
T2 = KDTree(data2)
print "KDTrees constructed:\t%g" % (time.time()-t)
t = time.time()
cT1 = cKDTree(data1)
cT2 = cKDTree(data1)
print "cKDTrees constructed:\t%g" % (time.time()-t)

t = time.time()
T1.sparse_distance_matrix(T2, 0.2)
print "KDTree sparse_distance_matrix(r = 0.2):\t%g" % (time.time()-t)

t = time.time()
cT1.sparse_distance_matrix(cT2, 0.2)
print "cKDTree sparse_distance_matrix(r = 0.2):\t%g" % (time.time()-t)

t = time.time()
T1.sparse_distance_matrix(T2, 0.5)
print "KDTree sparse_distance_matrix(r = 0.5):\t%g" % (time.time()-t)

t = time.time()
cT1.sparse_distance_matrix(cT2, 0.5)
print "cKDTree sparse_distance_matrix(r = 0.5):\t%g" % (time.time()-t)

@patvarilly
Copy link
Contributor Author

cKDTree is now feature complete with respect to KDTree.

Here is a benchmark for cKDTree.count_neighbors. The speedup here isn't as great because my current Cython code ends up allocating and freeing an array on every call to __count_neighbors_traverse. A better solution would be to allocate as many arrays as there are levels in the two trees passed to count_neighbors at the beginning, and then use these arrays as a stack as you go up and down the trees. That said, a 20-fold improvement for now is not too bad, either.

dimension 3, 1000 points in tree 1, 1000 in tree 2
KDTrees constructed: 0.0217071
cKDTrees constructed: 0.000464916
KDTree count_neighbors(r = 0.5): 1.88414
cKDTree count_neighbors(r = 0.5): 0.07599


import numpy as np
import time

from scipy.spatial import KDTree, cKDTree

m = 3
n1 = 1000
n2 = 1000

data1 = np.concatenate((np.random.randn(n1//2,m),
np.random.randn(n1-n1//2,m)+np.ones(m)))
data2 = np.concatenate((np.random.randn(n2//2,m),
np.random.randn(n2-n2//2,m)+np.ones(m)))

print "dimension %d, %d points in tree 1, %d in tree 2" % (m,n1,n2)

t = time.time()
T1 = KDTree(data1)
T2 = KDTree(data2)
print "KDTrees constructed:\t%g" % (time.time()-t)
t = time.time()
cT1 = cKDTree(data1)
cT2 = cKDTree(data1)
print "cKDTrees constructed:\t%g" % (time.time()-t)

t = time.time()
T1.count_neighbors(T2, 0.5)
print "KDTree count_neighbors(r = 0.5):\t%g" % (time.time()-t)

t = time.time()
cT1.count_neighbors(cT2, 0.5)
print "cKDTree count_neighbors(r = 0.5):\t%g" % (time.time()-t)

@sturlamolden
Copy link
Contributor

Countless integer size errors.

Bug fixes:

https://github.com/sturlamolden/scipy/blob/master/scipy/spatial/ckdtree.pyx

@sturlamolden
Copy link
Contributor

The code is also full of missing variable declarations. Almost none of the range loops are correctly cythonized due to this, and also many other speed-critical variables lack declaration.

@rgommers
Copy link
Member

As commented on the mailing list (http://thread.gmane.org/gmane.comp.python.scientific.devel/16703/focus=16734), here are Sturla's edits on top of Patrick's: https://github.com/rgommers/scipy/tree/fixes-sturla

@rgommers
Copy link
Member

Patrick, is the last commit you added ("Numerous bug fixes") the same as the last 5 commits of https://github.com/rgommers/scipy/tree/fixes-sturla? I spent some effort to port them and give them reasonable commit messages, so it would be better if you could use those. If you're not sure how I can help with that.

@patvarilly
Copy link
Contributor Author

Ralf, that's right, I was trying to bring in the changes into my branch to address some of the issues that Sturla brought up, but I did it by copying the final files from fixes-sturla and committing them. Could you let me know the correct way to bring these in?

@sturlamolden
Copy link
Contributor

The tests for query_pairs still fail on Windows 64.

@rgommers
Copy link
Member

To remove the last commit from your branch and this PR:

$ git reset --hard HEAD^   # go one commit back
$ git push origin ckdtree-improvements  --force   # assume your github remote is called "origin"

Then to grab commits from my repo

$ git remote add rgommers https://github.com/rgommers/scipy.git
$ git fetch rgommers
$ git merge rgommers/fixes-sturla

Then push that back to Github so it shows up here. The second step above can also be replaced by asking me to send you a PR with my changes.

You can also use git cherry-pick commit-id instead of using git merge. Then you can carry over commits one by one. In this case merge should be easier though, because it takes all commits at once.

@patvarilly
Copy link
Contributor Author

Thanks for the detailed instructions! All set now, hopefully. Will comment on the substantive parts before going to bed tonight.

@patvarilly
Copy link
Contributor Author

Following up on Sturla's suggestion, I implemented a version of query that has the same recursive traversal strcuture as all the other methods, and that uses the rectangle-to-point distance tracker class for readability. Unfortunately, it's about 3 times as slow as the old version of query:

dimension 3, 10000 points
KDTree constructed: 0.0961771
cKDTree constructed: 0.00191593
KDTree 1000 lookups: 0.666298
cKDTree 1000 new lookups: 0.011121
cKDTree 1000 old lookups: 0.00368786
flat cKDTree 1000 new lookups: 0.079838
flat cKDTree 1000 old lookups: 0.0734

As far as I can tell, this is because the old version of query uses a priority queue to continuosly select for processing the closest unexamined kd-tree node. This is something that can't be done in a recursive algorithm. I committed the code for historical curiosity, but will remove it in the following commit.

@patvarilly
Copy link
Contributor Author

Updated benchmarks with current version of the code. I ran these on a different machine, but the improvement factors should be mostly unaffected:

query_ball_point: ~ 200-fold improvement
query_ball_tree: ~ 350-fold improvement
query_pairs: ~ 350-fold improvement
count_neighbors: ~ 70-fold improvement
sparse_distance_matrix: ~ 2000- to 3000- fold improvement (improvement from previous benchmark due to cythonizing loop indices)

So making the code immensely more readable accrues only a slight speed penalty for some functions, and has made others (notably sparse_distance_matrix) much faster.


query_ball_point (query radius 0.5, same benchmark code as above)

dimension 3, 10000 points
KDTree 1000 ball lookups: 5.31274
cKDTree 1000 ball lookups: 0.024559
flat cKDTree 1000 ball lookups: 0.0826628

dimension 8, 10000 points
KDTree 1000 ball lookups: 5.80942
cKDTree 1000 ball lookups: 0.024523
flat cKDTree 1000 ball lookups: 0.0908511

dimension 16, 10000 points
KDTree 1000 ball lookups: 3.98487
cKDTree 1000 ball lookups: 0.022228
flat cKDTree 1000 ball lookups: 0.0916049


query_ball_tree

dimension 3, 1000 points in tree 1, 1000 points in tree 2
KDTree query_ball_tree(r = 0.5): 2.34604
cKDTree query_ball_tree(r = 0.5): 0.00636292


query_pairs

dimension 3, 1000 points in tree
KDTree query_pairs(r = 0.5): 1.2968
cKDTree query_pairs(r = 0.5): 0.00355601


count_neighbors

dimension 3, 1000 points in tree 1, 1000 in tree 2
KDTree count_neighbors(r = 0.5): 1.71428
cKDTree count_neighbors(r = 0.5): 0.0246949


sparse_distance_matrix

dimension 3, 1000 points in tree 1, 1000 points in tree 2
KDTree sparse_distance_matrix(r = 0.2): 3.69791
cKDTree sparse_distance_matrix(r = 0.2): 0.00170898
KDTree sparse_distance_matrix(r = 0.5): 6.51638
cKDTree sparse_distance_matrix(r = 0.5): 0.00226092

@rgommers
Copy link
Member

@patvarilly you used a script adapted from the mailing list right? It may be useful to put that under spatial/benchmarks and make it available as spatial.bench(). You can find an example of how this is done in scipy/linalg (although it seems to be partially broken, missing imports).

@rgommers
Copy link
Member

And: timings look great, nice job!

@sturlamolden
Copy link
Contributor

All tests pass on Windows 64 and Python 2.7. The code looks nice and clean now. I think it's ok to go into SciPy if the tests pass on the other platforms as well. Patrick's rewrite removes serveral errors from the current cKDTree code (integer size and exception handling) and makes it fully compatible with KDTree (i.e. can be used as a drop-in replacement). We should still look into GIL handling, but this should be postponed to a second update. KDTree should be kept as it is easier to extend from Python.

@rgommers
Copy link
Member

All passes on OS X with Python 3.2. too. So guess this is ready. bench() would be nice to have still.

@sturlamolden
Copy link
Contributor

I have placed a ticket for this PR on the trac (ticket #1712).

@sturlamolden
Copy link
Contributor

On Win64 .query() should return an array of dtype=int if a C long does not overflow. There is only a test for this on scalar return values.

@sturlamolden
Copy link
Contributor

I have an implementation of the last issue ready. It still needs testing, I will send Patrick a pull request when I'm done.
For now it is here:
https://github.com/sturlamolden/scipy/tree/ckdtree-improvements

@sturlamolden
Copy link
Contributor

Ok, tests passes. I've sent a pull request to Patrick that should be merged first.

@rgommers
Copy link
Member

@patvarilly have you had a chance to look at Sturla's PR? Would be nice to get this merged soon.

@patvarilly
Copy link
Contributor Author

Alright, Sturla's PR is (finally) in, and the benchmarks are now self-contained in spatial/benchmarks (runnable by saying "import scipy.spatial; scipy.spatial.bench()". As far as I can tell, that would wrap up this pull request and make it ready for merging. My apologies for the extreme delay in getting this last bit done. I look forward to your comments.

@sturlamolden
Copy link
Contributor

We should regenerate ckdtree.c with Cython 0.17 to avoid any Cython related bugs, and re-run all the tests. I haven't had time to install the latest Cython yet. It was released on September 1th.

@patvarilly
Copy link
Contributor Author

OK, regenerated ckdtree.c with Cython 0.17. I re-ran the tests and they all pass.

@rgommers
Copy link
Member

rgommers commented Sep 7, 2012

Benchmarks look good. They run a bit slow but I don't think that's a problem. And hard to avoid if you want to show differences of O(1000) without printing too many digits.

@rgommers
Copy link
Member

rgommers commented Sep 7, 2012

All tests pass for me too. Ready to merge I think.

@rgommers
Copy link
Member

rgommers commented Sep 7, 2012

Checked the change to setup.py, this isn't needed for setupscons.py or bento.

@rgommers
Copy link
Member

rgommers commented Sep 7, 2012

One more request @patvarilly: would you mind adding a short paragraph describing the main changes in doc/release/0.12.0-notes?

@patvarilly
Copy link
Contributor Author

All set. I don't know what caused the indentation to go a bit off, but it's now fixed. What do you mean by "Checked the change to setup.py, this isn't needed for setupscons.py or bento"? Should I have made any other changes to make the benchmarks work? I usually compile and install SciPy by running "python setup.py install --user" in my scipy directory.

@rgommers
Copy link
Member

rgommers commented Sep 9, 2012

No other changes are needed. We have 3 build systems, one based on numpy.distutils (setup.py files), one based on numscons (SConscript files) and one based on Bento (bento.info / bscript files). So for most changes to setup.py, those need to be made for all build systems. In this case that wasn't needed.

@rgommers
Copy link
Member

rgommers commented Sep 9, 2012

Thanks for adding to the release notes and fixing the indentation.

@rgommers
Copy link
Member

rgommers commented Sep 9, 2012

Merged in eec9174. Thanks again Patrick and Sturla!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants