Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Port cKDTree query methods to C++, allow pickling on Python 2, and add support for periodic queries #4890

Merged
merged 64 commits into from
Nov 22, 2015

Conversation

sturlamolden
Copy link
Contributor

New WIP PR after rebase disaster.

@sturlamolden sturlamolden changed the title WPI: Port cKDTree query methods to C++ WIP: Port cKDTree query methods to C++ May 19, 2015
#include "ckdtree_cpp_rectangle.h"

static void
__count_neighbors_traverse(const ckdtree *self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't all double underscore names reserved in C++?

Copy link
Member

@pv pv May 19, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ compilers do not complain about the double underscores, but I'll check. They are in the Cython code, but it is trivial to remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double underscores are gone. They served no purpose anyway in the C++ code.

@sturlamolden
Copy link
Contributor Author

query_ball_point is now ported to C++. With some minor modifications it should also be possible to multithread this function.

query_ball_tree and sparse_distance_matrixremains to be ported to C++.

@sturlamolden
Copy link
Contributor Author

count_neighborshas a test failure according to Travis, returns 34 instead of 33.

Update: Found one typo, but now it returns 32 instead of 33. Bugger.
Update: Squashed

@sturlamolden
Copy link
Contributor Author

query_ball_treeis ported to C++, sparse_distance_matrixonly remaining method to port.

query_ball_point is multithreaded similarly to query.

@sturlamolden
Copy link
Contributor Author

All query methods now in C++

Added formerly requested option of returning other containers than dok_matrix from sparse_distance_matrix. Test case needed for these.

@sturlamolden
Copy link
Contributor Author

Remaining issues:

  • testing on Windows
  • code review, final bug fixing
  • use prefetching in sparse_distance_matrix (if I bother)

@sturlamolden
Copy link
Contributor Author

Travis is happy with this, it seems.

@rgommers rgommers added enhancement A new feature or improvement scipy.spatial labels May 21, 2015
@rgommers
Copy link
Member

Copying @pv's question that didn't get answered yet: does this help with the performance regressions? https://pv.github.io/scipy-bench/#regressions?sort=3&dir=desc ?

I did some browsing through all the cKDTree benchmarks and while we get a major improvement for 0.16.x there are a few things that got a lot slower. Would be good to copy benchmark results for this PR in a comment here before merging.

@sturlamolden
Copy link
Contributor Author

I can do a systematic benchmark of cKDTree in 0.15.x, 0.16.x and this PR. That is clearly needed before this is ready to merge. I tried to read the benchmarks from @pv , but I am not quite sure what they measure and how to read them.

0.16.x provides two ways of constructing the kd-tree: sliding midpoint as before and median as in scikit-learn. I would not be surprised if the new method could give performance boost or regression depending on the query method. Since the median method makes kd-tree construction slower (but queries can be faster), the net effect can also depend on how much the kd-tree construction account for the whole run-time (e.g. it matters relatively more for smaller data sets than for large). The effect on query time will also depend on the distribution of the data, though @jakevdp thinks the median method is only inferior in contrived situations. The effect on run-time from the changes here probably depends on the hardware as well, and maybe also the compiler.

Another thing to note is that speed is not everything. This PR also allows the GIL to be released while doing the queries. In 0.15.x the GIL is never released; in 0.16.x only .queryreleases the GIL; here all query methods release the GIL. This is rather important because id-tree queries can be lengthy and will lock-up the interpreter while they run with the GIL locked. Even if we do not multithread the queries, concurrent i/o will suffer, GUIs will not respond, and real-time graphics might not update in time. kd-trees are often used for computer graphics (e.g. in games and simulations), so this is rather important.

@sturlamolden
Copy link
Contributor Author

I should probably set up a repo for the benchmarks.

@sturlamolden
Copy link
Contributor Author

Repo for benchmarks:
https://github.com/sturlamolden/cKDTree-bench

@sturlamolden
Copy link
Contributor Author

@sturlamolden
Copy link
Contributor Author

The only performance regression from 0.15.x to 0.16.x is in the kd-tree creation time. But that is on a time scale three orders of magnitude smaller than the duration of the query methods.

This PR is equally fast or faster than 0.16.x, depending on the query method.

@rgommers
Copy link
Member

The existing benchmarks are here: benchmarks/benchmarks/spatial.py. It looks like the benchmark results do depend on the shape of the input data.

The rendered benchmarks are at https://pv.github.io/scipy-bench/# (look for spatial). They show performance as function of time, from 2009 to now. As an example of a regression, see https://pv.github.io/scipy-bench/#spatial.Neighbors.time_sparse_distance_matrix. On the left sidebar, deselect KDTree and select cKDTree under class. Then you'll see a graph with a jump up in 2013 (the Cython rewrite) and other jump up (click and drag a zoom window onto the plot to zoom in) from a couple of weeks ago.

Now for a particular benchmark it can be the case that that benchmark is not realistic and can therefore be ignored or should be adapted. The query, query_pairs and query_ball_points ones look OK. build becoming slower might be expected. sparse_distance_matrix becoming slower I'm not sure about.

Note that you can run the benchmarks locally with $ python runtests.py --bench -s spatial

@rgommers
Copy link
Member

The output you'll get is:

$ python runtests.py --bench -s spatial
Building, see build.log...
    ... build in progress
    ... build in progress
    ... build in progress
    ... build in progress
    ... build in progress
    ... build in progress
    ... build in progress
    ... build in progress
Build OK
Running benchmarks for Scipy version 0.17.0.dev0+7b893ae at /home/rgommers/Code/bldscipy/build/testenv/lib/python2.7/site-packages/scipy/__init__.pyc
· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[  0.00%] ·· Building for /usr/bin/python
[  0.00%] ·· Benchmarking /usr/bin/python
[ 16.67%] ··· Running spatial.Build.time_build                                                       ok
[ 16.67%] ···· 
               =================== ========= =========
               --                         class       
               ------------------- -------------------
                    (m, n, r)        KDTree   cKDTree 
               =================== ========= =========
                 (3, 10000, 1000)   33.08ms    3.46ms 
                 (8, 10000, 1000)   64.40ms    4.73ms 
                (16, 10000, 1000)   37.49ms    6.80ms 
               =================== ========= =========

[ 33.33%] ··· Running spatial.Neighbors.time_count_neighbors                                         ok
[ 33.33%] ···· 
               ================== ============== ========== =========
               --                                       class        
               --------------------------------- --------------------
                  (m, n1, n2)      probe radius    KDTree    cKDTree 
               ================== ============== ========== =========
                (3, 1000, 1000)        0.2        410.96ms    2.32ms 
                (3, 1000, 1000)        0.5        698.07ms    3.90ms 
                (8, 1000, 1000)        0.2         1.71s      8.46ms 
                (8, 1000, 1000)        0.5         2.11s     11.01ms 
                (16, 1000, 1000)       0.2         2.50s     14.63ms 
                (16, 1000, 1000)       0.5         2.71s     17.44ms 
               ================== ============== ========== =========

[ 50.00%] ··· Running spatial.Neighbors.time_sparse_distance_matrix                                  ok
[ 50.00%] ···· 
               ================== ============== ======== =========
               --                                      class       
               --------------------------------- ------------------
                  (m, n1, n2)      probe radius   KDTree   cKDTree 
               ================== ============== ======== =========
                (3, 1000, 1000)        0.2        1.41s     1.97ms 
                (3, 1000, 1000)        0.5        2.53s     6.99ms 
                (8, 1000, 1000)        0.2        5.39s     6.83ms 
                (8, 1000, 1000)        0.5        6.59s     9.42ms 
                (16, 1000, 1000)       0.2        9.42s    14.19ms 
                (16, 1000, 1000)       0.5        10.68s   17.21ms 
               ================== ============== ======== =========

[ 66.67%] ··· Running spatial.Query.time_query                                                       ok
[ 66.67%] ···· 
               =================== ========== ========== ==============
               --                                 class                
               ------------------- ------------------------------------
                    (m, n, r)        KDTree    cKDTree    cKDTree_flat 
               =================== ========== ========== ==============
                 (3, 10000, 1000)   247.76ms    2.73ms      72.60ms    
                 (8, 10000, 1000)    4.26s     22.65ms      105.84ms   
                (16, 10000, 1000)    27.65s    235.68ms     166.01ms   
               =================== ========== ========== ==============

[ 83.33%] ··· Running spatial.Radius.time_query_ball_point                                           ok
[ 83.33%] ···· 
               ================== ============== ======== ========= ==============
               --                                              class              
               --------------------------------- ---------------------------------
                   (m, n, r)       probe radius   KDTree   cKDTree   cKDTree_flat 
               ================== ============== ======== ========= ==============
                (3, 10000, 1000)       0.2        2.08s     8.49ms     77.31ms    
                (3, 10000, 1000)       0.5        6.04s    26.86ms     83.03ms    
               ================== ============== ======== ========= ==============

[100.00%] ··· Running spatial.Radius.time_query_pairs                                                ok
[100.00%] ···· 
               ================ ============== ========== ========== ==============
               --                                             class                
               ------------------------------- ------------------------------------
                  (m, n, r)      probe radius    KDTree    cKDTree    cKDTree_flat 
               ================ ============== ========== ========== ==============
                (3, 1000, 30)        0.2        388.76ms   977.07μs      3.75ms    
                (3, 1000, 30)        0.5        506.58ms    1.63ms       4.29ms    
                (8, 1000, 30)        0.2         2.51s      3.86ms       5.72ms    
                (8, 1000, 30)        0.5         2.46s      4.72ms       5.67ms    
                (16, 1000, 30)       0.2         2.63s      7.39ms       8.66ms    
                (16, 1000, 30)       0.5         2.62s      8.11ms       8.71ms    
               ================ ============== ========== ========== ==============

@sturlamolden
Copy link
Contributor Author

cKDTree did not have sparse_distance_matrix prior to the Cython rewrite. Clearly that could not have caused a regression :)

@sturlamolden
Copy link
Contributor Author

In my tests I used the dummy data generator from scikit-learn to get a more realistic dataset than just random numbers.

@pv
Copy link
Member

pv commented May 22, 2015 via email

@pv
Copy link
Member

pv commented May 22, 2015

The first regression in the sparse_distance_matrix is probably due to fixing the implementation to give correct result, probably a9565c0

The kdtree benchmarks currently under benchmarks/ are those that were previously under scipy/spatial/benchmarks --- I have no idea whether they are sensible or not. If you think they are not sensible, they should be replaced. (Given that KDTrees have various applications, it's not clear to me there is a good condition for "realistic" datasets.)

@pv
Copy link
Member

pv commented May 22, 2015

The more recent performance regression in sparse_distance_matrix benchmarks is bea41ad, as indicated in the plots.

@sturlamolden
Copy link
Contributor Author

Did you look at the timings in the ipython notebook? Here you can see the runtime of sparse_distance_matrix reduced from 4.07 s in SciPy 0.15.x to 3.12 s in 0.16.x. Yet your benchmark show a performance regression. I am not sure what is going on here.

@sturlamolden
Copy link
Contributor Author

What I mean by "realistic dataset" is that the data has some structure so that the n-dimensional sample space is not uniformly populated. This is at least the common situation is classification problems.

@pv
Copy link
Member

pv commented May 22, 2015

Note that it's not really "my" benchmark --- these are the old benchmarks written by the original ckdtree authors. The benchmarks are in the scipy repository --- you can try running them on your machine and to see if you can reproduce the results.

@sturlamolden
Copy link
Contributor Author

I am not expecting the Travis build to succeed (I don't know how to shut it down), but now the merge conflicts are taken care of. The next step is to fix the build issue on SunOS and Cygwin due to ::infinite.

@sturlamolden
Copy link
Contributor Author

But on the positive side, we are up to date with scipy/master again :-)

@sturlamolden
Copy link
Contributor Author

As expected, all the builds are failing because the symbol ::infinity was removed in scipy/master and is now missing.

@sturlamolden sturlamolden changed the title WIP: Port cKDTree query methods to C++ WIP: Port cKDTree query methods to C++ and add support for periodic queries Nov 11, 2015
@sturlamolden sturlamolden changed the title WIP: Port cKDTree query methods to C++ and add support for periodic queries WIP: Port cKDTree query methods to C++, allow pickling on Python 2, and add support for periodic queries Nov 11, 2015
@sturlamolden
Copy link
Contributor Author

Bugger, I forgot about using prefixes on the commit messages :-(

@sturlamolden
Copy link
Contributor Author

Great, what do you think @rainwoodman ? Are we happy now? More that needs to be done? Fix my last commit messages perhaps (I don't know how), but apart from that, does this look good?

@sturlamolden
Copy link
Contributor Author

@rgommers Should we put an 0.17 milestone on this? This PR is basically done, unless @rainwoodman has anything else to add.

(It remains to verify that it builds on Wintendo. My old PC just died of old age, but I am setting up a virtual box to run on my Mac. Thank God I kept the DVD.)

@sturlamolden
Copy link
Contributor Author

Summary of what is in this PR:

  • All query methods are in C++ and release the GIL
  • Build method is in C++ and releases the GIL
  • Support for periodic spaces for using cKDTree in cosmology simulations
  • Pickling is supported on Python 2 as well
  • sparse_distance_matrix can return other containers than dok_matrix which prevents truncation to 0 of distances between excluded pairs.
  • Many conditional branches are replaced with C++ templates to ensure the fastest execution path
  • query_ball_point has a major speed improvement over the current implementation
  • query_ball_point can in additon optionally be multi-threaded

@rainwoodman
Copy link
Contributor

Sure. I am happy with this. Thanks!

@sturlamolden sturlamolden changed the title WIP: Port cKDTree query methods to C++, allow pickling on Python 2, and add support for periodic queries ENH: Port cKDTree query methods to C++, allow pickling on Python 2, and add support for periodic queries Nov 12, 2015
@sturlamolden
Copy link
Contributor Author

WIP tag is removed.

@rainwoodman
Copy link
Contributor

We will likely write a code for 2-point galaxy clustering analysis in cosmology, with an extension to the original Gray and Moore 2000 to the anisotropic case.

This code will be based on the current ckdtree code in this branch, but would add these features to ckdtree:

  • weighted input data, multiple weights shall be supported. The tree nodes will be augmented to include the sum of weights for proper pruning.
    cKDTree(data, leafsize, boxsize, weight)
  • An anisotropic count_neighbors method with pruning, that returns number of pairs as a function of r and z/r, where z is the distance along a selected axis:
    count_neighbors_2d(self, other, r, mu, p)

From my eyes these two features are quite doable. What do your think? @sturlamolden @rgommers ?

We will be testing the full program (which would just blend some parallelism and IO interface with the modified ckdtree) against the standard code used in SDSS galaxy clustering analysis (https://github.com/bareid/xi), and fix bugs along the way.

I am guessing the time-line for a new ckdtree PR will be the next 2 to 3 months, probably not soon enough to make it into next release?

@sturlamolden
Copy link
Contributor Author

They are probably doable. I am not sure what the weights do, though, but now that we use templates it is easy to plug them in without producing a performance regression.

0.17 will probably be tagged and branched soon. It is that time of the year. Then 0.18 will be next summer, or something like that. :-)

@sturlamolden
Copy link
Contributor Author

Can someone please tag this PR with 0.17 milestone?

@ev-br ev-br added this to the 0.17.0 milestone Nov 16, 2015
@rgommers
Copy link
Member

I've verified that this builds and passes all tests with MinGW (python 3.4, sse3 build). So in it goes. Thanks @sturlamolden and @rainwoodman!

rgommers added a commit that referenced this pull request Nov 22, 2015
ENH: Port cKDTree query methods to C++, allow pickling on Python 2, and add support for periodic queries
@rgommers rgommers merged commit ee8eba7 into scipy:master Nov 22, 2015
@rgommers
Copy link
Member

@rainwoodman a next release is never more than 6 months away, so no worries about the timing for the next PR.

@rgommers
Copy link
Member

I'll send an update for the release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.spatial
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants