ENH: adding spherical Voronoi diagram algorithm to scipy.spatial #5232

Merged
merged 80 commits on Feb 16, 2016

Conversation

8 participants
@tylerjereddy
Contributor

tylerjereddy commented Sep 8, 2015

Background

On July 30th I proposed the idea of adding spherical Voronoi code to scipy (details / motivation described in Issue #5097), which was positively received. As suggested, I posted a similar proposal to the mailing list, which did not receive any responses. While recognizing that an addition this major may be rejected outright, I nonetheless thought I'd move forward with an initial preview of the code, its documentation, and associated unit tests, to give a more concrete idea of how it behaves. The unit tests I wrote all passed after the code migration was complete (all failed initially).

Question / Advice

Before investing more time in refinements, I thought it would be useful to receive feedback, if only to confirm which items on my checklist need to be addressed.

My current to-do list for improving the code (if we move forward with this):

  • miscellaneous code improvements suggested by CJ Carey (see below)
  • Python 3 compatibility (assess with 2to3 diff first)
  • quantify test coverage
  • pathological test cases
    • all generators in a single hemisphere
    • all generators on the equator
  • in some cases my returned data structures (dictionaries instead of arrays) are arguably inconsistent with the rest of scipy.spatial, where some effort has been made to unify the returned data structures (see: http://docs.scipy.org/doc/scipy/reference/spatial.html#simplex-representation)
  • it may be worth exploring usage of scipy.spatial.tsearch for certain parts of my code that were written manually (I noticed this function might be useful when working on this PR)
  • there are almost certainly good prospects for additional vectorization of some components of the code and / or usage of Cython for critical loops; I'm less familiar with the latter but open to giving it a go if encouraged
  • the organization of functions defined outside the SphericalVoronoi class may have to be adjusted--some may need to be made private (_hidden) and / or have their import mechanics adjusted, so that a user can write from scipy.spatial import SphericalVoronoi without also needing import scipy.spatial.spherical_voronoi for utility functions (e.g., generating a random array of points on a sphere)
  • the method docstrings in SphericalVoronoi get truncated -- this is a bit awkward, but probably standard sphinx behaviour -- the user can still click on the method to see the full docstring
  • some unit tests are skipped and could be removed if scipy prefers to avoid committing potentially useful tests that aren't quite working yet
  • code style
    • PEP8
    • spurious comments? (some of these have been dealt with)
    • improve concision of variable names (see note from CJ Carey)
  • add SphericalVoronoi to release notes / reference guide (?!) as indicated in development FAQ
  • many third-party dependencies present in my own repo had to be stripped so that e.g. pandas isn't a requirement--this did not substantially affect the quadratic time complexity / performance (see: https://github.com/tylerjereddy/scipy_spherical_voronoi_rough_work/blob/master/scipy_spatial_spherical_Voronoi_testing.ipynb).
scipy/spatial/spherical_voronoi.py
+ array_x0_values = array_Dx_values / array_denominator_values
+ array_y0_values = array_Dy_values / array_denominator_values
+ array_z0_values = array_Dz_values / array_denominator_values
+ circumcenter_array = np.hstack((array_x0_values[:,np.newaxis], array_y0_values[:,np.newaxis], array_z0_values[:,np.newaxis]))

@perimosocordiae

perimosocordiae Sep 8, 2015

Member

Could simplify this to:

circumcenter_array = np.column_stack((array_x0_values, array_y0_values, array_z0_values))
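A quick sanity check (illustrative values, not from the PR) that the two constructions produce identical arrays:

```python
import numpy as np

x0 = np.array([0.1, 0.2, 0.3])
y0 = np.array([0.4, 0.5, 0.6])
z0 = np.array([0.7, 0.8, 0.9])

# hstack with explicit np.newaxis, as in the diff
stacked = np.hstack((x0[:, np.newaxis], y0[:, np.newaxis], z0[:, np.newaxis]))
# column_stack promotes 1-D inputs to columns automatically
assert np.array_equal(np.column_stack((x0, y0, z0)), stacked)
assert stacked.shape == (3, 3)
```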
scipy/spatial/spherical_voronoi.py
+ vi1 = poly[i]
+ vi2 = poly[(i+1) % N]
+ prod = np.cross(vi1, vi2)
+ total[0] += prod[0]

@perimosocordiae

perimosocordiae Sep 8, 2015

Member

Initialize total = np.zeros(3), then you can update more simply:

total += prod
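The suggested accumulation can be taken one step further; a hedged sketch (assuming poly is an (N, 3) array of polygon vertices, as in the diff) vectorizes the whole edge loop with np.roll:

```python
import numpy as np

poly = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
N = poly.shape[0]

# loop version, as in the diff
total = np.zeros(3)
for i in range(N):
    total += np.cross(poly[i], poly[(i + 1) % N])

# vectorized: cross each vertex with its cyclic successor, then sum
vec_total = np.cross(poly, np.roll(poly, -1, axis=0)).sum(axis=0)
assert np.allclose(total, vec_total)
```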
scipy/spatial/spherical_voronoi.py
+
+ list_vertices = [] #need a list of tuples for above function
+ for coord in array_ordered_Voronoi_polygon_vertices:
+ list_vertices.append(tuple(coord))

@perimosocordiae

perimosocordiae Sep 8, 2015

Member

Could use list(map(tuple, array_ordered_Voronoi_polygon_vertices)) here to avoid the loop.

scipy/spatial/spherical_voronoi.py
+ if abs(pre_acos_term) > 1.0:
+ theta = 0
+ else:
+ theta = np.sum(np.array(list_Voronoi_poygon_angles_radians))

@perimosocordiae

perimosocordiae Sep 8, 2015

Member

The call to np.array() here is superfluous; np.sum will handle lists just fine.
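Relatedly, the domain guard in the diff above can be written without a branch; a hedged alternative sketch (not the PR's code) clips the argument before arccos:

```python
import numpy as np

# floating-point round-off can push a cosine term slightly outside [-1, 1];
# clipping before arccos avoids the NaN without an explicit branch
pre_acos_term = 1.0 + 1e-12
with np.errstate(invalid='ignore'):
    assert np.isnan(np.arccos(pre_acos_term))  # unguarded: NaN
assert np.arccos(np.clip(pre_acos_term, -1.0, 1.0)) == 0.0
```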

scipy/spatial/spherical_voronoi.py
+ Vectorized version for use with multiple tetrahedra in tetrahedron_coord_array -- the latter should have shape (N, 4, 3).'''
+ num_tetrahedra = tetrahedron_coord_array.shape[0]
+ #reshape the tetrahedron_coord_array to place all tetrahedra consecutively without nesting
+ tetrahedron_coord_array = np.reshape(tetrahedron_coord_array, (tetrahedron_coord_array.shape[0] * tetrahedron_coord_array.shape[1], tetrahedron_coord_array.shape[2]))

@perimosocordiae

perimosocordiae Sep 8, 2015

Member

Your style is much more verbose than the rest of scipy, which will make it hard to conform to line-length limits and (IMO) reduces readability. I'd recommend dropping nonessential information from your variable names. For example, pretty much every instance of the array_foo_values pattern could be more succinctly named foo.

@tylerjereddy


tylerjereddy Sep 9, 2015

Contributor

Thanks CJ. I've made your specific suggested changes. I added a new task for improving the concision of the variable names to the to-do list above.

@niknow

niknow Sep 10, 2015

Contributor

Cool. :) I think I can fix the problem with the docstrings (and maybe some PEP8 issues along the way).

scipy/spatial/spherical_voronoi.py
+__all__ = ['SphericalVoronoi']
+
+
+def determinant_fallback(m):

@pv

pv Oct 6, 2015

Member

Can this function be simply replaced by np.linalg.det?
"Compatiblity with Python 2.6" is probably not the reason why it exists; whether np.linalg.det works or not shouldn't depend on that.

scipy/spatial/spherical_voronoi.py
+ dy = np.delete(d, 2, axis=2)
+ dz = np.delete(d, 3, axis=2)
+
+ try:

@pv

pv Oct 6, 2015

Member

This part should probably read, instead of the try/except clause,

if HAS_NUMPY_VEC_DET:
    dx = ...
    ...
else:
    dx = np.array([np.linalg.det(m) for m in dx])
    ...

and on the top of the file add:

from scipy._lib._version import NumpyVersion

# Whether Numpy has stacked matrix linear algebra
HAS_NUMPY_VEC_DET = (NumpyVersion(np.__version__) >= '1.8.0')
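To illustrate pv's point (example values are illustrative): on numpy >= 1.8, np.linalg.det maps over the leading axis of a stacked (N, 3, 3) array, matching the explicit loop fallback:

```python
import numpy as np

# stacked determinants: one det per (3, 3) matrix in the leading axis
m = np.random.RandomState(42).rand(5, 3, 3)
stacked = np.linalg.det(m)                          # vectorized (numpy >= 1.8)
looped = np.array([np.linalg.det(x) for x in m])    # per-matrix fallback
assert stacked.shape == (5,)
assert np.allclose(stacked, looped)
```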
scipy/spatial/spherical_voronoi.py
@@ -0,0 +1,342 @@
+"""

@pv

pv Oct 6, 2015

Member

The file should be renamed to _spherical_voronoi.py, only the SphericalVoronoi class is public.

scipy/spatial/spherical_voronoi.py
+ self._tri = None
+ self.calc_vertices_regions()
+
+ def calc_vertices_regions(self):

@pv

pv Oct 6, 2015

Member

Is this a public method for the users? If not -> _calc_vertices_regions.

scipy/spatial/spherical_voronoi.py
+
+ nominator = np.vstack((dx, dy, dz))
+ denominator = np.matlib.repmat(2 * a, 3, 1)
+ return (nominator / denominator).transpose()

@pv

pv Oct 6, 2015

Member

repmat is often replaced by broadcasting in Numpy code:

nominator = np.vstack((dx, dy, dz))
denominator = 2*a
return (nominator / denominator).T
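A small demonstration (illustrative data; np.tile stands in for np.matlib.repmat) that broadcasting gives the same result as explicit repetition:

```python
import numpy as np

rng = np.random.RandomState(1)
dx, dy, dz, a = rng.rand(4, 6)  # four length-6 rows: dx, dy, dz, a

nominator = np.vstack((dx, dy, dz))       # shape (3, 6)
# explicit repetition of the denominator across three rows
tiled = nominator / np.tile(2 * a, (3, 1))
# broadcasting: the (6,) denominator stretches across the rows automatically
broadcast = nominator / (2 * a)
assert np.allclose(tiled, broadcast)
```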
@rgommers


rgommers Oct 9, 2015

Member

@tylerjereddy a few comments on open items on the list:

add SphericalVoronoi to release notes / reference guide (?!) as indicated in development FAQ

Addition to the reference guide is already achieved by your edit to the docstring of spatial/__init__.py. And I wouldn't worry about the release notes, we can do that afterwards. So this box can be checked.

the organization of functions defined outside the SphericalVoronoi class may have to be adjusted ...

This isn't necessary, your code will take very little time to import. Everything looks good as is.

there are almost certainly good prospects for additional vectorization of some components of the code and / or usage of Cython for critical loops; I'm less familiar with the latter but open to giving it a go if encouraged

Do you think that your code is unacceptably slow (for real-world size problems or compared to another implementation)? I see some for-loops that might be quite slow, but only optimize at the end if you think it's needed.

scipy/spatial/spherical_voronoi.py
+
+ >>> from matplotlib import colors
+ >>> from mpl_toolkits.mplot3d.art3d import Poly3DCollection
+ >>> import numpy as np

@rgommers

rgommers Oct 9, 2015

Member

this import can be deleted, it's assumed in every single example in scipy and the docs will build correctly without it.

scipy/spatial/spherical_voronoi.py
+ >>> import matplotlib.pyplot as plt
+ >>> from scipy.spatial import spherical_voronoi
+ >>> from mpl_toolkits.mplot3d import proj3d
+ >>> import scipy as sp

@rgommers

rgommers Oct 9, 2015

Member

this import can be deleted as well, it does nothing and scipy as sp should not be used anymore for anything

@rgommers


rgommers Oct 9, 2015

Member

The plotting example works for me even though it's not pretty. Mayavi is indeed a too heavy dependency.

@tylerjereddy


tylerjereddy Oct 12, 2015

Contributor

@rgommers It seems that some of the (essential and much appreciated) refactoring Nikolai has done has caused a pretty substantial loss in performance based on my informal benchmarks:

[figure: performance_spherical_voronoi]

The Jupyter notebook used to produce the crude benchmarks has been made public (https://github.com/tylerjereddy/bench_spherical_voronoi/blob/master/assess_spherical_Voronoi_algorithm.ipynb).

So, I think it would be reasonable to expect the performance of the code in the scipy fork to roughly match the performance of the code from my repo (https://github.com/tylerjereddy/py_sphere_Voronoi). At the moment, I don't think it is acceptable, even for my personal use at ~20,000 input points maximum.

Why is the scipy fork slower? Speculation probably isn't productive without line profiling (as usual). The new SphericalVoronoi class automatically generates unsorted Voronoi regions when it is initialized (mostly for consistency with things like scipy.spatial.ConvexHull or Voronoi, where the primary data structure is generated when the class is instantiated), and my suspicion is that splitting the general Voronoi vertex generation and Voronoi region vertex sorting operations in this fashion has had some cost (they are more 'integrated' in the old code). [The challenges of sorting the Voronoi region vertices are discussed in the initial presentation of the algorithm to the Python community: https://www.youtube.com/watch?v=gxNa9BD5CnQ]

I'm going to have to read the refactored code carefully. One question: if I can manage to restore the performance, is there an accepted way to incorporate a performance test to prevent future performance regressions of this nature? I imagine unit tests aren't quite the right place for various reasons--execution time could change with computers or other parts of the code improving, etc.--though I suppose one could decorate the test with the appropriate flags so that it only runs when a comprehensive suite is desired.

@tylerjereddy tylerjereddy referenced this pull request in tylerjereddy/scipy Oct 12, 2015

Closed

Assess performance of current code #12

@rgommers


rgommers Oct 12, 2015

Member

Hi @tylerjereddy, a quick reply about benchmarking (without having time to look into the details here now): we'd love to have proper benchmarks for any new feature that's added. Those live in https://github.com/scipy/scipy/tree/master/benchmarks and are easy to add.
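For reference, an asv benchmark is just a class with setup and time_* methods; a hypothetical sketch (class and parameter names are illustrative, not scipy's actual benchmark file):

```python
import numpy as np

class SphericalVoronoiBench:
    # hypothetical asv-style benchmark sketch; real scipy benchmarks live
    # under benchmarks/benchmarks/ and are run with asv
    params = [10, 100, 1000]
    param_names = ['n_points']

    def setup(self, n):
        rng = np.random.RandomState(123)
        points = rng.randn(n, 3)
        # normalize generators onto the unit sphere
        self.points = points / np.linalg.norm(points, axis=1)[:, np.newaxis]

    def time_spherical_voronoi(self, n):
        # asv times each call of this method against the setup data
        from scipy.spatial import SphericalVoronoi
        SphericalVoronoi(self.points)
```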

@tylerjereddy


tylerjereddy Oct 12, 2015

Contributor

@rgommers It also seems that removing import scipy from the Examples section of the docstring is causing an issue when calling scipy.rand() in one of the Travis tests, even though the docs build for me locally. The control flow does import from scipy before reaching that point, so that's a bit strange if scipy is supposed to be imported by default.

@rgommers


rgommers Oct 12, 2015

Member

scipy.rand isn't a scipy function but a numpy one: try scipy.rand is np.random.rand

There is nothing in the scipy namespace that should be used (except test()), it's only there for historical reasons. See http://docs.scipy.org/doc/scipy/reference/api.html#guidelines-for-importing-functions-from-scipy

@rgommers


rgommers Oct 12, 2015

Member

That is a massive difference in performance indeed, looks like it needs fixing.

@tylerjereddy


tylerjereddy Oct 18, 2015

Contributor

Ok, I've restored (and actually improved upon) the performance:

[figure: performance_spherical_voronoi]

I have a few more comments and concerns that I'll try to summarize later.

@tylerjereddy


tylerjereddy Oct 18, 2015

Contributor
  • Performance
    • The Competition: There seems to be a paucity of open source alternatives for calculating spherical Voronoi diagrams. There's an interactive implementation written by Jason Davies (https://www.jasondavies.com/maps/voronoi/) using the javascript D3 library. As this implementation appears to allow progressive (interactive) insertion of data points (and also proceeds via the Convex Hull), it will probably have the same quadratic time complexity as our algorithm (see Table 1 below).

      Benchmarking would take at least some effort, as only interactive use is possible (you can't feed in array data to get output--which is part of the motivation for developing something more flexible like this in scipy). His example with all major airports in the world (https://www.jasondavies.com/maps/voronoi/airports/) cites about 3000 data points, which is easily within the well-performing region of our implementation.

      There's a closed source implementation in Wolfram (http://demonstrations.wolfram.com/VoronoiDiagramOnASphere/), but this also only appears to allow for interaction rather than feeding in data and returning vertices, regions, etc. It also briefly describes an algorithm that exploits the same properties as ours, and would likely have the same time complexity.

      Apparently, the bleeding edge version of CGAL should be capable of calculating spherical Voronoi diagrams, and indeed it was a former CGAL core developer that provided key insight into the method for sorting the Voronoi regions in my early attempts. I think setting up CGAL properly with C++ libraries can be tricky though. Last time I checked, Qhull made a brief comment about how the spherical Voronoi diagram could be computed, but didn't offer this functionality. If I had to guess, I would suspect that the CGAL implementation of a similar algorithm would run at least a bit faster than ours with all the effort they put into C++ optimization, etc. I can't find a reference to it in their documentation though.

    • Theory: There are faster (loglinear) algorithms recently published in the mathematics literature for spherical Voronoi diagrams (Table 1). I made several attempts to incorporate the stereographic projection paradigm, without much success. In the future, a faster implementation would be nice, but these can be really hard to implement. At least if we have a working slower algorithm it will serve as a useful unit test for correct results when attempting to improve with a fancier algorithm.

    • Generator limit: It appears that the upper limit on the number of testable generators (input data points) in my benchmarks relates to the amount of physical memory used (the space complexity). Although I had made some initial attempts to 'benchmark' the space complexity, this seems to be a bit harder to do well. Suffice it to say that it certainly seems clear that at around 70,000 generators a machine with 16 GB of RAM starts to choke. I routinely deal with ~20,000 generators when working with viruses. Hard to say what other application domains will require--certainly we can't produce a spherical Voronoi diagram with the coordinates of all ~3 trillion trees on our planet. Nonetheless, there are < 200 countries in the world for looking at epidemiology-related things on the planet surface, and surely many practical applications involve < 70,000 generators.

Table 1: Spherical Voronoi Algorithms

Authors              Year  Citations  Paradigm                  Performance
Augenbaum            1985  89         progressive insertion     quadratic
Renka                1997  146        incremental               loglinear
Gold and Mostafavi   2000  33         point-line model          ?
Na et al.            2002  50         stereographic projection  loglinear
Chen et al.          2003  34         spherical mesh            ?
Caroli et al.        2009  3          incremental               ?
Dinis and Mamede     2010  3          sweep line                loglinear
  • Unit testing:
    • The unit tests have changed quite a lot from my original implementation. I think Nikolai felt there were too many 'functional' tests rather than proper unit tests. That was probably true. It is noteworthy that one of the removed unit tests verified an unproved (but empirically observed) property of spherical Voronoi diagrams--the upper limit on the number of Voronoi vertices is 2n - 4, where n is the number of generators.

      A similar relationship (2n - 5) has been formally proven in the plane [Theorem 4.12 in 'Discrete and Computational Geometry' (2011) Devadoss and O'Rourke], but not for a sphere (yet). Anyway, good to have a record of this here if that is proven in the future. Certainly looks true.

[figure: generator_vertex_comparison]
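The 2n - 4 relation can be checked empirically via scipy.spatial.ConvexHull (a hedged sketch: for generators in general position on the sphere, each hull facet is a Delaunay triangle, and Euler's formula gives exactly 2n - 4 of them):

```python
import numpy as np
from scipy.spatial import ConvexHull

# For n points in general position on the unit sphere, the convex hull is a
# simplicial polytope: V - E + F = 2 with 3F = 2E gives F = 2V - 4 triangles,
# hence at most 2n - 4 candidate Voronoi vertices (the facet circumcenters).
rng = np.random.RandomState(0)
for n in (10, 50, 200):
    points = rng.randn(n, 3)
    points /= np.linalg.norm(points, axis=1)[:, np.newaxis]
    hull = ConvexHull(points)
    assert hull.simplices.shape[0] == 2 * n - 4
```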

  • Example in Docstring: I'm a bit confused by the new example that was coded in (see my annotations below). Actually, it may very well still be correct, but perhaps @niknow can confirm that these issues are just the result of the perspective / plotting limitations?!

[figure: example_annotated]

  • Input handling: We don't really have any checks for the spherical equivalent of general position in the input data. It is not difficult to cook up pathological input that would cause issues. I think anything with < 3 points is problematic, and single hemisphere or all points on the equator may also be pathological. Duplicate generators would obviously be problematic.

Beyond that, I guess there's still a few check points in the original list above, although they may be less important now.
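A hedged sketch of what such input validation might look like (function name and tolerance are hypothetical, not the PR's actual checks):

```python
import numpy as np

def check_generators(points, min_points=4):
    """Hypothetical input-validation sketch (not the PR's actual checks)."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("expected an (N, 3) array of generators")
    if points.shape[0] < min_points:
        raise ValueError("need at least %d generators" % min_points)
    # duplicate generators (within rounding tolerance) collapse Voronoi regions
    rounded = np.round(points, decimals=7)
    if np.unique(rounded, axis=0).shape[0] != points.shape[0]:
        raise ValueError("duplicate generators detected")
    return points
```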

@niknow

Contributor

niknow commented Oct 20, 2015

Regarding the Example in Docstring: I can confirm that there are visibility issues with the generators and the Voronoi vertices. If you rotate the plot in the interactive view, generators or Voronoi vertices sometimes become visible/invisible depending on the current perspective. All in all, plotting spherical Voronoi diagrams this way is certainly not ideal; using matplotlib's built-in Poly3DCollection and scatter is only meant as a temporary solution.

@niknow

Contributor

niknow commented Oct 20, 2015

Regarding the 2n-4 unittest: If one plays around with the interactive version from https://www.jasondavies.com/maps/voronoi/ , one can see that the claim that a spherical Voronoi diagram always has exactly v=2n-4 vertices is wrong in general:

[figure: voronoi]

This picture shows two diagrams with the same number n of generator points, but the right one has one Voronoi vertex more than the left one. However, the left one is a pretty special case, because the degree of that vertex in the graph is four (would our algorithm actually detect that?).
Since the Voronoi vertices are the projections of the circumcenters of the simplices of the Delaunay triangulation, the number of Voronoi vertices is bounded by the number of simplices of a Delaunay triangulation. This is bounded by McMullen's upper bound theorem (see for instance https://en.wikipedia.org/wiki/Upper_bound_theorem), which in our case gives v <= 2n-2. So I guess one could always check that.

The fact that the better test v=2n-4 seems to be correct in all your test cases is probably because the problem described above is unlikely to occur in practice and the following should be true: Let v be the number of vertices of a spherical Voronoi diagram. Assume that the union over all faces of the diagram covers the whole sphere and that every vertex has degree 3. Then v=2n-4.
(This can be seen as follows: Remove one face from the Voronoi diagram and project the remaining diagram onto the plane via stereographic projection centered at the generator of the face you removed. The result will be a planar graph with f=n-1 bounded faces. Since all vertices have degree 3, and summing the degrees of all vertices gives twice the number e of edges, we obtain 2e = 3v. Substituting this into Euler's formula v-e+f=1 for planar graphs gives the result.)

Consequently: If we test the condition v=2n-4 on diagrams, which do not fall into this very special category, I think this is actually okay. Alternatively, we could test for v <= 2n-2. What do you think?
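The counting argument in the parenthetical above can be written out as a short derivation (under the stated assumptions: faces cover the sphere, every Voronoi vertex has degree 3):

```latex
\begin{align*}
  2e &= 3v
     && \text{each of the $v$ degree-3 vertices accounts for 3 edge-ends} \\
  v - e + f &= 1
     && \text{Euler's formula with $f = n-1$ bounded faces, after removing}\\
     &&& \text{one face and projecting stereographically to the plane} \\
  v - \tfrac{3}{2}v + (n - 1) &= 1
     && \text{substitute } e = \tfrac{3}{2}v \\
  \tfrac{1}{2}v &= n - 2
     \quad\Longrightarrow\quad v = 2n - 4.
\end{align*}
```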

@tylerjereddy

Contributor

tylerjereddy commented Oct 20, 2015

@niknow Just to clarify, the upper limit should be 2n-4 Voronoi vertices, not the exact number of vertices. Does the exception exceed the upper limit [I can't tell if the diagram with one less vertex is at the 2n-4 limit already, or somewhere below it--so adding one more vertex doesn't seem to rule the upper limit out clearly]? If it exceeds the upper limit, that would be an issue, but if it is <= 2n-4, the unit test still holds as far as I can tell. The textbook also defines the planar version (2n-5) as an upper limit only. If we have a concrete example where 2n-4 is exceeded in a spherical Voronoi diagram then yeah maybe v <= 2n-2 could be used, although maybe not crucial.

@argriffing

Contributor

argriffing commented Oct 20, 2015

If one plays around with the interactive version from https://www.jasondavies.com/maps/voronoi/ , one can see that the claim that a spherical Voronoi diagram always has exactly v=2n-4 vertices is wrong in general

Isn't this what is meant by cooking up a pathological input which is not in (the spherical equivalent of) general position?

We don't really have any checks for the spherical equivalent of general position in the input data. It is not difficult to cook up pathological input that would cause issues.

@tylerjereddy tylerjereddy referenced this pull request in matplotlib/matplotlib Oct 21, 2015

Open

Handling Spherical Polygons #5294

@niknow

Contributor

niknow commented Oct 22, 2015

As I said, this problem occurs only in rather special cases. Tyler's plot and my argument suggest that it is ok to test for v=2n-4 as long as the diagram used for the unit test is not very special. Testing for v=2n-4 has the advantage that it would detect if the number of vertices changed after a refactoring, for instance (which would be suspicious and a good reason to double-check the refactoring).

@tylerjereddy

Contributor

tylerjereddy commented Nov 2, 2015

So, should some effort be put into input handling and checking for pathological cases? For example, check how scipy.spatial.Voronoi (or qhull, really) deals with duplicate data points in the input array and build similar behaviour into SphericalVoronoi.

@niknow

Contributor

niknow commented Nov 5, 2015

I guess we could add a check that the number of input points is >= 4, for instance.

@tylerjereddy tylerjereddy referenced this pull request in tylerjereddy/py_sphere_Voronoi Nov 12, 2015

Open

dimension of input coordinates and voronoi patch are not the same #5

@tylerjereddy

Contributor

tylerjereddy commented Nov 12, 2015

@niknow I've merged in your branch with adjustments to the unit tests. Note that there are actually several other locations where set up code could be abstracted to setUp() methods and there is also a complete lack of tearDown() methods in the testsuite--the latter are normally used to clean up objects generated in setUp(), although this may not be crucial given the small size of most testing objects used.

@niknow

Contributor

niknow commented Nov 14, 2015

Very good. :) There are no tearDown-methods at the moment, because they would be empty.

scipy/spatial/_spherical_voronoi.py
+ """
+ Calculates the determinant of m using Laplace expansion in the first row.
+ This function is used only as a fallback to ensure backwards compatibility
+ with Python 2.6.

@pv

pv Nov 15, 2015

Member

The reason to have this function is probably not backwards compat with Python 2.6 (if it is, that's fairly obscure and would need some further clarification). Can this be replaced by np.linalg.det(m), which should be more efficient? The code below calls it for 4x4 matrices only, which should be OK for all Numpy versions.

@niknow

niknow Nov 15, 2015

Contributor

We were using np.linalg.det in a previous build (see for instance b6724a9), which works perfectly fine in most of scipy's unit test scenarios. However, the Python 2.6 test suite throws the error:

LinAlgError: 3-dimensional array given. Array must be two-dimensional

To my knowledge, the np.linalg.det function in the numpy version used for the Python 2.6 unit tests supports only a single two-dimensional matrix per call (it cannot broadcast over a stack of matrices). Therefore, we introduced the check in line 21 to find out whether np.linalg.det from numpy can be used or if we have to use the fallback code. Determinants of 4x4 matrices are needed in the function calc_circumcenters. They are crucial to that function, and that function is crucial to the entire module.

Therefore, I unfortunately don't see any way to remove the fallback code without breaking the Python 2.6 unittest, although this is admittedly a bit ugly. :/ If you have any idea how to solve this problem more elegantly, please let us know!

I totally agree that the docstring is misleading, though. The problem is not the version of Python, but the version of numpy used in conjunction with Python 2.6 in scipy's Python 2.6 unit testing framework. Do you think it is ok if I just change the docstring to "to ensure backwards compatibility with numpy < 1.8.0"?
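The version gate under discussion can be sketched roughly like this (HAS_NUMPY_VEC_DET is the flag name used in the PR; the version parsing and function name here are simplified illustrations, not scipy's actual check):

```python
import numpy as np

# numpy >= 1.8 broadcasts linalg.det over a stack of matrices in one call.
HAS_NUMPY_VEC_DET = tuple(int(x) for x in np.__version__.split('.')[:2]) >= (1, 8)

def stacked_det(matrices):
    """Determinants of an (m, 4, 4) stack, with a loop fallback for old numpy."""
    matrices = np.asarray(matrices)
    if HAS_NUMPY_VEC_DET:
        return np.linalg.det(matrices)  # single vectorized call
    # numpy < 1.8: det accepts only one 2-D array at a time, so loop explicitly.
    return np.array([np.linalg.det(m) for m in matrices])
```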


@argriffing

argriffing Nov 21, 2015

Contributor

Right, the Python version is a red herring. The issue was that numpy < 1.8 cannot do a bunch of determinants in a single call, whereas newer numpy can do this because it has more clever broadcasting. To avoid a combinatorial explosion of TravisCI tests, not all supported Python versions are tested together with all supported numpy versions, which has resulted in old numpy being tested only on old Python.

@pv

pv Nov 22, 2015

Member

If the issue is backwards compat with Numpy <1.8, then this function should not be needed, because a numpy version check is made already before, cf. if HAS_NUMPY_VEC_DET: below.

@pv

pv Nov 22, 2015

Member

Namely, the for loops are already done there, so it appears np.linalg.det should work OK. If the idea is to have determinant_fallback as a vectorized determinant, then for loops should not be used.

@tylerjereddy

tylerjereddy Nov 26, 2015

Contributor

@pv I removed determinant_fallback and adjusted the code to use np.linalg.det in its place, as you describe. Hopefully the problematic test suite with python 2.6 / older numpy passes.

@niknow

Contributor

niknow commented Nov 21, 2015

While going over the ToDo list again I was wondering: What tasks are currently blocking the release?

@tylerjereddy

Contributor

tylerjereddy commented Dec 17, 2015

Running:
nosetests --with-coverage

And filtering for the relevant module results:

Name                                                Stmts   Miss Branch BrPart  Cover   Missing
-----------------------------------------------------------------------------------------------
scipy/spatial/_spherical_voronoi.py                    71      4     15      1    94%   59-62, 53->59

I think that is a decent result--so I'll add a check for quantifying test coverage above.

@codecov-io

codecov-io Jan 4, 2016

@@            master   #5232   diff @@
======================================
  Files          235     236     +1
  Stmts        43388   43460    +72
  Branches      8167    8174     +7
  Methods          0       0       
======================================
+ Hit          33782   33849    +67
- Partial       2593    2594     +1
- Missed        7013    7017     +4

Review entire Coverage Diff as of 8b0cd69

Powered by Codecov. Updated on successful CI builds.


niknow and others added some commits Sep 27, 2015

The SphericalVoronoi docstring Examples section has been improved by removing spurious imports of numpy and scipy. This fixes #10 in the fork on which the PR is based.
Prepended the name of the spherical_voronoi.py module with an underscore and adjusted the __init__.py file accordingly to allow the sphinx rendering to html to be preserved. Fixes #11 in the fork that is the basis for this PR.
Adjusting SphericalVoronoi docstring to deal with import changes relating to _spherical_voronoi. Likewise for related unit tests. Fixes #11.
@tylerjereddy

Contributor

tylerjereddy commented Feb 16, 2016

@argriffing I ended up rebasing on the command line (didn't see any 'button' for this based on the already-rebased branch created above?). Had to add one more commit to clean up the testing module after the rebase for some reason (I had to resolve a few conflicts along the way). I suspect the latter commit shouldn't have been necessary with a perfect conflict resolution workflow using git rebase --continue as I did, but it seems fine now.

After recompiling / installing scipy, the full suite of scipy.spatial tests passed. Checking the "Files Changed" tab seems to indicate that the major source of conflicts, Thanks.txt, was fairly cleanly resolved (i.e., no obliterated contributions there).

@argriffing

Contributor

argriffing commented Feb 16, 2016

didn't see any 'button'

Now I see that this PR does not have its own branch -- the changes are directly to master. Maybe that has something to do with it.

+ for vertex in sv.vertices:
+ distances = distance.cdist(sv.points,np.array([vertex]))
+ closest = np.array(sorted(distances)[0:3])
+ print(closest)

@pv

pv Feb 16, 2016

Member

Extra print


@pv

Member

pv commented Feb 16, 2016

afaik, github ui doesn't have helpers for rebasing things, you have to do it manually (as done above)

pv added a commit that referenced this pull request Feb 16, 2016

Merge pull request #5232 from tylerjereddy/master
ENH: spatial: adding spherical Voronoi diagram algorithm to scipy.spatial

@pv pv merged commit 648ad02 into scipy:master Feb 16, 2016

2 checks passed

codecov/project 77.88% (target 0.01%)
continuous-integration/travis-ci/pr The Travis CI build passed
@pv

Member

pv commented Feb 16, 2016

Thanks, merged based on above comments.
