ENH: Poisson Disk sampling for QMC #13918

diregoblin · 2021-04-22T08:15:24Z

Reference issue

Adding Poisson Disk sampling to QMC #13912

What does this implement/fix?

Poisson disk sampling in arbitrary dimensions using Bridson's algorithm, implemented using numpy and scipy.
Generates so-called "blue noise" that prevents clustering by ensuring each two points are at least radius apart.
https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf

import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc


rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=0.2, seed=rng)
sample = engine.random(20)

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0, 1], ylim=[0, 1])
plt.show()

scipy/stats/_qmc.py

tupui · 2021-07-29T08:51:14Z

@diregoblin any update here? I am also seeing that @rougier you did a PR here scikit-image/scikit-image#3531. Maybe you would be interested to help here?

rougier · 2021-07-29T14:19:55Z

Sure. How exactly?

tupui · 2021-07-29T15:06:46Z

Sure. How exactly?

@rougier you could either work together on the same branch if @diregoblin would agree, or provide comments, review, suggestions on the current progress. Basically just having another pair of eyes/expertize would already be helpful 😃

tupui · 2021-08-09T20:08:01Z

@jasondavies I saw you did some implementations of this. You might be interested in this.

tupui · 2021-10-01T18:08:33Z

@diregoblin @rougier I have been playing around and did this https://gist.github.com/tupui/ade53fbc2dbde5420eab1f6000e1f963.

Still chasing bugs (I will update this soon) but this is the idea behind what I had in mind.

tupui · 2021-10-05T15:58:12Z

@diregoblin @rougier I updated my gist and now it works as expected (sorry if you checked it before 😅, I had unexpected slicing issues, and I am still puzzled about a few things like the need to use np.flip). The idea is not that we copy paste this, but since we have now 2 implementations, we can think further on what's best in terms of algorithm/design/efficiency. Let me know what you think about these.

rougier · 2021-10-06T05:55:40Z

For the PR, you could add a test to check for point to point distances that need to be greater or equal to the radius. For performances, you could benchmark the two methods on a few cases and check which one is faster. Also, from memory, I had some surprise with the size of the initial domain (square domain was ok, but for other cases, I did some bad mix between width / height)

tupui · 2022-01-18T10:16:15Z

Here is another approach worth looking into, sample elimination. It's not a strict poisson disk sampling though. Starting from a good sample (from a low discrepancy sequence) would make this quite fast.

http://www.cemyuksel.com/research/sampleelimination/sampleelimination.pdf

We could have a method parameter to select different algorithms from.

tupui · 2022-03-14T15:59:24Z

@diregoblin are you still interested in working on this?

If not, I am happy to try make this ready for inclusion. (Your authorship would be preserved if that would be a concern.)

I tested this a bit more locally and found your version to be faster than mine. I might be doing something wrong, I am not sure. It ended up being a bit complicated with all the slicing in nD. But I am just fine going with any version to move forward.

diregoblin · 2022-03-15T14:33:35Z

@tupui Hi Pamphile,

Sorry, I've been too busy lately with teaching and other stuff to really work on this... If you are ready to proceed with making this ready for inclusion, please go ahead. If you need any clarification about the algorithm, I'll be happy to help of course.

tupui · 2022-03-15T17:09:02Z

@tupui Hi Pamphile,

Sorry, I've been too busy lately with teaching and other stuff to really work on this... If you are ready to proceed with making this ready for inclusion, please go ahead. If you need any clarification about the algorithm, I'll be happy to help of course.

No problem. Thank you I will take over then and see what I can do 😃

[skip ci]

tupui · 2022-05-17T13:05:28Z

scipy/stats/_qmc.py

+        def in_limits(sample: np.ndarray) -> bool:
+            return (sample.max() <= 1.) and (sample.min() >= 0.)


We need this in a few other places in this file already. I would like to create an external function for that.

I will do another PR.

tupui · 2022-05-17T13:12:28Z

FYI I started the maintenance work. I am just trying to clean the code and make it more "SciPy"-like since the approach seemed to be good when I played around with my implementation.

I am going to add some tests/doc and think a bit more about the API before asking for reviews. Of course, feel free to comment my progress already.

tupui · 2022-05-19T13:45:39Z

Thank you @diregoblin for the review 😃 I made the changes. Hopefully we have another maintainer reviewing this before end of next week so it can get into the new release (1.9).

@grlee77 if you have time can you have a look?

diregoblin · 2022-05-19T14:27:28Z

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

tupui · 2022-05-19T14:37:47Z

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

I see what you mean yes. Right now you can achieve this by asking more points and it will continue the sampling (there is a test for that). Instead of another option, we could also add another method that would not ask for n but just fill the space. I would call it fill_space? If you have other ideas. I think this would be cleaner in this API. We have such additional method with Sobol' for instance where we can sample only powers of 2. What do you think?

diregoblin · 2022-05-19T14:54:53Z

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

I see what you mean yes. Right now you can achieve this by asking more points and it will continue the sampling (there is a test for that). Instead of another option, we could also add another method that would not ask for n but just fill the space. I would call it fill_space? If you have other ideas. I think this would be cleaner in this API. We have such additional method with Sobol' for instance where we can sample only powers of 2. What do you think?

Yeah, I think this can work, and the name sounds clear enough. This is probably the best option.

rougier · 2022-05-19T17:19:29Z

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

tupui · 2022-05-19T18:32:07Z

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

Oh that's a good idea. So in terms of API we could still have radius, but using None it would do the automatic computation.

About the Lloyd relaxation, I am happy to hear we could have an internal usage of what I added 😅 How would this work? ~~Like we would run Lloyd and then put again all samples in the pool and retry?~~ ah ok understood.

I suggest we work on these in follow ups (I like these and will do it, I am not pushing back to not do it😉). The current state is ready IMO and we don't have much more time to get a chance to see this in the next release. It will be also simpler to review if we do this incrementally.

tupui · 2022-05-19T19:44:07Z

@diregoblin @rougier I made the changes to add fill_space. If you are ok with this PR, please consider approving. This would give confidence for other maintainers to merge as experts would have seen this 😃

[skip actions] [skip azp]

diregoblin

Checked, this looks good to me. Github doesn't allow me to explicitly approve this, since it's my PR.

diregoblin · 2022-05-20T07:43:24Z

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

Off the top of my head, I don't know a way to calculate r that will work in arbitrary dimensions. It is related to the hypersphere packing problem, and I think there are only rough estimations for high-dimensional space.

stefanv

I just did a rough read-over of the docs mainly.

scipy/stats/_qmc.py

stefanv · 2022-05-20T18:24:29Z

scipy/stats/_qmc.py

+          new candidates are sampled within the hypersphere.
+        * ``surface``: only sample the surface of the hypersphere.
+
+    candidates : int


Wouldn't you want to control the number of points more directly?

What do you mean by more directly?

Often I want to put, say, 15 points into an area, ensuring they don't touch. It does not look as though you can do that with the current implementation.

You can, it's just random(15). The issue is that it can return less than 15 points. It would be hard to generalize and ensure we can find exactly 15 points. We would need to change the radius to make this happen. But then we would not be able to potentially add more samples or it would do so by reducing the radius. IMO not clear. Can make sense, but definitely a deviation from the base Poisson sampling.

This parameter is just to say that at each iterations how many points around a given point we are sampling. Then we look at these candidates one by one and see if they are far enough from existing points. If yes we add them and to the sample and the pool of centre points to use in next iterations.

random(15)? I thought the argument is an int?

I meant PoissonDisk(...).random(15)

Ah! That makes sense. If you happen to have enough points in the poisson disk, but I suppose you can write a loop to keep trying.

This is a bit what the method full_space does. Try to add as many points as possible.

scipy/stats/_qmc.py

stefanv · 2022-05-20T18:58:00Z

scipy/stats/_qmc.py

+        self.squared_radius = self.radius**2
+
+        # sample to generate per iteration in the hypersphere around center
+        self.candidates = candidates


candidates is a bit confusing, since these aren't the candidates; perhaps nr_candidates?

Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates as a user I think I would not expect self.candidates to be something else than my parameter.

The other reason for not using n_... is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅

I think candidates refer to the candidates themselves, whereas something like nr_candidates refers to, well, that. So just depends on what you mean with the name.

I do mean a number of candidates. What about I mark it private and rename it to self._nr_candidates while keeping the name of the parameter as candidates which I think is clear especially since we have typing.

Check the rest of SciPy? At a quick glance, it seems like elsewhere they also user nr when indicating number of. See, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html#scipy.optimize.basinhopping

There will undoubtedly be many other examples.

Ok then I will rename both to ncandidates

scipy/stats/_qmc.py

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

tupui

Thank you @stefanv!

tupui · 2022-05-20T19:24:24Z

scipy/stats/_qmc.py

+          new candidates are sampled within the hypersphere.
+        * ``surface``: only sample the surface of the hypersphere.
+
+    candidates : int


What do you mean by more directly?

tupui · 2022-05-20T19:27:18Z

scipy/stats/_qmc.py

+
+    The maximum number of point that a sample can contain is directly linked
+    to the `radius`. As the dimension of the space increases, a higher radius
+    spread the points further and help overcome the curse-of-dimensionality.


Yes exactly

tupui · 2022-05-20T19:27:56Z

scipy/stats/_qmc.py

+
+    .. warning::
+
+       The algorithm is more suitable for low dimensions and sampling size


I am not sure what to write. This is very empirical and highly depends on how you select the parameters. If you take a tiny radius and high dimensions then it will need a tons of points to fill the space. But playing with the radius can make this totally manageable.

tupui · 2022-05-20T19:31:19Z

scipy/stats/_qmc.py

+       The algorithm is more suitable for low dimensions and sampling size
+       due to its iterative nature and memory requirements.
+
+    Some code taken from [2]_ by P. Zun, written consent given on 31.03.2021


P. Zun (@diregoblin) got the code from [2] (actually he tracked down some Matlab code and it lead to this SO post). There is more detail in the long discussion we had on this PR at the beginning.

scipy/stats/_qmc.py

tupui · 2022-05-20T19:37:35Z

scipy/stats/_qmc.py

+        self.squared_radius = self.radius**2
+
+        # sample to generate per iteration in the hypersphere around center
+        self.candidates = candidates


Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates as a user I think I would not expect self.candidates to be something else than my parameter.

The other reason for not using n_... is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅

scipy/stats/_qmc.py

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

tupui

I think I addressed your review @stefanv 😃 (I am running the CI again to have it green since I changed a param. Should be good, it passes locally. I also want to check on the doc the link to the tutorial)

tupui · 2022-05-20T21:09:38Z

scipy/stats/_qmc.py

+
+    .. warning::
+
+       The algorithm is more suitable for low dimensions and sampling size


tupui · 2022-05-20T21:10:15Z

scipy/stats/_qmc.py

+    a hard problem parameters might need to be adjusted manually. Beware of
+    the dimension: as the dimensionality increases, the number of samples
+    required to fill the space increases exponentially
+    (curse-of-dimensionality).


I added a link above in the notes when I first mention the curse.

[skip actions] [skip circle]

tylerjereddy · 2022-05-23T01:37:13Z

Looks like the reviews are generally pretty positive and CI is passing with 100 % patch diff coverage. No old tests have been modified.

It looks like the commits have been written fairly deliberately so I'll let them remain separate when I merge, just in case.

tylerjereddy · 2022-05-23T01:37:23Z

thanks

tupui · 2022-05-23T03:15:05Z

Thank you @tylerjereddy!

Thank you everyone for the work, especially @diregoblin! It's a great new feature to have IMHO.

diregoblin · 2022-05-23T12:04:37Z

Thank you everybody! Pamphile, thanks again for adapting the code to SciPy standards. Cheers, Pavel

…

On Mon, 23 May 2022 at 06:15, Pamphile Roy ***@***.***> wrote: Thank you @tylerjereddy <https://github.com/tylerjereddy>! Thank you everyone for the work, especially @diregoblin <https://github.com/diregoblin>! It's a great new feature to have IMHO. — Reply to this email directly, view it on GitHub <#13918 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAM4QINXUE6ZKXEGJWV6WVLVLLZ4PANCNFSM43L6BEYQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

tupui · 2022-05-23T12:51:56Z

Pleasure @diregoblin. Don't hesitate to ping me if you have improvements you would like to make. I am happy to pair-program like this in the future.

jpreiss · 2022-05-23T21:40:58Z

Thanks for implementing this feature! It will be very useful for things like procedural generation as well as QMC.

tupui · 2022-05-24T00:29:53Z

@jpreiss happy to read. I am very much interested in what people will do with this and QMC in general. Feel free to ping me or open issues with enhancements/bugs.

tupui added enhancement A new feature or improvement scipy.stats labels Apr 22, 2021

tupui linked an issue Apr 22, 2021 that may be closed by this pull request

Adding Poisson Disc sampling to QMC #13912

Closed

tupui reviewed Apr 22, 2021

View reviewed changes

scipy/stats/_qmc.py Outdated Show resolved Hide resolved

tupui mentioned this pull request Aug 1, 2021

META: tracker for improvements in scipy.stats.qmc #14510

Open

31 tasks

diregoblin and others added 2 commits May 16, 2022 16:27

Initial commit of Poisson Disc for QMC

665aab9

MAINT: refactor code.

c98fbe5

[skip ci]

tupui force-pushed the master branch from 271ee18 to c98fbe5 Compare May 17, 2022 12:59

tupui reviewed May 17, 2022

View reviewed changes

tupui added 4 commits May 17, 2022 19:16

MAINT: fix variables and shapes

f4f5e17

TST: add PoissonDisc test

d77bee0

MAINT: rename disc to disk to match literature

98c30ed

TST: combination of dim and radius in tests

05cd236

tupui force-pushed the master branch from 49731cb to 05cd236 Compare May 17, 2022 17:44

tupui added 4 commits May 17, 2022 20:26

MAINT: fix 0 dim/sample cases

eb0b4b8

ENH: allow continue sample and proper reset

47817f8

TST/DOC: add validation test and some documentation

9297146

FIX: ignore divide warning

6d8e790

tupui added this to the 1.9.0 milestone May 19, 2022

ENH: add fill_space

a41baf1

DOC: fix examples rendering.

d3a757d

[skip actions] [skip azp]

diregoblin commented May 20, 2022

View reviewed changes

stefanv reviewed May 20, 2022

View reviewed changes

Apply suggestions from code review. [skip actions][skip azp]

7b0cd25

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

tupui reviewed May 20, 2022

View reviewed changes

tupui and others added 3 commits May 20, 2022 22:09

Update scipy/stats/_qmc.py [skip ci]

8aba4bc

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

Update scipy/stats/_qmc.py [skip ci]

3e05060

Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>

MAINT: rephrase doc and change parameter name

4d1dace

tupui reviewed May 20, 2022

View reviewed changes

MAINT: revert filter warning.

8889fbd

[skip actions] [skip circle]

tupui force-pushed the master branch from f540b1c to 8889fbd Compare May 21, 2022 09:31

tylerjereddy merged commit 1451b2c into scipy:main May 23, 2022

mplough-kobold mentioned this pull request Mar 19, 2024

ENH: Poisson disk sampling for arbitrary bounds #20288

Closed

		def in_limits(sample: np.ndarray) -> bool:
		return (sample.max() <= 1.) and (sample.min() >= 0.)


		.. warning::

		The algorithm is more suitable for low dimensions and sampling size

ENH: Poisson Disk sampling for QMC #13918

ENH: Poisson Disk sampling for QMC #13918

Conversation

diregoblin commented Apr 22, 2021 • edited by tupui

Reference issue

What does this implement/fix?

tupui commented Jul 29, 2021

rougier commented Jul 29, 2021

tupui commented Jul 29, 2021

tupui commented Aug 9, 2021

tupui commented Oct 1, 2021 • edited

tupui commented Oct 5, 2021

rougier commented Oct 6, 2021

tupui commented Jan 18, 2022 • edited

tupui commented Mar 14, 2022

diregoblin commented Mar 15, 2022

tupui commented Mar 15, 2022

Choose a reason for hiding this comment

tupui commented May 17, 2022 • edited

tupui commented May 19, 2022

diregoblin commented May 19, 2022

tupui commented May 19, 2022

diregoblin commented May 19, 2022

rougier commented May 19, 2022

tupui commented May 19, 2022 • edited

tupui commented May 19, 2022

diregoblin left a comment

Choose a reason for hiding this comment

diregoblin commented May 20, 2022

stefanv left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tupui left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tupui left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tylerjereddy commented May 23, 2022

tylerjereddy commented May 23, 2022

tupui commented May 23, 2022

diregoblin commented May 23, 2022 via email

tupui commented May 23, 2022

jpreiss commented May 23, 2022

tupui commented May 24, 2022

diregoblin commented Apr 22, 2021 •

edited by tupui

tupui commented Oct 1, 2021 •

edited

tupui commented Jan 18, 2022 •

edited

tupui commented May 17, 2022 •

edited

tupui commented May 19, 2022 •

edited

tupui left a comment •

edited