Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Poisson Disk sampling for QMC #13918

Merged
merged 20 commits into from
May 23, 2022
Merged

ENH: Poisson Disk sampling for QMC #13918

merged 20 commits into from
May 23, 2022

Conversation

diregoblin
Copy link
Contributor

@diregoblin diregoblin commented Apr 22, 2021

Reference issue

Adding Poisson Disk sampling to QMC #13912

What does this implement/fix?

Poisson disk sampling in arbitrary dimensions using Bridson's algorithm, implemented using numpy and scipy.
Generates so-called "blue noise" that prevents clustering by ensuring each two points are at least radius apart.
https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf

import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from scipy.stats import qmc


rng = np.random.default_rng()
radius = 0.2
engine = qmc.PoissonDisk(d=2, radius=0.2, seed=rng)
sample = engine.random(20)

fig, ax = plt.subplots()
_ = ax.scatter(sample[:, 0], sample[:, 1])
circles = [plt.Circle((xi, yi), radius=radius/2, fill=False)
           for xi, yi in sample]
collection = PatchCollection(circles, match_original=True)
ax.add_collection(collection)
_ = ax.set(aspect='equal', xlabel=r'$x_1$', ylabel=r'$x_2$',
           xlim=[0, 1], ylim=[0, 1])
plt.show()

poissonDisk

@tupui tupui added enhancement A new feature or improvement scipy.stats labels Apr 22, 2021
@tupui tupui linked an issue Apr 22, 2021 that may be closed by this pull request
scipy/stats/_qmc.py Outdated Show resolved Hide resolved
@tupui
Copy link
Member

tupui commented Jul 29, 2021

@diregoblin any update here? I am also seeing that @rougier you did a PR here scikit-image/scikit-image#3531. Maybe you would be interested to help here?

@rougier
Copy link

rougier commented Jul 29, 2021

Sure. How exactly?

@tupui
Copy link
Member

tupui commented Jul 29, 2021

Sure. How exactly?

@rougier you could either work together on the same branch if @diregoblin would agree, or provide comments, review, suggestions on the current progress. Basically just having another pair of eyes/expertize would already be helpful 😃

@tupui
Copy link
Member

tupui commented Aug 9, 2021

@jasondavies I saw you did some implementations of this. You might be interested in this.

@tupui
Copy link
Member

tupui commented Oct 1, 2021

@diregoblin @rougier I have been playing around and did this https://gist.github.com/tupui/ade53fbc2dbde5420eab1f6000e1f963.

Still chasing bugs (I will update this soon) but this is the idea behind what I had in mind.

@tupui
Copy link
Member

tupui commented Oct 5, 2021

@diregoblin @rougier I updated my gist and now it works as expected (sorry if you checked it before 😅, I had unexpected slicing issues, and I am still puzzled about a few things like the need to use np.flip). The idea is not that we copy paste this, but since we have now 2 implementations, we can think further on what's best in terms of algorithm/design/efficiency. Let me know what you think about these.

@rougier
Copy link

rougier commented Oct 6, 2021

For the PR, you could add a test to check for point to point distances that need to be greater or equal to the radius. For performances, you could benchmark the two methods on a few cases and check which one is faster. Also, from memory, I had some surprise with the size of the initial domain (square domain was ok, but for other cases, I did some bad mix between width / height)

@tupui
Copy link
Member

tupui commented Jan 18, 2022

Here is another approach worth looking into, sample elimination. It's not a strict poisson disk sampling though. Starting from a good sample (from a low discrepancy sequence) would make this quite fast.

http://www.cemyuksel.com/research/sampleelimination/sampleelimination.pdf

We could have a method parameter to select different algorithms from.

@tupui
Copy link
Member

tupui commented Mar 14, 2022

@diregoblin are you still interested in working on this?

If not, I am happy to try make this ready for inclusion. (Your authorship would be preserved if that would be a concern.)

I tested this a bit more locally and found your version to be faster than mine. I might be doing something wrong, I am not sure. It ended up being a bit complicated with all the slicing in nD. But I am just fine going with any version to move forward.

@diregoblin
Copy link
Contributor Author

@tupui Hi Pamphile,

Sorry, I've been too busy lately with teaching and other stuff to really work on this... If you are ready to proceed with making this ready for inclusion, please go ahead. If you need any clarification about the algorithm, I'll be happy to help of course.

@tupui
Copy link
Member

tupui commented Mar 15, 2022

@tupui Hi Pamphile,

Sorry, I've been too busy lately with teaching and other stuff to really work on this... If you are ready to proceed with making this ready for inclusion, please go ahead. If you need any clarification about the algorithm, I'll be happy to help of course.

No problem. Thank you I will take over then and see what I can do 😃

Comment on lines +1687 to +1688
def in_limits(sample: np.ndarray) -> bool:
return (sample.max() <= 1.) and (sample.min() >= 0.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this in a few other places in this file already. I would like to create an external function for that.

I will do another PR.

@tupui
Copy link
Member

tupui commented May 17, 2022

FYI I started the maintenance work. I am just trying to clean the code and make it more "SciPy"-like since the approach seemed to be good when I played around with my implementation.

I am going to add some tests/doc and think a bit more about the API before asking for reviews. Of course, feel free to comment my progress already.

@tupui
Copy link
Member

tupui commented May 19, 2022

Thank you @diregoblin for the review 😃 I made the changes. Hopefully we have another maintainer reviewing this before end of next week so it can get into the new release (1.9).

@grlee77 if you have time can you have a look?

@tupui tupui added this to the 1.9.0 milestone May 19, 2022
@diregoblin
Copy link
Contributor Author

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

@tupui
Copy link
Member

tupui commented May 19, 2022

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

I see what you mean yes. Right now you can achieve this by asking more points and it will continue the sampling (there is a test for that). Instead of another option, we could also add another method that would not ask for n but just fill the space. I would call it fill_space? If you have other ideas. I think this would be cleaner in this API. We have such additional method with Sobol' for instance where we can sample only powers of 2. What do you think?

@diregoblin
Copy link
Contributor Author

Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw n samples, we will sample a region of space around the initial point, and have zero samples elsewhere. The best way to use this sampling is to draw new points until the pool is empty: that will mean the whole domain is covered, and there is nowhere to put new samples. I don't see a natural way to do this in the scipy implementation, except for selecting a very large n. Maybe you can think of adding this option?

I see what you mean yes. Right now you can achieve this by asking more points and it will continue the sampling (there is a test for that). Instead of another option, we could also add another method that would not ask for n but just fill the space. I would call it fill_space? If you have other ideas. I think this would be cleaner in this API. We have such additional method with Sobol' for instance where we can sample only powers of 2. What do you think?

Yeah, I think this can work, and the name sounds clear enough. This is probably the best option.

@rougier
Copy link

rougier commented May 19, 2022

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

@tupui
Copy link
Member

tupui commented May 19, 2022

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

Oh that's a good idea. So in terms of API we could still have radius, but using None it would do the automatic computation.

About the Lloyd relaxation, I am happy to hear we could have an internal usage of what I added 😅 How would this work? Like we would run Lloyd and then put again all samples in the pool and retry? ah ok understood.

I suggest we work on these in follow ups (I like these and will do it, I am not pushing back to not do it😉). The current state is ready IMO and we don't have much more time to get a chance to see this in the next release. It will be also simpler to review if we do this incrementally.

@tupui
Copy link
Member

tupui commented May 19, 2022

@diregoblin @rougier I made the changes to add fill_space. If you are ok with this PR, please consider approving. This would give confidence for other maintainers to merge as experts would have seen this 😃

[skip actions] [skip azp]
Copy link
Contributor Author

@diregoblin diregoblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked, this looks good to me. Github doesn't allow me to explicitly approve this, since it's my PR.

@diregoblin
Copy link
Contributor Author

Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation.

Off the top of my head, I don't know a way to calculate r that will work in arbitrary dimensions. It is related to the hypersphere packing problem, and I think there are only rough estimations for high-dimensional space.

Copy link
Member

@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a rough read-over of the docs mainly.

scipy/stats/_qmc.py Outdated Show resolved Hide resolved
scipy/stats/_qmc.py Outdated Show resolved Hide resolved
new candidates are sampled within the hypersphere.
* ``surface``: only sample the surface of the hypersphere.

candidates : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't you want to control the number of points more directly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by more directly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Often I want to put, say, 15 points into an area, ensuring they don't touch. It does not look as though you can do that with the current implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can, it's just random(15). The issue is that it can return less than 15 points. It would be hard to generalize and ensure we can find exactly 15 points. We would need to change the radius to make this happen. But then we would not be able to potentially add more samples or it would do so by reducing the radius. IMO not clear. Can make sense, but definitely a deviation from the base Poisson sampling.

This parameter is just to say that at each iterations how many points around a given point we are sampling. Then we look at these candidates one by one and see if they are far enough from existing points. If yes we add them and to the sample and the pool of centre points to use in next iterations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random(15)? I thought the argument is an int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant PoissonDisk(...).random(15)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! That makes sense. If you happen to have enough points in the poisson disk, but I suppose you can write a loop to keep trying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit what the method full_space does. Try to add as many points as possible.

scipy/stats/_qmc.py Outdated Show resolved Hide resolved
scipy/stats/_qmc.py Outdated Show resolved Hide resolved
scipy/stats/_qmc.py Show resolved Hide resolved
scipy/stats/_qmc.py Outdated Show resolved Hide resolved
self.squared_radius = self.radius**2

# sample to generate per iteration in the hypersphere around center
self.candidates = candidates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

candidates is a bit confusing, since these aren't the candidates; perhaps nr_candidates?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates as a user I think I would not expect self.candidates to be something else than my parameter.

The other reason for not using n_... is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think candidates refer to the candidates themselves, whereas something like nr_candidates refers to, well, that. So just depends on what you mean with the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do mean a number of candidates. What about I mark it private and rename it to self._nr_candidates while keeping the name of the parameter as candidates which I think is clear especially since we have typing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the rest of SciPy? At a quick glance, it seems like elsewhere they also user nr when indicating number of. See, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html#scipy.optimize.basinhopping

There will undoubtedly be many other examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then I will rename both to ncandidates

scipy/stats/_qmc.py Show resolved Hide resolved
scipy/stats/_qmc.py Show resolved Hide resolved
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @stefanv!

new candidates are sampled within the hypersphere.
* ``surface``: only sample the surface of the hypersphere.

candidates : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by more directly?


The maximum number of point that a sample can contain is directly linked
to the `radius`. As the dimension of the space increases, a higher radius
spread the points further and help overcome the curse-of-dimensionality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly


.. warning::

The algorithm is more suitable for low dimensions and sampling size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what to write. This is very empirical and highly depends on how you select the parameters. If you take a tiny radius and high dimensions then it will need a tons of points to fill the space. But playing with the radius can make this totally manageable.

The algorithm is more suitable for low dimensions and sampling size
due to its iterative nature and memory requirements.

Some code taken from [2]_ by P. Zun, written consent given on 31.03.2021
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P. Zun (@diregoblin) got the code from [2] (actually he tracked down some Matlab code and it lead to this SO post). There is more detail in the long discussion we had on this PR at the beginning.

scipy/stats/_qmc.py Outdated Show resolved Hide resolved
scipy/stats/_qmc.py Show resolved Hide resolved
scipy/stats/_qmc.py Outdated Show resolved Hide resolved
self.squared_radius = self.radius**2

# sample to generate per iteration in the hypersphere around center
self.candidates = candidates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates as a user I think I would not expect self.candidates to be something else than my parameter.

The other reason for not using n_... is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅

scipy/stats/_qmc.py Show resolved Hide resolved
scipy/stats/_qmc.py Show resolved Hide resolved
tupui and others added 3 commits May 20, 2022 22:09
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed your review @stefanv 😃 (I am running the CI again to have it green since I changed a param. Should be good, it passes locally. I also want to check on the doc the link to the tutorial)


.. warning::

The algorithm is more suitable for low dimensions and sampling size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

a hard problem parameters might need to be adjusted manually. Beware of
the dimension: as the dimensionality increases, the number of samples
required to fill the space increases exponentially
(curse-of-dimensionality).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a link above in the notes when I first mention the curse.

[skip actions] [skip circle]
@tylerjereddy
Copy link
Contributor

Looks like the reviews are generally pretty positive and CI is passing with 100 % patch diff coverage. No old tests have been modified.

It looks like the commits have been written fairly deliberately so I'll let them remain separate when I merge, just in case.

@tylerjereddy tylerjereddy merged commit 1451b2c into scipy:main May 23, 2022
@tylerjereddy
Copy link
Contributor

thanks

@tupui
Copy link
Member

tupui commented May 23, 2022

Thank you @tylerjereddy!

Thank you everyone for the work, especially @diregoblin! It's a great new feature to have IMHO.

@diregoblin
Copy link
Contributor Author

diregoblin commented May 23, 2022 via email

@tupui
Copy link
Member

tupui commented May 23, 2022

Pleasure @diregoblin. Don't hesitate to ping me if you have improvements you would like to make. I am happy to pair-program like this in the future.

@jpreiss
Copy link
Contributor

jpreiss commented May 23, 2022

Thanks for implementing this feature! It will be very useful for things like procedural generation as well as QMC.

@tupui
Copy link
Member

tupui commented May 24, 2022

@jpreiss happy to read. I am very much interested in what people will do with this and QMC in general. Feel free to ping me or open issues with enhancements/bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding Poisson Disc sampling to QMC
9 participants