-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Poisson Disk sampling for QMC #13918
Conversation
@diregoblin any update here? I am also seeing that @rougier you did a PR here scikit-image/scikit-image#3531. Maybe you would be interested to help here? |
Sure. How exactly? |
@rougier you could either work together on the same branch if @diregoblin would agree, or provide comments, review, suggestions on the current progress. Basically just having another pair of eyes/expertize would already be helpful 😃 |
@jasondavies I saw you did some implementations of this. You might be interested in this. |
@diregoblin @rougier I have been playing around and did this https://gist.github.com/tupui/ade53fbc2dbde5420eab1f6000e1f963. Still chasing bugs (I will update this soon) but this is the idea behind what I had in mind. |
@diregoblin @rougier I updated my gist and now it works as expected (sorry if you checked it before 😅, I had unexpected slicing issues, and I am still puzzled about a few things like the need to use |
For the PR, you could add a test to check for point to point distances that need to be greater or equal to the radius. For performances, you could benchmark the two methods on a few cases and check which one is faster. Also, from memory, I had some surprise with the size of the initial domain (square domain was ok, but for other cases, I did some bad mix between width / height) |
Here is another approach worth looking into, sample elimination. It's not a strict poisson disk sampling though. Starting from a good sample (from a low discrepancy sequence) would make this quite fast. http://www.cemyuksel.com/research/sampleelimination/sampleelimination.pdf We could have a |
@diregoblin are you still interested in working on this? If not, I am happy to try make this ready for inclusion. (Your authorship would be preserved if that would be a concern.) I tested this a bit more locally and found your version to be faster than mine. I might be doing something wrong, I am not sure. It ended up being a bit complicated with all the slicing in nD. But I am just fine going with any version to move forward. |
@tupui Hi Pamphile, Sorry, I've been too busy lately with teaching and other stuff to really work on this... If you are ready to proceed with making this ready for inclusion, please go ahead. If you need any clarification about the algorithm, I'll be happy to help of course. |
No problem. Thank you I will take over then and see what I can do 😃 |
def in_limits(sample: np.ndarray) -> bool: | ||
return (sample.max() <= 1.) and (sample.min() >= 0.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this in a few other places in this file already. I would like to create an external function for that.
I will do another PR.
FYI I started the maintenance work. I am just trying to clean the code and make it more "SciPy"-like since the approach seemed to be good when I played around with my implementation. I am going to add some tests/doc and think a bit more about the API before asking for reviews. Of course, feel free to comment my progress already. |
Thank you @diregoblin for the review 😃 I made the changes. Hopefully we have another maintainer reviewing this before end of next week so it can get into the new release (1.9). @grlee77 if you have time can you have a look? |
Great! One thing that worries me a bit is the way this is supposed to be used in QMC. If we just draw |
I see what you mean yes. Right now you can achieve this by asking more points and it will continue the sampling (there is a test for that). Instead of another option, we could also add another method that would not ask for |
Yeah, I think this can work, and the name sounds clear enough. This is probably the best option. |
Looks good. One comment for the main method: it would be nice to be able to specify the overall number of samples we want instead of radius. In such case, we could compute an approximated radius to achieve this number of samples (with no guarantee unless we ad/remove missing point and proceed with a Lloyd relaxation. |
Oh that's a good idea. So in terms of API we could still have About the Lloyd relaxation, I am happy to hear we could have an internal usage of what I added 😅 How would this work? I suggest we work on these in follow ups (I like these and will do it, I am not pushing back to not do it😉). The current state is ready IMO and we don't have much more time to get a chance to see this in the next release. It will be also simpler to review if we do this incrementally. |
@diregoblin @rougier I made the changes to add |
[skip actions] [skip azp]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked, this looks good to me. Github doesn't allow me to explicitly approve this, since it's my PR.
Off the top of my head, I don't know a way to calculate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a rough read-over of the docs mainly.
scipy/stats/_qmc.py
Outdated
new candidates are sampled within the hypersphere. | ||
* ``surface``: only sample the surface of the hypersphere. | ||
|
||
candidates : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you want to control the number of points more directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by more directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Often I want to put, say, 15 points into an area, ensuring they don't touch. It does not look as though you can do that with the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can, it's just random(15)
. The issue is that it can return less than 15 points. It would be hard to generalize and ensure we can find exactly 15 points. We would need to change the radius to make this happen. But then we would not be able to potentially add more samples or it would do so by reducing the radius. IMO not clear. Can make sense, but definitely a deviation from the base Poisson sampling.
This parameter is just to say that at each iterations how many points around a given point we are sampling. Then we look at these candidates one by one and see if they are far enough from existing points. If yes we add them and to the sample and the pool of centre points to use in next iterations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random(15)
? I thought the argument is an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant PoissonDisk(...).random(15)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! That makes sense. If you happen to have enough points in the poisson disk, but I suppose you can write a loop to keep trying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit what the method full_space
does. Try to add as many points as possible.
scipy/stats/_qmc.py
Outdated
self.squared_radius = self.radius**2 | ||
|
||
# sample to generate per iteration in the hypersphere around center | ||
self.candidates = candidates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
candidates
is a bit confusing, since these aren't the candidates; perhaps nr_candidates
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates
as a user I think I would not expect self.candidates
to be something else than my parameter.
The other reason for not using n_...
is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think candidates
refer to the candidates themselves, whereas something like nr_candidates
refers to, well, that. So just depends on what you mean with the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do mean a number of candidates. What about I mark it private and rename it to self._nr_candidates
while keeping the name of the parameter as candidates
which I think is clear especially since we have typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the rest of SciPy? At a quick glance, it seems like elsewhere they also user nr when indicating number of. See, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html#scipy.optimize.basinhopping
There will undoubtedly be many other examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then I will rename both to ncandidates
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @stefanv!
scipy/stats/_qmc.py
Outdated
new candidates are sampled within the hypersphere. | ||
* ``surface``: only sample the surface of the hypersphere. | ||
|
||
candidates : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by more directly?
scipy/stats/_qmc.py
Outdated
|
||
The maximum number of point that a sample can contain is directly linked | ||
to the `radius`. As the dimension of the space increases, a higher radius | ||
spread the points further and help overcome the curse-of-dimensionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly
|
||
.. warning:: | ||
|
||
The algorithm is more suitable for low dimensions and sampling size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what to write. This is very empirical and highly depends on how you select the parameters. If you take a tiny radius and high dimensions then it will need a tons of points to fill the space. But playing with the radius can make this totally manageable.
scipy/stats/_qmc.py
Outdated
The algorithm is more suitable for low dimensions and sampling size | ||
due to its iterative nature and memory requirements. | ||
|
||
Some code taken from [2]_ by P. Zun, written consent given on 31.03.2021 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P. Zun (@diregoblin) got the code from [2] (actually he tracked down some Matlab code and it lead to this SO post). There is more detail in the long discussion we had on this PR at the beginning.
scipy/stats/_qmc.py
Outdated
self.squared_radius = self.radius**2 | ||
|
||
# sample to generate per iteration in the hypersphere around center | ||
self.candidates = candidates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean just here or also the parameter name? I though it was simple like this. And since we have a parameter called candidates
as a user I think I would not expect self.candidates
to be something else than my parameter.
The other reason for not using n_...
is that we are not consistent at all in terms of naming the number of something. So I didn't want to make a choice 😅
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I addressed your review @stefanv 😃 (I am running the CI again to have it green since I changed a param. Should be good, it passes locally. I also want to check on the doc the link to the tutorial)
|
||
.. warning:: | ||
|
||
The algorithm is more suitable for low dimensions and sampling size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
a hard problem parameters might need to be adjusted manually. Beware of | ||
the dimension: as the dimensionality increases, the number of samples | ||
required to fill the space increases exponentially | ||
(curse-of-dimensionality). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a link above in the notes when I first mention the curse.
[skip actions] [skip circle]
Looks like the reviews are generally pretty positive and CI is passing with 100 % patch diff coverage. No old tests have been modified. It looks like the commits have been written fairly deliberately so I'll let them remain separate when I merge, just in case. |
thanks |
Thank you @tylerjereddy! Thank you everyone for the work, especially @diregoblin! It's a great new feature to have IMHO. |
Thank you everybody! Pamphile, thanks again for adapting the code to SciPy
standards.
Cheers,
Pavel
…On Mon, 23 May 2022 at 06:15, Pamphile Roy ***@***.***> wrote:
Thank you @tylerjereddy <https://github.com/tylerjereddy>!
Thank you everyone for the work, especially @diregoblin
<https://github.com/diregoblin>! It's a great new feature to have IMHO.
—
Reply to this email directly, view it on GitHub
<#13918 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAM4QINXUE6ZKXEGJWV6WVLVLLZ4PANCNFSM43L6BEYQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Pleasure @diregoblin. Don't hesitate to ping me if you have improvements you would like to make. I am happy to pair-program like this in the future. |
Thanks for implementing this feature! It will be very useful for things like procedural generation as well as QMC. |
@jpreiss happy to read. I am very much interested in what people will do with this and QMC in general. Feel free to ping me or open issues with enhancements/bugs. |
Reference issue
Adding Poisson Disk sampling to QMC #13912
What does this implement/fix?
Poisson disk sampling in arbitrary dimensions using Bridson's algorithm, implemented using
numpy
andscipy
.Generates so-called "blue noise" that prevents clustering by ensuring each two points are at least
radius
apart.https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf