Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support Random Unit Vector #16205

Closed
saroad2 opened this issue May 18, 2022 · 16 comments · Fixed by #17277
Closed

ENH: Support Random Unit Vector #16205

saroad2 opened this issue May 18, 2022 · 16 comments · Fixed by #17277
Labels
enhancement A new feature or improvement scipy.stats

Comments

@saroad2
Copy link

saroad2 commented May 18, 2022

Is your feature request related to a problem? Please describe.

Proposed new feature or change:

Description

Often you want to randomize a "direction" that doesn't have "size". This can be useful when:

  • You want to randomize the direction of an NPC walk in a 2D or 3D games, where its speed is a predefined constant.
  • You're developing particles simulation with all particles moving in random directions
  • etc.

This random direction should be an nth dimensional unit vector which is randomized uniformly from the unit sphere. That means that sections of the unit sphere with equal areas should have the same chance of getting a vector from.

Implementation

If one wants to randomize nth dimensional direction uniformly, one must do the following:

  1. Generate n components of the vector using the standard normal distribution
  2. Find the norm the generated vector
  3. Divide the vector by its norm

In simple Python code:

import numpy as np
from scipy import stats


def sphere_uniform(n: int):
    vec = stats.norm.rvs(size=n)
    size = np.linalg.norm(vec)
    return vec / size

This implementation suggestion is based on this blog post. I'm not sure if there's a faster way of implementing it using C++ and Cython, but I'm confident that someone here will know.

Why Implement It in Scipy?

Originally, I suggested implementing it on Numpy (numpy/numpy#21523), but they stated that they are not interested in expending the numpy.random library and suggested adding this new feature as part of scipy.stats.

I believe that random unit vectors are common enough to be a part of Numpy or Scipy, which are two of the most popular python libraries (if not the most popular ones). Scientists, gamers and engineers use random directions all the time, and I believe there is no reason why this ability shouldn't be provided out-of-the-box.

@saroad2 saroad2 added the enhancement A new feature or improvement label May 18, 2022
@ilayn
Copy link
Member

ilayn commented May 18, 2022

Seems a bit too specific to be on its own as a function to be honest. The following is sufficient (similar to what you have instead you want a uniform distribution, you can also center around 0. by subtracting 0.5 from vec):

import numpy as np

def unit_vector(n: int):
    rng = np.default_rng()
    vec = rng.random(n)
    vec /= np.linalg.norm(vec)
    return vec

It gets even shorter if you don't mind using np.random.rand(n) directly. Hence I think this is a bearable burden for any use.

@saroad2
Copy link
Author

saroad2 commented May 18, 2022

Hey @ilayn , thank you for the quick response.

I respectfully disagree. I've encountered the need for randomizing a unit vector uniformly many times for the past few years during my physics degree, and in my job as an ML team lead. This is a quite needed ability that I'm sure people would be happy to have here at Scipy.

As for the implementation you've suggested, the problem with it that there's a tendency of the random vector to be in the diagonal line between the origin and the corner of the unit cube. This is very elegnatly explained in the blog post I've linked above.

That means that the distribution in the implementation you've suggested is not uniform as most users would assume it is. This is the reason one must use normal distribution to randomize the components of the vector and not uniform distribution between -1 and 1.

As many others thing, this is not something that is a "must have", we agree on that. People could implement this using one of the implementations you and I suggested.

@ilayn
Copy link
Member

ilayn commented May 18, 2022

I have quite some technical issues with that blogpost but regardless of the distribution which involves no additional code for changing it normal distribution, this is still a two-liner code that I would prefer burdening the user instead of maintaining within scipy or numpy.

@rkern
Copy link
Member

rkern commented May 18, 2022

Uniformly-random vectors on the unit hypersphere is an important multivariate distribution. It's not hard to implement oneself, if one knows the necessary trick (and yes, the normal distribution is necessary). If one doesn't know the appropriate trick, ad hoc implementations tend to fail in ways that are hard to diagnose in higher dimensions than 2.

@dschmitz89
Copy link
Contributor

What could be a good API for this?
Sth like a distribution scipy.stats.UniformUnitSphere with method rvs or just a function scipy.stats.randomunitvector(size) ?

@mdhaber
Copy link
Contributor

mdhaber commented Jun 3, 2022

The former, I think. It would fit well as a subclass of multi_rv_generic in _multivariate.py.

@saroad2
Copy link
Author

saroad2 commented Jun 5, 2022

Good to see that other people find this feature needed and usefull.

Do I have a go on implementing it?

@dschmitz89
Copy link
Contributor

I asked on the mailing list if there are any objections against this and a function for spherical mean/variance. If there are none , I think it can be taken as a yes in a few days.

@mdhaber
Copy link
Contributor

mdhaber commented Jun 5, 2022

I think that's true, @dschmitz89.

@saroad2 One thing to look out for is to not start with the shortcomings we're addressing in gh-16278. If you use another multivariate distribution as a reference, you may be tempted to add np.squeeze to the output of rvs. This is undesirable. If d is the number of dimensions and size is passed in as a shape parameter (or () if size is None), then the output shape should be size + (d,). (I think it's OK to support only scalar d for now because there is a decision to be made w.r.t. how to vectorize over array shapes.)

@saroad2
Copy link
Author

saroad2 commented Jun 8, 2022

What's the difference between multi_rv_generic and multi_rv_frozen? I didn't understance what the frozen stands for.

@rkern
Copy link
Member

rkern commented Jun 8, 2022

multi_rv_generic instances have methods that take the distribution parameters. One instance of the multi_rv_generic is made at the module level.

stats.multivariate_normal.pdf(x, mean, cov)

multi_rv_frozen instances are created by calling the multi_rv_generic instance with the distribution parameters. The frozen instance then carries those distribution parameters around so that you can call the methods without providing the distribution parameters.

my_norm = stats.multivariate_normal(mean, cov)
my_norm.pdf(x)

This scheme is relatively complicated for the multivariate distributions because each one is fairly different from the others. The univariate distributions have much more shared infrastructure and only need to implement the generic class. They all share a frozen implementation.

@mdhaber
Copy link
Contributor

mdhaber commented Jul 25, 2022

@saroad2 are you still interested in implementing this? If you do, I can review and merge.

@saroad2
Copy link
Author

saroad2 commented Jul 25, 2022

Hey @mdhaber.

After many hours of trying, I could not build Scipy on my Windows environment. It was really frustrating.
Unfortunately, I don't have access to a Linux or Mac environments, so I cannot implement this ability for now.

Nevertheless, I would love to see it get implemented! It should be quite easy once you get to build Scipy...

@mdhaber
Copy link
Contributor

mdhaber commented Jul 25, 2022

Yeah that's the trick. I develop on Windows, too, and it was really difficult to get it working the first time. Have you followed these instructions, including building OpenBLAS?

In the meantime, you could consider using Gitpod so you don't need to worry about building locally.

Also, building is supposed to become easier soon with this new Meson build system, but I'm not sure if it's working on Windows yet. (Perhaps others will chime in?) I'll let you know when I've been successful with that.

@dschmitz89
Copy link
Contributor

dschmitz89 commented Aug 28, 2022

@saroad2 I am using the Windows Subsystem for Linux on Windows 11. Works like a charm there with the instructions for conda given here . I admit, I have not even tried to build using windows directly. Anything related to complex compiled code on Windows has always been very painful for me too (Fortran ... ).
There is a discussion about building on windows in #16769 . Wondering if this is a blocking point for many potential contributors.

In case you cannot find the time, I would be able to take this over.

@saroad2
Copy link
Author

saroad2 commented Aug 28, 2022

Hey @dschmitz89 , thanks for the feedback!

In the following weeks I will not be able to work on this. A little bit preoccupied with other things.

I would love if you could take over so I won't block it.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants