
mir.random.nonuniform: Add Ziggurat method for Normal & Exponential #261

Closed
wants to merge 6 commits into from


wilzbach
Member

@wilzbach wilzbach commented Jul 19, 2016

Adds the Ziggurat sampling algorithm for the Normal & Exponential distributions.

  1. Paper

Marsaglia, George, and Wai Wan Tsang. "The ziggurat method for generating random variables."
Journal of Statistical Software 5.8 (2000): 1-7.

  2. Thoughts
  3. Distribution plots (dub ./examples/nonuniform_plot.d):

[image: distribution plot]

[image: distribution plot]

Ping @joseph-wakeling-sociomantic @9il

@wilzbach wilzbach mentioned this pull request Jul 19, 2016
}
};

return Ziggurat!(T, fallback, R, true)(pdf, invPdf, 128, rightEnd, T(9.91256303526217e-3));
Member Author

I would actually like to run the initialization in CTFE since it never changes, but exp uses inline assembler, which isn't supported in CTFE :/
Does anyone have an idea?


Well, start by filing an issue against Phobos asking for a CTFE'able exp.

BTW where on earth does this magic constant 9.91256...e-3 come from? I would suggest making it a named manifest constant.

Member Author

Well, start by filing an issue against Phobos asking for a CTFE'able exp.

Ok thanks - done. I will test whether copying the non-inline version from Phobos works.

BTW where on earth does this magic constant 9.91256...e-3 come from? I would suggest making it a named manifest constant.

It also comes from [Marsaglia00], where it's called v.
It's the common area of every block, so it depends on the number of blocks k and on the distribution.

-> I will declare it more explicitly.
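For reference, a minimal D sketch (not the PR's code) of the named manifest constants suggested above, together with the relation that defines v in [Marsaglia00]: the base block (a rectangle of width r and height f(r), plus the tail beyond r) has area v = r·f(r) + ∫_r^∞ f(x) dx, with the unnormalised density f(x) = exp(-x²/2). It assumes 128 blocks and the paper's right edge r = 3.442619855899 for that case; the PR's own `rightEnd` value is not shown in this excerpt, and all constant and function names here are hypothetical.

```d
import std.math : exp, sqrt, PI, abs;
import std.mathspecial : erfc;
import std.stdio : writefln;

/// Hypothetical named constants for the normal ziggurat with 128 blocks.
enum size_t normalLayers = 128;
/// Right edge r of the base block (value from Marsaglia & Tsang 2000 for 128 blocks).
enum double normalRightEnd = 3.442619855899;
/// Common area v of every block (same source; the literal passed in the PR).
enum double normalLayerArea = 9.91256303526217e-3;

/// Unnormalised density f(x) = exp(-x^2/2), monotone decreasing for x >= 0.
double normalPdf(double x) { return exp(-x * x / 2); }

/// Area of the base block: rectangle [0, r] x [0, f(r)] plus the tail beyond r.
/// The tail uses: integral of exp(-x^2/2) over [r, inf) = sqrt(pi/2) * erfc(r / sqrt(2)).
double baseLayerArea(double r)
{
    return cast(double)(r * normalPdf(r) + sqrt(PI / 2) * erfc(r / sqrt(2.0)));
}

void main()
{
    writefln("blocks: %s", normalLayers);
    writefln("v recomputed from r: %.15g", baseLayerArea(normalRightEnd));
    writefln("v used in the PR:    %.15g", normalLayerArea);
}
```

Declared this way, the constants can also be passed straight to the `Ziggurat` constructor instead of the bare literal.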

@codecov-io

codecov-io commented Jul 20, 2016

Current coverage is 96.74% (diff: 97.77%)

Merging #261 into master will increase coverage by 0.21%

@@             master       #261   diff @@
==========================================
  Files            19         21     +2   
  Lines          3262       3874   +612   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           3149       3748   +599   
- Misses          113        126    +13   
  Partials          0          0          


Powered by Codecov. Last update d24907b...b7541d9

@wilzbach
Member Author

wilzbach commented Jul 20, 2016

(I will post more summaries here soon; here's a brief overview)

Design flaws of the Ziggurat algorithm

Two main criticisms:

  • The same random variable is used to pick both the block and the value (its last 7 or 8 bits), although about 2^50 values are needed to detect this.
  • SHR3 is not uniform (this shouldn't affect us).

An Improved Ziggurat Method to Generate Normal Random Samples

  • use double precision
  • use two random variables (a separate 7-bit one to pick the block); an optimized version needs only 1 + 1/4 random variables
  • no precomputation of f(x_i)
  • idea: use two random integers for higher-precision values
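A minimal D sketch of the first criticism, assuming 128 blocks: in the style of [Marsaglia00], one 32-bit word yields both the signed value and, through its low 7 bits, the block index, so the two are coupled. The second function shows one possible remedy with a 64-bit generator: take the index and a 53-bit uniform from disjoint bit ranges. This is only an illustration and not necessarily the scheme of the improved-Ziggurat paper, which draws a separate random variable for the block; the function names are hypothetical.

```d
import std.random : Mt19937_64;
import std.stdio : writefln;

/// Marsaglia & Tsang style: one 32-bit word supplies the signed value and,
/// through its low 7 bits, the index of one of 128 blocks, so the two share bits.
void sharedBits(uint word, out size_t block, out int value)
{
    value = cast(int) word;   // signed value, later scaled by the block width
    block = word & 0x7F;      // the same low 7 bits select the block
}

/// Remedy sketch: block index and value come from disjoint bits of one 64-bit word.
void disjointBits(ulong word, out size_t block, out double u)
{
    block = cast(size_t)(word & 0x7F);  // bits 0..6: block index in [0, 128)
    u = (word >> 11) * 0x1p-53;         // bits 11..63: uniform double in [0, 1)
}

void main()
{
    auto gen = Mt19937_64(42);
    size_t block;
    double u;
    disjointBits(gen.front, block, u);
    writefln("block %s, u = %s", block, u);
}
```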

@wilzbach
Member Author

wilzbach commented Jul 20, 2016

btw, an interesting overview paper is "Gaussian Random Number Generators" by Thomas et al.
Summary: the Wallace method is the fastest, but doesn't provide good statistical quality. Ziggurat was the second-fastest method in their large benchmark, while passing the chi-squared test and achieving good scores on the high-sigma test (see Tables 3 and 4).

Authors: Sebastian Wilzbach
*/

module mir.random.nonuniform;


Minor, but didn't we agree to create a mir.random.distribution package to contain everything ... ?

Member Author


See: #262

@9il
Member

9il commented Aug 5, 2016

Does the Ziggurat method yield better results for the normal distribution than other methods in Atmosphere or dstats?

@wilzbach
Member Author

wilzbach commented Aug 5, 2016

Does the Ziggurat method yield better results for the normal distribution than other methods in Atmosphere or dstats?

What's the best way to compare?

@wilzbach
Member Author

wilzbach commented Aug 5, 2016

@joseph-wakeling-sociomantic

The NormalDist CDF looks good, as far as I can judge:

[image: CDF plot for NormalDist]

The Exp one doesn't:

[image: CDF plot for Exp]

@9il
Member

9il commented Aug 5, 2016

Does the Ziggurat method yield better results for the normal distribution than other methods in Atmosphere or dstats?
What's the best way to compare?

If you don't know whether Ziggurat yields better results, then we do not need Ziggurat.

@9il
Member

9il commented Aug 5, 2016

What's the best way to compare?

There is no single best way. The best option is to read a couple of articles. We first need to understand whether we need Ziggurat at all, before spending time on it.

@joseph-wakeling-sociomantic

If you don't know whether Ziggurat yields better results, then we do not need Ziggurat.

The existing academic literature would suggest that Ziggurat is a very effective method; the paper by Thomas et al. offers a variety of tests of statistical quality that could be used, IIRC.

The question is, given the timelines, whether it is worth pushing on with Ziggurat, or whether it would be better to write basic implementations of the various distributions and return to Ziggurat as longer-term work.

@9il
Member

9il commented Aug 5, 2016

The question is, given the timelines, whether it is worth pushing on with Ziggurat, or whether it would be better to write basic implementations of the various distributions and return to Ziggurat as longer-term work.

The first question is: is Ziggurat better for basic distributions than basic implementations of them? If there is no strong yes, then the next step after Tinflex is basic implementations.

@wilzbach
Member Author

wilzbach commented Aug 5, 2016

The existing academic literature would suggest that Ziggurat is a very effective method; the paper by Thomas et al. offers a variety of tests of statistical quality that could be used, IIRC.

Yes, sorry, I should have been more precise. @9il, what would it take to convince you that Ziggurat is better than the algorithms in Atmosphere or dstats?
Is a chi-squared test (that's what they use to evaluate goodness of fit) OK, or should I also do the more complex high-sigma test?
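For reference, a minimal D sketch of such a chi-squared goodness-of-fit check: bin the samples, compare the observed counts O with the expected counts E derived from the standard normal CDF (`std.mathspecial.normalDistribution`), and sum (O - E)^2 / E. A Box-Muller sampler stands in for the generator under test (it is not the Ziggurat); the bin edges, seed and sample size are arbitrary assumptions. With 10 bins the statistic has 9 degrees of freedom, so values well above about 28 (the 99.9% quantile) would be suspicious.

```d
import std.math : sqrt, log, cos, PI;
import std.mathspecial : normalDistribution;
import std.random : Random, uniform;
import std.stdio : writefln;

/// Box-Muller stand-in for the sampler under test (NOT the ziggurat).
double boxMuller(ref Random rng)
{
    immutable u1 = uniform!"(]"(0.0, 1.0, rng); // (0, 1], so log(u1) is finite
    immutable u2 = uniform(0.0, 1.0, rng);
    return cast(double)(sqrt(-2 * log(u1)) * cos(2 * PI * u2));
}

/// Pearson chi-squared statistic of `samples` against the standard normal.
/// `edges` are the interior bin edges; two open-ended outer bins are added.
double chiSquaredNormal(const double[] samples, const double[] edges)
{
    auto observed = new size_t[edges.length + 1];
    foreach (s; samples)
    {
        size_t bin = 0;
        while (bin < edges.length && s >= edges[bin]) ++bin;
        ++observed[bin];
    }
    double stat = 0;
    foreach (bin, o; observed)
    {
        // Probability mass of this bin under the standard normal CDF.
        immutable lo = bin == 0 ? 0.0 : cast(double) normalDistribution(edges[bin - 1]);
        immutable hi = bin == edges.length ? 1.0 : cast(double) normalDistribution(edges[bin]);
        immutable expected = samples.length * (hi - lo);
        stat += (o - expected) ^^ 2 / expected;
    }
    return stat;
}

void main()
{
    auto rng = Random(42);
    auto samples = new double[100_000];
    foreach (ref s; samples) s = boxMuller(rng);
    immutable edges = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0];
    writefln("chi^2 = %.3f with %s degrees of freedom",
             chiSquaredNormal(samples, edges), edges.length);
}
```

Replacing `boxMuller` with the Ziggurat sampler (and the dstats/Atmosphere ones) would give directly comparable numbers.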

The question is, given the timelines, whether it is worth pushing on with Ziggurat, or whether it would be better to write basic implementations of the various distributions and return to Ziggurat as longer-term work.

We have time at least until October. I would prefer to go with the "better" algorithm and tune it. We already have the basic implementations in dstats, against which we can benchmark and compare.

@wilzbach
Member Author

wilzbach commented Aug 5, 2016

The first question is: is Ziggurat better for basic distributions than basic implementations of them? If there is no strong yes, then the next step after Tinflex is basic implementations.

Quote from my comment above:

btw, an interesting overview paper is "Gaussian Random Number Generators" by Thomas et al.
Summary: the Wallace method is the fastest, but doesn't provide good statistical quality. Ziggurat was the second-fastest method in their large benchmark, while passing the chi-squared test and achieving good scores on the high-sigma test (see Tables 3 and 4).

@9il
Member

9il commented Aug 5, 2016

btw, an interesting overview paper is "Gaussian Random Number Generators" by Thomas et al.
Summary: the Wallace method is the fastest, but doesn't provide good statistical quality. Ziggurat was the second-fastest method in their large benchmark, while passing the chi-squared test and achieving good scores on the high-sigma test (see Tables 3 and 4).

The Wallace method is not specialised for the normal distribution, if I am not mistaken.

@wilzbach
Member Author

wilzbach commented Aug 5, 2016

The Wallace method is not specialised for the normal distribution, if I am not mistaken.

AFAIK, Ziggurat isn't either. In the literature it's commonly used only for the normal and exponential distributions; however, it's a general method that works for all distributions with a monotone decreasing density (or whose symmetric half is monotone decreasing).
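To illustrate that generality, a minimal D sketch (hypothetical names, not the PR's `Ziggurat` template) of the table setup for any density f that is monotone decreasing on [0, ∞): every block has area v = r·f(r) + ∫_r^∞ f(x) dx, and the block edges follow from x_{i-1} = f⁻¹(v/x_i + f(x_i)) starting at x_{n-1} = r, as in [Marsaglia00]. Here it is instantiated for the exponential density f(x) = exp(-x) with 256 blocks, assuming the paper's constants r = 7.697117470131487 and v = 3.949659822581572e-3; if they are consistent, both the recomputed v and the area of the top block should match v.

```d
import std.math : exp, log, abs;
import std.stdio : writefln;

/// Area v of every block, given the right edge r of the base block:
/// rectangle [0, r] x [0, f(r)] plus the tail integral beyond r.
double layerArea(alias f, alias tail)(double r)
{
    return r * f(r) + tail(r);
}

/// Block edges for a monotone decreasing density f on [0, inf), computed
/// inward from x[n-1] = r via x[i-1] = fInv(v / x[i] + f(x[i])).
/// x[0] would belong to the base block (rectangle plus tail); it stays NaN here.
double[] layerEdges(alias f, alias fInv)(double r, double v, size_t n)
{
    auto x = new double[n];   // D default-initialises doubles to NaN
    x[n - 1] = r;
    foreach_reverse (i; 2 .. n)
        x[i - 1] = fInv(v / x[i] + f(x[i]));
    return x;
}

// Exponential density f(x) = exp(-x): inverse -log(y), tail integral exp(-r).
double expPdf(double x) { return exp(-x); }
double expInvPdf(double y) { return cast(double)(-log(y)); }
double expTail(double r) { return exp(-r); }

void main()
{
    enum double r = 7.697117470131487;     // 256 blocks, Marsaglia & Tsang (2000)
    enum double v = 3.949659822581572e-3;
    writefln("v recomputed from r: %.15g (paper: %.15g)",
             layerArea!(expPdf, expTail)(r), v);
    auto x = layerEdges!(expPdf, expInvPdf)(r, v, 256);
    // The top block [0, x[1]] x [f(x[1]), f(0)] must also have area v.
    writefln("top block area:      %.15g", x[1] * (expPdf(0) - expPdf(x[1])));
}
```

Swapping in f(x) = exp(-x²/2), its inverse sqrt(-2 ln y) and the normal tail gives the normal table with the same two functions.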

@joseph-wakeling-sociomantic

The first question is: is Ziggurat better for basic distributions than basic implementations of them?

I would say, "Yes, but." The "but" is because Ziggurat is more complicated to implement correctly (as we're learning). So, in terms of the current project, I would say there's a tradeoff between focusing on getting Ziggurat correct versus getting a good variety of basic distributions in place with simpler (but more limited) algorithms.

@joseph-wakeling-sociomantic

however, it's a general method that works for all distributions with a monotone decreasing density (or whose symmetric half is monotone decreasing)

Yes, exactly.

@wilzbach wilzbach mentioned this pull request Aug 10, 2016
@WebDrake

@wilzbach:

We have time at least until October.

The emails I'm getting from GSoC suggest that we're supposed to be finished up by the end of August, with 23 August as your own deadline for finalizing code and 29 August as the deadline for Ilya and me to submit our final evaluation report?
https://developers.google.com/open-source/gsoc/timeline

@wilzbach
Member Author

The emails I'm getting from GSoC suggest that we're supposed to be finished up by the end of August, with 23 August as your own deadline for finalizing code and 29 August as the deadline for Ilya and me to submit our final evaluation report?
https://developers.google.com/open-source/gsoc/timeline

Yep, but that doesn't stop me from continuing to work (I know that I wasted quite a lot of time).
My submission will only include the (Tin)flex algorithm, but I am still on a mission to write (the building blocks for) a new & fast std.random for D. Hence I suggested doing it properly, and as the benchmark in #286 suggests, it's worth it.

@WebDrake

Yep, but that doesn't stop me from continuing to work (I know that I wasted quite a lot of time)

It's great that you want to keep working, but I was concerned about the expectations raised in the description of your GSoC project and what the people responsible might expect to see (which is why earlier I raised the possibility of doing some basic implementations of a variety of non-uniform distributions).

@9il you're the primary mentor here, so what are your thoughts?

@9il
Member

9il commented Aug 10, 2016

@9il you're the primary mentor here, so what are your thoughts?

We already have two general-purpose discrete RNG implementations, and Tinflex will be ready soon.

Tinflex is a hard numerical project without obvious workforce requirements. The R version contains a lot of numerical bugs, many of which are fixed in this project. We can stamp out / copy-paste Boost's RNG code, and that is not a problem. But copy-pasting has nothing to do with generating proper random numbers. First, we need a proper wrapper over std.random and fixed uniform generators. @wilzbach expected that he would add them during GSoC; at the same time, I didn't expect it.

If @wilzbach implemented a variety of non-uniform distributions but not Tinflex, then I would not be able to consider this GSoC project as finished.

@WebDrake
Copy link

If @wilzbach implemented a variety of non-uniform distributions but not Tinflex, then I would not be able to consider this GSoC project as finished.

I agree with that; I'm asking whether we should implement a variety of non-uniform distributions instead of (short term) focusing on Ziggurat. Tinflex obviously takes primacy as it is the most significant part of the promised work.

@9il
Member

9il commented Aug 10, 2016

I agree with that; I'm asking whether we should implement a variety of non-uniform distributions instead of (short term) focusing on Ziggurat. Tinflex obviously takes primacy as it is the most significant part of the promised work.

Yes, I prefer to add a wider variety of non-uniform distributions instead of Ziggurat.

@9il
Member

9il commented Nov 27, 2016

Please reopen this in mir-random.

@9il 9il closed this Nov 27, 2016
5 participants