mir.random.nonuniform: Add Ziggurat method for Normal & Exponential #261
}
};

return Ziggurat!(T, fallback, R, true)(pdf, invPdf, 128, rightEnd, T(9.91256303526217e-3));
wilzbach
Jul 19, 2016
Author
Member
I would actually like to run the initialization in CTFE, as it will never change, but `exp` uses inline assembler, which isn't supported in CTFE :/
Does anyone have an idea?
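One possible workaround until Phobos offers a CTFE-able `exp`: a small pure-D fallback used only for table initialisation. The sketch below is hypothetical (`ctfeExp` is not a Phobos or mir function). It reduces the argument by multiples of ln 2 and sums a Taylor series, so it contains no inline assembler and can be evaluated at compile time; it is accurate to roughly double precision for moderate arguments, which is all a precomputed table needs.

```d
/// Hypothetical CTFE-friendly exp, meant only for precomputing tables.
/// Writes x = k*ln2 + r with |r| <= ln2/2, so exp(x) = 2^k * exp(r),
/// and sums the Taylor series of exp(r).
double ctfeExp(double x) pure nothrow @safe @nogc
{
    enum double ln2 = 0.6931471805599453;
    // nearest integer multiple of ln2 (round half away from zero)
    long k = cast(long)(x / ln2 + (x >= 0 ? 0.5 : -0.5));
    double r = x - k * ln2;

    // Taylor series of exp(r); 25 terms is ample for |r| <= 0.35
    double term = 1.0, sum = 1.0;
    foreach (n; 1 .. 25)
    {
        term *= r / n;
        sum += term;
    }

    // scale by 2^k using plain multiplication/division (CTFE-safe)
    if (k >= 0)
        foreach (_; 0 .. k) sum *= 2.0;
    else
        foreach (_; 0 .. -k) sum /= 2.0;
    return sum;
}

// An enum initialiser forces compile-time evaluation (CTFE).
enum double eAtCompileTime = ctfeExp(1.0);
static assert(eAtCompileTime > 2.718281828459 && eAtCompileTime < 2.718281828460);
```

Once a CTFE-able `exp` is available upstream, such a helper can simply be dropped.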
joseph-wakeling-sociomantic
Jul 19, 2016
Well, start by filing an issue against Phobos asking for a CTFE'able `exp`.
BTW, where on earth does this magic constant `9.91256...e-3` come from? I would suggest making it a named manifest constant.
wilzbach
Jul 19, 2016
Author
Member
> Well, start by filing an issue against Phobos asking for a CTFE'able `exp`.

OK, thanks - done. I will test whether copying the non-inline version from Phobos works.

> BTW, where on earth does this magic constant `9.91256...e-3` come from? I would suggest making it a named manifest constant.

It also comes from [Marsaglia00], where it's called `v`. It's the area of every block and thus depends on `k` and the distribution.
-> I will declare it more explicitly.
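A sketch of what the named constants could look like, tied to their defining equation. In Marsaglia & Tsang's construction every layer has the same area v = r * f(r) + (integral of f from r to infinity), where r is the right end of the base layer and f the unnormalised density. For the exponential, f(x) = exp(-x), the tail integral is exp(-r), so v = (r + 1) * exp(-r); the code checks this against the published values for 256 layers. (For the normal, f(x) = exp(-x^2/2) and the tail integral is sqrt(pi/2) * erfc(r/sqrt(2)); the published pair for 128 layers is r = 3.442619855899, v = 9.91256303526217e-3.) The constant and function names are hypothetical.

```d
import std.math : exp, fabs;

// Hypothetical named constants for the exponential ziggurat with 256 layers;
// the values are those published in [Marsaglia00].
enum size_t expZigguratLayers = 256;
enum double expZigguratRightEnd = 7.697117470131487;    // r: start of the tail
enum double expZigguratLayerArea = 3.949659822581572e-3; // v: area of each layer

/// Area of one layer for f(x) = exp(-x): the base rectangle r * f(r)
/// plus the tail integral of exp(-x) from r to infinity, which is exp(-r).
double expLayerArea(double r)
{
    return cast(double)(r * exp(-r) + exp(-r));
}

/// True if the published r and v satisfy v = (r + 1) * exp(-r) within tol.
bool publishedConstantsConsistent(double tol = 1e-9)
{
    return fabs(expLayerArea(expZigguratRightEnd) - expZigguratLayerArea) < tol;
}
```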
/// precalculate scaling to R.max for x_i
T[] xScaled;
/// precalculate pdf value for x_i
T[] fs;
wilzbach
Jul 19, 2016
Author
Member
"An Improved Ziggurat Method to Generate Normal Random Samples" argues that storing the `fs` isn't necessary, which I can't follow yet:

> DRanU() returns a uniform random number, U (0, 1), and IRanU() returns 32-bit unsigned random integer. DRanNormalTail is implemented as a separate function: it gets called only rarely, so that efficiency does not matter. For the same reason, it is not necessary to avoid a call to exp() when checking for the wedges (this could be achieved by precomputing the function values f(x_i)). (page 7)
IMHO storing them doesn't waste much space and saves time.
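For reference, the tables follow from the recurrence in [Marsaglia00]: starting at the tail boundary x_1 = r, each next boundary solves f(x_{i+1}) = v / x_i + f(x_i), with a virtual x_0 = v / f(r) standing in for the base layer. Storing fs[i] = f(x[i]) alongside costs one extra array of k + 1 values (257 doubles, about 2 KiB, for k = 256), and the wedge test then only has to evaluate f at the sampled point. The struct below is a hypothetical standalone builder for the exponential case, not the PR's code.

```d
import std.math : exp, log;

/// Hypothetical builder for exponential ziggurat tables, f(x) = exp(-x) and
/// f^-1(y) = -log(y), following the recurrence in [Marsaglia00]:
/// x[0] = v / f(r) (virtual base width), x[1] = r, then
/// f(x[i]) = v / x[i - 1] + f(x[i - 1]), and finally x[k] = 0 (the peak).
struct ExpZigguratTables
{
    double[] x;  // layer boundaries, strictly decreasing
    double[] fs; // fs[i] = f(x[i]); stored so no boundary needs exp() later

    this(size_t k, double r, double v)
    {
        x = new double[k + 1];
        fs = new double[k + 1];
        x[0] = cast(double)(v / exp(-r));
        x[1] = r;
        foreach (i; 2 .. k)
            x[i] = cast(double) -log(v / x[i - 1] + exp(-x[i - 1]));
        x[k] = 0;
        foreach (i, xi; x)
            fs[i] = cast(double) exp(-xi);
    }

    /// Wedge test for the layer between x[i + 1] and x[i]: a height drawn
    /// uniformly between fs[i] and fs[i + 1] (u in [0, 1)) is accepted if it
    /// lies under the curve. Only exp(-xs) at the sample point is evaluated.
    bool underCurve(size_t i, double xs, double u) const
    {
        return fs[i] + u * (fs[i + 1] - fs[i]) < exp(-xs);
    }
}
```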
T function(T x) invPdf;

/// precalculate difference x_i / x_{i+1}
T[] xDiv;
wilzbach
Jul 19, 2016
Author
Member
As the array size is known at compile time, should we use `T[k]`?
joseph-wakeling-sociomantic
Jul 19, 2016
Don't see why not, but right now `k` is not provided at compile time AFAICS?
wilzbach
Jul 20, 2016
Author
Member
> Don't see why not, but right now `k` is not provided at compile time AFAICS?

Well, it's not strictly needed at compile time (in contrast to the other compile-time parameters), but I don't see any use case where one would want to choose the block size at runtime, as `rightEnd` and `averageArea` need to be precomputed by hand anyway.
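Since `rightEnd` and `averageArea` are tied to one particular k anyway, making k a template parameter costs little and allows fixed-size tables stored inline, plus a compile-time check that the `u & kMask` indexing trick is valid (it only selects layers uniformly when k is a power of two). A minimal sketch; the struct name and layout are hypothetical:

```d
/// Hypothetical fixed-size table layout with the layer count k known at
/// compile time; the T[k + 1] arrays live inline (no GC allocation).
struct ZigguratTables(T, size_t k)
{
    // `u & kMask` only picks a uniformly distributed layer if k is a power of 2
    static assert(k >= 2 && (k & (k - 1)) == 0, "k must be a power of two");
    enum size_t kMask = k - 1;

    T[k + 1] x;  // layer boundaries x_0 .. x_k
    T[k + 1] fs; // precomputed pdf values f(x_i)

    /// Map a raw random word to a layer index in [0, k).
    static size_t layerIndex(ulong word) pure nothrow @nogc @safe
    {
        return cast(size_t)(word & kMask);
    }
}

// Both tables are stored inline: 2 * 257 doubles for k = 256.
static assert(ZigguratTables!(double, 256).sizeof == 2 * 257 * double.sizeof);
```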
Marsaglia, George, and Wai Wan Tsang. "The ziggurat method for generating random variables."
Journal of Statistical Software 5.8 (2000): 1-7.
*/
struct Ziggurat(T, string _fallback, R = uint, bool bothSides)
joseph-wakeling-sociomantic
Jul 19, 2016
Instead of `R` I would use `UIntType` (it matches the typical template-parameter name in both Phobos `std.random` and C++11 `<random>` for the word type of the uniform RNG).
However, in this case, is there any possibility to avoid needing the word type to be known at compile time?
joseph-wakeling-sociomantic
Jul 19, 2016
Also, note that while a dependency on the word size is potentially more flexible, it's worth considering also just templating the ziggurat on the actual RNG type.
wilzbach
Jul 20, 2016
Author
Member
> However, in this case, is there any possibility to avoid needing the word type to be known at compile time?

Well, you already proposed it.

> Also, note that while a dependency on the word size is potentially more flexible, it's worth considering also just templating the ziggurat on the actual RNG type.

Hmm, while I understand the motivation of providing a simple API for the user, this would create even more template bloat (a separate Ziggurat instantiation for every RNG type, i.e. RNG templates nested in the Ziggurat template) and limit the API more than needed. Hence I think making `opCall` generic over the RNG is probably the better way to go.
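A sketch of that direction, which also covers the word-type validation discussed below: the ziggurat is templated only on the floating-point type and the word type (named `UIntType` here, following Phobos), and `opCall` is generic over the generator, with a constraint rejecting generators whose word type doesn't match. `isUniformRNG` comes from `std.random`; the struct name and its placeholder body are hypothetical.

```d
import std.random : isUniformRNG, Mt19937, Mt19937_64;
import std.range.primitives : ElementType;
import std.traits : isUnsigned;

/// Hypothetical ziggurat shell: templated on T and UIntType only,
/// with opCall generic over the random number generator.
struct ZigguratSketch(T, UIntType)
    if (isUnsigned!UIntType && UIntType.sizeof >= 4)
{
    /// Only generators producing UIntType words are accepted, so a mismatch
    /// is a compile-time error instead of silent truncation or widening.
    T opCall(RNG)(ref RNG gen) const
        if (isUniformRNG!RNG && is(ElementType!RNG == UIntType))
    {
        UIntType u = gen.front;
        gen.popFront();
        // Placeholder body: the top 24 bits as a value in [0, 1). A real
        // ziggurat would split u into a layer index and a position instead.
        enum shift = UIntType.sizeof * 8 - 24;
        return cast(T)(u >> shift) / (1 << 24);
    }
}
```

With this shape, one `ZigguratSketch!(double, uint)` works with any 32-bit generator, and the RNG type never appears in the ziggurat's own type.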
}

/// samples a value from the discrete distribution using a custom random generator
T opCall(RNG)(ref RNG gen) const
joseph-wakeling-sociomantic
Jul 19, 2016
Assuming you do need to know the word type (your template parameter `R`) at compile time, you might want to validate that `typeof(gen.front)` matches it.
Note that in principle at least it's possible to handle a different `R` and generator return type: if `R`'s size is a multiple of the generator's word size, then you can populate it by several calls to the generator; conversely, if `R` is smaller than the generator's word size, you can use a single RNG variate to provide several of the needed values. But this may be adding excessive complexity.
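For the first case (a wider word than the generator produces), the "several calls" approach could look like the sketch below. The helper name is hypothetical, and it assumes the generator's word size divides the target size (e.g. a 64-bit word from a 32-bit generator).

```d
import std.random : isUniformRNG, Mt19937;
import std.range.primitives : ElementType;

/// Hypothetical helper: assemble one Wide word from consecutive outputs of a
/// generator with a narrower (or equal) word type, most significant chunk first.
Wide nextWord(Wide, RNG)(ref RNG gen)
    if (isUniformRNG!RNG && Wide.sizeof % typeof(RNG.init.front).sizeof == 0)
{
    alias Narrow = ElementType!RNG;
    Wide result = 0;
    foreach (_; 0 .. Wide.sizeof / Narrow.sizeof)
    {
        static if (Wide.sizeof > Narrow.sizeof)
            result = cast(Wide)((result << (Narrow.sizeof * 8)) | gen.front);
        else
            result = gen.front; // same width: a single call, no shift needed
        gen.popFront();
    }
    return result;
}
```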
wilzbach
Jul 20, 2016
Author
Member
> Assuming you do need to know the word type (your template parameter `R`) at compile time, you might want to validate that `typeof(gen.front)` matches it.

Nice idea (done).

> Note that in principle at least it's possible to handle a different `R` and generator return type: if `R`'s size is a multiple of the generator's word size, then you can populate it by several calls to the generator; conversely, if `R` is smaller than the generator's word size, you can use a single RNG variate to provide several of the needed values. But this may be adding excessive complexity.

Good point - how common is it to have something different than `uint` or `ulong`?
joseph-wakeling-sociomantic
Jul 20, 2016
> Good point - how common is it to have something different than `uint` or `ulong`?

Not common, I think, at least not these days.
import mir.internal.math : exp, log;

auto pdf = (T x) => cast(T) exp(-x);
auto invPdf = (T x) => cast(T) -log(x);
joseph-wakeling-sociomantic
Jul 19, 2016
I'm not sure I like the explicit casts. Surely it's possible to get the same result here without them?
wilzbach
Jul 20, 2016
Author
Member
(I don't know why my comment wasn't saved.) The problem is that `exp` only accepts and returns `real`.
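One way to keep those casts from spreading through every lambda is to confine them to a single pair of helpers that widen to `real`, call the real-only function, and narrow back to `T`. `expT` and `logT` are hypothetical names; `std.math.exp`/`log` stand in for the `mir.internal.math` versions here.

```d
import std.math : exp, log;
import std.traits : isFloatingPoint;

/// Hypothetical wrappers: widen to real, evaluate, narrow back to T, so the
/// pdf / inverse-pdf definitions can be written without explicit casts.
T expT(T)(T x) if (isFloatingPoint!T) { return cast(T) exp(cast(real) x); }
T logT(T)(T x) if (isFloatingPoint!T) { return cast(T) log(cast(real) x); }

// The exponential pdf and its inverse, valid for any floating-point T
T expPdf(T)(T x) { return expT(-x); }
T expInvPdf(T)(T y) { return -logT(y); }
```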
joseph-wakeling-sociomantic
Aug 5, 2016
Just as the `normal` implementation looks to be missing mean + variance, here the exponential implementation is missing its λ (rate) control parameter.
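The rate parameter can sit on top of a standard sampler, because the ziggurat only has to handle the unit shape: if E ~ Exp(1), then E / λ ~ Exp(λ). A sketch with a hypothetical `Exponential` struct; for self-containment the standard sampler is plain inversion, -log(U) with U uniform on (0, 1), standing in for the ziggurat.

```d
import std.math : log;
import std.random : Mt19937, uniform;

/// Hypothetical exponential distribution with rate lambda, built on a
/// standard Exp(1) sampler: if E ~ Exp(1) then E / lambda ~ Exp(lambda).
struct Exponential(T)
{
    T lambda = 1;

    T opCall(RNG)(ref RNG gen) const
    {
        return standardExp(gen) / lambda;
    }

    // Stand-in for the ziggurat: inversion with U on the open interval (0, 1)
    private static T standardExp(RNG)(ref RNG gen)
    {
        return cast(T) -log(uniform!"()"(T(0), T(1), gen));
    }
}
```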
}
};

return Ziggurat!(T, fallback, R, false)(pdf, invPdf, 256, T(7.697117470131487), T(3.949659822581572e-3));
joseph-wakeling-sociomantic
Jul 19, 2016
While calculations using `pdf` and `invPdf` can only be done at runtime for now (because of the `exp` implementation issues), is there any reason why the actual lambdas can't be provided as template parameters? It would make for a more logical design, I think (and also be future-proof against the point when you get a CTFE'able `exp`).
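Passing the density and its inverse as alias template parameters might look roughly like the sketch below (all names hypothetical). The functions become part of the type, calls can be inlined, and once a CTFE-able `exp` exists the same aliases could drive a compile-time table builder.

```d
import std.math : exp, log, sqrt;

/// Hypothetical: pdf and inverse pdf as alias template parameters instead
/// of runtime function pointers stored in the struct.
struct ZigguratShape(T, alias pdf, alias invPdf)
{
    static T density(T x) { return pdf(x); }
    static T inverse(T y) { return invPdf(y); }
}

// Unnormalised standard normal: f(x) = exp(-x^2 / 2), f^-1(y) = sqrt(-2 log y)
alias NormalShape = ZigguratShape!(double,
    x => cast(double) exp(-0.5 * x * x),
    y => cast(double) sqrt(-2 * log(y)));
```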
joseph-wakeling-sociomantic
Jul 19, 2016
I would also suggest defining explicit templates for `Normal(T)`, `Exponential(T)` etc. to wrap these `Ziggurat` instantiations.
joseph-wakeling-sociomantic
Jul 19, 2016
An alternative would be to define actual `Normal(T)` and `Exponential(T)` wrapper structs that just use a `Ziggurat` instance internally; that might also give you a more future-proof API should you ever wish to rework the internals.
That might be better, because it'll probably be easier for users to debug stuff if they see a type called `Normal!double` instead of `Ziggurat!(LOTS, OF, DIFFERENT, STUFF)`.
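A wrapper of that kind might look like the following sketch. `Normal` here is hypothetical, and a Box-Muller transform stands in for the ziggurat so the example is self-contained; the point is that users see `Normal!double` with mean and standard deviation in the API, while the sampling engine stays private and replaceable.

```d
import std.math : cos, log, sqrt, PI;
import std.random : Mt19937, uniform;

/// Hypothetical public type: mean/stddev in the API, engine hidden inside.
struct Normal(T)
{
    T mean = 0;
    T stddev = 1;

    T opCall(RNG)(ref RNG gen) const
    {
        return mean + stddev * engine(gen);
    }

    // Stand-in engine (Box-Muller, using one of its two outputs); a ziggurat
    // instance could replace this without changing the public API.
    private static T engine(RNG)(ref RNG gen)
    {
        T u1 = uniform!"(]"(T(0), T(1), gen); // (0, 1] keeps log(u1) finite
        T u2 = uniform!"[)"(T(0), T(1), gen);
        return cast(T)(sqrt(-2 * log(u1)) * cos(2 * PI * u2));
    }
}
```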
wilzbach
Jul 19, 2016
Author
Member
I put everything that is absolutely needed at compile time as a template argument.
Does it make a huge difference? I thought in the end we can just use `enum z = Ziggurat!..`
On July 19, 2016 2:15:03 PM GMT+02:00, Joseph Wakeling notifications@github.com wrote:

    import mir.internal.math : exp, log;
    auto pdf = (T x) => cast(T) exp(-x);
    auto invPdf = (T x) => cast(T) -log(x);
    // values from [Marsaglia00]
    enum fallback = q{
        T fallback(RNG)(ref RNG gen) const
        {
            import std.random : uniform;
            auto u = uniform!("[]", T, T)(0, 1, gen);
            return 7.69711 - u;
        }
    };
    return Ziggurat!(T, fallback, R, false)(pdf, invPdf, 256,
        T(7.697117470131487), T(3.949659822581572e-3));

> While calculations using pdf and invPdf can only be done at runtime for now (because of the exp implementation issues), is there any reason why the actual lambdas can't be provided as template parameters? It would make for a more logical design, I think (and also be future-proof against the point when you get a CTFE'able exp).

https://github.com/libmir/mir/pull/261/files/6a6565f321a80c67b95a7657bf55cc77989fa09c#r71325812
joseph-wakeling-sociomantic
Jul 19, 2016
It makes less of a difference if the `Ziggurat` instance is wrapped away inside a type that's explicitly `Normal` or `Exponential` or whatever. But if you're set on returning a raw `Ziggurat` instantiation from your `normal` and `exponential` factory functions, it might be useful to have the PDF and inverse PDF clearly there in the template parameters.
wilzbach
Jul 20, 2016
Author
Member
> An alternative would be to define actual `Normal(T)` and `Exponential(T)` wrapper structs that just use a `Ziggurat` instance internally; that might also give you a more future-proof API should you ever wish to rework the internals.

I really like the idea of having a broad `Normal(T, UIntType)` API, but with my current attempt (see below - for some reason the PR update didn't go through yesterday) it is yet another function call.
Should I build a Ziggurat struct with mixins?
Current coverage is 96.74% (diff: 97.77%)

@@           master   #261   diff @@
==========================================
  Files            19     21     +2
  Lines          3262   3874   +612
  Methods           0      0
  Messages          0      0
  Branches          0      0
==========================================
+ Hits           3149   3748   +599
- Misses          113    126    +13
  Partials          0      0
(I will post more summaries here soon; here's a brief overview.)

Design flaws of the Ziggurat algorithm. Two main critiques:
An Improved Ziggurat Method to Generate Normal Random Samples
BTW, an interesting overview paper is "Gaussian Random Number Generators" by Thomas et al.
Authors: Sebastian Wilzbach
*/

module mir.random.nonuniform;
joseph-wakeling-sociomantic
Aug 5, 2016
Minor, but didn't we agree to create a `mir.random.distribution` package to contain everything...?
gen.popFront();

// TODO: this is a bit biased
size_t i = u & kMask;
joseph-wakeling-sociomantic
Aug 5, 2016
Given your `TODO` here, I assume this is the first place you're looking to understand why the plotted distributions don't look quite right...?
wilzbach
Aug 5, 2016
Author
Member
Yes. Resolving whether we can reuse the random bits or need to generate another 1/4 random variable (one of the optimizations described in later papers) shouldn't matter so much :/
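One way to reuse the bits of a single word without any overlap: take the layer index and the uniform position from disjoint bit ranges, so that for an ideal generator they are independent. The sketch below assumes a 64-bit word and k = 256 layers; the function name and layout are hypothetical. The low 8 bits select the layer and the top 53 bits give a double in [0, 1).

```d
/// Hypothetical split of one 64-bit random word into a layer index
/// (low 8 bits, for k = 256 layers) and a uniform double in [0, 1)
/// taken from the top 53 bits. The two parts share no bits.
void splitWord(ulong word, out size_t layer, out double position)
    pure nothrow @nogc @safe
{
    enum size_t kMask = 255;             // k - 1 for k = 256
    layer = cast(size_t)(word & kMask);
    position = (word >> 11) * 0x1.0p-53; // 53 bits -> [0, 1)
}
```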
T x, y, u;
do
{
    u = uniform!("[]", T, T)(0, 1, gen);
joseph-wakeling-sociomantic
Aug 5, 2016
Marsaglia & Tsang talk about `UNI` (in their paper) being a generator of "uniform (0, 1) variates". That would suggest sampling over the open rather than the closed interval, i.e. `uniform!"()"` rather than `uniform!"[]"`. Does that make a difference to your distribution results?
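The open interval matters wherever the algorithm takes log(U): with a closed interval, U = 0 gives log(0) = -inf. For reference, the tail procedures described in [Marsaglia00] are sketched below. The exponential tail uses memorylessness (the tail beyond r is r + Exp(1), i.e. r - log(U)); note that a fallback returning `7.69711 - u`, as in the exponential code quoted in this thread, produces values below r rather than in the tail. The normal tail uses Marsaglia's 1964 rejection method. Function names are hypothetical; only `std.random.uniform` and `std.math.log` are library calls.

```d
import std.math : log;
import std.random : Mt19937, uniform;

/// Exponential tail beyond r: by memorylessness, (X | X > r) has the law of
/// r + Exp(1). U comes from the open interval (0, 1), so log(U) is finite.
double exponentialTail(RNG)(ref RNG gen, double r)
{
    return r - cast(double) log(uniform!"()"(0.0, 1.0, gen));
}

/// Normal tail beyond r (Marsaglia's 1964 method, as given in [Marsaglia00]):
/// draw x = -log(U1) / r and y = -log(U2) until 2y > x^2, then return r + x.
double normalTail(RNG)(ref RNG gen, double r)
{
    double x, y;
    do
    {
        x = -cast(double) log(uniform!"()"(0.0, 1.0, gen)) / r;
        y = -cast(double) log(uniform!"()"(0.0, 1.0, gen));
    }
    while (y + y <= x * x);
    return r + x;
}
```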

// TODO: this is a bit biased
size_t i = u & kMask;
//size_t i = uniform!("[)", size_t, size_t)(0, kMask, gen);
joseph-wakeling-sociomantic
Aug 5, 2016
Minor: note that `uniform!T` should give a uniform distribution across all possible values of an integral type `T`. But in this case it's arguably unnecessary.
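For k = 256 the layer index could come straight from `uniform!ubyte(gen)`, which covers all 256 values uniformly. Separately, the commented-out `uniform!("[)", size_t, size_t)(0, kMask, gen)` excludes its upper bound, so (assuming kMask == k - 1) the top index would never be drawn; the bound needs to be k with `"[)"`. A sketch with hypothetical function names:

```d
import std.random : Mt19937, uniform;

/// Hypothetical layer selection for k = 256: uniform!ubyte covers 0 .. 255.
size_t layerIndex(RNG)(ref RNG gen)
{
    return uniform!ubyte(gen);
}

/// Bounded form for general k: "[)" excludes the upper bound, so pass k.
size_t layerIndexBounded(RNG)(ref RNG gen, size_t k)
{
    return uniform!"[)"(size_t(0), k, gen);
}
```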
import mir.internal.math : exp, log, sqrt;

auto pdf = (T x) => cast(T) exp(T(-0.5) * x * x);
auto invPdf = (T x) => cast(T) sqrt(T(-2) * log(x));
joseph-wakeling-sociomantic
Aug 5, 2016
Aren't these definitions missing the mean and variance parameters of the normal distribution? I recognize that one can generate any normal distribution once one has variates from N(0, 1), but surely that should be baked in to the implementation?
Also: even allowing for N(0, 1), isn't the PDF missing the divisor \sqrt{2\pi}...? And presumably this means the inverse PDF is also missing something correspondingly?
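On the normalisation point, a sketch of the reasoning: the ziggurat only needs f up to a constant factor. Scaling f by c scales every layer area and every "height under the curve" comparison by the same c, so the unnormalised f(x) = exp(-x^2/2) with f(0) = 1 (which puts the top of the ziggurat at height 1) yields the same samples as the normalised density, and the inverse of that unnormalised f is the matching invPdf. Mean and standard deviation are then an affine map of a standard variate. The function names below are hypothetical.

```d
import std.math : exp, fabs, sqrt, PI;

/// Unnormalised standard normal density used by the ziggurat; f(0) == 1,
/// so the top of the ziggurat sits at height 1.
double f(double x) { return cast(double) exp(-0.5 * x * x); }

/// Normalised N(0, 1) density: the same shape divided by sqrt(2 * pi).
double phi(double x) { return cast(double)(f(x) / sqrt(2 * PI)); }

/// The two differ only by the constant sqrt(2 * pi), so comparisons of a
/// height against the curve are equivalent (up to rounding) when the height
/// is scaled by the same constant; the factor never needs to appear.
bool sameShape(double x) { return fabs(f(x) / phi(x) - sqrt(2 * PI)) < 1e-12; }

/// Mean and standard deviation are applied afterwards as an affine map.
double toNormal(double z, double mean, double stddev) { return mean + stddev * z; }
```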
Does the Ziggurat method yield better results for the normal distribution than the other methods in Atmosphere or dstats?
What's the best way to compare?
The NormalDist CDF looks good as far as I can judge; Exp doesn't.
If you don't know whether Ziggurat yields better results, then we do not need Ziggurat.
There is no single best way. The best option is to read a couple of articles. We first need to understand whether we need Ziggurat at all, before spending time on it.
The existing academic literature would suggest that Ziggurat is a very effective method; the paper by Thomas et al. offers a variety of tests of statistical quality that could be used, IIRC. The question is, given the timelines, is it worth pushing on with Ziggurat, or would it be better to implement more basic implementations of the various distributions, and return to Ziggurat as a longer-term work?
The first question is whether Ziggurat is better for the basic distributions than basic implementations of them. If there is no strong
Yes, sorry, I should have been more precise. @9il, what would be needed to convince you that Ziggurat is better than the algorithms in Atmosphere or dstats?
We have time at least until October. I would prefer to go with the "better" algorithm and tune it. We already have the basic implementations in dstats, against which we can benchmark and compare.
Quote from my comment above:
The Wallace method is not specialised for the normal distribution, if I am not wrong.
AFAIK Ziggurat isn't either. In the literature it's commonly used only for the normal and exponential distributions; however, it's a general method that works for all monotone decreasing distributions (or symmetric distributions whose half is monotone decreasing).
I would say, "Yes, but." The "but" is because Ziggurat is more complicated to implement correctly (as we're learning). So, in terms of the current project, I would say there's a tradeoff between focusing on getting Ziggurat correct versus getting a good variety of basic distributions in place with simpler (but more limited) algorithms.
Yes, exactly.
The emails I'm getting from GSoC suggest that we're supposed to be finished by the end of August, with 23 August as your own deadline for finalizing code and 29 August as the deadline for Ilya and me to submit our final evaluation report?
Yep, but that doesn't stop me from continuing to work on it (I know that I wasted quite a lot of time).
It's great that you want to keep working, but I was concerned about the expectations raised in the description of your GSoC project and what the people responsible might expect to see (which is why earlier I raised the possibility of doing some basic implementations of a variety of non-uniform distributions). @9il, you're the primary mentor here, so what are your thoughts?
We already have two general-purpose discrete RNG implementations, and Tinflex will be ready soon. Tinflex is a hard numeric project without obvious workforce requirements. The R version contains a lot of numeric bugs, many of which are fixed in this project. We can stamp out / copy-paste the Boost RNGs, and that is not a problem. But copy-pasting is not related to proper RNG numbers. First, we need a proper shell over std.random and fixed
If @wilzbach implemented a variety of non-uniform distributions but not Tinflex, then I would not be able to consider this GSoC project finished.
I agree with that; I'm asking whether we should implement a variety of non-uniform distributions instead of (short term) focusing on Ziggurat. Tinflex obviously takes primacy, as it is the most significant part of the promised work.
Yes, I'd prefer to add a variety of non-uniform distributions instead of Ziggurat.
Please reopen for mir-random.
Adds the Ziggurat sampling algorithm for the Normal & Exponential distributions.

dub ./examples/nonuniform_plot.d

Ping @joseph-wakeling-sociomantic @9il