HighPrecision<f32/64> #531

sicking · 2018-06-26T23:37:17Z

This implements a distribution like Uniform<f32/f64>, but using the full precision of the floating point types.

There's some unfortunate things in there, like the HPFloatHelper trait. However it seems needed for the same reason that we have that we have the UniformSampler and SampleUniform traits. I went with a slightly simplified version of those traits since we don't have the same need of extensibility here.

I wrote some pretty thorough tests in order to convince myself that the implementation is correct. The full set of tests take well over 60 seconds to run on my modern MacBook Pro, so I instead have a smaller set of tests which are run by default. But the full set is also there and can be enabled by anyone tinkering with this code in the future.

Regarding relationship to issue #494, I think this PR is actually relatively orthogonal. I don't see the distribution in this PR as something that someone would ever want to use as input for other randomness related algorithms, such as weighted choices or bool generation. It's simply too bad performance and too high precision.

In fact, we might want to name this FullPrecision<...> or MaxPrecision<...> instead. And leave the name HighPrecision for something with better performance.

sicking · 2018-06-27T08:20:02Z

Oh, and I intentionally didn't include the code for HighPrecision01 from #372 since I don't want to take credit for @pitdicker's work.

dhardy · 2018-06-27T09:05:37Z

Your benchmarks show 6-10× the runtime:

test distr_uniform_f32           ... bench:       1,568 ns/iter (+/- 30) = 2551 MB/s
test distr_highprecision1_f32    ... bench:       9,184 ns/iter (+/- 639) = 435 MB/s
test distr_highprecision2_f32    ... bench:      15,198 ns/iter (+/- 707) = 263 MB/s
test distr_highprecision3_f32    ... bench:      15,792 ns/iter (+/- 690) = 253 MB/s

test distr_uniform_f64           ... bench:       2,153 ns/iter (+/- 42) = 3715 MB/s
test distr_highprecision1_f64    ... bench:      11,909 ns/iter (+/- 437) = 671 MB/s
test distr_highprecision2_f64    ... bench:      18,425 ns/iter (+/- 4,472) = 434 MB/s
test distr_highprecision3_f64    ... bench:      19,341 ns/iter (+/- 2,335) = 413 MB/s

Which is reasonable enough if the precision is actually needed, though as you say there may not be many use-cases where it is.

So the big question is do we want to include this in Rand? You say it is orthogonal to #494 but it's not exactly, for example we could put all high-precision implementations in a companion crate.

since I don't want to take credit for ...

The git commits still track who created the commit (even if you rebase or adjust it), so not a problem.

sicking · 2018-06-27T16:12:04Z

Yeah, sticking this in a separate crate is definitely an option. Also related to discussion in #290. I'll see if I can rebase the other patch.

sicking · 2018-06-29T07:35:19Z

I've rebased this on top of #523 and added the code from #372. So the PR will look funny until #523 lands.

(The mean test is totally inadequate for checking high precision.)

…le bit

pitdicker · 2018-07-20T14:16:02Z

Sorry for not looking at your code for so long. Can you maybe give a high-level description how HighPrecision<f32/f64> works before I dive in?

Oh, and I intentionally didn't include the code for HighPrecision01 from #372 since I don't want to take credit for @pitdicker's work.

Feel free to treat the code like it is your own 😄.

sicking · 2018-07-21T06:39:19Z

No worries. Mainly did this because it was a fun challenge. I'm not particularly opinionated about where the final implementation ends up living since I'm not sure in which situations this code is needed.

It's easiest to think about only positive numbers as a start, and using 32bit numbers as an example.

The implementation takes high and low and converts into binary representations of the exponent and the mantissa. This mainly means parsing out the mantissa/exponent, and then adding the 24th high bit that IEEE754 optimizes out. If we have an underflow number this bit is not added but the exponent is adjusted instead. Note that we here don't care about the exponent bias, but rather just treat the exponent as a number between 1-254. (Biases only matters converting encoded values to numeric values, which we never do). Lets say that we get:
low_mantissa: 0x0080_1234
low_exponent: 140
high_mantissa: 0x00a0_0012
high_exponent: 150

Unless we had underflow, both mantissas will be 24-bit numbers.

We then convert both numbers to use the higher exponent, and shift down the mantissa to compensate. Here that means right-shifting low_mantissa, and increasing low_exponent, by 10. During shifting, if we're shifting the low value, round towards negative infinity. If we're shifting the high value, round towards positive infinity. This gives

low_mantissa: 0x0000_2004
low_exponent: 150
high_mantissa: 0x00a0_0012
high_exponent: 150

(At this point this information is saved in the Distribution object).

Conceptually what we do after this is that we treat the high and the low as infinitely long numbers, by adding 0s at the end, and generate an infinitely long random number between them. We then grab enough bits of this random number, rounding towards negative infinity, that we get a 24bit number. In our case:

Generate a number between
0x0000_2002_0000_0000_...
and
0x00a0_0012_0000_0000_...
like
0x000b_85f2_2fe1_c90a_...
grab 24 bits (while also adjusting the exponent)
0x00b8_5f22
exponent: 150 - 4 = 146

In practice, the way this is done is to generate a number between the low and high mantissa using an Uniform<u32>, in our example we get 0x000b_85f2. We then shift left 4 bits to get a 24bit number, and add random 4bit value (the number 2 in our example). We also decrease the exponent by 4 giving us 146.

This is repeated until we have a mantissa that is exactly 24 bits, or until we get into the underflow range (i.e. when exponent is 1). Usually only one loop is needed, except if we started with a mantissa of exactly 0.

We could at this point shift left more, and add more random bits at the end. However if we do that and then shift right to get to a 24bit value we'd get back exactly to where we are now. So at this point we know that even if we had added infinitely more random bits at the end, rounding would get us back to exactly the mantissa that we have now.

Finally we check that our number:
mantissa: 0x00b8_5f22
expnent: 146

is within the range that we set out to generate a number in. The only reason that we could get a number outside of the range is the rounding we did in the beginning to make both our limits have the same exponent.

If we ended up with a number that's too big or too small, then start over from scratch.

That's about all that's needed.

Other than taking care of negative numbers.

Negative numbers are handled by giving encoding the two mantissas as normal i32s. I.e. if the lower limit is negative, we extract the mantissa and exponent as described above, but then multiply the mantissa by -1.

These signed integers are then used as limits in the uniform distribution. I.e. the distribution is a Uniform<i32>.

Once have a generated number, we check if it is negative and then grab its absolute value. This absolute value is what all subsequent math is done on. This in order to correctly measure that we have a 24bit value since this matches the IEEE754 encoding. Everything is done exactly the same other than two things:

When we left-shift and add a random number in order to get a 24bit value, we subtract rather than add the extra bits at the end (in our example above, we would have subtracted 2 rather than add 2).
At the end, if we had a negative number, we set the sign-bit in the generated result.

I've skipped some details above, but this is mainly it. Do let me know if anything is still unclear.

sicking · 2018-07-21T06:47:02Z

Oh, and we might want to move these distributions to their own file. There's very little sharing with the rest of the code in float.rs, especially now that the existing macro invocations can't be used on account of SIMD, and it'd make it easier to move to a separate crate if desired.

dhardy · 2023-03-30T10:26:34Z

As mentioned in #494, I will now close this. It's a nice idea, but doesn't seem useful in rand itself and I haven't seen much interest in actually using it. If it were to be revived, there's still the question of where it should be housed (rand_distr or a new crate).

sicking force-pushed the highprec branch from 867de5c to ff821e2 Compare June 26, 2018 23:59

sicking force-pushed the highprec branch from ff821e2 to fa70def Compare June 29, 2018 07:33

sicking force-pushed the highprec branch 5 times, most recently from 1c96ce5 to fa1f932 Compare July 1, 2018 20:02

dhardy added the P-postpone Waiting on something else label Jul 7, 2018

sicking force-pushed the highprec branch from fa1f932 to eab8ebd Compare July 12, 2018 11:58

sicking and others added 4 commits July 12, 2018 15:42

Implement HighPrecision distribution

fa7dc39

Implement HighPrecision01 distribution

f821ab7

Float sampling: improve high precision sampling; add mean test

97b107a

(The mean test is totally inadequate for checking high precision.)

Improve tests, fix comments, and simplify HighPrecision01 code a litt…

651c1ac

…le bit

sicking force-pushed the highprec branch from eab8ebd to 651c1ac Compare July 12, 2018 22:44

pitdicker mentioned this pull request Jul 18, 2018

Implement HighPrecision01 distribution #372

Closed

sicking mentioned this pull request Jul 31, 2018

Compromising accuracy and speed [distributions] #494

Open

dhardy mentioned this pull request Sep 16, 2019

Tracker: 0.8 release #889

Closed

19 tasks

dhardy mentioned this pull request Nov 13, 2019

Conversion to 64-bit float rust-random/book#21

Closed

dhardy mentioned this pull request May 15, 2020

Remove degenerate cases statrs-dev/statrs#102

Open

dhardy mentioned this pull request Oct 8, 2020

Partially-sampled random numbers #1055

Closed

vks mentioned this pull request Jan 24, 2021

gen_range(f64::MIN..f64::MAX) should panic instead of silently overflow #1090

Closed

dhardy mentioned this pull request Oct 18, 2021

Implement an exact Bernoulli distribution #1193

Open

dhardy mentioned this pull request Feb 20, 2023

Equidistant sampling of uniform FP values #1270

Open

dhardy mentioned this pull request Feb 21, 2023

Uniform float improvements #1289

Merged

dhardy closed this Mar 30, 2023

dhardy mentioned this pull request Oct 18, 2023

Higher quality (0, 1] floats #1346

Closed

vks mentioned this pull request Apr 10, 2024

Investigate replacing Ziggurat sampling with ETF #257

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HighPrecision<f32/64> #531

HighPrecision<f32/64> #531

sicking commented Jun 26, 2018 •

edited

Loading

sicking commented Jun 27, 2018

dhardy commented Jun 27, 2018

sicking commented Jun 27, 2018

sicking commented Jun 29, 2018 •

edited

Loading

pitdicker commented Jul 20, 2018

sicking commented Jul 21, 2018

sicking commented Jul 21, 2018

dhardy commented Mar 30, 2023

HighPrecision<f32/64> #531

HighPrecision<f32/64> #531

Conversation

sicking commented Jun 26, 2018 • edited Loading

sicking commented Jun 27, 2018

dhardy commented Jun 27, 2018

sicking commented Jun 27, 2018

sicking commented Jun 29, 2018 • edited Loading

pitdicker commented Jul 20, 2018

sicking commented Jul 21, 2018

sicking commented Jul 21, 2018

dhardy commented Mar 30, 2023

sicking commented Jun 26, 2018 •

edited

Loading

sicking commented Jun 29, 2018 •

edited

Loading