Support generation of strong random numbers #1372

Merged
merged 10 commits into from
Apr 4, 2017


g-andrade
Contributor

This PR proposes two new additions to the crypto module, both named strong_rand_uniform, for effortless generation of cryptographically secure numbers:

crypto:strong_rand_uniform/0: generates floats on the open interval ]0.0, 1.0[
crypto:strong_rand_uniform/1: generates integers on an arbitrary closed interval [1, N]

These follow the same interfaces as rand:uniform/0 and rand:uniform/1, and both use OpenSSL's BN_rand_range method.

Generated floating point values are limited to an effective entropy of up to 51 bits but are expected to be uniformly distributed between 0.0 and 1.0.

Supersedes #1363.

@IngelaAndin added the team:PS (Assigned to OTP team PS) and feature labels Mar 12, 2017
@IngelaAndin added the testing (currently being tested, tag is used by OTP internal CI) label Mar 14, 2017
@g-andrade force-pushed the crypto/strong_random_numbers branch from ce3e998 to d07008a March 14, 2017 23:54
@g-andrade
Contributor Author

A conflict had popped up in the meantime, in crypto.c; I've rebased over master and it's ok now.

@RaimoNiskanen
Contributor

I have finally had the time to think through this PR, and I think we should adapt it to be more of a rand plugin, and change rand to actually allow plugins. That way we get a uniform API for different random generators and also get normally distributed strong random floats for free.

By the way, do you have an actual use case for strong random integers?

We are phasing out the use of mpint's. Use plain binaries and get_bn_from_bin() instead.

As building blocks we need most of your suggested functions, but I suggest:

  • Make strong_rand_uniform_nif/2 reflect the backend libcrypto function BN_rand_range(): rename it to strong_rand_range_nif(Range :: binary()) -> binary(), which uses integers in binaries rather than mpint's, takes only the Range width, and returns an integer in [0 .. Range-1]. This reduces the BIGNUM handling in the C code and makes for a more flexible building block.

  • Create a function strong_rand_range(Range :: integer() | binary()) -> binary() that calls the NIF above and also returns [0 .. Range-1] in a binary.

  • Create a new NIF strong_rand_float_nif/0 that calls BN_rand(p_rnd, 52, -1, 0), then uses BN_bn2bin() to get the bytes, be64toh() from endian.h to get the integer, and then constructs the IEEE double in C and returns it via enif_make_double() after subtracting 1.0. This should optimize generation of floats, since BN_rand should be better at power-of-2 ranges than BN_rand_range, and constructing the double and subtracting 1.0 should be faster in C than in Erlang at roughly the same code size. This is maybe premature optimization, as the strong_rand_uniform/0 function you already have would do just fine if renamed to strong_rand_float/0 and fixed so that it can return 0.0.

  • Create a wrapper function strong_rand_float() -> float() that calls the above NIF to generate a random float in the range [0.0 .. 1.0), that is including 0.0 but excluding 1.0. It seems the corresponding function in the deprecated random module is ambiguously documented and in the rand module incorrectly so. The latter can return 0.0.

  • I do not know if strong_rand_range/1 and strong_rand_float/0 should be documented or exported, since our intention now is to call them via the rand module and the interface below. But maybe they are useful enough on their own to be documented...

  • Create exported and documented seed generators for the rand module: rand_seed() -> State :: rand:state() and rand_seed_s() -> State :: rand:state(). Where State is {AlgHandler,0}, AlgHandler = #{type => crypto, max => infinity, next => fun crypto:strong_rand_next/1, uniform => fun crypto:strong_rand_uniform/1, uniform_n => fun crypto:strong_rand_uniform/2, jump => fun crypto:strong_rand_jump/1}.

  • Create exported plugin functions (that ignore the seed) to be called from the rand module:

    • strong_rand_next(Seed) -> {bytes_to_integer(strong_rand_range(1 bsl 64)),Seed}
    • strong_rand_uniform({_,_} = State) -> {strong_rand_float(),State}
    • strong_rand_uniform(Max, {_,_} = State) -> {bytes_to_integer(strong_rand_range(Max)) + 1,State}
    • strong_rand_jump({_,_} = State) -> State
  • To actually allow for plugins, open up the types in the rand module: rand:state() and rand:export_state() should not be -opaque anymore. This probably means that the types rand:alg(), rand:alg_seed() and rand:alg_handler() need to be exported as well, and that they must be generalized to e.g. rand:alg() :: rand_alg() | atom(), where the rand module internally should use rand_alg() instead of today's alg().
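The plugin shape proposed in the list above can be sketched without touching the real rand module. The module name plugin_sketch, the type tag crypto_sketch, the no_seed placeholder and the front-end function next_s/1 are all hypothetical; only crypto:strong_rand_bytes/1 is an existing call. The sketch passes the whole state to the callback for brevity, which is an assumption and not necessarily how the merged code does it.

```erlang
-module(plugin_sketch).
-export([seed_s/0, next_s/1, plugin_next/1]).

%% Hypothetical handler map, loosely following the shape proposed above.
%% Only the next callback is filled in to keep the sketch short.
seed_s() ->
    Handler = #{type => crypto_sketch,
                next => fun ?MODULE:plugin_next/1},
    {Handler, no_seed}.

%% Stand-in for a rand-like front end: fetch the callback from the
%% handler map and let it return {Value, NewState}.
next_s({#{next := Next}, _Seed} = State) ->
    Next(State).

%% Plugin callback: a 64-bit integer taken straight from the CSPRNG.
%% The state is handed back untouched because there is no seed to advance.
plugin_next(State) ->
    <<V:64/unsigned>> = crypto:strong_rand_bytes(8),
    {V, State}.
```

The state is threaded through only to satisfy the functional API; every call draws fresh entropy from the crypto library.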

To use this you call crypto:rand_seed() and after that R = rand:uniform(65536), or, if you do not want the process dictionary magic, S0 = crypto:rand_seed_s() and after that {R,S1} = rand:uniform_s(65536, S0), where the S0..Sn weaving is just to please the API.
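As a minimal sketch of the two call styles described above, assuming an OTP release that contains this PR so that crypto:rand_seed/0 and crypto:rand_seed_s/0 exist under those names (the module name rand_seed_usage is made up for illustration):

```erlang
-module(rand_seed_usage).
-export([with_process_dictionary/0, functional/0]).

%% Implicit style: the generator state lives in the process dictionary.
with_process_dictionary() ->
    _ = crypto:rand_seed(),
    rand:uniform(65536).

%% Explicit style: the state is passed in and returned by each call.
functional() ->
    S0 = crypto:rand_seed_s(),
    {R, _S1} = rand:uniform_s(65536, S0),
    R.
```

Both functions return an integer in [1, 65536]. Note that the first one replaces whatever rand state the calling process had before.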

@g-andrade
Contributor Author

> By the way, do you have an actual use case for strong random integers?

I reckon it's something that has been missing for some time, and I find it as useful as having strong random bytes. One could argue that a strong random byte generator (e.g. strong_rand_bytes) is enough to derive randomness for any other data type, but the case for e.g. floats is particularly tricky, as it's very easy to take a naive (but wrong) approach. If the standard library provides for it, far fewer people will fall into this pitfall in the future.

I've implemented most of your suggestions, with the notable exception of exposing the 'crypto rand plugin' functions: keeping them internal is consistent with the corresponding internal rand module functions for the built-in algorithms. Besides, is it to be expected that people hot-swap the crypto module at runtime? In any case, I don't mind doing it differently.

As for the crypto:strong_rand_float function, I didn't NIF-ize it yet, as I would first like to know whether you think the current solution is going in the right direction.
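The "naive but wrong" pitfall also exists for integers. The thread does not name a specific mistake, so the following is an assumed illustration: reducing a random byte with rem skews the result whenever 256 is not a multiple of the range, while rejection sampling avoids the skew. The module and function names are hypothetical; crypto:strong_rand_bytes/1 is the only library call used.

```erlang
-module(bias_sketch).
-export([modulo_counts/1, strong_uniform/1]).

%% Count how often each residue appears when every possible byte value
%% (0..255) is reduced with rem N. Unequal counts mean modulo bias.
modulo_counts(N) when is_integer(N), N >= 1, N =< 256 ->
    Residues = [B rem N || B <- lists:seq(0, 255)],
    [length([R || R <- Residues, R =:= K]) || K <- lists:seq(0, N - 1)].

%% Rejection sampling: draw just enough random bits to cover N values
%% and retry whenever the draw falls outside [0, N-1].
%% Returns an integer in [1, N], like rand:uniform/1.
strong_uniform(1) ->
    1;
strong_uniform(N) when is_integer(N), N > 1 ->
    Bits = bit_length(N - 1),
    Bytes = (Bits + 7) div 8,
    draw(N, Bits, Bytes).

draw(N, Bits, Bytes) ->
    Pad = Bytes * 8 - Bits,
    <<_:Pad, V:Bits>> = crypto:strong_rand_bytes(Bytes),
    if
        V < N -> V + 1;
        true  -> draw(N, Bits, Bytes)
    end.

%% Number of bits needed to represent a non-negative integer.
bit_length(0) -> 0;
bit_length(X) when X > 0 -> 1 + bit_length(X bsr 1).
```

For a six-sided die, modulo_counts(6) shows that residues 0..3 are hit 43 times and residues 4..5 only 42 times, because 256 = 6 * 42 + 4.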

@g-andrade force-pushed the crypto/strong_random_numbers branch from aee9118 to 5eae0da March 18, 2017 18:06
}

bn_rand = BN_new();
if (BN_rand_range(bn_rand, bn_range) != 1) {
Contributor

Should be if (! BN_rand_range(bn_rand, bn_range)) { since the return value of BN_rand_range() is a boolean, not a numerical value

Contributor Author

@g-andrade Mar 22, 2017

Mmmmh, I considered that (as it's a very common pattern), but the documentation explicitly states that either '0' or '1' shall be returned for failure or success, respectively.
It would still work after that change, but I worry whether it could suddenly behave unexpectedly if, let's say, one day the interface gets extended and it starts returning '2' or '0xBEEF' to signal something else entirely?

Contributor

All right, I found other functions in the OpenSSL documentation that return 0 or 1 like this one, and -1 if not implemented, so keep the != 1.


<p><em>Example</em></p>
<pre>
crypto:rand_seed(),
Contributor

_ = crypto:rand_seed(), to be more Dialyzer friendly

Contributor Author

Fixed on 1f236ff

<pre>
crypto:rand_seed(),
_IntegerValue = rand:uniform(42), % [1; 42]
_FloatValue = rand:uniform(). % [0.0; 1.0]</pre>
Contributor

The range should be % [0.0; 1.0[

Contributor Author

Fixed on 6f6c478

seed(Alg) ->
seed_put(seed_s(Alg)).

-spec seed_s(AlgOrExpState::alg() | export_state()) -> state().
-spec seed_s(AlgOrStateOrExpState::builtin_alg() | state() | export_state()) -> state().
seed_s(Alg) when is_atom(Alg) ->
Contributor

@RaimoNiskanen Mar 20, 2017

To minimize the use of guards maybe reorder these into:

seed_s({AlgHandler,_Seed}) when is_map(AlgHandler) ->
seed_s({Alg0,Seed}) ->
seed_s(Alg) ->

Then we rely on alg_handler() being a map and do not depend on alg() being an atom, since it could be possible to widen the alg() type one day...
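The suggested clause order can be illustrated with a stand-in function. classify/1 and the module name are hypothetical and only mirror the three clause heads above; they are not the rand module's code.

```erlang
-module(clause_order_sketch).
-export([classify/1]).

%% Hypothetical stand-in for seed_s/1: only the first clause needs a guard.
%% 1. A {Handler, Seed} tuple whose first element is a map is a live state.
classify({AlgHandler, _Seed}) when is_map(AlgHandler) ->
    state;
%% 2. Any other 2-tuple is treated as an exported state {Alg, Seed}.
classify({_Alg, _Seed}) ->
    export_state;
%% 3. Everything else is taken to be an algorithm name.
classify(_Alg) ->
    alg.
```

Because clauses are tried top to bottom, the last clause never has to test is_atom/1, which leaves room for widening the algorithm type later.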

Contributor Author

Fixed on 195edd9

%% Algorithm state
-type state() :: {alg_handler(), alg_seed()}.
-type builtin_alg() :: exs64 | exsplus | exs1024.
-type alg() :: builtin_alg() | term().
Contributor

-type alg() :: builtin_alg() | atom()

Contributor Author

Fixed on 54b89c8

@RaimoNiskanen
Contributor

RaimoNiskanen commented Mar 20, 2017

I also think this is an obviously missing feature, but my colleagues sometimes point out that it is a thin argument...

Floats are tricky to get right. Our current implementation in the rand module actually produces a strange distribution of the returned numbers: the smaller the numbers, the shorter the distance between them. Uniform ranges are also hard to get right: the rand implementation as of today can produce a bad distribution for big ranges.

We will have to fix that for the rand module. But for strong random numbers the distribution has to be good. Therefore I think this PR is a valuable contribution.

The state of this PR looks very good, not exactly like I said but just as I wanted it! So this is definitely the right direction. A few nitpicks above.

The reason I want to use export entry funs (e.g fun crypto:rand_plugin_uniform/1) as plugin interface is that it is possible to upgrade the crypto application. And if you do that a process in the system that holds a reference to the crypto funs would get killed. Therefore it feels safer to have them as internally exported, undocumented, and called as export entry funs.

I really do not know if crypto:strong_rand_{range,float} should be exported and documented or not. What do you think?
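The difference between the two kinds of fun mentioned above can be inspected with erlang:fun_info/2. This is only a sketch with made-up names (fun_kind_sketch, callback/1); it shows how the two forms are written and how the runtime classifies them, not how crypto itself is implemented.

```erlang
-module(fun_kind_sketch).
-export([kinds/0, callback/1]).

callback(X) -> X.

%% Returns the type reported for a fun written as fun callback/1 and for
%% an export entry fun written as fun Module:callback/1.
kinds() ->
    Local = fun callback/1,
    External = fun ?MODULE:callback/1,
    {erlang:fun_info(Local, type), erlang:fun_info(External, type)}.
```

An export entry fun is resolved by module, name and arity each time it is called, so it always reaches the current version of the module, which is the property the comment above relies on.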

@RaimoNiskanen removed the testing (currently being tested, tag is used by OTP internal CI) label Mar 21, 2017
end.
strong_rand_range_nif(_BinRange) -> ?nif_stub.

strong_rand_float() ->
Contributor

I just got an idea. Wouldn't this produce exactly the same distribution of numbers?

strong_rand_float() ->
    BinFraction = strong_rand_range(1 bsl 53),
    bytes_to_integer(BinFraction) / 9007199254740992.0. % math:pow(2, 53)

If that is true it is much faster and there would probably be no need for a NIF.
I also want to use it in the rand module, unless someone proves me wrong.

Contributor Author

Indeed, I ended up rewriting it using a similar approach based on both your and @okeuday's suggestions.

@okeuday
Contributor

okeuday commented Mar 21, 2017

@RaimoNiskanen Yeah, that approach works. It has been in quickrand for a while (here):

strong_float() ->
    % 53 bits maximum for double precision floating point representation
    % erlang:round(53.0 / 8) == 7 bytes for random number
    <<I:56/integer>> = crypto:strong_rand_bytes(7),
    I / ?BITS56. % scaled by maximum random number (2 ^ (7 * 8)) - 1

@RaimoNiskanen
Contributor

@okeuday: I see (fairly certainly) two problematic details with that code, the first is the same "error" as in the current rand module:

  • I / ?BITS56 feeds 56 random bits into the division, so if the top 1..3 bits are zero we still have 55..52 random bits. This causes numbers in the interval [0.5; 1.0[ to get the distance 2^-53, in [0.25; 0.5[ the distance 2^-54, and in [0.125; 0.25[ the distance 2^-55. They are not equidistant over [0.0; 1.0[.
  • Since the division is by ?BITS56 (16#FFFFFFFFFFFFFF), not 2^56, there will probably be strange rounding artifacts in the produced range, and 1.0 will be part of the generated range, which I think is wrong for a random range function, since all the ones I have seen include the lower bound and exclude the upper.

Therefore I suggest masking to 53 bits before the division and dividing with 2.0^53 which should make the resulting numbers equidistant (2.0^-53) over [0.0; 1.0[.
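Two of the effects described in the bullet points can be checked with plain arithmetic, no random draws needed. The module and function names below are hypothetical; ?BITS56 is given the value stated in the comment above.

```erlang
-module(bits56_sketch).
-export([top_value/0, near_top_value/0, smallest_step/0, step53/0]).

-define(BITS56, 16#FFFFFFFFFFFFFF). % (2^56) - 1, as in the quoted snippet

%% Largest 56-bit draw divided by ?BITS56: the upper bound 1.0 is reachable.
top_value() -> ?BITS56 / ?BITS56.

%% The draw just below the maximum also rounds to 1.0.
near_top_value() -> (?BITS56 - 1) / ?BITS56.

%% Smallest nonzero result of the 56-bit approach, roughly 2^-56.
smallest_step() -> 1 / ?BITS56.

%% Spacing of the masked 53-bit approach: exactly 2^-53 everywhere.
step53() -> 1 / (1 bsl 53).
```

The 56-bit variant can return 1.0 from more than one input, and its smallest nonzero value is finer than 2^-53, so the outputs cannot be equidistant over the whole interval.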

The suggested binary syntax solution also produces equidistant numbers, but with distance 2.0^-52 since subtracting 1.0 shifts in a zero lowest bit, so that bit is not random, which is unfortunate.

The division can also be optimized:

strong_rand_float() ->
    BinFraction = strong_rand_range(1 bsl 53),
    bytes_to_integer(BinFraction) * math:pow(2, -53).

Floating point multiplication should be faster than division, and the compiler evaluates math:pow(2, -53) as a constant at compile time. For me it was a surprise that it does so. It may be safer to replace it with 1.11022302462515657e-16 in a descriptive macro.
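A self-contained sketch of the masked 53-bit approach follows. It assumes only the public crypto:strong_rand_bytes/1, since strong_rand_range/1 is internal to the PR; the module name, macro names and the helpers scale_div/1 and scale_mul/1 are made up for illustration.

```erlang
-module(strong_float_sketch).
-export([strong_rand_float/0, scale_div/1, scale_mul/1]).

-define(TWO_POW_53, 9007199254740992.0).           % math:pow(2, 53)
-define(TWO_POW_MINUS_53, 1.1102230246251565e-16). % math:pow(2, -53)

%% Scale a 53-bit integer into [0.0, 1.0[ by division.
scale_div(N) when is_integer(N), N >= 0, N < (1 bsl 53) ->
    N / ?TWO_POW_53.

%% Same scaling by multiplication; both are exact for a power of two.
scale_mul(N) when is_integer(N), N >= 0, N < (1 bsl 53) ->
    N * ?TWO_POW_MINUS_53.

%% 7 bytes give 56 bits; the top 3 are discarded in the binary match,
%% leaving 53 random bits for the fraction.
strong_rand_float() ->
    <<_:3, Fraction:53>> = crypto:strong_rand_bytes(7),
    scale_mul(Fraction).
```

Because the scale factor is a power of two, multiplication and division give bit-identical results here, so switching to multiplication changes speed only.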

@okeuday
Contributor

okeuday commented Mar 22, 2017

@RaimoNiskanen Using 56 bits instead of 53 bits should not be a problem here due to 56 bits having a random value that is easier to think of as an integer in the range of [0..72057594037927935] which has uniform distribution due to crypto:strong_rand_bytes/1, assigned to the integer I. While the division I / ?BITS56 may be rounded differently based on the hardware implementation of IEEE754 (e.g., a double rounding that occurs with an extended-based system that stores into a double-precision value when compared to a single/double system) the result should remain uniformly distributed in the range [0.0 .. 1.0] despite the potential variation in different double precision rounding with different hardware.

I believe the range [0.0 .. 1.0] is more useful for various math when compared to the range [0.0 .. 1.0[ and it is my expectation that the range [0.0 .. 1.0[ is a more popular implementation choice for source code that generates random floating point values simply due to the dependence on the IEEE754 double precision binary format with the assignment of 52 bits of randomness, as was done in this pull request.

I agree that using a multiplication instead of a division is better and a good change for efficiency. I also agree that a macro for the value of math:pow(2, -53) seems safer, though it may only matter for older versions of Erlang (1.11022302462515657e-16 is a machine epsilon value for binary64 though the float.h DBL_EPSILON is the more typical math:pow(2, -52) machine epsilon value).

@g-andrade
Contributor Author

@RaimoNiskanen ,

> The reason I want to use export entry funs (e.g fun crypto:rand_plugin_uniform/1) as plugin interface is that it is possible to upgrade the crypto application. And if you do that a process in the system that holds a reference to the crypto funs would get killed. Therefore it feels safer to have them as internally exported, undocumented, and called as export entry funs.

Fix pushed.

> I really do not know if crypto:strong_rand_{range,float} should be exported and documented or not. What do you think?

I think keeping them out of sight would lead to more elegant use of the functionality, as it would provide people with a single, consistent solution - "use the rand plugin" - while at the same time keeping the crypto interface as slim as it can be.

@g-andrade
Contributor Author

g-andrade commented Mar 22, 2017

As for the alternative approaches to generating uniform numbers over [0.0, 1.0] / [0.0, 1.0[ - very interesting brain food. I've pushed this:

-define(HALF_DBL_EPSILON, 1.1102230246251565e-16). % math:pow(2, -53)

strong_rand_float() ->
    WholeRange = strong_rand_range(1 bsl 53),
    ?HALF_DBL_EPSILON * bytes_to_integer(WholeRange).

Which should generate numbers over the half-open [0.0, 1.0[ interval. If the closed interval is to be preferred, I reckon generating the random integer up to 2**53 should do the job (it being a power of two, there should be no loss of precision.)

@RaimoNiskanen
Contributor

RaimoNiskanen commented Mar 23, 2017

@g-andrade: Looks good! I prefer the half-open interval, partly because the integer range function and most other range functions I have seen use half-open intervals, and partly for the reason in the last paragraph below. I agree that extending the strong rand range to ((1 bsl 53) + 1) would be the right way to close the interval.

@okeuday
Using a 53-bit integer and dividing by 2^53 avoids rounding, since all such integers have an exact representation as IEEE754 doubles. Using more bits and a larger 2^N divisor causes rounding. Using a 2^N - 1 divisor also causes rounding.

The resulting numbers after rounding are still uniformly distributed, but not evenly so, since the distance between two adjacent numbers varies over the range. It is true that for every subrange of [0.0..1.0) sufficiently larger than the machine epsilon the probability is the same, but every possible number is not equally probable. I think that is annoying. See also http://xoroshiro.di.unimi.it/ "Generating uniform doubles in the unit interval" for a discussion.

I also think that the half-open range [0.0..1.0) is more useful than the closed range, since then you can e.g. generate one set of numbers in [0.0..1.0) and another set in [1.0..2.0), and join them without getting a probability spike for 1.0.
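The exact-representation argument can be checked directly: integers up to 2^53 survive a round trip through a double, while 2^53 + 1 does not. The module and function names are hypothetical.

```erlang
-module(exact53_sketch).
-export([roundtrips/1]).

%% True when converting N to a float and back returns N unchanged,
%% i.e. when N has an exact representation as an IEEE754 double.
roundtrips(N) when is_integer(N) ->
    trunc(float(N)) =:= N.
```

For example, roundtrips((1 bsl 53) - 1) and roundtrips(1 bsl 53) hold, whereas roundtrips((1 bsl 53) + 1) does not, which is why feeding more than 53 random bits into the conversion forces rounding.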

@okeuday
Contributor

okeuday commented Mar 23, 2017

@RaimoNiskanen Thank you for the reference. I have switched my code to use only 53 bits to avoid rounding and it can remain an alternative for the [0.0 .. 1.0] range.

@RaimoNiskanen
Contributor

RaimoNiskanen commented Mar 24, 2017

@okeuday

Just to underline again: using a non-2^N divisor will also cause rounding.

To avoid rounding for the [0.0 .. 1.0] range one should produce a random number in the range [0 .. 2^53] and then divide by 2.0^53, i.e. the integer range should contain the upper bound so you can use a 2^N divisor. But then you need to produce a random integer on a range whose size is not 2^N but 1+2^N, which is cumbersome but supported by libcrypto's BN_rand_range.
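A closed-interval variant along those lines might look as follows. Since BN_rand_range is not reachable from plain Erlang, this sketch substitutes rejection sampling over 54 random bits; that substitution, the module name and the macro are assumptions made for illustration.

```erlang
-module(closed_float_sketch).
-export([strong_rand_float_closed/0]).

-define(TWO_POW_53, 9007199254740992). % 1 bsl 53

%% Float in the closed interval [0.0, 1.0].
%% 54 random bits cover [0, 2^54 - 1]; draws above 2^53 are rejected so
%% that each of the 2^53 + 1 accepted integers is equally likely.
strong_rand_float_closed() ->
    <<_:2, V:54>> = crypto:strong_rand_bytes(7),
    if
        V =< ?TWO_POW_53 -> V / ?TWO_POW_53;
        true -> strong_rand_float_closed()
    end.
```

Roughly half of the draws are rejected, which is the price of a range size of 1+2^N; the division itself stays exact because every accepted integer and the divisor are exactly representable.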

@RaimoNiskanen added the testing (currently being tested, tag is used by OTP internal CI) label Mar 24, 2017
@RaimoNiskanen
Contributor

I will add a cleanup commit and run it once more in the daily tests. Therefore removing the 'testing' label, which may be confusing....

@RaimoNiskanen removed the testing (currently being tested, tag is used by OTP internal CI) label Apr 3, 2017
@RaimoNiskanen merged commit c84e541 into erlang:master Apr 4, 2017