Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

Merged
merged 11 commits into from
Jan 28, 2023

Conversation

alejandro-colomar
Copy link
Collaborator

No description provided.

@alejandro-colomar alejandro-colomar force-pushed the shadow_random branch 4 times, most recently from c040366 to 88e04c5 Compare December 30, 2022 23:57
lib/bit.h Fixed Show fixed Hide fixed
@alejandro-colomar alejandro-colomar force-pushed the shadow_random branch 3 times, most recently from a6aafcb to 1fd52d2 Compare December 31, 2022 01:23
libmisc/random.c Outdated
* Similar to arc4random_uniform(3), but returns a u_long.
*/
unsigned long
shadow_random_uniform(unsigned long upper_bound)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algo I implemented in Linux might be a bit more efficient if the cost of multiplication is low compared to the cost of drawing a new random sample, which is usually the case:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f576b2593a978451416424e75f69ad1e3ae4efe

Copy link
Collaborator Author

@alejandro-colomar alejandro-colomar Dec 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jason,

Hmmm, those Linux commits do too much magic to my taste. Even with the comment and the commit message, I can't still understand why the multiplication works.

So, for an initial implementation, I'd keep the simple truncation with &=, even if it's slower.

After that, well, for shadow, I don't think the performance is worth making the code cryptic. Moreover, since we use 64-bit numbers in this code, we would need 128-bit multiplication, which is not something supported everywhere.

However, it might be an interesting thing to do in glibc. We could discuss improving the glibc implementation of arc4random_uniform(3) if you think it is an improvement over the current glibc code (please check). If we do that, I suggest that you prepare a very detailed description of why the multiplication works, because at least in my case (and I suspect for many other programmers too), I already forgot the details of how integer multiplication works bitwise from when I studied that at school, and it's something I never use, so it's hard to understand why it serves the purpose here.

[edited; after writing the answer to Theo, I discarded some ideas]

Cheers,

Alex

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemire has proofs: https://arxiv.org/abs/1805.10941

Your previous comment had a mention of expanding the kernel API. Not sure what you're proposing, but I've got a patchset under review that will make arc4random() and company both safe and fast.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemire has proofs: https://arxiv.org/abs/1805.10941

Thanks!

Your previous comment had a mention of expanding the kernel API. Not sure what you're proposing,

I'll restore the message here, since it seems that arc4random*(3) can't be fixed because the bugs have percolated to users that now depend on the design issues. That's fair, though; nothing surprising in C's history.

The removed part of my message was:

"
But, don't hurry; after Theo's email, I'm starting to strongly believe that the arc4random_uniform(3) interface is really bad, and you were right when you suggested a few months ago that we should reconsider if we want to add it, or maybe design a better API. I will propose deprecating arc4random*(3) functions in glibc, and come up with a better set of interfaces, maybe implemented entirely in the kernel.

I'll also send some draft patches to the kernel, to start discussion about new functions that improve over the current get_random*() that you showed, and from that discussion we can maybe come up with new syscalls.
"

but I've got a patchset under review that will make arc4random() and company both safe and fast.

Please add me to the loop; I'm curious. What's the link?

Copy link
Collaborator Author

@alejandro-colomar alejandro-colomar Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally understood how the multiplication method works (I needed to viasualize it). Here's some reduction to 3 bits, which (at least to me) clarifies how the rejection works. If you go to 4 bits, you can see that the rejection decreases considerably, and the more bits you add, the less rejection you'll see for low values of N.

image

Copy link
Collaborator Author

@alejandro-colomar alejandro-colomar Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a graph of the probability of looping, for the algorithm with multiplication (for 32 bits):

image

From those saw lines, only the last one goes up to 0.5. As you go closer to 0, they are smaller. This is compared to my implementation with masks, where all the saw lines go up to 0.5.

This comes from the fact that masking is discarding entropy bits. glibc keeps those entropy bits in a loop, but it's simpler to just use multiplication to make use of them in a single operation. And really, glibc's algorithm still discards some entropy compared to multiplication, but it's a small amount.

Worst case is still the same 0.5, but with this optimization, it happens less often.

Copy link
Collaborator Author

@alejandro-colomar alejandro-colomar Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikerexxe , @hallyn

This is a considerable performance improvement for low values of N. The only downside is that it requires using a type wider than long, which would need to be unsigned __int128. It's a GNU extension, so I don't know how much it's viable. The current approach of masking is not so bad, it will normally just call a handful of times getrandom(2), but it could be much better.

Copy link
Collaborator Author

@alejandro-colomar alejandro-colomar Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here goes an (incomplete and less detailed) 4-bit visual explanation of how it works, to compare it to the 3-bit one from above. We can see that the rejection is smaller for low values of N compared to the same values of N when the original random number generator has less bits.

image

@alejandro-colomar
Copy link
Collaborator Author

Changes:

  • Rename shadow_random_uniform() to shadow_random_below().

Since our function behaves slightly different than arc4random_uniform(3), it's better not to have a name that would cause confusion. Instead, use a name similar to the Linux kernel internal functions that behave like ours.

@alejandro-colomar
Copy link
Collaborator Author

Changes:

  • Fix undefined behavior in include guard (leading underscore + uppercase is reserved).

  • Add bit_ceil_wrapul().

  • Rename functions to distance from arc4random(3), and at the same time, make more clear that the functions produce cryptographically-secure values. However, keep some similarity with arc4random() in what is useful, for some consistency.

  • Improve comments documenting these functions.

  • Fix Undefined Behavior in implementation of shadow_random_uniform()/csrand_uniform(), which was due to a right shift by 32. Use instead bit_ceil_wrapul(), which has no UB.

@alejandro-colomar
Copy link
Collaborator Author

Changes: Rebase to master

@alejandro-colomar alejandro-colomar marked this pull request as ready for review January 16, 2023 12:43
@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 16, 2023

I think this is ready for review. This time, I didn't add extensive documentation for these functions, because I think the one-liner comment is clear enough. In fact, the manual page I wrote for the arc4random(3) functions is similarly short (see below), and most of the differences are noting the bugs in the libc functions. If you consider some more documentation is merited, please comment.

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator

LIBRARY
       Standard C library (libc, ‐lc)

SYNOPSIS
       #include <stdlib.h>

       uint32_t arc4random(void);
       uint32_t arc4random_uniform(uint32_t upper_bound);
       void arc4random_buf(void buf[.n], size_t n);

DESCRIPTION
       These functions give cryptographically‐secure pseudorandom numbers.

       arc4random() returns a uniformly‐distributed value.

       arc4random_uniform() returns a uniformly‐distributed  value  less  than
       upper_bound (see BUGS).

       arc4random_buf()  fills  the  memory pointed to by buf, with n bytes of
       pseudorandom data.

       The rand(3) and drand48(3) families of functions should  only  be  used
       where  the  quality  of  the  pseudorandom numbers is not a concern and
       there’s a need for repeatability of the results.  Unless you meet  both
       of those conditions, use the arc4random() functions.

RETURN VALUE
       arc4random() returns a pseudorandom number.

       arc4random_uniform()  returns  a  pseudorandom  number  less  than  up‐
       per_bound for valid input, or 0 when upper_bound is invalid.

ATTRIBUTES
       For an explanation of the terms  used  in  this  section,  see  attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │arc4random(), arc4random_uniform(),         │ Thread safety │ MT‐Safe │
       │arc4random_buf()                            │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       These nonstandard functions are present on several Unix systems.

BUGS
       An  upper_bound  of  0  doesn’t make sense in a call to arc4random_uni‐
       form().  Such a call will fail, and return 0.  Be careful,  since  that
       value  is  not less than upper_bound.  In some cases, such as accessing
       an array, using that value could result in Undefined Behavior.

SEE ALSO
       getrandom(3), rand(3), drand48(3), random(7)

Linux man‐pages (unreleased)        (date)                       arc4random(3)

@alejandro-colomar
Copy link
Collaborator Author

Changes:

  • Use the term interval, rather than range, since interval is more correct mathematically, and so it will be more precise for users of the API what it does.

@ikerexxe
Copy link
Collaborator

I have several questions before being able to further review this PR.

  • What does UB mean?
  • What will bit_ceil_wrapul() and leading_zerosul() do? A search on the internet hasn't provided any meaningful answer.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

Finally, a request. Can you move the bit manipulation functions commit? It took me a context switch to understand why you were adding those functions, and then you didn't use the functions provided in those commits in the next commit, so I had another context switch.

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 16, 2023

I have several questions before being able to further review this PR.

* What does UB mean?

Ahh, sorry. UB is an abbreviation for Undefined Behavior (aka Nasal Demons; now I wonder if someone ever wrote a program called nasald :P ). It's a common abbreviation in some forums, and I sometimes assume it's common everywhere :)

https://port70.net/%7Ensz/c/c11/n1570.html#3.4.3

* What will bit_ceil_wrapul() and leading_zerosul() do? A search on the internet hasn't provided any meaningful answer.

They are similar to planned additions to ISO C23.

https://thephd.dev/c23-is-coming-here-is-what-is-on-the-menu#n3022---modern-bit-utilities
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf#subsection.7.18.3
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf#subsection.7.18.16

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

Finally, a request. Can you move the bit manipulation functions commit? It took me a context switch to understand why you were adding those functions, and then you didn't use the functions provided in those commits in the next commit, so I had another context switch.

Ahh, yes, I thought I had put it just before its use. I'll move it.

Happy new year :)

Alex

@alejandro-colomar alejandro-colomar force-pushed the shadow_random branch 2 times, most recently from 9bc5838 to a20792d Compare January 16, 2023 22:03
@alejandro-colomar
Copy link
Collaborator Author

Changes:

  • Reorder commits.
  • Change comment to expand UB into "Undefined Behavior".

Copy link
Collaborator

@ikerexxe ikerexxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some inline questions to better understand the code.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

This comment is still valid.

lib/bit.h Outdated Show resolved Hide resolved
libmisc/csrand.c Outdated Show resolved Hide resolved
libmisc/csrand.c Outdated Show resolved Hide resolved
@ikerexxe
Copy link
Collaborator

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator

I guess this manual page was updated recently. Am I right? I cannot see it like that in my system.

@alejandro-colomar
Copy link
Collaborator Author

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator

I guess this manual page was updated recently. Am I right? I cannot see it like that in my system.

Yes, I added it in 2023 :)

alx@debian:~/src/linux/man-pages/man-pages/main$ git log -- man3/arc4random.3
commit 3638ab3f148ad661e731e4d5d744c2fd32c463f0
Author: Alejandro Colomar <alx@kernel.org>
Date:   Thu Jan 5 20:14:19 2023 +0100

    arc4random.3: Raise the severity of the CAVEATS to BUGS
    
    This is a misdesign that the original OpenBSD developers fail to
    acknowledge.
    
    Link: <https://inbox.sourceware.org/libc-alpha/068b01c4-d0c4-0849-eabb-09c020a1480b@gmail.com/T/>
    Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
    Cc: Arsen Arsenović <arsen@aarsen.me>
    Cc: Sam James <sam@gentoo.org>
    Cc: "Serge E. Hallyn" <serge@hallyn.com>
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

commit 477c35bc1b0a785bd0b1eb5d57943b0483390519
Author: Alejandro Colomar <alx@kernel.org>
Date:   Mon Jan 2 12:08:31 2023 +0100

    arc4random.3: Be consistent in uses of pseudorandom
    
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

commit 848404feb5bfbdcb9b8f693ec29159231ee2a7c8
Author: Alejandro Colomar <alx@kernel.org>
Date:   Sun Jan 1 17:18:22 2023 +0100

    arc4random.3: New page documenting the arc4random(3) family of functions
    
    arc4random(3)
    arc4random_uniform(3)
    arc4random_buf(3)
    
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 17, 2023

Changes:

  • Fix typo in commit message: reutrns -> returns

Some inline questions to better understand the code.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

This comment is still valid.

Oops, fixed.

arc4random(3) returns a number.
arc4random_buf(3) fills a buffer.
arc4random_uniform(3) returns a number less than a bound.

and I'd add a hypothetical one which we use:

*_interval() should return a number within the interval [min, max].

In reality, the function being called csrand() in this patch is not
really cryptographically secure, since it had a bias, but a subsequent
patch will fix that.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
A set of APIs similar to arc4random(3) is complex enough to deserve its
own file.

Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
These functions implement bit manipulation APIs, which will be added to
C23, so that in the far future, we will be able to replace our functions
by the standard ones, just by adding the stdc_ prefix, and including
<stdbit.h>.

However, we need to avoid UB for an input of 0, so slightly deviate from
C23, and use a different name (with _wrap) for distunguishing our API
from the standard one.

Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
This API is similar to arc4random_uniform(3).  However, for an input of
0, this function is equivalent to csrand(), while arc4random_uniform(0)
returns 0.

This function will be used to reimplement csrand_interval() as a wrapper
around this one.

The current implementation of csrand_interval() doesn't produce very good
random numbers.  It has a bias.  And that comes from performing some
unnecessary floating-point calculations that overcomplicate the problem.

Looping until the random number hits within bounds is unbiased, and
truncating unwanted bits makes the overhead of the loop very small.

We could reduce loop overhead even more, by keeping unused bits of the
random number, if the width of the mask is not greater than
ULONG_WIDTH/2, however, that complicates the code considerably, and I
prefer to be a bit slower but have simple code.

BTW, Björn really deserves the copyright for csrand() (previously known
as read_random_bytes()), since he rewrote it almost from scratch last
year, and I kept most of its contents.  Since he didn't put himself in
the copyright back then, and BSD-3-Clause doesn't allow me to attribute
derived works, I won't add his name, but if he asks, he should be put in
the copyright too.

Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
The old code didn't produce very good random numbers.  It had a bias.
And that was from performing some unnecessary floating-point
calculations that overcomplicate the problem.

Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
It is common to use the expression 'sizeof(x) * CHAR_BIT' to mean the
width in bits of a type or object.  Now that there are _WIDTH macros for
some types, indicating the number of bits that they use, it makes sense
to wrap this calculation in a macro of a similar name.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jan 27, 2023
Use a different algorithm to minimize rejection.  This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.

Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below.  However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.

I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it.  It is
published in the GitHub pull request linked below.

Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.

masking:

	It discards the extra bits of entropy that are not necessary for
	this operation.  This works as if dividing the entire space of
	possible csrand() values into smaller spaces of a size that is
	a smaller power of 2.  Each of those smaller spaces has a
	rejection band, so we get as many rejection bands as spaces
	there are.  For smaller values of 'n', the size of each
	rejection band is smaller, but having more rejection bands
	compensates for this, and results in the same inefficiency as
	for large values of 'n'.

multiplication:

	It divides the entire space of possible random numbers in
	chunks of size exactly 'n', so that there is only one rejection
	band that is the remainder of `2^64 % n`.  The worst case is
	still similar to the masking algorithm, a rejection band that is
	almost half the entire space (n = 2^63 + 1), but for lower
	values of 'n', by only having one small rejection band, it is
	much faster than the masking algorithm.

	This algorithm, however, has one caveat: the implementation
	is harder to read, since it relies on several bitwise tricky
	operations to perform operations like `2^64 % n`, `mult % 2^64`,
	and `mult / 2^64`.  And those operations are different depending
	on the number of bits of the maximum possible random number
	generated by the function.  This means that while this algorithm
	could also be applied to get uniform random numbers in the range
	[0, n-1] quickly from a function like rand(3), which only
	produces 31 bits of (non-CS) random numbers, it would need to be
	implemented differently.  However, that's not a concern for us,
	it's just a note so that nobody picks this code and expects it
	to just work with rand(3) (which BTW I tried for testing it, and
	got a bit confused until I realized this).

Finally, here's some light testing of this implementation, just to know
that I didn't goof it.  I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).

int main(int argc, char *argv[])
{
	printf("%lu\n", csrand_uniform(atoi(argv[1])));
}

$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327

This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.

Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <shadow-maint#624 (comment)>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
@alejandro-colomar
Copy link
Collaborator Author

Changes: rebase to master

Use a different algorithm to minimize rejection.  This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.

Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below.  However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.

I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it.  It is
published in the GitHub pull request linked below.

Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.

masking:

	It discards the extra bits of entropy that are not necessary for
	this operation.  This works as if dividing the entire space of
	possible csrand() values into smaller spaces of a size that is
	a smaller power of 2.  Each of those smaller spaces has a
	rejection band, so we get as many rejection bands as spaces
	there are.  For smaller values of 'n', the size of each
	rejection band is smaller, but having more rejection bands
	compensates for this, and results in the same inefficiency as
	for large values of 'n'.

multiplication:

	It divides the entire space of possible random numbers in
	chunks of size exactly 'n', so that there is only one rejection
	band that is the remainder of `2^64 % n`.  The worst case is
	still similar to the masking algorithm, a rejection band that is
	almost half the entire space (n = 2^63 + 1), but for lower
	values of 'n', by only having one small rejection band, it is
	much faster than the masking algorithm.

	This algorithm, however, has one caveat: the implementation
	is harder to read, since it relies on several bitwise tricky
	operations to perform operations like `2^64 % n`, `mult % 2^64`,
	and `mult / 2^64`.  And those operations are different depending
	on the number of bits of the maximum possible random number
	generated by the function.  This means that while this algorithm
	could also be applied to get uniform random numbers in the range
	[0, n-1] quickly from a function like rand(3), which only
	produces 31 bits of (non-CS) random numbers, it would need to be
	implemented differently.  However, that's not a concern for us,
	it's just a note so that nobody picks this code and expects it
	to just work with rand(3) (which BTW I tried for testing it, and
	got a bit confused until I realized this).

Finally, here's some light testing of this implementation, just to know
that I didn't goof it.  I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).

int main(int argc, char *argv[])
{
	printf("%lu\n", csrand_uniform(atoi(argv[1])));
}

$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327

This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.

Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <shadow-maint#624 (comment)>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Now that we optimized csrand_uniform(), we don't need these functions.

This reverts commit 7c8fe29.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 27, 2023

Changes:

  • Clarify why -n%n works. [@hallyn]

@hallyn
Copy link
Member

hallyn commented Jan 28, 2023

I'm still reading the paper, but let's merge now :)

thanks. this has been educational for me.

@hallyn
Copy link
Member

hallyn commented Jan 28, 2023

Actually, I'll wait for a final ok from @alejandro-colomar - you're ready for this to merge?

@alejandro-colomar
Copy link
Collaborator Author

alejandro-colomar commented Jan 28, 2023 via email

@hallyn hallyn merged commit 848f53c into shadow-maint:master Jan 28, 2023
hallyn pushed a commit that referenced this pull request Jan 28, 2023
Use a different algorithm to minimize rejection.  This is essentially
the same algorithm implemented in the Linux kernel for
__get_random_u32_below(), but written in a more readable way, and
avoiding microopimizations that make it less readable.

Which (the Linux kernel implementation) is itself based on Daniel
Lemire's algorithm from "Fast Random Integer Generation in an Interval",
linked below.  However, I couldn't really understand that paper very
much, so I had to reconstruct the proofs from scratch, just from what I
could understand from the Linux kernel implementation source code.

I constructed some graphical explanation of how it works, and why it
is optimal, because I needed to visualize it to understand it.  It is
published in the GitHub pull request linked below.

Here goes a wordy explanation of why this algorithm based on
multiplication is better optimized than my original implementation based
on masking.

masking:

	It discards the extra bits of entropy that are not necessary for
	this operation.  This works as if dividing the entire space of
	possible csrand() values into smaller spaces of a size that is
	a smaller power of 2.  Each of those smaller spaces has a
	rejection band, so we get as many rejection bands as spaces
	there are.  For smaller values of 'n', the size of each
	rejection band is smaller, but having more rejection bands
	compensates for this, and results in the same inefficiency as
	for large values of 'n'.

multiplication:

	It divides the entire space of possible random numbers in
	chunks of size exactly 'n', so that there is only one rejection
	band that is the remainder of `2^64 % n`.  The worst case is
	still similar to the masking algorithm, a rejection band that is
	almost half the entire space (n = 2^63 + 1), but for lower
	values of 'n', by only having one small rejection band, it is
	much faster than the masking algorithm.

	This algorithm, however, has one caveat: the implementation
	is harder to read, since it relies on several bitwise tricky
	operations to perform operations like `2^64 % n`, `mult % 2^64`,
	and `mult / 2^64`.  And those operations are different depending
	on the number of bits of the maximum possible random number
	generated by the function.  This means that while this algorithm
	could also be applied to get uniform random numbers in the range
	[0, n-1] quickly from a function like rand(3), which only
	produces 31 bits of (non-CS) random numbers, it would need to be
	implemented differently.  However, that's not a concern for us,
	it's just a note so that nobody picks this code and expects it
	to just work with rand(3) (which BTW I tried for testing it, and
	got a bit confused until I realized this).

Finally, here's some light testing of this implementation, just to know
that I didn't goof it.  I pasted this function into a standalone
program, and run it many times to find if it has any bias (I tested also
to see how many iterations it performs, and it's also almost always 1,
but that test is big enough to not paste it here).

int main(int argc, char *argv[])
{
	printf("%lu\n", csrand_uniform(atoi(argv[1])));
}

$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
341
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
339
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l
338
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
336
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
328
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l
335
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
332
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
331
$ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l
327

This isn't a complete test for a cryptographically-secure random number
generator, of course, but I leave that for interested parties.

Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471>
Link: <#624 (comment)>
Link: <https://arxiv.org/abs/1805.10941>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Björn Esser <besser82@fedoraproject.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Sam James <sam@gentoo.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Iker Pedrosa <ipedrosa@redhat.com>
[Daniel Lemire: Added link to research paper in source code]
Cc: Daniel Lemire <daniel@lemire.me>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
@alejandro-colomar alejandro-colomar deleted the shadow_random branch January 29, 2023 19:37
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jun 19, 2024
This was broken during an optimization.  We need to start from a random
value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <shadow-maint#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <shadow-maint#638>
Link: <shadow-maint#634>
Link: <shadow-maint#624>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jun 19, 2024
This was broken during an un-optimization.  We need to start from a
random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <shadow-maint#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <shadow-maint#638>
Link: <shadow-maint#634>
Link: <shadow-maint#624>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jun 19, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <shadow-maint#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <shadow-maint#638>
Link: <shadow-maint#634>
Link: <shadow-maint#624>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jun 19, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <shadow-maint#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <shadow-maint#638>
Link: <shadow-maint#634>
Link: <shadow-maint#624>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit to alejandro-colomar/shadow that referenced this pull request Jun 20, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <shadow-maint#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <shadow-maint#638>
Link: <shadow-maint#634>
Link: <shadow-maint#624>
Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
hallyn pushed a commit that referenced this pull request Jun 21, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <#638>
Link: <#634>
Link: <#624>
Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit that referenced this pull request Jun 21, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <#638>
Link: <#634>
Link: <#624>
Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Cherry-picked-from: 4119a2dce564 ("lib/csrand.c: Fix the lower part of the domain of csrand_uniform()")
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Link: <#1025>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
alejandro-colomar added a commit that referenced this pull request Jun 21, 2024
I accidentally broke this code during an un-optimization.  We need to
start from a random value of the width of the limit, that is, 32 bits.

Thanks to Jason for pointing to his similar code in the kernel, which
made me see my mistake.

Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()")
Closes: <#1015>
Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: <#638>
Link: <#634>
Link: <#624>
Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Cherry-picked-from: 4119a2dce564 ("lib/csrand.c: Fix the lower part of the domain of csrand_uniform()")
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Link: <#1025>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
@thesamesam thesamesam mentioned this pull request Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants