Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

alejandro-colomar · 2022-12-30T19:03:09Z

No description provided.

lib/bit.h

zx2c4 · 2022-12-31T03:06:52Z

libmisc/random.c

+ * Similar to arc4random_uniform(3), but returns a u_long.
+ */
+unsigned long
+shadow_random_uniform(unsigned long upper_bound)


The algo I implemented in Linux might be a bit more efficient if the cost of multiplication is low compared to the cost of drawing a new random sample, which is usually the case:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f576b2593a978451416424e75f69ad1e3ae4efe

Hi Jason,

Hmmm, those Linux commits do too much magic to my taste. Even with the comment and the commit message, I can't still understand why the multiplication works.

So, for an initial implementation, I'd keep the simple truncation with &=, even if it's slower.

After that, well, for shadow, I don't think the performance is worth making the code cryptic. Moreover, since we use 64-bit numbers in this code, we would need 128-bit multiplication, which is not something supported everywhere.

However, it might be an interesting thing to do in glibc. We could discuss improving the glibc implementation of arc4random_uniform(3) if you think it is an improvement over the current glibc code (please check). If we do that, I suggest that you prepare a very detailed description of why the multiplication works, because at least in my case (and I suspect for many other programmers too), I already forgot the details of how integer multiplication works bitwise from when I studied that at school, and it's something I never use, so it's hard to understand why it serves the purpose here.

[edited; after writing the answer to Theo, I discarded some ideas]

Cheers,

Alex

Lemire has proofs: https://arxiv.org/abs/1805.10941

Your previous comment had a mention of expanding the kernel API. Not sure what you're proposing, but I've got a patchset under review that will make arc4random() and company both safe and fast.

Lemire has proofs: https://arxiv.org/abs/1805.10941

Thanks!

Your previous comment had a mention of expanding the kernel API. Not sure what you're proposing,

I'll restore the message here, since it seems that arc4random*(3) can't be fixed because the bugs have percolated to users that now depend on the design issues. That's fair, though; nothing surprising in C's history.

The removed part of my message was:

"
But, don't hurry; after Theo's email, I'm starting to strongly believe that the arc4random_uniform(3) interface is really bad, and you were right when you suggested a few months ago that we should reconsider if we want to add it, or maybe design a better API. I will propose deprecating arc4random*(3) functions in glibc, and come up with a better set of interfaces, maybe implemented entirely in the kernel.

I'll also send some draft patches to the kernel, to start discussion about new functions that improve over the current get_random*() that you showed, and from that discussion we can maybe come up with new syscalls.
"

but I've got a patchset under review that will make arc4random() and company both safe and fast.

Please add me to the loop; I'm curious. What's the link?

I finally understood how the multiplication method works (I needed to viasualize it). Here's some reduction to 3 bits, which (at least to me) clarifies how the rejection works. If you go to 4 bits, you can see that the rejection decreases considerably, and the more bits you add, the less rejection you'll see for low values of N.

This is a graph of the probability of looping, for the algorithm with multiplication (for 32 bits):

From those saw lines, only the last one goes up to 0.5. As you go closer to 0, they are smaller. This is compared to my implementation with masks, where all the saw lines go up to 0.5.

This comes from the fact that masking is discarding entropy bits. glibc keeps those entropy bits in a loop, but it's simpler to just use multiplication to make use of them in a single operation. And really, glibc's algorithm still discards some entropy compared to multiplication, but it's a small amount.

Worst case is still the same 0.5, but with this optimization, it happens less often.

@ikerexxe , @hallyn

This is a considerable performance improvement for low values of N. The only downside is that it requires using a type wider than long, which would need to be unsigned __int128. It's a GNU extension, so I don't know how much it's viable. The current approach of masking is not so bad, it will normally just call a handful of times getrandom(2), but it could be much better.

Here goes an (incomplete and less detailed) 4-bit visual explanation of how it works, to compare it to the 3-bit one from above. We can see that the rejection is smaller for low values of N compared to the same values of N when the original random number generator has less bits.

alejandro-colomar · 2023-01-01T17:28:59Z

Changes:

Rename shadow_random_uniform() to shadow_random_below().

Since our function behaves slightly different than arc4random_uniform(3), it's better not to have a name that would cause confusion. Instead, use a name similar to the Linux kernel internal functions that behave like ours.

alejandro-colomar · 2023-01-08T02:07:46Z

Changes:

Fix undefined behavior in include guard (leading underscore + uppercase is reserved).
Add bit_ceil_wrapul().
Rename functions to distance from arc4random(3), and at the same time, make more clear that the functions produce cryptographically-secure values. However, keep some similarity with arc4random() in what is useful, for some consistency.
Improve comments documenting these functions.
Fix Undefined Behavior in implementation of shadow_random_uniform()/csrand_uniform(), which was due to a right shift by 32. Use instead bit_ceil_wrapul(), which has no UB.

alejandro-colomar · 2023-01-16T12:29:23Z

Changes: Rebase to master

alejandro-colomar · 2023-01-16T12:46:32Z

I think this is ready for review. This time, I didn't add extensive documentation for these functions, because I think the one-liner comment is clear enough. In fact, the manual page I wrote for the arc4random(3) functions is similarly short (see below), and most of the differences are noting the bugs in the libc functions. If you consider some more documentation is merited, please comment.

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator

LIBRARY
       Standard C library (libc, ‐lc)

SYNOPSIS
       #include <stdlib.h>

       uint32_t arc4random(void);
       uint32_t arc4random_uniform(uint32_t upper_bound);
       void arc4random_buf(void buf[.n], size_t n);

DESCRIPTION
       These functions give cryptographically‐secure pseudorandom numbers.

       arc4random() returns a uniformly‐distributed value.

       arc4random_uniform() returns a uniformly‐distributed  value  less  than
       upper_bound (see BUGS).

       arc4random_buf()  fills  the  memory pointed to by buf, with n bytes of
       pseudorandom data.

       The rand(3) and drand48(3) families of functions should  only  be  used
       where  the  quality  of  the  pseudorandom numbers is not a concern and
       there’s a need for repeatability of the results.  Unless you meet  both
       of those conditions, use the arc4random() functions.

RETURN VALUE
       arc4random() returns a pseudorandom number.

       arc4random_uniform()  returns  a  pseudorandom  number  less  than  up‐
       per_bound for valid input, or 0 when upper_bound is invalid.

ATTRIBUTES
       For an explanation of the terms  used  in  this  section,  see  attrib‐
       utes(7).
       ┌────────────────────────────────────────────┬───────────────┬─────────┐
       │Interface                                   │ Attribute     │ Value   │
       ├────────────────────────────────────────────┼───────────────┼─────────┤
       │arc4random(), arc4random_uniform(),         │ Thread safety │ MT‐Safe │
       │arc4random_buf()                            │               │         │
       └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
       These nonstandard functions are present on several Unix systems.

BUGS
       An  upper_bound  of  0  doesn’t make sense in a call to arc4random_uni‐
       form().  Such a call will fail, and return 0.  Be careful,  since  that
       value  is  not less than upper_bound.  In some cases, such as accessing
       an array, using that value could result in Undefined Behavior.

SEE ALSO
       getrandom(3), rand(3), drand48(3), random(7)

Linux man‐pages (unreleased)        (date)                       arc4random(3)

alejandro-colomar · 2023-01-16T13:06:04Z

Changes:

Use the term interval, rather than range, since interval is more correct mathematically, and so it will be more precise for users of the API what it does.

ikerexxe · 2023-01-16T15:50:54Z

I have several questions before being able to further review this PR.

What does UB mean?
What will bit_ceil_wrapul() and leading_zerosul() do? A search on the internet hasn't provided any meaningful answer.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

Finally, a request. Can you move the bit manipulation functions commit? It took me a context switch to understand why you were adding those functions, and then you didn't use the functions provided in those commits in the next commit, so I had another context switch.

alejandro-colomar · 2023-01-16T21:06:57Z

I have several questions before being able to further review this PR.
* What does UB mean?

Ahh, sorry. UB is an abbreviation for Undefined Behavior (aka Nasal Demons; now I wonder if someone ever wrote a program called nasald :P ). It's a common abbreviation in some forums, and I sometimes assume it's common everywhere :)

https://port70.net/%7Ensz/c/c11/n1570.html#3.4.3

* What will bit_ceil_wrapul() and leading_zerosul() do? A search on the internet hasn't provided any meaningful answer.

They are similar to planned additions to ISO C23.

https://thephd.dev/c23-is-coming-here-is-what-is-on-the-menu#n3022---modern-bit-utilities
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf#subsection.7.18.3
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3054.pdf#subsection.7.18.16

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

Finally, a request. Can you move the bit manipulation functions commit? It took me a context switch to understand why you were adding those functions, and then you didn't use the functions provided in those commits in the next commit, so I had another context switch.

Ahh, yes, I thought I had put it just before its use. I'll move it.

Happy new year :)

Alex

alejandro-colomar · 2023-01-16T22:03:46Z

Changes:

Reorder commits.
Change comment to expand UB into "Undefined Behavior".

ikerexxe

Some inline questions to better understand the code.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

This comment is still valid.

lib/bit.h

libmisc/csrand.c

ikerexxe · 2023-01-17T10:08:58Z

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator

I guess this manual page was updated recently. Am I right? I cannot see it like that in my system.

alejandro-colomar · 2023-01-17T12:12:24Z

arc4random(3)              Library Functions Manual              arc4random(3)

NAME
       arc4random,  arc4random_uniform, arc4random_buf - cryptographically‐se‐
       cure pseudorandom number generator
I guess this manual page was updated recently. Am I right? I cannot see it like that in my system.

Yes, I added it in 2023 :)

alx@debian:~/src/linux/man-pages/man-pages/main$ git log -- man3/arc4random.3
commit 3638ab3f148ad661e731e4d5d744c2fd32c463f0
Author: Alejandro Colomar <alx@kernel.org>
Date:   Thu Jan 5 20:14:19 2023 +0100

    arc4random.3: Raise the severity of the CAVEATS to BUGS
    
    This is a misdesign that the original OpenBSD developers fail to
    acknowledge.
    
    Link: <https://inbox.sourceware.org/libc-alpha/068b01c4-d0c4-0849-eabb-09c020a1480b@gmail.com/T/>
    Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
    Cc: Arsen Arsenović <arsen@aarsen.me>
    Cc: Sam James <sam@gentoo.org>
    Cc: "Serge E. Hallyn" <serge@hallyn.com>
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

commit 477c35bc1b0a785bd0b1eb5d57943b0483390519
Author: Alejandro Colomar <alx@kernel.org>
Date:   Mon Jan 2 12:08:31 2023 +0100

    arc4random.3: Be consistent in uses of pseudorandom
    
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

commit 848404feb5bfbdcb9b8f693ec29159231ee2a7c8
Author: Alejandro Colomar <alx@kernel.org>
Date:   Sun Jan 1 17:18:22 2023 +0100

    arc4random.3: New page documenting the arc4random(3) family of functions
    
    arc4random(3)
    arc4random_uniform(3)
    arc4random_buf(3)
    
    Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar · 2023-01-17T12:13:48Z

Changes:

Fix typo in commit message: reutrns -> returns

Some inline questions to better understand the code.

In the Add csrand_uniform() commit you wrote reutrns instead of returns.

This comment is still valid.

Oops, fixed.

arc4random(3) returns a number. arc4random_buf(3) fills a buffer. arc4random_uniform(3) returns a number less than a bound. and I'd add a hypothetical one which we use: *_interval() should return a number within the interval [min, max]. In reality, the function being called csrand() in this patch is not really cryptographically secure, since it had a bias, but a subsequent patch will fix that. Signed-off-by: Alejandro Colomar <alx@kernel.org>

A set of APIs similar to arc4random(3) is complex enough to deserve its own file. Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Signed-off-by: Alejandro Colomar <alx@kernel.org>

These functions implement bit manipulation APIs, which will be added to C23, so that in the far future, we will be able to replace our functions by the standard ones, just by adding the stdc_ prefix, and including <stdbit.h>. However, we need to avoid UB for an input of 0, so slightly deviate from C23, and use a different name (with _wrap) for distunguishing our API from the standard one. Cc: Joseph Myers <joseph@codesourcery.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Alejandro Colomar <alx@kernel.org>

This API is similar to arc4random_uniform(3). However, for an input of 0, this function is equivalent to csrand(), while arc4random_uniform(0) returns 0. This function will be used to reimplement csrand_interval() as a wrapper around this one. The current implementation of csrand_interval() doesn't produce very good random numbers. It has a bias. And that comes from performing some unnecessary floating-point calculations that overcomplicate the problem. Looping until the random number hits within bounds is unbiased, and truncating unwanted bits makes the overhead of the loop very small. We could reduce loop overhead even more, by keeping unused bits of the random number, if the width of the mask is not greater than ULONG_WIDTH/2, however, that complicates the code considerably, and I prefer to be a bit slower but have simple code. BTW, Björn really deserves the copyright for csrand() (previously known as read_random_bytes()), since he rewrote it almost from scratch last year, and I kept most of its contents. Since he didn't put himself in the copyright back then, and BSD-3-Clause doesn't allow me to attribute derived works, I won't add his name, but if he asks, he should be put in the copyright too. Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Sam James <sam@gentoo.org> Signed-off-by: Alejandro Colomar <alx@kernel.org>

The old code didn't produce very good random numbers. It had a bias. And that was from performing some unnecessary floating-point calculations that overcomplicate the problem. Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Sam James <sam@gentoo.org> Signed-off-by: Alejandro Colomar <alx@kernel.org>

It is common to use the expression 'sizeof(x) * CHAR_BIT' to mean the width in bits of a type or object. Now that there are _WIDTH macros for some types, indicating the number of bits that they use, it makes sense to wrap this calculation in a macro of a similar name. Signed-off-by: Alejandro Colomar <alx@kernel.org>

Signed-off-by: Alejandro Colomar <alx@kernel.org>

Use a different algorithm to minimize rejection. This is essentially the same algorithm implemented in the Linux kernel for __get_random_u32_below(), but written in a more readable way, and avoiding microopimizations that make it less readable. Which (the Linux kernel implementation) is itself based on Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. However, I couldn't really understand that paper very much, so I had to reconstruct the proofs from scratch, just from what I could understand from the Linux kernel implementation source code. I constructed some graphical explanation of how it works, and why it is optimal, because I needed to visualize it to understand it. It is published in the GitHub pull request linked below. Here goes a wordy explanation of why this algorithm based on multiplication is better optimized than my original implementation based on masking. masking: It discards the extra bits of entropy that are not necessary for this operation. This works as if dividing the entire space of possible csrand() values into smaller spaces of a size that is a smaller power of 2. Each of those smaller spaces has a rejection band, so we get as many rejection bands as spaces there are. For smaller values of 'n', the size of each rejection band is smaller, but having more rejection bands compensates for this, and results in the same inefficiency as for large values of 'n'. multiplication: It divides the entire space of possible random numbers in chunks of size exactly 'n', so that there is only one rejection band that is the remainder of `2^64 % n`. The worst case is still similar to the masking algorithm, a rejection band that is almost half the entire space (n = 2^63 + 1), but for lower values of 'n', by only having one small rejection band, it is much faster than the masking algorithm. This algorithm, however, has one caveat: the implementation is harder to read, since it relies on several bitwise tricky operations to perform operations like `2^64 % n`, `mult % 2^64`, and `mult / 2^64`. And those operations are different depending on the number of bits of the maximum possible random number generated by the function. This means that while this algorithm could also be applied to get uniform random numbers in the range [0, n-1] quickly from a function like rand(3), which only produces 31 bits of (non-CS) random numbers, it would need to be implemented differently. However, that's not a concern for us, it's just a note so that nobody picks this code and expects it to just work with rand(3) (which BTW I tried for testing it, and got a bit confused until I realized this). Finally, here's some light testing of this implementation, just to know that I didn't goof it. I pasted this function into a standalone program, and run it many times to find if it has any bias (I tested also to see how many iterations it performs, and it's also almost always 1, but that test is big enough to not paste it here). int main(int argc, char *argv[]) { printf("%lu\n", csrand_uniform(atoi(argv[1]))); } $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 341 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 339 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 338 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 336 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 328 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 335 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 332 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 331 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 327 This isn't a complete test for a cryptographically-secure random number generator, of course, but I leave that for interested parties. Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471> Link: <shadow-maint#624 (comment)> Link: <https://arxiv.org/abs/1805.10941> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Sam James <sam@gentoo.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Iker Pedrosa <ipedrosa@redhat.com> [Daniel Lemire: Added link to research paper in source code] Cc: Daniel Lemire <daniel@lemire.me> Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar · 2023-01-27T12:36:11Z

Changes: rebase to master

Use a different algorithm to minimize rejection. This is essentially the same algorithm implemented in the Linux kernel for __get_random_u32_below(), but written in a more readable way, and avoiding microopimizations that make it less readable. Which (the Linux kernel implementation) is itself based on Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. However, I couldn't really understand that paper very much, so I had to reconstruct the proofs from scratch, just from what I could understand from the Linux kernel implementation source code. I constructed some graphical explanation of how it works, and why it is optimal, because I needed to visualize it to understand it. It is published in the GitHub pull request linked below. Here goes a wordy explanation of why this algorithm based on multiplication is better optimized than my original implementation based on masking. masking: It discards the extra bits of entropy that are not necessary for this operation. This works as if dividing the entire space of possible csrand() values into smaller spaces of a size that is a smaller power of 2. Each of those smaller spaces has a rejection band, so we get as many rejection bands as spaces there are. For smaller values of 'n', the size of each rejection band is smaller, but having more rejection bands compensates for this, and results in the same inefficiency as for large values of 'n'. multiplication: It divides the entire space of possible random numbers in chunks of size exactly 'n', so that there is only one rejection band that is the remainder of `2^64 % n`. The worst case is still similar to the masking algorithm, a rejection band that is almost half the entire space (n = 2^63 + 1), but for lower values of 'n', by only having one small rejection band, it is much faster than the masking algorithm. This algorithm, however, has one caveat: the implementation is harder to read, since it relies on several bitwise tricky operations to perform operations like `2^64 % n`, `mult % 2^64`, and `mult / 2^64`. And those operations are different depending on the number of bits of the maximum possible random number generated by the function. This means that while this algorithm could also be applied to get uniform random numbers in the range [0, n-1] quickly from a function like rand(3), which only produces 31 bits of (non-CS) random numbers, it would need to be implemented differently. However, that's not a concern for us, it's just a note so that nobody picks this code and expects it to just work with rand(3) (which BTW I tried for testing it, and got a bit confused until I realized this). Finally, here's some light testing of this implementation, just to know that I didn't goof it. I pasted this function into a standalone program, and run it many times to find if it has any bias (I tested also to see how many iterations it performs, and it's also almost always 1, but that test is big enough to not paste it here). int main(int argc, char *argv[]) { printf("%lu\n", csrand_uniform(atoi(argv[1]))); } $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 341 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 339 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 338 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 336 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 328 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 335 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 332 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 331 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 327 This isn't a complete test for a cryptographically-secure random number generator, of course, but I leave that for interested parties. Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471> Link: <shadow-maint#624 (comment)> Link: <https://arxiv.org/abs/1805.10941> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Sam James <sam@gentoo.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Iker Pedrosa <ipedrosa@redhat.com> [Daniel Lemire: Added link to research paper in source code] Cc: Daniel Lemire <daniel@lemire.me> Signed-off-by: Alejandro Colomar <alx@kernel.org>

Now that we optimized csrand_uniform(), we don't need these functions. This reverts commit 7c8fe29. Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar · 2023-01-27T12:38:08Z

Changes:

Clarify why -n%n works. [@hallyn]

hallyn · 2023-01-28T02:03:31Z

I'm still reading the paper, but let's merge now :)

thanks. this has been educational for me.

hallyn · 2023-01-28T02:04:04Z

Actually, I'll wait for a final ok from @alejandro-colomar - you're ready for this to merge?

alejandro-colomar · 2023-01-28T03:06:44Z

Hi Serge!

On Sat, Jan 28, 2023, 03:04 Serge Hallyn ***@***.***> wrote: Actually, I'll wait for a final ok from @alejandro-colomar <https://github.com/alejandro-colomar> - you're ready for this to merge?

Sure :) Unless you'd like me to clarify anything a bit more in the commit messages or comments, I think it's fine. Cheers, Alex

Use a different algorithm to minimize rejection. This is essentially the same algorithm implemented in the Linux kernel for __get_random_u32_below(), but written in a more readable way, and avoiding microopimizations that make it less readable. Which (the Linux kernel implementation) is itself based on Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. However, I couldn't really understand that paper very much, so I had to reconstruct the proofs from scratch, just from what I could understand from the Linux kernel implementation source code. I constructed some graphical explanation of how it works, and why it is optimal, because I needed to visualize it to understand it. It is published in the GitHub pull request linked below. Here goes a wordy explanation of why this algorithm based on multiplication is better optimized than my original implementation based on masking. masking: It discards the extra bits of entropy that are not necessary for this operation. This works as if dividing the entire space of possible csrand() values into smaller spaces of a size that is a smaller power of 2. Each of those smaller spaces has a rejection band, so we get as many rejection bands as spaces there are. For smaller values of 'n', the size of each rejection band is smaller, but having more rejection bands compensates for this, and results in the same inefficiency as for large values of 'n'. multiplication: It divides the entire space of possible random numbers in chunks of size exactly 'n', so that there is only one rejection band that is the remainder of `2^64 % n`. The worst case is still similar to the masking algorithm, a rejection band that is almost half the entire space (n = 2^63 + 1), but for lower values of 'n', by only having one small rejection band, it is much faster than the masking algorithm. This algorithm, however, has one caveat: the implementation is harder to read, since it relies on several bitwise tricky operations to perform operations like `2^64 % n`, `mult % 2^64`, and `mult / 2^64`. And those operations are different depending on the number of bits of the maximum possible random number generated by the function. This means that while this algorithm could also be applied to get uniform random numbers in the range [0, n-1] quickly from a function like rand(3), which only produces 31 bits of (non-CS) random numbers, it would need to be implemented differently. However, that's not a concern for us, it's just a note so that nobody picks this code and expects it to just work with rand(3) (which BTW I tried for testing it, and got a bit confused until I realized this). Finally, here's some light testing of this implementation, just to know that I didn't goof it. I pasted this function into a standalone program, and run it many times to find if it has any bias (I tested also to see how many iterations it performs, and it's also almost always 1, but that test is big enough to not paste it here). int main(int argc, char *argv[]) { printf("%lu\n", csrand_uniform(atoi(argv[1]))); } $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 341 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 339 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 1 | wc -l 338 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 336 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 328 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 2 | wc -l 335 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 332 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 331 $ seq 1 1000 | while read _; do ./a.out 3; done | grep 0 | wc -l 327 This isn't a complete test for a cryptographically-secure random number generator, of course, but I leave that for interested parties. Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a688bcb19348862afe30d7c85bc37c4c293471> Link: <#624 (comment)> Link: <https://arxiv.org/abs/1805.10941> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Björn Esser <besser82@fedoraproject.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Joseph Myers <joseph@codesourcery.com> Cc: Sam James <sam@gentoo.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Iker Pedrosa <ipedrosa@redhat.com> [Daniel Lemire: Added link to research paper in source code] Cc: Daniel Lemire <daniel@lemire.me> Signed-off-by: Alejandro Colomar <alx@kernel.org>

This was broken during an optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <shadow-maint#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <shadow-maint#638> Link: <shadow-maint#634> Link: <shadow-maint#624> Signed-off-by: Alejandro Colomar <alx@kernel.org>

This was broken during an un-optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <shadow-maint#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <shadow-maint#638> Link: <shadow-maint#634> Link: <shadow-maint#624> Signed-off-by: Alejandro Colomar <alx@kernel.org>

I accidentally broke this code during an un-optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <shadow-maint#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <shadow-maint#638> Link: <shadow-maint#634> Link: <shadow-maint#624> Signed-off-by: Alejandro Colomar <alx@kernel.org>

I accidentally broke this code during an un-optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <shadow-maint#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <shadow-maint#638> Link: <shadow-maint#634> Link: <shadow-maint#624> Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Signed-off-by: Alejandro Colomar <alx@kernel.org>

I accidentally broke this code during an un-optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <#638> Link: <#634> Link: <#624> Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Signed-off-by: Alejandro Colomar <alx@kernel.org>

I accidentally broke this code during an un-optimization. We need to start from a random value of the width of the limit, that is, 32 bits. Thanks to Jason for pointing to his similar code in the kernel, which made me see my mistake. Fixes: 2a61122 ("Unoptimize the higher part of the domain of csrand_uniform()") Closes: <#1015> Reported-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Link: <https://git.zx2c4.com/linux-rng/tree/drivers/char/random.c#n535> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: <#638> Link: <#634> Link: <#624> Tested-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Reviewed-by: Michael Brunnbauer <https://github.com/michaelbrunnbauer> Signed-off-by: Alejandro Colomar <alx@kernel.org> Cherry-picked-from: 4119a2dce564 ("lib/csrand.c: Fix the lower part of the domain of csrand_uniform()") Cc: "Serge E. Hallyn" <serge@hallyn.com> Link: <#1025> Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar force-pushed the shadow_random branch 4 times, most recently from c040366 to 88e04c5 Compare December 30, 2022 23:57

github-advanced-security bot found potential problems Dec 31, 2022

View reviewed changes

lib/bit.h Fixed Show fixed Hide fixed

alejandro-colomar force-pushed the shadow_random branch 3 times, most recently from a6aafcb to 1fd52d2 Compare December 31, 2022 01:23

zx2c4 reviewed Dec 31, 2022

View reviewed changes

alejandro-colomar force-pushed the shadow_random branch from 1fd52d2 to 9773405 Compare January 1, 2023 17:26

alejandro-colomar force-pushed the shadow_random branch from 9773405 to dd11565 Compare January 8, 2023 02:01

alejandro-colomar force-pushed the shadow_random branch from dd11565 to 6593ba9 Compare January 8, 2023 15:34

alejandro-colomar force-pushed the shadow_random branch from 6593ba9 to dd04da6 Compare January 16, 2023 12:28

alejandro-colomar marked this pull request as ready for review January 16, 2023 12:43

alejandro-colomar force-pushed the shadow_random branch from dd04da6 to e8cb233 Compare January 16, 2023 13:04

alejandro-colomar force-pushed the shadow_random branch 2 times, most recently from 9bc5838 to a20792d Compare January 16, 2023 22:03

ikerexxe reviewed Jan 17, 2023

View reviewed changes

lib/bit.h Outdated Show resolved Hide resolved

libmisc/csrand.c Outdated Show resolved Hide resolved

libmisc/csrand.c Outdated Show resolved Hide resolved

alejandro-colomar force-pushed the shadow_random branch from a20792d to 389efe1 Compare January 17, 2023 12:13

alejandro-colomar added 7 commits January 27, 2023 13:35

Use WIDTHOF() instead of its expansion

275a785

Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar force-pushed the shadow_random branch from 1def9b4 to 9cc1bb4 Compare January 27, 2023 12:35

alejandro-colomar added 2 commits January 27, 2023 13:37

Revert "Add bit manipulation functions"

439075a

Now that we optimized csrand_uniform(), we don't need these functions. This reverts commit 7c8fe29. Signed-off-by: Alejandro Colomar <alx@kernel.org>

alejandro-colomar force-pushed the shadow_random branch from 9cc1bb4 to 439075a Compare January 27, 2023 12:37

hallyn merged commit 848f53c into shadow-maint:master Jan 28, 2023

alejandro-colomar deleted the shadow_random branch January 29, 2023 19:37

alejandro-colomar mentioned this pull request Jun 19, 2024

lib/csrand.c: Fix the lower part of the domain of csrand_uniform() #1025

Merged

thesamesam mentioned this pull request Jul 18, 2024

Use liba2i #1049

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

alejandro-colomar commented Dec 30, 2022

zx2c4 Dec 31, 2022

alejandro-colomar Dec 31, 2022 •

edited

Loading

zx2c4 Jan 1, 2023

alejandro-colomar Jan 1, 2023

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 18, 2023 •

edited

Loading

alejandro-colomar commented Jan 1, 2023

alejandro-colomar commented Jan 8, 2023

alejandro-colomar commented Jan 16, 2023

alejandro-colomar commented Jan 16, 2023 •

edited

Loading

alejandro-colomar commented Jan 16, 2023

ikerexxe commented Jan 16, 2023

alejandro-colomar commented Jan 16, 2023 •

edited

Loading

alejandro-colomar commented Jan 16, 2023

ikerexxe left a comment

ikerexxe commented Jan 17, 2023

alejandro-colomar commented Jan 17, 2023

alejandro-colomar commented Jan 17, 2023 •

edited

Loading

alejandro-colomar commented Jan 27, 2023

alejandro-colomar commented Jan 27, 2023 •

edited

Loading

hallyn commented Jan 28, 2023

hallyn commented Jan 28, 2023

alejandro-colomar commented Jan 28, 2023 via email

Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

Remove bias from functions that get randomness, simplify their implementation, and move to separate file #624

Conversation

alejandro-colomar commented Dec 30, 2022

zx2c4 Dec 31, 2022

Choose a reason for hiding this comment

alejandro-colomar Dec 31, 2022 • edited Loading

Choose a reason for hiding this comment

zx2c4 Jan 1, 2023

Choose a reason for hiding this comment

alejandro-colomar Jan 1, 2023

Choose a reason for hiding this comment

alejandro-colomar Jan 17, 2023 • edited Loading

Choose a reason for hiding this comment

alejandro-colomar Jan 17, 2023 • edited Loading

Choose a reason for hiding this comment

alejandro-colomar Jan 17, 2023 • edited Loading

Choose a reason for hiding this comment

alejandro-colomar Jan 18, 2023 • edited Loading

Choose a reason for hiding this comment

alejandro-colomar commented Jan 1, 2023

alejandro-colomar commented Jan 8, 2023

alejandro-colomar commented Jan 16, 2023

alejandro-colomar commented Jan 16, 2023 • edited Loading

alejandro-colomar commented Jan 16, 2023

ikerexxe commented Jan 16, 2023

alejandro-colomar commented Jan 16, 2023 • edited Loading

alejandro-colomar commented Jan 16, 2023

ikerexxe left a comment

Choose a reason for hiding this comment

ikerexxe commented Jan 17, 2023

alejandro-colomar commented Jan 17, 2023

alejandro-colomar commented Jan 17, 2023 • edited Loading

alejandro-colomar commented Jan 27, 2023

alejandro-colomar commented Jan 27, 2023 • edited Loading

hallyn commented Jan 28, 2023

hallyn commented Jan 28, 2023

alejandro-colomar commented Jan 28, 2023 via email

alejandro-colomar Dec 31, 2022 •

edited

Loading

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 17, 2023 •

edited

Loading

alejandro-colomar Jan 18, 2023 •

edited

Loading

alejandro-colomar commented Jan 16, 2023 •

edited

Loading

alejandro-colomar commented Jan 16, 2023 •

edited

Loading

alejandro-colomar commented Jan 17, 2023 •

edited

Loading

alejandro-colomar commented Jan 27, 2023 •

edited

Loading