Future of `thread_rng` #463
dhardy added E-question X-discussion labels May 17, 2018
Using feature flags can affect dependencies, right? This might break assumptions of dependencies, so this seems dangerous. I would prefer to use different types instead and require the dependencies to be generic in the type of thread RNG. Instead of providing several variants of I think |
Yes, feature flags can affect dependencies, which is why I suggested what I did above. Using a custom version of
This is why I would not recommend using such a feature flag on servers. Interesting idea trying to use such an attack in multiplayer games, but there is no benefit for synchronous-step models (common in RTS) or attacking the server in server-centric models (common for FPS), and even if it were achievable vs other players the consequences are not high. Going back to my proposal, I prefer
I tried to argue that dependencies should make it possible for the user to choose the RNG, which wouldn't have this problem. If the thread RNG is not exposed via the API, it is an implementation detail and I think it should not be affected by feature flags, because it can have unintended consequences. |
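A minimal sketch of the "dependencies generic in the RNG type" idea from the comment above. `SimpleRng`, `Lcg` and `shuffle` are hypothetical names made up for illustration; this is not rand's actual trait or API, and the toy generator is not secure.

```rust
// Hypothetical stand-in for an RNG trait (NOT rand's actual API).
trait SimpleRng {
    fn next_u32(&mut self) -> u32;
}

// Toy 64-bit linear congruential generator; fast but not secure.
struct Lcg(u64);

impl SimpleRng for Lcg {
    fn next_u32(&mut self) -> u32 {
        // Multiplier and increment from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Return the high 32 bits of the state.
        (self.0 >> 32) as u32
    }
}

// Library code: generic over the generator, so the caller (not a
// feature flag) decides which RNG is used.
fn shuffle<R: SimpleRng>(items: &mut [u32], rng: &mut R) {
    // Fisher-Yates shuffle; the modulo bias is ignored for brevity.
    for i in (1..items.len()).rev() {
        let j = rng.next_u32() as usize % (i + 1);
        items.swap(i, j);
    }
}

fn main() {
    let mut items: Vec<u32> = (0..10).collect();
    let mut rng = Lcg(42);
    shuffle(&mut items, &mut rng);
    println!("{:?}", items);
}
```

With this shape, an application that needs a secure generator passes one in explicitly, and the library's behaviour does not change behind its back.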
Related to this, there are security trade-offs to decide:
Given independent |
@vks I don't get what you mean; you're saying use |
I agree that having a Based on that I'd be tempted to provide very few security guarantees for I don't know enough about CSPRNG to have opinions about what guarantees that On the topic of feature flags: Having feature flags which change what guarantees A better approach would be to encourage libraries which use rand to include feature flags which make them use different
While I partly agree with you @sicking it is important to consider uses like in I only see two solutions for this:
A few comments on the first post, but I haven't thought everything through yet...
I am really not that concerned with the memory usage. While 4kb is a lot, it is also really not that much for any system that the standard library is available on. The init time is something to worry about a bit more. On my PC it takes 5265 ns to initialize For the situation with many worker threads, isn't it better to use the scheme I ended up with of splitting an RNG using a wrapper such as Performance with
Performance with
Performance with
Retrieving the RNG from TLS greatly dominates the cost here.
Yes, that is a big argument. It is the call site that determines whether On the other hand, we already reserve the right to change the algorithm of
Renaming Still, having a thread-local variant using a fast RNG does not offer much advantage in the common case unless the RNG is cached.
The performance of RDRAND, for comparison (1 value per iteration, instead of 1000 in the previous benchmarks):
RDRAND is 5-10× slower than the current
Would you really recommend |
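The observation above that retrieving the RNG from TLS dominates the cost can be illustrated with a std-only sketch. `ToyRng` is a hypothetical xorshift64 stand-in, not rand's `ThreadRng`, and it is not secure; the point is only the difference between looking up thread-local storage per value and looking it up once per batch.

```rust
use std::cell::Cell;
use std::thread;

// Hypothetical toy generator (xorshift64); NOT rand's ThreadRng.
#[derive(Clone, Copy)]
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

thread_local! {
    // Fixed non-zero seed, for illustration only.
    static TLS_RNG: Cell<ToyRng> = Cell::new(ToyRng(0x9E37_79B9_7F4A_7C15));
}

// Looks up thread-local storage on every call.
fn next_via_tls() -> u64 {
    TLS_RNG.with(|cell| {
        let mut rng = cell.get();
        let value = rng.next_u64();
        cell.set(rng);
        value
    })
}

// One TLS lookup per generated value.
fn sum_uncached(n: usize) -> u64 {
    (0..n).fold(0u64, |acc, _| acc.wrapping_add(next_via_tls()))
}

// One TLS lookup for the whole batch; the loop works on a local copy.
fn sum_cached(n: usize) -> u64 {
    TLS_RNG.with(|cell| {
        let mut rng = cell.get();
        let mut sum = 0u64;
        for _ in 0..n {
            sum = sum.wrapping_add(rng.next_u64());
        }
        cell.set(rng);
        sum
    })
}

fn main() {
    // Each spawned thread starts from a fresh thread-local state,
    // so both variants consume the same stream of values.
    let a = thread::spawn(|| sum_uncached(1000)).join().unwrap();
    let b = thread::spawn(|| sum_cached(1000)).join().unwrap();
    assert_eq!(a, b);
    println!("both variants produced the same sum: {}", a);
}
```

Both functions yield the same values; any speed difference between them would need to be measured with a proper benchmark rather than assumed.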
But what is the As I said in the first post, ultimately such usage is about trust and risk, but the |
I feel like there are two very common use cases which I think would be great to have very easy-to-access APIs for:
I was thinking 1 was Maybe what I'm asking for for 2 is more dhardy#60 plus an empty And to be clear, I think there are many uses of RNGs which do not fall into either of these categories. For these I think having APIs which are explicit about which RNG algorithm is used and what the source of the seed is, is the way to go. That way developers can choose whatever algorithm provides the tradeoffs that match their requirements.
I don't know if However the point is that if 2 is not fine for Though in reality I suspect that specifically |
Category 2 (simply fast) is actually pretty easy. Our current But when it comes to seeding a hash algorithm for |
I don't think this is well-defined. You can usually make RNGs faster by increasing the size of the state, because it allows for more instruction parallelism. Also, in which regard should it be fast? Initialization? Generating |
I think the dependencies should use use |
This works for generating random bytes faster, but not necessarily for getting random data into other code faster. As noted in the doc, it's necessary to benchmark your code to find the fastest RNG, so it's pointless trying to find "the fastest RNG" for |
fschutt commented May 23, 2018
A question: In the latest release (rand 0.5.0), |
It requires |
There are not many RNGs for which this is true, but does help a bit for Xorshift and Xoroshiro. It does come at the cost of using more registers, so it may work better in benchmarks than in real code. |
I was also thinking of vectorization.
Is this really about the overhead of TLS or about checking that the RNG was initialised? Because any call to @pitdicker your More important though is that any approach using splitting won't work with the current API, since new threads would need a "master" to initialise from. I don't know if
This is partially intentional to force users to choose between
Not in its current form, no. With extra protections (also using RDRAND or with forward secrecy), perhaps. But I don't think either of these can be added without a big performance decrease, so if we did add something like this it would be distinct from
I don't think this has any extra requirements here. @pitdicker has been experimenting with using After reviewing this again I only really see these options:
Any further thoughts? I like the idea of the latter but it does make Rand more complex. |
The details of how thread-local storage works are still a bit fuzzy for me, and the implementation in the standard library is spread over quite a few files. Checking if it is initialized is one part. There may also be some indirection, because it has to cross a crate boundary. And at some point (on Unix) it uses
I ended up with a better scheme using splitting. But I only mentioned it in the context of many worker threads using a fast RNG.
The main problem is that some sort of I am afraid it can be confusing/disappointing for users. If you don't know how things work, the current But to use When you get to that point, isn't it just as easy to seed a small PRNG of your own choosing, possibly from Not sure what I want to say really
I am not against this, but also don't really see the problem of using more memory. Do you really think there are situations where we have TLS, but don't have abundant memory?
We are really not in such a bad state in my opinion (although the |
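As a rough illustration of the splitting idea for many worker threads mentioned above, here is a generic std-only sketch in which a "master" generator hands each worker its own child generator. It is not necessarily the scheme referred to in the comment; `SplitMix64`, `split` and `worker_sums` are names chosen for this example, and SplitMix64 is a fast non-cryptographic generator.

```rust
use std::thread;

// SplitMix64: a small, fast generator often used to derive seeds.
// Not cryptographically secure.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Hypothetical "split": seed a child generator from the parent's output.
    fn split(&mut self) -> SplitMix64 {
        SplitMix64(self.next_u64())
    }
}

// Spawn `workers` threads; each one owns a child generator and needs
// no thread-local lookup or shared state while generating values.
fn worker_sums(master_seed: u64, workers: usize) -> Vec<u64> {
    let mut master = SplitMix64(master_seed);
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let mut rng = master.split();
            thread::spawn(move || {
                (0..1000).fold(0u64, |acc, _| acc.wrapping_add(rng.next_u64()))
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let sums = worker_sums(42, 4);
    println!("{:?}", sums);
}
```

The sketch also shows the limitation raised earlier in the thread: every new worker needs access to the master generator at creation time, which a free function like `thread_rng()` does not provide.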
dhardy referenced this issue Jun 15, 2018: Fork protection / reseeding / pub-priv generators #314 (Closed)
pitdicker referenced this issue Jun 22, 2018: Add a feature flag to OsRng to allow use of RDRAND on modern systems #410 (Closed)
hdevalence commented Jul 11, 2018
Having a single, strong, fast Is there any reason not to make the choice of RNG backing the |
tarcieri commented Jul 11, 2018
I would definitely recommend having a hardware-accelerated AES-based RNG on platforms with hardware AES, gated on runtime feature detection. These can be implemented using a simple abstraction: the AES encryption round function alone e.g.
There are any number of options for a cipher-based CSPRNG to choose from which are all fully parallelizable/vectorizable. A general theme among these is AES-CTR with a periodic rekeying mechanism (see also RDRAND). Here's a specific, recent example of such an RNG: https://blog.cr.yp.to/20170723-random.html
Regarding RDRAND specifically, I think it's perfectly fine for the case of e.g. seeding SipHash to mitigate hashDoS, but probably not the best option for some sort of general-purpose
I would definitely recommend ChaCha as a pure software option which should work everywhere, with ChaCha20 as a paranoid default, or it could arguably be reduced to ChaCha12 (the best known attack on ChaCha only works up to 7 rounds, and 20 rounds are a paranoid safety margin). ChaCha can be trivially accelerated using e.g. AVX2-style instructions or other vector unit primitives which are ubiquitously available (add, rotate, xor) and has a small code size.
I'm certainly not wild about things like HC-128 or ISAAC. The former saw some analysis via ESTREAM, but ChaCha20 (and its predecessor Salsa20) have seen considerably more, as ChaCha20 is an MTI cipher for TLS.
It definitely sounds like there's a bit of a cipher zoo going on, and I'd strongly suggest reducing the number of ciphers unless there are very strong technical arguments for doing otherwise.
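To make the "add, rotate, xor" remark concrete, here is a scalar sketch of the ChaCha quarter-round, the building block that vectorized implementations apply to several lanes at once. The function name and signature are choices made for this example; the check in `main` uses the quarter-round test vector from RFC 7539, section 2.1.1.

```rust
// One ChaCha quarter-round on four 32-bit words.
// Only wrapping addition, xor and fixed-distance rotation are used.
fn quarter_round(mut a: u32, mut b: u32, mut c: u32, mut d: u32) -> (u32, u32, u32, u32) {
    a = a.wrapping_add(b);
    d ^= a;
    d = d.rotate_left(16);

    c = c.wrapping_add(d);
    b ^= c;
    b = b.rotate_left(12);

    a = a.wrapping_add(b);
    d ^= a;
    d = d.rotate_left(8);

    c = c.wrapping_add(d);
    b ^= c;
    b = b.rotate_left(7);

    (a, b, c, d)
}

fn main() {
    // Test vector from RFC 7539, section 2.1.1.
    let out = quarter_round(0x1111_1111, 0x0102_0304, 0x9b8d_6f43, 0x0123_4567);
    assert_eq!(out, (0xea2a_92f4, 0xcb1c_f8ce, 0x4581_472e, 0x5881_c4bb));
    println!("quarter-round matches the RFC 7539 test vector");
}
```

A full ChaCha block applies this function to the columns and diagonals of a 4×4 word state for the chosen number of rounds (8, 12 or 20); that part is omitted here.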
FWIW, I ported a C implementation of DJB's suggestion to Rust: https://github.com/vks/aesrng |
ChaCha20 (and even ChaCha8) is a lot slower than HC-128, which is why we went with that option. We considered HC-128 acceptable due to the ESTREAM recommendation and decided to remove ISAAC from Rand proper (hasn't happened yet but will). Making use of hardware AES in I don't plan to work on this myself but PRs are welcome. |
tarcieri commented Jul 12, 2018
That's a bit surprising to hear, considering chacha8 and chacha12 are consistently faster than hc128 across multiple architectures on SUPERCOP (with chacha20 often beating it out as well): https://bench.cr.yp.to/results-stream.html (that said, ChaCha8 is pretty much zero margin for error, and I wouldn't recommend it as it has no safety margin. ChaCha12 is a happy medium between that and ChaCha20's paranoia) |
Maybe our implementation needs to be optimized/vectorized. |
If ChaCha can be optimised to compete that would be great. It uses a lot less memory and initialisation time (I guess this is why those benches show HC-128 as terrible on short sequences). |
I have wondered about those benchmarks before. Some of those benchmarks of HC-128 are even off by two orders of magnitude! Maybe it is also counting initialization time, combined with a terrible implementation? And using an implementation of ChaCha that is much faster than anything there is in Rust at the moment? |
I also tried the |
tarcieri commented Jul 12, 2018
There isn't a particularly good implementation of ChaCha in Rust right now that I know of (which is why I plan on writing one soon). |
I wrote an explicitly vectorized ChaCha4 implementation for my |
tarcieri commented Jul 29, 2018
I guess I missed the previous discussion of Randen, but it looks like a very nice option for platforms where hardware AES is available:
Yes [you did]: #462 Edit to clarify: Randen looks like a good RNG, but I'd want to see third-party cryptographic review before promoting it here. |
dhardy commented May 17, 2018 (edited)
Status: proposal to allow hardware-dependent generators and replace HC128 with a faster variant of ChaCha (start reading here).
This topic comes up quite a bit from various angles; I think it's time to get some ideas down about the future of `thread_rng` (regarding the 0.6 release or later).
I see the following potential uses for `thread_rng`:
Also, I think it's worth mentioning where we do not expect `thread_rng` to be used:
`thread_rng` may not be the fastest option
And an important note on security: we should aim to provide a secure source of random data, but ultimately it is up to users to decide how much they trust our implementation and what their risks are. `thread_rng` does not have the simplest code to review and is currently young and subject to further change. Also we may or may not implement forward secrecy (backtracking resistance), and for ultimate security, solutions using no local state may be preferred.
Our current implementation of `thread_rng` tries to satisfy the above with a fast, secure PRNG, but at the cost of high memory usage and initialisation time per thread. For applications with a low number of long-running threads this is reasonable, but for many worker threads it may not be ideal.
There are two ways we can let users influence the implementation:
`thread_rng` or call a different function)
Feature flags allow configuration on a per-application basis, e.g.
The last two options sound very risky to me: should we ask distributors and end-users to reason about the security of whole applications? It is quite possible that the people building applications (even developers) will not know about all uses of `thread_rng` requiring secure randomness.
This brings me to ask, is having only a single user-facing function ideal? What if instead:
- `strong_rng` replaces `thread_rng` as a source of cryptographically secure randomness
- `weak_rng` is added as a fast source of randomness; depending on feature flags this could just wrap `strong_rng` or could be independent
An advantage of the above is that feature flags could allow replacing the current implementation (HC-128; 4176 bytes) with two smaller back ends (e.g. ChaCha + PCG; 136 + 16 bytes), while only compromising the speed of the secure generator.
Another advantage is that we could add forward secrecy to `strong_rng` with less concern for performance implications.
But first, why bother when generators like Randen and `RDRAND` claim to satisfy all requirements anyway? This is a good question to which I only have vague answers: Randen is still new and unproven and may have portability issues; RDRAND is not fully trusted; and may not be the fastest option.
Second, what about users like `HashMap` where weaknesses are often not exploitable (depending on application design and usage) and in the worst case only allow DoS attacks (slow algorithms)? Another good question. One possible answer is that these use-cases should use `weak_rng` but by default this would be secure anyway; we provide feature flags to change that but discourage usage on servers. It might seem tempting to add a third function, but, frankly, this kind of thing is probably the main use case for `weak_rng` anyway.
Another, very different, option is that we keep `thread_rng` looking like it is but remove `CryptoRng` support and recommend it not be used for crypto keys. Then we can add a feature flag changing its implementation to an insecure generator with less concern. This may be a good option, but goes against our recent changes (switching to HC-128 and implementing `CryptoRng`).
BTW, let's not bikeshed `thread_rng` vs `ThreadRng::new` or other syntax here.