Full recovery of internal state #14
Comments
Solid work Steve! |
I wouldn't mind double-checking this work. How about a Makefile and instructions to run the "attack"? Bugs 1 & 3 are pretty simple fixes. Spinlocks are pretty normal stuff in kernel code. They're the best way to protect code that should be run by only 1 thread at a time. I was not able to reproduce the 'block forever' problem. Do you have an alternative solution?
BTW: I do highly appreciate the hard work you put into this. |
Clone the repo, and click If this is your required instruction and you can reproduce it, please update your README so that downstream developers (especially some custom android kernel) will receive alerts. |
It appears that |
@Sc00bz Are you willing to peer review v2.0? |
Unfortunately, the overall effect of @Sc00bz' work isn't merely "here are some bugs you should fix". Rather, it calls into question whether the entire project is safe - or even necessary. The bar of whether something should be used as a replacement for /dev/urandom is much higher than a couple of bugfixes. |
LOL, if everyone abandoned and deleted every project that had an exploit, there would be nothing on the web. If you feel it's useless, unstar this repo and move on... The traffic insights for this project say otherwise, so other people over the past 8 years have found some use for it. I'll assume you know that going from 1.x to 2.x is NOT merely bug fixes... If Sc00bz is unwilling to test/review v2.0, then I'll still try to improve this project without his input.
Did you run You don't need the spin lock. Just lock around these lines because Lines 359 to 364 in e4cb549
Or on line 339 unlock and then lock again. Lines 336 to 341 in e4cb549
Also, I didn't mention this in the blog because it's a small thing, but you need to lock on the work thread. It's possible for I'm not going to review v2 unless there's crypto. I'd suggest using ChaCha8 (that's 8 rounds instead of 20; it's weaker but should be fine) and running 4, 8, or 16 instances in parallel using SIMD (SSE2, AVX, AVX2, and AVX-512 for x86, and NEON for ARM). Don't bother with endianness or [de]interleaving the instances, because it doesn't matter for a PRNG. For seeding/reseeding, I'd suggest doing what /dev/urandom does unless you can grab from that or Optionally, you could use VAES instead when it's available, but you'll need benchmarks to see if that's faster. You could use a spin lock so that multiple threads can work at the same time but on different states, since the current spin lock does very little for performance. Also, you don't need to select a state randomly; just rotate through them all. And you should not throw away generated random data: mark how much of the buffer has been used and refill it when there's nothing left, since a lot of calls are going to be for nonce/key-sized data of 12-32 bytes.
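The buffering and rotation advice in the comment above can be sketched in plain C. This is a minimal user-space illustration, not the module's code: `struct rng_buf`, `rng_buf_read`, `pick_state`, `BUF_SIZE`, and `NUM_STATES` are all hypothetical names, and `refill` is a stand-in that writes a byte counter instead of real generator output so the behaviour is easy to follow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   1024  /* hypothetical buffer size */
#define NUM_STATES 16    /* hypothetical number of generator states */

struct rng_buf {
    uint8_t  data[BUF_SIZE];
    size_t   used;     /* bytes already handed out; BUF_SIZE means empty */
    uint64_t counter;  /* stand-in for real generator state */
};

/* Placeholder refill: a real module would run its PRNG here. */
static void refill(struct rng_buf *rb)
{
    size_t i;

    for (i = 0; i < BUF_SIZE; i++)
        rb->data[i] = (uint8_t)(rb->counter++);
    rb->used = 0;
}

/* Start with an empty buffer so the first read triggers a refill. */
void rng_buf_init(struct rng_buf *rb)
{
    rb->used = BUF_SIZE;
    rb->counter = 0;
}

/* Hand out len bytes, consuming leftover bytes before generating more. */
void rng_buf_read(struct rng_buf *rb, uint8_t *out, size_t len)
{
    while (len > 0) {
        size_t avail, n;

        if (rb->used == BUF_SIZE)
            refill(rb);
        avail = BUF_SIZE - rb->used;
        n = len < avail ? len : avail;
        memcpy(out, rb->data + rb->used, n);
        rb->used += n;
        out += n;
        len -= n;
    }
}

/* Round-robin state selection instead of picking a state at random. */
unsigned pick_state(unsigned *next)
{
    unsigned i = *next;

    *next = (i + 1) % NUM_STATES;
    return i;
}
```

With this layout a 12-byte request followed by a 32-byte request consumes 44 bytes of a single refill rather than discarding the rest of the buffer after each call. In a kernel module the buffer and the `next` index would also need to be protected by a lock.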
This is a valid suggestion provided by recent cryptanalysis. According to Jean-Philippe Aumasson, ChaCha8 is still cryptographically secure, while ChaCha20 is over-cautious:
Because the current Linux kernel RNG is based on ChaCha20 and provides 400+ MBps throughput with 64k chunk sizes (according to my benchmark in #13), and knowing that ChaCha8 will provide a 2.5× speedup (8 rounds instead of 20), we can expect throughput of roughly 1 GBps. While this doesn't reach the throughput of xorshift64, it's cryptographically secure and still provides incredible performance.
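To make the round-count difference concrete, here is a minimal scalar sketch of the ChaCha block function with the number of rounds as a parameter. It follows the quarter-round and column/diagonal structure described in RFC 8439; the function names are illustrative and this is not code from the project or from the Linux kernel.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* The ChaCha quarter-round on four words of the working state. */
static void quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/*
 * Produce one 16-word output block from the input state.
 * rounds must be even: 20 gives ChaCha20, 8 gives ChaCha8.
 */
void chacha_block(uint32_t out[16], const uint32_t in[16], int rounds)
{
    uint32_t x[16];
    int i;

    memcpy(x, in, sizeof(x));
    for (i = 0; i < rounds; i += 2) {
        /* column round */
        quarter_round(x, 0, 4, 8, 12);
        quarter_round(x, 1, 5, 9, 13);
        quarter_round(x, 2, 6, 10, 14);
        quarter_round(x, 3, 7, 11, 15);
        /* diagonal round */
        quarter_round(x, 0, 5, 10, 15);
        quarter_round(x, 1, 6, 11, 12);
        quarter_round(x, 2, 7, 8, 13);
        quarter_round(x, 3, 4, 9, 14);
    }
    /* feed-forward: add the original input to the permuted state */
    for (i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}
```

Calling `chacha_block(out, in, 8)` runs 4 double rounds instead of the 10 used by `chacha_block(out, in, 20)`, which is where the 2.5× figure comes from. Actual throughput also depends on buffering, locking, and copy overhead, so the estimate is an upper bound on the gain from the cipher core alone.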
No one is suggesting you abandon the project (that I've seen so far). Instead we're advising you to either move away from xorshift64 to a cryptographically secure primitive (e.g., ChaCha8), or remove the claims in the README.md that it's secure as well as the advice to replace
I find value in this project as a fast RNG in kernel space (even though it might be a better fit in user space). I initially stumbled on this project researching the historic OpenBSD Another RNG Linux kernel module that I stumbled on is https://github.com/Error916/LFSR_module which uses a 128-bit linear feedback shift register, but the performance is terrible. It's also (obviously) insecure, but being as slow as it is reduces my interest in the project. Anyway, your project is great, we just want the security claims to be accurate. |
Thanks for all the positive feedback. The chacha8 seems like a good idea. I'll implement it in a dev branch and see how it performs. |
I'm pretty sure they use normal ChaCha20, which can only use 128-bit SIMD. AVX2 and AVX-512 are 256-bit and 512-bit SIMD respectively, so my suggestion would be 2-4x faster on top of the 2.5x, since most/all current x86 CPUs have at least AVX2. There's also a speedup from not shuffling the state in registers, because each register holds the same word from 4, 8, or 16 ChaCha states. So it's at least 5x faster with AVX2 and 10x with AVX-512. Note that's for actual AVX2 and AVX-512. AMD has put out CPUs with those instruction sets that work on half the SIMD width at a time, which is a little faster than half-width SIMD but not 2x. AMD Zen 2 moved from this to full-width AVX2, and AMD Zen 4 added half-width AVX-512.
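The "registers of 4, 8, or 16 ChaCha states" layout can be illustrated in portable C. In this sketch each `lane_t` stands in for one SIMD register holding the same state word from `LANES` independent ChaCha instances, so the quarter-round needs no shuffles between lanes. `lane_t`, `LANES`, and `quarter_round_lanes` are hypothetical names, and the inner loops are left for the compiler to vectorize; a tuned implementation would use intrinsics instead.

```c
#include <stdint.h>

#define LANES 8  /* hypothetical: eight 32-bit lanes, the width of one AVX2 register */

/* One "register": the same state word taken from LANES independent ChaCha states. */
typedef struct {
    uint32_t w[LANES];
} lane_t;

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/*
 * Quarter-round applied to LANES states at once. Every lane performs the
 * same operations on its own state, so no data moves between lanes.
 */
void quarter_round_lanes(lane_t x[16], int a, int b, int c, int d)
{
    int i;

    for (i = 0; i < LANES; i++) {
        x[a].w[i] += x[b].w[i];
        x[d].w[i] = rotl32(x[d].w[i] ^ x[a].w[i], 16);
        x[c].w[i] += x[d].w[i];
        x[b].w[i] = rotl32(x[b].w[i] ^ x[c].w[i], 12);
        x[a].w[i] += x[b].w[i];
        x[d].w[i] = rotl32(x[d].w[i] ^ x[a].w[i], 8);
        x[c].w[i] += x[d].w[i];
        x[b].w[i] = rotl32(x[b].w[i] ^ x[c].w[i], 7);
    }
}
```

The diagonal rounds are expressed purely through the indices `a`, `b`, `c`, `d`, which is why this layout avoids the in-register rotations that a single-state 128-bit implementation needs.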
So the potential is there to outperform the existing xorshift64 implementation? Win-win! |
Personally, I don't think ChaCha8 will outperform xorshift... We'll need to wait until it's implemented to see how it all performs. BTW: I know there's probably not much interest (in this thread in particular), but I just pushed a totally revamped v2beta using updated PRNGs and implementing fixes and recommendations.
"ChaCha8" in the second to last sentence caught my eye. So I missed the multiple times where you said you didn't implement it and went looking for it. Anyway I accidentally broke version 2's UHS mode. I'm also not sure how you were able to benchmark it because it should just segfault. Since you are writing to unallocated memory and overwriting a pointer with random. Also this won't compile on 32 bit architectures and you didn't fix an off by one bug—right I never mentioned it because it's a minor bug that didn't really matter until now. I guess it still doesn't really matter. Anyway looking forward to your ChaCha8 implementation. |
OK, now I'm going to need a bit of help... I want to make sure this is solid. I found an implementation of chacha20 that I was able to put into kernel code. It seems to work, but runs about 80MB/s slower than urandom. (I guess this would be because it's using the misc-device and is not optimized.) If anyone here is able to convert the chacha20 to chacha8, it would be appreciated... (and fix the bug you just pointed out). I'm assuming once chacha20 is modified to chacha8, we should get close to the goal speeds.
OK, I think I figured it out... I found here a reference; So, I just need to simply change line 675 from
I basically rewrote the whole thing. ChaCha8 uses scalar, SSE2, SSSE3, AVX, XOP, AVX2, AVX-512, and NEON in 160 lines. It runs with 16 states of ChaCha and outputs 1 KiB per call. I need to do some testing. I want to make sure XOP and AVX-512 work but I'd need to find something that runs those. Currently XOP is commented out because the GCC documentation is lacking. Also I don't think AMD has made a CPU with XOP in 7 years and Intel never implemented it. Oh right I wanted to do a condition variable but still haven't looked up how that works in a Linux kernel module. I'll probably do a pull request tomorrow. I have three methods for reseeding: |
Hey, so I ran into an issue with Linux and fixed it. There is still one small thing: apparently a compile feature in GCC is C++ only: I can add the runtime test for which instruction sets are available, but the number of files goes from 1 to 7 because each file needs to be compiled with I'll try to figure something out tomorrow. Also, I wasn't able to test XOP or AVX-512.
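For reference, a runtime test of the available instruction sets might look like the sketch below. It is a user-space illustration using the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which exist only for x86 targets; `best_simd_runtime` is a hypothetical name, and this is not the approach taken in the pull request being discussed. Kernel code would normally use the kernel's own CPU feature checks (on x86, helpers such as `boot_cpu_has`) rather than these builtins.

```c
/*
 * Return the name of the widest SIMD extension the running CPU reports.
 * Falls back to "scalar" on non-x86 targets or unsupported compilers.
 */
const char *best_simd_runtime(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return "avx512f";
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("ssse3"))
        return "ssse3";
    if (__builtin_cpu_supports("sse2"))
        return "sse2";
#endif
    return "scalar";
}
```

A dispatcher would call this once at initialization and store a function pointer to the matching implementation, so the check is not repeated on every read.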
There are defines for cpu flags... Is this what you're looking for??? Creating compile time option flags for the detected cpu flags? (This list is not complete and is just for an example)
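As a rough illustration of the compile-time approach, GCC and Clang predefine macros such as `__SSE2__`, `__AVX2__`, `__AVX512F__`, and `__ARM_NEON` according to the target flags (`-mavx2`, `-march=native`, and so on). The sketch below is not the commenter's original list, and `best_simd_compiled` is a hypothetical name.

```c
/*
 * Report which SIMD path the compiler was allowed to generate.
 * The result reflects compiler flags, not the CPU the code later runs on.
 */
const char *best_simd_compiled(void)
{
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__XOP__)
    return "xop";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSSE3__)
    return "ssse3";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}
```

The trade-off is that a binary built this way only runs correctly on CPUs that support the selected extension, which matters for a module that is built once and shipped to different machines.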
@Sc00bz, how are things working out?
2.x has been released, so closing this issue. |
Blog: https://tobtu.com/blog/2023/3/breaking-xor-shift-prng/
Code: https://github.com/Sc00bz/break-srandom
You should probably update the README to state that it's an insecure PRNG.