Full recovery of internal state #14

Sc00bz · 2023-03-27T22:16:47Z

Blog: https://tobtu.com/blog/2023/3/breaking-xor-shift-prng/

Code: https://github.com/Sc00bz/break-srandom

You should probably update the read me to state it's an insecure PRNG.

atoponce · 2023-03-27T22:32:59Z

Solid work Steve!

josenk · 2023-03-28T01:48:51Z

I wouldn't mind double checking this work. How about a Makefile and instructions to run the "attack"?

Bug 1 & 3 is pretty simple fix. Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?

josenk · 2023-03-28T01:53:13Z

BTW: I do highly appreciate the hard work you put into this.

EL-File4138 · 2023-03-28T06:16:39Z

I wouldn't mind double checking this work. How about a Makefile and instructions to run the "attack"?

Bug 1 & 3 is pretty simple fix. Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?

Clone the repo, and click attack/test.html. If your system installed your own work, It will be detected. Otherwise, a true CSPRNG will not trigger the alarm.
Distinguishable from a CSPRNG means your randomness is not qualified for cryptographical solutions. This is already called a break.

If this is your required instruction and you can reproduce it, please update your README so that downstream developers (especially some custom android kernel) will receive alerts.

atoponce · 2023-03-28T12:44:22Z

It appears that attack/test.html only works in Chrome/Chromium. From my testing, it fails in Firefox and Brave, which means they're probably using the getrandom() system call to get randomness for window.crypto.getRandomValues() rather than reading /dev/urandom directly.

josenk · 2023-03-28T20:02:09Z

@Sc00bz Are you willing to peer review v2.0?

roycewilliams · 2023-03-28T23:19:52Z

Unfortunately, the overall effect of @Sc00bz' work isn't merely "here are some bugs you should fix". Rather, it calls into question whether the entire project is safe - or even necessary.

The bar of whether something should be used as a replacement for /dev/urandom is much higher than a couple of bugfixes.

josenk · 2023-03-29T01:19:38Z

LOL, if everyone abandoned and deleted every project that had an exploit, there would be nothing on the web. If you feel it's useless, unstar this repo and move on... Traffic insights of this project says otherwise, so other people since that past 8 years have found some use from it.

I'll assume you know that going from 1.x to 2.x is NOT merely bug fixes... If Sc00bz is unwilling to test/review v2.0, then I'll still try to improve this project without his input.

Sc00bz · 2023-03-29T03:18:55Z

Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?

Did you run dd if=/dev/srandom of=/dev/null bs=1M in 17+ times or write a script to do the same? Also this only will work in standard mode because UHS has 32 arrays and only 16 can be locked currently.

You don't need the spin lock. Just lock around these lines because update_sarray*() locks:

srandom/srandom.c

Lines 359 to 364 in e4cb549

    
           memcpy(new_buf + (Block * 512), prngArrays[arraysPosition], 512); 
        
           #if ULTRA_HIGH_SPEED_MODE 
        
                   update_sarray_uhs(arraysPosition); 
        
           #else 
        
                   update_sarray(arraysPosition); 
        
           #endif

Or on line 339 unlock and then lock again.

srandom/srandom.c

Lines 336 to 341 in e4cb549

    
           while ((ArraysBusyFlags & 1 << arraysPosition) == (1 << arraysPosition)) { 
        
                   arraysPosition += 1; 
        
                   if (arraysPosition >= numberOfRndArrays) { 
        
                           arraysPosition = 0; 
        
                   } 
        
           }

Also I didn't mention this in the blog because it's a small thing but you need to lock on the work thread. It's possible for seed_PRND_*() to get undone by another thread.

I'm not going to review v2 unless there's crypto. I'd suggest using ChaCha8 (that's 8 vs 20 rounds. It's weaker but should be fine) and do 4x, 8x, or 16x in parallel using SIMD (SSE2, AVX, AVX2, and AVX-512 for x86 and NEON for ARM). Don't bother with endianness or [de]interleaving the instances because it doesn't matter for a PRNG. For seeding/reseeding, I'd suggest doing what /dev/urandom does unless you can grab from that or getrandom(). I did something like this in 2008 with SHA-256 and SSE2.

Optionally you could use VAES instead when it's available, but you'll need benchmarks to see if that's faster. You could use a spin lock so multiple threads can work at the same time but on different states. Since the current spin lock does very little for performance. Also you don't need to randomly select. Just rotate through them all. Also you should not throwaway generated random data. Mark how much in the buffer is used and refill it when there's nothing left. Since a lot of calls are going to be for nonce/key sized data 12-32 bytes.

atoponce · 2023-03-29T04:33:29Z

I'd suggest using ChaCha8 (that's 8 vs 20 rounds. It's weaker but should be fine)

This is a valid suggestion provided by recent cryptanalysis. According to Jean-Philippe Aumasson, ChaCha8 is still cryptographically secure, while ChaCha20 is over-cautious:

5.3 How many rounds?
Our goal is to propose numbers of rounds for which we have strong confidence that the algorithm will never be wounded, let alone broken, using the terminology defined in §4.4. Based on the previous research and our cryptanalysis experience, in particular as related to these algorithm (sic), we propose the following:

AES: ...

BLAKE2: ...

ChaCha: 8 rounds instead of 20 (that is, ChaCha8), yielding a 2.5× speed-up.

SHA-3: ...

These suggested numbers of rounds are probably imperfect choices in terms of consistency, yet the security margins of these four families of algorithms are more consistent with these new rounds numbers than with their original ones. Note that AES’ initial security margin was the thinnest of all, not because it’s the older design, but because of the initial not-overconservative choice of rounds vs. best known attack. AES is thus the primitive for which we propose the smallest change.

Because the current Linux kernel RNG is based on ChaCha20 and provides 400+ MBps throughput with 64k chunk sizes (according to my benchmark in #13), and knowing that ChaCha8 will provide a 2.5× speed up, we can expect close to 1+ GBps throughput. While this doesn't reach the throughput of xorshift64, it's cryptographically secure and still provides incredible performance.

atoponce · 2023-03-29T04:57:01Z

LOL, if everyone abandoned and deleted every project that had an exploit, there would be nothing on the web. If you feel it's useless, unstar this repo and move on

No one is suggesting you abandon the project (that I've seen so far). Instead we're advising you to either move away from xorshift64 to a cryptographically secure primitive (EG, ChaCha8), or remove the claims in the README.md that it's secure as well as the advice to replace /dev/urandom.

Traffic insights of this project says otherwise, so other people since that past 8 years have found some use from it.

I find value in this project as a fast RNG in kernel space (even though it might be a better fit in user space). I initially stumbled on this project researching the historic OpenBSD /dev/{a,p,s}random devices. I was intrigued when I came here, but was dismayed when I read the README.md followed by reading the source code.

Another RNG Linux kernel module that I stumbled on is https://github.com/Error916/LFSR_module which uses a 128-bit linear feedback shift register, but the performance is terrible. It's also (obviously) insecure, but being as slow as it is reduces my interest in the project.

Anyway, your project is great, we just want the security claims to be accurate.

josenk · 2023-03-29T13:43:30Z

Thanks for all the positive feedback. The chacha8 seems like a good idea. I'll implement it in a dev branch and see how it performs.

Sc00bz · 2023-03-29T18:34:44Z

the current Linux kernel RNG is based on ChaCha20

I'm pretty sure they use normal ChaCha20. Which can only use 128 bit SIMD. AVX2 and AVX-512 are 256 bit and 512 bit SIMD respectively. My suggestion would be 2-4x faster on top of the 2.5x. Since most/all current x86 CPUs have at least AVX2. Also the speed up from not shuffling the state in registers because it's registers of 4, 8, or 16 ChaCha states. So it's at least 5x faster with AVX2 and 10x with AVX-512.

Note that's for actual AVX2 and AVX-512. AMD has put out CPUs with those instruction sets but are working on half the SIMD width at a time. It's faster than half width SIMD by a little but not 2x. AMD Zen 2 moved from this to full width AVX2 and AMD Zen 4 added half width AVX-512.

atoponce · 2023-03-29T20:53:36Z

So the potential is there to outperform the existing xorshift64 implementation? Win-win!

josenk · 2023-03-30T03:15:16Z

Personally, I don't think ChaCha8 will outperform xorshift... We'll need to wait until it's implemented to see how it all performs.

BTW: I know there's probably not much interest (in this thread in particular), but I just pushed a totally revamped v2beta using updated PRNGS and implementing fixes and recommendations.
I replace SplitMix with wyhash and I replaced an old xorshift128 with xoshiro256**.
I added many new things like random buffer selection (instead of sticking to one for the entire request), random shuffle, random selection of PSRNG and a totally separate PRNG that is "internal to the module only" to make all these random events unexposed. Fixed the locking, lots of cleanup and simplification.
The README is not complete and ChaCha8 is not implemented yet. I decided to use ChaCha8 in standard mode and UHS will use this updated XorShift if it's good enough.
It's about 50% slower then srandom 1.x.

Sc00bz · 2023-03-30T20:17:26Z

"ChaCha8" in the second to last sentence caught my eye. So I missed the multiple times where you said you didn't implement it and went looking for it. Anyway I accidentally broke version 2's UHS mode. I'm also not sure how you were able to benchmark it because it should just segfault. Since you are writing to unallocated memory and overwriting a pointer with random. Also this won't compile on 32 bit architectures and you didn't fix an off by one bug—right I never mentioned it because it's a minor bug that didn't really matter until now. I guess it still doesn't really matter. Anyway looking forward to your ChaCha8 implementation.

josenk · 2023-03-30T20:38:51Z

OK, now I'm going to need a bit of help... I want to make sure this is solid. I found an implementation of chacha20 that I was able to put into kernel code. It seems to work, but runs about 80MB/s slower than urandom. (I guess this would be because it's using the misc-device and not optimized).

If anyone here is able to convert the chacha20 to chacha8, it would be appreciated... (and fix the bug, you just pointed out)

https://github.com/josenk/srandom/tree/v2beta

I'm assuming once chacha20 is modified to chacha8, we should get close to the goal speeds.

josenk · 2023-03-31T02:08:22Z

OK, I think I figured it out... I found here a reference; Perform the ChaCha rounds in sets of two, Column round and Diagonal round

So, I just need to simply change line 675 from 10 to 4, to make it from chacha20 to chacha8.

https://github.com/josenk/srandom/blob/v2beta/srandom.c#L675

Sc00bz · 2023-04-01T22:52:24Z

I basically rewrote the whole thing. ChaCha8 uses scalar, SSE2, SSSE3, AVX, XOP, AVX2, AVX-512, and NEON in 160 lines. It runs with 16 states of ChaCha and outputs 1 KiB per call. I need to do some testing. I want to make sure XOP and AVX-512 work but I'd need to find something that runs those. Currently XOP is commented out because the GCC documentation is lacking. Also I don't think AMD has made a CPU with XOP in 7 years and Intel never implemented it. Oh right I wanted to do a condition variable but still haven't looked up how that works in a Linux kernel module. I'll probably do a pull request tomorrow.

I have three methods for reseeding:
Each set of states reseed after 4 TiB with get_random_bytes().
Each set of states reseed after 4 TiB - 1 KiB with the next 1 KiB.
Each set of states repeat output after 16 ZiB (2**74 bytes).

Sc00bz · 2023-04-04T01:37:09Z

Hey so I ran into an issue with Linux and fixed it. There is still one small thing, apparently a compile feature in GCC is C++ only: __attribute__((target("..."))).

I can add the runtime test for which instructions sets are available but the number of files goes from 1 to 7 because each file needs to be compiled with -msse2, -mssse3, etc. Unless you know of a better way to do this. I think it can still be 1 file if there are defines like __AVX2__ for all of these SSE2, SSSE3, AVX, XOP, AVX2, and AVX-512. But I don't know how to detect those features in a makefile and add the appropriate flags.

I'll try to figure something out tomorrow. Also I wasn't able to test XOP or AVX-512.

josenk · 2023-04-04T13:45:19Z

There are defines for cpu flags...

https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile

Is this what you're looking for??? Creating compile time option flags for the detected cpu flags? (This list is not complete and is just for an example)

OPTS += $(shell cat /proc/cpuinfo |grep -q "avx " && echo "-mavx")
OPTS += $(shell cat /proc/cpuinfo |grep -q "avx2" && echo "-mavx2")
OPTS += $(shell cat /proc/cpuinfo |grep -q "avx512" && echo "-mavx512")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse " && echo "-msse")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse2" && echo "-msse2")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse3" && echo "-msse3")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse4" && echo "-msse4")

josenk · 2023-04-10T14:13:32Z

Scoobz How are things working out?

josenk · 2023-06-12T02:03:15Z

2.x has been released, so closing this issue.

Sc00bz mentioned this issue Mar 27, 2023

srandom is not secure (README.md bugs) #13

Closed

josenk closed this as completed Jun 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full recovery of internal state #14

Full recovery of internal state #14

Sc00bz commented Mar 27, 2023

atoponce commented Mar 27, 2023

josenk commented Mar 28, 2023

josenk commented Mar 28, 2023

EL-File4138 commented Mar 28, 2023

atoponce commented Mar 28, 2023 •

edited

Loading

josenk commented Mar 28, 2023

roycewilliams commented Mar 28, 2023

josenk commented Mar 29, 2023

Sc00bz commented Mar 29, 2023

atoponce commented Mar 29, 2023 •

edited

Loading

atoponce commented Mar 29, 2023

josenk commented Mar 29, 2023

Sc00bz commented Mar 29, 2023

atoponce commented Mar 29, 2023

josenk commented Mar 30, 2023

Sc00bz commented Mar 30, 2023

josenk commented Mar 30, 2023

josenk commented Mar 31, 2023

Sc00bz commented Apr 1, 2023

Sc00bz commented Apr 4, 2023

josenk commented Apr 4, 2023 •

edited

Loading

josenk commented Apr 10, 2023

josenk commented Jun 12, 2023

Full recovery of internal state #14

Full recovery of internal state #14

Comments

Sc00bz commented Mar 27, 2023

atoponce commented Mar 27, 2023

josenk commented Mar 28, 2023

josenk commented Mar 28, 2023

EL-File4138 commented Mar 28, 2023

atoponce commented Mar 28, 2023 • edited Loading

josenk commented Mar 28, 2023

roycewilliams commented Mar 28, 2023

josenk commented Mar 29, 2023

Sc00bz commented Mar 29, 2023

atoponce commented Mar 29, 2023 • edited Loading

atoponce commented Mar 29, 2023

josenk commented Mar 29, 2023

Sc00bz commented Mar 29, 2023

atoponce commented Mar 29, 2023

josenk commented Mar 30, 2023

Sc00bz commented Mar 30, 2023

josenk commented Mar 30, 2023

josenk commented Mar 31, 2023

Sc00bz commented Apr 1, 2023

Sc00bz commented Apr 4, 2023

josenk commented Apr 4, 2023 • edited Loading

josenk commented Apr 10, 2023

josenk commented Jun 12, 2023

atoponce commented Mar 28, 2023 •

edited

Loading

atoponce commented Mar 29, 2023 •

edited

Loading

josenk commented Apr 4, 2023 •

edited

Loading