# PA193 Seminar - RNGs
This notebook contains code for several tasks treated in this seminar. 

# Imports: 
 1. Execute next cell.

In [33]:
import time, math, secrets
from collections import Counter
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# PRNG: state, seed, determinism 

We will work with the PRNG implemented in the [random](https://docs.python.org/3/library/random.html) module. See the first 4 functions (`seed, setstate, getstate, randbytes`) in the documentation. 
 1. Import **random** package.
 2. Generate (and print) 3 random bytes.  
 3. Print out bytes in hexadecimal form (use `.hex()` method of bytes). Execute cell 2x - use Run or Ctrl+Enter.  
 4. Now seed the generator with an arbitrary value and execute the cell 2x - you will see the same result (= the RNG is deterministic and initialized by the seed).  
 5. Save the state of the PRNG (into the `state` variable) -- insert the line right after the seeding (before the bytes are generated). Print out the state of the PRNG; it consists of multiple values.
 6. Use `state` to set up the PRNG and generate the same bytes again.

In [38]:
import random 
rnd_bytes = random.randbytes(3)
print(rnd_bytes)
print(rnd_bytes.hex())

random.seed(2)
state = random.getstate()
print(random.randbytes(3).hex())

# print(state)
random.setstate(state)
print(random.randbytes(3).hex())

b'\xaf[\xd9'
af5bd9
a9bef4
a9bef4
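
The same determinism can be checked without touching the module-level generator. The sketch below is a minimal illustration, assuming Python 3.9+ (for `randbytes`); the names `gen_a`, `gen_b` and `snapshot` are chosen here for illustration and are not part of the seminar code.

```python
import random

# Two independent generators seeded with the same (arbitrary) value.
gen_a = random.Random(2)
gen_b = random.Random(2)

first = gen_a.randbytes(3)
same_seed_same_output = (first == gen_b.randbytes(3))

# Saving and restoring the state replays the stream from that point.
snapshot = gen_a.getstate()
expected = gen_a.randbytes(3)
gen_a.setstate(snapshot)
replayed = gen_a.randbytes(3)

print(same_seed_same_output, expected.hex(), replayed.hex())
# The state is a tuple: (version, tuple of 625 integers, gauss_next).
print(len(snapshot[1]))
```

Using separate `random.Random` instances keeps the experiment isolated from any other code that calls the shared `random` functions.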


# LCG: periodicity, seeding, brute force attack
Standard PRNG functions are very fast but also very insecure. 
 * In Python, the PRNG [implemented](https://svn.python.org/projects/python/branches/release32-maint/Lib/random.py) in the `random` module is the [Mersenne Twister](https://en.wikipedia.org/wiki/Mersenne_Twister), whose state is formed by 624 32-bit integers plus a position index (625 integers in total). 
 * In other languages (C, Java, Rust) an LCG is typically used. The internal state of an LCG is a **single** value (state) updated iteratively as $$state = (state*a+c) \pmod m.$$ An overview of the constants `a,c,m` used by the LCG in several languages can be found in [LCG params and generators](https://en.wikipedia.org/wiki/Linear_congruential_generator).  
 1. Implement an LCG generator and instantiate `RNG` with constants $(a=3, c=1, m = 257)$ 
 2. Seed `RNG` with `0`, generate the sequence of bytes $S_1$ and find its period. The period should be 256 or 1.
 3. Find the missing number $s$ in $S_1.$ Reseed the generator with $s$, generate the sequence and find its period. The missing number is 128 - it produces a constant sequence since $3*128+1 \equiv 128 \pmod {257}.$


In [35]:
class LCG:
    def __init__(self, a, c, m):
        self.a, self.c, self.m = a, c, m
        self.seed(0)

    def seed(self, seed):
        self.state = seed

    def rand(self):
        self.state = (self.state * self.a + self.c) % self.m
        return self.state
    
RNG = LCG(3, 1, 2**8+1)
RNG.seed(int(time.time()))
values = []
for i in range(300):
    r = RNG.rand()
    if r in values:
        print(r, len(values))
        break
    else:
        values.append(r)
        
missing = list(set(range(257)) - set(values))

print(f"missing numbers = {missing}")
RNG.seed(missing[0])
print(f"Other sequence={[RNG.rand() for _ in range(5)]}")

246 256
missing numbers = [128]
Other sequence=[128, 128, 128, 128, 128]
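
The missing value can also be computed directly instead of searched for. A fixed point satisfies $s \equiv a s + c \pmod m$, i.e. $s \equiv -c\,(a-1)^{-1} \pmod m$ whenever $a-1$ is invertible modulo $m$. The sketch below assumes Python 3.8+ (for `pow` with a negative exponent); `lcg_fixed_point` is a helper name introduced here for illustration.

```python
def lcg_fixed_point(a, c, m):
    """Solve s = (a*s + c) mod m; requires gcd(a - 1, m) == 1."""
    return (-c * pow(a - 1, -1, m)) % m

s = lcg_fixed_point(3, 1, 257)
print(s, (3 * s + 1) % 257)

# Every other seed lies on one cycle whose length equals the
# multiplicative order of a modulo m.
order = next(k for k in range(1, 257) if pow(3, k, 257) == 1)
print(order)
```

Shifting the state by the fixed point turns the update into a plain multiplication by `a`, which is why the remaining 256 states form a single cycle here.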


 4. The plaintext `b'0123456789abcdef'` was encrypted by AES with a key generated by the LCG seeded by time. Find the key.
 * What is the maximal complexity of the attack? It suffices to generate the whole cycle (256 values) and then test all 16B blocks, hence the complexity is 256 iterations. In practice, 256 + 16 bytes are generated and key candidates are sliced from that sequence of bytes. 

In [37]:
RNG.seed(int(time.time()))
def LCG_bytes(num_bytes):
    return bytes([RNG.rand()%256 for i in range(num_bytes)])

K = LCG_bytes(16)
AES_enc = Cipher(algorithms.AES(K), modes.ECB()).encryptor()
plaintext = b'0123456789abcdef'
ciphertext = AES_enc.update(plaintext)

byte_array = LCG_bytes(256 + 16)
for offset in range(256):
    K_candidate = byte_array[offset:offset+16]
    AES_dec = Cipher(algorithms.AES(K_candidate), modes.ECB()).decryptor()
    if AES_dec.update(ciphertext) == plaintext:
        print(K_candidate)  # print the recovered key, not the secret K
        break

b'8\xa9\xfb\xf0\xcflD\xcdf2\x97\xc5N\xeb\xc0?'
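
An equivalent way to see the small search space: the update depends only on the state modulo `m`, so a time-based seed collapses to one of 257 effective seeds. The sketch below is a stand-alone illustration; `lcg_bytes` is a hypothetical helper that mirrors `LCG_bytes` above but takes the seed explicitly, and it checks key candidates by direct comparison rather than by AES decryption.

```python
import time

def lcg_bytes(seed, num_bytes, a=3, c=1, m=257):
    """Generate bytes from an LCG started at the given seed."""
    state = seed
    out = []
    for _ in range(num_bytes):
        state = (state * a + c) % m
        out.append(state % 256)
    return bytes(out)

seed = int(time.time())
key = lcg_bytes(seed, 16)

# Only seed % 257 matters, so enumerating 257 seeds covers every possible key.
candidates = {lcg_bytes(s, 16) for s in range(257)}
print(key == lcg_bytes(seed % 257, 16), key in candidates)
```

In a real attack each candidate would be tested against the known plaintext/ciphertext pair, as in the cell above.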


# LCG: Forward/backward predictability  
 0. Parameters of LCG generators can be found here [LCG params and generators](https://en.wikipedia.org/wiki/Linear_congruential_generator) 
 1. The attacker knows that **glibc** generated the number 1406932606. Why is he able to find the internal state of the **RNG**? What is the problem? 
 2. Use the appropriate seed and generate the next 9 values. 
 3. Are you able to create an "inverse" LCG that goes in the opposite direction? Start from the last value generated in 2. and end with the first.  
  - **HINT**: $x_{i+1} = a*x_{i}+c \pmod m \implies x_{i} = a^{-1}*x_{i+1}-(a^{-1}*c) \pmod m$

 

In [24]:
glibc = LCG(a=1103515245, c=12345, m=2**31)
glibc.seed(1406932606)

rngs = [glibc.rand() for _ in range(10)]
print(rngs)

a_inverse = pow(1103515245,-1,2**31)
glibc_backward = LCG(a=a_inverse, c=-12345*a_inverse, m=2**31)
glibc_backward.seed(551188310)

rngs = [glibc_backward.rand() for _ in range(10)]
print(rngs)

[654583775, 1449466924, 229283573, 1109335178, 1051550459, 1293799192, 794471793, 551188310, 803550167, 1772930244]
[794471793, 1293799192, 1051550459, 1109335178, 229283573, 1449466924, 654583775, 1406932606, 12345, 0]
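
The hint can be wrapped in a small helper that derives the backward constants for any LCG with invertible `a`. This is a sketch under the same assumption as the cell above, namely that the generator outputs its full state; `invert_lcg` and `step` are names introduced here for illustration.

```python
def invert_lcg(a, c, m):
    """Return (a_inv, c_inv) such that x_i = (a_inv * x_{i+1} + c_inv) mod m."""
    a_inv = pow(a, -1, m)          # exists because gcd(a, m) == 1
    return a_inv, (-c * a_inv) % m

def step(x, a, c, m):
    return (a * x + c) % m

A, C, M = 1103515245, 12345, 2**31
A_INV, C_INV = invert_lcg(A, C, M)

# Walk two steps forward from 0, then the same two steps backward.
forward = [0]
for _ in range(2):
    forward.append(step(forward[-1], A, C, M))

backward = [forward[-1]]
for _ in range(2):
    backward.append(step(backward[-1], A_INV, C_INV, M))

print(forward)
print(backward)
```

Because a single output equals the whole internal state, one observed value is enough to run the generator in both directions.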


# Entropy: estimation
 1. Use `time_entropy` and generate a sequence `S` of 100 random blocks.
 2. Compute the histogram of the generated values and compute the entropy of the source (`time_entropy`). Try different sizes of the sequence and you will see that the entropy estimate fluctuates. 

In [25]:
def time_entropy():
    start = time.time_ns()
    delta = time.time_ns() - start
    if delta < 256:
        size = 1 
    elif delta < 256**2:
        size = 2
    else:
        size = 3
    return delta.to_bytes(size, byteorder='little')
        
    
def H_from_freqs(freqs):
    freq_sum = sum(freqs)
    res = 0
    for freq in freqs:
        prob = freq/freq_sum
        res +=  prob*math.log2(prob)
    return -res

S = [time_entropy() for _ in  range(100)]
hist = Counter(S)
print(f"Histogram of values: {hist}")

H = H_from_freqs(hist.values())
print(f"Entropy produced={H}")

Histogram of values: Counter({b'F': 60, b'<': 19, b'G': 10, b'P': 3, b'=': 3, b'Z': 2, b'Q': 2, b'\xbe': 1})
Entropy produced=1.8253250453233267
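
To see how much the estimate depends on the sample size, it helps to use a source whose true entropy is known. The sketch below is an illustration only: the four-symbol distribution and the fixed seed are assumptions made here, not measurements of `time_entropy`. It also recomputes the entropy of the histogram printed above.

```python
import math
import random
from collections import Counter

def shannon_entropy(counts):
    """Plug-in Shannon entropy estimate (in bits) from observed counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts)

# Counts taken from the histogram printed above.
observed = [60, 19, 10, 3, 3, 2, 2, 1]
print(shannon_entropy(observed))

# Hypothetical biased source with a known distribution.
symbols, weights = "abcd", [0.6, 0.2, 0.1, 0.1]
true_h = -sum(p * math.log2(p) for p in weights)

rng = random.Random(0)  # fixed seed only to make the run repeatable
for size in (10, 100, 1000, 10000):
    sample = rng.choices(symbols, weights=weights, k=size)
    estimate = shannon_entropy(Counter(sample).values())
    print(size, round(estimate, 3), round(true_h, 3))
```

Small samples tend to give unstable estimates, which is consistent with the fluctuation seen for 100 blocks.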


# Entropy: values repetitions
 1. How many random blocks with 32 bits of entropy (produced by the system RNG) do we need to generate to find one collision (repeated value/block)? Use `secrets.token_bytes(??)` within the `collision` function to generate random bytes. What should be used instead of `??`? The resulting number should be comparable to $2^{16} = 65536$ (birthday paradox).  
 2. Now use `generate_bytes`, which uses the biased source `time_entropy` instead of `secrets.token_bytes(??)`. The function `time_entropy` produces approximately 2 bits of entropy per byte. 
 * Q: How many bytes should we generate using `generate_bytes` so that they contain 32 bits of entropy? A: 16 bytes.
 * Q: You can see that significantly fewer iterations (the returned value of `collision`) are needed to obtain a collision. What is the reason? A: `time_entropy` generates values with different probabilities (while the probabilities for `secrets.token_bytes(??)` are equal).

In [30]:
def generate_bytes(num_bytes):
    buffer = bytes()
    while len(buffer) < num_bytes:
        buffer += time_entropy()
    return buffer[:num_bytes]

def collision(num_bytes):
    seen = set()  # a set gives constant-time membership tests
    for i in range(10**6): 
        r = generate_bytes(num_bytes)
#         r = secrets.token_bytes(num_bytes)
        if r in seen: 
            return len(seen)
        seen.add(r)
            
collision(16)

30
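
For a uniform source the birthday estimate can be made concrete: with $N = 2^{bits}$ equally likely values, the expected number of draws before the first repeat is approximately $\sqrt{\pi N / 2}$. The sketch below evaluates that approximation and runs a small experiment with 16-bit blocks so that it finishes quickly; both function names are introduced here for illustration.

```python
import math
import secrets

def expected_draws_until_collision(bits):
    """Birthday approximation sqrt(pi * N / 2) for N = 2**bits uniform values."""
    return math.sqrt(math.pi / 2 * 2**bits)

def draws_until_collision(num_bytes):
    """Count uniform random blocks drawn before the first repeated block."""
    seen = set()
    while True:
        block = secrets.token_bytes(num_bytes)
        if block in seen:
            return len(seen)
        seen.add(block)

# 32 bits of entropy: on the order of 2**16 draws.
print(round(expected_draws_until_collision(32)))

# Quick experiment with 2-byte blocks (16 bits of entropy).
print(draws_until_collision(2), round(expected_draws_until_collision(16)))
```

A single run can land well above or below the expected value; averaging several runs gives a steadier comparison.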

# Entropy pool: processing pool
 1. Use three different methods, byte slicing (`pool[a:b]`), `XOR`, and `SHA1`, to process 40 bytes of the pool into a resulting random block of 20 bytes. Use the `collision` function and decide which approach is better for producing different values.
 2. When the pool (in the `add_event` method) becomes bigger than `maxpoolsize`, it should be minimized, but the pool content should be replaced by a block with the same entropy. Add a test of the pool size, with appropriate processing and replacement, to the `add_event` method.

In [86]:
def SHA1(message: bytes):
    digest = hashes.Hash(hashes.SHA1())
    digest.update(message)
    return digest.finalize()

def XOR(bytes1, bytes2):
    return bytes(a ^ b for (a, b) in zip(bytes1, bytes2))

class EntropyPool(object):
    def __init__(self, maxpoolsize=32) -> None:
        self.maxsize = maxpoolsize
        self.pool = bytes()
    
    def add_event(self, num_bytes) -> None:
        self.pool += generate_bytes(num_bytes)
#         if len(self.pool) > self.maxsize:
#             self.pool = SHA1(self.pool)
        
    def random(self):
        res = SHA1(self.pool)
#         res = XOR(self.pool[:20], self.pool[20:40])
        self.pool = bytes()
        return res
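
The difference between the three processing methods shows up on hand-picked pools. The sketch below is an illustration, assuming `hashlib.sha1` as a standard-library stand-in for the `SHA1` helper above; the pool contents and the `by_*` names are made up for the demonstration (the bytes `F` and `<` are the two most frequent values in the histogram above).

```python
import hashlib

def xor_bytes(bytes1, bytes2):
    return bytes(a ^ b for a, b in zip(bytes1, bytes2))

def by_slice(pool):
    return pool[:20]                           # ignores the last 20 bytes

def by_xor(pool):
    return xor_bytes(pool[:20], pool[20:40])   # equal halves cancel out

def by_sha1(pool):
    return hashlib.sha1(pool).digest()         # depends on all 40 bytes

pool_1 = b"F" * 40
pool_2 = b"<" * 40
pool_3 = b"F" * 20 + b"<" * 20

for name, method in (("slice", by_slice), ("xor", by_xor), ("sha1", by_sha1)):
    distinct = len({method(p) for p in (pool_1, pool_2, pool_3)})
    print(name, distinct)
```

Slicing and XOR map different pools to the same block here, while hashing keeps all three apart, which matters for a biased source that often repeats the same byte.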

# CSPRNG: period, seeding, backdoor
 1. Questions:
     * What is the period of the `CRT_PRNG`? 
     * Can we replace AES by SHA1 and obtain same security?
     * Is the internal state updated by the same function (AES or SHA1)? 
 2. A backdoored RNG (the designer knows the key of the generator) generated the 16B `nonce=b'z\x94a\x1e\xe2\x0e/\r\xe2\x85\xb6\x94\xca\x1b\xd1\x91'` and then it was used to generate a 16B AES key `K`. Find the internal state of `CRT_PRNG` and the generated 16B key `K` (b'\x97\x14Y\n~\...')
 3. Seed `CRT_PRNG` with an appropriate amount of entropy from `EntropyPool`. 
  * Q: How many bytes should be in the pool to produce a random key? A: The amount of entropy in the pool should be equal to the size of the key. 

In [28]:
class CRT_PRNG: 
    def __init__(self, seed):
        self.seed(seed)
        self.counter = 0
        
    def seed(self, seed):
        cipher = Cipher(algorithms.AES(seed), modes.ECB())
        self.AES = cipher.encryptor()
        
    def rand(self):
        msg = self.counter.to_bytes(length=16, byteorder='little') 
        rnd_block = self.AES.update(msg)
        self.counter += 1
        return rnd_block
    
K = bytes(16) # zero key
RNG_backdoored = CRT_PRNG(seed=K)
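
One observation that helps with the period question: with a fixed key, AES maps distinct 16-byte inputs to distinct outputs, so the output can repeat only when the counter encoding repeats. The sketch below looks only at that encoding (no AES involved) and shows that, as written in `rand`, the counter does not wrap around after $2^{128}$ values but raises an error instead.

```python
COUNTER_BYTES = 16

# Largest counter value that still fits into 16 bytes.
last_block = (2**128 - 1).to_bytes(COUNTER_BYTES, byteorder='little')
print(last_block.hex())

# One step further the conversion fails instead of wrapping to zero.
try:
    (2**128).to_bytes(COUNTER_BYTES, byteorder='little')
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```

A generator meant to wrap around would reduce the counter modulo `2**128` before encoding it; whether that is desirable is a design choice left open here.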

# Bonus: recent Minecraft RNG failure  
The generator `JavaRNGMinecraft` below implements a simplified version of the RNG used in Minecraft. It directly outputs the random integer `randomInteger` (no need to multiply floats back by `(1 << 24)`). The vulnerability and exploit are described in 
[Randar Explanation and Information](https://github.com/spawnmason/randar-explanation/blob/master/README.md).
`public float nextFloat() {
   this.seed = (this.seed * multiplier + addend) % modulus; // update the seed
   int randomInteger = (int) (this.seed >> 24); // take the top 24 bits of the seed
   return randomInteger / ((float) (1 << 24)); // divide it by 2^24 to get a number between 0 and 1
}`

 1. Find the internal state of the generator. It can be found using different approaches: slow brute force or fast LLL (see Randar). Can you propose another method?

In [152]:
class JavaRNGMinecraft:
    def __init__(self, seed = 0):
        self.seed(seed)

    def seed(self, seed: int):
        self.state = seed

    def randomInteger(self) -> int:
        self.state = (25214903917 * self.state + 11) % 2**48
        print(self.state)
        return (self.state >> 24)
    
RNG = JavaRNGMinecraft(int(time.time()))
rnds = [RNG.randomInteger() for i in range(3)]
print(rnds)
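
The slow brute-force approach can be sketched as follows: the first output fixes the top bits of the state, so only the hidden low bits need to be enumerated, and the following outputs filter the candidates. This is a minimal sketch, not the Randar method; `recover_states` and `outputs_from` are names introduced here, and the demo uses a scaled-down 32-bit state with 16 hidden bits (an assumption made only to keep the runtime short). For the generator above the call would use `state_bits=48, hidden_bits=24`, i.e. $2^{24}$ candidates.

```python
A, C = 25214903917, 11  # multiplier and addend from the cell above

def recover_states(outputs, a, c, state_bits, hidden_bits):
    """Return all states (right after the first output) consistent with outputs."""
    m = 1 << state_bits
    candidates = []
    for low in range(1 << hidden_bits):
        first_state = (outputs[0] << hidden_bits) | low
        state = first_state
        consistent = True
        for out in outputs[1:]:
            state = (a * state + c) % m
            if state >> hidden_bits != out:
                consistent = False
                break
        if consistent:
            candidates.append(first_state)
    return candidates

def outputs_from(seed, count, state_bits, hidden_bits):
    """Scaled-down generator: return the internal states and the visible outputs."""
    m = 1 << state_bits
    state, states, outs = seed, [], []
    for _ in range(count):
        state = (A * state + C) % m
        states.append(state)
        outs.append(state >> hidden_bits)
    return states, outs

states, outs = outputs_from(123456789, 3, state_bits=32, hidden_bits=16)
found = recover_states(outs, A, C, state_bits=32, hidden_bits=16)
print(states[0] in found, len(found))
```

With more observed outputs the candidate list shrinks, and once a state is known all later outputs follow by running the generator forward.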