# Validating xorshift128 model accuracy

Here we show the working used to validate our [4.2 Model Results](https://research.nccgroup.com/2021/10/15/cracking-random-number-generators-using-machine-learning-part-1-xorshift128/), specifically this assertion:

> To be more confident about what we have achieved, we have generated a new sample of 100,000 random numbers with a different seed than those used to generate the previous data, and we got the same result of 100% bitwise accuracy.

## Setup configuration for our validation

In [1]:
import os
import sys

def generate_secure_random_32bit() -> int:
    # read 4 bytes (32 bits) from the OS random source; byte order does not matter for random data
    return int.from_bytes(os.urandom(32 // 8), sys.byteorder)

PINNED_RANDOM_VARIABLES = [generate_secure_random_32bit() for _ in range(4)]
print(f'# PINNED_RANDOM_VARIABLES = {PINNED_RANDOM_VARIABLES}')

# PINNED_RANDOM_VARIABLES = [3110163663, 2164588659, 2731293512, 2875181111]


In [2]:
RNG_OUTPUT_FILENAME = 'xorshift128_validation.txt'
PREVIOUS_TIMESTEP_COUNT = 4
IMPORT_COUNT = 100000 + PREVIOUS_TIMESTEP_COUNT

## Generate deterministic dataset

In [3]:
import typing

def xorshift128() -> typing.Callable[[], int]:
    '''Return a xorshift128 generator seeded from PINNED_RANDOM_VARIABLES.
    https://ja.wikipedia.org/wiki/Xorshift
    '''

    x, y, z, w = PINNED_RANDOM_VARIABLES

    def _random() -> int:
        nonlocal x, y, z, w
        t = x ^ ((x << 11) & 0xFFFFFFFF)  # 32bit
        x, y, z = y, z, w
        w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))
        return w

    return _random

with open(RNG_OUTPUT_FILENAME, 'w') as f:
    r = xorshift128()

    for i in range(IMPORT_COUNT):
        f.write(f'{r()}\n')
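
As a quick sanity check on the update rule above, the same generator can be run from a fixed, well-known seed. This is only a sketch: `xorshift128_seeded` is a hypothetical helper (not part of this notebook) that takes the state explicitly, and the seed is the set of default values commonly quoted for Marsaglia's xor128. With that seed the first output should be 3701687786.

```python
import typing

# Hypothetical helper for illustration: same update rule as xorshift128(),
# but the state is passed in instead of read from PINNED_RANDOM_VARIABLES.
def xorshift128_seeded(x: int, y: int, z: int, w: int) -> typing.Callable[[], int]:
    def _random() -> int:
        nonlocal x, y, z, w
        t = x ^ ((x << 11) & 0xFFFFFFFF)  # keep the shifted value within 32 bits
        x, y, z = y, z, w
        w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))
        return w
    return _random

# Assumed reference seed (commonly quoted xor128 defaults)
MARSAGLIA_SEED = (123456789, 362436069, 521288629, 88675123)

reference_rng = xorshift128_seeded(*MARSAGLIA_SEED)
first_output = reference_rng()
print(first_output)  # 3701687786
```

Two generators built from the same seed produce the same sequence, which is what makes the dataset reproducible once the seed values are pinned.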

## Convert to input and target data

In [4]:
import numpy as np

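# convert the sequence of generated numbers to 4 inputs and one output:
# build overlapping windows of L consecutive rows without copying the data
def strided(a, L):
    shp = a.shape
    s = a.strides
    nd0 = shp[0]-L+1              # number of complete windows
    shp_in = (nd0,L)+shp[1:]
    strd_in = (s[0],) + s         # step one row per window, then index rows as usual
    return np.lib.stride_tricks.as_strided(a, shape=shp_in, strides=strd_in)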

df = np.genfromtxt(RNG_OUTPUT_FILENAME,delimiter='\n',dtype='uint64')

TOTAL_DATA_NUM = IMPORT_COUNT-PREVIOUS_TIMESTEP_COUNT

# number of bits needed to represent the largest generated value
# (bit_length stays correct even when the maximum is an exact power of two)
BIT_WIDTH = int(np.amax(df)).bit_length()

# convert the generated numbers to binary sequences (least significant bit first)
df_as_bits = (df[:,None] & (1 << np.arange(BIT_WIDTH,dtype='uint64')) > 0).astype(int)
df_as_frames = strided(df_as_bits, PREVIOUS_TIMESTEP_COUNT+1)

indices = np.arange(TOTAL_DATA_NUM,dtype='uint64')
np.random.shuffle(indices)
df_as_frames = df_as_frames[indices]

# convert the data into inputs and outputs
y = df_as_frames[:,-1,:]
X = df_as_frames[:,:-1,:]
X = X.reshape([X.shape[0], X.shape[1]*X.shape[2]])

print(np.shape(X), np.shape(y))

(100000, 128) (100000, 32)
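
To make the framing concrete, here is a minimal sketch on a toy sequence. It is not the notebook's code: `to_bits` and `from_bits` are hypothetical helpers, and NumPy's `sliding_window_view` (available from NumPy 1.20) is swapped in for the hand-rolled `strided` function. Each frame holds five consecutive values as bit rows; the first four are flattened into a 128-bit input and the fifth is the 32-bit target.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WIDTH = 32   # bits per value (assumed, matching the 32-bit generator output)
WINDOW = 5   # 4 previous values plus 1 target

def to_bits(values, width=WIDTH):
    """Hypothetical helper: unpack integers into bit rows, least significant bit first."""
    values = np.asarray(values, dtype=np.uint64)
    masks = np.uint64(1) << np.arange(width, dtype=np.uint64)
    return ((values[:, None] & masks) > 0).astype(int)

def from_bits(bits):
    """Hypothetical helper: inverse of to_bits."""
    weights = np.uint64(1) << np.arange(bits.shape[1], dtype=np.uint64)
    return (bits.astype(np.uint64) * weights).sum(axis=1, dtype=np.uint64)

toy = [1, 2, 3, 4, 5, 6]
bits = to_bits(toy)  # shape (6, 32)

# sliding_window_view appends the window axis last, so move it to position 1
frames = sliding_window_view(bits, WINDOW, axis=0).transpose(0, 2, 1)  # (2, 5, 32)

X_toy = frames[:, :-1, :].reshape(frames.shape[0], -1)  # 4 values x 32 bits per row
y_toy = frames[:, -1, :]                                # the value that follows them

print(X_toy.shape, y_toy.shape)
print(from_bits(y_toy))
```

Repacking `y_toy` with `from_bits` recovers the fifth and sixth toy values, which confirms that each target row is the value immediately following its four inputs.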


## Load pre-trained model

In [5]:
from tensorflow.keras.models import load_model
model = load_model('xorshift128_model.h5')

2021-10-19 17:34:51.354401: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Validate bitwise accuracy

In [6]:
results = model.evaluate(X, y, batch_size=256)
print("test loss: %f, test acc: %s" % tuple(results))

2021-10-19 17:34:51.513741: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


test loss: 0.000000, test acc: 1.0
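
The accuracy reported above is a per-bit figure. As a hedged sketch of what that means, assuming the model emits one probability per output bit and that a 0.5 threshold separates 0 from 1 (neither is confirmed in this notebook), the functions below compute the per-bit accuracy and the stricter fraction of fully correct values. `bitwise_accuracy` and `exact_match_rate` are hypothetical names, and the arrays are toy data rather than model output.

```python
import numpy as np

def bitwise_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual bits predicted correctly (threshold is an assumption)."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

def exact_match_rate(y_true, y_prob, threshold=0.5):
    """Fraction of rows in which every bit is predicted correctly."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(np.all(y_pred == np.asarray(y_true), axis=1)))

# Toy data: two 4-bit targets and made-up per-bit probabilities
toy_true = np.array([[1, 0, 1, 1],
                     [0, 0, 1, 0]])
toy_prob = np.array([[0.9, 0.2, 0.8, 0.4],   # last bit is wrong
                     [0.1, 0.3, 0.7, 0.2]])  # all bits correct

print(bitwise_accuracy(toy_true, toy_prob))   # 0.875
print(exact_match_rate(toy_true, toy_prob))   # 0.5
```

With the notebook's objects, a call along the lines of `bitwise_accuracy(y, model.predict(X))` would be one way to cross-check the figure from `model.evaluate`. A bitwise accuracy of 1.0 also implies an exact match rate of 1.0, so every 32-bit output is reproduced in full.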
