# Random Seed Search Nash Equilibrium Opening Book


Rock Paper Scissors is a battle between Order and Chaos. Hail Eris!

The Nash Equilibrium strategy for Rock Paper Scissors is to play perfectly randomly. On a long enough timeline, this is a guaranteed statistical draw. For years it was said this strategy couldn't be beaten. This is only true with a mathematically pure implementation of randomness. However, the more popular method is Python's default RNG, seeded with `random.seed(seed)` and sampled with `random.random()` or `random.randint()`.

This uses the Mersenne Twister algorithm, whose period length is chosen to be a Mersenne prime (2^19937 - 1). The randomness generated by the Mersenne Twister acts as a repeating number rather than a true irrational number. Within the randomness is a hidden pattern, incredibly faint but with just enough information structure to be exploitable.
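The property being exploited is that a seeded Mersenne Twister is fully deterministic. A minimal sketch, assuming the opponent draws its moves with `randint(0, 2)` (the helper name `rps_sequence` is hypothetical):

```python
import random

def rps_sequence(seed, length=10):
    """First `length` moves of a seeded random agent."""
    rng = random.Random(seed)  # private generator, leaves the global RNG state untouched
    return [rng.randint(0, 2) for _ in range(length)]

first  = rps_sequence(42)
second = rps_sequence(42)
print(first == second)  # the same seed always replays the same moves
```

Anyone who knows (or can guess) the seed can therefore replay the whole game in advance.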

Most of the leaderboard has traded the Nash Equilibrium Draw for high-risk, high-reward Order. Playing on the side of Order results in winning games more quickly, but only if your Order is better than their Order, which is how the leaderboard is sorted, and it comes with a time limit. Playing on the side of Chaos gives a baseline strategy of an unbeatable statistical draw. All that is required is to exploit the opponent's Order and use it against them. Thus is invoked the Third Law of Thermodynamics: "A system's entropy approaches a constant value as its temperature approaches absolute zero."

Random Seed Search has a leaderboard score of 915, which is 230 points above the Nash Equilibrium PI bot and is still slowly trending upwards.

Irrational numbers do however contain one fatal flaw. They are password protected. They have to be beaten before the game starts, by an author explicitly naming them in the source code. So how many irrational numbers are there? An infinite amount. This makes brute force search impossible. However, human linguistics only has room for a small number of named irrational numbers. What is your favorite irrational number? Name it! Did you choose PI?

The price of being irrational is that you are forced to introduce yourself as: Pleased to meet you, won't you guess my name? The strategy is to play a fixed sequence against every opponent. The password attack is performed by playing a counter-irrational agent with the same sequence but all the numbers rotated by +1 % 3. The Anti-Pi bot can equally be countered by the Anti-Anti-PI bot, which does the same again. However, my PI bot was forked 5 times (+3 upvotes) whilst the Anti-Anti-PI bots got no forks and fewer upvotes. Human psychology prefers not to be too irrational when it comes to selecting irrational numbers. This causes selection bias which is exploitable.
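The rotation attack can be sketched in a few lines. This is an illustrative toy, assuming a short digit string stands in for the victim's fixed sequence; the `score` helper is hypothetical and counts +1 per win and -1 per loss.

```python
def score(ours, theirs):
    """Net score from our side: +1 per win, -1 per loss, 0 per draw."""
    total = 0
    for a, b in zip(ours, theirs):
        if a == (b + 1) % 3:
            total += 1   # our move beats theirs
        elif b == (a + 1) % 3:
            total -= 1   # their move beats ours
    return total

victim    = [int(c) % 3 for c in "314159265358979"]  # toy fixed sequence
anti      = [(move + 1) % 3 for move in victim]      # rotate by +1 % 3
anti_anti = [(move + 1) % 3 for move in anti]        # rotate again

print(score(anti, victim), score(anti_anti, anti), score(anti_anti, victim))
```

Each rotation beats the sequence directly below it on every move, but the Anti-Anti sequence loses every move to the original, so the three form their own Rock Paper Scissors cycle.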

Thus the choice of irrational number can be used both as an attack and as a secret secure password. 

Anti-Pi bot scores [ 824 803 711 668 685 ] vs [ 685 ] for my original Pi bot. There are two effects to be seen here. The first is that three of the scores [ 668 662 685 ] are within 23 points of each other. This is the Nash Equilibrium draw point for this game with exactly equal wins and losses. The Mersenne Twister RNG bot scores [ 761 ], which is in the middle of the Anti-PI bot distribution. The other Anti-Pi bot scores [ 824 803 711 ] are in a slightly higher range. 5 people forked my notebook (yet only 3 gave upvotes). The Anti-Pi bot is thus a password attack on an irrational agent. The Anti-Anti-PI bot scored [ 711 662 ], which correlates within 21 of the 685 Pi-Bot Nash Equilibrium centerpoint. Nobody forked the Anti-Pi or Anti-Anti-Pi bots.

PI can be transformed into base 3 by `list(map( lambda c: int(c) % 3, re.sub( '[^1-9]', '', str(math.pi) )))`, but in practice this requires `from mpmath import mp; mp.dps = 2048; mp.pi()` to generate a string long enough to satisfy a thousand move game.
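A standard-library sketch of the encoding shows why `math.pi` alone is not enough: a float only prints 16 significant digits. The helper name `encode_digits` is hypothetical; the regex and `% 3` mapping are the ones quoted above.

```python
import math
import re

def encode_digits(number_string):
    """Drop everything except the digits 1-9, then map each digit to a move via % 3."""
    return [int(c) % 3 for c in re.sub('[^1-9]', '', number_string)]

moves = encode_digits(str(math.pi))
print(len(moves), moves)         # far fewer than the 1000 moves a game needs
print(encode_digits("1.0203"))   # zeros and the decimal point are skipped
```

Dropping the zeros leaves nine digits that map evenly onto the three moves (3, 6, 9 to 0; 1, 4, 7 to 1; 2, 5, 8 to 2), which presumably is why the encoding removes them.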

The Random Seed Search algorithm takes the default RNG, seeded via `random.seed(seed)`, 
and generates RNG sequences (length 20) for as many seeds as can fit in the 
100Mb submission filesize limit. This happens to be 20 million seeds. 
Numpy.size = `(20,000,000, 20) x int8` = 425.0 MB = 94Mb tar.gz. 
Compression offers a ~4.5x space saving, 
mostly due to the first 6 bits in every int8 being zero for trinary numbers. 
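As a rough check of these figures: an int8 array takes one byte per move, so 20,000,000 seeds x 20 moves comes to 400 MB uncompressed. The quoted 425 MB matches 17,000,000 seeds x 25 moves, the configuration mentioned in the `precache_method()` docstring, so it is presumably left over from an earlier run. The sketch below also compares the ~4.5x compression against two theoretical packing ratios; the function name is hypothetical.

```python
import math

def raw_cache_bytes(seeds, steps, bytes_per_move=1):
    """Uncompressed size of an int8 array of shape (seeds, steps)."""
    return seeds * steps * bytes_per_move

current = raw_cache_bytes(20_000_000, 20)  # configuration described in this section
earlier = raw_cache_bytes(17_000_000, 25)  # configuration in the precache docstring

bit_packing_ratio = 8 / 2             # storing each move in 2 bits instead of 8
entropy_ratio     = 8 / math.log2(3)  # best possible ratio for uniform trinary data

print(current, earlier, bit_packing_ratio, round(entropy_ratio, 2))
```

The reported ~4.5x saving sits between simple 2-bit packing and the entropy limit for uniformly random trinary data.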

Pre-caching costs 27 minutes of compile time. 
By carefully excluding previously rejected seeds, 
we can search this array very quickly and even implement 
offset search for the opponent's history within the 1s time limit. 
Each turn we can reduce the remaining search space by a factor of 3. 
This is compared to the previous implementation which could 
only search ~10,000 seeds per turn using a python loop.
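The exclusion idea can be sketched in pure Python on a tiny cache. This is not the notebook's vectorized numpy code, only an illustration of why the work shrinks each turn; `build_cache`, `filter_candidates` and the 2,000-seed cache size are assumptions made for the example.

```python
import random

STEPS = 20  # cached moves per seed

def build_cache(n_seeds, steps=STEPS):
    """cache[seed] = first `steps` moves produced by that seed."""
    cache = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        cache.append([rng.randint(0, 2) for _ in range(steps)])
    return cache

def filter_candidates(cache, candidates, history):
    """Only re-check surviving seeds, and only at the newest position."""
    position = len(history) - 1
    return [seed for seed in candidates if cache[seed][position] == history[-1]]

cache      = build_cache(2000)
opponent   = random.Random(1234)  # hidden seed we are trying to recover
candidates = list(range(len(cache)))
history, sizes = [], []
for _ in range(STEPS):
    history.append(opponent.randint(0, 2))
    candidates = filter_candidates(cache, candidates, history)
    sizes.append(len(candidates))

print(sizes)       # candidate count after each observed move
print(candidates)  # surviving seeds, which must include 1234
```

Because rejected seeds are never revisited, the per-turn cost falls roughly geometrically, which is what makes room for the extra offset searches.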

If a seed is found, the next number in the sequence can be predicted 
without violating apparent randomness. 
This effectively acts as an opening book against Mersenne Twister RNG agents. 
What is observed in practice is that the Random Seed Search is only occasionally able to 
find short sequences of numbers, often during the other agent's random warmup period. 
I have not yet seen a +500 score against an unseeded random agent. 
I suspect these partial matching sequences represent hash collisions 
of repeating sequences within the Mersenne Twister number. 

As the history gets longer, hash collisions become exponentially rarer. 
This is the game-winning difference between using a repeating number and irrational number 
for your source of randomness. 
The effect is quite small, but the 50% standard deviation for a random draw 
in this game is only 20. A statistical 2-3 point advantage shifts 
the probability distribution of endgame score, a -21 score gets converted into a draw 
and a +19 score gets converted into a win. 
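The size of this shift can be estimated with a normal approximation. This is a sketch under stated assumptions: the draw band of +/-20 is inferred from the -21 and +19 examples above, each round against a uniform random player is treated as +1, 0 or -1 with probability 1/3 each, and `p_win` and `p_loss` are hypothetical helper names.

```python
import math

ROUNDS    = 1000  # episodeSteps in this competition
DRAW_BAND = 20    # assumption: a final score within +/-20 counts as a draw

# Per-round variance is 2/3, so the final score is approximately normal.
sigma = math.sqrt(ROUNDS * 2 / 3)

def p_win(edge):
    """Approximate P(final score > DRAW_BAND) given an average edge in points."""
    z = (DRAW_BAND + 0.5 - edge) / (sigma * math.sqrt(2))
    return 0.5 * (1 - math.erf(z))

def p_loss(edge):
    """Approximate P(final score < -DRAW_BAND) given an average edge in points."""
    z = (DRAW_BAND + 0.5 + edge) / (sigma * math.sqrt(2))
    return 0.5 * (1 - math.erf(z))

for edge in (0, 3):
    print(f"edge={edge} sigma={sigma:.1f} win={p_win(edge):.3f} loss={p_loss(edge):.3f}")
```

Under these assumptions even a 3 point edge moves a few percent of games from the loss column to the win column, which is the mechanism the paragraph above describes.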

This is equivalent to the statistical effect that causes 
Black Holes to slowly lose mass via Hawking Radiation.

The leaderboard rewards winning points quickly, but the only way to become champion is to never statistically lose. Order can indeed be beaten at its own game.

(Hopefully) Achievement Unlocked: Beat the unbeatable Nash Equilibrium RNG bot!

TODO: verify the statistics and make sure this actually works in practice. However, this is still a really cool idea and implementation.

# Opponents

We have two classes of opponent that we are attempting to counter.

The first is a random agent that has explicitly set a numeric seed smaller than 17 million. Against this agent, we can achieve a near perfect score; however, it may take a few turns to establish a pattern match.

The second agent is the unseeded random agent. Behind the scenes `random.seed(None)` gets its seed value from `urandom`, which means we are unlikely to find the exact seed on our list. What we are left with are hash collisions, which are a byproduct of the Mersenne Twister algorithm that uses a repeating number (with a Mersenne Prime period) rather than the pure unbreakable randomness of an Irrational number. The logs show that occasionally we will get a sequence of 3-8 numbers in a row that follow one of the seed patterns found in the cache. This usually only happens within the first 15 opening moves of the game, but this gives us a small but significant statistical advantage over the Nash Equilibrium strategy. As they say in Vegas: The House Always Wins!
- https://github.com/python/cpython/blob/master/Modules/_randommodule.c
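One way to estimate how often such partial matches should appear by chance: if the cached sequences behave like independent uniform trinary strings, a given opponent history of length k matches any one seed with probability 1 / 3^k. The sketch below assumes the 20 million seed cache described above; the function name is hypothetical.

```python
CACHE_SEEDS = 20_000_000  # cache size described in this notebook

def expected_chance_matches(length, seeds=CACHE_SEEDS):
    """Expected number of cached seeds that match `length` opponent moves by chance."""
    return seeds / 3 ** length

for length in (4, 8, 12, 15, 16, 20):
    print(f"{length:2d} moves: {expected_chance_matches(length):.4f} expected matches")
```

Under this assumption the expected number of coincidental matches drops below one between 15 and 16 moves, which is consistent with matches against unseeded agents being seen mostly within the first 15 moves.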

In [None]:
%%writefile main.py
import random

def random_agent_seeded(observation, configuration, seed=42):
    # Set a deterministic seed
    if observation.step == 0:
        random.seed(seed)

    action = random.randint(0, configuration.signs-1)
    print(f'random_agent_seeded({seed}) = {action}')
    return int(action)


def random_agent_unseeded(observation, configuration):
    # Reseed from OS entropy (os.urandom), falling back to the system clock
    if observation.step == 0:
        random.seed(None)

    action = random.randint(0, configuration.signs-1)
    print(f'random_agent_unseeded() = {action}')
    return int(action)


# Irrational Agent

For a more extended discussion of the grandparent class, see the [IrrationalAgent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-irrational-agent) notebook

In [None]:
%%writefile -a main.py
# %%writefile IrrationalAgent.py
# Source: https://www.kaggle.com/jamesmcguigan/random-seed-search-irrational-agent/
# Source: https://github.com/JamesMcGuigan/ai-games/blob/master/games/rock-paper-scissors/rng/IrrationalAgent.py

import re
import time
from typing import List, Union

from mpmath import mp
mp.dps = 2048  # more than 1000 as 10%+ of chars will be dropped


def encode_irrational(irrational: Union[str,mp.mpf], offset=0) -> List[int]:
    """
    Encode the irrational number into trinary
    The irrational is converted to a string, "0"s are removed
    Then each of the digits is converted to an integer % 3 and added to the sequence
    """
    if isinstance(irrational, list) and all([ 0 <= n <= 2 for n in irrational ]):
        return irrational  # prevent double encoding

    string   = re.sub('[^1-9]', '', str(irrational))
    sequence = [
        ( int(c) + int(offset) ) % 3
        for c in string
    ]
    assert len(sequence) >= 1000
    return sequence



class IrrationalAgent():
    """
    Play a fixed sequence of moves derived from the digits of an irrational number

    Irrational numbers are a more mathematically pure source of randomness than
    the repeating numbers used by the Mersenne Twister RNG

    This agent is pure Chaos and contains no Order capable of being exploited once the game has started

    Its only vulnerability is Password Attack.
    Pleased to meet you, won't you guess my name?

    There are an uncountable infinity of irrational numbers
    Your choice of irrational is your password
    Be irrational in your choice of irrational if you want this agent to be secure
    Alternatively choose a popular irrational with an offset to attack a specific agent

    This is the true Nash Equilibrium solution to Rock Paper Scissors
    """

    irrationals = {
        name: encode_irrational(irrational)
        for name, irrational in {
            'pi':       mp.pi(),
            'phi':      mp.phi(),
            'e':        mp.e(),
            'sqrt2':    mp.sqrt(2),
            'sqrt5':    mp.sqrt(5),
            'euler':    mp.euler(),
            'catalan':  mp.catalan(),
            'apery':    mp.apery(),
            # 'khinchin':  mp.khinchin(),  # slow
            # 'glaisher':  mp.glaisher(),  # slow
            # 'mertens':   mp.mertens(),   # slow
            # 'twinprime': mp.twinprime(), # slow
        }.items()
    }


    def __init__(self, name='irrational', irrational: Union[str,mp.mpf] = None, offset=0, verbose=True):
        # Irrational numbers are pure random sequences that are immune to random seed search
        # Using name == 'irrational' causes the number to be reset each new game
        if irrational is not None and ( name == 'irrational' or name in self.irrationals.keys() ):
            name = 'secret'
        if name == 'irrational':
            irrational = self.generate_secure_irrational()
        if name in self.irrationals.keys():
            irrational = self.irrationals[name]
        self.irrational = self.encode_irrational(irrational, offset=offset)

        self.name       = name
        self.offset     = offset
        self.verbose    = verbose
        self.reset()


    def reset(self):
        """
        Reset on the first turn of every new game
        This allows a single instance to be run in a loop for testing
        """
        self.history = {
            "action":   [],
            "opponent": []
        }
        if self.name == 'irrational':
            irrational      = self.generate_secure_irrational()
            self.irrational = self.encode_irrational(irrational, offset=self.offset)



    def __call__(self, obs, conf):
        return self.agent(obs, conf)

    def agent(self, obs, conf):
        """ Wrapper function for setting state in child classes """

        # Generate a new history and irrational seed for each new game
        if obs.step == 0:
            self.reset()

        # Keep a record of opponent and action state
        if obs.step > 0:
            self.history['opponent'].append(obs.lastOpponentAction)

        # This is where the subclassable agent logic happens
        action = self.action(obs, conf)

        # Keep a record of opponent and action state
        self.history['action'].append(action)
        return action


    def action(self, obs, conf):
        """ Play the next digit in a fixed irrational sequence """
        action = int(self.irrational[ obs.step % len(self.irrational) ])
        action = (action + self.offset) % conf.signs
        if self.verbose:
            name = self.__class__.__name__ + ':' + self.name + (f'+{self.offset}' if self.offset else '')
            opponent = ( self.history['opponent'] or [None] )[-1]
            expected = ( action - 1 ) % 3
            print(f"{obs.step:4d} | {opponent}{self.win_symbol()} > action {action} | " +
                  f"{name}")
        return action


    @staticmethod
    def generate_secure_irrational():
        """
        Be irrational in your choice of irrational if you want this agent to be secure
        """
        irrational = sum([
            mp.sqrt(n) * (time.monotonic_ns() % 2**32)
            for n in range(2, 5 + (time.monotonic_ns() % 1024))
        ])
        return irrational


    @classmethod
    def encode_irrational(cls, irrational: Union[str,mp.mpf], offset=0) -> List[int]:
        if irrational is None:
            irrational = cls.generate_secure_irrational()
        return encode_irrational(irrational, offset)



    ### Logging

    def win_symbol(self):
        """ Symbol representing the reward from the previous turn """
        action   = ( self.history['action']   or [None] )[-1]
        opponent = ( self.history['opponent'] or [None] )[-1]
        if isinstance(action, int) and isinstance(opponent, int):
            if action % 3 == (opponent + 1) % 3: return '+'  # win
            if action % 3 == (opponent + 0) % 3: return '|'  # draw
            if action % 3 == (opponent - 1) % 3: return '-'  # loss
        return ' '


irrational_instance = IrrationalAgent(name='pi', offset=0)
def irrational_agent(obs, conf):
    return irrational_instance.agent(obs, conf)


# Irrational Search Agent

For a more extended discussion of the parent class, see the [IrrationalSearchAgent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-irrational-search-agent) notebook

In [None]:
%%writefile -a main.py
# %%writefile IrrationalSearchAgent.py
# Source: https://www.kaggle.com/jamesmcguigan/random-seed-search-irrational-search-agent/
# Source: https://github.com/JamesMcGuigan/ai-games/blob/master/games/rock-paper-scissors/rng/IrrationalSearchAgent.py
from typing import List, Tuple, Union

from mpmath import mp

# from IrrationalAgent import IrrationalAgent


class IrrationalSearchAgent(IrrationalAgent):
    """
    This attempts a Password Attack against IrrationalAgent

    Its only vulnerability is Password Attack.
    Pleased to meet you, won't you guess my name?

    There are an uncountable infinity of irrational numbers,
    but a rather limited number of useful irrationals with pronounceable names

    This will only work if the opponent has chosen to play an irrational agent
    but has chosen to play a popular named irrational
    and has also chosen the same trinary encoding algorithm

    If the opponent is not playing a known irrational sequence
    then the true Nash Equilibrium is to play a secret and unnamed irrational sequence
    """

    def __init__(self, name='irrational', irrational=None, search: List[Union[str, mp.mpf]]=None, verbose=True):
        search = {
            f'irrational#{n}': self.encode_irrational(number)
            for n, number in enumerate(search)
        } if search is not None else {}
        self.irrationals = { **self.__class__.irrationals, **search }
        super().__init__(name=name, irrational=irrational, verbose=verbose)


    def action(self, obs, conf):
        expected, irrational_name = self.search_irrationals(self.history['opponent'])
        if expected is not None:
            action   = (expected + 1) % conf.signs
            opponent = ( self.history['opponent'] or [None] )[-1]
            if self.verbose:
                print(f"{obs.step:4d} | {opponent}{self.win_symbol()} > action {expected} -> {action} | " +
                      f"Found Irrational: {irrational_name}")
        else:
            action = super().action(obs, conf)  # play our own irrational number sequence
        return action


    def search_irrationals(self, sequence: List[int]) -> Tuple[int, str]:
        """
        Search through list of named irrational sequences
        if found, return the next expected number in the sequence along with the name
        """
        expected, irrational_name = None, None
        if len(sequence):
            for offset in [0,1,2]:
                for name, irrational in self.irrationals.items():
                    irrational = [ (n + offset) % 3 for n in irrational[:len(sequence)+1] ]
                    if irrational[:len(sequence)] == sequence:
                        expected        = irrational[len(sequence)]
                        irrational_name = name  # only report a name when a match is found
                        break
                else: continue
                break
        return expected, irrational_name




irrational_search_instance = IrrationalSearchAgent()
def irrational_search_agent(obs, conf):
    return irrational_search_instance.agent(obs, conf)


# Random Seed Search

This is the class you have been looking for

In [None]:
%%writefile -a main.py
# %%writefile RandomSeedSearch.py
# Source: https://www.kaggle.com/jamesmcguigan/random-seed-search-nash-equilibrium-opening-book/
# Source: https://github.com/JamesMcGuigan/ai-games/blob/master/games/rock-paper-scissors/rng/RandomSeedSearch.py

import glob
import os
import random
import time
from collections import defaultdict
from operator import itemgetter
from typing import List, Optional, Tuple

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # disable tensorflow logging

import numpy as np
import tensorflow as tf
from humanize import naturalsize

# from rng.IrrationalSearchAgent import IrrationalSearchAgent

# BUGFIX: Kaggle Submission Environment os.getcwd() == "/kaggle/working/"
if os.environ.get('GFOOTBALL_DATA_DIR', ''):
    os.chdir('/kaggle_simulations/agent/')


class RandomSeedSearch(IrrationalSearchAgent):
    """
    The Random Seed Search algorithm takes the default RNG, seeded via `random.seed(seed)`,
    and generates RNG sequences (length 20) for as many seeds as can fit in the
    100Mb submission filesize limit. This happens to be 20 million seeds.
    Numpy.size = `(20,000,000, 20) x int8` = 425.0 MB = 94Mb tar.gz.
    Compression offers a ~4.5x space saving,
    mostly due to the first 6 bits in every int8 being zero for trinary numbers.

    Pre-caching costs 27 minutes of compile time.
    By carefully excluding previously rejected seeds,
    we can search this array very quickly and even implement
    offset search for the opponent's history within the 1s time limit.
    Each turn we can reduce the remaining search space by a factor of 3.
    This is compared to the previous implementation which could
    only search ~10,000 seeds per turn using a python loop.

    If a seed is found, the next number in the sequence can be predicted
    without violating apparent randomness.
    This effectively acts as an opening book against Mersenne Twister RNG agents.
    What is observed in practice is that the Random Seed Search is only occasionally able to
    find short sequences of numbers, often during the other agent's random warmup period.
    I have not yet seen a +500 score against an unseeded random agent.
    I suspect these partial matching sequences represent hash collisions
    of repeating sequences within the Mersenne Twister number.

    As the history gets longer, hash collisions become exponentially rarer.
    This is the game-winning difference between using a repeating number and irrational number
    for your source of randomness.
    The effect is quite small, but the 50% standard deviation for a random draw
    in this game is only 20. A statistical 2-3 point advantage shifts
    the probability distribution of endgame score, a -21 score gets converted into a draw
    and a +19 score gets converted into a win.

    This is equivalent to the statistical effect that causes
    Black Holes to slowly lose mass via Hawking Radiation.

    Achievement Unlocked: Beat the unbeatable Nash Equilibrium RNG bot!
    """

    methods     = [ 'random' ]  # [ 'np', 'tf' ]  # just use random
    cache_steps = 20  # seeds are rarely found after move 15
    cache_seeds = int(4 * 100_000_000 / len(methods) / cache_steps)  # 100Mb x 4.25x tar.gz compression
    cache = {
        method: np.load(f'{method}.npy')
        for method in methods
        if os.path.exists(f'{method}.npy')
    }

    def __init__(self,
                 min_length=4,
                 max_offset=1000,
                 use_stats=True,
                 cheat=False,
                 no_submit=False,
                 verbose=True
    ):
        """
        :param min_length:  minimum sequence length for a match 3^4 == 1/81 probability
        :param max_offset:  maximum offset to search for from start of history
        :param use_stats:   pick the most probable continuation rather than minimum seed
        :param no_submit:   throw exception mid-game, to allow testing  submission environment
        :param cheat:       set the opponents seed - only works on localhost
        :param verbose:     log output to console
        """
        self.use_stats  = use_stats
        self.cheat      = cheat   # needs to be set before super()
        self.min_length = min_length
        self.max_offset = max_offset
        self.no_submit  = no_submit
        self.conf       = {'episodeSteps': 1000, 'actTimeout': 1000, 'agentTimeout': 15, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
        super().__init__(verbose=verbose)
        self.print_cache_size()

    def reset(self):
        """ This gets called at the beginning of every game """
        if self.verbose >= 2: print(f'{self.__class__.__name__} | reset()')
        super().reset()

        # SPEC: self.candidate_seeds[method][offset] = list(seeds)
        self.candidate_seeds = defaultdict(lambda: defaultdict(dict))
        # SPEC: self.repeating_seeds[method][seed + offset/1000] = count
        self.repeating_seeds = defaultdict(lambda: defaultdict(int))
        if self.cheat:
            self.set_global_seed()


    # obs  {'step': 168, 'lastOpponentAction': 0}
    # conf {'episodeSteps': 1000, 'actTimeout': 1000, 'agentTimeout': 15, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
    def action(self, obs, conf):
        # NOTE: self.history state is managed in the parent class
        self.conf = conf

        # This allows testing of the agent without actually submitting to the competition
        if self.no_submit and obs.step > 900:
            raise Exception("Don't Submit To Competition")

        # This is a short circuit to speed up unit tests
        irrational, irrational_name = self.search_irrationals(self.history['opponent'])
        if irrational is not None and obs.step > self.min_length + 2:
            return super().action(obs, conf)

        # Search the Random Seed Cache
        # Important to do this each turn as it reduces self.candidate_seeds[method][offset] to about 1/3 of its size
        expected, seed, offset, method = self.search_cache(self.history['opponent'])

        # If have multiple or zero seed matches, but also an irrational, then use that
        if seed is None and irrational is not None:
            action = super().action(obs, conf)

        elif expected is not None:
            action   = (expected + 1) % conf.signs
            opponent = ( self.history['opponent'] or [None] )[-1]
            if seed is None: seed = f"{'many':>12s}"
            else:            seed = f"{seed + (offset or 0)/1000:12.3f}"
            if self.verbose:
                print(
                    f"{obs.step:4d} | {opponent}{self.win_symbol()} > action {expected} -> {action} | " +
                    f"Found RNG Seed: {method:6s} {seed} |",
                    self.log_repeating_seeds()
                )

        else:
            # Default to parent class to return a secure Irrational sequence
            action = super().action(obs, conf)

        return int(action) % conf.signs



    ### Searching

    def search_cache(self, history: List[int]) \
            -> Tuple[Optional[int], Optional[int], Optional[int], Optional[str]]:
        """
        Search through the range of different pre-cached numpy arrays for a seed sequence match
        """
        expected, seed, offset, method = None, None, None, None
        for method in self.methods:
            seed, expected, offset = self.search_cache_method(history, method=method)
            if expected is not None:
                break
        return expected, seed, offset, method


    def search_cache_method(self, history: List[int], method='random') \
            -> Tuple[Optional[int], Optional[int], Optional[int]]:
        """
        Perform a vectorized numpy search for the opponent's RNG sequence
        This allows 8.5 million records to be searched in about 0.2ms
        Compared to a search rate of about ~10,000/s using list() comparison in a python loop
        """
        time_start = time.perf_counter()
        seed     = None
        expected = None
        offset   = None
        if method in self.cache.keys() and len(history):

            # Keep track of sequences we have already excluded, to improve performance
            seeds, offset = self.find_candidate_seeds(history, method)
            sequence      = history[offset:]

            if self.use_stats and len(seeds) >= 2:
                # the most common early-game case is a hash collision
                # we can compute stats on the distribution of next numbers
                # the list of matching seeds will exponentially decrease as history gets longer
                seed     = None
                stats    = np.bincount(self.cache[method][seeds,len(sequence)])
                expected = np.argmax(stats)
                if np.count_nonzero(stats) == 1:  # all seeds agree
                    seed = np.min(seeds)

            elif len(seeds):
                # Pick the first matching seed, and play out the full sequence
                seed = np.min(seeds)
                size = len(sequence)

                if self.cache[method].shape[1] > size:
                    # Lookup the next number from the cache
                    expected = self.cache[method][seed,size]
                else:
                    # Compute the remainder of the Mersenne Twister sequence
                    expected = self.get_rng_sequence(seed, length=size + 1, method=method)[-1]

            # This is a log of how much statistical advantage we gained
            if seed is not None:
                self.repeating_seeds[method][seed + offset/1000] += 1

        if self.verbose >= 2:
            time_taken = (time.perf_counter() - time_start) * 1000
            print(f'{self.__class__.__name__} | search_cache({method}): {time_taken:.3f}ms')

        return seed, expected, offset


    def find_candidate_seeds(self, history, method: str) -> Tuple[np.ndarray, int]:
        """
        Find a list of candidate seeds for a given sequence
        This makes searching through the cache very fast
        We can effectively exclude two thirds of the remaining candidates on every iteration

        This now also implements offset search to find more candidates
        Despite using a loop here, the search time is somewhat stable around 600ms
        As each offset keeps only about 1/3 of its candidates at each turn

        By implementing offset search we constantly find seeds throughout the match
        and RandomSeedSearch stops being just an opening book
        """

        seeds_by_offset = {}
        max_offset = min(self.max_offset, len(history))
        for offset in range(max_offset):
            sequence = history[offset:]
            size     = np.min([len(sequence), self.cache[method].shape[1]])
            if size < self.min_length: continue  # reduce the noise of matching sequences

            # Have we already searched for this offset and excluded possibilities
            if offset in self.candidate_seeds[method]:
                candidates = self.candidate_seeds[method][offset]
                if len(candidates) == 0:
                    seeds = candidates  # we found an empty list
                else:
                    seeds_idx  = np.where( np.all(
                        self.cache[method][ candidates, :size ] == sequence[ :size ]
                    , axis=1))[0]
                    seeds = candidates[seeds_idx]
            else:
                seeds = np.where( np.all(
                    self.cache[method][ : , :size ] == sequence[ :size ]
                , axis=1))[0]

            self.candidate_seeds[method][offset] = seeds
            if len(seeds):
                seeds_by_offset[offset] = seeds

        # Return the search that returned the shortest list
        if len(seeds_by_offset) == 0:
            seeds  = np.array([])
            offset = 0
        else:
            offset = sorted(seeds_by_offset.items(), key=lambda pair: len(pair[1]))[0][0]
            seeds  = seeds_by_offset[offset]
        return seeds, offset


    def get_rng_sequence(self, seed, length, method='random', use_cache=True) -> List[int]:
        """
        Generates the first N numbers for a given Mersenne Twister seed sequence

        Results may potentially be cached, though careful attention is paid to
        save and restore the internal state of random.random() to prevent us
        from affecting the opponent's RNG sequence,
        and accidentally stealing numbers from their sequence
        """
        sequence = []
        if ( use_cache
         and method in self.cache.keys()
         and seed   < self.cache[method].shape[0]
         and length < self.cache[method].shape[1]
        ):
            sequence = self.cache[method][seed][:length]
        else:
            # If the results are not in the cache
            # ensure we save and restore random seed state to avoid affecting opponent's RNG
            if method == 'random':
                seed_state = random.getstate()
                random.seed(seed)
                sequence = [ random.randint(0,2) for _ in range(length) ]
                random.setstate(seed_state)
            elif method == 'np':
                seed_state = np.random.get_state()
                np.random.seed(seed)
                sequence = np.random.randint(0, 3, length)  # numpy's upper bound is exclusive, so 3 yields moves 0-2
                np.random.set_state(seed_state)
            elif method == 'tf':
                generator = tf.random.Generator.from_seed(seed)
                sequence = generator.uniform((length,), minval=0, maxval=3, dtype=tf.dtypes.int32).numpy()
        return list(sequence)



    ### Precaching

    def precache(self) -> List[str]:
        # BUGFIX: joblib PicklingError: Could not pickle the task to send it to the workers
        return [ self.precache_method(method) for method in self.methods ]

    def precache_method(self, method='random') -> str:
        """
        Compute all the random.random() Mersenne Twister RNG sequences
        for the first 17 million seeds.
        Only the first 25 numbers from each sequence are returned,
        but we rarely find cache hits or hash collisions after the first 15 moves.

        These numbers are configurable via cls.cache_steps and cls.cache_seeds

        This takes about 23 minutes of runtime and generates 425Mb of int8 data
        This compresses to 94Mb in tar.gz format which is under the 100Mb submission limit
        """
        filename = f'{method}.npy'
        shape    = (self.cache_seeds, self.cache_steps)
        if os.path.exists(filename) and np.load(filename).shape == shape:
            print(f'cached {filename:10s} =', shape, '=', naturalsize(os.path.getsize(filename)))
        else:
            method_cache = np.array([
                self.get_rng_sequence(seed=seed, length=self.cache_steps, method=method)
                for seed in range(self.cache_seeds)
            ], dtype=np.int8)
            np.save(filename, method_cache)
            print(f'wrote  {filename:10s} =', method_cache.shape, '=', naturalsize(os.path.getsize(filename)))
        return filename



    ### Logging

    def log_repeating_seeds(self) -> dict:
        """
        Format self.repeating_seeds for logging

        3^4 = 1 in 81   chance, which is the default minimum sequence length
        3^5 = 1 in 243  chance
        3^6 = 1 in 729  chance, which in theory should happen once per game
        3^7 = 1 in 2187 chance, which is a statistical advantage
        """
        repeating_seeds = {}
        for method in self.repeating_seeds.keys():
            repeating_seeds[method] = {
                key: value
                for key, value in self.repeating_seeds[method].items()
                if value >= self.min_length
            }
            repeating_seeds[method] = dict(sorted(repeating_seeds[method].items(), key=itemgetter(1), reverse=True))
            if len(repeating_seeds[method]) == 0:
                del repeating_seeds[method]
        return repeating_seeds


    def print_cache_size(self):
        """ Mostly for debugging purposes, log the contents of the cache """
        if self.verbose:
            filenames = [ (filename + ' = ' + naturalsize(os.path.getsize(filename)))
                          for filename in glob.glob('*') ]
            print('tar.gz =', filenames)
            print(f'{self.__class__.__name__}::cache.keys()', list(self.cache.keys()))
            for name, cache in self.cache.items():
                print(f'{self.__class__.__name__}::cache[{name}] = {cache.shape}' )


    ### Cheats

    @staticmethod
    def set_global_seed(seed=42):
        """
        This is a cheat to set the opponent's RNG seed
        It only works when playing on localhost or inside a Kaggle Notebook
        This is due to both agents playing within the same python process
        and thus sharing random.random() RNG state

        This doesn't work inside the kaggle leaderboard submission environment
        as both agents play within different CPU processes.

        This attack also assumes that we do not invoke the
        random.random() RNG ourselves at runtime, which is solved by
        precaching the first 8.5 million seed sequences at compile time
        """
        random.seed(seed)
        np.random.seed(seed)
        tf.random.set_seed(seed)



random_seed_search_instance = RandomSeedSearch()
def random_seed_search_agent(obs, conf):
    return random_seed_search_instance.agent(obs, conf)


In [None]:
%run main.py

# Precache RNG Sequences

Compute all the random.random() Mersenne Twister RNG sequences for the first 20 million seeds.

Only the first 20 numbers from each sequence are returned, but we rarely find cache hits or hash collisions after the first 15 moves. A sequence of 20 trinary numbers has a 1 in 3.5 billion probability of occurring randomly.

This takes about 23 minutes of runtime and generates 425Mb of int8 data. This compresses to 94Mb in tar.gz format which is under the 100Mb submission limit.
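
As a rough check on these odds, the sketch below assumes a perfectly uniform opponent and uses the 20 million seed figure quoted above. The helper names `match_probability` and `expected_false_matches` are illustrative only and are not part of main.py.

```python
def match_probability(k: int) -> float:
    """Chance that k uniformly random moves reproduce one specific sequence."""
    return 1 / 3 ** k


def expected_false_matches(num_seeds: int, k: int) -> float:
    """Expected number of cached seeds that agree with k random moves by luck alone."""
    return num_seeds * match_probability(k)


print(3 ** 20)  # 3486784401, i.e. roughly 3.5 billion

# How many cached seeds still match a random opponent after k moves?
for k in (4, 10, 15, 20):
    print(k, expected_false_matches(20_000_000, k))
```

Under these assumptions only about one or two seeds are expected to survive by chance at 15 moves, which is in line with cache hits becoming rare beyond that point.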

At runtime it takes up to 850ms (on localhost) and 1.2s (inside a Kaggle notebook) to search through this entire list as a numpy vectorized operation. This is close to, or even over, the maximum 1s/turn runtime limit imposed by the competition.

However this search time can be reduced massively by keeping track of previously excluded seed indexes. This enables us to do offset searching, searching for hash collisions at all possible sequence lengths. Search time for this loop remains stable around 600ms as the number of remaining candidate seeds for each offset is reduced by a factor of 3 after each search.
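
The pruning idea can be sketched as follows. This is a simplified illustration, not the code in main.py: the class `OffsetSeedFilter`, its methods, and the small random stand-in table are all hypothetical. It keeps one shrinking array of surviving seed indexes per history offset, so each turn only compares the survivors against a single cached column.

```python
import numpy as np


class OffsetSeedFilter:
    """Hypothetical helper: surviving seed indexes for every history offset."""

    def __init__(self, cache: np.ndarray):
        self.cache = cache        # shape (num_seeds, num_steps), values 0..2
        self.candidates = {}      # offset -> seed indexes still matching
        self.turn = 0             # number of opponent moves seen so far

    def observe(self, move: int) -> None:
        # A cached sequence may start at the current turn, so open a new offset
        self.candidates[self.turn] = np.arange(self.cache.shape[0])
        for offset in list(self.candidates):
            column = self.turn - offset          # position inside the cached row
            if column >= self.cache.shape[1]:    # ran past the end of the cache
                del self.candidates[offset]
                continue
            seeds = self.candidates[offset]
            # Only previous survivors are compared, so the work shrinks ~3x per turn
            self.candidates[offset] = seeds[self.cache[seeds, column] == move]
        self.turn += 1

    def best_match(self):
        """Return (offset, seeds) for the earliest offset that still has survivors."""
        for offset in sorted(self.candidates):
            if len(self.candidates[offset]):
                return offset, self.candidates[offset]
        return None, np.array([], dtype=int)


# Stand-in for the precomputed .npy table (assumption: random trinary data)
rng = np.random.default_rng(0)
cache = rng.integers(0, 3, size=(10_000, 20), dtype=np.int8)

# Two unrelated moves, then the first 12 moves of cached row 1234
history = [1, 2] + cache[1234, :12].tolist()

search = OffsetSeedFilter(cache)
for move in history:
    search.observe(move)

offset, seeds = search.best_match()
print(offset, seeds)
```

The earliest surviving offset corresponds to the longest matched run, which is the most trustworthy candidate for predicting the opponent's next move.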

In [None]:
# We can save some notebook compile time by reusing the cache generated by the last submission
!cp -v ../input/random-seed-search-nash-equilibrium-opening-book/*.npy ./

In [None]:
%%time
RandomSeedSearch().precache()

In [None]:
%%time
%run main.py

In [None]:
%%time
import random

random.seed(42)
method   = 'random'
size     = RandomSeedSearch.cache[method].shape[1]
history  = np.array([ random.randint(0,2) for _ in range(size)  ])
np.where( np.all(
    RandomSeedSearch.cache[method][ : , :size ] == history[ :size ]
, axis=1))[0]

In [None]:
%%time
%run main.py

# Unit Tests

Make sure everything works as expected

In [None]:
%%writefile test_RandomSeedSearch.py
import numpy as np
import pytest
from joblib import delayed, Parallel
from kaggle_environments import evaluate

from main import IrrationalAgent
from main import IrrationalSearchAgent
from main import random_agent_seeded
from main import RandomSeedSearch


@pytest.mark.parametrize("name",   IrrationalSearchAgent.irrationals.keys())
@pytest.mark.parametrize("offset", [0,1,2])
def test_RandomSeedSearch_vs_named_irrational(name, offset):
    """ Assert we can find the full irrational sequence every time """
    assert len(RandomSeedSearch.cache)  # check cache filepath can be found
    episodeSteps = 100
    results = evaluate(
        "rps",
        [
            IrrationalAgent(name=name, offset=offset, verbose=False),
            RandomSeedSearch(verbose=False)
        ],
        configuration={
            "episodeSteps": episodeSteps,
            "tieRewardThreshold":  1,     # Disable draws
            "actTimeout":          1000,  # Prevent TimeoutError
        },
        # debug=True  # pull request
    )
    results = np.array(results).reshape((-1,2))
    assert len(results[ results == None ]) == 0   # No errored matches    
    assert (results[0][0] + episodeSteps/2.1) < results[0][1]


def test_RandomSeedSearch_vs_seeded_rng():
    """ Assert we can find the seeded RNG sequence and win against it """
    assert len(RandomSeedSearch.cache)  # check cache filepath can be found
    episodeSteps = 100
    results = evaluate(
        "rps",
        [
            random_agent_seeded,
            RandomSeedSearch(verbose=False)
        ],
        configuration={
            "episodeSteps": episodeSteps,
            "tieRewardThreshold":  1,     # Disable draws
            "actTimeout":          1000,  # Prevent TimeoutError
        },
        # debug=True  # pull request
    )
    assert results[0][0] < results[0][1], results



def test_RandomSeedSearch_vs_Irrational():
    """ Show we don't have a statistical advantage inside the opening book vs irrational """
    episodeSteps = 100
    results = Parallel(-1)( delayed(evaluate)(
        "rps",
        [
            IrrationalAgent(verbose=False),
            RandomSeedSearch(verbose=False)
        ],
        configuration={
            "episodeSteps": episodeSteps,
            "tieRewardThreshold":  1,     # Disable draws
            "actTimeout":          1000,  # Prevent TimeoutError
        },
        num_episodes=1
        # debug=True,  # pull request
    ) for _ in range(int(1000/episodeSteps)) )
    results = np.array(results).reshape((-1,2))
    assert len(results[ results == None ]) == 0   # No errored matches

    totals  = np.mean(results, axis=0)
    std     = np.std(results, axis=0).round(2)
    winrate = [ np.sum(results[:,0] > results[:,1]),
                np.sum(results[:,0] < results[:,1]) ]

    print('results', results)
    print('totals',  totals)
    print('std',     std)
    print('winrate', winrate)

    assert np.abs(totals[0]) < 0.2 * episodeSteps  # scores are within 20%
    assert np.abs(totals[1]) < 0.2 * episodeSteps  # scores are within 20%
    assert np.abs(std[0])    < 0.2 * episodeSteps  # std  within 20%
    assert np.abs(std[1])    < 0.2 * episodeSteps  # std  within 20%



def test_RandomSeedSearch_vs_unseeded_RNG():
    """ Show we have a statistical advantage vs RNG """
    episodeSteps = 100
    results = Parallel(-1)( delayed(evaluate)(
        "rps",
        [
            "rng/random_agent_unseeded.py",
            RandomSeedSearch(verbose=False)
        ],
        configuration={
            "episodeSteps": episodeSteps,
            "tieRewardThreshold":  1,     # Disable draws
            "actTimeout":          1000,  # Prevent TimeoutError
        },
        num_episodes=1
        # debug=True,  # pull request
    ) for _ in range(int(1000/episodeSteps)) )
    results = np.array(results).reshape((-1,2))
    assert len(results[ results == None ]) == 0   # No errored matches

    totals  = np.sum(results, axis=0)
    std     = np.std(results, axis=0).round(2)
    winrate = [ np.sum(results[:,0] > results[:,1]),
                np.sum(results[:,0] < results[:,1]) ]

    print('results', results)
    print('totals',  totals)
    print('std',     std)
    print('winrate', winrate)

    assert winrate[0] <= winrate[1], winrate      # We have a winrate advantage or draw
    assert totals[0]  <  totals[1],  totals       # We have a statistical advantage
    # assert np.abs(std[0]) < 0.2 * episodeSteps  # std within 20%
    # assert np.abs(std[1]) < 0.2 * episodeSteps  # std within 20%


In [None]:
!pip install --upgrade -q kaggle_environments 2> /dev/null

In [None]:
!pytest -v test_RandomSeedSearch.py

# Demonstration


Versus an agent with a manually defined seed, we can easily win without the need for offset search.

In [None]:
from kaggle_environments import make, evaluate

env = make("rps", configuration={"episodeSteps": 10}, debug=True)
env.run([ RandomSeedSearch(max_offset=1000), random_agent_seeded,  ])
env.render(mode="ipython", width=400, height=400)

Versus an unseeded agent, we hopefully get a small statistical advantage from hash collisions.

Timings for offset search over the entire history remain stable around ~750ms

In [None]:
from kaggle_environments import make, evaluate

env = make("rps", configuration={"episodeSteps": 20}, debug=True)
env.run([ RandomSeedSearch(verbose=2, max_offset=1000), random_agent_unseeded,  ])
env.render(mode="ipython", width=400, height=400)

In localhost mode, we can also cheat by setting the opponent's seed for them. However, this doesn't actually work in the submissions environment.
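
Why this works can be shown with the standard library alone. The sketch below assumes both agents run in one Python process and that the opponent draws from the module-level `random` functions; `unseeded_opponent` is a hypothetical stand-in, not the `random_agent_unseeded` used in this notebook.

```python
import random


def unseeded_opponent() -> int:
    # Draws from the process-wide state of the `random` module
    return random.randint(0, 2)


# The cheat: seed the shared module-level RNG on the opponent's behalf
random.seed(42)

# Replay the same seed in a private generator so the shared state is untouched
private_rng = random.Random(42)
predicted = [private_rng.randint(0, 2) for _ in range(5)]

# The opponent now produces exactly the moves we predicted
actual = [unseeded_opponent() for _ in range(5)]
winning = [(move + 1) % 3 for move in predicted]  # the move that beats each prediction

print(predicted == actual, winning)
```

In a separate process the opponent has its own copy of the RNG state, so the call to `random.seed()` has no effect on it.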

In [None]:
from kaggle_environments import make, evaluate

env = make("rps", configuration={"episodeSteps": 20}, debug=True)
env.run([ RandomSeedSearch(cheat=True, max_offset=0), random_agent_unseeded,  ])
env.render(mode="ipython", width=400, height=400)

We can also beat anybody who plays an irrational sequence
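
The counter-move idea from the introduction (rotate every move by +1 % 3) can be sketched as below. The digit-to-move mapping (`digit % 3`) and the names `PI_DIGITS`, `irrational_moves` and `counter_moves` are assumptions for illustration; the actual `IrrationalAgent` in main.py may encode its sequence differently.

```python
# First 20 decimal digits of pi, hardcoded to avoid extra dependencies
PI_DIGITS = "31415926535897932384"


def irrational_moves(digits: str) -> list:
    """Assumed encoding: each digit modulo 3 becomes a move (0=rock, 1=paper, 2=scissors)."""
    return [int(d) % 3 for d in digits]


def counter_moves(moves: list) -> list:
    """Rotate each move by +1 % 3, which beats the original move."""
    return [(m + 1) % 3 for m in moves]


opponent = irrational_moves(PI_DIGITS)
ours = counter_moves(opponent)

# In RPS, move a beats move b when (a - b) % 3 == 1
wins = sum((a - b) % 3 == 1 for a, b in zip(ours, opponent))
print(wins, "wins out of", len(opponent))
```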

In [None]:
from kaggle_environments import make, evaluate

env = make("rps", configuration={"episodeSteps": 30}, debug=True)
env.run([ RandomSeedSearch(), IrrationalAgent(name='pi', verbose=False),  ])
env.render(mode="ipython", width=400, height=400)

# Statistics

Against an unseeded RNG the advantage per game is small. However, a small but consistent statistical advantage can sometimes be enough to convert a loss into a draw and a draw into a win.

In [None]:
%%time
import numpy as np
from joblib import delayed, Parallel

agents  = [ RandomSeedSearch(verbose=False), random_agent_unseeded ]

results = evaluate(
    "rps",
    agents,
    configuration={
        "episodeSteps":        1000,  # Full match
        "tieRewardThreshold":  1,     # Disable draws
        "actTimeout":          1000,  # Prevent TimeoutError
    },
    num_episodes=30
    # debug=True,  # pull request
)

results = np.array(results).reshape((-1,2))
results[ results == None ] = -1
totals  = np.sum(results, axis=0)
std     = np.std(results, axis=0).round(2)
# simulate +-20 draws
winrate = [ np.sum(results[:,0]-20 > results[:,1]),
            np.sum(results[:,0]+20 < results[:,1]) ]


print([ getattr(agent, '__name__', agent.__class__.__name__) for agent in agents ])
print('winrate', winrate, '/', len(results))
print('totals ', totals)
print('std    ', std)

# Submission

Multi-file submissions were introduced during Halite:

> Today we released support for multi-file agents. To upload an agent with multiple files, submit your work as a .tar.gz archive (the name must end in .tar.gz) with a python file at the top level called main.py that conforms to the normal agent submission requirements. The maximum upload size of 100MB applies to the full archive. For the initial release of this feature we only compile the main.py file so to import other python files you'll either have to compile/exec or importlib the relevant files. We're hoping to improve this in a future release and will update this topic as appropriate. - Sam Harris
- https://www.kaggle.com/c/halite/discussion/177686


`tar.gz` gives about 4.25x to 4.8x compression, because the top 6 bits of every numpy int8 are always 0 when encoding trinary values (0, 1, 2).
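
The compression ratio can be sanity-checked on synthetic data. This sketch assumes uniformly random trinary moves as a stand-in for one cache file, and uses `zlib` (the same DEFLATE algorithm that gzip wraps) rather than the real `tar` pipeline, so the exact ratio will differ slightly from the archive.

```python
import math
import zlib

import numpy as np

# Stand-in for a slice of the cache: one million random moves stored as int8
rng = np.random.default_rng(0)
moves = rng.integers(0, 3, size=1_000_000, dtype=np.int8)

raw = moves.tobytes()
packed = zlib.compress(raw, 9)  # level 9, matching GZIP=-9

ratio = len(raw) / len(packed)
limit = 8 / math.log2(3)  # each byte carries only log2(3) bits of information

print(f"measured ratio = {ratio:.2f}x, theoretical limit = {limit:.2f}x")
```

The theoretical limit of roughly 5x follows from each 8-bit byte holding only about 1.58 bits of entropy, so any general-purpose compressor should land somewhat below that figure on this kind of data.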

In [None]:
%%time
%%bash
GZIP=-9 tar cfz main.py.tar.gz main.py *.npy
ls -lah  *.tar.gz *.npy *.py
tar -tvf *.tar.gz

# Further Reading

If you liked this notebook, please upvote!

This work was inspired by my previous attempt to apply this approach to the MNIST dataset, for which I was able to achieve [11% accuracy](https://www.kaggle.com/jamesmcguigan/minst-random-seed-search) on a 1 in 10 choice!

This notebook is part of a series exploring Rock Paper Scissors:

Irrational
- [PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-pi-bot)
- [Anti-PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-pi-bot)
- [Anti-Anti-PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-anti-pi-bot)
- [Irrational Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-irrational-agent)
- [Irrational Search Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-irrational-search-agent)
- [Random Seed Search Nash Equilibrium Opening Book](https://www.kaggle.com/jamesmcguigan/random-seed-search-nash-equlibrium-opening-book)

RNG
- [Random Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-random-agent)
- [RNG Statistics](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-rng-statistics)

Sequence
- [De Bruijn Sequence](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-de-bruijn-sequence)

Opponent Response
- [Anti-Rotn](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-rotn)
- [Sequential Strategies](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-sequential-strategies)

Statistical 
- [Weighted Random Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-weighted-random-agent)
- [Anti-Rotn Weighted Random](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-rotn-weighted-random)
- [Statistical Prediction](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-statistical-prediction)

Memory Patterns
- [Naive Bayes](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-naive-bayes)
- [Memory Patterns](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-memory-patterns)

Decision Tree
- [XGBoost](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-xgboost)
- [Multi Stage Decision Tree](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-stage-decision-tree)
- [Decision Tree Ensemble](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-decision-tree-ensemble)

Neural Networks
- [LSTM](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-lstm)

Ensemble
- [Multi Armed Stats Bandit](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-armed-stats-bandit)

RoShamBo Competition Winners
- [Iocaine Powder](https://www.kaggle.com/jamesmcguigan/rps-roshambo-comp-iocaine-powder)
- [Greenberg](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-greenberg)