This is a slightly updated copy of [Chan Kha Vu](https://www.kaggle.com/chankhavu)'s original [RPS Dojo](https://www.kaggle.com/chankhavu/rps-dojo). Please follow the link first and give the author all the respect he or she deserves! Kernels like this really make a difference and bring more life to Kaggle Competitions!

# Rock-Paper-Scissors Dojo

![Dojo](https://i.imgur.com/jeWU2Ea.png)

It is very important to keep a diverse pool of agents to test your new agent against. In this notebook, I collected a bunch of agents from public notebooks to form a "Dojo" where you can test your agent before submitting.

In the future, I also plan to add some of my own agents that are weaker than my flagship ones. This notebook can be attached to your notebooks as a dataset. At the end of this notebook, you can also find simple code for comparing agents.

## <span style='color:blue'>Changes compared to original version</span>
### Performance
* Dependency-free local high-performance RPS evaluator
* All agents are converted from files into classes with the same interface and shared library dependencies
* All agents are ranked by performance and have a performance coefficient (the class attribute `K`) that controls how many matches are played against them
* Some slow agents are slightly modified to improve performance
* Multi-armed bandits are excluded: they are extremely slow and are in fact <i>meta</i>-agents that can be composed of any subset of other agents

### Competition
* New <span style='color:brown'>Brown</span> Belt added
* Some agents moved to other categories to better match complexity
* New agents added based on my own research

### New agent format
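Agents are now classes. A new instance is created before each match, so `__init__` is where you initialize any state that should persist across the steps of a match.

The instance method `next_action(self, T, A, S)` is called on every step, returns the agent's action (`0` = Rock, `1` = Paper, `2` = Scissors) and receives the following parameters: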
* `T = observation.step`
* `A = observation.lastOpponentAction if T > 0 else None`
* `S = configuration.signs`
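To illustrate the contract described above, here is a minimal sketch of an agent in the new format together with a tiny match loop. `CounterMostFrequent`, `AlwaysRock` and `play_match` are hypothetical names used only for this example: `play_match` is a simplified stand-in, not the evaluator at the end of this notebook, and the `K` value is arbitrary.

```python
import collections
import secrets


def get_score(S, A1, A2):
    # 1 if A1 beats A2, 0 for a tie, -1 if A1 loses (0 = Rock, 1 = Paper, 2 = Scissors)
    return (S + A1 - A2 + 1) % S - 1


class CounterMostFrequent:
    """Hypothetical agent: counters the opponent's most frequent move so far."""
    K = 1000  # arbitrary performance coefficient for this example

    def __init__(self):
        # A fresh instance is created per match, so this state never leaks between matches
        self.counts = collections.Counter()

    def next_action(self, T, A, S):
        if T == 0:  # A is None on the first step
            return secrets.randbelow(S)
        self.counts[A] += 1
        predicted = self.counts.most_common(1)[0][0]
        return (predicted + 1) % S  # the move that beats the prediction


class AlwaysRock:
    def next_action(self, T, A, S):
        return 0


def play_match(agent_cls1, agent_cls2, steps=1000, S=3):
    """Simplified match loop (hypothetical): returns the total score of agent 1."""
    agent1, agent2 = agent_cls1(), agent_cls2()
    last1 = last2 = None
    total = 0
    for T in range(steps):
        move1 = agent1.next_action(T, last2, S)  # each agent sees the other's last move
        move2 = agent2.next_action(T, last1, S)
        total += get_score(S, move1, move2)
        last1, last2 = move1, move2
    return total


print(play_match(CounterMostFrequent, AlwaysRock))
```

Passing classes rather than instances to the match loop mirrors the rule that every match starts from a fresh object.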

In [None]:
import random
import secrets
import math
import collections
import time
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from multiprocessing import Pool

import pydash
from itertools import combinations_with_replacement
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from operator import itemgetter

def get_score(S, A1, A2):
    # Returns 1 if action A1 beats A2, 0 for a tie, -1 if A1 loses (0 = Rock, 1 = Paper, 2 = Scissors)
    return (S + A1 - A2 + 1) % S - 1

<h1 style='background:white; border:3px solid; color:black'><center>White Belt</center></h1>

<br>

These agents are extremely simple and can be used as a minimal testing pool for your agent. A decent agent should beat every "White Belt" baseline in 100% of matches. The following agents are in the "White Belt" category:
- **`Rock`** &mdash; plays only Rock
- **`Paper`** &mdash; plays only Paper
- **`Scissors`** &mdash; plays only Scissors
- **`Mirror`** &mdash; repeats the opponent's last move
- **`Mirror1`** &mdash; plays the opponent's last move shifted by 1, i.e. the move that beats it
- **`Mirror2`** &mdash; plays the opponent's last move shifted by 2, i.e. the move that loses to it
- **`CounterReact`** &mdash; counter-strategy to the reactionary `Mirror1` agent
- **`DeBruijn`** &mdash; from the [Rock Paper Scissors - De Bruijn Sequence](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-de-bruijn-sequence) notebook
- **`Statistical`** &mdash; monitors the move statistics and tries to counter the most frequent move
- **`NotSoMarkov`** &mdash; from [(Not so) Markov](https://www.kaggle.com/alexandersamarin/not-so-markov) (v5)
- **`RFind`** &mdash; from [Running RPSContest bots](https://www.kaggle.com/purplepuppy/running-rpscontest-bots)
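The shift arithmetic used by `Mirror1`, `Mirror2` and `CounterReact` follows from `get_score`: `(m + 1) % S` beats `m`. `Mirror1` therefore always answers our previous move `m` with the move that would have beaten it, which makes it exploitable by playing `(m + 2) % S`; this is the idea behind the `+ 2` branch of `CounterReact`. A small sketch under these assumptions (the helper names are hypothetical, Mirror1's random first move is ignored, and this is a simplification rather than the `CounterReact` class itself):

```python
def get_score(S, A1, A2):
    # 1 if A1 beats A2, 0 for a tie, -1 if A1 loses
    return (S + A1 - A2 + 1) % S - 1


def mirror1_reply(my_last, S=3):
    # Mirror1 plays the move that beats our previous move
    return (my_last + 1) % S


def exploit_mirror1(my_last, S=3):
    # ...so we play the move that beats Mirror1's reply
    return (my_last + 2) % S


def simulate(steps=10, S=3):
    my_last, total = 0, 0
    for _ in range(steps):
        mine = exploit_mirror1(my_last, S)
        theirs = mirror1_reply(my_last, S)
        total += get_score(S, mine, theirs)
        my_last = mine
    return total


print(simulate())  # → 10
```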

In [None]:
class Rock:
    K = 20000
    def next_action(self, T, A, S):
        return 0

class Paper:
    K = 20000
    def next_action(self, T, A, S):
        return 1

class Scissors:
    K = 20000
    def next_action(self, T, A, S):
        return 2

class Mirror:
    K = 10000
    def next_action(self, T, A, S):
        if T == 0:
            return secrets.randbelow(S)
        else:
            return A

class Mirror1:
    K = 10000
    def next_action(self, T, A, S):
        if T == 0:
            return secrets.randbelow(S)
        else:
            return (A + 1) % S

class Mirror2:
    K = 10000
    def next_action(self, T, A, S):
        if T == 0:
            return secrets.randbelow(S)
        else:
            return (A + 2) % S

class CounterReact:
    K = 6000
    def __init__(self):
        self.last_counter_action = 0

    def next_action(self, T, A, S):
        if T == 0:
            self.last_counter_action = secrets.randbelow(S)
        elif get_score(S, self.last_counter_action, A) == 1:
            self.last_counter_action = (self.last_counter_action + 2) % S
        else:
            self.last_counter_action = (A + 1) % S

        return self.last_counter_action

class DeBruijn:
    K = 10000
    actions = pydash.flatten(list(combinations_with_replacement([2,1,0,2,1,0],3)) * 18)

    def next_action(self, T, A, S):
        return int(DeBruijn.actions[T] % S)
    
class Statistical:
    K = 5000
    def __init__(self):
        self.action_histogram = {}

    def next_action(self, T, A, S):
        if T == 0:
            return secrets.randbelow(S)
        else:
            if A not in self.action_histogram:
                self.action_histogram[A] = 0
            self.action_histogram[A] += 1
            mode_action = None
            mode_action_count = None
            for k, v in self.action_histogram.items():
                if mode_action_count is None or v > mode_action_count:
                    mode_action = k
                    mode_action_count = v

            return (mode_action + 1) % S
        
class NotSoMarkov:
    K = 150
    def __init__(self):
        self.action_seq = []
        self.table = None
    
    @property
    def key(self):
        return ''.join([str(a) for a in self.action_seq[:-1]])

    def next_action(self, T, A, S):
        k = 2
        if T % 250 == 0: # refresh table every 250 steps
            self.action_seq = []
            self.table = collections.defaultdict(lambda: [1, 1, 1])    
        if len(self.action_seq) <= 2 * k + 1:
            action = secrets.randbelow(S)
            if T > 0:
                self.action_seq.extend([A, action])
            else:
                self.action_seq.append(action)
            return action
        # update table
        self.table[self.key][A] += 1
        # update action seq
        self.action_seq[:-2] = self.action_seq[2:]
        self.action_seq[-2] = A
        # predict opponent next move
        if T < 500:
            next_opponent_action_pred = np.argmax(self.table[self.key])
        else:
            scores = np.array(self.table[self.key])
            next_opponent_action_pred = np.random.choice(S, p=scores/scores.sum()) # add stochasticity for second part of the game
        # make an action
        action = (next_opponent_action_pred + 1) % S
        # if high probability to lose -> let's surprise our opponent with sudden change of our strategy
        if T > 900:
            action = next_opponent_action_pred
        self.action_seq[-1] = action
        return int(action)
    
class RFind:
    K = 200
    def __init__(self):
        self.hist = []  # history of your moves
        self.dict_last = {}
        self.max_dict_key = 10
        self.last_move = 0

    def predict(self, S):
        for i in reversed(range(min(len(self.hist), self.max_dict_key))):
            t = tuple(self.hist[-i:])
            if t in self.dict_last:
                return self.dict_last[t]
        return secrets.randbelow(S)

    def update(self, move, A):
        self.hist.append(move)
        for i in reversed(range(min(len(self.hist), self.max_dict_key))):
            t = tuple(self.hist[-i:])
            self.dict_last[t] = A

    def next_action(self, T, A, S):
        if T == 0:
            self.last_move = secrets.randbelow(S)
        else:
            self.update(self.last_move, A)
            self.last_move = (self.predict(S) + 1) % S
        return self.last_move
    
white_belt = {
    "Rock": Rock,
    "Paper": Paper,
    "Scissors": Scissors,
    "Mirror": Mirror,
    "Mirror1": Mirror1,
    "Mirror2": Mirror2,
    "CounterReact": CounterReact,
    "DeBruijn": DeBruijn,
    "Statistical": Statistical,
    "NotSoMarkov": NotSoMarkov,
    "RFind": RFind,
}
print(list(white_belt.keys()))

<h1 style='background:blue; border:3px solid; border-color: blue; color:white'><center>Blue Belt</center></h1>

<br>

You also need to beat these 100% of the time if you want any chance of reaching the bronze range, because the leaderboard is probably already full of such agents. The following agents are in the "Blue Belt" category:

- **`TransitionMatrix`** &mdash; from [RPS: Opponent Transition Matrix](https://www.kaggle.com/group16/rps-opponent-transition-matrix) notebook (v2)
- **`StochasticTransitionMatrix`** &mdash; from [RPS - Stochastic Transition Matrix](https://www.kaggle.com/peternagymathe/rps-stochastic-transition-matrix)
- **`StatisticalPrediction`** &mdash; from [Rock Paper Scissors - Statistical Prediction](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-statistical-prediction) (v17)
- **`WeightedRandom`** &mdash; from [Weighted Random Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-weighted-random-agent?scriptVersionId=46468176) (v4)
- **`Patterns`** &mdash; stores statistics of opponent moves separately for each combination of preceding actions
- **`SelfPatterns`** &mdash; stores statistics of own and opponent moves for each combination of preceding actions (2-5 steps to the past)
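`Patterns` (and `SelfPatterns`) encode the last `D` moves as a single integer with the rolling update `hash // S + A * S**(D-1)`: dividing by `S` drops the oldest base-`S` digit and the newest move becomes the top digit. The sketch below reproduces that idea for opponent moves only, with the `G` (minimum number of observations) and `R` (minimum share of the most frequent follow-up) thresholds borrowed from `Patterns`. `rolling_hash`, `encode` and `predict_counter` are hypothetical helpers; they omit the random waiting counter `J` and the own-move and combined hashes of the real agent.

```python
import collections


def rolling_hash(h, a, S=3, D=2):
    # Drop the oldest base-S digit and put the newest move in the top digit
    return h // S + a * S ** (D - 1)


def encode(moves, S=3, D=2):
    h = 0
    for a in moves:
        h = rolling_hash(h, a, S, D)
    return h  # depends only on the last D moves


def predict_counter(opp_moves, S=3, D=2, G=3, R=0.6):
    """Count which move follows each D-move context, then counter the dominant one."""
    counts = collections.defaultdict(collections.Counter)
    h = 0
    for t, a in enumerate(opp_moves):
        if t >= D:  # h now encodes the D moves played before a
            counts[h][a] += 1
        h = rolling_hash(h, a, S, D)
    stats = counts.get(h)
    if stats:
        total = sum(stats.values())
        move, n = stats.most_common(1)[0]
        if total >= G and n >= total * R:
            return (move + 1) % S  # play the move that beats the prediction
    return None  # not enough evidence: the real agent plays randomly here


print(predict_counter([0, 1, 0, 1, 0, 1, 0, 1]))  # → 1
```

With the alternating sequence above, the context (Rock, Paper) was followed by Rock three times out of three, so the sketch counters with Paper (`1`).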

In [None]:
class TransitionMatrix:
    K = 100
    def __init__(self):
        self.T = np.zeros((3, 3))
        self.P = np.zeros((3, 3))

        # a1 is the action of the opponent 1 step ago
        # a2 is the action of the opponent 2 steps ago
        self.a1 = None
        self.a2 = None

    def next_action(self, T, A, S):
        if T > 1:
            self.a1 = A
            self.T[self.a2, self.a1] += 1
            self.P = np.divide(self.T, np.maximum(1, self.T.sum(axis=1)).reshape(-1, 1))
            self.a2 = self.a1
            if np.sum(self.P[self.a1, :]) == 1:
                return int((np.random.choice(3, p=self.P[self.a1, :]) + 1) % S)
            else:
                return secrets.randbelow(S)
        else:
            if T == 1:
                self.a2 = A
            return secrets.randbelow(S)

class StochasticTransitionMatrix:
    K = 100
    def __init__(self):
        # matrix[opponent's last move, our last move, :] is the estimated distribution of the opponent's next move
        self.matrix = np.ones((3,3,3)) * (1/3)
        self.matrix_freq = np.ones((3,3,3))  # transition counts, starting at 1 (add-one smoothing)
        self.prev_me = 0
        self.prev_op = 0

    def next_action(self, T, A, S):
        if T == 0:
            self.prev_me = (np.random.choice(S, p=self.matrix[self.prev_op, self.prev_me, :]) + 1) % S
            return self.prev_me
        if T > 1:
            self.matrix_freq[self.prev_op, self.prev_me, A] += 1
            self.matrix[self.prev_op, self.prev_me, :] = self.matrix_freq[self.prev_op, self.prev_me, :] / np.sum(self.matrix_freq[self.prev_op, self.prev_me, :]) 
        self.prev_op = A #we store the last action of the opponent  
        self.prev_me = (np.random.choice(S, p=self.matrix[self.prev_op, self.prev_me, :]) + 1) % S
        return self.prev_me

class StatisticalPrediction:
    K = 10
    def __init__(self):
        self.history = {
            "guess":      [0,1,2],
            "prediction": [0,1,2],
            "expected":   [0,1,2],
            "action":     [0,1,2],
            "opponent":   [0,1],
        }

    def next_action(self, T, A, S):
        actions         = list(range(S))  # [0,1,2]
        last_action     = self.history['action'][-1]
        opponent_action = A if T > 0 else 2

        self.history['opponent'].append(opponent_action)

        # Make weighted random guess based on the complete move history, weighted towards relative moves based on our last action 
        move_frequency       = collections.Counter(self.history['opponent'])
        response_frequency   = collections.Counter(zip(self.history['action'], self.history['opponent'])) 
        move_weights         = [ move_frequency.get(n,1) + response_frequency.get((last_action,n),1) for n in range(S) ] 
        guess                = random.choices( population=actions, weights=move_weights, k=1 )[0]

        # Compare our guess to how our opponent actually played
        guess_frequency      = collections.Counter(zip(self.history['guess'], self.history['opponent']))
        guess_weights        = [ guess_frequency.get((guess,n),1) for n in range(S) ]
        prediction           = random.choices( population=actions, weights=guess_weights, k=1 )[0]

        # Repeat, but based on how many times our prediction was correct
        prediction_frequency = collections.Counter(zip(self.history['prediction'], self.history['opponent']))
        prediction_weights   = [ prediction_frequency.get((prediction,n),1) for n in range(S) ]
        expected             = random.choices( population=actions, weights=prediction_weights, k=1 )[0]

        # Play the +1 counter move
        action = (expected + 1) % S

        # Persist state
        self.history['guess'].append(guess)
        self.history['prediction'].append(prediction)
        self.history['expected'].append(expected)
        self.history['action'].append(action)

        return action

class WeightedRandom:
    K = 800
    def next_action(self, T, A, S):
        if T == 0:
            self.choices = list(range(S))
            self.opponent_frequency = [1] * S
        else:
            self.opponent_frequency[A] += 1

        expected_action = random.choices(self.choices, weights=self.opponent_frequency, k=1)[0]
        counter_action  = (expected_action + 1) % S
        return counter_action

class Patterns:
    K = 400
    def __init__(self):
        self.rng = random.SystemRandom()

        self.hash1 = 0
        self.hash2 = 0
        self.hash3 = 0
        self.map1 = {}
        self.map2 = {}
        self.map3 = {}
        self.Jmin = 1
        self.Jmax = 11
        self.J = self.rng.randrange(self.Jmin, self.Jmax+1)
        self.D = 2
        self.G = 3
        self.R = 0.6
        self.B = 0

    def add(self, map1, hash1, A):
        if hash1 not in map1:
            map1[hash1] = {'S':0}
        d = map1[hash1]
        if A not in d:
            d[A] = 1
        else:
            d[A] += 1
        d['S'] += 1

    def match(self, map1, hash1, S):
        if hash1 not in map1:
            return
        d = map1[hash1]
        if d['S'] >= self.G:
            for A in range(S):
                if A in d and d[A] >= d['S'] * self.R:
                    self.B = (A+1) % S
                    self.J = self.rng.randrange(self.Jmin, self.Jmax)

    def next_action(self, T, A, S):
        if T > self.D:
            self.add(self.map1, self.hash1, A)
            self.add(self.map2, self.hash2, A)
            self.add(self.map3, self.hash3, A)
        if T > 0:
            self.hash1 = self.hash1 // S + A * S**(self.D-1)
            self.hash2 = self.hash2 // S + self.B * S**(self.D-1)
            self.hash3 = self.hash3 // S**2 + (A + S*self.B) * S**(2*self.D-1)
        self.B = self.rng.randrange(0, S)
        if self.J == 0:
            self.match(self.map1, self.hash1, S)
            self.match(self.map2, self.hash2, S)
            self.match(self.map3, self.hash3, S)
        else:
            self.J -= 1
        return self.B
    
class SelfPatterns:
    K = 100
    def __init__(self):
        self.Jmin = 0
        self.Jmax = 5
        self.J = self.Jmin + secrets.randbelow(self.Jmax-self.Jmin+1)
        self.Dmin = 2
        self.Dmax = 5
        self.Hash = []
        self.Map = []
        self.MyMap = []
        for D in range(self.Dmin,self.Dmax+1):
            self.Hash.append([0, 0, 0])
            self.Map.append([{}, {}, {}])
            self.MyMap.append([{}, {}, {}])
        self.G = 2
        self.R = 0.4
        self.V = 0.7
        self.VM = 0.7
        self.B = 0

    def add(self, map1, hash1, A):
        if hash1 not in map1:
            map1[hash1] = {'S':0}
        d = map1[hash1]
        if A not in d:
            d[A] = 1
        else:
            d[A] += 1
        d['S'] += 1

    def match(self, map1, hash1, S):
        if hash1 not in map1:
            return
        d = map1[hash1]
        if d['S'] >= self.G:
            for A in range(S):
                if A in d and (d[A] >= d['S'] * self.R + (1-self.R) * self.G) and secrets.randbelow(101) < 100 * self.V:
                    if secrets.randbelow(101) < 100 * self.VM:
                        self.B = (A+1) % S
                    else:
                        self.B = A % S
                    self.J = self.Jmin + secrets.randbelow(self.Jmax-self.Jmin+1)

    def next_action(self, T, A, S):
        BA = (self.B+1)%S
        self.B = secrets.randbelow(S)
        for D in range(self.Dmin,self.Dmax+1):
            if T > D:
                self.add(self.Map[D-self.Dmin][0], self.Hash[D-self.Dmin][0], A)
                self.add(self.Map[D-self.Dmin][1], self.Hash[D-self.Dmin][1], A)
                self.add(self.Map[D-self.Dmin][2], self.Hash[D-self.Dmin][2], A)
                self.add(self.MyMap[D-self.Dmin][0], self.Hash[D-self.Dmin][0], BA)
                self.add(self.MyMap[D-self.Dmin][1], self.Hash[D-self.Dmin][1], BA)
                self.add(self.MyMap[D-self.Dmin][2], self.Hash[D-self.Dmin][2], BA)
            if T > 0:
                self.Hash[D-self.Dmin][0] = self.Hash[D-self.Dmin][0] // S**2 + (A + S*self.B) * S**(2*D-1)
                self.Hash[D-self.Dmin][1] = self.Hash[D-self.Dmin][1] // S + A * S**(D-1)
                self.Hash[D-self.Dmin][2] = self.Hash[D-self.Dmin][2] // S + self.B * S**(D-1)
            if self.J == 0:
                self.match(self.Map[D-self.Dmin][0], self.Hash[D-self.Dmin][0], S)
                self.match(self.Map[D-self.Dmin][1], self.Hash[D-self.Dmin][1], S)
                self.match(self.Map[D-self.Dmin][2], self.Hash[D-self.Dmin][2], S)
            if self.J == 0:
                self.match(self.MyMap[D-self.Dmin][0], self.Hash[D-self.Dmin][0], S)
                self.match(self.MyMap[D-self.Dmin][1], self.Hash[D-self.Dmin][1], S)
                self.match(self.MyMap[D-self.Dmin][2], self.Hash[D-self.Dmin][2], S)
        if self.J > 0:
            self.J -= 1
        return self.B

blue_belt = {
    "TransitionMatrix": TransitionMatrix,
    "StochasticTransitionMatrix": StochasticTransitionMatrix,
    "StatisticalPrediction": StatisticalPrediction,
    "WeightedRandom": WeightedRandom,
    "Patterns": Patterns,
    "SelfPatterns": SelfPatterns,
}
print(list(blue_belt.keys()))

<h1 style='background:brown; border:3px solid; border-color: brown; color:white'><center>Brown Belt</center></h1>

<br>

Completely random agents are, by design, unpredictable: your expected score against them is zero, so normally you would end up with nearly all matches tied.

However, no random number generator is perfect. If your agent manages to crack the specific algorithm behind a random generator, it could win more often:

- **`Random`** &mdash; Python's built-in `random` module (a Mersenne Twister pseudo-random generator)
- **`SystemRandom`** &mdash; `random.SystemRandom`, backed by the generator provided by the underlying OS
- **`SecretsRandom`** &mdash; the `secrets` module, a cryptographically strong generator and arguably the most suitable for competitions like this
- **`NumpyRandom`** &mdash; `np.random.choice`, which uses NumPy's global (Mersenne Twister) random state
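Why the generator matters: Python's `random` module is a deterministic Mersenne Twister, so anyone who knows (or reconstructs) its internal state can replay every future move, while `random.SystemRandom`, which `secrets` is also built on, draws from the OS and exposes no state at all. A minimal demonstration using only the standard library (the seed `2020` is arbitrary):

```python
import random

# Two Mersenne Twister generators with the same seed produce identical move sequences
gen_a = random.Random(2020)
gen_b = random.Random(2020)
moves_a = [gen_a.randrange(0, 3) for _ in range(1000)]
moves_b = [gen_b.randrange(0, 3) for _ in range(1000)]
print(moves_a == moves_b)  # → True

# Knowing the internal state is enough to predict the next moves exactly
state = gen_a.getstate()
predicted = [gen_a.randrange(0, 3) for _ in range(10)]
gen_a.setstate(state)
actual = [gen_a.randrange(0, 3) for _ in range(10)]
print(predicted == actual)  # → True

# SystemRandom (also used by `secrets`) reads from the OS and has no state to copy
try:
    random.SystemRandom().getstate()
except NotImplementedError as err:
    print("SystemRandom:", err)
```

In a real match the opponent's generator state is unknown, and each observed move reveals only about 1.6 bits, so recovering the state from actions alone is considerably harder than this demonstration suggests.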

In [None]:
class Random:
    K = 4000
    def next_action(self, T, A, S):
        return random.randrange(0, S)

class SystemRandom:
    K = 1000
    def __init__(self):
        self.rng = random.SystemRandom()

    def next_action(self, T, A, S):
        return self.rng.randrange(0, S)

class SecretsRandom:
    K = 1000
    def next_action(self, T, A, S):
        return secrets.randbelow(S)
    
class NumpyRandom:
    K = 1000
    def next_action(self, T, A, S):
        return np.random.choice(S)
    
brown_belt = {
    "Random": Random,
    "SystemRandom": SystemRandom,
    "SecretsRandom": SecretsRandom,
    "NumpyRandom": NumpyRandom,
}
print(list(brown_belt.keys()))

<h1 style='background:black; border:3px solid; border-color: black; color:white'><center>Black Belt Baselines</center></h1>

<br>

These agents rank well on the leaderboard (near the bronze band or even above). The following agents are in the "Black Belt" category:

- **`DecisionTree`** &mdash; from [Decision Tree Classifier](https://www.kaggle.com/alexandersamarin/decision-tree-classifier?scriptVersionId=46415861) (v4)
- **`Xgboost`** &mdash; from [XGBoost For Predicting Opponent's Action](https://www.kaggle.com/ollyattwood/xgboost-for-predicting-opponent-s-action) (v1)
- **`PatternsAggressive`** &mdash; another new agent, a more aggressive variant of the pattern agents above (structurally closest to `SelfPatterns`) that only counts the last 200 steps in its statistics
- **`MemoryPatterns`** &mdash; from [Rock, Paper, Scissors with Memory Patterns](https://www.kaggle.com/yegorbiryukov/rock-paper-scissors-with-memory-patterns) (v20)
- **`Greenberg`** &mdash; from [RPS: RoShamBo Competition - Greenberg](https://www.kaggle.com/group16/rps-roshambo-competition-greenberg)
- **`Iocaine`** &mdash; from [RPS: RoShamBo Comp - Iocaine Powder](https://www.kaggle.com/group16/rps-roshambo-comp-iocaine-powder)
- **`TestingPleaseIgnore`** &mdash; from [Running RPSContest bots](https://www.kaggle.com/purplepuppy/running-rpscontest-bots)
- **`IOU2`** &mdash; from [RPSContest - IO2_fightinguuu](http://www.rpscontest.com/entry/885001)
- (missing) **`Dllu1`** &mdash; from [RPSContest - dllu1](http://www.rpscontest.com/entry/498002)
- (missing) **`CentrifugalBumblepuppy4`** &mdash; from [RPSContest - Centrifugal BumblePuppy 4](centrifugal_bumblepuppy_v4)
- **`NumpyPatterns`** &mdash; like `Patterns`, but optimized with NumPy, which makes it possible to compute more statistics and to introduce scoring similar to multi-armed bandits (MAB)
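`PatternAggressive.rank` counts how many stored step indices fall inside the last `DT` steps by scanning the whole list on every call. Because `add` appends step indices in increasing order, each list is sorted, so the same count can be obtained with a binary search. A sketch of this possible optimization (`rank_linear` and `rank_bisect` are hypothetical helpers, not part of the agent):

```python
import bisect
import random


def rank_linear(timestamps, T, DT=200):
    # Same logic as PatternAggressive.rank: entries newer than T - DT
    return len([a for a in timestamps if a > T - DT])


def rank_bisect(timestamps, T, DT=200):
    # timestamps must be sorted ascending; bisect_right returns the index of the
    # first entry strictly greater than T - DT, so everything from there on counts
    return len(timestamps) - bisect.bisect_right(timestamps, T - DT)


# Sanity check against the linear version on random sorted step indices
steps = sorted(random.sample(range(1000), 300))
assert all(rank_linear(steps, T) == rank_bisect(steps, T) for T in range(0, 1200, 7))
print(rank_bisect(steps, 999))
```

This turns each call from O(n) into O(log n); whether it noticeably speeds up the agent depends on how often `match` is reached.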

In [None]:
class DecisionTree:
    K = 1
    def construct_local_features(self, rollouts):
        return np.concatenate(([step % k for step in rollouts['steps'] for k in (2, 3, 5)], rollouts['steps'], rollouts['actions'], rollouts['opp-actions']), axis=None)

    def construct_global_features(self, rollouts):
        features = np.zeros((6,), dtype=float)  # float, so the move frequencies below are not truncated to 0
        features[:3] = np.mean(np.array(rollouts['actions']).reshape(-1,1) == np.arange(3), axis=0)
        features[3:] = np.mean(np.array(rollouts['opp-actions']).reshape(-1,1) == np.arange(3), axis=0)
        return features

    def construct_features(self, short_stat_rollouts, long_stat_rollouts):
        lf = self.construct_local_features(short_stat_rollouts)
        gf = self.construct_global_features(long_stat_rollouts)
        return np.concatenate([lf, gf])

    def predict_opponent_move(self, train_data, test_sample):
        classifier = DecisionTreeClassifier(random_state=42)
        classifier.fit(train_data['x'], train_data['y'])
        return classifier.predict(test_sample)

    def update_rollouts_hist(self, A):
        self.rollouts_hist['steps'].append(self.last_move['step'])
        self.rollouts_hist['actions'].append(self.last_move['action'])
        self.rollouts_hist['opp-actions'].append(A)

    def warmup_strategy(self, S, A, T):
        action = secrets.randbelow(S)
        if T == 0:
            self.rollouts_hist = {'steps': [], 'actions': [], 'opp-actions': []}
            self.last_move = {'step': 0, 'action': action}
        else:
            self.update_rollouts_hist(A)
            self.last_move = {'step': T, 'action': action}
        return int(action)

    def init_training_data(self, k):
        for i in range(len(self.rollouts_hist['steps']) - k + 1):
            short_stat_rollouts = {key: self.rollouts_hist[key][i:i+k] for key in self.rollouts_hist}
            long_stat_rollouts = {key: self.rollouts_hist[key][:i+k] for key in self.rollouts_hist}
            features = self.construct_features(short_stat_rollouts, long_stat_rollouts)        
            self.data['x'].append(features)
        self.test_sample = self.data['x'][-1].reshape(1, -1)
        self.data['x'] = self.data['x'][:-1]
        self.data['y'] = self.rollouts_hist['opp-actions'][k:]

    def next_action(self, T, A, S):
        k = 5
        min_samples = 25
        if T == 0:
            self.data = {'x': [], 'y': []}
        # if not enough data -> randomize
        if T <= min_samples + k:
            return self.warmup_strategy(S, A, T)
        # update statistics
        self.update_rollouts_hist(A)
        # update training data
        if len(self.data['x']) == 0:
            self.init_training_data(k)
        else:        
            short_stat_rollouts = {key: self.rollouts_hist[key][-k:] for key in self.rollouts_hist}
            features = self.construct_features(short_stat_rollouts, self.rollouts_hist)
            self.data['x'].append(self.test_sample[0])
            self.data['y'] = self.rollouts_hist['opp-actions'][k:]
            self.test_sample = features.reshape(1, -1)

        # predict opponents move and choose an action
        next_opp_action_pred = self.predict_opponent_move(self.data, self.test_sample)
        action = int((next_opp_action_pred + 1) % S)
        self.last_move = {'step': T, 'action': action}
        return action
    
class Xgboost:
    K = 0.25
    def __init__(self):
        self.numTurnsPredictors = 5 #number of previous turns to use as predictors
        self.minTrainSetRows = 10 #only start predicting moves after we have enough data
        self.myLastMove = None
        self.mySecondLastMove = None
        self.opponentLastMove = None
        self.numDummies = 2 #how many dummy vars we need to represent a move
        self.predictors = pd.DataFrame(columns=[str(x) for x in range(self.numTurnsPredictors * 2 * self.numDummies)]).astype("int")
        self.opponentsMoves = [0] * 1000
        self.roundHistory = [None] * 1000
        self.dummies = [[[0,0,0,0], [0,1,0,0], [1,0,0,0]], [[0,0,0,1], [0,1,0,1], [1,0,0,1]], [[0,0,1,0], [0,1,1,0], [1,0,1,0]]]
        self.clf = XGBClassifier(n_estimators=10)

    def updateFeatures(self, rounds):
        self.predictors.loc[len(self.predictors)] = sum(rounds, [])

    def fitAndPredict(self, x, y, newX):
        self.clf.fit(x.values, y)
        return int(self.clf.predict(np.array(newX).reshape((1,-1)))[0])

    def next_action(self, T, A, S):
        if T == 0:
            self.myLastMove = secrets.randbelow(S)
            return self.myLastMove

        self.roundHistory[T-1] = self.dummies[self.myLastMove][A]
        if T == 1:
            self.myLastMove = secrets.randbelow(S)
            return self.myLastMove
        else:
            self.opponentsMoves[T-2] = A

            if T > self.numTurnsPredictors:
                self.updateFeatures(self.roundHistory[:T][-self.numTurnsPredictors - 1: -1])

            if len(self.predictors) > self.minTrainSetRows:
                predictX = sum(self.roundHistory[:T][-self.numTurnsPredictors:], []) #data to predict next move
                predictedMove = self.fitAndPredict(self.predictors, self.opponentsMoves[:T-1][(self.numTurnsPredictors-1):], predictX)
                self.myLastMove = (predictedMove + 1) % S
                return self.myLastMove
            else:
                self.myLastMove = secrets.randbelow(S)
                return self.myLastMove

class PatternAggressive:
    K = 10
    def __init__(self):
        self.Jmax = 2
        self.J = self.Jmax - int(math.sqrt(secrets.randbelow((self.Jmax+1)**2)))
        self.Dmin = 2
        self.Dmax = 5
        self.Hash = []
        self.Map = []
        self.MyMap = []
        for D in range(self.Dmin,self.Dmax+1):
            self.Hash.append([0, 0, 0])
            self.Map.append([{}, {}, {}])
            self.MyMap.append([{}, {}, {}])
        self.G = 2
        self.R = 0.4
        self.V = 0.8
        self.VM = 0.95
        self.B = 0
        self.DT = 200

    def add(self, map1, hash1, A, T):
        if hash1 not in map1:
            map1[hash1] = {'S': []}
        d = map1[hash1]
        if A not in d:
            d[A] = [T]
        else:
            d[A].append(T)
        d['S'].append(T)

    def rank(self, A, T):
        return len([a for a in A if a > T - self.DT])

    def match(self, map1, hash1, S, T):
        if hash1 not in map1:
            return
        d = map1[hash1]
        if self.rank(d['S'], T) >= self.G:
            for A in range(S):
                if A in d and (self.rank(d[A], T) >= self.rank(d['S'], T) * self.R + (1-self.R) * self.G) and secrets.randbelow(1001) < 1000 * self.V:
                    if secrets.randbelow(1001) < 1000 * self.VM:
                        self.B = (A+1) % S
                    else:
                        self.B = A % S
                    self.J = self.Jmax - int(math.sqrt(secrets.randbelow((self.Jmax+1)**2)))
    
    def next_action(self, T, A, S):
        BA = (self.B+1)%S
        self.B = secrets.randbelow(S)
        for D in range(self.Dmin,self.Dmax+1):
            if T > D:
                self.add(self.Map[D-self.Dmin][0], self.Hash[D-self.Dmin][0], A, T)
                self.add(self.Map[D-self.Dmin][1], self.Hash[D-self.Dmin][1], A, T)
                self.add(self.Map[D-self.Dmin][2], self.Hash[D-self.Dmin][2], A, T)
                self.add(self.MyMap[D-self.Dmin][0], self.Hash[D-self.Dmin][0], BA, T)
                self.add(self.MyMap[D-self.Dmin][1], self.Hash[D-self.Dmin][1], BA, T)
                self.add(self.MyMap[D-self.Dmin][2], self.Hash[D-self.Dmin][2], BA, T)
            if T > 0:
                self.Hash[D-self.Dmin][0] = self.Hash[D-self.Dmin][0] // S**2 + (A + S*self.B) * S**(2*D-1)
                self.Hash[D-self.Dmin][1] = self.Hash[D-self.Dmin][1] // S + A * S**(D-1)
                self.Hash[D-self.Dmin][2] = self.Hash[D-self.Dmin][2] // S + self.B * S**(D-1)
            if self.J == 0:
                self.match(self.Map[D-self.Dmin][0], self.Hash[D-self.Dmin][0], S, T)
                self.match(self.Map[D-self.Dmin][1], self.Hash[D-self.Dmin][1], S, T)
                self.match(self.Map[D-self.Dmin][2], self.Hash[D-self.Dmin][2], S, T)
            if self.J == 0:
                self.match(self.MyMap[D-self.Dmin][0], self.Hash[D-self.Dmin][0], S, T)
                self.match(self.MyMap[D-self.Dmin][1], self.Hash[D-self.Dmin][1], S, T)
                self.match(self.MyMap[D-self.Dmin][2], self.Hash[D-self.Dmin][2], S, T)
        if self.J > 0:
            self.J -= 1
        return self.B

class MemoryPatterns:
    K = 3
    def __init__(self):
        self.current_memory = []
        self.previous_action = {
            "action": None,
            "action_from_pattern": False,
            "pattern_group_index": None,
            "pattern_index": None
        }
        self.steps_to_random = random.randint(3, 5)
        self.current_memory_max_length = 10
        self.reward = 0
        self.group_memory_length = self.current_memory_max_length
        self.groups_of_memory_patterns = []
        for i in range(5, 2, -1):
            self.groups_of_memory_patterns.append({
                "memory_length": self.group_memory_length,
                "memory_patterns": []
            })
            self.group_memory_length -= 2

    def evaluate_pattern_efficiency(self, previous_step_result):
        pattern_group_index = self.previous_action["pattern_group_index"]
        pattern_index = self.previous_action["pattern_index"]
        pattern = self.groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
        pattern["reward"] += previous_step_result
        if pattern["reward"] <= -3:
            del self.groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
    
    def find_action(self, group, group_index):
        if len(self.current_memory) > group["memory_length"]:
            this_step_memory = self.current_memory[-group["memory_length"]:]
            memory_pattern, pattern_index = self.find_pattern(group["memory_patterns"], this_step_memory, group["memory_length"])
            if memory_pattern is not None:
                my_action_amount = 0
                for action in memory_pattern["opp_next_actions"]:
                    if (action["amount"] > my_action_amount or
                            (action["amount"] == my_action_amount and random.random() > 0.5)):
                        my_action_amount = action["amount"]
                        my_action = action["response"]
                return my_action, pattern_index
        return None, None

    def find_pattern(self, memory_patterns, memory, memory_length):
        for i in range(len(memory_patterns)):
            actions_matched = 0
            for j in range(memory_length):
                if memory_patterns[i]["actions"][j] == memory[j]:
                    actions_matched += 1
                else:
                    break
            if actions_matched == memory_length:
                return memory_patterns[i], i
        return None, None

    def update_current_memory(self, my_action):
        if len(self.current_memory) > self.current_memory_max_length:
            del self.current_memory[:2]
        self.current_memory.append(my_action)
    
    def update_memory_pattern(self, group, A):
        if len(self.current_memory) > group["memory_length"]:
            previous_step_memory = self.current_memory[-group["memory_length"] - 2 : -2]
            previous_pattern, pattern_index = self.find_pattern(group["memory_patterns"], previous_step_memory, group["memory_length"])
            if previous_pattern is None:
                previous_pattern = {
                    "actions": previous_step_memory.copy(),
                    "reward": 0,
                    "opp_next_actions": [
                        {"action": 0, "amount": 0, "response": 1},
                        {"action": 1, "amount": 0, "response": 2},
                        {"action": 2, "amount": 0, "response": 0}
                    ]
                }
                group["memory_patterns"].append(previous_pattern)
            for action in previous_pattern["opp_next_actions"]:
                if action["action"] == A:
                    action["amount"] += 1
    
    def next_action(self, T, A, S):
        my_action = None
    
        self.steps_to_random -= 1
        if self.steps_to_random <= 0:
            self.steps_to_random = random.randint(3, 5)
            my_action = secrets.randbelow(S)
            self.previous_action["action"] = my_action
            self.previous_action["action_from_pattern"] = False
            self.previous_action["pattern_group_index"] = None
            self.previous_action["pattern_index"] = None

        if T > 0:
            self.current_memory.append(A)
            previous_step_result = get_score(S, self.current_memory[-2], self.current_memory[-1])
            self.reward += previous_step_result
            if self.previous_action["action_from_pattern"]:
                self.evaluate_pattern_efficiency(previous_step_result)

        for i in range(len(self.groups_of_memory_patterns)):
            self.update_memory_pattern(self.groups_of_memory_patterns[i], A)
            if my_action is None:
                my_action, pattern_index = self.find_action(self.groups_of_memory_patterns[i], i)
                if my_action is not None:
                    self.previous_action["action"] = my_action
                    self.previous_action["action_from_pattern"] = True
                    self.previous_action["pattern_group_index"] = i
                    self.previous_action["pattern_index"] = pattern_index

        if my_action is None:
            my_action = secrets.randbelow(S)
            self.previous_action["action"] = my_action
            self.previous_action["action_from_pattern"] = False
            self.previous_action["pattern_group_index"] = None
            self.previous_action["pattern_index"] = None

        self.update_current_memory(my_action)
        return my_action
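
# Illustrative sketch, not used by any agent: the core of MemoryPatterns is a
# table that maps a window of recent (my, opponent) moves to counts of the
# opponent's next action; the agent then plays the move that beats the most
# frequent one, (action + 1) % S, as in the "response" fields above.
# Simplifications (assumptions of this sketch): a dict keyed by tuples, no
# pattern pruning, and ties broken by the lowest action instead of at random.
# `build_pattern_table` and `respond_to_pattern` are hypothetical names.
from collections import Counter, defaultdict


def build_pattern_table(history, window):
    """history: flat list [my0, opp0, my1, opp1, ...]; window must be even."""
    table = defaultdict(Counter)
    # Each pattern ends on an opponent move (odd index); the opponent's next
    # move sits two positions after the end of the pattern.
    for end in range(window, len(history) - 1, 2):
        pattern = tuple(history[end - window:end])
        table[pattern][history[end + 1]] += 1
    return table


def respond_to_pattern(table, history, window, S=3):
    """Counter-move for the current window (history of complete rounds), or None."""
    counts = table.get(tuple(history[-window:]))
    if not counts:
        return None
    action = max(range(S), key=lambda a: (counts[a], -a))
    return (action + 1) % S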

class Iocaine:
    K = 1
    class Stats:
        def __init__(self):
            self.sum = [[0, 0, 0]]
        def add(self, move, score):
            self.sum[-1][move] += score
        def advance(self):
            # Copy the totals: appending self.sum[-1] itself would alias every snapshot
            self.sum.append(self.sum[-1][:])
        def max(self, age, default, score):
            if age >= len(self.sum): diff = self.sum[-1]
            else: diff = [self.sum[-1][i] - self.sum[-1 - age][i] for i in range(3)]
            m = max(diff)
            if m > score: return diff.index(m), m
            return default, score

    class Predictor:
        def __init__(self):
            self.stats = Iocaine.Stats()
            self.lastguess = -1

        def addguess(self, lastmove, guess, S):
            if lastmove is not None:
                diff = (lastmove - self.prediction) % S
                self.stats.add((diff+1) % S, 1)
                self.stats.add((diff-1) % S, -1)
                self.stats.advance()
            self.prediction = guess

        def bestguess(self, age, best, S):
            bestdiff = self.stats.max(age, (best[0] - self.prediction) % S, best[1])
            return (bestdiff[0] + self.prediction) % S, bestdiff[1]

    def __init__(self):
        self.predictors = []
        self.ages = [1000, 100, 10, 5, 2, 1]
        self.predict_history = self.predictor((len(self.ages), 2, 3))
        self.predict_frequency = self.predictor((len(self.ages), 2))
        self.predict_fixed = self.predictor()
        self.predict_random = self.predictor()
        self.predict_meta = [Iocaine.Predictor() for a in range(len(self.ages))]
        self.stats = [Iocaine.Stats() for i in range(2)]
        self.histories = [[], [], []]

    def recall(self, age, hist):
        end, length = 0, 0
        for past in range(1, min(age + 1, len(hist) - 1)):
            if length >= len(hist) - past: break
            for i in range(-1 - length, 0):
                if hist[i - past] != hist[i]: break
            else:
                for length in range(length + 1, len(hist) - past):
                    if hist[-past - length - 1] != hist[-length - 1]: break
                else: length += 1
                end = len(hist) - past
        return end

    def predictor(self, dims=None):
        if dims: return [self.predictor(dims[1:]) for i in range(dims[0])]
        self.predictors.append(Iocaine.Predictor())
        return self.predictors[-1]

    def next_action(self, T, A, S):
        if T > 0:
            self.histories[1].append(A)
            self.histories[2].append((self.histories[0][-1], A))
            for watch in range(2):
                self.stats[watch].add(self.histories[watch][-1], 1)

        rand = random.randrange(3)
        self.predict_random.addguess(A, rand, S)
        self.predict_fixed.addguess(A, 0, S)

        for a, age in enumerate(self.ages):
            best = [self.recall(age, hist) for hist in self.histories]
            for mimic in range(2):
                for watch, when in enumerate(best):
                    if not when: move = rand
                    else: move = self.histories[mimic][when]
                    self.predict_history[a][mimic][watch].addguess(A, move, S)
                mostfreq, score = self.stats[mimic].max(age, rand, -1)
                self.predict_frequency[a][mimic].addguess(A, mostfreq, S)

        for meta, age in enumerate(self.ages):
            best = (-1, -1)
            for predictor in self.predictors:
                best = predictor.bestguess(age, best, S)
            self.predict_meta[meta].addguess(A, best[0], S)

        best = (-1, -1)
        for meta in range(len(self.ages)):
            best = self.predict_meta[meta].bestguess(len(self.histories[0]), best, S)
        self.histories[0].append(best[0])
        return best[0]
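
# Illustrative sketch, not used by any agent: Iocaine.Stats keeps one
# snapshot of cumulative per-move scores per round, so the score over the
# last `age` rounds is the difference of two snapshots (prefix sums). Each
# snapshot has to be an independent copy: if the same list object were
# appended every round, all snapshots would alias it and every windowed
# difference would be zero. `windowed_totals` is a hypothetical helper name.
def windowed_totals(per_round_scores, age):
    """Per-move score totals over the last `age` rounds (all rounds if fewer)."""
    prefix = [[0, 0, 0]]
    for round_scores in per_round_scores:
        snapshot = list(prefix[-1])  # independent copy of the running totals
        for move, score in enumerate(round_scores):
            snapshot[move] += score
        prefix.append(snapshot)
    if age >= len(prefix):
        return prefix[-1]
    return [prefix[-1][m] - prefix[-1 - age][m] for m in range(3)]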

class Greenberg:
    K = 0.5
    def __init__(self):
        self.opp_moves = []
        self.my_moves = []
        self.act = None
    
    @staticmethod
    def min_index(values):
        return min(enumerate(values), key=itemgetter(1))[0]

    @staticmethod
    def max_index(values):
        return max(enumerate(values), key=itemgetter(1))[0]

    def find_best_prediction(self, l, T):  # l = len
        bs = -1000
        bp = 0
        if self.p_random_score > bs:
            bs = self.p_random_score
            bp = self.p_random
        for i in range(3):
            for j in range(24):
                for k in range(4):
                    new_bs = self.p_full_score[T%50][j][k][i] - (self.p_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (self.p_full[j][k] + i) % 3
                for k in range(2):
                    new_bs = self.r_full_score[T%50][j][k][i] - (self.r_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (self.r_full[j][k] + i) % 3
            for j in range(2):
                for k in range(2):
                    new_bs = self.p_freq_score[T%50][j][k][i] - (self.p_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (self.p_freq[j][k] + i) % 3
                    new_bs = self.r_freq_score[T%50][j][k][i] - (self.r_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (self.r_freq[j][k] + i) % 3
        return bp

    def next_action(self, TT, A, S):
        if TT > 0:
            self.my_moves.append(self.act)
            self.opp_moves.append(A)

        wins_with = (1,2,0)      # wins_with[m]: the move that beats m
        best_without = (2,0,1)   # best_without[m]: the move that loses only to m

        lengths = (10, 20, 30, 40, 49, 0)
        self.p_random = secrets.randbelow(S)

        score_table =((0,-1,1),(1,0,-1),(-1,1,0))
        T = len(self.opp_moves)  #so T is number of trials completed

        if not self.my_moves:
            self.opp_history = [0]
            self.my_history = [0]
            self.gear = [[0] for _ in range(24)]
            self.p_random_score = 0
            self.p_full_score = [[[[0 for i in range(3)] for k in range(4)] for j in range(24)] for l in range(50)]
            self.r_full_score = [[[[0 for i in range(3)] for k in range(2)] for j in range(24)] for l in range(50)]
            self.p_freq_score = [[[[0 for i in range(3)] for k in range(2)] for j in range(2)] for l in range(50)]
            self.r_freq_score = [[[[0 for i in range(3)] for k in range(2)] for j in range(2)] for l in range(50)]
            self.s_len = [0] * 6

            self.p_full = [[0,0,0,0] for _ in range(24)]
            self.r_full = [[0,0] for _ in range(24)]
        else:
            self.my_history.append(self.my_moves[-1])
            self.opp_history.append(self.opp_moves[-1])
            self.p_random_score += score_table[self.p_random][self.opp_history[-1]]
            self.p_full_score[T%50] = [[[self.p_full_score[(T+49)%50][j][k][i] + score_table[(self.p_full[j][k] + i) % 3][self.opp_history[-1]] for i in range(3)] for k in range(4)] for j in range(24)]
            self.r_full_score[T%50] = [[[self.r_full_score[(T+49)%50][j][k][i] + score_table[(self.r_full[j][k] + i) % 3][self.opp_history[-1]] for i in range(3)] for k in range(2)] for j in range(24)]
            self.p_freq_score[T%50] = [[[self.p_freq_score[(T+49)%50][j][k][i] + score_table[(self.p_freq[j][k] + i) % 3][self.opp_history[-1]] for i in range(3)] for k in range(2)] for j in range(2)]
            self.r_freq_score[T%50] = [[[self.r_freq_score[(T+49)%50][j][k][i] + score_table[(self.r_freq[j][k] + i) % 3][self.opp_history[-1]] for i in range(3)] for k in range(2)] for j in range(2)]
            self.s_len = [s + score_table[p][self.opp_history[-1]] for s,p in zip(self.s_len,self.p_len)]

        if not self.my_moves:
            self.my_history_hash = [[0],[0],[0],[0]]
            self.opp_history_hash = [[0],[0],[0],[0]]
        else:
            self.my_history_hash[0].append(self.my_history[-1])
            self.opp_history_hash[0].append(self.opp_history[-1])
            for i in range(1,4):
                self.my_history_hash[i].append(self.my_history_hash[i-1][-1] * 3 + self.my_history[-1])
                self.opp_history_hash[i].append(self.opp_history_hash[i-1][-1] * 3 + self.opp_history[-1])

        for i in range(24):
            self.gear[i].append((3 + self.opp_history[-1] - self.p_full[i][2]) % 3)
            if T > 1:
                self.gear[i][T] += 3 * self.gear[i][T-1]
            self.gear[i][T] %= 9
        if not self.my_moves:
            self.freq = [[0,0,0],[0,0,0]]
            value = [[0,0,0],[0,0,0]]
        else:
            self.freq[0][self.my_history[-1]] += 1
            self.freq[1][self.opp_history[-1]] += 1
            value = [[(1000 * (self.freq[i][2] - self.freq[i][1])) / float(T),
                      (1000 * (self.freq[i][0] - self.freq[i][2])) / float(T),
                      (1000 * (self.freq[i][1] - self.freq[i][0])) / float(T)] for i in range(2)]
        self.p_freq = [[wins_with[Greenberg.max_index(self.freq[i])], wins_with[Greenberg.max_index(value[i])]] for i in range(2)]
        self.r_freq = [[best_without[Greenberg.min_index(self.freq[i])], best_without[Greenberg.min_index(value[i])]] for i in range(2)]

        f = [[[[0,0,0] for k in range(4)] for j in range(2)] for i in range(3)]
        t = [[[0,0,0,0] for j in range(2)] for i in range(3)]

        m_len = [[0 for _ in range(T)] for i in range(3)]

        for i in range(T-1,0,-1):
            m_len[0][i] = 4
            for j in range(4):
                if self.my_history_hash[j][i] != self.my_history_hash[j][T]:
                    m_len[0][i] = j
                    break
            for j in range(4):
                if self.opp_history_hash[j][i] != self.opp_history_hash[j][T]:
                    m_len[1][i] = j
                    break
            for j in range(4):
                if self.my_history_hash[j][i] != self.my_history_hash[j][T] or self.opp_history_hash[j][i] != self.opp_history_hash[j][T]:
                    m_len[2][i] = j
                    break

        for i in range(T-1,0,-1):
            for j in range(3):
                for k in range(m_len[j][i]):
                    f[j][0][k][self.my_history[i+1]] += 1
                    f[j][1][k][self.opp_history[i+1]] += 1
                    t[j][0][k] += 1
                    t[j][1][k] += 1

                    if t[j][0][k] == 1:
                        self.p_full[j*8 + 0*4 + k][0] = wins_with[self.my_history[i+1]]
                    if t[j][1][k] == 1:
                        self.p_full[j*8 + 1*4 + k][0] = wins_with[self.opp_history[i+1]]
                    if t[j][0][k] == 3:
                        self.p_full[j*8 + 0*4 + k][1] = wins_with[Greenberg.max_index(f[j][0][k])]
                        self.r_full[j*8 + 0*4 + k][0] = best_without[Greenberg.min_index(f[j][0][k])]
                    if t[j][1][k] == 3:
                        self.p_full[j*8 + 1*4 + k][1] = wins_with[Greenberg.max_index(f[j][1][k])]
                        self.r_full[j*8 + 1*4 + k][0] = best_without[Greenberg.min_index(f[j][1][k])]

        for j in range(3):
            for k in range(4):
                self.p_full[j*8 + 0*4 + k][2] = wins_with[Greenberg.max_index(f[j][0][k])]
                self.r_full[j*8 + 0*4 + k][1] = best_without[Greenberg.min_index(f[j][0][k])]

                self.p_full[j*8 + 1*4 + k][2] = wins_with[Greenberg.max_index(f[j][1][k])]
                self.r_full[j*8 + 1*4 + k][1] = best_without[Greenberg.min_index(f[j][1][k])]

        for j in range(24):
            gear_freq = [0] * 9
            for i in range(T-1,0,-1):
                if self.gear[j][i] == self.gear[j][T]:
                    gear_freq[self.gear[j][i+1]] += 1
            self.p_full[j][3] = (self.p_full[j][1] + Greenberg.max_index(gear_freq)) % 3

        self.p_len = [self.find_best_prediction(l, T) for l in lengths]
        self.act = self.p_len[Greenberg.max_index(self.s_len)]
        return self.act
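
# Illustrative sketch, not used by any agent: Greenberg's `value` table scores
# each move against a histogram of past moves. For S = 3, move m beats
# (m + 2) % 3 and loses to (m + 1) % 3, so its average payoff over T rounds is
# (freq[(m + 2) % 3] - freq[(m + 1) % 3]) / T; Greenberg additionally scales
# this by 1000. `frequency_payoffs` is a hypothetical name; freq must not be
# all zeros.
def frequency_payoffs(freq):
    T = sum(freq)
    return [(freq[(m + 2) % 3] - freq[(m + 1) % 3]) / T for m in range(3)]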

class TestingPleaseIgnore:
    K = 20
    def counter_prob(self, probs):
        weighted_list = []
        for h in self.rps:
            weighted = 0
            for p in probs.keys():
                points = self.score[h + p]
                prob = probs[p]
                weighted += points * prob
            weighted_list.append((h, weighted))
        return max(weighted_list, key=itemgetter(1))[0]

    def __init__(self):
        self.score = {'RR': 0, 'PP': 0, 'SS': 0,
                      'PR': 1, 'RS': 1, 'SP': 1,
                      'RP': -1, 'SR': -1, 'PS': -1}
        self.cscore = {'RR': 'r', 'PP': 'r', 'SS': 'r',
                       'PR': 'b', 'RS': 'b', 'SP': 'b',
                       'RP': 'c', 'SR': 'c', 'PS': 'c'}
        self.beat = {'P': 'S', 'S': 'R', 'R': 'P'}
        self.cede = {'P': 'R', 'S': 'P', 'R': 'S'}
        self.rps = ['R', 'P', 'S']
        self.wlt = {1: 0, -1: 1, 0: 2}

        self.played_probs = collections.defaultdict(lambda: 1)
        self.dna_probs = [
            collections.defaultdict(lambda: collections.defaultdict(lambda: 1)) for i in range(18)
        ]
        self.wlt_probs = [collections.defaultdict(lambda: 1) for i in range(9)]
        self.answers = [{'c': 1, 'b': 1, 'r': 1} for i in range(12)]
        self.patterndict = [collections.defaultdict(str) for i in range(6)]
        self.consec_strat_usage = [[0] * 6, [0] * 6,
                                   [0] * 6]  #consecutive strategy usage
        self.consec_strat_candy = [[], [], []]  #consecutive strategy candidates
        self.histories = ["", "", ""]
        self.dna = ["" for i in range(12)]
        self.sc = 0
        self.strats = [[] for i in range(3)]
        
    def next_action(self, T, A, S):
        if T == 0:
            self.B = random.choice(self.rps)
            return {'R': 0, 'P': 1, 'S': 2}[self.B]
        prev_sc = self.sc

        self.sc = self.score[self.B + 'RPS'[A]]
        for j in range(3):
            prev_strats = self.strats[j][:]
            for i, c in enumerate(self.consec_strat_candy[j]):
                if c == 'RPS'[A]:
                    self.consec_strat_usage[j][i] += 1
                else:
                    self.consec_strat_usage[j][i] = 0
            m = max(self.consec_strat_usage[j])
            self.strats[j] = [
                i for i, c in enumerate(self.consec_strat_candy[j])
                if self.consec_strat_usage[j][i] == m
            ]

            for s1 in prev_strats:
                for s2 in self.strats[j]:
                    self.wlt_probs[j * 3 + self.wlt[prev_sc]][chr(s1) + chr(s2)] += 1

            if self.dna[2 * j + 0] and self.dna[2 * j + 1]:
                self.answers[2 * j + 0][self.cscore['RPS'[A] + self.dna[2 * j + 0]]] += 1
                self.answers[2 * j + 1][self.cscore['RPS'[A] + self.dna[2 * j + 1]]] += 1
            if self.dna[2 * j + 6] and self.dna[2 * j + 7]:
                self.answers[2 * j + 6][self.cscore['RPS'[A] + self.dna[2 * j + 6]]] += 1
                self.answers[2 * j + 7][self.cscore['RPS'[A] + self.dna[2 * j + 7]]] += 1

            for length in range(min(10, len(self.histories[j])), 0, -2):
                pattern = self.patterndict[2 * j][self.histories[j][-length:]]
                if pattern:
                    for length2 in range(min(10, len(pattern)), 0, -2):
                        self.patterndict[2 * j + 1][pattern[-length2:]] += self.B + 'RPS'[A]
                self.patterndict[2 * j][self.histories[j][-length:]] += self.B + 'RPS'[A]
        self.played_probs['RPS'[A]] += 1
        self.dna_probs[0][self.dna[0]]['RPS'[A]] += 1
        self.dna_probs[1][self.dna[1]]['RPS'[A]] += 1
        self.dna_probs[2][self.dna[1] + self.dna[0]]['RPS'[A]] += 1
        self.dna_probs[9][self.dna[6]]['RPS'[A]] += 1
        # keyed by dna[7], matching how dna_probs[10] is read when building probs
        self.dna_probs[10][self.dna[7]]['RPS'[A]] += 1
        self.dna_probs[11][self.dna[7] + self.dna[6]]['RPS'[A]] += 1

        self.histories[0] += self.B + 'RPS'[A]
        self.histories[1] += 'RPS'[A]
        self.histories[2] += self.B

        self.dna = ["" for i in range(12)]
        for j in range(3):
            for length in range(min(10, len(self.histories[j])), 0, -2):
                pattern = self.patterndict[2 * j][self.histories[j][-length:]]
                if pattern != "":
                    self.dna[2 * j + 1] = pattern[-2]
                    self.dna[2 * j + 0] = pattern[-1]
                    for length2 in range(min(10, len(pattern)), 0, -2):
                        pattern2 = self.patterndict[2 * j + 1][pattern[-length2:]]
                        if pattern2 != "":
                            self.dna[2 * j + 7] = pattern2[-2]
                            self.dna[2 * j + 6] = pattern2[-1]
                            break
                    break

        probs = {}
        for hand in self.rps:
            probs[hand] = self.played_probs[hand]

        for j in range(3):
            if self.dna[j * 2] and self.dna[j * 2 + 1]:
                for hand in self.rps:
                    probs[hand] *= self.dna_probs[j*3+0][self.dna[j*2+0]][hand] * \
                                   self.dna_probs[j*3+1][self.dna[j*2+1]][hand] * \
                          self.dna_probs[j*3+2][self.dna[j*2+1]+self.dna[j*2+0]][hand]
                    probs[hand] *= self.answers[j*2+0][self.cscore[hand+self.dna[j*2+0]]] * \
                                   self.answers[j*2+1][self.cscore[hand+self.dna[j*2+1]]]
                self.consec_strat_candy[j] = [self.dna[j*2+0], self.beat[self.dna[j*2+0]], self.cede[self.dna[j*2+0]],\
                                         self.dna[j*2+1], self.beat[self.dna[j*2+1]], self.cede[self.dna[j*2+1]]]
                strats_for_hand = {'R': [], 'P': [], 'S': []}
                for i, c in enumerate(self.consec_strat_candy[j]):
                    strats_for_hand[c].append(i)
                pr = self.wlt_probs[self.wlt[self.sc] + 3 * j]
                for hand in self.rps:
                    for s1 in self.strats[j]:
                        for s2 in strats_for_hand[hand]:
                            probs[hand] *= pr[chr(s1) + chr(s2)]
            else:
                self.consec_strat_candy[j] = []
        for j in range(3):
            if self.dna[j * 2 + 6] and self.dna[j * 2 + 7]:
                for hand in self.rps:
                    probs[hand] *= self.dna_probs[j*3+9][self.dna[j*2+6]][hand] * \
                                   self.dna_probs[j*3+10][self.dna[j*2+7]][hand] * \
                          self.dna_probs[j*3+11][self.dna[j*2+7]+self.dna[j*2+6]][hand]
                    probs[hand] *= self.answers[j*2+6][self.cscore[hand+self.dna[j*2+6]]] * \
                                   self.answers[j*2+7][self.cscore[hand+self.dna[j*2+7]]]

        self.B = self.counter_prob(probs)
        return {'R': 0, 'P': 1, 'S': 2}[self.B]
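
# Illustrative sketch, not used by any agent: TestingPleaseIgnore.counter_prob
# is a best response. For each candidate hand it sums score * weight over the
# opponent's estimated move distribution and plays the maximiser. This is the
# same idea with integer actions and the modular scoring rule of get_score;
# `best_response` is a hypothetical name. Weights need not be normalised,
# because only the argmax matters.
def best_response(weights, S=3):
    def score(a1, a2):  # 1 if a1 beats a2, -1 if it loses, 0 on a tie (S = 3)
        return (S + a1 - a2 + 1) % S - 1

    expected = [sum(score(h, p) * w for p, w in enumerate(weights)) for h in range(S)]
    return max(range(S), key=expected.__getitem__)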
    
class IOU2:
    K = 20
    def __init__(self):
        self.num_predictor = 27
        self.len_rfind = [20]
        self.limit = [10,20,60]
        self.beat = { "R":"P" , "P":"S", "S":"R"}
        self.not_lose = { "R":"PPR" , "P":"SSP" , "S":"RRS" } #50-50 chance
        self.my_his   =""
        self.your_his =""
        self.both_his =""
        self.list_predictor = [""]*self.num_predictor
        self.length = 0
        self.temp1 = { "PP":"1" , "PR":"2" , "PS":"3",
                      "RP":"4" , "RR":"5", "RS":"6",
                      "SP":"7" , "SR":"8", "SS":"9"}
        self.temp2 = { "1":"PP","2":"PR","3":"PS",
                        "4":"RP","5":"RR","6":"RS",
                        "7":"SP","8":"SR","9":"SS"} 
        self.who_win = { "PP": 0, "PR":1 , "PS":-1,
                        "RP": -1,"RR":0, "RS":1,
                        "SP": 1, "SR":-1, "SS":0}
        self.score_predictor = [0]*self.num_predictor
        self.output = random.choice("RPS")
        self.predictors = [self.output]*self.num_predictor

    def next_action(self, T, A, S):
        to_char = ["R", "P", "S"]
        from_char = {"R": 0, "P": 1, "S": 2}
        if T == 0:
            return from_char[self.output]
        input = to_char[A]

        if len(self.list_predictor[0])<5:
            front =0
        else:
            front =1
        for i in range (self.num_predictor):
            if self.predictors[i]==input:
                result ="1"
            else:
                result ="0"
            self.list_predictor[i] = self.list_predictor[i][front:5]+result #only 5 rounds before
        #history matching 1-6
        self.my_his += self.output
        self.your_his += input
        self.both_his += self.temp1[input+self.output]
        self.length +=1
        for i in range(1):
            len_size = min(self.length,self.len_rfind[i])
            j=len_size
            #self.both_his
            while j>=1 and not self.both_his[self.length-j:self.length] in self.both_his[0:self.length-1]:
                j-=1
            if j>=1:
                k = self.both_his.rfind(self.both_his[self.length-j:self.length],0,self.length-1)
                self.predictors[0+6*i] = self.your_his[j+k]
                self.predictors[1+6*i] = self.beat[self.my_his[j+k]]
            else:
                self.predictors[0+6*i] = random.choice("RPS")
                self.predictors[1+6*i] = random.choice("RPS")
            j=len_size
            #self.your_his
            while j>=1 and not self.your_his[self.length-j:self.length] in self.your_his[0:self.length-1]:
                j-=1
            if j>=1:
                k = self.your_his.rfind(self.your_his[self.length-j:self.length],0,self.length-1)
                self.predictors[2+6*i] = self.your_his[j+k]
                self.predictors[3+6*i] = self.beat[self.my_his[j+k]]
            else:
                self.predictors[2+6*i] = random.choice("RPS")
                self.predictors[3+6*i] = random.choice("RPS")
            j=len_size
            #self.my_his
            while j>=1 and not self.my_his[self.length-j:self.length] in self.my_his[0:self.length-1]:
                j-=1
            if j>=1:
                k = self.my_his.rfind(self.my_his[self.length-j:self.length],0,self.length-1)
                self.predictors[4+6*i] = self.your_his[j+k]
                self.predictors[5+6*i] = self.beat[self.my_his[j+k]]
            else:
                self.predictors[4+6*i] = random.choice("RPS")
                self.predictors[5+6*i] = random.choice("RPS")

        for i in range(3):
            temp =""
            search = self.temp1[(self.output+input)] #last round
            for start in range(2, min(self.limit[i],self.length) ):
                if search == self.both_his[self.length-start]:
                    temp+=self.both_his[self.length-start+1]
            if(temp==""):
                self.predictors[6+i] = random.choice("RPS")
            else:
                collectR = {"P":0,"R":0,"S":0} #take win/lose from opponent into account
                for sdf in temp:
                    next_move = self.temp2[sdf]
                    if(self.who_win[next_move]==-1):
                        collectR[self.temp2[sdf][1]]+=3
                    elif(self.who_win[next_move]==0):
                        collectR[self.temp2[sdf][1]]+=1
                    elif(self.who_win[next_move]==1):
                        collectR[self.beat[self.temp2[sdf][0]]]+=1
                max1 = -1
                p1 =""
                for key in collectR:
                    if(collectR[key]>max1):
                        max1 = collectR[key]
                        p1 += key
                self.predictors[6+i] = random.choice(p1)
        for i in range(9,27):
            self.predictors[i] = self.beat[self.beat[self.predictors[i-9]]]
        len_his = len(self.list_predictor[0])
        for i in range(self.num_predictor):
            sum = 0
            for j in range(len_his):
                if self.list_predictor[i][j]=="1":
                    sum+=(j+1)*(j+1)
                else:
                    sum-=(j+1)*(j+1)
            self.score_predictor[i] = sum
        max_score = max(self.score_predictor)
        if max_score>0:
            predict = self.predictors[self.score_predictor.index(max_score)]
        else:
            predict = random.choice(self.your_his)
        self.output = random.choice(self.not_lose[predict])
        return from_char[self.output]
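
# Illustrative sketch, not used by any agent: IOU2 ranks its 27 predictors
# by the hit ('1') / miss ('0') flags of up to the last five rounds kept in
# list_predictor, weighting position j (oldest first) by (j + 1)**2 so recent
# rounds dominate. It then hedges with not_lose: against a predicted move it
# plays the counter two times out of three and the predicted move otherwise,
# so it never loses to that prediction. `recency_score` is a hypothetical name.
def recency_score(flags):
    """flags: string of '1' (hit) and '0' (miss), oldest first."""
    return sum((j + 1) ** 2 * (1 if f == "1" else -1) for j, f in enumerate(flags))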

class NumpyPatterns:
    K = 20
    def __init__(self):
        self.B = 0
        # Jitter - steps before next non-random move
        self.Jmax = 2
        self.J2 = (self.Jmax+1)**2
        self.J = self.Jmax - int(math.sqrt(secrets.randbelow(self.J2)))
        # Depth - number of previous steps taken into consideration
        self.Dmin = 1
        self.Dmax = 3
        self.DL = self.Dmax-self.Dmin+1
        self.HL = 3
        self.HText = ['Opp',  'Me', 'Score']
        self.Depth = np.arange(self.DL)
        self.Hash = np.zeros((self.HL, self.DL), dtype=int)
        self.G = 2
        self.R = 0.4
        self.RG = (1-self.R) * self.G
        self.Threshold = 0.4
        
    def split_idx(self, idx):
        d = idx % self.DL
        idx //= self.DL
        h2 = idx % self.HL
        idx //= self.HL
        h1 = idx % self.HL
        idx //= self.HL
        return d, h1, h2, idx
    
    def next_action(self, T, A, S):
        B, HL, DL, Dmin, Dmax = self.B, self.HL, self.DL, self.Dmin, self.Dmax
        SD = S**self.DL
        if T == 0:
            self.Map = np.zeros((S, SD**2, HL, HL, DL))
            self.SList = np.arange(S)[:,None,None,None]
            self.Predicts = np.full((HL, HL, DL), S, dtype=int)
            self.Attempts = np.zeros((HL, HL, DL), dtype=int)
            self.Scores = np.zeros((S, HL, HL, DL))
            self.OrgID = np.ogrid[:S, :HL, :HL, :DL]
            self.Hash2 = self.Hash[None,:] + SD*self.Hash[:,None]
        else:
            C = get_score(S, A, B) + 1
            ABC = np.array([A, B, C])[:,None]
            Depth, Hash, Hash2, Map, SList, OrgID, Predicts, Attempts, Scores = self.Depth, self.Hash, self.Hash2, self.Map, self.SList, self.OrgID, self.Predicts, self.Attempts, self.Scores
            # Update Moves Map by previous move and previous Hash
            Map *= 0.995
            Map[OrgID[0], Hash2, OrgID[1], OrgID[2], OrgID[3]] += (T > Depth + Dmin) * (SList == A)
            # Update Hash by previous move
            Hash[:] //= S
            Hash[:] += ABC[:HL] * S**Depth
            Hash2[:] = Hash[None,:] + SD*Hash[:,None]
            
            # Update prediction scores by previous move
            PB = Predicts < S
            Attempts[:] = Attempts + PB
            Scores[:] += PB * get_score(S, Predicts + SList, A)
            #print(T, Scores.T[0])
            # Predict each cell's next opponent move from the decayed Map counts
            PR = Map[OrgID[0], Hash2, OrgID[1], OrgID[2], OrgID[3]]
            Sum = np.sum(PR, axis=0)
            Predicts[:] = (np.max((Sum >= self.G) * (PR >= Sum * self.R + self.RG) * (SList + 1), axis=0) - 1) % (S + 1)

        self.B = np.random.choice(S)
        if self.J > 0:
            self.J -= 1
        else:
            sc = np.where(self.Predicts < S, self.Scores / (self.Attempts + 2), 0).ravel()
            idx = np.argmax(sc)
            if sc[idx] > self.Threshold:
                Raw = self.Predicts.ravel()
                L = len(Raw)
                self.B = (Raw[idx % L] + idx // L) % S
                self.J = self.Jmax - int(math.sqrt(secrets.randbelow(self.J2)))
                #parts = self.split_idx(idx)
                #print(T, f'{parts[0]+self.Dmin}: {self.HText[parts[1]]}-{self.HText[parts[2]]}+{parts[3]}', self.Scores[:, parts[1], parts[2], parts[0]], self.B)
        return self.B

black_belt = {
    "DecisionTree": DecisionTree,
    "Xgboost": Xgboost,
    "PatternAggressive": PatternAggressive,
    "MemoryPatterns": MemoryPatterns,
    "Iocaine": Iocaine,
    "Greenberg": Greenberg,
    "TestingPleaseIgnore": TestingPleaseIgnore,
    "IOU2": IOU2,
    "NumpyPatterns": NumpyPatterns,
}
print(list(black_belt.keys()))
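
Both `NumpyPatterns` above and the submission below store move history as rolling base-`S` integers. `Hash //= S` drops the oldest digit and `Hash += move * S**Depth` pushes the newest move into the top digit, so column `d` always holds the last `d + 1` moves. `Hash2` then packs two windows of the same depth into one key, `low + SD * high` with `SD = S**DL`, which cannot collide because every window is smaller than `SD`. The sketch below reproduces only that bookkeeping for a single history; the helper names `push_move`, `decode` and `pair_key` are hypothetical and not part of the Dojo.

```python
import numpy as np

def push_move(windows: np.ndarray, move: int, S: int) -> None:
    """Update rolling base-S windows in place with the newest move.

    windows[d] encodes the last d + 1 moves, newest move in digit S**d.
    """
    depth = np.arange(len(windows))
    windows //= S               # the oldest move falls out of digit 0
    windows += move * S**depth  # the newest move enters the top digit

def decode(h: int, length: int, S: int) -> list:
    """Return the `length` moves stored in h, oldest first."""
    return [int(h // S**i % S) for i in range(length)]

def pair_key(high: int, low: int, S: int, DL: int) -> int:
    """Pack two windows into one key, like Hash2 = low + SD * high."""
    SD = S**DL  # every window is < S**DL, so the packing is unique
    return low + SD * high

S, DL = 3, 3
windows = np.zeros(DL, dtype=int)
for move in [0, 1, 2, 1]:
    push_move(windows, move, S)
print(decode(windows[2], 3, S))  # [1, 2, 1], the last three moves
```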

# Submission

In [None]:
%%writefile submission.py

import secrets
import math
import numpy as np

def get_score(S, A1, A2):
    return (S + A1 - A2 + 1) % S - 1

class Submission:
    K = 10
    def __init__(self, verbose=False):
        self.B = 0
        # Jitter - steps before next non-random move
        self.Jmax = 2
        self.J2 = (self.Jmax+1)**2
        self.J = int(math.sqrt(secrets.randbelow(self.J2)))
        # Depth - number of previous steps taken into consideration
        self.Dmin = 2
        self.Dmax = 6
        self.DL = self.Dmax-self.Dmin+1
        self.HL = 2
        self.HText = ['Opp', 'Me', 'Score']
        self.Depth = np.arange(self.DL)
        self.Hash = np.zeros((self.HL, self.DL), dtype=int)
        self.G = 2
        self.R = 0.4
        self.RG = (1-self.R) * self.G
        self.Threshold = 0.2
        self.verbose = verbose
        
    def split_idx(self, idx):
        d = idx % self.DL
        idx //= self.DL
        h2 = idx % self.HL
        idx //= self.HL
        h1 = idx % self.HL
        idx //= self.HL
        return d, h1, h2, idx
    
    def next_action(self, T, A, S):
        B, HL, DL, Dmin, Dmax = self.B, self.HL, self.DL, self.Dmin, self.Dmax
        SD = S**self.DL
        PR = None
        if T == 0:
            self.Map = np.zeros((S, SD**2, HL, HL, DL))
            self.SList = np.arange(S)[:,None,None,None]
            self.Predicts = np.full((HL, HL, DL), S, dtype=int)
            self.Attempts = np.zeros((HL, HL, DL), dtype=int)
            self.Scores = np.zeros((S, HL, HL, DL))
            self.OrgID = np.ogrid[:S, :HL, :HL, :DL]
            self.Hash2 = self.Hash[None,:] + SD*self.Hash[:,None]
        else:
            C = get_score(S, A, B) + 1
            if self.verbose: print(T, f'{B}-{A} {1-C}')
            ABC = np.array([A, B, C])[:,None]
            Depth, Hash, Hash2, Map, SList, OrgID, Predicts, Attempts, Scores = self.Depth, self.Hash, self.Hash2, self.Map, self.SList, self.OrgID, self.Predicts, self.Attempts, self.Scores
            # Update Moves Map by previous move and previous Hash
            Map *= 0.995
            Map[OrgID[0], Hash2, OrgID[1], OrgID[2], OrgID[3]] += (T > Depth + Dmin) * (SList == A)
            # Update Hash by previous move
            Hash[:] //= S
            Hash[:] += ABC[:HL] * S**Depth
            Hash2[:] = Hash[None,:] + SD*Hash[:,None]
            
            # Update prediction scores by previous move
            PB = Predicts < S
            Attempts[:] = Attempts + PB
            Scores[:] += PB * get_score(S, Predicts + SList, A)
            #print(T, Scores.T[0])
            # Predict the opponent's next move from the updated Map
            PR = Map[OrgID[0], Hash2, OrgID[1], OrgID[2], OrgID[3]]
            Sum = np.sum(PR, axis=0)
            Predicts[:] = (np.max((Sum >= self.G) * (PR >= Sum * self.R + self.RG) * (SList + 1), axis=0) - 1) % (S + 1)

        self.B = np.random.choice(S)
        if self.J > 0:
            self.J -= 1
        else:
            sc = np.where(self.Predicts < S, self.Scores / (self.Attempts + 5), 0).ravel()
            idx = np.argmax(sc)
            if sc[idx] > self.Threshold:
                self.Scores.ravel()[idx] -= 1/3
                Raw = self.Predicts.ravel()
                L = len(Raw)
                p = None
                s = 0
                if PR is not None:
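                    # Decayed opponent-move counts of the selected cell, one per sign
                    p = PR.ravel().reshape((S, -1))[:, idx % L]
                    s = np.sum(p)
                # NOTE: np.random.choice(S) is at most S - 1, so this condition is always
                # False and the weighted branch below is effectively disabled
                if s > 0 and np.random.choice(S) > S: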
                    p /= s
                    self.B = (np.random.choice(S, p=p) + idx // L) % S
                    parts = self.split_idx(idx)
                    if self.verbose: print(T, f'Weighted {parts[0]+self.Dmin}: {self.HText[parts[1]]}-{self.HText[parts[2]]}+{parts[3]}', p, self.B)
                else:
                    self.B = (Raw[idx % L] + idx // L) % S
                    parts = self.split_idx(idx)
                    if self.verbose: print(T, f'Direct {parts[0]+self.Dmin}: {self.HText[parts[1]]}-{self.HText[parts[2]]}+{parts[3]}', self.Scores.ravel()[idx], self.B)
                self.J = int(math.sqrt(secrets.randbelow(self.J2)))
        return self.B

submission = Submission(verbose=True)

def agent(observation, configuration):
    T = observation.step
    A = observation.lastOpponentAction if T > 0 else None
    S = configuration.signs
    try:
        return int(submission.next_action(T, A, S))
    except Exception as e:
        print(T, 'Failed', e)
        return int(np.random.choice(S))
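
`get_score(S, A1, A2)` folds the whole payoff table into one modular expression: it returns `1` when `A1` beats `A2`, `-1` when it loses and `0` on a tie, assuming the Kaggle encoding 0 = rock, 1 = paper, 2 = scissors, in which sign `x + 1` beats sign `x`. That is why the agents turn a prediction into a move with `(prediction + shift) % S`: shift 1 plays the counter to the predicted move, while the other shifts are scored as well, so the agent can switch to them if the opponent keeps second-guessing that counter. The check below simply enumerates all pairs.

```python
def get_score(S, A1, A2):
    # Same expression as in submission.py: +1 win, 0 tie, -1 loss for A1
    return (S + A1 - A2 + 1) % S - 1

NAMES = ["rock", "paper", "scissors"]  # Kaggle RPS encoding, S = 3
S = 3
for a1 in range(S):
    for a2 in range(S):
        print(f"{NAMES[a1]:>8} vs {NAMES[a2]:<8} -> {get_score(S, a1, a2):+d}")

# Shift 1 turns a predicted opponent move into the move that beats it
predicted = 0                  # e.g. the opponent is expected to play rock
counter = (predicted + 1) % S  # paper
print(NAMES[counter], get_score(S, counter, predicted))  # paper 1
```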

In [None]:
%run submission.py
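
After the Map update, each (history, depth) cell turns its decayed opponent-move counts `PR` into a prediction with a single vectorised line. For one cell with counts `c` and total `n = sum(c)`, a move `m` qualifies only if `n >= G` and `c[m] >= R*n + (1 - R)*G` (the second term is `RG` in the code); among qualifying moves the highest index wins, and the value `S` means "no prediction". The scalar sketch below, built around a hypothetical helper `predict_cell`, uses the submission's defaults `G = 2` and `R = 0.4`; the notebook code does the same for all cells at once.

```python
import numpy as np

def predict_cell(counts, G=2.0, R=0.4):
    """Scalar version of the Predicts update for one (history, depth) cell.

    Returns the predicted opponent move, or S (= len(counts)) when no move
    is frequent enough. Mirrors:
    (max((Sum >= G) * (PR >= Sum*R + RG) * (SList + 1)) - 1) % (S + 1)
    """
    counts = np.asarray(counts, dtype=float)
    S = len(counts)
    total = counts.sum()
    qualifies = (total >= G) & (counts >= R * total + (1 - R) * G)
    best = np.max(qualifies * (np.arange(S) + 1)) - 1  # -1 when nothing qualifies
    return int(best % (S + 1))                          # -1 maps to S, "no prediction"

print(predict_cell([3, 0, 0]))  # 0: threshold 0.4*3 + 1.2 = 2.4 <= 3
print(predict_cell([1, 1, 0]))  # 3: threshold 2.0, no single move reaches it
print(predict_cell([1, 0, 0]))  # 3: total below G, too little evidence
```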

# Evaluation

This is an example of how you can evaluate your own model using the baseline agents in this Dojo. In this example, we evaluate the `Submission` agent defined above against every agent in the Dojo.
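
Any class you want to put through the Dojo follows the contract described at the top of the notebook, with one extra requirement: `Environment.evaluate` multiplies `N` by `min(Agent1.K, Agent2.K)`, so your class also needs a class attribute `K` (the submission uses `K = 10`). The two agents below are hypothetical toy examples, not Dojo members: `AlwaysRock` is the classic baseline, and `CounterLast` plays the move that beats the opponent's previous one.

```python
import random

class AlwaysRock:
    K = 10  # performance coefficient read by Environment.evaluate

    def next_action(self, T, A, S):
        # T: step index, A: opponent's last action (None on step 0), S: number of signs
        return 0

class CounterLast:
    K = 10

    def __init__(self):
        # A fresh instance is created for every match, so per-match state lives here
        self.rng = random.Random()

    def next_action(self, T, A, S):
        if A is None:
            return self.rng.randrange(S)  # no history yet on the first step
        return (A + 1) % S                # sign A + 1 beats sign A

# Tiny local match, scored with the same payoff expression as get_score
rock, counter = AlwaysRock(), CounterLast()
last_rock, last_counter, score = None, None, 0
for T in range(10):
    a, b = rock.next_action(T, last_counter, 3), counter.next_action(T, last_rock, 3)
    score += (3 + a - b + 1) % 3 - 1
    last_rock, last_counter = a, b
print(score)  # AlwaysRock's total, negative because CounterLast plays paper from step 1
```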

The evaluation code uses **multiprocessing** (Python's built-in `multiprocessing` module), as in the [RPS - Multiprocessing Agent Comparisons](https://www.kaggle.com/booooooow/rps-multiprocessing-agent-comparisons) notebook.
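
The pattern is a plain fan-out: one task per opponent, with results consumed in completion order via `imap_unordered`. Because results can arrive in any order, each task returns the opponent it played, and `evaluate_dojo` uses that to pick the table row. The sketch below uses `multiprocessing.pool.ThreadPool`, which exposes the same `imap_unordered` interface as `Pool`, so it runs anywhere without pickling or `__main__` guards; `fake_match` is a hypothetical stand-in for `Environment.evaluate`. For CPU-bound matches like the real ones, `Pool` is what actually gives a speed-up, since threads share the GIL.

```python
from multiprocessing.pool import ThreadPool
import random

def fake_match(args):
    """Hypothetical stand-in for Environment.evaluate: returns (opponent, win %)."""
    opponent, games = args
    rng = random.Random(opponent)  # deterministic per opponent
    wins = sum(rng.random() < 0.5 for _ in range(games))
    return opponent, 100 * wins / games

settings = [(name, 20) for name in ["rock", "paper", "scissors", "copy"]]
table = {}
with ThreadPool(4) as pool:
    # Results arrive in completion order, so key them by the returned opponent
    for opponent, win_pct in pool.imap_unordered(fake_match, settings):
        table[opponent] = win_pct
print(table)
```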

### Reasoning:

`kaggle_environments` is too generic: it allows different games with different rules to be executed in a similar fashion. It performs execution timeout checks and other validations, which makes it really slow when all you want is to test your agent.

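So I created a minimalistic emulator that implements only the basic RPS rules, with performance as the only goal. Each match lasts 1000 steps with 3 signs, and it counts as a win or a loss only when the final score difference is at least 20; anything closer is a tie.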

In [None]:
import time
from multiprocessing import Pool

import pandas as pd
from tqdm.auto import tqdm

class Environment:
    @staticmethod
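    def run(agents):
        # Play one 1000-step match between two agent instances with 3 signs
        actions = None, None
        reward = 0
        for step in range(1000):
            # Each agent sees the opponent's previous action (None on step 0)
            actions = agents[0].next_action(step, actions[1], 3), agents[1].next_action(step, actions[0], 3)
            # +1 if agent 0 wins the round, -1 if it loses, 0 on a tie
            reward += (3 + actions[0] - actions[1] + 1) % 3 - 1
        # A match is decided only by a margin of at least 20; otherwise it is a tie
        if reward >= 20:
            return 1
        if reward <= -20:
            return -1
        return 0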
    
    @staticmethod
    def evaluate(args):
        Agent1, Agent2, N = args
        start = time.time()
        won, lost, tie = 0, 0, 0
        N = int(N * min(Agent1.K,Agent2.K))
        for it in range(N):
            score = Environment.run([Agent1(), Agent2()])
            if score > 0: won += 1
            elif score < 0: lost += 1
            else: tie += 1
        elapsed = time.time() - start
        if N < 1:
            return Agent1, Agent2, 0, 0, 0, 0, elapsed
        return Agent1, Agent2, N, 100*won/N, 100*lost/N, 100*tie/N, elapsed

    @staticmethod
    def evaluate_dojo(agent, all_agents=None, N=1):
        if all_agents is None:
            all_agents = {
                **white_belt,
                **blue_belt,
                **brown_belt,
                **black_belt,
            }
        all_agent_names = list(all_agents.keys())
        L = len(all_agents)

        df = pd.DataFrame(columns=['games', '% wins', '% losses', '% ties', 'duration'], index=all_agent_names)

        settings = [(agent, all_agents[all_agent_names[a]], N) for a in range(L)]

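        # imap_unordered yields results as matches finish, so each result carries the
        # Agent2 class that identifies its row. For debugging, replace the Pool block
        # with the commented-out sequential loop below it.
        with Pool() as pool:
            for Agent1, Agent2, games, won, lost, tie, elapsed in tqdm(pool.imap_unordered(Environment.evaluate, settings), total=len(settings)):
                df.loc[Agent2.__name__] = [games, won, lost, tie, elapsed]
        # for x in tqdm(settings, total=len(settings)):
        #     Agent1, Agent2, games, won, lost, tie, elapsed = Environment.evaluate(x)
        #     df.loc[Agent2.__name__] = [games, won, lost, tie, elapsed]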

        return df

In [None]:
Environment.evaluate_dojo(Submission, N=20)

# Test Saved Submission

In [None]:
from kaggle_environments import make
from kaggle_environments.envs.rps.agents import agents
import numpy as np
import pandas as pd
from tqdm.auto import tqdm

all_agents = list(agents.values())
all_agent_names = list(k[:8] for k in agents.keys())
env = make("rps", configuration={"episodeSteps": 1000})
L = len(all_agents)
results = np.zeros((L,), dtype=int)
N = 10
for it in tqdm(range(N), total=N):
    for a in range(L):
        env.run(["submission.py", all_agents[a]])
        rewards = env.toJSON()['rewards']
        results[a] += -1 if rewards[0] is None else int(rewards[0] >= 20) - int(rewards[0] <= -20)
        # env.render(mode="ipython", width=800, height=800)
# results holds wins minus losses per opponent (errors count as losses), i.e. a net win rate
pd.DataFrame(100*results/N, index=all_agent_names, columns=["net % wins"])
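
The cell above folds each episode into a single net score (win +1, loss or agent error -1, tie 0). If you prefer the same win/loss/tie breakdown that `evaluate_dojo` reports, you can classify each episode separately. `episode_outcome` and `summarize` are hypothetical helpers; like the cell above, they assume `rewards[0]` is your agent's final reward, treat `None` (scored as -1 above) as a loss, and use 20 as the decision margin.

```python
from collections import Counter

def episode_outcome(reward, margin=20):
    """Classify one episode from our agent's final reward, as in the cell above."""
    if reward is None:  # scored as -1 in the cell above, so count it as a loss
        return "loss"
    if reward >= margin:
        return "win"
    if reward <= -margin:
        return "loss"
    return "tie"

def summarize(rewards):
    """Return % wins, % losses and % ties for a list of episode rewards."""
    counts = Counter(episode_outcome(r) for r in rewards)
    n = len(rewards)
    return {k: 100 * counts[k] / n for k in ("win", "loss", "tie")}

# Inside the loop above you would collect rewards[0] per opponent, e.g.
print(summarize([35, -3, None, 20, -41]))  # {'win': 40.0, 'loss': 40.0, 'tie': 20.0}
```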