# Intro
In this notebook, I explore how to make a Q-Learning bot perform well. I was surprised that reinforcement learning methods that are common in other tasks (Q-learning, Deep Q-learning, etc.) weren't working very well in the RPS competition, as I mentioned in this discussion a while back: https://www.kaggle.com/c/rock-paper-scissors/discussion/201837

After a lot of effort, I think I got it working.
I developed a crude way of training the Q-Learning model, and after training the agent reaches an average score of ~850-900, which is not bad. It's probably a good candidate for an ensemble, but I wasn't able to test that idea extensively.
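
The training code itself is not part of this section. As a reference point only, here is a minimal sketch of a standard tabular Q-learning update for RPS. It assumes the state is the previous round's (my move, opponent move) pair and the reward is +1/0/-1. The names `state_index`, `reward`, `q_update`, `ALPHA` and `GAMMA` are hypothetical and not taken from the author's code.

```python
import numpy as np

ALPHA = 0.1  # learning rate (hypothetical value)
GAMMA = 0.9  # discount factor (hypothetical value)

def state_index(my_move: int, opp_move: int) -> int:
    """Encode the previous round (my move, opponent's move) as one of 9 states."""
    return 3 * my_move + opp_move

def reward(my_move: int, opp_move: int) -> int:
    """+1 win, 0 tie, -1 loss, with 0 = rock, 1 = paper, 2 = scissors."""
    return [0, 1, -1][(my_move - opp_move) % 3]

def q_update(q: np.ndarray, state: int, action: int, r: float, next_state: int) -> None:
    """In-place update: Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = r + GAMMA * q[next_state].max()
    q[state, action] += ALPHA * (target - q[state, action])

q = np.zeros((9, 3))
s = state_index(0, 0)  # last round: we played rock, the opponent played rock
a, opp = 1, 0          # this round: we play paper, the opponent plays rock again
q_update(q, s, a, reward(a, opp), state_index(a, opp))
# Q(s, paper) moved from 0 to 0.1 (= ALPHA * reward); the other entries are unchanged.
```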

I noticed that this bot is extremely flexible and really does learn (if you train it right). After Geometry Bot was released, my old Q-Learning bot (and others) were in a bit of a mess. I simply added Geometry Bot to the agents dict, and now the Q-Learning bot is roughly neutral (maybe slightly positive) against it.

I also highly recommend using this agent in an ensemble, as relatively static Q-Tables can, in general, be exploited.
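
To make the exploitability point concrete, here is a toy sketch that is not from the notebook. A greedy agent reading a frozen 9×3 Q-table keyed on the previous round is a deterministic function of public history. An opponent that merely counts which move the table-agent played in each state therefore wins nearly every round after a few probes. The state encoding, the random "trained" table and the exploiter are assumptions made for illustration.

```python
import numpy as np

def outcome(mine: int, theirs: int) -> int:
    """+1 win, 0 tie, -1 loss for `mine` against `theirs`."""
    return [0, 1, -1][(mine - theirs) % 3]

def exploit_static_table(q_table: np.ndarray, rounds: int = 1000, seed: int = 0) -> int:
    """Play a frozen greedy Q-table against a counting exploiter; return the exploiter's score."""
    rng = np.random.default_rng(seed)
    tally = np.zeros((9, 3))  # exploiter's counts: state -> how often the victim played each move
    state = 0                 # state = 3 * victim_last + exploiter_last, as the victim sees it
    score = 0
    for _ in range(rounds):
        victim = int(np.argmax(q_table[state]))   # frozen greedy policy, no exploration
        if tally[state].sum() == 0:
            exploiter = int(rng.integers(3))      # unseen state: guess
        else:
            predicted = int(np.argmax(tally[state]))
            exploiter = (predicted + 1) % 3       # play the move that beats the prediction
        tally[state, victim] += 1
        score += outcome(exploiter, victim)
        state = 3 * victim + exploiter
    return score

q_table = np.random.default_rng(42).normal(size=(9, 3))  # an arbitrary "trained" table
print(exploit_static_table(q_table))  # close to +1000: the exploiter wins almost every round
```

Only the first visit to each of the 9 states can be lost, so the exploiter's score is at least 982 over 1000 rounds. Mixing the Q-learner with other agents breaks this determinism.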

I'll add comments explaining each part of the code.

In [None]:
from kaggle_environments import make

# Create the RPS environment; each episode lasts 1000 rounds.
env = make("rps", configuration={"episodeSteps": 1000})
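
The `env` above runs real episodes. For quick offline sanity checks of an agent function without `kaggle_environments`, a tiny stand-in match loop can be handy. The sketch below is an assumption-laden mimic: `AttrDict`, `play_match`, `always_rock` and `beat_last` are hypothetical helpers, and only `step`, `lastOpponentAction`, `signs` and `episodeSteps` are provided, with both attribute and key access, since some agents below use `obs.step` and others `obs["step"]`.

```python
import random

class AttrDict(dict):
    """Dict with attribute access, loosely mimicking the observation/configuration
    objects agents receive (both obs.step and obs["step"] work). Hypothetical helper."""
    __getattr__ = dict.__getitem__

def outcome(mine, theirs):
    """+1 win, 0 tie, -1 loss for `mine` against `theirs`."""
    return [0, 1, -1][(mine - theirs) % 3]

def play_match(agent_a, agent_b, steps=1000):
    """Play `steps` rounds between two (observation, configuration) agents; return agent_a's total."""
    config = AttrDict(signs=3, episodeSteps=steps)
    last_a = last_b = None
    total = 0
    for step in range(steps):
        obs_a, obs_b = AttrDict(step=step), AttrDict(step=step)
        if step > 0:  # only provide the opponent's last move after the first round
            obs_a["lastOpponentAction"] = last_b
            obs_b["lastOpponentAction"] = last_a
        last_a, last_b = agent_a(obs_a, config), agent_b(obs_b, config)
        total += outcome(last_a, last_b)
    return total

def always_rock(observation, configuration):
    return 0

def beat_last(observation, configuration):
    if observation.step == 0:
        return random.randrange(configuration.signs)
    return (observation.lastOpponentAction + 1) % configuration.signs

print(play_match(beat_last, always_rock, steps=100))  # 98 to 100, depending on the random first move
```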

# Enemy Agents

The crux of the training here is that the Q-Learning agent learns from experience, and a strong agent needs good experience. For this, I picked pretty much all of the top public agents shared on Kaggle. I don't have links to each opponent, but most are compiled in https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison
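
The training procedure itself is not shown in this section, so the following is only a rough sketch of the "learn from experience against a pool of opponents" idea. It runs epsilon-greedy tabular Q-learning that rotates through a list of opponent policies, one per episode. The toy opponents `cycler` and `copier`, the 9-state encoding and all hyperparameters are hypothetical. In the notebook's setting the pool would instead be the public agents defined below.

```python
import random
import numpy as np

def outcome(mine, theirs):
    """+1 win, 0 tie, -1 loss for `mine` against `theirs`."""
    return [0, 1, -1][(mine - theirs) % 3]

# Two toy opponents (hypothetical stand-ins for the real agent pool).
def cycler(history):
    """Plays rock, paper, scissors in turn."""
    return len(history) % 3

def copier(history):
    """Repeats our previous move."""
    return history[-1][0] if history else 0

def train(opponents, episodes=20, steps=300, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning, rotating through a pool of opponents."""
    rng = random.Random(seed)
    q = np.zeros((9, 3))                           # state = 3 * my_last + opp_last
    for ep in range(episodes):
        opponent = opponents[ep % len(opponents)]  # one opponent per episode
        history, state = [], 0
        for _ in range(steps):
            if rng.random() < eps:
                action = rng.randrange(3)          # explore
            else:
                action = int(np.argmax(q[state]))  # exploit
            opp = opponent(history)
            next_state = 3 * action + opp
            target = outcome(action, opp) + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            history.append((action, opp))
            state = next_state
    return q

q = train([cycler, copier])
```

A single table cannot be optimal against every opponent at once. Here `cycler` and `copier` generally prefer different moves in the same state, so the quality and variety of the opponent pool directly shape what the table ends up encoding.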

So now let's define each agent in its own .py file.
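
Almost every agent below uses the same move arithmetic, with 0 = rock, 1 = paper, 2 = scissors. `(x + 1) % 3` is the move that beats `x` (the `+ 1` in `reactionary`). `(x + 2) % 3` is the move that `x` beats (the `+ 2` in `counter_reactionary` and `anti_popular_beater`). A small sketch of those helpers follows; the names `beats`, `beaten_by` and `outcome` are hypothetical.

```python
ROCK, PAPER, SCISSORS = 0, 1, 2

def beats(move: int) -> int:
    """The move that beats `move`: (move + 1) % 3."""
    return (move + 1) % 3

def beaten_by(move: int) -> int:
    """The move that `move` beats: (move + 2) % 3."""
    return (move + 2) % 3

def outcome(mine: int, theirs: int) -> int:
    """+1 if `mine` wins, 0 on a tie, -1 if it loses."""
    return [0, 1, -1][(mine - theirs) % 3]

assert beats(ROCK) == PAPER and beats(PAPER) == SCISSORS and beats(SCISSORS) == ROCK
assert all(outcome(beats(m), m) == 1 and outcome(beaten_by(m), m) == -1 for m in range(3))
```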

In [None]:
%%writefile copy_opponent.py

import random
from kaggle_environments.envs.rps.utils import get_score

def copy_opponent(observation, configuration):
    if observation.step > 0:
        return observation.lastOpponentAction
    else:
        return random.randrange(0, configuration.signs)

In [None]:
%%writefile reactionary.py

import random
from kaggle_environments.envs.rps.utils import get_score

last_react_action = None


def reactionary(observation, configuration):
    global last_react_action
    if observation.step == 0:
        last_react_action = random.randrange(0, configuration.signs)
    elif get_score(last_react_action, observation.lastOpponentAction) <= 1:
        last_react_action = (observation.lastOpponentAction + 1) % configuration.signs

    return last_react_action

In [None]:
%%writefile counter_reactionary.py

import random
from kaggle_environments.envs.rps.utils import get_score

last_counter_action = None


def counter_reactionary(observation, configuration):
    global last_counter_action
    if observation.step == 0:
        last_counter_action = random.randrange(0, configuration.signs)
    elif get_score(last_counter_action, observation.lastOpponentAction) == 1:
        last_counter_action = (last_counter_action + 2) % configuration.signs
    else:
        last_counter_action = (observation.lastOpponentAction + 1) % configuration.signs

    return last_counter_action

In [None]:
%%writefile markov_agent.py

import numpy as np
import collections

def markov_agent(observation, configuration):
    k = 2
    global table, action_seq
    if observation.step % 250 == 0: # refresh table every 250 steps
        action_seq, table = [], collections.defaultdict(lambda: [1, 1, 1])    
    if len(action_seq) <= 2 * k + 1:
        action = int(np.random.randint(3))
        if observation.step > 0:
            action_seq.extend([observation.lastOpponentAction, action])
        else:
            action_seq.append(action)
        return action
    # update table
    key = ''.join([str(a) for a in action_seq[:-1]])
    table[key][observation.lastOpponentAction] += 1
    # update action seq
    action_seq[:-2] = action_seq[2:]
    action_seq[-2] = observation.lastOpponentAction
    # predict opponent next move
    key = ''.join([str(a) for a in action_seq[:-1]])
    if observation.step < 500:
        next_opponent_action_pred = np.argmax(table[key])
    else:
        scores = np.array(table[key])
        next_opponent_action_pred = np.random.choice(3, p=scores/scores.sum()) # add stochasticity for second part of the game
    # make an action
    action = (next_opponent_action_pred + 1) % 3
    # if high probability to lose -> let's surprise our opponent with sudden change of our strategy
    if observation.step > 900:
        action = next_opponent_action_pred
    action_seq[-1] = action
    return int(action)
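
The core idea of `markov_agent`, without its table refreshes and end-game tweaks, is to count which opponent move followed each recent history, predict the most frequent continuation, and play the move that beats it. The sketch below is a simplified version that conditions only on the opponent's last `k` moves, whereas the original keys on interleaved moves of both players. The class name `MarkovPredictor` is hypothetical.

```python
from collections import defaultdict

class MarkovPredictor:
    """Order-k frequency model over the opponent's moves (simplified, hypothetical)."""

    def __init__(self, k=2):
        self.k = k
        self.counts = defaultdict(lambda: [1, 1, 1])  # add-one prior, like the [1, 1, 1] table above
        self.history = []

    def observe(self, opp_move):
        """Record which move followed the previous k opponent moves."""
        if len(self.history) >= self.k:
            self.counts[tuple(self.history[-self.k:])][opp_move] += 1
        self.history.append(opp_move)

    def act(self):
        """Beat the most frequent continuation of the current k-move context."""
        if len(self.history) < self.k:
            return 0  # not enough context yet: default to rock
        row = self.counts[tuple(self.history[-self.k:])]
        predicted = row.index(max(row))
        return (predicted + 1) % 3

p = MarkovPredictor(k=2)
for move in [0, 1, 2] * 10:  # the opponent cycles rock, paper, scissors
    p.observe(move)
print(p.act())  # 1: rock always followed (paper, scissors), so play paper
```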

In [None]:
%%writefile memory_patterns.py

import random

def find_pattern(memory_patterns, memory, memory_length):
    """ find appropriate pattern in memory """
    for pattern in memory_patterns:
        actions_matched = 0
        for i in range(memory_length):
            if pattern["actions"][i] == memory[i]:
                actions_matched += 1
            else:
                break
        # if memory fits this pattern
        if actions_matched == memory_length:
            return pattern
    # appropriate pattern not found
    return None

def get_step_result_for_my_agent(my_agent_action, opp_action):
    """ 
        get result of the step for my_agent
        1, 0 and -1 representing win, tie and lost results of the game respectively
    """
    if my_agent_action == opp_action:
        return 0
    elif (my_agent_action == (opp_action + 1)) or (my_agent_action == 0 and opp_action == 2):
        return 1
    else:
        return -1


# maximum steps in the pattern
steps_max = 3
# minimum steps in the pattern
steps_min = 3
# maximum amount of steps until reassessment of effectiveness of current memory patterns
max_steps_until_memory_reassessment = random.randint(80, 120)

# current memory of the agent
current_memory = []
# list of 1, 0 and -1 representing win, tie and lost results of the game respectively
# length is max_steps_until_memory_reassessment
results = []
# current best sum of results
best_sum_of_results = 0
# how many times each action was performed by opponent
opponent_actions_count = [0, 0, 0]
# memory length of patterns in first group
# steps_max is multiplied by 2 to consider both my_agent's and opponent's actions
group_memory_length = steps_max * 2
# list of groups of memory patterns
groups_of_memory_patterns = []
for i in range(steps_max, steps_min - 1, -1):
    groups_of_memory_patterns.append({
        # how many steps in a row are in the pattern
        "memory_length": group_memory_length,
        # list of memory patterns
        "memory_patterns": []
    })
    group_memory_length -= 2
    

def my_agent(obs, conf):
    """Pattern-memory agent: answer the opponent's most frequent follow-up to the current memory pattern."""
    global results
    global best_sum_of_results
    # action of my_agent
    my_action = None
    
    # if it's not first step, add opponent's last action to agent's current memory
    # and reassess effectiveness of current memory patterns
    if obs["step"] > 0:
        # count opponent's actions
        opponent_actions_count[obs["lastOpponentAction"]] += 1
        # add opponent's last step to current_memory
        current_memory.append(obs["lastOpponentAction"])
        # previous step won or lost
        results.append(get_step_result_for_my_agent(current_memory[-2], current_memory[-1]))
        
        # if enough steps have been added to results for memory reassessment
        if len(results) == max_steps_until_memory_reassessment:
            results_sum = sum(results)
            # if effectiveness of current memory patterns has decreased significantly
            if results_sum < (best_sum_of_results * 0.5):
                # flush all current memory patterns
                best_sum_of_results = 0
                results = []
                for group in groups_of_memory_patterns:
                    group["memory_patterns"] = []
            else:
                # if effectiveness of current memory patterns has increased
                if results_sum > best_sum_of_results:
                    best_sum_of_results = results_sum
                del results[:1]
    
    # search for my_action in memory patterns
    for group in groups_of_memory_patterns:
        # if length of current memory is bigger than necessary for a new memory pattern
        if len(current_memory) > group["memory_length"]:
            # get memory of the previous step
            previous_step_memory = current_memory[:group["memory_length"]]
            previous_pattern = find_pattern(group["memory_patterns"], previous_step_memory, group["memory_length"])
            if previous_pattern == None:
                previous_pattern = {
                    "actions": previous_step_memory.copy(),
                    "opp_next_actions": [
                        {"action": 0, "amount": 0, "response": 1},
                        {"action": 1, "amount": 0, "response": 2},
                        {"action": 2, "amount": 0, "response": 0}
                    ]
                }
                group["memory_patterns"].append(previous_pattern)
            # if such pattern already exists
            for action in previous_pattern["opp_next_actions"]:
                if action["action"] == obs["lastOpponentAction"]:
                    action["amount"] += 1
            # delete first two elements in current memory (actions of the oldest step in current memory)
            del current_memory[:2]
            
            # if action was not yet found
            if my_action == None:
                pattern = find_pattern(group["memory_patterns"], current_memory, group["memory_length"])
                # if appropriate pattern is found
                if pattern != None:
                    my_action_amount = 0
                    for action in pattern["opp_next_actions"]:
                        # if this opponent's action occurred more times than currently chosen action
                        # or, if it occurred the same number of times and this one is chosen randomly among them
                        if (action["amount"] > my_action_amount or
                                (action["amount"] == my_action_amount and random.random() > 0.5)):
                            my_action_amount = action["amount"]
                            my_action = action["response"]
    
    # if no action was found
    if my_action == None:
        # choose action randomly
        my_action = random.randint(0, 2)
    
    current_memory.append(my_action)
    return my_action

In [None]:
%%writefile multi_armed_bandit.py


import pandas as pd
import numpy as np
import json


# base class for all agents, random agent
class agent():
    def initial_step(self):
        return np.random.randint(3)
    
    def history_step(self, history):
        return np.random.randint(3)
    
    def step(self, history):
        if len(history) == 0:
            return int(self.initial_step())
        else:
            return int(self.history_step(history))
    
# agent that returns (previousCompetitorStep + shift) % 3
class mirror_shift(agent):
    def __init__(self, shift=0):
        self.shift = shift
    
    def history_step(self, history):
        return (history[-1]['competitorStep'] + self.shift) % 3
    
    
# agent that returns (previousPlayerStep + shift) % 3
class self_shift(agent):
    def __init__(self, shift=0):
        self.shift = shift
    
    def history_step(self, history):
        return (history[-1]['step'] + self.shift) % 3    


# agent that beats the most popular step of competitor
class popular_beater(agent):
    def history_step(self, history):
        counts = np.bincount([x['competitorStep'] for x in history])
        return (int(np.argmax(counts)) + 1) % 3

    
# agent that beats an opponent who plays popular_beater against us (i.e. who beats our own most popular step)
class anti_popular_beater(agent):
    def history_step(self, history):
        counts = np.bincount([x['step'] for x in history])
        return (int(np.argmax(counts)) + 2) % 3
    
    
# simple transition matrix: previous step -> next step
class transition_matrix(agent):
    def __init__(self, deterministic = False, counter_strategy = False, init_value = 0.1, decay = 1):
        self.deterministic = deterministic
        self.counter_strategy = counter_strategy
        if counter_strategy:
            self.step_type = 'step' 
        else:
            self.step_type = 'competitorStep'
        self.init_value = init_value
        self.decay = decay
        
    def history_step(self, history):
        matrix = np.zeros((3,3)) + self.init_value
        for i in range(len(history) - 1):
            matrix = (matrix - self.init_value) / self.decay + self.init_value
            matrix[int(history[i][self.step_type]), int(history[i+1][self.step_type])] += 1

        if  self.deterministic:
            step = np.argmax(matrix[int(history[-1][self.step_type])])
        else:
            step = np.random.choice([0,1,2], p = matrix[int(history[-1][self.step_type])]/matrix[int(history[-1][self.step_type])].sum())
        
        if self.counter_strategy:
            # we predict our step using transition matrix (as competitor can do) and beat probable competitor step
            return (step + 2) % 3 
        else:
            # we just predict competitors step and beat it
            return (step + 1) % 3
    

# similar to the transition matrix, but relies on both players' previous steps
class transition_tensor(agent):
    
    def __init__(self, deterministic = False, counter_strategy = False, init_value = 0.1, decay = 1):
        self.deterministic = deterministic
        self.counter_strategy = counter_strategy
        if counter_strategy:
            self.step_type1 = 'step' 
            self.step_type2 = 'competitorStep'
        else:
            self.step_type2 = 'step' 
            self.step_type1 = 'competitorStep'
        self.init_value = init_value
        self.decay = decay
        
    def history_step(self, history):
        matrix = np.zeros((3, 3, 3)) + self.init_value  # respect init_value, as transition_matrix does
        for i in range(len(history) - 1):
            matrix = (matrix - self.init_value) / self.decay + self.init_value
            matrix[int(history[i][self.step_type1]), int(history[i][self.step_type2]), int(history[i+1][self.step_type1])] += 1

        if  self.deterministic:
            step = np.argmax(matrix[int(history[-1][self.step_type1]), int(history[-1][self.step_type2])])
        else:
            step = np.random.choice([0,1,2], p = matrix[int(history[-1][self.step_type1]), int(history[-1][self.step_type2])]/matrix[int(history[-1][self.step_type1]), int(history[-1][self.step_type2])].sum())
        
        if self.counter_strategy:
            # we predict our step using transition matrix (as competitor can do) and beat probable competitor step
            return (step + 2) % 3 
        else:
            # we just predict competitors step and beat it
            return (step + 1) % 3

        
# looks for the same pattern in history and returns the best answer to the most possible counter strategy
class pattern_matching(agent):
    def __init__(self, steps = 3, deterministic = False, counter_strategy = False, init_value = 0.1, decay = 1):
        self.deterministic = deterministic
        self.counter_strategy = counter_strategy
        if counter_strategy:
            self.step_type = 'step' 
        else:
            self.step_type = 'competitorStep'
        self.init_value = init_value
        self.decay = decay
        self.steps = steps
        
    def history_step(self, history):
        if len(history) < self.steps + 1:
            return self.initial_step()
        
        next_step_count = np.zeros(3) + self.init_value
        pattern = [history[i][self.step_type] for i in range(- self.steps, 0)]
        
        for i in range(len(history) - self.steps):
            next_step_count = (next_step_count - self.init_value)/self.decay + self.init_value
            current_pattern = [history[j][self.step_type] for j in range(i, i + self.steps)]
            if np.sum([pattern[j] == current_pattern[j] for j in range(self.steps)]) == self.steps:
                next_step_count[history[i + self.steps][self.step_type]] += 1
        
        if next_step_count.max() == self.init_value:
            return self.initial_step()
        
        if  self.deterministic:
            step = np.argmax(next_step_count)
        else:
            step = np.random.choice([0,1,2], p = next_step_count/next_step_count.sum())
        
        if self.counter_strategy:
            # we predict our step using pattern matching (as competitor can do) and beat probable competitor step
            return (step + 2) % 3 
        else:
            # we just predict competitors step and beat it
            return (step + 1) % 3
        
# if we add all agents, the algorithm will spend more than 1 second per turn and will be invalidated
# right now the agents are not optimal and the same computations are repeated many times
# the approach can be optimised to run much faster
agents = {
    'mirror_0': mirror_shift(0),
    'mirror_1': mirror_shift(1),  
    'mirror_2': mirror_shift(2),
    'self_0': self_shift(0),
    'self_1': self_shift(1),  
    'self_2': self_shift(2),
    'popular_beater': popular_beater(),
    'anti_popular_beater': anti_popular_beater(),
    'random_transitison_matrix': transition_matrix(False, False),
    'determenistic_transitison_matrix': transition_matrix(True, False),
    'random_self_trans_matrix': transition_matrix(False, True),
    'determenistic_self_trans_matrix': transition_matrix(True, True),
    'random_transitison_tensor': transition_tensor(False, False),
    'determenistic_transitison_tensor': transition_tensor(True, False),
    'random_self_trans_tensor': transition_tensor(False, True),
    'determenistic_self_trans_tensor': transition_tensor(True, True),
    
    'random_transitison_matrix_decay': transition_matrix(False, False, decay = 1.05),
    'random_self_trans_matrix_decay': transition_matrix(False, True, decay = 1.05),
    'random_transitison_tensor_decay': transition_tensor(False, False, decay = 1.05),
    'random_self_trans_tensor_decay': transition_tensor(False, True, decay = 1.05),
    
    'determenistic_transitison_matrix_decay': transition_matrix(True, False, decay = 1.05),
    'determenistic_self_trans_matrix_decay': transition_matrix(True, True, decay = 1.05),
    'determenistic_transitison_tensor_decay': transition_tensor(True, False, decay = 1.05),
    'determenistic_self_trans_tensor_decay': transition_tensor(True, True, decay = 1.05),
    
#     'random_transitison_matrix_decay2': transition_matrix(False, False, decay = 1.001),
#     'random_self_trans_matrix_decay2': transition_matrix(False, True, decay = 1.001),
#     'random_transitison_tensor_decay2': transition_tensor(False, False, decay = 1.001),
#     'random_self_trans_tensor_decay2': transition_tensor(False, True, decay = 1.001),
    
#     'determenistic_transitison_matrix_decay2': transition_matrix(True, False, decay = 1.001),
#     'determenistic_self_trans_matrix_decay2': transition_matrix(True, True, decay = 1.001),
#     'determenistic_transitison_tensor_decay2': transition_tensor(True, False, decay = 1.001),
#     'determenistic_self_trans_tensor_decay2': transition_tensor(True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_1': pattern_matching(1, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_1': pattern_matching(1, False, True, decay = 1.001),
#     'determenistic_pattern_matching_decay_1': pattern_matching(1, True, False, decay = 1.001),
#     'determenistic_self_pattern_matching_decay_1': pattern_matching(1, True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_2': pattern_matching(2, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_2': pattern_matching(2, False, True, decay = 1.001),
#     'determenistic_pattern_matching_decay_2': pattern_matching(2, True, False, decay = 1.001),
#     'determenistic_self_pattern_matching_decay_2': pattern_matching(2, True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_3': pattern_matching(3, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_3': pattern_matching(3, False, True, decay = 1.001),
    'determenistic_pattern_matching_decay_3': pattern_matching(3, True, False, decay = 1.001),
    'determenistic_self_pattern_matching_decay_3': pattern_matching(3, True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_4': pattern_matching(4, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_4': pattern_matching(4, False, True, decay = 1.001),
#     'determenistic_pattern_matching_decay_4': pattern_matching(4, True, False, decay = 1.001),
#     'determenistic_self_pattern_matching_decay_4': pattern_matching(4, True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_5': pattern_matching(5, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_5': pattern_matching(5, False, True, decay = 1.001),
#     'determenistic_pattern_matching_decay_5': pattern_matching(5, True, False, decay = 1.001),
#     'determenistic_self_pattern_matching_decay_5': pattern_matching(5, True, True, decay = 1.001),
    
#     'random_pattern_matching_decay_6': pattern_matching(6, False, False, decay = 1.001),
#     'random_self_pattern_matching_decay_6': pattern_matching(6, False, True, decay = 1.001),
#     'determenistic_pattern_matching_decay_6': pattern_matching(6, True, False, decay = 1.001),
#     'determenistic_self_pattern_matching_decay_6': pattern_matching(6, True, True, decay = 1.001),
}

history = []
bandit_state = {k:[1,1] for k in agents.keys()}
    
def multi_armed_bandit_agent (observation, configuration):
    
    # bandits' params
    step_size = 3 # how much we increase a and b 
    decay_rate = 1.05 # how much do we decay old historical data
    
    global history, bandit_state
    
    def log_step(step = None, history = None, agent = None, competitorStep = None, file = 'history.csv'):
        if step is None:
            step = np.random.randint(3)
        if history is None:
            history = []
        history.append({'step': step, 'competitorStep': competitorStep, 'agent': agent})
        if file is not None:
            pd.DataFrame(history).to_csv(file, index = False)
        return step
    
    def update_competitor_step(history, competitorStep):
        history[-1]['competitorStep'] = int(competitorStep)
        return history
    
    # load history
    if observation.step == 0:
        pass
    else:
        history = update_competitor_step(history, observation.lastOpponentAction)
        
        # updating bandit_state using the result of the previous step
        # we can update all states even those that were not used
        for name, agent in agents.items():
            agent_step = agent.step(history[:-1])
            bandit_state[name][1] = (bandit_state[name][1] - 1) / decay_rate + 1
            bandit_state[name][0] = (bandit_state[name][0] - 1) / decay_rate + 1
            
            if (history[-1]['competitorStep'] - agent_step) % 3 == 1:
                bandit_state[name][1] += step_size
            elif (history[-1]['competitorStep'] - agent_step) % 3 == 2:
                bandit_state[name][0] += step_size
            else:
                bandit_state[name][0] += step_size/2
                bandit_state[name][1] += step_size/2
            
    # we can use it for analysis later
#     with open('bandit.json', 'w') as outfile:
#         json.dump(bandit_state, outfile)
#     
    
    # Thompson sampling: draw a random number from each agent's Beta distribution and select the luckiest one
    best_proba = -1
    best_agent = None
    for k in bandit_state.keys():
        proba = np.random.beta(bandit_state[k][0],bandit_state[k][1])
        if proba > best_proba:
            best_proba = proba
            best_agent = k
        
    step = agents[best_agent].step(history)
    
    return log_step(step, history, best_agent)
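
`multi_armed_bandit_agent` is a form of Thompson sampling. Each sub-agent keeps a Beta(a, b) pair in `bandit_state[name]`, and old evidence decays toward the Beta(1, 1) prior by `decay_rate`. A win adds `step_size` to a, a loss adds it to b, and a tie splits it. The agent whose Beta draw is largest gets to move. Because every sub-agent's move can be replayed against the opponent's actual move, all arms are updated each turn. Below is a stripped-down sketch with made-up Bernoulli win rates standing in for the sub-agents; ties are omitted, and `run_bandit` and `thompson_select` are hypothetical names.

```python
import numpy as np

def thompson_select(state, rng):
    """Draw one Beta(a, b) sample per arm and return the index of the largest draw."""
    return int(np.argmax(rng.beta(state[:, 0], state[:, 1])))

def run_bandit(win_rates, steps=2000, step_size=3.0, decay_rate=1.05, seed=0):
    """Count how often each arm is picked under decayed, full-information Thompson sampling."""
    rng = np.random.default_rng(seed)
    win_rates = np.asarray(win_rates, dtype=float)
    state = np.ones((len(win_rates), 2))  # column 0: a (wins), column 1: b (losses); Beta(1, 1) prior
    picks = np.zeros(len(win_rates), dtype=int)
    for _ in range(steps):
        picks[thompson_select(state, rng)] += 1
        # Every arm's outcome is observable, so all arms are updated each step.
        won = rng.random(len(win_rates)) < win_rates
        state = (state - 1) / decay_rate + 1  # decay old evidence toward the prior
        state[won, 0] += step_size
        state[~won, 1] += step_size
    return picks

picks = run_bandit([0.3, 0.5, 0.7])
print(picks)  # most picks should go to the last arm (win rate 0.7)
```

With `decay_rate = 1.05` the effective memory is only about 20 steps. This lets the ensemble switch quickly when an opponent changes strategy, at the cost of noisier estimates.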

In [None]:
%%writefile opponent_transition_matrix.py

import numpy as np
import pandas as pd
import random

T = np.zeros((3, 3))
P = np.zeros((3, 3))

# a1 is the action of the opponent 1 step ago
# a2 is the action of the opponent 2 steps ago
a1, a2 = None, None

def transition_agent(observation, configuration):
    global T, P, a1, a2
    if observation.step > 1:
        a1 = observation.lastOpponentAction
        T[a2, a1] += 1
        P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))
        a2 = a1
        if np.sum(P[a1, :]) == 1:
            return int((np.random.choice(
                [0, 1, 2],
                p=P[a1, :]
            ) + 1) % 3)
        else:
            return int(np.random.randint(3))
    else:
        if observation.step == 1:
            a2 = observation.lastOpponentAction
        return int(np.random.randint(3))
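
`transition_agent` estimates a first-order transition matrix of the opponent's moves. It divides each row by `max(1, row_sum)` so that unseen rows stay at zero instead of dividing by zero, and it only samples from a row when that row is a proper probability distribution. The sketch below follows the same steps, rebuilt from a plain list of moves rather than kept incrementally in globals (a simplification). `transition_probs` and `counter_move` are hypothetical names.

```python
import numpy as np

def transition_probs(opp_moves):
    """Row-normalised first-order transition matrix of the opponent's moves.
    Rows for moves never followed by anything stay all-zero, because the divisor
    is max(1, row_sum), the same trick as np.maximum(1, T.sum(axis=1)) above."""
    counts = np.zeros((3, 3))
    for prev, nxt in zip(opp_moves[:-1], opp_moves[1:]):
        counts[prev, nxt] += 1
    return counts / np.maximum(1, counts.sum(axis=1, keepdims=True))

def counter_move(opp_moves, rng):
    """Sample the opponent's next move from the row of its last move and play what beats it."""
    probs = transition_probs(opp_moves)[opp_moves[-1]]
    if np.isclose(probs.sum(), 1.0):  # a float-tolerant version of `np.sum(P[a1, :]) == 1`
        predicted = rng.choice(3, p=probs)
        return int((predicted + 1) % 3)
    return int(rng.integers(3))       # last move never followed by anything yet: play randomly

rng = np.random.default_rng(0)
print(counter_move([0, 1, 0, 1, 0], rng))  # 2: after rock the opponent always played paper, so play scissors
```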

In [None]:
%%writefile decision_tree_classifier.py

import numpy as np
import collections
from sklearn.tree import DecisionTreeClassifier

def construct_local_features(rollouts):
    features = np.array([[step % k for step in rollouts['steps']] for k in (2, 3, 5)])
    features = np.append(features, rollouts['steps'])
    features = np.append(features, rollouts['actions'])
    features = np.append(features, rollouts['opp-actions'])
    return features

def construct_global_features(rollouts):
    features = []
    for key in ['actions', 'opp-actions']:
        for i in range(3):
            actions_count = np.mean([r == i for r in rollouts[key]])
            features.append(actions_count)
    
    return np.array(features)

def construct_features(short_stat_rollouts, long_stat_rollouts):
    lf = construct_local_features(short_stat_rollouts)
    gf = construct_global_features(long_stat_rollouts)
    features = np.concatenate([lf, gf])
    return features

def predict_opponent_move(train_data, test_sample):
    classifier = DecisionTreeClassifier(random_state=42)
    classifier.fit(train_data['x'], train_data['y'])
    return classifier.predict(test_sample)

def update_rollouts_hist(rollouts_hist, last_move, opp_last_action):
    rollouts_hist['steps'].append(last_move['step'])
    rollouts_hist['actions'].append(last_move['action'])
    rollouts_hist['opp-actions'].append(opp_last_action)
    return rollouts_hist

def warmup_strategy(observation, configuration):
    global rollouts_hist, last_move
    action = int(np.random.randint(3))
    if observation.step == 0:
        last_move = {'step': 0, 'action': action}
        rollouts_hist = {'steps': [], 'actions': [], 'opp-actions': []}
    else:
        rollouts_hist = update_rollouts_hist(rollouts_hist, last_move, observation.lastOpponentAction)
        last_move = {'step': observation.step, 'action': action}
    return int(action)

def init_training_data(rollouts_hist, k):
    for i in range(len(rollouts_hist['steps']) - k + 1):
        short_stat_rollouts = {key: rollouts_hist[key][i:i+k] for key in rollouts_hist}
        long_stat_rollouts = {key: rollouts_hist[key][:i+k] for key in rollouts_hist}
        features = construct_features(short_stat_rollouts, long_stat_rollouts)        
        data['x'].append(features)
    test_sample = data['x'][-1].reshape(1, -1)
    data['x'] = data['x'][:-1]
    data['y'] = rollouts_hist['opp-actions'][k:]
    return data, test_sample

def agent(observation, configuration):
    # hyperparameters
    k = 5
    min_samples = 25
    global rollouts_hist, last_move, data, test_sample
    if observation.step == 0:
        data = {'x': [], 'y': []}
    # if not enough data -> randomize
    if observation.step <= min_samples + k:
        return warmup_strategy(observation, configuration)
    # update statistics
    rollouts_hist = update_rollouts_hist(rollouts_hist, last_move, observation.lastOpponentAction)
    # update training data
    if len(data['x']) == 0:
        data, test_sample = init_training_data(rollouts_hist, k)
    else:        
        short_stat_rollouts = {key: rollouts_hist[key][-k:] for key in rollouts_hist}
        features = construct_features(short_stat_rollouts, rollouts_hist)
        data['x'].append(test_sample[0])
        data['y'] = rollouts_hist['opp-actions'][k:]
        test_sample = features.reshape(1, -1)
        
    # predict opponents move and choose an action
    next_opp_action_pred = predict_opponent_move(data, test_sample)
    action = int((next_opp_action_pred + 1) % 3)
    last_move = {'step': observation.step, 'action': action}
    return action

In [None]:
%%writefile statistical_prediction.py

import random
from collections import Counter

# Create a small amount of starting history
history = {
    "guess":      [0,1,2],
    "prediction": [0,1,2],
    "expected":   [0,1,2],
    "action":     [0,1,2],
    "opponent":   [0,1],
}
def statistical_prediction_agent(observation, configuration):    
    global history
    actions         = list(range(configuration.signs))  # [0,1,2]
    last_action     = history['action'][-1]
    opponent_action = observation.lastOpponentAction if observation.step > 0 else 2
    
    history['opponent'].append(opponent_action)

    # Make weighted random guess based on the complete move history, weighted towards relative moves based on our last action 
    move_frequency       = Counter(history['opponent'])
    response_frequency   = Counter(zip(history['action'], history['opponent'])) 
    move_weights         = [ move_frequency.get(n,1) + response_frequency.get((last_action,n),1) for n in range(configuration.signs) ] 
    guess                = random.choices( population=actions, weights=move_weights, k=1 )[0]
    
    # Compare our guess to how our opponent actually played
    guess_frequency      = Counter(zip(history['guess'], history['opponent']))
    guess_weights        = [ guess_frequency.get((guess,n),1) for n in range(configuration.signs) ]
    prediction           = random.choices( population=actions, weights=guess_weights, k=1 )[0]

    # Repeat, but based on how many times our prediction was correct
    prediction_frequency = Counter(zip(history['prediction'], history['opponent']))
    prediction_weights   = [ prediction_frequency.get((prediction,n),1) for n in range(configuration.signs) ]
    expected             = random.choices( population=actions, weights=prediction_weights, k=1 )[0]

    # Play the +1 counter move
    action = (expected + 1) % configuration.signs
    
    # Persist state
    history['guess'].append(guess)
    history['prediction'].append(prediction)
    history['expected'].append(expected)
    history['action'].append(action)

    # Print debug information
    print('opponent_action                = ', opponent_action)
    print('move_weights,       guess      = ', move_weights, guess)
    print('guess_weights,      prediction = ', guess_weights, prediction)
    print('prediction_weights, expected   = ', prediction_weights, expected)
    print('action                         = ', action)
    print()
    
    return action

In [None]:
%%writefile iocaine.py

import random


def recall(age, hist):
    """Looking at the last 'age' points in 'hist', finds the
    last point with the longest similarity to the current point,
    returning 0 if none found."""
    end, length = 0, 0
    for past in range(1, min(age + 1, len(hist) - 1)):
        if length >= len(hist) - past: break
        for i in range(-1 - length, 0):
            if hist[i - past] != hist[i]: break
        else:
            for length in range(length + 1, len(hist) - past):
                if hist[-past - length - 1] != hist[-length - 1]: break
            else: length += 1
            end = len(hist) - past
    return end

def beat(i):
    return (i + 1) % 3
def loseto(i):
    return (i - 1) % 3

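class Stats:
    """Maintains three running counts and returns the highest count based
    on any given time horizon and threshold."""
    def __init__(self):
        self.sum = [[0, 0, 0]]
    def add(self, move, score):
        self.sum[-1][move] += score
    def advance(self):
        # Append a copy, not a reference: otherwise every entry aliases the same
        # list and the windowed difference in max() is always zero.
        self.sum.append(list(self.sum[-1]))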
    def max(self, age, default, score):
        if age >= len(self.sum): diff = self.sum[-1]
        else: diff = [self.sum[-1][i] - self.sum[-1 - age][i] for i in range(3)]
        m = max(diff)
        if m > score: return diff.index(m), m
        return default, score

class Predictor:
    """The basic Iocaine second- and triple-guesser. Maintains stats on the
    past benefits of trusting, second-guessing, or triple-guessing a given
    strategy, and returns the prediction of that strategy (or the second- or
    triple-guess) if past stats deviate from zero farther than the supplied
    "best" guess so far."""
    def __init__(self):
        self.stats = Stats()
        self.lastguess = -1
    def addguess(self, lastmove, guess):
        if lastmove != -1:
            diff = (lastmove - self.prediction) % 3
            self.stats.add(beat(diff), 1)
            self.stats.add(loseto(diff), -1)
            self.stats.advance()
        self.prediction = guess
    def bestguess(self, age, best):
        bestdiff = self.stats.max(age, (best[0] - self.prediction) % 3, best[1])
        return (bestdiff[0] + self.prediction) % 3, bestdiff[1]

ages = [1000, 100, 10, 5, 2, 1]

class Iocaine:

    def __init__(self):
        """Build second-guessers for 50 strategies: 36 history-based strategies,
             12 simple frequency-based strategies, the constant-move strategy, and
             the basic random-number-generator strategy.    Also build 6 meta second
             guessers to evaluate 6 different time horizons on which to score
             the 50 strategies' second-guesses."""
        self.predictors = []
        self.predict_history = self.predictor((len(ages), 2, 3))
        self.predict_frequency = self.predictor((len(ages), 2))
        self.predict_fixed = self.predictor()
        self.predict_random = self.predictor()
        self.predict_meta = [Predictor() for a in range(len(ages))]
        self.stats = [Stats() for i in range(2)]
        self.histories = [[], [], []]

    def predictor(self, dims=None):
        """Returns a nested array of predictor objects, of the given dimensions."""
        if dims: return [self.predictor(dims[1:]) for i in range(dims[0])]
        self.predictors.append(Predictor())
        return self.predictors[-1]

    def move(self, them):
        """The main iocaine "move" function."""

        # histories[0] stores our moves (last one already previously decided);
        # histories[1] stores their moves (last one just now being supplied to us);
        # histories[2] stores pairs of our and their last moves.
        # stats[0] and stats[1] are running counters of our recent moves and theirs.
        if them != -1:
            self.histories[1].append(them)
            self.histories[2].append((self.histories[0][-1], them))
            for watch in range(2):
                self.stats[watch].add(self.histories[watch][-1], 1)

        # Execute the basic RNG strategy and the fixed-move strategy.
        rand = random.randrange(3)
        self.predict_random.addguess(them, rand)
        self.predict_fixed.addguess(them, 0)

        # Execute the history and frequency strategies.
        for a, age in enumerate(ages):
            # For each time window, there are three ways to recall a similar time:
            # (0) by history of my moves; (1) their moves; or (2) pairs of moves.
            # Set "best" to these three timeframes (zero if no matching time).
            best = [recall(age, hist) for hist in self.histories]
            for mimic in range(2):
                # For each similar historical moment, there are two ways to anticipate
                # the future: by mimicking what their move was, or mimicking what my
                # move was. If there were no similar moments, just move randomly.
                for watch, when in enumerate(best):
                    if not when: move = rand
                    else: move = self.histories[mimic][when]
                    self.predict_history[a][mimic][watch].addguess(them, move)
                # Also we can anticipate the future by expecting it to be the same
                # as the most frequent past (either counting their moves or my moves).
                mostfreq, score = self.stats[mimic].max(age, rand, -1)
                self.predict_frequency[a][mimic].addguess(them, mostfreq)

        # All the predictors have been updated, but we have not yet scored them
        # and chosen a winner for this round. There are several timeframes on
        # which we can score second-guessing, and we don't know which timeframe
        # will do best. So score all 50 predictors on all 6 timeframes, and record
        # the best 6 predictions in meta predictors, one for each timeframe.
        for meta, age in enumerate(ages):
            best = (-1, -1)
            for predictor in self.predictors:
                best = predictor.bestguess(age, best)
            self.predict_meta[meta].addguess(them, best[0])

        # Finally choose the best meta prediction from the final six, scoring
        # these against each other on the whole-game timeframe. 
        best = (-1, -1)
        for meta in range(len(ages)):
            best = self.predict_meta[meta].bestguess(len(self.histories[0]), best)

        # We've picked a next move. Record our move in histories[0] for next time.
        self.histories[0].append(best[0])

        # And return it.
        return best[0]

iocaine = None

def iocaine_agent(observation, configuration):
    global iocaine
    if observation.step == 0:
        iocaine = Iocaine()
        act = iocaine.move(-1)
    else:
        act = iocaine.move(observation.lastOpponentAction)
        
    return act
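
Iocaine's `Predictor` never plays a prediction at face value: for each strategy it keeps a running score for the three rotations of the predicted move (shift it by 0, 1, or 2), crediting the rotation that would have won and debiting the one that would have lost. The sketch below isolates that bookkeeping for a single predictor, assuming no time horizons (`ages`) and no meta layer; `RotationScorer` is a hypothetical name, not part of the code above.

```python
class RotationScorer:
    """Scores the three ways of acting on one prediction (a sketch, not Iocaine itself)."""

    def __init__(self):
        # scores[r] = net wins from playing (prediction + r) % 3 in past rounds
        self.scores = [0, 0, 0]
        self.prediction = None

    def update(self, opponent_move):
        """Credit/debit each rotation after seeing the opponent's actual move."""
        if self.prediction is None:
            return
        diff = (opponent_move - self.prediction) % 3
        self.scores[(diff + 1) % 3] += 1  # this rotation would have beaten them
        self.scores[(diff - 1) % 3] -= 1  # this rotation would have lost to them

    def choose(self, prediction):
        """Store the new prediction and return the move from the best-scoring rotation."""
        self.prediction = prediction
        best_rotation = max(range(3), key=lambda r: self.scores[r])
        return (prediction + best_rotation) % 3


# Usage: a predictor that keeps guessing 0 against an opponent who always plays 1.
scorer = RotationScorer()
moves = []
for _ in range(5):
    moves.append(scorer.choose(0))
    scorer.update(1)
```

Here the raw prediction is wrong every round, yet after one round the scorer settles on rotation 2 and plays 2, which beats the opponent's 1. That is why a consistently wrong predictor is still useful inside Iocaine.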

In [None]:
%%writefile greenberg.py

# greenberg roshambo bot, winner of 2nd annual roshambo programming competition
# http://webdocs.cs.ualberta.ca/~darse/rsbpc.html

# original source by Andrzej Nagorko
# http://www.mathpuzzle.com/greenberg.c

# Python translation by Travis Erdman
# https://github.com/erdman/roshambo

import random
from operator import itemgetter
# from itertools import izip
izip   = zip   # BUGFIX: izip   is python2
xrange = range # BUGFIX: xrange is python2

rps_to_text  = ('rock','paper','scissors')
rps_to_num   = {'rock':0, 'paper':1, 'scissors':2}

def player(my_moves, opp_moves):
    wins_with    = (1,2,0)  # superior
    best_without = (2,0,1)  # inferior

    lengths = (10, 20, 30, 40, 49, 0)
    p_random = random.choice([0,1,2])  #called 'guess' in iocaine

    TRIALS = 1000
    score_table =((0,-1,1),(1,0,-1),(-1,1,0))
    T = len(opp_moves)  #so T is number of trials completed

    def min_index(values):
        return min(enumerate(values), key=itemgetter(1))[0]

    def max_index(values):
        return max(enumerate(values), key=itemgetter(1))[0]

    def find_best_prediction(l):  # l = len
        bs = -TRIALS
        bp = 0
        if player.p_random_score > bs:
            bs = player.p_random_score
            bp = p_random
        for i in xrange(3):
            for j in xrange(24):
                for k in xrange(4):
                    new_bs = player.p_full_score[T%50][j][k][i] - (player.p_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.p_full[j][k] + i) % 3
                for k in xrange(2):
                    new_bs = player.r_full_score[T%50][j][k][i] - (player.r_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.r_full[j][k] + i) % 3
            for j in xrange(2):
                for k in xrange(2):
                    new_bs = player.p_freq_score[T%50][j][k][i] - (player.p_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.p_freq[j][k] + i) % 3
                    new_bs = player.r_freq_score[T%50][j][k][i] - (player.r_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.r_freq[j][k] + i) % 3
        return bp


    if not my_moves:
        player.opp_history = [0]  #pad to match up with 1-based move indexing in original
        player.my_history = [0]
        player.gear = [[0] for _ in xrange(24)]
        # init()
        player.p_random_score = 0
        player.p_full_score = [[[[0 for i in xrange(3)] for k in xrange(4)] for j in xrange(24)] for l in xrange(50)]
        player.r_full_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(24)] for l in xrange(50)]
        player.p_freq_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(2)] for l in xrange(50)]
        player.r_freq_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(2)] for l in xrange(50)]
        player.s_len = [0] * 6

        player.p_full = [[0,0,0,0] for _ in xrange(24)]
        player.r_full = [[0,0] for _ in xrange(24)]
    else:
        player.my_history.append(rps_to_num[my_moves[-1]])
        player.opp_history.append(rps_to_num[opp_moves[-1]])
        # update_scores()
        player.p_random_score += score_table[p_random][player.opp_history[-1]]
        player.p_full_score[T%50] = [[[player.p_full_score[(T+49)%50][j][k][i] + score_table[(player.p_full[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(4)] for j in xrange(24)]
        player.r_full_score[T%50] = [[[player.r_full_score[(T+49)%50][j][k][i] + score_table[(player.r_full[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(24)]
        player.p_freq_score[T%50] = [[[player.p_freq_score[(T+49)%50][j][k][i] + score_table[(player.p_freq[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(2)]
        player.r_freq_score[T%50] = [[[player.r_freq_score[(T+49)%50][j][k][i] + score_table[(player.r_freq[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(2)]
        player.s_len = [s + score_table[p][player.opp_history[-1]] for s,p in izip(player.s_len,player.p_len)]


    # update_history_hash()
    if not my_moves:
        player.my_history_hash = [[0],[0],[0],[0]]
        player.opp_history_hash = [[0],[0],[0],[0]]
    else:
        player.my_history_hash[0].append(player.my_history[-1])
        player.opp_history_hash[0].append(player.opp_history[-1])
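        # Level i encodes the last i+1 moves in base 3: extend the previous step's
        # level i-1 hash (index -2, because level i-1 already got this step's entry).
        for i in xrange(1,4):
            player.my_history_hash[i].append(player.my_history_hash[i-1][-2] * 3 + player.my_history[-1])
            player.opp_history_hash[i].append(player.opp_history_hash[i-1][-2] * 3 + player.opp_history[-1])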


    #make_predictions()

    for i in xrange(24):
        player.gear[i].append((3 + player.opp_history[-1] - player.p_full[i][2]) % 3)
        if T > 1:
            player.gear[i][T] += 3 * player.gear[i][T-1]
        player.gear[i][T] %= 9 # clearly there are 9 different gears, but original code only allocated 3 gear_freq's
                               # code apparently worked, but got lucky with undefined behavior
                               # I fixed by allocating gear_freq with length = 9
    if not my_moves:
        player.freq = [[0,0,0],[0,0,0]]
        value = [[0,0,0],[0,0,0]]
    else:
        player.freq[0][player.my_history[-1]] += 1
        player.freq[1][player.opp_history[-1]] += 1
        value = [[(1000 * (player.freq[i][2] - player.freq[i][1])) / float(T),
                  (1000 * (player.freq[i][0] - player.freq[i][2])) / float(T),
                  (1000 * (player.freq[i][1] - player.freq[i][0])) / float(T)] for i in xrange(2)]
    player.p_freq = [[wins_with[max_index(player.freq[i])], wins_with[max_index(value[i])]] for i in xrange(2)]
    player.r_freq = [[best_without[min_index(player.freq[i])], best_without[min_index(value[i])]] for i in xrange(2)]

    f = [[[[0,0,0] for k in xrange(4)] for j in xrange(2)] for i in xrange(3)]
    t = [[[0,0,0,0] for j in xrange(2)] for i in xrange(3)]

    m_len = [[0 for _ in xrange(T)] for i in xrange(3)]

    for i in xrange(T-1,0,-1):
        # Default to the full match length (4) when every hash level agrees.
        m_len[0][i] = m_len[1][i] = m_len[2][i] = 4
        for j in xrange(4):
            if player.my_history_hash[j][i] != player.my_history_hash[j][T]:
                m_len[0][i] = j
                break
        for j in xrange(4):
            if player.opp_history_hash[j][i] != player.opp_history_hash[j][T]:
                m_len[1][i] = j
                break
        for j in xrange(4):
            if player.my_history_hash[j][i] != player.my_history_hash[j][T] or player.opp_history_hash[j][i] != player.opp_history_hash[j][T]:
                m_len[2][i] = j
                break

    for i in xrange(T-1,0,-1):
        for j in xrange(3):
            for k in xrange(m_len[j][i]):
                f[j][0][k][player.my_history[i+1]] += 1
                f[j][1][k][player.opp_history[i+1]] += 1
                t[j][0][k] += 1
                t[j][1][k] += 1

                if t[j][0][k] == 1:
                    player.p_full[j*8 + 0*4 + k][0] = wins_with[player.my_history[i+1]]
                if t[j][1][k] == 1:
                    player.p_full[j*8 + 1*4 + k][0] = wins_with[player.opp_history[i+1]]
                if t[j][0][k] == 3:
                    player.p_full[j*8 + 0*4 + k][1] = wins_with[max_index(f[j][0][k])]
                    player.r_full[j*8 + 0*4 + k][0] = best_without[min_index(f[j][0][k])]
                if t[j][1][k] == 3:
                    player.p_full[j*8 + 1*4 + k][1] = wins_with[max_index(f[j][1][k])]
                    player.r_full[j*8 + 1*4 + k][0] = best_without[min_index(f[j][1][k])]

    for j in xrange(3):
        for k in xrange(4):
            player.p_full[j*8 + 0*4 + k][2] = wins_with[max_index(f[j][0][k])]
            player.r_full[j*8 + 0*4 + k][1] = best_without[min_index(f[j][0][k])]

            player.p_full[j*8 + 1*4 + k][2] = wins_with[max_index(f[j][1][k])]
            player.r_full[j*8 + 1*4 + k][1] = best_without[min_index(f[j][1][k])]

    for j in xrange(24):
        gear_freq = [0] * 9 # was [0,0,0] because original code incorrectly only allocated array length 3

        for i in xrange(T-1,0,-1):
            if player.gear[j][i] == player.gear[j][T]:
                gear_freq[player.gear[j][i+1]] += 1

        #original source allocated to 9 positions of gear_freq array, but only allocated first three
        #also, only looked at first 3 to find the max_index
        #unclear whether to seek max index over all 9 gear_freq's or just first 3 (as original code)
        player.p_full[j][3] = (player.p_full[j][1] + max_index(gear_freq)) % 3

    # end make_predictions()

    player.p_len = [find_best_prediction(l) for l in lengths]

    return rps_to_text[player.p_len[max_index(player.s_len)]]



# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
my_moves    = []
opp_moves   = []
def kaggle_agent(observation, configuration):    
    global my_moves
    global opp_moves
    if observation.step > 0:
        opp_move = rps_to_text[ observation.lastOpponentAction ]
        opp_moves.append( opp_move )
        
    action_text = player(my_moves, opp_moves)
    action      = rps_to_num[action_text]

    my_moves.append(action_text)
    return int(action)
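
Greenberg finds past situations that resemble the present by comparing base-3 hashes of the last 1 to 4 moves (`my_history_hash` / `opp_history_hash`): level k extends the previous step's level k-1 hash by the current move. A standalone sketch of that rolling hash and of the match-length check, assuming missing early moves count as 0 (as the padding in the original does); `suffix_hashes` and `match_length` are hypothetical helpers.

```python
def suffix_hashes(moves, depth=4):
    """hashes[k][t] encodes moves[t-k..t] (k+1 moves) as a base-3 integer."""
    hashes = [[] for _ in range(depth)]
    for t, move in enumerate(moves):
        hashes[0].append(move)
        for k in range(1, depth):
            # Extend the previous step's shorter hash by the current move;
            # before the start of the game the missing moves count as 0.
            prev = hashes[k - 1][t - 1] if t >= 1 else 0
            hashes[k].append(prev * 3 + move)
    return hashes


def match_length(hashes, i, t):
    """Number of trailing moves (up to depth) that agree at time steps i and t."""
    length = 0
    for k in range(len(hashes)):
        if hashes[k][i] != hashes[k][t]:
            break
        length = k + 1
    return length


moves = [0, 1, 2, 0, 1, 2]
h = suffix_hashes(moves)
```

Steps 2 and 5 both end in `0, 1, 2`, so their hashes agree on levels 0 to 2 and the match length is 3; the 4-move windows differ because step 2 is padded with a 0.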

In [None]:
%%writefile preCoded_hist.py

import random

moves = [0, 1, 2]
dna_encode = {
    '11': '1', '10': '2', '12': '3',
    '01': '4', '02': '5', '00': '6',
    '22': '7', '21': '8', '20': '9' }

def beat_move(x):
    return (x + 1) % 3

def agent(observation, configuration):
    global opp_history, action, dna
    if observation.step == 0:
        opp_history = ''
        dna = ''
        action = random.choice([0, 1, 2])
    else:
        # Append the opponent's last move, and encode the (opponent, our) pair as one character
        opp_history += str(observation.lastOpponentAction)
        dna += dna_encode[str(observation.lastOpponentAction) + str(action)]

        for length in (100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1):
            # Search for the most recent earlier occurrence of the last `length` pairs
            x = dna[:-1].rfind(dna[-length:])
            if x >= 0:
                # If found: look up the opponent's move that followed it and play against it
                next_move = opp_history[x + length]
                action = beat_move(int(next_move))
                break

    return action
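
The core of this agent is a longest-suffix lookup: encode each round's (opponent, our) pair as one character, find the most recent earlier occurrence of the last `length` characters with `str.rfind`, and read off what the opponent played right after it. A compact sketch of just that lookup, assuming the pair encoding `str(3 * opp + mine)` (any one-to-one map to single characters works; the cell above uses its own `dna_encode` table). `predict_opponent` is a hypothetical helper.

```python
def predict_opponent(opp_moves, my_moves, lengths=(20, 10, 5, 3, 2, 1)):
    """Predict the opponent's next move from the longest repeated suffix of past rounds."""
    # One character per round: 3 * opponent_move + my_move, i.e. "0".."8".
    dna = "".join(str(3 * o + m) for o, m in zip(opp_moves, my_moves))
    if not dna:
        return None
    for length in lengths:
        # Exclude the final character so the current suffix cannot match itself.
        x = dna[:-1].rfind(dna[-length:])
        if x >= 0:
            # The round right after the earlier match tells us what they did next.
            return opp_moves[x + length]
    return None


opp = [0, 1, 2, 0, 1, 2, 0, 1]
mine = [0] * len(opp)
predicted = predict_opponent(opp, mine)
counter = (predicted + 1) % 3
```

Against the cyclic opponent, the last five rounds also occur at the very start of the game, so the prediction is the move that followed them there (2), and the counter-move is 0.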

In [None]:
%%writefile xgboost.py

import random
from pandas import DataFrame
from xgboost import XGBClassifier

numTurnsPredictors = 5 #number of previous turns to use as predictors
minTrainSetRows = 10 #only start predicting moves after we have enough data
myLastMove = None
mySecondLastMove = None
opponentLastMove = None
numDummies = 2 #how many dummy vars we need to represent a move
predictors = DataFrame(columns=[str(x) for x in range(numTurnsPredictors * 2 * numDummies)])
predictors = predictors.astype("int")
opponentsMoves = []
roundHistory = [] #moves made by both players in each round
clf = XGBClassifier(n_estimators=10)

def randomMove():
    return random.randint(0,2)

#converts my and opponents moves into dummy variables i.e. [1,2] into [0,1,1,0]
def convertToDummies(moves):
    newMoves = []
    dummies = [[0,0], [0,1], [1,0]]

    for move in moves:
        newMoves.extend(dummies[move])

    return newMoves

def updateRoundHistory(myMove, opponentMove):
    global roundHistory
    roundHistory.append(convertToDummies([myMove, opponentMove]))

def flattenData(data):
    return sum(data, [])

def updateFeatures(rounds):
    global predictors
    flattenedRounds = flattenData(rounds)
    predictors.loc[len(predictors)] = flattenedRounds

def fitAndPredict(clf, x, y, newX):
    df = DataFrame.from_records([newX], columns=[str(i) for i in range(numTurnsPredictors * 2 * numDummies)])
    clf.fit(x, y)
    return int(clf.predict(df)[0])

def makeMove(observation, configuration):
    global myLastMove
    global mySecondLastMove
    global opponentLastMove
    global predictors
    global opponentsMoves
    global roundHistory

    if observation.step == 0:
        myLastMove = randomMove()
        return myLastMove

    if observation.step == 1:
        updateRoundHistory(myLastMove, observation.lastOpponentAction)
        myLastMove = randomMove()
        return myLastMove

    else:
        updateRoundHistory(myLastMove, observation.lastOpponentAction)
        opponentsMoves.append(observation.lastOpponentAction)

        if observation.step > numTurnsPredictors:
            updateFeatures(roundHistory[-numTurnsPredictors - 1: -1])

        if len(predictors) > minTrainSetRows:
            predictX = flattenData(roundHistory[-numTurnsPredictors:]) #data to predict next move
            predictedMove = fitAndPredict(clf, predictors,
                                opponentsMoves[(numTurnsPredictors-1):], predictX)
            myLastMove = (predictedMove + 1) % 3
            return myLastMove
        else:
            myLastMove = randomMove()
            return myLastMove
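
Each training row for the XGBoost agent is the last `numTurnsPredictors` rounds flattened into 0/1 columns: every move becomes two dummy bits, so 5 rounds × 2 players × 2 bits gives the 20 columns of `predictors`, and the label is the opponent's move in the following round. A small sketch of that layout using the same dummy table; `feature_row` is a hypothetical helper.

```python
DUMMIES = [[0, 0], [0, 1], [1, 0]]  # rock, paper, scissors -> two 0/1 columns


def to_dummies(moves):
    """Same idea as convertToDummies: [1, 2] -> [0, 1, 1, 0]."""
    out = []
    for move in moves:
        out.extend(DUMMIES[move])
    return out


def feature_row(rounds):
    """Flatten (my_move, opponent_move) rounds, oldest first, into one feature row."""
    row = []
    for my_move, opp_move in rounds:
        row.extend(to_dummies([my_move, opp_move]))
    return row


rounds = [(0, 1), (2, 2), (1, 0), (0, 0), (2, 1)]
row = feature_row(rounds)
```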

In [None]:
%%writefile not_losing.py
import random
import numpy as np
import lightgbm as lgb
import pandas as pd

USE_BACK = 10
PRED_USE_STEP_THRES = 200
PRED_USE_SCORE_THRES = 0.9
my_actions = []
op_actions = []
solutions   = []

## ============== LIGHT GBM PREDICTION ============== ## 
def predict(my_actions, op_actions):
    size = len(my_actions)
    
    d = dict()
    for u in range(USE_BACK):
        d[f"OP_{u}"] = op_actions[u: size - (USE_BACK - u)]
        d[f"MY_{u}"] = my_actions[u: size - (USE_BACK - u)]
    
    X_train = pd.DataFrame(d)
    y_train = op_actions[USE_BACK: size]
    y_train = pd.DataFrame(y_train, columns=["y"])
    
    n = dict()
    for u in range(USE_BACK):
        n[f"OP_{u}"] = [op_actions[size - (USE_BACK - u)]]
        n[f"MY_{u}"] = [my_actions[size - (USE_BACK - u)]]
    
    X_test = pd.DataFrame(n)

    classifier = lgb.LGBMClassifier(
        random_state=0, 
        n_estimators=10, 
    )
    
    classifier.fit(X_train, y_train)
    return classifier.predict_proba(X_test).tolist()[0], int(classifier.predict(X_test)[0])

## ============== RANDOM(NASH EQUILIBRIUM) ============== ##
def randomize():
    return int(random.randint(0, 2))

## ============== PREDICT ONLY CONFIRM, OTHER RANDOM ============== ##
def predict_only_when_confirmation(observation, configuration):
    global my_actions
    global op_actions
    
    if observation.step != 0:
        op_actions.append(observation.lastOpponentAction)
    
    if observation.step > PRED_USE_STEP_THRES:
        pred_proba, pred = predict(my_actions, op_actions)      
        if max(pred_proba) > PRED_USE_SCORE_THRES:
            my_action = pred
            my_action = (my_action + 1) % 3
        
        else:
            my_action = randomize()
        
    else:    
        my_action = randomize()
    
    my_actions.append(my_action)
    
    return my_action
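
The LightGBM agent only trusts its classifier after `PRED_USE_STEP_THRES` steps and when the top class probability exceeds `PRED_USE_SCORE_THRES`; otherwise it plays uniformly at random. Its training set is a sliding window: each row holds the previous `USE_BACK` (opponent, own) move pairs, and the label is the opponent's next move. A pure-Python sketch of the same window construction, without pandas or LightGBM; `lag_dataset` is a hypothetical name, and the column order mirrors `OP_0, MY_0, OP_1, MY_1, ...` in `predict`.

```python
def lag_dataset(my_actions, op_actions, use_back=10):
    """Sliding-window features: previous `use_back` (opponent, own) pairs -> opponent's next move."""
    size = len(op_actions)

    def window(end):
        # Moves at indices end - use_back .. end - 1, interleaved as OP_u, MY_u.
        row = []
        for u in range(use_back):
            idx = end - use_back + u
            row.extend([op_actions[idx], my_actions[idx]])
        return row

    X = [window(t) for t in range(use_back, size)]
    y = [op_actions[t] for t in range(use_back, size)]
    x_next = window(size)  # features for predicting the move not yet played
    return X, y, x_next


X, y, x_next = lag_dataset(my_actions=[2, 2, 2, 2, 2], op_actions=[0, 1, 2, 0, 1], use_back=2)
```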

In [None]:
%%writefile simple_method.py
import random

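recs = []

def my_agent(obs, conf):
    global recs

    # Record the opponent's previous move (there is none on step 0).
    if obs["step"] > 0:
        recs.append(obs["lastOpponentAction"])

    count = len(recs)

    if count == 0:
        hand = random.randint(0, 2)

    else:
        # Cumulative share of rock, rock + paper, and all moves played so far
        hand_count = [recs.count(0), recs.count(1), recs.count(2)]
        hand_ratio = [hand_count[0] / count, (hand_count[0] + hand_count[1]) / count, 1.0]

        # Sample a predicted opponent move in proportion to its past frequency
        hand_rand = random.random()
        for i, ratio in enumerate(hand_ratio):
            if hand_rand <= ratio:
                hand = i
                break

    # Play the move that beats the predicted hand
    return (hand + 1) % 3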
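
The loop over `hand_ratio` is inverse-CDF sampling: it draws a predicted opponent move with probability equal to its past frequency, then plays the move that beats it. The standard library's `random.choices` performs the same weighted draw directly. A minimal equivalent sketch; `frequency_counter_move` is a hypothetical name, and passing a seeded `random.Random` is only for reproducibility.

```python
import random
from collections import Counter


def frequency_counter_move(opp_history, rng=random):
    """Sample the opponent's next move from their move frequencies and beat it."""
    if not opp_history:
        return rng.randint(0, 2)
    counts = Counter(opp_history)
    predicted = rng.choices([0, 1, 2], weights=[counts[0], counts[1], counts[2]], k=1)[0]
    return (predicted + 1) % 3


rng = random.Random(0)
always_paper = [frequency_counter_move([1, 1, 1], rng) for _ in range(10)]
```

An opponent that has only ever played paper (1) gets weight 0 on rock and scissors, so every draw predicts paper and the agent always answers with scissors (2).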

In [None]:
%%writefile geometry.py

import operator
import numpy as np
import cmath
from typing import List
from collections import namedtuple
import traceback
import sys


basis = np.array(
    [1, cmath.exp(2j * cmath.pi * 1 / 3), cmath.exp(2j * cmath.pi * 2 / 3)]
)


HistMatchResult = namedtuple("HistMatchResult", "idx length")


def find_all_longest(seq, max_len=None) -> List[HistMatchResult]:
    """
    Find all indices where end of `seq` matches some past.
    """
    result = []

    i_search_start = len(seq) - 2

    while i_search_start > 0:
        i_sub = -1
        i_search = i_search_start
        length = 0

        while i_search >= 0 and seq[i_sub] == seq[i_search]:
            length += 1
            i_sub -= 1
            i_search -= 1

            if max_len is not None and length > max_len:
                break

        if length > 0:
            result.append(HistMatchResult(i_search_start + 1, length))

        i_search_start -= 1

    result = sorted(result, key=operator.attrgetter("length"), reverse=True)

    return result


def probs_to_complex(p):
    return p @ basis


def _fix_probs(probs):
    """
    Put probs back into triangle. Sometimes this happens due to rounding errors or if you
    use complex numbers which are outside the triangle.
    """
    if min(probs) < 0:
        probs -= min(probs)

    probs /= sum(probs)

    return probs


def complex_to_probs(z):
    probs = (2 * (z * basis.conjugate()).real + 1) / 3
    probs = _fix_probs(probs)
    return probs


def z_from_action(action):
    return basis[action]


def sample_from_z(z):
    probs = complex_to_probs(z)
    return np.random.choice(3, p=probs)


def bound(z):
    return probs_to_complex(complex_to_probs(z))


def norm(z):
    return bound(z / abs(z))


class Pred:
    def __init__(self, *, alpha):
        self.offset = 0
        self.alpha = alpha
        self.last_feat = None

    def train(self, target):
        if self.last_feat is not None:
            # Complex factor that would have mapped the last feature onto the target
            offset = target * self.last_feat.conjugate()

            # Exponential moving average of that factor
            self.offset = (1 - self.alpha) * self.offset + self.alpha * offset

    def predict(self, feat):
        """
        `feat` is an arbitrary feature expressed as a point in the probability
        triangle over 0, 1, 2: anything that could serve as a useful anchor
        pointing in a sensible direction.
        """
        feat = norm(feat)

        # offset = mean(target * conj(feat)), the multiplicative analogue of
        # mean(target - feat): result = feat * offset rotates the feature toward
        # where the target has been relative to it, which accounts for the
        # correlation between target and feat.
        # All RPSContest bots do no more than that as their first step, just in a different way.
        result = feat * self.offset

        self.last_feat = feat

        return result
    
    
class BaseAgent:
    def __init__(self):
        self.my_hist = []
        self.opp_hist = []
        self.my_opp_hist = []
        self.outcome_hist = []
        self.step = None

    def __call__(self, obs, conf):
        try:
            if obs.step == 0:
                action = np.random.choice(3)
                self.my_hist.append(action)
                return action

            self.step = obs.step

            opp = int(obs.lastOpponentAction)
            my = self.my_hist[-1]

            self.my_opp_hist.append((my, opp))
            self.opp_hist.append(opp)

            outcome = {0: 0, 1: 1, 2: -1}[(my - opp) % 3]
            self.outcome_hist.append(outcome)

            action = self.action()

            self.my_hist.append(action)

            return action
        except Exception:
            traceback.print_exc(file=sys.stderr)
            raise

    def action(self):
        pass


class Agent(BaseAgent):
    def __init__(self, alpha=0.01):
        super().__init__()

        self.predictor = Pred(alpha=alpha)

    def action(self):
        self.train()

        pred = self.preds()

        return_action = sample_from_z(pred)

        return return_action

    def train(self):
        last_beat_opp = z_from_action((self.opp_hist[-1] + 1) % 3)
        self.predictor.train(last_beat_opp)

    def preds(self):
        hist_match = find_all_longest(self.my_opp_hist, max_len=20)

        if not hist_match:
            return 0

        feat = z_from_action(self.opp_hist[hist_match[0].idx])

        pred = self.predictor.predict(feat)

        return pred
    
    
agent = Agent()


def call_agent(obs, conf):
    return agent(obs, conf)
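
Geometry works in the complex plane: a probability vector over rock/paper/scissors maps to the point `p @ basis`, where the three moves sit at the cube roots of unity, and `complex_to_probs` inverts that map. Because `basis[a] * basis[1] == basis[(a + 1) % 3]`, multiplying by a unit complex number rotates a prediction toward the next move, which is what the learned `offset` in `Pred` does. A small numerical check of both facts, plus an exponential-moving-average offset converging to that rotation for a toy opponent whose target is always one step ahead of the feature (this toy setup is an assumption for illustration).

```python
import cmath

import numpy as np

# Rock, paper, scissors placed at the three cube roots of unity.
basis = np.array([1, cmath.exp(2j * cmath.pi / 3), cmath.exp(4j * cmath.pi / 3)])


def probs_to_complex(p):
    return np.asarray(p, dtype=float) @ basis


def complex_to_probs(z):
    # Inverse map for points inside the triangle (no clipping, unlike _fix_probs).
    return (2 * (z * basis.conjugate()).real + 1) / 3


# 1) Round trip: probabilities -> complex point -> probabilities.
p = np.array([0.5, 0.3, 0.2])
round_trip = complex_to_probs(probs_to_complex(p))

# 2) Multiplying by basis[1] (a 120-degree turn) advances every move by one.
rotated = [int(np.argmax(complex_to_probs(basis[a] * basis[1]))) for a in range(3)]

# 3) EMA of target * conj(feat), as in Pred.train, when the target is always
#    the move after the feature: the offset approaches basis[1].
alpha, offset = 0.1, 0j
for _ in range(200):
    feat = basis[0]
    target = basis[1]
    offset = (1 - alpha) * offset + alpha * target * feat.conjugate()
predicted_probs = complex_to_probs(basis[0] * offset)
```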

In [None]:
%%writefile "anti_geo.py"

import operator
import numpy as np
import cmath
from collections import namedtuple

basis = np.array([1, cmath.exp(2j * cmath.pi * 1 / 3), cmath.exp(2j * cmath.pi * 2 / 3)])
HistMatchResult = namedtuple("HistMatchResult", "idx length")

def find_all_longest(seq, max_len=None):
    result = []
    i_search_start = len(seq) - 2
    while i_search_start > 0:
        i_sub = -1
        i_search = i_search_start
        length = 0
        while i_search >= 0 and seq[i_sub] == seq[i_search]:
            length += 1
            i_sub -= 1
            i_search -= 1
            if max_len is not None and length > max_len: break
        if length > 0: result.append(HistMatchResult(i_search_start + 1, length))
        i_search_start -= 1

    return sorted(result, key=operator.attrgetter("length"), reverse=True)

def complex_to_probs(z):
    probs = (2 * (z * basis.conjugate()).real + 1) / 3
    if min(probs) < 0: probs -= min(probs)
    return probs / sum(probs)

opp_hist = []
my_opp_hist = []
offset = 0
last_feat = None

def agent(obs, conf):
    global action, opp_hist, my_opp_hist, offset, last_feat

    if obs.step == 0:
        action = np.random.choice(3)
    else:
        # Mirrored view: replay the geometry agent's logic from the opponent's seat,
        # so `opp_hist` holds *our* moves and pairs are stored as (opponent, us).
        my_opp_hist.append((obs.lastOpponentAction, action))
        opp_hist.append(action)

        # Same exponential moving average of the offset as geometry's Pred.train (alpha = 0.01)
        if last_feat is not None:
            this_offset = (basis[(opp_hist[-1] + 1) % 3]) * last_feat.conjugate()
            offset = (1 - .01) * offset + .01 * this_offset

        hist_match = find_all_longest(my_opp_hist, 20)
        if not hist_match:
            pred = 0
        else:
            feat = basis[opp_hist[hist_match[0].idx]]
            last_feat = complex_to_probs(feat / abs(feat)) @ basis
            # Extra rotation by 1/9 of a turn on top of geometry's own prediction
            pred = last_feat * offset * cmath.exp(2j * cmath.pi * 1/9)

        # Estimate geometry's next move, then play the move that beats it
        probs = complex_to_probs(pred)
        if probs[np.argmax(probs)] > .334:
            action = (int(np.argmax(probs))+1)%3
        else:
            action = (np.random.choice(3, p=probs)+1)%3

    return action

In [None]:
%%writefile "anti_otm.py"

import numpy as np
import pandas as pd
import random

T = np.zeros((3, 3))
P = np.zeros((3, 3))

a1, a2 = None, None
last_action = None # track my action.


###########################################
# Original agent with modifications marked ->
###########################################

def anti_transition_agent(observation, configuration):
    global T, P, a1, a2, last_action
    if observation.step > 1:
        a1 = last_action   # on me only; take mirrored view on game
        T[a2, a1] += 1
        P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))
        a2 = a1
        if np.sum(P[a1, :]) == 1:
            probs = P[a1,:]
            
            probs += 0.63 * np.roll(probs, 1)    # This is the magic addition of phase
            
            result = (int(probs.argmax()) + 1) % 3   # Changed to argmax instead of stochastic
        else:
            result = int(np.random.randint(3))
    else:
        if observation.step == 1:
            a2 = last_action    # on me only
        result = int(np.random.randint(3))
        
    result = (result + 1) % 3  # beat what he would have done
        
    last_action = result
        
    return result
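
The anti-transition agent builds a first-order transition matrix: `T[a2, a1]` counts how often one move followed another, and `P` normalizes each row into probabilities. Note that it tracks its *own* moves (`last_action`), i.e. it models how a transition-matrix opponent would read our play. A minimal sketch of the counting and argmax prediction on a plain move list, without the phase tweak (`0.63 * np.roll(...)`) or the final counter-move; the helper names are hypothetical.

```python
import numpy as np


def transition_matrix(moves):
    """Row-normalized counts of move[t] -> move[t+1] transitions."""
    T = np.zeros((3, 3))
    for prev, nxt in zip(moves[:-1], moves[1:]):
        T[prev, nxt] += 1
    # Same normalization as the agent: rows with no data stay all-zero.
    return T / np.maximum(1, T.sum(axis=1)).reshape(-1, 1)


def predict_next(moves):
    """Most likely next move given the last move, or None without data."""
    if len(moves) < 2:
        return None
    row = transition_matrix(moves)[moves[-1]]
    if row.sum() == 0:
        return None
    return int(row.argmax())


cycle = [0, 1, 2, 0, 1, 2, 0]
P = transition_matrix(cycle)
```

For the cycle, every 0 has been followed by 1, so the prediction after the final 0 is 1, and a counter-move would be `(1 + 1) % 3 == 2`.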

This is the more comprehensive Multi-Layered Bandit by JamesMcGuigan: https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-stage-decision-tree.
I made some modifications to make it stronger (mainly by including some other top public agents). This is in fact close to my top agent on the leaderboard.

In [None]:
%%writefile new_mlb.py
##### ./memory_patterns.py #####


import random
import numpy as np 
import pandas as pd
from typing import List, Dict, Tuple, Any
from operator import itemgetter
from collections import defaultdict
import torch
from torch import nn, optim

from kaggle_environments import evaluate, make, utils
from kaggle_environments.envs.rps.utils import get_score
from kaggle_environments.envs.rps.agents import *

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]


class MemoryPatterns:
    def __init__(self, min_memory=2, max_memory=20, threshold=0.5, warmup=5, verbose=True):
        self.min_memory = min_memory
        self.max_memory = max_memory
        self.threshold  = threshold
        self.warmup     = warmup
        self.verbose    = verbose
        self.history = {
            "step":      [],
            "reward":    [],
            "opponent":  [],
            "pattern":   [],
            "action":    [],
            # "rotn_self": [],
            # "rotn_opp":  [],
        }
        pass
    
    def __call__(self, obs, conf):
        return self.agent(obs, conf)

    
    # obs  {'remainingOverageTime': 60, 'step': 1, 'reward': 0, 'lastOpponentAction': 0}
    # conf {'episodeSteps': 1000, 'actTimeout': 1, 'runTimeout': 1200, 'signs': 3, 'tieRewardThreshold': 20, 'agentTimeout': 60}
    def agent(self, obs, conf):
        self.obs  = obs
        self.conf = conf
        self.update_state(obs, conf)
        if obs.step < self.warmup:
            expected = self.random_action(obs, conf)
        else:
            for keys in [ ("opponent", "action"), ("opponent",) ]:
                # history  = self.generate_history(["opponent", "action"])  # "action" must be last
                history  = self.generate_history(["opponent"])  
                memories = self.build_memory(history) 
                patterns = self.find_patterns(history, memories)
                if len(patterns): break
            score, expected, pattern = self.find_best_pattern(patterns)
            self.history['pattern'].append(pattern)    

        action = (expected + 1) % conf.signs
        self.history['action'].append(action)
        
        return int(action) 
    
    
    def random_action(self, obs, conf) -> int:
        return random.randint(0, conf.signs-1)

    def sequential_action(self, obs, conf) -> int:
        return (obs.step + 1) % conf.signs

    
    def update_state(self, obs, conf):
        self.history['step'].append( obs.step )
        self.history['reward'].append( obs.reward )
        if obs.step != 0:
            self.history['opponent'].append( obs.lastOpponentAction )
            # rotn_self = (self.history['opponent'][-1] - self.history['opponent'][-2]) % conf.signs 
            # rotn_opp  = (self.history['opponent'][-1] - self.history['action'][-1]))  % conf.signs
            # self.history['rotn_self'].append( rotn_self )
            # self.history['rotn_opp'].append( rotn_opp )
        
        
    def generate_history(self, keys: List[str]) -> List[Tuple[int]]:
        # Reverse order to correctly match up arrays
        history = list(zip(*[ reversed(self.history[key]) for key in keys ]))
        history = list(reversed(history))
        return history
    
    
    def build_memory(self, history: List[Tuple[int]]) -> List[ Dict[Tuple[int], List[int]] ]:
        output    = [ dict() for _ in range(self.min_memory) ]  # placeholders so output[n] holds contexts of length n
        expecteds = self.generate_history(["opponent"])
        for batch_size in range(self.min_memory, self.max_memory+1):
            if batch_size >= len(history): break  # ignore batch sizes larger than history
            output_batch    = defaultdict(lambda: [0,0,0])
            history_batches  = list(batch(history, batch_size+1))
            expected_batches = list(batch(expecteds, batch_size+1))
            for pattern, expected_batch in zip(history_batches, expected_batches):
                previous_pattern = tuple(pattern[:-1])
                expected         = (expected_batch[-1][-1] or 0) % self.conf.signs  # opponent move that ended this chunk
                output_batch[ previous_pattern ][ expected ] += 1
            output.append( dict(output_batch) )
        return output

    
    def find_patterns(self, history: List[Tuple[int]], memories: List[ Dict[Tuple[int], List[int]] ]) -> List[Tuple[float, int, Tuple[int]]]:
        patterns = []
        for n in range(1, self.max_memory+1):
            if n >= len(history): break
                
            pattern = tuple(history[-n:])
            if pattern in memories[n]:
                score    = np.std(memories[n][pattern])
                expected = np.argmax(memories[n][pattern])
                patterns.append( (score, expected, pattern) )
        patterns = sorted(patterns, key=itemgetter(0), reverse=True)
        return patterns
    
    
    def find_best_pattern(self, patterns: List[Tuple[float, int, Tuple[int]]] ) -> Tuple[float, int, Tuple[int]]:
        patterns       = sorted(patterns, key=itemgetter(0), reverse=True)
        pattern_scores = self.get_pattern_scores()
        for (score, expected, pattern) in patterns:
            break
            # if pattern in pattern_scores:
            #     if pattern_scores[pattern] > self.threshold:
            #         break
            #     else:
            #         expected += 1
            #         break
            # else:
            #     break
        else:
            score    = 0.0
            expected = self.random_action(self.obs, self.conf)
            pattern  = tuple()
        return score, expected, pattern
    
    
    def get_pattern_scores(self):
        pattern_rewards = defaultdict(list)
        for reward, pattern in self.generate_history(["reward", "pattern"]):
            pattern_rewards[pattern].append( reward )
        pattern_scores = { pattern: np.mean(rewards) for pattern, rewards in pattern_rewards.items() }
        return pattern_scores
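
# Illustrative sketch (hypothetical helper, not used by any agent in this file):
# MemoryPatterns.generate_history() lines up history lists of different lengths
# by their most recent entries. Reversing each list before zip() makes zip()
# drop the *oldest* surplus items; reversing the result restores time order.
def _align_from_end(*sequences):
    """Return tuples of items that sit at the same distance from the end."""
    return list(reversed(list(zip(*(reversed(seq) for seq in sequences)))))

# Usage: _align_from_end([1, 2, 3], [9, 8]) returns [(2, 9), (3, 8)]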
                    
            
            
instance = MemoryPatterns()
def memory_patterns(obs, conf):
    return instance(obs, conf)
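
# Illustrative sketch (hypothetical helper names, single history key, not used
# by the agents): MemoryPatterns.build_memory() cuts the history into
# non-overlapping chunks of length k + 1 and counts which move followed each
# k-move context; find_patterns() then looks up the latest k moves and scores
# a match by the standard deviation of its [rock, paper, scissors] counts, so a
# lopsided count such as [5, 0, 0] outranks a flat one such as [2, 2, 1].
# statistics.pstdev matches np.std with its default ddof=0.
import statistics
from collections import defaultdict


def _chunked_memory(moves, k, signs=3):
    """Map each context (up to k moves) to counts of the move that followed it."""
    memory = defaultdict(lambda: [0] * signs)
    for start in range(0, len(moves), k + 1):
        chunk = moves[start:start + k + 1]
        if len(chunk) < 2:  # a trailing chunk needs a context and a follow-up move
            continue
        memory[tuple(chunk[:-1])][chunk[-1]] += 1
    return dict(memory)


def _predict_from_memory(moves, k, signs=3):
    """Return (confidence, predicted_move) for the latest k moves, or None."""
    counts = _chunked_memory(moves, k, signs).get(tuple(moves[-k:]))
    if counts is None:
        return None
    best = max(range(signs), key=counts.__getitem__)  # first maximum, like np.argmax
    return statistics.pstdev(counts), best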



##### ./reactionary.py #####

import random
from kaggle_environments.envs.rps.utils import get_score


last_react_action = None


def reactionary(observation, configuration):
    global last_react_action
    if observation.step == 0:
        last_react_action = random.randrange(0, configuration.signs)
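    # get_score() returns -1, 0 or 1, so this condition is always true and the
    # agent always plays the move that beats the opponent's previous move.
    elif get_score(last_react_action, observation.lastOpponentAction) <= 1:
        last_react_action = (observation.lastOpponentAction + 1) % configuration.signs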

    return last_react_action
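
# Illustrative sketch (hypothetical helpers, not used by the agents): with
# 0 = rock, 1 = paper and 2 = scissors, move (m + 1) % 3 beats m, which is why
# the agents in this file answer a predicted move with `(move + 1) % signs`.
# The same modular difference scores a round from the first player's view:
# (a - b) % 3 is 1 for a win, 2 for a loss and 0 for a tie.
def _counter_move(move, signs=3):
    """Return the move that beats `move`."""
    return (move + 1) % signs


def _round_outcome(mine, theirs):
    """Return +1 if `mine` beats `theirs`, -1 if it loses, 0 for a tie (3 signs)."""
    return {0: 0, 1: 1, 2: -1}[(mine - theirs) % 3]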


##### ./decision_tree_3.py #####


import time
import os
import random
import numpy as np
from typing import List, Dict
from sklearn.tree import DecisionTreeClassifier

def random_agent(observation, configuration):
    return random.randint(0, configuration.signs-1)

def rock_agent(observation, configuration):
    return 0

def paper_agent(observation, configuration):
    return 1

def scissors_agent(observation, configuration):
    return 2

def sequential_agent(observation, configuration):
    return observation.step % configuration.signs



def get_winstats(decision_tree_history_2) -> Dict[str,int]:
    # Compare moves with modular arithmetic so wrap-arounds such as
    # rock (0) beating scissors (2) are counted correctly.
    total = len(decision_tree_history_2['action'])
    wins = 0
    draw = 0
    loss = 0 
    for n in range(total):
        diff = (decision_tree_history_2['action'][n] - decision_tree_history_2['opponent'][n]) % 3
        if   diff == 1: wins += 1
        elif diff == 0: draw += 1
        else:           loss += 1
    return { "wins": wins, "draw": draw, "loss": loss }

def get_winrate(decision_tree_history_2):
    winstats = get_winstats(decision_tree_history_2)
    winrate  = winstats['wins'] / (winstats['wins'] + winstats['loss']) if (winstats['wins'] + winstats['loss']) else 0
    return winrate
    
    
# Initialize the module-level history that persists between calls
decision_tree_history_2 = {
    "step":        [],
    "prediction1": [],
    "prediction2": [],
    "expected":    [],
    "action":      [],
    "opponent":    [],
}

# NOTE: adding statistics causes the DecisionTree to make random moves 
def get_statistics(values) -> List[float]:
    values = np.array(values)
    return [
        np.count_nonzero(values == n) / len(values)
        if len(values) else 0.0
        for n in [0,1,2]
    ]


# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def decision_tree_agent_3(observation, configuration, window=5, stages=2, random_freq=0.66, warmup_period=10, max_samples=1000):    
    global decision_tree_history_2
    # To skip the warmup when running interactively, set warmup_period to 0 if
    # os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '') == 'Interactive'
    models          = [ None ] + [ DecisionTreeClassifier() for _ in range(stages) ]
    
    time_start      = time.perf_counter()
    actions         = list(range(configuration.signs))  # [0,1,2]
    
    step            = observation.step
    last_action     = decision_tree_history_2['action'][-1]          if len(decision_tree_history_2['action']) else 2
    opponent_action = observation.lastOpponentAction if observation.step > 0   else 2
        
    if observation.step > 0:
        decision_tree_history_2['opponent'].append(opponent_action)
        
    winrate  = get_winrate(decision_tree_history_2)
    winstats = get_winstats(decision_tree_history_2)
    
    # Set default values     
    prediction1 = random.randint(0,2)
    prediction2 = random.randint(0,2)
    prediction3 = random.randint(0,2)
    expected    = random.randint(0,2)

    # We need at least `window` turns of history for DecisionTreeClassifier to work
    if observation.step >= window:
        # First we try to predict the opponent's next move from the move history
        # TODO: create windowed history
        try:
            n_start = max(1, len(decision_tree_history_2['opponent']) - window - max_samples) 
            if stages >= 1:
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['opponent'][n:n+window]
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:] + [ last_action ], 
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[1].fit(X, Y)
                expected = prediction1 = models[1].predict(Z)[0]

            if stages >= 2:
                # Stage 2: retrain with the stage-1 predictions as extra features
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['prediction1'][n:n+window],
                        decision_tree_history_2['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['prediction1']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_2['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[2].fit(X, Y)
                expected = prediction2 = models[2].predict(Z)[0]

            if stages >= 3:
                # Stage 3: retrain with the stage-1 and stage-2 predictions as extra features
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction2'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['prediction1'][n:n+window],
                        decision_tree_history_2['prediction2'][n:n+window],
                        decision_tree_history_2['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['prediction1']),
                    # get_statistics(decision_tree_history_2['prediction2']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_2['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_2['prediction2'][-window+1:] + [ prediction2 ],
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[3].fit(X, Y)
                expected = prediction3 = models[3].predict(Z)[0]
        
        except Exception as exception:
            pass
                    
    # During the warmup period, play random to get a feel for the opponent 
    if (observation.step <= max(warmup_period,window)):
        actor  = 'warmup'
        action = random_agent(observation, configuration)    
    
    # With probability random_freq (0.66 by default), play a purely random move, which will hopefully distort any opponent statistics
    elif (random.random() <= random_freq):
        actor  = 'random'
        action = random_agent(observation, configuration)
        
    # Otherwise play the move that beats the DecisionTreeClassifier prediction
    else:
        actor  = 'DecisionTree'
        action = (expected + 1) % configuration.signs
    
    # Persist state
    decision_tree_history_2['step'].append(step)
    decision_tree_history_2['prediction1'].append(prediction1)
    decision_tree_history_2['prediction2'].append(prediction2)
    decision_tree_history_2['expected'].append(expected)
    decision_tree_history_2['action'].append(action)
    if observation.step == 0:  # keep arrays equal length
        decision_tree_history_2['opponent'].append(random.randint(0, 2))


    # Elapsed time for this move (kept for debugging; not used below)
    time_taken = time.perf_counter() - time_start
    return int(action)
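
# Illustrative sketch (hypothetical helper, stage 1 only, not used by the
# agents): decision_tree_agent_3 turns the history into a supervised problem.
# Each row concatenates a window of our past actions with the matching window
# of opponent moves, and its label is the opponent move that followed the
# window; the agent fits DecisionTreeClassifier on such rows and predicts from
# a row built from the most recent moves. Unlike the agent, this sketch
# ignores `max_samples`, `warmup_period` and the later stages.
def _window_dataset(actions, opponents, window):
    """Return (rows, labels); each row holds 2 * window features."""
    rows, labels = [], []
    for n in range(len(opponents) - window):
        rows.append(list(actions[n:n + window]) + list(opponents[n:n + window]))
        labels.append(opponents[n + window])
    return rows, labels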



##### ./testing_please_ignore.py #####

code_ignore = compile(
    """
from collections import defaultdict
import operator
import random
if input == "":
    score  = {'RR': 0, 'PP': 0, 'SS': 0, \
              'PR': 1, 'RS': 1, 'SP': 1, \
              'RP': -1, 'SR': -1, 'PS': -1,}
    cscore = {'RR': 'r', 'PP': 'r', 'SS': 'r', \
              'PR': 'b', 'RS': 'b', 'SP': 'b', \
              'RP': 'c', 'SR': 'c', 'PS': 'c',}
    beat = {'P': 'S', 'S': 'R', 'R': 'P'}
    cede = {'P': 'R', 'S': 'P', 'R': 'S'}
    rps = ['R', 'P', 'S']
    wlt = {1: 0, -1: 1, 0: 2}

    def counter_prob(probs):
        weighted_list = []
        for h in rps:
            weighted = 0
            for p in probs.keys():
                points = score[h + p]
                prob = probs[p]
                weighted += points * prob
            weighted_list.append((h, weighted))

        return max(weighted_list, key=operator.itemgetter(1))[0]

    played_probs = defaultdict(lambda: 1)
    dna_probs = [
        defaultdict(lambda: defaultdict(lambda: 1)) for i in range(18)
    ]

    wlt_probs = [defaultdict(lambda: 1) for i in range(9)]

    answers = [{'c': 1, 'b': 1, 'r': 1} for i in range(12)]

    patterndict = [defaultdict(str) for i in range(6)]

    consec_strat_usage = [[0] * 6, [0] * 6,
                          [0] * 6]  #consecutive strategy usage
    consec_strat_candy = [[], [], []]  #consecutive strategy candidates

    output = random.choice(rps)
    histories = ["", "", ""]
    dna = ["" for i in range(12)]

    sc = 0
    strats = [[] for i in range(3)]
else:
    prev_sc = sc

    sc = score[output + input]
    for j in range(3):
        prev_strats = strats[j][:]
        for i, c in enumerate(consec_strat_candy[j]):
            if c == input:
                consec_strat_usage[j][i] += 1
            else:
                consec_strat_usage[j][i] = 0
        m = max(consec_strat_usage[j])
        strats[j] = [
            i for i, c in enumerate(consec_strat_candy[j])
            if consec_strat_usage[j][i] == m
        ]

        for s1 in prev_strats:
            for s2 in strats[j]:
                wlt_probs[j * 3 + wlt[prev_sc]][chr(s1) + chr(s2)] += 1

        if dna[2 * j + 0] and dna[2 * j + 1]:
            answers[2 * j + 0][cscore[input + dna[2 * j + 0]]] += 1
            answers[2 * j + 1][cscore[input + dna[2 * j + 1]]] += 1
        if dna[2 * j + 6] and dna[2 * j + 7]:
            answers[2 * j + 6][cscore[input + dna[2 * j + 6]]] += 1
            answers[2 * j + 7][cscore[input + dna[2 * j + 7]]] += 1

        for length in range(min(10, len(histories[j])), 0, -2):
            pattern = patterndict[2 * j][histories[j][-length:]]
            if pattern:
                for length2 in range(min(10, len(pattern)), 0, -2):
                    patterndict[2 * j +
                                1][pattern[-length2:]] += output + input
            patterndict[2 * j][histories[j][-length:]] += output + input
    played_probs[input] += 1
    dna_probs[0][dna[0]][input] += 1
    dna_probs[1][dna[1]][input] += 1
    dna_probs[2][dna[1] + dna[0]][input] += 1
    dna_probs[9][dna[6]][input] += 1
    dna_probs[10][dna[6]][input] += 1
    dna_probs[11][dna[7] + dna[6]][input] += 1

    histories[0] += output + input
    histories[1] += input
    histories[2] += output

    dna = ["" for i in range(12)]
    for j in range(3):
        for length in range(min(10, len(histories[j])), 0, -2):
            pattern = patterndict[2 * j][histories[j][-length:]]
            if pattern != "":
                dna[2 * j + 1] = pattern[-2]
                dna[2 * j + 0] = pattern[-1]
                for length2 in range(min(10, len(pattern)), 0, -2):
                    pattern2 = patterndict[2 * j + 1][pattern[-length2:]]
                    if pattern2 != "":
                        dna[2 * j + 7] = pattern2[-2]
                        dna[2 * j + 6] = pattern2[-1]
                        break
                break

    probs = {}
    for hand in rps:
        probs[hand] = played_probs[hand]

    for j in range(3):
        if dna[j * 2] and dna[j * 2 + 1]:
            for hand in rps:
                probs[hand] *= dna_probs[j*3+0][dna[j*2+0]][hand] * \
                               dna_probs[j*3+1][dna[j*2+1]][hand] * \
                      dna_probs[j*3+2][dna[j*2+1]+dna[j*2+0]][hand]
                probs[hand] *= answers[j*2+0][cscore[hand+dna[j*2+0]]] * \
                               answers[j*2+1][cscore[hand+dna[j*2+1]]]
            consec_strat_candy[j] = [dna[j*2+0], beat[dna[j*2+0]], cede[dna[j*2+0]],\
                                     dna[j*2+1], beat[dna[j*2+1]], cede[dna[j*2+1]]]
            strats_for_hand = {'R': [], 'P': [], 'S': []}
            for i, c in enumerate(consec_strat_candy[j]):
                strats_for_hand[c].append(i)
            pr = wlt_probs[wlt[sc] + 3 * j]
            for hand in rps:
                for s1 in strats[j]:
                    for s2 in strats_for_hand[hand]:
                        probs[hand] *= pr[chr(s1) + chr(s2)]
        else:
            consec_strat_candy[j] = []
    for j in range(3):
        if dna[j * 2 + 6] and dna[j * 2 + 7]:
            for hand in rps:
                probs[hand] *= dna_probs[j*3+9][dna[j*2+6]][hand] * \
                               dna_probs[j*3+10][dna[j*2+7]][hand] * \
                      dna_probs[j*3+11][dna[j*2+7]+dna[j*2+6]][hand]
                probs[hand] *= answers[j*2+6][cscore[hand+dna[j*2+6]]] * \
                               answers[j*2+7][cscore[hand+dna[j*2+7]]]

    output = counter_prob(probs)
""", '<string>', 'exec')
gg_ignore = {}


def testing_please_ignore(observation, configuration):
    global gg_ignore
    global code_ignore
    inp = ''
    try:
        inp = 'RPS'[observation.lastOpponentAction]
    except Exception:  # e.g. no previous opponent move on the first step
        pass
    gg_ignore['input'] = inp
    exec(code_ignore, gg_ignore)
    return {'R': 0, 'P': 1, 'S': 2}[gg_ignore['output']]
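
# Illustrative sketch (hypothetical names, not used by the agents):
# testing_please_ignore compiles the bot's source once and exec()s it every
# turn against the same dict, so any globals the script assigns (counters,
# histories, `output`) persist between turns. The caller writes the
# opponent's last move into `input` and reads the reply from `output`.
# A toy script that counts turns and plays the move that beats the last one:
_toy_code = compile(
    """
if input == "":
    turns = 0
    output = "R"
else:
    output = {"R": "P", "P": "S", "S": "R"}[input]
turns += 1
""",
    "<string>",
    "exec",
)


def _run_toy_turn(namespace, last_opponent_move):
    """Run one turn of the compiled script inside a persistent namespace."""
    namespace["input"] = last_opponent_move
    exec(_toy_code, namespace)
    return namespace["output"]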



##### ./pi.py #####

import re

PI = "3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 82148 08651 32823 06647 09384 46095 50582 23172 53594 08128 48111 74502 84102 70193 85211 05559 64462 29489 54930 38196 44288 10975 66593 34461 28475 64823 37867 83165 27120 19091 45648 56692 34603 48610 45432 66482 13393 60726 02491 41273 72458 70066 06315 58817 48815 20920 96282 92540 91715 36436 78925 90360 01133 05305 48820 46652 13841 46951 94151 16094 33057 27036 57595 91953 09218 61173 81932 61179 31051 18548 07446 23799 62749 56735 18857 52724 89122 79381 83011 94912 98336 73362 44065 66430 86021 39494 63952 24737 19070 21798 60943 70277 05392 17176 29317 67523 84674 81846 76694 05132 00056 81271 45263 56082 77857 71342 75778 96091 73637 17872 14684 40901 22495 34301 46549 58537 10507 92279 68925 89235 42019 95611 21290 21960 86403 44181 59813 62977 47713 09960 51870 72113 49999 99837 29780 49951 05973 17328 16096 31859 50244 59455 34690 83026 42522 30825 33446 85035 26193 11881 71010 00313 78387 52886 58753 32083 81420 61717 76691 47303 59825 34904 28755 46873 11595 62863 88235 37875 93751 95778 18577 80532 17122 68066 13001 92787 66111 95909 21642 01989 38095 25720 10654 85863 27886 59361 53381 82796 82303 01952 03530 18529 68995 77362 25994 13891 24972 17752 83479 13151 55748 57242 45415 06959 50829 53311 68617 27855 88907 50983 81754 63746 49393 19255 06040 09277 01671 13900 98488 24012 85836 16035 63707 66010 47101 81942 95559 61989 46767 83744 94482 55379 77472 68471 04047 53464 62080 46684 25906 94912 93313 67702 89891 52104 75216 20569 66024 05803 81501 93511 25338 24300 35587 64024 74964 73263 91419 92726 04269 92279 67823 54781 63600 93417 21641 21992 45863 15030 28618 29745 55706 74983 85054 94588 58692 69956 90927 21079 75093 02955 32116 53449 87202 75596 02364 80665 49911 98818 34797 75356 63698 07426 54252 78625 51818 41757 46728 90977 77279 38000 81647 06001 61452 49192 17321 72147 72350 14144 19735 68548 16136 11573 
52552 13347 57418 49468 43852 33239 07394 14333 45477 62416 86251 89835 69485 56209 92192 22184 27255 02542 56887 67179 04946 01653 46680 49886 27232 79178 60857 84383 82796 79766 81454 10095 38837 86360 95068 00642 25125 20511 73929 84896 08412 84886 26945 60424 19652 85022 21066 11863 06744 27862 20391 94945 04712 37137 86960 95636 43719 17287 46776 46575 73962 41389 08658 32645 99581 33904 78027 59009 94657 64078 95126 94683 98352 59570 98258 22620 52248 94077 26719 47826 84826 01476 99090 26401 36394 43745 53050 68203 49625 24517 49399 65143 14298 09190 65925 09372 21696 46151 57098 58387 41059 78859 59772 97549 89301 61753 92846 81382 68683 86894 27741 55991 85592 52459 53959 43104 99725 24680 84598 72736 44695 84865 38367 36222 62609 91246 08051 24388 43904 51244 13654 97627 80797 71569 14359 97700 12961 60894 41694 86855 58484 06353 42207 22258 28488 64815 84560 28506 01684 27394 52267 46767 88952 52138 52254 99546 66727 82398 64565 96116 35488 62305 77456 49803 55936 34568 17432 41125 15076 06947 94510 96596 09402 52288 79710 89314 56691 36867 22874 89405 60101 50330 86179 28680 92087 47609 17824 93858 90097 14909 67598 52613 65549 78189 31297 84821 68299 89487 22658 80485 75640 14270 47755 51323 79641 45152 37462 34364 54285 84447 95265 86782 10511 41354 73573 95231 13427 16610 21359 69536 23144 29524 84937 18711 01457 65403 59027 99344 03742 00731 05785 39062 19838 74478 08478 48968 33214 45713 86875 19435 06430 21845 31910 48481 00537 06146 80674 91927 81911 97939 95206 14196 63428 75444 06437 45123 71819 21799 98391 01591 95618 14675 14269 12397 48940 90718 64942 31961 56794 52080 95146 55022 52316 03881 93014 20937 62137 85595 66389 37787 08303 90697 92077 34672 21825 62599 66150 14215 03068 03844 77345 49202 60541 46659 25201 49744 28507 32518 66600 21324 34088 19071 04863 31734 64965 14539 05796 26856 10055 08106 65879 69981 63574 73638 40525 71459 10289 70641 40110 97120 62804 39039 75951 56771 57700 42033 78699 36007 23055 87631 76359 42187 31251 
47120 53292 81918 26186 12586 73215 79198 41484 88291 64470 60957 52706 95722 09175 67116 72291 09816 90915 28017 35067 12748 58322 28718 35209 35396 57251 21083 57915 13698 82091 44421 00675 10334 67110 31412 67111 36990 86585 16398 31501 97016 51511 68517 14376 57618 35155 65088 49099 89859 98238 73455 28331 63550 76479 18535 89322 61854 89632 13293 30898 57064 20467 52590 70915 48141 65498 59461 63718 02709 81994 30992 44889 57571 28289 05923 23326 09729 97120 84433 57326 54893 82391 19325 97463 66730 58360 41428 13883 03203 82490 37589 85243 74417 02913 27656 18093 77344 40307 07469 21120 19130 20330 38019 76211 01100 44929 32151 60842 44485 96376 69838 95228 68478 31235 52658 21314 49576 85726 24334 41893 03968 64262 43410 77322 69780 28073 18915 44110 10446 82325 27162 01052 65227 21116 60396 66557 30925 47110 55785 37634 66820 65310 98965 26918 62056 47693 12570 58635 66201 85581 00729 36065 98764 86117 91045 33488 50346 11365 76867 53249 44166 80396 26579 78771 85560 84552 96541 26654 08530 61434 44318 58676 97514 56614 06800 70023 78776 59134 40171 27494 70420 56223 05389 94561 31407 11270 00407 85473 32699 39081 45466 46458 80797 27082 66830 63432 85878 56983 05235 80893 30657 57406 79545 71637 75254 20211 49557 61581 40025 01262 28594 13021 64715 50979 25923 09907 96547 37612 55176 56751 35751 78296 66454 77917 45011 29961 48903 04639 94713 29621 07340 43751 89573 59614 58901 93897 13111 79042 97828 56475 03203 19869 15140 28708 08599 04801 09412 14722 13179 47647 77262 24142 54854 54033 21571 85306 14228 81375 85043 06332 17518 29798 66223 71721 59160 77166 92547 48738 98665 49494 50114 65406 28433 66393 79003 97692 65672 14638 53067 36096 57120 91807 63832 71664 16274 88880 07869 25602 90228 47210 40317 21186 08204 19000 42296 61711 96377 92133 75751 14959 50156 60496 31862 94726 54736 42523 08177 03675 15906 73502 35072 83540 56704 03867 43513 62222 47715 89150 49530 98444 89333 09634 08780 76932 59939 78054 19341 44737 74418 42631 29860 80998 88687 
41326 04721 56951 62396 58645 73021 63159 81931 95167 35381 29741 67729 47867 24229 24654 36680 09806 76928 23828 06899 64004 82435 40370 14163 14965 89794 09243 23789 69070 69779 42236 25082 21688 95738 37986 23001 59377 64716 51228 93578 60158 81617 55782 97352 33446 04281 51262 72037 34314 65319 77774 16031 99066 55418 76397 92933 44195 21541 34189 94854 44734 56738 31624 99341 91318 14809 27777 10386 38773 43177 20754 56545 32207 77092 12019 05166 09628 04909 26360 19759 88281 61332 31666 36528 61932 66863 36062 73567 63035 44776 28035 04507 77235 54710 58595 48702 79081 43562 40145 17180 62464 36267 94561 27531 81340 78330 33625 42327 83944 97538 24372 05835 31147 71199 26063 81334 67768 79695 97030 98339 13077 10987 04085 91337 46414 42822 77263 46594 70474 58784 77872 01927 71528 07317 67907 70715 72134 44730 60570 07334 92436 93113 83504 93163 12840 42512 19256 51798 06941 13528 01314 70130 47816 43788 51852 90928 54520 11658 39341 96562 13491 43415 95625 86586 55705 52690 49652 09858 03385 07224 26482 93972 85847 83163 05777 75606 88876 44624 82468 57926 03953 52773 48030 48029 00587 60758 25104 74709 16439 61362 67604 49256 27420 42083 20856 61190 62545 43372 13153 59584 50687 72460 29016 18766 79524 06163 42522 57719 54291 62991 93064 55377 99140 37340 43287 52628 88963 99587 94757 29174 64263 57455 25407 90914 51357 11136 94109 11939 32519 10760 20825 20261 87985 31887 70584 29725 91677 81314 96990 09019 21169 71737 27847 68472 68608 49003 37702 42429 16513 00500 51683 23364 35038 95170 29893 92233 45172 20138 12806 96501 17844 08745 19601 21228 59937 16231 30171 14448 46409 03890 64495 44400 61986 90754 85160 26327 50529 83491 87407 86680 88183 38510 22833 45085 04860 82503 93021 33219 71551 84306 35455 00766 82829 49304 13776 55279 39751 75461 39539 84683 39363 83047 46119 96653 85815 38420 56853 38621 86725 23340 28308 71123 28278 92125 07712 62946 32295 63989 89893 58211 67456 27010 21835 64622 01349 67151 88190 97303 81198 00497 34072 39610 36854 
06643 19395 09790 19069 96395 52453 00545 05806 85501 95673 02292 19139 33918 56803 44903 98205 95510 02263 53536 19204 19947 45538 59381 02343 95544 95977 83779 02374 21617 27111 72364 34354 39478 22181 85286 24085 14006 66044 33258 88569 86705 43154 70696 57474 58550 33232 33421 07301 54594 05165 53790 68662 73337 99585 11562 57843 22988 27372 31989 87571 41595 78111 96358 33005 94087 30681 21602 87649 62867 44604 77464 91599 50549 73742 56269 01049 03778 19868 35938 14657 41268 04925 64879 85561 45372 34786 73303 90468 83834 36346 55379 49864 19270 56387 29317 48723 32083 76011 23029 91136 79386 27089 43879 93620 16295 15413 37142 48928 30722 01269 01475 46684 76535 76164 77379 46752 00490 75715 55278 19653 62132 39264 06160 13635 81559 07422 02020 31872 77605 27721 90055 61484 25551 87925 30343 51398 44253 22341 57623 36106 42506 39049 75008 65627 10953 59194 65897 51413 10348 22769 30624 74353 63256 91607 81547 81811 52843 66795 70611 08615 33150 44521 27473 92454 49454 23682 88606 13408 41486 37767 00961 20715 12491 40430 27253 86076 48236 34143 34623 51897 57664 52164 13767 96903 14950 19108 57598 44239 19862 91642 19399 49072 36234 64684 41173 94032 65918 40443 78051 33389 45257 42399 50829 65912 28508 55582 15725 03107 12570 12668 30240 29295 25220 11872 67675 62204 15420 51618 41634 84756 51699 98116 14101 00299 60783 86909 29160 30288 40026 91041 40792 88621 50784 24516 70908 70006 99282 12066 04183 71806 53556 72525 32567 53286 12910 42487 76182 58297 65157 95984 70356 22262 93486 00341 58722 98053 49896 50226 29174 87882 02734 20922 22453 39856 26476 69149 05562 84250 39127 57710 28402 79980 66365 82548 89264 88025 45661 01729 67026 64076 55904 29099 45681 50652 65305 37182 94127 03369 31378 51786 09040 70866 71149 65583 43434 76933 85781 71138 64558 73678 12301 45876 87126 60348 91390 95620 09939 36103 10291 61615 28813 84379 09904 23174 73363 94804 57593 14931 40529 76347 57481 19356 70911 01377 51721 00803 15590 24853 09066 92037 67192 20332 29094 
33467 68514 22144 77379 39375 17034 43661 99104 03375 11173 54719 18550 46449 02636 55128 16228 82446 25759 16333 03910 72253 83742 18214 08835 08657 39177 15096 82887 47826 56995 99574 49066 17583 44137 52239 70968 34080 05355 98491 75417 38188 39994 46974 86762 65516 58276 58483 58845 31427 75687 90029 09517 02835 29716 34456 21296 40435 23117 60066 51012 41200 65975 58512 76178 58382 92041 97484 42360 80071 93045 76189 32349 22927 96501 98751 87212 72675 07981 25547 09589 04556 35792 12210 33346 69749 92356 30254 94780 24901 14195 21238 28153 09114 07907 38602 51522 74299 58180 72471 62591 66854 51333 12394 80494 70791 19153 26734 30282 44186 04142 63639 54800 04480 02670 49624 82017 92896 47669 75831 83271 31425 17029 69234 88962 76684 40323 26092 75249 60357 99646 92565 04936 81836 09003 23809 29345 95889 70695 36534 94060 34021 66544 37558 90045 63288 22505 45255 64056 44824 65151 87547 11962 18443 96582 53375 43885 69094 11303 15095 26179 37800 29741 20766 51479 39425 90298 96959 46995 56576 12186 56196 73378 62362 56125 21632 08628 69222 10327 48892 18654 36480 22967 80705 76561 51446 32046 92790 68212 07388 37781 42335 62823 60896 32080 68222 46801 22482 61177 18589 63814 09183 90367 36722 20888 32151 37556 00372 79839 40041 52970 02878 30766 70944 47456 01345 56417 25437 09069 79396 12257 14298 94671 54357 84687 88614 44581 23145 93571 98492 25284 71605 04922 12424 70141 21478 05734 55105 00801 90869 96033 02763 47870 81081 75450 11930 71412 23390 86639 38339 52942 57869 05076 43100 63835 19834 38934 15961 31854 34754 64955 69781 03829 30971 64651 43840 70070 73604 11237 35998 43452 25161 05070 27056 23526 60127 64848 30840 76118 30130 52793 20542 74628 65403 60367 45328 65105 70658 74882 25698 15793 67897 66974 22057 50596 83440 86973 50201 41020 67235 85020 07245 22563 26513 41055 92401 90274 21624 84391 40359 98953 53945 90944 07046 91209 14093 87001 26456 00162 37428 80210 92764 57931 06579 22955 24988 72758 46101 26483 69998 92256 95968 81592 05600 
10165 52563 7567"
PI = re.sub('[^1-9]', '', PI)
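
# --- Illustrative sketch (hypothetical helper, not used by pi_agent) ---
# pi_agent plays the digits of pi in order, mapping each digit to a move with
# `digit % signs`. Note that the regex above keeps only the characters 1-9, so every
# 0 in the expansion is dropped along with the spaces and quotes. This helper
# reproduces that mapping on an arbitrary digit string so it can be checked offline.
import re

def example_digits_to_moves(digits, signs=3):
    """Map a digit string to RPS moves the same way pi_agent does."""
    cleaned = re.sub('[^1-9]', '', digits)  # same filter as PI above: zeros are dropped too
    return [int(d) % signs for d in cleaned]

# Usage: example_digits_to_moves("3.1415 9265 0") -> [0, 1, 1, 1, 2, 0, 2, 0, 2]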

# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def pi_agent(observation, configuration):    
    action = int(PI[observation.step]) % configuration.signs
    return int(action)



##### ./statistical.py #####


import random
from collections import Counter

# Create a small amount of starting statistical_history
statistical_history = {
    "guess":      [0,1,2],
    "prediction": [0,1,2],
    "expected":   [0,1,2],
    "action":     [1,2,0],
    "opponent":   [0,1],
    "rotn":       [0,1],
}
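
# --- Illustrative sketch (hypothetical helpers, not used by the agent below) ---
# statistical_prediction_agent models the opponent's *rotation*, i.e. how far the
# opponent shifts from their previous move (mod 3), instead of the raw move. It then
# predicts the next move as "last move + expected rotation" and plays the move that
# beats it (+1). It also plays a purely random move with a probability that grows
# linearly from 0 at step 0 towards 0.5 by the end of the episode.
# Assumption: the "most common rotation" rule below is a simplification of the
# agent's weighted random sampling, kept deterministic so it is easy to test.
from collections import Counter

def example_rotations(opponent_moves, signs=3):
    """Rotation (mod `signs`) between each pair of consecutive opponent moves."""
    return [(b - a) % signs for a, b in zip(opponent_moves, opponent_moves[1:])]

def example_counter_move(opponent_moves, signs=3):
    """Beat the opponent's predicted next move, using their most common rotation."""
    rotations = example_rotations(opponent_moves, signs)
    expected = Counter(rotations).most_common(1)[0][0] if rotations else 0
    return (opponent_moves[-1] + expected + 1) % signs

def example_pure_random_chance(step, episode_steps=1000):
    """Same schedule as pure_random_chance in the agent: 0.0 at step 0, approaching 0.5 at the end."""
    return step / (episode_steps * 2)

# Usage: an opponent cycling rock -> paper -> scissors has rotation 1 every turn, so after
# [0, 1, 2, 0, 1] it is expected to play 2 (scissors) and the counter move is 0 (rock).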
# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 1000, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def statistical_prediction_agent(observation, configuration):    
    global statistical_history
    actions          = list(range(configuration.signs))  # [0,1,2]
    last_action      = statistical_history['action'][-1]
    prev_opp_action  = statistical_history['opponent'][-1]
    opponent_action  = observation.lastOpponentAction if observation.step > 0 else 2
    rotn             = (opponent_action - prev_opp_action) % configuration.signs

    statistical_history['opponent'].append(opponent_action)
    statistical_history['rotn'].append(rotn)
    
    # Make weighted random guess based on the complete move statistical_history, weighted towards relative moves based on our last action 
    move_frequency   = Counter(statistical_history['rotn'])
    action_frequency = Counter(zip(statistical_history['action'], statistical_history['rotn'])) 
    move_weights     = [   move_frequency.get(n, 1) 
                         + action_frequency.get((last_action,n), 1) 
                         for n in range(configuration.signs) ] 
    guess            = random.choices( population=actions, weights=move_weights, k=1 )[0]
    
    # Compare our guess to how our opponent actually played
    guess_frequency  = Counter(zip(statistical_history['guess'], statistical_history['rotn']))
    guess_weights    = [ guess_frequency.get((guess,n), 1) 
                         for n in range(configuration.signs) ]
    prediction       = random.choices( population=actions, weights=guess_weights, k=1 )[0]

    # Repeat, but based on how many times our prediction was correct
    pred_frequency   = Counter(zip(statistical_history['prediction'], statistical_history['rotn']))
    pred_weights     = [ pred_frequency.get((prediction,n), 1) 
                         for n in range(configuration.signs) ]
    expected         = random.choices( population=actions, weights=pred_weights, k=1 )[0]

    
    # Raise the chance of a purely random move linearly from 0% to 50% as the match progresses
    pure_random_chance = observation.step / (configuration.episodeSteps * 2)
    if random.random() < pure_random_chance:
        action = random.randint(0, configuration.signs-1)
        is_pure_random_chance = True
    else:
        # Play the +1 counter move
        # action = (expected + 1) % configuration.signs                  # without rotn
        action = (opponent_action + expected + 1) % configuration.signs  # using   rotn
        is_pure_random_chance = False
    
    # Persist state
    statistical_history['guess'].append(guess)
    statistical_history['prediction'].append(prediction)
    statistical_history['expected'].append(expected)
    statistical_history['action'].append(action)

    return action



##### ./dllu1.py #####

code_dllu1 = compile(
    """
# see also www.dllu.net/rps
# remember, rpsrunner.py is extremely useful for offline testing,
# here's a screenshot: http://i.imgur.com/DcO9M.png
import random
numPre = 30
numMeta = 6
if not input:
    limit = 8
    beat={'R':'P','P':'S','S':'R'}
    moves=['','','','']
    pScore=[[5]*numPre,[5]*numPre,[5]*numPre,[5]*numPre,[5]*numPre,[5]*numPre]
    centrifuge={'RP':0,'PS':1,'SR':2,'PR':3,'SP':4,'RS':5,'RR':6,'PP':7,'SS':8}
    centripete={'R':0,'P':1,'S':2}
    soma = [0,0,0,0,0,0,0,0,0];
    rps = [1,1,1];
    a="RPS"
    best = [0,0,0];
    length=0
    p=[random.choice("RPS")]*numPre
    m=[random.choice("RPS")]*numMeta
    mScore=[5,2,5,2,4,2]
else:
    for i in range(numPre):
        pp = p[i]
        bpp = beat[pp]
        bbpp = beat[bpp]
        pScore[0][i]=0.9*pScore[0][i]+((input==pp)-(input==bbpp))*3
        pScore[1][i]=0.9*pScore[1][i]+((output==pp)-(output==bbpp))*3
        pScore[2][i]=0.87*pScore[2][i]+(input==pp)*3.3-(input==bpp)*1.2-(input==bbpp)*2.3
        pScore[3][i]=0.87*pScore[3][i]+(output==pp)*3.3-(output==bpp)*1.2-(output==bbpp)*2.3
        pScore[4][i]=(pScore[4][i]+(input==pp)*3)*(1-(input==bbpp))
        pScore[5][i]=(pScore[5][i]+(output==pp)*3)*(1-(output==bbpp))
    for i in range(numMeta):
        mScore[i]=0.96*(mScore[i]+(input==m[i])-(input==beat[beat[m[i]]]))
    soma[centrifuge[input+output]] +=1;
    rps[centripete[input]] +=1;
    moves[0]+=str(centrifuge[input+output])
    moves[1]+=input
    moves[2]+=output
    length+=1
    for y in range(3):
        j=min([length,limit])
        while j>=1 and not moves[y][length-j:length] in moves[y][0:length-1]:
            j-=1
        i = moves[y].rfind(moves[y][length-j:length],0,length-1)
        p[0+2*y] = moves[1][j+i] 
        p[1+2*y] = beat[moves[2][j+i]]
    j=min([length,limit])
    while j>=2 and not moves[0][length-j:length-1] in moves[0][0:length-2]:
        j-=1
    i = moves[0].rfind(moves[0][length-j:length-1],0,length-2)
    if j+i>=length:
        p[6] = p[7] = random.choice("RPS")
    else:
        p[6] = moves[1][j+i] 
        p[7] = beat[moves[2][j+i]]
        
    best[0] = soma[centrifuge[output+'R']]*rps[0]/rps[centripete[output]]
    best[1] = soma[centrifuge[output+'P']]*rps[1]/rps[centripete[output]]
    best[2] = soma[centrifuge[output+'S']]*rps[2]/rps[centripete[output]]
    p[8] = p[9] = a[best.index(max(best))]
    
    for i in range(10,numPre):
        p[i]=beat[beat[p[i-10]]]
        
    for i in range(0,numMeta,2):
        m[i]=       p[pScore[i  ].index(max(pScore[i  ]))]
        m[i+1]=beat[p[pScore[i+1].index(max(pScore[i+1]))]]
output = beat[m[mScore.index(max(mScore))]]
if max(mScore)<0.07 or random.randint(3,40)>length:
    output=beat[random.choice("RPS")]
""", '<string>', 'exec')
gg_dllu1 = {}
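
# --- Illustrative sketch (hypothetical names, not used by dllu1_agent) ---
# dllu1_agent runs a legacy bot written for an `input`/`output` convention: the bot
# reads the global `input` ('' on the first turn, otherwise the opponent's last move
# as 'R'/'P'/'S') and sets the global `output`. The code is compiled once and exec'd
# every turn against the same dict (gg_dllu1), so the bot's variables persist between
# turns. The toy bot below (always play what beats the opponent's last move) is a
# made-up example that shows only this pattern.
example_legacy_code = compile(
    """
if not input:
    beat = {'R': 'P', 'P': 'S', 'S': 'R'}
    output = 'R'
else:
    output = beat[input]
""", '<string>', 'exec')

def example_make_exec_agent(code):
    state = {}  # persists across calls, like gg_dllu1
    def agent(last_opponent_action=None):
        # Translate the numeric move into the legacy string convention
        state['input'] = '' if last_opponent_action is None else 'RPS'[last_opponent_action]
        exec(code, state)  # `beat` defined on turn 1 is still in `state` on later turns
        return {'R': 0, 'P': 1, 'S': 2}[state['output']]
    return agent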


def dllu1_agent(observation, configuration):
    global gg_dllu1
    global code_dllu1
    inp = ''
    try:
        inp = 'RPS'[observation.lastOpponentAction]
    except:
        pass
    gg_dllu1['input'] = inp
    exec(code_dllu1, gg_dllu1)
    return {'R': 0, 'P': 1, 'S': 2}[gg_dllu1['output']]



##### ./greenberg.py #####


# greenberg roshambo bot, winner of 2nd annual roshambo programming competition
# http://webdocs.cs.ualberta.ca/~darse/rsbpc.html

# original source by Andrzej Nagorko
# http://www.mathpuzzle.com/greenberg.c

# Python translation by Travis Erdman
# https://github.com/erdman/roshambo

import random
from operator import itemgetter
# from itertools import izip
izip   = zip   # BUGFIX: izip   is python2
xrange = range # BUGFIX: xrange is python2

rps_to_text  = ('rock','paper','scissors')
rps_to_num   = {'rock':0, 'paper':1, 'scissors':2}

def player(my_moves, opp_moves):
    wins_with    = (1,2,0)  # superior
    best_without = (2,0,1)  # inferior

    lengths = (10, 20, 30, 40, 49, 0)
    p_random = random.choice([0,1,2])  #called 'guess' in iocaine

    TRIALS = 1000
    score_table =((0,-1,1),(1,0,-1),(-1,1,0))
    T = len(opp_moves)  #so T is number of trials completed

    def min_index(values):
        return min(enumerate(values), key=itemgetter(1))[0]

    def max_index(values):
        return max(enumerate(values), key=itemgetter(1))[0]

    def find_best_prediction(l):  # l = len
        bs = -TRIALS
        bp = 0
        if player.p_random_score > bs:
            bs = player.p_random_score
            bp = p_random
        for i in xrange(3):
            for j in xrange(24):
                for k in xrange(4):
                    new_bs = player.p_full_score[T%50][j][k][i] - (player.p_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.p_full[j][k] + i) % 3
                for k in xrange(2):
                    new_bs = player.r_full_score[T%50][j][k][i] - (player.r_full_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.r_full[j][k] + i) % 3
            for j in xrange(2):
                for k in xrange(2):
                    new_bs = player.p_freq_score[T%50][j][k][i] - (player.p_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.p_freq[j][k] + i) % 3
                    new_bs = player.r_freq_score[T%50][j][k][i] - (player.r_freq_score[(50+T-l)%50][j][k][i] if l else 0)
                    if new_bs > bs:
                        bs = new_bs
                        bp = (player.r_freq[j][k] + i) % 3
        return bp


    if not my_moves:
        player.opp_history = [0]  #pad to match up with 1-based move indexing in original
        player.my_history = [0]
        player.gear = [[0] for _ in xrange(24)]
        # init()
        player.p_random_score = 0
        player.p_full_score = [[[[0 for i in xrange(3)] for k in xrange(4)] for j in xrange(24)] for l in xrange(50)]
        player.r_full_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(24)] for l in xrange(50)]
        player.p_freq_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(2)] for l in xrange(50)]
        player.r_freq_score = [[[[0 for i in xrange(3)] for k in xrange(2)] for j in xrange(2)] for l in xrange(50)]
        player.s_len = [0] * 6

        player.p_full = [[0,0,0,0] for _ in xrange(24)]
        player.r_full = [[0,0] for _ in xrange(24)]
    else:
        player.my_history.append(rps_to_num[my_moves[-1]])
        player.opp_history.append(rps_to_num[opp_moves[-1]])
        # update_scores()
        player.p_random_score += score_table[p_random][player.opp_history[-1]]
        player.p_full_score[T%50] = [[[player.p_full_score[(T+49)%50][j][k][i] + score_table[(player.p_full[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(4)] for j in xrange(24)]
        player.r_full_score[T%50] = [[[player.r_full_score[(T+49)%50][j][k][i] + score_table[(player.r_full[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(24)]
        player.p_freq_score[T%50] = [[[player.p_freq_score[(T+49)%50][j][k][i] + score_table[(player.p_freq[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(2)]
        player.r_freq_score[T%50] = [[[player.r_freq_score[(T+49)%50][j][k][i] + score_table[(player.r_freq[j][k] + i) % 3][player.opp_history[-1]] for i in xrange(3)] for k in xrange(2)] for j in xrange(2)]
        player.s_len = [s + score_table[p][player.opp_history[-1]] for s,p in izip(player.s_len,player.p_len)]


    # update_history_hash()
    if not my_moves:
        player.my_history_hash = [[0],[0],[0],[0]]
        player.opp_history_hash = [[0],[0],[0],[0]]
    else:
        player.my_history_hash[0].append(player.my_history[-1])
        player.opp_history_hash[0].append(player.opp_history[-1])
        for i in xrange(1,4):
            player.my_history_hash[i].append(player.my_history_hash[i-1][-1] * 3 + player.my_history[-1])
            player.opp_history_hash[i].append(player.opp_history_hash[i-1][-1] * 3 + player.opp_history[-1])


    #make_predictions()

    for i in xrange(24):
        player.gear[i].append((3 + player.opp_history[-1] - player.p_full[i][2]) % 3)
        if T > 1:
            player.gear[i][T] += 3 * player.gear[i][T-1]
        player.gear[i][T] %= 9 # clearly there are 9 different gears, but original code only allocated 3 gear_freq's
                               # code apparently worked, but got lucky with undefined behavior
                               # I fixed by allocating gear_freq with length = 9
    if not my_moves:
        player.freq = [[0,0,0],[0,0,0]]
        value = [[0,0,0],[0,0,0]]
    else:
        player.freq[0][player.my_history[-1]] += 1
        player.freq[1][player.opp_history[-1]] += 1
        value = [[(1000 * (player.freq[i][2] - player.freq[i][1])) / float(T),
                  (1000 * (player.freq[i][0] - player.freq[i][2])) / float(T),
                  (1000 * (player.freq[i][1] - player.freq[i][0])) / float(T)] for i in xrange(2)]
    player.p_freq = [[wins_with[max_index(player.freq[i])], wins_with[max_index(value[i])]] for i in xrange(2)]
    player.r_freq = [[best_without[min_index(player.freq[i])], best_without[min_index(value[i])]] for i in xrange(2)]

    f = [[[[0,0,0] for k in xrange(4)] for j in xrange(2)] for i in xrange(3)]
    t = [[[0,0,0,0] for j in xrange(2)] for i in xrange(3)]

    m_len = [[0 for _ in xrange(T)] for i in xrange(3)]

    for i in xrange(T-1,0,-1):
        m_len[0][i] = 4
        for j in xrange(4):
            if player.my_history_hash[j][i] != player.my_history_hash[j][T]:
                m_len[0][i] = j
                break
        for j in xrange(4):
            if player.opp_history_hash[j][i] != player.opp_history_hash[j][T]:
                m_len[1][i] = j
                break
        for j in xrange(4):
            if player.my_history_hash[j][i] != player.my_history_hash[j][T] or player.opp_history_hash[j][i] != player.opp_history_hash[j][T]:
                m_len[2][i] = j
                break

    for i in xrange(T-1,0,-1):
        for j in xrange(3):
            for k in xrange(m_len[j][i]):
                f[j][0][k][player.my_history[i+1]] += 1
                f[j][1][k][player.opp_history[i+1]] += 1
                t[j][0][k] += 1
                t[j][1][k] += 1

                if t[j][0][k] == 1:
                    player.p_full[j*8 + 0*4 + k][0] = wins_with[player.my_history[i+1]]
                if t[j][1][k] == 1:
                    player.p_full[j*8 + 1*4 + k][0] = wins_with[player.opp_history[i+1]]
                if t[j][0][k] == 3:
                    player.p_full[j*8 + 0*4 + k][1] = wins_with[max_index(f[j][0][k])]
                    player.r_full[j*8 + 0*4 + k][0] = best_without[min_index(f[j][0][k])]
                if t[j][1][k] == 3:
                    player.p_full[j*8 + 1*4 + k][1] = wins_with[max_index(f[j][1][k])]
                    player.r_full[j*8 + 1*4 + k][0] = best_without[min_index(f[j][1][k])]

    for j in xrange(3):
        for k in xrange(4):
            player.p_full[j*8 + 0*4 + k][2] = wins_with[max_index(f[j][0][k])]
            player.r_full[j*8 + 0*4 + k][1] = best_without[min_index(f[j][0][k])]

            player.p_full[j*8 + 1*4 + k][2] = wins_with[max_index(f[j][1][k])]
            player.r_full[j*8 + 1*4 + k][1] = best_without[min_index(f[j][1][k])]

    for j in xrange(24):
        gear_freq = [0] * 9 # was [0,0,0] because original code incorrectly only allocated array length 3

        for i in xrange(T-1,0,-1):
            if player.gear[j][i] == player.gear[j][T]:
                gear_freq[player.gear[j][i+1]] += 1

        #original source allocated to 9 positions of gear_freq array, but only allocated first three
        #also, only looked at first 3 to find the max_index
        #unclear whether to seek max index over all 9 gear_freq's or just first 3 (as original code)
        player.p_full[j][3] = (player.p_full[j][1] + max_index(gear_freq)) % 3

    # end make_predictions()

    player.p_len = [find_best_prediction(l) for l in lengths]

    return rps_to_text[player.p_len[max_index(player.s_len)]]
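
# --- Illustrative sketch (hypothetical class, not used by greenberg_agent) ---
# find_best_prediction above scores each predictor over several windows (lengths 10, 20,
# 30, 40, 49, and 0 = whole game). Instead of re-summing the last l results every turn,
# the translation stores *cumulative* scores in a 50-slot ring buffer indexed by T % 50,
# so the score over the last l rounds is cum[T] - cum[T - l]. This class isolates that
# trick for a single predictor; it is a sketch, not part of the original bot.
EXAMPLE_SCORE_TABLE = ((0, -1, 1), (1, 0, -1), (-1, 1, 0))  # [my_move][opp_move], as score_table in player

class ExampleWindowedScore:
    """Running score with O(1) queries over any window shorter than `size` rounds."""

    def __init__(self, size=50):
        self.size = size
        self.cum = [0] * size  # cum[t % size] = total score after t rounds
        self.t = 0

    def add(self, my_move, opp_move):
        self.t += 1
        prev = self.cum[(self.t - 1) % self.size]
        self.cum[self.t % self.size] = prev + EXAMPLE_SCORE_TABLE[my_move][opp_move]

    def window(self, l):
        now = self.cum[self.t % self.size]
        if l == 0:
            return now  # whole game, like the length-0 entry in greenberg's `lengths`
        # Valid for 0 < l < size. Before the buffer wraps, unwritten slots still hold 0,
        # so a window longer than the game so far returns the whole-game total.
        return now - self.cum[(self.t - l) % self.size]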



# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
my_moves    = []
opp_moves   = []
def greenberg_agent(observation, configuration):    
    global my_moves
    global opp_moves
    if observation.step > 0:
        opp_move = rps_to_text[ observation.lastOpponentAction ]
        opp_moves.append( opp_move )
        
    action_text = player(my_moves, opp_moves)
    action      = rps_to_num[action_text]

    my_moves.append(action_text)
    return int(action)



##### ./decision_tree_1.py #####


import time
import os
import random
import numpy as np
from typing import List, Dict
from sklearn.tree import DecisionTreeClassifier

def random_agent(observation, configuration):
    return random.randint(0, configuration.signs-1)

def rock_agent(observation, configuration):
    return 0

def paper_agent(observation, configuration):
    return 1

def scissors_agent(observation, configuration):
    return 2

def sequential_agent(observation, configuration):
    return observation.step % configuration.signs



def get_winstats(decision_tree_history_1) -> Dict[str,int]:
    total = len(decision_tree_history_1['action'])
    wins = 0
    draw = 0
    loss = 0
    for n in range(total):
        # Compare moves mod 3 so wrap-around results (e.g. rock=0 beats scissors=2) are counted too
        outcome = (decision_tree_history_1['action'][n] - decision_tree_history_1['opponent'][n]) % 3
        if   outcome == 1: wins += 1
        elif outcome == 0: draw += 1
        else:              loss += 1
    return { "wins": wins, "draw": draw, "loss": loss }

def get_winrate(decision_tree_history_1):
    winstats = get_winstats(decision_tree_history_1)
    winrate  = winstats['wins'] / (winstats['wins'] + winstats['loss']) if (winstats['wins'] + winstats['loss']) else 0
    return winrate
    
    
# Initialize the (empty) move history
decision_tree_history_1 = {
    "step":        [],
    "prediction1": [],
    "prediction2": [],
    "expected":    [],
    "action":      [],
    "opponent":    [],
}

# NOTE: adding statistics causes the DecisionTree to make random moves 
def get_statistics(values) -> List[float]:
    values = np.array(values)
    return [
        np.count_nonzero(values == n) / len(values)
        if len(values) else 0.0
        for n in [0,1,2]
    ]
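
# --- Illustrative sketch (hypothetical helpers, not used by the agent below) ---
# decision_tree_agent_1 turns the move history into a supervised-learning problem:
# each training row is a `window` of our actions followed by a `window` of opponent
# moves, and the label is the opponent move that came next. The query row uses the
# most recent window, and the agent plays (prediction + 1) % 3.
# Assumptions: actions[i] and opponent[i] belong to the same round (the agent's exact
# index alignment differs slightly), and the agent's n_start/warmup_period trimming and
# the extra prediction-history features of stages 2 and 3 are left out.
def example_window_dataset(actions, opponent, window=5):
    """Build (X, Y): X[n] = actions[n:n+window] + opponent[n:n+window], Y[n] = opponent[n+window]."""
    X, Y = [], []
    for n in range(len(opponent) - window):
        X.append(list(actions[n:n + window]) + list(opponent[n:n + window]))
        Y.append(opponent[n + window])
    return X, Y

def example_query_row(actions, opponent, window=5):
    """Features for predicting the opponent's next move from the most recent window."""
    return list(actions[-window:]) + list(opponent[-window:])

# Usage (with sklearn, as in the agent):
#   X, Y = example_window_dataset(actions, opponent)
#   model = DecisionTreeClassifier().fit(X, Y)
#   action = (int(model.predict([example_query_row(actions, opponent)])[0]) + 1) % 3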


# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def decision_tree_agent_1(observation, configuration, window=5, stages=2, random_freq=0.66, warmup_period=10, max_samples=1000):    
    global decision_tree_history_1
    warmup_period   = warmup_period  # if os.environ.get('KAGGLE_KERNEL_RUN_TYPE','') != 'Interactive' else 0
    models          = [ None ] + [ DecisionTreeClassifier() for _ in range(stages) ]  # one independent model per stage; index 0 is unused
    
    time_start      = time.perf_counter()
    actions         = list(range(configuration.signs))  # [0,1,2]
    
    step            = observation.step
    last_action     = decision_tree_history_1['action'][-1]          if len(decision_tree_history_1['action']) else 2
    opponent_action = observation.lastOpponentAction if observation.step > 0   else 2
        
    if observation.step > 0:
        decision_tree_history_1['opponent'].append(opponent_action)
        
    winrate  = get_winrate(decision_tree_history_1)
    winstats = get_winstats(decision_tree_history_1)
    
    # Set default values     
    prediction1 = random.randint(0,2)
    prediction2 = random.randint(0,2)
    prediction3 = random.randint(0,2)
    expected    = random.randint(0,2)

    # We need at least `window` turns of history before DecisionTreeClassifier can be trained
    if observation.step >= window:
        # First we try to predict the opponent's next move based on the move history
        # TODO: create windowed history
        try:
            n_start = max(1, len(decision_tree_history_1['opponent']) - window - max_samples) 
            if stages >= 1:
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_1['action'][:n+window]),
                        # get_statistics(decision_tree_history_1['opponent'][:n-1+window]),
                        decision_tree_history_1['action'][n:n+window], 
                        decision_tree_history_1['opponent'][n:n+window]
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_1['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_1['action']),
                    # get_statistics(decision_tree_history_1['opponent']),
                    decision_tree_history_1['action'][-window+1:] + [ last_action ], 
                    decision_tree_history_1['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[1].fit(X, Y)
                expected = prediction1 = models[1].predict(Z)[0]

            if stages >= 2:
                # Now retrain, also including the history of stage-1 predictions
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_1['action'][:n+window]),
                        # get_statistics(decision_tree_history_1['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_1['opponent'][:n-1+window]),
                        decision_tree_history_1['action'][n:n+window], 
                        decision_tree_history_1['prediction1'][n:n+window],
                        decision_tree_history_1['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_1['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_1['action']),
                    # get_statistics(decision_tree_history_1['prediction1']),
                    # get_statistics(decision_tree_history_1['opponent']),
                    decision_tree_history_1['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_1['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_1['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[2].fit(X, Y)
                expected = prediction2 = models[2].predict(Z)[0]

            if stages >= 3:
                # Now retrain, also including the histories of stage-1 and stage-2 predictions
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_1['action'][:n+window]),
                        # get_statistics(decision_tree_history_1['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_1['prediction2'][:n+window]),
                        # get_statistics(decision_tree_history_1['opponent'][:n-1+window]),
                        decision_tree_history_1['action'][n:n+window], 
                        decision_tree_history_1['prediction1'][n:n+window],
                        decision_tree_history_1['prediction2'][n:n+window],
                        decision_tree_history_1['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_1['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_1['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_1['action']),
                    # get_statistics(decision_tree_history_1['prediction1']),
                    # get_statistics(decision_tree_history_1['prediction2']),
                    # get_statistics(decision_tree_history_1['opponent']),
                    decision_tree_history_1['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_1['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_1['prediction2'][-window+1:] + [ prediction2 ],
                    decision_tree_history_1['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[3].fit(X, Y)
                expected = prediction3 = models[3].predict(Z)[0]
        
        except Exception as exception:
            pass
                    
    # During the warmup period, play random to get a feel for the opponent 
    if (observation.step <= max(warmup_period,window)):
        actor  = 'warmup'
        action = random_agent(observation, configuration)    
    
    # With probability random_freq (0.66 by default), play a purely random move, which will hopefully distort any opponent statistics
    elif (random.random() <= random_freq):
        actor  = 'random'
        action = random_agent(observation, configuration)

    # Otherwise, play the move that beats DecisionTreeClassifier's prediction of the opponent's next move
    else:
        actor  = 'DecisionTree'
        action = (expected + 1) % configuration.signs
    
    # Persist state
    decision_tree_history_1['step'].append(step)
    decision_tree_history_1['prediction1'].append(prediction1)
    decision_tree_history_1['prediction2'].append(prediction2)
    decision_tree_history_1['expected'].append(expected)
    decision_tree_history_1['action'].append(action)
    if observation.step == 0:  # keep arrays equal length
        decision_tree_history_1['opponent'].append(random.randint(0, 2))


    return int(action)



##### ./IOU2.py #####

import random

class Strategy:
  def __init__(self):
    # History matching over 3 kinds of history (both, mine, yours), using the lengths in self.len_rfind
    # 3 different self.limit lengths for reverse learning
    # 6 kinds of strategy based on Iocaine Powder
    self.num_predictor = 27


    self.len_rfind = [20]
    self.limit = [10,20,60]
    self.beat = { "R":"P" , "P":"S", "S":"R"}
    self.not_lose = { "R":"PPR" , "P":"SSP" , "S":"RRS" } # if the prediction is right: win 2/3 of the time, draw 1/3, never lose
    self.my_his   =""
    self.your_his =""
    self.both_his =""
    self.list_predictor = [""]*self.num_predictor
    self.length = 0
    self.temp1 = { "PP":"1" , "PR":"2" , "PS":"3",
              "RP":"4" , "RR":"5", "RS":"6",
              "SP":"7" , "SR":"8", "SS":"9"}
    self.temp2 = { "1":"PP","2":"PR","3":"PS",
                "4":"RP","5":"RR","6":"RS",
                "7":"SP","8":"SR","9":"SS"} 
    self.who_win = { "PP": 0, "PR":1 , "PS":-1,
                "RP": -1,"RR":0, "RS":1,
                "SP": 1, "SR":-1, "SS":0}
    self.score_predictor = [0]*self.num_predictor
    self.output = random.choice("RPS")
    self.predictors = [self.output]*self.num_predictor


  def prepare_next_move(self, prev_input):
    input = prev_input

    #update self.predictors
    #"""
    if len(self.list_predictor[0])<5:
        front =0
    else:
        front =1
    for i in range (self.num_predictor):
        if self.predictors[i]==input:
            result ="1"
        else:
            result ="0"
        self.list_predictor[i] = self.list_predictor[i][front:5]+result #only 5 rounds before
    #history matching 1-6
    self.my_his += self.output
    self.your_his += input
    self.both_his += self.temp1[input+self.output]
    self.length +=1
    for i in range(1):
        len_size = min(self.length,self.len_rfind[i])
        j=len_size
        #self.both_his
        while j>=1 and not self.both_his[self.length-j:self.length] in self.both_his[0:self.length-1]:
            j-=1
        if j>=1:
            k = self.both_his.rfind(self.both_his[self.length-j:self.length],0,self.length-1)
            self.predictors[0+6*i] = self.your_his[j+k]
            self.predictors[1+6*i] = self.beat[self.my_his[j+k]]
        else:
            self.predictors[0+6*i] = random.choice("RPS")
            self.predictors[1+6*i] = random.choice("RPS")
        j=len_size
        #self.your_his
        while j>=1 and not self.your_his[self.length-j:self.length] in self.your_his[0:self.length-1]:
            j-=1
        if j>=1:
            k = self.your_his.rfind(self.your_his[self.length-j:self.length],0,self.length-1)
            self.predictors[2+6*i] = self.your_his[j+k]
            self.predictors[3+6*i] = self.beat[self.my_his[j+k]]
        else:
            self.predictors[2+6*i] = random.choice("RPS")
            self.predictors[3+6*i] = random.choice("RPS")
        j=len_size
        #self.my_his
        while j>=1 and not self.my_his[self.length-j:self.length] in self.my_his[0:self.length-1]:
            j-=1
        if j>=1:
            k = self.my_his.rfind(self.my_his[self.length-j:self.length],0,self.length-1)
            self.predictors[4+6*i] = self.your_his[j+k]
            self.predictors[5+6*i] = self.beat[self.my_his[j+k]]
        else:
            self.predictors[4+6*i] = random.choice("RPS")
            self.predictors[5+6*i] = random.choice("RPS")

    for i in range(3):
        temp =""
        search = self.temp1[(self.output+input)] #last round
        for start in range(2, min(self.limit[i],self.length) ):
            if search == self.both_his[self.length-start]:
                temp+=self.both_his[self.length-start+1]
        if(temp==""):
            self.predictors[6+i] = random.choice("RPS")
        else:
            collectR = {"P":0,"R":0,"S":0} #take win/lose from opponent into account
            for sdf in temp:
                next_move = self.temp2[sdf]
                if(self.who_win[next_move]==-1):
                    collectR[self.temp2[sdf][1]]+=3
                elif(self.who_win[next_move]==0):
                    collectR[self.temp2[sdf][1]]+=1
                elif(self.who_win[next_move]==1):
                    collectR[self.beat[self.temp2[sdf][0]]]+=1
            max1 = -1
            p1 =""
            for key in collectR:
                if(collectR[key]>max1):
                    max1 = collectR[key]
                    p1 += key
            self.predictors[6+i] = random.choice(p1)
    
    #rotate 9-27:
    for i in range(9,27):
        self.predictors[i] = self.beat[self.beat[self.predictors[i-9]]]
        
    #choose a predictor
    len_his = len(self.list_predictor[0])
    for i in range(self.num_predictor):
        sum = 0
        for j in range(len_his):
            if self.list_predictor[i][j]=="1":
                sum+=(j+1)*(j+1)
            else:
                sum-=(j+1)*(j+1)
        self.score_predictor[i] = sum
    max_score = max(self.score_predictor)
    #min_score = min(self.score_predictor)
    #c_temp = {"R":0,"P":0,"S":0}
    #for i in range (self.num_predictor):
        #if self.score_predictor[i]==max_score:
        #    c_temp[self.predictors[i]] +=1
        #if self.score_predictor[i]==min_score:
        #    c_temp[self.predictors[i]] -=1
    if max_score>0:
        predict = self.predictors[self.score_predictor.index(max_score)]
    else:
        predict = random.choice(self.your_his)
    self.output = random.choice(self.not_lose[predict])
    return self.output
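
# --- Illustrative sketch (hypothetical helpers, not used by iou2_agent) ---
# Strategy keeps, for each of its 27 predictors, a string of its last (up to) 5
# results ('1' = it predicted the opponent's move, '0' = it did not). Each result is
# weighted by (j + 1) ** 2, so recent rounds dominate, and the best-scoring predictor
# is followed only if its score is positive. The move is then drawn from not_lose.
from fractions import Fraction

def example_predictor_score(results):
    """Score a '1'/'0' result string (oldest first) the way prepare_next_move does."""
    return sum((j + 1) ** 2 if r == "1" else -(j + 1) ** 2 for j, r in enumerate(results))

def example_not_lose_distribution(predicted_move):
    """Probability of each move drawn from not_lose, e.g. 'R' -> P with 2/3, R with 1/3."""
    choices = {"R": "PPR", "P": "SSP", "S": "RRS"}[predicted_move]
    return {move: Fraction(choices.count(move), len(choices)) for move in set(choices)}

# Usage: example_predictor_score("11110") == 5, so one fresh miss almost cancels four hits,
# while example_predictor_score("00001") == -5.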


GLOBAL_STRATEGY = Strategy()


def iou2_agent(observation, configuration):
  global GLOBAL_STRATEGY

  # Action mapping
  to_char = ["R", "P", "S"]
  from_char = {"R": 0, "P": 1, "S": 2}

  if observation.step > 0:
    GLOBAL_STRATEGY.prepare_next_move(to_char[observation.lastOpponentAction])
  action = from_char[GLOBAL_STRATEGY.output]
  return action
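
# Sketch (hypothetical helpers, not part of the original iou2 agent): the
# predictor-selection loop in prepare_next_move scores each predictor's hit
# string with quadratic weights, +(j+1)**2 for a hit ("1") and -(j+1)**2
# otherwise, so entries at later positions in the string dominate the score.
# The 9-27 rotation applies the beat map twice; assuming a map of the form
# {"R": "P", "P": "S", "S": "R"}, that yields the move the prediction beats.
_SKETCH_BEAT = {"R": "P", "P": "S", "S": "R"}  # assumed mapping, for illustration


def _sketch_predictor_score(hits):
    """Quadratic-weighted score of a hit string such as "0110"."""
    score = 0
    for j, hit in enumerate(hits):
        weight = (j + 1) * (j + 1)
        score += weight if hit == "1" else -weight
    return score


def _sketch_rotate(move):
    """Apply the assumed beat map twice, as the 9-27 rotation does."""
    return _SKETCH_BEAT[_SKETCH_BEAT[move]]
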



##### ./memory_patterns_v20.py #####


import random

def evaluate_pattern_efficiency(previous_step_result):
    """ 
        evaluate efficiency of the pattern and, if pattern is inefficient,
        remove it from agent's memory
    """
    pattern_group_index = previous_action["pattern_group_index"]
    pattern_index = previous_action["pattern_index"]
    pattern = groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
    pattern["reward"] += previous_step_result
    # if pattern is inefficient
    if pattern["reward"] <= EFFICIENCY_THRESHOLD:
        # remove pattern from agent's memory
        del groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
    
def find_action(group, group_index):
    """ if possible, find my_action in this group of memory patterns """
    if len(current_memory) > group["memory_length"]:
        this_step_memory = current_memory[-group["memory_length"]:]
        memory_pattern, pattern_index = find_pattern(group["memory_patterns"], this_step_memory, group["memory_length"])
        if memory_pattern != None:
            my_action_amount = 0
            for action in memory_pattern["opp_next_actions"]:
                # take this opponent's action if it occurred more times than the currently chosen one,
                # or, on a tie, if a coin flip (random.random() > 0.5) favours it
                if (action["amount"] > my_action_amount or
                        (action["amount"] == my_action_amount and random.random() > 0.5)):
                    my_action_amount = action["amount"]
                    my_action = action["response"]
            return my_action, pattern_index
    return None, None
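
# Sketch (hypothetical helper, not used by memory_patterns_v20): find_action
# breaks ties with a running coin flip, which is not uniform. With three tied
# non-zero counts, the last record is kept with probability 1/2 and each of
# the first two with probability 1/4. A uniform alternative, assuming the same
# {"action", "amount", "response"} records, collects every best response first.
import random


def _sketch_pick_response_uniform(opp_next_actions):
    """Return a response whose action count is maximal, ties broken uniformly."""
    best = max(a["amount"] for a in opp_next_actions)
    candidates = [a["response"] for a in opp_next_actions if a["amount"] == best]
    return random.choice(candidates)
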

def find_pattern(memory_patterns, memory, memory_length):
    """ find appropriate pattern and its index in memory """
    for i in range(len(memory_patterns)):
        actions_matched = 0
        for j in range(memory_length):
            if memory_patterns[i]["actions"][j] == memory[j]:
                actions_matched += 1
            else:
                break
        # if memory fits this pattern
        if actions_matched == memory_length:
            return memory_patterns[i], i
    # appropriate pattern not found
    return None, None

def get_step_result_for_memory_patterns_v20(memory_patterns_v20_action, opp_action):
    """ 
        get result of the step for memory_patterns_v20
        1, 0 and -1 represent a win, tie and loss respectively
        reward will be taken from observation in the next release of kaggle environments
    """
    if memory_patterns_v20_action == opp_action:
        return 0
    elif (memory_patterns_v20_action == (opp_action + 1)) or (memory_patterns_v20_action == 0 and opp_action == 2):
        return 1
    else:
        return -1
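
# Sketch (hypothetical helper): with 0 = rock, 1 = paper, 2 = scissors, the
# win/tie/loss table in get_step_result_for_memory_patterns_v20 is equivalent
# to looking at (my - opp) % 3: 0 is a tie, 1 means my move beats the
# opponent's, and 2 means it loses.
def _sketch_step_result(my_action, opp_action):
    return {0: 0, 1: 1, 2: -1}[(my_action - opp_action) % 3]
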
    
def update_current_memory(obs, my_action):
    """ add memory_patterns_v20's current step to current_memory """
    # if there are too many actions in current_memory
    if len(current_memory) > current_memory_max_length:
        # delete first two elements in current memory
        # (actions of the oldest step in current memory)
        del current_memory[:2]
    # add agent's last action to agent's current memory
    current_memory.append(my_action)
    
def update_memory_pattern(obs, group):
    """ if possible, update or add some memory pattern in this group """
    # if length of current memory is suitable for this group of memory patterns
    if len(current_memory) > group["memory_length"]:
        # get memory of the previous step
        # considering that last step actions of both agents are already present in current_memory
        previous_step_memory = current_memory[-group["memory_length"] - 2 : -2]
        previous_pattern, pattern_index = find_pattern(group["memory_patterns"], previous_step_memory, group["memory_length"])
        if previous_pattern == None:
            previous_pattern = {
                # list of actions of both players
                "actions": previous_step_memory.copy(),
                # total reward earned by using this pattern
                "reward": 0,
                # list of observed opponent's actions after each occurrence of this pattern
                "opp_next_actions": [
                    # action that was made by opponent,
                    # amount of times that action occurred,
                    # what should be the response of memory_patterns_v20
                    {"action": 0, "amount": 0, "response": 1},
                    {"action": 1, "amount": 0, "response": 2},
                    {"action": 2, "amount": 0, "response": 0}
                ]
            }
            group["memory_patterns"].append(previous_pattern)
        # update previous_pattern
        for action in previous_pattern["opp_next_actions"]:
            if action["action"] == obs["lastOpponentAction"]:
                action["amount"] += 1
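
# Sketch (hypothetical helper): current_memory interleaves moves as
# [my, opp, my, opp, ...]. When update_memory_pattern runs, the opponent's
# last move has just been appended, so the final two entries are the previous
# step's pair. Slicing [-length - 2 : -2] therefore selects the window that
# preceded that pair, and obs["lastOpponentAction"] is what followed it.
def _sketch_previous_window(memory, length):
    return memory[-length - 2 : -2]
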
    

# maximum steps in a memory pattern
STEPS_MAX = 5
# minimum steps in a memory pattern
STEPS_MIN = 3
# lowest efficiency threshold of a memory pattern before being removed from agent's memory
EFFICIENCY_THRESHOLD = -3
# number of steps between forced random actions
FORCED_RANDOM_ACTION_INTERVAL = random.randint(STEPS_MIN, STEPS_MAX)

# current memory of the agent
current_memory = []
# previous action of memory_patterns_v20
previous_action = {
    "action": None,
    # action was taken from pattern
    "action_from_pattern": False,
    "pattern_group_index": None,
    "pattern_index": None
}
# number of steps remaining until the next forced random action
steps_to_random = FORCED_RANDOM_ACTION_INTERVAL
# maximum length of current_memory
current_memory_max_length = STEPS_MAX * 2
# current reward of memory_patterns_v20
# will be taken from observation in the next release of kaggle environments
reward = 0
# memory length of patterns in first group
# STEPS_MAX is multiplied by 2 to consider both memory_patterns_v20's and opponent's actions
group_memory_length = current_memory_max_length
# list of groups of memory patterns
groups_of_memory_patterns = []
for i in range(STEPS_MAX, STEPS_MIN - 1, -1):
    groups_of_memory_patterns.append({
        # how many steps in a row are in the pattern
        "memory_length": group_memory_length,
        # list of memory patterns
        "memory_patterns": []
    })
    group_memory_length -= 2
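
# Sketch (hypothetical helper): the loop above builds one group per pattern
# size from STEPS_MAX down to STEPS_MIN, each storing 2 * steps moves (both
# players). With STEPS_MAX = 5 and STEPS_MIN = 3 the groups hold patterns of
# 10, 8 and 6 moves, and memory_patterns_v20 searches them longest first.
def _sketch_group_lengths(steps_max, steps_min):
    return [2 * steps for steps in range(steps_max, steps_min - 1, -1)]
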
    

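def memory_patterns_v20(obs, conf):
    """ play the response stored for the longest matching memory pattern,
        falling back to a random move (and forcing one every FORCED_RANDOM_ACTION_INTERVAL steps) """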
    # action of memory_patterns_v20
    my_action = None
    
    # forced random action
    global steps_to_random
    steps_to_random -= 1
    if steps_to_random <= 0:
        steps_to_random = FORCED_RANDOM_ACTION_INTERVAL
        # choose action randomly
        my_action = random.randint(0, 2)
        # save action's data
        previous_action["action"] = my_action
        previous_action["action_from_pattern"] = False
        previous_action["pattern_group_index"] = None
        previous_action["pattern_index"] = None
    
    # if it's not first step
    if obs["step"] > 0:
        # add opponent's last step to current_memory
        current_memory.append(obs["lastOpponentAction"])
        # previous step won or lost
        previous_step_result = get_step_result_for_memory_patterns_v20(current_memory[-2], current_memory[-1])
        global reward
        reward += previous_step_result
        # if previous action of memory_patterns_v20 was taken from pattern
        if previous_action["action_from_pattern"]:
            evaluate_pattern_efficiency(previous_step_result)
    
    for i in range(len(groups_of_memory_patterns)):
        # if possible, update or add some memory pattern in this group
        update_memory_pattern(obs, groups_of_memory_patterns[i])
        # if action was not yet found
        if my_action == None:
            my_action, pattern_index = find_action(groups_of_memory_patterns[i], i)
            if my_action != None:
                # save action's data
                previous_action["action"] = my_action
                previous_action["action_from_pattern"] = True
                previous_action["pattern_group_index"] = i
                previous_action["pattern_index"] = pattern_index
    
    # if no action was found
    if my_action == None:
        # choose action randomly
        my_action = random.randint(0, 2)
        # save action's data
        previous_action["action"] = my_action
        previous_action["action_from_pattern"] = False
        previous_action["pattern_group_index"] = None
        previous_action["pattern_index"] = None
    
    # add memory_patterns_v20's current step to current_memory
    update_current_memory(obs, my_action)
    return my_action



##### ./anti_pi.py #####

import re

PI = "3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 82148 08651 32823 06647 09384 46095 50582 23172 53594 08128 48111 74502 84102 70193 85211 05559 64462 29489 54930 38196 44288 10975 66593 34461 28475 64823 37867 83165 27120 19091 45648 56692 34603 48610 45432 66482 13393 60726 02491 41273 72458 70066 06315 58817 48815 20920 96282 92540 91715 36436 78925 90360 01133 05305 48820 46652 13841 46951 94151 16094 33057 27036 57595 91953 09218 61173 81932 61179 31051 18548 07446 23799 62749 56735 18857 52724 89122 79381 83011 94912 98336 73362 44065 66430 86021 39494 63952 24737 19070 21798 60943 70277 05392 17176 29317 67523 84674 81846 76694 05132 00056 81271 45263 56082 77857 71342 75778 96091 73637 17872 14684 40901 22495 34301 46549 58537 10507 92279 68925 89235 42019 95611 21290 21960 86403 44181 59813 62977 47713 09960 51870 72113 49999 99837 29780 49951 05973 17328 16096 31859 50244 59455 34690 83026 42522 30825 33446 85035 26193 11881 71010 00313 78387 52886 58753 32083 81420 61717 76691 47303 59825 34904 28755 46873 11595 62863 88235 37875 93751 95778 18577 80532 17122 68066 13001 92787 66111 95909 21642 01989 38095 25720 10654 85863 27886 59361 53381 82796 82303 01952 03530 18529 68995 77362 25994 13891 24972 17752 83479 13151 55748 57242 45415 06959 50829 53311 68617 27855 88907 50983 81754 63746 49393 19255 06040 09277 01671 13900 98488 24012 85836 16035 63707 66010 47101 81942 95559 61989 46767 83744 94482 55379 77472 68471 04047 53464 62080 46684 25906 94912 93313 67702 89891 52104 75216 20569 66024 05803 81501 93511 25338 24300 35587 64024 74964 73263 91419 92726 04269 92279 67823 54781 63600 93417 21641 21992 45863 15030 28618 29745 55706 74983 85054 94588 58692 69956 90927 21079 75093 02955 32116 53449 87202 75596 02364 80665 49911 98818 34797 75356 63698 07426 54252 78625 51818 41757 46728 90977 77279 38000 81647 06001 61452 49192 17321 72147 72350 14144 19735 68548 16136 11573 
52552 13347 57418 49468 43852 33239 07394 14333 45477 62416 86251 89835 69485 56209 92192 22184 27255 02542 56887 67179 04946 01653 46680 49886 27232 79178 60857 84383 82796 79766 81454 10095 38837 86360 95068 00642 25125 20511 73929 84896 08412 84886 26945 60424 19652 85022 21066 11863 06744 27862 20391 94945 04712 37137 86960 95636 43719 17287 46776 46575 73962 41389 08658 32645 99581 33904 78027 59009 94657 64078 95126 94683 98352 59570 98258 22620 52248 94077 26719 47826 84826 01476 99090 26401 36394 43745 53050 68203 49625 24517 49399 65143 14298 09190 65925 09372 21696 46151 57098 58387 41059 78859 59772 97549 89301 61753 92846 81382 68683 86894 27741 55991 85592 52459 53959 43104 99725 24680 84598 72736 44695 84865 38367 36222 62609 91246 08051 24388 43904 51244 13654 97627 80797 71569 14359 97700 12961 60894 41694 86855 58484 06353 42207 22258 28488 64815 84560 28506 01684 27394 52267 46767 88952 52138 52254 99546 66727 82398 64565 96116 35488 62305 77456 49803 55936 34568 17432 41125 15076 06947 94510 96596 09402 52288 79710 89314 56691 36867 22874 89405 60101 50330 86179 28680 92087 47609 17824 93858 90097 14909 67598 52613 65549 78189 31297 84821 68299 89487 22658 80485 75640 14270 47755 51323 79641 45152 37462 34364 54285 84447 95265 86782 10511 41354 73573 95231 13427 16610 21359 69536 23144 29524 84937 18711 01457 65403 59027 99344 03742 00731 05785 39062 19838 74478 08478 48968 33214 45713 86875 19435 06430 21845 31910 48481 00537 06146 80674 91927 81911 97939 95206 14196 63428 75444 06437 45123 71819 21799 98391 01591 95618 14675 14269 12397 48940 90718 64942 31961 56794 52080 95146 55022 52316 03881 93014 20937 62137 85595 66389 37787 08303 90697 92077 34672 21825 62599 66150 14215 03068 03844 77345 49202 60541 46659 25201 49744 28507 32518 66600 21324 34088 19071 04863 31734 64965 14539 05796 26856 10055 08106 65879 69981 63574 73638 40525 71459 10289 70641 40110 97120 62804 39039 75951 56771 57700 42033 78699 36007 23055 87631 76359 42187 31251 
47120 53292 81918 26186 12586 73215 79198 41484 88291 64470 60957 52706 95722 09175 67116 72291 09816 90915 28017 35067 12748 58322 28718 35209 35396 57251 21083 57915 13698 82091 44421 00675 10334 67110 31412 67111 36990 86585 16398 31501 97016 51511 68517 14376 57618 35155 65088 49099 89859 98238 73455 28331 63550 76479 18535 89322 61854 89632 13293 30898 57064 20467 52590 70915 48141 65498 59461 63718 02709 81994 30992 44889 57571 28289 05923 23326 09729 97120 84433 57326 54893 82391 19325 97463 66730 58360 41428 13883 03203 82490 37589 85243 74417 02913 27656 18093 77344 40307 07469 21120 19130 20330 38019 76211 01100 44929 32151 60842 44485 96376 69838 95228 68478 31235 52658 21314 49576 85726 24334 41893 03968 64262 43410 77322 69780 28073 18915 44110 10446 82325 27162 01052 65227 21116 60396 66557 30925 47110 55785 37634 66820 65310 98965 26918 62056 47693 12570 58635 66201 85581 00729 36065 98764 86117 91045 33488 50346 11365 76867 53249 44166 80396 26579 78771 85560 84552 96541 26654 08530 61434 44318 58676 97514 56614 06800 70023 78776 59134 40171 27494 70420 56223 05389 94561 31407 11270 00407 85473 32699 39081 45466 46458 80797 27082 66830 63432 85878 56983 05235 80893 30657 57406 79545 71637 75254 20211 49557 61581 40025 01262 28594 13021 64715 50979 25923 09907 96547 37612 55176 56751 35751 78296 66454 77917 45011 29961 48903 04639 94713 29621 07340 43751 89573 59614 58901 93897 13111 79042 97828 56475 03203 19869 15140 28708 08599 04801 09412 14722 13179 47647 77262 24142 54854 54033 21571 85306 14228 81375 85043 06332 17518 29798 66223 71721 59160 77166 92547 48738 98665 49494 50114 65406 28433 66393 79003 97692 65672 14638 53067 36096 57120 91807 63832 71664 16274 88880 07869 25602 90228 47210 40317 21186 08204 19000 42296 61711 96377 92133 75751 14959 50156 60496 31862 94726 54736 42523 08177 03675 15906 73502 35072 83540 56704 03867 43513 62222 47715 89150 49530 98444 89333 09634 08780 76932 59939 78054 19341 44737 74418 42631 29860 80998 88687 
41326 04721 56951 62396 58645 73021 63159 81931 95167 35381 29741 67729 47867 24229 24654 36680 09806 76928 23828 06899 64004 82435 40370 14163 14965 89794 09243 23789 69070 69779 42236 25082 21688 95738 37986 23001 59377 64716 51228 93578 60158 81617 55782 97352 33446 04281 51262 72037 34314 65319 77774 16031 99066 55418 76397 92933 44195 21541 34189 94854 44734 56738 31624 99341 91318 14809 27777 10386 38773 43177 20754 56545 32207 77092 12019 05166 09628 04909 26360 19759 88281 61332 31666 36528 61932 66863 36062 73567 63035 44776 28035 04507 77235 54710 58595 48702 79081 43562 40145 17180 62464 36267 94561 27531 81340 78330 33625 42327 83944 97538 24372 05835 31147 71199 26063 81334 67768 79695 97030 98339 13077 10987 04085 91337 46414 42822 77263 46594 70474 58784 77872 01927 71528 07317 67907 70715 72134 44730 60570 07334 92436 93113 83504 93163 12840 42512 19256 51798 06941 13528 01314 70130 47816 43788 51852 90928 54520 11658 39341 96562 13491 43415 95625 86586 55705 52690 49652 09858 03385 07224 26482 93972 85847 83163 05777 75606 88876 44624 82468 57926 03953 52773 48030 48029 00587 60758 25104 74709 16439 61362 67604 49256 27420 42083 20856 61190 62545 43372 13153 59584 50687 72460 29016 18766 79524 06163 42522 57719 54291 62991 93064 55377 99140 37340 43287 52628 88963 99587 94757 29174 64263 57455 25407 90914 51357 11136 94109 11939 32519 10760 20825 20261 87985 31887 70584 29725 91677 81314 96990 09019 21169 71737 27847 68472 68608 49003 37702 42429 16513 00500 51683 23364 35038 95170 29893 92233 45172 20138 12806 96501 17844 08745 19601 21228 59937 16231 30171 14448 46409 03890 64495 44400 61986 90754 85160 26327 50529 83491 87407 86680 88183 38510 22833 45085 04860 82503 93021 33219 71551 84306 35455 00766 82829 49304 13776 55279 39751 75461 39539 84683 39363 83047 46119 96653 85815 38420 56853 38621 86725 23340 28308 71123 28278 92125 07712 62946 32295 63989 89893 58211 67456 27010 21835 64622 01349 67151 88190 97303 81198 00497 34072 39610 36854 
06643 19395 09790 19069 96395 52453 00545 05806 85501 95673 02292 19139 33918 56803 44903 98205 95510 02263 53536 19204 19947 45538 59381 02343 95544 95977 83779 02374 21617 27111 72364 34354 39478 22181 85286 24085 14006 66044 33258 88569 86705 43154 70696 57474 58550 33232 33421 07301 54594 05165 53790 68662 73337 99585 11562 57843 22988 27372 31989 87571 41595 78111 96358 33005 94087 30681 21602 87649 62867 44604 77464 91599 50549 73742 56269 01049 03778 19868 35938 14657 41268 04925 64879 85561 45372 34786 73303 90468 83834 36346 55379 49864 19270 56387 29317 48723 32083 76011 23029 91136 79386 27089 43879 93620 16295 15413 37142 48928 30722 01269 01475 46684 76535 76164 77379 46752 00490 75715 55278 19653 62132 39264 06160 13635 81559 07422 02020 31872 77605 27721 90055 61484 25551 87925 30343 51398 44253 22341 57623 36106 42506 39049 75008 65627 10953 59194 65897 51413 10348 22769 30624 74353 63256 91607 81547 81811 52843 66795 70611 08615 33150 44521 27473 92454 49454 23682 88606 13408 41486 37767 00961 20715 12491 40430 27253 86076 48236 34143 34623 51897 57664 52164 13767 96903 14950 19108 57598 44239 19862 91642 19399 49072 36234 64684 41173 94032 65918 40443 78051 33389 45257 42399 50829 65912 28508 55582 15725 03107 12570 12668 30240 29295 25220 11872 67675 62204 15420 51618 41634 84756 51699 98116 14101 00299 60783 86909 29160 30288 40026 91041 40792 88621 50784 24516 70908 70006 99282 12066 04183 71806 53556 72525 32567 53286 12910 42487 76182 58297 65157 95984 70356 22262 93486 00341 58722 98053 49896 50226 29174 87882 02734 20922 22453 39856 26476 69149 05562 84250 39127 57710 28402 79980 66365 82548 89264 88025 45661 01729 67026 64076 55904 29099 45681 50652 65305 37182 94127 03369 31378 51786 09040 70866 71149 65583 43434 76933 85781 71138 64558 73678 12301 45876 87126 60348 91390 95620 09939 36103 10291 61615 28813 84379 09904 23174 73363 94804 57593 14931 40529 76347 57481 19356 70911 01377 51721 00803 15590 24853 09066 92037 67192 20332 29094 
33467 68514 22144 77379 39375 17034 43661 99104 03375 11173 54719 18550 46449 02636 55128 16228 82446 25759 16333 03910 72253 83742 18214 08835 08657 39177 15096 82887 47826 56995 99574 49066 17583 44137 52239 70968 34080 05355 98491 75417 38188 39994 46974 86762 65516 58276 58483 58845 31427 75687 90029 09517 02835 29716 34456 21296 40435 23117 60066 51012 41200 65975 58512 76178 58382 92041 97484 42360 80071 93045 76189 32349 22927 96501 98751 87212 72675 07981 25547 09589 04556 35792 12210 33346 69749 92356 30254 94780 24901 14195 21238 28153 09114 07907 38602 51522 74299 58180 72471 62591 66854 51333 12394 80494 70791 19153 26734 30282 44186 04142 63639 54800 04480 02670 49624 82017 92896 47669 75831 83271 31425 17029 69234 88962 76684 40323 26092 75249 60357 99646 92565 04936 81836 09003 23809 29345 95889 70695 36534 94060 34021 66544 37558 90045 63288 22505 45255 64056 44824 65151 87547 11962 18443 96582 53375 43885 69094 11303 15095 26179 37800 29741 20766 51479 39425 90298 96959 46995 56576 12186 56196 73378 62362 56125 21632 08628 69222 10327 48892 18654 36480 22967 80705 76561 51446 32046 92790 68212 07388 37781 42335 62823 60896 32080 68222 46801 22482 61177 18589 63814 09183 90367 36722 20888 32151 37556 00372 79839 40041 52970 02878 30766 70944 47456 01345 56417 25437 09069 79396 12257 14298 94671 54357 84687 88614 44581 23145 93571 98492 25284 71605 04922 12424 70141 21478 05734 55105 00801 90869 96033 02763 47870 81081 75450 11930 71412 23390 86639 38339 52942 57869 05076 43100 63835 19834 38934 15961 31854 34754 64955 69781 03829 30971 64651 43840 70070 73604 11237 35998 43452 25161 05070 27056 23526 60127 64848 30840 76118 30130 52793 20542 74628 65403 60367 45328 65105 70658 74882 25698 15793 67897 66974 22057 50596 83440 86973 50201 41020 67235 85020 07245 22563 26513 41055 92401 90274 21624 84391 40359 98953 53945 90944 07046 91209 14093 87001 26456 00162 37428 80210 92764 57931 06579 22955 24988 72758 46101 26483 69998 92256 95968 81592 05600 
10165 52563 7567"
PI = re.sub('[^1-9]', '', PI)

# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def anti_pi_agent(observation, configuration):    
    action = (int(PI[observation.step]) + 1) % configuration.signs
    return int(action)
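
# Sketch (hypothetical helper): re.sub('[^1-9]', '', PI) keeps only the digits
# 1-9, so the decimal point, the spaces AND every 0 are dropped. Each remaining
# digit d is then answered with (d + 1) % 3, which is the move that beats
# d % 3 under 0 = rock, 1 = paper, 2 = scissors.
import re


def _sketch_anti_pi_actions(pi_text, n):
    digits = re.sub('[^1-9]', '', pi_text)
    return [(int(d) + 1) % 3 for d in digits[:n]]
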



##### ./naive_bayes.py #####

from collections import defaultdict
from itertools import chain, combinations
import random
import sys
from typing import *

import numpy as np
from pydash import flatten


class RPSNaiveBayes():
    def __init__(self, max_memory=20, verbose=True):
        self.max_memory = max_memory
        self.verbose    = verbose
        self.history = {
            "opponent": [],
            "rotn":     [],
            "expected": [],
            "action":   [],
        }
        # self.root_keys = ['action','opponent','rotn','expected']
        self.root_keys = ['action','opponent']
        self.keys = [
            ",".join(combo)
            for n in range(1,len(self.root_keys)+1)        
            for combo in combinations(self.root_keys, n)
        ]
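        # with root_keys = ['action', 'opponent'] this gives ['action', 'opponent', 'action,opponent'];
        # adding 'rotn' would give ['action', 'opponent', 'rotn', 'action,opponent', 'action,rotn', 'opponent,rotn', 'action,opponent,rotn']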
        self.memory = {
            key: defaultdict(lambda: np.array([0,0,0]))
            for key in self.keys
        }
        
    def __call__(self, obs, conf):
        return self.agent(obs, conf)


    # obs  {'remainingOverageTime': 60, 'step': 1, 'reward': 0, 'lastOpponentAction': 0}
    # conf {'episodeSteps': 10, 'actTimeout': 1, 'runTimeout': 1200, 'signs': 3, 'tieRewardThreshold': 20, 'agentTimeout': 60}
    def agent(self, obs, conf):
        self.update_state(obs, conf)

        views          = self.get_current_views()
        log_likelihood = self.get_log_likelihood(views)
        probability    = self.get_probability(log_likelihood)

        expected = random.choices( population=[0,1,2], weights=probability, k=1 )[0]
        action   = int(expected + 1) % conf.signs
        self.history['expected'].insert(0, expected)
        self.history['action'].insert(0, action)

        if self.verbose:
            pass

        return int(action)


    def update_state(self, obs, conf):
        if obs.step > 0:
            # rotation of the opponent's move relative to ours, kept in 0..signs-1
            rotn = (obs.lastOpponentAction - self.history['action'][0]) % conf.signs

            self.history['opponent'].insert(0, obs.lastOpponentAction % conf.signs)
            self.history['rotn'].insert(0, rotn)

        for keys in self.memory.keys():
            memories = self.get_new_memories(keys)
            for value, path in memories:
                self.memory[keys][path][value] += 1


    def get_key_min_length(self, keys: str) -> int:
        min_length = min([ len(self.history[key]) for key in keys.split(',') ])
        return min_length


    def get_new_memories(self, keys: Union[str,List[str]]) -> List[Tuple[Tuple,int]]:
        min_length = self.get_key_min_length(keys)
        min_length = min(min_length, self.max_memory)
        memories   = []
        for n in range(1,min_length):
            value = self.history["opponent"][0]
            paths = []
            for key in keys.split(','):
                path = self.history[key][1:n]
                if len(path): paths.append(path)
            paths = tuple(flatten(paths))
            if len(paths):
                memories.append( (value, paths) )
        return memories


    def get_current_views(self) -> Dict[str, List[Tuple[int]]]:
        views = {
            keys: [
                tuple(flatten([value, paths]))
                for (value, paths) in self.get_new_memories(keys)
            ]
            for keys in self.memory.keys()
        }
        return views


    def get_log_likelihood(self, views: Dict[str, List[Tuple]]) -> np.ndarray:
        log_likelihoods = np.array([.0,.0,.0])
        for keys in self.memory.keys():
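            # note: this sums the *shape* of the stacked count arrays (stored paths + 3 once any
            # path exists), not the total number of observations
            count = np.sum( np.array(list(self.memory[keys].values())).shape )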
            for path in views[keys]:
                try:
                    n_unique = 3 ** len(path)
                    freqs = self.memory[keys][path] * n_unique    
                    probs = (freqs + 1) / ( count + n_unique )    # Laplacian Smoothing
                    log_likelihood = [
                        np.log(probs[a]) - np.log(probs[b] + probs[c])
                        if (probs[b] + probs[c]) > 0 else 0.0
                        for a, b, c in [ (0,1,2), (1,2,0), (2,0,1) ]
                    ]
                    log_likelihood = [ n if not np.isnan(n) else 0.0 for n in log_likelihood ]
                    log_likelihoods += np.array(log_likelihood)
                except ZeroDivisionError: pass

        return log_likelihoods

    
    def get_probability(self, log_likelihood: np.ndarray) -> np.ndarray:
        probability = np.exp(log_likelihood)
        probability[ probability == np.inf ] = sys.maxsize / len(probability) / 2
        probability = probability / np.sum(probability)
        return probability
        
            
    
    
naive_bayes_instance = RPSNaiveBayes()
def naive_bayes(obs, conf):
    return naive_bayes_instance.agent(obs, conf)
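
# Sketch (hypothetical helper, simplified): get_log_likelihood and
# get_probability follow a "log-odds, then normalise" pattern. For each action
# a they add log p(a) - log(p(b) + p(c)) over all matched views, exponentiate
# and normalise; the agent then samples a predicted move and plays
# (expected + 1) % 3. This version uses plain Laplace smoothing,
# (counts + 1) / (total + 3), for a single view. The agent itself scales the
# counts and uses a different denominator, so its numbers will differ.
import numpy as np


def _sketch_log_odds_probability(counts):
    counts = np.asarray(counts, dtype=float)
    probs = (counts + 1) / (counts.sum() + len(counts))  # Laplace smoothing
    log_odds = np.log(probs) - np.log(probs.sum() - probs)  # log p(a) - log(rest)
    weights = np.exp(log_odds)
    return weights / weights.sum()
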



##### ./anti_anti_pi.py #####

import re

PI = "3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 82148 08651 32823 06647 09384 46095 50582 23172 53594 08128 48111 74502 84102 70193 85211 05559 64462 29489 54930 38196 44288 10975 66593 34461 28475 64823 37867 83165 27120 19091 45648 56692 34603 48610 45432 66482 13393 60726 02491 41273 72458 70066 06315 58817 48815 20920 96282 92540 91715 36436 78925 90360 01133 05305 48820 46652 13841 46951 94151 16094 33057 27036 57595 91953 09218 61173 81932 61179 31051 18548 07446 23799 62749 56735 18857 52724 89122 79381 83011 94912 98336 73362 44065 66430 86021 39494 63952 24737 19070 21798 60943 70277 05392 17176 29317 67523 84674 81846 76694 05132 00056 81271 45263 56082 77857 71342 75778 96091 73637 17872 14684 40901 22495 34301 46549 58537 10507 92279 68925 89235 42019 95611 21290 21960 86403 44181 59813 62977 47713 09960 51870 72113 49999 99837 29780 49951 05973 17328 16096 31859 50244 59455 34690 83026 42522 30825 33446 85035 26193 11881 71010 00313 78387 52886 58753 32083 81420 61717 76691 47303 59825 34904 28755 46873 11595 62863 88235 37875 93751 95778 18577 80532 17122 68066 13001 92787 66111 95909 21642 01989 38095 25720 10654 85863 27886 59361 53381 82796 82303 01952 03530 18529 68995 77362 25994 13891 24972 17752 83479 13151 55748 57242 45415 06959 50829 53311 68617 27855 88907 50983 81754 63746 49393 19255 06040 09277 01671 13900 98488 24012 85836 16035 63707 66010 47101 81942 95559 61989 46767 83744 94482 55379 77472 68471 04047 53464 62080 46684 25906 94912 93313 67702 89891 52104 75216 20569 66024 05803 81501 93511 25338 24300 35587 64024 74964 73263 91419 92726 04269 92279 67823 54781 63600 93417 21641 21992 45863 15030 28618 29745 55706 74983 85054 94588 58692 69956 90927 21079 75093 02955 32116 53449 87202 75596 02364 80665 49911 98818 34797 75356 63698 07426 54252 78625 51818 41757 46728 90977 77279 38000 81647 06001 61452 49192 17321 72147 72350 14144 19735 68548 16136 11573 
52552 13347 57418 49468 43852 33239 07394 14333 45477 62416 86251 89835 69485 56209 92192 22184 27255 02542 56887 67179 04946 01653 46680 49886 27232 79178 60857 84383 82796 79766 81454 10095 38837 86360 95068 00642 25125 20511 73929 84896 08412 84886 26945 60424 19652 85022 21066 11863 06744 27862 20391 94945 04712 37137 86960 95636 43719 17287 46776 46575 73962 41389 08658 32645 99581 33904 78027 59009 94657 64078 95126 94683 98352 59570 98258 22620 52248 94077 26719 47826 84826 01476 99090 26401 36394 43745 53050 68203 49625 24517 49399 65143 14298 09190 65925 09372 21696 46151 57098 58387 41059 78859 59772 97549 89301 61753 92846 81382 68683 86894 27741 55991 85592 52459 53959 43104 99725 24680 84598 72736 44695 84865 38367 36222 62609 91246 08051 24388 43904 51244 13654 97627 80797 71569 14359 97700 12961 60894 41694 86855 58484 06353 42207 22258 28488 64815 84560 28506 01684 27394 52267 46767 88952 52138 52254 99546 66727 82398 64565 96116 35488 62305 77456 49803 55936 34568 17432 41125 15076 06947 94510 96596 09402 52288 79710 89314 56691 36867 22874 89405 60101 50330 86179 28680 92087 47609 17824 93858 90097 14909 67598 52613 65549 78189 31297 84821 68299 89487 22658 80485 75640 14270 47755 51323 79641 45152 37462 34364 54285 84447 95265 86782 10511 41354 73573 95231 13427 16610 21359 69536 23144 29524 84937 18711 01457 65403 59027 99344 03742 00731 05785 39062 19838 74478 08478 48968 33214 45713 86875 19435 06430 21845 31910 48481 00537 06146 80674 91927 81911 97939 95206 14196 63428 75444 06437 45123 71819 21799 98391 01591 95618 14675 14269 12397 48940 90718 64942 31961 56794 52080 95146 55022 52316 03881 93014 20937 62137 85595 66389 37787 08303 90697 92077 34672 21825 62599 66150 14215 03068 03844 77345 49202 60541 46659 25201 49744 28507 32518 66600 21324 34088 19071 04863 31734 64965 14539 05796 26856 10055 08106 65879 69981 63574 73638 40525 71459 10289 70641 40110 97120 62804 39039 75951 56771 57700 42033 78699 36007 23055 87631 76359 42187 31251 
47120 53292 81918 26186 12586 73215 79198 41484 88291 64470 60957 52706 95722 09175 67116 72291 09816 90915 28017 35067 12748 58322 28718 35209 35396 57251 21083 57915 13698 82091 44421 00675 10334 67110 31412 67111 36990 86585 16398 31501 97016 51511 68517 14376 57618 35155 65088 49099 89859 98238 73455 28331 63550 76479 18535 89322 61854 89632 13293 30898 57064 20467 52590 70915 48141 65498 59461 63718 02709 81994 30992 44889 57571 28289 05923 23326 09729 97120 84433 57326 54893 82391 19325 97463 66730 58360 41428 13883 03203 82490 37589 85243 74417 02913 27656 18093 77344 40307 07469 21120 19130 20330 38019 76211 01100 44929 32151 60842 44485 96376 69838 95228 68478 31235 52658 21314 49576 85726 24334 41893 03968 64262 43410 77322 69780 28073 18915 44110 10446 82325 27162 01052 65227 21116 60396 66557 30925 47110 55785 37634 66820 65310 98965 26918 62056 47693 12570 58635 66201 85581 00729 36065 98764 86117 91045 33488 50346 11365 76867 53249 44166 80396 26579 78771 85560 84552 96541 26654 08530 61434 44318 58676 97514 56614 06800 70023 78776 59134 40171 27494 70420 56223 05389 94561 31407 11270 00407 85473 32699 39081 45466 46458 80797 27082 66830 63432 85878 56983 05235 80893 30657 57406 79545 71637 75254 20211 49557 61581 40025 01262 28594 13021 64715 50979 25923 09907 96547 37612 55176 56751 35751 78296 66454 77917 45011 29961 48903 04639 94713 29621 07340 43751 89573 59614 58901 93897 13111 79042 97828 56475 03203 19869 15140 28708 08599 04801 09412 14722 13179 47647 77262 24142 54854 54033 21571 85306 14228 81375 85043 06332 17518 29798 66223 71721 59160 77166 92547 48738 98665 49494 50114 65406 28433 66393 79003 97692 65672 14638 53067 36096 57120 91807 63832 71664 16274 88880 07869 25602 90228 47210 40317 21186 08204 19000 42296 61711 96377 92133 75751 14959 50156 60496 31862 94726 54736 42523 08177 03675 15906 73502 35072 83540 56704 03867 43513 62222 47715 89150 49530 98444 89333 09634 08780 76932 59939 78054 19341 44737 74418 42631 29860 80998 88687 
41326 04721 56951 62396 58645 73021 63159 81931 95167 35381 29741 67729 47867 24229 24654 36680 09806 76928 23828 06899 64004 82435 40370 14163 14965 89794 09243 23789 69070 69779 42236 25082 21688 95738 37986 23001 59377 64716 51228 93578 60158 81617 55782 97352 33446 04281 51262 72037 34314 65319 77774 16031 99066 55418 76397 92933 44195 21541 34189 94854 44734 56738 31624 99341 91318 14809 27777 10386 38773 43177 20754 56545 32207 77092 12019 05166 09628 04909 26360 19759 88281 61332 31666 36528 61932 66863 36062 73567 63035 44776 28035 04507 77235 54710 58595 48702 79081 43562 40145 17180 62464 36267 94561 27531 81340 78330 33625 42327 83944 97538 24372 05835 31147 71199 26063 81334 67768 79695 97030 98339 13077 10987 04085 91337 46414 42822 77263 46594 70474 58784 77872 01927 71528 07317 67907 70715 72134 44730 60570 07334 92436 93113 83504 93163 12840 42512 19256 51798 06941 13528 01314 70130 47816 43788 51852 90928 54520 11658 39341 96562 13491 43415 95625 86586 55705 52690 49652 09858 03385 07224 26482 93972 85847 83163 05777 75606 88876 44624 82468 57926 03953 52773 48030 48029 00587 60758 25104 74709 16439 61362 67604 49256 27420 42083 20856 61190 62545 43372 13153 59584 50687 72460 29016 18766 79524 06163 42522 57719 54291 62991 93064 55377 99140 37340 43287 52628 88963 99587 94757 29174 64263 57455 25407 90914 51357 11136 94109 11939 32519 10760 20825 20261 87985 31887 70584 29725 91677 81314 96990 09019 21169 71737 27847 68472 68608 49003 37702 42429 16513 00500 51683 23364 35038 95170 29893 92233 45172 20138 12806 96501 17844 08745 19601 21228 59937 16231 30171 14448 46409 03890 64495 44400 61986 90754 85160 26327 50529 83491 87407 86680 88183 38510 22833 45085 04860 82503 93021 33219 71551 84306 35455 00766 82829 49304 13776 55279 39751 75461 39539 84683 39363 83047 46119 96653 85815 38420 56853 38621 86725 23340 28308 71123 28278 92125 07712 62946 32295 63989 89893 58211 67456 27010 21835 64622 01349 67151 88190 97303 81198 00497 34072 39610 36854 
06643 19395 09790 19069 96395 52453 00545 05806 85501 95673 02292 19139 33918 56803 44903 98205 95510 02263 53536 19204 19947 45538 59381 02343 95544 95977 83779 02374 21617 27111 72364 34354 39478 22181 85286 24085 14006 66044 33258 88569 86705 43154 70696 57474 58550 33232 33421 07301 54594 05165 53790 68662 73337 99585 11562 57843 22988 27372 31989 87571 41595 78111 96358 33005 94087 30681 21602 87649 62867 44604 77464 91599 50549 73742 56269 01049 03778 19868 35938 14657 41268 04925 64879 85561 45372 34786 73303 90468 83834 36346 55379 49864 19270 56387 29317 48723 32083 76011 23029 91136 79386 27089 43879 93620 16295 15413 37142 48928 30722 01269 01475 46684 76535 76164 77379 46752 00490 75715 55278 19653 62132 39264 06160 13635 81559 07422 02020 31872 77605 27721 90055 61484 25551 87925 30343 51398 44253 22341 57623 36106 42506 39049 75008 65627 10953 59194 65897 51413 10348 22769 30624 74353 63256 91607 81547 81811 52843 66795 70611 08615 33150 44521 27473 92454 49454 23682 88606 13408 41486 37767 00961 20715 12491 40430 27253 86076 48236 34143 34623 51897 57664 52164 13767 96903 14950 19108 57598 44239 19862 91642 19399 49072 36234 64684 41173 94032 65918 40443 78051 33389 45257 42399 50829 65912 28508 55582 15725 03107 12570 12668 30240 29295 25220 11872 67675 62204 15420 51618 41634 84756 51699 98116 14101 00299 60783 86909 29160 30288 40026 91041 40792 88621 50784 24516 70908 70006 99282 12066 04183 71806 53556 72525 32567 53286 12910 42487 76182 58297 65157 95984 70356 22262 93486 00341 58722 98053 49896 50226 29174 87882 02734 20922 22453 39856 26476 69149 05562 84250 39127 57710 28402 79980 66365 82548 89264 88025 45661 01729 67026 64076 55904 29099 45681 50652 65305 37182 94127 03369 31378 51786 09040 70866 71149 65583 43434 76933 85781 71138 64558 73678 12301 45876 87126 60348 91390 95620 09939 36103 10291 61615 28813 84379 09904 23174 73363 94804 57593 14931 40529 76347 57481 19356 70911 01377 51721 00803 15590 24853 09066 92037 67192 20332 29094 
33467 68514 22144 77379 39375 17034 43661 99104 03375 11173 54719 18550 46449 02636 55128 16228 82446 25759 16333 03910 72253 83742 18214 08835 08657 39177 15096 82887 47826 56995 99574 49066 17583 44137 52239 70968 34080 05355 98491 75417 38188 39994 46974 86762 65516 58276 58483 58845 31427 75687 90029 09517 02835 29716 34456 21296 40435 23117 60066 51012 41200 65975 58512 76178 58382 92041 97484 42360 80071 93045 76189 32349 22927 96501 98751 87212 72675 07981 25547 09589 04556 35792 12210 33346 69749 92356 30254 94780 24901 14195 21238 28153 09114 07907 38602 51522 74299 58180 72471 62591 66854 51333 12394 80494 70791 19153 26734 30282 44186 04142 63639 54800 04480 02670 49624 82017 92896 47669 75831 83271 31425 17029 69234 88962 76684 40323 26092 75249 60357 99646 92565 04936 81836 09003 23809 29345 95889 70695 36534 94060 34021 66544 37558 90045 63288 22505 45255 64056 44824 65151 87547 11962 18443 96582 53375 43885 69094 11303 15095 26179 37800 29741 20766 51479 39425 90298 96959 46995 56576 12186 56196 73378 62362 56125 21632 08628 69222 10327 48892 18654 36480 22967 80705 76561 51446 32046 92790 68212 07388 37781 42335 62823 60896 32080 68222 46801 22482 61177 18589 63814 09183 90367 36722 20888 32151 37556 00372 79839 40041 52970 02878 30766 70944 47456 01345 56417 25437 09069 79396 12257 14298 94671 54357 84687 88614 44581 23145 93571 98492 25284 71605 04922 12424 70141 21478 05734 55105 00801 90869 96033 02763 47870 81081 75450 11930 71412 23390 86639 38339 52942 57869 05076 43100 63835 19834 38934 15961 31854 34754 64955 69781 03829 30971 64651 43840 70070 73604 11237 35998 43452 25161 05070 27056 23526 60127 64848 30840 76118 30130 52793 20542 74628 65403 60367 45328 65105 70658 74882 25698 15793 67897 66974 22057 50596 83440 86973 50201 41020 67235 85020 07245 22563 26513 41055 92401 90274 21624 84391 40359 98953 53945 90944 07046 91209 14093 87001 26456 00162 37428 80210 92764 57931 06579 22955 24988 72758 46101 26483 69998 92256 95968 81592 05600 
10165 52563 7567"
PI = re.sub('[^1-9]', '', PI)  # keep only the digits 1-9: drops whitespace and also every zero

# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def anti_anti_pi_agent(observation, configuration):    
    action = (int(PI[observation.step]) + 2) % configuration.signs
    return int(action)



##### ./anti_rotn.py #####


import random

rotn_history    = []
rotn_stats = [ 0, 0, 0 ]

# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def anti_rotn(observation, configuration, warmup=25):    
    global rotn_history
    global rotn_stats
    
    if observation.step > 0:
        rotn_history.append( observation.lastOpponentAction )

    if len(rotn_history) >= 2:
        rotn = (rotn_history[-1] - rotn_history[-2]) % configuration.signs
        rotn_stats[ rotn ] += 1
        
    if observation.step < warmup:
        action = random.randint(0, configuration.signs-1)
    else:
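        # Expected value of playing (last opponent move + offset) for offsets 0, 1, 2,
        # given how often the opponent has rotated by 0, 1 or 2 between moves
        ev = [
            rotn_stats[2] - rotn_stats[1],  # offset 0
            rotn_stats[0] - rotn_stats[2],  # offset 1
            rotn_stats[1] - rotn_stats[0],  # offset 2
        ]
        offset = ev.index(max(ev))
        action = (offset + observation.lastOpponentAction) % configuration.signs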
        
    return int(action)
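
# Illustrative sketch (hypothetical helper, not called by anti_rotn): the `ev` values above
# in closed form. Suppose the opponent's last move shifted by r between turns and they
# repeat that shift. Then playing (last move + offset) wins when offset == r + 1 and loses
# when offset == r - 1 (mod 3). So the expected value of offset k is
# stats[(k - 1) % 3] - stats[(k + 1) % 3].
def _demo_rotn_expected_values(stats, signs=3):
    """Return the expected value of each offset, given rotation counts `stats`."""
    return [stats[(k - 1) % signs] - stats[(k + 1) % signs] for k in range(signs)]

# Usage: an opponent who mostly repeats (rotation 0) is best met with offset 1,
# i.e. the move that beats their last one.
# _demo_rotn_expected_values([5, 1, 0]) -> [-1, 5, -4]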



##### ./geometry.py #####

import operator
import numpy as np
import cmath
from typing import List
from collections import namedtuple
import traceback
import sys


basis = np.array(
    [1, cmath.exp(2j * cmath.pi * 1 / 3), cmath.exp(2j * cmath.pi * 2 / 3)]
)


HistMatchResult = namedtuple("HistMatchResult", "idx length")


def find_all_longest(seq, max_len=None) -> List[HistMatchResult]:
    """
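    Find every past position where the end of `seq` repeats. Each result holds the index
    just after the match and the match length; results are sorted longest first.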
    """
    result = []

    i_search_start = len(seq) - 2

    while i_search_start > 0:
        i_sub = -1
        i_search = i_search_start
        length = 0

        while i_search >= 0 and seq[i_sub] == seq[i_search]:
            length += 1
            i_sub -= 1
            i_search -= 1

            if max_len is not None and length > max_len:
                break

        if length > 0:
            result.append(HistMatchResult(i_search_start + 1, length))

        i_search_start -= 1

    result = sorted(result, key=operator.attrgetter("length"), reverse=True)

    return result


def probs_to_complex(p):
    return p @ basis


def _fix_probs(probs):
    """
    Put probs back into the triangle (the probability simplex). Probs can leave it due to
    rounding errors, or when converting complex numbers that lie outside the triangle.
    """
    if min(probs) < 0:
        probs -= min(probs)

    probs /= sum(probs)

    return probs


def complex_to_probs(z):
    probs = (2 * (z * basis.conjugate()).real + 1) / 3
    probs = _fix_probs(probs)
    return probs
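
# Illustrative sketch (hypothetical helper, not used by the agent): why complex_to_probs
# inverts probs_to_complex. A distribution p maps to z = sum_k p_k * w**k, with
# w = exp(2j*pi/3). The three cube roots of unity sum to zero, so when p sums to 1,
# Re(z * conj(w**j)) = (3 * p_j - 1) / 2, and (2 * Re(...) + 1) / 3 gives back p_j.
import numpy as np


def _demo_probs_roundtrip(p):
    """Encode `p` as a complex point and decode it again (no clipping or renormalising)."""
    w = np.exp(2j * np.pi * np.arange(3) / 3)  # the same three unit vectors as `basis`
    z = np.asarray(p, dtype=float) @ w
    return (2 * (z * w.conjugate()).real + 1) / 3

# Usage: _demo_probs_roundtrip([0.5, 0.3, 0.2]) returns [0.5, 0.3, 0.2] up to float error,
# and the uniform distribution maps to z = 0.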


def z_from_action(action):
    return basis[action]


def sample_from_z(z):
    probs = complex_to_probs(z)
    return np.random.choice(3, p=probs)


def bound(z):
    return probs_to_complex(complex_to_probs(z))


def norm(z):
    return bound(z / abs(z))


class Pred:
    def __init__(self, *, alpha):
        self.offset = 0
        self.alpha = alpha
        self.last_feat = None

    def train(self, target):
        if self.last_feat is not None:
            offset = target * self.last_feat.conjugate()  # rotation mapping last_feat onto target (both unit vectors)

            self.offset = (1 - self.alpha) * self.offset + self.alpha * offset

    def predict(self, feat):
        """
        feat is an arbitrary feature expressed as a point over actions 0, 1, 2:
        anything that could serve as a useful anchor for starting in a sensible direction.
        """
        feat = norm(feat)

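        # offset is a running (exponential) average of target * conj(feat), i.e. the
        # rotation that maps feat onto target, and result = feat * offset applies it.
        # This is the multiplicative analogue of result = feat + mean(target - feat),
        # which seems natural and accounts for the correlation between target and feat.
        # All RPSContest bots do no more than that as their first step, just in a different way.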
        
        result = feat * self.offset

        self.last_feat = feat

        return result
    
    
class BaseAgent:
    def __init__(self):
        self.my_hist = []
        self.opp_hist = []
        self.my_opp_hist = []
        self.outcome_hist = []
        self.step = None

    def __call__(self, obs, conf):
        try:
            if obs.step == 0:
                action = np.random.choice(3)
                self.my_hist.append(action)
                return action

            self.step = obs.step

            opp = int(obs.lastOpponentAction)
            my = self.my_hist[-1]

            self.my_opp_hist.append((my, opp))
            self.opp_hist.append(opp)

            outcome = {0: 0, 1: 1, 2: -1}[(my - opp) % 3]
            self.outcome_hist.append(outcome)

            action = self.action()

            self.my_hist.append(action)

            return action
        except Exception:
            traceback.print_exc(file=sys.stderr)
            raise

    def action(self):
        pass


class Agent(BaseAgent):
    def __init__(self, alpha=0.01):
        super().__init__()

        self.predictor = Pred(alpha=alpha)

    def action(self):
        self.train()

        pred = self.preds()

        return_action = sample_from_z(pred)

        return return_action

    def train(self):
        last_beat_opp = z_from_action((self.opp_hist[-1] + 1) % 3)
        self.predictor.train(last_beat_opp)

    def preds(self):
        hist_match = find_all_longest(self.my_opp_hist, max_len=20)

        if not hist_match:
            return 0

        feat = z_from_action(self.opp_hist[hist_match[0].idx])

        pred = self.predictor.predict(feat)

        return pred
    
    
agent = Agent()


def geometry_agent(obs, conf):
    return agent(obs, conf)



##### ./iocaine.py #####


import random


def recall(age, hist):
    """Looking at the last 'age' points in 'hist', finds the
    last point with the longest similarity to the current point,
    returning 0 if none found."""
    end, length = 0, 0
    for past in range(1, min(age + 1, len(hist) - 1)):
        if length >= len(hist) - past: break
        for i in range(-1 - length, 0):
            if hist[i - past] != hist[i]: break
        else:
            for length in range(length + 1, len(hist) - past):
                if hist[-past - length - 1] != hist[-length - 1]: break
            else: length += 1
            end = len(hist) - past
    return end

def beat(i):
    return (i + 1) % 3
def loseto(i):
    return (i - 1) % 3

class Stats:
    """Maintains three running counts and returns the highest count based
         on any given time horizon and threshold."""
    def __init__(self):
        self.sum = [[0, 0, 0]]
    def add(self, move, score):
        self.sum[-1][move] += score
    def advance(self):
        self.sum.append(list(self.sum[-1]))  # copy, so older snapshots keep their own counts
    def max(self, age, default, score):
        if age >= len(self.sum): diff = self.sum[-1]
        else: diff = [self.sum[-1][i] - self.sum[-1 - age][i] for i in range(3)]
        m = max(diff)
        if m > score: return diff.index(m), m
        return default, score
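
# Illustrative sketch (hypothetical helper, not used by Iocaine): reading a windowed count
# from cumulative snapshots. Stats.max subtracts the snapshot taken `age` steps ago from
# the latest one. This only works if each snapshot is an independent copy; if they all
# alias one list, every difference is zero.
def _demo_windowed_counts(moves, age):
    """Count moves 0/1/2 over the last `age` entries of `moves` via snapshot differences."""
    snapshots = [[0, 0, 0]]
    for move in moves:
        current = list(snapshots[-1])  # copy the previous cumulative counts
        current[move] += 1
        snapshots.append(current)
    if age >= len(snapshots):
        return snapshots[-1]  # horizon longer than the history: use all of it
    return [snapshots[-1][i] - snapshots[-1 - age][i] for i in range(3)]

# Usage: _demo_windowed_counts([0, 0, 1], age=2) -> [1, 1, 0]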

class Predictor:
    """The basic iocaine second- and triple-guesser.    Maintains stats on the
         past benefits of trusting or second- or triple-guessing a given strategy,
         and returns the prediction of that strategy (or the second- or triple-
         guess) if past stats are deviating from zero farther than the supplied
         "best" guess so far."""
    def __init__(self):
        self.stats = Stats()
        self.lastguess = -1
    def addguess(self, lastmove, guess):
        if lastmove != -1:
            diff = (lastmove - self.prediction) % 3
            self.stats.add(beat(diff), 1)
            self.stats.add(loseto(diff), -1)
            self.stats.advance()
        self.prediction = guess
    def bestguess(self, age, best):
        bestdiff = self.stats.max(age, (best[0] - self.prediction) % 3, best[1])
        return (bestdiff[0] + self.prediction) % 3, bestdiff[1]

ages = [1000, 100, 10, 5, 2, 1]

class Iocaine:

    def __init__(self):
        """Build second-guessers for 50 strategies: 36 history-based strategies,
             12 simple frequency-based strategies, the constant-move strategy, and
             the basic random-number-generator strategy.    Also build 6 meta second
             guessers to evaluate 6 different time horizons on which to score
             the 50 strategies' second-guesses."""
        self.predictors = []
        self.predict_history = self.predictor((len(ages), 2, 3))
        self.predict_frequency = self.predictor((len(ages), 2))
        self.predict_fixed = self.predictor()
        self.predict_random = self.predictor()
        self.predict_meta = [Predictor() for a in range(len(ages))]
        self.stats = [Stats() for i in range(2)]
        self.histories = [[], [], []]

    def predictor(self, dims=None):
        """Returns a nested array of predictor objects, of the given dimensions."""
        if dims: return [self.predictor(dims[1:]) for i in range(dims[0])]
        self.predictors.append(Predictor())
        return self.predictors[-1]

    def move(self, them):
        """The main iocaine "move" function."""

        # histories[0] stores our moves (last one already previously decided);
        # histories[1] stores their moves (last one just now being supplied to us);
        # histories[2] stores pairs of our and their last moves.
        # stats[0] and stats[1] are running counters of our recent moves and theirs.
        if them != -1:
            self.histories[1].append(them)
            self.histories[2].append((self.histories[0][-1], them))
            for watch in range(2):
                self.stats[watch].add(self.histories[watch][-1], 1)

        # Execute the basic RNG strategy and the fixed-move strategy.
        rand = random.randrange(3)
        self.predict_random.addguess(them, rand)
        self.predict_fixed.addguess(them, 0)

        # Execute the history and frequency strategies.
        for a, age in enumerate(ages):
            # For each time window, there are three ways to recall a similar time:
            # (0) by history of my moves; (1) their moves; or (2) pairs of moves.
            # Set "best" to these three timeframes (zero if no matching time).
            best = [recall(age, hist) for hist in self.histories]
            for mimic in range(2):
                # For each similar historical moment, there are two ways to anticipate
                # the future: by mimicking what their move was, or mimicking what my
                # move was. If there were no similar moments, just move randomly.
                for watch, when in enumerate(best):
                    if not when: move = rand
                    else: move = self.histories[mimic][when]
                    self.predict_history[a][mimic][watch].addguess(them, move)
                # Also we can anticipate the future by expecting it to be the same
                # as the most frequent past (either counting their moves or my moves).
                mostfreq, score = self.stats[mimic].max(age, rand, -1)
                self.predict_frequency[a][mimic].addguess(them, mostfreq)

        # All the predictors have been updated, but we have not yet scored them
        # and chosen a winner for this round. There are several timeframes
        # on which we can score second-guessing, and we don't know which timeframe
        # will do best. So score all 50 predictors on all 6 timeframes, and record
        # the best 6 predictions in meta predictors, one for each timeframe.
        for meta, age in enumerate(ages):
            best = (-1, -1)
            for predictor in self.predictors:
                best = predictor.bestguess(age, best)
            self.predict_meta[meta].addguess(them, best[0])

        # Finally choose the best meta prediction from the final six, scoring
        # these against each other on the whole-game timeframe. 
        best = (-1, -1)
        for meta in range(len(ages)):
            best = self.predict_meta[meta].bestguess(len(self.histories[0]) , best) 

        # We've picked a next move. Record our move in histories[0] for next time.
        self.histories[0].append(best[0])

        # And return it.
        return best[0]

iocaine = None

def iocaine_agent(observation, configuration):
    global iocaine
    if observation.step == 0:
        iocaine = Iocaine()
        act = iocaine.move(-1)
    else:
        act = iocaine.move(observation.lastOpponentAction)
        
    return act



##### ./decision_tree_2.py #####


import time
import os
import random
import numpy as np
from typing import List, Dict
from sklearn.tree import DecisionTreeClassifier

def random_agent(observation, configuration):
    return random.randint(0, configuration.signs-1)

def rock_agent(observation, configuration):
    return 0

def paper_agent(observation, configuration):
    return 1

def scissors_agent(observation, configuration):
    return 2

def sequential_agent(observation, configuration):
    return observation.step % configuration.signs



def get_winstats(decision_tree_history_2) -> Dict[str,int]:
    total = len(decision_tree_history_2['action'])
    wins = 0
    draw = 0
    loss = 0 
    for n in range(total):
        # Compare modulo 3 so wrap-around results (e.g. rock beats scissors) are counted too
        diff = (decision_tree_history_2['action'][n] - decision_tree_history_2['opponent'][n]) % 3
        if   diff == 1: wins += 1
        elif diff == 0: draw += 1
        else:           loss += 1
    return { "wins": wins, "draw": draw, "loss": loss }

def get_winrate(decision_tree_history_2):
    winstats = get_winstats(decision_tree_history_2)
    winrate  = winstats['wins'] / (winstats['wins'] + winstats['loss']) if (winstats['wins'] + winstats['loss']) else 0
    return winrate
    
    
# Initialize starting decision_tree_history_2
decision_tree_history_2 = {
    "step":        [],
    "prediction1": [],
    "prediction2": [],
    "expected":    [],
    "action":      [],
    "opponent":    [],
}

# NOTE: adding statistics causes the DecisionTree to make random moves 
def get_statistics(values) -> List[float]:
    values = np.array(values)
    return [
        np.count_nonzero(values == n) / len(values)
        if len(values) else 0.0
        for n in [0,1,2]
    ]


# observation   =  {'step': 1, 'lastOpponentAction': 1}
# configuration =  {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def decision_tree_agent_2(observation, configuration, window=5, stages=2, random_freq=0.66, warmup_period=10, max_samples=1000):    
    global decision_tree_history_2
    warmup_period   = warmup_period  # if os.environ.get('KAGGLE_KERNEL_RUN_TYPE','') != 'Interactive' else 0
    models          = [ None ] + [ DecisionTreeClassifier() for _ in range(stages) ]  # one independent model per stage
    
    time_start      = time.perf_counter()
    actions         = list(range(configuration.signs))  # [0,1,2]
    
    step            = observation.step
    last_action     = decision_tree_history_2['action'][-1]          if len(decision_tree_history_2['action']) else 2
    opponent_action = observation.lastOpponentAction if observation.step > 0   else 2
        
    if observation.step > 0:
        decision_tree_history_2['opponent'].append(opponent_action)
        
    winrate  = get_winrate(decision_tree_history_2)
    winstats = get_winstats(decision_tree_history_2)
    
    # Set default values     
    prediction1 = random.randint(0,2)
    prediction2 = random.randint(0,2)
    prediction3 = random.randint(0,2)
    expected    = random.randint(0,2)

    # We need at least some turns of decision_tree_history_2 for DecisionTreeClassifier to work
    if observation.step >= window:
        # First we try to predict the opponents next move based on move decision_tree_history_2
        # TODO: create windowed decision_tree_history_2
        try:
            n_start = max(1, len(decision_tree_history_2['opponent']) - window - max_samples) 
            if stages >= 1:
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['opponent'][n:n+window]
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:] + [ last_action ], 
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[1].fit(X, Y)
                expected = prediction1 = models[1].predict(Z)[0]

            if stages >= 2:
                # Now retrain including prediction decision_tree_history_2
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['prediction1'][n:n+window],
                        decision_tree_history_2['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['prediction1']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_2['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[2].fit(X, Y)
                expected = prediction2 = models[2].predict(Z)[0]

            if stages >= 3:
                # Now retrain including prediction decision_tree_history_2
                X = np.stack([
                    np.array([
                        # get_statistics(decision_tree_history_2['action'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction1'][:n+window]),
                        # get_statistics(decision_tree_history_2['prediction2'][:n+window]),
                        # get_statistics(decision_tree_history_2['opponent'][:n-1+window]),
                        decision_tree_history_2['action'][n:n+window], 
                        decision_tree_history_2['prediction1'][n:n+window],
                        decision_tree_history_2['prediction2'][n:n+window],
                        decision_tree_history_2['opponent'][n:n+window],
                    ]).flatten()
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])
                Y = np.array([
                    decision_tree_history_2['opponent'][n+window]
                    for n in range(n_start,len(decision_tree_history_2['opponent'])-window-warmup_period) 
                ])  
                Z = np.array([
                    # get_statistics(decision_tree_history_2['action']),
                    # get_statistics(decision_tree_history_2['prediction1']),
                    # get_statistics(decision_tree_history_2['prediction2']),
                    # get_statistics(decision_tree_history_2['opponent']),
                    decision_tree_history_2['action'][-window+1:]      + [ last_action ], 
                    decision_tree_history_2['prediction1'][-window+1:] + [ prediction1 ],
                    decision_tree_history_2['prediction2'][-window+1:] + [ prediction2 ],
                    decision_tree_history_2['opponent'][-window:] 
                ]).flatten().reshape(1, -1)

                models[3].fit(X, Y)
                expected = prediction3 = models[3].predict(Z)[0]
        
        except Exception:
            # Fall back to the random default predictions set above if fitting fails
            pass
                    
    # During the warmup period, play random to get a feel for the opponent 
    if (observation.step <= max(warmup_period,window)):
        actor  = 'warmup'
        action = random_agent(observation, configuration)    
    
    # Play a purely random move occasionally, which will hopefully distort any opponent statistics
    elif (random.random() <= random_freq):
        actor  = 'random'
        action = random_agent(observation, configuration)
        
    # But mostly use DecisionTreeClassifier to predict the next move
    else:
        actor  = 'DecisionTree'
        action = (expected + 1) % configuration.signs
    
    # Persist state
    decision_tree_history_2['step'].append(step)
    decision_tree_history_2['prediction1'].append(prediction1)
    decision_tree_history_2['prediction2'].append(prediction2)
    decision_tree_history_2['expected'].append(expected)
    decision_tree_history_2['action'].append(action)
    if observation.step == 0:  # keep arrays equal length
        decision_tree_history_2['opponent'].append(random.randint(0, 2))


    # Timing for debugging (the value is currently unused)
    time_taken = time.perf_counter() - time_start
    return int(action)
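
# Illustrative sketch (hypothetical helper, not used by the agent): the stage-1 feature
# layout in decision_tree_agent_2. Each training row concatenates a window of our moves
# with the same window of opponent moves. The label is the opponent move that follows
# the window. The agent also starts at n_start, which skips the placeholder first entry
# and caps the sample count, and it stops warmup_period rows early. Those details are
# omitted here.
import numpy as np


def _demo_window_dataset(actions, opponent, window):
    """Build (X, Y) with X[n] = actions[n:n+window] + opponent[n:n+window], Y[n] = opponent[n+window]."""
    rows = range(len(opponent) - window)
    X = np.array([list(actions[n:n + window]) + list(opponent[n:n + window]) for n in rows])
    Y = np.array([opponent[n + window] for n in rows])
    return X, Y

# Usage: _demo_window_dataset([0, 1, 2, 0, 1], [1, 2, 0, 1, 2], window=2)
# gives X rows [0, 1, 1, 2], [1, 2, 2, 0], [2, 0, 0, 1] and Y = [0, 1, 2].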



##### ./centrifugal_bumblepuppy_v4.py #####

code_bumblepuppy = compile(
    """
#                         WoofWoofWoof
#                     Woof            Woof
#                Woof                      Woof
#              Woof                          Woof
#             Woof  Centrifugal Bumble-puppy  Woof
#              Woof                          Woof
#                Woof                      Woof
#                     Woof            Woof
#                         WoofWoofWoof

import random

number_of_predictors = 60 #yes, this really has 60 predictors.
number_of_metapredictors = 4 #actually, I lied! This has 240 predictors.


if not input:
	limits = [50,20,6]
	beat={'R':'P','P':'S','S':'R'}
	urmoves=""
	mymoves=""
	DNAmoves=""
	outputs=[random.choice("RPS")]*number_of_metapredictors
	predictorscore1=[3]*number_of_predictors
	predictorscore2=[3]*number_of_predictors
	predictorscore3=[3]*number_of_predictors
	predictorscore4=[3]*number_of_predictors
	nuclease={'RP':'a','PS':'b','SR':'c','PR':'d','SP':'e','RS':'f','RR':'g','PP':'h','SS':'i'}
	length=0
	predictors=[random.choice("RPS")]*number_of_predictors
	metapredictors=[random.choice("RPS")]*number_of_metapredictors
	metapredictorscore=[3]*number_of_metapredictors
else:

	for i in range(number_of_predictors):
		#metapredictor 1
		predictorscore1[i]*=0.8
		predictorscore1[i]+=(input==predictors[i])*3
		predictorscore1[i]-=(input==beat[beat[predictors[i]]])*3
		#metapredictor 2: beat metapredictor 1 (probably contains a bug)
		predictorscore2[i]*=0.8
		predictorscore2[i]+=(output==predictors[i])*3
		predictorscore2[i]-=(output==beat[beat[predictors[i]]])*3
		#metapredictor 3
		predictorscore3[i]+=(input==predictors[i])*3
		if input==beat[beat[predictors[i]]]:
			predictorscore3[i]=0
		#metapredictor 4: beat metapredictor 3 (probably contains a bug)
		predictorscore4[i]+=(output==predictors[i])*3
		if output==beat[beat[predictors[i]]]:
			predictorscore4[i]=0
			
	for i in range(number_of_metapredictors):
		metapredictorscore[i]*=0.96
		metapredictorscore[i]+=(input==metapredictors[i])*3
		metapredictorscore[i]-=(input==beat[beat[metapredictors[i]]])*3
		
	
	#Predictors 1-18: History matching
	urmoves+=input		
	mymoves+=output
	DNAmoves+=nuclease[input+output]
	length+=1
	
	for z in range(3):
		limit = min([length,limits[z]])
		j=limit
		while j>=1 and not DNAmoves[length-j:length] in DNAmoves[0:length-1]:
			j-=1
		if j>=1:
			i = DNAmoves.rfind(DNAmoves[length-j:length],0,length-1) 
			predictors[0+6*z] = urmoves[j+i] 
			predictors[1+6*z] = beat[mymoves[j+i]] 
		j=limit			
		while j>=1 and not urmoves[length-j:length] in urmoves[0:length-1]:
			j-=1
		if j>=1:
			i = urmoves.rfind(urmoves[length-j:length],0,length-1) 
			predictors[2+6*z] = urmoves[j+i] 
			predictors[3+6*z] = beat[mymoves[j+i]] 
		j=limit
		while j>=1 and not mymoves[length-j:length] in mymoves[0:length-1]:
			j-=1
		if j>=1:
			i = mymoves.rfind(mymoves[length-j:length],0,length-1) 
			predictors[4+6*z] = urmoves[j+i] 
			predictors[5+6*z] = beat[mymoves[j+i]]
	#Predictor 19,20: RNA Polymerase		
	L=len(mymoves)
	i=DNAmoves.rfind(DNAmoves[L-j:L-1],0,L-2)
	while i==-1:
		j-=1
		i=DNAmoves.rfind(DNAmoves[L-j:L-1],0,L-2)
		if j<2:
			break
	if i==-1 or j+i>=L:
		predictors[18]=predictors[19]=random.choice("RPS")
	else:
		predictors[18]=beat[mymoves[j+i]]
		predictors[19]=urmoves[j+i]

	#Predictors 21-60: rotations of Predictors 1:20
	for i in range(20,60):
		predictors[i]=beat[beat[predictors[i-20]]] #Trying to second guess me?
	
	metapredictors[0]=predictors[predictorscore1.index(max(predictorscore1))]
	metapredictors[1]=beat[predictors[predictorscore2.index(max(predictorscore2))]]
	metapredictors[2]=predictors[predictorscore3.index(max(predictorscore3))]
	metapredictors[3]=beat[predictors[predictorscore4.index(max(predictorscore4))]]
	
	#compare predictors
output = beat[metapredictors[metapredictorscore.index(max(metapredictorscore))]]
if max(metapredictorscore)<0:
	output = beat[random.choice(urmoves)]
""", '<string>', 'exec')
gg_bumblepuppy = {}


def centrifugal_bumblepuppy(observation, configuration):
    global gg_bumblepuppy
    global code_bumblepuppy
    inp = ''
    try:
        inp = 'RPS'[observation.lastOpponentAction]
    except Exception:  # no usable previous opponent move (e.g. on the first step)
        pass
    gg_bumblepuppy['input'] = inp
    exec(code_bumblepuppy, gg_bumblepuppy)
    return {'R': 0, 'P': 1, 'S': 2}[gg_bumblepuppy['output']]





##### ./CNN.py ####

bs = 6 # batch size  

opponent_actions = []
agent_actions = []
actions = []
batch_x = []
batch_y = []

class RPS(nn.Module):
    """
    Class that predicts logits of action probabilities given the game history.
        Inputs: game history [bs, 2, 10].
        Outputs: logits of action probabilities [bs, 3].
    """
    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv1d(2, 4, 3, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2),
            nn.Conv1d(4, 8, 3, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2),
            nn.Conv1d(8, 16, 2, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2)
        )
        self.head = nn.Sequential(
            nn.Linear(16, 6),
            nn.ReLU(),
            nn.Linear(6, 3)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = torch.flatten(x, 1)
        x = self.head(x)
        return x

def soft_cross_entropy(target, prediction):
    log_probs = nn.functional.log_softmax(prediction, dim=1)
    sce = -(target * log_probs).sum() / target.shape[0]
    return sce

def train_step(model, data, optimizer):
    model.train()
    torch.set_grad_enabled(True)

    X = data['X'].view(-1, 2, 10)
    y = data['y'].view(-1, 3)
    prd = model(X)
    loss = soft_cross_entropy(y, prd)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model = RPS()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

def CNN_agent(observation, configuration):
    
    global actions, agent_actions, opponent_actions
    global model, optimizer
    global batch_x, batch_y
    global bs
    
    # first step
    if observation.step == 0:
        hand = int(np.random.randint(3))  # upper bound is exclusive, so 3 covers rock, paper and scissors
        actions.append(hand)
        return hand
    
    # first warm up rounds
    if 0 < observation.step < 12:
        opponent_actions.append(observation.lastOpponentAction)
        agent_actions.append(actions[-1])
        hand = int(np.random.randint(3))
        actions.append(hand)
        return hand
    
    # start to train CNN
    elif observation.step >= 12:
        opponent_actions.append(observation.lastOpponentAction)
        agent_actions.append(actions[-1])
        
        winning_action = (opponent_actions[-1] + 1) % 3 
        fair_action = opponent_actions[-1]
        lose_action = (opponent_actions[-1] - 1) % 3 

        # soft labels for target    
        y = [0, 0, 0]
        y[winning_action] = 0.7
        y[fair_action] = 0.2
        y[lose_action] = 0.1 
        
        # add data for history
        batch_x.append([opponent_actions[-2:-12:-1],
                        agent_actions[-2:-12:-1]])
        batch_y.append(y)
        
        # data for single CNN update 
        data = {'X': torch.Tensor([opponent_actions[-2:-12:-1],
                                   agent_actions[-2:-12:-1]]),
                'y': torch.Tensor(y)} 
        
        # evaluate single training step
        train_step(model, data, optimizer)
        
        # evaluate mini-batch training steps
        if observation.step % 10 == 0:
            k = 1 if observation.step < 100 else 3
            for _ in range(k):
                idxs = np.random.choice(list(range(len(batch_y))), bs)
                data = {'X': torch.Tensor(np.array(batch_x)[idxs]),
                        'y': torch.Tensor(np.array(batch_y)[idxs])}
                train_step(model, data, optimizer)
        
        # data for current action prediction
        X_prd = torch.Tensor([opponent_actions[-1:-11:-1],
                              agent_actions[-1:-11:-1]]).view(1, 2, -1)
        
        # predict logits
        probs = model(X_prd).view(3)
        # calculate probabilities
        probs = nn.functional.softmax(probs, dim=0).detach().cpu().numpy()
        
        # choose action
        hand = np.random.choice([0, 1, 2], p=probs)
        actions.append(hand)
        
        return int(hand)




#### ./anti_otm.py ####
T = np.zeros((3, 3))
P = np.zeros((3, 3))

a1, a2 = None, None
last_action = None # track my action.


###########################################
# Original agent with modifications marked ->
###########################################

def anti_transition_agent(observation, configuration):
    global T, P, a1, a2, last_action
    if observation.step > 1:
        a1 = last_action   # on me only; take mirrored view on game
        T[a2, a1] += 1
        P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))
        a2 = a1
        if np.sum(P[a1, :]) == 1:
            probs = P[a1,:]
            
            probs += 0.63 * np.roll(probs, 1)    # This is the magic addition of phase
            
            result = (int(probs.argmax()) + 1) % 3   # Changed to argmax instead of stochastic
        else:
            result = int(np.random.randint(3))
    else:
        if observation.step == 1:
            a2 = last_action    # on me only
        result = int(np.random.randint(3))
        
    result = (result + 1) % 3  # beat what he would have done
        
    last_action = result
        
    return result



##### ./multi_armed_bandit.py #####


from collections import defaultdict
import numpy as np
import os  # used for the KAGGLE_KERNEL_RUN_TYPE check below
import time
from operator import itemgetter

mlb_opponent = []
mlb_expected = defaultdict(list)
mlb_agents   = {
#     'random':               (lambda obs, conf: random_agent(obs, conf)),
#     'pi':                   (lambda obs, conf: pi_agent(obs, conf)),
    'anti_pi':              (lambda obs, conf: anti_pi_agent(obs, conf)),
#     'anti_anti_pi':         (lambda obs, conf: anti_anti_pi_agent(obs, conf)),
#     'reactionary':          (lambda obs, conf: reactionary(obs, conf)),
#     'anti_rotn':            (lambda obs, conf: anti_rotn(obs, conf, warmup=1)),
    
    'iou2':                  (lambda obs, conf: iou2_agent(obs, conf)),
    'geometry':              (lambda obs, conf: geometry_agent(obs, conf)),
    'memory_patterns_v20':   (lambda obs, conf: memory_patterns_v20(obs, conf)),
    'testing_please_ignore': (lambda obs, conf: testing_please_ignore(obs, conf)),
    'bumblepuppy':           (lambda obs, conf: centrifugal_bumblepuppy(obs, conf)), 
    'dllu1_agent':           (lambda obs, conf: dllu1_agent(obs, conf)), 
    
    'memory_patterns':       (lambda obs, conf: memory_patterns(obs, conf)),
    'naive_bayes':           (lambda obs, conf: naive_bayes(obs, conf)),
    'iocaine':               (lambda obs, conf: iocaine_agent(obs, conf)),
    'greenberg':             (lambda obs, conf: greenberg_agent(obs, conf)),
    'statistical':           (lambda obs, conf: statistical_prediction_agent(obs, conf)),
    'statistical_expected':  (lambda obs, conf: statistical_history['expected'][-1] + 1),       
    'decision_tree_1':       (lambda obs, conf: decision_tree_agent_1(obs, conf, stages=1, window=4)),
    'decision_tree_2':       (lambda obs, conf: decision_tree_agent_2(obs, conf, stages=2, window=6)),
    'decision_tree_3':       (lambda obs, conf: decision_tree_agent_3(obs, conf, stages=3, window=10)),
    'CNN':                   (lambda obs, conf: CNN_agent(obs, conf)),
    'anti_otm':              (lambda obs, conf: anti_transition_agent(obs, conf)),
}

# observation   = {'step': 1, 'lastOpponentAction': 1}
# configuration = {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def multi_armed_bandit_agent(observation, configuration, warmup=1, step_reward=3, decay_rate=0.95 ):
    global mlb_expected
    global mlb_opponent
    global mlb_agents
    time_start = time.perf_counter()
    if os.environ.get('KAGGLE_KERNEL_RUN_TYPE', 'Localhost') == 'Interactive':
        warmup = 1

    
    if observation.step != 0: 
        mlb_opponent += [ observation.lastOpponentAction ]
    # else:
    #     mlb_opponent += [ random_agent(observation, configuration) ]
    
    
    # Implement Multi Armed Bandit Logic
    win_loss_scores = defaultdict(lambda: [0.0, 0.0])
    for name, values in list(mlb_expected.items()):
        for n in range(min(len(values), len(mlb_opponent))):
            win_loss_scores[name][1] = (win_loss_scores[name][1] - 1) * decay_rate + 1
            win_loss_scores[name][0] = (win_loss_scores[name][0] - 1) * decay_rate + 1
            
            # win | expect rock, play paper -> opponent plays rock
            if   mlb_expected[name][n] == (mlb_opponent[n] + 0) % configuration.signs:                
                win_loss_scores[name][0] += step_reward 
                
            # draw | expect rock, play paper -> opponent plays paper
            elif mlb_expected[name][n] == (mlb_opponent[n] + 2) % configuration.signs:  
                win_loss_scores[name][0] += step_reward 
                win_loss_scores[name][1] += step_reward 
                
            # lose | expect rock, play paper -> opponent plays scissors
            elif mlb_expected[name][n] == (mlb_opponent[n] + 1) % configuration.signs:
                win_loss_scores[name][1] += step_reward 
      

    # Update predictions for next turn
    for name, agent_fn in list(mlb_agents.items()):
        try:
            agent_action        = agent_fn(observation, configuration)
            agent_expected      = (agent_action - 1) % configuration.signs
            mlb_expected[name] += [ agent_expected ]
        except Exception as exception:
            print('Exception:', name, agent_fn, exception)
    
    
    # Pick the Best Agent
    beta_scores = {
        name: np.random.beta(win_loss_scores[name][0], win_loss_scores[name][1])
        for name in win_loss_scores.keys()
    }

    if observation.step == 0:
        # Always play scissors first move
        # At Auction       - https://www.artsy.net/article/artsy-editorial-christies-sothebys-played-rock-paper-scissors-20-million-consignment
        # EDA best by test - https://www.kaggle.com/jamesmcguigan/rps-episode-archive-dataset-eda
        agent_name = 'scissors'
        expected = 1  
    elif observation.step < warmup:
        agent_name = 'random'
        expected   = random_agent(observation, configuration)       
    else:
        agent_name = sorted(beta_scores.items(), key=itemgetter(1), reverse=True)[0][0]
        expected   = mlb_expected[agent_name][-1]
    
    action = (expected + 1) % configuration.signs
        
                
    time_taken = time.perf_counter() - time_start
    print(f'opponent        = ', mlb_opponent)
    print(f'expected        = ', dict(mlb_expected),    '\n')
    print(f'win_loss_scores = ', dict(win_loss_scores), '\n')
    print(f'beta_scores     = ', dict(beta_scores),     '\n')
    print(f'action          =  {action} | agent = {agent_name} | step = {observation.step} | {time_taken:.3f}s')
    print('-'*20, '\n')
    
    return int(action)






In [None]:
%%writefile "CNN.py"
import numpy as np

import torch
from torch import nn, optim

from kaggle_environments import evaluate, make, utils
from kaggle_environments.envs.rps.utils import get_score
from kaggle_environments.envs.rps.agents import *

class RPS(nn.Module):
    """
    Class that predicts logits of action probabilities given the game history.
        Inputs: game history [bs, 2, 10].
        Outputs: logits of action probabilities [bs, 3].
    """
    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv1d(2, 4, 3, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2),
            nn.Conv1d(4, 8, 3, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2),
            nn.Conv1d(8, 16, 2, 1, 1, bias=False),
            nn.ReLU(),
            nn.AvgPool1d(2)
        )
        self.head = nn.Sequential(
            nn.Linear(16, 6),
            nn.ReLU(),
            nn.Linear(6, 3)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = torch.flatten(x, 1)
        x = self.head(x)
        return x

def soft_cross_entropy(target, prediction):
    log_probs = nn.functional.log_softmax(prediction, dim=1)
    sce = -(target * log_probs).sum() / target.shape[0]
    return sce

def train_step(model, data, optimizer):
    model.train()
    torch.set_grad_enabled(True)

    X = data['X'].view(-1, 2, 10)
    y = data['y'].view(-1, 3)
    prd = model(X)
    loss = soft_cross_entropy(y, prd)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

bs = 6 # batch size  

opponent_actions = []
agent_actions = []
actions = []
batch_x = []
batch_y = []



model = RPS()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

def agent(observation, configuration):
    
    global actions, agent_actions, opponent_actions
    global model, optimizer
    global batch_x, batch_y
    global bs
    
    # first step
    if observation.step == 0:
        hand = int(np.random.randint(3))  # upper bound is exclusive, so 3 covers rock, paper and scissors
        actions.append(hand)
        return hand
    
    # first warm up rounds
    if 0 < observation.step < 12:
        opponent_actions.append(observation.lastOpponentAction)
        agent_actions.append(actions[-1])
        hand = int(np.random.randint(3))
        actions.append(hand)
        return hand
    
    # start to train CNN
    elif observation.step >= 12:
        opponent_actions.append(observation.lastOpponentAction)
        agent_actions.append(actions[-1])
        
        winning_action = (opponent_actions[-1] + 1) % 3 
        fair_action = opponent_actions[-1]
        lose_action = (opponent_actions[-1] - 1) % 3 

        # soft labels for target    
        y = [0, 0, 0]
        y[winning_action] = 0.7
        y[fair_action] = 0.2
        y[lose_action] = 0.1 
        
        # add data for history
        batch_x.append([opponent_actions[-2:-12:-1],
                        agent_actions[-2:-12:-1]])
        batch_y.append(y)
        
        # data for single CNN update 
        data = {'X': torch.Tensor([opponent_actions[-2:-12:-1],
                                   agent_actions[-2:-12:-1]]),
                'y': torch.Tensor(y)} 
        
        # evaluate single training step
        train_step(model, data, optimizer)
        
        # evaluate mini-batch training steps
        if observation.step % 10 == 0:
            k = 1 if observation.step < 100 else 3
            for _ in range(k):
                idxs = np.random.choice(list(range(len(batch_y))), bs)
                data = {'X': torch.Tensor(np.array(batch_x)[idxs]),
                        'y': torch.Tensor(np.array(batch_y)[idxs])}
                train_step(model, data, optimizer)
        
        # data for current action prediction
        X_prd = torch.Tensor([opponent_actions[-1:-11:-1],
                              agent_actions[-1:-11:-1]]).view(1, 2, -1)
        
        # predict logits
        probs = model(X_prd).view(3)
        # calculate probabilities
        probs = nn.functional.softmax(probs, dim=0).detach().cpu().numpy()
        
        # choose action
        hand = np.random.choice([0, 1, 2], p=probs)
        actions.append(hand)
        
        return int(hand)


# Q-Learning Code

There are a few insights I learned from getting the Q-learning algorithm to work.
1. Looking at the last 5 moves of history (per player) is close to the optimal history length for this Q-Learning setup.
2. Add noise to the Q-Table to make it robust.

The second point is crucial to getting the agent to work. A Q-Table is static and doesn't change much from move to move, so any medium-strength agent can quickly learn and exploit it. If you add noise to the Q-Table, however, the agent will sometimes pick the second-best (or, in the extreme, the third-best) move, which makes it less predictable.

As with all things in life, the amount of noise has to be just right. Too little and your agent can be exploited; too much and your agent can't use its knowledge and becomes a random agent.
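A quick way to see what `delta` does is to take one state's Q-values, add Gaussian noise scaled by `delta` (the same idea as in `chooseAction()`), and count how often the top action still wins the argmax. The Q-values and the helper `best_action_rate` below are made up for illustration; this is a sketch, not part of the agent.

```python
import numpy as np


def best_action_rate(q_column, delta, trials=10_000, seed=0):
    """Fraction of trials in which the noisy argmax equals the noiseless argmax."""
    rng = np.random.default_rng(seed)
    q_column = np.asarray(q_column, dtype=float)
    best = int(np.argmax(q_column))
    # Q-values plus delta * standard normal noise, one row per trial
    noisy = q_column + delta * rng.standard_normal((trials, q_column.size))
    return float(np.mean(np.argmax(noisy, axis=1) == best))


# Hypothetical Q-values for one state: action 1 (paper) is slightly preferred
q = [0.10, 0.35, 0.05]
for delta in (0.0, 0.2, 1.0, 10.0):
    print(f"delta={delta:>4}: best action chosen {best_action_rate(q, delta):.2f} of the time")
```

With no noise the best action is always chosen (fully predictable); with very large noise the choice approaches one in three, i.e. a random agent. The useful `delta` sits somewhere in between.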

The code is commented in as much detail as I could provide.
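At its core, `learn()` in the next cell applies the standard one-step Q-learning update, Q(s, a) += lr * (r + gamma * max Q(s', .) - Q(s, a)). The snippet below works through one update with the notebook's `lr = 0.9` and `gamma = 0.96` on a tiny made-up table; `q_update` is a standalone restatement of that formula, not the notebook's own function.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, lr=0.9, gamma=0.96):
    """One-step Q-learning update with the same arithmetic as learn().

    q_table has shape (num_actions, num_states), matching createQTable().
    """
    q_predict = q_table[action, state]
    q_target = reward + gamma * q_table[:, next_state].max()
    q_table[action, state] += lr * (q_target - q_predict)
    return q_table[action, state]


# Tiny hypothetical table: 3 actions x 2 states
q = np.zeros((3, 2))
q[2, 1] = 0.5  # the best action in state 1 is currently worth 0.5

# We were in state 0, played action 1 (paper), won (+1), and landed in state 1.
new_value = q_update(q, state=0, action=1, reward=1, next_state=1)
print(new_value)  # 0.9 * (1 + 0.96 * 0.5 - 0.0), approximately 1.332
```

Because `lr` is high (0.9), a single result moves the stored value most of the way to the new target, so the table adapts quickly to the opponent in front of it.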

In [None]:
import numpy as np
import random

lr = 0.9
gamma = 0.96
epsilon = 0.05 # Probability of EXPLOITING the Q-Table (see chooseAction()). Starts low, i.e. mostly random moves, and is raised inside agent().
beat = {0:1, 1:2, 2:0}
delta = 0.2   # This is to control the amount of noise. Check chooseAction() for details

# Look at the last 5 moves for both of us, so 10 moves in total (5 + 5).
# The Q-Table has 3**(2 * numPast) states (59,049 for numPast = 5), so it is GIGANTIC! There's no way to fill all values of the table without pre-training.
# Small Q-Tables don't work, so we're stuck with this for now.
# Go even bigger and it stops working again. 5 seems to be the sweet spot.
numPast = 5

# Map the game situation to the state number
gameToState = {}

# Just setting qTable to 0 so that python sees it as a global variable. createQTable will initialize the QTable
qTable = 0

# Initializes the Q-Table with zeros: one row per action, one column per state
def createQTable(states, actions):
    global qTable
    qTable = np.zeros((actions, states))  # plain ndarray; NumPy no longer recommends np.matrix

# Creates a dict mapping each game history string (e.g. "0112" when numPast = 2) to a state number.
def createGameToStateMap(numPast):
    global gameToState
    for count in range(3**(2*numPast)):
        # Base-3 digits of the state number, left-padded to 2*numPast characters
        key = np.base_repr(count, base=3).zfill(2*numPast)
        gameToState[key] = count


# Chooses an action from the Q-Table. Exploitation vs exploration is handled here.
# Note: in this notebook, epsilon is the probability of EXPLOITING the table.
def chooseAction(currentState):
    global epsilon, qTable, delta
    if np.random.uniform() < epsilon:  # With probability epsilon, use the Q-Table; otherwise play randomly.
        
        # The lines below are one of the most important additions to get Q-Learning to work:
        # add noise drawn from a normal distribution, scaled by delta, to this state's Q-values.
        # Only the current state's column matters, so we don't add noise to the whole table every move.
        stateAction = np.asarray(qTable[:, currentState], dtype=float).ravel()
        stateAction = stateAction + delta * np.random.randn(stateAction.size)
        
        if np.max(stateAction) == np.min(stateAction):  # All actions tie (practically only when delta == 0), so pick randomly.
            action = random.choice([0, 1, 2])
        else:
            action = int(np.argmax(stateAction))
    
    else:
        action = random.choice([0, 1, 2])
    return action


# Update the QTable based on the rewards received
def learn(currentState, action, reward, nextState):
    global qTable, lr, gamma
    qPredict = qTable[action, currentState]

    qTarget = reward + gamma * qTable[:, nextState].max()

    qTable[action, currentState] += lr * (qTarget - qPredict)

    
# To calculate reward
def winLose(myMove, oppMove):
    if oppMove == beat[myMove]:
        return -1
    elif oppMove == myMove:
        return 0
    elif myMove == beat[oppMove]:
        return 1

# Convert history array into game string. Used for finding which state in the q-table we are in.
def generateGameString(myHist, oppHist, numPast):
    gameString = ""
    for i in range(numPast):
        gameString += str(myHist[len(myHist) - 1 - i])
        gameString += str(oppHist[len(oppHist) - 1 - i])
    return gameString
        
myHist = []
oppHist = []
move = 0
nextState = 0
currentState = 0

createQTable(3**(2*numPast), 3)
createGameToStateMap(numPast)

def agent(observation, configuration):
    global myHist, oppHist, numPast, gameToState, move, nextState, currentState, epsilon
    game_num = observation.step
    if game_num == 0:
        epsilon = 0.05
        myHist = []
        oppHist = []
        nextState = 0
        currentState = 0
        move = random.choice([0, 1, 2])
        return move
    lastOppMove = observation.lastOpponentAction

    if game_num < 5:
        myHist.append(move)
        oppHist.append(lastOppMove)
        move = random.choice([0, 1, 2])
        return move

    if game_num < 6: # First time we have just enough history to make a move using the Q-Table, but not enough to train it first.
        myHist.append(move)
        oppHist.append(lastOppMove)
        
        nextState = gameToState[generateGameString(myHist, oppHist, numPast)]
        move = chooseAction(nextState)
        return move

    # Here, we first update the Q-Table based on what the bot played last time. Then we make the next move. 
    currentState = nextState
    
    myHist.append(move)
    oppHist.append(lastOppMove)

    nextState = gameToState[generateGameString(myHist, oppHist, numPast)]
    reward = winLose(move, lastOppMove)
    
    learn(currentState, move, reward, nextState)
    move = chooseAction(nextState)
    
    # This is another important aspect for the bot. Since every opponent is practically a new opponent, we have to learn its specific quirks first.
    # For this, epsilon starts at 0.05 (about 95% random moves) and rises by 0.8/150 per move until it passes 0.9 (about 10% random), which takes roughly 160 moves.
    if epsilon < 0.9:
        epsilon += (0.8 / 150)
    return move

A small thing to keep in mind: the Q-Learning code block above cannot be a separate .py file, because we DON'T want the Q-Table to be reset between games during training. Once training is done, we can convert it to a .py file for submission.
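One more detail of `agent()` worth checking: `epsilon` here is the probability of *exploiting* the Q-Table, the reverse of the usual convention, and it grows by 0.8/150 per learning move while it is below 0.9. The short loop below (a standalone replay of that schedule, with the hypothetical helper `epsilon_ramp`) counts how many moves the ramp takes.

```python
def epsilon_ramp(start=0.05, cap=0.9, increment=0.8 / 150):
    """Replay the schedule at the end of agent(): epsilon grows until it reaches cap."""
    epsilon, steps = start, 0
    while epsilon < cap:  # same condition as `if epsilon < 0.9` in agent()
        epsilon += increment
        steps += 1
    return steps, epsilon


steps, final_eps = epsilon_ramp()
print(steps, round(final_eps, 4), f"random share ~ {1 - final_eps:.1%}")
```

So the agent starts out about 95% random and settles just under 10% random after about 160 learning moves; since learning starts at step 6, that is around step 165 of the 1000-step episode.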

# Training

Pre-training the Q-Table is what got my agent to work. Since the table has 3 × 3^10 = 177,147 values, it needs pre-training to learn the "general structure" of an agent.
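The table size follows from the state encoding: each of the last `numPast` rounds contributes two moves (mine and the opponent's) with 3 possible values each, giving 3**(2*numPast) states, times 3 actions. A quick calculation for a few history lengths, assuming float64 entries (the `np.zeros` default); `q_table_size` is an illustrative helper:

```python
def q_table_size(num_past, num_actions=3):
    """Number of states, number of Q-values, and approximate memory for a history length."""
    num_states = 3 ** (2 * num_past)    # two moves per past round, 3 choices each
    num_values = num_actions * num_states
    megabytes = num_values * 8 / 1e6    # 8 bytes per float64 entry
    return num_states, num_values, megabytes


for n in range(1, 8):
    states, values, mb = q_table_size(n)
    print(f"numPast={n}: {states:>9,} states, {values:>10,} Q-values, ~{mb:,.1f} MB")
```

With `numPast = 5` this gives 59,049 states and 177,147 Q-values (about 1.4 MB). Each extra round of history multiplies the table by 9, which suggests why longer histories were much harder to fill with training data.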

In [None]:
# Define all enemies
agents = ['statistical', 'copy_opponent.py', 'reactionary.py', 'preCoded_hist.py', 'counter_reactionary.py', 'markov_agent', 'memory_patterns.py', 
         'iocaine.py', 'greenberg.py', 'xgboost.py', 'multi_armed_bandit.py', 'opponent_transition_matrix.py', 'decision_tree_classifier.py', 'statistical_prediction.py', 
          'not_losing.py', 'simple_method.py', 'geometry.py', 'anti_geo.py', 'anti_otm.py', 'new_mlb.py', 'CNN.py']

In [None]:
# Keep Track of Score performance
scoresOverTime = {}
for opp in agents:
    scoresOverTime[opp] = []

scoresOverTime

This is the part where we train. 
`epochs` is the number of times the Q-Learning bot plays a full game against every agent in the list. I set it to 25 for now: too few and the bot doesn't learn enough, too many and it overfits to the agents in the sample.
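One way to choose the number of epochs is to watch whether the per-opponent scores in `scoresOverTime` are still improving. The helper below (`early_vs_late`, a hypothetical addition not in the original notebook) compares the mean reward over the first and last few epochs; it is shown on made-up numbers so it runs on its own.

```python
from statistics import mean


def early_vs_late(scores_over_time, window=5):
    """Mean reward over the first and last `window` epochs for each opponent."""
    summary = {}
    for opp, scores in scores_over_time.items():
        if len(scores) < 2 * window:
            continue  # not enough epochs to compare yet
        summary[opp] = (mean(scores[:window]), mean(scores[-window:]))
    return summary


# Made-up rewards for two opponents over 10 epochs, for illustration only
fake_scores = {
    "copy_opponent.py": [5, 40, 120, 300, 450, 600, 700, 750, 760, 770],
    "geometry.py":      [-80, -60, -40, -20, -10, 0, 10, 5, -5, 15],
}
for opp, (early, late) in early_vs_late(fake_scores).items():
    print(f"{opp:<20} first epochs: {early:7.1f}   last epochs: {late:7.1f}")
```

Improvement against the training opponents alone will not reveal overfitting; spotting that would require a few opponents held out of the `agents` list.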

In [None]:
epochs = 25

for i in range(epochs):
    print()
    print(f"EPOCH  {i + 1} ---------------------------------")
    for opp in agents:
        print("Now Playing Agent " + opp, end='')
        gamePlayed = env.run([agent, opp])
        reward = gamePlayed[-1][0]['reward']  # our reward at the final step of the game
        print("   Score -> " + str(reward))
        scoresOverTime[opp].append(reward)
print("Finished Training")

Based on the evaluation, I noticed that the Q-Learning bot is roughly neutral against Geometry Bot: it wins a little more than half the time and loses the rest. Overall, my experiments suggest this bot is very flexible. Whenever a new game-changing bot is released to the public, just add it to the agents list, and the Q-Table will do a decent job against it.

Let's just see the first few values of the Q-Table...

In [None]:
qTable

Save the Q-Table for later use. When you submit the Q-Learning bot to the leaderboard, copy the Q-Learning code above. Then, inside `createQTable()`, load the .npy file instead of initializing the table with zeros.

In [None]:
np.save('qtable', qTable)

And that's it! With just this much code, you can train the Q-Learning bot reasonably well and reach a score of around 900. Since this is a single agent (and a relatively static one), its score can vary, so including it in an ensemble will probably give better results.


# Potential Improvements

1. When looking at my agent's logs, I saw that it often lost around 10 points within the first 100 moves (even though the bot is practically random at this point). I think it has to do with the fact that I am using Python's `random` module for the random moves. Using numpy, or a combination of random number generators, might make it better.
2. Sometimes the Q-Table would take a lead of 30-40 points after just 500 moves, but then the opponent would figure out the table and end up winning or drawing. I didn't do it here, but this could likely be addressed by playing randomly after building a strong lead, which keeps the Q-Table from being exploited.
3. I think the training could work very well with Tony Robinson's RPS logs. If you can train it using those, you might even get the Q-Table to learn from hidden agents like the one by Stas SL!
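Improvement 2 could be sketched as a wrapper around an agent: keep a running score and switch to uniformly random moves once the lead reaches a threshold. The wrapper `make_lead_protected`, the `LEAD_THRESHOLD` of 30, and the score bookkeeping are illustrative assumptions; the notebook does not implement this. A stand-in agent is used so the snippet runs on its own.

```python
import random
from types import SimpleNamespace

LEAD_THRESHOLD = 30  # hypothetical: lead (in points) after which the base agent's move is overridden


def score_delta(my_move, opp_move):
    """+1 for a win, 0 for a tie, -1 for a loss (0 = rock, 1 = paper, 2 = scissors)."""
    return [0, 1, -1][(my_move - opp_move) % 3]


def make_lead_protected(base_agent, threshold=LEAD_THRESHOLD):
    """Wrap an agent so that it plays uniformly at random while leading by >= threshold."""
    state = {"lead": 0, "last_move": None}

    def protected_agent(observation, configuration):
        if observation.step == 0:
            state["lead"], state["last_move"] = 0, None  # new episode
        elif state["last_move"] is not None:
            state["lead"] += score_delta(state["last_move"], observation.lastOpponentAction)

        # Always call the base agent so its own bookkeeping keeps running
        move = base_agent(observation, configuration)
        if state["lead"] >= threshold:
            move = random.randrange(3)  # uniform random play has zero expected score for the opponent
        state["last_move"] = int(move)
        return state["last_move"]

    protected_agent.state = state  # exposed for inspection
    return protected_agent


# Usage: a stand-in agent that always plays paper, against an opponent that always plays rock
def always_paper(observation, configuration):
    return 1


guarded = make_lead_protected(always_paper)
for step in range(40):
    guarded(SimpleNamespace(step=step, lastOpponentAction=0), None)
print("lead after 40 steps:", guarded.state["lead"])
```

One caveat for the notebook's agent: `agent()` stores its own last move in the global `move` and appends it to `myHist`, so when the wrapper overrides a move, the played move would also need to be written back there, or the Q-Table's history would no longer match what was actually played.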


I hope this showed some modifications that I found interesting and that helped the Q-Learning algorithm work. I didn't expect such a gigantic Q-Table to be feasible, but with pre-training it is.