# Dictionary game history keyed agent

To create the agent for this competition, we must put its code in a \*.py file.   
To do this, we can use the [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) of Jupyter Notebooks.    
One of these commands is [writefile](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-writefile), which writes the contents of the cell to a file.
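For readers outside a notebook, here is a rough plain-Python sketch of what the two forms of the magic used below do. The cell contents are hypothetical stand-ins, and a temporary directory is used so that no real `submission.py` is touched.

```python
import os
import tempfile

# Hypothetical stand-ins for the contents of two notebook cells.
first_cell = "import random\n"
second_cell = "def agent(obs, conf):\n    return random.randint(0, 2)\n"

# A temporary directory keeps this sketch from touching a real submission.py.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "submission.py")

    # Like "%%writefile submission.py": create or overwrite the file.
    with open(path, "w") as f:
        f.write(first_cell)

    # Like "%%writefile -a submission.py": append to the existing file.
    with open(path, "a") as f:
        f.write(second_cell)

    with open(path) as f:
        written = f.read()

print(written)
```

The order of the cells matters: the first cell resets the file and every later `-a` cell adds to it.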

The idea was to detect opponent deviations from random play and penalize them, while staying as random as possible myself so that the agent is not deterministic enough to be predicted.  
  
After seeing that the competition itself penalizes random play, I changed my agent to fall back to the agent copied from [Rock, Paper, Scissors with Memory Patterns](https://www.kaggle.com/yegorbiryukov/rock-paper-scissors-with-memory-patterns?select=submission.py) instead of to strictly random moves. That agent seems to be based on a similar idea. Testing suggests that I fall back for roughly the first 10% of the games while game history builds up, and that my own agent is used almost exclusively for the remaining 90%. Whether, and how often, Memory Patterns itself falls back to random during that first 10%, I don't know.  
  
Parameters that could be varied include the key length. I'm changing from 5 to 6 for this version. Subkeys are also updated, so shortening the key probably doesn't make much sense. I require a key to have a history of more than 10 games and one of the RPS choices to have been used at least 35% of the time.
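The acceptance test and the way a counter-move is drawn can be sketched on their own. This is a minimal illustration, not the agent itself: the function name `choose_counter` and the parameter names `min_games` and `min_share` are made up for this sketch, and the defaults mirror the thresholds used in the agent code further down.

```python
import random

ROCK, PAPER, SCISSORS = 0, 1, 2

def choose_counter(counts, rng=random, min_games=11, min_share=0.35):
    """Return a counter-move sampled from the opponent's history, or None.

    counts maps each opponent move (0, 1, 2) to how often it followed a key.
    min_games and min_share are illustrative parameter names.
    """
    total = sum(counts.values())
    if total < min_games or max(counts.values()) / total < min_share:
        return None  # not enough evidence: the caller keeps its fallback move
    r = rng.uniform(0, 1)
    rock_share = counts[ROCK] / total
    paper_share = counts[PAPER] / total
    if r <= rock_share:
        return PAPER      # beats a predicted rock
    if r <= rock_share + paper_share:
        return SCISSORS   # beats a predicted paper
    return ROCK           # beats a predicted scissors

print(choose_counter({ROCK: 3, PAPER: 1, SCISSORS: 1}))   # too few games
print(choose_counter({ROCK: 20, PAPER: 0, SCISSORS: 0}))  # opponent always played rock
```

Sampling in proportion to the observed frequencies, rather than always countering the most frequent move, keeps the agent itself harder to predict.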

# Functions and Imports - Memory Patterns

In [None]:
%%writefile submission.py
# start executing cells from here to rewrite submission.py

import random

def evaluate_pattern_efficiency(previous_step_result):
    """ 
        evaluate efficiency of the pattern and, if pattern is inefficient,
        remove it from agent's memory
    """
    pattern_group_index = previous_action["pattern_group_index"]
    pattern_index = previous_action["pattern_index"]
    pattern = groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
    pattern["reward"] += previous_step_result
    # if pattern is inefficient
    if pattern["reward"] <= EFFICIENCY_THRESHOLD:
        # remove pattern from agent's memory
        del groups_of_memory_patterns[pattern_group_index]["memory_patterns"][pattern_index]
    
def find_action(group, group_index):
    """ if possible, find my_action in this group of memory patterns """
    if len(current_memory) > group["memory_length"]:
        this_step_memory = current_memory[-group["memory_length"]:]
        memory_pattern, pattern_index = find_pattern(group["memory_patterns"], this_step_memory, group["memory_length"])
        if memory_pattern is not None:
            # stays None if no response gets selected below
            my_action = None
            my_action_amount = 0
            for action in memory_pattern["opp_next_actions"]:
                # if this opponent's action occurred more times than the currently chosen action,
                # or it occurred the same number of times and wins a random tie-break
                if (action["amount"] > my_action_amount or
                        (action["amount"] == my_action_amount and random.random() > 0.5)):
                    my_action_amount = action["amount"]
                    my_action = action["response"]
            return my_action, pattern_index
    return None, None

def find_pattern(memory_patterns, memory, memory_length):
    """ find appropriate pattern and its index in memory """
    for i in range(len(memory_patterns)):
        actions_matched = 0
        for j in range(memory_length):
            if memory_patterns[i]["actions"][j] == memory[j]:
                actions_matched += 1
            else:
                break
        # if memory fits this pattern
        if actions_matched == memory_length:
            return memory_patterns[i], i
    # appropriate pattern not found
    return None, None

def get_step_result_for_my_agent(my_agent_action, opp_action):
    """ 
        get result of the step for my_agent
        1, 0 and -1 represent a win, a tie and a loss respectively
        reward will be taken from observation in the next release of kaggle environments
    """
    if my_agent_action == opp_action:
        return 0
    elif (my_agent_action == (opp_action + 1)) or (my_agent_action == 0 and opp_action == 2):
        return 1
    else:
        return -1
    
def update_current_memory(obs, my_action):
    """ add my_agent's current step to current_memory """
    # if there are too many actions in current_memory
    if len(current_memory) > current_memory_max_length:
        # delete first two elements in current memory
        # (actions of the oldest step in current memory)
        del current_memory[:2]
    # add agent's last action to agent's current memory
    current_memory.append(my_action)
    
def update_memory_pattern(obs, group):
    """ if possible, update or add some memory pattern in this group """
    # if length of current memory is suitable for this group of memory patterns
    if len(current_memory) > group["memory_length"]:
        # get memory of the previous step
        # considering that last step actions of both agents are already present in current_memory
        previous_step_memory = current_memory[-group["memory_length"] - 2 : -2]
        previous_pattern, pattern_index = find_pattern(group["memory_patterns"], previous_step_memory, group["memory_length"])
        if previous_pattern is None:
            previous_pattern = {
                # list of actions of both players
                "actions": previous_step_memory.copy(),
                # total reward earned by using this pattern
                "reward": 0,
                # list of observed opponent's actions after each occurrence of this pattern
                "opp_next_actions": [
                    # action that was made by opponent,
                    # amount of times that action occurred,
                    # what should be the response of my_agent
                    {"action": 0, "amount": 0, "response": 1},
                    {"action": 1, "amount": 0, "response": 2},
                    {"action": 2, "amount": 0, "response": 0}
                ]
            }
            group["memory_patterns"].append(previous_pattern)
        # update previous_pattern
        for action in previous_pattern["opp_next_actions"]:
            if action["action"] == obs["lastOpponentAction"]:
                action["amount"] += 1
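The win/tie/loss rule in `get_step_result_for_my_agent` can also be written with modular arithmetic, which makes the cyclic structure of the game explicit. A small sketch, assuming the usual encoding 0 = rock, 1 = paper, 2 = scissors; the name `step_result` is only used here.

```python
def step_result(my_action, opp_action):
    """Return 1 for a win, 0 for a tie and -1 for a loss (0=rock, 1=paper, 2=scissors)."""
    # Each action beats the one just below it, cyclically, so the
    # difference modulo 3 is 0 for a tie, 1 for a win and 2 for a loss.
    diff = (my_action - opp_action) % 3
    return {0: 0, 1: 1, 2: -1}[diff]

# Rows are my action, columns are the opponent's action.
table = [[step_result(me, opp) for opp in range(3)] for me in range(3)]
for row in table:
    print(row)
```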

# Global Variables - Memory Patterns

In [None]:
%%writefile -a submission.py
# "%%writefile -a submission.py" will append the code below to submission.py,
# it WILL NOT rewrite submission.py

# maximum steps in a memory pattern
STEPS_MAX = 5
# minimum steps in a memory pattern
STEPS_MIN = 3
# lowest efficiency threshold of a memory pattern before being removed from agent's memory
EFFICIENCY_THRESHOLD = -3
# number of steps between forced random actions
FORCED_RANDOM_ACTION_INTERVAL = random.randint(STEPS_MIN, STEPS_MAX)

# current memory of the agent
current_memory = []
# previous action of my_agent
previous_action = {
    "action": None,
    # action was taken from pattern
    "action_from_pattern": False,
    "pattern_group_index": None,
    "pattern_index": None
}
# number of steps remaining until the next forced random action
steps_to_random = FORCED_RANDOM_ACTION_INTERVAL
# maximum length of current_memory
current_memory_max_length = STEPS_MAX * 2
# current reward of my_agent
# will be taken from observation in the next release of kaggle environments
reward = 0
# memory length of patterns in first group
# STEPS_MAX is multiplied by 2 to consider both my_agent's and opponent's actions
group_memory_length = current_memory_max_length
# list of groups of memory patterns
groups_of_memory_patterns = []
for i in range(STEPS_MAX, STEPS_MIN - 1, -1):
    groups_of_memory_patterns.append({
        # how many steps in a row are in the pattern
        "memory_length": group_memory_length,
        # list of memory patterns
        "memory_patterns": []
    })
    group_memory_length -= 2
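To see what the loop above builds, the memory lengths of the groups can be computed directly. This sketch only reproduces the lengths, not the full group dictionaries.

```python
STEPS_MAX = 5
STEPS_MIN = 3

# Each step stores two actions (mine and the opponent's), hence the factor 2.
memory_lengths = [2 * steps for steps in range(STEPS_MAX, STEPS_MIN - 1, -1)]
print(memory_lengths)  # [10, 8, 6]
```

The longest patterns come first, so the agent tries the most specific match before falling back to shorter ones.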
    

# Create Agent - Memory Pattern

In [None]:
%%writefile -a submission.py
# "%%writefile -a submission.py" will append the code below to submission.py,
# it WILL NOT rewrite submission.py

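def pattern(obs, conf):
    """
        Memory Patterns agent: play a forced random action every few steps,
        otherwise the response stored in a matching memory pattern,
        otherwise a random action
    """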
    # action of my_agent
    my_action = None
    
    # forced random action
    global steps_to_random
    steps_to_random -= 1
    if steps_to_random <= 0:
        steps_to_random = FORCED_RANDOM_ACTION_INTERVAL
        # choose action randomly
        my_action = random.randint(0, 2)
        # save action's data
        previous_action["action"] = my_action
        previous_action["action_from_pattern"] = False
        previous_action["pattern_group_index"] = None
        previous_action["pattern_index"] = None
    
    # if it's not first step
    if obs["step"] > 0:
        # add opponent's last step to current_memory
        current_memory.append(obs["lastOpponentAction"])
        # previous step won or lost
        previous_step_result = get_step_result_for_my_agent(current_memory[-2], current_memory[-1])
        global reward
        reward += previous_step_result
        # if previous action of my_agent was taken from pattern
        if previous_action["action_from_pattern"]:
            evaluate_pattern_efficiency(previous_step_result)
    
    for i in range(len(groups_of_memory_patterns)):
        # if possible, update or add some memory pattern in this group
        update_memory_pattern(obs, groups_of_memory_patterns[i])
        # if action was not yet found
        if my_action is None:
            my_action, pattern_index = find_action(groups_of_memory_patterns[i], i)
            if my_action is not None:
                # save action's data
                previous_action["action"] = my_action
                previous_action["action_from_pattern"] = True
                previous_action["pattern_group_index"] = i
                previous_action["pattern_index"] = pattern_index
    
    # if no action was found
    if my_action is None:
        # choose action randomly
        my_action = random.randint(0, 2)
        # save action's data
        previous_action["action"] = my_action
        previous_action["action_from_pattern"] = False
        previous_action["pattern_group_index"] = None
        previous_action["pattern_index"] = None
    
    # add my_agent's current step to current_memory
    update_current_memory(obs, my_action)
    return my_action
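The slicing in `update_memory_pattern` and `find_action` is easier to follow on a small example. In this sketch the memory is a flat list that alternates my action and the opponent's action; the rounds are made-up data and `length` plays the role of a group's `memory_length`.

```python
# Hypothetical rounds as (my_action, opponent_action), oldest first.
rounds = [(0, 1), (2, 2), (1, 0), (0, 0)]

# Flatten into the interleaved layout: [my0, opp0, my1, opp1, ...]
memory = [action for pair in rounds for action in pair]

length = 4  # a pattern covering two steps

# Window that ended one step ago: the pattern whose outcome is now known.
previous_window = memory[-length - 2:-2]
# What the opponent did right after that window.
observed_next = memory[-1]
# Newest window: used to look up a prediction for the coming step.
current_window = memory[-length:]

print(previous_window, observed_next, current_window)
```

So each step first records what followed the previous window, then asks what has usually followed the current one.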

# Functions and Imports

In [None]:
%%writefile -a submission.py

import collections, itertools
import random

# Global Variables

In [None]:
%%writefile -a submission.py

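# most recent games, newest first; the deque keeps at most 7 entries
# note: 7*(0,0) is a flat tuple of 14 zeros, so the initial placeholders are the ints 0, not (0, 0) pairs
running_key = collections.deque(7*(0,0), 7)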

d = {}

opp = {
    0: 0,
    1: 0,
    2: 0
}

# previous action of my_agent
last_action = 0
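Here is a small sketch of how the running key behaves and how subkeys are taken from it. The first part reproduces the initialisation above; the second part uses a shorter, hypothetical key of length 3 and made-up games so the output stays readable.

```python
import collections
import itertools

# 7 * (0, 0) repeats the tuple, giving 14 zeros; maxlen=7 keeps the last 7 of them.
initial = collections.deque(7 * (0, 0), 7)
print(list(initial))  # [0, 0, 0, 0, 0, 0, 0]

# Shorter hypothetical key so the output stays readable.
key = collections.deque(maxlen=3)
for game in [(0, 1), (2, 2), (1, 0), (0, 0)]:   # (my action, opponent's action)
    key.appendleft(game)                         # newest game goes to the front

# Prefixes from longest to shortest, in the order a lookup loop would walk them.
subkeys = [tuple(itertools.islice(key, 0, k)) for k in range(len(key), 0, -1)]
for subkey in subkeys:
    print(subkey)
```

Because the deque has a maximum length, pushing a new game on the left silently drops the oldest one on the right.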

# Create Agent

In [None]:
%%writefile -a submission.py

def my_agent(observation, configuration):
    global opp, d
    global last_action
    global running_key
    ROCK = 0
    PAPER = 1
    SCISSORS = 2
    
    
    my_action = pattern(observation, configuration)
    action_from_pattern = True
    
    # if it's not first step
    if observation.step > 0:
        had_prev = False
        prev_key = tuple(itertools.islice(running_key,0,5))
        # record the opponent's last move against the previous key and each of its prefixes
        for k in range(len(prev_key),0,-1):
            subkey_prev = tuple(itertools.islice(prev_key,0,k))
            if subkey_prev in d:
                had_prev = True
                prior = d[subkey_prev]
                prior[observation.lastOpponentAction] += 1
            else:
                prior = opp.copy()
                prior[observation.lastOpponentAction] = 1
                d[tuple(subkey_prev)] = prior                
        # push the last game (my action, opponent's action) onto the front of running_key
        last_game = (last_action,observation.lastOpponentAction)
        #running_cnt += get_step_result_for_my_agent(last_action,observation.lastOpponentAction)
        running_key.appendleft(last_game)
        had_key = False
        for k in range(len(running_key),0,-1):
            subkey = tuple(itertools.islice(running_key,0,k))
            if subkey in d:
                had_key = True
                prior = d[subkey]
                v = prior.values()
                mv = max(v)
                total = sum(v)
                if total < 11 or mv / total < .35: 
                    continue
                else:
                    #iv = v.index(mv)
                    rock_per = prior[ROCK] / total
                    paper_per = prior[PAPER] / total
                    scissors_per = prior[SCISSORS] / total                
                    r = random.uniform(0, 1)
                    if r <= rock_per:
                        my_action = PAPER
                    elif r <= rock_per + paper_per:
                        my_action = SCISSORS
                    else: 
                        my_action = ROCK  
                    action_from_pattern = False
                break
            else:
                prior = opp.copy()
                prior[observation.lastOpponentAction] = 1
                d[tuple(subkey)] = prior
        if not had_key:
            prior = opp.copy()
            prior[observation.lastOpponentAction] = 1
            d[tuple(running_key)] = prior
     
    last_action = my_action
    return my_action
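To try the core idea without the Kaggle environment, here is a deliberately simplified, self-contained sketch. It is not the agent above: it uses a single key length, always counters the most frequent follow-up move instead of sampling, and falls back to random moves rather than to Memory Patterns. The function `play` and its parameters are made up for this illustration.

```python
import collections
import random

def play(opponent_moves, key_len=2, min_games=3, seed=0):
    """Play a simplified history-keyed agent and return its final score."""
    rng = random.Random(seed)
    # counts[key][move]: how often the opponent played `move` right after `key`
    counts = collections.defaultdict(lambda: [0, 0, 0])
    history = collections.deque(maxlen=key_len)  # newest game first
    score = 0
    for opp in opponent_moves:
        key = tuple(history)
        seen = counts[key]
        if sum(seen) >= min_games:
            predicted = seen.index(max(seen))   # most frequent follow-up move
            mine = (predicted + 1) % 3          # the move that beats it
        else:
            mine = rng.randint(0, 2)            # too little history: play randomly
        outcome = (mine - opp) % 3              # 0 tie, 1 win, 2 loss
        score += {0: 0, 1: 1, 2: -1}[outcome]
        seen[opp] += 1                          # learn what followed this key
        history.appendleft((mine, opp))
    return score

always_rock = [0] * 100
print(play(always_rock))
```

Against an opponent that always plays rock, the random phase is short and the score ends up clearly positive, which mirrors the "fall back first, then take over" behaviour described at the top.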

<a id="3"></a>
<h2 style='background:#FBE338; border:0; color:black'><center>Battle Examples</center></h2>

This part is devoted to simulating and testing battles with other agents.

We need to import the library for creating environments and simulating agent battles.

In [None]:
# Upgrade kaggle_environments using pip before import
!pip install -q -U kaggle_environments

In [None]:
from kaggle_environments import make

Create a rock-paper-scissors (RPS) environment and set 100 steps per episode for each simulation

In [None]:
env = make(
    "rps", 
    configuration={"episodeSteps": 100},
    debug=True
)

Let's take the agent that will copy our previous action from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_copy_opponent_path = "../input/rock-paper-scissors-agents-comparison/copy_opponent.py"

Let's start simulating the battle between our agent (submission.py) and copy_opponent_agent

In [None]:
# submission.py vs copy_opponent_agent
print("running",agent_copy_opponent_path)
env.run(
    ["submission.py", agent_copy_opponent_path]
)

env.render(mode="ipython", width=500, height=400)

Let's take the agent that will beat our previous action from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_reactionary_path = "../input/rock-paper-scissors-agents-comparison/reactionary.py"

Let's start simulating the battle between our agent (submission.py) and reactionary

In [None]:
# submission.py vs reactionary
env.run(
    ["submission.py", agent_reactionary_path]
)

env.render(mode="ipython", width=500, height=400)

Simulating the battle between two copies of our agent (submission.py)

In [None]:
# submission.py vs submission.py
env.run(
    ["submission.py", "submission.py"]
)

env.render(mode="ipython", width=500, height=400)

Let's take the agent that will always use Rock from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_rock_path = "../input/rock-paper-scissors-agents-comparison/rock.py"

Simulating the battle between our agent (submission.py) and rock

In [None]:
# submission.py vs rock
env.run(
    ["submission.py", agent_rock_path]
)

env.render(mode="ipython", width=500, height=400)

Let's take the agent that will always use Scissors from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_scissors_path = "../input/rock-paper-scissors-agents-comparison/scissors.py"

Simulating the battle between our agent (submission.py) and scissors

In [None]:
# submission.py vs scissors
env.run(
    ["submission.py", agent_scissors_path]
)

env.render(mode="ipython", width=500, height=400)

Let's take the agent that will always use Paper from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_paper_path = "../input/rock-paper-scissors-agents-comparison/paper.py"

Simulating the battle between our agent (submission.py) and paper

In [None]:
# submission.py vs paper
env.run(
    ["submission.py", agent_paper_path]
)

env.render(mode="ipython", width=500, height=400)

Let's take the agent that will beat its own last action from [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison)

In [None]:
agent_hit_the_last_own_action_path = "../input/rock-paper-scissors-agents-comparison/hit_the_last_own_action.py"

Simulating the battle between our agent (submission.py) and hit_the_last_own_action

In [None]:
# submission.py vs hit_the_last_own_action
env.run(
    ["submission.py", agent_hit_the_last_own_action_path]
)

env.render(mode="ipython", width=500, height=400)
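The rendered replays are nice to watch, but a plain tally is often quicker to read. Below is a minimal sketch that counts wins, ties and losses from a list of action pairs. The pairs here are made up; pulling them out of a finished run, for example from each agent's `action` in `env.steps`, is an assumption about the environment's data layout and should be checked before relying on it.

```python
from collections import Counter

def tally(action_pairs):
    """Count wins, ties and losses for the first player in (mine, theirs) pairs."""
    # With 0=rock, 1=paper, 2=scissors, (mine - theirs) % 3 is
    # 0 for a tie, 1 for a win and 2 for a loss.
    outcome = {0: "tie", 1: "win", 2: "loss"}
    return Counter(outcome[(mine - theirs) % 3] for mine, theirs in action_pairs)

# Hypothetical action pairs standing in for a recorded episode.
sample = [(1, 0), (0, 0), (2, 1), (0, 1)]
result = tally(sample)
print(dict(result))
```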