# Rock Paper Scissors - Multi Armed Bandit

The idea here is to take a selection of agents, have them predict each move, and (hopefully) pick the agent most likely to win.
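
To make the "predict, then counter" step concrete, here is a minimal sketch of the modular arithmetic the agents in this notebook rely on. It assumes the usual Kaggle encoding of 0 = rock, 1 = paper, 2 = scissors; the helper names `expected_from_action` and `counter_move` are illustrative and are not defined anywhere else in the notebook.

```python
SIGNS = 3  # assumed encoding: 0 = rock, 1 = paper, 2 = scissors

def expected_from_action(agent_action: int, signs: int = SIGNS) -> int:
    """Infer which opponent move an agent predicted from the move it chose to play."""
    return (agent_action - 1) % signs

def counter_move(expected: int, signs: int = SIGNS) -> int:
    """Return the move that beats the expected opponent move."""
    return (expected + 1) % signs

# An agent that plays paper (1) is implicitly predicting rock (0),
# and countering a predicted rock (0) means playing paper (1).
assert expected_from_action(1) == 0
assert counter_move(0) == 1
```

Storing the predicted opponent move rather than the agent's own action lets every agent be scored against what the opponent actually played.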

There are two implementations here:
- The first computes the running average of each agent's prediction accuracy and uses the agent with the highest score
- The second correctly implements the Multi-Armed Bandit strategy, using exponential decay and `np.random.beta()` to generate a selection probability for each agent

This notebook is inspired by:
- https://www.kaggle.com/ilialar/multi-armed-bandit-vs-deterministic-agents

# Imports

Rather than use a multi-file commit (which might have been easier), I am concatenating all the scripts together. This method, however, requires care to avoid global namespace conflicts between variables such as `history`, so perl is used to make minor edits to some of the files.
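
For readers who prefer to stay in Python, the renaming step could be sketched roughly as follows. This is not what the notebook runs (the cells below use `perl -p -e`), and the helper names `rename_identifier` and `copy_with_rename` are hypothetical. Unlike the perl one-liners, this sketch matches whole words only, which is an assumption about the intended behavior.

```python
import re
from pathlib import Path

def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename every whole-word occurrence of `old` to `new` in a source string."""
    return re.sub(rf'\b{re.escape(old)}\b', new, source)

def copy_with_rename(src_path: str, dst_path: str, renames: dict) -> None:
    """Copy a script, renaming globals so they cannot clash after concatenation."""
    text = Path(src_path).read_text()
    for old, new in renames.items():
        text = rename_identifier(text, old, new)
    Path(dst_path).write_text(text)

example = "history = []\ndef kaggle_agent(obs, conf):\n    history.append(obs)\n"
print(rename_identifier(example, 'history', 'rotn_history'))
```

Giving each script its own prefix for globals such as `history` keeps the agents from overwriting each other's state once everything lives in a single `submission.py`.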

In [None]:
!find ../input/ -name '*.py'

In [None]:
!cat ../input/rock-paper-scissors-anti-anti-pi-bot/pi.py | perl -p -e 's/kaggle_agent/anti_anti_pi_agent/g;' | tee anti_anti_pi.py > /dev/null
!cat ../input/rock-paper-scissors-anti-pi-bot/pi.py      | perl -p -e 's/kaggle_agent/anti_pi_agent/g;'      | tee anti_pi.py      > /dev/null
!cat ../input/rock-paper-scissors-pi-bot/pi.py           | perl -p -e 's/kaggle_agent/pi_agent/g;'           | tee pi.py           > /dev/null

In [None]:
!cat ../input/rock-paper-scissors/react.py | tee reactionary.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-anti-rotn/anti_rotn.py | perl -p -e 's/history/rotn_history/g;' | tee anti_rotn.py > /dev/null

In [None]:
!cat ../input/rps-roshambo-comp-iocaine-powder/submission.py | tee iocaine.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-greenberg/greenberg.py | perl -p -e 's/kaggle_agent/greenberg_agent/g' | tee greenberg.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-statistical-prediction/submission.py | perl -p -e 's/history/statistical_history/g;' | tee statistical.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-naive-bayes/submission.py | perl -p -e 's/kaggle_agent/naive_bayes/g;' | tee naive_bayes.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-memory-patterns/submission.py | perl -p -e 's/kaggle_agent/memory_patterns/g;' | tee memory_patterns.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-naive-bayes/submission.py | perl -p -e 's/kaggle_agent/naive_bayes/g; s/instance/naive_bayes_instance/g;' | tee naive_bayes.py > /dev/null

In [None]:
!cat ../input/rps-geometry-silver-rank-by-minimal-logic/geometry.py | perl -p -e 's/call_agent/geometry_agent/g;' | tee geometry.py > /dev/null

In [None]:
!cat ../input/rps-dojo/black_belt/IOU2.py | perl -p -e 's/agent/iou2_agent/g;' | tee IOU2.py > /dev/null

In [None]:
!cat ../input/rps-dojo/black_belt/memory_patterns_v20.py | perl -p -e 's/my_agent/memory_patterns_v20/g;' | tee memory_patterns_v20.py > /dev/null


In [None]:
!cat ../input/rps-dojo/black_belt/centrifugal_bumblepuppy_v4.py | perl -p -e 's/run/centrifugal_bumblepuppy/g; s/gg/gg_bumblepuppy/g; s/code/code_bumblepuppy/;' | tee centrifugal_bumblepuppy_v4.py > /dev/null

In [None]:
!cat ../input/rps-dojo/black_belt/testing_please_ignore.py | perl -p -e 's/run/testing_please_ignore/g; s/gg/gg_ignore/g; s/code/code_ignore/;' | tee testing_please_ignore.py > /dev/null

In [None]:
!cat ../input/rps-dojo/black_belt/dllu1.py | perl -p -e 's/run/dllu1_agent/g; s/gg/gg_dllu1/g; s/code/code_dllu1/;' | tee dllu1.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-random-seed-search/main.py | perl -p -e 's/(history|min_seed|best_method|solutions|random_agent)/rss_$1/g; s/seeds_per_turn=200_000/seeds_per_turn=100_000/g' | tee random_seed_search.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-decision-tree/submission.py | perl -p -e 's/history/decision_tree_history_1/g; s/decision_tree_agent/decision_tree_agent_1/g;' | tee decision_tree_1.py > /dev/null
!cat ../input/rock-paper-scissors-decision-tree/submission.py | perl -p -e 's/history/decision_tree_history_2/g; s/decision_tree_agent/decision_tree_agent_2/g;' | tee decision_tree_2.py > /dev/null 
!cat ../input/rock-paper-scissors-decision-tree/submission.py | perl -p -e 's/history/decision_tree_history_3/g; s/decision_tree_agent/decision_tree_agent_3/g;' | tee decision_tree_3.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-genetic-algorithm/genetics.py ../input/rock-paper-scissors-genetic-algorithm/genetics_choice.py | perl -p -e 's/from genetics/#/' | tee genetics.py > /dev/null

In [None]:
!cat ../input/rock-paper-scissors-flatten/flatten.py | tee flatten.py > /dev/null

In [None]:
# !cat ../input/going-meta-with-kumoko/submission.py | perl -p -e 's/def agent/def kumoko_agent/' | tee kumoko.py > /dev/null

In [None]:
!cat ../input/rps-opponent-transition-matrix/submission.py | tee transition.py > /dev/null

In [None]:
!find ./ -name '*.py' | xargs -L1 perl -p -i -e 's/\bprint\(.*\)/pass/sg;'

# Simple Winrate Agent

This was my attempt at rewriting the multi-armed bandit logic from scratch.

It uses simplified logic, in that it simply computes the (running) mean of predicted winrate for each agent based on historical data, but doesn't formally implement exponential decay or use `np.random.beta()` as a selection method.
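
As a rough illustration of the idea, the sketch below scores each agent by its plain mean accuracy over a recent window and then picks the top scorer. It is a simplified stand-in rather than the cell that follows: the function name `accuracy_scores` and the sample data are made up for the example, and it assumes `predictions[i]` was the guess for `opponent[i]`.

```python
import numpy as np

def accuracy_scores(expected, opponent, window=20):
    """Mean prediction accuracy per agent over the last `window` completed rounds."""
    scores = {}
    for name, predictions in expected.items():
        common = min(len(predictions), len(opponent))  # ignore any pending prediction
        preds  = np.array(predictions[:common][-window:])
        moves  = np.array(opponent[:common][-window:])
        scores[name] = float(np.mean(preds == moves)) if common else 0.0
    return scores

# Hypothetical data: each agent has one pending prediction for the next round
expected = {'a': [0, 1, 2, 0, 1], 'b': [1, 1, 1, 1, 1]}
opponent = [0, 1, 2, 1]

scores = accuracy_scores(expected, opponent)
best   = max(scores, key=scores.get)
print(scores, best)
```

Because every round in the window counts equally, this approach reacts slowly when an opponent changes strategy, which is part of the motivation for the decayed version later in the notebook.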

This implementation is no longer used.

In [None]:
# %%writefile multi_armed_stats_bandit.py
# @unused - this file is not used

from collections import defaultdict
import numpy as np
import random  # required by random.choices() below
import time
from operator import itemgetter

mlb_opponent = []
mlb_expected = defaultdict(list)
mlb_agents   = {
#     'random':               (lambda obs, conf: random_agent(obs, conf)),
#     'pi':                   (lambda obs, conf: pi_agent(obs, conf)),
    'anti_pi':              (lambda obs, conf: anti_pi_agent(obs, conf)),
#     'anti_anti_pi':         (lambda obs, conf: anti_anti_pi_agent(obs, conf)),
#     'reactionary':          (lambda obs, conf: reactionary(obs, conf)),
#     'anti_rotn':            (lambda obs, conf: anti_rotn(obs, conf, warmup=1)),
    
    'iou2':                  (lambda obs, conf: iou2_agent(obs, conf)),
    'geometry':              (lambda obs, conf: geometry_agent(obs, conf)),
    'memory_patterns_v20':   (lambda obs, conf: memory_patterns_v20(obs, conf)),
    'testing_please_ignore': (lambda obs, conf: testing_please_ignore(obs, conf)),
    'bumblepuppy':           (lambda obs, conf: centrifugal_bumblepuppy(obs, conf)), 
    'dllu1_agent':           (lambda obs, conf: dllu1_agent(obs, conf)), 

    'genetics':              (lambda obs, conf: genetics_choice(obs, conf)), 
    'flatten':               (lambda obs, conf: flatten_agent(obs, conf)),
    'transition':            (lambda obs, conf: transition_agent(obs, conf)),
    # 'kumoko':                (lambda obs, conf: kumoko_agent(obs, conf)), # broken
    
    'memory_patterns':       (lambda obs, conf: memory_patterns(obs, conf)),
    'naive_bayes':           (lambda obs, conf: naive_bayes(obs, conf)),
    'iocaine':               (lambda obs, conf: iocaine_agent(obs, conf)),
    'greenberg':             (lambda obs, conf: greenberg_agent(obs, conf)),
    'statistical':           (lambda obs, conf: statistical_prediction_agent(obs, conf)),
    'statistical_expected':  (lambda obs, conf: statistical_history['expected'][-1] + 1),       
    # 'decision_tree_1':       (lambda obs, conf: decision_tree_agent_1(obs, conf, stages=1, window=4)),
    'decision_tree_2':       (lambda obs, conf: decision_tree_agent_2(obs, conf, stages=2, window=6)),
    'decision_tree_3':       (lambda obs, conf: decision_tree_agent_3(obs, conf, stages=3, window=10)),
    # 'random_seed_search':    (lambda obs, conf: random_seed_search_agent(obs, conf)),
}

# observation   = {'step': 1, 'lastOpponentAction': 1}
# configuration = {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def multi_armed_bandit_stats_agent(observation, configuration, average='running', window=20):
    global mlb_expected
    global mlb_opponent
    global mlb_agents
    time_start = time.perf_counter()

    if observation.step != 0: 
        mlb_opponent += [ observation.lastOpponentAction ]
    # else:
    #     mlb_opponent += [ random_agent(observation, configuration) ]

    if window: 
        mlb_opponent = mlb_opponent[-window:] 
        for name, agent_fn in list(mlb_agents.items()):
            mlb_expected[name] = mlb_expected[name][-window:] 

    
    # accuracy is in date order
    accuracy = {
        name: np.array([
            int( mlb_expected[name][-n] == mlb_opponent[-n] )
            for n in range(1, min(len(values), len(mlb_opponent))+1)
        ])[::-1]
        if len(values) else np.array([0.0])
        for name, values in list(mlb_expected.items())
    }
    
    # Update predictions for next turn
    for name, agent_fn in list(mlb_agents.items()):
        try:
            agent_action        = agent_fn(observation, configuration)
            agent_expected      = (agent_action - 1) % configuration.signs
            mlb_expected[name] += [ agent_expected ]
        except Exception as exception:
            print('Exception:', name, agent_fn, exception)
    
    action     = 1
    agent_name = 'random'

    scores = {}
    if observation.step != 0: 
        # Compute average scores
        for name, values in accuracy.items():
            if len(values) == 0:
                scores[name] = 0.0
            elif average == 'mean':
                scores[name] = np.mean( values )
            elif average == 'mean_squared':
                scores[name] = np.mean( values ) ** 2
            elif average == 'running':
                weights = np.sqrt(np.arange(1,len(values)+1))[::-1] 
                scores[name] = np.mean( values / weights * np.sum(weights) )  
            else: 
                raise ValueError(f"average must be one of ['mean', 'mean_squared', 'running'], got {average!r}")
            scores[name] = np.round(scores[name], 5)

        scores       = dict(sorted(scores.items(),       key=itemgetter(1),                       reverse=True))
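        # Keep mlb_expected a defaultdict so unseen agent names still resolve to an empty list
        mlb_expected = defaultdict(list, sorted(mlb_expected.items(), key=lambda pair: scores.get(pair[0], 0), reverse=True))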

        # Sort by most accurate
        if len(scores):
            # agent_name, score = sorted(scores.items(), key=itemgetter(1), reverse=True)[0]
            agent_name = random.choices( population=list(scores.keys()), weights=list(scores.values()), k=1 )[0]
            expected   = mlb_expected[agent_name][-1]
        else:
            agent_name, score = 'random', 0
            expected = random_agent(observation, configuration)

        action = (expected + 1) % configuration.signs
                
    time_taken = time.perf_counter() - time_start
    print('opponent =', mlb_opponent)
    print('expected =', dict(mlb_expected))
    print('scores   =', dict(scores))
    print(f'action   = {action} | agent = {agent_name} | step = {observation.step} | {time_taken:.3f}s')
    print()
    
    return int(action)
    

# Multi-Armed Bandit

This is a more mathematically correct implementation of the Multi-Armed Bandit logic.

It combines exponential decay with probabilistic selection using `np.random.beta()`. The logic was inspired by:
- https://www.kaggle.com/ilialar/multi-armed-bandit-vs-deterministic-agents
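
The selection rule can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the agent itself: `decayed_counts` and `thompson_pick` are hypothetical helper names, outcomes are passed in as the strings `'win'`, `'draw'` and `'loss'`, and every agent is assumed to have at least one recorded outcome so that both Beta parameters are positive. The defaults `step_reward=3` and `decay_rate=0.95` mirror the defaults of the agent in the next cell.

```python
import numpy as np

def decayed_counts(outcomes, step_reward=3.0, decay_rate=0.95):
    """Fold a sequence of outcomes into decayed (wins, losses) pseudo-counts."""
    wins, losses = 0.0, 0.0
    for outcome in outcomes:
        # Pull both counts toward 1 so that old evidence fades over time
        wins   = (wins   - 1) * decay_rate + 1
        losses = (losses - 1) * decay_rate + 1
        if outcome in ('win', 'draw'):
            wins += step_reward
        if outcome in ('loss', 'draw'):
            losses += step_reward
    return wins, losses

def thompson_pick(history, rng):
    """Sample a Beta score for each agent and return the name with the highest sample."""
    samples = {
        name: rng.beta(*decayed_counts(outcomes))
        for name, outcomes in history.items()
    }
    return max(samples, key=samples.get)

rng     = np.random.default_rng(0)
history = {'good': ['win'] * 20, 'bad': ['loss'] * 20}  # hypothetical track records
picks   = [thompson_pick(history, rng) for _ in range(200)]
print(picks.count('good'), 'of', len(picks), 'picks went to the stronger agent')
```

Sampling from the Beta distribution, rather than always taking the agent with the best average, keeps some exploration alive: an agent with little or mixed evidence has a wide distribution and will occasionally draw the top score.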

In [None]:
%%writefile multi_armed_bandit.py
# Source: https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-armed-bandit/
import contextlib
import os
from collections import defaultdict
import numpy as np
import time
from operator import itemgetter

# from memory.memory_patterns import memory_patterns_agent
# from memory.RPSNaiveBayes import naive_bayes_agent
# from rng.random_agent import random_agent
# from roshambo_competition.greenberg import greenberg_agent
# from roshambo_competition.iocaine_powder import iocaine_agent
# from statistical.statistical_prediction import statistical_prediction_agent
# from simple.anti_pi import anti_pi_agent
# from simple.pi import pi_agent

mlb_history  = {
    'actions':  [],
    'opponent': []
}
mlb_expected = defaultdict(list)
mlb_agents   = {
    #     'random':               (lambda obs, conf: random_agent(obs, conf)),
    #     'pi':                   (lambda obs, conf: pi_agent(obs, conf)),
    'anti_pi':               (lambda obs, conf: anti_pi_agent(obs, conf)),
    #     'anti_anti_pi':         (lambda obs, conf: anti_anti_pi_agent(obs, conf)),
    #     'reactionary':          (lambda obs, conf: reactionary(obs, conf)),
    'anti_rotn':            (lambda obs, conf: anti_rotn(obs, conf, warmup=1)),

    'iou2':                  (lambda obs, conf: iou2_agent(obs, conf)),
    'geometry':              (lambda obs, conf: geometry_agent(obs, conf)),
    'memory_patterns_v20':   (lambda obs, conf: memory_patterns_v20(obs, conf)),
    'testing_please_ignore': (lambda obs, conf: testing_please_ignore(obs, conf)),
    'bumblepuppy':           (lambda obs, conf: centrifugal_bumblepuppy(obs, conf)),
    'dllu1_agent':           (lambda obs, conf: dllu1_agent(obs, conf)),

    'genetics':              (lambda obs, conf: genetics_choice(obs, conf)),
    'flatten':               (lambda obs, conf: flatten_agent(obs, conf)),
    'transition':            (lambda obs, conf: transition_agent(obs, conf)),
    # 'kumoko':                (lambda obs, conf: kumoko_agent(obs, conf)), # broken    

    'memory_patterns':       (lambda obs, conf: memory_patterns(obs, conf)),
    'naive_bayes':           (lambda obs, conf: naive_bayes(obs, conf)),
    'iocaine':               (lambda obs, conf: iocaine_agent(obs, conf)),
    'greenberg':             (lambda obs, conf: greenberg_agent(obs, conf)),
    'statistical':           (lambda obs, conf: statistical_prediction_agent(obs, conf)),
    'statistical_expected':  (lambda obs, conf: statistical_history['expected'][-1] + 1),
    # 'decision_tree_1':       (lambda obs, conf: decision_tree_agent_1(obs, conf, stages=1, window=20)),
    'decision_tree_2':       (lambda obs, conf: decision_tree_agent_2(obs, conf, stages=2, window=6)),
    'decision_tree_3':       (lambda obs, conf: decision_tree_agent_3(obs, conf, stages=3, window=10)),
    #'random_seed_search':    (lambda obs, conf: random_seed_search_agent(obs, conf)),
}

# observation   = {'step': 1, 'lastOpponentAction': 1}
# configuration = {'episodeSteps': 10, 'agentTimeout': 60, 'actTimeout': 1, 'runTimeout': 1200, 'isProduction': False, 'signs': 3}
def multi_armed_bandit_agent(observation, configuration, warmup=1, step_reward=3, decay_rate=0.95, verbose=True ):
    global mlb_expected
    global mlb_history
    global mlb_agents
    time_start = time.perf_counter()
    if os.environ.get('KAGGLE_KERNEL_RUN_TYPE', 'Localhost') == 'Interactive':
        warmup = 1


    if observation.step != 0:
        mlb_history['opponent'] += [ observation.lastOpponentAction ]
    # else:
    #     mlb_history['opponent'] += [ random_agent(observation, configuration) ]


    # Implement Multi Armed Bandit Logic
    win_loss_scores = defaultdict(lambda: [0.0, 0.0])
    for name, values in list(mlb_expected.items()):
        for n in range(min(len(values), len(mlb_history['opponent']))):
            win_loss_scores[name][1] = (win_loss_scores[name][1] - 1) * decay_rate + 1
            win_loss_scores[name][0] = (win_loss_scores[name][0] - 1) * decay_rate + 1

            # win | expect rock, play paper -> opponent plays rock
            if   mlb_expected[name][n] == (mlb_history['opponent'][n] + 0) % configuration.signs:
                win_loss_scores[name][0] += step_reward

            # draw | expect rock, play paper -> opponent plays paper
            elif mlb_expected[name][n] == (mlb_history['opponent'][n] + 2) % configuration.signs:
                win_loss_scores[name][0] += step_reward
                win_loss_scores[name][1] += step_reward

            # loss | expect rock, play paper -> opponent plays scissors
            elif mlb_expected[name][n] == (mlb_history['opponent'][n] + 1) % configuration.signs:
                win_loss_scores[name][1] += step_reward


    # Update predictions for next turn
    for name, agent_fn in list(mlb_agents.items()):
        try:
            with contextlib.redirect_stdout(None):  # disable stdout for child agents
                agent_action        = agent_fn(observation, configuration)
                agent_expected      = (agent_action - 1) % configuration.signs
                mlb_expected[name] += [ agent_expected ]
        except Exception as exception:
            print('Exception:', name, agent_fn, exception)


    # Pick the Best Agent
    beta_scores = {
        name: np.random.beta(win_loss_scores[name][0], win_loss_scores[name][1])
        for name in win_loss_scores.keys()
    }

    if observation.step == 0:
        # Always play scissors first move
        # At Auction       - https://www.artsy.net/article/artsy-editorial-christies-sothebys-played-rock-paper-scissors-20-million-consignment
        # EDA best by test - https://www.kaggle.com/jamesmcguigan/rps-episode-archive-dataset-eda
        agent_name = 'scissors'
        expected = 1
    elif observation.step < warmup:
        agent_name = 'random'
        expected   = random_agent(observation, configuration)
    else:
        agent_name = sorted(beta_scores.items(), key=itemgetter(1), reverse=True)[0][0]
        expected   = mlb_expected[agent_name][-1]

    action = (expected + 1) % configuration.signs


    if verbose:
        best_score    = beta_scores.get(agent_name,0)
        last_opponent = (mlb_history['opponent'] or [0])[-1]
        win_symbol    = (
            ' ' if observation.step == 0 else 
            '+' if mlb_history['actions'][-1] == (mlb_history['opponent'][-1] + 1) % 3 else
            '|' if mlb_history['actions'][-1] == (mlb_history['opponent'][-1] + 0) % 3 else
            '-'
        )
        time_taken    = time.perf_counter() - time_start
        print(f'{observation.step:4d} | {time_taken:0.2f}s | {last_opponent}{win_symbol} -> action = {expected} -> {action} | {best_score*100:3.0f}% {agent_name}')

    mlb_history['actions'] += [ action ]
    return int(action)

In [None]:
!rm -vf submission.py
!ls -tr ./*.py | xargs -L1 -I{} sed -i -z 's!^!\n##### {} #####\n\n!g' {}  # add filename to start of each file
!ls -tr ./*.py | xargs -L1 -I{} sed -i -z 's/$/\n\n/g' {}  # add newlines to end of each file
!ls -tr ./*.py | xargs cat > submission.py
%run -i 'submission.py'
# !head submission.py

# Test

In [None]:
# !cat -n "submission.py" | grep 1687 -C 10

In [None]:
from kaggle_environments import make

env = make("rps", configuration={"episodeSteps": 25}, debug=True)
env.run(["submission.py", "pi.py" ])
env.render(mode="ipython", width=600, height=600)

In [None]:
from kaggle_environments import make

env = make("rps", configuration={"episodeSteps": 25}, debug=True)
env.run(["submission.py", lambda obs, conf: obs.step % 3 ])
env.render(mode="ipython", width=600, height=600)

In [None]:
from kaggle_environments import make

env = make("rps", configuration={"episodeSteps": 25}, debug=True)
env.run(["submission.py", "anti_rotn.py"])
env.render(mode="ipython", width=600, height=600)

In [None]:
# !cat -n "submission.py" | grep -v PI | grep -C 5 3364 

In [None]:
from kaggle_environments import make

env = make("rps", configuration={"episodeSteps": 25}, debug=True)
env.run(["submission.py", "statistical.py"])
env.render(mode="ipython", width=600, height=600)

In [None]:
from kaggle_environments import make

env = make("rps", configuration={"episodeSteps": 100}, debug=False)
env.run(["submission.py", "decision_tree_3.py"])
env.render(mode="ipython", width=600, height=600)

# Further Reading

This notebook is part of a series exploring Rock Paper Scissors:

Predetermined
- [PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-pi-bot)
- [Anti-PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-pi-bot)
- [Anti-Anti-PI Bot](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-anti-pi-bot)
- [De Bruijn Sequence](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-de-bruijn-sequence)

RNG
- [Random Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-random-agent)
- [Random Seed Search](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-random-seed-search)
- [RNG Statistics](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-rng-statistics)

Opponent Response
- [Anti-Rotn](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-rotn)
- [Sequential Strategies](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-sequential-strategies)

Statistical 
- [Weighted Random Agent](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-weighted-random-agent)
- [Anti-Rotn Weighted Random](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-anti-rotn-weighted-random)
- [Statistical Prediction](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-statistical-prediction)

Memory Patterns
- [Naive Bayes](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-naive-bayes)
- [Memory Patterns](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-memory-patterns)

Decision Tree
- [XGBoost](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-xgboost)
- [Multi Stage Decision Tree](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-stage-decision-tree)
- [Decision Tree Ensemble](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-decision-tree-ensemble)

Neural Networks
- [LSTM](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-lstm)

Ensemble
- [Multi Armed Stats Bandit](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-multi-armed-stats-bandit)

RoShamBo Competition Winners
- [Iocaine Powder](https://www.kaggle.com/jamesmcguigan/rps-roshambo-comp-iocaine-powder)
- [Greenberg](https://www.kaggle.com/jamesmcguigan/rock-paper-scissors-greenberg)