# Beating a stochastic agent

Here I show how to beat a stochastic agent whose exact algorithm you know.

The first step is to copy the agent. Then you mirror its logic by showing it what it would see if it were your opponent. This usually means feeding *your own last action* into the logic instead of `lastOpponentAction`. You would also store the opponent's action as "your own action" in the history, but this agent does not keep such a history anyway. For deterministic bots without randomness this alone can already give a perfect score (e.g. [Anti Statistical](https://www.kaggle.com/superant/anti-statistical)).
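As a minimal sketch of the mirroring step for a deterministic bot, consider a hypothetical agent `beat_last` (made up for illustration, not taken from the linked notebooks) that always plays the move beating the opponent's previous action. The mirrored bot feeds its own last action into the same rule to predict the next move and then plays the counter.

```python
def beat_last(last_opponent_action):
    """Hypothetical deterministic bot: beat the opponent's previous action."""
    if last_opponent_action is None:   # first round: fixed opening move
        return 0
    return (last_opponent_action + 1) % 3


def play(rounds=20):
    """Play the mirrored bot against beat_last; return the mirrored bot's score."""
    my_last = None   # this is what the opponent sees as lastOpponentAction
    score = 0
    for _ in range(rounds):
        predicted = beat_last(my_last)    # mirrored view: run the copied logic on my own action
        mine = (predicted + 1) % 3        # play the move that beats the prediction
        theirs = beat_last(my_last)       # what the real opponent actually plays
        score += (0, 1, -1)[(mine - theirs) % 3]   # tie, win, loss
        my_last = mine
    return score


print(play())   # prints 20: the mirrored bot wins every round
```

Because the copied rule has no randomness, the prediction is always exact, which is why the score is perfect here.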

For a stochastic bot you may already have a tiny edge over the original, but you can make it much bigger by rotating the action probability distribution clockwise (see [Geometry of RPS](https://www.kaggle.com/c/rock-paper-scissors/discussion/210305)). Rotating is very similar to adding (modulo 3) an independent random variable for which one of the three probabilities is zero.
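The connection to "adding a random variable" can be checked numerically for the particular mix used later in this notebook, `probs + c * np.roll(probs, 1)`. The sketch below uses a made-up distribution `probs` and the weight `c = 0.63`; after normalisation the mix equals the distribution of `(X + S) mod 3`, where `S` takes the value 0 or 1 and never 2.

```python
import numpy as np

probs = np.array([0.5, 0.3, 0.2])   # made-up action distribution of X
c = 0.63                            # mixing weight used later in this notebook

# Shifted mix, normalised so that it sums to 1
mixed = probs + c * np.roll(probs, 1)
mixed /= mixed.sum()

# Distribution of S: P(S=0) = 1/(1+c), P(S=1) = c/(1+c), P(S=2) = 0
s = np.array([1.0, c, 0.0]) / (1.0 + c)

# Distribution of (X + S) mod 3, i.e. the circular convolution of probs and s
conv = np.array([
    sum(probs[j] * s[(k - j) % 3] for j in range(3))
    for k in range(3)
])

print(np.allclose(mixed, conv))   # prints True
```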

In the latest version I apply this to the probabilities directly rather than using complex numbers.

The rotation, including the magic number, is applied just before the argmax: `probs += 0.63 * np.roll(probs, 1)`. Viewed as a complex multiplication on the representation, this corresponds to a rotation suspiciously close to \\(\exp(2\pi i \cdot \frac{1}{9})\\), i.e. 40 degrees.
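The phase can be checked with a short calculation. The sketch below assumes a representation in which action `k` is mapped to the cube root of unity `omega**k`; this mapping and the helper `to_complex` are assumptions for illustration and may differ in orientation from the linked discussion. Under this assumption `np.roll(probs, 1)` is a multiplication by `omega`, so the whole update is a multiplication by `1 + 0.63 * omega`.

```python
import numpy as np

omega = np.exp(2j * np.pi / 3)   # primitive cube root of unity


def to_complex(p):
    """Map a distribution over actions (0, 1, 2) to a point in the complex plane."""
    return p[0] + p[1] * omega + p[2] * omega**2


probs = np.array([0.5, 0.3, 0.2])   # made-up example distribution
c = 0.63
multiplier = 1 + c * omega

# The update on probabilities equals a complex multiplication on the representation
lhs = to_complex(probs + c * np.roll(probs, 1))
rhs = multiplier * to_complex(probs)
print(np.isclose(lhs, rhs))   # prints True

# Phase in degrees (roughly 38.5, compared with 360 / 9 = 40) and magnitude
print(np.degrees(np.angle(multiplier)), abs(multiplier))
```

Since the result is only fed into `argmax`, the magnitude of the multiplier does not matter, only its phase.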

Several interesting questions arise:

* Why is the result so sensitive to the phase shift, and why is `0.63` a good value for different stochastic agents? Can you find a better value?
* Can you do better than a constant shift?
* Can you use this technique without knowing the opponent's code?
* Can you detect whether the opponent is vulnerable to such a tactic (i.e. detect whether it is an OTM agent) without trying out (inducing) this strategy?

# Original Opponent Transition Matrix

Here is the original [Opponent Transition Matrix](https://www.kaggle.com/group16/rps-opponent-transition-matrix): 

In [None]:
%%writefile otm.py

import numpy as np
import pandas as pd
import random

T = np.zeros((3, 3))
P = np.zeros((3, 3))

# a1 is the action of the opponent 1 step ago
# a2 is the action of the opponent 2 steps ago
a1, a2 = None, None

def transition_agent(observation, configuration):
    global T, P, a1, a2
    if observation.step > 1:
        a1 = observation.lastOpponentAction
        T[a2, a1] += 1
        P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))
        a2 = a1
        if np.sum(P[a1, :]) == 1:
            return int((np.random.choice(
                [0, 1, 2],
                p=P[a1, :]
            ) + 1) % 3)
        else:
            return int(np.random.randint(3))
    else:
        if observation.step == 1:
            a2 = observation.lastOpponentAction
        return int(np.random.randint(3))
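To make the bookkeeping concrete, here is a small sketch that builds the count matrix `T` and the row-normalised matrix `P` from a made-up sequence of opponent actions, using the same normalisation as the agent above. The sequence `actions` is purely illustrative.

```python
import numpy as np

actions = [0, 1, 2, 0, 1, 1]   # made-up opponent history

# T[i, j] counts how often action i was followed by action j
T = np.zeros((3, 3))
for prev, nxt in zip(actions, actions[1:]):
    T[prev, nxt] += 1

# Row-normalise; rows without observations stay all zero
P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))

print(P)
# Row 0 is [0, 1, 0]: after action 0 the opponent has always played 1,
# so the agent would answer with (1 + 1) % 3 = 2.
```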

# Anti Opponent Transition Matrix

Changes are marked with comments. The opponent's actions are completely ignored. The anti-agent only becomes effective once you add the phase shift.

In [None]:
%%writefile "anti_otm.py"

import numpy as np
import pandas as pd
import random

T = np.zeros((3, 3))
P = np.zeros((3, 3))

a1, a2 = None, None
last_action = None  # my own previous action (used in place of lastOpponentAction)


###########################################
# Original agent with modifications marked ->
###########################################

def anti_transition_agent(observation, configuration):
    global T, P, a1, a2, last_action
    if observation.step > 1:
        a1 = last_action   # mirrored view: my own last action instead of observation.lastOpponentAction
        T[a2, a1] += 1
        P = np.divide(T, np.maximum(1, T.sum(axis=1)).reshape(-1, 1))
        a2 = a1
        if np.sum(P[a1, :]) == 1:
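            probs = P[a1, :].copy()    # copy, so that P itself is not modified in place

            probs += 0.63 * np.roll(probs, 1)    # the "magic" phase shift: mix in the distribution shifted by one action

            result = (int(probs.argmax()) + 1) % 3   # argmax instead of sampling; +1 as in the original agent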
        else:
            result = int(np.random.randint(3))
    else:
        if observation.step == 1:
            a2 = last_action    # mirrored view: my own last action
        result = int(np.random.randint(3))
        
    result = (result + 1) % 3  # play the move that beats what the original agent would have played
        
    last_action = result
        
    return result

# Evaluation

In [None]:
from kaggle_environments import make
env = make("rps", debug=True)

num_win = 0
num_loss = 0
num_matches = 0

for _ in range(50):
    env.reset()
    result = env.run(["anti_otm.py", "otm.py"])
    reward = result[-1][0]["observation"]["reward"]  # final score of anti_otm.py
    # A match counts as a win or a loss only if the margin exceeds 20
    if reward > 20:
        num_win += 1
    if reward < -20:
        num_loss += 1
    num_matches += 1

    print(f"{reward:+4.0f}, {num_matches:2d} matches, {num_win/num_matches:5.1%} win, {num_loss/num_matches:5.1%} loss")
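To explore the question of whether a better shift than `0.63` exists, it can help to replay the two agents outside the Kaggle environment. The following is a rough NumPy-only sketch, not the evaluation used above. It relies on the fact that both agents track the same thing (the anti-agent's own actions), so a single count matrix is shared. The function `play_match`, its parameters, and the simplified check `row.sum() > 0` (instead of the original `== 1` test on the normalised row) are assumptions made for illustration.

```python
import numpy as np


def play_match(shift, steps=1000, seed=0):
    """Simulate anti-OTM (with the given shift) against OTM; return anti-OTM's score."""
    rng = np.random.default_rng(seed)
    T = np.zeros((3, 3))   # transition counts of the anti-agent's own actions
    a2 = None              # anti-agent's action two steps ago
    last = None            # anti-agent's previous action
    score = 0
    for step in range(steps):
        # Default for both agents: a uniformly random action
        otm = int(rng.integers(3))
        anti = int(rng.integers(3))
        if step == 1:
            a2 = last
        elif step > 1:
            a1 = last
            T[a2, a1] += 1
            a2 = a1
            row = T[a1]
            if row.sum() > 0:   # simplified version of the original sum == 1 check
                p = row / row.sum()
                # OTM: sample the predicted action, then play what beats it
                otm = (int(rng.choice(3, p=p)) + 1) % 3
                # Anti-OTM: shifted argmax, OTM's +1, then +1 again to beat OTM
                q = p + shift * np.roll(p, 1)
                anti = (int(q.argmax()) + 2) % 3
        score += (0, 1, -1)[(anti - otm) % 3]   # tie, win, loss
        last = anti
    return score


# Average score over a few seeds for several candidate shifts
for shift in (0.0, 0.3, 0.63, 1.0):
    scores = [play_match(shift, seed=s) for s in range(5)]
    print(f"shift={shift:4.2f}  mean score={np.mean(scores):+7.1f}")
```

A finer grid of shifts and more seeds would be needed before drawing conclusions, and any value found this way should be confirmed with the real environment.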