## Evaluate performance against all bots

The results of running this code are in `results.txt`.

This notebook contains the code for evaluating my agent. It loads `model_stateDict.pt` by relative path, so make sure that file is present in the working directory. The notebook saves its results to Google Drive because I ran it on Colab and wanted to avoid losing data when my session ended. You can either run it on Colab or remove the Google Drive portion to run it locally.

In [1]:
!pip install --upgrade open_spiel



In [2]:
import numpy as np

from open_spiel.python import rl_agent
from open_spiel.python import rl_environment
import pyspiel

In [11]:
RECALL = 20

# The population of 43 bots. See the RRPS paper for high-level descriptions of
# what each bot does.

print("Loading bot population...")
pop_size = pyspiel.ROSHAMBO_NUM_BOTS
print(f"Population size: {pop_size}")
roshambo_bot_names = pyspiel.roshambo_bot_names()
roshambo_bot_names.sort()

bot_id = 0
roshambo_bot_ids = {}
for name in roshambo_bot_names:
  roshambo_bot_ids[name] = bot_id
  bot_id += 1

roshambo_id_to_name = {v: k for k, v in roshambo_bot_ids.items()}

Loading bot population...
Population size: 43


Remove the following cell if you want to run locally. If you do this, make sure to update the save path later in the notebook.

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
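
If you would rather keep a single notebook that works both on Colab and locally, a minimal sketch along these lines could stand in for the cell above. The helper name `resolve_results_dir` and the local `results` folder are hypothetical; the Drive folder is the one used by the save path later in this notebook.

```python
import os

# Drive folder used by the save path later in this notebook.
DRIVE_RESULTS_DIR = '/content/drive/My Drive/CS486A4/RandomnessEvals'

def resolve_results_dir(local_dir: str = 'results') -> str:
  """Returns a directory for result files (hypothetical helper)."""
  try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
  except ImportError:
    # Not on Colab: fall back to a local folder, creating it if needed.
    os.makedirs(local_dir, exist_ok=True)
    return local_dir
  drive.mount('/content/drive')
  return DRIVE_RESULTS_DIR
```

The returned directory can then be joined with the results filename instead of hard-coding the Drive path.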


The following cell prints a warning if the machine has no GPU. You can ignore it when running on a CPU: the model is small enough to run there, which is what I did when evaluating it.

In [26]:
import torch
import torch.nn as nn
import torch.nn.functional as F

gpuAvailable = torch.cuda.is_available()
if not gpuAvailable:
  print("Warning: torch did not find GPU")

WINDOW_SIZE = 200   # Size of the window to use for the LSTM
NUM_WINDOWS_PER_SERIES = 1000 - WINDOW_SIZE  # Number of windows per time series

class LSTM(nn.Module):
    """
    This model predicts the opponent's next move given 2 inputs:
    1. The first WINDOW_SIZE moves of the opponent and agent (agent acting purely randomly)
    2. The previous WINDOW_SIZE moves of the opponent and agent
    """
    def __init__(self):
        super().__init__()

        lstmHiddenSize = 100
        self.lstm = nn.LSTM(input_size=6, hidden_size=lstmHiddenSize, batch_first=True)
        if gpuAvailable:
          self.lstm.cuda()

        startingFirstActionSize = WINDOW_SIZE * 6
        self.firstResponsesNetwork = nn.Sequential(
            nn.Linear(startingFirstActionSize, startingFirstActionSize // 2),
            nn.ReLU(inplace=True),
            nn.Linear(startingFirstActionSize // 2, startingFirstActionSize // 4),
            nn.ReLU(inplace=True),
            nn.Linear(startingFirstActionSize // 4, startingFirstActionSize // 6),
            nn.ReLU(inplace=True),
        )
        if gpuAvailable:
          self.firstResponsesNetwork.cuda()

        startingCombinedSize = lstmHiddenSize + startingFirstActionSize // 6
        self.combinedNetwork = nn.Sequential(
            nn.Linear(startingCombinedSize, startingCombinedSize // 2),
            nn.ReLU(inplace=True),
            nn.Linear(startingCombinedSize // 2, startingCombinedSize // 4),
            nn.ReLU(inplace=True),
            nn.Linear(startingCombinedSize // 4, 3),
            nn.Softmax(dim=1),
        )
        if gpuAvailable:
          self.combinedNetwork.cuda()

    def forward(self, recentActions, firstActions):
        x1, _ = self.lstm(recentActions)
        x1 = x1[:, -1, :]  # Get only the last output
        x2 = self.firstResponsesNetwork(firstActions)
        x = torch.cat((x1.view(x1.size(0), -1), x2.view(x2.size(0), -1)), dim=1)
        x = self.combinedNetwork(x)
        return x
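
As a quick sanity check on the layer widths defined above, the arithmetic can be reproduced without torch. This is only a sketch of the sizes implied by the constructor; the helper name `layer_widths` is hypothetical.

```python
WINDOW_SIZE = 200
LSTM_HIDDEN_SIZE = 100  # lstmHiddenSize in the model
FEATURES_PER_STEP = 6   # one-hot opponent move (3) + one-hot agent move (3)

def layer_widths(window_size=WINDOW_SIZE, hidden=LSTM_HIDDEN_SIZE):
  """Returns (first-moves branch widths, combined head widths)."""
  first = window_size * FEATURES_PER_STEP
  first_branch = [first, first // 2, first // 4, first // 6]
  combined = hidden + first // 6
  combined_head = [combined, combined // 2, combined // 4, 3]
  return first_branch, combined_head

first_branch, combined_head = layer_widths()
print(first_branch)   # [1200, 600, 300, 200]
print(combined_head)  # [300, 150, 75, 3]
```

So the flattened first-moves input has 1200 features, and the head that produces the three move probabilities starts from 300 features (100 from the LSTM plus 200 from the first-moves branch).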



In [34]:
from random import random, randint

class LstmAgent(rl_agent.AbstractAgent):
  def __init__(self, model_path: str = 'model_stateDict.pt', num_actions: int = 3, name: str = "lstm_agent", random_chance: float = .8):
    assert num_actions > 0
    self._num_actions = num_actions  # 3
    assert 0 <= random_chance <= 1, "Random chance must be between 0 and 1"
    self.random_chance = random_chance
    self.stepNum = 0

    self.model = LSTM()
    if gpuAvailable:  # Model was trained on GPU, so need to map if not inferencing on GPU
      self.model.load_state_dict(torch.load(model_path))
    else:
      self.model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
    self.model.eval()

  def restart(self):
    self.stepNum = 0

  def convertHistoryToOneHotEncoding(self, history: list[int]) -> torch.Tensor:
    assert len(history) == 2 * WINDOW_SIZE, f"History has {len(history)} elements (expected {2 * WINDOW_SIZE})"
    result = np.array(history).reshape((len(history) // 2, 2))  # One row per round, two moves per row
    result = np.flip(result, axis=1).copy()  # Put the opponent's move first to match the model's expected format
    result = F.one_hot(torch.tensor(result), 3)   # Convert each move to a one-hot vector
    result = torch.reshape(result, (1, len(result), 6))  # Shape (batch=1, rounds, 6) expected by the model
    return result.type(torch.float32)

  def step(self, time_step, is_evaluation=False):
    # If it is the end of the episode, don't select an action.
    if time_step.last():
      return

    probs = np.ones(self._num_actions) / self._num_actions

    # For the first WINDOW_SIZE steps, return a random action
    if self.stepNum < WINDOW_SIZE:
      self.stepNum += 1
      return rl_agent.StepOutput(action=randint(0,2), probs=probs)

    # Choose a random action some fraction of the time. This seems to improve performance. My theory is
    # that it makes the history more similar to what the model was trained on, which is an opponent's
    # responses to completely random inputs. Of course, random play can't consistently beat most bots,
    # so the random chance should be high enough that the model performs well but not so high that
    # the model doesn't get enough chances to beat Greenberg.
    # If self.random_chance is 0, the agent always uses the model.
    # If self.random_chance is 1, the agent is always random (equivalent to randbot).
    if self.random_chance > 0 and (self.random_chance == 1 or random() < self.random_chance):
      self.stepNum += 1
      return rl_agent.StepOutput(action=randint(0,2), probs=probs)
    else:
      # Run history through the LSTM to predict what opponent will do next
      game, state = pyspiel.deserialize_game_and_state(time_step.observations["serialized_state"])
      history = state.history()
      prev200 = self.convertHistoryToOneHotEncoding(history[-2 * WINDOW_SIZE:])
      first200 = self.convertHistoryToOneHotEncoding(history[:2 * WINDOW_SIZE])
      first200 = torch.unsqueeze(torch.flatten(first200), 0)

      prediction = self.model(prev200, first200)[0].argmax().item()
      action = (prediction + 1) % 3  # Select the action that beats what you think opponent will do

      return rl_agent.StepOutput(action=action, probs=probs)
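
To make the input format concrete, here is a small pure-Python sketch of what `convertHistoryToOneHotEncoding` and the counter-move rule compute. It assumes, as the reshape and flip in the agent suggest, that `state.history()` interleaves the two players' moves as `[agent, opponent, agent, opponent, ...]`, and that actions are numbered 0 = rock, 1 = paper, 2 = scissors. The helper names `encode_history` and `counter_move` are hypothetical.

```python
NUM_ACTIONS = 3

def encode_history(history):
  """One row per round: opponent one-hot (3 values), then agent one-hot (3)."""
  assert len(history) % 2 == 0, "History must hold whole rounds"
  rows = []
  for i in range(0, len(history), 2):
    # Assumed layout: the agent's move, then the opponent's move.
    agent_move, opponent_move = history[i], history[i + 1]
    row = [0.0] * (2 * NUM_ACTIONS)
    row[opponent_move] = 1.0
    row[NUM_ACTIONS + agent_move] = 1.0
    rows.append(row)
  return rows

def counter_move(predicted_opponent_move):
  """Returns the action that beats the predicted move."""
  return (predicted_opponent_move + 1) % NUM_ACTIONS

# Two rounds: agent plays 0 then 2, opponent plays 1 then 0.
print(encode_history([0, 1, 2, 0]))
# [[0.0, 1.0, 0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
print([counter_move(m) for m in range(3)])  # [1, 2, 0]
```

The agent itself builds the same rows as a float tensor of shape `(1, WINDOW_SIZE, 6)`; the list version here is only meant to show the layout.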

In [9]:
class BotAgent(rl_agent.AbstractAgent):
  """Agent class that wraps a bot.

  Note, the environment must include the OpenSpiel state in its observations,
  which means it must have been created with use_full_state=True.

  This is a simple wrapper that lets the RPS bots be interpreted as agents under
  the RL API.
  """

  def __init__(self, num_actions, bot, name="bot_agent"):
    assert num_actions > 0
    self._bot = bot
    self._num_actions = num_actions

  def restart(self):
    self._bot.restart()

  def step(self, time_step, is_evaluation=False):
    # If it is the end of the episode, don't select an action.
    if time_step.last():
      return
    _, state = pyspiel.deserialize_game_and_state(
        time_step.observations["serialized_state"])
    action = self._bot.step(state)
    probs = np.zeros(self._num_actions)
    probs[action] = 1.0
    return rl_agent.StepOutput(action=action, probs=probs)

def create_roshambo_bot_agent(player_id, num_actions, bot_names, pop_id):
  name = bot_names[pop_id]
  # Creates an OpenSpiel bot with the default number of throws
  # (pyspiel.ROSHAMBO_NUM_THROWS). To create one for a different number of
  # throws per episode, add the number as the third argument here.
  bot = pyspiel.make_roshambo_bot(player_id, name)
  return BotAgent(num_actions, bot, name=name)

In [40]:
from tqdm import tqdm

def eval_agents_count_winrate(env, agents, num_players, num_episodes, verbose=False):
  """Slightly altered to count number of wins/losses/draws"""
  sum_episode_rewards = np.zeros(num_players)
  wins = 0
  draws = 0
  losses = 0

  for ep in tqdm(range(num_episodes)):
  # for ep in range(num_episodes):
    for agent in agents:
      # Bots need to be restarted at the start of the episode.
      if hasattr(agent, "restart"):
        agent.restart()
    time_step = env.reset()
    episode_rewards = np.zeros(num_players)
    while not time_step.last():
      agents_output = [
          agent.step(time_step, is_evaluation=True) for agent in agents
      ]
      action_list = [agent_output.action for agent_output in agents_output]
      time_step = env.step(action_list)
      episode_rewards += time_step.rewards
    sum_episode_rewards += episode_rewards

    if episode_rewards[0] < episode_rewards[1]:
      losses += 1
    elif episode_rewards[0] > episode_rewards[1]:
      wins += 1
    else:
      draws += 1

    if verbose:
      print(f"Finished episode {ep}, "
            + f"avg returns: {sum_episode_rewards / (ep+1)}")

  return sum_episode_rewards / num_episodes, wins, losses, draws

def testAgentAndRandomness(randomnessRate: float, agentId: int, numEpisodes: int) -> str:
  myAgent = LstmAgent('model_stateDict.pt', random_chance=randomnessRate)
  agents = [
      myAgent,
      create_roshambo_bot_agent(1, 3, roshambo_bot_names, agentId)
  ]
  env = rl_environment.Environment(
    "repeated_game(stage_game=matrix_rps(),num_repetitions=" +
    f"{pyspiel.ROSHAMBO_NUM_THROWS}," +
    f"recall={RECALL})",
    include_full_state=True)

  print(f"Starting eval run for {randomnessRate :.2f} random chance and against agent {roshambo_id_to_name[agentId]}")
  avg_eval_returns, wins, losses, draws = eval_agents_count_winrate(env, agents, 2, numEpisodes, verbose=False)

  resultStr = f"For {randomnessRate :.2f} random rate and against agent {roshambo_id_to_name[agentId]}:\n" \
    f"Avg return: {avg_eval_returns}\nWins: {wins}\nLosses: {losses}\nDraws: {draws}\nWin rate: {wins / numEpisodes :.2%}"
  return resultStr

Change the specified path in the next cell if running locally instead of on Colab.

In [50]:
def testAllInRange(start: int, end: int, numEpisodes: int, filename: str):
  """Tests agents with ids [start, end)"""
  for agentId in range(start, end):
    resString = testAgentAndRandomness(0.8, agentId, numEpisodes)
    # Change the below file path if running locally instead of on Colab
    with open(f'/content/drive/My Drive/CS486A4/RandomnessEvals/{filename}.txt', 'a') as f:
      f.write('\n' + resString + '\n')

Colab allows you to have 3 notebooks open at once. To speed up testing, I created 3 copies of this notebook and had each one run tests against a different subset of the agents. If you want to run everything in this notebook instead, comment out the first `testAllInRange` call below and uncomment the second.

In [52]:
from time import time

start = time()
testAllInRange(0, 15, 400, 'finalTest1')  # To split up the work over 3 notebooks
# testAllInRange(0, pop_size, 400, 'finalTest')  # To run everything in this notebook (bot ids 0-42)
print("Total runtime:", time() - start)

Starting eval run for 0.80 random chance and against agent actr_lag2_decay


100%|██████████| 400/400 [13:14<00:00,  1.99s/it]


Starting eval run for 0.80 random chance and against agent adddriftbot2


100%|██████████| 400/400 [12:35<00:00,  1.89s/it]


Starting eval run for 0.80 random chance and against agent addshiftbot3


100%|██████████| 400/400 [12:31<00:00,  1.88s/it]


Starting eval run for 0.80 random chance and against agent antiflatbot


100%|██████████| 400/400 [12:40<00:00,  1.90s/it]


Starting eval run for 0.80 random chance and against agent antirotnbot


100%|██████████| 400/400 [12:33<00:00,  1.88s/it]


Starting eval run for 0.80 random chance and against agent biopic


100%|██████████| 400/400 [12:42<00:00,  1.91s/it]


Starting eval run for 0.80 random chance and against agent boom


100%|██████████| 400/400 [12:54<00:00,  1.94s/it]


Starting eval run for 0.80 random chance and against agent copybot


100%|██████████| 400/400 [12:36<00:00,  1.89s/it]


Starting eval run for 0.80 random chance and against agent debruijn81


100%|██████████| 400/400 [13:03<00:00,  1.96s/it]


Starting eval run for 0.80 random chance and against agent driftbot


100%|██████████| 400/400 [13:13<00:00,  1.98s/it]


Starting eval run for 0.80 random chance and against agent flatbot3


100%|██████████| 400/400 [13:16<00:00,  1.99s/it]


Starting eval run for 0.80 random chance and against agent foxtrotbot


100%|██████████| 400/400 [12:42<00:00,  1.91s/it]


Starting eval run for 0.80 random chance and against agent freqbot2


100%|██████████| 400/400 [12:26<00:00,  1.87s/it]


Starting eval run for 0.80 random chance and against agent granite


100%|██████████| 400/400 [12:29<00:00,  1.87s/it]


Starting eval run for 0.80 random chance and against agent greenberg


100%|██████████| 400/400 [13:19<00:00,  2.00s/it]

Total runtime: 11540.165916919708
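
The split across notebooks can be computed rather than hard-coded. The sketch below divides the bot ids into contiguous `[start, end)` ranges, one per notebook. `split_ranges` is a hypothetical helper, and only the first range, `(0, 15)`, is confirmed by the run above; the other two are simply what an even split of 43 ids would give.

```python
def split_ranges(num_items, num_parts):
  """Splits range(num_items) into num_parts contiguous [start, end) ranges."""
  base, extra = divmod(num_items, num_parts)
  ranges = []
  start = 0
  for part in range(num_parts):
    # The first `extra` parts each take one additional item.
    end = start + base + (1 if part < extra else 0)
    ranges.append((start, end))
    start = end
  return ranges

print(split_ranges(43, 3))  # [(0, 15), (15, 29), (29, 43)]
```

Each notebook copy would then call `testAllInRange(start, end, 400, filename)` with its own range and its own output filename.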



