# Lux AI Agent Evaluation

This notebook shows how to compare the performance of two agents on Kaggle with the `lux-ai-2021` CLI.

If you want to play many matches to determine the winrate between two agents, you can run the tournament with the Kaggle notebook "Save and Run" feature.

There are a few benefits to this arrangement:
- It is one way to document your winrate.
- You do not need to use your own computing resources.
- You can work on other things while this runs.
- Everything stays on Kaggle, where you may already be working.

However, there are a few issues:
- You are limited to 4 CPUs on Kaggle.
- A single notebook run lasts at most 9 hours.
- You cannot see intermediate results while the run is in progress.
- You will need to upload your agent to Kaggle.
- To view the replays, you will need to upload them to https://2021vis.lux-ai.org/
- I needed to add a few lines to `main.py` so that some agents can run with the CLI.

Regardless, I hope this serves as a demo of how to use the CLI: what needs to be installed and what input it expects.

# Installation

Install the command line interface (CLI) published by the contest makers.

You will also need to update the `kaggle-environments` Python package.

In [None]:
!npm install -g @lux-ai/2021-challenge@latest &> /dev/null
!pip install kaggle-environments -U &> /dev/null

# Load Agents

Copy the published agents from the notebook output into a local directory, and unpack the submission archives.

You could take the submission files directly from each source notebook. However, the top version of the notebook may not be the latest version, so exporting the outputs to a Kaggle Dataset is a workaround. To standardise the procedure, all agents are loaded from the dataset.
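The copy-and-unpack step performed by the bash cells in this section can also be sketched in Python. This is a minimal sketch, not the notebook's actual method: `stage_agent` is a hypothetical helper name, and it assumes each agent directory contains zero or more `*.tar.gz` archives at its top level.

```python
import shutil
import tarfile
from pathlib import Path


def stage_agent(source_dir, target_dir):
    """Copy an agent directory to target_dir and unpack any *.tar.gz inside it.

    Hypothetical helper for illustration; not part of the Lux AI tooling.
    Returns the sorted names of the top-level entries in target_dir.
    """
    target = Path(target_dir)
    # Start from a clean directory, like `rm -rf` followed by `mkdir -p`.
    shutil.rmtree(target, ignore_errors=True)
    shutil.copytree(source_dir, target)
    # Unpack every archive in place, like `tar -xzf *.tar.gz`.
    for archive in sorted(target.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

Calling `stage_agent(some_agent_dir, "agent-a")` would leave the unpacked agent in `agent-a/`, where `some_agent_dir` stands for one of the dataset paths listed in the next cell.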

In [None]:
%%bash

AGENT_ST="../input/lux-ai-published-agents/stonet2000/lux-ai-season-1-jupyter-notebook-tutorial/v18"
AGENT_GP="../input/lux-ai-published-agents/glmcdona/reinforcement-learning-openai-ppo-with-python-game/v47"
AGENT_IA="../input/lux-ai-published-agents/ilialar/lux-ai-risk-averse-baseline/v9"
AGENT_H2="../input/lux-ai-published-agents/huikang/lux-ai-working-title-bot/v2"
AGENT_H3="../input/lux-ai-published-agents/huikang/lux-ai-working-title-bot/v3"
AGENT_SA="../input/lux-ai-published-agents/shoheiazuma/lux-ai-with-imitation-learning/v3"
AGENT_SR="../input/lux-ai-published-agents/stefanschulmeister87/pure-rule-based-agent/v21"
AGENT_RD="../input/lux-ai-published-agents/realneuralnetwork/lux-ai-with-il-decreasing-learning-rate/v3"
AGENT_AB="../input/lux-ai-published-agents/andrej0marinchenko/lux-ai-big-bd-model-train/v3"
AGENT_AD="../input/lux-ai-published-agents/adityasharma01/lux-ai-with-il-decreasing-learning-rate/v7"
AGENT_H7="../input/lux-ai-published-agents/huikang/lux-ai-working-title-bot-private-version/v76"
AGENT_SU="../input/lux-ai-published-agents/shoheiazuma/lux-ai-submit-unet-il/v6"
AGENT_H9="../input/lux-ai-published-agents/huikang/lux-ai-working-title-bot-private-version/v129"
AGENT_IN="../input/lux-ai-published-agents/ironbar/luxai/best_local_agent_nairu"
AGENT_TB="../input/lux-ai-published-agents/isaiahPressman/Kaggle_Lux_AI_2021/best"


export AGENT_A_DIR=$AGENT_TB
export AGENT_B_DIR=$AGENT_H9
rm -rf agent-a/
rm -rf agent-b/
mkdir -p agent-a/
mkdir -p agent-b/
cp -r $AGENT_A_DIR/* agent-a/
cp -r $AGENT_B_DIR/* agent-b/

echo $AGENT_A_DIR > .AGENT_A_DIR
echo $AGENT_B_DIR > .AGENT_B_DIR

In [None]:
!cd agent-a && tar -xvzf *.tar.gz &> /dev/null
!cd agent-b && tar -xvzf *.tar.gz &> /dev/null

This is a code fix I needed to make so that the imitation-learning agent can run with `main.py`.

In [None]:
%%writefile agent-b/main.py
from typing import Any, Dict
import sys
from agent import agent

try:  # for Toad Brigade, whose agent lives in a different module
    from lux_ai.rl_agent.rl_agent import agent
except ImportError:
    pass  # keep the agent imported from agent.py above

if __name__ == "__main__":

    def read_input():
        """
        Reads input from stdin
        """
        try:
            return input()
        except EOFError as eof:
            raise SystemExit(eof)
    step = 0
    class Observation(Dict[str, Any]):
        def __init__(self, player=0) -> None:
            self.player = player
            # self.updates = []
            # self.step = 0
    observation = Observation()
    observation["updates"] = []
    observation["step"] = 0
    observation["remainingOverageTime"] = 60.
    player_id = 0
    while True:
        inputs = read_input()
        observation["updates"].append(inputs)

        if inputs == "D_DONE":
            if step == 0:  # the codefix
                player_id = int(observation["updates"][0])
                observation.player = player_id
                observation["player"] = player_id
                observation["width"], observation["height"] = map(int, observation["updates"][1].split())
            actions = agent(observation, None)
            observation["updates"] = []
            step += 1
            observation["step"] = step
            print(",".join(actions))
            print("D_FINISH")

In [None]:
!cp agent-b/main.py agent-a/main.py

# Single Evaluation

We play a single match on a 12x12 board.
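Before launching a match, it can help to confirm that the CLI is on the `PATH` and that both entry points exist. The following is a minimal sketch; `missing_prerequisites` is a hypothetical helper name, and the paths are the ones used in this notebook.

```python
import shutil
from pathlib import Path


def missing_prerequisites(cli_name, entry_points):
    """Return a list of problems; an empty list means everything was found."""
    problems = []
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(cli_name) is None:
        problems.append(f"CLI not found on PATH: {cli_name}")
    for entry_point in entry_points:
        if not Path(entry_point).is_file():
            problems.append(f"missing entry point: {entry_point}")
    return problems


print(missing_prerequisites("lux-ai-2021", ["./agent-a/main.py", "./agent-b/main.py"]))
```

An empty list printed here suggests the match command has what it needs to start; it does not check that the agents themselves run correctly.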

In [None]:
!ls agent-a

In [None]:
!ls agent-b

In [None]:
%%bash
GFOOTBALL_DATA_DIR=C lux-ai-2021 \
    --loglevel 0 --maxtime 30000  --output . \
    --width 12 --height 12 ./agent-a/main.py ./agent-b/main.py

In [None]:
!ls replays/
!cp replays/* .

You can upload the replay file to https://2021vis.lux-ai.org/ to watch the match.

(If you figure out a way to view a replay file directly in a Kaggle notebook, please share the method with us.)

# Batch Evaluation

If you want to measure the winrate between two agents, you need to play many matches.

For each map size, we keep playing matches until a timeout is reached. Larger maps get a longer total time budget.

To reduce uncertainty in relative performance, the match seeds have to be consistent across runs, so that every comparison is played on the same sequence of seeds.
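To make the role of the seed concrete, here is a sketch of how the command for one match could be assembled in Python. The flags mirror the ones used in the shell script in this notebook; `build_match_command` is a hypothetical helper name, and the environment variable that the script sets is left out.

```python
def build_match_command(seed, map_size,
                        agent_a="./agent-a/main.py",
                        agent_b="./agent-b/main.py"):
    """Build the argument list for one seeded match on a square map.

    Hypothetical helper; the flags are copied from the shell script
    used for batch evaluation in this notebook.
    """
    return [
        "lux-ai-2021",
        "--seed", str(seed),
        "--loglevel", "1",
        "--maxtime", "10000",
        "--height", str(map_size),
        "--width", str(map_size),
        "--storeReplay=false",
        "--storeLogs=false",
        agent_a,
        agent_b,
    ]


# The n-th match always gets seed n, whichever agents are being compared.
for seed in range(1, 4):
    print(" ".join(build_match_command(seed, 12)))
```

The list could be passed to `subprocess.run`, but the point here is only that the seed depends on the match index and not on the agents.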

In [None]:
%%writefile evaluate_for_map_size.sh

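# Usage: bash evaluate_for_map_size.sh <map size>
MAP_SIZE=$1
# Play seeded matches back to back; the caller stops the loop with `timeout`.
for run in {1..1000}; do
    GFOOTBALL_DATA_DIR=C lux-ai-2021 --seed $run --loglevel 1 --maxtime 10000 \
        --height $MAP_SIZE --width $MAP_SIZE --storeReplay=false --storeLogs=false \
        ./agent-a/main.py ./agent-b/main.py >> logs-$MAP_SIZE.txt
done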

In [None]:
!chmod +x ./evaluate_for_map_size.sh

In [None]:
!timeout 1.2h bash ./evaluate_for_map_size.sh 12

In [None]:
!timeout 1.6h bash ./evaluate_for_map_size.sh 16

In [None]:
!timeout 2.4h bash ./evaluate_for_map_size.sh 24

In [None]:
!timeout 3.2h bash ./evaluate_for_map_size.sh 32

# Evaluation Summary

We estimate the winrate as the average of the per-map-size scores, so each map size carries equal weight regardless of how many matches it completed. A draw counts as half a win.
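As a worked example of this scoring rule, the sketch below computes the score for made-up win/draw/loss counts. The counts are purely illustrative and are not results from any real run, and the function names are hypothetical.

```python
def map_score(wins, draws, losses):
    """Score in percent for one map size; a draw counts as half a win."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no finished matches to score")
    return (wins + draws / 2) / games * 100


def total_score(results_by_map_size):
    """Unweighted mean of the per-map-size scores."""
    scores = [map_score(*counts) for counts in results_by_map_size.values()]
    return sum(scores) / len(scores)


# Illustrative counts only: (wins, draws, losses) per map size.
example = {12: (6, 2, 2), 32: (1, 0, 1)}
print(map_score(*example[12]))  # 70.0
print(map_score(*example[32]))  # 50.0
print(total_score(example))     # 60.0
```

Pooling all twelve matches of this example instead would give (7 + 2/2) / 12, about 66.7, so the choice between averaging per map size and pooling matters when the map sizes complete different numbers of matches.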

In [None]:
wins_template = """
    { rank: 1, agentID: 0, name: './agent-a/main.py' },
    { rank: 2, agentID: 1, name: './agent-b/main.py' }
"""

draw_template = """
    { rank: 1, agentID: 0, name: './agent-a/main.py' },
    { rank: 1, agentID: 1, name: './agent-b/main.py' }
"""

lose_template = """
    { rank: 1, agentID: 1, name: './agent-b/main.py' },
    { rank: 2, agentID: 0, name: './agent-a/main.py' }
"""

map_sizes = [12, 16, 24, 32]
total_score = 0
for map_size in map_sizes:
    logfile_name = f"logs-{map_size}.txt"
    with open(logfile_name) as f:
        data_string = f.read()
    # Count outcomes from agent-a's point of view by matching the logged rankings.
    wins = data_string.count(wins_template)
    draw = data_string.count(draw_template)
    lose = data_string.count(lose_template)
    # A draw counts as half a win; the score is a percentage.
    score = (wins + draw / 2) / (wins + lose + draw) * 100
    # Every map size contributes equally to the total score.
    total_score += score / len(map_sizes)
    print(f"Map size: {map_size}, Score: {score:.3f}, Stats: {wins}/{draw}/{lose}")
print(f"Total score: {total_score:.0f}")

In [None]:
!cat .AGENT_A_DIR
!cat .AGENT_B_DIR

The winrate above is only a point estimate, so calculate a confidence interval to judge how much to trust it. There are online [tools](https://epitools.ausvet.com.au/ciproportion) that estimate the confidence interval of a proportion.
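One common choice is the Wilson score interval for a proportion. The sketch below is an illustration under stated assumptions, not the method used by the linked tool: it treats the score (wins plus half the draws, divided by the number of matches) as a binomial proportion, which is only a rough approximation when draws occur. The function name is hypothetical, and `z=1.96` corresponds to a confidence level of roughly 95%.

```python
import math


def wilson_interval(wins, draws, losses, z=1.96):
    """Approximate Wilson score interval for the winrate, as fractions in [0, 1].

    Assumption: draws are counted as half a win and the result is treated
    as a binomial proportion, which is only approximate.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no finished matches")
    p = (wins + draws / 2) / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half_width, centre + half_width


# Illustrative counts only: 50 wins and 50 losses out of 100 matches.
low, high = wilson_interval(50, 0, 50)
print(f"{low:.3f} to {high:.3f}")
```

With 100 matches and an even split, the interval is roughly 0.40 to 0.60, which shows how many matches are needed before a small winrate difference becomes distinguishable from noise.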