# Introduction

In this notebook, we will create a submission file for the ConnectX competition starting from the trained agents that we obtained in [the first part](https://www.kaggle.com/koutetsu/connectx-multi-agent-reinforcement-learning-1).

This is inspired by [this notebook](https://www.kaggle.com/phunghieu/connectx-with-deep-q-learning) by [Hieu Phung](https://www.kaggle.com/phunghieu), so if you like this notebook, don't forget to upvote his as well.

# Libraries

In [None]:
from pathlib import Path
import pickle
from typing import Union
import sys

from jinja2 import Template
from kaggle_environments import make, evaluate
import numpy as np

This is needed so that NumPy prints every value of an array instead of abbreviating large arrays with `...`; the printed representation is what gets written into the submission template.

In [None]:
np.set_printoptions(threshold=sys.maxsize)
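
A small sketch of why the threshold matters, using a made-up array of 2000 elements (the array and variable names are illustrative, not taken from the checkpoint). With a threshold of 1000 the representation is abbreviated, so it could no longer be pasted back as valid data:

```python
import sys

import numpy as np

# Illustrative array, larger than the 1000-element threshold used below
big = np.arange(2000)

# With a small threshold, NumPy summarizes the array with "..."
with np.printoptions(threshold=1000):
    truncated = np.array_repr(big)

# With the threshold raised to sys.maxsize, every element is printed
with np.printoptions(threshold=sys.maxsize):
    full = np.array_repr(big)

print("..." in truncated, "..." in full)
```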

We print the versions of the important packages here for reproducibility's sake.

In [None]:
!pip list | grep -wE "kaggle-environments"

# Constants

In [None]:
CURRENT_PATH = !pwd
CURRENT_PATH = Path(CURRENT_PATH[0])

In [None]:
submission_template_file = CURRENT_PATH / "../input/connectx-submission-template/submission_template.py"

In [None]:
checkpoint_dir = CURRENT_PATH / "../input/connectx-multiagent-checkpoints/results/ppo_connect_four/PPO_ConnectFourGym-v0_eef31_00000_0_2020-11-08_13-33-06"

In [None]:
submission_file = CURRENT_PATH / "submission.py"

# Submission Template 

In [None]:
with submission_template_file.open("r") as f:
    print(f.read())

As the template shows, we redefine the neural network from the previous notebook without using Ray or RLlib at all.
We pass the network's output, the logits, directly to a categorical distribution and sample the action from it.
This makes the code simpler and also faster, because Ray and RLlib do a lot of work on import.
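
The sampling step can be sketched as follows. This is a minimal NumPy-only illustration, not the code in the template: the helper name `sample_action` and the example logits are assumptions, and the template may well use a library distribution class instead.

```python
import numpy as np


def sample_action(logits, rng):
    """Sample an index from the categorical distribution defined by `logits`.

    Hypothetical helper for illustration only.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the maximum before exponentiating for numerical stability
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
# Made-up logits, one per column of a 7-column board
logits = np.array([0.1, 2.0, -1.0, 0.5, 0.0, 0.3, -0.2])
action = sample_action(logits, rng)

# When one logit dominates, that column is effectively always chosen
dominant = sample_action([0.0, 1000.0, 0.0], rng)
```

Note that this sketch does not mask full columns; an agent that samples an unavailable column would make an invalid move.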

The following function fills in the missing values in the template using Jinja2 and the parameters stored in a checkpoint file.

In [None]:
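def create_submission_from_template(
    submission_template_file: Union[str, Path],
    submission_file: Union[str, Path],
    checkpoint_path: Union[str, Path],
    agent: str = "default_policy"
):
    # Accept plain strings as well as Path objects, as the type hints promise
    submission_template_file = Path(submission_template_file)
    submission_file = Path(submission_file)
    checkpoint_path = Path(checkpoint_path)

    if not checkpoint_path.is_file():
        raise ValueError(f"{checkpoint_path} is not a valid path to a checkpoint file")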

    with checkpoint_path.open("rb") as f:
        checkpoint_data = pickle.load(f)

    network_data = pickle.loads(checkpoint_data["worker"])["state"][agent]
    # The optimizer state is not needed for inference; ignore it if absent
    network_data.pop("_optimizer_variables", None)

    network_width = network_data["shared_layers.3.weight"].shape[0]

    network_data_as_str = "{ "
    for k, v in network_data.items():
        if "vf_layers" in k:
            continue
        parameter_str = f"'{k}': np.{np.array_repr(v)}, "
        parameter_str = parameter_str.replace("dtype=", "dtype=np.")
        network_data_as_str += parameter_str
    network_data_as_str = network_data_as_str[:-2]
    network_data_as_str += " }"

    with submission_template_file.open("r") as f:
        submission_template = f.read()

    submission = Template(submission_template).render(
        network_data=network_data_as_str,
        network_width=network_width,
    )

    with submission_file.open("w") as f:
        f.write(submission)
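
To make the mechanism concrete, here is a minimal sketch of the same idea on a toy scale. The miniature template, the single weight matrix and the constant names `NETWORK_WIDTH` and `NETWORK_DATA` are assumptions made for illustration; only the placeholder names `network_data` and `network_width` come from the function above. The arrays are turned into Python source with `np.array_repr`, rendered into the template, and the rendered source is executed to check that the weights come back unchanged.

```python
import numpy as np
from jinja2 import Template

# Hypothetical miniature template; the real one is submission_template.py
MINI_TEMPLATE = """
import numpy as np

NETWORK_WIDTH = {{ network_width }}
NETWORK_DATA = {{ network_data }}
"""

# Made-up parameters standing in for the checkpoint contents
weights = {"shared_layers.0.weight": np.arange(6, dtype=np.float32).reshape(2, 3)}

# Turn each array into source code such as np.array([...], dtype=np.float32)
entries = []
for name, value in weights.items():
    entry = f"'{name}': np.{np.array_repr(value)}".replace("dtype=", "dtype=np.")
    entries.append(entry)
network_data_as_str = "{ " + ", ".join(entries) + " }"

rendered = Template(MINI_TEMPLATE).render(
    network_data=network_data_as_str,
    network_width=2,
)

# Execute the rendered source to confirm that it is valid Python
namespace = {}
exec(rendered, namespace)
restored = namespace["NETWORK_DATA"]["shared_layers.0.weight"]
print(np.array_equal(restored, weights["shared_layers.0.weight"]))
```

The values are written with whatever precision the active NumPy print options produce, so it is worth checking that the restored arrays match the originals closely enough for your model.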

We now use the latest checkpoint from the previous notebook and choose **agent1** (for no particular reason) to create a submission file.

In [None]:
checkpoint_paths = []
for path in checkpoint_dir.iterdir():
    if path.is_dir():
        path = path / path.parts[-1].replace("_", "-")
        checkpoint_paths.append(path)

checkpoint_paths = sorted(checkpoint_paths, key=lambda x: int(x.name.replace("checkpoint-", "")))
        
last_checkpoint_file = checkpoint_paths[-1]
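
The numeric sort key matters because a plain sort compares the names as text. A small sketch with made-up checkpoint numbers (the `results` directory and the numbers 3, 20 and 100 are illustrative) shows the difference:

```python
from pathlib import Path

# Made-up checkpoint directories following the checkpoint_N/checkpoint-N layout
names = ["checkpoint_20", "checkpoint_100", "checkpoint_3"]
paths = [Path("results") / name / name.replace("_", "-") for name in names]

# Text ordering puts "checkpoint_3" last, because "3" sorts after "1" and "2"
lexicographic = sorted(paths)

# Sorting on the integer suffix puts the highest-numbered checkpoint last
numeric = sorted(paths, key=lambda p: int(p.name.replace("checkpoint-", "")))

print(lexicographic[-1].name, numeric[-1].name)
```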

In [None]:
create_submission_from_template(submission_template_file, submission_file, last_checkpoint_file, "agent1")

In [None]:
with submission_file.open("r") as f:
    print(f.read())

We test our submission file in the Kaggle environment to make sure that it works before submitting it to the competition.

In [None]:
env = make("connectx", debug=True)

In [None]:
env.run([str(submission_file), "random"])
env.render(mode="ipython")

We run multiple games to make sure that we didn't make a mistake and that the agent's performance didn't deteriorate.

In [None]:
def get_win_percentages(agent1, agent2, n_rounds=100):
    # Use default Connect Four setup
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    # Agent 1 goes first (roughly) half the time          
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds//2)
    # Agent 2 goes first (roughly) half the time      
    outcomes += [[b,a] for [a,b] in evaluate("connectx", [agent2, agent1], config, [], n_rounds-n_rounds//2)]
    print("Agent 1 Win Percentage:", np.round(outcomes.count([1,-1])/len(outcomes), 2))
    print("Agent 2 Win Percentage:", np.round(outcomes.count([-1,1])/len(outcomes), 2))
    print("Number of Invalid Plays by Agent 1:", outcomes.count([None, 0]))
    print("Number of Invalid Plays by Agent 2:", outcomes.count([0, None]))

In [None]:
get_win_percentages(str(submission_file), "random")
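
The bookkeeping inside `get_win_percentages` can be checked without running any games. The sketch below uses a hypothetical helper, `summarize_outcomes`, and hand-written outcome lists; it applies the same flip to the games where the agents swapped seats, so that every pair reads `[agent1, agent2]` before counting.

```python
def summarize_outcomes(first_half, second_half):
    """Count results the same way as get_win_percentages.

    Hypothetical helper for illustration. `first_half` holds
    [agent1, agent2] rewards and `second_half` holds [agent2, agent1]
    rewards from the games where agent 2 moved first.
    """
    # Flip the second half so that every pair is [agent1, agent2]
    outcomes = list(first_half) + [[b, a] for a, b in second_half]
    total = len(outcomes)
    return {
        "agent1_win_rate": outcomes.count([1, -1]) / total,
        "agent2_win_rate": outcomes.count([-1, 1]) / total,
        "agent1_invalid": outcomes.count([None, 0]),
        "agent2_invalid": outcomes.count([0, None]),
    }


# Made-up results: four games with agent 1 first, four with agent 2 first
first = [[1, -1], [1, -1], [-1, 1], [None, 0]]
second = [[-1, 1], [1, -1], [0, None], [1, -1]]
summary = summarize_outcomes(first, second)
print(summary)
```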

# Conclusion

In this notebook, we have seen how to create a submission file for ConnectX starting from a PyTorch model trained using RLlib.

We did that simply by hardcoding the model's parameters in the submission file.

If you have read this notebook, I hope that it was informative and useful to you.