# Trial with Graph Convolutional Network

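## Approach

### Training by self-play

* The Q-function changes depending on the opponent.
    * On-policy self-play avoids this. To go off-policy, replay only past episodes whose Q-function does not differ much from the current one.
        * Rainbow
            * Implemented in PFRL. Try this first?
        * MuZero
            * The implementation may take some effort, but it has a track record with self-play, so it may perform well.
    * Algorithms with an explicit policy are also fine: learn a Q-function for each opponent. This case is again self-play and on-policy.
        * A3C, PPO, etc.
            * Implemented in PFRL. Try these as well?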
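
One way to approximate the off-policy condition above (reuse only past episodes whose Q-function is close to the current one) is to tag every stored transition with the policy version that produced it and to sample only from recent versions. The following is a minimal sketch under that assumption; `RecentVersionBuffer`, `max_version_gap`, and the version counter are hypothetical names and are not part of PFRL.

```python
import random
from collections import deque


class RecentVersionBuffer:
    """Replay buffer that samples only transitions from recent policy versions.

    Hypothetical helper: the gap in policy version is used as a crude proxy
    for "the Q-function has not changed much".
    """

    def __init__(self, capacity, max_version_gap):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted first
        self.max_version_gap = max_version_gap

    def append(self, transition, policy_version):
        self.buffer.append((policy_version, transition))

    def sample(self, batch_size, current_version, rng=random):
        # Keep only transitions generated by sufficiently recent policies.
        recent = [
            t for v, t in self.buffer
            if current_version - v <= self.max_version_gap
        ]
        return rng.sample(recent, min(batch_size, len(recent)))


buf = RecentVersionBuffer(capacity=1000, max_version_gap=2)
for version in range(6):
    for step in range(3):
        buf.append({"version": version, "step": step}, policy_version=version)

# With current_version=5 and a gap of 2, only versions 3, 4 and 5 are eligible.
batch = buf.sample(batch_size=4, current_version=5)
```

A version gap is only a stand-in for a real distance between Q-functions; a stricter variant could compare Q-values on a fixed set of probe states instead.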
            
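### Policy / Q-function model

* Graph Neural Network
    * Build the model as if each player made its own action and value judgments. The selected action is the best action of the currently active player.
    * Initially, use absolute position coordinates as features and a complete graph.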
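
The action-selection rule above can be sketched as follows: the model yields one score vector per player node, and the greedy action is read from the row of the active player. This is only an illustration; `select_active_action` is a hypothetical helper, and the random `node_scores` array stands in for real model output.

```python
import numpy as np

NUM_PLAYERS = 11   # left-team nodes, one per player
NUM_ACTIONS = 19   # size of the default action set


def select_active_action(node_scores, active_one_hot):
    """Return the greedy action of the currently active player.

    node_scores:    array of shape [NUM_PLAYERS, NUM_ACTIONS]
    active_one_hot: array of shape [NUM_PLAYERS], 1 for the active player
    """
    active_index = int(np.argmax(active_one_hot))
    return int(np.argmax(node_scores[active_index]))


rng = np.random.default_rng(0)
node_scores = rng.normal(size=(NUM_PLAYERS, NUM_ACTIONS))  # stand-in for model output
active_one_hot = np.zeros(NUM_PLAYERS)
active_one_hot[4] = 1.0  # pretend player 4 is active

action = select_active_action(node_scores, active_one_hot)
```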

## Observations

### `simple115_v2`

Same as simple115, but with the bug fixed.

*   22 - (x,y) coordinates of left team players
*   22 - (x,y) direction of left team players
*   22 - (x,y) coordinates of right team players
*   22 - (x, y) direction of right team players
*   3 - (x, y and z) - ball position
*   3 - ball direction
*   3 - one hot encoding of ball ownership (no one, left, right)
*   11 - one hot encoding of which player is active
*   7 - one hot encoding of `game_mode`

Entries for players that are not active (either due to red cards or because the
number of players is less than 11) are set to -1.
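
The layout above can be turned into a small helper that splits the flat vector into named fields. This is a sketch that assumes the fields appear in exactly the order listed; `SIMPLE115_LAYOUT` and `split_simple115` are hypothetical names, not part of gfootball.

```python
import numpy as np

# (name, length) pairs in the order listed above; the lengths sum to 115.
SIMPLE115_LAYOUT = [
    ("left_positions", 22),
    ("left_directions", 22),
    ("right_positions", 22),
    ("right_directions", 22),
    ("ball_position", 3),
    ("ball_direction", 3),
    ("ball_ownership", 3),
    ("active_player", 11),
    ("game_mode", 7),
]


def split_simple115(obs):
    """Split a flat 115-dim observation into a dict of named slices."""
    obs = np.asarray(obs)
    assert obs.shape == (115,), obs.shape
    fields, start = {}, 0
    for name, length in SIMPLE115_LAYOUT:
        fields[name] = obs[start:start + length]
        start += length
    return fields


obs = np.arange(115, dtype=np.float32)  # dummy observation for illustration
fields = split_simple115(obs)
```

With the dummy observation, each field holds its own index range, which makes it easy to check the offsets (for example, the ball position starts at index 88).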


## Actions

### Default action set

The default action set consists of 19 actions:

*   Idle actions

    *   `action_idle` = 0, a no-op action, sticky actions are not affected (player maintains his directional movement etc.).

*   Movement actions

    *   `action_left` = 1, run to the left, sticky action.
    *   `action_top_left` = 2, run to the top-left, sticky action.
    *   `action_top` = 3, run to the top, sticky action.
    *   `action_top_right` = 4, run to the top-right, sticky action.
    *   `action_right` = 5, run to the right, sticky action.
    *   `action_bottom_right` = 6, run to the bottom-right, sticky action.
    *   `action_bottom` = 7, run to the bottom, sticky action.
    *   `action_bottom_left` = 8, run to the bottom-left, sticky action.

*   Passing / Shooting

    *   `action_long_pass` = 9, perform a long pass to a player on your team. The player to pass the ball to is auto-determined based on the movement direction.
    *   `action_high_pass` = 10, perform a high pass, similar to `action_long_pass`.
    *   `action_short_pass` = 11, perform a short pass, similar to `action_long_pass`.
    *   `action_shot` = 12, perform a shot, always in the direction of the opponent's goal.

*   Other actions

    *   `action_sprint` = 13, start sprinting, sticky action. Player moves faster, but has worse ball handling.
    *   `action_release_direction` = 14, reset current movement direction.
    *   `action_release_sprint` = 15, stop sprinting.
    *   `action_sliding` = 16, perform a slide (effective when not having a ball).
    *   `action_dribble` = 17, start dribbling (effective when having a ball), sticky action. Player moves slower, but it is harder to take over the ball from him.
    *   `action_release_dribble` = 18, stop dribbling.

### V2 action set

It is an extension of the default action set:

*   `action_builtin_ai` = 19, let game's built-in AI generate an action
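
For use in agent code, the action ids listed above can be mirrored in an enum. This is a convenience sketch only: the `Action` class, `DEFAULT_ACTIONS`, and `STICKY_ACTIONS` are hypothetical names built from the lists in this section, not gfootball's own definitions.

```python
from enum import IntEnum


class Action(IntEnum):
    IDLE = 0
    LEFT = 1
    TOP_LEFT = 2
    TOP = 3
    TOP_RIGHT = 4
    RIGHT = 5
    BOTTOM_RIGHT = 6
    BOTTOM = 7
    BOTTOM_LEFT = 8
    LONG_PASS = 9
    HIGH_PASS = 10
    SHORT_PASS = 11
    SHOT = 12
    SPRINT = 13
    RELEASE_DIRECTION = 14
    RELEASE_SPRINT = 15
    SLIDING = 16
    DRIBBLE = 17
    RELEASE_DRIBBLE = 18
    BUILTIN_AI = 19  # V2 action set only


# The default action set is everything except the V2-only action.
DEFAULT_ACTIONS = [a for a in Action if a != Action.BUILTIN_AI]

# Actions described as "sticky" above: the 8 movement directions, sprint, dribble.
STICKY_ACTIONS = frozenset(
    [a for a in Action if Action.LEFT <= a <= Action.BOTTOM_LEFT]
    + [Action.SPRINT, Action.DRIBBLE]
)
```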

In [1]:
# Install:
# Kaggle environments.
!git clone https://github.com/Kaggle/kaggle-environments.git
!cd kaggle-environments && pip install .

# GFootball environment.
!apt-get update -y
!apt-get install -y libsdl2-gfx-dev libsdl2-ttf-dev

# Make sure the branch in `git clone` matches the version in the `wget` URL.
!git clone -b v2.3 https://github.com/google-research/football.git
!mkdir -p football/third_party/gfootball_engine/lib

!wget https://storage.googleapis.com/gfootball/prebuilt_gameplayfootball_v2.3.so -O football/third_party/gfootball_engine/lib/prebuilt_gameplayfootball.so
!cd football && GFOOTBALL_USE_PREBUILT_SO=1 pip3 install .


## Install

In [3]:
!pip install pfrl==0.1.0



In [4]:
# ------------------ install torch_geometric begin -----------------
try:
    import torch_geometric
except ImportError:  # install only when torch_geometric is missing
    import subprocess
    import torch

    nvcc_stdout = str(subprocess.check_output(['nvcc', '-V']))
    tmp = nvcc_stdout[nvcc_stdout.rfind('release') + len('release') + 1:]
    cuda_version = tmp[:tmp.find(',')]
    cuda = {
            '9.2': 'cu92',
            '10.1': 'cu101',
            '10.2': 'cu102',
            }

    CUDA = cuda[cuda_version]
    TORCH = torch.__version__.split('.')
    TORCH[-1] = '0'
    TORCH = '.'.join(TORCH)

    install1 = 'pip install torch-scatter==latest+' + CUDA + ' -f https://pytorch-geometric.com/whl/torch-' + TORCH + '.html'
    install2 = 'pip install torch-sparse==latest+' + CUDA + ' -f https://pytorch-geometric.com/whl/torch-' + TORCH + '.html'
    install3 = 'pip install torch-cluster==latest+' + CUDA + ' -f https://pytorch-geometric.com/whl/torch-' + TORCH + '.html'
    install4 = 'pip install torch-spline-conv==latest+' + CUDA + ' -f https://pytorch-geometric.com/whl/torch-' + TORCH + '.html'
    install5 = 'pip install torch-geometric'

    subprocess.run(install1.split())
    subprocess.run(install2.split())
    subprocess.run(install3.split())
    subprocess.run(install4.split())
    subprocess.run(install5.split())
# ------------------ install torch_geometric end -----------------
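
The string slicing used above to extract the CUDA version from `nvcc -V` can also be written with a regular expression, which avoids the lookup table. This is a sketch; `cuda_tag_from_nvcc` is a hypothetical helper, the sample string is only an example of the usual `release X.Y` line, and whether a wheel exists for the resulting tag is not checked.

```python
import re


def cuda_tag_from_nvcc(nvcc_output):
    """Turn `nvcc -V` output into a wheel tag such as 'cu101'.

    Returns None when no 'release X.Y' string is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


sample = "Cuda compilation tools, release 10.1, V10.1.243"
tag = cuda_tag_from_nvcc(sample)
```

Returning `None` for unrecognized output lets the caller fall back to a CPU-only install instead of failing with a `KeyError`.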

In [16]:
import os
import cv2
import sys
import glob 
import random
import imageio
import pathlib
import collections
from collections import deque
import numpy as np
import argparse
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from gym import spaces
from tqdm import tqdm
from logging import getLogger, StreamHandler, FileHandler, DEBUG, INFO
from typing import Union, Callable, List, Tuple, Iterable, Any, Dict
from dataclasses import dataclass
from IPython.display import Image, display


# PFRL
import pfrl
from pfrl.agents import CategoricalDoubleDQN
from pfrl import experiments
from pfrl import explorers
from pfrl import nn as pnn
from pfrl import utils
from pfrl import replay_buffers
from pfrl.wrappers import atari_wrappers
from pfrl.q_functions import DistributionalDuelingDQN

# PyTorch
import torch
from torch import nn

# PyTorch geometric
from torch_geometric.data import Data
from torch_geometric.nn import RGCNConv

# Env
import gym
import gfootball
import gfootball.env as football_env
from gfootball.env import observation_preprocessing

## Config

In [6]:
# Check we can use GPU
print(torch.cuda.is_available())

# Set the GPU id.
if torch.cuda.is_available(): 
    # NOTE: this is the GPU id (starting from 0), not the number of GPUs.
    gpu = 0
else:
    # -1 means CPU.
    gpu = -1

True


In [7]:
# set logger
def logger_config():
    logger = getLogger(__name__)
    handler = StreamHandler()
    handler.setLevel("DEBUG")
    logger.setLevel("DEBUG")
    logger.addHandler(handler)
    logger.propagate = False

    filepath = './result.log'
    file_handler = FileHandler(filepath)
    logger.addHandler(file_handler)
    return logger

logger = logger_config()

In [8]:
# Fix random seeds.
# NOTE: this alone is NOT enough to make the rewards reproducible; the reason is still unclear.
def seed_everything(seed=1234):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    utils.set_random_seed(seed)  # for PFRL
    
# Set a random seed used in PFRL.
seed = 5046
seed_everything(seed)

# Set different random seeds for train and test envs.
train_seed = seed
test_seed = 2 ** 31 - 1 - seed

## Environment

In [23]:
env = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic',  # easy mode
    stacked=False,
    representation='simple115v2',           # 115-dim float vector (not SMM)
    rewards='scoring,checkpoints',          # comma-separated, no space after the comma
    write_goal_dumps=False,
    write_full_episode_dumps=False,
    render=False,
    write_video=False,
    dump_frequency=1,
    logdir='./',
    extra_players=None,
    number_of_left_players_agent_controls=1,
    number_of_right_players_agent_controls=0,
)

In [10]:
array = env.reset()

## Create graph

The flag to distinguish left players, right players, and the ball.

* left players: 0
* right players: 1
* ball: 2

In [11]:
left_coordinations = np.concatenate([array[:22].reshape(11, 2), np.zeros((11, 1))], axis=-1)
left_directions = np.concatenate([array[22:44].reshape(11, 2), np.zeros((11, 1))], axis=-1)
right_coordinations = np.concatenate([array[44:66].reshape(11, 2), np.zeros((11, 1))], axis=-1)
right_directions = np.concatenate([array[66:88].reshape(11, 2), np.zeros((11, 1))], axis=-1)
ball_coordination = array[88:91].reshape([1, 3])
ball_direction = array[91:94].reshape([1, 3])
ball_ownership = array[94:97] # none, left, right
active_player = array[97:108].reshape([11, 1])
game_mode = array[108:]

In [18]:
# Node features
left_features = np.concatenate([
    0*np.ones((11, 1)),
    left_coordinations,
    left_directions,
    ball_ownership[1]*np.ones((11, 1)),
    active_player,
], axis=-1)
right_features = np.concatenate([
    1*np.ones((11, 1)),
    right_coordinations,
    right_directions,
    ball_ownership[2]*np.ones((11, 1)),
    np.zeros((11, 1)),
], axis=-1)
ball_features = np.concatenate([
    2*np.ones((1, 1)),
    ball_coordination,
    ball_direction,
    np.zeros((1, 1)),
    np.zeros((1, 1)),
], axis=-1)

# Stack all nodes (11 left, 11 right, 1 ball) and drop the leading type-flag column.
features = np.concatenate([left_features, right_features, ball_features], axis=0)[:,1:]

In [13]:
# Edges and relations
# Node type per row of `features`: 11 left players (0), 11 right players (1), 1 ball (2).
# Types are derived from the node layout rather than read from `features`,
# whose type-flag column has been dropped.
node_types = np.array([0] * 11 + [1] * 11 + [2])

# Complete directed graph without self-loops.
X, Y = np.meshgrid(np.arange(len(features)), np.arange(len(features)))
all_combinations = np.vstack([X.flatten(), Y.flatten()]).T
edge_index = np.array(
    [combination for combination in all_combinations if combination[0] != combination[1]]
).T
types_for_edge_index = node_types[edge_index]  # shape [2, num_edges]
relations_dict = {
    (0, 0): 0, # left player -> left player
    (0, 1): 1, # left player -> right player
    (0, 2): 2, # left player -> ball
    (1, 0): 3, # right player -> left player
    (1, 1): 4, # right player -> right player
    (1, 2): 5, # right player -> ball
    (2, 0): 6, # ball -> left player
    (2, 1): 7, # ball -> right player
}
relations = [relations_dict[tuple(types)] for types in types_for_edge_index.T.tolist()]
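
The same edges and relation ids can be built without Python loops. The sketch below assumes the node order used in this notebook (11 left players, 11 right players, then the ball) and encodes each relation as `3 * source_type + destination_type`, which reproduces the ids in `relations_dict`. The edge order may differ from the cell above, but the set of edges is the same.

```python
import numpy as np

# Node types in node order: left players (0), right players (1), ball (2).
node_types = np.array([0] * 11 + [1] * 11 + [2])
num_nodes = len(node_types)

# All ordered pairs (src, dst) with src != dst: a complete directed graph.
src, dst = np.nonzero(~np.eye(num_nodes, dtype=bool))
edge_index = np.stack([src, dst])  # shape [2, num_edges]

# Relation id = 3 * source type + destination type.
# Id 8 (ball -> ball) never occurs because there is only one ball node.
relations = 3 * node_types[src] + node_types[dst]
```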

In [14]:
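# numpy array to torch tensor
# Node features are cast to float32 to match the layer weights; indices stay int64.
features = torch.tensor(features, dtype=torch.float).contiguous()
edge_index = torch.tensor(edge_index, dtype=torch.long).contiguous()
relations = torch.tensor(relations, dtype=torch.long).contiguous()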

In [15]:
graph = Data(x=features, edge_index=edge_index, relations=relations)

In [None]:
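import torch.nn.functional as F  # F.relu / F.dropout / F.log_softmax are used below

class Policy(torch.nn.Module):
    def __init__(self, num_relations=8):
        super().__init__()
        # RGCNConv needs the number of relation types (8 entries in relations_dict).
        self.conv1 = RGCNConv(graph.num_node_features, 32, num_relations)
        self.conv2 = RGCNConv(32, 64, num_relations)
        self.conv3 = RGCNConv(64, 128, num_relations)
        self.linear = nn.Linear(128, 19)  # 19 actions in the default action set

    def forward(self, data):
        # `relations` holds the relation id of every edge (the RGCN edge type).
        x, edge_index, edge_type = data.x, data.edge_index, data.relations

        x = self.conv1(x, edge_index, edge_type)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)

        x = self.conv2(x, edge_index, edge_type)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)

        x = self.conv3(x, edge_index, edge_type)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)

        x = self.linear(x)

        # Per-node log-probabilities over actions, shape [num_nodes, 19].
        return F.log_softmax(x, dim=1)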

In [None]:
class DynamicsModel:
    def __init__(self):
        pass