# SEED RL agent to play GFootball
In this notebook we present a way to train the V-trace off-policy actor-critic agent introduced in [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf) using the [SEED RL](https://github.com/google-research/seed_rl/) framework.
The agent trained in this notebook can serve as a starting point, but since the main goal is to provide an interactive introduction to applying deep RL to GFootball, the resulting agent is far from a complete solution. See the [Google Research Football: A Novel Reinforcement Learning Environment](https://arxiv.org/abs/1907.11180) paper for details on applying V-trace to GFootball.
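
As a rough illustration of what V-trace computes, the sketch below follows the target definition from the IMPALA paper using plain NumPy. It is not SEED RL's implementation: the function name `vtrace_targets` and its parameters are made up for this example, and it handles a single unbatched trajectory with no episode boundaries.

```python
import numpy as np


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, discount=0.99, clip_rho=1.0, clip_c=1.0):
    """Illustrative V-trace value targets for one trajectory of length T.

    behaviour_logp / target_logp: log-probabilities of the taken actions
    under the actor's (behaviour) policy and the learner's (target) policy.
    values: V(x_t) for t = 0..T-1; bootstrap_value: V(x_T).
    """
    behaviour_logp = np.asarray(behaviour_logp, dtype=float)
    target_logp = np.asarray(target_logp, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    # Truncated importance weights.
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(clip_rho, ratios)
    cs = np.minimum(clip_c, ratios)

    # Importance-weighted temporal differences.
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + discount * next_values - values)

    # Backward recursion: (v_t - V_t) = delta_t + discount * c_t * (v_{t+1} - V_{t+1}).
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discount * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

When the behaviour and target policies agree, all importance weights equal 1 and the targets reduce to ordinary bootstrapped discounted returns, which is a convenient sanity check.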

The first step is to install required tools:

In [None]:
# Install:
# GFootball environment (https://github.com/google-research/football/),
# SEED RL for training an agent (https://github.com/google-research/seed_rl/),
# TensorFlow 2.2, which is needed by SEED RL.

!apt-get update
!apt-get install -y libsdl2-gfx-dev libsdl2-ttf-dev
!pip3 install tensorflow==2.2
!pip3 install tensorflow_probability==0.9.0

# Update kaggle-environments to the newest version.
# !pip3 install kaggle-environments -U

# # Make sure the branch in the git clone call matches the version in the wget call.
# !git clone -b v2.8 https://github.com/google-research/football.git
# !mkdir -p football/third_party/gfootball_engine/lib

# !wget https://storage.googleapis.com/gfootball/prebuilt_gameplayfootball_v2.8.so -O football/third_party/gfootball_engine/lib/prebuilt_gameplayfootball.so
# !cd football && GFOOTBALL_USE_PREBUILT_SO=1 pip3 install .

# !git clone https://github.com/google-research/seed_rl.git
# !cd seed_rl && git checkout 34fb2874d41241eb4d5a03344619fb4e34dd9be6
# !mkdir /kaggle_simulations/agent

In [None]:
# -p creates parent directories and does not fail if the cell is re-run.
!mkdir -p /kaggle/working/kaggle_simulations/agent

In [None]:
!ls -r /kaggle/working/seed_rl/*.py

SEED RL provides scripts for running training on a [local machine inside Docker](https://github.com/google-research/seed_rl/#local-machine-training-on-a-single-level) and for [distributed training](https://github.com/google-research/seed_rl/#distributed-training-using-ai-platform) at scale using [AI Platform](https://cloud.google.com/ai-platform). To run it inside a notebook, we need to create a launcher script (based on [SEED's docker launcher script](https://github.com/google-research/seed_rl/blob/master/docker/run.sh)):

In [None]:
%%writefile train.sh
# Training launcher script.

# Make SEED RL visible to Python.
export PYTHONPATH=$PYTHONPATH:$(pwd)
ENVIRONMENT=$1
AGENT=$2
NUM_ACTORS=$3
shift 3
echo ${ENVIRONMENT}
# Start actor tasks which run environment loop.
actor=0
while [ "$actor" -lt ${NUM_ACTORS} ]; do
  python3 seed_rl/${ENVIRONMENT}/${AGENT}_main.py --run_mode=actor --logtostderr $@ --num_actors=${NUM_ACTORS} --task=${actor} 2>/dev/null >/dev/null &
  actor=$(( actor + 1 ))
done
# Start learner task which performs training of the agent.
python3 seed_rl/${ENVIRONMENT}/${AGENT}_main.py --run_mode=learner --logtostderr $@ --num_actors="${NUM_ACTORS}"
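
To make the launcher's logic easier to follow, here is a sketch of the same command construction in Python. It only builds the argument lists and does not start any process. The function name `build_commands` is hypothetical, and `shlex.split` stands in for the shell's word splitting of the unquoted `$@`, which is what lets the script accept all extra flags as one quoted string.

```python
import shlex


def build_commands(environment, agent, num_actors, extra_flags):
    """Return (actor_commands, learner_command) mirroring train.sh."""
    script = f"seed_rl/{environment}/{agent}_main.py"
    # Split the single flag string into separate arguments.
    flags = shlex.split(extra_flags)
    common = ["--logtostderr", *flags, f"--num_actors={num_actors}"]

    # One actor process per task index; each runs the environment loop.
    actors = [
        ["python3", script, "--run_mode=actor", *common, f"--task={task}"]
        for task in range(num_actors)
    ]
    # A single learner process performs the training.
    learner = ["python3", script, "--run_mode=learner", *common]
    return actors, learner


actors, learner = build_commands(
    "football", "vtrace", 4,
    "--total_environment_frames=10000 --game=11_vs_11_kaggle")
print(len(actors), "actor commands")
print(" ".join(learner))
```

In `train.sh` the actor commands are started in the background with their output discarded, while the learner runs in the foreground, so the cell finishes when the learner does.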


Now we can run the training for the Kaggle competition scenario (11_vs_11_kaggle). As this example is meant to be interactive, we train for only 10000 environment frames, which is not enough to produce a good agent, but training should take only a few minutes.

In [None]:
!bash train.sh football vtrace 4 '--total_environment_frames=10000 --game=11_vs_11_kaggle --reward_experiment=scoring,checkpoints --logdir=/kaggle_simulations/agent/'

At the end of training, the TensorFlow model is saved for later use.

In [None]:
!ls -la /kaggle_simulations/agent/saved_model

Let's first visualize a game played by our trained agent. For that we need to implement a wrapper that loads the TensorFlow model and converts the observations provided by the Kaggle environment into observations accepted by the SEED agent:

In [None]:
%%writefile /kaggle_simulations/agent/main.py

import collections
import gym
import numpy as np
import os
import sys
import tensorflow as tf

from gfootball.env import observation_preprocessing
from gfootball.env import wrappers

EnvOutput = collections.namedtuple(
    'EnvOutput', 'reward done observation abandoned episode_step')

def prepare_agent_input(observation, prev_action, state):
    # The SEED RL agent accepts input in the form of an EnvOutput. When not
    # training, only the observation is used to generate an action, so we use
    # dummy values for the rest.
    env_output = EnvOutput(reward=tf.zeros(shape=[], dtype=tf.float32),
        done=tf.zeros(shape=[], dtype=tf.bool),
        observation=observation, abandoned=False,
        episode_step=tf.zeros(shape=[], dtype=tf.int32))
    # add batch dimension
    prev_action, env_output = tf.nest.map_structure(
        lambda t: tf.expand_dims(t, 0), (prev_action, env_output))

    return (prev_action, env_output, state)

# Previously executed action
previous_action = tf.constant(0, dtype=tf.int64)
# Queue of recent observations (SEED agent we trained uses frame stacking).
observations = collections.deque([], maxlen=4)
# Current state of the agent (used by recurrent agents).
state = ()

# Load previously trained Tensorflow model.
policy = tf.compat.v2.saved_model.load('/kaggle_simulations/agent/saved_model')
step_nr = 0

def agent(obs):
    global step_nr
    global previous_action
    global observations
    global state
    global policy
    # Get observations for the first (and only) player we control.
    obs = obs['players_raw'][0]
    # Agent we trained uses Super Mini Map (SMM) representation.
    # See https://github.com/google-research/seed_rl/blob/master/football/env.py for details.
    obs = observation_preprocessing.generate_smm([obs])[0]
    print(obs.shape)
    if not observations:
        observations.extend([obs] * 4)
    else:
        observations.append(obs)
    
    # SEED packs observations to reduce transfer times.
    # See PackedBitsObservation in https://github.com/google-research/seed_rl/blob/master/football/observation.py
    obs = np.concatenate(list(observations), axis=-1)
    obs = np.packbits(obs, axis=-1)
    if obs.shape[-1] % 2 == 1:
        obs = np.pad(obs, [(0, 0)] * (obs.ndim - 1) + [(0, 1)], 'constant')
    obs = obs.view(np.uint16)

    # Execute our agent to obtain action to take.
    enc = lambda x: x
    dec = lambda x, s=None: x if s is None else tf.nest.pack_sequence_as(s, x)
    agent_output, state = policy.get_action(*dec(enc(prepare_agent_input(obs, previous_action, state))))
    previous_action = agent_output.action[0]
    return [int(previous_action)]
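
The packing step inside `agent` can be checked in isolation with NumPy alone. The sketch below assumes each SMM frame has shape (72, 96, 4) with values 0 or 255, which we believe is GFootball's default. Random data stands in for real frames, and the helper names `pack_frames` and `unpack_frames` are hypothetical.

```python
import numpy as np


def pack_frames(frames):
    """Stack frames along the channel axis and pack 16 channels per uint16."""
    stacked = np.concatenate(list(frames), axis=-1)
    # Every non-zero entry becomes one bit; 8 channels fit into one byte.
    packed = np.packbits(stacked, axis=-1)
    # Pad to an even number of bytes so pairs of bytes can be viewed as uint16.
    if packed.shape[-1] % 2 == 1:
        packed = np.pad(packed, [(0, 0)] * (packed.ndim - 1) + [(0, 1)], 'constant')
    return stacked, packed.view(np.uint16)


def unpack_frames(packed, num_channels):
    """Invert pack_frames, restoring the 0/255 channel values."""
    bits = np.unpackbits(packed.view(np.uint8), axis=-1)[..., :num_channels]
    return bits * np.uint8(255)


rng = np.random.default_rng(0)
# Four synthetic frames standing in for consecutive SMM observations.
frames = [rng.integers(0, 2, size=(72, 96, 4), dtype=np.uint8) * np.uint8(255)
          for _ in range(4)]
stacked, packed = pack_frames(frames)
restored = unpack_frames(packed, stacked.shape[-1])
print(stacked.shape, packed.shape, packed.dtype)
print(np.array_equal(restored, stacked))
```

Under these assumptions, four stacked frames give 16 binary channels per pixel, which fit exactly into one `uint16` value, so no padding is needed and the packing is lossless.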

Now we can easily visualize the behavior of our agent in action:

In [None]:
from kaggle_environments import make
env = make("football", configuration={"save_video": True, "scenario_name": "11_vs_11_kaggle", "running_in_notebook": True})
env.run(["/kaggle_simulations/agent/main.py", "run_right"])
env.render(mode="human", width=800, height=600)

In [None]:
# Prepare a submission package containing the trained model and the main execution logic.
!cd /kaggle_simulations/agent && tar -czvf /kaggle/working/submit.tar.gz main.py saved_model
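
Before submitting, it can be worth confirming that the archive has the expected layout. The helper below is a sketch, not part of any Kaggle tooling: the name `missing_top_level_entries` is hypothetical, and it assumes that `main.py` and `saved_model` should sit at the top level of the archive, as the `tar` command above produces.

```python
import tarfile


def missing_top_level_entries(archive_path, required=("main.py", "saved_model")):
    """Return the required top-level names that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    top_level = set()
    for name in names:
        # Normalize a leading "./" that some tar invocations add.
        if name.startswith("./"):
            name = name[2:]
        if name:
            top_level.add(name.split("/")[0])
    return sorted(set(required) - top_level)
```

Calling `missing_top_level_entries("/kaggle/working/submit.tar.gz")` should return an empty list when both entries are present.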

# Submit to Competition
1. "Save & Run All" (commit) this Notebook
1. Go to the notebook viewer
1. Go to the "Data" section and find the submit.tar.gz file.
1. Click "Submit to Competition"
1. Go to [My Submissions](https://www.kaggle.com/c/football/submissions) to view your score and episodes being played.