After reviewing the baseline code, I considered what to fix and how to extend it. This notebook points out which files to focus on and shows how to add modified files to a dataset instead of simply running `git clone`. I also suppressed the unnecessarily long cell output for readability.

reference: https://www.kaggle.com/piotrstanczyk/gfootball-train-seed-rl-agent

![](https://github.com/seriousran/img_link/blob/master/kg/img1.JPG?raw=true)

## What you will want to modify

1. networks for Reinforcement Learning
    - `gfootball-baseline/seed_rl/football/networks.py` & `gfootball-baseline/seed_rl/agents/vtrace/networks.py`
1. v-tracer
    - `gfootball-baseline/seed_rl/football/vtrace_main.py` & `gfootball-baseline/seed_rl/agents/vtrace/learner.py`
1. Training arguments
    - `!bash train.sh football vtrace 4 '--total_environment_frames=10000 --game=11_vs_11_kaggle --reward_experiment=scoring,checkpoints --logdir=/kaggle_simulations/agent/'`
        - `AGENT = vtrace`: V-trace is an off-policy actor-critic reinforcement learning algorithm.
        - `NUM_ACTORS = 4`: the number of actor processes (the Train cell in section 3 uses 8).
        - `total_environment_frames = 10000`: the original author notes that this is very small (the Train cell in section 3 uses 1000000).
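
To make the `vtrace` entry above more concrete, here is a minimal NumPy sketch of how V-trace value targets can be computed for a single trajectory, using the backward recursion described in the IMPALA paper. It is not the code in `seed_rl/agents/vtrace/learner.py`; the function name `vtrace_targets`, its argument names, and the default clipping thresholds are assumptions chosen for illustration.

```python
import numpy as np


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, clip_rho=1.0, clip_c=1.0):
    """Return V-trace targets v_t for one trajectory of length T.

    log_rhos: log(pi(a_t|x_t) / mu(a_t|x_t)), target vs. behaviour policy.
    values: V(x_t) for t = 0..T-1; bootstrap_value: V(x_T).
    (Hypothetical helper, not part of SEED RL.)
    """
    log_rhos = np.asarray(log_rhos, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)

    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho, rhos)  # weights each TD error
    cs = np.minimum(clip_c, rhos)              # trace-cutting coefficients

    next_values = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discount * next_values - values)

    # Backward recursion:
    # (v_t - V_t) = delta_t + discount * c_t * (v_{t+1} - V_{t+1}).
    correction = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discount * cs[t] * acc
        correction[t] = acc
    return values + correction
```

When the actors and the learner share the same policy (`log_rhos` all zero), these targets reduce to ordinary n-step returns bootstrapped from `V(x_T)`, which is a handy sanity check when modifying the learner.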
        
## Contents
1. Install
2. Write train bash file
3. Train
4. Simulate
5. Submission

# 1. Install

In [None]:
!apt-get update
!apt-get install -y libsdl2-gfx-dev libsdl2-ttf-dev
!pip3 install tensorflow==2.2
!pip3 install tensorflow_probability==0.9.0
!pip3 install kaggle-environments -U

from IPython.display import Image, clear_output

clear_output()

In [None]:
!cp -r /kaggle/input/gfootball-baseline/* .

!mkdir -p football/third_party/gfootball_engine/lib
!wget https://storage.googleapis.com/gfootball/prebuilt_gameplayfootball_v2.3.so -O football/third_party/gfootball_engine/lib/prebuilt_gameplayfootball.so
!cd football && GFOOTBALL_USE_PREBUILT_SO=1 pip3 install .

!mkdir -p football/third_party/gfootball_engine
!cd seed_rl && git checkout 34fb2874d41241eb4d5a03344619fb4e34dd9be6

!mkdir -p /kaggle_simulations/agent

clear_output()
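
Since the code now comes from a dataset rather than a fresh `git clone`, it can be worth checking that the files you intend to modify actually landed in the working directory. The sketch below does that with `pathlib`. The helper name `missing_files` is hypothetical, and the relative paths assume the dataset copy places `seed_rl/` directly in the current directory.

```python
from pathlib import Path

# Files this notebook suggests modifying, relative to the working directory.
# The layout after the dataset copy is an assumption.
FILES_TO_MODIFY = [
    "seed_rl/football/networks.py",
    "seed_rl/agents/vtrace/networks.py",
    "seed_rl/football/vtrace_main.py",
    "seed_rl/agents/vtrace/learner.py",
]


def missing_files(root, relative_paths=FILES_TO_MODIFY):
    """Return the relative paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# An empty list means every expected file was found.
print("Missing:", missing_files("."))
```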

# 2. Write train bash file

SEED RL provides scripts for running training on a local machine inside Docker and for distributed training at scale using AI Platform. To run it inside a notebook, we need to create a launcher script (based on SEED's Docker launcher script):

In [None]:
%%writefile train.sh
# Training launcher script.

# Make SEED RL visible to Python.
export PYTHONPATH=$PYTHONPATH:$(pwd)
#export PYTHONPATH=$PYTHONPATH:

ENVIRONMENT=$1
AGENT=$2
NUM_ACTORS=$3
shift 3

# Start actor tasks which run environment loop.
actor=0
while [ "$actor" -lt ${NUM_ACTORS} ]; do
  python3 seed_rl/${ENVIRONMENT}/${AGENT}_main.py --run_mode=actor --logtostderr $@ --num_actors=${NUM_ACTORS} --task=${actor} 2>/dev/null >/dev/null &
  actor=$(( actor + 1 ))
done
# Start learner task which performs training of the agent.
python3 seed_rl/${ENVIRONMENT}/${AGENT}_main.py --run_mode=learner --logtostderr $@ --num_actors="${NUM_ACTORS}"
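
To see what the launcher actually starts, the following sketch rebuilds the same command lines in Python without running them. The function `build_commands` is a hypothetical helper, not part of SEED RL. It uses `shlex.split` to approximate the way the script's unquoted `$@` splits a single quoted flag string into separate arguments.

```python
import shlex


def build_commands(environment, agent, num_actors, extra_flags=""):
    """Return (actor_cmds, learner_cmd) as argv lists, mirroring train.sh."""
    script = f"seed_rl/{environment}/{agent}_main.py"
    # train.sh expands $@ unquoted, so one quoted flag string is split on
    # whitespace; shlex.split approximates that behaviour here.
    flags = shlex.split(extra_flags)
    tail = ["--logtostderr", *flags, f"--num_actors={num_actors}"]
    actor_cmds = [
        ["python3", script, "--run_mode=actor", *tail, f"--task={i}"]
        for i in range(num_actors)
    ]
    learner_cmd = ["python3", script, "--run_mode=learner", *tail]
    return actor_cmds, learner_cmd


actors, learner = build_commands(
    "football", "vtrace", 4,
    "--total_environment_frames=10000 --game=11_vs_11_kaggle")
print(len(actors), "actor commands")
print(" ".join(learner))
```

Each actor gets its own `--task` index and runs in the background, while the single learner runs in the foreground, so the cell finishes when the learner does.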

Now we can run training for the Kaggle competition scenario (`11_vs_11_kaggle`). The reference notebook, which is meant to be interactive, trains for only 10000 frames; that takes a few minutes but does not produce a good agent. The cell below instead uses 8 actors and 1000000 frames, so expect it to run considerably longer.



# 3. Train

In [None]:
!bash train.sh football vtrace 8 '--total_environment_frames=1000000 --game=11_vs_11_kaggle --reward_experiment=scoring,checkpoints --logdir=/kaggle_simulations/agent/'

#clear_output()

At the end of training, the TensorFlow model is saved for later use.

In [None]:
!ls -la /kaggle_simulations/agent/saved_model
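
As an alternative to reading the `ls` output by eye, a small check like the one below can confirm that the export looks complete before moving on. It is only a heuristic sketch: the helper name `looks_like_saved_model` is hypothetical, and it relies on the usual TensorFlow SavedModel layout of a `saved_model.pb` file next to a `variables/` directory.

```python
from pathlib import Path


def looks_like_saved_model(path):
    """Heuristic check for the usual TensorFlow SavedModel directory layout."""
    path = Path(path)
    return (path / "saved_model.pb").is_file() and (path / "variables").is_dir()


# Prints False if training has not produced a model yet.
print(looks_like_saved_model("/kaggle_simulations/agent/saved_model"))
```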

# 4. Simulate

Let's first try to visualize a game played by our trained agent. For that we need to implement a wrapper that loads the TensorFlow model and converts the observations provided by the Kaggle environment into observations accepted by the SEED agent:

In [None]:
%%writefile /kaggle_simulations/agent/main.py

import collections
import gym
import numpy as np
import os
import sys
import tensorflow as tf

from gfootball.env import observation_preprocessing
from gfootball.env import wrappers

EnvOutput = collections.namedtuple(
    'EnvOutput', 'reward done observation abandoned episode_step')

def prepare_agent_input(observation, prev_action, state):
    # The SEED RL agent accepts input in the form of an EnvOutput. When not
    # training, only the observation is used to generate an action, so we use
    # dummy values for the rest.
    env_output = EnvOutput(reward=tf.zeros(shape=[], dtype=tf.float32),
        done=tf.zeros(shape=[], dtype=tf.bool),
        observation=observation, abandoned=False,
        episode_step=tf.zeros(shape=[], dtype=tf.int32))
    # add batch dimension
    prev_action, env_output = tf.nest.map_structure(
        lambda t: tf.expand_dims(t, 0), (prev_action, env_output))

    return (prev_action, env_output, state)

# Previously executed action
previous_action = tf.constant(0, dtype=tf.int64)
# Queue of recent observations (SEED agent we trained uses frame stacking).
observations = collections.deque([], maxlen=4)
# Current state of the agent (used by recurrent agents).
state = ()

# Load previously trained Tensorflow model.
policy = tf.compat.v2.saved_model.load('/kaggle_simulations/agent/saved_model')

def agent(obs):
    global previous_action
    global observations
    global state
    global policy
    # Get observations for the first (and only one) player we control.
    obs = obs['players_raw'][0]
    # Agent we trained uses Super Mini Map (SMM) representation.
    # See https://github.com/google-research/seed_rl/blob/master/football/env.py for details.
    obs = observation_preprocessing.generate_smm([obs])[0]
    if not observations:
        observations.extend([obs] * 4)
    else:
        observations.append(obs)
    
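    # SEED packs observations to reduce transfer times.
    # See PackedBitsObservation in https://github.com/google-research/seed_rl/blob/master/football/observation.py
    # Stack the queued frames along the channel axis first; otherwise the
    # frame-stacking deque above is filled but never used.
    obs = np.concatenate(list(observations), axis=-1)
    obs = np.packbits(obs, axis=-1)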
    if obs.shape[-1] % 2 == 1:
        obs = np.pad(obs, [(0, 0)] * (obs.ndim - 1) + [(0, 1)], 'constant')
    obs = obs.view(np.uint16)

    # Execute our agent to obtain action to take.
    agent_output, state = policy.get_action(*prepare_agent_input(obs, previous_action, state))
    previous_action = agent_output.action[0]
    return [int(previous_action)]
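
The frame stacking and bit packing mentioned in the comments of `main.py` are easier to follow with concrete shapes. The sketch below runs the same NumPy steps on random stand-in frames instead of real SMM observations. The helper name `pack_stacked_frames` and the random data are assumptions for illustration; the `(72, 96, 4)` shape is that of a single SMM frame.

```python
import collections

import numpy as np

SMM_SHAPE = (72, 96, 4)  # one SMM frame: height, width, channels


def pack_stacked_frames(frames):
    """Concatenate frames on the channel axis and pack the bits into uint16."""
    stacked = np.concatenate(list(frames), axis=-1)  # 4 frames -> (72, 96, 16)
    packed = np.packbits(stacked, axis=-1)           # 8 channels per byte
    if packed.shape[-1] % 2 == 1:
        # Pad to an even number of bytes so pairs can be viewed as uint16.
        packed = np.pad(packed, [(0, 0)] * (packed.ndim - 1) + [(0, 1)],
                        "constant")
    return packed.view(np.uint16)


rng = np.random.default_rng(0)
frames = collections.deque(maxlen=4)
for _ in range(6):  # once the deque is full, the oldest frame is dropped
    frames.append((rng.random(SMM_SHAPE) > 0.5).astype(np.uint8) * 255)

packed = pack_stacked_frames(frames)
print(packed.shape, packed.dtype)
```

Because `np.packbits` treats every non-zero value as a set bit, the 0/255 SMM planes survive packing as 0/1 bits, and `np.unpackbits(packed.view(np.uint8), axis=-1)` recovers the 16 stacked channels.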

Now we can easily visualize the behavior of our agent in action:

In [None]:
from kaggle_environments import make
env = make("football", configuration={"save_video": True, "scenario_name": "11_vs_11_kaggle", "running_in_notebook": True})
env.run(["/kaggle_simulations/agent/main.py", "run_right"])
env.render(mode="human", width=800, height=600)

# 5. Submission

Prepare a submission package containing the trained model and the main execution logic.

In [None]:
!cd /kaggle_simulations/agent && tar -czvf /kaggle/working/submit.tar.gz main.py saved_model
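
Before submitting, it may help to confirm that the archive has the layout the `tar` command above is meant to produce, with `main.py` and `saved_model` at the top level. The sketch below inspects the archive with the standard `tarfile` module. The helper name `submission_problems` is hypothetical, and the top-level requirement is inferred from that command rather than from any official specification.

```python
import tarfile


def submission_problems(archive_path):
    """Return a list of human-readable problems found in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    problems = []
    if "main.py" not in names:
        problems.append("main.py is not at the top level of the archive")
    if not any(n == "saved_model" or n.startswith("saved_model/")
               for n in names):
        problems.append("saved_model directory is missing")
    return problems
```

Calling `submission_problems("/kaggle/working/submit.tar.gz")` after the cell above should return an empty list if both entries were packed as intended.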

### Submit to Competition
1. "Save & Run All" (commit) this Notebook
1. Go to the notebook viewer
1. Go to the "Data" section and find the `submit.tar.gz` file.
1. Click "Submit to Competition"
1. Go to "My Submissions" to view your score and the episodes being played.