# Exploring GFootball environment

Update 13/10/2020
I'm adding a section on running the environment using the Kaggle API. This is important for evaluation, and for validating the agent locally (with debugging).



I'm new to the GFootball environment and these are my notes on its usage and behaviour. I plan to try it out (and train) using the Gym API first as I have some familiarity with it, then move on to understanding how the Kaggle environment wrappers work. Hopefully this will be useful for anyone not familiar with the OpenAI Gym API.

I'm also working on another couple of notebooks concurrently with this one:  
 - [Deep Q-learner start code](https://www.kaggle.com/garethjns/deep-q-learner-starter-code) - Building a deep Q-learner from scratch for the Simple115 version of this environment  
 - [Convolutional deep Q-learner](https://www.kaggle.com/garethjns/convolutional-deep-q-learner) - Upgrading the deep Q-learner to use a convolutional model with the SMM version of this environment  

Contents:
  1. Setup  
  2. Alternative setup  
  3. Gym API  
    1. GFootball observation space and wrappers  
    2. GFootball action space and wrappers  
  4. Kaggle API  
    1. Debugging agents  
  5. Example with gym compatible q-learner  
  6. Convert example agent to Kaggle-compatible submission


In [None]:
import matplotlib.pyplot as plt
import pprint
import glob 
import imageio
import pathlib
import numpy as np
from typing import Tuple
from tqdm import tqdm
import os
import sys
from IPython.display import Image

# Setup
(as per https://www.kaggle.com/piotrstanczyk/gfootball-template-bot)

This downloads a pre-compiled game engine for the GFootball package, which we'll use here.

In [None]:
# GFootball environment.
!pip install kaggle_environments
!apt-get update -y
!apt-get install -y libsdl2-gfx-dev libsdl2-ttf-dev
!git clone -b v2.3 https://github.com/google-research/football.git
!mkdir -p football/third_party/gfootball_engine/lib
!wget https://storage.googleapis.com/gfootball/prebuilt_gameplayfootball_v2.3.so -O football/third_party/gfootball_engine/lib/prebuilt_gameplayfootball.so
!cd football && GFOOTBALL_USE_PREBUILT_SO=1 pip3 install .

# Some helper code
!git clone https://github.com/garethjns/kaggle-football.git
sys.path.append("/kaggle/working/kaggle-football/")

# Alternative setup 

This doesn't work in the Kaggle Kernel, but gfootball can be compiled and pip installed in Ubuntu 20 (tested in Python 3.8) with the following additional dependencies. This might be useful if anyone has trouble running it locally.

```Bash
!apt-get update -y
!apt-get install -y libsdl2-gfx-dev libsdl2-ttf-dev libsdl2-image-dev

# Dependencies for PyGame
!apt-get install -y \
  python-dev \
  python-numpy \
  subversion \
  ffmpeg \
  libsdl1.2-dev \
  libsdl-image1.2-dev \
  libsdl-mixer1.2-dev \
  libsdl-ttf2.0-dev \
  libavcodec-dev \
  libavformat-dev \
  libportmidi-dev \
  libsmpeg-dev \
  libswscale-dev

!pip install gfootball
```

# Gym interface

The GFootball package defines an API following the OpenAI Gym spec: https://gym.openai.com/docs/. This allows for standardised interaction between agent and environments, for example (with a random "agent"):

```python
import gym

env = gym.make('CartPole-v0')
obs = env.reset()
for _ in range(1000):
    env.render()
    new_obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        # The episode has ended; reset before stepping again
        obs = env.reset()
env.close()
```

Note that .reset() must be called before using any Gym environment. It returns the initial observation.

env.step() takes an action to apply to the environment; typically this comes from an agent. In the above example it is just randomly sampled from the environment's action space.

It then returns a few things:
 - new observations (which are typically passed to the agent on the next loop iteration)
 - any reward from the environment (can be negative)
 - a done flag indicating if the environment has reached a terminal state
 - "info" - a dict containing anything else, this isn't supposed to be used by the agent
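
To make the reset/step contract concrete without installing anything, here is a minimal sketch of a toy environment that follows the same interface. `CoinFlipEnv` is a made-up example for illustration only; it is not part of Gym or GFootball, and it skips things a real Gym env has (action/observation spaces, rendering).

```python
import random
from typing import Dict, Tuple


class CoinFlipEnv:
    """Toy environment (hypothetical) exposing a Gym-style reset()/step() API."""

    def __init__(self, episode_length: int = 10, seed: int = 0) -> None:
        self.episode_length = episode_length
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self) -> int:
        """Start a new episode and return the first observation (the coin face)."""
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin

    def step(self, action: int) -> Tuple[int, float, bool, Dict]:
        """Apply an action and return (new observation, reward, done, info)."""
        # Reward +1 for matching the coin that was observed, -1 otherwise
        reward = 1.0 if action == self._coin else -1.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        done = self._steps >= self.episode_length
        return self._coin, reward, done, {"steps": self._steps}


env = CoinFlipEnv(episode_length=5)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = obs  # An "agent" that simply copies the observed coin face
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(total_reward, info)
```

The loop has the same shape as the CartPole example above: observe, act, receive the four return values, and stop (or reset) when `done` is set.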


## Registering and creating Gym environments

Gym environments need to be registered in order to be created with gym.make(). 4 versions of multiple environments are automatically registered on import of the GFootball package (see https://github.com/google-research/football/blob/master/gfootball/__init__.py).

The scenario relevant for this competition is (presumably) [11_vs_11_kaggle.py](https://github.com/google-research/football/blob/master/gfootball/scenarios/11_vs_11_kaggle.py). This defines the number of players, difficulty, etc.

The 4 versions return different observations (see https://github.com/google-research/football/blob/master/gfootball/doc/observation.md for details of each).
 - simple115 (bugged, don't use)
 - simple115_v2
 - extracted
 - pixels/pixels_gray 
 
 These can be created with the following commands (note names are case sensitive):


In [None]:
import gym
import gfootball  # Required as envs registered on import

simple_env = gym.make("GFootball-11_vs_11_kaggle-simple115v2-v0")
pixels_env = gym.make("GFootball-11_vs_11_kaggle-Pixels-v0")
smm_env = gym.make("GFootball-11_vs_11_kaggle-SMM-v0")

print(f"simple115v2:\n {simple_env}\n")
print(f"Pixels:\n {pixels_env}\n")
print(f"SMM:\n {smm_env}\n")

Each of these environments includes various wrappers that modify the interaction of the agent with the base environment by intercepting actions and observations. For example, the wrapper Simple115StateWrapper modifies the "raw" output of the base environment to match the spec described in https://github.com/google-research/football/blob/master/gfootball/doc/observation.md
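
As a rough illustration of what "intercepting observations" means, the sketch below wraps a toy environment and flattens its dict observation into a single list. Both `DictObsEnv` and `FlattenObsWrapper` are hypothetical names invented for this example; the real GFootball wrappers subclass Gym's wrapper classes and do considerably more.

```python
from typing import Any, Dict, List, Sequence, Tuple


class DictObsEnv:
    """Hypothetical base env that returns a "raw" dict observation."""

    def reset(self) -> Dict[str, List[float]]:
        return {"ball": [0.0, 0.0, 0.1], "score": [0, 0]}

    def step(self, action: int) -> Tuple[Dict[str, List[float]], float, bool, Dict]:
        # Toy dynamics: the action just sets the ball's x position
        obs = {"ball": [action / 10, 0.0, 0.1], "score": [0, 0]}
        return obs, 0.0, False, {}


class FlattenObsWrapper:
    """Hypothetical wrapper: passes actions through, rewrites observations."""

    def __init__(self, env: Any, keys: Sequence[str]) -> None:
        self.env = env
        self.keys = list(keys)

    def _flatten(self, obs: Dict[str, List[float]]) -> List[float]:
        # Concatenate the selected fields in a fixed order
        return [float(v) for key in self.keys for v in obs[key]]

    def reset(self) -> List[float]:
        return self._flatten(self.env.reset())

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        obs, reward, done, info = self.env.step(action)
        return self._flatten(obs), reward, done, info


wrapped = FlattenObsWrapper(DictObsEnv(), keys=["ball", "score"])
first_obs = wrapped.reset()
next_obs, _, _, _ = wrapped.step(5)
print(first_obs)
print(next_obs)
```

The agent only ever sees the flattened list, which is the same idea as the simple115 wrappers turning the raw dict into a fixed-length vector.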

## The base environment

The base environment is defined in gfootball.env.football_env:FootballEnv and can be registered for use with gym.make:

In [None]:
from gfootball.env.football_env import FootballEnv

env_name = "GFootballBase-v0"
gym.envs.register(id=env_name,
                  entry_point="gfootball.env.football_env:FootballEnv",
                  max_episode_steps=10000)

Note that FootballEnv can also be instantiated directly by importing the Python class as usual.

In either case, creating it requires a Config object (these are already defined for the pre-registered envs/scenarios).

In [None]:
from gfootball.env.config import Config

base_env = gym.make(env_name, config=Config())

Note that this sets a default Config which may differ from the Kaggle scenario, although on cursory inspection it appears to have 11 players on each side. See below to use the make function to handle config setting for different scenarios.

## Observation space


### Raw

With the base env it's possible to see all the observation details.

In [None]:
obs = base_env.reset()

pprint.pprint(obs[0])

### simple115_v2

With the Simple115V2 wrapper, this is converted to a single array:

In [None]:
obs = simple_env.reset()

print(obs.shape)

pprint.pprint(obs)
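
The 115 values are not labelled, so it can help to split the vector back into named parts. The layout below is my reading of the simple115 description in observation.md (linked above); treat the field names and the helper `split_simple115` as assumptions for illustration, and check them against the docs before relying on them. The helper expects a flat sequence of 115 values.

```python
from typing import Dict, List, Sequence

# Assumed layout of the simple115 vector: field name -> number of values.
# Names are made up here; sizes follow my reading of observation.md.
SIMPLE115_LAYOUT: Dict[str, int] = {
    "left_team_positions": 22,     # 11 players x (x, y)
    "left_team_directions": 22,
    "right_team_positions": 22,
    "right_team_directions": 22,
    "ball_position": 3,            # x, y, z
    "ball_direction": 3,
    "ball_ownership_one_hot": 3,   # nobody, left team, right team
    "active_player_one_hot": 11,
    "game_mode_one_hot": 7,
}


def split_simple115(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat 115-value observation into named chunks."""
    expected = sum(SIMPLE115_LAYOUT.values())
    if len(obs) != expected:
        raise ValueError(f"Expected {expected} values, got {len(obs)}")

    parts: Dict[str, List[float]] = {}
    start = 0
    for name, size in SIMPLE115_LAYOUT.items():
        parts[name] = list(obs[start:start + size])
        start += size
    return parts


# Demo with dummy values 0..114 so each chunk's position is easy to see
parts = split_simple115(list(range(115)))
print(parts["ball_position"])
```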

### SMMWrapper

The SMMWrapper returns the "Super Mini Map" (SMM), a low-resolution visual representation of the game.

In [None]:
from kaggle_football.viz import generate_gif, plot_smm_obs

smm_env = gym.make("GFootball-11_vs_11_kaggle-SMM-v0")
print(smm_env.reset().shape)

generate_gif(smm_env, n_steps=200)
Image(filename='smm_env_replay.gif', format='png')

### PixelsStateWrapper

The pixels wrapper requires rendering before obtaining an observation. This appears to crash the kernel, so the snippet below is shown but not run:

```python
pixels_env.render()
obs = pixels_env.reset()
```

# Action space

https://github.com/google-research/football/blob/master/gfootball/doc/observation.md#actions
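
For readability when hand-coding actions, the integer ids can be given names. The enum below is a sketch based on my reading of the default action set in the linked docs (19 actions, ids 0 to 18); the class name `Action` and the member names are my own, so double check the ordering against the docs.

```python
from enum import IntEnum


class Action(IntEnum):
    """Assumed ids for the default GFootball action set (verify against the docs)."""
    IDLE = 0
    LEFT = 1
    TOP_LEFT = 2
    TOP = 3
    TOP_RIGHT = 4
    RIGHT = 5
    BOTTOM_RIGHT = 6
    BOTTOM = 7
    BOTTOM_LEFT = 8
    LONG_PASS = 9
    HIGH_PASS = 10
    SHORT_PASS = 11
    SHOT = 12
    SPRINT = 13
    RELEASE_DIRECTION = 14
    RELEASE_SPRINT = 15
    SLIDING = 16
    DRIBBLE = 17
    RELEASE_DRIBBLE = 18


# IntEnum members behave like ints, so they can be passed wherever an action id is expected
print(int(Action.RIGHT), Action(12).name, len(Action))
```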


# Make

The gfootball package has its own environment builder function, which is useful for setting the environment configuration for scenarios. It also adds a couple of useful wrappers.

In [None]:
from gfootball.env import create_environment

# (These are the args set by the kaggle_environments package)
COMMON_KWARGS = {"stacked": False, "representation": 'raw', "write_goal_dumps": False,
                 "write_full_episode_dumps": False, "write_video": False, "render": False,
                 "number_of_left_players_agent_controls": 1, "number_of_right_players_agent_controls": 0}

kaggle_like_env = create_environment(env_name='11_vs_11_kaggle', **COMMON_KWARGS)

## Rewards

By default the environment awards the agent -1 on conceding a goal and +1 for scoring.

Checkpoint rewards can be added when creating the environment:

In [None]:
chk_reward_env = create_environment(env_name='11_vs_11_kaggle', rewards='scoring,checkpoints')

_ = chk_reward_env.reset()
for s in range(100):
    # Action 5 moves the player to the right, towards the opponent's goal
    _, r, _, _ = chk_reward_env.step(5)
    if r > 0:
        print(f"Step {s} checkpoint reward received: {r}")

# Academy environments

[Multiple scenarios are defined for GFootball](https://github.com/google-research/football/tree/master/gfootball/scenarios). These include academy environments, which are much simpler than the full game. They can be created by specifying the name, for example:



In [None]:
run_to_score_env = create_environment(env_name='academy_run_to_score')

# Kaggle API

## Running agents

The Kaggle API for evaluation runs agents defined in files. For example, a compatible random agent:

In [None]:
%%writefile random_agent.py
  
from typing import Any
from typing import List

import numpy as np


class RandomAgent:
    def get_action(self, obs: Any) -> int:
        return np.random.randint(19)


AGENT = RandomAgent()


def agent(obs) -> List[int]:
    return [AGENT.get_action(obs)]

This agent can be run against a built in AI, or another agent:

In [None]:
from kaggle_environments import make  
env = make("football", configuration={"save_video": True,
                                      "scenario_name": "11_vs_11_kaggle"})

# Define players
left_player = "random_agent.py"  # A custom agent, eg. random_agent.py or example_agent.py
right_player = "run_right"  # eg. A built in 'AI' agent

# Run the whole sim
# Run the whole sim. The output returned is a list of length n_steps;
# each step is a list containing the output for each player as a dict.
output = env.run([left_player, right_player])

for s, (left, right) in enumerate(output):
    
    # Just print the last few steps of the output
    if s > 2990:
        print(f"\nStep {s}")

        print(f"Left player ({left_player}): \n"
              f"actions taken: {left['action']}, "
              f"reward: {left['reward']}, "
              f"status: {left['status']}, "
              f"info: {left['info']}")

        print(f"Right player ({right_player}): \n"
              f"actions taken: {right['action']}, "
              f"reward: {right['reward']}, "
              f"status: {right['status']}, "
              f"info: {right['info']}\n")

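# Sum each player's reward over all steps (output[i] is a step, not a player);
# "or 0" guards against a missing (None) reward
left_total = sum(step[0]['reward'] or 0 for step in output)
right_total = sum(step[1]['reward'] or 0 for step in output)
print(f"Total reward (left : right): {left_total} : {right_total}")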

env.render(mode="human", width=800, height=600)

The output of ```env.run()``` contains a list of output from each step.

In [None]:
print(output[-1][0].keys())
print(f"Left player: {output[-1][0]['status']}: {output[-1][0]['info']}")
print(f"Right player: {output[-1][1]['status']}: {output[-1][1]['info']}")
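
The nesting (steps first, then players) is easy to get backwards, so here is a small sketch that mimics the structure with plain dicts and summarises it. The values in `mock_output` are made up, the helper names are hypothetical, and the totals assume each step's `reward` is the reward for that step only.

```python
from typing import Any, Dict, List

# Mock of the structure returned by env.run(): one entry per step,
# each holding one dict per player (left first, then right). Values are invented.
mock_output: List[List[Dict[str, Any]]] = [
    [{"action": [], "reward": 0, "status": "ACTIVE", "info": {}},
     {"action": [], "reward": 0, "status": "ACTIVE", "info": {}}],
    [{"action": [5], "reward": 1, "status": "ACTIVE", "info": {}},
     {"action": [1], "reward": -1, "status": "ACTIVE", "info": {}}],
    [{"action": [12], "reward": 0, "status": "DONE", "info": {}},
     {"action": [0], "reward": 0, "status": "DONE", "info": {}}],
]


def total_rewards(output: List[List[Dict[str, Any]]]) -> List[float]:
    """Sum the per-step rewards separately for each player."""
    n_players = len(output[0])
    return [sum((step[i]["reward"] or 0) for step in output)
            for i in range(n_players)]


def final_statuses(output: List[List[Dict[str, Any]]]) -> List[str]:
    """Return each player's status on the last step."""
    return [player["status"] for player in output[-1]]


print(total_rewards(mock_output))
print(final_statuses(mock_output))
```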

In [None]:
%%writefile broken_agent.py
  
from typing import Any
from typing import List

class DeliberateException(Exception):
    pass


class BrokenAgent:
    def get_action(self, obs: Any) -> int:
        raise DeliberateException("I am broken.")


AGENT = BrokenAgent()


def agent(obs) -> List[int]:
    return [AGENT.get_action(obs)]

Note that if the agent fails, the full traceback isn't returned unless debug mode is specified. 

For example, debug off with a broken agent:

In [None]:
env = make("football", configuration={"save_video": True,
                                      "scenario_name": "11_vs_11_kaggle"})

output = env.run(["random_agent.py", "broken_agent.py"])

print(len(output))
print(f"Left player: {output[-1][0]['status']}: {output[-1][0]['info']}")
print(f"Right player: {output[-1][1]['status']}: {output[-1][1]['info']}")

Not so useful, but with debugging on:

In [None]:
env = make("football", debug=True,
           configuration={"save_video": True,
                          "scenario_name": "11_vs_11_kaggle"})

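try:
    output = env.run(["random_agent.py", "broken_agent.py"])
except Exception as e:
    # DeliberateException only exists inside broken_agent.py, not in this
    # notebook's namespace, so catch the base class instead
    print(repr(e))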

Much more useful!
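
A complementary trick for local debugging is to wrap the agent function so any exception is printed with its full traceback at the point it happens. This is only a sketch: `with_traceback` and `IDLE_ACTION` are names I've made up, it isn't something the Kaggle API provides, and swallowing errors like this is probably not what you want in an actual submission.

```python
import traceback
from typing import Any, Callable, List

IDLE_ACTION = 0  # Assumed id of the "do nothing" action


def with_traceback(agent_fn: Callable[[Any], List[int]]) -> Callable[[Any], List[int]]:
    """Wrap an agent function so failures print a traceback and fall back to idle."""

    def wrapped(obs: Any) -> List[int]:
        try:
            return agent_fn(obs)
        except Exception:
            # Print the full traceback to stderr, then keep the episode running
            traceback.print_exc()
            return [IDLE_ACTION]

    return wrapped


def broken_agent(obs: Any) -> List[int]:
    raise RuntimeError("I am broken.")


safe_agent = with_traceback(broken_agent)
fallback_action = safe_agent({})
print(fallback_action)
```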

## Breakpoints / interactive debugging

I don't think it's possible to set breakpoints and debug into the agent loaded by ```env.run()```, even with ```debug=True```. However, it is possible by stepping the env manually ([you're **seriously** missing out if you're not using a decent IDE like PyCharm to do this!](https://www.youtube.com/watch?v=QJtWxm12Eo0)).

In [None]:
from random_agent import agent  


env = make("football", configuration={"save_video": True, "scenario_name": "11_vs_11_kaggle"})
env.reset()

# This is the observation that is passed to agent function
obs_kag_env = env.state[0]['observation']

for _ in range(3000):
    action = agent(obs_kag_env)

    # Environment step is list of agent actions, ie [[agent_1], [agent_2]], 
    # here there is 1 action per agent.
    other_agent_action = [0]
    full_obs = env.step([action, other_agent_action])
    obs_kag_env = full_obs[0]['observation']
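
Since the agent is just a function, its return format can also be checked directly, without running the environment at all. The sketch below is self-contained, so it re-defines a random agent using the standard library rather than importing `random_agent.py`; `check_agent_output` is a hypothetical helper, and the empty dict stands in for a real observation (the random agent ignores it anyway).

```python
import random
from typing import Any, List

N_ACTIONS = 19  # Size of the default action set used by the random agent above


def agent(obs: Any) -> List[int]:
    """Stand-in random agent: one action for the single controlled player."""
    return [random.randrange(N_ACTIONS)]


def check_agent_output(actions: Any, n_controlled: int = 1) -> None:
    """Raise ValueError if the agent's return value has an unexpected format."""
    if not isinstance(actions, list):
        raise ValueError(f"Expected a list of actions, got {type(actions).__name__}")
    if len(actions) != n_controlled:
        raise ValueError(f"Expected {n_controlled} action(s), got {len(actions)}")
    for action in actions:
        if not 0 <= int(action) < N_ACTIONS:
            raise ValueError(f"Action {action} outside 0..{N_ACTIONS - 1}")


dummy_obs: dict = {}  # Placeholder observation
for _ in range(100):
    check_agent_output(agent(dummy_obs))
print("Agent output format looks OK")
```

A breakpoint set inside `agent` will be hit here in the same way as in the manual stepping loop above.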

# Example with a Gym-compatible Q-learning agent

This section runs the GFootball environment with an agent designed to run in Gym environments. The agent is a linear Q agent from the [reinforcement_learning_keras](https://github.com/garethjns/reinforcement-learning-keras) package, which I've been working on recently to learn about RL in general. It also includes a deep Q learner, which will hopefully make a more interesting example in the future.

This linear Q agent uses only sklearn (not Keras or TensorFlow, so there's less to go wrong). It's simple, it works great on cart-pole, but don't expect it to do well here!

This agent has a .train method that is defined [here](https://github.com/garethjns/reinforcement-learning-keras/blob/master/reinforcement_learning_keras/agents/agent_base.py). Internally this runs multiple episodes, each of which runs the familiar [Gym training loop](https://github.com/garethjns/reinforcement-learning-keras/blob/01b1e7e4e827e7816dae796ebefa9211a558ae7b/reinforcement_learning_keras/agents/q_learning/linear_q_agent.py#L138):

```python
    ...

    def _play_episode(self, max_episode_steps: int = 500,
                      training: bool = False, render: bool = True) -> Tuple[float, int]:
        """
        Play a single episode and return the total reward.
        :param max_episode_steps: Max steps before stopping, overrides any time limit set by Gym.
        :param training: Bool to indicate whether or not to use this experience to update the model.
        :param render: Bool to indicate whether or not to call env.render() each training step.
        :return: The total real reward for the episode.
        """
        self.env._max_episode_steps = max_episode_steps
        obs = self.env.reset()
        total_reward = 0
        for frame in range(max_episode_steps):
            action = self.get_action(obs, training=training)
            prev_obs = obs
            obs, reward, done, info = self.env.step(action)
            total_reward += reward

            if render:
                self.env.render()

            if training:
                self.update_model(s=prev_obs, a=action, r=reward, d=done, s_=obs)

            if done:
                break

        return total_reward, frame

    ...
```


In [None]:
!pip install reinforcement_learning_keras

In [None]:
import gym
from reinforcement_learning_keras.agents.components.history.training_history import TrainingHistory
from reinforcement_learning_keras.agents.q_learning.exploration.epsilon_greedy import EpsilonGreedy
from reinforcement_learning_keras.agents.q_learning.linear_q_agent import LinearQAgent
from sklearn.exceptions import DataConversionWarning

import warnings


agent = LinearQAgent(name="linear_q",
                     env_spec="GFootball-11_vs_11_kaggle-simple115v2-v0",
                     eps=EpsilonGreedy(eps_initial=0.9, decay=0.001, eps_min=0.01, 
                                       decay_schedule='linear'),
                     training_history=TrainingHistory(agent_name='linear_q', 
                                                      plotting_on=True, plot_every=25, 
                                                      rolling_average=1))

with warnings.catch_warnings():
    warnings.simplefilter('ignore', DataConversionWarning)
    agent.train(verbose=True, render=False,
                n_episodes=25, max_episode_steps=2000)
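
For anyone unfamiliar with the `EpsilonGreedy` settings used above, this is a rough sketch of epsilon-greedy action selection with a linear decay. It is not the package's implementation: the function names are made up, and I'm assuming epsilon drops by `decay` per step until it reaches `eps_min`, which may not match exactly how the package applies its schedule.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, eps_initial: float = 0.9,
                   decay: float = 0.001, eps_min: float = 0.01) -> float:
    """Epsilon after `step` updates, decayed linearly and floored at eps_min."""
    return max(eps_min, eps_initial - decay * step)


def select_action(q_values: Sequence[float], eps: float, rng: random.Random) -> int:
    """Explore with probability eps, otherwise pick the highest-valued action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # Made-up action values

for step in (0, 100, 10_000):
    eps = linear_epsilon(step)
    print(step, round(eps, 3), select_action(q_values, eps, rng))
```

Early on the agent mostly acts randomly (eps near 0.9); once eps has decayed to its floor it almost always takes the action its model currently rates highest.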

# Creating a submission

The RandomAgent defined above is submittable as it is. See also [GFootball Template Bot](https://www.kaggle.com/piotrstanczyk/gfootball-template-bot) for a similar example using a model that doesn't require saved weights, like a hand-crafted bot.


More complex models that use learned weights are a bit trickier; see [Deep Q-learner start code](https://www.kaggle.com/garethjns/deep-q-learner-starter-code) and [Convolutional deep Q-learner](https://www.kaggle.com/garethjns/convolutional-deep-q-learner) for examples of creating submissions for models that use external data (like learned neural network weights, or the weights learned by the LinearQ learner above).


I'm also compiling examples and notes here https://github.com/garethjns/kaggle-football
