# Optimizing Tunable Bot with Optuna

This notebook builds on the following two notebooks:
- https://www.kaggle.com/eugenkeil/simple-baseline-bot 
- https://www.kaggle.com/david1013/tunable-baseline-bot

Using [Optuna](https://github.com/optuna/optuna), this notebook demonstrates:
- automatically optimizing the parameters of the [tunable agent](https://www.kaggle.com/david1013/tunable-baseline-bot),
- analyzing the search space of the optimization with visualization features,
- refining the optimization setup based on the visualized results.

Although [Optuna](https://github.com/optuna/optuna) is designed for tuning the hyperparameters of ML models, it applies to any kind of black-box optimization, including the tuning of rule-based agents.

I hope this notebook frees you from the repetitive labor of manual tuning.

# Install

In [None]:
# Install:
# Kaggle environments.
!git clone --quiet https://github.com/Kaggle/kaggle-environments.git
!cd kaggle-environments && pip install -q .

# GFootball environment.
!apt-get -qq update -y
!apt-get -qq install -y libsdl2-gfx-dev libsdl2-ttf-dev

# Make sure that the branch in the git clone call matches the version in the wget URL.
!git clone --quiet -b v2.3 https://github.com/google-research/football.git
!mkdir -p football/third_party/gfootball_engine/lib

!wget -q https://storage.googleapis.com/gfootball/prebuilt_gameplayfootball_v2.3.so -O football/third_party/gfootball_engine/lib/prebuilt_gameplayfootball.so
!cd football && GFOOTBALL_USE_PREBUILT_SO=1 pip3 install -q .


# Defining Tunable Agent
The following script is mostly copied from the [original notebook](https://www.kaggle.com/david1013/tunable-baseline-bot).
Note that the parameters are passed to the agent via environment variables.

In [None]:
%%writefile submission.py

from math import sqrt
import os

from kaggle_environments.envs.football.helpers import *

SPRINT_RANGE = float(os.environ["SPRINT_RANGE"])
SHOT_RANGE_X = float(os.environ["SHOT_RANGE_X"])
SHOT_RANGE_Y = float(os.environ["SHOT_RANGE_Y"])
GOALIE_OUT = float(os.environ["GOALIE_OUT"])
LONG_SHOT_X = float(os.environ["LONG_SHOT_X"])
LONG_SHOT_Y = float(os.environ["LONG_SHOT_Y"])

directions = [
    [Action.TopLeft, Action.Top, Action.TopRight],
    [Action.Left, Action.Idle, Action.Right],
    [Action.BottomLeft, Action.Bottom, Action.BottomRight]]

# Map a coordinate difference to a grid index: 0 if negative, 1 if close to zero, 2 if positive.
dirsign = lambda x: 1 if abs(x) < 0.01 else (0 if x < 0 else 2)

enemyGoal = [1, 0]
GOALKEEPER = 0

shot_range = [[SHOT_RANGE_X, 1], 
              [-SHOT_RANGE_Y, SHOT_RANGE_Y]]

def inside(pos, area):
    return area[0][0] <= pos[0] <= area[0][1] and area[1][0] <= pos[1] <= area[1][1]

@human_readable_agent
def agent(obs):
    controlled_player_pos = obs['left_team'][obs['active']]
    
    if obs["game_mode"] == GameMode.Penalty:
        return Action.Shot
    if obs["game_mode"] == GameMode.Corner:
        if controlled_player_pos[0] > 0:
            return Action.Shot
    if obs["game_mode"] == GameMode.FreeKick:
        return Action.Shot
    
    # Make sure player is running down the field.
    if 0 < controlled_player_pos[0] < SPRINT_RANGE and Action.Sprint not in obs['sticky_actions']:
        return Action.Sprint
    elif SPRINT_RANGE < controlled_player_pos[0] and Action.Sprint in obs['sticky_actions']:
        return Action.ReleaseSprint

    # If our player controls the ball:
    if obs['ball_owned_player'] == obs['active'] and obs['ball_owned_team'] == 0:
        
        if inside(controlled_player_pos, shot_range) and controlled_player_pos[0] < obs['ball'][0]:
            return Action.Shot
        
        elif ( abs(obs['right_team'][GOALKEEPER][0] - 1) > GOALIE_OUT   
                and controlled_player_pos[0] > LONG_SHOT_X and abs(controlled_player_pos[1]) < LONG_SHOT_Y ):
            return Action.Shot
        
        else:
            xdir = dirsign(enemyGoal[0] - controlled_player_pos[0])
            ydir = dirsign(enemyGoal[1] - controlled_player_pos[1])
            return directions[ydir][xdir]
        
    # If we do not have the ball:
    else:
        # Run towards the ball.
        xdir = dirsign(obs['ball'][0] - controlled_player_pos[0])
        ydir = dirsign(obs['ball'][1] - controlled_player_pos[1])
        return directions[ydir][xdir]
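
The environment-variable hand-off used by `submission.py` can be tried in isolation. The sketch below is only an illustration: `read_param` and its `default` argument are hypothetical and not part of the original agent, which reads `os.environ[...]` directly and raises a `KeyError` when a variable is missing.

```python
import os


def read_param(name, default=None):
    """Read a float parameter from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        if default is None:
            raise KeyError(f"environment variable {name} is not set")
        return default
    return float(raw)


# The tuner side: environment values must be strings, hence str().
os.environ["SPRINT_RANGE"] = str(0.6)

# The agent side: convert the string back to a float.
sprint_range = read_param("SPRINT_RANGE")

# DEMO_UNSET_PARAM is a made-up name used to show the fallback path.
os.environ.pop("DEMO_UNSET_PARAM", None)
fallback = read_param("DEMO_UNSET_PARAM", default=0.5)
print(sprint_range, fallback)
```

Because the agent reads the values at module level, the variables have to be set before the game loads `submission.py`.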

# Running Optuna
To run Optuna, you need the following steps:

- define the objective function whose input is a `trial` object,
- inside the objective function, get parameters to be tried with `suggest` methods, run games with the suggested parameters, and return the obtained reward,
- invoke `optuna.create_study()` and `study.optimize()`, passing the objective function and the number of trials (`n_trials`).

Optuna then runs the objective function `n_trials` times, adjusting the suggested parameters with Bayesian optimization (its default TPE sampler) so that the reward improves.

In [None]:
import os

from kaggle_environments import make
import numpy as np
import optuna

# Optuna searches for parameters that maximize the value returned by this objective function.
def objective(trial):
    # You can get Optuna's parameter suggestion with the `suggest_float` method.
    os.environ["SPRINT_RANGE"] = str(trial.suggest_float("SPRINT_RANGE", 0.0, 1.0))
    os.environ["SHOT_RANGE_X"] = str(trial.suggest_float("SHOT_RANGE_X", 0.0, 1.0))  
    os.environ["SHOT_RANGE_Y"] = str(trial.suggest_float("SHOT_RANGE_Y", 0.0, 1.0))
    os.environ["GOALIE_OUT"] = str(trial.suggest_float("GOALIE_OUT", 0.0, 1.0))
    os.environ["LONG_SHOT_X"] = str(trial.suggest_float("LONG_SHOT_X", 0.0, 1.0))
    os.environ["LONG_SHOT_Y"] = str(trial.suggest_float("LONG_SHOT_Y", 0.0, 1.0))

    # To reduce the noise in reward, let's run the game 5 times for each trial.
    rewards = []
    for _ in range(5):
        env = make("football", configuration={"scenario_name": "11_vs_11_kaggle"})
        result = env.run(["submission.py", "do_nothing"])
        rewards.append(result[-1][0]["reward"])

    return np.mean(rewards)

# You can run the optimization by just passing the objective function and the number of trials to Optuna.
# Here, Optuna runs the objective function 35 times.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=35, show_progress_bar=True)

# Best Setting
After running all trials, you can see the best set of parameters and its reward as follows.

In [None]:
study.best_value

In [None]:
study.best_params

# Optimization History
You can see the optimization history with the visualization feature of Optuna.

In [None]:
optuna.visualization.plot_optimization_history(study)

The pandas export feature, `study.trials_dataframe()`, is also helpful for analyzing the optimization history.

In [None]:
study.trials_dataframe().head()

# Search Space Visualization
The visualization functions help you understand the characteristics of the parameters and the search space.

In [None]:
# The importance plot visualizes which parameters have been dominant in the optimization.
optuna.visualization.plot_param_importances(study)

In [None]:
# The slice plot shows the objective values along each parameter.
# Here, let's focus on the most dominant parameter.
optuna.visualization.plot_slice(study, params=["SHOT_RANGE_X"])

In [None]:
# The parallel coordinate plot shows the relationship among multiple parameters and the objective function.
optuna.visualization.plot_parallel_coordinate(study)

# Refining The Search Space
You can just submit with the best parameters found above, but let's try one more step.

When the search space is too large, automatic parameter search requires a lot of trials to converge. Narrowing down the search space with human insight sometimes makes the optimization more efficient.

Taking a close look at the plots above, there are some clues for narrowing down the search space:
- `SHOT_RANGE_X` is the most dominant parameter according to the importance plot;
- `SHOT_RANGE_X` works better when it is more than `0.5`;
- ...

Let's refine the search space by narrowing down the range of each parameter. (I came up with the following ranges after running some more trials locally.)

In [None]:
def objective(trial):
    os.environ["SPRINT_RANGE"] = str(trial.suggest_float("SPRINT_RANGE", 0.25, 0.9))
    os.environ["SHOT_RANGE_X"] = str(trial.suggest_float("SHOT_RANGE_X", 0.5, 1.0))  
    os.environ["SHOT_RANGE_Y"] = str(trial.suggest_float("SHOT_RANGE_Y", 0.0, 1.0))
    os.environ["GOALIE_OUT"] = str(trial.suggest_float("GOALIE_OUT", 0.0, 0.4))
    os.environ["LONG_SHOT_X"] = str(trial.suggest_float("LONG_SHOT_X", 0.25, 0.75))
    os.environ["LONG_SHOT_Y"] = str(trial.suggest_float("LONG_SHOT_Y", 0.5, 1.0))
    
    rewards = []
    for _ in range(5):
        env = make("football", configuration={"scenario_name": "11_vs_11_kaggle"})
        result = env.run(["submission.py", "do_nothing"])
        rewards.append(result[-1][0]["reward"])

    return np.mean(rewards)

# You can reuse the study object to run 15 additional trials.
study.optimize(objective, n_trials=15, show_progress_bar=True)
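
The ranges above were picked by hand. One possible way to get a first guess mechanically, which is not what this notebook does, is to look at the span each parameter covers among the best trials. In the sketch below, `ranges_from_top_trials`, its `top_fraction` argument, and the sample records are all hypothetical.

```python
def ranges_from_top_trials(records, top_fraction=0.25):
    """Return {param: (low, high)} spanned by the best-scoring records.

    `records` is a list of (params_dict, value) pairs; higher is better.
    """
    ranked = sorted(records, key=lambda rec: rec[1], reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    best = [params for params, _ in ranked[:keep]]
    return {
        name: (min(p[name] for p in best), max(p[name] for p in best))
        for name in best[0]
    }


# Made-up records for illustration only; these are not results.
records = [
    ({"SHOT_RANGE_X": 0.2}, 0.0),
    ({"SHOT_RANGE_X": 0.6}, 2.0),
    ({"SHOT_RANGE_X": 0.9}, 3.0),
    ({"SHOT_RANGE_X": 0.4}, 1.0),
]
print(ranges_from_top_trials(records, top_fraction=0.5))
```

With a real study, the records could be built as `[(t.params, t.value) for t in study.trials if t.value is not None]`. Ranges obtained this way are only a starting point; with noisy rewards and few trials, it is worth widening them a little before rerunning.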

Let's plot the history with the additional trials. Although the results are affected by randomness, the tuner tends to suggest better parameters within the refined search space.

In [None]:
optuna.visualization.plot_optimization_history(study)

In [None]:
study.best_value

In [None]:
study.best_params
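
Before submitting, the tuned values have to reach the agent somehow. The environment variables set in this notebook will presumably not exist where the submission is evaluated, so one option is to write the values into the file as constants. A minimal sketch, in which `render_constants` is a hypothetical helper and the dictionary holds placeholder numbers rather than tuning results:

```python
def render_constants(params):
    """Render a dict of parameters as Python constant assignments."""
    return "\n".join(
        f"{name} = {value!r}" for name, value in sorted(params.items())
    )


# Placeholder values; with a finished study, pass study.best_params instead.
example_params = {"SPRINT_RANGE": 0.5, "SHOT_RANGE_X": 0.75}

header = render_constants(example_params)
print(header)
```

The generated lines would replace the `float(os.environ[...])` assignments at the top of `submission.py`.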

# Tips
Finally, you can pickle the study object as follows so that you can continue analyzing the optimization results or resume the study in your local environment. Please make sure that the Optuna version is the same in the Kaggle notebook and in your local environment.

In [None]:
import pickle

# Save the study so that it can be analyzed or resumed later.
with open("study.pkl", "wb") as fw:
    pickle.dump(study, fw)

# You can load it back as:
# with open("study.pkl", "rb") as fr:
#     study = pickle.load(fr)

Below are some other tips:
- Outside the Kaggle notebook, you can try the [parallelized optimization](https://optuna.readthedocs.io/en/v2.2.0/tutorial/004_distributed.html) of Optuna, which makes the optimization much faster.
- The total number of trials, 50, is limited by the notebook runtime. Increasing it may improve the performance by giving the tuner more resolution.
- The noise in the reward affects the best performance, especially in the later stages of the optimization. Increasing the number of games per trial may mitigate the problem.
- On the other hand, the performance gain from parameter tuning will likely be limited after a few hundred trials. You might need to improve the base rules or introduce RL learners to go further.
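
The effect of playing more games per trial can be illustrated without the football environment. The sketch below replaces a game with a made-up noisy reward (`noisy_reward` and all the numbers are assumptions) and measures how much the per-trial average fluctuates for different numbers of games.

```python
import random
import statistics


def noisy_reward(rng, true_value=1.0, noise=1.0):
    """Stand-in for one game: the true value plus Gaussian noise."""
    return rng.gauss(true_value, noise)


def spread_of_mean(n_games, n_repeats=2000, seed=0):
    """Standard deviation of the average reward over n_games games."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(noisy_reward(rng) for _ in range(n_games))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(means)


spread_5 = spread_of_mean(5)
spread_20 = spread_of_mean(20)
print(spread_5, spread_20)
```

For independent games, the spread shrinks roughly with the square root of the number of games, so four times as many games roughly halves the noise at four times the runtime.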