# The board is a sentence
Hello everyone, for several months I have been convinced that Hungry Geese is actually a sentence... yes, a sentence made of words. More precisely, the board is a sentence of 77 words in 77 different positions. And what better way to model sentences than with transformers!

# Leveraging Kaggle's free TPU (overkill?)
Now let me show you that it works. We will use the simplest possible form of imitation learning. A dataset of leaderboard games is attached to this notebook, and this notebook preprocesses it. I trained my transformer-based imitation learning model with TensorFlow on a TPU, so the preprocessing has to produce data in a TPU-compatible format.

In [None]:
!cd ../input/hungry-geese-mvp-iml-ds && ls | wc -l

In [None]:
import glob
import json
from tqdm.auto import tqdm

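# `[!_info]` is a character class, not a negated suffix: it matches files whose
# last character before ".json" is not one of "_", "i", "n", "f", "o".
# That is enough to skip the "*_info.json" files as long as game files do not end in one of those characters.
path = "../input/hungry-geese-mvp-iml-ds/*[!_info].json"
games = []
for path_to_json in tqdm(glob.glob(path)):
    with open(path_to_json, 'r') as json_file:
        # Keep only the list of steps of each game
        games.append(json.load(json_file)["steps"])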
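
The glob pattern above is easy to misread, so here is a minimal sketch of how it behaves. The file names are made up for illustration; `fnmatch` applies the same wildcard rules as `glob`.

```python
import fnmatch

# Hypothetical file names, for illustration only
names = ["1234567.json", "1234567_info.json", "123456o.json"]
pattern = "*[!_info].json"

# Keep the names whose last character before ".json" is not in "_info"
matches = [n for n in names if fnmatch.fnmatch(n, pattern)]
print(matches)
```

Note that a game file whose name ended in `o` (the third name) would be skipped as well, which is why the pattern relies on game files ending in a digit.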

In [None]:
import numpy as np

def outcome(last_obs):
    # Return terminal outcomes computed from pairwise reward comparisons:
    # 1st: 1.0, 2nd: 0.33, 3rd: -0.33, 4th: -1.00
    rewards = {o['observation']['index']: o['reward'] for o in last_obs}
    # A missing reward ranks below every other reward
    # (np.NINF was removed in NumPy 2.0, so use -np.inf)
    rewards = {p: -np.inf if r is None else r for p, r in rewards.items()}
    outcomes = {p: 0 for p in range(4)}
    for p, r in rewards.items():
        for pp, rr in rewards.items():
            if p != pp:
                if r > rr:
                    outcomes[p] += 1 / (4 - 1)
                elif r < rr:
                    outcomes[p] -= 1 / (4 - 1)
    return outcomes
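
To make the pairwise scoring concrete, here is a small sketch that applies the same rule to a plain dictionary of rewards. `pairwise_outcomes` is a hypothetical helper written for this example, and the reward values are made up.

```python
import math

def pairwise_outcomes(rewards):
    """Hypothetical helper: same pairwise scoring as outcome(), on a plain dict."""
    # A missing reward ranks below every other reward
    clean = {p: -math.inf if r is None else r for p, r in rewards.items()}
    n = len(clean)
    scores = {p: 0.0 for p in clean}
    for p, r in clean.items():
        for pp, rr in clean.items():
            if p == pp:
                continue
            if r > rr:
                scores[p] += 1 / (n - 1)
            elif r < rr:
                scores[p] -= 1 / (n - 1)
    return scores

# Made-up final rewards; player 2 has no reward
example = {0: 501, 1: 200, 2: None, 3: 300}
scores = pairwise_outcomes(example)
winner = max(scores, key=scores.get)
print(scores, winner)
```

Each player gains 1/3 per opponent beaten and loses 1/3 per opponent who did better, so four distinct rewards give the 1.0 / 0.33 / -0.33 / -1.0 ladder mentioned in the comment. Equal rewards contribute nothing either way.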

In [None]:
import numpy as np
from kaggle_environments.envs.hungry_geese.hungry_geese import Action
def preprocess_map_obs(obs, previous_obs=None, p=None):
    if p is None:
        p = 0
    
    #Thank you very much yuricat for your suggestion to use relative positions.
    #This is really what made the model work!
    relativ_center = obs[0]['observation']['geese'][p][0]
    relativ_poss = np.roll(np.arange(77), relativ_center)
    
    sentence = []
    positions = []
    # Our vocabulary size is 50:
    # index 0 is an empty tile,
    # indices 1 to 12 (inclusive) belong to the first player,
    # 13 to 24, 25 to 36 and 37 to 48 to the second, third and fourth players,
    # and index 49 is food.
    # Each player's range is divided into 3 sub-ranges, one per body part
    # (head, body, tail); for the first player: 1 to 4, 5 to 8 and 9 to 12.
    # Finally, each sub-range is divided into 4 slots, one per last action played.
    for pp, player in enumerate(obs):
        real_player_index = pp
        player_index = (pp - p) % 4
        geese_length = len(obs[0]['observation']['geese'][real_player_index])
        for goose_body_position, goose_board_position in enumerate(obs[0]['observation']['geese'][real_player_index]):
            if goose_body_position == 0:
                body_part = 0
            elif goose_body_position == (geese_length-1):
                body_part = 2
            else:
                body_part = 1

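            # NOTE: the index ranges described above assume last_action is in 0..3.
            # If Action values start at 1 instead, every goose index shifts up by one
            # and the largest goose index would coincide with the food index (49).
            last_action = Action[obs[real_player_index]['action']].value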

            index_player = 3*4 * player_index
            index_bodypart = 4*body_part
            
            word_unique_index = index_player + index_bodypart + last_action + 1
            sentence.append(word_unique_index)
            
            position = relativ_poss[goose_board_position]
            positions.append(position)
            

    for food_board_position in obs[0]['observation']['food']:
        word_unique_index = 47 + 1 + 1
        sentence.append(word_unique_index)
        
        position = relativ_poss[food_board_position]
        positions.append(position)
        
    left_positions = set(range(0,77))-set(positions)
    
    positions = positions + list(left_positions)
    sentence = sentence + [0]*(77-len(sentence))
        
    return sentence, positions
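
The two encodings used above can be checked in isolation. The sketch below reuses the same `np.roll` construction and the same index formula; `relative_positions` and `word_index` are hypothetical helper names, the head position 30 is an arbitrary example, and `last_action` is assumed to be in 0..3.

```python
import numpy as np

N_CELLS = 77  # 7 rows x 11 columns

def relative_positions(head):
    # Same construction as in preprocess_map_obs:
    # board cell i is mapped to (i - head) % 77
    return np.roll(np.arange(N_CELLS), head)

def word_index(player_index, body_part, last_action):
    # Assumes player_index in 0..3, body_part in 0..2, last_action in 0..3
    return 3 * 4 * player_index + 4 * body_part + last_action + 1

rel = relative_positions(head=30)
print(rel[30], rel[31], rel[29])   # the head itself, the next cell, the previous cell

print(word_index(0, 0, 0))  # first slot of the first player
print(word_index(1, 1, 2))  # second player, middle body part, action 2
print(word_index(3, 2, 3))  # last slot of the fourth player
```

Whatever cell the player's head occupies, it always ends up at relative position 0, so the model sees the board from the player's point of view.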

In [None]:
import tensorflow as tf
# Converting the values into features
# _int64 is used for numeric values
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

In [None]:
action_encoder_dict = {
    "NORTH": 0,
    "SOUTH": 1,
    "WEST": 2,
    "EAST": 3
}

In [None]:
file_nb = 0
# Open the first TFRecord shard; a new shard is started roughly every 100,000 examples.
record_file = f'df_{file_nb}.tfrec'
writer = tf.io.TFRecordWriter(record_file)
idx = 0
for steps in tqdm(games):
    previous_step = None
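    # Keep only the game winner: the player with the highest terminal outcome
    winners = list(np.argsort(list(outcome(steps[-1]).values()))[3:])
    for step_idx, step in enumerate(steps):
        # The last step has no following action to use as a label
        if step_idx == len(steps)-1:
            continue
        
        for p, player in enumerate(step):
            if player['observation']['index'] not in winners:
                continue
            
            if player["status"] != "ACTIVE":
                continue
            
            # The label is the action this player takes at the next step
            action = steps[step_idx+1][p]['action']
            
            if action is None:
                continue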
            
            if (idx+1)%100000 == 0:
                writer.close()
                file_nb += 1
                # Start a new shard.
                record_file = f'df_{file_nb}.tfrec'
                writer = tf.io.TFRecordWriter(record_file)
                print(f"incrementing file number to {file_nb}")
            
            label = action_encoder_dict[action]
            sentence, positions = preprocess_map_obs(step, previous_step, p)
            
            feature = {
                'label': _int64_feature(label),
                'sentence': _int64_list_feature(sentence),
                'positions': _int64_list_feature(positions),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
            
            idx += 1
            
        previous_step = step
writer.close()
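
As a sanity check, the records can be read back with fixed-length features. This is only a sketch of one possible reader, not the training pipeline itself: `parse_example`, `SEQ_LEN` and the dummy values are assumptions made for this example. Fixed lengths give every tensor a static shape, which is what XLA on a TPU generally expects.

```python
import os
import tempfile

import tensorflow as tf

SEQ_LEN = 77  # one word and one position per board cell

def parse_example(serialized):
    # Feature names match the ones written above; lengths are fixed
    spec = {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'sentence': tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
        'positions': tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    }
    return tf.io.parse_single_example(serialized, spec)

def int64_list(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# Write one dummy example (made-up values) to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'demo.tfrec')
feature = {
    'label': int64_list([2]),
    'sentence': int64_list([1] + [0] * (SEQ_LEN - 1)),
    'positions': int64_list(list(range(SEQ_LEN))),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
with tf.io.TFRecordWriter(path) as demo_writer:
    demo_writer.write(example.SerializeToString())

# Read it back and parse it
dataset = tf.data.TFRecordDataset(path).map(parse_example)
parsed = next(iter(dataset))
print(parsed['label'].numpy(), parsed['sentence'].shape)
```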

In [None]:
!ls -lh *

# Final words regarding preprocessing
We need to "Save & Run All (Commit)" with the "Always save output" option, then open the committed version, go to its output, and create a dataset from it.
The dataset can be either private or public. Keep in mind which one you chose: on a TPU, the way to load the dataset differs between the two cases.