# Analyzing Blood Bowl kick-off formations using FUMBBL replay data

The botbowl people are discussing writing a parser for the JSON that would result in a log with chess-like notation, e.g. `P1 - B - E4 to C5` (player one chooses blitz, moves from position E4 to C5), etc.
Currently my knowledge is too limited to contribute to such an endeavour.

I have however cooked up a new FUMBBL data analysis project using replays. 
To learn how to work with the replay files, I want to try and extract the set-up formations for a high-stakes tournament (Road to Malta, or the Tilean Team Cup).
With the data I would like to develop a nice viz, working with board positions together with the roster.
When that goes somewhere I will scale it up to on the order of 100 replays, and try to make plots that either stack / aggregate formations, or do a cluster analysis on the set-up formations, and see what can be learned from all of this 🙂

# Related games: Chess and Starcraft

For Chess, people make heatmaps for all pieces and for pieces separately, and for locations where pieces are removed from play.
For Blood Bowl, we can think of trajectories where the ball goes, or where casualties (CAS) occur.

For Starcraft, replays were very important to train the AI.

```
Although learning in StarCraft can be performed through playing, the dynamics of the game are extremely complex, and it is beneficial to speed up learning by using existing games. The availability of datasets of recorded games between experienced players is therefore desirable
```
(From Lin et al 2017)

Another quote:

```
Hence, the utility of a replay dataset can be increased by extracting game states, validating them and storing them as a separate dataset.
```

In what follows we refer to the StarCraft recorded games as the original replays, and the TorchCraft recorded game states as extracted replays.

Interestingly:

```
Opening clustering
As an example of what exploratory data analysis can yield, inspired from (Synnaeve and Bessiere 2011b), we performed clustering to in search for canonical opening strategies.
```

# Replay files: What has already been done

Christian Huber (aka Candlejack) seems to be our man here. He has two open repos that are of great interest.

First is https://github.com/SanityResort/htmlreplay

This connects to the FFB server and requests a replay file, which it reads in and converts to JSON.
The goal seems to be to visualize a replay in the browser via HTML.

Then there is https://github.com/SanityResort/FFBStats
This is code that processes replay files and extracts information from them. Exactly what we want as well!
It was written in 2016 and integrated into the site in 2017 by Christer. It then went away and came back in 2022.
It produces a match statistics file as JSON, that forms the basis of a nice visualization on the match result page on FUMBBL.
The match statistics are available through the API.

https://fumbbl.com/p/match?op=stats&id=3916966

# FUMBBL Replay datafiles: opening up the black box

Replay data is quite verbose and there’s lots of it. The FFB client communicates with the server using Java Web Sockets.
The raw data packets sent over the line are WebSocket messages.
A replay file is essentially the JSON command stream. It’s a complex format, as it’s more or less just a log of the data packets between the client and server.

For example, after unzipping, opening as JSON in VS Code and autoformatting, we end up with about 266K lines of client-server "command" stream.
Each turn is about 10K lines, with roster info at the end.

The high level file format is as follows:

```
{
    "gameStatus": "uploaded",
    "stepStack": {
        "steps": []
    },
    "gameLog": {}, # contains the command stream
    "game": {}, # contains the full roster and position information
    "playerIds": [],
    "swarmingPlayerActual": 0,
    "passState": {},
    "prayerState": {},
    "activeEffects": {}
}
```

I noticed that the replays are really nicely self-contained: they contain full copies of the rosters and (if I remember correctly) even the ruleset.
That is, everything in the ruleset that has a bearing on the client, basically the "client options" tab of the ruleset.

A full history of all the events during the game is stored under `gameLog`.

The various phases of the match are clearly distinguished in the command streams.
`turnDataSetTurnNr` , `turnDataSetFirstTurnAfterKickoff`, `gameSetTurnMode`, etc.


## netCommands that communicate changes

The basic unit is the command, that is indexed by `commandNr`. 
A match consists of several thousand commands.
A typical command has the following **FIXED** structure:

```
{
    "netCommandId": "serverModelSync",
    "commandNr": 243,
    "modelChangeList": {
        "modelChangeArray": []
    },
    "reportList": {
        "reports": []
    },
    "sound": null,
    "gameTime": 805789,
    "turnTime": 179506
}
```

`modelChange` changes the game state.
`reportList` directs output to the client's reporting panel.
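
To make the structure concrete, here is a minimal sketch that walks the `modelChangeArray` of one command. The field names `modelChangeId`, `modelChangeKey` and `modelChangeValue` are the ones used for the flat table later in this notebook; the sample command content, the player id and the helper name `iter_model_changes` are made up for illustration.

```python
# Hypothetical sample command; real commands can carry many more model changes.
command = {
    "netCommandId": "serverModelSync",
    "commandNr": 243,
    "modelChangeList": {
        "modelChangeArray": [
            {
                "modelChangeId": "fieldModelSetPlayerCoordinate",
                "modelChangeKey": "15001",  # made-up player id
                "modelChangeValue": [12, 7],
            },
            {
                "modelChangeId": "fieldModelSetPlayerState",
                "modelChangeKey": "15001",
                "modelChangeValue": 257,
            },
        ]
    },
    "reportList": {"reports": []},
    "sound": None,
    "gameTime": 805789,
    "turnTime": 179506,
}


def iter_model_changes(command):
    """Yield (commandNr, id, key, value) for every model change in a command."""
    # Some commands may have no model changes at all, hence the defaults.
    changes = command.get("modelChangeList", {}).get("modelChangeArray", [])
    for change in changes:
        yield (
            command["commandNr"],
            change.get("modelChangeId"),
            change.get("modelChangeKey"),
            change.get("modelChangeValue"),
        )


changes = list(iter_model_changes(command))
for row in changes:
    print(row)
```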



## The Field model including the coordinate system

FFB uses a very straightforward field model. The game board, 26 squares wide and 15 squares high, is indexed using (X,Y) coordinates, with the top left square being (0,0) and the lower right square being (25, 14).

Players can be either:
* On the pitch
* In the reserve box
* In the KO box
* In the Badly hurt box
* In the Seriously injured box
* In the RIP box
* In the Ban box
* In the Miss next game box

On the pitch the X,Y coordinates are used. The other locations use a special X value per box (see the constants below), with the positions inside a box indexed using 0, 1, 2, etc.

```
export default class Coordinate {
    x: number;
    y: number;

    public static FIELD_WIDTH = 26;
    public static FIELD_HEIGHT = 15;
    public static RSV_HOME_X = -1;
    public static KO_HOME_X = -2;
    public static BH_HOME_X = -3;
    public static SI_HOME_X = -4;
    public static RIP_HOME_X = -5;
    public static BAN_HOME_X = -6;
    public static MNG_HOME_X = -7;
    public static RSV_AWAY_X = 30;
    public static KO_AWAY_X = 31;
    public static BH_AWAY_X = 32;
    public static SI_AWAY_X = 33;
    public static RIP_AWAY_X = 34;
    public static BAN_AWAY_X = 35;
    public static MNG_AWAY_X = 36;
    
}
```
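
As a rough sketch of how these constants could be used, the function below maps a coordinate to a readable location. The X values are copied from the `Coordinate` class above; the names `BOX_BY_X` and `locate` are hypothetical and not part of FFB.

```python
FIELD_WIDTH = 26
FIELD_HEIGHT = 15

# Special X values copied from the Coordinate class above.
BOX_BY_X = {
    -1: "home reserve", -2: "home KO", -3: "home badly hurt",
    -4: "home seriously injured", -5: "home RIP", -6: "home ban",
    -7: "home miss next game",
    30: "away reserve", 31: "away KO", 32: "away badly hurt",
    33: "away seriously injured", 34: "away RIP", 35: "away ban",
    36: "away miss next game",
}


def locate(x, y):
    """Return 'pitch', a box name, or 'unknown' for an (x, y) coordinate."""
    if 0 <= x < FIELD_WIDTH and 0 <= y < FIELD_HEIGHT:
        return "pitch"
    # Off-pitch locations are recognised by their X value only.
    return BOX_BY_X.get(x, "unknown")


print(locate(0, 0))    # top left square of the pitch
print(locate(-1, 3))   # a slot in the home reserve box
print(locate(30, 0))   # a slot in the away reserve box
```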

FFB uses `fieldModelSetPlayerCoordinate` to position players on the field or in the dugout. Players are identified by their FUMBBL player ids. At the end of the replay, all the player information is stored, including extra skills on top of those that come with the positional, as well as the full rosters (including all possible star players).


## The list of Player states

In the future we might also need PlayerState.

```
export default class PlayerState {
    public static UNKNOWN = 0;
    public static STANDING = 1;
    public static MOVING = 2;
    public static PRONE = 3;
    public static STUNNED = 4;
    public static KNOCKED_OUT = 5;
    public static BADLY_HURT = 6;
    public static SERIOUS_INJURY = 7;
    public static RIP = 8;
    public static RESERVE = 9;
    public static MISSING = 10;
    public static FALLING = 11;
    public static BLOCKED = 12;
    public static BANNED = 13;
    public static EXHAUSTED = 14;
    public static BEING_DRAGGED = 15;
    public static PICKED_UP = 16;
    public static HIT_BY_FIREBALL = 17;
    public static HIT_BY_LIGHTNING = 18;
    public static HIT_BY_BOMB = 19;
    public static BIT_ACTIVE = 256;
    public static BIT_CONFUSED = 512;
    public static BIT_ROOTED = 1024;
    public static BIT_HYPNOTIZED = 2048;
    public static BIT_BLOODLUST = 4096;
    public static BIT_USED_PRO = 8192
}
```
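
A possible way to decode such a value, assuming the lowest 8 bits hold the mutually exclusive base state and the `BIT_` constants are flags OR-ed on top of it. The lookup tables are copied from the class above; the function name `decode_player_state` is made up for this sketch.

```python
# Base states (values 0-19) copied from the PlayerState class above.
BASE_STATES = {
    0: "UNKNOWN", 1: "STANDING", 2: "MOVING", 3: "PRONE", 4: "STUNNED",
    5: "KNOCKED_OUT", 6: "BADLY_HURT", 7: "SERIOUS_INJURY", 8: "RIP",
    9: "RESERVE", 10: "MISSING", 11: "FALLING", 12: "BLOCKED", 13: "BANNED",
    14: "EXHAUSTED", 15: "BEING_DRAGGED", 16: "PICKED_UP",
    17: "HIT_BY_FIREBALL", 18: "HIT_BY_LIGHTNING", 19: "HIT_BY_BOMB",
}

# Flag bits copied from the BIT_ constants above.
FLAG_BITS = {
    256: "ACTIVE", 512: "CONFUSED", 1024: "ROOTED",
    2048: "HYPNOTIZED", 4096: "BLOODLUST", 8192: "USED_PRO",
}


def decode_player_state(value):
    """Split a PlayerState integer into (base state name, list of flag names)."""
    base = value & 0xFF  # assumption: the low 8 bits carry the base state
    flags = [name for bit, name in FLAG_BITS.items() if value & bit]
    return BASE_STATES.get(base, "UNKNOWN"), flags


print(decode_player_state(769))  # 769 = 512 + 256 + 1
```

Bits that are not listed in `FLAG_BITS` are silently ignored here, so unknown flags do not break the decoding.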

# A flat table format extracted from replays

This could be the basis of our flat file format.
We write a for loop that cycles through the commands, and fills out a pandas dataframe.
We use what we learned with the API data.

Columns in our initial data format:

* commandNr
* modelChangeId
* modelChangeKey
* playerId
* playerState
* playerXcoordinate
* playerYcoordinate
* gameTime
* turnTime

If `modelChangeId` equals `fieldModelSetPlayerState` we record the `modelChangeValue` under `playerState`, and if it equals `fieldModelSetPlayerCoordinate` we record the `modelChangeValue` vector under `playerXcoordinate` and `playerYcoordinate`. In both cases the `playerId` can be found under `modelChangeKey`.
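
The rule above could be sketched as follows. This is not the actual `parse_replay_file` from `kickoff_formations.py`, just a minimal stand-in: it takes a plain list of command dicts (how to get that list out of `gameLog` is left open), assumes the coordinate value is an `[x, y]` pair, and uses made-up sample data.

```python
def flatten_commands(commands):
    """Turn a list of command dicts into a list of flat row dicts."""
    rows = []
    for command in commands:
        changes = command.get("modelChangeList", {}).get("modelChangeArray", [])
        for change in changes:
            change_id = change.get("modelChangeId")
            value = change.get("modelChangeValue")
            row = {
                "commandNr": command.get("commandNr"),
                "modelChangeId": change_id,
                "modelChangeKey": change.get("modelChangeKey"),
                "playerId": None,
                "playerState": None,
                "playerXcoordinate": None,
                "playerYcoordinate": None,
                "gameTime": command.get("gameTime"),
                "turnTime": command.get("turnTime"),
            }
            if change_id == "fieldModelSetPlayerState":
                row["playerId"] = change.get("modelChangeKey")
                row["playerState"] = value
            elif change_id == "fieldModelSetPlayerCoordinate":
                row["playerId"] = change.get("modelChangeKey")
                # assumption: the value is an [x, y] pair
                row["playerXcoordinate"], row["playerYcoordinate"] = value
            rows.append(row)
    return rows


# Made-up sample: one command with two model changes, one without any.
commands = [
    {
        "commandNr": 10, "gameTime": 1000, "turnTime": 200,
        "modelChangeList": {"modelChangeArray": [
            {"modelChangeId": "fieldModelSetPlayerCoordinate",
             "modelChangeKey": "15001", "modelChangeValue": [12, 7]},
            {"modelChangeId": "fieldModelSetPlayerState",
             "modelChangeKey": "15001", "modelChangeValue": 257},
        ]},
    },
    {"commandNr": 11, "gameTime": 1500, "turnTime": 700},
]

rows = flatten_commands(commands)
print(rows[0])
```

Calling `pd.DataFrame(rows)` then gives a dataframe with the columns listed above.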


# Load packages

In [None]:
import random
import time
import os

from isoweek import Week

import requests # API library

import numpy as np
import pandas as pd

import gzip
import json

from urllib.request import urlopen
from PIL import Image

import sys

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)




The -i option is important: it runs the code in the namespace of the Jupyter notebook.

In [None]:
# source the Python functions
%run -i kickoff_formations.py

We want to fetch all replays from the Tilean Team Cup, an online NAF tournament held on FUMBBL from March 2023 to May 2023.
Reading in the full HDF5 file is a memory hog: it takes 2GB of memory.

In [None]:
# point this to the location of the HDF5 datasets
path_to_datasets = '../fumbbl_datasets/datasets/current/'

# FUMBBL matches
target = 'df_matches.csv'
df_matches = pd.read_csv(path_to_datasets + target) 

# subset on tilean team cup
df_matches = df_matches.query('tournament_id == 59383')

tilean_replays = df_matches['replay_id'].values

In [None]:
# FUMBBL matches by team
target = 'df_mbt.csv'
df_mbt = pd.read_csv(path_to_datasets + target) 

# subset on tilean team cup
df_mbt = df_mbt.query('tournament_id == 59383')

# Fetch the replay files

In [None]:
fullrun = 0

if fullrun:
    for replay_id in tilean_replays:
        my_replay = fetch_replay(replay_id)


The FUMBBL server, for the replay file API endpoint, uses 'Transfer-Encoding': 'chunked' to "stream" the gzipped content to the client.

From the requests documentation:
```
For chunked encoded responses, it’s best to iterate over the data using Response.iter_content(). In an ideal situation you’ll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk_size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk_size parameter to any integer.
```
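
A minimal sketch of that advice, not the actual `fetch_replay` from `kickoff_formations.py`. It assumes the response body is the gzipped JSON file itself; the URL is left as a parameter because the exact endpoint is not spelled out here, and both function names are hypothetical. The decoding step is kept separate so it can be tried without a network connection.

```python
import gzip
import json
import zlib


def decode_gzipped_chunks(chunks):
    """Decompress an iterable of gzip byte chunks and parse the result as JSON."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parts = [decompressor.decompress(chunk) for chunk in chunks if chunk]
    parts.append(decompressor.flush())
    return json.loads(b"".join(parts).decode("utf-8"))


def fetch_replay_sketch(url, chunk_size=8192):
    """Stream a gzipped replay from `url` chunk by chunk (URL is an assumption)."""
    import requests  # imported here so the decoding part runs without requests

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        return decode_gzipped_chunks(response.iter_content(chunk_size=chunk_size))


# Offline check with a tiny made-up payload, cut into 16-byte chunks.
payload = gzip.compress(json.dumps({"gameStatus": "uploaded"}).encode("utf-8"))
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]
replay = decode_gzipped_chunks(chunks)
print(replay)
```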

# Parse replay, write to Excel

To do:

* check messages without modelChange, i.e. inducements bought
* `fieldModelSetBallMoving` catch roll is not included
* drop `fieldModelSetPlayerState` if it is not preceded by oldState (message 79)
* move ignore list to a separate file
* add rosters and game metadata at the top

In [None]:
# show 256 (BIT_ACTIVE) in binary
format(256, "9b")

msg = 769                       # 512 + 256 + 1
base_mask = int("11111111", 2)  # 2 is base 2; the mask equals 255
msg & base_mask                 # base state: 1 = STANDING
msg & 256                       # non-zero, so BIT_ACTIVE is set

# so first 8 bits (0-255) are reserved to encode mutually exclusive stuff, and then we have the BIT section
# so we use the first 8 bits for the unique state, and then decorate the player using bits 9 to 14
if msg & 256:
    print("player is active")
if msg & 512:
    print("player is confused")
if msg & 1024:
    print("player is rooted")
# extractor trick for state column: 16641 = 16384 + 256 + 1, so this gives 1 = STANDING
16641 & 255

In [None]:
# source the Python functions
%run -i kickoff_formations.py

my_replay = fetch_replay(1559380)
dff = parse_replay_file(my_replay, to_excel = True)

In [None]:
my_replay['game']

# Approach

Using the API I have downloaded a gzipped replay file for this match:
https://fumbbl.com/FUMBBL.php?page=match&id=4444067

The goal is to programmatically extract the setup formations.
We develop all steps in separate code chunks, then piece them together in a Python program we can call from Jupyter.


# Visualization

We mimic FFB by plotting FFB icons over the pitch.
In Python there is Pillow, the Python imaging library, which allows manipulating images.

Let's move on to plotting the board state.

We have an empty image of the board as a JPG.
We want to plot player icons on it.

First we plot the empty board.
Then we plot a 15 x 26 grid over it.


In [None]:
from PIL import Image, ImageDraw, ImageFont
with Image.open("resources/nice.jpg") as pitch:
    display(pitch)


## write text on pitch

In [None]:
with Image.open("resources/nice.jpg") as pitch:

    draw = ImageDraw.Draw(pitch) 

    text1 = "Orc"
    text2 = 'Human'

    font1 = ImageFont.truetype('LiberationSerif-Bold.ttf', 28)
    font2 = ImageFont.truetype('LiberationSans-Regular.ttf', 28)

    draw.text((140, 100), text1, font=font1, fill='black')
    draw.text((540, 120), text2, font=font2, fill='black')

    display(pitch)

Let's find the player icons we need.
We look in the rosters.

First we construct a DataFrame with the players of both teams, then we look up the PNG URLs from the roster.

In [None]:
df_players = extract_players_from_replay(my_replay)

df_players

In [None]:
df_players = extract_players_from_replay(my_replay)
df_positions = extract_rosters_from_replay(my_replay)

df_players2 = pd.merge(df_players, df_positions, on="positionId", how="left")

df_players2

# Working with FUMBBL icons

Plot icon on the pitch. First grab icon. Then grab pitch. Plot icon on pitch.

In [None]:
url = 'https://fumbbl.com/i/645274.png'
icon = Image.open(urlopen(url)).convert("RGBA")
icon_w, icon_h = icon.size
# the icon sheet holds 4 square icons side by side; select the first one
icon_size = icon_w // 4
icon = icon.crop((0, 0, icon_size, icon_size))
icon = icon.resize((28, 28))
icon


Let's see if we can draw multiple icons on the pitch.

In [None]:
pitch = Image.open("resources/nice.jpg")
pitch = pitch.rotate(angle = 90, expand = True)
pitch = pitch.resize((15 * 28, 26 * 28))
pitch_w, pitch_h = pitch.size
icon_w, icon_h = icon.size

for i in range(15):
    pitch.paste(icon, (icon_w * i,icon_h * i), icon)

pitch

# Get player positions after kick off

We should still check for Quick Snap etc.
For now we take all position changes up to the point where `gameSetLastTurnMode` is set to `setup` and play moves to the kick-off phase.

In [None]:
df = parse_replay_file(my_replay)

positions = df.query('turnNr == 0 & turnMode == "setup" & Half == 1 & \
                     modelChangeId == "fieldModelSetPlayerCoordinate"').groupby('modelChangeKey').tail(1)
positions

In [None]:

positions = pd.merge(positions, df_players2, left_on='modelChangeKey', right_on='playerId', how="left")
len(positions.query('PlayerCoordinateX != [-1, 30]'))

# select only players on the board at kick-off, i.e. not in reserve
positions = positions.query('PlayerCoordinateX != [-1, 30]').copy()
positions

## Transform player positions: rotation and mirroring

After rotating the pitch 90 degrees, a player at (0,0) will be positioned at (14,0), and a player at (25, 14) is now positioned at (0, 25).
So the transformation formula is: (a,b) becomes (14 - b, a).

In [None]:
positions['PlayerCoordinateXrot'] = 14 - positions['PlayerCoordinateY']
positions['PlayerCoordinateYrot'] = positions['PlayerCoordinateX']


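Now we try to mirror the board setup (swap sides). Conceptually we shift the y coordinate so that the middle line of the pitch sits at zero, flip the sign (mirror the y position with respect to the horizontal middle line) and then shift back. For y values running from 0 to 25 this simplifies to 25 - y.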

In [None]:
positions['PlayerCoordinateXrot2'] = positions['PlayerCoordinateXrot']
positions['PlayerCoordinateYrot2'] = 25 - positions['PlayerCoordinateYrot']
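
The two transformations can be checked on the corner squares with a small sketch in plain Python. The function names `rotate` and `mirror` are only for illustration; the formulas are the ones used on the dataframe columns above.

```python
FIELD_WIDTH = 26   # squares along the original X axis
FIELD_HEIGHT = 15  # squares along the original Y axis


def rotate(x, y):
    """Rotate a board coordinate: (a, b) becomes (14 - b, a)."""
    return FIELD_HEIGHT - 1 - y, x


def mirror(x_rot, y_rot):
    """Swap sides on the rotated board: y becomes 25 - y, x is unchanged."""
    return x_rot, FIELD_WIDTH - 1 - y_rot


print(rotate(0, 0))            # (14, 0)
print(rotate(25, 14))          # (0, 25)
print(mirror(*rotate(0, 0)))   # (14, 25)
```

Mirroring twice gives back the original square, which is a quick sanity check on the formula.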

Display all the icon images. Check that they all have 4 columns.
We need the first and third column.
We use the red icons to display offense setups, and blue icons to display defense setups.
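
Picking a column out of an icon sheet comes down to computing a crop box. A minimal sketch, assuming square icons laid out in 4 columns with the wanted icon in the top row; the helper name `icon_crop_box` is hypothetical.

```python
def icon_crop_box(sheet_width, column, n_columns=4):
    """Return the (left, upper, right, lower) box of the top icon in a column."""
    size = sheet_width // n_columns  # icons are assumed to be square
    left = column * size
    return (left, 0, left + size, size)


# Example for a sheet that is 112 pixels wide (4 icons of 28 pixels).
print(icon_crop_box(112, 0))  # first column
print(icon_crop_box(112, 2))  # third column
```

With Pillow the box can be passed straight to `Image.crop`, for example `icon.crop(icon_crop_box(icon.size[0], 2))`.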

Figure out who chooses to receive.
The team that receives is the offensive team; the kicking team sets up first and is the defensive team.

It looks like the default is `gameSetHomeFirstOffense` set to 0 (i.e. Away has first offense), and only if `gameSetHomeFirstOffense` is set to 1 does the home team receive. We use a clunky way to check this: we test whether the `gameSetHomeFirstOffense` command is present during the startGame sequence.

In [None]:
receiving_team = determine_receiving_team_at_start(df)
receiving_team
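
The helper itself lives in `kickoff_formations.py` and is not shown here. As a rough idea of the logic described above, a hypothetical stand-in could look like this; it works on plain `(modelChangeId, modelChangeValue)` pairs from the start of the game rather than on the dataframe, and the sample values are made up.

```python
def receiving_team_sketch(model_changes):
    """Return 'home' or 'away' given (modelChangeId, modelChangeValue) pairs."""
    home_first_offense = False  # default: Away has first offense
    for change_id, value in model_changes:
        if change_id == "gameSetHomeFirstOffense":
            home_first_offense = bool(value)
    return "home" if home_first_offense else "away"


# Made-up start-of-game sequences for illustration.
print(receiving_team_sketch([("gameSetTurnMode", "startGame")]))
print(receiving_team_sketch([("gameSetHomeFirstOffense", 1)]))
```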

In [None]:
display_all_iconsets = 0

if display_all_iconsets:
    for i in range(len(positions)):
        icon_path = positions.iloc[i]['icon_path']
        icon = Image.open(urlopen(icon_path))
        display(icon)
        #print(icon.size)

# Main task: plot all replays

Processing all replays takes about 67 minutes.

In [None]:
%run -i kickoff_formations.py

fullrun = 1

if fullrun:
    replay_ids = []  # renamed from `id` to avoid shadowing the builtin
    match_ids = []
    team_ids_defense = []
    race_defense = []
    race_offense = []
    team_ids_offense = []

    for replay_id in tilean_replays:
        replay_id, match_id, team_id_defensive, race_defensive, \
            team_id_offensive, race_offensive = process_replay(replay_id, df_matches, refresh = True)
        replay_ids.append(int(replay_id))
        match_ids.append(int(match_id))
        team_ids_defense.append(int(team_id_defensive))
        team_ids_offense.append(int(team_id_offensive))
        race_defense.append(race_defensive)
        race_offense.append(race_offensive)

    df_replays = pd.DataFrame( {"matchId": match_ids,
                                "replayId": replay_ids,
                                "teamIdOffense": team_ids_offense,
                                "raceOffense": race_offense,
                                "teamIdDefense": team_ids_defense,
                                "raceDefense": race_defense})
    # target = 'kickoff_pngs/df_replays'

    # df_replays.to_hdf(target + '.h5', key='df_replays', mode='w', format = 't',  complevel = 9)
    # df_replays.to_csv(target + '.csv')
else:
    # FUMBBL matches
    target = 'kickoff_pngs/df_replays.h5'
    df_replays = pd.read_hdf(target)  

In [None]:
df_replays.query("matchId == 4449675")

# Selecting skilled coaches

https://fumbbl.com/note/christer/CR
https://github.com/hardingnj/NAF


# References

* Planning in the midst of chaos: how a stochastic Blood Bowl model can help to identify key planning features

* STARDATA: A StarCraft AI Research Dataset (Lin et al. 2017)

To do:

* check whether offensive team has frenzy
* skill colors (square color borders with skill rings)
* add coach rating
* fix player plot order for human match