# Optimal faction choice in Terra Mystica

## The important bits:

The web app can be accessed [here](https://tmmodel.azurewebsites.net/)

Code is [here](https://github.com/ianepreston/terra-mystica-models)

## Introduction

[Terra Mystica](https://boardgamegeek.com/boardgame/120677/terra-mystica) is a strategic board game. Players pick one of fourteen possible factions and then take turns controlling and developing territory to get victory points. In addition to the physical board version, there is a popular online implementation hosted [here](https://terra.snellman.net/).

The game starts by randomly drawing score and bonus tiles, after which players can select their factions. Each faction has distinct abilities that can be enhanced or diminished by the score and bonus tiles. The idea behind this project is to make a model that will recommend a faction based on which score and bonus tiles have been drawn. Snellman has made [data on every game that's been played on the platform](https://terra.snellman.net/data/events/) available, which allowed me to develop a statistical model and then serve up recommendations based on it. This post will describe the process I followed to generate the final product. It will be fairly light on code in the post itself, but there will be links to the relevant notebooks and code throughout if you want to dig deeper.

## Setting up for the project

I used Python for this project, and to create the initial file and folder structure I used the [cookie cutter data science](https://drivendata.github.io/cookiecutter-data-science/) cookie cutter template with a couple of modifications. At the time I started the project I actually based it off their [Azure pipelines branch](https://github.com/drivendata/cookiecutter-data-science/tree/azure-pipelines) since it had some nice updated features, although depending on when you're reading this that might have been merged into the main branch. This gave me a directory for package code, notebooks, reference material, a Makefile, documentation, data and other artifacts. I also added pre-commit hooks that would run my code through [black](https://black.readthedocs.io/en/stable/) and lint it with [flake8](https://flake8.pycqa.org/en/latest/) before I could commit anything.

## Getting the data


### Download the JSON

Like any good data project, the first thing I needed to do was get data. As mentioned in the intro, each month's worth of games is saved as a JSON file on the Snellman page. I used [requests](https://requests.readthedocs.io/en/master/) and [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to scrape and then download all the JSON files listed at https://terra.snellman.net/data/events/. The full code is available [here](https://github.com/ianepreston/terra-mystica-models/blob/master/terra_mystica_models/data/download_dataset.py). Note that by default the function only downloads JSON up to the month that was available when I started the project. I pinned that by default for reproducibility, but a more productionized implementation would need to be able to download new games and retrain the model on them as they came out.
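As a rough illustration of the scraping step, here is a minimal sketch of pulling the JSON links out of the index page and applying a month cutoff. It is not the code from the repository: to stay dependency-free it swaps Beautiful Soup for the standard library's `html.parser`, and the names `JsonLinkParser`, `json_urls` and `last_month` are made up for this example. It also assumes the links end in `.json` and that file names start with a `YYYY-MM` prefix, which may not match the real listing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

EVENTS_URL = "https://terra.snellman.net/data/events/"


class JsonLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .json file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: the monthly game files are linked with a .json suffix.
        if href and href.endswith(".json"):
            self.links.append(href)


def json_urls(index_html, base_url=EVENTS_URL, last_month=None):
    """Return absolute URLs of the JSON files linked from the index page.

    last_month (hypothetical parameter) pins the newest file to download,
    assuming file names begin with a sortable YYYY-MM prefix.
    """
    parser = JsonLinkParser()
    parser.feed(index_html)
    urls = sorted(urljoin(base_url, href) for href in parser.links)
    if last_month is not None:
        urls = [
            url
            for url in urls
            if url.rsplit("/", 1)[-1][: len(last_month)] <= last_month
        ]
    return urls
```

In practice the index HTML would come from something like `requests.get(EVENTS_URL).text`, and each returned URL would then be downloaded and written to disk. Keeping the link extraction as a pure function makes the cutoff logic easy to check without hitting the network.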

### Parse the JSON

After the download finished, the next step was to read in all the game data and structure it in a way that would be amenable to modeling. There is a [Readme](https://terra.snellman.net/data/events/README) for the JSON schema, but I found it wasn't detailed enough for my purposes, and there were a couple of spots where it was incorrect. The end result was that I spent a lot of time poking around the JSON before I ended up creating a class representing a Terra Mystica game. The notebooks related to exploring and understanding the JSON are the first 5 in the [notebooks](https://github.com/ianepreston/terra-mystica-models/tree/master/notebooks) folder of the repository: [Book 1](https://github.com/ianepreston/terra-mystica-models/blob/master/notebooks/01_ip_explore-json.ipynb), [Book 2](https://github.com/ianepreston/terra-mystica-models/blob/master/notebooks/02_ip_parse-json_testing.ipynb), [Book 3](https://github.com/ianepreston/terra-mystica-models/blob/master/notebooks/03_ip_more_json_parsing.ipynb), [Book 4](https://github.com/ianepreston/terra-mystica-models/blob/master/notebooks/04_ip_even_more_json_parsing.ipynb) and [Book 5](https://github.com/ianepreston/terra-mystica-models/blob/master/notebooks/05_ip_reparse_json.ipynb). The code that I ended up using to turn the JSON games into a Python class and then a [pandas](https://pandas.pydata.org/) dataframe is [here](https://github.com/ianepreston/terra-mystica-models/blob/master/terra_mystica_models/data/make_dataset.py).
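To give a feel for the "JSON to class to dataframe" shape of this step, here is a minimal sketch. Everything in it is a stand-in: the `Game` class, its fields, and the JSON keys (`game`, `score_tiles`, `bonus_tiles`, `final_scores`) are hypothetical and do not reflect the actual Snellman schema or the class in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Game:
    """Minimal stand-in for a parsed game; all field names are hypothetical."""

    game_id: str
    score_tiles: list
    bonus_tiles: list
    final_scores: dict = field(default_factory=dict)  # faction -> victory points

    @classmethod
    def from_json(cls, raw):
        """Build a Game from one decoded JSON object (keys are assumed)."""
        return cls(
            game_id=raw["game"],
            score_tiles=raw["score_tiles"],
            bonus_tiles=raw["bonus_tiles"],
            final_scores=raw["final_scores"],
        )

    def to_rows(self):
        """Return one flat dict per faction that played in this game."""
        return [
            {
                "game_id": self.game_id,
                "faction": faction,
                "vp": vp,
                "score_tiles": tuple(self.score_tiles),
                "bonus_tiles": tuple(self.bonus_tiles),
            }
            for faction, vp in self.final_scores.items()
        ]
```

Collecting the rows from every game into one list and passing it to `pandas.DataFrame(rows)` would then give a table with one row per faction per game, which is a convenient layout for modeling.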

## Manage workflow

The data I started with was just under 3 GB of JSON. Processing all of that takes a decent amount of time. Later steps were much faster but still took an appreciable amount of time. With that in mind I didn't want to re-run earlier steps in the process unless I had to. To manage my workflow I ended up using [d6tflow](https://d6tflow.readthedocs.io/en/latest/), which allowed me to define tasks along with their parameters and dependencies. This meant I could quickly iterate through the process, only repeating earlier steps if I needed to actually change something. I'm quite happy with d6tflow. The documentation is a little sparse, but it's definitely more tailor-made for this sort of workflow than a more ETL-focused tool like [Airflow](https://airflow.apache.org/) or [Luigi](https://github.com/spotify/luigi), the latter of which d6tflow extends.
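The core idea behind this kind of workflow tool can be sketched in a few lines of plain Python. This is not d6tflow's API, just a stripped-down illustration of what it automates: each step saves its output, and a step whose output already exists is skipped on the next run. The helper name `run_cached` and the choice of JSON files for storage are assumptions made for the example.

```python
import json
from pathlib import Path


def run_cached(name, func, cache_dir, *deps):
    """Run func(*deps) once and reuse the saved JSON result afterwards.

    name identifies the step; its result is stored at <cache_dir>/<name>.json.
    deps are the already-computed outputs of the steps this one depends on.
    """
    target = Path(cache_dir) / f"{name}.json"
    if target.exists():
        # Output from an earlier run is present, so skip the expensive work.
        return json.loads(target.read_text())
    result = func(*deps)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(result))
    return result
```

A slow parsing step wrapped this way runs once, and later steps can be changed and re-run without repeating it; deleting the cached file forces a rerun. A real tool like d6tflow adds the parts this sketch leaves out, such as tracking task parameters and resolving dependencies between tasks.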

## Explore the data

After loading all the game data, the next step was to explore and understand it.

## Next steps

Other things I could still do with this:

* Retrain the model on updated data
* Consider other faction choices for situations where you're not picking faction first
* Try alternate models - compare rankings between them. Pure predictive power isn't really the goal here, but it would be interesting if different implementations resulted in different rankings.
* Make the web app work on mobile