Let's try reading in a few games and pulling out the data I need. Once I find a structure I like, I can refactor it into some functions/classes.

In [1]:
import json
import re
import pandas as pd
from pathlib import Path

In [2]:
json_dir = Path(".").resolve().parent / "data" / "raw"

In [3]:
eg_json = json_dir / "2014-01.json"
with open(eg_json) as f:
    data = json.load(f)

Let's start with the bare minimum. For every game I need to know which scoring tiles were selected, which passing tiles were selected, the factions, and their final victory points. There are lots of other features that might be useful, but let's try building a minimum viable model first.

From looking at the rules there should be 8 possible scoring tiles, 9 passing tiles, and 14 factions. Let's go through this one month of data first and make sure I find exactly that many of each.

In [4]:
scoring_tiles = set()
passing_tiles = set()
factions = set()
score_rgx = re.compile(r"SCORE\d+")
pass_rgx = re.compile(r":(BON\d+)")

for game in data:
    for faction in game["factions"]:
        factions.add(faction["faction"])
    event = game["events"]
    for key in event["global"].keys():
        if score_rgx.search(key):
            scoring_tiles.add(key)
    all_faction = event["faction"]["all"]
    for key in all_faction.keys():
        # Search once and reuse the match; group(1) is the captured BONn name
        match = pass_rgx.search(key)
        if match:
            passing_tiles.add(match.group(1))

In [5]:
scoring_tiles

{'SCORE1',
 'SCORE2',
 'SCORE3',
 'SCORE4',
 'SCORE5',
 'SCORE6',
 'SCORE7',
 'SCORE8'}

In [6]:
passing_tiles

{'BON1',
 'BON10',
 'BON2',
 'BON3',
 'BON4',
 'BON5',
 'BON6',
 'BON7',
 'BON8',
 'BON9'}

In [7]:
factions

{'alchemists',
 'auren',
 'chaosmagicians',
 'cultists',
 'darklings',
 'dwarves',
 'engineers',
 'fakirs',
 'giants',
 'halflings',
 'mermaids',
 'nomads',
 'swarmlings',
 'witches'}

In [8]:
len(factions)

14

Well, that's some pretty garbage code, but it ran nice and quick on one month, so it's probably worth running it over every file to avoid surprises later.

In [9]:
scoring_tiles = set()
passing_tiles = set()
factions = set()
score_rgx = re.compile(r"SCORE\d+")
pass_rgx = re.compile(r":(BON\d+)")

for json_file in json_dir.glob("*.json"):
    with open(json_file) as f:
        data = json.load(f) 
    for game in data:
        for faction in game["factions"]:
            factions.add(faction["faction"])
        event = game["events"]
        for key in event["global"].keys():
            if score_rgx.search(key):
                scoring_tiles.add(key)
        all_faction = event["faction"]["all"]
        for key in all_faction.keys():
            # Search once and reuse the match; group(1) is the captured BONn name
            match = pass_rgx.search(key)
            if match:
                passing_tiles.add(match.group(1))
# Report any count that differs from what the single-month check found
if len(scoring_tiles) != 8:
    print("Scoring tiles found != 8")
if len(passing_tiles) != 10:
    print("passing tiles found != 10")
if len(factions) != 14:
    print("factions found != 14")

Scoring tiles found != 8
factions found != 14


In [10]:
scoring_tiles

{'SCORE1',
 'SCORE2',
 'SCORE3',
 'SCORE4',
 'SCORE5',
 'SCORE6',
 'SCORE7',
 'SCORE8',
 'SCORE9'}

Oh cool, a ninth scoring tile. I wonder what SCORE9 is?
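One way to get a feel for an unexpected tile like SCORE9 would be to count how many games each scoring tile shows up in. This is only a rough sketch: `count_scoring_tiles` is a hypothetical helper, and `sample_games` is made-up data shaped like the `game["events"]["global"]` structure used in the cells above, not real games.

```python
import re
from collections import Counter

score_rgx = re.compile(r"SCORE\d+")


def count_scoring_tiles(games):
    """Count how many games each SCOREn key appears in (hypothetical helper)."""
    counts = Counter()
    for game in games:
        global_events = game["events"]["global"]
        # Each key appears at most once per game, so this counts games per tile
        counts.update(key for key in global_events if score_rgx.search(key))
    return counts


# Made-up games for illustration only; real games carry many more keys
sample_games = [
    {"game": "g1", "events": {"global": {"SCORE1": {}, "SCORE2": {}}}},
    {"game": "g2", "events": {"global": {"SCORE1": {}, "SCORE9": {}}}},
]

counts = count_scoring_tiles(sample_games)
print(counts.most_common())
```

Run over the real files, a very low count for SCORE9 would suggest it only appears in a particular subset of games.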

In [11]:
factions

{'acolytes',
 'alchemists',
 'auren',
 'chaosmagicians',
 'cultists',
 'darklings',
 'dragonlords',
 'dwarves',
 'engineers',
 'fakirs',
 'giants',
 'halflings',
 'icemaidens',
 'mermaids',
 'nofaction1',
 'nofaction2',
 'nofaction3',
 'nofaction4',
 'nofaction5',
 'nofaction7',
 'nomads',
 'riverwalkers',
 'shapeshifters',
 'swarmlings',
 'witches',
 'yetis'}

In [12]:
len(factions)

26

OK, so on top of the 14 factions I expected we have acolytes, dragonlords, icemaidens, riverwalkers, shapeshifters, and yetis, plus whatever nofaction1/2/3/4/5/7 is supposed to mean.
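Rather than comparing the two printed sets by eye, the extras can be pulled out with a set difference. The sketch below just hard-codes the names from the two outputs above; `base_factions`, `all_factions`, and the other variable names are made up for illustration.

```python
# The 14 factions found in the single-month check
base_factions = {
    "alchemists", "auren", "chaosmagicians", "cultists", "darklings",
    "dwarves", "engineers", "fakirs", "giants", "halflings",
    "mermaids", "nomads", "swarmlings", "witches",
}

# Everything found across all files: the 14 above plus the new names
all_factions = base_factions | {
    "acolytes", "dragonlords", "icemaidens", "riverwalkers",
    "shapeshifters", "yetis",
    "nofaction1", "nofaction2", "nofaction3",
    "nofaction4", "nofaction5", "nofaction7",
}

# Names that only show up in the full run
extra = all_factions - base_factions

# Split the unexplained nofactionN entries from the named factions
nofaction_names = sorted(name for name in extra if name.startswith("nofaction"))
new_factions = sorted(extra - set(nofaction_names))

print(new_factions)
print(nofaction_names)
```

That splits the 12 unexpected names into 6 named factions and 6 `nofactionN` entries, which will probably need different handling.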

One last thing I want to check before I build this. I think the game names are supposed to be unique, but let's just check that.

In [13]:
game_names = set()
num_games = 0
for json_file in json_dir.glob("*.json"):
    with open(json_file) as f:
        data = json.load(f) 
    for game in data:
        game_names.add(game["game"])
        num_games += 1
assert len(game_names) == num_games

Ok, at least game names are indeed unique
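Since the names are unique, they could serve as the key for a per-game record. Here is a minimal sketch of what that might look like, assuming the same structure the cells above rely on. `summarize_game` and `index_games` are hypothetical helpers, the sample games and the `pass:BONn` key format are made up, and victory points are left out because I haven't looked at where they live in the JSON yet.

```python
import re

score_rgx = re.compile(r"SCORE\d+")
pass_rgx = re.compile(r":(BON\d+)")


def summarize_game(game):
    """Collect scoring tiles, passing tiles and factions for one game."""
    events = game["events"]
    scoring = sorted(key for key in events["global"] if score_rgx.search(key))
    passing = set()
    for key in events["faction"]["all"]:
        match = pass_rgx.search(key)
        if match:
            passing.add(match.group(1))
    factions = sorted(entry["faction"] for entry in game["factions"])
    return {
        "scoring_tiles": scoring,
        "passing_tiles": sorted(passing),
        "factions": factions,
    }


def index_games(games):
    """Map each (unique) game name to its summary record."""
    return {game["game"]: summarize_game(game) for game in games}


# Made-up game for illustration; the "pass:BONn" key format is an assumption
sample_games = [
    {
        "game": "g1",
        "factions": [{"faction": "witches"}, {"faction": "dwarves"}],
        "events": {
            "global": {"SCORE1": {}, "SCORE4": {}},
            "faction": {"all": {"pass:BON1": {}, "pass:BON4": {}}},
        },
    },
]

index = index_games(sample_games)
print(index["g1"])
```

A dict of plain records like this should also be easy to hand to `pd.DataFrame.from_dict(index, orient="index")` later on.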

Oh, yeah, I was going to check if "drop-faction" was anywhere. Sure would have been faster to do that the last time I iterated through this whole pile of data, oh well

In [14]:
drop_game_names = set()
num_drop_games = 0
for json_file in json_dir.glob("*.json"):
    with open(json_file) as f:
        data = json.load(f) 
    for game in data:
        if "drop-faction" in game["events"]["global"].keys():
            drop_game_names.add(game["game"])
            num_drop_games += 1

In [15]:
num_drop_games

23531

OK, 23,531 games have a player dropping, which is a fair number. I'll have to account for that somehow. Maybe just exclude them?
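If excluding them turns out to be the way to go, something like the sketch below might do, and it would also save re-writing the file loop for every check. `iter_games`, `has_drop`, and `games_without_drops` are hypothetical helpers, and the demo writes a made-up JSON file to a temporary directory instead of touching the real data.

```python
import json
import tempfile
from pathlib import Path


def iter_games(json_dir):
    """Yield every game from every *.json file in json_dir, one file at a time."""
    for json_file in sorted(Path(json_dir).glob("*.json")):
        with open(json_file) as f:
            yield from json.load(f)


def has_drop(game):
    """True if the game's global events contain a "drop-faction" key."""
    return "drop-faction" in game["events"]["global"]


def games_without_drops(games):
    """Lazily filter out games in which a player dropped."""
    return (game for game in games if not has_drop(game))


# Demo on made-up data written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = [
        {"game": "g1", "events": {"global": {"SCORE1": {}}}},
        {"game": "g2", "events": {"global": {"drop-faction": {}}}},
    ]
    (Path(tmp) / "sample.json").write_text(json.dumps(sample))
    kept = [game["game"] for game in games_without_drops(iter_games(tmp))]

print(kept)
```

Because both helpers are generators, only one file's worth of games is in memory at a time, and several checks can share the same pass over the data.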