Mess around with reading in the JSON files and figure out a schema to use for the data.

In [1]:
import re
import json
from pathlib import Path
import pandas as pd

In [2]:
json_dir = Path(".").resolve().parent / "data" / "raw"

Just load a random file and poke around at it a bit to get a sense of the data

In [3]:
eg_json = json_dir / "2014-01.json"
with open(eg_json) as f:
    data = json.load(f)

In [4]:
len(data)

393

According to the schema, each item in the list is supposed to be a game, so it looks like we have 393 games for this month.

Let's poke around a bit more

In [5]:
data[0].keys()

dict_keys(['last_update', 'events', 'player_count', 'factions', 'base_map', 'game'])
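Before relying on these keys for a schema, it may be worth checking that every record in a file actually has them. A minimal sketch, assuming the six keys above are the full expected set; `EXPECTED_KEYS`, `check_games`, and the `sample` records are made up for illustration:

```python
# Top-level keys seen in the first game record (assumed to be the full set)
EXPECTED_KEYS = {"last_update", "events", "player_count", "factions", "base_map", "game"}

def check_games(games):
    """Return (index, missing_keys) for every record lacking an expected key."""
    problems = []
    for i, game in enumerate(games):
        missing = EXPECTED_KEYS - set(game)
        if missing:
            problems.append((i, sorted(missing)))
    return problems

# Hypothetical records: the first is complete, the second is not
sample = [
    {"last_update": "2014-01-31 17:14:14", "events": {}, "player_count": 3,
     "factions": [], "base_map": "abc", "game": "01Week04"},
    {"game": "broken", "events": {}},
]
print(check_games(sample))
```

On the real data this would be called as `check_games(data)`; an empty list means every game has the expected keys.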

In [6]:
data[0]["game"]

'01Week04'

In [7]:
for i in range(20,30):
    print(data[i]["game"])

aaa1
aaa2
ABBABBA
AerthBattle
AfternoonTea
aGame
AGE01
allehoppa
AllenamentoIvanDrago
altazor


In [8]:
data[0]["last_update"]

'2014-01-31 17:14:14'

In [9]:
data[0]["player_count"]

3

In [10]:
data[0]["factions"]

[{'player': 'EmanuelU', 'faction': 'halflings'},
 {'player': 'Nidhoegg', 'faction': 'alchemists'},
 {'player': 'affe1982', 'faction': 'dwarves'}]

In [11]:
data[0]["base_map"]

'126fe960806d587c78546b30f1a90853b1ada468'

In [12]:
event_eg = data[0]["events"]

In [13]:
event_eg.keys()

dict_keys(['global', 'location', 'faction'])

`location` wasn't listed in the schema, so let's take a look at it

In [14]:
event_eg["location"]

{'halflings': {'round': {'6': ['E4'],
   'all': ['I8', 'E6', 'H4', 'F3', 'E5', 'D3', 'B2', 'I7', 'C1', 'E4'],
   '5': ['B2', 'I7', 'C1'],
   '0': ['I8', 'E6'],
   '4': ['E5', 'D3'],
   '3': ['H4', 'F3']}},
 'alchemists': {'round': {'6': ['E11'],
   'all': ['G5',
    'E10',
    'F7',
    'G6',
    'G4',
    'H5',
    'D8',
    'E8',
    'D5',
    'C3',
    'B3',
    'E11'],
   '5': ['C3', 'B3'],
   '3': ['D8'],
   '2': ['G4', 'H5'],
   '0': ['G5', 'E10'],
   '1': ['F7', 'G6'],
   '4': ['E8', 'D5']}},
 'dwarves': {'round': {'6': ['G2', 'G1'],
   'all': ['F6',
    'H6',
    'E9',
    'D7',
    'C5',
    'I9',
    'H8',
    'G3',
    'E7',
    'C4',
    'G2',
    'G1'],
   '5': ['G3', 'E7', 'C4'],
   '0': ['F6', 'H6'],
   '2': ['E9', 'D7'],
   '3': ['C5', 'I9', 'H8']}}}

OK, I don't think I need that for now, but I'll update the schema text so it's noted

In [15]:
event_eg["faction"]["alchemists"]["vp"]["round"]["all"]

109

Well that's a pretty deeply nested place to get victory points, should be fun
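Since that lookup chains five keys, a small helper that returns `None` instead of raising `KeyError` might make the dataset build less fragile. This is only a sketch: `total_vp` is a hypothetical name, and it assumes every game follows the `faction -> <name> -> vp -> round -> all` layout seen in this one example.

```python
def total_vp(events, faction):
    """Return the faction's total victory points, or None if any level is missing."""
    node = events.get("faction", {})
    for key in (faction, "vp", "round", "all"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Trimmed-down stand-in for event_eg, using the value seen above
sample_events = {"faction": {"alchemists": {"vp": {"round": {"all": 109}}}}}
print(total_vp(sample_events, "alchemists"))
print(total_vp(sample_events, "dwarves"))
```

With the notebook's variables this would be `total_vp(event_eg, "alchemists")`.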

In [16]:
event_eg["global"]

{'option-email-notify': {'round': {'all': 1, '0': 1}},
 'option-shipping-bonus': {'round': {'all': 1, '0': 1}},
 'SCORE1': {'round': {'all': 1, '4': 1}},
 'option-mini-expansion-1': {'round': {'all': 1, '0': 1}},
 'option-errata-cultist-power': {'round': {'all': 1, '0': 1}},
 'faction-count': {'round': {'all': 3, '0': 3}},
 'SCORE5': {'round': {'all': 1, '3': 1}},
 'option-strict-leech': {'round': {'all': 1, '0': 1}},
 'SCORE7': {'round': {'all': 1, '2': 1}},
 'SCORE8': {'round': {'6': 1, 'all': 1}},
 'SCORE3': {'round': {'all': 1, '5': 1}},
 'SCORE2': {'round': {'all': 1, '1': 1}}}

OK, so to know which scoring tiles were used in the game I have to check for `SCORE#` entries in the keys of "global". I don't fully get the nested schema below that: `"all"` seems like it's always going to be 1, and the other key just tells me which round the tile was actually in. So in this example I read it as  
Round 1: Scoring tile 2  
Round 2: Scoring tile 7  
Round 3: Scoring tile 5  
Round 4: Scoring tile 1  
Round 5: Scoring tile 3  
Round 6: Scoring tile 8  
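The reading above can be sketched as a small function. It assumes each `SCORE#` entry has exactly one round key besides `"all"`, which holds for this game but hasn't been checked across the dataset; `scoring_tiles_by_round` and `sample_global` are illustrative names.

```python
import re

def scoring_tiles_by_round(global_events):
    """Map round number -> scoring tile key (e.g. 'SCORE2'), using the
    non-'all' key under each SCORE* entry's 'round' dict."""
    tiles = {}
    for key, value in global_events.items():
        if not re.fullmatch(r"SCORE\d+", key):
            continue  # skip options, faction-count, etc.
        for rnd in value.get("round", {}):
            if rnd != "all":
                tiles[int(rnd)] = key
    return dict(sorted(tiles.items()))

# Subset of the "global" dict shown above
sample_global = {
    "SCORE1": {"round": {"all": 1, "4": 1}},
    "SCORE5": {"round": {"all": 1, "3": 1}},
    "SCORE7": {"round": {"all": 1, "2": 1}},
    "SCORE8": {"round": {"6": 1, "all": 1}},
    "SCORE3": {"round": {"all": 1, "5": 1}},
    "SCORE2": {"round": {"all": 1, "1": 1}},
    "faction-count": {"round": {"all": 3, "0": 3}},
}
print(scoring_tiles_by_round(sample_global))
```

On the real data the call would be `scoring_tiles_by_round(event_eg["global"])`.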

In [17]:
data[0]["factions"]

[{'player': 'EmanuelU', 'faction': 'halflings'},
 {'player': 'Nidhoegg', 'faction': 'alchemists'},
 {'player': 'affe1982', 'faction': 'dwarves'}]

I think I should have x + 2 = 5 passing tiles for this game (x being the player count, 3 here), so let's figure out what they are

In [18]:
event_eg["faction"]["all"].keys()

dict_keys(['favor:FAV3', 'upgrade:SH', 'vp', 'action:ACT4', 'town:TW1', 'town:TW2', 'pass:BON4', 'action:ACT2', 'decline:pw', 'advance:dig', 'action:BON1', 'leech:count', 'action:ACT1', 'pass:BON1', 'favor:FAV7', 'favor:any', 'burn', 'advance:ship', 'order:2', 'order:1', 'upgrade:TP', 'favor:FAV11', 'action:ACT6', 'bridge', 'favor:FAV5', 'decline:count', 'pass:BON6', 'action:ACT5', 'dig', 'leech:pw', 'town:any', 'pass:BON3', 'order:3', 'upgrade:TE', 'pass:BON10', 'town:TW5', 'pass:BON7', 'upgrade:SA', 'favor:FAV10', 'build:D', 'action:ACT3'])

If I'm reading this right, to get the passing tiles for the game I'd have to go through these keys and find the ones of the form "pass:BON#" or "action:BON#". Let's see if that makes sense

In [19]:
fact_eg = event_eg["faction"]

In [20]:
rgx = re.compile(r":(BON\d+)")  # \d+ so that two-digit tiles such as BON10 aren't truncated to BON1
# The walrus operator would make this neater on Python 3.8+, but trying to get everything else working there sent me to dependency hell. Oh well
# set(m.group(1) for x in fact_eg["all"].keys() if (m := rgx.search(x)))
set(m.group(1) for m in map(rgx.search, fact_eg["all"].keys()) if m)

{'BON1', 'BON10', 'BON3', 'BON4', 'BON6', 'BON7'}

That's six tiles rather than the five I guessed, so the count looks like player count + 3 here. The risk is that if a bonus tile is available but is never played in the game, I won't be able to identify it from this. Something to validate when I'm building the dataset.
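One way to run that validation is to compare the number of tiles observed with the number expected for the player count. The sketch below is hedged: `bonus_tiles`, `missing_tile_count`, and the `offset` parameter are hypothetical, and the right offset should be confirmed against the game rules rather than taken from here. Because the key list above includes `pass:BON10`, the pattern uses `\d+` so two-digit tile ids are kept whole.

```python
import re

BON_RE = re.compile(r":(BON\d+)$")  # \d+ keeps ids such as BON10 intact

def bonus_tiles(all_keys):
    """Collect the distinct BON# ids found in keys like 'pass:BON4' or 'action:BON1'."""
    return {m.group(1) for m in map(BON_RE.search, all_keys) if m}

def missing_tile_count(all_keys, player_count, offset):
    """Tiles apparently never observed, assuming expected = player_count + offset."""
    return player_count + offset - len(bonus_tiles(all_keys))

# Bonus-related keys copied from the example game, plus two unrelated ones
keys = ["pass:BON4", "action:BON1", "pass:BON1", "pass:BON6",
        "pass:BON3", "pass:BON10", "pass:BON7", "vp", "burn"]
print(sorted(bonus_tiles(keys)))
print(missing_tile_count(keys, player_count=3, offset=3))
```

A positive result from `missing_tile_count` would flag games where some available tile never shows up in the events.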