# JSON tutorial

Last time we learned about `pickle`, a Python-specific serialization format. It is often used "under the hood" (e.g. in Spark) to serialize data and functions, but because only Python can read it, it is rarely used beyond that.

One important serialization format that IS widely used for data interchange is JSON (short for JavaScript Object Notation; Python's standard-library module for it is `json`). It is popular because it is simple and stores data as human-readable text. Although it grew out of the JavaScript language, JSON is supported by virtually every major programming language.
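As a quick illustration of the serialize/deserialize round trip, here is a minimal sketch using Python's standard-library `json` module. The `record` dict is a made-up example, not data from this tutorial.

```python
import json

# A small, made-up record mixing the value types JSON supports
record = {"title": "example", "rounds": 3, "finished": True, "scores": [1, 0, 2], "notes": None}

# Serialize: Python objects -> JSON text (a str)
text = json.dumps(record)
print(text)  # note that True becomes true and None becomes null in the JSON text

# Deserialize: JSON text -> Python objects
restored = json.loads(text)
print(restored == record)  # True
```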

## File navigation in Python

In order to read one of the json files into Python, we need to learn a little about how to navigate the filesystem in Python.  First, we could "hardcode" our file path like this:

In [1]:
jsonpath = 'cwl-data/data/structured/structured-2018-01-14-neworleans/structured-1515984523-6592b573-b485-58b0-963e-6be0b4d02f6c.json'

But that's not particularly robust. For example, Windows uses `\` rather than `/` as its native directory separator, so a hardcoded `/`-separated path like this one is not portable across operating systems.

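Python's `os.path` module has some nifty functions that avoid this cross-platform nastiness. For example, `os.path.join` joins path components using the separator of whatever operating system the code runs on: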

In [2]:
import os
datadir = os.path.join('./', 'cwl-data', 'data', 'structured', 'structured-2018-01-14-neworleans')

jsonfile = 'structured-1515984523-6592b573-b485-58b0-963e-6be0b4d02f6c.json'

jsonpath = os.path.join(datadir, jsonfile)
print(jsonpath)

./cwl-data/data/structured/structured-2018-01-14-neworleans/structured-1515984523-6592b573-b485-58b0-963e-6be0b4d02f6c.json
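The standard-library `pathlib` module offers an object-oriented alternative to `os.path.join`: the `/` operator on a `Path` joins components with the current platform's separator. A minimal sketch that rebuilds the same path (it only constructs the path; it does not check that the file exists):

```python
from pathlib import Path

# Build the same path as above from its components
datadir = Path('cwl-data') / 'data' / 'structured' / 'structured-2018-01-14-neworleans'
jsonpath = datadir / 'structured-1515984523-6592b573-b485-58b0-963e-6be0b4d02f6c.json'

print(jsonpath)         # separators follow the current OS
print(jsonpath.name)    # just the file name
print(jsonpath.suffix)  # .json
```

A `Path` object can be passed directly to `open()`, so it can replace the string path in the next cell.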


We can open the file and read the raw data:

In [4]:
# open the file
with open(jsonpath, 'r') as f:
    rawdata = f.read()

In [5]:
type(rawdata)

str

In [7]:
rawdata

'{"title": "ww2", "platform": "ps4", "id": "6592b573-b485-58b0-963e-6be0b4d02f6c", "series_id": "champs-grand-finals-1", "start_time_s": 1515984053, "end_time_s": 1515984523, "duration_ms": 470000, "mode": "Search & Destroy", "map": "Sainte Marie du Mont", "rounds": 9, "teams": [{"name": "TEAM KALIBER", "score": 6, "is_victor": true, "round_scores": [1, 1, 0, 1, 1, 0, 0, 1, 1], "side": "home"}, {"name": "LUMINOSITY GAMING", "score": 3, "is_victor": false, "round_scores": [0, 0, 1, 0, 0, 1, 1, 0, 0], "side": "away"}], "players": [{"name": "ACCURACY", "team": "TEAM KALIBER", "kills": 8, "deaths": 4, "kd": 2.0, "kills_per_10min": 10.2, "deaths_per_10min": 5.1, "assists": 8, "headshots": 2, "suicides": 0, "team_kills": 0, "team_deaths": 0, "stayed_alive_kills": 8, "hits": 53, "shots": 195, "accuracy": 27.2, "num_lives": 9, "time_alive_s": 398.2, "avg_time_per_life_s": 44.2, "fave_weapon": "BAR", "fave_division": "Armored", "fave_training": "Scoped", "fave_scorestreaks": ["Fighter Pilot", "

## Deserialization

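We have the raw text read in as a string, but we want to "unpack" it into Python objects (dicts, lists, strings, numbers). This step is called "deserialization". Here we use `ujson` (UltraJSON), a fast third-party JSON library whose `loads` function works like the standard library's `json.loads`: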

In [9]:
import ujson

# deserialize the data from json format
data = ujson.loads(rawdata)

In [10]:
type(data)

dict
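The standard library's `json` module can also parse straight from an open file with `json.load` (no `s`), skipping the intermediate string. A minimal, self-contained sketch: since we can't assume the tutorial's data file is present, it first writes a tiny made-up JSON document to a temporary directory, and it stores the result in a new name, `example`, so it doesn't overwrite `data`.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a tiny, made-up JSON document to a temporary file
    path = os.path.join(tmpdir, 'example.json')
    with open(path, 'w') as f:
        f.write('{"title": "example", "rounds": 9, "teams": [{"name": "A"}, {"name": "B"}]}')

    # json.load reads and deserializes the file in one step
    with open(path, 'r') as f:
        example = json.load(f)

print(type(example))               # <class 'dict'>
print(example['teams'][1]['name'])  # B
```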

In [11]:
data

{'title': 'ww2',
 'platform': 'ps4',
 'id': '6592b573-b485-58b0-963e-6be0b4d02f6c',
 'series_id': 'champs-grand-finals-1',
 'start_time_s': 1515984053,
 'end_time_s': 1515984523,
 'duration_ms': 470000,
 'mode': 'Search & Destroy',
 'map': 'Sainte Marie du Mont',
 'rounds': 9,
 'teams': [{'name': 'TEAM KALIBER',
   'score': 6,
   'is_victor': True,
   'round_scores': [1, 1, 0, 1, 1, 0, 0, 1, 1],
   'side': 'home'},
  {'name': 'LUMINOSITY GAMING',
   'score': 3,
   'is_victor': False,
   'round_scores': [0, 0, 1, 0, 0, 1, 1, 0, 0],
   'side': 'away'}],
 'players': [{'name': 'ACCURACY',
   'team': 'TEAM KALIBER',
   'kills': 8,
   'deaths': 4,
   'kd': 2.0,
   'kills_per_10min': 10.2,
   'deaths_per_10min': 5.1,
   'assists': 8,
   'headshots': 2,
   'suicides': 0,
   'team_kills': 0,
   'team_deaths': 0,
   'stayed_alive_kills': 8,
   'hits': 53,
   'shots': 195,
   'accuracy': 27.2,
   'num_lives': 9,
   'time_alive_s': 398.2,
   'avg_time_per_life_s': 44.2,
   'fave_weapon': 'BAR',
  

You may not have seen this yet, but dicts and lists can be *nested*. Here is a simple dict:

In [12]:
simpledict = {"a": 1, "b": 2}

There is no reason why the values can't themselves be dicts and lists:

In [14]:
nesteddict = {"a": [5, 6, 7], "b": {"dogs": 10, "cats": 11}}
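To reach into a nested structure, chain the indexing operations, one per level of nesting. Using the same `nesteddict`:

```python
nesteddict = {"a": [5, 6, 7], "b": {"dogs": 10, "cats": 11}}

# "a" maps to a list, so index it by position
print(nesteddict["a"][0])       # 5

# "b" maps to another dict, so index it by key
print(nesteddict["b"]["cats"])  # 11

# .get() returns a default instead of raising KeyError for a missing key
print(nesteddict["b"].get("birds", 0))  # 0
```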

This nesting can be arbitrarily gnarly.  Let's have a look at our deserialized json (which is now a dict):

In [15]:
data.keys()

dict_keys(['title', 'platform', 'id', 'series_id', 'start_time_s', 'end_time_s', 'duration_ms', 'mode', 'map', 'rounds', 'teams', 'players', 'events'])
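Because the nesting can go arbitrarily deep, it is sometimes handy to list every path to a leaf value. A minimal recursive sketch; `walk` is a hypothetical helper written for illustration, not part of any library:

```python
def walk(obj, path=()):
    """Yield (path, value) pairs for every leaf in nested dicts/lists.

    `path` is a tuple of dict keys and list indices leading to the value.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from walk(value, path + (i,))
    else:
        # Anything else (str, int, float, bool, None) is a leaf
        yield path, obj


nesteddict = {"a": [5, 6, 7], "b": {"dogs": 10, "cats": 11}}
for path, value in walk(nesteddict):
    print(path, value)
```

Applied to the match data, `walk(data)` would enumerate paths such as `('teams', 0, 'name')`, which is exactly the chain of indices you would write as `data['teams'][0]['name']`.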

In [17]:
data['title']

'ww2'

In [18]:
data['platform']

'ps4'

In [18]:
# Timestamps are OFTEN stored as "UNIX timestamps":
# the number of seconds elapsed since 00:00:00 UTC on January 1, 1970
data['start_time_s']

1515984053
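The standard-library `datetime` module converts a UNIX timestamp into a calendar date and time. Passing `tz=timezone.utc` makes the result explicit UTC; without it, `fromtimestamp` returns the local time of whatever machine runs the code. A minimal sketch with the value hardcoded from the output above:

```python
from datetime import datetime, timezone

start_time_s = 1515984053  # value of data['start_time_s'] above

# Interpret the UNIX timestamp as UTC
start = datetime.fromtimestamp(start_time_s, tz=timezone.utc)
print(start.isoformat())  # 2018-01-15T02:40:53+00:00
```

In New Orleans (UTC-6 in January) that is the evening of January 14, which is consistent with the `2018-01-14-neworleans` directory name.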

In [19]:
data['end_time_s']

1515984523

In [20]:
data['end_time_s'] - data['start_time_s']

470

In [21]:
data['duration_ms']

470000
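The field names suggest `duration_ms` is the same interval as `end_time_s - start_time_s`, just in milliseconds. A quick check, with the values hardcoded from the outputs above so the snippet runs on its own:

```python
from datetime import timedelta

# Values copied from the cells above
start_time_s = 1515984053
end_time_s = 1515984523
duration_ms = 470000

# The millisecond duration agrees with the difference of the timestamps
print(duration_ms / 1000 == end_time_s - start_time_s)  # True

# timedelta gives a readable form: 470 s is 7 min 50 s
print(timedelta(milliseconds=duration_ms))  # 0:07:50
```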

In [22]:
data['mode']

'Search & Destroy'

In [23]:
data['map']

'Sainte Marie du Mont'

In [24]:
data['rounds']

9

In [25]:
data['teams']

[{'name': 'TEAM KALIBER',
  'score': 6,
  'is_victor': True,
  'round_scores': [1, 1, 0, 1, 1, 0, 0, 1, 1],
  'side': 'home'},
 {'name': 'LUMINOSITY GAMING',
  'score': 3,
  'is_victor': False,
  'round_scores': [0, 0, 1, 0, 0, 1, 1, 0, 0],
  'side': 'away'}]

In [26]:
data['teams'][0]

{'name': 'TEAM KALIBER',
 'score': 6,
 'is_victor': True,
 'round_scores': [1, 1, 0, 1, 1, 0, 0, 1, 1],
 'side': 'home'}

In [27]:
data['teams'][0]['name']

'TEAM KALIBER'
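Combining indexing with a loop or generator expression lets us answer questions about the whole list. In a notebook you would iterate over `data['teams']` directly; here the two team records are copied from the output above so the sketch runs on its own.

```python
# The two team records, copied from the data['teams'] output above
teams = [
    {'name': 'TEAM KALIBER', 'score': 6, 'is_victor': True,
     'round_scores': [1, 1, 0, 1, 1, 0, 0, 1, 1], 'side': 'home'},
    {'name': 'LUMINOSITY GAMING', 'score': 3, 'is_victor': False,
     'round_scores': [0, 0, 1, 0, 0, 1, 1, 0, 0], 'side': 'away'},
]

# Pick out the first team whose is_victor flag is True
winner = next(team['name'] for team in teams if team['is_victor'])
print(winner)  # TEAM KALIBER

# Each team's score equals the number of rounds it won
for team in teams:
    print(team['name'], team['score'], sum(team['round_scores']))
```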

In [28]:
data['players']

[{'name': 'ACCURACY',
  'team': 'TEAM KALIBER',
  'kills': 8,
  'deaths': 4,
  'kd': 2.0,
  'kills_per_10min': 10.2,
  'deaths_per_10min': 5.1,
  'assists': 8,
  'headshots': 2,
  'suicides': 0,
  'team_kills': 0,
  'team_deaths': 0,
  'stayed_alive_kills': 8,
  'hits': 53,
  'shots': 195,
  'accuracy': 27.2,
  'num_lives': 9,
  'time_alive_s': 398.2,
  'avg_time_per_life_s': 44.2,
  'fave_weapon': 'BAR',
  'fave_division': 'Armored',
  'fave_training': 'Scoped',
  'fave_scorestreaks': ['Fighter Pilot', 'Glide Bomb', 'Mortar Strike'],
  '2piece': 0,
  '3piece': 1,
  '4piece': 0,
  '4streak': 0,
  '5streak': 0,
  '6streak': 0,
  '7streak': 0,
  '8+streak': 0,
  'scorestreaks_earned': 0,
  'scorestreaks_used': 0,
  'scorestreaks_deployed': 0,
  'scorestreaks_kills': 0,
  'scorestreaks_assists': 0,
  'snd_firstbloods': 0,
  'snd_pickups': 1,
  'snd_plants': 1,
  'snd_defuses': 1,
  'snd_sneak_defuses': 0,
  'snd_rounds': 9,
  'snd_firstdeaths': 2,
  'snd_survives': 5,
  'snd_1kill_round':

In [29]:
data['players'][0]

{'name': 'ACCURACY',
 'team': 'TEAM KALIBER',
 'kills': 8,
 'deaths': 4,
 'kd': 2.0,
 'kills_per_10min': 10.2,
 'deaths_per_10min': 5.1,
 'assists': 8,
 'headshots': 2,
 'suicides': 0,
 'team_kills': 0,
 'team_deaths': 0,
 'stayed_alive_kills': 8,
 'hits': 53,
 'shots': 195,
 'accuracy': 27.2,
 'num_lives': 9,
 'time_alive_s': 398.2,
 'avg_time_per_life_s': 44.2,
 'fave_weapon': 'BAR',
 'fave_division': 'Armored',
 'fave_training': 'Scoped',
 'fave_scorestreaks': ['Fighter Pilot', 'Glide Bomb', 'Mortar Strike'],
 '2piece': 0,
 '3piece': 1,
 '4piece': 0,
 '4streak': 0,
 '5streak': 0,
 '6streak': 0,
 '7streak': 0,
 '8+streak': 0,
 'scorestreaks_earned': 0,
 'scorestreaks_used': 0,
 'scorestreaks_deployed': 0,
 'scorestreaks_kills': 0,
 'scorestreaks_assists': 0,
 'snd_firstbloods': 0,
 'snd_pickups': 1,
 'snd_plants': 1,
 'snd_defuses': 1,
 'snd_sneak_defuses': 0,
 'snd_rounds': 9,
 'snd_firstdeaths': 2,
 'snd_survives': 5,
 'snd_1kill_round': 1,
 'snd_2kill_round': 2,
 'snd_3kill_round'
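Several player fields look like they are derived from others. The sketch below recomputes some of them from the raw counts for the one player shown. The definitions are our assumptions, not documented in the data: `accuracy` as hits/shots in percent, `kd` as kills/deaths, the per-10-minute rates based on the 470 s match duration, and `avg_time_per_life_s` as `time_alive_s / num_lives`.

```python
# A subset of the ACCURACY player record, copied from the output above
player = {'kills': 8, 'deaths': 4, 'kd': 2.0, 'kills_per_10min': 10.2,
          'deaths_per_10min': 5.1, 'hits': 53, 'shots': 195, 'accuracy': 27.2,
          'num_lives': 9, 'time_alive_s': 398.2, 'avg_time_per_life_s': 44.2}
match_duration_s = 470  # data['duration_ms'] / 1000

# ASSUMED definitions of the derived fields (not documented in the data)
derived = {
    'kd': player['kills'] / player['deaths'],
    'accuracy': 100 * player['hits'] / player['shots'],
    'kills_per_10min': player['kills'] / match_duration_s * 600,
    'deaths_per_10min': player['deaths'] / match_duration_s * 600,
    'avg_time_per_life_s': player['time_alive_s'] / player['num_lives'],
}

# Compare each recomputed value, rounded to one decimal, with the stored field
for field, value in derived.items():
    print(field, round(value, 1), player[field], round(value, 1) == player[field])
```

For this one player every recomputed value matches the stored field, which supports (but does not prove) these definitions.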