## JSON - JavaScript Object Notation

JSON is a structured text format. It is very popular in web development, but it can hold almost any kind of data, and it's a file format that you're bound to run into frequently as a data scientist.  So let's take a look at a JSON file.

🐲 Pokédex of Pokémon GO in JSON.  
Downloaded from https://github.com/Biuni/PokemonGO-Pokedex

In [3]:
!head -30 pokedex.json

{
  "pokemon": [{
    "id": 1,
    "num": "001",
    "name": "Bulbasaur",
    "img": "http://www.serebii.net/pokemongo/pokemon/001.png",
    "type": [
      "Grass",
      "Poison"
    ],
    "height": "0.71 m",
    "weight": "6.9 kg",
    "candy": "Bulbasaur Candy",
    "candy_count": 25,
    "egg": "2 km",
    "spawn_chance": 0.69,
    "avg_spawns": 69,
    "spawn_time": "20:00",
    "multipliers": [1.58],
    "weaknesses": [
      "Fire",
      "Ice",
      "Flying",
      "Psychic"
    ],
    "next_evolution": [{
      "num": "002",
      "name": "Ivysaur"
    }, {
      "num": "003",


The first thing you'll notice is that a JSON file looks a lot like Python code.  In fact, if you were to copy this text and paste it into a code cell, it would be legal Python: Python would create a dictionary where you see curly braces and a list where you see square brackets.  (That is not true of every JSON file, because JSON spells its true, false, and null values differently from Python's True, False, and None.)  This is great, because it means that moving from JSON text to Python objects is really easy.  We do that with the json library.
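
As a small illustration of where the resemblance to Python stops, here is a sketch using a made-up JSON string (it is not taken from pokedex.json). Pasting it into a code cell would fail, but the json library translates the JSON-only literals for you:

```python
import json

# A made-up JSON snippet (not from pokedex.json) that uses JSON-only literals.
text = '{"name": "Bulbasaur", "caught": true, "shiny": false, "nickname": null}'

record = json.loads(text)
print(record)
# {'name': 'Bulbasaur', 'caught': True, 'shiny': False, 'nickname': None}
```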

In [4]:
import json

To load JSON, use json.load for a file or json.loads for a string.

In [6]:
with open('pokedex.json') as f:
    P = json.load(f)

In [8]:
type(P)

dict
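
The two loaders differ only in where the text comes from. A minimal sketch, using an in-memory io.StringIO object as a stand-in for an open file and an invented one-record string:

```python
import io
import json

# Invented sample text with the same top-level shape as the Pokédex file.
text = '{"pokemon": [{"id": 1, "name": "Bulbasaur"}]}'

from_string = json.loads(text)            # parse a str
from_file = json.load(io.StringIO(text))  # parse a file-like object

print(from_string == from_file)  # True
```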

To get to the data, you have to index with brackets through the layers, one level at a time.

In [10]:
P.keys()

dict_keys(['pokemon'])

In [12]:
type(P['pokemon'])

list

In [13]:
P['pokemon'][1]

{'id': 2,
 'num': '002',
 'name': 'Ivysaur',
 'img': 'http://www.serebii.net/pokemongo/pokemon/002.png',
 'type': ['Grass', 'Poison'],
 'height': '0.99 m',
 'weight': '13.0 kg',
 'candy': 'Bulbasaur Candy',
 'candy_count': 100,
 'egg': 'Not in Eggs',
 'spawn_chance': 0.042,
 'avg_spawns': 4.2,
 'spawn_time': '07:00',
 'multipliers': [1.2, 1.6],
 'weaknesses': ['Fire', 'Ice', 'Flying', 'Psychic'],
 'prev_evolution': [{'num': '001', 'name': 'Bulbasaur'}],
 'next_evolution': [{'num': '003', 'name': 'Venusaur'}]}

In [18]:
P['pokemon'][1]['name']

'Ivysaur'
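
Not every record has every key. Ivysaur has a prev_evolution entry in the output above, but some records appear to lack it, and plain bracket indexing raises KeyError on a missing key. One option is dict.get with a default. A sketch on trimmed-down copies of two records (the names sample and previous_names are just for illustration):

```python
# Trimmed-down copies of two records, for illustration only.
sample = [
    {"name": "Bulbasaur", "next_evolution": [{"num": "002", "name": "Ivysaur"}]},
    {"name": "Ivysaur", "prev_evolution": [{"num": "001", "name": "Bulbasaur"}]},
]

# .get returns the default (an empty list) instead of raising KeyError.
previous_names = {
    pokemon["name"]: [p["name"] for p in pokemon.get("prev_evolution", [])]
    for pokemon in sample
}
print(previous_names)
# {'Bulbasaur': [], 'Ivysaur': ['Bulbasaur']}
```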

If you want to go back from Python objects to JSON, use json.dump to write to a file or json.dumps to produce a string.

In [17]:
json.dumps(P['pokemon'][1])

'{"id": 2, "num": "002", "name": "Ivysaur", "img": "http://www.serebii.net/pokemongo/pokemon/002.png", "type": ["Grass", "Poison"], "height": "0.99 m", "weight": "13.0 kg", "candy": "Bulbasaur Candy", "candy_count": 100, "egg": "Not in Eggs", "spawn_chance": 0.042, "avg_spawns": 4.2, "spawn_time": "07:00", "multipliers": [1.2, 1.6], "weaknesses": ["Fire", "Ice", "Flying", "Psychic"], "prev_evolution": [{"num": "001", "name": "Bulbasaur"}], "next_evolution": [{"num": "003", "name": "Venusaur"}]}'
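
By default json.dumps puts everything on one line. Here is a sketch of two commonly used variations: the indent argument for readable output, and json.dump for writing to a file. The record is shortened and the temporary file name is made up for illustration.

```python
import json
import os
import tempfile

record = {"id": 2, "name": "Ivysaur", "type": ["Grass", "Poison"]}

# indent=2 pretty-prints with two spaces per nesting level.
pretty = json.dumps(record, indent=2)
print(pretty)

# Write to a file with json.dump, then read it back to confirm the round trip.
path = os.path.join(tempfile.mkdtemp(), "ivysaur.json")  # hypothetical file name
with open(path, "w") as f:
    json.dump(record, f)
with open(path) as f:
    restored = json.load(f)

print(restored == record)  # True
```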

Let's say we want a list of all the pokemon names.

In [16]:
[pokemon['name'] for pokemon in P['pokemon']][:20]

['Bulbasaur',
 'Ivysaur',
 'Venusaur',
 'Charmander',
 'Charmeleon',
 'Charizard',
 'Squirtle',
 'Wartortle',
 'Blastoise',
 'Caterpie',
 'Metapod',
 'Butterfree',
 'Weedle',
 'Kakuna',
 'Beedrill',
 'Pidgey',
 'Pidgeotto',
 'Pidgeot',
 'Rattata',
 'Raticate']

What if we want a list of all the types?  It's a little trickier: we can loop through all of the types in each pokemon, but we'll get repeats.  The simplest idea is to use a set comprehension, which keeps only one copy of each value.

In [19]:
{t for pokemon in P['pokemon'] for t in pokemon['type']}

{'Bug',
 'Dragon',
 'Electric',
 'Fighting',
 'Fire',
 'Flying',
 'Ghost',
 'Grass',
 'Ground',
 'Ice',
 'Normal',
 'Poison',
 'Psychic',
 'Rock',
 'Water'}
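
If the repeats are the interesting part, for example to see how common each type is, collections.Counter from the standard library counts them instead of discarding them. A sketch on three shortened records (in the notebook you would iterate over P['pokemon'] rather than the illustrative sample list):

```python
from collections import Counter

# Shortened records, for illustration only.
sample = [
    {"name": "Bulbasaur", "type": ["Grass", "Poison"]},
    {"name": "Ivysaur", "type": ["Grass", "Poison"]},
    {"name": "Charmander", "type": ["Fire"]},
]

# Same nested loop as the set comprehension, but every occurrence is counted.
type_counts = Counter(t for pokemon in sample for t in pokemon["type"])
print(type_counts.most_common())
# [('Grass', 2), ('Poison', 2), ('Fire', 1)]
```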

What if you need to get data out of a JSON file and into a dataframe?  The json_normalize function in pandas can help.  It tries to turn the JSON into tabular data.  But the JSON is nested, so you actually end up with lists and dictionaries inside your dataframe.

In [20]:
import pandas as pd

In [27]:
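# pd.json_normalize (pandas 1.0+) replaces the deprecated pd.io.json.json_normalize;
# newer versions may order the columns differently from the output shown here.
pd.json_normalize(P, record_path=['pokemon']).head()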

|  | avg_spawns | candy | candy_count | egg | height | id | img | multipliers | name | next_evolution | num | prev_evolution | spawn_chance | spawn_time | type | weaknesses | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69.0 | Bulbasaur Candy | 25.0 | 2 km | 0.71 m | 1 | http://www.serebii.net/pokemongo/pokemon/001.png | [1.58] | Bulbasaur | [{'num': '002', 'name': 'Ivysaur'}, {'num': '0... | 001 |  | 0.69 | 20:00 | [Grass, Poison] | [Fire, Ice, Flying, Psychic] | 6.9 kg |
| 1 | 4.2 | Bulbasaur Candy | 100.0 | Not in Eggs | 0.99 m | 2 | http://www.serebii.net/pokemongo/pokemon/002.png | [1.2, 1.6] | Ivysaur | [{'num': '003', 'name': 'Venusaur'}] | 002 | [{'num': '001', 'name': 'Bulbasaur'}] | 0.042 | 07:00 | [Grass, Poison] | [Fire, Ice, Flying, Psychic] | 13.0 kg |
| 2 | 1.7 | Bulbasaur Candy |  | Not in Eggs | 2.01 m | 3 | http://www.serebii.net/pokemongo/pokemon/003.png |  | Venusaur |  | 003 | [{'num': '001', 'name': 'Bulbasaur'}, {'num': ... | 0.017 | 11:30 | [Grass, Poison] | [Fire, Ice, Flying, Psychic] | 100.0 kg |
| 3 | 25.3 | Charmander Candy | 25.0 | 2 km | 0.61 m | 4 | http://www.serebii.net/pokemongo/pokemon/004.png | [1.65] | Charmander | [{'num': '005', 'name': 'Charmeleon'}, {'num':... | 004 |  | 0.253 | 08:45 | [Fire] | [Water, Ground, Rock] | 8.5 kg |
| 4 | 1.2 | Charmander Candy | 100.0 | Not in Eggs | 1.09 m | 5 | http://www.serebii.net/pokemongo/pokemon/005.png | [1.79] | Charmeleon | [{'num': '006', 'name': 'Charizard'}] | 005 | [{'num': '004', 'name': 'Charmander'}] | 0.012 | 19:00 | [Fire] | [Water, Ground, Rock] | 19.0 kg |
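
One way to unpack a nested list of dictionaries such as next_evolution is to point record_path at it and carry the parent's name along with meta. This is only a sketch on two shortened records, assuming a pandas version that provides pd.json_normalize (1.0 or later). As far as I know, every record passed in must contain the record_path key, so on the full data you would probably first filter out the pokemon that have no next_evolution.

```python
import pandas as pd

# Shortened records, for illustration only.
sample = [
    {"name": "Bulbasaur",
     "next_evolution": [{"num": "002", "name": "Ivysaur"},
                        {"num": "003", "name": "Venusaur"}]},
    {"name": "Ivysaur",
     "next_evolution": [{"num": "003", "name": "Venusaur"}]},
]

# One row per evolution; record_prefix avoids a clash between the
# parent's "name" and the nested "name".
evolutions = pd.json_normalize(
    sample,
    record_path="next_evolution",
    meta=["name"],
    record_prefix="next_",
)
print(evolutions)
```

The result has one row per (pokemon, evolution) pair, with columns next_num, next_name, and name.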


Depending on what you need, this might be enough.  Usually, though, this is just a starting point, and you will want to take some of the nested structures and turn them into new columns.  For example, we might need a column for grass type, a column for poison type, and so on.  You might think about how you would create those in pandas.
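
As one possible answer to that exercise, here is a sketch that builds a 0/1 indicator column per type. It joins each list into a single delimited string and then calls Series.str.get_dummies. The three-row frame stands in for the real one, and the "|" separator is an arbitrary choice that assumes no type name contains that character.

```python
import pandas as pd

# A small stand-in for the normalized dataframe, for illustration only.
df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle"],
    "type": [["Grass", "Poison"], ["Fire"], ["Water"]],
})

# ["Grass", "Poison"] -> "Grass|Poison" -> one indicator column per distinct type.
type_columns = df["type"].str.join("|").str.get_dummies(sep="|")
result = df[["name"]].join(type_columns)
print(result)
```

Other routes, such as DataFrame.explode followed by pd.get_dummies, would also work; this one is just compact.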