# Handling JSON with Python
JSON stands for JavaScript Object Notation, a text format designed for exchanging structured data. It is easy to store and send, and easy for humans to read.

You are likely to encounter JSON data, for example when pulling data from a web service, so it is helpful to know how to process it.

In [1]:
import json
import pandas as pd

## JSON as Python Dicts
JSON objects and arrays map directly onto Python dictionaries and lists. Here is a very simple JSON object written as a Python dictionary:

In [2]:
earth = {"name": "Earth", "description": "The Blue Planet"}
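
The mapping between JSON text and Python dictionaries can be seen by converting in both directions. The following is a minimal sketch using only the standard library; the variable names `earth_json` and `earth_again` are illustrative and not part of the original notebook.

```python
import json

# A Python dict that mirrors a JSON object.
earth = {"name": "Earth", "description": "The Blue Planet"}

# Serialise the dict to a JSON-formatted string...
earth_json = json.dumps(earth)

# ...and parse that string back into a Python dict.
earth_again = json.loads(earth_json)

print(earth_json)
print(earth_again == earth)
```

Note that `earth_json` is an ordinary string, while `earth_again` is a dict equal to the original.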

When dealing with JSON we often end up with more complex nested structures like this one: a dict for the overall structure, whose planets item is a list, and that list in turn contains dicts:

In [3]:
planets = {"count":8, 
           "planets":[
                {"name":"Mercury", "moons":0}, 
                {"name":"Venus", "moons":0}, 
                {"name":"Earth", "moons":1}, 
                {"name":"Mars", "moons":2}, 
                {"name":"Jupiter", "moons":79}, 
                {"name":"Saturn", "moons":82}, 
                {"name":"Uranus", "moons":27}, 
                {"name":"Neptune", "moons":14}
            ]}
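
Navigating a structure like this is a matter of combining dict lookups and list indexing. The sketch below assumes a shortened three-planet copy of the data so that it stays self-contained; the names `first_name`, `names` and `total_moons` are illustrative.

```python
# Shortened copy of the structure above (three planets only, for brevity).
planets = {"count": 3,
           "planets": [
               {"name": "Earth", "moons": 1},
               {"name": "Mars", "moons": 2},
               {"name": "Jupiter", "moons": 79},
           ]}

# The outer dict is indexed by key, the inner list by position.
first_name = planets["planets"][0]["name"]

# Loop over the list of dicts to collect or aggregate values.
names = [p["name"] for p in planets["planets"]]
total_moons = sum(p["moons"] for p in planets["planets"])

print(first_name, names, total_moons)
```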

## Load JSON from File
We can also parse JSON from a string using json.loads. This is useful, for example, when the data has been read from a file as text. To read directly from an open file object, use json.load instead.

In [4]:
stars = json.loads('{"stars":[{"name":"UY Scuti", "distance":"5219"},{"name":"VY Canis Majoris", "distance":"3900"},{"name":"RW Cephei", "distance":"9000"}]}')
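
For completeness, here is a sketch of reading JSON straight from a file with `json.load`. It writes a temporary file first so that it can run on its own; the file name `stars.json` is hypothetical, and the data is a one-star subset of the example above.

```python
import json
import os
import tempfile

stars = {"stars": [{"name": "UY Scuti", "distance": "5219"}]}

# Write the data to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stars.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stars, f)

    # json.load reads from an open file object (json.loads takes a string).
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

# Values keep the types they had in the JSON text: distance is a string here,
# so it needs an explicit conversion before doing arithmetic with it.
distance = int(loaded["stars"][0]["distance"])
print(loaded, distance)
```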

## Load into a Pandas Dataframe
It's useful to be able to load JSON into a Pandas dataframe.

In [5]:
df = pd.DataFrame(planets)
df

| | count | planets |
|---|---|---|
| 0 | 8 | {'name': 'Mercury', 'moons': 0} |
| 1 | 8 | {'name': 'Venus', 'moons': 0} |
| 2 | 8 | {'name': 'Earth', 'moons': 1} |
| 3 | 8 | {'name': 'Mars', 'moons': 2} |
| 4 | 8 | {'name': 'Jupiter', 'moons': 79} |
| 5 | 8 | {'name': 'Saturn', 'moons': 82} |
| 6 | 8 | {'name': 'Uranus', 'moons': 27} |
| 7 | 8 | {'name': 'Neptune', 'moons': 14} |


Notice how in the above example the count value has been repeated on every row, and each entry in the planets column is still a nested dict rather than being split into separate columns.

If you just want the planet data, select that item before building the dataframe:

In [6]:
df = pd.DataFrame(planets["planets"])
df

| | name | moons |
|---|---|---|
| 0 | Mercury | 0 |
| 1 | Venus | 0 |
| 2 | Earth | 1 |
| 3 | Mars | 2 |
| 4 | Jupiter | 79 |
| 5 | Saturn | 82 |
| 6 | Uranus | 27 |
| 7 | Neptune | 14 |
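
Once the list of planet dicts is in a dataframe, individual columns can be pulled out or filtered in the usual Pandas way. This is a small sketch on a shortened three-planet copy of the data; `names` and `with_moons` are illustrative names.

```python
import pandas as pd

# Shortened copy of the planets structure (three planets only, for brevity).
planets = {"count": 3,
           "planets": [
               {"name": "Venus", "moons": 0},
               {"name": "Earth", "moons": 1},
               {"name": "Mars", "moons": 2},
           ]}

df = pd.DataFrame(planets["planets"])

# Select a single column as a plain Python list.
names = df["name"].tolist()

# Filter rows with a boolean mask, then select the column.
with_moons = df[df["moons"] > 0]["name"].tolist()

print(names, with_moons)
```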


## Expanding JSON in Pandas
Sometimes your JSON is nested more deeply and you want to unpack it into separate columns of your Pandas dataframe.

In [8]:
planets = {"count":8, 
           "planets":[
                {"name":"Mercury", "moons":{"number":0, "main":[]}}, 
                {"name":"Venus", "moons":{"number":0, "main":[]}}, 
                {"name":"Earth", "moons":{"number":0, "main":["moon"]}}, 
                {"name":"Mars", "moons":{"number":0, "main":[]}}, 
                {"name":"Jupiter", "moons":{"number":79, "main":["Io","Europa","Ganymede", "Callisto"]}}, 
                {"name":"Saturn", "moons":{"number":0, "main":[]}}, 
                {"name":"Uranus", "moons":{"number":0, "main":[]}}, 
                {"name":"Neptune", "moons":{"number":0, "main":[]}}
            ]}

In [9]:
df = pd.DataFrame(planets["planets"])
df

| | name | moons |
|---|---|---|
| 0 | Mercury | {'number': 0, 'main': []} |
| 1 | Venus | {'number': 0, 'main': []} |
| 2 | Earth | {'number': 0, 'main': ['moon']} |
| 3 | Mars | {'number': 0, 'main': []} |
| 4 | Jupiter | {'number': 79, 'main': ['Io', 'Europa', 'Ganym... |
| 5 | Saturn | {'number': 0, 'main': []} |
| 6 | Uranus | {'number': 0, 'main': []} |
| 7 | Neptune | {'number': 0, 'main': []} |


We can use json_normalize to flatten this:

In [10]:
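# json_normalize is exposed at the top level since pandas 1.0;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize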

In [11]:
json_normalize(planets["planets"])

| | name | moons.number | moons.main |
|---|---|---|---|
| 0 | Mercury | 0 | [] |
| 1 | Venus | 0 | [] |
| 2 | Earth | 0 | [moon] |
| 3 | Mars | 0 | [] |
| 4 | Jupiter | 79 | [Io, Europa, Ganymede, Callisto] |
| 5 | Saturn | 0 | [] |
| 6 | Uranus | 0 | [] |
| 7 | Neptune | 0 | [] |
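
If the top-level count should be kept alongside the flattened planet records, `json_normalize` also accepts `record_path` and `meta` arguments. The sketch below assumes pandas 1.0 or later and uses a shortened two-planet copy of the data with illustrative values; `flat` is an illustrative name.

```python
import pandas as pd

# Shortened two-planet copy of the nested structure (illustrative values).
planets = {"count": 2,
           "planets": [
               {"name": "Earth",
                "moons": {"number": 1, "main": ["Moon"]}},
               {"name": "Jupiter",
                "moons": {"number": 79,
                          "main": ["Io", "Europa", "Ganymede", "Callisto"]}},
           ]}

# record_path names the list to turn into rows; meta names top-level
# fields to repeat on every row. Nested dicts become dotted column names.
flat = pd.json_normalize(planets, record_path="planets", meta=["count"])

print(flat)
```

Each planet becomes one row, the nested moons dict is expanded into moons.number and moons.main, and count is carried along as an extra column.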
