# Working With JSON Data

**Learning Objectives:** Learn how to find and download JSON data from the web and work with it in Python and Pandas.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

## About JSON

JSON stands for "JavaScript Object Notation". In spite of the "JavaScript" in its name, JSON has become the de facto data format of the web. Every modern programming language can read and write JSON data, and most data-focused companies provide data in the JSON format.

From the perspective of Python, JSON has the ability to encode the following types of data:

* `dict`
* `list` and `tuple`
* `str`
* `float`
* `int`
* `bool`
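* `None` as `null` (Pandas also writes `NaN` as `null`, while the standard library `json` module writes it as the non-standard token `NaN`)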
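
The type mapping above can be checked with a small experiment. This is a minimal sketch using only the standard library; the `sample` dictionary and its keys are made up for illustration. It also shows how the `json` module treats `NaN`, which differs from how Pandas handles it.

```python
import json
import math

# One made-up value for each Python type in the list above.
sample = {
    "a_dict": {"nested": True},
    "a_list": [1, 2, 3],
    "a_tuple": (1, 2, 3),  # tuples are written as JSON arrays
    "a_str": "hello",
    "a_float": 1.5,
    "an_int": 42,
    "a_bool": False,
    "a_none": None,
}

encoded = json.dumps(sample)
print(encoded)

# JSON has no tuple type, so the tuple is decoded as a list.
decoded = json.loads(encoded)
print(type(decoded["a_tuple"]))

# By default the json module writes NaN as the bare token NaN,
# which is not valid according to the JSON specification.
nan_text = json.dumps({"x": math.nan})
print(nan_text)

# With allow_nan=False the encoder raises ValueError instead.
try:
    json.dumps({"x": math.nan}, allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)
```

If strictly valid JSON is required, passing `allow_nan=False` is one way to detect `NaN` values before they reach another program.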

Much of the modern web is built out of JSON-based web services. These web services are often referred to as RESTful APIs, where REST stands for Representational State Transfer. These types of APIs are used by:

* Facebook
* Twitter
* LinkedIn
* Wikipedia
* Amazon
* New York Times
* ...
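
Downloading JSON from such a web service usually comes down to an HTTP request followed by a call to `json.loads`. The following is a rough sketch using only the standard library. The helper names `parse_json_body` and `fetch_json`, the URL in the comment, and the sample response body are all hypothetical; real services differ in their URLs, authentication, and response layout.

```python
import json
from urllib.request import urlopen


def parse_json_body(body, encoding="utf-8"):
    """Decode the raw bytes of an HTTP response and parse them as JSON."""
    return json.loads(body.decode(encoding))


def fetch_json(url, timeout=10.0):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return parse_json_body(response.read())


# Offline demonstration with a made-up response body, so that the
# example runs without network access.
sample_body = b'{"name": "example", "tags": ["a", "b"], "count": 2}'
parsed = parse_json_body(sample_body)
print(parsed["tags"])

# With network access, a call would look like this (hypothetical URL):
# data = fetch_json("https://api.example.com/items")
```

The result is ordinary Python data (dictionaries, lists, strings, numbers), so it can be indexed and looped over like any other Python object.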

## Basic JSON data

There are a number of Python libraries capable of working with JSON data. In this case we will use the `json` package from the standard library:

In [2]:
import json

In Python, JSON data isn't special at all. It is built from the regular Python data types listed above:

In [3]:
data1 = {'a': 100, 'b': None, 'c': np.nan, 'd': list(range(10))}

In [4]:
data1

{'a': 100, 'b': None, 'c': nan, 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}

To create the JSON representation of this data, we use the `json.dumps` function:

In [5]:
j = json.dumps(data1)

This creates a Python string with the JSON data inside:

In [6]:
type(j)

str

Note how the JSON representation looks almost identical to the Python representation. JSON is a simple way of converting objects into a string!

In [7]:
j

'{"c": NaN, "d": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], "b": null, "a": 100}'

To convert a JSON string back to the corresponding Python objects, use the `json.loads` function:

In [8]:
data2 = json.loads(j)

In [9]:
data2

{'a': 100, 'b': None, 'c': nan, 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
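
A round trip through `json.dumps` and `json.loads` is not always exact. The sketch below, with made-up data, shows two details worth knowing: the `indent` and `sort_keys` arguments that make the output easier to read, and the fact that dictionary keys always come back as strings.

```python
import json
import math

original = {"b": None, "a": 100, "c": math.nan}

# indent and sort_keys produce readable, reproducible output.
text = json.dumps(original, indent=2, sort_keys=True)
print(text)

restored = json.loads(text)

# Regular values survive the round trip unchanged.
print(restored["a"], restored["b"])

# NaN is restored as a float NaN, which has to be tested with
# math.isnan because NaN never compares equal to itself.
print(math.isnan(restored["c"]))

# JSON object keys are always strings, so integer keys change type.
keys_restored = json.loads(json.dumps({1: "one"}))
print(keys_restored)
```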

## Pandas and JSON

Pandas has a set of very flexible functions and methods for working with JSON data.

Here is a simple `DataFrame` we will use to demonstrate Pandas's JSON handling:

In [10]:
df = pd.DataFrame({'age': np.random.randint(0,100,5),
                   'gender': np.random.choice(['m','f'],5,p=[0.3,0.7])})

In [11]:
df

|   | age | gender |
|---|-----|--------|
| 0 | 31 | m |
| 1 | 35 | m |
| 2 | 13 | f |
| 3 | 67 | m |
| 4 | 2 | f |


The `to_json` method will convert any `DataFrame` to a JSON string:

In [12]:
j1 = df.to_json()

In [13]:
j1

'{"age":{"0":31,"1":35,"2":13,"3":67,"4":2},"gender":{"0":"m","1":"m","2":"f","3":"m","4":"f"}}'

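Likewise, the `pd.read_json` function goes the other way. Recent versions of Pandas expect a literal JSON string to be wrapped in `io.StringIO` before it is passed in: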

In [14]:
pd.read_json(j1)

|   | age | gender |
|---|-----|--------|
| 0 | 31 | m |
| 1 | 35 | m |
| 2 | 13 | f |
| 3 | 67 | m |
| 4 | 2 | f |
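
The round trip through `to_json` and `read_json` can be checked programmatically. This is a minimal sketch with a small fixed `DataFrame` (the name `people` is made up here) instead of random data. It wraps the string in `io.StringIO`, which recent Pandas versions expect for literal JSON input.

```python
from io import StringIO

import pandas as pd

# A small fixed DataFrame so that the result is reproducible.
people = pd.DataFrame({"age": [31, 35, 13], "gender": ["m", "m", "f"]})

# Default orientation: one JSON object per column, keyed by index label.
text = people.to_json()
print(text)

# Wrap the string in StringIO so read_json treats it as file-like input.
restored = pd.read_json(StringIO(text))
print(restored)

# Compare the values column by column.
print(restored["age"].tolist() == people["age"].tolist())
print(restored["gender"].tolist() == people["gender"].tolist())
```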


The `to_json` and `read_json` calls take an `orient` keyword argument that determines how the `DataFrame` will be converted to JSON data. Here is the `index` variant:

In [15]:
j2 = df.to_json(orient='index')

In [16]:
j2

'{"0":{"age":31,"gender":"m"},"1":{"age":35,"gender":"m"},"2":{"age":13,"gender":"f"},"3":{"age":67,"gender":"m"},"4":{"age":2,"gender":"f"}}'

In [17]:
pd.read_json(j2, orient='index')

|   | age | gender |
|---|-----|--------|
| 0 | 31 | m |
| 1 | 35 | m |
| 2 | 13 | f |
| 3 | 67 | m |
| 4 | 2 | f |


Notice that if we don't pass `orient='index'` to `read_json`, the default orientation is used and the rows and columns come back swapped:

In [18]:
pd.read_json(j2)

|        | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
|--------|-----|-----|-----|-----|-----|
| age    | 31 | 35 | 13 | 67 | 2 |
| gender | m | m | f | m | f |
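
The effect of a mismatched `orient` is easy to see by comparing shapes. The following sketch uses a small made-up `DataFrame` named `people` and reads the same JSON string twice, once with the default orientation and once with `orient='index'`.

```python
from io import StringIO

import pandas as pd

people = pd.DataFrame({"age": [31, 35, 13], "gender": ["m", "m", "f"]})

# Written row by row: the outer keys are the index labels.
by_row = people.to_json(orient="index")

# Read with the default orient='columns': the outer keys are taken
# to be column names, so the table comes back transposed.
wrong = pd.read_json(StringIO(by_row))

# Read with the matching orientation.
right = pd.read_json(StringIO(by_row), orient="index")

print(wrong.shape)
print(right.shape)
print(sorted(right.columns))
```

A quick check of `shape` or `columns` after loading is a cheap way to catch this kind of mistake.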


Another variant is `orient='records'`. In this case the `DataFrame` is encoded as a list of row objects, and the index is not stored:

In [19]:
j3 = df.to_json(orient='records')

In [20]:
j3

'[{"age":31,"gender":"m"},{"age":35,"gender":"m"},{"age":13,"gender":"f"},{"age":67,"gender":"m"},{"age":2,"gender":"f"}]'
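
To compare the orientations side by side, the sketch below prints several of them for the same small `DataFrame`. The data and the index labels `x` and `y` are made up; the labels make it easier to see which orientations keep the index and which drop it. The `split` and `values` orientations are not covered in the text above and are included only for comparison.

```python
import json

import pandas as pd

people = pd.DataFrame(
    {"age": [31, 35], "gender": ["m", "m"]},
    index=["x", "y"],
)

# Print the JSON produced by each orientation.
for orient in ["columns", "index", "records", "split", "values"]:
    print(orient, people.to_json(orient=orient))

# 'records' is a plain list of row dictionaries without the index.
records = json.loads(people.to_json(orient="records"))
print(records)

# 'split' stores columns, index and data separately.
split = json.loads(people.to_json(orient="split"))
print(split["index"])
```

The `records` layout is a common shape for JSON returned by web services, but it only round-trips cleanly when the index carries no information.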

`read_json` can load JSON from any of the following sources, and `to_json` can write to a filename instead of returning a string:

* Python strings
* Filenames
* URLs

In [21]:
with open('people.json', 'w') as f:
    f.write(j3)

In [22]:
pd.read_json('people.json')

|   | age | gender |
|---|-----|--------|
| 0 | 31 | m |
| 1 | 35 | m |
| 2 | 13 | f |
| 3 | 67 | m |
| 4 | 2 | f |
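
Writing the file can also be left to Pandas by passing a path to `to_json`. The sketch below assumes a temporary directory so that it does not leave files behind; the `people` data is made up for illustration.

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"age": [31, 35, 13], "gender": ["m", "m", "f"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.json")

    # Passing a path makes to_json write the file directly.
    people.to_json(path, orient="records")

    # Read the file back using the same orientation.
    restored = pd.read_json(path, orient="records")

print(restored)
```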


## Resources

* [JSON Format](http://en.wikipedia.org/wiki/JSON#Data_types.2C_syntax_and_example)
* [Representational State Transfer](http://en.wikipedia.org/wiki/Representational_state_transfer)