# JSON (JavaScript Object Notation)
JSON stands for JavaScript Object Notation. It grew popular alongside JavaScript because its syntax mirrors the way objects are written in that language. It is a "subset" of the JavaScript language.

Online Resources:
* https://www.json.org/json-en.html
* https://www.guru99.com/json-vs-xml-difference.html

An example of JSON:

<img src='images/JSON.png'>

The name can be misleading, since `JSON` is now used well beyond JavaScript. In data science, for example, it has become a very popular format for storing data. It is similar in purpose to its predecessor, XML.
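
Because JSON is plain text, most languages ship a parser for it. As a minimal sketch in Python (the record below is made up for illustration), the standard-library `json` module converts JSON text to Python objects and back:

```python
import json

# A made-up JSON document (illustrative only): an object with a nested array.
text = '{"name": "Ada", "languages": ["Python", "JavaScript"], "active": true}'

# json.loads parses JSON text into Python objects:
# object -> dict, array -> list, true/false -> True/False, null -> None.
record = json.loads(text)
print(type(record).__name__)
print(record["languages"][0])
print(record["active"])

# json.dumps goes the other way and produces JSON text again.
round_trip = json.dumps(record)
print(round_trip)
```

Parsing the dumped text again gives back an equal dictionary, which is what makes JSON convenient for exchanging data between programs.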

JSON is like XML because:
* Both are self-describing: the values are labeled, which makes them human-readable.
* Both are hierarchical (nested), i.e. they can have values within values.
* Both can be parsed and used by many programming languages.
* Both can be passed around in HTTP requests (important for APIs).

JSON is unlike XML because:
* JSON names each element only once, as a key at the start (there is no closing tag), which results in a smaller size.
* JSON is less verbose, so it is quicker for humans to write and read.
* JSON supports arrays natively, which leads to even smaller file sizes.
* JSON keys are quoted strings, so any text is a valid key (JavaScript reserved words included), whereas XML tag names must follow XML naming rules.
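
To make the size argument concrete, here is a small sketch that serializes the same made-up record both ways using only the Python standard library. The XML layout chosen here (one `<skill>` element per list item) is just one possible encoding, so the exact difference will vary with the data and the schema:

```python
import json
import xml.etree.ElementTree as ET

# Made-up record used only for this comparison.
person = {"name": "Ada", "city": "London", "skills": ["math", "code"]}

# JSON: each key appears once, and the list maps directly to an array.
as_json = json.dumps(person)

# XML: every element needs an opening and a closing tag,
# and each list item gets its own wrapper element.
root = ET.Element("person")
ET.SubElement(root, "name").text = person["name"]
ET.SubElement(root, "city").text = person["city"]
skills = ET.SubElement(root, "skills")
for skill in person["skills"]:
    ET.SubElement(skills, "skill").text = skill
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
print(len(as_json), len(as_xml))
```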

We now know the two most common data formats used by APIs: `XML` and `JSON`.

# JSON Tutorial
Let's start by importing Pandas.

In [1]:
import pandas as pd

Pandas has a function, `read_json()`, that can load JSON from either a file or a URL.

In [2]:
url = "https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/data.json"
first_json = pd.read_json(url)
first_json.head()

|   | integer | datetime | category |
|---|---|---|---|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
| 2 | 9 | 2015-01-01 00:00:02 | 0 |
| 3 | 6 | 2015-01-01 00:00:03 | 0 |
| 4 | 6 | 2015-01-01 00:00:04 | 0 |


Writing JSON is as simple as reading it and also takes one line of code. Instead of `read_json()`, you use `to_json()` with a filename, and that's all!

In [3]:
first_json.to_json('data/json_columns.json', orient="columns")
first_json.to_json('data/json_index.json', orient="index")
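
The two `orient` values can also be compared without opening any files. The following is a minimal sketch on a tiny made-up DataFrame; it relies on `to_json()` returning the JSON text when no path is given:

```python
import json
import pandas as pd

# Tiny made-up frame, used only to show the two layouts.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Without a path, to_json() returns the JSON text instead of writing a file.
by_columns = df.to_json(orient="columns")
by_index = df.to_json(orient="index")

print(by_columns)  # outer keys are column names, inner keys are row labels
print(by_index)    # outer keys are row labels, inner keys are column names
```

With `orient="columns"` the data is grouped by column first, while `orient="index"` groups it by row first.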

Check the two files and see the difference. These functions are the most convenient way to work with JSON in Pandas. However, they don't always work.
* `read_json()` and `to_json()` work only with simple JSON. All arrays at the same level must have the same length.

So what about nested JSON files? Look at the file [nested.json](https://drive.google.com/file/d/1PWg4uKcwO010y8MhnBIYAZbMi33vXOj7/view) to see its structure, then try to load it into Pandas with `pd.read_json()`:

In [4]:
df = pd.read_json("data/nested.json")

ValueError: arrays must all be same length
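
The same failure can be reproduced without the file. This sketch uses made-up JSON text with the same shape problem as `nested.json` (two top-level arrays of different lengths) and wraps it in `StringIO` so that `read_json()` treats it as a file-like object. The exact wording of the error message may differ between Pandas versions:

```python
from io import StringIO
import pandas as pd

# Made-up JSON text with the same shape problem as nested.json:
# the two top-level arrays have different lengths (2 and 1).
text = '{"article": [{"id": "01"}, {"id": "02"}], "blog": [{"name": "Datacamp"}]}'

try:
    pd.read_json(StringIO(text))
    error = None
except ValueError as exc:
    # Pandas cannot build columns of unequal length.
    error = exc

print(repr(error))
```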

We can see that it doesn't work. Fortunately, there is another way. It is not a Pandas function but the `load()` function from the `json` package, which comes with core Python.

In [6]:
import json
# load the JSON file into a Python object
with open('data/nested.json') as f:
    nested_json = json.load(f)
print(nested_json)
print(type(nested_json))

{'article': [{'id': '01', 'language': 'JSON', 'edition': 'first', 'author': 'Allen'}, {'id': '02', 'language': 'Python', 'edition': 'second', 'author': 'Aditya Sharma'}], 'blog': [{'name': 'Datacamp', 'URL': 'datacamp.com'}]}
<class 'dict'>


We can see that the file is automatically loaded as a Python dictionary. The `pprint` package pretty-prints dictionaries, which makes nested JSON responses much easier for humans to read. To turn the dictionary into a DataFrame, we will use the Pandas function `json_normalize()`.
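
Before moving on to `json_normalize()`, here is a minimal sketch of pretty printing with `pprint`. The dictionary repeats the content printed above so that the snippet runs on its own:

```python
from pprint import pprint, pformat

# Same content as the nested_json dictionary printed above, repeated here
# so the snippet runs on its own.
nested_json = {
    "article": [
        {"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
        {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"},
    ],
    "blog": [{"name": "Datacamp", "URL": "datacamp.com"}],
}

# pprint breaks long structures across lines and indents nested levels.
# By default it also sorts dictionary keys alphabetically.
pprint(nested_json, width=80)

# pformat returns the same text as a string, e.g. for logging.
formatted = pformat(nested_json, width=80)
```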

In [7]:
from pandas import json_normalize  # top-level import (pandas >= 1.0); pandas.io.json.json_normalize is deprecated
json_normalize(nested_json)


|   | article | blog |
|---|---|---|
| 0 | [{'id': '01', 'language': 'JSON', 'edition': '... | [{'name': 'Datacamp', 'URL': 'datacamp.com'}] |


We can see from above that the top-level keys became the columns of the DataFrame. We were able to load it as a Pandas DataFrame, but each cell still holds a whole list of records, so it is not yet in a usable shape.

We are going to add the parameter `record_path` to `json_normalize()` to focus on a specific key from the file:

In [8]:
blog = json_normalize(nested_json, record_path='blog')
blog.head()


|   | name | URL |
|---|---|---|
| 0 | Datacamp | datacamp.com |


In [9]:
article = json_normalize(nested_json, record_path='article')
article.head()


|   | id | language | edition | author |
|---|---|---|---|---|
| 0 | 01 | JSON | first | Allen |
| 1 | 02 | Python | second | Aditya Sharma |
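
For flat records like these, `record_path` simply selects the list stored under that key. The sketch below (assuming pandas 1.0 or newer, where `pd.json_normalize` is available at the top level) compares it with passing that list straight to the `DataFrame` constructor:

```python
import pandas as pd

# Same content as nested.json, repeated so the snippet runs on its own.
nested_json = {
    "article": [
        {"id": "01", "language": "JSON", "edition": "first", "author": "Allen"},
        {"id": "02", "language": "Python", "edition": "second", "author": "Aditya Sharma"},
    ],
    "blog": [{"name": "Datacamp", "URL": "datacamp.com"}],
}

# record_path="article" expands the list stored under the "article" key.
via_normalize = pd.json_normalize(nested_json, record_path="article")

# Passing that list directly to the constructor gives the same rows here,
# because the records contain no further nesting.
via_constructor = pd.DataFrame(nested_json["article"])

print(via_normalize.to_dict("records") == via_constructor.to_dict("records"))
```

The difference only shows up once the records themselves are nested or when parent fields need to be carried along, which is what the remaining parameters are for.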


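`json_normalize()` has 3 main parameters:
* **data** - the input data (a dictionary or a list of dictionaries)
* **record_path** - path to the nested list whose elements become the rows
* **meta** - fields from the parent records that are kept as they are and attached to each row as columns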

Let's practice a bit more with `json_normalize()` on the data defined below:

In [10]:
# define the data: a list of nested dictionaries (already parsed, not a JSON string)
data = [{"state": "Florida", 
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [{"name": "Dade", "population": 12345},
                     {"name": "Broward", "population": 40000},
                     {"name": "Palm Beach", "population": 60000}]},
       {"state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [{"name": "Summit", "population": 1234},
                     {"name": "Cuyahoga", "population": 1337}]}]

json_normalize(data)


|   | state | shortname | counties | info.governor |
|---|---|---|---|---|
| 0 | Florida | FL | [{'name': 'Dade', 'population': 12345}, {'name... | Rick Scott |
| 1 | Ohio | OH | [{'name': 'Summit', 'population': 1234}, {'nam... | John Kasich |
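
Note the `info.governor` column: nested dictionaries are flattened into columns whose names join the keys with a dot. As a small sketch (using a trimmed-down version of the data and assuming pandas 1.0 or newer), the `sep` parameter changes that separator:

```python
import pandas as pd

# Trimmed-down version of the data above, keeping only the nested "info" dictionary.
data = [
    {"state": "Florida", "info": {"governor": "Rick Scott"}},
    {"state": "Ohio", "info": {"governor": "John Kasich"}},
]

# sep controls how nested keys are joined into a column name (default is ".").
flat = pd.json_normalize(data, sep="_")
print(list(flat.columns))
```

An underscore can be handy because such column names remain usable with attribute access, e.g. `flat.info_governor`.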


In [11]:
json_normalize(data=data, record_path='counties', meta=['state', 'shortname', ['info', 'governor']])


|   | name | population | state | shortname | info.governor |
|---|---|---|---|---|---|
| 0 | Dade | 12345 | Florida | FL | Rick Scott |
| 1 | Broward | 40000 | Florida | FL | Rick Scott |
| 2 | Palm Beach | 60000 | Florida | FL | Rick Scott |
| 3 | Summit | 1234 | Ohio | OH | John Kasich |
| 4 | Cuyahoga | 1337 | Ohio | OH | John Kasich |
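
To see what `record_path` and `meta` do conceptually, the same flattening can be sketched in plain Python. The helpers `get_path` and `flatten` are hypothetical names written for this illustration and are not part of Pandas; the real `json_normalize()` handles many more cases:

```python
data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]


def get_path(record, path):
    """Follow a key, or a list of keys, into nested dictionaries."""
    keys = [path] if isinstance(path, str) else path
    for key in keys:
        record = record[key]
    return record


def flatten(records, record_path, meta):
    """Hypothetical helper: one output row per child under record_path."""
    rows = []
    for parent in records:
        for child in parent[record_path]:
            row = dict(child)  # start with the child's own fields
            for path in meta:
                # Nested meta paths are named by joining their keys with a dot.
                name = path if isinstance(path, str) else ".".join(path)
                row[name] = get_path(parent, path)
            rows.append(row)
    return rows


rows = flatten(data, "counties", ["state", "shortname", ["info", "governor"]])
for row in rows:
    print(row)
```

Each county becomes one row, and the fields listed in `meta` are copied from its parent state onto that row.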
