## Process JSON using Pandas

Let us understand how to process JSON using Pandas.
* We can use `read_json` to read JSON documents from a file into a data frame.
* With `lines=True`, it works well with **customers.json**, where we have one valid JSON document per line.
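
To make the "one JSON document per line" layout concrete, here is a minimal sketch that uses a small in-memory sample instead of **customers.json**. The sample records and the names `sample` and `sample_df` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout: one complete JSON document per line.
sample = (
    '{"id": 1, "first_name": "Asha"}\n'
    '{"id": 2, "first_name": "Ben"}\n'
)

# lines=True tells read_json to parse each line as a separate record.
sample_df = pd.read_json(io.StringIO(sample), lines=True)
print(sample_df.shape)
```

Each line becomes one row, and the keys of the documents become the columns.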

In [1]:
import pandas as pd

In [15]:
pd.read_json?

Signature:
pd.read_json(
    path_or_buf=None,
    orient=None,
    typ='frame',
    dtype=None,
    convert_axes=None,
    convert_dates=True,
    keep_default_dates: bool = True,
    numpy: bool = False,
    precise_float: bool = False,
    date_unit=None,
    ...

In [4]:
pd.read_json('customers.json', lines=True)

Unnamed: 0,id,first_name,last_name,email,gender,ip_address
0,1,Frasco,Necolds,fnecolds0@vk.com,Male,243.67.63.34
1,2,Dulce,Santos,dsantos1@mashable.com,Female,60.30.246.227
2,3,Prissie,Tebbett,ptebbett2@infoseek.co.jp,Genderfluid,22.21.162.56
3,4,Schuyler,Coppledike,scoppledike3@gnu.org,Agender,120.35.186.161
4,5,Leopold,Jarred,ljarred4@wp.com,Agender,30.119.34.4
5,6,Joanna,Teager,jteager5@apache.org,Bigender,245.221.176.34
6,7,Lion,Beere,lbeere6@bloomberg.com,Polygender,105.54.139.46
7,8,Marabel,Wornum,mwornum7@posterous.com,Polygender,247.229.14.25
8,9,Helenka,Mullender,hmullender8@cloudflare.com,Non-binary,133.216.118.88
9,10,Christine,Swane,cswane9@shop-pro.jp,Polygender,86.16.210.164


* It is not straightforward to create a data frame from **youtube_playlist_items.json**, where we have a single JSON document with multiple attributes.
* We can extract **items** and create a data frame using `pd.DataFrame` by passing the list of dicts to it. Nested objects stay as dict values in their columns; `pd.json_normalize` flattens them into separate columns.
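
The idea can be sketched with a small in-memory document. The JSON below is a made-up stand-in that only mimics the shape of the playlist file (a single object whose `items` attribute holds a list); the names `raw`, `doc` and `items_df` are assumptions for this example.

```python
import json

import pandas as pd

# Hypothetical document: one JSON object with several attributes,
# one of which ("items") holds a list of nested objects.
raw = """
{
  "kind": "example#list",
  "items": [
    {"id": "a1", "contentDetails": {"videoId": "v1"}},
    {"id": "b2", "contentDetails": {"videoId": "v2"}}
  ]
}
"""

doc = json.loads(raw)
items = doc["items"]            # a plain Python list of dicts
items_df = pd.DataFrame(items)  # one row per item; nested dicts are kept as-is
print(items_df.columns.tolist())
```

The `contentDetails` column holds whole dicts here, which is why flattening is usually the next step.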

In [6]:
import json

In [7]:
type(open('youtube_playlist_items.json'))

_io.TextIOWrapper

In [10]:
json.load(open('youtube_playlist_items.json'))['items'][0]

{'kind': 'youtube#playlistItem',
 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
 'contentDetails': {'videoId': 'ETZJln4jtAo',
  'videoPublishedAt': '2020-11-28T16:29:47Z'},
 'status': {'privacyStatus': 'public'}}

In [11]:
with open('youtube_playlist_items.json') as f:  # the file is closed when the block ends
    yt_items = json.load(f)['items']

In [12]:
pd.DataFrame(yt_items)

Unnamed: 0,kind,etag,id,contentDetails,status
0,youtube#playlistItem,SGHDydc4dLsY2RjfXTPneb_zc_s,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,"{'videoId': 'ETZJln4jtAo', 'videoPublishedAt':...",{'privacyStatus': 'public'}
1,youtube#playlistItem,5EFUNhJBvcwXPxO416VYQsXGzMo,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,"{'videoId': '1OVHjHTkP3M', 'videoPublishedAt':...",{'privacyStatus': 'public'}
2,youtube#playlistItem,TiKqB2aeYxJjMGKQ0yLMJY0vpQE,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,"{'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt':...",{'privacyStatus': 'public'}
3,youtube#playlistItem,vQrJOpYdXmGJuV32kjj2xqvSByc,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,"{'videoId': 'rLTbhSaXhSM', 'videoPublishedAt':...",{'privacyStatus': 'public'}
4,youtube#playlistItem,2CzGUToIgqywXAr4wuPswj9MuFg,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,"{'videoId': 'wP7BhXrJKR8', 'videoPublishedAt':...",{'privacyStatus': 'public'}


In [13]:
pd.json_normalize(yt_items) # nested JSON objects are flattened into dotted column names

Unnamed: 0,kind,etag,id,contentDetails.videoId,contentDetails.videoPublishedAt,status.privacyStatus
0,youtube#playlistItem,SGHDydc4dLsY2RjfXTPneb_zc_s,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,ETZJln4jtAo,2020-11-28T16:29:47Z,public
1,youtube#playlistItem,5EFUNhJBvcwXPxO416VYQsXGzMo,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,1OVHjHTkP3M,2020-11-28T16:30:12Z,public
2,youtube#playlistItem,TiKqB2aeYxJjMGKQ0yLMJY0vpQE,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,qfUbPLsLQcQ,2020-11-28T16:30:33Z,public
3,youtube#playlistItem,vQrJOpYdXmGJuV32kjj2xqvSByc,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,rLTbhSaXhSM,2020-11-28T16:30:52Z,public
4,youtube#playlistItem,2CzGUToIgqywXAr4wuPswj9MuFg,UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy...,wP7BhXrJKR8,2020-11-28T16:31:14Z,public
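
The dotted column names come from the default separator of `pd.json_normalize`. As a small sketch, the `sep` argument can be used to pick a different separator; the `items` list below is hypothetical data shaped like the playlist items.

```python
import pandas as pd

# Hypothetical items with one level of nesting.
items = [
    {"id": "a1", "contentDetails": {"videoId": "v1"}, "status": {"privacyStatus": "public"}},
    {"id": "b2", "contentDetails": {"videoId": "v2"}, "status": {"privacyStatus": "public"}},
]

flat = pd.json_normalize(items)              # default separator is "."
flat_us = pd.json_normalize(items, sep="_")  # underscore-separated names

print(flat.columns.tolist())
print(flat_us.columns.tolist())
```

Underscore-separated names can be convenient when the columns are later accessed as attributes or exported to a database.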


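* Pandas supports other standard JSON layouts as well, selected with the `orient` argument.
* `to_json` returns the JSON as a string when no path is passed, which is how the output is displayed below. We can also write to files by passing a path to `to_json`.
* Here are examples with different supported JSON formats.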
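
Here is a minimal sketch of both uses of `to_json`: returning a string and writing a file. The data frame `small` and the file name `small.json` are made up for this example, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile

import pandas as pd

small = pd.DataFrame({"id": [1, 2], "first_name": ["Asha", "Ben"]})

# With no path, to_json returns the JSON as a string.
as_text = small.to_json(orient="records")

# With a path, to_json writes the file and returns None.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "small.json")  # hypothetical file name
    result = small.to_json(out_path, orient="records", lines=True)
    restored = pd.read_json(out_path, lines=True)

print(as_text)
```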

In [16]:
df = pd.read_json('customers.json', lines=True)

In [19]:
df

Unnamed: 0,id,first_name,last_name,email,gender,ip_address
0,1,Frasco,Necolds,fnecolds0@vk.com,Male,243.67.63.34
1,2,Dulce,Santos,dsantos1@mashable.com,Female,60.30.246.227
2,3,Prissie,Tebbett,ptebbett2@infoseek.co.jp,Genderfluid,22.21.162.56
3,4,Schuyler,Coppledike,scoppledike3@gnu.org,Agender,120.35.186.161
4,5,Leopold,Jarred,ljarred4@wp.com,Agender,30.119.34.4
5,6,Joanna,Teager,jteager5@apache.org,Bigender,245.221.176.34
6,7,Lion,Beere,lbeere6@bloomberg.com,Polygender,105.54.139.46
7,8,Marabel,Wornum,mwornum7@posterous.com,Polygender,247.229.14.25
8,9,Helenka,Mullender,hmullender8@cloudflare.com,Non-binary,133.216.118.88
9,10,Christine,Swane,cswane9@shop-pro.jp,Polygender,86.16.210.164


In [20]:
df.to_json(orient='split') # columns, index and data are separated

'{"columns":["id","first_name","last_name","email","gender","ip_address"],"index":[0,1,2,3,4,5,6,7,8,9],"data":[[1,"Frasco","Necolds","fnecolds0@vk.com","Male","243.67.63.34"],[2,"Dulce","Santos","dsantos1@mashable.com","Female","60.30.246.227"],[3,"Prissie","Tebbett","ptebbett2@infoseek.co.jp","Genderfluid","22.21.162.56"],[4,"Schuyler","Coppledike","scoppledike3@gnu.org","Agender","120.35.186.161"],[5,"Leopold","Jarred","ljarred4@wp.com","Agender","30.119.34.4"],[6,"Joanna","Teager","jteager5@apache.org","Bigender","245.221.176.34"],[7,"Lion","Beere","lbeere6@bloomberg.com","Polygender","105.54.139.46"],[8,"Marabel","Wornum","mwornum7@posterous.com","Polygender","247.229.14.25"],[9,"Helenka","Mullender","hmullender8@cloudflare.com","Non-binary","133.216.118.88"],[10,"Christine","Swane","cswane9@shop-pro.jp","Polygender","86.16.210.164"]]}'

In [21]:
pd.read_json(_, orient='split') # `_` holds the previous cell's output, so the data frame is created from the JSON string in the buffer

Unnamed: 0,id,first_name,last_name,email,gender,ip_address
0,1,Frasco,Necolds,fnecolds0@vk.com,Male,243.67.63.34
1,2,Dulce,Santos,dsantos1@mashable.com,Female,60.30.246.227
2,3,Prissie,Tebbett,ptebbett2@infoseek.co.jp,Genderfluid,22.21.162.56
3,4,Schuyler,Coppledike,scoppledike3@gnu.org,Agender,120.35.186.161
4,5,Leopold,Jarred,ljarred4@wp.com,Agender,30.119.34.4
5,6,Joanna,Teager,jteager5@apache.org,Bigender,245.221.176.34
6,7,Lion,Beere,lbeere6@bloomberg.com,Polygender,105.54.139.46
7,8,Marabel,Wornum,mwornum7@posterous.com,Polygender,247.229.14.25
8,9,Helenka,Mullender,hmullender8@cloudflare.com,Non-binary,133.216.118.88
9,10,Christine,Swane,cswane9@shop-pro.jp,Polygender,86.16.210.164
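
Outside an interactive session there is no `_` variable, and recent Pandas releases (2.1 onward, as far as I am aware) warn when a literal JSON string is passed to `read_json`. A hedged alternative is to wrap the string in `io.StringIO`; the data frame `small` below is made up for illustration.

```python
import io

import pandas as pd

small = pd.DataFrame({"id": [1, 2], "first_name": ["Asha", "Ben"]})

# Serialize with columns, index and data separated.
split_text = small.to_json(orient="split")

# Wrap the string in a file-like object before handing it to read_json.
round_trip = pd.read_json(io.StringIO(split_text), orient="split")
print(round_trip)
```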


In [22]:
df.to_json(orient='records') # a single JSON array with one document per row

'[{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"},{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"},{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"},{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"},{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"},{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"},{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"},{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@postero

In [23]:
pd.read_json(_) # orient defaults to None ('columns' for a data frame), which also parses an array of records

Unnamed: 0,id,first_name,last_name,email,gender,ip_address
0,1,Frasco,Necolds,fnecolds0@vk.com,Male,243.67.63.34
1,2,Dulce,Santos,dsantos1@mashable.com,Female,60.30.246.227
2,3,Prissie,Tebbett,ptebbett2@infoseek.co.jp,Genderfluid,22.21.162.56
3,4,Schuyler,Coppledike,scoppledike3@gnu.org,Agender,120.35.186.161
4,5,Leopold,Jarred,ljarred4@wp.com,Agender,30.119.34.4
5,6,Joanna,Teager,jteager5@apache.org,Bigender,245.221.176.34
6,7,Lion,Beere,lbeere6@bloomberg.com,Polygender,105.54.139.46
7,8,Marabel,Wornum,mwornum7@posterous.com,Polygender,247.229.14.25
8,9,Helenka,Mullender,hmullender8@cloudflare.com,Non-binary,133.216.118.88
9,10,Christine,Swane,cswane9@shop-pro.jp,Polygender,86.16.210.164
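
Besides `split` and `records`, `to_json` accepts a few other values for `orient`. The sketch below prints three of them for a tiny made-up data frame (`small`) so the layouts are easy to compare side by side.

```python
import pandas as pd

small = pd.DataFrame({"id": [1, 2], "first_name": ["Asha", "Ben"]})

# Serialize the same data frame with three different layouts.
layouts = {
    orient: small.to_json(orient=orient)
    for orient in ("columns", "index", "values")
}

for orient, text in layouts.items():
    print(orient, text)
```

`columns` nests values by column and then by index label, `index` nests them by index label and then by column, and `values` keeps only the data as a list of rows.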


In [24]:
df.to_json(orient='records', lines=True) # multiple JSON documents, one per line

'{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"}\n{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"}\n{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"}\n{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"}\n{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"}\n{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"}\n{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"}\n{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@p

In [25]:
pd.read_json(_, lines=True) # with lines=True, each line is parsed as one record

Unnamed: 0,id,first_name,last_name,email,gender,ip_address
0,1,Frasco,Necolds,fnecolds0@vk.com,Male,243.67.63.34
1,2,Dulce,Santos,dsantos1@mashable.com,Female,60.30.246.227
2,3,Prissie,Tebbett,ptebbett2@infoseek.co.jp,Genderfluid,22.21.162.56
3,4,Schuyler,Coppledike,scoppledike3@gnu.org,Agender,120.35.186.161
4,5,Leopold,Jarred,ljarred4@wp.com,Agender,30.119.34.4
5,6,Joanna,Teager,jteager5@apache.org,Bigender,245.221.176.34
6,7,Lion,Beere,lbeere6@bloomberg.com,Polygender,105.54.139.46
7,8,Marabel,Wornum,mwornum7@posterous.com,Polygender,247.229.14.25
8,9,Helenka,Mullender,hmullender8@cloudflare.com,Non-binary,133.216.118.88
9,10,Christine,Swane,cswane9@shop-pro.jp,Polygender,86.16.210.164
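
One practical benefit of the one-document-per-line layout is that it can be read in pieces. As a sketch, `read_json` accepts `chunksize` together with `lines=True` and then yields data frames of at most that many rows. The data frame `small` and the chunk size of 2 are arbitrary choices for this example.

```python
import io

import pandas as pd

small = pd.DataFrame({"id": [1, 2, 3], "first_name": ["Asha", "Ben", "Chen"]})
jsonl_text = small.to_json(orient="records", lines=True)

# With chunksize, read_json returns an iterator of data frames
# instead of a single data frame.
chunk_sizes = []
with pd.read_json(io.StringIO(jsonl_text), lines=True, chunksize=2) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))

print(chunk_sizes)
```

This keeps memory use bounded when the file is too large to load at once.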
