## Single JSON Document in Files

Let us understand how to process a single JSON document stored in a file. We can use either the `json` or the `pandas` module for this. For now, we will focus on the `json` module.
* Here are the files used for the demo.
  * **single_document.json**
  * **youtube_playlist_items.json** - This is an example of a REST API call that returns its results as a list. The list is the value of one of the attributes in the response JSON.
* Here are the steps to review these documents in the Jupyter environment.
  * Go to the sidebar and select the file.
  * Right-click the file and choose **Open With -> Editor**.
  * The JSON file opens as plain (raw) text.
* Each of the documents contains a single JSON object.
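Because both files are plain text, you can also look at the raw content from Python instead of the Jupyter editor. The sketch below writes a small sample file first so that it runs on its own; the file name `sample_document.json` and the temporary directory are assumptions, and the record is copied from the output shown later in this section.

```python
import json
import os
import tempfile

# Hypothetical sample file modeled on single_document.json (created so the sketch is self-contained)
sample = {'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds',
          'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
path = os.path.join(tempfile.mkdtemp(), 'sample_document.json')
with open(path, 'w') as fp:
    json.dump(sample, fp)

# Reading the file without parsing gives one plain string, like the editor view
with open(path) as fp:
    raw_text = fp.read()

print(type(raw_text))  # <class 'str'>
print(raw_text)
```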

Here are the steps to process a file that contains a single JSON document. Use `json.load`, passing it a file object (`_io.TextIOWrapper`).
* Create a file object by passing the path of the file to `open`.
* Invoke `json.load`, passing the file object as the argument.
* It returns a `dict` (because the top-level JSON value is an object).
* We can use regular dict operations to process the data further.
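The steps above can be sketched as follows, using a `with` block so that the file is closed once parsing is done. The temporary path and the trimmed record are assumptions made so the snippet runs on its own.

```python
import json
import os
import tempfile

# Hypothetical setup: write a small file shaped like single_document.json
path = os.path.join(tempfile.mkdtemp(), 'single_document.json')
with open(path, 'w') as fp:
    fp.write('{"id": 1, "first_name": "Frasco", "last_name": "Necolds"}')

# Step 1: create a file object from the path (the with block closes it afterwards)
with open(path) as fp:
    # Step 2: pass the file object to json.load
    single_json = json.load(fp)

# Step 3: a top-level JSON object comes back as a dict
print(type(single_json))  # <class 'dict'>
# Step 4: regular dict operations apply
print(single_json['first_name'])  # Frasco
```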

In [1]:
import json

In [3]:
!ls -ltr single_document.json

-rw-rw-r-- 1 itversity itversity 154 Jun 16 19:00 single_document.json


In [4]:
type('single_document.json')

str

In [None]:
json.load?

In [7]:
type(open('single_document.json'))

_io.TextIOWrapper
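`json.load` is not limited to `_io.TextIOWrapper`: it accepts any file-like object that has a `.read()` method. As a small illustration, where the in-memory buffer stands in for a real file, `io.StringIO` works the same way, while `json.loads` parses a string directly.

```python
import io
import json

text = '{"id": 1, "first_name": "Frasco"}'

# An in-memory text buffer behaves like an open text file
buffer = io.StringIO(text)
from_file_like = json.load(buffer)

# json.loads (with an "s") parses a str instead of a file object
from_string = json.loads(text)

print(from_file_like == from_string)  # True
```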

In [8]:
with open('single_document.json') as fp:
    single_json = json.load(fp)

In [9]:
single_json

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

In [10]:
type(single_json)

dict

In [11]:
single_json.keys()

dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])

In [12]:
single_json.values()

dict_values([1, 'Frasco', 'Necolds', 'fnecolds0@vk.com', 'Male', '243.67.63.34'])

In [13]:
single_json.items()

dict_items([('id', 1), ('first_name', 'Frasco'), ('last_name', 'Necolds'), ('email', 'fnecolds0@vk.com'), ('gender', 'Male'), ('ip_address', '243.67.63.34')])

In [14]:
single_json['first_name']

'Frasco'
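Here are a few more dict operations that are often handy on parsed JSON. The key `phone` below is hypothetical and deliberately absent; it shows how a missing attribute can be handled without a `KeyError`. The record is copied from the output above.

```python
single_json = {'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds',
               'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}

# .get returns a default instead of raising KeyError for a missing key
phone = single_json.get('phone', 'N/A')  # 'phone' is a hypothetical key
print(phone)  # N/A

# Membership test on keys
print('email' in single_json)  # True

# Iterate over key/value pairs
for key, value in single_json.items():
    print(f'{key}: {value}')

# Derive a new value from several attributes
full_name = f"{single_json['first_name']} {single_json['last_name']}"
print(full_name)  # Frasco Necolds
```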

The file **youtube_playlist_items.json** is an example of a YouTube Data API response. It has a more complex, nested JSON structure.

* First, let us understand what a YouTube playlist is.
  * A YouTube playlist is simply a series of videos.
  * A playlist also has a name, a URL, and a description.
  * Each video has a video ID and other attributes.
  * The result for YouTube playlist items contains both playlist-level details and details about the videos in the playlist.
  * The video details are available in an attribute called **items**. The value of **items** is a JSON array.
* You can follow the same steps as above to read the JSON in **youtube_playlist_items.json** into a dict.
* However, the dict has a nested structure: **items** is of type `list`.
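To make the nesting concrete, here is a trimmed stand-in for the response. It keeps only two items and a subset of the attributes shown in the output below (the real file has more). It walks from the top-level dict into the `items` list and then into each item's `contentDetails`.

```python
# Trimmed stand-in for youtube_playlist_items.json (two items only)
results_json = {
    'kind': 'youtube#playlistItemListResponse',
    'nextPageToken': 'CAUQAA',
    'items': [
        {'kind': 'youtube#playlistItem',
         'contentDetails': {'videoId': 'ETZJln4jtAo',
                            'videoPublishedAt': '2020-11-28T16:29:47Z'},
         'status': {'privacyStatus': 'public'}},
        {'kind': 'youtube#playlistItem',
         'contentDetails': {'videoId': '1OVHjHTkP3M',
                            'videoPublishedAt': '2020-11-28T16:30:12Z'},
         'status': {'privacyStatus': 'public'}},
    ],
}

# Top level: a dict holding playlist-level attributes
print(type(results_json))  # <class 'dict'>
# 'items' (a JSON array) becomes a Python list
items = results_json['items']
print(type(items), len(items))  # <class 'list'> 2
# Each element is again a dict; chain keys to reach nested values
print(items[0]['contentDetails']['videoId'])  # ETZJln4jtAo
```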

In [15]:
with open('youtube_playlist_items.json') as fp:
    results_json = json.load(fp)

In [16]:
results_json

{'kind': 'youtube#playlistItemListResponse',
 'etag': 'lfs_qWNaczIydJ2Dlp1gmX9UTAc',
 'nextPageToken': 'CAUQAA',
 'items': [{'kind': 'youtube#playlistItem',
   'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
   'contentDetails': {'videoId': 'ETZJln4jtAo',
    'videoPublishedAt': '2020-11-28T16:29:47Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1',
   'contentDetails': {'videoId': '1OVHjHTkP3M',
    'videoPublishedAt': '2020-11-28T16:30:12Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw',
   'contentDetails': {'videoId': 'qfUbPLsLQcQ',
    'videoPublishedAt': '2020-11-28T16:30:33Z'},
   'status': {'privacyStatu

In [17]:
# Reading items. It contains the details of the videos in the playlist.
results_json['items']

[{'kind': 'youtube#playlistItem',
  'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
  'contentDetails': {'videoId': 'ETZJln4jtAo',
   'videoPublishedAt': '2020-11-28T16:29:47Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1',
  'contentDetails': {'videoId': '1OVHjHTkP3M',
   'videoPublishedAt': '2020-11-28T16:30:12Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw',
  'contentDetails': {'videoId': 'qfUbPLsLQcQ',
   'videoPublishedAt': '2020-11-28T16:30:33Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJ

In [18]:
type(results_json['items'])

list

In [19]:
results_json['items'][0]

{'kind': 'youtube#playlistItem',
 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
 'contentDetails': {'videoId': 'ETZJln4jtAo',
  'videoPublishedAt': '2020-11-28T16:29:47Z'},
 'status': {'privacyStatus': 'public'}}

In [20]:
results_json['items'][0]['contentDetails']

{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}

In [21]:
# Here is an example of printing item details.
for playlist_item in results_json['items']:
    print(playlist_item)

{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetai
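Printing a whole item puts it on one long line, which is hard to scan. `json.dumps` with `indent` can pretty-print each item instead. The single item below is copied from the output above, with `id` dropped for brevity, and stands in for the loop variable.

```python
import json

# Trimmed copy of one playlist item (stand-in for the loop variable)
playlist_item = {'kind': 'youtube#playlistItem',
                 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
                 'contentDetails': {'videoId': 'ETZJln4jtAo',
                                    'videoPublishedAt': '2020-11-28T16:29:47Z'},
                 'status': {'privacyStatus': 'public'}}

# indent=2 spreads nested objects over multiple indented lines
pretty = json.dumps(playlist_item, indent=2)
print(pretty)
```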

In [22]:
# Here is an example of getting only contentDetails for each item.
for playlist_item in results_json['items']:
    print(playlist_item['contentDetails'])

{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}
{'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}
{'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}
{'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}
{'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}


In [23]:
# Here is how you can get video IDs (using the map function)
list(
    map(
        lambda playlist_item: playlist_item['contentDetails']['videoId'],
        results_json['items']
    )
)

['ETZJln4jtAo', '1OVHjHTkP3M', 'qfUbPLsLQcQ', 'rLTbhSaXhSM', 'wP7BhXrJKR8']
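A list comprehension is a common alternative to `map` with a `lambda`, and it yields the same list. The two-element `items` list here is a trimmed stand-in for `results_json['items']`.

```python
# Trimmed stand-in for results_json['items']
items = [
    {'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}},
    {'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}},
]

# Same result as list(map(lambda playlist_item: ..., items))
video_ids = [playlist_item['contentDetails']['videoId'] for playlist_item in items]
print(video_ids)  # ['ETZJln4jtAo', '1OVHjHTkP3M']

via_map = list(map(lambda playlist_item: playlist_item['contentDetails']['videoId'], items))
print(video_ids == via_map)  # True
```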

In [24]:
list(
    map(
        lambda playlist_item: playlist_item['contentDetails'],
        results_json['items']
    )
)

[{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'},
 {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'},
 {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'},
 {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'},
 {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}]
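One possible next step is to index the publish timestamps by video ID so that they can be looked up or compared. The sketch below assumes every `videoPublishedAt` value follows the `YYYY-MM-DDTHH:MM:SSZ` pattern seen in the output above and treats the trailing `Z` as UTC. The list is copied from the previous cell's output.

```python
from datetime import datetime, timezone

# Output of the previous cell, copied as a stand-in
content_details = [
    {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'},
    {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'},
    {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'},
    {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'},
    {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'},
]

# Map each video ID to a timezone-aware datetime (assumed format, 'Z' read as UTC)
published_at = {
    detail['videoId']: datetime.strptime(
        detail['videoPublishedAt'], '%Y-%m-%dT%H:%M:%SZ'
    ).replace(tzinfo=timezone.utc)
    for detail in content_details
}

print(published_at['qfUbPLsLQcQ'])  # 2020-11-28 16:30:33+00:00
# Video ID with the latest publish time in this page of results
latest = max(published_at, key=published_at.get)
print(latest)  # wP7BhXrJKR8
```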