
Extract Nested Data From Complex JSON

Using the Google Maps API as an Example:

Google Maps is actually a collection of APIs; the one used here is the Google Maps Distance Matrix API. With a single API call, a user can calculate the travel distance and time between one or more origins and many destinations (up to the per-request limits the API enforces).

In [None]:
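# main.py
# Here's an example of the types of parameters this request accepts:

"""Fetch and extract JSON data from Google Maps."""
import requests
from config import API_KEY  # config.py is expected to define API_KEY


def google_maps_distance():
    """Fetch the distance between two points."""
    endpoint = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = {
        'units': 'imperial',  # affects only the human-readable "text" fields
        'key': API_KEY,
        'origins': 'New York City, NY',
        'destinations': 'Philadelphia,PA',
        'mode': 'driving',  # travel mode; "transit_mode" only applies when mode is "transit"
    }
    r = requests.get(endpoint, params=params)
    r.raise_for_status()  # raise an exception on HTTP errors
    return r.json()  # parse the JSON body into a Python dict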
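The `destinations` parameter accepts several places separated by a pipe (`|`). As a rough sketch of what gets sent, the standard library's `urllib.parse.urlencode` can build the query string offline; `requests` performs similar encoding when it receives `params=`. The `YOUR_API_KEY` value is a placeholder, and no network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://maps.googleapis.com/maps/api/distancematrix/json"

# Several destinations are joined with "|" into a single parameter value.
destinations = ["Washington,DC", "Philadelphia,PA", "Miami,FL"]

params = {
    "units": "imperial",
    "key": "YOUR_API_KEY",  # placeholder, not a real key
    "origins": "New York City,NY",
    "destinations": "|".join(destinations),
    "mode": "driving",
}

# Build the full request URL offline: spaces become "+", "," becomes "%2C",
# and "|" becomes "%7C".
url = ENDPOINT + "?" + urlencode(params)
print(url)

# Decoding the query string recovers the original pipe-separated value.
decoded = parse_qs(urlsplit(url).query)
print(decoded["destinations"][0].split("|"))
```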

In [None]:
# One origin, one destination. The JSON response for a request this straightforward is quite simple:

{
  "destination_addresses": [
    "Philadelphia, PA, USA"
  ],
  "origin_addresses": [
    "New York, NY, USA"
  ],
  "rows": [{
    "elements": [{
      "distance": {
        "text": "94.6 mi",
        "value": 152193
      },
      "duration": {
        "text": "1 hour 44 mins",
        "value": 6227
      },
      "status": "OK"
    }]
  }],
  "status": "OK"
}

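* For each destination, we get two data points: the commute distance and the estimated duration.
    * Each comes as a human-readable `text` string plus a numeric `value` (meters for distance, seconds for duration, regardless of the `units` setting).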
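With a response this small, a single field can be read by indexing each level explicitly. The sketch below uses a trimmed copy of the JSON above; the miles conversion and the hours/minutes split are illustrative additions, not part of the API. Google's `text` rounds to the nearest minute, which is why 6227 seconds displays as "1 hour 44 mins" while the integer split gives 43 minutes.

```python
# Trimmed copy of the single-destination response shown above.
response = {
    "destination_addresses": ["Philadelphia, PA, USA"],
    "origin_addresses": ["New York, NY, USA"],
    "rows": [{
        "elements": [{
            "distance": {"text": "94.6 mi", "value": 152193},
            "duration": {"text": "1 hour 44 mins", "value": 6227},
            "status": "OK",
        }]
    }],
    "status": "OK",
}

# rows[0] is the first (only) origin; elements[0] is the first destination.
element = response["rows"][0]["elements"][0]

distance_text = element["distance"]["text"]  # human-readable string
meters = element["distance"]["value"]        # distance in meters
seconds = element["duration"]["value"]       # duration in seconds

# Convert the numeric values; 1 mile = 1609.344 meters.
miles = round(meters / 1609.344, 1)
hours, remainder = divmod(seconds, 3600)
minutes = remainder // 60

print(distance_text, miles)        # 94.6 mi 94.6
print(hours, "h", minutes, "min")  # 1 h 43 min
```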

In [None]:
# main.py
# Now add a few more stops on our trip:
import requests
from config import API_KEY


def google_maps_distance():
    """Fetch distances from one origin to several destinations."""
    endpoint = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = {
        'units': 'imperial',
        'key': API_KEY,
        'origins': 'New York City, NY',
        # Multiple destinations are separated by a pipe character.
        'destinations': 'Washington,DC|Philadelphia,PA|Santa Barbara,CA|Miami,FL|Austin,TX|Napa County,CA',
        'mode': 'driving',
    }
    r = requests.get(endpoint, params=params)
    r.raise_for_status()
    return r.json()

In [None]:
# Output of google_maps_distance()

{
  "destination_addresses": [
    "Washington, DC, USA",
    "Philadelphia, PA, USA",
    "Santa Barbara, CA, USA",
    "Miami, FL, USA",
    "Austin, TX, USA",
    "Napa County, CA, USA"
  ],
  "origin_addresses": [
    "New York, NY, USA"
  ],
  "rows": [{
    "elements": [{
        "distance": {
          "text": "227 mi",
          "value": 365468
        },
        "duration": {
          "text": "3 hours 54 mins",
          "value": 14064
        },
        "status": "OK"
      },
      {
        "distance": {
          "text": "94.6 mi",
          "value": 152193
        },
        "duration": {
          "text": "1 hour 44 mins",
          "value": 6227
        },
        "status": "OK"
      },
      {
        "distance": {
          "text": "2,878 mi",
          "value": 4632197
        },
        "duration": {
          "text": "1 day 18 hours",
          "value": 151772
        },
        "status": "OK"
      },
      {
        "distance": {
          "text": "1,286 mi",
          "value": 2069031
        },
        "duration": {
          "text": "18 hours 43 mins",
          "value": 67405
        },
        "status": "OK"
      },
      {
        "distance": {
          "text": "1,742 mi",
          "value": 2802972
        },
        "duration": {
          "text": "1 day 2 hours",
          "value": 93070
        },
        "status": "OK"
      },
      {
        "distance": {
          "text": "2,871 mi",
          "value": 4620514
        },
        "duration": {
          "text": "1 day 18 hours",
          "value": 152913
        },
        "status": "OK"
      }
    ]
  }],
  "status": "OK"
}

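The response mixes objects and lists: `rows` is a list of objects, each holding an `elements` list of objects, and all of it sits inside the top-level object. Digging a value out by hand means indexing through every one of those levels.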
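For comparison, here is a sketch of the manual approach on a trimmed copy of the response above (first two destinations only). It relies on the Distance Matrix layout: `rows` line up with `origin_addresses`, and each row's `elements` line up with `destination_addresses`. Every level of nesting needs its own loop, which is the bookkeeping a recursive helper avoids.

```python
# Trimmed copy of the multi-destination response above (first two destinations).
response = {
    "destination_addresses": ["Washington, DC, USA", "Philadelphia, PA, USA"],
    "origin_addresses": ["New York, NY, USA"],
    "rows": [{
        "elements": [
            {"distance": {"text": "227 mi", "value": 365468},
             "duration": {"text": "3 hours 54 mins", "value": 14064},
             "status": "OK"},
            {"distance": {"text": "94.6 mi", "value": 152193},
             "duration": {"text": "1 hour 44 mins", "value": 6227},
             "status": "OK"},
        ]
    }],
    "status": "OK",
}

trips = []
# One row per origin; within a row, one element per destination, in order.
for origin, row in zip(response["origin_addresses"], response["rows"]):
    for destination, element in zip(response["destination_addresses"], row["elements"]):
        if element["status"] != "OK":
            continue  # skip destinations the API could not route
        trips.append((origin, destination,
                      element["distance"]["text"], element["duration"]["text"]))

for trip in trips:
    print(trip)
```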

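Rather than indexing through each level by hand, the `json_extract()` function below walks the whole tree recursively and collects every value stored under a given key. Saved as `extract.py`, it can be imported as a module into any project.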

In [None]:
# extract.py

"""Extract nested values from a JSON tree."""


def json_extract(obj, key):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                # Containers are searched recursively, so a key whose value
                # is a dict or list is never itself collected.
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)  # scalar value under a matching key
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
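One behavior to be aware of: `json_extract()` checks whether a value is a dict or list *before* comparing the key, so a key whose value is itself an object (such as `distance`) is never returned; only scalar values are collected. If you need those as well, a generator-based variant is one option. The name `iter_values()` is hypothetical and this is a sketch, not part of the original `extract.py`.

```python
def iter_values(obj, key):
    """Yield every value stored under `key`, at any depth.

    Unlike json_extract(), a matching key is yielded even when its value is
    a dict or list, and the search still continues inside that value.
    """
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from iter_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)


element = {
    "distance": {"text": "94.6 mi", "value": 152193},
    "duration": {"text": "1 hour 44 mins", "value": 6227},
    "status": "OK",
}

print(list(iter_values(element, "text")))      # ['94.6 mi', '1 hour 44 mins']
print(list(iter_values(element, "distance")))  # [{'text': '94.6 mi', 'value': 152193}]
```

Because it is a generator, `iter_values()` can also stop early, for example `next(iter_values(data, "text"))` to grab only the first match.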

In [None]:
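# Usage of json_extract()
from extract import json_extract

# google_maps_distance() (defined above) returns r.json(), a Python dictionary.
data = google_maps_distance()

# Find every instance of `text`, wherever it is nested in the dictionary.
texts = json_extract(data, 'text')
print(texts)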

Regardless of where the key `text` lives in the JSON, `json_extract()` returns every value stored under that key. Here it is wired into `google_maps_distance()`:

In [None]:
# main.py
import requests
from config import API_KEY
from extract import json_extract


def google_maps_distance():
    """Fetch distances and return every "text" value in the response."""
    endpoint = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = {
        'units': 'imperial',
        'key': API_KEY,
        'origins': "New York City,NY",
        'destinations': "Washington,DC|Philadelphia,PA|Santa Barbara,CA|Miami,FL|Austin,TX|Napa County,CA",
        'mode': 'driving',
    }
    r = requests.get(endpoint, params=params)
    r.raise_for_status()
    travel_values = json_extract(r.json(), 'text')
    return travel_values

In [None]:
# Output of google_maps_distance()
['227 mi',
 '3 hours 54 mins',
 '94.6 mi',
 '1 hour 44 mins',
 '2,878 mi',
 '1 day 18 hours',
 '1,286 mi',
 '18 hours 43 mins',
 '1,742 mi',
 '1 day 2 hours',
 '2,871 mi',
 '1 day 18 hours'
 ]

The values alternate: each destination contributes its distance followed by its trip duration. Python's slice syntax can split this list into two lists:

In [None]:
# Parse every other value.
# google_maps_distance() (latest version above) returns the list of 'text' values.
my_values = google_maps_distance()

durations = my_values[1::2]  # odd indices (1, 3, 5, ...): durations
distances = my_values[0::2]  # even indices (0, 2, 4, ...): distances

print('Durations = ', durations)
print('Distances = ', distances)

In [None]:
# Output
Durations = [
    '3 hours 54 mins',
    '1 hour 44 mins',
    '1 day 18 hours',
    '18 hours 43 mins',
    '1 day 2 hours',
    '1 day 18 hours'
]
Distances = [
    '227 mi',
    '94.6 mi',
    '2,878 mi',
    '1,286 mi',
    '1,742 mi',
    '2,871 mi'
]
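The `text` values are strings such as `'2,878 mi'`, which are awkward for arithmetic. The numeric `value` fields follow the same distance/duration alternation (for this response, `json_extract(r.json(), 'value')` would return them in that order), so the same slicing applies. The sketch below hard-codes those numbers from the six-destination response above; the conversion to miles and hours is an illustrative addition.

```python
# Numeric `value` fields from the six-destination response above, in order:
# distance (meters), duration (seconds), distance, duration, ...
values = [365468, 14064, 152193, 6227, 4632197, 151772,
          2069031, 67405, 2802972, 93070, 4620514, 152913]

meters = values[0::2]   # even indices: distances
seconds = values[1::2]  # odd indices: durations

# Convert to whole miles and hours (1 mile = 1609.344 meters).
miles = [round(m / 1609.344) for m in meters]
hours = [round(s / 3600, 1) for s in seconds]

print(miles)  # [227, 95, 2878, 1286, 1742, 2871]
print(hours)  # [3.9, 1.7, 42.2, 18.7, 25.9, 42.5]
```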

Getting Creative With Lists

In [None]:
# Two related lists
column_names = ['index', 'first_name', 'last_name', 'join_date']
column_datatypes = ['integer', 'string', 'string', 'date']

In [None]:
# Python's built-in zip() function
# Zip two lists into a dictionary
schema_dict = dict(zip(column_names, column_datatypes))
print(schema_dict)

This outputs a dictionary in which the first list supplies the keys and the second list supplies the values:

In [None]:
# Zipped lists resulting in a dictionary

{
    'index': 'integer',
    'first_name': 'string',
    'last_name': 'string',
    'join_date': 'date'
}
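The same trick ties the Google Maps results together. A minimal sketch, assuming the destination names come from the response's `destination_addresses` field and the distances and durations come from the slicing step (distances from `my_values[0::2]`, durations from `my_values[1::2]`), hard-coded here so the block runs on its own:

```python
destinations = ["Washington, DC, USA", "Philadelphia, PA, USA", "Santa Barbara, CA, USA",
                "Miami, FL, USA", "Austin, TX, USA", "Napa County, CA, USA"]
distances = ["227 mi", "94.6 mi", "2,878 mi", "1,286 mi", "1,742 mi", "2,871 mi"]
durations = ["3 hours 54 mins", "1 hour 44 mins", "1 day 18 hours",
             "18 hours 43 mins", "1 day 2 hours", "1 day 18 hours"]

# The inner zip() pairs each distance with its duration by position;
# the outer zip() maps each destination to that (distance, duration) tuple.
trips = dict(zip(destinations, zip(distances, durations)))

for place, (distance, duration) in trips.items():
    print(f"{place}: {distance}, {duration}")
```

Note that `zip()` silently stops at the shortest input, so a missing value would drop a destination without any error. On Python 3.10 and later, passing `strict=True` makes `zip()` raise a `ValueError` when the lengths differ.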