# Taming ```JSON```

## Part 2: flattening nested data

JSON (JavaScript Object Notation) is a widely used, lightweight data-interchange format, but it can be difficult to parse into a flat table because its structure is often deeply nested.

Let's explore different ways to flatten nested ```JSON*```.

```*``` JSON's versatility makes customization limitless, which also means there is **NO "one-size-fits-all" solution** and often **no simple solution** for flattening all JSON data in the wild.



In [None]:
## import libraries
import pandas as pd
import requests
import json


#### Read this simple nested ```JSON``` formatted data:


In [None]:
## run this cell to capture json object in memory
json_obj = {
  "publication": "Toxic Water Research",
  "location": "Midwest",
  "reach": "Regional",
  "info": {
    "editor": "Dr. Emily Carter",
    "contacts": {
      "email": {
        "tips": "issues@toxicwaterresearch.com",
        "general": "contact@toxicwaterresearch.com"
      },
      "tel": "800123456"
    }
  }
}




In [None]:
## type of object


In [None]:
## let's try to turn it directly into a df 


## ```json_normalize()```

This pandas function flattens nested ```json``` into a table, turning each nested key into its own column.

#### Basic Syntax:

###### ```pd.json_normalize(list or dictionary)```
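As a minimal sketch of the basic call, the example below flattens one small nested dictionary. The `record` variable is a trimmed-down stand-in for `json_obj` and is only there to keep the example self-contained.

```python
import pandas as pd

# Trimmed-down stand-in for json_obj (illustrative, not the full record).
record = {
    "publication": "Toxic Water Research",
    "info": {
        "editor": "Dr. Emily Carter",
        "contacts": {"tel": "800123456"},
    },
}

# Nested keys become columns; parent and child names are joined with ".".
flat = pd.json_normalize(record)
print(flat.columns.tolist())
```

A single dictionary produces a single row. The separator can be changed with the `sep` parameter if dots are inconvenient in column names.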

In [None]:
#### Normalize json_obj, a single dictionary:


#### Normalize a list of dictionaries:

In [None]:
## a list of nested json

json_list_obj = [
{
  "publication": "Toxic Water Research",
  "location": "Midwest",
  "reach": "Regional",
  "info": {
    "editor": "Dr. Emily Carter",
    "contacts": {
      "email": {
        "tips": "issues@toxicwaterresearch.com",
        "general": "contact@toxicwaterresearch.com"
      },
      "tel": "800123456"
    }
  }
},
{
  "publication": "Chemical Contaminants Review",
  "location": "Northeast",
  "reach": "National",
  "info": {
    "editor": "Dr. Robert Mason",
    "contacts": {
      "email": {
        "tips": "issues@chemcontaminantsreview.com",
        "general": "contact@chemcontaminantsreview.com"
      },
      "tel": "800987654"
    }
  }
}

    
]

In [None]:
## type


In [None]:
## turn into a dataframe


In [None]:
## normalize list of dicts


In [None]:
## control the levels into which you want to enter the nest


In [None]:
## max level 3
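
The depth control mentioned in the cells above is the `max_level` parameter of `json_normalize()`. The sketch below loops over a few values to show how far each one unpacks the nest. The `record` and `columns_by_level` names are illustrative.

```python
import pandas as pd

# Trimmed-down stand-in for one publication record.
record = {
    "publication": "Toxic Water Research",
    "info": {
        "editor": "Dr. Emily Carter",
        "contacts": {
            "email": {"tips": "issues@toxicwaterresearch.com"},
            "tel": "800123456",
        },
    },
}

# max_level=0 leaves nested dictionaries untouched;
# each additional level unpacks one more layer.
columns_by_level = {}
for level in range(4):
    flat = pd.json_normalize(record, max_level=level)
    columns_by_level[level] = flat.columns.tolist()
    print(level, columns_by_level[level])
```

Any dictionary deeper than `max_level` stays in its cell as a Python dictionary, which is handy when only the outer layers matter.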


## Let's deal with nested lists

### Deeply nested mock data

In [None]:
nested_j = '''
{
	"data": [{
			"fundID": 1,
			"firstName": "John",
			"lastName": "Smith",
			"categories": [{
				"type": "hedge",
				"description": "Get Rich Fast"
			}],
			"under_investigation": false
		},
		{
			"fundID": 2,
			"firstName": "George",
			"lastName": "Santos",
			"categories": [{
				"type": "hedge",
				"description": "Ponzi"
			}],
			"under_investigation": true
		},
		{
			"fundID": 3,
			"firstName": "Sarah",
			"lastName": "Kepler",
			"categories": [{
				"type": "venture",
				"description": "Angel funding"
			}],
			"under_investigation": false
		},
		{
			"fundID": 4,
			"firstName": "Liz",
			"lastName": "Smith",
			"categories": [{
				"type": "mutual fund",
				"description": "slow and steady"
			}],
			"under_investigation": false
		}
	]
}
'''

In [None]:
#type of data


In [None]:
## load json into variable


In [None]:
## type of data


In [None]:
## try to convert to df using pd.DataFrame


In [None]:
## try using pd.json_normalize


## ```json_normalize()``` with ```record_path``` parameter

A record path names the key (or chain of keys) that leads to the list of records you want as rows.

####  Single path syntax:

###### ```pd.json_normalize(list or dictionary, record_path = "single_path")```

#### Multiple paths syntax:

###### ```pd.json_normalize(list or dictionary, record_path = ["first_path", "second_path"])```
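Here is a small sketch of both forms, using a shortened stand-in for `nested_j` (the `raw`, `funds`, and result names are illustrative). Note that a list passed to `record_path` is read as one chain of keys followed in order, not as several independent paths.

```python
import json
import pandas as pd

# Shortened stand-in for nested_j: a "data" key holding a list of records.
raw = '''
{"data": [
  {"fundID": 1, "lastName": "Smith",
   "categories": [{"type": "hedge", "description": "Get Rich Fast"}]},
  {"fundID": 3, "lastName": "Kepler",
   "categories": [{"type": "venture", "description": "Angel funding"}]}
]}
'''
funds = json.loads(raw)

# Without record_path, the whole list lands in a single cell.
unflattened = pd.json_normalize(funds)

# Single key: one row per item in the "data" list.
by_fund = pd.json_normalize(funds, record_path="data")

# Chain of keys: go into "data", then into "categories" inside each record.
by_category = pd.json_normalize(funds, record_path=["data", "categories"])

print(unflattened.shape, by_fund.shape, by_category.shape)
```

With the chained path, the fund-level fields (`fundID`, `lastName`) are no longer in the result; only the keys found inside `categories` remain.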

In [None]:
## add a record path


In [None]:
## provide record path to gun-data


## ```json_normalize()``` with ```record_path``` and ```meta``` parameters

- ```record_path``` specifies where the desired list of records (or rows) is located in the JSON structure. 

- ```meta``` allows you to include additional columns from higher or sibling levels of the JSON structure.

####  Syntax:

 ```pd.json_normalize(list or dictionary, record_path = "single_path", meta = [list of meta data items])```
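A minimal sketch of the two parameters together, assuming a list of fund records shaped like the ones in `nested_j` (the `funds` and `with_meta` names are illustrative):

```python
import pandas as pd

# Two fund records, each with a nested "categories" list.
funds = [
    {"fundID": 1, "lastName": "Smith",
     "categories": [{"type": "hedge", "description": "Get Rich Fast"}]},
    {"fundID": 2, "lastName": "Santos",
     "categories": [{"type": "hedge", "description": "Ponzi"}]},
]

# Rows come from "categories"; fundID and lastName are carried along from the parent.
with_meta = pd.json_normalize(
    funds,
    record_path="categories",
    meta=["fundID", "lastName"],
)
print(with_meta)
```

The record columns come first and the meta columns are appended after them, repeated once for every row taken from that parent.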


In [None]:
## code here

## Multiple nested data

In [None]:
data = [

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Canis Lupus",
			"common_name": "Gray Wolf",
			"relations": [{
				"domesticated": False,
				"social": True
			}]
		}]
	},
	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Panthera Leo",
			"common_name": "Lion",
			"relations": [{
				"domesticated": False,
				"social": True
			}]
		}]
	},

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Panthera Tigris",
			"common_name": "Tiger",
			"relations": [{
				"domesticated": False,
				"social": False
			}]
		}]
	},

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Equus ferus",
			"common_name": "Horse",
			"relations": [{
				"domesticated": True,
				"social": True
			}]
		}]
	}
]

In [None]:
## get top keys


In [None]:
## dig deeper to find what relations holds


In [None]:
## in the first dict, tap species to see what that key holds


In [None]:
## what happens if we straight up normalize


In [None]:
## provide record_paths to species


In [None]:
## subs


In [None]:
## pass multiple record paths


In [None]:
## pass meta data to it
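
One possible way to combine a chained `record_path` with `meta` on this kind of data is sketched below. The `animals` list is a shortened stand-in for `data`. A meta item written as a list, such as `["species", "common_name"]`, is read as a path to a field that sits one level down, next to the records.

```python
import pandas as pd

# Shortened stand-in for the animal data above.
animals = [
    {"kingdom": "Animalia", "class": "Mammalia",
     "species": [{"common_name": "Gray Wolf",
                  "relations": [{"domesticated": False, "social": True}]}]},
    {"kingdom": "Animalia", "class": "Mammalia",
     "species": [{"common_name": "Horse",
                  "relations": [{"domesticated": True, "social": True}]}]},
]

relations = pd.json_normalize(
    animals,
    record_path=["species", "relations"],   # rows come from species -> relations
    meta=[
        "kingdom",                          # top-level field
        "class",                            # top-level field
        ["species", "common_name"],         # field inside each species entry
    ],
)
print(relations)
```

The nested meta field shows up as a dotted column name, here `species.common_name`.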


### Super deeply nested data

In [None]:
## add another layer
## note that within species, there are now two lists of dicts

data = [

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Canis Lupus",
			"common_name": "Gray Wolf",
			"relations": [{
				"domesticated": False,
				"social": True
			}],
			"characteristics": [{
				"color": "Gray",
				"carnivore": True
			}]
		}]
	},
	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Panthera Leo",
			"common_name": "Lion",
			"relations": [{
				"domesticated": False,
				"social": True
			}],
			"characteristics": [{
				"color": "Yellow",
				"carnivore": True
			}]
		}]
	},

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Panthera Tigris",
			"common_name": "Tiger",
			"relations": [{
				"domesticated": False,
				"social": False
			}],
			"characteristics": [{
				"color": "Striped",
				"carnivore": True
			}]
		}]
	},

	{
		"kingdom": "Animalia",
		"class": "Mammalia",
		"species": [{
			"scientific_name": "Equus Ferus",
			"common_name": "Horse",
			"relations": [{
				"domesticated": True,
				"social": True
			}],
			"characteristics": [{
				"color": "Multiple",
				"carnivore": False
			}]
		}]
	}
]

In [None]:
## run json normalize on species


In [None]:
## provide species as path


In [None]:
## provide species and characteristics as record_path


In [None]:
## provide species and relations as record_path


In [None]:
## add characteristics to the record_path
## WILL BREAK!
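
A short sketch of why this breaks, using a one-record stand-in for `data`: `record_path` is a single chain of keys, so `["species", "relations", "characteristics"]` looks for `characteristics` inside each `relations` entry. It is actually a sibling of `relations`, so the lookup fails.

```python
import pandas as pd

# One-record stand-in: "relations" and "characteristics" are siblings under "species".
animals = [
    {"species": [{"common_name": "Gray Wolf",
                  "relations": [{"domesticated": False, "social": True}],
                  "characteristics": [{"color": "Gray", "carnivore": True}]}]},
]

try:
    pd.json_normalize(
        animals,
        record_path=["species", "relations", "characteristics"],
    )
    outcome = "no error"
except KeyError as err:
    # pandas cannot find "characteristics" inside a relations record.
    outcome = f"KeyError: {err}"

print(outcome)
```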


## The limitations of ```json_normalize()```

The reality is that ```JSON``` is far too versatile and flexible for ```json_normalize()``` to work in every situation. ```json_normalize()``` is NOT a universal parser and won't flatten every complex structure in one go.

Our approach: create two data frames and merge them on a common column.
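A minimal sketch of that approach, assuming `scientific_name` is unique for each species so it can serve as the common column (the `animals`, `key`, and result names are illustrative). If the key were not unique, the merge would multiply rows.

```python
import pandas as pd

# Shortened stand-in for the super deeply nested data above.
animals = [
    {"kingdom": "Animalia",
     "species": [{"scientific_name": "Canis Lupus",
                  "relations": [{"domesticated": False, "social": True}],
                  "characteristics": [{"color": "Gray", "carnivore": True}]}]},
    {"kingdom": "Animalia",
     "species": [{"scientific_name": "Equus Ferus",
                  "relations": [{"domesticated": True, "social": True}],
                  "characteristics": [{"color": "Multiple", "carnivore": False}]}]},
]

# Path to the field both frames will share.
key = ["species", "scientific_name"]

# One frame per sibling list, each carrying the shared key as meta.
characteristics = pd.json_normalize(
    animals, record_path=["species", "characteristics"], meta=["kingdom", key]
)
relations = pd.json_normalize(
    animals, record_path=["species", "relations"], meta=[key]
)

# Join the two frames on the shared column.
merged = characteristics.merge(relations, on="species.scientific_name")
print(merged)
```

Only one of the two frames needs the full set of meta columns; the other needs just the join key, which avoids duplicated columns after the merge.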

In [None]:
## create a df for characteristics


In [None]:
## create a df for relations
## note we need only a few metadata fields, just enough to find a common column.


In [None]:
## merge the two


### Global Covid Data (Nested Json from url)

In [None]:
## request data
url = "https://epicovcharts.bii.a-star.edu.sg/variants-dashboard/data/variants_countries_count.json"


In [None]:
## load json


In [None]:
## call data


In [None]:
## get our bearings


In [None]:
## get our bearings


In [None]:
## get our bearings
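
One way to get oriented in an unfamiliar JSON object is a small helper that reports the type and size of each level. The `outline` function below is a hypothetical helper, not part of pandas or `requests`, and the `sample` object is made up for illustration; it does not reflect the structure of the feed at `url`.

```python
def outline(obj, depth=0, max_depth=3):
    """Return one line per level describing a parsed JSON object.

    Only the first value of each dictionary and the first item of each
    list are followed, so this is a quick orientation, not a full schema.
    """
    indent = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{indent}dict with {len(obj)} keys: {list(obj)[:5]}"]
        child = next(iter(obj.values()), None)
    elif isinstance(obj, list):
        lines = [f"{indent}list with {len(obj)} items"]
        child = obj[0] if obj else None
    else:
        # Scalars (str, int, bool, None) end the descent.
        return [f"{indent}{type(obj).__name__}: {obj!r}"]
    if obj and depth < max_depth:
        lines += outline(child, depth + 1, max_depth)
    return lines


# Made-up sample; pass your own parsed JSON object instead.
sample = {"data": [{"fundID": 1, "categories": [{"type": "hedge"}]}]}
print("\n".join(outline(sample)))
```

Lists of dictionaries that show up in the outline are the natural candidates for `record_path`, and the scalar fields beside them are candidates for `meta`.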


In [None]:
## flatten json into df


## AP Election Data

The Associated Press election feed provides wickedly nested election data.

[Download an excerpt](https://raw.githubusercontent.com/sandeepmj/datasets/main/va-election-AP.json) and flatten all nested elements.

At Bloomberg, we flattened this data using the techniques covered in this course.



In [None]:
## create more cells as necessary