# Taming JSON

## Part 1 - reading

JSON, or JavaScript Object Notation, is a lightweight data format that is now one of the most common ways information is stored and exchanged in databases and web applications, including data that is often critical to our reporting. At the same time, JSON can be difficult to parse because its structure is often deeply nested.

Let's study a few JSON files to understand their structure.

## Types of JSON objects

### Large but Easy (and in a file)

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/guns.json">Download</a>: ```guns.json```

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/gun-data-in-json-file.json">Download</a>: ```gun-data-in-json-file.json```

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/countries.json">Download</a>: ```countries.json```

What's the difference between the first two files?

#### Small but Complex (in your notebook)

In [1]:
## run this MOCK data cell:

[
  {
    "study": "Toxic Water Research",
    "region": "Midwest",
    "scope": "Regional",
    "details": {
      "lead_scientist": "Dr. Emily Carter",
      "contacts": {
        "email": {
          "report_issues": "issues@toxicwaterresearch.com",
          "inquiries": "contact@toxicwaterresearch.com"
        },
        "phone": "800123456"
      }
    }
  },
  {
    "study": "Chemical Contaminants Review",
    "region": "Northeast",
    "scope": "National",
    "details": {
      "lead_scientist": "Dr. Robert Mason",
      "contacts": {
        "email": {
          "report_issues": "issues@chemcontaminantsreview.com",
          "inquiries": "contact@chemcontaminantsreview.com"
        },
        "phone": "800987654"
      }
    }
  }
]


[{'study': 'Toxic Water Research',
  'region': 'Midwest',
  'scope': 'Regional',
  'details': {'lead_scientist': 'Dr. Emily Carter',
   'contacts': {'email': {'report_issues': 'issues@toxicwaterresearch.com',
     'inquiries': 'contact@toxicwaterresearch.com'},
    'phone': '800123456'}}},
 {'study': 'Chemical Contaminants Review',
  'region': 'Northeast',
  'scope': 'National',
  'details': {'lead_scientist': 'Dr. Robert Mason',
   'contacts': {'email': {'report_issues': 'issues@chemcontaminantsreview.com',
     'inquiries': 'contact@chemcontaminantsreview.com'},
    'phone': '800987654'}}}]
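
Reaching a value buried in a structure like this means chaining one lookup per level of nesting. The following is a minimal sketch, assuming the mock data is bound to a variable named `studies` (a hypothetical name, the original cell does not assign one) and trimmed to a single record:

```python
# `studies` is a hypothetical variable name; the record is copied from the mock data above
studies = [
    {
        "study": "Toxic Water Research",
        "region": "Midwest",
        "scope": "Regional",
        "details": {
            "lead_scientist": "Dr. Emily Carter",
            "contacts": {
                "email": {
                    "report_issues": "issues@toxicwaterresearch.com",
                    "inquiries": "contact@toxicwaterresearch.com",
                },
                "phone": "800123456",
            },
        },
    }
]

# list index first, then one key per level of nesting
first_study = studies[0]
inquiries_email = first_study["details"]["contacts"]["email"]["inquiries"]

# .get() returns None instead of raising KeyError when a key is missing
fax = first_study["details"]["contacts"].get("fax")

print(inquiries_email)
print(fax)
```

Square brackets fail loudly on a missing key, while ```.get()``` fails quietly. Which one is preferable depends on whether a missing field should stop the analysis.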

#### Massive and Nested (and on a server)

<a href="https://epicovcharts.bii.a-star.edu.sg/variants-dashboard/data/variants_countries_count.json">Global COVID data</a>

## Reading JSON objects

In [2]:
## import libraries
import pandas as pd


### Read ```JSON``` file

#### Sometimes, we are in luck with a cleanly packaged ```JSON``` file that is not nested and plays nice.

All we need is ```pd.read_json("file_path")```

In [4]:
## these files can be read right into a df
df = pd.read_json("guns.json")
df

Unnamed: 0,occur_year,boro,precinct,statistical_murder_flag,vic_age_group,vic_sex,vic_race
0,2006,The Bronx,41,No,25-44,Male,White Hispanic
1,2006,Manhattan,34,No,18-24,Male,White Hispanic
2,2006,Manhattan,34,No,18-24,Male,White Hispanic
3,2006,The Bronx,44,No,18-24,Female,White Hispanic
4,2006,Brooklyn,81,No,25-44,Male,Black
...,...,...,...,...,...,...,...
20654,2018,Queens,103,No,25-44,Male,White
20655,2018,Brooklyn,70,No,18-24,Male,Black
20656,2018,Brooklyn,70,No,25-44,Male,Black
20657,2018,Brooklyn,73,No,25-44,Male,Black
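
To see why this file reads so cleanly, it can help to try ```pd.read_json()``` on a tiny flat example. The sketch below assumes a hand-typed, two-record sample modeled on the preview above (not the real file, and with fewer columns) and wraps it in ```StringIO``` so pandas treats the string like a file:

```python
from io import StringIO

import pandas as pd

# hand-typed sample modeled on the preview above; values and column subset are illustrative
flat_json = """
[
  {"occur_year": 2006, "boro": "The Bronx", "precinct": 41, "vic_sex": "Male"},
  {"occur_year": 2006, "boro": "Manhattan", "precinct": 34, "vic_sex": "Male"}
]
"""

# a top-level list of flat records maps directly onto rows and columns
flat_df = pd.read_json(StringIO(flat_json))

print(flat_df.shape)
print(sorted(flat_df.columns))
```

Each object in the list becomes one row and each key becomes one column, which is the shape a DataFrame expects.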


#### ... and easily exported as a csv file:

In [5]:
## export as csv file
df.to_csv("guns.csv", encoding = "UTF-8", index = False)

### What about this file:

```gun-data-in-json-file.json```

In [6]:
## READ into df
df = pd.read_json("gun-data-in-json-file.json")
df

Unnamed: 0,gun-data
0,"{'occur_year': '2006', 'boro': 'The Bronx', 'p..."
1,"{'occur_year': '2006', 'boro': 'Manhattan', 'p..."
2,"{'occur_year': '2006', 'boro': 'Manhattan', 'p..."
3,"{'occur_year': '2006', 'boro': 'The Bronx', 'p..."
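
The single ```gun-data``` column appears because the records sit one level down, under a wrapper key. One possible shortcut, separate from the ```json``` package approach covered in the rest of this notebook, is ```pd.json_normalize()``` with its ```record_path``` parameter. A minimal sketch, assuming a trimmed stand-in dictionary named `wrapped` (a hypothetical name) rather than the real file:

```python
import pandas as pd

# stand-in for the parsed file: one wrapper key holding a list of records
wrapped = {
    "gun-data": [
        {"occur_year": "2006", "boro": "The Bronx", "precinct": 41},
        {"occur_year": "2006", "boro": "Manhattan", "precinct": 34},
    ]
}

# record_path names the key whose list should become the rows
unwrapped_df = pd.json_normalize(wrapped, record_path="gun-data")

print(unwrapped_df)
```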


## ```json.load()``` v. ```json.loads()```

The ```json``` package has two similarly named functions that each do something quite different:

- ```json.load()``` parses ```json``` from a **file object** (an open ```json``` file) into a Python object, usually a ```dict``` or a ```list```.
- ```json.loads()``` parses ```json``` from a **string** (the "s" stands for string) into the same kind of Python object.
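
The difference can be sketched side by side. In this example ```io.StringIO``` stands in for an open file so that nothing needs to be on disk, and the variable names are illustrative:

```python
import io
import json

json_text = '{"gun-data": [{"boro": "The Bronx", "precinct": 41}]}'

# json.loads() parses a string directly
from_string = json.loads(json_text)

# json.load() parses anything with a .read() method, such as an open file;
# io.StringIO plays the role of the file here
fake_file = io.StringIO(json_text)
from_file = json.load(fake_file)

# both routes produce the same Python object
print(from_string == from_file)

# a top-level JSON array becomes a list, not a dict
as_list = json.loads("[1, 2, 3]")
print(type(as_list).__name__)
```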

In [7]:
## import python's json package
import json

#### Import ```json``` file

In [14]:
## open and load json file
with open("gun-data-in-json-file.json", "r") as j:
    gun_data_converted = json.load(j)

gun_data_converted

{'gun-data': [{'occur_year': '2006',
   'boro': 'The Bronx',
   'precinct': 41,
   'statistical_murder_flag': 'No',
   'vic_age_group': '25-44',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'Manhattan',
   'precinct': 34,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'Manhattan',
   'precinct': 34,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'The Bronx',
   'precinct': 44,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Female',
   'vic_race': 'White Hispanic'}]}

In [9]:
## type of the file object j
type(j)

_io.TextIOWrapper

In [15]:
## type variable
type(gun_data_converted)

dict

In [16]:
## turn into df
df = pd.DataFrame(gun_data_converted)
df

Unnamed: 0,gun-data
0,"{'occur_year': '2006', 'boro': 'The Bronx', 'p..."
1,"{'occur_year': '2006', 'boro': 'Manhattan', 'p..."
2,"{'occur_year': '2006', 'boro': 'Manhattan', 'p..."
3,"{'occur_year': '2006', 'boro': 'The Bronx', 'p..."


In [17]:
## call the keys
gun_data_converted.keys()

dict_keys(['gun-data'])
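
Calling ```.keys()``` shows only the top level. For an unfamiliar file, a small recursive helper can summarize what sits under each key. The function below, `describe`, is a hypothetical helper written for this illustration and is not part of the ```json``` package. It looks only at the first item of each list and assumes the remaining items share its shape:

```python
def describe(obj, depth=0, max_depth=3):
    """Return lines summarizing keys and value types of parsed JSON."""
    lines = []
    if depth > max_depth:
        return lines
    indent = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # only the first item is inspected; others are assumed to look alike
        lines.append(f"{indent}[{len(obj)} items, first is {type(obj[0]).__name__}]")
        lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines


# trimmed stand-in for the parsed gun data
sample = {"gun-data": [{"occur_year": "2006", "precinct": 41}]}
summary = describe(sample)
print("\n".join(summary))
```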

In [19]:
## get the value stored under the "gun-data" key
target_data = gun_data_converted.get("gun-data")
target_data

[{'occur_year': '2006',
  'boro': 'The Bronx',
  'precinct': 41,
  'statistical_murder_flag': 'No',
  'vic_age_group': '25-44',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'Manhattan',
  'precinct': 34,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'Manhattan',
  'precinct': 34,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'The Bronx',
  'precinct': 44,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Female',
  'vic_race': 'White Hispanic'}]

In [20]:
## turn into a dataframe
pd.DataFrame(target_data)

Unnamed: 0,occur_year,boro,precinct,statistical_murder_flag,vic_age_group,vic_sex,vic_race
0,2006,The Bronx,41,No,25-44,Male,White Hispanic
1,2006,Manhattan,34,No,18-24,Male,White Hispanic
2,2006,Manhattan,34,No,18-24,Male,White Hispanic
3,2006,The Bronx,44,No,18-24,Female,White Hispanic
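
The steps above (open the file, load it, pull out the key, build the DataFrame) can be folded into one function. This is a sketch under stated assumptions: `json_key_to_df` is a hypothetical helper name, and a small temporary file stands in for the real download so the example runs on its own:

```python
import json
import os
import tempfile

import pandas as pd


def json_key_to_df(path, key):
    """Load a JSON file and build a DataFrame from the list stored under `key`."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    records = data.get(key)
    if records is None:
        raise KeyError(f"{key!r} not found; available keys: {list(data.keys())}")
    return pd.DataFrame(records)


# write a tiny stand-in file so the sketch runs without the real download
sample = {
    "gun-data": [
        {"boro": "The Bronx", "precinct": 41},
        {"boro": "Manhattan", "precinct": 34},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    sample_df = json_key_to_df(sample_path, "gun-data")

print(sample_df)
```

Raising an error that lists the available keys makes a mistyped key easier to spot than a silent ```None```.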


### What do we see?

We'll learn how to deal with this shortly.

## Import ```json``` string

In [21]:
## run this cell that holds a json string
my_json = '''
{
	"gun-data":

		[{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 41,
				"statistical_murder_flag": "No",
				"vic_age_group": "25-44",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},

			{

				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"

			},
			{
				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},
			{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 44,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Female",
				"vic_race": "White Hispanic"
			}
		]

}

'''

In [22]:
my_json

'\n{\n\t"gun-data":\n\n\t\t[{\n\t\t\t\t"occur_year": "2006",\n\t\t\t\t"boro": "The Bronx",\n\t\t\t\t"precinct": 41,\n\t\t\t\t"statistical_murder_flag": "No",\n\t\t\t\t"vic_age_group": "25-44",\n\t\t\t\t"vic_sex": "Male",\n\t\t\t\t"vic_race": "White Hispanic"\n\t\t\t},\n\n\t\t\t{\n\n\t\t\t\t"occur_year": "2006",\n\t\t\t\t"boro": "Manhattan",\n\t\t\t\t"precinct": 34,\n\t\t\t\t"statistical_murder_flag": "No",\n\t\t\t\t"vic_age_group": "18-24",\n\t\t\t\t"vic_sex": "Male",\n\t\t\t\t"vic_race": "White Hispanic"\n\n\t\t\t},\n\t\t\t{\n\t\t\t\t"occur_year": "2006",\n\t\t\t\t"boro": "Manhattan",\n\t\t\t\t"precinct": 34,\n\t\t\t\t"statistical_murder_flag": "No",\n\t\t\t\t"vic_age_group": "18-24",\n\t\t\t\t"vic_sex": "Male",\n\t\t\t\t"vic_race": "White Hispanic"\n\t\t\t},\n\t\t\t{\n\t\t\t\t"occur_year": "2006",\n\t\t\t\t"boro": "The Bronx",\n\t\t\t\t"precinct": 44,\n\t\t\t\t"statistical_murder_flag": "No",\n\t\t\t\t"vic_age_group": "18-24",\n\t\t\t\t"vic_sex": "Female",\n\t\t\t\t"vic_race": "White

In [24]:
## print
print(my_json)


{
	"gun-data":

		[{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 41,
				"statistical_murder_flag": "No",
				"vic_age_group": "25-44",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},

			{

				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"

			},
			{
				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},
			{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 44,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Female",
				"vic_race": "White Hispanic"
			}
		]

}




In [23]:
type(my_json)

str
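
The two displays of ```my_json``` above look different because evaluating a bare variable shows the string's ```repr```, with ```\n``` and ```\t``` written out as escape sequences, while ```print()``` renders them as real line breaks and tabs. Either way it is the same string, and the extra whitespace does not bother the parser. A short sketch with an illustrative one-key string:

```python
import json

# a small JSON string containing real newline and tab characters
snippet = '{\n\t"boro": "Manhattan"\n}'

# repr() is what a notebook shows when a cell ends with a bare variable
as_repr = repr(snippet)
print(as_repr)

# print() renders the whitespace instead of escaping it
print(snippet)

# whitespace between JSON tokens is ignored when parsing
parsed = json.loads(snippet)
print(parsed)
```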

In [26]:
## load json string
guns_s = json.loads(my_json)
guns_s

{'gun-data': [{'occur_year': '2006',
   'boro': 'The Bronx',
   'precinct': 41,
   'statistical_murder_flag': 'No',
   'vic_age_group': '25-44',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'Manhattan',
   'precinct': 34,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'Manhattan',
   'precinct': 34,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Male',
   'vic_race': 'White Hispanic'},
  {'occur_year': '2006',
   'boro': 'The Bronx',
   'precinct': 44,
   'statistical_murder_flag': 'No',
   'vic_age_group': '18-24',
   'vic_sex': 'Female',
   'vic_race': 'White Hispanic'}]}

In [None]:
## type of data
type(guns_s)

In [27]:
## get the value stored under the "gun-data" key
guns_s.get("gun-data")

[{'occur_year': '2006',
  'boro': 'The Bronx',
  'precinct': 41,
  'statistical_murder_flag': 'No',
  'vic_age_group': '25-44',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'Manhattan',
  'precinct': 34,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'Manhattan',
  'precinct': 34,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Male',
  'vic_race': 'White Hispanic'},
 {'occur_year': '2006',
  'boro': 'The Bronx',
  'precinct': 44,
  'statistical_murder_flag': 'No',
  'vic_age_group': '18-24',
  'vic_sex': 'Female',
  'vic_race': 'White Hispanic'}]

In [None]:
## turn into a dataframe
pd.DataFrame(guns_s.get("gun-data"))