# Taming JSON

## Part 1 - reading

JSON, or JavaScript Object Notation, is a lightweight data-interchange format widely used by web applications, APIs and many databases, which often hold data that is critical to our reporting. At the same time, JSON can be difficult to parse because its structure is often deeply nested.

Let's study a few JSON files to understand their structure.

## Types of JSON objects

### Large but Easy (and in a file)

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/guns.json">Download</a>: ```guns.json```

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/gun-data-in-json-file.json">Download</a>: ```gun-data-in-json-file.json```

<a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/countries.json">Download</a>: ```countries.json```

What's the difference between these files?

### Small but Complex (in your notebook)

In [3]:
## run this MOCK data cell:

[
  {
    "study": "Toxic Water Research",
    "region": "Midwest",
    "scope": "Regional",
    "details": {
      "lead_scientist": "Dr. Emily Carter",
      "contacts": {
        "email": {
          "report_issues": "issues@toxicwaterresearch.com",
          "inquiries": "contact@toxicwaterresearch.com"
        },
        "phone": "800123456"
      }
    }
  },
  {
    "study": "Chemical Contaminants Review",
    "region": "Northeast",
    "scope": "National",
    "details": {
      "lead_scientist": "Dr. Robert Mason",
      "contacts": {
        "email": {
          "report_issues": "issues@chemcontaminantsreview.com",
          "inquiries": "contact@chemcontaminantsreview.com"
        },
        "phone": "800987654"
      }
    }
  }
]


[{'study': 'Toxic Water Research',
  'region': 'Midwest',
  'scope': 'Regional',
  'details': {'lead_scientist': 'Dr. Emily Carter',
   'contacts': {'email': {'report_issues': 'issues@toxicwaterresearch.com',
     'inquiries': 'contact@toxicwaterresearch.com'},
    'phone': '800123456'}}},
 {'study': 'Chemical Contaminants Review',
  'region': 'Northeast',
  'scope': 'National',
  'details': {'lead_scientist': 'Dr. Robert Mason',
   'contacts': {'email': {'report_issues': 'issues@chemcontaminantsreview.com',
     'inquiries': 'contact@chemcontaminantsreview.com'},
    'phone': '800987654'}}}]
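
To see why this small object counts as complex, it helps to drill into it one level at a time. The sketch below works on a shortened copy of the mock data above. The variable name `studies` is an assumption for illustration; the notebook itself does not assign the data to a name.

```python
# Shortened copy of the mock data above; the name `studies` is hypothetical
studies = [
    {
        "study": "Toxic Water Research",
        "details": {
            "lead_scientist": "Dr. Emily Carter",
            "contacts": {
                "email": {"inquiries": "contact@toxicwaterresearch.com"},
                "phone": "800123456",
            },
        },
    }
]

# The top level is a list, so start with an index, then chain dictionary keys
first = studies[0]
inquiries = first["details"]["contacts"]["email"]["inquiries"]
print(inquiries)

# .get() returns None instead of raising KeyError when a key is missing
fax = first["details"]["contacts"].get("fax")
print(fax)
```

Each pair of square brackets moves one level deeper, which is why a value four levels down needs four lookups after the list index.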

### Massive and Nested (and on a server)

<a href="https://epicovcharts.bii.a-star.edu.sg/variants-dashboard/data/variants_countries_count.json">Global COVID data</a>

## Reading JSON objects

In [5]:
## import libraries
import pandas as pd


### Read ```JSON``` file

#### Sometimes, we are in luck with a cleanly packaged ```JSON``` file that is not nested and plays nice.

All we need is ```pd.read_json("file_path")```
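
As a minimal sketch of what that call does, the example below writes a tiny flat JSON file to a temporary folder and reads it straight into a dataframe. The records and the file name `flat_sample.json` are invented for illustration and are not the contents of the downloaded files.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical flat records, invented for illustration only
records = [
    {"boro": "The Bronx", "precinct": 41},
    {"boro": "Manhattan", "precinct": 34},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the records to a throwaway file so the example is self-contained
    path = Path(tmp) / "flat_sample.json"
    path.write_text(json.dumps(records))

    # A flat list of records maps directly onto rows and columns
    df = pd.read_json(path)

print(df)
```

Because every record has the same keys and no nested values, each key becomes a column and each record becomes a row.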

In [None]:
## these files can be read right into a df


#### ... and easily exported as a csv file:

In [None]:
## export as csv file
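
One way the export could look, assuming a dataframe is already in hand. The dataframe here is a hypothetical stand-in, and the CSV is written to an in-memory buffer so the example leaves no file behind; passing a file name such as `"guns.csv"` to `to_csv` would write to disk instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a dataframe read from a flat JSON file
df = pd.DataFrame(
    [
        {"boro": "The Bronx", "precinct": 41},
        {"boro": "Manhattan", "precinct": 34},
    ]
)

# index=False leaves out the row numbers, which are rarely wanted in the CSV
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```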


### What about this file:

```gun-data-in-json-file.json```

In [None]:
## READ into df


## ```json.load()``` v. ```json.loads()```

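The ```json``` package has two similarly named functions that do different things:

- ```.load()``` creates a Python object from an open **```json``` file**.
- ```.loads()``` creates a Python object from a **```json``` string** (the "s" stands for string).

In both cases the result is a ```Python Dictionary``` when the top level of the JSON is an object, and a Python list when the top level is an array.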
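
The difference between the two functions can be sketched side by side. The JSON text and the file name `sample.json` below are made up for illustration; the point is only which function takes a string and which takes an open file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical JSON text, shortened for illustration
json_text = '{"gun-data": [{"boro": "Manhattan", "precinct": 34}]}'

# json.loads() parses a string that is already in memory
from_string = json.loads(json_text)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"
    path.write_text(json_text)

    # json.load() parses from an open file object, not from a path
    with open(path) as f:
        from_file = json.load(f)

print(type(from_string))
print(from_string == from_file)

# A top-level JSON array becomes a list rather than a dictionary
print(type(json.loads("[1, 2, 3]")))
```

Passing a file path to ```json.loads()``` is a common slip: it tries to parse the path itself as JSON and raises an error.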

In [None]:
## import python's json package


#### Import ```json``` file

In [None]:
## open and load json file


In [6]:
## type object

In [None]:
## type variable

In [None]:
## turn into df

#### Import ```json``` file

In [None]:
## open and load json file


In [None]:
## type of data


In [None]:
## call the keys


In [None]:
## turn into a dataframe


### What do we see?

We'll learn how to deal with this shortly.

## Import ```json``` string

In [None]:
## run this cell that holds a json string
my_json = '''
{
	"gun-data":

		[{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 41,
				"statistical_murder_flag": "No",
				"vic_age_group": "25-44",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},

			{

				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"

			},
			{
				"occur_year": "2006",
				"boro": "Manhattan",
				"precinct": 34,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Male",
				"vic_race": "White Hispanic"
			},
			{
				"occur_year": "2006",
				"boro": "The Bronx",
				"precinct": 44,
				"statistical_murder_flag": "No",
				"vic_age_group": "18-24",
				"vic_sex": "Female",
				"vic_race": "White Hispanic"
			}
		]

}

'''

In [None]:
## load json string


In [None]:
## type of data


In [None]:
## get keys


In [None]:
## turn into a dataframe
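
A sketch of how these four steps might fit together, using a shortened copy of the string above with two records and three fields each. The variable names `data`, `wrapped` and `df` are assumptions for illustration, not names the notebook defines.

```python
import json

import pandas as pd

# Shortened copy of the JSON string above (two records, three fields each)
my_json = '''
{
    "gun-data": [
        {"occur_year": "2006", "boro": "The Bronx", "precinct": 41},
        {"occur_year": "2006", "boro": "Manhattan", "precinct": 34}
    ]
}
'''

# loads() because the JSON lives in a string, not a file
data = json.loads(my_json)
print(type(data))
print(data.keys())

# Passing the whole dictionary gives one column whose cells are dictionaries
wrapped = pd.DataFrame(data)
print(wrapped.shape)

# Passing the list stored under the single key gives one column per field
df = pd.DataFrame(data["gun-data"])
print(df)
```

The records sit one level down, under the ```gun-data``` key, so the dataframe has to be built from that list rather than from the outer dictionary.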
