# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives
You will be able to:
* Use the JSON module to load and parse JSON documents
* Extract data using predefined JSON schemas
* Convert JSON to a pandas DataFrame

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).

Note that **this is a different schema than the schema used in the previous lesson**, although both come from the New York Times.

## Loading the JSON Data

Open the JSON file located at `ny_times_movies.json`, and use the `json` module to load the data into a variable called `data`.

In [1]:
import json
with open('ny_times_movies.json', 'r') as f:
    data = json.load(f)
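
For reference, `json.load` reads from an open file object, while `json.loads` parses a string that is already in memory. The sketch below illustrates the difference under the assumption of a tiny made-up document (`raw` is hypothetical, not the real `ny_times_movies.json`), using `io.StringIO` to stand in for a file:

```python
import io
import json

# Hypothetical miniature document, not the real NY Times file
raw = '{"status": "OK", "num_results": 1, "results": [{"byline": "A.O. SCOTT"}]}'

from_string = json.loads(raw)            # parse a str
from_file = json.load(io.StringIO(raw))  # parse a file-like object

print(from_string == from_file)          # True
print(type(from_string["results"]))      # <class 'list'>
```

Either way, JSON objects become Python dictionaries and JSON arrays become Python lists.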

Run the code below to investigate its contents:

In [2]:
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

`data` has type <class 'dict'>
The keys are ['status', 'copyright', 'has_more', 'num_results', 'results']
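
Because the schema is known ahead of time, one optional safeguard is to confirm that the top-level keys are present before going further. This is a minimal sketch, not part of the lab solution: `EXPECTED_KEYS`, `missing_keys`, and the stand-in `sample` dictionary are hypothetical names and values chosen for illustration.

```python
# Top-level keys taken from the output above
EXPECTED_KEYS = {"status", "copyright", "has_more", "num_results", "results"}

def missing_keys(document, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from `document`, sorted."""
    return sorted(expected - set(document.keys()))

# Hypothetical stand-in for `data`; the values are made up
sample = {"status": "OK", "copyright": "", "has_more": False,
          "num_results": 0, "results": []}

print(missing_keys(sample))            # []
print(missing_keys({"status": "OK"}))  # ['copyright', 'has_more', 'num_results', 'results']
```

An empty list means the document has every key the schema promises.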


## Loading Results

Create a variable `results` that contains the value associated with the `'results'` key.

In [3]:
results = data['results']

Below we display this variable as a table using pandas:

In [4]:
import pandas as pd
df = pd.DataFrame(results)
df

| | display_title | mpaa_rating | critics_pick | byline | headline | summary_short | publication_date | opening_date | date_updated | link | multimedia |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Can You Ever Forgive Me | R | 1 | A.O. SCOTT | Review: Melissa McCarthy Is Criminally Good in... | Marielle Heller directs a true story of litera... | 2018-10-16 | 2018-10-19 | 2018-10-17 02:44:23 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 1 | Charm City | | 1 | BEN KENIGSBERG | Review: ‘Charm City’ Vividly Captures the Stre... | Marilyn Ness’s documentary is dedicated to the... | 2018-10-16 | 2018-04-22 | 2018-10-16 11:04:03 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 2 | Horn from the Heart: The Paul Butterfield Story | | 1 | GLENN KENNY | Review: Paul Butterfield’s Story Is Told in ‘H... | A documentary explores the life of the blues m... | 2018-10-16 | 2018-10-19 | 2018-10-16 11:04:04 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 3 | The Price of Everything | | 0 | A.O. SCOTT | Review: ‘The Price of Everything’ Asks $56 Bil... | This documentary examines the global art marke... | 2018-10-16 | 2018-10-19 | 2018-10-16 16:08:03 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 4 | Impulso | | 0 | BEN KENIGSBERG | Review: ‘Impulso’ Goes Backstage With a Flamen... | This documentary follows Rocío Molina, a cutti... | 2018-10-16 | | 2018-10-16 11:04:03 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 5 | Watergate | | 1 | A.O. SCOTT | Review: ‘Watergate’ Shocks Anew With Its True ... | Charles Ferguson delivers a comprehensive docu... | 2018-10-11 | 2018-10-12 | 2018-10-17 02:44:21 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 6 | Barbara | | 1 | GLENN KENNY | Review: In ‘Barbara,’ a Fictional Biopic of a ... | It’s a film of scenes rather than of one unifi... | 2018-10-11 | | 2018-10-17 02:44:21 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 7 | Over the Limit | | 1 | JEANNETTE CATSOULIS | Review: A Russian Gymnast Goes ‘Over the Limit’ | Margarita Mamun endures injury and abuse in Ma... | 2018-10-11 | 2018-10-05 | 2018-10-17 02:44:20 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 8 | The Kindergarten Teacher | R | 1 | JEANNETTE CATSOULIS | Review: The Disturbing Obsession of ‘The Kinde... | Maggie Gyllenhaal is riveting as a dissatisfie... | 2018-10-11 | 2018-10-12 | 2018-10-17 02:44:19 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |
| 9 | Classical Period | | 1 | BEN KENIGSBERG | Review: In ‘Classical Period,’ a Deep Dive — R... | This highly original feature is technically in... | 2018-10-11 | | 2018-10-17 02:44:18 | {'type': 'article', 'url': 'http://www.nytimes... | {'type': 'mediumThreeByTwo210', 'src': 'https:... |


## Data Analysis

Now that you have a general sense of the data, answer some questions about it.

### How many results are in the file?

The metadata says this:

In [5]:
data['num_results']

20

Double-check that by looking at `results`. Does it line up?

In [6]:
print("The length of `results` is", len(results))
print("That length equals the 'num_results' value?", len(results) == data['num_results'])

The length of `results` is 20
That length equals the 'num_results' value? True


In [None]:
"""
Yes, the length of the `results` list matches the 'num_results'
reported by the metadata
"""

### How many unique critics are there?

A critic's name can be identified using the `'byline'` key. Assign your answer to the variable `unique_critics`.

In [7]:

# Base Python solution:
unique_critics_set = set()
for result in results:
    unique_critics_set.add(result["byline"])
unique_critics = len(unique_critics_set)

# Pandas solution
unique_critics = df["byline"].nunique()

unique_critics

7

This code checks your answer.

In [8]:
assert unique_critics == 7
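
If you also want to know how many reviews each critic wrote, `collections.Counter` from the standard library is one option. The sketch below assumes a small hypothetical list, `sample_results`, in place of the real 20-entry `results` list, so the counts shown are for the stand-in data only:

```python
from collections import Counter

# Hypothetical stand-in records; the real `results` list has 20 entries
sample_results = [
    {"byline": "A.O. SCOTT"},
    {"byline": "BEN KENIGSBERG"},
    {"byline": "A.O. SCOTT"},
]

# Tally how many times each byline appears
reviews_per_critic = Counter(r["byline"] for r in sample_results)

print(len(reviews_per_critic))            # 2
print(reviews_per_critic.most_common(1))  # [('A.O. SCOTT', 2)]
```

The number of distinct keys in the counter is the number of unique critics, so this gives the same answer as the set-based approach plus the per-critic tallies.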

## Flattening Data

Create a list `review_urls` that contains the URL for each review. This can be found using the `'url'` key nested under `'link'`.

In [9]:

# First, exploring the structure a bit more to make
# sure we understand it

results[0]['link']

{'type': 'article',
 'url': 'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html',
 'suggested_link_text': 'Read the New York Times Review of Can You Ever Forgive Me'}

In [10]:

# In base Python, we can make the list with a list comprehension
review_urls = [result['link']['url'] for result in results]
review_urls

['http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html',
 'http://www.nytimes.com/2018/10/16/movies/charm-city-review-baltimore.html',
 'http://www.nytimes.com/2018/10/16/movies/horn-from-the-heart-review-paul-butterfield.html',
 'http://www.nytimes.com/2018/10/16/movies/the-price-of-everything-review-documentary.html',
 'http://www.nytimes.com/2018/10/16/movies/impulso-review-documentary.html',
 'http://www.nytimes.com/2018/10/11/movies/watergate-review-documentary.html',
 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html',
 'http://www.nytimes.com/2018/10/11/movies/over-the-limit-review.html',
 'http://www.nytimes.com/2018/10/11/movies/the-kindergarten-teacher-review.html',
 'http://www.nytimes.com/2018/10/11/movies/classical-period-review.html',
 'http://www.nytimes.com/2018/10/11/movies/bad-times-at-the-el-royale-review.html',
 'http://www.nytimes.com/2018/10/11/movies/beautiful-boy-review-steve-carell.html',
 'http://www.nytimes

In [11]:

# Alternatively, we can use pandas with a lambda function
review_urls = list(df['link'].apply(lambda links: links['url']))
review_urls

['http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html',
 'http://www.nytimes.com/2018/10/16/movies/charm-city-review-baltimore.html',
 'http://www.nytimes.com/2018/10/16/movies/horn-from-the-heart-review-paul-butterfield.html',
 'http://www.nytimes.com/2018/10/16/movies/the-price-of-everything-review-documentary.html',
 'http://www.nytimes.com/2018/10/16/movies/impulso-review-documentary.html',
 'http://www.nytimes.com/2018/10/11/movies/watergate-review-documentary.html',
 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html',
 'http://www.nytimes.com/2018/10/11/movies/over-the-limit-review.html',
 'http://www.nytimes.com/2018/10/11/movies/the-kindergarten-teacher-review.html',
 'http://www.nytimes.com/2018/10/11/movies/classical-period-review.html',
 'http://www.nytimes.com/2018/10/11/movies/bad-times-at-the-el-royale-review.html',
 'http://www.nytimes.com/2018/10/11/movies/beautiful-boy-review-steve-carell.html',
 'http://www.nytimes

The following code will check your answer:

In [12]:

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of results
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'
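
Indexing with `result['link']['url']` works here because every record follows the schema. If some records might be missing a nested key, a more defensive lookup can help. This is only a sketch under that assumption: `get_nested` is a hypothetical helper and `sample_results` is made-up data, not taken from the NY Times file.

```python
def get_nested(record, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` if any step is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical records illustrating three cases
sample_results = [
    {"link": {"type": "article", "url": "http://example.com/a"}},
    {"link": None},  # link present but null
    {},              # link missing entirely
]

urls = [get_nested(r, "link", "url") for r in sample_results]
print(urls)  # ['http://example.com/a', None, None]
```

Records without a URL yield `None` instead of raising a `KeyError` or `TypeError`, which makes them easy to filter out afterwards.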

## Summary
Well done! In this lab you continued to practice extracting and transforming data from JSON files with known schemas.