# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives
You will be able to:
* Use the JSON module to load and parse JSON documents
* Write data to predefined JSON schemas
* Convert JSON to a pandas dataframe

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).

You can see that the top-level structure is a dictionary with a key named 'response'. That value is itself a dictionary with two keys: 'data' and 'meta'. As you continue down the schema hierarchy, you'll notice that the vast majority of the nested elements in this schema are dictionaries.
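One way to compare a schema diagram against a parsed document is to print the type found at each level. The sketch below is a minimal illustration, not part of the lab: the `describe` helper and the `sample` dictionary are hypothetical, and the sample only loosely imitates the shape of an API response.

```python
def describe(node, name="root", depth=0, max_depth=3):
    """Return one line per node giving its name and Python type."""
    lines = ["  " * depth + f"{name}: {type(node).__name__}"]
    if depth >= max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.extend(describe(value, key, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Inspect only the first element, assuming list items share a shape
        lines.extend(describe(node[0], f"{name}[0]", depth + 1, max_depth))
    return lines


# Hypothetical sample, not the real API response
sample = {
    "status": "OK",
    "num_results": 1,
    "results": [
        {"display_title": "Example", "link": {"url": "http://example.com"}}
    ],
}

outline = describe(sample)
print("\n".join(outline))
```

Raising `max_depth` shows deeper levels; keeping it small makes the outline readable for large responses.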

## Loading the Data File

Start by loading the JSON file. The sample response from the API is stored in the file **ny_times_movies.json**.

In [1]:
#Your code here
import json

# Open the file in a context manager so it is closed automatically
with open('ny_times_movies.json', 'r') as f:
    data = json.load(f)
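The loading step can be tried without the lab's data file by writing a small document to disk and reading it back. This is only a sketch: the `sample` dictionary and the file name `sample_movies.json` are made up for illustration. It passes `encoding="utf-8"` explicitly, on the assumption that the file is UTF-8 encoded, which is typical for JSON.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the contents of ny_times_movies.json
sample = {
    "status": "OK",
    "num_results": 1,
    "results": [{"display_title": "Example"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_movies.json")

    # json.dump serializes a Python object to an open text file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)

    # json.load parses the file back into Python dicts and lists
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == sample)
```

Stating the encoding explicitly can help avoid garbled punctuation and accented characters on systems whose default encoding is not UTF-8.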

## Loading Specific Data

Create a DataFrame from the main data container within the JSON file, listed under the 'results' heading in the schema above.

In [6]:
#Your code here
print(type(data))
print(data.keys())

<class 'dict'>
dict_keys(['status', 'copyright', 'has_more', 'num_results', 'results'])


In [12]:
# data is a dictionary with 5 keys
# 'status', 'copyright', 'has_more', 'num_results', 'results'
for k in data.keys():
    print(type(data[k]))


<class 'str'>
<class 'str'>
<class 'bool'>
<class 'int'>
<class 'list'>


In [18]:
# the 5th key, 'results', holds a nested structure (a list); print the other four values
print(data['status'])
print(data['copyright'])
print(data['has_more'])
print(data['num_results'])


OK
Copyright (c) 2018 The New York Times Company. All Rights Reserved.
True
20


In [19]:
len(data['results'])

20

In [21]:
type(data['results'][0])

dict

In [22]:
type(data['results'][0]) == type(data['results'][6])

True

The fifth key in data, 'results', holds a nested structure: a list of 20 dictionaries.
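Comparing the types of two records can be extended to the whole list by checking that every record exposes the same set of keys. The sketch below assumes a small hypothetical `results` list rather than the lab's data.

```python
# Hypothetical records standing in for data['results']
results = [
    {"display_title": "Example A", "byline": "CRITIC ONE"},
    {"display_title": "Example B", "byline": "CRITIC TWO"},
    {"display_title": "Example C", "byline": "CRITIC ONE"},
]

# Every element should be a dictionary
all_dicts = all(isinstance(record, dict) for record in results)

# Collect the distinct key sets; one distinct set means a uniform shape
key_sets = {frozenset(record.keys()) for record in results}
consistent = len(key_sets) == 1

print(all_dicts, consistent)
```

When the records are uniform like this, the list maps cleanly onto DataFrame rows, with one column per key.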

In [4]:
import pandas as pd

# Each dictionary in data['results'] becomes one row of the DataFrame
df = pd.DataFrame(data['results'])
print(df.shape)
df.columns


(20, 11)


Index(['display_title', 'mpaa_rating', 'critics_pick', 'byline', 'headline',
       'summary_short', 'publication_date', 'opening_date', 'date_updated',
       'link', 'multimedia'],
      dtype='object')
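The 'link' and 'multimedia' columns still hold dictionaries. As an alternative sketch, `pd.json_normalize` can flatten nested dictionaries into their own columns while the DataFrame is built. The `results` list here is hypothetical sample data, and `sep="_"` is just one possible choice of separator.

```python
import pandas as pd

# Hypothetical records with one nested dictionary each
results = [
    {"display_title": "Example A",
     "link": {"type": "article", "url": "http://example.com/a"}},
    {"display_title": "Example B",
     "link": {"type": "article", "url": "http://example.com/b"}},
]

# Nested keys become columns named <parent><sep><child>
flat = pd.json_normalize(results, sep="_")
print(flat.columns.tolist())
```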

In [5]:
df.head()

Unnamed: 0,display_title,mpaa_rating,critics_pick,byline,headline,summary_short,publication_date,opening_date,date_updated,link,multimedia
0,Can You Ever Forgive Me,R,1,A.O. SCOTT,Review: Melissa McCarthy Is Criminally Good in...,Marielle Heller directs a true story of litera...,2018-10-16,2018-10-19,2018-10-17 02:44:23,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
1,Charm City,,1,BEN KENIGSBERG,Review: ‘Charm City’ Vividly Captures the ...,Marilyn Ness’s documentary is dedicated to t...,2018-10-16,2018-04-22,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
2,Horn from the Heart: The Paul Butterfield Story,,1,GLENN KENNY,Review: Paul Butterfield’s Story Is Told in ...,A documentary explores the life of the blues m...,2018-10-16,2018-10-19,2018-10-16 11:04:04,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
3,The Price of Everything,,0,A.O. SCOTT,Review: ‘The Price of Everything’ Asks $56...,This documentary examines the global art marke...,2018-10-16,2018-10-19,2018-10-16 16:08:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."
4,Impulso,,0,BEN KENIGSBERG,Review: ‘Impulso’ Goes Backstage With a Fl...,"This documentary follows Rocío Molina, a cutt...",2018-10-16,,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:..."


In [20]:
df.iloc[0]

display_title                                 Can You Ever Forgive Me
mpaa_rating                                                         R
critics_pick                                                        1
byline                                                     A.O. SCOTT
headline            Review: Melissa McCarthy Is Criminally Good in...
summary_short       Marielle Heller directs a true story of litera...
publication_date                                           2018-10-16
opening_date                                               2018-10-19
date_updated                                      2018-10-17 02:44:23
link                {'type': 'article', 'url': 'http://www.nytimes...
multimedia          {'type': 'mediumThreeByTwo210', 'src': 'https:...
Name: 0, dtype: object

In [19]:
df.iloc[0]['summary_short']

'Marielle Heller directs a true story of literary fraud, set amid the bookstores and gay bars of early ’90s Manhattan.'

## How many unique critics are there?

In [30]:
# Critic names are stored in the 'byline' column
print('there are', len(df['byline'].unique()), 'unique critics')

there are 7 unique critics
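Beyond counting distinct critics, it can be useful to see how many reviews each one wrote. The sketch below uses a small hypothetical `df_sample` rather than the lab's DataFrame. Note that `nunique()` ignores missing values by default, whereas `len(...unique())` would count a missing value as its own entry.

```python
import pandas as pd

# Hypothetical bylines; the real column has 20 rows
df_sample = pd.DataFrame(
    {"byline": ["A.O. SCOTT", "BEN KENIGSBERG", "A.O. SCOTT", "GLENN KENNY"]}
)

# Number of distinct critics
n_critics = df_sample["byline"].nunique()

# Number of reviews per critic, most frequent first
reviews_per_critic = df_sample["byline"].value_counts()

print(n_critics)
print(reviews_per_critic)
```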


## Create a new column for the review's url. Title the column 'review_url'

In [42]:
df['link'][0]


{'type': 'article',
 'url': 'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html',
 'suggested_link_text': 'Read the New York Times Review of Can You Ever Forgive Me'}

In [47]:
df['link'][0]['url']

'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html'

In [49]:
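#Your code here

# Expand every key of the nested 'link' dictionary into its own column
keys = df['link'][0].keys()
new_cols = []
for k in keys:
    new_col = 'link_{}'.format(k)
    # Pull the value stored under key k out of each row's dictionary
    df[new_col] = df['link'].map(lambda x: x[k])
    new_cols.append(new_col)
df.head()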

Unnamed: 0,display_title,mpaa_rating,critics_pick,byline,headline,summary_short,publication_date,opening_date,date_updated,link,multimedia,link_type,link_url,link_suggested_link_text
0,Can You Ever Forgive Me,R,1,A.O. SCOTT,Review: Melissa McCarthy Is Criminally Good in...,Marielle Heller directs a true story of litera...,2018-10-16,2018-10-19,2018-10-17 02:44:23,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:...",article,http://www.nytimes.com/2018/10/16/movies/can-y...,Read the New York Times Review of Can You Ever...
1,Charm City,,1,BEN KENIGSBERG,Review: ‘Charm City’ Vividly Captures the ...,Marilyn Ness’s documentary is dedicated to t...,2018-10-16,2018-04-22,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:...",article,http://www.nytimes.com/2018/10/16/movies/charm...,Read the New York Times Review of Charm City
2,Horn from the Heart: The Paul Butterfield Story,,1,GLENN KENNY,Review: Paul Butterfield’s Story Is Told in ...,A documentary explores the life of the blues m...,2018-10-16,2018-10-19,2018-10-16 11:04:04,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:...",article,http://www.nytimes.com/2018/10/16/movies/horn-...,Read the New York Times Review of Horn from th...
3,The Price of Everything,,0,A.O. SCOTT,Review: ‘The Price of Everything’ Asks $56...,This documentary examines the global art marke...,2018-10-16,2018-10-19,2018-10-16 16:08:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:...",article,http://www.nytimes.com/2018/10/16/movies/the-p...,Read the New York Times Review of The Price of...
4,Impulso,,0,BEN KENIGSBERG,Review: ‘Impulso’ Goes Backstage With a Fl...,"This documentary follows Rocío Molina, a cutt...",2018-10-16,,2018-10-16 11:04:03,"{'type': 'article', 'url': 'http://www.nytimes...","{'type': 'mediumThreeByTwo210', 'src': 'https:...",article,http://www.nytimes.com/2018/10/16/movies/impul...,Read the New York Times Review of Impulso
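The heading asks for a column titled 'review_url', so one more step is needed to create a column with exactly that name. A minimal sketch, using a hypothetical two-row `df_sample` in place of the lab's DataFrame:

```python
import pandas as pd

# Hypothetical rows shaped like the 'link' column above
df_sample = pd.DataFrame(
    {
        "display_title": ["Example A", "Example B"],
        "link": [
            {"type": "article", "url": "http://example.com/a"},
            {"type": "article", "url": "http://example.com/b"},
        ],
    }
)

# Extract the 'url' value from each nested dictionary
df_sample["review_url"] = df_sample["link"].map(lambda link: link["url"])

print(df_sample[["display_title", "review_url"]])
```

The same line applied to the lab's `df` should give the requested column, assuming every row's 'link' dictionary has a 'url' key.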


## How many results are in the file?

In [50]:
#Your code here
data.keys()

dict_keys(['status', 'copyright', 'has_more', 'num_results', 'results'])

In [51]:
data['num_results']

20

In [53]:
len(data['results'])

20
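The two counts above agree, and that agreement can be turned into an explicit check. The sketch below uses a hypothetical `sample` dictionary. The reading of 'has_more' as "the API holds further results beyond this response" is an assumption based on the field name.

```python
# Hypothetical response with the same top-level keys as the lab file
sample = {
    "status": "OK",
    "has_more": True,
    "num_results": 2,
    "results": [{"display_title": "Example A"}, {"display_title": "Example B"}],
}

reported = sample["num_results"]   # count stated by the response itself
actual = len(sample["results"])    # count of records actually present
matches = reported == actual

print(reported, actual, matches)

# Assumption: has_more signals additional results not included in this file
if sample["has_more"]:
    print("More results may be available from the API.")
```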

## Summary
Well done! In this lab, you practiced extracting data from a JSON file with a known schema and transforming it into a pandas DataFrame.