# Working with Known JSON Schemas - Lab

## Introduction
In this lab you'll practice working with JSON files whose schema you know beforehand.

### Objectives
You will be able to:
* Give two or more examples of why a schema is important:
    * It tells you which types to expect, so you insert values of the correct type
    * It provides a clear outline of the data
    * It is a good description of the file
    * It helps you plan your feature engineering and analysis
* Use the schema to improve our Python code

## Objectives
You will be able to:
* Read JSON Documentation Schemas and translate into code
* Extract data from known JSON schemas
* Write data to predefined JSON schemas

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="nytimes_movie_schema.png" width=500>

or a fully expanded view:

<img src="nytimes_movie_schema_detailed.png" width=500>

You can see this yourself here:
https://developer.nytimes.com/movie_reviews_v2.json#/Documentation/GET/critics/%7Bresource-type%7D.json

You can see that the top-level structure is a dictionary with a key named 'response'. Its value is also a dictionary, with two keys: 'data' and 'meta'. As you continue down the schema hierarchy, you'll notice that most of the nested structures in this case are dictionaries as well.
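Reading a schema diagram is easier once you can print the same kind of outline from real data. The sketch below walks a nested JSON-like object and lists every path with its Python type. The helper name `outline` and the miniature `sample` are hypothetical: the sample only borrows the 'response', 'data', and 'meta' key names from the schema described above, and the values inside them are invented placeholders. It also assumes that all items in a list share one schema, so only the first item is inspected.

```python
def outline(obj, path="root", depth=0, max_depth=10):
    """Return (path, type name) pairs for a nested JSON-like object."""
    rows = [(path, type(obj).__name__)]
    if depth >= max_depth:
        return rows
    if isinstance(obj, dict):
        # Recurse into every key of a dictionary
        for key, value in obj.items():
            rows.extend(outline(value, f"{path}.{key}", depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Assumption: list items share one schema, so inspect only the first
        rows.extend(outline(obj[0], f"{path}[0]", depth + 1, max_depth))
    return rows


# Hypothetical miniature response; only the key names come from the schema
sample = {"response": {"data": [{"display_title": "Example"}], "meta": {"hits": 1}}}

for p, t in outline(sample):
    print(f"{p}: {t}")
```

Once the real file is loaded, `outline(movies_data)` should give a text version of the expanded schema view, which you can compare against the documentation.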

## Loading the Data File

Start by loading the JSON file. The sample response from the API is stored in the file **ny_times_movies.json**.

In [1]:
#Your code here
import pandas as pd
import json

In [12]:
# A JSON document held in a plain Python string (parse it with json.loads)
string_dict = '''{"a": 1, "b":2}'''

## Loading Specific Data

Create a DataFrame of the main data container in the JSON file, listed under the 'results' key in the schema above.

In [31]:
#Your code here
# json.loads parses a string that contains JSON text
# json.load reads JSON from an open file object (such as a .json file)

# Open each file in a `with` block so it is closed automatically
with open("ny_times_movies.json") as f:
    movies_data = json.load(f)

with open("ny_times_response.json") as f:
    ny_response = json.load(f)
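To make the `load` versus `loads` distinction concrete, the sketch below parses the same JSON text both ways, reusing the `string_dict` value from the earlier cell. `io.StringIO` stands in for an open file so the example needs nothing on disk; the file extension itself does not matter to `json.load`, only that the object has a `.read()` method.

```python
import io
import json

string_dict = '''{"a": 1, "b":2}'''

# json.loads parses a str (or bytes) that already holds JSON text
from_string = json.loads(string_dict)

# json.load reads from a file-like object; StringIO plays the role of open(...)
from_file = json.load(io.StringIO(string_dict))

print(from_string == from_file)  # True
print(type(from_string))         # <class 'dict'>
```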

In [32]:
for k, v in ny_response.items():
    print("{} is type {}".format(k, type(v)))

status is type <class 'str'>
copyright is type <class 'str'>
response is type <class 'dict'>
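The printout above is effectively the top level of the schema. One way to "use the schema to improve our code" is to write the expected types down and check incoming data against them before analysis. The function `type_mismatches` and the `good`/`bad` dictionaries below are hypothetical illustrations; only the expected key-to-type mapping is taken from the output above.

```python
def type_mismatches(data, expected):
    """Compare top-level keys of `data` against an expected {key: type} mapping."""
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            actual = type(data[key]).__name__
            problems.append(f"{key}: expected {expected_type.__name__}, got {actual}")
    return problems


# Expected top-level types, copied from the printout above
expected = {"status": str, "copyright": str, "response": dict}

# Hypothetical inputs: one that matches the schema and one that does not
good = {"status": "OK", "copyright": "(c)", "response": {}}
bad = {"status": "OK", "response": []}

print(type_mismatches(good, expected))  # []
print(type_mismatches(bad, expected))
```

In the notebook, `type_mismatches(ny_response, expected)` should return an empty list for the file loaded above. This only checks the top level; nested levels would need their own expected mappings.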


In [33]:
# Start by inspecting each top-level key and its value
for k, v in movies_data.items():
    print(k)
    print(v)
    print("\n\n\n")

status
OK




copyright
Copyright (c) 2018 The New York Times Company. All Rights Reserved.




has_more
True




num_results
20




results
[{'display_title': 'Can You Ever Forgive Me', 'mpaa_rating': 'R', 'critics_pick': 1, 'byline': 'A.O. SCOTT', 'headline': 'Review: Melissa McCarthy Is Criminally Good in ‘Can You Ever Forgive Me?’', 'summary_short': 'Marielle Heller directs a true story of literary fraud, set amid the bookstores and gay bars of early ’90s Manhattan.', 'publication_date': '2018-10-16', 'opening_date': '2018-10-19', 'date_updated': '2018-10-17 02:44:23', 'link': {'type': 'article', 'url': 'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html', 'suggested_link_text': 'Read the New York Times Review of Can You Ever Forgive Me'}, 'multimedia': {'type': 'mediumThreeByTwo210', 'src': 'https://static01.nyt.com/images/2018/10/19/arts/19CANYOUEVER-1/19CANYOUEVER-1-mediumThreeByTwo210.jpg', 'width': 210, 'height': 140}}, {'display_titl

In [58]:
movies_data["results"][0]['link']['url']

'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html'
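The task at the start of this section asks for a DataFrame built from the 'results' list. A minimal sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level): the single record below is a trimmed copy of the first result printed earlier, so in the notebook you would pass `movies_data['results']` instead.

```python
import pandas as pd

# One trimmed record copied from the first entry of movies_data["results"]
results = [
    {
        "display_title": "Can You Ever Forgive Me",
        "mpaa_rating": "R",
        "byline": "A.O. SCOTT",
        "link": {
            "type": "article",
            "url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html",
        },
    }
]

# pd.DataFrame keeps the nested "link" dict as a single object column
df = pd.DataFrame(results)

# pd.json_normalize flattens nested dicts into dotted columns such as "link.url"
flat = pd.json_normalize(results)

print(df.columns.tolist())
print(flat.columns.tolist())
```

Which one to use depends on the analysis: `pd.DataFrame` preserves the nesting described by the schema, while `pd.json_normalize` gives one flat column per leaf field.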

## How many unique critics are there?

In [44]:
#Your code here
# Collect every byline (duplicates included), so len() gives the total, not the unique count
critics = []
for x in movies_data['results']:
    critics.append(x['byline'])
print(critics)
len(critics)

['A.O. SCOTT', 'BEN KENIGSBERG', 'GLENN KENNY', 'A. O. SCOTT', 'BEN KENIGSBERG', 'A.O. SCOTT', 'GLENN KENNY', 'JEANNETTE CATSOULIS', 'JEANNETTE CATSOULIS', 'BEN KENIGSBERG', 'MANOHLA DARGIS', 'A.O. SCOTT', 'GLENN KENNY', 'KEN JAWOROWSKI', 'TEO BUGBEE', 'KEN JAWOROWSKI', 'GLENN KENNY', 'KEN JAWOROWSKI', 'TEO BUGBEE', 'A.O. SCOTT']


20
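Since 20 is the total number of bylines rather than the number of unique critics, `collections.Counter` is one standard-library way to get both the distinct names and how many reviews each wrote. The list below is copied from the output above; in the notebook you could pass the `critics` list directly.

```python
from collections import Counter

# Bylines copied from the output of the cell above
bylines = [
    'A.O. SCOTT', 'BEN KENIGSBERG', 'GLENN KENNY', 'A. O. SCOTT', 'BEN KENIGSBERG',
    'A.O. SCOTT', 'GLENN KENNY', 'JEANNETTE CATSOULIS', 'JEANNETTE CATSOULIS',
    'BEN KENIGSBERG', 'MANOHLA DARGIS', 'A.O. SCOTT', 'GLENN KENNY', 'KEN JAWOROWSKI',
    'TEO BUGBEE', 'KEN JAWOROWSKI', 'GLENN KENNY', 'KEN JAWOROWSKI', 'TEO BUGBEE',
    'A.O. SCOTT',
]

counts = Counter(bylines)
print(len(counts))             # 8 distinct byline strings
print(counts['A.O. SCOTT'])    # 4
```

Note that `Counter` compares exact strings, so 'A.O. SCOTT' and 'A. O. SCOTT' are still counted separately here.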

In [50]:
# get uniqueness using a list and an if statement
critics = []
for result in movies_data['results']:
    author = result['byline']
    if author not in critics:
        critics.append(author)
print(critics)
len(critics)

['A.O. SCOTT', 'BEN KENIGSBERG', 'GLENN KENNY', 'A. O. SCOTT', 'JEANNETTE CATSOULIS', 'MANOHLA DARGIS', 'KEN JAWOROWSKI', 'TEO BUGBEE']


8

In [52]:
# Use a set, stripping spaces so 'A.O. SCOTT' and 'A. O. SCOTT' count as one critic
critics = set()
for result in movies_data['results']:
    author = result['byline'].replace(" ", "")
    critics.add(author)
print(critics)
len(critics)

{'BENKENIGSBERG', 'GLENNKENNY', 'TEOBUGBEE', 'A.O.SCOTT', 'KENJAWOROWSKI', 'JEANNETTECATSOULIS', 'MANOHLADARGIS'}


7

In [49]:
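# Rebuild the full byline list: `critics` is now a set, and pd.Series rejects sets
critics_srs = pd.Series([result['byline'] for result in movies_data['results']])
critics_srs.nunique()  # same result as critics_srs.value_counts().shape[0]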

8
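The pandas count of 8 treats 'A.O. SCOTT' and 'A. O. SCOTT' as different critics, just like the list-based version. The sketch below applies the same space-stripping heuristic as the set-based cell, but inside pandas, using the bylines copied from the earlier output. Removing spaces is an assumption about how names vary; other inconsistencies (for example, different punctuation) would need a stronger normalization.

```python
import pandas as pd

# Bylines copied from the earlier output
bylines = pd.Series([
    'A.O. SCOTT', 'BEN KENIGSBERG', 'GLENN KENNY', 'A. O. SCOTT', 'BEN KENIGSBERG',
    'A.O. SCOTT', 'GLENN KENNY', 'JEANNETTE CATSOULIS', 'JEANNETTE CATSOULIS',
    'BEN KENIGSBERG', 'MANOHLA DARGIS', 'A.O. SCOTT', 'GLENN KENNY', 'KEN JAWOROWSKI',
    'TEO BUGBEE', 'KEN JAWOROWSKI', 'GLENN KENNY', 'KEN JAWOROWSKI', 'TEO BUGBEE',
    'A.O. SCOTT',
])

raw_unique = bylines.nunique()  # exact strings: 8

# Strip spaces before counting, mirroring the set-based cell above
normalized_unique = bylines.str.replace(" ", "", regex=False).nunique()  # 7

print(raw_unique, normalized_unique)
```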

## Create a new column for the review's URL. Title the column 'review_url'

In [59]:
#Your code here
# Copy each review's nested link URL up to a top-level 'review_url' key
for result in movies_data['results']:
    result['review_url'] = result['link']['url']

In [63]:
movies_data['results'][0]['review_url']

'http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html'
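The cell above adds a 'review_url' key to each result dictionary rather than a DataFrame column. If you built a DataFrame from the results, a column version might look like the sketch below. The one-row input is a trimmed copy of the first result shown earlier; in the notebook it would be `pd.DataFrame(movies_data['results'])`.

```python
import pandas as pd

# Trimmed copy of the first entry in movies_data["results"]
results = [
    {
        "display_title": "Can You Ever Forgive Me",
        "link": {
            "type": "article",
            "url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html",
        },
    }
]
df = pd.DataFrame(results)

# Pull the nested url out of each row's "link" dict into its own column
df["review_url"] = df["link"].apply(lambda link: link["url"])

print(df.loc[0, "review_url"])
```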

## How many results are in the file? 

In [65]:
#Your code here
len(movies_data['results'])

20
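The file also reports its own size: the top-level printout earlier shows `num_results` (20) and `has_more` (True). Comparing the reported count with `len(results)` is a cheap sanity check, and `has_more` presumably signals that the API has further results beyond this response. The helper `page_summary` and the tiny `sample` dictionary below are hypothetical; in the notebook you would call `page_summary(movies_data)`.

```python
def page_summary(data):
    """Compare the reported result count with the actual length of 'results'."""
    reported = data.get("num_results")
    actual = len(data.get("results", []))
    return {
        "reported": reported,
        "actual": actual,
        "consistent": reported == actual,
        "has_more": data.get("has_more", False),
    }


# Hypothetical miniature response with the same top-level keys as the file
sample = {"status": "OK", "has_more": True, "num_results": 2, "results": [{}, {}]}
print(page_summary(sample))
```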

In [67]:
# Write the updated data to a new file; the `with` block closes it automatically
with open("new_ny_data_movies.json", "w") as f:
    json.dump(movies_data, f)
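To confirm that what you write still matches the schema you read, you can reload the file and compare it with the original. The sketch below does that round trip in a temporary directory with a small hypothetical record. `indent=2` makes the output human-readable, and `ensure_ascii=False` keeps characters such as the curly quotes in the NY Times headlines readable instead of escaping them; both are optional.

```python
import json
import os
import tempfile

# Hypothetical record in the same shape as movies_data after adding review_url
data = {
    "status": "OK",
    "num_results": 1,
    "results": [
        {
            "display_title": "Can You Ever Forgive Me",
            "headline": "Review: Melissa McCarthy Is Criminally Good in ‘Can You Ever Forgive Me?’",
            "review_url": "http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html",
        }
    ],
}

path = os.path.join(tempfile.mkdtemp(), "new_ny_data_movies.json")

# Write with indentation and without escaping non-ASCII characters
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# Read it back and check the round trip preserved everything
with open(path, encoding="utf-8") as f:
    reloaded = json.load(f)

print(reloaded == data)  # True
```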

## Summary
Well done! In this lab you practiced extracting data from JSON files with a known schema, transforming it into Pandas DataFrames, our standard tool, and writing it back out to JSON.