## Part 2 - Using an API
## Overview

In this part of the assignment, you'll request data from a server in JSON format, parse it, and load it into a DataFrame. You'll then aggregate that DataFrame to produce a report.

The data set consists of films from the Japanese animation studio Studio Ghibli.

The data is served from a mirror on linserv1.cims.nyu.edu. The original data comes from https://ghibliapi.herokuapp.com/, which is under an MIT License; it is mirrored so that we do not overwhelm the original source with requests.

## Instructions
The goal of the assignment is to create a report showing each director's name, the number of Studio Ghibli films that director was involved in, and the average Rotten Tomatoes score of those films.



**1. Retrieve the data, and examine it.**

- In `films.ipynb`, programmatically retrieve one page of JSON from this URL: http://linserv1.cims.nyu.edu:10000/films?_page=1
- You can use `requests` to do this
    - You can use the `json` module to manually parse the response content
    - Or use a feature of the `requests` module that parses a JSON response immediately when you call the `json()` method:
    - `r = requests.get('some.url')`
    - `d = r.json()  # parses JSON into Python objects (for this endpoint, a list of dictionaries)`
- Examine the keys and values of the dictionaries
- In a markdown cell, write out which keys you may be interested in to create the report specified above
- Try incrementing the page number at the end of the URL (`_page=1`). Do you get different results?
- In a markdown cell, describe what happens when you modify the URL
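
The two parsing routes listed above should give the same result. Here is a minimal sketch of the manual route, assuming a response body shaped like one page of the films endpoint. The `sample_body` string is a hand-written stand-in trimmed to three keys (not real server output), so the example runs without network access.

```python
import json

# Hand-written stand-in for the text of one page of the response (an assumption;
# the real records carry more fields than these three).
sample_body = '[{"title": "Castle in the Sky", "director": "Hayao Miyazaki", "rt_score": "95"}]'

# Manual parsing with the json module. For a requests response r, this
# corresponds to json.loads(r.text), and r.json() is the shortcut.
films = json.loads(sample_body)

# The top-level object is a list holding one dictionary per film.
first = films[0]
print(type(films).__name__, len(films))
print(sorted(first.keys()))

# rt_score arrives as a string, so convert it before doing arithmetic.
score = int(first["rt_score"])
```

Because the top-level object is a list, the keys live on each element (`films[0].keys()`), not on the parsed result itself.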

In [1]:
import requests
import json
import pandas as pd
import numpy as np
link = 'http://linserv1.cims.nyu.edu:10000/films?_page=1'
d = requests.get(link).json()
#case = d[0]

In [3]:
# Print every film on the page as a list of (key, value) pairs.
for i, case in enumerate(d):
    k_v_pairs = list(case.items())
    print(f'case{i}:{k_v_pairs}')

case0:[('id', '2baf70d1-42bb-4437-b551-e5fed5a87abe'), ('title', 'Castle in the Sky'), ('original_title', '天空の城ラピュタ'), ('original_title_romanised', 'Tenkū no shiro Rapyuta'), ('description', "The orphan Sheeta inherited a mysterious crystal that links her to the mythical sky-kingdom of Laputa. With the help of resourceful Pazu and a rollicking band of sky pirates, she makes her way to the ruins of the once-great civilization. Sheeta and Pazu must outwit the evil Muska, who plans to use Laputa's science to make himself ruler of the world."), ('director', 'Hayao Miyazaki'), ('producer', 'Isao Takahata'), ('release_date', '1986'), ('running_time', '124'), ('rt_score', '95'), ('people', ['https://ghibliapi.herokuapp.com/people/']), ('species', ['https://ghibliapi.herokuapp.com/species/af3910a6-429f-4c74-9ad5-dfe1c4aa04f2']), ('locations', ['https://ghibliapi.herokuapp.com/locations/']), ('vehicles', ['https://ghibliapi.herokuapp.com/vehicles/']), ('url', 'https://ghibliapi.herokuapp.com/fi

I am interested in the 'director' and 'rt_score' keys for creating the report in this problem.

In [22]:
link_2 = 'http://linserv1.cims.nyu.edu:10000/films?_page=2'
d_2 = requests.get(link_2).json()
case_2 = d_2[0]
case_2

{'id': 'dc2e6bd1-8156-4886-adff-b39e6043af0c',
 'title': 'Spirited Away',
 'original_title': '千と千尋の神隠し',
 'original_title_romanised': 'Sen to Chihiro no kamikakushi',
 'description': 'Spirited Away is an Oscar winning Japanese animated film about a ten year old girl who wanders away from her parents along a path that leads to a world ruled by strange and unusual monster-like animals. Her parents have been changed into pigs along with others inside a bathhouse full of these creatures. Will she ever see the world how it once was?',
 'director': 'Hayao Miyazaki',
 'producer': 'Toshio Suzuki',
 'release_date': '2001',
 'running_time': '124',
 'rt_score': '97',
 'people': ['https://ghibliapi.herokuapp.com/people/'],
 'species': ['https://ghibliapi.herokuapp.com/species/af3910a6-429f-4c74-9ad5-dfe1c4aa04f2'],
 'locations': ['https://ghibliapi.herokuapp.com/locations/'],
 'vehicles': ['https://ghibliapi.herokuapp.com/vehicles/'],
 'url': 'https://ghibliapi.herokuapp.com/films/dc2e6bd1-8156-48

**What happens when I modify the URL**

We get different results when the last number in the URL is incremented from 1 to 2. The modified URL returns the next page of results: a new list of films, different from the one on the previous page.
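
One way to back up this observation is to compare the film ids on the two pages. The sketch below assumes each page is a list of dictionaries with an `id` key, as in the outputs above. `shared_ids` is a hypothetical helper, and the two stand-in pages contain only the first record shown for each page.

```python
def shared_ids(page_a, page_b):
    """Return the set of film ids that appear in both pages."""
    ids_a = {film["id"] for film in page_a}
    ids_b = {film["id"] for film in page_b}
    return ids_a & ids_b


# Stand-in pages (assumption): only the id and title of the first record
# displayed for page 1 and page 2 in the outputs above.
page_1 = [{"id": "2baf70d1-42bb-4437-b551-e5fed5a87abe", "title": "Castle in the Sky"}]
page_2 = [{"id": "dc2e6bd1-8156-4886-adff-b39e6043af0c", "title": "Spirited Away"}]

overlap = shared_ids(page_1, page_2)
print("pages differ" if not overlap else f"shared ids: {overlap}")
```

With the live data, `d` and `d_2` from the cells above could be passed in place of the stand-in pages.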

**2. Load the data into a DataFrame**

- Make a request to http://linserv1.cims.nyu.edu:10000/films?_page=1 again, but this time, load the result into a DataFrame
- Continue collecting additional data and adding to the DataFrame until there is no more data to retrieve
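
The "keep collecting until nothing is left" step can be sketched with one DataFrame per page and a final `pd.concat`. This is an illustrative sketch, not the required solution: `collect_pages` and its `fetch_page` argument are hypothetical names, and the stand-in `fake_pages` dictionary replaces the server so the example runs offline. It assumes, as the solution cell in this notebook does, that a page past the end comes back as an empty list.

```python
import pandas as pd


def collect_pages(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    fetch_page is a hypothetical callable returning a list of film dicts.
    """
    frames = []
    page = 1
    while True:
        records = fetch_page(page)
        if not records:  # an empty list signals that the data is exhausted
            break
        frames.append(pd.DataFrame(records))
        page += 1
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all pages.
    return pd.concat(frames, ignore_index=True)


# Stand-in for the server (assumption): two one-record pages, then nothing.
fake_pages = {
    1: [{"director": "Hayao Miyazaki", "rt_score": "95"}],
    2: [{"director": "Hayao Miyazaki", "rt_score": "97"}],
}
films_df = collect_pages(lambda page: fake_pages.get(page, []))

# Scores arrive as strings; convert them so they can be averaged later.
films_df["rt_score"] = films_df["rt_score"].astype(int)
print(films_df)
```

Against the mirror, the callable would presumably wrap the request, for example `lambda page: requests.get(f'http://linserv1.cims.nyu.edu:10000/films?_page={page}').json()`.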

In [39]:
i = 1
director_and_score = []
# Request successive pages until the server returns an empty list.
while True:
    link = f'http://linserv1.cims.nyu.edu:10000/films?_page={i}'
    d = requests.get(link).json()
    if not d:
        break
    # Keep only the two fields the report needs; rt_score arrives as a string.
    for film in d:
        director_and_score.append([film['director'], int(film['rt_score'])])
    i += 1

director_score_df = pd.DataFrame(director_and_score, columns=['director', 'rt_score'])
director_score_df

| | director | rt_score |
|---|---|---|
| 0 | Hayao Miyazaki | 95 |
| 1 | Isao Takahata | 97 |
| 2 | Hayao Miyazaki | 93 |
| 3 | Hayao Miyazaki | 96 |
| 4 | Isao Takahata | 100 |
| 5 | Hayao Miyazaki | 94 |
| 6 | Isao Takahata | 78 |
| 7 | Yoshifumi Kondō | 91 |
| 8 | Hayao Miyazaki | 92 |
| 9 | Isao Takahata | 75 |


**3. Report**

Create a report that shows:

- the directors' names as the index (note that `index.name` can be set to get what appears to be a title for the index)
- the average Rotten Tomatoes score (Rotten Tomatoes is a review aggregator website)
- the number of films directed
- `concat` and `groupby` may be helpful
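
The report described above can be sketched with `groupby` and named aggregation, which avoids the two-level column header that `agg(['mean', 'count'])` produces. The sample frame is a three-row illustration using scores that appear in the outputs earlier in this notebook, and the column names `avg_rt_score` and `film_count` are my own choices, not names the assignment requires.

```python
import pandas as pd

# Small illustrative sample (assumption): three rows rather than the full data set.
scores = pd.DataFrame({
    "director": ["Hayao Miyazaki", "Hayao Miyazaki", "Isao Takahata"],
    "rt_score": [95, 97, 100],
})

report = (
    scores.groupby("director")["rt_score"]
    # Named aggregation: keyword = output column, value = aggregation to apply.
    .agg(avg_rt_score="mean", film_count="count")
    .sort_values("avg_rt_score", ascending=False)
)

# Setting index.name gives the index what looks like a title in the display.
report.index.name = "Director"
print(report)
```

Selecting the `rt_score` column before aggregating keeps the result to a single level of column labels, so no `droplevel` call is needed.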

In [53]:
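# Average score and film count per director.
ret = director_score_df.groupby('director').agg(['mean', 'count'])
# Drop the outer 'rt_score' column level so only 'mean' and 'count' remain.
ret.columns = ret.columns.droplevel()
# Show the highest-rated directors first.
ret.sort_values(by='mean', ascending=False, inplace=True)
ret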

| director | mean | count |
|---|---|---|
| Hiromasa Yonebayashi | 93.5 | 2 |
| Michaël Dudok de Wit | 93.0 | 1 |
| Hayao Miyazaki | 92.777778 | 9 |
| Yoshifumi Kondō | 91.0 | 1 |
| Isao Takahata | 90.0 | 5 |
| Hiroyuki Morita | 89.0 | 1 |
| Gorō Miyazaki | 62.0 | 2 |
