# Part 2 - Using an API

In [33]:
import requests
import pandas as pd
import numpy as np

## 1. Retrieve the data, and examine it.

In [34]:
res = requests.get('http://linserv1.cims.nyu.edu:10000/films?_page=1')

In [35]:
data = res.json()

In [36]:
for film in data:
    print("{} film's dictionary information\n".format(film['title']))
    print("keys of the dictionary: {}\n".format(film.keys()))
    print("values of the dictionary: {}".format(film.values()))
    print("\n")

Castle in the Sky film's dictionary information

keys of the dictionary: dict_keys(['id', 'title', 'original_title', 'original_title_romanised', 'description', 'director', 'producer', 'release_date', 'running_time', 'rt_score', 'people', 'species', 'locations', 'vehicles', 'url'])

values of the dictionary: dict_values(['2baf70d1-42bb-4437-b551-e5fed5a87abe', 'Castle in the Sky', '天空の城ラピュタ', 'Tenkū no shiro Rapyuta', "The orphan Sheeta inherited a mysterious crystal that links her to the mythical sky-kingdom of Laputa. With the help of resourceful Pazu and a rollicking band of sky pirates, she makes her way to the ruins of the once-great civilization. Sheeta and Pazu must outwit the evil Muska, who plans to use Laputa's science to make himself ruler of the world.", 'Hayao Miyazaki', 'Isao Takahata', '1986', '124', '95', ['https://ghibliapi.herokuapp.com/people/'], ['https://ghibliapi.herokuapp.com/species/af3910a6-429f-4c74-9ad5-dfe1c4aa04f2'], ['https://ghibliapi.herokuapp.com/locatio

To create the report, I need to extract `director` and `rt_score` from each film's dictionary.
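A minimal sketch of that extraction, run against a hand-written sample instead of the live API response. The `sample_films` list and the `extract_fields` helper are hypothetical names introduced only for illustration; the two records reuse values shown in the output in this notebook. Note that `rt_score` arrives as a string, so it has to be converted before any averaging.

```python
# Hypothetical sample mirroring three fields of the API's film records.
sample_films = [
    {"title": "Castle in the Sky", "director": "Hayao Miyazaki", "rt_score": "95"},
    {"title": "Spirited Away", "director": "Hayao Miyazaki", "rt_score": "97"},
]


def extract_fields(films):
    """Return (director, rt_score) pairs, converting the score string to int."""
    return [(film["director"], int(film["rt_score"])) for film in films]


pairs = extract_fields(sample_films)
print(pairs)  # [('Hayao Miyazaki', 95), ('Hayao Miyazaki', 97)]
```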

In [37]:
res = requests.get('http://linserv1.cims.nyu.edu:10000/films?_page=2')
data = res.json()
data[0]

{'id': 'dc2e6bd1-8156-4886-adff-b39e6043af0c',
 'title': 'Spirited Away',
 'original_title': '千と千尋の神隠し',
 'original_title_romanised': 'Sen to Chihiro no kamikakushi',
 'description': 'Spirited Away is an Oscar winning Japanese animated film about a ten year old girl who wanders away from her parents along a path that leads to a world ruled by strange and unusual monster-like animals. Her parents have been changed into pigs along with others inside a bathhouse full of these creatures. Will she ever see the world how it once was?',
 'director': 'Hayao Miyazaki',
 'producer': 'Toshio Suzuki',
 'release_date': '2001',
 'running_time': '124',
 'rt_score': '97',
 'people': ['https://ghibliapi.herokuapp.com/people/'],
 'species': ['https://ghibliapi.herokuapp.com/species/af3910a6-429f-4c74-9ad5-dfe1c4aa04f2'],
 'locations': ['https://ghibliapi.herokuapp.com/locations/'],
 'vehicles': ['https://ghibliapi.herokuapp.com/vehicles/'],
 'url': 'https://ghibliapi.herokuapp.com/films/dc2e6bd1-8156-48

In [38]:
i = 1
while True:
    url = 'http://linserv1.cims.nyu.edu:10000/films?_page='
    res = requests.get(url + str(i))
    data = res.json()
    if data:
        print("Page {} has data".format(i))
    else:
        print("No more data after Page {}".format(i))
        break
    i += 1

Page 1 has data
Page 2 has data
Page 3 has data
No more data after Page 4


Changing the page number after `films?_page=` in the URL returns a different page of data. I therefore used a `while` loop to find which pages contain data: pages 1 to 3 return films, while page 4 returns an empty list. We will iterate over pages 1 to 3 to collect the complete set of Ghibli films.
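The same "keep requesting until a page comes back empty" idea can be sketched as a small reusable helper. This is only an illustration under stated assumptions: `collect_pages`, `fake_fetch`, and the `max_pages` safety cap are hypothetical names that do not appear in the notebook, and a stand-in dictionary replaces the real server so the sketch runs offline.

```python
def collect_pages(fetch, max_pages=100):
    """Call fetch(1), fetch(2), ... and gather records until a page is empty.

    `fetch` is any callable that takes a page number and returns a list.
    `max_pages` is a safety cap (an assumption) so the loop always ends.
    """
    records = []
    for page in range(1, max_pages + 1):
        data = fetch(page)
        if not data:  # an empty list marks the end of the data
            break
        records.extend(data)
    return records


# Stand-in for the real API: three pages with data, then empty pages.
fake_pages = {1: [{"id": "a"}, {"id": "b"}], 2: [{"id": "c"}], 3: [{"id": "d"}]}


def fake_fetch(page):
    return fake_pages.get(page, [])


all_records = collect_pages(fake_fetch)
print(len(all_records))  # 4
```

Against the real endpoint, `fetch` could be something like `lambda page: requests.get(url + str(page)).json()`, with `url` being the base URL used in the loop above.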

## 2. Load the data into a DataFrame

In [39]:
# Collect the two report columns from each page, then combine them once.
frames = []
for page in range(1, 4):
    url = 'http://linserv1.cims.nyu.edu:10000/films?_page=' + str(page)
    res = requests.get(url)
    data = res.json()
    frames.append(pd.DataFrame(data)[['director', 'rt_score']])
ghibli_films = pd.concat(frames)

In [40]:
ghibli_films['rt_score'] = pd.to_numeric(ghibli_films['rt_score'], errors='coerce')

In [41]:
ghibli_films

| | director | rt_score |
|---|---|---|
| 0 | Hayao Miyazaki | 95 |
| 1 | Isao Takahata | 97 |
| 2 | Hayao Miyazaki | 93 |
| 3 | Hayao Miyazaki | 96 |
| 4 | Isao Takahata | 100 |
| 5 | Hayao Miyazaki | 94 |
| 6 | Isao Takahata | 78 |
| 7 | Yoshifumi Kondō | 91 |
| 8 | Hayao Miyazaki | 92 |
| 9 | Isao Takahata | 75 |


In [42]:
ghibli_films.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 0 to 0
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   director  21 non-null     object
 1   rt_score  21 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 504.0+ bytes


## 3. Report

In [43]:
# Average rt_score per director, highest first; selecting the column keeps mean() on numeric data only.
avg_rt_score = ghibli_films.groupby('director')['rt_score'].mean().sort_values(ascending=False)
avg_rt_score = avg_rt_score.reset_index(name='avg_rt_score')

In [44]:
count = ghibli_films.groupby(ghibli_films['director']).size().reset_index(name ='count')

In [45]:
ghibli_film_info = pd.merge(avg_rt_score, count, on = 'director',how='left').set_index('director')

In [46]:
ghibli_film_info

| director | avg_rt_score | count |
|---|---|---|
| Hiromasa Yonebayashi | 93.5 | 2 |
| Michaël Dudok de Wit | 93.0 | 1 |
| Hayao Miyazaki | 92.777778 | 9 |
| Yoshifumi Kondō | 91.0 | 1 |
| Isao Takahata | 90.0 | 5 |
| Hiroyuki Morita | 89.0 | 1 |
| Gorō Miyazaki | 62.0 | 2 |
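The average and the count above can also be produced in a single pass with pandas named aggregation, which avoids the separate merge step. The sketch below is an alternative, not the method used in this notebook; the `sample` DataFrame is a hypothetical three-row stand-in built from the first rows of `ghibli_films`, so its numbers differ from the full report.

```python
import pandas as pd

# Hypothetical three-row sample; the real DataFrame holds 21 films.
sample = pd.DataFrame({
    "director": ["Hayao Miyazaki", "Isao Takahata", "Hayao Miyazaki"],
    "rt_score": [95, 97, 93],
})

# One groupby yields both statistics, each with an explicit output name.
report = (
    sample.groupby("director")
    .agg(avg_rt_score=("rt_score", "mean"), count=("rt_score", "count"))
    .sort_values("avg_rt_score", ascending=False)
)
print(report)
```

Here `count` tallies non-null scores per director, which matches the number of films as long as no `rt_score` is missing.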
