# Data Collection from an API

## Learning Objectives
 - Extract data from an external API
 - Read the API documentation
 - Clean the data
 - Save the data to a file

Extract data from The Movie Database (TMDB), using the account at the URL below:

https://www.themoviedb.org/u/jissmonjose

API key: b31fbdae5d7aeb35d9a441db80d85272

Here's an example API request:

https://api.themoviedb.org/3/movie/550?api_key=b31fbdae5d7aeb35d9a441db80d85272

Documentation for the Get Popular endpoint:

https://developers.themoviedb.org/3/movies/get-popular-movies
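The example request above is a base URL, an endpoint path, and a query string. As a minimal sketch using only the standard library, the URL can be assembled from those parts instead of being typed by hand. The helper name `build_tmdb_url` and the `TMDB_API_KEY` environment variable are hypothetical, introduced here for illustration only.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.themoviedb.org/3"


def build_tmdb_url(path, api_key, **params):
    """Return a full request URL for an endpoint path such as 'movie/550'."""
    # api_key goes first so the result has the same shape as the example request
    query = urlencode({"api_key": api_key, **params})
    return f"{BASE_URL}/{path.strip('/')}?{query}"


# Hypothetical: read the key from an environment variable instead of hard-coding it
api_key = os.environ.get("TMDB_API_KEY", "YOUR_API_KEY")
print(build_tmdb_url("movie/550", api_key))
print(build_tmdb_url("genre/movie/list", api_key, language="en-US"))
```

Keeping the key in an environment variable is one possible way to avoid committing it to a notebook; any other secret store would work as well.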

In [41]:
import requests
import pandas as pd

r = requests.get('https://api.themoviedb.org/3/genre/movie/list?api_key=b31fbdae5d7aeb35d9a441db80d85272&language=en-US')
r

<Response [200]>

In [42]:
# get text from response
r.text

'{"genres":[{"id":28,"name":"Action"},{"id":12,"name":"Adventure"},{"id":16,"name":"Animation"},{"id":35,"name":"Comedy"},{"id":80,"name":"Crime"},{"id":99,"name":"Documentary"},{"id":18,"name":"Drama"},{"id":10751,"name":"Family"},{"id":14,"name":"Fantasy"},{"id":36,"name":"History"},{"id":27,"name":"Horror"},{"id":10402,"name":"Music"},{"id":9648,"name":"Mystery"},{"id":10749,"name":"Romance"},{"id":878,"name":"Science Fiction"},{"id":10770,"name":"TV Movie"},{"id":53,"name":"Thriller"},{"id":10752,"name":"War"},{"id":37,"name":"Western"}]}'
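The response text is a JSON string. To make the next step less of a black box, the sketch below parses a shortened copy of that string with the standard `json` module, which gives the same kind of dictionary that `r.json()` returns. The shortened `text` value is an illustration, not the full response.

```python
import json

# A shortened copy of the response text shown above
text = '{"genres":[{"id":28,"name":"Action"},{"id":12,"name":"Adventure"}]}'

parsed = json.loads(text)            # same result r.json() would give for this text
print(type(parsed))                  # <class 'dict'>
print(list(parsed.keys()))           # ['genres']
print(parsed["genres"][0]["name"])   # Action
```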

In [43]:
# convert to dictionary format using json()
data = r.json()
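Calling `json()` assumes the request succeeded. A more defensive variant, sketched below, checks the HTTP status first and sets a timeout. `fetch_json` is a hypothetical helper, not part of `requests`; it takes the session as an argument so it can be tried without network access.

```python
def fetch_json(session, url, params=None, timeout=10.0):
    """GET a URL and return the decoded JSON body, raising on HTTP errors."""
    response = session.get(url, params=params, timeout=timeout)
    # raise_for_status() raises an exception for 4xx/5xx responses
    response.raise_for_status()
    return response.json()


# Usage (needs network access and a valid key):
# import requests
# data = fetch_json(
#     requests.Session(),
#     "https://api.themoviedb.org/3/genre/movie/list",
#     params={"api_key": "YOUR_API_KEY", "language": "en-US"},
# )
```

Passing `params` as a dictionary lets `requests` build the query string, so the key and language no longer need to be concatenated into the URL.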

In [44]:
# make tabular format using dataframe
df = pd.DataFrame(data)
df

| | genres |
|---|---|
| 0 | {'id': 28, 'name': 'Action'} |
| 1 | {'id': 12, 'name': 'Adventure'} |
| 2 | {'id': 16, 'name': 'Animation'} |
| 3 | {'id': 35, 'name': 'Comedy'} |
| 4 | {'id': 80, 'name': 'Crime'} |
| 5 | {'id': 99, 'name': 'Documentary'} |
| 6 | {'id': 18, 'name': 'Drama'} |
| 7 | {'id': 10751, 'name': 'Family'} |
| 8 | {'id': 14, 'name': 'Fantasy'} |
| 9 | {'id': 36, 'name': 'History'} |


In [47]:
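# put id and name in two columns for storing to a csv file:
# first take the list of genre dictionaries out of the response dictionary.
# The isinstance check makes this cell safe to re-run; once data is already a list,
# data['genres'] raises "TypeError: list indices must be integers or slices, not str".
if isinstance(data, dict):
    data = data['genres']
data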

In [51]:

# extract id and name from each dictionary into separate lists
genre_ids = [d['id'] for d in data]
names_list = [d['name'] for d in data]
print(genre_ids)
print(names_list)

[28, 12, 16, 35, 80, 99, 18, 10751, 14, 36, 27, 10402, 9648, 10749, 878, 10770, 53, 10752, 37]
['Action', 'Adventure', 'Animation', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Music', 'Mystery', 'Romance', 'Science Fiction', 'TV Movie', 'Thriller', 'War', 'Western']


In [53]:
# next, build a dataframe from the two lists
df = pd.DataFrame({'id': genre_ids, 'name': names_list})
df

| | id | name |
|---|---|---|
| 0 | 28 | Action |
| 1 | 12 | Adventure |
| 2 | 16 | Animation |
| 3 | 35 | Comedy |
| 4 | 80 | Crime |
| 5 | 99 | Documentary |
| 6 | 18 | Drama |
| 7 | 10751 | Family |
| 8 | 14 | Fantasy |
| 9 | 36 | History |
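The two list comprehensions work, but they are not the only route to this table. As a sketch of an alternative, `pd.DataFrame` also accepts the list of dictionaries directly and uses the dictionary keys as column names. The `genres` list here is a shortened copy of the response data, and `df_direct` is a name chosen for illustration.

```python
import pandas as pd

# First three entries of the genre list from the response above
genres = [
    {"id": 28, "name": "Action"},
    {"id": 12, "name": "Adventure"},
    {"id": 16, "name": "Animation"},
]

# Each dictionary becomes a row; the keys become the column names
df_direct = pd.DataFrame(genres)
print(df_direct.columns.tolist())  # ['id', 'name']
print(df_direct.shape)             # (3, 2)
```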


In [55]:
# Now the data is in separate id and name columns
# save it to a csv file with to_csv(path, index=False); index=False leaves out the row index
df.to_csv('./movies_genre.csv', index=False)
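A quick way to confirm what `to_csv` wrote is to read it back. The sketch below does the round trip through an in-memory buffer instead of a file on disk, so it can be run anywhere; the two-row `sample` frame is made up from the first genres above.

```python
import io

import pandas as pd

sample = pd.DataFrame({"id": [28, 12], "name": ["Action", "Adventure"]})

# Write to an in-memory text buffer; index=False omits the row index column
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
print(buffer.getvalue())

# Rewind the buffer and read the CSV text back into a new dataframe
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())  # ['id', 'name']
```

With a real file, the same check would be `pd.read_csv('./movies_genre.csv')`.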

# Assignment - Data extraction and formatting

## Extracting popular movies and their metadata

In this assignment, you are going to use the TMDB API to extract popular movies and their associated data.

Pre-requisites - your API key:

 - Make sure you have signed up at https://www.themoviedb.org/
 - Log in to your account.
 - Get your API key as shown in the lecture.
 - Make sure you have the requests package installed and watch the lecture on extracting data.

Steps:

 - Go to the developer documentation page https://developers.themoviedb.org/.
 - Expand the MOVIES section in the left navigation panel and use the Get Popular endpoint to extract the data.
 - Feel free to extract as many keys from the JSON response as you like.
 - Collect the data, format/clean it, and write it to a CSV file.

The output should look like the movie_screen.jpg file.

In [64]:
import pandas as pd
import requests
response = requests.get(
    'https://api.themoviedb.org/3/movie/popular?api_key=b31fbdae5d7aeb35d9a441db80d85272&language=en-US&page=1')
data = response.json()
data

{'page': 1,
 'results': [{'adult': False,
   'backdrop_path': '/jlGmlFOcfo8n5tURmhC7YVd4Iyy.jpg',
   'genre_ids': [28, 12, 14, 35],
   'id': 436969,
   'original_language': 'en',
   'original_title': 'The Suicide Squad',
   'overview': 'Supervillains Harley Quinn, Bloodsport, Peacemaker and a collection of nutty cons at Belle Reve prison join the super-secret, super-shady Task Force X as they are dropped off at the remote, enemy-infused island of Corto Maltese.',
   'popularity': 3152.029,
   'poster_path': '/kb4s0ML0iVZlG6wAKbbs9NAm6X.jpg',
   'release_date': '2021-07-28',
   'title': 'The Suicide Squad',
   'video': False,
   'vote_average': 8,
   'vote_count': 3240},
  {'adult': False,
   'backdrop_path': '/nprqOIEfiMMQx16lgKeLf3rmPrR.jpg',
   'genre_ids': [28, 53, 18],
   'id': 619297,
   'original_language': 'en',
   'original_title': 'Sweet Girl',
   'overview': "A devastated husband vows to bring justice to the people responsible for his wife's death while protecting the only fa
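The request above asks for `page=1` only. As a rough sketch, several pages could be collected by calling the endpoint repeatedly and concatenating each `results` list. `collect_results` and `fetch_page` are hypothetical names, and the sketch assumes the response also carries a `total_pages` key, which is not visible in the truncated output above.

```python
def collect_results(fetch_page, max_pages=3):
    """Gather the 'results' lists from pages 1..max_pages into one list."""
    movies = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        movies.extend(payload.get("results", []))
        # Assumption: 'total_pages' reports how many pages exist; stop when reached
        if page >= payload.get("total_pages", max_pages):
            break
    return movies


# Usage (needs network access and a valid key):
# import requests
# def fetch_page(page):
#     r = requests.get(
#         "https://api.themoviedb.org/3/movie/popular",
#         params={"api_key": "YOUR_API_KEY", "language": "en-US", "page": page},
#         timeout=10,
#     )
#     r.raise_for_status()
#     return r.json()
# movies = collect_results(fetch_page, max_pages=2)
```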

In [70]:
data1 = data['results']
data1

[{'adult': False,
  'backdrop_path': '/jlGmlFOcfo8n5tURmhC7YVd4Iyy.jpg',
  'genre_ids': [28, 12, 14, 35],
  'id': 436969,
  'original_language': 'en',
  'original_title': 'The Suicide Squad',
  'overview': 'Supervillains Harley Quinn, Bloodsport, Peacemaker and a collection of nutty cons at Belle Reve prison join the super-secret, super-shady Task Force X as they are dropped off at the remote, enemy-infused island of Corto Maltese.',
  'popularity': 3152.029,
  'poster_path': '/kb4s0ML0iVZlG6wAKbbs9NAm6X.jpg',
  'release_date': '2021-07-28',
  'title': 'The Suicide Squad',
  'video': False,
  'vote_average': 8,
  'vote_count': 3240},
 {'adult': False,
  'backdrop_path': '/nprqOIEfiMMQx16lgKeLf3rmPrR.jpg',
  'genre_ids': [28, 53, 18],
  'id': 619297,
  'original_language': 'en',
  'original_title': 'Sweet Girl',
  'overview': "A devastated husband vows to bring justice to the people responsible for his wife's death while protecting the only family he has left, his daughter.",
  'popular

In [75]:
# pull the fields of interest out of each movie dictionary, one list per field
title_list = [d1['title'] for d1 in data1]
rel_date_list = [d1['release_date'] for d1 in data1]
adult_rate_list = [d1['adult'] for d1 in data1]
org_lang_list = [d1['original_language'] for d1 in data1]
pop_list = [d1['popularity'] for d1 in data1]
votes = [d1['vote_average'] for d1 in data1]
print(title_list)
print(rel_date_list)
print(adult_rate_list)
print(org_lang_list)
print(pop_list)
print(votes)

['The Suicide Squad', 'Sweet Girl', 'Jungle Cruise', 'PAW Patrol: The Movie', 'Eggs Run', 'Black Widow', 'Space Jam: A New Legacy', 'Narco Sub', 'Free Guy', 'Infinite', 'Beckett', 'F9', 'The Boss Baby: Family Business', 'El mesero', 'Luca', 'The Tomorrow War', 'Breathless', 'The Last Mercenary', 'The Loud House Movie', 'Jolt']
['2021-07-28', '2021-08-18', '2021-07-28', '2021-08-09', '2021-08-12', '2021-07-07', '2021-07-08', '2021-07-22', '2021-08-11', '2021-06-10', '2021-08-04', '2021-05-19', '2021-07-01', '2021-07-15', '2021-06-17', '2021-09-03', '2021-08-11', '2021-07-30', '2021-08-20', '2021-07-15']
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
['en', 'en', 'en', 'en', 'es', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'es', 'en', 'en', 'es', 'fr', 'en', 'en']
[3152.029, 3007.005, 2863.974, 2371.734, 2333.167, 1822.837, 1684.844, 1561.977, 1514.958, 1411.278, 1342.156, 1334.152, 1254.21
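The six comprehensions above each index the dictionaries directly, so a record without one of the keys would raise a `KeyError`. A more tolerant variant, sketched below, selects all wanted keys in one pass and uses `dict.get` so a missing key becomes `None`. `pick_fields` and `FIELDS` are hypothetical names, and the second sample record is deliberately incomplete for illustration.

```python
FIELDS = ["title", "release_date", "adult", "original_language",
          "popularity", "vote_average"]


def pick_fields(records, fields=FIELDS):
    """Keep only the wanted keys from each record; missing keys become None."""
    return [{field: record.get(field) for field in fields} for record in records]


sample = [
    {"title": "The Suicide Squad", "release_date": "2021-07-28", "adult": False,
     "original_language": "en", "popularity": 3152.029, "vote_average": 8,
     "vote_count": 3240},
    {"title": "Sweet Girl", "original_language": "en"},  # made-up incomplete record
]

rows = pick_fields(sample)
print(rows[0]["title"])         # The Suicide Squad
print(rows[1]["release_date"])  # None
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`.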

In [76]:
df = pd.DataFrame({'title': title_list,
             'release_date': rel_date_list,
             'adult_rating': adult_rate_list,
             'original_language': org_lang_list,
             'popularity': pop_list,
             'vote_average': votes }
             )
df

| | title | release_date | adult_rating | original_language | popularity | vote_average |
|---|---|---|---|---|---|---|
| 0 | The Suicide Squad | 2021-07-28 | False | en | 3152.029 | 8.0 |
| 1 | Sweet Girl | 2021-08-18 | False | en | 3007.005 | 7.0 |
| 2 | Jungle Cruise | 2021-07-28 | False | en | 2863.974 | 7.9 |
| 3 | PAW Patrol: The Movie | 2021-08-09 | False | en | 2371.734 | 8.1 |
| 4 | Eggs Run | 2021-08-12 | False | es | 2333.167 | 8.4 |
| 5 | Black Widow | 2021-07-07 | False | en | 1822.837 | 7.8 |
| 6 | Space Jam: A New Legacy | 2021-07-08 | False | en | 1684.844 | 7.5 |
| 7 | Narco Sub | 2021-07-22 | False | en | 1561.977 | 7.1 |
| 8 | Free Guy | 2021-08-11 | False | en | 1514.958 | 7.8 |
| 9 | Infinite | 2021-06-10 | False | en | 1411.278 | 7.5 |


In [77]:
df.to_csv('./movies_data.csv', index=False)
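The learning objectives mention cleaning, and one cleaning step that could run before the export is converting `release_date` from text to a real date type. The sketch below shows this on three rows copied from the table above; `movies` and `cleaned` are names chosen for illustration, and the choice of steps is a suggestion, not part of the assignment.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["The Suicide Squad", "Sweet Girl", "Jungle Cruise"],
    "release_date": ["2021-07-28", "2021-08-18", "2021-07-28"],
    "vote_average": [8.0, 7.0, 7.9],
})

cleaned = movies.copy()
# errors="coerce" turns empty or unparseable dates into NaT instead of raising
cleaned["release_date"] = pd.to_datetime(cleaned["release_date"], errors="coerce")
# Drop repeated titles, then order by rating, highest first
cleaned = cleaned.drop_duplicates(subset="title")
cleaned = cleaned.sort_values("vote_average", ascending=False)

print(cleaned["title"].tolist())  # ['The Suicide Squad', 'Jungle Cruise', 'Sweet Girl']
```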