## 1. Importing Python Libraries

We start by importing the essential Python libraries.

In [1]:
### IMPORTING LIBRARIES
import pandas as pd
import numpy as np
import requests
import collections

## 2. Connecting to TMDB Website Using the API Key

We now use the API key we received when we signed up for TMDB to pull data from the website. At https://developers.themoviedb.org, the left column lists the URLs that return the information we need. Under _MOVIES_, we select _'Get Top Rated'_, which gives the URL for the list of top rated movies, and attach our API key as directed.

Here, we request only the first page of results, so we set the page number to 1 in the URL. We then create a response object by passing the URL to requests.get.

In [2]:
### USING API KEY TO GET DATA
api_key = "API Key"
url = "https://api.themoviedb.org/3/movie/top_rated?api_key=" +  api_key + "&language=en-US&page=1"
response = requests.get(url)
response

<Response [200]>

A status code of 200 means our request succeeded.
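The request URL above can also be assembled from a dictionary of query parameters instead of string concatenation, which takes care of the separators and of escaping. The following is a minimal sketch using only the standard library; the helper name _build_top_rated_url_ and the placeholder key are assumptions made for illustration and are not part of the original notebook.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell above
BASE_URL = "https://api.themoviedb.org/3/movie/top_rated"


def build_top_rated_url(api_key, page=1, language="en-US"):
    """Return the top rated movies URL for one page of results (hypothetical helper)."""
    query = urlencode({"api_key": api_key, "language": language, "page": page})
    return BASE_URL + "?" + query


# "YOUR_API_KEY" is a placeholder, not a real key
url = build_top_rated_url("YOUR_API_KEY", page=1)
print(url)
```

The resulting string can be passed to requests.get exactly as before.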

## 3. Pulling the Top Rated Movies Data from the Response Object I

We can now extract information from this response object. The documentation at https://developers.themoviedb.org shows that the JSON body of the response contains an array _results_, which holds the information we need. Let us make a pandas dataframe, _tmdb_df_, from this array.

In [3]:
### CREATING A PANDAS DATAFRAME 
tmdb_df = pd.DataFrame(response.json()['results'])
tmdb_df

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count
0,False,/5hNcsnMkwU2LknLoru73c76el3z.jpg,"[35, 18, 10749]",19404,hi,दिलवाले दुल्हनिया ले जायेंगे,"Raj is a rich, carefree, happy-go-lucky second...",24.222,/2CAL2433ZeIihfX1Hb2139CX0pW.jpg,1995-10-20,Dilwale Dulhania Le Jayenge,False,8.7,3253
1,False,/iNh3BivHyg5sQRPP1KOkzguEX0H.jpg,"[18, 80]",278,en,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,67.359,/q6y0Go1tsGEsmtFryDOJo3dEmqu.jpg,1994-09-23,The Shawshank Redemption,False,8.7,20172
2,False,/rSPw7tgCH9c6NqICZef4kZjFOQ5.jpg,"[18, 80]",238,en,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",62.603,/eEslKSwcqmiNS6va24Pbxf2UKmJ.jpg,1972-03-14,The Godfather,False,8.7,15112
3,False,/jtAI6OJIWLWiRItNSZoWjrsUtmi.jpg,[10749],724089,en,Gabriel's Inferno Part II,Professor Gabriel Emerson finally learns the t...,10.796,/x5o8cLZfEXMoZczTYWLrUo1P7UJ.jpg,2020-07-31,Gabriel's Inferno Part II,False,8.7,1334
4,False,/fQq1FWp1rC89xDrRMuyFJdFUdMd.jpg,"[10749, 35]",761053,en,Gabriel's Inferno Part III,The final part of the film adaption of the ero...,34.804,/qtX2Fg9MTmrbgN1UUvGoCsImTM8.jpg,2020-11-19,Gabriel's Inferno Part III,False,8.6,901
5,False,/loRmRzQXZeqG78TqZuyvSlEQfZb.jpg,"[18, 36, 10752]",424,en,Schindler's List,The true story of how businessman Oskar Schind...,35.794,/sF1U4EUQS8YHUYjNl3pMGNIQyr0.jpg,1993-11-30,Schindler's List,False,8.6,12066
6,False,/w2uGvCpMtvRqZg6waC1hvLyZoJa.jpg,[10749],696374,en,Gabriel's Inferno,An intriguing and sinful exploration of seduct...,14.372,/oyG9TL7FcRP4EZ9Vid6uKzwdndz.jpg,2020-05-29,Gabriel's Inferno,False,8.6,2155
7,False,/3ggZWEoa2aegF6AYyjyNRm8noM5.jpg,"[18, 80]",240,en,The Godfather: Part II,In the continuing saga of the Corleone crime f...,40.244,/sSuQTCZwqKrNBNIsksO9IAUoWP9.jpg,1974-12-20,The Godfather: Part II,False,8.6,9097
8,False,/1EAxNqdkVnp48a7NUuNBHGflowM.jpg,"[16, 28, 878]",283566,ja,シン・エヴァンゲリオン劇場版:||,"In the aftermath of the Fourth Impact, strande...",153.382,/jDwZavHo99JtGsCyRzp4epeeBHx.jpg,2021-03-08,Evangelion: 3.0+1.0 Thrice Upon a Time,False,8.6,383
9,False,/l5K9elugftlcyIHHm4nylvsn26X.jpg,[18],255709,ko,소원,After 8-year-old So-won narrowly survives a br...,8.039,/x9yjkm9gIz5qI5fJMUTfBnWiB2o.jpg,2013-10-02,Hope,False,8.6,236


We can see that the movie data contains a lot of information about the top rated movies: an adult-content flag, genre ids, unique ids, original title, title in English, plot summary, release date and so on. We won't actually need all of these columns, but we shall deal with that later.
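To make the structure of the response concrete, here is a small sketch that parses a hand-written JSON body shaped like one page of results. The sample is an assumption for illustration: it keeps only a few of the fields and two of the movies shown above, and it is not a real API response.

```python
import json

# Hand-written sample (assumption): a trimmed-down page with a "results" array
sample_body = """
{
  "page": 1,
  "results": [
    {"id": 278, "title": "The Shawshank Redemption", "genre_ids": [18, 80], "vote_average": 8.7},
    {"id": 238, "title": "The Godfather", "genre_ids": [18, 80], "vote_average": 8.7}
  ]
}
"""

payload = json.loads(sample_body)

# Each element of "results" is one movie, stored as a dictionary
rows = payload["results"]

# The dictionary keys are what become the dataframe columns
column_names = sorted({key for row in rows for key in row})
print(len(rows), column_names)
```

Passing a list of dictionaries like _rows_ to pd.DataFrame yields one row per movie and one column per key, which is what the cell above does with the real response.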

## 4. Creating a Dictionary of Column Names

Before moving forward, let us store the names of all the columns in a dictionary. This will make our task easier down the line.

In [4]:
### CREATING A DICTIONARY OF COLUMN NAMES
feature_names = collections.defaultdict(str)
for name in tmdb_df.columns:
    feature_names[name]  # looking up a missing key inserts it with the default value ''
feature_names

defaultdict(str,
            {'adult': '',
             'backdrop_path': '',
             'genre_ids': '',
             'id': '',
             'original_language': '',
             'original_title': '',
             'overview': '',
             'popularity': '',
             'poster_path': '',
             'release_date': '',
             'title': '',
             'video': '',
             'vote_average': '',
             'vote_count': ''})
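The loop above relies on a side effect of defaultdict: merely looking up a missing key inserts it with the default value. The sketch below shows this on a shortened, illustrative list of column names (an assumption; the notebook uses _tmdb_df.columns_) and shows that the keys keep their insertion order.

```python
import collections

# Shortened, illustrative column list (assumption)
columns = ["adult", "genre_ids", "title"]

feature_names = collections.defaultdict(str)
for name in columns:
    feature_names[name]  # a bare lookup inserts the key with str() == ''

# Keys come back in insertion order, so they can serve as column names later
column_list = list(feature_names)
print(column_list)
```

If only the names are needed, a plain list of the columns would hold the same information; the dictionary is kept here to match the notebook.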

## 5. Pulling the Top Rated Movies Data from the Response Object II

Here, we create a for-loop that steps through page numbers from 2 up to a chosen last page (here 468, since range(2, 469) stops before 469). Using the same process as above, we extract the _results_ array from the JSON body of each page's response, convert it into a pandas dataframe and vertically stack it onto _tmdb_df_. Because np.vstack returns a NumPy array, the column names are lost during stacking.

In [5]:
### PULLING DATA FOR ALL THE TOP RATED MOVIES
for page_no in range(2, 469):
    # Build the URL for this page (no space after "page=")
    url = "https://api.themoviedb.org/3/movie/top_rated?api_key=" + api_key + "&language=en-US&page=%d" % page_no
    print('Getting page: %d' % page_no)
    response = requests.get(url)
    tmdb_results = response.json()['results']
    movie_data = pd.DataFrame(tmdb_results)
    # np.vstack returns a NumPy array, so tmdb_df loses its column names here
    tmdb_df = np.vstack((tmdb_df, movie_data))

Getting page: 2
Getting page: 3
Getting page: 4
Getting page: 5
Getting page: 6
Getting page: 7
Getting page: 8
Getting page: 9
Getting page: 10
Getting page: 11
Getting page: 12
Getting page: 13
Getting page: 14
Getting page: 15
Getting page: 16
Getting page: 17
Getting page: 18
Getting page: 19
Getting page: 20
Getting page: 21
Getting page: 22
Getting page: 23
Getting page: 24
Getting page: 25
Getting page: 26
Getting page: 27
Getting page: 28
Getting page: 29
Getting page: 30
Getting page: 31
Getting page: 32
Getting page: 33
Getting page: 34
Getting page: 35
Getting page: 36
Getting page: 37
Getting page: 38
Getting page: 39
Getting page: 40
Getting page: 41
Getting page: 42
Getting page: 43
Getting page: 44
Getting page: 45
Getting page: 46
Getting page: 47
Getting page: 48
Getting page: 49
Getting page: 50
Getting page: 51
Getting page: 52
Getting page: 53
Getting page: 54
Getting page: 55
Getting page: 56
Getting page: 57
Getting page: 58
Getting page: 59
Getting page: 60
...

Getting page: 464
Getting page: 465
Getting page: 466
Getting page: 467
Getting page: 468
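An alternative to stacking inside the loop is to collect the rows of every page in one list and build the dataframe once at the end. The sketch below illustrates that pattern without touching the network: _collect_pages_ and _fake_fetch_page_ are hypothetical names, and the fake fetcher stands in for requests.get(url).json()['results'].

```python
def collect_pages(fetch_page, first_page, last_page):
    """Gather the rows of pages first_page..last_page (inclusive) into one list."""
    rows = []
    for page_no in range(first_page, last_page + 1):
        rows.extend(fetch_page(page_no))
    return rows


def fake_fetch_page(page_no):
    """Stand-in for a real request (assumption): returns two dummy rows per page."""
    return [{"id": page_no * 10 + i, "page": page_no} for i in range(2)]


all_rows = collect_pages(fake_fetch_page, 2, 4)
print(len(all_rows))
```

With a real fetcher, the combined list could be passed to pd.DataFrame in a single call, which keeps the column names and avoids rebuilding the array on every iteration.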


We then convert the final array into a pandas dataframe and name the columns using the dictionary that we had previously created.

In [7]:
### CREATING A PANDAS DATAFRAME
# The dictionary keys, in insertion order, supply the column names
tmdb_df = pd.DataFrame(tmdb_df, columns = list(feature_names))
tmdb_df

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count
0,False,/5hNcsnMkwU2LknLoru73c76el3z.jpg,"[35, 18, 10749]",19404,hi,दिलवाले दुल्हनिया ले जायेंगे,"Raj is a rich, carefree, happy-go-lucky second...",24.222,/2CAL2433ZeIihfX1Hb2139CX0pW.jpg,1995-10-20,Dilwale Dulhania Le Jayenge,False,8.7,3253
1,False,/iNh3BivHyg5sQRPP1KOkzguEX0H.jpg,"[18, 80]",278,en,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,67.359,/q6y0Go1tsGEsmtFryDOJo3dEmqu.jpg,1994-09-23,The Shawshank Redemption,False,8.7,20172
2,False,/rSPw7tgCH9c6NqICZef4kZjFOQ5.jpg,"[18, 80]",238,en,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",62.603,/eEslKSwcqmiNS6va24Pbxf2UKmJ.jpg,1972-03-14,The Godfather,False,8.7,15112
3,False,/jtAI6OJIWLWiRItNSZoWjrsUtmi.jpg,[10749],724089,en,Gabriel's Inferno Part II,Professor Gabriel Emerson finally learns the t...,10.796,/x5o8cLZfEXMoZczTYWLrUo1P7UJ.jpg,2020-07-31,Gabriel's Inferno Part II,False,8.7,1334
4,False,/fQq1FWp1rC89xDrRMuyFJdFUdMd.jpg,"[10749, 35]",761053,en,Gabriel's Inferno Part III,The final part of the film adaption of the ero...,34.804,/qtX2Fg9MTmrbgN1UUvGoCsImTM8.jpg,2020-11-19,Gabriel's Inferno Part III,False,8.6,901
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9355,False,/lcLyZzhB1ctfdH0hGBsTFrbflqP.jpg,"[28, 14, 27]",12142,en,Alone in the Dark,Edward Carnby is a private investigator specia...,14.365,/o6Wf8lj8P9enQQbj4pC8jVDJHxI.jpg,2005-01-28,Alone in the Dark,False,3.2,429
9356,False,/hqJfW8G8FL28rckFHuCoKPecpG9.jpg,"[28, 12, 878, 10752]",5491,en,Battlefield Earth,"In the year 3000, man is no match for the Psyc...",8.795,/neMUscYddxr4cP8wnRHRMLcWS0A.jpg,2000-05-12,Battlefield Earth,False,3.2,621
9357,False,/aNUEHLNsNMprLZt6fjf5nqDq6er.jpg,"[27, 28, 53]",11059,en,House of the Dead,"Set on an island off the coast, a techno rave ...",10.019,/lI6UBnxwHztggSq8PhLibdOe2Nd.jpg,2003-04-11,House of the Dead,False,3.2,280
9358,False,/oHrrgAPEKpz0S1ofQntiZNrmGrM.jpg,"[28, 12, 14, 878, 53]",14164,en,Dragonball Evolution,The young warrior Son Goku sets out on a quest...,50.279,/sunS9xhPnFNP5wlOWrvbpBteAB.jpg,2009-03-12,Dragonball Evolution,False,2.9,1567


## 6. Saving the Movies Dataframe

Now, let us save this dataframe in CSV format so that it can be used later to build models.

In [8]:
### SAVING THE DATAFRAME
tmdb_df.to_csv('tmdb_movies_data.csv', index = False)
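One point to keep in mind when reloading the file: CSV stores text only, so a list-valued column such as _genre_ids_ is written as its string form and comes back as a string. The sketch below reproduces this with the standard csv module on a single illustrative row (an assumption, not the real file) and converts the string back to a list with ast.literal_eval.

```python
import ast
import csv
import io

# One illustrative row (assumption), with a list-valued field like genre_ids
rows = [{"title": "The Godfather", "genre_ids": [18, 80]}]

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "genre_ids"])
writer.writeheader()
writer.writerows(rows)

# Read the data back: every value is now a string
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["genre_ids"]

# Safely turn the text "[18, 80]" back into a Python list
genre_ids = ast.literal_eval(raw)
print(type(raw).__name__, genre_ids)
```

The same conversion can be applied to the _genre_ids_ column after reading _tmdb_movies_data.csv_ back into pandas.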