## COMP41680 Assignment 1 - Task 1: Data Collection

### Name: Sanika Kulkarni
### Student ID: 21200060


In this assignment, I will collect data on Top Rated Movies from [The Movie Database](https://www.themoviedb.org/about) through its Web API.

The Movie Database (TMDB) is a community-built database of movies and TV shows, covering over 745,000 movies. It has an international focus, with movies from many different countries.

To extract the data from TMDB, I followed these steps:
- Create an account
- Register for an API key, describing the intended usage
- Read the [API Documentation](https://developers.themoviedb.org/3/getting-started/introduction)
- Use the API Key to extract the necessary data

I used the [top rated movies](https://developers.themoviedb.org/3/movies/get-top-rated-movies) endpoint to obtain the movie IDs. I then collected further details about each of these movies from the [movie details](https://developers.themoviedb.org/3/movies/get-movie-details) endpoint, which requires a movie ID for every request.
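To make the two endpoints concrete, the sketch below only builds the request URLs without sending them, using `requests.Request(...).prepare()`. The helper names `top_rated_url` and `movie_details_url` are hypothetical, and `"YOUR_KEY"` is a placeholder, not a real API key.

```python
import requests

BASE = "https://api.themoviedb.org/3"

def top_rated_url(api_key: str, page: int) -> str:
    """Build (but do not send) the URL for one page of top rated movies."""
    req = requests.Request("GET", f"{BASE}/movie/top_rated",
                           params={"api_key": api_key, "page": page})
    return req.prepare().url

def movie_details_url(api_key: str, movie_id: int) -> str:
    """Build (but do not send) the URL for the details of a single movie."""
    req = requests.Request("GET", f"{BASE}/movie/{movie_id}",
                           params={"api_key": api_key, "language": "en-US"})
    return req.prepare().url

# Usage: inspect the URLs that the notebook requests
print(top_rated_url("YOUR_KEY", 1))
print(movie_details_url("YOUR_KEY", 278))
```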

## Importing required libraries

In [1]:
import numpy as np
import pandas as pd
import requests
from pandas import json_normalize 

## Setting the API key

In [2]:
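import os

# Read the TMDB API key from an environment variable (the name TMDB_API_KEY is an assumption)
# instead of hardcoding the secret in the notebook
apiKey = os.environ["TMDB_API_KEY"]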

## Defining a function to fetch data

TMDB offers different lists of movies, such as top rated movies and the latest movies, and each list has its own API endpoint. For this project, I focus on top rated movies.

To fetch the movies, I defined a base URL and a `params` dictionary holding the variable parts of the request: the API key received on registering with TMDB, and the page number. TMDB spreads the top rated movies across many pages (20 movies per page), so the page number is the variable used for iteration.
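Rather than hardcoding the number of pages, the page count can be read from the first response. This is a minimal sketch, assuming each page response is a dict with `results` and `total_pages` keys; `fetch_all_pages` is a hypothetical helper that accepts any page-fetching callable (such as `fetchMovies`), so it can be exercised offline. The 500-page default for `max_pages` is an assumption about TMDB's limit on the `page` parameter and should be checked against the API documentation.

```python
from typing import Callable, Dict, List

def fetch_all_pages(fetch_page: Callable[[int], Dict], max_pages: int = 500) -> List[List[Dict]]:
    """Fetch every page of a paginated list and return the 'results' of each page.

    Assumes each response has 'results' and 'total_pages' keys.
    max_pages is an assumed upper bound on the page parameter.
    """
    first = fetch_page(1)
    total_pages = min(first.get("total_pages", 1), max_pages)
    pages = [first["results"]]
    # Page 1 is already fetched; request the remaining pages in order
    for page in range(2, total_pages + 1):
        pages.append(fetch_page(page)["results"])
    return pages

# Offline usage with a fake fetcher that mimics the response shape
def fake_fetch(page):
    return {"page": page, "total_pages": 3, "results": [{"id": page * 10}]}

print(fetch_all_pages(fake_fetch))
```

With network access, the call would be `fetch_all_pages(fetchMovies)`.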


In [3]:
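def fetchMovies(pageNo):
    """Fetch one page of TMDB's top rated movies and return the parsed JSON."""
    base_url = "https://api.themoviedb.org/3/movie/top_rated"
    params = {
        'api_key': apiKey,
        'page': pageNo
    }
    response = requests.get(base_url, params=params, timeout=10)
    # Raise on HTTP errors rather than returning an error payload without 'results'
    response.raise_for_status()
    return response.json()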
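`fetchMovies` sends one plain request per page, so a transient server error or a rate-limit response interrupts the whole loop. One common pattern, sketched here as an assumption rather than something TMDB requires, is a `requests.Session` mounted with urllib3's `Retry`, which retries GET requests on HTTP 429 and 5xx responses with exponential backoff and honours a `Retry-After` header. The function name `make_session` and the retry settings are illustrative.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries: int = 5, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries GET requests on rate limiting and server errors."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # sleep time grows exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        respect_retry_after_header=True,  # wait as long as the server asks on 429/503
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# The session can stand in for requests.get inside fetchMovies, e.g.
# session.get("https://api.themoviedb.org/3/movie/top_rated", params=params, timeout=10)
```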

## Fetching Movie IDs for Top Rated Films

In [4]:
def get_movie_ids(pages):
    """Flatten a list of result pages into a single list of movie ids."""
    movie_id_list = []
    for page_results in pages:
        for movie in page_results:
            movie_id_list.append(movie['id'])
    return movie_id_list

In [5]:
pages = []

# Iterate through all pages of top rated movies (pages 1 to 488 inclusive)
for page in range(1, 489):
    pages.append(fetchMovies(page)['results'])

# Get the movie ids of all top rated movies
movie_ids = get_movie_ids(pages)

In [6]:
len(movie_ids)

9760
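Because TMDB's ranking can change while the pages are being requested, a movie could move between pages and be collected twice. Whether that happened here is not shown, so a cheap check is useful. The sketch below uses a hypothetical `unique_movie_ids` helper that flattens the pages and drops repeated ids while preserving rank order.

```python
from typing import Dict, List

def unique_movie_ids(pages: List[List[Dict]]) -> List[int]:
    """Flatten result pages into movie ids, keeping only the first occurrence of each id."""
    all_ids = [movie["id"] for page in pages for movie in page]
    # dict keys keep insertion order, so this removes duplicates without reordering
    return list(dict.fromkeys(all_ids))

# Toy pages in which one movie appears on two consecutive pages
toy_pages = [[{"id": 278}, {"id": 19404}], [{"id": 19404}, {"id": 238}]]
print(unique_movie_ids(toy_pages))
```

Comparing the length of the deduplicated list with `len(movie_ids)` shows whether any duplicates were collected.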

## Extracting information about these top rated films

In [7]:
movies_df = pd.DataFrame()

for movie_id in movie_ids:
    # Request the details of one movie; requests builds the query string from params
    query = "https://api.themoviedb.org/3/movie/" + str(movie_id)
    params = {'api_key': apiKey, 'language': 'en-US'}
    movie_response = requests.get(query, params=params, timeout=10)
    # A status code of 200 means the API call was successful; other movies are skipped
    if movie_response.status_code == 200:
        movie_results = movie_response.json()

        # Flatten the nested JSON (e.g. belongs_to_collection) into a one-row DataFrame
        movies = json_normalize(movie_results)

        # Append to the main DataFrame
        movies_df = pd.concat([movies_df, movies])
movies_df.head()

,adult,backdrop_path,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,...,status,tagline,title,video,vote_average,vote_count,belongs_to_collection.id,belongs_to_collection.name,belongs_to_collection.poster_path,belongs_to_collection.backdrop_path
0,False,/wPU78OPN4BYEgWYdXyg0phMee64.jpg,,25000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",,278,tt0111161,en,The Shawshank Redemption,...,Released,Fear can hold you prisoner. Hope can set you f...,The Shawshank Redemption,False,8.7,20990,,,,
0,False,/90ez6ArvpO8bvpyIngBuwXOqJm5.jpg,,13200000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,19404,tt0112870,hi,दिलवाले दुल्हनिया ले जायेंगे,...,Released,"Come Fall In love, All Over Again..",Dilwale Dulhania Le Jayenge,False,8.7,3519,,,,
0,False,/rSPw7tgCH9c6NqICZef4kZjFOQ5.jpg,,6000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",http://www.thegodfather.com/,238,tt0068646,en,The Godfather,...,Released,An offer you can't refuse.,The Godfather,False,8.7,15684,230.0,The Godfather Collection,/9Baumh5J9N1nJUYzNkm0xsgjpwY.jpg,/3WZTxpgscsmoUk81TuECXdFOD0R.jpg
0,False,/lp6SmwyNRspEYkkLXFEVuNlCw77.jpg,,0,"[{'id': 16, 'name': 'Animation'}, {'id': 14, '...",https://www.netflix.com/title/81193214,533514,tt8652818,ja,劇場版 ヴァイオレット・エヴァーガーデン,...,Released,,Violet Evergarden: The Movie,False,8.7,202,,,,
0,False,/v5CEt88iDsuoMaW1Q5Msu9UZdEt.jpg,,0,"[{'id': 10749, 'name': 'Romance'}, {'id': 18, ...",https://www.gaga.co.jp/intls/youreyestell/,730154,tt11051974,ja,きみの瞳が問いかけている,...,Released,,Your Eyes Tell,False,8.7,269,,,,
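The loop above calls `pd.concat` once per movie, which copies the growing DataFrame on every iteration and leaves every row with index 0, as the output shows. A minimal alternative sketch collects the JSON records in a list and normalizes them once. The names `build_movies_df` and `tmdb_details` are hypothetical; `fetch_details` is any callable that returns a movie's JSON dict or `None`, which keeps the function testable without network access.

```python
from typing import Callable, Iterable, Optional

import pandas as pd
import requests

def tmdb_details(session: requests.Session, api_key: str, movie_id: int) -> Optional[dict]:
    """Fetch one movie's details, returning None when the call is not successful."""
    resp = session.get(f"https://api.themoviedb.org/3/movie/{movie_id}",
                       params={"api_key": api_key, "language": "en-US"}, timeout=10)
    return resp.json() if resp.status_code == 200 else None

def build_movies_df(movie_ids: Iterable[int],
                    fetch_details: Callable[[int], Optional[dict]]) -> pd.DataFrame:
    """Collect details for each id, skip failures, and flatten all records in one call."""
    records = []
    for movie_id in movie_ids:
        details = fetch_details(movie_id)
        if details is not None:
            records.append(details)
    # A single json_normalize call gives a 0..n-1 index and flattens nested dicts
    return pd.json_normalize(records)

# Real usage (requires network access and the notebook's apiKey and movie_ids):
# with requests.Session() as s:
#     movies_df = build_movies_df(movie_ids, lambda i: tmdb_details(s, apiKey, i))
```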


In [8]:
movies_df.columns

Index(['adult', 'backdrop_path', 'belongs_to_collection', 'budget', 'genres',
       'homepage', 'id', 'imdb_id', 'original_language', 'original_title',
       'overview', 'popularity', 'poster_path', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'video',
       'vote_average', 'vote_count', 'belongs_to_collection.id',
       'belongs_to_collection.name', 'belongs_to_collection.poster_path',
       'belongs_to_collection.backdrop_path'],
      dtype='object')

## Saving the movies DataFrame to a CSV file

In [9]:
movies_df.to_csv("movies_df.csv",index=False)

In [10]:
# Re-read the file to check that it was stored correctly
movies_df = pd.read_csv("movies_df.csv")
movies_df.head()

,adult,backdrop_path,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,...,status,tagline,title,video,vote_average,vote_count,belongs_to_collection.id,belongs_to_collection.name,belongs_to_collection.poster_path,belongs_to_collection.backdrop_path
0,False,/wPU78OPN4BYEgWYdXyg0phMee64.jpg,,25000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",,278,tt0111161,en,The Shawshank Redemption,...,Released,Fear can hold you prisoner. Hope can set you f...,The Shawshank Redemption,False,8.7,20990,,,,
1,False,/90ez6ArvpO8bvpyIngBuwXOqJm5.jpg,,13200000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,19404,tt0112870,hi,दिलवाले दुल्हनिया ले जायेंगे,...,Released,"Come Fall In love, All Over Again..",Dilwale Dulhania Le Jayenge,False,8.7,3519,,,,
2,False,/rSPw7tgCH9c6NqICZef4kZjFOQ5.jpg,,6000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",http://www.thegodfather.com/,238,tt0068646,en,The Godfather,...,Released,An offer you can't refuse.,The Godfather,False,8.7,15684,230.0,The Godfather Collection,/9Baumh5J9N1nJUYzNkm0xsgjpwY.jpg,/3WZTxpgscsmoUk81TuECXdFOD0R.jpg
3,False,/lp6SmwyNRspEYkkLXFEVuNlCw77.jpg,,0,"[{'id': 16, 'name': 'Animation'}, {'id': 14, '...",https://www.netflix.com/title/81193214,533514,tt8652818,ja,劇場版 ヴァイオレット・エヴァーガーデン,...,Released,,Violet Evergarden: The Movie,False,8.7,202,,,,
4,False,/v5CEt88iDsuoMaW1Q5Msu9UZdEt.jpg,,0,"[{'id': 10749, 'name': 'Romance'}, {'id': 18, ...",https://www.gaga.co.jp/intls/youreyestell/,730154,tt11051974,ja,きみの瞳が問いかけている,...,Released,,Your Eyes Tell,False,8.7,269,,,,
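Writing to CSV stores list-valued columns such as `genres` as their Python string representation, so after `pd.read_csv` each cell is a string like `"[{'id': 18, 'name': 'Drama'}, ...]"` rather than a list. A minimal sketch, assuming every non-empty cell holds a valid Python literal, converts them back with `ast.literal_eval`. The helper `parse_literal` and the column `genre_names` are hypothetical, and the DataFrame is toy data that mirrors the notebook's round trip.

```python
import ast
import io

import pandas as pd

def parse_literal(cell):
    """Turn a stringified list/dict back into a Python object; leave missing values as-is."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)
    return cell

# Toy round trip: list-valued column -> CSV -> back
df = pd.DataFrame({"id": [278, 19404],
                   "genres": [[{"id": 18, "name": "Drama"}],
                              [{"id": 35, "name": "Comedy"}, {"id": 18, "name": "Drama"}]]})
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Recover the lists, then extract the genre names from each list of dicts
reloaded["genres"] = reloaded["genres"].apply(parse_literal)
reloaded["genre_names"] = reloaded["genres"].apply(lambda gs: [g["name"] for g in gs])
print(reloaded["genre_names"].tolist())
```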


In [11]:
movies_df.shape

(9760, 29)
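The details loop skips any movie whose request does not return status 200 without reporting it, so comparing the requested ids with the collected ones is a quick completeness check. A minimal sketch with a hypothetical `missing_ids` helper:

```python
from typing import Iterable, List

import pandas as pd

def missing_ids(requested: Iterable[int], collected: pd.Series) -> List[int]:
    """Return the requested movie ids that have no row in the collected data, in request order."""
    present = set(collected.tolist())
    return [movie_id for movie_id in requested if movie_id not in present]

# Toy example: 238 was requested but never collected
toy = pd.DataFrame({"id": [278, 19404]})
print(missing_ids([278, 19404, 238], toy["id"]))
```

In the notebook, the call would be `missing_ids(movie_ids, movies_df["id"])`; an empty list means every requested movie was stored.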

**I have thus collected data on 9760 top rated movies, which is used for further analysis.**