## TMDB API (Practice):
### Project Planning
As discussed in the previous lesson, for the next part of your project, you will extract financial and certification data from TMDB's API for your IMDB data set. You will use an OUTER and INNER loop: a loop within a loop!

The OUTER loop will loop through the start years included in the IMDB data, filter the title basics data for the selected year, and save the list of movie ids from that year to retrieve in the inner loop.

The INNER loop loops through every movie id from the selected year, extracts its results from the TMDB API, and appends them to a JSON file.
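
The two-loop structure described above can be sketched without touching the API. This is only an illustration under stated assumptions: `fetch_movie` is a hypothetical stand-in for the real TMDB call, and `ids_by_year` is a made-up dictionary used in place of the filtered title basics data.

```python
# Toy stand-in for the title basics data: startYear -> list of IMDB ids.
# (Made-up values; the real project filters a DataFrame instead.)
ids_by_year = {
    2010: ["tt0000001", "tt0000002"],
    2011: ["tt0000003"],
}

def fetch_movie(movie_id):
    """Hypothetical placeholder for the API call; returns a fake record."""
    return {"imdb_id": movie_id}

results_by_year = {}

# OUTER loop: one pass per start year
for year, year_movie_ids in ids_by_year.items():
    year_results = []
    # INNER loop: one (pretend) API call per movie id from that year
    for movie_id in year_movie_ids:
        year_results.append(fetch_movie(movie_id))
    results_by_year[year] = year_results

# Number of records collected per year
print({year: len(records) for year, records in results_by_year.items()})
```

In the real project, the inner loop body is replaced by the API call and the write to the JSON file.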

### For this practice assignment
You will be practicing the inner loop of API calls for a single year's list of movies from your IMDB title basics data. Specifically, you will extract the API results for every movie with a startYear of 2010.

* Read the instructions below, including the examples in the "Getting Started" section, before starting your work.
* Create a new notebook in your project repository called "Practicing TMDB API calls."

#### Preparation BEFORE the Loop:

* Designate a folder to save your information.
* Define the custom functions you will use for your API calls.
* Load your cleaned title basics data from Part 1 of Project 2 (or query your title_basics table in your MySQL database).
* Define the year you wish to retrieve (2010) and create an empty list for appending error messages.

#### Prepare the DataFrame and JSON File

* Use the selected year to define filenames and filter the data

1. Define a JSON_FILE filename to save the results in progress.
2. Check if the file exists.
    * If it does not exist, create the JSON file using "with open" so that it contains only a placeholder record with the key "imdb_id".
    * If it exists, print a message saying that it already exists.

Now that the JSON file for the results in progress exists:

* Filter the IMDB title basics data for the selected year and save the movie IDs from that year as "movie_ids".
* Check the JSON file for previously downloaded movie IDs and filter out the movie IDs that already exist in the JSON file (to prevent duplicate API calls) by:
    * Loading the contents of the JSON file with pd.read_json.
    * Comparing the IDs that were in the JSON file to your saved movie_ids.
    * Saving the final list of "movie_ids_to_get" by filtering out movies that already exist in the JSON file.

### Perform the Loop of API Calls

Note: you have already written a function to combine the certification with the rest of the .info() from the TMDB API results in the Intro to TMDB API lesson.

Create a loop to make API calls for each movie ID in the specified YEAR. Include a progress bar using tqdm_notebook.

For each movie id:

* Retrieve the dictionary of results for the current ID from the API
* Append the new results to the list from the JSON file
* Save the updated JSON file back to the disk

### Save the Results to Compressed .csv

* After the loop, save the final results for the year as a csv.gz file with the year in the filename.

Note: at this point, you'll have completed the inner loop that you will need for the next part of the project.

### Install/Import Required Packages

In [2]:
# Install tmdbsimple (only need to run once)
#!pip install tmdbsimple

Collecting tmdbsimple
  Obtaining dependency information for tmdbsimple from https://files.pythonhosted.org/packages/6c/dd/ade05d202db728b23e54aa0959622d090776023917e7308c1b2469a07b76/tmdbsimple-2.9.1-py3-none-any.whl.metadata
  Downloading tmdbsimple-2.9.1-py3-none-any.whl.metadata (6.9 kB)
Downloading tmdbsimple-2.9.1-py3-none-any.whl (38 kB)
Installing collected packages: tmdbsimple
Successfully installed tmdbsimple-2.9.1


In [3]:
# Import packages
import os, time, json
import tmdbsimple as tmdb 
import pandas as pd
from tqdm.notebook import tqdm_notebook

### Load TMDB API Key & Add to tmdbsimple

In [8]:
# Load API Credentials
with open(r"C:\Users\bandi\.secret\tmdb_api.json") as f:
    login = json.load(f)
## Display the keys of the loaded dict
login.keys()

dict_keys(['api-key', 'api-read-access-token'])

In [9]:
# Set the API_KEY for tmdbsimple (already imported above as tmdb)
tmdb.API_KEY = login['api-key']

### Designate a folder

In [4]:
# Create the folder for saving files (if it doesn't exist)
FOLDER = "Data/"
os.makedirs(FOLDER, exist_ok=True)

In [23]:
# Show the list of files included in the folder
os.listdir(FOLDER)

['final_tmdb_data_2010.csv.gz',
 'title_basics_filtered.csv',
 'tmdb_api_results_2010.json']

### Define custom functions

In [10]:
def get_movie_with_rating(movie_id):
    # Get the movie object for the current id
    movie = tmdb.Movies(movie_id)
    
    # Save the .info() and .releases() dictionaries
    movie_info = movie.info()
    releases = movie.releases()
    
    # Loop through countries in releases
    for c in releases['countries']:
        # If the country abbreviation is US
        if c['iso_3166_1'] == 'US':
            # Save a "certification" key in the info dict with the certification
            movie_info['certification'] = c['certification']
    return movie_info

def write_json(new_data, filename): 
    """Appends a list of records (new_data) to a json file (filename). 
    Adapted from: https://www.geeksforgeeks.org/append-to-json-file-using-python/"""  
    
    with open(filename,'r+') as file:
        # First, load the existing data (a list of records).
        file_data = json.load(file)
        # Extend if both are lists; otherwise append the single record
        if isinstance(new_data, list) and isinstance(file_data, list):
            file_data.extend(new_data)
        else:
            file_data.append(new_data)
        # Move back to the start of the file before rewriting it
        file.seek(0)
        # Write the updated data back as JSON
        json.dump(file_data, file)
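
The read-modify-write pattern used for the results file can be tried on a throwaway file before running it against real results. The sketch below is not the project's function: `append_record` is a hypothetical helper, the record is made up, and the `file.truncate()` call is an added guard for the case where the rewritten content is shorter than what was in the file before.

```python
import json
import os
import tempfile

def append_record(record, filename):
    """Hypothetical helper: load a JSON list, append one record, rewrite the file."""
    with open(filename, "r+") as file:
        data = json.load(file)
        data.append(record)
        file.seek(0)           # go back to the start before rewriting
        json.dump(data, file)
        file.truncate()        # drop any leftover bytes past the new end

# Seed a temporary file the same way the results-in-progress file is seeded
demo_file = os.path.join(tempfile.mkdtemp(), "demo_results.json")
with open(demo_file, "w") as f:
    json.dump([{"imdb_id": 0}], f)

# Append one made-up record
append_record({"imdb_id": "tt0000001", "certification": "PG"}, demo_file)

with open(demo_file) as f:
    records = json.load(f)
print(len(records))  # placeholder record + the appended record
```

Because the whole file is reloaded and rewritten on every call, each append gets slower as the file grows. That is a reasonable trade for being able to resume after an interruption.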

#### Confirm Your API Function works!

In [11]:
# Testing function on "The Avengers" (id="tt0848228"). What is its certification?
test1 = get_movie_with_rating("tt0848228") #put your function name here
test1

{'adult': False,
 'backdrop_path': '/9BBTo63ANSmhC4e6r62OJFuK2GL.jpg',
 'belongs_to_collection': {'id': 86311,
  'name': 'The Avengers Collection',
  'poster_path': '/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg',
  'backdrop_path': '/zuW6fOiusv4X9nnW3paHGfXcSll.jpg'},
 'budget': 220000000,
 'genres': [{'id': 878, 'name': 'Science Fiction'},
  {'id': 28, 'name': 'Action'},
  {'id': 12, 'name': 'Adventure'}],
 'homepage': 'https://www.marvel.com/movies/the-avengers',
 'id': 24428,
 'imdb_id': 'tt0848228',
 'original_language': 'en',
 'original_title': 'The Avengers',
 'overview': 'When an unexpected enemy emerges and threatens global safety and security, Nick Fury, director of the international peacekeeping agency known as S.H.I.E.L.D., finds himself in need of a team to pull the world back from the brink of disaster. Spanning the globe, a daring recruitment effort begins!',
 'popularity': 142.496,
 'poster_path': '/RYMX2wcKCBAr24UyPD7xwmjaTn.jpg',
 'production_companies': [{'id': 420,
   'logo_path

In [12]:
# Testing function on "The Notebook" (id="tt0332280"). What is its certification?
test2 = get_movie_with_rating("tt0332280") #put your function name here
test2

{'adult': False,
 'backdrop_path': '/qom1SZSENdmHFNZBXbtJAU0WTlC.jpg',
 'belongs_to_collection': None,
 'budget': 29000000,
 'genres': [{'id': 10749, 'name': 'Romance'}, {'id': 18, 'name': 'Drama'}],
 'homepage': 'http://www.newline.com/properties/notebookthe.html',
 'id': 11036,
 'imdb_id': 'tt0332280',
 'original_language': 'en',
 'original_title': 'The Notebook',
 'overview': "An epic love story centered around an older man who reads aloud to a woman with Alzheimer's. From a faded notebook, the old man's words bring to life the story about a couple who is separated by World War II, and is then passionately reunited, seven years later, after they have taken different paths.",
 'popularity': 75.114,
 'poster_path': '/rNzQyW4f8B8cQeg7Dgj3n6eT5k9.jpg',
 'production_companies': [{'id': 12,
   'logo_path': '/iaYpEp3LQmb8AfAtmTvpqd4149c.png',
   'name': 'New Line Cinema',
   'origin_country': 'US'},
  {'id': 1565, 'logo_path': None, 'name': 'Avery Pix', 'origin_country': 'US'},
  {'id': 26
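
One detail worth checking when answering the certification question: get_movie_with_rating only adds a "certification" key when a US entry appears in the releases, so a movie with no US release comes back without that key. A minimal sketch of the lookup on its own, assuming a hypothetical helper named `extract_us_certification` and made-up data shaped like the `releases['countries']` list used in the function:

```python
def extract_us_certification(releases):
    """Hypothetical helper: return the US certification, or None if absent."""
    for country in releases.get("countries", []):
        if country.get("iso_3166_1") == "US":
            return country.get("certification")
    return None

# Made-up data shaped like the releases dictionary
with_us = {"countries": [{"iso_3166_1": "GB", "certification": "12A"},
                         {"iso_3166_1": "US", "certification": "PG-13"}]}
without_us = {"countries": [{"iso_3166_1": "FR", "certification": "U"}]}

print(extract_us_certification(with_us))     # PG-13
print(extract_us_certification(without_us))  # None
```

Unlike the loop in get_movie_with_rating, which keeps the last US entry it sees, this sketch returns the first one.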

### Load the Cleaned Title Basics data (from part 1)

In [14]:
# Load in the dataframe from project part 1 as basics:
basics = pd.read_csv('Data/title_basics_filtered.csv')
basics

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0035423,movie,Kate & Leopold,Kate & Leopold,0,2001.0,,118.0,"Comedy,Fantasy,Romance"
1,tt0062336,movie,The Tango of the Widower and Its Distorting Mi...,El tango del viudo y su espejo deformante,0,2020.0,,70.0,Drama
2,tt0069049,movie,The Other Side of the Wind,The Other Side of the Wind,0,2018.0,,122.0,Drama
3,tt0088751,movie,The Naked Monster,The Naked Monster,0,2005.0,,100.0,"Comedy,Horror,Sci-Fi"
4,tt0096056,movie,Crime and Punishment,Crime and Punishment,0,2002.0,,126.0,Drama
...,...,...,...,...,...,...,...,...,...
104631,tt9915872,movie,The Last White Witch,My Girlfriend is a Wizard,0,2019.0,,97.0,"Comedy,Drama,Fantasy"
104632,tt9916170,movie,The Rehearsal,O Ensaio,0,2019.0,,51.0,Drama
104633,tt9916190,movie,Safeguard,Safeguard,0,2020.0,,95.0,"Action,Adventure,Thriller"
104634,tt9916362,movie,Coven,Akelarre,0,2020.0,,92.0,"Drama,History"


### Define a variable with the year to Extract from the API

In [15]:
# Set the year to filter for
YEAR = 2010

# Create an empty list for saving errors
errors = [ ]

### Prepare the DataFrame and JSON File

#### Select a JSON_FILE filename to save the results in progress

In [16]:
# Define the JSON file to store results for the year
JSON_FILE = f'{FOLDER}tmdb_api_results_{YEAR}.json'

# Check if the JSON file exists
file_exists = os.path.isfile(JSON_FILE)

# If it does not exist: create it
if not file_exists:
    # Print a message indicating the file is being created
    print(f"Creating {JSON_FILE} for API results for year={YEAR}.")
    
    # Save a list holding one placeholder record with just "imdb_id" to the new JSON file.
    with open(JSON_FILE, 'w') as f:
        json.dump([{'imdb_id': 0}], f)

# If it exists, print a message
else:
    print(f'The file {JSON_FILE} already exists.')

Creating Data/tmdb_api_results_2010.json for API results for year=2010.


#### Filter for the selected year and save the movie ids

In [17]:
# Filtering for movies from selected startYear
df = basics.loc[ basics['startYear']==YEAR].copy()
# Save the movie ids (a pandas Series of tconst values)
movie_ids = df['tconst']
movie_ids.head()

1351    tt0230212
4575    tt0312305
5105    tt0326592
5126    tt0326965
5350    tt0331312
Name: tconst, dtype: object

#### Check previous results and create the final list of movie_ids_to_get

In [18]:
# Load existing data from json into a dataframe called "previous_df"
previous_df = pd.read_json(JSON_FILE)
previous_df

Unnamed: 0,imdb_id
0,0


In [19]:
# filter out any ids that are already in the JSON_FILE
movie_ids_to_get = movie_ids[~movie_ids.isin(previous_df['imdb_id'])]
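
On a first run the JSON file holds only the placeholder record, so nothing is filtered out. To see what the filter does on a later run, here is a small sketch with made-up ids. The `sample_` names are illustrative only, chosen so they do not overwrite the notebook's own variables.

```python
import pandas as pd

# Made-up ids standing in for the year's movie ids
sample_ids = pd.Series(["tt0000001", "tt0000002", "tt0000003"], name="tconst")

# Pretend the JSON file already holds the placeholder plus one saved result
sample_previous = pd.DataFrame({"imdb_id": [0, "tt0000002"]})

# Keep only the ids that are NOT already saved
sample_ids_to_get = sample_ids[~sample_ids.isin(sample_previous["imdb_id"])]
print(sample_ids_to_get.tolist())  # ['tt0000001', 'tt0000003']
```

The placeholder value 0 never matches a real id, so it has no effect on the filter.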

### Start Loop Through Movie IDs

In [20]:
# Loop through movie_ids_to_get with a tqdm progress bar
for movie_id in tqdm_notebook(movie_ids_to_get, f"Movies from {YEAR}"):

    # Attempt to retrieve the data for the movie id
    try:
        temp = get_movie_with_rating(movie_id)  # This uses your pre-made function
        # Append/extend results to existing file using a pre-made function
        write_json(temp, JSON_FILE)
        # Short 20 ms sleep to prevent overwhelming the server
        time.sleep(0.02)

    # If it fails, save the movie id and the exception to the errors list
    except Exception as e:
        errors.append([movie_id, e])

Movies from 2010:   0%|          | 0/4354 [00:00<?, ?it/s]

### After the Loop

In [21]:
print(f"- Total errors: {len(errors)}")

- Total errors: 1484
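
The count alone does not say why the calls failed. One way to get a quick overview is to group the collected exceptions by type. The sketch below uses made-up errors shaped like the [movie_id, exception] pairs appended in the loop; the exception types shown are placeholders, not what the API actually raised.

```python
from collections import Counter

# Made-up errors shaped like the [movie_id, exception] pairs from the loop
sample_errors = [
    ["tt0000001", ValueError("not found")],
    ["tt0000002", ValueError("not found")],
    ["tt0000003", KeyError("countries")],
]

# Count errors by exception class name
error_counts = Counter(type(exc).__name__ for _, exc in sample_errors)
print(error_counts.most_common())  # [('ValueError', 2), ('KeyError', 1)]
```

Running the same two lines on the real errors list shows whether the failures share one cause or several.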


### Save the year's results as csv.gz file

In [22]:
# Save the final results to a csv.gz file
final_year_df = pd.read_json(JSON_FILE)

csv_fname = f"{FOLDER}final_tmdb_data_{YEAR}.csv.gz"
final_year_df.to_csv(csv_fname, compression="gzip", index=False)
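
Note that the JSON file still contains the placeholder record (imdb_id of 0), so it ends up as a row in the saved file unless it is dropped. A sketch of one way to drop it and confirm the compressed file reads back, using made-up data and a temporary path; the `startswith("tt")` check is an assumption about how to recognize real IMDB ids.

```python
import os
import tempfile
import pandas as pd

# Made-up results: the placeholder record plus one real-looking record
sample_df = pd.DataFrame({"imdb_id": [0, "tt0000001"], "budget": [0, 1000]})

# Keep only rows whose id looks like an IMDB id (drops the placeholder)
cleaned = sample_df[sample_df["imdb_id"].astype(str).str.startswith("tt")]

# Write to a temporary csv.gz, then read it back
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
cleaned.to_csv(tmp_path, compression="gzip", index=False)
reloaded = pd.read_csv(tmp_path)  # compression is inferred from the .gz extension
print(reloaded.shape)  # (1, 2)
```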