# Pipeline for the TMDB dataset
Steps that we will be following:
1. Get all the actor names from the oscar dataset
2. Get the actor ids from the TMDB API
3. Get the actor details from the TMDB API
4. Save the actor details in a dataframe
5. Save the dataframe as a pickle

Importing the necessary libraries:

In [None]:
import pandas as pd
import requests

In [23]:
bearer = 'bearer token'
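
Hard-coding the token in the notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token is stored in an environment variable; the variable name `TMDB_BEARER_TOKEN` and the helper `load_token` are hypothetical and not part of the original notebook:

```python
import os

def load_token(env_var="TMDB_BEARER_TOKEN", default=None):
    """Read a token from the environment (env_var is a hypothetical name)."""
    token = os.environ.get(env_var, default)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token

# Fall back to a placeholder so the sketch runs without a real token
bearer = load_token(default="bearer token")
```

With a real token exported in the shell, the `default` argument can be dropped so that a missing variable fails loudly instead of sending a placeholder.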

In [None]:
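# NOTE: "person_id" is a literal placeholder here, not a real TMDB ID,
# so the request below is expected to fail with "Invalid id"
url = "https://api.themoviedb.org/3/person/person_id?language=en-US"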

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {bearer}"
}

response = requests.get(url, headers=headers)

print(response.text)

{"success":false,"status_code":6,"status_message":"Invalid id: The pre-requisite id is invalid or not found."}
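
The request above fails because the URL contains the literal text `person_id` rather than a numeric TMDB ID. A minimal sketch of building the URL from a real ID; `build_person_url` and `BASE_URL` are hypothetical names, and 6193 is the ID used for the details lookup further down in this notebook:

```python
BASE_URL = "https://api.themoviedb.org/3"

def build_person_url(person_id, language="en-US"):
    """Build the person-details URL for a numeric TMDB person ID."""
    # int() also accepts float IDs such as 13789.0 coming from a DataFrame
    return f"{BASE_URL}/person/{int(person_id)}?language={language}"

url = build_person_url(6193)
print(url)
```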


In [None]:
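api_key = "api key"
actor_name = "Leonardo DiCaprio"  # Replace with the actor's name you want to search for

def get_actor_id(actor_name):
    """Return the TMDB ID of the first search hit for actor_name, or None."""
    url = "https://api.themoviedb.org/3/search/person"
    # Passing params lets requests URL-encode names containing spaces, '&', etc.
    params = {"api_key": api_key, "query": actor_name}
    response = requests.get(url, params=params)
    if response.status_code != 200:
        return None
    results = response.json().get('results')
    # Take the first (top-ranked) hit; None if the search returned nothing
    return results[0]['id'] if results else None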
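
Taking the first search hit can be wrong for common names or for non-person entries such as studios. The following is only a sketch of one possible refinement, not the method used in this notebook: it prefers an exact (case-insensitive) name match and otherwise falls back to the first hit. `pick_best_match` is a hypothetical helper, and the sample list is made-up data shaped like the `results` field used above.

```python
def pick_best_match(results, actor_name):
    """Return the id of the best hit in a search `results` list, or None."""
    if not results:
        return None
    wanted = actor_name.strip().casefold()
    for hit in results:
        # Prefer a hit whose name matches exactly, ignoring case and outer spaces
        if hit.get("name", "").strip().casefold() == wanted:
            return hit["id"]
    # Otherwise fall back to the first hit, as get_actor_id does
    return results[0]["id"]

# Made-up example data (IDs are illustrative only)
sample_results = [
    {"id": 1, "name": "George Barnes Jr."},
    {"id": 2, "name": "George Barnes"},
]
print(pick_best_match(sample_results, " George Barnes"))
```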


Importing oscar data:

In [7]:
oscar = pd.read_pickle('../pickles/oskar_df.pkl')

In [9]:
oscar.head()

Unnamed: 0,year_film,year_ceremony,ceremony,category,name,film,winner
0,1927,1928,1,ACTOR,Richard Barthelmess,The Noose,False
1,1927,1928,1,ACTOR,Emil Jannings,The Last Command,True
2,1927,1928,1,ACTRESS,Louise Dresser,A Ship Comes In,False
3,1927,1928,1,ACTRESS,Janet Gaynor,7th Heaven,True
4,1927,1928,1,ACTRESS,Gloria Swanson,Sadie Thompson,False


Getting all the actor names from the oscar dataset:

In [None]:
actor_names = oscar['name'].unique()

In [11]:
len(actor_names)

7040

Getting these actors' TMDB IDs:

In [None]:
actor_ids = []
for actor_name in actor_names:
    actor_id = get_actor_id(actor_name)
    actor_ids.append(actor_id)
    print(actor_name, actor_id)

Richard Barthelmess 13789
Emil Jannings 2895
Louise Dresser 146141
Janet Gaynor 9088
Gloria Swanson 8629
Rochus Gliese 9083
William Cameron Menzies 11489
Harry Oliver 1456432
George Barnes 1326395
Charles Rosher 41151
Karl Struss 9081
Lewis Milestone 2000
Ted Wilde 143558
Frank Borzage 14855
Herbert Brenon 930791
King Vidor 29962
Ralph Hammeras 1506044
Roy Pomeroy 1173151
Nugent Slaughter None
The Caddo Company None
Fox 19537
Paramount Famous Lasky None
Metro-Goldwyn-Mayer None
Alfred Cohn 14282
Anthony Coldeway 147010
Benjamin Glazer 89045
Lajos Biro 71789
Ben Hecht 4341
Gerald Duffy 1363919
Joseph Farnham 29965
George Marion, Jr. 101889
 Warner Bros. 4132340
 Charles Chaplin 116748
George Bancroft 13555
Warner Baxter 29999
Chester Morris 34754
Paul Muni 13352
Lewis Stone 29259
Ruth Chatterton 130385
Betty Compson 13556
Jeanne Eagels 1107369
Corinne Griffith 1054643
Bessie Love 29258
Mary Pickford 100047
Hans Dreier 8622
Cedric Gibbons 9062
Mitchell Leisen 108914
Clyde De Vinna 418836
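
The output above shows that some names carry leading whitespace (for example ` Warner Bros.`), so a name may be counted separately from its unpadded form. A minimal sketch of normalizing names before the lookup; `clean_names` is a hypothetical helper and the sample list is made up:

```python
def clean_names(names):
    """Strip outer whitespace and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(name.strip() for name in names))

# Made-up sample; two entries differ only by leading whitespace
sample = ["Charles Chaplin", " Charles Chaplin", "Paul Muni"]
print(clean_names(sample))
```

The helper accepts any iterable of strings, so it could be applied to `actor_names` directly.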

In [16]:
len(actor_ids)

7040
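
Looking up 7040 names one request at a time is slow, and an interruption loses all progress. A sketch of a loop that caches results and can pause between calls; `lookup_ids`, its `delay` parameter, and the stand-in lookup function are hypothetical, and in this notebook `get_actor_id` would be passed as `lookup`:

```python
import time

def lookup_ids(names, lookup, cache=None, delay=0.0):
    """Map each name to an ID via lookup(name), reusing cached results."""
    cache = {} if cache is None else cache
    ids = []
    for name in names:
        if name not in cache:
            cache[name] = lookup(name)  # one call per unseen name
            time.sleep(delay)           # optional pause between requests
        ids.append(cache[name])
    return ids

# Stand-in for get_actor_id so the sketch runs offline
calls = []
def fake_lookup(name):
    calls.append(name)
    return len(name)

print(lookup_ids(["Paul Muni", "Fox", "Paul Muni"], fake_lookup))
```

Passing the same `cache` dictionary to a second run would skip every name already resolved, which helps when resuming after a failure.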

Creating a new DataFrame with the actor names and their TMDB IDs, and saving it as a pickle:

In [None]:
actor_df = pd.DataFrame({'name': actor_names, 'id': actor_ids})
# sanity check: the ID stored for one row matches a fresh API lookup
actor_name_test = actor_df['name'][5]
actor_id_test = actor_df['id'][5]
assert actor_id_test == get_actor_id(actor_name_test)


In [20]:
actor_df.to_pickle('../pickles/actor_df.pkl')

Retrieving detailed information about an actor from TMDb by their actor ID:

Parameters:

    - actor_id (int): The ID of the actor in TMDb.
    - api_key (str): Your TMDb API read access token; it is sent as a Bearer token in the Authorization header.
    
Returns:

    - dict: A dictionary containing the actor's details or an error message if the request fails.

In [None]:

def get_actor_details(actor_id, api_key):
    url = f"https://api.themoviedb.org/3/person/{actor_id}"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        return response.json()  # Returns the actor details as a dictionary
    else:
        print("Failed to retrieve data:", response.status_code)
        return None


actor_id = 6193  # Replace with the actor's ID you want to look up
actor_data = get_actor_details(actor_id, bearer)

if actor_data:
    print("Actor Details:")
    print(actor_data)

Actor Details:
{'adult': False, 'also_known_as': ['Leo DiCaprio', 'Leonardo Wilhelm DiCaprio', 'Леонардо ДіКапріо', 'ലിയനാർഡോ ഡികാപ്രിയോ', 'Leonardo Di Caprio'], 'biography': "Leonardo Wilhelm DiCaprio (born November 11, 1974) is an American actor and film producer. Known for his work in biopics and period films, DiCaprio is the recipient of numerous accolades, including an Academy Award, a British Academy Film Award, and three Golden Globe Awards. As of 2019, his films have grossed over $7.2 billion worldwide, and he has been placed eight times in annual rankings of the world's highest-paid actors.\n\nBorn in Los Angeles, DiCaprio began his career in the late 1980s by appearing in television commercials. In the early 1990s, he had recurring roles in various television shows, such as the sitcom Parenthood, and had his first major film part as author Tobias Wolff in This Boy's Life (1993). At age 19, he received critical acclaim and his first Academy Award and Golden Globe Award nominat
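
The response is a large dictionary, and only some keys may be needed downstream. A small sketch of picking a subset of keys; `select_fields` is a hypothetical helper, the key names are taken from the response columns shown in this notebook, and the sample values are placeholders rather than real API output:

```python
def select_fields(details, keys=("name", "birthday", "place_of_birth", "imdb_id")):
    """Return only the requested keys; missing keys or a None response give None values."""
    details = details or {}
    return {key: details.get(key) for key in keys}

# Placeholder values shaped like a details response
sample = {"name": "Example Person", "birthday": "1900-01-01", "adult": False}
print(select_fields(sample))
```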

Dropping all rows where the ID is None:

In [None]:
actor_df = actor_df.dropna()

In [26]:
len(actor_df)

2808
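
`dropna()` removes rows but keeps their original index labels, which matters for any later operation that aligns on the index. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"], "id": [1.0, None, 3.0]})

dropped = df.dropna()
print(list(dropped.index))   # labels of the surviving rows

# reset_index(drop=True) renumbers the rows from 0
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))
```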

Adding actor details to our data frame:

In [None]:
actor_details = []
for actor_id in actor_df['id']:
    actor_data = get_actor_details(actor_id, bearer)
    actor_details.append(actor_data)


In [30]:
# actor_details is a list of dicts (one per actor) holding the API response fields
# add it to the df; every key becomes a new column
# reuse actor_df's index so the rows line up after dropna(), and add suffixes to overlapping columns
actor_df = actor_df.join(pd.DataFrame(actor_details, index=actor_df.index), lsuffix='_left', rsuffix='_right')
actor_df.head()

Unnamed: 0,name_left,id_left,adult,also_known_as,biography,birthday,deathday,gender,homepage,id_right,imdb_id,known_for_department,name_right,place_of_birth,popularity,profile_path
0,Richard Barthelmess,13789.0,False,"[Richard Semler Barthelmess, Richard S. Barthe...","From Wikipedia, the free encyclopedia. \n\nRic...",1895-05-08,1963-08-17,2.0,,13789.0,nm0001932,Acting,Richard Barthelmess,"New York City, New York, USA",1.079,/p7nmZeuQjHOsNFgqgEXBdh78OSC.jpg
1,Emil Jannings,2895.0,False,[Theodor Friedrich Emil Janenz],Emil Jannings (1884–1950) was a German actor. ...,1884-07-22,1950-01-02,2.0,,2895.0,nm0417837,Acting,Emil Jannings,"Rorschach, Switzerland",2.945,/yX7AFfYgYit6WlPshockLsP40LB.jpg
2,Louise Dresser,146141.0,False,"[Lulu Josephine Kerlin, Louise Josephine Kerlin]",,1878-10-03,1965-04-24,1.0,,146141.0,nm0237571,Acting,Louise Dresser,"Evansville, Indiana, USA",1.564,/i7NYAqqzY6uhI0IEcwPeYP9tNKt.jpg
3,Janet Gaynor,9088.0,False,"[Laura Augusta Gainor, Джанет Гейнор]","Janet Gaynor (October 6, 1906 – September 14, ...",1906-10-06,1984-09-14,1.0,,9088.0,nm0310980,Acting,Janet Gaynor,"Philadelphia, Pennsylvania, USA",2.692,/kwyClWei18GOssMPbrs4RL61izG.jpg
4,Gloria Swanson,8629.0,False,"[Gloria May Josephine Svensson, Gloria Mae, Гл...","Gloria Swanson (March 27, 1899 – April 4, 1983...",1899-03-27,1983-04-04,1.0,,8629.0,nm0841797,Acting,Gloria Swanson,"Chicago, Illinois, USA",3.541,/akmlp75ESHjtGOVtOCfJYxkX4eo.jpg
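
Joining by index relies on both frames having the same row order and labels. As an alternative sketch, not the approach used in this notebook, the details can be merged on the shared `id` column, which also skips failed lookups stored as `None`; the data below is made up:

```python
import pandas as pd

actors = pd.DataFrame({"name": ["A", "C"], "id": [10, 30]}, index=[0, 2])
details = [
    {"id": 10, "name": "A", "gender": 1},
    None,  # a failed request
    {"id": 30, "name": "C", "gender": 2},
]

# Keep only successful responses, then align rows by id instead of by position
details_df = pd.DataFrame([d for d in details if d is not None])
merged = actors.merge(details_df.drop(columns=["name"]), on="id", how="left")
print(merged)
```

Dropping the duplicate `name` column before the merge avoids the `_left`/`_right` suffixes altogether.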


Renaming the columns for clarity:

In [31]:
# rename name_left to name and id_left to id and drop name_right and id_right
actor_df = actor_df.rename(columns={'name_left': 'name', 'id_left': 'id'})
actor_df = actor_df.drop(columns=['name_right', 'id_right'])
actor_df.head()

Unnamed: 0,name,id,adult,also_known_as,biography,birthday,deathday,gender,homepage,imdb_id,known_for_department,place_of_birth,popularity,profile_path
0,Richard Barthelmess,13789.0,False,"[Richard Semler Barthelmess, Richard S. Barthe...","From Wikipedia, the free encyclopedia. \n\nRic...",1895-05-08,1963-08-17,2.0,,nm0001932,Acting,"New York City, New York, USA",1.079,/p7nmZeuQjHOsNFgqgEXBdh78OSC.jpg
1,Emil Jannings,2895.0,False,[Theodor Friedrich Emil Janenz],Emil Jannings (1884–1950) was a German actor. ...,1884-07-22,1950-01-02,2.0,,nm0417837,Acting,"Rorschach, Switzerland",2.945,/yX7AFfYgYit6WlPshockLsP40LB.jpg
2,Louise Dresser,146141.0,False,"[Lulu Josephine Kerlin, Louise Josephine Kerlin]",,1878-10-03,1965-04-24,1.0,,nm0237571,Acting,"Evansville, Indiana, USA",1.564,/i7NYAqqzY6uhI0IEcwPeYP9tNKt.jpg
3,Janet Gaynor,9088.0,False,"[Laura Augusta Gainor, Джанет Гейнор]","Janet Gaynor (October 6, 1906 – September 14, ...",1906-10-06,1984-09-14,1.0,,nm0310980,Acting,"Philadelphia, Pennsylvania, USA",2.692,/kwyClWei18GOssMPbrs4RL61izG.jpg
4,Gloria Swanson,8629.0,False,"[Gloria May Josephine Svensson, Gloria Mae, Гл...","Gloria Swanson (March 27, 1899 – April 4, 1983...",1899-03-27,1983-04-04,1.0,,nm0841797,Acting,"Chicago, Illinois, USA",3.541,/akmlp75ESHjtGOVtOCfJYxkX4eo.jpg


Let us save the data to a pickle for further processing and cleaning in the sections and results notebook:

In [32]:
actor_df.to_pickle('../pickles/actor_df.pkl')
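
A quick round-trip check can confirm that a pickle was written correctly. A minimal sketch using a temporary directory and made-up data instead of `../pickles/actor_df.pkl`:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "id": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "actor_df.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises an AssertionError if the restored frame differs from the original
pd.testing.assert_frame_equal(df, restored)
print(restored.shape)
```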