## Description

#### Purpose: To merge and one-hot encode MPAA ratings from TMDB and Box Office Mojo.

#### Input: `3.3.9a_Merged_Data_MPAA.csv`

#### Output: `3.3.9b_Merged_Data_MPAA_Encoded.csv`

This notebook consolidates the MPAA ratings from TMDB and Box Office Mojo into a single column and then one-hot encodes the result. When both sources provide a rating, the Box Office Mojo value is used. When only one source provides a rating, that value is used. Titles with no rating from either source are labeled `NR`.
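The consolidation rule can be sketched on a small toy table. This is a minimal illustration, not the notebook's actual data: the titles and ratings below are hypothetical, while the column names `MPAA` (Box Office Mojo) and `MPAA_TMDB` (TMDB) follow the ones used in the cells that come next.

```python
import pandas as pd

# Toy data (hypothetical titles and ratings, for illustration only).
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "MPAA": ["PG-13", None, "Not Rated", None],  # Box Office Mojo
    "MPAA_TMDB": ["PG", "R ", "R", None],        # TMDB
})

# Normalize label variants so both sources use the same strings.
toy["MPAA"] = toy["MPAA"].replace({"Unrated": "NR", "Not Rated": "NR"})
toy["MPAA_TMDB"] = toy["MPAA_TMDB"].replace({"R ": "R"})

# Prefer Box Office Mojo, fall back to TMDB, then default to "NR".
toy["Merged_MPAA"] = toy["MPAA"].fillna(toy["MPAA_TMDB"]).fillna("NR")

# One indicator column per distinct merged rating.
encoded = pd.get_dummies(toy["Merged_MPAA"])
result = pd.concat([toy, encoded], axis=1)
print(result)
```

Title A keeps the Box Office Mojo rating even though TMDB disagrees, B falls back to the cleaned TMDB rating, C keeps the normalized Box Office Mojo label `NR` rather than the TMDB rating, and D defaults to `NR` because neither source has a value.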

In [None]:
from tmdbv3api import TMDb
from tmdbv3api import Movie
from tmdbv3api.exceptions import TMDbException
import random
import pandas as pd
import csv
import numpy as np
from math import exp
import ast
tmdb = TMDb()
tmdb.api_key = ' '  # API key redacted

In [None]:
# Path to the input CSV file
csv_file_path = '/Outputs/3.3.9a_Merged_Data_MPAA.csv'
df = pd.read_csv(csv_file_path)

In [None]:
# Inspect how each source labels its ratings so the labels can be normalized before the merge
tmdb_ratings = df['MPAA_TMDB'].unique()
bom_ratings = df['MPAA'].unique()
print(tmdb_ratings)
print(bom_ratings)

In [None]:
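# Replace 'R ' with 'R' to clean the MPAA ratings from TMDB before the merge.
# Assign the result back instead of calling replace(..., inplace=True) on the
# selected column, which may not update df under pandas copy-on-write.
df['MPAA_TMDB'] = df['MPAA_TMDB'].replace({'R ': 'R'})
# Replace 'Unrated' or 'Not Rated' with 'NR' in the Box Office Mojo ratings before the merge
df['MPAA'] = df['MPAA'].replace({'Unrated': 'NR', 'Not Rated': 'NR'})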

In [None]:
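# Merge the two columns, preferring Box Office Mojo ('MPAA') and falling back
# to TMDB ('MPAA_TMDB'); rows with no rating from either source become 'NR'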
df['Merged_MPAA'] = df['MPAA'].fillna(df['MPAA_TMDB'])
df['Merged_MPAA'] = df['Merged_MPAA'].fillna('NR')

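# Identify the unique rating strings (kept for inspection)
ratings = df['Merged_MPAA'].unique()

# One-hot encode the merged ratings: one indicator column per unique rating.
# The columns= argument is dropped because get_dummies only uses it when
# given a DataFrame; for a Series it has no effect.
one_hot_encoded = pd.get_dummies(df['Merged_MPAA'])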

#Join the one-hot encoded columns with the original DataFrame
df_output = pd.concat([df, one_hot_encoded], axis=1)

In [None]:
df_output.to_csv('/Outputs/3.3.9b_Merged_Data_MPAA_Encoded.csv', index=False)