# Download the datasets

The TMDB 5000 Movie Dataset is widely used in data science and machine learning for analyzing and predicting movie-related information. It covers about 4,800 movies, with fields such as title, release date, budget, revenue, genres, and ratings. Typical uses include movie recommendation systems, sentiment analysis, and box office revenue prediction.

To access the TMDB 5000 Movie Dataset, you can download it from Kaggle, a popular online platform for data science and machine learning competitions. The dataset can be found at the following link: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata?resource=download. 

In [202]:
!ls tmdb-5000

tmdb_5000_credits.csv  tmdb_5000_movies.csv


# Import the necessary libraries

In [7]:
import pandas as pd
import numpy as np

# Load the datasets

In [204]:
credits = pd.read_csv("tmdb-5000/tmdb_5000_credits.csv")
movies = pd.read_csv("tmdb-5000/tmdb_5000_movies.csv")

In [205]:
credits.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [206]:
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


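# Remove all rows with missing values from both datasets.

Note that `dropna()` with no arguments drops a row if any of its columns is missing, so only 1493 movies remain after this step (see `Length: 1493` in the outputs below).
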
In [208]:
movies = movies.dropna()
credits = credits.dropna()
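
The effect of a bare `dropna()` can be illustrated on a small made-up table. This is only a sketch: the toy rows are hypothetical, and restricting the check with `subset=` to the rating columns is an alternative, not what the notebook does. All results in the cells that follow come from the plain `dropna()` call above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movies table; the values are made up for illustration.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [7.2, 6.9, np.nan, 5.5],
    "vote_count": [100, 50, 10, 3],
    "homepage": ["http://example.com/a", None, None, None],
})

# dropna() with no arguments drops every row with a missing value in any column.
dropped_any = toy.dropna()

# Checking only the columns the rating computation needs keeps more rows.
dropped_subset = toy.dropna(subset=["vote_average", "vote_count"])

print(len(toy), len(dropped_any), len(dropped_subset))
```

Here a missing `homepage` alone is enough for the bare call to discard a row, even when its rating data is complete.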

# Weighted Rating (WR)

The Weighted Rating (WR) is a formula commonly used in recommendation systems to rank items, such as movies or products, by a weighted average of their ratings. It combines the item's own average rating (R) with the mean rating across all items (C), weighted by the number of votes the item has received (v) relative to a minimum vote threshold (m).

In this project, I will use the WR formula to compute a weighted rating for each movie and use it in the recommendation system.

$$WR = \frac{v}{v + m} \cdot R + \frac{m}{v + m} \cdot C$$

- WR: Weighted Rating, which is the final calculated value.
- v: Number of votes (or ratings) received by the item.
- m: Minimum votes (or ratings) required for the item to be considered.
- R: Average rating of the item.
- C: Mean vote (or rating) across all items.
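
As a quick worked example of the formula, the sketch below plugs in numbers that appear in this notebook's outputs: the values of C and m computed in later cells, Avatar's rating and vote count, and a movie with a single vote. The helper name `weighted_rating` is introduced here for illustration only.

```python
def weighted_rating(R, v, m, C):
    """Blend an item's average rating R with the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Values taken from this notebook's outputs:
# C is the mean of vote_average, m is the 95th percentile of vote_count.
C = 6.272873409243134
m = 4968.199999999993

# Avatar: R = 7.2 with 11800 votes.
avatar = weighted_rating(R=7.2, v=11800, m=m, C=C)

# A movie with a single vote (R = 6.0) ends up very close to C.
one_vote = weighted_rating(R=6.0, v=1, m=m, C=C)

print(round(avatar, 4), round(one_vote, 4))
```

The more votes an item has relative to m, the closer its WR stays to its own average R. With few votes, the score is dominated by C.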

# <font color='red'>Now, let's determine the values of each variable that are required for the computation.</font>

# We can create a new dataframe to store the movie information that we want to compute.

In [222]:
movies_query = pd.DataFrame()

# Average rating of the item.

In [223]:
movies_query['R'] = movies.vote_average
movies_query.R

0       7.2
1       6.9
2       6.3
3       7.6
4       6.1
       ... 
4773    7.4
4781    6.0
4791    2.0
4796    6.9
4801    5.7
Name: R, Length: 1493, dtype: float64

# Mean vote (or rating) across all items.

In [224]:
C = movies_query.R.mean()
print(C, "out of 10")  # TMDB vote averages are on a 0-10 scale

6.272873409243134 out of 10


# Number of votes (or ratings) received by the item.

In [225]:
movies_query['v'] = movies.vote_count
movies_query.v

0       11800
1        4500
2        4466
3        9106
4        2124
        ...  
4773      755
4781        1
4791        1
4796      658
4801        7
Name: v, Length: 1493, dtype: int64

# Minimum votes (or ratings) required for the item to be considered.

In [226]:
PERCENTAGE = .95 # set to 95%
m = movies_query.v.quantile(PERCENTAGE)
print(f"{PERCENTAGE * 100}%", m)

95.0% 4968.199999999993
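
To see how the percentile choice sets the threshold, here is a small sketch on hypothetical vote counts (the integers 0 to 100, chosen only so the percentiles are easy to read). It uses the same `Series.quantile` call as the cell above.

```python
import pandas as pd

# Hypothetical vote counts 0, 1, ..., 100.
votes = pd.Series(range(101))

# Series.quantile takes a fraction in [0, 1]; 0.95 is the 95th percentile.
m_95 = votes.quantile(0.95)
m_80 = votes.quantile(0.80)

# Fraction of items whose vote count reaches each threshold.
share_95 = (votes >= m_95).mean()
share_80 = (votes >= m_80).mean()

print(m_95, m_80)
print(share_95, share_80)
```

A higher percentile gives a larger m, so fewer items clear the threshold and every score is pulled more strongly toward C.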


# Calculate the weighted average ratings using the provided formula.

In [227]:
movies_query.head()

Unnamed: 0,R,v
0,7.2,11800
1,6.9,4500
2,6.3,4466
3,7.6,9106
4,6.1,2124


In [228]:
def WR(query):
    # Weighted rating for one row; m and C are the globals computed above.
    R = query.R
    v = query.v
    return (v / (v + m)) * R + (m / (v + m)) * C

In [229]:
movies_query["WR"] = movies_query.apply(WR, axis=1)
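
The same computation can also be written column-wise instead of row by row. The sketch below is an alternative, not the notebook's method. It rebuilds a three-row toy frame from the values shown above and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column names as movies_query; values copied
# from the first rows shown above.
toy = pd.DataFrame({"R": [7.2, 6.9, 6.3], "v": [11800, 4500, 4466]})
C = 6.272873409243134
m = 4968.199999999993

# Row-wise version, as in the notebook.
def WR(query):
    R = query.R
    v = query.v
    return (v / (v + m)) * R + (m / (v + m)) * C

row_wise = toy.apply(WR, axis=1)

# Column-wise version: the same arithmetic applied to whole columns at once.
weight = toy["v"] / (toy["v"] + m)
vectorized = weight * toy["R"] + (1 - weight) * C

print(np.allclose(row_wise, vectorized))
```

On a frame this small the difference is negligible. Column arithmetic mainly pays off on larger tables, where `apply(..., axis=1)` calls the Python function once per row.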

In [230]:
movies_query.head()

Unnamed: 0,R,v,WR
0,7.2,11800,6.925304
1,6.9,4500,6.570931
2,6.3,4466,6.285715
3,7.6,9106,7.131524
4,6.1,2124,6.221101


# Now, let's include the movie title in this dataframe

In [233]:
movies_query['movie_id'] = movies.id
movies_query['original_title'] = movies.original_title

In [234]:
movies_query.head()

Unnamed: 0,R,v,WR,movie_id,original_title
0,7.2,11800,6.925304,19995,Avatar
1,6.9,4500,6.570931,285,Pirates of the Caribbean: At World's End
2,6.3,4466,6.285715,206647,Spectre
3,7.6,9106,7.131524,49026,The Dark Knight Rises
4,6.1,2124,6.221101,49529,John Carter


# We can now sort the values to identify the top scores, and depending on our requirements, we can select the top 10 or top 20 recommendations for our recommendation system.

In [235]:
movies_query = movies_query.sort_values("WR", ascending=False)
movies_query.reset_index(drop=True, inplace=True)

In [236]:
movies_query.head(10)

Unnamed: 0,R,v,WR,movie_id,original_title
0,8.2,12002,7.635814,155,The Dark Knight
1,8.1,13752,7.615094,27205,Inception
2,8.3,9413,7.599699,550,Fight Club
3,8.1,10867,7.52675,157336,Interstellar
4,8.4,5893,7.426996,238,The Godfather
5,8.1,8064,7.403454,122,The Lord of the Rings: The Return of the King
6,8.0,8705,7.372443,120,The Lord of the Rings: The Fellowship of the Ring
7,7.9,9742,7.350457,118340,Guardians of the Galaxy
8,7.9,8907,7.317386,603,The Matrix
9,8.2,5879,7.317344,1891,The Empire Strikes Back
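
If only the top N rows are needed, `DataFrame.nlargest` selects and orders them in one step. The sketch below is an alternative to sorting the whole frame. It uses a shuffled subset of the scores from the table above, and the variable names are illustrative only.

```python
import pandas as pd

# A few scores copied from the table above, in shuffled order.
toy = pd.DataFrame({
    "original_title": ["Fight Club", "The Dark Knight", "The Matrix", "Inception"],
    "WR": [7.599699, 7.635814, 7.317386, 7.615094],
})

# nlargest returns the n rows with the highest WR, sorted in descending order.
top3 = toy.nlargest(3, "WR").reset_index(drop=True)
print(top3)
```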


### <font color='green'>Notes: One shortcoming of a recommendation system based on the Weighted Rating (WR) formula is that it is not personalized. The score depends only on a movie's average rating and vote count, so every user sees the same ranking. It does not take into account user demographics, contextual information, temporal dynamics, or the individual preferences, tastes, and behaviors of users, which limits the accuracy and relevance of the recommendations. Additionally, the formula gives equal weight to every vote without considering its reliability or quality, and movies with few votes are pulled toward the overall mean C, so with sparse or imbalanced data the ranking favors widely rated titles.</font>