# Project: Movie Recommendation System

Dataset used: [Top 10000 Popular Movies (TMDB)](https://www.kaggle.com/datasets/ursmaheshj/top-10000-popular-movies-tmdb-05-2023)

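Attributes of the original dataset (the cleaned file loaded below, `movies_cleaned.csv`, keeps only `id`, `title`, `release_date`, `genres`, `original_language`, `vote_average`, `vote_count`, `popularity`, `overview` and `runtime`, plus a leftover index column):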
- id: Unique identifier assigned to each movie in the TMDB database.
- title: Title of the movie.
- release_date: Date on which the movie was released.
- genres: List of genres associated with the movie.
- original_language: Language in which the movie was originally produced.
- vote_average: Average rating given to the movie by TMDB users.
- vote_count: Number of votes cast for the movie on TMDB.
- popularity: Popularity score assigned to the movie by TMDB based on user engagement.
- overview: Brief description or synopsis of the movie.
- budget: Estimated budget for producing the movie in USD.
- production_companies: List of production companies involved in making the movie.
- revenue: Total revenue generated by the movie in USD.
- runtime: Total runtime of the movie in minutes.
- tagline: Short, memorable phrase associated with the movie, often used in promotional material.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display

In [2]:
movie_data = pd.read_csv('movies_cleaned.csv')
movie_data

,Unnamed: 0,id,title,release_date,genres,original_language,vote_average,vote_count,popularity,overview,runtime
0,0,758323,The Pope's Exorcist,2023-04-05,"['Horror', 'Mystery', 'Thriller']",English,7.4,619,5089.969,"Father Gabriele Amorth, Chief Exorcist of the ...",103
1,1,640146,Ant-Man and the Wasp: Quantumania,2023-02-15,"['Action', 'Adventure', 'Science Fiction']",English,6.6,2294,4665.438,Super-Hero partners Scott Lang and Hope van Dy...,125
2,2,502356,The Super Mario Bros. Movie,2023-04-05,"['Animation', 'Adventure', 'Family', 'Fantasy'...",English,7.5,1861,3935.550,"While working underground to fix a water main,...",92
3,3,868759,Ghosted,2023-04-18,"['Action', 'Comedy', 'Romance']",English,7.2,652,2791.532,Salt-of-the-earth Cole falls head over heels f...,120
4,4,594767,Shazam! Fury of the Gods,2023-03-15,"['Action', 'Comedy', 'Fantasy', 'Adventure']",English,6.8,1510,2702.593,"Billy Batson and his foster siblings, who tran...",130
...,...,...,...,...,...,...,...,...,...,...,...
9699,9699,374473,"I, Daniel Blake",2016-10-21,['Drama'],English,7.7,1220,10.774,"A middle aged carpenter, who requires state we...",100
9700,9700,16774,Hellboy Animated: Sword of Storms,2006-10-28,"['TV Movie', 'Fantasy', 'Animation', 'Action',...",English,6.3,99,12.739,A folklore professor becomes unwittingly posse...,73
9701,9701,13564,Return to House on Haunted Hill,2007-10-03,"['Horror', 'Thriller']",English,5.6,263,12.769,Eight years have passed since Sara Wolfe and E...,81
9702,9702,482204,My Sister-in-law's Job,2017-08-31,"['Drama', 'Romance']",Korean,5.0,5,10.425,An erotic film that depicts the dangerous rela...,89


In [3]:
movie_data.columns

Index(['Unnamed: 0', 'id', 'title', 'release_date', 'genres',
       'original_language', 'vote_average', 'vote_count', 'popularity',
       'overview', 'runtime'],
      dtype='object')

In [4]:
movie_data.drop(columns=['Unnamed: 0'],inplace=True)
movie_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9704 entries, 0 to 9703
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 9704 non-null   int64  
 1   title              9704 non-null   object 
 2   release_date       9704 non-null   object 
 3   genres             9704 non-null   object 
 4   original_language  9704 non-null   object 
 5   vote_average       9704 non-null   float64
 6   vote_count         9704 non-null   int64  
 7   popularity         9704 non-null   float64
 8   overview           9704 non-null   object 
 9   runtime            9704 non-null   int64  
dtypes: float64(2), int64(3), object(5)
memory usage: 758.2+ KB


Convert `release_date` from a string (`object`) to a `datetime64` column.

In [5]:
movie_data['release_date'] = pd.to_datetime(movie_data['release_date'])
movie_data.dtypes

id                            int64
title                        object
release_date         datetime64[ns]
genres                       object
original_language            object
vote_average                float64
vote_count                    int64
popularity                  float64
overview                     object
runtime                       int64
dtype: object

In [6]:
movie_data = movie_data.reset_index(drop=True)
movie_data.tail()

,id,title,release_date,genres,original_language,vote_average,vote_count,popularity,overview,runtime
9699,374473,"I, Daniel Blake",2016-10-21,['Drama'],English,7.7,1220,10.774,"A middle aged carpenter, who requires state we...",100
9700,16774,Hellboy Animated: Sword of Storms,2006-10-28,"['TV Movie', 'Fantasy', 'Animation', 'Action',...",English,6.3,99,12.739,A folklore professor becomes unwittingly posse...,73
9701,13564,Return to House on Haunted Hill,2007-10-03,"['Horror', 'Thriller']",English,5.6,263,12.769,Eight years have passed since Sara Wolfe and E...,81
9702,482204,My Sister-in-law's Job,2017-08-31,"['Drama', 'Romance']",Korean,5.0,5,10.425,An erotic film that depicts the dangerous rela...,89
9703,444539,The Bookshop,2017-11-10,['Drama'],English,6.5,382,12.525,"Set in a small English town in 1959, a woman d...",110


## Recommend movies by similar title

In [7]:
# Normalize titles for matching: drop every character except letters, digits and spaces, then lowercase
import re

def clean_title(title):
    title = re.sub("[^a-zA-Z0-9 ]","", title)
    title = title.lower()
    return title
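To see what the normalization does, here is a standalone copy of `clean_title` run on a few titles from the dataset. Removing `&` leaves a double space in "Fast & Furious", which does not matter because queries are cleaned the same way and the TF-IDF tokenizer splits on whitespace.

```python
import re

def clean_title(title):
    # Keep only ASCII letters, digits and spaces, then lowercase
    title = re.sub("[^a-zA-Z0-9 ]", "", title)
    return title.lower()

for raw in ["The Pope's Exorcist", "Ant-Man and the Wasp: Quantumania", "Fast & Furious"]:
    print(repr(clean_title(raw)))
# 'the popes exorcist'
# 'antman and the wasp quantumania'
# 'fast  furious'
```

Accented letters are removed as well (for example, "Amélie" becomes "amlie"), so a query typed without the accent ("Amelie") will not match that title exactly.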

In [8]:
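# .copy() makes an independent DataFrame, so adding clean_title below does not modify a view of movie_data
movie_search_data = movie_data[['id','title', 'release_date','vote_average','overview','runtime']].copy()
movie_search_data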

,id,title,release_date,vote_average,overview,runtime
0,758323,The Pope's Exorcist,2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103
1,640146,Ant-Man and the Wasp: Quantumania,2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125
2,502356,The Super Mario Bros. Movie,2023-04-05,7.5,"While working underground to fix a water main,...",92
3,868759,Ghosted,2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120
4,594767,Shazam! Fury of the Gods,2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130
...,...,...,...,...,...,...
9699,374473,"I, Daniel Blake",2016-10-21,7.7,"A middle aged carpenter, who requires state we...",100
9700,16774,Hellboy Animated: Sword of Storms,2006-10-28,6.3,A folklore professor becomes unwittingly posse...,73
9701,13564,Return to House on Haunted Hill,2007-10-03,5.6,Eight years have passed since Sara Wolfe and E...,81
9702,482204,My Sister-in-law's Job,2017-08-31,5.0,An erotic film that depicts the dangerous rela...,89


In [9]:
movie_search_data['clean_title']= movie_search_data['title'].apply(clean_title)
movie_search_data.head()

,id,title,release_date,vote_average,overview,runtime,clean_title
0,758323,The Pope's Exorcist,2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103,the popes exorcist
1,640146,Ant-Man and the Wasp: Quantumania,2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125,antman and the wasp quantumania
2,502356,The Super Mario Bros. Movie,2023-04-05,7.5,"While working underground to fix a water main,...",92,the super mario bros movie
3,868759,Ghosted,2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120,ghosted
4,594767,Shazam! Fury of the Gods,2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130,shazam fury of the gods


In [19]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from fuzzywuzzy import fuzz

# Fit the TF-IDF model (word unigrams and bigrams) on the cleaned titles once,
# rather than refitting it on every search
title_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf_title = title_vectorizer.fit_transform(movie_search_data['clean_title'])


def search_movie(title):
    # Clean the input title the same way as the stored titles
    cleaned_title = clean_title(title)

    # Cosine similarity between the query and every movie title
    query_vect = title_vectorizer.transform([cleaned_title])
    similarity = cosine_similarity(query_vect, tfidf_title).flatten()

    # Fuzzy matching (0-100) to account for minor title variations:
    # the cosine score is scaled by a factor between 0.8 and 1.0
    fuzzy_scores = np.array([fuzz.ratio(cleaned_title, t) for t in movie_search_data['clean_title']])
    similarity = similarity * (0.8 + 0.2 * fuzzy_scores / 100)

    # Indices of the 10 most similar titles, best first
    top_indices = similarity.argsort()[-10:][::-1]

    # Retrieve the top 10 search results
    search_movie_results = movie_search_data.iloc[top_indices]

    return search_movie_results[['id', 'title', 'release_date', 'vote_average', 'overview', 'runtime']]
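The ranking in `search_movie` multiplies the TF-IDF cosine score by a factor between 0.8 and 1.0 derived from the fuzzy ratio. The sketch below isolates that step. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (an assumption: fuzzywuzzy computes a similar 0-100 ratio, but exact values may differ), and the cosine values are made-up inputs. It shows that an identical title keeps its full cosine score, while a title with zero cosine similarity stays at zero, so fuzzy matching only reorders titles that already share a word or bigram with the query.

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a, b):
    # Stand-in for fuzzywuzzy's fuzz.ratio: string similarity on a 0-100 scale
    return round(100 * SequenceMatcher(None, a, b).ratio())

def blended_score(cosine, query, title):
    # Scale the TF-IDF cosine score by a factor in [0.8, 1.0] based on fuzzy similarity
    return cosine * (0.8 + 0.2 * fuzzy_ratio(query, title) / 100)

query = "fast  furious"
print(blended_score(0.9, query, "fast  furious"))  # identical title keeps its cosine score
print(blended_score(0.0, query, "fast five"))      # no shared TF-IDF term: score stays 0.0
```

Because the factor never drops below 0.8, the fuzzy ratio acts as a tie-breaker between titles with similar TF-IDF scores rather than as an independent signal.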

In [20]:
# Example: search for a title
search_movie("Fast & Furious")

,id,title,release_date,vote_average,overview,runtime
2480,13804,Fast & Furious,2009-04-02,6.7,"When a crime brings them back to L.A., fugitiv...",107
4637,82992,Fast & Furious 6,2013-05-21,6.8,Hobbs has Dominic and Brian reassemble their c...,131
164,168259,Furious 7,2015-04-01,7.2,Deckard Shaw seeks revenge against Dominic Tor...,137
117,384018,Fast & Furious Presents: Hobbs & Shaw,2019-08-01,6.9,Ever since US Diplomatic Security Service Agen...,137
5506,9799,The Fast and the Furious,2001-06-22,6.9,Dominic Toretto is a Los Angeles street racer ...,106
2373,9615,The Fast and the Furious: Tokyo Drift,2006-06-03,6.4,"In order to avoid a jail sentence, Sean Boswel...",104
3659,51497,Fast Five,2011-04-20,7.2,Former cop Brian O'Conner partners with ex-con...,130
3146,889741,Fast & Feel Love,2022-04-06,7.2,When a world champion of sport stacking is dum...,132
163,337339,The Fate of the Furious,2017-04-12,6.9,When a mysterious woman seduces Dom into the w...,136
6722,13342,Fast Times at Ridgemont High,1982-08-13,6.8,Based on the real-life adventures chronicled b...,90


In [21]:
# Build an interactive search box in Jupyter: results refresh as the user types
import ipywidgets as widgets
from IPython.display import display

movie_input = widgets.Text(value="Mission Impossible...", description="Movie Title:", disabled=False)

movie_list = widgets.Output()

def on_type(data):
    with movie_list:
        movie_list.clear_output()
        title = data['new']
        # Only search once at least two characters have been typed
        if len(title) >= 2:
            display(search_movie(title))

movie_input.observe(on_type, names="value")
display(movie_input, movie_list)


In [22]:
import pickle

# Save the movie_search_data DataFrame to a pickle file
with open("movie_search_data.pkl", "wb") as file:
    pickle.dump(movie_search_data, file)
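The saved file can be read back with `pickle.load`, which is presumably how the Streamlit app mentioned at the end reuses these tables (an assumption; the app code is not part of this notebook). A minimal round trip, using a small hypothetical dict and a temporary directory instead of the real DataFrame:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for movie_search_data
data = {"id": [862, 863], "title": ["Toy Story", "Toy Story 2"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movie_search_data.pkl")
    # Save, as in the cell above
    with open(path, "wb") as file:
        pickle.dump(data, file)
    # Load it back (only unpickle files you trust: loading can execute code)
    with open(path, "rb") as file:
        loaded = pickle.load(file)

print(loaded == data)
```

With pandas, `DataFrame.to_pickle` and `pd.read_pickle` do the same save and load in one call each.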

## Recommend movies by genre

In [23]:
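def clean_genre(genre):
    # Turn the string "['Horror', 'Mystery']" into "Horror Mystery" by dropping brackets, quotes and commas
    genre = re.sub("[^a-zA-Z0-9 ]","", genre)
    return genre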

In [24]:
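# .copy() makes an independent DataFrame, so adding clean_genres does not modify a view of movie_data
movie_genre_data = movie_data[['id','title', 'genres','release_date','vote_average','overview','runtime']].copy()
movie_genre_data['clean_genres']= movie_genre_data['genres'].apply(clean_genre)
movie_genre_data.head()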

,id,title,genres,release_date,vote_average,overview,runtime,clean_genres
0,758323,The Pope's Exorcist,"['Horror', 'Mystery', 'Thriller']",2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103,Horror Mystery Thriller
1,640146,Ant-Man and the Wasp: Quantumania,"['Action', 'Adventure', 'Science Fiction']",2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125,Action Adventure Science Fiction
2,502356,The Super Mario Bros. Movie,"['Animation', 'Adventure', 'Family', 'Fantasy'...",2023-04-05,7.5,"While working underground to fix a water main,...",92,Animation Adventure Family Fantasy Comedy
3,868759,Ghosted,"['Action', 'Comedy', 'Romance']",2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120,Action Comedy Romance
4,594767,Shazam! Fury of the Gods,"['Action', 'Comedy', 'Fantasy', 'Adventure']",2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130,Action Comedy Fantasy Adventure


Recommend the top 10 movies in the selected genre, ranked by average vote (`vote_average`). The genre is picked from a drop-down widget.

In [28]:
def recommend_genre(genre):
    # Get the movies that belong to the specified genre
    genre_movies = movie_genre_data[movie_genre_data['clean_genres'].str.contains(genre)]

    # Sort the movies by average vote (vote_average), highest first
    sorted_genre_movies = genre_movies.sort_values(by='vote_average', ascending=False)

    # Select the top 10 movies
    top_10_genre_movies = sorted_genre_movies.head(10)

    return top_10_genre_movies[['id', 'title', 'release_date', 'vote_average', 'overview', 'runtime']]
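Sorting by `vote_average` alone lets titles with very few votes reach the top (for example, *My Sister-in-law's Job* has only 5 votes). One common alternative, not used in this notebook, is the IMDb-style weighted rating WR = v/(v+m)·R + m/(v+m)·C, where R is a movie's average rating, v its vote count, C the mean rating across movies, and m a minimum-votes threshold that you choose. The sketch below uses hypothetical rows and an assumed m = 500; in the notebook, R and v would come from `vote_average` and `vote_count`.

```python
def weighted_rating(R, v, C, m):
    # Bayesian average: pull R toward the global mean C when the vote count v is small
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical (title, vote_average, vote_count) rows
movies = [("X", 9.2, 40), ("Y", 8.4, 5000), ("Z", 7.0, 2000)]
C = sum(r for _, r, _ in movies) / len(movies)  # mean rating across these movies
m = 500  # assumed minimum-votes threshold

ranked = sorted(movies, key=lambda row: weighted_rating(row[1], row[2], C, m), reverse=True)
print([title for title, _, _ in ranked])
```

Ranked by raw average the order would be X, Y, Z; with the weighting, Y (many votes) overtakes X (only 40 votes).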

In [29]:
import ast

# Collect the unique genres across all movies
genres_col = movie_genre_data['genres']
unique_genres = set()
for genres_list in genres_col:
    # Parse the string "['Horror', 'Mystery']" into a list; literal_eval is safer than eval
    genres = ast.literal_eval(genres_list)
    unique_genres.update(genres)

unique_genres

{'Action',
 'Adventure',
 'Animation',
 'Comedy',
 'Crime',
 'Documentary',
 'Drama',
 'Family',
 'Fantasy',
 'History',
 'Horror',
 'Music',
 'Mystery',
 'Romance',
 'Science Fiction',
 'TV Movie',
 'Thriller',
 'War',
 'Western'}
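Because the `genres` column stores each list as a string such as `"['Horror', 'Mystery', 'Thriller']"`, parsing it with `ast.literal_eval` (which only accepts Python literals, unlike `eval`) also enables an exact membership test instead of the substring search used in `recommend_genre`. Sorting the set gives the drop-down a stable, alphabetical option order, since a Python set has no defined order. A sketch on three hypothetical rows; the helper `rows_with_genre` is illustrative, not part of the notebook:

```python
import ast

# Hypothetical genre strings in the same format as the genres column
genre_strings = [
    "['Horror', 'Mystery', 'Thriller']",
    "['Action', 'Adventure', 'Science Fiction']",
    "['Drama']",
]

# Safely parse each string into a Python list (literal_eval rejects arbitrary code)
parsed = [ast.literal_eval(s) for s in genre_strings]

# Sorted unique genres: a stable option order for a drop-down
unique_genres = sorted({genre for genres in parsed for genre in genres})
print(unique_genres)

def rows_with_genre(parsed_lists, genre):
    # Exact membership: indices of rows whose genre list contains the selected genre
    return [i for i, genres in enumerate(parsed_lists) if genre in genres]

print(rows_with_genre(parsed, "Science Fiction"))
```

In pandas, `movie_genre_data['genres'].apply(ast.literal_eval)` would produce the same kind of list column for the whole dataset.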

In [30]:
# demo example
recommend_genre("Music")

,id,title,release_date,vote_average,overview,runtime
4143,1022102,BTS: Permission to Dance on Stage - LA,2022-09-08,9.2,"Purple colors the city of Los Angeles, as BTS ...",130
8515,568332,Taylor Swift: Reputation Stadium Tour,2018-12-31,8.4,Taylor Swift takes the stage in Dallas for the...,125
3474,396194,Ennio,2022-02-17,8.4,"A portrait of Ennio Morricone, the most popula...",156
8696,553512,Burn the Stage: The Movie,2018-11-15,8.4,A documentary following the worldwide famous m...,85
9260,164558,One Direction: This Is Us,2013-08-29,8.4,"""One Direction: This Is Us"" is a captivating a...",92
908,244786,Whiplash,2014-10-10,8.4,"Under the direction of a ruthless instructor, ...",107
6029,630566,Clouds,2020-10-09,8.3,Young musician Zach Sobiech discovers his canc...,121
3681,10376,The Legend of 1900,1998-10-28,8.3,The story of a virtuoso piano player who lives...,170
4334,632632,Given,2020-08-22,8.3,"The relationship between a band's bassist, the...",60
4782,740996,BLACKPINK: Light Up the Sky,2020-10-14,8.3,Record-shattering Korean girl band BLACKPINK t...,79


In [31]:
# Create dropdown widget
genre_dropdown = widgets.Dropdown(options=unique_genres, description='Genre:')

# Output widget for the genre recommendations (a distinct name, so later cells do not overwrite it)
genre_output = widgets.Output()

# Define function to handle genre selection change
def on_genre_change(change):
    genre = change.new
    genre_output.clear_output()
    with genre_output:
        recommendations = recommend_genre(genre)
        display(recommendations)

# Attach the genre change event handler
genre_dropdown.observe(on_genre_change, names='value')

# Display the widgets
display(genre_dropdown)
display(genre_output)


In [32]:
import pickle

# Save the movie_genre_data DataFrame to a pickle file
with open("movie_genre_data.pkl", "wb") as file:
    pickle.dump(movie_genre_data, file)

## Recommend movies by language

In [33]:
movie_lang_data = movie_data[['id','title', 'original_language','release_date','vote_average','overview','runtime']]
movie_lang_data.head()

,id,title,original_language,release_date,vote_average,overview,runtime
0,758323,The Pope's Exorcist,English,2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103
1,640146,Ant-Man and the Wasp: Quantumania,English,2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125
2,502356,The Super Mario Bros. Movie,English,2023-04-05,7.5,"While working underground to fix a water main,...",92
3,868759,Ghosted,English,2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120
4,594767,Shazam! Fury of the Gods,English,2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130


In [34]:
lang_list = movie_lang_data['original_language'].unique()
lang_list

array(['English', 'French', 'Dutch', 'Spanish', 'Korean', 'Japanese',
       'Finnish', 'Ukrainian', 'Norwegian', 'Estonian', 'Cantonese',
       'Polish', 'Russian', 'German', 'Chinese', 'Italian', 'Basque',
       'Thai', 'Turkish', 'Swedish', 'Icelandic', 'Tagalog', 'Arabic',
       'Tamil', 'Telugu', 'Romanian', 'Indonesian', 'Galician', 'Danish',
       'Macedonian', 'Portuguese', 'Vietnamese', 'Catalan', 'Hindi',
       'Persian', 'Hebrew', 'Serbian', 'Malayalam', 'Greek', 'Hungarian',
       'Czech', 'Norwegian Bokmal', 'Kannada', 'Irish', 'Khmer',
       'Dzongkha', 'Panjabi'], dtype=object)

Recommend the top 10 movies in the selected original language, ranked by average vote (`vote_average`). The language is picked from a drop-down widget.

In [35]:
def recommend_language(language):
    # Get the movies whose original language is exactly the selected one
    # (a substring match would let 'Norwegian' also return 'Norwegian Bokmal' titles)
    lang_movies = movie_lang_data[movie_lang_data['original_language'] == language]

    # Sort the movies by average vote (vote_average), highest first
    sorted_lang_movies = lang_movies.sort_values(by='vote_average', ascending=False)

    # Select the top 10 movies
    top_10_lang_movies = sorted_lang_movies.head(10)

    return top_10_lang_movies[['id', 'title', 'release_date', 'vote_average', 'overview', 'runtime']]
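Matching languages with a substring test has a pitfall in this dataset: the language list contains both `'Norwegian'` and `'Norwegian Bokmal'`, so a substring search for `'Norwegian'` also returns Bokmål titles. A plain-Python illustration of substring versus exact matching on hypothetical rows:

```python
# Hypothetical (title, original_language) rows
rows = [("Film A", "Norwegian"), ("Film B", "Norwegian Bokmal"), ("Film C", "English")]

def by_substring(rows, language):
    # Mirrors Series.str.contains: any language whose name contains the query
    return [title for title, lang in rows if language in lang]

def by_exact(rows, language):
    # Mirrors df['original_language'] == language
    return [title for title, lang in rows if lang == language]

print(by_substring(rows, "Norwegian"))  # both Norwegian variants
print(by_exact(rows, "Norwegian"))      # only the exact match
```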

In [36]:
# demo example
recommend_language("Chinese")

,id,title,release_date,vote_average,overview,runtime
7974,612361,Wild Sparrow,2020-07-24,9.0,Little Han lives in the mountains with his gre...,94
5995,620249,The Legend of Hei,2019-08-27,8.4,"In the bustling human world, spirits live peac...",102
9521,15804,A Brighter Summer Day,1991-07-27,8.3,"A boy experiences first love, friendships and ...",237
4177,795607,Green Snake,2021-07-23,8.3,While trying to free her sister from Fahai's c...,131
6199,575813,Better Days,2019-10-25,8.3,A bullied teenage girl forms an unlikely frien...,135
9590,532753,Dying to Survive,2018-07-06,8.1,"When a mysterious visitor appears, the life of...",117
2050,663558,New Gods: Nezha Reborn,2021-02-06,8.1,While living as an ordinary deliveryman and mo...,116
5530,719410,Your Name Engraved Herein,2020-09-30,8.0,"In 1987, as martial law ends in Taiwan, Jia-ha...",118
8722,25538,Yi Yi,2000-09-20,7.9,Each member of a family in Taipei asks hard qu...,173
5095,615453,Ne Zha,2019-07-26,7.9,The Primus extracts a Mixed Yuan Bead into a S...,110


In [37]:
# Create dropdown widget
language_dropdown = widgets.Dropdown(options=lang_list, description='Language:')

# Output widget for the language recommendations; a separate name keeps the
# genre dropdown's handler from writing into this widget
language_output = widgets.Output()

# Define function to handle language selection change
def on_language_change(change):
    language = change.new
    language_output.clear_output()
    with language_output:
        recommendations = recommend_language(language)
        display(recommendations)

# Attach the language change event handler
language_dropdown.observe(on_language_change, names='value')

# Display the widgets
display(language_dropdown)
display(language_output)


In [38]:
import pickle

# Save the movie_lang_data DataFrame to a pickle file
with open("movie_lang_data.pkl", "wb") as file:
    pickle.dump(movie_lang_data, file)

## Recommend movies by similar overview

Reference for recommending movies with similar plot summaries using TF-IDF vectorization and cosine similarity: https://www.geeksforgeeks.org/movie-recommender-based-on-plot-summary-using-tf-idf-vectorization-and-cosine-similarity/

In [39]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
  
from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
  
# Penn Treebank POS tags for verbs; these words are lemmatized as verbs
VERB_CODES = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

[nltk_data] Downloading package punkt to C:\Users\Phyo Sandar
[nltk_data]     Win\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Phyo Sandar Win\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to C:\Users\Phyo Sandar
[nltk_data]     Win\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to C:\Users\Phyo Sandar
[nltk_data]     Win\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Phyo Sandar
[nltk_data]     Win\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [40]:
def preprocess_sentences(text):
    # Lowercase, tokenize and POS-tag the overview
    text = text.lower()
    temp_sent = []
    words = nltk.word_tokenize(text)
    tags = nltk.pos_tag(words)
    for i, word in enumerate(words):
        # Lemmatize verbs as verbs ('falls' -> 'fall'), everything else as nouns
        if tags[i][1] in VERB_CODES:
            lemmatized = lemmatizer.lemmatize(word, 'v')
        else:
            lemmatized = lemmatizer.lemmatize(word)
        # Drop stop words and tokens that are not purely alphabetic (punctuation, numbers, "n't", "'s")
        if lemmatized not in stop_words and lemmatized.isalpha():
            temp_sent.append(lemmatized)

    # Contraction pieces such as "n't" and "'s" were already dropped by isalpha(),
    # so no contraction expansion is needed on the joined sentence
    return ' '.join(temp_sent)
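Contraction handling has to happen before the `isalpha()` filter, if at all. NLTK's `word_tokenize` splits a contraction into separate tokens: the lowercased sentence "andy's toys don't sleep" becomes `['andy', "'s", 'toys', 'do', "n't", 'sleep']`, and tokens like `"'s"` and `"n't"` fail `isalpha()`, so replacing them in the joined sentence afterwards has no effect. The sketch reproduces only the filtering step on that hand-written token list (no NLTK needed; lemmatization and stop-word removal are left out).

```python
# Tokens as NLTK's word_tokenize splits "andy's toys don't sleep" (hand-written here)
tokens = ["andy", "'s", "toys", "do", "n't", "sleep"]

# The same filter as in preprocess_sentences: keep purely alphabetic tokens only
kept = [t for t in tokens if t.isalpha()]
finalsent = " ".join(kept)
print(finalsent)

# No apostrophe survives, so contraction replacements change nothing
print(finalsent.replace("n't", " not").replace("'s", " is") == finalsent)
```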

Given a movie title, we look up its processed overview and recommend the 10 movies whose overviews are most similar.

In [41]:
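# .copy() makes an independent DataFrame, so adding overview_processed does not modify a view of movie_data
movie_overview_data = movie_data[['id','title','release_date','vote_average','overview','runtime']].copy()
movie_overview_data.head()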

,id,title,release_date,vote_average,overview,runtime
0,758323,The Pope's Exorcist,2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103
1,640146,Ant-Man and the Wasp: Quantumania,2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125
2,502356,The Super Mario Bros. Movie,2023-04-05,7.5,"While working underground to fix a water main,...",92
3,868759,Ghosted,2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120
4,594767,Shazam! Fury of the Gods,2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130


In [42]:
# preprocess movie overviews 
movie_overview_data["overview_processed"]= movie_overview_data["overview"].apply(preprocess_sentences)
movie_overview_data.head()

,id,title,release_date,vote_average,overview,runtime,overview_processed
0,758323,The Pope's Exorcist,2023-04-05,7.4,"Father Gabriele Amorth, Chief Exorcist of the ...",103,father gabriele amorth chief exorcist vatican ...
1,640146,Ant-Man and the Wasp: Quantumania,2023-02-15,6.6,Super-Hero partners Scott Lang and Hope van Dy...,125,partner scott lang hope van dyne along hope pa...
2,502356,The Super Mario Bros. Movie,2023-04-05,7.5,"While working underground to fix a water main,...",92,work underground fix water main brooklyn luigi...
3,868759,Ghosted,2023-04-18,7.2,Salt-of-the-earth Cole falls head over heels f...,120,cole fall head heel enigmatic sadie make shock...
4,594767,Shazam! Fury of the Gods,2023-03-15,6.8,"Billy Batson and his foster siblings, who tran...",130,billy batson foster sibling transform superher...


In [43]:
import pickle

# Save the movie_overview_data DataFrame to a pickle file
with open("movie_overview_data.pkl", "wb") as file:
    pickle.dump(movie_overview_data, file)

---

In [44]:
# Fit the TF-IDF model on the processed overviews once, rather than on every call
overview_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf_overview = overview_vectorizer.fit_transform(movie_overview_data['overview_processed'])

def recommend_similar_movie(title):
    # Processed overview of the selected movie (the first one if the title is duplicated)
    matches = movie_overview_data.loc[movie_overview_data['title'] == title, 'overview_processed']
    if matches.empty:
        raise ValueError(f"Title not found: {title!r}")
    selected_title_overview = matches.values[0]

    # Cosine similarity between the selected overview and all movie overviews
    query_vect_overview = overview_vectorizer.transform([selected_title_overview])
    similarity_overview = cosine_similarity(query_vect_overview, tfidf_overview).flatten()

    # Indices of the 10 most similar movies, best first (the selected movie usually ranks first)
    top_10_indices_overview = similarity_overview.argsort()[-10:][::-1]

    # Retrieve the top 10 results based on overviews
    search_movie_results_overview = movie_overview_data.iloc[top_10_indices_overview]

    return search_movie_results_overview[['id', 'title', 'release_date', 'vote_average', 'overview', 'runtime']]
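For intuition about what `TfidfVectorizer` plus `cosine_similarity` compute, here is a minimal pure-Python sketch. It approximately follows scikit-learn's defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows) but uses whitespace tokenization and unigrams only, whereas the notebook uses `ngram_range=(1, 2)`. The three overviews and the helper names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Whitespace tokenization, unigrams only
    tokenized = [doc.split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, like scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        # Raw term count times IDF
        vec = {term: count * idf[term] for term, count in Counter(tokens).items()}
        # L2-normalize so that a dot product equals cosine similarity
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors

def cosine(u, v):
    # Sparse dot product of two unit-length vectors
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def most_similar(docs, query_index, top_n=3):
    vectors = tfidf_vectors(docs)
    scores = [cosine(vectors[query_index], vec) for vec in vectors]
    # Indices sorted by descending similarity; the query itself scores about 1.0
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_n]

# Hypothetical, already-preprocessed overviews
overviews = [
    "toy cowboy woody andy room",
    "andy toy buzz space ranger",
    "chef rat paris restaurant",
]
print(most_similar(overviews, 0))
```

As in the Toy Story example, the query movie itself scores highest and comes first; dropping the first row of the result would leave only other titles.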

In [45]:
# demo example
recommend_similar_movie("Toy Story")

,id,title,release_date,vote_average,overview,runtime
284,862,Toy Story,1995-10-30,8.0,"Led by Woody, Andy's toys live happily in his ...",81
457,863,Toy Story 2,1999-10-30,7.6,"Andy heads off to Cowboy Camp, leaving his toy...",92
596,10193,Toy Story 3,2010-06-16,7.8,"Woody, Buzz, and the rest of Andy's toys haven...",103
448,301528,Toy Story 4,2019-06-19,7.5,Woody has always been confident about his plac...,100
4449,6957,The 40 Year Old Virgin,2005-08-11,6.4,Andy Stitzer has a pleasant life with a nice a...,116
2274,979163,Beyond Infinity: Buzz and the Journey to Light...,2022-06-10,7.1,Explore the evolution of Buzz Lightyear from t...,35
6561,16187,Buzz Lightyear of Star Command: The Adventure ...,2000-08-08,6.4,Buzz Lightyear must battle Emperor Zurg with t...,70
9152,82424,Small Fry,2011-11-23,6.9,A fast food restaurant mini variant of Buzz fo...,7
1533,11186,Child's Play 2,1990-11-09,6.2,When Andy’s mother is admitted to a psychiatri...,84
295,718789,Lightyear,2022-06-15,7.1,Legendary Space Ranger Buzz Lightyear embarks ...,105


In [46]:
movie_input_similar = widgets.Dropdown(options=movie_overview_data['title'], description="Movie Title:", 
                                      disabled=False)
similar_movie_list = widgets.Output()

def on_change_similar_movie(change):
    with similar_movie_list:
        similar_movie_list.clear_output()
        title = change.new
        if title:
            recommended_movies = recommend_similar_movie(title)
            display(recommended_movies)

movie_input_similar.observe(on_change_similar_movie, names='value')
display(movie_input_similar, similar_movie_list)


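## Web application on Streamlit

Reference for turning the recommender into a web app ("End-to-End Movie Recommendation System", Medium): https://medium.com/geekculture/end-to-end-movie-recommendation-system-49b29a8b57ac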