In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt 

In [2]:
movies_metadata = pd.read_csv('movies_metadata.csv', low_memory = False)

In [3]:
movies_metadata.head(3)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0


Features

<b>adult:</b> Indicates if the movie is X-Rated or Adult.<br>
<b>belongs_to_collection: </b>A stringified dictionary that gives information on the movie series the particular film belongs to.<br>
<b>budget:</b> The budget of the movie in dollars.<br>
<b>genres:</b> A stringified list of dictionaries that list out all the genres associated with the movie.<br>
<b>homepage: </b>The Official Homepage of the movie.<br>
<b>id:</b> The ID of the movie.<br>
<b>imdb_id: </b>The IMDB ID of the movie.<br>
<b>original_language: </b>The language in which the movie was originally shot.<br>
<b>original_title: </b>The original title of the movie.<br>
<b>overview: </b>A brief blurb of the movie.<br>
<b>popularity:</b> The Popularity Score assigned by TMDB.<br>
<b>poster_path: </b>The URL of the poster image.<br>
<b>production_companies:</b> A stringified list of production companies involved with the making of the movie.<br>
<b>production_countries:</b> A stringified list of countries where the movie was shot or produced.<br>
<b>release_date: </b>Theatrical Release Date of the movie.<br>
<b>revenue: </b>The total revenue of the movie in dollars.<br>
<b>runtime: </b>The runtime of the movie in minutes.<br>
<b>spoken_languages: </b>A stringified list of spoken languages in the film.<br>
<b>status:</b> The status of the movie (Released, To Be Released, Announced, etc.)<br>
<b>tagline: </b>The tagline of the movie.<br>
<b>title: </b>The Official Title of the movie.<br>
<b>video: </b>Indicates if there is a video present of the movie with TMDB.<br>
<b>vote_average:</b> The average rating of the movie.<br>
<b>vote_count:</b> The number of votes by users, as counted by TMDB.
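
Several of the columns above (genres, production_companies, spoken_languages, and so on) are stringified lists of dictionaries, so they need to be parsed before they can be used. A minimal sketch of one way to do that with the standard library follows; the helper name `parse_names` and the sample string are hypothetical and only mimic the shape of the real cells.

```python
import ast

def parse_names(cell):
    """Return the 'name' values from a stringified list of dicts.

    Anything that is missing or cannot be parsed yields an empty list.
    """
    if not isinstance(cell, str):
        return []  # NaN / None cells
    try:
        parsed = ast.literal_eval(cell)  # safe parsing of Python literals
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    return [d["name"] for d in parsed if isinstance(d, dict) and "name" in d]

# Hand-written sample in the same shape as a genres cell (not taken from the data).
sample = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
genre_names = parse_names(sample)
```

On a DataFrame the same helper could be applied column-wise, for example with `movies_metadata['genres'].apply(parse_names)`.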

In [4]:
movies_metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45466 non-null  object 
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re

There are <b>45466</b> movies in total and each movie has 24 attributes. 
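
The info() output lists columns such as budget and popularity with the object dtype even though they hold numbers. A small sketch of converting such a column with `pd.to_numeric` is shown below; the toy values (including the bad entry) are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for an object-typed numeric column; the last value is deliberately invalid.
raw_budget = pd.Series(["30000000", "65000000", "not-a-number"], dtype="object")

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
budget_num = pd.to_numeric(raw_budget, errors="coerce")

# Count how many values were lost in the conversion.
n_bad = int(budget_num.isna().sum() - raw_budget.isna().sum())
```

Checking `n_bad` before overwriting the original column gives a quick sense of how many rows the coercion would blank out.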

## Data Wrangling

In [5]:
movies_metadata.drop(columns = ['imdb_id','video', 'poster_path'], inplace = True)

In [6]:
movies_metadata.head(2)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,original_language,original_title,overview,popularity,...,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.946943,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,en,Jumanji,When siblings Judy and Peter discover an encha...,17.015539,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,6.9,2413.0


In [7]:
movies_metadata.adult.value_counts()

False                                                                                                                             45454
True                                                                                                                                  9
 - Written by Ørnås                                                                                                                   1
 Rune Balot goes to a casino connected to the October corporation to try to wrap up her case once and for all.                        1
 Avalanche Sharks tells the story of a bikini contest that turns into a horrifying affair when it is hit by a shark avalanche.        1
Name: adult, dtype: int64

Only 9 of the 45466 movies are flagged as adult, and 3 rows contain stray text instead of True/False, which suggests those records are malformed. Since almost every movie is non-adult, this column contributes very little and can be dropped.
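
Before dropping the column, the rows whose adult value is neither True nor False can be isolated for inspection. The sketch below uses a toy frame with made-up values and assumes the flags are stored as the strings "True" and "False", as the object dtype suggests.

```python
import pandas as pd

# Toy frame: one row carries stray text in the adult column.
toy = pd.DataFrame({
    "adult": ["False", "True", " - stray text", "False"],
    "title": ["A", "B", None, "C"],
})

# Rows with a recognisable flag versus everything else.
valid = toy["adult"].isin(["True", "False"])
malformed = toy[~valid]

# Keep only well-formed rows and turn the flag into a real boolean.
clean = toy[valid].copy()
clean["adult"] = clean["adult"].eq("True")
```

Looking at `malformed` first makes it easier to decide whether those records should be repaired or removed.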

In [8]:
movies_metadata.drop(columns = ['adult'], inplace = True)

In [9]:
movies_metadata.head(2)

Unnamed: 0,belongs_to_collection,budget,genres,homepage,id,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.946943,"[{'name': 'Pixar Animation Studios', 'id': 3}]","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,7.7,5415.0
1,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,en,Jumanji,When siblings Judy and Peter discover an encha...,17.015539,"[{'name': 'TriStar Pictures', 'id': 559}, {'na...","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,6.9,2413.0


In [10]:
foreign_movies = movies_metadata[movies_metadata['original_title']!= movies_metadata['title']]

In [11]:
foreign_movies.shape

(11402, 20)

Of the roughly 45K movies, about 11K (11402) have a title that differs from the original title; we treat these as foreign movies whose original title is in a foreign language. The original_title column does not give us any new information: as far as the original language is concerned, we already have that in the original_language column. Hence, we can drop the original_title column.
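
One caveat with the comparison above: a row with a missing title also counts as "different", because a missing value never equals a string. The sketch below shows the effect on a toy frame (all values are made up) and one possible way to exclude such rows.

```python
import pandas as pd

toy = pd.DataFrame({
    "original_title": ["Toy Story", "La Haine", "Untitled"],
    "title": ["Toy Story", "Hate", None],  # last title is missing
    "original_language": ["en", "fr", "en"],
})

# Plain inequality: the row with the missing title is flagged too.
differs = toy["original_title"] != toy["title"]

# Only count rows where a title is actually present.
differs_known = differs & toy["title"].notna()
```

Comparing the flagged rows against original_language is another quick sanity check on whether a differing title really indicates a foreign movie.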

In [12]:
movies_metadata.drop(columns = ['original_title'], inplace = True)

In [13]:
movies_metadata.belongs_to_collection.isna().sum()

40972

A very large number of movie revenues are recorded as 0, which here means the figure is not available. Hence, replace 0 with NaN.

In [14]:
movies_metadata[movies_metadata['revenue'] == 0].shape

(38052, 19)

In [15]:
movies_metadata['revenue'] = movies_metadata['revenue'].replace(0,np.nan)
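
The reason for replacing 0 with NaN can be seen on a small example: zeros are included in aggregates such as the mean, while NaN values are skipped. The numbers below are toy values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy revenue column where 0 stands for "unknown".
revenue = pd.Series([373554033.0, 0.0, 262797249.0, 0.0])

share_zero = (revenue == 0).mean()      # fraction of placeholder zeros
mean_with_zeros = revenue.mean()        # dragged down by the zeros

cleaned = revenue.replace(0, np.nan)
mean_without_zeros = cleaned.mean()     # NaN is ignored by default
```

The same reasoning would apply to any other numeric column in which 0 is used as a placeholder for missing data.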

In [16]:
movies_metadata.head(2)

Unnamed: 0,belongs_to_collection,budget,genres,homepage,id,original_language,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,en,"Led by Woody, Andy's toys live happily in his ...",21.946943,"[{'name': 'Pixar Animation Studios', 'id': 3}]","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,7.7,5415.0
1,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,en,When siblings Judy and Peter discover an encha...,17.015539,"[{'name': 'TriStar Pictures', 'id': 559}, {'na...","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,6.9,2413.0


A recommendation system is a type of software that analyzes user behavior and makes personalized suggestions based on that analysis. These systems are commonly used in e-commerce, social media, and other online platforms to help users find products, services, or content that they are likely to be interested in. The main goal of a recommendation system is to improve user engagement and satisfaction by providing personalized and relevant recommendations to each user. Recommendation systems use various algorithms and techniques to analyze user data and make recommendations, and can be built using a wide range of programming languages and libraries.

There are various types of recommendation systems used in the movie industry. Here are some of the most popular types:<br>
<b>Content-Based Recommendation Systems:</b> These systems recommend movies based on the content provided in the description of the movie. They use features such as genre, director, actors, and keywords to recommend similar movies.<br>
<b>Collaborative Filtering Recommendation Systems: </b>These systems analyze user data to find similarities between users and recommend movies based on those similarities. For example, if User A and User B have similar movie preferences, and User A has watched a movie that User B hasn't, the system will recommend that movie to User B.<br>
<b>Popularity-Based/Demographic Recommendation Systems:</b> These systems recommend movies based on their overall popularity or ratings. They often recommend the most popular movies in a particular genre or those that have received the highest ratings from users.<br>
<b>Hybrid Recommendation Systems:</b> These systems combine two or more of the above approaches to provide personalized recommendations. For example, a hybrid system might combine collaborative filtering with content-based filtering to provide more accurate and useful recommendations.<br>
There are many other types of recommendation systems used in the movie industry, but these are some of the most common ones.
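
To make the content-based idea concrete, here is a minimal sketch that ranks movies by how much their genre sets overlap, using Jaccard similarity. The catalog, the movie names and the helper functions are hypothetical, and Jaccard is just one simple choice of similarity measure, not necessarily the one used later.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical catalog mapping a title to its genres.
catalog = {
    "Movie A": ["Animation", "Comedy", "Family"],
    "Movie B": ["Adventure", "Fantasy", "Family"],
    "Movie C": ["Romance", "Comedy"],
}

def most_similar(title, catalog, top_n=2):
    """Return the top_n other titles ranked by genre overlap with `title`."""
    target = catalog[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in catalog.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

recs = most_similar("Movie A", catalog)
```

Here "Movie C" ranks first for "Movie A" because they share one genre out of four distinct ones, while "Movie B" shares one out of five.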

In [17]:
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')

In [18]:
movies.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466


In [19]:
credits.head(3)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."


The credits dataset contains the following features:<br>
<b>movie_id </b>- A unique identifier for each movie.<br>
<b>title </b>- The title of the movie.<br>
<b>cast </b>- The names of the lead and supporting actors.<br>
<b>crew </b>- The names of the Director, Editor, Composer, Writer, etc.<br>

The movies dataset has the following features:<br>
<b>budget </b> - The budget in which the movie was made.<br>
<b>genres </b>- The genres of the movie: Action, Comedy, Thriller, etc.<br>
<b>homepage </b>- A link to the homepage of the movie.<br>
<b>id </b>- This is in fact the same identifier as movie_id in the credits dataset.<br>
<b>keywords</b> - The keywords or tags related to the movie.<br>
<b>original_language</b> - The language in which the movie was made.<br>
<b>original_title</b> - The title of the movie before translation or adaptation.<br>
<b>overview </b>- A brief description of the movie.<br>
<b>popularity </b>- A numeric quantity specifying the movie popularity.<br>
<b>production_companies </b>- The production house of the movie.<br>
<b>production_countries</b> - The country in which it was produced.<br>
<b>release_date </b>- The date on which it was released.<br>
<b>revenue</b> - The worldwide revenue generated by the movie.<br>
<b>runtime </b>- The running time of the movie in minutes.<br>
<b>status </b>- "Released" or "Rumored".<br>
<b>tagline</b> - Movie's tagline.<br>
<b>title </b>- Title of the movie.<br>
<b>vote_average</b> - The average rating the movie received.<br>
<b>vote_count</b> - The number of votes received.<br>
Let's join the two datasets on the movie identifier, which is 'id' in the movies dataset and 'movie_id' in the credits dataset.<br>

In [20]:
movies.rename(columns={'id':'movie_id'}, inplace = True)
movie_credits_df = pd.merge(movies, credits, on = 'movie_id')
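
Because both frames carry a title column, the merge above yields title_x and title_y. A possible variation, sketched on toy frames with made-up values, drops the duplicate column before merging, asks pandas to verify that the key is unique on both sides, and uses an outer merge with an indicator to list identifiers that appear in only one frame.

```python
import pandas as pd

# Toy stand-ins for the two datasets (values are illustrative only).
movies_toy = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 4],
    "title": ["A", "B", "D"],
    "cast": ["x", "y", "z"],
})

# Drop the duplicate title so no _x/_y suffixes appear; validate raises if keys repeat.
merged = pd.merge(
    movies_toy,
    credits_toy.drop(columns=["title"]),
    on="movie_id",
    how="inner",
    validate="one_to_one",
)

# An outer merge with indicator=True shows which ids are missing from either side.
outer = pd.merge(
    movies_toy[["movie_id"]],
    credits_toy[["movie_id"]],
    on="movie_id",
    how="outer",
    indicator=True,
)
unmatched = outer[outer["_merge"] != "both"]
```

Whether dropping the second title is safe depends on the two title columns actually agreeing, which is worth checking on the real data first.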

In [21]:
movie_credits_df.head(3)

Unnamed: 0,budget,genres,homepage,movie_id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,title_y,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
