# Netflix Movies Recommendation System
By Sheikh Md Abid
## Project Workflow

### 1. **Importing Libraries**
The core Python libraries, `numpy` and `pandas`, are imported for data loading and manipulation. The `ast` module is imported later, in step 6, where the nested columns are parsed.

In [1]:
import numpy as np
import pandas as pd

### 2. Loading the Datasets
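Four CSV files are loaded with Pandas. The TMDB files `tmdb_5000_movies.csv` and `tmdb_5000_credits.csv` contain movie metadata (title, overview, genres, keywords) and cast and crew details, which feed the content-based recommender. `movies.csv` and `ratings.csv` follow the MovieLens format (`movieId`, `title`, `genres` and `userId`, `movieId`, `rating`, `timestamp`) and supply the user ratings used for item-based collaborative filtering.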

In [2]:
movies = pd.read_csv("tmdb_5000_movies.csv")
credits = pd.read_csv("tmdb_5000_credits.csv")
movies_list = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")

In [3]:
movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [4]:
credits.head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [5]:
ratings.head(1)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,296,5.0,1147880044


In [6]:
movies_list.head(1)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


### 3. Processing datasets

In [7]:
# dropping genres column from movies_list dataframe as it is already available in movies dataframe.
movies_list = movies_list.drop('genres', axis=1)
movies_list.head(1)

Unnamed: 0,movieId,title
0,1,Toy Story (1995)


In [8]:
# strip the trailing year, e.g. "Toy Story (1995)" -> "Toy Story", so titles match the TMDB `title` column used in the joins below.
movies_list['title'] = movies_list['title'].str.replace(r'\(\d{4}\)', '', regex=True).str.strip()
movies_list.head(2)

Unnamed: 0,movieId,title
0,1,Toy Story
1,2,Jumanji
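
The regular expression in In [8] removes only a parenthesised four-digit year. The quick check below runs the same pattern on a few toy titles (hypothetical values chosen to mirror the MovieLens naming style). It assumes, as in the MovieLens files, that some titles store their leading article at the end, such as `American President, The (1995)`; such a title keeps that form after cleaning and would not match a TMDB title like `The American President` in the title-based merges that follow.

```python
import pandas as pd

# Toy MovieLens-style titles (illustrative values, not read from the dataset).
titles = pd.Series([
    "Toy Story (1995)",
    "Jumanji (1995)",
    "American President, The (1995)",
    "Heat",
])

# Same pattern as In [8]: drop a parenthesised four-digit year, then trim whitespace.
cleaned = titles.str.replace(r"\(\d{4}\)", "", regex=True).str.strip()
print(cleaned.tolist())
```

Titles without a year pass through unchanged, and the trailing-article form is left as it is, so some MovieLens movies may silently drop out of the title-based merges.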


### 4. Merging Datasets
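Four merges link the TMDB metadata with the MovieLens ratings:

- `ratings` and `movies_list` are merged on `movieId` to attach a title to every rating.
- `movies` and `credits` are merged on `title` to combine movie metadata with cast and crew details.
- The result is merged with `movies_list` on `title` to attach the MovieLens `movieId` to each TMDB movie.
- That dataframe is merged with `ratings` on `movieId`, so every rating also carries the TMDB `movie_id` used by the metadata.

Because `title` is not a unique key, the title-based joins can duplicate rows (for example, `Spectre` appears twice later on), which is why duplicates are dropped in later steps.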
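
The duplication caused by joining on a non-unique title can be reproduced on two tiny frames. This is a sketch with hypothetical IDs (only the `Spectre` and `Avatar` TMDB IDs come from the outputs in this notebook); it also shows pandas' `validate` argument, which turns silent row multiplication into an explicit error.

```python
import pandas as pd

# Hypothetical toy frames: two MovieLens entries share the title "Spectre".
tmdb = pd.DataFrame({"movie_id": [206647, 19995], "title": ["Spectre", "Avatar"]})
lens = pd.DataFrame({"movieId": [101, 102, 103], "title": ["Spectre", "Spectre", "Avatar"]})

# An inner join on a non-unique key yields one row per matching pair,
# so the single TMDB "Spectre" row appears twice.
merged = tmdb.merge(lens, on="title")
print(merged)

# validate="one_to_one" raises MergeError instead of duplicating rows silently.
raised = False
try:
    tmdb.merge(lens, on="title", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```

Whether to raise, deduplicate afterwards (as this notebook does), or disambiguate titles with the release year is a design choice; the check above only makes the problem visible.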


In [9]:
# merge ratings with movies_list on movieId to attach titles, then drop duplicate ratings if any.
ratings_with_movies = ratings.merge(movies_list, on='movieId')
ratings_with_movies = ratings_with_movies.drop_duplicates(subset=['userId', 'movieId', 'rating'])
ratings_with_movies.head(2)

Unnamed: 0,userId,movieId,rating,timestamp,title
0,1,296,5.0,1147880044,Pulp Fiction
1,3,296,5.0,1439474476,Pulp Fiction


In [10]:
ratings_with_movies.shape

(25000095, 5)

In [11]:
# merging movies and credits dataframe on the basis of movies title.
movies = movies.merge(credits, on='title')
movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [12]:
# attach the MovieLens movieId to each TMDB movie by joining on the cleaned title.
movies = movies.merge(movies_list, on='title')
movies.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew,movieId
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...",72998


In [13]:
# join with ratings on movieId so each rating also carries the TMDB movie_id used by the metadata dataframe.
movies_ratings_all = movies.merge(ratings, on = 'movieId')
rating_main = movies_ratings_all[['userId', 'movie_id', 'title', 'rating']]
rating_main = rating_main.drop_duplicates(subset=['userId', 'movie_id', 'rating'])
rating_main.head(1)

Unnamed: 0,userId,movie_id,title,rating
0,3,19995,Avatar,4.0


In [14]:
rating_main.shape

(12664758, 4)

### 4. Data Preprocessing
The merged dataset is reduced to the columns needed for content-based recommendations: `movie_id`, `title`, `overview`, `genres`, `keywords`, `cast`, and `crew`. The ratings are filtered as well: only users with more than 200 ratings and movies with a mean rating above 2.5 are kept, and then only movies with at least 50 of the remaining ratings. The result is pivoted into a movie × user rating table for item-based collaborative filtering.

In [15]:
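# keep only the features needed for content-based recommendation.
# .copy() gives an independent DataFrame, so later column assignments do not trigger pandas' SettingWithCopyWarning.
meta_movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']].copy()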
meta_movies.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [16]:
# keep only "genuine" users, i.e. users with more than 200 ratings
x = rating_main.groupby('userId')['rating'].count() > 200
genuine_user = x[x].index

# keep only movies whose mean rating (over all users) is above 2.5 on the 0.5-5 scale
x = rating_main.groupby('title')['rating'].mean() > 2.5
high_rated_movies = x[x].index

# keep ratings made by genuine users on the above-threshold movies
filtered_rating = rating_main[rating_main['userId'].isin(genuine_user) & rating_main['title'].isin(high_rated_movies)]
filtered_rating.shape

(5190578, 4)

In [17]:
# keep movies that received at least 50 ratings from the filtered users
y = filtered_rating.groupby('title')['rating'].count() >= 50
famous_movies = y[y].index

# restrict the ratings to those frequently rated movies and drop exact duplicate rows
final_ratings = filtered_rating[filtered_rating['title'].isin(famous_movies)]
final_ratings = final_ratings.drop_duplicates()
final_ratings

Unnamed: 0,userId,movie_id,title,rating
0,3,19995,Avatar,4.0
2,12,19995,Avatar,4.0
3,13,19995,Avatar,4.0
11,75,19995,Avatar,3.0
20,181,19995,Avatar,1.5
...,...,...,...,...
12770259,150864,25975,My Date with Drew,4.5
12770260,151876,25975,My Date with Drew,2.0
12770261,152690,25975,My Date with Drew,2.0
12770262,155853,25975,My Date with Drew,2.5
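
The three filters in In [16] and In [17] can be condensed into one function, which makes the thresholds explicit and easy to test on a toy table. `filter_ratings` is a hypothetical helper, not part of the notebook; its defaults mirror the notebook's values (more than 200 ratings per user, mean rating above 2.5, at least 50 ratings per movie), and like the notebook it computes the mean rating over all users before the user filter is applied.

```python
import pandas as pd

def filter_ratings(df, min_user_ratings=200, min_mean=2.5, min_movie_ratings=50):
    """Sketch of the rating filters; expects columns userId, title, rating."""
    # Users with strictly more than `min_user_ratings` ratings.
    user_counts = df.groupby("userId")["rating"].count()
    active_users = user_counts[user_counts > min_user_ratings].index
    # Movies whose mean rating over all users is strictly above `min_mean`.
    movie_means = df.groupby("title")["rating"].mean()
    liked_titles = movie_means[movie_means > min_mean].index
    kept = df[df["userId"].isin(active_users) & df["title"].isin(liked_titles)]
    # Movies with at least `min_movie_ratings` ratings among the remaining rows.
    movie_counts = kept.groupby("title")["rating"].count()
    popular_titles = movie_counts[movie_counts >= min_movie_ratings].index
    return kept[kept["title"].isin(popular_titles)].drop_duplicates()

# Toy ratings (hypothetical) with thresholds scaled down to fit the example.
toy = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "title":  ["A", "B", "C", "A", "B", "A"],
    "rating": [4.0, 1.0, 5.0, 5.0, 2.0, 3.0],
})
result = filter_ratings(toy, min_user_ratings=1, min_mean=2.5, min_movie_ratings=2)
print(result)
```

Here user 3 is dropped for rating only once, `B` for its low mean, and `C` because only one remaining rating is left for it.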


In [18]:
# item x user matrix: one row per movie, one column per user; unrated pairs are filled with 0
user_movie_rating_table = final_ratings.pivot_table(index='movie_id', columns='userId', values='rating')
user_movie_rating_table.fillna(0, inplace=True)
user_movie_rating_table

userId,3,12,13,31,43,57,72,75,80,120,...,162481,162484,162495,162508,162512,162516,162519,162521,162533,162534
movie_id
5,0.0,4.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,3.5,0.0,0.0,0.0,0.0
12,4.0,4.0,4.0,3.0,4.0,4.5,0.0,0.0,0.0,0.0,...,4.0,3.0,1.0,5.0,4.5,4.0,0.0,4.0,4.5,4.0
13,4.0,4.0,5.0,3.0,5.0,4.0,5.0,2.0,4.0,5.0,...,4.5,3.0,5.0,5.0,0.0,4.5,0.0,3.5,4.5,2.5
14,5.0,4.0,4.0,3.0,5.0,5.0,5.0,3.5,4.0,5.0,...,4.0,3.5,4.0,0.0,4.0,5.0,5.0,0.0,4.0,4.0
16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,3.5,0.0,0.0,0.0,4.0,2.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
365222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
374461,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
376659,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0
396152,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
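
The pivot in In [18] turns the long list of (user, movie, rating) rows into a matrix with one row per movie and one column per user. The toy sketch below (hypothetical IDs and ratings) shows the resulting shape. Two details are worth knowing: `pivot_table` averages duplicate (movie, user) pairs because its default `aggfunc` is the mean, and `fillna(0)` treats "not rated" as a rating of 0, which is a modelling choice rather than an observed value.

```python
import pandas as pd

# Toy ratings (hypothetical values) in the same long format as final_ratings.
ratings = pd.DataFrame({
    "userId":   [3, 3, 12, 12, 13],
    "movie_id": [5, 12, 12, 13, 13],
    "rating":   [4.0, 4.0, 4.5, 5.0, 3.0],
})

# Rows = movies, columns = users; missing (movie, user) pairs become 0, as in In [18].
table = ratings.pivot_table(index="movie_id", columns="userId", values="rating").fillna(0)
print(table)
```

Each row of `table` is a movie's rating vector across users, which is what the item-based cosine similarity later compares.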


### 5. Handling Missing Data
Rows with a missing value in any of the selected columns (for example, a movie without an `overview`) are dropped, because the parsing and string-processing steps that follow cannot handle `NaN` values.


In [19]:
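meta_movies.dropna(inplace=True)
# displays a de-duplicated view only: the result is not assigned, so duplicates are removed later in In [36]
meta_movies.drop_duplicates()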

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  meta_movies.dropna(inplace=True)


Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
5,559,Spider-Man 3,The seemingly invincible Spider-Man goes up ag...,"[{""id"": 14, ""name"": ""Fantasy""}, {""id"": 28, ""na...","[{""id"": 851, ""name"": ""dual identity""}, {""id"": ...","[{""cast_id"": 30, ""character"": ""Peter Parker / ...","[{""credit_id"": ""52fe4252c3a36847f80151a5"", ""de..."
...,...,...,...,...,...,...,...
4328,14337,Primer,Friends/fledgling entrepreneurs invent a devic...,"[{""id"": 878, ""name"": ""Science Fiction""}, {""id""...","[{""id"": 1448, ""name"": ""distrust""}, {""id"": 2101...","[{""cast_id"": 1, ""character"": ""Aaron"", ""credit_...","[{""credit_id"": ""52fe45e79251416c75066791"", ""de..."
4329,72766,Newlyweds,A newlywed couple's honeymoon is upended by th...,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",[],"[{""cast_id"": 1, ""character"": ""Buzzy"", ""credit_...","[{""credit_id"": ""52fe487dc3a368484e0fb013"", ""de..."
4330,231617,"Signed, Sealed, Delivered","""Signed, Sealed, Delivered"" introduces a dedic...","[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...","[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","[{""cast_id"": 8, ""character"": ""Oliver O\u2019To...","[{""credit_id"": ""52fe4df3c3a36847f8275ecf"", ""de..."
4331,126186,Shanghai Calling,When ambitious New York attorney Sam is sent t...,[],[],"[{""cast_id"": 3, ""character"": ""Sam"", ""credit_id...","[{""credit_id"": ""52fe4ad9c3a368484e16a36b"", ""de..."


### 6. Data Transformation for content based recommendation
The `genres`, `keywords`, `cast`, and `crew` columns store stringified lists of dictionaries (JSON-like text), so they must be parsed before any names can be extracted. Each string is parsed with `ast.literal_eval()`, and small helper functions pull out the relevant fields as lists of strings.
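
A minimal sketch of the parsing step on one shortened string shaped like a TMDB `genres` value: `ast.literal_eval` turns the text into a real Python list of dicts, and a list comprehension keeps each `name`. The comprehension is an equivalent, more compact form of the notebook's `convert` loop.

```python
import ast

# A shortened string in the same shape as a TMDB `genres` value.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def convert(text):
    """Return the 'name' of every dict in a stringified list of dicts."""
    # literal_eval safely evaluates Python literals only (no arbitrary code).
    return [item["name"] for item in ast.literal_eval(text)]

print(convert(raw))
print(convert("[]"))  # movies with no genres/keywords yield an empty list
```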

In [20]:
import ast

In [21]:
def convert(text):
    # parse the stringified list of dicts and keep each entry's 'name' field
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

In [22]:
meta_movies['genres'] = meta_movies['genres'].apply(convert)



In [23]:
meta_movies['keywords'] = meta_movies['keywords'].apply(convert)
meta_movies.head(1)



Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


### 7. Processing Cast and Crew for content based recommendation
The `cast` and `crew` columns are refined to focus on the most relevant information:

- **Cast:** The `cast` column is limited to the top 3 actors, as they typically have the most significant impact on a movie's identity and appeal.
- **Crew:** The `crew` column is filtered to retain only the director's name (or names, since a film can have more than one director), as the director plays a crucial role in shaping the movie's vision and style.

These transformations ensure that the tags used for recommendations are concise and focused on the key contributors to a movie.
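
Both rules can be sketched on shortened, hypothetical `cast` and `crew` strings (only the `name` and `job` keys are kept, and `Jane Doe` is a placeholder). `top_cast` and `directors` are illustrative helpers equivalent to `convert3` and `fetch_director`; `top_cast` assumes, as the notebook does, that the cast list is already in billing order.

```python
import ast

# Shortened, hypothetical strings shaped like TMDB `cast` and `crew` values.
cast_raw = ('[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
            '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]')
crew_raw = ('[{"job": "Editor", "name": "Jane Doe"}, '
            '{"job": "Director", "name": "James Cameron"}]')

def top_cast(text, n=3):
    # Keep the first n names; assumes the list is ordered by billing.
    return [member["name"] for member in ast.literal_eval(text)[:n]]

def directors(text):
    # Keep every crew member whose job is exactly "Director" (there can be several).
    return [member["name"] for member in ast.literal_eval(text) if member["job"] == "Director"]

print(top_cast(cast_raw))
print(directors(crew_raw))
```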


In [24]:
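def convert3(text):
    # like convert(), but keeps only the first three names (the top-billed cast);
    # equivalent to convert() followed by the [0:3] slice applied in In [26]
    L = []
    for i in ast.literal_eval(text)[:3]:
        L.append(i['name'])
    return L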

In [25]:
meta_movies['cast'] = meta_movies['cast'].apply(convert)
meta_movies.head(1)



Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [26]:
meta_movies['cast'] = meta_movies['cast'].apply(lambda x:x[0:3])



In [27]:
def fetch_director(text):
    # keep the names of all crew members whose job is 'Director'
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

In [28]:
meta_movies['crew'] = meta_movies['crew'].apply(fetch_director)



In [29]:
meta_movies.sample(3)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
3207,14576,Shade,"Tiffany, Charlie and Vernon are con artists lo...","[Action, Thriller, Crime]",[],"[Stuart Townsend, Gabriel Byrne, Thandie Newton]",[Damian Nieman]
1428,13496,American Outlaws,When a Midwest town learns that a corrupt rail...,"[Action, Western]","[sheriff, horse, outlaw, jesse james, cole you...","[Colin Farrell, Scott Caan, Ali Larter]",[Les Mayfield]
1695,10317,Our Brand Is Crisis,"A feature film based on the documentary ""Our B...","[Comedy, Drama]","[bolivia, woman, political campaign, south ame...","[Sandra Bullock, Anthony Mackie, Billy Bob Tho...",[David Gordon Green]


### 8. Removing Spaces in Tags for content based recommendation
Spaces inside each tag (actor and director names, genres, keywords) are removed before vectorization so that a multi-word name becomes a single token. Otherwise "Sam Worthington" and "Sam Mendes" would share the token `sam`, making unrelated movies look similar. The `overview` words are left as they are.
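
The effect of collapsing spaces is easy to see by fitting a default `CountVectorizer` (which lowercases text and splits on word boundaries) on two names, first with spaces and then without. The names are taken from the outputs in this notebook; everything else is a toy setup.

```python
from sklearn.feature_extraction.text import CountVectorizer

spaced = ["Sam Worthington", "Sam Mendes"]
collapsed = [name.replace(" ", "") for name in spaced]  # same rule as collapse()

# sorted() over the vocabulary_ dict returns its tokens in alphabetical order.
vocab_spaced = sorted(CountVectorizer().fit(spaced).vocabulary_)
vocab_collapsed = sorted(CountVectorizer().fit(collapsed).vocabulary_)
print(vocab_spaced)
print(vocab_collapsed)
```

With spaces, both names contribute the shared token `sam`; after collapsing, each person maps to exactly one token, so overlap only occurs when the same person appears in both movies.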


In [30]:
def collapse(L):
    # remove spaces inside each tag, e.g. "Sam Worthington" -> "SamWorthington"
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

In [31]:
meta_movies['cast'] = meta_movies['cast'].apply(collapse)
meta_movies['crew'] = meta_movies['crew'].apply(collapse)
meta_movies['genres'] = meta_movies['genres'].apply(collapse)
meta_movies['keywords'] = meta_movies['keywords'].apply(collapse)


In [32]:
meta_movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]


### 9. Creating the Tags Column for content based recommendation
A new `tags` column is created by concatenating the `overview` words with the `genres`, `keywords`, `cast`, and `crew` lists. This token list is later joined into a single string, which becomes the only text input to the content-based recommender and so encapsulates the key descriptive information about each movie.
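
For a single movie, building the tags amounts to list concatenation followed by a join. The row below is a hypothetical, shortened version of Avatar's processed fields, in the shape `meta_movies` has after In [31] and In [33].

```python
# Hypothetical, shortened row in the shape of meta_movies after In [31] and In [33].
row = {
    "overview": "In the 22nd century, a paraplegic Marine".split(),
    "genres": ["Action", "ScienceFiction"],
    "keywords": ["cultureclash", "spacewar"],
    "cast": ["SamWorthington", "ZoeSaldana"],
    "crew": ["JamesCameron"],
}

# In [34] concatenates the token lists; In [38] joins them into one string.
tags_list = row["overview"] + row["genres"] + row["keywords"] + row["cast"] + row["crew"]
tags = " ".join(tags_list)
print(tags)
```

Punctuation such as the comma in `century,` stays attached at this point; the vectorizer's tokenizer strips it later.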


In [33]:
meta_movies['overview'] = meta_movies['overview'].apply(lambda x:x.split())



In [34]:
meta_movies['tags'] = meta_movies['overview'] + meta_movies['genres'] + meta_movies['keywords'] + meta_movies['cast'] + meta_movies['crew']



In [35]:
meta_movies

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
3,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...,...,...,...,...,...
4328,14337,Primer,"[Friends/fledgling, entrepreneurs, invent, a, ...","[ScienceFiction, Drama, Thriller]","[distrust, garage, identitycrisis, timetravel,...","[ShaneCarruth, DavidSullivan, CaseyGooden]",[ShaneCarruth],"[Friends/fledgling, entrepreneurs, invent, a, ..."
4329,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]",[],"[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],"[A, newlywed, couple's, honeymoon, is, upended..."
4330,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[""Signed,, Sealed,, Delivered"", introduces, a,..."
4331,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is...",[],[],"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],"[When, ambitious, New, York, attorney, Sam, is..."


In [36]:
meta_movies = meta_movies.drop_duplicates(subset = ['title', 'movie_id'])
meta_movies

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."
5,559,Spider-Man 3,"[The, seemingly, invincible, Spider-Man, goes,...","[Fantasy, Action, Adventure]","[dualidentity, amnesia, sandstorm, loveofone's...","[TobeyMaguire, KirstenDunst, JamesFranco]",[SamRaimi],"[The, seemingly, invincible, Spider-Man, goes,..."
...,...,...,...,...,...,...,...,...
4328,14337,Primer,"[Friends/fledgling, entrepreneurs, invent, a, ...","[ScienceFiction, Drama, Thriller]","[distrust, garage, identitycrisis, timetravel,...","[ShaneCarruth, DavidSullivan, CaseyGooden]",[ShaneCarruth],"[Friends/fledgling, entrepreneurs, invent, a, ..."
4329,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]",[],"[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],"[A, newlywed, couple's, honeymoon, is, upended..."
4330,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[""Signed,, Sealed,, Delivered"", introduces, a,..."
4331,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is...",[],[],"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],"[When, ambitious, New, York, attorney, Sam, is..."


In [37]:
# keep only the identifier, the title and the combined tags
content_movies_dataframe = meta_movies.drop(columns=['overview','genres','keywords','cast','crew'])
content_movies_dataframe.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
5,559,Spider-Man 3,"[The, seemingly, invincible, Spider-Man, goes,..."


In [38]:
content_movies_dataframe['tags'] = content_movies_dataframe['tags'].apply(lambda x: " ".join(x))
content_movies_dataframe.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."
5,559,Spider-Man 3,The seemingly invincible Spider-Man goes up ag...


### 10. Vectorization
The `tags` column is converted into a bag-of-words representation with Scikit-Learn's `CountVectorizer`: each movie becomes a vector of term counts over a shared vocabulary, which can then be compared numerically. The vectorizer keeps only the 5000 most frequent terms across all movies (`max_features=5000`) and removes English stop words (`stop_words='english'`), so very common words such as "the" do not dominate the similarity scores.
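
A toy run with the same settings shows what the vectorizer produces: a vocabulary of lowercased tokens with stop words removed, and a sparse count matrix with one row per document. The two documents are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical tag strings in the style of content_movies_dataframe['tags'].
docs = [
    "a marine on an alien planet SamWorthington",
    "a spy on a secret mission DanielCraig",
]

cv = CountVectorizer(max_features=5000, stop_words="english")
vectors = cv.fit_transform(docs)  # SciPy sparse matrix, shape (n_docs, n_terms)
print(sorted(cv.vocabulary_))
print(vectors.toarray())
```

The notebook calls `.toarray()` on the full result, producing a dense 3520 × 5000 array; `cosine_similarity` also accepts the sparse matrix directly, which uses less memory when the vocabulary is large.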


In [39]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,stop_words='english')

In [40]:
content_vector = cv.fit_transform(content_movies_dataframe['tags']).toarray()

In [41]:
content_vector.shape

(3520, 5000)

### 11. Calculating Similarity
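Cosine similarity measures how alike two vectors are through the cosine of the angle between them: for non-negative count or rating vectors, 1 means they point in the same direction and 0 means they share no non-zero entries. Applied to the tag vectors, it gives a 3520 × 3520 movie-to-movie content similarity matrix; applied to the rows of the movie × user rating table, it gives a 2660 × 2660 item-to-item similarity matrix for collaborative filtering.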
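
The formula is cos(a, b) = (a · b) / (‖a‖ ‖b‖). The short check below computes it by hand for two toy vectors and compares the result with scikit-learn's `cosine_similarity`, which returns the full pairwise matrix (hence the `[0, 1]` lookup).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 1.0, 1.0])

# Dot product divided by the product of the Euclidean norms.
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# cosine_similarity compares every row with every row; [0, 1] is a vs b.
library = cosine_similarity(np.vstack([a, b]))[0, 1]
print(manual, library)
```

Because only the angle matters, a movie with a long tag list is not automatically more similar to everything than a movie with a short one.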


In [42]:
from sklearn.metrics.pairwise import cosine_similarity

In [43]:
# getting cosine similarity for content based recommendation
content_similarity_score = cosine_similarity(content_vector)
content_similarity_score

array([[1.        , 0.0862796 , 0.05661385, ..., 0.02360961, 0.02668803,
        0.        ],
       [0.0862796 , 1.        , 0.06350006, ..., 0.02648136, 0.        ,
        0.        ],
       [0.05661385, 0.06350006, 1.        , ..., 0.0260643 , 0.        ,
        0.        ],
       ...,
       [0.02360961, 0.02648136, 0.0260643 , ..., 1.        , 0.07372098,
        0.04783649],
       [0.02668803, 0.        , 0.        , ..., 0.07372098, 1.        ,
        0.05407381],
       [0.        , 0.        , 0.        , ..., 0.04783649, 0.05407381,
        1.        ]])

In [44]:
content_similarity_score.shape

(3520, 3520)

In [45]:
# getting cosine similarity score for item based collaborative recommendation
collaborative_similarity_scores = cosine_similarity(user_movie_rating_table)
collaborative_similarity_scores

array([[1.        , 0.28562206, 0.34968254, ..., 0.05381022, 0.11547434,
        0.09927717],
       [0.28562206, 1.        , 0.78857938, ..., 0.17352017, 0.11588363,
        0.29567943],
       [0.34968254, 0.78857938, 1.        , ..., 0.15666382, 0.16344597,
        0.25297915],
       ...,
       [0.05381022, 0.17352017, 0.15666382, ..., 1.        , 0.01953625,
        0.15106068],
       [0.11547434, 0.11588363, 0.16344597, ..., 0.01953625, 1.        ,
        0.02713551],
       [0.09927717, 0.29567943, 0.25297915, ..., 0.15106068, 0.02713551,
        1.        ]])

In [46]:
collaborative_similarity_scores.shape

(2660, 2660)

### 12. Building the Recommendation Function
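Two functions produce recommendations. `recommend_cb()` takes a movie title and returns the five movies whose tag vectors are most similar to it, together with their cosine scores. `recommend_cf()` provides item-based collaborative recommendations from the rating similarity matrix. The `fuzzywuzzy` package (with `python-Levenshtein` as an optional speed-up) is also installed and imported for fuzzy string matching of titles.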
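
The core of a similarity-based recommender is "find the query's row, sort that row, skip the query itself". The sketch below uses a hypothetical helper, `top_k_similar`, not the notebook's code. It looks the title up by **position**, because `content_movies_dataframe` keeps gaps in its index after `drop_duplicates` (0, 1, 2, 4, 5, ...) while the rows of the similarity matrix are numbered 0 to n-1. It also excludes the query by position instead of assuming it is the top-ranked entry.

```python
import numpy as np
import pandas as pd

def top_k_similar(similarity, titles, query, k=5):
    """Return (title, score) pairs for the k rows most similar to `query`.

    `similarity` is a square array whose row order matches `titles` positionally.
    """
    lowered = titles.str.lower().to_numpy()
    matches = np.flatnonzero(lowered == query.lower())
    if matches.size == 0:
        return []
    pos = int(matches[0])  # positional row, not the DataFrame index label
    order = np.argsort(similarity[pos])[::-1]  # highest similarity first
    order = order[order != pos][:k]  # drop the query itself, keep k results
    return [(titles.iloc[i], float(similarity[pos, i])) for i in order]

# Toy data: the index has a gap, like content_movies_dataframe after drop_duplicates.
titles = pd.Series(["Avatar", "Spectre", "John Carter"], index=[0, 2, 4])
sim = np.array([[1.0, 0.1, 0.6],
                [0.1, 1.0, 0.2],
                [0.6, 0.2, 1.0]])
print(top_k_similar(sim, titles, "avatar", k=2))
print(top_k_similar(sim, titles, "John Carter", k=2))
```

Using the index label 4 for "John Carter" would point past the end of this 3 × 3 matrix, which is the kind of mismatch the positional lookup avoids.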


In [47]:
!pip install fuzzywuzzy python-Levenshtein



In [48]:
from fuzzywuzzy import process

In [49]:
meta_movies

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[A, cryptic, message, from, Bond’s, past, send..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[John, Carter, is, a, war-weary,, former, mili..."
5,559,Spider-Man 3,"[The, seemingly, invincible, Spider-Man, goes,...","[Fantasy, Action, Adventure]","[dualidentity, amnesia, sandstorm, loveofone's...","[TobeyMaguire, KirstenDunst, JamesFranco]",[SamRaimi],"[The, seemingly, invincible, Spider-Man, goes,..."
...,...,...,...,...,...,...,...,...
4328,14337,Primer,"[Friends/fledgling, entrepreneurs, invent, a, ...","[ScienceFiction, Drama, Thriller]","[distrust, garage, identitycrisis, timetravel,...","[ShaneCarruth, DavidSullivan, CaseyGooden]",[ShaneCarruth],"[Friends/fledgling, entrepreneurs, invent, a, ..."
4329,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended...","[Comedy, Romance]",[],"[EdwardBurns, KerryBishé, MarshaDietlein]",[EdwardBurns],"[A, newlywed, couple's, honeymoon, is, upended..."
4330,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,...","[Comedy, Drama, Romance, TVMovie]","[date, loveatfirstsight, narration, investigat...","[EricMabius, KristinBooth, CrystalLowe]",[ScottSmith],"[""Signed,, Sealed,, Delivered"", introduces, a,..."
4331,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is...",[],[],"[DanielHenney, ElizaCoupe, BillPaxton]",[DanielHsia],"[When, ambitious, New, York, attorney, Sam, is..."


In [50]:

available_movies = meta_movies['title'].to_list()
available_movies = [item.lower() for item in available_movies]

In [51]:
# content-based recommendation: top 5 movies by content similarity
def recommend_cb(movie):
    content_rec = []
    if movie.lower() in available_movies:
        # row position (not index label) of the movie, matching the rows of content_similarity_score
        matches = np.flatnonzero((content_movies_dataframe['title'].str.lower() == movie.lower()).to_numpy())
        if len(matches) == 0:
            return content_rec
        index = matches[0]
        distances = sorted(enumerate(content_similarity_score[index]), reverse=True, key=lambda x: x[1])

        # skip the first entry (the movie itself) and keep the next 5
        for i in distances[1:6]:
            title = content_movies_dataframe.iloc[i[0]].title
            score = i[1]
            content_rec.append((title, score))
    return content_rec
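`meta_movies` (shown above) has gaps in its index, because rows were dropped during cleaning (there is no row 3). Looking up a row with `.index[0]` returns an index label, while a NumPy similarity matrix and `.iloc` are addressed by row position, so the two disagree whenever the index has gaps. The toy frame below (hypothetical data) shows the difference and one way to get the position directly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has a gap, as in meta_movies (no label 3)
df = pd.DataFrame(
    {"title": ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre", "John Carter"]},
    index=[0, 1, 2, 4],
)

mask = df["title"].str.lower() == "john carter"
label = df[mask].index[0]                      # index label of the match
position = np.flatnonzero(mask.to_numpy())[0]  # row position of the match

print(label, position)             # 4 3
# Similarity-matrix rows follow positions, so index with `position`;
# df.iloc[label] would raise IndexError here (the frame has only 4 rows)
print(df.iloc[position]["title"])  # John Carter
```

With a default RangeIndex (for example after `reset_index(drop=True)`), label and position coincide and both lookups agree.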

In [52]:
# item-based collaborative filtering recommendation: top 5 movies by rating similarity
def recommend_cf(movie_name):
    recommended_movies = []
    if movie_name.lower() in available_movies:
        # fuzzy-match the title; with a Series as choices, extractOne returns (title, score, index label)
        movie_label = process.extractOne(movie_name, movies['title'])[2]
        movie_id = movies['movie_id'].loc[movie_label]

        # row position in the rating table; not every movie is there (it has only 2660 rows)
        positions = np.where(user_movie_rating_table.index == movie_id)[0]
        if len(positions) == 0:
            return recommended_movies
        index = positions[0]
        similar_items = sorted(enumerate(collaborative_similarity_scores[index]), key=lambda x: x[1], reverse=True)[1:6]

        for i in similar_items:
            temp_df = movies[movies['movie_id'] == user_movie_rating_table.index[i[0]]]
            recommendation = temp_df['title'].values[0]
            score = i[1]
            recommended_movies.append((recommendation, score))

    return recommended_movies
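`collaborative_similarity_scores` (2660 × 2660) compares movies by how users rated them. Assuming it was built as cosine similarity between the rows of `user_movie_rating_table` (movies as rows, users as columns, missing ratings filled with 0), the computation looks like this on a toy table; the ratings are invented for illustration. scikit-learn's `cosine_similarity` computes the same quantity.

```python
import numpy as np

def item_cosine_similarity(ratings):
    """Cosine similarity between rows (items) of a ratings matrix.

    Rows are items, columns are users, and 0 means "not rated".
    """
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody rated
    unit = ratings / norms
    return unit @ unit.T

# Hypothetical table: 3 movies x 4 users
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0],
])
sim = item_cosine_similarity(ratings)
print(np.round(sim, 3))
```

Movies 0 and 1 are rated alike by the same users, so their similarity (41/42) is far higher than either one's similarity to movie 2.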

In [53]:
# hybrid recommendation by simple concatenation of both lists
def recommend(movie_name):
    content_based_score = recommend_cb(movie_name)
    collaborative_filtering_score = recommend_cf(movie_name)
    # set() removes exact duplicate (title, score) pairs but does not preserve order
    hybrid_rec = list(set(content_based_score + collaborative_filtering_score))
    if len(hybrid_rec) > 0:
        return hybrid_rec[0:9]
    else:
        print("No movies found for recommendation.")
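Because `set()` has no defined order, `recommend()` returns its candidates in an arbitrary order (compare its Avatar output in the next section), and once there are more than nine candidates, `[0:9]` keeps an arbitrary subset. It also removes a duplicate only when both title and score coincide. A hedged alternative, not the notebook's method, keeps one entry per title (its highest score) and sorts by score; `merge_recommendations` and the movie names are hypothetical.

```python
def merge_recommendations(*lists, limit=9):
    """Merge (title, score) lists, keeping each title's best score, highest first."""
    best = {}
    for recs in lists:
        for title, score in recs:
            if title not in best or score > best[title]:
                best[title] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]

# Hypothetical content-based and collaborative results
content = [("Movie A", 0.25), ("Movie B", 0.23)]
collab = [("Movie C", 0.83), ("Movie B", 0.50)]
print(merge_recommendations(content, collab))
# [('Movie C', 0.83), ('Movie B', 0.5), ('Movie A', 0.25)]
```

Note that the two score scales differ, so a plain sort by raw score still favors the collaborative list.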

In [54]:
# hybrid recommendation: weighted combination of 30% content-based and 70% collaborative filtering scores
def hybrid_recomendation(movie):
    cf_weight = 0.7
    cb_weight = 0.3
    hybrid_scores = {}
    if movie.lower() in available_movies:
        content_based_score = recommend_cb(movie)
        collaborative_filtering_score = recommend_cf(movie)

        for cb in content_based_score:
            title = cb[0]
            score = cb[1]
            hybrid_scores[title] = cb_weight * score

        for cf in collaborative_filtering_score:
            title = cf[0]
            score = cf[1]
            if title in hybrid_scores:
                hybrid_scores[title] += cf_weight * score
            else:
                hybrid_scores[title] = cf_weight * score

        top_hybrid_scores = sorted(hybrid_scores.items(), key=lambda x: x[1], reverse=True)[:6]

        return top_hybrid_scores
    else:
        print("This movie is not available for recommendation.")

### 13. Example Usage
Each function takes a movie title as input and matches it case-insensitively, so `'avatar'` and `'AvaTar'` both refer to "Avatar". The examples below request recommendations for "Avatar" with each of the four approaches:


Content-Based Recommendation

In [55]:
recommend_cb('avatar')

[('Titan A.E.', 0.24715576637149037),
 ('Small Soldiers', 0.2389760596996216),
 ("Ender's Game", 0.23388213848187445),
 ('Krull', 0.23112508176051216),
 ('Lifeforce', 0.23076923076923075)]

Collaborative-Filtering-Based Recommendation

In [56]:
recommend_cf('AvaTar')

[('Inception', 0.8299555027657585),
 ('Iron Man', 0.7920946925742841),
 ('WALL·E', 0.7714101716674686),
 ('Up', 0.7659598438718387),
 ('District 9', 0.764398502966773)]

Hybrid Recommendation by Simple Concatenation

In [57]:
recommend('avatar')

[('Iron Man', 0.7920946925742841),
 ('Lifeforce', 0.23076923076923075),
 ('Small Soldiers', 0.2389760596996216),
 ('WALL·E', 0.7714101716674686),
 ('Up', 0.7659598438718387),
 ('District 9', 0.764398502966773),
 ('Inception', 0.8299555027657585),
 ('Titan A.E.', 0.24715576637149037),
 ('Krull', 0.23112508176051216)]

Hybrid Recommendation by 30% Content-Based and 70% Collaborative Filtering Scores

In [58]:
hybrid_recomendation('avatar')

[('Inception', 0.5809688519360309),
 ('Iron Man', 0.5544662848019989),
 ('WALL·E', 0.5399871201672279),
 ('Up', 0.536171890710287),
 ('District 9', 0.535078952076741),
 ('Titan A.E.', 0.0741467299114471)]
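`hybrid_recomendation()` scores each candidate title $m$ as a weighted sum, where $s_{\text{cb}}(m)$ and $s_{\text{cf}}(m)$ are its content-based and collaborative similarity scores, and a score counts as 0 when $m$ does not appear in that list:

$$
s_{\text{hybrid}}(m) = 0.3\,s_{\text{cb}}(m) + 0.7\,s_{\text{cf}}(m)
$$

The Avatar output above is consistent with this: Inception appears only in the collaborative list, so $0.7 \times 0.82996 \approx 0.58097$, and Titan A.E. appears only in the content-based list, so $0.3 \times 0.24716 \approx 0.07415$. Since no title appears in both lists here, every collaborative candidate (at least $0.7 \times 0.76440 \approx 0.53508$) outranks every content-based one (at most $\approx 0.07415$), which is why only one content-based title reaches the top six.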

## Conclusion
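This Netflix Movies Recommendation System suggests movies similar to a given title using two signals. The content-based component compares movies by cosine similarity over their tags, built from the overview, genres, keywords, cast, and crew; the item-based collaborative component compares movies by how similarly users rated them. The two result lists can be merged by simple concatenation or by a weighted hybrid score (30% content-based, 70% collaborative). Because the recommendations depend only on the input movie, not on an individual user's history, they are item-to-item suggestions rather than personalized ones.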
