# Content Based Recommendation

The idea here is to recommend movies with similar content. A movie's content can be its genres, actors, directors or description, and movies that share these attributes are treated as "movies with similar content". Content-based recommendation can also be implemented in other ways, for example by collecting the user's age, gender, etc.

For example, consider a "Test Movie". We first get the genres/actors/directors/description of the "Test Movie" and compare them with those of every other movie. The closest matches are then recommended as movies with similar content.
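To make the idea concrete, here is a toy sketch that describes each movie by a set of keywords and ranks the other movies by how many keywords they share with the query movie. The titles, keywords and the `most_similar` helper are hypothetical, and the shared-keyword count is a deliberate simplification: the notebook below uses cosine similarity over word counts instead.

```python
# Toy catalog: each (hypothetical) movie is described by a set of content keywords.
movies = {
    "Test Movie": {"action", "thriller", "actor_a", "director_x"},
    "Movie B": {"action", "thriller", "actor_a"},
    "Movie C": {"comedy", "actor_b"},
    "Movie D": {"action", "director_x"},
}


def most_similar(query, catalog, k=2):
    """Rank the other movies by the number of keywords shared with `query`."""
    query_keywords = catalog[query]
    scores = [
        (title, len(query_keywords & keywords))  # size of the keyword overlap
        for title, keywords in catalog.items()
        if title != query  # never recommend the query movie itself
    ]
    # Highest overlap first; sorted() is stable, so ties keep catalog order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


print(most_similar("Test Movie", movies))  # [('Movie B', 3), ('Movie D', 2)]
```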

This notebook is the content-based filtering part of my end-to-end movie recommendation system<br>
"Movie Buddy" Git: https://github.com/omkaarlavangare/moviebuddy <br>
To have a look at its working [Click Here](https://www.linkedin.com/posts/omkaar-lavangare-316209200_recommendersystems-machinelearning-datascience-activity-6777314685072019456-tGIH)

In [None]:
import numpy as np
import pandas as pd

In [None]:
movies_df=pd.read_csv("../input/tmdb-movies-dataset/tmdb_movies_data.csv")
movies_df.head()

Taking only the columns needed for content-based recommendation (title, cast and genres) into a separate dataframe. In this notebook, a movie's content is its cast and genres.

In [None]:
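# .copy() makes an independent dataframe, so the in-place edits below
# do not trigger a SettingWithCopyWarning
movies_tmdb = movies_df.loc[:, ["original_title", "cast", "genres"]].copy()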

In [None]:
movies_tmdb.head()

In [None]:
movies_tmdb.shape

In [None]:
movies_tmdb.isna().sum()

Dropping the null values before performing any data processing operations

In [None]:
movies_tmdb.dropna(inplace=True)

In [None]:
movies_tmdb.isna().sum()

Replacing the `|` separator with a space, so that each word becomes a separate token.

In [None]:
movies_tmdb["cast"] = movies_tmdb["cast"].apply(lambda x: x.replace("|"," "))
movies_tmdb["genres"] = movies_tmdb["genres"].apply(lambda x: x.replace("|"," "))
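One side effect worth knowing: replacing `|` with a space also splits multi-word names, so two different actors who share a first name (for example "Paul") produce a common token. The same happens to multi-word genres such as "Science Fiction". The sketch below contrasts the notebook's approach with one common workaround, removing the spaces inside each name so every actor becomes a single token. The workaround is an optional variant that this notebook does not use, and the sample cast string is only an example.

```python
import pandas as pd

cast = pd.Series(["Vin Diesel|Paul Walker|Dwayne Johnson"])

# What the notebook does: every word becomes its own token.
as_words = cast.str.replace("|", " ", regex=False)

# Alternative (not used in this notebook): one token per actor,
# made by deleting the spaces inside each name before joining.
as_names = cast.apply(
    lambda s: " ".join(name.replace(" ", "") for name in s.split("|"))
)

print(as_words[0])  # Vin Diesel Paul Walker Dwayne Johnson
print(as_names[0])  # VinDiesel PaulWalker DwayneJohnson
```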

In [None]:
movies_tmdb.head()

Making a column named combined, which has all the keywords of cast and genres of each movie. 

In [None]:
movies_tmdb["combined"]=movies_tmdb["genres"] + " " + movies_tmdb["cast"]
movies_tmdb.head()

Checking if all the words are spaced properly and no abnormality is present.

In [None]:
movies_tmdb["combined"].iloc[0]  # first row by position

Renaming "original_title" to "title" for simplicity.

In [None]:
movies_tmdb.rename(columns={"original_title":"title"},inplace=True)
movies_tmdb.head()

Converting all the movie titles to lowercase for search simplicity while recommending.

In [None]:
movies_tmdb["title"] = movies_tmdb["title"].str.lower()
movies_tmdb.head()

In [None]:
movies_tmdb.info()

# Most Important Step
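Here we reset the index of movies_tmdb to remove the gaps left by dropping the NA values. You can see above `Int64Index: 10768 entries, 0 to 10865`, which should be `RangeIndex: 10768 entries, 0 to 10767`.
Row i of the similarity matrix corresponds to the i-th row of the dataframe by position, whereas `.index[0]` returns an index label, and looking a title up with `movies_tmdb['title'][i]` uses the label i. Without the reset, labels and positions no longer agree, so these lookups would pick the wrong movie or raise a KeyError.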

In [None]:
movies_tmdb.reset_index(drop=True,inplace=True)
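A small sketch of the problem this step avoids, on a hypothetical four-row dataframe: after `dropna()` the labels have a gap, so a row number taken from the similarity matrix no longer points to the same movie when used as a label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"title": ["a", "b", "c", "d"], "cast": ["x", np.nan, "y", "z"]})
df = df.dropna()
print(df.index.tolist())  # [0, 2, 3]: label 1 is gone

position = 1  # e.g. a row number taken from the similarity matrix
print(df["title"].iloc[position])  # prints c (lookup by position)
print(df["title"].get(position))   # prints None (label 1 no longer exists)

df = df.reset_index(drop=True)
print(df["title"][position])       # prints c (labels now match positions)
```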

In [None]:
movies_tmdb.info()

Importing the necessary modules from sklearn library.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity



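Making an instance of `CountVectorizer` and applying `fit_transform` to the `combined` column of the movies_tmdb dataframe. This returns a sparse document-term matrix with one row per movie and one column per vocabulary word, where each entry counts how often that word appears in the movie's `combined` text.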

In [None]:
cv = CountVectorizer()
count_vectorizer_matrix = cv.fit_transform(movies_tmdb["combined"])
count_vectorizer_matrix

In [None]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
cv.get_feature_names_out()

In [None]:
# Densify only row 1 instead of converting the whole sparse matrix to a dense array
count_vectorizer_matrix[1].toarray()[0]

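Making a similarity matrix that holds the cosine similarity score of each movie with every other movie. The result is a dense 10768 × 10768 array of float64 values, which takes roughly 0.9 GB of memory.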

In [None]:
similarity_mat=cosine_similarity(count_vectorizer_matrix)

In [None]:
similarity_mat
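Cosine similarity is the dot product of two count vectors divided by the product of their lengths, so it depends on which words two movies share rather than on how long their keyword lists are. The sketch below computes it by hand with NumPy for three made-up count vectors and checks the result against scikit-learn's `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

counts = np.array([
    [1, 0, 1, 1],  # movie 0
    [1, 1, 0, 1],  # movie 1
    [0, 1, 0, 0],  # movie 2
])

# cos(a, b) = (a . b) / (|a| * |b|), computed for every pair of rows at once
norms = np.linalg.norm(counts, axis=1)
manual = (counts @ counts.T) / np.outer(norms, norms)

print(np.round(manual, 3))
# [[1.    0.667 0.   ]
#  [0.667 1.    0.577]
#  [0.    0.577 1.   ]]
print(np.allclose(manual, cosine_similarity(counts)))  # True
```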

Taking a sample movie ("furious 7") to try out the recommender.

In [None]:
movie_index = movies_tmdb.loc[movies_tmdb['title']=="furious 7"].index[0]

Making a list of (index, similarity score) pairs for every movie with respect to the sample movie.

In [None]:
movie_list = list(enumerate(similarity_mat[movie_index]))

Sorting the list in descending order of score, since a higher similarity score means the movie is more similar to the sample movie.

In [None]:
movie_list = sorted(movie_list , key = lambda x:x[1] ,reverse=True)

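Dropping the first entry, which is the sample movie itself (a movie's similarity with itself is 1), and keeping the next 10 movies. Note that another movie with exactly the same combined text also scores 1 and could end up in first place instead.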

In [None]:
movie_list = movie_list[1:11]

In [None]:
# Look up the title of each of the top-10 movie indices (by position)
similarmovies = [movies_tmdb["title"].iloc[idx] for idx, _ in movie_list]

As we can see, the recommender returns meaningful output for the sample movie "furious 7": the movies with the most similar content are its sequels and prequels, followed by movies that share its actors or genres.

In [None]:
similarmovies
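The steps above can be wrapped into one reusable function. The sketch below is one possible version; the name `recommend` and its parameters are hypothetical. It normalizes the input title the same way the titles were lowercased, returns an empty list for unknown titles instead of raising an IndexError, and excludes the query movie explicitly rather than assuming it is ranked first. A stable `argsort` on the negated scores gives the same ordering as `sorted(..., reverse=True)`, including for ties.

```python
import numpy as np
import pandas as pd


def recommend(title, df, similarity, k=10):
    """Return up to k titles most similar to `title`.

    Assumes row i of `similarity` corresponds to df.iloc[i], which holds when
    the matrix was built from the same dataframe in the same row order.
    """
    # Positions (not labels) of rows whose lowercase title matches the query
    matches = np.flatnonzero((df["title"] == title.strip().lower()).to_numpy())
    if len(matches) == 0:
        return []  # unknown title
    idx = matches[0]  # like .index[0] in the notebook: first match wins
    scores = similarity[idx]
    # Positions sorted by descending score; "stable" keeps ties in row order
    order = np.argsort(-scores, kind="stable")
    order = order[order != idx]  # drop the query movie itself
    return df["title"].iloc[order[:k]].tolist()


# Usage on a tiny made-up example
toy = pd.DataFrame({"title": ["furious 7", "fast five", "toy story"]})
toy_sim = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
print(recommend("Furious 7", toy, toy_sim, k=2))  # ['fast five', 'toy story']
```

In the notebook itself this would be called as `recommend("furious 7", movies_tmdb, similarity_mat)`.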