# Movie Recommendation System: Build a recommendation system that suggests movies to users based on their past viewing history. You can use collaborative filtering or content-based filtering techniques for this.

## This is a simple Python movie recommender that uses content-based filtering. It applies TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity to find movies whose overviews are similar to one another.
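The two ingredients can be written out explicitly. The formulas below assume scikit-learn's documented defaults for `TfidfVectorizer` (raw term counts, `smooth_idf=True`, `norm='l2'`); other settings such as `sublinear_tf=True` or `norm=None` would change them. Here $n$ is the number of overviews, $\mathrm{df}(t)$ the number of overviews containing term $t$, $\mathrm{tf}(t, d)$ the count of $t$ in overview $d$, and $\mathbf{w}_d$ the vector of weights for overview $d$.

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w_{t,d} = \mathrm{tf}(t, d)\,\mathrm{idf}(t)
$$

$$
\cos(d_1, d_2)
= \frac{\mathbf{w}_{d_1} \cdot \mathbf{w}_{d_2}}{\lVert \mathbf{w}_{d_1} \rVert_2 \, \lVert \mathbf{w}_{d_2} \rVert_2}
= \hat{\mathbf{w}}_{d_1} \cdot \hat{\mathbf{w}}_{d_2},
\qquad
\hat{\mathbf{w}}_{d} = \frac{\mathbf{w}_{d}}{\lVert \mathbf{w}_{d} \rVert_2}
$$

The last equality is why a plain dot product is enough once every row has been normalized to unit length.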

Here's an overview of how it works:

1. Import the required libraries: pandas to handle the dataset, TfidfVectorizer to convert the text into a matrix of TF-IDF features, and linear_kernel to compute the cosine similarities.

2. Read the movies dataset from a CSV file.

3. It removes any rows in the dataset where the movie overview is missing.

4. It creates a TF-IDF vectorizer (with common English stop words removed) and transforms the 'overview' column into a matrix of TF-IDF features, with one row per movie and one column per word. Each entry weights a word by how often it appears in that movie's overview, scaled down by how many overviews in the whole corpus contain it, so distinctive words count for more than common ones.

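5. It computes the cosine similarity between every pair of rows in the TF-IDF matrix, i.e. the cosine of the angle between the two movies' vectors in this high-dimensional space. Because TF-IDF weights are non-negative, the score ranges from 0 (no weighted words in common) to 1 (same direction), and it is used to measure how similar two movies are. TfidfVectorizer normalizes each row to unit length by default, so the dot product computed by linear_kernel is exactly the cosine similarity.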

6. It defines a function get_recommendations() that takes a movie title as input and returns the 10 most similar movies. The function retrieves the index of the input movie, gets the similarity scores of all movies with that movie, sorts the scores in descending order, takes the indices of the top 10 movies (skipping the movie itself), and returns their titles.

7. Finally, it tests the function with the movie 'The Dark Knight Rises' and prints the recommended movies.
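To see steps 4 to 6 in isolation, here is a small self-contained sketch on a toy DataFrame. The four titles and overviews are made up for illustration (they are not rows from tmdb_5000_movies.csv), and `recommend` is a hypothetical helper that mirrors get_recommendations but ranks with NumPy's `argsort` and looks the query up by row position.

```python
# Toy sketch of steps 4-6; titles and overviews are hypothetical sample data.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "overview": [
        "A masked vigilante protects Gotham from a criminal mastermind.",
        "The vigilante returns to Gotham to fight a new criminal.",
        "Two friends open a bakery in a quiet seaside town.",
        "A baker and her friend compete in a seaside baking contest.",
    ],
})

# Step 4: TF-IDF features; English stop words are dropped and,
# by default, every row is L2-normalized to unit length.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(toy["overview"])

# Step 5: with unit-length rows, the dot product equals cosine similarity.
sim = linear_kernel(tfidf_matrix, tfidf_matrix)
assert np.allclose(sim, cosine_similarity(tfidf_matrix))

# Step 6: rank the other movies by their similarity to the query.
def recommend(title, k=2):
    # Row position of the query title (positions match the rows of `sim`)
    pos = np.flatnonzero(toy["title"].to_numpy() == title)[0]
    order = np.argsort(sim[pos])[::-1]  # highest similarity first
    order = order[order != pos][:k]     # drop the query itself, keep top k
    return toy["title"].iloc[order].tolist()

print(recommend("Movie A", k=1))  # ['Movie B']
```

Looking the title up by row position rather than by DataFrame label keeps the lookup consistent with the rows of the similarity matrix, even after some rows have been filtered out of the DataFrame.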

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load the movies data
movies = pd.read_csv('tmdb_5000_movies.csv')

# Remove rows with missing overviews, then renumber the rows 0..n-1 so that
# DataFrame labels line up with the row positions of the similarity matrix
movies = movies[movies['overview'].notna()].reset_index(drop=True)

# Create a TF-IDF vectorizer (Term Frequency-Inverse Document Frequency)
# that drops common English stop words, and convert the 'overview' column
# into a sparse matrix of TF-IDF features (one row per movie)
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['overview'])

# Compute the cosine similarity matrix. TfidfVectorizer L2-normalizes each row
# by default, so the plain dot product from linear_kernel equals cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Function that takes a movie title as input and returns the most similar movies
def get_recommendations(title, cosine_sim=cosine_sim, n=10):
    # Get the row index of the movie that matches the title
    matches = movies[movies['title'] == title].index
    if len(matches) == 0:
        raise ValueError(f"Title not found: {title!r}")
    idx = matches[0]

    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the similarity scores, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the movie itself and keep the scores of the n most similar movies
    sim_scores = [s for s in sim_scores if s[0] != idx][:n]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the titles of the most similar movies, one per line
    recommended_movies = movies['title'].iloc[movie_indices]
    return "\n".join(recommended_movies)

# Test the function with a movie
print(get_recommendations('The Dark Knight Rises'))
```

```
The Dark Knight
Batman Forever
Batman Returns
Batman
Batman: The Dark Knight Returns, Part 2
Batman Begins
Slow Burn
Batman v Superman: Dawn of Justice
JFK
Batman & Robin
```


## The output lists the 10 movies whose overviews are most similar to that of the input title 'The Dark Knight Rises', as ranked by the content-based recommender. The similarity is based only on the words in the movies' overviews in the dataset.

Here's a brief explanation of each title in the output:

1. The Dark Knight: The follow-up to 'Batman Begins' and the predecessor of 'The Dark Knight Rises'. Since both belong to the same trilogy, their overviews likely share themes and character names, so a high similarity is expected.

2. Batman Forever and Batman Returns: These are also Batman films, so their overviews presumably feature similar themes and characters.

3. Batman: Another Batman film, so its overview likely mentions the same characters and setting as the other Batman films.

4. Batman: The Dark Knight Returns, Part 2: An animated film in the Batman series. Although it is animated, its overview likely shares the themes and characters of the live-action Batman films.

5. Batman Begins: As the first film in the Dark Knight trilogy, it should be very similar to 'The Dark Knight Rises'.

6. Slow Burn: Not a Batman film, but its overview may contain words related to similar themes, such as crime, justice, or mystery.

7. Batman v Superman: Dawn of Justice: Also a Batman film, so it is expected to score highly alongside the other Batman films.

8. JFK: This is not a Batman movie, but it could have similar themes of crime, justice, or conspiracy in its overview.

9. Batman & Robin: Another Batman film, so a similar overview is to be expected.

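Keep in mind that all of the above recommendations are based solely on the movies' overview text, not on genre, director, or cast. Because TF-IDF only measures overlap in weighted words, the thematic explanations above are plausible guesses rather than something the model checks.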