shreyash2503/Movie-Recommendation-System

Movie Recommendation System using Content-Based Filtering

Introduction

  • This project is a movie recommendation system based on content-based filtering.
  • Movies up to 2017 come from the TMDB 5000 Movie Dataset, which contains information about 5,000 movies. The dataset can be found here.
  • The dataset for movies from 2018 to 2023 was built with Wikipedia and the TMDB API: the movie titles were scraped from Wikipedia, and the details for each title were then collected through the TMDB API.
  • TMDB API: TMDB
  • Wikipedia links:

The model

  • The model uses the cosine similarity between the movie vectors to find the most similar movies.
  • The movie vectors are created using the following features:
    • The genres of the movie
    • The cast of the movie
    • The director of the movie
    • The keywords of the movie
    • The overview of the movie
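One way to turn these five features into a single text field per movie (the 'tags' column used in the code walkthrough) is to concatenate them into one lowercase string. The sketch below is hypothetical: the README does not show the data schema, so each movie is assumed to be a plain dict, and the helper names `collapse` and `build_tags`, as well as the sample record, are made up for illustration.

```python
# Hypothetical record layout: the README lists the features but not the
# exact schema, so a plain dict per movie is assumed here.

def collapse(name: str) -> str:
    """Join a multi-word name into one token, e.g. 'Jane Doe' -> 'JaneDoe'."""
    return name.replace(" ", "")


def build_tags(movie: dict) -> str:
    """Merge genres, cast, director, keywords and overview into one lowercase string."""
    parts = [collapse(g) for g in movie["genres"]]
    parts += [collapse(c) for c in movie["cast"]]
    parts.append(collapse(movie["director"]))
    parts += [collapse(k) for k in movie["keywords"]]
    # The overview is free text, so its words are kept as separate tokens.
    parts += movie["overview"].split()
    return " ".join(parts).lower()


example = {
    "genres": ["Science Fiction", "Action"],
    "cast": ["Jane Doe"],
    "director": "John Smith",
    "keywords": ["space war", "alien"],
    "overview": "A marine on an alien planet.",
}
print(build_tags(example))
# sciencefiction action janedoe johnsmith spacewar alien a marine on an alien planet.
```

Collapsing multi-word names into single tokens is a design choice: it stops two unrelated people who share a first name from counting as a match once the text is vectorized.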

What is happening in the code?

  1. Stemming:

    • It uses PorterStemmer from the nltk.stem.porter module to stem the text in the 'tags' column of the DataFrame (new_df).
    • Stemming reduces words to a root form (e.g., 'running' to 'run', 'easily' to 'easili'), so that different inflections of the same word map to the same token. The resulting stems are not always dictionary words.
  2. CountVectorizer:

    • Uses CountVectorizer from sklearn.feature_extraction.text to convert the processed 'tags' text into numerical vectors of word counts (a bag-of-words representation).
    • max_features=5000 keeps only the 5,000 most frequent terms across all movies as features.
    • stop_words='english' removes common English stop words (such as 'and', 'the', 'is') during vectorization.
  3. Cosine Similarity:

    • Calculates the cosine similarity between every pair of 'tags' vectors using cosine_similarity from sklearn.metrics.pairwise.
    • The result is a similarity matrix in which entry (i, j) shows how similar movies i and j are based on their 'tags' content.
  4. Recommendation Function (recommend):

    • Finds the index of the given movie in the DataFrame (new_df) based on its title.
    • Retrieves the similarity scores of that movie with all others from the similarity matrix.
    • Sorts the movies by similarity scores and prints the top 5 recommendations (excluding the queried movie).
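The four steps above can be sketched end to end as follows. This is a minimal sketch, not the repository's code: the four titles and their tag strings are made-up placeholders rather than TMDB rows, and `recommend` here returns the titles as well as printing them so the result can be checked. It assumes `pandas`, `nltk` and `scikit-learn` are installed.

```python
# Minimal sketch of steps 1-4 on a tiny made-up dataset (titles and tags are
# hypothetical placeholders, not rows from the TMDB data).
import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

new_df = pd.DataFrame(
    {
        "title": ["Space Quest", "Galaxy Wars", "Love in Paris", "Star Voyage"],
        "tags": [
            "astronauts travel through space fighting aliens",
            "rebels fight aliens in space battles",
            "two strangers fall in love in paris",
            "astronauts explore distant space stations",
        ],
    }
)

# 1. Stemming: reduce every word in 'tags' to its Porter stem.
ps = PorterStemmer()


def stem_text(text: str) -> str:
    return " ".join(ps.stem(word) for word in text.split())


new_df["tags"] = new_df["tags"].apply(stem_text)

# 2. CountVectorizer: word counts over at most 5000 terms, English stop words removed.
cv = CountVectorizer(max_features=5000, stop_words="english")
vectors = cv.fit_transform(new_df["tags"]).toarray()

# 3. Cosine similarity: an n x n matrix of pairwise similarities between movies.
similarity = cosine_similarity(vectors)


# 4. Recommendation: rank all other movies by similarity to the queried one.
def recommend(movie: str, top_n: int = 5) -> list:
    movie_index = new_df[new_df["title"] == movie].index[0]
    ranked = sorted(
        enumerate(similarity[movie_index]), key=lambda pair: pair[1], reverse=True
    )
    # Drop the queried movie by index, then keep the top_n remaining titles.
    titles = [new_df.iloc[i]["title"] for i, _ in ranked if i != movie_index][:top_n]
    for title in titles:
        print(title)
    return titles


recommend("Space Quest")
```

Filtering on the index, rather than assuming the queried movie always sorts first, keeps it out of the results even if another movie ties it at a similarity of 1.0.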

About

This is a movie recommendation system built with the content-based filtering technique, using cosine similarity to compare movies.
